Re: vacuum verbose detail logs are unclear; log at *start* of each stage

2020-01-25 Thread Justin Pryzby
On Wed, Jan 22, 2020 at 02:34:57PM +0900, Michael Paquier wrote:
> From patch 0003:
> /*
> +* Indent multi-line DETAIL if being sent to client (verbose)
> +* We don't know if it's sent to the client (client_min_messages);
> +* Also, that affects output to the logfile, too; assume that it's more
> +* important to format messages requested by the client than to make
> +* verbose logs pretty when also sent to the logfile.
> +*/
> +   msgprefix = elevel==INFO ? "!\t" : "";
> Such stuff gets a -1 from me.  This is not project-like, and you make
> the translation of those messages much harder than they should be.

I don't see why it's harder to translate?  Do you mean because it changes the
strings by adding %s?

-   appendStringInfo(&buf, ngettext("%u page is entirely empty.\n",
-                                   "%u pages are entirely empty.\n",
+   appendStringInfo(&buf, ngettext("%s%u page is entirely empty.\n",
+                                   "%s%u pages are entirely empty.\n",
...

I did raise two questions regarding translation:

I'm not sure why this one doesn't use ngettext()?  Seems to have been
missed at a8d585c0.
|appendStringInfo(&buf, _("There were %.0f unused item identifiers.\n"),

Or why this one does use _/gettext()?  (580ddcec suggests that I'm missing
something, but I just experimented, and it really seems to do nothing, since
"%s" shouldn't be translated).
|appendStringInfo(&buf, _("%s."), pg_rusage_show(&ru0));

Also, I realized it's possible to write different strings to the log vs the
client (with and without a prefix) by calling errdetail_internal() and
errdetail_log().
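
For example, something like this (a rough sketch only; the message text and the
"!\t" prefix are just illustrative, and buf/relname are the variables already
used in lazy_scan_heap):

	ereport(elevel,
			(errmsg("\"%s\": found %.0f removable row versions",
					relname, tups_vacuumed),
			 /* what the client sees (verbose), with the prefix */
			 errdetail_internal("!\t%s", buf.data),
			 /* what goes to the server log, unprefixed */
			 errdetail_log("%s", buf.data)));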

Here's a version rebased on top of f942dfb9, and making use of errdetail_log().
I'm not sure if it addresses your concern about translation, but it doesn't
change the strings.

I think it's neither needed nor desirable to change what's written to the logfile,
since CSV logs have a separate "detail" field, and text logs are indented.  The
server log is unchanged:

> 2020-01-25 23:08:40.451 CST [13971] INFO:  "t": removed 0, found 160 nonremovable row versions in 1 out of 888 pages
> 2020-01-25 23:08:40.451 CST [13971] DETAIL:  0 dead row versions cannot be removed yet, oldest xmin: 781
> There were 0 unused item identifiers.
> Skipped 0 pages due to buffer pins, 444 frozen pages.
> 0 pages are entirely empty.
> CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.01 s.

If VERBOSE, then the client output has "!" prefixes, with the style borrowed from
ShowUsage():

> INFO:  "t": removed 0, found 160 nonremovable row versions in 1 out of 888 
> pages
> DETAIL:  ! 0 dead row versions cannot be removed yet, oldest xmin: 781
> ! There were 0 unused item identifiers.
> ! Skipped 0 pages due to buffer pins, 444 frozen pages.
> ! 0 pages are entirely empty.
> ! CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.01 s.

I mentioned before that maybe the client's messages with newlines should be
indented similarly to how they're done in the text logfile.  I looked; that's
append_with_tabs() in elog.c.  So that's a different possible
implementation, which would apply to any message with newlines (or possibly
just DETAIL).
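
For reference, append_with_tabs() is essentially this (paraphrased from elog.c
from memory, not a verbatim copy):

	/* append 'str' to 'buf', indenting each continuation line with a tab */
	static void
	append_with_tabs(StringInfo buf, const char *str)
	{
		char		ch;

		while ((ch = *str++) != '\0')
		{
			appendStringInfoCharMacro(buf, ch);
			if (ch == '\n')
				appendStringInfoCharMacro(buf, '\t');
		}
	}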

I'll also fork the allvisible/frozen/hintbits patches to a separate thread.

Thanks,
Justin
>From a3d0b41435655615ab13f808ec7c30e53e596e50 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Sat, 25 Jan 2020 21:25:37 -0600
Subject: [PATCH v3 1/4] Remove gettext erroneously readded at 580ddce

---
 src/backend/access/heap/vacuumlazy.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 8ce5011..8e8ea9d 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1690,7 +1690,7 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 	"%u pages are entirely empty.\n",
 	empty_pages),
 	 empty_pages);
-	appendStringInfo(&buf, _("%s."), pg_rusage_show(&ru0));
+	appendStringInfo(&buf, "%s.", pg_rusage_show(&ru0));
 
 	ereport(elevel,
 			(errmsg("\"%s\": found %.0f removable, %.0f nonremovable row versions in %u out of %u pages",
-- 
2.7.4

>From 2db7c4e3482120b2a83cda74603f2454da7eaa03 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Sat, 25 Jan 2020 22:50:46 -0600
Subject: [PATCH v3 2/4] vacuum verbose: use ngettext() everywhere possible

---
 src/backend/access/heap/vacuumlazy.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 8e8ea9d..eb9

Re: explain HashAggregate to report bucket and memory stats

2020-01-26 Thread Justin Pryzby
On Sun, Jan 26, 2020 at 08:14:25AM -0600, Justin Pryzby wrote:
> On Fri, Jan 03, 2020 at 10:19:25AM -0600, Justin Pryzby wrote:
> > On Sun, Feb 17, 2019 at 11:29:56AM -0500, Jeff Janes wrote:
> > https://www.postgresql.org/message-id/CAMkU%3D1zBJNVo2DGYBgLJqpu8fyjCE_ys%2Bmsr6pOEoiwA7y5jrA%40mail.gmail.com
> > > What would I find very useful is [...] if the HashAggregate node under
> > > "explain analyze" would report memory and bucket stats; and if the 
> > > Aggregate
> > > node would report...anything.
> > 
> > Find attached my WIP attempt to implement this.
> > 
> > Jeff: can you suggest what details Aggregate should show ?
> 
> Rebased on top of 10013684970453a0ddc86050bba813c64321
> And added https://commitfest.postgresql.org/27/2428/

Attached for real.
>From 42634353f44ab44a06ffe4ae36a0dcbb848408ca Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Wed, 1 Jan 2020 13:09:33 -0600
Subject: [PATCH v1 1/2] refactor: show_hinstrument and avoid showing memory
 use if not verbose..

This changes explain analyze at least for Hash(join), but doesn't affect
regression tests, since all the HashJoin tests seem to run explain without
analyze, so nbatch=0, and no stats are shown.

But for a future patch to show stats for HashAgg (for which nbatch=1, always), we
want to show buckets in explain analyze, but don't want to show memory, since
it's machine-specific.
---
 src/backend/commands/explain.c | 79 +++---
 1 file changed, 52 insertions(+), 27 deletions(-)

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index f523adb..11b5857 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -103,6 +103,7 @@ static void show_sortorder_options(StringInfo buf, Node *sortexpr,
 static void show_tablesample(TableSampleClause *tsc, PlanState *planstate,
 			 List *ancestors, ExplainState *es);
 static void show_sort_info(SortState *sortstate, ExplainState *es);
+static void show_hinstrument(ExplainState *es, HashInstrumentation *h);
 static void show_hash_info(HashState *hashstate, ExplainState *es);
 static void show_tidbitmap_info(BitmapHeapScanState *planstate,
 ExplainState *es);
@@ -2713,43 +2714,67 @@ show_hash_info(HashState *hashstate, ExplainState *es)
 		}
 	}
 
-	if (hinstrument.nbatch > 0)
+	show_hinstrument(es, );
+}
+
+/*
+ * Show hash bucket stats and (optionally) memory.
+ */
+static void
+show_hinstrument(ExplainState *es, HashInstrumentation *h)
+{
+	long		spacePeakKb = (h->space_peak + 1023) / 1024;
+
+	// Currently, this isn't shown for explain of hash(join) since nbatch=0 without analyze
+	// But, it's shown for hashAgg since nbatch=1, always.
+	// Need to 1) avoid showing memory use if !analyze; and, 2) avoid memory use if not verbose
+
+	if (h->nbatch <= 0)
+		return;
+	/* This avoids showing anything if it's explain without analyze; should we just check that, instead ? */
+	if (!es->analyze)
+		return;
+
+	if (es->format != EXPLAIN_FORMAT_TEXT)
+	{
+		ExplainPropertyInteger("Hash Buckets", NULL,
+			   h->nbuckets, es);
+		ExplainPropertyInteger("Original Hash Buckets", NULL,
+			   h->nbuckets_original, es);
+		ExplainPropertyInteger("Hash Batches", NULL,
+			   h->nbatch, es);
+		ExplainPropertyInteger("Original Hash Batches", NULL,
+			   h->nbatch_original, es);
+		ExplainPropertyInteger("Peak Memory Usage", "kB",
+			   spacePeakKb, es);
+	}
+	else
 	{
-		long		spacePeakKb = (hinstrument.space_peak + 1023) / 1024;
 
-		if (es->format != EXPLAIN_FORMAT_TEXT)
-		{
-			ExplainPropertyInteger("Hash Buckets", NULL,
-   hinstrument.nbuckets, es);
-			ExplainPropertyInteger("Original Hash Buckets", NULL,
-   hinstrument.nbuckets_original, es);
-			ExplainPropertyInteger("Hash Batches", NULL,
-   hinstrument.nbatch, es);
-			ExplainPropertyInteger("Original Hash Batches", NULL,
-   hinstrument.nbatch_original, es);
-			ExplainPropertyInteger("Peak Memory Usage", "kB",
-   spacePeakKb, es);
-		}
-		else if (hinstrument.nbatch_original != hinstrument.nbatch ||
- hinstrument.nbuckets_original != hinstrument.nbuckets)
-		{
+		if (h->nbatch_original != h->nbatch ||
+			 h->nbuckets_original != h->nbuckets) {
 			ExplainIndentText(es);
 			appendStringInfo(es->str,
-			 "Buckets: %d (originally %d)  Batches: %d (originally %d)  Memory Usage: %ldkB\n",
-			 hinstrument.nbuckets,
-			 hinstrument.nbuckets_original,
-			 hinstrument.nbatch,
-			 hinstrument.nbatch_original,
-			 spacePeakKb);
+		"Buckets: %d (originally %d)   Batches: %d (originally %d)",
+		h->nbuckets,
+		h->nbuckets_original,
+		h->nbatch,
+	

vacuum verbose: show pages marked allvisible/frozen/hintbits

2020-01-26 Thread Justin Pryzby
I'm forking this thread since it's a separate topic, and since keeping everything
in a single branch hasn't made maintaining the patches any easier.
https://www.postgresql.org/message-id/CAMkU%3D1xAyWnwnLGORBOD%3Dpyv%3DccEkDi%3DwKeyhwF%3DgtB7QxLBwQ%40mail.gmail.com
On Sun, Dec 29, 2019 at 01:15:24PM -0500, Jeff Janes wrote:
> Also, I'd appreciate a report on how many hint-bits were set, and how many
> pages were marked all-visible and/or frozen.  When I do a manual vacuum, it
> is more often for those purposes than it is for removing removable rows
> (which autovac generally does a good enough job of).

The first patch seems simple enough but the 2nd could use critical review.
>From 57eede7d1158904d6b66532c7d0ce6a59803210f Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Sun, 29 Dec 2019 14:56:02 -0600
Subject: [PATCH v1 1/2] Report number of pages marked allvisible/frozen..

..as requested by Jeff Janes
---
 src/backend/access/heap/vacuumlazy.c | 37 ++--
 1 file changed, 31 insertions(+), 6 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 8ce5011..9975699 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -283,7 +283,9 @@ typedef struct LVRelStats
 	double		new_rel_tuples; /* new estimated total # of tuples */
 	double		new_live_tuples;	/* new estimated total # of live tuples */
 	double		new_dead_tuples;	/* new estimated total # of dead tuples */
-	BlockNumber pages_removed;
+	BlockNumber pages_removed;	/* Due to truncation */
+	BlockNumber pages_allvisible;
+	BlockNumber pages_frozen;
 	double		tuples_deleted;
 	BlockNumber nonempty_pages; /* actually, last nonempty page + 1 */
 	LVDeadTuples *dead_tuples;
@@ -602,11 +604,13 @@ heap_vacuum_rel(Relation onerel, VacuumParams *params,
 			 get_namespace_name(RelationGetNamespace(onerel)),
 			 RelationGetRelationName(onerel),
 			 vacrelstats->num_index_scans);
-			appendStringInfo(&buf, _("pages: %u removed, %u remain, %u skipped due to pins, %u skipped frozen\n"),
+			appendStringInfo(&buf, _("pages: %u removed, %u remain, %u skipped due to pins, %u skipped frozen, %u marked all visible, %u marked frozen\n"),
 			 vacrelstats->pages_removed,
 			 vacrelstats->rel_pages,
 			 vacrelstats->pinskipped_pages,
-			 vacrelstats->frozenskipped_pages);
+			 vacrelstats->frozenskipped_pages,
+			 vacrelstats->pages_allvisible,
+			 vacrelstats->pages_frozen);
 			appendStringInfo(&buf,
 			 _("tuples: %.0f removed, %.0f remain, %.0f are dead but not yet removable, oldest xmin: %u\n"),
 			 vacrelstats->tuples_deleted,
@@ -751,6 +755,9 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 	vacrelstats->scanned_pages = 0;
 	vacrelstats->tupcount_pages = 0;
 	vacrelstats->nonempty_pages = 0;
+	vacrelstats->pages_allvisible = 0;
+	vacrelstats->pages_frozen = 0;
+
 	vacrelstats->latestRemovedXid = InvalidTransactionId;
 
 	/*
@@ -1170,6 +1177,8 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 visibilitymap_set(onerel, blkno, buf, InvalidXLogRecPtr,
   vmbuffer, InvalidTransactionId,
   VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
+vacrelstats->pages_allvisible++;
+vacrelstats->pages_frozen++;
 END_CRIT_SECTION();
 			}
 
@@ -1501,8 +1510,12 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 		{
 			uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
 
-			if (all_frozen)
+			if (all_frozen) {
 flags |= VISIBILITYMAP_ALL_FROZEN;
+vacrelstats->pages_frozen++;
+			}
+
+			vacrelstats->pages_allvisible++;
 
 			/*
 			 * It should never be the case that the visibility map page is set
@@ -1690,6 +1703,14 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 	"%u pages are entirely empty.\n",
 	empty_pages),
 	 empty_pages);
+	appendStringInfo(&buf, ngettext("Marked %u page all visible, ",
+	"Marked %u pages all visible, ",
+	vacrelstats->pages_allvisible),
+	vacrelstats->pages_allvisible);
+	appendStringInfo(&buf, ngettext("%u page frozen.\n",
+	"%u pages frozen.\n",
+	vacrelstats->pages_frozen),
+	vacrelstats->pages_frozen);
 	appendStringInfo(&buf, _("%s."), pg_rusage_show(&ru0));
 
 	ereport(elevel,
@@ -1912,10 +1933,14 @@ lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer,
 		uint8		flags = 0;
 
 		/* Set the VM all-frozen bit to flag, if needed */
-		if ((vm_status & VISIBILITYMAP_ALL_VISIBLE) == 0)
+		if ((vm_status & VISIBILITYMAP_ALL_VISIBLE) == 0) {
 			flags |= VISIBILITYMAP_ALL_VISIBLE;
-		if ((vm_status & VISIBILITYMAP_ALL_FROZEN) == 0 && all_frozen)
+			vacrelstats->pages_al

Re: explain HashAggregate to report bucket and memory stats

2020-01-26 Thread Justin Pryzby
On Fri, Jan 03, 2020 at 10:19:25AM -0600, Justin Pryzby wrote:
> On Sun, Feb 17, 2019 at 11:29:56AM -0500, Jeff Janes wrote:
> https://www.postgresql.org/message-id/CAMkU%3D1zBJNVo2DGYBgLJqpu8fyjCE_ys%2Bmsr6pOEoiwA7y5jrA%40mail.gmail.com
> > What would I find very useful is [...] if the HashAggregate node under
> > "explain analyze" would report memory and bucket stats; and if the Aggregate
> > node would report...anything.
> 
> Find attached my WIP attempt to implement this.
> 
> Jeff: can you suggest what details Aggregate should show ?

Rebased on top of 10013684970453a0ddc86050bba813c64321
And added https://commitfest.postgresql.org/27/2428/




Re: error context for vacuum to include block number

2020-01-26 Thread Justin Pryzby
It occurred to me that there's an issue with sharing vacrelstats between
scan/vacuum, since blkno and stage are set by the heap/index vacuum routines,
but not reset on their return to heap scan.  Not sure if we should reset them,
or go back to using a separate struct, like it was here:
https://www.postgresql.org/message-id/20200120054159.GT26045%40telsasoft.com
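
If we do reset them, the shape would be roughly this around the index/heap
vacuum calls inside the scan loop (just a sketch, using the field names from
the current patch):

	/* remember what the callback should report for the heap scan */
	BlockNumber	save_blkno = vacrelstats.blkno;
	int			save_stage = vacrelstats.stage;

	lazy_vacuum_all_indexes(onerel, Irel, indstats, lps, nindexes);
	lazy_vacuum_heap(onerel);

	/* back on the heap scan: restore block number and stage */
	vacrelstats.blkno = save_blkno;
	vacrelstats.stage = save_stage;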

On Sun, Jan 26, 2020 at 11:38:13PM -0600, Justin Pryzby wrote:
> From 592a77554f99b5ff9035c55bf19a79a1443ae59e Mon Sep 17 00:00:00 2001
> From: Justin Pryzby 
> Date: Thu, 12 Dec 2019 20:54:37 -0600
> Subject: [PATCH v14 2/3] vacuum errcontext to show block being processed
> 
> As requested here.
> https://www.postgresql.org/message-id/20190807235154.erbmr4o4bo6vgnjv%40alap3.anarazel.de
> ---
>  src/backend/access/heap/vacuumlazy.c | 85 
> +++-
>  1 file changed, 84 insertions(+), 1 deletion(-)
> 
> diff --git a/src/backend/access/heap/vacuumlazy.c 
> b/src/backend/access/heap/vacuumlazy.c
> index 114428b..a62dc79 100644
> --- a/src/backend/access/heap/vacuumlazy.c
> +++ b/src/backend/access/heap/vacuumlazy.c
> @@ -290,8 +290,14 @@ typedef struct LVRelStats
>   int num_index_scans;
>   TransactionId latestRemovedXid;
>   boollock_waiter_detected;
> -} LVRelStats;
>  
> + /* Used by the error callback */
> + char*relname;
> + char*relnamespace;
> + BlockNumber blkno;
> + char*indname;
> + int stage;  /* 0: scan heap; 1: vacuum heap; 2: 
> vacuum index */
> +} LVRelStats;
>  
>  /* A few variables that don't seem worth passing around as parameters */
>  static int   elevel = -1;
> @@ -360,6 +366,7 @@ static void end_parallel_vacuum(Relation *Irel, 
> IndexBulkDeleteResult **stats,
>   LVParallelState 
> *lps, int nindexes);
>  static LVSharedIndStats *get_indstats(LVShared *lvshared, int n);
>  static bool skip_parallel_vacuum_index(Relation indrel, LVShared *lvshared);
> +static void vacuum_error_callback(void *arg);
>  
>  
>  /*
> @@ -721,6 +728,7 @@ lazy_scan_heap(Relation onerel, VacuumParams *params,
>   PROGRESS_VACUUM_MAX_DEAD_TUPLES
>   };
>   int64   initprog_val[3];
> + ErrorContextCallback errcallback;
>  
>   pg_rusage_init();
>  
> @@ -867,6 +875,17 @@ lazy_scan_heap(Relation onerel, VacuumParams *params,
>   else
>   skipping_blocks = false;
>  
> + /* Setup error traceback support for ereport() */
> + vacrelstats.relnamespace = 
> get_namespace_name(RelationGetNamespace(onerel));
> + vacrelstats.relname = relname;
> + vacrelstats.blkno = InvalidBlockNumber; /* Not known yet */
> + vacrelstats.stage = 0;
> +
> + errcallback.callback = vacuum_error_callback;
> + errcallback.arg = (void *) &vacrelstats;
> + errcallback.previous = error_context_stack;
> + error_context_stack = &errcallback;
> +
>   for (blkno = 0; blkno < nblocks; blkno++)
>   {
>   Buffer  buf;
> @@ -888,6 +907,8 @@ lazy_scan_heap(Relation onerel, VacuumParams *params,
>  #define FORCE_CHECK_PAGE() \
>   (blkno == nblocks - 1 && should_attempt_truncation(params))
>  
> + vacrelstats.blkno = blkno;
> +
>   pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, 
> blkno);
>  
>   if (blkno == next_unskippable_block)
> @@ -984,12 +1005,18 @@ lazy_scan_heap(Relation onerel, VacuumParams *params,
>   vmbuffer = InvalidBuffer;
>   }
>  
> + /* Pop the error context stack */
> + error_context_stack = errcallback.previous;
> +
>   /* Work on all the indexes, then the heap */
>   lazy_vacuum_all_indexes(onerel, Irel, indstats,
>   lps, 
> nindexes);
>   /* Remove tuples from heap */
>   lazy_vacuum_heap(onerel);
>  
> + /* Replace error context while continuing heap scan */
> + error_context_stack = &errcallback;
> +
>   /*
>* Forget the now-vacuumed tuples, and press on, but be 
> careful
>* not to reset latestRemovedXid since we want that 
> value to be
> @@ -1593,6 +1620,9 @@ lazy_scan_heap(Relation onerel, VacuumParams *params,
>   RecordPageWithFreeSpace(onerel, blkno, freespace);
>   }
>  
> + /* Pop the error context stack */
> +   

Re: error context for vacuum to include block number

2020-01-26 Thread Justin Pryzby
On Sun, Jan 26, 2020 at 12:29:38PM -0800, Andres Freund wrote:
> > postgres=# SET client_min_messages=debug;SET statement_timeout=99; VACUUM (VERBOSE, PARALLEL 0) t;
> > INFO:  vacuuming "public.t"
> > DEBUG:  "t_a_idx": vacuuming index
> > 2020-01-20 15:47:36.338 CST [20139] ERROR:  canceling statement due to statement timeout
> > 2020-01-20 15:47:36.338 CST [20139] CONTEXT:  while vacuuming relation "public.t_a_idx"
> > 2020-01-20 15:47:36.338 CST [20139] STATEMENT:  VACUUM (VERBOSE, PARALLEL 0) t;
> > ERROR:  canceling statement due to statement timeout
> > CONTEXT:  while vacuuming relation "public.t_a_idx"
> 
> It'd be a bit nicer if it said index "public.t_a_idx" for relation "public.t".

I think that tips the scale in favour of making vacrelstats a global.
I added that as a 1st patch, and squished the callback patches into one.

Also, it seems to me we shouldn't repeat the namespace of the index *and* its
table.  I tried looking for consistency here:

grep -r '\\"%s.%s\\"' --incl='*.c' |grep '\\"%s\\"'
src/backend/commands/cluster.c: (errmsg("clustering \"%s.%s\" using index scan on \"%s\"",
src/backend/access/heap/vacuumlazy.c:   errcontext(_("while vacuuming index \"%s\" on table \"%s.%s\""),

grep -r 'index \\".* table \\"' --incl='*.c'
src/backend/catalog/index.c:(errmsg("building index \"%s\" on table \"%s\" serially",
src/backend/catalog/index.c:(errmsg_plural("building index \"%s\" on table \"%s\" with request for %d parallel worker",
src/backend/catalog/index.c:   "building index \"%s\" on table \"%s\" with request for %d parallel workers",
src/backend/catalog/catalog.c:   errmsg("index \"%s\" does not belong to table \"%s\"",
src/backend/commands/indexcmds.c:   (errmsg("%s %s will create implicit index \"%s\" for table \"%s\"",
src/backend/commands/tablecmds.c:errmsg("index \"%s\" for table \"%s\" does not exist",
src/backend/commands/tablecmds.c:errmsg("index \"%s\" for table \"%s\" does not exist",
src/backend/commands/tablecmds.c:    errdetail("The index \"%s\" belongs to a constraint in table \"%s\" but no constraint exists for index \"%s\".",
src/backend/commands/cluster.c:  errmsg("index \"%s\" for table \"%s\" does not exist",
src/backend/parser/parse_utilcmd.c:  errmsg("index \"%s\" does not belong to table \"%s\"",
>From 8ee9ffc1325118438309ee25e9b33c61cccd022f Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Sun, 26 Jan 2020 22:38:10 -0600
Subject: [PATCH v14 1/3] make vacrelstats a global

---
 src/backend/access/heap/vacuumlazy.c | 276 +--
 1 file changed, 136 insertions(+), 140 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 8ce5011..114428b 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -302,16 +302,17 @@ static MultiXactId MultiXactCutoff;
 
 static BufferAccessStrategy vac_strategy;
 
+LVRelStats vacrelstats = {0};
 
 /* non-export function prototypes */
 static void lazy_scan_heap(Relation onerel, VacuumParams *params,
-		   LVRelStats *vacrelstats, Relation *Irel, int nindexes,
+		   Relation *Irel, int nindexes,
 		   bool aggressive);
-static void lazy_vacuum_heap(Relation onerel, LVRelStats *vacrelstats);
+static void lazy_vacuum_heap(Relation onerel);
 static bool lazy_check_needs_freeze(Buffer buf, bool *hastup);
 static void lazy_vacuum_all_indexes(Relation onerel, Relation *Irel,
 	IndexBulkDeleteResult **stats,
-	LVRelStats *vacrelstats, LVParallelState *lps,
+	LVParallelState *lps,
 	int nindexes);
 static void lazy_vacuum_index(Relation indrel, IndexBulkDeleteResult **stats,
 			  LVDeadTuples *dead_tuples, double reltuples);
@@ -319,13 +320,11 @@ static void lazy_cleanup_index(Relation indrel,
 			   IndexBulkDeleteResult **stats,
 			   double reltuples, bool estimated_count);
 static int	lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer,
-			 int tupindex, LVRelStats *vacrelstats, Buff

Re: error context for vacuum to include block number

2020-01-27 Thread Justin Pryzby
On Mon, Jan 27, 2020 at 03:59:58PM +0900, Masahiko Sawada wrote:
> On Mon, 27 Jan 2020 at 14:38, Justin Pryzby  wrote:
> > On Sun, Jan 26, 2020 at 12:29:38PM -0800, Andres Freund wrote:
> > > > CONTEXT:  while vacuuming relation "public.t_a_idx"
> > >
> > > It'd be a bit nicer if it said index "public.t_a_idx" for relation 
> > > "public.t".
> >
> > I think that tips the scale in favour of making vacrelstats a global.
> > I added that as a 1st patch, and squished the callback patches into one.
> 
> Hmm I don't think it's a good idea to make vacrelstats global. If we
> want to display the relation name and its index name in error context
> we might want to define a new struct dedicated for error context
> reporting. That is it has blkno, stage and relation name and schema
> name for both table and index and then we set these variables of
> callback argument before performing a vacuum phase. We don't change
> LVRelStats at all.

On Mon, Jan 27, 2020 at 12:14:38AM -0600, Justin Pryzby wrote:
> It occured to me that there's an issue with sharing vacrelstats between
> scan/vacuum, since blkno and stage are set by the heap/index vacuum routines,
> but not reset on their return to heap scan.  Not sure if we should reset them,
> or go back to using a separate struct, like it was here:
> https://www.postgresql.org/message-id/20200120054159.GT26045%40telsasoft.com

I went back to this original way of doing it.
The parallel vacuum patch made it harder to pass the table around :(
And it has to be tested separately:

| SET statement_timeout=0; DROP TABLE t; CREATE TABLE t AS SELECT generate_series(1,9)a; CREATE INDEX ON t(a); CREATE INDEX ON t(a); UPDATE t SET a=1+a; SET statement_timeout=99;VACUUM(VERBOSE, PARALLEL 2) t;

I had to allocate space for the table name within the LVShared struct, not just
a pointer, otherwise it would variously crash or fail to output the index name.
I think pointers can't be passed to a parallel process except using some
heavyweight thing like shm_toc_...

I guess the callback could also take the index relid instead of name, and use
something like IndexGetRelation().
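
Something like the below, although I haven't tried it, and I'm not sure doing
catalog lookups from inside an error callback is safe in every error path
(which may end up being an argument for copying the name into LVShared after
all).  The indrelid field here is hypothetical:

	/* hypothetical: callback carries the index OID and derives the names */
	static void
	vacuum_error_callback(void *arg)
	{
		vacuum_error_callback_arg *cbarg = arg;
		Oid			tableoid = IndexGetRelation(cbarg->indrelid, true);
		char	   *tablename = OidIsValid(tableoid) ? get_rel_name(tableoid) : NULL;

		errcontext("while vacuuming index \"%s\" of table \"%s\"",
				   cbarg->indname, tablename ? tablename : "???");
	}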

> Although the patch replaces get_namespace_name and
> RelationGetRelationName but we use namespace name of relation at only
> two places and almost ereport/elog messages use only relation name
> gotten by RelationGetRelationName which is a macro to access the
> relation name in Relation struct. So I think adding relname to
> LVRelStats would not be a big benefit. Similarly, adding table
> namespace to LVRelStats would be good to avoid calling
> get_namespace_name whereas I'm not sure it's worth to have it because
> it's expected not to be really many times.

Right, I only tried that to save a few LOC and maybe make shorter lines.
It's not important so I'll drop that patch.
>From fb53a62620aab180e8b250be50fefac1b40f50c2 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Thu, 12 Dec 2019 20:54:37 -0600
Subject: [PATCH v15 1/2] vacuum errcontext to show block being processed

As requested here.
https://www.postgresql.org/message-id/20190807235154.erbmr4o4bo6vgnjv%40alap3.anarazel.de
---
 src/backend/access/heap/vacuumlazy.c | 85 +++-
 1 file changed, 84 insertions(+), 1 deletion(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 8ce5011..d11c7af 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -292,6 +292,13 @@ typedef struct LVRelStats
 	bool		lock_waiter_detected;
 } LVRelStats;
 
+typedef struct
+{
+	char 		*relnamespace;
+	char		*relname;
+	BlockNumber blkno;
+	int			stage;	/* 0: scan heap; 1: vacuum heap; 2: vacuum index */
+} vacuum_error_callback_arg;
 
 /* A few variables that don't seem worth passing around as parameters */
 static int	elevel = -1;
@@ -361,6 +368,7 @@ static void end_parallel_vacuum(Relation *Irel, IndexBulkDeleteResult **stats,
 LVParallelState *lps, int nindexes);
 static LVSharedIndStats *get_indstats(LVShared *lvshared, int n);
 static bool skip_parallel_vacuum_index(Relation indrel, LVShared *lvshared);
+static void vacuum_error_callback(void *arg);
 
 
 /*
@@ -724,6 +732,8 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 		PROGRESS_VACUUM_MAX_DEAD_TUPLES
 	};
 	int64		initprog_val[3];
+	ErrorContextCallback errcallback;
+	vacuum_error_callback_arg errcbarg;
 
 	pg_rusage_init(&ru0);
 
@@ -870,6 +880,17 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 	else
 		skipping_blocks = false;
 
+	/* Setup error traceback support for ereport() */
+	errcbarg.relnamespace = get_namespace_name(RelationGetNamespace(onerel));
+	errcbarg.relname = relname;
+	errcbarg.blkno = InvalidBlockNumber; /* Not known yet */
+	errcbarg.stage = 0;
+
+	errcall

typos in comments and user docs

2020-02-05 Thread Justin Pryzby
I sent an earlier version of this a few times last year along with a bunch of
other doc patches, but it was never picked up.  So maybe I'll try sending one at
a time in more digestible chunks.
https://www.postgresql.org/message-id/flat/20190427025647.GD3925%40telsasoft.com#e1731c33455145eadc1158042cc411f9

>From cb5842724330dfcfc914f2e3effdbfe4843be565 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Thu, 9 May 2019 21:13:55 -0500
Subject: [PATCH] spelling and typos

---
 doc/src/sgml/bloom.sgml| 2 +-
 doc/src/sgml/config.sgml   | 2 +-
 doc/src/sgml/ref/alter_table.sgml  | 2 +-
 doc/src/sgml/sources.sgml  | 4 ++--
 src/backend/access/transam/README.parallel | 2 +-
 src/backend/storage/buffer/bufmgr.c| 2 +-
 src/backend/storage/sync/sync.c| 2 +-
 src/include/access/tableam.h   | 2 +-
 8 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/doc/src/sgml/bloom.sgml b/doc/src/sgml/bloom.sgml
index 6eeadde..c341b65 100644
--- a/doc/src/sgml/bloom.sgml
+++ b/doc/src/sgml/bloom.sgml
@@ -65,7 +65,7 @@
  
   Number of bits generated for each index column. Each parameter's name
   refers to the number of the index column that it controls.  The default
-  is 2 bits and maximum is 4095.  Parameters for
+  is 2 bits and the maximum is 4095.  Parameters for
   index columns not actually used are ignored.
  
 
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index b2c89bd..102698b 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4318,7 +4318,7 @@ ANY num_sync 
( numeric_literal, REM
 
  
   SET NOT NULL may only be applied to a column
-  providing none of the records in the table contain a
+  provided none of the records in the table contain a
   NULL value for the column.  Ordinarily this is
   checked during the ALTER TABLE by scanning the
   entire table; however, if a valid CHECK constraint is
diff --git a/doc/src/sgml/sources.sgml b/doc/src/sgml/sources.sgml
index 5831ec4..b5d28e7 100644
--- a/doc/src/sgml/sources.sgml
+++ b/doc/src/sgml/sources.sgml
@@ -511,7 +511,7 @@ Hint:   the addendum
 

 There are functions in the backend that will double-quote their own output
-at need (for example, format_type_be()).  Do not put
+as needed (for example, format_type_be()).  Do not put
 additional quotes around the output of such functions.

 
@@ -880,7 +880,7 @@ BETTER: unrecognized node type: 42
  practices.
 
 
- Features from later revision of the C standard or compiler specific
+ Features from later revisions of the C standard or compiler specific
  features can be used, if a fallback is provided.
 
 
diff --git a/src/backend/access/transam/README.parallel 
b/src/backend/access/transam/README.parallel
index 85e5840..99c588d 100644
--- a/src/backend/access/transam/README.parallel
+++ b/src/backend/access/transam/README.parallel
@@ -169,7 +169,7 @@ differently because of them.  Right now, we don't even 
allow that.
 At the end of a parallel operation, which can happen either because it
 completed successfully or because it was interrupted by an error, parallel
 workers associated with that operation exit.  In the error case, transaction
-abort processing in the parallel leader kills of any remaining workers, and
+abort processing in the parallel leader kills off any remaining workers, and
 the parallel leader then waits for them to die.  In the case of a successful
 parallel operation, the parallel leader does not send any signals, but must
 wait for workers to complete and exit of their own volition.  In either
diff --git a/src/backend/storage/buffer/bufmgr.c 
b/src/backend/storage/buffer/bufmgr.c
index aba3960..5880054 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -4291,7 +4291,7 @@ ts_ckpt_progress_comparator(Datum a, Datum b, void *arg)
  *
  * *max_pending is a pointer instead of an immediate value, so the coalesce
  * limits can easily changed by the GUC mechanism, and so calling code does
- * not have to check the current configuration. A value is 0 means that no
+ * not have to check the current configuration. A value of 0 means that no
  * writeback control will be performed.
  */
 void
diff --git a/src/backend/storage/sync/sync.c b/src/backend/storage/sync/sync.c
index 9cb7c65..8282a47 100644
--- a/src/backend/storage/sync/sync.c
+++ b/src/backend/storage/sync/sync.c
@@ -216,7 +216,7 @@ SyncPostCheckpoint(void)
 
/*
 * As in ProcessSyncRequests, we don't want to stop absorbing 
fsync
-* requests for along time when there are many deletions to be 
done.
+* requests for a long time when there are many deletions to be 
done.
 * We can safely call AbsorbSyncRequests() at this point in the 
loop
 * (note it might 

Re: ALTER tbl rewrite loses CLUSTER ON index

2020-02-06 Thread Justin Pryzby
I wondered if it wouldn't be better if CLUSTER ON was stored in pg_class as the
Oid of a clustered index, rather than a boolean in pg_index.

That likely would've avoided (or at least exposed) this issue.
And it avoids the possibility of having two indices marked as "clustered".
These would be more trivial:
mark_index_clustered
/* We need to find the index that has indisclustered set. */




Re: typos in comments and user docs

2020-02-06 Thread Justin Pryzby
On Thu, Feb 06, 2020 at 04:43:18PM +0530, Amit Kapila wrote:
> On Thu, Feb 6, 2020 at 10:45 AM Michael Paquier  wrote:
> >
> > On Thu, Feb 06, 2020 at 08:47:14AM +0530, Amit Kapila wrote:
> > > Your changes look fine to me on the first read.  I will push this to
> > > HEAD unless there are any objections.   If we want them in
> > > back-branches, we might want to probably segregate the changes based
> > > on the branch until those apply.
> >
> > +1.  It would be nice to back-patch the user-visible changes in the
> > docs.
> >
> 
> Fair enough, Justin, is it possible for you to segregate the changes
> that can be backpatched?

Looks like the whole patch can be applied to master and v12 [0].

My original thread from last year was about docs added in v12, so bloom.sgml is
the only user-facing doc which can be backpatched.  README.parallel and
bufmgr.c changes could be backpatched but I agree it's not necessary.

Note, the bloom typo seems to complete a change that was started here:

|commit 31ff51adc855e3ffe8e3c20e479b8d1a4508feb8
|Author: Alexander Korotkov 
|Date:   Mon Oct 22 00:23:26 2018 +0300
|
|Fix some grammar errors in bloom.sgml
|
|Discussion: 
https://postgr.es/m/CAEepm%3D3sijpGr8tXdyz-7EJJZfhQHABPKEQ29gpnb7-XSy%2B%3D5A%40mail.gmail.com
|Reported-by: Thomas Munro
|Backpatch-through: 9.6

Justin

[0] modulo a fix for a typo which I introduced in another patch in this branch,
which shouldn't have been in this patch; fixed in the attached.
>From a1780229e024e2e4b9a0549bcd516bb80b2d5a8d Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Thu, 9 May 2019 21:13:55 -0500
Subject: [PATCH] spelling and typos

---
 doc/src/sgml/bloom.sgml| 2 +-
 doc/src/sgml/ref/alter_table.sgml  | 2 +-
 doc/src/sgml/sources.sgml  | 4 ++--
 src/backend/access/transam/README.parallel | 2 +-
 src/backend/storage/buffer/bufmgr.c| 2 +-
 src/backend/storage/sync/sync.c| 2 +-
 src/include/access/tableam.h   | 2 +-
 7 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/doc/src/sgml/bloom.sgml b/doc/src/sgml/bloom.sgml
index 6eeadde..c341b65 100644
--- a/doc/src/sgml/bloom.sgml
+++ b/doc/src/sgml/bloom.sgml
@@ -65,7 +65,7 @@
  
   Number of bits generated for each index column. Each parameter's name
   refers to the number of the index column that it controls.  The default
-  is 2 bits and maximum is 4095.  Parameters for
+  is 2 bits and the maximum is 4095.  Parameters for
   index columns not actually used are ignored.
  
 
diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index 5de3676..a22770c 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -222,7 +222,7 @@ WITH ( MODULUS numeric_literal, REM
 
  
   SET NOT NULL may only be applied to a column
-  providing none of the records in the table contain a
+  provided none of the records in the table contain a
   NULL value for the column.  Ordinarily this is
   checked during the ALTER TABLE by scanning the
   entire table; however, if a valid CHECK constraint is
diff --git a/doc/src/sgml/sources.sgml b/doc/src/sgml/sources.sgml
index 5831ec4..b5d28e7 100644
--- a/doc/src/sgml/sources.sgml
+++ b/doc/src/sgml/sources.sgml
@@ -511,7 +511,7 @@ Hint:   the addendum
 

 There are functions in the backend that will double-quote their own output
-at need (for example, format_type_be()).  Do not put
+as needed (for example, format_type_be()).  Do not put
 additional quotes around the output of such functions.

 
@@ -880,7 +880,7 @@ BETTER: unrecognized node type: 42
  practices.
 
 
- Features from later revision of the C standard or compiler specific
+ Features from later revisions of the C standard or compiler specific
  features can be used, if a fallback is provided.
 
 
diff --git a/src/backend/access/transam/README.parallel b/src/backend/access/transam/README.parallel
index 85e5840..99c588d 100644
--- a/src/backend/access/transam/README.parallel
+++ b/src/backend/access/transam/README.parallel
@@ -169,7 +169,7 @@ differently because of them.  Right now, we don't even allow that.
 At the end of a parallel operation, which can happen either because it
 completed successfully or because it was interrupted by an error, parallel
 workers associated with that operation exit.  In the error case, transaction
-abort processing in the parallel leader kills of any remaining workers, and
+abort processing in the parallel leader kills off any remaining workers, and
 the parallel leader then waits for them to die.  In the case of a successful
 parallel operation, the parallel leader does not send any signals, but must
 wait for workers to complete and exit of their own volition.  In either
diff --git a/src/backend/stor

Re: error context for vacuum to include block number

2020-02-01 Thread Justin Pryzby
Thanks for reviewing again

On Sun, Feb 02, 2020 at 10:45:12AM +0900, Masahiko Sawada wrote:
> Thank you for updating the patch. Here is some review comments:
> 
> 1.
> +typedef struct
> +{
> +   char*relnamespace;
> +   char*relname;
> +   char*indname; /* If vacuuming index */
> 
> I think "Non-null if vacuuming index" is better.

Actually it's undefined garbage (not NULL) if not vacuuming index.

> And tablename is better than relname for accuracy?

The existing code uses relname, so I left that, since it's strange to
start using tablename and then write things like:

|   errcbarg.tblname = relname;
...
|   errcontext(_("while scanning block %u of relation \"%s.%s\""),
|   cbarg->blkno, cbarg->relnamespace, cbarg->tblname);

Also, mat views can be vacuumed.

> 2.
> +   BlockNumber blkno;
> +   int stage;  /* 0: scan heap; 1: vacuum heap; 2: vacuum index */
> +} vacuum_error_callback_arg;
> 
> Why do we not support index cleanup phase?

The patch started out just handling scan_heap.  Then it added vacuum_heap, then
vacuum_index.  Now, I've added index cleanup.

> 4.
> +/*
> + * Setup error traceback support for ereport()
> + * ->relnamespace and ->relname are already set
> + */
> +errcbarg.blkno = InvalidBlockNumber; /* Not known yet */
> +errcbarg.stage = 1;
> 
> relnamespace and relname of errcbarg is not set as it is initialized
> in this function.

Thanks. That's an oversight from switching back to local vars instead of
LVRelStats while updating the patch out of town..

I don't know how to consistently test the vacuum_heap case, but rechecked it 
just now.

postgres=# SET client_min_messages=debug; SET statement_timeout=0; UPDATE t SET a=1+a; SET statement_timeout=150; VACUUM(VERBOSE, PARALLEL 1) t;
...
2020-02-01 23:11:06.482 CST [26609] ERROR:  canceling statement due to statement timeout
2020-02-01 23:11:06.482 CST [26609] CONTEXT:  while vacuuming block 33 of relation "public.t"

> 5.
> @@ -177,6 +177,7 @@ typedef struct LVShared
>  * the lazy vacuum.
>  */
> Oid relid;
> +   char        relname[NAMEDATALEN];   /* tablename, used for error callback */
> 
> How about getting relation
> name from index relation? That is, in lazy_vacuum_index we can get
> table oid from the index relation by IndexGetRelation() and therefore
> we can get the table name which is in palloc'd memory. That way we
> don't need to add relname to any existing struct such as LVRelStats
> and LVShared.

See attached

Also, I think we shouldn't show a block number if it's "Invalid", to avoid
saying "while vacuuming block 4294967295 of relation ..."

For now, I made it not show any errcontext at all in that case.
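
The callback now does roughly this (sketch; phase handling elided):

	static void
	vacuum_error_callback(void *arg)
	{
		vacuum_error_callback_arg *cbarg = arg;

		/* no context line at all while the block number isn't known */
		if (cbarg->blkno == InvalidBlockNumber)
			return;

		errcontext(_("while scanning block %u of relation \"%s.%s\""),
				   cbarg->blkno, cbarg->relnamespace, cbarg->relname);
	}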
>From 94f715818dcdf3225a3e7404e395e4a0f0818b0c Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Thu, 12 Dec 2019 20:54:37 -0600
Subject: [PATCH v16 1/3] vacuum errcontext to show block being processed

As requested here.
https://www.postgresql.org/message-id/20190807235154.erbmr4o4bo6vgnjv%40alap3.anarazel.de
---
 src/backend/access/heap/vacuumlazy.c | 94 
 1 file changed, 94 insertions(+)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 8ce5011..43859bd 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -292,6 +292,13 @@ typedef struct LVRelStats
 	bool		lock_waiter_detected;
 } LVRelStats;
 
+typedef struct
+{
+	char 		*relnamespace;
+	char		*relname;
+	BlockNumber blkno;
+	int			phase;	/* Reusing same enums as for progress reporting */
+} vacuum_error_callback_arg;
 
 /* A few variables that don't seem worth passing around as parameters */
 static int	elevel = -1;
@@ -361,6 +368,7 @@ static void end_parallel_vacuum(Relation *Irel, IndexBulkDeleteResult **stats,
 LVParallelState *lps, int nindexes);
 static LVSharedIndStats *get_indstats(LVShared *lvshared, int n);
 static bool skip_parallel_vacuum_index(Relation indrel, LVShared *lvshared);
+static void vacuum_error_callback(void *arg);
 
 
 /*
@@ -724,6 +732,8 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 		PROGRESS_VACUUM_MAX_DEAD_TUPLES
 	};
 	int64		initprog_val[3];
+	ErrorContextCallback errcallback;
+	vacuum_error_callback_arg errcbarg;
 
 	pg_rusage_init(&ru0);
 
@@ -870,6 +880,17 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 	else
 		skipping_blocks = false;
 
+	/* Setup error traceback support for ereport() */
+	errcbarg.relnamespace = get_namespace_name(RelationGetNamespace(onerel));
+	errcbarg.relname = relname;
+	errcbarg.blkno = InvalidBlockNumber; /* Not known yet */
+	errcbarg.phase = PROGRESS_VACUUM_PHASE_SCAN_HEAP;
+
+	errcallbac

ALTER tbl rewrite loses CLUSTER ON index

2020-02-02 Thread Justin Pryzby
Other options are preserved by ALTER (and CLUSTER ON is, and most obviously
should be, preserved by CLUSTER's own rewrite), so I think (SET) CLUSTER should
be preserved by ALTER, too.

As far as I can see, this should be the responsibility of something in the
vicinity of ATPostAlterTypeParse/RememberIndexForRebuilding.

Attached patch sketches a fix.

ts=# SET client_min_messages=debug; DROP TABLE t; CREATE TABLE t(i int); CREATE INDEX ON t(i)WITH(fillfactor=11, vacuum_cleanup_index_scale_factor=12); CLUSTER t USING t_i_key; ALTER TABLE t ALTER i TYPE bigint; \d t
SET
DEBUG:  drop auto-cascades to type t
DEBUG:  drop auto-cascades to type t[]
DEBUG:  drop auto-cascades to index t_i_idx
DROP TABLE
CREATE TABLE
DEBUG:  building index "t_i_idx" on table "t" serially
CREATE INDEX
ERROR:  index "t_i_key" for table "t" does not exist
DEBUG:  rewriting table "t"
DEBUG:  building index "t_i_idx" on table "t" serially
DEBUG:  drop auto-cascades to type pg_temp_3091172777
DEBUG:  drop auto-cascades to type pg_temp_3091172777[]
ALTER TABLE
 Table "public.t"
 Column |  Type  | Collation | Nullable | Default 
++---+--+-
 i  | bigint |   |  | 
Indexes:
"t_i_idx" btree (i) WITH (fillfactor='11', 
vacuum_cleanup_index_scale_factor='12')
>From f235a691722a464059358cd6b1d744f75d7bf92f Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Sun, 2 Feb 2020 09:49:57 -0600
Subject: [PATCH v1] preserve CLUSTER ON during ALTER TABLE

---
 src/backend/commands/tablecmds.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index f599393..c4e6cbd 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -11616,6 +11616,7 @@ RememberIndexForRebuilding(Oid indoid, AlteredTableInfo *tab)
 		}
 		else
 		{
+			Relation indrel;
 			/* OK, capture the index's existing definition string */
 			char	   *defstring = pg_get_indexdef_string(indoid);
 
@@ -11623,6 +11624,18 @@ RememberIndexForRebuilding(Oid indoid, AlteredTableInfo *tab)
 indoid);
 			tab->changedIndexDefs = lappend(tab->changedIndexDefs,
 			defstring);
+			/* Preserve CLUSTER ON if set */
+			indrel = index_open(indoid, AccessShareLock);
+			if (indrel->rd_index->indisclustered) {
+char buf[3*NAMEDATALEN + 24];
+sprintf(buf, "ALTER TABLE %s CLUSTER ON %s",
+		get_rel_name(tab->relid), get_rel_name(indoid)); // XXX: schema
+tab->changedIndexOids = lappend_oid(tab->changedIndexOids,
+	indoid);
+tab->changedIndexDefs = lappend(tab->changedIndexDefs,
+pstrdup(buf));
+			}
+			index_close(indrel, NoLock);
 		}
 	}
 }
@@ -11901,6 +11914,11 @@ ATPostAlterTypeParse(Oid oldId, Oid oldRelId, Oid refRelId, char *cmd,
 	 * the new table definition.
 	 */
 }
+else if (cmd->subtype == AT_ClusterOn)
+{
+	tab->subcmds[AT_PASS_OLD_INDEX] =
+		lappend(tab->subcmds[AT_PASS_OLD_INDEX], cmd);
+}
 else
 	elog(ERROR, "unexpected statement subtype: %d",
 		 (int) cmd->subtype);
-- 
2.7.4



Re: typos in comments and user docs

2020-02-06 Thread Justin Pryzby
On Fri, Feb 07, 2020 at 08:33:40AM +0530, Amit Kapila wrote:
> On Thu, Feb 6, 2020 at 7:26 PM Justin Pryzby  wrote:
> >
> > On Thu, Feb 06, 2020 at 04:43:18PM +0530, Amit Kapila wrote:
> > > On Thu, Feb 6, 2020 at 10:45 AM Michael Paquier  
> > > wrote:
> > > >
> > > > On Thu, Feb 06, 2020 at 08:47:14AM +0530, Amit Kapila wrote:
> > > > > Your changes look fine to me on the first read.  I will push this to
> > > > > HEAD unless there are any objections.   If we want them in
> > > > > back-branches, we might want to probably segregate the changes based
> > > > > on the branch until those apply.
> > > >
> > > > +1.  It would be nice to back-patch the user-visible changes in the
> > > > docs.
> > > >
> > >
> > > Fair enough, Justin, is it possible for you to segregate the changes
> > > that can be backpatched?
> >
> > Looks like the whole patch can be applied to master and v12 [0].
> 
> If we decide to backpatch, then why not try to backpatch as far as
> possible (till 9.5)?  If so, then it would be better to separate
> changes which can be backpatched till 9.5, if that is tedious, then
> maybe we can just back-patch (in 12) bloom.sgml change as a separate
> commit and rest commit it in HEAD only.  What do you think?

I don't think I was clear.  My original doc review patches were limited to
this:

On Sat, Mar 30, 2019 at 05:43:33PM -0500, Justin Pryzby wrote:
> I reviewed docs like this:
> git log -p remotes/origin/REL_11_STABLE..HEAD -- doc


STABLE..REL_12_STABLE.  So after a few minutes of cherry-picking earlier today, I
concluded that only bloom.sgml is applicable further back than v12.  Probably,
I either noticed that minor issue at the same time as nearby doc changes in
v12(?), or maybe noticed that issue later, independently of doc review, but
then tacked it on to the previous commit, for lack of any better place.

Justin




Re: typos in comments and user docs

2020-02-06 Thread Justin Pryzby
On Fri, Feb 07, 2020 at 09:26:04AM +0530, Amit Kapila wrote:
> On Fri, Feb 7, 2020 at 8:41 AM Justin Pryzby  wrote:
> >
> > On Fri, Feb 07, 2020 at 08:33:40AM +0530, Amit Kapila wrote:
> > > On Thu, Feb 6, 2020 at 7:26 PM Justin Pryzby  wrote:
> > > >
> > > > On Thu, Feb 06, 2020 at 04:43:18PM +0530, Amit Kapila wrote:
> > > > > On Thu, Feb 6, 2020 at 10:45 AM Michael Paquier  
> > > > > wrote:
> > > > > >
> > > > > > On Thu, Feb 06, 2020 at 08:47:14AM +0530, Amit Kapila wrote:
> > > > > > > Your changes look fine to me on the first read.  I will push this 
> > > > > > > to
> > > > > > > HEAD unless there are any objections.   If we want them in
> > > > > > > back-branches, we might want to probably segregate the changes 
> > > > > > > based
> > > > > > > on the branch until those apply.
> > > > > >
> > > > > > +1.  It would be nice to back-patch the user-visible changes in the
> > > > > > docs.
> > > > > >
> > > > >
> > > > > Fair enough, Justin, is it possible for you to segregate the changes
> > > > > that can be backpatched?
> > > >
> > > > Looks like the whole patch can be applied to master and v12 [0].
> > >
> 
> I tried your patch master and it failed to apply.
> (Stripping trailing CRs from patch; use --binary to disable.)
> patching file doc/src/sgml/bloom.sgml
> (Stripping trailing CRs from patch; use --binary to disable.)
> patching file doc/src/sgml/config.sgml
> Hunk #1 FAILED at 4318.
> 1 out of 1 hunk FAILED -- saving rejects to file doc/src/sgml/config.sgml.rej

I think you applied the first patch, which I corrected here.
https://www.postgresql.org/message-id/20200206135640.GG403%40telsasoft.com

Just rechecked it works for master and v12.

$ git checkout -b test2  origin/master
Branch test2 set up to track remote branch master from origin.
Switched to a new branch 'test2'
$ patch -p1 <0001-spelling-and-typos.patch
patching file doc/src/sgml/bloom.sgml
patching file doc/src/sgml/ref/alter_table.sgml
patching file doc/src/sgml/sources.sgml
patching file src/backend/access/transam/README.parallel
patching file src/backend/storage/buffer/bufmgr.c
patching file src/backend/storage/sync/sync.c
patching file src/include/access/tableam.h

$ patch -p1 <0001-spelling-and-typos.patch
patching file doc/src/sgml/bloom.sgml
patching file doc/src/sgml/ref/alter_table.sgml
Hunk #1 succeeded at 220 (offset -2 lines).
patching file doc/src/sgml/sources.sgml
patching file src/backend/access/transam/README.parallel
patching file src/backend/storage/buffer/bufmgr.c
Hunk #1 succeeded at 4268 (offset -23 lines).
patching file src/backend/storage/sync/sync.c
patching file src/include/access/tableam.h
Hunk #1 succeeded at 1167 (offset -18 lines).

The bloom patch there works for v11.
Attached now another version for v10-.
>From 43c6b9ec80d26a858387ac657043988b8b52e812 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Thu, 6 Feb 2020 22:14:20 -0600
Subject: [PATCH] Bloom patch for v10 and v9.6

---
 doc/src/sgml/bloom.sgml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/src/sgml/bloom.sgml b/doc/src/sgml/bloom.sgml
index 87d0ad7..84da235 100644
--- a/doc/src/sgml/bloom.sgml
+++ b/doc/src/sgml/bloom.sgml
@@ -65,7 +65,7 @@
  
   Number of bits generated for each index column. Each parameter's name
   refers to the number of the index column that it controls.  The default
-  is 2 bits and maximum is 4095.  Parameters for
+  is 2 bits and the maximum is 4095.  Parameters for
   index columns not actually used are ignored.
  
 
-- 
2.7.4



Re: ALTER tbl rewrite loses CLUSTER ON index (consider moving indisclustered to pg_class)

2020-02-07 Thread Justin Pryzby
On Thu, Feb 06, 2020 at 02:24:47PM -0300, Alvaro Herrera wrote:
> On 2020-Feb-06, Justin Pryzby wrote:
> 
> > I wondered if it wouldn't be better if CLUSTER ON was stored in pg_class as the
> > Oid of a clustered index, rather than a boolean in pg_index.
> 
> Maybe.  Do you want to try a patch?

I think the attached is 80% complete (I didn't touch pg_dump).

One objection to this change would be that all relations (including indices)
end up with relclustered fields, and pg_index already has a number of bools, so
it's not like this one bool is wasting a byte.

I think relisclustered was a clever way of avoiding that overhead (c0ad5953).
So I would be -0.5 on moving it to pg_class..

But I think 0001 and 0002 are worthy.  Maybe the test in 0002 should live
somewhere else.
>From 7eea0a17e495fe13379ffd589b551f2f145f5672 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Thu, 6 Feb 2020 21:48:13 -0600
Subject: [PATCH v1 1/3] Update comment obsolete since b9b8831a

---
 src/backend/commands/cluster.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index e9d7a7f..3adcbeb 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -1539,9 +1539,9 @@ get_tables_to_cluster(MemoryContext cluster_context)
 
 	/*
 	 * Get all indexes that have indisclustered set and are owned by
-	 * appropriate user. System relations or nailed-in relations cannot ever
-	 * have indisclustered set, because CLUSTER will refuse to set it when
-	 * called with one of them as argument.
+	 * appropriate user. Shared relations cannot ever have indisclustered
+	 * set, because CLUSTER will refuse to set it when called with one as
+	 * an argument.
 	 */
 	indRelation = table_open(IndexRelationId, AccessShareLock);
 	ScanKeyInit(&entry,
-- 
2.7.4

>From 4777be522a7aa8b8c77b13f765cbd02043438f2a Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Fri, 7 Feb 2020 08:12:50 -0600
Subject: [PATCH v1 2/3] Give developer a helpful kick in the pants if they
 change natts in one place but not another

---
 src/backend/bootstrap/bootstrap.c | 23 +++
 1 file changed, 23 insertions(+)

diff --git a/src/backend/bootstrap/bootstrap.c b/src/backend/bootstrap/bootstrap.c
index bfc629c..d5e1888 100644
--- a/src/backend/bootstrap/bootstrap.c
+++ b/src/backend/bootstrap/bootstrap.c
@@ -25,7 +25,9 @@
 #include "access/xlog_internal.h"
 #include "bootstrap/bootstrap.h"
 #include "catalog/index.h"
+#include "catalog/pg_class.h"
 #include "catalog/pg_collation.h"
+#include "catalog/pg_proc.h"
 #include "catalog/pg_type.h"
 #include "common/link-canary.h"
 #include "libpq/pqsignal.h"
@@ -49,6 +51,7 @@
 #include "utils/ps_status.h"
 #include "utils/rel.h"
 #include "utils/relmapper.h"
+#include "utils/syscache.h"
 
 uint32		bootstrap_data_checksum_version = 0;	/* No checksum */
 
@@ -602,6 +605,26 @@ boot_openrel(char *relname)
 	TableScanDesc scan;
 	HeapTuple	tup;
 
+	/* Check that pg_class data is consistent now, rather than failing obscurely later */
+	struct { Oid oid; int natts; }
+		checknatts[] = {
+		{RelationRelationId, Natts_pg_class,},
+		{TypeRelationId, Natts_pg_type,},
+		{AttributeRelationId, Natts_pg_attribute,},
+		{ProcedureRelationId, Natts_pg_proc,},
+	};
+
+	for (int i=0; irelnatts);
+		ReleaseSysCache(tuple);
+	}
+
 	if (strlen(relname) >= NAMEDATALEN)
 		relname[NAMEDATALEN - 1] = '\0';
 
-- 
2.7.4

>From ed886f8202486dea8069b719d35a5d0db7f3277c Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Thu, 6 Feb 2020 12:56:34 -0600
Subject: [PATCH v1 3/3] Make cluster a property of the table in pg_class..

..rather than of indexes in pg_index.

The only issue with this is that it makes pg_class larger, and the new column
applies not only to tables, but to indices.
---
 doc/src/sgml/catalogs.sgml |  14 +--
 src/backend/catalog/heap.c |   1 +
 src/backend/catalog/index.c|   6 --
 src/backend/commands/cluster.c | 172 +
 src/backend/commands/tablecmds.c   |   5 +-
 src/backend/utils/cache/relcache.c |   1 -
 src/bin/psql/describe.c|   4 +-
 src/include/catalog/pg_class.dat   |   2 +-
 src/include/catalog/pg_class.h |   3 +
 src/include/catalog/pg_index.h |   1 -
 10 files changed, 93 insertions(+), 116 deletions(-)

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index a10b665..8efeaff 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -1752,6 +1752,13 @@ SCRAM-SHA-256$iteration count:
  
 
  
+  relclustered
+  oid
+  
+  The OID of the index last clustered, or zero
+ 
+
+ 
   relpages
   int4
   
@@ -3808,13 +3815,6 @@ SCRAM-SHA-256$iteration count:
  
 
  
-  indisclu

Re: error context for vacuum to include block number

2020-02-07 Thread Justin Pryzby
On Tue, Feb 04, 2020 at 01:58:20PM +0900, Masahiko Sawada wrote:
> Here is the comment for v16 patch:
> 
> 2.
> I think we can set the error context for heap scan again after
> freespace map vacuum finishing, maybe after reporting the new phase.
> Otherwise the user will get confused if an error occurs during
> freespace map vacuum. And I think the comment is unclear, how about
> "Set the error context fro heap scan again"?

Good point

> 3.
> +   if (cbarg->blkno!=InvalidBlockNumber)
> +   errcontext(_("while scanning block %u of relation \"%s.%s\""),
> +   cbarg->blkno, cbarg->relnamespace, cbarg->relname);
> 
> We can use BlockNumberIsValid macro instead.

Thanks.  See attached, now squished together patches.

I added functions to initialize the callbacks, so the error handling is out of the
way and minimally distracts from the rest of vacuum.
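
Roughly this shape (names here are mine, not necessarily what the attached
patch uses):

	static void
	init_vacuum_error_callback(ErrorContextCallback *errcallback,
							   vacuum_error_callback_arg *errcbarg,
							   Relation rel, int phase)
	{
		errcbarg->relnamespace = get_namespace_name(RelationGetNamespace(rel));
		errcbarg->relname = RelationGetRelationName(rel);
		errcbarg->blkno = InvalidBlockNumber;
		errcbarg->phase = phase;

		errcallback->callback = vacuum_error_callback;
		errcallback->arg = errcbarg;
		errcallback->previous = error_context_stack;
		error_context_stack = errcallback;
	}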
>From 95265412c56f3b308eed16531d7c83243e278f4f Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Thu, 12 Dec 2019 20:54:37 -0600
Subject: [PATCH v17 1/2] vacuum errcontext to show block being processed

As requested here.
https://www.postgresql.org/message-id/20190807235154.erbmr4o4bo6vgnjv%40alap3.anarazel.de
---
 src/backend/access/heap/vacuumlazy.c | 117 +++
 1 file changed, 117 insertions(+)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index a23cdef..9358ab4 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -292,6 +292,14 @@ typedef struct LVRelStats
 	bool		lock_waiter_detected;
 } LVRelStats;
 
+typedef struct
+{
+	char 		*relnamespace;
+	char		*relname;
+	char 		*indname; /* undefined while not processing index */
+	BlockNumber blkno;	/* undefined while not processing heap */
+	int			phase;	/* Reusing same enums as for progress reporting */
+} vacuum_error_callback_arg;
 
 /* A few variables that don't seem worth passing around as parameters */
 static int	elevel = -1;
@@ -361,6 +369,7 @@ static void end_parallel_vacuum(Relation *Irel, IndexBulkDeleteResult **stats,
 LVParallelState *lps, int nindexes);
 static LVSharedIndStats *get_indstats(LVShared *lvshared, int n);
 static bool skip_parallel_vacuum_index(Relation indrel, LVShared *lvshared);
+static void vacuum_error_callback(void *arg);
 
 
 /*
@@ -724,6 +733,8 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 		PROGRESS_VACUUM_MAX_DEAD_TUPLES
 	};
 	int64		initprog_val[3];
+	ErrorContextCallback errcallback;
+	vacuum_error_callback_arg errcbarg;
 
 	pg_rusage_init(&ru0);
 
@@ -870,6 +881,17 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 	else
 		skipping_blocks = false;
 
+	/* Setup error traceback support for ereport() */
+	errcbarg.relnamespace = get_namespace_name(RelationGetNamespace(onerel));
+	errcbarg.relname = relname;
+	errcbarg.blkno = InvalidBlockNumber; /* Not known yet */
+	errcbarg.phase = PROGRESS_VACUUM_PHASE_SCAN_HEAP;
+
+	errcallback.callback = vacuum_error_callback;
+	errcallback.arg = (void *) &errcbarg;
+	errcallback.previous = error_context_stack;
+	error_context_stack = &errcallback;
+
 	for (blkno = 0; blkno < nblocks; blkno++)
 	{
 		Buffer		buf;
@@ -891,6 +913,8 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 #define FORCE_CHECK_PAGE() \
 		(blkno == nblocks - 1 && should_attempt_truncation(params, vacrelstats))
 
+		errcbarg.blkno = blkno;
+
 		pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, blkno);
 
 		if (blkno == next_unskippable_block)
@@ -987,6 +1011,9 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 vmbuffer = InvalidBuffer;
 			}
 
+			/* Pop the error context stack */
+			error_context_stack = errcallback.previous;
+
 			/* Work on all the indexes, then the heap */
 			lazy_vacuum_all_indexes(onerel, Irel, indstats,
 	vacrelstats, lps, nindexes);
@@ -1011,6 +1038,9 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 			/* Report that we are once again scanning the heap */
 			pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
 		 PROGRESS_VACUUM_PHASE_SCAN_HEAP);
+
+			/* Set the error context while continuing heap scan */
+			error_context_stack = &errcallback;
 		}
 
 		/*
@@ -1597,6 +1627,9 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 			RecordPageWithFreeSpace(onerel, blkno, freespace);
 	}
 
+	/* Pop the error context stack */
+	error_context_stack = errcallback.previous;
+
 	/* report that everything is scanned and vacuumed */
 	pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, blkno);
 
@@ -1772,11 +1805,26 @@ lazy_vacuum_heap(Relation onerel, LVRelStats *vacrelstats)
 	int			npages;
 	PGRUsage	ru0;
 	Buffer		vmbuffer = InvalidBuffer;
+	ErrorContextCallback e

ALTER TABLE rewrite to use clustered order

2020-02-08 Thread Justin Pryzby
Forking this thread
https://www.postgresql.org/message-id/20181227132417.xe3oagawina7775b%40alvherre.pgsql

On Wed, Dec 26, 2018 at 01:09:39PM -0500, Robert Haas wrote:
> ALTER TABLE already has a lot of logic that is oriented towards being
> able to do multiple things at the same time.  If we added CLUSTER,
> VACUUM FULL, and REINDEX to that set, then you could, say, change a
> data type, cluster, and change tablespaces all in a single SQL
> command.

On Thu, Dec 27, 2018 at 10:24:17AM -0300, Alvaro Herrera wrote:
> I think it would be valuable to have those ALTER TABLE variants that rewrite
> the table do so using the cluster order, if there is one, instead of the heap
> order, which is what it does today.

That's a neat idea.

I haven't yet fit all of ALTER's processing logic in my head ... but there's an
issue that ALTER (unlike CLUSTER) needs to deal with column type promotion, so
the indices may need to be dropped and recreated.  The table rewrite happens
AFTER dropping indices (and all other processing), but the clustered index
can't be scanned if it's just been dropped.  I handled that by using a
tuplesort, same as heapam_relation_copy_for_cluster.

Experimental patch attached.  With clustered ALTER:

template1=# DROP TABLE t; CREATE TABLE t AS SELECT generate_series(1,999)i; 
CREATE INDEX ON t(i DESC); ALTER TABLE t CLUSTER ON t_i_idx; ALTER TABLE t 
ALTER i TYPE bigint; SELECT * FROM t LIMIT 9;
DROP TABLE
SELECT 999
CREATE INDEX
ALTER TABLE
ALTER TABLE
  i  
-
 999
 998
 997
 996
 995
 994
 993
 992
 991
(9 rows)

0001 patch is stolen from the nearby thread:
https://www.postgresql.org/message-id/flat/20200207143935.GP403%40telsasoft.com
It doesn't make much sense for ALTER to use a clustered index when rewriting a
table if it doesn't also go to the effort of preserving the cluster property
when rebuilding its indices.

0002 patch is included and not squished with 0003 to show the original
implementation using an index scan (by not dropping indices on the old table,
and breaking various things), and the evolution to tuplesort.

Note, this doesn't use clustered order when rewriting only due to a tablespace
change.  ALTER currently does an AM-specific block copy without looking at
tuples.  But I think it'd be possible to use tuplesort and copy if desired.
>From f93bbd6c30e883068f46ff86def28d0e66aea4f5 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Fri, 7 Feb 2020 22:06:57 -0600
Subject: [PATCH v1 1/3] Preserve CLUSTER ON index during ALTER rewrite

Amit Langote and Justin Pryzby

https://www.postgresql.org/message-id/flat/20200202161718.GI13621%40telsasoft.com
---
 src/backend/commands/tablecmds.c | 39 +++
 1 file changed, 39 insertions(+)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index b7c8d66..642a85c 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -490,6 +490,7 @@ static void ATPostAlterTypeParse(Oid oldId, Oid oldRelId, Oid refRelId,
 static void RebuildConstraintComment(AlteredTableInfo *tab, int pass,
 	 Oid objid, Relation rel, List *domname,
 	 const char *conname);
+static void PreserveClusterOn(AlteredTableInfo *tab, int pass, Oid indoid);
 static void TryReuseIndex(Oid oldId, IndexStmt *stmt);
 static void TryReuseForeignKey(Oid oldId, Constraint *con);
 static ObjectAddress ATExecAlterColumnGenericOptions(Relation rel, const char *colName,
@@ -11838,6 +11839,9 @@ ATPostAlterTypeParse(Oid oldId, Oid oldRelId, Oid refRelId, char *cmd,
 			newcmd->def = (Node *) stmt;
 			tab->subcmds[AT_PASS_OLD_INDEX] =
 lappend(tab->subcmds[AT_PASS_OLD_INDEX], newcmd);
+
+			/* Preserve index's indisclustered property, if set. */
+			PreserveClusterOn(tab, AT_PASS_OLD_INDEX, oldId);
 		}
 		else if (IsA(stm, AlterTableStmt))
 		{
@@ -11874,6 +11878,9 @@ ATPostAlterTypeParse(Oid oldId, Oid oldRelId, Oid refRelId, char *cmd,
 			 rel,
 			 NIL,
 			 indstmt->idxname);
+
+	/* Preserve index's indisclustered property, if set. */
+	PreserveClusterOn(tab, AT_PASS_OLD_INDEX, indoid);
 }
 else if (cmd->subtype == AT_AddConstraint)
 {
@@ -11997,6 +12004,38 @@ RebuildConstraintComment(AlteredTableInfo *tab, int pass, Oid objid,
 }
 
 /*
+ * For a table's index that is to be recreated due to PostAlterType
+ * processing, preserve its indisclustered property by issuing ALTER TABLE
+ * CLUSTER ON command on the table that will run after the command to recreate
+ * the index.
+ */
+static void
+PreserveClusterOn(AlteredTableInfo *tab, int pass, Oid indoid)
+{
+	HeapTuple	indexTuple;
+	Form_pg_index indexForm;
+
+	Assert(OidIsValid(indoid));
+	Assert(pass == AT_PASS_OLD_INDEX);
+
+	indexTuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(indoid));
+	if (!HeapTupleIsValid(indexTuple))
+		elog(ERROR, "cache lookup failed for index %u", indoid);
+	indexForm = (Form_pg_index) GETS

Re: error context for vacuum to include block number

2020-01-24 Thread Justin Pryzby
Thanks for reviewing

On Wed, Jan 22, 2020 at 05:37:06PM +0900, Masahiko Sawada wrote:
> I'm not sure it's worth to have patches separately but I could apply

The later patches expanded on the initial scope, and to my understanding the
1st callback is desirable but the others are maybe still in question.

> 1. * The comment should be updated as we use both relname and
> relnamespace for ereporting.
> 
> * We can leave both blkno and stage that are used only for error
> context reporting put both relname and relnamespace to top of
> LVRelStats.

Updated in the 0005 - still not sure if that patch will be desirable, though.
Also, I realized that we cannot use vacrelstats->blkno instead of the local
blkno variable, since vacrelstats->blkno is used simultaneously by scan heap
and vacuum heap.

> * The 'stage' is missing to support index cleanup.

But the callback isn't used during index cleanup ?

> * It seems to me strange that only initialization of latestRemovedXid
> is done after error callback initialization.

Yes, that was silly - I thought it was just an artifact of diff's inability to
express rearranged code any better.

> * Maybe we can initialize relname and relnamespace in heap_vacuum_rel
> rather than in lazy_scan_heap. BTW variables of vacrelstats are
> initialized different places: some of them in heap_vacuum_rel and
> others in lazy_scan_heap. I think we can gather those that can be
> initialized at that time to heap_vacuum_rel.

I think that's already true ?  But that's a topic for a separate patch, if not.

> 3. Maybe we can do like:

done

> 4. These comments can be merged like:

done

> 5. Why do we need to initialize blkno in spite of not using?

removed

> 6.
> +   cbarg->blkno, cbarg->relnamespace, cbarg->relname);
> * 'vacrelstats' would be a better name than 'cbarg'.

Really?  I'd prefer to avoid repeating a long variable name three times:

vacrelstats->blkno, vacrelstats->relnamespace, 
vacrelstats->relname);

> * In index vacuum, how about "while vacuuming index \"%s.%s\""?

Yes.  I'm still unclear if this is useful without a block number, or why it
wouldn't be better to write a DEBUG1 log with the index name before vacuuming
each one.

Justin
>From 6332127178e29967dfeb12577eb9a61e813a33a8 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Thu, 12 Dec 2019 20:54:37 -0600
Subject: [PATCH v13 1/5] vacuum errcontext to show block being processed

As requested here.
https://www.postgresql.org/message-id/20190807235154.erbmr4o4bo6vgnjv%40alap3.anarazel.de
---
 src/backend/access/heap/vacuumlazy.c | 41 ++--
 1 file changed, 39 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index b331f4c..822fa3d 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -287,8 +287,12 @@ typedef struct LVRelStats
 	int			num_index_scans;
 	TransactionId latestRemovedXid;
 	bool		lock_waiter_detected;
-} LVRelStats;
 
+	/* Used by the error callback */
+	char		*relname;
+	char 		*relnamespace;
+	BlockNumber blkno;
+} LVRelStats;
 
 /* A few variables that don't seem worth passing around as parameters */
 static int	elevel = -1;
@@ -358,6 +362,7 @@ static void end_parallel_vacuum(Relation *Irel, IndexBulkDeleteResult **stats,
 LVParallelState *lps, int nindexes);
 static LVSharedIndStats *get_indstats(LVShared *lvshared, int n);
 static bool skip_parallel_vacuum_index(Relation indrel, LVShared *lvshared);
+static void vacuum_error_callback(void *arg);
 
 
 /*
@@ -721,6 +726,7 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 		PROGRESS_VACUUM_MAX_DEAD_TUPLES
 	};
 	int64		initprog_val[3];
+	ErrorContextCallback errcallback;
 
 	pg_rusage_init(&ru0);
 
@@ -867,6 +873,16 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 	else
 		skipping_blocks = false;
 
+	/* Setup error traceback support for ereport() */
+	vacrelstats->relnamespace = get_namespace_name(RelationGetNamespace(onerel));
+	vacrelstats->relname = relname;
+	vacrelstats->blkno = InvalidBlockNumber; /* Not known yet */
+
+	errcallback.callback = vacuum_error_callback;
+	errcallback.arg = (void *) vacrelstats;
+	errcallback.previous = error_context_stack;
+	error_context_stack = &errcallback;
+
 	for (blkno = 0; blkno < nblocks; blkno++)
 	{
 		Buffer		buf;
@@ -888,6 +904,8 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 #define FORCE_CHECK_PAGE() \
 		(blkno == nblocks - 1 && should_attempt_truncation(params, vacrelstats))
 
+		vacrelstats->blkno = blkno;
+
 		pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, blkno);
 
 		if (blkno == next_unskippable_block)
@@ -984,13 +1002,18 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 vmbu

tableam options for pg_dump/ALTER/LIKE

2020-01-28 Thread Justin Pryzby
I made these casual comments.  If there's any agreement on their merit, it'd be
nice to implement at least the first for v13.

In <20190818193533.gl11...@telsasoft.com>, I wrote: 
>  . What do you think about pg_restore --no-tableam; similar to 
>--no-tablespaces, it would allow restoring a table to a different AM:
>PGOPTIONS='-c default_table_access_method=zedstore' pg_restore 
> --no-tableam ./pg_dump.dat -d postgres
>Otherwise, the dump says "SET default_table_access_method=heap", which
>overrides any value from PGOPTIONS and precludes restoring to new AM.

That appears to be a trivial variation on no-tablespace:

/* do nothing in --no-tablespaces mode */
if (ropt->noTablespace)
return;
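
The analogous check would presumably look like this (the option and field names
here are hypothetical, not existing pg_restore code):

	/* do nothing in --no-tableam mode */
	if (ropt->noTableAm)
		return;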

>  . it'd be nice if there was an ALTER TABLE SET ACCESS METHOD, to allow
>migrating data.  Otherwise I think the alternative is:
>   begin; lock t;
>   CREATE TABLE new_t LIKE (t INCLUDING ALL EXCLUDING INDEXES) USING 
> (zedstore);
>   INSERT INTO new_t SELECT * FROM t;
>   for index; do CREATE INDEX...; done
>   DROP t; RENAME new_t (and all its indices). attach/inherit, etc.
>   commit;

Ideally that would allow, all at once, various combinations of altering the
tablespace, changing the AM, clustering, and reindexing, like what's discussed
here:
https://www.postgresql.org/message-id/flat/8a8f5f73-00d3-55f8-7583-1375ca8f6...@postgrespro.ru

>  . Speaking of which, I think LIKE needs a new option for ACCESS METHOD, which
>is otherwise lost.
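
For concreteness, the manual migration quoted above would look something like
this (an untested sketch; it assumes a hypothetical zedstore AM is installed
and leaves out re-pointing foreign keys, inheritance, and the like):

	BEGIN;
	LOCK TABLE t IN ACCESS EXCLUSIVE MODE;
	CREATE TABLE new_t (LIKE t INCLUDING ALL EXCLUDING INDEXES) USING zedstore;
	INSERT INTO new_t SELECT * FROM t;
	CREATE INDEX ON new_t (i);  -- repeat for each index on t; column is a placeholder
	DROP TABLE t;
	ALTER TABLE new_t RENAME TO t;
	COMMIT;

And as the last comment says, LIKE does not carry over the access method, so it
has to be given again with USING.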




Re: ALTER tbl rewrite loses CLUSTER ON index

2020-02-05 Thread Justin Pryzby
On Wed, Feb 05, 2020 at 03:53:45PM +0900, Amit Langote wrote:
> Hi Justin,
> 
> On Mon, Feb 3, 2020 at 1:17 AM Justin Pryzby  wrote:
> > Other options are preserved by ALTER (and CLUSTER ON is and most obviously
> > should be preserved by CLUSTER's rewrite), so I think (SET) CLUSTER should 
> > be
> > preserved by ALTER, too.
> 
> Yes.
> 
> create table foo (a int primary key);
> cluster foo;
> ERROR:  there is no previously clustered index for table "foo"
> cluster foo using foo_pkey;
> alter table foo alter a type bigint;
> cluster foo;
> ERROR:  there is no previously clustered index for table "foo"
> 
> With your patch, this last error doesn't occur.
> 
> Like you, I too suspect that losing indisclustered like this is
> unintentional, so should be fixed.

Thanks for checking.

It doesn't need to be said, but your patch is obviously superior.

I ran into this while looking into a suggestion from Alvaro that ALTER should
rewrite in order of a clustered index (if any) rather than in pre-existing heap
order (more on that another day).  So while this looks like a bug, and I can't
think how a backpatch would break something, my suggestion is that backpatching
a fix is of low value, so it's only worth +0.

Thanks
Justin




Re: error context for vacuum to include block number

2020-02-14 Thread Justin Pryzby
On Fri, Feb 14, 2020 at 12:30:25PM +0900, Masahiko Sawada wrote:
> * I think the function name is too generic. init_vacuum_error_callback
> or init_vacuum_errcallback is better.

> * The comment of this function is not accurate since this function is
> not only for heap vacuum but also index vacuum. How about just
> "Initialize vacuum error callback"?

> * I think it's easier to read the code if we set the relname and
> indname in the same order.

> * The comment I wrote in the previous mail seems better, because in
> this function the reader might get confused that 'rel' is a relation
> or an index depending on the phase but that comment helps it.

Fixed these

> * rel->rd_index->indexrelid should be rel->rd_index->indrelid.

Ack.  I think that's been wrong since I first wrote it two weeks ago :(
The error is probably more obvious due to the switch statement you proposed.

Thanks for continued reviews.

-- 
Justin
>From 94768a134118d30853b75a96b90166363f0fef5b Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Thu, 12 Dec 2019 20:54:37 -0600
Subject: [PATCH v19] vacuum errcontext to show block being processed

Discussion:
https://www.postgresql.org/message-id/20191120210600.gc30...@telsasoft.com
---
 src/backend/access/heap/vacuumlazy.c | 120 +++
 1 file changed, 120 insertions(+)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index a23cdef..ebfb2e7 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -292,6 +292,14 @@ typedef struct LVRelStats
 	bool		lock_waiter_detected;
 } LVRelStats;
 
+typedef struct
+{
+	char 		*relnamespace;
+	char		*relname;
+	char 		*indname;
+	BlockNumber blkno;	/* used only for heap operations */
+	int			phase;	/* Reusing same enums as for progress reporting */
+} vacuum_error_callback_arg;
 
 /* A few variables that don't seem worth passing around as parameters */
 static int	elevel = -1;
@@ -361,6 +369,9 @@ static void end_parallel_vacuum(Relation *Irel, IndexBulkDeleteResult **stats,
 LVParallelState *lps, int nindexes);
 static LVSharedIndStats *get_indstats(LVShared *lvshared, int n);
 static bool skip_parallel_vacuum_index(Relation indrel, LVShared *lvshared);
+static void vacuum_error_callback(void *arg);
+static void init_vacuum_error_callback(ErrorContextCallback *errcallback,
+		vacuum_error_callback_arg *errcbarg, Relation onerel, int phase);
 
 
 /*
@@ -724,6 +735,8 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 		PROGRESS_VACUUM_MAX_DEAD_TUPLES
 	};
 	int64		initprog_val[3];
+	ErrorContextCallback errcallback;
+	vacuum_error_callback_arg errcbarg;
 
 	pg_rusage_init(&ru0);
 
@@ -870,6 +883,10 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 	else
 		skipping_blocks = false;
 
+	/* Setup error traceback support for ereport() */
+	init_vacuum_error_callback(&errcallback, &errcbarg, onerel, PROGRESS_VACUUM_PHASE_SCAN_HEAP);
+	error_context_stack = &errcallback;
+
 	for (blkno = 0; blkno < nblocks; blkno++)
 	{
 		Buffer		buf;
@@ -891,6 +908,8 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 #define FORCE_CHECK_PAGE() \
 		(blkno == nblocks - 1 && should_attempt_truncation(params, vacrelstats))
 
+		errcbarg.blkno = blkno;
+
 		pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, blkno);
 
 		if (blkno == next_unskippable_block)
@@ -987,6 +1006,9 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 vmbuffer = InvalidBuffer;
 			}
 
+			/* Pop the error context stack while calling vacuum */
+			error_context_stack = errcallback.previous;
+
 			/* Work on all the indexes, then the heap */
 			lazy_vacuum_all_indexes(onerel, Irel, indstats,
 	vacrelstats, lps, nindexes);
@@ -1011,6 +1033,9 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 			/* Report that we are once again scanning the heap */
 			pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
 		 PROGRESS_VACUUM_PHASE_SCAN_HEAP);
+
+			/* Set the error context while continuing heap scan */
+			error_context_stack = &errcallback;
 		}
 
 		/*
@@ -1597,6 +1622,9 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 			RecordPageWithFreeSpace(onerel, blkno, freespace);
 	}
 
+	/* Pop the error context stack */
+	error_context_stack = errcallback.previous;
+
 	/* report that everything is scanned and vacuumed */
 	pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, blkno);
 
@@ -1772,11 +1800,19 @@ lazy_vacuum_heap(Relation onerel, LVRelStats *vacrelstats)
 	int			npages;
 	PGRUsage	ru0;
 	Buffer		vmbuffer = InvalidBuffer;
+	ErrorContextCallback errcallback;
+	vacuum_error_callback_arg errcbarg;
 
 	/* Report that we are now vacuuming the heap */
 	pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
  PROGRESS_VACUUM_PHAS

Re: explain HashAggregate to report bucket and memory stats

2020-02-19 Thread Justin Pryzby
On Sun, Feb 16, 2020 at 11:53:07AM -0600, Justin Pryzby wrote:
> Updated:
> 
>  . remove from explain analyze those tests which would display sort
>Memory/Disk.  Oops.

 . Rebased on top of 5b618e1f48aecc66e3a9f60289491da520faae19
 . Updated to avoid sort's Disk output, for real this time.
 . And fixed a syntax error in an intermediate commit.

-- 
Justin
>From 19ad84696710de7b5ac19e1124856701697d28c0 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Sat, 15 Feb 2020 12:03:11 -0600
Subject: [PATCH v4 1/7] Run some existing tests with explain (ANALYZE)..

..in a separate, earlier patch, to better show what bits are added by later
patches for hashtable instrumentation.
---
 src/test/regress/expected/groupingsets.out| 57 +++---
 src/test/regress/expected/select_parallel.out | 20 
 src/test/regress/expected/subselect.out   | 69 +++
 src/test/regress/expected/union.out   | 43 +
 src/test/regress/sql/groupingsets.sql | 12 ++---
 src/test/regress/sql/select_parallel.sql  |  4 +-
 src/test/regress/sql/subselect.sql| 25 ++
 src/test/regress/sql/union.sql|  4 +-
 8 files changed, 166 insertions(+), 68 deletions(-)

diff --git a/src/test/regress/expected/groupingsets.out b/src/test/regress/expected/groupingsets.out
index c1f802c..95d619c 100644
--- a/src/test/regress/expected/groupingsets.out
+++ b/src/test/regress/expected/groupingsets.out
@@ -458,16 +458,17 @@ ERROR:  aggregate functions are not allowed in FROM clause of their own query le
 LINE 3:lateral (select a, b, sum(v.x) from gstest_data(v.x) ...
  ^
 -- min max optimization should still work with GROUP BY ()
-explain (costs off)
+explain (costs off, timing off, summary off, analyze)
   select min(unique1) from tenk1 GROUP BY ();
- QUERY PLAN 
-
- Result
+ QUERY PLAN 
+
+ Result (actual rows=1 loops=1)
InitPlan 1 (returns $0)
- ->  Limit
-   ->  Index Only Scan using tenk1_unique1 on tenk1
+ ->  Limit (actual rows=1 loops=1)
+   ->  Index Only Scan using tenk1_unique1 on tenk1 (actual rows=1 loops=1)
  Index Cond: (unique1 IS NOT NULL)
-(5 rows)
+ Heap Fetches: 0
+(6 rows)
 
 -- Views with GROUPING SET queries
 CREATE VIEW gstest_view AS select a, b, grouping(a,b), sum(c), count(*), max(c)
@@ -1126,14 +1127,14 @@ select a, b, sum(v), count(*) from gstest_empty group by grouping sets ((a,b),a)
 ---+---+-+---
 (0 rows)
 
-explain (costs off)
+explain (costs off, timing off, summary off, analyze)
   select a, b, sum(v), count(*) from gstest_empty group by grouping sets ((a,b),a);
-   QUERY PLAN   
-
- HashAggregate
+   QUERY PLAN   
+
+ HashAggregate (actual rows=0 loops=1)
Hash Key: a, b
Hash Key: a
-   ->  Seq Scan on gstest_empty
+   ->  Seq Scan on gstest_empty (actual rows=0 loops=1)
 (4 rows)
 
 select a, b, sum(v), count(*) from gstest_empty group by grouping sets ((a,b),());
@@ -1150,16 +1151,16 @@ select a, b, sum(v), count(*) from gstest_empty group by grouping sets ((a,b),()
|   | | 0
 (3 rows)
 
-explain (costs off)
+explain (costs off, timing off, summary off, analyze)
   select a, b, sum(v), count(*) from gstest_empty group by grouping sets ((a,b),(),(),());
-   QUERY PLAN   
-
- MixedAggregate
+   QUERY PLAN   
+
+ MixedAggregate (actual rows=3 loops=1)
Hash Key: a, b
Group Key: ()
Group Key: ()
Group Key: ()
-   ->  Seq Scan on gstest_empty
+   ->  Seq Scan on gstest_empty (actual rows=0 loops=1)
 (6 rows)
 
 select sum(v), count(*) from gstest_empty group by grouping sets ((),(),());
@@ -1170,15 +1171,15 @@ select sum(v), count(*) from gstest_empty group by grouping sets ((),(),());
  | 0
 (3 rows)
 
-explain (costs off)
+explain (costs off, timing off, summary off, analyze)
   select sum(v), count(*) from gstest_empty group by grouping sets ((),(),());
-   QUERY PLAN   
-
- Aggregate
+   QUERY PLAN   
+
+ Aggregate (actual rows=3 loops=1)
Group Key: ()
Group Key: ()
Group Key: ()
-   ->  Seq Scan on gstest_empty
+   ->  Seq Scan on gstest_empty (actual rows=0 loops=1)
 (5 rows)
 
 -- check that functionally depen

Re: error context for vacuum to include block number

2020-02-19 Thread Justin Pryzby
Rebased on top of 007491979461ff10d487e1da9bcc87f2fd834f26

Also, I was thinking that lazy_scan_heap doesn't need to do this:

+   /* Pop the error context stack while calling vacuum */
+   error_context_stack = errcallback.previous;
...
+   /* Set the error context while continuing heap scan */
+   error_context_stack = &errcallback;

It seems to me that's not actually necessary, since lazy_vacuum_heap will just
*push* its own context handler onto the stack, and then pop it back off.  We
don't need to pop our context beforehand.  We also vacuum the FSM, and one
might say that we shouldn't report "...while scanning block number..." if it
was "vacuuming FSM" rather than "scanning heap".  To that I would reply that
either vacuuming the FSM can be considered part of scanning the heap, or maybe
we should add an additional callback for that; the latter isn't very nice,
since we'd need to add a PROGRESS enum for which we don't actually report
PROGRESS (or stop using that enum).
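
To spell out why the extra pop isn't needed: error_context_stack is just a
linked list, and the callee links its entry to whatever was current and
restores it on the way out.  A tiny standalone illustration (not server code;
it only mimics the ErrorContextCallback shape from elog.h):

#include <stdio.h>

typedef struct ErrorContextCallback
{
	struct ErrorContextCallback *previous;
	void		(*callback) (void *arg);
	void	   *arg;
} ErrorContextCallback;

static ErrorContextCallback *error_context_stack = NULL;

/* stands in for the errcontext() output produced for each stack entry */
static void
print_cb(void *arg)
{
	printf("CONTEXT: %s\n", (const char *) arg);
}

/* what elog.c does when an error is reported: walk the whole stack */
static void
report_context(void)
{
	ErrorContextCallback *c;

	for (c = error_context_stack; c != NULL; c = c->previous)
		c->callback(c->arg);
}

/* plays the role of lazy_vacuum_heap: pushes and pops its own entry */
static void
inner(void)
{
	ErrorContextCallback cb = {error_context_stack, print_cb, "vacuuming heap"};

	error_context_stack = &cb;
	report_context();			/* prints "vacuuming heap" then "scanning heap" */
	error_context_stack = cb.previous;	/* caller's entry is current again */
}

int
main(void)
{
	ErrorContextCallback cb = {NULL, print_cb, "scanning heap"};

	error_context_stack = &cb;	/* as in lazy_scan_heap */
	inner();
	report_context();			/* prints only "scanning heap" */
	error_context_stack = cb.previous;
	return 0;
}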

I tested using variations on this, and it works as expected: the context is
correct both during the vacuum invoked while scanning and afterwards, once the
heap scan resumes:

template1=# SET statement_timeout=0; SET maintenance_work_mem='1MB'; DROP TABLE 
tt; CREATE UNLOGGED TABLE tt(i int); INSERT INTO tt SELECT 
generate_series(1,39); CREATE INDEX ON tt(i); UPDATE tt SET i=i-1; SET 
statement_timeout=1222; VACUUM VERBOSE tt;

>From 91158171f75cd20e69b18843dd3b6525961e4e8b Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Thu, 12 Dec 2019 20:54:37 -0600
Subject: [PATCH v21 1/2] vacuum errcontext to show block being processed

Discussion:
https://www.postgresql.org/message-id/20191120210600.gc30...@telsasoft.com
---
 src/backend/access/heap/vacuumlazy.c | 130 ++-
 1 file changed, 129 insertions(+), 1 deletion(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 03c43ef..9e69294 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -292,6 +292,14 @@ typedef struct LVRelStats
 	bool		lock_waiter_detected;
 } LVRelStats;
 
+typedef struct
+{
+	char 		*relnamespace;
+	char		*relname;
+	char 		*indname;
+	BlockNumber blkno;	/* used only for heap operations */
+	int			phase;	/* Reusing same enums as for progress reporting */
+} vacuum_error_callback_arg;
 
 /* A few variables that don't seem worth passing around as parameters */
 static int	elevel = -1;
@@ -361,6 +369,9 @@ static void end_parallel_vacuum(Relation *Irel, IndexBulkDeleteResult **stats,
 LVParallelState *lps, int nindexes);
 static LVSharedIndStats *get_indstats(LVShared *lvshared, int n);
 static bool skip_parallel_vacuum_index(Relation indrel, LVShared *lvshared);
+static void vacuum_error_callback(void *arg);
+static void init_vacuum_error_callback(ErrorContextCallback *errcallback,
+		vacuum_error_callback_arg *errcbarg, Relation onerel, int phase);
 
 
 /*
@@ -724,6 +735,8 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 		PROGRESS_VACUUM_MAX_DEAD_TUPLES
 	};
 	int64		initprog_val[3];
+	ErrorContextCallback errcallback;
+	vacuum_error_callback_arg errcbarg;
 
 	pg_rusage_init(&ru0);
 
@@ -870,6 +883,11 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 	else
 		skipping_blocks = false;
 
+	/* Setup error traceback support for ereport() */
+	init_vacuum_error_callback(&errcallback, &errcbarg, onerel,
+	PROGRESS_VACUUM_PHASE_SCAN_HEAP);
+	error_context_stack = &errcallback;
+
 	for (blkno = 0; blkno < nblocks; blkno++)
 	{
 		Buffer		buf;
@@ -891,6 +909,8 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 #define FORCE_CHECK_PAGE() \
 		(blkno == nblocks - 1 && should_attempt_truncation(params, vacrelstats))
 
+		errcbarg.blkno = blkno;
+
 		pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, blkno);
 
 		if (blkno == next_unskippable_block)
@@ -987,6 +1007,9 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 vmbuffer = InvalidBuffer;
 			}
 
+			/* Pop the error context stack while calling vacuum */
+			error_context_stack = errcallback.previous;
+
 			/* Work on all the indexes, then the heap */
 			lazy_vacuum_all_indexes(onerel, Irel, indstats,
 	vacrelstats, lps, nindexes);
@@ -1011,6 +1034,9 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 			/* Report that we are once again scanning the heap */
 			pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
 		 PROGRESS_VACUUM_PHASE_SCAN_HEAP);
+
+			/* Set the error context while continuing heap scan */
+			error_context_stack = &errcallback;
 		}
 
 		/*
@@ -1597,6 +1623,9 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 			RecordPageWithFreeSpace(onerel, blkno, freespace);
 	}
 
+	/* Pop the error context stack */
+	error_context_stack = errcallback.previous;
+
 	/* re

Re: subplan resets wrong hashtable

2020-02-09 Thread Justin Pryzby
On Sun, Feb 09, 2020 at 08:01:26PM -0800, Andres Freund wrote:
> Ugh, that indeed looks wrong. Did you check whether it can actively
> cause wrong query results? If so, did you do theoretically, or got to a
> query returning wrong results?

Actually .. I can "theoretically" prove that there are no wrong results from
that patch ... since in that file it has no effect, the tested variables being
zeroed a few lines earlier:

 @@ -499,51 +499,60 @@ buildSubPlanHash(SubPlanState *node, ExprContext 
*econtext)
*node->hashtable = NULL;
*node->hashnulls = NULL;
 node->havehashrows = false;
 node->havenullrows = false;
  
 nbuckets = (long) Min(planstate->plan->plan_rows, (double) LONG_MAX);
 if (nbuckets < 1)
 nbuckets = 1;
  
 -   node->hashtable = BuildTupleHashTable(node->parent,
 -  
   node->descRight,
 -  
   ncols,
 -  
   node->keyColIdx,
 -  
   node->tab_eq_funcoids,
 -  
   node->tab_hash_funcs,
 -  
   nbuckets,
 -  
   0,
 -  
   node->hashtablecxt,
 -  
   node->hashtempcxt,
 -  
   false);
*+   if (node->hashtable)
 +   ResetTupleHashTable(node->hashtable);
 +   else
 +   node->hashtable = BuildTupleHashTableExt(node->parent,
 
 ...
*+   if (node->hashnulls)
 +   ResetTupleHashTable(node->hashtable);
 +   else
 +   node->hashnulls = BuildTupleHashTableExt(node->parent,
 +  
  node->descRight,






subplan resets wrong hashtable

2020-02-09 Thread Justin Pryzby
I believe the 2nd hunk should reset node->hashnulls, rather than reset
->hashtable a 2nd time:

@@ -505,7 +505,10 @@ buildSubPlanHash(SubPlanState *node, ExprContext *econtext)
if (nbuckets < 1)
nbuckets = 1;
 
-   node->hashtable = BuildTupleHashTable(node->parent,
+   if (node->hashtable)
+   ResetTupleHashTable(node->hashtable);
+   else
+   node->hashtable = BuildTupleHashTableExt(node->parent,

 node->descRight,

 ncols,

 node->keyColIdx,
...

@@ -527,7 +531,11 @@ buildSubPlanHash(SubPlanState *node, ExprContext *econtext)
if (nbuckets < 1)
nbuckets = 1;
}
-   node->hashnulls = BuildTupleHashTable(node->parent,
+
+   if (node->hashnulls)
+   ResetTupleHashTable(node->hashtable);
+   else
+   node->hashnulls = BuildTupleHashTableExt(node->parent,

 node->descRight,

 ncols,

 node->keyColIdx,
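
i.e. presumably the second hunk was meant to read (my guess at the intended
fix, mirroring the first hunk):

+   if (node->hashnulls)
+   ResetTupleHashTable(node->hashnulls);
+   else
+   node->hashnulls = BuildTupleHashTableExt(node->parent,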

Added here:

commit 356687bd825e5ca7230d43c1bffe7a59ad2e77bd
Author: Andres Freund 
Date:   Sat Feb 9 00:35:57 2019 -0800

Reset, not recreate, execGrouping.c style hashtables.




Re: subplan resets wrong hashtable

2020-02-09 Thread Justin Pryzby
On Sun, Feb 09, 2020 at 08:01:26PM -0800, Andres Freund wrote:
> Ugh, that indeed looks wrong. Did you check whether it can actively
> cause wrong query results? If so, did you do theoretically, or got to a
> query returning wrong results?

No, I only noticed while reading code.

I tried briefly to find a plan that looked like what I thought might be broken,
but haven't found anything close.

Justin




Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly

2020-02-11 Thread Justin Pryzby
For your v7 patch, which handles REINDEX to a new tablespace, I have a few
minor comments:

+ * the relation will be rebuilt.  If InvalidOid is used, the default

=> should say "current", not "default" ?

+++ b/doc/src/sgml/ref/reindex.sgml
+TABLESPACE
...
+new_tablespace

=> I saw you split the description of TABLESPACE from new_tablespace based on
a comment earlier in the thread, but I suggest that the descriptions for these
should be merged, like:

+   
+TABLESPACEnew_tablespace
+
+ 
+  Allow specification of a tablespace where all rebuilt indexes will be 
created.
+  Cannot be used with "mapped" relations. If SCHEMA,
+  DATABASE or SYSTEM are specified, 
then
+  all unsuitable relations will be skipped and a single 
WARNING
+  will be generated.
+ 
+
+   

The existing patch is very natural, especially the parts in the original patch
handling vacuum full and cluster.  Those were removed to concentrate on
REINDEX, based on comments that it might be nice if ALTER handled CLUSTER and
VACUUM FULL.  On a separate thread, I brought up the idea of ALTER using
clustered order.  Tom pointed out some issues with my implementation, and
didn't like the idea, either.

So I suggest to re-include the CLUSTER/VAC FULL parts as a separate 0002 patch,
the same way they were originally implemented.

BTW, I think if "ALTER" were updated to support REINDEX (to allow multiple
operations at once), it might be either:
|ALTER INDEX i SET TABLESPACE spc, REINDEX; -- to reindex a single index onto a
given tablespace
or
|ALTER TABLE tbl REINDEX USING INDEX TABLESPACE spc; -- to reindex all indexes
on a table, with the indexes moved to a given tablespace
"USING INDEX TABLESPACE" is already used for ALTER..ADD column/table CONSTRAINT.

-- 
Justin




Re: explain HashAggregate to report bucket and memory stats

2020-02-16 Thread Justin Pryzby
Updated:

 . remove from explain analyze those tests which would display sort
   Memory/Disk.  Oops.
 . fix issue with the first patch showing zero "tuples" memory for some
   grouping sets.
 . reclassify memory as "tuples" if it has to do with "members".  So hashtable
   size is now redundant with nbuckets (if you know
   sizeof(TupleHashEntryData));

-- 
Justin
>From c989b75f820dbda0540b3d2cd092eaf1f8629baa Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Sat, 15 Feb 2020 12:03:11 -0600
Subject: [PATCH v3 1/7] Run some existing tests with explain (ANALYZE)..

..in a separate, earlier patch, to better show what bits are added by later
patches for hashtable instrumentation.
---
 src/test/regress/expected/groupingsets.out| 87 ++-
 src/test/regress/expected/select_parallel.out | 20 +++---
 src/test/regress/expected/subselect.out   | 69 +
 src/test/regress/expected/union.out   | 43 ++---
 src/test/regress/sql/groupingsets.sql | 16 ++---
 src/test/regress/sql/select_parallel.sql  |  4 +-
 src/test/regress/sql/subselect.sql| 25 
 src/test/regress/sql/union.sql|  4 +-
 8 files changed, 184 insertions(+), 84 deletions(-)

diff --git a/src/test/regress/expected/groupingsets.out b/src/test/regress/expected/groupingsets.out
index c1f802c..c052f7e 100644
--- a/src/test/regress/expected/groupingsets.out
+++ b/src/test/regress/expected/groupingsets.out
@@ -458,16 +458,17 @@ ERROR:  aggregate functions are not allowed in FROM clause of their own query le
 LINE 3:lateral (select a, b, sum(v.x) from gstest_data(v.x) ...
  ^
 -- min max optimization should still work with GROUP BY ()
-explain (costs off)
+explain (costs off, timing off, summary off, analyze)
   select min(unique1) from tenk1 GROUP BY ();
- QUERY PLAN 
-
- Result
+ QUERY PLAN 
+
+ Result (actual rows=1 loops=1)
InitPlan 1 (returns $0)
- ->  Limit
-   ->  Index Only Scan using tenk1_unique1 on tenk1
+ ->  Limit (actual rows=1 loops=1)
+   ->  Index Only Scan using tenk1_unique1 on tenk1 (actual rows=1 loops=1)
  Index Cond: (unique1 IS NOT NULL)
-(5 rows)
+ Heap Fetches: 0
+(6 rows)
 
 -- Views with GROUPING SET queries
 CREATE VIEW gstest_view AS select a, b, grouping(a,b), sum(c), count(*), max(c)
@@ -1126,14 +1127,14 @@ select a, b, sum(v), count(*) from gstest_empty group by grouping sets ((a,b),a)
 ---+---+-+---
 (0 rows)
 
-explain (costs off)
+explain (costs off, timing off, summary off, analyze)
   select a, b, sum(v), count(*) from gstest_empty group by grouping sets ((a,b),a);
-   QUERY PLAN   
-
- HashAggregate
+   QUERY PLAN   
+
+ HashAggregate (actual rows=0 loops=1)
Hash Key: a, b
Hash Key: a
-   ->  Seq Scan on gstest_empty
+   ->  Seq Scan on gstest_empty (actual rows=0 loops=1)
 (4 rows)
 
 select a, b, sum(v), count(*) from gstest_empty group by grouping sets ((a,b),());
@@ -1150,16 +1151,16 @@ select a, b, sum(v), count(*) from gstest_empty group by grouping sets ((a,b),()
|   | | 0
 (3 rows)
 
-explain (costs off)
+explain (costs off, timing off, summary off, analyze)
   select a, b, sum(v), count(*) from gstest_empty group by grouping sets ((a,b),(),(),());
-   QUERY PLAN   
-
- MixedAggregate
+   QUERY PLAN   
+
+ MixedAggregate (actual rows=3 loops=1)
Hash Key: a, b
Group Key: ()
Group Key: ()
Group Key: ()
-   ->  Seq Scan on gstest_empty
+   ->  Seq Scan on gstest_empty (actual rows=0 loops=1)
 (6 rows)
 
 select sum(v), count(*) from gstest_empty group by grouping sets ((),(),());
@@ -1170,15 +1171,15 @@ select sum(v), count(*) from gstest_empty group by grouping sets ((),(),());
  | 0
 (3 rows)
 
-explain (costs off)
+explain (costs off, timing off, summary off, analyze)
   select sum(v), count(*) from gstest_empty group by grouping sets ((),(),());
-   QUERY PLAN   
-
- Aggregate
+   QUERY PLAN   
+
+ Aggregate (actual rows=3 loops=1)
Group Key: ()
Group Key: ()
Group Key: ()
-   ->  Seq Scan on gstest_empty
+   ->  Seq Scan on gstest_empty (actual rows=0 loops=1)
 (5 rows)
 
 -- check that functional

reindex concurrently and two toast indexes

2020-02-16 Thread Justin Pryzby
Forking old, long thread:
https://www.postgresql.org/message-id/36712441546604286%40sas1-890ba5c2334a.qloud-c.yandex.net
On Fri, Jan 04, 2019 at 03:18:06PM +0300, Sergei Kornilov wrote:
> About reindex invalid indexes - i found one good question in archives [1]: 
> how about toast indexes?
> I check it now, i am able drop invalid toast index, but i can not drop 
> reduntant valid index.
> Reproduce:
> session 1: begin; select from test_toast ... for update;
> session 2: reindex table CONCURRENTLY test_toast ;
> session 2: interrupt by ctrl+C
> session 1: commit
> session 2: reindex table test_toast ;
> and now we have two toast indexes. DROP INDEX is able to remove only invalid 
> ones. Valid index gives "ERROR:  permission denied: 
> "pg_toast_16426_index_ccnew" is a system catalog"
> [1]: 
> https://www.postgresql.org/message-id/CAB7nPqT%2B6igqbUb59y04NEgHoBeUGYteuUr89AKnLTFNdB8Hyw%40mail.gmail.com

It looks like this was never addressed.

I noticed a ccnew toast index sitting around since October - what do I do with 
it ?

ts=# DROP INDEX pg_toast.pg_toast_463881620_index_ccnew;
ERROR:  permission denied: "pg_toast_463881620_index_ccnew" is a system catalog
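
In the meantime, something like this lists leftover indexes from interrupted
REINDEX CONCURRENTLY runs (an illustrative query, not from the original
thread):

SELECT n.nspname, c.relname, i.indisvalid
FROM pg_index i
JOIN pg_class c ON c.oid = i.indexrelid
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relname ~ '_cc(new|old)';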

-- 
Justin




Re: error context for vacuum to include block number

2020-02-16 Thread Justin Pryzby
On Mon, Feb 17, 2020 at 10:47:47AM +0900, Masahiko Sawada wrote:
> Thank you for updating the patch!
> 
> 1.
> The above lines need a new line.

Done, thanks.

> 2.
> In lazy_vacuum_heap, we set the error context and then call
> pg_rusage_init whereas lazy_vacuum_index and lazy_cleanup_index does
> the opposite. And lazy_scan_heap also call pg_rusage first. I think
> lazy_vacuum_heap should follow them for consistency. That is, we can
> set error context after pages = 0.

Right. Maybe I did it the other way because the two uses of
PROGRESS_VACUUM_PHASE_VACUUM_HEAP were right next to each other.

> 3.
> We have 2 other phases: PROGRESS_VACUUM_PHASE_TRUNCATE and
> PROGRESS_VACUUM_PHASE_FINAL_CLEANUP. I think it's better to set the
> error context in lazy_truncate_heap as well. What do you think?
> 
> I'm not sure it's worth to set the error context for FINAL_CLEANUP but
> we should add the case of FINAL_CLEANUP to vacuum_error_callback as
> no-op or explain it as a comment even if we don't.

I don't have strong feelings either way.

I looked a bit at the truncation phase.  It also truncates the FSM and VM
forks, which could be misleading if the error was in one of those files and not
the main filenode.

I'd have to find a way to test it... 
...and was pleasantly surprised to see that earlier phases don't choke if I
(re)create a fake 4GB table like:

postgres=# CREATE TABLE trunc(i int);
CREATE TABLE
postgres=# \x\t
Expanded display is on.
Tuples only is on.
postgres=# SELECT relfilenode FROM pg_class WHERE oid='trunc'::regclass;
relfilenode | 59068

postgres=# \! bash -xc 'truncate -s 1G 
./pgsql13.dat111/base/12689/59068{,.{1..3}}'
+ truncate -s 1G ./pgsql13.dat111/base/12689/59074 
./pgsql13.dat111/base/12689/59074.1 ./pgsql13.dat111/base/12689/59074.2 
./pgsql13.dat111/base/12689/59074.3

postgres=# \timing 
Timing is on.
postgres=# SET client_min_messages=debug; SET statement_timeout='13s'; VACUUM 
VERBOSE trunc;
INFO:  vacuuming "public.trunc"
INFO:  "trunc": found 0 removable, 0 nonremovable row versions in 524288 out of 
524288 pages
DETAIL:  0 dead row versions cannot be removed yet, oldest xmin: 2098
There were 0 unused item identifiers.
Skipped 0 pages due to buffer pins, 0 frozen pages.
524288 pages are entirely empty.
CPU: user: 5.00 s, system: 1.50 s, elapsed: 6.52 s.
ERROR:  canceling statement due to statement timeout
CONTEXT:  while truncating relation "public.trunc" to 0 blocks

The callback surrounding RelationTruncate() seems hard to hit unless you add
CHECK_FOR_INTERRUPTS(); the same was true for index cleanup.

The truncation uses a prefetch, which is more likely to hit any low-level
error, so I added a callback there, too.

BTW, for the index cases, I didn't like repeating the namespace here, but WDYT ?
|CONTEXT:  while vacuuming index "public.t_i_idx3" of relation "t"

Thanks for rerere-reviewing.

-- 
Justin
>From 8295224470e0ce236025dfbff50de052978fae1d Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Thu, 12 Dec 2019 20:54:37 -0600
Subject: [PATCH v19 1/2] vacuum errcontext to show block being processed

Discussion:
https://www.postgresql.org/message-id/20191120210600.gc30...@telsasoft.com
---
 src/backend/access/heap/vacuumlazy.c | 130 +++
 1 file changed, 130 insertions(+)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index a23cdef..5e734ee 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -292,6 +292,14 @@ typedef struct LVRelStats
 	bool		lock_waiter_detected;
 } LVRelStats;
 
+typedef struct
+{
+	char 		*relnamespace;
+	char		*relname;
+	char 		*indname;
+	BlockNumber blkno;	/* used only for heap operations */
+	int			phase;	/* Reusing same enums as for progress reporting */
+} vacuum_error_callback_arg;
 
 /* A few variables that don't seem worth passing around as parameters */
 static int	elevel = -1;
@@ -361,6 +369,9 @@ static void end_parallel_vacuum(Relation *Irel, IndexBulkDeleteResult **stats,
 LVParallelState *lps, int nindexes);
 static LVSharedIndStats *get_indstats(LVShared *lvshared, int n);
 static bool skip_parallel_vacuum_index(Relation indrel, LVShared *lvshared);
+static void vacuum_error_callback(void *arg);
+static void init_vacuum_error_callback(ErrorContextCallback *errcallback,
+		vacuum_error_callback_arg *errcbarg, Relation onerel, int phase);
 
 
 /*
@@ -724,6 +735,8 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 		PROGRESS_VACUUM_MAX_DEAD_TUPLES
 	};
 	int64		initprog_val[3];
+	ErrorContextCallback errcallback;
+	vacuum_error_callback_arg errcbarg;
 
 	pg_rusage_init(&ru0);
 
@@ -870,6 +883,11 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 	else
 		skipping_blocks = false;
 
+	/* Setup error traceback support for ereport() */
+	init_vacuum_error_call

Re: error context for vacuum to include block number

2020-02-16 Thread Justin Pryzby
On Mon, Feb 17, 2020 at 02:18:11PM +0900, Masahiko Sawada wrote:
> Oops it seems to me that it's better to set error context after
> tupindex = 0. Sorry for my bad.

I take your point but did it differently - see what you think

> And the above comment can be written in a single line as other same comments.

Thanks :)

> Hmm I don't think it's a good idea to have count_nondeletable_pages
> set the error context of PHASE_TRUNCATE.

I think if we don't do it there then we shouldn't bother handling
PHASE_TRUNCATE at all, since that's what's likely to hit filesystem or other
low-level errors, before lazy_truncate_heap() hits them.

> Because the patch sets the
> error context during RelationTruncate that actually truncates the heap
> but count_nondeletable_pages doesn't truncate it.

I would say that the ReadBuffer called by the prefetch in
count_nondeletable_pages() happens during the course of truncation, just as a
ReadBuffer called during the course of vacuuming can be attributed to
vacuuming.

> I think setting the error context only during RelationTruncate would be a
> good start. We can hear other opinions from other hackers. Some hackers may
> want to set the error context for whole lazy_truncate_heap.

I avoided doing that since it has several "return" statements, each of which
would need to "Pop the error context stack", which is at risk of being
forgotten and left unpopped by anyone who adds or changes flow control.

Also, I just added this to the TRUNCATE case, even though that should never
happen: if (BlockNumberIsValid(cbarg->blkno))...
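
i.e. roughly this branch in the callback (a sketch, not the exact patch text):

		case PROGRESS_VACUUM_PHASE_TRUNCATE:
			if (BlockNumberIsValid(cbarg->blkno))
				errcontext("while truncating relation \"%s.%s\" to %u blocks",
						   cbarg->relnamespace, cbarg->relname, cbarg->blkno);
			break;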

-- 
Justin
>From 977b1b5e00ce522bd775cf91f7a9c7a9345d3171 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Thu, 12 Dec 2019 20:54:37 -0600
Subject: [PATCH v20 1/2] vacuum errcontext to show block being processed

Discussion:
https://www.postgresql.org/message-id/20191120210600.gc30...@telsasoft.com
---
 src/backend/access/heap/vacuumlazy.c | 130 ++-
 1 file changed, 129 insertions(+), 1 deletion(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index a23cdef..ce3efd7 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -292,6 +292,14 @@ typedef struct LVRelStats
 	bool		lock_waiter_detected;
 } LVRelStats;
 
+typedef struct
+{
+	char 		*relnamespace;
+	char		*relname;
+	char 		*indname;
+	BlockNumber blkno;	/* used only for heap operations */
+	int			phase;	/* Reusing same enums as for progress reporting */
+} vacuum_error_callback_arg;
 
 /* A few variables that don't seem worth passing around as parameters */
 static int	elevel = -1;
@@ -361,6 +369,9 @@ static void end_parallel_vacuum(Relation *Irel, IndexBulkDeleteResult **stats,
 LVParallelState *lps, int nindexes);
 static LVSharedIndStats *get_indstats(LVShared *lvshared, int n);
 static bool skip_parallel_vacuum_index(Relation indrel, LVShared *lvshared);
+static void vacuum_error_callback(void *arg);
+static void init_vacuum_error_callback(ErrorContextCallback *errcallback,
+		vacuum_error_callback_arg *errcbarg, Relation onerel, int phase);
 
 
 /*
@@ -724,6 +735,8 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 		PROGRESS_VACUUM_MAX_DEAD_TUPLES
 	};
 	int64		initprog_val[3];
+	ErrorContextCallback errcallback;
+	vacuum_error_callback_arg errcbarg;
 
 	pg_rusage_init(&ru0);
 
@@ -870,6 +883,11 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 	else
 		skipping_blocks = false;
 
+	/* Setup error traceback support for ereport() */
+	init_vacuum_error_callback(&errcallback, &errcbarg, onerel,
+	PROGRESS_VACUUM_PHASE_SCAN_HEAP);
+	error_context_stack = &errcallback;
+
 	for (blkno = 0; blkno < nblocks; blkno++)
 	{
 		Buffer		buf;
@@ -891,6 +909,8 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 #define FORCE_CHECK_PAGE() \
 		(blkno == nblocks - 1 && should_attempt_truncation(params, vacrelstats))
 
+		errcbarg.blkno = blkno;
+
 		pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, blkno);
 
 		if (blkno == next_unskippable_block)
@@ -987,6 +1007,9 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 vmbuffer = InvalidBuffer;
 			}
 
+			/* Pop the error context stack while calling vacuum */
+			error_context_stack = errcallback.previous;
+
 			/* Work on all the indexes, then the heap */
 			lazy_vacuum_all_indexes(onerel, Irel, indstats,
 	vacrelstats, lps, nindexes);
@@ -1011,6 +1034,9 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 			/* Report that we are once again scanning the heap */
 			pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
 		 PROGRESS_VACUUM_PHASE_SCAN_HEAP);
+
+			/* Set the error context while continuing heap scan */
+			error_context_stack = &errcallback;
 		}
 
 		/*
@@ -1597,6 +1623,9 @@ lazy_scan_heap(Relation onerel, VacuumParam

Re: assert pg_class.relnatts is consistent

2020-02-16 Thread Justin Pryzby
On Mon, Feb 17, 2020 at 01:25:05PM +0900, Amit Langote wrote:
> > Pushed both of those.
> 
> Thank you.
> 
> It's amazing to see how simple bootstrapping has now become thanks to
> the work you guys have done recently.

On Fri, Feb 14, 2020 at 06:00:05PM +0900, Amit Langote wrote:
> > I can't write Perl myself (maybe Justin), but +1 to this idea.
> 
> I tried and think it works but not sure if that's good Perl
> programming.  See the attached.

And thanks for picking up perl so I didn't have to remember what I ever knew.

-- 
Justin




Re: ALTER tbl rewrite loses CLUSTER ON index (consider moving indisclustered to pg_class)

2020-02-16 Thread Justin Pryzby
On Mon, Feb 17, 2020 at 02:31:42PM +0900, Amit Langote wrote:
> Hi Justin,
> 
> On Fri, Feb 7, 2020 at 11:39 PM Justin Pryzby  wrote:
> > On Thu, Feb 06, 2020 at 02:24:47PM -0300, Alvaro Herrera wrote:
> > > On 2020-Feb-06, Justin Pryzby wrote:
> > >
> > > > I wondered if it wouldn't be better if CLUSTER ON was stored in 
> > > > pg_class as the
> > > > Oid of a clustered index, rather than a boolean in pg_index.
> > >
> > > Maybe.  Do you want to try a patch?
> >
> > I think the attached is 80% complete (I didn't touch pg_dump).
> >
> > One objection to this change would be that all relations (including indices)
> > end up with relclustered fields, and pg_index already has a number of 
> > bools, so
> > it's not like this one bool is wasting a byte.
> >
> > I think relisclustered was a's clever way of avoiding that overhead 
> > (c0ad5953).
> > So I would be -0.5 on moving it to pg_class..

In case there's any confusion: "a's" was probably me halfway changing
"someone's" to "a".

> Are you still for fixing ALTER TABLE losing relisclustered with the
> patch we were working on earlier [1], if not for moving relisclustered
> to pg_class anymore?

Thanks for remembering this one.

I think your patch is the correct fix.

I forgot to say it, but moving relisclustered to pg_class doesn't help to avoid
losing indisclustered: it still needs a fix just like this.  Anyway, I
withdrew my suggestion for moving to pg_class, since it has more overhead, even
for pg_class rows for relations which can't have indexes.

> I have read elsewhere [2] that forcing ALTER TABLE to rewrite in
> clustered order might not be a good option, but maybe that one is a
> more radical proposal than this.

Right; your fix seems uncontroversial.  I ran into this (indisclustered) bug
while starting to write that patch for "ALTER rewrite in clustered order".

-- 
Justin




Re: reindex concurrently and two toast indexes

2020-02-22 Thread Justin Pryzby
On Tue, Feb 18, 2020 at 02:29:33PM +0900, Michael Paquier wrote:
> On Sun, Feb 16, 2020 at 01:08:35PM -0600, Justin Pryzby wrote:
> > Forking old, long thread:
> > https://www.postgresql.org/message-id/36712441546604286%40sas1-890ba5c2334a.qloud-c.yandex.net
> > On Fri, Jan 04, 2019 at 03:18:06PM +0300, Sergei Kornilov wrote:
> >> About reindex invalid indexes - i found one good question in archives [1]: 
> >> how about toast indexes?
> >> I check it now, i am able drop invalid toast index, but i can not drop 
> >> reduntant valid index.
> >> Reproduce:
> >> session 1: begin; select from test_toast ... for update;
> >> session 2: reindex table CONCURRENTLY test_toast ;
> >> session 2: interrupt by ctrl+C
> >> session 1: commit
> >> session 2: reindex table test_toast ;
> >> and now we have two toast indexes. DROP INDEX is able to remove
> >> only invalid ones. Valid index gives "ERROR:  permission denied:
> >> "pg_toast_16426_index_ccnew" is a system catalog" 
> >> [1]: 
> >> https://www.postgresql.org/message-id/CAB7nPqT%2B6igqbUb59y04NEgHoBeUGYteuUr89AKnLTFNdB8Hyw%40mail.gmail.com
> > 
> > It looks like this was never addressed.
> 
> On HEAD, this exact scenario leads to the presence of an old toast
> index pg_toast.pg_toast_*_index_ccold, causing the index to be skipped
> on a follow-up concurrent reindex:
> =# reindex table CONCURRENTLY test_toast ;
> WARNING:  XX002: cannot reindex invalid index
> "pg_toast.pg_toast_16385_index_ccold" concurrently, skipping
> LOCATION:  ReindexRelationConcurrently, indexcmds.c:2863
> REINDEX
> 
> And this toast index can be dropped while it remains invalid:
> =# drop index pg_toast.pg_toast_16385_index_ccold;
> DROP INDEX
> 
> I recall testing that stuff for all the interrupts which could be
> triggered and in this case, this waits at step 5 within
> WaitForLockersMultiple().  Now, in your case you take an extra step
> with a plain REINDEX, which forces a rebuild of the invalid toast
> index, making it per se valid, and not droppable.
> 
> Hmm.  There could be an argument here for skipping invalid toast
> indexes within reindex_index(), because we are sure about having at
> least one valid toast index at anytime, and these are not concerned
> with CIC.

Julien sent a patch for that, but here are my ideas (which you are free to
reject):

Could you require an AEL for that case, or something which will preclude
reindex table test_toast from working ?

Could you use atomic updates to ensure that exactly one index in an {old,new}
pair is invalid at any given time ?

Could you make the new (invalid) toast index not visible to other transactions?

-- 
Justin Pryzby




Re: explain HashAggregate to report bucket and memory stats

2020-02-22 Thread Justin Pryzby
On Sat, Feb 22, 2020 at 10:53:35PM +0100, Tomas Vondra wrote:
> I've started looking at this patch, because I've been long missing the

Thanks for looking

I have brief, initial comments before I revisit the patch.

> 3) Almost all executor nodes that are modified to include this new
> instrumentation struct also include TupleHashTable, and the data are
> essentially about the hash table. So my question is why not to include
> this into TupleHashTable - that would mean we don't need to modify any
> executor nodes, and it'd probably simplify code in explain.c too because
> we could simply pass the hashtable.

I considered this.  From the 0004 commit message:

|Also, if instrumentation were implemented in simplehash.h, I think every
|insertion or deletion would need to check ->members and ->size (which isn't
|necessary for Agg, but is necessary in the general case, and specifically 
for
|tidbitmap, since it actually DELETEs hashtable entries).  Or else 
simplehash
|would need a new function like UpdateTupleHashStats, which the higher 
level nodes
|would need to call after filling the hashtable or before deleting tuples, 
which
|seems to defeat the purpose of implementing stats at a lower layer.

> 4) The one exception to (3) is BitmapHeapScanState, which does include
> TIDBitmap and not TupleHashTable. And then we have tbm_instrumentation
> which "fakes" the data based on the pagetable. Maybe this is a sign that
> TIDBitmap needs a slightly different struct?

Hm, I'd say that it "collects" the data that's not immediately present, rather
than faking it.  But maybe I did it poorly.  Also, maybe TIDBitmap shouldn't be
included in the patch at all.

> Also, I'm not sure why we
> actually need tbm_instrumentation()? It just copies the instrumentation
> data from TIDBitmap into the node level, but why couldn't we just look
> at the instrumentation data in TIDBitmap directly?

See 0004 commit message:

|TIDBitmap is a private structure, so add an accessor function to return its
|instrumentation, and duplicate instrumentation struct in BitmapHeapState.

Also, I don't know what anyone else thinks, but I think 0005 is a throwaway
commit.  It's implemented more nicely in execGrouping.c.

> But it's definitely strange that we only print memory info in verbose mode -
> IMHO it's much more useful info than the number of buckets etc.

Because I wanted to be able to put "explain analyze" into regression tests
(which can show: "Buckets: 4 (originally 2)").  But I cannot get stable output
for any plan which uses Sort, without hacks like explain_sq_limit and
explain_parallel_sort_stats.

Actually, I wish there were a way to control Sort nodes' Memory/Disk output,
too.  I'm sure most of the regression tests were meant to be run as
explain(analyze NO), but it'd be much better if analyze YES were reasonably
easy in the general case that might include Sort.  If someone seconds that, I
will start a separate thread.

-- 
Justin Pryzby




v12 "won't fix" item regarding memory leak in "ATTACH PARTITION without AEL"; (or, relcache ref counting)

2020-02-23 Thread Justin Pryzby
This links to a long thread, from which I've tried to quote some of the
most important mails, below.
https://wiki.postgresql.org/wiki/PostgreSQL_12_Open_Items#Won.27t_Fix

I wondered if there's an effort to pursue a resolution for v13 ?

On Fri, Apr 12, 2019 at 11:42:24AM -0400, Tom Lane wrote in 
<31027.1555083...@sss.pgh.pa.us>:
> Michael Paquier  writes:
> > On Wed, Apr 10, 2019 at 05:03:21PM +0900, Amit Langote wrote:
> >> The problem lies in all branches that have partitioning, so it should be
> >> listed under Older Bugs, right?  You may have noticed that I posted
> >> patches for all branches down to 10.
> 
> > I have noticed.  The message from Tom upthread outlined that an open
> > item should be added, but this is not one.  That's what I wanted to
> > emphasize.  Thanks for making it clearer.
> 
> To clarify a bit: there's more than one problem here.  Amit added an
> open item about pre-existing leaks related to rd_partcheck.  (I'm going
> to review and hopefully push his fix for that today.)  However, there's
> a completely separate leak associated with mismanagement of rd_pdcxt,
> as I showed in [1], and it doesn't seem like we have consensus about
> what to do about that one.  AFAIK that's a new bug in 12 (caused by
> 898e5e329) and so it ought to be in the main open-items list.
> 
> This thread has discussed a bunch of possible future changes like
> trying to replace copying of relcache-provided data structures
> with reference-counting, but I don't think any such thing could be
> v12 material at this point.  We do need to fix the newly added
> leak though.
> 
>   regards, tom lane
> 
> [1] https://www.postgresql.org/message-id/10797.1552679128%40sss.pgh.pa.us
> 
> 

On Fri, Mar 15, 2019 at 05:41:47PM -0400, Robert Haas wrote in 
:
> On Fri, Mar 15, 2019 at 3:45 PM Tom Lane  wrote:
> > More to the point, we turned *one* rebuild = false situation into
> > a bunch of rebuild = true situations.  I haven't studied it closely,
> > but I think a CCA animal would probably see O(N^2) rebuild = true
> > invocations in a query with N partitions, since each time
> > expand_partitioned_rtentry opens another child table, we'll incur
> > an AcceptInvalidationMessages call which leads to forced rebuilds
> > of the previously-pinned rels.  In non-CCA situations, almost always
> > nothing happens with the previously-examined relcache entries.
> 
> That's rather unfortunate.  I start to think that clobbering the cache
> "always" is too hard a line.
> 
> > I agree that copying data isn't great.  What I don't agree is that this
> > solution is better.  In particular, copying data out of the relcache
> > rather than expecting the relcache to hold still over long periods
> > is the way we've done things everywhere else, cf RelationGetIndexList,
> > RelationGetStatExtList, RelationGetIndexExpressions,
> > RelationGetIndexPredicate, RelationGetIndexAttrBitmap,
> > RelationGetExclusionInfo, GetRelationPublicationActions.  I don't care
> > for a patch randomly deciding to do things differently on the basis of an
> > unsupported-by-evidence argument that it might cost too much to copy the
> > data.  If we're going to make a push to reduce the amount of copying of
> > that sort that we do, it should be a separately (and carefully) designed
> > thing that applies to all the relcache substructures that have the issue,
> > not one-off hacks that haven't been reviewed thoroughly.
> 
> That's not an unreasonable argument.  On the other hand, if you never
> try new stuff, you lose opportunities to explore things that might
> turn out to be better and worth adopting more widely.
> 
> I am not very convinced that it makes sense to lump something like
> RelationGetIndexAttrBitmap in with something like
> RelationGetPartitionDesc.  RelationGetIndexAttrBitmap is returning a
> single Bitmapset, whereas the data structure RelationGetPartitionDesc
> is vastly larger and more complex.  You can't say "well, if it's best
> to copy 32 bytes of data out of the relcache every time we need it, it
> must also be right to copy 10k or 100k of data out of the relcache
> every time we need it."
> 
> There is another difference as well: there's a good chance that
> somebody is going to want to mutate a Bitmapset, whereas they had
> BETTER NOT think that they can mutate the PartitionDesc.  So returning
> an uncopied Bitmapset is kinda risky in a way that returning an
> uncopied PartitionDesc is not.
> 
> If we want an at-least-somewhat unified solution here, I think we need
> to bite the bullet and make the planner hold a reference to the
> relcache throughout planning.  (The executor already keeps it open, I
> believe.) Otherwise, how is the relcache supposed to know when it can
> throw stuff away and when it can't?  The only alternative seems to be
> to have each subsystem hold its own reference count, as I did with the
> PartitionDirectory stuff, which is not something we'd want to scale
> out.
> 
> > I especially don't care 

Re: error context for vacuum to include block number

2020-02-13 Thread Justin Pryzby
On Thu, Feb 13, 2020 at 02:55:53PM +0900, Masahiko Sawada wrote:
> You need to add a newline to follow the limit line lengths so that the
> code is readable in an 80-column window. Or please run pgindent.

For now I :set tw=80

> 2.
> I think that making initialization process of errcontext argument a
> function is good. But maybe we can merge these two functions into one.

Thanks, this is better, and I used that.

> init_error_context_heap and init_error_context_index actually don't
> only initialize the callback arguments but also push the vacuum
> errcallback, in spite of the function name having 'init'. Also I think
> it might be better to only initialize the callback arguments in this
> function and to set errcallback by caller, rather than to wrap pushing
> errcallback by a function.

However I think it's important not to repeat this 4 times:
errcallback->callback = vacuum_error_callback;
errcallback->arg = errcbarg;
errcallback->previous = error_context_stack;
error_context_stack = errcallback;

So I kept the first 3 of those in the function and left only the assignment to
the global in the caller.  That helps keep the heap scan function clear, since
it assigns to it twice.
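
Roughly, the helper now has this shape (a sketch only; the real version is in
the attached patch):

	static void
	init_error_context(ErrorContextCallback *errcallback,
			vacuum_error_callback_arg *errcbarg, Relation onerel, int phase)
	{
		errcbarg->relnamespace = get_namespace_name(RelationGetNamespace(onerel));
		errcbarg->relname = RelationGetRelationName(onerel);
		errcbarg->indname = NULL;
		errcbarg->blkno = InvalidBlockNumber;
		errcbarg->phase = phase;

		errcallback->callback = vacuum_error_callback;
		errcallback->arg = errcbarg;
		errcallback->previous = error_context_stack;
		/* the caller assigns error_context_stack itself */
	}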

BTW, for testing, I'm able to consistently hit the "vacuuming block" case like
this:

SET statement_timeout=0; DROP TABLE t; CREATE TABLE t(i int); CREATE INDEX ON t(i);
INSERT INTO t SELECT generate_series(1,9); UPDATE t SET i=i-1;
SET statement_timeout=111; SET vacuum_cost_delay=3; SET vacuum_cost_page_dirty=0;
SET vacuum_cost_page_hit=11; SET vacuum_cost_limit=33; SET statement_timeout=;
VACUUM VERBOSE t;

Thanks for re-reviewing.

-- 
Justin
>From 5b8cad37244cdc310d78719b64ff44a464910598 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Thu, 12 Dec 2019 20:54:37 -0600
Subject: [PATCH v18] vacuum errcontext to show block being processed

Discussion:
https://www.postgresql.org/message-id/20191120210600.gc30...@telsasoft.com
---
 src/backend/access/heap/vacuumlazy.c | 120 +++
 1 file changed, 120 insertions(+)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index a23cdef..209f483 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -292,6 +292,14 @@ typedef struct LVRelStats
 	bool		lock_waiter_detected;
 } LVRelStats;
 
+typedef struct
+{
+	char 		*relnamespace;
+	char		*relname;
+	char 		*indname; /* undefined while not processing index */
+	BlockNumber blkno;	/* undefined while not processing heap */
+	int			phase;	/* Reusing same enums as for progress reporting */
+} vacuum_error_callback_arg;
 
 /* A few variables that don't seem worth passing around as parameters */
 static int	elevel = -1;
@@ -361,6 +369,9 @@ static void end_parallel_vacuum(Relation *Irel, IndexBulkDeleteResult **stats,
 LVParallelState *lps, int nindexes);
 static LVSharedIndStats *get_indstats(LVShared *lvshared, int n);
 static bool skip_parallel_vacuum_index(Relation indrel, LVShared *lvshared);
+static void vacuum_error_callback(void *arg);
+static void init_error_context(ErrorContextCallback *errcallback,
+		vacuum_error_callback_arg *errcbarg, Relation onerel, int phase);
 
 
 /*
@@ -724,6 +735,8 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 		PROGRESS_VACUUM_MAX_DEAD_TUPLES
 	};
 	int64		initprog_val[3];
+	ErrorContextCallback errcallback;
+	vacuum_error_callback_arg errcbarg;
 
 	pg_rusage_init();
 
@@ -870,6 +883,10 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 	else
 		skipping_blocks = false;
 
+	/* Setup error traceback support for ereport() */
+	init_error_context(&errcallback, &errcbarg, onerel, PROGRESS_VACUUM_PHASE_SCAN_HEAP);
+	error_context_stack = &errcallback;
+
 	for (blkno = 0; blkno < nblocks; blkno++)
 	{
 		Buffer		buf;
@@ -891,6 +908,8 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 #define FORCE_CHECK_PAGE() \
 		(blkno == nblocks - 1 && should_attempt_truncation(params, vacrelstats))
 
+		errcbarg.blkno = blkno;
+
 		pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, blkno);
 
 		if (blkno == next_unskippable_block)
@@ -987,6 +1006,9 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 vmbuffer = InvalidBuffer;
 			}
 
+			/* Pop the error context stack while calling vacuum */
+			error_context_stack = errcallback.previous;
+
 			/* Work on all the indexes, then the heap */
 			lazy_vacuum_all_indexes(onerel, Irel, indstats,
 	vacrelstats, lps, nindexes);
@@ -1011,6 +1033,9 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 			/* Report that we are once again scanning the heap */
 			pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
 		 PROGRESS_VACUUM_PHASE_SCAN_HEAP);
+
+			/* Set the error context while continuing heap scan */
+			error_

Re: explain HashAggregate to report bucket and memory stats

2020-02-15 Thread Justin Pryzby
On Mon, Feb 03, 2020 at 06:53:01AM -0800, Andres Freund wrote:
> On 2020-01-03 10:19:26 -0600, Justin Pryzby wrote:
> > On Sun, Feb 17, 2019 at 11:29:56AM -0500, Jeff Janes wrote:
> > https://www.postgresql.org/message-id/CAMkU%3D1zBJNVo2DGYBgLJqpu8fyjCE_ys%2Bmsr6pOEoiwA7y5jrA%40mail.gmail.com
> > > What would I find very useful is [...] if the HashAggregate node under
> > > "explain analyze" would report memory and bucket stats; and if the 
> > > Aggregate
> > > node would report...anything.
> 
> Yea, that'd be amazing. It probably should be something every
> execGrouping.c using node can opt into.

Do you think it should be implemented in execGrouping/TupleHashTableData (as I
did)?  I also did an experiment moving it into the higher-level nodes, but I
guess that's not actually desirable.  There's currently different test output
between the implementation using execGrouping.c and the one outside it, so
there's at least an issue with grouping sets.

> > +   hashtable->hinstrument.nbuckets_original = nbuckets;
> > +   hashtable->hinstrument.nbuckets = nbuckets;
> > +   hashtable->hinstrument.space_peak = entrysize * 
> > hashtable->hashtab->size;
> 
> That's not actually an accurate accounting of memory, because for filled
> entries a lot of memory is used to store actual tuples:

Thanks - I think I finally understood this.

I updated some existing tests to show the new output.  I imagine that's a
throwaway commit, and I should eventually add new tests for each of these node
types under explain analyze.

I've been testing the various nodes like:

--heapscan:
DROP TABLE t; CREATE TABLE t (i int unique) WITH(autovacuum_enabled=off);
INSERT INTO t SELECT generate_series(1,9);
SET enable_seqscan=off; SET parallel_tuple_cost=0; SET parallel_setup_cost=0; SET enable_indexonlyscan=off;
explain analyze verbose SELECT * FROM t WHERE i BETWEEN 999 and ;

--setop:
explain( analyze,verbose) SELECT * FROM generate_series(1,999) EXCEPT (SELECT NULL UNION ALL SELECT * FROM generate_series(1,9));
   Buckets: 2048 (originally 256)  Memory Usage: hashtable: 48kB, tuples: 8Kb

--recursive union:
explain analyze verbose WITH RECURSIVE t(n) AS ( SELECT 'foo' UNION SELECT n || ' bar' FROM t WHERE length(n) < ) SELECT n, n IS OF (text) AS is_text FROM t;

--subplan
explain analyze verbose SELECT i FROM generate_series(1,999)i WHERE (i,i) NOT IN (SELECT 1,1 UNION ALL SELECT j,j FROM generate_series(1,9)j);
   Buckets: 262144 (originally 131072)  Memory Usage: hashtable: 6144kB, tuples: 782Kb
explain analyze verbose select i FROM generate_series(1,999)i WHERE(1,i) NOT in (select i,null::int from t) ;

--Agg:
explain (analyze,verbose) SELECT A,COUNT(1) FROM generate_series(1,9)a GROUP BY 1;
   Buckets: 262144 (originally 256)  Memory Usage: hashtable: 6144kB, tuples: 782Kb

explain (analyze, verbose) select i FROM generate_series(1,999)i WHERE(1,1) not in (select a,null from (SELECT generate_series(1,9) a)x) ;

explain analyze verbose select * from (SELECT a FROM generate_series(1,99)a)v left join lateral (select v.a, four, ten, count(*) from (SELECT b four, 2 ten, b FROM generate_series(1,999)b)x group by cube(four,ten)) s on true order by v.a,four,ten;

--Grouping sets:
explain analyze verbose   select unique1,
 count(two), count(four), count(ten),
 count(hundred), count(thousand), count(twothousand),
 count(*)
from tenk1 group by grouping sets (unique1,twothousand,thousand,hundred,ten,four,two);

-- 
Justin
>From dff7109e4d82fd498ae8493caa0e4c84b0f04c74 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Sat, 15 Feb 2020 12:03:11 -0600
Subject: [PATCH v2 1/7] Run some existing tests with explain (ANALYZE)..

..in a separate, earlier patch, to better show what bits are added by later
patches for hashtable instrumentation.
---
 src/test/regress/expected/aggregates.out  |  20 +-
 src/test/regress/expected/groupingsets.out| 298 ++
 src/test/regress/expected/select_parallel.out |  20 +-
 src/test/regress/expected/subselect.out   |  69 ++
 src/test/regress/expected/union.out   |  71 +++---
 src/test/regress/sql/aggregates.sql   |   2 +-
 src/test/regress/sql/groupingsets.sql |  44 ++--
 src/test/regress/sql/select_parallel.sql  |   4 +-
 src/test/regress/sql/subselect.sql|  25 +++
 src/test/regress/sql/union.sql|   6 +-
 10 files changed, 341 insertions(+), 218 deletions(-)

diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index f457b5b..b3dcbaa 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -2342,18 +2342,20 @@ select v||'a', case when v||'a' = 'aa' then 1 else 0 end, count(*)
 -- Make sure that generation of HashAggregate for uniqificat

Re: Make ringbuffer threshold and ringbuffer sizes configurable?

2020-02-19 Thread Justin Pryzby
On Wed, Feb 05, 2020 at 08:00:26PM -0800, Andres Freund wrote:
> I think it would make sense to have seqscan_ringbuffer_threshold,
> {bulkread,bulkwrite,vacuum}_ringbuffer_size.

I suggest the possibility of somehow forcing a ringbuffer for nonbulk writes
for the current session.

In our use-case, we have loader processes INSERTing data using prepared
statements, UPSERT, and/or multiple VALUES(),() lists.  Some of that data will
be accessed in the near future (15min-24hr) but some parts (large parts, even)
may never be accessed.  I imagine most of the buffer pages never get
usagecount > 0 before being evicted.

I think it'd still be desirable to make the backend write() its own dirty
buffers to the OS, rather than leaving behind large numbers of dirty buffers
for another backend to deal with, since that *could* be a customer-facing
report.  I'd prefer the report run 10% faster due to rarely hitting dirty
buffers (by avoiding the need to write out lots of someone else's data) than
the loaders run 25% slower due to constantly writing to the OS.

The speed of loaders is not something our customers would be concerned with.
It's okay if they are slower than they might be.  They need to keep up with
incoming data, but it'd rarely matter if we load a 15min interval of data in
5min instead of in 4.

We would use COPY if we could, to get a ring buffer during writes.  But we
cannot, due to UPSERT (and maybe other reasons).

I have considered the possibility of loading data into a separate instance with
small (or in any case separate) shared_buffers and then transferring its data to
a customer-facing report instance using pg_restore (COPY)...but the overhead to
maintain that would be significant for us (me).

-- 
Justin




Re: assert pg_class.relnatts is consistent

2020-02-13 Thread Justin Pryzby
On Thu, Feb 13, 2020 at 04:51:01PM +0900, Amit Langote wrote:
> On Thu, Feb 13, 2020 at 3:23 AM Justin Pryzby  wrote:
> > Forking this thread for two tangential patches which I think are more
> > worthwhile than the original topic's patch.
> > https://www.postgresql.org/message-id/20200207143935.GP403%40telsasoft.com
> >
> > Is there a better place to implement assertion from 0002 ?
> 
> I would think the answer to that would be related to the answer of why
> you think we need this assert in the first place?
> 
> I know I have made the mistake of not updating relnatts when I added
> relispartition, etc. to pg_class, only to be bitten by it in the form
> of seemingly random errors/crashes.  Is that why?

Right.  When adding or removing a column from pg_class (or other catalogs), it's
necessary not only to add the column in the .h file and update references like
Anum_*, but also to update that catalog's own pg_class.relnatts in pg_class.dat.
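
To be explicit about what gets checked, the assertion amounts to something like
the following sketch (the variable names are made up, and the placement is
exactly what's in question):

	Form_pg_class classform = (Form_pg_class) GETSTRUCT(pg_class_tuple);

	Assert(classform->relnatts == RelationGetDescr(rel)->natts);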

On the other thead, Alvaro agreed it might be worth experimenting with moving
"indisclustered" from boolean in pg_index to an Oid in pg_class.  There's not
many references to it, so I was able to make most of the necessary changes
within an hour .. but spent some multiple of that tracing the crash in initdb,
which I would prefer to have failed less obscurely.

-- 
Justin




Re: bitmaps and correlation

2020-01-12 Thread Justin Pryzby
On Mon, Jan 06, 2020 at 11:26:06PM -0600, Justin Pryzby wrote:
> As Jeff has pointed out, high correlation has two effects in cost_index():
> 1) the number of pages read will be less;
> 2) the pages will be read more sequentially;
> 
> cost_index reuses the pages_fetched variable, so (1) isn't particularly clear,

I tried to make this more clear in 0001

> +   cost_per_page_corr = spc_random_page_cost -
> +   (spc_random_page_cost - spc_seq_page_cost)
> +   * (1-correlation*correlation);

And fixed bug: this should be c*c not 1-c*c.
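
That is, the corrected interpolation reads (a sketch of the fix, not a hunk
from the patch):

	cost_per_page_corr = spc_random_page_cost -
		(spc_random_page_cost - spc_seq_page_cost)
		* (correlation*correlation);

so a perfectly correlated index (c=1) is charged spc_seq_page_cost per page,
and an uncorrelated one (c=0) is charged spc_random_page_cost, rather than the
reverse.
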
>From d0819177ef1c6f86a588e3d2700ecff638f83b4a Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Wed, 8 Jan 2020 19:23:51 -0600
Subject: [PATCH v4 1/2] Make more clear the computation of min/max IO..

..and specifically the double use and effect of correlation.

Avoid re-use of the "pages_fetched" variable
---
 src/backend/optimizer/path/costsize.c | 47 +++
 1 file changed, 25 insertions(+), 22 deletions(-)

diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index b5a0033..bdc23a0 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -491,12 +491,13 @@ cost_index(IndexPath *path, PlannerInfo *root, double loop_count,
 csquared;
 	double		spc_seq_page_cost,
 spc_random_page_cost;
-	Cost		min_IO_cost,
+	double		min_pages_fetched,	/* The min and max page count based on index correlation */
+max_pages_fetched;
+	Cost		min_IO_cost,	/* The min and max cost based on index correlation */
 max_IO_cost;
 	QualCost	qpqual_cost;
 	Cost		cpu_per_tuple;
 	double		tuples_fetched;
-	double		pages_fetched;
 	double		rand_heap_pages;
 	double		index_pages;
 
@@ -579,7 +580,8 @@ cost_index(IndexPath *path, PlannerInfo *root, double loop_count,
 	 * (just after a CLUSTER, for example), the number of pages fetched should
 	 * be exactly selectivity * table_size.  What's more, all but the first
 	 * will be sequential fetches, not the random fetches that occur in the
-	 * uncorrelated case.  So if the number of pages is more than 1, we
+	 * uncorrelated case (the index is expected to read fewer pages, *and* each
+	 * page read is cheaper).  So if the number of pages is more than 1, we
 	 * ought to charge
 	 *		spc_random_page_cost + (pages_fetched - 1) * spc_seq_page_cost
 	 * For partially-correlated indexes, we ought to charge somewhere between
@@ -604,17 +606,17 @@ cost_index(IndexPath *path, PlannerInfo *root, double loop_count,
 		 * pro-rate the costs for one scan.  In this case we assume all the
 		 * fetches are random accesses.
 		 */
-		pages_fetched = index_pages_fetched(tuples_fetched * loop_count,
+		max_pages_fetched = index_pages_fetched(tuples_fetched * loop_count,
 			baserel->pages,
 			(double) index->pages,
 			root);
 
 		if (indexonly)
-			pages_fetched = ceil(pages_fetched * (1.0 - baserel->allvisfrac));
+			max_pages_fetched = ceil(max_pages_fetched * (1.0 - baserel->allvisfrac));
 
-		rand_heap_pages = pages_fetched;
+		rand_heap_pages = max_pages_fetched;
 
-		max_IO_cost = (pages_fetched * spc_random_page_cost) / loop_count;
+		max_IO_cost = (max_pages_fetched * spc_random_page_cost) / loop_count;
 
 		/*
 		 * In the perfectly correlated case, the number of pages touched by
@@ -626,17 +628,17 @@ cost_index(IndexPath *path, PlannerInfo *root, double loop_count,
 		 * where such a plan is actually interesting, only one page would get
 		 * fetched per scan anyway, so it shouldn't matter much.)
 		 */
-		pages_fetched = ceil(indexSelectivity * (double) baserel->pages);
+		min_pages_fetched = ceil(indexSelectivity * (double) baserel->pages);
 
-		pages_fetched = index_pages_fetched(pages_fetched * loop_count,
+		min_pages_fetched = index_pages_fetched(min_pages_fetched * loop_count,
 			baserel->pages,
 			(double) index->pages,
 			root);
 
 		if (indexonly)
-			pages_fetched = ceil(pages_fetched * (1.0 - baserel->allvisfrac));
+			min_pages_fetched = ceil(min_pages_fetched * (1.0 - baserel->allvisfrac));
 
-		min_IO_cost = (pages_fetched * spc_random_page_cost) / loop_count;
+		min_IO_cost = (min_pages_fetched * spc_random_page_cost) / loop_count;
 	}
 	else
 	{
@@ -644,30 +646,31 @@ cost_index(IndexPath *path, PlannerInfo *root, double loop_count,
 		 * Normal case: apply the Mackert and Lohman formula, and then
 		 * interpolate between that and the correlation-derived result.
 		 */
-		pages_fetched = index_pages_fetched(tuples_fetched,
+
+		/* For the perfectly uncorrelated case (csquared=0) */
+		max_pages_fetched = index_pages_fetched(tuples_fetched,
 			baserel->pages,
 			(double) index->pages,
 			root);
 
 		if (indexonly)
-			pages_fetched = ceil(pages_fetched * (1.0 - baserel->allvisfrac));
+			max_pages_fetched = ceil(max_pages_

Re: vacuum verbose detail logs are unclear; log at *start* of each stage; show allvisible/frozen/hintbits

2020-01-12 Thread Justin Pryzby
On Sun, Dec 29, 2019 at 01:15:24PM -0500, Jeff Janes wrote:
> On Fri, Dec 20, 2019 at 12:11 PM Justin Pryzby  wrote:
> 
> > This is a usability complaint.  If one knows enough about vacuum and/or
> > logging, I'm sure there's no issue.
> 
> > | 11  DEBUG:  "t": found 999 removable, 999 nonremovable row versions in 9 
> > out of 9 pages
> 
> I agree the mixture of pre-action and after-action reporting is rather
> confusing sometimes.  I'm more concerned about what the user sees in their
> terminal, though, rather than the server's log file.

Sorry, I ran vacuum (not verbose) with client_min_messages=debug, which was 
confusing.

> Also, the above quoted line is confusing.  It makes it sound like it found
> removable items, but didn't actually remove them.  I think that that is
> taking grammatical parallelism too far.  How about something like:
> 
> DEBUG:  "t": removed 999 row versions, found 999 nonremovable row versions in 
> 9 out of 9 pages

Since da4ed8bf, lazy_vacuum_heap() actually says: "removed %d [row versions] in
%d pages".  Strangely, the "found .. removable, .. nonremovable" in
lazy_scan_heap() is also from da4ed8bf.  Should we change them to match ?

> Also, I'd appreciate a report on how many hint-bits were set
> and how many pages were marked all-visible and/or frozen.

Possibly I should fork this part to a different thread, but..
hint bits are being set by heap_prune_chain():

|#0  HeapTupleSatisfiesVacuum (htup=htup@entry=0x7fffabf0, 
OldestXmin=OldestXmin@entry=536, buffer=buffer@entry=167) at 
heapam_visibility.c:1245
|#1  0x7fb6eb3eb848 in heap_prune_chain (prstate=0x7fffabfccf30, 
OldestXmin=536, rootoffnum=1, buffer=167, relation=0x7fb6eb1e6858) at 
pruneheap.c:488
|#2  heap_page_prune (relation=relation@entry=0x7fb6eb1e6858, 
buffer=buffer@entry=167, OldestXmin=536, report_stats=report_stats@entry=false, 
latestRemovedXid=latestRemovedXid@entry=0x7fb6ed84a13c) at pruneheap.c:223
|#3  0x7fb6eb3f02a2 in lazy_scan_heap (aggressive=false, nindexes=0, 
Irel=0x0, vacrelstats=0x7fb6ed84a0c0, params=0x7fffabfcdfd0, 
onerel=0x7fb6eb1e6858) at vacuumlazy.c:970
|#4  heap_vacuum_rel (onerel=0x7fb6eb1e6858, params=0x7fffabfcdfd0, 
bstrategy=) at vacuumlazy.c:302

In the attached, I moved heap_page_prune to avoid a second loop over items.
Then, initdb crashed until I avoided calling heap_prepare_freeze_tuple() for
HEAPTUPLE_DEAD.  I'm not sure whether that's OK, or whether it's exposing an
issue.  I'm also not sure if t_infomask!=oldt_infomask is the right test.
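
For clarity, the test amounts to this sketch (the counter name is made up):

	uint16		oldt_infomask = tuple.t_data->t_infomask;

	(void) HeapTupleSatisfiesVacuum(&tuple, OldestXmin, buf);

	if (tuple.t_data->t_infomask != oldt_infomask)
		nhintbits++;	/* counted and reported as "Wrote N hint bits" */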

One of my usability complaints was that the DETAIL includes newlines, which
makes it not apparent that it's detail, or that it's associated with the
preceding INFO.  Should those all be separate DETAIL messages?  (Currently, only
the first errdetail is used, but maybe they should be concatenated usefully.)
Should errdetail do something with newlines, like change them to \n\t for output
to the client (but not the logfile)?  Should vacuum itself do something (but
probably with no change to logfiles)?

I remembered that log_statement_stats looks like this:

2020-01-01 11:28:33.758 CST [3916] LOG:  EXECUTOR STATISTICS
2020-01-01 11:28:33.758 CST [3916] DETAIL:  ! system usage stats:
!   0.050185 s user, 0.000217 s system, 0.050555 s elapsed
!   [2.292346 s user, 0.215656 s system total]
[...]


It calls errdetail_internal("%s", str.data), same as vacuum, but the multi-line
detail messages are written like this:
|appendStringInfo(, "!\t...")
|...
|ereport(LOG,
|   (errmsg_internal("%s", title),
|   errdetail_internal("%s", str.data)));

Since they can run multiple times, including rusage, and there's not currently
any message shown before their action, I propose that lazy_vacuum_index/heap
should write VACUUM VERBOSE logs at DEBUG level.  Or otherwise show a log
before starting each action, at least those for which it logs completion.

I'm not sure why this one doesn't use ngettext()?  Missed at a8d585c0?
|appendStringInfo(, _("There were %.0f unused item identifiers.\n"),

Or why this one uses _/gettext() ?  (580ddcec suggests that I'm missing
something?).
|appendStringInfo(, _("%s."), pg_rusage_show());

Anyway, now it looks like this:
postgres=# VACUUM VERBOSE t;
INFO:  vacuuming "pg_temp_3.t"
INFO:  "t": removed 1998 row versions in 5 pages
INFO:  "t": removed 1998, found 999 nonremovable row versions in 9 out of 9 
pages
DETAIL:  ! 0 dead row versions cannot be removed yet, oldest xmin: 4505
!   There were 0 unused item identifiers.
!   Skipped 0 pages due to buffer pins, 0 frozen pages.
!   0 pages are entirely empty.
!   Marked 9 pages all visible, 4 pages frozen.
!   Wrote 1998 hint bits.
!   CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
VACUUM

Thanks for your input

Re: [PATCH v1] pg_ls_tmpdir to show directories

2020-01-15 Thread Justin Pryzby
On Wed, Jan 15, 2020 at 11:21:36AM +0100, Fabien COELHO wrote:
> I'm trying to think about how to get rid of the strange structure and hacks,
> and the arbitrary looking size 2 array.
> 
> Also the recursion is one step, but I'm not sure why, ISTM it could/should
> go on always?

Because tmpfiles only go one level deep.

> Looking at the code, ISTM that relying on a stack/list would be much cleaner
> and easier to understand. The code could look like:

I'm willing to change the implementation, but only after there's an agreement
about the desired behavior (extra column, one level, etc).

Justin




Re: [PATCH v1] pg_ls_tmpdir to show directories

2020-01-16 Thread Justin Pryzby
On Thu, Jan 16, 2020 at 09:34:32AM +0100, Fabien COELHO wrote:
> Also, I'm not fully sure why ".*" files should be skipped, maybe it should
> be an option? Or the user can filter it with SQL if it does not want them?

I think if someone wants the full generality, they can do this:

postgres=# SELECT name, s.size, s.modification, s.isdir FROM (SELECT 'base/pgsql_tmp'p)p, pg_ls_dir(p)name, pg_stat_file(p||'/'||name)s;
 name | size |  modification  | isdir 
--+--++---
 .foo | 4096 | 2020-01-16 08:57:04-05 | t

In my mind, pg_ls_tmpdir() is for showing tmpfiles, not just a shortcut to
SELECT pg_ls_dir((SELECT 'base/pgsql_tmp'p)); -- or, for all tablespaces:
WITH x AS (SELECT format('/PG_%s_%s', split_part(current_setting('server_version'), '.', 1), catalog_version_no) suffix FROM pg_control_system()),
     y AS (SELECT a, pg_ls_dir(a) AS d FROM (SELECT DISTINCT COALESCE(NULLIF(pg_tablespace_location(oid),'')||suffix, 'base') a FROM pg_tablespace,x)a)
SELECT a, pg_ls_dir(a||'/pgsql_tmp') FROM y WHERE d='pgsql_tmp';

I think changing the handling of dotfiles is a topic for another patch.
That would also affect pg_ls_dir, and everything else that uses the backing
function pg_ls_dir_files_recurse.  I'd have to ask: why not also show . and .. ?

(In fact, if I were to change anything, I would propose to limit pg_ls_tmpdir()
to files matching PG_TEMP_FILE_PREFIX).
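
That would amount to a one-line filter in the directory-reading loop, something
like this sketch:

	/* show only temporary files, skipping anything else in the directory */
	if (strncmp(de->d_name, PG_TEMP_FILE_PREFIX, strlen(PG_TEMP_FILE_PREFIX)) != 0)
		continue;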

Justin




Re: progress report for ANALYZE

2020-01-16 Thread Justin Pryzby
On Wed, Jan 15, 2020 at 02:11:10PM -0300, Alvaro Herrera wrote:
> I just pushed this after some small extra tweaks.
> 
> Thanks, Yamada-san, for seeing this to completion!

Find attached minor fixes to docs - sorry I didn't look earlier.

Possibly you'd also want to change the other existing instances of "preparing
to begin".
>From de108e69b5d33c881074b0a04697d7061684f823 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Wed, 15 Jan 2020 23:10:29 -0600
Subject: [PATCH v1] Doc review for ANALYZE progress (a166d408)

---
 doc/src/sgml/monitoring.sgml | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 8b44fb1..10871b7 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -3525,7 +3525,7 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
   
Whenever ANALYZE is running, the
pg_stat_progress_analyze view will contain a
-   row for each backend that is currently running that command.  The tables
+   row for each backend currently running ANALYZE.  The tables
below describe the information that will be reported and provide
information about how to interpret it.
   
@@ -3635,7 +3635,7 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
 
  initializing
  
-   The command is preparing to begin scanning the heap.  This phase is
+   The command is preparing to scan the heap.  This phase is
expected to be very brief.
  
 
@@ -3643,7 +3643,7 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
  acquiring sample rows
  
The command is currently scanning the table given by
-   current_relid to obtain sample rows.
+   relid to obtain sample rows.
  
 
 
@@ -3659,14 +3659,14 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
 
  computing statistics
  
-   The command is computing statistics from the samples rows obtained during
+   The command is computing statistics from the sample rows obtained during
the table scan.
  
 
 
  computing extended statistics
  
-   The command is computing extended statistics from the samples rows obtained
+   The command is computing extended statistics from the sample rows obtained
durring the table scan.
  
 
-- 
2.7.4



should crash recovery ignore checkpoint_flush_after ?

2020-01-18 Thread Justin Pryzby
One of our PG12 instances was in crash recovery for an embarrassingly long time
after hitting ENOSPC.  (Note, I first started writing this mail 10 months ago
while running PG11, after having the same experience after OOM).  Running linux.

As I understand it, the first thing that happens is syncing every file in the
data dir, like in initdb --sync.  These instances were both 5+TB on zfs, with
compression, so that's slow, but tolerable, at least understandable, and with
visible progress in ps.

The 2nd stage replays WAL.  strace shows it's occasionally running
sync_file_range, and I think recovery might've been several times faster if
we'd just dumped the data at the OS ASAP, fsync once per file.  In fact, I've
just kill -9 the recovery process and edited the config to disable this lest it
spend all night in recovery.

$ sudo strace -p 12564 2>&1 |sed 33q
Process 12564 attached
sync_file_range(0x21, 0x2bba000, 0xa000, 0x2) = 0
sync_file_range(0xb2, 0x2026000, 0x1a000, 0x2) = 0
clock_gettime(CLOCK_MONOTONIC, {7521130, 31376505}) = 0

(gdb) bt
#0  0x0032b2adfe8a in sync_file_range () from /lib64/libc.so.6
#1  0x007454e2 in pg_flush_data (fd=, 
offset=, nbytes=) at fd.c:437
#2  0x007456b4 in FileWriteback (file=, 
offset=41508864, nbytes=16384, wait_event_info=167772170) at fd.c:1855
#3  0x0073dbac in IssuePendingWritebacks (context=0x7ffed45f8530) at 
bufmgr.c:4381
#4  0x0073f1ff in SyncOneBuffer (buf_id=, 
skip_recently_used=, wb_context=0x7ffed45f8530) at 
bufmgr.c:2409
#5  0x0073f549 in BufferSync (flags=6) at bufmgr.c:1991
#6  0x0073f5d6 in CheckPointBuffers (flags=6) at bufmgr.c:2585
#7  0x0050552c in CheckPointGuts (checkPointRedo=535426125266848, 
flags=6) at xlog.c:9006
#8  0x0050cace in CreateCheckPoint (flags=6) at xlog.c:8795
#9  0x00511740 in StartupXLOG () at xlog.c:7612
#10 0x006faaf1 in StartupProcessMain () at startup.c:207

That GUC is intended to reduce latency spikes caused by checkpoint fsync.  But
I think limiting to the default 256kB between syncs is too restrictive during
recovery, and at that point it's better to optimize for throughput anyway,
since no other backends are running (in that instance) and cannot run until
recovery finishes.  At least, if this setting is going to apply during
recovery, the documentation should mention it (it's a "recovery checkpoint").

See also
4bc0f16 Change default of backend_flush_after GUC to 0 (disabled).
428b1d6 Allow to trigger kernel writeback after a configurable number of writes.




Re: should crash recovery ignore checkpoint_flush_after ?

2020-01-18 Thread Justin Pryzby
On Sat, Jan 18, 2020 at 10:48:22AM -0800, Andres Freund wrote:
> Hi,
> 
> On 2020-01-18 08:08:07 -0600, Justin Pryzby wrote:
> > One of our PG12 instances was in crash recovery for an embarassingly long 
> > time
> > after hitting ENOSPC.  (Note, I first started wroting this mail 10 months 
> > ago
> > while running PG11 after having same experience after OOM).  Running linux.
> > 
> > As I understand, the first thing that happens syncing every file in the data
> > dir, like in initdb --sync.  These instances were both 5+TB on zfs, with
> > compression, so that's slow, but tolerable, and at least understandable, and
> > with visible progress in ps.
> >
> > The 2nd stage replays WAL.  strace show's it's occasionally running
> > sync_file_range, and I think recovery might've been several times faster if
> > we'd just dumped the data at the OS ASAP, fsync once per file.  In fact, 
> > I've
> > just kill -9 the recovery process and edited the config to disable this 
> > lest it
> > spend all night in recovery.
> 
> I'm not quite sure what you mean here with "fsync once per file". The
> sync_file_range doesn't actually issue an fsync, even if sounds like it.

I mean that if we didn't call sync_file_range(), we'd instead let the kernel
handle the writes and then fsync() at the end of the checkpoint, which happens
in any case.  I think I'll increase or maybe disable this GUC on our servers
and, if needed, adjust /proc/sys/vm/dirty_*ratio.

> It's ossible that ZFS's compression just does broken things here, I
> don't know.

Or our settings aren't ideal, or recovery is just going to perform poorly
there.  I'm OK with that, since it should be rare anyway, and recovery is
unlikely to be a big deal for us.

> > At least, if this setting is going to apply during
> > recovery, the documentation should mention it (it's a "recovery checkpoint")
> 
> That makes sense.

Find attached.
I modified a 2nd sentence since "that" was ambiguous, and could be read to
refer to "stalls".

@@ -2994,17 +2994,19 @@ include_dir 'conf.d'
 Whenever more than this amount of data has been
 written while performing a checkpoint, attempt to force the
 OS to issue these writes to the underlying storage.  Doing so will
 limit the amount of dirty data in the kernel's page cache, reducing
 the likelihood of stalls when an fsync is issued at the end of the
 checkpoint, or when the OS writes data back in larger batches in the
-background.  Often that will result in greatly reduced transaction
+background.  This feature will often result in greatly reduced transaction
 latency, but there also are some cases, especially with workloads
 that are bigger than shared_buffers, but smaller
 than the OS's page cache, where performance might degrade.  This
 setting may have no effect on some platforms.
+This setting also applies to the checkpoint written at the end of crash
+recovery.
 If this value is specified without units, it is taken as blocks,
 that is BLCKSZ bytes, typically 8kB.
 The valid range is
 between 0, which disables forced writeback,
 and 2MB.  The default is 256kB on
 Linux, 0 elsewhere.  (If BLCKSZ is not

What about also updating the ps display following the last xlog replayed?
Otherwise it shows "recovering " for the duration of the recovery
checkpoint.

--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7628,3 +7628,6 @@ StartupXLOG(void)
else
+   {
+   set_ps_display("recovery checkpoint", false);
CreateCheckPoint(CHECKPOINT_END_OF_RECOVERY | 
CHECKPOINT_IMMEDIATE);
+   }

> > 4bc0f16 Change default of backend_flush_after GUC to 0 (disabled).
> 
> FWIW, I still think this is the wrong default, and that it causes our
> users harm.

I have no opinion about the default, but the maximum seems low, for a maximum.
Why not INT_MAX, like wal_writer_flush_after?

src/include/pg_config_manual.h:#define WRITEBACK_MAX_PENDING_FLUSHES 256

Thanks,
Justin
>From b1fc1b6746e46e51f506ef0995d3ceed9e7f4132 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Sat, 18 Jan 2020 13:44:08 -0600
Subject: [PATCH v1 1/2] Document that checkpoint_flush_after applies to
 end-of-recovery checkpoint

---
 doc/src/sgml/config.sgml | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index a217514..6568bb1 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2999,12 +2999,14 @@ include_dir 'conf.d'
 checkpoint, or when the OS writes data back in larger batches in the
 background.  Often that will result in greatly reduc

Re: should crash recovery ignore checkpoint_flush_after ?

2020-01-19 Thread Justin Pryzby
On Sat, Jan 18, 2020 at 03:32:02PM -0800, Andres Freund wrote:
> On 2020-01-19 09:52:21 +1300, Thomas Munro wrote:
> > On Sun, Jan 19, 2020 at 3:08 AM Justin Pryzby  wrote:
> > Does sync_file_range() even do anything for non-mmap'd files on ZFS?
> 
> Good point. Next time it might be worthwhile to use strace -T to see
> whether the sync_file_range calls actually take meaningful time.

> Yea, it requires the pages to be in the pagecache to do anything:

>   if (!mapping_cap_writeback_dirty(mapping) ||
>   !mapping_tagged(mapping, PAGECACHE_TAG_DIRTY))
>   return 0;

That logic is actually brand new (Sep 23, 2019, linux 5.4)
https://github.com/torvalds/linux/commit/c3aab9a0bd91b696a852169479b7db1ece6cbf8c#diff-fd2d793b8b4760b4887c8c7bbb3451d7

Running a manual CHECKPOINT, I saw stuff like:

sync_file_range(0x15f, 0x1442c000, 0x2000, 0x2) = 0 <2.953956>
sync_file_range(0x15f, 0x1443, 0x4000, 0x2) = 0 <0.006395>
sync_file_range(0x15f, 0x14436000, 0x4000, 0x2) = 0 <0.003859>
sync_file_range(0x15f, 0x1443e000, 0x2000, 0x2) = 0 <0.027975>
sync_file_range(0x15f, 0x14442000, 0x2000, 0x2) = 0 <0.48>

And actually, that server had been running its DB instance on a centos6 VM
(kernel-2.6.32-754.23.1.el6.x86_64), shared with the appserver, to mitigate
another issue last year.  I moved the DB back to its own centos7 VM
(kernel-3.10.0-862.14.4.el7.x86_64), and I cannot see that anymore.
It seems that if there's any issue (with postgres or otherwise), it's vastly
mitigated or much harder to hit under modern kernels.

I also found these:
https://github.com/torvalds/linux/commit/23d0127096cb91cb6d354bdc71bd88a7bae3a1d5
 (master v5.5-rc6...v4.4-rc1)
https://github.com/torvalds/linux/commit/ee53a891f47444c53318b98dac947ede963db400
 (master v5.5-rc6...v2.6.29-rc1)

The 2nd commit is maybe the cause of the issue.

The first commit is supposedly too new to explain the difference between the
two kernels, but I'm guessing redhat maybe backpatched it into the 3.10 kernel.

Thanks,
Justin




Re: doc: vacuum full, fillfactor, and "extra space"

2020-01-19 Thread Justin Pryzby
Rebased against 40d964ec997f64227bc0ff5e058dc4a5770a70a9
>From b9f10d21de62354d953e388642fcdfc6d97a4a47 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Thu, 26 Dec 2019 18:54:28 -0600
Subject: [PATCH v2] doc: VACUUM FULL: separate paragraph; fillfactor

FILLFACTOR seems to apply here.  Also, "no extra space" was confusing since it
could be read to mean "requires zero extra space", when actually it will always
require extra space, usually approximately the size of the table.
---
 doc/src/sgml/ref/vacuum.sgml | 15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/doc/src/sgml/ref/vacuum.sgml b/doc/src/sgml/ref/vacuum.sgml
index 404febb..ab1b8c2 100644
--- a/doc/src/sgml/ref/vacuum.sgml
+++ b/doc/src/sgml/ref/vacuum.sgml
@@ -79,11 +79,16 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ 

Re: error context for vacuum to include block number

2020-01-19 Thread Justin Pryzby
Rebased against 40d964ec997f64227bc0ff5e058dc4a5770a70a9

I moved some unrelated patches to a separate thread ("vacuum verbose detail 
logs are unclear")
>From 33c7166e3c8f056a8eb6295ec92fed8c85eda7d6 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Mon, 23 Dec 2019 14:38:01 -0600
Subject: [PATCH v9 1/3] dedup2: skip_blocks

---
 src/backend/access/heap/vacuumlazy.c | 187 ---
 1 file changed, 84 insertions(+), 103 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index b331f4c..9849685 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -660,6 +660,88 @@ vacuum_log_cleanup_info(Relation rel, LVRelStats *vacrelstats)
 }
 
 /*
+ * Return whether skipping blocks or not.
+ * Except when aggressive is set, we want to skip pages that are
+ * all-visible according to the visibility map, but only when we can skip
+ * at least SKIP_PAGES_THRESHOLD consecutive pages.  Since we're reading
+ * sequentially, the OS should be doing readahead for us, so there's no
+ * gain in skipping a page now and then; that's likely to disable
+ * readahead and so be counterproductive. Also, skipping even a single
+ * page means that we can't update relfrozenxid, so we only want to do it
+ * if we can skip a goodly number of pages.
+ *
+ * When aggressive is set, we can't skip pages just because they are
+ * all-visible, but we can still skip pages that are all-frozen, since
+ * such pages do not need freezing and do not affect the value that we can
+ * safely set for relfrozenxid or relminmxid.
+ *
+ * Before entering the main loop, establish the invariant that
+ * next_unskippable_block is the next block number >= blkno that we can't
+ * skip based on the visibility map, either all-visible for a regular scan
+ * or all-frozen for an aggressive scan.  We set it to nblocks if there's
+ * no such block.  We also set up the skipping_blocks flag correctly at
+ * this stage.
+ *
+ * Note: The value returned by visibilitymap_get_status could be slightly
+ * out-of-date, since we make this test before reading the corresponding
+ * heap page or locking the buffer.  This is OK.  If we mistakenly think
+ * that the page is all-visible or all-frozen when in fact the flag's just
+ * been cleared, we might fail to vacuum the page.  It's easy to see that
+ * skipping a page when aggressive is not set is not a very big deal; we
+ * might leave some dead tuples lying around, but the next vacuum will
+ * find them.  But even when aggressive *is* set, it's still OK if we miss
+ * a page whose all-frozen marking has just been cleared.  Any new XIDs
+ * just added to that page are necessarily newer than the GlobalXmin we
+ * computed, so they'll have no effect on the value to which we can safely
+ * set relfrozenxid.  A similar argument applies for MXIDs and relminmxid.
+ *
+ * We will scan the table's last page, at least to the extent of
+ * determining whether it has tuples or not, even if it should be skipped
+ * according to the above rules; except when we've already determined that
+ * it's not worth trying to truncate the table.  This avoids having
+ * lazy_truncate_heap() take access-exclusive lock on the table to attempt
+ * a truncation that just fails immediately because there are tuples in
+ * the last page.  This is worth avoiding mainly because such a lock must
+ * be replayed on any hot standby, where it can be disruptive.
+ */
+static bool
+skip_blocks(Relation onerel, VacuumParams *params, BlockNumber *next_unskippable_block, BlockNumber nblocks, Buffer *vmbuffer, bool aggressive)
+{
+	if ((params->options & VACOPT_DISABLE_PAGE_SKIPPING) == 0)
+	{
+		while (*next_unskippable_block < nblocks)
+		{
+			uint8		vmstatus;
+
+			vmstatus = visibilitymap_get_status(onerel, *next_unskippable_block,
+vmbuffer);
+			if (aggressive)
+			{
+if ((vmstatus & VISIBILITYMAP_ALL_FROZEN) == 0)
+	break;
+			}
+			else
+			{
+if ((vmstatus & VISIBILITYMAP_ALL_VISIBLE) == 0)
+	break;
+			}
+			vacuum_delay_point();
+			++*next_unskippable_block;
+		}
+	}
+
+
+	/*
+	 * We know we can't skip the current block.  But set up
+	 * skipping_blocks to do the right thing at the following blocks.
+	 */
+	if (*next_unskippable_block >= SKIP_PAGES_THRESHOLD)
+		return true;
+	else
+		return false;
+}
+
+/*
  *	lazy_scan_heap() -- scan an open heap relation
  *
  *		This routine prunes each page in the heap, which will among other
@@ -794,78 +876,8 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 	initprog_val[2] = dead_tuples->max_tuples;
 	pgstat_progress_update_multi_param(3, initprog_index, initprog_val);
 
-	/*
-	 * Except when aggressive is set, we want to skip pages that are
-	 * all-visible according to the visibility map, but only when we can skip
-	 * at least SKIP_PAGES_THRESHOLD consecutive pages.  Since we're reading
-

Re: doc: vacuum full, fillfactor, and "extra space"

2020-01-14 Thread Justin Pryzby
On Fri, Dec 27, 2019 at 11:58:18AM +0100, Fabien COELHO wrote:
>> I started writing this patch to avoid the possibly-misleading phrase: "with 
>> no
>> extra space" (since it's expected to typically take ~2x space, or 1x "extra"
>> space).
>> 
>> But the original phrase "with no extra space" seems to be wrong anyway, since
>> it actually follows fillfactor, so say that.  Possibly should be backpatched.
> 
> Patch applies and compiles.
> 
> Given that the paragraph begins with "Plain VACUUM (without FULL)", it is
> better to have the VACUUM FULL explanations on a separate paragraph, and the

The original patch does that (Fabien agreed when I asked off list)




Re: error context for vacuum to include block number

2020-01-21 Thread Justin Pryzby
On Tue, Jan 21, 2020 at 05:54:59PM -0300, Alvaro Herrera wrote:
> > On Tue, Jan 21, 2020 at 03:11:35PM +0900, Masahiko Sawada wrote:
> > > Some of them conflicts with the current HEAD(62c9b52231). Please rebase 
> > > them.
> > 
> > Sorry, it's due to other vacuum patch in this branch.
> > 
> > New patches won't conflict, except for the 0005, so I renamed it for cfbot.
> > If it's deemed to be useful, I'll make a separate branch for the others.
> 
> I think you have to have some other patches applied before these,
> because in the context lines for some of the hunks there are
> double-slash comments.

And I knew that, so (re)tested that the extracted patches would apply, but it
looks like maybe the patch checker is less smart (or more strict) than git, so
it didn't work after all.

3rd attempt (sorry for the noise).
>From c16989c79fe331a3280bbbfb2dd9c040948cff53 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Thu, 12 Dec 2019 20:54:37 -0600
Subject: [PATCH v12 1/5] vacuum errcontext to show block being processed

As requested here.
https://www.postgresql.org/message-id/20190807235154.erbmr4o4bo6vgnjv%40alap3.anarazel.de
---
 src/backend/access/heap/vacuumlazy.c | 38 ++--
 1 file changed, 36 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index b331f4c..0c4ec7b 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -287,8 +287,12 @@ typedef struct LVRelStats
 	int			num_index_scans;
 	TransactionId latestRemovedXid;
 	bool		lock_waiter_detected;
-} LVRelStats;
 
+	/* Used by the error callback */
+	char		*relname;
+	char 		*relnamespace;
+	BlockNumber blkno;
+} LVRelStats;
 
 /* A few variables that don't seem worth passing around as parameters */
 static int	elevel = -1;
@@ -358,6 +362,7 @@ static void end_parallel_vacuum(Relation *Irel, IndexBulkDeleteResult **stats,
 LVParallelState *lps, int nindexes);
 static LVSharedIndStats *get_indstats(LVShared *lvshared, int n);
 static bool skip_parallel_vacuum_index(Relation indrel, LVShared *lvshared);
+static void vacuum_error_callback(void *arg);
 
 
 /*
@@ -721,6 +726,7 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 		PROGRESS_VACUUM_MAX_DEAD_TUPLES
 	};
 	int64		initprog_val[3];
+	ErrorContextCallback errcallback;
 
 	pg_rusage_init();
 
@@ -867,6 +873,16 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 	else
 		skipping_blocks = false;
 
+	/* Setup error traceback support for ereport() */
+	vacrelstats->relnamespace = get_namespace_name(RelationGetNamespace(onerel));
+	vacrelstats->relname = relname;
+	vacrelstats->blkno = InvalidBlockNumber; /* Not known yet */
+
+	errcallback.callback = vacuum_error_callback;
+	errcallback.arg = (void *) vacrelstats;
+	errcallback.previous = error_context_stack;
+	error_context_stack = &errcallback;
+
 	for (blkno = 0; blkno < nblocks; blkno++)
 	{
 		Buffer		buf;
@@ -888,6 +904,8 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 #define FORCE_CHECK_PAGE() \
 		(blkno == nblocks - 1 && should_attempt_truncation(params, vacrelstats))
 
+		vacrelstats->blkno = blkno;
+
 		pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, blkno);
 
 		if (blkno == next_unskippable_block)
@@ -985,11 +1003,13 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 			}
 
 			/* Work on all the indexes, then the heap */
+			/* Don't use the errcontext handler outside this function */
+			error_context_stack = errcallback.previous;
 			lazy_vacuum_all_indexes(onerel, Irel, indstats,
 	vacrelstats, lps, nindexes);
-
 			/* Remove tuples from heap */
 			lazy_vacuum_heap(onerel, vacrelstats);
+			error_context_stack = &errcallback;
 
 			/*
 			 * Forget the now-vacuumed tuples, and press on, but be careful
@@ -1594,6 +1614,9 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 			RecordPageWithFreeSpace(onerel, blkno, freespace);
 	}
 
+	/* Pop the error context stack */
+	error_context_stack = errcallback.previous;
+
 	/* report that everything is scanned and vacuumed */
 	pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, blkno);
 
@@ -3372,3 +3395,14 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
 	table_close(onerel, ShareUpdateExclusiveLock);
 	pfree(stats);
 }
+
+/*
+ * Error context callback for errors occurring during vacuum.
+ */
+static void
+vacuum_error_callback(void *arg)
+{
+	LVRelStats *cbarg = arg;
+	errcontext("while scanning block %u of relation \"%s.%s\"",
+			cbarg->blkno, cbarg->relnamespace, cbarg->relname);
+}
-- 
2.7.4

>From 53290bc3f70b9896a0c581fb0ed7eb235840890c Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Thu, 12 Dec 2019 20:34:03 -0600
Subject: [PATCH v12 

Re: doc: alter table references bogus table-specific planner parameters

2020-01-21 Thread Justin Pryzby
On Mon, Jan 06, 2020 at 04:33:46AM +, Simon Riggs wrote:
> On Mon, 6 Jan 2020 at 04:13, Justin Pryzby  wrote:
> > > I agree with the sentiment of the third doc change, but your patch removes
> > > the mention of n_distinct, which isn't appropriate.
> >
> > I think it's correct to remove n_distinct there, as it's documented 
> > previously,
> > since e5550d5f.  That's a per-attribute option (not storage) and can't be
> > specified there.
> 
> OK, then agreed.

Attached minimal patch with just this hunk.

https://commitfest.postgresql.org/27/2417/
=> RFC

Justin

(I'm resending in a new thread since it looks like the first message was
somehow sent as a reply to an unrelated thread.)
>From 23873bbf32740b0f78f2102eb615e6a6aa615b8c Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Sun, 5 Jan 2020 19:39:29 -0600
Subject: [PATCH v2] [doc] alter table references bogus table-specific planner
 parameters

https://commitfest.postgresql.org/27/2417/
Fixes for commit 6f3a13ff
Should backpatch to v10.
---
 doc/src/sgml/ref/alter_table.sgml | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index bae6e6a..67eddfb 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -714,9 +714,7 @@ WITH ( MODULUS numeric_literal, REM
  
   SHARE UPDATE EXCLUSIVE lock will be taken for
   fillfactor, toast and autovacuum storage parameters, as well as the
-  following planner related parameters:
-  effective_io_concurrency, parallel_workers, seq_page_cost,
-  random_page_cost, n_distinct and n_distinct_inherited.
+  parallel_workers planner parameter.
  
 

-- 
2.7.4



Re: error context for vacuum to include block number

2020-01-22 Thread Justin Pryzby
On Mon, Jan 20, 2020 at 11:11:20AM -0800, Andres Freund wrote:
> > @@ -966,8 +986,11 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, 
> > LVRelStats *vacrelstats,
> > /* Work on all the indexes, then the heap */
> > +   /* Don't use the errcontext handler outside this 
> > function */
> > +   error_context_stack = errcallback.previous;
> > lazy_vacuum_all_indexes(onerel, Irel, indstats,
> > 
> > vacrelstats, lps, nindexes);
> > +   error_context_stack = 
> 
> Alternatively we could push another context for each index inside
> lazy_vacuum_all_indexes(). There's been plenty bugs in indexes
> triggering problems, so that could be worthwhile.

Is the callback for index vacuum useful without a block number?

FYI, I have another patch which would add DEBUG output before each stage, which
would be just as much information, and without needing to use a callback.
It's 0004 here:

https://www.postgresql.org/message-id/20200121134934.GY26045%40telsasoft.com
@@ -1752,9 +1753,12 @@ lazy_vacuum_all_indexes(Relation onerel, Relation *Irel,
{
int idx;

-   for (idx = 0; idx < nindexes; idx++)
+   for (idx = 0; idx < nindexes; idx++) {
+   ereport(DEBUG1, (errmsg("\"%s\": vacuuming index",
+   RelationGetRelationName(Irel[idx]))));
lazy_vacuum_index(Irel[idx], &stats[idx],
vacrelstats->dead_tuples,





Re: error context for vacuum to include block number

2020-01-21 Thread Justin Pryzby
On Tue, Jan 21, 2020 at 03:11:35PM +0900, Masahiko Sawada wrote:
> Some of them conflicts with the current HEAD(62c9b52231). Please rebase them.

Sorry, that's due to another vacuum patch in this branch.

New patches won't conflict, except for the 0005, so I renamed it for cfbot.
If it's deemed to be useful, I'll make a separate branch for the others.
>From 9a28281a8f9c82634b263ce9b66b6c0cdfd01b2d Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Thu, 12 Dec 2019 20:54:37 -0600
Subject: [PATCH v11 1/6] vacuum errcontext to show block being processed

As requested here.
https://www.postgresql.org/message-id/20190807235154.erbmr4o4bo6vgnjv%40alap3.anarazel.de
---
 src/backend/access/heap/vacuumlazy.c | 38 ++--
 1 file changed, 36 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 1dc5294..985293b 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -290,8 +290,12 @@ typedef struct LVRelStats
 	int			num_index_scans;
 	TransactionId latestRemovedXid;
 	bool		lock_waiter_detected;
-} LVRelStats;
 
+	/* Used by the error callback */
+	char		*relname;
+	char 		*relnamespace;
+	BlockNumber blkno;
+} LVRelStats;
 
 /* A few variables that don't seem worth passing around as parameters */
 static int	elevel = -1;
@@ -361,6 +365,7 @@ static void end_parallel_vacuum(Relation *Irel, IndexBulkDeleteResult **stats,
 LVParallelState *lps, int nindexes);
 static LVSharedIndStats *get_indstats(LVShared *lvshared, int n);
 static bool skip_parallel_vacuum_index(Relation indrel, LVShared *lvshared);
+static void vacuum_error_callback(void *arg);
 
 
 /*
@@ -813,6 +818,7 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 		PROGRESS_VACUUM_MAX_DEAD_TUPLES
 	};
 	int64		initprog_val[3];
+	ErrorContextCallback errcallback;
 
 	pg_rusage_init(&ru0);
 
@@ -892,6 +898,16 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 	next_unskippable_block = 0;
 	skipping_blocks = skip_blocks(onerel, params, &next_unskippable_block, nblocks, &vmbuffer, aggressive);
 
+	/* Setup error traceback support for ereport() */
+	vacrelstats->relnamespace = get_namespace_name(RelationGetNamespace(onerel));
+	vacrelstats->relname = relname;
+	vacrelstats->blkno = InvalidBlockNumber; /* Not known yet */
+
+	errcallback.callback = vacuum_error_callback;
+	errcallback.arg = (void *) vacrelstats;
+	errcallback.previous = error_context_stack;
+	error_context_stack = &errcallback;
+
 	for (blkno = 0; blkno < nblocks; blkno++)
 	{
 		Buffer		buf;
@@ -913,6 +929,8 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 #define FORCE_CHECK_PAGE() \
 		(blkno == nblocks - 1 && should_attempt_truncation(params, vacrelstats))
 
+		vacrelstats->blkno = blkno;
+
 		pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, blkno);
 
 		if (blkno == next_unskippable_block)
@@ -979,11 +997,13 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 			}
 
 			/* Work on all the indexes, then the heap */
+			/* Don't use the errcontext handler outside this function */
+			error_context_stack = errcallback.previous;
 			lazy_vacuum_all_indexes(onerel, Irel, indstats,
 	vacrelstats, lps, nindexes);
-
 			/* Remove tuples from heap */
 			lazy_vacuum_heap(onerel, vacrelstats);
+			error_context_stack = &errcallback;
 
 			/*
 			 * Forget the now-vacuumed tuples, and press on, but be careful
@@ -1602,6 +1622,9 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 			RecordPageWithFreeSpace(onerel, blkno, freespace);
 	}
 
+	/* Pop the error context stack */
+	error_context_stack = errcallback.previous;
+
 	/* report that everything is scanned and vacuumed */
 	pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, blkno);
 
@@ -3416,3 +3439,14 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
 	table_close(onerel, ShareUpdateExclusiveLock);
 	pfree(stats);
 }
+
+/*
+ * Error context callback for errors occurring during vacuum.
+ */
+static void
+vacuum_error_callback(void *arg)
+{
+	LVRelStats *cbarg = arg;
+	errcontext("while scanning block %u of relation \"%s.%s\"",
+			cbarg->blkno, cbarg->relnamespace, cbarg->relname);
+}
-- 
2.7.4

>From c7d0859e787a6d5ca2af2758163eb281eacc6d83 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Thu, 12 Dec 2019 20:34:03 -0600
Subject: [PATCH v11 2/6] add errcontext callback in lazy_vacuum_heap, too

---
 src/backend/access/heap/vacuumlazy.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 985293b..a05c38a 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1826,6 +1826,7 @@ lazy_vacuum_heap(Relation onerel, LVRelStats *vacrelstats)
 	int

Re: error context for vacuum to include block number

2020-01-20 Thread Justin Pryzby
On Mon, Jan 20, 2020 at 11:11:20AM -0800, Andres Freund wrote:
> This I do not get. I didn't yet fully wake up, so I might just be slow?

It was needlessly cute at the cost of clarity (meant to avoid setting
error_context_stack in lazy_scan_heap and again immediately on its return).

On Mon, Jan 20, 2020 at 11:13:05AM -0800, Andres Freund wrote:
> I was thinking that you could just use LVRelStats.

Done.

On Mon, Jan 20, 2020 at 11:11:20AM -0800, Andres Freund wrote:
> Alternatively we could push another context for each index inside
> lazy_vacuum_all_indexes(). There's been plenty bugs in indexes
> triggering problems, so that could be worthwhile.

Did this too, although I'm not sure what kind of errors it'd find (?)
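
For reference, roughly the shape of that per-index callback (a sketch only --
the function name here is made up; the relnamespace/relname fields are the ones
added to LVRelStats in the 0001 patch below):

static void
vacuum_index_error_callback(void *arg)
{
	LVRelStats *cbarg = arg;

	/* matches the CONTEXT lines shown in the test output below */
	errcontext("while vacuuming relation \"%s.%s\"",
			   cbarg->relnamespace, cbarg->relname);
}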

I considered eliminating other uses of RelationGetRelationName, or looping over
vacrelstats->blkno instead of a local blkno.  I did that in an additional patch
(which will cause conflicts if you try to apply it, due to another vacuum patch in
this branch).

CREATE TABLE t AS SELECT generate_series(1,9)a;

postgres=# SET client_min_messages=debug;SET statement_timeout=39; VACUUM 
(VERBOSE, PARALLEL 0) t;
INFO:  vacuuming "public.t"
2020-01-20 15:46:14.993 CST [20056] ERROR:  canceling statement due to 
statement timeout
2020-01-20 15:46:14.993 CST [20056] CONTEXT:  while scanning block 211 of 
relation "public.t"
2020-01-20 15:46:14.993 CST [20056] STATEMENT:  VACUUM (VERBOSE, PARALLEL 0) t;
ERROR:  canceling statement due to statement timeout
CONTEXT:  while scanning block 211 of relation "public.t"

SELECT 'CREATE INDEX ON t(a)' FROM generate_series(1,11);\gexec
UPDATE t SET a=a+1;

postgres=# SET client_min_messages=debug;SET statement_timeout=99; VACUUM 
(VERBOSE, PARALLEL 0) t;
INFO:  vacuuming "public.t"
DEBUG:  "t_a_idx": vacuuming index
2020-01-20 15:47:36.338 CST [20139] ERROR:  canceling statement due to 
statement timeout
2020-01-20 15:47:36.338 CST [20139] CONTEXT:  while vacuuming relation 
"public.t_a_idx"
2020-01-20 15:47:36.338 CST [20139] STATEMENT:  VACUUM (VERBOSE, PARALLEL 0) t;
ERROR:  canceling statement due to statement timeout
CONTEXT:  while vacuuming relation "public.t_a_idx"

I haven't found a good way of exercising the "vacuuming heap" path, though.
>From 44f5eaaef66c570395a9af2bdbe74943c9163c4d Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Thu, 12 Dec 2019 20:54:37 -0600
Subject: [PATCH v10 1/5] vacuum errcontext to show block being processed

As requested here.
https://www.postgresql.org/message-id/20190807235154.erbmr4o4bo6vgnjv%40alap3.anarazel.de
---
 src/backend/access/heap/vacuumlazy.c | 38 ++--
 1 file changed, 36 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index bd2e7fb..2eb3caa 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -290,8 +290,12 @@ typedef struct LVRelStats
 	int			num_index_scans;
 	TransactionId latestRemovedXid;
 	bool		lock_waiter_detected;
-} LVRelStats;
 
+	/* Used by the error callback */
+	char		*relname;
+	char 		*relnamespace;
+	BlockNumber blkno;
+} LVRelStats;
 
 /* A few variables that don't seem worth passing around as parameters */
 static int	elevel = -1;
@@ -361,6 +365,7 @@ static void end_parallel_vacuum(Relation *Irel, IndexBulkDeleteResult **stats,
 LVParallelState *lps, int nindexes);
 static LVSharedIndStats *get_indstats(LVShared *lvshared, int n);
 static bool skip_parallel_vacuum_index(Relation indrel, LVShared *lvshared);
+static void vacuum_error_callback(void *arg);
 
 
 /*
@@ -813,6 +818,7 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 		PROGRESS_VACUUM_MAX_DEAD_TUPLES
 	};
 	int64		initprog_val[3];
+	ErrorContextCallback errcallback;
 
 	pg_rusage_init(&ru0);
 
@@ -892,6 +898,16 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 	next_unskippable_block = 0;
 	skipping_blocks = skip_blocks(onerel, params, &next_unskippable_block, nblocks, &vmbuffer, aggressive);
 
+	/* Setup error traceback support for ereport() */
+	vacrelstats->relnamespace = get_namespace_name(RelationGetNamespace(onerel));
+	vacrelstats->relname = relname;
+	vacrelstats->blkno = InvalidBlockNumber; /* Not known yet */
+
+	errcallback.callback = vacuum_error_callback;
+	errcallback.arg = (void *) vacrelstats;
+	errcallback.previous = error_context_stack;
+	error_context_stack = &errcallback;
+
 	for (blkno = 0; blkno < nblocks; blkno++)
 	{
 		Buffer		buf;
@@ -913,6 +929,8 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 #define FORCE_CHECK_PAGE() \
 		(blkno == nblocks - 1 && should_attempt_truncation(params, vacrelstats))
 
+		vacrelstats->blkno = blkno;
+
 		pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, blkno);
 
 		if (blkno == next_unskippable_block)
@@ -979,11 +997

avoid some calls to memset with array initializer

2020-01-02 Thread Justin Pryzby
Is there any appetite for using an array initializer rather than memset, as in the
attached?  So far, I only looked for "memset.*null", and I can't see that any
of these are hot paths, but it saves a cycle or two and a line of code for each.
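
For illustration (standalone C, not taken from the patch), the two forms being
compared are:

#include <stdbool.h>
#include <stdio.h>
#include <string.h>

int
main(void)
{
	/* existing pattern: declare, then clear with memset */
	bool		nulls_a[3];

	memset(nulls_a, 0, sizeof(nulls_a));

	/* proposed pattern: zero-initialize at the declaration */
	bool		nulls_b[3] = {0};

	printf("%d %d\n", (int) nulls_a[0], (int) nulls_b[0]);	/* both print 0 */
	return 0;
}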

gcc 4.9.2 with -O2 emits smaller code with array initializer than with inlined
call to memset.

$ wc -l contrib/pageinspect/heapfuncs.S? 
 22159 contrib/pageinspect/heapfuncs.S0
 22011 contrib/pageinspect/heapfuncs.S1

Also true of gcc 5.4.  And 7.3:

 25294 contrib/pageinspect/heapfuncs.S0
 25234 contrib/pageinspect/heapfuncs.S1
>From 5117e66043b6c8c66c2f98fcd99fdaefec66f90e Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Fri, 27 Dec 2019 17:30:36 -0600
Subject: [PATCH v1 1/2] Avoid some calls to memset..

..in cases where that saves a couple lines of code.
Note that gcc has a builtin for memset, but an inlined call is still not the same as
initializing to zero.

Initializing should probably be faster, since the local storage can be zeroed by the
compiler during stack setup, and it could possibly allow for
additional optimization, too.
---
 contrib/pageinspect/ginfuncs.c | 12 +++-
 contrib/pageinspect/heapfuncs.c|  4 +---
 contrib/pageinspect/rawpage.c  |  4 +---
 contrib/pgstattuple/pgstatapprox.c |  4 +---
 src/backend/catalog/pg_collation.c |  4 +---
 src/backend/catalog/pg_db_role_setting.c   |  4 +---
 src/backend/catalog/pg_depend.c|  4 +---
 src/backend/catalog/pg_enum.c  |  6 ++
 src/backend/catalog/pg_inherits.c  |  4 +---
 src/backend/catalog/pg_range.c |  4 +---
 src/backend/catalog/pg_shdepend.c  |  8 ++--
 src/backend/commands/event_trigger.c   |  3 +--
 src/backend/commands/indexcmds.c   |  3 +--
 src/backend/commands/seclabel.c| 12 
 src/backend/commands/sequence.c| 12 +++-
 src/backend/commands/trigger.c |  4 +---
 src/backend/commands/tsearchcmds.c |  3 +--
 src/backend/replication/logical/logicalfuncs.c |  3 +--
 src/backend/replication/slotfuncs.c|  8 ++--
 src/backend/replication/walsender.c|  3 +--
 src/backend/statistics/mcv.c   |  5 +
 src/backend/utils/adt/genfile.c|  4 +---
 22 files changed, 32 insertions(+), 86 deletions(-)

diff --git a/contrib/pageinspect/ginfuncs.c b/contrib/pageinspect/ginfuncs.c
index 4b623fb..d9590bd 100644
--- a/contrib/pageinspect/ginfuncs.c
+++ b/contrib/pageinspect/ginfuncs.c
@@ -40,7 +40,7 @@ gin_metapage_info(PG_FUNCTION_ARGS)
 	GinMetaPageData *metadata;
 	HeapTuple	resultTuple;
 	Datum		values[10];
-	bool		nulls[10];
+	bool		nulls[10] = {0,};
 
 	if (!superuser())
 		ereport(ERROR,
@@ -63,8 +63,6 @@ gin_metapage_info(PG_FUNCTION_ARGS)
 
 	metadata = GinPageGetMeta(page);
 
-	memset(nulls, 0, sizeof(nulls));
-
 	values[0] = Int64GetDatum(metadata->head);
 	values[1] = Int64GetDatum(metadata->tail);
 	values[2] = Int32GetDatum(metadata->tailFreeSize);
@@ -95,7 +93,7 @@ gin_page_opaque_info(PG_FUNCTION_ARGS)
 	GinPageOpaque opaq;
 	HeapTuple	resultTuple;
 	Datum		values[3];
-	bool		nulls[3];
+	bool		nulls[3] = {0,};
 	Datum		flags[16];
 	int			nflags = 0;
 	uint16		flagbits;
@@ -139,8 +137,6 @@ gin_page_opaque_info(PG_FUNCTION_ARGS)
 		flags[nflags++] = DirectFunctionCall1(to_hex32, Int32GetDatum(flagbits));
 	}
 
-	memset(nulls, 0, sizeof(nulls));
-
 	values[0] = Int64GetDatum(opaq->rightlink);
 	values[1] = Int32GetDatum(opaq->maxoff);
 	values[2] = PointerGetDatum(construct_array(flags, nflags,
@@ -227,14 +223,12 @@ gin_leafpage_items(PG_FUNCTION_ARGS)
 		HeapTuple	resultTuple;
 		Datum		result;
 		Datum		values[3];
-		bool		nulls[3];
+		bool		nulls[3] = {0,};
 		int			ndecoded,
 	i;
 		ItemPointer tids;
 		Datum	   *tids_datum;
 
-		memset(nulls, 0, sizeof(nulls));
-
 		values[0] = ItemPointerGetDatum(>first);
 		values[1] = UInt16GetDatum(cur->nbytes);
 
diff --git a/contrib/pageinspect/heapfuncs.c b/contrib/pageinspect/heapfuncs.c
index aa7e4b9..5b8ddf3 100644
--- a/contrib/pageinspect/heapfuncs.c
+++ b/contrib/pageinspect/heapfuncs.c
@@ -179,13 +179,11 @@ heap_page_items(PG_FUNCTION_ARGS)
 		Datum		result;
 		ItemId		id;
 		Datum		values[14];
-		bool		nulls[14];
+		bool		nulls[14] = {0,};
 		uint16		lp_offset;
 		uint16		lp_flags;
 		uint16		lp_len;
 
-		memset(nulls, 0, sizeof(nulls));
-
 		/* Extract information from the line pointer */
 
 		id = PageGetItemId(page, inter_call_data->offset);
diff --git a/contrib/pageinspect/rawpage.c b/contrib/pageinspect/rawpage.c
index a7b0d17..0901429 100644
--- a/contrib/pageinspect/rawpage.c
+++ b/contrib/pageinspect/rawpage.c
@@ -225,7 +225,7 @@ page_header(PG_FUNCTION_ARGS)
 	Datum		result;
 	HeapTuple	tuple;
 	Datum		values[9];
-	bool		nulls[9];
+	bool		nulls[9] = {0,};
 
 	PageHeader	page;
 	XLogRecPtr

infinite histogram bounds and nan (Re: comment regarding double timestamps; and, infinite timestamps and NaN)

2020-01-02 Thread Justin Pryzby
On Mon, Dec 30, 2019 at 02:18:17PM -0500, Tom Lane wrote:
> > On v12, my test gives:
> > |DROP TABLE t; CREATE TABLE t(t) AS SELECT generate_series(now(), now()+'1 
> > day', '5 minutes');
> > |INSERT INTO t VALUES('-infinity');
> > |ALTER TABLE t ALTER t SET STATISTICS 1; ANALYZE t;
> > |explain analyze SELECT * FROM t WHERE t>='2010-12-29';
> > | Seq Scan on t  (cost=0.00..5.62 rows=3 width=8) (actual time=0.012..0.042 
> > rows=289 loops=1)
> 
> This is what it should do.  There's only one histogram bucket, and
> it extends down to -infinity, so the conclusion is going to be that
> the WHERE clause excludes all but a small part of the bucket.  This
> is the correct answer based on the available stats; the problem is
> not with the calculation, but with the miserable granularity of the
> available stats.
> 
> > vs patched master:
> > |DROP TABLE t; CREATE TABLE t(t) AS SELECT generate_series(now(), now()+'1 
> > day', '5 minutes');
> > |INSERT INTO t VALUES('-infinity');
> > |ALTER TABLE t ALTER t SET STATISTICS 1; ANALYZE t;
> > |explain analyze SELECT * FROM t WHERE t>='2010-12-29';
> > | Seq Scan on t  (cost=0.00..5.62 rows=146 width=8) (actual 
> > time=0.048..0.444 rows=289 loops=1)
> 
> This answer is simply broken.  You've caused it to estimate half
> of the bucket, which is an insane estimate for the given bucket
> boundaries and WHERE constraint.
> 
> > IMO 146 rows is a reasonable estimate given a single histogram bucket of
> > infinite width,
> 
> No, it isn't.

When using floats, v12 also returns half the histogram:

 DROP TABLE t; CREATE TABLE t(t) AS SELECT generate_series(0, 99, 1)::float;
 INSERT INTO t VALUES('-Infinity');
 ALTER TABLE t ALTER t SET STATISTICS 1; ANALYZE t;
 explain analyze SELECT * FROM t WHERE t>='50';
 Seq Scan on t  (cost=0.00..2.26 rows=51 width=8) (actual time=0.014..0.020 
rows=50 loops=1)

I'm fine if the isnan() logic changes, but the comment indicates it's intended
to be hit for an infinite histogram bound, and that doesn't work for timestamps
(convert_to_scalar() should return (double) INFINITY and not
(double) INT64_MIN/MAX).

On Mon, Dec 30, 2019 at 02:18:17PM -0500, Tom Lane wrote:
> Justin Pryzby  writes:
> > On Mon, Dec 30, 2019 at 09:05:24AM -0500, Tom Lane wrote:
> >> Uh, what?  This seems completely wrong to me.  We could possibly
> >> promote DT_NOBEGIN and DT_NOEND to +/- infinity (not NaN), but
> >> I don't really see the point.  They'll compare to other timestamp
> >> values correctly without that, cf timestamp_cmp_internal().
> >> The example you give seems to me to be working sanely, or at least
> >> as sanely as it can given the number of histogram points available,
> >> with the existing code.  In any case, shoving NaNs into the
> >> computation is not going to make anything better.
> 
> > As I see it, the problem is that the existing code tests for isnan(), but
> > infinite timestamps are PG_INT64_MIN/MAX (here, stored in a double), so 
> > there's
> > absurdly large values being used as if they were isnormal().
> 
> I still say that (1) you're confusing NaN with Infinity, and (2)
> you haven't actually shown that there's a problem to fix.
> These endpoint values are *not* NaNs.

I probably did confuse the two while trying to make the behavior match the comment
for timestamps.
The Subject says NAN since isnan(binfrac) is what's supposed to be hit for that
case.

The NAN is intended to come from:

|binfrac = (val - low) / (high - low);

which is some variation of -inf / inf.
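
A standalone illustration of where that NaN comes from (plain C, not PostgreSQL
code):

#include <math.h>
#include <stdio.h>

int
main(void)
{
	double		low = -INFINITY;	/* histogram bound from the '-infinity' row */
	double		high = 1000.0;		/* some finite upper bound */
	double		val = 42.0;

	/* same shape as: binfrac = (val - low) / (high - low) */
	double		binfrac = (val - low) / (high - low);	/* inf / inf => NaN */

	printf("binfrac = %f, isnan = %d\n", binfrac, isnan(binfrac));
	return 0;
}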

Justin




Re: error context for vacuum to include block number

2020-01-02 Thread Justin Pryzby
On Thu, Dec 26, 2019 at 09:57:04AM -0600, Justin Pryzby wrote:
> So rebasified against your patch.

Rebased against your patch in master this time.
>From dadb8dff6ea929d78f3695f606de9ade7674b7a1 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Wed, 27 Nov 2019 20:07:10 -0600
Subject: [PATCH v8 1/5] Rename buf to avoid shadowing buf of another type

---
 src/backend/access/heap/vacuumlazy.c | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index a5fe904..de8a89f 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -520,7 +520,7 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 	BlockNumber next_unskippable_block;
 	bool		skipping_blocks;
 	xl_heap_freeze_tuple *frozen;
-	StringInfoData buf;
+	StringInfoData sbuf;
 	const int	initprog_index[] = {
 		PROGRESS_VACUUM_PHASE,
 		PROGRESS_VACUUM_TOTAL_HEAP_BLKS,
@@ -1435,33 +1435,33 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 	 * This is pretty messy, but we split it up so that we can skip emitting
 	 * individual parts of the message when not applicable.
 	 */
-	initStringInfo(&buf);
-	appendStringInfo(&buf,
+	initStringInfo(&sbuf);
+	appendStringInfo(&sbuf,
 	 _("%.0f dead row versions cannot be removed yet, oldest xmin: %u\n"),
 	 nkeep, OldestXmin);
-	appendStringInfo(&buf, _("There were %.0f unused item identifiers.\n"),
+	appendStringInfo(&sbuf, _("There were %.0f unused item identifiers.\n"),
 	 nunused);
-	appendStringInfo(&buf, ngettext("Skipped %u page due to buffer pins, ",
+	appendStringInfo(&sbuf, ngettext("Skipped %u page due to buffer pins, ",
 	"Skipped %u pages due to buffer pins, ",
 	vacrelstats->pinskipped_pages),
 	 vacrelstats->pinskipped_pages);
-	appendStringInfo(&buf, ngettext("%u frozen page.\n",
+	appendStringInfo(&sbuf, ngettext("%u frozen page.\n",
 	"%u frozen pages.\n",
 	vacrelstats->frozenskipped_pages),
 	 vacrelstats->frozenskipped_pages);
-	appendStringInfo(&buf, ngettext("%u page is entirely empty.\n",
+	appendStringInfo(&sbuf, ngettext("%u page is entirely empty.\n",
 	"%u pages are entirely empty.\n",
 	empty_pages),
 	 empty_pages);
-	appendStringInfo(&buf, _("%s."), pg_rusage_show(&ru0));
+	appendStringInfo(&sbuf, _("%s."), pg_rusage_show(&ru0));
 
 	ereport(elevel,
 			(errmsg("\"%s\": found %.0f removable, %.0f nonremovable row versions in %u out of %u pages",
 	RelationGetRelationName(onerel),
 	tups_vacuumed, num_tuples,
 	vacrelstats->scanned_pages, nblocks),
-			 errdetail_internal("%s", buf.data)));
-	pfree(buf.data);
+			 errdetail_internal("%s", sbuf.data)));
+	pfree(sbuf.data);
 }
 
 /*
-- 
2.7.4

>From a2b34f259174137ae9f616d9a8e8a57d5b35c4f4 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Mon, 23 Dec 2019 14:38:01 -0600
Subject: [PATCH v8 2/5] dedup2: skip_blocks

---
 src/backend/access/heap/vacuumlazy.c | 187 ---
 1 file changed, 84 insertions(+), 103 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index de8a89f..b94e052 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -480,6 +480,88 @@ vacuum_log_cleanup_info(Relation rel, LVRelStats *vacrelstats)
 }
 
 /*
+ * Return whether skipping blocks or not.
+ * Except when aggressive is set, we want to skip pages that are
+ * all-visible according to the visibility map, but only when we can skip
+ * at least SKIP_PAGES_THRESHOLD consecutive pages.  Since we're reading
+ * sequentially, the OS should be doing readahead for us, so there's no
+ * gain in skipping a page now and then; that's likely to disable
+ * readahead and so be counterproductive. Also, skipping even a single
+ * page means that we can't update relfrozenxid, so we only want to do it
+ * if we can skip a goodly number of pages.
+ *
+ * When aggressive is set, we can't skip pages just because they are
+ * all-visible, but we can still skip pages that are all-frozen, since
+ * such pages do not need freezing and do not affect the value that we can
+ * safely set for relfrozenxid or relminmxid.
+ *
+ * Before entering the main loop, establish the invariant that
+ * next_unskippable_block is the next block number >= blkno that we can't
+ * skip based on the visibility map, either all-visible for a regular scan
+ * or all-frozen for an aggressive scan.  We set it to nblocks if there's
+ * no such block.  We also set up the skipping_blocks flag correctly at
+ * this stage.
+ *
+ * Note: The value returned by visibilitymap_get_status could be slightly
+ * out-of-date, since we make this test before reading the corresponding

explain HashAggregate to report bucket and memory stats

2020-01-03 Thread Justin Pryzby
On Sun, Feb 17, 2019 at 11:29:56AM -0500, Jeff Janes wrote:
https://www.postgresql.org/message-id/CAMkU%3D1zBJNVo2DGYBgLJqpu8fyjCE_ys%2Bmsr6pOEoiwA7y5jrA%40mail.gmail.com
> What would I find very useful is [...] if the HashAggregate node under
> "explain analyze" would report memory and bucket stats; and if the Aggregate
> node would report...anything.

Find attached my WIP attempt to implement this.

Jeff: can you suggest what details Aggregate should show ?

Justin
>From 5d0afe5d92649f575d9b09ae19b31d2bfd5bfd12 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Wed, 1 Jan 2020 13:09:33 -0600
Subject: [PATCH v1 1/2] refactor: show_hinstrument and avoid showing memory
 use if not verbose..

This changes explain analyze at least for Hash (join), but doesn't affect
regression tests, since they all run explain without analyze, so
nbatch=0 and no stats are shown.

But for a future patch to show stats for HashAgg (for which nbatch=1, always), we
want to show buckets in explain analyze, but we don't want to show memory, which
is machine-specific.
---
 src/backend/commands/explain.c | 73 ++
 1 file changed, 45 insertions(+), 28 deletions(-)

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 497a3bd..d5eaf15 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -101,6 +101,7 @@ static void show_sortorder_options(StringInfo buf, Node *sortexpr,
 static void show_tablesample(TableSampleClause *tsc, PlanState *planstate,
 			 List *ancestors, ExplainState *es);
 static void show_sort_info(SortState *sortstate, ExplainState *es);
+static void show_hinstrument(ExplainState *es, HashInstrumentation *h);
 static void show_hash_info(HashState *hashstate, ExplainState *es);
 static void show_tidbitmap_info(BitmapHeapScanState *planstate,
 ExplainState *es);
@@ -2702,43 +2703,59 @@ show_hash_info(HashState *hashstate, ExplainState *es)
 		}
 	}
 
-	if (hinstrument.nbatch > 0)
-	{
-		long		spacePeakKb = (hinstrument.space_peak + 1023) / 1024;
+	show_hinstrument(es, &hinstrument);
+}
 
-		if (es->format != EXPLAIN_FORMAT_TEXT)
-		{
-			ExplainPropertyInteger("Hash Buckets", NULL,
-   hinstrument.nbuckets, es);
-			ExplainPropertyInteger("Original Hash Buckets", NULL,
-   hinstrument.nbuckets_original, es);
-			ExplainPropertyInteger("Hash Batches", NULL,
-   hinstrument.nbatch, es);
-			ExplainPropertyInteger("Original Hash Batches", NULL,
-   hinstrument.nbatch_original, es);
-			ExplainPropertyInteger("Peak Memory Usage", "kB",
-   spacePeakKb, es);
-		}
-		else if (hinstrument.nbatch_original != hinstrument.nbatch ||
- hinstrument.nbuckets_original != hinstrument.nbuckets)
-		{
+/*
+ * Show hash bucket stats and (optionally) memory.
+ */
+static void
+show_hinstrument(ExplainState *es, HashInstrumentation *h)
+{
+	long		spacePeakKb = (h->space_peak + 1023) / 1024;
+
+	if (h->nbatch <= 0)
+		return;
+
+	if (es->format != EXPLAIN_FORMAT_TEXT)
+	{
+		ExplainPropertyInteger("Hash Buckets", NULL,
+			   h->nbuckets, es);
+		ExplainPropertyInteger("Original Hash Buckets", NULL,
+			   h->nbuckets_original, es);
+		ExplainPropertyInteger("Hash Batches", NULL,
+			   h->nbatch, es);
+		ExplainPropertyInteger("Original Hash Batches", NULL,
+			   h->nbatch_original, es);
+		ExplainPropertyInteger("Peak Memory Usage", "kB",
+			   spacePeakKb, es);
+	}
+	else
+	{
+		if (h->nbatch_original != h->nbatch ||
+			 h->nbuckets_original != h->nbuckets) {
 			appendStringInfoSpaces(es->str, es->indent * 2);
 			appendStringInfo(es->str,
-			 "Buckets: %d (originally %d)  Batches: %d (originally %d)  Memory Usage: %ldkB\n",
-			 hinstrument.nbuckets,
-			 hinstrument.nbuckets_original,
-			 hinstrument.nbatch,
-			 hinstrument.nbatch_original,
-			 spacePeakKb);
+		"Buckets: %d (originally %d)   Batches: %d (originally %d)",
+		h->nbuckets,
+		h->nbuckets_original,
+		h->nbatch,
+		h->nbatch_original);
 		}
 		else
 		{
 			appendStringInfoSpaces(es->str, es->indent * 2);
 			appendStringInfo(es->str,
-			 "Buckets: %d  Batches: %d  Memory Usage: %ldkB\n",
-			 hinstrument.nbuckets, hinstrument.nbatch,
-			 spacePeakKb);
+		"Buckets: %d  Batches: %d",
+		h->nbuckets,
+		h->nbatch);
 		}
+
+		if (es->verbose && es->analyze)
+			appendStringInfo(es->str,
+	"  Memory Usage: %ldkB",
+	spacePeakKb);
+		appendStringInfoChar(es->str, '\n');
 	}
 }
 
-- 
2.7.4

>From d691e492b619e5cc6a1fcd4134728c1c0852d589 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Tue, 31 Dec 2019 18:49:41 -0600
Subject: [PATCH v1 2/2] 

allow disabling indexscans without disabling bitmapscans

2020-01-04 Thread Justin Pryzby
Moving to -hackers

I was asking about how to distinguish the index cost component of an indexscan
from the cost of the heap access.
https://www.postgresql.org/message-id/20200103141427.GK12066%40telsasoft.com

On Fri, Jan 03, 2020 at 09:33:35AM -0500, Jeff Janes wrote:
> > It would help to be able to set enable_bitmapscan=FORCE (to make all index
> > scans go through a bitmap).
> 
> Doesn't enable_indexscan=off accomplish this already?  It is possible but
> not terribly likely to switch from index to seq, rather than from index to
> bitmap.  (Unless the index scan was being used to obtain an ordered result,
> but a hypothetical enable_bitmapscan=FORCE can't fix that).

No, enable_indexscan=off implicitly disables bitmap index scans, since it does:

cost_bitmap_heap_scan():
|startup_cost += indexTotalCost;

But maybe it shouldn't (?)  Or maybe it should take a third value, like
enable_indexscan=bitmaponly, which means what it says.  Actually the name is
confusable with indexonly, so maybe enable_indexscan=bitmap.

A third value isn't really needed anyway; its only utility is that someone
upgrading from v12 who uses enable_indexscan=off (presumably in limited scope)
wouldn't have to also set enable_bitmapscan=off - not a big benefit.

That doesn't affect regress tests at all.

Note, when I tested it, the cost of "bitmap heap scan" was several times higher
than the total cost of indexscan (including heap), even with CPU costs at 0.  I
applied my "bitmap correlation" patch, which seems to give a more reasonable
result.  In any case, the purpose of this patch was primarily diagnostic, and
the heap cost of index scan would be its total cost minus the cost of the
bitmap indexscan node when enable_indexscan=off.  The high cost attributed to
bitmap heapscan is topic for the other patch.

Justin
>From 6ad506879d8a754013b971197592fc9617850b7e Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Sat, 4 Jan 2020 10:25:12 -0600
Subject: [PATCH v1] allow disabling indexscans but not bitmap scans

---
 src/backend/optimizer/path/costsize.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index b5a0033..6f37386 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -972,7 +972,12 @@ cost_bitmap_heap_scan(Path *path, PlannerInfo *root, RelOptInfo *baserel,
 		 loop_count, ,
 		 _fetched);
 
-	startup_cost += indexTotalCost;
+	if (indexTotalCost > disable_cost && enable_bitmapscan)
+		/* enable_indexscan=off no longer itself disables bitmap scans */
+		startup_cost += indexTotalCost - disable_cost;
+	else
+		startup_cost += indexTotalCost;
+
 	T = (baserel->pages > 1) ? (double) baserel->pages : 1.0;
 
 	/* Fetch estimated page costs for tablespace containing table. */
-- 
2.7.4



Re: error context for vacuum to include block number

2019-12-23 Thread Justin Pryzby
On Mon, Dec 16, 2019 at 11:49:56AM +0900, Michael Paquier wrote:
> On Sun, Dec 15, 2019 at 10:27:12AM -0600, Justin Pryzby wrote:
> > I named it so because it calls both lazy_vacuum_index
> > ("PROGRESS_VACUUM_PHASE_VACUUM_INDEX") and
> > lazy_vacuum_heap("PROGRESS_VACUUM_PHASE_VACUUM_HEAP")
> > 
> > I suppose you don't think the other way around is better?
> > lazy_vacuum_index_heap
> 
> Dunno.  Let's see if others have other thoughts on the matter.  FWIW,
> I have a long history at naming things in a way others don't like :)

I renamed.

And deduplicated two more hunks into a 2nd function.

(I'm also including the changes I mentioned here ... in case anyone cares to
comment or review).
https://www.postgresql.org/message-id/20191220171132.GB30414%40telsasoft.com
>From 5317d9f3cee163762563f2255f1dd26800ea858b Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Wed, 27 Nov 2019 20:07:10 -0600
Subject: [PATCH v6 1/6] Rename buf to avoid shadowing buf of another type

---
 src/backend/access/heap/vacuumlazy.c | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index ab09d84..3f52278 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -517,7 +517,7 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 	BlockNumber next_unskippable_block;
 	bool		skipping_blocks;
 	xl_heap_freeze_tuple *frozen;
-	StringInfoData buf;
+	StringInfoData sbuf;
 	const int	initprog_index[] = {
 		PROGRESS_VACUUM_PHASE,
 		PROGRESS_VACUUM_TOTAL_HEAP_BLKS,
@@ -1479,33 +1479,33 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 	 * This is pretty messy, but we split it up so that we can skip emitting
 	 * individual parts of the message when not applicable.
 	 */
-	initStringInfo(&buf);
-	appendStringInfo(&buf,
+	initStringInfo(&sbuf);
+	appendStringInfo(&sbuf,
 	 _("%.0f dead row versions cannot be removed yet, oldest xmin: %u\n"),
 	 nkeep, OldestXmin);
-	appendStringInfo(&buf, _("There were %.0f unused item identifiers.\n"),
+	appendStringInfo(&sbuf, _("There were %.0f unused item identifiers.\n"),
 	 nunused);
-	appendStringInfo(&buf, ngettext("Skipped %u page due to buffer pins, ",
+	appendStringInfo(&sbuf, ngettext("Skipped %u page due to buffer pins, ",
 	"Skipped %u pages due to buffer pins, ",
 	vacrelstats->pinskipped_pages),
 	 vacrelstats->pinskipped_pages);
-	appendStringInfo(&buf, ngettext("%u frozen page.\n",
+	appendStringInfo(&sbuf, ngettext("%u frozen page.\n",
 	"%u frozen pages.\n",
 	vacrelstats->frozenskipped_pages),
 	 vacrelstats->frozenskipped_pages);
-	appendStringInfo(&buf, ngettext("%u page is entirely empty.\n",
+	appendStringInfo(&sbuf, ngettext("%u page is entirely empty.\n",
 	"%u pages are entirely empty.\n",
 	empty_pages),
 	 empty_pages);
-	appendStringInfo(&buf, _("%s."), pg_rusage_show(&ru0));
+	appendStringInfo(&sbuf, _("%s."), pg_rusage_show(&ru0));
 
 	ereport(elevel,
 			(errmsg("\"%s\": found %.0f removable, %.0f nonremovable row versions in %u out of %u pages",
 	RelationGetRelationName(onerel),
 	tups_vacuumed, num_tuples,
 	vacrelstats->scanned_pages, nblocks),
-			 errdetail_internal("%s", buf.data)));
-	pfree(buf.data);
+			 errdetail_internal("%s", sbuf.data)));
+	pfree(sbuf.data);
 }
 
 
-- 
2.7.4

>From 8fdee2073e083388865f52874a672e8cc47f7c51 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Thu, 12 Dec 2019 19:57:25 -0600
Subject: [PATCH v6 2/6] deduplication

---
 src/backend/access/heap/vacuumlazy.c | 103 +++
 1 file changed, 43 insertions(+), 60 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 3f52278..c6dc44b 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -153,6 +153,7 @@ static BufferAccessStrategy vac_strategy;
 static void lazy_scan_heap(Relation onerel, VacuumParams *params,
 		   LVRelStats *vacrelstats, Relation *Irel, int nindexes,
 		   bool aggressive);
+static void lazy_vacuum_index_and_heap(Relation onerel, LVRelStats *vacrelstats, Relation *Irel, int nindexes, IndexBulkDeleteResult **indstats);
 static void lazy_vacuum_heap(Relation onerel, LVRelStats *vacrelstats);
 static bool lazy_check_needs_freeze(Buffer buf, bool *hastup);
 static void lazy_vacuum_index(Relation indrel,
@@ -740,12 +741,6 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 		if ((vacrelstats->max_dead_tuples - vacrelstats->num_dead_tuples) < MaxHeapTuplesPerPage &&
 			vacrelstats->num_dead_tuples > 0)

Re: error context for vacuum to include block number

2019-12-26 Thread Justin Pryzby
On Tue, Dec 24, 2019 at 01:19:09PM +0900, Michael Paquier wrote:
> On Mon, Dec 23, 2019 at 07:24:28PM -0600, Justin Pryzby wrote:
> > I renamed.
> 
> Hmm.  I have found what was partially itching me for patch 0002, and
> that's actually the fact that we don't do the error reporting for heap
> within lazy_vacuum_heap() because the code relies too much on updating
> two progress parameters at the same time, on top of the fact that you
> are mixing multiple concepts with this refactoring.  One problem is
> that if this code is refactored in the future, future callers of
> lazy_vacuum_heap() would miss the update of the progress reporting.
> Splitting things improves also the readability of the code, so
> attached is the refactoring I would do for this portion of the set.
> It is also more natural to increment num_index_scans when the

I agree that's better.
I don't see any reason why the progress params need to be updated atomically.
So rebasified against your patch.
>From 5317d9f3cee163762563f2255f1dd26800ea858b Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Wed, 27 Nov 2019 20:07:10 -0600
Subject: [PATCH v7 1/6] Rename buf to avoid shadowing buf of another type

---
 src/backend/access/heap/vacuumlazy.c | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index ab09d84..3f52278 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -517,7 +517,7 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 	BlockNumber next_unskippable_block;
 	bool		skipping_blocks;
 	xl_heap_freeze_tuple *frozen;
-	StringInfoData buf;
+	StringInfoData sbuf;
 	const int	initprog_index[] = {
 		PROGRESS_VACUUM_PHASE,
 		PROGRESS_VACUUM_TOTAL_HEAP_BLKS,
@@ -1479,33 +1479,33 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 	 * This is pretty messy, but we split it up so that we can skip emitting
 	 * individual parts of the message when not applicable.
 	 */
-	initStringInfo(&buf);
-	appendStringInfo(&buf,
+	initStringInfo(&sbuf);
+	appendStringInfo(&sbuf,
 	 _("%.0f dead row versions cannot be removed yet, oldest xmin: %u\n"),
 	 nkeep, OldestXmin);
-	appendStringInfo(&buf, _("There were %.0f unused item identifiers.\n"),
+	appendStringInfo(&sbuf, _("There were %.0f unused item identifiers.\n"),
 	 nunused);
-	appendStringInfo(&buf, ngettext("Skipped %u page due to buffer pins, ",
+	appendStringInfo(&sbuf, ngettext("Skipped %u page due to buffer pins, ",
 	"Skipped %u pages due to buffer pins, ",
 	vacrelstats->pinskipped_pages),
 	 vacrelstats->pinskipped_pages);
-	appendStringInfo(&buf, ngettext("%u frozen page.\n",
+	appendStringInfo(&sbuf, ngettext("%u frozen page.\n",
 	"%u frozen pages.\n",
 	vacrelstats->frozenskipped_pages),
 	 vacrelstats->frozenskipped_pages);
-	appendStringInfo(&buf, ngettext("%u page is entirely empty.\n",
+	appendStringInfo(&sbuf, ngettext("%u page is entirely empty.\n",
 	"%u pages are entirely empty.\n",
 	empty_pages),
 	 empty_pages);
-	appendStringInfo(&buf, _("%s."), pg_rusage_show(&ru0));
+	appendStringInfo(&sbuf, _("%s."), pg_rusage_show(&ru0));
 
 	ereport(elevel,
 			(errmsg("\"%s\": found %.0f removable, %.0f nonremovable row versions in %u out of %u pages",
 	RelationGetRelationName(onerel),
 	tups_vacuumed, num_tuples,
 	vacrelstats->scanned_pages, nblocks),
-			 errdetail_internal("%s", buf.data)));
-	pfree(buf.data);
+			 errdetail_internal("%s", sbuf.data)));
+	pfree(sbuf.data);
 }
 
 
-- 
2.7.4

>From c04d311358fe483bee4ccdfabf4a83f7f1d978b4 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Mon, 23 Dec 2019 22:42:28 -0600
Subject: [PATCH v7 2/6] michael dedup

---
 src/backend/access/heap/vacuumlazy.c | 96 
 1 file changed, 43 insertions(+), 53 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 3f52278..36c92f8 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -158,6 +158,9 @@ static bool lazy_check_needs_freeze(Buffer buf, bool *hastup);
 static void lazy_vacuum_index(Relation indrel,
 			  IndexBulkDeleteResult **stats,
 			  LVRelStats *vacrelstats);
+static void lazy_vacuum_all_indexes(Relation onerel, LVRelStats *vacrelstats,
+	Relation *Irel, int nindexes,
+	IndexBulkDeleteResult **indstats);
 static void lazy_cleanup_index(Relation indrel,
 			   IndexBulkDeleteResult *stats,
 			   LVRelStats *vacrelstats);
@@ -740,12 +743,6 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 		if ((vacrelstats

Re: ALTER INDEX fails on partitioned index

2019-12-26 Thread Justin Pryzby
On Mon, Jan 07, 2019 at 04:23:30PM -0300, Alvaro Herrera wrote:
> On 2019-Jan-05, Justin Pryzby wrote:
> > postgres=# CREATE TABLE t(i int)PARTITION BY RANGE(i);
> > postgres=# CREATE INDEX ON t(i) WITH(fillfactor=11);
> > postgres=# ALTER INDEX t_i_idx SET (fillfactor=12);
> > ERROR:  42809: "t_i_idx" is not a table, view, materialized view, or index
> > LOCATION:  ATWrongRelkindError, tablecmds.c:5031
> > 
> > I can't see that's deliberate,
> 
> Well, I deliberately ignored that aspect of the report at the time as it
> seemed to me (per discussion in thread [1]) that this behavior was
> intentional.  However, if I think in terms of things like
> pages_per_range in BRIN indexes, this decision seems to be a mistake,
> because surely we should propagate that value to children.
> 
> [1] 
> https://www.postgresql.org/message-id/flat/CAH2-WzkOKptQiE51Bh4_xeEHhaBwHkZkGtKizrFMgEkfUuRRQg%40mail.gmail.com

Possibly the attached should be backpatched through v11?

This allows SET on the parent index, which is used for newly created child
indexes, but doesn't itself recurse to children.

I noticed that recursive "*" doesn't seem to be allowed for ALTER INDEX:
postgres=# ALTER INDEX p_i2* SET (fillfactor = 22);
ERROR:  syntax error at or near "*"
LINE 1: ALTER INDEX p_i2* SET (fillfactor = 22);

Also, I noticed this "doesn't fail", but the setting is neither recursively applied
nor used for new partitions.

postgres=# ALTER INDEX p_i_idx ALTER COLUMN 1 SET STATISTICS 123;

-- 
Justin Pryzby
System Administrator
Telsasoft
+1-952-707-8581
>From fb05137c2c8a59dcab8fbd25fdee80e976588261 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Thu, 26 Dec 2019 21:40:06 -0600
Subject: [PATCH v1] Allow ALTER INDEX SET () on partitioned indexes

---
 src/backend/commands/tablecmds.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index d9f13da..aea84d5 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -4067,7 +4067,7 @@ ATPrepCmd(List **wqueue, Relation rel, AlterTableCmd *cmd,
 		case AT_ResetRelOptions:	/* RESET (...) */
 		case AT_ReplaceRelOptions:	/* reset them all, then set just these */
 
-			ATSimplePermissions(rel, ATT_TABLE | ATT_VIEW | ATT_MATVIEW | ATT_INDEX);
+			ATSimplePermissions(rel, ATT_TABLE | ATT_VIEW | ATT_MATVIEW | ATT_INDEX | ATT_PARTITIONED_INDEX);
 			/* This command never recurses */
 			/* No command-specific prep needed */
 			pass = AT_PASS_MISC;
-- 
2.7.4



doc: vacuum full, fillfactor, and "extra space"

2019-12-26 Thread Justin Pryzby
I started writing this patch to avoid the possibly-misleading phrase "with no
extra space" (since it's expected to typically take ~2x the space, or 1x "extra"
space).

But the original phrase "with no extra space" seems to be wrong anyway, since
the rewrite actually follows fillfactor, so say that.  Possibly this should be backpatched.

diff --git a/doc/src/sgml/ref/vacuum.sgml b/doc/src/sgml/ref/vacuum.sgml
index ec2503d..9757352 100644
--- a/doc/src/sgml/ref/vacuum.sgml
+++ b/doc/src/sgml/ref/vacuum.sgml
@@ -75,10 +75,16 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ 

Re: planner support functions: handle GROUP BY estimates ?

2019-12-26 Thread Justin Pryzby
On Sun, Dec 22, 2019 at 06:16:48PM -0600, Justin Pryzby wrote:
> On Tue, Nov 19, 2019 at 01:34:21PM -0600, Justin Pryzby wrote:
> > Tom implemented "Planner support functions":
> > https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=a391ff3c3d418e404a2c6e4ff0865a107752827b
> > https://www.postgresql.org/docs/12/xfunc-optimization.html
> > 
> > I wondered whether there was any consideration to extend that to allow
> > providing improved estimates of "group by".  That currently requires 
> > manually
> > by creating an expression index, if the function is IMMUTABLE (which is not
> > true for eg.  date_trunc of timestamptz).
> 
> I didn't hear back so tried implementing this for date_trunc().  Currently, 
> the

> I currently assume that the input data has 1 second granularity:
...
> If the input timestamps have (say) hourly granularity, rowcount will be
> *underestimated* by 3600x, which is worse than the behavior in master of
> overestimating by (for "day") 24x.
> 
> I'm trying to think of ways to address that:

In the attached, I handled that by using the histogram and the variable's initial
ndistinct estimate, which gives good estimates even for intermediate granularities
of input timestamps.
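
As a back-of-the-envelope check of the numbers below (standalone C, just
arithmetic; generate_series() includes both endpoints, hence the +1):

#include <stdio.h>

int
main(void)
{
	double		span_secs = 11 * 24 * 3600.0;	/* 11 days of data */
	double		input_step = 15 * 60.0;			/* one value every 15 minutes */

	double		ndistinct_in = span_secs / input_step + 1;	/* 1057 */
	double		ndistinct_hour = span_secs / 3600.0 + 1;	/* 265 */
	double		ndistinct_day = span_secs / 86400.0 + 1;	/* 12 */

	printf("%.0f %.0f %.0f\n", ndistinct_in, ndistinct_hour, ndistinct_day);
	return 0;
}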

|postgres=# DROP TABLE IF EXISTS t; CREATE TABLE t(i) AS SELECT a FROM 
generate_series(now(), now()+'11 day'::interval, '15 
minutes')a,generate_series(1,9)b; ANALYZE t;
|
|postgres=# explain analyze SELECT date_trunc('hour',i) i FROM t GROUP BY 1;
| HashAggregate  (cost=185.69..188.99 rows=264 width=8) (actual 
time=42.110..42.317 rows=265 loops=1)
|
|postgres=# explain analyze SELECT date_trunc('minute',i) i FROM t GROUP BY 1;
| HashAggregate  (cost=185.69..198.91 rows=1057 width=8) (actual 
time=41.685..42.264 rows=1057 loops=1)
|
|postgres=# explain analyze SELECT date_trunc('day',i) i FROM t GROUP BY 1;
| HashAggregate  (cost=185.69..185.83 rows=11 width=8) (actual 
time=46.672..46.681 rows=12 loops=1)
|
|postgres=# explain analyze SELECT date_trunc('second',i) i FROM t GROUP BY 1;
| HashAggregate  (cost=185.69..198.91 rows=1057 width=8) (actual 
time=41.816..42.435 rows=1057 loops=1)
>From 772876dbd64ea0b1d2bb28f9ab67f577c4050468 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Sun, 15 Dec 2019 20:27:24 -0600
Subject: [PATCH v2 1/2] Planner support functions for GROUP BY f()..

..implemented for date_trunc()

See also a391ff3c3d418e404a2c6e4ff0865a107752827b
---
 src/backend/optimizer/util/plancat.c | 47 +
 src/backend/utils/adt/selfuncs.c | 28 +++
 src/backend/utils/adt/timestamp.c| 97 
 src/include/catalog/catversion.h |  2 +-
 src/include/catalog/pg_proc.dat  | 15 --
 src/include/nodes/nodes.h|  3 +-
 src/include/nodes/supportnodes.h | 17 +++
 src/include/optimizer/plancat.h  |  2 +
 8 files changed, 206 insertions(+), 5 deletions(-)

diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index c15654e..2469ca6 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -2009,6 +2009,53 @@ get_function_rows(PlannerInfo *root, Oid funcid, Node *node)
 }
 
 /*
+ * Return a multiplier [0..1] to help estimate effect on rowcount of GROUP BY
+ * f(x), relative to input x.
+ */
+double
+get_function_groupby(PlannerInfo *root, Oid funcid, Node *node, Node *var)
+{
+	HeapTuple	proctup;
+	Form_pg_proc procform;
+
+	proctup = SearchSysCache1(PROCOID, ObjectIdGetDatum(funcid));
+	if (!HeapTupleIsValid(proctup))
+		elog(ERROR, "cache lookup failed for function %u", funcid);
+	procform = (Form_pg_proc) GETSTRUCT(proctup);
+
+	if (OidIsValid(procform->prosupport))
+	{
+		SupportRequestGroupBy *sresult;
+		SupportRequestGroupBy req;
+
+		req.type = T_SupportRequestGroupBy;
+		req.root = root;
+		req.funcid = funcid;
+		req.node = node;
+		req.var = var;
+		req.factor = 1;			/* just for sanity */
+
+		sresult = (SupportRequestGroupBy *)
+			DatumGetPointer(OidFunctionCall1(procform->prosupport,
+			 PointerGetDatum(&req)));
+
+		if (sresult == &req)
+		{
+			/* Success */
+			ReleaseSysCache(proctup);
+			return req.factor;
+		}
+	}
+
+	/* XXX No support function, or it failed */
+
+	ReleaseSysCache(proctup);
+
+	return 1;
+}
+
+
+/*
  * has_unique_index
  *
  * Detect whether there is a unique index on the specified attribute
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index ff02b5a..eb0b86f 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -3154,10 +3154,38 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
 		 */
 		foreach(l2, varshere)
 		{
+			double ret;
 			Node	   *var = (Node *) lfirst(l2);
 
 			examine_variable(root, var, 0, &vardata);
 			varinfos = add_unique_group_var(root, varinfos, var, &vardata);
+
+		

Re: error context for vacuum to include block number (atomic progress update)

2019-12-29 Thread Justin Pryzby
On Sat, Dec 28, 2019 at 07:21:31PM -0500, Robert Haas wrote:
> On Thu, Dec 26, 2019 at 10:57 AM Justin Pryzby  wrote:
> > I agree that's better.
> > I don't see any reason why the progress params need to be updated 
> > atomically.
> > So rebasified against your patch.
> 
> I am not sure whether it's important enough to make a stink about, but
> it bothers me a bit that this is being dismissed as unimportant. The
> problem is that, if the updates are not atomic, then somebody might
> see the data after one has been updated and the other has not yet been
> updated. The result is that when the phase is
> PROGRESS_VACUUM_PHASE_VACUUM_INDEX, someone reading the information
> can't tell whether the number of index scans reported is the number
> *previously* performed or the number performed including the one that
> just finished. The race to see the latter state is narrow, so it
> probably wouldn't come up often, but it does seem like it would be
> confusing if it did happen.

What used to be atomic was this:

-   hvp_val[0] = PROGRESS_VACUUM_PHASE_VACUUM_HEAP;
-   hvp_val[1] = vacrelstats->num_index_scans + 1;

=> switch the phase from PROGRESS_VACUUM_PHASE_VACUUM_INDEX to
PROGRESS_VACUUM_PHASE_VACUUM_HEAP and increment index_vacuum_count, which is
documented as the "Number of completed index vacuum cycles."

Now, it 1) increments the number of completed scans, and 2) then advances the
phase to HEAP, so there's a window where the number of completed scans has been
incremented while the phase still says VACUUM_INDEX.

Previously, if it said VACUUM_INDEX, one could assume that index_vacuum_count
would increase at least once more, and that's no longer true.  If someone sees
VACUUM_INDEX and some NUM_INDEX_VACUUMS, and then later sees VACUUM_HEAP or a
later stage with the same (maybe final) value of NUM_INDEX_VACUUMS, that's
different from the previous behavior.

It seems to me that someone (or their tool) monitoring pg_stat shouldn't be
confused by this change, since:
1) there's no promise about how high NUM_INDEX_VACUUMS will or won't go; and, 
2) index_vacuum_count didn't do anything strange like decreasing, or increased
before the scans were done; and,
3) the vacuum can finish at any time, and the monitoring process presumably
knows that when the PID is gone, it's finished, even if it missed intermediate
updates;

The behavior is different from before, but I think that's ok: the number of
scans is accurate, and the PHASE is accurate, even though it'll change a moment
later.

I see there's similar case here:
|/* report all blocks vacuumed; and that we're cleaning up */
|pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_VACUUMED, blkno);
|pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
| PROGRESS_VACUUM_PHASE_INDEX_CLEANUP);

heap_blks_scanned is documented as "Number of heap blocks SCANNED", and it
increments exactly to heap_blks_total.  Would someone be confused if
heap_blks_scanned==heap_blks_total AND phase=='scanning heap' ?  I think they'd
just expect PHASE to be updated a moment later.  (And if it wasn't, I agree they
should then be legitimately confused or concerned).

Actually, the doc says:
|If heap_blks_scanned is less than heap_blks_total, the system will return to
|scanning the heap after this phase is completed; otherwise, it will begin
|cleaning up indexes AFTER THIS PHASE IS COMPLETED.

I read that to mean that it's okay if heap_blks_scanned==heap_blks_total when
scanning/vacuuming heap.

Justin




Re: [PATCH v1] pg_ls_tmpdir to show directories

2019-12-27 Thread Justin Pryzby
Re-added -hackers.

Thanks for reviewing.

On Fri, Dec 27, 2019 at 05:22:47PM +0100, Fabien COELHO wrote:
> The implementation simply extends an existing functions with a boolean to
> allow for sub-directories. However, the function does not seem to show
> subdir contents recursively. Should it be the case?

> STM that "//"-comments are not project policy.

Sure, but the patch is less important than the design, which needs to be
addressed first.  The goal is to somehow show the tmpfiles (or at least the dirs)
used by parallel workers.  I mentioned a few possible ways, of which this was the
simplest to implement.  Showing the files beneath the dir is probably good, but we
need to decide how to present it.  Should there be a column for the dir (null
if it's not a shared fileset)?  Or some other presentation, like a boolean column
"is_shared_fileset"?
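
One possible shape for the latter (a sketch only; the "is_shared_fileset" column
and the 4-column layout are hypothetical, not part of the attached patch):

	/* hypothetical tuple descriptor with a flag for shared-fileset dirs */
	tupdesc = CreateTemplateTupleDesc(4);
	TupleDescInitEntry(tupdesc, (AttrNumber) 1, "name", TEXTOID, -1, 0);
	TupleDescInitEntry(tupdesc, (AttrNumber) 2, "size", INT8OID, -1, 0);
	TupleDescInitEntry(tupdesc, (AttrNumber) 3, "modification", TIMESTAMPTZOID, -1, 0);
	TupleDescInitEntry(tupdesc, (AttrNumber) 4, "is_shared_fileset", BOOLOID, -1, 0);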

> I'm unconvinced by the skipping condition:
> 
>   +  if (!S_ISREG(attrib.st_mode) &&
>   +  (!dir_ok && S_ISDIR(attrib.st_mode)))
> continue;
> 
> which is pretty hard to read. ISTM you meant "not A and not (B and C)"
> instead?

I can write it as two ifs.  And it's probably better to say:
if (!ISDIR() || !dir_ok)

..which is the same as !(B && C), as you said.
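
Something like this is what I have in mind (sketch only):

	/* skip anything that is neither a regular file nor (if dir_ok) a directory */
	if (!S_ISREG(attrib.st_mode))
	{
		if (!S_ISDIR(attrib.st_mode) || !dir_ok)
			continue;
	}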

> Catalog update should be a date + number? Maybe this is best left to the
> committer?

Yes, but I preferred to include it, causing a deliberate conflict, to ensure
it's not forgotten.

Thanks,
Justin




comment regarding double timestamps; and, infinite timestamps and NaN

2019-12-29 Thread Justin Pryzby
selfuncs.c convert_to_scalar() says:

|* The several datatypes representing absolute times are all converted
|* to Timestamp, which is actually a double, and then we just use that
|* double value.  Note this will give correct results even for the "special"
|* values of Timestamp, since those are chosen to compare correctly;
|* see timestamp_cmp.

But:
https://www.postgresql.org/docs/10/release-10.html
|Remove support for floating-point timestamps and intervals (Tom Lane)
|This removes configure's --disable-integer-datetimes option. Floating-point 
timestamps have few advantages and have not been the default since PostgreSQL 
8.3.
|b6aa17e De-support floating-point timestamps.
|configure| 18 
++
|configure.in | 12 ++--
|doc/src/sgml/config.sgml |  8 +++-
|doc/src/sgml/datatype.sgml   | 55 
+++
|doc/src/sgml/installation.sgml   | 22 
--
|src/include/c.h  |  7 ---
|src/include/pg_config.h.in   |  4 
|src/include/pg_config.h.win32|  4 
|src/interfaces/ecpg/include/ecpg_config.h.in |  4 
|src/interfaces/ecpg/include/pgtypes_interval.h   |  2 --
|src/interfaces/ecpg/test/expected/pgtypeslib-dt_test2.c  |  6 ++
|src/interfaces/ecpg/test/expected/pgtypeslib-dt_test2.stdout |  2 ++
|src/interfaces/ecpg/test/pgtypeslib/dt_test2.pgc |  6 ++
|src/tools/msvc/Solution.pm   |  9 -
|src/tools/msvc/config_default.pl |  1 -
|15 files changed, 36 insertions(+), 124 deletions(-)

It's true that convert_to_scalar sees doubles:
|static double
|convert_timevalue_to_scalar(Datum value, Oid typid, bool *failure)
|{
|switch (typid)
|{
|case TIMESTAMPOID:
|return DatumGetTimestamp(value);

But:
$ git grep DatumGetTimestamp src/include/
src/include/utils/timestamp.h:#define DatumGetTimestamp(X)  ((Timestamp) 
DatumGetInt64(X))

So I propose it should say something like:

|* The several datatypes representing absolute times are all converted
|* to Timestamp, which is actually an int64, and then we just promote that
|* to double.  Note this will give correct results even for the "special"
|* values of Timestamp, since those are chosen to compare correctly;
|* see timestamp_cmp.
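
A standalone illustration of that promotion (plain C; DT_NOEND is just
PG_INT64_MAX):

#include <math.h>
#include <stdint.h>
#include <stdio.h>

int
main(void)
{
	int64_t		t = INT64_MAX;		/* what DT_NOEND is defined as */
	double		d = (double) t;		/* promoted to ~9.22e18: huge, but finite */

	printf("d = %g, isnan = %d, isinf = %d\n", d, isnan(d), isinf(d));
	return 0;
}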

That seems to be used only for ineq_histogram_selectivity() interpolation of
histogram bins.  It looks to me like that, at least, isn't working for "special
values", and it needs to use something other than isnan().  I added debugging code
and tested the attached like this:

DROP TABLE t; CREATE TABLE t(t) AS SELECT generate_series(now(), now()+'1 day', 
'5 minutes');
INSERT INTO t VALUES('-infinity');
ALTER TABLE t ALTER t SET STATISTICS 1;
ANALYZE t;
explain SELECT * FROM t WHERE t>='2010-12-29';
>From b0151e24819499607eb2894dd920d4d8ef74b57d Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Mon, 30 Dec 2019 01:37:42 -0600
Subject: [PATCH v1 1/2] Correctly handle infinite timestamps in
 ineq_histogram_selectivity

---
 src/backend/utils/adt/selfuncs.c | 30 ++
 1 file changed, 22 insertions(+), 8 deletions(-)

diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 3f77c7e..73e2359 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -4284,14 +4284,19 @@ convert_one_bytea_to_scalar(unsigned char *value, int valuelen,
 static double
 convert_timevalue_to_scalar(Datum value, Oid typid, bool *failure)
 {
+	double	ret;
+
 	switch (typid)
 	{
 		case TIMESTAMPOID:
-			return DatumGetTimestamp(value);
+			ret = DatumGetTimestamp(value);
+			break;
 		case TIMESTAMPTZOID:
-			return DatumGetTimestampTz(value);
+			ret = DatumGetTimestampTz(value);
+			break;
 		case DATEOID:
-			return date2timestamp_no_overflow(DatumGetDateADT(value));
+			ret = date2timestamp_no_overflow(DatumGetDateADT(value));
+			break;
 		case INTERVALOID:
 			{
 Interval   *interval = DatumGetIntervalP(value);
@@ -4301,22 +4306,31 @@ convert_timevalue_to_scalar(Datum value, Oid typid, bool *failure)
  * average month length of 365.25/12.0 days.  Not too
  * accurate, but plenty good enough for our purposes.
  */
-return interval->time + interval->day * (double) USECS_PER_DAY +
+ret = interval->time + interval->day * (double) USECS_PER_DAY +
 	interval->month * ((DAYS_PER_YEAR / (double) MONTHS_PER_YEAR) * USECS_PER_DAY);
 			}
+			break;
 		case TIMEOID:
-			return DatumGetTimeADT(value);
+			ret = DatumGetTimeADT(value);
+			break;
 		case TIMETZOID:

Re: comment regarding double timestamps; and, infinite timestamps and NaN

2019-12-30 Thread Justin Pryzby
On Mon, Dec 30, 2019 at 09:05:24AM -0500, Tom Lane wrote:
> Justin Pryzby  writes:
> > That seems to be only used for ineq_histogram_selectivity() interpolation of
> > histogram bins.  It looks to me that at least isn't working for "special
> > values", and needs to use something other than isnan().
> 
> Uh, what?  This seems completely wrong to me.  We could possibly
> promote DT_NOBEGIN and DT_NOEND to +/- infinity (not NaN), but
> I don't really see the point.  They'll compare to other timestamp
> values correctly without that, cf timestamp_cmp_internal().
> The example you give seems to me to be working sanely, or at least
> as sanely as it can given the number of histogram points available,
> with the existing code.  In any case, shoving NaNs into the
> computation is not going to make anything better.

As I see it, the problem is that the existing code tests for isnan(), but
infinite timestamps are PG_INT64_MIN/MAX (here, stored in a double), so there
are absurdly large values being used as if they were ordinary isnormal() values.

src/include/datatype/timestamp.h:#define DT_NOBEGIN PG_INT64_MIN
src/include/datatype/timestamp.h-#define DT_NOEND   PG_INT64_MAX
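
For what it's worth, here's a minimal standalone sketch (plain C, outside PostgreSQL)
of why isnan() can't flag those values once they're promoted to double:

#include <math.h>
#include <stdint.h>
#include <stdio.h>

int
main(void)
{
	/* stands in for DT_NOEND == PG_INT64_MAX after promotion to double */
	double		no_end = (double) INT64_MAX;

	/* prints: isnan=0 isinf=0 isnormal=1 value=9.22337e+18 */
	printf("isnan=%d isinf=%d isnormal=%d value=%g\n",
		   isnan(no_end), isinf(no_end), isnormal(no_end), no_end);
	return 0;
}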

On v12, my test gives:
|DROP TABLE t; CREATE TABLE t(t) AS SELECT generate_series(now(), now()+'1 day', '5 minutes');
|INSERT INTO t VALUES('-infinity');
|ALTER TABLE t ALTER t SET STATISTICS 1; ANALYZE t;
|explain analyze SELECT * FROM t WHERE t>='2010-12-29';
| Seq Scan on t  (cost=0.00..5.62 rows=3 width=8) (actual time=0.012..0.042 rows=289 loops=1)

vs patched master:
|DROP TABLE t; CREATE TABLE t(t) AS SELECT generate_series(now(), now()+'1 day', '5 minutes');
|INSERT INTO t VALUES('-infinity');
|ALTER TABLE t ALTER t SET STATISTICS 1; ANALYZE t;
|explain analyze SELECT * FROM t WHERE t>='2010-12-29';
| Seq Scan on t  (cost=0.00..5.62 rows=146 width=8) (actual time=0.048..0.444 rows=289 loops=1)

IMO 146 rows is a reasonable estimate given a single histogram bucket of
infinite width, and 3 rows is a less reasonable result of returning INT64_MAX
in one place and then handling it as a normal value.  The comments in
ineq_histogram seem to indicate that this case is intended to get binfrac=0.5:

|  Watch out for the possibility that we got a NaN or Infinity from the
|  division.  This can happen despite the previous checks, if for example "low"
|  is -Infinity.

I changed to use INFINITY, -INFINITY and !isnormal() rather than nan() and
isnan() (although binfrac is actually NAN at that point so the existing test is
ok).
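
For context, the guard in ineq_histogram_selectivity() looks roughly like this
(paraphrased from selfuncs.c); an infinite bucket bound makes the division produce
NaN, which this clamps to binfrac = 0.5:

	if (high <= low)
		binfrac = 0.5;
	else if (val <= low)
		binfrac = 0.0;
	else if (val >= high)
		binfrac = 1.0;
	else
	{
		binfrac = (val - low) / (high - low);

		/*
		 * Watch out for the possibility that we got a NaN or Infinity
		 * from the division.  This can happen despite the previous
		 * checks, if for example "low" is -Infinity.
		 */
		if (isnan(binfrac) ||
			binfrac < 0.0 || binfrac > 1.0)
			binfrac = 0.5;
	}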

Justin




Re: allow disabling indexscans without disabling bitmapscans

2020-01-04 Thread Justin Pryzby
On Sat, Jan 04, 2020 at 10:50:47AM -0600, Justin Pryzby wrote:
> > Doesn't enable_indexscan=off accomplish this already?  It is possible but
> > not terribly likely to switch from index to seq, rather than from index to
> > bitmap.  (Unless the index scan was being used to obtain an ordered result,
> > but a hypothetical enable_bitmapscan=FORCE can't fix that).
> 
> No, enable_indexscan=off implicitly disables bitmap index scans, since it 
> does:

I don't know how I went wrong, but the regress tests clued me in... it's as Jeff
said.

Sorry for the noise.

Justin




doc: alter table references bogus table-specific planner parameters

2020-01-05 Thread Justin Pryzby
commit 6f3a13ff058f15d565a30c16c0c2cb14cc994e42 Enhance docs for ALTER TABLE lock levels of storage parms
Author: Simon Riggs 
Date:   Mon Mar 6 16:48:12 2017 +0530


 SET ( storage_parameter = value [, ... ] )
...
-  Changing fillfactor and autovacuum storage parameters acquires a 
SHARE UPDATE EXCLUSIVE lock.
+  SHARE UPDATE EXCLUSIVE lock will be taken for 
+  fillfactor and autovacuum storage parameters, as well as the
+  following planner related parameters:
+  effective_io_concurrency, parallel_workers, seq_page_cost
+  random_page_cost, n_distinct and n_distinct_inherited.

effective_io_concurrency, seq_page_cost and random_page_cost cannot be set for
a table - reloptions.c shows that they've always been RELOPT_KIND_TABLESPACE.

n_distinct lock mode seems to have been changed and documented at e5550d5f ;
21d4e2e2 claimed to do the same, but the LOCKMODE is never used.

See also:

commit 21d4e2e20656381b4652eb675af4f6d65053607f Reduce lock levels for table storage params related to planning
Author: Simon Riggs 
Date:   Mon Mar 6 16:04:31 2017 +0530

commit 47167b7907a802ed39b179c8780b76359468f076 Reduce lock levels for ALTER TABLE SET autovacuum storage options
Author: Simon Riggs 
Date:   Fri Aug 14 14:19:28 2015 +0100

commit e5550d5fec66aa74caad1f79b79826ec64898688 Reduce lock levels of some ALTER TABLE cmds
Author: Simon Riggs 
Date:   Sun Apr 6 11:13:43 2014 -0400

commit 2dbbda02e7e688311e161a912a0ce00cde9bb6fc Reduce lock levels of CREATE TRIGGER and some ALTER TABLE, CREATE RULE actions.
Author: Simon Riggs 
Date:   Wed Jul 28 05:22:24 2010 +

commit d86d51a95810caebcea587498068ff32fe28293e Support ALTER TABLESPACE name SET/RESET ( tablespace_options ).
Author: Robert Haas 
Date:   Tue Jan 5 21:54:00 2010 +

Justin
>From 64699ee90ef6ebe9459e3b2b1f603f30ec2c49c8 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Sun, 5 Jan 2020 19:39:29 -0600
Subject: [PATCH v1] Fixes for commit 6f3a13ff

Should backpatch to v10.
---
 doc/src/sgml/ref/alter_table.sgml | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index bb1e48a..e5f39c2 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -676,32 +676,30 @@ WITH ( MODULUS numeric_literal, REM

 

 SET ( storage_parameter = value [, ... ] )
 
  
-  This form changes one or more storage parameters for the table.  See
+  This form changes one or more storage or planner parameters for the table.  See
   
   for details on the available parameters.  Note that the table contents
-  will not be modified immediately by this command; depending on the
+  will not be modified immediately by setting its storage parameters; depending on the
   parameter you might need to rewrite the table to get the desired effects.
   That can be done with VACUUM
   FULL,  or one of the forms
   of ALTER TABLE that forces a table rewrite.
   For planner related parameters, changes will take effect from the next
   time the table is locked so currently executing queries will not be
   affected.
  
 
  
   SHARE UPDATE EXCLUSIVE lock will be taken for
   fillfactor, toast and autovacuum storage parameters, as well as the
-  following planner related parameters:
-  effective_io_concurrency, parallel_workers, seq_page_cost,
-  random_page_cost, n_distinct and n_distinct_inherited.
+  parallel_workers planner parameter.
  
 

 

 RESET ( storage_parameter [, ... ] )
-- 
2.7.4



Re: doc: alter table references bogus table-specific planner parameters

2020-01-05 Thread Justin Pryzby
On Mon, Jan 06, 2020 at 03:48:52AM +, Simon Riggs wrote:
> On Mon, 6 Jan 2020 at 02:56, Justin Pryzby  wrote:
> 
> > commit 6f3a13ff058f15d565a30c16c0c2cb14cc994e42 Enhance docs for ALTER 
> > TABLE lock levels of storage parms
> > Author: Simon Riggs 
> > Date:   Mon Mar 6 16:48:12 2017 +0530
> >
> > 
> >  SET ( storage_parameter = value [, ... ] )
> > ...
> > -  Changing fillfactor and autovacuum storage parameters acquires a
> > SHARE UPDATE EXCLUSIVE lock.
> > +  SHARE UPDATE EXCLUSIVE lock will be taken for
> > +  fillfactor and autovacuum storage parameters, as well as the
> > +  following planner related parameters:
> > +  effective_io_concurrency, parallel_workers, seq_page_cost
> > +  random_page_cost, n_distinct and n_distinct_inherited.
> >
> > effective_io_concurrency, seq_page_cost and random_page_cost cannot be set
> > for
> > a table - reloptions.c shows that they've always been
> > RELOPT_KIND_TABLESPACE.
> 
> I agree with the sentiment of the third doc change, but your patch removes
> the mention of n_distinct, which isn't appropriate.

I think it's correct to remove n_distinct there, as it's documented previously,
since e5550d5f.  That's a per-attribute option (not storage) and can't be
specified there.


 SET ( attribute_option = value [, ... ] )
 RESET ( attribute_option [, ... ] )
 
  
   This form sets or resets per-attribute options.  Currently, the only
...
+ 
+  Changing per-attribute options acquires a
+  SHARE UPDATE EXCLUSIVE lock.
+ 

> The second change in your patch alters the meaning of the sentence in a way
> that is counter to the first change. The name of these parameters is
> "Storage Parameters" (in various places); I might agree with describing
> them in text as "storage or planner parameters", but if you do that you
> can't then just refer to "storage parameters" later, because if you do it
> implies that planner parameters operate differently to storage parameters,
> which they don't.

The 2nd change is:

   for details on the available parameters.  Note that the table contents
-  will not be modified immediately by this command; depending on the
+  will not be modified immediately by setting its storage parameters; 
depending on the
   parameter you might need to rewrite the table to get the desired effects.

I deliberately qualified that as referring only to "storage params" rather than
"this command", since planner params never "modify the table contents".
Possibly other instances in the document (and createtable) should be changed
for consistency.

Justin




Re: bitmaps and correlation

2020-01-06 Thread Justin Pryzby
Find attached a cleaned-up patch.
For now, I updated regress/expected/, but I think the test may have to be
updated to do what it was written to do.
>From 36f547d69b8fee25869d6ef3ef26d327a8ba1205 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Tue, 1 Jan 2019 16:17:28 -0600
Subject: [PATCH v1] Use correlation statistic in costing bitmap scans..

Same as for an index scan, a correlated bitmap which accesses pages across a
small portion of the table should have a cost estimate much less than an
uncorrelated scan (like modulus) across the entire length of the table, the
latter having a high component of random access.

Note, Tom points out that there are cases where a column could be
tightly-"clumped" without being highly-ordered.  Since we have correlation
already, we use that until such time as someone implements a new statistic for
clumpiness.  This patch only intends to make the costing of a bitmap heap scan on par
with the costing of the corresponding index scan without a bitmap.
---
 contrib/postgres_fdw/expected/postgres_fdw.out | 15 ++--
 src/backend/optimizer/path/costsize.c  | 94 +-
 src/backend/optimizer/path/indxpath.c  | 10 ++-
 src/include/nodes/pathnodes.h  |  3 +
 src/include/optimizer/cost.h   |  2 +-
 src/test/regress/expected/create_index.out | 16 ++---
 src/test/regress/expected/join.out | 18 +++--
 src/test/regress/sql/create_index.sql  |  8 ++-
 8 files changed, 118 insertions(+), 48 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index c915885..fb4c7f2 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -2257,11 +2257,12 @@ SELECT * FROM ft1, ft2, ft4, ft5, local_tbl WHERE ft1.c1 = ft2.c1 AND ft1.c2 = f
  ->  Foreign Scan on public.ft1
Output: ft1.c1, ft1.c2, ft1.c3, ft1.c4, ft1.c5, ft1.c6, ft1.c7, ft1.c8, ft1.*
Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" < 100)) FOR UPDATE
-   ->  Materialize
+   ->  Sort
  Output: ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c7, ft2.c8, ft2.*
+ Sort Key: ft2.c1
  ->  Foreign Scan on public.ft2
Output: ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c7, ft2.c8, ft2.*
-   Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" < 100)) ORDER BY "C 1" ASC NULLS LAST FOR UPDATE
+   Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" < 100)) FOR UPDATE
->  Sort
  Output: ft4.c1, ft4.c2, ft4.c3, ft4.*
  Sort Key: ft4.c1
@@ -2276,7 +2277,7 @@ SELECT * FROM ft1, ft2, ft4, ft5, local_tbl WHERE ft1.c1 = ft2.c1 AND ft1.c2 = f
  Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 4" FOR UPDATE
  ->  Index Scan using local_tbl_pkey on public.local_tbl
Output: local_tbl.c1, local_tbl.c2, local_tbl.c3, local_tbl.ctid
-(47 rows)
+(48 rows)
 
 SELECT * FROM ft1, ft2, ft4, ft5, local_tbl WHERE ft1.c1 = ft2.c1 AND ft1.c2 = ft4.c1
 AND ft1.c2 = ft5.c1 AND ft1.c2 = local_tbl.c1 AND ft1.c1 < 100 AND ft2.c1 < 100 FOR UPDATE;
@@ -3318,10 +3319,12 @@ select c2, sum from "S 1"."T 1" t1, lateral (select sum(t2.c1 + t1."C 1") sum fr
Sort Key: t1.c2
->  Nested Loop
  Output: t1.c2, qry.sum
- ->  Index Scan using t1_pkey on "S 1"."T 1" t1
+ ->  Bitmap Heap Scan on "S 1"."T 1" t1
Output: t1."C 1", t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
-   Index Cond: (t1."C 1" < 100)
+   Recheck Cond: (t1."C 1" < 100)
Filter: (t1.c2 < 3)
+   ->  Bitmap Index Scan on t1_pkey
+ Index Cond: (t1."C 1" < 100)
  ->  Subquery Scan on qry
Output: qry.sum, t2.c1
Filter: ((t1.c2 * 2) = qry.sum)
@@ -3329,7 +3332,7 @@ select c2, sum from "S 1"."T 1" t1, lateral (select sum(t2.c1 + t1."C 1") sum fr
  Output: (sum(

Re: bitmaps and correlation

2020-01-06 Thread Justin Pryzby
On Tue, Jan 07, 2020 at 09:21:03AM +0530, Dilip Kumar wrote:
> On Tue, Jan 7, 2020 at 1:29 AM Justin Pryzby  wrote:
> >
> > Find attached cleaned up patch.
> > For now, I updated the regress/expected/, but I think the test maybe has to 
> > be
> > updated to do what it was written to do.
> 
> I have noticed that in "cost_index" we have used the indexCorrelation
> for computing the run_cost, not the number of pages whereas in your
> patch you have used it for computing the number of pages.  Any reason
> for the same?

As Jeff has pointed out, high correlation has two effects in cost_index():
1) the number of pages read will be less;
2) the pages will be read more sequentially;

cost_index reuses the pages_fetched variable, so (1) isn't particularly clear,
but does essentially:

	/* max_IO_cost is for the perfectly uncorrelated case (csquared=0) */
	pages_fetched(MIN) = index_pages_fetched(tuples_fetched,
											 baserel->pages,
											 (double) index->pages,
											 root);
	max_IO_cost = pages_fetchedMIN * spc_random_page_cost;

	/* min_IO_cost is for the perfectly correlated case (csquared=1) */
	pages_fetched(MAX) = ceil(indexSelectivity * (double) baserel->pages);
	min_IO_cost = (pages_fetchedMAX - 1) * spc_seq_page_cost;
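
For reference, cost_index() then interpolates between those two bounds using the
squared correlation, roughly (paraphrasing costsize.c):

	csquared = indexCorrelation * indexCorrelation;
	run_cost += max_IO_cost + csquared * (min_IO_cost - max_IO_cost);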


My patch 1) changes compute_bitmap_pages() to interpolate pages_fetched using
the correlation; pages_fetchedMIN is new:

> Patch
> - pages_fetched = (2.0 * T * tuples_fetched) / (2.0 * T + tuples_fetched);
> + pages_fetchedMAX = (2.0 * T * tuples_fetched) / (2.0 * T + tuples_fetched);
> +
> + /* pages_fetchedMIN is for the perfectly correlated case (csquared=1) */
> + pages_fetchedMIN = ceil(indexSelectivity * (double) baserel->pages);
> +
> + pages_fetched = pages_fetchedMAX + indexCorrelation*indexCorrelation*(pages_fetchedMIN - pages_fetchedMAX);

And, 2) also computes cost_per_page by interpolation between seq_page and
random_page cost:

+   cost_per_page_corr = spc_random_page_cost -
+   (spc_random_page_cost - spc_seq_page_cost)
+   * (1-correlation*correlation);

Thanks for looking.  I'll update the name of pages_fetchedMIN/MAX in my patch
for consistency with cost_index.

Justin




Re: [PATCH v1] pg_ls_tmpdir to show directories

2019-12-27 Thread Justin Pryzby
On Fri, Dec 27, 2019 at 06:50:24PM +0100, Fabien COELHO wrote:
> >On Fri, Dec 27, 2019 at 05:22:47PM +0100, Fabien COELHO wrote:
> >>The implementation simply extends an existing functions with a boolean to
> >>allow for sub-directories. However, the function does not seem to show
> >>subdir contents recursively. Should it be the case?
> >
> >>STM that "//"-comments are not project policy.
> >
> >Sure, but the patch is less important than the design, which needs to be
> >addressed first.  The goal is to somehow show tmpfiles (or at least dirs) 
> >used
> >by parallel workers.  I mentioned a few possible ways, of which this was the
> >simplest to implement.  Showing files beneath the dir is probably good, but
> >need to decide how to present it.  Should there be a column for the dir (null
> >if not a shared filesets)?  Or some other presentation, like a boolean column
> >"is_shared_fileset".
> 
> Why not simply showing the files underneath their directories?
> 
>   /path/to/tmp/file1
>   /path/to/tmp/subdir1/file2
> 
> In which case probably showing the directory itself is not useful,
> and the is_dir column could be dropped?

The names are expected to look like this:

$ sudo find /var/lib/pgsql/12/data/base/pgsql_tmp -ls
1429774 drwxr-x---   3 postgres postgres 4096 Dec 27 13:51 /var/lib/pgsql/12/data/base/pgsql_tmp
1698684 drwxr-x---   2 postgres postgres 4096 Dec  7 01:35 /var/lib/pgsql/12/data/base/pgsql_tmp/pgsql_tmp11025.0.sharedfileset
169347 5492 -rw-r-   1 postgres postgres  5619712 Dec  7 01:35 /var/lib/pgsql/12/data/base/pgsql_tmp/pgsql_tmp11025.0.sharedfileset/0.0
169346 5380 -rw-r-   1 postgres postgres  5505024 Dec  7 01:35 /var/lib/pgsql/12/data/base/pgsql_tmp/pgsql_tmp11025.0.sharedfileset/1.0

I think we'd have to show subdir/file1, subdir/file2, not just file1, file2.
It doesn't seem useful or nice to show a bunch of files called 0.0 or 1.0.
Actually, the results should be unique, either on filename or on (dir, file).

"ls" wouldn't list the same name twice, unless you list multiple dirs, like:
|ls a/b c/d.

It's worth thinking if subdir should be a separate column.

I'm interested to hear back from others.

Justin




Re: [PATCH v1] pg_ls_tmpdir to show directories

2019-12-28 Thread Justin Pryzby
On Sat, Dec 28, 2019 at 07:52:55AM +0100, Fabien COELHO wrote:
> >>Why not simply showing the files underneath their directories?
> >>
> >>  /path/to/tmp/file1
> >>  /path/to/tmp/subdir1/file2
> >>
> >>In which case probably showing the directory itself is not useful,
> >>and the is_dir column could be dropped?
> >
> >The names are expected to look like this:
> >
> >$ sudo find /var/lib/pgsql/12/data/base/pgsql_tmp -ls
> >1429774 drwxr-x---   3 postgres postgres 4096 Dec 27 13:51 /var/lib/pgsql/12/data/base/pgsql_tmp
> >1698684 drwxr-x---   2 postgres postgres 4096 Dec  7 01:35 /var/lib/pgsql/12/data/base/pgsql_tmp/pgsql_tmp11025.0.sharedfileset
> >169347 5492 -rw-r-   1 postgres postgres  5619712 Dec  7 01:35 /var/lib/pgsql/12/data/base/pgsql_tmp/pgsql_tmp11025.0.sharedfileset/0.0
> >169346 5380 -rw-r-   1 postgres postgres  5505024 Dec  7 01:35 /var/lib/pgsql/12/data/base/pgsql_tmp/pgsql_tmp11025.0.sharedfileset/1.0
> >
> >I think we'd have to show subdir/file1, subdir/file2, not just file1, file2.
> >It doesn't seem useful or nice to show a bunch of files called 0.0 or 1.0.
> >Actually the results should be unique, either on filename or (dir,file).
> 
> Ok, so this suggests recursing into subdirs, which requires to make a
> separate function of the inner loop.

Yeah, it suggests that; but SRF_RETURN_NEXT doesn't make that very easy.
It'd need to accept the fcinfo argument, and pg_ls_dir_files would call it once
for every tuple to be returned.  So it's recursive and saves its state...

The attached is pretty ugly, but I can't see how to do better.
The alternative seems to be to build up a full list of pathnames in the SRF
initial branch, and stat them all later.  Or stat them all in the initial case,
and keep a list of stat structures to be emitted during future calls.
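
For concreteness, a rough sketch of that alternative (not the attached patch; the
function name, the hard-coded directory, and the text-only return value are
illustrative, and error handling is elided): gather the pathnames in
multi_call_memory_ctx during the first call, then emit one per call.

#include "postgres.h"

#include "funcapi.h"
#include "nodes/pg_list.h"
#include "storage/fd.h"
#include "utils/builtins.h"

Datum
pg_ls_tmpdir_sketch(PG_FUNCTION_ARGS)
{
	FuncCallContext *funcctx;
	List	   *paths;

	if (SRF_IS_FIRSTCALL())
	{
		MemoryContext oldcontext;
		const char *dir = "base/pgsql_tmp";		/* illustrative location */
		DIR		   *dirdesc;
		struct dirent *de;

		funcctx = SRF_FIRSTCALL_INIT();
		oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);

		/* Gather everything now, so no DIR is held open across calls */
		paths = NIL;
		dirdesc = AllocateDir(dir);
		while ((de = ReadDir(dirdesc, dir)) != NULL)
		{
			if (de->d_name[0] == '.')
				continue;		/* skip hidden entries */
			/* a subdirectory could be read here too, adding "subdir/file" names */
			paths = lappend(paths, psprintf("%s/%s", dir, de->d_name));
		}
		FreeDir(dirdesc);

		funcctx->user_fctx = paths;
		MemoryContextSwitchTo(oldcontext);
	}

	funcctx = SRF_PERCALL_SETUP();
	paths = (List *) funcctx->user_fctx;

	if (paths != NIL)
	{
		char	   *path = linitial(paths);

		funcctx->user_fctx = list_delete_first(paths);
		/* stat(path) here and build a proper tuple; returning just the name: */
		SRF_RETURN_NEXT(funcctx, CStringGetTextDatum(path));
	}

	SRF_RETURN_DONE(funcctx);
}

The tradeoff is memory proportional to the number of entries, and the stat() still
happens at emit time rather than at collection time.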

BTW, it seems to me this error message should be changed:

		snprintf(path, sizeof(path), "%s/%s", fctx->location, de->d_name);
		if (stat(path, &attrib) < 0)
			ereport(ERROR,
					(errcode_for_file_access(),
-					 errmsg("could not stat directory \"%s\": %m", dir)));
+					 errmsg("could not stat file \"%s\": %m", path)));

>From fd88be5f1687354d9990fb1838adc0db36bc6dde Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Fri, 27 Dec 2019 23:34:14 -0600
Subject: [PATCH v2 1/2] BUG: in errmsg

---
 src/backend/utils/adt/genfile.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/utils/adt/genfile.c b/src/backend/utils/adt/genfile.c
index 5d4f26a..c978e15 100644
--- a/src/backend/utils/adt/genfile.c
+++ b/src/backend/utils/adt/genfile.c
@@ -590,7 +590,7 @@ pg_ls_dir_files(FunctionCallInfo fcinfo, const char *dir, bool missing_ok)
 		if (stat(path, &attrib) < 0)
 			ereport(ERROR,
 	(errcode_for_file_access(),
-	 errmsg("could not stat directory \"%s\": %m", dir)));
+	 errmsg("could not stat file \"%s\": %m", path)));
 
 		/* Ignore anything but regular files */
 		if (!S_ISREG(attrib.st_mode))
-- 
2.7.4

>From fff91aec87f635755527e91aebb7554fa6385fec Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Sat, 14 Dec 2019 16:22:15 -0600
Subject: [PATCH v2 2/2] pg_ls_tmpdir to show directories

See also 9cd92d1a33699f86aa53d44ab04cc3eb50c18d11
---
 doc/src/sgml/func.sgml   |  14 +++--
 src/backend/utils/adt/genfile.c  | 132 ++-
 src/include/catalog/catversion.h |   2 +-
 3 files changed, 96 insertions(+), 52 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 5a98158..8abc643 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -21922,12 +21922,14 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());

setof record

-List the name, size, and last modification time of files in the
-temporary directory for tablespace.  If
-tablespace is not provided, the
-pg_default tablespace is used.  Access is granted
-to members of the pg_monitor role and may be
-granted to other non-superuser roles.
+For files in the temporary directory for
+tablespace, list the name, size, and last modification time.
+Files beneath a first-level directory are shown, and include a pathname
+component of their parent directory; such files are used by parallel processes.
+If tablespace is not provided, the
+pg_default tablespace is used.  Access is granted to
+members of the pg_monitor role and may be granted to
+other non-superuser roles.

Re: [PATCH v1] pg_ls_tmpdir to show directories

2019-12-28 Thread Justin Pryzby
Here's a version which uses an array of directory_fctx, rather than of DIR and
location.  That avoids changing the data structure and the collateral implications
for pg_ls_dir().

Currently, this *shows* subdirs of subdirs, but doesn't descend into them.
So I think maybe top-level subdirs should be shown, too.
And maybe the is_dir flag should be re-introduced (although someone could call
pg_stat_file if needed).
I'm interested to hear feedback on that, although this patch still isn't great.
>From dd3b2779939fc1b396fed1fba2f7cefc9a6b1ad5 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Fri, 27 Dec 2019 23:34:14 -0600
Subject: [PATCH v4 1/2] BUG: in errmsg

Note there's two changes here.
Should backpatch to v12, where pg_ls_tmpdir was added.
---
 src/backend/utils/adt/genfile.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/utils/adt/genfile.c b/src/backend/utils/adt/genfile.c
index 5d4f26a..c978e15 100644
--- a/src/backend/utils/adt/genfile.c
+++ b/src/backend/utils/adt/genfile.c
@@ -590,7 +590,7 @@ pg_ls_dir_files(FunctionCallInfo fcinfo, const char *dir, bool missing_ok)
 		if (stat(path, &attrib) < 0)
 			ereport(ERROR,
 	(errcode_for_file_access(),
-	 errmsg("could not stat directory \"%s\": %m", dir)));
+	 errmsg("could not stat file \"%s\": %m", path)));
 
 		/* Ignore anything but regular files */
 		if (!S_ISREG(attrib.st_mode))
-- 
2.7.4

>From 30031c790fe5bb358f0bb372cb2d7975e2d688aa Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Sat, 14 Dec 2019 16:22:15 -0600
Subject: [PATCH v4 2/2] pg_ls_tmpdir to show directories

See also 9cd92d1a33699f86aa53d44ab04cc3eb50c18d11
---
 doc/src/sgml/func.sgml  |  14 +++--
 src/backend/utils/adt/genfile.c | 136 ++--
 2 files changed, 97 insertions(+), 53 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 5a98158..8abc643 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -21922,12 +21922,14 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());

setof record

-List the name, size, and last modification time of files in the
-temporary directory for tablespace.  If
-tablespace is not provided, the
-pg_default tablespace is used.  Access is granted
-to members of the pg_monitor role and may be
-granted to other non-superuser roles.
+For files in the temporary directory for
+tablespace, list the name, size, and last modification time.
+Files beneath a first-level directory are shown, and include a pathname
+component of their parent directory; such files are used by parallel processes.
+If tablespace is not provided, the
+pg_default tablespace is used.  Access is granted to
+members of the pg_monitor role and may be granted to
+other non-superuser roles.

   
   
diff --git a/src/backend/utils/adt/genfile.c b/src/backend/utils/adt/genfile.c
index c978e15..6bfac64 100644
--- a/src/backend/utils/adt/genfile.c
+++ b/src/backend/utils/adt/genfile.c
@@ -522,12 +522,84 @@ pg_ls_dir_1arg(PG_FUNCTION_ARGS)
 	return pg_ls_dir(fcinfo);
 }
 
-/* Generic function to return a directory listing of files */
+/* Recursive helper to handle showing a first level of files beneath a subdir */
 static Datum
-pg_ls_dir_files(FunctionCallInfo fcinfo, const char *dir, bool missing_ok)
+pg_ls_dir_files_recurse(FunctionCallInfo fcinfo, FuncCallContext *funcctx, const char *dir, bool missing_ok, bool dir_ok)
+{
+	bool		nulls[3] = {0,};
+	Datum		values[3];
+
+	directory_fctx	*fctx = (directory_fctx *) funcctx->user_fctx;
+
+	while (1) {
+		struct dirent *de;
+		char *location;
+		DIR *dirdesc;
+
+		location = fctx[1].location ? fctx[1].location : fctx[0].location;
+		dirdesc = fctx[1].dirdesc ? fctx[1].dirdesc : fctx[0].dirdesc;
+
+		while ((de = ReadDir(dirdesc, location)) != NULL)
+		{
+			char		path[MAXPGPATH * 2];
+			HeapTuple	tuple;
+			struct stat	attrib;
+
+			/* Skip hidden files */
+			if (de->d_name[0] == '.')
+continue;
+
+			/* Get the file info */
+			snprintf(path, sizeof(path), "%s/%s", location, de->d_name);
+			if (stat(path, &attrib) < 0)
+ereport(ERROR,
+		(errcode_for_file_access(),
+		 errmsg("could not stat file \"%s\": %m", path)));
+
+			/* Ignore anything but regular files or (if requested) dirs */
+			if (S_ISDIR(attrib.st_mode)) {
+/* Note: decend into dirs, but do not return a tuple for the dir itself */
+/* Do not expect dirs more than one level deep */
+if (dir_ok && !fctx[1].location) {
+	MemoryContext oldcontext;
+	oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
+	fctx[1].location = pstrdup(path);
+	fctx[1].dirdesc = AllocateDir(path);
+	MemoryContextSwitchTo(oldcontext);
+	return pg_ls_dir_files_recurs

Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly

2020-03-09 Thread Justin Pryzby
On Sat, Feb 29, 2020 at 08:53:04AM -0600, Justin Pryzby wrote:
> On Sat, Feb 29, 2020 at 03:35:27PM +0300, Alexey Kondratov wrote:
> > Anyway, new version is attached. It is rebased in order to resolve conflicts
> > with a recent fix of REINDEX CONCURRENTLY + temp relations, and includes
> > this small comment fix.
> 
> Thanks for rebasing - I actually started to do that yesterday.
> 
> I extracted the bits from your original 0001 patch which handled CLUSTER and
> VACUUM FULL.  I don't think there's any interest in combining that with
> ALTER anymore.  On another thread (1), I tried to implement that, and Tom
> pointed out a problem with the implementation, but also didn't like the idea.
> 
> I'm including some proposed fixes, but didn't yet update the docs, errors or
> tests for that.  (I'm including your v8 untouched in hopes of not messing up
> the cfbot).  My fixes avoid an issue if you try to REINDEX onto pg_default, I
> think due to moving system toast indexes.

I was able to avoid this issue by adding a call to GetNewRelFileNode, even
though that's already called by RelationSetNewRelfilenode().  Not sure if
there's a better way, or if it's worth going back to Alexey's v3 patch, which added a
tablespace param to RelationSetNewRelfilenode.

The current logic allows moving all the indexes and toast indexes, but I think
we should check IsSystemRelation() unless allow_system_table_mods is set, like the
existing behavior of ALTER (see the sketch after the example below).

template1=# ALTER TABLE pg_extension_oid_index SET tablespace pg_default;
ERROR:  permission denied: "pg_extension_oid_index" is a system catalog
template1=# REINDEX INDEX pg_extension_oid_index TABLESPACE pg_default;
REINDEX
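
That is, a sketch of the check being suggested, mirroring ALTER TABLE's error
message; the placement and the set_tablespace flag are hypothetical:

	if (set_tablespace &&
		IsSystemRelation(rel) && !allowSystemTableMods)
		ereport(ERROR,
				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
				 errmsg("permission denied: \"%s\" is a system catalog",
						RelationGetRelationName(rel))));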

Finally, I think the CLUSTER patch is missing permission checks.  It looks like
relation_is_movable was factored out, but I don't see how that helps?

Alexey, I'm hoping to hear back if you think these changes are ok or if you'll
publish a new version of the patch addressing the crash I reported.
Or if you're too busy, maybe someone else can adopt the patch (I can help).

-- 
Justin
>From 259e84e31b5ef0348987036ebc8ef3cc1ba85aa9 Mon Sep 17 00:00:00 2001
From: Alexey Kondratov 
Date: Sat, 29 Feb 2020 15:35:27 +0300
Subject: [PATCH v10 1/5] Allow REINDEX to change tablespace

>From d2b7a5fa2e11601759b47af0c142a7824ef907a2 Mon Sep 17 00:00:00 2001
From: Alexey Kondratov 
Date: Mon, 30 Dec 2019 20:00:37 +0300
Subject: [PATCH v8] Allow REINDEX to change tablespace

REINDEX already does full relation rewrite, this patch adds a
possibility to specify a new tablespace where new relfilenode
will be created.
---
 doc/src/sgml/ref/reindex.sgml | 24 +-
 src/backend/catalog/index.c   | 75 --
 src/backend/commands/cluster.c|  2 +-
 src/backend/commands/indexcmds.c  | 96 ---
 src/backend/commands/tablecmds.c  |  2 +-
 src/backend/nodes/copyfuncs.c |  1 +
 src/backend/nodes/equalfuncs.c|  1 +
 src/backend/parser/gram.y | 14 ++--
 src/backend/tcop/utility.c|  6 +-
 src/bin/psql/tab-complete.c   |  6 ++
 src/include/catalog/index.h   |  7 +-
 src/include/commands/defrem.h |  6 +-
 src/include/nodes/parsenodes.h|  1 +
 src/test/regress/input/tablespace.source  | 49 
 src/test/regress/output/tablespace.source | 66 
 15 files changed, 323 insertions(+), 33 deletions(-)

diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index c54a7c420d..0628c94bb1 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
 
  
 
-REINDEX [ ( option [, ...] ) ] { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } [ CONCURRENTLY ] name
+REINDEX [ ( option [, ...] ) ] { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } [ CONCURRENTLY ] name [ TABLESPACE new_tablespace ]
 
 where option can be one of:
 
@@ -174,6 +174,28 @@ REINDEX [ ( option [, ...] ) ] { IN
 

 
+   
+TABLESPACE
+
+ 
+  This specifies a tablespace, where all rebuilt indexes will be created.
+  Cannot be used with "mapped" relations. If SCHEMA,
+  DATABASE or SYSTEM is specified, then
+  all unsuitable relations will be skipped and a single WARNING
+  will be generated.
+ 
+
+   
+
+   
+new_tablespace
+
+ 
+  The name of the specific tablespace to store rebuilt indexes.
+ 
+
+   
+

 VERBOSE
 
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 7223679033..3d98e9164a 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -1235,9 +1235,13 @@ index_create(Relation heapRelation,
  * Create concurrently an index based on the definition of the one provided by
  * caller.  The index is inserted into catalogs and needs to be built later
  * on.  This

Re: pg11+: pg_ls_*dir LIMIT 1: temporary files .. not closed at end-of-transaction

2020-03-12 Thread Justin Pryzby
On Sun, Mar 08, 2020 at 04:30:44PM -0400, Tom Lane wrote:
> BTW, another thing I noticed while looking around is that some of
> the functions using SRF_RETURN_DONE() think they should clean up
> memory beforehand.  This is a waste of code/cycles, as long as the
> memory was properly allocated in funcctx->multi_call_memory_ctx,
> because funcapi.c takes care of deleting that context.
> 
> We should probably document that *any* manual cleanup before
> SRF_RETURN_DONE() is an antipattern.  If you have to have cleanup,
> it needs to be done via RegisterExprContextCallback instead.

This part appears to be already in place since
e4186762ffaa4188e16702e8f4f299ea70988b96:

|The memory context that is current when the SRF is called is a transient
|context that will be cleared between calls. This means that you do not need to
|call pfree on everything you allocated using palloc; it will go away anyway.
|However, if you want to allocate any data structures to live across calls, you
|need to put them somewhere else. The memory context referenced by
|multi_call_memory_ctx is a suitable location for any data that needs to survive
|until the SRF is finished running. In most cases, this means that you should
|switch into multi_call_memory_ctx while doing the first-call setup.

-- 
Justin




Re: pg11+: pg_ls_*dir LIMIT 1: temporary files .. not closed at end-of-transaction

2020-03-12 Thread Justin Pryzby
On Wed, Mar 11, 2020 at 03:32:38PM -0400, Tom Lane wrote:
> > I patched this one to see what it looks like and to allow /hopefully/ moving
> > forward one way or another with the pg_ls_tmpfile() patch set (or at least
> > avoid trying to do anything there which is too inconsistent with this fix).
> 
> I reviewed this, added some test cases, and pushed it, so that we can see

Thanks, tests were on my periphery..

|In passing, fix bogus error report for stat() failure: it was
|whining about the directory when it should be fingering the
|individual file.  Doubtless a copy-and-paste error.

Thanks again ; that was my 0001 patch on the other thread.  No rebase conflict
even ;)
https://www.postgresql.org/message-id/20191228101650.GG12890%40telsasoft.com

> Do you want to have a go at that?

First draft attached.  Note that I handled pg_ls_dir, even though I'm proposing
on the other thread to collapse/merge/meld it with pg_ls_dir_files [0].
Possibly that's a bad idea with tuplestore, due to returning a scalar vs a row
and needing to conditionally call CreateTemplateTupleDesc vs
get_call_result_type.  I'll rebase that patch later today.
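
That is, a merged function would need a conditional along these lines
(returns_composite is a made-up flag, just to illustrate the two tupdesc paths):

	TupleDesc	tupdesc;

	if (returns_composite)
	{
		/* row-returning functions can rely on the declared return type */
		if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
			elog(ERROR, "return type must be a row type");
	}
	else
	{
		/* a scalar-returning one (like pg_ls_dir) must build its own */
		tupdesc = CreateTemplateTupleDesc(1);
		TupleDescInitEntry(tupdesc, (AttrNumber) 1, "pg_ls_dir", TEXTOID, -1, 0);
	}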

I didn't write test cases yet.  Also didn't look for functions not on your
list.

I noticed this doesn't actually do anything, but kept it for now...except in
pg_ls_dir error case:

src/include/utils/tuplestore.h:/* tuplestore_donestoring() used to be required, but is no longer used */
src/include/utils/tuplestore.h:#define tuplestore_donestoring(state)	((void) 0)

I found a few documentation bits that I think aren't relevant, but could
possibly be misread to encourage the bad coding practice.  This is about *sql*
functions:

|37.5.8. SQL Functions Returning Sets
|When an SQL function is declared as returning SETOF sometype, the function's
|final query is executed TO COMPLETION, and each row it outputs is returned as
|an element of the result set.
|...
|Set-returning functions in the select list are always evaluated as though they
|are on the inside of a nested-loop join with the rest of the FROM clause, so
|that the function(s) are run TO COMPLETION before the next row from the FROM
|clause is considered.

-- 
Justin

[0] https://www.postgresql.org/message-id/20200310183037.GA29065%40telsasoft.com
v9-0008-generalize-pg_ls_dir_files-and-retire-pg_ls_dir.patch 
>From 43e7e5a9b679a4172808b248df2bc3365b6336e4 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Wed, 11 Mar 2020 10:09:18 -0500
Subject: [PATCH] SRF: avoid leaking resources if not run to completion

Change to return a tuplestore populated immediately and returned in full.

Discussion: https://www.postgresql.org/message-id/20200308173103.GC1357%40telsasoft.com

See also: 9cb7db3f0 and 085b6b66
---
 contrib/adminpack/adminpack.c|  83 +---
 contrib/pgrowlocks/pgrowlocks.c  | 163 ++-
 doc/src/sgml/xfunc.sgml  |  17 +++-
 src/backend/utils/adt/datetime.c | 100 ---
 src/backend/utils/adt/genfile.c  | 112 +++--
 src/backend/utils/adt/misc.c | 117 --
 src/include/funcapi.h|   7 +-
 7 files changed, 306 insertions(+), 293 deletions(-)

diff --git a/contrib/adminpack/adminpack.c b/contrib/adminpack/adminpack.c
index bc45e79895..2afb999c6e 100644
--- a/contrib/adminpack/adminpack.c
+++ b/contrib/adminpack/adminpack.c
@@ -504,50 +504,57 @@ pg_logdir_ls_v1_1(PG_FUNCTION_ARGS)
 static Datum
 pg_logdir_ls_internal(FunctionCallInfo fcinfo)
 {
-	FuncCallContext *funcctx;
 	struct dirent *de;
-	directory_fctx *fctx;
+
+	ReturnSetInfo	*rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+	MemoryContext	oldcontext;
+	TupleDesc		tupdesc;
+	Tuplestorestate	*tupstore;
+	bool			randomAccess;
+	DIR*dirdesc;
+
+	/* check to see if caller supports us returning a tuplestore */
+	if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
+		ereport(ERROR,
+(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("set-valued function called in context that cannot accept a set")));
+	if (!(rsinfo->allowedModes & SFRM_Materialize))
+		ereport(ERROR,
+(errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("materialize mode required, but it is not "
+	 "allowed in this context")));
 
 	if (strcmp(Log_filename, "postgresql-%Y-%m-%d_%H%M%S.log") != 0)
 		ereport(ERROR,
 (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
  errmsg("the log_filename parameter must equal 'postgresql-%%Y-%%m-%%d_%%H%%M%%S.log'")));
 
-	if (SRF_IS_FIRSTCALL())
-	{
-		MemoryContext oldcontext;
-		TupleDesc	tupdesc;
-
-		funcctx = SRF_FIRSTCALL_INIT();
-		oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
+	/* The Tuplestore and TupleDesc should be created in ecxt_per_query_memory */
+	oldcontext = MemoryContextSwitchTo(rsinfo->econtext->ecxt_per_query_memory);
+	randomAccess = (rsinfo->allowedModes & SFRM_Materialize_Random) != 0;
+	tup

Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-09 Thread Justin Pryzby
> +++ b/src/backend/utils/misc/postgresql.conf.sample
> +#autovacuum_vacuum_insert_threshold = 1000   # min number of row inserts
> + # before vacuum

Similar to a previous comment [0] about reloptions or GUC:

Can we say "threshold number of insertions before vacuum" ?
..or "maximum number of insertions before triggering autovacuum"

-- 
Justin

[0] 
https://www.postgresql.org/message-id/602873766faa0e9200a60dcc26dc10c636761d5d.camel%40cybertec.at




Re: pg11+: pg_ls_*dir LIMIT 1: temporary files .. not closed at end-of-transaction

2020-03-11 Thread Justin Pryzby
On Sun, Mar 08, 2020 at 03:40:09PM -0400, Tom Lane wrote:
> Justin Pryzby  writes:
> > On Sun, Mar 08, 2020 at 02:37:49PM -0400, Tom Lane wrote:
> >> I guess we ought to change that function to use returns-a-tuplestore
> >> protocol instead of thinking it can hold a directory open across calls.
> > Thanks for the analysis.
> > Do you mean it should enumerate all files during the initial SRF call, or 
> > use
> > something other than the SRF_* macros ?
> It has to enumerate all the files during the first call.  I suppose it

> I've just finished scanning the source code and concluding that all
> of these functions are similarly broken:
> pg_ls_dir_files

I patched this one to see what it looks like and to allow /hopefully/ moving
forward one way or another with the pg_ls_tmpfile() patch set (or at least
avoid trying to do anything there which is too inconsistent with this fix).

> I don't see anything in the documentation (either funcapi.h or
> xfunc.sgml) warning that the function might not get run to completion,
> either ...

Also, at first glance, these seem to be passing a constant "randomAccess=true"
rather than (bool) (rsinfo->allowedModes & SFRM_Materialize_Random)

$ git grep -wl SFRM_Materialize |xargs grep -l 'tuplestore_begin_heap(true'
contrib/dblink/dblink.c
contrib/pageinspect/brinfuncs.c
contrib/pg_stat_statements/pg_stat_statements.c
src/backend/access/transam/xlogfuncs.c
src/backend/commands/event_trigger.c
src/backend/commands/extension.c
src/backend/foreign/foreign.c
src/backend/replication/logical/launcher.c
src/backend/replication/logical/logicalfuncs.c
src/backend/replication/logical/origin.c
src/backend/replication/slotfuncs.c
src/backend/replication/walsender.c
src/backend/storage/ipc/shmem.c
src/backend/utils/adt/pgstatfuncs.c
src/backend/utils/misc/guc.c
src/backend/utils/misc/pg_config.c
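
For comparison, a minimal sketch of the intended idiom (the same tuplestore/funcapi
calls used in the patch above; declarations elided):

	/* honor the caller's request: random access only when it was asked for */
	randomAccess = (rsinfo->allowedModes & SFRM_Materialize_Random) != 0;

	tupstore = tuplestore_begin_heap(randomAccess, false, work_mem);
	rsinfo->returnMode = SFRM_Materialize;
	rsinfo->setResult = tupstore;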

-- 
Justin
>From 1fd2918d65ed8cc64158e407e56ed61b44e951db Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Tue, 10 Mar 2020 21:01:47 -0500
Subject: [PATCH v1] pg_ls_dir_files: avoid leaking DIR if not run to
 completion

Change to return a tuplestore rather than leaving directory open across calls.

Discussion: https://www.postgresql.org/message-id/20200308173103.GC1357%40telsasoft.com
See also: 9cb7db3f0
---
 src/backend/utils/adt/genfile.c | 90 +++--
 1 file changed, 42 insertions(+), 48 deletions(-)

diff --git a/src/backend/utils/adt/genfile.c b/src/backend/utils/adt/genfile.c
index 5bda2af87c..046d218ffa 100644
--- a/src/backend/utils/adt/genfile.c
+++ b/src/backend/utils/adt/genfile.c
@@ -527,67 +527,63 @@ pg_ls_dir_1arg(PG_FUNCTION_ARGS)
 static Datum
 pg_ls_dir_files(FunctionCallInfo fcinfo, const char *dir, bool missing_ok)
 {
-	FuncCallContext *funcctx;
 	struct dirent *de;
-	directory_fctx *fctx;
-
-	if (SRF_IS_FIRSTCALL())
-	{
-		MemoryContext oldcontext;
-		TupleDesc	tupdesc;
+	MemoryContext oldcontext;
+	TupleDesc	tupdesc;
+	ReturnSetInfo	*rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+	bool		randomAccess = (rsinfo->allowedModes & SFRM_Materialize_Random) != 0;
+	DIR			*dirdesc;
+	Tuplestorestate *tupstore;
 
-		funcctx = SRF_FIRSTCALL_INIT();
-		oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
+	if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
+		ereport(ERROR,
+(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("set-valued function called in context that cannot accept a set")));
+	/* check to see if caller supports us returning a tuplestore */
+	if (!(rsinfo->allowedModes & SFRM_Materialize))
+		ereport(ERROR,
+(errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("materialize mode required, but it is not "
+	 "allowed in this context")));
 
-		fctx = palloc(sizeof(directory_fctx));
+	/* The Tuplestore and TupleDesc should be created in ecxt_per_query_memory */
+	oldcontext = MemoryContextSwitchTo(rsinfo->econtext->ecxt_per_query_memory);
 
-		tupdesc = CreateTemplateTupleDesc(3);
-		TupleDescInitEntry(tupdesc, (AttrNumber) 1, "name",
-		   TEXTOID, -1, 0);
-		TupleDescInitEntry(tupdesc, (AttrNumber) 2, "size",
-		   INT8OID, -1, 0);
-		TupleDescInitEntry(tupdesc, (AttrNumber) 3, "modification",
-		   TIMESTAMPTZOID, -1, 0);
-		funcctx->tuple_desc = BlessTupleDesc(tupdesc);
+	if (get_call_result_type(fcinfo, NULL, ) != TYPEFUNC_COMPOSITE)
+		elog(ERROR, "return type must be a row type");
 
-		fctx->location = pstrdup(dir);
-		fctx->dirdesc = AllocateDir(fctx->location);
+	tupstore = tuplestore_begin_heap(randomAccess, false, work_mem);
+	rsinfo->returnMode = SFRM_Materialize;
+	rsinfo->setResult = tupstore;
+	rsinfo->setDesc = tupdesc;
 
-		if (!fctx->dirdesc)
-		{
-			if (missing_ok && errno == ENOENT)
-			{
-MemoryContextSwitchTo(oldcontext);
-SRF_RETURN_DONE(funcctx);
-			}
-			else
-ereport(ERROR,
-		

Re: explain HashAggregate to report bucket and memory stats

2020-03-06 Thread Justin Pryzby
On Fri, Mar 06, 2020 at 09:58:59AM -0800, Andres Freund wrote:
> > +   ExplainIndentText(es);
> > +   appendStringInfo(es->str,
> > +   "Buckets: %ld (originally %ld)",
> > +   inst->nbuckets,
> > +   inst->nbuckets_original);
> 
> I'm not sure I like the alternative output formats here. All the other
> fields are separated with a comma, but the original size is in
> parens. I'd probably just format it as "Buckets: %lld " and then add
> ", Original Buckets: %lld" when differing.

It's done that way for consistency with hashJoin in show_hash_info().

> > +/* Update instrumentation stats */
> > +void
> > +UpdateTupleHashTableStats(TupleHashTable hashtable, bool initial)
> > +{
> > +   hashtable->instrument.nbuckets = hashtable->hashtab->size;
> > +   if (initial)
> > +   {
> > +   hashtable->instrument.nbuckets_original = 
> > hashtable->hashtab->size;
> > +   hashtable->instrument.space_peak_hash = 
> > hashtable->hashtab->size *
> > +   sizeof(TupleHashEntryData);
> > +   hashtable->instrument.space_peak_tuples = 0;
> > +   }
> > +   else
> > +   {
> > +#define maxself(a,b) a=Max(a,b)
> > +   /* hashtable->entrysize includes additionalsize */
> > +   maxself(hashtable->instrument.space_peak_hash,
> > +   hashtable->hashtab->size * 
> > sizeof(TupleHashEntryData));
> > +   maxself(hashtable->instrument.space_peak_tuples,
> > +   hashtable->hashtab->members * 
> > hashtable->entrysize);
> > +#undef maxself
> > +   }
> > +}
> 
> Not a fan of this macro.
> 
> I'm also not sure I understand what you're trying to do here?

I have to call UpdateTupleHashTableStats() from the callers at deliberate
locations.  If the caller fills the hashtable all at once, I can populate the
stats immediately after that, but if it's populated incrementally, then I need to
update the stats right before it's destroyed or reset; otherwise we could show the
tuple size of the hashtable since its most recent reset, rather than that of a
larger, previous incarnation.
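
Concretely, that means something like this right before each reset (a hypothetical
call site; the exact hashtable reference differs per node type):

	/* capture the peak sizes before the table and its tuples go away */
	UpdateTupleHashTableStats(aggstate->perhash[setno].hashtable, false);
	ResetTupleHashTable(aggstate->perhash[setno].hashtable);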

> > diff --git a/src/test/regress/expected/aggregates.out 
> > b/src/test/regress/expected/aggregates.out
> > index f457b5b150..b173b32cab 100644
> > --- a/src/test/regress/expected/aggregates.out
> > +++ b/src/test/regress/expected/aggregates.out
> > @@ -517,10 +517,11 @@ order by 1, 2;
> >   ->  HashAggregate
> > Output: s2.s2, sum((s1.s1 + s2.s2))
> > Group Key: s2.s2
> > +   Buckets: 4
> > ->  Function Scan on pg_catalog.generate_series s2
> >   Output: s2.s2
> >   Function Call: generate_series(1, 3)
> > -(14 rows)
> > +(15 rows)
> 
> These tests probably won't be portable. The number of hash buckets
> calculated will e.g. depend onthe size of the contained elements. And
> that'll e.g. will depend on whether pointers are 4 or 8 bytes.

I was aware and afraid of that.  Previously, I added this output only to
"explain analyze", and (as a quick, interim implementation) changed various
tests to use analyze, with memory shown only in "verbose" mode.  But as Tomas
pointed out, that's consistent with what's done elsewhere.

So is the solution to show stats only during explain ANALYZE ?

Or ... I have a patch to create a new explain(MACHINE) option to allow more
stable output, by avoiding Memory/Disk.  That doesn't attempt to make all
"explain analyze" output stable - there are other issues, I think mostly related
to parallel workers (see 4ea03f3f, 13e8b2ee).  But it does allow retiring
explain_sq_limit and explain_parallel_sort_stats.  I'm including my patch to
show what I mean, but I didn't enable it for the hashtable "Buckets:" line.  I guess
in either case, the tests shouldn't be included.

-- 
Justin
>From 2f6e6fda6352473da03f819eb32262a0501d746b Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Tue, 31 Dec 2019 18:49:41 -0600
Subject: [PATCH v7 1/9] explain to show tuplehash bucket and memory stats..

Note that hashed SubPlan and recursiveUnion aren't affected in explain output,
probably since hashtables aren't allocated at that point.

Discussion: https://www.postgresql.org/message-id/flat/20200103161925.gm12...@telsasoft.com
---
 .../postgres_fdw/expected/postgres_fdw.out|  56 +--
 src/backend/commands/explain.c| 137 +++--
 src/backend

Re: pg_ls_tmpdir to show directories and shared filesets

2020-03-06 Thread Justin Pryzby
On Thu, Mar 05, 2020 at 10:18:38AM -0600, Justin Pryzby wrote:
> I'm not sure if prefer the 0002 patch alone (which recurses into dirs all at
> once during the initial call), or 0002+3+4, which incrementally reads the dirs
> on each call (but requires keeping dirs opened).

I fixed an issue where leading dirs which should not have been shown were being
shown; that was easier to do in the 0004 patch, so I squished them.  I also fixed a
bug where "special" files weren't excluded and "missing_ok" wasn't effective.

> > I don't understand what purpose is served by having pg_ls_waldir() hide
> > directories.
> 
> We could talk about whether the other functions should show dirs, if it's 
> worth
> breaking their return type.  Or if they should show hidden or special files,
> which doesn't require breaking the return.  But until then I am to leave the
> behavior alone.

I don't see why any of the functions would exclude dirs, but ls_tmpdir deserves
to be fixed since core postgres dynamically creates dirs there.

Also ... I accidentally changed the behavior: master not only doesn't descend
into dirs, it hides them - that was my original complaint.  I propose to *also*
change at least tmpdir and logdir to show dirs, but not descend into them.  I left
waldir alone for now.

Since v12 ls_tmpdir, and since v10 logdir and waldir, exclude dirs, I think we
should backpatch the documentation to say so.

ISTM pg_ls_tmpdir and ls_logdir should be called with missing_ok=true, since
those directories aren't created until they're used.

-- 
Justin
>From a5b9a03445d1c768662cafebd8ab3bd7a62890aa Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Fri, 27 Dec 2019 23:34:14 -0600
Subject: [PATCH v7 1/6] BUG: in errmsg

Note there's two changes here.
Should backpatch to v12, where pg_ls_tmpdir was added.
---
 src/backend/utils/adt/genfile.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/utils/adt/genfile.c b/src/backend/utils/adt/genfile.c
index 3741b87486..897b11a77d 100644
--- a/src/backend/utils/adt/genfile.c
+++ b/src/backend/utils/adt/genfile.c
@@ -590,7 +590,7 @@ pg_ls_dir_files(FunctionCallInfo fcinfo, const char *dir, bool missing_ok)
 		if (stat(path, &attrib) < 0)
 			ereport(ERROR,
 	(errcode_for_file_access(),
-	 errmsg("could not stat directory \"%s\": %m", dir)));
+	 errmsg("could not stat file \"%s\": %m", path)));
 
 		/* Ignore anything but regular files */
 		if (!S_ISREG(attrib.st_mode))
-- 
2.17.0

>From 6ea85ec0a267930320b8454a33bca368a8544a2d Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Fri, 6 Mar 2020 16:50:07 -0600
Subject: [PATCH v7 2/6] Document historic behavior about hiding directories
 and special files

Should backpatch to v10: tmpdir, waldir and archive_statusdir
---
 doc/src/sgml/func.sgml | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 323366feb6..4c0ea5ab3f 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -21450,6 +21450,7 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());
 (mtime) of each file in the log directory. By default, only superusers
 and members of the pg_monitor role can use this function.
 Access may be granted to others using GRANT.
+Filenames beginning with a dot, directories, and other special files are not shown.

 

@@ -21461,6 +21462,7 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());
 default only superusers and members of the pg_monitor role
 can use this function. Access may be granted to others using
 GRANT.
+Filenames beginning with a dot, directories, and other special files are not shown.

 

@@ -21473,6 +21475,7 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());
 superusers and members of the pg_monitor role can
 use this function. Access may be granted to others using
 GRANT.
+Filenames beginning with a dot, directories, and other special files are not shown.
    
 

-- 
2.17.0

>From 5250d637493627f1ff3587bc73dd598bc1ca3ffc Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Fri, 6 Mar 2020 17:12:04 -0600
Subject: [PATCH v7 3/6] Document historic behavior about hiding directories
 and special files

Should backpatch to v12: tmpdir
---
 doc/src/sgml/func.sgml | 1 +
 1 file changed, 1 insertion(+)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 4c0ea5ab3f..fc4d7f0f78 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -21489,6 +21489,7 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());
 default only superusers and members of the pg_monitor
 role can use this function.  Access may be granted to others using
 GRANT.
+Filenames beginning with a dot, directories, and other special files are not shown.

 

-- 
2.17.0

>From 8d00a1c80679d9a754a3988786e70c0385b46b30 Mon Sep 17 00:00:00 2001
F

pg11+: pg_ls_*dir LIMIT 1: temporary files .. not closed at end-of-transaction

2020-03-08 Thread Justin Pryzby
While working on a patch, I noticed this pre-existing behavior, which seems to
be new since v11, maybe due to changes to SRF.

|postgres=# SELECT pg_ls_dir('.') LIMIT 1;
|WARNING:  1 temporary files and directories not closed at end-of-transaction
|pg_ls_dir | pg_dynshmem

|postgres=# SELECT pg_ls_waldir() LIMIT 1;
|WARNING:  1 temporary files and directories not closed at end-of-transaction
|-[ RECORD 1 ]+-
|pg_ls_waldir | (00013192007B,16777216,"2020-03-08 03:50:34-07")


Note, that doesn't happen with "SELECT * FROM".

I'm not sure what the solution is to that, but my patch was going to make it
worse rather than better for pg_ls_tmpdir.

-- 
Justin




Re: pg11+: pg_ls_*dir LIMIT 1: temporary files .. not closed at end-of-transaction

2020-03-08 Thread Justin Pryzby
On Sun, Mar 08, 2020 at 02:37:49PM -0400, Tom Lane wrote:
> Justin Pryzby  writes:
> > While working on a patch, I noticed this pre-existing behavior, which seems 
> > to
> > be new since v11, maybe due to changes to SRF.
> 
> > |postgres=# SELECT pg_ls_dir('.') LIMIT 1;
> > |WARNING:  1 temporary files and directories not closed at 
> > end-of-transaction
> 
> Hmm, actually it looks to me like pg_ls_dir has been broken forever.
> The reason the warning didn't show up before v11 is that CleanupTempFiles
> didn't bleat about leaked "allocated" directories before that
> (cf 9cb7db3f0).
> 
> I guess we ought to change that function to use returns-a-tuplestore
> protocol instead of thinking it can hold a directory open across calls.
> It's not hard to think of use-cases where the existing behavior would
> cause issues worse than a nanny-ish WARNING, especially on platforms
> with tight "ulimit -n" limits.

Thanks for the analysis.

Do you mean it should enumerate all files during the initial SRF call, or use
something other than the SRF_* macros ?

-- 
Justin




Re: pg_ls_tmpdir to show directories and shared filesets

2020-03-07 Thread Justin Pryzby
On Sat, Mar 07, 2020 at 03:14:37PM +0100, Fabien COELHO wrote:
> Some feedback about the v7 patch set.

Thanks for looking again

> About v7.1, seems ok.
> 
> About v7.2 & v7.3 seems ok, altought the two could be merged.

These are separate since I proprose that one should be backpatched to v12 and
the other to v10.

> About v7.4:
...
> It seems that lists are used as FIFO structures by appending, fetching &
> deleting last, all of which are O(n). ISTM it would be better to use the
> head of the list by inserting, getting and deleting first, which are O(1).

I think you're referring to linked lists, but pglists are now arrays, for which
that's backwards.  See 1cff1b95a and d97b714a2.  For example, list_delete_last
says:
 * This is the opposite of list_delete_first(), but is noticeably cheaper
 * with a long list, since no data need be moved.
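
In other words, with the array representation, using the List as a stack is the
cheap direction; a tiny sketch (fctx_list and subdir_fctx are illustrative names
only):

	/* push a sub-directory context: O(1) amortized */
	fctx_list = lappend(fctx_list, subdir_fctx);

	/* work on the most recently pushed one: O(1) */
	directory_fctx *fctx = llast(fctx_list);

	/* pop it when done: O(1), no data moved */
	fctx_list = list_delete_last(fctx_list);

	/* by contrast, list_delete_first() has to shift every remaining element */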

> ISTM that several instances of: "pg_ls_dir_files(..., true, false);" should
> be "pg_ls_dir_files(..., true, DIR_HIDE);".

Oops, that affects an intermediate commit and maybe due to merge conflict.
Thanks.

> About v7.5 looks like a doc update which should be merged with v7.4.

No, v7.5 updates pg_proc.dat and changes the return type of two functions.
It's a short commit since all the infrastructure is implemented to make the
functions do whatever we want.  But it's deliberately separate since I'm
proposing a breaking change, and one that hasn't been discussed until now.

> Alas, ISTM that there are no tests on any of these functions:-(

Yeah.  Everything that includes any output is going to include timestamps;
those could be filtered out.  waldir is going to have random filenames, and a
differing number of rows.  But we should exercise pg_ls_dir_files at least
once..
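
For example, a couple of queries which exercise the functions while discarding
the unstable names, sizes and timestamps (only a sketch, not necessarily what
the eventual tests should look like):

-- Only check that the functions run; their output is not stable across runs.
SELECT count(*) >= 0 AS ran FROM pg_ls_waldir();
SELECT count(*) >= 0 AS ran FROM pg_ls_archive_statusdir();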

My previous version had a bug with ignore_missing with pg_ls_tmpdir, which
would've been caught by a test like:
SELECT FROM pg_ls_tmpdir() WHERE name='Does not exist'; -- Never true, so the 
function runs to completion but returns zero rows.

The 0006 commit changes that for logdir, too.  Without 0006, that will ERROR if
the dir doesn't exist (which I think would be the default during regression
tests).

It'd be nice to run pg_ls_tmpdir before the tmpdir exists, and again
afterwards.  But I'm having trouble finding a single place to put it.  The
closest I can find is dbsize.sql.  Any ideas ?

-- 
Justin




Re: Memory-Bounded Hash Aggregation

2020-03-12 Thread Justin Pryzby
On Wed, Mar 11, 2020 at 11:55:35PM -0700, Jeff Davis wrote:
>  * tweaked EXPLAIN output some more
> Unless I (or someone else) finds something significant, this is close
> to commit.

Thanks for working on this ; I finally made a pass over the patch.

+++ b/doc/src/sgml/config.sgml
+  enable_groupingsets_hash_disk 
(boolean)
+Enables or disables the query planner's use of hashed aggregation for
+grouping sets when the size of the hash tables is expected to exceed
+work_mem.  See .  Note that this setting only
+affects the chosen plan; execution time may still require using
+disk-based hash aggregation.  ...
...
+  enable_hashagg_disk (boolean)
+... This only affects the planner choice;
+execution time may still require using disk-based hash
+aggregation. The default is on.

I don't understand what's meant by "the chosen plan".
Should it say, "at execution ..." instead of "execution time" ?

+Enables or disables the query planner's use of hashed aggregation plan
+types when the memory usage is expected to exceed

Either remove "plan types" for consistency with enable_groupingsets_hash_disk,
or add it there.  Maybe it should say "when the memory usage would OTHERWISE BE
expected to exceed.."

+show_hashagg_info(AggState *aggstate, ExplainState *es)
+{
+   Agg *agg   = (Agg *)aggstate->ss.ps.plan;
+   long memPeakKb = (aggstate->hash_mem_peak + 1023) / 1024;

I see this partially duplicates my patch [0] to show memory stats for (at
Andres' suggestion) all of execGrouping.c.  Perhaps you'd consider naming the
function something more generic in case my patch progresses ?  I'm using:
|show_tuplehash_info(HashTableInstrumentation *inst, ExplainState *es);

Mine also shows:
|ExplainPropertyInteger("Original Hash Buckets", NULL,
|ExplainPropertyInteger("Peak Memory Usage (hashtable)", "kB",
|ExplainPropertyInteger("Peak Memory Usage (tuples)", "kB",

[0] https://www.postgresql.org/message-id/20200306213310.GM684%40telsasoft.com

You added hash_mem_peak and hash_batches_used to struct AggState.
In my 0001 patch, I added instrumentation to struct TupleHashTable, and in my
0005 patch I move it into AggStatePerHashData and other State nodes.

+   if (from_tape)
+   partition_mem += HASHAGG_READ_BUFFER_SIZE;
+   partition_mem = npartitions * HASHAGG_WRITE_BUFFER_SIZE;

=> That looks wrong ; should say += ?

+   gettext_noop("Enables the planner's use of hashed 
aggregation plans that are expected to exceed work_mem."),

should say:
"when the memory usage is otherwise be expected to exceed.."

-- 
Justin




Re: Additional size of hash table is alway zero for hash aggregates

2020-03-12 Thread Justin Pryzby
On Thu, Mar 12, 2020 at 12:16:26PM -0700, Andres Freund wrote:
> On 2020-03-12 16:35:15 +0800, Pengzhou Tang wrote:
> > When reading the grouping sets codes, I find that the additional size of
> > the hash table for hash aggregates is always zero, this seems to be
> > incorrect to me, attached a patch to fix it, please help to check.
> 
> Indeed, that's incorrect. Causes the number of buckets for the hashtable
> to be set higher - the size is just used for that.  I'm a bit wary of
> changing this in the stable branches - could cause performance changes?

I found that it was working when Andres implemented TupleHashTable, but broke
at:

| b5635948ab Support hashed aggregation with grouping sets.

So affects v11 and v12.  entrysize isn't used for anything else.

-- 
Justin




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-03-12 Thread Justin Pryzby
Thanks for working on this.  I have some minor comments.

In 0005:

+   /* Restore the input path (we might have addes 
Sort on top). */

=> added?  There are at least two more of the same typo.

+   /* also ignore already sorted paths */

=> You say that in a couple places, but I don't think "also" makes sense since
there's nothing preceding it ?

In 0004:

+* end up resorting the entire data set.  So, unless we 
can push

=> re-sorting

+ * Unlike generate_gather_paths, this does not look just as pathkeys of the

=> look just AT ?

+   /* now we know is_sorted == false */

=> I would just spell that "Assert", as I think you already do elsewhere.

+   /* continue */

=> Please consider saying "fall through", since "continue" means exactly the
opposite.


+generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel, bool 
override_rows)
...
+   /* finally, consider incremental sort */
...
+   /* Also consider incremental sort. */

=> I think it's more confusing than useful with two comments - one is adequate.

In 0002:

+ * If it's EXPLAIN ANALYZE, show tuplesort stats for a incremental sort node
...
+ * make_incrementalsort --- basic routine to build a IncrementalSort plan node

=> AN incremental

+ * Initial size of memtuples array.  We're trying to select this size so that
+ * array don't exceed ALLOCSET_SEPARATE_THRESHOLD and overhead of allocation
+ * be possible less.  However, we don't cosider array sizes less than 1024

Four typos (?)
that array DOESN'T
and THE overhead
CONSIDER
I'm not sure, but "be possible less" should maybe say "possibly be less" ?

+   boolmaxSpaceOnDisk; /* true when maxSpace is value for 
on-disk

I suggest to call it IsMaxSpaceDisk

+   MemoryContext maincontext;  /* memory context for tuple sort 
metadata
+  that persist across multiple batches 
*/

persists

+ * a new sort.  It allows evade recreation of tuple sort (and save 
resources)
+ * when sorting multiple small batches.

allows to avoid?  Or allows avoiding?

+ *  When performing sorting by multiple keys input dataset could be already
+ *  presorted by some prefix of these keys.  We call them "presorted keys".

"already presorted" sounds redundant

+   int64   fullsort_group_count;   /* number of groups with equal 
presorted keys */
+   int64   prefixsort_group_count; /* number of groups with equal 
presorted keys */

I guess these should have different comments

-- 
Justin




Re: backend type in log_line_prefix?

2020-03-14 Thread Justin Pryzby
On Fri, Feb 21, 2020 at 10:09:38AM +0100, Peter Eisentraut wrote:
> From 75ac8ed0c47801712eb2aa300d9cb29767d2e121 Mon Sep 17 00:00:00 2001
> From: Peter Eisentraut 
> Date: Thu, 20 Feb 2020 18:16:39 +0100
> Subject: [PATCH v2 3/4] Add backend type to csvlog and optionally 
> log_line_prefix

> diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
> index c1128f89ec..206778b1c3 100644
> --- a/doc/src/sgml/config.sgml
> +++ b/doc/src/sgml/config.sgml
> @@ -6470,6 +6470,11 @@ What to Log

  characters are copied straight to the log line. Some escapes are
  only recognized by session processes, and will be treated as empty by
  background processes such as the main server process. Status
...
  Escape
  Effect
  Session only

>   Application name
>   yes
>  
> +
> + %b
> + Backend process type
> + yes

=> should say "no", it's not blank for background processes:

> +
> + if (MyProcPid == PostmasterPid)
> + backend_type_str = "postmaster";
> + else if (MyBackendType == B_BG_WORKER)
> + backend_type_str = 
> MyBgworkerEntry->bgw_type;
> + else
> + backend_type_str = 
> pgstat_get_backend_desc(MyBackendType);

-- 
Justin




Re: backend type in log_line_prefix?

2020-03-10 Thread Justin Pryzby
On Thu, Feb 13, 2020 at 06:43:32PM +0900, Fujii Masao wrote:
> If we do this, backend type should be also included in csvlog?

+1, I've been missing that

Note, this patch seems to correspond to:
b025f32e0b Add leader_pid to pg_stat_activity

I had mentioned privately to Julien missing this info in CSV log.

Should leader_pid be exposed instead (or in addition)?  Or backend_type be a
positive number giving the leader's PID if it's a parallel worker, or some
special negative number like -BackendType to indicate a nonparallel worker.
NULL for a B_BACKEND which is not a parallel worker.

My hope is to answer questions like these:

. is query (ever? usually?) using parallel paths?
. is query usefully using parallel paths?
. what queries are my max_parallel_workers(_per_gather) being used for ?
. Are certain longrunning or frequently running queries which are using
  parallel paths using all max_parallel_workers and precluding other queries
  from using parallel query ?  Or, are semi-short queries sometimes precluding
  longrunning queries from using parallelism, when the long queries would
  better benefit ?

I think this patch alone wouldn't provide that, and there'd need to either be a
line logged for each worker.  Maybe it'd log full query+details (ugh), or just
log "parallel worker of pid...".  Or maybe there'd be a new column with which
the leader would log nworkers (workers planned vs workers launched - I would
*not* want to get this out of autoexplain).

-- 
Justin




Re: pg_ls_tmpdir to show directories and shared filesets (and pg_ls_*)

2020-03-10 Thread Justin Pryzby
I took a step back, and I wondered whether we should add a generic function for
listing a dir with metadata, possibly instead of changing the existing
functions.  Then one could do pg_ls_dir_metadata('pg_wal',false,false);

Since pg8.1, we have pg_ls_dir() to show a list of files.  Since pg10, we've
had pg_ls_logdir and pg_ls_waldir, which show not only file names but also
(some) metadata (size, mtime).  And since pg12, we've had pg_ls_tmpfile and
pg_ls_archive_statusdir, which also show metadata.

...but there's no function which lists the metadata of a directory other
than tmp, wal, or log.

One can do this:
|SELECT b.*, c.* FROM (SELECT 'base' a)a, LATERAL (SELECT 
a||'/'||pg_ls_dir(a.a)b)b, pg_stat_file(b)c;
..but that's not as helpful as allowing:
|SELECT * FROM pg_ls_dir_metadata('.',true,true);

There's also no function which recurses into an arbitrary directory, so it
seems shortsighted to provide a function to recursively list a tmpdir.

Also, since pg_ls_dir_metadata indicates whether the path is a dir, one can
write a SQL function to show the dir recursively.  It'd be trivial to plug in
wal/log/tmp (it seems like tmpdirs of other tablespaces are not entirely
trivial).
|SELECT * FROM pg_ls_dir_recurse('base/pgsql_tmp');
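
For example, a rough sketch of how such a wrapper could look (the argument list
and output columns of pg_ls_dir_metadata here are assumptions, based on the
pg_ls_dir_metadata('pg_wal',false,false) example above and on pg_stat_file's
column names, so the real patch may differ):

CREATE FUNCTION pg_ls_dir_recurse(dir text)
RETURNS TABLE (path text, size bigint, modification timestamptz, isdir bool)
LANGUAGE sql AS $$
  WITH RECURSIVE ls(path, size, modification, isdir) AS (
    -- entries directly under the given directory
    SELECT dir || '/' || d.name, d.size, d.modification, d.isdir
      FROM pg_ls_dir_metadata(dir, true, false) AS d
    UNION ALL
    -- descend into anything flagged as a directory
    SELECT ls.path || '/' || d.name, d.size, d.modification, d.isdir
      FROM ls, LATERAL pg_ls_dir_metadata(ls.path, true, false) AS d
     WHERE ls.isdir
  )
  SELECT * FROM ls
$$;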

Also, on a neighboring thread[1], Tom indicated that the pg_ls_* functions
should enumerate all files during the initial call, which sounds like a bad
idea when recursively showing directories.  If we add a function recursing into
a directory, we'd need to discuss all the flags to expose to it, like recurse,
ignore_errors, one_filesystem?, show_dotfiles (and eventually bikeshed all the
rest of the flags in find(1)).

My initial patch [2] changed ls_tmpdir to show metadata columns including
is_dir, but not descend.  It's pretty unfortunate if a function called
pg_ls_tmpdir hides shared filesets, so maybe it really is best to change that
(it's new in v12).

I'm interested in feedback on the alternative approach, as attached.  The
final patch, which includes all the rest of the columns shown by pg_stat_file(),
is more of an idea/proposal, and I'm not sure it'll be desirable.  But pg_ls_tmpdir() is
essentially the same as my v1 patch.

This is intended to be mostly independent of any fix to the WARNING I reported
[1].  Since my patch collapses pg_ls_dir into pg_ls_dir_files, we'd only need
to fix one place.  I'm planning to eventually look into Tom's suggestion of
returning tuplestore to fix that, and maybe rebase this patchset on top of
that.

-- 
Justin

[1] 
https://www.postgresql.org/message-id/flat/20200308173103.GC1357%40telsasoft.com
[2] https://www.postgresql.org/message-id/20191214224735.GA28433%40telsasoft.com
>From 2c4b2c408490ecde3cfb4e336a78942f7a6f8197 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Fri, 27 Dec 2019 23:34:14 -0600
Subject: [PATCH v9 01/11] BUG: in errmsg

Note there's two changes here.
Should backpatch to v12, where pg_ls_tmpdir was added.
---
 src/backend/utils/adt/genfile.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/utils/adt/genfile.c b/src/backend/utils/adt/genfile.c
index 3741b87486..897b11a77d 100644
--- a/src/backend/utils/adt/genfile.c
+++ b/src/backend/utils/adt/genfile.c
@@ -590,7 +590,7 @@ pg_ls_dir_files(FunctionCallInfo fcinfo, const char *dir, bool missing_ok)
 		if (stat(path, &attrib) < 0)
 			ereport(ERROR,
 	(errcode_for_file_access(),
-	 errmsg("could not stat directory \"%s\": %m", dir)));
+	 errmsg("could not stat file \"%s\": %m", path)));
 
 		/* Ignore anything but regular files */
 		if (!S_ISREG(attrib.st_mode))
-- 
2.17.0

>From f3ef0c6ff664f2f26e95ce97e8b50a813bd1aab8 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Fri, 6 Mar 2020 16:50:07 -0600
Subject: [PATCH v9 02/11] Document historic behavior about hiding directories
 and special files

Should backpatch to v10: tmpdir, waldir and archive_statusdir
---
 doc/src/sgml/func.sgml | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 323366feb6..4c0ea5ab3f 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -21450,6 +21450,7 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());
 (mtime) of each file in the log directory. By default, only superusers
 and members of the pg_monitor role can use this function.
 Access may be granted to others using GRANT.
+Filenames beginning with a dot, directories, and other special files are not shown.

 

@@ -21461,6 +21462,7 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());
 default only superusers and members of the pg_monitor role
 can use this function. Access may be granted to others using
 GRANT.
+Filenames beginning with a dot, directories, and other special files are not shown.

 

@@ -21473,6 +21475,7 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());
 superuser

Re: backend type in log_line_prefix?

2020-03-11 Thread Justin Pryzby
On Tue, Mar 10, 2020 at 02:01:42PM -0500, Justin Pryzby wrote:
> On Thu, Feb 13, 2020 at 06:43:32PM +0900, Fujii Masao wrote:
> > If we do this, backend type should be also included in csvlog?
> 
> +1, I've been missing that
> 
> Note, this patch seems to correspond to:
> b025f32e0b Add leader_pid to pg_stat_activity
> 
> I had mentioned privately to Julien missing this info in CSV log.
> 
> Should leader_pid be exposed instead (or in addition)?  Or backend_type be a

I looked more closely and played with the patch.

Can I suggest:

$ git diff
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index 3a6f7f9456..56e0a1437e 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -2945,7 +2945,7 @@ write_csvlog(ErrorData *edata)
if (MyProcPid == PostmasterPid)
appendCSVLiteral(&buf, "postmaster");
else if (MyBackendType == B_BG_WORKER)
-   appendCSVLiteral(&buf, MyBgworkerEntry->bgw_type);
+   appendCSVLiteral(&buf, MyBgworkerEntry->bgw_name);
else
appendCSVLiteral(&buf, pgstat_get_backend_desc(MyBackendType));


Then it logs the leader:
|2020-03-11 13:16:05.596 CDT,,,16289,,5e692ae3.3fa1,1,,2020-03-11 13:16:03 
CDT,4/3,0,LOG,0,"temporary file: path ""base/pgsql_tmp/pgsql_tmp16289.0"", 
size 4276224",,"explain analyze SELECT * FROM t a JOIN t b USING(i) WHERE 
i>999 GROUP BY 1;",,,"psql","parallel worker for PID 16210"

It'll be easy enough to extract the leader and join that ON leader=pid.
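
Something like this, for example (a sketch, assuming the csvlog has been loaded
into a table following the documented postgres_log layout, with the new
backend_type column at the end):

-- Pull the leader's PID out of each parallel worker's backend_type field;
-- this can then be joined back against postgres_log.process_id.
SELECT process_id AS worker_pid,
       substring(backend_type FROM 'parallel worker for PID (\d+)')::int AS leader_pid,
       message
  FROM postgres_log
 WHERE backend_type LIKE 'parallel worker for PID %';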

> I think this patch alone wouldn't provide that, and there'd need to either be 
> a
> line logged for each worker.  Maybe it'd log full query+details (ugh), or just
> log "parallel worker of pid...".  Or maybe there'd be a new column with which
> the leader would log nworkers (workers planned vs workers launched - I would
> *not* want to get this out of autoexplain).

I'm still not sure how to do that, though.
I see I can get what's needed at DEBUG1:

|2020-03-11 13:50:58.304 CDT,,,16196,,5e692aa7.3f44,22,,2020-03-11 13:15:03 
CDT,,0,DEBUG,0,"registering background worker ""parallel worker for PID 
16210""","","postmaster"

But I don't think it's viable to run for very long with log_statement=all,
log_min_messages=DEBUG.

-- 
Justin




Re: DETACH PARTITION and FOR EACH ROW triggers on partitioned tables

2020-04-08 Thread Justin Pryzby
On Wed, Apr 08, 2020 at 12:02:39PM -0400, Alvaro Herrera wrote:
> On 2020-Apr-08, Justin Pryzby wrote:
> 
> > This seems to be a bug in master, v12, and (probably) v11, where "FOR EACH 
> > ROW"
> > was first allowed on partition tables (86f575948).
> > 
> > I thought this would work like partitioned indexes (8b08f7d48), where 
> > detaching
> > a partition makes its index non-inherited, and attaching a partition marks a
> > pre-existing, matching partition as inherited rather than creating a new 
> > one.
> 
> Hmm.  Let's agree to what behavior we want, and then we implement that.
> It seems to me there are two choices:
> 
> 1. on detach, keep the trigger but make it independent of the trigger on
> parent.  (This requires that the trigger is made dependent on the
> trigger on parent, if the table is attached as partition again;
> otherwise you'd end up with multiple copies of the trigger if you
> detach/attach multiple times).
> 
> 2. on detach, remove the trigger from the partition.
> 
> I think (2) is easier to implement, but (1) is the more convenient
> behavior.

At telsasoft, we don't care (we uninherit tables before ALTERing parents to
avoid disruptive locking and to avoid worst-case disk use).

(1) is consistent with the behavior for indexes, which is a slight advantage
for users' ability to understand and keep track of the behavior.  But adding
triggers is pretty different so I'm not sure it's a totally compelling
parallel.

-- 
Justin




Re: Vacuum o/p with (full 1, parallel 0) option throwing an error

2020-04-08 Thread Justin Pryzby
On Wed, Apr 08, 2020 at 11:57:08AM -0400, Robert Haas wrote:
> On Wed, Apr 8, 2020 at 10:25 AM Mahendra Singh Thalor
>  wrote:
> > I think, Tushar point is that either we should allow both
> > vacuum(parallel 0, full 1) and vacuum(parallel 1, full 0) or in the
> > both cases, we should through error.
> 
> Oh, yeah, good point. Somebody must not've been careful enough with
> the options-checking code.

Actually I think someone was too careful.
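
To be explicit about the behavior I think we want (t here is just an arbitrary
example table):

VACUUM (FULL, PARALLEL 0) t;  -- accepted: parallelism explicitly disabled
VACUUM (FULL, PARALLEL 2) t;  -- still an error: FULL can't run in parallel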

>From 9256cdb0a77fb33194727e265a346407921055ef Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Wed, 8 Apr 2020 11:38:36 -0500
Subject: [PATCH v1] parallel vacuum: options check to use same test as in
 vacuumlazy.c

---
 src/backend/commands/vacuum.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 351d5215a9..660c854d49 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -104,7 +104,6 @@ ExecVacuum(ParseState *pstate, VacuumStmt *vacstmt, bool isTopLevel)
 	bool		freeze = false;
 	bool		full = false;
 	bool		disable_page_skipping = false;
-	bool		parallel_option = false;
 	ListCell   *lc;
 
 	/* Set default value */
@@ -145,7 +144,6 @@ ExecVacuum(ParseState *pstate, VacuumStmt *vacstmt, bool isTopLevel)
 			params.truncate = get_vacopt_ternary_value(opt);
 		else if (strcmp(opt->defname, "parallel") == 0)
 		{
-			parallel_option = true;
 			if (opt->arg == NULL)
 			{
 ereport(ERROR,
@@ -199,7 +197,7 @@ ExecVacuum(ParseState *pstate, VacuumStmt *vacstmt, bool isTopLevel)
 		   !(params.options & (VACOPT_FULL | VACOPT_FREEZE)));
 	Assert(!(params.options & VACOPT_SKIPTOAST));
 
-	if ((params.options & VACOPT_FULL) && parallel_option)
+	if ((params.options & VACOPT_FULL) && params.nworkers > 0)
 		ereport(ERROR,
 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
  errmsg("cannot specify both FULL and PARALLEL options")));
-- 
2.17.0



DETACH PARTITION and FOR EACH ROW triggers on partitioned tables

2020-04-08 Thread Justin Pryzby
This seems to be a bug in master, v12, and (probably) v11, where "FOR EACH ROW"
was first allowed on partition tables (86f575948).

I thought this would work like partitioned indexes (8b08f7d48), where detaching
a partition makes its index non-inherited, and attaching a partition marks a
pre-existing, matching partition as inherited rather than creating a new one.

DROP TABLE t, t1;
CREATE TABLE t(i int)PARTITION BY RANGE(i);
CREATE TABLE t1 PARTITION OF t FOR VALUES FROM(1)TO(2);
CREATE OR REPLACE FUNCTION trigf() RETURNS trigger LANGUAGE plpgsql AS $$ BEGIN 
END $$;
CREATE TRIGGER trig AFTER INSERT ON t FOR EACH ROW EXECUTE FUNCTION trigf();
SELECT tgrelid::regclass, * FROM pg_trigger WHERE tgrelid='t1'::regclass;
ALTER TABLE t DETACH PARTITION t1;
ALTER TABLE t ATTACH PARTITION t1 FOR VALUES FROM (1)TO(2);
ERROR:  trigger "trig" for relation "t1" already exists

DROP TRIGGER trig ON t1;
ERROR:  cannot drop trigger trig on table t1 because trigger trig on table t 
requires it
HINT:  You can drop trigger trig on table t instead.

I remember these, but they don't seem to be relevant to this issue, which seems
to be independent.

1fa846f1c9 Fix cloning of row triggers to sub-partitions
b9b408c487 Record parents of triggers

The commit for partitioned indexes talks about using a pre-existing index on
the child as a "convenience gadget", puts indexes into pg_inherits, and
introduces "ALTER INDEX..ATTACH PARTITION" and "CREATE INDEX..ON ONLY".

It's probably rare for a duplicate index to be useful (unless rebuilding to be
more optimal, which is probably not reasonably interspersed with altering
inheritance).  But I don't know if that's equally true for triggers.  So I'm
not sure what the intended behavior is, and I've stopped after implementing
a partial fix.
>From 2c31cac22178d904ee108b77f316886d1e2f6288 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Fri, 3 Apr 2020 22:43:26 -0500
Subject: [PATCH] WIP: fix detaching tables with inherited triggers

---
 src/backend/commands/tablecmds.c | 33 
 1 file changed, 33 insertions(+)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 037d457c3d..10a60e158f 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -16797,6 +16797,39 @@ ATExecDetachPartition(Relation rel, RangeVar *name)
 	}
 	table_close(classRel, RowExclusiveLock);
 
+	/* detach triggers too */
+	{
+		/* XXX: relcache.c */
+		ScanKeyData skey;
+		SysScanDesc	scan;
+		HeapTuple	trigtup;
+		Relation	tgrel = table_open(TriggerRelationId, RowExclusiveLock);
+
+		ScanKeyInit(&skey, Anum_pg_trigger_tgrelid, BTEqualStrategyNumber,
+F_OIDEQ, ObjectIdGetDatum(RelationGetRelid(partRel)));
+
+		scan = systable_beginscan(tgrel, TriggerRelidNameIndexId,
+true, NULL, 1, &skey);
+
+		while (HeapTupleIsValid(trigtup = systable_getnext(scan)))
+		{
+			Form_pg_trigger pg_trigger;
+			trigtup = heap_copytuple(trigtup);  /* need a modifiable copy */
+			pg_trigger = (Form_pg_trigger) GETSTRUCT(trigtup);
+			/* Set the trigger's parent to Invalid */
+			if (!OidIsValid(pg_trigger->tgparentid))
+continue;
+			if (!pg_trigger->tgisinternal)
+continue;
+			pg_trigger->tgparentid = InvalidOid;
+			pg_trigger->tgisinternal = false;
+			CatalogTupleUpdate(tgrel, &trigtup->t_self, trigtup);
+			heap_freetuple(trigtup);
+		}
+		systable_endscan(scan);
+		table_close(tgrel, RowExclusiveLock);
+	}
+
 	/*
 	 * Detach any foreign keys that are inherited.  This includes creating
 	 * additional action triggers.
-- 
2.17.0



doc review for v13

2020-04-08 Thread Justin Pryzby
I reviewed docs for v13, like:
git log --cherry-pick origin/master...origin/REL_12_STABLE -p doc

I did something similar for v12 [0].  I've included portions of that here which
still seem lacking 12 months later (but I'm not intending to continue defending
each individual patch hunk).

I previously mailed separately about a few individual patches, some of which
have separate, ongoing discussion and aren't included here (incr sort, parallel
vacuum).

Justin

[0] 
https://www.postgresql.org/message-id/flat/20190709161256.GH22387%40telsasoft.com#56889b868e5886e36b90e9f5a1165186
>From 482b590355cd7df327602dd36e91721b827f9c37 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Sun, 29 Mar 2020 19:31:04 -0500
Subject: [PATCH v1 01/19] docs: pg_statistic_ext.stxstattarget

commit c31132d87c6315bbbe4b4aa383705aaae2348c0e
Author: Tomas Vondra 
Date:   Wed Mar 18 16:48:12 2020 +0100
---
 doc/src/sgml/catalogs.sgml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 386c6d7bd1..ce33df9e58 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -6472,7 +6472,7 @@ SCRAM-SHA-256$iteration count:
.
A zero value indicates that no statistics should be collected.
A negative value says to use the system default statistics target.
-   Positive values stxstattarget
+   Positive values of stxstattarget
determine the target number of most common values
to collect.
   
-- 
2.17.0

>From 2a3a4d7028b02070447fafd37e66e72da59966bf Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Sun, 29 Mar 2020 19:33:44 -0500
Subject: [PATCH v1 02/19] docs: reg* functions

commit 8408e3a557ad26a7e88f867a425b2b9a86c4fa04
Author: Peter Eisentraut 
Date:   Wed Mar 18 14:51:37 2020 +0100
---
 doc/src/sgml/func.sgml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index a38387b8c6..fd0f5d64b3 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -18796,7 +18796,7 @@ SELECT collation for ('foo' COLLATE "de_DE");
to_regnamespace, to_regoper,
to_regoperator, to_regrole,
to_regproc, to_regprocedure, and
-   to_regtype, functions translate relation, collation, schema,
+   to_regtype translate relation, collation, schema,
operator, role, function, and type names (given as text) to
objects of the corresponding reg* type (see  about the types).  These functions differ from a
-- 
2.17.0

>From 6864ced0a9eaeab4c010d1f090b26b337f125742 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Sun, 29 Mar 2020 19:43:42 -0500
Subject: [PATCH v1 03/19] Minus one

See also
ac862376037727e744f25030bd8b6090c707247b
---
 doc/src/sgml/config.sgml | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index a0da4aabac..ea2749535d 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -6110,7 +6110,7 @@ local0.*/var/log/postgresql
  unoptimized queries in your applications.
  If this value is specified without units, it is taken as milliseconds.
  Setting this to zero prints all statement durations.
- Minus-one (the default) disables logging statement durations.
+ -1 (the default) disables logging statement durations.
  Only superusers can change this setting.
 
 
@@ -6162,7 +6162,7 @@ local0.*/var/log/postgresql
  traffic is too high to log all queries.
  If this value is specified without units, it is taken as milliseconds.
  Setting this to zero samples all statement durations.
- Minus-one (the default) disables sampling statement durations.
+ -1 (the default) disables sampling statement durations.
  Only superusers can change this setting.
 
 
-- 
2.17.0

>From bfb8439eb5618db3a36ca2794dbcc35489d98c27 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Sun, 29 Mar 2020 19:51:32 -0500
Subject: [PATCH v1 04/19] doc: psql opclass/opfamily

commit b0b5e20cd8d1a58a8782d5dc806a5232db116e2f
Author: Alexander Korotkov 

ALSO, should we rename the "Purpose" column?  I see we have pg_amop.amoppurpose
so maybe it's fine ?
---
 doc/src/sgml/ref/psql-ref.sgml | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index 0595d1c04b..cdd24fad98 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -1244,7 +1244,7 @@ testdb=
 (see ).
 If access-method-patttern
 is specified, only operator classes associated with access methods whose
-names match pattern are listed.
+names match the pattern are listed.
 If input-type-pattern
 is specified, only operator classes associated with input types whose
 names match the pattern are list

Re: debian bugrept involving fast default crash in pg11.7

2020-04-09 Thread Justin Pryzby
On Thu, Apr 09, 2020 at 02:36:26PM -0400, Tim Bishop wrote:
> SELECT attrelid::regclass, * FROM pg_attribute WHERE atthasmissing;
> -[ RECORD 1 ]-+-
> attrelid  | download
> attrelid  | 22749
> attname   | filetype

But that table isn't involved in the crashing query, right ?
Are data_stage() and income_index() locally defined functions ?  PLPGSQL ??
Do they access the download table (or view or whatever it is) ?

Thanks,
-- 
Justin



