Re: pgsql: autovacuum: handle analyze for partitioned tables

2021-05-14 Thread Alvaro Herrera
Thanks for these corrections -- I applied them and a few minor changes
from myself, and pushed.  Another set of eyes over the result would be
most welcome.

I hope we can close this now :-)

-- 
Álvaro Herrera39°49'30"S 73°17'W
"Those who use electric razors are infidels destined to burn in hell while
we drink from rivers of beer, download free vids and mingle with naked
well shaved babes." (http://slashdot.org/comments.pl?sid=44793=4647152)




Re: pgsql: autovacuum: handle analyze for partitioned tables

2021-05-13 Thread Justin Pryzby
On Thu, May 13, 2021 at 05:33:33PM -0400, Alvaro Herrera wrote:
> +++ b/doc/src/sgml/maintenance.sgml
> @@ -817,6 +817,11 @@ analyze threshold = analyze base threshold + analyze 
> scale factor * number of tu
>  
>  is compared to the total number of tuples inserted, updated, or deleted
>  since the last ANALYZE.
> +For partitioned tables, inserts and updates on partitions are counted
> +towards this threshold; however partition meta-operations such as
> +attachment, detachment or drop are not, so running a manual
> +ANALYZE is recommended if the partition added or
> +removed contains a statistically significant volume of data.

I suggest: "Inserts, updates and deletes on partitions of a partitioned table
are counted towards this threshold; however DDL operations such as ATTACH,
DETACH and DROP are not, ...

> +   and in addition it will analyze each individual partition separately.

remove "and" and say in addition COMMA
Or:
"it will also recursive into each partition and update its statistics."

> +   By constrast, if the table being analyzed has inheritance children,
> +   ANALYZE will gather statistics for that table twice:
> +   once on the rows of the parent table only, and a second time on the
> +   rows of the parent table with all of its children.  This second set of
> +   statistics is needed when planning queries that traverse the entire
> +   inheritance tree.  The children tables are not individually analyzed
> +   in this case.

say "The child tables themselves.."

> +  
> +   For tables with inheritance children, the autovacuum daemon only
> +   counts inserts and deletes in the parent table itself when deciding
> +   whether to trigger an automatic analyze for that table.  If that table
> +   is rarely inserted into or updated, the inheritance statistics will
> +   not be up to date unless you run ANALYZE manually.
> +  

This should be emphasized:
Tuples changed in inheritence children do not count towards analyze on the
parent table.  If the parent table is empty or rarely changed, it may never 
be processed by autovacuum.  It's necesary to periodically run an manual
ANALYZE to keep the statistics of the table hierarchy up to date.

I don't know why it says "inserted or updated" but doesn't say "or deleted" -
that seems like a back-patchable fix.

> +++ b/doc/src/sgml/ref/pg_restore.sgml
> @@ -922,8 +922,10 @@ CREATE DATABASE foo WITH TEMPLATE template0;
>  
>
> Once restored, it is wise to run ANALYZE on each
> -   restored table so the optimizer has useful statistics; see
> -and
> +   restored table so the optimizer has useful statistics.
> +   If the table is a partition or an inheritance child, it may also be useful
> +   to analyze the parent table.
> +   See  and
>  for more information.

maybe say: "analyze the parent to update statistics for the table hierarchy".




Re: pgsql: autovacuum: handle analyze for partitioned tables

2021-05-13 Thread Alvaro Herrera
With English fixes from Bruce.

I think the note about autovacuum in the reference page for ANALYZE is a
bit out of place, but not enough to make me move the whole paragraph
elsewhere.  Maybe we should try to do that sometime.

-- 
Álvaro Herrera   Valdivia, Chile
Officer Krupke, what are we to do?
Gee, officer Krupke, Krup you! (West Side Story, "Gee, Officer Krupke")
>From 2c3e913543ff76a1b170fe6e9bf2aeb8c7e13852 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Thu, 13 May 2021 16:24:11 -0400
Subject: [PATCH v2] update docs on analyze on partitioned tables

---
 doc/src/sgml/maintenance.sgml|  5 
 doc/src/sgml/perform.sgml|  3 ++-
 doc/src/sgml/ref/analyze.sgml| 40 +++-
 doc/src/sgml/ref/pg_restore.sgml |  6 +++--
 4 files changed, 40 insertions(+), 14 deletions(-)

diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index de7fd75e1c..b390debf2e 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -817,6 +817,11 @@ analyze threshold = analyze base threshold + analyze scale factor * number of tu
 
 is compared to the total number of tuples inserted, updated, or deleted
 since the last ANALYZE.
+For partitioned tables, inserts and updates on partitions are counted
+towards this threshold; however partition meta-operations such as
+attachment, detachment or drop are not, so running a manual
+ANALYZE is recommended if the partition added or
+removed contains a statistically significant volume of data.

 

diff --git a/doc/src/sgml/perform.sgml b/doc/src/sgml/perform.sgml
index 89ff58338e..ddd6c3ff3e 100644
--- a/doc/src/sgml/perform.sgml
+++ b/doc/src/sgml/perform.sgml
@@ -1767,7 +1767,8 @@ SELECT * FROM x, y, a, b, c WHERE something AND somethingelse;

 Whenever you have significantly altered the distribution of data
 within a table, running ANALYZE is strongly recommended. This
-includes bulk loading large amounts of data into the table.  Running
+includes bulk loading large amounts of data into the table as well as
+attaching, detaching or dropping partitions.  Running
 ANALYZE (or VACUUM ANALYZE)
 ensures that the planner has up-to-date statistics about the
 table.  With no statistics or obsolete statistics, the planner might
diff --git a/doc/src/sgml/ref/analyze.sgml b/doc/src/sgml/ref/analyze.sgml
index c8fcebc161..f99e49798e 100644
--- a/doc/src/sgml/ref/analyze.sgml
+++ b/doc/src/sgml/ref/analyze.sgml
@@ -250,20 +250,38 @@ ANALYZE [ VERBOSE ] [ table_and_columns
 
   
-If the table being analyzed has one or more children,
-ANALYZE will gather statistics twice: once on the
-rows of the parent table only, and a second time on the rows of the
-parent table with all of its children.  This second set of statistics
-is needed when planning queries that traverse the entire inheritance
-tree.  The autovacuum daemon, however, will only consider inserts or
-updates on the parent table itself when deciding whether to trigger an
-automatic analyze for that table.  If that table is rarely inserted into
-or updated, the inheritance statistics will not be up to date unless you
-run ANALYZE manually.
+   If the table being analyzed is partitioned, ANALYZE
+   will gather statistics by sampling blocks randomly from its partitions;
+   and in addition it will analyze each individual partition separately.
+   (However, in multi-level partitioning scenarios, each leaf partition
+   will only be analyzed once.)
+   By constrast, if the table being analyzed has inheritance children,
+   ANALYZE will gather statistics for that table twice:
+   once on the rows of the parent table only, and a second time on the
+   rows of the parent table with all of its children.  This second set of
+   statistics is needed when planning queries that traverse the entire
+   inheritance tree.  The child tables are not individually analyzed
+   in this case.
   
 
   
-If any of the child tables are foreign tables whose foreign data wrappers
+   The autovacuum daemon counts inserts and updates in the partitions
+   to determine if auto-analyze is needed.  However, adding or
+   removing partitions does not affect autovacuum daemon decisions,
+   so triggering a manual ANALYZE is recommended when
+   this occurs.
+  
+
+  
+   For tables with inheritance children, the autovacuum daemon only
+   counts inserts and deletes in the parent table itself when deciding
+   whether to trigger an automatic analyze for that table.  If that table
+   is rarely inserted into or updated, the inheritance statistics will
+   not be up to date unless you run ANALYZE manually.
+  
+
+  
+If any of the child tables or partitions are foreign tables whose foreign data wrappers
 do not support ANALYZE, those child tables are ignored while
 gathering inheritance statistics.
   
diff --git a/doc/src/sgml/ref/pg_restore.sgml 

Re: pgsql: autovacuum: handle analyze for partitioned tables

2021-05-13 Thread Alvaro Herrera
New version, a bit more ambitious.  I think it's better to describe
behavior for partitioned tables ahead of inheritance.  Also, in the
ANALYZE reference page I split the topic in two: in one single paragraph
we now describe what happens with manual analyze for partitioned tables
and inheritance hierarchies; we describe the behavior of autovacuum in
one separate paragraph for each type of hierarchy, since the differences
are stark.

I noticed that difference while verifying the behavior that I was to
document.  If you look at ANALYZE VERBOSE output, it seems a bit
wasteful:

create table part (a int) partition by list (a);
create table part0 partition of part for values in (0);
create table part1 partition of part for values in (1);
create table part23 partition of part for values in (2, 3) partition by list 
(a);
create table part2 partition of part23 for values in (2);
create table part3 partition of part23 for values in (3);
insert into part select g%4 from generate_series(1, 5000) g;

analyze verbose part;

INFO:  analyzing "public.part" inheritance tree
INFO:  "part1": scanned 7500 of 55310 pages, containing 1695000 live rows and 0 
dead rows; 7500 rows in sample, 12500060 estimated total rows
INFO:  "part2": scanned 7500 of 55310 pages, containing 1695000 live rows and 0 
dead rows; 7500 rows in sample, 12500060 estimated total rows
INFO:  "part3": scanned 7500 of 55310 pages, containing 1695000 live rows and 0 
dead rows; 7500 rows in sample, 12500060 estimated total rows
INFO:  "part4": scanned 7500 of 55310 pages, containing 1695000 live rows and 0 
dead rows; 7500 rows in sample, 12500060 estimated total rows
INFO:  analyzing "public.part1"
INFO:  "part1": scanned 3 of 55310 pages, containing 6779940 live rows and 
0 dead rows; 3 rows in sample, 12499949 estimated total rows
INFO:  analyzing "public.part2"
INFO:  "part2": scanned 3 of 55310 pages, containing 6779940 live rows and 
0 dead rows; 3 rows in sample, 12499949 estimated total rows
INFO:  analyzing "public.part34" inheritance tree
INFO:  "part3": scanned 15000 of 55310 pages, containing 339 live rows and 
0 dead rows; 15000 rows in sample, 12500060 estimated total rows
INFO:  "part4": scanned 15000 of 55310 pages, containing 3389940 live rows and 
0 dead rows; 15000 rows in sample, 12499839 estimated total rows
INFO:  analyzing "public.part3"
INFO:  "part3": scanned 3 of 55310 pages, containing 678 live rows and 
0 dead rows; 3 rows in sample, 12500060 estimated total rows
INFO:  analyzing "public.part4"
INFO:  "part4": scanned 3 of 55310 pages, containing 678 live rows and 
0 dead rows; 3 rows in sample, 12500060 estimated total rows
ANALYZE

-- 
Álvaro Herrera   Valdivia, Chile
"The eagle never lost so much time, as
when he submitted to learn of the crow." (William Blake)
>From 6961e64a3ad5bfd10a14f544c470dbb93f9aadc3 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Thu, 13 May 2021 16:24:11 -0400
Subject: [PATCH] update docs on analyze on partitioned tables

---
 doc/src/sgml/maintenance.sgml|  5 
 doc/src/sgml/perform.sgml|  3 ++-
 doc/src/sgml/ref/analyze.sgml| 40 +++-
 doc/src/sgml/ref/pg_restore.sgml |  6 +++--
 4 files changed, 40 insertions(+), 14 deletions(-)

diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index de7fd75e1c..b390debf2e 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -817,6 +817,11 @@ analyze threshold = analyze base threshold + analyze scale factor * number of tu
 
 is compared to the total number of tuples inserted, updated, or deleted
 since the last ANALYZE.
+For partitioned tables, inserts and updates on partitions are counted
+towards this threshold; however partition meta-operations such as
+attachment, detachment or drop are not, so running a manual
+ANALYZE is recommended if the partition added or
+removed contains a statistically significant volume of data.

 

diff --git a/doc/src/sgml/perform.sgml b/doc/src/sgml/perform.sgml
index 89ff58338e..ddd6c3ff3e 100644
--- a/doc/src/sgml/perform.sgml
+++ b/doc/src/sgml/perform.sgml
@@ -1767,7 +1767,8 @@ SELECT * FROM x, y, a, b, c WHERE something AND somethingelse;

 Whenever you have significantly altered the distribution of data
 within a table, running ANALYZE is strongly recommended. This
-includes bulk loading large amounts of data into the table.  Running
+includes bulk loading large amounts of data into the table as well as
+attaching, detaching or dropping partitions.  Running
 ANALYZE (or VACUUM ANALYZE)
 ensures that the planner has up-to-date statistics about the
 table.  With no statistics or obsolete statistics, the planner might
diff --git a/doc/src/sgml/ref/analyze.sgml b/doc/src/sgml/ref/analyze.sgml
index c8fcebc161..8f8d3af985 100644
--- a/doc/src/sgml/ref/analyze.sgml
+++ b/doc/src/sgml/ref/analyze.sgml
@@ 

Re: pgsql: autovacuum: handle analyze for partitioned tables

2021-05-11 Thread Alvaro Herrera
On 2021-Apr-23, Justin Pryzby wrote:

> On Thu, Apr 22, 2021 at 12:43:46PM -0500, Justin Pryzby wrote:
> > 
> > I think that should probably have been written down somewhere other than for
> > the manual ANALYZE command, but in any case it seems to be outdated now.
> 
> Starting with this 

Agreed, we need some more docs here.  I lightly edited yours and ended
up with this -- mostly I think partitioned tables should not be in the
same paragraph as legacy inheritance because the behavior is different
enough (partitioned tables are not analyzed twice).

I'll give a deeper look tomorrow to see if other places also need edits.

Thanks

-- 
Álvaro Herrera   Valdivia, Chile
diff --git a/doc/src/sgml/perform.sgml b/doc/src/sgml/perform.sgml
index 89ff58338e..ddd6c3ff3e 100644
--- a/doc/src/sgml/perform.sgml
+++ b/doc/src/sgml/perform.sgml
@@ -1767,7 +1767,8 @@ SELECT * FROM x, y, a, b, c WHERE something AND somethingelse;

 Whenever you have significantly altered the distribution of data
 within a table, running ANALYZE is strongly recommended. This
-includes bulk loading large amounts of data into the table.  Running
+includes bulk loading large amounts of data into the table as well as
+attaching, detaching or dropping partitions.  Running
 ANALYZE (or VACUUM ANALYZE)
 ensures that the planner has up-to-date statistics about the
 table.  With no statistics or obsolete statistics, the planner might
diff --git a/doc/src/sgml/ref/analyze.sgml b/doc/src/sgml/ref/analyze.sgml
index c8fcebc161..9d5e2a9626 100644
--- a/doc/src/sgml/ref/analyze.sgml
+++ b/doc/src/sgml/ref/analyze.sgml
@@ -250,7 +250,17 @@ ANALYZE [ VERBOSE ] [ table_and_columns
 
   
-If the table being analyzed has one or more children,
+   If the table being analyzed is partitioned, ANALYZE
+   will gather statistics by scanning all of its partitions.
+   The autovacuum daemon counts inserts and updates in the partitions
+   to determine if auto-analyze is needed.  However, adding or
+   removing partitions does not affect the autovacuum daemon decisions,
+   so triggering a manual ANALYZE is recommended when
+   they occur.
+  
+
+  
+If the table being analyzed has one or more legacy inheritance children,
 ANALYZE will gather statistics twice: once on the
 rows of the parent table only, and a second time on the rows of the
 parent table with all of its children.  This second set of statistics
@@ -263,7 +273,7 @@ ANALYZE [ VERBOSE ] [ table_and_columns
 
   
-If any of the child tables are foreign tables whose foreign data wrappers
+If any of the child tables or partitions are foreign tables whose foreign data wrappers
 do not support ANALYZE, those child tables are ignored while
 gathering inheritance statistics.
   
diff --git a/doc/src/sgml/ref/pg_restore.sgml b/doc/src/sgml/ref/pg_restore.sgml
index 93ea937ac8..474f18c73f 100644
--- a/doc/src/sgml/ref/pg_restore.sgml
+++ b/doc/src/sgml/ref/pg_restore.sgml
@@ -922,8 +922,10 @@ CREATE DATABASE foo WITH TEMPLATE template0;
 
   
Once restored, it is wise to run ANALYZE on each
-   restored table so the optimizer has useful statistics; see
-and
+   restored table so the optimizer has useful statistics.
+   If the table is a partition or an inheritance child, it may also be useful
+   to analyze the parent table.
+   See  and
 for more information.
   
 


Re: pgsql: autovacuum: handle analyze for partitioned tables

2021-05-11 Thread Michael Paquier
On Wed, Apr 21, 2021 at 07:06:49PM -0400, Alvaro Herrera wrote:
> On 2021-Apr-09, Robert Haas wrote:
>> Does this need to worry about new partitions getting attached to a
>> partitioned table, or old ones getting detached? (Maybe it does
>> already, not sure.)
> 
> I was pinged because this is listed as an open item.  I don't think it
> is one.  Handling ATTACH/DETACH/DROP is important for overall
> consistency, of course, so we should do it eventually, but the fact that
> autovacuum runs analyze *at all* for partitioned tables is an enormous
> step forward from it not doing so.  I think we should treat ATTACH/
> DETACH/DROP handling as a further feature to be added in a future
> release, not an open item to be fixed in the current one.

Yeah, I'd agree that this could be done as some future work so it
looks fine to move it to the section for "won't fix" items, but that
sounds rather tricky to me as there are dependencies across the
partitions.

Now, I don't think that we are completely done either, as one
documentation patch has been sent here:
https://www.postgresql.org/message-id/20210423180152.ga17...@telsasoft.com

Alvaro, could you look at that?
--
Michael


signature.asc
Description: PGP signature


Re: pgsql: autovacuum: handle analyze for partitioned tables

2021-04-23 Thread Justin Pryzby
On Thu, Apr 22, 2021 at 12:43:46PM -0500, Justin Pryzby wrote:
> Maybe the behavior should be documented, though.  Actually, I thought the
> pre-existing (non)behavior of autoanalyze would've been documented, and we'd
> now update that.  All I can find is this:
> 
> https://www.postgresql.org/docs/current/sql-analyze.html
> |The autovacuum daemon, however, will only consider inserts or updates on the
> |parent table itself when deciding whether to trigger an automatic analyze for
> |that table
> 
> I think that should probably have been written down somewhere other than for
> the manual ANALYZE command, but in any case it seems to be outdated now.

Starting with this 

>From a7ae56a879b6bacc4fc22cbd769851713be89840 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Fri, 23 Apr 2021 09:15:58 -0500
Subject: [PATCH] WIP: Add docs for autovacuum processing of partitioned tables

---
 doc/src/sgml/perform.sgml| 3 ++-
 doc/src/sgml/ref/analyze.sgml| 4 +++-
 doc/src/sgml/ref/pg_restore.sgml | 6 --
 3 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/doc/src/sgml/perform.sgml b/doc/src/sgml/perform.sgml
index 89ff58338e..814c3cffbe 100644
--- a/doc/src/sgml/perform.sgml
+++ b/doc/src/sgml/perform.sgml
@@ -1767,7 +1767,8 @@ SELECT * FROM x, y, a, b, c WHERE something AND 
somethingelse;

 Whenever you have significantly altered the distribution of data
 within a table, running ANALYZE is strongly 
recommended. This
-includes bulk loading large amounts of data into the table.  Running
+includes bulk loading large amounts of data into the table,
+or attaching/detaching partitions.  Running
 ANALYZE (or VACUUM ANALYZE)
 ensures that the planner has up-to-date statistics about the
 table.  With no statistics or obsolete statistics, the planner might
diff --git a/doc/src/sgml/ref/analyze.sgml b/doc/src/sgml/ref/analyze.sgml
index c8fcebc161..179ae3555d 100644
--- a/doc/src/sgml/ref/analyze.sgml
+++ b/doc/src/sgml/ref/analyze.sgml
@@ -255,11 +255,13 @@ ANALYZE [ VERBOSE ] [ table_and_columnsANALYZE manually.
+For partitioned tables, inserts and updates on the partitions are counted
+towards auto-analyze on the parent.
   
 
   
diff --git a/doc/src/sgml/ref/pg_restore.sgml b/doc/src/sgml/ref/pg_restore.sgml
index 93ea937ac8..260bf0feb7 100644
--- a/doc/src/sgml/ref/pg_restore.sgml
+++ b/doc/src/sgml/ref/pg_restore.sgml
@@ -922,8 +922,10 @@ CREATE DATABASE foo WITH TEMPLATE template0;
 
   
Once restored, it is wise to run ANALYZE on each
-   restored table so the optimizer has useful statistics; see
-and
+   restored table so the optimizer has useful statistics.
+   If the table is a partition or an inheritence child, it may also be useful
+   to analyze the parent table.
+   See  and
 for more information.
   
 
-- 
2.17.0





Re: pgsql: autovacuum: handle analyze for partitioned tables

2021-04-22 Thread Justin Pryzby
On Wed, Apr 21, 2021 at 07:06:49PM -0400, Alvaro Herrera wrote:
> On 2021-Apr-09, Robert Haas wrote:
> > Does this need to worry about new partitions getting attached to a
> > partitioned table, or old ones getting detached? (Maybe it does
> > already, not sure.)
> 
> I was pinged because this is listed as an open item.  I don't think it
> is one.  Handling ATTACH/DETACH/DROP is important for overall
> consistency, of course, so we should do it eventually, but the fact that
> autovacuum runs analyze *at all* for partitioned tables is an enormous
> step forward from it not doing so.  I think we should treat ATTACH/
> DETACH/DROP handling as a further feature to be added in a future
> release, not an open item to be fixed in the current one.

I think this is okay, with the caveat that we'd be changing the behavior
(again) in a future release, rather than doing it all in v14.

Maybe the behavior should be documented, though.  Actually, I thought the
pre-existing (non)behavior of autoanalyze would've been documented, and we'd
now update that.  All I can find is this:

https://www.postgresql.org/docs/current/sql-analyze.html
|The autovacuum daemon, however, will only consider inserts or updates on the
|parent table itself when deciding whether to trigger an automatic analyze for
|that table

I think that should probably have been written down somewhere other than for
the manual ANALYZE command, but in any case it seems to be outdated now.

-- 
Justin




Re: pgsql: autovacuum: handle analyze for partitioned tables

2021-04-21 Thread yuzuko
Hi,

Thank you for discussing this item.

> I think we should treat ATTACH/
> DETACH/DROP handling as a further feature to be added in a future
> release, not an open item to be fixed in the current one.
>
I agree with your opinion.

> Now, if somebody sees a very trivial way to handle it, let's discuss it,
> but *I* don't see it.
>
I started thinking about the way to handle ATTACH/DETACH/DROP,
but I haven't created patches.  If no one has done it yet, I'll keep working.

--
Best regards,
Yuzuko Hosoya
NTT Open Source Software Center




Re: pgsql: autovacuum: handle analyze for partitioned tables

2021-04-21 Thread Alvaro Herrera
On 2021-Apr-09, Robert Haas wrote:

> Does this need to worry about new partitions getting attached to a
> partitioned table, or old ones getting detached? (Maybe it does
> already, not sure.)

I was pinged because this is listed as an open item.  I don't think it
is one.  Handling ATTACH/DETACH/DROP is important for overall
consistency, of course, so we should do it eventually, but the fact that
autovacuum runs analyze *at all* for partitioned tables is an enormous
step forward from it not doing so.  I think we should treat ATTACH/
DETACH/DROP handling as a further feature to be added in a future
release, not an open item to be fixed in the current one.

Now, if somebody sees a very trivial way to handle it, let's discuss it,
but *I* don't see it.

-- 
Álvaro Herrera   Valdivia, Chile
"You're _really_ hosed if the person doing the hiring doesn't understand
relational systems: you end up with a whole raft of programmers, none of
whom has had a Date with the clue stick."  (Andrew Sullivan)




Re: pgsql: autovacuum: handle analyze for partitioned tables

2021-04-21 Thread Alvaro Herrera
On 2021-Apr-08, Tom Lane wrote:

> BTW, another thing that looks like a race condition is the
> extract_autovac_opts() call that is done a little bit earlier,
> also without lock.  I think this is actually safe, but it's ONLY
> safe because we resisted the calls by certain people to add a
> toast table to pg_class.  Otherwise, fetching reloptions could
> have involved a toast pointer dereference, and it would then be
> racy whether the toasted data was still there.  As-is, even if
> the pg_class row we're looking at has been deleted, we can safely
> disassemble its reloptions.  I think this matter is deserving
> of a comment at least.

True.  I added a comment there.

Thanks,

-- 
Álvaro Herrera   Valdivia, Chile
"I am amazed at [the pgsql-sql] mailing list for the wonderful support, and
lack of hesitasion in answering a lost soul's question, I just wished the rest
of the mailing list could be like this."   (Fotis)
   (http://archives.postgresql.org/pgsql-sql/2006-06/msg00265.php)




Re: pgsql: autovacuum: handle analyze for partitioned tables

2021-04-10 Thread Justin Pryzby
On Thu, Apr 08, 2021 at 04:11:49PM -0400, Alvaro Herrera wrote:
> On 2021-Apr-08, Tom Lane wrote:
> 
> > > So I tend to think that my initial instinct was the better direction: we
> > > should not be doing any find_all_inheritors() here at all, but instead
> > > rely on pg_class.reltuples to be set for the partitioned table.
> > 
> > +1
> 
> This patch does that.

|commit 0e69f705cc1a3df273b38c9883fb5765991e04fe
|Author: Alvaro Herrera 
|Date:   Fri Apr 9 11:29:08 2021 -0400
|
|Set pg_class.reltuples for partitioned tables
|
|When commit 0827e8af70f4 added auto-analyze support for partitioned
|tables, it included code to obtain reltuples for the partitioned table
|as a number of catalog accesses to read pg_class.reltuples for each
|partition.  That's not only very inefficient, but also problematic
|because autovacuum doesn't hold any locks on any of those tables -- and
|doesn't want to.  Replace that code with a read of pg_class.reltuples
|for the partitioned table, and make sure ANALYZE and TRUNCATE properly
|maintain that value.
|
|I found no code that would be affected by the change of relpages from
|zero to non-zero for partitioned tables, and no other code that should
|be maintaining it, but if there is, hopefully it'll be an easy fix.

+   else if (onerel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+   {
+   /*
+* Partitioned tables don't have storage, so we don't set any 
fields in
+* their pg_class entries except for relpages, which is 
necessary for
+* auto-analyze to work properly.
+*/
+   vac_update_relstats(onerel, -1, totalrows,
+   0, false, 
InvalidTransactionId,
+   InvalidMultiXactId,
+   in_outer_xact);
+   }

This refers to "relpages", but I think it means "reltuples".

src/include/commands/vacuum.h:extern void vac_update_relstats(Relation relation,
src/include/commands/vacuum.h-BlockNumber 
num_pages,
src/include/commands/vacuum.h-double num_tuples,
src/include/commands/vacuum.h-BlockNumber 
num_all_visible_pages,

I'm adding it for the next round of "v14docs" patch if you don't want to make a
separate commit for that.

-- 
Justin




Re: pgsql: autovacuum: handle analyze for partitioned tables

2021-04-10 Thread Tomas Vondra
On 4/10/21 12:29 AM, Justin Pryzby wrote:
> On Fri, Apr 09, 2021 at 06:16:59PM -0400, Alvaro Herrera wrote:
>> On 2021-Apr-09, Justin Pryzby wrote:
>>
>>> One data point: we do DETACH/ATTACH tables during normal operation, before
>>> type-promoting ALTERs, to avoid worst-case disk use, and to avoid locking 
>>> the
>>> table for a long time.  It'd be undesirable (but maybe of no great 
>>> consequence)
>>> to trigger an ALTER when we DETACH them, since we'll re-ATTACH it shortly
>>> afterwards.
>>
>> You mean to trigger an ANALYZE, not to trigger an ALTER, right?
> 
> Oops, right.  It's slightly undesirable for a DETACH to cause an ANALYZE.
> 
>> I think I agree with Tomas: we should do it by default, and offer some
>> way to turn that off.  I suppose a new reloptions, solely for
>> partitioned tables, would be the way to do it.
>>
>>> However, I think DROP should be handled ?
>>
>> DROP of a partition? ... I would think it should do the same as DETACH,
>> right?  Inform that however many rows the partition had, are now changed
>> in ancestors.
> 
> Yes, drop of an (attached) partition.  The case for DROP is clear, since it
> was clearly meant to go away forever.  The case for DETACH seems somewhat less
> clear.
> 
> The current behavior of pg_dump/restore (since 33a53130a) is to CREATE+ATTACH,
> so there's an argument that if DROPping the partition counts towards the
> parent's analyze, then so should CREATE+ATTACH.
> 

I think it's tricky to "optimize" the behavior after ATTACH/DETACH. I'd
argue that in principle, we should aim to keep accurate statistics,  so
ATTACH should be treated as insert of all rows, and DETACH should be
treated as delete of all rows. Se for the purpose of ANALYZE, we should
propagate reltuples as changes_since_analyze after ATTACH/DETACH.

Yes, it may result in more frequent ANALYZE on the parent, but I think
that's necessary. Repeated attach/detach of the same partition may bloat
the value, but I guess that's an example of "If it hurts don't do it."

What I think we might do is offer some light-weight analyze variant,
e.g. based on the merging of statistics (I've posted a PoC patch a
couple days ago.). That would make the ANALYZEs on parent much cheaper,
so those "unnecessary" analyzes would not be an issue.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: pgsql: autovacuum: handle analyze for partitioned tables

2021-04-09 Thread Tom Lane
Andres Freund  writes:
> On 2021-04-09 11:54:30 -0400, Alvaro Herrera wrote:
>> Pushed now, thanks.

> I assume this is also the likely explanation for / fix for:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink=2021-04-08%2016%3A03%3A03

> ==3500389== VALGRINDERROR-BEGIN
> ==3500389== Invalid read of size 8
> ==3500389==at 0x4EC4B8: relation_needs_vacanalyze (autovacuum.c:3237)

Yeah, looks like the same thing to me; it's the same line that was
crashing in the non-valgrind reports.

regards, tom lane




Re: pgsql: autovacuum: handle analyze for partitioned tables

2021-04-09 Thread Alvaro Herrera
Hello

On 2021-Apr-09, Andres Freund wrote:

> I assume this is also the likely explanation for / fix for:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink=2021-04-08%2016%3A03%3A03
> 
> ==3500389== VALGRINDERROR-BEGIN
> ==3500389== Invalid read of size 8
> ==3500389==at 0x4EC4B8: relation_needs_vacanalyze (autovacuum.c:3237)
> ==3500389==by 0x4EE0AF: do_autovacuum (autovacuum.c:2168)
> ==3500389==by 0x4EEEA8: AutoVacWorkerMain (autovacuum.c:1715)
> ==3500389==by 0x4EEF7F: StartAutoVacWorker (autovacuum.c:1500)
> ==3500389==by 0x4FD2E4: StartAutovacuumWorker (postmaster.c:5539)

Hmm, I didn't try to reproduce this, but yeah it sounds quite likely
that it's the same issue -- line 3237 is the GETSTRUCT call where the
other one was crashing, which is now gone.

Thanks for pointing it out,

-- 
Álvaro Herrera   Valdivia, Chile




Re: pgsql: autovacuum: handle analyze for partitioned tables

2021-04-09 Thread Andres Freund
Hi,

On 2021-04-09 11:54:30 -0400, Alvaro Herrera wrote:
> On 2021-Apr-09, Tom Lane wrote:
>
> > Could we get this pushed sooner rather than later?  The buildfarm
> > is showing a wide variety of intermittent failures on HEAD, and it's
> > hard to tell how many of them trace to this one bug.
>
> Pushed now, thanks.

I assume this is also the likely explanation for / fix for:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink=2021-04-08%2016%3A03%3A03

==3500389== VALGRINDERROR-BEGIN
==3500389== Invalid read of size 8
==3500389==at 0x4EC4B8: relation_needs_vacanalyze (autovacuum.c:3237)
==3500389==by 0x4EE0AF: do_autovacuum (autovacuum.c:2168)
==3500389==by 0x4EEEA8: AutoVacWorkerMain (autovacuum.c:1715)
==3500389==by 0x4EEF7F: StartAutoVacWorker (autovacuum.c:1500)
==3500389==by 0x4FD2E4: StartAutovacuumWorker (postmaster.c:5539)
==3500389==by 0x4FE50A: sigusr1_handler (postmaster.c:5243)
==3500389==by 0x4A6513F: ??? (in 
/usr/lib/x86_64-linux-gnu/libpthread-2.31.so)
==3500389==by 0x4DCA865: select (select.c:41)
==3500389==by 0x4FEB75: ServerLoop (postmaster.c:1701)
==3500389==by 0x4FFE52: PostmasterMain (postmaster.c:1409)
==3500389==by 0x442563: main (main.c:209)
==3500389==  Address 0x10 is not stack'd, malloc'd or (recently) free'd
==3500389==
==3500389== VALGRINDERROR-END
==3500389==
==3500389== Process terminating with default action of signal 11 (SIGSEGV): 
dumping core
==3500389==  Access not within mapped region at address 0x10
==3500389==at 0x4EC4B8: relation_needs_vacanalyze (autovacuum.c:3237)
==3500389==by 0x4EE0AF: do_autovacuum (autovacuum.c:2168)
==3500389==by 0x4EEEA8: AutoVacWorkerMain (autovacuum.c:1715)
==3500389==by 0x4EEF7F: StartAutoVacWorker (autovacuum.c:1500)
==3500389==by 0x4FD2E4: StartAutovacuumWorker (postmaster.c:5539)
==3500389==by 0x4FE50A: sigusr1_handler (postmaster.c:5243)
==3500389==by 0x4A6513F: ??? (in 
/usr/lib/x86_64-linux-gnu/libpthread-2.31.so)
==3500389==by 0x4DCA865: select (select.c:41)
==3500389==by 0x4FEB75: ServerLoop (postmaster.c:1701)
==3500389==by 0x4FFE52: PostmasterMain (postmaster.c:1409)
==3500389==by 0x442563: main (main.c:209)
==3500389==  If you believe this happened as a result of a stack
==3500389==  overflow in your program's main thread (unlikely but
==3500389==  possible), you can try to increase the size of the
==3500389==  main thread stack using the --main-stacksize= flag.
==3500389==  The main thread stack size used in this run was 8388608.


Greetings,

Andres Freund




Re: pgsql: autovacuum: handle analyze for partitioned tables

2021-04-09 Thread Justin Pryzby
On Fri, Apr 09, 2021 at 06:16:59PM -0400, Alvaro Herrera wrote:
> On 2021-Apr-09, Justin Pryzby wrote:
> 
> > One data point: we do DETACH/ATTACH tables during normal operation, before
> > type-promoting ALTERs, to avoid worst-case disk use, and to avoid locking 
> > the
> > table for a long time.  It'd be undesirable (but maybe of no great 
> > consequence)
> > to trigger an ALTER when we DETACH them, since we'll re-ATTACH it shortly
> > afterwards.
> 
> You mean to trigger an ANALYZE, not to trigger an ALTER, right?

Oops, right.  It's slightly undesirable for a DETACH to cause an ANALYZE.

> I think I agree with Tomas: we should do it by default, and offer some
> way to turn that off.  I suppose a new reloptions, solely for
> partitioned tables, would be the way to do it.
> 
> > However, I think DROP should be handled ?
> 
> DROP of a partition? ... I would think it should do the same as DETACH,
> right?  Inform that however many rows the partition had, are now changed
> in ancestors.

Yes, drop of an (attached) partition.  The case for DROP is clear, since it
was clearly meant to go away forever.  The case for DETACH seems somewhat less
clear.

The current behavior of pg_dump/restore (since 33a53130a) is to CREATE+ATTACH,
so there's an argument that if DROPping the partition counts towards the
parent's analyze, then so should CREATE+ATTACH.

-- 
Justin




Re: pgsql: autovacuum: handle analyze for partitioned tables

2021-04-09 Thread Alvaro Herrera
On 2021-Apr-09, Justin Pryzby wrote:

> One data point: we do DETACH/ATTACH tables during normal operation, before
> type-promoting ALTERs, to avoid worst-case disk use, and to avoid locking the
> table for a long time.  It'd be undesirable (but maybe of no great 
> consequence)
> to trigger an ALTER when we DETACH them, since we'll re-ATTACH it shortly
> afterwards.

You mean to trigger an ANALYZE, not to trigger an ALTER, right?

I think I agree with Tomas: we should do it by default, and offer some
way to turn that off.  I suppose a new reloptions, solely for
partitioned tables, would be the way to do it.

> However, I think DROP should be handled ?

DROP of a partition? ... I would think it should do the same as DETACH,
right?  Inform that however many rows the partition had, are now changed
in ancestors.

-- 
Álvaro Herrera39°49'30"S 73°17'W
"Aprender sin pensar es inútil; pensar sin aprender, peligroso" (Confucio)




Re: pgsql: autovacuum: handle analyze for partitioned tables

2021-04-09 Thread Tomas Vondra



On 4/9/21 11:45 PM, Justin Pryzby wrote:
> On Fri, Apr 09, 2021 at 05:31:55PM -0400, Alvaro Herrera wrote:
>> On 2021-Apr-09, Robert Haas wrote:
>>
>>> Does this need to worry about new partitions getting attached to a
>>> partitioned table, or old ones getting detached? (Maybe it does
>>> already, not sure.)
>>
>> Good question.  It does not.
> 
> I think there's probably cases where this is desirable, and cases where it's
> undesirable, so I don't think it's necessarily a problem.
> 
> One data point: we do DETACH/ATTACH tables during normal operation, before
> type-promoting ALTERs, to avoid worst-case disk use, and to avoid locking the
> table for a long time.  It'd be undesirable (but maybe of no great 
> consequence)
> to trigger an ALTER when we DETACH them, since we'll re-ATTACH it shortly
> afterwards.
> 
> However, I think DROP should be handled ?
> 

IMHO we should prefer the default behavior which favors having updated
statistics, and maybe have a way to override it for individual commands.
So ATTACH would update changes_since_analyze by default, but it would be
possible to disable that.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: pgsql: autovacuum: handle analyze for partitioned tables

2021-04-09 Thread Justin Pryzby
On Fri, Apr 09, 2021 at 05:31:55PM -0400, Alvaro Herrera wrote:
> On 2021-Apr-09, Robert Haas wrote:
> 
> > Does this need to worry about new partitions getting attached to a
> > partitioned table, or old ones getting detached? (Maybe it does
> > already, not sure.)
> 
> Good question.  It does not.

I think there's probably cases where this is desirable, and cases where it's
undesirable, so I don't think it's necessarily a problem.

One data point: we do DETACH/ATTACH tables during normal operation, before
type-promoting ALTERs, to avoid worst-case disk use, and to avoid locking the
table for a long time.  It'd be undesirable (but maybe of no great consequence)
to trigger an ALTER when we DETACH them, since we'll re-ATTACH it shortly
afterwards.

However, I think DROP should be handled ?

-- 
Justin




Re: pgsql: autovacuum: handle analyze for partitioned tables

2021-04-09 Thread Alvaro Herrera
On 2021-Apr-09, Robert Haas wrote:

> Does this need to worry about new partitions getting attached to a
> partitioned table, or old ones getting detached? (Maybe it does
> already, not sure.)

Good question.  It does not.

I suppose you could just let that happen automatically -- I mean, next
time the partitioned table is analyzed, it'll scan all attached
partitions.  But if no tuples are modified afterwards in existing
partitions (a common scenario), and the newly attached partition
contains lots of rows, then only future rows in the newly attached
partition would affect the stats of the partitioned table, and it could
be a long time before that causes an analyze on the partitioned table to
occur.

Maybe a way to attack this is to send a the "anl_ancestors" message to
the collector on attach and detach, adding a new flag ("is
attach/detach"), which indicates to add not only "changes_since_analyze
- changes_since_analyze_reported", but also "n_live_tuples".

-- 
Álvaro Herrera39°49'30"S 73°17'W
Essentially, you're proposing Kevlar shoes as a solution for the problem
that you want to walk around carrying a loaded gun aimed at your foot.
(Tom Lane)




Re: pgsql: autovacuum: handle analyze for partitioned tables

2021-04-09 Thread Robert Haas
On Fri, Apr 9, 2021 at 11:54 AM Alvaro Herrera  wrote:
> On 2021-Apr-09, Tom Lane wrote:
> > Could we get this pushed sooner rather than later?  The buildfarm
> > is showing a wide variety of intermittent failures on HEAD, and it's
> > hard to tell how many of them trace to this one bug.
>
> Pushed now, thanks.

Does this need to worry about new partitions getting attached to a
partitioned table, or old ones getting detached? (Maybe it does
already, not sure.)

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pgsql: autovacuum: handle analyze for partitioned tables

2021-04-09 Thread Alvaro Herrera
On 2021-Apr-09, Tom Lane wrote:

> Could we get this pushed sooner rather than later?  The buildfarm
> is showing a wide variety of intermittent failures on HEAD, and it's
> hard to tell how many of them trace to this one bug.

Pushed now, thanks.

-- 
Álvaro Herrera   Valdivia, Chile
"Digital and video cameras have this adjustment and film cameras don't for the
same reason dogs and cats lick themselves: because they can."   (Ken Rockwell)




Re: pgsql: autovacuum: handle analyze for partitioned tables

2021-04-09 Thread Tom Lane
Could we get this pushed sooner rather than later?  The buildfarm
is showing a wide variety of intermittent failures on HEAD, and it's
hard to tell how many of them trace to this one bug.

regards, tom lane




Re: pgsql: autovacuum: handle analyze for partitioned tables

2021-04-08 Thread Alvaro Herrera
On 2021-Apr-08, Zhihong Yu wrote:

> Hi,
> Within truncate_update_partedrel_stats(), dirty is declared within the loop.
> +   if (rd_rel->reltuples != 0)
> +   {
> ...
> +   if (dirty)
> 
> The two if blocks can be merged. The variable dirty can be dropped.

Hi, thanks for reviewing.  Yes, evidently I copied vac_update_relstats
too closely -- that boolean is not necessary.

-- 
Álvaro Herrera   Valdivia, Chile




Re: pgsql: autovacuum: handle analyze for partitioned tables

2021-04-08 Thread Zhihong Yu
On Thu, Apr 8, 2021 at 1:12 PM Alvaro Herrera 
wrote:

> On 2021-Apr-08, Tom Lane wrote:
>
> > > So I tend to think that my initial instinct was the better direction:
> we
> > > should not be doing any find_all_inheritors() here at all, but instead
> > > rely on pg_class.reltuples to be set for the partitioned table.
> >
> > +1
>
> This patch does that.
>
> --
> Álvaro Herrera39°49'30"S 73°17'W
> "I dream about dreams about dreams", sang the nightingale
> under the pale moon (Sandman)
>

Hi,
Within truncate_update_partedrel_stats(), dirty is declared within the loop.
+   if (rd_rel->reltuples != 0)
+   {
...
+   if (dirty)

The two if blocks can be merged. The variable dirty can be dropped.

Cheers


Re: pgsql: autovacuum: handle analyze for partitioned tables

2021-04-08 Thread Alvaro Herrera
On 2021-Apr-08, Tom Lane wrote:

> > So I tend to think that my initial instinct was the better direction: we
> > should not be doing any find_all_inheritors() here at all, but instead
> > rely on pg_class.reltuples to be set for the partitioned table.
> 
> +1

This patch does that.

-- 
Álvaro Herrera39°49'30"S 73°17'W
"I dream about dreams about dreams", sang the nightingale
under the pale moon (Sandman)
>From a54701552f2ba9295aae4fe0fc22c7bac912bf45 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Thu, 8 Apr 2021 11:10:44 -0400
Subject: [PATCH] Set pg_class.reltuples for partitioned tables

---
 src/backend/commands/analyze.c  | 12 +++
 src/backend/commands/tablecmds.c| 51 -
 src/backend/postmaster/autovacuum.c | 39 +-
 3 files changed, 63 insertions(+), 39 deletions(-)

diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 5bdaceefd5..2de699d838 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -656,6 +656,18 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 in_outer_xact);
 		}
 	}
+	else if (onerel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+	{
+		/*
+		 * Partitioned tables don't have storage, so we don't set any fields in
+		 * their pg_class entries except for relpages, which is necessary for
+		 * auto-analyze to work properly.
+		 */
+		vac_update_relstats(onerel, -1, totalrows,
+			0, false, InvalidTransactionId,
+			InvalidMultiXactId,
+			in_outer_xact);
+	}
 
 	/*
 	 * Now report ANALYZE to the stats collector.  For regular tables, we do
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 1f19629a94..deca860c80 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -337,6 +337,7 @@ typedef struct ForeignTruncateInfo
 static void truncate_check_rel(Oid relid, Form_pg_class reltuple);
 static void truncate_check_perms(Oid relid, Form_pg_class reltuple);
 static void truncate_check_activity(Relation rel);
+static void truncate_update_partedrel_stats(List *parted_rels);
 static void RangeVarCallbackForTruncate(const RangeVar *relation,
 		Oid relId, Oid oldRelId, void *arg);
 static List *MergeAttributes(List *schema, List *supers, char relpersistence,
@@ -1755,6 +1756,7 @@ ExecuteTruncateGuts(List *explicit_rels,
 {
 	List	   *rels;
 	List	   *seq_relids = NIL;
+	List	   *parted_rels = NIL;
 	HTAB	   *ft_htab = NULL;
 	EState	   *estate;
 	ResultRelInfo *resultRelInfos;
@@ -1908,9 +1910,15 @@ ExecuteTruncateGuts(List *explicit_rels,
 		Relation	rel = (Relation) lfirst(lc1);
 		int			extra = lfirst_int(lc2);
 
-		/* Skip partitioned tables as there is nothing to do */
+		/*
+		 * Save OID of partitioned tables for later; nothing else to do for
+		 * them here.
+		 */
 		if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+		{
+			parted_rels = lappend_oid(parted_rels, RelationGetRelid(rel));
 			continue;
+		}
 
 		/*
 		 * Build the lists of foreign tables belonging to each foreign server
@@ -2061,6 +2069,9 @@ ExecuteTruncateGuts(List *explicit_rels,
 		ResetSequence(seq_relid);
 	}
 
+	/* Reset partitioned tables' pg_class.reltuples */
+	truncate_update_partedrel_stats(parted_rels);
+
 	/*
 	 * Write a WAL record to allow this set of actions to be logically
 	 * decoded.
@@ -2207,6 +2218,44 @@ truncate_check_activity(Relation rel)
 	CheckTableNotInUse(rel, "TRUNCATE");
 }
 
+/*
+ * Update pg_class.reltuples for all the given partitioned tables to 0.
+ */
+static void
+truncate_update_partedrel_stats(List *parted_rels)
+{
+	Relation	pg_class;
+	HeapTuple	tuple;
+	Form_pg_class rd_rel;
+	ListCell   *lc;
+
+	pg_class = table_open(RelationRelationId, RowExclusiveLock);
+
+	foreach(lc, parted_rels)
+	{
+		Oid		relid = lfirst_oid(lc);
+		bool	dirty = false;
+
+		tuple = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relid));
+		if (!HeapTupleIsValid(tuple))
+			elog(ERROR, "could not find tuple for relation %u", relid);
+		rd_rel = (Form_pg_class) GETSTRUCT(tuple);
+
+		if (rd_rel->reltuples != 0)
+		{
+			rd_rel->reltuples = (float4) 0;
+			dirty = true;
+		}
+
+		if (dirty)
+			heap_inplace_update(pg_class, tuple);
+
+		heap_freetuple(tuple);
+	}
+
+	table_close(pg_class, RowExclusiveLock);
+}
+
 /*
  * storage_name
  *	  returns the name corresponding to a typstorage/attstorage enum value
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index aef9ac4dd2..a799544738 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -3209,44 +3209,7 @@ relation_needs_vacanalyze(Oid relid,
 	 */
 	if (PointerIsValid(tabentry) && AutoVacuumingActive())
 	{
-		if (classForm->relkind != RELKIND_PARTITIONED_TABLE)
-		{
-			reltuples = classForm->reltuples;
-		}
-		else
-		{
-			/*
-			 * If the relation is a partitioned table, we must add up
-			 * children's reltuples.
-			 */
-	

Re: pgsql: autovacuum: handle analyze for partitioned tables

2021-04-08 Thread Tom Lane
Alvaro Herrera  writes:
> On 2021-Apr-08, Tom Lane wrote:
>> Yeah.  I hit this on another machine that isn't using EXEC_BACKEND,
>> and I concur it looks more like a race condition.  I think the problem
>> is that autovacuum is calling find_all_inheritors() on a relation it
>> has no lock on, contrary to that function's API spec.

> Hmm.  Autovacuum tries hard to avoid grabbing locks on relations until
> really needed (at vacuum/analyze time), which is why all these tests
> only use data that can be found in the pg_class rows and pgstat entries.

Yeah, I was worried about that.

> So I tend to think that my initial instinct was the better direction: we
> should not be doing any find_all_inheritors() here at all, but instead
> rely on pg_class.reltuples to be set for the partitioned table.

+1

regards, tom lane




Re: pgsql: autovacuum: handle analyze for partitioned tables

2021-04-08 Thread Alvaro Herrera
On 2021-Apr-08, Tom Lane wrote:

> Yeah.  I hit this on another machine that isn't using EXEC_BACKEND,
> and I concur it looks more like a race condition.  I think the problem
> is that autovacuum is calling find_all_inheritors() on a relation it
> has no lock on, contrary to that function's API spec.  find_all_inheritors
> assumes the OID it's given is valid and locked, and adds it to the
> result list automatically.  Then it looks for children, and won't find
> any in the race case where somebody else just dropped the table.

Hmm.  Autovacuum tries hard to avoid grabbing locks on relations until
really needed (at vacuum/analyze time), which is why all these tests
only use data that can be found in the pg_class rows and pgstat entries.
So I tend to think that my initial instinct was the better direction: we
should not be doing any find_all_inheritors() here at all, but instead
rely on pg_class.reltuples to be set for the partitioned table.

I'll give that another look.  Most places already assume that reltuples
isn't set for a partitioned table, so they shouldn't care.  I wonder,
though, whether we should set relpages to some value other than 0 or -1.
(I'm inclined not to, since autovacuum does not use it.)

-- 
Álvaro Herrera   Valdivia, Chile




Re: pgsql: autovacuum: handle analyze for partitioned tables

2021-04-08 Thread Tom Lane
Alvaro Herrera  writes:
> On 2021-Apr-08, Tom Lane wrote:
>> Looks like this has issues under EXEC_BACKEND:
>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=culicidae=2021-04-08%2005%3A50%3A08

> Hmm, I couldn't reproduce this under EXEC_BACKEND or otherwise, but I
> think this is unrelated to that, but rather a race condition.

Yeah.  I hit this on another machine that isn't using EXEC_BACKEND,
and I concur it looks more like a race condition.  I think the problem
is that autovacuum is calling find_all_inheritors() on a relation it
has no lock on, contrary to that function's API spec.  find_all_inheritors
assumes the OID it's given is valid and locked, and adds it to the
result list automatically.  Then it looks for children, and won't find
any in the race case where somebody else just dropped the table.
So we come back to relation_needs_vacanalyze with a list of just the
original rel OID, and since this loop believes that every syscache fetch
it does will succeed, kaboom.

I do not think it is sane to do find_all_inheritors() with no lock,
so I'd counsel doing something about that rather than band-aiding the
symptom.  On the other hand, it's also not really okay not to have
an if-test-and-elog after the SearchSysCache call.  "A cache lookup
cannot fail" is not an acceptable assumption in my book.

BTW, another thing that looks like a race condition is the
extract_autovac_opts() call that is done a little bit earlier,
also without lock.  I think this is actually safe, but it's ONLY
safe because we resisted the calls by certain people to add a
toast table to pg_class.  Otherwise, fetching reloptions could
have involved a toast pointer dereference, and it would then be
racy whether the toasted data was still there.  As-is, even if
the pg_class row we're looking at has been deleted, we can safely
disassemble its reloptions.  I think this matter is deserving
of a comment at least.

regards, tom lane




Re: pgsql: autovacuum: handle analyze for partitioned tables

2021-04-08 Thread Alvaro Herrera
On 2021-Apr-08, Tom Lane wrote:

> Alvaro Herrera  writes:
> > autovacuum: handle analyze for partitioned tables
> 
> Looks like this has issues under EXEC_BACKEND:
> 
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=culicidae=2021-04-08%2005%3A50%3A08

Hmm, I couldn't reproduce this under EXEC_BACKEND or otherwise, but I
think this is unrelated to that, but rather a race condition.

The backtrace saved by buildfarm is:

Program terminated with signal SIGSEGV, Segmentation fault.
#0  relation_needs_vacanalyze (relid=relid@entry=43057, 
relopts=relopts@entry=0x0, classForm=classForm@entry=0x7e000501eef0, 
tabentry=0x5611ec71b030, 
effective_multixact_freeze_max_age=effective_multixact_freeze_max_age@entry=4,
 dovacuum=dovacuum@entry=0x7ffd78cc4ee0, doanalyze=0x7ffd78cc4ee1, 
wraparound=0x7ffd78cc4ee2) at 
/mnt/resource/andres/bf/culicidae/HEAD/pgsql.build/../pgsql/src/backend/postmaster/autovacuum.c:3237
3237childclass = (Form_pg_class) 
GETSTRUCT(childtuple);
#0  relation_needs_vacanalyze (relid=relid@entry=43057, 
relopts=relopts@entry=0x0, classForm=classForm@entry=0x7e000501eef0, 
tabentry=0x5611ec71b030, 
effective_multixact_freeze_max_age=effective_multixact_freeze_max_age@entry=4,
 dovacuum=dovacuum@entry=0x7ffd78cc4ee0, doanalyze=0x7ffd78cc4ee1, 
wraparound=0x7ffd78cc4ee2) at 
/mnt/resource/andres/bf/culicidae/HEAD/pgsql.build/../pgsql/src/backend/postmaster/autovacuum.c:3237
#1  0x5611eb09fc91 in do_autovacuum () at 
/mnt/resource/andres/bf/culicidae/HEAD/pgsql.build/../pgsql/src/backend/postmaster/autovacuum.c:2168
#2  0x5611eb0a0f8b in AutoVacWorkerMain (argc=argc@entry=1, 
argv=argv@entry=0x5611ec61f1e0) at 
/mnt/resource/andres/bf/culicidae/HEAD/pgsql.build/../pgsql/src/backend/postmaster/autovacuum.c:1715

the code in question is:

children = find_all_inheritors(relid, AccessShareLock, 
NULL);

foreach(lc, children)
{
Oid childOID = 
lfirst_oid(lc);
HeapTuple   childtuple;
Form_pg_class childclass;

childtuple = SearchSysCache1(RELOID, 
ObjectIdGetDatum(childOID));
childclass = (Form_pg_class) 
GETSTRUCT(childtuple);

Evidently SearchSysCache must be returning NULL, but how come that
happens, when we have acquired lock on the rel during
find_all_inheritors?

I would suggest that we do not take lock here at all, and just skip the
rel if SearchSysCache returns empty, as in the attached.  Still, I am
baffled about this crash.

-- 
Álvaro Herrera   Valdivia, Chile
"Oh, great altar of passive entertainment, bestow upon me thy discordant images
at such speed as to render linear thought impossible" (Calvin a la TV)
>From 2bb3e54862c37ee2a20fed21513a3df309381919 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Thu, 8 Apr 2021 11:10:44 -0400
Subject: [PATCH] Fix race condition in relation_needs_vacanalyze

---
 src/backend/postmaster/autovacuum.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index aef9ac4dd2..96073d4597 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -3223,18 +3223,23 @@ relation_needs_vacanalyze(Oid relid,
 			ListCell   *lc;
 
 			reltuples = 0;
 
-			/* Find all members of inheritance set taking AccessShareLock */
-			children = find_all_inheritors(relid, AccessShareLock, NULL);
+			/*
+			 * Find all members of inheritance set.  Beware that they may
+			 * disappear from under us, since we don't acquire any locks.
+			 */
+			children = find_all_inheritors(relid, NoLock, NULL);
 
 			foreach(lc, children)
 			{
 Oid			childOID = lfirst_oid(lc);
 HeapTuple	childtuple;
 Form_pg_class childclass;
 
 childtuple = SearchSysCache1(RELOID, ObjectIdGetDatum(childOID));
+if (childtuple == NULL)
+	continue;
 childclass = (Form_pg_class) GETSTRUCT(childtuple);
 
 /* Skip a partitioned table and foreign partitions */
 if (RELKIND_HAS_STORAGE(childclass->relkind))
-- 
2.20.1