I've pushed the comment/assert cleanup.

Here's a cleaned up version of the relkind check. This is almost
identical to the patch from yesterday (plus the renames and updates of
comments for modified functions).

The one difference is that I realized the relkind check does not
actually say we can't do sampling - it just means we can't use
TABLESAMPLE to do it. We could still use "random()" ...

Furthermore, I don't think we should silently disable sampling when the
user explicitly requests TABLESAMPLE by specifying bernoulli/system for
the table - IMHO it's less surprising to just fail in that case.

So we now do this:

    if (!can_tablesample && (method == ANALYZE_SAMPLE_AUTO))
        method = ANALYZE_SAMPLE_RANDOM;

Yes, we may still disable sampling when reltuples is -1, 0 or something
like that. But that's a condition that is expected for new relations and
likely to fix itself, which is not the case for relkind.

Of course, all relkinds that don't support TABLESAMPLE currently have
reltuples value that will disable sampling anyway (e.g. views have -1).
So we won't actually fallback to random() anyway, because we can't
calculate the sample fraction.

That's a bit annoying for foreign tables pointing at a view, which is a
more likely use case than table pointing at a sequence. And likely more
of an issue, because views may return a many rows (while sequences have
only a single row).

But I realized we could actually still do "random()" sampling:

    SELECT * FROM t ORDER BY random() LIMIT $X;

where $X is the target number of rows for sample for the table. Which
would result in plans like this (given sufficient work_mem values)

                              QUERY PLAN
  -------------------------------------------------------------------
   Limit (actual rows=30000 loops=1)
     ->  Sort (actual rows=30000 loops=1)
           Sort Key: (random())
           Sort Method: top-N heapsort  Memory: 3916kB
           ->  Append (actual rows=1000000 loops=1)

Even with lower work_mem values this would likely be a win, due to
saving on network transfers.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From 76270100186507c1403f7758131966d146db5b39 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas.von...@postgresql.org>
Date: Thu, 5 Jan 2023 21:13:20 +0100
Subject: [PATCH] Check relkind before using TABLESAMPLE in postgres_fdw

Before using TABLESAMPLE to acquire a sample from a foreign relation,
check the remote relkind. If it does not support TABLESAMPLE, disable
sampling and fallback to reading all rows.

This may seem unnecessary, because for relkinds that don't support
TABLESAMPLE we'll disable the sampling anyway, due to reltuples value
(e.g. views use -1, sequences 1). But that's rather accidental, and
might get broken after adding TABLESAMPLE support for other relkinds.
Better to make an explicit check.

Instead of disabling remote sampling for relkinds that do not support
TABLESAMPLE, consider fallback to using random() if the requested
sampling method was AUTO.

Reported-by: Tom Lane
Discussion: https://postgr.es/m/951485.1672461744%40sss.pgh.pa.us
---
 contrib/postgres_fdw/deparse.c      |  7 ++--
 contrib/postgres_fdw/postgres_fdw.c | 60 ++++++++++++++++++++---------
 contrib/postgres_fdw/postgres_fdw.h |  2 +-
 3 files changed, 47 insertions(+), 22 deletions(-)

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 4461fb06f02..21237d18ef8 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -2368,14 +2368,15 @@ deparseAnalyzeSizeSql(StringInfo buf, Relation rel)
 }
 
 /*
- * Construct SELECT statement to acquire the number of rows of a relation.
+ * Construct SELECT statement to acquire the number of rows and the relkind of
+ * a relation.
  *
  * Note: we just return the remote server's reltuples value, which might
  * be off a good deal, but it doesn't seem worth working harder.  See
  * comments in postgresAcquireSampleRowsFunc.
  */
 void
-deparseAnalyzeTuplesSql(StringInfo buf, Relation rel)
+deparseAnalyzeInfoSql(StringInfo buf, Relation rel)
 {
 	StringInfoData relname;
 
@@ -2383,7 +2384,7 @@ deparseAnalyzeTuplesSql(StringInfo buf, Relation rel)
 	initStringInfo(&relname);
 	deparseRelation(&relname, rel);
 
-	appendStringInfoString(buf, "SELECT reltuples FROM pg_catalog.pg_class WHERE oid = ");
+	appendStringInfoString(buf, "SELECT reltuples, relkind FROM pg_catalog.pg_class WHERE oid = ");
 	deparseStringLiteral(buf, relname.data);
 	appendStringInfoString(buf, "::pg_catalog.regclass");
 }
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index f8461bf18dc..44573c9fed2 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -4974,11 +4974,14 @@ postgresAnalyzeForeignTable(Relation relation,
 }
 
 /*
- * postgresCountTuplesForForeignTable
+ * postgresGetAnalyzeInfoForForeignTable
  *		Count tuples in foreign table (just get pg_class.reltuples).
+ *
+ * We also determine if we can use TABLESAMPLE on the relation to acquire
+ * the ANALYZE sample.
  */
 static double
-postgresCountTuplesForForeignTable(Relation relation)
+postgresGetAnalyzeInfoForForeignTable(Relation relation, bool *can_tablesample)
 {
 	ForeignTable *table;
 	UserMapping *user;
@@ -4986,6 +4989,10 @@ postgresCountTuplesForForeignTable(Relation relation)
 	StringInfoData sql;
 	PGresult   *volatile res = NULL;
 	volatile double reltuples = -1;
+	volatile char relkind = 0;
+
+	/* assume the remote relation does not support TABLESAMPLE */
+	*can_tablesample = false;
 
 	/*
 	 * Get the connection to use.  We do the remote access as the table's
@@ -4999,7 +5006,7 @@ postgresCountTuplesForForeignTable(Relation relation)
 	 * Construct command to get page count for relation.
 	 */
 	initStringInfo(&sql);
-	deparseAnalyzeTuplesSql(&sql, relation);
+	deparseAnalyzeInfoSql(&sql, relation);
 
 	/* In what follows, do not risk leaking any PGresults. */
 	PG_TRY();
@@ -5008,9 +5015,10 @@ postgresCountTuplesForForeignTable(Relation relation)
 		if (PQresultStatus(res) != PGRES_TUPLES_OK)
 			pgfdw_report_error(ERROR, res, conn, false, sql.data);
 
-		if (PQntuples(res) != 1 || PQnfields(res) != 1)
+		if (PQntuples(res) != 1 || PQnfields(res) != 2)
 			elog(ERROR, "unexpected result from deparseAnalyzeTuplesSql query");
 		reltuples = strtod(PQgetvalue(res, 0, 0), NULL);
+		relkind = *(PQgetvalue(res, 0, 1));
 	}
 	PG_FINALLY();
 	{
@@ -5021,6 +5029,11 @@ postgresCountTuplesForForeignTable(Relation relation)
 
 	ReleaseConnection(conn);
 
+	/* TABLESAMPLE is supported only for regular tables and matviews */
+	*can_tablesample = (relkind == RELKIND_RELATION ||
+						relkind == RELKIND_MATVIEW ||
+						relkind == RELKIND_PARTITIONED_TABLE);
+
 	return reltuples;
 }
 
@@ -5147,19 +5160,6 @@ postgresAcquireSampleRowsFunc(Relation relation, int elevel,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				 errmsg("remote server does not support TABLESAMPLE feature")));
 
-	/*
-	 * For "auto" method, pick the one we believe is best. For servers with
-	 * TABLESAMPLE support we pick BERNOULLI, for old servers we fall-back to
-	 * random() to at least reduce network transfer.
-	 */
-	if (method == ANALYZE_SAMPLE_AUTO)
-	{
-		if (server_version_num < 95000)
-			method = ANALYZE_SAMPLE_RANDOM;
-		else
-			method = ANALYZE_SAMPLE_BERNOULLI;
-	}
-
 	/*
 	 * If we've decided to do remote sampling, calculate the sampling rate. We
 	 * need to get the number of tuples from the remote server, but skip that
@@ -5167,7 +5167,18 @@ postgresAcquireSampleRowsFunc(Relation relation, int elevel,
 	 */
 	if (method != ANALYZE_SAMPLE_OFF)
 	{
-		reltuples = postgresCountTuplesForForeignTable(relation);
+		bool	can_tablesample;
+
+		reltuples = postgresGetAnalyzeInfoForForeignTable(relation,
+														  &can_tablesample);
+
+		/*
+		 * Make sure we're not choosing TABLESAMPLE when the remote relation does
+		 * not support that. But only do this for "auto" - if the user explicitly
+		 * requested BERNOULLI/SYSTEM, it's better to fail.
+		 */
+		if (!can_tablesample && (method == ANALYZE_SAMPLE_AUTO))
+			method = ANALYZE_SAMPLE_RANDOM;
 
 		/*
 		 * Remote's reltuples could be 0 or -1 if the table has never been
@@ -5212,6 +5223,19 @@ postgresAcquireSampleRowsFunc(Relation relation, int elevel,
 		}
 	}
 
+	/*
+	 * For "auto" method, pick the one we believe is best. For servers with
+	 * TABLESAMPLE support we pick BERNOULLI, for old servers we fall-back to
+	 * random() to at least reduce network transfer.
+	 */
+	if (method == ANALYZE_SAMPLE_AUTO)
+	{
+		if (server_version_num < 95000)
+			method = ANALYZE_SAMPLE_RANDOM;
+		else
+			method = ANALYZE_SAMPLE_BERNOULLI;
+	}
+
 	/*
 	 * Construct cursor that retrieves whole rows from remote.
 	 */
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index 4a633c5357f..02c11523199 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -223,7 +223,7 @@ extern void deparseDirectDeleteSql(StringInfo buf, PlannerInfo *root,
 								   List *returningList,
 								   List **retrieved_attrs);
 extern void deparseAnalyzeSizeSql(StringInfo buf, Relation rel);
-extern void deparseAnalyzeTuplesSql(StringInfo buf, Relation rel);
+extern void deparseAnalyzeInfoSql(StringInfo buf, Relation rel);
 extern void deparseAnalyzeSql(StringInfo buf, Relation rel,
 							  PgFdwSamplingMethod sample_method,
 							  double sample_frac,
-- 
2.39.0

Reply via email to