Re: segmentation fault using currtid and partitioned tables

2020-05-31 Thread Michael Paquier
On Sun, May 31, 2020 at 10:26:54PM -0400, Tom Lane wrote:
> Ugh.  Aside from the stale-pointer-deref problem, once we drop the lock
> we can't even be sure the table still exists.  +1 for back-patch.

Thanks.  Fixed down to 9.5 then to make prion happier.
--
Michael


signature.asc
Description: PGP signature


Re: segmentation fault using currtid and partitioned tables

2020-05-31 Thread Tom Lane
Michael Paquier  writes:
> Woah.  This one is old, good catch from -DRELCACHE_FORCE_RELEASE.  It
> happens that since its introduction in a3519a2 from 2002,
> currtid_for_view() in tid.c closes the view and then looks at a RTE
> from it.  I have reproduced the issue and the patch attached takes
> care of the problem.  Would it be better to backpatch all the way down
> or is that not worth caring about?

Ugh.  Aside from the stale-pointer-deref problem, once we drop the lock
we can't even be sure the table still exists.  +1 for back-patch.

regards, tom lane




Re: segmentation fault using currtid and partitioned tables

2020-05-31 Thread Michael Paquier
On Mon, Jun 01, 2020 at 10:57:29AM +0900, Michael Paquier wrote:
> Applied this one then.  I also got to check the ODBC driver in more
> details, and I am indeed not seeing those functions getting used.
> One extra thing to know is that the ODBC driver requires libpq from at
> least 9.2, which may give one more argument to just remove them.
> 
> NB: prion has been failing, just looking into it.

Woah.  This one is old, good catch from -DRELCACHE_FORCE_RELEASE.  It
happens that since its introduction in a3519a2 from 2002,
currtid_for_view() in tid.c closes the view and then looks at a RTE
from it.  I have reproduced the issue and the patch attached takes
care of the problem.  Would it be better to backpatch all the way down
or is that not worth caring about?
--
Michael
diff --git a/src/backend/utils/adt/tid.c b/src/backend/utils/adt/tid.c
index cc699ee2f4..509a0fdffc 100644
--- a/src/backend/utils/adt/tid.c
+++ b/src/backend/utils/adt/tid.c
@@ -338,8 +338,13 @@ currtid_for_view(Relation viewrel, ItemPointer tid)
 	rte = rt_fetch(var->varno, query->rtable);
 	if (rte)
 	{
+		Datum		result;
+
+		result = DirectFunctionCall2(currtid_byreloid,
+	 ObjectIdGetDatum(rte->relid),
+	 PointerGetDatum(tid));
 		table_close(viewrel, AccessShareLock);
-		return DirectFunctionCall2(currtid_byreloid, ObjectIdGetDatum(rte->relid), PointerGetDatum(tid));
+		return result;
 	}
 }
 			}


signature.asc
Description: PGP signature


Re: segmentation fault using currtid and partitioned tables

2020-05-31 Thread Michael Paquier
On Fri, May 29, 2020 at 03:48:40PM +0900, Michael Paquier wrote:
> Okay, I have switched the patch to do that.  Any comments or
> objections?

Applied this one then.  I also got to check the ODBC driver in more
details, and I am indeed not seeing those functions getting used.
One extra thing to know is that the ODBC driver requires libpq from at
least 9.2, which may give one more argument to just remove them.

NB: prion has been failing, just looking into it.
--
Michael


signature.asc
Description: PGP signature


Re: segmentation fault using currtid and partitioned tables

2020-05-28 Thread Michael Paquier
On Thu, May 28, 2020 at 05:55:59PM -0700, Andres Freund wrote:
> And there only for very old servers (< 8.2), according to Hiroshi
> Inoue. Found that out post 12 freeze. I was planning to drop them for
> 13, but I unfortunately didn't get around to do so :(

[... digging ...]
Ah, I think I see your point from the code.  That's related to the use
of RETURNING for ctids.

> I guess we could decide to make a freeze exception to remove them now,
> although I'm not sure the reasons for doing so are strong enough.

Not sure that's a good thing to do after beta1 for 13, but there is an
argument for that in 14.  FWIW, my company is a huge user of the ODBC
driver (perhaps the biggest one?), and we have nothing even close to
8.2.

> I concur that it seems unnecessary to make these translatable, even with
> the reduced scope from
> https://www.postgresql.org/message-id/20200526025959.GE6155%40paquier.xyz

Okay, I have switched the patch to do that.  Any comments or
objections?
--
Michael
diff --git a/src/backend/utils/adt/tid.c b/src/backend/utils/adt/tid.c
index 4ce8375eab..ac232a238f 100644
--- a/src/backend/utils/adt/tid.c
+++ b/src/backend/utils/adt/tid.c
@@ -31,6 +31,7 @@
 #include "parser/parsetree.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
+#include "utils/lsyscache.h"
 #include "utils/rel.h"
 #include "utils/snapmgr.h"
 #include "utils/varlena.h"
@@ -349,6 +350,12 @@ currtid_for_view(Relation viewrel, ItemPointer tid)
 	return (Datum) 0;
 }
 
+/*
+ *	currtid_byreloid
+ *		Return the latest visible tid of a relation's tuple, associated
+ *		to the tid given in input.  This is a wrapper for currtid(), and
+ *		uses in input the OID of the relation to look at.
+ */
 Datum
 currtid_byreloid(PG_FUNCTION_ARGS)
 {
@@ -378,6 +385,11 @@ currtid_byreloid(PG_FUNCTION_ARGS)
 	if (rel->rd_rel->relkind == RELKIND_VIEW)
 		return currtid_for_view(rel, tid);
 
+	if (!RELKIND_HAS_STORAGE(rel->rd_rel->relkind))
+		elog(ERROR, "cannot look at latest visible tid for relation \"%s.%s\"",
+			 get_namespace_name(RelationGetNamespace(rel)),
+			 RelationGetRelationName(rel));
+
 	ItemPointerCopy(tid, result);
 
 	snapshot = RegisterSnapshot(GetLatestSnapshot());
@@ -391,6 +403,12 @@ currtid_byreloid(PG_FUNCTION_ARGS)
 	PG_RETURN_ITEMPOINTER(result);
 }
 
+/*
+ *	currtid_byrelname
+ *		Return the latest visible tid of a relation's tuple, associated
+ *		to the tid given in input.  This is a wrapper for currtid2(), and
+ *		uses in input the relation name to look at.
+ */
 Datum
 currtid_byrelname(PG_FUNCTION_ARGS)
 {
@@ -415,6 +433,11 @@ currtid_byrelname(PG_FUNCTION_ARGS)
 	if (rel->rd_rel->relkind == RELKIND_VIEW)
 		return currtid_for_view(rel, tid);
 
+	if (!RELKIND_HAS_STORAGE(rel->rd_rel->relkind))
+		elog(ERROR, "cannot look at latest visible tid for relation \"%s.%s\"",
+			 get_namespace_name(RelationGetNamespace(rel)),
+			 RelationGetRelationName(rel));
+
 	result = (ItemPointer) palloc(sizeof(ItemPointerData));
 	ItemPointerCopy(tid, result);
 
diff --git a/src/test/regress/expected/tid.out b/src/test/regress/expected/tid.out
new file mode 100644
index 00..e7e0d74780
--- /dev/null
+++ b/src/test/regress/expected/tid.out
@@ -0,0 +1,106 @@
+-- tests for functions related to TID handling
+CREATE TABLE tid_tab (a int);
+-- min() and max() for TIDs
+INSERT INTO tid_tab VALUES (1), (2);
+SELECT min(ctid) FROM tid_tab;
+  min  
+---
+ (0,1)
+(1 row)
+
+SELECT max(ctid) FROM tid_tab;
+  max  
+---
+ (0,2)
+(1 row)
+
+TRUNCATE tid_tab;
+-- Tests for currtid() and currtid2() with various relation kinds
+-- Materialized view
+CREATE MATERIALIZED VIEW tid_matview AS SELECT a FROM tid_tab;
+SELECT currtid('tid_matview'::regclass::oid, '(0,1)'::tid); -- fails
+ERROR:  tid (0, 1) is not valid for relation "tid_matview"
+SELECT currtid2('tid_matview'::text, '(0,1)'::tid); -- fails
+ERROR:  tid (0, 1) is not valid for relation "tid_matview"
+INSERT INTO tid_tab VALUES (1);
+REFRESH MATERIALIZED VIEW tid_matview;
+SELECT currtid('tid_matview'::regclass::oid, '(0,1)'::tid); -- ok
+ currtid 
+-
+ (0,1)
+(1 row)
+
+SELECT currtid2('tid_matview'::text, '(0,1)'::tid); -- ok
+ currtid2 
+--
+ (0,1)
+(1 row)
+
+DROP MATERIALIZED VIEW tid_matview;
+TRUNCATE tid_tab;
+-- Sequence
+CREATE SEQUENCE tid_seq;
+SELECT currtid('tid_seq'::regclass::oid, '(0,1)'::tid); -- ok
+ currtid 
+-
+ (0,1)
+(1 row)
+
+SELECT currtid2('tid_seq'::text, '(0,1)'::tid); -- ok
+ currtid2 
+--
+ (0,1)
+(1 row)
+
+DROP SEQUENCE tid_seq;
+-- Index, fails with incorrect relation type
+CREATE INDEX tid_ind ON tid_tab(a);
+SELECT currtid('tid_ind'::regclass::oid, '(0,1)'::tid); -- fails
+ERROR:  "tid_ind" is an index
+SELECT currtid2('tid_ind'::text, '(0,1)'::tid); -- fails
+ERROR:  "tid_ind" is an index
+DROP INDEX tid_ind;
+-- Partitioned table, no storage
+CREATE TABLE tid_part (a int) PARTITION BY RANGE (a);
+SELECT currtid('tid_part'::regclass::oid, '(0,1)'::tid); -- fails
+ERROR:  cannot look at latest visible

Re: segmentation fault using currtid and partitioned tables

2020-05-28 Thread Tom Lane
Andres Freund  writes:
> On 2020-04-05 12:51:56 -0400, Tom Lane wrote:
>> I think it might be a good idea to make relations-without-storage
>> set up rd_tableam as a vector of dummy functions that will throw
>> some suitable complaint about "relation lacks storage".  NULL is
>> a horrible default for this.

> OTOH, it's kinda annoying having to maintain a not insignificant number
> of functions that needs to be updated whenever the tableam interface
> evolves.

That argument sounds pretty weak.  If you're making breaking changes
in the tableam API, updating the signatures (not even any code) of
some dummy functions seems like by far the easiest part.

regards, tom lane




Re: segmentation fault using currtid and partitioned tables

2020-05-28 Thread Andres Freund
Hi,

On 2020-04-05 12:51:56 -0400, Tom Lane wrote:
> (2) The proximate cause of the crash is that rd_tableam is zero,
> so that the interface functions in tableam.h just crash hard.
> This seems like a pretty bad thing; does anyone want to promise
> that there are no other oversights of the same ilk elsewhere,
> and never will be?
> 
> I think it might be a good idea to make relations-without-storage
> set up rd_tableam as a vector of dummy functions that will throw
> some suitable complaint about "relation lacks storage".  NULL is
> a horrible default for this.

I don't have particularly strong views here. I can see a benefit to such
a pseudo AM. I can even imagine that there might some cases where we
would actually introduce some tableam functions for e.g. partitioned or
views tables, to centralize their handling more, instead of having such
considerations more distributed.  Clearly not worth actively trying to
do that for all existing code dealing with such relkinds, but there
might be cases where it's worthwhile.

OTOH, it's kinda annoying having to maintain a not insignificant number
of functions that needs to be updated whenever the tableam interface
evolves.  I guess we could partially hack our way through that by having
one such function, and just assigning it to all the mandatory callbacks
by way of a void cast. But that'd be mighty ugly.

Greetings,

Andres Freund




Re: segmentation fault using currtid and partitioned tables

2020-05-28 Thread Andres Freund
Hi,

On 2020-05-22 19:32:57 -0400, Alvaro Herrera wrote:
> On 2020-Apr-09, Michael Paquier wrote:
> 
> > Playing more with this stuff, it happens that we have zero code
> > coverage for currtid() and currtid2(), and the main user of those
> > functions I can find around is the ODBC driver:
> > https://coverage.postgresql.org/src/backend/utils/adt/tid.c.gcov.html
> 
> Yeah, they're there solely for ODBC as far as I know.

And there only for very old servers (< 8.2), according to Hiroshi
Inoue. Found that out post 12 freeze. I was planning to drop them for
13, but I unfortunately didn't get around to do so :(

I guess we could decide to make a freeze exception to remove them now,
although I'm not sure the reasons for doing so are strong enough.


> > There are multiple cases to consider, particularly for views:
> > - Case of a view with ctid as attribute taken from table.
> > - Case of a view with ctid as attribute with incorrect attribute
> > type.
> > It is worth noting that all those code paths can trigger various
> > elog() errors, which is not something that a user should be able to do
> > using a SQL-callable function.  There are also two code paths for
> > cases where a view has no or more-than-one SELECT rules, which cannot
> > normally be reached.
> 
> > All in that, I propose something like the attached to patch the
> > surroundings with tests to cover everything I could think of, which I
> > guess had better be backpatched?
> 
> I don't know, but this stuff is so unused that your patch seems
> excessive ... and I think we'd rather not backpatch something so large.
> I propose we do something less invasive in the backbranches, like just
> throw elog() errors (nothing fancy) where necessary to avoid the
> crashes.  Even for pg12 it seems that that should be sufficient.
> 
> For pg13 and beyond, I liked Tom's idea of installing dummy functions

I concur that it seems unnecessary to make these translatable, even with
the reduced scope from
https://www.postgresql.org/message-id/20200526025959.GE6155%40paquier.xyz

Greetings,

Andres Freund




Re: segmentation fault using currtid and partitioned tables

2020-05-27 Thread Alvaro Herrera
On 2020-May-26, Michael Paquier wrote:

> On Mon, May 25, 2020 at 06:29:10PM +0900, Michael Paquier wrote:
> > Perhaps you are right though, and that we don't need to spend this
> > much energy into improving the error messages so I am fine to discard
> > this part.  At the end, in order to remove the crashes, you just need
> > to keep around the two RELKIND_HAS_STORAGE() checks.  But I would
> > rather keep these two to use ereport(ERRCODE_FEATURE_NOT_SUPPORTED)
> > instead of elog(), and keep the test coverage of the previous patch
> > (including the tests for the aggregates I noticed were missing).
> > Would you be fine with that?
> 
> And this means the attached.  Thoughts are welcome.

Yeah, this looks good to me.  I would have used elog() instead, but
I don't care enough ... as a translator, I can come up with a message as
undecipherable as the original without worrying too much, since I
suspect nobody will ever see it in practice.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: segmentation fault using currtid and partitioned tables

2020-05-26 Thread Michael Paquier
On Wed, May 27, 2020 at 12:29:39AM -0500, Jaime Casanova wrote:
> so, currently the patch just installs protections on both currtid_*
> functions and adds some tests... therefore we can consider it as a bug
> fix and let it go in 13? actually also backpatch in 12...

Yes, and it has the advantage to be simple.

> only point to mention is a typo (a missing "l") in an added comment:
> 
> + * currtid_byrename

Oops, thanks.
--
Michael


signature.asc
Description: PGP signature


Re: segmentation fault using currtid and partitioned tables

2020-05-26 Thread Jaime Casanova
On Mon, 25 May 2020 at 22:01, Michael Paquier  wrote:
>
> On Mon, May 25, 2020 at 06:29:10PM +0900, Michael Paquier wrote:
> > Perhaps you are right though, and that we don't need to spend this
> > much energy into improving the error messages so I am fine to discard
> > this part.  At the end, in order to remove the crashes, you just need
> > to keep around the two RELKIND_HAS_STORAGE() checks.  But I would
> > rather keep these two to use ereport(ERRCODE_FEATURE_NOT_SUPPORTED)
> > instead of elog(), and keep the test coverage of the previous patch
> > (including the tests for the aggregates I noticed were missing).
> > Would you be fine with that?
>
> And this means the attached.  Thoughts are welcome.

so, currently the patch just installs protections on both currtid_*
functions and adds some tests... therefore we can consider it as a bug
fix and let it go in 13? actually also backpatch in 12...

patch works, server doesn't crash anymore

only point to mention is a typo (a missing "l") in an added comment:

+ * currtid_byrename

-- 
Jaime Casanova  www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: segmentation fault using currtid and partitioned tables

2020-05-25 Thread Michael Paquier
On Mon, May 25, 2020 at 06:29:10PM +0900, Michael Paquier wrote:
> Perhaps you are right though, and that we don't need to spend this
> much energy into improving the error messages so I am fine to discard
> this part.  At the end, in order to remove the crashes, you just need
> to keep around the two RELKIND_HAS_STORAGE() checks.  But I would
> rather keep these two to use ereport(ERRCODE_FEATURE_NOT_SUPPORTED)
> instead of elog(), and keep the test coverage of the previous patch
> (including the tests for the aggregates I noticed were missing).
> Would you be fine with that?

And this means the attached.  Thoughts are welcome.
--
Michael
diff --git a/src/backend/utils/adt/tid.c b/src/backend/utils/adt/tid.c
index 4ce8375eab..aa55ba7a81 100644
--- a/src/backend/utils/adt/tid.c
+++ b/src/backend/utils/adt/tid.c
@@ -31,6 +31,7 @@
 #include "parser/parsetree.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
+#include "utils/lsyscache.h"
 #include "utils/rel.h"
 #include "utils/snapmgr.h"
 #include "utils/varlena.h"
@@ -349,6 +350,12 @@ currtid_for_view(Relation viewrel, ItemPointer tid)
 	return (Datum) 0;
 }
 
+/*
+ *	currtid_byreloid
+ *		Return the latest visible tid of a relation's tuple, associated
+ *		to the tid given in input.  This is a wrapper for currtid(), and
+ *		uses in input the OID of the relation to look at.
+ */
 Datum
 currtid_byreloid(PG_FUNCTION_ARGS)
 {
@@ -378,6 +385,13 @@ currtid_byreloid(PG_FUNCTION_ARGS)
 	if (rel->rd_rel->relkind == RELKIND_VIEW)
 		return currtid_for_view(rel, tid);
 
+	if (!RELKIND_HAS_STORAGE(rel->rd_rel->relkind))
+		ereport(ERROR,
+(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot look at latest visible tid for relation \"%s.%s\"",
+		get_namespace_name(RelationGetNamespace(rel)),
+		RelationGetRelationName(rel;
+
 	ItemPointerCopy(tid, result);
 
 	snapshot = RegisterSnapshot(GetLatestSnapshot());
@@ -391,6 +405,12 @@ currtid_byreloid(PG_FUNCTION_ARGS)
 	PG_RETURN_ITEMPOINTER(result);
 }
 
+/*
+ *	currtid_byrename
+ *		Return the latest visible tid of a relation's tuple, associated
+ *		to the tid given in input.  This is a wrapper for currtid2(), and
+ *		uses in input the relation name to look at.
+ */
 Datum
 currtid_byrelname(PG_FUNCTION_ARGS)
 {
@@ -415,6 +435,13 @@ currtid_byrelname(PG_FUNCTION_ARGS)
 	if (rel->rd_rel->relkind == RELKIND_VIEW)
 		return currtid_for_view(rel, tid);
 
+	if (!RELKIND_HAS_STORAGE(rel->rd_rel->relkind))
+		ereport(ERROR,
+(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot look at latest visible tid for relation \"%s.%s\"",
+		get_namespace_name(RelationGetNamespace(rel)),
+		RelationGetRelationName(rel;
+
 	result = (ItemPointer) palloc(sizeof(ItemPointerData));
 	ItemPointerCopy(tid, result);
 
diff --git a/src/test/regress/expected/tid.out b/src/test/regress/expected/tid.out
new file mode 100644
index 00..e7e0d74780
--- /dev/null
+++ b/src/test/regress/expected/tid.out
@@ -0,0 +1,106 @@
+-- tests for functions related to TID handling
+CREATE TABLE tid_tab (a int);
+-- min() and max() for TIDs
+INSERT INTO tid_tab VALUES (1), (2);
+SELECT min(ctid) FROM tid_tab;
+  min  
+---
+ (0,1)
+(1 row)
+
+SELECT max(ctid) FROM tid_tab;
+  max  
+---
+ (0,2)
+(1 row)
+
+TRUNCATE tid_tab;
+-- Tests for currtid() and currtid2() with various relation kinds
+-- Materialized view
+CREATE MATERIALIZED VIEW tid_matview AS SELECT a FROM tid_tab;
+SELECT currtid('tid_matview'::regclass::oid, '(0,1)'::tid); -- fails
+ERROR:  tid (0, 1) is not valid for relation "tid_matview"
+SELECT currtid2('tid_matview'::text, '(0,1)'::tid); -- fails
+ERROR:  tid (0, 1) is not valid for relation "tid_matview"
+INSERT INTO tid_tab VALUES (1);
+REFRESH MATERIALIZED VIEW tid_matview;
+SELECT currtid('tid_matview'::regclass::oid, '(0,1)'::tid); -- ok
+ currtid 
+-
+ (0,1)
+(1 row)
+
+SELECT currtid2('tid_matview'::text, '(0,1)'::tid); -- ok
+ currtid2 
+--
+ (0,1)
+(1 row)
+
+DROP MATERIALIZED VIEW tid_matview;
+TRUNCATE tid_tab;
+-- Sequence
+CREATE SEQUENCE tid_seq;
+SELECT currtid('tid_seq'::regclass::oid, '(0,1)'::tid); -- ok
+ currtid 
+-
+ (0,1)
+(1 row)
+
+SELECT currtid2('tid_seq'::text, '(0,1)'::tid); -- ok
+ currtid2 
+--
+ (0,1)
+(1 row)
+
+DROP SEQUENCE tid_seq;
+-- Index, fails with incorrect relation type
+CREATE INDEX tid_ind ON tid_tab(a);
+SELECT currtid('tid_ind'::regclass::oid, '(0,1)'::tid); -- fails
+ERROR:  "tid_ind" is an index
+SELECT currtid2('tid_ind'::text, '(0,1)'::tid); -- fails
+ERROR:  "tid_ind" is an index
+DROP INDEX tid_ind;
+-- Partitioned table, no storage
+CREATE TABLE tid_part (a int) PARTITION BY RANGE (a);
+SELECT currtid('tid_part'::regclass::oid, '(0,1)'::tid); -- fails
+ERROR:  cannot look at latest visible tid for relation "public.tid_part"
+SELECT currtid2('tid_part'::text, '(0,1)'::tid); -- fails
+ERROR:  cannot look at latest visible tid for relation "public.tid_part"
+DROP TABLE tid_part;
+-- Views
+-- ctid 

Re: segmentation fault using currtid and partitioned tables

2020-05-25 Thread Michael Paquier
On Fri, May 22, 2020 at 07:32:57PM -0400, Alvaro Herrera wrote:
> I don't know, but this stuff is so unused that your patch seems
> excessive ... and I think we'd rather not backpatch something so large.
> I propose we do something less invasive in the backbranches, like just
> throw elog() errors (nothing fancy) where necessary to avoid the
> crashes.  Even for pg12 it seems that that should be sufficient.

Even knowing that those trigger a bunch of elog()s which are not
something that should be user-triggerable?  :)

Perhaps you are right though, and that we don't need to spend this
much energy into improving the error messages so I am fine to discard
this part.  At the end, in order to remove the crashes, you just need
to keep around the two RELKIND_HAS_STORAGE() checks.  But I would
rather keep these two to use ereport(ERRCODE_FEATURE_NOT_SUPPORTED)
instead of elog(), and keep the test coverage of the previous patch
(including the tests for the aggregates I noticed were missing).
Would you be fine with that?

> For pg13 and beyond, I liked Tom's idea of installing dummy functions
> for tables without storage -- that seems safer.

Not sure about that for v13.  That would be invasive post-beta.
--
Michael


signature.asc
Description: PGP signature


Re: segmentation fault using currtid and partitioned tables

2020-05-22 Thread Alvaro Herrera
On 2020-Apr-09, Michael Paquier wrote:

> Playing more with this stuff, it happens that we have zero code
> coverage for currtid() and currtid2(), and the main user of those
> functions I can find around is the ODBC driver:
> https://coverage.postgresql.org/src/backend/utils/adt/tid.c.gcov.html

Yeah, they're there solely for ODBC as far as I know.

> There are multiple cases to consider, particularly for views:
> - Case of a view with ctid as attribute taken from table.
> - Case of a view with ctid as attribute with incorrect attribute
> type.
> It is worth noting that all those code paths can trigger various
> elog() errors, which is not something that a user should be able to do
> using a SQL-callable function.  There are also two code paths for
> cases where a view has no or more-than-one SELECT rules, which cannot
> normally be reached.

> All in that, I propose something like the attached to patch the
> surroundings with tests to cover everything I could think of, which I
> guess had better be backpatched?

I don't know, but this stuff is so unused that your patch seems
excessive ... and I think we'd rather not backpatch something so large.
I propose we do something less invasive in the backbranches, like just
throw elog() errors (nothing fancy) where necessary to avoid the
crashes.  Even for pg12 it seems that that should be sufficient.

For pg13 and beyond, I liked Tom's idea of installing dummy functions
for tables without storage -- that seems safer.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: segmentation fault using currtid and partitioned tables

2020-04-08 Thread Michael Paquier
On Wed, Apr 08, 2020 at 04:13:31PM +0900, Michael Paquier wrote:
> I have been looking at the tree and the use of the table AM APIs, and
> those TID lookups are really a particular case compared to the other
> callers of the table AM callbacks.  Anyway, I have not spotted similar
> problems, so I find very tempting the option to just add some
> RELKIND_HAS_STORAGE() to tid.c where it matters and call it a day.

Playing more with this stuff, it happens that we have zero code
coverage for currtid() and currtid2(), and the main user of those
functions I can find around is the ODBC driver:
https://coverage.postgresql.org/src/backend/utils/adt/tid.c.gcov.html

There are multiple cases to consider, particularly for views:
- Case of a view with ctid as attribute taken from table.
- Case of a view with ctid as attribute with incorrect attribute
type.
It is worth noting that all those code paths can trigger various
elog() errors, which is not something that a user should be able to do
using a SQL-callable function.  There are also two code paths for
cases where a view has no or more-than-one SELECT rules, which cannot
normally be reached.

All in that, I propose something like the attached to patch the
surroundings with tests to cover everything I could think of, which I
guess had better be backpatched?  While on it, I have noticed that we
lack coverage for max(tid) and min(tid), so I have included a bonus
test.

Another issue is that we don't have any documentation for those
functions, in which case the best fit is a subsection for TID
operators under "Functions and Operators"?

For now, I am adding a patch to next CF so as we don't forget about
this set of issues.  Any thoughts?
--
Michael
diff --git a/src/backend/utils/adt/tid.c b/src/backend/utils/adt/tid.c
index 4ce8375eab..90e03e890d 100644
--- a/src/backend/utils/adt/tid.c
+++ b/src/backend/utils/adt/tid.c
@@ -31,6 +31,7 @@
 #include "parser/parsetree.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
+#include "utils/lsyscache.h"
 #include "utils/rel.h"
 #include "utils/snapmgr.h"
 #include "utils/varlena.h"
@@ -300,20 +301,41 @@ currtid_for_view(Relation viewrel, ItemPointer tid)
 	for (i = 0; i < natts; i++)
 	{
 		Form_pg_attribute attr = TupleDescAttr(att, i);
+		char	   *attname = NameStr(attr->attname);
 
-		if (strcmp(NameStr(attr->attname), "ctid") == 0)
+		if (strcmp(attname, "ctid") == 0)
 		{
 			if (attr->atttypid != TIDOID)
-elog(ERROR, "ctid isn't of type TID");
+ereport(ERROR,
+		(errcode(ERRCODE_DATATYPE_MISMATCH),
+		 errmsg("cannot use attribute \"%s\" of type %s on view \"%s.%s\"",
+attname, format_type_be(attr->atttypid),
+get_namespace_name(RelationGetNamespace(viewrel)),
+RelationGetRelationName(viewrel;
 			tididx = i;
 			break;
 		}
 	}
+
 	if (tididx < 0)
-		elog(ERROR, "currtid cannot handle views with no CTID");
+		ereport(ERROR,
+(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot look at latest visible tid for (%u, %u) on view \"%s.%s\" without CTID",
+		ItemPointerGetBlockNumberNoCheck(tid),
+		ItemPointerGetOffsetNumberNoCheck(tid),
+		get_namespace_name(RelationGetNamespace(viewrel)),
+		RelationGetRelationName(viewrel;
+
 	rulelock = viewrel->rd_rules;
+
 	if (!rulelock)
-		elog(ERROR, "the view has no rules");
+		elog(ERROR,
+			 "cannot look at latest visible tid for (%u, %u) on view \"%s.%s\" with no rules",
+			 ItemPointerGetBlockNumberNoCheck(tid),
+			 ItemPointerGetOffsetNumberNoCheck(tid),
+			 get_namespace_name(RelationGetNamespace(viewrel)),
+			 RelationGetRelationName(viewrel));
+
 	for (i = 0; i < rulelock->numLocks; i++)
 	{
 		rewrite = rulelock->rules[i];
@@ -323,7 +345,13 @@ currtid_for_view(Relation viewrel, ItemPointer tid)
 			TargetEntry *tle;
 
 			if (list_length(rewrite->actions) != 1)
-elog(ERROR, "only one select rule is allowed in views");
+elog(ERROR,
+	 "cannot look at latest visible tid for (%u, %u) on view \"%s.%s\" with multiple SELECT rules",
+	 ItemPointerGetBlockNumberNoCheck(tid),
+	 ItemPointerGetOffsetNumberNoCheck(tid),
+	 get_namespace_name(RelationGetNamespace(viewrel)),
+	 RelationGetRelationName(viewrel));
+
 			query = (Query *) linitial(rewrite->actions);
 			tle = get_tle_by_resno(query->targetList, tididx + 1);
 			if (tle && tle->expr && IsA(tle->expr, Var))
@@ -345,10 +373,21 @@ currtid_for_view(Relation viewrel, ItemPointer tid)
 			break;
 		}
 	}
-	elog(ERROR, "currtid cannot handle this view");
+
+	elog(ERROR, "could not look at latest visible tid for (%u, %u) on view \"%s.%s\"",
+		 ItemPointerGetBlockNumberNoCheck(tid),
+		 ItemPointerGetOffsetNumberNoCheck(tid),
+		 get_namespace_name(RelationGetNamespace(viewrel)),
+		 RelationGetRelationName(viewrel));
 	return (Datum) 0;
 }
 
+/*
+ *	currtid_byreloid
+ *		Return the latest visible tid of a relation's tuple, associated
+ *		to the tid given in input.  This is a wrapper for currtid(), and
+ *		uses in input

Re: segmentation fault using currtid and partitioned tables

2020-04-08 Thread Michael Paquier
On Sun, Apr 05, 2020 at 12:51:56PM -0400, Tom Lane wrote:
> I think it might be a good idea to make relations-without-storage
> set up rd_tableam as a vector of dummy functions that will throw
> some suitable complaint about "relation lacks storage".  NULL is
> a horrible default for this.

Yeah, that's not good, but I am not really comfortable with the
concept of implying that (pg_class.relam == InvalidOid) maps to a
dummy AM callback set instead of NULL for rd_tableam.  That feels less
natural.  As mentioned upthread, the error that we get in ~11 is
confusing as well when using a relation that has no storage:
ERROR:  58P01: could not open file "base/16384/16385": No such file or directory

I have been looking at the tree and the use of the table AM APIs, and
those TID lookups are really a particular case compared to the other
callers of the table AM callbacks.  Anyway, I have not spotted similar
problems, so I find very tempting the option to just add some
RELKIND_HAS_STORAGE() to tid.c where it matters and call it a day.

Andres, what do you think?
--
Michael


signature.asc
Description: PGP signature


Re: segmentation fault using currtid and partitioned tables

2020-04-05 Thread Tom Lane
Jaime Casanova  writes:
> Another one caught by sqlsmith, on the regression database run this
> query (using any non-partitioned table works fine):
> select currtid('pagg_tab'::regclass::oid, '(0,156)'::tid) >= '(1,158)'::tid;

Hm, so

(1) currtid_byreloid and currtid_byrelname lack any check to see
if they're dealing with a relkind that lacks storage.

(2) The proximate cause of the crash is that rd_tableam is zero,
so that the interface functions in tableam.h just crash hard.
This seems like a pretty bad thing; does anyone want to promise
that there are no other oversights of the same ilk elsewhere,
and never will be?

I think it might be a good idea to make relations-without-storage
set up rd_tableam as a vector of dummy functions that will throw
some suitable complaint about "relation lacks storage".  NULL is
a horrible default for this.

regards, tom lane




segmentation fault using currtid and partitioned tables

2020-04-05 Thread Jaime Casanova
Hi,

Another one caught by sqlsmith, on the regression database run this
query (using any non-partitioned table works fine):

"""
select currtid('pagg_tab'::regclass::oid, '(0,156)'::tid) >= '(1,158)'::tid;
"""

This works on 11 (well it gives an error because the file doesn't
exists) but crash the server on 12+

attached the stack trace from master

-- 
Jaime Casanova  www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
#0  0x00a3a3f1 in table_beginscan_tid (rel=0x7ff8ad4d01b8, 
snapshot=0x1336430) at ../../../../src/include/access/tableam.h:842
flags = 8
#1  0x00a3b1e3 in currtid_byreloid (fcinfo=0x130c298) at tid.c:384
reloid = 37534
tid = 0x128f498
result = 0x135b960
rel = 0x7ff8ad4d01b8
aclresult = ACLCHECK_OK
snapshot = 0x1336430
scan = 0x10
#2  0x006fc889 in ExecInterpExpr (state=0x130c100, econtext=0x130be00, 
isnull=0x7fff1f9659e7) at execExprInterp.c:698
fcinfo = 0x130c298
args = 0x130c2b8
nargs = 2
d = 140733723337632
op = 0x130c2f0
resultslot = 0x130c008
innerslot = 0x0
outerslot = 0x0
scanslot = 0x0
dispatch_table = {0x6fc140 , 0x6fc165 
, 0x6fc19b , 0x6fc1d4 
,
  0x6fc20d , 0x6fc296 , 
0x6fc31f , 0x6fc3a8 , 0x6fc3d7 
,
  0x6fc406 , 0x6fc435 , 
0x6fc463 , 0x6fc50a , 0x6fc5b1 
,
  0x6fc658 , 0x6fc6b3 , 
0x6fc756 , 0x6fc78c ,
  0x6fc7f9 , 0x6fc8c8 , 
0x6fc8f6 , 0x6fc924 ,
  0x6fc92f , 0x6fc997 , 
0x6fc9f0 , 0x6fc9fb ,
  0x6fca63 , 0x6fcabc , 
0x6fcaf4 , 0x6fcb69 ,
  0x6fcb94 , 0x6fcbdf , 
0x6fcc2d , 0x6fcc88 ,
  0x6fccca , 0x6fcd0f , 
0x6fcd3d , 0x6fcd6b ,
  0x6fcda5 , 0x6fce08 , 
0x6fce6b , 0x6fcea5 ,
  0x6fced3 , 0x6fcf01 , 
0x6fcf31 , 0x6fd01d ,
  0x6fd073 , 0x6fd25b , 
0x6fd34e , 0x6fd432 ,
  0x6fd50e , 0x6fd535 , 
0x6fd55c , 0x6fd583 ,
  0x6fd5aa , 0x6fd5d8 , 
0x6fd5ff , 0x6fd745 ,
  0x6fd849 , 0x6fd870 , 
0x6fd89e , 0x6fd8cc ,
  0x6fd8fa , 0x6fd950 , 
0x6fd977 , 0x6fd99e ,
  0x6fcfa7 , 0x6fda1a , 
0x6fda41 , 0x6fd9c5 ,
  0x6fd9f3 , 0x6fda68 , 
0x6fda8f , 0x6fdb2d ,
  0x6fdb54 , 0x6fdbf2 , 
0x6fdc20 , 0x6fdc4e ,
  0x6fdc89 , 0x6fdd3c , 
0x6fddca , 0x6fde51 ,
  0x6fdede , 0x6fdffe , 
0x6fe0e8 , 0x6fe1bc ,
  0x6fe2d9 , 0x6fe3c0 , 
0x6fe491 , 0x6fe4bf ,
  0x6fe4ed }
#3  0x006fe563 in ExecInterpExprStillValid (state=0x130c100, 
econtext=0x130be00, isNull=0x7fff1f9659e7) at execExprInterp.c:1807
No locals.
#4  0x0074a22c in ExecEvalExprSwitchContext (state=0x130c100, 
econtext=0x130be00, isNull=0x7fff1f9659e7) at 
../../../src/include/executor/executor.h:313
retDatum = 18446462598752812812
oldContext = 0x130b990
#5  0x0074a295 in ExecProject (projInfo=0x130c0f8) at 
../../../src/include/executor/executor.h:347
econtext = 0x130be00
state = 0x130c100
slot = 0x130c008
isnull = false
#6  0x0074a47b in ExecResult (pstate=0x130bce8) at nodeResult.c:136
node = 0x130bce8
outerTupleSlot = 0x779642 
outerPlan = 0x0
econtext = 0x130be00
#7  0x00712656 in ExecProcNodeFirst (node=0x130bce8) at 
execProcnode.c:444
No locals.
#8  0x00707085 in ExecProcNode (node=0x130bce8) at 
../../../src/include/executor/executor.h:245
No locals.
#9  0x00709b39 in ExecutePlan (estate=0x130bab0, planstate=0x130bce8, 
use_parallel_mode=false, operation=CMD_SELECT, sendTuples=true, numberTuples=0,
direction=ForwardScanDirection, dest=0x12b30c0, execute_once=true) at 
execMain.c:1646
slot = 0x0
current_tuple_count = 0
#10 0x007076a9 in standard_ExecutorRun (queryDesc=0x12ae280, 
direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:364
estate = 0x130bab0
operation = CMD_SELECT
dest = 0x12b30c0
sendTuples = true
oldcontext = 0x12ae160
__func__ = "standard_ExecutorRun"
#11 0x007074d1 in ExecutorRun (queryDesc=0x12ae280, 
direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:308
No locals.
#12 0x00921d77 in PortalRunSelect (portal=0x12f3e70, forward=true, 
count=0, dest=0x12b30c0) at pquery.c:912
queryDesc = 0x12ae280
direction = ForwardScanDirection
nprocessed = 0
__func__ = "PortalRunSelect"
#13 0x00921a30 in PortalRun (portal=0x12f3e70, 
count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x12b30c0, 
altdest=0x12b30c0, qc=0x7fff1f965d60)
at pquery.c:756
_save_exception_stack = 0x7fff1f965e70
_save_context_stack = 0x0
_local_sigjmp_buf = {{__jmpbuf = {0, 60194379429867500, 4706656, 
140733723337632, 0, 0, 60194379352272876, -59702997874741268}, __mask_was_saved 
= 0,
__saved_mask =