Re: Performance degradation of REFRESH MATERIALIZED VIEW

2021-04-19 Thread Bharath Rupireddy
On Tue, Apr 20, 2021 at 11:20 AM Masahiko Sawada  wrote:
>
> On Tue, Apr 20, 2021 at 11:04 AM Bharath Rupireddy
>  wrote:
> >
> > On Mon, Apr 19, 2021 at 7:21 PM Masahiko Sawada  
> > wrote:
> > > I’ve updated the patch including the above comment.
> >
> > Thanks for the patch.
> >
> > I was trying to understand below statements:
> > + * we check without a buffer lock if the page is empty but 
> > the
> > + * caller doesn't need to recheck that since we assume 
> > that in
> > + * HEAP_INSERT_FROZEN case, only one process is inserting a
> > + * frozen tuple into this relation.
> > + *
> >
> > And earlier comments from upthread:
> >
> > >> AFAICS the page is always empty when RelationGetBufferForTuple
> > >> returned a valid vmbuffer.  So the "if" should be an "assert" instead.
> >
> > > There is a chance that RelationGetBufferForTuple() returns a valid
> > > vmbuffer but the page is not empty, since RelationGetBufferForTuple()
> > > checks without a lock if the page is empty. But when it comes to
> > > HEAP_INSERT_FROZEN cases it actually doesn’t happen at least for now
> > > since only one process inserts tuples into the relation. Will fix."
> >
> > I'm not sure whether it is safe to assume "at least for now since only
> > one process inserts tuples into the relation". What if we allow
> > parallel inserts for HEAP_INSERT_FROZEN cases, I don't know whether we
> > can do that or not. Correct me if I'm wrong.
>
> I think if my assumption is wrong or we allow parallel insert for
> HEAP_INSERT_FROZEN cases in the future, we need to deal with the case
> where frozen tuples are concurrently inserted into the same page. For
> example, we can release vmbuffer when we see the page is no longer
> empty, or we can return a valid buffer but require the caller to
> re-check if the page is still empty. The previous version patch took
> the former approach. More concretely, heap_insert() rechecked if the
> page is still empty in HEAP_INSERT_FROZEN case and set all_frozen_set
> if so. But AFAICS concurrently inserting frozen tuples into the same
> page doesn’t happen for now (COPY FREEZE and REFRESH MATERIALIZED VIEW
> are users of HEAP_INSERT_FROZEN), also pointed out by Horiguchi-san.
> So I added comments and assertions rather than addressing the case
> that never happens with the current code. If concurrently inserting
> frozen tuples into the same page happens, we should get the assertion
> failure that I added in RelationGetBufferForTuple().

Upon thinking further, concurrent insertions into the same page are not
possible while heap_insert is in between RelationGetBufferForTuple and
UnlockReleaseBuffer(buffer), because RelationGetBufferForTuple locks the
buffer in exclusive mode; see LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE) and
the comment " *Returns pinned and exclusive-locked buffer of a page in given
relation". Even if parallel insertions were allowed in HEAP_INSERT_FROZEN
cases, each worker would separately acquire pages, insert into them, and
skip taking the visibility map page pin if the page has already been set
all-visible by another worker.
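
For reference, the two current users of HEAP_INSERT_FROZEN mentioned upthread
can be exercised with a sketch like the following (the file path is just a
placeholder):

BEGIN;
CREATE TABLE t_frozen (a int);
-- COPY ... FREEZE requires the table to be created (or truncated) in the
-- same transaction and inserts the tuples already frozen.
COPY t_frozen FROM '/tmp/t_frozen.csv' WITH (FREEZE);
COMMIT;

-- REFRESH MATERIALIZED VIEW likewise inserts frozen tuples into the new heap.
CREATE MATERIALIZED VIEW mv_frozen AS SELECT a FROM t_frozen;
REFRESH MATERIALIZED VIEW mv_frozen;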

Some more comments on v3 patch:
1) Isn't it good to specify here what we gain by avoiding pinning the
visibility map page, something like: save a few seconds, avoid extra
function calls, or some other better wording?
+ * If the page already is non-empty and all-visible, we skip to
+ * pin on a visibility map buffer since we never clear and set
+ * all-frozen bit on visibility map during inserting a frozen
+ * tuple.
+ */

2) Isn't it good to put PageIsAllVisible(BufferGetPage(buffer)) in the if
clause instead of the else if clause? Since that branch is going to be hit
most of the time, we could avoid the page-empty check every time?
+if (PageGetMaxOffsetNumber(BufferGetPage(buffer)) == 0)
+visibilitymap_pin(relation, targetBlock, vmbuffer);
+else if (PageIsAllVisible(BufferGetPage(buffer)))
+skip_vmbuffer = true;

3) I found another typo in v3 - it is "will set" not "will sets":
+ *In HEAP_INSERT_FROZEN cases, we handle the possibility that the caller will
+ *sets all-frozen bit on the visibility map page.  We pin on the visibility

4) I think a commit message can be added to the upcoming patch.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: 2 questions about volatile attribute of pg_proc.

2021-04-19 Thread Pavel Stehule
>
>
>>
>> you cannot do it - with this you introduce strong dependency on nested
>> objects
>>
>
> What does the plpgsql_check do in this area?  I checked the README[1], but
> can't find
> anything about it.
>

When you run plpgsql_check with performance warnings enabled (they are
disabled by default), it checks whether all called functions are at the same
or a lower volatility level than the checked function.  So when all called
operations are stable (read only), the function could be marked as stable,
and if the function is instead marked as volatile, plpgsql_check raises a
warning.
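
As a rough illustration, a sketch (it assumes plpgsql_check is installed and
that its plpgsql_check_function() entry point accepts a performance_warnings
flag):

CREATE EXTENSION IF NOT EXISTS plpgsql_check;

CREATE FUNCTION f_read_only() RETURNS bigint
LANGUAGE plpgsql VOLATILE AS $$
BEGIN
  -- the body only reads, so STABLE would be sufficient
  RETURN (SELECT count(*) FROM pg_class);
END $$;

SELECT * FROM plpgsql_check_function('f_read_only()',
                                     performance_warnings => true);
-- expected: a performance warning suggesting the function could be STABLE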


>
>> until we have global temp tables, then it is blocker for usage of
>> temporary tables.
>>
>
All plpgsql expressions are SQL expressions, and anybody can call a function
against a temporary table.  But local temporary tables typically don't exist
at CREATE FUNCTION time (registration time), and usually don't exist at
plpgsql compile time either.  Temporary tables are usually created inside the
executed plpgsql function.  So you cannot do any deeper semantic check at
registration or compile time, simply because one kind of object (temporary
tables) doesn't exist yet.  This is a difficult issue for plpgsql_check too.
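
A minimal sketch of why such a check cannot be done earlier (names are made
up): the temporary table only comes into existence when the function runs.

CREATE FUNCTION use_tmp() RETURNS bigint
LANGUAGE plpgsql AS $$
BEGIN
  -- tmp_data does not exist at CREATE FUNCTION time, only at run time
  CREATE TEMP TABLE IF NOT EXISTS tmp_data (v int);
  INSERT INTO tmp_data VALUES (1), (2), (3);
  RETURN (SELECT count(*) FROM tmp_data);
END $$;

SELECT use_tmp();  -- works, even though nothing could be validated earlier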


> Can you explain more about this?
>
>
>>  Can be nice if some functionality of plpgsql_check can be in core,
>> because I think so it is necessary for development, but the structure and
>> integration of SQL in PLpgSQL is very good (and very practical).
>>
>>
> I'm interested in plpgsql_check.  However I am still confused why we can
> do it in this way, but
> can't do it in the  VOLATILE_AUTO way.
>

You can do it.  But you solve one issue and introduce new, more serious
issues (related to dependencies between database objects).  The design of
plpgsql is quite different from the design of Oracle's PL/SQL, so the
proposed change means a total conceptual change, and you would need to write
a lot of new code to provide the necessary infrastructure.  I am not sure the
benefit is higher than the cost.  It could be useful if plpgsql could really
be compiled to machine code, but that means tens of thousands of lines of
code without significant benefit: the bottleneck inside stored procedures is
SQL, and compilation doesn't help with that.


>
>>
>>>
 b) When you migrate from Oracle,then you can use the STABLE flag, and
 it will be mostly correct.

>>>
>>> This was suggested in our team as well, but I don't think it is very
>>> strict.  For example:
>>> SELECT materialize_bills_for(userId) from users;  Any more proof to say
>>> "STABLE" flag
>>> is acceptable?
>>>
>>
>> Oracle doesn't allow write operations in functions. Or didn't allow it -
>> I am not sure what is possible now. So when you migrate data from Oracle,
>> and if the function is not marked as DETERMINISTIC, you can safely mark it
>> as STABLE.
>>
>
> You are correct.  Good to know the above points.
>

And DETERMINISTIC functions are IMMUTABLE on the Postgres side.
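
Roughly, the migration rule discussed here, as a sketch (function names and
bodies are made up):

-- a migrated function that was not DETERMINISTIC in Oracle: mark it STABLE
CREATE FUNCTION fx_rate(cur text) RETURNS numeric
LANGUAGE sql STABLE
AS $$ SELECT 1.00 $$;            -- placeholder body

-- a migrated function that was DETERMINISTIC in Oracle: mark it IMMUTABLE
CREATE FUNCTION tax_rate(cat text) RETURNS numeric
LANGUAGE sql IMMUTABLE
AS $$ SELECT 0.21 $$;            -- placeholder body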


>
>>  Elsewhere - it works 99% well. In special cases, there is some black
>> magic - with fresh snapshots, and with using autonomous transactions, and
>> these cases should be solved manually. Sometimes is good enough just
>> removing autonomous transactions, sometimes the complete rewrite is
>> necessary - or redesign functionality.
>>
>>
> is the 1% == "special cases" ?  Do you mind sharing more information about
> these cases,
> either document or code?
>

Unfortunately not.  I don't have well-structured notes from my work on ports
from Oracle to Postgres, and these 1% cases are very variable; people are
very creative.  But usually this code is very dirty and not critical.  In
Postgres we can use LISTEN/NOTIFY, the possibility to set app_name, or RAISE
NOTICE.



> [1] https://github.com/okbob/plpgsql_check/blob/master/README.md#features
>
>
> --
> Best Regards
> Andy Fan (https://www.aliyun.com/)
>


Re: terminate called after throwing an instance of 'std::bad_alloc'

2021-04-19 Thread Justin Pryzby
On Tue, Apr 20, 2021 at 12:38:26AM -0500, Justin Pryzby wrote:
> I don't know if this is related to the other issues, but this seems leaky.

And it explains how the context use counter can exceed its threshold.

create or replace function fn() returns void language plpgsql as $$ declare rec 
int; begin SELECT relpages INTO rec FROM pg_class LIMIT 1; end $$;
explain analyze
SELECT fn()
FROM generate_series(1,99);

SELECT SUM(a) FROM (VALUES(1))a(a);

time PGOPTIONS='-cclient_min_messages=debug -clog_executor_stats=off 
-clog_min_duration_statement=-1 -cjit=on -cjit_above_cost=0 
-cjit_inline_above_cost=0' psql ts -f jitleak.sql
...
psql:jitleak.sql:6: DEBUG:  recreating LLVM context after 100 uses

Question: does the jit context need to be recreated only when inlining is
enabled?  Or maybe it's better if it's not conditionalized like that.




Re: Performance degradation of REFRESH MATERIALIZED VIEW

2021-04-19 Thread Masahiko Sawada
On Tue, Apr 20, 2021 at 11:04 AM Bharath Rupireddy
 wrote:
>
> On Mon, Apr 19, 2021 at 7:21 PM Masahiko Sawada  wrote:
> > I’ve updated the patch including the above comment.
>
> Thanks for the patch.
>
> I was trying to understand below statements:
> + * we check without a buffer lock if the page is empty but 
> the
> + * caller doesn't need to recheck that since we assume that 
> in
> + * HEAP_INSERT_FROZEN case, only one process is inserting a
> + * frozen tuple into this relation.
> + *
>
> And earlier comments from upthread:
>
> >> AFAICS the page is always empty when RelationGetBufferForTuple
> >> returned a valid vmbuffer.  So the "if" should be an "assert" instead.
>
> > There is a chance that RelationGetBufferForTuple() returns a valid
> > vmbuffer but the page is not empty, since RelationGetBufferForTuple()
> > checks without a lock if the page is empty. But when it comes to
> > HEAP_INSERT_FROZEN cases it actually doesn’t happen at least for now
> > since only one process inserts tuples into the relation. Will fix."
>
> I'm not sure whether it is safe to assume "at least for now since only
> one process inserts tuples into the relation". What if we allow
> parallel inserts for HEAP_INSERT_FROZEN cases, I don't know whether we
> can do that or not. Correct me if I'm wrong.

I think if my assumption is wrong or we allow parallel insert for
HEAP_INSERT_FROZEN cases in the future, we need to deal with the case
where frozen tuples are concurrently inserted into the same page. For
example, we can release vmbuffer when we see the page is no longer
empty, or we can return a valid buffer but require the caller to
re-check if the page is still empty. The previous version patch took
the former approach. More concretely, heap_insert() rechecked if the
page is still empty in HEAP_INSERT_FROZEN case and set all_frozen_set
if so. But AFAICS concurrently inserting frozen tuples into the same
page doesn’t happen for now (COPY FREEZE and REFRESH MATERIALIZED VIEW
are users of HEAP_INSERT_FROZEN), also pointed out by Horiguchi-san.
So I added comments and assertions rather than addressing the case
that never happens with the current code. If concurrently inserting
frozen tuples into the same page happens, we should get the assertion
failure that I added in RelationGetBufferForTuple().

>
> While we are modifying something in heap_insert:
> 1) Can we adjust the comment below in heap_insert to the 80char limit?
>   * If we're inserting frozen entry into an empty page,
>   * set visibility map bits and PageAllVisible() hint.
> 2) I'm thinking whether we can do page = BufferGetPage(buffer); after
> RelationGetBufferForTuple and use in all the places where currently
> BufferGetPage(buffer) is being used:
> if (PageIsAllVisible(BufferGetPage(buffer)),
> PageClearAllVisible(BufferGetPage(buffer)); and we could even remove
> the local variable page of if (RelationNeedsWAL(relation)).
> 3) We could as well get the block number once and use it in all the
> places in heap_insert, thus we can remove extra calls of
> BufferGetBlockNumber(buffer).

All points are reasonable to me. I'll incorporate them in the next version.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: [CLOBBER_CACHE]Server crashed with segfault 11 while executing clusterdb

2021-04-19 Thread Amul Sul
On Tue, Apr 20, 2021 at 6:59 AM Kyotaro Horiguchi
 wrote:
>
> At Mon, 19 Apr 2021 16:27:25 +0530, Amul Sul  wrote in
> > On Mon, Apr 19, 2021 at 2:05 PM Kyotaro Horiguchi
> >  wrote:
> > > +   smgrwrite(RelationGetSmgr(index), INIT_FORKNUM, 
> > > BLOOM_METAPAGE_BLKNO,
> > >   (char *) metapage, true);
> > > -   log_newpage(&index->rd_smgr->smgr_rnode.node, INIT_FORKNUM,
> > > +   log_newpage(&(RelationGetSmgr(index))->smgr_rnode.node, 
> > > INIT_FORKNUM,
> > >
> > > At the log_newpage, index is guaranteed to have rd_smgr. So I prefer
> > > to leave the line alone..  I don't mind other sccessive calls if any
> > > since what I don't like is the notation there.
> > >
> >
> > Perhaps, isn't that bad. It is good to follow the practice of using
> > RelationGetSmgr() for rd_smgr access, IMHO.
>
> I don't mind RelationGetSmgr(index)->smgr_rnode alone or
> &variable->member alone and there's not the previous call to
> RelationGetSmgr just above. How about using a temporary variable?
>
>   SMgrRelation srel = RelationGetSmgr(index);
>   smgrwrite(srel, ...);
>   log_newpage(srel->..);
>

Understood.  Used a temporary variable in the places where RelationGetSmgr()
calls are placed too close together or are in a loop.

Please have a look at the attached version, thanks for the review.

Regards,
Amul
From d3043a97044d506ab9255c6d6d446ad962ee308c Mon Sep 17 00:00:00 2001
From: Amul Sul 
Date: Mon, 19 Apr 2021 03:16:27 -0400
Subject: [PATCH] Add RelationGetSmgr inline function

---
 contrib/amcheck/verify_nbtree.c   |  3 +-
 contrib/bloom/blinsert.c  |  7 ++--
 contrib/pg_prewarm/autoprewarm.c  |  4 +-
 contrib/pg_prewarm/pg_prewarm.c   |  5 +--
 contrib/pg_visibility/pg_visibility.c |  7 ++--
 src/backend/access/gist/gistbuild.c   | 15 
 src/backend/access/hash/hashpage.c|  4 +-
 src/backend/access/heap/heapam_handler.c  |  9 +++--
 src/backend/access/heap/rewriteheap.c | 11 ++
 src/backend/access/heap/visibilitymap.c   | 46 +--
 src/backend/access/nbtree/nbtree.c|  7 ++--
 src/backend/access/nbtree/nbtsort.c   | 16 +++-
 src/backend/access/spgist/spginsert.c | 15 
 src/backend/access/table/tableam.c|  8 ++--
 src/backend/catalog/heap.c|  2 -
 src/backend/catalog/index.c   |  5 +--
 src/backend/catalog/storage.c | 18 -
 src/backend/commands/tablecmds.c  |  9 +++--
 src/backend/storage/buffer/bufmgr.c   | 25 
 src/backend/storage/freespace/freespace.c | 40 ++--
 src/include/utils/rel.h   | 21 ++-
 21 files changed, 122 insertions(+), 155 deletions(-)

diff --git a/contrib/amcheck/verify_nbtree.c b/contrib/amcheck/verify_nbtree.c
index b31906cbcfd..7f34eed04ba 100644
--- a/contrib/amcheck/verify_nbtree.c
+++ b/contrib/amcheck/verify_nbtree.c
@@ -301,8 +301,7 @@ bt_index_check_internal(Oid indrelid, bool parentcheck, bool heapallindexed,
 		bool		heapkeyspace,
 	allequalimage;
 
-		RelationOpenSmgr(indrel);
-		if (!smgrexists(indrel->rd_smgr, MAIN_FORKNUM))
+		if (!smgrexists(RelationGetSmgr(indrel), MAIN_FORKNUM))
 			ereport(ERROR,
 	(errcode(ERRCODE_INDEX_CORRUPTED),
 	 errmsg("index \"%s\" lacks a main relation fork",
diff --git a/contrib/bloom/blinsert.c b/contrib/bloom/blinsert.c
index c34a640d1c4..ae10174d94a 100644
--- a/contrib/bloom/blinsert.c
+++ b/contrib/bloom/blinsert.c
@@ -164,6 +164,7 @@ void
 blbuildempty(Relation index)
 {
 	Page		metapage;
+	SMgrRelation reln = RelationGetSmgr(index);
 
 	/* Construct metapage. */
 	metapage = (Page) palloc(BLCKSZ);
@@ -177,9 +178,9 @@ blbuildempty(Relation index)
 	 * this even when wal_level=minimal.
 	 */
 	PageSetChecksumInplace(metapage, BLOOM_METAPAGE_BLKNO);
-	smgrwrite(index->rd_smgr, INIT_FORKNUM, BLOOM_METAPAGE_BLKNO,
+	smgrwrite(reln, INIT_FORKNUM, BLOOM_METAPAGE_BLKNO,
 			  (char *) metapage, true);
-	log_newpage(&index->rd_smgr->smgr_rnode.node, INIT_FORKNUM,
+	log_newpage(&reln->smgr_rnode.node, INIT_FORKNUM,
 BLOOM_METAPAGE_BLKNO, metapage, true);
 
 	/*
@@ -187,7 +188,7 @@ blbuildempty(Relation index)
 	 * write did not go through shared_buffers and therefore a concurrent
 	 * checkpoint may have moved the redo pointer past our xlog record.
 	 */
-	smgrimmedsync(index->rd_smgr, INIT_FORKNUM);
+	smgrimmedsync(reln, INIT_FORKNUM);
 }
 
 /*
diff --git a/contrib/pg_prewarm/autoprewarm.c b/contrib/pg_prewarm/autoprewarm.c
index b3f73ea92d6..0289ea657cb 100644
--- a/contrib/pg_prewarm/autoprewarm.c
+++ b/contrib/pg_prewarm/autoprewarm.c
@@ -514,15 +514,13 @@ autoprewarm_database_main(Datum main_arg)
 			old_blk->filenode != blk->filenode ||
 			old_blk->forknum != blk->forknum)
 		{
-			RelationOpenSmgr(rel);
-
 			/*
 			 * smgrexists is not safe for illegal forknum, hence check whether
 			 * the passed forknum is valid before using it in smgrexists.
 			 */
 			if (blk->forknum > 

Re: Table refer leak in logical replication

2021-04-19 Thread Amit Langote
Thanks for taking a look.

On Tue, Apr 20, 2021 at 2:09 PM Michael Paquier  wrote:
> On Mon, Apr 19, 2021 at 09:44:05PM +0900, Amit Langote wrote:
> > Okay, how about the attached then?
>
> create_estate_for_relation() returns an extra resultRelInfo that's
> also saved within es_opened_result_relations.  Wouldn't it be simpler
> to take the first element from es_opened_result_relations instead?
> Okay, that's a nit and you are documenting things in a sufficient way,
> but that just seemed duplicated to me.

Manipulating the contents of es_opened_result_relations directly in
worker.c is admittedly a "hack", which I am reluctant to have other
places participating in.  As originally designed, that list is to
speed up ExecCloseResultRelations(), not as a place to access result
relations from.  The result relations targeted over the course of
execution of a query (update/delete) or a (possibly multi-tuple in the
future) replication apply operation will not be guaranteed to be added
to the list in any particular order, so assuming where a result
relation of interest can be found in the list is bound to be unstable.

--
Amit Langote
EDB: http://www.enterprisedb.com




Re: terminate called after throwing an instance of 'std::bad_alloc'

2021-04-19 Thread Justin Pryzby
I don't know if this is related to the other issues, but this seems leaky.

create or replace function fn() returns void language plpgsql as $$ declare rec 
int; begin SELECT relpages INTO rec FROM pg_class LIMIT 1; end $$;
explain analyze
SELECT fn()
FROM generate_series(1,999);

PGOPTIONS='-cclient_min_messages=debug -clog_executor_stats=on 
-clog_min_duration_statement=-1 -cjit=on -cjit_above_cost=0 
-cjit_inline_above_cost=0' psql ts -f jitleak.sql
!   70172 kB max resident size

PGOPTIONS='-cclient_min_messages=debug -clog_executor_stats=on 
-clog_min_duration_statement=-1 -cjit=on -cjit_above_cost=0 
-cjit_inline_above_cost=-1' psql ts -f jitleak.sql
!   25672 kB max resident size




Re: select 'x' ~ repeat('x*y*z*', 1000);

2021-04-19 Thread Tom Lane
Thomas Munro  writes:
> Just an observation: on REL_13_STABLE, $SUBJECT maps in ~170MB of
> memory, and on master it's ~204MB.  A backend running that was just
> nuked by the kernel due to lack of swap space on my tiny buildfarm
> animal elver (a VM with 1GB RAM, 2GB swap, not doing much else).

Yeah, that's not terribly surprising.  Note that the point of that
test case is to fail: it's supposed to verify that we apply the
REG_MAX_COMPILE_SPACE limit and error out before hitting a kernel
OOM condition.  When I redid regex memory allocation in 0fc1af174,
there was a question of how to map the old complexity limit to the
new one.  I went with

 #define REG_MAX_COMPILE_SPACE  \
-   (100000 * sizeof(struct state) + 100000 * sizeof(struct arcbatch))
+   (500000 * (sizeof(struct state) + 4 * sizeof(struct arc)))
 #endif

knowing that that was a bit of a scale-up of the limit, but intending
that we'd not fail on any case that worked before.  We could knock
down the 500000 multiplier a little, but I'm afraid we'd lose the
it-still-works property, because the edge cases are a little different
for the new code than the old.
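
For what it's worth, the intended outcome on a build with that limit is a
clean failure rather than an OOM kill, along the lines of (exact wording may
differ):

select 'x' ~ repeat('x*y*z*', 1000);
-- ERROR:  invalid regular expression: regular expression is too complex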

regards, tom lane




Re: 2 questions about volatile attribute of pg_proc.

2021-04-19 Thread Andy Fan
On Tue, Apr 20, 2021 at 11:32 AM Pavel Stehule 
wrote:

>
>
> út 20. 4. 2021 v 5:16 odesílatel Andy Fan 
> napsal:
>
>>
>>
>> On Tue, Apr 20, 2021 at 10:57 AM Pavel Stehule 
>> wrote:
>>
>>>
>>>
>>> út 20. 4. 2021 v 4:47 odesílatel Andy Fan 
>>> napsal:
>>>


 > - a PL/PGSQL function's meaning depends on the search path in effect
 when it is called, unless it has a SET search_path clause or it fully
 qualifies all object references, so it isn't actually possible in general
 to determine what a function calls at definition time


 I'd think this one as a blocker issue at the beginning since I have to
 insist on
 any new features should not cause semantic changes for existing ones.
 Later I
 found the new definition. As for this feature request, I think we can
 define the
 features like this:

 1. We define a new attribute named VOLATILE_AUTO;  The semantic is PG
 will auto
detect the volatile info based on current search_path / existing
function. If any embedded function can't be found, we can raise an
 error if
VOLATILE_AUTO is used. If people change the volatile attribute
 later, we can:
a). do nothing. This can be the documented feature. or. b). Maintain
 the
dependency tree between functions and if anyone is changed, other
 functions
should be recalculated as well.

 2. VOLATILE_AUTO should never be the default value. It only works when
 people
requires it.

 Then what we can get from this?  Thinking a user is migrating lots of
 UDF from
 other databases.  Asking them to check/set each function's attribute
 might
 be bad. However if we tell them about how VOLATILE_AUTO works, and they
 accept it (I guess most people would accept), then the migration would
 be
 pretty productive.

 I'm listening to any obvious reason to reject it.

>>>
>>> a) This analyses can be very slow - PLpgSQL does lazy planning - query
>>> plans are planned only when are required - and this feature requires
>>> complete planning current function and all nested VOLATILE_AUTO functions -
>>> so start of function can be significantly slower
>>>
>>
>> Actually I am thinking  we can do this when we compile the function,
>> which means that would
>> happen on the "CREATE FUNCTION " stage.   this would need some hacks for
>> sure.  Does
>> this remove your concern?
>>
>
> you cannot do it - with this you introduce strong dependency on nested
> objects
>

What does plpgsql_check do in this area?  I checked the README[1], but can't
find anything about it.


> until we have global temp tables, then it is blocker for usage of
> temporary tables.
>

Can you explain more about this?


>  Can be nice if some functionality of plpgsql_check can be in core,
> because I think so it is necessary for development, but the structure and
> integration of SQL in PLpgSQL is very good (and very practical).
>
>
I'm interested in plpgsql_check.  However, I am still confused about why we
can do it in this way but can't do it in the VOLATILE_AUTO way.


>
>>
>>> b) When you migrate from Oracle,then you can use the STABLE flag, and it
>>> will be mostly correct.
>>>
>>
>> This was suggested in our team as well, but I don't think it is very
>> strict.  For example:
>> SELECT materialize_bills_for(userId) from users;  Any more proof to say
>> "STABLE" flag
>> is acceptable?
>>
>
> Oracle doesn't allow write operations in functions. Or didn't allow it - I
> am not sure what is possible now. So when you migrate data from Oracle, and
> if the function is not marked as DETERMINISTIC, you can safely mark it as
> STABLE.
>

You are correct.  Good to know the above points.


>  Elsewhere - it works 99% well. In special cases, there is some black
> magic - with fresh snapshots, and with using autonomous transactions, and
> these cases should be solved manually. Sometimes is good enough just
> removing autonomous transactions, sometimes the complete rewrite is
> necessary - or redesign functionality.
>
>
Is the 1% == "special cases"?  Do you mind sharing more information about
these cases, either documentation or code?

[1] https://github.com/okbob/plpgsql_check/blob/master/README.md#features

-- 
Best Regards
Andy Fan (https://www.aliyun.com/)


Re: pg_amcheck option to install extension

2021-04-19 Thread Mark Dilger



> On Apr 19, 2021, at 9:22 PM, Michael Paquier  wrote:
> 
> On Mon, Apr 19, 2021 at 08:39:06PM -0700, Mark Dilger wrote:
>> This is a classic privilege escalation attack.  Bob has one
>> privilege, and uses it to get another.
> 
> Bob is a superuser, so it has all the privileges of the world for this
> instance.  In what is that different from BASE_BACKUP or just COPY
> FROM PROGRAM?

I think you are conflating the concept of an operating system administrator with 
the concept of the database superuser/owner.  If the operating system user that 
postgres is running as cannot execute any binaries, then "copy from program" is 
not a way for a database administrator to escape the jail.  If Bob does not have 
ssh access to the system, he cannot run pg_basebackup.

> I am not following your argument here.

The argument is that the operating system user that postgres is running as, 
perhaps user "postgres", can read the files in the $PGDATA directory, but Bob 
can only see the MVCC view of the data, not the raw data.  Installing 
contrib/amcheck allows Bob to get a peek behind the curtain.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: Table refer leak in logical replication

2021-04-19 Thread Michael Paquier
On Mon, Apr 19, 2021 at 09:44:05PM +0900, Amit Langote wrote:
> Okay, how about the attached then?

create_estate_for_relation() returns an extra resultRelInfo that's
also saved within es_opened_result_relations.  Wouldn't it be simpler
to take the first element from es_opened_result_relations instead?
Okay, that's a nit and you are documenting things in a sufficient way,
but that just seemed duplicated to me.

> I decided to go with just
> finish_estate(), because we no longer have to do anything relation
> specific there.

Fine by me.
--
Michael




Re: Performance Evaluation of Result Cache by using TPC-DS

2021-04-19 Thread Yuya Watari
Hello David,

Thank you for your reply.

> Thanks for running that again.  I see from the EXPLAIN ANALYZE output
> that the planner did cost the Result Cache plan slightly more
> expensive than the Hash Join plan.  It's likely that add_path() did
> not consider the Hash Join plan to be worth keeping because it was not
> more than 1% better than the Result Cache plan. STD_FUZZ_FACTOR is set
> so new paths need to be at least 1% better than existing paths for
> them to be kept.  That's pretty unfortunate and that alone does not
> mean the costs are incorrect.  It would be good to know if that's the
> case for the other queries too.

Thanks for your analysis. I understood why HashJoin was not selected
in this query plan.

> To test that, I've set up TPC-DS locally, however, it would be good if
> you could send me the list of indexes that you've created.  I see the
> tool from the transaction processing council for TPC-DS only comes
> with the list of tables.
>
> Can you share the output of:

I listed all indexes on my machine by executing your query. I attached
the result to this e-mail. I hope it will help you.

Best regards,
Yuya Watari
  pg_get_indexdef   


 CREATE UNIQUE INDEX customer_address_pkey ON public.customer_address USING 
btree (ca_address_sk)
 CREATE UNIQUE INDEX customer_demographics_pkey ON public.customer_demographics 
USING btree (cd_demo_sk)
 CREATE UNIQUE INDEX date_dim_pkey ON public.date_dim USING btree (d_date_sk)
 CREATE UNIQUE INDEX ship_mode_pkey ON public.ship_mode USING btree 
(sm_ship_mode_sk)
 CREATE UNIQUE INDEX time_dim_pkey ON public.time_dim USING btree (t_time_sk)
 CREATE UNIQUE INDEX reason_pkey ON public.reason USING btree (r_reason_sk)
 CREATE UNIQUE INDEX income_band_pkey ON public.income_band USING btree 
(ib_income_band_sk)
 CREATE UNIQUE INDEX item_pkey ON public.item USING btree (i_item_sk)
 CREATE UNIQUE INDEX store_pkey ON public.store USING btree (s_store_sk)
 CREATE INDEX store_s_closed_date_sk_idx ON public.store USING btree 
(s_closed_date_sk)
 CREATE INDEX call_center_cc_closed_date_sk_idx ON public.call_center USING 
btree (cc_closed_date_sk)
 CREATE INDEX call_center_cc_open_date_sk_idx ON public.call_center USING btree 
(cc_open_date_sk)
 CREATE UNIQUE INDEX call_center_pkey ON public.call_center USING btree 
(cc_call_center_sk)
 CREATE INDEX customer_c_current_cdemo_sk_idx ON public.customer USING btree 
(c_current_cdemo_sk)
 CREATE UNIQUE INDEX customer_pkey ON public.customer USING btree 
(c_customer_sk)
 CREATE INDEX customer_c_first_shipto_date_sk_idx ON public.customer USING 
btree (c_first_shipto_date_sk)
 CREATE INDEX customer_c_first_sales_date_sk_idx ON public.customer USING btree 
(c_first_sales_date_sk)
 CREATE INDEX customer_c_current_hdemo_sk_idx ON public.customer USING btree 
(c_current_hdemo_sk)
 CREATE INDEX customer_c_current_addr_sk_idx ON public.customer USING btree 
(c_current_addr_sk)
 CREATE INDEX store_returns_sr_store_sk_idx ON public.store_returns USING btree 
(sr_store_sk)
 CREATE INDEX store_returns_sr_return_time_sk_idx ON public.store_returns USING 
btree (sr_return_time_sk)
 CREATE INDEX store_returns_sr_returned_date_sk_idx ON public.store_returns 
USING btree (sr_returned_date_sk)
 CREATE INDEX store_returns_sr_reason_sk_idx ON public.store_returns USING 
btree (sr_reason_sk)
 CREATE INDEX store_returns_sr_item_sk_idx ON public.store_returns USING btree 
(sr_item_sk)
 CREATE INDEX store_returns_sr_hdemo_sk_idx ON public.store_returns USING btree 
(sr_hdemo_sk)
 CREATE INDEX store_returns_sr_customer_sk_idx ON public.store_returns USING 
btree (sr_customer_sk)
 CREATE INDEX store_returns_sr_cdemo_sk_idx ON public.store_returns USING btree 
(sr_cdemo_sk)
 CREATE INDEX store_returns_sr_addr_sk_idx ON public.store_returns USING btree 
(sr_addr_sk)
 CREATE UNIQUE INDEX store_returns_pkey ON public.store_returns USING btree 
(sr_item_sk, sr_ticket_number)
 CREATE UNIQUE INDEX household_demographics_pkey ON 
public.household_demographics USING btree (hd_demo_sk)
 CREATE INDEX household_demographics_hd_income_band_sk_idx ON 
public.household_demographics USING btree (hd_income_band_sk)
 CREATE UNIQUE INDEX promotion_pkey ON public.promotion USING btree (p_promo_sk)
 CREATE INDEX promotion_p_end_date_sk_idx ON public.promotion USING btree 
(p_end_date_sk)
 CREATE INDEX promotion_p_start_date_sk_idx ON public.promotion USING btree 
(p_start_date_sk)
 CREATE INDEX promotion_p_item_sk_idx ON public.promotion USING btree 
(p_item_sk)
 CREATE UNIQUE INDEX catalog_page_pkey ON public.catalog_page USING btree 
(cp_catalog_page_sk)
 CREATE INDEX catalog_page_cp_end_date_sk_idx ON public.catalog_page USING 
btree (cp_end_date_sk)
 CREATE INDEX catalog_page_cp_start_date_sk_idx ON public.catalog_page USING 
btree (cp

select 'x' ~ repeat('x*y*z*', 1000);

2021-04-19 Thread Thomas Munro
Hi,

Just an observation: on REL_13_STABLE, $SUBJECT maps in ~170MB of
memory, and on master it's ~204MB.  A backend running that was just
nuked by the kernel due to lack of swap space on my tiny buildfarm
animal elver (a VM with 1GB RAM, 2GB swap, not doing much else).
Could also be related to an OS upgrade ~1 week ago.  It's obviously
time to increase the VM size which is no problem, but I thought those
numbers were interesting.




Stale description for pg_basebackup

2021-04-19 Thread Kyotaro Horiguchi
Hello.

It seems to me that there's a stale description in the documentation
of pg_basebackup.

https://www.postgresql.org/docs/13/app-pgbasebackup.html

> Note that there are some limitations in taking a backup from a standby:
...
> If you are using -X none, there is no guarantee that all WAL files
> required for the backup are archived at the end of backup.

Actually, pg_basebackup waits for all the required files to be
archived, which has been the established behavior since commit
52f8a59dd9@PG10. However, that commit seems to have forgotten to
change the pg_basebackup documentation. (The current description was
introduced by 9a4d51077c@PG10.)

The attached is a proposal to rewrite it as the following.

+ If you are using -X none, pg_basebackup may wait for a long time for
+ all the required WAL files to be archived. In that case, you may need
+ to call pg_switch_wal() on the primary to complete it sooner.
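
For clarity, the call the proposed wording refers to is simply run on the
primary while the backup is waiting, e.g.:

-- on the primary: force a WAL segment switch so that the pending segment
-- can be archived and the -X none backup can finish
SELECT pg_switch_wal();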

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index a3e292d44a..73d5cbeaf7 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -83,8 +83,7 @@ PostgreSQL documentation
 
 
  
-  If you are using -X none, there is no guarantee that all
-  WAL files required for the backup are archived at the end of backup.
+  If you are using -X none, pg_basebackup may wait for a long time for all the required WAL files to be archived.  In that case, you may need to call pg_switch_wal() on the primary to complete it sooner.
  
 
 


Re: locking [user] catalog tables vs 2pc vs logical rep

2021-04-19 Thread vignesh C
On Wed, Mar 31, 2021 at 5:47 PM vignesh C  wrote:
>
> On Wed, Mar 31, 2021 at 2:35 PM Ajin Cherian  wrote:
> >
> > The patch applies fine on HEAD and "make check" passes fine. No major 
> > comments on the patch, just a minor comment:
> >
> > If you could change the error from, " cannot PREPARE a transaction that has 
> > a lock on user catalog/system table(s)"
> > to "cannot PREPARE a transaction that has an exclusive lock on user 
> > catalog/system table(s)" that would be a more
> > accurate instruction to the user.
> >
>
> Thanks for reviewing the patch.
> Please find the updated patch which includes the fix for the same.

A similar problem exists with a synchronous replication setup whose
synchronous_standby_names refers to the subscriber, when we run
"begin; lock pg_class; insert into test1 values(10); commit". In this
case, while decoding the commit, the commit will wait while trying to
acquire a lock on the pg_class relation; the stack trace for this is
given below:
#4 0x556936cd5d37 in ProcSleep (locallock=0x556937de8728,
lockMethodTable=0x5569371c2620 ) at proc.c:1361
#5 0x556936cc294a in WaitOnLock (locallock=0x556937de8728,
owner=0x556937e3cd90) at lock.c:1858
#6 0x556936cc1231 in LockAcquireExtended (locktag=0x7ffcbb23cff0,
lockmode=1, sessionLock=false, dontWait=false, reportMemoryError=true,
locallockp=0x7ffcbb23cfe8)
at lock.c:1100
#7 0x556936cbdbce in LockRelationOid (relid=1259, lockmode=1) at lmgr.c:117
#8 0x5569367afb12 in relation_open (relationId=1259, lockmode=1)
at relation.c:56
#9 0x556936a2 in table_open (relationId=1259, lockmode=1) at table.c:43
#10 0x556936e90a91 in RelidByRelfilenode (reltablespace=0,
relfilenode=16385) at relfilenodemap.c:192
#11 0x556936c40361 in ReorderBufferProcessTXN (rb=0x556937e8e760,
txn=0x556937eb8778, commit_lsn=23752880, snapshot_now=0x556937ea0a90,
command_id=0, streaming=false)
at reorderbuffer.c:2122
#12 0x556936c411b7 in ReorderBufferReplay (txn=0x556937eb8778,
rb=0x556937e8e760, xid=590, commit_lsn=23752880, end_lsn=23752928,
commit_time=672204445820756,
origin_id=0, origin_lsn=0) at reorderbuffer.c:2589
#13 0x556936c41239 in ReorderBufferCommit (rb=0x556937e8e760,
xid=590, commit_lsn=23752880, end_lsn=23752928,
commit_time=672204445820756, origin_id=0, origin_lsn=0)
at reorderbuffer.c:2613
#14 0x556936c2f4d9 in DecodeCommit (ctx=0x556937e8c750,
buf=0x7ffcbb23d610, parsed=0x7ffcbb23d4b0, xid=590, two_phase=false)
at decode.c:744
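
Spelling the reproduction out as SQL (a sketch; it assumes
synchronous_standby_names points at the logical replication subscriber and
test1 is part of the publication):

BEGIN;
LOCK pg_class;
INSERT INTO test1 VALUES (10);
COMMIT;  -- hangs: the walsender decoding this commit blocks on pg_class,
         -- while the commit itself waits for the synchronous standby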

Thoughts?

Regards,
Vignesh




Re: pg_amcheck option to install extension

2021-04-19 Thread Michael Paquier
On Mon, Apr 19, 2021 at 08:39:06PM -0700, Mark Dilger wrote:
> This is a classic privilege escalation attack.  Bob has one
> privilege, and uses it to get another.

Bob is a superuser, so it has all the privileges of the world for this
instance.  In what is that different from BASE_BACKUP or just COPY
FROM PROGRAM?

I am not following your argument here.
--
Michael




Re: pg_amcheck option to install extension

2021-04-19 Thread Mark Dilger



> On Apr 19, 2021, at 8:06 PM, Michael Paquier  wrote:
> 
> On Mon, Apr 19, 2021 at 07:15:23PM -0700, Mark Dilger wrote:
>> There is another issue to consider.  Installing pg_amcheck in no way
>> opens up an avenue of attack that I can see.  It is just a client
>> application with no special privileges.  But installing amcheck
>> arguably opens a line of attack; not one as significant as
>> installing pageinspect, but of the same sort.  Amcheck allows
>> privileged database users to potentially get information from the
>> tables that would otherwise be invisible even to them according to
>> mvcc rules.  (Is this already the case via some other functionality?
>> Maybe this security problem already exists?)  If the privileged
>> database user has file system access, then this is not at all
>> concerning, since they can already just open the files in a tool of
>> their choice, but I don't see any reason why installations should
>> require that privileged database users also be privileged to access
>> the file system.
> 
> By default, any functions deployed with amcheck have their execution
> rights revoked from public, meaning that only a superuser can run them
> with a default installation.  A non-superuser could execute them only
> once GRANT'd access to them.

Correct.  So having amcheck installed on the system provides the database 
superuser with a privilege escalation attack vector.  I am assuming here the 
database superuser is not a privileged system user, and can only log in 
remotely, has no direct access to the file system, etc.

Alice has a database with sensitive data.  She hires Bob to be her new database 
admin, with superuser privilege, but doesn't want Bob to see the sensitive 
data, so she deletes it first.  The data is dead but still on disk.

Bob discovers a bug in postgres that will corrupt dead rows that some index is 
still pointing at.  This attack requires sufficient privilege to trigger the 
bug, but presumably he has that much privilege, because he is a database 
superuser.  Let's call this attack C(x) where "C" means the corruption inducing 
function, and "x" is the indexed key for which dead rows will be corrupted.

Bob runs "CREATE EXTENSION amcheck", and then successively runs C(x) followed 
by amcheck for each interesting value of x.  Bob learns which of these values 
were in the system before Alice deleted them.

This is a classic privilege escalation attack.  Bob has one privilege, and uses 
it to get another.

Alice might foresee this behavior from Bob and choose not to install 
contrib/amcheck.  This is paranoid on her part, but does not cross the line 
into insanity.

The postgres community has every reason to keep amcheck in contrib so that 
users such as Alice can make this decision.

No similar argument has been put forward for why pg_amcheck should be kept in 
contrib.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: Replication slot stats misgivings

2021-04-19 Thread Masahiko Sawada
On Mon, Apr 19, 2021 at 4:48 PM Masahiko Sawada  wrote:
>
> On Mon, Apr 19, 2021 at 2:14 PM Amit Kapila  wrote:
> >
> > On Mon, Apr 19, 2021 at 9:00 AM Masahiko Sawada  
> > wrote:
> > >
> > > On Fri, Apr 16, 2021 at 2:58 PM Amit Kapila  
> > > wrote:
> > > >
> > > >
> > > > 4.
> > > > +CREATE VIEW pg_stat_replication_slots AS
> > > > +SELECT
> > > > +s.slot_name,
> > > > +s.spill_txns,
> > > > +s.spill_count,
> > > > +s.spill_bytes,
> > > > +s.stream_txns,
> > > > +s.stream_count,
> > > > +s.stream_bytes,
> > > > +s.total_txns,
> > > > +s.total_bytes,
> > > > +s.stats_reset
> > > > +FROM pg_replication_slots as r,
> > > > +LATERAL pg_stat_get_replication_slot(slot_name) as s
> > > > +WHERE r.datoid IS NOT NULL; -- excluding physical slots
> > > > ..
> > > > ..
> > > >
> > > > -/* Get the statistics for the replication slots */
> > > > +/* Get the statistics for the replication slot */
> > > >  Datum
> > > > -pg_stat_get_replication_slots(PG_FUNCTION_ARGS)
> > > > +pg_stat_get_replication_slot(PG_FUNCTION_ARGS)
> > > >  {
> > > >  #define PG_STAT_GET_REPLICATION_SLOT_COLS 10
> > > > - ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
> > > > + text *slotname_text = PG_GETARG_TEXT_P(0);
> > > > + NameData slotname;
> > > >
> > > > I think with the above changes getting all the slot stats has become
> > > > much costlier. Is there any reason why can't we get all the stats from
> > > > the new hash_table in one shot and return them to the user?
> > >
> > > I think the advantage of this approach would be that it can avoid
> > > showing the stats for already-dropped slots. Like other statistics
> > > views such as pg_stat_all_tables and pg_stat_all_functions, searching
> > > the stats by the name got from pg_replication_slots can show only
> > > available slot stats even if the hash table has garbage slot stats.
> > >
> >
> > Sounds reasonable. However, if the create_slot message is missed, it
> > will show an empty row for it. See below:
> >
> > postgres=# select slot_name, total_txns from pg_stat_replication_slots;
> >  slot_name | total_txns
> > ---+
> >  s1|  0
> >  s2|  0
> >|
> > (3 rows)
> >
> > Here, I have manually via debugger skipped sending the create_slot
> > message for the third slot and we are showing an empty for it. This
> > won't happen for pg_stat_all_tables, as it will set 0 or other initial
> > values in such a case. I think we need to address this case.
>
> Good catch. I think it's better to set 0 to all counters and NULL to
> reset_stats.
>
> >
> > > Given that pg_stat_replication_slots doesn’t show garbage slot stats
> > > even if it has, I thought we can avoid checking those garbage stats
> > > frequently. It should not essentially be a problem for the hash table
> > > to have entries up to max_replication_slots regardless of live or
> > > already-dropped.
> > >
> >
> > Yeah, but I guess we still might not save much by not doing it,
> > especially because for the other cases like tables/functions, we are
> > doing it without any threshold limit.
>
> Agreed.
>
> I've attached the updated patch, please review it.

I've attached a new version of the patch that fixes the compilation error
reported off-line by Amit.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/


v9-0001-Use-HTAB-for-replication-slot-statistics.patch
Description: Binary data


Re: 2 questions about volatile attribute of pg_proc.

2021-04-19 Thread Pavel Stehule
út 20. 4. 2021 v 5:16 odesílatel Andy Fan  napsal:

>
>
> On Tue, Apr 20, 2021 at 10:57 AM Pavel Stehule 
> wrote:
>
>>
>>
>> út 20. 4. 2021 v 4:47 odesílatel Andy Fan 
>> napsal:
>>
>>>
>>>
>>> > - a PL/PGSQL function's meaning depends on the search path in effect
>>> when it is called, unless it has a SET search_path clause or it fully
>>> qualifies all object references, so it isn't actually possible in general
>>> to determine what a function calls at definition time
>>>
>>>
>>> I'd think this one as a blocker issue at the beginning since I have to
>>> insist on
>>> any new features should not cause semantic changes for existing ones.
>>> Later I
>>> found the new definition. As for this feature request, I think we can
>>> define the
>>> features like this:
>>>
>>> 1. We define a new attribute named VOLATILE_AUTO;  The semantic is PG
>>> will auto
>>>detect the volatile info based on current search_path / existing
>>>function. If any embedded function can't be found, we can raise an
>>> error if
>>>VOLATILE_AUTO is used. If people change the volatile attribute later,
>>> we can:
>>>a). do nothing. This can be the documented feature. or. b). Maintain
>>> the
>>>dependency tree between functions and if anyone is changed, other
>>> functions
>>>should be recalculated as well.
>>>
>>> 2. VOLATILE_AUTO should never be the default value. It only works when
>>> people
>>>requires it.
>>>
>>> Then what we can get from this?  Thinking a user is migrating lots of
>>> UDF from
>>> other databases.  Asking them to check/set each function's attribute
>>> might
>>> be bad. However if we tell them about how VOLATILE_AUTO works, and they
>>> accept it (I guess most people would accept), then the migration would be
>>> pretty productive.
>>>
>>> I'm listening to any obvious reason to reject it.
>>>
>>
>> a) This analyses can be very slow - PLpgSQL does lazy planning - query
>> plans are planned only when are required - and this feature requires
>> complete planning current function and all nested VOLATILE_AUTO functions -
>> so start of function can be significantly slower
>>
>
> Actually I am thinking  we can do this when we compile the function, which
> means that would
> happen on the "CREATE FUNCTION " stage.   this would need some hacks for
> sure.  Does
> this remove your concern?
>

you cannot do it - with this you introduce a strong dependency on nested
objects, and that means a lot of problems: the necessity of rechecks whenever
any nested object is changed.  There will be new dependency problems when you
create functions, and until we have global temporary tables, it is a blocker
for the usage of temporary tables.  The current behavior is not perfect, but
in this direction it is very practical, and I would not change it.  It would
be nice if some functionality of plpgsql_check could be in core, because I
think it is necessary for development, but the existing structure and
integration of SQL in PLpgSQL is very good (and very practical).


>
>> b) When you migrate from Oracle,then you can use the STABLE flag, and it
>> will be mostly correct.
>>
>
> This was suggested in our team as well, but I don't think it is very
> strict.  For example:
> SELECT materialize_bills_for(userId) from users;  Any more proof to say
> "STABLE" flag
> is acceptable?
>

Oracle doesn't allow write operations in functions - or didn't allow them; I
am not sure what is possible now.  So when you migrate from Oracle, if a
function is not marked as DETERMINISTIC, you can safely mark it as STABLE.
Ora2pg does this.  Elsewhere it works 99% well.  In special cases there is
some black magic - with fresh snapshots and with the use of autonomous
transactions - and these cases should be solved manually.  Sometimes just
removing the autonomous transactions is good enough; sometimes a complete
rewrite, or a redesign of the functionality, is necessary.







>
>
>> --
>>> Best Regards
>>> Andy Fan (https://www.aliyun.com/)
>>>
>>
>
> --
> Best Regards
> Andy Fan (https://www.aliyun.com/)
>


RE: Truncate in synchronous logical replication failed

2021-04-19 Thread osumi.takami...@fujitsu.com
On  Tuesday, April 20, 2021 10:53 AM Ajin Cherian  wrote:
> On Sat, Apr 17, 2021 at 2:04 PM osumi.takami...@fujitsu.com
>     > wrote:
> 
>   No problem. Thank you for updating the patch.
>   I've conducted some cosmetic changes. Could you please check
> this ?
>   That's already applied by pgindent.
> 
>   I executed RT for this and made no failure.
>   Just in case, I executed 010_truncate.pl 
> test 100 times in a tight loop,
>   which also didn't fail.
> 
> I reviewed the patch, ran make check, no issues. One minor comment:
> 
> Could you add the comment similar to RelationGetIndexAttrBitmap() on why
> the redo, it's not very obvious to someone reading the code, why we are
> refetching the index list here.
> 
> + /* Check if we need to redo */
> 
> + newindexoidlist = RelationGetIndexList(relation);
Yeah, makes sense. Fixed.
Its indentation seems a bit weird, but it came from pgindent.
Thank you for your review!


Best Regards,
Takamichi Osumi



truncate_in_synchronous_logical_replication_v04.patch
Description: truncate_in_synchronous_logical_replication_v04.patch


Re: 2 questions about volatile attribute of pg_proc.

2021-04-19 Thread Andy Fan
On Tue, Apr 20, 2021 at 10:57 AM Pavel Stehule 
wrote:

>
>
> út 20. 4. 2021 v 4:47 odesílatel Andy Fan 
> napsal:
>
>>
>>
>> > - a PL/PGSQL function's meaning depends on the search path in effect
>> when it is called, unless it has a SET search_path clause or it fully
>> qualifies all object references, so it isn't actually possible in general
>> to determine what a function calls at definition time
>>
>>
>> I'd think this one as a blocker issue at the beginning since I have to
>> insist on
>> any new features should not cause semantic changes for existing ones.
>> Later I
>> found the new definition. As for this feature request, I think we can
>> define the
>> features like this:
>>
>> 1. We define a new attribute named VOLATILE_AUTO;  The semantic is PG
>> will auto
>>detect the volatile info based on current search_path / existing
>>function. If any embedded function can't be found, we can raise an
>> error if
>>VOLATILE_AUTO is used. If people change the volatile attribute later,
>> we can:
>>a). do nothing. This can be the documented feature. or. b). Maintain
>> the
>>dependency tree between functions and if anyone is changed, other
>> functions
>>should be recalculated as well.
>>
>> 2. VOLATILE_AUTO should never be the default value. It only works when
>> people
>>requires it.
>>
>> Then what we can get from this?  Thinking a user is migrating lots of UDF
>> from
>> other databases.  Asking them to check/set each function's attribute might
>> be bad. However if we tell them about how VOLATILE_AUTO works, and they
>> accept it (I guess most people would accept), then the migration would be
>> pretty productive.
>>
>> I'm listening to any obvious reason to reject it.
>>
>
> a) This analyses can be very slow - PLpgSQL does lazy planning - query
> plans are planned only when are required - and this feature requires
> complete planning current function and all nested VOLATILE_AUTO functions -
> so start of function can be significantly slower
>

Actually I am thinking we can do this when we compile the function, which
means it would happen at the "CREATE FUNCTION" stage.  This would need some
hacks for sure.  Does this remove your concern?


> b) When you migrate from Oracle,then you can use the STABLE flag, and it
> will be mostly correct.
>

This was suggested in our team as well, but I don't think it is very
strict.  For example:
SELECT materialize_bills_for(userId) from users;  Is there any more proof that
the "STABLE" flag is acceptable here?



> --
>> Best Regards
>> Andy Fan (https://www.aliyun.com/)
>>
>

-- 
Best Regards
Andy Fan (https://www.aliyun.com/)


Re: Bogus collation version recording in recordMultipleDependencies

2021-04-19 Thread Thomas Munro
On Tue, Apr 20, 2021 at 1:48 PM Julien Rouhaud  wrote:
> On Tue, Apr 20, 2021 at 12:05:27PM +1200, Thomas Munro wrote:
> > Yeah, that runs directly into non-trivial locking problems.  I felt
> > like some of the other complaints could conceivably be addressed in
> > time, including dumb stuff like Windows default locale string format
> > and hopefully some expression analysis problems, but not this.  I'll
> > hold off reverting for a few more days to see if anyone has any other
> > thoughts on that, because there doesn't seem to be any advantage in
> > being too hasty about it.
>
> I also feel that the ALTER TYPE example Tom showed earlier isn't something
> trivial to fix and cannot be done in pg14 :(

Just an idea:  It might be possible to come up with a scheme where
ALTER TYPE ADD ATTRIBUTE records versions somewhere at column add
time, and index_check_collation_versions() finds and checks those when
they aren't superseded by index->collation versions created by
REINDEX, or already present due to other dependencies on the same
collation.  Of course, the opposite problem applies when you ALTER
TYPE DROP ATTRIBUTE: you might have some zombie refobjversions you
don't need anymore, but that would seem to be the least of your
worries if you drop attributes from composite types used in indexes:

create type myrow as (f1 int, f2 int);
create table mytable (r1 myrow primary key);
insert into mytable
select row(generate_series(1, 10), generate_series(10, 1, -1))::myrow;
select * from mytable;
alter type myrow drop attribute f1;
select * from mytable;
select * from mytable where r1 = row(6); -- !!!
reindex table mytable;
select * from mytable where r1 = row(6);




Re: pg_amcheck option to install extension

2021-04-19 Thread Michael Paquier
On Mon, Apr 19, 2021 at 07:15:23PM -0700, Mark Dilger wrote:
> There is another issue to consider.  Installing pg_amcheck in no way
> opens up an avenue of attack that I can see.  It is just a client
> application with no special privileges.  But installing amcheck
> arguably opens a line of attack; not one as significant as
> installing pageinspect, but of the same sort.  Amcheck allows
> privileged database users to potentially get information from the
> tables that would otherwise be invisible even to them according to
> mvcc rules.  (Is this already the case via some other functionality?
> Maybe this security problem already exists?)  If the privileged
> database user has file system access, then this is not at all
> concerning, since they can already just open the files in a tool of
> their choice, but I don't see any reason why installations should
> require that privileged database users also be privileged to access
> the file system.

By default, any functions deployed with amcheck have their execution
rights revoked from public, meaning that only a superuser can run them
with a default installation.  A non-superuser could execute them only
once GRANT'd access to them.
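
For example, something like this is what it takes before a non-superuser can
run the checks (the role name is made up):

CREATE EXTENSION amcheck;
-- EXECUTE is revoked from PUBLIC at install time, so grant it explicitly
GRANT EXECUTE ON FUNCTION bt_index_check(regclass) TO monitoring_role;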
--
Michael




Re: Forget close an open relation in ReorderBufferProcessTXN()

2021-04-19 Thread Amit Langote
On Sat, Apr 17, 2021 at 1:30 PM Amit Kapila  wrote:
> On Fri, Apr 16, 2021 at 11:24 PM Andres Freund  wrote:> > 
> This made me take a brief look at pgoutput.c - maybe I am missing
> > something, but how is the following not a memory leak?
> >
> > static void
> > maybe_send_schema(LogicalDecodingContext *ctx,
> >   ReorderBufferTXN *txn, ReorderBufferChange *change,
> >   Relation relation, RelationSyncEntry *relentry)
> > {
> > ...
> > /* Map must live as long as the session does. */
> > oldctx = MemoryContextSwitchTo(CacheMemoryContext);
> > relentry->map = convert_tuples_by_name(CreateTupleDescCopy(indesc),
> >
> > CreateTupleDescCopy(outdesc));
> > MemoryContextSwitchTo(oldctx);
> > send_relation_and_attrs(ancestor, xid, ctx);
> > RelationClose(ancestor);
> >
> > If - and that's common - convert_tuples_by_name() won't have to do
> > anything, the copied tuple descs will be permanently leaked.
> >
>
> I also think this is a permanent leak. I think we need to free all the
> memory associated with this map on the invalidation of this particular
> relsync entry (basically in rel_sync_cache_relation_cb).

I agree there's a problem here.

Back in:

https://www.postgresql.org/message-id/CA%2BHiwqEeU19iQgjN6HF1HTPU0L5%2BJxyS5CmxaOVGNXBAfUY06Q%40mail.gmail.com

I had proposed to move the map creation from maybe_send_schema() to
get_rel_sync_entry(), mainly because the latter is where I realized it
belongs, though a bit too late.   Attached is the part of the patch
for this particular issue.  It also takes care to release the copied
TupleDescs if the map is found to be unnecessary, thus preventing
leaking into CacheMemoryContext.

--
Amit Langote
EDB: http://www.enterprisedb.com


pgoutput-map-tupdesc-leak.patch
Description: Binary data


Re: Bogus collation version recording in recordMultipleDependencies

2021-04-19 Thread Julien Rouhaud
On Mon, Apr 19, 2021 at 07:27:24PM -0700, Peter Geoghegan wrote:
> On Mon, Apr 19, 2021 at 6:45 PM Julien Rouhaud  wrote:
> > > This argument seems completely absurd to me.
> >
> > I'm not sure why?  For glibc at least, I don't see how we could not end up
> > raising false positive as you have a single glibc version for all its
> > collations.  If a user has say en_US and fr_FR, or any quite stable 
> > collation,
> > most of the glibc upgrades (except 2.28 of course) won't corrupt your 
> > indexes.
> 
> If the versions differ and your index happens to not be corrupt
> because it just so happened to not depend on any of the rules that
> have changed, then a complaint about the collation versions changing
> is not what I'd call a false positive. You can call it that if you
> want, I suppose -- it's just a question of semantics. But I don't
> think you should conflate two very different things. You seem to be
> suggesting that they're equivalent just because you can refer to both
> of them using the same term.
> 
> It's obvious that you could have an absence of index corruption even
> in the presence of a collation incompatibility. Especially when there
> is only 1 tuple in the index, say 

Yes, and technically you could still have corruption on indexes containing 1 or
even 0 rows in case of collation provider upgrade, eg if you have a WHERE
clause on the index that does depend on a collation.

> -- obviously the core idea is to
> manage the dependency on versioned collations, which isn't magic. Do
> you really think that's equivalent to having incorrect version
> dependencies?

No, I don't think that's equivalent.  What I wanted to say is that it's
impossible to raise a WARNING only when the index can really be corrupted
(corner cases like empty tables aside), for instance because of how glibc
reports versions.  So raising a WARNING in some limited corner cases that
definitely can't be corrupted (for example because the index expression
itself doesn't depend on the ordering), which clearly isn't the same thing,
was in my opinion an acceptable trade-off in a first version.  Sorry if that
was (or still is) poorly worded.

In any case it was proven that the current approach has way bigger deficiencies
so it's probably not relevant anymore.




Re: Reduce the number of special cases to build contrib modules on windows

2021-04-19 Thread David Rowley
On Tue, 20 Apr 2021 at 09:28, Andrew Dunstan  wrote:
>
>
> On 4/19/21 12:24 PM, Alvaro Herrera wrote:
> >> diff --git a/src/tools/msvc/MSBuildProject.pm 
> >> b/src/tools/msvc/MSBuildProject.pm
> >> index ebb169e201..68606a296d 100644
> >> --- a/src/tools/msvc/MSBuildProject.pm
> >> +++ b/src/tools/msvc/MSBuildProject.pm
> >> @@ -310,11 +310,12 @@ sub WriteItemDefinitionGroup
> >>  my $targetmachine =
> >>$self->{platform} eq 'Win32' ? 'MachineX86' : 'MachineX64';
> >>
> >> -my $includes = $self->{includes};
> >> -unless ($includes eq '' or $includes =~ /;$/)
> >> +my $includes = "";
> >> +foreach my $inc (@{ $self->{includes} })
> >>  {
> >> -$includes .= ';';
> >> +$includes .= $inc . ";";
> >>  }
> > Perl note: you can do this more easily as
> >
> >   my $includes = join ';', @{$self->{includes}};
> >   $includes .= ';' unless $includes eq '';
> >
>
> or even more simply:
>
>
> my $includes = join ';', @{$self->{includes}}, "";

Both look more compact. Thanks. I'll include this for the next version.

David




Re: amcheck eating most of the runtime with wal_consistency_checking

2021-04-19 Thread Peter Geoghegan
On Mon, Apr 19, 2021 at 7:50 PM Michael Paquier  wrote:
> While testing wal_consistency_checking, I have noticed that by far
> most of the runtime is spent within the regression test check_btree on
> the series of three queries inserting each 100k tuples.  This also
> eats most of the run time of the test on HEAD.  Could we for example
> consider inserting less tuples with a lower fillfactor to reduce the
> runtime of the test without impacting its coverage in a meaningful
> way?

I don't see much point. wal_consistency_checking is intrinsically a
tool that increases the volume of WAL by a large multiple. Plus you
yourself only run it once a year.

I run it much more often than once a year (maybe once every 2 - 3
months), but I haven't noticed this at all.

-- 
Peter Geoghegan




Re: 2 questions about volatile attribute of pg_proc.

2021-04-19 Thread Pavel Stehule
út 20. 4. 2021 v 4:47 odesílatel Andy Fan  napsal:

>
>
> > - a PL/PGSQL function's meaning depends on the search path in effect
> when it is called, unless it has a SET search_path clause or it fully
> qualifies all object references, so it isn't actually possible in general
> to determine what a function calls at definition time
>
>
> I'd think this one as a blocker issue at the beginning since I have to
> insist on
> any new features should not cause semantic changes for existing ones.
> Later I
> found the new definition. As for this feature request, I think we can
> define the
> features like this:
>
> 1. We define a new attribute named VOLATILE_AUTO;  The semantic is PG will
> auto
>detect the volatile info based on current search_path / existing
>function. If any embedded function can't be found, we can raise an
> error if
>VOLATILE_AUTO is used. If people change the volatile attribute later,
> we can:
>a). do nothing. This can be the documented feature. or. b). Maintain the
>dependency tree between functions and if anyone is changed, other
> functions
>should be recalculated as well.
>
> 2. VOLATILE_AUTO should never be the default value. It only works when
> people
>requires it.
>
> Then what we can get from this?  Thinking a user is migrating lots of UDF
> from
> other databases.  Asking them to check/set each function's attribute might
> be bad. However if we tell them about how VOLATILE_AUTO works, and they
> accept it (I guess most people would accept), then the migration would be
> pretty productive.
>
> I'm listening to any obvious reason to reject it.
>

a) This analysis can be very slow - PL/pgSQL does lazy planning - query
plans are built only when they are required - and this feature would require
completely planning the current function and all nested VOLATILE_AUTO
functions, so the start of a function could be significantly slower

b) When you migrate from Oracle, you can use the STABLE flag, and it will
be mostly correct.

Regards

Pavel



> --
> Best Regards
> Andy Fan (https://www.aliyun.com/)
>


Re: 2 questions about volatile attribute of pg_proc.

2021-04-19 Thread Andy Fan
>
> I'm listening to any obvious reason to reject it.
>

I'm asking for any obvious reason to reject it because, if it would be a
losing battle for sure, I would not waste time on it.  Or vote up if you
think it is possible and useful.

-- 
Best Regards
Andy Fan (https://www.aliyun.com/)


amcheck eating most of the runtime with wal_consistency_checking

2021-04-19 Thread Michael Paquier
Hi Peter,

While testing wal_consistency_checking, I have noticed that by far the
most runtime is spent within the regression test check_btree, on the
series of three queries each inserting 100k tuples.  This also eats most
of the run time of the test on HEAD.  Could we, for example, consider
inserting fewer tuples with a lower fillfactor to reduce the runtime of
the test without impacting its coverage in a meaningful way?

Thanks,
--
Michael




Re: 2 questions about volatile attribute of pg_proc.

2021-04-19 Thread Andy Fan
> - a PL/PGSQL function's meaning depends on the search path in effect when
> it is called, unless it has a SET search_path clause or it fully qualifies
> all object references, so it isn't actually possible in general to
> determine what a function calls at definition time


I thought of this as a blocker issue at the beginning, since I have to
insist that any new feature should not cause semantic changes for existing
ones.  Later I found the new definition.  As for this feature request, I
think we can define the feature like this:

1. We define a new attribute named VOLATILE_AUTO.  The semantics are that
   PG will auto-detect the volatility info based on the current search_path
   and the existing functions.  If any embedded function can't be found, we
   can raise an error when VOLATILE_AUTO is used.  If people change the
   volatile attribute of a function later, we can: a) do nothing - this can
   be the documented behavior; or b) maintain the dependency tree between
   functions, and if any one is changed, the other functions should be
   recalculated as well.

2. VOLATILE_AUTO should never be the default value.  It only applies when
   people request it.

Then what can we get from this?  Think of a user migrating lots of UDFs
from other databases.  Asking them to check and set each function's
attribute might be bad.  However, if we tell them how VOLATILE_AUTO works
and they accept it (I guess most people would), then the migration would be
pretty productive.

I'm listening to any obvious reason to reject it.

-- 
Best Regards
Andy Fan (https://www.aliyun.com/)


HEAD looks clean with wal_consistency_checking = all

2021-04-19 Thread Michael Paquier
Hi all,

Like every year, I have done some tests with wal_consistency_checking
to see if any inconsistencies have been introduced in WAL replay.  And
the good news is that I have noticed nothing to worry about.

Thanks,
--
Michael




Re: Bogus collation version recording in recordMultipleDependencies

2021-04-19 Thread Peter Geoghegan
On Mon, Apr 19, 2021 at 6:45 PM Julien Rouhaud  wrote:
> > This argument seems completely absurd to me.
>
> I'm not sure why?  For glibc at least, I don't see how we could not end up
> raising false positive as you have a single glibc version for all its
> collations.  If a user has say en_US and fr_FR, or any quite stable collation,
> most of the glibc upgrades (except 2.28 of course) won't corrupt your indexes.

If the versions differ and your index happens to not be corrupt
because it just so happened to not depend on any of the rules that
have changed, then a complaint about the collation versions changing
is not what I'd call a false positive. You can call it that if you
want, I suppose -- it's just a question of semantics. But I don't
think you should conflate two very different things. You seem to be
suggesting that they're equivalent just because you can refer to both
of them using the same term.

It's obvious that you could have an absence of index corruption even
in the presence of a collation incompatibility. Especially when there
is only 1 tuple in the index, say -- obviously the core idea is to
manage the dependency on versioned collations, which isn't magic. Do
you really think that's equivalent to having incorrect version
dependencies?

-- 
Peter Geoghegan




Re: pg_amcheck option to install extension

2021-04-19 Thread Mark Dilger



> On Apr 19, 2021, at 6:41 PM, Michael Paquier  wrote:
> 
> On Mon, Apr 19, 2021 at 12:53:29PM -0400, Tom Lane wrote:
>> FWIW, I think that putting them both in contrib makes the most
>> sense from a structural standpoint.
>> 
>> Either way, though, you'll still need the proposed option to
>> let the executable issue a CREATE EXTENSION to get the shlib
>> loaded.  Unless somebody is proposing that the extension be
>> installed-by-default like plpgsql, and that I am unequivocally
>> not for.
> 
> Agreed.  Something like src/extensions/ would be a tempting option,
> but I don't think that it is a good idea to introduce a new piece of
> infrastructure at this stage, so moving both to contrib/ would be the
> best balance with the current pieces at hand.

There is another issue to consider.  Installing pg_amcheck in no way opens up 
an avenue of attack that I can see.  It is just a client application with no 
special privileges.  But installing amcheck arguably opens a line of attack; 
not one as significant as installing pageinspect, but of the same sort.  
Amcheck allows privileged database users to potentially get information from 
the tables that would otherwise be invisible even to them according to mvcc 
rules.  (Is this already the case via some other functionality?  Maybe this 
security problem already exists?)  If the privileged database user has file 
system access, then this is not at all concerning, since they can already just 
open the files in a tool of their choice, but I don't see any reason why 
installations should require that privileged database users also be privileged 
to access the file system.

If you are not buying my argument here, perhaps a reference to how encryption 
functions are evaluated might help you see my point of view.  You don't ask, 
"can the attacker recover the plain text from the encrypted text", but rather, 
"can the attacker tell the difference between encrypted plain text and 
encrypted random noise."  That's because it is incredibly hard to reason about 
what an attacker might be able to learn.  Even though learning about old data 
using amcheck would be hard, you can't say that an attacker would never be able 
to recover information about deleted rows.  As such, security conscious 
installations are within reason to refuse to install it.

Since amcheck (and to a much larger extent, pageinspect) open potential data 
leakage issues, it makes sense for some security conscious sites to refuse to 
install it.  pg_amcheck on the other hand could be installed everywhere.  I 
understand why it might *feel* like pg_amcheck and amcheck have to both be 
installed to work, but I don't think that point of view makes much sense in 
reality.  The computer running the client and the computer running the server 
are frequently not the same computer.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: Can a child process detect postmaster death when in pg_usleep?

2021-04-19 Thread Bharath Rupireddy
On Thu, Apr 15, 2021 at 11:48 AM Bharath Rupireddy
 wrote:
> > We definitely have replaced a lot of sleeps with latch.c primitives
> > over the past few years, since we got WL_EXIT_ON_PM_DEATH and
> > condition variables.  There may be many more to improve...  You
> > mentioned autovacuum... yeah, Stephen fixed one of these with commit
> > 4753ef37, but yeah it's not great to have those others in there...
>
> I have not looked at the commit 4753ef37 previously, but it
> essentially addresses the problem with pg_usleep for vacuum delay. I'm
> thinking we can also replace pg_usleep in below places based on the
> fact that pg_usleep should be avoided in 1) short waits in a loop 2)
> when wait time is dependent on user configurable parameters. And using
> WaitLatch may require us to add wait event types to WaitEventTimeout
> enum, but that's okay.

I'm attaching 3 patches that replace pg_usleep with WaitLatch: 0001 in
lazy_truncate_heap, 0002 in do_pg_stop_backup and 0003 for Pre and
Post Auth Delay. Regression tests pass with these patches. Please
review them.
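
For the PostAuthDelay part of 0003 (not shown below), the change would
presumably follow the same pattern as 0001; here is only a rough sketch,
assuming a new WAIT_EVENT_POST_AUTH_DELAY value is added to the
WaitEventTimeout enum:

    if (PostAuthDelay > 0)
        (void) WaitLatch(MyLatch,
                         WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
                         PostAuthDelay * 1000L,    /* GUC is in seconds */
                         WAIT_EVENT_POST_AUTH_DELAY);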

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
From 4cd54df7e27efb5dd82751fd8c7d15c475ce4a40 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy 
Date: Mon, 19 Apr 2021 19:25:02 +0530
Subject: [PATCH v1] Use a WaitLatch for lock waiting in lazy_truncate_heap

Instead of using pg_usleep() in lazy_truncate_heap(), use a
WaitLatch. This has the advantage that we will realize if the
postmaster has been killed since the last time we decided to
sleep.
---
 doc/src/sgml/monitoring.sgml| 5 +
 src/backend/access/heap/vacuumlazy.c| 6 +-
 src/backend/utils/activity/wait_event.c | 3 +++
 src/include/utils/wait_event.h  | 3 ++-
 4 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 5cf083bb77..65d51ce445 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -2249,6 +2249,11 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   11:34   0:00 postgres: ser
   VacuumDelay
   Waiting in a cost-based vacuum delay point.
  
+ 
+  VacuumTruncateLock
+  Waiting to acquire exclusive lock on relation for truncation while
+  vacuuming.
+ 
 

   
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index e90fc18aa9..dcd598fca8 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -3188,7 +3188,11 @@ lazy_truncate_heap(LVRelState *vacrel)
 return;
 			}
 
-			pg_usleep(VACUUM_TRUNCATE_LOCK_WAIT_INTERVAL * 1000L);
+			(void) WaitLatch(MyLatch,
+			 WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
+			 VACUUM_TRUNCATE_LOCK_WAIT_INTERVAL,
+			 WAIT_EVENT_VACUUM_TRUNCATE_LOCK);
+			ResetLatch(MyLatch);
 		}
 
 		/*
diff --git a/src/backend/utils/activity/wait_event.c b/src/backend/utils/activity/wait_event.c
index 89b5b8b7b9..694cd48315 100644
--- a/src/backend/utils/activity/wait_event.c
+++ b/src/backend/utils/activity/wait_event.c
@@ -485,6 +485,9 @@ pgstat_get_wait_timeout(WaitEventTimeout w)
 		case WAIT_EVENT_VACUUM_DELAY:
 			event_name = "VacuumDelay";
 			break;
+		case WAIT_EVENT_VACUUM_TRUNCATE_LOCK:
+			event_name = "VacuumTruncateLock";
+			break;
 			/* no default case, so that compiler will warn */
 	}
 
diff --git a/src/include/utils/wait_event.h b/src/include/utils/wait_event.h
index 47accc5ffe..60b8ca76f7 100644
--- a/src/include/utils/wait_event.h
+++ b/src/include/utils/wait_event.h
@@ -140,7 +140,8 @@ typedef enum
 	WAIT_EVENT_PG_SLEEP,
 	WAIT_EVENT_RECOVERY_APPLY_DELAY,
 	WAIT_EVENT_RECOVERY_RETRIEVE_RETRY_INTERVAL,
-	WAIT_EVENT_VACUUM_DELAY
+	WAIT_EVENT_VACUUM_DELAY,
+	WAIT_EVENT_VACUUM_TRUNCATE_LOCK
 } WaitEventTimeout;
 
 /* --
-- 
2.25.1

From cf38b83a8c224bce7db730fe3a18ec897690f8ae Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy 
Date: Tue, 20 Apr 2021 06:18:15 +0530
Subject: [PATCH v1] Use a WaitLatch in do_pg_stop_backup

Instead of using pg_usleep() in do_pg_stop_backup(), use a
WaitLatch. This has the advantage that we will realize if the
postmaster has been killed since the last time we decided to
sleep.
---
 src/backend/access/transam/xlog.c | 15 ---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index adfc6f67e2..a632a46a57 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -11605,6 +11605,8 @@ do_pg_stop_backup(char *labelfile, bool waitforarchive, TimeLineID *stoptli_p)
 		((!backup_started_in_recovery && XLogArchivingActive()) ||
 		 (backup_started_in_recovery && XLogArchivingAlways(
 	{
+		Latch *latch;
+
 		XLByteToPrevSeg(stoppoint, _logSegNo, wal_segment_size);
 		XLogFileName(lastxlogfilename, stoptli, _logSegNo, wal_segment_size);
 
@@ -11615,6 +11617,11 @@ do_pg_stop_backup(char *labelfile, bool wai

Re: Performance degradation of REFRESH MATERIALIZED VIEW

2021-04-19 Thread Bharath Rupireddy
On Mon, Apr 19, 2021 at 7:21 PM Masahiko Sawada  wrote:
> I’ve updated the patch including the above comment.

Thanks for the patch.

I was trying to understand below statements:
+ * we check without a buffer lock if the page is empty but the
+ * caller doesn't need to recheck that since we assume that in
+ * HEAP_INSERT_FROZEN case, only one process is inserting a
+ * frozen tuple into this relation.
+ *

And earlier comments from upthread:

>> AFAICS the page is always empty when RelationGetBufferForTuple
>> returned a valid vmbuffer.  So the "if" should be an "assert" instead.

> There is a chance that RelationGetBufferForTuple() returns a valid
> vmbuffer but the page is not empty, since RelationGetBufferForTuple()
> checks without a lock if the page is empty. But when it comes to
> HEAP_INSERT_FROZEN cases it actually doesn’t happen at least for now
> since only one process inserts tuples into the relation. Will fix."

I'm not sure whether it is safe to assume "at least for now since only
one process inserts tuples into the relation". What if we allow
parallel inserts for HEAP_INSERT_FROZEN cases, I don't know whether we
can do that or not. Correct me if I'm wrong.

While we are modifying something in heap_insert:
1) Can we adjust the comment below in heap_insert to the 80char limit?
  * If we're inserting frozen entry into an empty page,
  * set visibility map bits and PageAllVisible() hint.
2) I'm wondering whether we can do page = BufferGetPage(buffer); right
after RelationGetBufferForTuple and use it in all the places where
BufferGetPage(buffer) is currently called, such as
if (PageIsAllVisible(BufferGetPage(buffer))) and
PageClearAllVisible(BufferGetPage(buffer)); we could then even remove
the local variable page inside the if (RelationNeedsWAL(relation)) block.
3) We could likewise fetch the block number once and use it in all the
places in heap_insert, removing the extra calls to
BufferGetBlockNumber(buffer).  (A sketch of 2) and 3) follows.)
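
A rough sketch of 2) and 3), only for illustration (based on how the
current heap_insert() code looks, not on the patch):

    Page        page;
    BlockNumber blkno;

    buffer = RelationGetBufferForTuple(relation, heaptup->t_len,
                                       InvalidBuffer, options, bistate,
                                       &vmbuffer, NULL);
    page = BufferGetPage(buffer);
    blkno = BufferGetBlockNumber(buffer);

    ...

    /* the later checks then reuse the cached values, for example: */
    if (PageIsAllVisible(page))
    {
        all_visible_cleared = true;
        PageClearAllVisible(page);
        visibilitymap_clear(relation, blkno, vmbuffer,
                            VISIBILITYMAP_VALID_BITS);
    }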

Thoughts?

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: ANALYZE counts LP_DEAD line pointers as n_dead_tup

2021-04-19 Thread Peter Geoghegan
On Fri, Apr 16, 2021 at 6:54 PM Peter Geoghegan  wrote:
> How about just documenting it in comments, as in the attached patch? I
> tried to address all of the issues with LP_DEAD accounting together.
> Both the issue raised by Masahiko, and one or two others that were
> also discussed recently on other threads. They all seem kind of
> related to me.

I pushed a version of this just now.

Thanks
-- 
Peter Geoghegan




Re: Truncate in synchronous logical replication failed

2021-04-19 Thread Ajin Cherian
On Sat, Apr 17, 2021 at 2:04 PM osumi.takami...@fujitsu.com <
osumi.takami...@fujitsu.com> wrote:

>
> No problem. Thank you for updating the patch.
> I've conducted some cosmetic changes. Could you please check this ?
> That's already applied by pgindent.
>
> I executed RT for this and made no failure.
> Just in case, I executed 010_truncate.pl test 100 times in a tight loop,
> which also didn't fail.
>
>
I reviewed the patch, ran make check, no issues. One minor comment:

Could you add a comment, similar to the one in RelationGetIndexAttrBitmap(),
on why the redo is needed?  It's not very obvious to someone reading the
code why we are refetching the index list here.

+ /* Check if we need to redo */
+ newindexoidlist = RelationGetIndexList(relation);

thanks,
Ajin Cherian
Fujitsu Australia


Re: Bogus collation version recording in recordMultipleDependencies

2021-04-19 Thread Julien Rouhaud
On Tue, Apr 20, 2021 at 12:05:27PM +1200, Thomas Munro wrote:
> 
> Yeah, that runs directly into non-trivial locking problems.  I felt
> like some of the other complaints could conceivably be addressed in
> time, including dumb stuff like Windows default locale string format
> and hopefully some expression analysis problems, but not this.  I'll
> hold off reverting for a few more days to see if anyone has any other
> thoughts on that, because there doesn't seem to be any advantage in
> being too hasty about it.

I also feel that the ALTER TYPE example Tom showed earlier isn't something
trivial to fix and cannot be done in pg14 :(




Re: Bogus collation version recording in recordMultipleDependencies

2021-04-19 Thread Julien Rouhaud
On Mon, Apr 19, 2021 at 11:13:37AM -0700, Peter Geoghegan wrote:
> On Sun, Apr 18, 2021 at 4:23 AM Julien Rouhaud  wrote:
> > So IIUC the issue here is that the code could previously record useless
> > collation version dependencies in somes cases, which could lead to false
> > positive possible corruption messages (and of course additional bloat on
> > pg_depend).  False positive messages can't be avoided anyway, as a collation
> > version update may not corrupt the actually indexed set of data, especially 
> > for
> > glibc.
> 
> This argument seems completely absurd to me.

I'm not sure why?  For glibc at least, I don't see how we could not end up
raising false positive as you have a single glibc version for all its
collations.  If a user has say en_US and fr_FR, or any quite stable collation,
most of the glibc upgrades (except 2.28 of course) won't corrupt your indexes.




Re: pg_amcheck option to install extension

2021-04-19 Thread Michael Paquier
On Mon, Apr 19, 2021 at 12:53:29PM -0400, Tom Lane wrote:
> FWIW, I think that putting them both in contrib makes the most
> sense from a structural standpoint.
> 
> Either way, though, you'll still need the proposed option to
> let the executable issue a CREATE EXTENSION to get the shlib
> loaded.  Unless somebody is proposing that the extension be
> installed-by-default like plpgsql, and that I am unequivocally
> not for.

Agreed.  Something like src/extensions/ would be a tempting option,
but I don't think that it is a good idea to introduce a new piece of
infrastructure at this stage, so moving both to contrib/ would be the
best balance with the current pieces at hand.
--
Michael




Re: GSoC 2021 Proposal Document

2021-04-19 Thread Andreas 'ads' Scherbaum
Hello,

On Sat, Apr 17, 2021 at 8:42 PM Nhi Dang  wrote:

>
>
Thank you for this document.

It looks like there are a couple problems with this - at least if this was
intended
to be a submission for GSoC 2021:

   - The deadline for submissions was April 13th (
   https://summerofcode.withgoogle.com/dashboard/timeline/ )
   - Submissions need to be made using the GSoC website


If you still want to work on this - outside of GSoC - I suggest also reading
this thread here:

https://www.postgresql.org/message-id/CAFwT4nAVEFJQF8nQ%2BmWc6M2Eh8nG2uEhV0bdY7wfaT2aLERUAQ%40mail.gmail.com

This discusses a number of suggested changes, and might be useful for your
proposal.


Regards,

-- 
Andreas 'ads' Scherbaum
German PostgreSQL User Group
European PostgreSQL User Group - Board of Directors
Volunteer Regional Contact, Germany - PostgreSQL Project


Re: [CLOBBER_CACHE]Server crashed with segfault 11 while executing clusterdb

2021-04-19 Thread Kyotaro Horiguchi
At Mon, 19 Apr 2021 16:27:25 +0530, Amul Sul  wrote in 
> On Mon, Apr 19, 2021 at 2:05 PM Kyotaro Horiguchi
>  wrote:
> > +   smgrwrite(RelationGetSmgr(index), INIT_FORKNUM, 
> > BLOOM_METAPAGE_BLKNO,
> >   (char *) metapage, true);
> > -   log_newpage(&index->rd_smgr->smgr_rnode.node, INIT_FORKNUM,
> > +   log_newpage(&(RelationGetSmgr(index))->smgr_rnode.node, 
> > INIT_FORKNUM,
> >
> > At the log_newpage, index is guaranteed to have rd_smgr. So I prefer
> > to leave the line alone..  I don't mind other sccessive calls if any
> > since what I don't like is the notation there.
> >
> 
> Perhaps, isn't that bad. It is good to follow the practice of using
> RelationGetSmgr() for rd_smgr access, IMHO.

I don't mind RelationGetSmgr(index)->smgr_rnode alone, or &variable->member
alone when there isn't a previous call to RelationGetSmgr() just above.
How about using a temporary variable?

  SMgrRelation srel = RelationGetSmgr(index);
  smgrwrite(srel, ...);
  log_newpage(srel->..);
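
Applied to the bloom-index code quoted above, that would look roughly like
the following (just a sketch, assuming the RelationGetSmgr() helper under
discussion in this thread):

    SMgrRelation srel = RelationGetSmgr(index);

    smgrwrite(srel, INIT_FORKNUM, BLOOM_METAPAGE_BLKNO,
              (char *) metapage, true);
    log_newpage(&srel->smgr_rnode.node, INIT_FORKNUM,
                BLOOM_METAPAGE_BLKNO, metapage, true);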


> > > P.S. commitfest entry https://commitfest.postgresql.org/33/3084/
> >
> > Isn't this a kind of open item?
> >
> 
> Sorry, I didn't get you. Do I need to move this to some other bucket?

As discussed in the other branch, I agree that it should be registered in
the next CF, not as an open item of this cycle.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [CLOBBER_CACHE]Server crashed with segfault 11 while executing clusterdb

2021-04-19 Thread Kyotaro Horiguchi
At Mon, 19 Apr 2021 09:19:36 -0400, Tom Lane  wrote in 
> Michael Paquier  writes:
> > On Mon, Apr 19, 2021 at 05:35:52PM +0900, Kyotaro Horiguchi wrote:
> >> Isn't this a kind of open item?
> 
> > This does not qualify as an open item because it is not an actual bug
> > IMO, neither is it a defect of the existing code, so it seems
> > appropriate to me to not list it.
> 
> Agreed, but by the same token, rushing it into v14 doesn't have any
> clear benefit.  I'd be inclined to leave it for v15 at this point,
> especially since we don't seem to have 100% consensus on the details.

Thanks. Seems reasonable.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: "could not find pathkey item to sort" for TPC-DS queries 94-96

2021-04-19 Thread James Coleman
On Mon, Apr 19, 2021 at 7:10 PM Tom Lane  wrote:
>
> I wrote:
> > Anyway I'm now inclined to remove that behavior from
> > find_computable_ec_member, and adjust comments accordingly.
>
> After some more testing, that seems like a good thing to do,
> so here's a v4.

This all looks good to me.

James Coleman




Re: Implementing Incremental View Maintenance

2021-04-19 Thread Yugo NAGATA
On Mon, 19 Apr 2021 17:40:31 -0400
Tom Lane  wrote:

> Andrew Dunstan  writes:
> > This patch (v22c) just crashed for me with an assertion failure on
> > Fedora 31. Here's the stack trace:
> 
> > #2  0x0094a54a in ExceptionalCondition
> > (conditionName=conditionName@entry=0xa91dae "queryDesc->sourceText !=
> > NULL", errorType=errorType@entry=0x99b468 "FailedAssertion",
> > fileName=fileName@entry=0xa91468
> > "/home/andrew/pgl/pg_head/src/backend/executor/execMain.c",
> > lineNumber=lineNumber@entry=199) at
> > /home/andrew/pgl/pg_head/src/backend/utils/error/assert.c:69
> 
> That assert just got added a few days ago, so that's why the patch
> seemed OK before.

Thank you for letting me know. I'll fix it.



-- 
Yugo NAGATA 




Re: when the startup process doesn't

2021-04-19 Thread Tom Lane
Thomas Munro  writes:
> On Tue, Apr 20, 2021 at 5:55 AM Robert Haas  wrote:
>> If I had to pick one of these two ideas, I'd pick the one the
>> log-based solution, since it seems easier to access and simplifies
>> retrospective analysis, but I suspect SQL access would be quite useful
>> for some users too, especially in cloud environments where "just log
>> into the machine and have a look" is not an option.

> +1 for both ideas.  I've heard multiple requests for something like
> that.  A couple of users with update_process_title=off told me they
> regretted that choice when they found themselves running a long crash
> recovery with the only indicator of progress disabled.

Hmm ... +1 for progress messages in the log, but I'm suspicious about
the complexity-and-fragility-versus-value tradeoff for the other thing.

regards, tom lane




Re: partial heap only tuples

2021-04-19 Thread Peter Geoghegan
On Mon, Apr 19, 2021 at 5:09 PM Bruce Momjian  wrote:
> > A diversity of strategies with fallback behavior is sometimes the best
> > strategy. Don't underestimate the contribution of rare and seemingly
> > insignificant adverse events. Consider the lifecycle of the data over
>
> That is an intersting point --- we often focus on optimizing frequent
> operations, but preventing rare but expensive-in-aggregate events from
> happening is also useful.

Right. Similarly, we sometimes focus on adding an improvement,
overlooking more promising opportunities to subtract a disimprovement.
Apparently this is a well known tendency:

https://www.scientificamerican.com/article/our-brain-typically-overlooks-this-brilliant-problem-solving-strategy/

I believe that it's particularly important to consider subtractive
approaches with a complex system. This has sometimes worked well for
me as a conscious and deliberate strategy.

--
Peter Geoghegan




Re: when the startup process doesn't

2021-04-19 Thread Thomas Munro
On Tue, Apr 20, 2021 at 5:55 AM Robert Haas  wrote:
> If I had to pick one of these two ideas, I'd pick the one the
> log-based solution, since it seems easier to access and simplifies
> retrospective analysis, but I suspect SQL access would be quite useful
> for some users too, especially in cloud environments where "just log
> into the machine and have a look" is not an option.

+1 for both ideas.  I've heard multiple requests for something like
that.  A couple of users with update_process_title=off told me they
regretted that choice when they found themselves running a long crash
recovery with the only indicator of progress disabled.




Re: when the startup process doesn't

2021-04-19 Thread Bruce Momjian
On Mon, Apr 19, 2021 at 01:55:13PM -0400, Robert Haas wrote:
> I've noticed that customers not infrequently complain that they start
> postgres and then the system doesn't come up for a while and they have
> no idea what's going on and are (understandably) worried. There are
> probably a number of reasons why this can happen, but the ones that
> seem to come up most often in my experience are (1) SyncDataDirectory
> takes a long time, (b) ResetUnloggedRelations takes a long time, and
> (c) there's a lot of WAL to apply so that takes a long time. It's
> possible to distinguish this last case from the other two by looking
> at the output of 'ps', but that's not super-convenient if your normal
> method of access to the server is via libpq, and it only works if you
> are monitoring it as it's happening rather than looking at the logs
> after-the-fact. I am not sure there's any real way to distinguish the
> other two cases without using strace or gdb or similar.
> 
> It seems to me that we could do better. One approach would be to try
> to issue a log message periodically - maybe once per minute, or some
> configurable interval, e.g. perhaps add messages something like this:

Yes, this certainly needs improvement.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: Do we need to update copyright for PG11 branch

2021-04-19 Thread Bruce Momjian
On Mon, Apr 19, 2021 at 09:20:22AM -0400, Tom Lane wrote:
> bchen90  writes:
> >   Recently, I found the copyright info for PG11 branch still is "Portions
> > Copyright (c) *1996-2018*, PostgreSQL Global Development Group". Do we need
> > to update it?
> 
> No, that's not our practice.

We technically only update in back branches:

./doc/src/sgml/legal.sgml in head and back branches
./COPYRIGHT in back branches

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: "could not find pathkey item to sort" for TPC-DS queries 94-96

2021-04-19 Thread Tom Lane
I wrote:
> Anyway I'm now inclined to remove that behavior from
> find_computable_ec_member, and adjust comments accordingly.

After some more testing, that seems like a good thing to do,
so here's a v4.

regards, tom lane

diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index 0188c1e9a1..6e87fba2aa 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -35,6 +35,7 @@
 static EquivalenceMember *add_eq_member(EquivalenceClass *ec,
 		Expr *expr, Relids relids, Relids nullable_relids,
 		bool is_child, Oid datatype);
+static bool is_exprlist_member(Expr *node, List *exprs);
 static void generate_base_implied_equalities_const(PlannerInfo *root,
    EquivalenceClass *ec);
 static void generate_base_implied_equalities_no_const(PlannerInfo *root,
@@ -769,6 +770,167 @@ get_eclass_for_sort_expr(PlannerInfo *root,
 	return newec;
 }
 
+/*
+ * find_ec_member_matching_expr
+ *		Locate an EquivalenceClass member matching the given expr, if any;
+ *		return NULL if no match.
+ *
+ * "Matching" is defined as "equal after stripping RelabelTypes".
+ * This is used for identifying sort expressions, and we need to allow
+ * binary-compatible relabeling for some cases involving binary-compatible
+ * sort operators.
+ *
+ * Child EC members are ignored unless they belong to given 'relids'.
+ */
+EquivalenceMember *
+find_ec_member_matching_expr(EquivalenceClass *ec,
+			 Expr *expr,
+			 Relids relids)
+{
+	ListCell   *lc;
+
+	/* We ignore binary-compatible relabeling on both ends */
+	while (expr && IsA(expr, RelabelType))
+		expr = ((RelabelType *) expr)->arg;
+
+	foreach(lc, ec->ec_members)
+	{
+		EquivalenceMember *em = (EquivalenceMember *) lfirst(lc);
+		Expr	   *emexpr;
+
+		/*
+		 * We shouldn't be trying to sort by an equivalence class that
+		 * contains a constant, so no need to consider such cases any further.
+		 */
+		if (em->em_is_const)
+			continue;
+
+		/*
+		 * Ignore child members unless they belong to the requested rel.
+		 */
+		if (em->em_is_child &&
+			!bms_is_subset(em->em_relids, relids))
+			continue;
+
+		/*
+		 * Match if same expression (after stripping relabel).
+		 */
+		emexpr = em->em_expr;
+		while (emexpr && IsA(emexpr, RelabelType))
+			emexpr = ((RelabelType *) emexpr)->arg;
+
+		if (equal(emexpr, expr))
+			return em;
+	}
+
+	return NULL;
+}
+
+/*
+ * find_computable_ec_member
+ *		Locate an EquivalenceClass member that can be computed from the
+ *		expressions appearing in "exprs"; return NULL if no match.
+ *
+ * "exprs" can be either a list of bare expression trees, or a list of
+ * TargetEntry nodes.  Either way, it should contain Vars and possibly
+ * Aggrefs and WindowFuncs, which are matched to the corresponding elements
+ * of the EquivalenceClass's expressions.
+ *
+ * Unlike find_ec_member_matching_expr, there's no special provision here
+ * for binary-compatible relabeling.  This is intentional: if we have to
+ * compute an expression in this way, setrefs.c is going to insist on exact
+ * matches of Vars to the source tlist.
+ *
+ * Child EC members are ignored unless they belong to given 'relids'.
+ * Also, non-parallel-safe expressions are ignored if 'require_parallel_safe'.
+ *
+ * Note: some callers pass root == NULL for notational reasons.  This is OK
+ * when require_parallel_safe is false.
+ */
+EquivalenceMember *
+find_computable_ec_member(PlannerInfo *root,
+		  EquivalenceClass *ec,
+		  List *exprs,
+		  Relids relids,
+		  bool require_parallel_safe)
+{
+	ListCell   *lc;
+
+	foreach(lc, ec->ec_members)
+	{
+		EquivalenceMember *em = (EquivalenceMember *) lfirst(lc);
+		List	   *exprvars;
+		ListCell   *lc2;
+
+		/*
+		 * We shouldn't be trying to sort by an equivalence class that
+		 * contains a constant, so no need to consider such cases any further.
+		 */
+		if (em->em_is_const)
+			continue;
+
+		/*
+		 * Ignore child members unless they belong to the requested rel.
+		 */
+		if (em->em_is_child &&
+			!bms_is_subset(em->em_relids, relids))
+			continue;
+
+		/*
+		 * Match if all Vars and quasi-Vars are available in "exprs".
+		 */
+		exprvars = pull_var_clause((Node *) em->em_expr,
+   PVC_INCLUDE_AGGREGATES |
+   PVC_INCLUDE_WINDOWFUNCS |
+   PVC_INCLUDE_PLACEHOLDERS);
+		foreach(lc2, exprvars)
+		{
+			if (!is_exprlist_member(lfirst(lc2), exprs))
+break;
+		}
+		list_free(exprvars);
+		if (lc2)
+			continue;			/* we hit a non-available Var */
+
+		/*
+		 * If requested, reject expressions that are not parallel-safe.  We
+		 * check this last because it's a rather expensive test.
+		 */
+		if (require_parallel_safe &&
+			!is_parallel_safe(root, (Node *) em->em_expr))
+			continue;
+
+		return em;/* found usable expression */
+	}
+
+	return NULL;
+}
+
+/*
+ * is_exprlist_member
+ *	  Subroutine for find_computable_ec_member: is "node" in "exprs"?
+ *
+ * Pe

Re: when the startup process doesn't

2021-04-19 Thread Alvaro Herrera
On 2021-Apr-19, Robert Haas wrote:

> Another possible approach would be to accept connections for
> monitoring purposes even during crash recovery. We can't allow access
> to any database at that point, since the system might not be
> consistent, but we could allow something like a replication connection
> (the non-database-associated variant).

Hmm.  We already have pg_isready, which is pretty simplistic -- it tries
to connect to the server and derive a status in a very simplistic way.
Can we perhaps improve on that?  I think your idea of using the
non-database-connected replication mode would let the server return a
tuple with some status information with a new command.  And then
pg_isready could interpret that, or just print it.

-- 
Álvaro Herrera39°49'30"S 73°17'W
Subversion to GIT: the shortest path to happiness I've ever heard of
(Alexey Klyukin)




Re: Free port choosing freezes when PostgresNode::use_tcp is used on BSD systems

2021-04-19 Thread Tom Lane
Alexey Kondratov  writes:
> And this is an absolute true, on BSD-like systems (macOS and FreeBSD 
> tested) it hangs on looping through the entire ports range over and over 
> when $PostgresNode::use_tcp = 1 is set, since bind fails with:

Hm.

> That way, if it really could happen why not to just skip binding to 
> 127.0.0/24 addresses other than 127.0.0.1 outside of Linux/Windows as 
> per attached patch_PostgresNode.diff?

That patch seems wrong, or at least it's ignoring the advice immediately
above about binding to 0.0.0.0 only on Windows.

I wonder whether we could get away with just replacing the $use_tcp
test with $TestLib::windows_os.  It's not really apparent to me
why we should care about 127.0.0.not-1 on Unix-oid systems.

regards, tom lane




Re: partial heap only tuples

2021-04-19 Thread Bruce Momjian
On Sun, Apr 18, 2021 at 04:27:15PM -0700, Peter Geoghegan wrote:
> Everybody tends to talk about HOT as if it works perfectly once you
> make some modest assumptions, such as "there are no long-running
> transactions", and "no UPDATEs will logically modify indexed columns".
> But I tend to doubt that that's truly the case -- I think that there
> are still pathological cases where HOT cannot keep the total table
> size stable in the long run due to subtle effects that eventually
> aggregate into significant issues, like heap fragmentation. Ask Jan
> Wieck about the stability of some of the TPC-C/BenchmarkSQL tables to

...

> We might have successfully fit the successor heap tuple version a
> million times before just by HOT pruning, and yet currently we give up
> just because it didn't work on the one millionth and first occasion --
> don't you think that's kind of silly? We may be able to afford having
> a fallback strategy that is relatively expensive, provided it is
> rarely used. And it might be very effective in the aggregate, despite
> being rarely used -- it might provide us just what we were missing
> before. Just try harder when you run into a problem every once in a
> blue moon!
> 
> A diversity of strategies with fallback behavior is sometimes the best
> strategy. Don't underestimate the contribution of rare and seemingly
> insignificant adverse events. Consider the lifecycle of the data over

That is an interesting point --- we often focus on optimizing frequent
operations, but preventing rare but expensive-in-aggregate events from
happening is also useful.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: Bogus collation version recording in recordMultipleDependencies

2021-04-19 Thread Thomas Munro
On Tue, Apr 20, 2021 at 8:21 AM Tom Lane  wrote:>
> Thomas Munro  writes:
> > For example, if you think there actually is a potential better
> > plan than using pg_depend for this, that'd definitely be good to know
> > about.
>
> I really dislike using pg_depend, for a couple of reasons:
>
> * You've broken the invariant that dependencies on pinned objects
> are never recorded.  Now, some of them exist, for reasons having
> nothing to do with the primary goals of pg_depend.  If that's not
> a sign of bad relational design, I don't know what is.  I didn't
> look at the code, but I wonder if you didn't have to lobotomize
> some error checks in dependency.c because of that.  (Perhaps
> some sort of special-case representation for the default
> collation would help here?)

Hmm, OK, thanks, that's something to go back and think about.

> * pg_depend used to always be all-not-null.  Now, most rows in it
> will need a nulls bitmap, adding 8 bytes per row (on maxalign=8
> hardware) to what had been fairly narrow rows.  By my arithmetic
> that's 13.3% bloat in what is already one of our largest
> catalogs.  That's quite unpleasant.  (It would actually be
> cheaper to store an empty-string refobjversion for non-collation
> entries; a single-byte string would fit into the pad space
> after deptype, adding nothing to the row width.)

That seems like a good idea.

> > Hrmph.  Yeah.  We didn't consider types that change later like this,
> > and handling those correctly does seem to warrant some more thought
> > and work than we perhaps have time for.  My first thought is that we'd
> > need to teach it to trigger reanalysis.
>
> That seems like a nonstarter, even before you think about race
> conditions.

Yeah, that runs directly into non-trivial locking problems.  I felt
like some of the other complaints could conceivably be addressed in
time, including dumb stuff like Windows default locale string format
and hopefully some expression analysis problems, but not this.  I'll
hold off reverting for a few more days to see if anyone has any other
thoughts on that, because there doesn't seem to be any advantage in
being too hasty about it.




Re: pg_amcheck contrib application

2021-04-19 Thread Mark Dilger



> On Apr 19, 2021, at 12:50 PM, Robert Haas  wrote:
> 
> On Thu, Apr 15, 2021 at 1:07 PM Mark Dilger
>  wrote:
>> I have added the verb "has" rather than "contains" because "has" is more 
>> consistent with the phrasing of other similar corruption reports.
> 
> That makes sense.
> 
> I think it's odd that a range of extraneous chunks is collapsed into a
> single report if the size of each chunk happens to be
> TOAST_MAX_CHUNK_SIZE and not otherwise. Why not just remember the
> first and last extraneous chunk and the size of each? If the next
> chunk you see is the next one in sequence and the same size as all the
> others, extend your notion of the sequence end by 1. Otherwise, report
> the range accumulated so far. It seems to me that this wouldn't be any
> more code than you have now, and might actually be less.
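
(For reference, the accumulation described above could be sketched roughly
as follows; the struct, field, and helper names are invented for
illustration and are not from pg_amcheck:)

    typedef struct ExtraneousChunkRun
    {
        bool    active;         /* are we currently accumulating a run? */
        int     first_chunk;    /* first extraneous chunk in the run */
        int     last_chunk;     /* last extraneous chunk seen so far */
        int32   chunk_size;     /* size shared by every chunk in the run */
    } ExtraneousChunkRun;

    static void
    note_extraneous_chunk(ExtraneousChunkRun *run, int chunkno, int32 size)
    {
        if (run->active &&
            chunkno == run->last_chunk + 1 &&
            size == run->chunk_size)
        {
            /* next chunk in sequence with the same size: extend the run */
            run->last_chunk = chunkno;
            return;
        }
        if (run->active)
        {
            /* flush one report covering the whole accumulated range */
            report_extraneous_chunk_range(run->first_chunk, run->last_chunk,
                                          run->chunk_size);
        }
        run->active = true;
        run->first_chunk = run->last_chunk = chunkno;
        run->chunk_size = size;
    }

(report_extraneous_chunk_range() is a hypothetical reporting helper; a
final flush would also be needed once the last chunk has been read.)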

In all cases of uncorrupted toasted attributes, the sequence of N chunks that 
make up the attribute should be N-1 chunks of TOAST_MAX_CHUNK_SIZE ending with 
a single chunk of up to TOAST_MAX_CHUNK_SIZE.  I'd like to refer to such 
sequences as "reasonably sized" sequences to make conversation easier.

If the toast pointer's va_extsize field leads us to believe that we should find 
10 reasonably sized chunks, but instead we find 30 reasonably sized chunks, we 
know something is corrupt.  We shouldn't automatically prejudice the user 
against the additional 20 chunks.  We didn't expect them, but maybe that's 
because va_extsize was corrupt and gave us a false expectation.  We're not 
pointing fingers one way or the other.

On the other hand, if we expect 10 chunks and find an additional 20 
unreasonably sized chunks, we can and should point fingers at the extra 20 
chunks.  Even if we somehow knew that va_extsize was also corrupt, we'd still 
be justified in saying the 20 unreasonably sized chunks are each, individually 
corrupt.

I tried to write the code to report one corruption message per corruption 
found.  There are some edge cases where this is a definitional challenge, so 
it's not easy to say that I've always achieved this goal, but I think I've done 
so where the definitions are clear.  As such, the only time I'd want to combine 
toast chunks into a single corruption message is when they are not in 
themselves necessarily *individually* corrupt.  That is why I wrote the code to 
use TOAST_MAX_CHUNK_SIZE rather than just storing up any series of equally 
sized chunks.

On a related note, when complaining about a sequence of toast chunks, often the 
sequence is something like [maximal, maximal, ..., maximal, partial], but 
sometimes it's just [maximal...maximal], sometimes just [maximal], and 
sometimes just [partial].  If I'm complaining about that entire sequence, I'd 
really like to do so in just one message, otherwise it looks like separate 
complaints.

I can certainly change the code to be how you are asking, but I'd first like to 
know that you really understood what I was doing here and why the reports read 
the way they do.

> I think that report_missing_chunks() could probably just report the
> range of missing chunks and not bother reporting how big they were
> expected to be. But, if it is going to report how big they were
> expected to be, I think it should have only 2 cases rather than 4:
> either a range of missing chunks of equal size, or a single missing
> chunk of some size. If, as I propose, it doesn't report the expected
> size, then you still have just 2 cases: a range of missing chunks, or
> a single missing chunk.

Right, this is the same as above.  I'm trying not to split a single corruption 
complaint into separate reports.

> Somehow I have a hard time feeling confident that check_toast_tuple()
> is going to do the right thing. The logic is really complex and hard
> to understand. 'chunkno' is a counter that advances every time we move
> to the next chunk, and 'curchunk' is the value we actually find in the
> TOAST tuple. This terminology is not easy to understand. Most messages
> now report 'curchunk', but some still report 'chunkno'. Why does
> 'chunkno' need to exist at all? AFAICS the combination of 'curchunk'
> and 'tctx->last_chunk_seen' ought to be sufficient. I can see no
> particular reason why what you're calling 'chunkno' needs to exist
> even as a local variable, let alone be printed out. Either we haven't
> yet validated that the chunk_id extracted from the tuple is non-null
> and greater than the last chunk number we saw, in which case we can
> just complain about it if we find it to be otherwise, or we have
> already done that validation, in which case we should complain about
> that value and not 'chunkno' in any subsequent messages.

If we use tctx->last_chunk_seen as you propose, I imagine we'd set that to -1 
prior to the first call to check_toast_tuple().  In the first call, we'd 
extract the toast chunk_seq and store it in curchunk and verify that it's one 
greater than tctx->last_chunk_seen.  That all seems fine.

But under corrupt condi

column-level security policies for application users

2021-04-19 Thread Dan Lynch
Has anyone previously discussed column-level security "policies", or how
best to manage/implement them, given that they don't exist yet?

In my mind we have great tools for database administrator users to get
column-level security with grants, but not for application users in a
manner akin to RLS.

My current solution is to leverage a trigger with a WHEN clause that checks
the permissions. Imagine creating a publishing flow with authors and
publishers on the same object:

CREATE TABLE posts (
id serial primary key,
title text,
content text,
published boolean DEFAULT FALSE,
author_id uuid NOT NULL DEFAULT get_curent_user_id(),
publisher_id uuid NOT NULL DEFAULT
'85d770e6-7c18-4e98-bbd5-160b512e6c23'
);

CREATE TRIGGER ensure_only_publisher_can_publish
AFTER UPDATE ON posts
FOR EACH ROW
WHEN (
NEW.publisher_id <> get_curent_user_id ()
AND
OLD.published IS DISTINCT FROM NEW.published
)
EXECUTE PROCEDURE throw_error ('OWNED_COLUMNS', 'published');

CREATE TRIGGER ensure_only_publisher_can_publish_insert
AFTER INSERT ON posts
FOR EACH ROW
WHEN (
NEW.publisher_id <> get_curent_user_id ()
AND
NEW.published IS TRUE
)
EXECUTE PROCEDURE throw_error ('OWNED_COLUMNS', 'published');

If you want to run the example I've included a gist here that wraps all
deps in a tx:
https://gist.github.com/pyramation/2a7b836ab47a2450b951a256dfe7cbde

It works! The author can create posts, and only the publisher can "publish"
them. However it has some disadvantages.

   1. uses triggers, cannot use BYPASSRLS and have to use replication role
   2. Behavior for INSERT to my knowledge requires an understanding of
   valid or default values

#1 I could manage, I can imagine using the replication role if needed in
some places. #2 however, feels clunky and closely coupled to the data model
given it requires default or whitelisted values.

Thoughts? Any other solutions out there I should be aware of?





Dan Lynch
(734) 657-4483


Free port choosing freezes when PostgresNode::use_tcp is used on BSD systems

2021-04-19 Thread Alexey Kondratov

Hi Hackers,

Inside PostgresNode.pm there is a free port choosing routine --- 
get_free_port(). The comment section there says:


# On non-Linux, non-Windows kernels, binding to 127.0.0/24 addresses
# other than 127.0.0.1 might fail with EADDRNOTAVAIL.

And this is absolutely true: on BSD-like systems (macOS and FreeBSD 
tested) it hangs, looping through the entire port range over and over 
when $PostgresNode::use_tcp = 1 is set, since bind fails with:


# Checking port 52208
# bind: 127.0.0.1 52208
# bind: 127.0.0.2 52208
bind: Can't assign requested address

To reproduce just apply reproduce.diff and try to run 'make -C 
src/bin/pg_ctl check'.


This is not an issue with the standard Postgres tests, since TestLib.pm 
chooses unix sockets automatically everywhere outside Windows.  However, 
we ran into this problem when we tried to run a custom TAP test that 
required TCP to run stably.


That way, if it really could happen, why not just skip binding to 
127.0.0/24 addresses other than 127.0.0.1 outside of Linux/Windows, as 
per the attached patch_PostgresNode.diff?



Regards
--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company

diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm
index db47a97d196..9add9bde2a4 100644
--- a/src/test/perl/PostgresNode.pm
+++ b/src/test/perl/PostgresNode.pm
@@ -1203,7 +1203,7 @@ sub get_free_port
 		if ($found == 1)
 		{
 			foreach my $addr (qw(127.0.0.1),
-$use_tcp ? qw(127.0.0.2 127.0.0.3 0.0.0.0) : ())
+$use_tcp && ($^O eq "linux" || $TestLib::windows_os) ? qw(127.0.0.2 127.0.0.3 0.0.0.0) : ())
 			{
 if (!can_bind($addr, $port))
 {
diff --git a/src/bin/pg_ctl/t/001_start_stop.pl b/src/bin/pg_ctl/t/001_start_stop.pl
index b1e419f02e9..c25c0793537 100644
--- a/src/bin/pg_ctl/t/001_start_stop.pl
+++ b/src/bin/pg_ctl/t/001_start_stop.pl
@@ -11,6 +11,8 @@ use Test::More tests => 24;
 my $tempdir   = TestLib::tempdir;
 my $tempdir_short = TestLib::tempdir_short;
 
+$PostgresNode::use_tcp = 1;
+
 program_help_ok('pg_ctl');
 program_version_ok('pg_ctl');
 program_options_handling_ok('pg_ctl');


Re: "could not find pathkey item to sort" for TPC-DS queries 94-96

2021-04-19 Thread Tom Lane
James Coleman  writes:
> Two things I wonder:
> 1. Should we add tests for the relabel code path?

As far as that goes, the Relabel-stripping loops in
find_ec_member_matching_expr are already exercised in the core
regression tests (I didn't bother to discover exactly where, but
a quick coverage test run says that they're hit).  The ones in
exprlist_member_ignore_relabel are not iterated though.  On
reflection, the first loop stripping the input node is visibly
unreachable by the sole caller, since everything in the exprvars
list will be a Var, Aggref, WindowFunc, or PlaceHolderVar.  I'm
less sure about what is possible in the targetlist that we're
referencing, but it strikes me that ignoring relabel on that
side is probably functionally wrong: if we have say "f(textcol)"
as an expression to be sorted on, but what is in the tlist is
textcol::varchar or the like, I do not think setrefs.c will
consider that an acceptable match.  So now that's seeming like
an actual bug --- although the lack of field reports suggests
that it's unreachable, most likely because if we do have
"f(textcol)" as a sort candidate, we'll have made sure to emit
plain "textcol" from the source relation, regardless of whether
there might also be a reason to emit textcol::varchar.

Anyway I'm now inclined to remove that behavior from
find_computable_ec_member, and adjust comments accordingly.

> 2. It'd be nice not to have the IS_SRF_CALL duplicated, but that might
> add enough complexity that it's not worth it.

Yeah, I'd messed around with variants that put more smarts
into the bottom-level functions, and decided that it wasn't
a net improvement.

regards, tom lane




Re: "could not find pathkey item to sort" for TPC-DS queries 94-96

2021-04-19 Thread Tom Lane
I wrote:
> I'm not wedded to that name, certainly, but it seems like neither
> of these is quite getting at the issue.  An EC can be sorted on,
> by definition, but there are some things we don't want to sort
> on till the final output step.  I was trying to think of something
> using the terminology "early sort", but didn't much like
> "relation_has_early_sortable_ec_member" or obvious variants of that.

... or, as long as it's returning a boolean, maybe it could be
"relation_can_be_sorted_early" ?

regards, tom lane




Re: Implementing Incremental View Maintenance

2021-04-19 Thread Tom Lane
Andrew Dunstan  writes:
> This patch (v22c) just crashed for me with an assertion failure on
> Fedora 31. Here's the stack trace:

> #2  0x0094a54a in ExceptionalCondition
> (conditionName=conditionName@entry=0xa91dae "queryDesc->sourceText !=
> NULL", errorType=errorType@entry=0x99b468 "FailedAssertion",
> fileName=fileName@entry=0xa91468
> "/home/andrew/pgl/pg_head/src/backend/executor/execMain.c",
> lineNumber=lineNumber@entry=199) at
> /home/andrew/pgl/pg_head/src/backend/utils/error/assert.c:69

That assert just got added a few days ago, so that's why the patch
seemed OK before.

regards, tom lane




Re: Allowing to create LEAKPROOF functions to non-superuser

2021-04-19 Thread Stephen Frost
Greetings,

* Robert Haas (robertmh...@gmail.com) wrote:
> On Mon, Apr 19, 2021 at 4:32 PM Tom Lane  wrote:
> > Robert Haas  writes:
> > > On Fri, Apr 16, 2021 at 3:57 AM Noah Misch  wrote:
> > >> Hence, I do find it reasonable to let pg_read_all_data be sufficient for
> > >> setting LEAKPROOF.  I would not consult datdba, because datdba currently 
> > >> has
> > >> no special read abilities.  It feels too weird to let BYPASSRLS start
> > >> affecting non-RLS access controls.  A reasonable person may assume that
> > >> BYPASSRLS has no consequences until someone uses CREATE POLICY.  That 
> > >> said, I
> > >> wouldn't be horrified if BYPASSRLS played a part.  BYPASSRLS, like
> > >> pg_read_all_data, clearly isn't something to grant lightly.
> >
> > > I agree that datdba doesn't seem like quite the right thing, but I'm
> > > not sure I agree with the rest. How can we say that leakproof is a
> > > non-RLS access control? Its only purpose is to keep RLS secure, so I
> > > guess I'd be inclined to think that of the two, BYPASSRLS is more
> > > closely related to the topic at hand.
> >
> > Umm ... I'm pretty sure LEAKPROOF also affects optimization around
> > "security barrier" views, which I wouldn't call RLS.  Out of these
> > options, I'd prefer granting the ability to pg_read_all_data.
> 
> Oops, I forgot about security_barrier views, which is rather
> embarrassing since I committed them. So, yeah, I agree:
> pg_read_all_data is better.

I'm not really sure that attaching it to pg_read_all_data makes sense..

In general, I've been frustrated by the places where we lump privileges
together rather than having a way to distinctly GRANT capabilities
independently- that's more-or-less exactly what led me to work on
implementing the role system in the first place, and later the
predefined roles.

I do think it's good to reduce the number of places that require
superuser, in general, but I'm not sure that marking functions as
leakproof as a non-superuser makes sense.

Here's what I'd ask Andrey- what's the actual use-case here?  Are these
cases where users are actually adding new functions which they believe
are leakproof where those functions don't require superuser already to
be created?  Clearly if they're in a untrusted language and you have to
be a superuser to install them in the first place then they should just
mark the function as leakproof when they install it.  If these are
trusted language functions, I'd be curious to actually see them as I
have doubts about if they're actually leakproof..

Or is the actual use-case here that they just want to mark functions we
know aren't leakproof as leakproof anyway because they aren't getting
the performance they want?

If it's the latter, as I suspect it is, then I don't really think the
use-case justifies any change on our part- the right answer is to make
those functions actually leakproof, or write ones which are.

Thanks,

Stephen




Re: "could not find pathkey item to sort" for TPC-DS queries 94-96

2021-04-19 Thread Tom Lane
James Coleman  writes:
> I forgot to comment on this in my previous email, but it seems to me
> that relation_has_safe_ec_member, while less wordy, isn't quite
> descriptive enough. Perhaps something like
> relation_has_sort_safe_ec_member?

I'm not wedded to that name, certainly, but it seems like neither
of these is quite getting at the issue.  An EC can be sorted on,
by definition, but there are some things we don't want to sort
on till the final output step.  I was trying to think of something
using the terminology "early sort", but didn't much like
"relation_has_early_sortable_ec_member" or obvious variants of that.

regards, tom lane




Re: Reduce the number of special cases to build contrib modules on windows

2021-04-19 Thread Andrew Dunstan


On 4/19/21 12:24 PM, Alvaro Herrera wrote:
>> diff --git a/src/tools/msvc/MSBuildProject.pm 
>> b/src/tools/msvc/MSBuildProject.pm
>> index ebb169e201..68606a296d 100644
>> --- a/src/tools/msvc/MSBuildProject.pm
>> +++ b/src/tools/msvc/MSBuildProject.pm
>> @@ -310,11 +310,12 @@ sub WriteItemDefinitionGroup
>>  my $targetmachine =
>>$self->{platform} eq 'Win32' ? 'MachineX86' : 'MachineX64';
>>  
>> -my $includes = $self->{includes};
>> -unless ($includes eq '' or $includes =~ /;$/)
>> +my $includes = "";
>> +foreach my $inc (@{ $self->{includes} })
>>  {
>> -$includes .= ';';
>> +$includes .= $inc . ";";
>>  }
> Perl note: you can do this more easily as 
>
>   my $includes = join ';', @{$self->{includes}};
>   $includes .= ';' unless $includes eq '';
>

or even more simply:


my $includes = join ';', @{$self->{includes}}, "";

cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Allowing to create LEAKPROOF functions to non-superuser

2021-04-19 Thread Robert Haas
On Mon, Apr 19, 2021 at 4:32 PM Tom Lane  wrote:
> Robert Haas  writes:
> > On Fri, Apr 16, 2021 at 3:57 AM Noah Misch  wrote:
> >> Hence, I do find it reasonable to let pg_read_all_data be sufficient for
> >> setting LEAKPROOF.  I would not consult datdba, because datdba currently 
> >> has
> >> no special read abilities.  It feels too weird to let BYPASSRLS start
> >> affecting non-RLS access controls.  A reasonable person may assume that
> >> BYPASSRLS has no consequences until someone uses CREATE POLICY.  That 
> >> said, I
> >> wouldn't be horrified if BYPASSRLS played a part.  BYPASSRLS, like
> >> pg_read_all_data, clearly isn't something to grant lightly.
>
> > I agree that datdba doesn't seem like quite the right thing, but I'm
> > not sure I agree with the rest. How can we say that leakproof is a
> > non-RLS access control? Its only purpose is to keep RLS secure, so I
> > guess I'd be inclined to think that of the two, BYPASSRLS is more
> > closely related to the topic at hand.
>
> Umm ... I'm pretty sure LEAKPROOF also affects optimization around
> "security barrier" views, which I wouldn't call RLS.  Out of these
> options, I'd prefer granting the ability to pg_read_all_data.

Oops, I forgot about security_barrier views, which is rather
embarrassing since I committed them. So, yeah, I agree:
pg_read_all_data is better.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: "could not find pathkey item to sort" for TPC-DS queries 94-96

2021-04-19 Thread James Coleman
On Sat, Apr 17, 2021 at 3:39 PM Tom Lane  wrote:
> ...
> Also, I don't much care for either the name or API of
> find_em_expr_usable_for_sorting_rel.  The sole current caller only
> really needs a boolean result, and if it did need more than that
> it'd likely need the whole EquivalenceMember not just the em_expr
> (certainly createplan.c does).  So 0002 attached is some bikeshedding
> on that.  I kept that separate because it might be wise to do it only
> in HEAD, just in case somebody out there is calling the function from
> an extension.

I forgot to comment on this in my previous email, but it seems to me
that relation_has_safe_ec_member, while less wordy, isn't quite
descriptive enough. Perhaps something like
relation_has_sort_safe_ec_member?

James Coleman




Re: Implementing Incremental View Maintenance

2021-04-19 Thread Andrew Dunstan


On 4/7/21 5:25 AM, Yugo NAGATA wrote:
> Hi, 
>
> I rebased the patch because the cfbot failed.
>
> Regards,
> Yugo Nagata



This patch (v22c) just crashed for me with an assertion failure on
Fedora 31. Here's the stack trace:


[New LWP 333090]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `postgres: andrew regression [local]
INSERT    '.
Program terminated with signal SIGABRT, Aborted.
#0  0x7f8981caa625 in raise () from /lib64/libc.so.6
#0  0x7f8981caa625 in raise () from /lib64/libc.so.6
#1  0x7f8981c938d9 in abort () from /lib64/libc.so.6
#2  0x0094a54a in ExceptionalCondition
(conditionName=conditionName@entry=0xa91dae "queryDesc->sourceText !=
NULL", errorType=errorType@entry=0x99b468 "FailedAssertion",
fileName=fileName@entry=0xa91468
"/home/andrew/pgl/pg_head/src/backend/executor/execMain.c",
lineNumber=lineNumber@entry=199) at
/home/andrew/pgl/pg_head/src/backend/utils/error/assert.c:69
#3  0x006c0e17 in standard_ExecutorStart (queryDesc=0x226af98,
eflags=0) at /home/andrew/pgl/pg_head/src/backend/executor/execMain.c:199
#4  0x006737b2 in refresh_matview_datafill (dest=0x21cf428,
query=, queryEnv=0x2245fd0,
resultTupleDesc=0x7ffd5e764888, queryString=0x0) at
/home/andrew/pgl/pg_head/src/backend/commands/matview.c:719
#5  0x00678042 in calc_delta (queryEnv=0x2245fd0,
tupdesc_new=0x7ffd5e764888, tupdesc_old=0x7ffd5e764880,
dest_new=0x21cf428, dest_old=0x0, query=0x2246108, rte_path=0x2228a60,
table=) at
/home/andrew/pgl/pg_head/src/backend/commands/matview.c:2907
#6  IVM_immediate_maintenance (fcinfo=) at
/home/andrew/pgl/pg_head/src/backend/commands/matview.c:1683
#7  0x0069e483 in ExecCallTriggerFunc (trigdata=0x7ffd5e764bb0,
tgindx=2, finfo=0x22345f8, instr=0x0, per_tuple_context=0x2245eb0) at
/home/andrew/pgl/pg_head/src/backend/commands/trigger.c:2142
#8  0x0069fc4c in AfterTriggerExecute (trigdesc=0x2233db8,
trigdesc=0x2233db8, trig_tuple_slot2=0x0, trig_tuple_slot1=0x0,
per_tuple_context=0x2245eb0, instr=0x0, finfo=0x2234598,
relInfo=0x2233ba0, event=0x222d380, estate=0x2233710) at
/home/andrew/pgl/pg_head/src/backend/commands/trigger.c:4041
#9  afterTriggerInvokeEvents (events=0x21cece8, firing_id=1,
estate=0x2233710, delete_ok=false) at
/home/andrew/pgl/pg_head/src/backend/commands/trigger.c:4255
#10 0x006a4173 in AfterTriggerEndQuery
(estate=estate@entry=0x2233710) at
/home/andrew/pgl/pg_head/src/backend/commands/trigger.c:4632
#11 0x006c04c8 in standard_ExecutorFinish (queryDesc=0x2237300)
at /home/andrew/pgl/pg_head/src/backend/executor/execMain.c:436
#12 0x008415d8 in ProcessQuery (plan=,
sourceText=0x21490a0 "INSERT INTO mv_base_b VALUES(5,105);", params=0x0,
queryEnv=0x0, dest=0x2221010, qc=0x7ffd5e764f00) at
/home/andrew/pgl/pg_head/src/backend/tcop/pquery.c:190
#13 0x008417f2 in PortalRunMulti (portal=portal@entry=0x21ac3c0,
isTopLevel=isTopLevel@entry=true,
setHoldSnapshot=setHoldSnapshot@entry=false, dest=dest@entry=0x2221010,
altdest=altdest@entry=0x2221010, qc=qc@entry=0x7ffd5e764f00) at
/home/andrew/pgl/pg_head/src/backend/tcop/pquery.c:1267
#14 0x00842415 in PortalRun (portal=portal@entry=0x21ac3c0,
count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=true,
run_once=run_once@entry=true, dest=dest@entry=0x2221010,
altdest=altdest@entry=0x2221010, qc=0x7ffd5e764f00) at
/home/andrew/pgl/pg_head/src/backend/tcop/pquery.c:779
#15 0x0083e3ca in exec_simple_query (query_string=0x21490a0
"INSERT INTO mv_base_b VALUES(5,105);") at
/home/andrew/pgl/pg_head/src/backend/tcop/postgres.c:1196
#16 0x00840075 in PostgresMain (argc=argc@entry=1,
argv=argv@entry=0x7ffd5e765450, dbname=,
username=) at
/home/andrew/pgl/pg_head/src/backend/tcop/postgres.c:4458
#17 0x007b8054 in BackendRun (port=,
port=) at
/home/andrew/pgl/pg_head/src/backend/postmaster/postmaster.c:4488
#18 BackendStartup (port=) at
/home/andrew/pgl/pg_head/src/backend/postmaster/postmaster.c:4210
#19 ServerLoop () at
/home/andrew/pgl/pg_head/src/backend/postmaster/postmaster.c:1742
#20 0x007b8ebf in PostmasterMain (argc=argc@entry=8,
argv=argv@entry=0x21435c0) at
/home/andrew/pgl/pg_head/src/backend/postmaster/postmaster.c:1414
#21 0x0050e030 in main (argc=8, argv=0x21435c0) at
/home/andrew/pgl/pg_head/src/backend/main/main.c:209
$1 = {si_signo = 6, si_errno = 0, si_code = -6, _sifields = {_pad =
{333090, 500, 0 }, _kill = {si_pid = 333090, si_uid =
500}, _timer = {si_tid = 333090, si_overrun = 500, si_sigval =
{sival_int = 0, sival_ptr = 0x0}}, _rt = {si_pid = 333090, si_uid = 500,
si_sigval = {sival_int = 0, sival_ptr = 0x0}}, _sigchld = {si_pid =
333090, si_uid = 500, si_status = 0, si_utime = 0, si_stime = 0},
_sigfault = {si_addr = 0x1f400051522, _addr_lsb = 0, _addr_bnd = {_lower
= 0x0, _upper = 0x0}}, _sigpoll = {si_band = 2147483981090, si_fd = 0}}}


cheers

Re: "could not find pathkey item to sort" for TPC-DS queries 94-96

2021-04-19 Thread James Coleman
On Sun, Apr 18, 2021 at 1:21 PM Tom Lane  wrote:
>
> I wrote:
> > I think it's time for some refactoring of this code so that we can
> > actually share the logic.  Accordingly, I propose the attached.
>
> After sleeping on it, here's an improved version that gets rid of
> an unnecessary assumption about ECs usually not containing both
> parallel-safe and parallel-unsafe members.  I'd tried to do this
> yesterday but didn't like the amount of side-effects on createplan.c
> (caused by the direct call sites not being passed the "root" pointer).
> However, we can avoid refactoring createplan.c APIs by saying that it's
> okay to pass root = NULL to find_computable_ec_member if you're not
> asking it to check parallel safety.  And there's not really a need to
> put a parallel-safety check into find_ec_member_matching_expr at all;
> that task can be left with the one caller that cares.

I like the refactoring here.

Two things I wonder:
1. Should we add tests for the relabel code path?
2. It'd be nice not to have the IS_SRF_CALL duplicated, but that might
add enough complexity that it's not worth it.

Thanks,
James Coleman




Re: Allowing to create LEAKPROOF functions to non-superuser

2021-04-19 Thread Tom Lane
Robert Haas  writes:
> On Fri, Apr 16, 2021 at 3:57 AM Noah Misch  wrote:
>> Hence, I do find it reasonable to let pg_read_all_data be sufficient for
>> setting LEAKPROOF.  I would not consult datdba, because datdba currently has
>> no special read abilities.  It feels too weird to let BYPASSRLS start
>> affecting non-RLS access controls.  A reasonable person may assume that
>> BYPASSRLS has no consequences until someone uses CREATE POLICY.  That said, I
>> wouldn't be horrified if BYPASSRLS played a part.  BYPASSRLS, like
>> pg_read_all_data, clearly isn't something to grant lightly.

> I agree that datdba doesn't seem like quite the right thing, but I'm
> not sure I agree with the rest. How can we say that leakproof is a
> non-RLS access control? Its only purpose is to keep RLS secure, so I
> guess I'd be inclined to think that of the two, BYPASSRLS is more
> closely related to the topic at hand.

Umm ... I'm pretty sure LEAKPROOF also affects optimization around
"security barrier" views, which I wouldn't call RLS.  Out of these
options, I'd prefer granting the ability to pg_read_all_data.

regards, tom lane




Re: Allowing to create LEAKPROOF functions to non-superuser

2021-04-19 Thread Robert Haas
On Fri, Apr 16, 2021 at 3:57 AM Noah Misch  wrote:
> On Mon, Apr 12, 2021 at 02:35:27PM -0700, Andres Freund wrote:
> > On 2021-04-12 17:14:20 -0400, Tom Lane wrote:
> > > I doubt that falsely labeling a function LEAKPROOF can get you more
> > > than the ability to read data you're not supposed to be able to read
> > > ... but that ability is then available to all users, or at least all
> > > users who can execute the function in question.  So it definitely is a
> > > fairly serious security hazard, and one that's not well modeled by
> > > role labels.  If you give somebody e.g. pg_read_all_data privileges,
> > > you don't expect that that means they can give it to other users.
>
> I do expect that, essentially.  Like Andres describes for BYPASSRLS, they can
> create and GRANT a SECURITY DEFINER function that performs an arbitrary query
> and returns a refcursor (or stores the data to a table of the caller's
> choosing, etc.).  Unlike BYPASSRLS, they can even make pg_read_all_data own
> the function, making the situation persist after one drops the actor's role
> and that role's objects.

Yes. I think that if someone can read all the data, it's unworkable to
suppose that they can't find a way to delegate that ability to others.
If nothing else, a station wagon full of tapes has a lot of bandwidth.

> > A user with BYPASSRLS can create public security definer functions
> > returning data. If the concern is a BYPASSRLS user intentionally
> > exposing data, then there's not a meaningful increase to allow defining
> > LEAKPROOF functions.
>
> Hence, I do find it reasonable to let pg_read_all_data be sufficient for
> setting LEAKPROOF.  I would not consult datdba, because datdba currently has
> no special read abilities.  It feels too weird to let BYPASSRLS start
> affecting non-RLS access controls.  A reasonable person may assume that
> BYPASSRLS has no consequences until someone uses CREATE POLICY.  That said, I
> wouldn't be horrified if BYPASSRLS played a part.  BYPASSRLS, like
> pg_read_all_data, clearly isn't something to grant lightly.

I agree that datdba doesn't seem like quite the right thing, but I'm
not sure I agree with the rest. How can we say that leakproof is a
non-RLS access control? Its only purpose is to keep RLS secure, so I
guess I'd be inclined to think that of the two, BYPASSRLS is more
closely related to the topic at hand.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Bogus collation version recording in recordMultipleDependencies

2021-04-19 Thread Tom Lane
Thomas Munro  writes:
> ... The question
> on my mind is whether reverting the feature and trying again for 15
> could produce anything fundamentally better at a design level, or
> would just fix problems in the analyser code that we could fix right
> now.

Well, as I said, I think what we ought to do is treat any record-accepting
functions/operators as "don't know, better assume it's collation
dependent".  What's not clear to me is how that concept could be
shoehorned into the existing design.

> For example, if you think there actually is a potential better
> plan than using pg_depend for this, that'd definitely be good to know
> about.

I really dislike using pg_depend, for a couple of reasons:

* You've broken the invariant that dependencies on pinned objects
are never recorded.  Now, some of them exist, for reasons having
nothing to do with the primary goals of pg_depend.  If that's not
a sign of bad relational design, I don't know what is.  I didn't
look at the code, but I wonder if you didn't have to lobotomize
some error checks in dependency.c because of that.  (Perhaps
some sort of special-case representation for the default
collation would help here?)

* pg_depend used to always be all-not-null.  Now, most rows in it
will need a nulls bitmap, adding 8 bytes per row (on maxalign=8
hardware) to what had been fairly narrow rows.  By my arithmetic
that's 13.3% bloat in what is already one of our largest
catalogs.  That's quite unpleasant.  (It would actually be
cheaper to store an empty-string refobjversion for non-collation
entries; a single-byte string would fit into the pad space
after deptype, adding nothing to the row width.)

> Hrmph.  Yeah.  We didn't consider types that change later like this,
> and handling those correctly does seem to warrant some more thought
> and work than we perhaps have time for.  My first thought is that we'd
> need to teach it to trigger reanalysis.

That seems like a nonstarter, even before you think about race
conditions.

regards, tom lane




Re: pg_amcheck contrib application

2021-04-19 Thread Robert Haas
On Thu, Apr 15, 2021 at 1:07 PM Mark Dilger
 wrote:
> I have added the verb "has" rather than "contains" because "has" is more 
> consistent with the phrasing of other similar corruption reports.

That makes sense.

I think it's odd that a range of extraneous chunks is collapsed into a
single report if the size of each chunk happens to be
TOAST_MAX_CHUNK_SIZE and not otherwise. Why not just remember the
first and last extraneous chunk and the size of each? If the next
chunk you see is the next one in sequence and the same size as all the
others, extend your notion of the sequence end by 1. Otherwise, report
the range accumulated so far. It seems to me that this wouldn't be any
more code than you have now, and might actually be less.
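
For illustration, a minimal sketch of that accumulation scheme in plain C (the
struct, function names and report wording here are hypothetical, not
pg_amcheck's actual code; a caller would flush the run once more after the
last chunk has been seen):

#include <stdio.h>

/* run of consecutive extraneous chunks that all have the same size */
typedef struct ExtraneousRun
{
    int     first_chunk;    /* -1 means no run accumulated yet */
    int     last_chunk;
    int     chunk_size;
} ExtraneousRun;

static void
flush_extraneous_run(ExtraneousRun *run)
{
    if (run->first_chunk < 0)
        return;
    if (run->first_chunk == run->last_chunk)
        printf("extraneous chunk %d has size %d\n",
               run->first_chunk, run->chunk_size);
    else
        printf("extraneous chunks %d through %d each have size %d\n",
               run->first_chunk, run->last_chunk, run->chunk_size);
    run->first_chunk = -1;
}

static void
note_extraneous_chunk(ExtraneousRun *run, int chunkno, int size)
{
    if (run->first_chunk >= 0 &&
        chunkno == run->last_chunk + 1 &&
        size == run->chunk_size)
    {
        run->last_chunk = chunkno;      /* extend the current run */
        return;
    }
    flush_extraneous_run(run);          /* report what we had, then start anew */
    run->first_chunk = run->last_chunk = chunkno;
    run->chunk_size = size;
}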

I think that report_missing_chunks() could probably just report the
range of missing chunks and not bother reporting how big they were
expected to be. But, if it is going to report how big they were
expected to be, I think it should have only 2 cases rather than 4:
either a range of missing chunks of equal size, or a single missing
chunk of some size. If, as I propose, it doesn't report the expected
size, then you still have just 2 cases: a range of missing chunks, or
a single missing chunk.

Somehow I have a hard time feeling confident that check_toast_tuple()
is going to do the right thing. The logic is really complex and hard
to understand. 'chunkno' is a counter that advances every time we move
to the next chunk, and 'curchunk' is the value we actually find in the
TOAST tuple. This terminology is not easy to understand. Most messages
now report 'curchunk', but some still report 'chunkno'. Why does
'chunkno' need to exist at all? AFAICS the combination of 'curchunk'
and 'tctx->last_chunk_seen' ought to be sufficient. I can see no
particular reason why what you're calling 'chunkno' needs to exist
even as a local variable, let alone be printed out. Either we haven't
yet validated that the chunk_id extracted from the tuple is non-null
and greater than the last chunk number we saw, in which case we can
just complain about it if we find it to be otherwise, or we have
already done that validation, in which case we should complain about
that value and not 'chunkno' in any subsequent messages.

The conditionals between where you set max_valid_prior_chunk and where
you set last_chunk_seen seem hard to understand, particularly the
bifurcated way that missing chunks are reported. Initial missing
chunks are detected by (chunkno == 0 && max_valid_prior_chunk >= 0)
and later missing chunks are detected by (chunkno > 0 &&
max_valid_prior_chunk > tctx->last_chunk_seen). I'm not sure if this
is correct; I find it hard to get my head around what
max_valid_prior_chunk is supposed to represent. But in any case I
think it can be written more simply. Just keep track of what chunk_id
we expect to extract from the next TOAST tuple. Initially it's 0.
Then:

if (chunkno < tctx->expected_chunkno)
{
    // toast value %u index scan returned chunk number %d when chunk %d
    // was expected
    // don't modify tctx->expected_chunkno here, just hope the next
    // thing matches our previous expectation
}
else
{
    if (chunkno > tctx->expected_chunkno)
        // chunks are missing from tctx->expected_chunkno through
        // Min(chunkno - 1, tctx->final_expected_chunk), provided that
        // the latter value is greater than or equal to the former
    tctx->expected_chunkno = chunkno + 1;
}

If you do this, you only need to report extraneous chunks when chunkno
> tctx->final_expected_chunk, since chunkno < 0 is guaranteed to
trigger the first of the two complaints shown above.

In check_tuple_attribute I suggest "bool valid = false" rather than
"bool invalid = true". I think it's easier to understand.

I object to check_toasted_attribute() using 'chunkno' in a message for
the same reasons as above in regards to check_toast_tuple() i.e. I
think it's a concept which should not exist.

I think this patch could possibly be split up into multiple patches.
There's some question in my mind whether it's getting too late to
commit any of this, since some of it looks suspiciously like new
features after feature freeze. However, I kind of hate to ship this
release without at least doing something about the chunkno vs.
curchunk stuff, which is even worse in the committed code than in your
patch, and which I think will confuse the heck out of users if those
messages actually fire for anyone.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Bogus collation version recording in recordMultipleDependencies

2021-04-19 Thread Thomas Munro
On Tue, Apr 20, 2021 at 5:53 AM Tom Lane  wrote:
> I think that the real fundamental bug is supposing that static analysis
> can give 100% correct answers.  ...

Well, the goal was to perform analysis to the extent possible
statically since that would cover the vast majority of cases and is
practically all you can do.  Clearly there is always going to be a
category of invisible dependencies inside procedural code in general
(halting problem).  We did think about the idea of using new
declarations about functions/operators to know which ones actually
care about collation, rather than assuming that they all do (bugs
aside), as an optimisation, and then that mechanism could in theory
also be used to say that functions that don't appear to depend on
collations actually do internal, but that all seemed like vast
overkill, so we left it for possible later improvements.  The question
on my mind is whether reverting the feature and trying again for 15
could produce anything fundamentally better at a design level, or
would just fix problems in the analyser code that we could fix right
now.  For example, if you think there actually is a potential better
plan than using pg_depend for this, that'd definitely be good to know
about.

> ... Even if it did do so in a given state
> of the database, consider this counterexample:
>
> create type myrow as (f1 int, f2 int);
> create table mytable (id bigint, r1 myrow, r2 myrow);
> create index myindex on mytable(id) where r1 < r2;
> alter type myrow add attribute f3 text;
>
> myindex is recorded as having no collation dependency, but that is
> now wrong.

Hrmph.  Yeah.  We didn't consider types that change later like this,
and handling those correctly does seem to warrant some more thought
and work than we perhaps have time for.  My first thought is that we'd
need to teach it to trigger reanalysis.




Re: Bogus collation version recording in recordMultipleDependencies

2021-04-19 Thread Peter Geoghegan
On Mon, Apr 19, 2021 at 11:49 AM Tom Lane  wrote:
> I didn't mean to imply that it's necessarily theoretically impossible,
> but given our lack of visibility into what a function or operator
> will do, plus the way that the collation feature was bolted on
> with minimal system-level redesign, it's sure pretty darn hard.
> Code like record_eq is doing a lot at runtime that we can't really
> see from static analysis.

It's worth pointing out that code like record_eq is not (or at least
should not be) fundamentally unpredictable and unruly. The fact that
record_eq does typecache lookups and whatnot seems to me to be an
implementation detail. What record_eq is entitled to assume about
collations could be formalized by some general high-level
specification. It ought to be possible to do this, just as it ought to
be possible for us to statically determine if a composite type is safe
to use with B-Tree deduplication.

Whether or not it's worth the trouble is another matter, but it might
be if a single effort solved a bunch of related problems, not just the
collation dependency problem.

> Anyway, given the ALTER TYPE ADD ATTRIBUTE counterexample, I'm
> definitely starting to lean towards "revert and try again in v15".

The counterexample concerns me because it seems to indicate a lack of
sophistication in how dependencies are managed with corner cases -- I
don't think that it's okay to leave the behavior unspecified in a
stable release. But I also think that we should consider if code like
record_eq is in fact the real problem (or just the lack of any general
specification that constrains code like it in useful ways, perhaps).
This probably won't affect whether or not the patch gets reverted now,
but it still matters.

-- 
Peter Geoghegan




Re: proposal - log_full_scan

2021-04-19 Thread Pavel Stehule
On Sun, Apr 18, 2021 at 4:09 PM Pavel Stehule 
wrote:

>
>
> On Sun, Apr 18, 2021 at 2:28 PM Julien Rouhaud 
> wrote:
>
>> On Sun, Apr 18, 2021 at 06:21:56AM +0200, Pavel Stehule wrote:
>> >
>> > The extension like pg_qualstat is good, but it does different work.
>>
>> Yes definitely.  It was just an idea if you needed something right now
>> that
>> could more or less do what you needed, not saying that we shouldn't
>> improve the
>> core :)
>>
>> > In
>> > complex applications I need to detect buggy (forgotten) queries - last
>> week
>> > I found two queries over bigger tables without predicates. So the
>> qualstat
>> > doesn't help me.
>>
>> Also not totally helpful but powa was created to detect problematic
>> queries in
>> such cases.  It wouldn't say if it's because of a seq scan or not (so yes
>> again
>> we need to improve that), but it would give you the slowest (or top
>> consumer
>> for any resource) for a given time interval.
>>
>> > This is an application for a government, with a few specifics (typical for
>> > government projects): 1) the life cycle is short (one month), 2) there is no
>> > slow start - from the first moment the application will be used by hundreds
>> > of thousands of people, 3) the application is very public - so any issues
>> > are very interesting for the press and very unpleasant for politicians, and
>> > in the next step for all suppliers (there are high penalties for failures),
>> > and admins are not happy with external extensions, 4) the budget is not too
>> > big - there is no performance testing environment
>> >
>> > The first stages are covered well today. We can log and process very slow
>> > queries and fix them immediately - with CREATE INDEX CONCURRENTLY I can do
>> > it on production servers too without high risk.
>> >
>> > But the detection of bad, but not too slow, queries is hard. And as an
>> > external consultant I am not able to install any external extensions on the
>> > production environment to fix some hot issues. The risk is not acceptable
>> > for the project managers, and I understand that. So I have to use only the
>> > tools available in Postgres.
>>
>> Yes I agree that having additional and more specialized tool in core
>> postgres
>> would definitely help in similar scenario.
>>
>> I think that having some kind of threshold for seq scan (like the
>> mentioned
>> auto_explain.log_seqscan = XXX) in auto_explain would be the best
>> approach, as
>> you really need the plan to know why a seq scan was chosen and if it was a
>> reasonable choice or not.
>>
>
> I would like to write this for core and for auto_explain too. I have been in a
> situation where I could not use auto_explain either. That said, this extension
> is widely used, so the risk is low.
>
> Once I detect the query, I can run the EXPLAIN manually. But sure, I think it
> can work well inside auto_explain.
>

I tried to write the patch. It is harder work for core than I expected,
because there is no good way to tell whether the query is the top-level one,
so it is not easy to decide whether we should log the info or not.

On the other hand, the modification of auto_explain is easy.
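
For illustration only (this is not the attached patch): a rough sketch of the
kind of planstate walk involved, assuming auto_explain.c's context, an
auto_explain_log_seqscan GUC like the one the attached patch adds, and
instrumentation enabled so that planstate->instrument is populated:

/*
 * Return true if any plan node is a sequential scan that produced at
 * least auto_explain_log_seqscan tuples.  Needs nodes/nodeFuncs.h.
 */
static bool
seqscan_above_threshold_walker(PlanState *planstate, void *context)
{
    if (IsA(planstate->plan, SeqScan) &&
        planstate->instrument != NULL &&
        planstate->instrument->ntuples >= auto_explain_log_seqscan)
        return true;            /* found one, stop walking */

    return planstate_tree_walker(planstate, seqscan_above_threshold_walker,
                                 context);
}

    /* e.g. in the ExecutorEnd hook, before deciding whether to log */
    if (auto_explain_log_seqscan >= 0 &&
        seqscan_above_threshold_walker(queryDesc->planstate, NULL))
    {
        /* log the plan, same as the log_min_duration path does */
    }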

I am sending the patch

Regards

Pavel


> Regards
>
> Pavel
>
>
diff --git a/contrib/auto_explain/auto_explain.c b/contrib/auto_explain/auto_explain.c
index 445bb37191..5754e25594 100644
--- a/contrib/auto_explain/auto_explain.c
+++ b/contrib/auto_explain/auto_explain.c
@@ -17,8 +17,11 @@
 #include "access/parallel.h"
 #include "commands/explain.h"
 #include "executor/instrument.h"
+#include "nodes/nodeFuncs.h"
+#include "parser/parsetree.h"
 #include "jit/jit.h"
 #include "utils/guc.h"
+#include "utils/lsyscache.h"
 
 PG_MODULE_MAGIC;
 
@@ -35,6 +38,7 @@ static int	auto_explain_log_format = EXPLAIN_FORMAT_TEXT;
 static int	auto_explain_log_level = LOG;
 static bool auto_explain_log_nested_statements = false;
 static double auto_explain_sample_rate = 1;
+static int auto_explain_log_seqscan = -1;	/* tuples */
 
 static const struct config_enum_entry format_options[] = {
 	{"text", EXPLAIN_FORMAT_TEXT, false},
@@ -65,7 +69,7 @@ static int	nesting_level = 0;
 static bool current_query_sampled = false;
 
 #define auto_explain_enabled() \
-	(auto_explain_log_min_duration >= 0 && \
+	((auto_explain_log_min_duration >= 0 || auto_explain_log_seqscan != -1) && \
 	 (nesting_level == 0 || auto_explain_log_nested_statements) && \
 	 current_query_sampled)
 
@@ -105,6 +109,18 @@ _PG_init(void)
 			NULL,
 			NULL);
 
+	DefineCustomIntVariable("auto_explain.log_seqscan",
+			"Sets the minimum number of tuples produced by sequential scans above which plans will be logged",
+			"Zero prints all plans that contain a seqscan node. -1 turns this feature off.",
+			&auto_explain_log_seqscan,
+			-1,
+			-1, INT_MAX,
+			PGC_SUSET,
+			0,
+			NULL,
+			NULL,
+			NULL);
+
 	DefineCustomBoolVariable("auto_explain.log_analyze",
 			 "Use EXPLAIN ANALYZE for plan logging.",
 			 NULL,
@@ -274,7 +290,9 @@ explai

Synchronous commit behavior during network outage

2021-04-19 Thread Ondřej Žižka

Hello all,
I would like to know your opinion on the following behaviour I see for a
PostgreSQL setup with synchronous replication.


This behaviour happens in a special use case. In this use case, there 
are 2 synchronous replicas with the following config (truncated):


- 2 nodes
- synchronous_standby_names='*'
- synchronous_commit=remote_apply


With this setup, run the following steps (LAN down = the LAN between master
and replica is down):

-
postgres=# truncate table a;
TRUNCATE TABLE
postgres=# insert into a values (1); -- LAN up, insert has been applied 
to replica.

INSERT 0 1
Now I turn off the LAN on the standby server:
postgres=# insert into a values (2); --LAN down, waiting for a 
confirmation from sync replica. In this situation cancel it (press CTRL+C)

^CCancel request sent
WARNING:  canceling wait for synchronous replication due to user request
DETAIL:  The transaction has already committed locally, but might not 
have been replicated to the standby.

INSERT 0 1
There will be a warning that the commit was performed only locally:
2021-04-12 19:55:53.063 CEST [26104] WARNING:  canceling wait for 
synchronous replication due to user request
2021-04-12 19:55:53.063 CEST [26104] DETAIL:  The transaction has 
already committed locally, but might not have been replicated to the 
standby.


postgres=# insert into a values (2); --LAN down, waiting for a 
confirmation from sync replica. In this situation cancel it (press CTRL+C)

^CCancel request sent
WARNING:  canceling wait for synchronous replication due to user request
DETAIL:  The transaction has already committed locally, but might not 
have been replicated to the standby.

INSERT 0 1
postgres=# insert into a values (2); --LAN down, waiting for sync 
replica, second attempt, cancel it as well (CTRL+C)

^CCancel request sent
WARNING:  canceling wait for synchronous replication due to user request
DETAIL:  The transaction has already committed locally, but might not 
have been replicated to the standby.

INSERT 0 1
postgres=# update a set n=3 where n=2; --LAN down, waiting for sync 
replica, cancel it (CTRL+C)

^CCancel request sent
WARNING:  canceling wait for synchronous replication due to user request
DETAIL:  The transaction has already committed locally, but might not 
have been replicated to the standby.

UPDATE 2
postgres=# update a set n=3 where n=2; -- run the same update; because
data from the previous attempt was committed on master, it is successful,
but changes nothing

UPDATE 0
postgres=# select * from a;
 n
---
 1
 3
 3
(3 rows)
postgres=#


Now, only the value 1 is in the table on the sync replica (no other values);
the data is not in sync. This is expected: after the LAN is restored, the data
will get in sync again. But if the primary node fails and we fail over to the
replica before the LAN is back up, or the primary's storage is destroyed
before the data syncs to the replica, we will lose data even though the
client received a successful commit (with a warning).
From synchronous_commit=remote_write and "higher" levels, I would expect
that when the remote step (whether write, flush or apply) has not happened,
I would not receive a confirmation of the commit (even with a warning).
Something like: if there is no commit on the sync replica, there is no commit
on the primary, and if someone performs the steps above, the transaction as
a whole never returns a confirmation.


This can cause issues if the application receives a confirmation of success
and performs some follow-up steps, e.g. creates a user account and sends a
request to the mail system or VPN system to create a matching account. If
the scenario above happens, a VPN account can exist that has no presence in
the central database, which can be a security issue.


I hope I explained it sufficiently. :-)

Do you think it would be possible to implement a mechanism that would
handle this use case?


Thank you
Ondrej



Re: pg_amcheck option to install extension

2021-04-19 Thread Andrew Dunstan


On 4/19/21 1:25 PM, Mark Dilger wrote:
>
>> On Apr 19, 2021, at 9:53 AM, Tom Lane  wrote:
>>
>> Andrew Dunstan  writes:
>>> OK, so let's fix it. If amcheck is going to stay in contrib then ISTM
>>> pg_amcheck should move there. I can organize that if there's agreement.
>>> Or else let's move amcheck as Alvaro suggests.
>> FWIW, I think that putting them both in contrib makes the most
>> sense from a structural standpoint.
> That was my original thought also, largely from a package management 
> perspective.  Just as an example, postgresql-client and postgresql-contrib 
> are separate rpms.  There isn't much point to having pg_amcheck installed as 
> part of the postgresql-client package while having amcheck in the 
> postgresql-contrib package which might not be installed.
>
> A counter argument is that amcheck is server side, and pg_amcheck is client 
> side.  Having pg_amcheck installed on a system makes sense if you are 
> connecting to a server on a different system.


There are at least two other client side programs in contrib. So this
argument doesn't quite hold water from a consistency POV.



>
>> On Mar 11, 2021, at 12:12 AM, Peter Eisentraut 
>>  wrote:
>>
>> I want to register, if we are going to add this, it ought to be in src/bin/. 
>>  If we think it's a useful tool, it should be there with all the other 
>> useful tools.
>>
>> I realize there is a dependency on a module in contrib, and it's probably 
>> now not the time to re-debate reorganizing contrib.  But if we ever get to 
>> that, this program should be the prime example why the current organization 
>> is problematic, and we should be prepared to make the necessary moves then.
> This was the request that motivated the move to src/bin.
>


I missed that, so I guess maybe I can't complain too loudly. But if I'd
seen it I would have disagreed. :-)


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Bogus collation version recording in recordMultipleDependencies

2021-04-19 Thread Tom Lane
Peter Geoghegan  writes:
> On Mon, Apr 19, 2021 at 10:53 AM Tom Lane  wrote:
>> I think that the real fundamental bug is supposing that static analysis
>> can give 100% correct answers.

> Is it really the case that static analysis of the kind that you'd need
> to make this 100% robust is fundamentally impossible? I find that
> proposition hard to believe.

I didn't mean to imply that it's necessarily theoretically impossible,
but given our lack of visibility into what a function or operator
will do, plus the way that the collation feature was bolted on
with minimal system-level redesign, it's sure pretty darn hard.
Code like record_eq is doing a lot at runtime that we can't really
see from static analysis.

Anyway, given the ALTER TYPE ADD ATTRIBUTE counterexample, I'm
definitely starting to lean towards "revert and try again in v15".
I feel we'd be best off to consider functions/operators that
operate on container types to be "maybe"s rather than certainly
safe or certainly not safe.  I think that such things appear
sufficiently rarely in index specifications that it's not worth it
to try to do an exact analysis of them, even if we were sure we
could get that 100% right.  But that doesn't seem to be an idea that
can trivially be added to the current design.

regards, tom lane




Re: multi-install PostgresNode fails with older postgres versions

2021-04-19 Thread Mark Dilger



> On Apr 19, 2021, at 10:50 AM, Jehan-Guillaume de Rorthais  
> wrote:
> 
>> The community needs a single shared PostgresNode implementation that can be
>> used by scripts which reproduce bugs.
> 
> Which means it could be OK to have a PostgresNode implementation, living in
> the core source tree, which supports broader needs than the core ones (older
> versions and some more methods)? Did I understand correctly?

Yes, I believe it should be in core.

I don't know about "some more methods", as it depends which methods you are 
proposing.

> If this is correct, I suppose this effort could be committed early in the
> v15 cycle?

I don't care to speculate on that yet.

> Does it deserve some effort to build some dedicated TAP tests for these
> modules? I already have a small patch for this waiting on my disk for some 
> more
> tests and review...

I did that, too, in the 0002 version of my patch.  Perhaps we need to merge 
your work and mine.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: Windows default locale vs initdb

2021-04-19 Thread Peter Eisentraut

On 19.04.21 07:42, Thomas Munro wrote:

It looks
like one possibility, since Vista, is to call
GetUserDefaultLocaleName()[2], which doesn't appear to have been
discussed before on this list.  That doesn't allow you to ask for the
default for each individual category, but I don't know if that is even
a concept for Windows user settings.


pg_newlocale_from_collation() doesn't support collcollate != collctype 
on Windows anyway, so that wouldn't be an issue.





Re: Bogus collation version recording in recordMultipleDependencies

2021-04-19 Thread Peter Geoghegan
On Sun, Apr 18, 2021 at 4:23 AM Julien Rouhaud  wrote:
> So IIUC the issue here is that the code could previously record useless
> collation version dependencies in somes cases, which could lead to false
> positive possible corruption messages (and of course additional bloat on
> pg_depend).  False positive messages can't be avoided anyway, as a collation
> version update may not corrupt the actually indexed set of data, especially 
> for
> glibc.

This argument seems completely absurd to me.

-- 
Peter Geoghegan




Re: Bogus collation version recording in recordMultipleDependencies

2021-04-19 Thread Peter Geoghegan
On Mon, Apr 19, 2021 at 10:53 AM Tom Lane  wrote:
> I think that the real fundamental bug is supposing that static analysis
> can give 100% correct answers.  Even if it did do so in a given state
> of the database, consider this counterexample:
>
> create type myrow as (f1 int, f2 int);
> create table mytable (id bigint, r1 myrow, r2 myrow);
> create index myindex on mytable(id) where r1 < r2;
> alter type myrow add attribute f3 text;
>
> myindex is recorded as having no collation dependency, but that is
> now wrong.

Is it really the case that static analysis of the kind that you'd need
to make this 100% robust is fundamentally impossible? I find that
proposition hard to believe.

I'm not sure that you were making a totally general statement, rather
than a statement about the patch/implementation, so perhaps I just
missed the point.

-- 
Peter Geoghegan




when the startup process doesn't

2021-04-19 Thread Robert Haas
Hi,

I've noticed that customers not infrequently complain that they start
postgres and then the system doesn't come up for a while and they have
no idea what's going on and are (understandably) worried. There are
probably a number of reasons why this can happen, but the ones that
seem to come up most often in my experience are (a) SyncDataDirectory
takes a long time, (b) ResetUnloggedRelations takes a long time, and
(c) there's a lot of WAL to apply so that takes a long time. It's
possible to distinguish this last case from the other two by looking
at the output of 'ps', but that's not super-convenient if your normal
method of access to the server is via libpq, and it only works if you
are monitoring it as it's happening rather than looking at the logs
after-the-fact. I am not sure there's any real way to distinguish the
other two cases without using strace or gdb or similar.

It seems to me that we could do better. One approach would be to try
to issue a log message periodically - maybe once per minute, or some
configurable interval, e.g. perhaps add messages something like this:

LOG:  still syncing data directory, elapsed time %ld.%03d ms, current path %s
LOG:  data directory sync complete after %ld.%03d ms
LOG:  still resetting unlogged relations, elapsed time %ld.%03d ms,
current path %s
LOG:  unlogged relations reset after %ld.%03d ms
LOG:  still performing crash recovery, elapsed time %ld.%03d ms,
current LSN %08X/%08X

We already have a message when redo is complete, so there's no need
for another one. The implementation here doesn't seem too hard either:
the startup process would set a timer; when the timer expires, the
signal handler sets a flag; at a convenient point we notice the flag
is set and respond by printing a message and clearing the flag.
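
A minimal, self-contained sketch of that timer-plus-flag pattern in plain C
(illustration only; it uses POSIX setitimer/SIGALRM rather than postgres'
timeout machinery, and the message text and 10-second interval are just
placeholders):

#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

static volatile sig_atomic_t progress_report_pending = 0;

static void
progress_timer_handler(int signo)
{
    (void) signo;
    /* do nothing unsafe in the handler; just set a flag for the main loop */
    progress_report_pending = 1;
}

int
main(void)
{
    struct sigaction sa;
    struct itimerval timer;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = progress_timer_handler;
    sigaction(SIGALRM, &sa, NULL);

    memset(&timer, 0, sizeof(timer));
    timer.it_value.tv_sec = 10;     /* first report after 10s ... */
    timer.it_interval.tv_sec = 10;  /* ... and every 10s after that */
    setitimer(ITIMER_REAL, &timer, NULL);

    for (;;)
    {
        sleep(1);                   /* stand-in for syncing files / applying WAL */

        if (progress_report_pending)
        {
            progress_report_pending = 0;
            /* in the startup process this would be an ereport(LOG, ...) */
            fprintf(stderr, "still performing crash recovery, elapsed time ...\n");
        }
    }
}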

Another possible approach would be to accept connections for
monitoring purposes even during crash recovery. We can't allow access
to any database at that point, since the system might not be
consistent, but we could allow something like a replication connection
(the non-database-associated variant). Maybe it would be precisely a
replication connection and we'd just refuse all but a subset of
commands, or maybe it would be some other kind of thing. But either
way you'd be able to issue a command in some mini-language saying "so,
tell me how startup is going" and it would reply with a result set of
some kind.

If I had to pick one of these two ideas, I'd pick the one the
log-based solution, since it seems easier to access and simplifies
retrospective analysis, but I suspect SQL access would be quite useful
for some users too, especially in cloud environments where "just log
into the machine and have a look" is not an option.

Thoughts?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Bogus collation version recording in recordMultipleDependencies

2021-04-19 Thread Tom Lane
Andres Freund  writes:
> On 2021-04-18 11:29:42 -0400, Tom Lane wrote:
>> I'm not sure that an error in this direction is all that much more
>> problematic than the other direction.  If it's okay to claim that
>> indexes need to be rebuilt when they don't really, then we could just
>> drop this entire overcomplicated infrastructure and report that all
>> indexes need to be rebuilt after any collation version change.

> That doesn't ring true to me. There's a huge difference between needing
> to rebuild all indexes, especially primary key indexes which often are
> over int8 etc, and unnecessarily needing to rebuild indexes doing
> comparatively rare things.

It would not be that hard to exclude indexes on int8, or other cases
that clearly have zero collation dependencies.  And I think I might
have some faith in such a solution.  Right now I have zero faith
that the patch as it stands gives trustworthy answers.

I think that the real fundamental bug is supposing that static analysis
can give 100% correct answers.  Even if it did do so in a given state
of the database, consider this counterexample:

create type myrow as (f1 int, f2 int);
create table mytable (id bigint, r1 myrow, r2 myrow);
create index myindex on mytable(id) where r1 < r2;
alter type myrow add attribute f3 text;

myindex is recorded as having no collation dependency, but that is
now wrong.

regards, tom lane




Re: multi-install PostgresNode fails with older postgres versions

2021-04-19 Thread Jehan-Guillaume de Rorthais
On Mon, 19 Apr 2021 10:35:39 -0700
Mark Dilger  wrote:

> > On Apr 19, 2021, at 10:25 AM, Jehan-Guillaume de Rorthais 
> > wrote:
> > 
> > On Mon, 19 Apr 2021 12:37:08 -0400
> > Andrew Dunstan  wrote:
> >   
> >> 
> >> On 4/19/21 10:43 AM, Mark Dilger wrote:  
> >>>   
>  On Apr 19, 2021, at 5:11 AM, Andrew Dunstan  wrote:
>  
>  I think therefore I'm inclined for now to do nothing for old version
>  compatibility.  
> >>> I agree with waiting until the v15 development cycle.
> >>>   
>  I would commit the fix for the IPC::Run caching glitch,
>  and version detection  
> >>> Thank you.
> >>>   
>  I would add a warning if the module is used with
>  a version <= 11.  
> >>> Sounds fine for now.
> >>>   
>  The original goal of these changes was to allow testing of combinations
>  of different builds with openssl and nss, which doesn't involve old
>  version compatibility.  
> >>> Hmm.  I think different folks had different goals.  My personal interest
> >>> is to write automated tests which spin up older servers, create data that
> >>> cannot be created on newer servers (such as heap tuples with HEAP_MOVED_IN
> >>> or HEAP_MOVED_OFF bits set), upgrade, and test that new code handles the
> >>> old data correctly.  I think this is not only useful for our test suites
> >>> as a community, but is also useful for companies providing support
> >>> services who need to reproduce problems that customers are having on
> >>> clusters that have been pg_upgraded across large numbers of postgres
> >>> versions. 
>  As far as I know, without any compatibility changes the module is fully
>  compatible with releases 13 and 12, and with releases 11 and 10 so long
>  as you don't want a standby, and with releases 9.6 and 9.5 if you also
>  don't want a backup. That makes it suitable for a lot of testing without
>  any attempt at version compatibility.
>  
>  We can revisit compatibility further in the next release.  
> >>> Sounds good.  
> >> 
> >> 
> >> I'll work on this. Meanwhile FTR here's my latest revision - it's a lot
> >> less invasive of the main module, so it seems much more palatable to me,
> >> and still passes my test down to 7.2.  
> > 
> > I spent a fair bit of time wondering how useful it could be to either
> > maintain such a module in core, including for external needs, or to create a
> > separate external project with a different release/distribution/packaging
> > policy.
> > 
> > Wherever the module is maintained, the goal would be to address broader
> > needs, eg. adding a switch_wal() method or wait_for_archive(), supporting
> > replication, backups, etc for multi-old-deprecated-PostgreSQL versions.
> >
> > To be honest I have mixed feelings. I feel this burden shouldn't be carried
> > by the core, which has restricted needs compared to external projects. On
> > the other hand, maintaining an external project which shares 90% of the code
> > seems to be useless duplication and backport effort. Moreover Craig Ringer
> > already opened the door for external use of PostgresNode with his effort to
> > install/package it, see:
> > https://www.postgresql.org/message-id/CAGRY4nxxKSFJEgVAv5YAk%3DbqULtFmNw7gEJef0CCgzpNy6O%3D-w%40mail.gmail.com
> > 
> > Thoughts?  
> 
> The community needs a single shared PostgresNode implementation that can be
> used by scripts which reproduce bugs.

Which means it could be OK to have a PostgresNode implementation, living in
the core source tree, which supports broader needs than the core ones (older
versions and some more methods)? Did I understand correctly?

If this is correct, I suppose this effort could be committed early in the v15 cycle?

Does it deserve some effort to build some dedicated TAP tests for these
modules? I already have a small patch for this waiting on my disk for some more
tests and review...

Regards




Re: terminate called after throwing an instance of 'std::bad_alloc'

2021-04-19 Thread Justin Pryzby
On Mon, Apr 19, 2021 at 09:41:30AM -0700, Andres Freund wrote:
> On 2021-04-17 19:13:24 -0500, Justin Pryzby wrote:
> > I'm now realizing that that's RAM use for a single query, not from 
> > continuous
> > leaks across multiple queries.
> 
> What's the memory usage with inlining disabled, and whats the usage
> without jit?

PGOPTIONS="-c jit=off -c jit_inline_above_cost=0 -c jit_above_cost=0 -c 
client_min_messages=debug -c log_executor_stats=on" python3 `pwd`/jitleak.py
!   [5.272604 s user, 0.329912 s system total]
!   120948 kB max resident size

PGOPTIONS="-c jit=on -c jit_inline_above_cost=0 -c jit_above_cost=0 -c 
client_min_messages=debug -c log_executor_stats=on" python3 `pwd`/jitleak.py
DEBUG:  time to inline: 0.003s, opt: 0.000s, emit: 0.003s
... repeated many times ...
!   [585.090087 s user, 31.835228 s system total]
!   4639556 kB max resident size

PGOPTIONS="-c jit=on -c jit_inline_above_cost=-1 -c jit_above_cost=0 -c 
client_min_messages=debug -c log_executor_stats=on" python3 `pwd`/jitleak.py
DEBUG:  time to inline: 0.000s, opt: 0.000s, emit: 0.003s
... repeated many times ... which is confusing, since inlining is disabled ...
!   [318.556489 s user, 8.413007 s system total]
!   159436 kB max resident size

> > Maybe we're missing this condition somehow ?
> > if (llvm_jit_context_in_use_count == 0 &&
> 
> Do you utilize cursors or such?

Not in this backend/process.
I believe postgis/mapserver uses cursors in a separate process, though...

Our report runs like:
SELECT large_object_oid FROM ... WHERE name=$1 AND user=$2 ... FOR UPDATE -- 
our reports are stored as LOs and read by readlo
begin;
SET enable_nestloop=off; -- I'm sorry to say but nothing else avoids 
occasional pathological query plans
SELECT report text...;
rollback;

> > Also, I just hit this assertion by cancelling the query with ^C / sigint.  
> > But
> > I don't have a reprodcer for it.
> >
> > < 2021-04-17 19:14:23.509 ADT telsasoft >PANIC:  LLVMJitContext in use 
> > count not 0 at exit (is 1)
> 
> Hm, I'll try to see how that happens - if not, do you have a backtrace?

This may be because I backpatched to v13, which had one conflict (0a2bc5d61).
Maybe this is relevant, too:  bab150045

If you could provide a patch for v13, it's a good bet there's no issue that I
didn't cause.

I'm able to reproduce the crash like this, with my patch to v13, but not your
patch on master.

python3 -c "import pg; db=pg.DB('dbname=postgres'); db.query('SET 
jit_above_cost=0; SET jit_inline_above_cost=0; SET jit=on; SET 
client_min_messages=debug'); db.query('begin'); db.query_formatted('SELECT 1 
FROM generate_series(1,99)a WHERE a=%s', [1], inline=False);"

-- 
Justin




Re: Bogus collation version recording in recordMultipleDependencies

2021-04-19 Thread Andres Freund
Hi,

On 2021-04-18 11:29:42 -0400, Tom Lane wrote:
> I'm not sure that an error in this direction is all that much more
> problematic than the other direction.  If it's okay to claim that
> indexes need to be rebuilt when they don't really, then we could just
> drop this entire overcomplicated infrastructure and report that all
> indexes need to be rebuilt after any collation version change.

That doesn't ring true to me. There's a huge difference between needing
to rebuild all indexes, especially primary key indexes which often are
over int8 etc, and unnecessarily needing to rebuild indexes doing
comparatively rare things.

Greetings,

Andres Freund




Re: multi-install PostgresNode fails with older postgres versions

2021-04-19 Thread Mark Dilger



> On Apr 19, 2021, at 10:25 AM, Jehan-Guillaume de Rorthais  
> wrote:
> 
> On Mon, 19 Apr 2021 12:37:08 -0400
> Andrew Dunstan  wrote:
> 
>> 
>> On 4/19/21 10:43 AM, Mark Dilger wrote:
>>> 
 On Apr 19, 2021, at 5:11 AM, Andrew Dunstan  wrote:
 
 I think therefore I'm inclined for now to do nothing for old version
 compatibility.
>>> I agree with waiting until the v15 development cycle.
>>> 
 I would commit the fix for the IPC::Run caching glitch,
 and version detection
>>> Thank you.
>>> 
 I would add a warning if the module is used with
 a version <= 11.
>>> Sounds fine for now.
>>> 
 The original goal of these changes was to allow testing of combinations
 of different builds with openssl and nss, which doesn't involve old
 version compatibility.
>>> Hmm.  I think different folks had different goals.  My personal interest is
>>> to write automated tests which spin up older servers, create data that
>>> cannot be created on newer servers (such as heap tuples with HEAP_MOVED_IN
>>> or HEAP_MOVED_OFF bits set), upgrade, and test that new code handles the
>>> old data correctly.  I think this is not only useful for our test suites as
>>> a community, but is also useful for companies providing support services
>>> who need to reproduce problems that customers are having on clusters that
>>> have been pg_upgraded across large numbers of postgres versions.
>>> 
 As far as I know, without any compatibility changes the module is fully
 compatible with releases 13 and 12, and with releases 11 and 10 so long
 as you don't want a standby, and with releases 9.6 and 9.5 if you also
 don't want a backup. That makes it suitable for a lot of testing without
 any attempt at version compatibility.
 
 We can revisit compatibility further in the next release.
>>> Sounds good.
>> 
>> 
>> I'll work on this. Meanwhile FTR here's my latest revision - it's a lot
>> less invasive of the main module, so it seems much more palatable to me,
>> and still passes my test down to 7.2.
> 
> I spent a fair bit of time wondering how useful it could be either to maintain
> such a module in core, including for external needs, or to create a separate
> external project with a different release/distribution/packaging policy.
> 
> Wherever the module is maintained, the goal would be to address broader
> needs, e.g. adding a switch_wal() method or wait_for_archive(), supporting
> replication, backups, etc. for multiple old, deprecated PostgreSQL versions.
> 
> To be honest I have mixed feelings. I feel this burden shouldn't be carried
> by the core, which has restricted needs compared to external projects. On the
> other hand, maintaining an external project which shares 90% of the code seems
> to be a wasteful duplication and backporting effort. Moreover, Craig Ringer has
> already opened the door for external use of PostgresNode with his effort to
> install/package it, see:
> https://www.postgresql.org/message-id/CAGRY4nxxKSFJEgVAv5YAk%3DbqULtFmNw7gEJef0CCgzpNy6O%3D-w%40mail.gmail.com
> 
> Thoughts?

The community needs a single shared PostgresNode implementation that can be 
used by scripts which reproduce bugs.  For bugs that can only be triggered by 
cross version upgrades, the scripts need a PostgresNode implementation which 
can work across versions.  Likewise for bugs that can only be triggered when 
client applications connect to servers of a different version.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: multi-install PostgresNode fails with older postgres versions

2021-04-19 Thread Jehan-Guillaume de Rorthais
On Mon, 19 Apr 2021 12:37:08 -0400
Andrew Dunstan  wrote:

> 
> On 4/19/21 10:43 AM, Mark Dilger wrote:
> >
> >> On Apr 19, 2021, at 5:11 AM, Andrew Dunstan  wrote:
> >>
> >> I think therefore I'm inclined for now to do nothing for old version
> >> compatibility.
> > I agree with waiting until the v15 development cycle.
> >
> >> I would commit the fix for the IPC::Run caching glitch,
> >> and version detection
> > Thank you.
> >
> >> I would add a warning if the module is used with
> >> a version <= 11.
> > Sounds fine for now.
> >
> >> The original goal of these changes was to allow testing of combinations
> >> of different builds with openssl and nss, which doesn't involve old
> >> version compatibility.
> > Hmm.  I think different folks had different goals.  My personal interest is
> > to write automated tests which spin up older servers, create data that
> > cannot be created on newer servers (such as heap tuples with HEAP_MOVED_IN
> > or HEAP_MOVED_OFF bits set), upgrade, and test that new code handles the
> > old data correctly.  I think this is not only useful for our test suites as
> > a community, but is also useful for companies providing support services
> > who need to reproduce problems that customers are having on clusters that
> > have been pg_upgraded across large numbers of postgres versions.
> >
> >> As far as I know, without any compatibility changes the module is fully
> >> compatible with releases 13 and 12, and with releases 11 and 10 so long
> >> as you don't want a standby, and with releases 9.6 and 9.5 if you also
> >> don't want a backup. That makes it suitable for a lot of testing without
> >> any attempt at version compatibility.
> >>
> >> We can revisit compatibility further in the next release.
> > Sounds good.
> 
> 
> I'll work on this. Meanwhile FTR here's my latest revision - it's a lot
> less invasive of the main module, so it seems much more palatable to me,
> and still passes my test down to 7.2.

I spent a fair bit of time wondering how useful it could be either to maintain
such a module in core, including for external needs, or to create a separate
external project with a different release/distribution/packaging policy.

Wherever the module is maintained, the goal would be to address broader
needs, e.g. adding a switch_wal() method or wait_for_archive(), supporting
replication, backups, etc. for multiple old, deprecated PostgreSQL versions.

To be honest I have mixed feelings. I feel this burden shouldn't be carried
by the core, which has restricted needs compared to external projects. On the
other hand, maintaining an external project which shares 90% of the code seems
to be a wasteful duplication and backporting effort. Moreover, Craig Ringer has
already opened the door for external use of PostgresNode with his effort to
install/package it, see:
https://www.postgresql.org/message-id/CAGRY4nxxKSFJEgVAv5YAk%3DbqULtFmNw7gEJef0CCgzpNy6O%3D-w%40mail.gmail.com

Thoughts?




Re: pg_amcheck option to install extension

2021-04-19 Thread Mark Dilger



> On Apr 19, 2021, at 9:53 AM, Tom Lane  wrote:
> 
> Andrew Dunstan  writes:
>> OK, so let's fix it. If amcheck is going to stay in contrib then ISTM
>> pg_amcheck should move there. I can organize that if there's agreement.
>> Or else let's move amcheck as Alvaro suggests.
> 
> FWIW, I think that putting them both in contrib makes the most
> sense from a structural standpoint.

That was my original thought also, largely from a package management 
perspective.  Just as an example, postgresql-client and postgresql-contrib are 
separate rpms.  There isn't much point to having pg_amcheck installed as part 
of the postgresql-client package while having amcheck in the postgresql-contrib 
package which might not be installed.

A counter argument is that amcheck is server side, and pg_amcheck is client 
side.  Having pg_amcheck installed on a system makes sense if you are 
connecting to a server on a different system.

> On Mar 11, 2021, at 12:12 AM, Peter Eisentraut 
>  wrote:
> 
> I want to register, if we are going to add this, it ought to be in src/bin/.  
> If we think it's a useful tool, it should be there with all the other useful 
> tools.
> 
> I realize there is a dependency on a module in contrib, and it's probably now 
> not the time to re-debate reorganizing contrib.  But if we ever get to that, 
> this program should be the prime example why the current organization is 
> problematic, and we should be prepared to make the necessary moves then.

This was the request that motivated the move to src/bin.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: Commit 86dc90056 - Rework planning and execution of UPDATE and DELETE

2021-04-19 Thread Robert Haas
On Mon, Apr 19, 2021 at 1:03 PM Tom Lane  wrote:
> That doco is explaining the users-eye view of it.  Places addressed
> to datatype developers, such as the CREATE TYPE reference page, see
> it a bit differently.  CREATE TYPE for instance points out that
>
> All storage values other than plain imply that the functions of the
> data type can handle values that have been toasted, as described in ...

Interesting. It feels to me like SET STORAGE PLAIN is really trying to
be two different things. Either you want to inhibit
compression and external storage for performance reasons, or your data
type can't support either one. Maybe we should separate those
concepts, since there's no mode right now that says "don't ever
compress, and externalize only if there's absolutely no other way,"
and there's no way to disable compression and externalization without
also killing off short headers. :-(
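
(For reference, a minimal sketch of the four per-column strategies available
today, assuming a table t with a toastable column payload:

ALTER TABLE t ALTER COLUMN payload SET STORAGE EXTENDED;  -- allow compression and external storage (the default for most varlena types)
ALTER TABLE t ALTER COLUMN payload SET STORAGE EXTERNAL;  -- allow external storage, never compress
ALTER TABLE t ALTER COLUMN payload SET STORAGE MAIN;      -- allow compression, externalize only as a last resort
ALTER TABLE t ALTER COLUMN payload SET STORAGE PLAIN;     -- neither compress nor store externally
)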

> The notion that short header doesn't cost anything seems extremely 
> Intel-centric to me.

I don't think so. It's true that Intel is very forgiving about
unaligned accesses compared to some other architectures, but I think
if you have a terabyte of data, you want it to fit into as few disk
pages as possible pretty much no matter what architecture you're
using. The dominant costs are going to be the I/O costs, not the CPU
costs of dealing with unaligned bytes. In fact, even if you have a
gigabyte of data, I bet it's *still* better to use a more compact
on-disk representation. Now, the dominant cost is going to be pumping
the data through the L3 CPU cache, which is still - I think - going to
be quite a lot more important than the CPU costs of dealing with
unaligned bytes. The CPU bus is an I/O bottleneck not unlike the disk
itself, just at a higher rate of speed which is still way slower than
the CPU speed. Now if you have a megabyte of data, or better yet a
kilobyte of data, then I think optimizing for CPU efficiency may well
be the right thing to do. I don't know how much 4-byte varlena headers
really save there, but if I were designing a storage representation
for very small data sets, I'd definitely be thinking about how I could
waste space to shave cycles.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Commit 86dc90056 - Rework planning and execution of UPDATE and DELETE

2021-04-19 Thread Tom Lane
Robert Haas  writes:
> On Mon, Apr 19, 2021 at 12:13 PM Tom Lane  wrote:
>> Au contraire.  The reason that mode exists at all (for varlena types)
>> is to support data types that haven't been updated for TOAST.

> This kind of begs the question of whether you have the right idea
> about what PLAIN has always been understood to mean, and whether
> everyone understands it the same way. I formed my understanding of
> what PLAIN is understood to mean by reading the ALTER TABLE .. SET
> STORAGE documentation, and there's no real hint in there that this is
> some kind of backward-compatibility only feature.

That doco is explaining the users-eye view of it.  Places addressed
to datatype developers, such as the CREATE TYPE reference page, see
it a bit differently.  CREATE TYPE for instance points out that

All storage values other than plain imply that the functions of the
data type can handle values that have been toasted, as described in ...

> I think if I were designing this from scratch, I'd want one
> switch for whether it's OK to compress, with values meaning "yes,"
> "no," and "only if stored externally," a second switch for the
> *length* at which external storage should be used (so that I can push
> out rarely-used columns at lower size thresholds and commonly-used
> ones at higher thresholds), and a third for what should happen if we
> do the stuff allowed by the first two switches and the tuple still
> doesn't fit, with values meaning "fail" and "externalize anyway".

Yeah, I don't think the existing options for attstorage have much
to recommend them except backwards compatibility.  But if we do
redesign them, I'd still say there should be a way for a data
type to say that it doesn't support these weird header hacks that
we've invented.  The notion that short header doesn't cost anything
seems extremely Intel-centric to me.

regards, tom lane




Re: multi-install PostgresNode fails with older postgres versions

2021-04-19 Thread Jehan-Guillaume de Rorthais
On Mon, 19 Apr 2021 07:43:58 -0700
Mark Dilger  wrote:

> > On Apr 19, 2021, at 5:11 AM, Andrew Dunstan  wrote:
> > 
> > I think therefore I'm inclined for now to do nothing for old version
> > compatibility.  
> 
> I agree with waiting until the v15 development cycle.

Agree.




Re: More info on pg_stat_activity Wait Event Name when is DataFileRead

2021-04-19 Thread Robert Haas
On Mon, Apr 19, 2021 at 12:17 PM PegoraroF10  wrote:
> I'm sure the problem was hardware and I hope it does not occur anymore.
> If I have logical replication and on the replica I do a VACUUM FULL, CLUSTER
> or any other EXCLUSIVE LOCK operation, replication will wait for that.
> What I was thinking about was a time limit to release that lock, or in my
> situation a hardware problem. If N seconds later that file is still not
> released, then change DataFileRead to DataFileRead + relfilenode.

But how would we implement that with reasonable efficiency? If we
called setitimer() before every read() call to set the timeout, and
then again to clear it after the read(), that would probably be
hideously expensive. Perhaps it would work to have a background
"heartbeat" process that pings every backend in the system every 1s or
something like that, and make the signal handler do this, but that
supposes that the signal handler would have ready access to this
information, which doesn't seem totally straightforward to arrange,
and that it would be OK for the signal handler to grab a lock to
update shared memory, which as things stand today is definitely not
safe.

I am not trying to say that there is no way that something like this
could be made to work. There's probably something that can be done. I
don't think I know what that thing is, though.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pg_amcheck option to install extension

2021-04-19 Thread Tom Lane
Andrew Dunstan  writes:
> OK, so let's fix it. If amcheck is going to stay in contrib then ISTM
> pg_amcheck should move there. I can organize that if there's agreement.
> Or else let's move amcheck as Alvaro suggests.

FWIW, I think that putting them both in contrib makes the most
sense from a structural standpoint.

Either way, though, you'll still need the proposed option to
let the executable issue a CREATE EXTENSION to get the shlib
loaded.  Unless somebody is proposing that the extension be
installed-by-default like plpgsql, and that I am unequivocally
not for.

regards, tom lane




Re: pg_amcheck option to install extension

2021-04-19 Thread Robert Haas
On Mon, Apr 19, 2021 at 12:37 PM Mark Dilger
 wrote:
> > OK, so let's fix it. If amcheck is going to stay in contrib then ISTM
> > pg_amcheck should move there. I can organize that if there's agreement.
> > Or else let's move amcheck as Alvaro suggests.
>
> Ah, no.  I wrote pg_amcheck in contrib originally, and moved it to src/bin as 
> requested during the v14 development cycle.

Yeah, I am not that excited about moving this again. I realize it was
never committed anywhere else, but it was moved at least once during
development. And I don't see that moving it to contrib really fixes
anything anyway here, except perhaps conceptually. Maybe inventing
src/extensions is the right idea, but there's no real need to do that
at this point in the release cycle, and it doesn't actually fix
anything either. The only thing that's really needed here is to either
(a) teach the test script to install amcheck as a separate step or (b)
teach pg_amcheck to install amcheck in a user-specified schema. If we
do that, AIUI, this issue is fixed regardless of whether we move any
source code around, and if we don't do that, AIUI, this issue is not
fixed no matter how much source code we move.
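
(Option (a) amounts to a one-line setup step per database before invoking
pg_amcheck -- a sketch, assuming the amcheck extension files are available in
the installation:

CREATE EXTENSION IF NOT EXISTS amcheck;  -- run once in each database to be checked
)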

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Re: Re: Core dump happens when execute sql CREATE VIEW v1(c1) AS (SELECT ('4' COLLATE "C")::INT FROM generate_series(1, 10));

2021-04-19 Thread Tom Lane
Yulin PEI  writes:
> Let's look at your patch:

> ```
> // node is ('1' COLLATE "C")
> // targetType is varbit and it is non-collatable
> if (IsA(node, CollateExpr) && type_is_collatable(targetTypeId))
> {

> // we will not reach here.

That's not the committed patch, though.  I realized after posting
it that it didn't maintain the same behavior in coerce_type as
coerce_to_target_type.  But the actually-committed fix does, and
as I said, what you're suggesting seems equivalent though a bit
messier.

regards, tom lane




Re: Commit 86dc90056 - Rework planning and execution of UPDATE and DELETE

2021-04-19 Thread Robert Haas
On Mon, Apr 19, 2021 at 12:13 PM Tom Lane  wrote:
> Au contraire.  The reason that mode exists at all (for varlena types)
> is to support data types that haven't been updated for TOAST.  Perhaps
> that's now the empty set, but it's not really our job to take away the
> capability.  If you really want MAIN you should say that, not quibble
> that PLAIN doesn't mean what it's always been understood to mean.

This kind of begs the question of whether you have the right idea
about what PLAIN has always been understood to mean, and whether
everyone understands it the same way. I formed my understanding of
what PLAIN is understood to mean by reading the ALTER TABLE .. SET
STORAGE documentation, and there's no real hint in there that this is
some kind of backward-compatibility only feature. Rather, I read that
paragraph to suggest that you can choose between the four options as a
way of getting best performance. Both external storage and compression
are trade-offs: they make the tuples smaller, which can be good for
performance, but they also make the toasted columns more expensive to
access, which can be bad for performance. It seems completely
reasonable to suppose that some workloads may benefit from a
non-default TOAST strategy; e.g. if you often access only a few
columns but scan a lot of rows, toasting wins; if you typically access
every column but only a few rows via an index scan, not toasting wins.
And if that is your idea about what the feature does - an idea that
seems perfectly defensible given what the documentation says about
this - then I think it's going to be surprising to find that it also
inhibits 1-byte headers from being used. But, IDK, maybe nobody will
care (or maybe I'm the only one who will be surprised).

Perhaps this whole area needs a broader rethink at some point. I'm
inclined to think that compatibility with varlena data types that
haven't been updated since PostgreSQL 8.3 came out is almost a
non-goal and maybe we ought to just kick such data types and the
associated code paths to the curb. It's unlikely that they get much
testing. On the other hand, perhaps we'd like to have more control
over the decision to compress or store externally than we have
currently. I think if I were designing this from scratch, I'd want one
switch for whether it's OK to compress, with values meaning "yes,"
"no," and "only if stored externally," a second switch for the
*length* at which external storage should be used (so that I can push
out rarely-used columns at lower size thresholds and commonly-used
ones at higher thresholds), and a third for what should happen if we
do the stuff allowed by the first two switches and the tuple still
doesn't fit, with values meaning "fail" and "externalize anyway".
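
(For comparison, a sketch of what can be expressed today, again assuming a
table t with a wide column payload: the compress/externalize choice is per
column but the two are coupled, and the externalization threshold exists only
table-wide, via toast_tuple_target, rather than per column:

ALTER TABLE t ALTER COLUMN payload SET STORAGE EXTERNAL;  -- externalize, never compress
ALTER TABLE t SET (toast_tuple_target = 256);             -- threshold for the whole table, not per column
)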

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: terminate called after throwing an instance of 'std::bad_alloc'

2021-04-19 Thread Andres Freund
Hi,

On 2021-04-17 19:13:24 -0500, Justin Pryzby wrote:
> I'm now realizing that that's RAM use for a single query, not from continuous
> leaks across multiple queries.

What's the memory usage with inlining disabled, and what's the usage
without jit?
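
(A minimal way to gather those two numbers for the same query, using the
standard JIT GUCs; setting jit_inline_above_cost to -1 disables inlining while
leaving JIT on:

SET jit = on;
SET jit_inline_above_cost = -1;   -- inlining disabled, JIT still on
-- versus:
SET jit = off;                    -- JIT disabled entirely
)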


> This is still true with the patch even if I
> #define LLVMJIT_LLVM_CONTEXT_REUSE_MAX 1
>
>   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM    TIME+  COMMAND
> 28438 postgres  20   0 3854264   2.8g  29428 R  98.7 33.2  8:56.79 postgres: telsasoft ts 192.168.122.11(53614) BIND
>
> python3 ./jitleak.py # runs telsasoft reports
> INFO:  recreating LLVM context after 2 uses
> INFO:  recreating LLVM context after 2 uses
> INFO:  recreating LLVM context after 2 uses
> INFO:  recreating LLVM context after 2 uses
> INFO:  recreating LLVM context after 2 uses
> PID 27742 finished running report; est=None rows=40745; cols=34; ... 
> duration:538
> INFO:  recreating LLVM context after 81492 uses
>
> I did:
>
> -   llvm_llvm_context_reuse_count = 0;
>     Assert(llvm_context != NULL);
> +   elog(INFO, "recreating LLVM context after %zu uses", llvm_llvm_context_reuse_count);
> +   llvm_llvm_context_reuse_count = 0;
>
> Maybe we're missing this condition somehow ?
> if (llvm_jit_context_in_use_count == 0 &&

Do you utilize cursors or such?


> Also, I just hit this assertion by cancelling the query with ^C / sigint.  But
> I don't have a reproducer for it.
>
> < 2021-04-17 19:14:23.509 ADT telsasoft >PANIC:  LLVMJitContext in use count not 0 at exit (is 1)

Hm, I'll try to see how that happens - if not, do you have a backtrace?

Greetings,

Andres Freund



