Re: create_index test fails when synchronous_commit = off @ master

2022-03-10 Thread Aleksander Alekseev
Hi hackers,

> Here is an updated patch that includes the commit message.

Since this is a trivial change and no one seems to object to it so far, I'm
going to re-check the patch against the current `master` and change its
status to "Ready for Committer" sometime next week.

-- 
Best regards,
Aleksander Alekseev


RE: Column Filtering in Logical Replication

2022-03-10 Thread wangw.f...@fujitsu.com
On Fri, Mar 11, 2022 at 9:57 AM Tomas Vondra  
wrote:
>
Hi Tomas,
Thanks for your patches.

On Mon, Mar 9, 2022 at 9:53 PM Tomas Vondra  
wrote:
>On Wed, Mar 9, 2022 at 6:04 PM Amit Kapila  wrote:
>>On Mon, Mar 7, 2022 at 11:18 PM Tomas Vondra  
>>wrote:
>>>On Fri, Mar 4, 2022 at 6:43 PM Amit Kapila  wrote:
> >>> Fetching column filter info in tablesync.c is quite expensive. It
> >>> seems to be using four round-trips to get the complete info whereas
> >>> for row-filter we use just one round trip. I think we should try to
> >>> get both row filter and column filter info in just one round trip.
> >>>
> >>
> >> Maybe, but I really don't think this is an issue.
> >>
> >
> > I am not sure but it might matter for small tables. Leaving aside the
> > performance issue, I think the current way will get the wrong column
> > list in many cases: (a) The ALL TABLES IN SCHEMA case handling won't
> > work for partitioned tables when the partitioned table is part of one
> > schema and partition table is part of another schema. (b) The handling
> > of partition tables in other cases will fetch incorrect lists as it
> > tries to fetch the column list of all the partitions in the hierarchy.
> >
> > One of my colleagues has even tested these cases both for column
> > filters and row filters and we find the behavior of row filter is okay
> > whereas for column filter it uses the wrong column list. We will share
> > the tests and results with you in a later email. We are trying to
> > unify the column filter queries with row filter to make their behavior
> > the same and will share the findings once it is done. I hope if we are
> > able to achieve this that we will reduce the chances of bugs in this
> > area.
> >
> 
> OK, I'll take a look at that email.
I tried to get both the column filters and the row filters with one SQL
statement, but gave up because the result was not easy to parse.

I noted that we use two SQLs to get the column filters in the latest
patches (20220311). I think maybe we could use one SQL to get the column
filters to reduce the network cost, like the SQL in the attachment.

Regards,
Wang wei


0001-Try-to-get-column-filters-with-one-SQL.patch
Description: 0001-Try-to-get-column-filters-with-one-SQL.patch


Re: BufferAlloc: don't take two simultaneous locks

2022-03-10 Thread Kyotaro Horiguchi
At Fri, 11 Mar 2022 15:30:30 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
> Thanks!  I looked into dynahash part.
> 
>  struct HASHHDR
>  {
> - /*
> -  * The freelist can become a point of contention in high-concurrency hash
> 
> Why did you move around the freeList?
> 
> 
> - long nentries;   /* number of entries in associated buckets */
> + long nfree;  /* number of free entries in the list */
> + long nalloced;   /* number of entries initially allocated for
> 
> Why do we need nfree?  HASH_ASSIGN should do the same thing as
> HASH_REMOVE.  Maybe the reason is that the code tries to put the
> detached bucket on a different free list, but we can just remember the
> freelist_idx for the detached bucket as we do for hashp.  I think that
> should largely reduce the footprint of this patch.
> 
> -static void hdefault(HTAB *hashp);
> +static void hdefault(HTAB *hashp, bool partitioned);
> 
> That optimization may help a bit, but isn't it unrelated to
> this patch?
> 
> + case HASH_REUSE:
> + if (currBucket != NULL)
> + {
> + /* check there is no unfinished HASH_REUSE+HASH_ASSIGN pair */
> + Assert(DynaHashReuse.hashp == NULL);
> + Assert(DynaHashReuse.element == NULL);
> 
> I think all cases in the switch(action) other than HASH_ASSIGN need
> this assertion, and there is no need to check both; checking only
> element would be enough.

While I looked buf_table part, I came up with additional comments.

BufTableInsert(BufferTag *tagPtr, uint32 hashcode, int buf_id)
{
    hash_search_with_hash_value(SharedBufHash, ..., HASH_ASSIGN, ...);
    ...
BufTableDelete(BufferTag *tagPtr, uint32 hashcode, bool reuse)

BufTableDelete considers both the reuse and !reuse cases, but
BufTableInsert doesn't and always does HASH_ASSIGN.  That looks
odd.  We should use HASH_ENTER here.  Thus I think it is more
reasonable for HASH_ENTER to use the stashed entry if one exists and
is needed, or return it to the freelist if it exists but is not needed.

What do you think about this?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: BufferAlloc: don't take two simultaneous locks

2022-03-10 Thread Kyotaro Horiguchi
At Thu, 03 Mar 2022 01:35:57 +0300, Yura Sokolov  
wrote in 
> On Tue, 01/03/2022 at 10:24 +0300, Yura Sokolov wrote:
> > Ok, here is v4.
> 
> And here is v5.
> 
> First, there was a compilation error in an Assert in dynahash.c.
> Excuse me for not checking before sending the previous version.
> 
> Second, I added a third commit that reduces the HASHHDR allocation
> size for non-partitioned dynahash:
> - moved freeList to the last position
> - alloc and memset offsetof(HASHHDR, freeList[1]) bytes for
>   non-partitioned hash tables.
> I didn't benchmark it, but I will be surprised if it
> matters much performance-wise.
> 
> Third, I put all three commits into a single file so as not to
> confuse the commitfest application.

Thanks!  I looked into dynahash part.

 struct HASHHDR
 {
-   /*
-* The freelist can become a point of contention in high-concurrency hash

Why did you move around the freeList?


-   long nentries;   /* number of entries in associated buckets */
+   long nfree;  /* number of free entries in the list */
+   long nalloced;   /* number of entries initially allocated for

Why do we need nfree?  HASH_ASSIGN should do the same thing as
HASH_REMOVE.  Maybe the reason is that the code tries to put the
detached bucket on a different free list, but we can just remember the
freelist_idx for the detached bucket as we do for hashp.  I think that
should largely reduce the footprint of this patch.

-static void hdefault(HTAB *hashp);
+static void hdefault(HTAB *hashp, bool partitioned);

That optimization may help a bit, but isn't it unrelated to
this patch?

+   case HASH_REUSE:
+   if (currBucket != NULL)
+   {
+   /* check there is no unfinished 
HASH_REUSE+HASH_ASSIGN pair */
+   Assert(DynaHashReuse.hashp == NULL);
+   Assert(DynaHashReuse.element == NULL);

I think all cases in the switch(action) other than HASH_ASSIGN need
this assertion, and there is no need to check both; checking only
element would be enough.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Changing "Hot Standby" to "hot standby"

2022-03-10 Thread Daniel Westermann (DWE)
>Looks the same as v5 to me; it applies the same consistency rules
>everywhere in the docs.  So I applied this one.

Thank you, Michael




Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

2022-03-10 Thread Dilip Kumar
On Thu, Mar 10, 2022 at 10:18 PM Robert Haas  wrote:
>
> On Thu, Mar 10, 2022 at 6:02 AM Dilip Kumar  wrote:
> > I have completely changed the logic for this refactoring.  Basically,
> > write_relmap_file() already has parameters to control whether to
> > write WAL and send invalidations, and we are already passing the
> > dbpath.  Instead of making a new function I just pass one additional
> > parameter to this function about whether we are creating a new map or
> > not, and I think with that the changes are minimal and this looks
> > cleaner to me.  Similarly, for load_relmap_file() I just had to pass
> > the dbpath and memory for the destination map.  Please have a look
> > and let me know your thoughts.
>
> It's not terrible, but how about something like the attached instead?
> I think this has the effect of reducing the number of cases that the
> low-level code needs to know about from 2 to 1, instead of making it
> go up from 2 to 3.

Yeah, this looks cleaner; I will rebase the remaining patch.

> > I think we should also write the test cases for the create database
> > strategy.  But I do not see any test cases for create database testing
> > the existing options.  So I am wondering whether we should add a test
> > case only for the new option we are providing, or create a separate
> > patch which tests the new option as well as the existing options.
>
> FWIW, src/bin/scripts/t/020_createdb.pl does a little bit of testing
> of this kind.

Okay, I think we need to support the strategy in the createdb binary as
well.  I will do that.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Changing "Hot Standby" to "hot standby"

2022-03-10 Thread Michael Paquier
On Thu, Mar 10, 2022 at 05:58:05PM -0500, Robert Treat wrote:
> Not sure why the previous emails didn't go through, and it still
> doesn't look like they were picked up.  In the interest of progress
> though, I'm attaching an updated patch with some minor wordsmithing;
> let me know if you'd prefer this done differently.

Looks the same as v5 to me; it applies the same consistency rules
everywhere in the docs.  So I applied this one.
--
Michael


signature.asc
Description: PGP signature


Re: WIP: WAL prefetch (another approach)

2022-03-10 Thread Andres Freund



On March 10, 2022 9:31:13 PM PST, Thomas Munro  wrote:
>  The other thing I need to change is that I should turn on
>recovery_prefetch for platforms that support it (ie Linux and maybe
>NetBSD only for now), in the tests.  

Could a setting of "try" make sense?
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.




Re: Add pg_freespacemap extension sql test

2022-03-10 Thread Michael Paquier
On Wed, Mar 09, 2022 at 08:13:15PM +0900, Dong Wook Lee wrote:
> I agree with you, but I have no good idea how to deal with it.

Well, my guess is that you basically just care about being able to
detect if there is free space in the map or not, which goes down to
detecting if pg_freespace() returns 0 or a number strictly higher than
0, so wouldn't it be enough to stick some > 0 in your test queries?
Btw, if you want to test 32-bit builds, gcc allows that by passing
down -m32.

> Can the Perl TAP test be a good way?

That does not seem necessary here.
--
Michael




Re: WIP: WAL prefetch (another approach)

2022-03-10 Thread Thomas Munro
On Fri, Mar 11, 2022 at 6:31 PM Thomas Munro  wrote:
> Thanks for your review of 0001!  It gave me a few things to think
> about and some good improvements.

And just in case it's useful, here's what changed between v21 and v22:
diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index 86a7b4c5c8..0d0c556b7c 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -90,8 +90,8 @@ XLogReaderSetDecodeBuffer(XLogReaderState *state, void *buffer, size_t size)
 
state->decode_buffer = buffer;
state->decode_buffer_size = size;
-   state->decode_buffer_head = buffer;
state->decode_buffer_tail = buffer;
+   state->decode_buffer_head = buffer;
 }
 
 /*
@@ -271,7 +271,7 @@ XLogBeginRead(XLogReaderState *state, XLogRecPtr RecPtr)
 
 /*
  * See if we can release the last record that was returned by
- * XLogNextRecord(), to free up space.
+ * XLogNextRecord(), if any, to free up space.
  */
 void
 XLogReleasePreviousRecord(XLogReaderState *state)
@@ -283,16 +283,16 @@ XLogReleasePreviousRecord(XLogReaderState *state)
 
/*
 * Remove it from the decoded record queue.  It must be the oldest item
-* decoded, decode_queue_tail.
+* decoded, decode_queue_head.
 */
record = state->record;
-   Assert(record == state->decode_queue_tail);
+   Assert(record == state->decode_queue_head);
state->record = NULL;
-   state->decode_queue_tail = record->next;
+   state->decode_queue_head = record->next;
 
-   /* It might also be the newest item decoded, decode_queue_head. */
-   if (state->decode_queue_head == record)
-   state->decode_queue_head = NULL;
+   /* It might also be the newest item decoded, decode_queue_tail. */
+   if (state->decode_queue_tail == record)
+   state->decode_queue_tail = NULL;
 
/* Release the space. */
if (unlikely(record->oversized))
@@ -302,11 +302,11 @@ XLogReleasePreviousRecord(XLogReaderState *state)
}
else
{
-   /* It must be the tail record in the decode buffer. */
-   Assert(state->decode_buffer_tail == (char *) record);
+   /* It must be the head (oldest) record in the decode buffer. */
+   Assert(state->decode_buffer_head == (char *) record);
 
/*
-* We need to update tail to point to the next record that is in the
+* We need to update head to point to the next record that is in the
 * decode buffer, if any, being careful to skip oversized ones
 * (they're not in the decode buffer).
 */
@@ -316,8 +316,8 @@ XLogReleasePreviousRecord(XLogReaderState *state)
 
if (record)
{
-   /* Adjust tail to release space up to the next record. */
-   state->decode_buffer_tail = (char *) record;
+   /* Adjust head to release space up to the next record. */
+   state->decode_buffer_head = (char *) record;
}
else
{
@@ -327,8 +327,8 @@ XLogReleasePreviousRecord(XLogReaderState *state)
 * we'll keep overwriting the same piece of memory if we're not
 * doing any prefetching.
 */
-   state->decode_buffer_tail = state->decode_buffer;
state->decode_buffer_head = state->decode_buffer;
+   state->decode_buffer_tail = state->decode_buffer;
}
}
 }
@@ -351,7 +351,7 @@ XLogNextRecord(XLogReaderState *state, char **errormsg)
/* Release the last record returned by XLogNextRecord(). */
XLogReleasePreviousRecord(state);
 
-   if (state->decode_queue_tail == NULL)
+   if (state->decode_queue_head == NULL)
{
*errormsg = NULL;
if (state->errormsg_deferred)
@@ -376,7 +376,7 @@ XLogNextRecord(XLogReaderState *state, char **errormsg)
 * XLogRecXXX(xlogreader) macros, which work with the decoder rather than
 * the record for historical reasons.
 */
-   state->record = state->decode_queue_tail;
+   state->record = state->decode_queue_head;
 
/*
 * Update the pointers to the beginning and one-past-the-end of this
@@ -428,12 +428,12 @@ XLogReadRecord(XLogReaderState *state, char **errormsg)
if (!XLogReaderHasQueuedRecordOrError(state))
XLogReadAhead(state, false /* nonblocking */ );
 
-   /* Consume the tail record or error. */
+   /* Consume the head record or error. */
decoded = XLogNextRecord(state, errormsg);
if (decoded)
{
/*
-* XLogReadRecord() returns a pointer to the record's header, not the
+

Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

2022-03-10 Thread Ashutosh Sharma
On Thu, Mar 10, 2022 at 10:18 PM Robert Haas  wrote:
>
> On Thu, Mar 10, 2022 at 6:02 AM Dilip Kumar  wrote:
> > I have completely changed the logic for this refactoring.  Basically,
> > write_relmap_file() already has parameters to control whether to
> > write WAL and send invalidations, and we are already passing the
> > dbpath.  Instead of making a new function I just pass one additional
> > parameter to this function about whether we are creating a new map or
> > not, and I think with that the changes are minimal and this looks
> > cleaner to me.  Similarly, for load_relmap_file() I just had to pass
> > the dbpath and memory for the destination map.  Please have a look
> > and let me know your thoughts.
>
> It's not terrible, but how about something like the attached instead?
> I think this has the effect of reducing the number of cases that the
> low-level code needs to know about from 2 to 1, instead of making it
> go up from 2 to 3.
>

Looks better, but why do you want to pass elevel to load_relmap_file()?
Are we calling this function from somewhere other than
read_relmap_file()?  If not, do we have any plans to call this function
directly, bypassing read_relmap_file(), in any upcoming patch?

--
With Regards,
Ashutosh Sharma.




Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

2022-03-10 Thread Ashutosh Sharma
Thanks Dilip for working on the review comments. I'll take a look at
the new version of patch and let you know my comments, if any.

--
With Regards,
Ashutosh Sharma.

On Thu, Mar 10, 2022 at 8:38 PM Dilip Kumar  wrote:
>
> On Thu, Mar 10, 2022 at 7:22 PM Ashutosh Sharma  wrote:
> >
> > Here are some review comments on the latest patch
> > (v11-0004-WAL-logged-CREATE-DATABASE.patch). I actually did the review
> > of the v10 patch but that applies for this latest version as well.
> >
> > +   /* Now errors are fatal ... */
> > +   START_CRIT_SECTION();
> >
> > Did you mean PANIC instead of FATAL?
>
> I think "fatal" here didn't really mean the error level FATAL; it
> means critical, and I have seen it used in other places also.  But I
> really don't think we need this comment, so I removed it to avoid any
> confusion.
>
> > ==
> >
> > +(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> > + errmsg("invalid create strategy %s", strategy),
> > + errhint("Valid strategies are \"wal_log\", and \"file_copy\".")));
> > +   }
> >
> >
> > Should this be - "invalid createdb strategy" instead of "invalid
> > create strategy"?
>
> Changed
>
> > ==
> >
> > +   /*
> > +* In case of ALTER DATABASE SET TABLESPACE we don't need to do
> > +* anything for the objects which are not in the source db's default
> > +* tablespace.  The source and destination dboid will be the same in
> > +* case of ALTER DATABASE SET TABLESPACE.
> > +*/
> > +   else if (src_dboid == dst_dboid)
> > +   continue;
> > +   else
> > +   dstrnode.spcNode = srcrnode.spcNode;
> >
> >
> > Is this change still required? Do we support the WAL_COPY strategy for
> > ALTER DATABASE?
>
> Yeah now it is unreachable code so removed.
>
> > ==
> >
> > +   /* Open the source and the destination relation at smgr level. */
> > +   src_smgr = smgropen(srcrnode, InvalidBackendId);
> > +   dst_smgr = smgropen(dstrnode, InvalidBackendId);
> > +
> > +   /* Copy relation storage from source to the destination. */
> > +   CreateAndCopyRelationData(src_smgr, dst_smgr, relinfo->relpersistence);
> >
> > Do we need to do smgropen for destination relfilenode here? Aren't we
> > already doing that inside RelationCreateStorage?
>
> Yeah, I have changed the complete logic, removed the smgropen for both
> the source and the destination, and moved it inside
> CreateAndCopyRelationData; please check the updated code.
>
> > ==
> >
> > +   use_wal = XLogIsNeeded() &&
> > +   (relpersistence == RELPERSISTENCE_PERMANENT || copying_initfork);
> > +
> > +   /* Get number of blocks in the source relation. */
> > +   nblocks = smgrnblocks(src, forkNum);
> >
> > What if the number of blocks in the source relation is ZERO?  Should we
> > check for that and return immediately?  We have already done smgrcreate.
>
> Yeah, it makes sense to optimize; with that we will not have to get
> the buffer strategy, so done.
>
> > ==
> >
> > +   /* We don't need to copy the shared objects to the target. */
> > +   if (classForm->reltablespace == GLOBALTABLESPACE_OID)
> > +   return NULL;
> > +
> > +   /*
> > +* If the object doesn't have the storage then nothing to be
> > +* done for that object so just ignore it.
> > +*/
> > +   if (!RELKIND_HAS_STORAGE(classForm->relkind))
> > +   return NULL;
> >
> > We can probably club together above two if-checks.
>
> Done
>
> > ==
> >
> > +  
> > +   strategy
> > +   
> > +
> > + This is used for copying the database directory.  Currently, we have
> > + two strategies: the WAL_LOG and the
> > + FILE_COPY.  If the WAL_LOG strategy
> > + is used then the database will be copied block by block and it will
> > + also WAL-log each copied block.  Otherwise, if FILE_COPY
> >
> > I think we need to mention the default strategy in the documentation page.
>
> Done
>
> --
> Regards,
> Dilip Kumar
> EnterpriseDB: http://www.enterprisedb.com




Re: wal_compression=zstd

2022-03-10 Thread Justin Pryzby
On Fri, Mar 11, 2022 at 12:23:59PM +0900, Michael Paquier wrote:
> On Wed, Mar 09, 2022 at 07:14:11AM -0600, Justin Pryzby wrote:
> > Anyway there's no compelling reason to not use the default.  If we were to 
> > use
> > a non-default default, we'd have to choose between 1 and 2 (or some negative
> > compression level).  My thinking was that zstd-1 would give the 
> > lowest-hanging
> > fruits for zstd, while minimizing performance tradeoff, since WAL affects
> > interactivity.  But choosing between 1 and 2 seems like bikeshedding.
> 
> Yeah, I have looked again at the patch today, and I saw no reason to
> not apply it to give more options to the user as zstd or lz4 are both
> good in their own ways.  So done, with the default level used.

It's great news - thanks.

-- 
Justin




Re: Column Filtering in Logical Replication

2022-03-10 Thread Tomas Vondra
On 3/11/22 03:46, Amit Kapila wrote:
> On Fri, Mar 11, 2022 at 12:44 AM Tomas Vondra
>  wrote:
>>
>> On 3/10/22 04:09, Amit Kapila wrote:
>>> On Wed, Mar 9, 2022 at 3:33 PM Amit Kapila  wrote:

 On Mon, Mar 7, 2022 at 8:48 PM Tomas Vondra
  wrote:

> OK, I reworked this to do the same thing as the row filtering patch.
>

 Thanks, I'll check this.

>>>
>>> Some assorted comments:
>>> =
>>> 1. We don't need to send a column list for the old tuple in case of an
>>> update (similar to delete). It is not required to apply a column
>>> filter for those cases because we ensure that RI must be part of the
>>> column list for updates and deletes.
>>
>> I'm not sure which part of the code does this refer to?
>>
> 
> The below part:
> @@ -464,11 +473,11 @@ logicalrep_write_update(StringInfo out,
> TransactionId xid, Relation rel,
>   pq_sendbyte(out, 'O'); /* old tuple follows */
>   else
>   pq_sendbyte(out, 'K'); /* old key follows */
> - logicalrep_write_tuple(out, rel, oldslot, binary);
> + logicalrep_write_tuple(out, rel, oldslot, binary, columns);
>   }
> 
> I think here instead of columns, the patch needs to send NULL as it is
> already doing in logicalrep_write_delete.
> 

Hmmm, yeah.  In practice it doesn't really matter, because NULL means
"send all columns", so it actually relaxes the check, and we only send
the RI keys, which are a subset of the column filter.  But I will fix it.

>>> 2.
>>> + /*
>>> + * Check if all columns referenced in the column filter are part of
>>> + * the REPLICA IDENTITY index or not.
>>>
>>> I think this comment is reverse. The rule we follow here is that
>>> attributes that are part of RI must be there in a specified column
>>> list. This is used at two places in the patch.
>>
>> Yeah, you're right. Will fix.
>>
>>> 3. get_rel_sync_entry()
>>> {
>>> /* XXX is there a danger of memory leak here? beware */
>>> + oldctx = MemoryContextSwitchTo(CacheMemoryContext);
>>> + for (int i = 0; i < nelems; i++)
>>> ...
>>> }
>>>
>>> Similar to the row filter, I think we need to use
>>> entry->cache_expr_cxt to allocate this. There are other usages of
>>> CacheMemoryContext in this part of the code but I think those need to
>>> be also changed and we can do that as a separate patch. If we do the
>>> suggested change then we don't need to separately free columns.
>>
>> I agree a shorter-lived context would be better than CacheMemoryContext,
>> but "expr" seems to indicate it's for the expression only, so maybe we
>> should rename that.
>>
> 
> Yeah, we can do that. How about rel_entry_cxt or something like that?
> The idea is that eventually, we should move a few other things of
> RelSyncEntry like attrmap where we are using CacheMemoryContext under
> this context.
> 

Yeah, rel_entry_cxt sounds fine I guess ...

>> But do we really want a memory context for every
>> single entry?
>>
> 
> Any other better idea?
> 

No, I think you're right - it'd be hard/impossible to keep track of all
the memory allocated for expression/estate. It'd be fine for the
columns, because that's just a bitmap, but not for the expressions.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: pg_rewind enhancements

2022-03-10 Thread Bharath Rupireddy
On Fri, Mar 4, 2022 at 7:50 PM RKN Sai Krishna
 wrote:
>
> Hi,
>
> While using pg_rewind, I found that it is a bit difficult to use pg_rewind as 
> it seems to copy even the configuration files and also remove some of the 
> files created on the old primary which may not be present on the new primary. 
> Similarly it copies files under the data directory of the new primary which 
> may not be needed or which possibly could be junk files.

It's possible that postgres vendors have their own files/directories in
the data directory which they may not want to be overwritten by
pg_rewind.  Also, if the source server is compromised (somebody put in
some junk file) for whatever reason, nobody wants those files to pass
over to the target server.

> I would propose to have a couple of new command line arguments to pg_rewind. 
> One, a comma separated list of files which should be preserved on the old 
> primary, in other words which shouldn't be overwritten from the new primary.

+1 from my end to have a new pg_rewind option such as --skip-file-list
or --skip-list which is basically a list of files that pg_rewind will
not overwrite in the target directory.

> Second, a comma separated list of files which should be excluded while 
> copying files from the new primary onto the old primary.

I'm not sure how it is different from the above option
--skip-file-list or --skip-list?

Another idea I can think of is to be able to tell pg_rewind "don't
copy/bring in any non-postgres files/directories from the source server
to the target server".  This requires pg_rewind to know which
files/directories are postgres ones and which are not.  Probably, we
could define a static list of what constitutes postgres
files/directories, but this creates tight coupling with the core: say a
new directory or configuration file gets added to the core, and this
static list in pg_rewind needs to be updated.  Having said that,
initdb.c already has this sort of list [1]; we would need similar
structures, and probably another structure for files
(postgresql.auto.conf, postgresql.conf, pg_ident.conf, pg_hba.conf,
postmaster.opts, backup_label, standby.signal, recovery.signal, etc.).

Above option seems an overkill, but --skip-file-list or --skip-list is
definitely an improvement IMO.

[1] static const char *const subdirs[] = {
"global",
"pg_wal/archive_status",
"pg_commit_ts",
"pg_dynshmem",

Regards,
Bharath Rupireddy.




Re: Column Filtering in Logical Replication

2022-03-10 Thread Amit Kapila
On Fri, Mar 11, 2022 at 7:26 AM Tomas Vondra
 wrote:
>
> But this does not address tablesync.c :-( That still copies everything,
> because it decides to sync both rels (test_pub_part_1, test_pub_part_2),
> with its row filter. On older releases this would fail, because we'd
> start two workers:
>

Yeah, this is because of the existing problem where we sync both rels
instead of one. We have fixed some similar existing problems earlier.
Hou-San has reported a similar case in another email [1].

>
> But I find this really weird - I think it's reasonable to expect the
> sync to produce the same result as if the data was inserted and
> replicated, and this just violates that.
>
> Shouldn't tablesync calculate a list of relations in a way that prevents
> such duplicate / overlapping syncs?
>

Yes, I think it is better to fix it separately than to fix it along
with row filter or column filter work.

>
> In any case, this sync issue looks
> entirely unrelated to the column filtering patch.
>

Right.

[1] - 
https://www.postgresql.org/message-id/OS0PR01MB5716DC2982CC735FDE388804940B9%40OS0PR01MB5716.jpnprd01.prod.outlook.com

-- 
With Regards,
Amit Kapila.




Re: wal_compression=zstd

2022-03-10 Thread Michael Paquier
On Wed, Mar 09, 2022 at 07:14:11AM -0600, Justin Pryzby wrote:
> Anyway there's no compelling reason to not use the default.  If we were to use
> a non-default default, we'd have to choose between 1 and 2 (or some negative
> compression level).  My thinking was that zstd-1 would give the lowest-hanging
> fruits for zstd, while minimizing performance tradeoff, since WAL affects
> interactivity.  But choosing between 1 and 2 seems like bikeshedding.

Yeah, I have looked again at the patch today, and I saw no reason to
not apply it to give more options to the user as zstd or lz4 are both
good in their own ways.  So done, with the default level used.
--
Michael




Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

2022-03-10 Thread Kyotaro Horiguchi
Sorry, some minor non-syntactical corrections.

At Fri, 11 Mar 2022 11:38:22 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
> I played with this a bit, and would like to share some thoughts on it.
> 
> It seems too rigorous to me that pg_get_wal_records_info/stats()
> reject future LSNs as the end-LSN; I think emitting a WARNING or INFO
> and stopping at the real end-of-WAL is kinder to users.  I think the
> same about the restriction that the start and end LSN are required to
> be different.
> 
> The definition of end-LSN is fuzzy here.  If I feed a future LSN to
> the functions, they tell me the beginning of the current insertion
> point in the error message.  On the other hand, they don't accept that
> same value as the end-LSN.  I think it is right that they report the
> current insertion point, and they should take the end-LSN simply as
> the LSN at which to stop reading.
> 
> I think pg_get_wal_stats() is worth having, but I think it should be
> implemented in SQL.  Currently pg_get_wal_records_info() doesn't
> report FPI information, since pg_waldump doesn't, but it is collected
> internally (of course!) and easily exposed.  If we did that,
> pg_get_wal_stats() would reduce to the following SQL statement:
> 
> SELECT resource_manager AS resmgr,
>        count(*) AS N,
>        (count(*) * 100 / sum(count(*)) OVER tot)::numeric(5,2) AS "%N",
>        sum(total_length) AS "combined size",
>        (sum(total_length) * 100 / sum(sum(total_length)) OVER tot)::numeric(5,2) AS "%combined size",
>        sum(fpi_len) AS fpilen,
>        (sum(fpi_len) * 100 / sum(sum(fpi_len)) OVER tot)::numeric(5,2) AS "%fpilen"
>   FROM pg_get_wal_records_info('0/100', '0/175DD7f')
>  GROUP BY resource_manager
> WINDOW tot AS ()
>  ORDER BY "combined size" DESC;
> 
> The only difference from pg_waldump is that the statement above
> doesn't show lines for resource managers that aren't contained in the
> result of pg_get_wal_records_info().  But I don't think that matters.
> 
> 
> Sometimes the description field has very long (28kB) content.  It
> makes the output almost unreadable, and I had a hard time struggling
> with output full of '-'s.  I would like to have a default limit on the
> length of such potentially long fields, but I'm not sure we want that.
> 
> 
- The difference between pg_get_wal_record_info and _records_ other than
- the number of argument is the former accepts incorrect LSNs.

The discussion got somewhat confused after some twists and turns.  It
should read something like the following.

pg_get_wal_record_info and pg_get_wal_records_info are almost the same,
since the latter can also show a single record.  However, it is a bit
annoying to do that: besides not accepting the same LSN for start and
end, it shows no record when there is no record in the specified LSN
range.  And I don't see much usefulness in that behavior.

The following works,
 pg_get_wal_record_info('0/100');
 pg_get_wal_records_info('0/100');

but this doesn't
 pg_get_wal_records_info('0/100', '0/100');
> ERROR:  WAL start LSN must be less than end LSN

And the following shows no records.
 pg_get_wal_records_info('0/100', '0/101');
 pg_get_wal_records_info('0/100', '0/128');

But the following works
  pg_get_wal_records_info('0/100', '0/129');
> 0/128 | 0/0  |   0



> So I think we can consolidate the two functions as:
> 
> - pg_get_wal_records_info('0/100');
> 
>   (current behavior) find the first record and show all records
>   thereafter.
> 
> - pg_get_wal_records_info('0/100', '0/100');
> 
>   finds the first record since the start lsn and show it.
> 
> - pg_get_wal_records_info('0/100', '0/130');
> 
>   finds the first record since the start lsn then show records up to
>   the end-lsn.
> 
> 
> And about pg_get_raw_wal_record(): I don't see any use-case for the
> function alone on the SQL interface.  Even if we need to inspect broken
> WAL files, that needs profound knowledge of the WAL format and tools
> that don't work on the SQL interface.
> 
> However, like pageinspect, if we separate the WAL-record fetching and
> parsing, it could be considered useful.
> 
> pg_get_wal_records_info would be like:
> 
> SELECT * FROM pg_walinspect_parse(raw)
>  FROM (SELECT * FROM pg_walinspect_get_raw(start_lsn, end_lsn));
> 
> And pg_get_wal_stats would be like:
> 
> SELECT * FROM pg_walinpect_stat(pg_walinspect_parse(raw))
>  FROM (SELECT * FROM pg_walinspect_get_raw(start_lsn, end_lsn)));
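As written, the two sketches above are not valid SQL (a query cannot have two
FROM clauses); what I have in mind is something like the following, where the
function names and result columns are only hypothetical:

```sql
-- Hypothetical only: function names and result columns are assumptions.
-- Fetch raw records and parse them:
SELECT p.*
FROM pg_walinspect_get_raw('0/1000000', '0/2000000') r,
     LATERAL pg_walinspect_parse(r.raw) p;

-- Stats are then just an aggregation over the parsed records:
SELECT p.resource_manager, count(*) AS n, sum(p.total_length) AS total_bytes
FROM pg_walinspect_get_raw('0/1000000', '0/2000000') r,
     LATERAL pg_walinspect_parse(r.raw) p
GROUP BY p.resource_manager;
```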


Regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Column Filtering in Logical Replication

2022-03-10 Thread Amit Kapila
On Fri, Mar 11, 2022 at 12:44 AM Tomas Vondra
 wrote:
>
> On 3/10/22 04:09, Amit Kapila wrote:
> > On Wed, Mar 9, 2022 at 3:33 PM Amit Kapila  wrote:
> >>
> >> On Mon, Mar 7, 2022 at 8:48 PM Tomas Vondra
> >>  wrote:
> >>
> >>> OK, I reworked this to do the same thing as the row filtering patch.
> >>>
> >>
> >> Thanks, I'll check this.
> >>
> >
> > Some assorted comments:
> > =
> > 1. We don't need to send a column list for the old tuple in case of an
> > update (similar to delete). It is not required to apply a column
> > filter for those cases because we ensure that RI must be part of the
> > column list for updates and deletes.
>
> I'm not sure which part of the code does this refer to?
>

The below part:
@@ -464,11 +473,11 @@ logicalrep_write_update(StringInfo out,
TransactionId xid, Relation rel,
  pq_sendbyte(out, 'O'); /* old tuple follows */
  else
  pq_sendbyte(out, 'K'); /* old key follows */
- logicalrep_write_tuple(out, rel, oldslot, binary);
+ logicalrep_write_tuple(out, rel, oldslot, binary, columns);
  }

I think here instead of columns, the patch needs to send NULL as it is
already doing in logicalrep_write_delete.

> > 2.
> > + /*
> > + * Check if all columns referenced in the column filter are part of
> > + * the REPLICA IDENTITY index or not.
> >
> > I think this comment is reverse. The rule we follow here is that
> > attributes that are part of RI must be there in a specified column
> > list. This is used at two places in the patch.
>
> Yeah, you're right. Will fix.
>
> > 3. get_rel_sync_entry()
> > {
> > /* XXX is there a danger of memory leak here? beware */
> > + oldctx = MemoryContextSwitchTo(CacheMemoryContext);
> > + for (int i = 0; i < nelems; i++)
> > ...
> > }
> >
> > Similar to the row filter, I think we need to use
> > entry->cache_expr_cxt to allocate this. There are other usages of
> > CacheMemoryContext in this part of the code but I think those need to
> > be also changed and we can do that as a separate patch. If we do the
> > suggested change then we don't need to separately free columns.
>
> I agree a shorter-lived context would be better than CacheMemoryContext,
> but "expr" seems to indicate it's for the expression only, so maybe we
> should rename that.
>

Yeah, we can do that. How about rel_entry_cxt or something like that?
The idea is that eventually we should move a few other parts of
RelSyncEntry, like attrmap, that currently use CacheMemoryContext, under
this context.

> But do we really want a memory context for every
> single entry?
>

Any other better idea?

-- 
With Regards,
Amit Kapila.




Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

2022-03-10 Thread Kyotaro Horiguchi
At Thu, 10 Mar 2022 22:15:42 +0530, Bharath Rupireddy 
 wrote in 
> On Thu, Mar 10, 2022 at 1:52 PM Jeff Davis  wrote:
> >
> > On Wed, 2022-03-02 at 22:37 +0530, Bharath Rupireddy wrote:
> > >
> > > Attaching v6 patch set with above review comments addressed. Please
> > > review it further.
> 
> Thanks Jeff for reviewing it. I've posted the latest v7 patch-set
> upthread [1] which is having more simple-yet-useful-and-effective
> functions.
> 
> > * Don't issue WARNINGs or other messages for ordinary situations, like
> > when pg_get_wal_records_info() hits the end of WAL.
> 
> v7 patch-set [1] has no warnings, but the functions will error out if
> future LSN is specified.
> 
> > * It feels like the APIs that allow waiting for the end of WAL are
> > slightly off. Can't you just do pg_get_wal_records_info(start_lsn,
> > least(pg_current_wal_flush_lsn(), end_lsn)) if you want the non-waiting
> > behavior? Try to make the API more orthogonal, where a few basic
> > functions can be combined to give you everything you need, rather than
> > specifying extra parameters and issuing WARNINGs. I
> 
> v7 patch-set [1] onwards waiting mode has been removed for all of the
> functions, again to keep things simple-yet-useful-and-effective.
> However, we can always add new pg_walinspect functions that wait for
> future WAL in the next versions once basic stuff gets committed and if
> many users ask for it.
> 
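
To illustrate Jeff's point, the non-waiting behavior can be composed from the
basic function like this (just a sketch against the v7 signatures; the LSNs
are made up):

```sql
-- Cap the end LSN at the current flush LSN to avoid waiting for future WAL.
SELECT *
FROM pg_get_wal_records_info(
       '0/1000000',
       least(pg_current_wal_flush_lsn(), '0/2000000'::pg_lsn));
```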
> > * In the docs, include some example output. I don't see any output in
> > the tests, which makes sense because it's mostly non-deterministic, but
> > it would be helpful to see sample output of at least
> > pg_get_wal_records_info().
> 
> +1. Added for pg_get_wal_records_info and pg_get_wal_stats.
> 
> > * Is pg_get_wal_stats() even necessary, or can you get the same
> > information with a query over pg_get_wal_records_info()? For instance,
> > if you want to group by transaction ID rather than rmgr, then
> > pg_get_wal_stats() is useless.
> 
> Yes, you are right. pg_get_wal_stats provides WAL stats per resource
> manager, which is similar to pg_waldump with the --start, --end and --stats
> options. It provides more information than pg_get_wal_records_info and
> is a better way of getting stats than adding more columns to
> pg_get_wal_records_info, calculating percentages in SQL and having a
> group by clause. IMO, pg_get_wal_stats is more readable and useful.
> 
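
FWIW, to Jeff's point about grouping by transaction ID, something along these
lines should work directly over pg_get_wal_records_info (assuming it exposes
xid and total_length columns; the LSNs are made up):

```sql
-- Sketch: per-transaction stats derived from pg_get_wal_records_info
SELECT xid, count(*) AS records, sum(total_length) AS total_bytes
FROM pg_get_wal_records_info('0/1000000', '0/2000000')
GROUP BY xid
ORDER BY total_bytes DESC;
```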
> > * Would be nice to have a pg_wal_file_is_valid() or similar, which
> > would test that it exists, and the header matches the filename (e.g. if
> > it was recycled but not used, that would count as invalid). I think
> > pg_get_first_valid_wal_record_lsn() would make some cases look invalid
> > even if the file is valid -- for example, if a wal record spans many
> > wal segments, the segments might look invalid because they contain no
> > complete records, but the file itself is still valid and contains valid
> > wal data.
> 
> Actually I haven't tried testing a single WAL record spanning many WAL
> files yet (I'm happy to try it if someone suggests such a use-case). In
> that case too I assume pg_get_first_valid_wal_record_lsn() shouldn't
> have a problem because it just gives the next valid LSN and it's
> previous LSN using existing WAL reader API XLogFindNextRecord(). It
> opens up the WAL file segments using (some dots to connect -
> page_read/read_local_xlog_page, WALRead,
> segment_open/wal_segment_open). Thoughts?
> 
> I don't think it's necessary to have a function pg_wal_file_is_valid()
> that given a WAL file name as input checks whether a WAL file exists
> or not, probably not in the core (xlogfuncs.c) too. These kinds of
> functions can open up challenges in terms of user input validation and
> may cause unnecessary problems, please see some related discussion
> [2].
> 
> > * Is there a reason you didn't include the timeline ID in
> > pg_get_wal_records_info()?
> 
> I'm right now allowing the functions to read WAL from the current
> server's timeline which I have mentioned in the docs. The server's
> current timeline is available via pg_control_checkpoint()'s
> timeline_id. So, having timeline_id as a column doesn't make sense.
> Again this is to keep things simple-yet-useful-and-effective. However,
> we can add new pg_walinspect functions to read WAL from historic as
> well as current timelines in the next versions once basic stuff gets
> committed and if many users ask for it.
> 
> + 
> +  All the functions of this module will provide the WAL information using the
> +  current server's timeline ID.
> + 
> 
> > * Can we mark this extension 'trusted'? I'm not 100% clear on the
> > standards for that marker, but it seems reasonable for a database owner
> > with the right privileges might want to install it.
> 
> 'trusted' extensions concept is added by commit 50fc694 [3]. Since
> pg_walinspect deals with WAL, we strictly want to control who creates
> and can execute functions exposed by it, so I don't know if 'trusted'
> is a good idea here. Also, 

Re: Skipping logical replication transactions on subscriber side

2022-03-10 Thread Masahiko Sawada
On Thu, Mar 10, 2022 at 2:10 PM osumi.takami...@fujitsu.com
 wrote:
>
> On Wednesday, March 2, 2022 12:01 AM Masahiko Sawada  
> wrote:
> > I've attached an updated patch along with two patches for cfbot tests since 
> > the
> > main patch (0003) depends on the other two patches. Both
> > 0001 and 0002 patches are the same ones I attached on another thread[2].
> Hi, few comments on 
> v12-0003-Add-ALTER-SUBSCRIPTION-.-SKIP-to-skip-the-transa.patch.

Thank you for the comments.

>
>
> (1) doc/src/sgml/ref/alter_subscription.sgml
>
>
> +SKIP (  class="parameter">skip_option =  class="parameter">value ...
> +  ...After logical replication
> +  successfully skips the transaction or commits non-empty transaction,
> +  the LSN (stored in
> +  
> pg_subscription.subskiplsn)
> +  is cleared.  See  for
> +  the details of logical replication conflicts.
> + 
> ...
> +lsn (pg_lsn)
> +
> + 
> +  Specifies the commit LSN of the remote transaction whose changes 
> are to be skipped
> +  by the logical replication worker.  Skipping
> +  individual subtransactions is not supported.  Setting 
> NONE
> +  resets the LSN.
>
>
> I think we'll extend the SKIP option choices in the future besides the 'lsn' 
> option.
> Then, one sentence "After logical replication successfully skips the 
> transaction or commits non-empty
> transaction, the LSN .. is cleared" should be moved to the explanation of
> the 'lsn' section,
> if we think this behavior of resetting the LSN is unique to the 'lsn' option?

Hmm, I think that regardless of the type of option (e.g., relid, xid,
action, or whatever), resetting the specified value afterward is
specific to the SKIP command. The SKIP command should have an effect on
only the first non-empty transaction. Otherwise, we could end up leaving
the setting in place if the user mistakenly specifies the wrong one.
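
For reference, the usage documented in the patch is along these lines (the
subscription name is invented):

```sql
-- Skip the remote transaction whose commit LSN is 0/14C0378:
ALTER SUBSCRIPTION mysub SKIP (lsn = '0/14C0378');

-- subskiplsn is cleared automatically once that transaction is skipped
-- (or a non-empty transaction commits); it can also be reset by hand:
ALTER SUBSCRIPTION mysub SKIP (lsn = NONE);
```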

>
>
> (2) doc/src/sgml/catalogs.sgml
>
> + 
> +  
> +   subskiplsn pg_lsn
> +  
> +  
> +   Commit LSN of the transaction whose changes are to be skipped, if a 
> valid
> +   LSN; otherwise 0/0.
> +  
> + 
> +
>
> We need to cover the PREPARE that keeps causing errors on the subscriber.
> This would apply to the entire patch (e.g. the rename of skip_xact_commit_lsn)

Fixed.

>
> (3) apply_handle_commit_internal comments
>
>  /*
>   * Helper function for apply_handle_commit and apply_handle_stream_commit.
> + * Return true if the transaction was committed, otherwise return false.
>   */
>
> If we want to make the newly added line aligned with other functions in
> worker.c,
> we should insert one blank line before it?

This part is removed.

>
>
> (4) apply_worker_post_transaction
>
> I'm not sure if the current refactoring is good or not.
> For example, the current HEAD calls pgstat_report_stat(false)
> for a commit case if we are in a transaction in apply_handle_commit_internal.
> On the other hand, your refactoring calls pgstat_report_stat unconditionally
> for apply_handle_commit path. I'm not sure if there
> are many cases to call apply_handle_commit without opening a transaction,
> but is that acceptable ?
>
> Also, the name is a bit broad.
> How about making a function only for stopping and resetting LSN at this stage 
> ?

Agreed, it seems to be overkill. I'll revert that change.

>
>
> (5) comments for clear_subscription_skip_lsn
>
> How about changing the comment like below  ?
>
> From:
> Clear subskiplsn of pg_subscription catalog
> To:
> Clear subskiplsn of pg_subscription catalog with origin state update
>

Updated.

I'll submit an updated patch that incorporated comments I got so far
and is rebased to disable_on_error patch.

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: refactoring basebackup.c

2022-03-10 Thread Justin Pryzby
I'm getting errors from pg_basebackup when using both -D- and
--compress=server-*.  The issue seems to go away if I use --no-manifest.

$ ./src/bin/pg_basebackup/pg_basebackup -h /tmp -Ft -D- --wal-method none 
--compress=server-gzip >/dev/null ; echo $?
pg_basebackup: error: tar member has empty name
1

$ ./src/bin/pg_basebackup/pg_basebackup -h /tmp -Ft -D- --wal-method none 
--compress=server-gzip >/dev/null ; echo $?
NOTICE:  WAL archiving is not enabled; you must ensure that all required WAL 
segments are copied through other means to complete the backup
pg_basebackup: error: COPY stream ended before last file was finished
1




Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file

2022-03-10 Thread Nathan Bossart
On Thu, Mar 10, 2022 at 07:13:14PM -0500, Chapman Flack wrote:
> Looks like this change to an example in func.sgml is not quite right:
> 
> -postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());
> +postgres=# SELECT * FROM pg_walfile_name_offset(pg_backup_stop());
> 
> pg_backup_stop returns a record now, not just lsn. So this works for me:
> 
> +postgres=# SELECT * FROM pg_walfile_name_offset((pg_backup_stop()).lsn);

Ah, good catch.  I made this change in v7.  I considered doing something
like this

SELECT w.* FROM pg_backup_stop() b, pg_walfile_name_offset(b.lsn) w;

but I think your suggestion is simpler.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From 3283d2b85f38f46d1e2ada0e6c5ea59d8c8e9f9d Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Wed, 1 Dec 2021 23:50:49 +
Subject: [PATCH v7 1/1] remove exclusive backup mode

---
 doc/src/sgml/backup.sgml  | 222 +---
 doc/src/sgml/func.sgml|  99 +---
 doc/src/sgml/high-availability.sgml   |   6 +-
 doc/src/sgml/monitoring.sgml  |   4 +-
 doc/src/sgml/ref/pgupgrade.sgml   |   2 +-
 src/backend/access/transam/xlog.c | 493 ++
 src/backend/access/transam/xlogfuncs.c| 253 ++---
 src/backend/access/transam/xlogrecovery.c |   2 +-
 src/backend/catalog/system_functions.sql  |  18 +-
 src/backend/postmaster/postmaster.c   |  46 +-
 src/backend/replication/basebackup.c  |  20 +-
 src/bin/pg_basebackup/t/010_pg_basebackup.pl  |   4 +
 src/bin/pg_ctl/pg_ctl.c   |   4 +-
 src/bin/pg_rewind/filemap.c   |   6 +-
 src/include/access/xlog.h |   7 +-
 src/include/catalog/pg_control.h  |   2 +-
 src/include/catalog/pg_proc.dat   |  28 +-
 src/include/miscadmin.h   |   4 -
 src/test/perl/PostgreSQL/Test/Cluster.pm  |  56 +-
 .../t/010_logical_decoding_timelines.pl   |   4 +-
 20 files changed, 189 insertions(+), 1091 deletions(-)

diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
index 0d69851bb1..c8b914c1aa 100644
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -857,16 +857,8 @@ test ! -f /mnt/server/archivedir/000100A90065  cp pg_wal/0
 sequence, and that the success of a step is verified before
 proceeding to the next step.

-   
-Low level base backups can be made in a non-exclusive or an exclusive
-way. The non-exclusive method is recommended and the exclusive one is
-deprecated and will eventually be removed.
-   
-
-   
-Making a Non-Exclusive Low-Level Backup
 
- A non-exclusive low level backup is one that allows other
+ A low level backup allows other
  concurrent backups to be running (both those started using
  the same backup API and those started using
  ).
@@ -881,19 +873,19 @@ test ! -f /mnt/server/archivedir/000100A90065  cp pg_wal/0

 
  Connect to the server (it does not matter which database) as a user with
- rights to run pg_start_backup (superuser, or a user who has been granted
+ rights to run pg_backup_start (superuser, or a user who has been granted
  EXECUTE on the function) and issue the command:
 
-SELECT pg_start_backup('label', false, false);
+SELECT pg_backup_start(label => 'label', fast => false);
 
  where label is any string you want to use to uniquely
  identify this backup operation. The connection
- calling pg_start_backup must be maintained until the end of
+ calling pg_backup_start must be maintained until the end of
  the backup, or the backup will be automatically aborted.
 
 
 
- By default, pg_start_backup can take a long time to finish.
+ By default, pg_backup_start can take a long time to finish.
  This is because it performs a checkpoint, and the I/O
  required for the checkpoint will be spread out over a significant
  period of time, by default half your inter-checkpoint interval
@@ -905,10 +897,6 @@ SELECT pg_start_backup('label', false, false);
  issue an immediate checkpoint using as much I/O as available.
 
 
-
- The third parameter being false tells
- pg_start_backup to initiate a non-exclusive base backup.
-


 
@@ -926,7 +914,7 @@ SELECT pg_start_backup('label', false, false);
 
  In the same connection as before, issue the command:
 
-SELECT * FROM pg_stop_backup(false, true);
+SELECT * FROM pg_backup_stop(wait_for_archive => true);
 
  This terminates backup mode. On a primary, it also performs an automatic
  switch to the next WAL segment.  On a standby, it is not possible to
@@ -937,7 +925,7 @@ SELECT * FROM pg_stop_backup(false, true);
  ready to archive.
 
 
- The pg_stop_backup will return one row with three
+ The pg_backup_stop will return one row with three
  

Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file

2022-03-10 Thread Chapman Flack
On 03/09/22 19:06, Nathan Bossart wrote:
> Done.  I went ahead and added "label => 'label'" for consistency.

Looks like this change to an example in func.sgml is not quite right:

-postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());
+postgres=# SELECT * FROM pg_walfile_name_offset(pg_backup_stop());

pg_backup_stop returns a record now, not just lsn. So this works for me:

+postgres=# SELECT * FROM pg_walfile_name_offset((pg_backup_stop()).lsn);


Otherwise, all looks to be in good order.

Regards,
-Chap




Re: Concurrent deadlock scenario with DROP INDEX on partitioned index

2022-03-10 Thread Tom Lane
Jimmy Yih  writes:
> When dropping a partitioned index, the locking order starts with the
> partitioned table, then its partitioned index, then the partition
> indexes dependent on that partitioned index, and finally the dependent
> partition indexes' parent tables. This order allows a deadlock
> scenario to happen if for example an ANALYZE happens on one of the
> partition tables which locks the partition table and then blocks when
> it attempts to lock its index (the DROP INDEX has the index lock but
> cannot get the partition table lock).
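
For concreteness, the described scenario can be sketched like this (table and
index names invented):

```sql
-- Hypothetical reproduction of the reported deadlock:
CREATE TABLE p (a int) PARTITION BY RANGE (a);
CREATE TABLE p1 PARTITION OF p FOR VALUES FROM (0) TO (10);
CREATE INDEX p_idx ON p (a);    -- also creates a partition index on p1

-- Session 1: locks p, then p_idx, then p1's index, then tries to lock p1
DROP INDEX p_idx;

-- Session 2 (concurrently): locks p1, then blocks trying to lock p1's index
ANALYZE p1;

-- If session 2 takes its table lock between session 1's index-lock and
-- table-lock acquisitions, the two sessions deadlock.
```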

I agree this is a bug, and I think that you may have the right
general idea about how to fix it: acquire the necessary locks in
RangeVarCallbackForDropRelation.  However, this patch still needs
a fair amount of work.

1. RangeVarCallbackForDropRelation can get called more than once
during a lookup (in case of concurrent rename and suchlike).
If so it needs to be prepared to drop the lock(s) it got last time.
You have not implemented any such logic.  This doesn't seem hard
to fix, just store the OID list into struct DropRelationCallbackState.

2. I'm not sure you have fully thought through the interaction
with the subsequent "is_partition" stanza.   If the target is
an intermediate partitioned index, don't we need to acquire lock
on its parent index before starting to lock children?  (It may
be that that stanza is already in the wrong place relative to
the table-locking stanza it currently follows, but not sure.)

3. Calling find_all_inheritors with lockmode NoLock, and then
locking those relation OIDs later, is both unsafe and code-wasteful.
Just tell find_all_inheritors to get the locks you want.

4. This code will invoke find_all_inheritors on InvalidOid if
IndexGetRelation fails.  It needs to be within the if-test
just above.

5. Reading classform again at this point in the function is
not merely inefficient, but outright wrong, because we
already released the syscache entry.  Use the local variable.

6. You've not paid enough attention to updating existing comments,
particularly the header comment for RangeVarCallbackForDropRelation.

Actually though, maybe you *don't* want to do this in
RangeVarCallbackForDropRelation.  Because of point 2, it might be
better to run find_all_inheritors after we've successfully
identified and locked the direct target relation, ie do it back
in RemoveRelations.  I've not thought hard about that, but it's
attractive if only because it'd mean you don't have to fix point 1.

regards, tom lane




Re: trigger example for plsample

2022-03-10 Thread Chapman Flack
On 03/02/22 15:12, Mark Wong wrote:

> I've attached v2, which reduces the output:
> 
> * Removing the notices for the text body, and the "compile" message.
> * Replaced the notice for "compile" message with a comment as a
>   placeholder for where a compiling code or checking a cache may go.
> * Reducing the number of rows inserted into the table, thus reducing
>   the number of notice messages about which code path is taken.

I think the simplifying assumption of a simple interpreted language allows
a lot more of this code to go away. I mean more or less that whole first
PG_TRY...PG_END_TRY block, which could be replaced with a comment saying
something like

  The source text may be augmented here, such as by wrapping it as the
  body of a function in the target language, prefixing a parameter list
  with names like TD_name, TD_relid, TD_table_name, TD_table_schema,
  TD_event, TD_when, TD_level, TD_NEW, TD_OLD, and args, using whatever
  types in the target language are convenient. The augmented text can be
  cached in a longer-lived memory context, or, if the target language uses
  a compilation step, that can be done here, caching the result of the
  compilation.

That would leave only the later PG_TRY block where the function gets
"executed". That could stay largely as is, but should probably also have
a comment within it, something like

  Here the function (the possibly-augmented source text, or the result
  of compilation if the target language uses such a step) should be
  executed, after binding these values from the TriggerData struct to
  the expected parameters.

That should make the example shorter and clearer, and preserve the output
tested by the regression test. Trying to show much more than that involves
assumptions about what the PL's target language syntax looks like, how its
execution engine works and parameters are bound, and so on, and that is
likely to just be distracting, IMV.

Regards,
-Chap




Re: PG DOCS - logical replication filtering

2022-03-10 Thread Euler Taveira
On Fri, Mar 4, 2022, at 12:41 AM, Peter Smith wrote:
> PSA patch v3 to address all comments received so far.
Peter,

I started reviewing this documentation for row filter and I noticed that I was
changing too many lines to submit a patch on top of it. Hence, I'm
attaching a new version based on this one.

Here are some of the changes that I made:

* links: I renamed the ids to use "logical-replication-row-filter" instead of
  "logical-replication-rf" because it is used in the anchors. A compound word
  is better than an abbreviation.
* titles: I changed all titles because of stylistic issues (words are
  generally capitalized) or because they read better.
* sections: I moved the "Restrictions" section to the top because it seems
  more important than the other sections. I removed the "psql" section. The psql
  commands are shown in the "Examples" section so I don't think we should
  provide a section for it.
* sentences: I expanded several sentences to provide details about the specific
  topic. I also reordered some sentences and removed some duplicated sentences.
* replica identity: I added a link to replica identity. It is a key concept for
  row filter.
* transformations: I added a few sentences explaining when/why a transformation
  is required. I removed the "Cases" part (same as in the source code) because
  it is redundant with the new sentences.
* partitioned table: the title should be _partitioned_ table. I replaced the
  bullets with sentences in the same paragraph.
* initial data sync: I removed the "subscriber" from the section title. When
  you say "initial data synchronization" it seems clear we're talking about the
  subscriber. I also add a sentence saying why pre-15 does not consider row
  filters.
* combining row filters: I renamed the section and decided to remove "for the
  same table". When reading the first sentences from this section, it is clear
  that row filtering is per table. So if you are combining multiple row
  filters, it should be for the same table. I also added a sentence saying why
  some clauses make the row filter irrelevant.
* examples: I combined the psql commands that shows row filter information
together (\dRp+ and \d). I changed the connection string to avoid "localhost".
  Why? It does not seem a separate service (there is no port) and setup pub/sub
  in the same cluster requires additional steps. It is better to illustrate
  different clusters (at least it seems so since we don't provide details from
  publisher). I changed a value in an UPDATE because both UPDATEs use 999.

We could probably reduce the number of rows in the example but I didn't bother
to remove them.

It seems we can remove some sentences from the CREATE PUBLICATION page because
we have a new section that explains all of it. I think the link that was added
by this patch is sufficient.
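
For example (table and publication names invented), the syntax the new section
documents looks like:

```sql
-- The WHERE clause comes after the table name and must be enclosed in
-- parentheses:
CREATE PUBLICATION active_employees FOR TABLE employees
    WHERE (active IS TRUE);
```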


--
Euler Taveira
EDB   https://www.enterprisedb.com/
From aea0c822489fbdeba14141e215f76e3777521bb2 Mon Sep 17 00:00:00 2001
From: Euler Taveira 
Date: Thu, 10 Mar 2022 17:38:24 -0300
Subject: [PATCH v4] doc: new section for row filter.

---
 doc/src/sgml/logical-replication.sgml| 444 +++
 doc/src/sgml/ref/create_publication.sgml |   2 +
 2 files changed, 446 insertions(+)

diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml
index 82326c3901..38ac77b6e1 100644
--- a/doc/src/sgml/logical-replication.sgml
+++ b/doc/src/sgml/logical-replication.sgml
@@ -118,6 +118,8 @@
any combination of INSERT, UPDATE,
DELETE, and TRUNCATE, similar to how triggers are fired by
particular event types.  By default, all operation types are replicated.
+   (Row filters have no effect for TRUNCATE. See 
+   ).
   
 
   
@@ -317,6 +319,448 @@
   
  
 
+ 
+  Row Filter
+
+  
+   The published tables replicate all data to the subscribers. The replicated
+   data can be selected using a row filter. The user
+   might choose to use row filters for behavioral, security or performance
+   reasons.
+  
+
+  
+   If a published table sets a row filter, a row is replicated if its data
+   satisfies the row filter expression. It means if row filters are used in a
+   set of tables, data will be partially replicated.
+  
+
+  
+   The row filter is defined per table. Use a WHERE clause
+   after the table name for each published table that requires data to be
+   filtered out. The WHERE clause must be enclosed by
+   parentheses. See  for details.
+  
+
+  
+   Row Filter Rules
+
+   
+Row filter is applied before publishing the changes.
+   
+
+   
+If the row filter evaluates to false or
+NULL then the row is not replicated.
+   
+
+   
+The WHERE clause expression is evaluated with the same
+role used for the replication connection. It means the role specified in the
+CONNECTION clause of the .
+   
+
+   
+Row filter has no effect for TRUNCATE command.
+   
+
+  
+
+  
+   Expression Restrictions
+
+   
+The WHERE clause allows only simple 

Re: [Proposal] vacuumdb --schema only

2022-03-10 Thread Robert Treat
On Thu, Mar 10, 2022 at 1:32 AM Gilles Darold  wrote:
>
> Le 09/03/2022 à 22:10, Justin Pryzby a écrit :
>
> On Mon, Mar 07, 2022 at 08:38:04AM +0100, Gilles Darold wrote:
>
> Maybe it's clearer to write this with =ANY() / != ALL() ?
> See 002.
>
> I have applied your changes and produced a new version v3 of the patch,
> thanks for the improvements. The patch have been added to commitfest
> interface, see here https://commitfest.postgresql.org/38/3587/
>
> I wondered whether my patches were improvements, and it occurred to me that
> your patch didn't fail if the specified schema didn't exist.  That's arguably
> preferable, but that's the pre-existing behavior for tables.  So I think the
> behavior of my patch is more consistent.
>
> +1
>

+1 for consistency.

Robert Treat
https://xzilla.net




Re: Changing "Hot Standby" to "hot standby"

2022-03-10 Thread Robert Treat
On Thu, Mar 10, 2022 at 8:45 AM Daniel Westermann (DWE)
 wrote:
>
> >>>Hmm.  Outside the title that had better use upper-case characters for
> >>>the first letter of each word, I can see references to the pattern you
> >>>are trying to eliminate in amcheck.sgml (1), config.sgml (3),
> >>>protocol.sgml (3) and mvcc.sgml (1).  Shouldn't you refresh these as
> >>>well if the point is to make the full set of docs consistent?
>
> >>>As of the full tree, I can see that:
> >>>
> >>$ git grep "hot standby" | wc -l
> >>259
>
> >>$ git grep "Hot Standby" | wc -l
> >>73
>
> >>>So there is a trend for one of the two.
>
> >>Thanks for looking at it. Yes, I am aware there are other places which 
> >>would need to be changed and I think I mentioned that in an >>earlier 
> >>Email. Are you suggesting to change all at once? I wanted to start with the 
> >>documentation and then continue with the other >>places.
>
> >Attached a new version which also modifies amcheck.sgml, config.sgml, 
> >protocol.sgml, and mvcc.sgml accordingly.
>
> Regards
> Daniel
>
>
> From: Daniel Westermann (DWE) 
> Sent: Wednesday, March 9, 2022 15:15
> To: Michael Paquier 
> Cc: Robert Treat ; Kyotaro Horiguchi 
> ; aleksan...@timescale.com 
> ; pgsql-hackers@lists.postgresql.org 
> 
> Subject: Re: Changing "Hot Standby" to "hot standby"
>
> Sending this again as my last two mails did not seem to reach the archives or 
> the commitfest. Or do they need moderation somehow?
>

Not sure why the previous emails didn't go through, and it still doesn't
look like they were picked up. In the interest of progress though, I'm
attaching an updated patch with some minor wordsmithing; let me know if you'd
prefer this done differently.

Robert Treat
https://xzilla.net


align_hot_standby_v6-rt.patch
Description: Binary data


Re: Add index scan progress to pg_stat_progress_vacuum

2022-03-10 Thread Nathan Bossart
On Thu, Mar 10, 2022 at 09:30:57PM +, Imseih (AWS), Sami wrote:
> Attached v4 which includes accounting for the hash size on startup, removal 
> of the no longer needed comment in pgstatfuncs.c and a change in both 
> code/docs to only reset the indexes_total to 0 when failsafe is triggered.

Thanks for the new patch set.

+/*
+ * Structs for tracking shared Progress information
+ * amongst worker ( and leader ) processes of a vacuum.
+ */

nitpick: Can we remove the extra spaces in the parentheses?

+if (entry != NULL)
+values[PGSTAT_NUM_PROGRESS_COMMON + PROGRESS_VACUUM_INDEXES_COMPLETED] 
= entry->indexes_processed;

What does it mean if there isn't an entry in the map?  Is this actually
expected, or should we ERROR instead?

+/* vacuum worker progress hash table */
+max_table_size = GetMaxBackends();
+size = add_size(size, hash_estimate_size(max_table_size,
+ sizeof(VacProgressEntry)));

I think the number of entries should be shared between
VacuumWorkerProgressShmemInit() and VacuumWorkerProgressShmemSize().
Otherwise, we might update one and not the other.

+/* Call the command specific function to override datum values */
+if (pg_strcasecmp(cmd, "VACUUM") == 0)
+set_vaccum_worker_progress(values);

I think we should elaborate a bit more in this comment.  It's difficult to
follow what this is doing without referencing the comment above
set_vacuum_worker_progress().

IMO the patches are in decent shape, and this should likely be marked as
ready-for-committer in the near future.  Before doing so, I think we should
check that Sawada-san is okay with moving the deeper infrastructure changes
to a separate thread.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: ltree_gist indexes broken after pg_upgrade from 12 to 13

2022-03-10 Thread Tom Lane
Alexander Korotkov  writes:
> Sorry for publishing a patch, which doesn't even compile.
> The attached version should be good.

OK, better.

So to clarify: we are going to back-patch as far as v13, and
then in the minor releases we are going to tell people they
need to REINDEX all gist indexes on ltree?  If so, please
make that clear in the commit message, so I don't forget it
come May ;-)

Also, it looks like reindex is *not* required for indexes on
ltree[] ?  That is, gist_ltree_ops indexes are affected but
not gist__ltree_ops?
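For anyone following along at home, a hedged sketch of how an affected
installation might locate candidate indexes once this is released (opclass
name taken from the discussion above; verify against your own catalogs
before relying on it):

```sql
-- Sketch only: list gist indexes whose key uses the gist_ltree_ops
-- opclass, so they can be reindexed after the upgrade.
SELECT i.indexrelid::regclass AS index_name
FROM pg_index i
JOIN pg_class c ON c.oid = i.indexrelid
JOIN pg_am am ON am.oid = c.relam
JOIN pg_opclass oc ON oc.oid = ANY (i.indclass)
WHERE am.amname = 'gist'
  AND oc.opcname = 'gist_ltree_ops';

-- Then, for each index reported above:
-- REINDEX INDEX <index_name>;
```

The query intentionally filters on opclass rather than on the ltree type,
matching the observation that gist__ltree_ops (ltree[]) may not be affected.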

regards, tom lane




Re: role self-revocation

2022-03-10 Thread Mark Dilger



> On Mar 10, 2022, at 2:01 PM, Robert Haas  wrote:
> 
> It sounds like you prefer a behavior where CREATEROLE gives power over
> all non-superusers, but that seems pretty limiting to me. Why can't
> someone want to create a user with power over some users but not
> others?

I agree with Robert on this.

Over at [1], I introduced a patch series to (a) change CREATEROLE and (b) 
introduce role ownership.  Part (a) wasn't that controversial.  The patch 
series failed to make it for postgres 15 on account of (b).  The patch didn't 
go quite far enough, but with it applied, this is an example of a mini-superuser 
"lord" operating within database "fiefdom":

fiefdom=# -- mini-superuser who can create roles and write all data
fiefdom=# CREATE ROLE lord
fiefdom-# WITH CREATEROLE
fiefdom-# IN ROLE pg_write_all_data;
CREATE ROLE
fiefdom=# 
fiefdom=# -- group which "lord" belongs to
fiefdom=# CREATE GROUP squire
fiefdom-# ROLE lord;
CREATE ROLE
fiefdom=# 
fiefdom=# -- group which "lord" has no connection to
fiefdom=# CREATE GROUP paladin;
CREATE ROLE
fiefdom=# 
fiefdom=# SET SESSION AUTHORIZATION lord;
SET
fiefdom=> 
fiefdom=> -- fail, merely a member of "squire"
fiefdom=> CREATE ROLE peon IN ROLE squire;
ERROR:  must have admin option on role "squire"
fiefdom=> 
fiefdom=> -- fail, no privilege to grant CREATEDB 
fiefdom=> CREATE ROLE peon CREATEDB;
ERROR:  must have createdb privilege to create createdb users
fiefdom=> 
fiefdom=> RESET SESSION AUTHORIZATION;
RESET
fiefdom=# 
fiefdom=# -- grant admin over "squire" to "lord"
fiefdom=# GRANT squire
fiefdom-# TO lord
fiefdom-# WITH ADMIN OPTION;
GRANT ROLE
fiefdom=# 
fiefdom=# SET SESSION AUTHORIZATION lord;
SET
fiefdom=> 
fiefdom=> -- ok, have both "CREATEROLE" and admin option for "squire"
fiefdom=> CREATE ROLE peon IN ROLE squire;
CREATE ROLE
fiefdom=> 
fiefdom=> -- fail, no privilege to grant CREATEDB
fiefdom=> CREATE ROLE peasant CREATEDB IN ROLE squire;
ERROR:  must have createdb privilege to create createdb users
fiefdom=> 
fiefdom=> RESET SESSION AUTHORIZATION;
RESET
fiefdom=# 
fiefdom=# -- Give lord the missing privilege
fiefdom=# GRANT CREATEDB TO lord;
ERROR:  role "createdb" does not exist
fiefdom=# 
fiefdom=# RESET SESSION AUTHORIZATION;
RESET
fiefdom=# 
fiefdom=# -- ok, have "CREATEROLE", "CREATEDB", and admin option for "squire"
fiefdom=# CREATE ROLE peasant CREATEDB IN ROLE squire;
CREATE ROLE

The problem with this is that "lord" needs CREATEDB to grant CREATEDB, but 
really it should need something like grant option on "CREATEDB".  But that's 
hard to do with the existing system, given the way these privilege bits are 
represented.  If we added a few more built-in pg_* roles, such as pg_create_db, 
it would just work.  CREATEROLE itself could be reimagined as pg_create_role, 
and then users could be granted into this role with or without admin option, 
meaning they could/couldn't further give it away.  I think that would be a 
necessary component to Joshua's "bot" use-case, since the bot must itself have 
the privilege to create roles, but shouldn't necessarily be trusted with the 
privilege to create additional roles who have it.
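To make that concrete, here is a purely hypothetical sketch — neither
pg_create_role nor pg_create_db exists today — of how the bot use-case might
look if these attribute bits were reimagined as predefined roles:

```sql
-- Hypothetical: the bot can create roles, but (no ADMIN OPTION) cannot
-- pass that ability on to the roles it creates.
GRANT pg_create_role TO bot;

-- Hypothetical: an administrator who may both use these abilities and
-- re-grant them to others.
GRANT pg_create_role TO lord WITH ADMIN OPTION;
GRANT pg_create_db   TO lord WITH ADMIN OPTION;
```

The WITH ADMIN OPTION syntax is what we already use for role membership; the
only new pieces here are the invented role names.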

[1] 
https://www.postgresql.org/message-id/53c7df4c-8463-4647-9dfd-779b5e186...@amazon.com

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: role self-revocation

2022-03-10 Thread Tom Lane
Robert Haas  writes:
> Probably easier to just say it again: I want to have users that can
> create roles and then have superuser-like powers with respect to those
> roles. They can freely exercise the privileges of those roles, and
> they can do all the things that a superuser can do but only with
> respect to those roles.

This seems reasonable in isolation, but

(1) it implies a persistent relationship between creating and created
roles.  Whether you want to call that ownership or not, it sure walks
and quacks like ownership.

(2) it seems exactly contradictory to your later point that

> Agree. I also think that it would be a good idea to attribute grants
> performed by any superuser to the bootstrap superuser, or leave them
> unattributed somehow. Because otherwise dropping superusers becomes a
> pain in the tail for no good reason.

Either there's a persistent relationship or there's not.  I don't
think it's sensible to treat superusers differently here.

I think that this argument about the difficulty of dropping superusers
may in fact be the motivation for the existing behavior that object-
permissions GRANTs done by superusers are attributed to the object
owner; something you were unhappy about upthread.

In the end these requirements seem mutually contradictory.  Either
we can have a persistent ownership relationship or not, but I don't
think we can have it apply in some cases and not others without
creating worse problems than we solve.  I'm inclined to toss overboard
the requirement that superusers need to be an easy thing to drop.
Why is that important, anyway?

> We might also need to think carefully about what happens if for
> example the table owner is changed. If bob owns the table and we
> change the owner to mary, but bob's previous grants are still
> attributed to bob, I'm not sure that's going to be very convenient.

That's already handled, is it not?

regression=# create user alice;
CREATE ROLE
regression=# create user bob;
CREATE ROLE
regression=# create user charlie;
CREATE ROLE
regression=# \c - alice
You are now connected to database "regression" as user "alice".
regression=> create table alices_table (f1 int);
CREATE TABLE
regression=> grant select on alices_table to bob;
GRANT
regression=> \c - postgres
You are now connected to database "regression" as user "postgres".
regression=# alter table alices_table owner to charlie;
ALTER TABLE
regression=# \dp alices_table
                               Access privileges
 Schema |     Name     | Type  |    Access privileges    | Column privileges | Policies 
--------+--------------+-------+-------------------------+-------------------+----------
 public | alices_table | table | charlie=arwdDxt/charlie+|                   | 
        |              |       | bob=r/charlie           |                   | 
(1 row)

I'm a bit disturbed that parts of this discussion seem to be getting
conducted with little understanding of the system's existing behaviors.
We should not be reinventing things we already have perfectly good
solutions for in the object-privileges domain.

regards, tom lane




Re: role self-revocation

2022-03-10 Thread David G. Johnston
On Thu, Mar 10, 2022 at 3:01 PM Robert Haas  wrote:

> On Thu, Mar 10, 2022 at 4:00 PM David G. Johnston
>  wrote:
> > I dislike changing the documented behavior of CREATEROLE to the degree
> suggested here.  However, there are three choices here, only one of which
> can be chosen:
> >
> > 1. Leave CREATEROLE alone entirely
> > 2. Make it so CREATEROLE cannot assign membership to the predefined
> roles or superuser (inheritance included), but leave the rest alone.  This
> would be the hard-coded version, not the role attribute one.
> > 3. Make it so CREATEROLE can only assign membership to roles for which
> it has been made an admin; as well as the other things mentioned
> >
> > Moving forward I'd prefer options 1 or 2, leaving the ability to
> create/alter/drop a role to be vested via predefined roles.
>
> It sounds like you prefer a behavior where CREATEROLE gives power over
> all non-superusers, but that seems pretty limiting to me.
>

Doh!  I edited out the part where I made clear I considered options 1 and 2
as basically being done for a limited period of time while deprecating the
CREATEROLE attribute altogether in favor of the fine-grained and predefined
role based permission granting.  I don't want to nerf CREATEROLE as part of
adding this new feature, instead leave it as close to status quo as
reasonable so as not to mess up existing setups that make use of it.  We
can note in the release notes and documentation that we consider CREATEROLE
to be deprecated and that the new predefined role should be used to give a
user the ability to create/alter/drop roles, etc...  DBAs should consider
revoking CREATEROLE from their users and granting them proper memberships
in the predefined roles and the groups those roles should be administering.

David J.


Re: range_agg with multirange inputs

2022-03-10 Thread Chapman Flack
On 03/05/22 15:53, Paul Jungwirth wrote:
> On 3/1/22 13:33, Chapman Flack wrote:
>> I think the 4 lines should suffice, but it looks like this patch was
>> generated from a rebase of the old one (with three lines) that ended up
>> putting the new 'range_agg' entry ahead of 'max' in func.sgml, which
>> position is now baked into the 4 lines of context. :)
> 
> You're right, my last rebase messed up the docs. Here it is fixed. Sorry
> about that!

When I apply this patch, I get a func.sgml with two entries for
range_intersect_agg(anymultirange).

> I like the elog solution. I've changed them in both places.

It looks like you've now got elog in three places: the "must be called
with a range or multirange" in multirange_agg_transfn and
multirange_intersect_agg_transfn, and the "called in non-aggregate
context" in multirange_agg_transfn.

I think that last is also ok, given that its state type is internal,
so it shouldn't be reachable in a user call.

In range_agg_transfn, you've changed the message in the "must be called
with a range or multirange"; that seems like another good candidate to
be an elog.

> I see 13 other shared finalfns (using select array_agg(aggfnoid::regproc) as
> procs, array_agg(aggtransfn) as transfns, aggfinalfn from pg_aggregate where
> aggfinalfn is distinct from 0 group by aggfinalfn having count(*) > 1;) but
> a comment can't hurt! Added.

I think your query finds aggregate declarations that share the same SQL
function declaration as their finalizer functions. That seems to be more
common.

The query I used looks for cases where different SQL-declared functions
appear as finalizers of aggregates, but the different SQL declared functions
share the same internal C implementation. That's the query where this seems
to be the unique result.

WITH
 finals(regp) AS (
  SELECT DISTINCT
CAST(aggfinalfn AS regprocedure)
   FROM
pg_aggregate
   WHERE
aggfinalfn <> 0 -- InvalidOid
 )
SELECT
  prosrc, array_agg(regp)
 FROM
  pg_proc, finals
 WHERE
  oid = regp AND prolang = 12 -- INTERNALlanguageId
 GROUP BY
  prosrc
 HAVING
  count(*) > 1;

In other words, I think the interesting thing to say in the C comment
is not "shared by range_agg(anyrange) and range_agg(anymultirange)", but
"shared by range_agg_finalfn(internal,anyrange) and
multirange_agg_finalfn(internal,anymultirange)".

It seems a little extra surprising to have one C function declared in SQL
with two different names and parameter signatures. It ends up working
out because it relies on get_fn_expr_rettype, which can do its job for
either polymorphic type it might find in the parameter declaration.
But that's a bit subtle. :)
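For reference, the pattern under discussion looks roughly like this at the
SQL level (declarations paraphrased for illustration, not copied from the
patch):

```sql
-- Two SQL-level declarations sharing one C symbol; the C code calls
-- get_fn_expr_rettype() to learn which polymorphic signature it was
-- invoked through.
CREATE FUNCTION range_agg_finalfn(internal, anyrange)
  RETURNS anymultirange
  AS 'MODULE_PATHNAME', 'range_agg_finalfn'
  LANGUAGE C IMMUTABLE;

CREATE FUNCTION multirange_agg_finalfn(internal, anymultirange)
  RETURNS anymultirange
  AS 'MODULE_PATHNAME', 'range_agg_finalfn'
  LANGUAGE C IMMUTABLE;
```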

Regards,
-Chap




Re: Adding CI to our tree

2022-03-10 Thread Justin Pryzby
See attached, or at
https://github.com/justinpryzby/postgres/runs/5503079878
From c631c3d9bdb8325aaaecc5dcdfac46eca7bd907a Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Mon, 17 Jan 2022 00:53:04 -0600
Subject: [PATCH 1/7] cirrus: include hints how to install OS packages..

This is useful for patches during development, but once a feature is merged,
new libraries should be added to the OS image files, rather than installed
during every CI run forever into the future.
---
 .cirrus.yml | 20 
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/.cirrus.yml b/.cirrus.yml
index 40854046d66..6026a973dbb 100644
--- a/.cirrus.yml
+++ b/.cirrus.yml
@@ -73,10 +73,12 @@ task:
 chown -R postgres:postgres .
 mkdir -p ${CCACHE_DIR}
 chown -R postgres:postgres ${CCACHE_DIR}
-  setup_cores_script: |
+  setup_core_files_script: |
 mkdir -m 770 /tmp/cores
 chown root:postgres /tmp/cores
 sysctl kern.corefile='/tmp/cores/%N.%P.core'
+  setup_additional_packages_script: |
+#pkg install -y ...
 
   # NB: Intentionally build without --with-llvm. The freebsd image size is
   # already large enough to make VM startup slow, and even without llvm
@@ -180,10 +182,13 @@ task:
 chown -R postgres:postgres ${CCACHE_DIR}
 echo '* - memlock 134217728' > /etc/security/limits.d/postgres.conf
 su postgres -c "ulimit -l -H && ulimit -l -S"
-  setup_cores_script: |
+  setup_core_files_script: |
 mkdir -m 770 /tmp/cores
 chown root:postgres /tmp/cores
 sysctl kernel.core_pattern='/tmp/cores/%e-%s-%p.core'
+  setup_additional_packages_script: |
+#apt-get update
+#apt-get -y install ...
 
   configure_script: |
 su postgres <<-EOF
@@ -237,7 +242,7 @@ task:
 ulimit -a -H && ulimit -a -S
 export
 
-  setup_cores_script:
+  setup_core_files_script:
 - mkdir ${HOME}/cores
 - sudo sysctl kern.corefile="${HOME}/cores/core.%P"
 
@@ -254,7 +259,7 @@ task:
   # packages do not need to be downloaded.
   homebrew_cache:
 folder: $HOMEBREW_CACHE
-  homebrew_install_script: |
+  setup_additional_packages_script: |
 brew install \
   ccache \
   icu4c \
@@ -389,6 +394,9 @@ task:
 powershell -Command get-psdrive -psprovider filesystem
 set
 
+  setup_additional_packages_script: |
+REM choco install -y --no-progress ...
+
   configure_script:
 # copy errors out when using forward slashes
 - copy src\tools\ci\windows_build_config.pl src\tools\msvc\config.pl
@@ -484,6 +492,10 @@ task:
   ccache_cache:
 folder: $CCACHE_DIR
 
+  setup_additional_packages_script: |
+#apt-get update
+#apt-get -y install ...
+
   ###
   # Test that code can be built with gcc/clang without warnings
   ###
-- 
2.17.1

From 9e5df76468c4ecc8f411f106c86d54c1f06f70f3 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Sun, 27 Feb 2022 15:17:50 -0600
Subject: [PATCH 2/7] cirrus: compile with -Og..

To improve performance of check-world, and improve debugging, without
significantly slower builds (they're cached anyway).

This makes freebsd check-world run in 8.5 minutes rather than 15 minutes.
And on linux mitigates the run-time performance effect of --coverage.
---
 .cirrus.yml | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/.cirrus.yml b/.cirrus.yml
index 6026a973dbb..5179353dc9d 100644
--- a/.cirrus.yml
+++ b/.cirrus.yml
@@ -107,7 +107,7 @@ task:
 \
 CC="ccache cc" \
 CXX="ccache c++" \
-CFLAGS="-O0 -ggdb"
+CFLAGS="-Og -ggdb"
 EOF
   build_script: su postgres -c "gmake -s -j${BUILD_JOBS} world-bin"
   upload_caches: ccache
@@ -201,8 +201,8 @@ task:
 CC="ccache gcc" \
 CXX="ccache g++" \
 CLANG="ccache clang" \
-CFLAGS="-O0 -ggdb" \
-CXXFLAGS="-O0 -ggdb"
+CFLAGS="-Og -ggdb" \
+CXXFLAGS="-Og -ggdb"
 EOF
   build_script: su postgres -c "make -s -j${BUILD_JOBS} world-bin"
   upload_caches: ccache
@@ -315,8 +315,8 @@ task:
   CC="ccache cc" \
   CXX="ccache c++" \
   CLANG="ccache ${brewpath}/llvm/bin/ccache" \
-  CFLAGS="-O0 -ggdb" \
-  CXXFLAGS="-O0 -ggdb" \
+  CFLAGS="-Og -ggdb" \
+  CXXFLAGS="-Og -ggdb" \
   \
   LLVM_CONFIG=${brewpath}/llvm/bin/llvm-config \
   PYTHON=python3
-- 
2.17.1

From 92f64c297628ebb7026b6b61795444ea078720d2 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Sun, 20 Feb 2022 15:01:59 -0600
Subject: [PATCH 3/7] cirrus/windows: add compiler_warnings_script

I'm not sure how to write this test in windows shell; it's also not easy to
write it in posix sh, since windows shell is somehow interpretting && and ||...

ci-os-only: windows

https://cirrus-ci.com/task/6183879907213312
https://cirrus-ci.com/task/4876271443247104
---
 .cirrus.yml|  9 -
 src/tools/ci/windows-compiler-warnings | 16 
 2 files changed, 24 insertions(+), 1 deletion(-)
 create mode 100755 

Re: role self-revocation

2022-03-10 Thread Robert Haas
On Thu, Mar 10, 2022 at 5:00 PM Bruce Momjian  wrote:
> On Thu, Mar 10, 2022 at 02:22:05PM -0500, Robert Haas wrote:
> > I mean, I didn't design pg_hba.conf, but I think it's part of the
> > database doing a reasonable thing, not an external system doing a
> > nonsensical thing.
>
> FYI, I think pg_hba.conf gets away with having negative/reject
> permissions only because it is strictly ordered.

I agree.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: role self-revocation

2022-03-10 Thread Robert Haas
On Thu, Mar 10, 2022 at 4:00 PM David G. Johnston
 wrote:
> I dislike changing the documented behavior of CREATEROLE to the degree 
> suggested here.  However, there are three choices here, only one of which can 
> be chosen:
>
> 1. Leave CREATEROLE alone entirely
> 2. Make it so CREATEROLE cannot assign membership to the predefined roles or 
> superuser (inheritance included), but leave the rest alone.  This would be 
> the hard-coded version, not the role attribute one.
> 3. Make it so CREATEROLE can only assign membership to roles for which it has 
> been made an admin; as well as the other things mentioned
>
> Moving forward I'd prefer options 1 or 2, leaving the ability to 
> create/alter/drop a role to be vested via predefined roles.

It sounds like you prefer a behavior where CREATEROLE gives power over
all non-superusers, but that seems pretty limiting to me. Why can't
someone want to create a user with power over some users but not
others? For example, the superuser might want to give alice the
ability to set up new users in the accounting department, but NOT give
alice the right to tinker with the backup user (who is not a
superuser and doesn't have the replication privilege). How would they
accomplish that in your view?
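One way to picture the desired end state (a sketch: the grants are valid
syntax today, but the scoped-CREATEROLE behavior is the proposal, not
current behavior):

```sql
CREATE ROLE backup LOGIN;            -- not a superuser; managed separately
CREATE ROLE accounting_admins;       -- the group alice should manage
CREATE ROLE alice CREATEROLE LOGIN;
GRANT accounting_admins TO alice WITH ADMIN OPTION;

-- Proposed behavior: alice could then do
--   CREATE ROLE new_accountant IN ROLE accounting_admins;
-- but would have no power over "backup".
```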

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: role self-revocation

2022-03-10 Thread Bruce Momjian
On Thu, Mar 10, 2022 at 02:22:05PM -0500, Robert Haas wrote:
> I mean, I didn't design pg_hba.conf, but I think it's part of the
> database doing a reasonable thing, not an external system doing a
> nonsensical thing.

FYI, I think pg_hba.conf gets away with having negative/reject
permissions only because it is strictly ordered.
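A minimal illustration (hypothetical addresses, not from any real setup) of
why the ordering is what makes reject entries workable:

```
# pg_hba.conf fragment: records are matched first to last, so the
# reject only works because it precedes the broader allow.  Swap the
# two lines and 10.0.0.5 is let in by the /8 rule instead.
host  all  all  10.0.0.5/32  reject
host  all  all  10.0.0.0/8   scram-sha-256
```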

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: role self-revocation

2022-03-10 Thread Robert Haas
On Thu, Mar 10, 2022 at 3:41 PM Stephen Frost  wrote:
> > Gosh, I feel like I've spelled that out approximately 463,121 times
> > already. That estimate might be slightly off though; I've been known
> > to make mistakes from time to time
>
> If there's a specific message that details it closely on the lists
> somewhere, I'm happy to go review it.  I admit that I didn't go back and
> look for such.

Probably easier to just say it again: I want to have users that can
create roles and then have superuser-like powers with respect to those
roles. They can freely exercise the privileges of those roles, and
they can do all the things that a superuser can do but only with
respect to those roles. They cannot break out to the OS. I think it's
pretty similar to what you are describing, with a couple of possible
exceptions. For example, would you imagine that being an admin of a
login role would let you change that user's password? Because that
would be desirable behavior from where I sit.

> Errr, just to be clear, ALTER ROLE doesn't exist *in the spec*.  I
> wasn't suggesting that we get rid of it, just that it doesn't exist in
> the spec and therefore the spec doesn't have anything to say about it.

Oh, OK.

> > Basically, in this sketch, ADMIN OPTION on a role involves the ability
> > to DROP it, which means we don't need a separate role owner concept.
>
> Right.  The above doesn't include any specifics about what to do with
> ALTER ROLE, but my thought would be to have it also be under ADMIN
> OPTION rather than under CREATEROLE, as I tried to outline (though not
> very well, I'll admit) below.

This sentence really confused me at first, but I think you're saying
that the right to alter a role would be dependent on having ADMIN
OPTION on the role rather than on having the CREATEROLE attribute.
That seems like a reasonable idea to me.

> > It also involves membership, meaning that you can freely exercise the
> > privileges of the role without SET ROLE. While I'm totally down with
> > having other possible behaviors as options, that particular behavior
> > seems very useful to me, so, sounds great.
>
> Well, yes and no- by default you're right, presuming everything is set
> as inheirited, but I'd wish for us to keep the option of creating roles
> which are noinherit and having that work just as it does today.

Hmm, so if I have membership WITH ADMIN OPTION in a role, but my role
is marked NOINHERIT, that means I can't exercise the privileges of
that role without SET ROLE. But, can I still do other things to that
role, such as dropping it? Given the current coding of
roles_is_member_of(), it seems like I can't. I don't like that, but
then I don't like much of anything about NOINHERIT. Do you have any
suggestions for how this could be improved?

To make this more concrete, suppose the superuser does "CREATE USER
alice CREATEROLE". Alice will have INHERIT, so she'll have control
over any roles she creates. But if she does "CREATE USER bob
CREATEROLE NOINHERIT" then neither she nor Bob will be able to control
the roles bob creates. I'd like to have a way to make it so that
neither Alice nor any other CREATEROLE users she spins up can create
roles over which they no longer have control. Because otherwise people
will do dumb stuff like that and then have to call the superuser to
sort it out, and the superuser won't like that because s/he is a super
busy person.

> > What do you mean by the 'drop role' exception?
>
> 'ability' was probably a better word there.  What I'm talking about is
> changing in DropRole:
>
> to be, more or less:
>
> if (!is_admin_of_role(role))
> ereport(ERROR,
> (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
>  errmsg("permission denied to drop role")));

Sounds good.

> > I don't like removing 'alter role'.
>
> Ditto above but for AlterRole.  Taking it away from users with
> CREATEROLE being able to run those commands on $anyrole and instead
> making it so that the role running DROP ROLE or ALTER ROLE needs to have
> ADMIN on the role they're messing with.  I do think we may also need to
> make some adjustments in terms of what a regular user WITH ADMIN on a
> given role is able to do when it comes to ALTER ROLE, in particular, I
> don't think we'll want to remove the existing is-superuser checks
> against a user settings bypassrls or replication or superuser on some
> other role.  Maybe we can provide a way for a non-superuser to be given
> the ability to set those attributes for roles they create, but that
> would be a separate thing.

This too.

> > > Step #2 is also in-line with the spec: track GRANTORs and care about
> > > them, for everything.  We really should have been doing this all along.
> >
> > There are details to work out here, but in general, I like it.
>
> Cool.  Note that superusers would still be able to do $anything,
> including removing someone's ADMIN rights on a role even if that
> superuser didn't GRANT it (at least, that's my 

Re: [PATCH] pg_permissions

2022-03-10 Thread Chapman Flack
On 02/26/22 03:27, Joel Jacobson wrote:
> On Fri, Feb 25, 2022, at 22:12, Chapman Flack wrote:
>> I would be happy to review this patch, but a look through the email leaves me
>> thinking it may still be waiting on a C implementation of pg_get_acl(). Is 
>> that
>> right?
> 
> Not sure.

It looked to me as if the -hackers messages of 25 and 26 March 2021 had
found a consensus that a pg_get_acl() function would be a good thing,
with the views to be implemented over that.

I'm just not seeing any later patch that adds such a function.

Regards,
-Chap




Re: role self-revocation

2022-03-10 Thread David G. Johnston
On Thu, Mar 10, 2022 at 12:58 PM Stephen Frost  wrote:

> I don't think we're that far from having all of these though.  To start
> with, we remove from CREATEROLE the random things that it does which go
> beyond what folks tend to expect- remove the whole 'grant any role to
> any other' stuff, remove the 'drop role' exception, remove the
> 'alter role' stuff.  Do make it so that when you create a role, however,
> the above GRANT is effectively done.  Now, for the items above where we
> removed the checks against have_createrole_privilege() we go back and
> add in checks using is_admin_of_role().  Of course, also remove the role
> self-administration bug.
>
> That's step #1, but it gets us more-or-less what you're looking for, I
> think, and brings us a lot closer to what the spec has.
>

That still leaves attribute specification in place: e.g., REPLICATION,
CREATEROLE, CREATEDB, etc... (I see BYPASSRLS already is SUPERUSER only)

I dislike changing the documented behavior of CREATEROLE to the degree
suggested here.  However, there are three choices here, only one of which
can be chosen:

1. Leave CREATEROLE alone entirely
2. Make it so CREATEROLE cannot assign membership to the predefined roles
or superuser (inheritance included), but leave the rest alone.  This would
be the hard-coded version, not the role attribute one.
3. Make it so CREATEROLE can only assign membership to roles for which it
has been made an admin; as well as the other things mentioned

Moving forward I'd prefer options 1 or 2, leaving the ability to
create/alter/drop a role to be vested via predefined roles.

The rest seems fine at an initial glance.

David J.


Re: Adding CI to our tree

2022-03-10 Thread Justin Pryzby
On Thu, Mar 10, 2022 at 12:50:15PM -0800, Andres Freund wrote:
> > -  setup_cores_script: |
> > +  setup_os_script: |
> >  mkdir -m 770 /tmp/cores
> >  chown root:postgres /tmp/cores
> >  sysctl kern.corefile='/tmp/cores/%N.%P.core'
> > +#pkg install -y ...
> 
> Would you mind if I split this into setup_core_files_script and
> setup_additional_packages_script:?

That's fine.  FYI I'm also planning on using choco install --no-progress.
I could resend my latest patches shortly.

> > Subject: [PATCH 6/7] wip: cirrus/windows: add compiler_warnings_script
> > 
> > I'm not sure how to write this test in windows shell; it's also not easy to
> > write it in posix sh, since windows shell is somehow interpretting && and 
> > ||...
> 
> That comment isn't accurate anymore now that it's in an external script,
> right?

No, it is accurate.  What I mean is that it's also hard to write it as a
1-liner using posix sh, since the || (and &&) seemed to be interpreted by
cmd.exe and needed escaping - gross.

-- 
Justin




Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

2022-03-10 Thread Stephen Frost
Greetings,

* Robert Haas (robertmh...@gmail.com) wrote:
> On Thu, Mar 10, 2022 at 3:22 AM Jeff Davis  wrote:
> > * Can we mark this extension 'trusted'? I'm not 100% clear on the
> > standards for that marker, but it seems reasonable for a database owner
> > with the right privileges might want to install it.
> 
> I'm not clear on the standard either, exactly, but might not that
> allow the database owner to get a peek at what's happening in other
> databases?

The standard is basically that all of the functions it brings are
written to enforce the PG privilege system and you aren't able to use
the extension to bypass those privileges.  In some cases that means that
the C-language functions installed have if(!superuser) ereport() calls
throughout them- that's a fine answer, but it's perhaps not very helpful
to mark those as trusted.  In other cases, the C-language functions
installed don't directly provide access to data, such as the PostGIS
functions.

I've not looked back on this thread, but I'd expect pg_walinspect to
need those superuser checks and with those it *could* be marked as
trusted, but that again brings into question how useful it is to mark it
thusly.

In an ideal world, we might have a pg_readwal predefined role which
allows a role which was GRANT'd that role to be able to read WAL
traffic, and then the pg_walinspect extension could check that the
calling role has that predefined role, and other functions and
extensions could also check that rather than any existing superuser
checks.  A cloud provider or such could then include in their setup of a
new instance something like:

GRANT pg_readwal TO admin_user WITH ADMIN OPTION;

Presuming that there isn't anything that ends up in the WAL that's an
issue for the admin_user to have access to.

I certainly don't think we should allow either database owners or
regular users on a system the ability to access the WAL traffic of the
entire system.  More forcefully- we should *not* be throwing more access
rights towards $owners in general and should be thinking about how we
can allow admins, providers, whomever, the ability to control what
rights users are given.  If they're all lumped under 'owner' then
there's no way for people to provide granular access to just those
things they wish and intend to.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: Adding CI to our tree

2022-03-10 Thread Andres Freund
Hi,

On 2022-03-02 14:50:58 -0600, Justin Pryzby wrote:
> From 883edaa653bcf7f1a2369d8edf46eaaac1ba0ba2 Mon Sep 17 00:00:00 2001
> From: Justin Pryzby 
> Date: Mon, 17 Jan 2022 00:53:04 -0600
> Subject: [PATCH 1/7] cirrus: include hints how to install OS packages..
> 
> This is useful for patches during development, but once a feature is merged,
> new libraries should be added to the OS image files, rather than installed
> during every CI run forever into the future.
> ---
>  .cirrus.yml | 16 +---
>  1 file changed, 13 insertions(+), 3 deletions(-)
> 
> diff --git a/.cirrus.yml b/.cirrus.yml
> index d10b0a82f9f..1b7c36283e9 100644
> --- a/.cirrus.yml
> +++ b/.cirrus.yml
> @@ -73,10 +73,11 @@ task:
>  chown -R postgres:postgres .
>  mkdir -p ${CCACHE_DIR}
>  chown -R postgres:postgres ${CCACHE_DIR}
> -  setup_cores_script: |
> +  setup_os_script: |
>  mkdir -m 770 /tmp/cores
>  chown root:postgres /tmp/cores
>  sysctl kern.corefile='/tmp/cores/%N.%P.core'
> +#pkg install -y ...

Would you mind if I split this into setup_core_files_script and
setup_additional_packages_script:?


> +  # The commit that this branch is rebased on.  There's no easy way to find 
> this.
> +  # This does the right thing for cfbot, which always squishes all patches 
> into a single commit.
> +  # And does the right thing for any 1-patch commits.
> +  # Patches series manually submitted to cirrus may benefit from setting this
> +  # to the number of patches in the series (or directly to the commit the 
> series was rebased on).
> +  BASE_COMMIT: HEAD~1

Still think that something using
  git merge-base $CIRRUS_LAST_GREEN_CHANGE HEAD

might be better. With a bit of error handling for unset
CIRRUS_LAST_GREEN_CHANGE and for git not seeing enough history for
CIRRUS_LAST_GREEN_CHANGE.
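
A hedged sketch of that error handling (the helper name and the exact git
checks are my own illustration, not something proposed in the thread):

```shell
# Compute BASE_COMMIT from the last green Cirrus build, falling back to
# HEAD~1 when CIRRUS_LAST_GREEN_CHANGE is unset or the commit isn't
# reachable in a (possibly shallow) clone.
base_commit() {
  if [ -n "${CIRRUS_LAST_GREEN_CHANGE:-}" ] &&
     git cat-file -e "${CIRRUS_LAST_GREEN_CHANGE}^{commit}" 2>/dev/null
  then
    git merge-base "$CIRRUS_LAST_GREEN_CHANGE" HEAD
  else
    echo "HEAD~1"   # the default the patch uses today
  fi
}

echo "BASE_COMMIT=$(base_commit)"
```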


> +apt-get update
> +apt-get -y install lcov

FWIW, I just added that to the install script used for the container / VM
image build. So it'll be pre-installed once that completes.

https://cirrus-ci.com/build/5818788821073920


> From feceea4413b84f478e6a0888cdfab4be1c80767a Mon Sep 17 00:00:00 2001
> From: Justin Pryzby 
> Date: Sun, 20 Feb 2022 15:01:59 -0600
> Subject: [PATCH 6/7] wip: cirrus/windows: add compiler_warnings_script
> 
> I'm not sure how to write this test in windows shell; it's also not easy to
> write it in posix sh, since windows shell is somehow interpreting && and 
> ||...

That comment isn't accurate anymore now that it's in an external script,
right?


Greetings,

Andres Freund




Re: role self-revocation

2022-03-10 Thread Stephen Frost
Greetings,

* Robert Haas (robertmh...@gmail.com) wrote:
> On Thu, Mar 10, 2022 at 2:58 PM Stephen Frost  wrote:
> > It'd be useful to have a better definition of exactly what a
> > 'mini-superuser' is, but at least for the moment when it comes to roles,
> > let's look at what the spec says:
> 
> Gosh, I feel like I've spelled that out approximately 463,121 times
> already. That estimate might be slightly off though; I've been known
to make mistakes from time to time.

If there's a specific message that details it closely on the lists
somewhere, I'm happy to go review it.  I admit that I didn't go back and
look for such.

> > CREATE ROLE
> >   - Who is allowed to run CREATE ROLE is implementation-defined
> >   - After creation, this is effectively run:
> > GRANT new_role TO creator_role WITH ADMIN, GRANTOR "_SYSTEM"
> >
> > DROP ROLE
> >   - Any user who has been GRANT'd a role with ADMIN option is able to
> > DROP that role.
> >
> > GRANT ROLE
> >   - No cycles allowed
> >   - A role must have ADMIN rights on the role to be able to GRANT it to
> > another role.
> >
> > ALTER ROLE
> >   - Doesn't exist
> >
> > This actually looks to me like more-or-less what you're looking for, it
> > just isn't what we have today because CREATEROLE brings along with it a
> > bunch of other stuff, some of which we want and some that we don't, and
> > some things that the SQL spec says ADMIN should be allowed to do (DROP
> > ROLE) we don't allow today.
> 
> The above is mostly fine with me, except for the part about ALTER ROLE
> not existing. I think it's always good to be able to change your mind
> post-CREATE.

Errr, just to be clear, ALTER ROLE doesn't exist *in the spec*.  I
wasn't suggesting that we get rid of it, just that it doesn't exist in
the spec and therefore the spec doesn't have anything to say about it.

> Basically, in this sketch, ADMIN OPTION on a role involves the ability
> to DROP it, which means we don't need a separate role owner concept.

Right.  The above doesn't include any specifics about what to do with
ALTER ROLE, but my thought would be to have it also be under ADMIN
OPTION rather than under CREATEROLE, as I tried to outline (though not
very well, I'll admit) below.

> It also involves membership, meaning that you can freely exercise the
> privileges of the role without SET ROLE. While I'm totally down with
> having other possible behaviors as options, that particular behavior
> seems very useful to me, so, sounds great.

Well, yes and no- by default you're right, presuming everything is set
as inherited, but I'd wish for us to keep the option of creating roles
which are noinherit and having that work just as it does today.

> > It's also not quite what I want because it requires that membership and
> > ADMIN go together where I'd like to be able to have those be
> > independently GRANT'able- and then some.
> >
> > I don't think we're that far from having all of these though.  To start
> > with, we remove from CREATEROLE the random things that it does which go
> > beyond what folks tend to expect- remove the whole 'grant any role to
> > any other' stuff, remove the 'drop role' exception, remove the
> > 'alter role' stuff.  Do make it so that when you create a role, however,
> > the above GRANT is effectively done.  Now, for the items above where we
> > removed the checks against have_createrole_privilege() we go back and
> > add in checks using is_admin_of_role().  Of course, also remove the role
> > self-administration bug.
> 
> What do you mean by the 'drop role' exception?

'ability' was probably a better word there.  What I'm talking about is
changing in DropRole:

    if (!have_createrole_privilege())
        ereport(ERROR,
                (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
                 errmsg("permission denied to drop role")));

to be, more or less:

    if (!is_admin_of_role(role))
        ereport(ERROR,
                (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
                 errmsg("permission denied to drop role")));

> I don't like removing 'alter role'.

Ditto above but for AlterRole.  Taking it away from users with
CREATEROLE being able to run those commands on $anyrole and instead
making it so that the role running DROP ROLE or ALTER ROLE needs to have
ADMIN on the role they're messing with.  I do think we may also need to
make some adjustments in terms of what a regular user WITH ADMIN on a
given role is able to do when it comes to ALTER ROLE, in particular, I
don't think we'll want to remove the existing is-superuser checks
against a user setting bypassrls or replication or superuser on some
other role.  Maybe we can provide a way for a non-superuser to be given
the ability to set those attributes for roles they create, but that
would be a separate thing.

> The rest sounds good.

Great.

> > That's step #1, but it gets us more-or-less what you're looking for, I
> > think, and brings us a lot closer to what the spec has.
> 
> Great.

Awesome.

> > 

Re: role self-revocation

2022-03-10 Thread Robert Haas
On Thu, Mar 10, 2022 at 2:58 PM Stephen Frost  wrote:
> It'd be useful to have a better definition of exactly what a
> 'mini-superuser' is, but at least for the moment when it comes to roles,
> let's look at what the spec says:

Gosh, I feel like I've spelled that out approximately 463,121 times
already. That estimate might be slightly off though; I've been known
to make mistakes from time to time.

> CREATE ROLE
>   - Who is allowed to run CREATE ROLE is implementation-defined
>   - After creation, this is effectively run:
> GRANT new_role TO creator_role WITH ADMIN, GRANTOR "_SYSTEM"
>
> DROP ROLE
>   - Any user who has been GRANT'd a role with ADMIN option is able to
> DROP that role.
>
> GRANT ROLE
>   - No cycles allowed
>   - A role must have ADMIN rights on the role to be able to GRANT it to
> another role.
>
> ALTER ROLE
>   - Doesn't exist
>
> This actually looks to me like more-or-less what you're looking for, it
> just isn't what we have today because CREATEROLE brings along with it a
> bunch of other stuff, some of which we want and some that we don't, and
> some things that the SQL spec says ADMIN should be allowed to do (DROP
> ROLE) we don't allow today.

The above is mostly fine with me, except for the part about ALTER ROLE
not existing. I think it's always good to be able to change your mind
post-CREATE.

Basically, in this sketch, ADMIN OPTION on a role involves the ability
to DROP it, which means we don't need a separate role owner concept.
It also involves membership, meaning that you can freely exercise the
privileges of the role without SET ROLE. While I'm totally down with
having other possible behaviors as options, that particular behavior
seems very useful to me, so, sounds great.

> It's also not quite what I want because it requires that membership and
> ADMIN go together where I'd like to be able to have those be
> independently GRANT'able- and then some.
>
> I don't think we're that far from having all of these though.  To start
> with, we remove from CREATEROLE the random things that it does which go
> beyond what folks tend to expect- remove the whole 'grant any role to
> any other' stuff, remove the 'drop role' exception, remove the
> 'alter role' stuff.  Do make it so that when you create a role, however,
> the above GRANT is effectively done.  Now, for the items above where we
> removed the checks against have_createrole_privilege() we go back and
> add in checks using is_admin_of_role().  Of course, also remove the role
> self-administration bug.

What do you mean by the 'drop role' exception?

I don't like removing 'alter role'.

The rest sounds good.

> That's step #1, but it gets us more-or-less what you're looking for, I
> think, and brings us a lot closer to what the spec has.

Great.

> Step #2 is also in-line with the spec: track GRANTORs and care about
> them, for everything.  We really should have been doing this all along.
> Note that I'm not saying that an owner of a table can't REVOKE some
> right that was GRANT'd on that table, but rather that a user who was
> GRANT'd ADMIN rights on a table and then GRANT'd that right to some
> other user shouldn't have some other user who only has ADMIN rights on
> the table be able to remove that GRANT.  Same goes for roles, meaning
> that you could GRANT rights in a role with ADMIN option and not have to
> be afraid that the role you just gave that to will be able to remove
> *your* ADMIN rights on that role.  In general, I don't think this
> would actually have a very large impact on users because most users
> don't, today, use the ADMIN option much.

There are details to work out here, but in general, I like it.

> Step #3 starts going in the direction of what I'd like to see, which
> would be to break out membership in a role as a separate thing from
> admin rights on that role.  This is also what would help with the 'bot'
> use-case that Joshua (not David Steele, btw) brought up.

Woops, apologies for getting the name wrong. I also said Marc earlier
when I meant Mark, because I work with people named Mark, Marc, and
Marc, and Mark's spelling got outvoted by some distant corner of my
brain.

I think this is a fine long-term direction, with the caveat that
you've not provided enough specifics here for me to really understand
how it would work. I fear the specifics might be hard to get right,
both in terms of making it understandable to users and in terms of
preserving as much backward-compatibility as we can. However, I am not
opposed to the concept.

> Step #4 then breaks the 'admin' option on roles into pieces- a 'drop
> role' right, a 'reset password' right, maybe separate rights for
> different role attributes, etc.  We would likely still keep the
> 'admin_option' column in pg_auth_members and just check that first
> and then check the individual rights (similar to table-level vs.
> column-level privileges) so that we stay in line with the spec's
> expectation here and with what users are used to.


Re: role self-revocation

2022-03-10 Thread David G. Johnston
On Thu, Mar 10, 2022 at 12:45 PM Stephen Frost  wrote:

>
> * David G. Johnston (david.g.johns...@gmail.com) wrote:
> > On Thu, Mar 10, 2022 at 11:05 AM Stephen Frost 
> wrote:
>
> Why not just look at the admin_option field of pg_auth_members...?  I
> don't get why that isn't an even more minimal fix than this idea you
> have of adding a column to pg_authid and then propagating around "this
> user could become a superuser" or writing code that has to go check "is
> there some way for this role to become a superuser, either directly or
> through some subset of pg_* roles?"
>

Indeed, maybe I am wrong on the scope of the patch.  But at least for the
explicit attribute it should be no more difficult than changing:

if (grouprole_is_superuser AND current_role_is_not_superuser) then error;
to be
if ((grouprole_is_superuser OR !grouprole_has_adminattr) AND
current_role_is_not_superuser) then error;

I have to imagine that given how fundamental inheritance is to our
permissions system than doing a similar check up the tree wouldn't be
difficult, but I truly don't know with a strong degree of certainty.

Assuming we don't actually rip out CREATEROLE when this change goes in...do
you propose to prohibit a CREATEROLE user from altering the membership
roster of any group which itself is not a member of and also those which it
is a member of but where admin_option is false?

I don't personally have a problem with the current state where CREATEROLE
is an admin for, but not a member of, every non-superuser(-related) role in
the system.  If the consensus is to change that then I suppose this becomes
the minimally invasive fix that accomplishes that goal as well.  It seems
incomplete though, since you still need superuser to create a group and add
the initial WITH ADMIN member to it.  So this seems to work in the "avoid
using superuser" sense if you've also added something that has what
CREATEROLE provides today - admin without membership - but that would have
the benefit of not carrying around all the baggage that CREATEROLE has.


> > I made the observation that being able to manage the membership of a
> group
> > without having the ability to create new users seems like a half a loaf
> of
> > a feature.  That's it.  I would presume that any redesign of the
> > permissions system here would address this adequately.
>
> If the new design ideas that are being thrown around don't address what
> you're thinking they should, it'd be great to point that out.
>

I mean, you need a Create Role permission in some form, even if it's
deprecating the attribute and making it a predefined role.  I picked this
thread up because it seemed like a limited scope that I could get my head
around with the time I have, with the main goal to try to understand this
aspect of the system better.  I haven't gone and looked into the main
thread yet.

David J.


Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

2022-03-10 Thread Robert Haas
On Thu, Mar 10, 2022 at 3:22 AM Jeff Davis  wrote:
> * Can we mark this extension 'trusted'? I'm not 100% clear on the
> standards for that marker, but it seems reasonable for a database owner
> with the right privileges might want to install it.

I'm not clear on the standard either, exactly, but might not that
allow the database owner to get a peek at what's happening in other
databases?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: role self-revocation

2022-03-10 Thread Stephen Frost
Greetings,

* Robert Haas (robertmh...@gmail.com) wrote:
> But that's not to say that we couldn't decide to do something else
> instead, and that other thing might well be better. Do you want to
> sketch out a full proposal, even just what the syntax would look like,
> and share that here? And if you could explain how I could use it to
> create the mini-superusers that I'm trying to get out of this thing,
> even better.

It'd be useful to have a better definition of exactly what a
'mini-superuser' is, but at least for the moment when it comes to roles,
let's look at what the spec says:

CREATE ROLE
  - Who is allowed to run CREATE ROLE is implementation-defined
  - After creation, this is effectively run:
GRANT new_role TO creator_role WITH ADMIN, GRANTOR "_SYSTEM"

DROP ROLE
  - Any user who has been GRANT'd a role with ADMIN option is able to
DROP that role.

GRANT ROLE
  - No cycles allowed
  - A role must have ADMIN rights on the role to be able to GRANT it to
another role.

ALTER ROLE
  - Doesn't exist

This actually looks to me like more-or-less what you're looking for, it
just isn't what we have today because CREATEROLE brings along with it a
bunch of other stuff, some of which we want and some that we don't, and
some things that the SQL spec says ADMIN should be allowed to do (DROP
ROLE) we don't allow today.
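
To make the spec sketch concrete, here is roughly what it would mean in
SQL (role names invented; the GRANTOR "_SYSTEM" bookkeeping has no direct
syntax in PostgreSQL today):

```sql
-- If role mini_admin runs:
CREATE ROLE app_user;
-- the spec says the system implicitly records, with grantor "_SYSTEM":
--   GRANT app_user TO mini_admin WITH ADMIN OPTION;
-- which would then let mini_admin do things like:
GRANT app_user TO some_other_role;  -- allowed: mini_admin has ADMIN
DROP ROLE app_user;                 -- allowed: ADMIN implies DROP per spec
```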

It's also not quite what I want because it requires that membership and
ADMIN go together where I'd like to be able to have those be
independently GRANT'able- and then some.

I don't think we're that far from having all of these though.  To start
with, we remove from CREATEROLE the random things that it does which go
beyond what folks tend to expect- remove the whole 'grant any role to
any other' stuff, remove the 'drop role' exception, remove the
'alter role' stuff.  Do make it so that when you create a role, however,
the above GRANT is effectively done.  Now, for the items above where we
removed the checks against have_createrole_privilege() we go back and
add in checks using is_admin_of_role().  Of course, also remove the role
self-administration bug.

That's step #1, but it gets us more-or-less what you're looking for, I
think, and brings us a lot closer to what the spec has.

Step #2 is also in-line with the spec: track GRANTORs and care about
them, for everything.  We really should have been doing this all along.
Note that I'm not saying that an owner of a table can't REVOKE some
right that was GRANT'd on that table, but rather that a user who was
GRANT'd ADMIN rights on a table and then GRANT'd that right to some
other user shouldn't have some other user who only has ADMIN rights on
the table be able to remove that GRANT.  Same goes for roles, meaning
that you could GRANT rights in a role with ADMIN option and not have to
be afraid that the role you just gave that to will be able to remove
*your* ADMIN rights on that role.  In general, I don't think this
would actually have a very large impact on users because most users
don't, today, use the ADMIN option much.
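
For illustration, a hypothetical scenario showing the proposed difference
(all role names invented):

```sql
-- alice and bob each hold ADMIN OPTION on role app_readers.
SET ROLE alice;
GRANT app_readers TO carol;      -- grantor recorded as alice
SET ROLE bob;
REVOKE app_readers FROM carol;   -- proposed behavior: rejected, because
                                 -- bob is not the grantor of this GRANT
```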

Step #3 starts going in the direction of what I'd like to see, which
would be to break out membership in a role as a separate thing from
admin rights on that role.  This is also what would help with the 'bot'
use-case that Joshua (not David Steele, btw) brought up.

Step #4 then breaks the 'admin' option on roles into pieces- a 'drop
role' right, a 'reset password' right, maybe separate rights for
different role attributes, etc.  We would likely still keep the
'admin_option' column in pg_auth_members and just check that first
and then check the individual rights (similar to table-level vs.
column-level privileges) so that we stay in line with the spec's
expectation here and with what users are used to.

In some hyptothetical world, there's even a later step #5 which allows
us to define user profiles and then grant the ability for a user to
create a role with a certain profile (but not any arbitrary profile),
thus making things like the 'bot' even more constrained in terms of
what it's able to do (maybe it can then create a role that's a member of
a role without itself being a member of that role or explicitly having
admin rights in that role, as an example).

Thanks,

Stephen




Re: role self-revocation

2022-03-10 Thread Stephen Frost
Greetings,

* David G. Johnston (david.g.johns...@gmail.com) wrote:
> On Thu, Mar 10, 2022 at 11:05 AM Stephen Frost  wrote:
> > * David G. Johnston (david.g.johns...@gmail.com) wrote:
> > > On Thu, Mar 10, 2022 at 9:19 AM Stephen Frost  wrote:
> > > > * David G. Johnston (david.g.johns...@gmail.com) wrote:
> > > > > On Thu, Mar 10, 2022 at 7:46 AM Robert Haas  
> > > > > wrote:
> >
> > > The only indirect way for a role to become superuser is to have been
> > > granted membership in a superuser group, then SET ROLE.  Non-superusers
> > > cannot do this.  If a superuser does this I consider the outcome to be no
> > > different than if they go and do:
> >
> > A non-superuser absolutely can be GRANT'd membership in a superuser role
> > and then SET ROLE to that user thus becoming a superuser.
> 
> A non-superuser cannot grant a non-superuser membership in a superuser
> role.  A superuser granting a user membership in a superuser role makes
> that user a superuser.  This seems sane.
> 
> If a superuser grants a non-superuser membership in a superuser role then
> today a non-superuser can grant a user membership in that intermediate
> role, thus having a non-superuser make another user a superuser.  This is
> arguably a bug that needs to be fixed.
> 
> My desired fix is to just require the superuser to mark (or have it marked
> by default ideally) the role inheriting superuser and put the
> responsibility on the superuser.  I agree this is not ideal, but it is
> probably quick and low risk.
> 
> I'll let someone else describe the details of the alternative option.  I
> suspect it will end up being a better option in terms of design.  But
> depending on time and risk even knowing that we want the better design
> eventually doesn't preclude getting the easier fix in now.
> 
> > No, what should matter is if the role doing the GRANT has admin rights
> > on pg_read_all_stats, or on the postgres role.  That also happens to be
> > what the spec says.
> 
> Yes, and superusers implicitly have that right, while CREATEROLE users
> implicitly have that right on the pg_* role but not on superuser roles.  I
> just want to plug that hole and include the pg_* roles (or any role for
> that matter) in being able to be denied implied ADMIN rights for
> non-superusers.

CREATEROLE users implicitly have that right on *all non-superuser
roles*.  Not just the pg_* ones, which is why the pg_* ones aren't any
different in this regard.

> > Today a non-superuser cannot "grant postgres to someuser;"
> >
> > No, but a role can be created like 'admin', which a superuser GRANT's
> > 'postgres' to and then that role can be GRANT'd to anyone by anyone who
> > has CREATEROLE rights.  That's not sane.
> 
> I agree.  And I've suggested a minimal fix, adding an attribute to the role
> that prohibits non-superusers from granting it to others, that removes the
> insane behavior.

I disagree that this is a minimal fix as I don't see it as a fix to the
actual issue, which is the ability for CREATEROLE users to GRANT role
membership to all non-superuser roles on the system.  CREATEROLE
shouldn't be allowing that.

> I'm on board for a hard-coded fix as well - if a superuser is in the
> membership chain of a role then non-superusers cannot grant membership in
> that role to others.

Why not just look at the admin_option field of pg_auth_members...?  I
don't get why that isn't an even more minimal fix than this idea you
have of adding a column to pg_authid and then propagating around "this
user could become a superuser" or writing code that has to go check "is
there some way for this role to become a superuser, either directly or
through some subset of pg_* roles?"
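
In catalog terms that is a simple lookup; something like this illustrative
query (direct membership only; is_admin_of_role() additionally walks the
membership graph):

```sql
-- Does role alice hold ADMIN OPTION directly on role app_readers?
SELECT EXISTS (
    SELECT 1
      FROM pg_auth_members m
      JOIN pg_roles member ON member.oid = m.member
      JOIN pg_roles grp    ON grp.oid = m.roleid
     WHERE member.rolname = 'alice'
       AND grp.rolname = 'app_readers'
       AND m.admin_option
);
```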

> Neither of those really solves the pg_* roles problem.  We still need to
> indicate that they are somehow special.  Whether it is a nice matrix or
> roles and permissions or a simple attribute that makes them behave like
> they are superuser roles.

I disagree that they should be considered special when it comes to role
membership and management.  They're just roles, like any other.

> > I disagree entirely with the idea that we must have some roles who can
> > only ever be administered by a superuser.
> 
> I don't think this is a must have.  I think that since we do have it today
> that fixes that leverage the status quo in order to be done more easily are
> perfectly valid solutions.

We have a half-way-implemented attempt at this, not something that's
actually effective, and therefore I don't agree that we really have it
today or that we should keep it.  I'd much prefer to throw out nearly
everything in the system that's doing an explicit check of "does this
role have a superuser bit set on it?"

> >   If anything, we should be
> > moving away (as we have, in fact, been doing), from anything being the
> > exclusive purview of the superuser.
> >
> 
> I totally agree.

Great.

> > > In particular, this means CREATEROLE roles cannot assign membership in
> > the
> > > marked roles; just 

Re: Reducing power consumption on idle servers

2022-03-10 Thread Andres Freund
Hi,

On 2022-03-10 17:50:47 +, Simon Riggs wrote:
> On Wed, 9 Mar 2022 at 01:16, Zheng Li  wrote:
> 
> > > 1. Standardize the hibernation time at 60s, using a #define
> > > HIBERNATE_DELAY_SEC 60
> >
> > I notice in patch 3 HIBERNATE_DELAY_SEC has been increased to 300
> > seconds, what’s the reasoning behind it? Is longer hibernation delay
> > better? If so can we set it to INT_MAX (the max timeout allowed by
> > WaitLatch()) in which case a worker in hibernation only relies on
> > wakeup? I think it would be nice to run experiments to verify that the
> > patch reduces power consumption while varying the value of
> > HIBERNATE_DELAY_SEC.
> 
> Setting it to INT_MAX would be the same as not allowing a timeout,
> which changes a lot of current behavior and makes it less robust.

Most of these timeouts are a bad idea and should not exist. We repeatedly have
had bugs where we were missing wakeups etc but those bugs were harder to
notice because of timeouts.  I'm against entrenching this stuff further.

Greetings,

Andres Freund




Re: role self-revocation

2022-03-10 Thread David G. Johnston
On Thu, Mar 10, 2022 at 11:05 AM Stephen Frost  wrote:

> Greetings,
>
> * David G. Johnston (david.g.johns...@gmail.com) wrote:
> > On Thu, Mar 10, 2022 at 9:19 AM Stephen Frost 
> wrote:
> > > * David G. Johnston (david.g.johns...@gmail.com) wrote:
> > > > On Thu, Mar 10, 2022 at 7:46 AM Robert Haas 
> > > wrote:
>
> > The only indirect way for a role to become superuser is to have been
> > granted membership in a superuser group, then SET ROLE.  Non-superusers
> > cannot do this.  If a superuser does this I consider the outcome to be no
> > different than if they go and do:
>
> A non-superuser absolutely can be GRANT'd membership in a superuser role
> and then SET ROLE to that user thus becoming a superuser.


A non-superuser cannot grant a non-superuser membership in a superuser
role.  A superuser granting a user membership in a superuser role makes
that user a superuser.  This seems sane.

If a superuser grants a non-superuser membership in a superuser role then
today a non-superuser can grant a user membership in that intermediate
role, thus having a non-superuser make another user a superuser.  This is
arguably a bug that needs to be fixed.

My desired fix is to just require the superuser to mark (or have it marked
by default ideally) the role inheriting superuser and put the
responsibility on the superuser.  I agree this is not ideal, but it is
probably quick and low risk.

I'll let someone else describe the details of the alternative option.  I
suspect it will end up being a better option in terms of design.  But
depending on time and risk even knowing that we want the better design
eventually doesn't preclude getting the easier fix in now.


> No, what should matter is if the role doing the GRANT has admin rights
> on pg_read_all_stats, or on the postgres role.  That also happens to be
> what the spec says.
>

Yes, and superusers implicitly have that right, while CREATEROLE users
implicitly have that right on the pg_* role but not on superuser roles.  I
just want to plug that hole and include the pg_* roles (or any role for
that matter) in being able to be denied implied ADMIN rights for
non-superusers.


> Today a non-superuser cannot "grant postgres to someuser;"
>
> No, but a role can be created like 'admin', which a superuser GRANT's
> 'postgres' to and then that role can be GRANT'd to anyone by anyone who
> has CREATEROLE rights.  That's not sane.
>

I agree.  And I've suggested a minimal fix, adding an attribute to the role
that prohibits non-superusers from granting it to others, that removes the
insane behavior.

I'm on board for a hard-coded fix as well - if a superuser is in the
membership chain of a role then non-superusers cannot grant membership in
that role to others.

Neither of those really solves the pg_* roles problem.  We still need to
indicate that they are somehow special.  Whether it is a nice matrix or
roles and permissions or a simple attribute that makes them behave like
they are superuser roles.


>
> I disagree entirely with the idea that we must have some roles who can
> only ever be administered by a superuser.


I don't think this is a must have.  I think that since we do have it today
that fixes that leverage the status quo in order to be done more easily are
perfectly valid solutions.



>   If anything, we should be
> moving away (as we have, in fact, been doing), from anything being the
> exclusive purview of the superuser.
>

I totally agree.


>
> > In particular, this means CREATEROLE roles cannot assign membership in
> the
> > marked roles; just like they cannot assign membership in superuser roles
> > today.
>
> I disagree with the idea that we need to mark some roles as only being
> able to be modified by the superuser- why invent this?


Because CREATEUSER is a thing and people want to prevent roles with that
attribute from assigning membership to the predefined superuser-aspect
roles.  If I've misunderstood that desire and the scope of delegation given
by the superuser to CREATEUSER roles is acceptable, then no change here is
needed.

> What you
> seem to be arguing for here is to rip out the ADMIN functionality, which
> is defined by spec and not even exclusively by PG, and replace it with a
> single per-role flag that says if that role can only be modified by
> superusers.


I made the observation that being able to manage the membership of a group
without having the ability to create new users seems like a half a loaf of
a feature.  That's it.  I would presume that any redesign of the
permissions system here would address this adequately.

> The short answer is that it's not- which is why we have documented
> CREATEROLE as being 'superuser light'.  The goal here is to get rid of
> that.
>

Now you tell me.  Robert should have led with that goal upfront.

>
> Yes, we're talking about a new feature- one intended to replace the
> broken way that CREATEROLE works, which your proposal doesn't.
>
>
That is correct, I was trying to figure out minimally 

Re: role self-revocation

2022-03-10 Thread Robert Haas
On Thu, Mar 10, 2022 at 2:05 PM Peter Eisentraut
 wrote:
> On 09.03.22 14:02, Robert Haas wrote:
> > On Wed, Mar 9, 2022 at 7:55 AM Peter Eisentraut
> >  wrote:
> >> Do we have subtractive permissions today?
> >
> > Not in the GRANT/REVOKE sense, I think, but you can put a user in a
> > group and then mention that group in pg_hba.conf. And that line might
> > be "reject" or whatever.
>
> Well, you can always build an external system that looks at roles and
> does nonsensical things with it.  But the privilege system itself seems
> to be additive only.  Personally, I agree with the argument that there
> should not be any subtractive permissions.  The mental model where
> permissions are sort of keys to doors or boxes just doesn't work for that.

I mean, I didn't design pg_hba.conf, but I think it's part of the
database doing a reasonable thing, not an external system doing a
nonsensical thing.

I am not sure that I (or anyone) would endorse a system where you can
say something like GRANT NOT SELECT ON TABLE foo TO bar, essentially
putting a negative ACL into the system dictating that, regardless of
any other grants that may exist, foo should not be able to SELECT from
that table. But I think it's reasonable to use groups as a way of
referencing a defined collection of users for some purpose. The
pg_hba.conf thing is an example of that. You put all the users that
you want to be treated in a certain way for authentication purposes
into a group, and then you mention the group in the file, and it just
works. I don't find that an unreasonable design at all. We could've
created some other kind of grouping mechanism for such purposes that
is separate from the role system, but we didn't choose to do that. I
don't know if that was the absolute best possible decision or not, but
it doesn't seem like an especially bad choice.
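
For illustration, a pg_hba.conf fragment in this style might look like the
following (the role and network values are invented, not from the thread);
the `+role` syntax matches any member of the group:

```
# reject connections from any member of the "blocked_users" group
host  all  +blocked_users  0.0.0.0/0  reject
```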

--
Robert Haas
EDB: http://www.enterprisedb.com




Re: Column Filtering in Logical Replication

2022-03-10 Thread Tomas Vondra



On 3/10/22 04:09, Amit Kapila wrote:
> On Wed, Mar 9, 2022 at 3:33 PM Amit Kapila  wrote:
>>
>> On Mon, Mar 7, 2022 at 8:48 PM Tomas Vondra
>>  wrote:
>>
>>> OK, I reworked this to do the same thing as the row filtering patch.
>>>
>>
>> Thanks, I'll check this.
>>
> 
> Some assorted comments:
> =
> 1. We don't need to send a column list for the old tuple in case of an
> update (similar to delete). It is not required to apply a column
> filter for those cases because we ensure that RI must be part of the
> column list for updates and deletes.

I'm not sure which part of the code this refers to?

> 2.
> + /*
> + * Check if all columns referenced in the column filter are part of
> + * the REPLICA IDENTITY index or not.
> 
> I think this comment is reversed. The rule we follow here is that
> attributes that are part of RI must be there in a specified column
> list. This is used at two places in the patch.

Yeah, you're right. Will fix.
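
As a sketch of the rule under discussion (table and publication names are
hypothetical): columns that are part of the replica identity must appear in
the specified column list, or UPDATE/DELETE on the table is rejected:

```sql
CREATE TABLE t (a int PRIMARY KEY, b int, c int);
-- ok: column list (a, b) contains the replica identity column "a"
CREATE PUBLICATION p1 FOR TABLE t (a, b);
-- an UPDATE or DELETE on t would fail if the column list omitted "a"
```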

> 3. get_rel_sync_entry()
> {
> /* XXX is there a danger of memory leak here? beware */
> + oldctx = MemoryContextSwitchTo(CacheMemoryContext);
> + for (int i = 0; i < nelems; i++)
> ...
> }
> 
> Similar to the row filter, I think we need to use
> entry->cache_expr_cxt to allocate this. There are other usages of
> CacheMemoryContext in this part of the code but I think those need to
> be also changed and we can do that as a separate patch. If we do the
> suggested change then we don't need to separately free columns.

I agree a shorter-lived context would be better than CacheMemoryContext,
but "expr" seems to indicate it's for the expression only, so maybe we
should rename that. But do we really want a memory context for every
single entry?

> 4. I think we don't need the DDL changes in AtExecDropColumn. Instead,
> we can change the dependency of columns to NORMAL during publication
> commands.

I'll think about that.

> 5. There is a reference to check_publication_columns but that function
> is removed from the patch.

Right, will fix.

> 6.
> /*
> * If we know everything is replicated and the row filter is invalid
> * for update and delete, there is no point to check for other
> * publications.
> */
> if (pubdesc->pubactions.pubinsert && pubdesc->pubactions.pubupdate &&
> pubdesc->pubactions.pubdelete && pubdesc->pubactions.pubtruncate &&
> !pubdesc->rf_valid_for_update && !pubdesc->rf_valid_for_delete)
> break;
> 
> /*
> * If we know everything is replicated and the column filter is invalid
> * for update and delete, there is no point to check for other
> * publications.
> */
> if (pubdesc->pubactions.pubinsert && pubdesc->pubactions.pubupdate &&
> pubdesc->pubactions.pubdelete && pubdesc->pubactions.pubtruncate &&
> !pubdesc->cf_valid_for_update && !pubdesc->cf_valid_for_delete)
> break;
> 
> Can we combine these two checks?
> 

I was worried it'd get too complex / hard to understand, but I'll think
about maybe simplifying the conditions a bit.

> I feel this patch needs a more thorough review.
> 

I won't object to more review, of course.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Column Filtering in Logical Replication

2022-03-10 Thread Tomas Vondra



On 3/10/22 19:17, Tomas Vondra wrote:
> On 3/9/22 11:12, houzj.f...@fujitsu.com wrote:
>> Hi,
>>
>> Here are some tests and results about the table sync query of
>> column filter patch and row filter.
>>
>> 1) multiple publications which publish schema of parent table and partition.
>> pub
>> create schema s1;
>> create table s1.t (a int, b int, c int) partition by range (a);
>> create table t_1 partition of s1.t for values from (1) to (10);
>> create publication pub1 for all tables in schema s1;
>> create publication pub2 for table t_1(b);
>>
>> sub
>> - prepare tables
>> CREATE SUBSCRIPTION sub CONNECTION 'port=1 dbname=postgres' PUBLICATION 
>> pub1, pub2;
>>
>> When doing table sync for 't_1', the column list will be (b). I think it 
>> should
>> be no filter because table t_1 is also published via ALL TABLES IN SCHEMA
>> publication.
>>
>> For Row Filter, it will use no filter for this case.
>>
>>
>> 2) one publication publishes both parent and child
>> pub
>> create table t (a int, b int, c int) partition by range (a);
>> create table t_1 partition of t for values from (1) to (10)
>>partition by range (a);
>> create table t_2 partition of t_1 for values from (1) to (10);
>>
>> create publication pub2 for table t_1(a), t_2
>>   with (PUBLISH_VIA_PARTITION_ROOT);
>>
>> sub
>> - prepare tables
>> CREATE SUBSCRIPTION sub CONNECTION 'port=1 dbname=postgres' PUBLICATION 
>> pub2;
>>
>> When doing table sync for table 't_1', it has no column list. I think the
>> expected column list is (a).
>>
>> For Row Filter, it will use the row filter of the top most parent table(t_1) 
>> in
>> this case.
>>
>>
>> 3) one publication publishes both parent and child
>> pub
>> create table t (a int, b int, c int) partition by range (a);
>> create table t_1 partition of t for values from (1) to (10)
>>partition by range (a);
>> create table t_2 partition of t_1 for values from (1) to (10);
>>
>> create publication pub2 for table t_1(a), t_2(b)
>>   with (PUBLISH_VIA_PARTITION_ROOT);
>>
>> sub
>> - prepare tables
>> CREATE SUBSCRIPTION sub CONNECTION 'port=1 dbname=postgres' PUBLICATION 
>> pub2;
>>
>> When doing table sync for table 't_1', the column list would be (a, b). I 
>> think
>> the expected column list is (a).
>>
>> For Row Filter, it will use the row filter of the top most parent table(t_1) 
>> in
>> this case.
>>
> 
> Attached is an updated patch version, addressing all of those issues.
> 
> 0001 is a bugfix, reworking how we calculate publish_as_relid. The old
> approach was unstable with multiple publications, giving different
> results depending on order of the publications. This should be
> backpatched into PG13 where publish_via_partition_root was introduced, I
> think.
> 
> 0002 is the main patch, merging the changes proposed by Peter and fixing
> the issues reported here. In most cases this means adopting the code
> used for row filters, and perhaps simplifying it a bit.
> 
> 
> But I also tried to implement a row-filter test for 0001, and I'm not
> sure I understand the behavior I observe. Consider this:
> 
> -- a chain of 3 partitions (on both publisher and subscriber)
> CREATE TABLE test_part_rf (a int primary key, b int, c int)
>PARTITION BY LIST (a);
> 
> CREATE TABLE test_part_rf_1
>PARTITION OF test_part_rf FOR VALUES IN (1,2,3,4,5)
>PARTITION BY LIST (a);
> 
> CREATE TABLE test_part_rf_2
>PARTITION OF test_part_rf_1 FOR VALUES IN (1,2,3,4,5);
> 
> -- initial data
> INSERT INTO test_part_rf VALUES (1, 5, 100);
> INSERT INTO test_part_rf VALUES (2, 15, 200);
> 
> -- two publications, each adding a different partition
> CREATE PUBLICATION test_pub_part_1 FOR TABLE test_part_rf_1
>  WHERE (b < 10) WITH (publish_via_partition_root);
> 
> CREATE PUBLICATION test_pub_part_2 FOR TABLE test_part_rf_2
>  WHERE (b > 10) WITH (publish_via_partition_root);
> 
> -- now create the subscription (also try opposite ordering)
> CREATE SUBSCRIPTION test_part_sub CONNECTION '...'
>PUBLICATION test_pub_part_1, test_pub_part_2;
> 
> -- wait for sync
> 
> -- insert some more data
> INSERT INTO test_part_rf VALUES (3, 6, 300);
> INSERT INTO test_part_rf VALUES (4, 16, 400);
> 
> -- wait for catchup
> 
> Now, based on the discussion here, my expectation is that we'll use the
> row filter from the top-most ancestor in any publication, which in this
> case is test_part_rf_1. Hence the filter should be (b < 10).
> 
> So I'd expect these rows to be replicated:
> 
> 1,5,100
> 3,6,300
> 
> But that's not what I get, unfortunately. I get different results,
> depending on the order of publications:
> 
> 1) test_pub_part_1, test_pub_part_2
> 
> 1|5|100
> 2|15|200
> 3|6|300
> 4|16|400
> 
> 2) test_pub_part_2, test_pub_part_1
> 
> 3|6|300
> 4|16|400
> 
> That seems pretty bizarre, because it either means we're not enforcing
> any filter or some strange combination of filters (notice that for (2)
> we skip/replicate rows matching either 

Re: pg_stat_statements and "IN" conditions

2022-03-10 Thread Dmitry Dolgov
> On Thu, Mar 10, 2022 at 12:32:08PM -0500, Robert Haas wrote:
> On Thu, Mar 10, 2022 at 12:12 PM Tom Lane  wrote:
>
> > 2. You haven't made a case for it.  The original complaint was
> > about different lengths of IN lists not being treated as equivalent,
> > but this patch has decided to do I'm-not-even-sure-quite-what
> > about treating different Params as equivalent.  Plus you're trying
> > to invoke eval_const_expressions in the jumbler; that is absolutely
> > Not OK, for both safety and semantic reasons.
>
> I think there are two separate points here, one about patch quality
> and the other about whether the basic idea is good. I think the basic
> idea is good. I do not contend that collapsing IN-lists of arbitrary
> length is what everyone wants in all cases, but it seems entirely
> reasonable to me to think that it is what some people want. So I would
> say just make it a parameter and let people configure whichever
> behavior they want. My bet is 95% of users would prefer to have it on,
> but even if that's wildly wrong, having it as an optional behavior
> hurts nobody. Let it be off by default and let those who want it flip
> the toggle. On the code quality issue, I haven't read the patch but
> your concerns sound well-founded to me from reading what you wrote.

I have the same understanding, there is a toggle in the patch exactly
for this purpose.

To give a bit more context, the whole development was ORM-driven rather
than pulled out of thin air -- people were complaining about huge
generated queries that could be barely displayed in monitoring, I was
trying to address it via collapsing the list where it was happening. In
other words "I'm-not-even-sure-quite-what" part may be indeed too
extensive, but was triggered by real world issues.

Of course, I could get the implementation not quite right, e.g. I wasn't
aware about dangers of using eval_const_expressions. But that's what the
CF item and the corresponding discussion is for, I guess. Let me see
what I could do to improve it.
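
For a concrete (hypothetical) example of what the proposed toggle would do
-- table name invented for illustration:

```sql
-- with the option enabled, these two statements would collapse to a
-- single pg_stat_statements entry; with it off (the default), the
-- differing IN-list lengths keep them as distinct entries
SELECT * FROM orders WHERE id IN (1, 2, 3);
SELECT * FROM orders WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
```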




Re: role self-revocation

2022-03-10 Thread Peter Eisentraut

On 09.03.22 14:02, Robert Haas wrote:

On Wed, Mar 9, 2022 at 7:55 AM Peter Eisentraut
 wrote:

Do we have subtractive permissions today?


Not in the GRANT/REVOKE sense, I think, but you can put a user in a
group and then mention that group in pg_hba.conf. And that line might
be "reject" or whatever.


Well, you can always build an external system that looks at roles and 
does nonsensical things with it.  But the privilege system itself seems 
to be additive only.  Personally, I agree with the argument that there 
should not be any subtractive permissions.  The mental model where 
permissions are sort of keys to doors or boxes just doesn't work for that.






Re: Avoiding smgrimmedsync() during nbtree index builds

2022-03-10 Thread Andres Freund
Hi,

> From a06407b19c8d168ea966e45c0e483b38d49ddc12 Mon Sep 17 00:00:00 2001
> From: Melanie Plageman 
> Date: Fri, 4 Mar 2022 14:48:39 -0500
> Subject: [PATCH v6 1/4] Add unbuffered IO API

I think this or one of the following patches should update 
src/backend/access/transam/README


> @@ -164,6 +164,16 @@ void
>  blbuildempty(Relation index)
>  {
>   Pagemetapage;
> + UnBufferedWriteState wstate;
> +
> + /*
> +  * Though this is an unlogged relation, pass do_wal=true since the init
> +  * fork of an unlogged index must be wal-logged and fsync'd. This 
> currently
> +  * has no effect, as unbuffered_write() does not do log_newpage()
> +  * internally. However, were this to be replaced with 
> unbuffered_extend(),
> +  * do_wal must be true to ensure the data is logged and fsync'd.
> +  */
> + unbuffered_prep(&wstate, true);

Wonder if unbuffered_write should have an assert checking that no writes to
INIT_FORKNUM are non-durable? Looks like it's pretty easy to forget...

I'd choose unbuffered_begin over _prep().


>   /* Construct metapage. */
>   metapage = (Page) palloc(BLCKSZ);
> @@ -176,18 +186,13 @@ blbuildempty(Relation index)
>* XLOG_DBASE_CREATE or XLOG_TBLSPC_CREATE record.  Therefore, we need
>* this even when wal_level=minimal.
>*/
> - PageSetChecksumInplace(metapage, BLOOM_METAPAGE_BLKNO);
> - smgrwrite(RelationGetSmgr(index), INIT_FORKNUM, BLOOM_METAPAGE_BLKNO,
> -   (char *) metapage, true);
> + unbuffered_write(&wstate, RelationGetSmgr(index), INIT_FORKNUM,
> + BLOOM_METAPAGE_BLKNO, metapage);
> +
>   log_newpage(&(RelationGetSmgr(index))->smgr_rnode.node, INIT_FORKNUM,
>   BLOOM_METAPAGE_BLKNO, metapage, true);
>  
> - /*
> -  * An immediate sync is required even if we xlog'd the page, because the
> -  * write did not go through shared_buffers and therefore a concurrent
> -  * checkpoint may have moved the redo pointer past our xlog record.
> -  */
> - smgrimmedsync(RelationGetSmgr(index), INIT_FORKNUM);
> + unbuffered_finish(&wstate, RelationGetSmgr(index), INIT_FORKNUM);
>  }

I mildly prefer complete over finish, but ...



> - * We can't use the normal heap_insert function to insert into the new
> - * heap, because heap_insert overwrites the visibility information.
> - * We use a special-purpose raw_heap_insert function instead, which
> - * is optimized for bulk inserting a lot of tuples, knowing that we have
> - * exclusive access to the heap.  raw_heap_insert builds new pages in
> - * local storage.  When a page is full, or at the end of the process,
> - * we insert it to WAL as a single record and then write it to disk
> - * directly through smgr.  Note, however, that any data sent to the new
> - * heap's TOAST table will go through the normal bufmgr.
> + * We can't use the normal heap_insert function to insert into the new heap,
> + * because heap_insert overwrites the visibility information. We use a
> + * special-purpose raw_heap_insert function instead, which is optimized for
> + * bulk inserting a lot of tuples, knowing that we have exclusive access to 
> the
> + * heap.  raw_heap_insert builds new pages in local storage.  When a page is
> + * full, or at the end of the process, we insert it to WAL as a single record
> + * and then write it to disk directly through directmgr.  Note, however, that
> + * any data sent to the new heap's TOAST table will go through the normal
> + * bufmgr.

Part of this just reflows existing lines that seem otherwise unchanged, making
it harder to see the actual change.



> @@ -643,13 +644,6 @@ _bt_blnewpage(uint32 level)
>  static void
>  _bt_blwritepage(BTWriteState *wstate, Page page, BlockNumber blkno)
>  {
> - /* XLOG stuff */
> - if (wstate->btws_use_wal)
> - {
> - /* We use the XLOG_FPI record type for this */
> - log_newpage(&wstate->index->rd_node, MAIN_FORKNUM, blkno, page, 
> true);
> - }
> -
>   /*
>* If we have to write pages nonsequentially, fill in the space with
>* zeroes until we come back and overwrite.  This is not logically
> @@ -661,33 +655,33 @@ _bt_blwritepage(BTWriteState *wstate, Page page, 
> BlockNumber blkno)
>   {
>   if (!wstate->btws_zeropage)
>   wstate->btws_zeropage = (Page) palloc0(BLCKSZ);
> - /* don't set checksum for all-zero page */
> - smgrextend(RelationGetSmgr(wstate->index), MAIN_FORKNUM,
> -wstate->btws_pages_written++,
> -(char *) wstate->btws_zeropage,
> -true);
> +
> + unbuffered_extend(&wstate->ub_wstate, RelationGetSmgr(wstate->index),
> + MAIN_FORKNUM, wstate->btws_pages_written++,
> + wstate->btws_zeropage, true);
>   }

For a bit I thought the true 

Re: Column Filtering in Logical Replication

2022-03-10 Thread Tomas Vondra
On 3/9/22 10:20, Peter Eisentraut wrote:
> 
> On 07.03.22 16:18, Tomas Vondra wrote:
>> AFAICS these issues should be resolved by the adoption of the row-filter
>> approach (i.e. it should fail the same way as for row filter).
> 
> The first two patches (additional testing for row filtering feature)
> look okay to me.
> 
> Attached is a fixup patch for your main feature patch (the third one).
> 
> It's a bit of code and documentation cleanup, but mainly I removed the
> term "column filter" from the patch.  Half the code was using "column
> list" or similar and half the code "column filter", which was confusing.
>  Also, there seemed to be a bit of copy-and-pasting from row-filter code
> going on, with some code comments not quite sensible, so I rewrote some
> of them.  Also some code used "rf" and "cf" symbols which were a bit
> hard to tell apart.  A few more letters can increase readability.
> 
> Note in publicationcmds.c OpenTableList() the wrong if condition was used.
> 

Thanks, I've merged these changes into the patch.

> I'm still confused about the intended replica identity handling.  This
> patch still checks whether the column list contains the replica identity
> at DDL time.  And then it also checks at execution time.  I thought the
> latest understanding was that the DDL-time checking would be removed.  I
> think it's basically useless now, since as the test cases show, you can
> subvert those checks by altering the replica identity later.

Are you sure? Which part of the patch does that? AFAICS we only do those
checks in CheckCmdReplicaIdentity now, but maybe I'm missing something.
Are you sure you're not looking at some older patch version?


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: role self-revocation

2022-03-10 Thread Robert Haas
On Thu, Mar 10, 2022 at 12:26 PM Joshua Brindle
 wrote:
> Ownership implies DAC, the ability to grant others rights to an
> object. It's not "kooky" to see roles as owned objects, but it isn't
> required either. For example most objects on a UNIX system are owned
> and subject to DAC but users aren't.

I have no issue with anything you write in this paragraph.

> Stephen's, and now my, issue with ownership is that, since it implies
> DAC, most checks will be bypassed for the owner. We would both prefer
> for everyone to be subject to the grants, including whoever created
> the role.

That sounds like MAC, which is usually something that sits on top of
DAC and is enforced in addition to DAC, not a reason for DAC to not
exist.

> Rather, we'd like to see a "creators of roles get this set of grants
> against the role by default" and "as a superuser I can revoke grants
> from creators against roles they created"

If you create a table, you own it. You get a set of default
permissions on the table which can be revoked either by you or by
someone else, and you also have certain intrinsic rights over the
object as owner which cannot be revoked - including the ability to
re-grant yourself any previously-revoked permissions. I am not against
the idea of trying to clean things up so that everything you can do
with a table is a revocable privilege and you can be the owner without
having any rights at all, including the right to give yourself other
rights back, but I cannot believe that the idea of removing table
ownership as a concept would ever gain consensus on this list.
Therefore, I also do not think it is reasonable to say that we
shouldn't introduce a similar concept for object types that don't have
it yet, such as roles.
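
A minimal demonstration of the table-ownership behavior described above
(object names invented): the owner's implicit grant option survives a
self-revoke, so the revocation can always be undone.

```sql
CREATE TABLE demo (x int);
REVOKE SELECT ON demo FROM CURRENT_USER;  -- owner revokes her own SELECT
GRANT SELECT ON demo TO CURRENT_USER;     -- ...and can re-grant it at will,
                                          -- since grant option is implicit
```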

But that's not to say that we couldn't decide to do something else
instead, and that other thing might well be better. Do you want to
sketch out a full proposal, even just what the syntax would look like,
and share that here? And if you could explain how I could use it to
create the mini-superusers that I'm trying to get out of this thing,
even better.

Thanks,

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: role self-revocation

2022-03-10 Thread Stephen Frost
Greetings,

* David G. Johnston (david.g.johns...@gmail.com) wrote:
> On Thu, Mar 10, 2022 at 9:19 AM Stephen Frost  wrote:
> > * David G. Johnston (david.g.johns...@gmail.com) wrote:
> > > On Thu, Mar 10, 2022 at 7:46 AM Robert Haas 
> > wrote:
> > > > On Wed, Mar 9, 2022 at 4:31 PM Tom Lane  wrote:
> > > > > I don't think we need syntax to describe it.  As I just said in my
> > > > > other reply, we have a perfectly good precedent for this already
> > > > > in ordinary object permissions.  That is: an object owner always,
> > > > > implicitly, has GRANT OPTION for all the object's privileges, even
> > > > > if she revoked the corresponding plain privilege from herself.
> > > > >
> > > > > Yeah, this does mean that we're effectively deciding that the creator
> > > > > of a role is its owner.  What's the problem with that?
> > > >
> > > > I don't think that's entirely the wrong concept, but it doesn't make a
> > > > lot of sense in a world where the creator has to be a superuser. If
> > > > alice, bob, and charlie are superusers who take turns creating new
> > > > users, and then we let charlie go due to budget cuts, forcing alice
> > > > and bob to change the owner of all the users he created to some other
> > > > superuser as a condition of dropping his account is a waste of
> > > > everyone's time. They can do exactly the same things to every account
> > > > on the system after we change the role owner as before.
> > >
> > > Then maybe we should just implement the idea that if a superuser would
> > > become the owner we instead substitute in the bootstrap user.  Or give
> > the
> > > DBA the choice whether they want to retain knowledge of specific roles -
> > > and thus are willing to accept the "waste of time".
> >
> > This doesn't strike me as going in the right direction.  Falling back to
> > the bootstrap superuser is generally a hack and not a great one.  I'll
> > also point out that the SQL spec hasn't got a concept of role ownership
> > either.
> >
> > > > But wait, I hear you cry, what about CREATEROLE? Well, CREATEROLE is
> > > > generally agreed to be broken right now, and if you don't agree with
> > > > that, consider that it can grant pg_execute_server_programs to a
> > > > newly-created account and then explain to me how it's functionally
> > > > different from superuser.
> > >
> > > CREATEROLE has long been defined as basically having "with admin option"
> > on
> > > every role in the system.  The failure to special-case the roles that
> > grant
> > > different aspects of superuser-ness to its members doesn't make
> > CREATEROLE
> > > itself broken, it makes the implementation of pg_execute_server_programs
> > > broken.  Only superusers should be considered to have with admin option
> > on
> > > these roles. They can delegate through the usual membership+admin
> > mechanism
> > > to a CREATEROLE role if they desire.
> >
> > No, CREATEROLE having admin option on every role in the system is broken
> > and always has been.  It's not just an issue for predefined roles like
> > pg_execute_server_program,
> 
> 
> 
> > it's an issue for any role that could become
> > a superuser either directly or indirectly and that extends beyond the
> > predefined ones.
> 
> 
> The only indirect way for a role to become superuser is to have been
> granted membership in a superuser group, then SET ROLE.  Non-superusers
> cannot do this.  If a superuser does this I consider the outcome to be no
> different than if they go and do:

A non-superuser absolutely can be GRANT'd membership in a superuser role
and then SET ROLE to that user thus becoming a superuser.  Giving users
a regular role to log in as and then membership in a role that can
become a superuser is akin to having a sudoers group in Unix and is good
practice, not something that everyone should have to be super-dooper
careful to not do, lest a CREATEROLE user be able to leverage that.
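
The sudoers-like pattern being described, sketched with invented role names:

```sql
CREATE ROLE admins SUPERUSER NOLOGIN;    -- the "sudo" group
CREATE ROLE alice LOGIN PASSWORD '...';  -- ordinary, non-superuser login
GRANT admins TO alice;
-- alice connects without superuser rights and escalates only when needed:
SET ROLE admins;
-- ...privileged work...
RESET ROLE;
```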

> SET allow_system_table_mods TO true;
> DROP pg_catalog.pg_class;

I don't equate these in the least.

> In short, having a CREATEROLE user issuing:
> GRANT pg_read_all_stats TO davidj;
> should result in the same outcome as them issuing:
> GRANT postgres TO davidj;
> -- ERROR:  must be superuser to alter superusers

No, what should matter is if the role doing the GRANT has admin rights
on pg_read_all_stats, or on the postgres role.  That also happens to be
what the spec says.

> Superusers can break their system and we don't go to great effort to stop
> them.  I see no difference here, so arguments of this nature aren't all
> that compelling to me.

That you don't feel they're compelling don't make them somehow not real,
nor even particularly uncommon, nor do I view ignoring that possibility
as somehow creating a strong authentication system.

> CREATEROLE shouldn't be
> > the determining factor in the question of if a role can GRANT a
> > predefined (or any other role) to some other role- that should be
> > governed by the admin option on that role, and that should 

Re: Reducing power consumption on idle servers

2022-03-10 Thread Simon Riggs
On Wed, 9 Mar 2022 at 01:16, Zheng Li  wrote:

> > 1. Standardize the hibernation time at 60s, using a #define
> > HIBERNATE_DELAY_SEC 60
>
> I notice in patch 3 HIBERNATE_DELAY_SEC has been increased to 300
> seconds, what’s the reasoning behind it? Is longer hibernation delay
> better? If so can we set it to INT_MAX (the max timeout allowed by
> WaitLatch()) in which case a worker in hibernation only relies on
> wakeup? I think it would be nice to run experiments to verify that the
> patch reduces power consumption while varying the value of
> HIBERNATE_DELAY_SEC.

Setting it to INT_MAX would be the same as not allowing a timeout,
which changes a lot of current behavior and makes it less robust.

Waking once per minute is what we do in various cases, so 60sec is a
good choice.

In the case of logical rep launcher we currently use 300sec, so using
60s would decrease this.

I don't see much difference between power consumption with timeouts of
60s and 300s.

In the latest patch, I chose 300s. Does anyone have an opinion on the
value here?

-- 
Simon Riggs        http://www.EnterpriseDB.com/




Re: ltree_gist indexes broken after pg_upgrade from 12 to 13

2022-03-10 Thread Alexander Korotkov
On Thu, Mar 10, 2022 at 6:39 PM Tom Lane  wrote:
> Alexander Korotkov  writes:
> > Good.  The revised patch is attached.  Instead of adding argument to
> > LTREE_GET_ASIGLEN(), it introduces separate LTREE_GET_SIGLEN() and
> > LTREE_GET_ASIGLEN() macros.
>
> Um ... what I see after applying the patch is two redundant
> definitions of LTREE_GET_ASIGLEN, and no LTREE_GET_SIGLEN.

Sorry for publishing a patch, which doesn't even compile.
The attached version should be good.

--
Regards,
Alexander Korotkov


0001-ltree-gist-siglen-fix-v3.patch
Description: Binary data


Re: Parameter for planner estimate of recursive queries

2022-03-10 Thread Simon Riggs
On Fri, 28 Jan 2022 at 14:07, Robert Haas  wrote:
>
> On Tue, Jan 25, 2022 at 4:44 AM Peter Eisentraut
>  wrote:
> > On the one hand, this smells like a planner hint.  But on the other
> > hand, it doesn't look like we will come up with proper graph-aware
> > selectivity estimation system any time soon, so just having all graph
> > OLTP queries suck until then because the planner hint is hardcoded
> > doesn't seem like a better solution.  So I think this setting can be ok.
>
> I agree. It's a bit lame, but seems pretty harmless, and I can't see
> us realistically doing a lot better with any reasonable amount of
> work.

Shall I set this as Ready For Committer?

-- 
Simon Riggs        http://www.EnterpriseDB.com/




Re: pg_stat_statements and "IN" conditions

2022-03-10 Thread Robert Haas
On Thu, Mar 10, 2022 at 12:12 PM Tom Lane  wrote:
> > This seems incorrect, as the only feedback I've got was "this is a bad
> > idea", and no reaction on follow-up questions.
>
> I changed the status because it seems to me there is no chance of
> this being committed as-is.
>
> 1. I think an absolute prerequisite before we could even consider
> changing the query jumbler rules this much is to do the work that was
> put off when the jumbler was moved into core: that is, provide some
> honest support for multiple query-ID generation methods being used at
> the same time.  Even if you successfully make a case for
> pg_stat_statements to act this way, other consumers of query IDs
> aren't going to be happy with it.

FWIW, I don't find this convincing at all. Query jumbling is already
somewhat expensive, and it seems unlikely that the same person is
going to want to jumble queries in one way for pg_stat_statements and
another way for pg_stat_broccoli or whatever their other extension is.
Putting a lot of engineering work into something with such a marginal
use case seems not worthwhile to me - and also likely futile, because
I don't see how it could realistically be made nearly as cheap as a
single jumble.

> 2. You haven't made a case for it.  The original complaint was
> about different lengths of IN lists not being treated as equivalent,
> but this patch has decided to do I'm-not-even-sure-quite-what
> about treating different Params as equivalent.  Plus you're trying
> to invoke eval_const_expressions in the jumbler; that is absolutely
> Not OK, for both safety and semantic reasons.

I think there are two separate points here, one about patch quality
and the other about whether the basic idea is good. I think the basic
idea is good. I do not contend that collapsing IN-lists of arbitrary
length is what everyone wants in all cases, but it seems entirely
reasonable to me to think that it is what some people want. So I would
say just make it a parameter and let people configure whichever
behavior they want. My bet is 95% of users would prefer to have it on,
but even if that's wildly wrong, having it as an optional behavior
hurts nobody. Let it be off by default and let those who want it flip
the toggle. On the code quality issue, I haven't read the patch but
your concerns sound well-founded to me from reading what you wrote.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: role self-revocation

2022-03-10 Thread Joshua Brindle
On Thu, Mar 10, 2022 at 12:11 PM Robert Haas  wrote:
>
> On Thu, Mar 10, 2022 at 11:19 AM Stephen Frost  wrote:
> > I disagree that ownership is needed that's not what the spec calls for
> > either.  What we need is more flexibility when it comes to the
> > relationships which are allowed to be created between roles and what
> > privileges come with them.  To that end, I'd argue that we should be
> > extending pg_auth_members, first by separating out membership itself
> > into an explicitly tracked attribute (instead of being implicit in the
> > existance of a row in the table) and then adding on what other
> > privileges we see fit to add, such as the ability to DROP a role.  We
> > do need to remove the ability for a role who hasn't been explicitly
> > given the admin right on another role to modify that role's membership
> > too, as was originally proposed here.  This also seems to more closely
> > follow the spec's expectation, something that role ownership doesn't.
>
> I do not have a problem with more fine-grained kinds of authorization
> even though I think there are syntactic issues to work out, but I
> strongly disagree with the idea that we can't or shouldn't also have
> role ownership. Marc invented it. Now Tom has invented it
> independently. All sorts of other objects have it already. Trying to
> make it out like this is some kind of kooky idea is not believable.
> Yeah, it's not the most sophisticated or elegant model and that's why
> it's good for us to also have other things, but for simple cases it is
> easy to understand and works great.

Ownership implies DAC, the ability to grant others rights to an
object. It's not "kooky" to see roles as owned objects, but it isn't
required either. For example, most objects on a UNIX system are owned
and subject to DAC, but users aren't.

Stephen's, and now my, issue with ownership is that, since it implies
DAC, most checks will be bypassed for the owner. We would both prefer
for everyone to be subject to the grants, including whoever created
the role.

Rather, we'd like to see "creators of roles get this set of grants
against the role by default" and "as a superuser I can revoke grants
from creators against roles they created" capabilities.




Re: role self-revocation

2022-03-10 Thread David G. Johnston
On Thu, Mar 10, 2022 at 9:19 AM Stephen Frost  wrote:

> Greetings,
>
> * David G. Johnston (david.g.johns...@gmail.com) wrote:
> > On Thu, Mar 10, 2022 at 7:46 AM Robert Haas 
> wrote:
> > > On Wed, Mar 9, 2022 at 4:31 PM Tom Lane  wrote:
> > > > I don't think we need syntax to describe it.  As I just said in my
> > > > other reply, we have a perfectly good precedent for this already
> > > > in ordinary object permissions.  That is: an object owner always,
> > > > implicitly, has GRANT OPTION for all the object's privileges, even
> > > > if she revoked the corresponding plain privilege from herself.
> > > >
> > > > Yeah, this does mean that we're effectively deciding that the creator
> > > > of a role is its owner.  What's the problem with that?
> > >
> > > I don't think that's entirely the wrong concept, but it doesn't make a
> > > lot of sense in a world where the creator has to be a superuser. If
> > > alice, bob, and charlie are superusers who take turns creating new
> > > users, and then we let charlie go due to budget cuts, forcing alice
> > > and bob to change the owner of all the users he created to some other
> > > superuser as a condition of dropping his account is a waste of
> > > everyone's time. They can do exactly the same things to every account
> > > on the system after we change the role owner as before.
> >
> > Then maybe we should just implement the idea that if a superuser would
> > become the owner we instead substitute in the bootstrap user.  Or give
> the
> > DBA the choice whether they want to retain knowledge of specific roles -
> > and thus are willing to accept the "waste of time".
>
> This doesn't strike me as going in the right direction.  Falling back to
> the bootstrap superuser is generally a hack and not a great one.  I'll
> also point out that the SQL spec hasn't got a concept of role ownership
> either.
>
> > > But wait, I hear you cry, what about CREATEROLE? Well, CREATEROLE is
> > > generally agreed to be broken right now, and if you don't agree with
> > > that, consider that it can grant pg_execute_server_programs to a
> > > newly-created account and then explain to me how it's functionally
> > > different from superuser.
> >
> > CREATEROLE has long been defined as basically having "with admin option"
> on
> > every role in the system.  The failure to special-case the roles that
> grant
> > different aspects of superuser-ness to its members doesn't make
> CREATEROLE
> > itself broken, it makes the implementation of pg_execute_server_programs
> > broken.  Only superusers should be considered to have with admin option
> on
> > these roles. They can delegate through the usual membership+admin
> mechanism
> > to a CREATEROLE role if they desire.
>
> No, CREATEROLE having admin option on every role in the system is broken
> and always has been.  It's not just an issue for predefined roles like
> pg_execute_server_program,



> it's an issue for any role that could become
> a superuser either directly or indirectly and that extends beyond the
> predefined ones.


The only indirect way for a role to become superuser is to have been
granted membership in a superuser group, then SET ROLE.  Non-superusers
cannot do this.  If a superuser does this I consider the outcome to be no
different than if they go and do:

SET allow_system_table_mods TO true;
DROP TABLE pg_catalog.pg_class;

In short, having a CREATEROLE user issuing:
GRANT pg_read_all_stats TO davidj;
should result in the same outcome as them issuing:
GRANT postgres TO davidj;
-- ERROR:  must be superuser to alter superusers

Superusers can break their system and we don't go to great effort to stop
them.  I see no difference here, so arguments of this nature aren't all
that compelling to me.

> CREATEROLE shouldn't be
> the determining factor in the question of if a role can GRANT a
> predefined (or any other role) to some other role- that should be
> governed by the admin option on that role, and that should work exactly
> the same for predefined roles as it does for any other.
>

Never granting the CREATEROLE attribute to anyone will give you this
outcome today.



> ADMIN option
> without membership isn't something the catalogs today can understand
>

Today, they don't need to in order for the system to function within its
existing design specs.


> > ALTER ROLE pg_superuser WITH [NO] ADMIN;
> >
> > Then adding a role membership including the WITH ADMIN OPTION can be
> > rejected, as can the non-superuser situation.  Setting WITH NO ADMIN
> should
> > fail if any existing members have admin.  You must be a superuser to
> > execute WITH ADMIN (maybe WITH NO ADMIN as well...).  And possibly even a
> > new pg_* role that grants this ability (and maybe some others) for use
> by a
> > backup/restore user.
>
> I'm not following this in general or how it helps.  Surely we don't want
> to limit WITH ADMIN to superusers.


Today a non-superuser cannot "grant postgres to someuser;"

The point of this attribute is to allow the 

Re: ltree_gist indexes broken after pg_upgrade from 12 to 13

2022-03-10 Thread Andres Freund
On 2022-03-10 10:43:06 -0500, Andrew Dunstan wrote:
> On 3/6/22 17:33, Andres Freund wrote:
> > One thing that's likely worth doing as part of the cross version upgrade 
> > test,
> > even if it wouldn't even help in this case, is to run amcheck post
> > upgrade. Just dumping data isn't going to touch indices at all.
> >
> > A sequence of
> >   pg_upgrade; amcheck; upgrade all extensions; amcheck;
> > would make sense.

> See
> 
 > This will be in the next release

Great, thanks for doing that!
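For readers following along, the post-upgrade verification Andres describes can be sketched in SQL. This is an illustrative query, not part of any patch in the thread; it assumes the amcheck extension is available and uses its bt_index_check() function:

```sql
-- Illustrative sketch: run amcheck's bt_index_check() over every
-- permanent btree index after pg_upgrade.  Note that amcheck covers
-- only btree (and heap) today, which is why a gist corruption such as
-- the ltree_gist one would not be caught by this pass.
CREATE EXTENSION IF NOT EXISTS amcheck;

SELECT c.relname AS index_name,
       bt_index_check(c.oid::regclass)
FROM pg_index i
JOIN pg_class c  ON c.oid  = i.indexrelid
JOIN pg_am    am ON am.oid = c.relam
WHERE am.amname = 'btree'
  AND i.indisvalid
  AND c.relpersistence = 'p';   -- permanent relations only
```

Running this once after pg_upgrade and again after upgrading all extensions matches the "pg_upgrade; amcheck; upgrade all extensions; amcheck" sequence above.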




Re: pg_stat_statements and "IN" conditions

2022-03-10 Thread Tom Lane
Dmitry Dolgov <9erthali...@gmail.com> writes:
> New status: Waiting on Author

> This seems incorrect, as the only feedback I've got was "this is a bad
> idea", and no reaction to follow-up questions.

I changed the status because it seems to me there is no chance of
this being committed as-is.

1. I think an absolute prerequisite before we could even consider
changing the query jumbler rules this much is to do the work that was
put off when the jumbler was moved into core: that is, provide some
honest support for multiple query-ID generation methods being used at
the same time.  Even if you successfully make a case for
pg_stat_statements to act this way, other consumers of query IDs
aren't going to be happy with it.

2. You haven't made a case for it.  The original complaint was
about different lengths of IN lists not being treated as equivalent,
but this patch has decided to do I'm-not-even-sure-quite-what
about treating different Params as equivalent.  Plus you're trying
to invoke eval_const_expressions in the jumbler; that is absolutely
Not OK, for both safety and semantic reasons.

If you backed off to just treating ArrayExprs containing different
numbers of Consts as equivalent, maybe that'd be something we could
adopt without fixing point 1.  I don't think anything that fuzzes the
treatment of Params can get away with that, though.
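To make the narrower rule concrete, here is a hypothetical illustration (table and queries invented for this example) of what "ArrayExprs containing different numbers of Consts are equivalent" would and would not collapse:

```sql
-- Under the narrower rule, these two would share one
-- pg_stat_statements entry: only the number of constants in the
-- ArrayExpr differs.
SELECT * FROM t WHERE id IN (1, 2, 3);
SELECT * FROM t WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8);

-- These would remain distinct entries: the lists contain a Param or a
-- non-Const expression, so no fuzzing of Params is involved.
SELECT * FROM t WHERE id IN ($1, $2);
SELECT * FROM t WHERE id IN (1, 2, f(x));
```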

regards, tom lane




Re: role self-revocation

2022-03-10 Thread Robert Haas
On Thu, Mar 10, 2022 at 11:19 AM Stephen Frost  wrote:
> I disagree that ownership is needed; that's not what the spec calls for
> either.  What we need is more flexibility when it comes to the
> relationships which are allowed to be created between roles and what
> privileges come with them.  To that end, I'd argue that we should be
> extending pg_auth_members, first by separating out membership itself
> into an explicitly tracked attribute (instead of being implicit in the
> existence of a row in the table) and then adding on what other
> privileges we see fit to add, such as the ability to DROP a role.  We
> do need to remove the ability for a role who hasn't been explicitly
> given the admin right on another role to modify that role's membership
> too, as was originally proposed here.  This also seems to more closely
> follow the spec's expectation, something that role ownership doesn't.

I do not have a problem with more fine-grained kinds of authorization
even though I think there are syntactic issues to work out, but I
strongly disagree with the idea that we can't or shouldn't also have
role ownership. Marc invented it. Now Tom has invented it
independently. All sorts of other objects have it already. Trying to
make it out like this is some kind of kooky idea is not believable.
Yeah, it's not the most sophisticated or elegant model and that's why
it's good for us to also have other things, but for simple cases it is
easy to understand and works great.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: ltree_gist indexes broken after pg_upgrade from 12 to 13

2022-03-10 Thread Andrew Dunstan


On 3/10/22 10:53, Tom Lane wrote:
> Andrew Dunstan  writes:
>> On 3/6/22 17:33, Andres Freund wrote:
>>> A sequence of
>>> pg_upgrade; amcheck; upgrade all extensions; amcheck;
>>> would make sense.
>> See
>> 
> Does this detect the problem at hand?
>
>   


AIUI, amcheck doesn't check gist indexes, hence Andres' "even if it
wouldn't even help in this case". So no.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

2022-03-10 Thread Robert Haas
On Thu, Mar 10, 2022 at 6:02 AM Dilip Kumar  wrote:
> I have completely changed the logic for this refactoring.  Basically,
> write_relmap_file() already has parameters to control whether to write
> WAL and send invalidations, and we are already passing the dbpath.
> Instead of making a new function I just pass one additional parameter
> to this function itself, saying whether we are creating a new map or
> not, and I think with that the changes are much smaller and this looks
> cleaner to me.  Similarly for load_relmap_file() I just had to pass the
> dbpath and memory for the destination map.  Please have a look and let
> me know your thoughts.

It's not terrible, but how about something like the attached instead?
I think this has the effect of reducing the number of cases that the
low-level code needs to know about from 2 to 1, instead of making it
go up from 2 to 3.

> I think we should also write test cases for the create database
> strategy.  But I do not see any test case that covers the existing
> create database options.  So I am wondering whether we should add a
> test case only for the new option we are providing, or create a
> separate patch which tests the new option as well as the existing
> options.

FWIW, src/bin/scripts/t/020_createdb.pl does a little bit of testing
of this kind.

-- 
Robert Haas
EDB: http://www.enterprisedb.com


relmap-refactor-rmh.patch
Description: Binary data


Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

2022-03-10 Thread Bharath Rupireddy
On Thu, Mar 10, 2022 at 1:52 PM Jeff Davis  wrote:
>
> On Wed, 2022-03-02 at 22:37 +0530, Bharath Rupireddy wrote:
> >
> > Attaching v6 patch set with above review comments addressed. Please
> > review it further.

Thanks Jeff for reviewing it. I've posted the latest v7 patch-set
upthread [1], which has more simple-yet-useful-and-effective
functions.

> * Don't issue WARNINGs or other messages for ordinary situations, like
> when pg_get_wal_records_info() hits the end of WAL.

v7 patch-set [1] has no warnings, but the functions will error out if
a future LSN is specified.

> * It feels like the APIs that allow waiting for the end of WAL are
> slightly off. Can't you just do pg_get_wal_records_info(start_lsn,
> least(pg_current_wal_flush_lsn(), end_lsn)) if you want the non-waiting
> behavior? Try to make the API more orthogonal, where a few basic
> functions can be combined to give you everything you need, rather than
> specifying extra parameters and issuing WARNINGs. I

From the v7 patch-set [1] onwards, the waiting mode has been removed
from all of the functions, again to keep things
simple-yet-useful-and-effective. However, we can always add new
pg_walinspect functions that wait for future WAL in later versions once
the basic stuff gets committed and if many users ask for it.
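Jeff's suggested composition can be written out as follows. This is a hedged sketch based on the function shapes described in this thread (the exact signatures in the posted patch may differ); the LSN values are placeholders:

```sql
-- Non-waiting behavior composed from basic functions: instead of a
-- "wait" flag, clamp the end LSN to what has actually been flushed.
SELECT *
FROM pg_get_wal_records_info(
       '0/1000000'::pg_lsn,
       least(pg_current_wal_flush_lsn(), '0/2000000'::pg_lsn));
```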

> * In the docs, include some example output. I don't see any output in
> the tests, which makes sense because it's mostly non-deterministic, but
> it would be helpful to see sample output of at least
> pg_get_wal_records_info().

+1. Added for pg_get_wal_records_info and pg_get_wal_stats.

> * Is pg_get_wal_stats() even necessary, or can you get the same
> information with a query over pg_get_wal_records_info()? For instance,
> if you want to group by transaction ID rather than rmgr, then
> pg_get_wal_stats() is useless.

Yes, you are right: pg_get_wal_stats provides WAL stats per resource
manager, similar to pg_waldump with the --start, --end and --stats
options. It provides more information than pg_get_wal_records_info, and
it is a better way of getting stats than adding more columns to
pg_get_wal_records_info, calculating percentages in SQL, and using a
GROUP BY clause. IMO, pg_get_wal_stats is more readable and useful.
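For comparison, the per-rmgr aggregation Jeff alludes to might look roughly like this. This is a sketch only: the column names (resource_manager, record_length) and the function signature are assumptions based on this thread, not confirmed against the patch:

```sql
-- Approximate the pg_get_wal_stats() output by aggregating the
-- record-level function; LSN bounds are placeholders.
SELECT resource_manager,
       count(*)           AS count,
       sum(record_length) AS record_size,
       round(100.0 * count(*) / sum(count(*)) OVER (), 2) AS count_pct
FROM pg_get_wal_records_info('0/1000000', '0/2000000')
GROUP BY resource_manager
ORDER BY count DESC;
```

Grouping by transaction ID instead of resource_manager would be the same query with a different GROUP BY key, which is the flexibility argument for the record-level function.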

> * Would be nice to have a pg_wal_file_is_valid() or similar, which
> would test that it exists, and the header matches the filename (e.g. if
> it was recycled but not used, that would count as invalid). I think
> pg_get_first_valid_wal_record_lsn() would make some cases look invalid
> even if the file is valid -- for example, if a wal record spans many
> wal segments, the segments might look invalid because they contain no
> complete records, but the file itself is still valid and contains valid
> wal data.

Actually I haven't tried testing a single WAL record spanning many WAL
files yet (I'm happy to try it if someone suggests such a use-case). In
that case too I assume pg_get_first_valid_wal_record_lsn() shouldn't
have a problem, because it just gives the next valid LSN and its
previous LSN using the existing WAL reader API XLogFindNextRecord(). It
opens up the WAL file segments via the usual path
(page_read/read_local_xlog_page, WALRead,
segment_open/wal_segment_open). Thoughts?

I don't think it's necessary to have a function pg_wal_file_is_valid()
that, given a WAL file name as input, checks whether the WAL file
exists or not, probably not in the core (xlogfuncs.c) either. These
kinds of functions can open up challenges in terms of user input
validation and may cause unnecessary problems; please see some related
discussion [2].

> * Is there a reason you didn't include the timeline ID in
> pg_get_wal_records_info()?

Right now the functions read WAL only from the current server's
timeline, which I have mentioned in the docs. The server's current
timeline is available via pg_control_checkpoint()'s timeline_id, so
having timeline_id as a column doesn't add anything. Again, this is to
keep things simple-yet-useful-and-effective. However, we can add new
pg_walinspect functions that read WAL from historic as well as current
timelines in later versions once the basic stuff gets committed and if
many users ask for it.

+ <para>
+  All the functions of this module will provide the WAL information using the
+  current server's timeline ID.
+ </para>
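The timeline in question can be fetched directly; pg_control_checkpoint() is an existing core function:

```sql
-- The timeline that all pg_walinspect functions read from, per the
-- documentation excerpt above.
SELECT timeline_id FROM pg_control_checkpoint();
```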

> * Can we mark this extension 'trusted'? I'm not 100% clear on the
> standards for that marker, but it seems reasonable for a database owner
> with the right privileges might want to install it.

The 'trusted' extension concept was added by commit 50fc694 [3]. Since
pg_walinspect deals with WAL, we strictly want to control who creates
it and who can execute the functions it exposes, so I don't know if
'trusted' is a good idea here. Also, pageinspect isn't a 'trusted'
extension.

> * pg_get_raw_wal_record() seems too powerful for pg_monitor. Maybe that
> function should require pg_read_server_files? Or at least
> pg_read_all_data?

pg_read_all_data may not be the right choice, but pg_read_server_files
is. 

Re: pg_stat_statements and "IN" conditions

2022-03-10 Thread Dmitry Dolgov
> On Wed, Jan 05, 2022 at 10:11:11PM +0100, Dmitry Dolgov wrote:
> > On Tue, Jan 04, 2022 at 06:02:43PM -0500, Tom Lane wrote:
> > We can debate whether the rules proposed here are good for
> > pg_stat_statements or not, but it seems inevitable that they will be a
> > disaster for some other consumers of the query hash.
>
> Hm, which consumers do you mean here, potential extensions? Doesn't
> the ability to use an external module to compute the queryid make this
> situation possible anyway?
>
> > do you really think that a query with two int
> > parameters is equivalent to one with five float parameters for all
> > query-identifying purposes?
>
> Nope, and it will be hard to figure this out no matter which approach
> we're talking about, because it mostly depends on the context and type
> of queries I guess. Instead, such functionality should allow some
> reasonable configuration. To be clear, the use case I have in mind here
> is not four or five, but rather a couple of hundred constants, where
> the chances that the whole construction was generated automatically by
> an ORM are higher than normal.
>
> > I can see the merits of allowing different numbers of IN elements
> > to be considered equivalent for pg_stat_statements, but this patch
> > seems to go far beyond that basic idea, and I fear the side-effects
> > will be very bad.
>
> Not sure why it goes far beyond, but then there were two approaches
> under consideration, as I've stated in the first message. I already
> don't remember all the details, but another one revolved around
> doing similar things in a more limited fashion in transformAExprIn. The
> problem would be then to carry the information, necessary to represent
> the act of "merging" some number of queryids together. Any thoughts
> here?
>
> The idea of keeping the original queryid untouched and add another type
> of id instead sounds interesting, but it will add too much overhead for
> a quite small use case I guess.

```
Thu, 10 Mar 2022
New status: Waiting on Author
```

This seems incorrect, as the only feedback I've got was "this is a bad
idea", and no reaction to follow-up questions.




Re: [RFC] building postgres with meson -v7

2022-03-10 Thread Peter Eisentraut

On 09.03.22 17:44, Andres Freund wrote:

v6-0009-prereq-make-unicode-targets-work-in-vpath-builds.patch.gz

Another directory issue

I think it's a tad different, in that it's fixing something that's currently
broken in VPATH builds.


Ok, I took another look at this.

-override CPPFLAGS := -DFRONTEND $(CPPFLAGS)
+override CPPFLAGS := -DFRONTEND -I$(abs_top_builddir)/src/common/unicode $(CPPFLAGS)


This could just be

-I.

-   $(PERL) generate-unicode_norm_table.pl
+   $(PERL) $< $(CURDIR)

I didn't detect a need for the additional directory argument.  (So the 
changes in generate-unicode_norm_table.pl are also apparently not 
necessary.)  Maybe this is something that will become useful later, in 
which case it should be split out from this patch.


The rest of this patch looks ok.





Re: role self-revocation

2022-03-10 Thread Mark Dilger



> On Mar 10, 2022, at 7:56 AM, David G. Johnston  
> wrote:
> 
> 
> I want that because I want mini-superusers, where alice can administer
> the users that alice creates just as if she were a superuser,
> including having their permissions implicitly and dropping them when
> she wants them gone, but where alice cannot break out to the operating
> system as a true superuser could do.
> 
> CREATEROLE (once the pg_* with admin rules are fixed) + Ownership and rules 
> restricting interfering with another role's objects (unless superuser) seems 
> to handle this.

What if one of alice's subordinates also owns roles?  Can alice interfere with 
*that* role's objects?  I don't see that a simple rule restricting roles from 
interfering with another role's objects is quite enough.  That raises the 
question of whether role ownership is transitive, and whether we need a concept 
similar to inherit/noinherit for ownership.

There is also the problem that CREATEROLE currently allows a set of privileges 
to be granted to created roles, and that set of privileges is hard-coded.  
You've suggested changing the hard-coded rules to remove pg_* roles from the 
list of grantable privileges, but that's still an inflexible set of hardcoded 
privileges.  Wouldn't it make more sense for the grantor to need GRANT OPTION 
on any privilege they give to roles they create?

> the bot can stand up
> accounts, can grant them membership in a defined set of groups, but
> cannot exercise the privileges of those accounts (or hack superuser
> either).
> 
> The bot should be provided a security definer procedure that encapsulates all 
> of this rather than us trying to hack the permission system.  This isn't a 
> user permission concern, it is an unauthorized privilege escalation concern.  
> Anyone with the bot's credentials can trivially overcome the third 
> restriction by creating a role with the desired membership and then logging 
> in as that role - and there is nothing the system can do to prevent that 
> while also allowing the other two permissions.

Doesn't this assume password authentication?  If the server uses ldap 
authentication, for example, wouldn't the bot need valid ldap credentials for 
at least one user for this attack to work?  And if CREATEROLE has been made 
more configurable, wouldn't the bot only be able to grant that ldap user the 
limited set of privileges that the bot's database user has been granted ADMIN 
OPTION for?

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: role self-revocation

2022-03-10 Thread Stephen Frost
Greetings,

* David G. Johnston (david.g.johns...@gmail.com) wrote:
> On Thu, Mar 10, 2022 at 7:46 AM Robert Haas  wrote:
> > On Wed, Mar 9, 2022 at 4:31 PM Tom Lane  wrote:
> > > I don't think we need syntax to describe it.  As I just said in my
> > > other reply, we have a perfectly good precedent for this already
> > > in ordinary object permissions.  That is: an object owner always,
> > > implicitly, has GRANT OPTION for all the object's privileges, even
> > > if she revoked the corresponding plain privilege from herself.
> > >
> > > Yeah, this does mean that we're effectively deciding that the creator
> > > of a role is its owner.  What's the problem with that?
> >
> > I don't think that's entirely the wrong concept, but it doesn't make a
> > lot of sense in a world where the creator has to be a superuser. If
> > alice, bob, and charlie are superusers who take turns creating new
> > users, and then we let charlie go due to budget cuts, forcing alice
> > and bob to change the owner of all the users he created to some other
> > superuser as a condition of dropping his account is a waste of
> > everyone's time. They can do exactly the same things to every account
> > on the system after we change the role owner as before.
> 
> Then maybe we should just implement the idea that if a superuser would
> become the owner we instead substitute in the bootstrap user.  Or give the
> DBA the choice whether they want to retain knowledge of specific roles -
> and thus are willing to accept the "waste of time".

This doesn't strike me as going in the right direction.  Falling back to
the bootstrap superuser is generally a hack and not a great one.  I'll
also point out that the SQL spec hasn't got a concept of role ownership
either.

> > But wait, I hear you cry, what about CREATEROLE? Well, CREATEROLE is
> > generally agreed to be broken right now, and if you don't agree with
> > that, consider that it can grant pg_execute_server_programs to a
> > newly-created account and then explain to me how it's functionally
> > different from superuser.
> 
> CREATEROLE has long been defined as basically having "with admin option" on
> every role in the system.  The failure to special-case the roles that grant
> different aspects of superuser-ness to its members doesn't make CREATEROLE
> itself broken, it makes the implementation of pg_execute_server_programs
> broken.  Only superusers should be considered to have with admin option on
> these roles. They can delegate through the usual membership+admin mechanism
> to a CREATEROLE role if they desire.

No, CREATEROLE having admin option on every role in the system is broken
and always has been.  It's not just an issue for predefined roles like
pg_execute_server_program, it's an issue for any role that could become
a superuser either directly or indirectly and that extends beyond the
predefined ones.  As this issue with CREATEROLE existed way before
predefined roles were added to PG, claiming that it's an issue with
predefined roles doesn't make a bit of sense.

> > The whole area needs a rethink. I believe
> > everyone involved in the discussion on the other threads agrees that
> > some reform of CREATEROLE is necessary, and more generally with the
> > idea that it's useful for non-superusers to be able to create roles.
> 
> As the documentation says, using SUPERUSER for day-to-day administration is
> contrary to good security practices.  Role management is considered to be a
> day-to-day administration activity.  I agree with this principle.  It was
> designed to neither be a superuser nor grant superuser, so removing the
> ability to grant the pg_* role memberships remains consistent with its
> original intent.

That would not be sufficient to make CREATEROLE safe.  Far, far from it.

> > I want that because I want mini-superusers, where alice can administer
> > the users that alice creates just as if she were a superuser,
> > including having their permissions implicitly and dropping them when
> > she wants them gone, but where alice cannot break out to the operating
> > system as a true superuser could do.
> 
> CREATEROLE (once the pg_* with admin rules are fixed) + Ownership and rules
> restricting interfering with another role's objects (unless superuser)
> seems to handle this.

This is not sufficient- roles can be not-superuser themselves but have
the ability to become superuser if GRANT'd a superuser role and
therefore we can't have a system where CREATEROLE allows arbitrary
GRANT'ing of roles to each other.  I'm a bit confused too as anything
where we are curtailing what CREATEROLE roles are able to do in a manner
that means they're only able to modify some subset of roles should
equally apply to predefined roles too- that is, CREATEROLE shouldn't be
the determining factor in the question of if a role can GRANT a
predefined (or any other role) to some other role- that should be
governed by the admin option on that role, and that should work exactly
the same for 

Re: role self-revocation

2022-03-10 Thread David G. Johnston
On Thu, Mar 10, 2022 at 7:46 AM Robert Haas  wrote:

> On Wed, Mar 9, 2022 at 4:31 PM Tom Lane  wrote:
> > I don't think we need syntax to describe it.  As I just said in my
> > other reply, we have a perfectly good precedent for this already
> > in ordinary object permissions.  That is: an object owner always,
> > implicitly, has GRANT OPTION for all the object's privileges, even
> > if she revoked the corresponding plain privilege from herself.
> >
> > Yeah, this does mean that we're effectively deciding that the creator
> > of a role is its owner.  What's the problem with that?
>
> I don't think that's entirely the wrong concept, but it doesn't make a
> lot of sense in a world where the creator has to be a superuser. If
> alice, bob, and charlie are superusers who take turns creating new
> users, and then we let charlie go due to budget cuts, forcing alice
> and bob to change the owner of all the users he created to some other
> superuser as a condition of dropping his account is a waste of
> everyone's time. They can do exactly the same things to every account
> on the system after we change the role owner as before.
>

Then maybe we should just implement the idea that if a superuser would
become the owner we instead substitute in the bootstrap user.  Or give the
DBA the choice whether they want to retain knowledge of specific roles -
and thus are willing to accept the "waste of time".


> But wait, I hear you cry, what about CREATEROLE? Well, CREATEROLE is
> generally agreed to be broken right now, and if you don't agree with
> that, consider that it can grant pg_execute_server_programs to a
> newly-created account and then explain to me how it's functionally
> different from superuser.


CREATEROLE has long been defined as basically having "with admin option" on
every role in the system.  The failure to special-case the roles that grant
different aspects of superuser-ness to its members doesn't make CREATEROLE
itself broken, it makes the implementation of pg_execute_server_programs
broken.  Only superusers should be considered to have with admin option on
these roles. They can delegate through the usual membership+admin mechanism
to a CREATEROLE role if they desire.


> The whole area needs a rethink. I believe
> everyone involved in the discussion on the other threads agrees that
> some reform of CREATEROLE is necessary, and more generally with the
> idea that it's useful for non-superusers to be able to create roles.
>

As the documentation says, using SUPERUSER for day-to-day administration is
contrary to good security practices.  Role management is considered to be a
day-to-day administration activity.  I agree with this principle.  It was
designed to neither be a superuser nor grant superuser, so removing the
ability to grant the pg_* role memberships remains consistent with its
original intent.


> I want that because I want mini-superusers, where alice can administer
> the users that alice creates just as if she were a superuser,
> including having their permissions implicitly and dropping them when
> she wants them gone, but where alice cannot break out to the operating
> system as a true superuser could do.


CREATEROLE (once the pg_* with admin rules are fixed) + Ownership and rules
restricting interfering with another role's objects (unless superuser)
seems to handle this.


> the bot can stand up
> accounts, can grant them membership in a defined set of groups, but
> cannot exercise the privileges of those accounts (or hack superuser
> either).


The bot should be provided a security definer procedure that encapsulates
all of this rather than us trying to hack the permission system.  This
isn't a user permission concern, it is an unauthorized privilege escalation
concern.  Anyone with the bot's credentials can trivially overcome the
third restriction by creating a role with the desired membership and then
logging in as that role - and there is nothing the system can do to prevent
that while also allowing the other two permissions.


> And that's why I'm not sure it's really the right idea to say that we
> don't need syntax for this admin-without-member concept.


We already have this syntax in the form of CREATEROLE.  But we do need a
fix, just on the group side.  We need a way to define a group as having no
ADMINS.

ALTER ROLE pg_superuser WITH [NO] ADMIN;

Then adding a role membership including the WITH ADMIN OPTION can be
rejected, as can the non-superuser situation.  Setting WITH NO ADMIN should
fail if any existing members have admin.  You must be a superuser to
execute WITH ADMIN (maybe WITH NO ADMIN as well...).  And possibly even a
new pg_* role that grants this ability (and maybe some others) for use by a
backup/restore user.

Or just special-case pg_* roles.

The advantage of exposing this to the DBA is that they can then package
pg_* roles into a custom group and still have the benefit of superuser only
administration.  In the special-case implementation the presence of a 

Re: ltree_gist indexes broken after pg_upgrade from 12 to 13

2022-03-10 Thread Tom Lane
Andrew Dunstan  writes:
> On 3/6/22 17:33, Andres Freund wrote:
>> A sequence of
>> pg_upgrade; amcheck; upgrade all extensions; amcheck;
>> would make sense.

> See
> 

Does this detect the problem at hand?

regards, tom lane




Re: support for MERGE

2022-03-10 Thread Simon Riggs
On Sun, 27 Feb 2022 at 17:35, Alvaro Herrera  wrote:

> > > +   /*
> > > +* Project the tuple.  In case of a partitioned table, the
> > > +* projection was already built to use the root's descriptor,
> > > +* so we don't need to map the tuple here.
> > > +*/
> > > +   actionInfo.actionState = action;
> > > +   insert_slot = ExecProject(action->mas_proj);
> > > +
> > > +   (void) ExecInsert(mtstate, rootRelInfo,
> > > + insert_slot, slot,
> > > + estate, ,
> > > + mtstate->canSetTag);
> > > +   InstrCountFiltered1(>ps, 1);
> >
> > What happens if somebody concurrently inserts a conflicting row?
>
> An open question to which I don't have any good answer RN.

Duplicate rows should produce a uniqueness violation error in one of
the transactions, assuming there is a constraint to define the
conflict. Without such a constraint there is no conflict.

Concurrent inserts are checked by merge-insert-update.spec, which
correctly raises an ERROR in this case, as documented.

Various cases were not tested in the patch - additional patch
attached, but nothing surprising there.

ExecInsert() does not return from such an ERROR, so the code fragment
appears correct to me.
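The principle above, that the conflict exists only where a constraint defines it, can be sketched outside PostgreSQL. The following toy Python/SQLite sketch (an illustration of the general mechanism, not of MERGE itself) shows a duplicate key raising a uniqueness violation for the second writer, and no error at all when no constraint exists:

```python
import sqlite3

# Toy stand-in (SQLite, not PostgreSQL): the point is only that the unique
# constraint is what turns a duplicate insert into an error for one session.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, val TEXT)")

conn.execute("INSERT INTO target VALUES (1, 'first writer')")  # first session wins

try:
    # second session attempts the same key; the constraint defines the conflict
    conn.execute("INSERT INTO target VALUES (1, 'second writer')")
except sqlite3.IntegrityError as e:
    print("uniqueness violation:", e)

# Without a constraint there is no conflict: both rows are simply kept.
conn.execute("CREATE TABLE unconstrained (id INTEGER, val TEXT)")
conn.execute("INSERT INTO unconstrained VALUES (1, 'a')")
conn.execute("INSERT INTO unconstrained VALUES (1, 'b')")
print(conn.execute("SELECT COUNT(*) FROM unconstrained").fetchone()[0])
```

In a real two-backend scenario the second writer blocks until the first commits and only then reports the violation, but the end state is the same: with a constraint, one transaction errors out; without one, both rows land.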

-- 
Simon Riggs    http://www.EnterpriseDB.com/


new_insert_tests_for_merge.patch
Description: Binary data


Re: ltree_gist indexes broken after pg_upgrade from 12 to 13

2022-03-10 Thread Andrew Dunstan


On 3/6/22 17:33, Andres Freund wrote:
> Hi,
>
> On 2022-03-06 07:46:04 -0500, Andrew Dunstan wrote:
>> This is an area not currently touched by the buildfarm's cross version
>> upgrade testing, which basically compares a pre-upgrade and post-upgrade
>> dump of the databases. The upgraded cluster does contain
>> contrib_regression_ltree.
>>
>> I'm open to suggestions on how we might improve the buildfarm's testing
>> of upgraded indexes generally.
> One thing that's likely worth doing as part of the cross version upgrade test,
> even if it wouldn't even help in this case, is to run amcheck post
> upgrade. Just dumping data isn't going to touch indices at all.
>
> A sequence of
>   pg_upgrade; amcheck; upgrade all extensions; amcheck;
> would make sense.
>


See




This will be in the next release


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: ltree_gist indexes broken after pg_upgrade from 12 to 13

2022-03-10 Thread Tom Lane
Alexander Korotkov  writes:
> Good.  The revised patch is attached.  Instead of adding argument to
> LTREE_GET_ASIGLEN(), it introduces separate LTREE_GET_SIGLEN() and
> LTREE_GET_ASIGLEN() macros.

Um ... what I see after applying the patch is two redundant
definitions of LTREE_GET_ASIGLEN, and no LTREE_GET_SIGLEN.

regards, tom lane




Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

2022-03-10 Thread Dilip Kumar
On Thu, Mar 10, 2022 at 7:22 PM Ashutosh Sharma  wrote:
>
> Here are some review comments on the latest patch
> (v11-0004-WAL-logged-CREATE-DATABASE.patch). I actually did the review
> of the v10 patch but that applies for this latest version as well.
>
> +   /* Now errors are fatal ... */
> +   START_CRIT_SECTION();
>
> Did you mean PANIC instead of FATAL?

I think "fatal" here didn't really mean the error level FATAL; it
means critical, and I have seen it used that way in other places.  But I
don't really think we need this comment, so I removed it to avoid any
confusion.

> ==
>
> +   (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> +errmsg("invalid create strategy %s", strategy),
> +errhint("Valid strategies are \"wal_log\", and \"file_copy\".")));
> +   }
>
>
> Should this be - "invalid createdb strategy" instead of "invalid
> create strategy"?

Changed

> ==
>
> +   /*
> +* In case of ALTER DATABASE SET TABLESPACE we don't need to do
> +* anything for the object which are not in the source db's default
> +* tablespace.  The source and destination dboid will be same in
> +* case of ALTER DATABASE SET TABLESPACE.
> +*/
> +   else if (src_dboid == dst_dboid)
> +   continue;
> +   else
> +   dstrnode.spcNode = srcrnode.spcNode;
>
>
> Is this change still required? Do we support the WAL_COPY strategy for
> ALTER DATABASE?

Yeah, it is now unreachable code, so I removed it.

> ==
>
> +   /* Open the source and the destination relation at smgr level. */
> +   src_smgr = smgropen(srcrnode, InvalidBackendId);
> +   dst_smgr = smgropen(dstrnode, InvalidBackendId);
> +
> +   /* Copy relation storage from source to the destination. */
> +   CreateAndCopyRelationData(src_smgr, dst_smgr, relinfo->relpersistence);
>
> Do we need to do smgropen for destination relfilenode here? Aren't we
> already doing that inside RelationCreateStorage?

Yeah, I have changed the logic completely: I removed the smgropen() for
both the source and the destination and moved it inside
CreateAndCopyRelationData().  Please check the updated code.

> ==
>
> +   use_wal = XLogIsNeeded() &&
> +   (relpersistence == RELPERSISTENCE_PERMANENT || copying_initfork);
> +
> +   /* Get number of blocks in the source relation. */
> +   nblocks = smgrnblocks(src, forkNum);
>
> What if number of blocks in a source relation is ZERO? Should we check
> for that and return immediately. We have already done smgrcreate.

Yeah, it makes sense to optimize; with that we will not have to get the
buffer strategy.  Done.

> ==
>
> +   /* We don't need to copy the shared objects to the target. */
> +   if (classForm->reltablespace == GLOBALTABLESPACE_OID)
> +   return NULL;
> +
> +   /*
> +* If the object doesn't have the storage then nothing to be
> +* done for that object so just ignore it.
> +*/
> +   if (!RELKIND_HAS_STORAGE(classForm->relkind))
> +   return NULL;
>
> We can probably club together above two if-checks.

Done

> ==
>
> +  
> +   strategy
> +   
> +
> + This is used for copying the database directory.  Currently, we have
> + two strategies the WAL_LOG and the
> + FILE_COPY.  If WAL_LOG strategy
> + is used then the database will be copied block by block and it will
> + also WAL log each copied block.  Otherwise, if FILE_COPY
>
> I think we need to mention the default strategy in the documentation page.

Done

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
From 60bcdf3dc8458481b17fd4f4053b48e69ac9f050 Mon Sep 17 00:00:00 2001
From: Dilip Kumar 
Date: Mon, 14 Feb 2022 17:48:03 +0530
Subject: [PATCH v12 4/4] WAL logged CREATE DATABASE

Currently, CREATE DATABASE forces a checkpoint, then copies all the files,
then forces another checkpoint. The comments in the createdb() function
explain the reasons for this. The attached patch fixes this problem by making
create database completely WAL logged so that we can avoid the checkpoints.

We also retain the old way of creating the database, and for that we
provide an option to choose the strategy for creating the database.
For the new method the user needs to give STRATEGY=WAL_LOG, and for the
old method STRATEGY=FILE_COPY.  The default strategy is WAL_LOG.
---
 doc/src/sgml/ref/create_database.sgml  |  23 +
 src/backend/commands/dbcommands.c  | 756 +++--
 src/backend/storage/buffer/bufmgr.c| 146 +++
 src/include/commands/dbcommands_xlog.h |   8 +
 src/include/storage/bufmgr.h   |   3 +
 src/tools/pgindent/typedefs.list   |   1 +
 6 files changed, 805 
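The WAL_LOG strategy described in the commit message above can be sketched as a toy model (a Python illustration under made-up names, block size, and record format, not PostgreSQL code): copy the source block by block, appending one log record per block, so the destination is fully reconstructible from the log alone and no checkpoint is needed to make the copy safe:

```python
# Toy sketch of the WAL_LOG idea: one log record per copied block, so crash
# recovery can rebuild the destination purely by replaying the log.
BLKSZ = 8  # made-up block size for the illustration

def copy_blocks(src: bytes, wal: list) -> bytes:
    dst = bytearray()
    nblocks = (len(src) + BLKSZ - 1) // BLKSZ
    for blkno in range(nblocks):
        block = src[blkno * BLKSZ:(blkno + 1) * BLKSZ]
        wal.append(("copy_block", blkno, block))  # log the block first
        dst += block                              # then apply it locally
    return bytes(dst)

def replay(wal: list) -> bytes:
    # Crash recovery: rebuild the destination from the log records alone.
    dst = bytearray()
    for record, blkno, block in wal:
        assert record == "copy_block"
        dst[blkno * BLKSZ:(blkno + 1) * BLKSZ] = block
    return bytes(dst)

src = b"source relation contents"
wal = []
dst = copy_blocks(src, wal)
assert dst == src          # the copy matches the source
assert replay(wal) == src  # and is reconstructible from the log alone
print(len(wal), "log records")
```

The FILE_COPY strategy corresponds to skipping the per-block records and copying files directly, which is why it needs the surrounding checkpoints to guarantee the on-disk state is consistent.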

Re: role self-revocation

2022-03-10 Thread Robert Haas
On Wed, Mar 9, 2022 at 4:31 PM Tom Lane  wrote:
> I don't think we need syntax to describe it.  As I just said in my
> other reply, we have a perfectly good precedent for this already
> in ordinary object permissions.  That is: an object owner always,
> implicitly, has GRANT OPTION for all the object's privileges, even
> if she revoked the corresponding plain privilege from herself.
>
> Yeah, this does mean that we're effectively deciding that the creator
> of a role is its owner.  What's the problem with that?

I don't think that's entirely the wrong concept, but it doesn't make a
lot of sense in a world where the creator has to be a superuser. If
alice, bob, and charlie are superusers who take turns creating new
users, and then we let charlie go due to budget cuts, forcing alice
and bob to change the owner of all the users he created to some other
superuser as a condition of dropping his account is a waste of
everyone's time. They can do exactly the same things to every account
on the system after we change the role owner as before.

But wait, I hear you cry, what about CREATEROLE? Well, CREATEROLE is
generally agreed to be broken right now, and if you don't agree with
that, consider that it can grant pg_execute_server_programs to a
newly-created account and then explain to me how it's functionally
different from superuser. The whole area needs a rethink. I believe
everyone involved in the discussion on the other threads agrees that
some reform of CREATEROLE is necessary, and more generally with the
idea that it's useful for non-superusers to be able to create roles.
But the reasons why people want that vary.

I want that because I want mini-superusers, where alice can administer
the users that alice creates just as if she were a superuser,
including having their permissions implicitly and dropping them when
she wants them gone, but where alice cannot break out to the operating
system as a true superuser could do. I want this because the lack of
meaningful privilege separation that led to CVE-2019-9193 being filed
spuriously is a very real problem. It's a thing a lot of people want,
and I want to give it to them. David Steele, on the other hand, wants
to build a user-creating bot that can create accounts but otherwise
conforms to the principle of least privilege: the bot can stand up
accounts, can grant them membership in a defined set of groups, but
cannot exercise the privileges of those accounts (or hack superuser
either). Other people may well want other things.

And that's why I'm not sure it's really the right idea to say that we
don't need syntax for this admin-without-member concept. If we just
want to bolt role ownership onto the existing framework without really
changing anything else, we can do that without extra syntax and, as
you say here, make it an implicit property of role ownership. But I
don't see that as has having much value; we just end up with a bunch
of superuser owners. Whatever. Now Stephen made the argument that we
ought to actually have admin-without-member as a first class concept,
something that could be assigned to arbitrary users. Actually, I think
he wanted it even more fine-grained than that. And I think that could
make the concept a lot more useful, but then it needs some kind of
understandable syntax.

There's a lot of moving parts here. It's not just about coming up with
something that sounds generally logical, but about creating a system
that has some real-world utility.

> > But do we really have to solve this problem before we can clean up
> > this session exception?
>
> I think we need a plan for where we're going.  I don't see "clean up
> the session exception" as an end in itself; it's part of re-examining
> how all of this ought to work.  I don't say that we have to have a
> complete patch right away, only that we need a coherent end goal.

I'd like to have a plan, too, but if this behavior is accidental, I
still think we can remove it without making big decisions about future
direction. The perfect is the enemy of the good.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: PATCH: add "--config-file=" option to pg_rewind

2022-03-10 Thread Gunnar "Nick" Bluth
Am 10.03.22 um 14:43 schrieb Michael Banck:
> Heya,
> 
> some minor comments, I didn't look at the added test and I did not test
> the patch at all, but (as part of the Debian/Ubuntu packaging team) I
> think this patch is really important:

Much appreciated!

[...]

> I think we usually just say "data directory"; pgdata was previously only
> used as part of the option name, not like this in the text.

Fixed.

[...]

> target-pgdata is the name of the option, but maybe it is useful to spell
> out "target data directory" here, even if that means we spill over to
> two lines (which we might have to do anyway, that new line is pretty
> long).

Fixed.

> 
>> @@ -121,6 +123,7 @@ main(int argc, char **argv)
>>  {"no-sync", no_argument, NULL, 'N'},
>>  {"progress", no_argument, NULL, 'P'},
>>  {"debug", no_argument, NULL, 3},
>> +{"config-file", required_argument, NULL, 5},
> 
> Not sure what our policy is, but the way the numbered options are 1, 2,
> 4, 3, 5 after this is weird; this was already off before this patch
> though.

That one has been triggering my inner Monk too ;-)

Fixed!


[...]

> You removed an empty line here. Maybe rather prepend this with a comment
> on what's going on, and have en empty line before and afterwards.

> Kinde same here.

It does smell a bit like "stating the obvious", but alas! Both fixed.

Cheers,
-- 
Gunnar "Nick" Bluth

Eimermacherweg 106
D-48159 Münster

Mobil +49 172 8853339
Email: gunnar.bl...@pro-open.de
__
"Ceterum censeo SystemD esse delendam" - Cato

 doc/src/sgml/ref/pg_rewind.sgml   |  12 +
 src/bin/pg_rewind/pg_rewind.c |  24 -
 src/bin/pg_rewind/t/001_basic.pl  |   1 +
 src/bin/pg_rewind/t/RewindTest.pm | 105 +++---
 4 files changed, 101 insertions(+), 41 deletions(-)

diff --git a/doc/src/sgml/ref/pg_rewind.sgml b/doc/src/sgml/ref/pg_rewind.sgml
index 33e6bb64ad..62fcb71825 100644
--- a/doc/src/sgml/ref/pg_rewind.sgml
+++ b/doc/src/sgml/ref/pg_rewind.sgml
@@ -241,6 +241,18 @@ PostgreSQL documentation
   
  
 
+ 
+  --config-file=filepath
+  
+   
+When using the -c / --restore-target-wal option, the restore_command
+is extracted from the configuration of the target cluster. If the postgresql.conf
+of that cluster is not in the target data directory (or if you want to use an alternative config),
+use this option to provide a (relative or absolute) path to the postgresql.conf to be used.
+   
+  
+ 
+
  
   --debug
   
diff --git a/src/bin/pg_rewind/pg_rewind.c b/src/bin/pg_rewind/pg_rewind.c
index b39b5c1aac..2e83c7ee8e 100644
--- a/src/bin/pg_rewind/pg_rewind.c
+++ b/src/bin/pg_rewind/pg_rewind.c
@@ -61,6 +61,7 @@ char	   *datadir_target = NULL;
 char	   *datadir_source = NULL;
 char	   *connstr_source = NULL;
 char	   *restore_command = NULL;
+char	   *config_file = NULL;
 
 static bool debug = false;
 bool		showprogress = false;
@@ -87,6 +88,8 @@ usage(const char *progname)
 	printf(_("Options:\n"));
 	printf(_("  -c, --restore-target-wal   use restore_command in target configuration to\n"
 			 " retrieve WAL files from archives\n"));
+	printf(_("  --config-file=FILE path to postgresql.conf if it resides outside\n"));
+	printf(_(" the target data directory (for -c)\n"));
 	printf(_("  -D, --target-pgdata=DIRECTORY  existing data directory to modify\n"));
 	printf(_("  --source-pgdata=DIRECTORY  source data directory to synchronize with\n"));
 	printf(_("  --source-server=CONNSTRsource server to synchronize with\n"));
@@ -114,13 +117,14 @@ main(int argc, char **argv)
 		{"write-recovery-conf", no_argument, NULL, 'R'},
 		{"source-pgdata", required_argument, NULL, 1},
 		{"source-server", required_argument, NULL, 2},
+		{"debug", no_argument, NULL, 3},
 		{"no-ensure-shutdown", no_argument, NULL, 4},
+		{"config-file", required_argument, NULL, 5},
 		{"version", no_argument, NULL, 'V'},
 		{"restore-target-wal", no_argument, NULL, 'c'},
 		{"dry-run", no_argument, NULL, 'n'},
 		{"no-sync", no_argument, NULL, 'N'},
 		{"progress", no_argument, NULL, 'P'},
-		{"debug", no_argument, NULL, 3},
 		{NULL, 0, NULL, 0}
 	};
 	int			option_index;
@@ -205,6 +209,10 @@ main(int argc, char **argv)
 			case 4:
 no_ensure_shutdown = true;
 break;
+
+			case 5:
+config_file = pg_strdup(optarg);
+break;
 		}
 	}
 
@@ -1061,6 +1069,13 @@ getRestoreCommand(const char *argv0)
 	appendPQExpBufferStr(postgres_cmd, " -D ");
 	appendShellString(postgres_cmd, datadir_target);
 
+	/* add --config_file switch only if requested */
+	if (config_file != NULL)
+	{
+		appendPQExpBufferStr(postgres_cmd, " --config_file=");
+		appendShellString(postgres_cmd, config_file);
+	}
+
 	/* add -C switch, for restore_command */
 	

Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

2022-03-10 Thread Ashutosh Sharma
Here are some review comments on the latest patch
(v11-0004-WAL-logged-CREATE-DATABASE.patch). I actually did the review
of the v10 patch but that applies for this latest version as well.

+   /* Now errors are fatal ... */
+   START_CRIT_SECTION();

Did you mean PANIC instead of FATAL?

==

+   (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+errmsg("invalid create strategy %s", strategy),
+errhint("Valid strategies are \"wal_log\", and \"file_copy\".")));
+   }


Should this be - "invalid createdb strategy" instead of "invalid
create strategy"?

==

+   /*
+* In case of ALTER DATABASE SET TABLESPACE we don't need to do
+* anything for the object which are not in the source db's default
+* tablespace.  The source and destination dboid will be same in
+* case of ALTER DATABASE SET TABLESPACE.
+*/
+   else if (src_dboid == dst_dboid)
+   continue;
+   else
+   dstrnode.spcNode = srcrnode.spcNode;


Is this change still required? Do we support the WAL_COPY strategy for
ALTER DATABASE?

==

+   /* Open the source and the destination relation at smgr level. */
+   src_smgr = smgropen(srcrnode, InvalidBackendId);
+   dst_smgr = smgropen(dstrnode, InvalidBackendId);
+
+   /* Copy relation storage from source to the destination. */
+   CreateAndCopyRelationData(src_smgr, dst_smgr, relinfo->relpersistence);

Do we need to do smgropen for destination relfilenode here? Aren't we
already doing that inside RelationCreateStorage?

==

+   use_wal = XLogIsNeeded() &&
+   (relpersistence == RELPERSISTENCE_PERMANENT || copying_initfork);
+
+   /* Get number of blocks in the source relation. */
+   nblocks = smgrnblocks(src, forkNum);

What if number of blocks in a source relation is ZERO? Should we check
for that and return immediately. We have already done smgrcreate.

==

+   /* We don't need to copy the shared objects to the target. */
+   if (classForm->reltablespace == GLOBALTABLESPACE_OID)
+   return NULL;
+
+   /*
+* If the object doesn't have the storage then nothing to be
+* done for that object so just ignore it.
+*/
+   if (!RELKIND_HAS_STORAGE(classForm->relkind))
+   return NULL;

We can probably club together above two if-checks.

==

+  
+   strategy
+   
+
+ This is used for copying the database directory.  Currently, we have
+ two strategies the WAL_LOG and the
+ FILE_COPY.  If WAL_LOG strategy
+ is used then the database will be copied block by block and it will
+ also WAL log each copied block.  Otherwise, if FILE_COPY

I think we need to mention the default strategy in the documentation page.

--
With Regards,
Ashutosh Sharma.

On Thu, Mar 10, 2022 at 4:32 PM Dilip Kumar  wrote:
>
> On Wed, Mar 9, 2022 at 6:44 PM Robert Haas  wrote:
> >
> > > Right, infact now also if you see the logic, the
> > > write_relmap_file_internal() is taking care of the actual update and
> > > the write_relmap_file() is doing update + setting the global
> > > variables.  So yeah we can rename as you suggested in 0001 and here
> > > also we can change the logic as you suggested that the recovery and
> > > createdb will only call the first function which is just doing the
> > > update.
> >
> > But I think we want the path construction to be managed by the
> > function rather than the caller, too.
>
> I have completely changed the logic for this refactoring.  Basically,
> write_relmap_file(), is already having parameters to control whether
> to write wal, send inval and we are already passing the dbpath.
> Instead of making a new function, I just pass one additional parameter
> to this function itself indicating whether we are creating a new map or not,
> and I think with that the changes are much smaller and this looks cleaner to
> me.  Similarly, for load_relmap_file() I just had to pass the
> dbpath and memory for the destination map.  Please have a look and let me
> know your thoughts.
>
> > Sure, I guess. It's just not obvious why the argument would actually
> > need to be a function that copies storage, or why there's more than
> > one way to copy storage. I'd rather keep all the code paths unified,
> > if we can, and set behavior via flags or something, maybe. I'm not
> > sure whether that's realistic, though.
>
> I tried considering that; I think it doesn't look good to make it flag
> based.  One of the main problems I noticed is that now for copying
> either we need to call RelationCopyStorage() or
> RelationCopyStorageUsingBuffer() based on the input flag.  But if we
> move the main copy function to the storage.c then the storage.c will
> have dependency on bufmgr functions 

Re: Lowering the ever-growing heap->pd_lower

2022-03-10 Thread Matthias van de Meent
On Wed, 16 Feb 2022 at 21:14, Matthias van de Meent
 wrote:
>
> On Wed, 16 Feb 2022 at 20:54, Peter Geoghegan  wrote:
> >
> > On Tue, Feb 15, 2022 at 10:48 AM Matthias van de Meent
> >  wrote:
> > > Peter Geoghegan asked for good arguments for the two changes
> > > implemented. Below are my arguments detailed, with adversarial loads
> > > that show the problematic behaviour of the line pointer array that is
> > > fixed with the patch.
> >
> > Why is it okay that lazy_scan_prune() still calls
> > PageGetMaxOffsetNumber() once for the page, before it ever calls
> > heap_page_prune()? Won't lazy_scan_prune() need to reestablish maxoff
> > now, if only so that its scan-page-items loop doesn't get confused
> > when it goes on to read "former line pointers"? This is certainly
> > possible with the CLOBBER_FREED_MEMORY stuff in place (which will
> > memset the truncated line pointer space with a 0x7F7F7F7F pattern).
>
> Good catch, it is not. Attached a version that re-establishes maxoff
> after each prune operation.

I double-checked the changes, and to me it seems like that was the
only place in the code where PageGetMaxOffsetNumber was not handled
correctly. This was fixed in the latest patch (v8).
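The hazard being fixed can be sketched generically (a toy Python analogy with a plain list, not the actual page code): a count cached before pruning points past the truncated array, so any loop still using it reads "former line pointers", and the count must be re-established after pruning.

```python
# Toy analogy of the stale-maxoff hazard: pruning may shrink the item array,
# so a length cached before pruning is no longer a valid upper bound.
def prune(items):
    # "Prune" dead items from the end, shrinking the array, analogous to
    # truncating the line pointer array and lowering pd_lower.
    while items and items[-1] is None:
        items.pop()

page = ["tuple1", "tuple2", None, None]  # two dead items at the end
maxoff = len(page)                        # cached before pruning: 4

prune(page)

# Wrong: iterating with the stale maxoff reads past the truncated array.
try:
    for off in range(maxoff):
        _ = page[off]
except IndexError:
    print("stale maxoff read past the truncated array")

maxoff = len(page)                        # re-establish maxoff after pruning
for off in range(maxoff):                 # now safe
    assert page[off] is not None
```

In the C code the failure mode is nastier than an IndexError: with CLOBBER_FREED_MEMORY the truncated line pointer space is memset to a 0x7F pattern, so a stale maxoff makes the scan loop interpret garbage as line pointers.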

Peter, would you have time to further review this patch and/or commit it?

- Matthias




Re: Changing "Hot Standby" to "hot standby"

2022-03-10 Thread Daniel Westermann (DWE)
>>>Hmm.  Outside the title that had better use upper-case characters for
>>>the first letter of each word, I can see references to the pattern you
>>>are trying to eliminate in amcheck.sgml (1), config.sgml (3),
>>>protocol.sgml (3) and mvcc.sgml (1).  Shouldn't you refresh these as
>>>well if the point is to make the full set of docs consistent?

>>>As of the full tree, I can see that:
>>>
>>$ git grep "hot standby" | wc -l
>>259

>>$ git grep "Hot Standby" | wc -l
>>73

>>>So there is a trend for one of the two.

>>Thanks for looking at it. Yes, I am aware there are other places which would
>>need to be changed and I think I mentioned that in an earlier Email. Are
>>you suggesting to change all at once? I wanted to start with the
>>documentation and then continue with the other places.

>Attached a new version which also modifies amcheck.sgml, config.sgml, 
>protocol.sgml, and mvcc.sgml accordingly.

Regards
Daniel


From: Daniel Westermann (DWE) 
Sent: Wednesday, March 9, 2022 15:15
To: Michael Paquier 
Cc: Robert Treat ; Kyotaro Horiguchi 
; aleksan...@timescale.com ; 
pgsql-hackers@lists.postgresql.org 
Subject: Re: Changing "Hot Standby" to "hot standby" 
 

Sending this again as my last two mails did not seem to reach the archives or 
the commitfest. Or do they need moderation somehow?

Regards
Daniel

diff --git a/doc/src/sgml/amcheck.sgml b/doc/src/sgml/amcheck.sgml
index 11d1eb5af2..5d61a33936 100644
--- a/doc/src/sgml/amcheck.sgml
+++ b/doc/src/sgml/amcheck.sgml
@@ -174,7 +174,7 @@ ORDER BY c.relpages DESC LIMIT 10;
   hypothetically, undiscovered bugs in the underlying B-Tree index
   access method code.  Note that
   bt_index_parent_check cannot be used when
-  Hot Standby mode is enabled (i.e., on read-only physical
+  hot standby mode is enabled (i.e., on read-only physical
   replicas), unlike bt_index_check.
  
 
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 7ed8c82a9d..e2db2a789f 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4550,7 +4550,7 @@ ANY num_sync ( .
@@ -4582,7 +4582,7 @@ ANY num_sync ( .
@@ -10887,7 +10887,7 @@ dynamic_library_path = 'C:\tools\postgresql;H:\my_project\lib;$libdir'
 Enables logging of recovery-related debugging output that otherwise
 would not be logged. This parameter allows the user to override the
 normal setting of , but only for
-specific messages. This is intended for use in debugging Hot Standby.
+specific messages. This is intended for use in debugging hot standby.
 Valid values are DEBUG5, DEBUG4,
 DEBUG3, DEBUG2, DEBUG1, and
 LOG.  The default, LOG, does not affect
diff --git a/doc/src/sgml/high-availability.sgml b/doc/src/sgml/high-availability.sgml
index b5b6042104..81fa26f985 100644
--- a/doc/src/sgml/high-availability.sgml
+++ b/doc/src/sgml/high-availability.sgml
@@ -548,8 +548,8 @@ protocol to make nodes agree on a serializable transactional order.
rollforward will take considerably longer, so that technique only
offers a solution for disaster recovery, not high availability.
A standby server can also be used for read-only queries, in which case
-   it is called a Hot Standby server. See  for
-   more information.
+   it is called a hot standby server. See 
+for more information.
   
 
   
@@ -1032,7 +1032,7 @@ primary_slot_name = 'node_a_slot'

 

-Hot Standby feedback propagates upstream, whatever the cascaded arrangement.
+Hot standby feedback propagates upstream, whatever the cascaded arrangement.

 

@@ -1499,16 +1499,16 @@ synchronous_standby_names = 'ANY 2 (s1, s2, s3)'
   Hot Standby
 
   
-   Hot Standby
+   hot standby
   
 

-Hot Standby is the term used to describe the ability to connect to
+Hot standby is the term used to describe the ability to connect to
 the server and run read-only queries while the server is in archive
 recovery or standby mode. This
 is useful both for replication purposes and for restoring a backup
   

Re: PATCH: add "--config-file=" option to pg_rewind

2022-03-10 Thread Michael Banck
Heya,

some minor comments, I didn't look at the added test and I did not test
the patch at all, but (as part of the Debian/Ubuntu packaging team) I
think this patch is really important:

On Tue, Mar 01, 2022 at 12:35:27PM +0100, Gunnar "Nick" Bluth wrote:
> diff --git a/doc/src/sgml/ref/pg_rewind.sgml b/doc/src/sgml/ref/pg_rewind.sgml
> index 33e6bb64ad..d0278e9b46 100644
> --- a/doc/src/sgml/ref/pg_rewind.sgml
> +++ b/doc/src/sgml/ref/pg_rewind.sgml
> @@ -241,6 +241,18 @@ PostgreSQL documentation
>
>   
>  
> + 
> +  --config-file= class="parameter">filepath
> +  
> +   
> +When using the -c / --restore-target-wal option, 
> the restore_command
> +is extracted from the configuration of the target cluster. If the 
> postgresql.conf
> +of that cluster is not in the target pgdata directory (or if you 
> want to use an alternative config),

I think we usually just say "data directory"; pgdata was previously only
used as part of the option name, not like this in the text.

> diff --git a/src/bin/pg_rewind/pg_rewind.c b/src/bin/pg_rewind/pg_rewind.c
> index b39b5c1aac..294353a9d5 100644
> --- a/src/bin/pg_rewind/pg_rewind.c
> +++ b/src/bin/pg_rewind/pg_rewind.c
> @@ -61,6 +61,7 @@ char   *datadir_target = NULL;
>  char*datadir_source = NULL;
>  char*connstr_source = NULL;
>  char*restore_command = NULL;
> +char*config_file = NULL;
>  
>  static bool debug = false;
>  bool showprogress = false;
> @@ -87,6 +88,7 @@ usage(const char *progname)
>   printf(_("Options:\n"));
>   printf(_("  -c, --restore-target-wal   use restore_command in 
> target configuration to\n"
>" retrieve WAL files 
> from archives\n"));
> + printf(_("  --config-file=FILE path to postgresql.conf if 
> outside target-pgdata (for -c)\n"));

>   printf(_("  -D, --target-pgdata=DIRECTORY  existing data directory to 
> modify\n"));
>   printf(_("  --source-pgdata=DIRECTORY  source data directory to 
> synchronize with\n"));
>   printf(_("  --source-server=CONNSTRsource server to synchronize 
> with\n"));

target-pgdata is the name of the option, but maybe it is useful to spell
out "target data directory" here, even if that means we spill over to
two lines (which we might have to do anyway, that new line is pretty
long).

> @@ -121,6 +123,7 @@ main(int argc, char **argv)
>   {"no-sync", no_argument, NULL, 'N'},
>   {"progress", no_argument, NULL, 'P'},
>   {"debug", no_argument, NULL, 3},
> + {"config-file", required_argument, NULL, 5},

Not sure what our policy is, but the way the numbered options are 1, 2,
4, 3, 5 after this is weird; this was already off before this patch
though.

>   {NULL, 0, NULL, 0}
>   };
>   int option_index;
> @@ -205,6 +208,10 @@ main(int argc, char **argv)
>   case 4:
>   no_ensure_shutdown = true;
>   break;
> +
> + case 5:
> + config_file = pg_strdup(optarg);
> + break;
>   }
>   }
>  
> @@ -1060,7 +1067,11 @@ getRestoreCommand(const char *argv0)
>   /* add -D switch, with properly quoted data directory */
>   appendPQExpBufferStr(postgres_cmd, " -D ");
>   appendShellString(postgres_cmd, datadir_target);
> -
> + if (config_file != NULL)
> + {
> + appendPQExpBufferStr(postgres_cmd, " --config_file=");
> + appendShellString(postgres_cmd, config_file);
> + }
>   /* add -C switch, for restore_command */

You removed an empty line here. Maybe rather prepend this with a comment
on what's going on, and have en empty line before and afterwards.

>   appendPQExpBufferStr(postgres_cmd, " -C restore_command");
>  
> @@ -1138,6 +1149,11 @@ ensureCleanShutdown(const char *argv0)
>   /* add set of options with properly quoted data directory */
>   appendPQExpBufferStr(postgres_cmd, " --single -F -D ");
>   appendShellString(postgres_cmd, datadir_target);
> + if (config_file != NULL)
> + {
> + appendPQExpBufferStr(postgres_cmd, " --config_file=");
> + appendShellString(postgres_cmd, config_file);
> + }
>  
>   /* finish with the database name, and a properly quoted redirection */

Kinda same here.


Cheers,

Michael

-- 
Michael Banck
Teamleiter PostgreSQL-Team
Projektleiter
Tel.: +49 2166 9901-171
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Geoff Richardson, Peter Lilley

Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz




Re: ltree_gist indexes broken after pg_upgrade from 12 to 13

2022-03-10 Thread Tomas Vondra
On 3/10/22 08:09, Alexander Korotkov wrote:
> On Tue, Mar 8, 2022 at 2:05 AM Alexander Korotkov  
> wrote:
>> ...
>>
>> Good.  The revised patch is attached.  Instead of adding argument to
>> LTREE_GET_ASIGLEN(), it introduces separate LTREE_GET_SIGLEN() and
>> LTREE_GET_ASIGLEN() macros.
> 
> No feedback yet.  I'm going to push this if no objections.
> 

WFM. I certainly don't have a better idea how to fix this.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Skipping logical replication transactions on subscriber side

2022-03-10 Thread Amit Kapila
On Tue, Mar 1, 2022 at 8:31 PM Masahiko Sawada  wrote:
>
> I've attached an updated patch along with two patches for cfbot tests
> since the main patch (0003) depends on the other two patches. Both
> 0001 and 0002 patches are the same ones I attached on another
> thread[2].
>

Few comments on 0003:
=
1.
+ 
+  
+   subskiplsn pg_lsn
+  
+  
+   Commit LSN of the transaction whose changes are to be skipped,
if a valid
+   LSN; otherwise 0/0.
+  
+ 

Can't this also be a prepare LSN or a rollback-prepared LSN? Can we say
Finish/End LSN and then add some details about which LSNs can appear here?

2. The conflict resolution explanation needs an update after the
latest commits and we should probably change the commit LSN
terminology as mentioned in the previous point.

3. The text in alter_subscription.sgml looks a bit repetitive to me
(similar to what we have in logical-replication.sgml related to
conflicts). Here also we refer to only commit LSN which needs to be
changed as mentioned in the previous two points.

4.
if (strcmp(lsn_str, "none") == 0)
+ {
+ /* Setting lsn = NONE is treated as resetting LSN */
+ lsn = InvalidXLogRecPtr;
+ }
+ else
+ {
+ /* Parse the argument as LSN */
+ lsn = DatumGetTransactionId(DirectFunctionCall1(pg_lsn_in,
+ CStringGetDatum(lsn_str)));
+
+ if (XLogRecPtrIsInvalid(lsn))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid WAL location (LSN): %s", lsn_str)));

Is there a reason that we don't want to allow setting 0
(InvalidXLogRecPtr) for skip LSN?
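For context, the subscriber-side usage being reviewed looks roughly like this (syntax as proposed in the patch; the subscription name and LSN value are hypothetical):

```sql
-- Skip the failed remote transaction that finishes at this LSN
ALTER SUBSCRIPTION mysub SKIP (lsn = '0/14C0378');

-- Reset, e.g. after the transaction has been skipped
ALTER SUBSCRIPTION mysub SKIP (lsn = NONE);
```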

5.
+# The subscriber will enter an infinite error loop, so we don't want
+# to overflow the server log with error messages.
+$node_subscriber->append_conf(
+ 'postgresql.conf',
+ qq[
+wal_retrieve_retry_interval = 2s
+]);

Can we change this test to use the disable_on_error feature? I am
thinking that if the disable_on_error feature gets committed first,
maybe we can have one test file covering both this and the
disable_on_error feature (something like conflicts.pl).

-- 
With Regards,
Amit Kapila.




Re: Optimize external TOAST storage

2022-03-10 Thread Dilip Kumar
On Thu, Mar 10, 2022 at 2:04 PM davinder singh
 wrote:
>
> Thanks Dilip, I have fixed your comments, please find the updated patch.
>

Some more comments

+/*
+ * For those cases where storing compressed data is not optimal,
We will use
+ * this pointer copy for referring uncompressed data.
+ */
+memcpy(toast_attr_copy, toast_attr, sizeof(toast_attr));
+memcpy(toast_values_copy, toast_values, sizeof(toast_values));

I think we can change the comments like below

"Preserve references to the original uncompressed data before
attempting the compression, so that during externalization, if we
decide not to store the compressed data, we still have the original
data.  For more details refer to the comments atop
toast_tuple_opt_externalize".

+/*
+ * Don't save compressed data to external storage unless it saves I/O while
+ * accessing the same data by reducing the number of chunks.
+ */
+void
+toast_tuple_opt_externalize(ToastTupleContext *ttc, int attribute, int options,
+Datum orig_toast_value, ToastAttrInfo *orig_attr)

I think these comments are not explaining what the problem with
storing the compressed data is.  So you need to expand this comment,
saying that if it does not reduce the number of chunks then there is
no point in storing the compressed data, because then we will pay an
additional decompression cost whenever we fetch that data.


+
+/* Sanity check: if data is not compressed then we can proceed as usual. */
+if (!VARATT_IS_COMPRESSED(DatumGetPointer(*value)))
+toast_tuple_externalize(ttc, attribute, options);

I think we don't need "Sanity check:" here, the remaining part of the
comment is fine.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: logical decoding and replication of sequences

2022-03-10 Thread Peter Eisentraut

On 23.02.22 00:24, Tomas Vondra wrote:

Here's an updated version of the patch, fixing most of the issues from
reviews so far. There's still a couple FIXME comments, but I think those
are minor and/or straightforward to deal with.


This patch needs a rebase because of a conflict in 
expected/publication.out.  In addition, see the attached fixup patch to 
get the pg_dump tests passing (and some other stuff).


028_sequences.pl should be renamed to 029, since there is now another 028.

In psql, the output of \dRp and \dRp+ is inconsistent.  The former shows

All tables | All sequences | Inserts | Updates | Deletes | Truncates | 
Sequences | Via root


the latter shows

All tables | All sequences | Inserts | Updates | Deletes | Sequences | 
Truncates | Via root


I think the first order is the best one.

That's all for now, I'll come back with more reviewing later.From 43b0aa50d621f3b640e4285e4e329c361988a6c4 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Thu, 10 Mar 2022 12:02:56 +0100
Subject: [PATCH] fixup! Add support for decoding sequences to built-in
 replication

---
 src/bin/pg_dump/t/002_pg_dump.pl | 2 +-
 src/interfaces/libpq/fe-exec.c   | 9 -
 src/test/subscription/t/028_sequences.pl | 5 ++---
 3 files changed, 3 insertions(+), 13 deletions(-)

diff --git a/src/bin/pg_dump/t/002_pg_dump.pl b/src/bin/pg_dump/t/002_pg_dump.pl
index 14b89c62f7..1bcd845254 100644
--- a/src/bin/pg_dump/t/002_pg_dump.pl
+++ b/src/bin/pg_dump/t/002_pg_dump.pl
@@ -2398,7 +2398,7 @@
create_order => 50,
create_sql   => 'CREATE PUBLICATION pub4;',
regexp => qr/^
-   \QCREATE PUBLICATION pub4 WITH (publish = 'insert, 
update, delete, truncate');\E
+   \QCREATE PUBLICATION pub4 WITH (publish = 'insert, 
update, delete, truncate, sequence');\E
/xm,
like => { %full_runs, section_post_data => 1, },
},
diff --git a/src/interfaces/libpq/fe-exec.c b/src/interfaces/libpq/fe-exec.c
index 2b81065c47..0c39bc9abf 100644
--- a/src/interfaces/libpq/fe-exec.c
+++ b/src/interfaces/libpq/fe-exec.c
@@ -3733,10 +3733,7 @@ char *
 PQgetvalue(const PGresult *res, int tup_num, int field_num)
 {
if (!check_tuple_field_number(res, tup_num, field_num))
-   {
-   printf("getvalue\n");
return NULL;
-   }
return res->tuples[tup_num][field_num].value;
 }
 
@@ -3747,10 +3744,7 @@ int
 PQgetlength(const PGresult *res, int tup_num, int field_num)
 {
if (!check_tuple_field_number(res, tup_num, field_num))
-   {
-   printf("getlength\n");
return 0;
-   }
if (res->tuples[tup_num][field_num].len != NULL_LEN)
return res->tuples[tup_num][field_num].len;
else
@@ -3764,10 +3758,7 @@ int
 PQgetisnull(const PGresult *res, int tup_num, int field_num)
 {
if (!check_tuple_field_number(res, tup_num, field_num))
-   {
-   printf("getisnull\n");
return 1;   /* pretend it is null */
-   }
if (res->tuples[tup_num][field_num].len == NULL_LEN)
return 1;
else
diff --git a/src/test/subscription/t/028_sequences.pl 
b/src/test/subscription/t/028_sequences.pl
index 3843440946..cdd7f7f344 100644
--- a/src/test/subscription/t/028_sequences.pl
+++ b/src/test/subscription/t/028_sequences.pl
@@ -6,7 +6,7 @@
 use warnings;
 use PostgreSQL::Test::Cluster;
 use PostgreSQL::Test::Utils;
-use Test::More tests => 6;
+use Test::More;
 
 # Initialize publisher node
 my $node_publisher = PostgreSQL::Test::Cluster->new('publisher');
@@ -199,5 +199,4 @@
'check replicated sequence values on subscriber');
 
 
-$node_subscriber->stop('fast');
-$node_publisher->stop('fast');
+done_testing();
-- 
2.35.1



Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

2022-03-10 Thread Dilip Kumar
On Wed, Mar 9, 2022 at 6:44 PM Robert Haas  wrote:
>
> > Right, infact now also if you see the logic, the
> > write_relmap_file_internal() is taking care of the actual update and
> > the write_relmap_file() is doing update + setting the global
> > variables.  So yeah we can rename as you suggested in 0001 and here
> > also we can change the logic as you suggested that the recovery and
> > createdb will only call the first function which is just doing the
> > update.
>
> But I think we want the path construction to be managed by the
> function rather than the caller, too.

I have completely changed the logic for this refactoring.  Basically,
write_relmap_file() already has parameters to control whether to write
WAL and send invalidations, and we already pass the dbpath.  Instead
of making a new function, I just pass one additional parameter to this
function itself, saying whether we are creating a new map or not; with
that the changes are quite small and this looks cleaner to me.
Similarly, for load_relmap_file() I just had to pass the dbpath and
the memory for the destination map.  Please have a look and let me
know your thoughts.

> Sure, I guess. It's just not obvious why the argument would actually
> need to be a function that copies storage, or why there's more than
> one way to copy storage. I'd rather keep all the code paths unified,
> if we can, and set behavior via flags or something, maybe. I'm not
> sure whether that's realistic, though.

I tried considering that, but I don't think it looks good to make it
flag-based.  One of the main problems I noticed is that for copying we
now need to call either RelationCopyStorage() or
RelationCopyStorageUsingBuffer() based on the input flag.  But if we
move the main copy function to storage.c, then storage.c will have a
dependency on bufmgr functions, because I don't think we can keep
RelationCopyStorageUsingBuffer() inside storage.c.  So for now, I have
duplicated the loop which is already there in index_copy_data() and
heapam_relation_copy_data(), kept that in bufmgr.c, and also moved
RelationCopyStorageUsingBuffer() into bufmgr.c.  I think bufmgr.c
already has functions dealing with smgr things, so it feels like the
right place for this function.

Other changes:
1. 0001 and 0002 are merged: since we are not really refactoring these
functions, just passing additional arguments to them, it makes sense
to combine the changes.
2. Same with 0003: since we are no longer refactoring existing
functions but providing new interfaces, I merged it with 0004 (which
was 0006 previously).

I think we should also write test cases for the create database
strategy.  But I do not see any test case for create database that
tests the existing options.  So I am wondering whether we should add a
test case only for the new option we are providing, or create a
separate patch which tests the new option as well as the existing
options.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
From a479f7057649e2c6ef332ff313e9291089e193e0 Mon Sep 17 00:00:00 2001
From: Dilip Kumar 
Date: Thu, 10 Mar 2022 10:18:18 +0530
Subject: [PATCH v11 1/4] Extend relmap interfaces

Support new interfaces in the relmapper: 1) copying the relmap file
from one database path to another database path, and 2) another
interface for getting the filenode from an OID.  We already have
RelationMapOidToFilenode for the same purpose, but it assumes we are
connected to the database for which we want the mapping.  The new
interface does the same, but instead gets the mapping for the given
input database.

These interfaces are required by the next patch, which adds support
for WAL-logged CREATE DATABASE.
---
 src/backend/utils/cache/relmapper.c | 159 
 src/include/utils/relmapper.h   |   7 +-
 2 files changed, 132 insertions(+), 34 deletions(-)

diff --git a/src/backend/utils/cache/relmapper.c b/src/backend/utils/cache/relmapper.c
index 4f6811f..6501110 100644
--- a/src/backend/utils/cache/relmapper.c
+++ b/src/backend/utils/cache/relmapper.c
@@ -136,10 +136,12 @@ static void apply_map_update(RelMapFile *map, Oid relationId, Oid fileNode,
 			 bool add_okay);
 static void merge_map_updates(RelMapFile *map, const RelMapFile *updates,
 			  bool add_okay);
-static void load_relmap_file(bool shared, bool lock_held);
+static void load_relmap_file(bool shared, bool lock_held, RelMapFile *dstmap,
+			 const char *dbpath);
 static void write_relmap_file(bool shared, RelMapFile *newmap,
 			  bool write_wal, bool send_sinval, bool preserve_files,
-			  Oid dbid, Oid tsid, const char *dbpath);
+			  Oid dbid, Oid tsid, const char *dbpath,
+			  bool update_relmap);
 static void perform_relmap_update(bool shared, const RelMapFile *updates);
 
 
@@ -250,6 +252,32 @@ RelationMapFilenodeToOid(Oid filenode, bool shared)
 }
 
 /*
+ * RelationMapOidToFilenodeForDatabase
+ *
+ * Same as 

Re: ICU for global collation

2022-03-10 Thread Julien Rouhaud
On Thu, Mar 10, 2022 at 10:52:41AM +0100, Peter Eisentraut wrote:
> On 05.03.22 09:38, Julien Rouhaud wrote:
> > @@ -168,18 +175,6 @@ DefineCollation(ParseState *pstate, List *names, List 
> > *parameters, bool if_not_e
> >   errmsg("collation \"default\" cannot be copied")));
> >  }
> > 
> > -   if (localeEl)
> > -   {
> > -   collcollate = defGetString(localeEl);
> > -   collctype = defGetString(localeEl);
> > -   }
> > [...]
> > 
> > I tried to read the function and quickly got confused about whether possible
> > problematic conditions could be reached or not and had protection or not.  I
> > think that DefineCollation is becoming more and more unreadable, with a mix 
> > of
> > fromEl, !fromEl and either.  Maybe it would be a good time to refactor the 
> > code
> > a bit and have something like
> > 
> > if (fromEl)
> > {
> >  // initialize all specific things
> > }
> > else
> > {
> >  // initialize all specific things
> > }
> > 
> > // handle defaults and sanity checks
> > 
> > What do you think?
> 
> How about the attached?

Thanks!  That's exactly what I had in mind.  I think it's easier to follow and
make sure of what it's doing exactly, so +1 for it.




Re: ICU for global collation

2022-03-10 Thread Peter Eisentraut

On 05.03.22 09:38, Julien Rouhaud wrote:

@@ -168,18 +175,6 @@ DefineCollation(ParseState *pstate, List *names, List 
*parameters, bool if_not_e
  errmsg("collation \"default\" cannot be copied")));
 }

-   if (localeEl)
-   {
-   collcollate = defGetString(localeEl);
-   collctype = defGetString(localeEl);
-   }
[...]

I tried to read the function and quickly got confused about whether possible
problematic conditions could be reached or not and had protection or not.  I
think that DefineCollation is becoming more and more unreadable, with a mix of
fromEl, !fromEl and either.  Maybe it would be a good time to refactor the code
a bit and have have something like

if (fromEl)
{
 // initialize all specific things
}
else
{
 // initialize all specific things
}

// handle defaults and sanity checks

What do you think?


How about the attached?
From c7b408ba646c5ee5f8c6ae84bec04ca4059ed08f Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Thu, 10 Mar 2022 10:49:49 +0100
Subject: [PATCH] DefineCollation() code cleanup

Reorganize the code in DefineCollation() so that the parts using the
FROM clause and the parts not doing so are more cleanly separated.  No
functionality change intended.

Reported-by: Julien Rouhaud 
---
 src/backend/commands/collationcmds.c | 109 ++-
 1 file changed, 57 insertions(+), 52 deletions(-)

diff --git a/src/backend/commands/collationcmds.c 
b/src/backend/commands/collationcmds.c
index 12fc2316f9..93df1d366c 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -63,12 +63,11 @@ DefineCollation(ParseState *pstate, List *names, List 
*parameters, bool if_not_e
DefElem*providerEl = NULL;
DefElem*deterministicEl = NULL;
DefElem*versionEl = NULL;
-   char   *collcollate = NULL;
-   char   *collctype = NULL;
-   char   *collproviderstr = NULL;
-   boolcollisdeterministic = true;
-   int collencoding = 0;
-   charcollprovider = 0;
+   char   *collcollate;
+   char   *collctype;
+   boolcollisdeterministic;
+   int collencoding;
+   charcollprovider;
char   *collversion = NULL;
Oid newoid;
ObjectAddress address;
@@ -167,65 +166,71 @@ DefineCollation(ParseState *pstate, List *names, List 
*parameters, bool if_not_e

(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
 errmsg("collation \"default\" cannot 
be copied")));
}
-
-   if (localeEl)
+   else
{
-   collcollate = defGetString(localeEl);
-   collctype = defGetString(localeEl);
-   }
+   char   *collproviderstr = NULL;
 
-   if (lccollateEl)
-   collcollate = defGetString(lccollateEl);
+   collcollate = NULL;
+   collctype = NULL;
 
-   if (lcctypeEl)
-   collctype = defGetString(lcctypeEl);
+   if (localeEl)
+   {
+   collcollate = defGetString(localeEl);
+   collctype = defGetString(localeEl);
+   }
 
-   if (providerEl)
-   collproviderstr = defGetString(providerEl);
+   if (lccollateEl)
+   collcollate = defGetString(lccollateEl);
 
-   if (deterministicEl)
-   collisdeterministic = defGetBoolean(deterministicEl);
+   if (lcctypeEl)
+   collctype = defGetString(lcctypeEl);
 
-   if (versionEl)
-   collversion = defGetString(versionEl);
+   if (providerEl)
+   collproviderstr = defGetString(providerEl);
 
-   if (collproviderstr)
-   {
-   if (pg_strcasecmp(collproviderstr, "icu") == 0)
-   collprovider = COLLPROVIDER_ICU;
-   else if (pg_strcasecmp(collproviderstr, "libc") == 0)
-   collprovider = COLLPROVIDER_LIBC;
+   if (deterministicEl)
+   collisdeterministic = defGetBoolean(deterministicEl);
+   else
+   collisdeterministic = true;
+
+   if (versionEl)
+   collversion = defGetString(versionEl);
+
+   if (collproviderstr)
+   {
+   if (pg_strcasecmp(collproviderstr, "icu") == 0)
+   collprovider = COLLPROVIDER_ICU;
+   else if (pg_strcasecmp(collproviderstr, "libc") == 0)
+   collprovider = COLLPROVIDER_LIBC;
+   else
+   ereport(ERROR,
+   
(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+   

Re: Pluggable toaster

2022-03-10 Thread Nikita Malakhov
Hi Hackers,

In addition to the original patch set for Pluggable Toaster, we have two
more patches (actually small, but important fixes), authored by Nikita
Glukhov:

1) 0001-Fix-toast_tuple_externalize.patch
This patch fixes the freeing of memory when the new toasted value is the
same as the old one: freeing it there seems incorrect, so in this case
the function now just returns instead of freeing the old value;

2) 0002-Fix-alignment-of-custom-TOAST-pointers.patch
This patch adds data alignment for the new varatt_custom data structure
when building tuples, since varatt_custom must be aligned for custom
toasters (in particular, this fix is very important for the JSONb
Toaster).

These patches must be applied on top of the latest
4_bytea_appendable_toaster_v1.patch.
Please consider them in reviewing Pluggable Toaster.

Regards.


On Wed, Feb 2, 2022 at 10:35 AM Teodor Sigaev  wrote:

> > I agree ... but I'm also worried about what happens when we have
> > multiple table AMs. One can imagine a new table AM that is
> > specifically optimized for TOAST which can be used with an existing
> > heap table. One can imagine a new table AM for the main table that
> > wants to use something different for TOAST. So, I don't think it's
> > right to imagine that the choice of TOASTer depends solely on the
> > column data type. I'm not really sure how this should work exactly ...
> > but it needs careful thought.
>
> Right. that's why we propose a validate method  (may be, it's a wrong
> name, but I don't known better one) which accepts several arguments, one
> of which is table AM oid. If that method returns false then toaster
> isn't useful with current TAM, storage or/and compression kinds, etc.
>
> --
> Teodor Sigaev  E-mail: teo...@sigaev.ru
>WWW: http://www.sigaev.ru/
>
>
>
--
Nikita Malakhov
Postgres Professional
https://postgrespro.ru/


0001-Fix-toast_tuple_externalize.patch
Description: Binary data


0002-Fix-alignment-of-custom-TOAST-pointers.patch
Description: Binary data


Re: pg_ls_tmpdir to show directories and shared filesets (and pg_ls_*)

2022-03-10 Thread Fabien COELHO



Hello Justin,

I hope to look at it over the week-end.

--
Fabien Coelho - CRI, MINES ParisTech




Re: Optimize external TOAST storage

2022-03-10 Thread davinder singh
Thanks Dilip, I have fixed your comments, please find the updated patch.

On Tue, Mar 8, 2022 at 9:44 PM Dilip Kumar  wrote:.

> +/* incompressible, ignore on subsequent compression passes. */
> +orig_attr->tai_colflags |= TOASTCOL_INCOMPRESSIBLE;
>
> Do we need to set TOASTCOL_INCOMPRESSIBLE while trying to externalize
> it, the comment say "ignore on subsequent compression passes"
> but after this will there be more compression passes?  If we need to
> set this TOASTCOL_INCOMPRESSIBLE then comment should explain this.
>
That was a mistake; this flag is not required at this point.  Once the
attribute is externalized it will be marked as TOASTCOL_IGNORE, and
such columns are not considered for compression, so I removed it.
Thanks for pointing it out.

-- 
Regards,
Davinder
EnterpriseDB: http://www.enterprisedb.com


v2_0001_optimize_external_toast_storage.patch
Description: Binary data

