Re: alter table set TABLE ACCESS METHOD

2021-06-03 Thread Michael Paquier
On Thu, Jun 03, 2021 at 02:36:15PM -0700, Jeff Davis wrote:
> Do we have general agreement on this point? Did I miss another purpose
> of detoasting in tablecmds.c, or can we just remove that part of the
> patch?

Catching up with this thread...  So, what you are suggesting here is
that we have no need to let ATRewriteTable() do anything about the
detoasting, and should just push the responsibility of detoasting the
tuple, if necessary, down to the AM layer where the tuple insertion is
handled, right?

In short, a table AM would receive, on a rewrite with ALTER TABLE,
tuples that may be toasted; still, table_insert_tuple() should be able
to handle both:
- the case where the tuple is still toasted.
- the case where the tuple has already been detoasted.

You are right that this would be more consistent with what heap does
with heap_prepare_insert().
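
For context, the rewrite under discussion is triggered by the patch's
proposed syntax, roughly like this (a sketch; "heap2" is a hypothetical
access method created from the stock heap handler):

  CREATE ACCESS METHOD heap2 TYPE TABLE HANDLER heap_tableam_handler;
  -- rewrites every tuple through the new AM's insertion path:
  ALTER TABLE big_table SET ACCESS METHOD heap2;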
--
Michael


signature.asc
Description: PGP signature


Re: Move pg_attribute.attcompression to earlier in struct for reduced size?

2021-06-03 Thread Michael Paquier
On Fri, Jun 04, 2021 at 08:54:48AM +0900, Michael Paquier wrote:
> I have done no recompression here, so I was just stressing the extra
> test for the recompression.  Sorry for the confusion.

I am not sure yet which way we are going, but cleaning up this code
involves a couple of things:
- Clean up the docs.
- Update one of the tests of compression.sql, with its alternate
output.
- Clean up of reform_and_rewrite_tuple() where the rewrite is done.

So that would give the attached.
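
To make the behavior change concrete, here is a minimal SQL sketch of what
the updated test expects (table and value are illustrative, not taken from
the patch):

  CREATE TABLE cmdata2 (f1 text COMPRESSION pglz);
  INSERT INTO cmdata2 VALUES (repeat('1234567890', 1000));
  ALTER TABLE cmdata2 ALTER COLUMN f1 SET COMPRESSION lz4;
  VACUUM FULL cmdata2;
  -- with the recompression logic removed, the old datum stays pglz:
  SELECT pg_column_compression(f1) FROM cmdata2;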
--
Michael
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index e2cd79ec54..1b8b640012 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -19,7 +19,6 @@
  */
 #include "postgres.h"
 
-#include "access/detoast.h"
 #include "access/genam.h"
 #include "access/heapam.h"
 #include "access/heaptoast.h"
@@ -27,7 +26,6 @@
 #include "access/rewriteheap.h"
 #include "access/syncscan.h"
 #include "access/tableam.h"
-#include "access/toast_compression.h"
 #include "access/tsmapi.h"
 #include "access/xact.h"
 #include "catalog/catalog.h"
@@ -2463,64 +2461,14 @@ reform_and_rewrite_tuple(HeapTuple tuple,
 	TupleDesc	newTupDesc = RelationGetDescr(NewHeap);
 	HeapTuple	copiedTuple;
 	int			i;
-	bool		values_free[MaxTupleAttributeNumber];
-
-	memset(values_free, 0, newTupDesc->natts * sizeof(bool));
 
 	heap_deform_tuple(tuple, oldTupDesc, values, isnull);
 
+	/* Be sure to null out any dropped columns */
 	for (i = 0; i < newTupDesc->natts; i++)
 	{
-		/* Be sure to null out any dropped columns */
 		if (TupleDescAttr(newTupDesc, i)->attisdropped)
 			isnull[i] = true;
-		else if (!isnull[i] && TupleDescAttr(newTupDesc, i)->attlen == -1)
-		{
-			/*
-			 * Use this opportunity to force recompression of any data that's
-			 * compressed with some TOAST compression method other than the
-			 * one configured for the column.  We don't actually need to
-			 * perform the compression here; we just need to decompress.  That
-			 * will trigger recompression later on.
-			 */
-			struct varlena *new_value;
-			ToastCompressionId cmid;
-			char		cmethod;
-			char		targetmethod;
-
-			new_value = (struct varlena *) DatumGetPointer(values[i]);
-			cmid = toast_get_compression_id(new_value);
-
-			/* nothing to be done for uncompressed data */
-			if (cmid == TOAST_INVALID_COMPRESSION_ID)
-				continue;
-
-			/* convert existing compression id to compression method */
-			switch (cmid)
-			{
-				case TOAST_PGLZ_COMPRESSION_ID:
-					cmethod = TOAST_PGLZ_COMPRESSION;
-					break;
-				case TOAST_LZ4_COMPRESSION_ID:
-					cmethod = TOAST_LZ4_COMPRESSION;
-					break;
-				default:
-					elog(ERROR, "invalid compression method id %d", cmid);
-					cmethod = '\0'; /* keep compiler quiet */
-			}
-
-			/* figure out what the target method is */
-			targetmethod = TupleDescAttr(newTupDesc, i)->attcompression;
-			if (!CompressionMethodIsValid(targetmethod))
-				targetmethod = default_toast_compression;
-
-			/* if compression method doesn't match then detoast the value */
-			if (targetmethod != cmethod)
-			{
-				values[i] = PointerGetDatum(detoast_attr(new_value));
-				values_free[i] = true;
-			}
-		}
 	}
 
 	copiedTuple = heap_form_tuple(newTupDesc, values, isnull);
@@ -2528,13 +2476,6 @@ reform_and_rewrite_tuple(HeapTuple tuple,
 	/* The heap rewrite module does the rest */
 	rewrite_heap_tuple(rwstate, tuple, copiedTuple);
 
-	/* Free any value detoasted previously */
-	for (i = 0; i < newTupDesc->natts; i++)
-	{
-		if (values_free[i])
-			pfree(DatumGetPointer(values[i]));
-	}
-
 	heap_freetuple(copiedTuple);
 }
 
diff --git a/src/test/regress/expected/compression.out b/src/test/regress/expected/compression.out
index 5c645e4650..4c997e2602 100644
--- a/src/test/regress/expected/compression.out
+++ b/src/test/regress/expected/compression.out
@@ -297,7 +297,7 @@ SELECT pg_column_compression(f1) FROM cmpart2;
  lz4
 (2 rows)
 
---vacuum full to recompress the data
+-- VACUUM FULL does not recompress
 SELECT pg_column_compression(f1) FROM cmdata;
  pg_column_compression 
 -----------------------
@@ -309,7 +309,7 @@ VACUUM FULL cmdata;
 SELECT pg_column_compression(f1) FROM cmdata;
  pg_column_compression 
 -----------------------
- lz4
+ pglz
  lz4
 (2 rows)
 
diff --git a/src/test/regress/expected/compression_1.out b/src/test/regress/expected/compression_1.out
index aac96037fc..15a23924ec 100644
--- a/src/test/regress/expected/compression_1.out
+++ b/src/test/regress/expected/compression_1.out
@@ -293,7 +293,7 @@ SELECT pg_column_compression(f1) FROM cmpart2;
 -----------------------
 (0 rows)
 
---vacuum full to recompress the data
+-- VACUUM FULL does not recompress
 SELECT pg_column_compression(f1) FROM cmdata;
  pg_column_compression 
 -----------------------
diff --git a/src/test/regress/sql/compression.sql b/src/test/regress/sql/compression.sql
index 35557c1f7d..86332dcc51 100644
--- a/src/test/regress/sql/compression.sql
+++ 

Re: Are we missing (void) when return value of fsm_set_and_search is ignored?

2021-06-03 Thread Julien Rouhaud
On Thu, Jun 03, 2021 at 02:57:42PM +0200, Peter Eisentraut wrote:
> On 03.06.21 12:54, Bharath Rupireddy wrote:
> > It looks like for some of the fsm_set_and_search calls whose return
> > value is ignored (in fsm_search and RecordPageWithFreeSpace), there's
> > no (void). Is it intentional? In the code base, we generally have
> > (void) when non-void return value of a function is ignored.
> 
> I don't think that is correct.  I don't see anyone writing
> 
> (void) printf(...);

We somehow do have a (void) fprintf(...) in src/port/getopt.c.

> so this is not a generally applicable strategy.
> 
> We have pg_nodiscard for functions where you explicitly want callers to
> check the return value.  In all other cases, callers are free to ignore
> return values.

Yes, but we have a lot of examples of functions without pg_nodiscard whose
callers still explicitly ignore the results, like fsm_vacuum_page() in the same
file.  It would be more consistent and would make the code slightly more
self-explanatory.




Re: [BUG]Update Toast data failure in logical replication

2021-06-03 Thread Dilip Kumar
On Fri, Jun 4, 2021 at 8:25 AM tanghy.f...@fujitsu.com wrote:
>
> On Thu, Jun 3, 2021 7:45 PM, Dilip Kumar wrote:
> >
> > I have fixed all the pending issues, I see there is already a test
> > case for this so I have changed the output for that.
>
> Thanks for your patch. I tested it for all branches (10, 11, 12, 13, HEAD) and all 
> of them passed. (This bug was introduced in PG 10.)
> I also tested the scenario where I found this bug; data could be synchronized 
> after your fix.

Thanks for verifying this.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Transactions involving multiple postgres foreign servers, take 2

2021-06-03 Thread Masahiko Sawada
On Thu, Jun 3, 2021 at 1:56 PM Masahiro Ikeda  wrote:
>
>
>
> On 2021/05/25 21:59, Masahiko Sawada wrote:
> > On Fri, May 21, 2021 at 5:48 PM Masahiro Ikeda  
> > wrote:
> >>
> >> On 2021/05/21 13:45, Masahiko Sawada wrote:
> >>>
> >>> Yes. We also might need to be careful about the order of foreign
> >>> transaction resolution. I think we need to resolve foreign transactions 
> >>> in arrival order at least within a foreign server.
> >>
> >> I agree it's better.
> >>
> >> (Although this is my interest...)
> >> Is it necessary? Although this idea seems to be for atomic visibility,
> >> 2PC can't realize that as you know. So, I wondered that.
> >
> > I think it's for fairness. If a foreign transaction that arrived earlier
> > gets put off so often for other foreign transactions that arrived later due
> > to its index in FdwXactCtl->xacts, it’s not understandable for users
> > and not fair. I think it’s better to handle foreign transactions in a
> > FIFO manner (although this problem exists even in the current code).
>
> OK, thanks.
>
>
> On 2021/05/21 12:45, Masahiro Ikeda wrote:
> > On 2021/05/21 10:39, Masahiko Sawada wrote:
> >> On Thu, May 20, 2021 at 1:26 PM Masahiro Ikeda 
> wrote:
> >> How much is the performance without those 2PC patches and with the
> >> same workload? i.e., how fast is the current postgres_fdw that uses
> >> XactCallback?
> >
> > OK, I'll test.
>
> The test results are as follows. But I couldn't confirm any performance
> improvement from the 2PC patches, though I may need to change the test 
> condition.
>
> [condition]
> * 1 coordinator and 3 foreign servers
> * There are two custom scripts which access two different foreign servers per
> transaction
>
> ``` fxact_select.pgbench
> BEGIN;
> SELECT * FROM part:p1 WHERE id = :id;
> SELECT * FROM part:p2 WHERE id = :id;
> COMMIT;
> ```
>
> ``` fxact_update.pgbench
> BEGIN;
> UPDATE part:p1 SET md5 = md5(clock_timestamp()::text) WHERE id = :id;
> UPDATE part:p2 SET md5 = md5(clock_timestamp()::text) WHERE id = :id;
> COMMIT;
> ```
>
> [results]
>
> I have tested three times.
> Performance difference seems to be within the range of errors.
>
> # 6d0eb38557 with 2pc patches(v36) and foreign_twophase_commit = disable
> - fxact_update.pgbench
> 72.3, 74.9, 77.5  TPS  => avg 74.9 TPS
> 110.5, 106.8, 103.2  ms => avg 106.8 ms
>
> - fxact_select.pgbench
> 1767.6, 1737.1, 1717.4 TPS  => avg 1740.7 TPS
> 4.5, 4.6, 4.7 ms => avg 4.6ms
>
> # 6d0eb38557 without 2pc patches
> - fxact_update.pgbench
> 76.5, 70.6, 69.5 TPS => avg 72.2 TPS
> 104.534, 113.244, 115.097 ms => avg 111.0 ms
>
> - fxact_select.pgbench
> 1810.2, 1748.3, 1737.2 TPS => avg 1765.2 TPS
> 4.2, 4.6, 4.6 ms => avg 4.5 ms
>

Thank you for testing!

I think the result shows that managing foreign transactions on the
core side would not be a problem in terms of performance.

>
> # About the bottleneck of the resolver process
>
> I investigated the performance bottleneck of the resolver process using perf.
> The main bottleneck is the following functions.
>
> 1st. 42.8% routine->CommitForeignTransaction()
> 2nd. 31.5% remove_fdwxact()
> 3rd. 10.16% CommitTransaction()
>
> The 1st and 3rd problems can be solved by parallelizing resolver processes per
> remote server. But I wondered whether the idea that backends also call
> "COMMIT/ABORT PREPARED" and the resolver process only takes charge of
> resolving in-doubt foreign transactions is better. In many cases, I think
> that the number of connections is much greater than the number of remote
> servers. If so, the parallelization is not enough.
>
> So, I think the idea where backends execute "COMMIT PREPARED" synchronously is
> better. Citus has the 2PC feature and backends send "COMMIT PREPARED" in
> the extension. So, this idea is not bad.

Thank you for pointing it out. This idea has been proposed several
times and there have been discussions. I'd like to summarize the proposed
ideas and their pros and cons before replying to your other comments.

There are 3 ideas. After the backend both prepares all foreign
transactions and commits the local transaction,
1. the backend continues attempting to commit all prepared foreign
transactions until all of them are committed.
2. the backend attempts to commit all prepared foreign transactions
once. If an error happens, leave them for the resolver.
3. the backend asks the resolver launched per foreign server to
commit the prepared foreign transactions (and the backend waits or doesn't
wait for the commit completion depending on the setting).
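
(For reference, a minimal SQL sketch of the two-phase commands a backend or
resolver would issue against each foreign server; the transaction identifier
is made up, and max_prepared_transactions must be non-zero on the server:)

  BEGIN;
  UPDATE part1 SET md5 = md5(clock_timestamp()::text) WHERE id = 1;
  PREPARE TRANSACTION 'fx_1234';   -- phase one, done by the backend
  -- later, possibly from another connection (the resolver):
  COMMIT PREPARED 'fx_1234';       -- phase two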

With ideas 1 and 2, since the backend itself commits all foreign
transactions, the resolver process cannot be a bottleneck, and probably
the code can get simpler as backends don't need to communicate
with resolver processes.

However, those have two problems we need to deal with:

First, users could get an error if an error happens while the backend is
committing prepared foreign transactions even though the local transaction is
already committed, and some foreign transactions could 

Re: Decoding of two-phase xacts missing from CREATE_REPLICATION_SLOT command

2021-06-03 Thread Amit Kapila
On Thu, Jun 3, 2021 at 10:08 PM Jeff Davis  wrote:
>
> On Thu, 2021-06-03 at 09:29 +0530, Amit Kapila wrote:
> > The idea is to support two_phase via protocol with a subscriber-side
> > work where we can test it as well. The code to support it via
> > replication protocol is present in the first patch of subscriber-side
> > work [1] which uses that code as well. Basically, we don't have a
> > good
> > way to test it without subscriber-side work so decided to postpone it
> > till the corresponding work is done.
>
> Thank you for clarifying.
>
> Right now, it feels a bit incomplete. If it's not much work, I
> recommend breaking out the CREATE_REPLICATION_SLOT changes and updating
> pg_recvlogical, so that it can go in v14 (and
> pg_create_logical_replication_slot() will match
> CREATE_REPLICATION_SLOT). But if that's complicated or controversial,
> then I'm fine waiting for the other work to complete.
>

I think we can try, but I'm not sure we can get it done by then. So, here is
my suggestion:
a. remove the change in CreateReplicationSlotCmd
b. prepare the patches for protocol change and pg_recvlogical. This
will anyway include the change we removed as part of (a).

Does that sound reasonable?
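
(For comparison, the SQL-level function already exposes the option in v14;
a sketch, assuming the pg_create_logical_replication_slot signature of
slot name, plugin, temporary, two_phase:)

  SELECT pg_create_logical_replication_slot('myslot', 'test_decoding',
                                            false, true);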

-- 
With Regards,
Amit Kapila.




RE: [BUG]Update Toast data failure in logical replication

2021-06-03 Thread tanghy.f...@fujitsu.com
On Thu, Jun 3, 2021 7:45 PM, Dilip Kumar wrote:
> 
> I have fixed all the pending issues, I see there is already a test
> case for this so I have changed the output for that.

Thanks for your patch. I tested it for all branches (10, 11, 12, 13, HEAD) and all 
of them passed. (This bug was introduced in PG 10.)
I also tested the scenario where I found this bug; data could be synchronized 
after your fix.

Regards
Tang


Re: checking return value from unlink in write_relcache_init_file

2021-06-03 Thread Zhihong Yu
On Thu, Jun 3, 2021 at 6:16 PM Tom Lane  wrote:

> Alvaro Herrera  writes:
> > On 2021-Jun-03, Tom Lane wrote:
> >> If the unlink fails, there's only really a problem if the subsequent
> >> open() fails to overwrite the file --- and that stanza is perfectly
> >> capable of complaining for itself.  So I think the code is fine and
> >> there's no need for a separate message about the unlink.  Refusing to
> >> proceed, as you've done here, is strictly worse than what we have.
>
> > It does seem to deserve a comment explaining this.
>
> Agreed, the existing comment there is a tad terse.
>
> regards, tom lane
>
Hi,
Here is the patch with a bit more comment on the unlink() call.

Cheers


comment-for-not-checking-unlink-return.patch
Description: Binary data


Re: checking return value from unlink in write_relcache_init_file

2021-06-03 Thread Tom Lane
Alvaro Herrera  writes:
> On 2021-Jun-03, Tom Lane wrote:
>> If the unlink fails, there's only really a problem if the subsequent
>> open() fails to overwrite the file --- and that stanza is perfectly
>> capable of complaining for itself.  So I think the code is fine and
>> there's no need for a separate message about the unlink.  Refusing to
>> proceed, as you've done here, is strictly worse than what we have.

> It does seem to deserve a comment explaining this.

Agreed, the existing comment there is a tad terse.

regards, tom lane




Re: BUG #16079: Question Regarding the BUG #16064

2021-06-03 Thread Michael Paquier
On Thu, Jun 03, 2021 at 11:02:56AM -0700, Jeff Davis wrote:
> My feeling after all of that discussion is that the next step would be
> to move to some kind of negotiation between client and server about
> which methods are mutually acceptable. Right now, the protocol is
> structured around the server driving the authentication process, and
> the most the client can do is abort.

FWIW, this sounds very similar to what SASL solves when we try to
select a mechanism name, plus some filtering applied in the backend
with some HBA rule or some filtering in the frontend with a connection
parameter doing the restriction, like channel_binding here.

Introducing a new libpq parameter that allows the user to select which
authentication methods are allowed has been discussed in the past; I
vaguely remember writing/reviewing a patch doing that, actually...
--
Michael


signature.asc
Description: PGP signature


Re: checking return value from unlink in write_relcache_init_file

2021-06-03 Thread Alvaro Herrera
On 2021-Jun-03, Tom Lane wrote:

> If the unlink fails, there's only really a problem if the subsequent
> open() fails to overwrite the file --- and that stanza is perfectly
> capable of complaining for itself.  So I think the code is fine and
> there's no need for a separate message about the unlink.  Refusing to
> proceed, as you've done here, is strictly worse than what we have.

It does seem to deserve a comment explaining this.

-- 
Álvaro Herrera   Valdivia, Chile
"Pensar que el espectro que vemos es ilusorio no lo despoja de espanto,
sólo le suma el nuevo terror de la locura" (Perelandra, C.S. Lewis)




Re: RFC: Table access methods and scans

2021-06-03 Thread Jeff Davis
Hi,

On Wed, 2021-03-31 at 22:10 +0200, Mats Kindahl wrote:
> As an example of how this is useful, I noticed the work by Heikki and
> Ashwin [1], where they return a `TableScanDesc` that contains
> information about what columns to scan, which looks very useful. Since
> the function `table_beginscan` in `src/include/access/tableam.h`
> accepts a `ScanKey` as input, this is (AFAICT) what Heikki and Ashwin
> were exploiting to create a specialized scan for a columnar store.

I don't think ScanKeys are the right place to store information about
what columns would be useful. See another thread[2] about that topic.

> Another example of where this can be useful is to optimize access
> during a sequential scan when you can handle some specific scans very
> efficiently and can "skip ahead" many tuples if you know what is
> being
> looked for instead of filtering "late". Two examples of where this
> could be useful are:
> 
> - An access method that reads data from a remote system and doesn't
> want
>   to transfer all tuples unless necessary.
> - Some sort of log-structured storage with Bloom filters that allows
>   you to quickly skip suites that do not have a key.

I agree that would be very convenient for non-heap AMs. There's a very
old commit[3] that says:

+   /*
+* Note that unlike IndexScan, SeqScan never use keys
+* in heap_beginscan (and this is very bad) - so, here
+* we have not check are keys ok or not.
+*/

and that language has just been carried forward for decades. I wonder
if there's any major reason this hasn't been done yet. Does it just not
improve performance for a heap, or is there some other reason?

Regards,
Jeff Davis

[2] 
https://www.postgresql.org/message-id/cae-ml+9rmtnzkcntzpqf8o3b-ujhwgfbsoxpqa3wvuc8ybb...@mail.gmail.com

[3] 
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=e3a1ab764ef2






Re: Documentation missing for PGSSLCRLDIR

2021-06-03 Thread Michael Paquier
On Thu, Jun 03, 2021 at 01:42:20PM +0900, Kyotaro Horiguchi wrote:
> Ugg..  Thanks for finding that. I don't find a similar mistake in the
> same page.

Thanks for double-checking.  Applied.
--
Michael


signature.asc
Description: PGP signature


Re: Performance degradation of REFRESH MATERIALIZED VIEW

2021-06-03 Thread Andrew Dunstan


On 6/3/21 7:30 PM, Tom Lane wrote:
> Tomas Vondra  writes:
>> As mentioned in the previous message, I've reverted most of 39b66a91bd.
> Should this topic be removed from the open-items list now?
>
>   



Yep.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Documentation missing for PGSSLCRLDIR

2021-06-03 Thread Michael Paquier
On Thu, Jun 03, 2021 at 02:08:02PM +0200, Daniel Gustafsson wrote:
> While looking at this I found another nearby oversight which needs a backport
> down to 13 where it was introduced.  The PGSSLMAXPROTOCOLVERSION documentation
> is linking to the minimum protocol version docs.  Fixed in the attached.

Thanks, fixed this bit.
--
Michael


signature.asc
Description: PGP signature


Re: A few source files have the wrong copyright year

2021-06-03 Thread Michael Paquier
On Fri, Jun 04, 2021 at 12:21:16PM +1200, David Rowley wrote:
> Pushed.

Thanks.
--
Michael


signature.asc
Description: PGP signature


Re: A few source files have the wrong copyright year

2021-06-03 Thread David Rowley
On Fri, 4 Jun 2021 at 00:16, David Rowley  wrote:
> I noticed earlier today when working in brin_minmax_multi.c that the
> copyright year was incorrect. That caused me to wonder if any other
> source files have the incorrect year.

> The attached fixes.

Pushed.

David




Re: [PATCH] Finally split StdRdOptions into HeapOptions and ToastOptions

2021-06-03 Thread Jeff Davis
On Sun, 2020-09-13 at 19:04 +0300, Nikolay Shaplov wrote:
> Moving reloptions to AM code is the goal I am slowly moving toward. I've
> started some time ago with the big patch
> https://commitfest.postgresql.org/14/992/ and have been told to split it
> into smaller parts. So I did, and this patch is the last step that cleans
> options-related things up, and then the actual moving can be done.

Thank you for working on this.

Can you outline the plan for moving these options to the table AM to
make sure this patch is a step in the right direction?

I was trying to work through this problem as well[1], and there are a
few complications.

* Which options apply to any relation (of any table AM), and which
apply only to heaps? As far as I can tell, the only one that seems
heap-specific is "fillfactor".

* Toast tables can be any AM, as well, so if we accept new reloptions
for a custom AM, we also need to accept options for toast tables of
that AM.

* Implementation-wise, the bytea representation of the options is not
very easy to extend. Should we have a new text field in the catalog to
hold the custom options?

Regards,
Jeff Davis

[1] 
https://www.postgresql.org/message-id/43c6ec161f930e385dbc3169a065a917cfc60714.camel%40j-davis.com





Re: PATCH: generate fractional cheapest paths in generate_orderedappend_path

2021-06-03 Thread Arne Roland
Hi,


thanks for the quick reply!


From: Tomas Vondra 
Sent: Thursday, June 3, 2021 20:11
To: Arne Roland; pgsql-hackers
Subject: Re: PATCH: generate fractional cheapest paths in 
generate_orderedappend_path

> I haven't tested the parallel case, but I think we should sort out (3)
> get_cheapest_fractional_path_for_pathkeys as mentioned above.
>

Not sure what you refer to by "above" - it's probably better to reply
in-line to the existing message, which makes it much clearer.


I was referring to one message above. I thought the thread was still short 
enough. Apparently too much time has passed. Sorry, I hope this mail is better. 
I was referring to my post from April:

From: Arne Roland
Sent: Monday, April 26, 2021 13:00
To: Tomas Vondra; pgsql-hackers
Subject: Re: PATCH: generate fractional cheapest paths in 
generate_orderedappend_path


>3) Not sure if get_cheapest_fractional_path_for_pathkeys should worry

> about require_parallel_safe, just like the other functions nearby.

I think it should. We have a ParallelAppend node after all.
I'm not really familiar with the way get_cheapest_fractional_path_for_pathkeys 
is used, but a quick search suggests that build_minmax_path was thus far the 
only caller. And minmax paths are never parallel safe anyway. I think that 
is the reason it doesn't do that already.


From: Zhihong Yu 
Sent: Thursday, June 3, 2021 22:50
To: Tomas Vondra
Cc: Arne Roland; pgsql-hackers
Subject: Re: PATCH: generate fractional cheapest paths in 
generate_orderedappend_path


Hi,
In REL_11_STABLE branch, a search revealed the following:

src/backend/optimizer/path/pathkeys.c: * get_cheapest_fractional_path_for_pathkeys
src/backend/optimizer/path/pathkeys.c:get_cheapest_fractional_path_for_pathkeys(List *paths,
src/backend/optimizer/plan/planagg.c:   get_cheapest_fractional_path_for_pathkeys(final_rel->pathlist,
src/include/optimizer/paths.h:extern Path *get_cheapest_fractional_path_for_pathkeys(List *paths,

It seems this function has been refactored out in subsequent releases.

FYI

Thanks for the info!
I doubt there is any interest in back-patching this anywhere. My most ambitious 
dream would be getting this into pg 14.

I think we only care about a parallel-safety-aware variant anyway, which 
AFAICT never existed.


From: Tomas Vondra 
Sent: Thursday, June 3, 2021 22:57
To: Zhihong Yu
Cc: Arne Roland; pgsql-hackers
Subject: Re: PATCH: generate fractional cheapest paths in 
generate_orderedappend_path
Actually, there are two comments

/* XXX maybe we should have startup_new_fractional? */

in the patch I posted - I completely forgot about that. But that's
a typo, I think - it should be

/* XXX maybe we should have startup_neq_fractional? */

and the new flag would work similarly to startup_neq_total, i.e. it's
pointless to add paths where startup == fractional cost.

At least I think that was the idea when I wrote the patch; it was way too
long ago.

Sorry, I almost forgot about this myself. I was only reminded upon seeing that 
again with different queries/tables.
Just to be sure I get this correctly: you mean startup_gt_fractional (cost) as 
an additional condition, right?

Regards
Arne



Re: Move pg_attribute.attcompression to earlier in struct for reduced size?

2021-06-03 Thread Michael Paquier
On Thu, Jun 03, 2021 at 12:04:48PM -0400, Alvaro Herrera wrote:
> If the check for recompression is this expensive, then yeah I agree that
> we should take it out.  If recompression is actually occurring, then I
> don't think this is a good test :-)

I have done no recompression here, so I was just stressing the extra
test for the recompression.  Sorry for the confusion.
--
Michael


signature.asc
Description: PGP signature


Re: DELETE CASCADE

2021-06-03 Thread David G. Johnston
On Thu, Jun 3, 2021 at 3:29 PM Isaac Morland 
wrote:

> Surely you mean if we don't have DELETE permission on the referencing
> table? I don't see why we need to be a member of the table owner role.
>

I would reverse the question - why does this feature need to allow the
broader DELETE permission instead of just limiting it to the table owner? The
latter matches the required permission for the existing cascade feature
that this is extending.

David J.


Re: Performance degradation of REFRESH MATERIALIZED VIEW

2021-06-03 Thread Tom Lane
Tomas Vondra  writes:
> As mentioned in the previous message, I've reverted most of 39b66a91bd.

Should this topic be removed from the open-items list now?

regards, tom lane




Re: checking return value from unlink in write_relcache_init_file

2021-06-03 Thread Tom Lane
Zhihong Yu  writes:
> Please comment on the proposed patch.

If the unlink fails, there's only really a problem if the subsequent
open() fails to overwrite the file --- and that stanza is perfectly
capable of complaining for itself.  So I think the code is fine and
there's no need for a separate message about the unlink.  Refusing to
proceed, as you've done here, is strictly worse than what we have.

regards, tom lane




Re: checking return value from unlink in write_relcache_init_file

2021-06-03 Thread Justin Pryzby
On Thu, Jun 03, 2021 at 03:44:13PM -0700, Zhihong Yu wrote:
> Hi,
> I was looking at write_relcache_init_file() where an attempt is made to
> unlink the tempfilename.
> 
> However, the return value is not checked.
> If the tempfilename is not removed (the file exists), I think we should log
> a warning and proceed.
> 
> Please comment on the proposed patch.

-   unlink(tempfilename);   /* in case it exists w/wrong permissions */
+   /* in case it exists w/wrong permissions */
+   if (unlink(tempfilename) && errno != ENOENT)
+   {
+   ereport(WARNING,
+   (errcode_for_file_access(),
+    errmsg("could not unlink relation-cache initialization file \"%s\": %m",
+   tempfilename),
+    errdetail("Continuing anyway, but there's something wrong.")));
+   return;
+   }
+
 
fp = AllocateFile(tempfilename, PG_BINARY_W);

The comment here is instructive: the unlink is in advance of AllocateFile(),
and if the file exists with wrong permissions, then AllocateFile() would itself
fail, and then issue a warning:

 errmsg("could not create relation-cache initialization file \"%s\": %m",
        tempfilename),
 errdetail("Continuing anyway, but there's something wrong.")));

In that context, I don't think it's needed to check the return of unlink().

-- 
Justin




checking return value from unlink in write_relcache_init_file

2021-06-03 Thread Zhihong Yu
Hi,
I was looking at write_relcache_init_file() where an attempt is made to
unlink the tempfilename.

However, the return value is not checked.
If the tempfilename is not removed (the file exists), I think we should log
a warning and proceed.

Please comment on the proposed patch.

Thanks


check-unlink-return-init-file.patch
Description: Binary data


Re: DELETE CASCADE

2021-06-03 Thread Isaac Morland
On Thu, 3 Jun 2021 at 18:25, David Christensen <
david.christen...@crunchydata.com> wrote:

> What happens if I don't have delete permission on the referencing table?
>>> When a foreign key reference delete cascades, I can cause records to
>>> disappear from a referencing table even if I don't have delete permission
>>> on that table. This feels like it's just supposed to be a convenience that
>>> replaces multiple DELETE invocations but one way or the other we need to be
>>> clear on the behaviour.
>>>
>>
>> Did you test this and find a failure? Because it is literally using all
>> of the same RI proc code/permissions as defined I would expect that it
>> would just abort the transaction.  (I am working on expanding the test
>> suite for this feature to allow for test cases like this, so keep 'em
>> coming... :-))
>>
>
> Enclosed is a basic test script and the corresponding output run through
> `psql -e` (will adapt into part of the regression test, but wanted to get
> this out there).  TL;DR: DELETE CASCADE behaves exactly as if said
> constraint were defined as an ON DELETE CASCADE FK constraint wrt DELETE
> permission behavior.  I do agree in this case that it makes sense to throw
> an error if we're trying to bypass the RESTRICT behavior and we are not
> part of the table owner role (and since this would be called/checked
> recursively for each table involved in the graph, I think we can count on it
> reporting the appropriate error message in this case).
>

Surely you mean if we don't have DELETE permission on the referencing
table? I don't see why we need to be a member of the table owner role.


Re: DELETE CASCADE

2021-06-03 Thread Isaac Morland
On Thu, 3 Jun 2021 at 18:08, David Christensen <
david.christen...@crunchydata.com> wrote:

> On Thu, Jun 3, 2021 at 4:15 PM Isaac Morland 
> wrote:
>
>>
>> What happens if I don't have delete permission on the referencing table?
>> When a foreign key reference delete cascades, I can cause records to
>> disappear from a referencing table even if I don't have delete permission
>> on that table. This feels like it's just supposed to be a convenience that
>> replaces multiple DELETE invocations but one way or the other we need to be
>> clear on the behaviour.
>>
>
> Did you test this and find a failure? Because it is literally using all of
> the same RI proc code/permissions as defined I would expect that it would
> just abort the transaction.  (I am working on expanding the test suite for
> this feature to allow for test cases like this, so keep 'em coming... :-))
>

I haven't run your patch. I'm just asking because it's a question about
exactly how the behaviour works that needs to be clearly and intentionally
decided (and documented). I think aborting the transaction with a
permission denied error on the referencing table is probably the right
behaviour: it's what you would get if you issued an equivalent delete on
the referencing table explicitly. I think of your patch as being a
convenience to avoid having to write a separate DELETE for each referencing
table. So based on what you say, it sounds like you've already covered this
issue.

Sidebar: isn't this inconsistent with trigger behaviour in general? When I
>> say "ON DELETE CASCADE" what I mean and what I get are the same: whenever
>> the referenced row is deleted, the referencing row also disappears,
>> regardless of the identity or permissions of the role running the actual
>> DELETE. But any manually implemented trigger runs as the caller; I cannot
>> make the database do something when a table update occurs; I can only make
>> the role doing the table update perform some additional actions.
>>
>
> Have you found a failure?  Because all this is doing is effectively
> calling the guts of the cascade RI routines, so no differences should
> occur.  If not, I'm not quite clear on your objection; can you clarify?
>

Sorry, my sidebar is only tangentially related. In another thread we had a
discussion about triggers, which it turns out execute as the role running
the command, not as the owner of the table. For many triggers it doesn't
matter, but for many things I can think of that I would want to do with
triggers it will only work if the trigger executes as the owner of the
table (or trigger, hypothetically…); and there are several common cases
where it makes way more sense to execute as the owner (e.g., triggers to
maintain a log table; it doesn't make sense to have to grant permissions on
the log table to roles with permissions on the main table, and also allows
spurious log entries to be made). But here it seems that cascaded actions
do execute as a role that is not dependent on who is running the command.

In short, I probably should have left off the sidebar. It's not an issue
with your patch.


Re: DELETE CASCADE

2021-06-03 Thread David Christensen
>
> What happens if I don't have delete permission on the referencing table?
>> When a foreign key reference delete cascades, I can cause records to
>> disappear from a referencing table even if I don't have delete permission
>> on that table. This feels like it's just supposed to be a convenience that
>> replaces multiple DELETE invocations but one way or the other we need to be
>> clear on the behaviour.
>>
>
> Did you test this and find a failure? Because it is literally using all of
> the same RI proc code/permissions as defined I would expect that it would
> just abort the transaction.  (I am working on expanding the test suite for
> this feature to allow for test cases like this, so keep 'em coming... :-))
>

Enclosed is a basic test script and the corresponding output run through
`psql -e` (will adapt into part of the regression test, but wanted to get
this out there).  TL;DR: DELETE CASCADE behaves exactly as if said
constraint were defined as an ON DELETE CASCADE FK constraint wrt DELETE
permission behavior.  I do agree in this case that it makes sense to throw
an error if we're trying to bypass the RESTRICT behavior and we are not
part of the table owner role (and since this would be called/checked
recursively for each table involved in the graph, I think we can count on it
reporting the appropriate error message in this case).
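
(A sketch of the scenario in question - role and table names are made up,
and DELETE CASCADE is the patch's proposed syntax:)

  CREATE ROLE app_user LOGIN;
  GRANT SELECT, DELETE ON foo TO app_user;  -- note: no DELETE granted on bar
  SET ROLE app_user;
  DELETE CASCADE FROM foo WHERE id = 1;
  -- per the test above, this succeeds just as a real ON DELETE CASCADE
  -- would; the proposal is to instead error out unless the caller is
  -- part of the table owner role
  RESET ROLE;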


test_delete_cascade.out
Description: Binary data


test_delete_cascade.sql
Description: Binary data


Re: DELETE CASCADE

2021-06-03 Thread David Christensen
On Thu, Jun 3, 2021 at 4:15 PM Isaac Morland 
wrote:

> On Thu, 3 Jun 2021 at 16:49, David Christensen <
> david.christen...@crunchydata.com> wrote:
>
>> Hi -hackers,
>>
>> Presented for discussion is a POC for a DELETE CASCADE functionality,
>> which will allow you one-shot usage of treating existing NO ACTION and
>> RESTRICT FK constraints as if they were originally defined as CASCADE
>> constraints.  I can't tell you how many times this functionality would have
>> been useful in the field, and despite the expected answer of "define your
>> constraints right in the first place", this is not always an option, nor is
>> the ability to change that easily (or create new constraints that need to
>> revalidate against big tables) always the best option.
>>
>
> I would sometimes find this convenient. There are circumstances where I
> don't want every DELETE to blunder all over the database deleting stuff,
> but certain specific DELETEs should take care of the referencing tables.
>
> An additional syntax to say "CASCADE TO table1, table2" would be safer and
> sometimes useful in the case where I know I want to cascade to specific
> other tables but not all (and in particular not to ones I didn't think of
> when I wrote the query); I might almost suggest omitting the cascade to all
> syntax (or maybe have a separate syntax, literally "CASCADE TO ALL TABLES"
> or some such).
>

I'm not fond of the syntax requirements for the explicitness here, plus it
seems like it would complicate the functionality of the patch (which
currently is able to just slightly refactor the RI triggers to account for
a single state variable, rather than do anything smarter than that).  I do
understand the desire/need for visibility into what would be affected with
an offhand statement.

What happens if I don't have delete permission on the referencing table?
> When a foreign key reference delete cascades, I can cause records to
> disappear from a referencing table even if I don't have delete permission
> on that table. This feels like it's just supposed to be a convenience that
> replaces multiple DELETE invocations but one way or the other we need to be
> clear on the behaviour.
>

Did you test this and find a failure? Because it is literally using all of
the same RI proc code/permissions as defined I would expect that it would
just abort the transaction.  (I am working on expanding the test suite for
this feature to allow for test cases like this, so keep 'em coming... :-))


> Sidebar: isn't this inconsistent with trigger behaviour in general? When I
> say "ON DELETE CASCADE" what I mean and what I get are the same: whenever
> the referenced row is deleted, the referencing row also disappears,
> regardless of the identity or permissions of the role running the actual
> DELETE. But any manually implemented trigger runs as the caller; I cannot
> make the database do something when a table update occurs; I can only make
> the role doing the table update perform some additional actions.
>

Have you found a failure?  Because all this is doing is effectively calling
the guts of the cascade RI routines, so no differences should occur.  If
not, I'm not quite clear on your objection; can you clarify?

David


Re: DELETE CASCADE

2021-06-03 Thread David Christensen
On Thu, Jun 3, 2021 at 4:48 PM David G. Johnston 
wrote:

> On Thu, Jun 3, 2021 at 1:49 PM David Christensen <
> david.christen...@crunchydata.com> wrote:
>
>> Presented for discussion is a POC for a DELETE CASCADE functionality,
>> which will allow you one-shot usage of treating existing NO ACTION and
>> RESTRICT FK constraints as if they were originally defined as CASCADE
>> constraints.
>>
>
> ON DELETE NO ACTION constraints become ON DELETE CASCADE constraints - ON
> DELETE SET NULL constraints are ignored, and not possible to emulate via
> this feature.
>

I have not tested this part per se (clearly I need to expand the
existing test suite), but my reasoning here was that ON DELETE SET
NULL/DEFAULT would still be applied with their defined behaviors (given
that we're still calling the underlying RI triggers using SPI) with the
same results; the intent of this feature is just to suppress the RESTRICT
action and cascade the DELETE to all tables (on down the chain) which would
normally block this, without having to manually figure out all the dependencies,
which can be inferred by the database itself.


>   I can't tell you how many times this functionality would have been
>> useful in the field, and despite the expected answer of "define your
>> constraints right in the first place", this is not always an option, nor is
>> the ability to change that easily (or create new constraints that need to
>> revalidate against big tables) always the best option.
>>
>
> Once...but I agreed.
>

Heh.


> That said, I'm happy to quibble about the specific approach to be taken;
>> I've written this based on the most straightforward way I could come up
>> with to accomplish this, but if there are better directions to take to get
>> the equivalent functionality I'm happy to discuss.
>>
>>
> This behavior should require the same permissions as actually creating an
> ON DELETE CASCADE FK on the cascaded-to tables.  i.e., Table Owner role
> membership (the requirement for FK permissions can be assumed by the
> presence of the existing FK constraint and being the table's owner).
>

I'm not sure if this would be overly prohibitive or not, but if you're the
table owner this should just work, like you point out.  I think this
restriction could be fine for the common case, and if there was a way to
hint if/when this failed to cascade as to the actual reason for the failure
I'm fine with that part too. (I was assuming that DELETE permission on the
underlying tables + existence of FK would be enough in practice, but we
could definitely tighten that up.)


> Having the defined FK behaviors be more readily changeable, while not
> mitigating this need, is IMO a more important feature to implement.  If
> there is a reason that cannot be implemented (besides no one has bothered
> to take the time) then I would consider that reason to also apply to
> prevent implementing this work-around.
>

Agreed that this would be a nice feature to have too; no one wants to break
FK consistency to change things or require a rescan of a valid constraint.


Re: CALL versus procedures with output-only arguments

2021-06-03 Thread Tom Lane
Peter Eisentraut  writes:
> On 03.06.21 22:21, Tom Lane wrote:
>> An example is that (AFAICT) the spec allows having both
>>  create procedure divide(x int, y int, OUT q int) ...
>>  create procedure divide(x int, y int, OUT q int, OUT r int) ...
>> which I want to reject because they have the same input parameters.

> (I'm by no means suggesting this, but I could imagine a catalog 
> representation that allows this but still checks that you can't have 
> multiple candidates that differ only by the type of an OUT parameter. 
> Say with some kind of bitmap or boolean array that indicates where the 
> OUT parameters are.  Then you can only have one candidate with a given 
> number of arguments, but the above could be allowed.  Again, I'm not 
> suggesting this, but it's a possibility in theory.)

We could certainly do something like that in a green field.  But one
of the reasons I'm unhappy about the current design is that I'm convinced
that altering the definition of pg_proc.proargtypes will break client-side
code that's looking at the catalogs.  I don't think we get to monkey with
such fundamental bits of the catalog data without a really good reason.
Allowing different OUT parameters for the same IN parameters doesn't seem
to me to qualify, given that there are other reasons why that's dubious.
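
(Concretely, the pair the proposed rule would reject - same input
parameters, different OUT parameters; a plpgsql sketch:)

  CREATE PROCEDURE divide(IN x int, IN y int, OUT q int)
  LANGUAGE plpgsql AS $$ BEGIN q := x / y; END $$;

  -- same IN parameter list (int, int), so this one would now be rejected:
  CREATE PROCEDURE divide(IN x int, IN y int, OUT q int, OUT r int)
  LANGUAGE plpgsql AS $$ BEGIN q := x / y; r := x % y; END $$;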

regards, tom lane




Re: DELETE CASCADE

2021-06-03 Thread David G. Johnston
On Thu, Jun 3, 2021 at 1:49 PM David Christensen <
david.christen...@crunchydata.com> wrote:

> Presented for discussion is a POC for a DELETE CASCADE functionality,
> which will allow you one-shot usage of treating existing NO ACTION and
> RESTRICT FK constraints as if they were originally defined as CASCADE
> constraints.
>

ON DELETE NO ACTION constraints become ON DELETE CASCADE constraints - ON
DELETE SET NULL constraints are ignored, and not possible to emulate via
this feature.


>   I can't tell you how many times this functionality would have been
> useful in the field, and despite the expected answer of "define your
> constraints right in the first place", this is not always an option, nor is
> the ability to change that easily (or create new constraints that need to
> revalidate against big tables) always the best option.
>

Once...but I agreed.

>
> That said, I'm happy to quibble about the specific approach to be taken;
> I've written this based on the most straightforward way I could come up
> with to accomplish this, but if there are better directions to take to get
> the equivalent functionality I'm happy to discuss.
>
>
This behavior should require the same permissions as actually creating an
ON DELETE CASCADE FK on the cascaded-to tables.  i.e., Table Owner role
membership (the requirement for FK permissions can be assumed by the
presence of the existing FK constraint and being the table's owner).

Having the defined FK behaviors be more readily changeable, while not
mitigating this need, is IMO a more important feature to implement.  If
there is a reason that cannot be implemented (besides no one has bothered
to take the time) then I would consider that reason to also apply to
prevent implementing this work-around.

David J.


Re: CALL versus procedures with output-only arguments

2021-06-03 Thread Peter Eisentraut

On 03.06.21 22:21, Tom Lane wrote:

Once you do, you'll possibly notice that PG's rules for which combinations
of signatures are allowed are different from the spec's.  I believe that
we're largely more generous than the spec, but there are a few cases where
this proposal isn't.  An example is that (AFAICT) the spec allows having
both
create procedure divide(x int, y int, OUT q int) ...
create procedure divide(x int, y int, OUT q int, OUT r int) ...
which I want to reject because they have the same input parameters.
This is perhaps annoying.  But seeing that the spec won't allow you to
also have divide() procedures for other datatypes, I'm having a hard
time feeling that this is losing on the overloading-flexibility front.


I'm okay with disallowing this.  In my experience, overloading of 
procedures is done even more rarely than of functions, so this probably 
won't affect anything in practice.


(I'm by no means suggesting this, but I could imagine a catalog 
representation that allows this but still checks that you can't have 
multiple candidates that differ only by the type of an OUT parameter. 
Say with some kind of bitmap or boolean array that indicates where the 
OUT parameters are.  Then you can only have one candidate with a given 
number of arguments, but the above could be allowed.  Again, I'm not 
suggesting this, but it's a possibility in theory.)





Re: alter table set TABLE ACCESS METHOD

2021-06-03 Thread Jeff Davis
On Thu, 2021-05-06 at 17:24 -0700, Jeff Davis wrote:
> It's the table AM's responsibility to detoast out-of-line datums and
> toast any values that are too large (see
> heapam.c:heap_prepare_insert()).

Do we have general agreement on this point? Did I miss another purpose
of detoasting in tablecmds.c, or can we just remove that part of the
patch?

Regards,
Jeff Davis






Re: CALL versus procedures with output-only arguments

2021-06-03 Thread Tom Lane
Peter Eisentraut  writes:
> On 02.06.21 02:04, Tom Lane wrote:
>> Hmm, actually we could make step 2 a shade tighter: if a candidate
>> routine is a function, match against proargtypes.  If it's a procedure,
>> match against coalesce(proallargtypes, proargtypes).  If we find
>> multiple matches, raise ambiguity error.

> I'm ok with this proposal.

Cool.  Do you want to try to implement it, or shall I?

A question that maybe we should refer to the RMT is whether it's
too late for this sort of redesign for v14.  I dislike reverting
the OUT-procedure feature altogether in v14, but perhaps that's
the sanest way to proceed.

regards, tom lane




Re: CALL versus procedures with output-only arguments

2021-06-03 Thread Andrew Dunstan


On 6/3/21 4:50 PM, Tom Lane wrote:
> Andrew Dunstan  writes:
>> Not sure I follow the "other datatypes" bit. Are you saying the spec
>> won't let you have this?:
>> create procedure divide(x int, y int, OUT q int);
>> create procedure divide(x int, y int, OUT q float);
> In fact it won't, because the spec's rule is simply "you can't have
> two procedures with the same name and same number of parameters"
> (where they count OUT parameters, I believe).  


Oh. That's a truly awful rule.



> However the case
> I was considering was wanting to have
>
>   create procedure divide(x int, y int, OUT q int) ...
>   create procedure divide(x numeric, y numeric, OUT q numeric) ...
>
> which likewise falls foul of the spec's restriction, but which
> IMO must be allowed in Postgres.
>


Right, we should certainly allow that.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: DELETE CASCADE

2021-06-03 Thread Isaac Morland
On Thu, 3 Jun 2021 at 16:49, David Christensen <
david.christen...@crunchydata.com> wrote:

> Hi -hackers,
>
> Presented for discussion is a POC for a DELETE CASCADE functionality,
> which will allow you one-shot usage of treating existing NO ACTION and
> RESTRICT FK constraints as if they were originally defined as CASCADE
> constraints.  I can't tell you how many times this functionality would have
> been useful in the field, and despite the expected answer of "define your
> constraints right in the first place", this is not always an option, nor is
> the ability to change that easily (or create new constraints that need to
> revalidate against big tables) always the best option.
>

I would sometimes find this convenient. There are circumstances where I
don't want every DELETE to blunder all over the database deleting stuff,
but certain specific DELETEs should take care of the referencing tables.

An additional syntax to say "CASCADE TO table1, table2" would be safer and
sometimes useful in the case where I know I want to cascade to specific
other tables but not all (and in particular not to ones I didn't think of
when I wrote the query); I might almost suggest omitting the cascade to all
syntax (or maybe have a separate syntax, literally "CASCADE TO ALL TABLES"
or some such).

What happens if I don't have delete permission on the referencing table?
When a foreign key reference delete cascades, I can cause records to
disappear from a referencing table even if I don't have delete permission
on that table. This feels like it's just supposed to be a convenience that
replaces multiple DELETE invocations but one way or the other we need to be
clear on the behaviour.

Sidebar: isn't this inconsistent with trigger behaviour in general? When I
say "ON DELETE CASCADE" what I mean and what I get are the same: whenever
the referenced row is deleted, the referencing row also disappears,
regardless of the identity or permissions of the role running the actual
DELETE. But any manually implemented trigger runs as the caller; I cannot
make the database do something when a table update occurs; I can only make
the role doing the table update perform some additional actions.


Re: Support for NSS as a libpq TLS backend

2021-06-03 Thread Daniel Gustafsson
> On 3 Jun 2021, at 23:11, Tom Lane  wrote:
> 
> Bruce Momjian  writes:
>> I wonder if we should use SSL/TLS in more places in our documentation.
> 
> No objection to doing that in the docs; I'm just questioning
> switching the code-visible names.

As long as it's still searchable by "SSL", "TLS" and "SSL/TLS" and not just the
latter.

--
Daniel Gustafsson   https://vmware.com/





Re: CALL versus procedures with output-only arguments

2021-06-03 Thread Peter Eisentraut

On 02.06.21 02:04, Tom Lane wrote:

I wrote:

It's possible that we could revert proargtypes and still accommodate
the spec's definition for ALTER/DROP ROUTINE/PROCEDURE.  I'm imagining
some rules along the line of:
1. If arg list contains any parameter modes, then it must be PG
syntax, so interpret it according to our traditional rules.
2. Otherwise, try to match the given arg types against *both*
proargtypes and proallargtypes.  If we get multiple matches,
complain that the command is ambiguous.  (In the case of DROP
PROCEDURE, it's probably OK to consider only proallargtypes.)


Hmm, actually we could make step 2 a shade tighter: if a candidate
routine is a function, match against proargtypes.  If it's a procedure,
match against coalesce(proallargtypes, proargtypes).  If we find
multiple matches, raise ambiguity error.

The cases where you get the error could be resolved by either
using traditional PG syntax, or (in most cases) by saying
FUNCTION or PROCEDURE instead of ROUTINE.


I'm ok with this proposal.




Re: Support for NSS as a libpq TLS backend

2021-06-03 Thread Tom Lane
Bruce Momjian  writes:
> I wonder if we should use SSL/TLS in more places in our documentation.

No objection to doing that in the docs; I'm just questioning
switching the code-visible names.

regards, tom lane




Re: pgsql: Add regression test for recovery pause.

2021-06-03 Thread Andrew Dunstan


On 6/3/21 4:45 PM, Tom Lane wrote:
> Andrew Dunstan  writes:
>> My suspicion was correct. Fix pushed.
> Great, thanks.
>
> Do we need to worry about back-patching that?  It seems only
> accidental if no existing back-branch test cases hit this.


Well, we haven't had breakage, but it's also useful to keep things in
sync as much as possible. I'll do it shortly.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Support for NSS as a libpq TLS backend

2021-06-03 Thread Bruce Momjian
On Thu, Jun  3, 2021 at 04:55:45PM -0400, Tom Lane wrote:
> Daniel Gustafsson  writes:
> > It might also put us in a hard spot if the next TLS spec ends up being called
> > something other than TLS?  It's clearly happened before =)
> 
> Good point.  I'm inclined to just stick with the SSL terminology.

I wonder if we should use SSL/TLS in more places in our documentation.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: Support for NSS as a libpq TLS backend

2021-06-03 Thread Daniel Gustafsson
> On 3 Jun 2021, at 22:55, Tom Lane  wrote:

>>> Also, do we have precedent for GUC aliases? That might be a little
>>> weird.
> 
>> I don't think we do currently, but I have a feeling the topic has surfaced
>> here before.
> 
> We do, look for "sort_mem" in guc.c.

I knew it seemed familiar but I failed to find it, thanks for the pointer.

--
Daniel Gustafsson   https://vmware.com/





Re: PATCH: generate fractional cheapest paths in generate_orderedappend_path

2021-06-03 Thread Tomas Vondra
On 6/3/21 10:52 PM, Zhihong Yu wrote:
> ...
> 
> Sent a bit too soon.
> 
> The above function still exists.
> But startup_new_fractional was nowhere to be found.

Actually, there are two comments

/* XXX maybe we should have startup_new_fractional? */

in the patch I posted - I completely forgot about that. But that's
a typo, I think - it should be

/* XXX maybe we should have startup_neq_fractional? */

and the new flag would work similarly to startup_neq_total, i.e. it's
pointless to add paths where startup == fractional cost.

At least I think that was the idea when I wrote the patch; it was way too
long ago.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Support for NSS as a libpq TLS backend

2021-06-03 Thread Tom Lane
Daniel Gustafsson  writes:
> It might also put us in a hard spot if the next TLS spec ends up being called
> something other than TLS?  It's clearly happened before =)

Good point.  I'm inclined to just stick with the SSL terminology.

>> Also, do we have precedent for GUC aliases? That might be a little
>> weird.

> I don't think we do currently, but I have a feeling the topic has surfaced
> here before.

We do, look for "sort_mem" in guc.c.  So it's not like it'd be
inconvenient to implement.  But I think user confusion and the
potential for the new terminology to fail to be any more
future-proof are good reasons to just leave the names alone.
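
(The existing alias in action - a quick sketch; both names set the same
underlying parameter:)

  SET sort_mem = '64MB';   -- legacy alias, mapped to work_mem
  SHOW work_mem;           -- reports 64MB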

regards, tom lane




Re: CALL versus procedures with output-only arguments

2021-06-03 Thread Tom Lane
Andrew Dunstan  writes:
> Not sure I follow the "other datatypes" bit. Are you saying the spec
> won't let you have this?:

> create procedure divide(x int, y int, OUT q int);
> create procedure divide(x int, y int, OUT q float);

In fact it won't, because the spec's rule is simply "you can't have
two procedures with the same name and same number of parameters"
(where they count OUT parameters, I believe).  However the case
I was considering was wanting to have

create procedure divide(x int, y int, OUT q int) ...
create procedure divide(x numeric, y numeric, OUT q numeric) ...

which likewise falls foul of the spec's restriction, but which
IMO must be allowed in Postgres.

regards, tom lane




DELETE CASCADE

2021-06-03 Thread David Christensen
Hi -hackers,

Presented for discussion is a POC for a DELETE CASCADE functionality,
which will allow you one-shot usage of treating existing NO ACTION and
RESTRICT FK constraints as if they were originally defined as CASCADE
constraints.  I can't tell you how many times this functionality would have
been useful in the field, and despite the expected answer of "define your
constraints right in the first place", this is not always an option, nor is
the ability to change that easily (or create new constraints that need to
revalidate against big tables) always the best option.

That said, I'm happy to quibble about the specific approach to be taken;
I've written this based on the most straightforward way I could come up
with to accomplish this, but if there are better directions to take to get
the equivalent functionality I'm happy to discuss.

From the commit message:

Proof of concept of allowing a DELETE statement to override a formal FK's
handling from RESTRICT/NO ACTION and treat it as CASCADE instead.

Syntax is "DELETE CASCADE ..." instead of "DELETE ... CASCADE" due to
unresolvable bison conflicts.

Sample session:

  postgres=# create table foo (id serial primary key, val text);
  CREATE TABLE
  postgres=# create table bar (id serial primary key, foo_id int references
foo(id), val text);
  CREATE TABLE
  postgres=# insert into foo (val) values ('a'),('b'),('c');
  INSERT 0 3
  postgres=# insert into bar (foo_id, val) values
(1,'d'),(1,'e'),(2,'f'),(2,'g');
  INSERT 0 4
  postgres=# select * from foo;
   id | val
  ----+-----
    1 | a
    2 | b
    3 | c
  (3 rows)

  postgres=# select * from bar;
   id | foo_id | val
  ----+--------+-----
    1 |      1 | d
    2 |      1 | e
    3 |      2 | f
    4 |      2 | g
  (4 rows)

  postgres=# delete from foo where id = 1;
  ERROR:  update or delete on table "foo" violates foreign key constraint
"bar_foo_id_fkey" on table "bar"
  DETAIL:  Key (id)=(1) is still referenced from table "bar".
  postgres=# delete cascade from foo where id = 1;
  DELETE 1
  postgres=# select * from foo;
   id | val
  ----+-----
    2 | b
    3 | c
  (2 rows)

  postgres=# select * from bar;
   id | foo_id | val
  ----+--------+-----
    3 |      2 | f
    4 |      2 | g
  (2 rows)


Best,

David


0001-Add-support-for-DELETE-CASCADE.patch
Description: Binary data


Re: Support for NSS as a libpq TLS backend

2021-06-03 Thread Daniel Gustafsson
> On 3 Jun 2021, at 22:14, Jeff Davis  wrote:
> 
> On Thu, 2021-06-03 at 15:53 -0400, Andrew Dunstan wrote:
>> Yeah, but it's annoying to have to start every talk I give touching
>> this
>> subject with the slide that says "When we say SSL we really means
>> TLS".
>> Maybe release 15 would be a good time to rename user-visible option
>> names etc, with support for legacy names.

Perhaps.  Having spent some time in this space, SSL has IMHO become the de
facto term for an encrypted connection at the socket layer, with TLS being the
current protocol suite (additionally, often referred to as SSL/TLS).  Adding
tls* counterparts to our ssl GUCs etc. would offer a level of correctness, but I
doubt we'll ever get rid of ssl*, so we might not help too many users for the
added complexity.

It might also put us in a hard spot if the next TLS spec ends up being called
something other than TLS?  It's clearly happened before =)

> Sounds good to me, though I haven't looked into how big of a diff that
> will be.
> 
> Also, do we have precedent for GUC aliases? That might be a little
> weird.

I don't think we do currently, but I have a feeling the topic has surfaced here
before.

If we end up settling on this being something we want I can volunteer to do the
legwork, but it seems a discussion best had before a patch is drafted.

--
Daniel Gustafsson   https://vmware.com/





Re: PATCH: generate fractional cheapest paths in generate_orderedappend_path

2021-06-03 Thread Zhihong Yu
On Thu, Jun 3, 2021 at 1:50 PM Zhihong Yu  wrote:

>
>
> On Thu, Jun 3, 2021 at 11:12 AM Tomas Vondra <
> tomas.von...@enterprisedb.com> wrote:
>
>> Hi,
>>
>> On 6/3/21 7:17 PM, Arne Roland wrote:
>> > Hi,
>> >
>> >
>> > I haven't tested the parallel case, but I think we should sort out (3)
>> > get_cheapest_fractional_path_for_pathkeys as mentioned above.
>> >
>>
>> Not sure what you refer to by "above" - it's probably better to reply
>> in-line to the existing message, which makes it much clearer.
>>
>> >
>> > I am lost about the comment regarding startup_new_fractional. Could you
>> > elaborate what you mean by that?
>> >
>>
>> Not sure what this refers to either - there's no startup_new_fractional
>> in my message and 'git grep startup_new_fractional' returns nothing.
>>
>> >
>> > Apart from that, I'd argue for a small test case. I attached a slimmed
>> down case of what we were trying to fix. It might be worth integrating
>> > that with an existing test, since more than a third of the time seems to
>> > be consumed by the creation and attachment of partitions.
>> >
>>
>> Maybe, if there's a suitable table to reuse, we can do that. But I don't
>> think it matters that it takes ~1/3 of the time to attach the partitions.
>> What's more important is whether it measurably slows down the test
>> suite, and I don't think that's an issue.
>>
>> In any case, this seems a bit premature - we need something to test the
>> patch etc. We can worry about how expensive the test is much later.
>>
>> regards
>>
>> --
>> Tomas Vondra
>> EnterpriseDB: http://www.enterprisedb.com
>> The Enterprise PostgreSQL Company
>>
>>
>> Hi,
> In REL_11_STABLE branch, a search revealed the following:
>
> src/backend/optimizer/path/pathkeys.c: *
> get_cheapest_fractional_path_for_pathkeys
> src/backend/optimizer/path/pathkeys.c:get_cheapest_fractional_path_for_pathkeys(List
> *paths,
> src/backend/optimizer/plan/planagg.c:
> get_cheapest_fractional_path_for_pathkeys(final_rel->pathlist,
> src/include/optimizer/paths.h:extern Path
> *get_cheapest_fractional_path_for_pathkeys(List *paths,
>
> It seems this function has been refactored out in subsequent releases.
>
> FYI
>

Sent a bit too soon.

The above function still exists.
But startup_new_fractional was nowhere to be found.


Re: PATCH: generate fractional cheapest paths in generate_orderedappend_path

2021-06-03 Thread Zhihong Yu
On Thu, Jun 3, 2021 at 11:12 AM Tomas Vondra 
wrote:

> Hi,
>
> On 6/3/21 7:17 PM, Arne Roland wrote:
> > Hi,
> >
> >
> > I haven't tested the parallel case, but I think we should sort out (3)
> > get_cheapest_fractional_path_for_pathkeys as mentioned above.
> >
>
> Not sure what you refer to by "above" - it's probably better to reply
> in-line to the existing message, which makes it much clearer.
>
> >
> > I am lost about the comment regarding startup_new_fractional. Could you
> > elaborate what you mean by that?
> >
>
> Not sure what this refers to either - there's no startup_new_fractional
> in my message and 'git grep startup_new_fractional' returns nothing.
>
> >
> > Apart from that, I'd argue for a small test case. I attached a slimmed
> > down case of what we were trying to fix. It might be worth integrating
> > that with an existing test, since more than a third of the time seems to
> > be consumed by the creation and attachment of partitions.
> >
>
> Maybe, if there's a suitable table to reuse, we can do that. But I don't
> think it matters that it takes ~1/3 of the time to attach the partitions.
> What's more important is whether it measurably slows down the test
> suite, and I don't think that's an issue.
>
> In any case, this seems a bit premature - we need something to test the
> patch etc. We can worry about how expensive the test is much later.
>
> regards
>
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
>
> Hi,
In REL_11_STABLE branch, a search revealed the following:

src/backend/optimizer/path/pathkeys.c: *
get_cheapest_fractional_path_for_pathkeys
src/backend/optimizer/path/pathkeys.c:get_cheapest_fractional_path_for_pathkeys(List
*paths,
src/backend/optimizer/plan/planagg.c:
get_cheapest_fractional_path_for_pathkeys(final_rel->pathlist,
src/include/optimizer/paths.h:extern Path
*get_cheapest_fractional_path_for_pathkeys(List *paths,

It seems this function has been refactored out in subsequent releases.

FYI


Re: pgsql: Add regression test for recovery pause.

2021-06-03 Thread Tom Lane
Andrew Dunstan  writes:
> My suspicion was correct. Fix pushed.

Great, thanks.

Do we need to worry about back-patching that?  It seems only
accidental if no existing back-branch test cases hit this.

regards, tom lane




Re: pgsql: Add regression test for recovery pause.

2021-06-03 Thread Andrew Dunstan


On 6/2/21 6:25 PM, Andrew Dunstan wrote:
>
>
> Looks to me like we're getting munged by the msys shell, and unlike on
> msys2 there isn't a way to disable it:
> https://stackoverflow.com/questions/7250130/how-to-stop-mingw-and-msys-from-mangling-path-names-given-at-the-command-line
>
>
> c.f. commit 73ff3a0abbb
>
>
> Maybe a robust solution would be to have the query piped to psql on its
> stdin rather than on the command line. poll_query_until looks on a quick
> check like the only place in PostgresNode where we use "psql -c"
>
>
> I'll experiment a bit tomorrow.
>
>
>


My suspicion was correct. Fix pushed.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: CALL versus procedures with output-only arguments

2021-06-03 Thread Andrew Dunstan


On 6/3/21 4:21 PM, Tom Lane wrote:
> Andrew Dunstan  writes:
>> So AIUI your suggestion is that ALTER/DROP ROUTINE will look for an
>> ambiguity. If it doesn't find one it proceeds, otherwise it complains in
>> which case the user will have to fall back to ALTER/DROP
>> FUNCTION/PROCEDURE. Is that right? It seems a reasonable approach, and I
>> wouldn't expect to find too many ambiguous cases in practice.
> Yeah, I think that practical problems would be pretty rare.  My impression
> is that users tend not to use function/procedure name overloading too much
> in the first place, and none of this affects you at all till you do.
>
> Once you do, you'll possibly notice that PG's rules for which combinations
> of signatures are allowed are different from the spec's.  I believe that
> we're largely more generous than the spec, but there are a few cases where
> this proposal isn't.  An example is that (AFAICT) the spec allows having
> both
>   create procedure divide(x int, y int, OUT q int) ...
>   create procedure divide(x int, y int, OUT q int, OUT r int) ...
> which I want to reject because they have the same input parameters.
> This is perhaps annoying.  But seeing that the spec won't allow you to
> also have divide() procedures for other datatypes, I'm having a hard
> time feeling that this is losing on the overloading-flexibility front.
>
>   



Not sure I follow the "other datatypes" bit. Are you saying the spec
won't let you have this?:

create procedure divide(x int, y int, OUT q int);
create procedure divide(x int, y int, OUT q float);

cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Race condition in recovery?

2021-06-03 Thread Robert Haas
On Thu, May 27, 2021 at 2:26 AM Dilip Kumar  wrote:
> Changed as suggested.

I don't think the code as written here is going to work on Windows,
because your code doesn't duplicate enable_restoring's call to
perl2host or its backslash-escaping logic. It would really be better
if we could use enable_restoring directly. Also, I discovered that the
'return' in cp_history_files should really say 'exit', because
otherwise it generates a complaint every time it's run. It should also
have 'use strict' and 'use warnings' at the top.

Here's a version of your test case patch with the 1-line code fix
added, the above issues addressed, and a bunch of cosmetic tweaks.
Unfortunately, it doesn't pass for me consistently. I'm not sure if
that's because I broke something with my changes, or because the test
contains an underlying race condition which we need to address.
Attached also are the log files from a failed run if you want to look
at them. The key lines seem to be:

2021-06-03 16:16:53.984 EDT [47796] LOG:  restarted WAL streaming at
0/300 on timeline 2
2021-06-03 16:16:54.197 EDT [47813] 025_stuck_on_old_timeline.pl LOG:
statement: SELECT count(*) FROM tab_int
2021-06-03 16:16:54.197 EDT [47813] 025_stuck_on_old_timeline.pl
ERROR:  relation "tab_int" does not exist at character 22

Or from the main log:

Waiting for replication conn cascade's replay_lsn to pass '0/300' on standby
done
error running SQL: 'psql::1: ERROR:  relation "tab_int" does not exist
LINE 1: SELECT count(*) FROM tab_int
 ^'

I wonder whether that problem points to an issue with this incantation:

$node_standby->wait_for_catchup($node_cascade, 'replay',
$node_standby->lsn('replay'));

But I'm not sure, and I'm out of time to investigate for today.

-- 
Robert Haas
EDB: http://www.enterprisedb.com


v4-0001-Fix-corner-case-failure-of-new-standby-to-follow-.patch
Description: Binary data


025_stuck_on_old_timeline_primary.log
Description: Binary data


025_stuck_on_old_timeline_cascade.log
Description: Binary data


regress_log_025_stuck_on_old_timeline
Description: Binary data


025_stuck_on_old_timeline_standby.log
Description: Binary data


Re: CALL versus procedures with output-only arguments

2021-06-03 Thread Tom Lane
Andrew Dunstan  writes:
> So AIUI your suggestion is that ALTER/DROP ROUTINE will look for an
> ambiguity. If it doesn't find one it proceeds, otherwise it complains in
> which case the user will have to fall back to ALTER/DROP
> FUNCTION/PROCEDURE. Is that right? It seems a reasonable approach, and I
> wouldn't expect to find too many ambiguous cases in practice.

Yeah, I think that practical problems would be pretty rare.  My impression
is that users tend not to use function/procedure name overloading too much
in the first place, and none of this affects you at all till you do.

Once you do, you'll possibly notice that PG's rules for which combinations
of signatures are allowed are different from the spec's.  I believe that
we're largely more generous than the spec, but there are a few cases where
this proposal isn't.  An example is that (AFAICT) the spec allows having
both
create procedure divide(x int, y int, OUT q int) ...
create procedure divide(x int, y int, OUT q int, OUT r int) ...
which I want to reject because they have the same input parameters.
This is perhaps annoying.  But seeing that the spec won't allow you to
also have divide() procedures for other datatypes, I'm having a hard
time feeling that this is losing on the overloading-flexibility front.

regards, tom lane




Re: Support for NSS as a libpq TLS backend

2021-06-03 Thread Jeff Davis
On Thu, 2021-06-03 at 15:53 -0400, Andrew Dunstan wrote:
> Yeah, but it's annoying to have to start every talk I give touching
> this
> subject with the slide that says "When we say SSL we really means
> TLS".
> Maybe release 15 would be a good time to rename user-visible option
> names etc, with support for legacy names.

Sounds good to me, though I haven't looked into how big of a diff that
will be.

Also, do we have precedent for GUC aliases? That might be a little
weird.

Regards,
Jeff Davis






Make unlogged table resets detectable

2021-06-03 Thread Jeff Davis
One problem with unlogged tables is that the application has no way to
tell if they were reset, or they just happen to be empty.

This can be a problem with sharding, where you might have different
shards of an unlogged table on different servers. If one server
crashes, you'll be missing only one shard of the data, which may appear
inconsistent. In that case, you'd like the application (or sharding
solution) to be able to detect that one shard was lost, and TRUNCATE
those that remain to get back to a reasonable state.

It would be easy enough for the init fork to have a single page with a
flag set. That way, when the main fork is replaced with the init fork,
other code could detect that a reset happened.

When detected, depending on a GUC, the behavior could be to auto-
truncate it (to get the current silent behavior), or refuse to perform
the operation (except an explicit TRUNCATE), or issue a
warning/log/notice.

The biggest challenge would be: when should we detect that the reset
has happened? There might be a lot of entry points. Another idea would
be to just have a SQL function that the application could call whenever
it needs to know.
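
As a sketch of that last idea (the function name is invented purely for
illustration):

  -- true if the main fork has been replaced by the init fork since the
  -- flag was last cleared
  SELECT pg_unlogged_relation_was_reset('my_shard'::regclass);

  -- application-side reaction when it returns true
  TRUNCATE my_shard;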

Thoughts?

Jeff Davis






Re: Support for NSS as a libpq TLS backend

2021-06-03 Thread Andrew Dunstan


On 6/3/21 1:47 PM, Daniel Gustafsson wrote:
>> On 3 Jun 2021, at 19:37, Jeff Davis  wrote:
>>
>> On Tue, 2020-10-27 at 23:39 -0700, Andres Freund wrote:
>>> Maybe we should just have --with-ssl={openssl,nss}? That'd avoid
>>> needing
>>> to check for errors.
>> [ apologies for the late reply ]
>>
>> Would it be more proper to call it --with-tls={openssl,nss} ?
> Well, we use SSL for everything else (GUCs, connection params and env vars 
> etc)
> so I think --with-ssl is sensible.
>
> However, SSL and TLS are used quite interchangeably these days so I think it
> makes sense to provide --with-tls as an alias.
>

Yeah, but it's annoying to have to start every talk I give touching this
subject with the slide that says "When we say SSL we really means TLS".
Maybe release 15 would be a good time to rename user-visible option
names etc, with support for legacy names.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: CALL versus procedures with output-only arguments

2021-06-03 Thread Andrew Dunstan


On 6/3/21 2:29 PM, Tom Lane wrote:
> I wrote:
>> Hmm, actually we could make step 2 a shade tighter: if a candidate
>> routine is a function, match against proargtypes.  If it's a procedure,
>> match against coalesce(proallargtypes, proargtypes).  If we find
>> multiple matches, raise ambiguity error.
> Where do we stand on this topic?
>
> I'm willing to have a go at implementing things that way, but
> time's a-wasting.
>
>   



So AIUI your suggestion is that ALTER/DROP ROUTINE will look for an
ambiguity. If it doesn't find one it proceeds, otherwise it complains in
which case the user will have to fall back to ALTER/DROP
FUNCTION/PROCEDURE. Is that right? It seems a reasonable approach, and I
wouldn't expect to find too many ambiguous cases in practice.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: security_definer_search_path GUC

2021-06-03 Thread Pavel Stehule
On Thu, Jun 3, 2021 at 21:11 Mark Dilger 
wrote:

>
>
> > On Jun 3, 2021, at 12:06 PM, Pavel Stehule 
> wrote:
> >
> >
> >
> > On Thu, Jun 3, 2021 at 20:25 Mark Dilger <
> mark.dil...@enterprisedb.com> wrote:
> >
> >
> > > On Jun 3, 2021, at 9:38 AM, Pavel Stehule 
> wrote:
> > >
> > This design looks good for extensions, but I am not sure if it is good
> for users. Some declarative way, without the necessity of programming or
> installing some extension, would be nice.
> >
> > I agree, though "some declarative way" is a bit vague.  I've had ideas
> that perhaps superusers should be able to further restrict the [min,max]
> fields of int and real GUCs.  Since -1 is sometimes used to mean
> "disabled", syntax to allow specifying a set might be necessary, something
> like [-1, 60..600].  For text and enum GUCs, perhaps a set of regexps would
> work, some being required to match and others being required not to match,
> such as:
> >
> > search_path !~ '\mcustomerx\M'
> > search_path ~ '^pg_catalog,'
> >
> > If we did something like this, we'd need it to play nicely with other
> filters provided by extensions, because I'm reasonably sure not all filters
> could be done merely using set notation and regular expression syntax.  In
> fact, I find it hard to convince myself that set notation and regular
> expression syntax would even be useful in a large enough number of cases to
> be worth implementing.  What are your thoughts on that?
> >
> I don't think we need regular expressions for immutable strings.
> Maybe use some special keyword
> >
> > search_path only "pg_catalog"
>
> I think we're trying to solve different problems.  I'm trying to allow
> non-superusers to set GUCs while putting constraints on what values they
> choose.  You appear to be trying to revoke the ability to set a GUC by
> forcing it to only ever have a single value.
>

My proposal doesn't mean the search_path cannot be changed - it limits
the possible values, like your patch does. Maybe we can get inspiration from
pg_hba.conf.


>
> —
> Mark Dilger
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
>
>
>


Re: speed up verifying UTF-8

2021-06-03 Thread Heikki Linnakangas

On 03/06/2021 22:16, Heikki Linnakangas wrote:

On 03/06/2021 22:10, John Naylor wrote:

On Thu, Jun 3, 2021 at 3:08 PM Heikki Linnakangas <hlinn...@iki.fi> wrote:
  >                 x1 = half1 + UINT64CONST(0x7f7f7f7f7f7f7f7f);
  >                 x2 = half2 + UINT64CONST(0x7f7f7f7f7f7f7f7f);
  >
  >                 /* then check that the high bit is set in each byte. */
  >                 x = (x1 | x2);
  >                 x &= UINT64CONST(0x8080808080808080);
  >                 if (x != UINT64CONST(0x8080808080808080))
  >                         return 0;

That seems right, I'll try that and update the patch. (Forgot to attach
earlier anyway)


Ugh, actually that has the same issue as before. If one of the bytes in
one half is zero, but not in the other half, this fails to detect it.
Sorry for the noise..


If you replace (x1 | x2) with (x1 & x2) above, I think it's correct.
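
That is:

	x = (x1 & x2);
	x &= UINT64CONST(0x8080808080808080);
	if (x != UINT64CONST(0x8080808080808080))
		return 0;

A high bit survives the AND only if it is set in the corresponding byte
of both halves, i.e. neither half had a zero byte in that position.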

- Heikki




Re: [PATCH] expand the units that pg_size_pretty supports on output

2021-06-03 Thread David Christensen
New versions attached to address the initial CF feedback and rebase on HEAD
as of now.

0001-Expand-the-units-that-pg_size_pretty-numeric-knows-a.patch

- expands the units that pg_size_pretty() can handle up to YB.

0002-Expand-the-supported-units-in-pg_size_bytes-to-cover.patch

- expands the units that pg_size_bytes() can handle, up to YB.
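
For illustration, with both patches applied you get something like this
(output shown approximately):

  SELECT pg_size_pretty(1024::numeric ^ 8);  -- 2^80 bytes, reported as 1 YB
  SELECT pg_size_bytes('1 EB');              -- newly accepted unit, fits in bigint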


0002-Expand-the-supported-units-in-pg_size_bytes-to-cover.patch
Description: Binary data


0001-Expand-the-units-that-pg_size_pretty-numeric-knows-a.patch
Description: Binary data


Re: speed up verifying UTF-8

2021-06-03 Thread Heikki Linnakangas

On 03/06/2021 22:10, John Naylor wrote:
On Thu, Jun 3, 2021 at 3:08 PM Heikki Linnakangas <hlinn...@iki.fi> wrote:

 >                 x1 = half1 + UINT64CONST(0x7f7f7f7f7f7f7f7f);
 >                 x2 = half2 + UINT64CONST(0x7f7f7f7f7f7f7f7f);
 >
 >                 /* then check that the high bit is set in each byte. */
 >                 x = (x1 | x2);
 >                 x &= UINT64CONST(0x8080808080808080);
 >                 if (x != UINT64CONST(0x8080808080808080))
 >                         return 0;

That seems right, I'll try that and update the patch. (Forgot to attach 
earlier anyway)


Ugh, actually that has the same issue as before. If one of the bytes in
one half is zero, but not in the other half, this fails to detect it.
Sorry for the noise..


- Heikki




Re: speed up verifying UTF-8

2021-06-03 Thread John Naylor
On Thu, Jun 3, 2021 at 3:08 PM Heikki Linnakangas  wrote:
>
> On 03/06/2021 17:33, Greg Stark wrote:
> >> 3. It's probably cheaper perform the HAS_ZERO check just once on (half1
> > | half2). We have to compute (half1 | half2) anyway.
> >
> > Wouldn't you have to check (half1 & half2) ?
>
> Ah, you're right of course. But & is not quite right either, it will
> give false positives. That's ok from a correctness point of view here,
> because we then fall back to checking byte by byte, but I don't think
> it's a good tradeoff.

Ah, of course.

> /*
>  * Check if there are any zero bytes in this chunk.
>  *
>  * First, add 0x7f to each byte. This sets the high bit in each byte,
>  * unless it was a zero. We already checked that none of the bytes had
>  * the high bit set previously, so the max value each byte can have
>  * after the addition is 0x7f + 0x7f = 0xfe, and we don't need to
>  * worry about carrying over to the next byte.
>  */
> x1 = half1 + UINT64CONST(0x7f7f7f7f7f7f7f7f);
> x2 = half2 + UINT64CONST(0x7f7f7f7f7f7f7f7f);
>
> /* then check that the high bit is set in each byte. */
> x = (x1 | x2);
> x &= UINT64CONST(0x8080808080808080);
> if (x != UINT64CONST(0x8080808080808080))
> return 0;

That seems right, I'll try that and update the patch. (Forgot to attach
earlier anyway)

--
John Naylor
EDB: http://www.enterprisedb.com


Re: security_definer_search_path GUC

2021-06-03 Thread Mark Dilger



> On Jun 3, 2021, at 12:06 PM, Pavel Stehule  wrote:
> 
> 
> 
> On Thu, Jun 3, 2021 at 20:25 Mark Dilger 
> wrote:
> 
> 
> > On Jun 3, 2021, at 9:38 AM, Pavel Stehule  wrote:
> > 
> > This design looks good for extensions, but I am not sure if it is good for
> > users. Some declarative way, without the necessity of programming or
> > installing some extension, would be nice.
> 
> I agree, though "some declarative way" is a bit vague.  I've had ideas that 
> perhaps superusers should be able to further restrict the [min,max] fields of 
> int and real GUCs.  Since -1 is sometimes used to mean "disabled", syntax to 
> allow specifying a set might be necessary, something like [-1, 60..600].  For 
> text and enum GUCs, perhaps a set of regexps would work, some being required 
> to match and others being required not to match, such as:
> 
> search_path !~ '\mcustomerx\M'
> search_path ~ '^pg_catalog,'
> 
> If we did something like this, we'd need it to play nicely with other filters 
> provided by extensions, because I'm reasonably sure not all filters could be 
> done merely using set notation and regular expression syntax.  In fact, I 
> find it hard to convince myself that set notation and regular expression 
> syntax would even be useful in a large enough number of cases to be worth 
> implementing.  What are your thoughts on that?
> 
> I don't think we need regular expressions for immutable strings.  Maybe
> use some special keyword
> 
> search_path only "pg_catalog" 

I think we're trying to solve different problems.  I'm trying to allow 
non-superusers to set GUCs while putting constraints on what values they 
choose.  You appear to be trying to revoke the ability to set a GUC by forcing 
it to only ever have a single value.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: speed up verifying UTF-8

2021-06-03 Thread Heikki Linnakangas

On 03/06/2021 17:33, Greg Stark wrote:

3. It's probably cheaper perform the HAS_ZERO check just once on (half1

| half2). We have to compute (half1 | half2) anyway.

Wouldn't you have to check (half1 & half2) ?


Ah, you're right of course. But & is not quite right either, it will 
give false positives. That's ok from a correctness point of view here, 
because we then fall back to checking byte by byte, but I don't think 
it's a good tradeoff.


I think this works, however:

/* Verify a chunk of bytes for valid ASCII including a zero-byte check. */
static inline int
check_ascii(const unsigned char *s, int len)
{
uint64  half1,
half2,
highbits_set;
uint64  x1,
x2;
uint64  x;

if (len >= 2 * sizeof(uint64))
{
		memcpy(&half1, s, sizeof(uint64));
		memcpy(&half2, s + sizeof(uint64), sizeof(uint64));

/* Check if any bytes in this chunk have the high bit set. */
		highbits_set = ((half1 | half2) & UINT64CONST(0x8080808080808080));
if (highbits_set)
return 0;

/*
 * Check if there are any zero bytes in this chunk.
 *
		 * First, add 0x7f to each byte. This sets the high bit in each byte,
		 * unless it was a zero. We already checked that none of the bytes had
		 * the high bit set previously, so the max value each byte can have
		 * after the addition is 0x7f + 0x7f = 0xfe, and we don't need to
		 * worry about carrying over to the next byte.
		 */
x1 = half1 + UINT64CONST(0x7f7f7f7f7f7f7f7f);
x2 = half2 + UINT64CONST(0x7f7f7f7f7f7f7f7f);

/* then check that the high bit is set in each byte. */
x = (x1 | x2);
x &= UINT64CONST(0x8080808080808080);
if (x != UINT64CONST(0x8080808080808080))
return 0;

return 2 * sizeof(uint64);
}
else
return 0;
}
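
A caller would presumably consume it along these lines (just a sketch,
reusing the function's own s/len parameters):

	int		n;

	while ((n = check_ascii(s, len)) > 0)
	{
		s += n;
		len -= n;
	}
	/* fall back to byte-at-a-time validation for the remainder */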

- Heikki




Re: security_definer_search_path GUC

2021-06-03 Thread Pavel Stehule
On Thu, Jun 3, 2021 at 20:25 Mark Dilger 
wrote:

>
>
> > On Jun 3, 2021, at 9:38 AM, Pavel Stehule 
> wrote:
> >
> > This design looks good for extensions, but I am not sure if it is good
> for users. Some declarative way, without the necessity of programming or
> installing some extension, would be nice.
>
> I agree, though "some declarative way" is a bit vague.  I've had ideas
> that perhaps superusers should be able to further restrict the [min,max]
> fields of int and real GUCs.  Since -1 is sometimes used to mean
> "disabled", syntax to allow specifying a set might be necessary, something
> like [-1, 60..600].  For text and enum GUCs, perhaps a set of regexps would
> work, some being required to match and others being required not to match,
> such as:
>
> search_path !~ '\mcustomerx\M'
> search_path ~ '^pg_catalog,'
>
> If we did something like this, we'd need it to play nicely with other
> filters provided by extensions, because I'm reasonably sure not all filters
> could be done merely using set notation and regular expression syntax.  In
> fact, I find it hard to convince myself that set notation and regular
> expression syntax would even be useful in a large enough number of cases to
> be worth implementing.  What are your thoughts on that?
>

I don't think we need regular expressions for immutable strings.  Maybe
use some special keyword

search_path only "pg_catalog"




> —
> Mark Dilger
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
>
>
>


Re: speed up verifying UTF-8

2021-06-03 Thread John Naylor
On Thu, Jun 3, 2021 at 9:16 AM Heikki Linnakangas  wrote:

> Some ideas:
>
> 1. Better to check if any high bits are set first. We care more about
> the speed of that than of detecting zero bytes, because input with high
> bits is valid but zeros are an error.
>
> 2. Since we check that there are no high bits, we can do the zero-checks
> with fewer instructions like this:

Both ideas make sense, and I like the shortcut we can take with the zero
check. I think Greg is right that the zero check needs “half1 & half2”, so
I tested with that (updated patches attached).

> What test set have you been using for performance testing this? I'd like

The microbenchmark is the same one you attached to [1], which I extended
with a 95% multibyte case. With the new zero check:

clang 12.0.5 / MacOS:

master:

 chinese | mixed | ascii
---------+-------+-------
     981 |   688 |   371

0001:

 chinese | mixed | ascii
---------+-------+-------
     932 |   548 |   110

plus optimized zero check:

 chinese | mixed | ascii
---------+-------+-------
     689 |   573 |    59

It makes sense that the Chinese text case is faster since the zero check is
skipped.

gcc 4.8.5 / Linux:

master:

 chinese | mixed | ascii
---------+-------+-------
    2561 |  1493 |   825

0001:

 chinese | mixed | ascii
---------+-------+-------
    2968 |  1035 |   158

plus optimized zero check:

 chinese | mixed | ascii
---------+-------+-------
    2413 |  1078 |   137

The second machine is a bit older and has an old compiler, but there is
still a small speed increase. In fact, without Heikki's tweaks, 0001
regresses on multibyte.

(Note: I'm not seeing the 7x improvement I claimed for 0001 here, but that
was from memory and I think that was a different machine and newer gcc. We
can report a range of results as we proceed.)

[1]
https://www.postgresql.org/message-id/06d45421-61b8-86dd-e765-f1ce527a5...@iki.fi

--
John Naylor
EDB: http://www.enterprisedb.com


Re: security_definer_search_path GUC

2021-06-03 Thread Isaac Morland
I thought everybody was already doing this, but maybe not. I put the
following in all my function definitions:

SET search_path FROM CURRENT

(with the exception of a very few functions which explicitly need to use
the caller's search path)

It seems to me that if this was the default (note: I'm totally ignoring
backward compatibility issues for now), then most of these issues wouldn't
exist. My schema creation scripts start with an appropriate search path
setting and that value then gets built into every function they create.
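
For example (schema and object names are just for illustration):

  SET search_path = app, pg_catalog;

  CREATE FUNCTION order_total(o int) RETURNS numeric
  LANGUAGE sql STABLE
  SET search_path FROM CURRENT
  AS $$ SELECT sum(amount) FROM order_lines WHERE order_id = o $$;

The function is then dumped and restored with  SET search_path = app,
pg_catalog  attached, so the caller's setting never matters.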

Related question: how can function compilation work when the behaviour
depends on the search path of the caller? In other words, the behaviour of
the function can be totally different on each call. Are there any popular
programming environments in which the behaviour of a called function
depends on the caller's environment (actually yes: shell scripting, with
$PATH especially; but besides that and stored procedures)?

I also want to mention that I consider any suggestion to eliminate the
search_path concept as a complete non-starter. It would be no different
from proposing that the next version of a programming language eliminate
(or stop using) the module system. If I could make it happen easily, I
would go in the other direction and allow schemas to be hierarchical (note:
totally ignoring all sorts of very important choices which are more than
just details about how this should work). I would like to be able to have
an extension or subsystem exist in a single schema, with its objects broken
up into schemas within the schema. Same reason as most languages have
hierarchical module systems.

On Thu, 3 Jun 2021 at 14:25, Mark Dilger 
wrote:

>
>
> > On Jun 3, 2021, at 9:38 AM, Pavel Stehule 
> wrote:
> >
> > This design looks good for extensions, but I am not sure if it is good
> for users. Some declarative way, without the necessity of programming or
> installing some extension, would be nice.
>
> I agree, though "some declarative way" is a bit vague.  I've had ideas
> that perhaps superusers should be able to further restrict the [min,max]
> fields of int and real GUCs.  Since -1 is sometimes used to mean
> "disabled", syntax to allow specifying a set might be necessary, something
> like [-1, 60..600].  For text and enum GUCs, perhaps a set of regexps would
> work, some being required to match and others being required not to match,
> such as:
>
> search_path !~ '\mcustomerx\M'
> search_path ~ '^pg_catalog,'
>
> If we did something like this, we'd need it to play nicely with other
> filters provided by extensions, because I'm reasonably sure not all filters
> could be done merely using set notation and regular expression syntax.  In
> fact, I find it hard to convince myself that set notation and regular
> expression syntax would even be useful in a large enough number of cases to
> be worth implementing.  What are your thoughts on that?
>
> —
> Mark Dilger
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
>
>
>
>
>


Re: CALL versus procedures with output-only arguments

2021-06-03 Thread Tom Lane
I wrote:
> Hmm, actually we could make step 2 a shade tighter: if a candidate
> routine is a function, match against proargtypes.  If it's a procedure,
> match against coalesce(proallargtypes, proargtypes).  If we find
> multiple matches, raise ambiguity error.

Where do we stand on this topic?

I'm willing to have a go at implementing things that way, but
time's a-wasting.

regards, tom lane




Re: security_definer_search_path GUC

2021-06-03 Thread Mark Dilger



> On Jun 3, 2021, at 9:38 AM, Pavel Stehule  wrote:
> 
> This design looks good for extensions, but I am not sure if it is good for
> users. Some declarative way, without the necessity of programming or
> installing some extension, would be nice.

I agree, though "some declarative way" is a bit vague.  I've had ideas that 
perhaps superusers should be able to further restrict the [min,max] fields of 
int and real GUCs.  Since -1 is sometimes used to mean "disabled", syntax to 
allow specifying a set might be necessary, something like [-1, 60..600].  For 
text and enum GUCs, perhaps a set of regexps would work, some being required to 
match and others being required not to match, such as:

search_path !~ '\mcustomerx\M'
search_path ~ '^pg_catalog,'

If we did something like this, we'd need it to play nicely with other filters 
provided by extensions, because I'm reasonably sure not all filters could be 
done merely using set notation and regular expression syntax.  In fact, I find 
it hard to convince myself that set notation and regular expression syntax 
would even be useful in a large enough number of cases to be worth 
implementing.  What are your thoughts on that?
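
To make the chaining idea concrete, an additional validator would
presumably have the same shape as the built-in hook - a sketch, with the
set/regexp machinery elided:

  static bool
  check_search_path_extra(char **newval, void **extra, GucSource source)
  {
      /* e.g. insist that pg_catalog is listed first */
      if (*newval == NULL || strncmp(*newval, "pg_catalog", 10) != 0)
          return false;
      return true;
  }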

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: SSL SNI

2021-06-03 Thread Tom Lane
I wrote:
> It looks like the immediate problem can be resolved by just adding
> a check for conn->pghost not being NULL,

... scratch that.  There's another problem here, which is that this
code should not be looking at conn->pghost AT ALL.  That will do the
wrong thing with a multi-element host list.  The right thing to be
looking at is conn->connhost[conn->whichhost].host --- with a test
to make sure it's not NULL or an empty string.  (I didn't stop to
study this code close enough to see if it'll ignore an empty
string without help.)
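
Something like this, I'd imagine (untested sketch; host_is_ip_literal
stands in for whatever literal-IP test we settle on):

	const char *host = conn->connhost[conn->whichhost].host;

	if (host && host[0] != '\0' && !host_is_ip_literal(host))
		SSL_set_tlsext_host_name(conn->ssl, host);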

regards, tom lane




Re: PATCH: generate fractional cheapest paths in generate_orderedappend_path

2021-06-03 Thread Tomas Vondra
Hi,

On 6/3/21 7:17 PM, Arne Roland wrote:
> Hi,
> 
> 
> I haven't tested the parallel case, but I think we should sort out (3)
> get_cheapest_fractional_path_for_pathkeys as mentioned above.
> 

Not sure what you refer to by "above" - it's probably better to reply
in-line to the existing message, which makes it much clearer.

> 
> I am lost about the comment regarding startup_new_fractional. Could you
> elaborate what you mean by that?
> 

Not sure what this refers to either - there's no startup_new_fractional
in my message and 'git grep startup_new_fractional' returns nothing.

> 
> Apart from that, I'd argue for a small test case. I attached a slimmed
> down case of what we were trying to fix. It might be worth integrating
> that with an existing test, since more than a third of the time seems to
> be consumed by the creation and attachment of partitions.
> 

Maybe, if there's a suitable table to reuse, we can do that. But I don't
think it matters that it takes ~1/3 of the time to attach the partitions.
What's more important is whether it measurably slows down the test
suite, and I don't think that's an issue.

In any case, this seems a bit premature - we need something to test the
patch etc. We can worry about how expensive the test is much later.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: BUG #16079: Question Regarding the BUG #16064

2021-06-03 Thread Jeff Davis
On Mon, 2020-12-21 at 13:44 -0500, Tom Lane wrote:
> Hm.  I'm less concerned about that scenario than about somebody
> snooping
> the on-the-wire traffic.  If we're going to invent a connection
> setting
> for this, I'd say that in addition to "ok to send cleartext password"
> and "never ok to send cleartext password", there should be a setting
> for
> "send cleartext password only if connection is encrypted".  Possibly
> that should even be the default.

There was a fair amount of related discussion here:


https://www.postgresql.org/message-id/227015d8417f2b4fef03f8966dbfa5cbcc4f44da.camel%40j-davis.com

My feeling after all of that discussion is that the next step would be
to move to some kind of negotiation between client and server about
which methods are mutually acceptable. Right now, the protocol is
structured around the server driving the authentication process, and
the most the client can do is abort.

> BTW, do we have a client-side setting to insist that passwords not be
> sent in MD5 hashing either?  A person who is paranoid about this
> would
> likely want to disable that code path as well.

channel_binding=require is one way to do it, but it also requires ssl.
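
For example (host name invented):

  psql 'host=db.example.com dbname=app sslmode=verify-full channel_binding=require'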

Regards,
Jeff Davis






Re: SSL SNI

2021-06-03 Thread Tom Lane
I wrote:
> Jacob Champion  writes:
>> It looks like this code needs some guards for a NULL conn->pghost. For 
>> example when running
>> psql 'dbname=postgres sslmode=require hostaddr=127.0.0.1'
>> with no PGHOST in the environment, psql is currently segfaulting for
>> me.

> Duplicated here:

It looks like the immediate problem can be resolved by just adding
a check for conn->pghost not being NULL, since the comment above
says

 * Per RFC 6066, do not set it if the host is a literal IP address (IPv4
 * or IPv6).

and having only hostaddr certainly fits that case.  But I didn't
check to see if any more problems arise later.

regards, tom lane




Re: Support for NSS as a libpq TLS backend

2021-06-03 Thread Daniel Gustafsson
> On 3 Jun 2021, at 19:37, Jeff Davis  wrote:
> 
> On Tue, 2020-10-27 at 23:39 -0700, Andres Freund wrote:
>> Maybe we should just have --with-ssl={openssl,nss}? That'd avoid
>> needing
>> to check for errors.
> 
> [ apologies for the late reply ]
> 
> Would it be more proper to call it --with-tls={openssl,nss} ?

Well, we use SSL for everything else (GUCs, connection params and env vars etc)
so I think --with-ssl is sensible.

However, SSL and TLS are used quite interchangeably these days so I think it
makes sense to provide --with-tls as an alias.

--
Daniel Gustafsson   https://vmware.com/





Re: SSL SNI

2021-06-03 Thread Tom Lane
Jacob Champion  writes:
> It looks like this code needs some guards for a NULL conn->pghost. For 
> example when running
> psql 'dbname=postgres sslmode=require hostaddr=127.0.0.1'
> with no PGHOST in the environment, psql is currently segfaulting for
> me.

Duplicated here:

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x7f3adec47ec3 in __strspn_sse42 () from /lib64/libc.so.6
(gdb) bt
#0  0x7f3adec47ec3 in __strspn_sse42 () from /lib64/libc.so.6
#1  0x7f3adf6b7026 in initialize_SSL (conn=0xed4160)
at fe-secure-openssl.c:1090
#2  0x7f3adf6b8755 in pgtls_open_client (conn=conn@entry=0xed4160)
at fe-secure-openssl.c:132
#3  0x7f3adf6b3955 in pqsecure_open_client (conn=conn@entry=0xed4160)
at fe-secure.c:180
#4  0x7f3adf6a4808 in PQconnectPoll (conn=conn@entry=0xed4160)
at fe-connect.c:3102
#5  0x7f3adf6a5b31 in connectDBComplete (conn=conn@entry=0xed4160)
at fe-connect.c:2219
#6  0x7f3adf6a8968 in PQconnectdbParams (keywords=keywords@entry=0xed40c0, 
values=values@entry=0xed4110, expand_dbname=expand_dbname@entry=1)
at fe-connect.c:669
#7  0x00404db2 in main (argc=, argv=0x7ffc58477208)
at startup.c:266

You don't seem to need the "sslmode=require" either, just an
SSL-enabled server.

regards, tom lane




Re: Support for NSS as a libpq TLS backend

2021-06-03 Thread Jeff Davis
On Tue, 2020-10-27 at 23:39 -0700, Andres Freund wrote:
> Maybe we should just have --with-ssl={openssl,nss}? That'd avoid
> needing
> to check for errors.

[ apologies for the late reply ]

Would it be more proper to call it --with-tls={openssl,nss} ?

Regards,
Jeff Davis






Re: SSL SNI

2021-06-03 Thread Jacob Champion
On Wed, 2021-04-07 at 15:32 +0200, Peter Eisentraut wrote:
> Committed like that.  (Default to on, but it's easy to change if there 
> are any further thoughts.)

Hi Peter,

It looks like this code needs some guards for a NULL conn->pghost. For example 
when running

psql 'dbname=postgres sslmode=require hostaddr=127.0.0.1'
with no PGHOST in the environment, psql is currently segfaulting for
me.

--Jacob


Re: PATCH: generate fractional cheapest paths in generate_orderedappend_path

2021-06-03 Thread Arne Roland
Hi,


I haven't tested the parallel case, but I think we should sort out (3) 
get_cheapest_fractional_path_for_pathkeys as mentioned above.


I am lost about the comment regarding startup_new_fractional. Could you 
elaborate what you mean by that?


Apart from that, I'd argue for a small test case. I attached a slimmed down 
case of what we were trying to fix. It might be worth integrating that with an
existing test, since more than a third of the time seems to be consumed by the 
creation and attachment of partitions.


Regards
Arne



merge-fraction-cheapest_example.sql
Description: merge-fraction-cheapest_example.sql


Re: missing GRANT on pg_subscription columns

2021-06-03 Thread Tom Lane
"Euler Taveira"  writes:
> I was checking the GRANT on pg_subscription and noticed that the command is 
> not
> correct. There is a comment that says "All columns of pg_subscription except
> subconninfo are readable". However, there are columns that aren't included: 
> oid
> and subsynccommit. It seems an oversight in the commits 6f236e1eb8c and
> 887227a1cc8.

Ugh.

> There are monitoring tools and data collectors that aren't using a
> superuser to read catalog information (I usually recommend using pg_monitor).
> Hence, you cannot join pg_subscription with relations such as
> pg_subscription_rel or pg_stat_subscription because column oid has no
> column-level privilege. I'm attaching a patch to fix it (indeed, 2 patches
> because of additional columns for v14). We should add instructions in the 
> minor
> version release notes too.

I agree with fixing this in HEAD.  But given that this has been wrong
since v10 with zero previous complaints, I doubt that it is worth the
complication of trying to do something about it in the back branches.
Maybe we could just adjust the docs there, instead.
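
(For reference, the statement at issue is a column-level grant along these
lines - v13 column list, with subconninfo deliberately left out; the actual
patch may differ:

  GRANT SELECT (oid, subdbid, subname, subowner, subenabled,
                subslotname, subsynccommit, subpublications)
      ON pg_subscription TO public;
)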

regards, tom lane




Re: security_definer_search_path GUC

2021-06-03 Thread Pavel Stehule
On Thu, Jun 3, 2021 at 18:30 Mark Dilger 
wrote:

>
>
> > On Jun 3, 2021, at 9:03 AM, Pavel Stehule 
> wrote:
> >
> > I agree that some possibility of locking search_path, or of controlling
> who can change it and when, can increase security. This should be a
> core feature. It's maybe a more generic issue - the same functionality can be
> required for the work_mem setting, maybe max_parallel_workers_per_gather, and
> other GUCs
>
> Chapman already suggested a mechanism in [1] to allow chaining together
> additional validators for GUCs.
>
> When setting search_path, the check_search_path(char **newval, void
> **extra, GucSource source) function is invoked.  As I understand Chapman's
> proposal, additional validators could be added to any GUC.  You could
> implement search_path restrictions by defining additional validators that
> enforce whatever restriction you like.
>

This design looks good for extensions, but I am not sure if it is good for
users. Some declarative way, without the necessity of programming or
installing some extension, would be nice.

Pavel


> Marko, does his idea sound workable for your needs?  I understood your
> original proposal as only restricting the value of search_path within
> security definer functions.  This idea would allow you to restrict it
> everywhere, and not tailored to just that context.
>
> [1]
> https://www.postgresql.org/message-id/608c9a81.3020...@anastigmatix.net
>
> —
> Mark Dilger
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
>
>
>


Re: Decoding of two-phase xacts missing from CREATE_REPLICATION_SLOT command

2021-06-03 Thread Jeff Davis
On Thu, 2021-06-03 at 09:29 +0530, Amit Kapila wrote:
> The idea is to support two_phase via protocol with a subscriber-side
> work where we can test it as well. The code to support it via
> replication protocol is present in the first patch of subscriber-side
> work [1] which uses that code as well. Basically, we don't have a
> good
> way to test it without subscriber-side work so decided to postpone it
> till the corresponding work is done.

Thank you for clarifying.

Right now, it feels a bit incomplete. If it's not much work, I
recommend breaking out the CREATE_REPLICATION_SLOT changes and updating
pg_recvlogical, so that it can go in v14 (and
pg_create_logical_replication_slot() will match
CREATE_REPLICATION_SLOT). But if that's complicated or controversial,
then I'm fine waiting for the other work to complete.

Regards,
Jeff Davis






Re: security_definer_search_path GUC

2021-06-03 Thread Marko Tiikkaja
On Thu, Jun 3, 2021 at 7:30 PM Mark Dilger 
wrote:

> > On Jun 3, 2021, at 9:03 AM, Pavel Stehule 
> wrote:
> >
> > I agree so some possibility of locking search_path or possibility to
> control who and when can change it can increase security. This should be a
> core feature. It's maybe more generic issue - same functionality can be
> required for work_mem setting, maybe max_paralel_workers_per_gather, and
> other GUC
>
> Chapman already suggested a mechanism in [1] to allow chaining together
> additional validators for GUCs.
>
> When setting search_path, the check_search_path(char **newval, void
> **extra, GucSource source) function is invoked.  As I understand Chapman's
> proposal, additional validators could be added to any GUC.  You could
> implement search_path restrictions by defining additional validators that
> enforce whatever restriction you like.
>
> Marko, does his idea sound workable for your needs?  I understood your
> original proposal as only restricting the value of search_path within
> security definer functions.  This idea would allow you to restrict it
> everywhere, and not tailored to just that context.
>

Yeah, that would work for my use case just as well.


.m


Re: security_definer_search_path GUC

2021-06-03 Thread Mark Dilger



> On Jun 3, 2021, at 9:03 AM, Pavel Stehule  wrote:
> 
> I agree that some possibility of locking search_path, or of controlling who
> can change it and when, can increase security. This should be a core
> feature. It's maybe a more generic issue - the same functionality can be
> required for the work_mem setting, maybe max_parallel_workers_per_gather,
> and other GUCs

Chapman already suggested a mechanism in [1] to allow chaining together 
additional validators for GUCs.

When setting search_path, the check_search_path(char **newval, void **extra, 
GucSource source) function is invoked.  As I understand Chapman's proposal, 
additional validators could be added to any GUC.  You could implement 
search_path restrictions by defining additional validators that enforce 
whatever restriction you like.

Marko, does his idea sound workable for your needs?  I understood your original 
proposal as only restricting the value of search_path within security definer 
functions.  This idea would allow you to restrict it everywhere, and not 
tailored to just that context.

[1] https://www.postgresql.org/message-id/608c9a81.3020...@anastigmatix.net

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: Move pg_attribute.attcompression to earlier in struct for reduced size?

2021-06-03 Thread Alvaro Herrera
On 2021-Jun-02, Michael Paquier wrote:

> After 12 runs of VACUUM FULL on my laptop, I have removed the two
> highest and the two lowest to remove some noise, and did an average of
> the rest:
> - HEAD, 100 text columns: 5720ms
> - REL_13_STABLE, 100 text columns: 4308ms
> - HEAD, 200 text columns: 10020ms
> - REL_13_STABLE, 200 text columns:  8319ms
> 
> So yes, that looks much visible to me, and an argument in favor of the
> removal of the forced recompression on HEAD when rewriting tuples.

Just to be clear -- that's the time to vacuum the table without changing
the compression algorithm, right?  So the overhead is just the check for
whether the recompression is needed, not the recompression itself?

If the check for recompression is this expensive, then yeah I agree that
we should take it out.  If recompression is actually occurring, then I
don't think this is a good test :-)

-- 
Álvaro Herrera   Valdivia, Chile




Re: security_definer_search_path GUC

2021-06-03 Thread Pavel Stehule
On Thu, Jun 3, 2021 at 17:54 Marko Tiikkaja  wrote:

> On Thu, Jun 3, 2021 at 9:14 AM Joel Jacobson  wrote:
>
>> On Thu, Jun 3, 2021, at 00:55, Marko Tiikkaja wrote:
>>
>> They still show up everywhere when looking at "public".  So this is only
>> slightly better, and a maintenance burden.
>>
>>
>> Good point. I find this annoying as well sometimes.
>>
>> It's easy to get a list of all objects for an extension, via \dx+
>>
>> But it's hard to see what objects in a schema, that are provided by
>> different extensions, via e.g. \df public.*
>>
>> What about adding a new "Extension" column next to "Schema" to the
>> relevant commands, such as \df?
>>
>
> That's just one part of it.  The other part of my original proposal was to
> avoid having to  SET search_path  for all SECURITY DEFINER functions.  I
> still think either being able to lock search_path or the separate prosecdef
> search_path is the best option here.
>

I agree that some possibility of locking search_path, or of controlling
who can change it and when, can increase security. This should be a
core feature. It's maybe a more generic issue - the same functionality can be
required for the work_mem setting, maybe max_parallel_workers_per_gather, and
other GUCs.

Regards

Pavel

>
>
> .m
>


Re: security_definer_search_path GUC

2021-06-03 Thread Marko Tiikkaja
On Thu, Jun 3, 2021 at 9:14 AM Joel Jacobson  wrote:

> On Thu, Jun 3, 2021, at 00:55, Marko Tiikkaja wrote:
>
> They still show up everywhere when looking at "public".  So this is only
> slightly better, and a maintenance burden.
>
>
> Good point. I find this annoying as well sometimes.
>
> It's easy to get a list of all objects for an extension, via \dx+
>
> But it's hard to see what objects in a schema, that are provided by
> different extensions, via e.g. \df public.*
>
> What about adding a new "Extension" column next to "Schema" to the
> relevant commands, such as \df?
>

That's just one part of it.  The other part of my original proposal was to
avoid having to  SET search_path  for all SECURITY DEFINER functions.  I
still think either being able to lock search_path or the separate prosecdef
search_path is the best option here.


.m


Re: security_definer_search_path GUC

2021-06-03 Thread Pavel Stehule
On Thu, Jun 3, 2021 at 8:14 Joel Jacobson  wrote:

> On Thu, Jun 3, 2021, at 00:55, Marko Tiikkaja wrote:
>
> On Wed, Jun 2, 2021 at 11:32 PM Joel Jacobson  wrote:
>
> But if running a recent PostgreSQL version, with support for extensions, I
> think an even cleaner solution
> would be to package such compatibility versions in a "compat" extension,
> that would just install them into the public schema.
>
>
> Writing, verifying and shipping extension upgrade scripts is not pleasant.
>
>
> I agree. Thanks for acknowledging this problem.
>
> I'm experimenting with an idea that I hope can simplify the "verifying"
> part of the problem.
> I hope to have something to show you all soon.
>
> I'd much prefer something that's integrated to the workflow I already have.
>
>
> Fair point. I guess also the initial switching cost of changing workflow
> is quite high and difficult to motivate. So even if extensions ergonomics
> are improved, many existing users will not migrate their workflows anyway
> due to this.
>
>
>
> And if you wonder what functions in public come from the compat extension,
> you can use use \dx+.
>
>
> They still show up everywhere when looking at "public".  So this is only
> slightly better, and a maintenance burden.
>
>
> Good point. I find this annoying as well sometimes.
>
> It's easy to get a list of all objects for an extension, via \dx+
>
> But it's hard to see what objects in a schema, that are provided by
> different extensions, via e.g. \df public.*
>
> What about adding a new "Extension" column next to "Schema" to the
> relevant commands, such as \df?
>

I think that for \df+ it can be very useful. I don't think it is important
enough for the short form, but it could be nice in the enhanced form.

Pavel


> /Joel
>


Re: speed up verifying UTF-8

2021-06-03 Thread John Naylor
I wrote:

> On Thu, Jun 3, 2021 at 10:42 AM Greg Stark  wrote:
> >

> > If
> > we're processing much more than 128 bits and happy to detect NUL
> > errors only at the end after wasting some work then you could hoist
> > that has_zero check entirely out of the loop (removing the branch
> > though it's probably a correctly predicted branch anyways).
> >
> > Do something like:
> >
> > zero_accumulator = zero_accumulator & next_chunk
> >
> > in the loop and then only at the very end check for zeros in that.
>
> That's the approach taken in the SSE4 patch, and in fact that's the
> logical way to do it there. I hadn't considered doing it that way in the
> pure C case, but I think it's worth trying.

Actually, I spoke too quickly. We can't have an error accumulator in the C
case because we need to return how many bytes were valid. In fact, in the
SSE case, it checks the error vector at the end and then reruns with the
fallback case to count the valid bytes.
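
For illustration, a minimal sketch of that two-pass pattern (mine, with
hypothetical fast_verify/slow_verify helpers standing in for the patch's
actual functions): the fast path only reports pass/fail for the whole
buffer, and on failure the byte-wise fallback is rerun to count the
valid prefix.

#include <stdbool.h>

/* hypothetical: whole-buffer check, true iff every byte is valid */
extern bool fast_verify(const unsigned char *s, int len);
/* hypothetical: byte-wise check, returns length of the valid prefix */
extern int	slow_verify(const unsigned char *s, int len);

static int
verifystr_sketch(const unsigned char *s, int len)
{
	if (fast_verify(s, len))
		return len;				/* whole buffer valid; nothing to count */

	/* Failure: rerun the fallback to count the valid leading bytes. */
	return slow_verify(s, len);
}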

--
John Naylor
EDB: http://www.enterprisedb.com


Re: speed up verifying UTF-8

2021-06-03 Thread John Naylor
On Thu, Jun 3, 2021 at 10:42 AM Greg Stark  wrote:
>
> I haven't looked at the surrounding code. Are we processing all the
> COPY data in one long stream or processing each field individually?

It happens on 64kB chunks.

> If
> we're processing much more than 128 bits and happy to detect NUL
> errors only at the end after wasting some work then you could hoist
> that has_zero check entirely out of the loop (removing the branch
> though it's probably a correctly predicted branch anyways).
>
> Do something like:
>
> zero_accumulator = zero_accumulator & next_chunk
>
> in the loop and then only at the very end check for zeros in that.

That's the approach taken in the SSE4 patch, and in fact that's the logical
way to do it there. I hadn't considered doing it that way in the pure C
case, but I think it's worth trying.

--
John Naylor
EDB: http://www.enterprisedb.com


Re: Asynchronous Append on postgres_fdw nodes.

2021-06-03 Thread Andrey Lepikhov

On 3/6/21 14:49, Etsuro Fujita wrote:

On Tue, May 11, 2021 at 6:55 PM Etsuro Fujita  wrote:

On Tue, May 11, 2021 at 6:27 PM Andrey Lepikhov
 wrote:

On 11/5/21 12:24, Etsuro Fujita wrote:



  ->  Append (actual rows=3000 loops=1)
        ->  Async Foreign Scan on f1 (actual rows=0 loops=1)
        ->  Async Foreign Scan on f2 (actual rows=0 loops=1)
        ->  Foreign Scan on f3 (actual rows=3000 loops=1)

Here we give preference to the synchronous scan. Why?


This would be expected behavior, and the reason is to avoid performance
degradation; you might think it would be better to execute the async
Foreign Scan nodes more aggressively, but that would require
waiting/polling for file descriptor events many times, which is
expensive and might cause performance degradation.  I think there is
room for improvement, though.

Yes, I agree with you. Maybe you can add a note in the documentation on
async_capable, for example:
"... Synchronous and asynchronous scanning strategies can be mixed by
the optimizer in one scan plan of a partitioned table or a 'UNION ALL'
command. For performance reasons, synchronous scans execute before the
first asynchronous scan. ..."


+1  But I think this is an independent issue, so it would be better to
address it separately.


I think that since postgres-fdw.sgml would be for users rather than
developers, unlike fdwhandler.sgml, it would be better to explain this
more in a not-too-technical way.  So how about something like this?

Asynchronous execution is applied even when an Append node contains
subplan(s) executed synchronously as well as subplan(s) executed
asynchronously.  In that case, if the asynchronous subplans are ones
executed using postgres_fdw, tuples from the asynchronous subplans are
not returned until after at least one synchronous subplan returns all
tuples, as that subplan is executed while the asynchronous subplans
are waiting for the results of queries sent to foreign servers.  This
behavior might change in a future release.

Good, this text is clear to me.

--
regards,
Andrey Lepikhov
Postgres Professional




Re: Multi-Column List Partitioning

2021-06-03 Thread Nitin Jadhav
> Approach 1 sounds better.  It sounds like approach 1 might help us
> implement support for allowing NULLs in range partition bounds in the
> future, if at all.  For now, it might be better to not allocate the
> isnull array except for list partitioning.

Thanks for confirming.

> I'll wait for you to post a new patch addressing at least the comments
> in my earlier email.  Also, please make sure to run `make check`
> successfully before posting the patch. :)

I have fixed all of the review comments given by you and Jeevan in the
attached patch, which also contains more changes compared to the
previous one. The implementation details follow.

1. Regarding syntax, the existing syntax will work fine for
single-column list partitioning. However, I have used the new syntax
for multi-column list partitioning, as we discussed earlier. I have
used a combination of 'AND' and 'OR' logic for the partition
constraints, as shown in the example below.

postgres@17503=#create table t(a int, b text) partition by list(a,b);
CREATE TABLE
postgres@17503=#create table t1 partition of t for values in ((1,'a'),
(NULL,'b'));
CREATE TABLE
postgres@17503=#\d+ t
                                   Partitioned table "public.t"
 Column |  Type   | Collation | Nullable | Default | Storage  | Compression | Stats target | Description
--------+---------+-----------+----------+---------+----------+-------------+--------------+-------------
 a      | integer |           |          |         | plain    |             |              |
 b      | text    |           |          |         | extended |             |              |
Partition key: LIST (a, b)
Partitions: t1 FOR VALUES IN ((1, 'a'), (NULL, 'b'))

postgres@17503=#\d+ t1
                                         Table "public.t1"
 Column |  Type   | Collation | Nullable | Default | Storage  | Compression | Stats target | Description
--------+---------+-----------+----------+---------+----------+-------------+--------------+-------------
 a      | integer |           |          |         | plain    |             |              |
 b      | text    |           |          |         | extended |             |              |
Partition of: t FOR VALUES IN ((1, 'a'), (NULL, 'b'))
Partition constraint: (((a = 1) AND (b = 'a'::text)) OR ((a IS NULL) AND (b = 'b'::text)))
Access method: heap

2. In the existing code, NULL values were handled differently: they were
not added to the 'datums' array; instead, the partition index was stored
directly in the 'null_index' field. Now there is a possibility of
multiple NULL values, hence I introduced a new member 'isnulls' in the
'PartitionBoundInfoData' struct which indicates whether the
corresponding element of 'datums' is NULL (see the sketch after this
list). 'null_index' can no longer be used to store the partition index
directly, so I removed it and made the necessary changes in multiple
places.

3. I have added test cases for 'create table' and 'insert' statements
related to multi-column list partitioning and these are working fine
with 'make check'.

4. I have updated the partition pruning code to accommodate these
changes for single-column list partitioning. However, it is pending for
multi-column list partitioning.

5. I have made the necessary changes in the partition-wise join code to
accommodate single-column list partitioning. However, it is pending for
multi-column list partitioning.
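
To illustrate item 2, here is a simplified sketch (mine, not the actual
partbounds.h definition; field names other than 'datums' and 'isnulls'
are illustrative) of how a parallel isnulls array can mark NULL bound
values once null_index is gone:

#include <stdbool.h>
#include <stdint.h>

typedef uintptr_t Datum;		/* stand-in for PostgreSQL's Datum */

/* Simplified sketch, not the real PartitionBoundInfoData layout. */
typedef struct PartitionBoundInfoSketch
{
	int			ndatums;		/* number of bound tuples */
	int			partnatts;		/* columns in the partition key */
	Datum	  **datums;			/* ndatums x partnatts bound values */
	bool	  **isnulls;		/* isnulls[i][j] is true when datums[i][j]
								 * represents a NULL bound value */
	/* index fields omitted for brevity */
} PartitionBoundInfoSketch;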

Kindly review the patch and let me know if any changes are required.

Pending items:
1. Support of partition pruning for multi-column list partitioning.
2. Support of partition wise join for multi-column list partitioning.

I will continue to work on the above 2 items.
Kindly let me know if I am missing something.

Thanks & Regards,
Nitin Jadhav


On Wed, May 26, 2021 at 10:27 AM Amit Langote  wrote:
>
> On Sun, May 23, 2021 at 6:49 PM Nitin Jadhav
>  wrote:
> > > IMO, it is not such a bad syntax from a user's PoV.  It's not hard to
> > > understand from this syntax that the partition constraint is something
> > > like (a, b) = (1, 2) OR (a, b) = (1, 5) OR ..., where the = performs
> > > row-wise comparison.
> >
> > Thanks for suggesting to use row-wise comparison.
>
> Actually, I was just describing how the *users* may want to visualize
> the partition constraint...
>
> > I have few queries
> > with respect to handling of NULL values.
> >
> > 1. What should be the partition constraint for the above case. AFAIK,
> > row-wise comparison won't work with NULL values as shown in [1]. I mean
> > two rows are considered equal if all their corresponding members are
> > non-null and equal. The rows are unequal if any corresponding members
> > are non-null and unequal. Otherwise the result of the row comparison
> > is unknown (null). So we should generate different types of
> > constraints for NULL values.
> >
> > Ex:
> > CREATE TABLE t(a int, b int) PARTITION BY LIST(a,b);
> > CREATE TABLE t_1 PARTITION OF t FOR VALUES IN (1, 1), (1, NULL),
> > (NULL, 1), (NULL, NULL);
> >
> > As per my 

Re: speed up verifying UTF-8

2021-06-03 Thread Greg Stark
I haven't looked at the surrounding code. Are we processing all the
COPY data in one long stream or processing each field individually? If
we're processing much more than 128 bits and happy to detect NUL
errors only at the end after wasting some work then you could hoist
that has_zero check entirely out of the loop (removing the branch
though it's probably a correctly predicted branch anyways).

Do something like:

zero_accumulator = zero_accumulator & next_chunk

in the loop and then only at the very end check for zeros in that.
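
A rough sketch of that hoisting (mine, assuming 8-byte chunks; UINT64_C
is the standard-C spelling of PostgreSQL's UINT64CONST): ANDing every
chunk into an accumulator cannot miss a real zero byte, and a false
positive only costs one extra trip through the slow path.

#include <stdint.h>
#include <string.h>

#define HAS_ZERO(chunk) \
	(((chunk) - UINT64_C(0x0101010101010101)) & \
	 ~(chunk) & UINT64_C(0x8080808080808080))

/* Returns nonzero if some chunk *may* contain a zero byte. */
static int
chunks_may_have_zero(const unsigned char *s, size_t len)
{
	uint64_t	accum = ~UINT64_C(0);
	uint64_t	chunk;

	for (; len >= sizeof(uint64_t); len -= sizeof(uint64_t), s += sizeof(uint64_t))
	{
		memcpy(&chunk, s, sizeof(uint64_t));
		accum &= chunk;			/* no zero-byte branch inside the loop */
	}

	/*
	 * A zero byte in any chunk forces the corresponding accumulator byte
	 * to zero, so nothing is missed; unrelated bytes ANDing to zero only
	 * send the caller to the slow path, which rechecks byte by byte.
	 */
	return HAS_ZERO(accum) != 0;
}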




Re: speed up verifying UTF-8

2021-06-03 Thread Greg Stark
> 3. It's probably cheaper to perform the HAS_ZERO check just once on
> (half1 | half2). We have to compute (half1 | half2) anyway.

Wouldn't you have to check (half1 & half2)?
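
A tiny worked example (mine, reusing the patch's HAS_ZERO macro with the
standard-C UINT64_C spelling) shows why: OR can mask a zero byte that
appears in only one half, while AND preserves it, at the cost of
possible false positives elsewhere.

#include <assert.h>
#include <stdint.h>

#define HAS_ZERO(chunk) \
	(((chunk) - UINT64_C(0x0101010101010101)) & \
	 ~(chunk) & UINT64_C(0x8080808080808080))

int
main(void)
{
	uint64_t	half1 = UINT64_C(0x1111111100111111);	/* one zero byte */
	uint64_t	half2 = UINT64_C(0x1111111122111111);	/* no zero bytes */

	assert(HAS_ZERO(half1));			/* the zero byte is there... */
	assert(!HAS_ZERO(half1 | half2));	/* ...but OR hides it */
	assert(HAS_ZERO(half1 & half2));	/* AND keeps it visible */
	return 0;
}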




Re: pg_upgrade is failed for 'plpgsql_call_handler' handler

2021-06-03 Thread Daniel Gustafsson
> On 3 Jun 2021, at 16:12, Tom Lane  wrote:
> 
> Daniel Gustafsson  writes:
>>> On 3 Jun 2021, at 11:53, tushar  wrote:
>>> In one of my testing scenarios, I found that pg_upgrade fails for the
>>> 'plpgsql_call_handler' handler
> 
>> This is intentional since the language template work in 8.1; before
>> that, pg_dump would look up support functions in pg_catalog.
> 
> I don't see any particular need to support reaching inside the guts
> of another PL language implementation, as this test case does.
> We'd be perfectly within our rights to rename plpgsql_call_handler
> as something else; that should be nobody's business except that
> of the plpgsql extension.
> 
> But yeah, the behavior you're seeing here is intended to support
> normally-packaged languages.  pg_dump won't ordinarily dump objects
> in pg_catalog, because it assumes stuff in pg_catalog is to
> be treated as built-in.

Agreed, I don't think there is anything we could/should do here (the lack
of complaints in the archives backs that up).

--
Daniel Gustafsson   https://vmware.com/





Re: pg_upgrade is failed for 'plpgsql_call_handler' handler

2021-06-03 Thread Tom Lane
Daniel Gustafsson  writes:
>> On 3 Jun 2021, at 11:53, tushar  wrote:
>> In one of my testing scenarios, I found that pg_upgrade fails for the
>> 'plpgsql_call_handler' handler

> This is intentional since the language template work in 8.1; before
> that, pg_dump would look up support functions in pg_catalog.

I don't see any particular need to support reaching inside the guts
of another PL language implementation, as this test case does.
We'd be perfectly within our rights to rename plpgsql_call_handler
as something else; that should be nobody's business except that
of the plpgsql extension.

But yeah, the behavior you're seeing here is intended to support
normally-packaged languages.  pg_dump won't ordinarily dump objects
in pg_catalog, because it assumes stuff in pg_catalog is to
be treated as built-in.

regards, tom lane




Re: Failures with gcd functions with GCC snapshots GCC and -O3 (?)

2021-06-03 Thread Tom Lane
Michael Paquier  writes:
> serinus has been complaining about the new gcd functions in 13~:

moonjelly, which also runs a bleeding-edge gcc, started to fail the same
way at about the same time.  Given that none of our code in that area
has changed, it's hard to think it's anything but a broken compiler.
Maybe somebody should report that to gcc upstream?

regards, tom lane




missing GRANT on pg_subscription columns

2021-06-03 Thread Euler Taveira
Hi,

I was checking the GRANT on pg_subscription and noticed that the command is not
correct. There is a comment that says "All columns of pg_subscription except
subconninfo are readable". However, there are columns that aren't included: oid
and subsynccommit. It seems to be an oversight in commits 6f236e1eb8c and
887227a1cc8.

There are monitoring tools and data collectors that aren't using a
superuser to read catalog information (I usually recommend using pg_monitor).
Hence, you cannot join pg_subscription with relations such as
pg_subscription_rel or pg_stat_subscription because column oid has no
column-level privilege. I'm attaching a patch to fix it (indeed, 2 patches
because of additional columns for v14). We should add instructions in the minor
version release notes too.

This issue was reported by Israel Barth.


--
Euler Taveira
EDB   https://www.enterprisedb.com/
From 64b138b9974934f689e57fc34d370424b2a348a9 Mon Sep 17 00:00:00 2001
From: Euler Taveira 
Date: Mon, 31 May 2021 19:40:36 -0300
Subject: [PATCH] Grant read privilege to additional pg_subscription columns

pg_subscription should be read by PUBLIC except subconninfo column.
Documentation is correct but the GRANT command is not. Columns oid and
subsynccommit don't have the right privileges. It seems an oversight in
the commits 6f236e1eb8c and 887227a1cc8. The current behavior prohibits
joins between pg_subscription and related tables (pg_subscription_rel
and pg_stat_subscription) for non-superusers.
---
 src/backend/catalog/system_views.sql | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 5c84d758bb..5088e7f1d5 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1254,5 +1254,5 @@ REVOKE ALL ON pg_replication_origin_status FROM public;
 
 -- All columns of pg_subscription except subconninfo are readable.
 REVOKE ALL ON pg_subscription FROM public;
-GRANT SELECT (subdbid, subname, subowner, subenabled, subbinary, substream, subslotname, subpublications)
+GRANT SELECT (oid, subdbid, subname, subowner, subenabled, subbinary, substream, subslotname, subsynccommit, subpublications)
 ON pg_subscription TO public;
-- 
2.20.1

From d54165988c69a021962adaad6e2e31f80f4ad85c Mon Sep 17 00:00:00 2001
From: Euler Taveira 
Date: Tue, 1 Jun 2021 11:17:26 -0300
Subject: [PATCH] Grant read privilege to additional pg_subscription columns

pg_subscription should be read by PUBLIC except subconninfo column.
Documentation is correct but the GRANT command is not. Columns oid and
subsynccommit don't have the right privileges. It seems an oversight in
the commits 6f236e1eb8c and 887227a1cc8. The current behavior prohibits
joins between pg_subscription and related tables (pg_subscription_rel
and pg_stat_subscription) for non-superusers.
---
 src/backend/catalog/system_views.sql | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 51d738cc42..eb363c9ade 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1122,7 +1122,7 @@ REVOKE ALL ON pg_replication_origin_status FROM public;
 
 -- All columns of pg_subscription except subconninfo are readable.
 REVOKE ALL ON pg_subscription FROM public;
-GRANT SELECT (subdbid, subname, subowner, subenabled, subslotname, subpublications)
+GRANT SELECT (oid, subdbid, subname, subowner, subenabled, subslotname, subsynccommit, subpublications)
 ON pg_subscription TO public;
 
 
-- 
2.20.1



Re: speed up verifying UTF-8

2021-06-03 Thread Heikki Linnakangas

On 02/06/2021 19:26, John Naylor wrote:
For v10, I've split the patch up into two parts. 0001 uses pure C 
everywhere. This is much smaller and easier to review, and gets us the 
most bang for the buck.


One concern Heikki raised upthread is that platforms with poor 
unaligned-memory access will see a regression. We could easily add an 
#ifdef to take care of that, but I haven't done so here.


To recap: On ASCII-only input with storage taken out of the picture,
profiles of COPY FROM show a reduction from nearly 10% down to just over
1%. In microbenchmarks found earlier in this thread, this works out to
about 7 times faster. On multibyte/mixed input, 0001 is a bit faster,
but not really enough to make a difference in copy performance.


Nice!

This kind of bit-twiddling is fun, so I couldn't resist tinkering with 
it, to see if we can shave some more instructions from it:



+/* from https://graphics.stanford.edu/~seander/bithacks.html#ZeroInWord */
+#define HAS_ZERO(chunk) ( \
+	((chunk) - UINT64CONST(0x0101010101010101)) & \
+	~(chunk) & \
+	UINT64CONST(0x8080808080808080))
+
+/* Verify a chunk of bytes for valid ASCII including a zero-byte check. */
+static inline int
+check_ascii(const unsigned char *s, int len)
+{
+	uint64		half1,
+				half2,
+				highbits_set;
+
+	if (len >= 2 * sizeof(uint64))
+	{
+		memcpy(&half1, s, sizeof(uint64));
+		memcpy(&half2, s + sizeof(uint64), sizeof(uint64));
+
+		/* If there are zero bytes, bail and let the slow path handle it. */
+		if (HAS_ZERO(half1) || HAS_ZERO(half2))
+			return 0;
+
+		/* Check if any bytes in this chunk have the high bit set. */
+		highbits_set = ((half1 | half2) & UINT64CONST(0x8080808080808080));
+
+		if (!highbits_set)
+			return 2 * sizeof(uint64);
+		else
+			return 0;
+	}
+	else
+		return 0;
+}


Some ideas:

1. Better to check if any high bits are set first. We care more about 
the speed of that than of detecting zero bytes, because input with high 
bits is valid but zeros are an error.


2. Since we check that there are no high bits, we can do the zero-checks 
with fewer instructions like this:


/* NB: this is only correct if 'chunk' doesn't have any high bits set */
#define HAS_ZERO(chunk) ( \
	(((chunk) + \
	  UINT64CONST(0x7f7f7f7f7f7f7f7f)) & \
	 UINT64CONST(0x8080808080808080)) != UINT64CONST(0x8080808080808080))

3. It's probably cheaper to perform the HAS_ZERO check just once on
(half1 | half2). We have to compute (half1 | half2) anyway.



Putting all that together:

/* Verify a chunk of bytes for valid ASCII including a zero-byte check. */
static inline int
check_ascii(const unsigned char *s, int len)
{
	uint64		half1,
				half2,
				highbits_set;
	uint64		x;

	if (len >= 2 * sizeof(uint64))
	{
		memcpy(&half1, s, sizeof(uint64));
		memcpy(&half2, s + sizeof(uint64), sizeof(uint64));

		/* Check if any bytes in this chunk have the high bit set. */
		highbits_set = ((half1 | half2) & UINT64CONST(0x8080808080808080));
		if (highbits_set)
			return 0;

		/*
		 * Check if there are any zero bytes in this chunk.  This is only
		 * correct if there are no high bits set, but we checked that
		 * already.
		 */
		x = (half1 | half2) + UINT64CONST(0x7f7f7f7f7f7f7f7f);
		x &= UINT64CONST(0x8080808080808080);
		if (x != UINT64CONST(0x8080808080808080))
			return 0;

		return 2 * sizeof(uint64);
	}
	else
		return 0;
}

In quick testing, that indeed compiles into fewer instructions. With
GCC, there's no measurable difference in performance. But with clang,
this version is much faster than the original, because the original
version is much slower than when compiled with GCC. In other words, this
version seems to avoid some clang misoptimization. I tested only with
ASCII input; I haven't tried other cases.
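
As a usage sketch (mine, assuming the check_ascii() above is in scope),
a caller would typically consume all-ASCII 16-byte strides and fall back
to per-byte multibyte validation as soon as check_ascii() returns 0:

/* Returns how many leading bytes are known to be plain ASCII. */
static int
ascii_prefix_len(const unsigned char *s, int len)
{
	const unsigned char *start = s;
	int			l;

	/*
	 * Stops at the first chunk containing a zero byte, a high bit, or
	 * fewer than 16 remaining bytes.
	 */
	while ((l = check_ascii(s, len)) > 0)
	{
		s += l;
		len -= l;
	}
	return s - start;
}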


What test set have you been using for performance testing this? I'd like
to know how this version compares, and I could also try running it on my
old Raspberry Pi, which is more strict about alignment.


0002 adds the SSE4 implementation on x86-64, and is equally fast on all 
input, at the cost of greater complexity.


Didn't look closely, but seems reasonable at a quick glance.

- Heikki



