Re: Export log_line_prefix(); useful for emit_log_hook.

2022-07-03 Thread Michael Paquier
On Wed, Jun 29, 2022 at 03:09:42PM +0200, Alvaro Herrera wrote:
> Hmm, maybe your hypothetical hook would prefer to use a different
> setting for log line prefix than Log_line_prefix, so it would make sense
> to pass the format string as a parameter to the function instead of
> relying on the GUC global.

+1.
--
Michael




Re: Prevent writes on large objects in read-only transactions

2022-07-03 Thread Michael Paquier
On Wed, Jun 29, 2022 at 05:29:50PM +0900, Yugo NAGATA wrote:
> Thank you for reviewing the patch. I attached the updated patch.

Thanks for the new version.  I have looked at that again, and the set
of changes seem fine (including the change for the alternate output).
So, applied.
--
Michael




Re: Perform streaming logical transactions by background workers and parallel apply

2022-07-03 Thread Peter Smith
Below are some review comments for patch v14-0004:


v14-0004


4.0 General.

This comment is an after-thought but as I write this mail I am
wondering if most of this 0004 patch is even necessary at all? Instead
of introducing a new column and all the baggage that goes with it,
can't the same functionality be achieved just by toggling the
streaming mode 'substream' value from 'p' (parallel) to 't' (on)
whenever an error occurs causing a retry? Anyway, if you do change it
this way then most of the following comments can be disregarded.


==

4.1 Commit message

Patch needs an explanatory commit message. Currently, there is nothing.

==

4.2 doc/src/sgml/catalogs.sgml

+ 
+  
+   subretry bool
+  
+  
+   If true, the subscription will not try to apply streaming transaction
+   in parallel mode. See
+for more information.
+  
+ 

I think it is overkill to mention anything about
streaming=parallel here because IIUC it has nothing to do with this
field at all. I thought you really only need to say something brief
like:

SUGGESTION:
True if the previous apply change failed and a retry was required.

==

4.3 doc/src/sgml/ref/create_subscription.sgml

@@ -244,6 +244,10 @@ CREATE SUBSCRIPTION subscription_nameon
+  mode.
  

I don't think it is good to say "we" in the docs.

SUGGESTION
When applying a streaming transaction, if either requirement is not
met, the background worker will exit with an error. Parallel mode is
disregarded when retrying; instead the transaction will be applied
using streaming = on.

==

4.4 .../replication/logical/applybgworker.c

+ /*
+ * We don't start new background worker if retry was set as it's possible
+ * that the last time we tried to apply a transaction in background worker
+ * and the check failed (see function apply_bgworker_relation_check). So
+ * we will try to apply this transaction in apply worker.
+ */

SUGGESTION (simplified, and remove "we")
Don't use apply background workers for retries, because it is possible
that the last time we tried to apply a transaction using an apply
background worker the checks failed (see function
apply_bgworker_relation_check).

~~~

4.5

+ elog(DEBUG1, "retry to apply an streaming transaction in apply "
+ "background worker");

IMO the log message is too confusing

SUGGESTION
"apply background workers are not used for retries"

==

4.6 src/backend/replication/logical/worker.c

4.6.a - apply_handle_commit

+ /* Set the flag that we will not retry later. */
+ set_subscription_retry(false);

But the comment is wrong, isn't it? Shouldn’t it just say that we are
not *currently* retrying? And couldn’t this be redundant anyway if
the catalog column just had a DEFAULT value of false?

4.6.b - apply_handle_prepare
Ditto

4.6.c - apply_handle_commit_prepared
Ditto

4.6.d - apply_handle_rollback_prepared
Ditto

4.6.e - apply_handle_stream_prepare
Ditto

4.6.f - apply_handle_stream_abort
Ditto

4.6.g - apply_handle_stream_commit
Ditto

~~~

4.7 src/backend/replication/logical/worker.c

4.7.a - start_table_sync

@@ -3894,6 +3917,9 @@ start_table_sync(XLogRecPtr *origin_startpos,
char **myslotname)
  }
  PG_CATCH();
  {
+ /* Set the flag that we will retry later. */
+ set_subscription_retry(true);

Maybe this should say more like "Flag that the next apply will be the
result of a retry"

4.7.b - start_apply
Ditto

~~~

4.8 src/backend/replication/logical/worker.c - set_subscription_retry

+
+/*
+ * Set subretry of pg_subscription catalog.
+ *
+ * If retry is true, subscriber is about to exit with an error. Otherwise, it
+ * means that the changes was applied successfully.
+ */
+static void
+set_subscription_retry(bool retry)

"changes" -> "change" ?

~~~

4.8 src/backend/replication/logical/worker.c - set_subscription_retry

Isn't this flag only ever used when streaming=parallel? But it does
not seem to be checked anywhere before potentially executing all
this code, which maybe will never be used.

==

4.9 src/include/catalog/pg_subscription.h

@@ -76,6 +76,8 @@ CATALOG(pg_subscription,6100,SubscriptionRelationId)
BKI_SHARED_RELATION BKI_ROW
  bool subdisableonerr; /* True if a worker error should cause the
  * subscription to be disabled */

+ bool subretry; /* True if the previous apply change failed. */

I was wondering if you could give this column a DEFAULT value of false,
because then perhaps most of the patch code in worker.c could be
eliminated.

==

4.10 .../subscription/t/032_streaming_apply.pl

I felt that the test cases all seem to blend together. IMO it will be
more readable if the main text parts are visually separated,

e.g. using a comment like:
# =


--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Lazy JIT IR code generation to increase JIT speed with partitions

2022-07-03 Thread Luc Vlaming Hummel
Hi Alvaro, hi David,

Thanks for reviewing this and the interesting examples!

Wanted to give a bit of extra insight as to why I'd love to have a system that 
can lazily emit JIT code and hence creates roughly a module per function:
In the end I'm hoping that we can migrate to a system where we only JIT after a 
configurable cost has been exceeded for this node, as well as a configurable 
number of rows has actually been processed.
The reason is that this would safeguard against some problematic planning issues
wrt JIT (a node not being executed, the row count being massively off).
It would also allow for more fine-grained control, with a cost system similar to
most other planning costs, as they are also per node rather than global, and
would potentially allow us to JIT only the things where we expect to truly gain
runtime compared to the cost of doing it.

If this means we have to invest more in making it cheap(er) to emit modules, 
I'm all for that. Kudos to David for fixing the caching in that sense :) 
@Andres if there are any other things we ought to fix to make this cheap (enough)
compared to the previous code, I'd love to know your thoughts.

Best,
Luc Vlaming
(ServiceNow)


From: David Geier 
Sent: Wednesday, June 29, 2022 11:03 AM
To: Alvaro Herrera 
Cc: Luc Vlaming ; Andres Freund ; 
PostgreSQL-development 
Subject: Re: Lazy JIT IR code generation to increase JIT speed with partitions 
 
 
Hi Alvaro,

That's a very interesting case and might indeed be fixed or at least improved 
by this patch. I tried to reproduce this, but at least when running a simple, 
serial query with increasing numbers of functions, the time spent per function 
is linear or even slightly sub-linear (same as Tom observed in [1]).

I also couldn't reproduce the JIT runtimes you shared, when running the 
attached catalog query. The catalog query ran serially for me with the 
following JIT stats:

 JIT:
   Functions: 169
   Options: Inlining true, Optimization true, Expressions true, Deforming true
   Timing: Generation 12.223 ms, Inlining 17.323 ms, Optimization 388.491 ms, 
Emission 283.464 ms, Total 701.501 ms

Is it possible that the query ran in parallel for you? For parallel queries, 
every worker JITs all of the functions it uses. Even though the workers might 
JIT the functions in parallel, the time reported in the EXPLAIN ANALYZE output 
is the sum of the time spent by all workers. With this patch applied, the JIT 
time drops significantly, as many of the generated functions remain unused.

 JIT:
   Modules: 15
   Functions: 26
   Options: Inlining true, Optimization true, Expressions true, Deforming true
   Timing: Generation 1.931 ms, Inlining 0.722 ms, Optimization 67.195 ms, 
Emission 70.347 ms, Total 140.195 ms

Of course, this does not prove that the nonlinearity that you observed went 
away. Could you share with me how you ran the query so that I can reproduce 
your numbers on master to then compare them with the patched version? Also, 
which LLVM version did you run with? I'm currently running with LLVM 13.

Thanks!

--
David Geier
(ServiceNow)

On Mon, Jun 27, 2022 at 5:37 PM Alvaro Herrera  wrote:
On 2021-Jan-18, Luc Vlaming wrote:

> I would like this topic to somehow progress and was wondering what other
> benchmarks / tests would be needed to have some progress? I've so far
> provided benchmarks for small(ish) queries and some tpch numbers, assuming
> those would be enough.

Hi, some time ago I reported a case[1] where our JIT implementation does
a very poor job and perhaps the changes that you're making could explain
what is going on, and maybe even fix it:

[1] https://postgr.es/m/20241706.wqq7xoyigwa2@alvherre.pgsql

The query for which I investigated the problem involved some pg_logical
metadata tables, so I didn't post it anywhere public; but the blog post
I found later contains a link to a query that shows the same symptoms,
and which is luckily still available online:
https://gist.github.com/saicitus/251ba20b211e9e73285af35e61b19580
I attach it here in case it goes missing sometime.

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/



Re: Probable memory leak with ECPG and AIX

2022-07-03 Thread Noah Misch
On Sat, Jul 02, 2022 at 11:59:58PM -0400, Tom Lane wrote:
> Confirmed that either initializing ecpg_clocale or adding -fno-common
> allows the ecpglib build to succeed.  (Testing it beyond that would
> require another hour or so to build the rest of the system, so I won't.)
> 
> As I said, I was already leaning to the idea that initializing the
> variable explicitly is better style, so I recommend we do that.

Works for me.  Pushed that way, and things have been clean.




Re: [PoC] Improve dead tuple storage for lazy vacuum

2022-07-03 Thread Masahiko Sawada
On Tue, Jun 28, 2022 at 10:10 PM John Naylor
 wrote:
>
> On Tue, Jun 28, 2022 at 1:24 PM Masahiko Sawada  wrote:
> >
> > > I
> > > suspect other optimizations would be worth a lot more than using AVX2:
> > > - collapsing inner nodes
> > > - taking care when constructing the key (more on this when we
> > > integrate with VACUUM)
> > > ...and a couple Andres mentioned:
> > > - memory management: in
> > > https://www.postgresql.org/message-id/flat/20210717194333.mr5io3zup3kxahfm%40alap3.anarazel.de
> > > - node dispatch:
> > > https://www.postgresql.org/message-id/20210728184139.qhvx6nbwdcvo63m6%40alap3.anarazel.de
> > >
> > > Therefore, I would suggest that we use SSE2 only, because:
> > > - portability is very easy
> > > - to avoid a performance hit from indirecting through a function pointer
> >
> > Okay, I'll try these optimizations and see if the performance becomes 
> > better.
>
> FWIW, I think it's fine if we delay these until after committing a
> good-enough version. The exception is key construction and I think
> that deserves some attention now (more on this below).

Agreed.

>
> > I've done benchmark tests while changing the node types. The code base
> > is v3 patch that doesn't have the optimization you mentioned below
> > (memory management and node dispatch) but I added the code to use SSE2
> > for node-16 and node-32.
>
> Great, this is helpful to visualize what's going on!
>
> > * sse2_4_16_48_256
> > * nkeys = 9091, height = 3, n4 = 0, n16 = 0, n48 = 512, n256 = 
> > 916433
> > * nkeys = 2, height = 3, n4 = 2, n16 = 0, n48 = 207, n256 = 50
> >
> > * sse2_4_32_128_256
> > * nkeys = 9091, height = 3, n4 = 0, n32 = 285, n128 = 916629, n256 
> > = 31
> > * nkeys = 2, height = 3, n4 = 2, n32 = 48, n128 = 208, n256 = 1
>
> > Observations are:
> >
> > In both test cases, there is not much difference between using AVX2
> > and SSE2. The more node types, the more time it takes for loading the
> > data (see sse2_4_16_32_128_256).
>
> Good to know. And as Andres mentioned in his PoC, more node types
> would be a barrier for pointer tagging, since 32-bit platforms only
> have two spare bits in the pointer.
>
> > In dense case, since most nodes have around 100 children, the radix
> > tree that has node-128 had a good figure in terms of memory usage. On
>
> Looking at the node stats, and then your benchmark code, I think key
> construction is a major influence, maybe more than node type. The
> key/value scheme tested now makes sense:
>
> blockhi || blocklo || 9 bits of item offset
>
> (with the leaf nodes containing a bit map of the lowest few bits of
> this whole thing)
>
> We want the lower fanout nodes at the top of the tree and higher
> fanout ones at the bottom.

So more inner nodes can fit in CPU cache, right?

>
> Note some consequences: If the table has enough columns such that much
> fewer than 100 tuples fit on a page (maybe 30 or 40), then in the
> dense case the nodes above the leaves will have lower fanout (maybe
> they will fit in a node32). Also, the bitmap values in the leaves will
> be more empty. In other words, many tables in the wild *resemble* the
> sparse case a bit, even if truly all tuples on the page are dead.
>
> Note also that the dense case in the benchmark above has ~4500 times
> more keys than the sparse case, and uses about ~1000 times more
> memory. But the runtime is only 2-3 times longer. That's interesting
> to me.
>
> To optimize for the sparse case, it seems to me that the key/value would be
>
> blockhi || 9 bits of item offset || blocklo
>
> I believe that would make the leaf nodes more dense, with fewer inner
> nodes, and could drastically speed up the sparse case, and maybe many
> realistic dense cases.

Does it have an effect on the number of inner nodes?

>  I'm curious to hear your thoughts.

Thank you for your analysis. It's worth trying. We use 9 bits for item
offset, but most pages don't use all of those bits in practice. So it would
probably be better to move the most significant bit of the item offset to the
left of blockhi. Or more simply:

9 bits of item offset || blockhi || blocklo
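
To make the bit-layout discussion concrete, here is a rough sketch (purely
illustrative, with hypothetical names and bit positions -- not code from the
patch) of two of the 64-bit key encodings being compared, assuming a 32-bit
block number, 9 bits reserved for the item offset, and PostgreSQL's
uint64/uint32/uint16 typedefs from c.h:

#define TIDKEY_OFFSET_BITS	9
#define TIDKEY_OFFSET_MASK	((1 << TIDKEY_OFFSET_BITS) - 1)

/* current layout: blockhi || blocklo || 9 bits of item offset */
static inline uint64
tidkey_block_major(uint32 block, uint16 offnum)
{
	return ((uint64) block << TIDKEY_OFFSET_BITS) | (offnum & TIDKEY_OFFSET_MASK);
}

/* simpler alternative above: 9 bits of item offset || blockhi || blocklo */
static inline uint64
tidkey_offset_major(uint32 block, uint16 offnum)
{
	return ((uint64) (offnum & TIDKEY_OFFSET_MASK) << 32) | block;
}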

>
> > the other hand, the radix tree that doesn't have node-128 has a better
> > number in terms of insertion performance. This is probably because we
> > need to iterate over 'isset' flags from the beginning of the array in
> > order to find an empty slot when inserting new data. We do the same
> > thing also for node-48 but it was better than node-128 as it's up to
> > 48.
>
> I mentioned it in my diff, but for those following along, I think we can
> improve that by iterating over the bytes: if a byte is 0xFF, all 8 of its
> bits are already set, so keep looking...

Right. Using 0xFF also makes the code readable so I'll change that.
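
For anyone following along, a minimal sketch of that idea (hypothetical
helper, not code from the patch), assuming the node keeps a byte-array
'isset' bitmap recording which slots are occupied:

/* Return the index of the first free slot, or -1 if the node is full. */
static int
find_first_unset_slot(const uint8 *isset, int nbytes)
{
	for (int i = 0; i < nbytes; i++)
	{
		if (isset[i] == 0xFF)
			continue;			/* all 8 slots in this byte are taken */

		for (int bit = 0; bit < 8; bit++)
		{
			if ((isset[i] & (1 << bit)) == 0)
				return i * 8 + bit;
		}
	}
	return -1;
}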

>
> > In terms of lookup performance, the results vary but I could not find
> > any common pattern that makes the performance better or worse. Getting
> > more statistics such as the number of each node type per tree level
> > might 

Re: Perform streaming logical transactions by background workers and parallel apply

2022-07-03 Thread Peter Smith
Below are some review comments for patch v14-0003:


v14-0003


3.1 Commit message

If any of the following checks are violated, an error will be reported.
1. The unique columns between publisher and subscriber are difference.
2. There is any non-immutable function present in expression in
subscriber's relation. Check from the following 4 items:
   a. The function in triggers;
   b. Column default value expressions and domain constraints;
   c. Constraint expressions.
   d. The foreign keys.

SUGGESTION (rewording to match the docs and the code).

Add some checks before using apply background worker to apply changes.
streaming=parallel mode has two requirements:
1) The unique columns must be the same between publisher and subscriber
2) There cannot be any non-immutable functions in the subscriber-side
replicated table. Look for functions in the following places:
* a. Trigger functions
* b. Column default value expressions and domain constraints
* c. Constraint expressions
* d. Foreign keys

==

3.2 doc/src/sgml/ref/create_subscription.sgml

+  To run in this mode, there are following two requirements. The first
+  is that the unique column should be the same between publisher and
+  subscriber; the second is that there should not be any non-immutable
+  function in subscriber-side replicated table.

SUGGESTION
Parallel mode has two requirements: 1) the unique columns must be the
same between publisher and subscriber; 2) there cannot be any
non-immutable functions in the subscriber-side replicated table.

==

3.3 .../replication/logical/applybgworker.c - apply_bgworker_relation_check

+ * Check if changes on this logical replication relation can be applied by
+ * apply background worker.

SUGGESTION
Check if changes on this relation can be applied by an apply background worker.


~~~

3.4

+ * Although we maintains the commit order by allowing only one process to
+ * commit at a time, our access order to the relation has changed.

SUGGESTION
Although the commit order is maintained by allowing only one process to
commit at a time, the access order to the relation has changed.

~~~

3.5

+ /* Check only we are in apply bgworker. */
+ if (!am_apply_bgworker())
+ return;

SUGGESTION
/* Skip check if not an apply background worker. */

~~~

3.6

+ /*
+ * If it is a partitioned table, we do not check it, we will check its
+ * partition later.
+ */

This comment is lacking useful details:

/* Partition table checks are done later in (?) */

~~~

3.7

+ if (!rel->sameunique)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot replicate relation with different unique index"),
+ errhint("Please change the streaming option to 'on' instead of
'parallel'.")));

Maybe the first message should change slightly so it is worded
consistently with the other one.

SUGGESTION
errmsg("cannot replicate relation. Unique indexes must be the same"),

==

3.8 src/backend/replication/logical/proto.c

-#define LOGICALREP_IS_REPLICA_IDENTITY 1
+#define LOGICALREP_IS_REPLICA_IDENTITY 0x0001
+#define LOGICALREP_IS_UNIQUE 0x0002

I think these constants should be named differently to reflect that they
are just attribute flags. They should use a similar bitset style
to the other nearby constants.

SUGGESTION
#define ATTR_IS_REPLICA_IDENTITY (1 << 0)
#define ATTR_IS_UNIQUE (1 << 1)

~~~

3.9 src/backend/replication/logical/proto.c - logicalrep_write_attrs

This big slab of new code to get the BMS looks very similar to
RelationGetIdentityKeyBitmap. So perhaps this code should be
encapsulated in another function like that one (in relcache.c?) and
then just called from logicalrep_write_attrs

==

3.10 src/backend/replication/logical/relation.c -
logicalrep_relmap_reset_volatility_cb

+/*
+ * Reset the flag volatility of all existing entry in the relation map cache.
+ */
+static void
+logicalrep_relmap_reset_volatility_cb(Datum arg, int cacheid, uint32 hashvalue)

SUGGESTION
Reset the volatility flag of all entries in the relation map cache.

~~~

3.11 src/backend/replication/logical/relation.c -
logicalrep_rel_mark_safe_in_apply_bgworker

+/*
+ * Check if unique index/constraint matches and mark sameunique and volatility
+ * flag.
+ *
+ * Don't throw any error here just mark the relation entry as not sameunique or
+ * FUNCTION_NONIMMUTABLE as we only check these in apply background worker.
+ */
+static void
+logicalrep_rel_mark_safe_in_apply_bgworker(LogicalRepRelMapEntry *entry)

SUGGESTION
Check if unique index/constraint matches and assign 'sameunique' flag.
Check if there are any non-immutable functions and assign the
'volatility' flag. Note: Don't throw any error here - these flags will
be checked in the apply background worker.

~~~

3.12 src/backend/replication/logical/relation.c -
logicalrep_rel_mark_safe_in_apply_bgworker

I did not really understand why you used an enum for one flag
(volatility) but not the other one (sameunique); shouldn’t bot

Re: Handle infinite recursion in logical replication setup

2022-07-03 Thread Amit Kapila
On Mon, Jul 4, 2022 at 3:59 AM Peter Smith  wrote:
>
> On Mon, Jul 4, 2022 at 12:59 AM vignesh C  wrote:
> ...
> > > 2.
> > > /* ALTER SUBSCRIPTION  SET ( */
> > >   else if (HeadMatches("ALTER", "SUBSCRIPTION", MatchAny) &&
> > > TailMatches("SET", "("))
> > > - COMPLETE_WITH("binary", "slot_name", "streaming",
> > > "synchronous_commit", "disable_on_error");
> > > + COMPLETE_WITH("binary", "origin", "slot_name", "streaming",
> > > "synchronous_commit", "disable_on_error");
> > >   /* ALTER SUBSCRIPTION  SKIP ( */
> > >   else if (HeadMatches("ALTER", "SUBSCRIPTION", MatchAny) &&
> > > TailMatches("SKIP", "("))
> > >   COMPLETE_WITH("lsn");
> > > @@ -3152,7 +3152,7 @@ psql_completion(const char *text, int start, int 
> > > end)
> > >   /* Complete "CREATE SUBSCRIPTION  ...  WITH ( " */
> > >   else if (HeadMatches("CREATE", "SUBSCRIPTION") && TailMatches("WITH", 
> > > "("))
> > >   COMPLETE_WITH("binary", "connect", "copy_data", "create_slot",
> > > -   "enabled", "slot_name", "streaming",
> > > +   "enabled", "origin", "slot_name", "streaming",
> > > "synchronous_commit", "two_phase", "disable_on_error");
> > >
> > > Why do you choose to add a new option in-between other parameters
> > > instead of at the end which we normally do? The one possible reason I
> > > can think of is that all the parameters at the end are boolean so you
> > > want to add this before those but then why before slot_name, and again
> > > I don't see such a rule being followed for other parameters.
> >
> > I was not sure if it should be maintained in alphabetical order,
> > anyway since the last option "disable_on_error" is at the end, I have
> > changed it to the end.
> >
>
> Although it seems it is not a hard rule, mostly the COMPLETE_WITH are
> coded using alphabetical order. Anyway, I think that was a clear
> intention here too since 13 of 14 parameters were already in
> alphabetical order; it is actually only that "disable_on_error"
> parameter that was misplaced; not the new "origin" parameter.
>

Agreed, but let's not change disable_on_error as part of this patch.

-- 
With Regards,
Amit Kapila.




Re: "ERROR: latch already owned" on gharial

2022-07-03 Thread Thomas Munro
On Wed, Jun 1, 2022 at 12:55 AM Robert Haas  wrote:
> OK, I have access to the box now. I guess I might as well leave the
> crontab jobs enabled until the next time this happens, since Thomas
> just took steps to improve the logging, but I do think these BF
> members are overdue to be killed off, and would like to do that as
> soon as it seems like a reasonable step to take.

A couple of months later, there has been no repeat of that error.  I'd
happily forget about that and move on, if you want to decommission
these.




Re: Too-long socket paths are breaking several buildfarm members

2022-07-03 Thread Michael Paquier
On Sun, Jul 03, 2022 at 09:40:23PM -0400, Tom Lane wrote:
> Yeah, I just came to the same conclusion and pushed an equivalent
> patch.  Sorry for the duplicated effort.

No problem.  Thanks for the quick fix.
--
Michael




Re: pgsql: dshash: Add sequential scan support.

2022-07-03 Thread Thomas Munro
[Re-directing to -hackers]

On Fri, Mar 11, 2022 at 2:27 PM Andres Freund  wrote:
> On 2022-03-10 20:09:56 -0500, Tom Lane wrote:
> > Andres Freund  writes:
> > > dshash: Add sequential scan support.
> > > Add ability to scan all entries sequentially to dshash. The interface is
> > > similar but a bit different both from that of dynahash and simple dshash
> > > search functions. The most significant difference is that dshash's interface
> > > always needs a call to dshash_seq_term when scan ends.
> >
> > Umm ... what about error recovery?  Or have you just cemented the
> > proposition that long-lived dshashes are unsafe?
>
> I don't think this commit made it worse. dshash_seq_term() releases an lwlock
> (which will be released in case of an error) and unsets
> hash_table->find_[exclusively_]locked. The latter weren't introduced by this
> patch, and are also set by dshash_find().
>
> I agree that ->find_[exclusively_]locked are problematic from an error
> recovery perspective.

Right, as seen in the build farm at [1].  Also reproducible with something like:

@@ -269,6 +269,14 @@ dsm_impl_posix(dsm_op op, dsm_handle handle, Size
request_size,
return false;
}

+   /* XXX random fault injection */
+   if (op == DSM_OP_ATTACH && random() < RAND_MAX / 8)
+   {
+   close(fd);
+   elog(ERROR, "chaos");
+   return false;
+   }
+

I must have thought that it was easy and practical to write no-throw
straight-line code and be sure to reach dshash_release_lock(), but I
concede that it was a bad idea: even dsa_get_address() can throw*, and
you're often likely to need to call that while accessing dshash
elements.  For example, in lookup_rowtype_tupdesc_internal(), there is
a sequence dshash_find(), ..., dsa_get_address(), ...,
dshash_release_lock(), and I must have considered the range of code
between find and release to be no-throw, but now I know that it is
not.
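
For illustration, the problematic shape looks roughly like this (hypothetical
entry type and field names, not the actual typcache.c code; relies on
lib/dshash.h and utils/dsa.h).  The region between dshash_find() and
dshash_release_lock() is not no-throw because dsa_get_address() may have to
extend the local memory map and can error out:

#include "postgres.h"
#include "lib/dshash.h"
#include "utils/dsa.h"

typedef struct SharedEntry
{
	uint32		key;
	dsa_pointer payload_dp;		/* points into the same dsa_area */
} SharedEntry;

static void *
lookup_payload(dshash_table *htab, dsa_area *area, uint32 key)
{
	SharedEntry *entry;
	void	   *payload = NULL;

	entry = dshash_find(htab, &key, false);	/* takes a partition lock */
	if (entry != NULL)
	{
		/* dsa_get_address() can ereport(ERROR) while the lock is held */
		payload = dsa_get_address(area, entry->payload_dp);
		dshash_release_lock(htab, entry);
	}
	return payload;
}

If that dsa_get_address() call throws, the lwlock itself is released by the
error machinery, but any find_locked-style bookkeeping in the dshash handle
is left stale -- which is why the assertion-only flags are being removed.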

> It's per-backend state at least and just used for assertions. We could remove
> it. Or stop checking it in places where it could be set wrongly: dshash_find()
> and dshash_detach() couldn't check anymore, but the rest of the assertions
> would still be valid afaics?

Yeah, it's all for assertions... let's just remove it.  Those
assertions were useful to me at some stage in development but won't
hold as well as I thought, at least without widespread PG_FINALLY(),
which wouldn't be nice.

*dsa_get_address() might need to adjust the memory map with system
calls, which might fail.  If you think of DSA as not only an allocator
but also a poor man's user level virtual memory scheme to tide us over
until we get threads, then this is a pretty low level kind of
should-not-happen failure that is analogous on some level to SIGBUS or
SIGSEGV or something like that, and we should PANIC.  Then we could
claim that dsa_get_address() is no-throw.  At least, that was one
argument I had with myself while investigating that strange Solaris
shm_open() failure, but ... I lost the argument.  It's quite an
extreme position to take just to support these assertions, which are
of pretty limited value.

[1] 
https://www.postgresql.org/message-id/20220701232009.jcwxpl45bptaxv5n%40alap3.anarazel.de
From e3e87a9ec39d3456a7bf29796102502d3724993e Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Mon, 4 Jul 2022 11:56:02 +1200
Subject: [PATCH] Remove spurious assertions from dshash.c.

dshash.c maintained flags to track locking, in order to make assertions
that the dshash API was being used correctly.  Unfortunately the
interaction with ereport's non-local exits was not thought through
carefully enough.  While lwlocks are released automatically, the flags
can get out of sync.  Since they're only used for assertions anyway, we
can just remove them.

This problem was noted by Tom and Andres while reviewing changes to
support the new shared memory stats system in 15, the second user of
dshash.c.  On closer inspection, even the earlier typcache.c code is not
guaranteed to meet the asserted conditions, now that it's been
highlighted that even dsa_get_address() can throw (albeit in unlikely
circumstances).

Back-patch to 11, where dshash.c arrived.

Reported-by: Tom Lane 
Reported-by: Andres Freund 
Discussion: https://postgr.es/m/20220311012712.botrpsikaufzt...@alap3.anarazel.de

diff --git a/src/backend/lib/dshash.c b/src/backend/lib/dshash.c
index ec454b4d65..9706e7926b 100644
--- a/src/backend/lib/dshash.c
+++ b/src/backend/lib/dshash.c
@@ -110,8 +110,6 @@ struct dshash_table
 	dshash_table_control *control;	/* Control object in DSM. */
 	dsa_pointer *buckets;		/* Current bucket pointers in DSM. */
 	size_t		size_log2;		/* log2(number of buckets) */
-	bool		find_locked;	/* Is any partition lock held by 'find'? */
-	bool		find_exclusively_locked;	/* ... exclusively? */
 };
 
 /* Given a pointer to an item, find the entry (user data) it holds. */
@@ -234,9 +232,6 @@ dshash_create(dsa_area *area, const dshash_para

Re: Issue with pg_stat_subscription_stats

2022-07-03 Thread Masahiko Sawada
On Sat, Jul 2, 2022 at 9:52 AM Masahiko Sawada  wrote:
>
>
>
> On Sat, Jul 2, 2022 at 2:53 Andres Freund  wrote:
>>
>> Hi,
>>
>> On 2022-07-01 16:08:48 +0900, Masahiko Sawada wrote:
>> > Yes, my point is that it may be misleading that the subscription stats
>> > are created when a subscription is created.
>>
>> I think it's important to create stats at that time, because otherwise it's
>> basically impossible to ensure that stats are dropped when a transaction 
>> rolls
>> back. If some / all columns should return something else before stats are
>> reported that can be addressed easily by tracking that in a separate field.
>>
>>
>> > On the other hand, I'm not sure we agreed that the behavior that
>> > Melanie reported is not a problem. The user might get confused since
>> > the subscription stats works differently than other stats when a
>> > reset. Previously, the primary reason why I hesitated to create the
>> > subscription stats when creating a subscription is that CREATE
>> > SUBSCRIPTION (with create_slot = false) can be rolled back. But with
>> > the shmem stats, we can easily resolve it by using
>> > pgstat_create_transactional().
>>
>> Yep.
>>
>>
>> > > > The second problem is that the following code in DropSubscription()
>> > > > should be updated:
>> > > >
>> > > > /*
>> > > >  * Tell the cumulative stats system that the subscription is 
>> > > > getting
>> > > >  * dropped. We can safely report dropping the subscription 
>> > > > statistics here
>> > > >  * if the subscription is associated with a replication slot since 
>> > > > we
>> > > >  * cannot run DROP SUBSCRIPTION inside a transaction block.  
>> > > > Subscription
>> > > >  * statistics will be removed later by (auto)vacuum either if it's 
>> > > > not
>> > > >  * associated with a replication slot or if the message for 
>> > > > dropping the
>> > > >  * subscription gets lost.
>> > > >  */
>> > > > if (slotname)
>> > > > pgstat_drop_subscription(subid);
>> > > >
>> > > > I think we can call pgstat_drop_subscription() even if slotname is
>> > > > NULL and need to update the comment.
>> > > >
>> > >
>> > > +1.
>> > >
>> > > > IIUC autovacuum is no longer
>> > > > responsible for garbage collection.
>> > > >
>> > >
>> > > Right, this is my understanding as well.
>> >
>> > Thank you for the confirmation.
>>
>> Want to propose a patch?
>
>
> Yes, I’ll propose a patch.
>

I've attached the patch, fix_drop_subscriptions_stats.patch, to fix it.

I've also attached another PoC patch,
poc_create_subscription_stats.patch, to create the stats entry when
creating the subscription, which addresses the issue reported in this
thread: pg_stat_reset_subscription_stats() doesn't update the
stats_reset timestamp if no error has been reported yet.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/


fix_drop_subscription_stats.patch
Description: Binary data


poc_create_subscription_stats.patch
Description: Binary data


Re: Too-long socket paths are breaking several buildfarm members

2022-07-03 Thread Tom Lane
Michael Paquier  writes:
> There is PostgreSQL::Test::Utils::tempdir_short for that, which is
> what all the nodes created in Cluster.pm use for
> unix_socket_directories.  One way to address the issue would be to
> pass that to pg_upgrade with --socketdir, as of the attached.

Yeah, I just came to the same conclusion and pushed an equivalent
patch.  Sorry for the duplicated effort.

regards, tom lane




Re: Too-long socket paths are breaking several buildfarm members

2022-07-03 Thread Michael Paquier
On Sun, Jul 03, 2022 at 07:22:11PM -0400, Tom Lane wrote:
> That path name is 3 bytes over the platform limit.  Evidently,
> "REL_15_STABLE" is just enough longer than "HEAD" to make this fail,
> whereas we didn't see the problem as long as the test case only
> ran in HEAD.

That tells us enough about UNIXSOCK_PATH_BUFLEN.  It looks like test.sh
has been using /tmp/pg_upgrade_check* as the socket directory for ages to
counter this issue.

> Members butterflyfish, massasauga, and myna likewise have yet to pass
> this test in REL_15_STABLE, though they're perfectly happy in HEAD.
> They are returning cut-down logs that don't allow diagnosing for
> certain, but a reasonable bet is that it's the same kind of problem.

Hmm.  That's possible.

> I think that the conversion of pg_upgrade's test script to TAP
> form missed a bet.  IIRC, we have mechanism somewhere to ensure
> that test socket path names are created under /tmp, or someplace else
> that's not subject to possibly-long paths of installation directories.
> That's evidently not being used here.

There is PostgreSQL::Test::Utils::tempdir_short for that, which is
what all the nodes created in Cluster.pm use for
unix_socket_directories.  One way to address the issue would be to
pass that to pg_upgrade with --socketdir, as of the attached.
--
Michael
diff --git a/src/bin/pg_upgrade/t/002_pg_upgrade.pl b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
index 67e0be6856..8dbda0950e 100644
--- a/src/bin/pg_upgrade/t/002_pg_upgrade.pl
+++ b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
@@ -216,6 +216,9 @@ chdir ${PostgreSQL::Test::Utils::tmp_check};
 # Upgrade the instance.
 $oldnode->stop;
 
+# Use a short path for socket directories.
+my $socketdir = PostgreSQL::Test::Utils::tempdir_short;
+
 # Cause a failure at the start of pg_upgrade, this should create the logging
 # directory pg_upgrade_output.d but leave it around.  Keep --check for an
 # early exit.
@@ -228,6 +231,7 @@ command_fails(
 		'-B', $newbindir,
 		'-p', $oldnode->port,
 		'-P', $newnode->port,
+		'-s', $socketdir,
 		'--check'
 	],
 	'run of pg_upgrade --check for new instance with incorrect binary path');
@@ -241,7 +245,8 @@ command_ok(
 		'pg_upgrade', '--no-sync','-d', $oldnode->data_dir,
 		'-D', $newnode->data_dir, '-b', $oldbindir,
 		'-B', $newbindir, '-p', $oldnode->port,
-		'-P', $newnode->port, '--check'
+		'-P', $newnode->port, '-s', $socketdir,
+		'--check'
 	],
 	'run of pg_upgrade --check for new instance');
 ok(!-d $newnode->data_dir . "/pg_upgrade_output.d",
@@ -253,7 +258,7 @@ command_ok(
 		'pg_upgrade', '--no-sync','-d', $oldnode->data_dir,
 		'-D', $newnode->data_dir, '-b', $oldbindir,
 		'-B', $newbindir, '-p', $oldnode->port,
-		'-P', $newnode->port
+		'-P', $newnode->port, '-s', $socketdir,
 	],
 	'run of pg_upgrade for new instance');
 ok( !-d $newnode->data_dir . "/pg_upgrade_output.d",




Re: AIX support - alignment issues

2022-07-03 Thread Tom Lane
Andres Freund  writes:
> On 2022-07-03 20:08:19 -0400, Tom Lane wrote:
>> I do still feel that HPPA is of interest, to keep us honest
>> about spinlock support

> I.e. forgetting to initialize them? Or the weird alignment stuff it has?

The nonzero initialization mainly, and to a lesser extent the weird
size of a lock.  I think the fact that the active word is only part
of the lock struct is pretty well encapsulated.

> I'd started to work a patch to detect missing initialization for both
> spinlocks and lwlocks, I think that'd be good to have for more common cases.

No objection to having more than one check for this ;-)

regards, tom lane




Re: AIX support - alignment issues

2022-07-03 Thread Andres Freund
Hi,

On 2022-07-03 20:08:19 -0400, Tom Lane wrote:
> I would have preferred to keep pademelon, with its pre-C99 compiler, going
> until v11 is EOL, but that ain't happening.

I'm not too worried about that - clang with
  -std=c89 -Wc99-extensions -Werror=c99-extensions
as it's running on mylodon for the older branches seems to do a decent
job. And is obviously much faster :)


> I would not stand in the way of dropping HP-UX and IA64 support as of
> v16.

Cool.


> I do still feel that HPPA is of interest, to keep us honest
> about spinlock support

I.e. forgetting to initialize them? Or the weird alignment stuff it has?

I'd started to work a patch to detect missing initialization for both
spinlocks and lwlocks, I think that'd be good to have for more common cases.

Greetings,

Andres Freund




Re: AIX support - alignment issues

2022-07-03 Thread Andres Freund
Hi,

On 2022-07-04 10:33:37 +1200, Thomas Munro wrote:
> I don't have a dog in this race, but AIX is clearly not in the same
> category as HP-UX (and maybe Solaris is somewhere in between).

The reason to consider whether it's worth supporting AIX is that its library
model is very different from other Unix-like platforms (much closer to Windows,
though). We also have dedicated compiler support for it, which I guess could
separately be dropped.

Greetings,

Andres Freund




Re: AIX support - alignment issues

2022-07-03 Thread Tom Lane
Thomas Munro  writes:
> On Sun, Jul 3, 2022 at 8:34 AM Tom Lane  wrote:
>> I am a little concerned though that we don't have access to the latest
>> version of AIX --- that seems like a non-maintainable situation.

> The release history doesn't look too bad on that front: the live
> versions are 7.1 (2010-2023), 7.2 (2015-TBA) and 7.3 (2021-TBA).  7.3
> only came out half a year ago, slightly after Windows 11, which we
> aren't testing yet either.  Those GCC AIX systems seem to be provided
> by IBM and the Open Source Lab at Oregon State University which has a
> POWER lab providing ongoing CI services etc to various OSS projects,
> so I would assume that upgrades (and retirement of the
> about-to-be-desupported 7.1 system) will come along eventually.

OK, we can wait awhile to see what happens on that.

> I don't have a dog in this race, but AIX is clearly not in the same
> category as HP-UX (and maybe Solaris is somewhere in between).  AIX
> runs on hardware you can buy today that got a major refresh last year
> (Power 10), while HP-UX runs only on discontinued CPUs, so while it's
> a no-brainer to drop HP-UX support, it's a trickier question for AIX.

Yeah.  FTR, I'm out of the HP-UX game: due to a hardware failure,
I can no longer boot that installation.  I would have preferred to
keep pademelon, with its pre-C99 compiler, going until v11 is EOL,
but that ain't happening.  I see that EDB are still running a couple
of HP-UX/IA64 animals, but I wonder if they're prepared to do anything
to support those animals --- like, say, fix platform-specific bugs.
Robert has definitely indicated displeasure with doing so, but
I don't know if he makes the decisions on that.

I would not stand in the way of dropping HP-UX and IA64 support as of
v16.  (I do still feel that HPPA is of interest, to keep us honest
about spinlock support --- but that dual-stack arrangement that IA64
uses is surely not part of anyone's future.)

I have no opinion either way about Solaris.

regards, tom lane




Too-long socket paths are breaking several buildfarm members

2022-07-03 Thread Tom Lane
Buildfarm member thorntail has yet to pass the pg_upgrade test
in the REL_15_STABLE branch.  It looks like the problem reduces to
an overlength pathname:

2022-07-04 00:27:03.404 MSK [2212393:2] LOG:  Unix-domain socket path 
"/home/nm/farm/sparc64_deb10_gcc_64_ubsan/REL_15_STABLE/pgsql.build/src/bin/pg_upgrade/tmp_check/.s.PGSQL.49714"
 is too long (maximum 107 bytes)

That path name is 3 bytes over the platform limit.  Evidently,
"REL_15_STABLE" is just enough longer than "HEAD" to make this fail,
whereas we didn't see the problem as long as the test case only
ran in HEAD.

Members butterflyfish, massasauga, and myna likewise have yet to pass
this test in REL_15_STABLE, though they're perfectly happy in HEAD.
They are returning cut-down logs that don't allow diagnosing for
certain, but a reasonable bet is that it's the same kind of problem.

I think that the conversion of pg_upgrade's test script to TAP
form missed a bet.  IIRC, we have mechanism somewhere to ensure
that test socket path names are created under /tmp, or someplace else
that's not subject to possibly-long paths of installation directories.
That's evidently not being used here.

regards, tom lane




Re: AIX support - alignment issues

2022-07-03 Thread Thomas Munro
On Sun, Jul 3, 2022 at 8:34 AM Tom Lane  wrote:
> I am a little concerned though that we don't have access to the latest
> version of AIX --- that seems like a non-maintainable situation.

The release history doesn't look too bad on that front: the live
versions are 7.1 (2010-2023), 7.2 (2015-TBA) and 7.3 (2021-TBA).  7.3
only came out half a year ago, slightly after Windows 11, which we
aren't testing yet either.  Those GCC AIX systems seem to be provided
by IBM and the Open Source Lab at Oregon State University which has a
POWER lab providing ongoing CI services etc to various OSS projects,
so I would assume that upgrades (and retirement of the
about-to-be-desupported 7.1 system) will come along eventually.

I don't have a dog in this race, but AIX is clearly not in the same
category as HP-UX (and maybe Solaris is somewhere in between).  AIX
runs on hardware you can buy today that got a major refresh last year
(Power 10), while HP-UX runs only on discontinued CPUs, so while it's
a no-brainer to drop HP-UX support, it's a trickier question for AIX.
I guess the way open source is supposed to work is that someone with a
real interest in PostgreSQL on AIX helps maintain it, not only keeping
it building and passing tests, but making it work really well (cf huge
pages, scalable event handling, probably more things that would be
obvious to an AIX expert...), and representing ongoing demand and
interests from the AIX user community...




Re: generate_series for timestamptz and time zone problem

2022-07-03 Thread Tom Lane
Przemysław Sztoch  writes:
> I have problem with this:
> -- Considering only built-in procs (prolang = 12), look for multiple uses
> -- of the same internal function (ie, matching prosrc fields).  It's OK to
> -- have several entries with different pronames for the same internal 
> function,
> -- but conflicts in the number of arguments and other critical items should
> -- be complained of.  (We don't check data types here; see next query.)

It's telling you you're violating project style.  Don't make multiple
pg_proc entries point at the same C function and then use PG_NARGS
to disambiguate; instead point at two separate functions.  The functions
can share code at the next level down, if they want.  (Just looking
at the patch, though, I wonder if sharing code is really beneficial
in this case.  It seems quite messy, and I wouldn't be surprised
if it hurts performance in the existing case.)
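
For readers unfamiliar with that convention, the shape being asked for is
roughly this (hypothetical function and argument names, not the actual patch):
one C entry point per pg_proc row, with the shared logic pushed down a level
instead of branching on PG_NARGS():

#include "postgres.h"
#include "fmgr.h"

/* shared worker: zone is NULL when no time zone argument was supplied */
static Datum
my_func_common(FunctionCallInfo fcinfo, text *zone)
{
	/* the real implementation would live here */
	PG_RETURN_NULL();
}

/* pg_proc entry for the form without a time zone */
Datum
my_func(PG_FUNCTION_ARGS)
{
	return my_func_common(fcinfo, NULL);
}

/* pg_proc entry for the form with a time zone as (hypothetically) arg 4 */
Datum
my_func_at_zone(PG_FUNCTION_ARGS)
{
	return my_func_common(fcinfo, PG_GETARG_TEXT_PP(3));
}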

You also need to expend some more effort on refactoring code, to
eliminate silliness like looking up the timezone name each time
through the SRF.  That's got to be pretty awful performance-wise.

regards, tom lane




Re: Handle infinite recursion in logical replication setup

2022-07-03 Thread Peter Smith
On Mon, Jul 4, 2022 at 12:59 AM vignesh C  wrote:
...
> > 2.
> > /* ALTER SUBSCRIPTION  SET ( */
> >   else if (HeadMatches("ALTER", "SUBSCRIPTION", MatchAny) &&
> > TailMatches("SET", "("))
> > - COMPLETE_WITH("binary", "slot_name", "streaming",
> > "synchronous_commit", "disable_on_error");
> > + COMPLETE_WITH("binary", "origin", "slot_name", "streaming",
> > "synchronous_commit", "disable_on_error");
> >   /* ALTER SUBSCRIPTION  SKIP ( */
> >   else if (HeadMatches("ALTER", "SUBSCRIPTION", MatchAny) &&
> > TailMatches("SKIP", "("))
> >   COMPLETE_WITH("lsn");
> > @@ -3152,7 +3152,7 @@ psql_completion(const char *text, int start, int end)
> >   /* Complete "CREATE SUBSCRIPTION  ...  WITH ( " */
> >   else if (HeadMatches("CREATE", "SUBSCRIPTION") && TailMatches("WITH", 
> > "("))
> >   COMPLETE_WITH("binary", "connect", "copy_data", "create_slot",
> > -   "enabled", "slot_name", "streaming",
> > +   "enabled", "origin", "slot_name", "streaming",
> > "synchronous_commit", "two_phase", "disable_on_error");
> >
> > Why do you choose to add a new option in-between other parameters
> > instead of at the end which we normally do? The one possible reason I
> > can think of is that all the parameters at the end are boolean so you
> > want to add this before those but then why before slot_name, and again
> > I don't see such a rule being followed for other parameters.
>
> I was not sure if it should be maintained in alphabetical order,
> anyway since the last option "disable_on_error" is at the end, I have
> changed it to the end.
>

Although it seems it is not a hard rule, mostly the COMPLETE_WITH are
coded using alphabetical order. Anyway, I think that was a clear
intention here too since 13 of 14 parameters were already in
alphabetical order; it is actually only that "disable_on_error"
parameter that was misplaced; not the new "origin" parameter.

Also, in practice, on  those completions will get output in
alphabetical order, so IMO it makes more sense for the code to be
consistent with the output:

e.g.
test_sub=# create subscription sub xxx connection '' publication pub WITH (
BINARY  DISABLE_ON_ERRORSTREAMING
CONNECT ENABLED SYNCHRONOUS_COMMIT
COPY_DATA   ORIGIN  TWO_PHASE
CREATE_SLOT SLOT_NAME
test_sub=#

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Allowing REINDEX to have an optional name

2022-07-03 Thread Tom Lane
Simon Riggs  writes:
> Thanks for the review, new version attached.

This is marked as Ready for Committer, but that seems unduly
optimistic.  The cfbot shows that it's failing on all platforms ---
and not in the same way on each, suggesting there are multiple
problems.

regards, tom lane




Re: JSON/SQL: jsonpath: incomprehensible error message

2022-07-03 Thread Andrew Dunstan


On 2022-06-30 Th 04:19, Amit Kapila wrote:
> On Wed, Jun 29, 2022 at 6:58 PM Erik Rijkers  wrote:
>> Op 29-06-2022 om 15:00 schreef Amit Kapila:
>>> On Mon, Jun 27, 2022 at 8:46 PM Andrew Dunstan  wrote:
 On 2022-06-26 Su 11:44, Erik Rijkers wrote:
> JSON/SQL jsonpath
>
> For example, a jsonpath string with deliberate typo 'like_regexp'
> (instead of 'like_regex'):
>
> select js
> from (values (jsonb '{}')) as f(js)
> where js @? '$ ? (@ like_regexp "^xxx")';
>
> ERROR:  syntax error, unexpected IDENT_P at or near " " of jsonpath input
> LINE 1: ...s from (values (jsonb '{}')) as f(js) where js @? '$ ? (@
> li...
>
> Both  'IDENT_P'  and  'at or near " "'  seem pretty useless.
>
>>> removing this. One thing that is not clear to me is why OP sees an
>>> acceptable message (ERROR:  syntax error, unexpected invalid token at
>>> or near "=" of jsonpath input) for a similar query in 14?
>> To mention that was perhaps unwise of me because The  IDENT_P (or more
>> generally, *_P)  messages can be provoked on 14 too.
>>
> Okay, then I think it is better to backpatch this fix.



Done.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: [PoC] Reducing planning time when tables have many partitions

2022-07-03 Thread Tom Lane
Yuya Watari  writes:
> On Thu, Mar 24, 2022 at 11:03 AM David Rowley  wrote:
>> I think a better way to solve this would be just to have a single hash
>> table over all EquivalenceClasses that allows fast lookups of
>> EquivalenceMember->em_expr.

> If the predicate were "em->em_expr == something", the hash table whose
> key is em_expr would be effective. However, the actual predicates are
> not of this type but the following.

> // Find EquivalenceMembers whose relids is equal to the given relids
> (1) bms_equal(em->em_relids, relids)

> // Find EquivalenceMembers whose relids is a subset of the given relids
> (2) bms_is_subset(em->em_relids, relids)

Yeah, that's a really interesting observation, and I agree that
David's suggestion doesn't address it.  Maybe after we fix this
problem, matching of em_expr would be the next thing to look at,
but your results say it isn't the first thing.

I'm not real thrilled with trying to throw hashtables at the problem,
though.  As David noted, they'd be counterproductive for simple
queries.  Sure, we could address that with duplicate code paths,
but that's a messy and hard-to-tune approach.  Also, I find the
idea of hashing on all subsets of relids to be outright scary.
"m is not so large in most cases" does not help when m *is* large.

For the bms_equal class of lookups, I wonder if we could get anywhere
by adding an additional List field to every RelOptInfo that chains
all EquivalenceMembers that match that RelOptInfo's relids.
The trick here would be to figure out when to build those lists.
The simple answer would be to do it lazily on-demand, but that
would mean a separate scan of all the EquivalenceMembers for each
RelOptInfo; I wonder if there's a way to do better?

Perhaps the bms_is_subset class could be handled in a similar
way, ie do a one-time pass to make a List of all EquivalenceMembers
that use a RelOptInfo.
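
As a rough sketch of what a lookup could then look like (the eclass_members
field is hypothetical, not an existing RelOptInfo member; assumes
nodes/pathnodes.h and nodes/pg_list.h), the bms_equal class of callers would
walk a list already restricted to members matching the relation's relids:

static EquivalenceMember *
find_em_for_rel(EquivalenceClass *ec, RelOptInfo *rel)
{
	ListCell   *lc;

	foreach(lc, rel->eclass_members)	/* hypothetical field */
	{
		EquivalenceMember *em = (EquivalenceMember *) lfirst(lc);

		/*
		 * A production version would want a cheaper way to test which
		 * class a member belongs to (e.g. a back-pointer) rather than
		 * this linear membership check.
		 */
		if (list_member_ptr(ec->ec_members, em))
			return em;
	}
	return NULL;
}

The open question, as above, is when to populate such lists.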

regards, tom lane




Re: [PATCH] Completed unaccent dictionary with many missing characters

2022-07-03 Thread Przemysław Sztoch

Michael Paquier wrote on 6/28/2022 7:14 AM:

On Thu, Jun 23, 2022 at 02:10:42PM +0200, Przemysław Sztoch wrote:

The only division that is probably possible is the one attached.

Well, the addition of Cyrillic does not make necessary the removal of
SOUND RECORDING COPYRIGHT or the DEGREEs; that implies the use of a
dictionary when manipulating the set of codepoints, but that's me
being too picky.  Just to say that I am fine with what you are
proposing here.

By the way, could you add a couple of regressions tests for each
patch with a sample of the characters added?  U+210C is a particularly
sensitive case, as we should really make sure that it maps to what we
want even if Latin-ASCII.xml tells a different story.  This requires
the addition of a couple of queries in unaccent.sql with the expected
output updated in unaccent.out.
--
Michael

Regression tests have been added.
--
Przemysław Sztoch | Mobile +48 509 99 00 66
From bc0cbd36f5dfece10306466437d5de2dc676f485 Mon Sep 17 00:00:00 2001
From: Przemyslaw Sztoch 
Date: Thu, 23 Jun 2022 13:56:09 +0200
Subject: [PATCH 1/2] Update unnaccent rules generator

add cirillic alpha
add digits
set->dict
---
 contrib/unaccent/expected/unaccent.out  |   24 +
 contrib/unaccent/generate_unaccent_rules.py |   66 +-
 contrib/unaccent/sql/unaccent.sql   |5 +
 contrib/unaccent/unaccent.rules | 1056 +++
 4 files changed, 1115 insertions(+), 36 deletions(-)

diff --git a/contrib/unaccent/expected/unaccent.out 
b/contrib/unaccent/expected/unaccent.out
index c1bd7cd897..090a0b3889 100644
--- a/contrib/unaccent/expected/unaccent.out
+++ b/contrib/unaccent/expected/unaccent.out
@@ -37,6 +37,18 @@ SELECT unaccent('À');  -- Remove combining diacritical 
0x0300
  A
 (1 row)
 
+SELECT unaccent('ℌ'); -- Controversial conversion from Latin-ASCII.xml
+ unaccent 
+--
+ x
+(1 row)
+
+SELECT unaccent('Chrząszcza szczudłem przechrzcił wąż, Strząsa skrzydła z 
dżdżu,'); -- Polish
+unaccent 
+-
+ Chrzaszcza szczudlem przechrzcil waz, Strzasa skrzydla z dzdzu,
+(1 row)
+
 SELECT unaccent('unaccent', 'foobar');
  unaccent 
 --
@@ -67,6 +79,12 @@ SELECT unaccent('unaccent', 'À');
  A
 (1 row)
 
+SELECT unaccent('unaccent', 'Łódź ąćęłńóśżź ĄĆĘŁŃÓŚŻŹ'); -- Polish letters
+ unaccent 
+--
+ Lodz acelnoszz ACELNOSZZ
+(1 row)
+
 SELECT ts_lexize('unaccent', 'foobar');
  ts_lexize 
 ---
@@ -97,3 +115,9 @@ SELECT ts_lexize('unaccent', 'À');
  {A}
 (1 row)
 
+SELECT unaccent('unaccent', '⅓ ⅜ ℃ ℉ ℗');
+ unaccent  
+---
+ 1/3 3/8 °C °F (P)
+(1 row)
+
diff --git a/contrib/unaccent/generate_unaccent_rules.py 
b/contrib/unaccent/generate_unaccent_rules.py
index c405e231b3..71932c8224 100644
--- a/contrib/unaccent/generate_unaccent_rules.py
+++ b/contrib/unaccent/generate_unaccent_rules.py
@@ -35,13 +35,16 @@ import xml.etree.ElementTree as ET
 sys.stdout = codecs.getwriter('utf8')(sys.stdout.buffer)
 
 # The ranges of Unicode characters that we consider to be "plain letters".
-# For now we are being conservative by including only Latin and Greek.  This
-# could be extended in future based on feedback from people with relevant
-# language knowledge.
+# For now we are being conservative by including only Latin, Greek and 
Cyrillic.
+# This could be extended in future based on feedback from people with
+# relevant language knowledge.
 PLAIN_LETTER_RANGES = ((ord('a'), ord('z')),  # Latin lower case
(ord('A'), ord('Z')),  # Latin upper case
-   (0x03b1, 0x03c9),  # GREEK SMALL LETTER ALPHA, 
GREEK SMALL LETTER OMEGA
-   (0x0391, 0x03a9))  # GREEK CAPITAL LETTER ALPHA, 
GREEK CAPITAL LETTER OMEGA
+   (ord('0'), ord('9')),  # Digits
+   (0x0391, 0x03a9),  # Greek capital letters 
(ALPHA-OMEGA)
+   (0x03b1, 0x03c9),  # Greek small letters 
(ALPHA-OMEGA)
+   (0x0410, 0x044f),  # Cyrillic capital and small 
letters
+   (0x00b0, 0x00b0))  # Degree sign
 
 # Combining marks follow a "base" character, and result in a composite
 # character. Example: "U&'A\0300'"produces "À".There are three types of
@@ -139,24 +142,24 @@ def get_plain_letter(codepoint, table):
 return codepoint
 
 # Should not come here
-assert(False)
+assert False, 'Codepoint U+%0.2X' % codepoint.id
 
 
 def is_ligature(codepoint, table):
 """Return true for letters combined with letters."""
-return all(is_letter(table[i], table) for i in codepoint.combining_ids)
+return all(i in table and is_letter(table[i], table) for i in 
codepoint.combining_ids)
 
 
 def get_plain_letters(codepoint, table):
 """Return a list of plain letters from a ligature."""
-assert(is_ligature(codepoint

Re: Allow makeaclitem() to accept multiple privileges

2022-07-03 Thread Tom Lane
"Tharakan, Robins"  writes:
> Presently, makeaclitem() allows only a single privilege in a single call.
> This
> patch allows it to additionally accept multiple comma-separated privileges.

Seems reasonable.  Pushed with minor cosmetic adjustments (mostly
docs changes).

regards, tom lane




Re: Fix proposal for comparaison bugs in PostgreSQL::Version

2022-07-03 Thread Jehan-Guillaume de Rorthais
On Sun, 3 Jul 2022 10:40:21 -0400
Andrew Dunstan  wrote:

> On 2022-06-29 We 05:09, Jehan-Guillaume de Rorthais wrote:
> > On Tue, 28 Jun 2022 18:17:40 -0400
> > Andrew Dunstan  wrote:
> >  
> >> On 2022-06-28 Tu 16:53, Jehan-Guillaume de Rorthais wrote:  
> >>> ...
> >>> A better fix would be to store the version internally as version_num that
> >>> are trivial to compute and compare. Please, find in attachment an
> >>> implementation of this.
> >>>
> >>> The patch is a bit bigger because it improved the devel version to support
> >>> rc/beta/alpha comparison like 14rc2 > 14rc1.
> >>>
> >>> Moreover, it adds a bunch of TAP tests to check various use cases.
> >>
> >> Nice catch, but this looks like massive overkill. I think we can very
> >> simply fix the test in just a few lines of code, instead of a 190 line
> >> fix and a 130 line TAP test.  
> > I explained why the patch was a little bit larger than required: it fixes
> > the bugs and does a little bit more. The _version_cmp sub is shorter and
> > easier to understand, I use multi-line code where I could probably fold
> > it into a one-liner, and added some comments... Anyway, I don't feel the
> > number of lines changed is "massive". But I can probably remove some code
> > and shrink some other parts if it is really important...
> >
> > Moreover, to be honest, I don't mind the number of additional lines of TAP
> > tests. Especially since it runs really, really fast and doesn't hurt
> > day-to-day devs as it is independent from other TAP tests anyway. It could
> > be 1k; if it runs fast, is meaningful, and helps avoid future regressions,
> > I would welcome the addition.
> 
> 
> I don't see the point of having a TAP test at all. We have TAP tests for
> testing the substantive products we test, not for the test suite
> infrastructure. Otherwise, where will we stop? Shall we have tests for
> the things that test the test suite?

Tons of Perl modules have regression tests. As for the question of where testing
should stop, it seems the Test::More module itself is not the last frontier:
https://github.com/Test-More/test-more/tree/master/t

Moreover, the PostgreSQL::Version is not a TAP test module, but a module to
deal with PostgreSQL versions and compare them.

Testing makes development faster as well when it comes to testing the code.
Instead of testing vaguely and manually, you can test a whole bunch of situations
and accumulate some more when you think about a new one or when a bug is
reported. Having TAP tests helps to make sure the code works as expected.

It helped me when creating my patch. With all due respect, I just don't
understand your arguments against them. Neither the number of lines nor the
question of where testing should stop holds much weight.

> > If we really want to save some bytes, I have a two lines worth of code fix
> > that looks more readable to me than fixing _version_cmp:
> >
> > +++ b/src/test/perl/PostgreSQL/Version.pm
> > @@ -92,9 +92,13 @@ sub new
> > # Split into an array
> > my @numbers = split(/\./, $arg);
> >  
> > +   # make sure all digit of the array-represented version are set so
> > we can
> > +   # keep _version_cmp code as a "simple" digit-to-digit comparison
> > loop
> > +   $numbers[$_] += 0 for 0..3;
> > +
> > # Treat development versions as having a minor/micro version one
> > less than # the first released version of that branch.
> > -   push @numbers, -1 if ($devel);
> > +   $numbers[3] = -1 if $devel;
> >  
> > $devel ||= "";  
> 
> I don't see why this is any more readable.

The rewritten _version_cmp is much more readable.

But anyway, this is not the point. Using an array to compare versions where we
can use version_num seems like useless and buggy convolutions to me.

Regards,




Re: 15beta1 tab completion of extension versions

2022-07-03 Thread Tom Lane
Noah Misch  writes:
> On Sun, Jul 03, 2022 at 02:00:59PM -0400, Tom Lane wrote:
>> This will require one extra  when what you want is to update to
>> a specific version, but I doubt that that's going to bother anyone
>> very much.  I don't want to try to resurrect the v14 behavior exactly
>> because it's too much of a mess from a quoting standpoint.

> Works for me, and I agree the patch implements that successfully.  "ALTER
> EXTENSION x UPDATE;" is an infrequent command, and "ALTER EXTENSION x UPDATE
> TO ..." is even less frequent.  It's not worth much special effort to shave
>  steps.

Done that way then.  Thanks for the report.

regards, tom lane




Re: automatically generating node support functions

2022-07-03 Thread Tom Lane
Peter Eisentraut  writes:
> Here is a patch that reformats the relevant (and a few more) comments 
> that way.  This has been run through pgindent, so the formatting should 
> be stable.

Now that that's been pushed, the main patch is of course quite broken.
Are you working on a rebase?

I looked through the last published version of the main patch (Alvaro's
0002 from 2022-04-19), without trying to actually test it, and found
a couple of things that look wrong in the Makefiles:

* AFAICT, the infrastructure for removing the generated files at
"make *clean" is incomplete.  In particular I don't see any code
for removing the symlinks or the associated stamp file during
"make clean".  It looks like the existing header symlinks are
all cleaned up in src/include/Makefile's "clean" rule, so you
could do likewise for these.  Also, the "make maintainer-clean"
infrastructure seems incomplete --- shouldn't src/backend/Makefile's
maintainer-clean rule now also do
$(MAKE) -C nodes $@
?

* There are some useful comments in backend/utils/Makefile that
I think should be carried over along with the make rules that
(it looks like) you cribbed from there, notably

# fmgr-stamp records the last time we ran Gen_fmgrtab.pl.  We don't rely on
# the timestamps of the individual output files, because the Perl script
# won't update them if they didn't change (to avoid unnecessary recompiles).

# These generated headers must be symlinked into builddir/src/include/,
# using absolute links for the reasons explained in src/backend/Makefile.
# We use header-stamp to record that we've done this because the symlinks
# themselves may appear older than fmgr-stamp.

and something similar to this for the "clean" rule:
# fmgroids.h, fmgrprotos.h, fmgrtab.c, fmgr-stamp, and errcodes.h are in the
# distribution tarball, so they are not cleaned here.


Also, I share David's upthread allergy to the option names "path_hackN"
and to documenting those only inside the conversion script.  I think
the existing text that you moved into the script, such as this bit:

# We do not print the parent, else we'd be in infinite
# recursion.  We can print the parent's relids for
# identification purposes, though.  We print the pathtarget
# only if it's not the default one for the rel.  We also do
# not print the whole of param_info, since it's printed via
# RelOptInfo; it's sufficient and less cluttering to print
# just the required outer relids.

is perfectly adequate as documentation, it just needs to be somewhere else
(pathnodes.h seems fine, if not nodes.h) and labeled as to exactly which
pg_node_attr option invokes which behavior.

BTW, I think this: "Unknown attributes are ignored" is a seriously
bad idea; it will allow typos to escape detection.

regards, tom lane




Re: 15beta1 tab completion of extension versions

2022-07-03 Thread Noah Misch
On Sun, Jul 03, 2022 at 02:00:59PM -0400, Tom Lane wrote:
> I wrote:
> > Noah Misch  writes:
> >> "ALTER EXTENSION hstore UPDATE;" is a valid command (updates to the
> >> control file default version).  Hence, I think the v14 behavior was better.
> 
> > Hmm ... good point, let me think about that.
> 
> After consideration, my preferred solution is just this:
> 
> diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
> index 463cac9fb0..c5cafe6f4b 100644
> --- a/src/bin/psql/tab-complete.c
> +++ b/src/bin/psql/tab-complete.c
> @@ -1927,7 +1927,7 @@ psql_completion(const char *text, int start, int end)
>  
>   /* ALTER EXTENSION  */
>   else if (Matches("ALTER", "EXTENSION", MatchAny))
> - COMPLETE_WITH("ADD", "DROP", "UPDATE TO", "SET SCHEMA");
> + COMPLETE_WITH("ADD", "DROP", "UPDATE", "SET SCHEMA");
>  
>   /* ALTER EXTENSION  UPDATE */
>   else if (Matches("ALTER", "EXTENSION", MatchAny, "UPDATE"))
> 
> This will require one extra  when what you want is to update to
> a specific version, but I doubt that that's going to bother anyone
> very much.  I don't want to try to resurrect the v14 behavior exactly
> because it's too much of a mess from a quoting standpoint.

Works for me, and I agree the patch implements that successfully.  "ALTER
EXTENSION x UPDATE;" is an infrequent command, and "ALTER EXTENSION x UPDATE
TO ..." is even less frequent.  It's not worth much special effort to shave
 steps.




Re: 15beta1 tab completion of extension versions

2022-07-03 Thread Tom Lane
I wrote:
> Noah Misch  writes:
>> "ALTER EXTENSION hstore UPDATE;" is a valid command (updates to the
>> control file default version).  Hence, I think the v14 behavior was better.

> Hmm ... good point, let me think about that.

After consideration, my preferred solution is just this:

diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 463cac9fb0..c5cafe6f4b 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -1927,7 +1927,7 @@ psql_completion(const char *text, int start, int end)
 
/* ALTER EXTENSION  */
else if (Matches("ALTER", "EXTENSION", MatchAny))
-   COMPLETE_WITH("ADD", "DROP", "UPDATE TO", "SET SCHEMA");
+   COMPLETE_WITH("ADD", "DROP", "UPDATE", "SET SCHEMA");
 
/* ALTER EXTENSION  UPDATE */
else if (Matches("ALTER", "EXTENSION", MatchAny, "UPDATE"))

This will require one extra  when what you want is to update to
a specific version, but I doubt that that's going to bother anyone
very much.  I don't want to try to resurrect the v14 behavior exactly
because it's too much of a mess from a quoting standpoint.

regards, tom lane




Re: O(n) tasks cause lengthy startups and checkpoints

2022-07-03 Thread Andres Freund
Hi,

On 2022-07-03 10:07:54 -0700, Nathan Bossart wrote:
> Thanks for the prompt review.
> 
> On Sat, Jul 02, 2022 at 03:54:56PM -0700, Andres Freund wrote:
> > On 2022-07-02 15:05:54 -0700, Nathan Bossart wrote:
> >> +  /* Obtain requested tasks */
> >> +  SpinLockAcquire(&CustodianShmem->cust_lck);
> >> +  flags = CustodianShmem->cust_flags;
> >> +  CustodianShmem->cust_flags = 0;
> >> +  SpinLockRelease(&CustodianShmem->cust_lck);
> > 
> > Just resetting the flags to 0 is problematic. Consider what happens if 
> > there's
> > two tasks and the one processed first errors out. You'll lose
> > information
> > about needing to run the second task.
> 
> I think we also want to retry any failed tasks.

I don't think so, at least not if it's just going to retry that task straight
away - then we'll get stuck on that one task forever. If we had the ability to
"queue" it at the end, to be processed after other already dequeued tasks, it'd
be a different story.


> The way v6 handles this is by requesting all tasks after an exception.

Ick. That strikes me as a bad idea.


> >> +/*
> >> + * RequestCustodian
> >> + *Called to request a custodian task.
> >> + */
> >> +void
> >> +RequestCustodian(int flags)
> >> +{
> >> +  SpinLockAcquire(&CustodianShmem->cust_lck);
> >> +  CustodianShmem->cust_flags |= flags;
> >> +  SpinLockRelease(&CustodianShmem->cust_lck);
> >> +
> >> +  if (ProcGlobal->custodianLatch)
> >> +  SetLatch(ProcGlobal->custodianLatch);
> >> +}
> > 
> > With this representation we can't really implement waiting for a task or
> > such. And it doesn't seem like a great API for the caller to just specify a
> > mix of flags.
> 
> At the moment, the idea is that nothing should need to wait for a task
> because the custodian only handles things that are relatively non-critical.

Which is just plainly not true as the patchset stands...

I think we're going to have to block if some cleanup as part of a checkpoint
hasn't been completed by the next checkpoint - otherwise it'll just end up
being way too confusing and there's absolutely no backpressure anymore.


> If that changes, this could probably be expanded to look more like
> RequestCheckpoint().
> 
> What would you suggest using instead of a mix of flags?

I suspect an array of tasks with requested and completed counters or such?
With a condition variable to wait on?
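
For illustration, a rough sketch of that shape (apart from the existing
CustodianShmem and custodianLatch, every name below is invented, including
the wait event):

typedef struct CustodianTask
{
    pg_atomic_uint64 requested;     /* bumped by requesters */
    pg_atomic_uint64 completed;     /* bumped by the custodian when done */
} CustodianTask;

typedef struct CustodianShmemStruct
{
    CustodianTask   tasks[NUM_CUSTODIAN_TASKS]; /* one slot per task kind */
    ConditionVariable cv;           /* broadcast after finishing any task */
} CustodianShmemStruct;

static void
RequestCustodianTaskAndWait(int kind)
{
    CustodianTask *task = &CustodianShmem->tasks[kind];
    uint64      target = pg_atomic_add_fetch_u64(&task->requested, 1);

    if (ProcGlobal->custodianLatch)
        SetLatch(ProcGlobal->custodianLatch);

    ConditionVariablePrepareToSleep(&CustodianShmem->cv);
    while (pg_atomic_read_u64(&task->completed) < target)
        ConditionVariableSleep(&CustodianShmem->cv, WAIT_EVENT_CUSTODIAN_TASK);
    ConditionVariableCancelSleep();
}

The custodian side would read requested, run the task, store that value into
completed, and ConditionVariableBroadcast() the CV.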


> >> +  /* let the custodian know what it can remove */
> >> +  CustodianSetLogicalRewriteCutoff(cutoff);
> > 
> > Setting this variable in a custodian datastructure and then fetching it from
> > there seems architecturally wrong to me.
> 
> Where do you think it should go?  I previously had it in the checkpointer's
> shared memory, but you didn't like that the functions were declared in
> bgwriter.h (along with the other checkpoint stuff).  If the checkpointer
> shared memory is the right place, should we create checkpointer.h and use
> that instead?

Well, so far I have not understood what the whole point of the shared state
is, so I have a bit of a hard time answering this ;)


> >> +/*
> >> + * Remove all mappings not needed anymore based on the logical restart 
> >> LSN saved
> >> + * by the checkpointer.  We use this saved value instead of calling
> >> + * ReplicationSlotsComputeLogicalRestartLSN() so that we don't interfere 
> >> with an
> >> + * ongoing call to CheckPointLogicalRewriteHeap() that is flushing 
> >> mappings to
> >> + * disk.
> >> + */
> > 
> > What interference could there be?
> 
> My concern is that the custodian could obtain a later cutoff than what the
> checkpointer does, which might cause files to be concurrently unlinked and
> fsync'd.  If we always use the checkpointer's cutoff, that shouldn't be a
> problem.  This could probably be better explained in this comment.

How about having a Datum argument to RequestCustodian() that is forwarded to
the task?
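
For instance (sketch only; the task constant is made up):

extern void RequestCustodian(int task, Datum arg);

/* e.g. from CheckPointLogicalRewriteHeap(), once the cutoff is computed: */
RequestCustodian(CUSTODIAN_REMOVE_REWRITE_MAPPINGS, LSNGetDatum(cutoff));

That would let the task take the cutoff as an argument instead of reading it
back out of shared state.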


> >> +void
> >> +RemoveOldLogicalRewriteMappings(void)
> >> +{
> >> +  XLogRecPtr  cutoff;
> >> +  DIR*mappings_dir;
> >> +  struct dirent *mapping_de;
> >> +  charpath[MAXPGPATH + 20];
> >> +  boolvalue_set = false;
> >> +
> >> +  cutoff = CustodianGetLogicalRewriteCutoff(&value_set);
> >> +  if (!value_set)
> >> +  return;
> > 
> > Afaics nothing clears value_set - is that a good idea?
> 
> I'm using value_set to differentiate the case where InvalidXLogRecPtr means
> the checkpointer hasn't determined a value yet versus the case where it
> has.  In the former, we don't want to take any action.  In the latter, we
> want to unlink all the files.  Since we're moving to a request model for
> the custodian, I might be able to remove this value_set stuff completely.
> If that's not possible, it probably deserves a better comment.

It would.

Greetings,

Andres Freund




Re: replacing role-level NOINHERIT with a grant-level option

2022-07-03 Thread Nathan Bossart
On Sat, Jul 02, 2022 at 11:04:28PM -0400, Robert Haas wrote:
> On Sat, Jul 2, 2022 at 6:16 PM Nathan Bossart  
> wrote:
>> I was thinking that when DEFAULT was removed, pg_dump would just need to
>> generate WITH INHERIT TRUE/FALSE based on the value of rolinherit for older
>> versions.  Using the role-level property as the default for future grants
>> seems a viable strategy, although it would break backward compatibility.
>> For example, if I create a NOINHERIT role, grant a bunch of roles to it,
>> and then change it to INHERIT, the role won't begin inheriting the
>> privileges of the roles it is a member of.  Right now, it does.
> 
> I think the idea you propose here is interesting, because I think it
> proves that committing v2 or something like it doesn't really lock us
> into the role-level property any more than we already are, which at
> least makes me feel slightly less bad about that option. However, if
> there's implacable opposition to any compatibility break at any point,
> then maybe this plan would never actually be implemented in practice.
> And if there's not, maybe we can be bolder now.

If by "bolder" you mean "mark [NO]INHERIT as deprecated-and-to-be-removed
and begin emitting WARNINGs when it and WITH INHERIT DEFAULT are used," I
think it's worth consideration.  I suspect it will be hard to sell removing
[NO]INHERIT in v16 because it would introduce a compatibility break without
giving users much time to migrate.  I could be wrong, though.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: PSA: Autoconf has risen from the dead

2022-07-03 Thread Andres Freund
Hi,

On 2022-07-03 10:50:49 -0400, Tom Lane wrote:
> Robert Haas  writes:
> > Hmm, I also don't know how annoying it's going to be to get the new
> > ninja/meson stuff working on macOS ... I really hope someone puts a
> > good set of directions on the wiki or in the documentation or
> > someplace.

Yea, I guess I should start a documentation section...

I've only used homebrew on mac, but with that it should be something along the
lines of

brew install meson
meson setup --buildtype debug -Dcassert=true build-directory
cd build-directory
ninja

Of course, if you want to build against some dependencies and/or run TAP
tests, you need to do something similar to what you have to do for
configure, i.e.
- install perl modules [1]
- tell the build about location of homebrew [2]


> If you use MacPorts it's just "install those packages", and I imagine
> the same for Homebrew.  I've not tried build-from-source on modern
> platforms.

I've done some semi-automated testing (to be made fully automatic) across
meson versions that so far hasn't shown any need for that. We do require a
certain minimum version of meson (indicated in the top-level meson.build,
raises an error if not met), which in turn requires a minimum version of ninja
(also errors).

The windows build with msbuild is slower on older versions of meson that are
unproblematic on other platforms. But given you're not going to install an
outdated meson from $package-manager there, I don't think it's worth worrying
about.

Greetings,

Andres Freund

[1] https://github.com/anarazel/postgres/blob/meson/.cirrus.yml#L638
[2] https://github.com/anarazel/postgres/blob/meson/.cirrus.yml#L742




Re: SQL/JSON functions vs. ECPG vs. STRING as a reserved word

2022-07-03 Thread Tom Lane
Noah Misch  writes:
> On Mon, May 30, 2022 at 05:20:15PM -0400, Tom Lane wrote:
>> [allow EXEC SQL TYPE unreserved_keyword IS ...]

> I didn't locate any problems beyond the test and doc gaps that you mentioned,
> so I've marked this Ready for Committer.

Thanks!  Here's a fleshed-out version with doc changes, plus adjustment
of preproc/type.pgc so that it exposes the existing problem.  (No code
changes from v1.)  I'll push this in a few days if there are not
objections.

regards, tom lane

diff --git a/doc/src/sgml/ecpg.sgml b/doc/src/sgml/ecpg.sgml
index 7f8b4dd5c0..0ba1bf93a6 100644
--- a/doc/src/sgml/ecpg.sgml
+++ b/doc/src/sgml/ecpg.sgml
@@ -1483,6 +1483,10 @@ EXEC SQL END DECLARE SECTION;
 
 
  Typedefs
+ 
+  typedef
+  in ECPG
+ 
 
  
   Use the typedef keyword to map new types to already
@@ -1497,8 +1501,38 @@ EXEC SQL END DECLARE SECTION;
 
 EXEC SQL TYPE serial_t IS long;
 
-  This declaration does not need to be part of a declare section.
+  This declaration does not need to be part of a declare section;
+  that is, you can also write typedefs as normal C statements.
  
+
+ 
+  Any word you declare as a typedef cannot be used as a SQL keyword
+  in EXEC SQL commands later in the same program.
+  For example, this won't work:
+
+EXEC SQL BEGIN DECLARE SECTION;
+typedef int start;
+EXEC SQL END DECLARE SECTION;
+...
+EXEC SQL START TRANSACTION;
+
+  ECPG will report a syntax error for START
+  TRANSACTION, because it no longer
+  recognizes START as a SQL keyword,
+  only as a typedef.
+ 
+
+ 
+  
+   In PostgreSQL releases before v16, use
+   of SQL keywords as typedef names was likely to result in syntax
+   errors associated with use of the typedef itself, rather than use
+   of the name as a SQL keyword.  The new behavior is less likely to
+   cause problems when an existing ECPG application is recompiled in
+   a new PostgreSQL release with new
+   keywords.
+  
+ 
 
 
 
diff --git a/src/interfaces/ecpg/preproc/ecpg.trailer b/src/interfaces/ecpg/preproc/ecpg.trailer
index b95fc44314..fba35f6be6 100644
--- a/src/interfaces/ecpg/preproc/ecpg.trailer
+++ b/src/interfaces/ecpg/preproc/ecpg.trailer
@@ -564,8 +564,29 @@ var_type:	simple_type
 			$$.type_index = mm_strdup("-1");
 			$$.type_sizeof = NULL;
 		}
-		| ECPGColLabelCommon '(' precision opt_scale ')'
+		| NUMERIC '(' precision opt_scale ')'
 		{
+			$$.type_enum = ECPGt_numeric;
+			$$.type_str = mm_strdup("numeric");
+			$$.type_dimension = mm_strdup("-1");
+			$$.type_index = mm_strdup("-1");
+			$$.type_sizeof = NULL;
+		}
+		| DECIMAL_P '(' precision opt_scale ')'
+		{
+			$$.type_enum = ECPGt_decimal;
+			$$.type_str = mm_strdup("decimal");
+			$$.type_dimension = mm_strdup("-1");
+			$$.type_index = mm_strdup("-1");
+			$$.type_sizeof = NULL;
+		}
+		| IDENT '(' precision opt_scale ')'
+		{
+			/*
+			 * In C parsing mode, NUMERIC and DECIMAL are not keywords, so
+			 * they will show up here as a plain identifier, and we need
+			 * this duplicate code to recognize them.
+			 */
 			if (strcmp($1, "numeric") == 0)
 			{
 $$.type_enum = ECPGt_numeric;
@@ -587,15 +608,98 @@ var_type:	simple_type
 			$$.type_index = mm_strdup("-1");
 			$$.type_sizeof = NULL;
 		}
-		| ECPGColLabelCommon ecpg_interval
+		| VARCHAR
 		{
-			if (strlen($2) != 0 && strcmp ($1, "datetime") != 0 && strcmp ($1, "interval") != 0)
-mmerror (PARSE_ERROR, ET_ERROR, "interval specification not allowed here");
+			$$.type_enum = ECPGt_varchar;
+			$$.type_str = EMPTY; /*mm_strdup("varchar");*/
+			$$.type_dimension = mm_strdup("-1");
+			$$.type_index = mm_strdup("-1");
+			$$.type_sizeof = NULL;
+		}
+		| FLOAT_P
+		{
+			/* Note: DOUBLE is handled in simple_type */
+			$$.type_enum = ECPGt_float;
+			$$.type_str = mm_strdup("float");
+			$$.type_dimension = mm_strdup("-1");
+			$$.type_index = mm_strdup("-1");
+			$$.type_sizeof = NULL;
+		}
+		| NUMERIC
+		{
+			$$.type_enum = ECPGt_numeric;
+			$$.type_str = mm_strdup("numeric");
+			$$.type_dimension = mm_strdup("-1");
+			$$.type_index = mm_strdup("-1");
+			$$.type_sizeof = NULL;
+		}
+		| DECIMAL_P
+		{
+			$$.type_enum = ECPGt_decimal;
+			$$.type_str = mm_strdup("decimal");
+			$$.type_dimension = mm_strdup("-1");
+			$$.type_index = mm_strdup("-1");
+			$$.type_sizeof = NULL;
+		}
+		| TIMESTAMP
+		{
+			$$.type_enum = ECPGt_timestamp;
+			$$.type_str = mm_strdup("timestamp");
+			$$.type_dimension = mm_strdup("-1");
+			$$.type_index = mm_strdup("-1");
+			$$.type_sizeof = NULL;
+		}
+		| INTERVAL ecpg_interval
+		{
+			$$.type_enum = ECPGt_interval;
+			$$.type_str = mm_strdup("interval");
+			$$.type_dimension = mm_strdup("-1");
+			$$.type_index = mm_strdup("-1");
+			$$.type_sizeof = NULL;
+		}
+		| STRING
+		{
+			if (INFORMIX_MODE)
+			{
+/* In Informix mode, "string" is automatically a typedef */
+$$.type_enum = ECPGt

Re: O(n) tasks cause lengthy startups and checkpoints

2022-07-03 Thread Nathan Bossart
Hi Andres,

Thanks for the prompt review.

On Sat, Jul 02, 2022 at 03:54:56PM -0700, Andres Freund wrote:
> On 2022-07-02 15:05:54 -0700, Nathan Bossart wrote:
>> +/* Obtain requested tasks */
>> +SpinLockAcquire(&CustodianShmem->cust_lck);
>> +flags = CustodianShmem->cust_flags;
>> +CustodianShmem->cust_flags = 0;
>> +SpinLockRelease(&CustodianShmem->cust_lck);
> 
> Just resetting the flags to 0 is problematic. Consider what happens if there's
> two tasks and the one processed first errors out. You'll lose information
> about needing to run the second task.

I think we also want to retry any failed tasks.  The way v6 handles this is
by requesting all tasks after an exception.  Another way to handle this
could be to reset each individual flag before the task is executed, and
then we could surround each one with a PG_CATCH block that resets the flag.
I'll do it this way in the next revision.
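
Roughly along these lines (sketch only; the flag name is illustrative):

    SpinLockAcquire(&CustodianShmem->cust_lck);
    do_task = (CustodianShmem->cust_flags & CUSTODIAN_REMOVE_REWRITE_MAPPINGS) != 0;
    CustodianShmem->cust_flags &= ~CUSTODIAN_REMOVE_REWRITE_MAPPINGS;
    SpinLockRelease(&CustodianShmem->cust_lck);

    if (do_task)
    {
        PG_TRY();
        {
            RemoveOldLogicalRewriteMappings();
        }
        PG_CATCH();
        {
            /* put the request back so the task is retried later */
            RequestCustodian(CUSTODIAN_REMOVE_REWRITE_MAPPINGS);
            PG_RE_THROW();
        }
        PG_END_TRY();
    }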

>> +/*
>> + * RequestCustodian
>> + *  Called to request a custodian task.
>> + */
>> +void
>> +RequestCustodian(int flags)
>> +{
>> +SpinLockAcquire(&CustodianShmem->cust_lck);
>> +CustodianShmem->cust_flags |= flags;
>> +SpinLockRelease(&CustodianShmem->cust_lck);
>> +
>> +if (ProcGlobal->custodianLatch)
>> +SetLatch(ProcGlobal->custodianLatch);
>> +}
> 
> With this representation we can't really implement waiting for a task or
> such. And it doesn't seem like a great API for the caller to just specify a
> mix of flags.

At the moment, the idea is that nothing should need to wait for a task
because the custodian only handles things that are relatively non-critical.
If that changes, this could probably be expanded to look more like
RequestCheckpoint().

What would you suggest using instead of a mix of flags?

>> +/* Calculate how long to sleep */
>> +end_time = (pg_time_t) time(NULL);
>> +elapsed_secs = end_time - start_time;
>> +if (elapsed_secs >= CUSTODIAN_TIMEOUT_S)
>> +continue;   /* no sleep for us */
>> +cur_timeout = CUSTODIAN_TIMEOUT_S - elapsed_secs;
>> +
>> +(void) WaitLatch(MyLatch,
>> + WL_LATCH_SET | WL_TIMEOUT | 
>> WL_EXIT_ON_PM_DEATH,
>> + cur_timeout * 1000L /* convert 
>> to ms */ ,
>> + WAIT_EVENT_CUSTODIAN_MAIN);
>> +}
> 
> I don't think we should have this thing wake up on a regular basis. We're
> doing way too much of that already, and I don't think we should add
> more. Either we need a list of times when tasks need to be processed and wake
> up at that time, or just wake up if somebody requests a task.

I agree.  I will remove the timeout in the next revision.
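
That is, roughly (sketch, not the actual revision):

    /* Sleep until somebody requests a task; no periodic wakeup. */
    (void) WaitLatch(MyLatch,
                     WL_LATCH_SET | WL_EXIT_ON_PM_DEATH,
                     -1L,   /* no timeout */
                     WAIT_EVENT_CUSTODIAN_MAIN);
    ResetLatch(MyLatch);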

>> diff --git a/src/backend/postmaster/postmaster.c 
>> b/src/backend/postmaster/postmaster.c
>> index e67370012f..82aa0c6307 100644
>> --- a/src/backend/postmaster/postmaster.c
>> +++ b/src/backend/postmaster/postmaster.c
>> @@ -1402,7 +1402,8 @@ PostmasterMain(int argc, char *argv[])
>>   * Remove old temporary files.  At this point there can be no other
>>   * Postgres processes running in this directory, so this should be safe.
>>   */
>> -RemovePgTempFiles();
>> +RemovePgTempFiles(true, true);
>> +RemovePgTempFiles(false, false);
> 
> This is imo hard to read and easy to get wrong. Make it multiple functions or
> pass named flags in.

Will do.

>> + * StagePgTempDirForRemoval
>> + *
>> + * This function renames the given directory with a special prefix that
>> + * RemoveStagedPgTempDirs() will know to look for.  An integer is appended 
>> to
>> + * the end of the new directory name in case previously staged pgsql_tmp
>> + * directories have not yet been removed.
>> + */
> 
> It doesn't seem great to need to iterate through a directory that contains
> other files, potentially a significant number. How about having a
> staged_for_removal/ directory, and then only scanning that?

Yeah, that seems like a good idea.  Will do.

>> +/*
>> + * Find a name for the stage directory.  We just increment an integer 
>> at the
>> + * end of the name until we find one that doesn't exist.
>> + */
>> +for (int n = 0; n <= INT_MAX; n++)
>> +{
>> +snprintf(stage_path, sizeof(stage_path), "%s/%s%d", parent_path,
>> + PG_TEMP_DIR_TO_REMOVE_PREFIX, n);
>> +
>> +if (stat(stage_path, &statbuf) != 0)
>> +{
>> +if (errno == ENOENT)
>> +break;
>> +
>> +ereport(LOG,
>> +(errcode_for_file_access(),
>> + errmsg("could not stat file \"%s\": 
>> %m", stage_path)));
>> +return;
>> +}
>> +
>> +stage_path[0] = '\0';
> 
> I still disli

Re: PSA: Autoconf has risen from the dead

2022-07-03 Thread Tom Lane
Robert Haas  writes:
> Hmm, I also don't know how annoying it's going to be to get the new
> ninja/meson stuff working on macOS ... I really hope someone puts a
> good set of directions on the wiki or in the documentation or
> someplace.

If you use MacPorts it's just "install those packages", and I imagine
the same for Homebrew.  I've not tried build-from-source on modern
platforms.

One thing I think we lack data on is whether we're going to need a
policy similar to everyone-must-use-exactly-this-autoconf-version.
If we do, that will greatly raise the importance of building from
source.

regards, tom lane




Re: PSA: Autoconf has risen from the dead

2022-07-03 Thread Robert Haas
On Sat, Jul 2, 2022 at 1:42 PM Tom Lane  wrote:
> On the whole, I'm questioning the value of messing with our autoconf
> infrastructure at this stage.  We did agree at PGCon that we'd keep
> it going for a couple years more, but it's not real clear to me why
> we can't limp along with 2.69 until we decide to drop it.

If building it on macOS is going to be annoying, then -1 from me for
upgrading to a new version until that's resolved.

Hmm, I also don't know how annoying it's going to be to get the new
ninja/meson stuff working on macOS ... I really hope someone puts a
good set of directions on the wiki or in the documentation or
someplace.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Fix proposal for comparaison bugs in PostgreSQL::Version

2022-07-03 Thread Andrew Dunstan


On 2022-06-29 We 05:09, Jehan-Guillaume de Rorthais wrote:
> On Tue, 28 Jun 2022 18:17:40 -0400
> Andrew Dunstan  wrote:
>
>> On 2022-06-28 Tu 16:53, Jehan-Guillaume de Rorthais wrote:
>>> ...
>>> A better fix would be to store the version internally as version_num that
>>> are trivial to compute and compare. Please, find in attachment an
>>> implementation of this.
>>>
>>> The patch is a bit bigger because it improved the devel version to support
>>> rc/beta/alpha comparison like 14rc2 > 14rc1.
>>>
>>> Moreover, it adds a bunch of TAP tests to check various use cases.  
>>
>> Nice catch, but this looks like massive overkill. I think we can very
>> simply fix the test in just a few lines of code, instead of a 190 line
>> fix and a 130 line TAP test.
> I explained why the patch was a little bit larger than required: it fixes the
> bugs and do a little bit more. The _version_cmp sub is shorter and easier to
> understand, I use multi-line code where I could probably fold them in a
> one-liner, added some comments... Anyway, I don't feel the number of line
> changed is "massive". But I can probably remove some code and shrink some 
> other
> if it is really important...
>
> Moreover, to be honest, I don't mind the number of additional lines of TAP
> tests. Especially since it runs really, really fast and doesn't hurt 
> day-to-day
> devs as it is independent from other TAP tests anyway. It could be 1k, if it
> runs fast, is meaningful and helps avoiding futur regressions, I would welcome
> the addition.


I don't see the point of having a TAP test at all. We have TAP tests for
testing the substantive products we test, not for the test suite
infrastructure. Otherwise, where will we stop? Shall we have tests for
the things that test the test suite?


>
> If we really want to save some bytes, I have a two lines worth of code fix 
> that
> looks more readable to me than fixing _version_cmp:
>
> +++ b/src/test/perl/PostgreSQL/Version.pm
> @@ -92,9 +92,13 @@ sub new
> # Split into an array
> my @numbers = split(/\./, $arg);
>  
> +   # make sure all digit of the array-represented version are set so we 
> can
> +   # keep _version_cmp code as a "simple" digit-to-digit comparison loop
> +   $numbers[$_] += 0 for 0..3;
> +
> # Treat development versions as having a minor/micro version one less 
> than
> # the first released version of that branch.
> -   push @numbers, -1 if ($devel);
> +   $numbers[3] = -1 if $devel;
>  
> $devel ||= "";


I don't see why this is any more readable.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: making relfilenodes 56 bits

2022-07-03 Thread Robert Haas
On Sat, Jul 2, 2022 at 3:29 PM Andres Freund  wrote:
> Why did you choose a quite small value for VAR_RFN_PREFETCH? VAR_OID_PREFETCH
> is 8192, but you chose 64 for VAR_RFN_PREFETCH?

As Dilip mentioned, I suggested a lower value. If that's too low, we
can go higher, but I think there is value in not making this
excessively large. Somebody somewhere is going to have a database
that's crash-restarting like mad, and I don't want that person to run
through an insane number of relfilenodes for no reason. I don't think
there are going to be a lot of people creating thousands upon
thousands of relations in a short period of time, and I'm not sure
that it's a big deal if those who do end up having to wait for a few
extra xlog flushes.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: 15beta1 tab completion of extension versions

2022-07-03 Thread Tom Lane
Noah Misch  writes:
> I think it makes sense to send UPDATE TO as a single completion in places
> where no valid command can have the UPDATE without the TO.  CREATE RULE foo AS
> ON UPDATE TO is a candidate, though CREATE RULE completion doesn't do that
> today.  "ALTER EXTENSION hstore UPDATE;" is a valid command (updates to the
> control file default version).  Hence, I think the v14 behavior was better.

Hmm ... good point, let me think about that.

regards, tom lane




Re: making relfilenodes 56 bits

2022-07-03 Thread Dilip Kumar
On Sun, Jul 3, 2022 at 12:59 AM Andres Freund  wrote:

> Hm. Now that I think about it, isn't the XlogFlush() in
> XLogPutNextRelFileNumber() problematic performance wise? Yes, we'll spread the
> cost across a number of GetNewRelFileNumber() calls, but still, an additional
> f[data]sync for every 64 relfilenodes assigned isn't cheap - today there's
> zero fsyncs when creating a sequence or table inside a transaction (there are
> some for indexes, but there's patches to fix that).
>
> Not that I really see an obvious alternative.

I think to see the impact we need a workload that frequently creates
relfilenodes; maybe we can run pgbench and, in parallel, frequently
create relations/indexes and check how much of a performance hit we
see.  And if we see an impact, then increasing the VAR_RFN_PREFETCH
value can help in resolving that.

> I guess we could try to invent a flush-log-before-write type logic for
> relfilenodes somehow? So that the first block actually written to a file needs
> to ensure the WAL record that created the relation is flushed. But getting
> that to work reliably seems nontrivial.

>
> One thing that would be good is to add an assertion to a few places ensuring
> that relfilenodes aren't above ->nextRelFileNumber, most importantly somewhere
> in the recovery path.

Yes, it makes sense.

> Why did you choose a quite small value for VAR_RFN_PREFETCH? VAR_OID_PREFETCH
> is 8192, but you chose 64 for VAR_RFN_PREFETCH?

Earlier it was 8192, then there was a comment from Robert that we can
use Oid for many other things like an identifier for a catalog tuple
or a TOAST chunk, but a RelFileNumber requires a filesystem operation,
so the amount of work that is needed to use up 8192 RelFileNumbers is
a lot bigger than the amount of work required to use up 8192 OIDs.

I think that makes sense, so I reduced it to 64, but now I tend to
think that we also need to consider the point that after consuming
VAR_RFN_PREFETCH we are going to do XlogFlush(), so it's better that
we keep it high.  And as Robert told upthread, even keeping it at
8192 we can still crash 2^41 (2 trillion) times before we completely
run out of numbers.  So I think we can easily keep it at 8192,
and I don't think that we really need to worry much about the
performance impact of XlogFlush().

> I'd spell out RFN in VAR_RFN_PREFETCH btw, it took me a bit to expand RFN to
> relfilenode.

Okay.

> This is embarassing. Trying to keep alignment match between C and catalog
> alignment on AIX, without actually making the system understand the alignment
> rules, is a remarkably shortsighted approach.
>
> I started a separate thread about it, since it's not really relevant to this 
> thread:
> https://postgr.es/m/20220702183354.a6uhja35wta7agew%40alap3.anarazel.de
>
> Maybe we could at least make the field order to be something like
>   oid, relam, relfilenode, relname

Yeah that we can do.

> that should be ok alignment wise, keep oid first, and seems to make sense from
> an "importance" POV? Can't really interpret later fields without knowing relam
> etc.

Right.

> > I think due to wraparound if relfilenode gets reused by another
> > relation in the same checkpoint then there was an issue in crash
> > recovery with wal level minimal.  But the root of the issue is a
> > wraparound, right?
>
> I'm not convinced the tombstones were required solely in the oid wraparound
> case before, despite the comment saying so, with wal_level=minimal. I gotta do
> some non-work stuff for a bit, so I need to stop pondering this now :)
>
> I think it might be a good idea to have a few weeks in which we do *not*
> remove the tombstones, but have assertion checks against such files existing
> when we don't expect them to. I.e. commit 1-3, add the asserts, then commit 4
> a bit later.

I think this is a good idea.

> > Okay, so you mean to say that we can first drop the remaining segment
> > and at last we drop the segment 0 right?
>
> I'd use the approach Robert suggested and delete from the end, going down.

Yeah, I got it, thanks.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Slow standby snapshot

2022-07-03 Thread Andrey Borodin



> On 1 Apr 2022, at 04:18, Michail Nikolaev  wrote:
> 
> Hello.
> 
> Just an updated commit message.

I've looked into v5.

IMO the purpose of KnownAssignedXidsNext would be slightly more obvious if it 
was named KnownAssignedXidsNextOffset.
Also please consider some editorialisation:
s/high value/big number/g
KnownAssignedXidsNext[] is updating while taking the snapshot. -> 
KnownAssignedXidsNext[] is updated during taking the snapshot.
O(N) next call -> amortized O(N) on next call

Is it safe on all platforms to do "KnownAssignedXidsNext[prev] = n;" while only 
holding shared lock? I think it is, per Alexander's comment, but maybe let's 
document it?
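
For example, a comment along these lines near that assignment (my wording,
not text from the patch):

    /*
     * KnownAssignedXidsNext[] entries are only hints used to skip over
     * known-invalid slots faster.  Updating them under a shared lock is
     * safe: a concurrent reader may see either the old or the new offset,
     * and either value leads to a correct (merely possibly slower) scan.
     * The store is an aligned integer write, so it cannot be observed
     * half-written on any supported platform.
     */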

Thank you!

Thanks! Best regards, Andrey Borodin.



Re: 15beta1 tab completion of extension versions

2022-07-03 Thread Noah Misch
On Sun, Jun 19, 2022 at 12:56:13AM -0400, Tom Lane wrote:
> Actually ... after further thought it seems like maybe we should
> make this more like other cases rather than less so.  ISTM that much
> of the issue here is somebody's decision that "TO version" should be
> offered as a completion of "UPDATE", which is unlike the way we do this
> anywhere else --- the usual thing is to offer "UPDATE TO" as a single
> completion.  So I'm thinking about the attached.

Which are the older completions that offer "UPDATE TO"?  I don't see any.

> This behaves a little differently from the old code.  In v14,
>   alter extension pg_trgm upd
> gives you
>   alter extension pg_trgm update
> and another  produces
>   alter extension pg_trgm update TO "1.
> 
> With this,
>   alter extension pg_trgm upd
> gives you
>   alter extension pg_trgm update to
> and another  produces
>   alter extension pg_trgm update to "1.
> 
> That seems more consistent with other cases, and it's the same
> number of  presses.

I think it makes sense to send UPDATE TO as a single completion in places
where no valid command can have the UPDATE without the TO.  CREATE RULE foo AS
ON UPDATE TO is a candidate, though CREATE RULE completion doesn't do that
today.  "ALTER EXTENSION hstore UPDATE;" is a valid command (updates to the
control file default version).  Hence, I think the v14 behavior was better.




Re: SQL/JSON functions vs. ECPG vs. STRING as a reserved word

2022-07-03 Thread Noah Misch
On Mon, May 30, 2022 at 05:20:15PM -0400, Tom Lane wrote:

[allow EXEC SQL TYPE unreserved_keyword IS ...]

> 1. In pgc.l, if an identifier is a typedef name, ignore any possible
> keyword meaning and return it as an IDENT.  (I'd originally supposed
> that we'd want to return some new TYPEDEF token type, but that does
> not seem to be necessary right now, and adding a new token type would
> increase the patch footprint quite a bit.)
> 
> 2. In the var_type production, forget about ECPGColLabel[Common]
> and just handle the keywords we know we need, plus IDENT for the
> typedef case.  It turns out that we have to have duplicate coding
> because most of these keywords are not keywords in C lexing mode,
> so that they'll come through the IDENT path anyway when we're
> in a C rather than SQL context.  That seemed acceptable to me.
> I thought about adding them all to the C keywords list but that
> seemed likely to have undesirable side-effects, and again it'd
> bloat the patch footprint.
> 
> This fix is not without downsides.  Disabling recognition of
> keywords that match typedefs means that, for example, if you
> declare a typedef named "work" then ECPG will fail to parse
> "EXEC SQL BEGIN WORK".  So in a real sense this is just trading
> one hazard for another.  But there is an important difference:
> with this, whether your ECPG program works depends only on what
> typedef names and SQL commands are used in the program.  If
> it compiles today it'll still compile next year, whereas with
> the present implementation the addition of some new unreserved
> SQL keyword could break it.  We'd have to document this change
> for sure, and it wouldn't be something to back-patch, but it
> seems like it might be acceptable from the users' standpoint.

I agree this change is more likely to please a user than to harm a user.  The
user benefit is slim, but the patch is also slim.

> We could narrow (not eliminate) this hazard if we could get the
> typedef lookup in pgc.l to happen only when we're about to parse
> a var_type construct.  But because of Bison's lookahead behavior,
> that seems to be impossible, or at least undesirably messy
> and fragile.  But perhaps somebody else will see a way.

I don't, though I'm not much of a Bison wizard.

> Anyway, this seems like too big a change to consider for v15,
> so I'll stick this patch into the v16 CF queue.  It's only
> draft quality anyway --- lacks documentation changes and test
> cases.  There are also some coding points that could use review.
> Notably, I made the typedef lookup override SQL keywords but
> not C keywords; this is for consistency with the C-mode lookup
> rules, but is it the right thing?

That decision seems fine.  ScanCKeywordLookup() covers just twenty-six
keywords, and that list hasn't changed since 2003.  Moreover, most of them are
keywords of the C language itself, so allowing them would entail mangling
them in the generated C to avoid C compiler errors.  Given the lack of
complaints, let's not go there.

I didn't locate any problems beyond the test and doc gaps that you mentioned,
so I've marked this Ready for Committer.