Re: Resetting spilled txn statistics in pg_stat_replication

2020-07-15 Thread Amit Kapila
On Fri, Jul 10, 2020 at 2:42 PM Amit Kapila  wrote:
>
> On Fri, Jul 10, 2020 at 7:23 AM Masahiko Sawada
>  wrote:
> >
> > On Thu, 9 Jul 2020 at 12:11, Amit Kapila  wrote:
> > >
> > > On Wed, Jul 8, 2020 at 1:14 PM Masahiko Sawada
> > >  wrote:
> > > >
> > > >
> > > > I think that using oids has another benefit that we don't need to send
> > > > slot name to the stats collector along with the stats. Since the
> > > > maximum size of slot name is NAMEDATALEN and we don't support the
> > > > pgstat message larger than PGSTAT_MAX_MSG_SIZE (1000 bytes), if the
> > > > user wants to increase NAMEDATALEN they might not be able to build.
> > > >
> > >
> > > I think NAMEDATALEN is used for many other objects as well and I don't
> > > think we want to change it in foreseeable future, so that doesn't
> > > sound to be a good reason to invent OIDs for slots.  OTOH, I do
> > > understand it would be better to send OIDs than names for slots but I
> > > am just not sure if it is a good idea to invent a new way to generate
> > > OIDs (which is different from how we do it for other objects in the
> > > system) for this purpose.
> >
> > I'm concerned that there might be users who are using custom
> > PostgreSQL that increased NAMEDATALEN for some reason. But indeed, I
> > also agree with your concerns. So perhaps we can go with the current
> > PoC patch approach as the first version (i.e., sending slot drop
> > message to stats collector). When we need such a unique identifier
> > also for other purposes, we will be able to change this feature so
> > that it uses that identifier for this statistics reporting purpose.
> >
>
> Okay, feel free to submit the version atop my revert patch.
>

Attached, please find the rebased version.  I have kept prorows as 10
instead of 100 for pg_stat_get_replication_slots because I don't see
much reason to set it higher than the default value of
max_replication_slots.

As we are targeting this patch for PG14, I think we can now add the
functionality to reset the stats as well.  What do you think?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


spill_stats_v1.patch
Description: Binary data


Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2020-07-15 Thread Dilip Kumar
On Wed, Jul 15, 2020 at 6:59 PM Amit Kapila  wrote:
>
> On Wed, Jul 15, 2020 at 9:29 AM Dilip Kumar  wrote:
> >
> >
> > I have reviewed your changes and those look good to me,  please find
> > the latest version of the patch set.
> >
>
> I have done an additional round of review and below are the changes I
> made in the attached patch-set.
> 1. Changed comments in 0002.
> 2. In 0005, apart from changing a few comments and function name, I
> have changed below code:
> + if (ReorderBufferCanStream(rb) &&
> + !SnapBuildXactNeedsSkip(builder, ctx->reader->ReadRecPtr))
> Here, I think it is better to compare it with EndRecPtr.  I feel in
> boundary case the next record could be the same as start_decoding_at,
> so why to avoid streaming in that case?

Makes sense to me.

> 3. In 0006, made below changes:
> a. Removed function ReorderBufferFreeChange and added a new
> parameter in ReorderBufferReturnChange to achieve the same purpose.
> b. Changed quite a few comments, function names, added additional
> Asserts, and few other cosmetic changes.
> 4. In 0007, made below changes:
> a. Removed the unnecessary change in .gitignore
> b. Changed the newly added option name to "stream-change".
>
> Apart from above, I have merged patches 0004, 0005, 0006 and 0007 as
> those seem like one functionality to me.  For the sake of review, the
> patch-set that contains merged patches is attached separately as
> v34-combined.
>
> Let me know what you think of the changes?

I have reviewed the changes and they look fine to me.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: gs_group_1 crashing on 13beta2/s390x

2020-07-15 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 Tom> It's hardly surprising that datumCopy would segfault when given a
 Tom> null "value" and told it is pass-by-reference. However, to get to
 Tom> the datumCopy call, we must have passed the MemoryContextContains
 Tom> check on that very same pointer value, and that would surely have
 Tom> segfaulted as well, one would think.

Nope, because MemoryContextContains just returns "false" if passed a
NULL pointer.
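
The check it performs is roughly the following -- a simplified sketch,
not the exact backend source, assuming the usual backend headers
(postgres.h, utils/memutils.h) for MemoryContext and MAXALIGN:

    bool
    MemoryContextContains(MemoryContext context, void *pointer)
    {
        MemoryContext ptr_context;

        /* A NULL (or unaligned) pointer can't belong to any context,
         * so bail out before ever dereferencing it. */
        if (pointer == NULL || pointer != (void *) MAXALIGN(pointer))
            return false;

        /* Only now is it (probably) safe to peek at the chunk header. */
        ptr_context = *(MemoryContext *) (((char *) pointer) - sizeof(void *));

        return ptr_context == context;
    }

So a NULL datum sails straight past the containment check, and it is
only datumCopy() that actually tries to dereference it.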

-- 
Andrew (irc:RhodiumToad)




Re: Volatile Functions in Parallel Plans

2020-07-15 Thread Jesse Zhang
Hi Zhenghua,

On Wed, Jul 15, 2020 at 5:44 AM Zhenghua Lyu wrote:
>
> Hi,
> I tested some SQL on the latest Postgres master branch code (we found
> these issues when developing Greenplum database in the PR
> https://github.com/greenplum-db/gpdb/pull/10418, and my colleague came
> up with the following cases in Postgres):
>
>
> create table t3 (c1 text, c2 text);
> CREATE TABLE
> insert into t3
> select
>   'fhufehwiohewiuewhuhwiufhwifhweuhfwu', --random data
>   'fiowehufwhfyegygfewpfwwfeuhwhufwh' --random data
> from generate_series(1, 1000) i;
> INSERT 0 1000
> analyze t3;
> ANALYZE
> create table t4 (like t3);
> CREATE TABLE
> insert into t4 select * from t4;
> INSERT 0 0
> insert into t4 select * from t3;
> INSERT 0 1000
> analyze t4;
> ANALYZE
> set enable_hashjoin to off;
> SET
> explain (costs off)
> select count(*) from t3, t4
> where t3.c1 like '%sss'
>   and timeofday() = t4.c1 and t3.c1 = t4.c1;
>                         QUERY PLAN
> ----------------------------------------------------------
>  Finalize Aggregate
>    ->  Gather
>          Workers Planned: 2
>          ->  Partial Aggregate
>                ->  Nested Loop
>                      Join Filter: (t3.c1 = t4.c1)
>                      ->  Parallel Seq Scan on t3
>                            Filter: (c1 ~~ '%sss'::text)
>                      ->  Seq Scan on t4
>                            Filter: (timeofday() = c1)
> (10 rows)
>
> explain (verbose, costs off)
> select count(*)
> from
>   t3,
>   (select *, timeofday() as x from t4 ) t4
> where t3.c1 like '%sss' and
>   timeofday() = t4.c1 and t3.c1 = t4.c1;
>                            QUERY PLAN
> ------------------------------------------------------------------
>  Finalize Aggregate
>    Output: count(*)
>    ->  Gather
>          Output: (PARTIAL count(*))
>          Workers Planned: 2
>          ->  Partial Aggregate
>                Output: PARTIAL count(*)
>                ->  Nested Loop
>                      Join Filter: (t3.c1 = t4.c1)
>                      ->  Parallel Seq Scan on public.t3
>                            Output: t3.c1, t3.c2
>                            Filter: (t3.c1 ~~ '%sss'::text)
>                      ->  Seq Scan on public.t4
>                            Output: t4.c1, NULL::text, timeofday()
>                            Filter: (timeofday() = t4.c1)
> (15 rows)
>
>
>
> Focus on the last two plans: the function timeofday() is
> volatile but parallel-safe, and Postgres outputs two parallel
> plans.
>
>
> The first plan:
>
>  Finalize Aggregate
>    ->  Gather
>          Workers Planned: 2
>          ->  Partial Aggregate
>                ->  Nested Loop
>                      Join Filter: (t3.c1 = t4.c1)
>                      ->  Parallel Seq Scan on t3
>                            Filter: (c1 ~~ '%sss'::text)
>                      ->  Seq Scan on t4
>                            Filter: (timeofday() = c1)
>
> The join's left tree is a parallel scan and the right tree is a seq scan.
> This algorithm is correct by the distributive law of distributed join:
>    A = [A1 A2 A3 ... An], B  then  A join B = gather( (A1 join B) (A2 join B) ... (An join B) )
>
> The correctness of the above law relies on a pre-assumption:
>    the data set of B is the same in each join: (A1 join B) (A2 join B) ... (An join B)
>
> But things get complicated when volatile functions come in. timeofday() is
> just an example to show the idea. The core issue is that volatile functions
> can return different results on successive calls with the same arguments.
> Thus the following piece, the right tree of the join,
>      ->  Seq Scan on t4
>            Filter: (timeofday() = c1)
> cannot be considered consistent everywhere across the scan workers.
>
> The second plan
>
>  Finalize Aggregate
>    Output: count(*)
>    ->  Gather
>          Output: (PARTIAL count(*))
>          Workers Planned: 2
>          ->  Partial Aggregate
>                Output: PARTIAL count(*)
>                ->  Nested Loop
>                      Join Filter: (t3.c1 = t4.c1)
>                      ->  Parallel Seq Scan on public.t3
>                            Output: t3.c1, t3.c2
>                            Filter: (t3.c1 ~~ '%sss'::text)
>                      ->  Seq Scan on public.t4
>                            Output: t4.c1, NULL::text, timeofday()
>                            Filter: (timeofday() = t4.c1)
>
>
> has volatile projections in the right tree of the nestloop:
>
>      ->  Seq Scan on public.t4
>            Output: t4.c1, NULL::text, timeofday()
>            Filter: (timeofday() = t4.c1)
>
> It should not be taken as consistent in different workers.

You are right, they are not consistent. But neither plan is
incorrect:

1. In the first query, it's semantically permissible to evaluate
timeofday() for each pair of (c1, c2), and the plan reflects that.
(Notice that the parall

Re: [Patch] ALTER SYSTEM READ ONLY

2020-07-15 Thread Prabhat Sahu
Hi All,
I was testing the feature on top of the v3 patch and found a "pg_upgrade"
failure after running "alter system read only;", as below:

-- Steps:
./initdb -D data
./pg_ctl -D data -l logs start -c
./psql postgres
alter system read only;
\q
./pg_ctl -D data -l logs stop -c

./initdb -D data2
./pg_upgrade -b . -B . -d data -D data2 -p  -P 5520


[edb@localhost bin]$ ./pg_upgrade -b . -B . -d data -D data2 -p  -P 5520
Performing Consistency Checks
-
Checking cluster versions   ok

The source cluster was not shut down cleanly.
Failure, exiting

-- Below are the logs:
2021-07-16 11:04:20.305 IST [105788] LOG:  starting PostgreSQL 14devel on
x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat
4.8.5-39), 64-bit
2020-07-16 11:04:20.309 IST [105788] LOG:  listening on IPv6 address "::1",
port 5432
2020-07-16 11:04:20.309 IST [105788] LOG:  listening on IPv4 address
"127.0.0.1", port 5432
2020-07-16 11:04:20.321 IST [105788] LOG:  listening on Unix socket
"/tmp/.s.PGSQL.5432"
2020-07-16 11:04:20.347 IST [105789] LOG:  database system was shut down at
2020-07-16 11:04:20 IST
2020-07-16 11:04:20.352 IST [105788] LOG:  database system is ready to
accept connections
2020-07-16 11:04:20.534 IST [105790] LOG:  system is now read only
2020-07-16 11:04:20.542 IST [105788] LOG:  received fast shutdown request
2020-07-16 11:04:20.543 IST [105788] LOG:  aborting any active transactions
2020-07-16 11:04:20.544 IST [105788] LOG:  background worker "logical
replication launcher" (PID 105795) exited with exit code 1
2020-07-16 11:04:20.544 IST [105790] LOG:  shutting down
2020-07-16 11:04:20.544 IST [105790] LOG:  skipping shutdown checkpoint
because the system is read only
2020-07-16 11:04:20.551 IST [105788] LOG:  database system is shut down

On Tue, Jul 14, 2020 at 12:08 PM Amul Sul  wrote:

> Attached is a rebased version for the latest master head[1].
>
> Regards,
> Amul
>
> 1] Commit # 101f903e51f52bf595cd8177d2e0bc6fe9000762
>


-- 

With Regards,
Prabhat Kumar Sahu
EnterpriseDB: http://www.enterprisedb.com


Re: Transactions involving multiple postgres foreign servers, take 2

2020-07-15 Thread Masahiko Sawada
On Tue, 14 Jul 2020 at 11:19, Fujii Masao  wrote:
>
>
>
> On 2020/07/14 9:08, Masahiro Ikeda wrote:
> >> I've attached the latest version patches. I've incorporated the review
> >> comments I got so far and improved locking strategy.
> >
> > Thanks for updating the patch!
>
> +1
> I'm interested in these patches and now studying them. While checking
> the behaviors of the patched PostgreSQL, I got three comments.

Thank you for testing this patch!

>
> 1. We can access the foreign table even during recovery in HEAD.
> But in the patched version, when I did that, I got the following error.
> Is this intentional?
>
> ERROR:  cannot assign TransactionIds during recovery

No, it should be fixed. I'm going to fix this by not collecting
participants for atomic commit during recovery.

>
> 2. With the patch, when INSERT/UPDATE/DELETE are executed both in
> local and remote servers, 2PC is executed at the commit phase. But
> when write SQL (e.g., TRUNCATE) except INSERT/UPDATE/DELETE are
> executed in local and INSERT/UPDATE/DELETE are executed in remote,
> 2PC is NOT executed. Is this safe?

Hmm, you're right. I think atomic commit must be used also when the
user executes other write SQLs such as TRUNCATE, COPY, CLUSTER, and
CREATE TABLE on the local node.

>
> 3. XACT_FLAGS_WROTENONTEMPREL is set when INSERT/UPDATE/DELETE
> are executed. But it's not reset even when those queries are canceled by
> ROLLBACK TO SAVEPOINT. This may cause unnecessary 2PC at the commit phase.

Will fix.

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: renaming configure.in to configure.ac

2020-07-15 Thread Noah Misch
On Wed, Jul 15, 2020 at 09:45:54AM -0400, Tom Lane wrote:
> Peter Eisentraut  writes:
> > It was mentioned elsewhere in passing that a new Autoconf release might 
> > be coming.  That one will warn about the old naming "configure.in" and 
> > request "configure.ac".  So we might want to rename that sometime. 
> > Before we get into the specifics, I suggest that all interested parties 
> > check whether buildfarm scripts, packaging scripts, etc. need to be 
> > adjusted for the newer name.
> 
> Along the same line, I read at [1]
> 
> Because it has been such a long time, and because some of the changes
> potentially break existing Autoconf scripts, we are conducting a
> public beta test before the final release of version 2.70.  Please
> test this beta with your autoconf scripts, and report any problems you
> find to the Savannah bug tracker:
> 
> Maybe we should do some pro-active testing, rather than just waiting for
> 2.70 to get dropped on us?  God knows how long it will be until 2.71.

Sounds good.  A cheap option would be to regenerate with 2.70, push that on a
Friday night to see what the buildfarm thinks, and revert it on Sunday night.




Re: Partitioning and postgres_fdw optimisations for multi-tenancy

2020-07-15 Thread Etsuro Fujita
On Wed, Jul 15, 2020 at 9:02 PM Etsuro Fujita  wrote:
> On Wed, Jul 15, 2020 at 12:12 AM Alexey Kondratov
>  wrote:
> > On 2020-07-14 15:27, Ashutosh Bapat wrote:
> > > On Tue, Jul 14, 2020 at 12:48 AM Alexey Kondratov
> > >  wrote:
> > >> Some real-life test queries show, that all single-node queries aren't
> > >> pushed-down to the required node. For example:
> > >>
> > >> SELECT
> > >>  *
> > >> FROM
> > >>  documents
> > >>  INNER JOIN users ON documents.user_id = users.id
> > >> WHERE
> > >>  documents.company_id = 5
> > >>  AND users.company_id = 5;
> > >
> > > There are a couple of things happening here
> > > 1. the clauses on company_id in WHERE clause are causing partition
> > > pruning. Partition-wise join is disabled with partition pruning before
> > > PG13.
>
> More precisely, PWJ cannot be applied when there are no matched
> partitions on the nullable side due to partition pruning before PG13.

On reflection, I think I was wrong: the limitation applies to PG13,
even with advanced PWJ.

> But the join is an inner join, so I think PWJ can still be applied for
> the join.

I think I was wrong on this point as well :-(.  PWJ cannot be applied
to the join due to the limitation of the PWJ matching logic.  See the
discussion started in [1].  I think the patch in [2] would address
this issue as well, though the patch is under review.

Best regards,
Etsuro Fujita

[1] 
https://www.postgresql.org/message-id/CAN_9JTzo_2F5dKLqXVtDX5V6dwqB0Xk%2BihstpKEt3a1LT6X78A%40mail.gmail.com
[2] https://www.postgresql.org/message-id/502.1586032...@sss.pgh.pa.us




RE: Transactions involving multiple postgres foreign servers, take 2

2020-07-15 Thread tsunakawa.ta...@fujitsu.com
Hi Sawada san,


I'm reviewing this patch series, so let me give some initial comments and
questions.  I'm looking at this with the hope that it will be useful purely as
an FDW enhancement for our new use cases, regardless of whether the FDW will be
used for Postgres scale-out.

I don't think it's necessarily required to combine 2PC with global
visibility.  The X/Open XA specification only handles atomic commit.  The only
part of the XA specification that refers to global visibility is the following:


[Quote from XA specification]
--
2.3.2 Protocol Optimisations 
・ Read-only 
An RM can respond to the TM’s prepare request by asserting that the RM was not 
asked to update shared resources in this transaction branch. This response 
concludes the RM’s involvement in the transaction; the Phase 2 dialogue between 
the TM and this RM does not occur. The TM need not stably record, in its list 
of 
participating RMs, an RM that asserts a read-only role in the global 
transaction. 

However, if the RM returns the read-only optimisation before all work on the 
global 
transaction is prepared, global serialisability1 cannot be guaranteed. This is 
because 
the RM may release transaction context, such as read locks, before all 
application 
activity for that global transaction is finished. 

1. 
Serialisability is a property of a set of concurrent transactions. For a 
serialisable set of transactions, at least one 
serial sequence of the transactions exists that produces identical results, 
with respect to shared resources, as does 
concurrent execution of the transaction. 
--


(1)
Do other popular DBMSs (Oracle, MySQL, etc.)  provide concrete functions that 
can be used for the new FDW commit/rollback/prepare API?  I'm asking this to 
confirm that we really need to provide these functions, not as the transaction 
callbacks for postgres_fdw.


(2)
How are data modifications tracked in local and remote transactions?  0001 
seems to handle local INSERT/DELETE/UPDATE.  Especially:

* COPY FROM to local/remote tables/views.

* User-defined function calls that modify data, e.g. SELECT func1() WHERE col = 
func2()


(3)
Does the 2PC processing always go through the background worker?
Is group commit effective on the remote server? That is, are PREPARE and COMMIT
PREPARED issued from multiple remote sessions written to WAL in a batch?


Regards
Takayuki Tsunakawa



Re: Volatile Functions in Parallel Plans

2020-07-15 Thread Zhenghua Lyu
Hi, thanks for your reply.
But this won't be consistent even for non-parallel plans.
If we do not use the distributive law for the parallel join, it seems
OK.

If we generate a parallel plan using the distributive law of the join,
then this transformation's pre-assumption might be broken.

Currently, we don't consider volatile functions as
parallel-safe by default.

I ran the SQL in pg12:

zlv=# select count(proname) from pg_proc where provolatile = 'v' and 
proparallel ='s';
 count
---
   100
(1 row)

zlv=# select proname from pg_proc where provolatile = 'v' and proparallel ='s';
proname

 timeofday
 bthandler
 hashhandler
 gisthandler
 ginhandler
 spghandler
 brinhandler

It seems there are many functions which are both volatile and parallel-safe.

From: Amit Kapila 
Sent: Thursday, July 16, 2020 12:07 PM
To: Zhenghua Lyu 
Cc: PostgreSQL Hackers 
Subject: Re: Volatile Functions in Parallel Plans

On Wed, Jul 15, 2020 at 6:14 PM Zhenghua Lyu  wrote:
>
>
> The first plan:
>
>  Finalize Aggregate
>->  Gather
>  Workers Planned: 2
>  ->  Partial Aggregate
>->  Nested Loop
>  Join Filter: (t3.c1 = t4.c1)
>  ->  Parallel Seq Scan on t3
>Filter: (c1 ~~ '%sss'::text)
>  ->  Seq Scan on t4
>Filter: (timeofday() = c1)
>
> The join's left tree is parallel scan and the right tree is seq scan.
> This algorithm is correct using the distribute distributive law of
> distributed join:
>A = [A1 A2 A3...An], B then A join B = gather( (A1 join B) (A2 join B) 
> ... (An join B) )
>
> The correctness of the above law should have a pre-assumption:
>   The data set of B is the same in each join: (A1 join B) (A2 join B) ... 
> (An join B)
>
> But things get complicated when volatile functions come in. Timeofday is just
> an example to show the idea. The core is volatile functions  can return 
> different
> results on successive calls with the same arguments. Thus the following piece,
> the right tree of the join
>  ->  Seq Scan on t4
>Filter: (timeofday() = c1)
> can not be considered consistent everywhere in the scan workers.
>

But this won't be consistent even for non-parallel plans.  I mean to
say for each loop of join the "Seq Scan on t4" would give different
results.  Currently, we don't consider volatile functions as
parallel-safe by default.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: Transactions involving multiple postgres foreign servers, take 2

2020-07-15 Thread Masahiko Sawada
On Tue, 14 Jul 2020 at 17:24, Masahiro Ikeda  wrote:
>
> > I've attached the latest version patches. I've incorporated the review
> > comments I got so far and improved locking strategy.
>
> I want to ask a question about streaming replication with 2PC.
> Are you going to support 2PC with streaming replication?
>
> I tried streaming replication using v23 patches.
> I confirm that 2PC works with streaming replication,
> which there are primary/standby coordinator.
>
> But, in my understanding, the WAL of "PREPARE" and
> "COMMIT/ABORT PREPARED" can't be replicated to the standby server in
> sync.
>
> If this is right, the unresolved transaction can be occurred.
>
> For example,
>
> 1. PREPARE is done
> 2. crash primary before the WAL related to PREPARE is
> replicated to the standby server
> 3. promote standby server // but can't execute "ABORT PREPARED"
>
> In above case, the remote server has the unresolved transaction.
> Can we solve this problem to support in-sync replication?
>
> But, I think some users use async replication for performance.
> Do we need to document the limitation or make another solution?
>

IIUC with synchronous replication, we can guarantee that WAL records
are written on both the primary and the replicas when the client gets an
acknowledgment of commit. We don't replicate each WAL record
generated during the transaction one by one in sync. In the case you
described, the client will get an error due to the server crash.
Therefore I think the user cannot expect WAL records generated so far
to have been replicated. The same issue could also happen when the user
executes PREPARE TRANSACTION and the server crashes. To prevent this
issue, I think we would need to send each WAL record in sync, but I'm
not sure that's reasonable behavior, and as long as we write WAL
locally and then send it to replicas we would need a smart mechanism to
prevent this situation.

Related to the point raised by Ikeda-san, I realized that with the
current patch the backend waits for synchronous replication and then
waits for foreign transaction resolution. But it should be the reverse.
Otherwise, it could lead to data loss even when the client got an
acknowledgment of commit. Also, when the user is using both atomic
commit and synchronous replication and wants to cancel waiting, he/she
will need to press Ctrl-C twice with the current patch, which also
should be fixed.

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Volatile Functions in Parallel Plans

2020-07-15 Thread Amit Kapila
On Wed, Jul 15, 2020 at 6:14 PM Zhenghua Lyu  wrote:
>
>
> The first plan:
>
>  Finalize Aggregate
>->  Gather
>  Workers Planned: 2
>  ->  Partial Aggregate
>->  Nested Loop
>  Join Filter: (t3.c1 = t4.c1)
>  ->  Parallel Seq Scan on t3
>Filter: (c1 ~~ '%sss'::text)
>  ->  Seq Scan on t4
>Filter: (timeofday() = c1)
>
> The join's left tree is parallel scan and the right tree is seq scan.
> This algorithm is correct using the distribute distributive law of
> distributed join:
>A = [A1 A2 A3...An], B then A join B = gather( (A1 join B) (A2 join B) 
> ... (An join B) )
>
> The correctness of the above law should have a pre-assumption:
>   The data set of B is the same in each join: (A1 join B) (A2 join B) ... 
> (An join B)
>
> But things get complicated when volatile functions come in. Timeofday is just
> an example to show the idea. The core is volatile functions  can return 
> different
> results on successive calls with the same arguments. Thus the following piece,
> the right tree of the join
>  ->  Seq Scan on t4
>Filter: (timeofday() = c1)
> can not be considered consistent everywhere in the scan workers.
>

But this won't be consistent even for non-parallel plans.  I mean to
say for each loop of join the "Seq Scan on t4" would give different
results.  Currently, we don't consider volatile functions as
parallel-safe by default.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: INSERT INTO SELECT, Why Parallelism is not selected?

2020-07-15 Thread Amit Kapila
On Wed, Jul 15, 2020 at 8:06 AM Amit Kapila  wrote:
>
> On Wed, Jul 15, 2020 at 12:32 AM Robert Haas  wrote:
> >
> > On Sat, Jul 11, 2020 at 8:37 AM Dilip Kumar  wrote:
> > > I have just notice that the parallelism is off even for the select
> > > part of the query mentioned in the $subject.  I see the only reason it
> > > is not getting parallel because we block the parallelism if the query
> > > type is not SELECT.  I don't see any reason for not selecting the
> > > parallelism for this query.
> >
> > There's a relevant comment near the top of heap_prepare_insert().
> >
>
> I think that is no longer true after commits 85f6b49c2c and 3ba59ccc89
> where we have allowed relation extension and page locks to conflict
> among group members.  We have accordingly changed comments at a few
> places but forgot to update this one.  I will check and see if any
> other similar comments are there which needs to be updated.
>

The attached patch fixes the comments.  Let me know if you think I
have missed anything, or if you have any other comments.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


0001-Fix-comments-in-heapam.c.patch
Description: Binary data


Re: Is it useful to record whether plans are generic or custom?

2020-07-15 Thread torikoshia

On 2020-07-15 11:44, Fujii Masao wrote:

On 2020/07/14 21:24, torikoshia wrote:

On 2020-07-10 10:49, torikoshia wrote:

On 2020-07-08 16:41, Fujii Masao wrote:

On 2020/07/08 10:14, torikoshia wrote:

On 2020-07-06 22:16, Fujii Masao wrote:

On 2020/06/11 14:59, torikoshia wrote:

On 2020-06-10 18:00, Kyotaro Horiguchi wrote:



+    TupleDescInitEntry(tupdesc, (AttrNumber) 8, "last_plan",

This could be a problem if we showed the last plan in this view. I
think "last_plan_type" would be better.

+    if (prep_stmt->plansource->last_plan_type == PLAN_CACHE_TYPE_CUSTOM)
+    values[7] = CStringGetTextDatum("custom");
+    else if (prep_stmt->plansource->last_plan_type == PLAN_CACHE_TYPE_GENERIC)
+    values[7] = CStringGetTextDatum("generic");
+    else
+    nulls[7] = true;

Using switch-case prevents a future additional type (if any) from being
unhandled.  I think we are recommending that as a convention.


Thanks for your reviewing!

I've attached a patch that reflects your comments.


Thanks for the patch! Here are the comments.


Thanks for your review!


+    Number of times generic plan was choosen
+    Number of times custom plan was choosen

Typo: "choosen" should be "chosen"?


Thanks, fixed them.

+  role="column_definition">
+   last_plan_type 
text

+  
+  
+    Tells the last plan type was generic or custom. If the 
prepared

+    statement has not executed yet, this field is null
+  

Could you tell me how this information is expected to be used?
I think that generic_plans and custom_plans are useful when investigating
the cause of a performance drop under cached plan mode. But I failed to
get how useful last_plan_type is.


This may be an exceptional case, but I once had a case where I needed
to ensure whether a generic or custom plan was chosen for specific
queries in a development environment.


In your case, probably you had to ensure that the last several (or all)
executions chose a generic or custom plan? If yes, I'm afraid that
displaying only the last plan mode is not enough for your case. No?
So it seems better to check the generic_plans or custom_plans columns in
the view rather than last_plan_type even in your case. Thoughts?


Yeah, I now feel last_plan is not so necessary and only the numbers of
generic/custom plans are enough.

If there are no objections, I'm going to remove this column and the
related code.


As mentioned, I removed the last_plan column.


Thanks for updating the patch! It basically looks good to me.

I have one comment; you added the regression tests for generic and
custom plans into prepare.sql. But similar tests already exist in
plancache.sql. So isn't it better to add the tests for the generic_plans
and custom_plans columns into plancache.sql?



Thanks for your comments!

Agreed.
I removed the tests from prepare.sql and added them to plancache.sql.
Thoughts?


Regards,


--
Atsushi Torikoshi
NTT DATA CORPORATION

From e3517479cea76e88f3ab8626d14452b36288dcb5 Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi 
Date: Tue, 14 Jul 2020 10:27:32 +0900
Subject: [PATCH] We didn't have an easy way to find how many times generic and
 custom plans of a prepared statement have been executed. This patch exposes
 the numbers of both plans in pg_prepared_statements.

---
 doc/src/sgml/catalogs.sgml  | 18 
 src/backend/commands/prepare.c  | 15 +++---
 src/backend/utils/cache/plancache.c | 17 
 src/include/catalog/pg_proc.dat |  6 ++--
 src/include/utils/plancache.h   |  3 +-
 src/test/regress/expected/plancache.out | 37 -
 src/test/regress/expected/rules.out |  6 ++--
 src/test/regress/sql/plancache.sql  | 12 +++-
 8 files changed, 96 insertions(+), 18 deletions(-)

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index e9cdff4864..8bc6d685f4 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -10830,6 +10830,24 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
frontend/backend protocol
   
  
+
+ 
+  
+   generic_plans int8
+  
+  
+Number of times generic plan was chosen
+  
+ 
+
+ 
+  
+   custom_plans int8
+  
+  
+Number of times custom plan was chosen
+  
+ 
 

   
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 80d6df8ac1..4b18be5b27 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -694,7 +694,8 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
 
 /*
  * This set returning function reads all the prepared statements and
- * returns a set of (name, statement, prepare_time, param_types, from_sql).
+ * returns a set of (name, statement, prepare_time, param_types, from_sql,
+ * generic_plans, custom_plans).
  */
 Datum
 pg_prepared_state

Re: sys_siglist[] is causing us trouble again

2020-07-15 Thread Filipe Rosset
On Wed, Jul 15, 2020 at 7:48 PM Tom Lane  wrote:

> As of a couple days ago, buildfarm member caiman (Fedora rawhide)
> is failing like this in all the pre-v12 branches:
>
> ccache gcc -Wall -Wmissing-prototypes -Wpointer-arith
> -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute
> -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard
> -Wno-format-truncation -Wno-stringop-truncation -g -O2 -DFRONTEND
> -I../../src/include -D_GNU_SOURCE -I/usr/include/libxml2   -c -o
> wait_error.o wait_error.c
> wait_error.c: In function \342\200\230wait_result_to_str\342\200\231:
> wait_error.c:71:6: error: \342\200\230sys_siglist\342\200\231 undeclared
> (first use in this function)
>71 |  sys_siglist[WTERMSIG(exitstatus)] : "(unknown)");
>   |  ^~~
> wait_error.c:71:6: note: each undeclared identifier is reported only once
> for each function it appears in
> make[2]: *** [: wait_error.o] Error 1
>
> We haven't changed anything, ergo something changed at the OS level.
>
> Oddly, we'd not get to this code unless configure set
> HAVE_DECL_SYS_SIGLIST, so it's defined *somewhere*.  I suspect the root
> issue here is some rearrangement of system header files combined with
> wait_error.c (and maybe other places?) not including exactly the same
> headers that configure tested.
>
> Anyway, rather than installing rawhide and trying to debug this,
> I'd like to make a modest proposal: let's back-patch the v12
> patches that made us stop relying on sys_siglist[], viz a73d08319
> and cc92cca43.  Per the discussions that led to those patches,
> it's been decades since any platform didn't have POSIX-compliant
> strsignal(), so we'd be much better off relying on that.
>
> regards, tom lane
>

 I believe it's related to these recent glibc changes in rawhide.
https://src.fedoraproject.org/rpms/glibc/c/0aab7eb58528999277c626fc16682da179de03d0?branch=master

  - signal: Move sys_errlist to a compat symbol
  - signal: Move sys_siglist to a compat symbol
SHA512 (glibc-2.31.9000-683-gffb17e7ba3.tar.xz) =
103ff3c04de5dc149df93e5399de1630f6fff1b8d7f127881d6e530492b8b953a8064205ceecb311a77c0a10de3a5ab2056121fd1fa833a30327c6b1f08beacc


Re: Improving connection scalability: GetSnapshotData()

2020-07-15 Thread Alvaro Herrera
On 2020-Jul-15, Andres Freund wrote:

> It could make sense to split the conversion of
> VariableCacheData->latestCompletedXid to FullTransactionId out from 0001
> into is own commit. Not sure...

+1, the commit is large enough and that change can be had in advance.

Note you forward-declare struct GlobalVisState twice in heapam.h.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




"cert" + clientcert=verify-ca in pg_hba.conf?

2020-07-15 Thread Kyotaro Horiguchi
Hello.

The "Certificate Authentication" section in the doc for PG12 and later
describes the relationship with clientcert as follows.

> In a pg_hba.conf record specifying certificate authentication, the
> authentication option clientcert is assumed to be verify-ca or
> verify-full, and it cannot be turned off since a client certificate
> is necessary for this method. What the cert method adds to the basic
> clientcert certificate validity test is a check that the cn
> attribute matches the database user name.

In reality, the cert method is assumed to be "verify-full", adds nothing
beyond verify-full, and cannot be degraded or turned off. It seems to be
a mistake made when the text was rewritten for clientcert accepting
verify-ca/full in PG12.

Related to that, pg_hba.conf accepts the combination of the "cert" method
and the option clientcert="verify-ca", but the option is ignored. We should
reject that combination the same way as "cert"+"no-verify".

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
From d7e24df379f29e55adc41c6a98602f2434f5b07e Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Wed, 15 Jul 2020 21:00:20 +0900
Subject: [PATCH] Fix behavior for "cert"+"verify-ca" in pg_hba.conf

"cert" method in a line in pg_hba.conf is assumed as
clientcert="verify_full" and if we set it to "verify-ca", it is just
ignored. Since we explicitly reject "no-verify" in that case so we
should reject verify-ca as well.
---
 doc/src/sgml/client-auth.sgml |  7 ++-
 src/backend/libpq/hba.c   | 10 ++
 2 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/doc/src/sgml/client-auth.sgml b/doc/src/sgml/client-auth.sgml
index 5cd88b462d..5ea45d5b87 100644
--- a/doc/src/sgml/client-auth.sgml
+++ b/doc/src/sgml/client-auth.sgml
@@ -2044,11 +2044,8 @@ host ... radius radiusservers="server1,server2" radiussecrets="""secret one"",""

 In a pg_hba.conf record specifying certificate
 authentication, the authentication option clientcert is
-assumed to be verify-ca or verify-full,
-and it cannot be turned off since a client certificate is necessary for this
-method. What the cert method adds to the basic
-clientcert certificate validity test is a check that the
-cn attribute matches the database user name.
+assumed to be verify-full,
+and it cannot be degradated to verify-ca nor turned off.

   
 
diff --git a/src/backend/libpq/hba.c b/src/backend/libpq/hba.c
index 9d63830553..18ef385405 100644
--- a/src/backend/libpq/hba.c
+++ b/src/backend/libpq/hba.c
@@ -1711,6 +1711,16 @@ parse_hba_auth_opt(char *name, char *val, HbaLine *hbaline,
 		if (strcmp(val, "1") == 0
 			|| strcmp(val, "verify-ca") == 0)
 		{
+			if (hbaline->auth_method == uaCert)
+			{
+ereport(elevel,
+		(errcode(ERRCODE_CONFIG_FILE_ERROR),
+		 errmsg("clientcert can not be set to \"verify-ca\" when using \"cert\" authentication"),
+		 errcontext("line %d of configuration file \"%s\"",
+	line_num, HbaFileName)));
+*err_msg = "clientcert can not be set to \"verify-ca\" when using \"cert\" authentication";
+return false;
+			}
 			hbaline->clientcert = clientCertCA;
 		}
 		else if (strcmp(val, "verify-full") == 0)
-- 
2.18.4



Re: sys_siglist[] is causing us trouble again

2020-07-15 Thread Tom Lane
Thomas Munro  writes:
> On Thu, Jul 16, 2020 at 10:48 AM Tom Lane  wrote:
>> We haven't changed anything, ergo something changed at the OS level.

> It looks like glibc very recently decided[1] to hide the declaration,
> but we're using a cached configure test result.

Right.  So, modulo the mis-cached result, what would happen if we do
nothing is that the back branches would lose the ability to translate
signal numbers to strings on bleeding-edge glibc.  I don't think we
want that, so we need to back-patch.  Attached is a lightly tested
patch for v11.  (This includes 7570df0f3 as well, so that
pgstrsignal.c will be the same in all branches.)

regards, tom lane

diff --git a/configure b/configure
index 4e1b4be7fb..7382a34d60 100755
--- a/configure
+++ b/configure
@@ -15004,7 +15004,7 @@ fi
 LIBS_including_readline="$LIBS"
 LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`
 
-for ac_func in cbrt clock_gettime dlopen fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll posix_fallocate pstat pthread_is_threaded_np readlink setproctitle setsid shm_open symlink sync_file_range uselocale utime utimes wcstombs_l
+for ac_func in cbrt clock_gettime dlopen fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll posix_fallocate pstat pthread_is_threaded_np readlink setproctitle setsid shm_open strsignal symlink sync_file_range uselocale utime utimes wcstombs_l
 do :
   as_ac_var=`$as_echo "ac_cv_func_$ac_func" | $as_tr_sh`
 ac_fn_c_check_func "$LINENO" "$ac_func" "$as_ac_var"
@@ -15893,24 +15893,6 @@ esac
 
 fi
 
-ac_fn_c_check_decl "$LINENO" "sys_siglist" "ac_cv_have_decl_sys_siglist" "#include 
-/* NetBSD declares sys_siglist in unistd.h.  */
-#ifdef HAVE_UNISTD_H
-# include 
-#endif
-
-"
-if test "x$ac_cv_have_decl_sys_siglist" = xyes; then :
-  ac_have_decl=1
-else
-  ac_have_decl=0
-fi
-
-cat >>confdefs.h <<_ACEOF
-#define HAVE_DECL_SYS_SIGLIST $ac_have_decl
-_ACEOF
-
-
 ac_fn_c_check_func "$LINENO" "syslog" "ac_cv_func_syslog"
 if test "x$ac_cv_func_syslog" = xyes; then :
   ac_fn_c_check_header_mongrel "$LINENO" "syslog.h" "ac_cv_header_syslog_h" "$ac_includes_default"
diff --git a/configure.in b/configure.in
index fab1658bca..defcb8ff99 100644
--- a/configure.in
+++ b/configure.in
@@ -1622,6 +1622,7 @@ AC_CHECK_FUNCS(m4_normalize([
 	setproctitle
 	setsid
 	shm_open
+	strsignal
 	symlink
 	sync_file_range
 	uselocale
@@ -1821,14 +1822,6 @@ if test "$PORTNAME" = "cygwin"; then
   AC_LIBOBJ(dirmod)
 fi
 
-AC_CHECK_DECLS([sys_siglist], [], [],
-[#include 
-/* NetBSD declares sys_siglist in unistd.h.  */
-#ifdef HAVE_UNISTD_H
-# include 
-#endif
-])
-
 AC_CHECK_FUNC(syslog,
   [AC_CHECK_HEADER(syslog.h,
[AC_DEFINE(HAVE_SYSLOG, 1, [Define to 1 if you have the syslog interface.])])])
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 252220c770..08577f5e5f 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -596,17 +596,10 @@ pgarch_archiveXlog(char *xlog)
 	 errhint("See C include file \"ntstatus.h\" for a description of the hexadecimal value."),
 	 errdetail("The failed archive command was: %s",
 			   xlogarchcmd)));
-#elif defined(HAVE_DECL_SYS_SIGLIST) && HAVE_DECL_SYS_SIGLIST
-			ereport(lev,
-	(errmsg("archive command was terminated by signal %d: %s",
-			WTERMSIG(rc),
-			WTERMSIG(rc) < NSIG ? sys_siglist[WTERMSIG(rc)] : "(unknown)"),
-	 errdetail("The failed archive command was: %s",
-			   xlogarchcmd)));
 #else
 			ereport(lev,
-	(errmsg("archive command was terminated by signal %d",
-			WTERMSIG(rc)),
+	(errmsg("archive command was terminated by signal %d: %s",
+			WTERMSIG(rc), pg_strsignal(WTERMSIG(rc))),
 	 errdetail("The failed archive command was: %s",
 			   xlogarchcmd)));
 #endif
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 3bfc299be1..75a9e07041 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -3563,6 +3563,7 @@ LogChildExit(int lev, const char *procname, int pid, int exitstatus)
 		procname, pid, WEXITSTATUS(exitstatus)),
  activity ? errdetail("Failed process was running: %s", activity) : 0));
 	else if (WIFSIGNALED(exitstatus))
+	{
 #if defined(WIN32)
 		ereport(lev,
 
@@ -3573,7 +3574,7 @@ LogChildExit(int lev, const char *procname, int pid, int exitstatus)
 		procname, pid, WTERMSIG(exitstatus)),
  errhint("See C include file \"ntstatus.h\" for a description of the hexadecimal value."),
  activity ? errdetail("Failed process was running: %s", activity) : 0));
-#elif defined(HAVE_DECL_SYS_SIGLIST) && HAVE_DECL_SYS_SIGLIST
+#else
 		ereport(lev,
 
 		/*--
@@ -3581,19 +3582,10 @@ LogChildExit(int lev, const char *procname, int pid, int exitstatus)
 		  "server process" */
 (errmsg("%s (PID %d) was terminated by signal %d: %s",
 		procname, p

Re: Infinities in type numeric

2020-07-15 Thread Dean Rasheed
On Tue, 16 Jun 2020 at 18:24, Tom Lane  wrote:
>
> The attached v3 patch fixes these things and also takes care of an
> oversight in v2: I'd made numeric() apply typmod restrictions to Inf,
> but not numeric_in() or numeric_recv().  I believe the patch itself
> is in pretty good shape now, though there are still some issues
> elsewhere as noted in the first message in this thread.
>

I had a look at this, and I think it's mostly in good shape. It looks
like everything from the first message in this thread has been
resolved, except I don't know about the jsonpath stuff, because I
haven't been following that.

I tried to go over all the edge cases and I think they all make sense,
except for a couple of cases which I've listed below, along with a few
other minor comments:

1). I don't think that the way in_range() handles infinities is quite
right. For example:

SELECT in_range('inf'::numeric, 10::numeric, 'inf'::numeric, false, false);
 in_range
--
 f
(1 row)

But I think that should return "val >= base + offset", which is "Inf
>= Inf", which should be true.

Similarly, I think this should return true:

SELECT in_range('-inf'::numeric, 10::numeric, 'inf'::numeric, true, true);
 in_range
--
 f
(1 row)

I think this could use some test coverage.

2). I think numeric_pg_lsn() needs updating -- this should probably be an error:

SELECT pg_lsn('inf'::numeric);
 pg_lsn

 0/0
(1 row)

3). In the bottom half of numeric.c, there's a section header comment
saying "Local functions follow ... In general, these do not support
NaNs ...". That should probably also mention infinities. There are
also now more functions to mention that are exceptions to that comment
about not handling NaN/Inf, but I think that some of the new
exceptions can be avoided.

4). The comment for set_var_from_str() mentions that it doesn't handle
"NaN", so on the face of it, it ought to also mention that it doesn't
handle "Infinity" either. However, this is only a few lines down from
that "Local functions follow ..." section header comment, which
already covers that, so it seems pointless mentioning NaNs and
infinities again for this function (none of the other local functions
in that section of the file do).

5). It seems a bit odd that numeric_to_double_no_overflow() handles
infinite inputs, but not NaN inputs, while its only caller
numeric_float8_no_overflow() handles NaNs, but not infinities. ISTM
that it would be neater to have all the special-handling in one place
(in the caller). That would also stop numeric_to_double_no_overflow()
being an exception to the preceding section header comment about local
functions not handling Nan/Inf. In fact, I wonder why keep
numeric_to_double_no_overflow() at all? It could just be rolled into
its caller, making it more like numeric_float8().

6). The next function, numericvar_to_double_no_overflow(), has a
comment that just says "As above, but work from a NumericVar", but it
isn't really "as above" anymore, since it doesn't handle infinite
inputs. Depending on what happens to numeric_to_double_no_overflow(),
this function's comment might need some tweaking.

7). The new function numeric_is_integral() feels out of place where it
is, amongst arithmetic functions operating on NumericVar's, because it
operates on a Numeric, and also because it handles NaNs, making it
another exception to the preceding comment about local functions that
don't handle NaNs. Perhaps it would fit in better after
numeric_is_nan() and numeric_is_inf(). Even though it's a local
function, it feels more akin to those functions.

Finally, not really in the scope of this patch, but something I
noticed anyway while looking at edge cases -- float and numeric handle
NaN/0 differently:

SELECT 'nan'::float8 / 0::float8;
ERROR:  division by zero

SELECT 'nan'::numeric / 0::numeric;
 ?column?
--
  NaN

I'm not sure if this is worth worrying about, or which behaviour is
preferable though.

Regards,
Dean




Re: sys_siglist[] is causing us trouble again

2020-07-15 Thread Tom Lane
Thomas Munro  writes:
> On Thu, Jul 16, 2020 at 10:48 AM Tom Lane  wrote:
>> Oddly, we'd not get to this code unless configure set
>> HAVE_DECL_SYS_SIGLIST, so it's defined *somewhere*.

> It looks like glibc very recently decided[1] to hide the declaration,
> but we're using a cached configure test result.

Ah, of course.  I was thinking that Peter had just changed configure
in the last day or so, but that did not affect the back branches.
So it's unsurprising for buildfarm animals to be using cached configure
results.

> I guess rawhide is the RH thing that tracks the bleeding edge?

Yup.  Possibly we should recommend that buildfarm owners running on
non-stable platforms disable autoconf result caching --- I believe
that's "use_accache => undef" in the configuration file.

Alternatively, maybe it'd be bright for the buildfarm script to
discard that cache after any failure (or at least configure or
build failures).

regards, tom lane




Re: sys_siglist[] is causing us trouble again

2020-07-15 Thread Thomas Munro
On Thu, Jul 16, 2020 at 10:48 AM Tom Lane  wrote:
> We haven't changed anything, ergo something changed at the OS level.
>
> Oddly, we'd not get to this code unless configure set
> HAVE_DECL_SYS_SIGLIST, so it's defined *somewhere*.  I suspect the root
> issue here is some rearrangement of system header files combined with
> wait_error.c (and maybe other places?) not including exactly the same
> headers that configure tested.

It looks like glibc very recently decided[1] to hide the declaration,
but we're using a cached configure test result.  I guess rawhide is
the RH thing that tracks the bleeding edge?

> Anyway, rather than installing rawhide and trying to debug this,
> I'd like to make a modest proposal: let's back-patch the v12
> patches that made us stop relying on sys_siglist[], viz a73d08319
> and cc92cca43.  Per the discussions that led to those patches,
> it's been decades since any platform didn't have POSIX-compliant
> strsignal(), so we'd be much better off relying on that.

Seems sensible.  Despite the claims of the glibc manual[2], it's not
really a GNU extension, and the BSDs have had it for decades.

[1] 
https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=b1ccfc061feee9ce616444ded8e1cd5acf9fa97f
[2] https://www.gnu.org/software/libc/manual/html_node/Signal-Messages.html




sys_siglist[] is causing us trouble again

2020-07-15 Thread Tom Lane
As of a couple days ago, buildfarm member caiman (Fedora rawhide)
is failing like this in all the pre-v12 branches:

ccache gcc -Wall -Wmissing-prototypes -Wpointer-arith 
-Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute 
-Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard 
-Wno-format-truncation -Wno-stringop-truncation -g -O2 -DFRONTEND 
-I../../src/include -D_GNU_SOURCE -I/usr/include/libxml2   -c -o wait_error.o 
wait_error.c
wait_error.c: In function \342\200\230wait_result_to_str\342\200\231:
wait_error.c:71:6: error: \342\200\230sys_siglist\342\200\231 undeclared (first 
use in this function)
   71 |  sys_siglist[WTERMSIG(exitstatus)] : "(unknown)");
  |  ^~~
wait_error.c:71:6: note: each undeclared identifier is reported only once for 
each function it appears in
make[2]: *** [: wait_error.o] Error 1

We haven't changed anything, ergo something changed at the OS level.

Oddly, we'd not get to this code unless configure set
HAVE_DECL_SYS_SIGLIST, so it's defined *somewhere*.  I suspect the root
issue here is some rearrangement of system header files combined with
wait_error.c (and maybe other places?) not including exactly the same
headers that configure tested.

Anyway, rather than installing rawhide and trying to debug this,
I'd like to make a modest proposal: let's back-patch the v12
patches that made us stop relying on sys_siglist[], viz a73d08319
and cc92cca43.  Per the discussions that led to those patches,
it's been decades since any platform didn't have POSIX-compliant
strsignal(), so we'd be much better off relying on that.
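
For what it's worth, the replacement is about as simple as the array
lookup it displaces; a minimal standalone sketch (not the backend code
itself, just an illustration of the POSIX interface):

    #include <signal.h>
    #include <stdio.h>
    #include <string.h>

    int
    main(void)
    {
        /* strsignal() yields the same text sys_siglist[] used to,
         * without relying on a non-portable global array. */
        printf("signal %d: %s\n", SIGTERM, strsignal(SIGTERM));
        return 0;
    }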

regards, tom lane




Re: Support for NSS as a libpq TLS backend

2020-07-15 Thread Daniel Gustafsson
> On 15 Jul 2020, at 20:35, Andrew Dunstan  
> wrote:
> 
> On 5/15/20 4:46 PM, Daniel Gustafsson wrote:
>> 
>> My plan is to keep hacking at this to have it reviewable for the 14 cycle, so
>> if anyone has an interest in NSS, then I would love to hear feedback on how 
>> it
>> works (and doesn't work).
> 
> I'll be happy to help, particularly with Windows support and with some
> of the callback stuff I've had a hand in.

That would be fantastic, thanks!  The password callback handling is still a
TODO, so feel free to take a stab at that since you have a lot of context
there.

For Windows, I've included USE_NSS in Solution.pm as Thomas pointed out in this
thread, but that was done blind as I've done no testing on Windows yet.

cheers ./daniel



Re: Generic Index Skip Scan

2020-07-15 Thread David Rowley
On Thu, 16 Jul 2020 at 07:52, Floris Van Nee  wrote:
>
> Besides the great efforts that Dmitry et al. are putting into the skip scan 
> for DISTINCT queries [1], I’m also still keen on extending the use of it 
> further. I’d like to address the limited cases in which skipping can occur 
> here. A few months ago I shared an initial rough patch that provided a 
> generic skip implementation, but lacked the proper planning work [2]. I’d 
> like to share a second patch set that provides an implementation of the 
> planner as well. Perhaps this can lead to some proper discussions how we’d 
> like to shape this patch further.
>
> Please see [2] for an introduction and some rough performance comparisons. 
> This patch improves upon those, because it implements proper cost estimation 
> logic. It will now only choose the skip scan if it’s deemed to be cheaper 
> than using a regular index scan. Other than that, all the features are still 
> there. The skip scan can be used in many more types of queries than in the 
> original DISTINCT patch as provided in [1], making it more performant and 
> also more predictable for users.
>
> I’m keen on receiving feedback on this idea and on the patch.

I don't think anyone ever thought the feature would be limited to just
making DISTINCT go faster. There are certain to be more usages in the
future.

However, for me it would be premature to look at this now.

David




Re: gs_group_1 crashing on 13beta2/s390x

2020-07-15 Thread Tom Lane
Christoph Berg  writes:
>> On the Debian s390x buildd, the 13beta2 build is crashing:

> I wired gdb into the build process and got this backtrace:

> #0  datumCopy (typByVal=false, typLen=-1, value=0) at 
> ./build/../src/backend/utils/adt/datum.c:142
> vl = 0x0
> res = 
> res = 
> vl = 
> eoh = 
> resultsize = 
> resultptr = 
> realSize = 
> resultptr = 
> realSize = 
> resultptr = 
> #1  datumCopy (value=0, typByVal=false, typLen=-1) at 
> ./build/../src/backend/utils/adt/datum.c:131
> res = 
> vl = 
> eoh = 
> resultsize = 
> resultptr = 
> realSize = 
> resultptr = 
> #2  0x02aa04423af8 in finalize_aggregate 
> (aggstate=aggstate@entry=0x2aa05775920, peragg=peragg@entry=0x2aa056e02f0, 
> resultVal=0x2aa056e0208, resultIsNull=0x2aa056e022a, pergroupstate= out>, pergroupstate=) at 
> ./build/../src/backend/executor/nodeAgg.c:1128

Hmm.  If gdb isn't lying to us, that has to be coming from here:

/*
 * If result is pass-by-ref, make sure it is in the right context.
 */
if (!peragg->resulttypeByVal && !*resultIsNull &&
!MemoryContextContains(CurrentMemoryContext,
   DatumGetPointer(*resultVal)))
*resultVal = datumCopy(*resultVal,
   peragg->resulttypeByVal,
   peragg->resulttypeLen);

The line numbers in HEAD are a bit different, but that's the only
call of datumCopy() in finalize_aggregate().

It's hardly surprising that datumCopy would segfault when given
a null "value" and told it is pass-by-reference.  However, to get to
the datumCopy call, we must have passed the MemoryContextContains
check on that very same pointer value, and that would surely have
segfaulted as well, one would think.

Given the apparently-can't-happen situation at the call site,
and the fact that we're not seeing similar failures reported
elsewhere (and note that every line shown above is at least
five years old), I'm kind of forced to the conclusion that this
is a compiler bug.  Does adjusting the -O level make it go away?

regards, tom lane




Re: Warn when parallel restoring a custom dump without data offsets

2020-07-15 Thread Tom Lane
I wrote:
> The attached 0001 rewrites your 0001 as per the above ideas (dropping
> the proposed doc change for now), and includes your 0004 for simplicity.
> I'm including your 0002 verbatim just so the cfbot will be able to do a
> meaningful test on 0001; but as stated, I don't really want to commit it.

I spent some more time testing this, by trying to dump and restore the
core regression database.  I immediately noticed that I sometimes got
"ftell mismatch with expected position -- ftell used" warnings, though
it was a bit variable depending on the -j level.  The reason was fairly
apparent on looking at the code: we had various fseeko() calls in code
paths that did not bother to correct ctx->filePos afterwards.  In fact,
*none* of the four existing fseeko calls in pg_backup_custom.c did so.
It's fairly surprising that that hadn't caused a problem up to now.
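
To make the failure mode concrete, here's a minimal standalone
illustration (an invented example, not pg_backup_custom.c itself) of
how a hand-maintained position counter drifts once an fseeko() is not
mirrored into it:

    #include <stdio.h>
    #include <sys/types.h>

    int
    main(void)
    {
        FILE   *f = tmpfile();
        off_t   filePos = 0;        /* manually tracked file position */

        fwrite("0123456789", 1, 10, f);
        filePos += 10;              /* writes keep the counter in sync ... */

        fseeko(f, 4, SEEK_SET);     /* ... but this seek does not */

        if (filePos != ftello(f))
            printf("ftell mismatch: tracked %lld, actual %lld\n",
                   (long long) filePos, (long long) ftello(f));

        fclose(f);
        return 0;
    }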

I started to add adjustments of ctx->filePos after all the fseeko calls,
but then began to wonder why we don't just rip the variable out entirely.
The only places where we need it are to set dataPos for data blocks,
but that's an entirely pointless activity if we don't have seek
capability, because we're not going to be able to rewrite the TOC
to emit the updated values.

Hence, the  patch attached rips out ctx->filePos, and then
0001 is the currently-discussed patch rebased on that.  I also added
an additional refinement, which is to track the furthest point we've
scanned to while looking for data blocks in an offset-less file.
If we have seek capability, then when we need to resume looking for
data blocks we can search forward from that spot rather than wherever
we happened to have stopped at.  This fixes an additional source
of potentially-O(N^2) behavior if we have to restore blocks in a
very out-of-order fashion.  I'm not sure that it makes much difference
in common cases, but with this we can say positively that we don't
scan the same block more than once per worker process.
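
In sketch form, the refinement is just a per-worker high-water mark
(the struct, field, and function names below are invented for
illustration; they are not the actual names used in the patch):

    #include <stdio.h>
    #include <sys/types.h>

    typedef struct
    {
        int     hasSeek;            /* can we fseeko() on the archive? */
        off_t   lastScannedPos;     /* furthest offset already examined */
    } ScanState;

    /* Before searching for the next data block, skip past everything
     * any earlier search in this worker has already looked at. */
    static void
    resume_block_search(ScanState *st, FILE *fp)
    {
        if (st->hasSeek && st->lastScannedPos > ftello(fp))
            fseeko(fp, st->lastScannedPos, SEEK_SET);
    }

    /* After examining a block header, advance the high-water mark. */
    static void
    note_scanned_to(ScanState *st, FILE *fp)
    {
        off_t   cur = ftello(fp);

        if (cur > st->lastScannedPos)
            st->lastScannedPos = cur;
    }

    int
    main(void)
    {
        FILE       *fp = tmpfile();
        ScanState   st = {1, 0};

        fwrite("some archive bytes", 1, 18, fp);
        note_scanned_to(&st, fp);       /* high-water mark is now 18 */

        rewind(fp);                     /* a later search would restart here */
        resume_block_search(&st, fp);   /* ...and jump straight to offset 18 */

        printf("resumed at offset %lld\n", (long long) ftello(fp));
        return 0;
    }

With something like that, a worker never re-examines a block header it
has already passed, however far out of order the restore requests come.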

I'm still unhappy about the proposed test case (0002), but now
I have a more concrete reason for that: it didn't catch this bug,
so the coverage is still pretty miserable.

Dump-and-restore-the-regression-database used to be a pretty common
manual test for pg_dump, but we never got around to automating it,
possibly because we figured that the pg_upgrade test script covers
that ground.  It's becoming gruesomely clear that pg_upgrade is a
distinct operating mode that doesn't necessarily have the same bugs.
So I'm inclined to feel that what we ought to do is automate a test
of that sort; but first we'll have to fix the existing bugs described
at [1][2].

Given the current state of affairs, I'm inclined to commit the
attached with no new test coverage, and then come back and look
at better testing after the other bugs are dealt with.

regards, tom lane

[1] https://www.postgresql.org/message-id/3169466.1594841366%40sss.pgh.pa.us
[2] https://www.postgresql.org/message-id/3170626.1594842723%40sss.pgh.pa.us

diff --git a/src/bin/pg_dump/pg_backup_custom.c b/src/bin/pg_dump/pg_backup_custom.c
index 6ab122242c..3a9881d601 100644
--- a/src/bin/pg_dump/pg_backup_custom.c
+++ b/src/bin/pg_dump/pg_backup_custom.c
@@ -70,14 +70,12 @@ typedef struct
 {
 	CompressorState *cs;
 	int			hasSeek;
-	pgoff_t		filePos;
-	pgoff_t		dataStart;
 } lclContext;
 
 typedef struct
 {
 	int			dataState;
-	pgoff_t		dataPos;
+	pgoff_t		dataPos;		/* valid only if dataState=K_OFFSET_POS_SET */
 } lclTocEntry;
 
 
@@ -144,8 +142,6 @@ InitArchiveFmt_Custom(ArchiveHandle *AH)
 	AH->lo_buf_size = LOBBUFSIZE;
 	AH->lo_buf = (void *) pg_malloc(LOBBUFSIZE);
 
-	ctx->filePos = 0;
-
 	/*
 	 * Now open the file
 	 */
@@ -185,7 +181,6 @@ InitArchiveFmt_Custom(ArchiveHandle *AH)
 
 		ReadHead(AH);
 		ReadToc(AH);
-		ctx->dataStart = _getFilePos(AH, ctx);
 	}
 
 }
@@ -290,7 +285,8 @@ _StartData(ArchiveHandle *AH, TocEntry *te)
 	lclTocEntry *tctx = (lclTocEntry *) te->formatData;
 
 	tctx->dataPos = _getFilePos(AH, ctx);
-	tctx->dataState = K_OFFSET_POS_SET;
+	if (tctx->dataPos >= 0)
+		tctx->dataState = K_OFFSET_POS_SET;
 
 	_WriteByte(AH, BLK_DATA);	/* Block type */
 	WriteInt(AH, te->dumpId);	/* For sanity check */
@@ -350,7 +346,8 @@ _StartBlobs(ArchiveHandle *AH, TocEntry *te)
 	lclTocEntry *tctx = (lclTocEntry *) te->formatData;
 
 	tctx->dataPos = _getFilePos(AH, ctx);
-	tctx->dataState = K_OFFSET_POS_SET;
+	if (tctx->dataPos >= 0)
+		tctx->dataState = K_OFFSET_POS_SET;
 
 	_WriteByte(AH, BLK_BLOBS);	/* Block type */
 	WriteInt(AH, te->dumpId);	/* For sanity check */
@@ -551,7 +548,6 @@ _skipBlobs(ArchiveHandle *AH)
 static void
 _skipData(ArchiveHandle *AH)
 {
-	lclContext *ctx = (lclContext *) AH->formatData;
 	size_t		blkLen;
 	char	   *buf = NULL;
 	int			buflen = 0;
@@ -575,8 +571,6 @@ _skipData(ArchiveHandle *AH)
 fatal("could not read from input file: %m");
 		}
 
-		ctx->filePos += blkLen;
-
 		blkLen = ReadInt(AH);
 	}
 
@@ -594,12 +588,10 @@ _skipData(ArchiveHandle *AH)
 

Re: gs_group_1 crashing on 13beta2/s390x

2020-07-15 Thread Christoph Berg
Re: To PostgreSQL Hackers
> On the Debian s390x buildd, the 13beta2 build is crashing:
> 
> 2020-07-15 01:19:59.149 UTC [859] LOG:  server process (PID 1415) was 
> terminated by signal 11: Segmentation fault
> 2020-07-15 01:19:59.149 UTC [859] DETAIL:  Failed process was running: create 
> table gs_group_1 as
>   select g100, g10, sum(g::numeric), count(*), max(g::text)
>   from gs_data_1 group by cube (g1000, g100,g10);

I wired gdb into the build process and got this backtrace:

2020-07-15 16:03:38.310 UTC [21073] LOG:  server process (PID 21575) was 
terminated by signal 11: Segmentation fault
2020-07-15 16:03:38.310 UTC [21073] DETAIL:  Failed process was running: create 
table gs_group_1 as
select g100, g10, sum(g::numeric), count(*), max(g::text)
from gs_data_1 group by cube (g1000, g100,g10);

 build/src/bin/pg_upgrade/tmp_check/data.old/core 
[New LWP 21575]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/s390x-linux-gnu/libthread_db.so.1".
Core was generated by `postgres: buildd regression [local] CREATE TABLE AS  
 '.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  datumCopy (typByVal=false, typLen=-1, value=0) at 
./build/../src/backend/utils/adt/datum.c:142
142 if (VARATT_IS_EXTERNAL_EXPANDED(vl))
#0  datumCopy (typByVal=false, typLen=-1, value=0) at 
./build/../src/backend/utils/adt/datum.c:142
vl = 0x0
res = 
res = 
vl = 
eoh = 
resultsize = 
resultptr = 
realSize = 
resultptr = 
realSize = 
resultptr = 
#1  datumCopy (value=0, typByVal=false, typLen=-1) at 
./build/../src/backend/utils/adt/datum.c:131
res = 
vl = 
eoh = 
resultsize = 
resultptr = 
realSize = 
resultptr = 
#2  0x02aa04423af8 in finalize_aggregate 
(aggstate=aggstate@entry=0x2aa05775920, peragg=peragg@entry=0x2aa056e02f0, 
resultVal=0x2aa056e0208, resultIsNull=0x2aa056e022a, pergroupstate=, pergroupstate=) at 
./build/../src/backend/executor/nodeAgg.c:1128
fcinfodata = {fcinfo = {flinfo = 0x2aa056e0250, context = 
0x2aa05775920, resultinfo = 0x0, fncollation = 0, isnull = false, nargs = 1, 
args = 0x3fff6a7b578}, fcinfo_data = 
"\000\000\002\252\005n\002P\000\000\002\252\005wY ", '\000' , 
"\247\000\001\000\000\002\252\005t\265\250\000\000\003\377\211\341\207F\000\000\003\377\000\000\002\001\000\000\000\000\000\000\003\376\000\000\000\000\000\000\017\370\000\000\000\000\000\000\001\377\000\000\000\000\000\000\000\260\000\000\000k\000\000\000k\000\000\000\000\000\000
 \000\000\000\003\377\213\016J 
\000\000\000p\000\000\000k\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000\020",
 '\000' , 
"w\000\000\000|\000\000\000\000\000\000\000\002\000\000\002\252\006&9\250\000\000\002\252\005wZh\000\000\002\252\005wZH\000\000\003\377\213\n_\210\000\000\000\000\000\000\000\000\000\000"...}
fcinfo = 0x3fff6a7b558
anynull = 
oldContext = 
i = 
lc = 
pertrans = 
#3  0x02aa04423ff4 in finalize_aggregates 
(aggstate=aggstate@entry=0x2aa05775920, peraggs=peraggs@entry=0x2aa056e0240, 
pergroup=0x2aa056c8ed8) at ./build/../src/backend/executor/nodeAgg.c:1345
peragg = 0x2aa056e02f0
transno = 
pergroupstate = 0x2aa056c8ef8
econtext = 
aggvalues = 0x2aa056e01f8
aggnulls = 0x2aa056e0228
aggno = 2
transno = 
#4  0x02aa04424f5c in agg_retrieve_direct (aggstate=0x2aa05775920) at 
./build/../src/backend/executor/nodeAgg.c:2480
econtext = 0x2aa05776080
firstSlot = 0x2aa062639a8
numGroupingSets = 
node = 
tmpcontext = 0x2aa05775d60
peragg = 0x2aa056e0240
outerslot = 
nextSetSize = 
pergroups = 0x2aa056c8ea8
result = 
hasGroupingSets = 
currentSet = 
numReset = 
i = 
node = 
econtext = 
tmpcontext = 
peragg = 
pergroups = 
outerslot = 
firstSlot = 
result = 
hasGroupingSets = 
numGroupingSets = 
currentSet = 
nextSetSize = 
numReset = 
i = 
#5  ExecAgg (pstate=0x2aa05775920) at 
./build/../src/backend/executor/nodeAgg.c:2140
node = 0x2aa05775920
result = 0x0
#6  0x02aa0441001a in ExecProcNode (node=0x2aa05775920) at 
./build/../src/include/executor/executor.h:245
No locals.
#7  ExecutePlan (execute_once=, dest=0x2aa0565fa58, 
direction=, numberTuples=0, sendTuples=, 
operation=CMD_SELECT, use_parallel_mode=, 
planstate=0x2aa05775920, estate=0x2aa057756f8) at 
./build/../src/backend/executor/execMain.c:1646
slot = 
current_tuple_count = 0
slot = 
current_tuple_count = 
#8  standard_ExecutorRun (queryDesc=0x2aa062df508, direction=, 
count=0, execute_once=) at 
./build/../sr

Dependencies for partitioned indexes are still a mess

2020-07-15 Thread Tom Lane
I've been experimenting with trying to dump-and-restore the
regression database, which is a test case that for some reason
we don't cover in the buildfarm (pg_upgrade is not the same thing).
It seems like the dependency choices we've made for partitioned
indexes are a complete failure for this purpose.

Setup:

1. make installcheck
2. Work around the bug complained of at [1]:
   psql regression -c 'drop table gtest30_1, gtest1_1'
3. pg_dump -Fc regression >regression.dump

Issue #1: "--clean" does not work

1. createdb r2
2. pg_restore -d r2 regression.dump
3. pg_restore --clean -d r2 regression.dump

pg_restore: while PROCESSING TOC:
pg_restore: from TOC entry 6606; 1259 35458 INDEX idxpart32_a_idx postgres
pg_restore: error: could not execute query: ERROR:  cannot drop index 
public.idxpart32_a_idx because index public.idxpart3_a_idx requires it
HINT:  You can drop index public.idxpart3_a_idx instead.
Command was: DROP INDEX public.idxpart32_a_idx;
pg_restore: from TOC entry 6605; 1259 35454 INDEX idxpart31_a_idx postgres
pg_restore: error: could not execute query: ERROR:  cannot drop index 
public.idxpart31_a_idx because index public.idxpart3_a_idx requires it
HINT:  You can drop index public.idxpart3_a_idx instead.
Command was: DROP INDEX public.idxpart31_a_idx;
...
pg_restore: from TOC entry 6622; 2606 35509 CONSTRAINT pk52 pk52_pkey postgres
pg_restore: error: could not execute query: ERROR:  cannot drop inherited 
constraint "pk52_pkey" of relation "pk52"
Command was: ALTER TABLE ONLY regress_indexing.pk52 DROP CONSTRAINT pk52_pkey;
pg_restore: from TOC entry 6620; 2606 35504 CONSTRAINT pk51 pk51_pkey postgres
pg_restore: error: could not execute query: ERROR:  cannot drop inherited 
constraint "pk51_pkey" of relation "pk51"
Command was: ALTER TABLE ONLY regress_indexing.pk51 DROP CONSTRAINT pk51_pkey;
pg_restore: from TOC entry 6618; 2606 35502 CONSTRAINT pk5 pk5_pkey postgres
pg_restore: error: could not execute query: ERROR:  cannot drop inherited 
constraint "pk5_pkey" of relation "pk5"
Command was: ALTER TABLE ONLY regress_indexing.pk5 DROP CONSTRAINT pk5_pkey;
...

(There seem to be some other problems as well, but most of the 54 complaints
are related to partitioned indexes/constraints.)

Issue #2: parallel restore does not work

1. dropdb r2; createdb r2
2. pg_restore -j8 -d r2 regression.dump 

This is fairly timing-dependent, but some attempts fail with messages
like

pg_restore: while PROCESSING TOC:
pg_restore: from TOC entry 6684; 2606 29166 FK CONSTRAINT fk fk_a_fkey postgres
pg_restore: error: could not execute query: ERROR:  there is no unique 
constraint matching given keys for referenced table "pk"
Command was: ALTER TABLE fkpart3.fk
ADD CONSTRAINT fk_a_fkey FOREIGN KEY (a) REFERENCES fkpart3.pk(a);

The problem here seems to be that some commands like this:

ALTER INDEX fkpart3.pk5_pkey ATTACH PARTITION fkpart3.pk52_pkey;

are not executed soon enough, indicating that we lack dependencies
that would guarantee the restore order.

I have not analyzed these issues in any detail -- they're just bugs
I tripped over while testing parallel pg_restore.  In particular
I do not know if #1 and #2 have the same root cause.

regards, tom lane

[1] https://www.postgresql.org/message-id/3169466.1594841366%40sss.pgh.pa.us




Re: Dumping/restoring fails on inherited generated column

2020-07-15 Thread Tom Lane
Peter Eisentraut  writes:
>> Right, there were a number of combinations that were not properly
>> handled.  The attached patch should fix them all.  It's made against
>> PG12 but also works on master.  See contained commit message and
>> documentation for details.

> committed to master and PG12

So ... this did not actually fix the dump/restore problem.  In fact,
it's worse, because in HEAD I see two failures not one when doing the
same test proposed at the start of this thread:

1. make installcheck
2. pg_dump -Fc regression >r.dump
3. createdb r2
4. pg_restore -d r2 r.dump

pg_restore: while PROCESSING TOC:
pg_restore: from TOC entry 6253; 2604 226187 DEFAULT gtest1_1 b postgres
pg_restore: error: could not execute query: ERROR:  column "b" of relation 
"gtest1_1" is a generated column
Command was: ALTER TABLE ONLY public.gtest1_1 ALTER COLUMN b SET DEFAULT (a * 
2);


pg_restore: from TOC entry 6279; 2604 227276 DEFAULT gtest30_1 b postgres
pg_restore: error: could not execute query: ERROR:  cannot use column reference 
in DEFAULT expression
Command was: ALTER TABLE ONLY public.gtest30_1 ALTER COLUMN b SET DEFAULT (a * 
2);


pg_restore: warning: errors ignored on restore: 2

regards, tom lane




Re: Support for NSS as a libpq TLS backend

2020-07-15 Thread Andrew Dunstan


On 5/15/20 4:46 PM, Daniel Gustafsson wrote:
>
> My plan is to keep hacking at this to have it reviewable for the 14 cycle, so
> if anyone has an interest in NSS, then I would love to hear feedback on how it
> works (and doesn't work).


I'll be happy to help, particularly with Windows support and with some
of the callback stuff I've had a hand in.


cheers


andrew



-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: Postgres is not able to handle more than 4k tables!?

2020-07-15 Thread Konstantin Knizhnik



On 15.07.2020 18:03, Alvaro Herrera wrote:

On 2020-Jul-15, Konstantin Knizhnik wrote:



On 15.07.2020 02:17, Alvaro Herrera wrote:

On 2020-Jul-10, Konstantin Knizhnik wrote:


@@ -3076,6 +3080,10 @@ relation_needs_vacanalyze(Oid relid,
instuples = tabentry->inserts_since_vacuum;
anltuples = tabentry->changes_since_analyze;
+   rel = RelationIdGetRelation(relid);
+   oldestXmin = 
TransactionIdLimitedForOldSnapshots(GetOldestXmin(rel, PROCARRAY_FLAGS_VACUUM), 
rel);
+   RelationClose(rel);

*cough*


Sorry, Alvaro.
Can you explain this *cough*?
You didn't like that the relation is opened just to call GetOldestXmin?
But this function requires a Relation. Do you suggest rewriting it so that
it is possible to pass just the Oid of the relation?

At that point of autovacuum, you don't have a lock on the relation; the
only thing you have is a pg_class tuple (and we do it that way on
purpose as I recall).  I think asking relcache for it is dangerous, and
moreover requesting relcache for it directly goes counter to our normal
coding pattern.  At the very least you should have a comment explaining
why you do it and why it's okay to do it, and also handle the case when
RelationIdGetRelation returns null.

However, looking at the bigger picture I wonder if it would be better to
test the getoldestxmin much later in the process to avoid this whole
issue.  Just carry forward the relation until the point where vacuum is
called ... that may be cleaner?  And the extra cost is not that much.


Thank you for the explanation.
I have prepared a new version of the patch which opens the relation with care.
Concerning your suggestion to perform this check later (in vacuum_rel(),
I guess?): I tried to implement it but ran into another problem.

Right now, information about autovacuum_oldest_xmin for a relation is stored
in the statistics (PgStat_StatTabEntry),
together with other autovacuum-related fields like
autovac_vacuum_timestamp, autovac_analyze_timestamp, ...
I am not sure that this is the right place for storing autovacuum_oldest_xmin,
but I didn't want to organize yet another hash table in shared memory
just for keeping this field. Maybe you can suggest something better...
But right now it is stored there when vacuum is completed.

PgStat_StatTabEntry is obtained by get_pgstat_tabentry_relid(), which
also needs pgstat_fetch_stat_dbentry(MyDatabaseId) and
pgstat_fetch_stat_dbentry(InvalidOid).  I do not want to copy all this
stuff to vacuum.c.
It seems to me easier to open the relation in
relation_needs_vacanalyze(), isn't it?



diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 1bbc4598f7..f7e17a9bbf 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -607,7 +607,8 @@ heap_vacuum_rel(Relation onerel, VacuumParams *params,
 	pgstat_report_vacuum(RelationGetRelid(onerel),
 		 onerel->rd_rel->relisshared,
 		 new_live_tuples,
-		 vacrelstats->new_dead_tuples);
+		 vacrelstats->new_dead_tuples,
+		 OldestXmin);
 	pgstat_progress_end_command();
 
 	/* and log the action if appropriate */
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 9c7d4b0c60..00f7cb4ca1 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -92,6 +92,7 @@
 #include "storage/lmgr.h"
 #include "storage/pmsignal.h"
 #include "storage/proc.h"
+#include "storage/procarray.h"
 #include "storage/procsignal.h"
 #include "storage/sinvaladt.h"
 #include "storage/smgr.h"
@@ -2992,6 +2993,9 @@ relation_needs_vacanalyze(Oid relid,
 	TransactionId xidForceLimit;
 	MultiXactId multiForceLimit;
 
+	TransactionId oldestXmin = InvalidTransactionId;
+	Relationrel;
+
 	AssertArg(classForm != NULL);
 	AssertArg(OidIsValid(relid));
 
@@ -3076,6 +3080,20 @@ relation_needs_vacanalyze(Oid relid,
 		instuples = tabentry->inserts_since_vacuum;
 		anltuples = tabentry->changes_since_analyze;
 
+		/*
+		 * We need to calculate oldestXmin for relation in the same way as it is done in vacuum.c
+		 * (see comment in vacuum_set_xid_limits).
+		 * This is why we have to open relation.
+		 */
+		if (ConditionalLockRelationOid(relid, AccessShareLock))
+		{
+			rel = try_relation_open(relid, NoLock);
+			if (rel)
+			{
+				oldestXmin = TransactionIdLimitedForOldSnapshots(GetOldestXmin(rel, PROCARRAY_FLAGS_VACUUM), rel);
+				relation_close(rel, AccessShareLock);
+			}
+		}
 		vacthresh = (float4) vac_base_thresh + vac_scale_factor * reltuples;
 		vacinsthresh = (float4) vac_ins_base_thresh + vac_ins_scale_factor * reltuples;
 		anlthresh = (float4) anl_base_thresh + anl_scale_factor * reltuples;
@@ -3094,10 +3112,24 @@ relation_needs_vacanalyze(Oid relid,
  NameStr(classForm->relname),
  vactuples, vacthresh, anltuples, anlthresh);
 
-		/* Determine if this table needs vacuum or analyze. */
-		*dovacuum = force_vacuum || (vactuples > vacthresh) ||
-

Re: BUG #16419: wrong parsing BC year in to_date() function

2020-07-15 Thread David G. Johnston
On Tue, May 12, 2020 at 8:56 PM Laurenz Albe 
wrote:

> On Tue, 2020-05-12 at 18:09 -0700, David G. Johnston wrote:
> > Redirecting to -hackers for visibility.  I feel there needs to be
> something done here, even if just documentation (a bullet in the usage
> notes section - and a code comment update for the macro)
> > pointing this out and not changing any behavior.
>
> Since "to_date" is an Oracle compatibility function, here is what Oracle
> 18.4 has to say to that:
>
> SQL> SELECT to_date('', '') FROM dual;
> SELECT to_date('', '') FROM dual
>*
> ERROR at line 1:
> ORA-01841: (full) year must be between -4713 and +, and not be 0
>
>
Attached is a concrete patch (back-patchable hopefully) documenting the
current reality.

As noted in the patch commit message (commentary really):

make_timestamp not agreeing with make_date on how to handle negative years
should probably just be fixed - but that is for someone else to handle.

Whether to actually change the behavior of to_date is up for debate though
I would presume it would not be back-patched.

David J.


v1-001-to-date-behavior-in-bc-bug16419.patch
Description: Binary data


Re: Have SIGHUP instead of SIGTERM for config reload in logical replication launcher

2020-07-15 Thread Andres Freund
Hi,

On 2020-07-15 20:33:59 +0530, Bharath Rupireddy wrote:
> >
> > +1.  I will commit this tomorrow unless someone thinks otherwise.
> >
>
> I think versions <= 12 have "pqsignal(SIGHUP,
> logicalrep_launcher_sighup)"; I'm not sure why, or which commit removed
> logicalrep_launcher_sighup().

commit 1e53fe0e70f610c34f4c9e770d108cd94151342c
Author: Robert Haas 
Date:   2019-12-17 13:03:57 -0500

Use PostgresSigHupHandler in more places.

There seems to be no reason for every background process to have
its own flag indicating that a config-file reload is needed.
Instead, let's just use ConfigFilePending for that purpose
everywhere.

Patch by me, reviewed by Andres Freund and Daniel Gustafsson.

Discussion: 
http://postgr.es/m/CA+TgmoZwDk=BguVDVa+qdA6SBKef=pkbakdqaltc_9qoz1m...@mail.gmail.com

Indeed looks like a typo. Robert, do you concur?

Andres




Re: recovering from "found xmin ... from before relfrozenxid ..."

2020-07-15 Thread Andres Freund
Hi,

On 2020-07-14 15:59:21 -0400, Robert Haas wrote:
> On Tue, Jul 14, 2020 at 3:42 PM Andres Freund  wrote:
> > The "found xmin ... from before relfrozenxid ..." cases should all be
> > fixable without needing such a function, and without it making fixing
> > them significantly easier, no? As far as I understand your suggested
> > solution, you need the tid(s) of these tuples, right? If you have those,
> > I don't think it's meaningfully harder to INSERT ... DELETE WHERE ctid =
> >  or something like that.
> >
> > ISTM that the hard part is finding all problematic tuples in an
> > efficient manner (i.e. that doesn't require one manual VACUUM for each
> > individual block + parsing VACUUMs error message), not "fixing" those
> > tuples.
> 
> I haven't tried the INSERT ... DELETE approach, but I've definitely
> seen a case where a straight UPDATE did not fix the problem; VACUUM
> continued failing afterwards.

The only way I can see that to happen is for the old tuple's multixact
being copied forward. That'd not happen with INSERT ... DELETE.
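
For illustration, the INSERT ... DELETE approach is roughly the following
(a minimal sketch; the table name and ctid here are made up):

  BEGIN;
  -- re-insert a copy of the problematic tuple's data
  INSERT INTO damaged_table
    SELECT * FROM damaged_table WHERE ctid = '(123,45)';
  -- then delete the original version by its ctid
  DELETE FROM damaged_table WHERE ctid = '(123,45)';
  COMMIT;

The copy gets a fresh xmin; the original becomes a deleted tuple (see below
on why that shouldn't trip the freezing checks).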


> In that case, it was a system catalog
> that was affected, and not one where TRUNCATE + re-INSERT was remotely
> practical.

FWIW, an rewriting ALTER TABLE would likely also fix it. But obviously
that'd require allow_system_table_mods...



> Do you have a reason for believing that INSERT ... DELETE is going to
> be better than UPDATE? It seems to me that either way you can end up
> with a deleted and thus invisible tuple that you still can't get rid
> of.

None of the "new" checks around freezing would apply to deleted
tuples. So we shouldn't fail with an error like $subject.

Greetings,

Andres Freund




Re: Have SIGHUP instead of SIGTERM for config reload in logical replication launcher

2020-07-15 Thread Bharath Rupireddy
>
> +1.  I will commit this tomorrow unless someone thinks otherwise.
>

I think versions <= 12 have "pqsignal(SIGHUP,
logicalrep_launcher_sighup)"; I'm not sure why, or which commit removed
logicalrep_launcher_sighup().

We might have to also backpatch this patch to version 13.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: TAP tests and symlinks on Windows

2020-07-15 Thread Andrew Dunstan

On 7/14/20 1:31 AM, Michael Paquier wrote:
> On Fri, Jul 10, 2020 at 07:58:02AM -0400, Andrew Dunstan wrote:
>> After much frustration and gnashing of teeth here's a patch that allows
>> almost all the TAP tests involving symlinks to work as expected on all
>> Windows build environments, without requiring an additional Perl module.
>> I have tested this on a system that is very similar to that running
>> drongo and fairywren, with both msys2 and MSVC builds.
> Thanks Andrew for looking at the part with MSYS.  The tests pass for
> me with MSVC.  The trick with mklink is cool.  I have not considered
> that, and the test code gets simpler.
>
> +   my $cmd = qq{mklink /j "$newname" "$oldname"};
> +   if ($Config{osname} eq 'msys')
> +   {
> +   # need some indirection on msys
> +   $cmd = qq{echo '$cmd' | \$COMSPEC /Q};
> +   }
> +   note("dir_symlink cmd: $cmd");
> +   system($cmd);
> From the quoting perspective, wouldn't it be simpler to build an array
> with all those arguments and call system() with @cmd?



This is the simplest invocation I found to be reliable on msys2 (and it
took me a long time to find). If you have a tested alternative please
let me know.


> +# Create files that look like temporary relations to ensure they are ignored
> +# in a tablespace.
> +my @tempRelationFiles = qw(t888_888 t88_88_vm.1);
> This variable conflicts with a previous declaration, creating a
> warning.
>
> +   skip "symlink check not implemented on Windows", 1
> + if ($windows_os);
> opendir(my $dh, "$pgdata/pg_tblspc") or die;
> I think that this would be cleaner with a SKIP block.



I don't understand this comment. The skip statement here is in a SKIP
block. In fact skip only works inside SKIP blocks. (perldoc Test::More
for details). Maybe you got confused by the diff format.


>
> +Portably create a symlink for a director. On Windows this creates a junction.
> +Elsewhere it just calls perl's builtin symlink.
> s/director/directory/
> s/junction/junction point/



fixed.


>
>
>  The TAP tests require the Perl module IPC::Run.
>  This module is available from CPAN or an operating system package.
> +On Windows, Win32API::File is also required .
>
> This part should be backpatched IMO.



I will do this in a separate backpatched commit.



>
> Some of the indentation is weird, this needs a cleanup with perltidy.


Done.


Revised patch attached.


cheers


andrew


-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
index 208df557b8..f674a7c94e 100644
--- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl
+++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
@@ -211,87 +211,93 @@ $node->command_fails(
 	'pg_basebackup tar with long name fails');
 unlink "$pgdata/$superlongname";
 
-# The following tests test symlinks. Windows doesn't have symlinks, so
-# skip on Windows.
+# The following tests are for symlinks.
+
+# Move pg_replslot out of $pgdata and create a symlink to it.
+$node->stop;
+
+# Set umask so test directories and files are created with group permissions
+umask(0027);
+
+# Enable group permissions on PGDATA
+chmod_recursive("$pgdata", 0750, 0640);
+
+rename("$pgdata/pg_replslot", "$tempdir/pg_replslot")
+  or BAIL_OUT "could not move $pgdata/pg_replslot";
+dir_symlink("$tempdir/pg_replslot", "$pgdata/pg_replslot")
+  or BAIL_OUT "could not symlink to $pgdata/pg_replslot";
+
+$node->start;
+
+# Create a temporary directory in the system location and symlink it
+# to our physical temp location.  That way we can use shorter names
+# for the tablespace directories, which hopefully won't run afoul of
+# the 99 character length limit.
+my $shorter_tempdir = TestLib::tempdir_short . "/tempdir";
+dir_symlink "$tempdir", $shorter_tempdir;
+
+mkdir "$tempdir/tblspc1";
+my $realTsDir= TestLib::perl2host("$shorter_tempdir/tblspc1");
+my $real_tempdir = TestLib::perl2host($tempdir);
+$node->safe_psql('postgres',
+	"CREATE TABLESPACE tblspc1 LOCATION '$realTsDir';");
+$node->safe_psql('postgres',
+	"CREATE TABLE test1 (a int) TABLESPACE tblspc1;");
+$node->command_ok(
+	[ 'pg_basebackup', '-D', "$real_tempdir/tarbackup2", '-Ft' ],
+	'tar format with tablespaces');
+ok(-f "$tempdir/tarbackup2/base.tar", 'backup tar was created');
+my @tblspc_tars = glob "$tempdir/tarbackup2/[0-9]*.tar";
+is(scalar(@tblspc_tars), 1, 'one tablespace tar was created');
+rmtree("$tempdir/tarbackup2");
+
+# Create an unlogged table to test that forks other than init are not copied.
+$node->safe_psql('postgres',
+	'CREATE UNLOGGED TABLE tblspc1_unlogged (id int) TABLESPACE tblspc1;');
+
+my $tblspc1UnloggedPath = $node->safe_psql('postgres',
+	q{select pg_relation_filepath('tblspc1_unlogged')});
+
+# Make sure main and init forks exist
+ok( -f "$pgdata/${tblspc1UnloggedPath}_init",
+	

Re: Postgres is not able to handle more than 4k tables!?

2020-07-15 Thread Alvaro Herrera
On 2020-Jul-15, Konstantin Knizhnik wrote:

> 
> 
> On 15.07.2020 02:17, Alvaro Herrera wrote:
> > On 2020-Jul-10, Konstantin Knizhnik wrote:
> > 
> > > @@ -3076,6 +3080,10 @@ relation_needs_vacanalyze(Oid relid,
> > >   instuples = tabentry->inserts_since_vacuum;
> > >   anltuples = tabentry->changes_since_analyze;
> > > + rel = RelationIdGetRelation(relid);
> > > + oldestXmin = 
> > > TransactionIdLimitedForOldSnapshots(GetOldestXmin(rel, 
> > > PROCARRAY_FLAGS_VACUUM), rel);
> > > + RelationClose(rel);
> > *cough*
> > 
> Sorry, Alvaro.
> Can you explain this *cough*?
> You didn't like that the relation is opened just to call GetOldestXmin?
> But this function requires a Relation. Do you suggest rewriting it so that
> it is possible to pass just the Oid of the relation?

At that point of autovacuum, you don't have a lock on the relation; the
only thing you have is a pg_class tuple (and we do it that way on
purpose as I recall).  I think asking relcache for it is dangerous, and
moreover requesting relcache for it directly goes counter to our normal
coding pattern.  At the very least you should have a comment explaining
why you do it and why it's okay to do it, and also handle the case when
RelationIdGetRelation returns null.

However, looking at the bigger picture I wonder if it would be better to
test the getoldestxmin much later in the process to avoid this whole
issue.  Just carry forward the relation until the point where vacuum is
called ... that may be cleaner?  And the extra cost is not that much.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Postgres is not able to handle more than 4k tables!?

2020-07-15 Thread Konstantin Knizhnik




On 15.07.2020 02:17, Alvaro Herrera wrote:

On 2020-Jul-10, Konstantin Knizhnik wrote:


@@ -3076,6 +3080,10 @@ relation_needs_vacanalyze(Oid relid,
instuples = tabentry->inserts_since_vacuum;
anltuples = tabentry->changes_since_analyze;
  
+		rel = RelationIdGetRelation(relid);

+   oldestXmin = 
TransactionIdLimitedForOldSnapshots(GetOldestXmin(rel, PROCARRAY_FLAGS_VACUUM), 
rel);
+   RelationClose(rel);

*cough*


Sorry, Alvaro.
Can you explain this *cough*?
You didn't like that the relation is opened just to call GetOldestXmin?
But this function requires a Relation. Do you suggest rewriting it so
that it is possible to pass just the Oid of the relation?


Or do you think that such a calculation of oldestXmin is obscure and
at least requires some comment?
Actually, I have copied it from vacuum.c, and there is a large comment
there explaining why it is calculated in this way.

Maybe it is enough to add a reference to vacuum.c?
Or maybe create some special function for it?
I just need oldestXmin to be calculated in the same way in vacuum.c and
autovacuum.c.






BUG #15285: Query used index over field with ICU collation in some cases wrongly return 0 rows

2020-07-15 Thread Jehan-Guillaume de Rorthais
Hi,

I'm bumping this thread on pgsql-hackers, hoping it will draw some more
opinions/discussion.

Should we try to fix this issue or not? This is clearly an upstream bug. It has
been reported, including regression tests, but it hasn't moved in two years
now.

If we choose not to fix it on our side using, e.g., a workaround (see patch), I
suppose this small bug should be documented somewhere so people are not lost
alone in the wild.

Opinions?

Regards,

Begin forwarded message:

Date: Sat, 13 Jun 2020 00:43:22 +0200
From: Jehan-Guillaume de Rorthais 
To: Thomas Munro , Peter Geoghegan 
Cc: Роман Литовченко , PostgreSQL mailing lists
 Subject: Re: BUG #15285: Query used index
over field with ICU collation in some cases wrongly return 0 rows


On Fri, 12 Jun 2020 18:40:55 +0200
Jehan-Guillaume de Rorthais  wrote:

> On Wed, 10 Jun 2020 00:29:33 +0200
> Jehan-Guillaume de Rorthais  wrote:
> [...]  
> > After playing with ICU regression tests, I found functions ucol_strcollIter
> > and ucol_nextSortKeyPart are safe. I'll do some performance tests and report
> > here.  
> 
> I did some benchmarks. See attachment for the script and its header to
> reproduce.
> 
> It sorts 935895 french phrases from 0 to 122 chars with an average of 49.
> Performance tests were done on current master HEAD (buggy) and using the patch
> in attachment, relying on ucol_strcollIter.
> 
> My preliminary test with ucol_getSortKey was catastrophic, as we might
> expect. 15-17x slower than the current HEAD. So I removed it from actual
> tests. I didn't try with ucol_nextSortKeyPart though.
> 
> Using ucol_strcollIter performs ~20% slower than HEAD on UTF8 databases, but
> this might be acceptable. Here are the numbers:
> 
>DB Encoding   HEAD  strcollIter   ratio
>UTF8  2.74 3.27   1.19x
>LATIN15.34 5.40   1.01x
> 
> I plan to add a regression test soon.  

Please, find in attachment the second version of the patch, with a
regression test.

Regards,


-- 
Jehan-Guillaume de Rorthais
Dalibo
>From bb428135490caafe23562e3dcd9925d7d7c5a7be Mon Sep 17 00:00:00 2001
From: Jehan-Guillaume de Rorthais 
Date: Thu, 11 Jun 2020 18:08:31 +0200
Subject: [PATCH] Replace buggy ucol_strcoll* funcs with ucol_strcollIter

Functions ucol_strcoll and ucol_strcollUTF8 are breaking some collation rules.
This leads to wrong sort order or wrong result from index using such
collations. See bug report #15285 for details:
http://postgr.es/m/153201618542.1404.3611626898935613264%40wrigleys.postgresql.org

This fix replace ucol_strcoll* functions with ucol_strcollIter() which is not
affected by this bug.

Reported-by: Roman Lytovchenko
Analysed-by: Peter Geoghegan
Author: Jehan-Guillaume de Rorthais
---
 src/backend/utils/adt/varlena.c   | 54 ---
 src/include/utils/pg_locale.h | 14 -
 .../regress/expected/collate.icu.utf8.out | 12 +
 src/test/regress/sql/collate.icu.utf8.sql |  8 +++
 4 files changed, 56 insertions(+), 32 deletions(-)

diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index 2eaabd6231..f156c00a1a 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -1638,34 +1638,43 @@ varstr_cmp(const char *arg1, int len1, const char *arg2, int len2, Oid collid)
 			if (mylocale->provider == COLLPROVIDER_ICU)
 			{
 #ifdef USE_ICU
-#ifdef HAVE_UCOL_STRCOLLUTF8
 if (GetDatabaseEncoding() == PG_UTF8)
 {
 	UErrorCode	status;
+	UCharIterator	iter1,
+	iter2;
 
 	status = U_ZERO_ERROR;
-	result = ucol_strcollUTF8(mylocale->info.icu.ucol,
-			  arg1, len1,
-			  arg2, len2,
-			  &status);
+
+	uiter_setUTF8(&iter1, arg1, len1);
+	uiter_setUTF8(&iter2, arg2, len2);
+
+	result = ucol_strcollIter(mylocale->info.icu.ucol,
+			  &iter1, &iter2, &status);
 	if (U_FAILURE(status))
 		ereport(ERROR,
 (errmsg("collation failed: %s", u_errorName(status;
 }
 else
-#endif
 {
+	UErrorCode	status;
+	UCharIterator	iter1,
+	iter2;
 	int32_t		ulen1,
 ulen2;
 	UChar	   *uchar1,
 			   *uchar2;
 
+	status = U_ZERO_ERROR;
+
 	ulen1 = icu_to_uchar(&uchar1, arg1, len1);
 	ulen2 = icu_to_uchar(&uchar2, arg2, len2);
 
-	result = ucol_strcoll(mylocale->info.icu.ucol,
-		  uchar1, ulen1,
-		  uchar2, ulen2);
+	uiter_setString(&iter1, uchar1, ulen1);
+	uiter_setString(&iter2, uchar2, ulen2);
+
+	result = ucol_strcollIter(mylocale->info.icu.ucol,
+			  &iter1, &iter2, &status);
 
 	pfree(uchar1);
 	pfree(uchar2);
@@ -2352,34 +2361,43 @@ varstrfastcmp_locale(char *a1p, int len1, char *a2p, int len2, SortSupport ssup)
 		if (sss->locale->provider == COLLPROVIDER_ICU)
 		{
 #ifdef USE_ICU
-#ifdef HAVE_UCOL_STRCOLLUTF8
 			if (GetDatabaseEncoding() == PG_UTF8)
 			{
 UErrorCode	status;
+UCharIterator	iter1,
+iter2;
 
 s

Re: Improve handling of parameter differences in physical replication

2020-07-15 Thread Peter Eisentraut

Here is a minimally updated new patch version to resolve a merge conflict.

On 2020-06-24 10:00, Peter Eisentraut wrote:

Here is another stab at this subject.

This is a much simplified variant:  When encountering a parameter change
in the WAL that is higher than the standby's current setting, we log a
warning (instead of an error until now) and pause recovery.  If you
resume (unpause) recovery, the instance shuts down as before.

This allows you to keep your standbys running for a bit (depending on
lag requirements) and schedule the required restart more deliberately.

I had previously suggested making this new behavior configurable, but
there didn't seem to be much interest in that, so I have not included
that here.

The documentation changes are mostly carried over from previous patch
versions (but adjusted for the actual behavior of the patch).


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 6dfa5445eed5c4bf601577df1a874bf10b0e562a Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Wed, 24 Jun 2020 08:49:42 +0200
Subject: [PATCH v4 1/2] Replace a macro by a function

Using a macro is ugly and not justified here.
---
 src/backend/access/transam/xlog.c | 21 +++--
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/src/backend/access/transam/xlog.c 
b/src/backend/access/transam/xlog.c
index 0a97b1d37f..b9d011fa8d 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -6235,16 +6235,17 @@ GetXLogReceiptTime(TimestampTz *rtime, bool *fromStream)
  * Note that text field supplied is a parameter name and does not require
  * translation
  */
-#define RecoveryRequiresIntParameter(param_name, currValue, minValue) \
-do { \
-   if ((currValue) < (minValue)) \
-   ereport(ERROR, \
-   (errcode(ERRCODE_INVALID_PARAMETER_VALUE), \
-errmsg("hot standby is not possible because %s 
= %d is a lower setting than on the primary server (its value was %d)", \
-   param_name, \
-   currValue, \
-   minValue))); \
-} while(0)
+static void
+RecoveryRequiresIntParameter(const char *param_name, int currValue, int 
minValue)
+{
+   if (currValue < minValue)
+   ereport(ERROR,
+   (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+errmsg("hot standby is not possible because %s 
= %d is a lower setting than on the primary server (its value was %d)",
+   param_name,
+   currValue,
+   minValue)));
+}
 
 /*
  * Check to see if required parameters are set high enough on this server
-- 
2.27.0

From 4b5812c41fe581042249a1af43ac96df75f14a23 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Wed, 24 Jun 2020 09:18:21 +0200
Subject: [PATCH v4 2/2] Pause recovery for insufficient parameter settings

When certain parameters are changed on a physical replication primary,
this is communicated to standbys using the XLOG_PARAMETER_CHANGE WAL
record.  The standby then checks whether its own settings are at least
as big as the ones on the primary.  If not, the standby shuts down
with a fatal error.

This patch changes this behavior to pause recovery at that point
instead.  That allows read traffic on the standby to continue while
database administrators figure out next steps.  When recovery is
unpaused, the server shuts down (as before).  The idea is to fix the
parameters while recovery is paused and then restart when there is a
maintenance window.
---
 doc/src/sgml/high-availability.sgml | 47 +
 src/backend/access/transam/xlog.c   | 17 ++-
 2 files changed, 51 insertions(+), 13 deletions(-)

diff --git a/doc/src/sgml/high-availability.sgml 
b/doc/src/sgml/high-availability.sgml
index 6a9184f314..5aa8eeced2 100644
--- a/doc/src/sgml/high-availability.sgml
+++ b/doc/src/sgml/high-availability.sgml
@@ -2158,18 +2158,14 @@ Administrator's Overview

 

-The setting of some parameters on the standby will need reconfiguration
-if they have been changed on the primary. For these parameters,
-the value on the standby must
-be equal to or greater than the value on the primary.
-Therefore, if you want to increase these values, you should do so on all
-standby servers first, before applying the changes to the primary server.
-Conversely, if you want to decrease these values, you should do so on the
-primary server first, before applying the changes to all standby servers.
-If these parameters
-are not set high enough then the standby will refuse to start.
-Higher values can then be supplied and the server
-

Re: renaming configure.in to configure.ac

2020-07-15 Thread Tom Lane
Peter Eisentraut  writes:
> It was mentioned elsewhere in passing that a new Autoconf release might 
> be coming.  That one will warn about the old naming "configure.in" and 
> request "configure.ac".  So we might want to rename that sometime. 
> Before we get into the specifics, I suggest that all interested parties 
> check whether buildfarm scripts, packaging scripts, etc. need to be 
> adjusted for the newer name.

Along the same line, I read at [1]

Because it has been such a long time, and because some of the changes
potentially break existing Autoconf scripts, we are conducting a
public beta test before the final release of version 2.70.  Please
test this beta with your autoconf scripts, and report any problems you
find to the Savannah bug tracker:

Maybe we should do some pro-active testing, rather than just waiting for
2.70 to get dropped on us?  God knows how long it will be until 2.71.

regards, tom lane

[1] https://lists.gnu.org/archive/html/autoconf/2020-07/msg6.html




Re: renaming configure.in to configure.ac

2020-07-15 Thread Andrew Dunstan


On 7/15/20 9:14 AM, Peter Eisentraut wrote:
> It was mentioned elsewhere in passing that a new Autoconf release
> might be coming.  That one will warn about the old naming
> "configure.in" and request "configure.ac".  So we might want to rename
> that sometime. Before we get into the specifics, I suggest that all
> interested parties check whether buildfarm scripts, packaging scripts,
> etc. need to be adjusted for the newer name.
>


The buildfarm does not use autoconf at all, so it couldn't care less what
the file is called.


cheers


andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: libpq: Request Pipelining/Batching status ?

2020-07-15 Thread Matthieu Garrigues
Did my message make it to the mailing list, or not yet?

Matthieu Garrigues


On Fri, Jul 10, 2020 at 5:08 PM Matthieu Garrigues <
matthieu.garrig...@gmail.com> wrote:

> Hi all,
>
> Do you know what is the status of Request Pipelining and/or Batching in
> libpq ?
>
> I could see that I'm not the first one to think about it, I see an item in
> the todolist:
>
> https://web.archive.org/web/20200125013930/https://wiki.postgresql.org/wiki/Todo
>
> And a thread here:
>
> https://www.postgresql-archive.org/PATCH-Batch-pipelining-support-for-libpq-td5904551i80.html
>
> And a patch here:
> https://2ndquadrant.github.io/postgres/libpq-batch-mode.html
>
> Seems like this boosts performance a lot; drogon, a C++ framework,
> outperforms all
> other web frameworks thanks to this fork:
> https://www.techempower.com/benchmarks/#section=data-r19&hw=ph&test=update
> https://github.com/TechEmpower/FrameworkBenchmarks/issues/5502
>
> It would be nice to have it in the official libpq so we don't have to use
> an outdated fork
> to have this feature.
> Is anybody working on it? Is there lots of work left to finalize this patch?
>
> Thanks in advance,
> Matthieu
>
>


Re: Have SIGHUP instead of SIGTERM for config reload in logical replication launcher

2020-07-15 Thread Amit Kapila
On Wed, Jul 15, 2020 at 6:21 PM Dilip Kumar  wrote:
>
> On Wed, Jul 15, 2020 at 6:17 PM Bharath Rupireddy
>  wrote:
> >
> > Hi,
> >
> > In ApplyLauncherMain, it seems like we are having SIGTERM signal
> > mapped for config reload. I think we should be having SIGHUP for
> > SignalHandlerForConfigReload(). Otherwise we fail to pick up the updated
> > value for wal_retrieve_retry_interval in ApplyLauncherMain.
> >
> > Attached is a patch having this change.
> >
> > Thoughts?
>
> Yeah, it just looks like a typo so your fix looks good to me.
>

+1.  I will commit this tomorrow unless someone thinks otherwise.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




renaming configure.in to configure.ac

2020-07-15 Thread Peter Eisentraut
It was mentioned elsewhere in passing that a new Autoconf release might 
be coming.  That one will warn about the old naming "configure.in" and 
request "configure.ac".  So we might want to rename that sometime. 
Before we get into the specifics, I suggest that all interested parties 
check whether buildfarm scripts, packaging scripts, etc. need to be 
adjusted for the newer name.


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: some more pg_dump refactoring

2020-07-15 Thread Peter Eisentraut

On 2020-07-09 16:14, Peter Eisentraut wrote:

On 2020-07-08 06:42, Fabien COELHO wrote:

What do you think about this patch to reorganize the existing code from that
old commit?


I think it is a definite further improvement.

Patch applies cleanly, compiles, global & pg_dump tap test ok, looks ok to
me.


Thanks.  I have committed that, and attached is my original patch
adjusted to this newer style.


committed

--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Which SET TYPE don't actually require a rewrite

2020-07-15 Thread Magnus Hagander
Our Fine Manual (TM) specifies:
"As an exception, when changing the type of an existing column, if the
USING clause does not change the column contents and the old type is either
binary coercible to the new type or an unconstrained domain over the new
type, a table rewrite is not needed; but any indexes on the affected
columns must still be rebuilt."

First of all, how is a non-internals-expert even supposed to know what a
binary coercible type is? That's not a very user-friendly way to say it.

Second, how is even an expert supposed to find the list? :)

For example, we can query pg_cast for casts that are binary coercible;
that's a start, but it doesn't really tell us the answer.
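
For instance, something along these lines lists the binary-coercible casts the
catalogs know about (just a sketch, and as noted, only part of the answer):

  SELECT castsource::regtype AS source, casttarget::regtype AS target
  FROM pg_cast
  WHERE castmethod = 'b'   -- 'b' = binary coercible
  ORDER BY 1, 2;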

We can also for example increase the precision of numeric without a rewrite
(but not scale). Or we can change between text and varchar. And we can
increase the length of a varchar but not decrease it.
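
Absent a documented list, one crude empirical check (on a scratch copy; the
table and column names here are made up) is whether the relfilenode changes
across the ALTER:

  SELECT pg_relation_filenode('t');
  ALTER TABLE t ALTER COLUMN c TYPE varchar(20);   -- e.g. widening varchar(10)
  SELECT pg_relation_filenode('t');
  -- same value before and after => the ALTER did not rewrite the table

But that only tells you after the fact, not up front.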

Surely we can do better than this when it comes to documenting it? Even if
it's a pluggable thing so it may or may not be true of external
datatypes installed later, we should be able to at least be more clear
about the builtin types, I think?

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: Have SIGHUP instead of SIGTERM for config reload in logical replication launcher

2020-07-15 Thread Dilip Kumar
On Wed, Jul 15, 2020 at 6:17 PM Bharath Rupireddy
 wrote:
>
> Hi,
>
> In ApplyLauncherMain, it seems like we are having SIGTERM signal
> mapped for config reload. I think we should be having SIGHUP for
> SignalHandlerForConfigReload(). Otherwise we fail to pick up the updated
> value for wal_retrieve_retry_interval in ApplyLauncherMain.
>
> Attached is a patch having this change.
>
> Thoughts?

Yeah, it just looks like a typo so your fix looks good to me.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Have SIGHUP instead of SIGTERM for config reload in logical replication launcher

2020-07-15 Thread Bharath Rupireddy
Hi,

In ApplyLauncherMain, it seems like we are having SIGTERM signal
mapped for config reload. I think we should be having SIGHUP for
SignalHandlerForConfigReload(). Otherwise we fail to pick up the updated
value for wal_retrieve_retry_interval in ApplyLauncherMain.

Attached is a patch having this change.

Thoughts?

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com


v1-Have-SIGHUP-for-config-reload-in-logical-replicat.patch
Description: Binary data


Volatile Functions in Parallel Plans

2020-07-15 Thread Zhenghua Lyu
Hi,
I tested some SQL on the latest Postgres master branch code (we found these
issues when
developing the Greenplum database in the PR
https://github.com/greenplum-db/gpdb/pull/10418,
and my colleague came up with the following cases in Postgres):

create table t3 (c1 text, c2 text);
CREATE TABLE
insert into t3
select
  'fhufehwiohewiuewhuhwiufhwifhweuhfwu', --random data
  'fiowehufwhfyegygfewpfwwfeuhwhufwh' --random data
from generate_series(1, 1000) i;
INSERT 0 1000
analyze t3;
ANALYZE
create table t4 (like t3);
CREATE TABLE
insert into t4 select * from t4;
INSERT 0 0
insert into t4 select * from t3;
INSERT 0 1000
analyze t4;
ANALYZE
set enable_hashjoin to off;
SET
explain (costs off)
select count(*) from t3, t4
where t3.c1 like '%sss'
  and timeofday() = t4.c1 and t3.c1 = t4.c1;
   QUERY PLAN

 Finalize Aggregate
   ->  Gather
 Workers Planned: 2
 ->  Partial Aggregate
   ->  Nested Loop
 Join Filter: (t3.c1 = t4.c1)
 ->  Parallel Seq Scan on t3
   Filter: (c1 ~~ '%sss'::text)
 ->  Seq Scan on t4
   Filter: (timeofday() = c1)
(10 rows)

explain (verbose, costs off)
select count(*)
from
  t3,
  (select *, timeofday() as x from t4 ) t4
where t3.c1 like '%sss' and
  timeofday() = t4.c1 and t3.c1 = t4.c1;
QUERY PLAN
--
 Finalize Aggregate
   Output: count(*)
   ->  Gather
 Output: (PARTIAL count(*))
 Workers Planned: 2
 ->  Partial Aggregate
   Output: PARTIAL count(*)
   ->  Nested Loop
 Join Filter: (t3.c1 = t4.c1)
 ->  Parallel Seq Scan on public.t3
   Output: t3.c1, t3.c2
   Filter: (t3.c1 ~~ '%sss'::text)
 ->  Seq Scan on public.t4
   Output: t4.c1, NULL::text, timeofday()
   Filter: (timeofday() = t4.c1)
(15 rows)


Focus on the last two plans: the function timeofday() is
volatile but parallel-safe, and Postgres outputs two parallel
plans.


The first plan:
 Finalize Aggregate
   ->  Gather
 Workers Planned: 2
 ->  Partial Aggregate
   ->  Nested Loop
 Join Filter: (t3.c1 = t4.c1)
 ->  Parallel Seq Scan on t3
   Filter: (c1 ~~ '%sss'::text)
 ->  Seq Scan on t4
   Filter: (timeofday() = c1)

The join's left tree is a parallel scan and the right tree is a seq scan.
This algorithm is correct under the distributive law of distributed join:
   if A = [A1, A2, A3, ..., An], then
   A join B = gather( (A1 join B) (A2 join B) ... (An join B) )

The correctness of the above law rests on a pre-assumption:
  the data set of B is the same in each join: (A1 join B) (A2 join B) ...
(An join B)

But things get complicated when volatile functions come in. timeofday() is just
an example to show the idea. The core issue is that volatile functions can
return different
results on successive calls with the same arguments. Thus the following piece,
the right tree of the join
 ->  Seq Scan on t4
   Filter: (timeofday() = c1)
cannot be considered consistent everywhere across the scan workers.

The second plan

 Finalize Aggregate
   Output: count(*)
   ->  Gather
 Output: (PARTIAL count(*))
 Workers Planned: 2
 ->  Partial Aggregate
   Output: PARTIAL count(*)
   ->  Nested Loop
 Join Filter: (t3.c1 = t4.c1)
 ->  Parallel Seq Scan on public.t3
   Output: t3.c1, t3.c2
   Filter: (t3.c1 ~~ '%sss'::text)
 ->  Seq Scan on public.t4
   Output: t4.c1, NULL::text, timeofday()
   Filter: (timeofday() = t4.c1)

has volatile projections in the right tree of the nestloop:

 ->  Seq Scan on public.t4
   Output: t4.c1, NULL::text, timeofday()
   Filter: (timeofday() = t4.c1)

It should not be taken as consistent in different workers.

--

The above are just two cases we found today, but they should be enough to
show the core issue and start a discussion here.

The question is, should we consider volatile functions when generating
parallel plans?

--
FYI, some plan diffs of Greenplum can be found here: 
https://www.diffnow.com/report/etulf





Re: Mark btree_gist functions as PARALLEL SAFE

2020-07-15 Thread Alexander Korotkov
On Thu, Jun 18, 2020 at 7:48 PM Winfield, Steven
 wrote:
> Done - thanks again.

This patch looks good to me.

I've rechecked that it marks all the functions as parallel safe by
installing the extension and querying the catalog.  I've also rechecked
that there is nothing suspicious in these functions in terms of
parallel safety.  I made just minor adjustments to the migration script
comments.
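
For reference, that recheck boils down to a catalog query roughly along these
lines (a sketch, not necessarily the exact query):

  SELECT p.oid::regprocedure, p.proparallel
  FROM pg_proc p
  JOIN pg_depend d ON d.classid = 'pg_proc'::regclass
                  AND d.objid = p.oid
                  AND d.deptype = 'e'
  JOIN pg_extension e ON d.refclassid = 'pg_extension'::regclass
                     AND d.refobjid = e.oid
  WHERE e.extname = 'btree_gist'
    AND p.proparallel <> 's';
  -- should return zero rows once everything is marked parallel safe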

I'm going to push this if no objections.

--
Regards,
Alexander Korotkov


0001-Update-btree_gist-extension-for-parallel-query.patch
Description: Binary data


Re: Partitioning and postgres_fdw optimisations for multi-tenancy

2020-07-15 Thread Etsuro Fujita
On Wed, Jul 15, 2020 at 12:12 AM Alexey Kondratov
 wrote:
> On 2020-07-14 15:27, Ashutosh Bapat wrote:
> > On Tue, Jul 14, 2020 at 12:48 AM Alexey Kondratov
> >  wrote:
> >> Some real-life test queries show, that all single-node queries aren't
> >> pushed-down to the required node. For example:
> >>
> >> SELECT
> >>  *
> >> FROM
> >>  documents
> >>  INNER JOIN users ON documents.user_id = users.id
> >> WHERE
> >>  documents.company_id = 5
> >>  AND users.company_id = 5;
> >
> > There are a couple of things happening here
> > 1. the clauses on company_id in WHERE clause are causing partition
> > pruning. Partition-wise join is disabled with partition pruning before
> > PG13.

More precisely, PWJ cannot be applied when there are no matched
partitions on the nullable side due to partition pruning before PG13.
But the join is an inner join, so I think PWJ can still be applied for
the join.

> > In PG13 we have added advanced partition matching algorithm
> > which will allow partition-wise join with partition pruning.

> BTW, can you, please, share a link to commit / thread about allowing
> partition-wise join and partition pruning to work together in PG13?

I think the link would be this:

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=c8434d64ce03c32e0029417a82ae937f2055268f

Unfortunately, advanced PWJ added by the commit only allows PWJ and
partition pruning to work together for list/range partitioned tables,
not for hash partitioned tables.  However, I think the commit would
have nothing to do with the issue here, because 1) the tables involved
in the join have the same partition bounds, and 2) the commit doesn't
change the behavior of such a join.

Best regards,
Etsuro Fujita




Re: Transactions involving multiple postgres foreign servers, take 2

2020-07-15 Thread Masahiro Ikeda

On 2020-07-15 15:06, Masahiko Sawada wrote:
On Tue, 14 Jul 2020 at 09:08, Masahiro Ikeda  
wrote:


> I've attached the latest version patches. I've incorporated the review
> comments I got so far and improved locking strategy.

Thanks for updating the patch!
I have three questions about the v23 patches.


1. messages related to user canceling

In my understanding, there are two messages
which can be output when a user cancels the COMMIT command.

A. When prepare fails, the output shows that the transaction
committed locally but some error occurred.

```
postgres=*# COMMIT;
^CCancel request sent
WARNING:  canceling wait for resolving foreign transaction due to user
request
DETAIL:  The transaction has already committed locally, but might not
have been committed on the foreign server.
ERROR:  server closed the connection unexpectedly
 This probably means the server terminated abnormally
 before or while processing the request.
CONTEXT:  remote SQL command: PREPARE TRANSACTION
'fx_1020791818_519_16399_10'
```

B. When prepare succeeds,
the output shows that the transaction committed locally.

```
postgres=*# COMMIT;
^CCancel request sent
WARNING:  canceling wait for resolving foreign transaction due to user
request
DETAIL:  The transaction has already committed locally, but might not
have been committed on the foreign server.
COMMIT
```

In case A, I think that the "committed locally" message can confuse the
user.

Because although the message says committed, the transaction is
"ABORTED".

I think the "committed" message means that the "ABORT" is committed locally.
But is there a possibility of misunderstanding?


No, you're right. I'll fix it in the next version patch.

I think synchronous replication also has the same problem. It says
"the transaction has already committed" but it's not true when
executing ROLLBACK PREPARED.


Thanks for replying and sharing the synchronous replication problem.


BTW how did you test the case (A)? It says canceling wait for foreign
transaction resolution but the remote SQL command is PREPARE
TRANSACTION.


I think the timing of failure is important for 2PC testing.
Since I don't have any good solution to simulate that flexibly,
I use the GDB debugger.

The message of the case (A) is sent
after performing the following operations.

1. Attach the debugger to a backend process.
2. Set a breakpoint to PreCommit_FdwXact() in CommitTransaction().
   // Before PREPARE.
3. Execute "BEGIN" and insert data into two remote foreign tables.
4. Issue a "Commit" command
5. The backend process stops at the breakpoint.
6. Stop a remote foreign server.
7. Detach the debugger.
  // The backend continues and the prepare fails. The TR tries to abort all
remote txs.
  // It's unnecessary to resolve remote txs whose prepare failed,
isn't it?

8. Send a cancel request.


BTW, I am concerned about how to test the 2PC patches.
There are many failure patterns, such as failure timing,
server/network failures (and unexpected recovery), and combinations of those...

Though it would be best to test those failure patterns automatically,
I have no good idea for now, so I manually check some patterns.



I've incorporated the above your comments in the local branch. I'll
post the latest version patch after incorporating other comments soon.


OK, Thanks.


Regards,

--
Masahiro Ikeda
NTT DATA CORPORATION




Re: SQL/JSON: functions

2020-07-15 Thread Andrew Dunstan


On 7/14/20 9:47 PM, Nikita Glukhov wrote:
> Attached 49th version of the patches with two new patches #5 and #6.
>
> On 15.07.2020 00:09, Andrew Dunstan wrote:
>> On 7/14/20 1:00 PM, Andrew Dunstan wrote:
>>> On 7/5/20 1:29 PM, Justin Pryzby wrote:
 On Mon, Mar 23, 2020 at 08:28:52PM +0300, Nikita Glukhov wrote:
> Attached 47th version of the patches.
 The patch checker/cfbot says this doesn't apply ; could you send a 
 rebasified
 version ?

>>> To keep things moving, I've rebased these patches. However, 1) the docs
>>> patches use  in many cases where they
>>> should now just use  
> I haven't touched  yet, because I'm not sure
> if  or  is correct here at all.



Here's the relevant commit message that explains the policy:


commit 47046763c3
Author: Tom Lane 
Date:   Mon May 4 13:48:30 2020 -0400

    Doc: standardize markup a bit more.
   
    We had a mishmash of <replaceable>, <replaceable class="parameter">,
    and <parameter> markup for operator/function arguments.  Use <parameter>
    consistently for things that are in fact names of parameters (including
    OUT parameters), reserving <replaceable> for things that aren't.  The
    latter class includes some made-up-by-the-docs type class names, like
    "numeric_type", as well as placeholders for arguments that don't have
    well-defined types.  Possibly we could do better with those categories
    as well, but for the moment I'm content not to have parameter names
    marked up in different ways in different places.



>
>> Turns out these patches also need to get the message on the new way of
>> writing entries in func.sgml - I'll publish some updates on that in the
>> next day or so so that "make doc" will succeed.
> I can do it by myself, but I just need to understand what to fix and how.



Here's an example that shows how it was transformed for json_agg:


-     <row>
-      <entry>
-       <indexterm>
-        <primary>json_agg</primary>
-       </indexterm>
-       <function>json_agg(<replaceable class="parameter">expression</replaceable>)</function>
-      </entry>
-      <entry>
-       <type>any</type>
-      </entry>
-      <entry>
-       <type>json</type>
-      </entry>
-      <entry>No</entry>
-      <entry>aggregates values, including nulls, as a JSON array</entry>
-     </row>
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>json_agg</primary>
+        </indexterm>
+        <function>json_agg</function> ( <type>anyelement</type> )
+        <returnvalue>json</returnvalue>
+       </para>
+       <para role="func_signature">
+        <indexterm>
+         <primary>jsonb_agg</primary>
+        </indexterm>
+        <function>jsonb_agg</function> ( <type>anyelement</type> )
+        <returnvalue>jsonb</returnvalue>
+       </para>
+       <para>
+        Collects all the input values, including nulls, into a JSON array.
+        Values are converted to JSON as per <function>to_json</function>
+        or <function>to_jsonb</function>.
+       </para>
+       <para>No</para></entry>
+      </row>
 



>>> and b) patch 4 fails when run under force_parallel=regress.
> Fixed parallel-safety check for RETURNING clause of JSON_EXISTS().
>
> On 05.04.2020 19:50, Alexander Korotkov wrote:
> > 1) Uniqueness checks using JsonbUniqueCheckContext and
> > JsonUniqueCheckContext have quadratic complexity over number of keys.
> > That doesn't look good especially for jsonb, which anyway sorts object
> > keys before object serialization.
> > 2) We have two uniqueness checks for json type, which use
> > JsonbUniqueCheckContext and JsonUniqueState.  JsonUniqueState uses
> > stack of hashes, while JsonbUniqueCheckContext have just plain array
> > of keys.  I think we can make JsonUniqueState use single hash, where
> > object identifies would be part of hash key.  And we should replace
> > JsonbUniqueCheckContext with JsonUniqueState.  That would eliminate
> > extra entities and provide reasonable complexity for cases, which now
> > use JsonbUniqueCheckContext.
>
> Unique checks were refactored as Alexander proposed.
>
> > 3) New SQL/JSON clauses don't use timezone and considered as immutable
> > assuming all the children are immutable.  Immutability is good, but
> > ignoring timezone in all the cases is plain wrong.  The first thing we
> > can do is to use timezone and make SQL/JSON clauses stable.  But that
> > limits their usage in functional and partial indexes.  I see couple of
> > things we can do next (one of them or probably both).
> > 3.1) Provide the user a way to specify that we should ignore timezone in
> > particular case (IGNORE TIMEZONE clause or something like that).  Then
> > SQL/JSON clause will be considered as immutable.
> > 3.2) Automatically detect whether jsonpath might use timezone.  If
> > jsonpath doesn't use .datetime() method, it doesn't need timezone for
> > sure.  Also, from the datetime format specifiers we can get that we
> > don't compare non-timezoned values to timezoned values.  So, if we
> > detect this jsonpath never uses timezone, we can consider SQL/JSON
> > clause as immutable.
>
> Implemented second variant with automatic detection.
>
> I also tried to add explicit IGNORE TIMEZONE / IMMUTABLE clauses, but all of
> them lead to shift/reduce conflicts that seem not easy to resolve.




Not sure if this helps:


https://wiki.postgresql.org/wiki/Debugging_the_PostgreSQL_grammar_(Bison)



>
>
>
> Patch #5 implements functions for new JSON type that is expected to appear in
> the upcoming SQL/JSON standard:
>
>  - JSON() is for constructing JSON typed values from JSON text.
>It is almost equivalen

Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2020-07-15 Thread Amit Kapila
On Wed, Jul 15, 2020 at 4:51 PM Ajin Cherian  wrote:
>
> On Wed, Jul 15, 2020 at 2:05 PM Dilip Kumar  wrote:
>>
>>   Please see the
>> latest patch set v33.
>>
>>
>>
>
> I have a minor comment. You've defined a new function
> ReorderBufferStartStreaming(), but the function doesn't actually start
> streaming; it is used to find out whether you can start streaming, and it
> returns a boolean. Can't you name it accordingly?
> Probably ReorderBufferCanStartStreaming(). I understand that it internally
> calls ReorderBufferCanStream(), which sounds similar, but I think that
> should not matter.
>

+1.  I am actually editing some of the patches and I have already
named it as you are suggesting.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




file_fdw vs relative paths

2020-07-15 Thread Magnus Hagander
According to the documentation, the filename given in file_fdw must be an
absolute path. However, it works perfectly fine with a relative path.

So either the documentation is wrong, or the code is wrong. It behaves the
same at least back to 9.5; I did not try it further back than that.

I can't find a reference to the code that limits this. AFAICT the
documentation has been there since day 1.

Question is, which one is right. Is there a reason we'd want to restrict it
to absolute pathnames?
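
For illustration, here is a rough sketch (purely hypothetical -- no such check
exists in file_fdw today, which is presumably why relative paths just work) of
what enforcing the documented restriction could look like in the wrapper's
option validation:

```
/*
 * Hypothetical sketch only -- file_fdw has no such check today.
 * This is roughly what rejecting a relative "filename" option could
 * look like inside the FDW's options validator.
 */
#include "postgres.h"

#include "commands/defrem.h"	/* defGetString() */
#include "nodes/parsenodes.h"	/* DefElem */

static void
check_filename_is_absolute(DefElem *def)
{
	char	   *filename = defGetString(def);

	/* is_absolute_path() comes from port.h, pulled in via postgres.h */
	if (!is_absolute_path(filename))
		ereport(ERROR,
				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
				 errmsg("filename must be an absolute path")));
}
```

Whether we would actually want such a check is exactly the question above.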

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2020-07-15 Thread Ajin Cherian
On Wed, Jul 15, 2020 at 2:05 PM Dilip Kumar  wrote:

>   Please see the
> latest patch set v33.
>
> --
> Regards,
> Dilip Kumar
> EnterpriseDB: http://www.enterprisedb.com
>
>
>
I have a minor comment. You've defined a new
function ReorderBufferStartStreaming(), but the function doesn't actually
start streaming; it is used to find out whether you can start streaming, and
it returns a boolean. Can't you name it accordingly?
Probably ReorderBufferCanStartStreaming(). I understand that it internally
calls ReorderBufferCanStream(), which sounds similar, but I think that
should not matter.
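
Just to illustrate the naming, a rough outline is below; the second check is a
placeholder, since the real conditions are whatever the patch tests before it
actually starts streaming:

```
/*
 * Hypothetical outline only -- the point is that the name says
 * "can start", not "start".  The real conditions come from the patch.
 */
static bool
ReorderBufferCanStartStreaming(ReorderBuffer *rb)
{
	/* Streaming must be possible at all before it can be started. */
	if (!ReorderBufferCanStream(rb))
		return false;

	/*
	 * ... plus the patch's remaining checks (e.g. that the snapshot
	 * builder is past the point where changes must be skipped) ...
	 */
	return true;
}
```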

regards,
Ajin Cherian
Fujitsu Australia


Re: Support for NSS as a libpq TLS backend

2020-07-15 Thread Thomas Munro
On Fri, Jul 10, 2020 at 5:10 PM Thomas Munro  wrote:
> -#define LARGE_OFF_T (((off_t) 1 << 62) - 1 + ((off_t) 1 << 62))
> +#define LARGE_OFF_T ((((off_t) 1 << 31) << 31) - 1 + (((off_t) 1 << 31) << 31))
>
> I see the same when I use Debian's autoconf, but not FreeBSD's or
> MacPorts', despite all being version 2.69.  That seems to be due to
> non-upstreamed changes added by the Debian maintainers (I see the
> off_t thing mentioned in /usr/share/doc/autoconf/changelog.Debian.gz).

By the way, Dagfinn mentioned that these changes were in fact
upstreamed, and happened to be beta-released today[1], and are due out
in ~3 months as 2.70.  That'll be something for us to coordinate a bit
further down the road.

[1] https://lists.gnu.org/archive/html/autoconf/2020-07/msg6.html




gs_group_1 crashing on 13beta2/s390x

2020-07-15 Thread Christoph Berg
On the Debian s390x buildd, the 13beta2 build is crashing:

2020-07-15 01:19:59.149 UTC [859] LOG:  server process (PID 1415) was 
terminated by signal 11: Segmentation fault
2020-07-15 01:19:59.149 UTC [859] DETAIL:  Failed process was running: create 
table gs_group_1 as
select g100, g10, sum(g::numeric), count(*), max(g::text)
from gs_data_1 group by cube (g1000, g100,g10);

Full build log at 
https://buildd.debian.org/status/fetch.php?pkg=postgresql-13&arch=s390x&ver=13%7Ebeta2-1&stamp=1594776007&raw=0

The failure is reproducible there: 
https://buildd.debian.org/status/logs.php?pkg=postgresql-13&ver=13%7Ebeta2-1&arch=s390x

I tried a manual build on a s390x machine, but that one went through
fine, so I can't provide a backtrace at the moment.

Christoph




Re: POC and rebased patch for CSN based snapshots

2020-07-15 Thread Andrey V. Lepikhov

On 7/13/20 11:46 AM, movead...@highgo.ca wrote:

I am continuing to review your patch. Some code improvements are attached.

Questions:
* csnSnapshotActive is the only member of the CSNshapshotShared struct.
* The WriteAssignCSNXlogRec() routine. I didn't understand why you add 20
nanoseconds to the current CSN and write this into the WAL. To simplify our
communication, I rewrote this routine according to my understanding (see the
patch in the attachment).


In general, maybe we could add your CSN WAL-writing machinery + TAP tests
to the patch from the thread [1] and work on it together?


[1] 
https://www.postgresql.org/message-id/flat/07b2c899-4ed0-4c87-1327-23c750311248%40postgrespro.ru


--
regards,
Andrey Lepikhov
Postgres Professional
>From 9a1595507c83b5fde61a6a3cc30f6df9df410e76 Mon Sep 17 00:00:00 2001
From: Andrey Lepikhov 
Date: Wed, 15 Jul 2020 11:55:00 +0500
Subject: [PATCH] 1

---
 src/backend/access/transam/csn_log.c   | 35 --
 src/include/access/csn_log.h   |  8 +++---
 src/test/regress/expected/sysviews.out |  3 ++-
 3 files changed, 22 insertions(+), 24 deletions(-)

diff --git a/src/backend/access/transam/csn_log.c b/src/backend/access/transam/csn_log.c
index 319e89c805..53d3877851 100644
--- a/src/backend/access/transam/csn_log.c
+++ b/src/backend/access/transam/csn_log.c
@@ -150,8 +150,8 @@ CSNLogSetCSN(TransactionId xid, int nsubxids,
  */
 static void
 CSNLogSetPageStatus(TransactionId xid, int nsubxids,
-		   TransactionId *subxids,
-		   XidCSN csn, int pageno)
+	TransactionId *subxids,
+	XidCSN csn, int pageno)
 {
 	int			slotno;
 	int			i;
@@ -187,8 +187,8 @@ CSNLogSetCSNInSlot(TransactionId xid, XidCSN csn, int slotno)
 
 	Assert(LWLockHeldByMe(CSNLogControlLock));
 
-	ptr = (XidCSN *) (CsnlogCtl->shared->page_buffer[slotno] + entryno * sizeof(XLogRecPtr));
-
+	ptr = (XidCSN *) (CsnlogCtl->shared->page_buffer[slotno] +
+	entryno * sizeof(XidCSN));
 	*ptr = csn;
 }
 
@@ -205,17 +205,16 @@ CSNLogGetCSNByXid(TransactionId xid)
 	int			pageno = TransactionIdToPage(xid);
 	int			entryno = TransactionIdToPgIndex(xid);
 	int			slotno;
-	XidCSN *ptr;
-	XidCSN	xid_csn;
+	XidCSN		csn;
 
 	/* lock is acquired by SimpleLruReadPage_ReadOnly */
 	slotno = SimpleLruReadPage_ReadOnly(CsnlogCtl, pageno, xid);
-	ptr = (XidCSN *) (CsnlogCtl->shared->page_buffer[slotno] + entryno * sizeof(XLogRecPtr));
-	xid_csn = *ptr;
+	csn = *(XidCSN *) (CsnlogCtl->shared->page_buffer[slotno] +
+	entryno * sizeof(XidCSN));
 
 	LWLockRelease(CSNLogControlLock);
 
-	return xid_csn;
+	return csn;
 }
 
 /*
@@ -501,15 +500,15 @@ WriteAssignCSNXlogRec(XidCSN xidcsn)
 {
 	XidCSN log_csn = 0;
 
-	if(xidcsn > get_last_log_wal_csn())
-	{
-		log_csn = CSNAddByNanosec(xidcsn, 20);
-		set_last_log_wal_csn(log_csn);
-	}
-	else
-	{
+	if(xidcsn <= get_last_log_wal_csn())
+		/*
+		 * WAL-write related code. If concurrent backend already wrote into WAL
+		 * its CSN with bigger value it isn't needed to write this value.
+		 */
 		return;
-	}
+
+	log_csn = CSNAddByNanosec(xidcsn, 0);
+	set_last_log_wal_csn(log_csn);
 
 	XLogBeginInsert();
 	XLogRegisterData((char *) (&log_csn), sizeof(XidCSN));
@@ -571,7 +570,6 @@ csnlog_redo(XLogReaderState *record)
 		LWLockAcquire(CSNLogControlLock, LW_EXCLUSIVE);
 		set_last_max_csn(csn);
 		LWLockRelease(CSNLogControlLock);
-
 	}
 	else if (info == XLOG_CSN_SETXIDCSN)
 	{
@@ -589,7 +587,6 @@ csnlog_redo(XLogReaderState *record)
 		SimpleLruWritePage(CsnlogCtl, slotno);
 		LWLockRelease(CSNLogControlLock);
 		Assert(!CsnlogCtl->shared->page_dirty[slotno]);
-
 	}
 	else if (info == XLOG_CSN_TRUNCATE)
 	{
diff --git a/src/include/access/csn_log.h b/src/include/access/csn_log.h
index 5838028a30..c23a71446a 100644
--- a/src/include/access/csn_log.h
+++ b/src/include/access/csn_log.h
@@ -15,10 +15,10 @@
 #include "utils/snapshot.h"
 
 /* XLOG stuff */
-#define XLOG_CSN_ASSIGNMENT 0x00
-#define XLOG_CSN_SETXIDCSN   0x10
-#define XLOG_CSN_ZEROPAGE   0x20
-#define XLOG_CSN_TRUNCATE   0x30
+#define XLOG_CSN_ASSIGNMENT	0x00
+#define XLOG_CSN_SETXIDCSN	0x10
+#define XLOG_CSN_ZEROPAGE	0x20
+#define XLOG_CSN_TRUNCATE	0x30
 
 typedef struct xl_xidcsn_set
 {
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 06c4c3e476..cc169a1999 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -73,6 +73,7 @@ select name, setting from pg_settings where name like 'enable%';
   name  | setting 
 +-
  enable_bitmapscan  | on
+ enable_csn_snapshot| off
  enable_gathermerge | on
  enable_hashagg | on
  enable_hashjoin| on
@@ -90,7 +91,7 @@ select name, setting from pg_settings where name like 'enable%';
  enable_seqscan | on
  enable_sort| on
  enable_tidscan | on

Re: Transactions involving multiple postgres foreign servers, take 2

2020-07-15 Thread Fujii Masao




On 2020/07/15 15:06, Masahiko Sawada wrote:

On Tue, 14 Jul 2020 at 09:08, Masahiro Ikeda  wrote:



I've attached the latest version of the patches. I've incorporated the review
comments I got so far and improved the locking strategy.


Thanks for updating the patch!
I have three questions about the v23 patches.


1. messages related to user canceling

In my understanding, there are two messages
which can be output when a user cancels the COMMIT command.

A. When prepare fails, the output shows that
 it committed locally but some error occurred.

```
postgres=*# COMMIT;
^CCancel request sent
WARNING:  canceling wait for resolving foreign transaction due to user
request
DETAIL:  The transaction has already committed locally, but might not
have been committed on the foreign server.
ERROR:  server closed the connection unexpectedly
  This probably means the server terminated abnormally
  before or while processing the request.
CONTEXT:  remote SQL command: PREPARE TRANSACTION
'fx_1020791818_519_16399_10'
```

B. When prepare succeeds,
 the output shows that it committed locally.

```
postgres=*# COMMIT;
^CCancel request sent
WARNING:  canceling wait for resolving foreign transaction due to user
request
DETAIL:  The transaction has already committed locally, but might not
have been committed on the foreign server.
COMMIT
```

In case A, I think the "committed locally" message can confuse the user.
Although the message says committed, the transaction is actually
"ABORTED".

I think the "committed" message means that the "ABORT" was committed
locally, but isn't there a risk of misunderstanding?


No, you're right. I'll fix it in the next version patch.

I think synchronous replication also has the same problem. It says
"the transaction has already committed" but it's not true when
executing ROLLBACK PREPARED.


Yes. Also the same message is logged when executing PREPARE TRANSACTION.
Maybe it should be changed to "the transaction has already prepared".
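
To make the idea concrete, a hedged sketch is below; the "preparing" flag and
the message wording are hypothetical, not the actual code or strings of
syncrep.c or the fdwxact patch:

```
/*
 * Hypothetical sketch only.  The idea is just that the DETAIL line
 * should reflect whether a PREPARE or a COMMIT is what became durable
 * locally; "preparing" stands in for however the caller knows that.
 */
static void
warn_cancel_of_wait(bool preparing)
{
	ereport(WARNING,
			(errmsg("canceling wait for synchronous replication due to user request"),
			 preparing
			 ? errdetail("The transaction has already been prepared locally, but might not have been replicated to the standby.")
			 : errdetail("The transaction has already committed locally, but might not have been replicated to the standby.")));
}
```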

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION