Re: Fix for visibility check on 14.5 fails on tpcc with high concurrency

2023-04-26 Thread Dimos Stamatakis
Hi hackers,

I was wondering whether there are any updates on the bug in visibility check 
introduced in version 14.5.

Many thanks,
Dimos
[ServiceNow]


Re: pg_upgrade from PG-14.5 to PG-15.1 failing due to non-existing function

2023-01-26 Thread Dimos Stamatakis
## Dimos Stamatakis (dimos.stamata...@servicenow.com):

> In our scenario we changed the permissions of this function in PG14.5
> (via an automated tool) and then pg_upgrade tries to change the
> permissions in PG15.1 as well.

Given that this function wasn't even documented and did nothing but
throw an error "function close_lb not implemented" - couldn't you
revert that permissions change for the upgrade? (if it comes to the
worst, a superuser could UPDATE pg_catalog.pg_proc and set proacl
to NULL for that function, but that's not how you manage ACLs in
production, it's for emergency fixing only).

Thanks Christoph! Actually, I already tried reverting the permissions but 
pg_upgrade attempts to replicate the revert SQL statement as well 😊
It would be nice to make pg_upgrade ignore some statements while upgrading.
As David mentions, we can alter the patch to ignore dropped functions.

Thanks,
Dimos
(ServiceNow)


pg_upgrade from PG-14.5 to PG-15.1 failing due to non-existing function

2023-01-25 Thread Dimos Stamatakis
Hi hackers,

I attempted to perform an upgrade from PG-14.5 to PG-15.1 with pg_upgrade and 
unfortunately it errors out because of a function that does not exist anymore 
in PG-15.1.
The function is ‘pg_catalog.close_lb’ and it exists in 14.5 but not in 15.1.
In our scenario we changed the permissions of this function in PG14.5 (via an 
automated tool) and then pg_upgrade tries to change the permissions in PG15.1 
as well.


Steps to reproduce:


  1.  Run initdb for 14.5
  2.  Run initdb for 15.1
  3.  Run psql client on 14.5
 *   postgres=# REVOKE ALL ON FUNCTION close_lb(line, box) FROM $USER;
  4.  Run pg_upgrade from 14.5 to 15.1

This will error out because pg_upgrade will attempt to REVOKE the persmissions 
on close_lb on 15.1.
Is there a way to specify which functions/objects to exclude in pg_upgrade?
Thanks in advance!

Dimos
(ServiceNow)


Re: Fix for visibility check on 14.5 fails on tpcc with high concurrency

2022-11-29 Thread Dimos Stamatakis
[External Email]

On 2022-Nov-25, Dimos Stamatakis wrote:

> So does this mean there is no race condition in this case and that
> this error is redundant?

No, it means I believe a bug exists but that I haven't spent enough time
on it to understand what it is.



Great! Please keep me posted and let me know if you need any more evidence to 
debug. 😊

Thanks,
Dimos


Re: Fix for visibility check on 14.5 fails on tpcc with high concurrency

2022-11-25 Thread Dimos Stamatakis
So does this mean there is no race condition in this case and that this error 
is redundant?

Thanks,
Dimos
[ServiceNow]


From: Alvaro Herrera 
Date: Thursday, 24. November 2022 at 19:24
To: Dimos Stamatakis 
Cc: Peter Geoghegan , simon.ri...@enterprisedb.com 
, pgsql-hackers@lists.postgresql.org 

Subject: Re: Fix for visibility check on 14.5 fails on tpcc with high 
concurrency
[External Email]

On 2022-Nov-24, Alvaro Herrera wrote:

> On 2022-Nov-24, Dimos Stamatakis wrote:
>
> > rmgr: MultiXact len (rec/tot): 54/ 54, tx: 248477, lsn: 0/66DB82A8, prev 
> > 0/66DB8260, desc: CREATE_ID 133 offset 265 nmembers 2: 248477 (nokeyupd) 
> > 248500 (keysh)
> > rmgr: Heap len (rec/tot): 70/ 70, tx: 248477, lsn: 0/66DB82E0, prev 
> > 0/66DB82A8, desc: HOT_UPDATE off 20 xmax 133 flags 0x20 IS_MULTI EXCL_LOCK 
> > ; new off 59 xmax 132, blkref #0: rel 1663/16384/16385 blk 422
> > rmgr: Transaction len (rec/tot): 34/ 34, tx: 248477, lsn: 0/66DBA710, prev 
> > 0/66DBA6D0, desc: ABORT 2022-11-23 20:33:03.712298 UTC
> > …
> > rmgr: Transaction len (rec/tot): 34/ 34, tx: 248645, lsn: 0/66DBB060, prev 
> > 0/66DBB020, desc: ABORT 2022-11-23 20:33:03.712388 UTC
>
> Ah, it seems clear enough: the transaction that aborted after having
> updated the tuple, is still considered live when doing the second
> update. That sounds wrong.

Hmm, if a transaction is aborted but its Xid is after latestCompletedXid,
it would be reported as still running. AFAICS that is only modified
with ProcArrayLock held in exclusive mode, and only read with it held in
share mode, so this should be safe.

I can see nothing else ATM.

--
Álvaro Herrera Breisgau, Deutschland — 
https://www.EnterpriseDB.com/<https://www.EnterpriseDB.com>
"I must say, I am absolutely impressed with what pgsql's implementation of
VALUES allows me to do. It's kind of ridiculous how much "work" goes away in
my code. Too bad I can't do this at work (Oracle 8/9)." (Tom Allison)
http://archives.postgresql.org/pgsql-general/2007-06/msg00016.php<http://archives.postgresql.org/pgsql-general/2007-06/msg00016.php>


Fix for visibility check on 14.5 fails on tpcc with high concurrency

2022-11-22 Thread Dimos Stamatakis
Hi hackers,

When running tpcc on sysbench with high concurrency (96 threads, scale factor 
5) we realized that a fix for visibility check (introduced in PG-14.5) causes 
sysbench to fail in 1 out of 70 runs.
The error is the following:

SQL error, errno = 0, state = 'XX000': new multixact has more than one updating 
member

And it is caused by the following statement:

UPDATE warehouse1
  SET w_ytd = w_ytd + 234
  WHERE w_id = 3;

The commit that fixes the visibility check is the following:
https://github.com/postgres/postgres/commit/e24615a0057a9932904317576cf5c4d42349b363

We reverted this commit and tpcc does not fail anymore, proving that this 
change is problematic.
Steps to reproduce:
1. Install sysbench
  https://github.com/akopytov/sysbench
2. Install percona sysbench TPCC
  https://github.com/Percona-Lab/sysbench-tpcc
3. Run percona sysbench -- prepare
  # sysbench-tpcc/tpcc.lua --pgsql-host=localhost --pgsql-port=5432 
--pgsql-user={USER} --pgsql-password={PASSWORD} --pgsql-db=test_database 
--db-driver=pgsql --tables=1 --threads=96 --scale=5 --time=60 prepare
4. Run percona sysbench -- run
  # sysbench-tpcc/tpcc.lua --pgsql-host=localhost --pgsql-port=5432 
--pgsql-user={USER} --pgsql-password={PASSWORD} --pgsql-db=test_database 
--db-driver=pgsql --tables=1 --report-interval=1 --rand-seed=1 --threads=96 
--scale=5 --time=60 run

We tested on a machine with 2 NUMA nodes, 16 physical cores per node, and 2 
threads per core, resulting in 64 threads total. The total memory is 376GB.
Attached please find the configuration file we used (postgresql.conf).

This commit was supposed to fix a race condition during the visibility check. 
Please let us know whether you are aware of this issue and if there is a quick 
fix.
Any input is highly appreciated.

Thanks,
Dimos
[ServiceNow]


postgresql.conf
Description: postgresql.conf