building a singularity image from docker hub postgres image

2024-01-29 Thread Allan Kamau
I am trying to build a singularity image from the postgres docker image.
I am issuing the command below.

$ singularity build /local/data/some/postgres.16.1.sif
docker://postgres/postgres:16.1

INFO:    Starting build...
INFO:    Fetching OCI image...
FATAL:   While performing build: conveyor failed to get: GET
https://index.docker.io/v2/postgres/postgres/manifests/16.1: UNAUTHORIZED:
authentication required; [map[Action:pull Class: Name:postgres/postgres
Type:repository]]


What is the url I should use?

-Allan.


Good overview of programming IN Postgres?

2024-01-29 Thread Guyren Howe
Is there a good overview of programming WITHIN Postgres? Not writing code in 
client languages that communicates WITH Postgres, but how to take full 
advantage of Postgres’ unique programming features — the advanced type system, 
the various PL languages, triggers, stored procedures, functions, …

I’m thinking both of how all those things work, and also of how to think about 
effective architectures given that design palette.

The Postgres documentation is great and is the ultimate reference, but is there 
a good overview for someone who wants to take best advantage of all these 
features?

I’m thinking that if there isn’t such a thing, I might write it. But I don’t 
want to duplicate effort.


Re: Best practices for data buffer cache setting/tuning (v15)

2024-01-29 Thread Bruce Momjian
On Mon, Jan 29, 2024 at 01:46:30PM -0800, Christophe Pettus wrote:
>
>
> > On Jan 29, 2024, at 11:39, David Gauthier 
> > wrote:
> >
> > Is there a document which makes recommendations on sizing data
> > buffer cache, tuning options which evict old/unused data in mem, and
> > cache fragmentation avoidance for a v15.3 DB ?
>
> On any modern system, set shared_buffers to 25% of instance RAM, and
> get on with life.
>
> I realize that seems glib, but small variations in shared_buffers make
> almost no significant difference in performance.  There are unusual
> workloads (like very high insert rate with few queries) that require
> different settings, but for most typical OLTP or data warehouse
> workloads, that is sufficient.

This blog entry might explain why Postgres requires less tuning of
shared buffers than other database systems:

https://momjian.us/main/blogs/pgblog/2018.html#December_7_2018

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: Monitoring logical replication

2024-01-29 Thread Klaus Darilion

Hi Saheed!

I monitor our replication this way:

1. Every 10 seconds I fetch the current LSN and write it into a table, 
together with the current timestamp. I also fetch the confirmed LSNs from 
the replication slots and delete old entries from the lsn2data table.


calculate_logical_replication_lag.php:

if (!$dbconn = pg_pconnect('host='.$config_int['dbhost'].' dbname='.$config_int['dbname'].' user='.$dbuser.' password='.$dbpass)) {
    print "Sorry, database connection failed";
    exit;
}

$accuracy = 10; // in seconds

//
// Preparations:
//
// CREATE TABLE lsn2data(
//    lsn pg_lsn PRIMARY KEY,
//    seen timestamp NOT NULL DEFAULT NOW()
// );
// CREATE ROLE replication_lag_user WITH LOGIN PASSWORD 'XXX';
// GRANT ALL ON TABLE lsn2data TO replication_lag_user;
//
// CREATE OR REPLACE FUNCTION get_replication_lag() RETURNS TABLE (subscriber name, lag bigint) AS
// $BODY$
// DECLARE
//     subscriber name;
// BEGIN
//     FOR subscriber IN
//         SELECT slot_name FROM pg_replication_slots
//     LOOP
//         RETURN QUERY SELECT slot_name, EXTRACT(EPOCH FROM NOW()-seen)::bigint lag
//             FROM lsn2data, pg_replication_slots
//             WHERE slot_name=subscriber AND lsn < confirmed_flush_lsn
//             ORDER BY lsn DESC LIMIT 1;
//     END LOOP;
//     RETURN;
// END
// $BODY$
// LANGUAGE plpgsql;
//
while (1) {
    $dbq = pg_query("INSERT INTO lsn2data (lsn) VALUES (pg_current_wal_lsn())");
    if ($dbq === FALSE) {
        mylog(LOG_ERROR, "SQL query error: ".pg_last_error()."\n");
        exit(1);
    }

    $dbq = pg_query("DELETE FROM lsn2data WHERE lsn < (".
               "SELECT lsn FROM lsn2data WHERE lsn < (".
                   "SELECT confirmed_flush_lsn FROM pg_replication_slots ORDER BY confirmed_flush_lsn ASC LIMIT 1".
               ") ORDER BY lsn DESC LIMIT 1".
           ")"
    );
    if ($dbq === FALSE) {
        mylog(LOG_ERROR, "SQL query error: ".pg_last_error()."\n");
        exit(1);
    }
    sleep($accuracy);
}

2. I graph the replication lags (converted from LSN to seconds) in my 
check_mk monitoring:


#!/bin/bash

#
# Managed by Puppet: modules/base/files/monitoring/check_logical_replication_lag.sh
#
# Check the logical replication lag and export performance data for each subscriber
#

# exit on error
#set -e

# Make sure this script only runs one at a time
(

  ME=$0
  MEBASE=`basename $0`

  mylog () {
    echo "$MEBASE: $1"
    logger -t "$MEBASE" "$1"
  }

  flock -x -w 1 200
  if [ $? != "0" ]; then
    #echo "ERROR: $0 is already running ... exit"
    logger -t "$MEBASE" "ERROR: $0 is already running ... exit"
    exit 1
  fi

  # Do stuff

  # Variables for monitoring
  CMK_SPOOLDIR=/var/lib/check_mk_agent/spool
  CMK_NAME=$MEBASE
  CMK_SPOOLFILE=600_`basename ${CMK_NAME}`.txt
  CMK_HEADER="<<>>"
  TMP_FILE="/tmp/logical_replication_lag.csv"

  # Thresholds
  warn=300
  crit=600

  final_output="$CMK_HEADER\nP $CMK_NAME "

  # move to a directory where user postgres may reside (sudo)
  cd /tmp

  # Read the lag. During the initial sync of a subscriber there are temporary
  # subscriptions named reg_xxx1-pid-sync-pid. The LIKE clause below keeps
  # them from being tracked.
  rm -f "$TMP_FILE"
  sudo -u postgres psql regdns -c "COPY (SELECT subscriber,lag FROM get_replication_lag() WHERE subscriber LIKE '%\_' ORDER BY 2 DESC) TO '$TMP_FILE' With CSV" 2>&1> /dev/null
  LC=$(sudo -u postgres psql -t regdns -c "SELECT count(*) FROM get_replication_lag();" | tr -d ' ')

  if [ $LC == "0" ]; then
    echo -e "$CMK_HEADER\n0 $CMK_NAME - No Slaves with Replication found - maybe we are a slave?" > $CMK_SPOOLDIR/$CMK_SPOOLFILE
    exit 0;
  fi

  grep $(hostname | cut -d '-' -f2) "$TMP_FILE" > /dev/null
  if [ $? != "0" ]; then
    echo -e "$CMK_HEADER\n2 $CMK_NAME - Postgres Output does not seem valid. Please check script $ME and output in $TMP_FILE" > $CMK_SPOOLDIR/$CMK_SPOOLFILE
    exit 1;
  fi

  # Read the CSV into an array
  IFS=$'\n' read -d '' -r -a input_file < "$TMP_FILE"

  # Evaluate
  maxlag=0
  for i in "${input_file[@]}"; do
    node=`echo $i | awk -F  "," '{print $1}' | tr -- _ -`
    lag=`echo $i | awk -F  "," '{print $2}'`
    final_output="$final_output$node=$lag;$warn;$crit|"
    # https://unix.stackexchange.com/questions/186663/is-there-a-unix-command-that-gives-the-minimum-maximum-of-two-numbers
    maxlag=$(( maxlag > lag ? maxlag : lag ))
  done
  final_output="${final_output}max-lag=$maxlag;$warn;$crit"

  # Cut off the trailing pipe character
  #final_output=`echo $final_output | rev | cut -c 2- | rev`

  # Write the spool file
  echo -e $final_output > $CMK_SPOOLDIR/$CMK_SPOOLFILE
  logger -t "$MEBASE" "$final_output"

) 200>/tmp/`basename $0`.exclusivelock


3. During the initial sync I check the status on the subscriber. Once it has 
synced all tables of the publication, it will send me an email.
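
The subscriber-side part is not shown in full here, but a minimal sketch of 
such a check, using the standard pg_subscription_rel catalog (names and the 
single-subscription assumption are mine), could look like this:

SELECT s.subname,
       sr.srrelid::regclass AS table_name,
       sr.srsubstate
FROM pg_subscription_rel sr
JOIN pg_subscription s ON s.oid = sr.srsubid
WHERE sr.srsubstate <> 'r';   -- 'r' = ready; anything else = still being synced

-- An empty result means every table of the subscription has finished its
-- initial sync, which is the point where the notification mail would be sent.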

#
# Managed by 

Re: Scriptable way to validate a pg_dump restore ?

2024-01-29 Thread Adrian Klaver

On 1/29/24 11:35, Shaheed Haque wrote:



On Tue, 30 Jan 2024, 00:27 Adrian Klaver wrote:


On 1/29/24 10:12, Shaheed Haque wrote:


 >
 > Yes. But I was under the impression that the initial copy of logical
 > replication was the same?
 >

Are you talking about the copy_data option to WITH?

If so yes and no.

Yes as it uses COPY to transfer the data.


Yes, this is what I meant.

No as what COPY transfers can be affected by WHERE clauses on the
publisher. Also if you have cascading publishers/subscriptions the
'original' data may be upstream of the publisher you are comparing to.


Good points, understood. For the next bit, let's assume neither of these 
are in play.


Finally logical replication is generally not static so there is the
issue of determining a point in time for the check.


Indeed. I currently have a static source db but would eventually like to 
eliminate the implied downtime. What I'd like to provide my user is some 


Implied downtime of what?

indication of progress initially during the copy_data phase, and for the 
future, of the anticipated incremental convergence.


And, as per my other note, I would ideally like to be able to do this 
using only a connection to one db.


I was assuming that logical replication needed "something" similar 
internally, and was hoping the LSNs were that "something".


I'm going to say up front I am no expert on the internals of logical 
replication. Will point you at:


https://www.postgresql.org/docs/current/protocol-message-formats.html

A quick look at that indicates to me it is more involved than you think.



Thanks, Shaheed


-- 
Adrian Klaver

adrian.kla...@aklaver.com 



--
Adrian Klaver
adrian.kla...@aklaver.com





Re: What should I expect when creating many logical replication slots?

2024-01-29 Thread Klaus Darilion

Am 2024-01-16 19:51, schrieb Jim Nasby:

On 1/11/24 6:17 PM, Antonin Bas wrote:

Hi all,

I have a use case for which I am considering using Postgres Logical 
Replication, but I would like to scale up to 100 or even 200 
replication slots.


I have increased max_wal_senders and max_replication_slots to 100 
(also making sure that max_connections is large enough). Things seem 
to be working pretty well so far based on some PoC code I have 
written. Postgres is creating a walsender process for each replication 
slot, as expected, and the memory footprint of each one is around 4MB.


So I am quite happy with the way things are working, but I am a bit 
uneasy about increasing these configuration values by 10-20x compared 
to their defaults (both max_wal_senders and max_replication_slots 
default to 10).


Is there anything I should be looking out for specifically? Is it 
considered an anti-pattern to use that many replication slots and 
walsender processes? And, when my database comes under heavy write 
load, will walsender processes start consuming a large amount of CPU / 
memory (I recognize that this is a vague question, I am still working 
on some empirical testing).


The biggest issue with logical decoding (what drives logical 
replication) is that every subscriber has to completely decode 
everything for its publication, which can be extremely memory 
intensive under certain circumstances (long-running transactions being 
one potential trigger). Decoders also have to read through all WAL 
traffic, regardless of what their publication is set to - everything 
runs off the single WAL stream.


Indeed, big/long-running transactions can be an issue. Today I added a 
SERIAL column to a table with 55 million rows. Although that table is 
not in the publication, the huge transaction got written to the WAL and 
spilled to disk by the WAL senders. With 50 WAL senders, we basically 
created a DoS amplification attack against our own disk. Luckily we could 
increase the disk size.*
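
For what it's worth, a rough, untested sketch of how that spilling could be 
watched and bounded (pg_stat_replication_slots exists from v14 on; the value 
below is only an example, not a recommendation):

-- How much each slot has spilled to disk so far:
SELECT slot_name, spill_txns, spill_count,
       pg_size_pretty(spill_bytes) AS spilled
FROM pg_stat_replication_slots;

-- Give decoding more memory before it spills (default is 64MB):
ALTER SYSTEM SET logical_decoding_work_mem = '512MB';
SELECT pg_reload_conf();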


Besides that, our DB has (in my view) plenty of UPDATEs/INSERTs 
and has not had replication performance problems with the current 50 
replication slots. But if that were to become a bottleneck, we would use 
cascading replication, i.e. the master replicates logically to 2 
"distribution" servers, which in turn replicate to the 50+ subscribers. 
That way, the in-between "distribution" servers would also filter out WAL 
decoding for changes that are not part of the publication.
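
A rough sketch of that cascading idea (the names here are made up; the 
"distribution" server subscribes to the master and republishes the same 
tables for the downstream subscribers):

-- On the "distribution" server:
CREATE SUBSCRIPTION dist_from_master
    CONNECTION 'host=master dbname=regdns user=repl password=secret'
    PUBLICATION master_pub;

CREATE PUBLICATION dist_pub FOR ALL TABLES;

-- The 50+ subscribers then subscribe to dist_pub on the distribution
-- server instead of connecting to the master directly.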


regards
Klaus

* Does somebody know why the WAL sender has to track transactions that 
make changes to tables which are not in the publication?





Re: Best practices for data buffer cache setting/tuning (v15)

2024-01-29 Thread Christophe Pettus



> On Jan 29, 2024, at 11:39, David Gauthier  wrote:
> 
> Is there a document which makes recommendations on sizing data buffer cache, 
> tuning options which evict old/unused data in mem, and cache fragmentation 
> avoidance for a v15.3 DB ?

On any modern system, set shared_buffers to 25% of instance RAM, and get on 
with life.

I realize that seems glib, but small variations in shared_buffers make almost 
no significant difference in performance.  There are unusual workloads (like 
very high insert rate with few queries) that require different settings, but 
for most typical OLTP or data warehouse workloads, that is sufficient.



Re: Seeking help extricating data from Amazon RDS Aurora/Postgres

2024-01-29 Thread Alan Hodgson
On Mon, 2024-01-29 at 14:22 -0500, Bill Mitchell wrote:
> We are attempting to extract one of our databases from Amazon RDS
> Aurora/Postgres to another PostgreSQL cluster that is running
> directly on EC2 instances. 

Aurora PostgreSQL supports logical replication and purports to use
the native WAL, so I suppose it might be compatible with an open
source PostgreSQL target. Probably worth testing.
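
A quick, hedged way to probe that from psql on the Aurora side (this assumes 
logical replication has already been enabled via the cluster parameter group, 
e.g. rds.logical_replication = 1; the slot name is arbitrary):

SHOW wal_level;   -- should report 'logical' once the parameter group change is active

-- Create and then drop a throwaway slot using the built-in pgoutput plugin:
SELECT pg_create_logical_replication_slot('compat_probe', 'pgoutput');
SELECT slot_name, plugin, restart_lsn FROM pg_replication_slots;
SELECT pg_drop_replication_slot('compat_probe');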


Re: Best practices for data buffer cache setting/tuning (v15)

2024-01-29 Thread Laurenz Albe
On Mon, 2024-01-29 at 14:39 -0500, David Gauthier wrote:
> Is there a document which makes recommendations on sizing data buffer cache,
> tuning options which evict old/unused data in mem, and cache fragmentation
> avoidance for a v15.3 DB ?

Start with "shared_buffers" sized as a quarter of the available RAM and
define enough huge pages on the Linux kernel to fit shared buffers.

There are no tuning options for evicting buffers, and there cannot be any
fragmentation.
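
A sketch of those two settings for a hypothetical 64 GB machine (the numbers
are examples, not recommendations for any particular workload):

ALTER SYSTEM SET shared_buffers = '16GB';   -- a quarter of 64 GB; needs a restart
ALTER SYSTEM SET huge_pages = 'try';        -- or 'on' once the kernel pages are reserved

-- On v15 the number of kernel huge pages to reserve (vm.nr_hugepages) can be
-- read, with the server shut down, via:
--   postgres -C shared_memory_size_in_huge_pages -D $PGDATA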

Yours,
Laurenz Albe




Best practices for data buffer cache setting/tuning (v15)

2024-01-29 Thread David Gauthier
Is there a document which makes recommendations on sizing data buffer
cache, tuning options which evict old/unused data in mem, and cache
fragmentation avoidance for a v15.3 DB ?

Thanks in Advance.


Re: Scriptable way to validate a pg_dump restore ?

2024-01-29 Thread Shaheed Haque
On Tue, 30 Jan 2024, 00:27 Adrian Klaver,  wrote:

> On 1/29/24 10:12, Shaheed Haque wrote:
>
>
> >
> > Yes. But I was under the impression that the initial copy of logical
> > replication was the same?
> >
>
> Are you talking about the copy_data option to WITH?
>
> If so yes and no.
>
> Yes as it uses COPY to transfer the data.
>

Yes, this is what I meant.

> No as what COPY transfers can be affected by WHERE clauses on the
> publisher. Also if you have cascading publishers/subscriptions the
> 'original' data may be upstream of the publisher you are comparing to.


Good points, understood. For the next bit, let's assume neither of these
are in play.

> Finally logical replication is generally not static so there is the
> issue of determining a point in time for the check.
>

Indeed. I currently have a static source db but would eventually like to
eliminate the implied downtime. What I'd like to provide my user is some
indication of progress initially during the copy_data phase, and for the
future, of the anticipated incremental convergence.

And, as per my other note, I would ideally like to be able to do this using
only a connection to one db.

I was assuming that logical replication needed "something" similar
internally, and was hoping the LSNs were that "something".

Thanks, Shaheed


> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>
>


Re: Seeking help extricating data from Amazon RDS Aurora/Postgres

2024-01-29 Thread Christophe Pettus



> On Jan 29, 2024, at 11:22, Bill Mitchell  wrote:
> 
> Wondering if any of the other members of this LISTSERV have tried migrating 
> their data off of Amazon RDS Aurora Postgres with success.

Any logical-replication based solution (DMS, fivetran, in-core logical 
replication) will handle the problem.  Although the overall migration time is 
high, the source database is in production for the entire time, so the actual 
downtime is small.

DMS is not a great solution for going PostgreSQL-to-PostgreSQL, as it has 
limited data type support.  In-core logical replication is probably the best 
solution.

One thing to be aware of is when replication starts, it first needs to copy 
over the existing data in the tables *and* capture changes that occur during 
that copy.  Because of that, it will retain WAL created during that copy.  That 
can be quite a bit of disk space, so planning for that is important.
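
A minimal sketch of that in-core approach for this kind of migration 
(hypothetical names; the target schema has to exist already, e.g. restored 
with pg_dump --schema-only, since logical replication does not copy DDL):

-- On the Aurora source:
CREATE PUBLICATION migration_pub FOR ALL TABLES;

-- On the EC2 target:
CREATE SUBSCRIPTION migration_sub
    CONNECTION 'host=aurora-endpoint dbname=appdb user=repl password=secret'
    PUBLICATION migration_pub;

-- While the initial copy runs, watch how much WAL the source is retaining:
SELECT slot_name,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
FROM pg_replication_slots;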



Seeking help extricating data from Amazon RDS Aurora/Postgres

2024-01-29 Thread Bill Mitchell
We are attempting to extract one of our databases from Amazon RDS 
Aurora/Postgres to another PostgreSQL cluster that is running directly 
on EC2 instances.


This particular database is currently about 3.5TB.

We are sensitive to this database being down for an extended period of 
time as it will adversely impact some subsystems within our 
application.  We're hoping to find a means to migrate within a 4 hour 
window.


We've tried multiple approaches:

 * Using Amazon DMS to copy the data
   o despite doing parallelization, the total time would be around 89 hours
 * Using FiveTran (great tool!) to copy data
   o using this tool, it would take about 86 *hours* to complete
 * Using a database dump and restore
   o doing a backup from RDS's replica fails, due to holding a database
     lock for too long

The ideal situation would be to have a streaming replica from Amazon 
RDS, but you can only create a replica that is hosted by AWS, and we have 
no access to the WAL logs...


Wondering if any of the other members of this LISTSERV have tried 
migrating their data off of Amazon RDS Aurora Postgres with success.


Thanks in advance for any pointers,

Bill



Re: Scriptable way to validate a pg_dump restore ?

2024-01-29 Thread Laura Smith


On Monday, 29 January 2024 at 09:06, Ron Johnson  
wrote:

> 
> That's kinda like being asked to prove that rocks always fall when you drop 
> them. Either you trust physics, because physics has always worked, or you 
> must watch every rock, because next time it might not fall. The analogy is 
> slightly flawed, since we always check the pg_dump and pg_restore return 
> codes, since something else might impact their function.
> 
> But if you still need evidence, here's what I'm doing to verify table and 
> record counts during a 9.6 -> 14 migration. You'll have to modify it for your 
> purpose.



Thanks Ron !

I must admit that I am willing to trust pg_dump / pg_restore, mostly for the 
reasons Adrian Klaver implied.

However your script is likely the very thing I was looking for in terms of belt 
& braces.  So I appreciate you publishing it as a source of inspiration !




Re: Scriptable way to validate a pg_dump restore ?

2024-01-29 Thread Adrian Klaver

On 1/29/24 10:12, Shaheed Haque wrote:




Yes. But I was under the impression that the initial copy of logical 
replication was the same?




Are you talking about the copy_data option to WITH?

If so yes and no.

Yes as it uses COPY to transfer the data.

No as what COPY transfers can be affected by WHERE clauses on the 
publisher. Also if you have cascading publishers/subscriptions the 
'original' data may be upstream of the publisher you are comparing to. 
Finally logical replication is generally not static so there is the 
issue of determining a point in time for the check.


--
Adrian Klaver
adrian.kla...@aklaver.com





Re: Query performance in 9.6.24 vs 14.10

2024-01-29 Thread Bob Jolliffe
Thanks for the update.

On Mon, Jan 29, 2024, 16:53 Ron Johnson  wrote:

> According to my tests, sometimes JIT is a little faster, and sometimes
> it's a little slower. Mostly within the realm of statistical noise
> (especially with each query having a sample size of only 13, on a VM that
> lives on a probably-busy host).
>
> On Mon, Jan 29, 2024 at 9:18 AM Ron Johnson 
> wrote:
>
>> Yes, jit=on.
>>
>> I'll test them with jit=off, to see the difference.  (The application is
>> 3rd party, so will change it at the system level.)
>>
>> On Mon, Jan 29, 2024 at 7:09 AM Bob Jolliffe 
>> wrote:
>>
>>> Out of curiosity, is the pg14 running with the default jit=on setting?
>>>
>>> This is obviously entirely due to the nature of the particular queries
>>> themselves, but we found that, for our workloads, pg versions
>>> greater than 11 were exacting a huge cost due to the jit compiler.  Once we
>>> explicitly turned jit=off we started to see improvements.
>>>
>>


Re: Scriptable way to validate a pg_dump restore ?

2024-01-29 Thread Shaheed Haque
On Mon, 29 Jan 2024, 23:57 Adrian Klaver,  wrote:

> On 1/29/24 09:28, Shaheed Haque wrote:
>
>
> >
> > Right, for me, state, not just record count is what I'm interested in
> > (for the initial full table copy part of replication). So, given the
> > explanation about the possible per-table window, is there some property
> > of the table that could be used to confirm that a table has made it
> across?
> >
> > I guess there is such a thing since the following incremental syncing
> > would presumably need it. I had hoped the LSN was this thing, but
> > confirmation would be great.
>
> The OP was referring to the pg_dump/pg_restore cycle, you seem to be
> referring to logical replication. Is that correct?
>

Yes. But I was under the impression that the initial copy of logical
replication was the same?


> >
> > Thanks, Shaheed
> >
> >
> >  >
> >  > Thanks !
> >  >
> >  > Laura
> >  >
> >  >
> >
> > --
> > Adrian Klaver
> > adrian.kla...@aklaver.com 
> >
> >
> >
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>
>


Re: Scriptable way to validate a pg_dump restore ?

2024-01-29 Thread Adrian Klaver

On 1/29/24 09:28, Shaheed Haque wrote:




Right, for me, state, not just record count is what I'm interested in 
(for the initial full table copy part of replication). So, given the 
explanation about the possible per-table window, is there some property 
of the table that could be used to confirm that a table has made it across?


I guess there is such a thing since the following incremental syncing 
would presumably need it. I had hoped the LSN was this thing, but 
confirmation would be great.


The OP was referring to the pg_dump/pg_restore cycle, you seem to be 
referring to logical replication. Is that correct?




Thanks, Shaheed


 >
 > Thanks !
 >
 > Laura
 >
 >

-- 
Adrian Klaver

adrian.kla...@aklaver.com 





--
Adrian Klaver
adrian.kla...@aklaver.com





Re: Scriptable way to validate a pg_dump restore ?

2024-01-29 Thread Shaheed Haque
On Mon, 29 Jan 2024, 22:52 Adrian Klaver,  wrote:

> On 1/29/24 00:12, Laura Smith wrote:
> > Hi
> >
> > Let's say I've got a scenario where I'm doing a pg_dump replication
> rather than online streaming, e.g. due to air-gap or whatever.
> >
> > Is there a scriptable way to validate the restore ?  e.g. using doing
> something clever with ctid or something to ensure both the schema and all
> its rows were restored to the same point at which the dump was taken ?
>
> Assuming you are using pg_restore on a non-plain text dump file then
> from pg_restore.c:
>
> * pg_restore.c
>
> *  pg_restore is an utility extracting postgres database definitions
>   *  from a backup archive created by pg_dump using the archiver
>   *  interface.
>   *
>   *  pg_restore will read the backup archive and
>   *  dump out a script that reproduces
>   *  the schema of the database in terms of
>   *user-defined types
>   *user-defined functions
>   *tables
>   *indexes
>   *aggregates
>   *operators
>   *ACL - grant/revoke
>   *
>   * the output script is SQL that is understood by PostgreSQL
>   *
>   * Basic process in a restore operation is:
>   *
>   *  Open the Archive and read the TOC.
>   *  Set flags in TOC entries, and *maybe* reorder them.
>   *  Generate script to stdout
>   *  Exit
>
> Then:
>
> pg_restore -l -f  
>
> to get the TOC mentioned above. Walk through that to verify schema is
> the same in the restored database.
>
> This will not tell you whether all the data was transferred. You will
> either have to trust from pg_dump.c:
>
>   *  pg_dump will read the system catalogs in a database and dump out a
>   *  script that reproduces the schema in terms of SQL that is
> understood
>   *  by PostgreSQL
>   *
>   *  Note that pg_dump runs in a transaction-snapshot mode transaction,
>   *  so it sees a consistent snapshot of the database including system
>   *  catalogs. However, it relies in part on various specialized
> backend
>   *  functions like pg_get_indexdef(), and those things tend to look at
>   *  the currently committed state.  So it is possible to get 'cache
>   *  lookup failed' error if someone performs DDL changes while a
> dump is
>   *  happening. The window for this sort of thing is from the
> acquisition
>   *  of the transaction snapshot to getSchemaData() (when pg_dump
> acquires
>   *  AccessShareLock on every table it intends to dump). It isn't
> very large,
>   *  but it can happen.
>
> Or come up with way to capture the state of the data at the time of dump
> and then compare to restored database. Something like Ron posted.
>

Right, for me, state, not just record count is what I'm interested in (for
the initial full table copy part of replication). So, given the explanation
about the possible per-table window, is there some property of the table
that could be used to confirm that a table has made it across?

I guess there is such a thing since the following incremental syncing would
presumably need it. I had hoped the LSN was this thing, but confirmation
would be great.

Thanks, Shaheed


> >
> > Thanks !
> >
> > Laura
> >
> >
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>
>
>
>


Re: Query performance in 9.6.24 vs 14.10

2024-01-29 Thread Ron Johnson
According to my tests, sometimes JIT is a little faster, and sometimes it's
a little slower. Mostly within the realm of statistical noise (especially
with each query having a sample size of only 13, on a VM that lives on a
probably-busy host).

On Mon, Jan 29, 2024 at 9:18 AM Ron Johnson  wrote:

> Yes, jit=on.
>
> I'll test them with jit=off, to see the difference.  (The application is
> 3rd party, so will change it at the system level.)
>
> On Mon, Jan 29, 2024 at 7:09 AM Bob Jolliffe 
> wrote:
>
>> Out of curiosity, is the pg14 running with the default jit=on setting?
>>
>> This is obviously entirely due to the nature of the particular queries
>> themselves, but we found that, for our workloads, pg versions
>> greater than 11 were exacting a huge cost due to the jit compiler.  Once we
>> explicitly turned jit=off we started to see improvements.
>>
>


Re: Scriptable way to validate a pg_dump restore ?

2024-01-29 Thread Adrian Klaver

On 1/29/24 00:12, Laura Smith wrote:

Hi

Let's say I've got a scenario where I'm doing a pg_dump replication rather than 
online streaming, e.g. due to air-gap or whatever.

Is there a scriptable way to validate the restore ?  e.g. using doing something 
clever with ctid or something to ensure both the schema and all its rows were 
restored to the same point at which the dump was taken ?


Assuming you are using pg_restore on a non-plain text dump file then
from pg_restore.c:

* pg_restore.c

*  pg_restore is an utility extracting postgres database definitions
 *  from a backup archive created by pg_dump using the archiver
 *  interface.
 *
 *  pg_restore will read the backup archive and
 *  dump out a script that reproduces
 *  the schema of the database in terms of
 *user-defined types
 *user-defined functions
 *tables
 *indexes
 *aggregates
 *operators
 *ACL - grant/revoke
 *
 * the output script is SQL that is understood by PostgreSQL
 *
 * Basic process in a restore operation is:
 *
 *  Open the Archive and read the TOC.
 *  Set flags in TOC entries, and *maybe* reorder them.
 *  Generate script to stdout
 *  Exit

Then:

pg_restore -l -f  

to get the TOC mentioned above. Walk through that to verify schema is 
the same in the restored database.
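
As a rough complement to walking the TOC, a schema sanity check can also be 
pulled straight from the catalogs; run this on both the source and the 
restored database and compare the output (a sketch only, not a full 
comparison):

SELECT n.nspname AS schema, c.relkind, count(*)
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname NOT IN ('pg_catalog', 'information_schema')
GROUP BY 1, 2
ORDER BY 1, 2;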


This will not tell you whether all the data was transferred. You will 
either have to trust from pg_dump.c:


 *  pg_dump will read the system catalogs in a database and dump out a
 *  script that reproduces the schema in terms of SQL that is 
understood

 *  by PostgreSQL
 *
 *  Note that pg_dump runs in a transaction-snapshot mode transaction,
 *  so it sees a consistent snapshot of the database including system
 *  catalogs. However, it relies in part on various specialized backend
 *  functions like pg_get_indexdef(), and those things tend to look at
 *  the currently committed state.  So it is possible to get 'cache
 *  lookup failed' error if someone performs DDL changes while a 
dump is
 *  happening. The window for this sort of thing is from the 
acquisition
 *  of the transaction snapshot to getSchemaData() (when pg_dump 
acquires
 *  AccessShareLock on every table it intends to dump). It isn't 
very large,

 *  but it can happen.

Or come up with way to capture the state of the data at the time of dump 
and then compare to restored database. Something like Ron posted.




Thanks !

Laura




--
Adrian Klaver
adrian.kla...@aklaver.com





Re: Query performance in 9.6.24 vs 14.10

2024-01-29 Thread Ron Johnson
Yes, jit=on.

I'll test them with jit=off, to see the difference.  (The application is
3rd party, so will change it at the system level.)

On Mon, Jan 29, 2024 at 7:09 AM Bob Jolliffe  wrote:

> Out of curiosity, is the pg14 running with the default jit=on setting?
>
> This is obviously entirely due to the nature of the particular queries
> themselves, but we found that for our workloads that pg versions
> greater than 11 were exacting a huge cost due to the jit compiler.  Once we
> explicitly turned jit=off we started to see improvements.
>


Re: Query performance in 9.6.24 vs 14.10

2024-01-29 Thread Bob Jolliffe
Out of curiosity, is the pg14 running with the default jit=on setting?

This is obviously entirely due to the nature of the particular queries
themselves, but we found that, for our workloads, pg versions
greater than 11 were exacting a huge cost due to the jit compiler.  Once we
explicitly turned jit=off we started to see improvements.
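
For anyone wanting to try the same comparison, a small sketch (these are 
stock settings; jit is reloadable, so no restart is needed):

-- Per session, to compare one query with and without JIT:
SET jit = off;
-- EXPLAIN (ANALYZE) <your query>;

-- Instance-wide, without editing postgresql.conf by hand:
ALTER SYSTEM SET jit = off;
SELECT pg_reload_conf();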

On Mon, 29 Jan 2024 at 07:55, Ron Johnson  wrote:

> On Sun, Jan 28, 2024 at 10:44 PM David Rowley 
> wrote:
>
>> On Mon, 29 Jan 2024 at 07:37, Ron Johnson 
>> wrote:
>>
>>> 08 9.6.24 1,142.164 1,160.801 1,103.716 1,249.852 1,191.081
>>> 14.10 159.354 155.111 155.111 162.797 158.157 86.72%
>>>
>>
>> Your speedup per cent calculation undersells PG14 by quite a bit.  I'd
>> call that an increase of ~639% rather than 86.72%.
>>
>> I think you've done "1 - sum( <14.10 numbers> ) / sum( <9.6.24 numbers>)"
>> whereas I think you should have done "sum( <9.6.24 numbers>) / sum( <14.10
>> numbers> ) - 1"
>>
>> Nonetheless, thanks for testing this out.  I assume this is just a report
>> giving good feedback about progress in this area...?
>>
>
> The spreadsheet function, using the Median cells, is (PG9.6 - PG14) /
> PG9.6.  That's essentially the same as what you wrote.
>
> 158.157 / 1191.081 = 0.13278
>
> 1191.081 / 158.157 = 7.53, so 9.6.24 on that query is 7.53x slower.
>
>


Re: Scriptable way to validate a pg_dump restore ?

2024-01-29 Thread Ron Johnson
On Mon, Jan 29, 2024 at 3:12 AM Laura Smith <
n5d9xq3ti233xiyif...@protonmail.ch> wrote:

> Hi
>
> Let's say I've got a scenario where I'm doing a pg_dump replication rather
> than online streaming, e.g. due to air-gap or whatever.
>
> Is there a scriptable way to validate the restore ?  e.g. using doing
> something clever with ctid or something to ensure both the schema and all
> its rows were restored to the same point at which the dump was taken ?
>

1. Since pg_dump creates a logical copy, nothing binary like ctid will work.
2. pg_dump is designed to take a snapshot.  If it didn't, it would fail
for people who do logical replication.  However, no one has noticed any such
failure.

That's kinda like being asked to prove that rocks *always* fall when you
drop them.  Either you trust physics, because physics has always worked, or
you must watch every rock, because next time it might not fall.  The
analogy is slightly flawed, since we always check the pg_dump and
pg_restore return codes, since something else might impact their function.

But if you still need evidence, here's what I'm doing to verify table and
record counts during a 9.6 -> 14 migration.  You'll have to modify it for
your purpose.

Create this table and function beforehand:
CREATE TABLE dba.migration_table_counts (
    location text check (location in ('96', '14'))
  , table_name text
  , row_count bigint
  , count_time timestamp without time zone default current_timestamp
  , primary key (table_name, location)
);
CREATE OR REPLACE FUNCTION dba.get_table_counts(_p_source TEXT) RETURNS INTEGER
LANGUAGE plpgsql
AS $func$
DECLARE
    r RECORD;
    _sql TEXT;
    _table_count BIGINT;
BEGIN
    FOR r IN select relnamespace::regnamespace::text||'.'||relname as table_name
             from pg_class cla
             where relkind = 'r'
               and not exists (select 1   -- excludes parent tables
                               from pg_inherits inh1
                               where inh1.inhparent = cla.oid)
               and relnamespace::regnamespace::text
                   not in ('pg_catalog', 'information_schema', 'dba')
             order by 1
    LOOP
        _sql := FORMAT('SELECT COUNT(*) FROM %s;', r.table_name);
        RAISE NOTICE '% %', to_char(clock_timestamp(), 'YYYY-MM-DD HH24:MI:SS.MS'), _sql;
        EXECUTE _sql INTO _table_count;
        --RAISE NOTICE '%', _table_count;
        INSERT INTO dba.migration_table_counts (location, table_name, row_count)
        VALUES (_p_source, r.table_name, _table_count);
    END LOOP;
    RAISE NOTICE '% %', to_char(clock_timestamp(), 'YYYY-MM-DD HH24:MI:SS.MS'), 'Finished';
    RETURN 0;
END
$func$;

Run this script in a cron job that executes at the same time as your
pg_dump cron job.  Parameters should be for the source database.
#!/bin/bash
declare -gr Server=$1
declare -gr DB=$2
declare -gr Source=$3
if [ -z $Server ]; then echo "Requires a server name."; exit 2; fi
if [ -z $DB ]; then echo "Requires a DB name."; exit 3; fi
if [ -z $Source ]; then echo "Requires a source: 96|14."; exit 4; fi
psql -U postgres -h ${Server} $DB -Xac "DELETE FROM dba.migration_table_counts WHERE location = '$Source';"
psql -U postgres -h ${Server} $DB -Xac "select * from dba.get_table_counts('$Source');"

Run the same script on the destination server after the pg_restore is
finished.
Dump the source dba.migration_table_counts then load it into the
destination dba.migration_table_counts.

These two queries run on the destination server will check that all tables
exist in both databases, and that the record counts are the same.

You'll have some slight variations, since the two jobs are in separate
transactions.  (Mine won't, since the applications will be shut down, and
pg_hba.conf will block them.)

declare -gr SQL1="
with
c96 as (select table_name, row_count from dba.migration_table_counts
where location = '96'),
c14 as (select table_name, row_count from dba.migration_table_counts
where location = '14')
select c96.*, c14.*
from c96 full join c14 on c96.table_name = c14.table_name
where c96.table_name is null
   or c14.table_name is null
order by c96.table_name, c14.table_name;"
psql $DB -ac "$SQL1"

declare -gr SQL2="
with
c96 as (select table_name, row_count from dba.migration_table_counts
where location = '96'),
c14 as (select table_name, row_count from dba.migration_table_counts
where location = '14')
select c96.*, c14.*, c96.row_count - c14.row_count as row_diff
from c96 inner join c14 on c96.table_name = c14.table_name
where c96.row_count != c14.row_count
order by c96.table_name;"
psql $DB -ac "$SQL2"


Re: Scriptable way to validate a pg_dump restore ?

2024-01-29 Thread Shaheed Haque
I'd also like to know how to do this. The current approaches seem, afaict,
to involve making checks on both ends of the connection. Even given the
inherently racy nature of the issue, that seems unwieldy to me.

https://www.postgresql.org/message-id/CAHAc2jdAHvp7tFZBP37awcth%3DT3h5WXCN9KjZOvuTNJaAAC_hg%40mail.gmail.com

On Mon, 29 Jan 2024, 14:12 Laura Smith, 
wrote:

> Hi
>
> Let's say I've got a scenario where I'm doing a pg_dump replication rather
> than online streaming, e.g. due to air-gap or whatever.
>
> Is there a scriptable way to validate the restore ?  e.g. using doing
> something clever with ctid or something to ensure both the schema and all
> its rows were restored to the same point at which the dump was taken ?
>
> Thanks !
>
> Laura
>
>
>


Scriptable way to validate a pg_dump restore ?

2024-01-29 Thread Laura Smith
Hi

Let's say I've got a scenario where I'm doing a pg_dump replication rather than 
online streaming, e.g. due to air-gap or whatever.

Is there a scriptable way to validate the restore ?  e.g. using doing something 
clever with ctid or something to ensure both the schema and all its rows were 
restored to the same point at which the dump was taken ?

Thanks !

Laura