Re: pg_dump restores as expected on some machines and reports duplicate keys on others

2024-06-23 Thread Shaheed Haque
First, thanks for all the kind replies.

To my eternal shame, after spending hours trying to debug this, I found,
buried deep in one of my own initialisation scripts, the creation of a
handful of "seed" database objects which, of course, caused all my woes.
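
In case it helps anyone else hitting the same symptom: the check that would
have caught this immediately is simply to confirm that the target tables are
empty before restoring. A sketch, using one of the affected tables from the
error output in my original report below:

-- Run on the target database before pg_restore; any pre-existing rows here
-- will collide with the restored data when the primary keys are re-created.
SELECT count(*) AS pre_existing_rows
  FROM public.paiyroll_companybankdetailgb;

-- After a failed restore, this shows exactly which key values collided.
SELECT companybankdetail_ptr_id, count(*)
  FROM public.paiyroll_companybankdetailgb
 GROUP BY companybankdetail_ptr_id
HAVING count(*) > 1;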

Thanks again,

Shaheed


pg_dump restores as expected on some machines and reports duplicate keys on others

2024-06-22 Thread Shaheed Haque
Hi,

I am using Postgres 14 on AWS RDS and am seeing the output of pg_dump be
restored as expected by pg_restore on some database instances, and fail
with reports of duplicate keys on other database instances:

   - My deployments are always a pair, one "logic VM" for Django etc and
   one "RDS instance". The psql client runs on the logic VM. The Postgres
   version is the same in all cases; psql reports:
     - psql (14.12 (Ubuntu 14.12-0ubuntu0.22.04.1), server 14.9)
   - The pg_restore is done using the same script in both cases.
   - In the failing cases, there are always the same 26 errors (listed in
   detail below), but in summary, 3 distinct "child" tables complain of a
   duplicate id=1, id=2 and id=3 respectively.
   - These "child" tables are FK-related via some intermediate table to a
   top level table. They form a polymorphic set. There are other similar child
   tables which do not appear to be affected:
      - polymorphicmodel
         - companybankdetail
            - companybankdetailde
            - companybankdetailgb  <<< 1 duplicate, id=2
            - companybankdetailus
         - companypostaldetail
            - companypostaldetailde
            - companypostaldetailgb  <<< 1 duplicate, id=1
            - companypostaldetailus
         - companytaxdetail
            - companytaxdetailde
            - companytaxdetailgb  <<< 1 duplicate, id=3
            - companytaxdetailus
         - ...
         - several other hierarchies, all error free
         - ...
  - I've looked at the dumped .dat files but they contain no
   duplicates.
   - The one difference I can think of between deployment pairs which work
   OK and those which fail is the image used for the logic VM (i.e. where the
   psql client script runs): a standard AWS Ubuntu image in the OK case,
   versus a custom AWS image in the failing case.
  - The custom image is a saved snapshot of one created using the
  standard image.

Why should the use of one type of VM image versus another cause pg_restore
to hallucinate the duplicate records?

Encls: 26 errors as mentioned...


pg_restore: while PROCESSING TOC:
pg_restore: from TOC entry 5635; 1262 53894 DATABASE foo postgres
pg_restore: error: could not execute query: ERROR:  database "foo" already
exists
Command was: CREATE DATABASE foo WITH TEMPLATE = template0 ENCODING =
'UTF8' LOCALE = 'en_US.UTF-8';


pg_restore: from TOC entry 4808; 2606 80439 CONSTRAINT
paiyroll_companybankdetail paiyroll_companybankdetail_pkey dbcoreuser
pg_restore: error: could not execute query: ERROR:  could not create unique
index "paiyroll_companybankdetail_pkey"
DETAIL:  Key (polymorphicmodel_ptr_id)=(2) is duplicated.
Command was: ALTER TABLE ONLY public.paiyroll_companybankdetail
   ADD CONSTRAINT paiyroll_companybankdetail_pkey PRIMARY KEY
(polymorphicmodel_ptr_id);


pg_restore: from TOC entry 4812; 2606 80443 CONSTRAINT
paiyroll_companybankdetailgb paiyroll_companybankdetailgb_pkey dbcoreuser
pg_restore: error: could not execute query: ERROR:  could not create unique
index "paiyroll_companybankdetailgb_pkey"
DETAIL:  Key (companybankdetail_ptr_id)=(2) is duplicated.
Command was: ALTER TABLE ONLY public.paiyroll_companybankdetailgb
   ADD CONSTRAINT paiyroll_companybankdetailgb_pkey PRIMARY KEY
(companybankdetail_ptr_id);


pg_restore: from TOC entry 4817; 2606 80447 CONSTRAINT
paiyroll_companypostaldetail paiyroll_companypostaldetail_pkey dbcoreuser
pg_restore: error: could not execute query: ERROR:  could not create unique
index "paiyroll_companypostaldetail_pkey"
DETAIL:  Key (polymorphicmodel_ptr_id)=(1) is duplicated.
Command was: ALTER TABLE ONLY public.paiyroll_companypostaldetail
   ADD CONSTRAINT paiyroll_companypostaldetail_pkey PRIMARY KEY
(polymorphicmodel_ptr_id);


pg_restore: from TOC entry 4821; 2606 80451 CONSTRAINT
paiyroll_companypostaldetailgb paiyroll_companypostaldetailgb_pkey
dbcoreuser
pg_restore: error: could not execute query: ERROR:  could not create unique
index "paiyroll_companypostaldetailgb_pkey"
DETAIL:  Key (companypostaldetail_ptr_id)=(1) is duplicated.
Command was: ALTER TABLE ONLY public.paiyroll_companypostaldetailgb
   ADD CONSTRAINT paiyroll_companypostaldetailgb_pkey PRIMARY KEY
(companypostaldetail_ptr_id);


pg_restore: from TOC entry 4826; 2606 80455 CONSTRAINT
paiyroll_companytaxdetail paiyroll_companytaxdetail_pkey dbcoreuser
pg_restore: error: could not execute query: ERROR:  could not create unique
index "paiyroll_companytaxdetail_pkey"
DETAIL:  Key (polymorphicmodel_ptr_id)=(3) is duplicated.
Command was: ALTER TABLE ONLY public.paiyroll_companytaxdetail
   ADD CONSTRAINT paiyroll_companytaxdetail_pkey PRIMARY KEY
(polymorphicmodel_ptr_id);


pg_restore: from TOC entry 4830; 2606 80459 CONSTRAINT
paiyroll_companytaxdetailgb paiyroll_companytaxdetailgb_pkey dbcoreuser
pg_restore: error: could not execute query: ERROR:  could not create unique
index "paiyroll_companytaxdetailgb_pkey"
DETAIL:  Key 

Re: Monitoring logical replication

2024-06-18 Thread Shaheed Haque
Hi all,

Is there an "official" pairing of LSN values on the publication and
subscription sides that should be used to track the delta between the two
systems? I ask because Google is full of different pairs being used. I
tried to identify the highest level interface points exposed, i.e. what is
documented on
https://www.postgresql.org/docs/current/replication-origins.html, the
pg_stat_subscription table, the pg_stat_publication table and the
pg_current_wal_lsn() function on the publisher, but these seem to be barely
used.
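
For reference, the pairing I keep coming back to (though I cannot tell
whether it is the sanctioned one) is sketched below; pg_wal_lsn_diff() at
least turns the two values into a byte count:

-- On the publisher: the current WAL position and what each logical slot has
-- confirmed, i.e. how far behind each subscription is in bytes.
SELECT slot_name,
       confirmed_flush_lsn,
       pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn) AS lag_bytes
  FROM pg_replication_slots
 WHERE slot_type = 'logical';

-- On the subscriber: the last LSN received and applied from the publisher.
SELECT subname, received_lsn, latest_end_lsn, latest_end_time
  FROM pg_stat_subscription;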

Am I missing something?

Thanks, Shaheed

P.S. On a related note, I see a (stalled?) discussion on providing LSN ->
timestamp conversion
<https://www.postgresql.org/message-id/flat/CAAKRu_bw7Pgw8Mi9LJrBkFvPPHgvVjPphrT8ugbzs-2V0f%2B1Rw%40mail.gmail.com#8540282228634ecd061585867c6275ca>,
I'd just like to say that something like that would be very useful.



On Tue, 30 Jan 2024 at 11:27, Shaheed Haque  wrote:

> This is great, thank you for posting. I'm currently a subcontinent or two
> away from my dev env, but will compare your approach with mine (you are
> using some facilities of psql I'm not familiar with). At least you have
> confirmed that LSNs are the place to start.
>
> Thanks again, Shaheed
>
>
> On Tue, 30 Jan 2024, 05:15 Klaus Darilion, 
> wrote:
>
>> Hi Shaheed!
>>
>> I monitor our replication this way:
>>
>> 1. Every 10 seconds I fetch the current LSN and write it into a table,
>> together with the current timestamp. Further, I fetch the confirmed LSNs from
>> the replication slots and delete old entries in the lsn2data table.
>>
>> calculate_logical_replication_lag.php:
>>
>> <?php
>> $path = realpath(dirname(__FILE__) . "/../inc");
>> set_include_path($path . PATH_SEPARATOR . get_include_path());
>>
>> require_once('config.php');
>> $config_int['syslogprefix'] = basename(__FILE__);
>> require_once('logging.php');
>>
>> $dbuser="replication_lag_user";
>> $dbpass="";
>> if (!$dbconn = pg_pconnect('host='.$config_int['dbhost'].'
>> dbname='.$config_int['dbname'].' user='.$dbuser.' password='.$dbpass)) {
>>  print "Sorry, database connection failed";
>>  exit;
>> }
>>
>> $accuracy = 10; // in seconds
>>
>> //
>> // Preparations:
>> //
>> // CREATE TABLE lsn2data(
>> //lsn pg_lsn PRIMARY KEY,
>> //seen timestamp NOT NULL DEFAULT NOW()
>> // );
>> // CREATE ROLE replication_lag_user WITH LOGIN PASSWORD
>> 'XXX';
>> // GRANT ALL ON TABLE lsn2data TO replication_lag_user;
>> //
>> // CREATE OR REPLACE FUNCTION get_replication_lag() RETURNS TABLE
>> (subscriber name, lag bigint) AS
>> // $BODY$
>> // DECLARE
>> // subscriber name;
>> // BEGIN
>> // FOR subscriber IN
>> // SELECT slot_name FROM pg_replication_slots
>> // LOOP
>> // RETURN QUERY SELECT slot_name, EXTRACT(EPOCH FROM
>> NOW()-seen)::bigint lag from lsn2data,pg_replication_slots WHERE
>> slot_name=subscriber AND lsn < confirmed_flush_lsn ORDER BY lsn DESC
>> LIMIT 1;
>> // END LOOP;
>> // RETURN;
>> // END
>> // $BODY$
>> // LANGUAGE plpgsql;
>> //
>> while (1) {
>>  $dbq = pg_query("INSERT INTO lsn2data (lsn) VALUES
>> (pg_current_wal_lsn())");
>>  if ($dbq === FALSE) {
>>  mylog(LOG_ERROR, "SQL query error:
>> ".pg_last_error()."\n");
>>  exit(1);
>>  }
>>
>>  $dbq = pg_query("DELETE FROM lsn2data WHERE lsn < (".
>>"SELECT lsn FROM lsn2data WHERE lsn < (".
>>  "SELECT confirmed_flush_lsn FROM
>> pg_replication_slots ORDER BY confirmed_flush_lsn ASC LIMIT 1".
>>") ORDER BY lsn DESC LIMIT 1".
>>  ")"
>>  );
>>  if ($dbq === FALSE) {
>>  mylog(LOG_ERROR, "SQL query error:
>> ".pg_last_error()."\n");
>>  exit(1);
>>  }
>>  sleep($accuracy);
>> }
>>
>> 2. I graph the replications lags (converted from LSN to seconds) in my
>> check_mk monitoring:
>>
>> #!/bin/bash
>>
>> #
>> # Managed by Puppet:
>> modules/base/files/monitoring/check_logical_replication_lag.sh
>> #
>> # Check the logical replication lag and export performance data for each
>> subscriber
>

Re: Purpose of pg_dump tar archive format?

2024-06-04 Thread Shaheed Haque
On Tue, 4 Jun 2024 at 20:47, Gavin Roy  wrote:

>
> On Tue, Jun 4, 2024 at 3:15 PM Ron Johnson 
> wrote:
>
>>
>> But why tar instead of custom? That was part of my original question.
>>
>
> I've found it pretty useful for programmatically accessing data in a dump
> for large databases outside of the normal pg_dump/pg_restore workflow. You
> don't have to seek through one large binary file to get to the data section
> to get at the data.
>

This is true for us too; specifically, tar, including with compression, is
very convenient for both CLI and Python ecosystems.


Re: Purpose of pg_dump tar archive format?

2024-06-04 Thread Shaheed Haque
We use it. I bet lots of others do too.

On Tue, 4 Jun 2024, 18:06 Adrian Klaver,  wrote:

> Reply to list also.
> Ccing list
>
> On 6/4/24 10:03 AM, Ron Johnson wrote:
>
> >
> > If you don't need the tar format then don't use it.
> >
> >
> > That's neither the purpose nor the point of my question.
> >
> > I think that a note in the docs mentioning that it's obsolete would be
> > helpful for new users who recognize "tar" so choose it.
>
> You are assuming facts not in evidence, namely that the format is obsolete.
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>
>
>


Re: PostgreSQL as advanced job queuing system

2024-03-22 Thread Shaheed Haque
Generally, I'd suggest you think carefully about the nature of the jobs,
and draw up a list of must-have properties (performance of course, but also
things like whether jobs have to survive planned or unplanned outages, be
visible across a WAN, numbers of readers and writers, delivery guarantees,
etc etc) and then decide on make versus "buy". Distributed systems are
hard, and hard to get right.

On Fri, 22 Mar 2024, 16:17 Thiemo Kellner, 
wrote:

>
>
> Am 22.03.2024 um 14:15 schrieb Fred Habash:
> > We developed a home-grown queue system using Postgres, but its
> > performance was largely hindered by que tables bloating and the need to
> > continuously vacuum them. It did not scale whatsoever. With some
> > workarounds, we ended up designing three sets of queue tables, switching
> > between them based on some queue stats, vacuum the inactive set, and
> repeat.
> > We kept this queue system for low SLA app components. For others, we
> > switched to Kafka. Investing in learning and implementing purpose built
> > queue systems pays off for the long term.
>
> I wonder whether one should https://youtu.be/VEWXmdjzIpQ=543 not to
> scale either.
>
>
>


Re: creating a subset DB efficiently ?

2024-03-09 Thread Shaheed Haque
I go more or less the other way. I have a script which follows the
child...parent...grandparent...etc foreign keys in the source database and
dumps only the records belonging to the selected "project" (your
terminology, in my case it is "client"). I save the dumped data to an
archive file.

The script has hardcoded knowledge only about the absolute minimum number
of the root ancestor tables (and certain special cases mostly relating to
the fact I'm talking about a Django ORM-managed schema) and otherwise tries
to avoid knowing too much so that the schema can evolve mostly without
bothering the script.
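
(For the curious, the FK walking boils down to a catalog query along these
lines; this is a simplified sketch rather than the script itself, and the
parent table name is a placeholder:)

-- Find every table that has a foreign key pointing at a given parent table;
-- applied recursively, this yields the child/grandchild set to dump.
SELECT con.conrelid::regclass  AS child_table,
       con.confrelid::regclass AS parent_table,
       con.conname             AS fk_name
  FROM pg_constraint AS con
 WHERE con.contype = 'f'
   AND con.confrelid = 'public.some_parent_table'::regclass;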

The script includes meta data about the dump in a "manifest". The manifest
records, amongst other things, the order in which the tables were dumped.
The restore operation uses this to correctly order the table restores.

I can then simply restore the archive, table by table, from the individual
dumps, using a script which walks the manifest.






On Sat, 9 Mar 2024, 14:56 hector vass,  wrote:

> On Fri, Mar 8, 2024 at 4:22 PM David Gauthier 
> wrote:
>
>> Here's the situation
>>
>> - The DB contains data for several projects.
>> - The tables of the DB contain data for all projects (data is not
>> partitioned on project name or anything like that)
>> - The "project" identifier (table column) exists in a few "parent" tables
>> with many child... grandchild,... tables under them connected with foreign
>> keys defined with "on delete cascade".  So if a record in one of the parent
>> table records is deleted, all of its underlying, dependent records get
>> deleted too.
>> - New projects come in, and old ones need to be removed and "archived" in
>> DBs of their own.  So there's a DB called "active_projects" and there's a
>> DB called "project_a_archive" (identical metadata).
>> - The idea is to copy the data for project "a" that's in
>> "active_projects" to the "project_a_arhchive" DB AND delete the project a
>> data out of "active_projects".
>> - Leave "project_a_archive" up and running if someone needs to attach to
>> that and get some old/archived data.
>>
>> The brute-force method I've been using is...
>> 1)  pg_dump "active_projects" to a (huge) file then populate
>> "project_a_archive" using that (I don't have the privs to create database,
>> IT creates an empty one for me, so this is how I do it).
>> 2) go into the "project_a_archive" DB and run... "delete from par_tbl_1
>> where project <> 'a' ", "delete from par_tbl_2 where project <> 'a' ",
>> etc... leaving only project "a" data in the DB.
>> 3) go into the "active_projects" DB and "delete from par_tbl_1 where
>> project = 'a' ", etc... removing project "a" from the "active_projects DB.
>>
>> Ya, not very elegant, it takes a long time and it takes a lot of
>> resources.  So I'm looking for ideas on how to do this better.
>>
>> Related question...
>> The "delete from par_tbl_a where project <> 'a' " is taking forever.  I
>> fear it's because it's trying to journal everything in case I want to
>> rollback.  But this is just in the archive DB and I don't mind taking the
>> risk if I can speed this up outside of a transaction.  How can I run a
>> delete command like this without the rollback recovery overhead ?
>>
>
>
> >(I don't have the privs to create database, IT creates an empty one for
> me, so this is how I do it).
>
> That's a shame.  You can do something similar with tablespaces
>   Template your existing schema to create a new schema for the project
> (pg_dump -s)
>   Create tablespace for this new project and schema
>
>  You can then move the physical tablespace to cheaper disk and use
> symbolic links or... archive and/or back it up at the schema level with
> pg_dump -n
>
> ...as long as you don't put anything in the public schema all you are
> really sharing is roles otherwise a bit like a separate database
>
>
>
>
>


Re: Improving pg_dump performance when handling large numbers of LOBs

2024-02-05 Thread Shaheed Haque
Might it be worth a modest amount of time using some basic profiling to see
where the time is going? A week is a looonnngg time, even for 150e6
operations. For example, if there is an unexpectedly high IO load, some
temporary M.2 storage might help?

On Tue, 6 Feb 2024, 01:36 Ron Johnson,  wrote:

> On Mon, Feb 5, 2024 at 2:01 PM Wyatt Tellis 
> wrote:
>
>> Hi,
>>
>> We've inherited a series of legacy PG 12 clusters that each contain a
>> database that we need to migrate to a PG 15 cluster. Each database contains
>> about 150 million large objects totaling about 250GB.
>>
>
> 250*10^9 / (150*10^6) = 1667 bytes.  That's *tiny*.
>
> Am I misunderstanding you?
>
>>


Re: Scriptable way to validate a pg_dump restore ?

2024-01-30 Thread Shaheed Haque
On Tue, 30 Jan 2024, 05:02 Adrian Klaver,  wrote:

> On 1/29/24 11:35, Shaheed Haque wrote:
> >
> >
> > On Tue, 30 Jan 2024, 00:27 Adrian Klaver,  > <mailto:adrian.kla...@aklaver.com>> wrote:
> >
> > On 1/29/24 10:12, Shaheed Haque wrote:
> >
> >
> >  >
> >  > Yes. But I was under the impression that the initial copy of
> logical
> >  > replication was the same?
> >  >
> >
> > Are you taking about the copy_data option to WITH?
> >
> > If so yes and no.
> >
> > Yes as it uses COPY to transfer the data.
> >
> >
> > Yes, this is what I meant.
> >
> > No as what COPY transfers can be affected by WHERE clauses on the
> > publisher. Also if you have cascading publishers/subscriptions the
> > 'original' data maybe upstream of the publisher you are comparing to.
> >
> >
> > Good points, understood. For the next bit, let's assume neither of these
> > are in play.
> >
> > Finally logical replication is generally not static so there is the
> > issue of determining a point in time for the check.
> >
> >
> > Indeed. I currently have a static source db but would eventually like to
> > eliminate the implied downtime. What I'd like to provide my user is some
>
> Implied downtime of what?
>

Oh, forgot to say: the downtime of my Django based app. Not anything at the
PG level.


> > indication of progress initially during the copy_data phase, and for the
> > future, of the anticipated incremental convergence.
> >
> > And, as per my other note, I would ideally like to be able to do this
> > using only a connection to one db.
> >
> > I was assuming that logical replication needed "something" similar
> > internally, and was hoping the LSNs were that "something".
>
> I'm going to say up front I am no expert on the internals of logical
> replication. Will point you at:
>
> https://www.postgresql.org/docs/current/protocol-message-formats.html
>
> A quick look at that indicates to me it is more involved then you think.
>
> >
> > Thanks, Shaheed
> >
> >
> > --
> > Adrian Klaver
> > adrian.kla...@aklaver.com <mailto:adrian.kla...@aklaver.com>
> >
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>
>


Re: Scriptable way to validate a pg_dump restore ?

2024-01-30 Thread Shaheed Haque
On Tue, 30 Jan 2024, 05:02 Adrian Klaver,  wrote:

> On 1/29/24 11:35, Shaheed Haque wrote:
> >
> >
> > On Tue, 30 Jan 2024, 00:27 Adrian Klaver,  > <mailto:adrian.kla...@aklaver.com>> wrote:
> >
> > On 1/29/24 10:12, Shaheed Haque wrote:
> >
> >
> >  >
> >  > Yes. But I was under the impression that the initial copy of
> logical
> >  > replication was the same?
> >  >
> >
> > Are you taking about the copy_data option to WITH?
> >
> > If so yes and no.
> >
> > Yes as it uses COPY to transfer the data.
> >
> >
> > Yes, this is what I meant.
> >
> > No as what COPY transfers can be affected by WHERE clauses on the
> > publisher. Also if you have cascading publishers/subscriptions the
> > 'original' data maybe upstream of the publisher you are comparing to.
> >
> >
> > Good points, understood. For the next bit, let's assume neither of these
> > are in play.
> >
> > Finally logical replication is generally not static so there is the
> > issue of determining a point in time for the check.
> >
> >
> > Indeed. I currently have a static source db but would eventually like to
> > eliminate the implied downtime. What I'd like to provide my user is some
>
> Implied downtime of what?
>
> > indication of progress initially during the copy_data phase, and for the
> > future, of the anticipated incremental convergence.
> >
> > And, as per my other note, I would ideally like to be able to do this
> > using only a connection to one db.
> >
> > I was assuming that logical replication needed "something" similar
> > internally, and was hoping the LSNs were that "something".
>
> I'm going to say up front I am no expert on the internals of logical
> replication. Will point you at:
>
> https://www.postgresql.org/docs/current/protocol-message-formats.html
>
> A quick look at that indicates to me it is more involved then you think.
>

I'll take a look. Thanks for the tip and the gentle guidance; it is much
appreciated.


> >
> > Thanks, Shaheed
> >
> >
> > --
> > Adrian Klaver
> > adrian.kla...@aklaver.com <mailto:adrian.kla...@aklaver.com>
> >
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>
>


Re: Monitoring logical replication

2024-01-30 Thread Shaheed Haque
subscriber LIKE '%\_' ORDER BY 2 DESC)
> TO '$TMP_FILE' With CSV" 2>&1> /dev/null
> LC=$(sudo -u postgres psql -t regdns -c "SELECT count(*) FROM
> get_replication_lag();" | tr -d ' ')
>
> if [ $LC == "0" ]; then
>  echo -e "$CMK_HEADER\n0 $CMK_NAME - No Slaves with Replication
> found - maybe we are a slave?" > $CMK_SPOOLDIR/$CMK_SPOOLFILE
>  exit 0;
> fi
>
> grep $(hostname | cut -d '-' -f2) "$TMP_FILE" > /dev/null
> if [ $? != "0" ]; then
>  echo -e "$CMK_HEADER\n2 $CMK_NAME - Postgres Output does not
> seem valid. Please check script $ME and output in $TMP_FILE" >
> $CMK_SPOOLDIR/$CMK_SPOOLFILE
>  exit 1;
> fi
>
> # CSV in Array einlesen
> IFS=$'\n' read -d '' -r -a input_file < "$TMP_FILE"
>
> # Auswerten
> maxlag=0
> for i in "${input_file[@]}"; do
>  node=`echo $i | awk -F  "," '{print $1}' | tr -- _ -`
>  lag=`echo $i | awk -F  "," '{print $2}'`
>  final_output="$final_output$node=$lag;$warn;$crit|"
>  #
>
> https://unix.stackexchange.com/questions/186663/is-there-a-unix-command-that-gives-the-minimum-maximum-of-two-numbers
>  maxlag=$(( maxlag > lag ? maxlag : lag ))
> done
> final_output="${final_output}max-lag=$maxlag;$warn;$crit"
>
> # Letztes Pipe Zeichen rausschneiden
> #final_output=`echo $final_output | rev | cut -c 2- | rev`
>
> # Spool File schreiben
> echo -e $final_output > $CMK_SPOOLDIR/$CMK_SPOOLFILE
> logger -t "$MEBASE" "$final_output"
>
>
> ) 200>/tmp/`basename $0`.exclusivelock
>
>
> 3. During initial sync I check the status on the subscriber. Once it has
> synced all tables of the publication, it will send me an email.
> #
> # Managed by Puppet:
> modules/pdns/templates/check_pglogical_subscription.sh.erb
> #
>
> #
> # This script checks and eventually creates the subscription, and wait
> until the initial sync is finished
> #
>
> PUB=regdns2020_pub
> SLEEP=5
> PREFIX=check_pglogical_subscription.sh
> NUMTAB=175
>
> SECONDS=0
> date
> while true; do
>  echo "SELECT * from pg_subscription;" | sudo -u postgres psql -t
> regdns | grep -q $PUB
>  if [ $? -eq 0 ]; then
>  echo "OK: Host is subscribed to '$PUB'. Checking for
> table count ..."
>  break
>  fi
>  echo "ERROR: Host is not subscribed to '$PUB'. Subscribing to
> master ..."
>  logger -t $PREFIX "ERROR: Host is not subscribed to '$PUB'.
> Subscribing to master ..."
>  echo "CREATE SUBSCRIPTION `hostname -s|tr -- - _` CONNECTION
> 'host=X dbname= user=X password=XX PUBLICATION
> regdns2020_pub;" | sudo -u postgres psql regdns && touch
> /etc/regdns.schema_subscription.created
>  echo "Re-Checking in $SLEEP seconds ..."
>  logger -t $PREFIX "Re-Checking in $SLEEP seconds ..."
>  sleep $SLEEP
> done
>
> while true; do
>  COUNT=$(echo "SELECT count(*) from pg_subscription_rel;" | sudo
> -u postgres psql -t regdns | head -1 | xargs)
>  if [ $COUNT -eq $NUMTAB ]; then
>  echo "OK: Subscription '$PUB' contains $NUMTAB tables -
> that is OK. Checking for initial-sync status ..."
>  logger -t $PREFIX "OK: Subscription '$PUB' contains
> $NUMTAB tables - that is OK. Checking for initial-sync status ..."
>  break
>  fi
>  echo "ERROR: Subscription '$PUB' contains $COUNT tables, but
> should contain $NUMTAB table. Re-Checking in $SLEEP seconds ..."
>  logger -t $PREFIX  "ERROR: Subscription '$PUB' contains $COUNT
> tables, but should contain $NUMTAB table. Re-Checking in $SLEEP seconds
> ..."
>  sleep $SLEEP
> done
>
> while true; do
>  COUNTFIN=$(echo "SELECT count(*) from pg_subscription_rel WHERE
> srsubstate='r';" | sudo -u postgres psql -t regdns | head -1 | xargs)
>  if [ $COUNTFIN -eq $NUMTAB ]; then
>  echo "OK: Initial sync of $COUNTFIN/$NUMTAB tables
> finished in $SECONDS seconds."
>  logger -t $PREFIX "OK: Initial sync of $COUNTFIN/$NUMTAB
> tables finished in $SECONDS seconds."
>  echo "OK: Initial sync of $COUNTFIN/$NUMTAB tables
> finished in $SECONDS seconds." | mailx -s "$HOST $SECONDS seconds to
> subscribe" -- root
>  break
>  fi
>  echo ...

Re: Scriptable way to validate a pg_dump restore ?

2024-01-29 Thread Shaheed Haque
On Tue, 30 Jan 2024, 00:27 Adrian Klaver,  wrote:

> On 1/29/24 10:12, Shaheed Haque wrote:
>
>
> >
> > Yes. But I was under the impression that the initial copy of logical
> > replication was the same?
> >
>
> Are you taking about the copy_data option to WITH?
>
> If so yes and no.
>
> Yes as it uses COPY to transfer the data.
>

Yes, this is what I meant.

No as what COPY transfers can be affected by WHERE clauses on the
> publisher. Also if you have cascading publishers/subscriptions the
> 'original' data maybe upstream of the publisher you are comparing to.


Good points, understood. For the next bit, let's assume neither of these
are in play.

Finally logical replication is generally not static so there is the
> issue of determining a point in time for the check.
>

Indeed. I currently have a static source db but would eventually like to
eliminate the implied downtime. What I'd like to provide my user is some
indication of progress initially during the copy_data phase, and for the
future, of the anticipated incremental convergence.

And, as per my other note, I would ideally like to be able to do this using
only a connection to one db.

I was assuming that logical replication needed "something" similar
internally, and was hoping the LSNs were that "something".

Thanks, Shaheed


> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>
>


Re: Scriptable way to validate a pg_dump restore ?

2024-01-29 Thread Shaheed Haque
On Mon, 29 Jan 2024, 23:57 Adrian Klaver,  wrote:

> On 1/29/24 09:28, Shaheed Haque wrote:
>
>
> >
> > Right, for me, state, not just record count is what I'm interested in
> > (for the initial full table copy part of replication). So, given the
> > explanation about the possible per-table window, is there some property
> > of the table that could be used to confirm that a table has made it
> across?
> >
> > I guess there is such a thing since the following incremental syncing
> > would presumably need it. I had hoped the LSN was this thing, but
> > confirmation would be great.
>
> The OP was referring to the pg_dump/pg_restore cycle, you seem to be
> referring to logical replication. Is that correct?
>

Yes. But I was under the impression that the initial copy of logical
replication was the same?


> >
> > Thanks, Shaheed
> >
> >
> >  >
> >  > Thanks !
> >  >
> >  > Laura
> >  >
> >  >
> >
> > --
> > Adrian Klaver
> > adrian.kla...@aklaver.com <mailto:adrian.kla...@aklaver.com>
> >
> >
> >
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>
>


Re: Scriptable way to validate a pg_dump restore ?

2024-01-29 Thread Shaheed Haque
On Mon, 29 Jan 2024, 22:52 Adrian Klaver,  wrote:

> On 1/29/24 00:12, Laura Smith wrote:
> > Hi
> >
> > Let's say I've got a scenario where I'm doing a pg_dump replication
> rather than online streaming, e.g. due to air-gap or whatever.
> >
> > Is there a scriptable way to validate the restore ?  e.g. using doing
> something clever with ctid or something to ensure both the schema and all
> its rows were restored to the same point at which the dump was taken ?
>
> Assuming you are using pg_restore on a non-plain text dump file then
> from pg_restore.c:
>
> * pg_restore.c
>
> *  pg_restore is an utility extracting postgres database definitions
>   *  from a backup archive created by pg_dump using the archiver
>   *  interface.
>   *
>   *  pg_restore will read the backup archive and
>   *  dump out a script that reproduces
>   *  the schema of the database in terms of
>   *user-defined types
>   *user-defined functions
>   *tables
>   *indexes
>   *aggregates
>   *operators
>   *ACL - grant/revoke
>   *
>   * the output script is SQL that is understood by PostgreSQL
>   *
>   * Basic process in a restore operation is:
>   *
>   *  Open the Archive and read the TOC.
>   *  Set flags in TOC entries, and *maybe* reorder them.
>   *  Generate script to stdout
>   *  Exit
>
> Then:
>
> pg_restore -l -f <output file> <archive file>
>
> to get the TOC mentioned above. Walk through that to verify schema is
> the same in the restored database.
>
> This will not tell you whether all the data was transferred. You will
> either have to trust from pg_dump.c:
>
>   *  pg_dump will read the system catalogs in a database and dump out a
>   *  script that reproduces the schema in terms of SQL that is
> understood
>   *  by PostgreSQL
>   *
>   *  Note that pg_dump runs in a transaction-snapshot mode transaction,
>   *  so it sees a consistent snapshot of the database including system
>   *  catalogs. However, it relies in part on various specialized
> backend
>   *  functions like pg_get_indexdef(), and those things tend to look at
>   *  the currently committed state.  So it is possible to get 'cache
>   *  lookup failed' error if someone performs DDL changes while a
> dump is
>   *  happening. The window for this sort of thing is from the
> acquisition
>   *  of the transaction snapshot to getSchemaData() (when pg_dump
> acquires
>   *  AccessShareLock on every table it intends to dump). It isn't
> very large,
>   *  but it can happen.
>
> Or come up with way to capture the state of the data at the time of dump
> and then compare to restored database. Something like Ron posted.
>

Right, for me, state, not just record count is what I'm interested in (for
the initial full table copy part of replication). So, given the explanation
about the possible per-table window, is there some property of the table
that could be used to confirm that a table has made it across?

I guess there is such a thing since the following incremental syncing would
presumably need it. I had hoped the LSN was this thing, but confirmation
would be great.
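
(One crude fallback, if a point-in-time comparison is acceptable, would be to
hash each table on both ends and compare the results; a sketch, only sensible
for modest table sizes, with a placeholder table name:)

-- Run the same statement on the source and on the restored copy; matching
-- hashes and row counts imply matching content for that table.
SELECT md5(string_agg(t::text, ',' ORDER BY t::text)) AS table_hash,
       count(*)                                       AS row_count
  FROM some_table AS t;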

Thanks, Shaheed


> >
> > Thanks !
> >
> > Laura
> >
> >
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>
>
>
>


Re: Scriptable way to validate a pg_dump restore ?

2024-01-29 Thread Shaheed Haque
I'd also like to know how to do this. The current approaches seem, afaict,
to involve making queries on both ends of the connection. Even given the
inherently racy nature of the issue, that seems unwieldy to me.

https://www.postgresql.org/message-id/CAHAc2jdAHvp7tFZBP37awcth%3DT3h5WXCN9KjZOvuTNJaAAC_hg%40mail.gmail.com

On Mon, 29 Jan 2024, 14:12 Laura Smith, 
wrote:

> Hi
>
> Let's say I've got a scenario where I'm doing a pg_dump replication rather
> than online streaming, e.g. due to air-gap or whatever.
>
> Is there a scriptable way to validate the restore ?  e.g. using doing
> something clever with ctid or something to ensure both the schema and all
> its rows were restored to the same point at which the dump was taken ?
>
> Thanks !
>
> Laura
>
>
>


Monitoring logical replication

2023-10-07 Thread Shaheed Haque
Hi,

I've been playing with logical replication (currently on PG14),
specifically in an AWS RDS Postgres context, but NOT using AWS' own
replication tooling. I'm generally familiar with the challenges of
distributed systems (such causality, time synchronisation etc), but not
especially familiar with PG.

In looking at how to tell how a given subscriber has caught up with its
publisher, there is plenty of advice around the Web, for example
https://dba.stackexchange.com/questions/314324/monitor-logical-replication-using-lsn.
Like this example, much advice ends up talking about using separate queries
on the publisher and the subscriber to compare LSNs. First, (I think) I
understand the core difficulty that comparing LSNs is inherently racy, but
given that, I'm a bit unclear as to why a single query on the publisher is
not enough...IIUC:

   - Changes sent from the publisher to the subscriber are identified by
   LSN.
   - The publisher knows its own current latest LSN (pg_current_wal_lsn()),
   but this seems not to be exposed at the subscriber.
   - The subscriber knows what it has applied locally and even tells the
   publisher (pg_stat_subscription.latest_end_lsn), but it does not seem to
   be exposed at the publisher.

Have I missed something? Is there a way to track the LSN delta (given that
this is known to be racy) just by querying one end?

Second, how do folk "know" when replication is "done". For example, if the
two LSNs continued to match for 1 * replication lag? Or N * replication
lag? What would be a plausible N?

Third, as we know when logical replication is started, the initial table
state is captured in a snapshot, and sent across using COPY TABLE under the
covers. Now, let's say that the publisher is idle (i.e. no SQL writes to
the user's schema...obviously pg_catalog might change as replication is
configured and enabled) and that the replication starts with the publisher
as LSN_start. How could one know when the copying is done:

   - I initially assumed that the publisher's LSN would not change from
   LSN_start, but as the copying proceeds, I see that it DOES change
   (presumably because there are updates happening to pg_catalog, such as the
   temporary slots coming and going).
   - Is there some kind of singleton state on either publisher or
   subscriber that could be checked to know? (At the moment, I am counting the
   records in all copied tables).

I realise that the knowledge that the publisher is "idle" is a special
case, but right now, my test for being "done" is:

   - The number of records in the copied tables matches AND the publisher's
   current LSN matches the subscriber's pg_stat_subscription.latest_end_lsn.

Plus or minus the bit about replication lag, is there a better way?
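
(For concreteness, the checks I am experimenting with boil down to something
like this sketch, on top of the per-table record counts:)

-- On the subscriber: has every table in the subscription reached 'r' (ready)?
SELECT count(*) FILTER (WHERE srsubstate <> 'r') AS tables_still_syncing
  FROM pg_subscription_rel;

-- On the subscriber: the last LSN applied from the publisher.
SELECT subname, latest_end_lsn
  FROM pg_stat_subscription;

-- On the publisher: the position to compare against.
SELECT pg_current_wal_lsn();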

Thanks, Shaheed


Re: Is it possible to index "deep" into a JSONB column?

2022-05-31 Thread Shaheed Haque
I forgot to say...

I see the documentation on jsonpath indexing says:

"GIN index extracts statements of following form out of jsonpath:
accessors_chain = const. Accessors chain may consist of .key, [*], and
[index] accessors. jsonb_ops additionally supports .* and .**
accessors"

But I'm unable to make "CREATE INDEX...USING gin ((snapshot ->
'$.employee.* ? (@.pay_graph <> 0 || @.last_run_of_employment == true
|| @.state.employment[last][2] == 0)'))" trigger that.
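
Based on the quoted docs, the @? and @@ operators themselves can be served by
a GIN index over the whole column, so the next thing I plan to try is the
following (a sketch, not yet tested; note that, per the docs above, only
equality comparisons inside the path are extracted, so this particular
predicate may still not benefit):

-- A whole-column GIN index; the jsonpath operators @? and @@ can use this,
-- whereas an expression index keyed on a jsonpath string cannot.
CREATE INDEX idx_snapshot_paths ON paiyroll_payrun
    USING gin (snapshot jsonb_path_ops);

-- e.g. ... WHERE snapshot @? '$.employee."2209" ? (@.pay_graph <> 0)'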

On Tue, 31 May 2022 at 10:16, Shaheed Haque  wrote:
>
> OK, I was able to translate your excellent note into this:
>
> CREATE INDEX foo ON paiyroll_payrun USING gin ((snapshot ->
> '$.employee.*.works_id'));
>
> and query using "@>" to see it in use:
>
>SELECT ... AND (snapshot -> '$.employee.*.works_id' @> '1091')...
>
>EXPLAIN ANALYSE...
> ->  Bitmap Index Scan on foo  (cost=0.00..8.22 rows=29
> width=0) (actual time=0.005..0.005 rows=0 loops=1)
>   Index Cond: ((snapshot -> '$.employee.*.works_id'::text)
> @> '1091'::jsonb)
>
> Unfortunately, my real query involved multiple AND clauses which (as
> per another recent thread) seems to end up fetching the large
> (probably TOASTed) JSONB once for each AND clause at a cost of 150ms
> each. So, I got rid of the multiple ANDs by staying inside the
> jsonpath like this:
>
> SELECT ... WHERE (... AND
> (snapshot @? '$.employee."2209" ? (
> @.pay_graph <> 0 || @.last_run_of_employment == true ||
> @.state.employment[last][2] == 0)'
> )
> );
>
> But I have not been able to find an index formulation the new jsonpath
> can use. I tried adding
>
> CREATE INDEX ... USING gin ((snapshot -> '$.employee.*'));
> CREATE INDEX ... USING gin ((snapshot -> '$.employee'));
> CREATE INDEX bar ON paiyroll_payrun USING gin ((snapshot ->
> '$.employee.*.pay_graph'));
>
> Any thoughts on indexing for this case? If it makes any difference,
> I'm presently on PG12 and PG13 but looking to move to PG14.
>
> Thanks, Shaheed
>
> On Mon, 30 May 2022 at 19:59, Shaheed Haque  wrote:
> >
> > Imre,
> >
> > I'm gradually working my way into the combination of SQL, JSONB and
> > jsonpath that this involves even without the indexing, but this looks
> > very helpful/promising, especially on the indexing. Thanks a lot for
> > the input,
> >
> > Shaheed
> >
> > On Mon, 30 May 2022 at 11:50, Imre Samu  wrote:
> > >
> > > > In principle, I believe this allows index-assisted access to keys and
> > > > values nested in arrays and inner objects but in practice, it seems
> > > > the planner "often" decides to ignore the index in favour of a table 
> > > > scan.
> > >
> > > part II.  index usage ;  see  the "Bitmap Index Scan on jpqarr_idx"
> > >
> > > SET enable_seqscan = OFF;
> > > select * from jsonb_table
> > >   where jsonb_path_query_array( jsonb_col, '$.employee.*.date_of_birth') 
> > > @> to_jsonb('1971-02-02'::TEXT);
> > > ;
> > > ++---+
> > > | id |   jsonb_col
> > >|
> > > ++---+
> > > |  2 | {"stuff": {}, "employee": {"7011": {"date_of_birth": 
> > > "1970-01-01"}, "7122": {"date_of_birth": "1971-02-02"}}} |
> > > ++---+
> > > (1 row)
> > >
> > >
> > > EXPLAIN ANALYZE select * from jsonb_table
> > >   where jsonb_path_query_array( jsonb_col, '$.employee.*.date_of_birth') 
> > > @> to_jsonb('1971-02-02'::TEXT);
> > > ;
> > > +-+
> > > |   
> > > QUERY PLAN
> > > |
> > > +-+
> > > | Bitmap Heap Scan on js

Re: Is it possible to index "deep" into a JSONB column?

2022-05-31 Thread Shaheed Haque
OK, I was able to translate your excellent note into this:

CREATE INDEX foo ON paiyroll_payrun USING gin ((snapshot ->
'$.employee.*.works_id'));

and query using "@>" to see it in use:

   SELECT ... AND (snapshot -> '$.employee.*.works_id' @> '1091')...

   EXPLAIN ANALYSE...
->  Bitmap Index Scan on foo  (cost=0.00..8.22 rows=29
width=0) (actual time=0.005..0.005 rows=0 loops=1)
  Index Cond: ((snapshot -> '$.employee.*.works_id'::text)
@> '1091'::jsonb)

Unfortunately, my real query involved multiple AND clauses which (as
per another recent thread) seems to end up fetching the large
(probably TOASTed) JSONB once for each AND clause at a cost of 150ms
each. So, I got rid of the multiple ANDs by staying inside the
jsonpath like this:

SELECT ... WHERE (... AND
(snapshot @? '$.employee."2209" ? (
@.pay_graph <> 0 || @.last_run_of_employment == true ||
@.state.employment[last][2] == 0)'
)
);

But I have not been able to find an index formulation the new jsonpath
can use. I tried adding

CREATE INDEX ... USING gin ((snapshot -> '$.employee.*'));
CREATE INDEX ... USING gin ((snapshot -> '$.employee'));
CREATE INDEX bar ON paiyroll_payrun USING gin ((snapshot ->
'$.employee.*.pay_graph'));

Any thoughts on indexing for this case? If it makes any difference,
I'm presently on PG12 and PG13 but looking to move to PG14.

Thanks, Shaheed

On Mon, 30 May 2022 at 19:59, Shaheed Haque  wrote:
>
> Imre,
>
> I'm gradually working my way into the combination of SQL, JSONB and
> jsonpath that this involves even without the indexing, but this looks
> very helpful/promising, especially on the indexing. Thanks a lot for
> the input,
>
> Shaheed
>
> On Mon, 30 May 2022 at 11:50, Imre Samu  wrote:
> >
> > > In principle, I believe this allows index-assisted access to keys and
> > > values nested in arrays and inner objects but in practice, it seems
> > > the planner "often" decides to ignore the index in favour of a table scan.
> >
> > part II.  index usage ;  see  the "Bitmap Index Scan on jpqarr_idx"
> >
> > SET enable_seqscan = OFF;
> > select * from jsonb_table
> >   where jsonb_path_query_array( jsonb_col, '$.employee.*.date_of_birth') @> 
> > to_jsonb('1971-02-02'::TEXT);
> > ;
> > ++---+
> > | id |   jsonb_col  
> >  |
> > ++---+
> > |  2 | {"stuff": {}, "employee": {"7011": {"date_of_birth": "1970-01-01"}, 
> > "7122": {"date_of_birth": "1971-02-02"}}} |
> > ++---+
> > (1 row)
> >
> >
> > EXPLAIN ANALYZE select * from jsonb_table
> >   where jsonb_path_query_array( jsonb_col, '$.employee.*.date_of_birth') @> 
> > to_jsonb('1971-02-02'::TEXT);
> > ;
> > +-+
> > |   
> > QUERY PLAN  
> >   |
> > +-+
> > | Bitmap Heap Scan on jsonb_table  (cost=3.00..4.52 rows=1 width=36) 
> > (actual time=0.056..0.059 rows=1 loops=1)   
> >  |
> > |   Recheck Cond: (jsonb_path_query_array(jsonb_col, 
> > '$."employee".*."date_of_birth"'::jsonpath, '{}'::jsonb, false) @> 
> > to_jsonb('1971-02-02'::text)) |
> > |   Heap Blocks: exact=1
> > 
> >   |
> > |   ->  Bitmap Index Scan on jpqarr_idx  (cost=0.00..3.00 rows=1 width=0) 
> > (actual time=0.026..0.027 rows=1 loops=1)   
> > |
> > | Index Cond: (jsonb_path_query_array(jsonb_col, 
> > '$."employee".*."date_of_birth"'::jsonpath, '{}'::jsonb, false) @> 
> > to_jsonb('1971-02-02'::text)) |
> > | Planning Time: 0.255 ms   

Re: Is it possible to index "deep" into a JSONB column?

2022-05-30 Thread Shaheed Haque
Imre,

I'm gradually working my way into the combination of SQL, JSONB and
jsonpath that this involves even without the indexing, but this looks
very helpful/promising, especially on the indexing. Thanks a lot for
the input,

Shaheed

On Mon, 30 May 2022 at 11:50, Imre Samu  wrote:
>
> > In principle, I believe this allows index-assisted access to keys and
> > values nested in arrays and inner objects but in practice, it seems
> > the planner "often" decides to ignore the index in favour of a table scan.
>
> part II.  index usage ;  see  the "Bitmap Index Scan on jpqarr_idx"
>
> SET enable_seqscan = OFF;
> select * from jsonb_table
>   where jsonb_path_query_array( jsonb_col, '$.employee.*.date_of_birth') @> 
> to_jsonb('1971-02-02'::TEXT);
> ;
> ++---+
> | id |   jsonb_col
>|
> ++---+
> |  2 | {"stuff": {}, "employee": {"7011": {"date_of_birth": "1970-01-01"}, 
> "7122": {"date_of_birth": "1971-02-02"}}} |
> ++---+
> (1 row)
>
>
> EXPLAIN ANALYZE select * from jsonb_table
>   where jsonb_path_query_array( jsonb_col, '$.employee.*.date_of_birth') @> 
> to_jsonb('1971-02-02'::TEXT);
> ;
> +-+
> |   QUERY 
> PLAN|
> +-+
> | Bitmap Heap Scan on jsonb_table  (cost=3.00..4.52 rows=1 width=36) (actual 
> time=0.056..0.059 rows=1 loops=1)|
> |   Recheck Cond: (jsonb_path_query_array(jsonb_col, 
> '$."employee".*."date_of_birth"'::jsonpath, '{}'::jsonb, false) @> 
> to_jsonb('1971-02-02'::text)) |
> |   Heap Blocks: exact=1  
> |
> |   ->  Bitmap Index Scan on jpqarr_idx  (cost=0.00..3.00 rows=1 width=0) 
> (actual time=0.026..0.027 rows=1 loops=1) 
>   |
> | Index Cond: (jsonb_path_query_array(jsonb_col, 
> '$."employee".*."date_of_birth"'::jsonpath, '{}'::jsonb, false) @> 
> to_jsonb('1971-02-02'::text)) |
> | Planning Time: 0.255 ms 
> |
> | Execution Time: 0.122 ms
> |
> +-+
> (7 rows)
>
> regards,
>   Imre
>
>
> Imre Samu wrote (on Mon, 30 May 2022 at 12:30):
>>
>> Hi Shaheed,
>>
>> > WHAT GOES HERE
>>
>> imho check the:   jsonb_path_query_array( jsonb_col, 
>> '$.employee.*.date_of_birth' )
>>
>> may example:
>>
>> CREATE TABLE jsonb_table (
>> id serial primary key,
>> jsonb_col JSONB
>> );
>>
>> INSERT INTO jsonb_table(jsonb_col)
>> VALUES
>>   ('{"stuff": {},"employee": {"8011":  {"date_of_birth": 
>> "1980-01-01"},"8222": {"date_of_birth": "1982-02-02"}}}'),
>>   ('{"stuff": {},"employee": {"7011":  {"date_of_birth": 
>> "1970-01-01"},"7122": {"date_of_birth": "1971-02-02"}}}'),
>>   ('{"stuff": {},"employee": {"a12":   {"date_of_birth": 
>> "2000-01-01"},"b56":  {"date_of_birth": "2000-02-02"}}}')
>> ;
>>
>> select jsonb_path_query_array( jsonb_col, '$.employee.*.date_of_birth' )  
>> from jsonb_table;
>>
>> -- create index
>> create index jpqarr_idx

Re: Is it possible to index "deep" into a JSONB column?

2022-05-30 Thread Shaheed Haque
Hi Bryn,

On Mon, 30 May 2022 at 03:12, Bryn Llewellyn  wrote:
...
>
> Try this:
>
>snapshot -> ‘employee’->>’date_of_birth’
>

Syntactically, that works:

create index bryn on paiyroll_payrun using btree ((snapshot ->
'employee'->>'date_of_birth'));

But IIUC it is looking for 'date_of_birth' at the wrong level. It
would need to do something like this:

create index bryn on paiyroll_payrun using btree ((snapshot ->
'employee'->'2209'->>'date_of_birth'));

To paraphrase, my question is about how to replace the '2209' with all
possible object keys. For the employee-is-an-array, the documentation
says this should work:

create index bryn2 on paiyroll_payrun using btree ((snapshot ->
'employee[*]'->>'date_of_birth'));

but is silent on the employee-is-an-object case. (As I said, in case
it helps, in my case the keys are in fact stringified numbers).

Thanks, Shaheed




Is it possible to index "deep" into a JSONB column?

2022-05-29 Thread Shaheed Haque
Suppose I have a JSONB field called "snapshot". I can create a GIN
index on it like this:

  create index idx1 on mytable using gin (snapshot);

In principle, I believe this allows index-assisted access to keys and
values nested in arrays and inner objects but in practice, it seems
the planner "often" decides to ignore the index in favour of a table
scan. (As discussed elsewhere, this is influenced by the number of
rows, and possibly other criteria too).

Now, I know it is possible to index inner objects, so if snapshot
looks like this:

{
"stuff": {},
"more other stuff": {},
"employee": {
 "1234": {"date_of_birth": "1970-01-01"},
 "56B789": {"date_of_birth": "1971-02-02"},
}
}

I can say:

  create index idx2 on mytable using gin ((snapshot -> 'employee'));

But what is the syntax to index only on date_of_birth? I assume a
btree would work since it is a primitive value, but WHAT GOES HERE in
this:

  create index idx3 on mytable using btree ((snapshot ->'employee' ->
WHAT GOES HERE -> 'date_of_birth'));

I believe an asterisk "*" would work if 'employee' was an array, but
here it is a nested object with keys. If it helps, the keys are
invariably numbers (in quoted string form, as per JSON).

Thanks, Shaheed




Re: JSONB index not in use, but is TOAST the real cause of slow query?

2022-05-29 Thread Shaheed Haque
On Sun, 29 May 2022, 15:58 Tom Lane,  wrote:

> Shaheed Haque  writes:
> > Unfortunately, the real query which I think should behave very
> > similarly is still at the several-seconds level despite using the
> > index. ...
>
> > -> Bitmap Heap Scan on paiyroll_payrun (cost=26.88..30.91 rows=1
> > width=4) (actual time=32.488..2258.891 rows=62 loops=1)
> >   Recheck Cond: ((company_id = 173) AND ((snapshot ->
> > 'employee'::text) ? '16376'::text))
> >   Filter: (((snapshot #>
> > '{employee,16376,last_run_of_employment}'::text[]) <> 'true'::jsonb)
> > OR ((snapshot #> '{employee,16376,pay_graph}'::text[]) <> '0'::jsonb)
> > OR ((snapshot #> '{employee,16376,state,employment,-1,2}'::text[]) <=
> > '0'::jsonb))
> >   Heap Blocks: exact=5
> > -> BitmapAnd (cost=26.88..26.88 rows=1 width=0) (actual
> > time=0.038..0.039 rows=0 loops=1)
> >   -> Bitmap Index Scan on paiyroll_payrun_company_id_ce341888
> > (cost=0.00..6.56 rows=304 width=0) (actual time=0.016..0.016 rows=304
> > loops=1)
> > Index Cond: (company_id = 173)
> >   -> Bitmap Index Scan on idx1 (cost=0.00..20.07 rows=9
> > width=0) (actual time=0.021..0.021 rows=62 loops=1)
> > Index Cond: ((snapshot -> 'employee'::text) ?
> '16376'::text)
>
> > IIUC, at the bottom, the indices are doing their thing, but a couple
> > of layers up, the "Bitmap Heap Scan" jumps from ~30ms to 2200ms. But I
> > cannot quite see why.
>
> I suppose it's the execution of that "Filter" condition, which will
> require perhaps as many as three fetches of the "snapshot" column.
>

Thanks, that's clearly in the frame.

You really need to rethink that data structure.  Sure, you can store tons
> of unorganized data in a jsonb column, but you pay for that convenience
> with slow access.  Normalizing the bits you need frequently into a more
> traditional relational schema is the route to better-performing queries.
>

Ack. Indeed, the current design works very well for all of the access
patterns other than this one, which only recently came into view as a
problem.

Ahead of contemplating a design change, I have been looking at how to
optimise this bit. I'm currently mired in a crash course on SQL syntax as it
pertains to JSONB, jsonpath et al., and the equally mysterious side effects
of "?" and "@>" and so on in terms of the amount of data being fetched etc.
(and all wrapped in a dose of ORM for good measure).

I'll write separately with more specific questions if needed on those
details.

Thanks again for the kind help.

Shaheed


> regards, tom lane
>


Re: JSONB index not in use, but is TOAST the real cause of slow query?

2022-05-28 Thread Shaheed Haque
Adrian, Tom, thanks for the input. Based on that, it occurred to me to
create some "dummy" rows and - almost magically - the index kicks in!
Before is 2500ms:


Seq Scan on paiyroll_payrun  (cost=0.00..52.43 rows=17 width=32)
(actual time=53.127..2567.024 rows=104 loops=1)
  Filter: ((snapshot -> 'employee'::text) ? '2209'::text)
  Rows Removed by Filter: 1835
Planning Time: 0.060 ms
Execution Time: 2567.044 ms
(5 rows)


After is 300ms:


Bitmap Heap Scan on paiyroll_payrun  (cost=36.11..64.67 rows=14
width=32) (actual time=4.189..311.932 rows=104 loops=1)
  Recheck Cond: ((snapshot -> 'employee'::text) ? '2209'::text)
  Rows Removed by Index Recheck: 1
  Heap Blocks: exact=8
  ->  Bitmap Index Scan on idx1  (cost=0.00..36.10 rows=14 width=0)
(actual time=0.087..0.087 rows=105 loops=1)
Index Cond: ((snapshot -> 'employee'::text) ? '2209'::text)
Planning Time: 0.167 ms
Execution Time: 311.962 ms
(8 rows)


Woot!

Unfortunately, the real query which I think should behave very
similarly is still at the several-seconds level despite using the
index. Before 3600ms:


SELECT "paiyroll_payrun"."actual_t"
  FROM "paiyroll_payrun"
 WHERE ("paiyroll_payrun"."company_id" = 173 AND
("paiyroll_payrun"."snapshot" -> 'employee') ? '16376'
   AND NOT (("paiyroll_payrun"."snapshot" #>
ARRAY['employee','16376','last_run_of_employment']) = 'true'
   AND ("paiyroll_payrun"."snapshot" #>
ARRAY['employee','16376','pay_graph']) = '0'
   AND ("paiyroll_payrun"."snapshot" #>
ARRAY['employee','16376','state','employment','-1','2']) > '0'))
 ORDER BY "paiyroll_payrun"."actual_t" DESC
 LIMIT 1



Limit (cost=31.33..31.33 rows=1 width=4) (actual
time=3595.174..3595.176 rows=1 loops=1)
  -> Sort (cost=31.33..31.33 rows=3 width=4) (actual
time=3595.174..3595.174 rows=1 loops=1)
Sort Key: actual_t DESC
Sort Method: top-N heapsort Memory: 25kB
-> Bitmap Heap Scan on paiyroll_payrun (cost=6.43..31.31 rows=3
width=4) (actual time=44.575..3595.082 rows=62 loops=1)
   Recheck Cond: (company_id = 173)
   Filter: (((snapshot -> 'employee'::text) ? '16376'::text) AND
(((snapshot #> '{employee,16376,last_run_of_employment}'::text[]) <>
'true'::jsonb) OR ((snapshot #> '{employee,16376,pay_graph}'::text[])
<> '0'::jsonb) OR ((snapshot #>
'{employee,16376,state,employment,-1,2}'::text[]) <= '0'::jsonb)))
Rows Removed by Filter: 242
   Heap Blocks: exact=9
 -> Bitmap Index Scan on paiyroll_payrun_company_id_ce341888
(cost=0.00..6.43 rows=304 width=0) (actual time=0.013..0.013 rows=304
loops=1)
 Index Cond: (company_id = 173)
Planning Time: 0.258 ms
Execution Time: 3595.195 ms


After, with the index in use, 2200ms:


Limit (cost=30.92..30.93 rows=1 width=4) (actual
time=2258.989..2258.990 rows=1 loops=1)
  -> Sort (cost=30.92..30.93 rows=1 width=4) (actual
time=2258.988..2258.989 rows=1 loops=1)
Sort Key: actual_t DESC
Sort Method: top-N heapsort Memory: 25kB
-> Bitmap Heap Scan on paiyroll_payrun (cost=26.88..30.91 rows=1
width=4) (actual time=32.488..2258.891 rows=62 loops=1)
  Recheck Cond: ((company_id = 173) AND ((snapshot ->
'employee'::text) ? '16376'::text))
  Filter: (((snapshot #>
'{employee,16376,last_run_of_employment}'::text[]) <> 'true'::jsonb)
OR ((snapshot #> '{employee,16376,pay_graph}'::text[]) <> '0'::jsonb)
OR ((snapshot #> '{employee,16376,state,employment,-1,2}'::text[]) <=
'0'::jsonb))
  Heap Blocks: exact=5
-> BitmapAnd (cost=26.88..26.88 rows=1 width=0) (actual
time=0.038..0.039 rows=0 loops=1)
  -> Bitmap Index Scan on paiyroll_payrun_company_id_ce341888
(cost=0.00..6.56 rows=304 width=0) (actual time=0.016..0.016 rows=304
loops=1)
Index Cond: (company_id = 173)
  -> Bitmap Index Scan on idx1 (cost=0.00..20.07 rows=9
width=0) (actual time=0.021..0.021 rows=62 loops=1)
Index Cond: ((snapshot -> 'employee'::text) ? '16376'::text)
Planning Time: 0.245 ms
Execution Time: 2259.019 ms
===

IIUC, at the bottom, the indices are doing their thing, but a couple
of layers up, the "Bitmap Heap Scan" jumps from ~30ms to 2200ms. But I
cannot quite see why. Have I missed a needed index or what? I'm pretty
much a novice at SQL, so it is quite possible I've missed something
very basic.

Thanks, Shaheed

On Sat, 28 May 2022 at 20:59, Tom Lane  wrote:
>
> Shaheed Haque  writes:
> > One last thought about TOAST. If the cost of the -> retrieving the
> > data cannot be obviated, is there any way to tweak how that works?
>
> The only thing that's readily tweakable is to disable data compression

Re: JSONB index not in use, but is TOAST the real cause of slow query?

2022-05-28 Thread Shaheed Haque
Tom,

Thanks for the considered advice and insights. My takeaway is that,
based on what I've said, you are mostly unsurprised by the results I
see. In the longer term, the number of rows will increase, but I will
have to ponder options for the immediate future. I'll have a play with
the knobs you suggested and will report back with anything of note.

One last thought about TOAST. If the cost of the -> retrieving the
data cannot be obviated, is there any way to tweak how that works?
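
(The only candidate knob I have spotted so far is the per-column storage
setting; a sketch I have not benchmarked, so corrections welcome:)

-- Keep the JSONB out-of-line but uncompressed, trading disk space for cheaper
-- detoasting. Only newly written values are affected; existing rows keep
-- their current storage until they are rewritten.
ALTER TABLE paiyroll_payrun ALTER COLUMN snapshot SET STORAGE EXTERNAL;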

Thanks, Shaheed

On Sat, 28 May 2022 at 19:41, Tom Lane  wrote:
>
> Shaheed Haque  writes:
> > ==
> > foo =# explain analyse SELECT snapshot ->'company'->'legal_name' FROM
> > paiyroll_payrun WHERE snapshot ->'employee' ? '2209';
> >   QUERY PLAN
> > -
> > Seq Scan on paiyroll_payrun  (cost=0.00..29.13 rows=9 width=32)
> > (actual time=50.185..2520.983 rows=104 loops=1)
> >   Filter: ((snapshot -> 'employee'::text) ? '2209'::text)
> >   Rows Removed by Filter: 835
> > Planning Time: 0.075 ms
> > Execution Time: 2521.004 ms
> > (5 rows)
> > ==
>
> > So, over 2.5 seconds to sequentially scan ~850 rows. Am I right to
> > presume the INDEX is not used because of the number of rows? Is there
> > a way to verify that?
>
> You could do "set enable_seqscan = off" and see if the EXPLAIN
> results change.  My guess is that you'll find that the indexscan
> alternative is costed at a bit more than 29.13 units and thus
> the planner thinks seqscan is cheaper.
>
> > And how can I understand the dreadful amount of
> > time (of course, this is just on my dev machine, but still...)?
>
> In the seqscan case, the -> operator is going to retrieve the whole
> JSONB value from each row, which of course is pretty darn expensive
> if it's a few megabytes.  Unfortunately the planner doesn't account
> for detoasting costs when making such estimates, so it doesn't
> realize that the seqscan case is going to be expensive.  (Fixing
> that has been on the to-do list for a long time, but we seldom
> see cases where it matters this much, so it hasn't gotten done.)
>
> The problem would likely go away by itself if your table had more
> than a few hundred rows, but if you don't anticipate that happening
> then you need some sort of band-aid.  I don't recommend turning
> enable_seqscan off as a production fix; it'd likely have negative
> effects on other queries.  Personally I'd experiment with reducing
> random_page_cost a bit to see if I could encourage use of the index
> that way.  The default value of 4.0 is tuned for spinning-rust
> storage and is not too appropriate for a lot of modern hardware,
> so there's probably room to fix it that way without detuning your
> setup for other queries.
>
> You should probably also rethink whether you really want to store
> your data in this format, because anything at all that you do with
> that big JSONB column is going to be expensive.  (Another thing
> that's been on the to-do list for awhile is enabling partial
> retrieval of large JSONB values, but AFAIK that hasn't happened
> yet either.)
>
> regards, tom lane




JSONB index not in use, but is TOAST the real cause of slow query?

2022-05-28 Thread Shaheed Haque
Hi,

I have a database table with a modest number of rows (<1000) but where
one column in the table is a JSONB "snapshot" which can be a few MB in
size. Generally, this is a great fit for the read-write access
patterns involved, but there is a read-query which is VERY slow.

I've searched via Google and in this mailing list archive and found
lots of material, but nothing that explains what is going on...or how
to fix it. The table columns look like this:

...a primary key...
...various other keys including a couple of FKs...
snapshot JSONB

and before I did anything, the indices looked like this:

   "paiyroll_payrun_pkey" PRIMARY KEY, btree (process_ptr_id)
   "paiyroll_payrun_company_id_ce341888" btree (company_id)
   "paiyroll_payrun_schedule_id_1593f55f" btree (schedule_id)

The data in one row's "snapshot" looks a bit like this:

{
"stuff": {},
"more other stuff": {},
"employee": {
 "1234": {},
 "56789": {},
}
}

The query that is slow can be approximated like this:

   SELECT snapshot ->'stuff'->'item' FROM paiyroll_payrun WHERE
snapshot ->'employee' ? '2209';

When I add this index:

CREATE INDEX idx1 ON paiyroll_payrun USING gin ((snapshot ->'employee'));

the analyser says this:

==
foo =# explain analyse SELECT snapshot ->'company'->'legal_name' FROM
paiyroll_payrun WHERE snapshot ->'employee' ? '2209';
  QUERY PLAN
-
Seq Scan on paiyroll_payrun  (cost=0.00..29.13 rows=9 width=32)
(actual time=50.185..2520.983 rows=104 loops=1)
  Filter: ((snapshot -> 'employee'::text) ? '2209'::text)
  Rows Removed by Filter: 835
Planning Time: 0.075 ms
Execution Time: 2521.004 ms
(5 rows)
==

So, over 2.5 seconds to sequentially scan ~850 rows. Am I right to
presume the INDEX is not used because of the number of rows? Is there
a way to verify that? And how can I understand the dreadful amount of
time (of course, this is just on my dev machine, but still...)? Is
there a way to see/tweak what TOAST costs or indeed to confirm if it
is even in use?
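
(For the TOAST part of the question, one way to at least see the sizes
involved might be the following sketch:)

-- Heap size versus table size including TOAST.
SELECT pg_size_pretty(pg_relation_size('paiyroll_payrun')) AS heap_only,
       pg_size_pretty(pg_table_size('paiyroll_payrun'))    AS heap_plus_toast;

-- On-disk (post-compression) size of the JSONB column for a few rows.
SELECT pg_column_size(snapshot) FROM paiyroll_payrun LIMIT 5;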

Any help appreciated.

Thanks, Shaheed




Re: Using JSONB with nested key-value structures, and skipping/wildcarding the key

2020-06-02 Thread Shaheed Haque
Hi,

On Mon, 1 Jun 2020 at 23:50, Alban Hertroys  wrote:


> > On 1 Jun 2020, at 20:18, Shaheed Haque  wrote:
> >
> > Hi,
> >
> > I'm using Django's ORM to access Postgres12. My "MyModel" table has a
> JSONB column called 'snapshot'. In Python terms, each row's 'snapshot'
> looks like this:
> >
> > ==
> > snapshot = {
> > 'pay_definition' : {
> > '1234': {..., 'name': 'foo', ...},
> > '99': {..., 'name': 'bar', ...},
> > }
> > ==
> >
> > I'd like to find all unique values of 'name' in all rows of MyModel. I
> have this working using native JSON functions from the ORM like this:
> >
> > =
> > class PayDef(Func):
> > function='to_jsonb'
> >
>  
> template="%(function)s(row_to_json(jsonb_each(%(expressions)s->'pay_definition'))->'value'->'name')"
> >
> >
> MyModel.objects.annotate(paydef=PayDef(F('snapshot'))).order_by().distinct('paydef').values_list('paydef',
> flat=True)
> > =
> >
> > So, skipping the ordering/distinct/ORM parts, the core looks like this:
> >
> >
> to_jsonb(row_to_json(jsonb_each('snapshot'->'pay_definition'))->'value'->'name’)
>
>
> I do something like this to get a set of sub-paths in a JSONB field (no
> idea how to write that in Django):
>
> select snapshot->’pay_definition’->k.value->’name’
>   from MyModel
>   join lateral jsonb_object_keys(snapshot->’pay_definition’) k(value) on
> true
>
>
I was unaware of the LATERAL keyword, so thanks. After a bit of Googling
however, it seems that it is tricky/impossible to use from the ORM (barring
a full scale escape to a "raw" query). One question: as a novice here, I
think I understand the right hand side of your JOIN "... k(value)" is
shorthand for:

... AS table_name(column_name)

except that I don't see any clues in the docs that jsonb_object_keys() is a
"table function". Can you kindly clarify?

> I don’t know how that compares performance-wise to using jsonb_each, but
> perhaps worth a try. Obviously, the way it’s written above it doesn’t
> return distinct values of ’name’ yet, but that’s fairly easy to remedy.
>
Indeed; this is what I managed to get to:

SELECT DISTINCT snapshot -> 'pay_definition' -> k.value -> 'name' AS name
FROM paiyroll_payrun
 JOIN LATERAL jsonb_object_keys(snapshot -> 'pay_definition')
AS k(value) ON true
ORDER BY name;

At any rate, I'll have to ponder the "raw" route absent some way to "JOIN
LATERAL".

Thanks, Shaheed


> Alban Hertroys
> --
> If you can't see the forest for the trees,
> cut the trees and you'll find there is no forest.
>
>
>


Using JSONB with nested key-value structures, and skipping/wildcarding the key

2020-06-01 Thread Shaheed Haque
Hi,

I'm using Django's ORM to access Postgres12. My "MyModel" table has a JSONB
column called 'snapshot'. In Python terms, each row's 'snapshot' looks like
this:

==
snapshot = {
'pay_definition' : {
'1234': {..., 'name': 'foo', ...},
'99': {..., 'name': 'bar', ...},
}
==

I'd like to find all unique values of 'name' in all rows of MyModel. I have
this working using native JSON functions from the ORM like this:

=
class PayDef(Func):
function='to_jsonb'

template="%(function)s(row_to_json(jsonb_each(%(expressions)s->'pay_definition'))->'value'->'name')"

MyModel.objects.annotate(paydef=PayDef(F('snapshot'))).order_by().distinct('paydef').values_list('paydef',
flat=True)
=

So, skipping the ordering/distinct/ORM parts, the core looks like this:

to_jsonb(row_to_json(jsonb_each('snapshot'->'pay_definition'))->'value'->'name')

My question is: is this the best way to solve this problem? The way my
current logic works, reading from the inside out, is I think:

   1. Pass in the 'snapshot'.
   2. Since 'snapshot' is a JSON field, "->'pay_definition'" traverses this
   key.
   3. To skip the unknown numeric keys, "jsonb_each()" turns each key,
   value pair into an inner row like ['1234', {...}].
   4. To get to the value column of the inner row "row_to_json()->'value'".
   5. To get the name field's value "->'name'".
   6. A final call to "to_jsonb" in the PayDefs class. This bit is clearly
   Django-specific.

For example, I think the pair of calls row_to_json(jsonb_each()) is needed
because there is no jsonb_object_values() to complement
jsonb_object_keys(). Likewise, since all I care about is the string value
of 'name', is there a way to get rid of the PayDefs class, and its
invocation of to_jsonb (this is probably Django-specific)?
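
For concreteness, the kind of simplification I am hoping exists would look
roughly like this (an untested sketch, written against the underlying table
rather than the ORM, with "mymodel" as a placeholder name):

-- jsonb_each() already returns (key, value) pairs, so the value column can be
-- used directly instead of going through row_to_json().
SELECT DISTINCT d.value -> 'name' AS name
  FROM mymodel,
       LATERAL jsonb_each(snapshot -> 'pay_definition') AS d(key, value);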

To provide context on what "better" might be:

   - Snapshot JSONs might easily be 20MB in size.
   - Each 'pay_definition' is probably only about 1kB in size, and there
   might be 50 of them in a snapshot.
   - There might be 1000 MyModel instances in a given query.
   - I'm using PostgreSQL 12

so my concern is to not have the database server or Django perform extraneous
work converting between strings and JSON, for example.

Thanks, Shaheed

P.S. I posted a Django-centric version of this to the relevant mailing list
but got no replies; nevertheless, apologies for the cross post.