Re: Difference in the tablespace folders on primary and secondary nodes

2023-07-25 Thread Abhishek Bhola
Hi Stephen

I got the latest documentation and understood that I was supposed to run
`select * from pg_backup_stop();` and store the labelfile output to a file
in the data directory of the secondary node.

I also understood that this low-level API backup is not really
a recommended way and that I need to move to another method.
However, till I fully set up a tool like pgBackRest (of which you are one
of the contributors - very impressive!) for PROD, can you please tell me:
1) Are there any other options that are safe yet fast? As you said,
`rsync --size-only` would miss a lot of changes, but is there anything that
would also avoid re-syncing every data file whose timestamp differs even
though its content, including the checksum, is identical? (A sketch of what
I have in mind follows these questions.)
2) While rsyncing from the master node, do I need to exclude the `pg_wal`
folder?
3) The replica/standby node should have access to the WAL segments archived
between the start and the stop of the backup, right?
4) What kind of data corruption would have been introduced by the absence
of the labelfile, and is there a way to identify that corruption? For
example, if I am able to reindex all the tables, select all the data and
vacuum the DB, do I still need to pg_dump and pg_restore those DBs?
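
To make sure I have the overall flow right, below is a rough sketch of what
I plan to do until the proper tool is in place. The hostname `standby_node`,
the paths, and the rsync options are only my assumptions from this thread,
so please correct me if I have it wrong:

```
# One psql session on the master: in PG 15's non-exclusive mode the backup
# state lives in the session, so pg_backup_start() and pg_backup_stop()
# must run in the same connection.
psql -X -At <<'EOF'
select pg_backup_start('backup');
-- \! escapes to a shell while the backup session stays open; --checksum
-- compares file contents instead of size+mtime, and pg_wal/pg_replslot
-- are excluded (the standby gets its WAL from the archive instead).
-- The tablespace directories would need the same treatment.
\! rsync -a --delete --checksum --exclude 'pg_wal' --exclude 'pg_replslot' /PROD/datadg/data/ standby_node:/PROD/datadg/data/
-- capture only the labelfile column of pg_backup_stop() ...
\o /tmp/backup_label
select labelfile from pg_backup_stop();
\o
-- ... and place it in the standby's data directory before starting it
\! scp /tmp/backup_label standby_node:/PROD/datadg/data/backup_label
EOF
```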

Thanks

On Wed, Jul 26, 2023 at 11:00 AM Abhishek Bhola <
abhishek.bh...@japannext.co.jp> wrote:

> Hi Stephen
>
> Thank you for your reply.
>
> Basically, this is not a valid way to perform a backup/restore of PG.
>
> Is it not valid only for PG 15, or also for earlier versions? I have always
> referred to this: https://www.postgresql.org/docs/8.1/backup-online.html
>
>
>> better, use an existing well maintained backup/restore tool for this
>
> Is there any tool you could recommend to bring back the standby node after
> it has been down for a day or so for an OS/firmware upgrade?
> I have tried pgBackRest, but it requires a control node. So anything that
> doesn't involve a 3rd server for this situation would be helpful.
>
>> this also doesn't grab and restore the absolutely required
>> backup_label file that's returned from pg_backup_stop()
>>
> I tried running pg_backup_start('backup') and pg_backup_stop() on my DB,
> but I did not see any backup_label file being created.
> psql (15.1)
> Type "help" for help.
>
> postgres=# select pg_backup_start('backup');
>  pg_backup_start
> -----------------
>  68/3228
> (1 row)
>
> postgres=# select pg_backup_stop();
> NOTICE:  all required WAL segments have been archived
>                            pg_backup_stop
> ---------------------------------------------------------------------
>  (68/32000100,"START WAL LOCATION: 68/3228 (file 000100680032)+
>  CHECKPOINT LOCATION: 68/3260                                 +
>  BACKUP METHOD: streamed                                      +
>  BACKUP FROM: primary                                         +
>  START TIME: 2023-07-26 08:51:28 JST                          +
>  LABEL: backup                                                +
>  START TIMELINE: 1                                            +
>  ","16724 /PGDATA/datadg/tbs1                                 +
>  16725 /PGDATA/datadg/tbs2                                    +
>  ")
> (1 row)
>
> I read the documentation on this page
> https://www.postgresql.org/docs/current/functions-admin.html
>
>> The desired contents of the backup label file and the tablespace map file
>> are returned as part of the result of the function and must be written to
>> files in the backup area.
>
> I don't understand, "*must be written to files in the backup area*". Does
> it mean we need to manually create a file first on the master node and then
> rsync it to the backup node?
>
> Thanks
>
> On Tue, Jul 25, 2023 at 9:22 PM Stephen Frost  wrote:
>
>> Greetings,
>>
>> * Abhishek Bhola (abhishek.bh...@japannext.co.jp) wrote:
>> > I recently set up a Postgres15 master-slave (Primary-secondary) cluster
>> on
>> > 2 nodes. At the time of starting, I ensured that data files are exactly
>> the
>> > same on both the nodes. The size of the DB is 1.5TB.
>> > The directory structure on both nodes looks as follows:
>> > ```
>> >   - /PROD/datadg/
>> >|
>> >|-> /PROD/datadg/tablespace
>> >|   |-> /PROD/datadg/tablespace/tablespace1
>> >|   |-> /PROD/datadg/tablespace/tablespace2
>> >|
>> >|-> /PROD/datadg/data
>> >|   |-> /PROD/datadg/data/pg_tblspc
>> >|   |   | -> /PROD/datadg/data/pg_tblspc/16432 ->
>> > /PROD/datadg/tablespace/tablespace1
>> >|   |   | -> /PROD/datadg/data/pg_tblspc/16433 ->
>> > /PROD/datadg/tablespace/tablespace2
>> >
>> > ```
>> > Almost a week later now, I see almost a 2GB size difference in the
>> > tablespace folders on the 2 nodes. I also see some file count
>> difference on
>> > both the nodes.
>>
>> Not really enough to go on here.
>>
>> > `autovacuum` is on on both the nodes and there aren't any `idle in
>> > transaction` queries on the slave node. Also there hasn't been any
>> > disruption on the streaming replication. I did not get any error like
>> `WAL
>> > segment already removed` or so on the slave node. `pg_stat_repl

Re: Difference in the tablespace folders on primary and secondary nodes

2023-07-25 Thread Abhishek Bhola
Hi Stephen

Thank you for your reply.

Basically, this is not a valid way to perform a backup/restore of PG.

Is it not valid only for PG 15, or also for earlier versions? I have always
referred to this: https://www.postgresql.org/docs/8.1/backup-online.html


> better, use an existing well maintained backup/restore tool for this

Is there any tool you could recommend to bring back the standby node after
it has been down for a day or so for an OS/firmware upgrade?
I have tried pgBackRest, but it requires a control node. So anything that
doesn't involve a 3rd server for this situation would be helpful.

> this also doesn't grab and restore the absolutely required
> backup_label file that's returned from pg_backup_stop()
>
I tried running pg_backup_start('backup') and pg_backup_stop() on my DB,
but I did not see any backup_label file being created.
psql (15.1)
Type "help" for help.

postgres=# select pg_backup_start('backup');
 pg_backup_start
-----------------
 68/3228
(1 row)

postgres=# select pg_backup_stop();
NOTICE:  all required WAL segments have been archived
                           pg_backup_stop
---------------------------------------------------------------------
 (68/32000100,"START WAL LOCATION: 68/3228 (file 000100680032)+
 CHECKPOINT LOCATION: 68/3260                                 +
 BACKUP METHOD: streamed                                      +
 BACKUP FROM: primary                                         +
 START TIME: 2023-07-26 08:51:28 JST                          +
 LABEL: backup                                                +
 START TIMELINE: 1                                            +
 ","16724 /PGDATA/datadg/tbs1                                 +
 16725 /PGDATA/datadg/tbs2                                    +
 ")
(1 row)

I read the documentation on this page
https://www.postgresql.org/docs/current/functions-admin.html

> The desired contents of the backup label file and the tablespace map file
> are returned as part of the result of the function and must be written to
> files in the backup area.

I don't understand, "*must be written to files in the backup area*". Does
it mean we need to manually create a file first on the master node and then
rsync it to the backup node?
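
If I am reading that right, nothing is created automatically: in the same
session that ran pg_backup_start() (PG 15 keeps the backup state in the
session), something like this should capture that column (my guess, untested):

```
\t on
\a
\o /tmp/backup_label
select labelfile from pg_backup_stop();
\o
```

and the resulting file then has to be placed in the backup's data directory
(here, the standby's) as `backup_label` before it is started, rather than
being created on the master and rsynced with everything else.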

Thanks

On Tue, Jul 25, 2023 at 9:22 PM Stephen Frost  wrote:

> Greetings,
>
> * Abhishek Bhola (abhishek.bh...@japannext.co.jp) wrote:
> > I recently set up a Postgres15 master-slave (Primary-secondary) cluster
> on
> > 2 nodes. At the time of starting, I ensured that data files are exactly
> the
> > same on both the nodes. The size of the DB is 1.5TB.
> > The directory structure on both nodes looks as follows:
> > ```
> >   - /PROD/datadg/
> >|
> >|-> /PROD/datadg/tablespace
> >|   |-> /PROD/datadg/tablespace/tablespace1
> >|   |-> /PROD/datadg/tablespace/tablespace2
> >|
> >|-> /PROD/datadg/data
> >|   |-> /PROD/datadg/data/pg_tblspc
> >|   |   | -> /PROD/datadg/data/pg_tblspc/16432 ->
> > /PROD/datadg/tablespace/tablespace1
> >|   |   | -> /PROD/datadg/data/pg_tblspc/16433 ->
> > /PROD/datadg/tablespace/tablespace2
> >
> > ```
> > Almost a week later now, I see almost a 2GB size difference in the
> > tablespace folders on the 2 nodes. I also see some file count difference
> on
> > both the nodes.
>
> Not really enough to go on here.
>
> > `autovacuum` is on on both the nodes and there aren't any `idle in
> > transaction` queries on the slave node. Also there hasn't been any
> > disruption on the streaming replication. I did not get any error like
> `WAL
> > segment already removed` or so on the slave node. `pg_stat_replication`
> on
> > the master node also doesn't show anything out of the ordinary and the
> > `sent_lsn`, `write_lsn` and `flush_lsn` are regularly updated. I can not
> > see a difference in counts of most tables either, haven't verified for
> all
> > of them.
> > So my **first question** is:
> >
> > * Why is there a difference in the files in the tablespace folder? I can
> > understand the difference in the modification timestamps, but some files
> > are just missing on the slave node.
>
> Unlogged tables would certainly be a pretty easy explanation of size
> differences between the two.  There's also temporary files that might be
> created on one system but not the other for in-progress queries.  These
> aren't the only possibilities but just a couple of likely candidates.
>
> > Now if I were to run `vacuumdb` on the master node, there are chances
> that
> > the slave node will break and give an error like this
> > ```
> > PANIC,XX000,"WAL contains references to invalid pages","WAL redo at
> > 875E/21A70BD0 for
> >  Heap2/VISIBLE: cutoff xid 60350476 flags 0x01; blkref #0: rel
> > 16405/16419/533716933, fork 2, blk 26; blkref #1: rel
> > 16405/16419/533716933, blk 853758""","startup
> > ```
>
> Did you actually run vacuumdb and actually see this?  Are you able to
> reproduce it?
>
> > In the case when slave node breaks, these are the steps I usually do to
> > bring the slave node back:
> > 1) Start `pg_backup_start('backup')` on the master node
> > 2) rsync the files from master to slave by running the following on the
> > slave node:
> > ```
> > rsync -av --delete ma

Re: Nu-B here

2023-07-25 Thread Amn Ojee Uw

Please note the following:
psql (12.15 (Debian 12.15-1.pgdg120+1))
Thanks in advance

On 7/20/23 10:41 a.m., Adrian Klaver wrote:

On 7/19/23 23:32, Alban Hertroys wrote:



On 20 Jul 2023, at 02:36, Amn Ojee Uw  wrote:

After this command 'sudo -u postgres psql'
I get this message:
could not change directory to "/home/my_account": Permission denied


What’s the reason that you’re using the OS user postgres?


Because the Debian/Ubuntu packaging sets up pg_hba.conf to use peer 
auth for local connections and the only database user created is 
postgres.


See:

https://ubuntu.com/server/docs/databases-postgresql

for more information.
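
For example (assuming the stock Debian/Ubuntu setup; the "could not change
directory" message only means the postgres OS user cannot read your home
directory, and is harmless):

```
# run psql from a directory the postgres OS user can read
cd /tmp
sudo -u postgres psql
```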



If you’re simply trying to connect to the database named postgres as 
database user postgres, you can instead use the command 'psql -U 
postgres postgres’.





Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.





Re: Error 503 Backend fetch failed

2023-07-25 Thread Stephen Frost
Greetings,

* Sabri Taner Burak ALTINDAL (s...@protonmail.com) wrote:
> I am interning at a company. I was using Python to download your mail lists, 
> but it seems that there was an unwanted situation, and you have restricted 
> access to log in. However, I read that you are not against the storage of 
> emails on your website. Could you please explain why access was restricted? I 
> would like to assure you that I will be more careful. I hope you will 
> understand this negative situation with tolerance and kindly request you to 
> open access again.

Backend fetch failed is actually just an intermittent error that
sometimes happens when we're rebooting systems or networking devices.

Thanks,

Stephen




Error 503 Backend fetch failed

2023-07-25 Thread Sabri Taner Burak ALTINDAL
I am interning at a company. I was using Python to download your mail lists, 
but it seems that there was an unwanted situation, and you have restricted 
access to log in. However, I read that you are not against the storage of 
emails on your website. Could you please explain why access was restricted? I 
would like to assure you that I will be more careful. I hope you will 
understand this negative situation with tolerance and kindly request you to 
open access again.

IPv4 of company: [78.189.180.67](https://www.whatismyip.com/ip/78.189.180.67/)

Kind regards,

Sent from [ProtonMail](https://protonmail.com), Swiss-based encrypted email.

This message was sent with the [Proton Mail](https://proton.me/) secure
email service.

Re: How to improve the performance of my SQL query?

2023-07-25 Thread David Rowley
On Fri, 21 Jul 2023 at 13:44, gzh  wrote:
>
> The definitions of the columns used in SQL are as follows.
>
> TBL_SHA
> ms_cd character(6) NOT NULL   -- PRIMARY KEY
> et_cd character(8)
> etrys character(8)

> explain analyze
> select COUNT(ET_CD)
> from TBL_SHA
> WHERE TBL_SHA.MS_CD = 'MLD009'
> and TBL_SHA.ETRYS in
>(select TBL_INF.RY_CD
> from TBL_INF
> WHERE TBL_INF.MS_CD = 'MLD009'
>AND TBL_INF.RY_CD = '0001'
>)
> - Execution Plan -
> Limit  (cost=2738709.57..2738709.58 rows=1 width=8) (actual 
> time=124168.769..124168.771 rows=1 loops=1)
>   ->  Aggregate  (cost=2738709.57..2738709.58 rows=1 width=8) (actual 
> time=124168.767..124168.769 rows=1 loops=1)
> ->  Nested Loop  (cost=0.29..2730702.63 rows=3202774 width=9) (actual 
> time=97264.166..123920.769 rows=320 loops=1)
>   ->  Index Only Scan using TBL_INF_pkc on TBL_INF  
> (cost=0.29..8.31 rows=1 width=9) (actual time=0.025..0.030 rows=1 loops=1)
> Index Cond: ((MS_CD = 'MLD009'::bpchar) AND (RY_CD = 
> '0001'::bpchar))
> Heap Fetches: 1
>   ->  Seq Scan on TBL_SHA  (cost=0.00..2698666.58 rows=3202774 
> width=18) (actual time=97264.138..123554.792 rows=320 loops=1)
> Filter: ((MS_CD = 'MLD009'::bpchar) AND (ETRYS = 
> '0001'::bpchar))
> Rows Removed by Filter: 32000325
> Planning Time: 0.162 ms
> Execution Time: 124168.838 ms

You might want to check your description of the table definitions.
Going by the above EXPLAIN ANALYZE output, it very much does not look
like ms_cd is the primary key of TBL_SHA. If it is then it's very
weird that you have 320 rows for MS_CD = 'MLD009'. You have some
data corruption if that's the case. I suspect you've just not
accurately described the table definition, however.
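
If you want to double-check, something like this shows the indexes that
actually exist on the table (assuming it was created with an unquoted name,
so the catalog stores it in lowercase):

```
select indexname, indexdef
from pg_indexes
where tablename = 'tbl_sha';
```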

David




Re: Re: How to improve the performance of my SQL query?

2023-07-25 Thread David Rowley
On Thu, 20 Jul 2023 at 23:36, gzh  wrote:
>
>
> Thank you very much for taking the time to reply to my question.
>
>
> Sorry, I provided incorrect information.
>
> The index also does not work in the following query statement.
>
>
> > select COUNT(ET_CD)
> > from TBL_SHA
> > WHERE MS_CD = '009'
> > AND ETRYS = '01'
>
> QUERY PLAN
> Limit  (cost=2419643.47..2419643.48 rows=1 width=8) (actual 
> time=128667.439..128668.250 rows=1 loops=1)
>   ->  Finalize Aggregate  (cost=2419643.47..2419643.48 rows=1 width=8) 
> (actual time=128667.437..128668.246 rows=1 loops=1)
> ->  Gather  (cost=2419643.25..2419643.46 rows=2 width=8) (actual 
> time=128664.108..128668.233 rows=3 loops=1)
>   Workers Planned: 2
>   Workers Launched: 2
>   ->  Partial Aggregate  (cost=2418643.25..2418643.26 rows=1 
> width=8) (actual time=128655.256..128655.258 rows=1 loops=3)
> ->  Parallel Seq Scan on TBL_SHA  (cost=0.00..2415548.85 
> rows=1237762 width=9) (actual time=75357.455..128531.615 rows=107 loops=3)
>   Filter: ((MS_CD = '009'::bpchar) AND (ETRYS = 
> '01'::bpchar))
>   Rows Removed by Filter: 11833442
> Planning Time: 0.118 ms
> Execution Time: 128668.290 ms
>
> The TBL_SHA table has another index, as shown below.
>
> CREATE INDEX index_search_02 ON mdb.TBL_SHA USING btree (ET_CD, ETRYS)
> CREATE INDEX index_search_03 ON mdb.TBL_SHA USING btree (MS_CD, ET_DAY, 
> BK_CD, FR_CD, RM_CD)
>
> When I run the following query statement, the result is returned quickly.
> Why is index_search_01 never used?

The method to access the table is decided by the query planner based
on costs.  The costs are driven off the row estimates which are driven
from table statistics.  If the table statistics, for example say that
99% of rows have MS_CD = '009', then scanning an index on MS_CD is
unlikely to be a good idea as that would likely require random access
to the heap.  It's likely better to perform a table scan and then just
filter out the 1% of rows that don't match.

Try executing the query after having done:

SET enable_seqscan TO off;

What plan does it use now?

Is that plan faster or slower than the seq scan plan?
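
In full, the experiment would look something like this, using your query from
above (the setting change only affects the current session, and RESET puts it
back):

```
set enable_seqscan to off;

explain analyze
select count(et_cd)
from tbl_sha
where ms_cd = '009'
  and etrys = '01';

reset enable_seqscan;
```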

David




Re: Difference in the tablespace folders on primary and secondary nodes

2023-07-25 Thread Stephen Frost
Greetings,

* Abhishek Bhola (abhishek.bh...@japannext.co.jp) wrote:
> I recently set up a Postgres15 master-slave (Primary-secondary) cluster on
> 2 nodes. At the time of starting, I ensured that data files are exactly the
> same on both the nodes. The size of the DB is 1.5TB.
> The directory structure on both nodes looks as follows:
> ```
>   - /PROD/datadg/
>|
>|-> /PROD/datadg/tablespace
>|   |-> /PROD/datadg/tablespace/tablespace1
>|   |-> /PROD/datadg/tablespace/tablespace2
>|
>|-> /PROD/datadg/data
>|   |-> /PROD/datadg/data/pg_tblspc
>|   |   | -> /PROD/datadg/data/pg_tblspc/16432 ->
> /PROD/datadg/tablespace/tablespace1
>|   |   | -> /PROD/datadg/data/pg_tblspc/16433 ->
> /PROD/datadg/tablespace/tablespace2
> 
> ```
> Almost a week later now, I see almost a 2GB size difference in the
> tablespace folders on the 2 nodes. I also see some file count difference on
> both the nodes.

Not really enough to go on here.

> `autovacuum` is on on both the nodes and there aren't any `idle in
> transaction` queries on the slave node. Also there hasn't been any
> disruption on the streaming replication. I did not get any error like `WAL
> segment already removed` or so on the slave node. `pg_stat_replication` on
> the master node also doesn't show anything out of the ordinary and the
> `sent_lsn`, `write_lsn` and `flush_lsn` are regularly updated. I can not
> see a difference in counts of most tables either, haven't verified for all
> of them.
> So my **first question** is:
> 
> * Why is there a difference in the files in the tablespace folder? I can
> understand the difference in the modification timestamps, but some files
> are just missing on the slave node.

Unlogged tables would certainly be a pretty easy explanation of size
differences between the two.  There's also temporary files that might be
created on one system but not the other for in-progress queries.  These
aren't the only possibilities but just a couple of likely candidates.

> Now if I were to run `vacuumdb` on the master node, there are chances that
> the slave node will break and give an error like this
> ```
> PANIC,XX000,"WAL contains references to invalid pages","WAL redo at
> 875E/21A70BD0 for
>  Heap2/VISIBLE: cutoff xid 60350476 flags 0x01; blkref #0: rel
> 16405/16419/533716933, fork 2, blk 26; blkref #1: rel
> 16405/16419/533716933, blk 853758""","startup
> ```

Did you actually run vacuumdb and actually see this?  Are you able to
reproduce it?

> In the case when slave node breaks, these are the steps I usually do to
> bring the slave node back:
> 1) Start `pg_backup_start('backup')` on the master node
> 2) rsync the files from master to slave by running the following on the
> slave node:
> ```
> rsync -av --delete master_node:/PROD/datadg/data/ /PROD/datadg/data
> --exclude 'pg_log' --exclude 'pg_replslot'
> ```
> 3. Stop `pg_backup_stop()` on  master node

This really isn't good enough as rsync will use timestamp/size by
default and this also doesn't grab and restore the absolutely required
backup_label file that's returned from pg_backup_stop().  Basically,
this is not a valid way to perform a backup/restore of PG.  Please go
read the documentation for how to use the low-level API ... or, even
better, use an existing well maintained backup/restore tool for this.

> 4. Start the slave node again and it usually works, even though the
> tablespace files might not still be the same.

If you didn't grab the backup_label from pg_backup_stop() and drop it
into place on the replica, then you've almost certainly got a corrupt
system.

> **Second question**:
> 
> * What is the best way to bring the slave node back? Is the `rsync` between
> tablespaces required? And if yes, what is the best method to do it for very
> large databases, something maybe as big as 30TB or more. I don't want to
> `rsync` all the files even if the timestamp on them is different. So is a
> command like this safe to do? Or should an option like `--checksum` be
> used?
> ```
> rsync -av --delete master_node:/PROD/datadg/tablespace/
> /PROD/datadg/tablespace --size-only
> ```

Using --size-only is an absolutely horrid idea and will miss lots of
changes.  What you're doing isn't working because you're just creating
corrupt replicas by not following the process correctly for performing a
backup and restore of PG.

This stuff isn't easy.  Use a tool for it, especially for larger
systems.

> **Third question:**
> 
> * Is it advised to run `vacuumdb` before or after bringing the slave node
> back again?

No, this isn't necessary.

Thanks,

Stephen




Re: suggestion about time based partitioning and hibernate

2023-07-25 Thread Marc Millas
On Tue, Jul 18, 2023 at 8:18 AM Luca Ferrari  wrote:

> Dear all,
> I'm looking for ideas here, and it could be someone already stepped
> into declarative partitioning of an existing database where Hibernate
> (a Java ORM) handles the tables.
> The situation is as follows:
>
> create table foo( id primary key, a_date date, ... );
>
> Now, the trivial way to partition this would be on a range based on
> a_date, so that the primary key of the tables shifts from id to (id,
> a_date). One thing that frightens me is that Hibernate does a lot of
> per-row lookups by means of the id, so while the partitioning is
> probably going to make things more manageable and even faster in some
> scenarios, it could lead to drawbacks when Hibernate queries by id.
> Moreover, hibernate will think id is unique while it is not anymore.
> Last but not least, referencing foreign keys are made by Hibernate
> thru the id column, and it means that incoming foreign keys to foo
> will not be in place anymore.
>
> Now, I know that I can define a composite key in hibernate, in order
> to match the effective new data structure, but this requires a huge
> rewrite of the application code.
> And after all, we are talking about a non-PostgreSQL related piece, so
> the problem is not on the PostgreSQL side.
>
> Anyone has already done a partitioning in such scenario?
>
> I am thinking that partitioning on an hash of id could be the only way
> to go without having to touch the hibernate side, even if this would
> bring up a less balanced partitioned structure. In such case, I mean
> partitioning by hash, having a table with 60 millions rows per 50 GB
> in size, what would be the rule of thumb to select the number of
> partitions (i.e., a suggested modulus)?
>
> I will appreciate any suggestion.
>
> Thanks,
> Luca
>

Hello,

Quite often in such cases the id is a sequence or something like that, so
partitioning on an id range or an a_date range is similar from a maintenance
point of view. And, obviously, if you partition by id range there is no
primary-key problem.
The difference is the SQL plan used to access the data: to prune partitions,
Postgres needs to have the partition key in the where clause.
So, obviously, if your code accesses the data thru id... then partition by id.
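
For example, a minimal sketch of the id-range variant (column names are from
your mail; the range bounds are arbitrary):

```
create table foo (
    id     bigint not null,
    a_date date,
    primary key (id)  -- the partition key (id) is part of the key, so no pk problem
) partition by range (id);

create table foo_p0 partition of foo for values from (0) to (30000000);
create table foo_p1 partition of foo for values from (30000000) to (60000000);
```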

Marc MILLAS


Difference in the tablespace folders on primary and secondary nodes

2023-07-25 Thread Abhishek Bhola
I recently set up a Postgres15 master-slave (Primary-secondary) cluster on
2 nodes. At the time of starting, I ensured that data files are exactly the
same on both the nodes. The size of the DB is 1.5TB.
The directory structure on both nodes looks as follows:
```
  - /PROD/datadg/
   |
   |-> /PROD/datadg/tablespace
   |   |-> /PROD/datadg/tablespace/tablespace1
   |   |-> /PROD/datadg/tablespace/tablespace2
   |
   |-> /PROD/datadg/data
   |   |-> /PROD/datadg/data/pg_tblspc
   |   |   | -> /PROD/datadg/data/pg_tblspc/16432 ->
/PROD/datadg/tablespace/tablespace1
   |   |   | -> /PROD/datadg/data/pg_tblspc/16433 ->
/PROD/datadg/tablespace/tablespace2

```
Almost a week later now, I see almost a 2GB size difference in the
tablespace folders on the 2 nodes. I also see some file count difference on
both the nodes.

`autovacuum` is enabled on both the nodes and there aren't any `idle in
transaction` queries on the slave node. There hasn't been any disruption
in the streaming replication either: I did not get any error like `WAL
segment already removed` on the slave node, `pg_stat_replication` on the
master node doesn't show anything out of the ordinary, and the `sent_lsn`,
`write_lsn` and `flush_lsn` are regularly updated. I cannot see a
difference in the counts of most tables either, though I haven't verified
all of them.
So my **first question** is:

* Why is there a difference in the files in the tablespace folder? I can
understand the difference in the modification timestamps, but some files
are just missing on the slave node.

Now if I were to run `vacuumdb` on the master node, there are chances that
the slave node will break and give an error like this
```
PANIC,XX000,"WAL contains references to invalid pages","WAL redo at
875E/21A70BD0 for
 Heap2/VISIBLE: cutoff xid 60350476 flags 0x01; blkref #0: rel
16405/16419/533716933, fork 2, blk 26; blkref #1: rel
16405/16419/533716933, blk 853758""","startup
```

In the case when the slave node breaks, these are the steps I usually take
to bring it back:
1) Run `pg_backup_start('backup')` on the master node
2) rsync the files from master to slave by running the following on the
slave node:
```
rsync -av --delete master_node:/PROD/datadg/data/ /PROD/datadg/data
--exclude 'pg_log' --exclude 'pg_replslot'
```
3) Run `pg_backup_stop()` on the master node

4) Start the slave node again, and it usually works, even though the
tablespace files might still not be the same.


**Second question**:

* What is the best way to bring the slave node back? Is the `rsync` between
tablespaces required? And if yes, what is the best method for very large
databases, maybe as big as 30TB or more? I don't want to `rsync` all the
files just because their timestamps differ. So is a command like this safe
to use, or should an option like `--checksum` be used?
```
rsync -av --delete master_node:/PROD/datadg/tablespace/
/PROD/datadg/tablespace --size-only
```

**Third question:**

* Is it advised to run `vacuumdb` before or after bringing the slave node
back again?

-- 
_This correspondence (including any attachments) is for the intended
recipient(s) only. It may contain confidential or privileged information or
both. No confidentiality or privilege is waived or lost by any
mis-transmission. If you receive this correspondence by mistake, please
contact the sender immediately, delete this correspondence (and all
attachments) and destroy any hard copies. You must not use, disclose, copy,
distribute or rely on any part of this correspondence (including any
attachments) if you are not the intended recipient(s)._


Re: [Beginner Question]Is there way to test the postgres's kernel function?

2023-07-25 Thread jian he
On Mon, Jul 24, 2023 at 5:43 PM Wen Yi  wrote:
>
> Hi community,
> I am learning the kernel of the postgres, and I want to test the 
> make_relative_path function to know what it's operational principle.(I can 
> read it's code, and I want to actually run it)
> But I really don't know how to do it, wrtite a extension?
>
> Can someone give me some advice?
> Thanks in advance!
>
> Yours,
> Wen Yi.

Create a C file and copy the #include lines from c.h into it. Add an empty
main function, then run gcc -Wextra your_c_absolute_path; all kinds of
errors will show up in the terminal. Find out why each one happens, and
either delete the offending line or work out what it needs.

Finally, once no errors happen, experiment in the empty main function (an
illustrative main is appended at the end of the listing below). The
following example only makes sure there are no errors with gcc -Wextra.

/*-

gcc -Wextra 
/home/jian/.config/Code/User/globalStorage/buenon.scratchpads/scratchpads/47a6066b9dfaca1c2b13ea2e99f4e929/scratch1.c

*/

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stddef.h>
#include <stdarg.h>
#include <stdbool.h>
#ifdef HAVE_STRINGS_H
#include <strings.h>
#endif
#include <stdint.h>
#include <sys/types.h>
#include <errno.h>
#if defined(WIN32) || defined(__CYGWIN__)
#include <fcntl.h> /* ensure O_BINARY is available */
#endif
#include <locale.h>
#ifdef ENABLE_NLS
#include <libintl.h>
#endif

#include <ctype.h>
#include <sys/stat.h>
#ifdef WIN32
#ifdef _WIN32_IE
#undef _WIN32_IE
#endif
#define _WIN32_IE 0x0500
#ifdef near
#undef near
#endif
#define near
#include <shlobj.h>
#else
#include <unistd.h>
#endif

// #include "pg_config_paths.h"
#define MAXPGPATH 1024
#define IS_DIR_SEP(ch) IS_NONWINDOWS_DIR_SEP(ch)

#define IS_NONWINDOWS_DIR_SEP(ch) ((ch) == '/')

#ifndef WIN32
#define IS_PATH_VAR_SEP(ch) ((ch) == ':')
#else
#define IS_PATH_VAR_SEP(ch) ((ch) == ';')
#endif

#define StaticAssertDecl(condition, errmessage) \
    _Static_assert(condition, errmessage)
#define StaticAssertStmt(condition, errmessage) \
    do { _Static_assert(condition, errmessage); } while(0)
#define StaticAssertExpr(condition, errmessage) \
    ((void) ({ StaticAssertStmt(condition, errmessage); true; }))
#define HAVE__BUILTIN_TYPES_COMPATIBLE_P 1

#if defined(HAVE__BUILTIN_TYPES_COMPATIBLE_P)
#define unconstify(underlying_type, expr) \
    (StaticAssertExpr(__builtin_types_compatible_p(__typeof(expr), const underlying_type), \
                      "wrong cast"), \
     (underlying_type) (expr))
#endif

static void make_relative_path(char *ret_path, const char *target_path,
   const char *bin_path, const char *my_exec_path);
static char *trim_directory(char *path);
static void trim_trailing_separator(char *path);
static char *append_subdir_to_path(char *path, char *subdir);

char *
first_dir_separator(const char *filename);

/*
 * Copy src to string dst of size siz.  At most siz-1 characters
 * will be copied.  Always NUL terminates (unless siz == 0).
 * Returns strlen(src); if retval >= siz, truncation occurred.
 * Function creation history:  http://www.gratisoft.us/todd/papers/strlcpy.html
 */
size_t
strlcpy(char *dst, const char *src, size_t siz)
{
    char       *d = dst;
    const char *s = src;
    size_t      n = siz;

    /* Copy as many bytes as will fit */
    if (n != 0)
    {
        while (--n != 0)
        {
            if ((*d++ = *s++) == '\0')
                break;
        }
    }

    /* Not enough room in dst, add NUL and traverse rest of src */
    if (n == 0)
    {
        if (siz != 0)
            *d = '\0';          /* NUL-terminate dst */
        while (*s++)
            ;
    }

    return (s - src - 1);       /* count does not include NUL */
}



/*
 * skip_drive
 *
 * On Windows, a path may begin with "C:" or "//network/".  Advance over
 * this and point to the effective start of the path.
 */
#ifdef WIN32

static char *
skip_drive(const char *path)
{
    if (IS_DIR_SEP(path[0]) && IS_DIR_SEP(path[1]))
    {
        path += 2;
        while (*path && !IS_DIR_SEP(*path))
            path++;
    }
    else if (isalpha((unsigned char) path[0]) && path[1] == ':')
    {
        path += 2;
    }
    return (char *) path;
}
#else

#define skip_drive(path) (path)
#endif

/*
 * has_drive_prefix
 *
 * Return true if the given pathname has a drive prefix.
 */
bool
has_drive_prefix(const char *path)
{
#ifdef WIN32
    return skip_drive(path) != path;
#else
    return false;
#endif
}

/*
 * first_dir_separator
 *
 * Find the location of the first directory separator, return
 * NULL if not found.
 */
char *
first_dir_separator(const char *filename)
{
    const char *p;

    for (p = skip_drive(filename); *p; p++)
        if (IS_DIR_SEP(*p))
            return unconstify(char *, p);
    return NULL;
}

/*
 * first_path_var_separator
 *
 * Find the location of the first path separator (i.e. ':' on
 * Unix, ';' on Windows), return NULL if not found.
 */
char *
first_path_var_separator(const char *pathlist)
{
    const char *p;

    /* skip_drive is not needed */
    for (p = pathlist; *p; p++)
        if (IS_PATH_VAR_SEP(*p))
            return unconstify(char *, p);
    return NULL;
}

/*
 * last_dir_separator
 *
 * Find the location of the last directory separator, return
 * NULL if not found.
 */
char *
last_dir_separator(const char *filename)
{
    const char *p,
               *ret = NULL;

    for (p = skip_drive(filename); *p; p++)
        if (IS_DIR_SEP(*p))
            ret = p;
    return unconstify(char *, ret);
}
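
/*
 * Illustrative addition, not part of the original scratch file: a minimal
 * main() to "experiment in the empty main function" with the helpers above.
 * Build and run:  gcc -Wextra scratch1.c && ./a.out
 */
int
main(void)
{
    const char *path = "/usr/local/pgsql/bin/postgres";

    /* first '/' -> prints the whole path here */
    printf("first separator: %s\n", first_dir_separator(path));
    /* last '/' -> prints "/postgres" */
    printf("last separator:  %s\n", last_dir_separator(path));
    /* always false on non-Windows builds */
    printf("has drive prefix: %d\n", (int) has_drive_prefix(path));

    return 0;
}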


/*
 * make_native_path - on WIN32, change / to \ in the path