Re: PG 14 pg_basebackup accepts --compress=server-zst option
Ron Johnson writes: > On Fri, Jun 7, 2024 at 12:32 AM David G. Johnston < > david.g.johns...@gmail.com> wrote: >> I don’t see us adding an error message at this point. > Me neither. It just seemed odd. v14 thinks the argument of --compress must be an integer, and doesn't really bother with any syntax error checks: case 'Z': compresslevel = atoi(optarg); if (compresslevel < 0 || compresslevel > 9) { pg_log_error("invalid compression level \"%s\"", optarg); exit(1); } break; In your example, atoi() will return zero and it will sail along with no compression. Releases 15 and up have more complex ideas of what --compress can specify, and seem to syntax-check it much more thoroughly. This is a pretty common coding pattern, so I can't get excited about changing it, especially not in long-stable branches. regards, tom lane
Re: PG 14 pg_basebackup accepts --compress=server-zst option
On Fri, Jun 7, 2024 at 12:32 AM David G. Johnston < david.g.johns...@gmail.com> wrote: > On Thursday, June 6, 2024, Kashif Zeeshan wrote: > >> Hi >> >> On Fri, Jun 7, 2024 at 6:54 AM Ron Johnson >> wrote: >> >>> >>> https://www.postgresql.org/docs/14/app-pgbasebackup.html doesn't >>> mention "--compress=[{client|server}-]method". That first appears in the >>> v15 docs. >>> >>> And yet pg_basebackup doesn't complain about an invalid option. >>> (Technically, this is a bug; I first noticed it a week after copying a >>> script from a PG 15 server to five PG 14 servers, and running it quite a >>> few times without fail.) >>> >> > Seems a bit suspect, but as your script doesn’t mention tar the option > itself is apparently ignored, I guess silently. > Does this mean that "--compress=server-zst" is only relevant with --format=tar? > Assuming this isn’t an actual regression in behavior in a patch-released > older version > My apologies for not mentioning the version: 14.12-1PGDG-rhel8. > I don’t see us adding an error message at this point. > Me neither. It just seemed odd.
Re: PG 14 pg_basebackup accepts --compress=server-zst option
On Thursday, June 6, 2024, Kashif Zeeshan wrote: > Hi > > On Fri, Jun 7, 2024 at 6:54 AM Ron Johnson > wrote: > >> >> https://www.postgresql.org/docs/14/app-pgbasebackup.html doesn't mention >> "--compress=[{client|server}-]method". That first appears in the v15 >> docs. >> >> And yet pg_basebackup doesn't complain about an invalid option. >> (Technically, this is a bug; I first noticed it a week after copying a >> script from a PG 15 server to five PG 14 servers, and running it quite a >> few times without fail.) >> > Seems a bit suspect, but as your script doesn’t mention tar the option itself is apparently ignored, I guess silently. Assuming this isn’t an actual regression in behavior in a patch-released older version I don’t see us adding an error message at this point. > If the support is removed then it should be mentioned in the official > documentation. > Support wasn’t removed. Re-read the email and check the version/times being mentioned again. David J.
Re: PG 14 pg_basebackup accepts --compress=server-zst option
Hi On Fri, Jun 7, 2024 at 6:54 AM Ron Johnson wrote: > > https://www.postgresql.org/docs/14/app-pgbasebackup.html doesn't mention > "--compress=[{client|server}-]method". That first appears in the v15 docs. > > And yet pg_basebackup doesn't complain about an invalid option. > (Technically, this is a bug; I first noticed it a week after copying a > script from a PG 15 server to five PG 14 servers, and running it quite a > few times without fail.) > If the support is removed then it should be mentioned in the official documentation. Regards Kashif Zeeshan Bitnine Global > > $ pg_basebackup \ > > --pgdata=$PGDATA \ > > --dbname=service=basebackup \ > > --verbose --progress \ > > --checkpoint=fast \ > > --write-recovery-conf \ > > --wal-method=stream \ > > --create-slot --slot=pgstandby1 \ > > --compress=server-zst ; echo $? > pg_basebackup: initiating base backup, waiting for checkpoint to complete > pg_basebackup: checkpoint completed > pg_basebackup: write-ahead log start point: 256/BC28 on timeline 1 > pg_basebackup: starting background WAL receiver > pg_basebackup: created replication slot "pgstandby1" > 42567083/42567083 kB (100%), 1/1 tablespace > pg_basebackup: write-ahead log end point: 256/BC000138 > pg_basebackup: waiting for background process to finish streaming ... > pg_basebackup: syncing data to disk ... > pg_basebackup: renaming backup_manifest.tmp to backup_manifest > pg_basebackup: base backup completed > 0 > >
PG 14 pg_basebackup accepts --compress=server-zst option
https://www.postgresql.org/docs/14/app-pgbasebackup.html doesn't mention "--compress=[{client|server}-]method". That first appears in the v15 docs. And yet pg_basebackup doesn't complain about an invalid option. (Technically, this is a bug; I first noticed it a week after copying a script from a PG 15 server to five PG 14 servers, and running it quite a few times without fail.) $ pg_basebackup \ > --pgdata=$PGDATA \ > --dbname=service=basebackup \ > --verbose --progress \ > --checkpoint=fast \ > --write-recovery-conf \ > --wal-method=stream \ > --create-slot --slot=pgstandby1 \ > --compress=server-zst ; echo $? pg_basebackup: initiating base backup, waiting for checkpoint to complete pg_basebackup: checkpoint completed pg_basebackup: write-ahead log start point: 256/BC28 on timeline 1 pg_basebackup: starting background WAL receiver pg_basebackup: created replication slot "pgstandby1" 42567083/42567083 kB (100%), 1/1 tablespace pg_basebackup: write-ahead log end point: 256/BC000138 pg_basebackup: waiting for background process to finish streaming ... pg_basebackup: syncing data to disk ... pg_basebackup: renaming backup_manifest.tmp to backup_manifest pg_basebackup: base backup completed 0
pg_basebackup "Permission denied" error when initiating streaming replication
Hi, I'm trying to initiate streaming replication between two Postgres 10 servers on Windows. The source cluster is of size ~450gb, and the cluster consists of the primary data folder, and two separated tablespaces. One of the tablespaces is configured with a symbolic link inside Windows. Now, the problem I'm encountering is the following: I call pg_basebackup at the target as follows: >> pg_basebackup -h -U -p -W -D -T >> "C:\Server >> Files\Data\ProgramsData\PostgreSQL\10\data\base\demo_tablespace"=E:\PG_tablespace_data\10\demo_tablespace >> -T "C:\Server >> Files\Data\ProgramsData\PostgreSQL\10\data\base\tsb_tablespace"=E:\PG_tablespace_data\10\tsb_tablespace >> -P -R -v This starts the pg_basebackup and it runs without problems for the duration of copying the 450gbs of data. Then the output is the following: >>pg_basebackup: initiating base backup, waiting for checkpoint to complete >>pg_basebackup: checkpoint complete >>pg_basebackup: write-ahead log start point: 181/928 on timeline 1 >>pg_basebackup: starting background WAL receiver >>432917950/482036946 kB (100%), 3/3 tablespaces >>pg_basebackup: could not get write-ahead log end position from server: ERROR: >>could not open file "./base/demo_tablespace": Permission denied The demo_tablespace is the one that is a symbolic link from another folder. After the error, I find that all the data is copied to the target tablespaces and primary data directory, but the data directory is missing all pg_... directories and configuration files. So no running database at target. I'm running the source Postgres service on Windows as a local account, the command is given with administrator cmd and the ./base/demo_tablespace folder's security settings should have all needed permissions for the operation. I have a hard time understanding why pg_basebackup can retrieve the data from the ./base/demo_tablespace directory, and then suddenly gets denied. I'm running out of ideas on what to try next. BR, Frans Simmelvuo
Re: pg_basebackup Restore problem
Thank You very much for your time. On Wed, Jan 17, 2024 at 4:51 PM Ron Johnson wrote: > Perfectly understandable, but tar *did* fail. Time to start debugging > your shell script. > > On Wed, Jan 17, 2024 at 4:26 PM Johnathan Tiamoh < > johnathantia...@gmail.com> wrote: > >> Ok. >> >> I'm a little confused because has always work >> >> On Wed, Jan 17, 2024 at 4:11 PM Ron Johnson >> wrote: >> >>> Then you've got a bug somewhere in: >>> tar -h -zxvf $PATH_FOLDER/* .tar.gz >>> ls $PATH_FOLDER/*.tar.gz | xargs -I {} tar -h -zxvf {} >>> >>> On Wed, Jan 17, 2024 at 4:07 PM Johnathan Tiamoh < >>> johnathantia...@gmail.com> wrote: >>> >>>> Yes. >>>> >>>> I am trying to restore the backups on a standby >>>> >>>> On Wed, Jan 17, 2024 at 4:04 PM Ron Johnson >>>> wrote: >>>> >>>>> Wait a minute... *tar* is throwing the errors, not pg_basebackup, no? >>>>> >>>>> On Wed, Jan 17, 2024 at 3:27 PM Johnathan Tiamoh < >>>>> johnathantia...@gmail.com> wrote: >>>>> >>>>>> 1. What's in $PATH_FOLDER? >>>>>> >>>>>> /tnt/backup/current >>>>>> >>>>>> 2. What pg_basebackup command did you use? >>>>>> >>>>>> pg_basebackup -D "$baseback_dir" --format=tar \ >>>>>> "${comp_opts[@]}" --wal-method=stream --no-password >>>>>> --verbose "${PG_DUMP_OPTS[@]}" >>>>>> >>>>>> /bin/mv "$baseback_dir"/* "/enf/backup/current/" >>>>>> /bin/rm -r "$baseback_dir" >>>>>> >>>>>> >>>>>> 3. Why aren't you letting pg_basebackup maintain the WAL files it >>>>>> needs? >>>>>> >>>>>> >>>>>> On Wed, Jan 17, 2024 at 11:46 AM Ron Johnson >>>>>> wrote: >>>>>> >>>>>>> 1. What's in $PATH_FOLDER? >>>>>>> 2. What pg_basebackup command did you use? >>>>>>> 3. Why aren't you letting pg_basebackup maintain the WAL files it >>>>>>> needs? >>>>>>> >>>>>>> On Wed, Jan 17, 2024 at 11:27 AM Johnathan Tiamoh < >>>>>>> johnathantia...@gmail.com> wrote: >>>>>>> >>>>>>>> You need to tell us the PG version number *and* show us the full >>>>>>>> command you ran. ? >>>>>>>> >>>>>>>> Postgresql Version 14.10 >>>>>>>> >>>>>>>> tar -h -zxvf $PATH_FOLDER/* .tar.gz >>>>>>>> >>>>>>>> ls $PATH_FOLDER/*.tar.gz | xargs -I {} tar -h -zxvf {} >>>>>>>> >>>>>>>> What user are you running pg_basebackup as? >>>>>>>> >>>>>>>> I ran it as postgres and now I'm restoring as postgres >>>>>>>> >>>>>>>> /tnt??? Or /mnt? >>>>>>>> /tnt/backup/current >>>>>>>> >>>>>>>> This is where the backup files are. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Wed, Jan 17, 2024 at 11:15 AM Ron Johnson < >>>>>>>> ronljohnso...@gmail.com> wrote: >>>>>>>> >>>>>>>>> On Wed, Jan 17, 2024 at 11:11 AM Johnathan Tiamoh < >>>>>>>>> johnathantia...@gmail.com> wrote: >>>>>>>>> >>>>>>>>>> Hello, >>>>>>>>>> >>>>>>>>>> I am trying to restore a pg_basebackup and have the following >>>>>>>>>> errors. >>>>>>>>>> >>>>>>>>>> >>>>>>>>> You need to tell us the PG version number *and* show us the full >>>>>>>>> command you ran. >>>>>>>>> >>>>>>>>> >>>>>>>>>> nohup: ignoring input >>>>>>>>>> tar: /tnt/backup/current/7400.tar.gz: Not found in archive >>>>>>>>>> >>>>>>>>> tar: /tnt/backup/current/7401.tar.gz: Not found in archive >>>>>>>>>> tar: /tnt/backup/current/7402.tar.gz: Not found in archive >>>>>>>>>> tar: /tnt/backup/current/base.tar.gz: Not found in archive >>>>>>>>>> tar: /tnt/backup/current/pg_wal.tar.gz: Not found in archive >>>>>>>>>> tar: Exiting with failure status due to previous errors >>>>>>>>>> PG_14_202107181/ >>>>>>>>>> tar: PG_14_202107181: Cannot mkdir: Permission denied >>>>>>>>>> >>>>>>>>> >>>>>>>>> What user are you running pg_basebackup as? >>>>>>>>> >>>>>>>>> >>>>>>>>>> PG_14_202107181/137502/ >>>>>>>>>> tar: PG_14_202107181: Cannot mkdir: Permission denied >>>>>>>>>> tar: PG_14_202107181/137502: Cannot mkdir: No such file or >>>>>>>>>> directory >>>>>>>>>> PG_14_202107181/137502/3222926016 >>>>>>>>>> tar: PG_14_202107181: Cannot mkdir: Permission denied >>>>>>>>>> >>>>>>>>>> The files exist in the specified directory. >>>>>>>>>> >>>>>>>>> >>>>>>>>> /tnt??? Or /mnt? >>>>>>>>> >>>>>>>>>
Re: pg_basebackup Restore problem
Perfectly understandable, but tar *did* fail. Time to start debugging your shell script. On Wed, Jan 17, 2024 at 4:26 PM Johnathan Tiamoh wrote: > Ok. > > I'm a little confused because has always work > > On Wed, Jan 17, 2024 at 4:11 PM Ron Johnson > wrote: > >> Then you've got a bug somewhere in: >> tar -h -zxvf $PATH_FOLDER/* .tar.gz >> ls $PATH_FOLDER/*.tar.gz | xargs -I {} tar -h -zxvf {} >> >> On Wed, Jan 17, 2024 at 4:07 PM Johnathan Tiamoh < >> johnathantia...@gmail.com> wrote: >> >>> Yes. >>> >>> I am trying to restore the backups on a standby >>> >>> On Wed, Jan 17, 2024 at 4:04 PM Ron Johnson >>> wrote: >>> >>>> Wait a minute... *tar* is throwing the errors, not pg_basebackup, no? >>>> >>>> On Wed, Jan 17, 2024 at 3:27 PM Johnathan Tiamoh < >>>> johnathantia...@gmail.com> wrote: >>>> >>>>> 1. What's in $PATH_FOLDER? >>>>> >>>>> /tnt/backup/current >>>>> >>>>> 2. What pg_basebackup command did you use? >>>>> >>>>> pg_basebackup -D "$baseback_dir" --format=tar \ >>>>> "${comp_opts[@]}" --wal-method=stream --no-password >>>>> --verbose "${PG_DUMP_OPTS[@]}" >>>>> >>>>> /bin/mv "$baseback_dir"/* "/enf/backup/current/" >>>>> /bin/rm -r "$baseback_dir" >>>>> >>>>> >>>>> 3. Why aren't you letting pg_basebackup maintain the WAL files it >>>>> needs? >>>>> >>>>> >>>>> On Wed, Jan 17, 2024 at 11:46 AM Ron Johnson >>>>> wrote: >>>>> >>>>>> 1. What's in $PATH_FOLDER? >>>>>> 2. What pg_basebackup command did you use? >>>>>> 3. Why aren't you letting pg_basebackup maintain the WAL files it >>>>>> needs? >>>>>> >>>>>> On Wed, Jan 17, 2024 at 11:27 AM Johnathan Tiamoh < >>>>>> johnathantia...@gmail.com> wrote: >>>>>> >>>>>>> You need to tell us the PG version number *and* show us the full >>>>>>> command you ran. ? >>>>>>> >>>>>>> Postgresql Version 14.10 >>>>>>> >>>>>>> tar -h -zxvf $PATH_FOLDER/* .tar.gz >>>>>>> >>>>>>> ls $PATH_FOLDER/*.tar.gz | xargs -I {} tar -h -zxvf {} >>>>>>> >>>>>>> What user are you running pg_basebackup as? >>>>>>> >>>>>>> I ran it as postgres and now I'm restoring as postgres >>>>>>> >>>>>>> /tnt??? Or /mnt? >>>>>>> /tnt/backup/current >>>>>>> >>>>>>> This is where the backup files are. >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Wed, Jan 17, 2024 at 11:15 AM Ron Johnson < >>>>>>> ronljohnso...@gmail.com> wrote: >>>>>>> >>>>>>>> On Wed, Jan 17, 2024 at 11:11 AM Johnathan Tiamoh < >>>>>>>> johnathantia...@gmail.com> wrote: >>>>>>>> >>>>>>>>> Hello, >>>>>>>>> >>>>>>>>> I am trying to restore a pg_basebackup and have the following >>>>>>>>> errors. >>>>>>>>> >>>>>>>>> >>>>>>>> You need to tell us the PG version number *and* show us the full >>>>>>>> command you ran. >>>>>>>> >>>>>>>> >>>>>>>>> nohup: ignoring input >>>>>>>>> tar: /tnt/backup/current/7400.tar.gz: Not found in archive >>>>>>>>> >>>>>>>> tar: /tnt/backup/current/7401.tar.gz: Not found in archive >>>>>>>>> tar: /tnt/backup/current/7402.tar.gz: Not found in archive >>>>>>>>> tar: /tnt/backup/current/base.tar.gz: Not found in archive >>>>>>>>> tar: /tnt/backup/current/pg_wal.tar.gz: Not found in archive >>>>>>>>> tar: Exiting with failure status due to previous errors >>>>>>>>> PG_14_202107181/ >>>>>>>>> tar: PG_14_202107181: Cannot mkdir: Permission denied >>>>>>>>> >>>>>>>> >>>>>>>> What user are you running pg_basebackup as? >>>>>>>> >>>>>>>> >>>>>>>>> PG_14_202107181/137502/ >>>>>>>>> tar: PG_14_202107181: Cannot mkdir: Permission denied >>>>>>>>> tar: PG_14_202107181/137502: Cannot mkdir: No such file or >>>>>>>>> directory >>>>>>>>> PG_14_202107181/137502/3222926016 >>>>>>>>> tar: PG_14_202107181: Cannot mkdir: Permission denied >>>>>>>>> >>>>>>>>> The files exist in the specified directory. >>>>>>>>> >>>>>>>> >>>>>>>> /tnt??? Or /mnt? >>>>>>>> >>>>>>>>
Re: pg_basebackup Restore problem
Ok. I'm a little confused because has always work On Wed, Jan 17, 2024 at 4:11 PM Ron Johnson wrote: > Then you've got a bug somewhere in: > tar -h -zxvf $PATH_FOLDER/* .tar.gz > ls $PATH_FOLDER/*.tar.gz | xargs -I {} tar -h -zxvf {} > > On Wed, Jan 17, 2024 at 4:07 PM Johnathan Tiamoh < > johnathantia...@gmail.com> wrote: > >> Yes. >> >> I am trying to restore the backups on a standby >> >> On Wed, Jan 17, 2024 at 4:04 PM Ron Johnson >> wrote: >> >>> Wait a minute... *tar* is throwing the errors, not pg_basebackup, no? >>> >>> On Wed, Jan 17, 2024 at 3:27 PM Johnathan Tiamoh < >>> johnathantia...@gmail.com> wrote: >>> >>>> 1. What's in $PATH_FOLDER? >>>> >>>> /tnt/backup/current >>>> >>>> 2. What pg_basebackup command did you use? >>>> >>>> pg_basebackup -D "$baseback_dir" --format=tar \ >>>> "${comp_opts[@]}" --wal-method=stream --no-password >>>> --verbose "${PG_DUMP_OPTS[@]}" >>>> >>>> /bin/mv "$baseback_dir"/* "/enf/backup/current/" >>>> /bin/rm -r "$baseback_dir" >>>> >>>> >>>> 3. Why aren't you letting pg_basebackup maintain the WAL files it needs? >>>> >>>> >>>> On Wed, Jan 17, 2024 at 11:46 AM Ron Johnson >>>> wrote: >>>> >>>>> 1. What's in $PATH_FOLDER? >>>>> 2. What pg_basebackup command did you use? >>>>> 3. Why aren't you letting pg_basebackup maintain the WAL files it >>>>> needs? >>>>> >>>>> On Wed, Jan 17, 2024 at 11:27 AM Johnathan Tiamoh < >>>>> johnathantia...@gmail.com> wrote: >>>>> >>>>>> You need to tell us the PG version number *and* show us the full >>>>>> command you ran. ? >>>>>> >>>>>> Postgresql Version 14.10 >>>>>> >>>>>> tar -h -zxvf $PATH_FOLDER/* .tar.gz >>>>>> >>>>>> ls $PATH_FOLDER/*.tar.gz | xargs -I {} tar -h -zxvf {} >>>>>> >>>>>> What user are you running pg_basebackup as? >>>>>> >>>>>> I ran it as postgres and now I'm restoring as postgres >>>>>> >>>>>> /tnt??? Or /mnt? >>>>>> /tnt/backup/current >>>>>> >>>>>> This is where the backup files are. >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> On Wed, Jan 17, 2024 at 11:15 AM Ron Johnson >>>>>> wrote: >>>>>> >>>>>>> On Wed, Jan 17, 2024 at 11:11 AM Johnathan Tiamoh < >>>>>>> johnathantia...@gmail.com> wrote: >>>>>>> >>>>>>>> Hello, >>>>>>>> >>>>>>>> I am trying to restore a pg_basebackup and have the following >>>>>>>> errors. >>>>>>>> >>>>>>>> >>>>>>> You need to tell us the PG version number *and* show us the full >>>>>>> command you ran. >>>>>>> >>>>>>> >>>>>>>> nohup: ignoring input >>>>>>>> tar: /tnt/backup/current/7400.tar.gz: Not found in archive >>>>>>>> >>>>>>> tar: /tnt/backup/current/7401.tar.gz: Not found in archive >>>>>>>> tar: /tnt/backup/current/7402.tar.gz: Not found in archive >>>>>>>> tar: /tnt/backup/current/base.tar.gz: Not found in archive >>>>>>>> tar: /tnt/backup/current/pg_wal.tar.gz: Not found in archive >>>>>>>> tar: Exiting with failure status due to previous errors >>>>>>>> PG_14_202107181/ >>>>>>>> tar: PG_14_202107181: Cannot mkdir: Permission denied >>>>>>>> >>>>>>> >>>>>>> What user are you running pg_basebackup as? >>>>>>> >>>>>>> >>>>>>>> PG_14_202107181/137502/ >>>>>>>> tar: PG_14_202107181: Cannot mkdir: Permission denied >>>>>>>> tar: PG_14_202107181/137502: Cannot mkdir: No such file or directory >>>>>>>> PG_14_202107181/137502/3222926016 >>>>>>>> tar: PG_14_202107181: Cannot mkdir: Permission denied >>>>>>>> >>>>>>>> The files exist in the specified directory. >>>>>>>> >>>>>>> >>>>>>> /tnt??? Or /mnt? >>>>>>> >>>>>>>
Re: pg_basebackup Restore problem
Then you've got a bug somewhere in: tar -h -zxvf $PATH_FOLDER/* .tar.gz ls $PATH_FOLDER/*.tar.gz | xargs -I {} tar -h -zxvf {} On Wed, Jan 17, 2024 at 4:07 PM Johnathan Tiamoh wrote: > Yes. > > I am trying to restore the backups on a standby > > On Wed, Jan 17, 2024 at 4:04 PM Ron Johnson > wrote: > >> Wait a minute... *tar* is throwing the errors, not pg_basebackup, no? >> >> On Wed, Jan 17, 2024 at 3:27 PM Johnathan Tiamoh < >> johnathantia...@gmail.com> wrote: >> >>> 1. What's in $PATH_FOLDER? >>> >>> /tnt/backup/current >>> >>> 2. What pg_basebackup command did you use? >>> >>> pg_basebackup -D "$baseback_dir" --format=tar \ >>> "${comp_opts[@]}" --wal-method=stream --no-password >>> --verbose "${PG_DUMP_OPTS[@]}" >>> >>> /bin/mv "$baseback_dir"/* "/enf/backup/current/" >>> /bin/rm -r "$baseback_dir" >>> >>> >>> 3. Why aren't you letting pg_basebackup maintain the WAL files it needs? >>> >>> >>> On Wed, Jan 17, 2024 at 11:46 AM Ron Johnson >>> wrote: >>> >>>> 1. What's in $PATH_FOLDER? >>>> 2. What pg_basebackup command did you use? >>>> 3. Why aren't you letting pg_basebackup maintain the WAL files it needs? >>>> >>>> On Wed, Jan 17, 2024 at 11:27 AM Johnathan Tiamoh < >>>> johnathantia...@gmail.com> wrote: >>>> >>>>> You need to tell us the PG version number *and* show us the full >>>>> command you ran. ? >>>>> >>>>> Postgresql Version 14.10 >>>>> >>>>> tar -h -zxvf $PATH_FOLDER/* .tar.gz >>>>> >>>>> ls $PATH_FOLDER/*.tar.gz | xargs -I {} tar -h -zxvf {} >>>>> >>>>> What user are you running pg_basebackup as? >>>>> >>>>> I ran it as postgres and now I'm restoring as postgres >>>>> >>>>> /tnt??? Or /mnt? >>>>> /tnt/backup/current >>>>> >>>>> This is where the backup files are. >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> On Wed, Jan 17, 2024 at 11:15 AM Ron Johnson >>>>> wrote: >>>>> >>>>>> On Wed, Jan 17, 2024 at 11:11 AM Johnathan Tiamoh < >>>>>> johnathantia...@gmail.com> wrote: >>>>>> >>>>>>> Hello, >>>>>>> >>>>>>> I am trying to restore a pg_basebackup and have the following errors. >>>>>>> >>>>>>> >>>>>> You need to tell us the PG version number *and* show us the full >>>>>> command you ran. >>>>>> >>>>>> >>>>>>> nohup: ignoring input >>>>>>> tar: /tnt/backup/current/7400.tar.gz: Not found in archive >>>>>>> >>>>>> tar: /tnt/backup/current/7401.tar.gz: Not found in archive >>>>>>> tar: /tnt/backup/current/7402.tar.gz: Not found in archive >>>>>>> tar: /tnt/backup/current/base.tar.gz: Not found in archive >>>>>>> tar: /tnt/backup/current/pg_wal.tar.gz: Not found in archive >>>>>>> tar: Exiting with failure status due to previous errors >>>>>>> PG_14_202107181/ >>>>>>> tar: PG_14_202107181: Cannot mkdir: Permission denied >>>>>>> >>>>>> >>>>>> What user are you running pg_basebackup as? >>>>>> >>>>>> >>>>>>> PG_14_202107181/137502/ >>>>>>> tar: PG_14_202107181: Cannot mkdir: Permission denied >>>>>>> tar: PG_14_202107181/137502: Cannot mkdir: No such file or directory >>>>>>> PG_14_202107181/137502/3222926016 >>>>>>> tar: PG_14_202107181: Cannot mkdir: Permission denied >>>>>>> >>>>>>> The files exist in the specified directory. >>>>>>> >>>>>> >>>>>> /tnt??? Or /mnt? >>>>>> >>>>>>
Re: pg_basebackup Restore problem
Yes. I am trying to restore the backups on a standby On Wed, Jan 17, 2024 at 4:04 PM Ron Johnson wrote: > Wait a minute... *tar* is throwing the errors, not pg_basebackup, no? > > On Wed, Jan 17, 2024 at 3:27 PM Johnathan Tiamoh < > johnathantia...@gmail.com> wrote: > >> 1. What's in $PATH_FOLDER? >> >> /tnt/backup/current >> >> 2. What pg_basebackup command did you use? >> >> pg_basebackup -D "$baseback_dir" --format=tar \ >> "${comp_opts[@]}" --wal-method=stream --no-password >> --verbose "${PG_DUMP_OPTS[@]}" >> >> /bin/mv "$baseback_dir"/* "/enf/backup/current/" >> /bin/rm -r "$baseback_dir" >> >> >> 3. Why aren't you letting pg_basebackup maintain the WAL files it needs? >> >> >> On Wed, Jan 17, 2024 at 11:46 AM Ron Johnson >> wrote: >> >>> 1. What's in $PATH_FOLDER? >>> 2. What pg_basebackup command did you use? >>> 3. Why aren't you letting pg_basebackup maintain the WAL files it needs? >>> >>> On Wed, Jan 17, 2024 at 11:27 AM Johnathan Tiamoh < >>> johnathantia...@gmail.com> wrote: >>> >>>> You need to tell us the PG version number *and* show us the full >>>> command you ran. ? >>>> >>>> Postgresql Version 14.10 >>>> >>>> tar -h -zxvf $PATH_FOLDER/* .tar.gz >>>> >>>> ls $PATH_FOLDER/*.tar.gz | xargs -I {} tar -h -zxvf {} >>>> >>>> What user are you running pg_basebackup as? >>>> >>>> I ran it as postgres and now I'm restoring as postgres >>>> >>>> /tnt??? Or /mnt? >>>> /tnt/backup/current >>>> >>>> This is where the backup files are. >>>> >>>> >>>> >>>> >>>> >>>> On Wed, Jan 17, 2024 at 11:15 AM Ron Johnson >>>> wrote: >>>> >>>>> On Wed, Jan 17, 2024 at 11:11 AM Johnathan Tiamoh < >>>>> johnathantia...@gmail.com> wrote: >>>>> >>>>>> Hello, >>>>>> >>>>>> I am trying to restore a pg_basebackup and have the following errors. >>>>>> >>>>>> >>>>> You need to tell us the PG version number *and* show us the full >>>>> command you ran. >>>>> >>>>> >>>>>> nohup: ignoring input >>>>>> tar: /tnt/backup/current/7400.tar.gz: Not found in archive >>>>>> >>>>> tar: /tnt/backup/current/7401.tar.gz: Not found in archive >>>>>> tar: /tnt/backup/current/7402.tar.gz: Not found in archive >>>>>> tar: /tnt/backup/current/base.tar.gz: Not found in archive >>>>>> tar: /tnt/backup/current/pg_wal.tar.gz: Not found in archive >>>>>> tar: Exiting with failure status due to previous errors >>>>>> PG_14_202107181/ >>>>>> tar: PG_14_202107181: Cannot mkdir: Permission denied >>>>>> >>>>> >>>>> What user are you running pg_basebackup as? >>>>> >>>>> >>>>>> PG_14_202107181/137502/ >>>>>> tar: PG_14_202107181: Cannot mkdir: Permission denied >>>>>> tar: PG_14_202107181/137502: Cannot mkdir: No such file or directory >>>>>> PG_14_202107181/137502/3222926016 >>>>>> tar: PG_14_202107181: Cannot mkdir: Permission denied >>>>>> >>>>>> The files exist in the specified directory. >>>>>> >>>>> >>>>> /tnt??? Or /mnt? >>>>> >>>>>
Re: pg_basebackup Restore problem
Wait a minute... *tar* is throwing the errors, not pg_basebackup, no? On Wed, Jan 17, 2024 at 3:27 PM Johnathan Tiamoh wrote: > 1. What's in $PATH_FOLDER? > > /tnt/backup/current > > 2. What pg_basebackup command did you use? > > pg_basebackup -D "$baseback_dir" --format=tar \ > "${comp_opts[@]}" --wal-method=stream --no-password > --verbose "${PG_DUMP_OPTS[@]}" > > /bin/mv "$baseback_dir"/* "/enf/backup/current/" > /bin/rm -r "$baseback_dir" > > > 3. Why aren't you letting pg_basebackup maintain the WAL files it needs? > > > On Wed, Jan 17, 2024 at 11:46 AM Ron Johnson > wrote: > >> 1. What's in $PATH_FOLDER? >> 2. What pg_basebackup command did you use? >> 3. Why aren't you letting pg_basebackup maintain the WAL files it needs? >> >> On Wed, Jan 17, 2024 at 11:27 AM Johnathan Tiamoh < >> johnathantia...@gmail.com> wrote: >> >>> You need to tell us the PG version number *and* show us the full >>> command you ran. ? >>> >>> Postgresql Version 14.10 >>> >>> tar -h -zxvf $PATH_FOLDER/* .tar.gz >>> >>> ls $PATH_FOLDER/*.tar.gz | xargs -I {} tar -h -zxvf {} >>> >>> What user are you running pg_basebackup as? >>> >>> I ran it as postgres and now I'm restoring as postgres >>> >>> /tnt??? Or /mnt? >>> /tnt/backup/current >>> >>> This is where the backup files are. >>> >>> >>> >>> >>> >>> On Wed, Jan 17, 2024 at 11:15 AM Ron Johnson >>> wrote: >>> >>>> On Wed, Jan 17, 2024 at 11:11 AM Johnathan Tiamoh < >>>> johnathantia...@gmail.com> wrote: >>>> >>>>> Hello, >>>>> >>>>> I am trying to restore a pg_basebackup and have the following errors. >>>>> >>>>> >>>> You need to tell us the PG version number *and* show us the full >>>> command you ran. >>>> >>>> >>>>> nohup: ignoring input >>>>> tar: /tnt/backup/current/7400.tar.gz: Not found in archive >>>>> >>>> tar: /tnt/backup/current/7401.tar.gz: Not found in archive >>>>> tar: /tnt/backup/current/7402.tar.gz: Not found in archive >>>>> tar: /tnt/backup/current/base.tar.gz: Not found in archive >>>>> tar: /tnt/backup/current/pg_wal.tar.gz: Not found in archive >>>>> tar: Exiting with failure status due to previous errors >>>>> PG_14_202107181/ >>>>> tar: PG_14_202107181: Cannot mkdir: Permission denied >>>>> >>>> >>>> What user are you running pg_basebackup as? >>>> >>>> >>>>> PG_14_202107181/137502/ >>>>> tar: PG_14_202107181: Cannot mkdir: Permission denied >>>>> tar: PG_14_202107181/137502: Cannot mkdir: No such file or directory >>>>> PG_14_202107181/137502/3222926016 >>>>> tar: PG_14_202107181: Cannot mkdir: Permission denied >>>>> >>>>> The files exist in the specified directory. >>>>> >>>> >>>> /tnt??? Or /mnt? >>>> >>>>
Re: pg_basebackup Restore problem
1. What's in $PATH_FOLDER? /tnt/backup/current 2. What pg_basebackup command did you use? pg_basebackup -D "$baseback_dir" --format=tar \ "${comp_opts[@]}" --wal-method=stream --no-password --verbose "${PG_DUMP_OPTS[@]}" /bin/mv "$baseback_dir"/* "/enf/backup/current/" /bin/rm -r "$baseback_dir" 3. Why aren't you letting pg_basebackup maintain the WAL files it needs? On Wed, Jan 17, 2024 at 11:46 AM Ron Johnson wrote: > 1. What's in $PATH_FOLDER? > 2. What pg_basebackup command did you use? > 3. Why aren't you letting pg_basebackup maintain the WAL files it needs? > > On Wed, Jan 17, 2024 at 11:27 AM Johnathan Tiamoh < > johnathantia...@gmail.com> wrote: > >> You need to tell us the PG version number *and* show us the full command >> you ran. ? >> >> Postgresql Version 14.10 >> >> tar -h -zxvf $PATH_FOLDER/* .tar.gz >> >> ls $PATH_FOLDER/*.tar.gz | xargs -I {} tar -h -zxvf {} >> >> What user are you running pg_basebackup as? >> >> I ran it as postgres and now I'm restoring as postgres >> >> /tnt??? Or /mnt? >> /tnt/backup/current >> >> This is where the backup files are. >> >> >> >> >> >> On Wed, Jan 17, 2024 at 11:15 AM Ron Johnson >> wrote: >> >>> On Wed, Jan 17, 2024 at 11:11 AM Johnathan Tiamoh < >>> johnathantia...@gmail.com> wrote: >>> >>>> Hello, >>>> >>>> I am trying to restore a pg_basebackup and have the following errors. >>>> >>>> >>> You need to tell us the PG version number *and* show us the full >>> command you ran. >>> >>> >>>> nohup: ignoring input >>>> tar: /tnt/backup/current/7400.tar.gz: Not found in archive >>>> >>> tar: /tnt/backup/current/7401.tar.gz: Not found in archive >>>> tar: /tnt/backup/current/7402.tar.gz: Not found in archive >>>> tar: /tnt/backup/current/base.tar.gz: Not found in archive >>>> tar: /tnt/backup/current/pg_wal.tar.gz: Not found in archive >>>> tar: Exiting with failure status due to previous errors >>>> PG_14_202107181/ >>>> tar: PG_14_202107181: Cannot mkdir: Permission denied >>>> >>> >>> What user are you running pg_basebackup as? >>> >>> >>>> PG_14_202107181/137502/ >>>> tar: PG_14_202107181: Cannot mkdir: Permission denied >>>> tar: PG_14_202107181/137502: Cannot mkdir: No such file or directory >>>> PG_14_202107181/137502/3222926016 >>>> tar: PG_14_202107181: Cannot mkdir: Permission denied >>>> >>>> The files exist in the specified directory. >>>> >>> >>> /tnt??? Or /mnt? >>> >>>
Re: pg_basebackup Restore problem
1. What's in $PATH_FOLDER? 2. What pg_basebackup command did you use? 3. Why aren't you letting pg_basebackup maintain the WAL files it needs? On Wed, Jan 17, 2024 at 11:27 AM Johnathan Tiamoh wrote: > You need to tell us the PG version number *and* show us the full command > you ran. ? > > Postgresql Version 14.10 > > tar -h -zxvf $PATH_FOLDER/* .tar.gz > > ls $PATH_FOLDER/*.tar.gz | xargs -I {} tar -h -zxvf {} > > What user are you running pg_basebackup as? > > I ran it as postgres and now I'm restoring as postgres > > /tnt??? Or /mnt? > /tnt/backup/current > > This is where the backup files are. > > > > > > On Wed, Jan 17, 2024 at 11:15 AM Ron Johnson > wrote: > >> On Wed, Jan 17, 2024 at 11:11 AM Johnathan Tiamoh < >> johnathantia...@gmail.com> wrote: >> >>> Hello, >>> >>> I am trying to restore a pg_basebackup and have the following errors. >>> >>> >> You need to tell us the PG version number *and* show us the full command >> you ran. >> >> >>> nohup: ignoring input >>> tar: /tnt/backup/current/7400.tar.gz: Not found in archive >>> >> tar: /tnt/backup/current/7401.tar.gz: Not found in archive >>> tar: /tnt/backup/current/7402.tar.gz: Not found in archive >>> tar: /tnt/backup/current/base.tar.gz: Not found in archive >>> tar: /tnt/backup/current/pg_wal.tar.gz: Not found in archive >>> tar: Exiting with failure status due to previous errors >>> PG_14_202107181/ >>> tar: PG_14_202107181: Cannot mkdir: Permission denied >>> >> >> What user are you running pg_basebackup as? >> >> >>> PG_14_202107181/137502/ >>> tar: PG_14_202107181: Cannot mkdir: Permission denied >>> tar: PG_14_202107181/137502: Cannot mkdir: No such file or directory >>> PG_14_202107181/137502/3222926016 >>> tar: PG_14_202107181: Cannot mkdir: Permission denied >>> >>> The files exist in the specified directory. >>> >> >> /tnt??? Or /mnt? >> >>
Re: pg_basebackup Restore problem
You need to tell us the PG version number *and* show us the full command you ran. ? Postgresql Version 14.10 tar -h -zxvf $PATH_FOLDER/* .tar.gz ls $PATH_FOLDER/*.tar.gz | xargs -I {} tar -h -zxvf {} What user are you running pg_basebackup as? I ran it as postgres and now I'm restoring as postgres /tnt??? Or /mnt? /tnt/backup/current This is where the backup files are. On Wed, Jan 17, 2024 at 11:15 AM Ron Johnson wrote: > On Wed, Jan 17, 2024 at 11:11 AM Johnathan Tiamoh < > johnathantia...@gmail.com> wrote: > >> Hello, >> >> I am trying to restore a pg_basebackup and have the following errors. >> >> > You need to tell us the PG version number *and* show us the full command > you ran. > > >> nohup: ignoring input >> tar: /tnt/backup/current/7400.tar.gz: Not found in archive >> > tar: /tnt/backup/current/7401.tar.gz: Not found in archive >> tar: /tnt/backup/current/7402.tar.gz: Not found in archive >> tar: /tnt/backup/current/base.tar.gz: Not found in archive >> tar: /tnt/backup/current/pg_wal.tar.gz: Not found in archive >> tar: Exiting with failure status due to previous errors >> PG_14_202107181/ >> tar: PG_14_202107181: Cannot mkdir: Permission denied >> > > What user are you running pg_basebackup as? > > >> PG_14_202107181/137502/ >> tar: PG_14_202107181: Cannot mkdir: Permission denied >> tar: PG_14_202107181/137502: Cannot mkdir: No such file or directory >> PG_14_202107181/137502/3222926016 >> tar: PG_14_202107181: Cannot mkdir: Permission denied >> >> The files exist in the specified directory. >> > > /tnt??? Or /mnt? > >
Re: pg_basebackup Restore problem
On Wed, Jan 17, 2024 at 11:11 AM Johnathan Tiamoh wrote: > Hello, > > I am trying to restore a pg_basebackup and have the following errors. > > You need to tell us the PG version number *and* show us the full command you ran. > nohup: ignoring input > tar: /tnt/backup/current/7400.tar.gz: Not found in archive > tar: /tnt/backup/current/7401.tar.gz: Not found in archive > tar: /tnt/backup/current/7402.tar.gz: Not found in archive > tar: /tnt/backup/current/base.tar.gz: Not found in archive > tar: /tnt/backup/current/pg_wal.tar.gz: Not found in archive > tar: Exiting with failure status due to previous errors > PG_14_202107181/ > tar: PG_14_202107181: Cannot mkdir: Permission denied > What user are you running pg_basebackup as? > PG_14_202107181/137502/ > tar: PG_14_202107181: Cannot mkdir: Permission denied > tar: PG_14_202107181/137502: Cannot mkdir: No such file or directory > PG_14_202107181/137502/3222926016 > tar: PG_14_202107181: Cannot mkdir: Permission denied > > The files exist in the specified directory. > /tnt??? Or /mnt?
pg_basebackup Restore problem
Hello, I am trying to restore a pg_basebackup and have the following errors. nohup: ignoring input tar: /tnt/backup/current/7400.tar.gz: Not found in archive tar: /tnt/backup/current/7401.tar.gz: Not found in archive tar: /tnt/backup/current/7402.tar.gz: Not found in archive tar: /tnt/backup/current/base.tar.gz: Not found in archive tar: /tnt/backup/current/pg_wal.tar.gz: Not found in archive tar: Exiting with failure status due to previous errors PG_14_202107181/ tar: PG_14_202107181: Cannot mkdir: Permission denied PG_14_202107181/137502/ tar: PG_14_202107181: Cannot mkdir: Permission denied tar: PG_14_202107181/137502: Cannot mkdir: No such file or directory PG_14_202107181/137502/3222926016 tar: PG_14_202107181: Cannot mkdir: Permission denied The files exist in the specified directory. Thank you Johnathan T.
Re: pg_basebackup
## Matthias Apitz (g...@unixarea.de): > 2023-11-16 20:34:13.538 CET [6250] LOG: terminating walsender process due to > replication timeout Besides "what Lauenz said" (especially about the horribly ooutdated PostgreSQL version): check IO speed and saturation during backup and make sure you're not stalling. I've seen this beaviour a few times, mostly in conjunction with btrfs - using a suitably proven filesystem usually solved the problem (overloaded hardware can be a problem, too - but modern systems can take quite a bit more than in the olden days of spinning rust). Regards, Christoph -- Spare Space.
Re: pg_basebackup
On Mon, 2023-11-20 at 07:30 +0100, Matthias Apitz wrote: > We're facing in a customer installation (PostgreSQL 13.1 on Linux) the > following problem for the first time and not reproducible: 13.1? Your immediate reaction should be "update to the latest minor release". > ${BINDIR}/pg_basebackup -U ${DBSUSER} -Ft -z -D ${BACKUPDIR}-${DATE}-${NUM} > > The resulting stdout/stderr of the script: > > 16.11.2023-20:20:02: pg_basebackup the cluster to > /Backup/postgres/sisis-20231116-1 ... > pg_basebackup: could not receive data from WAL stream: server closed the > connection unexpectedly > This probably means the server terminated abnormally > before or while processing the request. > pg_basebackup: child process exited with error 1 > > pg-error.log: > > 2023-11-16 20:34:13.538 CET [6250] LOG: terminating walsender process due to > replication timeout > > Why the PostgreSQL server says something about "replication", we do > pg_basebackup? Because "pg_basebackup" uses a replication connection. > Some more information: > > - wal_sender_timeout has default value (60s) Increase "wal_sender_timeout", perhaps to 0 (which means "infinite"). Yours, Laurenz Albe
pg_basebackup
Hello, We're facing in a customer installation (PostgreSQL 13.1 on Linux) the following problem for the first time and not reproducible: The effective part of our backup script contains: ... test -d ${BACKUPWAL}-${DATE}-${NUM}/ || mkdir -p ${BACKUPWAL}-${DATE}-${NUM}/ # kick to archive the current log; use a DB which will exist; # psql -U ${DBSUSER} -dpostgres -c "select pg_switch_wal();" > /dev/null # backup the cluster # printf "%s: pg_basebackup the cluster to %s ... " "`date "+%d.%m.%Y-%H:%M:%S"`" ${BACKUPDIR}-${DATE}-${NUM} ${BINDIR}/pg_basebackup -U ${DBSUSER} -Ft -z -D ${BACKUPDIR}-${DATE}-${NUM} ... The resulting stdout/stderr of the script: 16.11.2023-20:20:02: pg_basebackup the cluster to /Backup/postgres/sisis-20231116-1 ... pg_basebackup: could not receive data from WAL stream: server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. pg_basebackup: child process exited with error 1 pg-error.log: 2023-11-16 20:34:13.538 CET [6250] LOG: terminating walsender process due to replication timeout Why the PostgreSQL server says something about "replication", we do pg_basebackup? Some more information: - wal_sender_timeout has default value (60s) - backup target is a local file, not a network storage - the Linux SLES 15 server is good equipped - nothing is logged in /var/log/messages Any ideas? Thanks. matthias -- Matthias Apitz, ✉ g...@unixarea.de, http://www.unixarea.de/ +49-176-38902045 Public GnuPG key: http://www.unixarea.de/key.pub
Re: pg_basebackup / recovery
Στις 13/4/23 01:31, ο/η Michael Paquier έγραψε: On Wed, Apr 12, 2023 at 01:45:56PM +0300, Achilleas Mantzios - cloud wrote: On 4/12/23 12:32, Fabrice Chapuis wrote: During recovery process of a self contained backup, how postgres know to stop reading wal when consistency is reached? Because it knows the full packup info. It will observe the STOP WAL LOCATION: 3BC7/4B000130 (file 00023BC7004B) inside the backup file There is a bit more to that in the recovery logic, depending mostly on the presence of backup_ label file in the data folder when recovery begins. Once the backup_label is found at the beginning of recovery, its information is stored in the control file and the file is renamed to backup_label.old hence stopping the server when recovery has not reached its expected point would rely on the control file contents later on. Then, the startup process and its WAL redo makes sure that WAL replays until it finds the WAL record marking the end of the backup. Grepping for XLOG_BACKUP_END (WAL record type in this case) shows all the areas that rely on that, and xlogrecovery.c covers the most relevant bits. Thank you for the info! Sorry about my stupid typo "packup" :( -- Michael -- Achilleas Mantzios IT DEV - HEAD IT DEPT Dynacom Tankers Mgmt
Re: pg_basebackup / recovery
On Wed, Apr 12, 2023 at 01:45:56PM +0300, Achilleas Mantzios - cloud wrote: > On 4/12/23 12:32, Fabrice Chapuis wrote: >> During recovery process of a self contained backup, how postgres know to >> stop reading wal when consistency is reached? > > Because it knows the full packup info. It will observe the > > STOP WAL LOCATION: 3BC7/4B000130 (file 00023BC7004B) > > inside the backup file There is a bit more to that in the recovery logic, depending mostly on the presence of backup_ label file in the data folder when recovery begins. Once the backup_label is found at the beginning of recovery, its information is stored in the control file and the file is renamed to backup_label.old hence stopping the server when recovery has not reached its expected point would rely on the control file contents later on. Then, the startup process and its WAL redo makes sure that WAL replays until it finds the WAL record marking the end of the backup. Grepping for XLOG_BACKUP_END (WAL record type in this case) shows all the areas that rely on that, and xlogrecovery.c covers the most relevant bits. -- Michael signature.asc Description: PGP signature
Re: pg_basebackup / recovery
On 4/12/23 12:32, Fabrice Chapuis wrote: During recovery process of a self contained backup, how postgres know to stop reading wal when consistency is reached? Because it knows the full packup info. It will observe the STOP WAL LOCATION: 3BC7/4B000130 (file 00023BC7004B) inside the backup file It's a full contained backup because you haven't called with the |-X /|method none|/| , and it doesn't get into standby because you haven't called with |--write-recovery-conf !| |I believe by default it will do what you meant that you want. |
pg_basebackup / recovery
During recovery process of a self contained backup, how postgres know to stop reading wal when consistency is reached?
Re: postgres replication without pg_basebackup? postgres 13.3
On Mon, 2022-11-07 at 23:11 +0100, Pilar de Teodoro wrote: > Thank you very much for the idea. [of running pg_rewind] > We ran pg_rewind correctly: > [postgres@gacsdb05 data-13.3]$ pg_rewind -c -R > --target-pgdata=/PostgresDB/sas_hdd/data-13.3/ --source-server="host= > port= user=postgres password=XXX" > pg_rewind: source and target cluster are on the same timeline > pg_rewind: no rewind required I cannot verify that you ran it correctly. "Target" should be the old server with the extra transactions. "Source" should be the promoted standby server. Promotion switches to a new timeline, so it looks like you did something wrong. > but then we got the following error: > > [2022-11-07 22:57:55 CET-]LOG: starting PostgreSQL 13.3 on > x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-10), > 64-bit > [2022-11-07 22:57:55 CET-]LOG: listening on IPv4 address "0.0.0.0", port > [2022-11-07 22:57:55 CET-]LOG: could not create IPv6 socket for address > "::": Address family not supported by protocol > [2022-11-07 22:57:55 CET-]LOG: listening on Unix socket "/tmp/.s.PGSQL." > [2022-11-07 22:57:55 CET-]LOG: database system was shut down in recovery at > 2022-11-07 22:57:47 CET > [2022-11-07 22:57:55 CET-]LOG: restored log file "0003.history" from > archive > cp: cannot stat '/PostgresWalLogArchive/new/0004.history': No such file > or directory > [2022-11-07 22:57:55 CET-]LOG: entering standby mode > [2022-11-07 22:57:55 CET-]LOG: restored log file "0003.history" from > archive > [2022-11-07 22:57:55 CET-]LOG: invalid primary checkpoint record > [2022-11-07 22:57:55 CET-]PANIC: could not locate a valid checkpoint record > [2022-11-07 22:57:55 CET-]LOG: startup process (PID 3011860) was terminated > by signal 6: Aborted > [2022-11-07 22:57:55 CET-]LOG: aborting startup due to startup process > failure > [2022-11-07 22:57:55 CET-]LOG: database system is shut down > > We have read we can run pg_resetwal but the Wal folder is the folder where > the Wals are archived in the primary. Would that be correct to reset them? Don't run "pg_resetwal". At the very least, it will break your standby. At this point, your standby seems to be broken. I don't know what exactly you did, but it leeks like you should run a "pg_basebackup" after all. Yours, Laurenz Albe
Re: postgres replication without pg_basebackup? postgres 13.3
Dear Laurenz, Thank you very much for the idea. We ran pg_rewind correctly: [postgres@gacsdb05 data-13.3]$ pg_rewind -c -R --target-pgdata=/PostgresDB/sas_hdd/data-13.3/ --source-server="host= port= user=postgres password=XXX" pg_rewind: source and target cluster are on the same timeline pg_rewind: no rewind required but then we got the following error: [2022-11-07 22:57:55 CET-]LOG: starting PostgreSQL 13.3 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-10), 64-bit [2022-11-07 22:57:55 CET-]LOG: listening on IPv4 address "0.0.0.0", port [2022-11-07 22:57:55 CET-]LOG: could not create IPv6 socket for address "::": Address family not supported by protocol [2022-11-07 22:57:55 CET-]LOG: listening on Unix socket "/tmp/.s.PGSQL." [2022-11-07 22:57:55 CET-]LOG: database system was shut down in recovery at 2022-11-07 22:57:47 CET [2022-11-07 22:57:55 CET-]LOG: restored log file "0003.history" from archive cp: cannot stat '/PostgresWalLogArchive/new/0004.history': No such file or directory [2022-11-07 22:57:55 CET-]LOG: entering standby mode [2022-11-07 22:57:55 CET-]LOG: restored log file "0003.history" from archive [2022-11-07 22:57:55 CET-]LOG: invalid primary checkpoint record [2022-11-07 22:57:55 CET-]PANIC: could not locate a valid checkpoint record [2022-11-07 22:57:55 CET-]LOG: startup process (PID 3011860) was terminated by signal 6: Aborted [2022-11-07 22:57:55 CET-]LOG: aborting startup due to startup process failure [2022-11-07 22:57:55 CET-]LOG: database system is shut down We have read we can run pg_resetwal but the Wal folder is the folder where the Wals are archived in the primary. Would that be correct to reset them? Any suggestion? Thank you so much, Pilar El lun, 7 nov 2022 a las 12:21, Laurenz Albe () escribió: > On Mon, 2022-11-07 at 11:02 +0100, Pilar de Teodoro wrote: > > We have a very large database of 37TB and we had to promote our standby > to primary due to > > some disk failures. Now the issues are solved, we would like to make > standby the old primary > > from a copy of the new primary which is already in place. Is it > possible without using pg_basebackup? > > That's exactly what "pg_rewind" is for. It is a fast version of > "pg_basebackup" for exactly > that case. > > Yours, > Laurenz Albe > -- > Cybertec | https://www.cybertec-postgresql.com >
Re: postgres replication without pg_basebackup? postgres 13.3
On Mon, 2022-11-07 at 11:02 +0100, Pilar de Teodoro wrote: > We have a very large database of 37TB and we had to promote our standby to > primary due to > some disk failures. Now the issues are solved, we would like to make standby > the old primary > from a copy of the new primary which is already in place. Is it possible > without using pg_basebackup? That's exactly what "pg_rewind" is for. It is a fast version of "pg_basebackup" for exactly that case. Yours, Laurenz Albe -- Cybertec | https://www.cybertec-postgresql.com
postgres replication without pg_basebackup? postgres 13.3
Dear all, We have a very large database of 37TB and we had to promote our standby to primary due to some disk failures. Now the issues are solved, we would like to make standby the old primary from a copy of the new primary which is already in place. Is it possible without using pg_basebackup? We have already a copy created than can be started but if we open as standby (standby.signal created) the log says when started: [2022-10-28 21:27:12 CEST-]LOG: starting PostgreSQL 13.3 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-10), 64-bit [2022-10-28 21:27:12 CEST-]LOG: listening on IPv4 address "0.0.0.0", port 8300 [2022-10-28 21:27:12 CEST-]LOG: could not create IPv6 socket for address "::": Address family not supported by protocol [2022-10-28 21:27:12 CEST-]LOG: listening on Unix socket "/tmp/.s.PGSQL.8300" [2022-10-28 21:27:12 CEST-]LOG: database system was interrupted; last known up at 2022-10-28 20:31:08 CEST [2022-10-28 21:28:06 CEST-]LOG: restored log file "0003.history" from archive cp: cannot stat '/PostgresWalLogArchive/new/0004.history': No such file or directory [2022-10-28 21:28:06 CEST-]LOG: entering standby mode [2022-10-28 21:28:06 CEST-]LOG: restored log file "0003.history" from archive cp: cannot stat '/PostgresWalLogArchive/new/0003BF72002F': No such file or directory [2022-10-28 21:28:06 CEST-]LOG: restored log file "0002BF72002F" from archive [2022-10-28 21:28:06 CEST-]FATAL: requested timeline 3 is not a child of this server's history *[2022-10-28 21:28:06 CEST-]DETAIL: Latest checkpoint is at BF72/2F0309C0 on timeline 2, but in the history of the requested timeline, the server forked off from that timeline at B1D6/6000.* *[2022-10-28 21:28:06 CEST-]LOG: startup process (PID 1298266) exited with exit code 1* [2022-10-28 21:28:06 CEST-]LOG: aborting startup due to startup process failure [2022-10-28 21:28:07 CEST-]LOG: database system is shut down Do you know if we can make it work without having to run pg_basebackup for 37TB which will take about 3 days to copy? Why it is requesting a different timeline if it is a copy of the primary? Primary replica parameters: wal_level = replica archive_mode = on archive_command = 'cp -i %p /PostgresWalLogArchive/new/%f' standby replica parameters: primary_conninfo = 'host=XXX port=8300 user=postgres password=***' restore_command = 'cp /PostgresWalLogArchive/new/%f %p' archive_cleanup_command = '/home/postgres/software/postgresql/bin/pg_archivecleanup /PostgresWalLogArchive/new/%f %r' Thank you very much for your help in advance. Pilar de Teodoro
Re: About pg_basebackup
On Thu, 2022-05-19 at 13:18 +0900, 菊池祐 wrote: > I executed the pg_basebackup command to set up a replication configuration in > postgresql, > but the following message appears and replication didn’t complete. > > 2741/2742 tablespaces (/var/lib/pgsql/9.2/bac12609150596/12609150596 kB > (100%), 2741/2742 tablespaces (/var/lib/pgsql/9.2/bac12609150596/12609150596 > kB (100%), 2742/2742 tablespaces > NOTICE: pg_stop_backup cleanup done, waiting for required WAL segments to be > archived > WARNING: pg_stop_backup still waiting for all required WAL segments to be > archived (60 seconds elapsed) > HINT: Check that your archive_command is executing properly. pg_stop_backup > can be canceled safely, but the database backup will not be usable without > all the WAL segments. > WARNING: pg_stop_backup still waiting for all required WAL segments to be > archived (120 seconds elapsed) I guess that your "archive_command" is hanging or failing. Yours, Laurenz Albe -- Cybertec | https://www.cybertec-postgresql.com
About pg_basebackup
Hi,I executed the pg_basebackup command to set up a replication configuration in postgresql, but the following message appears and replication didn’t complete.2741/2742 tablespaces (/var/lib/pgsql/9.2/bac12609150596/12609150596 kB (100%), 2741/2742 tablespaces (/var/lib/pgsql/9.2/bac12609150596/12609150596 kB (100%), 2742/2742 tablespaces NOTICE: pg_stop_backup cleanup done, waiting for required WAL segments to be archived WARNING: pg_stop_backup still waiting for all required WAL segments to be archived (60 seconds elapsed) HINT: Check that your archive_command is executing properly. pg_stop_backup can be canceled safely, but the database backup will not be usable without all the WAL segments. WARNING: pg_stop_backup still waiting for all required WAL segments to be archived (120 seconds elapsed) ・ ・ ・Please let me know how to deal with this. The environment is as follows and I attached postgresql.conf file.Version : 9.2OS : Cent 6.6 -6124946774920295119postgresql.conf Description: Binary data
Re: pg_basebackup with hostssl ?
Sent with ProtonMail Secure Email. ‐‐‐ Original Message ‐‐‐ On Wednesday, February 2nd, 2022 at 17:20, Adrian Klaver wrote: > > Before you do that I would establish that you are connecting to the > > correct Postgres instance. > > Good news, all up and running ! The new "postgresql.auto.conf" automagic is pure awesome. ;-)
Re: pg_basebackup with hostssl ?
On 2/2/22 09:18, Laura Smith wrote: Seems like I was looking for the wrong words on the right page ! Thanks, will try that. Before you do that I would establish that you are connecting to the correct Postgres instance. -- Adrian Klaver adrian.kla...@aklaver.com
Re: pg_basebackup with hostssl ?
Sent with ProtonMail Secure Email. ‐‐‐ Original Message ‐‐‐ On Wednesday, February 2nd, 2022 at 16:50, Adrian Klaver wrote: > Not completely: > > https://www.postgresql.org/docs/current/app-pgbasebackup.html > > -d connstr > > --dbname=connstr > > Specifies parameters used to connect to the server, as a > > ; these will override any conflicting command line > > options. > > The option is called --dbname for consistency with other client > > applications, but because pg_basebackup doesn't connect to any > > particular database in the cluster, any database name in the connection > > string will be ignored. > Seems like I was looking for the wrong words on the right page ! Thanks, will try that.
Re: pg_basebackup with hostssl ?
On Wed, Feb 2, 2022 at 9:37 AM Laura Smith < n5d9xq3ti233xiyif...@protonmail.ch> wrote: > The point I am making is that pg_basebackup is connecting without > encryption and I want to know how to tell it to use encryption. The docs > are silent on the subject. > It is a client application that talks libpq. All of them understand a common set of environment variables: https://www.postgresql.org/docs/current/libpq-envars.html Many of those can also be supplied as part of the connection string. These include SSL mode control. The question I would ask is whether psql connects by default using ssl in the same basic configuration. If so, then the inconsistency amounts to pg_basebackup having an insecure default connection method while psql has a secure one. David J.
Re: pg_basebackup with hostssl ?
On 2/2/22 08:37, Laura Smith wrote: Sent with ProtonMail Secure Email. ‐‐‐ Original Message ‐‐‐ On Wednesday, February 2nd, 2022 at 16:30, Adrian Klaver wrote: I am not following. pg_basebackup is a stand alone client that is not involved in replication other then establishing a starting point from which to later establish a replication relationship. Indeed. And that is exactly that I am trying to do (i.e. have master, want new slave). The error is saying that whatever instance you are pointing pg_basebackup at does not have entry in pg_hba.conf for the host/user/encryption combination pg_basebackup is trying to connect as. But it *DOES* have an entry, per my original message: "hostsslreplicationall10.0.0.0/8md5" From the OP: " Long story short, I've got the following in pg_hba.conf of the master: hostsslreplicationall10.0.0.0/8md5 But the slave is complaining: FATAL: no pg_hba.conf entry for replication connection from host "10.1.2.3", user "myrepl", no encryption " The error is coming from a connection to the slave which does not have data yet, correct? Or a matching pg_hba.conf entry, it would seem. And yes, that hba is loaded and live because other remote clients are happily connected to that server and thus reliant on a valid hba.conf. The host is correct. The user is correct. The point I am making is that pg_basebackup is connecting without encryption and I want to know how to tell it to use encryption. The docs are silent on the subject. Not completely: https://www.postgresql.org/docs/current/app-pgbasebackup.html -d connstr --dbname=connstr Specifies parameters used to connect to the server, as a ; these will override any conflicting command line options. The option is called --dbname for consistency with other client applications, but because pg_basebackup doesn't connect to any particular database in the cluster, any database name in the connection string will be ignored. Following the link: https://www.postgresql.org/docs/current/libpq-connect.html#LIBPQ-CONNSTRING You can use that to set sslmode. -- Adrian Klaver adrian.kla...@aklaver.com
Re: pg_basebackup with hostssl ?
Forgot to add that I also have : "hostsslallall10.0.0.0/8md5"
Re: pg_basebackup with hostssl ?
Sent with ProtonMail Secure Email. ‐‐‐ Original Message ‐‐‐ On Wednesday, February 2nd, 2022 at 16:30, Adrian Klaver wrote: > I am not following. pg_basebackup is a stand alone client that is not > > involved in replication other then establishing a starting point from > > which to later establish a replication relationship. > Indeed. And that is exactly that I am trying to do (i.e. have master, want new slave). > The error is saying that whatever instance you are pointing > > pg_basebackup at does not have entry in pg_hba.conf for the > > host/user/encryption combination pg_basebackup is trying to connect as. > But it *DOES* have an entry, per my original message: "hostsslreplicationall10.0.0.0/8md5" And yes, that hba is loaded and live because other remote clients are happily connected to that server and thus reliant on a valid hba.conf. The host is correct. The user is correct. The point I am making is that pg_basebackup is connecting without encryption and I want to know how to tell it to use encryption. The docs are silent on the subject.
Re: pg_basebackup with hostssl ?
On 2/2/22 07:48, Laura Smith wrote: I've had a quick glance through the man page for pg_basebackup but can't see any flags to set ssl. Long story short, I've got the following in pg_hba.conf of the master: hostsslreplicationall10.0.0.0/8md5 But the slave is complaining: FATAL: no pg_hba.conf entry for replication connection from host "10.1.2.3", user "myrepl", no encryption I suspect this is almost certainly because I'm using "hostssl" instead of "host". But other than the obvious and undesirable quick-fix, how should I be calling pg_basebackup to make sure it uses encryption ? I am not following. pg_basebackup is a stand alone client that is not involved in replication other then establishing a starting point from which to later establish a replication relationship. The error is saying that whatever instance you are pointing pg_basebackup at does not have entry in pg_hba.conf for the host/user/encryption combination pg_basebackup is trying to connect as. -- Adrian Klaver adrian.kla...@aklaver.com
pg_basebackup with hostssl ?
I've had a quick glance through the man page for pg_basebackup but can't see any flags to set ssl. Long story short, I've got the following in pg_hba.conf of the master: hostsslreplicationall10.0.0.0/8md5 But the slave is complaining: FATAL: no pg_hba.conf entry for replication connection from host "10.1.2.3", user "myrepl", no encryption I suspect this is almost certainly because I'm using "hostssl" instead of "host". But other than the obvious and undesirable quick-fix, how should I be calling pg_basebackup to make sure it uses encryption ?
Re: Issue with pg_basebackup v.11
Thanks Tom. Regards, Ninad Shah On Sat, 23 Oct 2021 at 20:12, Tom Lane wrote: > Ninad Shah writes: > > Would keepalive setting address and mitigate the issue? > > [ shrug... ] Maybe; nobody else has more information about this > situation than you do. I suggested something to experiment with. > > regards, tom lane >
Re: Issue with pg_basebackup v.11
Ninad Shah writes: > Would keepalive setting address and mitigate the issue? [ shrug... ] Maybe; nobody else has more information about this situation than you do. I suggested something to experiment with. regards, tom lane
Re: Issue with pg_basebackup v.11
Hey Tom, Thank you for your response. Actually, when we copy data using scp/rsync, it works without any issue. But, it fails while attempting to transfer using pg_basebackup. Would keepalive setting address and mitigate the issue? Regards, Ninad Shah On Fri, 22 Oct 2021 at 21:39, Tom Lane wrote: > Ninad Shah writes: > > What I observed is that it takes a couple of hours between below 2 lines. > > > 115454656/1304172127 kB (8%), 0/1 tablespace > > (...atastaging/base/115868/154220.2) > > pgbasebackup: could not read COPY data: could not receive data from > server: > > Connection timed out > > We have heard reports of network connections dropping while pg_basebackup > is busy doing something disk-intensive such as fsync'ing. The apparent > 2-hour delay here does not mean that pg_basebackup was out to lunch for > 2 hours; more likely that reflects the TCP timeout delay before the kernel > realizes that the connection is lost. The actual blame probably resides > with some firewall or router that has a short timeout for idle > connections. > > I'd try turning on fairly aggressive TCP keepalive settings for the > connection, say keepalives_idle=30 or so. > > regards, tom lane >
Re: Issue with pg_basebackup v.11
Ninad Shah writes: > What I observed is that it takes a couple of hours between below 2 lines. > 115454656/1304172127 kB (8%), 0/1 tablespace > (...atastaging/base/115868/154220.2) > pgbasebackup: could not read COPY data: could not receive data from server: > Connection timed out We have heard reports of network connections dropping while pg_basebackup is busy doing something disk-intensive such as fsync'ing. The apparent 2-hour delay here does not mean that pg_basebackup was out to lunch for 2 hours; more likely that reflects the TCP timeout delay before the kernel realizes that the connection is lost. The actual blame probably resides with some firewall or router that has a short timeout for idle connections. I'd try turning on fairly aggressive TCP keepalive settings for the connection, say keepalives_idle=30 or so. regards, tom lane
Issue with pg_basebackup v.11
Hello experts, I am facing an issue with a customer's production server while trying to take backup using pg_basebackup. Below is the log from pg_basebackup execution. * 115338208/1304172127 kB (8%), 0/1 tablespace (...atastaging/base/115868/154220.1) 115355616/1304172127 kB (8%), 0/1 tablespace (...atastaging/base/115868/154220.1) 115372640/1304172127 kB (8%), 0/1 tablespace (...atastaging/base/115868/154220.1) 115389568/1304172127 kB (8%), 0/1 tablespace (...atastaging/base/115868/154220.1) 115405792/1304172127 kB (8%), 0/1 tablespace (...atastaging/base/115868/154220.1) 115423776/1304172127 kB (8%), 0/1 tablespace (...atastaging/base/115868/154220.1) 115440640/1304172127 kB (8%), 0/1 tablespace (...atastaging/base/115868/154220.2) 115454656/1304172127 kB (8%), 0/1 tablespace (...atastaging/base/115868/154220.2) pgbasebackup: could not read COPY data: could not receive data from server: Connection timed out pgbasebackup: removing contents of data directory "/u01/PostgreSQL/11/datastaging"* It copied nearly 110 GB of data and exited. Initially, we suspected it as a network/OS issue. However, we tried to copy a 150 GB large file over the network, which finished successfully. What I observed is that it takes a couple of hours between below 2 lines. 115454656/1304172127 kB (8%), 0/1 tablespace (...atastaging/base/115868/154220.2) pgbasebackup: could not read COPY data: could not receive data from server: Connection timed out In other words, it run for an hour, and later, it takes 2 hours before it times out. Can someone please help me out here? Regards, Ninad Shah
Re: Problem in pg_basebackup of 1.7 TB database while replicating in PostgreSQL 9.4.( Master and slave both are in postgreSQL 9.4)
On 10/4/21 2:28 AM, Amal Chakravarty wrote: Hi all. I am working with a database of 1.7 TB size which is in PostgreSQL 9.4. Kindly suggest how to take the pg_basebackup of such a huge data while replicating from master to slave. Obligatory "9.4 is EOL" comment. -- Angular momentum makes the world go 'round.
Re: Problem in pg_basebackup of 1.7 TB database while replicating in PostgreSQL 9.4.( Master and slave both are in postgreSQL 9.4)
On 10/4/21 9:28 AM, Amal Chakravarty wrote: Hi all. I am working with a database of 1.7 TB size which is in PostgreSQL 9.4. Kindly suggest how to take the pg_basebackup of such a huge data while replicating from master to slave. And what exactly is the issue? Does it fail in some way or are you looking for a faster / more efficient way to transfer the data? regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Problem in pg_basebackup of 1.7 TB database while replicating in PostgreSQL 9.4.( Master and slave both are in postgreSQL 9.4)
Hi all. I am working with a database of 1.7 TB size which is in PostgreSQL 9.4. Kindly suggest how to take the pg_basebackup of such a huge data while replicating from master to slave. Regards, Amal Chakravarty.
Re: pg_basebackup fails with "COPY stream ended"
On Tue, Jun 15, 2021 at 09:53:45PM -0700, Dipanjan Das wrote: > > I am running "pg_basebackup -h -U postgres -D -X stream". It > fails with either of the following two error messages: > [...] > WARNING: terminating connection because of crash of another server process > DETAIL: The postmaster has commanded this server process to roll back the > current transaction and exit, because another server process exited > abnormally and possibly corrupted shared memory. Do you have frequent emergency restarts like that? You should start by investigating on why this is happening. The logs should tell you what process crashed and what it was executing, could you report those messages? We will also probably need a backtrace, see https://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Linux/BSD#Getting_a_trace_from_a_randomly_crashing_backend on how do to that.
pg_basebackup fails with "COPY stream ended"
Hi, I am running "pg_basebackup -h -U postgres -D -X stream". It fails with either of the following two error messages: ERROR: Backup failed copying files. DETAILS: data transfer failure on directory '/mnt/data/barman/base/20210615T212304/data' pg_basebackup error: pg_basebackup: initiating base backup, waiting for checkpoint to complete pg_basebackup: checkpoint completed WARNING: terminating connection because of crash of another server process DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory. HINT: In a moment you should be able to reconnect to the database and repeat your command. pg_basebackup: error: could not read COPY data: SSL SYSCALL error: EOF detected pg_basebackup: removing contents of data directory "/mnt/data/barman/base/20210615T212304/data" OR, ERROR: Backup failed copying files. DETAILS: data transfer failure on directory '/mnt/data/barman/base/20210615T212849/data' pg_basebackup error: pg_basebackup: initiating base backup, waiting for checkpoint to complete pg_basebackup: checkpoint completed pg_basebackup: error: COPY stream ended before last file was finished pg_basebackup: removing contents of data directory "/mnt/data/barman/base/20210615T212849/data" I am running Postgres 12.7 on Ubuntu 20.04. Can anyone please help me figure out what's going wrong?
Re: "missing chunk number 0 for toast value xxx in pg_toast_xxx" when pg_basebackup
On Tue, 2021-04-13 at 06:36 -0700, MaXinjian wrote: > > Why do you write the WAL to /tmp/pg_wal, only to later mount that at the > > default location? > > pg_wal dir has size limitation, if wal files are too large, they will be > overwrited, right? No, they won't. You could run out of space on the file system though. > > I see nothing wrong with what you are doing, but I may have got lost in > > your complicated procedure. > > You don't happen to remove "backup_label", do you? > > em, I do remove backup_label... Then that's your problem. That will corrupt your data, because recovery starts from the wrong checkpoint. > 1. It means recovery.conf is not necessary, backup_label is necessary? Yes, exactly. > 2. Which key in backup_label is necessary? The whole file needs to be preserved unchanged, just as it is. Don't mess with that file. > 3. I searched the log, it do has recoveried. > Then, if there is no backup_label, what's the default START WAL LOCATION and > CHECKPOINT LOCATION? That's the catch. "backup_label" is the *only way* to tell a backup from a crashed PostgreSQL cluster. If there is no "backup_label", PostgreSQL will get the latest checkpoint from the control file (global/pg_control), which may well be later than the checkpoint that started the backup, so you will miss to recover some transactions. Yours, Laurenz Albe -- Cybertec | https://www.cybertec-postgresql.com
Re: "missing chunk number 0 for toast value xxx in pg_toast_xxx" when pg_basebackup
Ma Xinjian writes: > When I use pg_basebackup to backup and restore db(Let's call it A) to a > standalone instance(Let's call it B), "missing chunk number 0 for toast > value xxx in pg_toast_xxx" errors output. > PG version: 10.3 10.3 is quite a few bug fixes ago. Maybe you'd have better results with the current release (10.16). regards, tom lane
Re: "missing chunk number 0 for toast value xxx in pg_toast_xxx" when pg_basebackup
> Why do you write the WAL to /tmp/pg_wal, only to later mount that at the > default location? pg_wal dir has size limitation, if wal files are too large, they will be overwrited, right? > I see nothing wrong with what you are doing, but I may have got lost in > your complicated procedure. > You don't happen to remove "backup_label", do you? em, I do remove backup_label... 1. It means recovery.conf is not necessary, backup_label is necessary? 2. Which key in backup_label is necessary? 3. I searched the log, it do has recoveried. Then, if there is no backup_label, what's the default START WAL LOCATION and CHECKPOINT LOCATION? -- Sent from: https://www.postgresql-archive.org/PostgreSQL-general-f1843780.html
Re: "missing chunk number 0 for toast value xxx in pg_toast_xxx" when pg_basebackup
On Tue, 2021-04-13 at 02:38 -0700, Ma Xinjian wrote: > When I use pg_basebackup to backup and restore db(Let's call it A) to a > standalone instance(Let's call it B), "missing chunk number 0 for toast > value xxx in pg_toast_xxx" errors output. > > PG version: 10.3 > pg_basebackup command: > /usr/pgsql-10/bin/pg_basebackup -h p-rdb-c01 -D /var/lib/pgsql/10/data > -Xs -P -n --waldir=/tmp/pg_wal >I have mounted a disk to /tmp/pg_wal before, then I will mount the disk > to /var/lib/pgsql/10/data/pg_wal, so as to ensure completeness of wal > records during backup. > > Since I don't want B to be a standy server, I just want it to be a > standalone server. > I removed recovery.conf, then simply start postgresql-10.service. It turned > out that postgresql-10.service > can be started successfully. But when I use this postgresql(reindex, vacumm > and so on), "missing chunk number 0 for toast value xxx in pg_toast_xxx" > errors output. > > When pg_basebackup, it will store wal under pg_wal, can't postgresql work > with wal records locally? > I think primary_conninfo in recovery.conf is just used to get newer wal > records from A. Right? > > I have also tested: > If I start postgresql-10.service with recovery.conf firstly, then split it > from postgresql cluster, everything works fine. > > Above test seems proved that it is wal records's problem. I am really > confused. Your mail got me confused... Why do you write the WAL to /tmp/pg_wal, only to later mount that at the default location? I see nothing wrong with what you are doing, but I may have got lost in your complicated procedure. You don't happen to remove "backup_label", do you? Yours, Laurenz Albe -- Cybertec | https://www.cybertec-postgresql.com
"missing chunk number 0 for toast value xxx in pg_toast_xxx" when pg_basebackup
Hi, When I use pg_basebackup to backup and restore db(Let's call it A) to a standalone instance(Let's call it B), "missing chunk number 0 for toast value xxx in pg_toast_xxx" errors output. PG version: 10.3 pg_basebackup command: /usr/pgsql-10/bin/pg_basebackup -h p-rdb-c01 -D /var/lib/pgsql/10/data -Xs -P -n --waldir=/tmp/pg_wal I have mounted a disk to /tmp/pg_wal before, then I will mount the disk to /var/lib/pgsql/10/data/pg_wal, so as to ensure completeness of wal records during backup. Since I don't want B to be a standy server, I just want it to be a standalone server. I removed recovery.conf, then simply start postgresql-10.service. It turned out that postgresql-10.service can be started successfully. But when I use this postgresql(reindex, vacumm and so on), "missing chunk number 0 for toast value xxx in pg_toast_xxx" errors output. When pg_basebackup, it will store wal under pg_wal, can't postgresql work with wal records locally? I think primary_conninfo in recovery.conf is just used to get newer wal records from A. Right? I have also tested: If I start postgresql-10.service with recovery.conf firstly, then split it from postgresql cluster, everything works fine. Above test seems proved that it is wal records's problem. I am really confused. Regards Ma Xinjian -- Sent from: https://www.postgresql-archive.org/PostgreSQL-general-f1843780.html
Re: "invalid record length" after restoring pg_basebackup
Dennis Jacobfeuerborn writes: > On 11/13/20 4:02 PM, Tom Lane wrote: >> This looks quite normal to me. If you'd pulled the power plug on the >> primary system at the time you made this backup, you would likely see >> the same message at the end of its crash recovery. Some sort of >> corrupt-WAL-entry report is expected at the end of WAL replay anytime >> you didn't have a clean shutdown. > But the system the backup was pulled from kept running fine. Also > wouldn't that make re-attaching the system to the primary impossible > since replication cannot be continued due to the broken WAL record? There is no "broken WAL record". There is only junk following the primary's current WAL write point. > What I would expect is that pg_basebackup only transfer healthy WAL > entries so that a restored system can pick up right after that with > streaming replication. You need to adjust your expectations. pg_basebackup doesn't parse the WAL data, because it has no need to. It just copies whole WAL segment files. regards, tom lane
Re: "invalid record length" after restoring pg_basebackup
On 11/13/20 4:02 PM, Tom Lane wrote: > Dennis Jacobfeuerborn writes: >> All of this works fine and the logs report that the db reaches a >> consistent recovery state but as last entry it reports an "invalid >> record length": > > This looks quite normal to me. If you'd pulled the power plug on the > primary system at the time you made this backup, you would likely see > the same message at the end of its crash recovery. Some sort of > corrupt-WAL-entry report is expected at the end of WAL replay anytime > you didn't have a clean shutdown. But the system the backup was pulled from kept running fine. Also wouldn't that make re-attaching the system to the primary impossible since replication cannot be continued due to the broken WAL record? What I would expect is that pg_basebackup only transfer healthy WAL entries so that a restored system can pick up right after that with streaming replication. Regards, Dennis
Re: "invalid record length" after restoring pg_basebackup
Dennis Jacobfeuerborn writes: > All of this works fine and the logs report that the db reaches a > consistent recovery state but as last entry it reports an "invalid > record length": This looks quite normal to me. If you'd pulled the power plug on the primary system at the time you made this backup, you would likely see the same message at the end of its crash recovery. Some sort of corrupt-WAL-entry report is expected at the end of WAL replay anytime you didn't have a clean shutdown. regards, tom lane
"invalid record length" after restoring pg_basebackup
Hi, I've run into a strange issue after restoring a backup that I created using pg_basebackup on a standby instance. The command I use to create the backup is this: pg_basebackup -v --write-recovery-conf -h$BACKUP_HOST -p5432 -U$BACKUP_USER --format tar --wal-method stream --compress=2 -D "$BACKUP_DIR" This backup runs fine and produces a "base.tar.gz" and "pg_wal.tar.gz" file. The server version is 11.7. To restore the backup on another system I unpack the "base.tar.gz" file into the data directory and "pg_wal.tar.gz" into the "pg_wal" sub-directory. I then comment out the "primary_conninfo" directive in the "recovery.conf" file since I don't want to replicate from the primary on this system I just want to restore the state from the backup. All of this works fine and the logs report that the db reaches a consistent recovery state but as last entry it reports an "invalid record length": 2020-11-13 12:25:34.266 UTC [1] LOG: listening on IPv4 address "0.0.0.0", port 5432 2020-11-13 12:25:34.266 UTC [1] LOG: listening on IPv6 address "::", port 5432 2020-11-13 12:25:34.269 UTC [1] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432" 2020-11-13 12:25:34.437 UTC [20] LOG: database system was interrupted while in recovery at log time 2020-11-03 14:36:00 UTC 2020-11-13 12:25:34.437 UTC [20] HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target. 2020-11-13 12:25:35.359 UTC [20] WARNING: recovery command file "recovery.conf" specified neither primary_conninfo nor restore_command 2020-11-13 12:25:35.359 UTC [20] HINT: The database server will regularly poll the pg_wal subdirectory to check for files placed there. 2020-11-13 12:25:35.359 UTC [20] LOG: entering standby mode 2020-11-13 12:25:35.372 UTC [20] LOG: redo starts at 932/9D123E70 2020-11-13 12:26:45.435 UTC [20] LOG: consistent recovery state reached at 933/AA63F600 2020-11-13 12:26:45.435 UTC [1] LOG: database system is ready to accept read only connections 2020-11-13 12:26:45.444 UTC [20] LOG: invalid record length at 933/AA6DA660: wanted 24, got 0 Looking at the WAL data the last file is "0001093300AA" and there is a file "archive_status/0001093300A9.done" but no corresponding ".done" file for the "...AA" WAL file. Running pg_waldump on the "...AA" file shows the following at the end: $ pg_waldump 0001093300AA ... rmgr: Heaplen (rec/tot): 54/54, tx:1743220, lsn: 933/AA6DA510, prev 933/AA6D8F08, desc: LOCK off 12: xid 1743220: flags 0 LOCK_ONLY KEYSHR_LOCK , blkref #0: rel 1663/13117/874105 blk 0 rmgr: Heaplen (rec/tot): 81/81, tx:1743220, lsn: 933/AA6DA548, prev 933/AA6DA510, desc: INSERT off 19, blkref #0: rel 1663/13117/868296 blk 0 rmgr: Btree len (rec/tot): 80/80, tx:1743220, lsn: 933/AA6DA5A0, prev 933/AA6DA548, desc: INSERT_LEAF off 24, blkref #0: rel 1663/13117/868299 blk 1 rmgr: Btree len (rec/tot): 72/72, tx:1743220, lsn: 933/AA6DA5F0, prev 933/AA6DA5A0, desc: INSERT_LEAF off 23, blkref #0: rel 1663/13117/915254 blk 2 rmgr: Transaction len (rec/tot): 34/34, tx:1743220, lsn: 933/AA6DA638, prev 933/AA6DA5F0, desc: COMMIT 2020-11-03 14:49:29.205562 UTC pg_waldump: FATAL: error in WAL record at 933/AA6DA638: invalid record length at 933/AA6DA660: wanted 24, got 0 It looks like the WAL in the backup is inconsistent at the end but I'd expect pg_basebackup to create a clean backup. Is this something I should expect or is this a problem with pg_basebackup? Regards, Dennis
Re: pg_basebackup + delta base backups
Greetings, * Christopher Pereira (krip...@imatronix.cl) wrote: > On 26-May-20 10:20, Stephen Frost wrote: > >"out of sync" is a bit of an odd concept, but having a replica fall > >behind a long way is certainly something that can happen and may require > >a rebuild from a backup (or from a new sync off of the primary in some > >other way, as you suggest below). In a situation where there's async > >replication happening and you promote a replica to take over, that's > >definitely a case where you might also have to rebuild the former > >primary. > > Yes, a common case with async streaming is when primary (A) goes down and > replica is promoted as a new master (B). > Then A comes back and has some data that was not streamed to B so pg_rewind > is useless. > > I wonder if there is some option to just discard this branched data from A > in order to start as a new replica. ... This is exactly what pg_rewind is for, so I'm not really sure what you're asking about here. > I noticed that pg_rewind is useless even when both DBs are identical > (according to pg_dumpall | md5sum). That isn't really a meaningful or useful thing to consider. > >>As you said, all the pieces are there and it would be quite easy to write a > >>new "pg_basebackup_delta" script that could be executed on the standby host > >>to: > >> > >>1) setup a pgBackRest repo on the primary host (via SSH) > >> > >>2) create a backup on the primary host (via SSH) > >> > >>3) do a delta restore on the standby > >> > >>Even when the repository on the primary host is only created temporarily > >>(and require double storage, resources, etc), it may still be worth > >>considering the traffic that can be saved by doing a delta restore on a > >>standby host in a different region, right? > >So... There's actually a way to do this with pgbackrest, but it doesn't > >support the delta capability. > > If I understood correctly the method you described, you were basically doing > a "backup" between A (primary) and B (repo) and in such a way the repo is > then compatible with the pg_data structure, but without delta support (ie. > transfering the whole database)? Yes. > Delta support is critical for VLDBs, so I see two alternatives to replace > pg_basebackup with pgbackrest to rebuild a replica: > > 1) Create a temporary repo on the primary > > 2) Create a temporary repo on the replica Yes, if you use a repo then you can use pgbackrest's incremental and delta features. > All configurations would be undone after the replica has been rebuilt and > both alternatives would be using delta over the wire. > In your opinion, which alternative is better considering network traffic? If you want to minimize network traffic then probably using either incremental backups or delta restore would make the most sense. Thanks, Stephen signature.asc Description: PGP signature
Re: canceling statement due to conflict with recovery after pg_basebackup
At Wed, 3 Jun 2020 10:07:14 +0300, "Andrus" wrote in > Hi! Hi. > Async binary replication hot standby was started after pg_basebackup. > Running query in slave throws error > > ERROR: canceling statement due to conflict with recovery > > Why ? As written in the messages. > 2020-06-03 09:54:23 EEST 85.253.131.166 user@sba DETAIL: User query > might have needed to see row versions that must be removed. Despite the master had removed some rows by vacuum and that must be reflected to your standby, the standby cannot do that because of the transaction that has started before the rows are vacuumed. More technically, the transaction on the standby was using a snapshot with older transaction ids than the vacuum cutoff transaction id on the master. So the standby needed to cancel the the statement or session in order to continue replication. > Query should return table and other sizes in decreasing order. > How to improve it so that this error does not occur. Hot-standby-feedback would work. https://www.postgresql.org/docs/12/runtime-config-replication.html#GUC-HOT-STANDBY-FEEDBACK > Log: > > 2020-06-03 09:40:52 EEST LOG: database system was interrupted; last > known up at 2020-06-03 07:59:56 EEST > 2020-06-03 09:41:10 EEST LOG: entering standby mode > 2020-06-03 09:41:10 EEST LOG: redo starts at 2E2/28 > 2020-06-03 09:41:19 EEST LOG: consistent recovery state reached at > 2E2/B5A56C8 > 2020-06-03 09:41:19 EEST LOG: database system is ready to accept read > only connections > 2020-06-03 09:41:19 EEST LOG: started streaming WAL from primary at > 2E2/C00 on timeline 1 > 2020-06-03 09:54:23 EEST 85.253.131.166 user@sba ERROR: canceling > statement due to conflict with recovery > 2020-06-03 09:54:23 EEST 85.253.131.166 user@sba DETAIL: User query > might have needed to see row versions that must be removed. > 2020-06-03 09:54:23 EEST 85.253.131.166 user@sba STATEMENT: select > company_name(n.nspname)::char(20) as company, > relname::char(25), >pg_size_pretty(pg_total_relation_size(c.oid))::char(10) as totalsize, > n.nspname::char(12), >case > when c.relkind='i' then 'index' >when c.relkind='t' then 'toast' > when c.relkind='r' then 'table' > when c.relkind='v' then 'view' > when c.relkind='c' then 'composite type' > when c.relkind='S' then 'sequence' >else c.relkind::text > end ::char(14) as "type" > from >pg_class c >left join pg_namespace n on n.oid = c.relnamespace >left join pg_tablespace t on t.oid = c.reltablespace > where >(pg_total_relation_size(c.oid)>>21)>0 and c.relkind!='t' > order by >pg_total_relation_size(c.oid) desc regards. -- Kyotaro Horiguchi NTT Open Source Software Center
canceling statement due to conflict with recovery after pg_basebackup
Hi! Async binary replication hot standby was started after pg_basebackup. Running query in slave throws error ERROR: canceling statement due to conflict with recovery Why ? Query should return table and other sizes in decreasing order. How to improve it so that this error does not occur. Log: 2020-06-03 09:40:52 EEST LOG: database system was interrupted; last known up at 2020-06-03 07:59:56 EEST 2020-06-03 09:41:10 EEST LOG: entering standby mode 2020-06-03 09:41:10 EEST LOG: redo starts at 2E2/28 2020-06-03 09:41:19 EEST LOG: consistent recovery state reached at 2E2/B5A56C8 2020-06-03 09:41:19 EEST LOG: database system is ready to accept read only connections 2020-06-03 09:41:19 EEST LOG: started streaming WAL from primary at 2E2/C00 on timeline 1 2020-06-03 09:54:23 EEST 85.253.131.166 user@sba ERROR: canceling statement due to conflict with recovery 2020-06-03 09:54:23 EEST 85.253.131.166 user@sba DETAIL: User query might have needed to see row versions that must be removed. 2020-06-03 09:54:23 EEST 85.253.131.166 user@sba STATEMENT: select company_name(n.nspname)::char(20) as company, relname::char(25), pg_size_pretty(pg_total_relation_size(c.oid))::char(10) as totalsize, n.nspname::char(12), case when c.relkind='i' then 'index' when c.relkind='t' then 'toast' when c.relkind='r' then 'table' when c.relkind='v' then 'view' when c.relkind='c' then 'composite type' when c.relkind='S' then 'sequence' else c.relkind::text end ::char(14) as "type" from pg_class c left join pg_namespace n on n.oid = c.relnamespace left join pg_tablespace t on t.oid = c.reltablespace where (pg_total_relation_size(c.oid)>>21)>0 and c.relkind!='t' order by pg_total_relation_size(c.oid) desc Andrus.
Re: How to start slave after pg_basebackup. Why min_wal_size and wal_keep_segments are duplicated
Hi Andrus, > On 01. Jun, 2020, at 12:26, Andrus wrote: > My script does the same thing as your comands. no, it does not. And as long as you refuse to understand that, I can't help you. > I want to create replication server. no, you want to create a streaming replica. > Old cluster is empty, from initdb. Backup is not needed then why the mv stuff? > pg_basebackup uses environment varuables if not specified in command line. So > my script does the same thing. have you looked at the other options? > I tried > sudo --user=postgres pg_basebackup > but got error > could not change directory to "/root": Permission denied try sudo su - postgres pg_basebackup Note the "-"! It's essential to get the environment of postgres, which is what you want. Cheers, Paul
Re: How to start slave after pg_basebackup. Why min_wal_size and wal_keep_segments are duplicated
Hi! How to create replication server ? I always do it this way and it work for me: $ pg_basebackup -h ${PGHOST} -p ${PGPORT} -U replicator -W -R -D ${PGDATA} -P -v -Fp -Xs After that, I edit ${PGDATA}/postgresql.conf and (w/ PostgreSQL 11 and older ${PGDATA}/recovery.conf) to make it do what I want and then I just launch it: $ pg_ctl start My script does the same thing as your comands. From that moment onward, it replicates and applies to the replica. Checks in pg_stat_replication on the master and pg_stat_wal_receiver on the replica >confirm that. They also show the WAL switches. To provoke a WAL switch I always do: postgres=# checkpoint; select pg_switch_wal(); CHECKPOINT pg_switch_wal I just don't understand what you're trying to achieve here. I want to create replication server. My guess is, you want to stop and backup the old database cluster, Old cluster is empty, from initdb. Backup is not needed then create a new one in its old directory, right? pg_basebackup creates new main directory. In this case, you probably need to change your script to something like this: PGHOST=remote.example.com PGPASSWORD=mypass PGUSER=replikaator PGDATA=/var/lib/postgresql/12/main export PGHOST PGPASSWORD PGUSER PGDATA /etc/init.d/postgresql stop mv ${PGDATA} /var/lib/postgresql/12/mainennebaasbakuppi pg_basebackup -h ${PGHOST} -p ${PGPORT} -U ${PGUSER} -W -R -D ${PGDATA} -P -v -Fp -Xs /etc/init.d/postgresql start pg_basebackup uses environment varuables if not specified in command line. So my script does the same thing. Note that my invocation of pg_basebackup asks for the replicator password. This is intended. You'd probably want to change that. Also, no need to play around with ownership and permissions. Do it as "postgres", not as "root". I tried sudo --user=postgres pg_basebackup but got error could not change directory to "/root": Permission denied Andrus.
Re: How to start slave after pg_basebackup. Why min_wal_size and wal_keep_segments are duplicated
On Mon, Jun 1, 2020 at 10:17 AM Andrus wrote: > Hi! > > > I have tried to re-initiate replica serveral times in low-use time but > this error occurs again. > >remove the whole replica's PGDATA/* and do a pg_basebackup again. But > before that, make sure wal_keep_segments in big enough on the > >master and, > > I renamed whole cluster before pg_basebackup > > >just as much important, do a vacuumdb -a (takes much space during the > process) and use archiving! > > I run vacuumdb --full --all before pg_basebackup > > > If named replication slot is used commands like > > vacuumdb --all --full > > will cause main server crash due to disk space limit. pg_wal directory > will occupy free disk space. After that main server stops. > >>if you have disk constraints you will run into trouble sooner or later > anyway. Make sure, you have enough disk space. There's no > >>way around that anyway. > > This space is sufficient for base backup and replication. > > >> I tried using wal_keep_segments =180 > >> Will setting wal_keep_segments to higher value allw replication start > after pg_basebackup ? > >it depends. If you start the replica immediately and don't wait for hours > or days, you should be good to go. But that depends on > >different factors, for example, how >many WAL files are written during > the pg_basebackup and pg_ctl start of the replica. If more > >than 180 WALs have gone by on the master because it is really busy, >then > you're probably lost again. Point being, you'll have to > >launch the replica before WALs are expired! > >Again: Make sure you have enough disk space, use archiving and use a > replication slot. > > I tried with wal_keep_segments=360 but problem persisists. > Server generates lot of less than 300 wal files. > Have you verified that wal_keep_segments actually end up at 360, by connecting to the database and issuing SHOW wal_keep_segments? I've seen far too many examples of people who accidentally had a second line that overrode the one they thought they changed, and thus still ran with a lower number. Shell script starts server after pg_basebackup completes automatically: > > PGHOST=example.com > PGPASSWORD=mypass > PGUSER=replikaator > export PGHOST PGPASSWORD PGUSER > /etc/init.d/postgresql stop > mv /var/lib/postgresql/12/main /var/lib/postgresql/12/mainennebaasbakuppi > pg_basebackup --verbose --progress --write-recovery-conf -D > /var/lib/postgresql/12/main > chmod --recursive --verbose 0700 /var/lib/postgresql/12/main > chown -Rv postgres:postgres /var/lib/postgresql/12/main > /etc/init.d/postgresql start > Do you get any useful output from the -v part of pg_basebackup? It should for example tell you the exact start and stop point in the wal during the basebackup, that can be correlated to the msising file. Normally the window between end of pg_basebackup and start of the actual service is not big enough to cause a problem (since v12 will do a streaming receive of the logs *during* the backup -- it could be a big problem before that was possible, or if one forgot to enable it before it was the default), and it certainly sounds weird that it should be in your case, unless the chmod and chown commands take a *long* time. But if it is, there is nothing preventing you from creating a slot just during setup and then get rid of it. That is: 1. create slot 2. pg_basebackup with slot 3. start replication with slot 4. restart replication without slot once it's caught up 5. drop slot However, if you want reliable replication, you really should have a slot. Or at least, you should have either a slot *or* log archiving that's read-accessible from the replica. -- Magnus Hagander Me: https://www.hagander.net/ <http://www.hagander.net/> Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>
Re: How to start slave after pg_basebackup. Why min_wal_size and wal_keep_segments are duplicated
Hi Andrus, > On 01. Jun, 2020, at 10:17, Andrus wrote: > Shell script starts server after pg_basebackup completes automatically: > > PGHOST=example.com > PGPASSWORD=mypass > PGUSER=replikaator > export PGHOST PGPASSWORD PGUSER > /etc/init.d/postgresql stop > mv /var/lib/postgresql/12/main /var/lib/postgresql/12/mainennebaasbakuppi > pg_basebackup --verbose --progress --write-recovery-conf -D > /var/lib/postgresql/12/main > chmod --recursive --verbose 0700 /var/lib/postgresql/12/main > chown -Rv postgres:postgres /var/lib/postgresql/12/main > /etc/init.d/postgresql start > > How to create replication server ? I always do it this way and it work for me: $ pg_basebackup -h ${PGHOST} -p ${PGPORT} -U replicator -W -R -D ${PGDATA} -P -v -Fp -Xs After that, I edit ${PGDATA}/postgresql.conf and (w/ PostgreSQL 11 and older ${PGDATA}/recovery.conf) to make it do what I want and then I just launch it: $ pg_ctl start From that moment onward, it replicates and applies to the replica. Checks in pg_stat_replication on the master and pg_stat_wal_receiver on the replica confirm that. They also show the WAL switches. To provoke a WAL switch I always do: postgres=# checkpoint; select pg_switch_wal(); CHECKPOINT pg_switch_wal --- C/5128 (1 row) I just don't understand what you're trying to achieve here. My guess is, you want to stop and backup the old database cluster, then create a new one in its old directory, right? In this case, you probably need to change your script to something like this: PGHOST=remote.example.com PGPASSWORD=mypass PGUSER=replikaator PGDATA=/var/lib/postgresql/12/main export PGHOST PGPASSWORD PGUSER PGDATA /etc/init.d/postgresql stop mv ${PGDATA} /var/lib/postgresql/12/mainennebaasbakuppi pg_basebackup -h ${PGHOST} -p ${PGPORT} -U ${PGUSER} -W -R -D ${PGDATA} -P -v -Fp -Xs /etc/init.d/postgresql start Note that my invocation of pg_basebackup asks for the replicator password. This is intended. You'd probably want to change that. Also, no need to play around with ownership and permissions. Do it as "postgres", not as "root". Cheers, Paul
Re: How to start slave after pg_basebackup. Why min_wal_size and wal_keep_segments are duplicated
Hi! I want to create hot standby async server using /etc/init.d/postgresql stop mv /var/lib/postgresql/12/main /var/lib/postgresql/12/mainold pg_basebackup --write-recovery-conf -D /var/lib/postgresql/12/main I don't see where the base backup is being taken from just where it is going. It is taken from VPS server over 20 Mbit public internet connection. Both servers are running Debian Linux. I dont receive WALs. If you are doing binary replication then you are receiving WALs. It just a matter of whether you are streaming them or shipping them over complete. Using wal_keep_segments=360 also causes same wal file not found error after pg_basebackup. master server has 6GB wal files. wal log during pg_basebackup is much slower than 360. Maybe pg_basebackup skips wal segments . Maybe using wal_compression=on causes the issue. How to fix this ? How to create base backup so that cluster is tranferred over internet faster? Maybe it can transferred in compressed form over internet. Andrus.
Re: How to start slave after pg_basebackup. Why min_wal_size and wal_keep_segments are duplicated
On 5/31/20 2:43 PM, Andrus wrote: Hi! In addition to my most recent questions: What are you trying to achieve? I want to create hot standby async server using /etc/init.d/postgresql stop mv /var/lib/postgresql/12/main /var/lib/postgresql/12/mainold pg_basebackup --write-recovery-conf -D /var/lib/postgresql/12/main I don't see where the base backup is being taken from just where it is going. chmod --recursive --verbose 0700 /var/lib/postgresql/12/main chown -Rv postgres:postgres /var/lib/postgresql/12/main /etc/init.d/postgresql start In other words why do a pg_basebackup if you have a standby receiving WALs? I dont receive WALs. If you are doing binary replication then you are receiving WALs. It just a matter of whether you are streaming them or shipping them over complete. Andrus. -- Adrian Klaver adrian.kla...@aklaver.com
Re: How to start slave after pg_basebackup. Why min_wal_size and wal_keep_segments are duplicated
Hi! Will wal_keep_segments keep segments also if named replication slot is lot used ? Well if you are using a replication slot there is no point in using wal_keep_segments. Slots where created in, part at least, so you did not have to guess at a wal_keep_segments number. I dont use slot. To really answer this we will need to see the exact commands you are using and the sequence they are done in. Replication server is created using /etc/init.d/postgresql stop mv /var/lib/postgresql/12/main /var/lib/postgresql/12/mainold pg_basebackup --write-recovery-conf -D /var/lib/postgresql/12/main chmod --recursive --verbose 0700 /var/lib/postgresql/12/main chown -Rv postgres:postgres /var/lib/postgresql/12/main /etc/init.d/postgresql start Andrus.
Re: How to start slave after pg_basebackup. Why min_wal_size and wal_keep_segments are duplicated
Hi! In addition to my most recent questions: What are you trying to achieve? I want to create hot standby async server using /etc/init.d/postgresql stop mv /var/lib/postgresql/12/main /var/lib/postgresql/12/mainold pg_basebackup --write-recovery-conf -D /var/lib/postgresql/12/main chmod --recursive --verbose 0700 /var/lib/postgresql/12/main chown -Rv postgres:postgres /var/lib/postgresql/12/main /etc/init.d/postgresql start In other words why do a pg_basebackup if you have a standby receiving WALs? I dont receive WALs. Andrus.
Re: How to start slave after pg_basebackup. Why min_wal_size and wal_keep_segments are duplicated
Hi Andrus, > On 31. May, 2020, at 22:56, Andrus wrote: > wal files are not archieved. IMHO a bad decision. They should be. Now you're in the situation where you see why. > I have tried to re-initiate replica serveral times in low-use time but this > error occurs again. remove the whole replica's PGDATA/* and do a pg_basebackup again. But before that, make sure wal_keep_segments in big enough on the master and, just as much important, do a vacuumdb -a (takes much space during the process) and use archiving! > If named replication slot is used commands like > vacuumdb --all --full > will cause main server crash due to disk space limit. pg_wal directory will > occupy free disk space. After that main server stops. if you have disk constraints you will run into trouble sooner or later anyway. Make sure, you have enough disk space. There's no way around that anyway. > I tried using wal_keep_segments =180 > Will setting wal_keep_segments to higher value allw replication start after > pg_basebackup ? it depends. If you start the replica immediately and don't wait for hours or days, you should be good to go. But that depends on different factors, for example, how many WAL files are written during the pg_basebackup and pg_ctl start of the replica. If more than 180 WALs have gone by on the master because it is really busy, then you're probably lost again. Point being, you'll have to launch the replica before WALs are expired! Again: Make sure you have enough disk space, use archiving and use a replication slot. Cheers, Paul
Re: How to start slave after pg_basebackup. Why min_wal_size and wal_keep_segments are duplicated
On 5/31/20 2:03 PM, Andrus wrote: Hi! I'm guessing are looking for: https://www.postgresql.org/docs/12/runtime-config-replication.html 26.2.6. Replication Slots Replication slots provide an automated way to ensure that the master does not remove WAL segments until they have been received by all standbys, and that the master does not remove rows which could cause a recovery conflict even when the standby is disconnected. Using replication slot can cause pg_wal directoy to occupy all free disk space and after that server stop respondig. This is spelled out here: https://www.postgresql.org/docs/12/warm-standby.html#STREAMING-REPLICATION "If you use streaming replication without file-based continuous archiving, the server might recycle old WAL segments before the standby has received them. If this occurs, the standby will need to be reinitialized from a new base backup. You can avoid this by setting wal_keep_segments to a value large enough to ensure that WAL segments are not recycled too early, or by configuring a replication slot for the standby. Will wal_keep_segments keep segments also if named replication slot is lot used ? In addition to my most recent questions: What are you trying to achieve? In other words why do a pg_basebackup if you have a standby receiving WALs? Andrus. -- Adrian Klaver adrian.kla...@aklaver.com
Re: How to start slave after pg_basebackup. Why min_wal_size and wal_keep_segments are duplicated
On 5/31/20 2:03 PM, Andrus wrote: Hi! I'm guessing are looking for: https://www.postgresql.org/docs/12/runtime-config-replication.html 26.2.6. Replication Slots Replication slots provide an automated way to ensure that the master does not remove WAL segments until they have been received by all standbys, and that the master does not remove rows which could cause a recovery conflict even when the standby is disconnected. Using replication slot can cause pg_wal directoy to occupy all free disk space and after that server stop respondig. This is spelled out here: https://www.postgresql.org/docs/12/warm-standby.html#STREAMING-REPLICATION "If you use streaming replication without file-based continuous archiving, the server might recycle old WAL segments before the standby has received them. If this occurs, the standby will need to be reinitialized from a new base backup. You can avoid this by setting wal_keep_segments to a value large enough to ensure that WAL segments are not recycled too early, or by configuring a replication slot for the standby. Will wal_keep_segments keep segments also if named replication slot is lot used ? Well if you are using a replication slot there is no point in using wal_keep_segments. Slots where created in, part at least, so you did not have to guess at a wal_keep_segments number. To really answer this we will need to see the exact commands you are using and the sequence they are done in. Andrus. -- Adrian Klaver adrian.kla...@aklaver.com
Re: How to start slave after pg_basebackup. Why min_wal_size and wal_keep_segments are duplicated
Hi! I'm guessing are looking for: https://www.postgresql.org/docs/12/runtime-config-replication.html 26.2.6. Replication Slots Replication slots provide an automated way to ensure that the master does not remove WAL segments until they have been received by all standbys, and that the master does not remove rows which could cause a recovery conflict even when the standby is disconnected. Using replication slot can cause pg_wal directoy to occupy all free disk space and after that server stop respondig. This is spelled out here: https://www.postgresql.org/docs/12/warm-standby.html#STREAMING-REPLICATION "If you use streaming replication without file-based continuous archiving, the server might recycle old WAL segments before the standby has received them. If this occurs, the standby will need to be reinitialized from a new base backup. You can avoid this by setting wal_keep_segments to a value large enough to ensure that WAL segments are not recycled too early, or by configuring a replication slot for the standby. Will wal_keep_segments keep segments also if named replication slot is lot used ? Andrus.
Re: How to start slave after pg_basebackup. Why min_wal_size and wal_keep_segments are duplicated
Hi Andrus, > On 31. May, 2020, at 21:47, Andrus wrote: > replikaator@[unknown] ERROR: requested WAL segment 000102CF00E9 > has already been removed the message says it all. You need to copy the WAL file 000102CF00E9 and newer to the replica's pg_wal directory because it has been removed already on the master site. Obviously, you can only do that if the files have been archived. Otherwise, you'd have to fully reinitiate the replica. Replication will start again as soon as the requested WALs are copied over to the replica. Use a replication slot to avoid this situation. If you use a replication slot, the master will only remove WAL files which are not needed by any one replica. Hope this helps. Cheers, Paul
Re: How to start slave after pg_basebackup. Why min_wal_size and wal_keep_segments are duplicated
On 5/31/20 12:47 PM, Andrus wrote: Hi! pg_basebackup takes 8 hours. After it is finished, replication slave does not start: LOG: consistent recovery state reached at 2DE/985A5BE0 LOG: database system is ready to accept read only connections LOG: started streaming WAL from primary at 2DE/9900 on timeline 1 replikaator@[unknown] LOG: received replication command: SHOW data_directory_mode replikaator@[unknown] LOG: received replication command: IDENTIFY_SYSTEM replikaator@[unknown] LOG: received replication command: START_REPLICATION 2CF/E900 TIMELIN replikaator@[unknown] ERROR: requested WAL segment 000102CF00E9 has already been re replikaator@[unknown] LOG: received replication command: SHOW data_directory_mode replikaator@[unknown] LOG: received replication command: IDENTIFY_SYSTEM replikaator@[unknown] LOG: received replication command: START_REPLICATION 2CF/E900 TIMELIN replikaator@[unknown] ERROR: requested WAL segment 000102CF00E9 has already been removed There's your problem ^ ... i tried it again and same error occured. How to force replication to start? If the WAL is gone you can't. More below. I increased wal parameters in master to wal_compression=on max_wal_size = 5GB min_wal_size = 4GB # was 80MB wal_keep_segments= 360 # was 180 Will this allow replication to start after pg_basebackup ? According to doc min_wal_size and wal_keep_segments both keep the minimum number of wal segments for replication. No it doesn't: https://www.postgresql.org/docs/12/runtime-config-replication.html "wal_keep_segments (integer) Specifies the minimum number of past log file segments kept in the pg_wal directory, in case a standby server needs to fetch them for streaming replication. Each segment is normally 16 megabytes. If a standby server connected to the sending server falls behind by more than wal_keep_segments segments, the sending server might remove a WAL segment still needed by the standby, in which case the replication ^^ connection will be terminated. Downstream connections will also ^ eventually fail as a result. (However, the standby server can recover by fetching the segment from archive, if WAL archiving is in use.) ... https://www.postgresql.org/docs/12/runtime-config-wal.html "min_wal_size (integer) As long as WAL disk usage stays below this setting, old WAL files are always recycled for future use at a checkpoint, rather than removed. This can be used to ensure that enough WAL space is reserved to handle spikes in WAL usage, for example when running large batch jobs. If this value is specified without units, it is taken as megabytes. The default is 80 MB. This parameter can only be set in the postgresql.conf file or on the server command line. " I'm guessing are looking for: https://www.postgresql.org/docs/12/runtime-config-replication.html " 26.2.6. Replication Slots Replication slots provide an automated way to ensure that the master does not remove WAL segments until they have been received by all standbys, and that the master does not remove rows which could cause a recovery conflict even when the standby is disconnected. ... " This is spelled out here: https://www.postgresql.org/docs/12/warm-standby.html#STREAMING-REPLICATION "If you use streaming replication without file-based continuous archiving, the server might recycle old WAL segments before the standby has received them. If this occurs, the standby will need to be reinitialized from a new base backup. You can avoid this by setting wal_keep_segments to a value large enough to ensure that WAL segments are not recycled too early, or by configuring a replication slot for the standby. If you set up a WAL archive that's accessible from the standby, these solutions are not required, since the standby can always use the archive to catch up provided it retains enough segments." Why those parameters are duplicated? Andrus. -- Adrian Klaver adrian.kla...@aklaver.com
How to start slave after pg_basebackup. Why min_wal_size and wal_keep_segments are duplicated
Hi! pg_basebackup takes 8 hours. After it is finished, replication slave does not start: LOG: consistent recovery state reached at 2DE/985A5BE0 LOG: database system is ready to accept read only connections LOG: started streaming WAL from primary at 2DE/9900 on timeline 1 replikaator@[unknown] LOG: received replication command: SHOW data_directory_mode replikaator@[unknown] LOG: received replication command: IDENTIFY_SYSTEM replikaator@[unknown] LOG: received replication command: START_REPLICATION 2CF/E900 TIMELIN replikaator@[unknown] ERROR: requested WAL segment 000102CF00E9 has already been re replikaator@[unknown] LOG: received replication command: SHOW data_directory_mode replikaator@[unknown] LOG: received replication command: IDENTIFY_SYSTEM replikaator@[unknown] LOG: received replication command: START_REPLICATION 2CF/E900 TIMELIN replikaator@[unknown] ERROR: requested WAL segment 000102CF00E9 has already been removed ... i tried it again and same error occured. How to force replication to start? I increased wal parameters in master to wal_compression=on max_wal_size = 5GB min_wal_size = 4GB # was 80MB wal_keep_segments= 360 # was 180 Will this allow replication to start after pg_basebackup ? According to doc min_wal_size and wal_keep_segments both keep the minimum number of wal segments for replication. Why those parameters are duplicated? Andrus.
Re: pg_basebackup + delta base backups
On 26-May-20 10:20, Stephen Frost wrote: [...] "out of sync" is a bit of an odd concept, but having a replica fall behind a long way is certainly something that can happen and may require a rebuild from a backup (or from a new sync off of the primary in some other way, as you suggest below). In a situation where there's async replication happening and you promote a replica to take over, that's definitely a case where you might also have to rebuild the former primary. Hi Stepehen, Yes, a common case with async streaming is when primary (A) goes down and replica is promoted as a new master (B). Then A comes back and has some data that was not streamed to B so pg_rewind is useless. I wonder if there is some option to just discard this branched data from A in order to start as a new replica. I noticed that pg_rewind is useless even when both DBs are identical (according to pg_dumpall | md5sum). [...] As you said, all the pieces are there and it would be quite easy to write a new "pg_basebackup_delta" script that could be executed on the standby host to: 1) setup a pgBackRest repo on the primary host (via SSH) 2) create a backup on the primary host (via SSH) 3) do a delta restore on the standby Even when the repository on the primary host is only created temporarily (and require double storage, resources, etc), it may still be worth considering the traffic that can be saved by doing a delta restore on a standby host in a different region, right? So... There's actually a way to do this with pgbackrest, but it doesn't support the delta capability. If I understood correctly the method you described, you were basically doing a "backup" between A (primary) and B (repo) and in such a way the repo is then compatible with the pg_data structure, but without delta support (ie. transfering the whole database)? Delta support is critical for VLDBs, so I see two alternatives to replace pg_basebackup with pgbackrest to rebuild a replica: 1) Create a temporary repo on the primary 2) Create a temporary repo on the replica All configurations would be undone after the replica has been rebuilt and both alternatives would be using delta over the wire. In your opinion, which alternative is better considering network traffic? Thanks, Christopher
Re: pg_basebackup + incremental base backups
Greetings, * Christopher Pereira (krip...@imatronix.cl) wrote: > On 24-May-20 15:48, Stephen Frost wrote: > >That really shouldn't be possible. I'm very curious as to exactly what > >happened that resulted in your primary/replica being 'out of sync', as > >you say. > > Actually this was more a hypothetical question to find a solution in case > some day one of our standby clusters goes out of sync and we have to rebuild > it having a very big database. "out of sync" is a bit of an odd concept, but having a replica fall behind a long way is certainly something that can happen and may require a rebuild from a backup (or from a new sync off of the primary in some other way, as you suggest below). In a situation where there's async replication happening and you promote a replica to take over, that's definitely a case where you might also have to rebuild the former primary. > With proper WAL archiving this shouldn't happen but we wanted to be prepared > for this scenario just in case. Sure, having WAL archiving and good backups is definitely my strong recommendation for how to address any cases where the replica falls behind, or the replica is promoted and you want to remaster the former primary. > We did some tests measuring IO and traffic and are very happy with the > results. We will definitely be adding pgBackRest to our toolchain. Glad to hear that. > Regarding my initial question, I still believe that the world deserves a > simple direct pg_basebackup replacement even when putting an additional > "repo host" in the middle is a better idea in the long term. Perhaps.. > As you said, all the pieces are there and it would be quite easy to write a > new "pg_basebackup_delta" script that could be executed on the standby host > to: > > 1) setup a pgBackRest repo on the primary host (via SSH) > > 2) create a backup on the primary host (via SSH) > > 3) do a delta restore on the standby > > Even when the repository on the primary host is only created temporarily > (and require double storage, resources, etc), it may still be worth > considering the traffic that can be saved by doing a delta restore on a > standby host in a different region, right? So... There's actually a way to do this with pgbackrest, but it doesn't support the delta capability. We take care in pgbackrest to make the repo format for full backups actually match exactly what a PG cluster would look like, specifically because we wish to allow users to, if absolutely everything else fails and pgbackrest is non-functional, rebuild from the repo. To that point, we even make sure that command-line tools like gzip, bzip2, openssl, etc, work with the files we create. For what you're asking about though, you would do something like: - Set up a pgbackrest repo on the host you're rebuilding (the replica) - Disable all repo compression, encryption, et al. - Enable archive copy, so the WAL for the backup is put into the backup - Enable parallel jobs - Set up pgbackrest on the primary with the replica configured as the repo host, get WAL archiving working and such. - Run create-stanza - Perform a *full* backup, make sure everything works. Once that's done, you go find the pg_data directory inside the full backup that you made inside of the pgbackrest repo, copy/move/rsync (with checksums enabled!) that to where your PG data directory should be, set up a recovery.conf to point to the primary, maybe also set it up with a restore_command pointing to that repo (may or may not be needed, depending) and start up PG. That should allow PG to start, replay all of the necessary WAL, and then connect to the primary and start streaming. If you have any tablespaces, you'd need to deal with those too, of course. If you start moving things out of the pgbackrest repo, you're corrupting it, naturally, so you wouldn't really want to continue using it once you've gotten all of this done. Again, this isn't an approach which I'd generally recommend... We used to do it when we were first writing pgbackrest with a ZFS filesystem and after a backup we'd create a snapshot off of the repo and start PG right up and then run pg_dump on it and do other checks to make sure it worked, but have moved to using pgbackrest delta restores instead for that kind of use-case. Thanks, Stephen signature.asc Description: PGP signature
Re: pg_basebackup + incremental base backups
On 24-May-20 15:48, Stephen Frost wrote: That really shouldn't be possible. I'm very curious as to exactly what happened that resulted in your primary/replica being 'out of sync', as you say. Hi Stephen, Actually this was more a hypothetical question to find a solution in case some day one of our standby clusters goes out of sync and we have to rebuild it having a very big database. With proper WAL archiving this shouldn't happen but we wanted to be prepared for this scenario just in case. We did some tests measuring IO and traffic and are very happy with the results. We will definitely be adding pgBackRest to our toolchain. Regarding my initial question, I still believe that the world deserves a simple direct pg_basebackup replacement even when putting an additional "repo host" in the middle is a better idea in the long term. As you said, all the pieces are there and it would be quite easy to write a new "pg_basebackup_delta" script that could be executed on the standby host to: 1) setup a pgBackRest repo on the primary host (via SSH) 2) create a backup on the primary host (via SSH) 3) do a delta restore on the standby Even when the repository on the primary host is only created temporarily (and require double storage, resources, etc), it may still be worth considering the traffic that can be saved by doing a delta restore on a standby host in a different region, right? Thanks and congratulations for the good work.
Re: Query returns no rows in pg_basebackup cluster
Hi! How to set logical replication for all user databases in cluster so that when new database is added or new tables are added to database they will start replicate automatically ? I think that it would be good if you spend some time reading the documentation on this stuff, particularly the part about restrictions, to understand the use cases where that can become useful: https://www.postgresql.org/docs/devel/logical-replication.html Thank you. I read it and havent found any reference to PITR recovery. For PITR recovery it should probably save sql statements to files and allow to specify recovery target time for applying sql statements to base backup. Is PITR recovery supported only using binary WAL files ? Other limits can probably be solved. Andrus.
Re: Query returns no rows in pg_basebackup cluster
On Sunday, May 24, 2020, Andrus wrote: > Hi! > > Backup in created in Windows from Linux server using pg_receivewal and >>> pg_basebackup . >>> Can this backup used for PITR in Linux ? >>> >> No. Physical copies need to be based on the same platform. If you >> wish to replicate a cluster without any platform, architecture or even >> not-too-many major version constraints, there is also logical >> replication available since v10. >> > > Will logical replication also allow two modes: > 1. PITR recovery can used if needed > 2. Hot standby: User databases in both clusters contain same data. > > Why are you spending so much effort on this Window/Linux hybrid setup? Get yourself another Linux server and setup physical replication. It sounds like it will exactly meet your requirements and you will waste more time and money working out alternatives than the server would cost. David J.
Re: Query returns no rows in pg_basebackup cluster
On Mon, May 25, 2020 at 09:02:49AM +0300, Andrus wrote: > Will logical replication also allow two modes: > 1. PITR recovery can used if needed > 2. Hot standby: User databases in both clusters contain same data. > > How to set logical replication for all user databases in cluster so that > when new database is added or new tables are added to database they will > start replicate automatically ? I think that it would be good if you spend some time reading the documentation on this stuff, particularly the part about restrictions, to understand the use cases where that can become useful: https://www.postgresql.org/docs/devel/logical-replication.html -- Michael signature.asc Description: PGP signature
Re: Query returns no rows in pg_basebackup cluster
Hi! No. Physical copies need to be based on the same platform. Does the O/S that the client software runs on really affect this? To the extent that the O/S determines text sort order, yes; see thread. The short answer here is that we aren't going to support such cases. If you try to replicate across platforms, and it works, you're in luck. If it doesn't work, you get to keep both pieces; we will not accept that as a bug. In 2017 Peter wrote that ICU-based collations will offered alongside the libc-based collations (1) Currently it still requires re-compilation of Postgres for all binary replication platforms. Maybe ICU locale will selected during installation automatically in Postgres 13 . Using same ICU locale in all replication platforms will hopefully fix the issue. Currently option is to use ucs_basic as default collation when creating cluster. (1) https://www.2ndquadrant.com/en/blog/icu-support-postgresql-10/ Andrus.
Re: Query returns no rows in pg_basebackup cluster
Hi! Backup in created in Windows from Linux server using pg_receivewal and pg_basebackup . Can this backup used for PITR in Linux ? No. Physical copies need to be based on the same platform. If you wish to replicate a cluster without any platform, architecture or even not-too-many major version constraints, there is also logical replication available since v10. Will logical replication also allow two modes: 1. PITR recovery can used if needed 2. Hot standby: User databases in both clusters contain same data. How to set logical replication for all user databases in cluster so that when new database is added or new tables are added to database they will start replicate automatically ? Will it require more powerful backup server to replay main server sql stream from different databases. Andrus.
Re: Query returns no rows in pg_basebackup cluster
"David G. Johnston" writes: > On Sun, May 24, 2020 at 4:10 PM Michael Paquier wrote: >> No. Physical copies need to be based on the same platform. > Does the O/S that the client software runs on really affect this? To the extent that the O/S determines text sort order, yes; see thread. The short answer here is that we aren't going to support such cases. If you try to replicate across platforms, and it works, you're in luck. If it doesn't work, you get to keep both pieces; we will not accept that as a bug. regards, tom lane
Re: Query returns no rows in pg_basebackup cluster
On Sun, May 24, 2020 at 4:10 PM Michael Paquier wrote: > On Fri, May 22, 2020 at 09:53:37AM +0300, Andrus wrote: > > Backup in created in Windows from Linux server using pg_receivewal and > pg_basebackup . > > Can this backup used for PITR in Linux ? > > No. Physical copies need to be based on the same platform. If you > wish to replicate a cluster without any platform, architecture or even > not-too-many major version constraints, there is also logical > replication available since v10. > Does the O/S that the client software runs on really affect this? I would expect that you could store the offline files anywhere. As long as the architecture your original server is on and the one you are restoring to are the same the restored server should work. They are just bytes until a server interprets them, no? David J.
Re: Query returns no rows in pg_basebackup cluster
On Fri, May 22, 2020 at 09:53:37AM +0300, Andrus wrote: > Backup in created in Windows from Linux server using pg_receivewal and > pg_basebackup . > Can this backup used for PITR in Linux ? No. Physical copies need to be based on the same platform. If you wish to replicate a cluster without any platform, architecture or even not-too-many major version constraints, there is also logical replication available since v10. -- Michael signature.asc Description: PGP signature
Re: pg_basebackup + incremental base backups
Greetings, * Christopher Pereira (krip...@imatronix.cl) wrote: > Ok, we want to use pgbackrest to *rebuild a standby that has fallen behind* > (where pg_rewind won't work). After reading the docs, we believe we should > use this setup: > > a) Primary host: primary cluster > > b) Repository host: needed for rebuilding the standby (and having PITR as > bonus). > > c) Standby host: standby cluster That would work. > 1) The standby will use streaming replication and will be in sync until > someday something funny happens and both standby and repository get out of > sync with the primary. Having failures in both the standby and repo would have to be something very 'funny' indeed.. > Now, to rebuild the standby first we will have to create a new backup > transferring the data from *primary -> repository*, right? If the repo is entirely gone *and* the standby server is broken, then, yes, you'd have to do a new backup into the repo and then restore that. > Wouldn't this also have a load impact on the primary cluster? Yes. The distinction I was trying to make earlier is that if the standby server gets messed up in any shape, and the repo/repo-server is still operating correctly, then you can perform a pgbackrest delta restore whereby the standby is rebuilt from the last backup that was performed. That process only involves the repo server and the standby server, there wouldn't be any load on the primary, and further, only those files which are different on the standby server vs. the last backup in the repo would be copied, minimizing bandwidth between the two. > 2) In the user guide section 17.3 is explained how to create a "pg-standby > host" to replicate the data *from the repository host*. > And in section 17.4 is explained how to setup Streaming Replication to > replicate the data *from the primary host*. > Do 17.3 and 17.4 work together so that the data is *replicated from the > repository* and then *streamed from the primary*? It's more of an either-or. That is, if you have a pgbackrest repo, and you configure PostgreSQL to have a restore_command which will fetch WAL from the repo *and* you configure PostgreSQL to have a primary_conninfo line to connect to an upstream primary and stream WAL, then you give PG two ways to get the WAL and it'll try one and then the other. If either are able to provide the necessary WAL, then the replica will replay the WAL until it gets to the end of all available WAL, at which point it should be able to connect to the primary and stream WAL. Should the replica ever fall behind and the primary no longer has the WAL that the replica needs, it'll automatically go to the repo to fetch that WAL. Only if the WAL is no longer available from either the primary or the WAL repo is the replica so far behind that it would need a delta restore from a backup to be caught back up. > 3) Before being able to rebuild the standby cluster, would we first need to > update the backup on the repository (backup from primary -> repository) in > order for streaming replication to work (from primary -> standby)? This isn't required, no. Doing so, however, perhaps using a pgbackrest incremental backup, would reduce the amount of WAL the replica would need to replay to get caught back up with the primary though. > 4) Once the backup on the repository is ready, what are the chances that > streaming replication from primary to standby won't work because they got > out of sync again? That really shouldn't be possible. I'm very curious as to exactly what happened that resulted in your primary/replica being 'out of sync', as you say. > 5) Could we just work with 2 hosts (primary and standby) instead of 3? > FAQ section 8 says the repository shouldn't be on the same host as the > standby and having it on the primary doesn't make much sense because if the > primary host is down we won't have access to the backup. It's possible but it's really not recommended, particularly if the idea is to have the primary/replica providing high availability. When you are setting up such an HA solution, you *really* want to have your configurations and such be symmetric between the two (and pgbackrest makes this easy for you, when you have a repo host, by allowing you to configure both in the pgbackrest config file and pgbackrest will figure out which is primary and which is the replica, and perform the backup from whichever one you'd prefer to). In the 2 node configuration you're contemplating, if the replia is down, the primary won't be able to ship WAL off of the system, resulting in a WAL buildup on the primary which could lead to a database failure due to running out of disk space, and further increases risk that the primary then fails and you have lost both availability *and* whatever had been still on the primary, potentially going back hours or more. > It would be ideal to have the repository on the standby host and taking good > care of the configurations. What exactly should be cared of for
Re: pg_basebackup + incremental base backups
We've contemplated adding support for something like this to pgbackrest, since all the pieces are there, but there hasn't been a lot of demand for it and it kind of goes against the idea of having a proper backup solution, really.. It'd also create quite a bit of load on the primary to checksum all the files to do the comparison against what's on the replica that you're trying to update, so not something you'd probably want to do a lot more than necessary. Ok, we want to use pgbackrest to *rebuild a standby that has fallen behind* (where pg_rewind won't work). After reading the docs, we believe we should use this setup: a) Primary host: primary cluster b) Repository host: needed for rebuilding the standby (and having PITR as bonus). c) Standby host: standby cluster Some questions: 1) The standby will use streaming replication and will be in sync until someday something funny happens and both standby and repository get out of sync with the primary. Now, to rebuild the standby first we will have to create a new backup transferring the data from *primary -> repository*, right? Wouldn't this also have a load impact on the primary cluster? 2) In the user guide section 17.3 is explained how to create a "pg-standby host" to replicate the data *from the repository host*. And in section 17.4 is explained how to setup Streaming Replication to replicate the data *from the primary host*. Do 17.3 and 17.4 work together so that the data is *replicated from the repository* and then *streamed from the primary*? 3) Before being able to rebuild the standby cluster, would we first need to update the backup on the repository (backup from primary -> repository) in order for streaming replication to work (from primary -> standby)? 4) Once the backup on the repository is ready, what are the chances that streaming replication from primary to standby won't work because they got out of sync again? 5) Could we just work with 2 hosts (primary and standby) instead of 3? FAQ section 8 says the repository shouldn't be on the same host as the standby and having it on the primary doesn't make much sense because if the primary host is down we won't have access to the backup. It would be ideal to have the repository on the standby host and taking good care of the configurations. What exactly should be cared of for this setup to be safe? I'm afraid I'm not understanding very well the pgbackrest design or how to use it efficiently to rebuild a standby cluster that got out of sync.
Re: Query returns no rows in pg_basebackup cluster
Hi! ERROR: item order invariant violated for index "desktop_baas_liigid_idx" DETAIL: Lower index tid=(3,15) (points to index tid=(16,4098)) higher index tid=(3,16) (points to index tid=(17,4098)) page lsn=292/630C0CE8. SQL state: XX002 Uh huh ... and I'll bet the same test on the source server is just fine? I don't find it surprising in the least that different platforms have different ideas on fine points like how to sort a leading underscore. Those things just aren't that well standardized. This column is not used for locale specific data. Running alter table desktop alter baas type char(8) collate ucs_basic fixes the issue. Is this fix reasonable ? What other issues may occur ? Can base backup created in windows using pg_basecakup used in Linux without such fix? Andrus.
Re: Query returns no rows in pg_basebackup cluster
"Andrus" writes: >> Possibly you could try running contrib/amcheck on the index in question >> and see if it reports any issues. > I tried and it reports error > ERROR: item order invariant violated for index "desktop_baas_liigid_idx" > DETAIL: Lower index tid=(3,15) (points to index tid=(16,4098)) higher index > tid=(3,16) (points to index tid=(17,4098)) page > lsn=292/630C0CE8. > SQL state: XX002 Uh huh ... and I'll bet the same test on the source server is just fine? I don't find it surprising in the least that different platforms have different ideas on fine points like how to sort a leading underscore. Those things just aren't that well standardized. regards, tom lane
Re: Query returns no rows in pg_basebackup cluster
Hi! The sorting rules for this locale must be the same in both platforms. Only locale names are different. I think they are less alike than you hoped, because if they were alike, you wouldn't be seeing this problem. Possibly you could try running contrib/amcheck on the index in question and see if it reports any issues. I tried and it reports error ERROR: item order invariant violated for index "desktop_baas_liigid_idx" DETAIL: Lower index tid=(3,15) (points to index tid=(16,4098)) higher index tid=(3,16) (points to index tid=(17,4098)) page lsn=292/630C0CE8. SQL state: XX002 Andrus.
Re: Query returns no rows in pg_basebackup cluster
"Andrus" writes: >> No, what it sounds like is the OP tried to physically replicate a >> database on another platform with completely different sorting rules. > The sorting rules for this locale must be the same in both platforms. > Only locale names are different. I think they are less alike than you hoped, because if they were alike, you wouldn't be seeing this problem. Possibly you could try running contrib/amcheck on the index in question and see if it reports any issues. regards, tom lane
Re: pg_basebackup + incremental base backups
On 21-May-20 08:43, Stephen Frost wrote: * Christopher Pereira (krip...@imatronix.cl) wrote: [...] Is there some way to rebuild the standby cluster by doing a differential backup of the primary cluster directly? We've contemplated adding support for something like this to pgbackrest, since all the pieces are there, but there hasn't been a lot of demand for it and it kind of goes against the idea of having a proper backup solution, really.. It'd also create quite a bit of load on the primary to checksum all the files to do the comparison against what's on the replica that you're trying to update, so not something you'd probably want to do a lot more than necessary. We have backups of the whole server and only need a efficient way to rebuild the hot-standby cluster when pg_rewind is not able to do so. I agree with your concerns about the increased load on the primary server, but this rebuilding process would only be done in case of emergency or during low load hours. pg_basebackup works fine but does not support differential/incremental backups which is a blocker. Do you know any alternative software that is able to rebuild the standby PG data dir using rsync or similar while the primary is still online? It seems a simple pg_start_backup + rsync + pg_stop_backup (maybe combined with a LVM snapshot) would do, but we would prefer to use some existing tool. We just tried barman, but it also seems to require a restore from the backup before being able to start the standby server (?), and we are afraid this would require double storage, IO and time for rebuilding the standby cluster. Thanks.
Re: Query returns no rows in pg_basebackup cluster
Hi! Database in Windows is in read-only (recovery) mode so it cannot changed. Then you might as well just rm -rf it (or whatever the equivalent Windows incantation is). On Windows, that database is broken and useless. Backup in created in Windows from Linux server using pg_receivewal and pg_basebackup . Can this backup used for PITR in Linux ? Andrus.
Re: Query returns no rows in pg_basebackup cluster
Hi! Main server is in Linux and backup server is in windows. This is not a supported setup if you want to run a physical backup. Your backup and your primary need to be the same - software and hardware. Consider anything that is working to be a false negative – assume >something will break or simply give incorrect results. This base backup should used for recovery. Taking new base backup in Linux does not allow to recover to earlier date. Both servers have Intel 64 bit CPUs. I understand that only issue is the index structure and that REINDEX will fix this. What other issues may occur ? Will pg_dump/pg_restore in Windows server fix all issues. Andrus.
Re: Query returns no rows in pg_basebackup cluster
On Thu, May 21, 2020 at 10:41 PM Andrus wrote: > Main server is in Linux and backup server is in windows. > This is not a supported setup if you want to run a physical backup. Your backup and your primary need to be the same - software and hardware. Consider anything that is working to be a false negative - assume something will break or simply give incorrect results. David J.
Re: Query returns no rows in pg_basebackup cluster
Hi! No, what it sounds like is the OP tried to physically replicate a database on another platform with completely different sorting rules. The sorting rules for this locale must be the same in both platforms. Only locale names are different. It looks like windows server does not recognize Linux locale name. Which means all his text indexes are corrupt according to the destination platform's sorting rules, which easily explains the observed misbehavior (ie, index searches not finding the expected rows). Lot of queries seems working properly. REINDEX would fix it. REINDEX throws error ERROR: cannot execute REINDEX during recovery SQL state: 25006 But the major point here is you can't just ignore a collation mismatch, which in turn implies that you can't do physical replication from Linux to Windows, or vice versa (and most other cross-platform cases are just as dangerous). Database is used in recovery mode to find proper recovery point and to get data from it in this point. Locales are actually same. In windows Postgres does not recognize Linux locale name. Database in Windows is in read-only (recovery) mode so it cannot changed. Then you might as well just rm -rf it (or whatever the equivalent Windows incantation is). On Windows, that database is broken and useless. Most queries seems to work. Database should examined to get accidently deleted data from it. Is making it read-write and index only solution or can it fixed in read-only database also, e-q forcing same local in postgres.conf Andrus.
Re: Query returns no rows in pg_basebackup cluster
Hi! The LIKE query probably doesn't use an index and thus finds the relevant data via sequential scan and equality checks on each record. Yeah, exactly. An equality condition will use a btree index if available. LIKE, however, sees the "_" as a wildcard so it cannot use an index and resorts to a seqscan --- which will work fine. It's just index searches (and index-based sorts) that are broken. Of course, if there isn't an index on the column in question then this theory falls to the ground. There is composite index on baas column CREATE TABLE public.desktop ( id integer NOT NULL DEFAULT nextval('desktop_id_seq'::regclass), recordtype character(5) COLLATE pg_catalog."default" NOT NULL, klass character(1) COLLATE pg_catalog."default", baas character(8) COLLATE pg_catalog."default" NOT NULL, liigid character(1) COLLATE pg_catalog."default" NOT NULL DEFAULT ''::bpchar, jrk numeric(4,0) NOT NULL DEFAULT 0, ... CONSTRAINT desktop_pkey PRIMARY KEY (id), CONSTRAINT desktop_baas_not_empty CHECK (baas <> ''::bpchar), CONSTRAINT desktop_id_check CHECK (id > 0), CONSTRAINT desktop_recordtype_check CHECK (recordtype = 'Aken'::bpchar OR recordtype = 'Veerg'::bpchar) ) TABLESPACE pg_default; CREATE INDEX desktop_baas_liigid_idx ON public.desktop USING btree (baas COLLATE pg_catalog."default" ASC NULLS LAST, liigid COLLATE pg_catalog."default" ASC NULLS LAST) TABLESPACE pg_default; Maybe it is possible to force postgres in windows to use the same locale as in Linux. Locales are actually the same. Andrus.
Re: Query returns no rows in pg_basebackup cluster
Hi! Are you referring to two different instances of Postgres on Windows? No. Main server is in Linux and backup server is in windows. Andrus.