Re: [GENERAL] Broken primary key after backup restore.

2015-10-02 Thread Kevin Grittner
Michael Chau  wrote:

> For some reason, there were some bad wal log files in pg_xlog. I
> believe that they got generated during the backup last Monday,
> but I don't know why. I speculate that may be the
> wal_keep_segments was not set high enough as I have changed it
> recently.
>
> Luckily, I have archived the wal log files. And by comparing
> between the two directories, I did see those bad wal log files in
> pg_xlog directory only.

The instructions for making a backup explicitly say to exclude or
delete the files in the pg_xlog directory and use those from the
archive.  That's because tar (or whatever you are using to copy the
files) may copy a WAL file before some change made during the
backup is written to it.  What you were doing is not supported and
likely to appear to work sometimes (possibly even without hidden
corruption), and fail to create a backup that will even start on
other attempts.

You might find this blog post helpful:

http://tbeitr.blogspot.com/2015/07/deleting-backuplabel-on-restore-will.html

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Broken primary key after backup restore.

2015-09-21 Thread Michael Chau
Hi,

Just want to let everybody knows

So, I found out this morning what went wrong.

For some reason, there were some bad wal log files in pg_xlog. I believe
that they got generated during the backup last Monday, but I don't know
why. I speculate that may be the wal_keep_segments was not set high enough
as I have changed it recently.

Luckily, I have archived the wal log files. And by comparing between the
two directories, I did see those bad wal log files in pg_xlog directory
only.

Yes, we were looking at Barman before. But at that time Barman didn't
support our version of PostgreSQL. I think that I will pursue this.

Thanks again.

On Fri, Sep 18, 2015 at 2:40 PM, David Steele  wrote:

> On 9/18/15 3:44 PM, Michael Chau wrote:
>
>> Hi Jeff,
>>
>> Only if you are very lucky.  If your tar command tars up the pg_xlog
>>> directory as the last thing it does, then you are probably going to be OK.
>>> Otherwise, it is a crap shoot.
>>>
>>
>> May be that's it. I have another similar set up, but the pg_xlog is a
>> soft link to another directory, and I use 'tar -chvzf'. It tar up the
>> pg_xlog at the very last. And the restore is fine.
>>
>
> This is still not always safe.  It depends on your wal_keep_segments
> settings and some luck.  WAL segments can be recycled during the backup.
>
> For this one, DB1 and DB2, the pg_xlog is the directory itself, and I
>> use 'tar -cvzf'. And it tar up pg_xlog at the beginning. I always have
>> doubt about it. But I though pg_stop_backup() and pg_start_backup() like
>> freezing would prevent the inconsistency.
>>
>
> This is definitely not a good idea.
>
> Indeed, I will look inot pgbasebackup.
>>
>
> pg_basebackup is good for creating replicas but for real backup you might
> want to consider purpose-built backup software like pgBackRest or barman.
>
> --
> -David
> da...@pgmasters.net
>



-- 
*Michael Chau*
*Database Administrator*
*GAME GOLF*
77 Geary St, 5th floor
San Francisco, CA 94108
c) *510-366-3800*
e) *michael.c...@gameyourgame.com *
f) www.facebook.com/gamegolf 
t) @GAMEGOLF
w) www.gamegolf.c *om*


Re: [GENERAL] Broken primary key after backup restore.

2015-09-18 Thread David Steele

On 9/18/15 3:44 PM, Michael Chau wrote:

Hi Jeff,


Only if you are very lucky.  If your tar command tars up the pg_xlog directory 
as the last thing it does, then you are probably going to be OK.  Otherwise, it 
is a crap shoot.


May be that's it. I have another similar set up, but the pg_xlog is a
soft link to another directory, and I use 'tar -chvzf'. It tar up the
pg_xlog at the very last. And the restore is fine.


This is still not always safe.  It depends on your wal_keep_segments 
settings and some luck.  WAL segments can be recycled during the backup.



For this one, DB1 and DB2, the pg_xlog is the directory itself, and I
use 'tar -cvzf'. And it tar up pg_xlog at the beginning. I always have
doubt about it. But I though pg_stop_backup() and pg_start_backup() like
freezing would prevent the inconsistency.


This is definitely not a good idea.


Indeed, I will look inot pgbasebackup.


pg_basebackup is good for creating replicas but for real backup you 
might want to consider purpose-built backup software like pgBackRest or 
barman.


--
-David
da...@pgmasters.net


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Broken primary key after backup restore.

2015-09-18 Thread Michael Chau
Hi Jeff,

>Only if you are very lucky.  If your tar command tars up the pg_xlog
directory as the last thing it does, then you are probably going to be OK.
Otherwise, it is a crap shoot.

May be that's it. I have another similar set up, but the pg_xlog is a soft
link to another directory, and I use 'tar -chvzf'. It tar up the pg_xlog at
the very last. And the restore is fine.

For this one, DB1 and DB2, the pg_xlog is the directory itself, and I use
'tar -cvzf'. And it tar up pg_xlog at the beginning. I always have doubt
about it. But I though pg_stop_backup() and pg_start_backup() like freezing
would prevent the inconsistency.

Indeed, I will look inot pgbasebackup.

Thanks,



On Fri, Sep 18, 2015 at 11:20 AM, Jeff Janes  wrote:

> On Fri, Sep 18, 2015 at 6:16 AM, Adrian Klaver 
> wrote:
>
>> On 09/17/2015 11:15 PM, Guillaume Lelarge wrote:
>>
>>> Le 18 sept. 2015 5:23 AM, "Adrian Klaver" >> > a écrit :
>>>  >
>>>  > On 09/17/2015 05:37 PM, Michael Chau wrote:
>>>  >>
>>>  >> To restore on test server:
>>>  >>
>>>  >> 1) Just untar the tar ball, then start up Postgres. Of course the
>>> data
>>>  >> directory is empty beforehand.
>>>  >>
>>>  >> This has been working for almost 2 years without any problem until
>>> last
>>>  >> Monday. I remember that I just ran vacuum analyze that table on both
>>> DB1
>>>  >> and DB2  that morning. But, I don't think that it harms anything.
>>>  >
>>>  >
>>>  > Well it looks fairly straight forward, to me at least.
>>>  >
>>>
>>> Do I miss something obvious? Because this is to me the wrong way to do
>>> the restore. You need to apply WAL files archived between
>>> pg_start_backup and pg_stop_backup to get consistent data files.
>>>
>>
>> Would that not be taken care of by the tar data directory/ untar data
>> directory?
>>
>
> Only if you are very lucky.  If your tar command tars up the pg_xlog
> directory as the last thing it does, then you are probably going to be OK.
> Otherwise, it is a crap shoot.
>
>
>
>> I would think if it was a WAL issue the OP could never get the server to
>> start and get to the point the query failed on a single table and column.
>
>
> With pg_basebackup, that is probably the case, as it either doesn't copy
> xlog at all, or if it does it makes sure it is complete.  But with tar, you
> have no such protection.
>
>
>
>> All that being said, I think the OP would be better served by
>> pg_basebackup:
>>
>> http://www.postgresql.org/docs/9.4/static/app-pgbasebackup.html
>
>
>
> Yes, indeed.
>
> Cheers,
>
> Jeff
>
>


-- 
*Michael Chau*
*Database Administrator*
*GAME GOLF*
77 Geary St, 5th floor
San Francisco, CA 94108
c) *510-366-3800*
e) *michael.c...@gameyourgame.com *
f) www.facebook.com/gamegolf 
t) @GAMEGOLF
w) www.gamegolf.c *om*


Re: [GENERAL] Broken primary key after backup restore.

2015-09-18 Thread Jeff Janes
On Fri, Sep 18, 2015 at 6:16 AM, Adrian Klaver 
wrote:

> On 09/17/2015 11:15 PM, Guillaume Lelarge wrote:
>
>> Le 18 sept. 2015 5:23 AM, "Adrian Klaver" > > a écrit :
>>  >
>>  > On 09/17/2015 05:37 PM, Michael Chau wrote:
>>  >>
>>  >> To restore on test server:
>>  >>
>>  >> 1) Just untar the tar ball, then start up Postgres. Of course the data
>>  >> directory is empty beforehand.
>>  >>
>>  >> This has been working for almost 2 years without any problem until
>> last
>>  >> Monday. I remember that I just ran vacuum analyze that table on both
>> DB1
>>  >> and DB2  that morning. But, I don't think that it harms anything.
>>  >
>>  >
>>  > Well it looks fairly straight forward, to me at least.
>>  >
>>
>> Do I miss something obvious? Because this is to me the wrong way to do
>> the restore. You need to apply WAL files archived between
>> pg_start_backup and pg_stop_backup to get consistent data files.
>>
>
> Would that not be taken care of by the tar data directory/ untar data
> directory?
>

Only if you are very lucky.  If your tar command tars up the pg_xlog
directory as the last thing it does, then you are probably going to be OK.
Otherwise, it is a crap shoot.



> I would think if it was a WAL issue the OP could never get the server to
> start and get to the point the query failed on a single table and column.


With pg_basebackup, that is probably the case, as it either doesn't copy
xlog at all, or if it does it makes sure it is complete.  But with tar, you
have no such protection.



> All that being said, I think the OP would be better served by
> pg_basebackup:
>
> http://www.postgresql.org/docs/9.4/static/app-pgbasebackup.html



Yes, indeed.

Cheers,

Jeff


Re: [GENERAL] Broken primary key after backup restore.

2015-09-18 Thread Adrian Klaver

On 09/17/2015 11:15 PM, Guillaume Lelarge wrote:

Le 18 sept. 2015 5:23 AM, "Adrian Klaver" mailto:adrian.kla...@aklaver.com>> a écrit :
 >
 > On 09/17/2015 05:37 PM, Michael Chau wrote:
 >>
 >> 1)
 >>
 >> In Production, I have a DB2 which is replicated partially using
Londiste
 >> from DB1.
 >>
 >>
 >> Well I think the above needs more explanation to help understand how the
 >> DB2 backup got into this state and possibly prevent it in the future.
 >>
 >> A: So, the DB1 has several schemas in the database. We use Londiste to
 >> replicate just one of the schemas to DB2.  The table in question is in
 >> that schema. Backup is done on both DB1 and DB2.
 >>
 >> 2)
 >> I make file-system backups nightly on both DBs.
 >>
 >> How is that done exactly?
 >>
 >> A: To backup:
 >>
 >> 1) pg_start_backup()
 >> 2) tar up the files under the data directory
 >> 3) pg_stop_backup()
 >>
 >> To restore on test server:
 >>
 >> 1) Just untar the tar ball, then start up Postgres. Of course the data
 >> directory is empty beforehand.
 >>
 >> This has been working for almost 2 years without any problem until last
 >> Monday. I remember that I just ran vacuum analyze that table on both DB1
 >> and DB2  that morning. But, I don't think that it harms anything.
 >
 >
 > Well it looks fairly straight forward, to me at least.
 >

Do I miss something obvious? Because this is to me the wrong way to do
the restore. You need to apply WAL files archived between
pg_start_backup and pg_stop_backup to get consistent data files.


Would that not be taken care of by the tar data directory/ untar data 
directory?


I would think if it was a WAL issue the OP could never get the server to 
start and get to the point the query failed on a single table and 
column. All that being said, I think the OP would be better served by 
pg_basebackup:


http://www.postgresql.org/docs/9.4/static/app-pgbasebackup.html







--
Adrian Klaver
adrian.kla...@aklaver.com


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Broken primary key after backup restore.

2015-09-17 Thread Guillaume Lelarge
Le 18 sept. 2015 5:23 AM, "Adrian Klaver"  a
écrit :
>
> On 09/17/2015 05:37 PM, Michael Chau wrote:
>>
>> 1)
>>
>> In Production, I have a DB2 which is replicated partially using
Londiste
>> from DB1.
>>
>>
>> Well I think the above needs more explanation to help understand how the
>> DB2 backup got into this state and possibly prevent it in the future.
>>
>> A: So, the DB1 has several schemas in the database. We use Londiste to
>> replicate just one of the schemas to DB2.  The table in question is in
>> that schema. Backup is done on both DB1 and DB2.
>>
>> 2)
>> I make file-system backups nightly on both DBs.
>>
>> How is that done exactly?
>>
>> A: To backup:
>>
>> 1) pg_start_backup()
>> 2) tar up the files under the data directory
>> 3) pg_stop_backup()
>>
>> To restore on test server:
>>
>> 1) Just untar the tar ball, then start up Postgres. Of course the data
>> directory is empty beforehand.
>>
>> This has been working for almost 2 years without any problem until last
>> Monday. I remember that I just ran vacuum analyze that table on both DB1
>> and DB2  that morning. But, I don't think that it harms anything.
>
>
> Well it looks fairly straight forward, to me at least.
>

Do I miss something obvious? Because this is to me the wrong way to do the
restore. You need to apply WAL files archived between pg_start_backup and
pg_stop_backup to get consistent data files.


Re: [GENERAL] Broken primary key after backup restore.

2015-09-17 Thread Adrian Klaver

On 09/17/2015 05:37 PM, Michael Chau wrote:

1)

In Production, I have a DB2 which is replicated partially using Londiste
from DB1.


Well I think the above needs more explanation to help understand how the
DB2 backup got into this state and possibly prevent it in the future.

A: So, the DB1 has several schemas in the database. We use Londiste to
replicate just one of the schemas to DB2.  The table in question is in
that schema. Backup is done on both DB1 and DB2.

2)
I make file-system backups nightly on both DBs.

How is that done exactly?

A: To backup:

1) pg_start_backup()
2) tar up the files under the data directory
3) pg_stop_backup()

To restore on test server:

1) Just untar the tar ball, then start up Postgres. Of course the data
directory is empty beforehand.

This has been working for almost 2 years without any problem until last
Monday. I remember that I just ran vacuum analyze that table on both DB1
and DB2  that morning. But, I don't think that it harms anything.


Well it looks fairly straight forward, to me at least.


Anything in the Postgres logs when you started up the test server based 
off the DB2 file system backup?


Anything in the original DB2 logs around the time you where taking the 
backup?




Thanks





--
Adrian Klaver
adrian.kla...@aklaver.com


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Broken primary key after backup restore.

2015-09-17 Thread Michael Chau
1)
>
> In Production, I have a DB2 which is replicated partially using Londiste
> from DB1.
>

Well I think the above needs more explanation to help understand how the
DB2 backup got into this state and possibly prevent it in the future.

A: So, the DB1 has several schemas in the database. We use Londiste to
replicate just one of the schemas to DB2.  The table in question is in that
schema. Backup is done on both DB1 and DB2.

2)
I make file-system backups nightly on both DBs.

How is that done exactly?

A: To backup:

1) pg_start_backup()
2) tar up the files under the data directory
3) pg_stop_backup()

To restore on test server:

1) Just untar the tar ball, then start up Postgres. Of course the data
directory is empty beforehand.

This has been working for almost 2 years without any problem until last
Monday. I remember that I just ran vacuum analyze that table on both DB1
and DB2  that morning. But, I don't think that it harms anything.

Thanks


On Thu, Sep 17, 2015 at 4:53 PM, Adrian Klaver 
wrote:

> On 09/17/2015 04:31 PM, Michael Chau wrote:
>
>> Hi,
>>
>> In Production, I have a DB2 which is replicated partially using Londiste
>> from DB1.
>>
>
> Well I think the above needs more explanation to help understand how the
> DB2 backup got into this state and possibly prevent it in the future.
>
>
> I make file-system backups nightly on both DBs.
>
> How is that done exactly?
>
>
>> Last Monday, when I restored the backup made from DB2 to a test server,
>> Postgres(9.3.5) started up fine. But, I found out that the primary key
>> of one of the tables is broken
>>
>> # select * from  order by id desc;
>> ERROR:  could not find left sibling of block 17392 in index "mytable_pkey"
>>
>> I am able to select without using the id index. On Prod DB1 , DB2 and on
>> another test server restored from backup made from DB1, there is no
>> problem, as I am able to select the table with and without index.
>>
>
> Did you restore to the DB2 derived test server in the same way as you did
> the other servers?
>
>
>
>> The table has 5 million rows. And I run Vacuum Analyze once a week.
>>
>> 1) To fix the above error, I tried to run vacuum full on the table and
>> run 'reindex table ;. But it didn't help as the reindexing has
>> taken very very long time and not sure if it has finished or just timed
>> out.
>>
>> There is also a suggestion to recreate the primary key constraint
>> concurrently which I will look into later.
>>
>> 2) However, my main concern right now is whether there is any corruption
>> in the Prods table as it does look fine. Is there any way to check? And
>> also should we trust a file-system backup in this case?
>>
>> Thanks
>>
>>
>>
>>
>>
>>
>>
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>



-- 
*Michael Chau*
*Database Administrator*
*GAME GOLF*
77 Geary St, 5th floor
San Francisco, CA 94108
c) *510-366-3800 <510-366-3800>*
e) *michael.c...@gameyourgame.com *
f) www.facebook.com/gamegolf 
t) @GAMEGOLF
w) www.gamegolf.c *om*


Re: [GENERAL] Broken primary key after backup restore.

2015-09-17 Thread Adrian Klaver

On 09/17/2015 04:31 PM, Michael Chau wrote:

Hi,

In Production, I have a DB2 which is replicated partially using Londiste
from DB1.


Well I think the above needs more explanation to help understand how the 
DB2 backup got into this state and possibly prevent it in the future.



I make file-system backups nightly on both DBs.

How is that done exactly?



Last Monday, when I restored the backup made from DB2 to a test server,
Postgres(9.3.5) started up fine. But, I found out that the primary key
of one of the tables is broken

# select * from  order by id desc;
ERROR:  could not find left sibling of block 17392 in index "mytable_pkey"

I am able to select without using the id index. On Prod DB1 , DB2 and on
another test server restored from backup made from DB1, there is no
problem, as I am able to select the table with and without index.


Did you restore to the DB2 derived test server in the same way as you 
did the other servers?




The table has 5 million rows. And I run Vacuum Analyze once a week.

1) To fix the above error, I tried to run vacuum full on the table and
run 'reindex table ;. But it didn't help as the reindexing has
taken very very long time and not sure if it has finished or just timed out.

There is also a suggestion to recreate the primary key constraint
concurrently which I will look into later.

2) However, my main concern right now is whether there is any corruption
in the Prods table as it does look fine. Is there any way to check? And
also should we trust a file-system backup in this case?

Thanks









--
Adrian Klaver
adrian.kla...@aklaver.com


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Broken primary key after backup restore.

2015-09-17 Thread Michael Chau
Hi,

In Production, I have a DB2 which is replicated partially using Londiste
from DB1. I make file-system backups nightly on both DBs.

Last Monday, when I restored the backup made from DB2 to a test server,
Postgres(9.3.5) started up fine. But, I found out that the primary key of
one of the tables is broken

# select * from  order by id desc;
ERROR:  could not find left sibling of block 17392 in index "mytable_pkey"

I am able to select without using the id index. On Prod DB1 , DB2 and on
another test server restored from backup made from DB1, there is no
problem, as I am able to select the table with and without index.

The table has 5 million rows. And I run Vacuum Analyze once a week.

1) To fix the above error, I tried to run vacuum full on the table and run
'reindex table ;. But it didn't help as the reindexing has taken
very very long time and not sure if it has finished or just timed out.

There is also a suggestion to recreate the primary key constraint
concurrently which I will look into later.

2) However, my main concern right now is whether there is any corruption in
the Prods table as it does look fine. Is there any way to check? And also
should we trust a file-system backup in this case?

Thanks