Re: [GENERAL] requested timeline doesn't contain minimum recovery point

2017-01-11 Thread Tom DalPozzo
>
>
> > I mean, could random bytes appear as a valid record (very unlikely, but
> > possible)?
>
> Yes, that could be possible if some memory or disk is broken. That's
> why, while it is important to take backups, it is more important to
> make sure that they are able to restore correctly before deploying
> them.
> --
> Michael
>
Hi,
Of course, nothing can be made 100% safe against memory or disk corruption.
But, excluding these cases, can there be situations in which the WAL reader
gets confused?
I'm thinking about WAL segment recycling: when a WAL segment is recycled, it
is not overwritten with anything (zeroes...), right?
If I'm right, then old records are still present in the segment. If they are
aligned with the new offsets, I guess that the system can tell that they are
older (by looking at some ID) and therefore not valid, but if they are not
aligned, there could be an unlucky, if unlikely, issue.

In other words, excluding HW problems and possible bugs, I'd like to know if
the logic behind WAL reading at startup is 100% safe.

Regards
Pupillo
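
The recycling question above can be illustrated with a toy model. As far as I know, PostgreSQL's WAL page headers carry the address of the page itself, so a page left over from the segment's previous life does not match the address the reader expects. The sketch below is only an analogy under that assumption: the page size, the header layout, the magic value and the names write_page() and valid_pages() are all invented for illustration and are not PostgreSQL's real on-disk format.

```python
import struct

PAGE_SIZE = 64                      # toy value, far smaller than a real WAL page
PAGE_HEADER = struct.Struct("<HQ")  # hypothetical header: magic | page address
MAGIC = 0xCAFE                      # arbitrary constant for this sketch


def write_page(segment: bytearray, seg_start: int, page_no: int, data: bytes) -> None:
    """Write one page whose header records the address it is written at."""
    capacity = PAGE_SIZE - PAGE_HEADER.size
    if len(data) > capacity:
        raise ValueError("data does not fit in one toy page")
    addr = seg_start + page_no * PAGE_SIZE
    start = page_no * PAGE_SIZE
    segment[start:start + PAGE_SIZE] = (
        PAGE_HEADER.pack(MAGIC, addr) + data.ljust(capacity, b"\0")
    )


def valid_pages(segment: bytes, seg_start: int) -> int:
    """Count leading pages whose header matches the expected address."""
    count = 0
    for page_no in range(len(segment) // PAGE_SIZE):
        magic, addr = PAGE_HEADER.unpack_from(segment, page_no * PAGE_SIZE)
        if magic != MAGIC or addr != seg_start + page_no * PAGE_SIZE:
            break  # stale or garbage page: treat as end of valid data
        count += 1
    return count


# A segment first used at address 0, then recycled (not zeroed) for address 1024.
seg = bytearray(4 * PAGE_SIZE)
for n in range(4):
    write_page(seg, 0, n, b"old")
for n in range(2):
    write_page(seg, 1024, n, b"new")
print(valid_pages(bytes(seg), 1024))  # prints 2: pages 2 and 3 still carry old addresses
```

In this model the old pages are perfectly well-formed, yet the reader stops at them because their recorded address belongs to the segment's earlier position in the log.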


Re: [GENERAL] requested timeline doesn't contain minimum recovery point

2017-01-10 Thread Michael Paquier
On Tue, Jan 10, 2017 at 10:35 PM, Tom DalPozzo wrote:
> I redid the tests following your suggestion to issue a checkpoint manually.
> IT WORKS!
> Just a question: when the standby server starts, I see error messages in
> the log (e.g. "invalid record length...") when the end of the WAL is
> reached. I know that this is normal.
> But I'm wondering whether the system, in order to detect the end of the
> WAL, checks only the validity of the records in the WAL.

You may want to look at xlogreader.c and trace the callers of
report_invalid_record() to see which error checks are performed. There are
no full checks specific to each record type, but there are some checks on
the backup blocks, the record size, etc.

> I mean, could random bytes appear as a valid record (very unlikely, but
> possible)?

Yes, that could be possible if some memory or disk is broken. That's
why, while it is important to take backups, it is more important to
make sure that they are able to restore correctly before deploying
them.
-- 
Michael
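
To make the idea of detecting the end of the WAL through record validity more concrete, here is a minimal sketch of a reader that stops at the first record failing a sanity check. It is a toy model, not the code in xlogreader.c: the record layout, the use of zlib's CRC-32 and the names make_record() and read_records() are assumptions made for illustration, and the messages only loosely imitate the "invalid record length" message mentioned in the thread.

```python
import struct
import zlib

# Hypothetical record layout for this sketch (not PostgreSQL's XLogRecord):
#   total_len (uint32) | prev_offset (uint64) | crc (uint32) | payload
HEADER = struct.Struct("<IQI")


def make_record(prev_offset: int, payload: bytes) -> bytes:
    """Build a record whose CRC covers length, prev-link and payload."""
    total_len = HEADER.size + len(payload)
    crc = zlib.crc32(struct.pack("<IQ", total_len, prev_offset) + payload)
    return HEADER.pack(total_len, prev_offset, crc) + payload


def read_records(buf: bytes):
    """Return (payloads, reason), where reason says why reading stopped."""
    payloads = []
    offset = 0
    prev_offset = 0
    while True:
        if offset + HEADER.size > len(buf):
            return payloads, "end of buffer"
        total_len, prev, crc = HEADER.unpack_from(buf, offset)
        # Check 1: the length must be plausible.
        if total_len < HEADER.size or offset + total_len > len(buf):
            return payloads, f"invalid record length at {offset}"
        # Check 2: the record must point back at the record we just read.
        if prev != prev_offset:
            return payloads, f"incorrect prev-link at {offset}"
        # Check 3: the checksum must match the record contents.
        payload = buf[offset + HEADER.size:offset + total_len]
        if crc != zlib.crc32(struct.pack("<IQ", total_len, prev) + payload):
            return payloads, f"incorrect checksum at {offset}"
        payloads.append(payload)
        prev_offset = offset
        offset += total_len


r1 = make_record(0, b"a")
r2 = make_record(0, b"bb")  # its prev-link is the offset of r1, which is 0
print(read_records(r1 + r2 + bytes(32)))
# prints ([b'a', b'bb'], 'invalid record length at 35')
```

In this model, random bytes would have to pass the length check, the prev-link check and the checksum at the same time to be accepted as a record, which is why a false positive is very unlikely but not strictly impossible.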


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] requested timeline doesn't contain minimum recovery point

2017-01-10 Thread Tom DalPozzo
>
>> Could you give more details? What does pg_rewind tell you at each
>> phase? Is that on Postgres 9.5 or 9.6? I use pg_rewind quite
>> extensively on 9.5 but I have no problems of this kind with multiple
>> timeline jumps when juggling between two nodes. Another thing that
>> comes to mind: you are using pg_rewind with a source node that is
>> running. You should issue a checkpoint manually after promoting the
>> node to be sure that its control file gets the new timeline number.
>> --
>> Michael
>>
> Hi,
>
> Sometimes pg_rewind says that nothing needs to be done; sometimes it says
> that it's rewinding and, at the end, that it's done.
> I'm using 9.6. I moved to it from 9.5 because I'm also using replication
> slots, and a second parameter was added in 9.6. I seem to remember that
> it behaved the same way in 9.5 too, but I'm not really sure.
> I checked that the server, at promotion, logged the message about the new
> timeline.
> I will run some more tests.
> Regards
> Pupillo
>

Hi!
I redid the tests following your suggestion to issue a checkpoint manually.
IT WORKS!
Just a question: when the standby server starts, I see error messages in the
log (e.g. "invalid record length...") when the end of the WAL is reached. I
know that this is normal.
But I'm wondering whether the system, in order to detect the end of the WAL,
checks only the validity of the records in the WAL.
I mean, could random bytes appear as a valid record (very unlikely, but
possible)?
Thanks
Pupillo


Re: [GENERAL] requested timeline doesn't contain minimum recovery point

2017-01-06 Thread Tom DalPozzo
2017-01-06 13:09 GMT+01:00 Michael Paquier:

> On Fri, Jan 6, 2017 at 1:01 AM, Tom DalPozzo wrote:
> > Hi,
> > there is something happening in my replication setup that is not clear
> > to me. I think I'm missing something.
> > I have two servers, red and blue.
> > red is the primary and blue is the standby, with async replication.
> > Now:
> > 1 cleanly stop red
> > 2 promote blue
> > 3 insert tuples in blue
> > 4 from the red site, pg_rewind from blue to the red dir
> > 5 start red as standby -> OK
> > 6 wait a long time and then cleanly stop blue
> > 7 promote red
> > 8 insert tuples in red
> > 9 from the blue site, pg_rewind from red to the blue dir
> > 10 start blue as standby -> I get "requested timeline 3 doesn't contain
> > minimum recovery point 1/... on timeline 1"
> >
> > Sometimes this "switching game" works up to timeline 4 or 5; it does not
> > always fail at timeline 3.
>
> Could you give more details? What does pg_rewind tell you at each
> phase? Is that on Postgres 9.5 or 9.6? I use pg_rewind quite
> extensively on 9.5 but I have no problems of this kind with multiple
> timeline jumps when juggling between two nodes. Another thing that
> comes to mind: you are using pg_rewind with a source node that is
> running. You should issue a checkpoint manually after promoting the
> node to be sure that its control file gets the new timeline number.
> --
> Michael
>
Hi,

Sometimes pg_rewind says that nothing needs to be done; sometimes it says
that it's rewinding and, at the end, that it's done.
I'm using 9.6. I moved to it from 9.5 because I'm also using replication
slots, and a second parameter was added in 9.6. I seem to remember that
it behaved the same way in 9.5 too, but I'm not really sure.
I checked that the server, at promotion, logged the message about the new
timeline.
I will run some more tests.
Regards
Pupillo


Re: [GENERAL] requested timeline doesn't contain minimum recovery point

2017-01-06 Thread Michael Paquier
On Fri, Jan 6, 2017 at 1:01 AM, Tom DalPozzo wrote:
> Hi,
> there is something happening in my replication setup that is not clear to
> me. I think I'm missing something.
> I have two servers, red and blue.
> red is the primary and blue is the standby, with async replication.
> Now:
> 1 cleanly stop red
> 2 promote blue
> 3 insert tuples in blue
> 4 from the red site, pg_rewind from blue to the red dir
> 5 start red as standby -> OK
> 6 wait a long time and then cleanly stop blue
> 7 promote red
> 8 insert tuples in red
> 9 from the blue site, pg_rewind from red to the blue dir
> 10 start blue as standby -> I get "requested timeline 3 doesn't contain
> minimum recovery point 1/... on timeline 1"
>
> Sometimes this "switching game" works up to timeline 4 or 5; it does not
> always fail at timeline 3.

Could you give more details? What does pg_rewind tell you at each
phase? Is that on Postgres 9.5 or 9.6? I use pg_rewind quite
extensively on 9.5 but I have no problems of this kind with multiple
timeline jumps when juggling between two nodes. Another thing that
comes to mind: you are using pg_rewind with a source node that is
running. You should issue a checkpoint manually after promoting the
node to be sure that its control file gets the new timeline number.
-- 
Michael
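
The suggestion above (promote, then checkpoint manually, then rewind the old primary) can be sketched as an ordered list of commands. This is only an illustration: the helper name rewind_steps(), the data directory paths and the connection string are hypothetical, and the commands are built but never executed here. The first two steps would run against the newly promoted node, and the last one on the host of the cleanly stopped old primary.

```python
import shlex
from typing import List


def rewind_steps(new_primary_data: str,
                 old_primary_data: str,
                 new_primary_conn: str) -> List[List[str]]:
    """Return the command lines for one switchover, in the order to run them."""
    return [
        # 1. Promote the standby so that it starts a new timeline.
        ["pg_ctl", "promote", "-D", new_primary_data],
        # 2. Force a checkpoint so the control file records the new timeline.
        ["psql", "-d", new_primary_conn, "-c", "CHECKPOINT"],
        # 3. Rewind the stopped old primary against the running new primary.
        ["pg_rewind",
         f"--target-pgdata={old_primary_data}",
         f"--source-server={new_primary_conn}"],
    ]


# Hypothetical paths and connection string, for illustration only.
steps = rewind_steps("/data/blue", "/data/red", "host=blue dbname=postgres")
for cmd in steps:
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

Keeping the checkpoint as an explicit step between promotion and pg_rewind makes it harder to forget when the roles are swapped back and forth repeatedly.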

