Hi,
On Sat, Apr 11, 2009 at 1:31 AM, Simon Riggs <[email protected]> wrote:
>
> Fujii-san,
>
> I like the new patch using the content of the file to determine the
> mode. Much easier to use at failover time.
>
> On Fri, 2009-04-10 at 12:47 +0900, Fujii Masao wrote:
>
>> > One problem with this patch is that in smart mode, the trigger file is not
>> > deleted. That's different from current pg_standby behavior, and makes
>> > accidental failovers after one failover more likely.
>>
>> Yes, it's because pg_standby cannot be sure when the trigger file
>> can be removed in smart mode. If the trigger file is deleted as soon
>> as it's found, just like in fast mode, pg_standby may keep waiting
>> for WAL file again.
>
> My understanding of smart mode is fairly simple:
>
> if (triggered)
> {
>     if (smartMode && nextWALfile+1 exists)
>         exit(0);
>     else
>     {
>         delete trigger file
>         exit(1);
>     }
> }
>
> If you perform a file lookahead (the +1) as shown above then you avoid
> the problem Heikki observes.
Thanks for the suggestion!

A lookahead at nextWALfile+1 could leave pg_standby stuck, as follows.
Am I missing something?
1. the trigger file containing "smart" is created.
2. pg_standby is executed.
2-1. nextWALfile is restored.
2-2. the trigger file is deleted because nextWALfile+1 doesn't exist.
3. the restored nextWALfile is applied.
4. pg_standby is executed again to restore nextWALfile+1.
5. pg_standby gets stuck because the trigger file and nextWALfile+1
don't exist.
But a lookahead at nextWALfile itself (without the +1) seems to work fine:
if (triggered)
{
    if (smartMode && nextWALfile exists)
        exit(0);
    else
    {
        delete trigger file
        exit(1);
    }
}
1. the trigger file containing "smart" is created.
2. pg_standby is executed.
2-1. nextWALfile is restored.
3. the restored nextWALfile is applied.
4. pg_standby is executed again to restore nextWALfile+1.
4-1. the trigger file is deleted because nextWALfile+1 doesn't exist.
5. the startup process fails to read nextWALfile+1.
6. pg_standby is executed again to re-fetch nextWALfile.
6-1. nextWALfile is restored.
6-2. pg_standby doesn't get stuck because nextWALfile exists.
Furthermore, pg_standby may have to check whether nextWALfile exists
not only in archiveLocation but also in pg_xlog. This is because, when
pg_xlog of the primary server can still be read at failover, the WAL files
in it may be copied to pg_xlog of the standby server to be applied
(though I'm not sure whether it's better to copy such files to pg_xlog
or to archiveLocation in this case).
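
As a rough sketch of what such a check might look like, the following C
fragment tests for the file in two directories using POSIX stat(). The
function names file_exists_in and wal_file_available, and the fixed-size
path buffer, are assumptions made for this example; they are not part of
pg_standby.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical helper: true if dir/fname exists as a regular file. */
bool
file_exists_in(const char *dir, const char *fname)
{
    char        path[1024];
    struct stat st;
    int         n;

    assert(dir != NULL && fname != NULL);

    n = snprintf(path, sizeof(path), "%s/%s", dir, fname);
    if (n < 0 || (size_t) n >= sizeof(path))
        return false;           /* path too long: treat as missing */

    return stat(path, &st) == 0 && S_ISREG(st.st_mode);
}

/* Hypothetical helper: look in the archive first, then in pg_xlog. */
bool
wal_file_available(const char *archive_dir, const char *xlog_dir,
                   const char *fname)
{
    return file_exists_in(archive_dir, fname) ||
           file_exists_in(xlog_dir, fname);
}
```

The existence test in the pseudocode above would then call something like
wal_file_available() instead of looking only in archiveLocation.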
Comments?
Regards,
--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center