Re: [HACKERS] bug of recovery?

Simon Riggs Thu, 29 Sep 2011 04:49:45 -0700

On Thu, Sep 29, 2011 at 12:31 PM, Fujii Masao <[email protected]> wrote:
> On Tue, Sep 27, 2011 at 8:06 PM, Florian Pflug <[email protected]> wrote:
>> On Sep27, 2011, at 07:59 , Heikki Linnakangas wrote:
>>> On 27.09.2011 00:28, Florian Pflug wrote:
>>>> On Sep26, 2011, at 22:39 , Tom Lane wrote:
>>>>> It might be worthwhile to invoke XLogCheckInvalidPages() as soon as
>>>>> we (think we) have reached consistency, rather than leaving it to be
>>>>> done only when we exit recovery mode.
>>>>
>>>> I believe we also need to prevent the creation of restart points before
>>>> we've reached consistency.
>>>
>>> Seems reasonable. We could still allow restartpoints when the hash table is 
>>> empty, though. And once we've reached consistency, we can throw an error 
>>> immediately in log_invalid_page(), instead of adding the entry in the hash 
>>> table.
>>
>> That mimics the way the rm_safe_restartpoint callbacks work, which is good.
>>
>> Actually, why don't we use that machinery to implement this? There's 
>> currently no rm_safe_restartpoint callback for RM_XLOG_ID, so we'd just need 
>> to create one that checks whether invalid_page_tab is empty.
>
> Okay, the attached patch prevents the creation of restartpoints by using
> rm_safe_restartpoint callback if we've not reached a consistent state yet
> and the invalid-page table is not empty. But the invalid-page table is not
> tied to the specific resource manager, so using rm_safe_restartpoint for
> that seems to slightly odd. Is this OK?
>
> Also, according to other suggestions, the patch changes 
> XLogCheckInvalidPages()
> so that it's called as soon as we've reached a consistent state, and changes
> log_invalid_page() so that it emits PANIC immediately if consistency is 
> already
> reached. These are very good changes, I think. Because they enable us to
> notice serious problem which causes PANIC error immediately. Without these
> changes, you unfortunately might notice that the standby database is corrupted
> when failover happens. Though such a problem might rarely happen, I think it's
> worth doing those changes.


Patch does everything we agreed it should.

Good suggestion from Florian.

This worries me slightly now though because the patch makes us PANIC
in a place we didn't used to and once we do that we cannot restart the
server at all. Are we sure we want that? It's certainly a great way to
shake down errors in other code...

-- 
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] bug of recovery?

Reply via email to