Looping pgsql-general mail list

I see I am not clear in my question , below are the order of events we see
when  we get a  invalid page header in block corruption

-Windows server crashed/restarted due to power failure ( we believe)  ( I
see that write-cache / write back cache / Disk cache are enabled and we
don't have battery backup )
-Posgres DB crashes (LOG:  database system was not properly shut down;
automatic recovery in progress )
- After around 3hrs after crash we see below messages in log

2016-03-03 20:13:18 GMT ERROR:  invalid page header in block 204 of
relation base/16413/16900
2016-03-03 20:13:18 GMT CONTEXT:  automatic analyze of table "
a.parametertable" => this is a relation not Index

- select count(*) on table gives invalid page header message , where as
 select count(*) order by primary key does not give any issue
- After clearing the blocks using zero_damaged_pages , total count of table
rows  using the table and order by primary key matches which means there is
no dataloss ( I havent verified each record by record , I assume there is
no dataloss) please correct ??

I would like to know what was the cause of invalid page header and is there
any way I can reproduce this error ?

Appreciate your comments and suggestions on this ?

-Sreekanth

On Thu, Dec 8, 2016 at 12:55 PM, Sreekanth Palluru <sree...@gmail.com>
wrote:

>
> Forgot to loop community
>
>
> Hi Jorge/All,
> Thanks for the reply.
> As per the documentation , I think backend reads the this page header and
> reports that it is damaged.
> I am looking at ways we re-create this scenario of creation this blank
> corrupted page ? do I have any control over Backend and after it
> initializes a new page using function PageInit
> <https://doxygen.postgresql.org/bufpage_8c.html#ab871202326b101c6ec24b7f628157c2c>
> (Page
> <https://doxygen.postgresql.org/bufpage_8h.html#a2010e3258a7075b32ad5750134ab9c5c>
> page, Size
> <https://doxygen.postgresql.org/c_8h.html#af9ecec2d692138fab9167164a457cbd4>
> pageSize, Size
> <https://doxygen.postgresql.org/c_8h.html#af9ecec2d692138fab9167164a457cbd4>
> specialSize) and I want to halt/crash the backend by stopping postgres
> database or through Hardware/system crash ?
> Is this possible ?
>
> Also I see that based on below comments from Source code, having such
> pages in database is normal . is my understanding correct ?
>
>
> /*
> 63  * PageIsVerified
> 64  * Check that the page header and checksum (if any) appear valid.
> 65  *
> 66  * This is called when a page has just been read in from disk. The
> idea is
> 67  * to cheaply detect trashed pages before we go nuts following bogus
> item
> 68  * pointers, testing invalid transaction identifiers, etc.
> 69  *
> 70  * It turns out to be necessary to allow zeroed pages here too. Even
> though
> 71  * this routine is *not* called when deliberately adding a page to a
> relation,
> 72  * there are scenarios in which a zeroed page might be found in a
> table.
> 73  * (Example: a backend extends a relation, then crashes before it can
> write
> 74  * any WAL entry about the new page. The kernel will already have the
> 75  * zeroed page in the file, and it will stay that way after restart.)
> So we
> 76  * allow zeroed pages here, and are careful that the page access macros
> 77  * treat such a page as empty and without free space. Eventually,
> VACUUM
> 78  * will clean up such a page and make it usable.
> 79  */
> zero_damaged_pages (boolean)
>
> Detection of a damaged page header normally causes PostgreSQL to report
> an error, aborting the current transaction. Setting zero_damaged_pages to
> on causes the system to instead report a warning, zero out the damaged page
> in memory, and continue processing. This behavior will destroy data,
> namely all the rows on the damaged page. However, it does allow you to get
> past the error and retrieve rows from any undamaged pages that might be
> present in the table. It is useful for recovering data if corruption has
> occurred due to a hardware or software error. You should generally not set
> this on until you have given up hope of recovering data from the damaged
> pages of a table. Zeroed-out pages are not forced to disk so it is
> recommended to recreate the table or the index before turning this
> parameter off again. The default setting is off, and it can only be
> changed by a superuser.
>
> On Thu, Dec 8, 2016 at 9:50 AM, Jorge Torralba <jorge.torra...@gmail.com>
> wrote:
>
>> Look into this setting.
>>
>> zero_damaged_pages  = on;
>>
>> The docs should explain it.
>>
>>
>>
>> On Wed, Dec 7, 2016 at 2:47 PM, sreekanth Palluru <sree...@gmail.com>
>> wrote:
>>
>>> Hi ,
>>>
>>> I am getting invalid page header error and what I could observe is when
>>> I select the table I get this error , where as if I select table and order
>>> by primary key I can retrieve the rows from table.
>>>
>>> And I don't see any dataloss ( based on total number records) after
>>> fixing the blocks using zero_damaged_pages=on and then vacuum full on the
>>> table
>>>
>>>
>>> Please note I have renamed few tables to avoid giving actual table names
>>>
>>> We run Postgres 9.2 version on Windows and Dell Optiplex XE2 server and
>>> we suspect that block corruption would have happen due to
>>> hardware/memory/power failures reasons and I have gone through wikik
>>> https://wiki.postgresql.org/wiki/Reliable_Writes.
>>>
>>> I want to understand why we can read the table through where there
>>> exists index and explain plan shows Index scan and with high cost compare
>>> to seq scan .
>>> I assume that since there no rows/data present in these corrupted blocks
>>> index scan skips these blocks and hence it is not throwing the error .
>>>
>>> Also , I want to know what would have caused the postgres to create
>>> these corrupted blocks and
>>> can I reproduce this error ? appreciate if you share any pointers to
>>> blogs/mailing lists if this type of issue is already discussed ?
>>>
>>>
>>> create table a.parametertable_bak as select * from a.parametertable
>>> order by id;
>>>
>>> labs=# select count(*) from a.parametertable_bak ;
>>>  count
>>> -------
>>>  31415
>>> (1 row)
>>>
>>>
>>> labs=#
>>> labs=# checkpoint;
>>> CHECKPOINT
>>> labs=# set zero_damaged_pages=on;
>>> SET
>>> labs=# vacuum full a.parametertable;
>>> WARNING:  invalid page header in block 204 of relation base/16413/16900;
>>> zeroing out page
>>> WARNING:  invalid page header in block 205 of relation base/16413/16900;
>>> zeroing out page
>>> VACUUM
>>> labs=# select count(*) from a.parametertable ;
>>>  count
>>> -------
>>>  31415
>>> (1 row)
>>>
>>>
>>> labs=#
>>> -Sreekanth
>>>
>>
>>
>>
>> --
>> Thanks,
>>
>> Jorge Torralba
>> ----------------------------
>>
>> Note: This communication may contain privileged or other confidential
>> information. If you are not the intended recipient, please do not print,
>> copy, retransmit, disseminate or otherwise use the information. Please
>> indicate to the sender that you have received this email in error and
>> delete the copy you received. Thank You.
>>
>
>
>
> --
> Regards
> Sreekanth
>
>
>
> --
> Regards
> Sreekanth
>



-- 
Regards
Sreekanth

Reply via email to