Hello Thanks for replying.
It is really a complicated concept. So I think that in a mission critical environment , it is not a good choice to turn full_page_writes on. Best Regards 2013/8/27 Jeff Janes <jeff.ja...@gmail.com> > On Sun, Aug 25, 2013 at 7:57 PM, 高健 <luckyjack...@gmail.com> wrote: > > Hi : > > > > Thanks to Alvaro! Sorry for replying lately. > > > > I have understood a little about it. > > > > But the description of full_page_write made me even confused. Sorry that > > maybe I go to another problem: > > > > It is said: > > > http://www.postgresql.org/docs/9.2/static/runtime-config-wal.html#GUC-FULL-PAGE-WRITES > > ---------- > > When this parameter is on, the PostgreSQL server writes the entire > content > > of each disk page to WAL during the first modification of that page > after a > > checkpoint. This is needed because a page write that is in process > during an > > operating system crash might be only partially completed, leading to an > > on-disk page that contains a mix of old and new data. > > ------- > > > > Let me imagine that: > > On a disk page, there are following data: > > > > id=1 val=1 with transaction id of 1001 > > id=2 val=2 with transaction id of 1002 > > id=3 val=3 with transaction id of 1003 > > > > If I start DB, > > And begin with transaction id of 2002 deal with data of id=2 ,making val > to > > 20 > > Then with trsansaction id of 2003 deal with data of id=3,making val to 30 > > > > If With full_page_write =off, > > When my checkpoint occur, it succeed with transaction 2002 but failed > with > > 2003 because of crash. > > A checkpoint either succeeds or fails. It cannot succeed with some > transactions and fail with others. > > > Then disk page will be of: > > > > id=1 val=1 with transaction id of 1001------maybe this is the very old > data > > id=2 val=20 with transaction id of 2002------This is now new data > > id=3 val=3 with transaction id of 1003------This is old data. > > > Postgres does not do in-place updates, it marks the old row as > obsolete and creates a new one. > > id=1 val= 1 with transaction id of 1001 > id=2 val= 2 with transaction id of 1002 xmax of 2002 > id=3 val= 3 with transaction id of 1003 xmax of 2003 > id=2 val=20 with transaction id of 2002 > id=3 val=30 with transaction id of 2003 > > Of course the whole point of a torn page write is that you don't how > much got written, so you don't know what is actually on the disk. > > > When DB restart from crash, > > I think that there are wal data of transaction id 2002 and 2003 beause > > that wal written to wal_buffer is before data written to shared_buffer. > > The wal is written out of wal_buffer and flushed, before the > corresponding block is written out of shared_buffer. > > > So if Online wal log file is ok, there will be no data lost, and > > roll-forward and roll-back can happen. > > If some online wal log file is dmaged during crash: > > There might be some data lost,but if we have archive log, we can restore > > back due to archive wal log's latest transaction id. > > Most WAL records would have no problem being applied to a block that > is an otherwise uncorrupted mix of old and new. > > But some WAL records have instructions that amount to "grab 134 bytes > from offset 7134 in the block and move them to offset 1623". If the > block is an unknown mix of old and new data, that can't be carried out > safely. > > Cheers, > > Jeff >