Re: [GENERAL] checkpoint clarifications needed

2017-01-09 Thread Tom DalPozzo
>
> Hi, I had already read that doc but I can't  answer clearly to my
>> questions 2,4 and 5.
>>
>
> The answer would seem to depend on what you consider 'a consistency state
> position'. Is it possible to be more explicit about what you mean?
>
>>
>> Hi, I meant a position such that, if you replay up to it, then the DB is
in a consistent state (transactions done entirely or not a t all...).
But, as Alvaro wrote, any position is ok
Thank you very much
Pupillo


Re: [GENERAL] checkpoint clarifications needed

2017-01-09 Thread Tom DalPozzo
>
> > Hi,
> > so let's suppose that the WAL is:
> > LSN 10: start transaction 123
> > LSN 11: update tuple 100
> >checkpoint position here (not a record but just for understanding)
> > LSN 12: update tuple 100
> > LSN 13: update tuple 100
> > LSN 14: checkpoint record ( postion=11)
> > LSN 15: update tuple 100
> > and that the system crashes now, before ending to write all the
> > transaction's recs to the WAL  (other updates and commit record missing).
> >
> > At the replay, starting from LSN 12, the entire page we had at LSN 11 is
> > written to the disk, though carrying inconsistent data.
> > Then we can even replay up to the end of WAL but always getting
> > inconsistent data.
> > BUT, you say, as the tuple is not commited in the WAL, only the old
> version
> > of the tuple will be visible? Right?
>
> Yes -- all the updated tuples are invisible because the commit record
> for transaction 123 does not appear in wal.  A future VACUUM will remove
> all those tuples.  Note that precisely for this reason, the original
> version of the tuple had not been removed yet.
>
> --
> Álvaro Herrerahttps://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>

Ok! Now many things are clear to me
Thank you very much
Pupillo


Re: [GENERAL] checkpoint clarifications needed

2017-01-09 Thread Alvaro Herrera
Tom DalPozzo wrote:

> Hi,
> so let's suppose that the WAL is:
> LSN 10: start transaction 123
> LSN 11: update tuple 100
>checkpoint position here (not a record but just for understanding)
> LSN 12: update tuple 100
> LSN 13: update tuple 100
> LSN 14: checkpoint record ( postion=11)
> LSN 15: update tuple 100
> and that the system crashes now, before ending to write all the
> transaction's recs to the WAL  (other updates and commit record missing).
> 
> At the replay, starting from LSN 12, the entire page we had at LSN 11 is
> written to the disk, though carrying inconsistent data.
> Then we can even replay up to the end of WAL but always getting
> inconsistent data.
> BUT, you say, as the tuple is not commited in the WAL, only the old version
> of the tuple will be visible? Right?

Yes -- all the updated tuples are invisible because the commit record
for transaction 123 does not appear in wal.  A future VACUUM will remove
all those tuples.  Note that precisely for this reason, the original
version of the tuple had not been removed yet.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] checkpoint clarifications needed

2017-01-09 Thread Tom DalPozzo
>
> Whether any individual tuple in the data files is visible or not depends
> not only on the data itself, but also on the commit status of the
> transactions that created it (and deleted it, if any).  Replaying WAL
> also updates the commit status of transactions, so if you're in the
> middle of replaying WAL, you may be adding tuples to the data files, but
> those tuples will not become visible until their commit records are also
> updated.
>
> You can stop replaying WAL at any point, and data will always be in a
> consistent state.  Some data tuples might be "from the future" and those
> will not be visible, which is what makes it all consistent.
>
> Hi,
so let's suppose that the WAL is:
LSN 10: start transaction 123
LSN 11: update tuple 100
   checkpoint position here (not a record but just for understanding)
LSN 12: update tuple 100
LSN 13: update tuple 100
LSN 14: checkpoint record ( postion=11)
LSN 15: update tuple 100
and that the system crashes now, before ending to write all the
transaction's recs to the WAL  (other updates and commit record missing).

At the replay, starting from LSN 12, the entire page we had at LSN 11 is
written to the disk, though carrying inconsistent data.
Then we can even replay up to the end of WAL but always getting
inconsistent data.
BUT, you say, as the tuple is not commited in the WAL, only the old version
of the tuple will be visible? Right?

Regards
Pupillo


Re: [GENERAL] checkpoint clarifications needed

2017-01-09 Thread Alvaro Herrera
Tom DalPozzo wrote:

> 2) I see that a checkpoint position can be right in the middle of a group
> of records related to a transaction (in the example, transaction id 10684).
> So a checkpoint position is NOT a consistency state point, right?

> 4) If I'm right at 2) then, between the checkpoint position (1/F00A7448 )
> and the checkpoint record position (1/FCBD7510) there must be a point where
> the DB is in a consistency state. If not, in case of crash just after
> writing the checkpoint record to the WAL and its position to pg_control,
> the system would replay from the checkpoint position (known by  last
> checkpoint record) without finding a consistency state. Right?
> 
> 5) How can we define, in terms of log records, a consistency state position?

Whether any individual tuple in the data files is visible or not depends
not only on the data itself, but also on the commit status of the
transactions that created it (and deleted it, if any).  Replaying WAL
also updates the commit status of transactions, so if you're in the
middle of replaying WAL, you may be adding tuples to the data files, but
those tuples will not become visible until their commit records are also
updated.

You can stop replaying WAL at any point, and data will always be in a
consistent state.  Some data tuples might be "from the future" and those
will not be visible, which is what makes it all consistent.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] checkpoint clarifications needed

2017-01-09 Thread Adrian Klaver

On 01/09/2017 01:10 PM, Tom DalPozzo wrote:

Reread your original post and realized you where also asking
about transaction consistency and WALs. The thumbnail version is
that Postgres writes transactions to the WALs before they are
written to the data files on disk. A checkpoint represents a
point in the sequence when is is known that the changes recorded
in the WAL have been also recorded in the disk data files. So
Postgres then knows that in a recovery scenario it needs to only
redo/replay the WAL changes that are past the last checkpoint.
So the transactions are there it is just a matter of if they
need to be replayed or not. This is subject to caveats:


https://www.postgresql.org/docs/9.5/static/wal-reliability.html



Hi, I had already read that doc but I can't  answer clearly to my
questions 2,4 and 5.


The answer would seem to depend on what you consider 'a consistency 
state position'. Is it possible to be more explicit about what you mean?



Regards
Pupillo





--
Adrian Klaver
adrian.kla...@aklaver.com


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] checkpoint clarifications needed

2017-01-09 Thread Tom DalPozzo
>
> Reread your original post and realized you where also asking about
>> transaction consistency and WALs. The thumbnail version is that Postgres
>> writes transactions to the WALs before they are written to the data files
>> on disk. A checkpoint represents a point in the sequence when is is known
>> that the changes recorded in the WAL have been also recorded in the disk
>> data files. So Postgres then knows that in a recovery scenario it needs to
>> only redo/replay the WAL changes that are past the last checkpoint. So the
>> transactions are there it is just a matter of if they need to be replayed
>> or not. This is subject to caveats:
>>
>
> https://www.postgresql.org/docs/9.5/static/wal-reliability.html
>
>
>> Hi, I had already read that doc but I can't  answer clearly to my
questions 2,4 and 5.
Regards
Pupillo


Re: [GENERAL] checkpoint clarifications needed

2017-01-09 Thread Adrian Klaver

On 01/09/2017 06:47 AM, Tom DalPozzo wrote:

https://www.postgresql.org/docs/9.5/static/wal-internals.html



"After a checkpoint has been made and the log flushed, the
checkpoint's position is saved in the file pg_control. Therefore, at
the start of recovery, the server first reads pg_control and then
the checkpoint record; then it performs the REDO operation by
scanning forward from the log position indicated in the checkpoint
record. Because the entire content of data pages is saved in the log
on the first page modification after a checkpoint (assuming
full_page_writes is not disabled), all pages changed since the
checkpoint will be restored to a consistent state."


Hi, yes I know that, it's what I meant in my point 3). As it says first
 "the checkpoint's position (NOT the record)is saved in the file
pg_control" then I had some doubt.


Reread your original post and realized you where also asking about 
transaction consistency and WALs. The thumbnail version is that Postgres 
writes transactions to the WALs before they are written to the data 
files on disk. A checkpoint represents a point in the sequence when is 
is known that the changes recorded in the WAL have been also recorded in 
the disk data files. So Postgres then knows that in a recovery scenario 
it needs to only redo/replay the WAL changes that are past the last 
checkpoint. So the transactions are there it is just a matter of if they 
need to be replayed or not. This is subject to caveats:


https://www.postgresql.org/docs/9.5/static/wal-reliability.html


Regards
Pupillo





--
Adrian Klaver
adrian.kla...@aklaver.com


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] checkpoint clarifications needed

2017-01-09 Thread Adrian Klaver

On 01/09/2017 06:47 AM, Tom DalPozzo wrote:

https://www.postgresql.org/docs/9.5/static/wal-internals.html



"After a checkpoint has been made and the log flushed, the
checkpoint's position is saved in the file pg_control. Therefore, at
the start of recovery, the server first reads pg_control and then
the checkpoint record; then it performs the REDO operation by
scanning forward from the log position indicated in the checkpoint
record. Because the entire content of data pages is saved in the log
on the first page modification after a checkpoint (assuming
full_page_writes is not disabled), all pages changed since the
checkpoint will be restored to a consistent state."


Hi, yes I know that, it's what I meant in my point 3). As it says first
 "the checkpoint's position (NOT the record)is saved in the file
pg_control" then I had some doubt.


Yes it is just one piece of information stored in the file.

To see what else is stored there do:

pg_controldata -D your_cluster_data_directory




Regards
Pupillo





--
Adrian Klaver
adrian.kla...@aklaver.com


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] checkpoint clarifications needed

2017-01-09 Thread Tom DalPozzo
>
> https://www.postgresql.org/docs/9.5/static/wal-internals.html
>>
>
> "After a checkpoint has been made and the log flushed, the checkpoint's
> position is saved in the file pg_control. Therefore, at the start of
> recovery, the server first reads pg_control and then the checkpoint record;
> then it performs the REDO operation by scanning forward from the log
> position indicated in the checkpoint record. Because the entire content of
> data pages is saved in the log on the first page modification after a
> checkpoint (assuming full_page_writes is not disabled), all pages changed
> since the checkpoint will be restored to a consistent state."
>
>
>> Hi, yes I know that, it's what I meant in my point 3). As it says first
 "the checkpoint's position (NOT the record)is saved in the file
pg_control" then I had some doubt.
Regards
Pupillo


Re: [GENERAL] checkpoint clarifications needed

2017-01-09 Thread Adrian Klaver

On 01/09/2017 06:14 AM, Tom DalPozzo wrote:

Hi, I need some clarifications about checkpoints.
Below here a log from my standby server when started and then some parts
of the interested WAL in the master's cluster  obtained by pg_xlogdump.
Just to have an example to talk on.

1) I see: "LOG:  redo starts at 1/F00A7448" . I was expecting a
checkpoint record around 1/F00A7448 but the related checkpoint record is
at lsn: 1/FCBD7510 instead.
I expected that because I read in the doc :"Checkpoints are points in
the sequence of transactions at which it is guaranteed that the heap and
index data files have been updated with ALL information written BEFORE
that checkpoint".
And I interpreted that as "All information written before that
checkpoint RECORD" but now I guess that one thing is a checkpoint point
and one thing is a checkpoint record. Right?

2) I see that a checkpoint position can be right in the middle of a
group of records related to a transaction (in the example, transaction
id 10684). So a checkpoint position is NOT a consistency state point, right?

3) According to doc at 29.5,  in pg_control the position of the last
checkpoint record (not the checkpoint position itself) is saved  right?

4) If I'm right at 2) then, between the checkpoint position (1/F00A7448
) and the checkpoint record position (1/FCBD7510) there must be a point
where the DB is in a consistency state. If not, in case of crash just
after writing the checkpoint record to the WAL and its position to
pg_control, the system would replay from the checkpoint position (known
by  last checkpoint record) without finding a consistency state. Right?

5) How can we define, in terms of log records, a consistency state position?


https://www.postgresql.org/docs/9.5/static/wal-internals.html

"After a checkpoint has been made and the log flushed, the checkpoint's 
position is saved in the file pg_control. Therefore, at the start of 
recovery, the server first reads pg_control and then the checkpoint 
record; then it performs the REDO operation by scanning forward from the 
log position indicated in the checkpoint record. Because the entire 
content of data pages is saved in the log on the first page modification 
after a checkpoint (assuming full_page_writes is not disabled), all 
pages changed since the checkpoint will be restored to a consistent state."




Best regards
Pupillo

STANDBY SERVER LOG
LOG:  redo starts at 1/F00A7448

LOG:  consistent recovery state reached at 2/426DF28
LOG:  invalid record length at 2/426DF28: wanted 24, got 0
LOG:  database system is ready to accept read only connections
LOG:  started streaming WAL from primary at 2/400 on timeline 1


FROM PG_XLOGDUMP OF MASTER

rmgr: Heaplen (rec/tot): 14/  1186, tx:  10684, lsn:
1/F009EEE0, prev 1/F009EEA8, desc: UPDATE off 1 xmax 10684 ; new off 3
xmax 0, blkref #0: rel 1663/16384/16422 blk 20054, blkref #1: rel
1663/16384/16422 blk 19774
rmgr: Btree   len (rec/tot):  2/64, tx:  10684, lsn:
1/F009F388, prev 1/F009EEE0, desc: INSERT_LEAF off 2, blkref #0: rel
1663/16384/16428 blk 711
rmgr: XLOGlen (rec/tot):  0/  8193, tx:  0, lsn:
1/F009F3C8, prev 1/F009F388, desc: FPI_FOR_HINT , blkref #0: rel
1663/16384/16484 blk 9663 FPW
rmgr: XLOGlen (rec/tot):  0/  8193, tx:  0, lsn:
1/F00A13E8, prev 1/F009F3C8, desc: FPI_FOR_HINT , blkref #0: rel
1663/16384/16484 blk 9664 FPW
rmgr: XLOGlen (rec/tot):  0/  8193, tx:  0, lsn:
1/F00A3408, prev 1/F00A13E8, desc: FPI_FOR_HINT , blkref #0: rel
1663/16384/16484 blk 9665 FPW
rmgr: XLOGlen (rec/tot):  0/  8193, tx:  0, lsn:
1/F00A5428, prev 1/F00A3408, desc: FPI_FOR_HINT , blkref #0: rel
1663/16384/16484 blk 9666 FPW
rmgr: XLOGlen (rec/tot):  0/  8193, tx:  0, lsn:
1/F00A7448, prev 1/F00A5428, desc: FPI_FOR_HINT , blkref #0: rel
1663/16384/16484 blk 9667 FPW
rmgr: XLOGlen (rec/tot):  0/  8193, tx:  0, lsn:
1/F00A9468, prev 1/F00A7448, desc: FPI_FOR_HINT , blkref #0: rel
1663/16384/16484 blk 9668 FPW
rmgr: XLOGlen (rec/tot):  0/  8193, tx:  0, lsn:
1/F00AB488, prev 1/F00A9468, desc: FPI_FOR_HINT , blkref #0: rel
1663/16384/16484 blk 9669 FPW
rmgr: Heaplen (rec/tot):  8/  8063, tx:  10682, lsn:
1/F00AD4A8, prev 1/F00AB488, desc: LOCK off 5: xid 10682: flags 0
LOCK_ONLY EXCL_LOCK , blkref #0: rel 1663/16384/16422 blk 19787 FPW
rmgr: Heaplen (rec/tot):  8/  8063, tx:  10684, lsn:
1/F00AF440, prev 1/F00AD4A8, desc: LOCK off 2: xid 10684: flags 0
LOCK_ONLY EXCL_LOCK , blkref #0: rel 1663/16384/16422 blk 19774 FPW
rmgr: Heaplen (rec/tot): 14/  4657, tx:  10682, lsn:
1/F00B13D8, prev 1/F00AF440, desc: UPDATE off 5 xmax 10682 ; new off 4
xmax 0, blkref #0: rel 1663/16384/16422 blk 20075 FPW, blkref #1: rel
1663/16384/16422 blk 19787
rmgr: Heaplen (rec/tot): 14/  4657, tx:  10684, lsn:
1/F00B2628, prev 

[GENERAL] checkpoint clarifications needed

2017-01-09 Thread Tom DalPozzo
Hi, I need some clarifications about checkpoints.
Below here a log from my standby server when started and then some parts of
the interested WAL in the master's cluster  obtained by pg_xlogdump.
Just to have an example to talk on.

1) I see: "LOG:  redo starts at 1/F00A7448" . I was expecting a checkpoint
record around 1/F00A7448 but the related checkpoint record is at lsn:
1/FCBD7510 instead.
I expected that because I read in the doc :"Checkpoints are points in the
sequence of transactions at which it is guaranteed that the heap and index
data files have been updated with ALL information written BEFORE that
checkpoint".
And I interpreted that as "All information written before that checkpoint
RECORD" but now I guess that one thing is a checkpoint point and one thing
is a checkpoint record. Right?

2) I see that a checkpoint position can be right in the middle of a group
of records related to a transaction (in the example, transaction id 10684).
So a checkpoint position is NOT a consistency state point, right?

3) According to doc at 29.5,  in pg_control the position of the last
checkpoint record (not the checkpoint position itself) is saved  right?

4) If I'm right at 2) then, between the checkpoint position (1/F00A7448 )
and the checkpoint record position (1/FCBD7510) there must be a point where
the DB is in a consistency state. If not, in case of crash just after
writing the checkpoint record to the WAL and its position to pg_control,
the system would replay from the checkpoint position (known by  last
checkpoint record) without finding a consistency state. Right?

5) How can we define, in terms of log records, a consistency state position?

Best regards
Pupillo

STANDBY SERVER LOG
LOG:  redo starts at 1/F00A7448

LOG:  consistent recovery state reached at 2/426DF28
LOG:  invalid record length at 2/426DF28: wanted 24, got 0
LOG:  database system is ready to accept read only connections
LOG:  started streaming WAL from primary at 2/400 on timeline 1


FROM PG_XLOGDUMP OF MASTER

rmgr: Heaplen (rec/tot): 14/  1186, tx:  10684, lsn:
1/F009EEE0, prev 1/F009EEA8, desc: UPDATE off 1 xmax 10684 ; new off 3 xmax
0, blkref #0: rel 1663/16384/16422 blk 20054, blkref #1: rel
1663/16384/16422 blk 19774
rmgr: Btree   len (rec/tot):  2/64, tx:  10684, lsn:
1/F009F388, prev 1/F009EEE0, desc: INSERT_LEAF off 2, blkref #0: rel
1663/16384/16428 blk 711
rmgr: XLOGlen (rec/tot):  0/  8193, tx:  0, lsn:
1/F009F3C8, prev 1/F009F388, desc: FPI_FOR_HINT , blkref #0: rel
1663/16384/16484 blk 9663 FPW
rmgr: XLOGlen (rec/tot):  0/  8193, tx:  0, lsn:
1/F00A13E8, prev 1/F009F3C8, desc: FPI_FOR_HINT , blkref #0: rel
1663/16384/16484 blk 9664 FPW
rmgr: XLOGlen (rec/tot):  0/  8193, tx:  0, lsn:
1/F00A3408, prev 1/F00A13E8, desc: FPI_FOR_HINT , blkref #0: rel
1663/16384/16484 blk 9665 FPW
rmgr: XLOGlen (rec/tot):  0/  8193, tx:  0, lsn:
1/F00A5428, prev 1/F00A3408, desc: FPI_FOR_HINT , blkref #0: rel
1663/16384/16484 blk 9666 FPW
rmgr: XLOGlen (rec/tot):  0/  8193, tx:  0, lsn:
1/F00A7448, prev 1/F00A5428, desc: FPI_FOR_HINT , blkref #0: rel
1663/16384/16484 blk 9667 FPW
rmgr: XLOGlen (rec/tot):  0/  8193, tx:  0, lsn:
1/F00A9468, prev 1/F00A7448, desc: FPI_FOR_HINT , blkref #0: rel
1663/16384/16484 blk 9668 FPW
rmgr: XLOGlen (rec/tot):  0/  8193, tx:  0, lsn:
1/F00AB488, prev 1/F00A9468, desc: FPI_FOR_HINT , blkref #0: rel
1663/16384/16484 blk 9669 FPW
rmgr: Heaplen (rec/tot):  8/  8063, tx:  10682, lsn:
1/F00AD4A8, prev 1/F00AB488, desc: LOCK off 5: xid 10682: flags 0 LOCK_ONLY
EXCL_LOCK , blkref #0: rel 1663/16384/16422 blk 19787 FPW
rmgr: Heaplen (rec/tot):  8/  8063, tx:  10684, lsn:
1/F00AF440, prev 1/F00AD4A8, desc: LOCK off 2: xid 10684: flags 0 LOCK_ONLY
EXCL_LOCK , blkref #0: rel 1663/16384/16422 blk 19774 FPW
rmgr: Heaplen (rec/tot): 14/  4657, tx:  10682, lsn:
1/F00B13D8, prev 1/F00AF440, desc: UPDATE off 5 xmax 10682 ; new off 4 xmax
0, blkref #0: rel 1663/16384/16422 blk 20075 FPW, blkref #1: rel
1663/16384/16422 blk 19787
rmgr: Heaplen (rec/tot): 14/  4657, tx:  10684, lsn:
1/F00B2628, prev 1/F00B13D8, desc: UPDATE off 2 xmax 10684 ; new off 4 xmax
0, blkref #0: rel 1663/16384/16422 blk 20054 FPW, blkref #1: rel
1663/16384/16422 blk 19774
rmgr: Btr
...
rmgr: Heap2   len (rec/tot):  8/68, tx:  0, lsn:
1/FCBD7448, prev 1/FCBD7400, desc: CLEAN remxid 10903, blkref #0: rel
1663/16384/16422 blk 2001
rmgr: Heap2   len (rec/tot):  8/66, tx:  0, lsn:
1/FCBD7490, prev 1/FCBD7448, desc: CLEAN remxid 10903, blkref #0: rel
1663/16384/16422 blk 2003
rmgr: Standby len (rec/tot): 24/50, tx:  0, lsn:
1/FCBD74D8, prev 1/FCBD7490, desc: RUNNING_XACTS nextXid 10907
latestCompletedXid 10906 oldestRunningXid 10907
rmgr: XLOG