[HACKERS] in drop database, auto-vacuum and immediate shutdown concurrency scene, hot-standby server redo FATAL

2016-03-29 Thread lannis
hi all,

I found that redo on the hot-standby server crashed when the master ran a
concurrency test in the drop database, auto-vacuum and immediate shutdown
scene.

The test steps are as follows:
1. execute select txid_current() repeatedly to force autovacuum to launch frequently
2. in another session drop the database and, concurrently, kill the master
3. restart the master
4. execute select txid_current() again
5. redo on the hot-standby server then fails with a FATAL error like this:

FATAL:  incorrect index offsets supplied
CONTEXT:  xlog redo vacuum: rel 1663/16638/13309; blk 1995,
lastBlockVacuumed 1994
LOG:  startup process (PID 31968) was terminated by signal 1: Hangup

I used pg_xlogdump to inspect the WAL files and found that not only the
btree vacuum record but also the heap clean/freeze/visible records look very
strange: the tuples in those records are exactly the same as in the records
written before the master was killed. And since all the WAL files were
replicated from the master, I think the problem lies on the master rather
than on the standby.

From decoding the records, the problem is very clear:
during the drop database process we request a checkpoint, and that's fine.
But before we insert the drop database record, the master got killed, so the
transaction was never committed.

The drop database code path looks like this:

dropDatabaseDependencies
DropDatabaseBuffers          -- drops the dirty buffers directly
ForgetDatabaseFsyncRequests
RequestCheckpoint            -- flushes nothing
(here we got killed)
remove_dbtablespaces         -- inserts the WAL record

See, we invalidate the dirty buffers directly; that is why there are two
identical vacuum records in the WAL.

On the standby server, since the drop database transaction was never
committed and no drop database record was replicated, the startup process
does not invalidate those buffers during redo, so the standby thinks that
both the first and the second vacuum records are valid.
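
For reference, the standby throws a database's buffers away only when it
actually replays a drop database record. Below is a simplified paraphrase of
the XLOG_DBASE_DROP branch of dbase_redo() in dbcommands.c (heavily
abbreviated, not a verbatim quote); if the master dies before that record is
written, this branch is never reached on the standby.

void
dbase_redo(XLogRecPtr lsn, XLogRecord *record)
{
	uint8		info = record->xl_info & ~XLR_INFO_MASK;

	if (info == XLOG_DBASE_DROP)
	{
		xl_dbase_drop_rec *xlrec = (xl_dbase_drop_rec *) XLogRecGetData(record);

		/* Drop this database's pages from the shared buffer cache */
		DropDatabaseBuffers(xlrec->db_id);

		/* (the real function also discards pending fsync requests and
		 *  removes the database's files on disk) */
	}
}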


A buffer's dirty flag should not be cleared before the buffer has been
flushed to disk; that is the rule that keeps things consistent. But the rule
is broken in the drop database scene.

suggestion:
before we drop the dirty buffers, we should request a checkpoint first;
after all, create database already requests a checkpoint twice.
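
A minimal sketch of the suggested ordering, using the function names and
checkpoint flags from the dropdb() call sequence above (an illustration of
the idea only, not a tested patch):

	dropDatabaseDependencies(db_id);

	/*
	 * Suggested change: flush this database's dirty pages with a forced
	 * checkpoint *before* discarding its buffers, instead of dropping the
	 * dirty buffers first and checkpointing afterwards.
	 */
	RequestCheckpoint(CHECKPOINT_IMMEDIATE | CHECKPOINT_FORCE | CHECKPOINT_WAIT);

	DropDatabaseBuffers(db_id);
	ForgetDatabaseFsyncRequests(db_id);

	remove_dbtablespaces(db_id);		/* writes the drop database WAL record */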


regards,

fanbin






[HACKERS] Re: redo failed in physical streaming replication while stopping the master server

2016-03-03 Thread lannis
Thanks for your reply.

If we only take replay into consideration, then yes, we cannot do this
header check until we have read the page first.

But thanks to the master's xlog generator we know the following: when the
XLOG insert buffer is advanced to a new page, the new page header is treated
as a short header at first, and then this condition is used to make it a
long header:

if ((NewPage->xlp_pageaddr.xrecoff % XLogSegSize) == 0)
{
	XLogLongPageHeader NewLongPage = (XLogLongPageHeader) NewPage;

	NewLongPage->xlp_sysid = ControlFile->system_identifier;
	NewLongPage->xlp_seg_size = XLogSegSize;
	NewLongPage->xlp_xlog_blcksz = XLOG_BLCKSZ;
	NewPage->xlp_info |= XLP_LONG_HEADER;

	Insert->currpos = ((char *) NewPage) + SizeOfXLogLongPHD;
}

So in the replay scenario, before we read the page from the WAL segment
file, can we use the special RecPtr that points to the next page header's
address to predict whether that page header is long or short?
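
In other words, the header size can be derived from the page address alone,
because only the first page of each segment gets the long header. A minimal
sketch in terms of the 9.2 macros (PredictedPageHeaderSize is a hypothetical
helper of mine, not an existing backend function):

#include "postgres.h"
#include "access/xlog_internal.h"

/*
 * Predict the size of the page header at a given page address without
 * reading the page: the first page of each WAL segment carries the long
 * header (system id, segment size, block size), every other page carries
 * the short header.
 */
static Size
PredictedPageHeaderSize(XLogRecPtr pageaddr)
{
	if ((pageaddr.xrecoff % XLogSegSize) == 0)
		return SizeOfXLogLongPHD;

	return SizeOfXLogShortPHD;
}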

regards,

fanbin







[HACKERS] redo failed in physical streaming replication while stopping the master server

2016-03-01 Thread lannis
Hi all,

Issue:
I use hot-standby streaming replication on PostgreSQL 9.2.X.
After I shut down the master server in fast stop mode, I compared the xlog
dumps between the master and the slave and found that the shutdown
checkpoint had not been replicated to the slave.
I then checked the pg_log on the slave and found that the redo process had
failed with "record with zero length at %X/%X" during the master shutdown.
The startup process terminated the current walreceiver and tried to
reconnect to the master, but that failed because the master was shutting
down.
Theoretically, all the WAL records should be replicated to the slave when
the master is shut down in a normal mode.
I read the source code and found that when we read a record to replay, we
use the last EndRecPtr to locate the exact xlog page containing the next
record. If EndRecPtr points to the end of the last page and the free space
on that page is less than SizeOfXLogRecord, we align it to the next page.
There is a comment below that code:
[1]
		/*
		 * RecPtr is pointing to end+1 of the previous WAL record.  We must
		 * advance it if necessary to where the next record starts.  First,
		 * align to next page if no more records can fit on the current page.
		 */
		if (XLOG_BLCKSZ - (RecPtr->xrecoff % XLOG_BLCKSZ) < SizeOfXLogRecord)
			NextLogPage(*RecPtr);

		/* Check for crossing of xlog logid boundary */
		if (RecPtr->xrecoff >= XLogFileSize)
		{
			(RecPtr->xlogid)++;
			RecPtr->xrecoff = 0;
		}

		/*
		 * If at page start, we must skip over the page header.  But we can't
		 * do that until we've read in the page, since the header size is
		 * variable.
		 */
The scenario is this:
1. When we do the shutdown checkpoint, we first advance the xlog buffer,
then run CheckPointGuts, then log the checkpoint record.
2. The slave's walreceiver receives only the xlog page header of the next
page, because the shutdown checkpoint record has not been assembled yet.
3. During recovery, the slave requests the next record via XLogPageRead,
with an LSN pointing exactly at the next page boundary.
4. XLogPageRead uses this condition to confirm that the receiver has
received some records:
[2]
	/* See if we need to retrieve more data */
	if (readFile < 0 ||
		(readSource == XLOG_FROM_STREAM && !XLByteLT(*RecPtr, receivedUpto)))
Here RecPtr points to the page boundary [1], while receivedUpto points to
the end of the page header (step 2). So XLogPageRead thinks the receiver has
just received some records and returns the page to its caller (ReadRecord).
5. ReadRecord finds that the page header of this page is fine, but when it
tries to read the record it gets nothing... only a page header in the xlog
page...
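
To make step 4 concrete, here is a worked example with made-up addresses
(XLogRecPtr, XLByteLT and SizeOfXLogLongPHD are the 9.2 definitions; the
wrapper function is mine, purely for illustration):

#include "postgres.h"
#include "access/xlog_internal.h"

/*
 * Suppose the previous record ended exactly on a segment boundary and the
 * walreceiver has so far streamed only that page's (long) header.
 */
static bool
page_is_handed_back(void)
{
	XLogRecPtr	RecPtr       = { 0, 0x03000000 };	/* page boundary, per [1] */
	XLogRecPtr	receivedUpto = { 0, 0x03000000 + SizeOfXLogLongPHD };	/* header only, per step 2 */

	/*
	 * The streaming branch of the check in [2] only asks whether
	 * RecPtr < receivedUpto.  That is true here, so XLogPageRead hands the
	 * page back to ReadRecord even though it contains nothing but the page
	 * header, and ReadRecord then fails with "record with zero length".
	 */
	return XLByteLT(RecPtr, receivedUpto);		/* true */
}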

I think the problem is that we try to get an xlog page containing the
"record", but what we pass down should be a record position, not a page
boundary.

Can we use the current boundary RecPtr to calculate the true record position
on the next page? After all, we know whether the next page carries a long or
a short page header. I don't know why this has not been fixed in PostgreSQL
9.2, or even in 9.6devel.
Could this work?

if ((RecPtr->xrecoff % XLogSegSize) == 0)
	XLByteAdvance((*RecPtr), SizeOfXLogLongPHD);
else
	XLByteAdvance((*RecPtr), SizeOfXLogShortPHD);
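
For context, here is roughly where such a check would sit, spliced into the
RecPtr-advancing code quoted in [1]; the extra page-boundary guard is my own
addition, and this is only a sketch of the idea, not a tested patch.

		if (XLOG_BLCKSZ - (RecPtr->xrecoff % XLOG_BLCKSZ) < SizeOfXLogRecord)
			NextLogPage(*RecPtr);

		/* Check for crossing of xlog logid boundary */
		if (RecPtr->xrecoff >= XLogFileSize)
		{
			(RecPtr->xlogid)++;
			RecPtr->xrecoff = 0;
		}

		/*
		 * Proposed addition: if RecPtr now sits exactly on a page boundary,
		 * skip over the page header right away; whether it is a long or a
		 * short header can be predicted from the address alone.
		 */
		if ((RecPtr->xrecoff % XLOG_BLCKSZ) == 0)
		{
			if ((RecPtr->xrecoff % XLogSegSize) == 0)
				XLByteAdvance((*RecPtr), SizeOfXLogLongPHD);
			else
				XLByteAdvance((*RecPtr), SizeOfXLogShortPHD);
		}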

yours,
sincerely

fanbin



