On Tue, Jul 1, 2014 at 7:18 AM, Fujii Masao <masao.fu...@gmail.com> wrote:
> On Tue, Jul 1, 2014 at 5:51 AM, Heikki Linnakangas > <hlinnakan...@vmware.com> wrote: > > Thanks for the review! > > > > > > On 06/27/2014 09:12 AM, Michael Paquier wrote: > >> > >> Looking at this patch, it does not compile when assertions are enabled > >> because of this assertion in heapam.c: > >> Assert(recdata == data + datalen); > >> It seems like a thinko as removing it makes the code compile. > > > > > > Fixed. > > > > > >> Some comments about the code: > >> 1) In src/backend/access/transam/README: > >> 's/no further action is no required/no further action is required/g' > > > > > > Fixed. > > > > > >> 2) Description of XLogBeginInsert, XLogHasBlockRef, XLogGetBlockRef and > >> XLogGetBlockRefIds are missing in src/backend/access/transam/README. > > > > > > Added descriptions of XLogBeginInsert and XLogHasBlockRef. IÍ„'m not sure > > what to do with the latter two. They're not really intended for use by > redo > > routines, but for things like pg_xlogdump that want to extract > information > > about any record type. > > > > > >> 3) There are a couple of code blocks marked as FIXME and BROKEN. There > are > >> as well some NOT_USED blocks. > > > > > > Yeah. The FIXMEs and BROKEN blocks are all in rm_desc routines. They > mostly > > stand for code that used to print block numbers or relfilenodes, which > > doesn't make much sense to do in an ad hoc way in rmgr-specific desc > > routines anymore. Will need to go through them all an decide what's the > > important information to print for each record type. > > > > The few NOT_USED blocks are around code in redo routines that extract > some > > information from the WAL record, that isn't needed anymore. We could > remove > > the fields from the WAL records altogether, but might be useful to keep > for > > debugging purposes. > > > > > >> 4) Current patch crashes when running make check in > >> contrib/test_decoding. > >> Looking closer, wal_level = logical is broken: > >> =# show wal_level; > >> wal_level > >> ----------- > >> logical > >> (1 row) > >> =# create database foo; > >> The connection to the server was lost. Attempting reset: Failed. > > > > > > Fixed. > > > > > >> Looking even closer, log_heap_new_cid is broken: > > > > > > Fixed. > > > > > >> 6) XLogRegisterData creates a XLogRecData entry and appends it in one of > >> the static XLogRecData entries if buffer_id >= 0 or to > >> rdata_head/rdata_tail if buffer_id = -1 (for permanent data related to > the > >> WAL record). XLogRegisterXLogRecData does something similar, but with an > >> entry of XLogRecData already built. What do you think about removing > >> XLogRegisterXLogRecData and keeping only XLogRegisterData. This makes > the > >> interface simpler, and XLogRecData could be kept private in xlog.c. This > >> would need more work especially on gin side where I am seeing for > example > >> constructLeafRecompressWALData directly building a XLogRecData entry. > > > > > > Hmm. I considered that, but punted because I didn't want to rewrite all > the > > places that currently use XLogRecDatas in less straightforward ways. I > kept > > XLogRegisterXLogRecData as an escape hatch for those. > > > > But I bit the bullet now, and got rid of all the > XLogRegisterXLogRecData() > > calls. XLogRecData struct is now completely private to xlog.c. > > > > Another big change in the attached patch version is that > XLogRegisterData() > > now *copies* the data to a working area, instead of merely adding a > > XLogRecData entry to the chain. This simplifies things in xlog.c, now > that > > the XLogRecData concept is not visible to the rest of the system. This > is a > > subtle semantic change for the callers: you can no longer register a > struct > > first, and fill the fields afterwards. > > > > That adds a lot of memcpys to the critical path, which could be bad for > > performance. In practice, I think it's OK, or even a win, because a > typical > > WAL record payload is very small. But I haven't measured that. If it > becomes > > an issue, we could try to optimize, and link larger data blocks to the > rdata > > chain, but memcpy small small chunks of data. There are a few WAL record > > types that don't fit in a small statically allocated buffer, so I > provided a > > new function, XLogEnsureSpace(), that can be called before XLogBegin() to > > allocate a large-enough working area. > > > > These changes required changing a couple of places where WAL records are > > created: > > > > * ginPlaceToPage. This function has a strange split of responsibility > > between ginPlaceToPage and dataPlaceToPage/entryPlaceToPage. > ginPlaceToPage > > calls dataPlaceToPage or entryPlaceToPage, depending on what kind of a > tree > > it is, and dataPlaceToPage or entryPlaceToPage contributes some data to > the > > WAL record. Then ginPlaceToPage adds some more, and finally calls > > XLogInsert(). I simplified the SPLIT case so that we now just create full > > page images of both page halves. We were already logging all the data on > > both page halves, but we were doing it in a "smarter" way, knowing what > kind > > of pages they were. For example, for an entry tree page, we included an > > IndexTuple struct for every tuple on the page, but didn't include the > item > > pointers. A straight full page image takes more space, but not much, so I > > think in any real application it's going to be lost in noise. (again, > > haven't measured it though) > > > > > >> 8) What do you think about adding in the README a description about how > to > >> retrieve the block list modified by a WAL record, for a flow similar to > >> that: > >> record = XLogReadRecord(...); > >> nblocks = XLogGetBlockRefIds(record, blockids); > >> for (i = 0; i < nblocks; i++) > >> { > >> bkpb = XLogGetBlockRef(record, id, NULL); > >> // stuff > >> } > >> pg_xlogdump is a good example of what can be done as well. > > > > > > Yeah, that's probably in order. I don't actually like the way that works > > currently very much. Passing the char[256] blockids array to the function > > feels weird. Will have to think a bit harder about that.. > > > > > >> 9) In ginxlog.c, shouldn't this block of code use XLogGetPayload? > >> + char *payload = XLogRecGetData(record) + > >> sizeof(ginxlogInsert); > >> +#ifdef NOT_USED > >> leftChildBlkno = BlockIdGetBlockNumber((BlockId) > >> payload); > >> +#endif > > > > > > No. XLogRecGetData() is used to get data from the "main chunk", while > > XLogGetPayload() is used to get data associated with a buffer. This does > the > > former. > > > > > >> 10) Shouldn't the call to XLogBeginInsert come first here? > >> @@ -1646,7 +1647,16 @@ spgAddNodeAction(Relation index, SpGistState > >> *state, > >> { > >> XLogRecPtr recptr; > >> > >> - recptr = XLogInsert(RM_SPGIST_ID, > >> XLOG_SPGIST_ADD_NODE, rdata); > >> + XLogRegisterBuffer(0, saveCurrent.buffer, > >> REGBUF_STANDARD); /* orig page */ > >> + XLogRegisterBuffer(1, current->buffer, > >> REGBUF_STANDARD); /* new page */ > >> + if (xlrec.parentBlk == 2) > >> + XLogRegisterBuffer(2, parent->buffer, > >> REGBUF_STANDARD); > >> + > >> + XLogBeginInsert(); > >> + XLogRegisterData(-1, (char *) &xlrec, > >> sizeof(xlrec)); > >> + XLogRegisterData(-1, (char *) newInnerTuple, > >> newInnerTuple->size); > >> + > >> + recptr = XLogInsert(RM_SPGIST_ID, > >> XLOG_SPGIST_ADD_NODE); > > > > > > Yeah. Apparently we have no regression test coverage of this codepath, > or it > > would've triggered an assertion. Fixed now, just by looking at the code, > but > > there's still no test coverage of this so it's likely still broken in > other > > ways. Before this is committed, I think we'll need to go through the > > coverage report and make sure all the XLogInsert() codepaths (and their > redo > > routines) are covered. > > > > > >> 11) Any reason for raising a PANIC here? > >> @@ -1126,7 +1126,7 @@ LWLockRelease(LWLock *l) > >> break; > >> } > >> if (i < 0) > >> - elog(ERROR, "lock %s %d is not held", T_NAME(l), > T_ID(l)); > >> + elog(PANIC, "lock %s %d is not held", T_NAME(l), > T_ID(l)); > >> num_held_lwlocks--; > >> for (; i < num_held_lwlocks; i++) > > > > > > No, leftover from debugging. Fixed. > > > > > >> 14) SPgist redo is crashing (during make installcheck run on a master, > >> redo > >> done on a standby at recovery). > >> (lldb) bt > >> * thread #1: tid = 0x0000, 0x00007fff8a3c2866 > >> libsystem_kernel.dylib`__pthread_kill + 10, stop reason = signal SIGSTOP > >> * frame #0: 0x00007fff8a3c2866 libsystem_kernel.dylib`__pthread_kill > + > >> 10 > >> frame #1: 0x00007fff8d37235c libsystem_pthread.dylib`pthread_kill + > >> 92 > >> frame #2: 0x00007fff9369db1a libsystem_c.dylib`abort + 125 > >> frame #3: 0x000000010ba8d819 > >> postgres`ExceptionalCondition(conditionName=0x000000010bb21ea9, > >> errorType=0x000000010baf019f, fileName=0x000000010bb212dd, > lineNumber=65) > >> + > >> 137 at assert.c:54 > >> frame #4: 0x000000010b5c3a50 > >> postgres`addOrReplaceTuple(page=0x000000010bfee980, > >> tuple=0x00007ffd41822902, size=770059, offset=18) + 688 at spgxlog.c:65 > >> frame #5: 0x000000010b5c031f > postgres`spgRedoMoveLeafs(lsn=71708608, > >> record=0x00007ffd41822800) + 719 at spgxlog.c:263 > >> frame #6: 0x000000010b5bf492 postgres`spg_redo(lsn=71708608, > >> record=0x00007ffd41822800) + 402 at spgxlog.c:988 > >> frame #7: 0x000000010b5e774a postgres`StartupXLOG + 8474 at > >> xlog.c:6780 > >> frame #8: 0x000000010b86a5fe postgres`StartupProcessMain + 430 at > >> startup.c:224 > >> frame #9: 0x000000010b600409 postgres`AuxiliaryProcessMain(argc=2, > >> argv=0x00007fff546ea190) + 1897 at bootstrap.c:416 > >> frame #10: 0x000000010b864998 > >> postgres`StartChildProcess(type=StartupProcess) + 328 at > postmaster.c:5090 > >> frame #11: 0x000000010b862aa9 postgres`PostmasterMain(argc=3, > >> argv=0x00007ffd40c04470) + 5401 at postmaster.c:1212 > >> frame #12: 0x000000010b7a9335 postgres`main(argc=3, > >> argv=0x00007ffd40c04470) + 773 at main.c:227 > > > > > > Fixed. > > > > > >> Also, I wanted to run the WAL replay facility to compare before and > after > >> buffer images, but code crashed before... I am still planning to do so > >> once > >> this code gets more stable though. > > > > > > I did that earlier, and did it now again. I got a single difference from > a > > sp-gist SPLIT_TUPLE operation. I didn't dig deeper right now, will have > to > > investigate it later. > > With the patch, when I ran the simple test which generates lots of > WAL, I got PANIC error. > > =# create extension pgcrypto ; > CREATE EXTENSION > =# create extension pg_trgm ; > CREATE EXTENSION > =# create table hoge (col1 text); > CREATE TABLE > =# create index hogeidx on hoge using gin (col1 gin_trgm_ops); > CREATE INDEX > =# insert into hoge select gen_random_uuid() from > generate_series(1,10000000); > PANIC: too many registered buffers > PANIC: too many registered buffers > The connection to the server was lost. Attempting reset: Failed. > Yep, this is easily reproducible. I imagine that a call to XLogEnsureRecordSpace before START_CRIT_SECTION() is missing in the gin code path. -- Michael