Thanks for the replies and thoughts.

On 08/19/14 18:27, Heikki Linnakangas wrote:
> On 08/20/2014 12:17 AM, John Lumby wrote:
>> I am attaching a new version of the patch for consideration in the current commit fest.

> Thanks for working on this!

>> Relative to the version I submitted on 25 June in bay175-w412ff89303686022a9f16aa3...@phx.gbl, the method for handling aio completion using sigevent has been re-written to use
>> signals exclusively rather than a composite of signals and LWlocks,
>> and this has fixed the problem I mentioned before with the LWlock method.

> ISTM the patch is still allocating stuff in shared memory that really doesn't belong there. Namely, the BufferAiocb structs. Or at least parts of it; there's also a waiter queue there which probably needs to live in shared memory, but the rest of it does not.

Actually, the reason the BufferAiocb (the postgresql block corresponding to the aio's aiocb) must be located in shared memory is that, as you said, it acts as the anchor for the waiter list.
See further comment below.


> At least BufAWaitAioCompletion is still calling aio_error() on an AIO request that might've been initiated by another backend. That's not OK.

Yes, you are right, and I agree with this one.
I will add an aio_error_return_code field in the BufferAiocb,
and only the originator will set this from the real aiocb.


> Please write the patch without atomic CAS operation. Just use a spinlock.

Umm, this is a new criticism, I think. I use CAS for things other than locking, such as adding to and removing from a shared queue. I suppose a spinlock on the entire queue could be used equivalently, but with more code (extra confusion) and worse performance (coarser serialization). What is your objection to using gcc's atomic ops? Portability?


> There's a patch in the commitfest to add support for that,

Sorry, support for what? There are already spinlocks in postgresql;
do you mean some new kind? Please point me at it with a hackers msgid or something.

> but it's not committed yet, and all those USE_AIO_ATOMIC_BUILTIN_COMP_SWAP ifdefs make the patch more difficult to read. Same with all the other #ifdefs; please just pick a method that works.

Ok, yes, the ifdefs are unpleasant. I will do something about that.
Ideally they would be entirely confined to header files, with only macro functions in C files; maybe I can do that. And eventually, when the dust has settled,
eliminate obsolete ifdef blocks altogether.

> Also, please split prefetching of regular index scans into a separate patch. It's orthogonal to doing async I/O;

Actually, not completely orthogonal; see next comment.

> we could prefetch regular index scans with posix_fadvise too, and AIO should be useful for prefetching other stuff.

On 08/19/14 19:10, Claudio Freire wrote:
> On Tue, Aug 19, 2014 at 7:27 PM, Heikki Linnakangas
> <hlinnakan...@vmware.com> wrote:
>> Also, please split prefetching of regular index scans into a separate patch. ...
> That patch already happened on the list, and it wasn't a win in many
> cases. I'm not sure it should be proposed independently of this one.
> Maybe a separate patch, but it should be considered dependent on this.
>
> I don't have an archive link at hand atm, but I could produce one later.
Several people have asked to split this patch into several smaller ones, and I
have thought about it. It would introduce some awkward dependencies.
E.g. splitting the scanner code (index, relation-heap) into a separate patch
from the aio code would mean that the scanner patch becomes dependent
on the aio patch. They are not quite orthogonal.

The reason is that the scanners call a new function, DiscardBuffer(blockid),
when aio is in use. We could get around that by providing a stub for that function
in the scanner patch, but then there is some risk of someone getting the
wrong version of that function in their build. It just adds yet more complexity
and breakage opportunities.


> - Heikki

One further comment concerning these BufferAiocb and aiocb control blocks
being in shared memory:

I explained above that the BufferAiocb must be in shared memory for wait/post.
The aiocb does not necessarily have to be in shared memory,
but since there is a one-to-one relationship between BufferAiocb and aiocb,
it makes the code much simpler, and the operation much more efficient,
if the aiocb is embedded in the BufferAiocb as I have it now.
E.g. if the aiocb were in private-process memory, then an additional allocation
scheme would be needed (a fixed number? palloc()'ing extra ones as needed? ...),
which adds complexity, and probably some inefficiency, since a shared pool is usually more efficient (it allows a higher maximum per process, etc.). There would also have to be some pointer de-referencing from BufferAiocb to aiocb, adding some (small) overhead.

I understood your objection to the use of shared memory as being that you don't want a non-originator to access the originator's aiocb using aio_xxx calls. With the one change I've said I will make above (putting the aio_error return code in the BufferAiocb), I will have achieved that requirement. I am hoping this answers your objections
concerning shared memory.
