----------------------------------------
> From: st...@mit.edu
> Date: Mon, 23 Jun 2014 16:04:50 -0700
> Subject: Re: Extended Prefetching using Asynchronous IO - proposal and patch
> To: johnlu...@hotmail.com
> CC: klaussfre...@gmail.com; hlinnakan...@vmware.com;
> pgsql-hackers@postgresql.org
>
> On Mon, Jun 23, 2014 at 2:43 PM, John Lumby <johnlu...@hotmail.com> wrote:
>> It is when some *other* backend gets there first with the ReadBuffer that
>> things are a bit trickier. The current version of the patch did polling for
>> that case
>> but that drew criticism, and so an imminent new version of the patch
>> uses the sigevent mechanism. And there are other ways still.
>
> I'm a bit puzzled by this though. Postgres *already* has code for this
> case. When you call ReadBuffer you set the bits on the buffer

Good question. Let me explain.

Yes, postgresql has code for the case where a backend is inside a
synchronous read() or write(), performed from a ReadBuffer(), and some
other backend wants that buffer. Asynchronous I/O, however, is initiated
not from ReadBuffer but from PrefetchBuffer, and performs its aio_read
into an allocated, pinned postgresql buffer.

This is entirely different from the synchronous I/O case. Why? Because
the issuer of the aio_read (the "originator") is unaware of this buffer
pinned on its behalf, and is then free to do any other reading or writing
it wishes, such as more prefetching or any other operation. Furthermore,
it may *never* issue a ReadBuffer for the block which it prefetched.

Therefore a new bit, BM_AIO_IN_PROGRESS, is required in the buf_header
to track this aio operation until completion.

I would encourage you to read the new
postgresql-prefetching-asyncio.README in the patch file, where this is
explained in greater detail.

> indicating I/O is in progress. If another backend does ReadBuffer for
> the same block they'll get the same buffer and then wait until the
> first backend's I/O completes. ReadBuffer goes through some hoops to
> handle this (and all the corner cases such as the other backend's I/O
> completing and the buffer being reused for another block before the
> first backend reawakens). It would be a shame to reinvent the wheel.

No re-invention! Actually, some effort has been made to use the
existing functions in bufmgr.c as much as possible rather than
rewriting them.

>
> The problem with using the Buffers I/O in progress bit is that the I/O
> might complete while the other backend is busy doing stuff. As long as
> you can handle the I/O completion promptly -- either in callback or
> thread or signal handler then that wouldn't matter. But I'm not clear
> that any of those will work reliably.

Both approaches (polling and sigevent) work reliably, but the criticism
was that backend B polling an aiocb of an aio issued by backend A is not
documented as being supported (although it happens to work), hence the
proposed change to use sigevent.

By the way, on the "will it actually work though?" question which
several folks have raised, I should mention that this patch has been in
semi-production use for almost 2 years now, in different stages of
completion, on all postgresql releases from 9.1.4 to 9.5 devel. I would
guess it has had around 500 hours of operation by now. I'm sure there
are bugs still to be found, but I am confident it is fundamentally sound.

>
> --
> greg
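
For readers unfamiliar with POSIX AIO, here is a minimal single-process sketch in C of the two completion styles discussed in this thread: checking an aiocb with aio_error(), and being notified through the sigevent mechanism. It is not code from the patch. The names start_prefetch, finish_prefetch, on_aio_complete, prefetch_demo and the aio_in_progress flag are hypothetical, the flag only stands in for the idea of a BM_AIO_IN_PROGRESS bit, and the choice of SIGEV_THREAD is an assumption (the thread does not say which notification type the patch uses). The cross-backend case, where backend B waits on an aio issued by backend A, is deliberately not shown.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <aio.h>
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical stand-in for an "AIO in progress" bit on one buffer. */
static atomic_int aio_in_progress;

/* Completion callback: with SIGEV_THREAD the C library runs this in a
 * separate thread once the request has finished (successfully or not). */
static void on_aio_complete(union sigval sv)
{
    (void) sv;
    atomic_store(&aio_in_progress, 0);
}

/* Queue an asynchronous read of up to len bytes from offset 0 of fd.
 * Returns 0 if the request was queued, -1 otherwise. */
static int start_prefetch(struct aiocb *cb, int fd, void *buf, size_t len)
{
    memset(cb, 0, sizeof *cb);
    cb->aio_fildes = fd;
    cb->aio_buf = buf;
    cb->aio_nbytes = len;
    cb->aio_offset = 0;
    cb->aio_sigevent.sigev_notify = SIGEV_THREAD;
    cb->aio_sigevent.sigev_notify_function = on_aio_complete;

    atomic_store(&aio_in_progress, 1);  /* set before queuing the request */
    if (aio_read(cb) != 0) {
        atomic_store(&aio_in_progress, 0);
        return -1;
    }
    return 0;
}

/* Wait for the request to finish and for the callback to clear the flag.
 * Returns the number of bytes read, or -1 on error. */
static ssize_t finish_prefetch(struct aiocb *cb)
{
    const struct aiocb *const list[1] = { cb };
    struct timespec tick = { 0, 1000000 };  /* 1 ms */

    /* The issuing process may legitimately check its own aiocb. */
    while (aio_error(cb) == EINPROGRESS)
        aio_suspend(list, 1, NULL);

    /* The notification thread may run slightly after completion. */
    while (atomic_load(&aio_in_progress))
        nanosleep(&tick, NULL);

    if (aio_error(cb) != 0)
        return -1;
    return aio_return(cb);
}

/* Write a small temporary file, prefetch it, and return the bytes read. */
static ssize_t prefetch_demo(void)
{
    char path[] = "/tmp/aio_sketch_XXXXXX";
    char buf[16];
    struct aiocb cb;
    ssize_t n = -1;
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;
    if (write(fd, "hello", 5) == 5 &&
        start_prefetch(&cb, fd, buf, sizeof buf) == 0)
        n = finish_prefetch(&cb);
    assert(n != 5 || memcmp(buf, "hello", 5) == 0);
    close(fd);
    unlink(path);
    return n;
}
```

Calling prefetch_demo() from a main() exercises the whole path. On older glibc versions the program may need to be linked with -lrt and -lpthread. Between start_prefetch and finish_prefetch the caller is free to do unrelated work, which is the property that distinguishes the prefetch case from a synchronous read.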