On 06/24/2014 04:29 PM, John Lumby wrote:
On Mon, Jun 23, 2014 at 2:43 PM, John Lumby <johnlu...@hotmail.com> wrote:
It is when some *other* backend gets there first with the ReadBuffer that
things are a bit trickier. The current version of the patch did polling for
that case
but that drew criticism, and so an imminent new version of the patch
uses the sigevent mechanism. And there are other ways still.
I'm a bit puzzled by this though. Postgres *already* has code for this
case. When you call ReadBuffer you set the in-progress bit on the buffer,
and any other backend that wants that buffer waits for the I/O to finish.
Good question. Let me explain.
Yes, postgresql has code for the case where a backend is inside a synchronous
read() or write(), performed from a ReadBuffer(), and some other backend
wants that buffer. But asynchronous io is initiated not from ReadBuffer
but from PrefetchBuffer, and performs its aio_read into an allocated,
pinned, postgresql buffer. This is entirely different from the synchronous
io case.
Why? Because the issuer of the aio_read (the "originator") is unaware
of this buffer pinned on its behalf, and is then free to do any other
reading or writing it wishes, such as more prefetching or any other
operation.
And furthermore, it may *never* issue a ReadBuffer for the block which it
prefetched.
I still don't see the difference. Once an asynchronous read is initiated
on the buffer, it can't be used for anything else until the read has
finished. This is exactly the same situation as with a synchronous read:
after read() is called, the buffer can't be used for anything else until
the call finishes.
In particular, consider the situation from another backend's point of
view. Looking from another backend (i.e. one that didn't initiate the
read), there's no difference between a synchronous and asynchronous
read. So why do we need a different IPC mechanism for the synchronous
and asynchronous cases? We don't.
I understand that *within the backend*, you need to somehow track the
I/O, and you'll need to treat synchronous and asynchronous I/Os
differently. But that's all within the same backend, and doesn't need to
involve the flags or locks in shared memory at all. The inter-process
communication doesn't need any changes.
The problem with using the buffer's I/O-in-progress bit is that the I/O
might complete while the other backend is busy doing stuff. As long as
you can handle the I/O completion promptly, whether in a callback, a
thread, or a signal handler, that wouldn't matter. But I'm not clear
that any of those will work reliably.
They do work reliably, but the criticism was that backend B polling
an aiocb of an aio issued by backend A is not documented as
being supported (although it happens to work), hence the proposed
change to use sigevent.
You didn't understand what Greg meant. You need to handle the completion
of the I/O in the same process that initiated it, by clearing the
in-progress bit of the buffer and releasing the I/O in-progress lwlock
on it. And you need to do that very quickly after the I/O has finished,
because there might be another backend waiting for the buffer and you
don't want him to wait longer than necessary.
The question is, if you receive the notification of the I/O completion
using a signal or a thread, is it safe to release the lwlock from the
signal handler or a separate thread?
By the way, on the "will it actually work though?" question which several
folks
have raised, I should mention that this patch has been in semi-production
use for almost 2 years now in different stages of completion on all postgresql
releases from 9.1.4 to 9.5 devel. I would guess it has had around
500 hours of operation by now. I'm sure there are bugs still to be
found but I am confident it is fundamentally sound.
Well, a committable version of this patch is going to look quite
different from the first version that you posted, so I don't put much
weight on how long you've tested the first version.
- Heikki
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers