On 05/30/2014 12:53 AM, John Lumby wrote:
Date: Thu, 29 May 2014 18:00:28 -0300
Subject: Re: [HACKERS] Extended Prefetching using Asynchronous IO - proposal
and patch
From: klaussfre...@gmail.com
To: hlinnakan...@vmware.com
CC: johnlu...@hotmail.com; pgsql-hackers@postgresql.org
On Thu, May 29, 2014 at 5:39 PM, Heikki Linnakangas
<hlinnakan...@vmware.com> wrote:
On 05/29/2014 11:34 PM, Claudio Freire wrote:
On Thu, May 29, 2014 at 5:23 PM, Heikki Linnakangas
<hlinnakan...@vmware.com> wrote:
On 05/29/2014 04:12 PM, John Lumby wrote:
On 05/28/2014 11:52 PM, John Lumby wrote:
The patch seems to assume that you can put the aiocb struct in shared
memory, initiate an asynchronous I/O request from one process, and wait
for its completion from another process. I'm pretty surprised if that
works on any platform.
It works on linux. Actually this ability allows the asyncio
implementation to
reduce complexity in one respect (yes I know it looks complex enough) :
it makes waiting for completion of an in-progress IO simpler than for
the existing synchronous IO case,. since librt takes care of the
waiting.
specifically, no need for extra wait-for-io control blocks
such as in bufmgr's WaitIO()
[checks]. No, it doesn't work. See attached test program.
Thanks for checking and thanks for coming up with that test program.
However, yes, it really does work -- always (on linux).
Your test program is doing things in the wrong order -
it calls aio_suspend *before* aio_error.
However, the rule is, call aio_suspend *after* aio_error
and *only* if aio_error returns EINPROGRESS.
I see no such a rule in the man pages of any of the functions involved.
And it wouldn't matter anyway; the behavior is exactly the same if you
aio_error() first.
See the code changes to fd.c function FileCompleteaio()
to see how we have done it. And I am attaching corrected version
of your test program which runs just fine.
As Claudio mentioned earlier, the way FileCompleteaio() uses aio_suspend
is just a complicated way of polling. You might as well replace the
aio_suspend() calls with pg_usleep().
It kinda seems to work sometimes, because of the way it's implemented in
glibc. The aiocb struct has a field for the result value and errno, and
when
the I/O is finished, the worker thread fills them in. aio_error() and
aio_return() just return the values of those fields, so calling
aio_error()
or aio_return() do in fact happen to work from a different process.
aio_suspend(), however, is implemented by sleeping on a process-local
mutex,
which does not work from a different process.
Even if it worked on Linux today, it would be a bad idea to rely on it
from
a portability point of view. No, the only sane way to make this work is
that
the process that initiates an I/O request is responsible for completing
it.
If another process needs to wait for an async I/O to complete, we must
use
some other means to do the waiting. Like the io_in_progress_lock that we
already have, for the same purpose.
But calls to it are timeouted by 10us, effectively turning the thing
into polling mode.
We don't want polling... And even if we did, calling aio_suspend() in a way
that's known to be broken, in a loop, is a pretty crappy way of polling.
Well, as mentioned earlier, it is not broken. Whether it is efficient I
am not sure.
I have looked at the mutex in aio_suspend that you mentioned and I am not
quite convinced that, if caller is not the original aio_read process,
it renders the suspend() into an instant timeout. I will see if I can
verify that.
I don't see the point of pursuing this design further. Surely we don't
want to use polling here, and you're relying on undefined behavior
anyway. I'm pretty sure aio_return/aio_error won't work from a different
process on all platforms, even if it happens to work on Linux. Even on
Linux, it might stop working if the underlying implementation changes
from the glibc pthread emulation to something kernel-based.
- Heikki
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers