On Sun, Oct 12, 2014 at 3:37 PM, Victor Stinner <victor.stin...@gmail.com>
wrote:

> 2014-10-12 21:55 GMT+02:00 Guido van Rossum <gu...@python.org>:
> > So you're only concerned about the streams example, right?
>
> Oh no sorry, my question was general. It's just that I discovered the
> issue when working on these examples for the documentation.
>

OK. Sorry for misunderstanding.


> > - writer.close() calls self._transport.close(), which is
> > _SelectorTransport.close()
> > - that removes the socket from the selector and schedules a call to
> > _call_connection_lost
>
> It only calls _call_connection_lost() immediately if the write buffer is
> empty.
>

Which is the case in the streams example. :-)


> Right now, it's not clear to me what "close" means, and it's not well
> documented. When I close a file descriptor, I expect that future read
> and write operations will fail. Usually, all operations are blocking,
> so I don't have to worry about ongoing operations.
>

Actually, closing a FD doesn't mean that future reads and writes will fail
-- it means that the caller promises not to use that FD any more. The FD
itself may be reused for other purposes. (It's different with IO objects,
they typically maintain state to enforce this promise, e.g. by setting the
FD to -1 or setting and testing an explicit closed flag.)

Also, when closing a socket, what actually happens depends on whether there
are other processes that still have the socket open, or whether there are
other FDs referencing the same underlying kernel data structures (e.g.
using dup() or dup2()). The close() call is not actually synchronous AFAIK,
but the kernel typically attempts to send any data it still has buffered,
as long as the remote side doesn't refuse it. (That's AFAIK for TCP
sockets; I'm not sure what happens with UDP sockets if there are outgoing
packets still buffered in the kernel.)


> In asyncio, it looks like closing a transport immediately stops reading,
> but writing is still possible. So it's possible to write into a closed
> transport. Is that correct? It looks like transport.write_eof() is the
> way to block *future* write operations.
>

That's not how it's supposed to be. You should never write to a transport
after closing it.

A TCP connection can be thought of as a *pair* of pipes, one in each
direction (incoming and outgoing). The Transport manages the reading end of
the incoming pipe, and the writing end of the outgoing pipe. Calling
write_eof() closes the writing end of the outgoing pipe (but leaves data
that's still waiting to be transferred to the other end of that pipe in the
buffers), while still allowing more data to flow through the incoming pipe;
data_received() will be called for incoming data. Calling close() makes a
similar promise about the outgoing pipe but forcefully closes the incoming
pipe, promising data_received() will not be called any more. (It is
possible that the process at the other end of the pipes keeps writing; in
that case it will eventually get an error. But if all incoming data has
been received on our end, the other process will receive all outgoing data
that's still buffered or in transit.)
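
Here's a rough sketch of that shape at the protocol level (the
QueryProtocol name and the request argument are placeholders for
illustration, not taken from any real example):

    import asyncio

    class QueryProtocol(asyncio.Protocol):
        """Send a request, half-close the outgoing pipe, keep reading."""

        def __init__(self, request, loop):
            self.request = request
            self.loop = loop
            self.reply = b''

        def connection_made(self, transport):
            transport.write(self.request)
            # Close only the outgoing pipe; the incoming pipe stays open.
            transport.write_eof()

        def data_received(self, data):
            # Still called after write_eof() -- we only closed our writing end.
            self.reply += data

        def connection_lost(self, exc):
            # The other side closed its writing end (or an error occurred).
            self.loop.stop()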

There's an additional wrinkle in that certain transports are unable to
support write_eof() and eof_received() -- in particular this is the case
for TLS. When using such a transport, you have to use application-level
signalling to indicate the end of the data -- for example, HTTP can use the
Content-Length header for this purpose, or "Transfer-Encoding: chunked".


> My question is: how can I ensure a connection is completely closed?
> Buffer flushed, transport closed, etc.? Or said differently, what is
> the safest way to close a connection controlled by a pair of (reader,
> writer) streams?
>

In the latter case, call writer.close() after you have read all you want to
read.
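
Putting that together for the streams API, and honoring the
write_eof()-before-close() ordering discussed below, a sketch might look
like this (host, port and request are placeholders, and the server is
assumed to close its side when it is done):

    import asyncio

    @asyncio.coroutine
    def fetch(host, port, request):
        reader, writer = yield from asyncio.open_connection(host, port)
        writer.write(request)
        writer.write_eof()               # we won't send anything more
        data = yield from reader.read()  # read until the server closes its side
        writer.close()                   # close only after reading everything
        return data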

It looks like a problem with the examples is that they want to close the
event loop, and there are situations where closing the event loop prevents
some data from being sent (or received). But a similar thing can happen
with sockets: I believe that if a process sends a large amount of data and
then closes the socket, the close() call may return while the data is still
in a kernel buffer. If at that point the host crashes, that data will never
be seen by the recipient across the network. And this is why you must use
Content-Length, or Transfer-Encoding: chunked, or some other
application-level protocol that lets the receiving end determine from the
data alone that it has received the final byte -- never rely on
connection_lost() except to release resources.
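
For example, a toy length-prefixed framing over the streams API lets the
receiver know from the data alone when the last byte has arrived
(read_message is a hypothetical helper, not part of asyncio):

    @asyncio.coroutine
    def read_message(reader):
        # 4-byte big-endian length prefix, then exactly that many bytes.
        header = yield from reader.readexactly(4)
        length = int.from_bytes(header, 'big')
        return (yield from reader.readexactly(length))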


> Is it safer to call write_eof() before close()?
>

Yes. In fact you can't call write_eof() after close().


> Transport.close() documentation says "buffered data will be flushed
> asynchronously". Does it mean that close() must be followed by "yield
> from writer.drain()"?
>
> https://docs.python.org/dev/library/asyncio-protocol.html#asyncio.BaseTransport.close
>

No, that's not what it means. Beware, you're mixing levels here -- drain()
only applies to stream writers, but the docs you are quoting are for
Transports, which are a much lower-level abstraction.

The drain() call doesn't actually flush anything. It merely may block the
calling coroutine until the Transport's internal write buffer has drained
sufficiently (using two thresholds, see set_write_buffer_limits()), while
letting other tasks and callbacks continue. But if the write buffer is not
filled over the higher threshold, drain() won't block even though there are
unwritten bytes. If you close the stream you merely promise that you won't
be using the stream any more.
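
In other words, the typical pattern is just something like this (send_all
is a hypothetical helper, shown only as a sketch):

    @asyncio.coroutine
    def send_all(writer, chunks):
        for chunk in chunks:
            writer.write(chunk)
            # May suspend this coroutine while the transport's buffer is
            # above the high-water mark; if the buffer is below it, this
            # returns immediately even though unsent bytes may remain.
            yield from writer.drain()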

So what happens to the bytes you wrote? The stream doesn't have its own
write buffer -- all the buffering is done by the Transport. Closing the
stream calls close() on the Transport, but the transport doesn't then just
throw away its buffers -- it still has the socket registered with the
selector for writing, and it will attempt to write the buffered bytes to
the socket whenever the write handler is called. Once the last byte has
been written (as indicated by the send() return value), the transport (if
it has been closed) will call the protocol's connection_lost() callback,
with a None argument indicating all is well.
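
If you need to know when that callback fires, a protocol can expose it,
e.g. via a Future (a hypothetical sketch, not an existing helper):

    class NotifyOnClose(asyncio.Protocol):
        def __init__(self, loop):
            # Completed once the transport has written out its buffer and closed.
            self.closed = asyncio.Future(loop=loop)

        def connection_lost(self, exc):
            # exc is None when the connection closed cleanly after all
            # buffered data was handed to the kernel.
            self.closed.set_result(exc)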

However, the kernel may still have the bytes buffered, and my explanation
of kernel-level stuff above still applies. AFAIK even Twisted doesn't
attempt to wait until the kernel has received an ACK from the remote host
that the bytes have been received there -- you need an application-level
protocol if you need such assurance. (For example, if you are using HTTP
PUT or POST to mutate a remote resource, receiving the "200 OK" status
assures you that the remote side has received your bytes.(*))

If you want a little more assurance that your bytes have been sent off to
the network, without implementing an application-level protocol, you could
call transport.set_write_buffer_limits(0, 0) and then call writer.drain().
That will block until the Transport's write buffer is empty. But the other
caveats apply.
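
At the stream level that recipe looks roughly like this (close_after_flush
is a hypothetical helper; writer.transport is the underlying Transport, and
the kernel-level caveats above still apply):

    @asyncio.coroutine
    def close_after_flush(writer):
        # Force drain() to wait until the transport's write buffer is empty.
        writer.transport.set_write_buffer_limits(0, 0)
        yield from writer.drain()
        writer.close()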


> (I'm still asking for the general case, for example when I don't
> control all operations done on the reader nor the writer.)
>

Well, at some point if you don't have control you can't make strong
promises. The best you can do may be to explain how things work, which is
what I have tried in this message.

__________
(*) Even then you may not be out of the woods. Disk controllers have their
own buffers, which are hard to control even for kernel code. And calling
fsync() for every POST or PUT request may reduce your server's performance
to a crawl. And so on... So in the end all you are doing is controlling
probabilities. Such is life.

-- 
--Guido van Rossum (python.org/~guido)
