[issue45693] `loop.create_server` with port=0 uses different ports for ipv4 & ipv6

2021-11-24 Thread Jim Crist-Harif


Jim Crist-Harif  added the comment:

Apologies for the delay here. I've pushed a documentation patch at 
https://github.com/python/cpython/pull/29760.

--

___
Python tracker 
<https://bugs.python.org/issue45693>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue45693] `loop.create_server` with port=0 uses different ports for ipv4 & ipv6

2021-11-24 Thread Jim Crist-Harif


Change by Jim Crist-Harif :


--
keywords: +patch
pull_requests: +27998
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/29760

___
Python tracker 
<https://bugs.python.org/issue45693>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue45883] reuse_address mistakenly removed from loop.create_server

2021-11-23 Thread Jim Crist-Harif


Change by Jim Crist-Harif :


--
keywords: +patch
pull_requests: +27970
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/29733

___
Python tracker 
<https://bugs.python.org/issue45883>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue45883] reuse_address mistakenly removed from loop.create_server

2021-11-23 Thread Jim Crist-Harif


New submission from Jim Crist-Harif :

In https://bugs.python.org/issue45129 the deprecated `reuse_address` parameter 
to `create_datagram_endpoint` was removed. This PR mistakenly removed this 
parameter from `create_server` as well (where it wasn't deprecated).

--
components: asyncio
messages: 406876
nosy: asvetlov, jcristharif, yselivanov
priority: normal
severity: normal
status: open
title: reuse_address mistakenly removed from loop.create_server
type: behavior
versions: Python 3.11

___
Python tracker 
<https://bugs.python.org/issue45883>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue45819] Avoid releasing the GIL in nonblocking socket operations

2021-11-16 Thread Jim Crist-Harif


Change by Jim Crist-Harif :


--
components: +asyncio -C API
nosy: +asvetlov, yselivanov

___
Python tracker 
<https://bugs.python.org/issue45819>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue45819] Avoid releasing the GIL in nonblocking socket operations

2021-11-16 Thread Jim Crist-Harif


Change by Jim Crist-Harif :


--
keywords: +patch
pull_requests: +27822
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/29579

___
Python tracker 
<https://bugs.python.org/issue45819>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue45819] Avoid releasing the GIL in nonblocking socket operations

2021-11-16 Thread Jim Crist-Harif


New submission from Jim Crist-Harif :

In https://bugs.python.org/issue7946 an issue with how the current GIL 
interacts with mixing IO and CPU bound work. Quoting this issue:

> when an I/O bound thread executes an I/O call,
> it always releases the GIL.  Since the GIL is released, a CPU bound
> thread is now free to acquire the GIL and run.  However, if the I/O
> call completes immediately (which is common), the I/O bound thread
> immediately stalls upon return from the system call.  To get the GIL
> back, it now has to go through the timeout process to force the
> CPU-bound thread to release the GIL again.

This issue can come up in any application where IO and CPU bound work are mixed 
(we've found it to be a cause of performance issues in https://dask.org for 
example). Fixing the general problem is tricky and likely requires changes to 
the GIL's internals, but in the specific case of mixing asyncio running in one 
thread and CPU work happening in background threads, there may be a simpler fix 
- don't release the GIL if we don't have to.

Asyncio relies on nonblocking socket operations, which by definition shouldn't 
block. As such, releasing the GIL shouldn't be needed for many operations 
(`send`, `recv`, ...) on `socket.socket` objects provided they're in 
nonblocking mode (as suggested in https://bugs.python.org/issue7946#msg99477). 
Likewise, dropping the GIL can be avoided when calling `select` on 
`selectors.BaseSelector` objects with a timeout of 0 (making it a non-blocking 
call).

I've made a patch 
(https://github.com/jcrist/cpython/tree/keep-gil-for-fast-syscalls) with these 
two changes, and run a benchmark (attached) to evaluate the effect of 
background threads with/without the patch. The benchmark starts an asyncio 
server in one process, and a number of clients in a separate process. A number 
of background threads that just spin are started in the server process 
(configurable by the `-t` flag, defaults to 0), then the server is loaded to 
measure the RPS.

Here are the results:

```
# Main branch
$ python bench.py -c1 -t0
Benchmark: clients = 1, msg-size = 100, background-threads = 0
16324.2 RPS
$ python bench.py -c1 -t1
Benchmark: clients = 1, msg-size = 100, background-threads = 1
Spinner spun 1.52e+07 cycles/second
97.6 RPS
$ python bench.py -c2 -t0
Benchmark: clients = 2, msg-size = 100, background-threads = 0
31308.0 RPS
$ python bench.py -c2 -t1
Benchmark: clients = 2, msg-size = 100, background-threads = 1
Spinner spun 1.52e+07 cycles/second
96.2 RPS
$ python bench.py -c10 -t0
Benchmark: clients = 10, msg-size = 100, background-threads = 0
47169.6 RPS
$ python bench.py -c10 -t1
Benchmark: clients = 10, msg-size = 100, background-threads = 1
Spinner spun 1.54e+07 cycles/second
95.4 RPS

# With this patch
$ ./python bench.py -c1 -t0
Benchmark: clients = 1, msg-size = 100, background-threads = 0
18201.8 RPS
$ ./python bench.py -c1 -t1
Benchmark: clients = 1, msg-size = 100, background-threads = 1
Spinner spun 9.03e+06 cycles/second
194.6 RPS
$ ./python bench.py -c2 -t0
Benchmark: clients = 2, msg-size = 100, background-threads = 0
34151.8 RPS
$ ./python bench.py -c2 -t1
Benchmark: clients = 2, msg-size = 100, background-threads = 1
Spinner spun 8.72e+06 cycles/second
729.6 RPS
$ ./python bench.py -c10 -t0
Benchmark: clients = 10, msg-size = 100, background-threads = 0
53666.6 RPS
$ ./python bench.py -c10 -t1
Benchmark: clients = 10, msg-size = 100, background-threads = 1
Spinner spun 5e+06 cycles/second
21838.2 RPS
```

A few comments on the results:

- On the main branch, any GIL contention sharply decreases the number of RPS an 
asyncio server can handle, regardless of the number of clients. This makes 
sense - any socket operation will release the GIL, and the server thread will 
have to wait to reacquire it (up to the switch interval), rinse and repeat. So 
if every request requires 1 recv and 1 send, a server with background GIL 
contention is stuck at a max of `1 / (2 * switchinterval)` or 200 RPS with 
default configuration. This effectively prioritizes the background thread over 
the IO thread, since the IO thread releases the GIL very frequently and the 
background thread never does.

- With the patch, we still see a performance degradation, but the degradation 
is less severe and improves with the number of clients. This is because with 
these changes the asyncio thread only releases the GIL when doing a blocking 
poll for new IO events (or when the switch interval is hit). With low load (1 
client), the IO thread becomes idle more frequently and releases the GIL. Under 
higher load though the event loop frequently still has work to do at the end of 
a cycle and issues a `selector.select` call with a 0 timeout (nonblocking), 
avoiding releasing the GIL at all during that loop (note the nonlinear effect 
of adding more clients). Since the IO thread still releases the GIL sometimes, 
the background thread still holds the GIL a larger percen

[issue45693] `loop.create_server` with port=0 uses different ports for ipv4 & ipv6

2021-11-03 Thread Jim Crist-Harif


Jim Crist-Harif  added the comment:

> Is tornado the only example or you are aware of other libraries with such 
> behavior?

A quick survey of other language network stacks didn't turn anything up, *But* 
I also didn't find any implementations (other than asyncio & tornado) that bind 
multiple sockets with a single api call (as `create_server` does).

I think part of the issue here is that dual IPV6 & IPV4 support is 
intentionally disabled in asyncio (and tornado), so two sockets are needed (one 
to support each interface). Other TCP implementations (e.g. both go and rust) 
don't disable this, so one listener == one socket. This makes comparing API 
designs across stacks harder - with e.g. Go it's straightforward to listen on a 
random port on IPV4 & IPV6 with a single TCPListener, since both can be handled 
by a single socket. Since this is disabled (by default) in asyncio we end up 
using 2 sockets and run into the issue described above.

Also note that this issue will trigger for any address that resolves to 
multiple interfaces (not just `host=""`). For example, on osx `localhost` will 
resolve to `::1` and `127.0.0.1` by default, meaning that the following fairly 
straightforward asyncio code has a bug in it:

```python
# Start a server on localhost with a random port
server = await loop.create_server(
EchoServerProtocol,
host="localhost",
port=0
)

# Retrieve and log the port
port = server.sockets[0].getsockname()[1]
print(f"listening at tcp://localhost:{port}")
```

As written, this looks correct enough, but on systems where localhost resolves 
to multiple interfaces this will accidentally listen on multiple ports (instead 
of one). This can be fixed with some additional logic external to asyncio, but 
it makes for a much less straightforward asyncio example.

--

___
Python tracker 
<https://bugs.python.org/issue45693>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue45693] `loop.create_server` with port=0 uses different ports for ipv4 & ipv6

2021-11-03 Thread Jim Crist-Harif


Jim Crist-Harif  added the comment:

If you decline that a change is needed here, at the very least the current 
behavior of `port=0` should be documented. I'd be happy to push up a fix if so.

--

___
Python tracker 
<https://bugs.python.org/issue45693>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue45693] `loop.create_server` with port=0 uses different ports for ipv4 & ipv6

2021-11-03 Thread Jim Crist-Harif


Jim Crist-Harif  added the comment:

> I'm not aware of an OS API call that binds both IPv4 and IPv6 to the same 
> random port.

Sure, but `loop.create_server` is already higher-level than a single OS API 
call. 

By default `create_server` will already bind multiple sockets if `host=""`, 
`host=None`, or if `host` is a list. I'm arguing that the current behavior with 
`port=0` in these situations is unexpected. Other libraries (like tornado) have 
come to the same conclusion, and have implemented logic to handle this that 
seems to work well in practice (though can fail, as you've pointed out).

Is there a use case where the current behavior (binding to multiple random 
ports) is desired?

--

___
Python tracker 
<https://bugs.python.org/issue45693>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue45693] `loop.create_server` with port=0 uses different ports for ipv4 & ipv6

2021-11-02 Thread Jim Crist-Harif


Jim Crist-Harif  added the comment:

Hmmm, I'd find that situation a bit surprising, but I suppose it could happen. 
Looks like tornado just errors, and that seems to work fine for them in 
practice 
(https://github.com/tornadoweb/tornado/blob/790715ae0f0a30b9ee830bfee75bb7fa4c4ec2f6/tornado/netutil.py#L153-L182).
 Binding IPv4 first might help reduce the chance of a collision, since I 
suspect there are more IPv4-only applications than IPv6-only.

--

___
Python tracker 
<https://bugs.python.org/issue45693>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue45693] `loop.create_server` with port=0 uses different ports for ipv4 & ipv6

2021-11-02 Thread Jim Crist-Harif


Jim Crist-Harif  added the comment:

> Is there an OS interface to ensure the same port on both stacks?

I don't think this is needed? Right now the code processes as:

- Expand host + port + family + flags into a list of one or more tuples of 
socket options 
(https://github.com/python/cpython/blob/401272e6e660445d6556d5cd4db88ed4267a50b3/Lib/asyncio/base_events.py#L1432-L1436)
- Iterate through this list, creating a new socket for each option tuple, and 
bind to the corresponding host + port 
(https://github.com/python/cpython/blob/401272e6e660445d6556d5cd4db88ed4267a50b3/Lib/asyncio/base_events.py#L1441-L1464)

In this case, each call to `socket.bind` gets a 0 port, thus binding to a new 
random open port number each time.

What I'm asking for is that if the user passes in `port=0`, then the port is 
extracted in the first call to `socket.bind` when looping and used for all 
subsequent `socket.bind` calls in the loop. This way we only ever choose a 
single random open port rather than 1 for each interface. FWIW, this is also 
what tornado does when `port=0` is provided.

--

___
Python tracker 
<https://bugs.python.org/issue45693>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue45693] `loop.create_server` with port=0 uses different ports for ipv4 & ipv6

2021-11-02 Thread Jim Crist-Harif


Change by Jim Crist-Harif :


--
versions: +Python 3.10, Python 3.8

___
Python tracker 
<https://bugs.python.org/issue45693>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue45693] `loop.create_server` with port=0 uses different ports for ipv4 & ipv6

2021-11-02 Thread Jim Crist-Harif


New submission from Jim Crist-Harif :

To create a new server with `loop.create_server` that listens on all interfaces 
and a random port, I'd expect passing in `host=""`, `port=0` to work (per the 
documentation). However, as written this results in 2 different ports being 
used - one for ipv4 and one for ipv6. Instead I'd expect a single random port 
be determined once, and reused for all other interfaces.

Running the example test code (attached) results in:

```
$ python test.py
listening on 0.0.0.0:38023
listening on :::40899
Traceback (most recent call last):
  File "/home/jcristharif/Code/distributed/test.py", line 36, in 
asyncio.run(main())
  File 
"/home/jcristharif/miniconda3/envs/dask/lib/python3.9/asyncio/runners.py", line 
44, in run
return loop.run_until_complete(main)
  File 
"/home/jcristharif/miniconda3/envs/dask/lib/python3.9/asyncio/base_events.py", 
line 642, in run_until_complete
return future.result()
  File "/home/jcristharif/Code/distributed/test.py", line 30, in main
assert len(ports) == 1, "Only 1 port expected!"
AssertionError: Only 1 port expected!
```

This behavior can be worked around by manually handling `port=0` outside of 
asyncio, but as it stands naive use can result in accidentally listening on 
multiple ports.

--
components: asyncio
files: test.py
messages: 405530
nosy: asvetlov, jcristharif, yselivanov
priority: normal
severity: normal
status: open
title: `loop.create_server` with port=0 uses different ports for ipv4 & ipv6
type: behavior
versions: Python 3.9
Added file: https://bugs.python.org/file50421/test.py

___
Python tracker 
<https://bugs.python.org/issue45693>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com