Re: Request for more intelligent local port allocation algorithm

2019-02-06 Thread David King via freebsd-net
Just to add to this: if anyone is doing some work on outbound TCP
connections, could they also have a look at this bug?
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=210726

Thanks!

___
freebsd-net@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"


Request for more intelligent local port allocation algorithm

2019-02-06 Thread Paul
Hi dev team,

It's no secret that when an application establishes a new TCP connection
without first binding the socket to a specific local address, the OS picks
the local address and port automatically. Unfortunately there is a catch:
the local port is allocated by different logic depending on whether (1) the
socket is bound before connect() or (2) it is not. When in_pcb_lport()
allocates the port, it checks whether candidate ports are free using
in_pcblookup_local(), and the behaviour is as follows:

(1) Bound, i.e. laddr is set to a specific address:
The port is considered occupied only if there is a PCB that matches both
laddr and lport.

(2) Not bound, i.e. laddr == INADDR_ANY:
The port is considered occupied if there is any PCB that matches lport
alone. This means that, for a port to be allocated, none of the available
local addresses may have it in use. That requirement is ridiculous, since
we are allocating only one PCB.
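
To make the two cases concrete, here is a minimal userland model of the
occupancy check. It is a sketch, not the kernel code: struct toy_pcb,
TOY_ADDR_ANY and toy_port_occupied() are names made up for illustration,
and the model ignores wildcard-bound PCBs, socket options such as
SO_REUSEADDR, and everything else in_pcblookup_local() really looks at.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a PCB: local address and port only. */
struct toy_pcb {
    uint32_t laddr;   /* local IPv4 address; 0 plays the role of INADDR_ANY */
    uint16_t lport;   /* local port */
};

#define TOY_ADDR_ANY 0u

/*
 * Returns true if lport would be treated as occupied for a socket whose
 * local address is laddr, following cases (1) and (2) described above.
 */
static bool
toy_port_occupied(const struct toy_pcb *pcbs, size_t n,
    uint32_t laddr, uint16_t lport)
{
    assert(pcbs != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (pcbs[i].lport != lport)
            continue;
        if (laddr == TOY_ADDR_ANY)
            return (true);    /* case (2): a match on lport alone is enough */
        if (pcbs[i].laddr == laddr)
            return (true);    /* case (1): laddr has to match as well */
    }
    return (false);
}
```

With a single PCB on address A and port P, the check reports P as occupied
for a socket bound to A and for an unbound socket, but as free for a socket
bound to a different address B.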

Looking through the code, it seems that (2) follows from the fact that
tcp_connect() first allocates the port, indirectly through the call to
in_pcbbind(), and only then selects the actual local address, also
indirectly, through the call to in_pcbconnect_setup(), which in turn calls
in_pcbladdr(). So, presumably, to guarantee that in_pcbconnect_setup() will
not fail, we make sure the port is free on every local address, no matter
which one of them in_pcbladdr() actually selects?

In the real world this creates serious problems for servers that have a lot
of outgoing connections, for example an nginx proxy with a lot of open
HTTP/2 connections. To avoid this limitation we have created workarounds in
the nginx config as well as in our own software, basically by having 50
local addresses and only following scenario (1). Alas, all of the built-in
Unix utilities, as well as other software, always follow scenario (2). As a
result, given a large number of connections, there may be points in time
when every port in the range is occupied on at least one local address.
Even worse is the outcome of that condition: while in_pcb_lport() walks the
range of possible port numbers, making a myriad of calls to
in_pcblookup_local(), some kind of important lock is held within the
kernel. So important that the whole system locks up. Even direct terminal
access is unavailable: it does not respond. The more connect() calls go
through scenario (2), the longer it takes the system to unfreeze. Under
some circumstances the only option is a hard reset.
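
For reference, the application-side workaround (scenario (1)) amounts to
calling bind() with a specific local address and port 0 before connect().
A minimal sketch follows; the helper name connect_from() and its error
handling are assumptions made for illustration, and it only handles IPv4.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: connect to raddr:rport from the given local address.
 * Returns a connected socket, or -1 with errno set.
 */
static int
connect_from(const char *laddr, const char *raddr, uint16_t rport)
{
    struct sockaddr_in local, remote;
    int s, saved;

    assert(laddr != NULL && raddr != NULL);

    memset(&local, 0, sizeof(local));
    local.sin_family = AF_INET;
    local.sin_port = htons(0);        /* 0: let the kernel pick the port */
    memset(&remote, 0, sizeof(remote));
    remote.sin_family = AF_INET;
    remote.sin_port = htons(rport);
    if (inet_pton(AF_INET, laddr, &local.sin_addr) != 1 ||
        inet_pton(AF_INET, raddr, &remote.sin_addr) != 1) {
        errno = EINVAL;
        return (-1);
    }

    s = socket(AF_INET, SOCK_STREAM, 0);
    if (s == -1)
        return (-1);

    /* Bind first, so the socket already has a specific laddr (scenario (1)). */
    if (bind(s, (struct sockaddr *)&local, sizeof(local)) == -1 ||
        connect(s, (struct sockaddr *)&remote, sizeof(remote)) == -1) {
        saved = errno;
        close(s);
        errno = saved;
        return (-1);
    }
    return (s);
}
```

The caller has to choose the local address itself, for example by rotating
over the configured addresses, which is exactly the part that stock
utilities do not do.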

Is it possible to update the code that connects via scenario (2) to use
more intelligent port allocation, for example by allocating the local
address and port at the same time?
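
As a rough illustration of what is being asked for, the sketch below models
port allocation in userland and compares the two orders. Everything in it
is hypothetical (struct sim_pcb, SIM_ADDR_ANY, sim_alloc_port()); it scans
the range linearly and is not how the kernel picks ports. The address
passed in stands for whatever in_pcbladdr() would have selected.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified PCB: local address and local port only. */
struct sim_pcb {
    uint32_t laddr;
    uint16_t lport;
};

#define SIM_ADDR_ANY 0u   /* stands in for INADDR_ANY */

/*
 * Scan [lo, hi] for the first port that is free for laddr.
 * With laddr == SIM_ADDR_ANY a port is skipped if any PCB uses it
 * (port chosen before the address); otherwise only PCBs on the same
 * laddr count (address chosen first, then the port).
 * Returns the port, or -1 if the whole range is taken.
 */
static int
sim_alloc_port(const struct sim_pcb *pcbs, size_t n, uint32_t laddr,
    uint16_t lo, uint16_t hi)
{
    assert(lo <= hi);

    for (unsigned int port = lo; port <= hi; port++) {
        int taken = 0;

        for (size_t i = 0; i < n && !taken; i++) {
            if (pcbs[i].lport == port &&
                (laddr == SIM_ADDR_ANY || pcbs[i].laddr == laddr))
                taken = 1;
        }
        if (!taken)
            return ((int)port);
    }
    return (-1);
}
```

If address A holds every port in a three-port range, the call with
SIM_ADDR_ANY finds nothing, while the call with a second address B gets the
first port of the range, even though only one PCB is being created in both
cases.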
