Hi Willy,

Thanks for your comments.

1. About 'retries': I am not sure it covers connect() failing synchronously
on the local system (as opposed to getting a timeout/refused via the
callback). The documentation on retries says:

"   <value>   is the number of times a connection attempt should be retried
on
              a server when a connection either is refused or times out. The
              default value is 3.
"

Neither of those two conditions matches our use case. My understanding was
that retries happen from the callback handler. Also, I am not sure there is
any way to circumvent the "1 second" gap between retries.
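
For reference, the way I read the suggestion would look roughly like this
(backend name and server address below are made up for illustration):

    backend be_forward
        # Retry refused/timed-out connection attempts more times.
        # As I read it, each retry happens from the callback path with
        # a 1 second gap between attempts, so I am not sure it covers
        # connect() failing synchronously with EADDRNOTAVAIL.
        retries 10
        server out1 203.0.113.10:443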

2. For nolinger: the documentation recommends against it, and I also wonder
whether data could be lost if the socket is not lingered for some time beyond
the FIN packet that the remote server sent when doing its close() (delayed
data packets, etc.).
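
For completeness, the change as I understand it would be (same made-up
backend as above):

    backend be_forward
        # Close the server-side connection with an RST instead of a FIN,
        # freeing the local port immediately (backend only, per your note).
        option nolinger
        server out1 203.0.113.10:443

but the possible loss of delayed data is what worries me there.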

3. Ports: each HAProxy process is actually limited to 400 ports towards a
single backend, and there are many haproxy processes on this and other
servers. The ports are split per process and per system: e.g. system1 has 'n'
processes, each with its own port range, while system2 has 'n' processes with
a completely different port range. For infra reasons, we are restricting the
total port range. The unique port ranges for the different haproxy processes
on the same system are there to avoid two processes picking the same port
(the first port number in the range) and failing in connect() when connecting
to the same remote server. I hope that explains it clearly.
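
As an illustration of the split (addresses and port numbers below are made
up), each process gets its own 400-port slice in its backend:

    # system1, haproxy process 1
    backend be_forward
        source 10.0.0.5:30000-30399
        server out1 203.0.113.10:443

    # system1, haproxy process 2 would use a different slice,
    # e.g. 30400-30799, and system2 a completely different range.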

Thanks,
- Krishna


On Thu, Mar 9, 2017 at 12:19 PM, Willy Tarreau <w...@1wt.eu> wrote:

> Hi Krishna,
>
> On Thu, Mar 09, 2017 at 12:03:19PM +0530, Krishna Kumar (Engineering)
> wrote:
> > Hi Willy,
> >
> > We use HAProxy as a Forward Proxy (I know this is not the intended
> > application for HAProxy) to access outside world from within the DC, and
> > this requires setting a source port range for return traffic to reach the
> > correct
> > box from which a connection was established. On our production boxes, we
> > see around 500 "no free ports" errors per day, but this could increase to
> > about 120K errors during big sale events. The reason for this is due to
> > connect getting a EADDRNOTAVAIL error, since an earlier closed socket
> > may be in last-ack state, as it may take some time for the remote server
> to
> > send the final ack.
> >
> > The attached patch reduces the number of errors by attempting more ports,
> > if they are available.
> >
> > Please review, and let me know if this sounds reasonable to implement.
>
> Well, while the patch looks clean I'm really not convinced it's the correct
> approach. Normally you should simply be using the "retries" parameter to
> increase the amount of connect retries. There's nothing wrong with setting
> it to a really high value if needed. Doesn't it work in your case?
>
> Also a few other points:
>   - when the remote server sends the FIN with the last segment, your
>     connection ends up in CLOSE_WAIT state. Haproxy then closes as
>     well, sending a FIN and your socket ends up in LAST_ACK waiting
>     for the server to respond. You may instead ask haproxy to close
>     with an RST by setting "option nolinger" in the backend. The port
>     will then always be free locally. The side effect is that if the
>     RST is lost, the SYN of a new outgoing connection may get an ACK
>     instead of a SYN-ACK as a reply and will respond to it with an
>     RST and try again. This will result in all connections working,
>     some taking slightly longer (typically 1 second).
>
>   - 500 outgoing ports is a very low value. You should keep in mind
>     that nowadays most servers use 60 seconds FIN_WAIT/TIME_WAIT
>     delays (the remote server remains in FIN_WAIT1 while waiting for
>     your ACK, then enters TIME_WAIT when receiving your FIN). So with
>     only 500 ports, you can *safely* support only 500/60 = 8 connections
>     per second. Fortunately in practice it doesn't work like this
>     since most of the time connections are correctly closed. But if
>     you start to enter big trouble, you need to understand that you
>     can very quickly reach some limits. And 500 outgoing ports means
>     you don't expect to support more than 500 concurrent conns per
>     proxy, which seems quite low.
>
> Thus normally what you're experiencing should only be dealt with
> using configuration:
>   - increase retries setting
>   - possibly enable option nolinger (backend only, never on a frontend)
>   - try to increase the available source port ranges.
>
> Regards,
> Willy
>
