> hi mike-
>
>> Every once in a while, we see complaints that nfs mounts are
>> failing due to there being no more reserved ports available for
>> outbound rpc communication.  This often happens when using TCP
>> transports, because all outbound connections that are closed go
>> into a TIME_WAIT state.
>
> i know that steved has also been looking at this problem.  he found that
> user-land uses TCP connections with abandon and has put it on a diet.
> and, the client side should take care to avoid using reserved ports
> unless it is absolutely necessary.

Yup. I tested out SteveD's stuff and it certainly cut down on reserved
port usage.  However, I was still able to locally DoS my test client by
doing several thousand mounts within a minute or so.  The problem is
that talking to mountd still requires a reserved port :\
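To put rough numbers on that: if every connection needs its own reserved port and every connection the client closes holds that port in TIME_WAIT, then after an initial burst of at most one connection per port, the sustained rate is bounded by the pool size divided by the TIME_WAIT period. The sketch below is only that arithmetic; the pool of 512 ports (e.g. 512-1023) is an assumption, and the 2-minute TIME_WAIT is the figure used later in this thread.

```c
/* Rough sustained upper bound on new TCP connections per minute when each
 * connection needs its own reserved port and every connection closed by the
 * client keeps its port busy in TIME_WAIT.  Both inputs are assumptions
 * supplied by the caller. */
static int max_conns_per_minute(int reserved_ports, int time_wait_secs)
{
    return reserved_ports * 60 / time_wait_secs;
}
```

With an assumed 512-port pool and a 2-minute TIME_WAIT, max_conns_per_minute(512, 120) is 256: a first burst of 512 connections, then about 256 per minute, far short of several thousand mounts in a minute.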

>
>> For various reasons, avoiding the TIME_WAIT state using connect(fd,
>> {AF_UNSPEC}, ...) or doing SO_REUSEADDR may not be the safest way to
>> handle things.
>
> i have a patch for the kernel RPC client to use AF_UNSPEC to reuse the
> port number on connections.  before i head into the wilderness, what's
> unsafe about that?

It's less unsafe than SO_REUSEADDR; however, it completely ignores
TIME_WAIT, which has two purposes (according to W. R. Stevens):

- If the client performs the active close, TIME_WAIT lets the socket
resend the final ACK should the server retransmit its FIN (because the
first ACK was lost).  Otherwise the client would respond with an RST.

- It ensures that no new incarnation of the same four-tuple occurs within
2 * MSL, so that delayed packets from the first incarnation that are still
wandering the network cannot be taken as part of the new incarnation.

Avoiding TIME_WAIT altogether keeps TCP from doing a proper full-duplex
close and also allows old duplicate packets to screw up the state of a
new connection.
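For reference, the AF_UNSPEC approach under discussion looks roughly like the sketch below. It assumes Linux semantics as I understand them: connect() with an AF_UNSPEC address on a connected TCP socket aborts the connection (an RST rather than a FIN exchange), so no TIME_WAIT entry is created, and a local port that was bound explicitly stays bound, so reconnecting the same descriptor reuses it. That abortive close is exactly what bypasses the two protections listed above. The helper name is hypothetical.

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: drop the current association with
 * connect(AF_UNSPEC), then reconnect the same descriptor to peer.
 * Returns 0 on success, -1 with errno set on failure. */
static int tcp_reconnect(int fd, const struct sockaddr_in *peer)
{
    struct sockaddr unspec;

    memset(&unspec, 0, sizeof(unspec));
    unspec.sa_family = AF_UNSPEC;

    /* Abortive disconnect: on Linux this sends an RST and skips TIME_WAIT. */
    if (connect(fd, &unspec, sizeof(unspec)) < 0)
        return -1;

    /* An explicitly bound local port is still held, so it is reused here;
     * an automatically assigned one may be replaced by a new port. */
    return connect(fd, (const struct sockaddr *)peer, sizeof(*peer));
}
```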

>
>> rpcproxyd will create outbound connections and multiplex the
>> transports with any number of simultaneous clients.  There is no
>> support for re-binding your transport once created.  It will also
>> cache outbound server connections for 30 seconds after last use,
>> which greatly helps keep the number of ports used down in a
>> mount-storm situation.
>
> the typical RPC client-side connection timeout is 5 minutes.  any reason
> not to use that value instead of 30 seconds?

The timeouts used currently in rpcproxyd were chosen arbitrarily.

Which timeout is 5 minutes? (Save me the trouble of finding it myself.)

I figured 30 seconds for caching unused TCP connections sounded like a
good number: if the TIME_WAIT period is 2 minutes, the most TIME_WAIT
connections you can have to a given remote service at any time is 4
(120 s / 30 s).
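A minimal sketch of the 30-second idle policy and the bound above, assuming one cached connection per remote service that is closed only after sitting idle for the full timeout, so successive closes are at least that far apart. The struct, macro, and function names are hypothetical and not taken from rpcproxyd.

```c
#include <stdbool.h>
#include <time.h>

#define CACHE_IDLE_SECS 30   /* idle timeout; the value is arbitrary */

/* Hypothetical cache entry for one outbound server connection. */
struct cached_conn {
    int    fd;          /* connected socket, or -1 if none */
    time_t last_used;   /* refreshed whenever a request is forwarded */
};

/* True once the connection has been idle long enough to be closed. */
static bool conn_expired(const struct cached_conn *c, time_t now)
{
    return c->fd >= 0 && now - c->last_used >= CACHE_IDLE_SECS;
}

/* Most closed connections to one service that can be in TIME_WAIT at once
 * if closes are at least idle_secs apart (rounded up). */
static int max_time_wait_conns(int time_wait_secs, int idle_secs)
{
    return (time_wait_secs + idle_secs - 1) / idle_secs;
}
```

With the figures above, max_time_wait_conns(120, CACHE_IDLE_SECS) is 4; raising the idle timeout to the 5 minutes mentioned above would bring the bound down to 1.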

The timeouts could use a bit more thought (wishlist):
- TCP connect()s aren't timed out.  The proxy uses a non-blocking connect
(EINPROGRESS) but will wait on it indefinitely, which I think is bounded
by the kernel anyway.
- UDP retransmission occurs in the proxy itself and is currently
hardcoded to retry 5 times, once every two seconds.  I could trivially
have it take the actual timeout values from the client and use those
instead.

>
> you will also need a unique connection for each program/version number
> combination to a given server; ie you can't share a single socket among
> different program/version numbers (even though some implementations try
> to do this, it is a bad practice, in my opinion).
>

I know we discussed this last week; however, I was under the impression
that it was needed only for the case where you support re-binding the
transport.  I'm not up to speed on who the users of such a thing are (I'm
assuming NFSv4).

Also, AFAICT, the glibc sunrpc stuff will actually bind you to a program
even if the version doesn't exist.  This is used by rpcinfo [-u | -t] for
udpping / tcpping.
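If the proxy's connection cache followed the one-connection-per-program/version rule quoted above, its lookup key would need both numbers alongside the server address and transport. A minimal sketch; the struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cache key: one transport per (server, port, protocol,
 * program, version), so e.g. NFS version 2 and version 3 traffic to the
 * same server port never share a socket. */
struct conn_key {
    uint32_t addr;    /* server IPv4 address, network byte order */
    uint16_t port;    /* server port, network byte order */
    int      proto;   /* IPPROTO_TCP or IPPROTO_UDP */
    uint32_t prog;    /* RPC program number, e.g. 100003 for NFS */
    uint32_t vers;    /* RPC program version */
};

/* Field-by-field comparison; memcmp() could be fooled by struct padding. */
static bool conn_key_equal(const struct conn_key *a, const struct conn_key *b)
{
    return a->addr == b->addr && a->port == b->port &&
           a->proto == b->proto && a->prog == b->prog && a->vers == b->vers;
}
```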

> to support IPv6 you will need support for rpcbind versions 3 and 4; but
> that may be an issue outside of rpcproxyd.
>

Okay.  I'm not so familiar with RPCB, but it is just a PMAP on steroids,
right?

It wouldn't be needed for the basic, common use of rpcproxyd; however, it
would likely be required in cases where clntproxy_create has to do some
lookups (e.g. port == 0, maybe even addr == NULL).
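For comparison with PMAP: rpcbind versions 3 and 4 (RFC 1833) keep the program/version lookup but replace PMAP's protocol number and port with a netid and a "universal address" string, which is what lets them describe IPv6 endpoints. For IPv4 the universal address is the dotted quad followed by the port's high and low bytes, so port 2049 on 127.0.0.1 becomes "127.0.0.1.8.1". A sketch of that conversion; the function name is hypothetical.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>

/* Hypothetical helper: format an IPv4 endpoint as an rpcbind universal
 * address "h1.h2.h3.h4.p1.p2", where p1/p2 are the port's high/low bytes.
 * Returns the snprintf() result (negative, or >= len if truncated). */
static int ipv4_to_uaddr(const struct sockaddr_in *sin, char *buf, size_t len)
{
    char host[INET_ADDRSTRLEN];
    unsigned port = ntohs(sin->sin_port);

    if (inet_ntop(AF_INET, &sin->sin_addr, host, sizeof(host)) == NULL)
        return -1;
    return snprintf(buf, len, "%s.%u.%u", host, port >> 8, port & 0xffu);
}
```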

>> rpcproxyd is written as a single-threaded/no-signals/select-based
>> daemon.
>
> will using select be a scalability issue?  you might be better off with
> something like libevent or using epoll.
>

It may become a scalability issue, although I don't know what the typical
number of live descriptors is.  It was written with select because that's
what I know ;)

In any event, using epoll would likely be a good thing to do.
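For comparison, the core of an epoll-based loop (Linux-only) might look like the sketch below. Unlike select(), the interest set lives in the kernel and is not capped by FD_SETSIZE (commonly 1024), and epoll_wait() hands back only the ready descriptors, so the work per wakeup tracks activity rather than the number of live descriptors. The helper names are hypothetical.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: register fd for readability with an epoll set
 * created by epoll_create1(0). */
static int watch_readable(int epfd, int fd)
{
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = fd };
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* One pass of the event loop: wait up to timeout_ms and copy up to max
 * ready descriptors into ready[].  Returns the count, or -1 on error. */
static int wait_ready(int epfd, int timeout_ms, int *ready, int max)
{
    struct epoll_event events[64];

    if (max > 64)
        max = 64;
    int n = epoll_wait(epfd, events, max, timeout_ms);
    for (int i = 0; i < n; i++)
        ready[i] = events[i].data.fd;
    return n;
}

/* Hypothetical helper: stop watching fd and close it. */
static void unwatch_and_close(int epfd, int fd)
{
    epoll_ctl(epfd, EPOLL_CTL_DEL, fd, NULL);
    close(fd);
}
```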

Mike Waychison

_______________________________________________
autofs mailing list
[email protected]
http://linux.kernel.org/mailman/listinfo/autofs
