On Fri, 2015-10-23 at 11:52 +0200, casper....@oracle.com wrote:
> > Ho-hum...  It could even be made lockless in fast path; the problems I see
> > are
> > 	* descriptor-to-file lookup becomes unsafe in a lot of locking
> > conditions.  Sure, most of that happens on the entry to some syscall, with
> > very light locking environment, but... auditing every sodding ioctl that
> > might be doing such lookups is an interesting exercise, and then there are
> > ->mount() instances doing the same thing.  And procfs accesses.  Probably
> > nothing impossible to deal with, but nothing pleasant either.
>
> In the Solaris kernel code, the ioctl code is generally not handed a file
> descriptor but instead a file pointer (i.e., the lookup is done early in
> the system call).
>
> In those specific cases where a system call needs to convert a file
> descriptor to a file pointer, there is only one routine which can be used.
>
> > 	* memory footprint.  In case of Linux on amd64 or sparc64,
> > main()
> > {
> > 	int i;
> >
> > 	for (i = 0; i < 1<<24; dup2(0, i++))	// 16M descriptors
> > 		;
> > }
> > will chew 132Mb of kernel data (16M pointers + 32M bits, assuming
> > sufficient ulimit -n, of course).  How much will Solaris eat on the same?
>
> Yeah, that is a large amount of memory.  Of course, the table is only
> sized when it is extended, and there is a reason why there is a limit on
> file descriptors.  But we're using more data per file descriptor entry.
>
> > 	* related to the above - how much cacheline sharing will that involve?
> > These per-descriptor use counts are a bitch to pack, and giving each a
> > cacheline of its own... <shudder>
>
> As I said, we do actually use a lock, and yes, that means that you really
> want to have a single cache line for each and every entry.  It does make
> it easy to have non-racy file description updates.  You certainly do not
> want false sharing when there is a lot of contention.
>
> Other data is used to make sure that it only takes O(log(n)) to find the
> lowest available file descriptor entry.  (Where n, I think, is the
> returned descriptor.)

Yet another POSIX deficiency.  When a server deals with 10,000,000+
sockets, we absolutely do not care about this requirement.  O(log(n)) is
still crazy if it involves O(log(n)) cache misses.
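To make the cost concrete, here is a rough user-space sketch of the kind of
hierarchical bitmap search the "return the lowest available descriptor" rule
forces.  The two-level layout, the sizes and the names are illustrative only,
not taken from either kernel:

/*
 * Illustrative two-level bitmap search for the lowest free descriptor.
 * Names and sizes are made up; the point is only that every level that
 * has to be read is a potential cache miss once the table is cold.
 * __builtin_ctzll() is the GCC/Clang count-trailing-zeros builtin.
 */
#include <stdint.h>

#define WORD_BITS  64
#define MAX_FDS    (1 << 24)                    /* 16M descriptors       */
#define NWORDS     (MAX_FDS / WORD_BITS)        /* 256K leaf words       */

static uint64_t fds_in_use[NWORDS];             /* bit set = fd allocated      */
static uint64_t word_full[NWORDS / WORD_BITS];  /* bit set = leaf word is full */

static int lowest_free_fd(void)
{
	unsigned int i, w;
	int fd;

	for (i = 0; i < NWORDS / WORD_BITS; i++) {
		/* reading each summary word is one potential cache miss */
		if (word_full[i] == ~0ULL)
			continue;               /* the 4096 fds below are all taken */
		w = i * WORD_BITS + __builtin_ctzll(~word_full[i]);
		/* reading the leaf word is another one */
		fd = w * WORD_BITS + __builtin_ctzll(~fds_in_use[w]);
		fds_in_use[w] |= 1ULL << (fd % WORD_BITS);
		if (fds_in_use[w] == ~0ULL)
			word_full[i] |= 1ULL << (w % WORD_BITS);
		return fd;
	}
	return -1;                              /* table full */
}

Even with the summary level, a cold allocation touches at least two unrelated
cache lines, and the bitmaps alone are a couple of megabytes once you allow
16M descriptors.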
> Not contended locks aren't expensive.  And all is done on a single cache
> line.
>
> One question about the Linux implementation: what happens when a socket in
> select is closed?  I'm assuming that the kernel waits until "shutdown" is
> given or when a connection comes in?
>
> Is it a problem that you can "hide" your listening socket with a thread in
> accept()?  I would think so.  (It would be visible in netstat, but you
> can't easily find out who has it.)

Again, netstat -p on a server with 10,000,000 sockets never completes.
Never try it unless you are desperate and perhaps trying to avoid a reboot.

If you absolutely want to nuke a listener because of untrusted applications,
we had better implement a proper syscall.  Android has such a facility.

An alternative would be to extend netlink (the ss command from the iproute2
package) to carry one pid per socket.

	ss -atnp state listening

-> would not have to readlink() every /proc/*/fd/* entry.
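For reference, what netstat -p / ss -p have to do today is roughly the walk
below (a simplified user-space sketch, one inode at a time; the function name
and the shortcuts are mine).  It is O(processes * descriptors) worth of
readlink() calls, which is why it never finishes with millions of sockets:

/*
 * Simplified sketch of the /proc scan that netstat -p and ss -p rely on:
 * walk every /proc/<pid>/fd/<fd> symlink and readlink() it, looking for
 * "socket:[<inode>]".  Real tools hash all inodes in one pass, but the
 * readlink() walk over every descriptor of every process is the same.
 */
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Return the pid owning the socket with this inode, or -1 if not found. */
static long pid_of_socket_inode(unsigned long ino)
{
	char want[64], fddir[64], path[512], link[64];
	struct dirent *p, *f;
	DIR *proc, *fds;

	snprintf(want, sizeof(want), "socket:[%lu]", ino);

	proc = opendir("/proc");
	if (!proc)
		return -1;

	while ((p = readdir(proc)) != NULL) {
		if (!isdigit((unsigned char)p->d_name[0]))
			continue;               /* not a pid directory */
		snprintf(fddir, sizeof(fddir), "/proc/%s/fd", p->d_name);
		fds = opendir(fddir);
		if (!fds)
			continue;
		while ((f = readdir(fds)) != NULL) {
			ssize_t n;

			if (f->d_name[0] == '.')
				continue;
			snprintf(path, sizeof(path), "%s/%s", fddir, f->d_name);
			n = readlink(path, link, sizeof(link) - 1);
			if (n <= 0)
				continue;
			link[n] = '\0';
			if (strcmp(link, want) == 0) {  /* found the owner */
				long pid = atol(p->d_name);

				closedir(fds);
				closedir(proc);
				return pid;
			}
		}
		closedir(fds);
	}
	closedir(proc);
	return -1;
}

Carrying the owning pid in the netlink reply would let ss answer the
listening-socket question without doing this walk at all.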