At 04:14 AM 8/18/2008, Robert Watson wrote:

On Sun, 3 Aug 2008, Robert Watson wrote:

This is an advance warning that, late next week, I will be merging a fairly large set of changes to the IPv4 and IPv6 protocols layered over the inpcb/inpcbinfo kernel infrastructure. To be specific, this affects TCP, UDP, and raw sockets on both IPv4 and IPv6. I will post a further e-mail announcement along with patch set and schedule in a day or two once it's prepared.

FYI: This patch has now been committed to Subversion. I'll keep a close eye out for difficulties; if you run into issues, please send me an e-mail (and CC stable@).


Hi Robert,
I just did a buildworld/kernel in case your commit fixed the routing bugs, but I am still seeing those bogus ARP / routing table entries. I narrowed it down to the commits below. I don't think it's the Intel stuff, as another user reported the same issue using bce NICs.

date=2008.07.30.18.00.00
and
date=2008.07.31.00.00.00

Updating collection src-all/cvs
 Edit src/sys/conf/files
  Add delta 1.1243.2.32 2008.07.30.20.35.41 kmacy
 Checkout src/sys/dev/e1000/LICENSE
 Checkout src/sys/dev/e1000/README
 Checkout src/sys/dev/e1000/e1000_80003es2lan.c
 Checkout src/sys/dev/e1000/e1000_80003es2lan.h
 Checkout src/sys/dev/e1000/e1000_82540.c
 Checkout src/sys/dev/e1000/e1000_82541.c
 Checkout src/sys/dev/e1000/e1000_82541.h
 Checkout src/sys/dev/e1000/e1000_82542.c
 Checkout src/sys/dev/e1000/e1000_82543.c
 Checkout src/sys/dev/e1000/e1000_82543.h
 Checkout src/sys/dev/e1000/e1000_82571.c
 Checkout src/sys/dev/e1000/e1000_82571.h
 Checkout src/sys/dev/e1000/e1000_82575.c
 Checkout src/sys/dev/e1000/e1000_82575.h
 Checkout src/sys/dev/e1000/e1000_api.c
 Checkout src/sys/dev/e1000/e1000_api.h
 Checkout src/sys/dev/e1000/e1000_defines.h
 Checkout src/sys/dev/e1000/e1000_hw.h
 Checkout src/sys/dev/e1000/e1000_ich8lan.c
 Checkout src/sys/dev/e1000/e1000_ich8lan.h
 Checkout src/sys/dev/e1000/e1000_mac.c
 Checkout src/sys/dev/e1000/e1000_mac.h
 Checkout src/sys/dev/e1000/e1000_manage.c
 Checkout src/sys/dev/e1000/e1000_manage.h
 Checkout src/sys/dev/e1000/e1000_nvm.c
 Checkout src/sys/dev/e1000/e1000_nvm.h
 Checkout src/sys/dev/e1000/e1000_osdep.c
 Checkout src/sys/dev/e1000/e1000_osdep.h
 Checkout src/sys/dev/e1000/e1000_phy.c
 Checkout src/sys/dev/e1000/e1000_phy.h
 Checkout src/sys/dev/e1000/e1000_regs.h
 Checkout src/sys/dev/e1000/if_em.c
 Checkout src/sys/dev/e1000/if_em.h
 Checkout src/sys/dev/e1000/if_igb.h
 Edit src/sys/kern/kern_synch.c
  Add delta 1.302.2.3 2008.07.30.18.28.09 rwatson
 Edit src/sys/kern/sys_process.c
  Add delta 1.145.2.1 2008.07.30.19.49.10 jhb
 Edit src/sys/netinet/tcp_subr.c
  Add delta 1.300.2.4 2008.07.30.20.35.41 kmacy
 Edit src/sys/netinet/tcp_syncache.c
  Add delta 1.130.2.9 2008.07.30.20.35.41 kmacy
  Add delta 1.130.2.10 2008.07.30.20.51.20 kmacy
 Edit src/sys/netinet/tcp_syncache.h
  Add delta 1.1.2.1 2008.07.30.20.35.41 kmacy
 Edit src/sys/netinet/tcp_usrreq.c
  Add delta 1.163.2.4 2008.07.30.20.35.41 kmacy
 Edit src/sys/netinet/udp_usrreq.c
  Add delta 1.218.2.1 2008.07.30.21.23.21 bz
 Edit src/sys/netinet6/ip6_input.c
  Add delta 1.95.2.1 2008.07.30.21.23.21 bz
 Edit src/sys/netinet6/ip6_var.h
  Add delta 1.39.2.2 2008.07.30.21.23.21 bz
 Edit src/sys/sys/socket.h
  Add delta 1.95.2.3 2008.07.30.19.35.40 kmacy
 Edit src/sys/ufs/ufs/ufs_lookup.c
  Add delta 1.83.2.2 2008.07.30.21.43.42 jhb
 Edit src/sys/vm/vm_object.c
  Add delta 1.385.2.2 2008.07.30.21.43.42 jhb
 Edit src/sys/vm/vm_object.h
  Add delta 1.114.2.1 2008.07.30.21.43.42 jhb
 Edit src/sys/vm/vnode_pager.c
  Add delta 1.236.2.2 2008.07.30.21.43.42 jhb


        ---Mike




Thanks,

Robert N M Watson
Computer Laboratory
University of Cambridge


The thrust of this change is to replace the mutexes protecting the inpcb and inpcbinfo data structures with read-write locks (rwlocks). These structures represent, respectively, particular sockets and the global socket lists for all socket types in IPv4 and IPv6 except for SCTP. When you run netstat, inpcbinfo is the data structure referencing all connections, and each line in the netstat output reflects the contents of a specific inpcb.

In the current stage of this work, the intent is to improve performance for datagram-related protocols on SMP systems by allowing concurrent acquisition of both global and connection locks during receive and transmit. This is possible because, in the common case, no connection or global state is modified during UDP/raw receive and transmit at the IP layer, so a read lock is sufficient to prevent data in those structures from unexpectedly changing. For receive, socket layer state is modified, but this is separately protected by socket layer locks. On transmit, no state is modified at any layer, so in principle we will allow fully parallel transmit from multiple threads down to about the routing and network interface layers, whereas previously they would bottleneck in UDP.

The applications targeted by this change are threaded UDP server applications, such as BIND9, nsd, and UDP-based memcached. Kris Kennaway and Paul Saab have done fairly extensive testing with the changes and demonstrated significant performance improvements due to reduced contention and overhead. Perhaps they can mention some of those numbers in a follow-up to this post.

The reason for the heads up is that, while carefully tested, changes of this sort do come with risks. We've carefully structured them so as to avoid breaking the ABIs for netstat, etc., but it's not impossible that some problems will arise as the changes settle. The goal, however, is to see these performance improvements in 7.1, and since they've had a bit to shake out in 8.x and seen some heavy use, I think now is the right time to merge them.

In any case, I will send out e-mail in a couple of days with a proposed merge patch and schedule for merging, and perhaps if you are in a position where you might benefit from these improvements, or have interesting UDP or raw-socket-based applications running on 7.x, you could test the candidate patch before it's merged, reporting any problems. Unless I receive negative feedback, I will plan on merging the changes late in the week, and keep a close eye on stable@ for any reports of problems.

Thanks,

Robert N M Watson
Computer Laboratory
University of Cambridge
_______________________________________________
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"