On Sat, Oct 30, 2010 at 06:55:19PM +0300, Julian Anastasov wrote:
>
>     Hello,
>
> On Sat, 30 Oct 2010, Simon Horman wrote:
>
> >>>Could the nf_conntrack changes have caused this? There were also many
> >>>MSI and bnx2 updates in 2.6.36, so not sure if it's LVS or not.
> >>
> >>Hi Howard,
> >>
> >>Yes, it is very likely that the problem you are seeing
> >>is a regression caused by the introduction of full-NAT.
> >>
> >>There is a fix for this, which will be included in 2.6.37-rc1
> >>but unfortunately it was to invasive to include in 2.6.36 as
> >>the problem was noticed fairly late in the release cycle.
>
>     If Howard is happy with this idea we can prepare
> single or separated patches for testing with 2.6.36. It will
> make the conntrack optional and disabled by default.

The existing patches seem to apply to 2.6.36. I'm not sure there is a need
for an extra patch / reworked patches with different behaviour to what
will appear in 2.6.37-rc1.

> >>As I understand it, the fix that was made by the three patches
> >>listed below.
> >>
> >>These patches appear to apply cleanly on top of 2.6.36.
> >>The v2.6.36-nfct branch of
> >>git://git.kernel.org/pub/scm/linux/kernel/git/horms/lvs-test-2.6.git
> >>is 2.6.36 plus these three patches.
> >>
> >>I believe that even with these patches in order to avoid the performance
> >>penalty you need to set /proc/sys/net/ipv4/vs/snat_reroute to 0.
> >>
> >>
> >>
> >>commit 8a8030407f55a6aaedb51167c1a2383311fcd707
> >>Author: Julian Anastasov <[email protected]>
> >>Date: Tue Sep 21 17:38:57 2010 +0200
> >>
> >>    ipvs: make rerouting optional with snat_reroute
> >>
> >>    Add new sysctl flag "snat_reroute". Recent kernels use
> >>    ip_route_me_harder() to route LVS-NAT responses properly by
> >>    VIP when there are multiple paths to client. But setups
> >>    that do not have alternative default routes can skip this
> >>    routing lookup by using snat_reroute=0.
> >>
> >>    Signed-off-by: Julian Anastasov <[email protected]>
> >>    Signed-off-by: Patrick McHardy <[email protected]>
> >
> >Julian,
> >
> >do you think that it would be possible to add some auto-detection
> >that turns snat_reroute on and off as necessary?
>
>     Not sure how snat_reroute can be optimized because
> it is for traffic to client. But in the case with OPS
> it is not used at all. It is true that 2.6.36
> changes the picture, I'm just not sure how much because
> now every IPVS packet hits existing netfilter conntrack
> while before 2.6.36 we create and destroy conntrack per packet.
> With boxes having enough memory both for IPVS conns and
> netfilters conntracks and if the netfilter's hash lookups are
> faster than creating new conntrack we can see better
> results. Except nf_conntrack_max I'm not sure what needs to be
> tuned.
> And 2.6.37-rc1 will add more delays for non-IPVS
> traffic with these new handlers in LOCAL_OUT.

Understood.

> May be we
> have to find some trick there to avoid lookups that are
> not needed. For OPS 2.6.37-rc1 will destroy conntrack
> immediately while 2.6.36 keeps them according to the UDP
> timeout.

OPS is a special case, so I guess there is some scope for optimising it.
But OPS is not the common case IMHO.

>     OTOH, we can reorder some checks in __ip_vs_conn_in_get
> and ip_vs_conn_out_get. In the old days it was equally
> faster to check v4 addresses and ports but now when
> RAM is slower and IPv6 is in the game we can put the ports
> at first position. For example:
>
> this code
>
>     if (cp->af == p->af &&
>         ip_vs_addr_equal(p->af, p->caddr, &cp->caddr) &&
>         ip_vs_addr_equal(p->af, p->vaddr, &cp->vaddr) &&
>         p->cport == cp->cport && p->vport == cp->vport &&
>         ((!p->cport) ^ (!(cp->flags & IP_VS_CONN_F_NO_CPORT))) &&
>         p->protocol == cp->protocol) {
>
> can be optimized to:
>
>     if (p->cport == cp->cport && p->vport == cp->vport &&
>         cp->af == p->af &&
>         ip_vs_addr_equal(p->af, p->caddr, &cp->caddr) &&
>         ip_vs_addr_equal(p->af, p->vaddr, &cp->vaddr) &&
>         ((!p->cport) ^ (!(cp->flags & IP_VS_CONN_F_NO_CPORT))) &&
>         p->protocol == cp->protocol) {
>
>     It will help also to reorder ip_vs_conn fields in this way:
>
>     struct list_head        c_list;     /* hashed list heads */
>     __be16                  cport;
>     __be16                  vport;
>     __be16                  dport;
>     __u8                    af;         /* address family */
>     __u8                    protocol;   /* Which protocol (TCP/UDP) */
>     volatile __u32          flags;      /* status flags */
>     union nf_inet_addr      caddr;      /* client address */
>     union nf_inet_addr      vaddr;      /* virtual address */
>     union nf_inet_addr      daddr;      /* destination address */
>
>     It will help IPv4 to see main fields in first 32 bytes.
>
>     Note that this change converts af and protocol to
> single octet. May be protocol was u16 just to fill space
> but when af was added we can put them together in a word.

These optimisations seem reasonable to me.
I guess we should do some benchmarking to see if they make any difference.
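
As a possible starting point for that benchmarking, here is a minimal
userspace sketch, not kernel code. It copies the proposed field order into
a stand-in structure and checks that the address-first and ports-first
comparisons always agree, which should hold because none of the operands
has side effects. Everything prefixed sk_ or SK_ is a hypothetical name
invented for this sketch; the lookup key is simplified to hold the
addresses by value, and ports are plain integers rather than __be16.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SK_AF_INET     2       /* stand-in for AF_INET */
#define SK_AF_INET6    10      /* stand-in for AF_INET6 (Linux value) */
#define SK_F_NO_CPORT  0x0800  /* stand-in for IP_VS_CONN_F_NO_CPORT */

/* Stand-in for union nf_inet_addr: 16 bytes, IPv4 uses the first word. */
union sk_addr {
    uint32_t ip;
    uint32_t ip6[4];
};

/* Hypothetical userspace copy of the proposed ip_vs_conn field order. */
struct sk_conn {
    void *c_next, *c_prev;      /* stand-in for struct list_head c_list */
    uint16_t cport;
    uint16_t vport;
    uint16_t dport;
    uint8_t af;                 /* address family */
    uint8_t protocol;           /* TCP/UDP */
    volatile uint32_t flags;    /* status flags */
    union sk_addr caddr;        /* client address */
    union sk_addr vaddr;        /* virtual address */
    union sk_addr daddr;        /* destination address */
};

/* Simplified lookup key, playing the role of "p" in the quoted code. */
struct sk_param {
    uint16_t cport, vport;
    uint8_t af, protocol;
    union sk_addr caddr, vaddr;
};

/* Same idea as ip_vs_addr_equal(): compare 4 or 16 bytes depending on af. */
int sk_addr_equal(int af, const union sk_addr *a, const union sk_addr *b)
{
    if (af == SK_AF_INET6)
        return memcmp(a->ip6, b->ip6, sizeof(a->ip6)) == 0;
    return a->ip == b->ip;
}

/* Current order: address family and addresses are tested first. */
int sk_match_addr_first(const struct sk_param *p, const struct sk_conn *cp)
{
    return cp->af == p->af &&
           sk_addr_equal(p->af, &p->caddr, &cp->caddr) &&
           sk_addr_equal(p->af, &p->vaddr, &cp->vaddr) &&
           p->cport == cp->cport && p->vport == cp->vport &&
           ((!p->cport) ^ (!(cp->flags & SK_F_NO_CPORT))) &&
           p->protocol == cp->protocol;
}

/* Proposed order: the cheap 16-bit port tests come first. */
int sk_match_ports_first(const struct sk_param *p, const struct sk_conn *cp)
{
    return p->cport == cp->cport && p->vport == cp->vport &&
           cp->af == p->af &&
           sk_addr_equal(p->af, &p->caddr, &cp->caddr) &&
           sk_addr_equal(p->af, &p->vaddr, &cp->vaddr) &&
           ((!p->cport) ^ (!(cp->flags & SK_F_NO_CPORT))) &&
           p->protocol == cp->protocol;
}

/* Perturb every combination of six fields and count the cases where the
 * two orderings give different answers. */
int sk_count_disagreements(void)
{
    int bad = 0;
    unsigned mask;

    for (mask = 0; mask < 64; mask++) {
        struct sk_conn cp;
        struct sk_param p;

        memset(&cp, 0, sizeof(cp));
        memset(&p, 0, sizeof(p));
        cp.af = p.af = SK_AF_INET;
        cp.protocol = p.protocol = 6;
        cp.cport = p.cport = 1025;
        cp.vport = p.vport = 80;
        cp.caddr.ip = p.caddr.ip = 0x0a000001;
        cp.vaddr.ip = p.vaddr.ip = 0x0a000002;

        if (mask & 1)  p.cport ^= 1;
        if (mask & 2)  p.vport ^= 1;
        if (mask & 4)  p.caddr.ip ^= 1;
        if (mask & 8)  p.vaddr.ip ^= 1;
        if (mask & 16) p.protocol = 17;
        if (mask & 32) cp.flags |= SK_F_NO_CPORT;

        if (sk_match_addr_first(&p, &cp) != sk_match_ports_first(&p, &cp))
            bad++;
    }
    return bad;
}
```

A timing loop over a table of such entries could call the two match
functions in turn. Printing offsetof() for each field of struct sk_conn
would also show how much of the lookup key actually falls within the first
32 bytes on a given architecture, since that depends on pointer size.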
