Hello, On Fri, 3 Jul 2015, Alex Gartrell wrote:
> > - if packets go to local server IPVS should not touch > > skb->dst, skb->sk, etc (NF_ACCEPT case) > > Yeah, the thing is that early demux could totally match for a socket > that existed before we created the service, and in that instance it > might make the most sense to retain the connection and simply > NF_ACCEPT. The problem with that approach though is that is that the > behavior changes if early_demux is not enabled. I believe that we > should just do the consistent thing and always drop the early_demux > result if bound for non-local, as you've said. We must not forget that a local server listening on 0.0.0.0:VPORT or VIP:VPORT can be reached if a real server with some local IP is used as RIP. So, early demux will really work for this case when local stack is one of the real servers. > The interesting thing though is that, for the purposes of routing, > enabling early_demux does change the behavior. I suspect that's a > bug, but it's far enough away from actual use cases that it's probably > fine (who is out there tearing down addresses and setting up routes in > their place?) Looks like routing by definition can not divert skbs with early-demux socket because input routing is not called. Netfilter's DNAT may change daddr/dport before early-demux and in this case socket should not be found (eg. if we DNAT to other host). So, there is problem mostly for IPVS, I don't remember for other cases. May be CLUSTERIP too, I'm not sure. There is the problem that at LOCAL_IN SNAT is valid operation, not sure how it affects early-demux. > What do you think of the following: > > commit f04c42f8041cc4ccc4cb2a30c1058136dd497a83 > Author: Alex Gartrell <agartr...@fb.com> > Date: Wed Jul 1 13:24:46 2015 -0700 > > ipvs: orphan_skb in case of forwarding skb_orphan or orphan skb > It is possible that we bind against a local socket in early_demux when we > are actually going to want to forward it. In this case, the socket serves > no purpose and only serves to confuse things (particularly functions which > implicitly expect sk_fullsock to be true, like ip_local_out). > Additionally, skb_set_owner_w is totally broken for non full-socks. > > Signed-off-by: Alex Gartrell <agartr...@fb.com> > > diff --git a/net/netfilter/ipvs/ip_vs_xmit.c b/net/netfilter/ipvs/ip_vs_xmit.c > index bf66a86..3efe719 100644 > --- a/net/netfilter/ipvs/ip_vs_xmit.c > +++ b/net/netfilter/ipvs/ip_vs_xmit.c > @@ -527,6 +527,19 @@ static inline int > ip_vs_tunnel_xmit_prepare(struct sk_buff *skb, > return ret; > } > > +/* In the event of a remote destination, it's possible that we would have > + * matches against an old socket (particularly a TIME-WAIT socket). This > + * causes havoc down the line (ip_local_out et. al. expect regular sockets > + * and invalid memory accesses will happen) so simply drop the association > + * in this case > +*/ > +static inline void ip_vs_drop_early_demux_sk(struct sk_buff *skb) { Move '{' on next line and below comment should be closed on next line. But I guess you will run later scripts/checkpatch.pl --strict /tmp/file.patch > + /* If dev is set, the packet came from the LOCAL_IN callback and > + * not from a local TCP socket */ > + if (skb->dev) > + skb_orphan(skb); > +} > + > /* return NF_STOLEN (sent) or NF_ACCEPT if local=1 (not sent) */ > static inline int ip_vs_nat_send_or_cont(int pf, struct sk_buff *skb, > struct ip_vs_conn *cp, int local) > @@ -539,6 +552,7 @@ static inline int ip_vs_nat_send_or_cont(int pf, > struct sk_buff *skb, > else > ip_vs_update_conntrack(skb, cp, 1); > if (!local) { > + ip_vs_drop_early_demux_sk(skb); > skb_forward_csum(skb); > NF_HOOK(pf, NF_INET_LOCAL_OUT, NULL, skb, > NULL, skb_dst(skb)->dev, dst_output_sk); For the local=true case in ip_vs_nat_send_or_cont may be we should call skb_orphan when cp->dport != cp->vport or cp->daddr != cp->vaddr. This is a case where we DNAT to local real server but on different addr/port. If early demux finds socket, it is some socket shadowed after adding the virtual service. So, may be we have to add such checks near the NF_ACCEPT code. Can this work? else { /* Drop early-demux socket on DNAT */ if (cp->vport != cp->dport || !ip_vs_addr_equal(cp->af, cp->vaddr, &cp->caddr)) ip_vs_drop_early_demux_sk(skb); ret = NF_ACCEPT; } Otherwise, the other changes look good to me. Regards -- Julian Anastasov <j...@ssi.bg> -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html