[PATCH] packet: tpacket_snd(): fix signed/unsigned comparison
tpacket_fill_skb() can return a negative value (-errno) which is stored in tp_len variable. In that case the following condition will be (but shouldn't be) true: tp_len > dev->mtu + dev->hard_header_len as dev->mtu and dev->hard_header_len are both unsigned. That may lead to just returning an incorrect EMSGSIZE errno to the user. Signed-off-by: Alexander Drozdov --- net/packet/af_packet.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c index c9e8741..d1d3625 100644 --- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -2403,7 +2403,8 @@ static int tpacket_snd(struct packet_sock *po, struct msghdr *msg) } tp_len = tpacket_fill_skb(po, skb, ph, dev, size_max, proto, addr, hlen); - if (tp_len > dev->mtu + dev->hard_header_len) { + if (likely(tp_len >= 0) && + tp_len > dev->mtu + dev->hard_header_len) { struct ethhdr *ehdr; /* Earlier code assumed this would be a VLAN pkt, * double-check this now that we have the actual -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RESUBMIT Patch 1/1] net: replace if()/BUG with BUG_ON
On Wed, 17 Jun 2015 09:36:11 +0200, Frans Klaver wrote: On Wed, Jun 17, 2015 at 6:36 AM, Maninder Singh wrote: Use BUG_ON(condition) instead of if(condition)/BUG() . Signed-off-by: Maninder Singh Reviewed-by: Akhilesh Kumar --- net/packet/af_packet.c |3 +-- 1 files changed, 1 insertions(+), 2 deletions(-) diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c index b5989c6..c91d405 100644 --- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -547,8 +547,7 @@ static void prb_setup_retire_blk_timer(struct packet_sock *po, int tx_ring) { struct tpacket_kbdq_core *pkc; - if (tx_ring) - BUG(); + BUG_ON(tx_ring); pkc = tx_ring ? GET_PBDQC_FROM_RB(&po->tx_ring) : GET_PBDQC_FROM_RB(&po->rx_ring); I don't get this. We're not allowed to be using tx_ring, but we can and do handle it? Does that still warrant a BUG() or BUG_ON()? It's been in since the function introduction[0]. Can somebody explain? TPACKET_V3 doesn't support tx for now, so the function is actualy never called with non-zero tx_ring. I think there were plans to add the support. But retire timer for tx will not be needed anyway as it is up to user to fill the ring buffer. Thanks, Frans [0] f6fb8f100b80 (af-packet: TPACKET_V3 flexible buffer implementation.) -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH V2 2/2] af_packet: pass checksum validation status to the user
Introduce TP_STATUS_CSUM_VALID tp_status flag to tell the af_packet user that at least the transport header checksum has been already validated. For now, the flag may be set for incoming packets only. Signed-off-by: Alexander Drozdov Cc: Willem de Bruijn --- Documentation/networking/packet_mmap.txt | 13 ++--- include/uapi/linux/if_packet.h | 1 + net/packet/af_packet.c | 9 + 3 files changed, 20 insertions(+), 3 deletions(-) diff --git a/Documentation/networking/packet_mmap.txt b/Documentation/networking/packet_mmap.txt index a6d7cb9..daa015a 100644 --- a/Documentation/networking/packet_mmap.txt +++ b/Documentation/networking/packet_mmap.txt @@ -440,9 +440,10 @@ and the following flags apply: +++ Capture process: from include/linux/if_packet.h - #define TP_STATUS_COPY 2 - #define TP_STATUS_LOSING4 - #define TP_STATUS_CSUMNOTREADY 8 + #define TP_STATUS_COPY (1 << 1) + #define TP_STATUS_LOSING(1 << 2) + #define TP_STATUS_CSUMNOTREADY (1 << 3) + #define TP_STATUS_CSUM_VALID(1 << 7) TP_STATUS_COPY: This flag indicates that the frame (and associated meta information) has been truncated because it's @@ -466,6 +467,12 @@ TP_STATUS_CSUMNOTREADY: currently it's used for outgoing IP packets which reading the packet we should not try to check the checksum. +TP_STATUS_CSUM_VALID : This flag indicates that at least the transport +header checksum of the packet has been already +validated on the kernel side. If the flag is not set +then we are free to check the checksum by ourselves +provided that TP_STATUS_CSUMNOTREADY is also not set. + for convenience there are also the following defines: #define TP_STATUS_KERNEL0 diff --git a/include/uapi/linux/if_packet.h b/include/uapi/linux/if_packet.h index da2d668..053bd10 100644 --- a/include/uapi/linux/if_packet.h +++ b/include/uapi/linux/if_packet.h @@ -99,6 +99,7 @@ struct tpacket_auxdata { #define TP_STATUS_VLAN_VALID (1 << 4) /* auxdata has valid tp_vlan_tci */ #define TP_STATUS_BLK_TMO (1 << 5) #define TP_STATUS_VLAN_TPID_VALID (1 << 6) /* auxdata has valid tp_vlan_tpid */ +#define TP_STATUS_CSUM_VALID (1 << 7) /* Tx ring - header status */ #define TP_STATUS_AVAILABLE 0 diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c index 6ecf8dd..3f09dda 100644 --- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -1918,6 +1918,10 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev, if (skb->ip_summed == CHECKSUM_PARTIAL) status |= TP_STATUS_CSUMNOTREADY; + else if (skb->pkt_type != PACKET_OUTGOING && +(skb->ip_summed == CHECKSUM_COMPLETE || + skb_csum_unnecessary(skb))) + status |= TP_STATUS_CSUM_VALID; if (snaplen > res) snaplen = res; @@ -3015,6 +3019,11 @@ static int packet_recvmsg(struct kiocb *iocb, struct socket *sock, aux.tp_status = TP_STATUS_USER; if (skb->ip_summed == CHECKSUM_PARTIAL) aux.tp_status |= TP_STATUS_CSUMNOTREADY; + else if (skb->pkt_type != PACKET_OUTGOING && +(skb->ip_summed == CHECKSUM_COMPLETE || + skb_csum_unnecessary(skb))) + aux.tp_status |= TP_STATUS_CSUM_VALID; + aux.tp_len = PACKET_SKB_CB(skb)->origlen; aux.tp_snaplen = skb->len; aux.tp_mac = 0; -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH V2 1/2] af_packet: make tpacket_rcv to not set status value before run_filter
It is just an optimization. We don't need the value of status variable if the packet is filtered. Signed-off-by: Alexander Drozdov --- net/packet/af_packet.c | 7 --- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c index f8db706..6ecf8dd 100644 --- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -1910,14 +1910,15 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev, } } - if (skb->ip_summed == CHECKSUM_PARTIAL) - status |= TP_STATUS_CSUMNOTREADY; - snaplen = skb->len; res = run_filter(skb, sk, snaplen); if (!res) goto drop_n_restore; + + if (skb->ip_summed == CHECKSUM_PARTIAL) + status |= TP_STATUS_CSUMNOTREADY; + if (snaplen > res) snaplen = res; -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/2] af_packet: pass checksum validation status to the user
On 19.03.2015 21:50:03 +0300 Willem de Bruijn wrote: On Thu, Mar 19, 2015 at 2:08 PM, Alexander Drozdov wrote: On Thu, 19 Mar 2015 11:29:32 -0400, Willem de Bruijn wrote: On Thu, Mar 19, 2015 at 4:01 AM, Alexander Drozdov wrote: Introduce TP_STATUS_CSUM_VALID tp_status flag to tell the af_packet user that at least the transport header checksum has been already validated. This changes the interface slightly. Processes should be treating this previously unused bit as reserved and other flags have been added to the bitmap in this manner as well, so this should then be safe here, too. For now, the flag may be set for incoming packets only. Why? I can't figure out how af_packet could get that the outgoing packet's checksum has been validated. "Checksumming on output" from skbuff.h tells that skb->ip_summed should equal to CHECKSUM_NONE in that case, CHECKSUM_UNNECESSARY is a valid flag on the outgoing path according to that documentation. And, indeed, I also see no checksum scrubbing on forwarding paths in practice (but I may be wrong there, only took a quick glance). but that is not true for me (in my tests, skb->ip_summed == CHECKSUM_UNNECESSARY for forwarded packets in some cases). When you disable hardware checksum offload and generate packets locally, you do see the expected CHECKSUM_NONE value? Yes, I do, but see below. You cannot change the semantics of the flag afterwards. I think the semantics of the flag won't be changed if one set the flag for outgoing packets. If the flag is not set (for any directions) then that is not mean that the packet checksum is invalid. The user just can then checksum the packet by itself. So, the user may check the flag for any packet right now. I see. So it is a hint. Okay. It would be nice if it behaves as expected in as many cases as possible from the outset. This would include PACKET_OUTGOING and CHECKSUM_NONE. I've just done some testing, and I've found that packets generated by 'nping --badsum' (socket(PF_INET, SOCK_RAW, IPPROTO_RAW)) have CHECKSUM_NONE when they are viewed by af_packet. I've used rather old Linux kernel for the tests, but isn't that a reason to not set TP_STATUS_CSUM_VALID for outgoing packets right now? Please also note the flag and semantics in Documentation/networking/packet_mmap.txt I'll do it and I'll resend the patches with the note. Better to support both directions from the start. Signed-off-by: Alexander Drozdov --- include/uapi/linux/if_packet.h | 1 + net/packet/af_packet.c | 9 + 2 files changed, 10 insertions(+) diff --git a/include/uapi/linux/if_packet.h b/include/uapi/linux/if_packet.h index da2d668..053bd10 100644 --- a/include/uapi/linux/if_packet.h +++ b/include/uapi/linux/if_packet.h @@ -99,6 +99,7 @@ struct tpacket_auxdata { #define TP_STATUS_VLAN_VALID (1 << 4) /* auxdata has valid tp_vlan_tci */ #define TP_STATUS_BLK_TMO (1 << 5) #define TP_STATUS_VLAN_TPID_VALID (1 << 6) /* auxdata has valid tp_vlan_tpid */ +#define TP_STATUS_CSUM_VALID (1 << 7) /* Tx ring - header status */ #define TP_STATUS_AVAILABLE 0 diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c index 6ecf8dd..3f09dda 100644 --- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -1918,6 +1918,10 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev, if (skb->ip_summed == CHECKSUM_PARTIAL) status |= TP_STATUS_CSUMNOTREADY; + else if (skb->pkt_type != PACKET_OUTGOING && +(skb->ip_summed == CHECKSUM_COMPLETE || + skb_csum_unnecessary(skb))) + status |= TP_STATUS_CSUM_VALID; if (snaplen > res) snaplen = res; @@ -3015,6 +3019,11 @@ static int packet_recvmsg(struct kiocb *iocb, struct socket *sock, aux.tp_status = TP_STATUS_USER; if (skb->ip_summed == CHECKSUM_PARTIAL) aux.tp_status |= TP_STATUS_CSUMNOTREADY; + else if (skb->pkt_type != PACKET_OUTGOING && +(skb->ip_summed == CHECKSUM_COMPLETE || + skb_csum_unnecessary(skb))) + aux.tp_status |= TP_STATUS_CSUM_VALID; + These two sections are near duplicates. I'd move the entire status initialization, including existing TP_STATUS_USER and TP_STATUS_CSUMNOTREADY fields to a helper function. It's a bit unfortunately that we have to use an extra bit to add a signal that is a near inverse of the existing TP_STATUS_CSUMNOTREADY (bar CHECKSUM_NONE). I do not immediately see a better way, either, though. And tp_status has plenty room at 32 bits. aux.tp_len = PACKET_SKB_CB(skb)->origlen; aux.tp_snaplen = skb->len;
Re: [PATCH 2/2] af_packet: pass checksum validation status to the user
On Thu, 19 Mar 2015 11:29:32 -0400, Willem de Bruijn wrote: On Thu, Mar 19, 2015 at 4:01 AM, Alexander Drozdov wrote: Introduce TP_STATUS_CSUM_VALID tp_status flag to tell the af_packet user that at least the transport header checksum has been already validated. This changes the interface slightly. Processes should be treating this previously unused bit as reserved and other flags have been added to the bitmap in this manner as well, so this should then be safe here, too. For now, the flag may be set for incoming packets only. Why? I can't figure out how af_packet could get that the outgoing packet's checksum has been validated. "Checksumming on output" from skbuff.h tells that skb->ip_summed should equal to CHECKSUM_NONE in that case, but that is not true for me (in my tests, skb->ip_summed == CHECKSUM_UNNECESSARY for forwarded packets in some cases). You cannot change the semantics of the flag afterwards. I think the semantics of the flag won't be changed if one set the flag for outgoing packets. If the flag is not set (for any directions) then that is not mean that the packet checksum is invalid. The user just can then checksum the packet by itself. So, the user may check the flag for any packet right now. Better to support both directions from the start. Signed-off-by: Alexander Drozdov --- include/uapi/linux/if_packet.h | 1 + net/packet/af_packet.c | 9 + 2 files changed, 10 insertions(+) diff --git a/include/uapi/linux/if_packet.h b/include/uapi/linux/if_packet.h index da2d668..053bd10 100644 --- a/include/uapi/linux/if_packet.h +++ b/include/uapi/linux/if_packet.h @@ -99,6 +99,7 @@ struct tpacket_auxdata { #define TP_STATUS_VLAN_VALID (1 << 4) /* auxdata has valid tp_vlan_tci */ #define TP_STATUS_BLK_TMO (1 << 5) #define TP_STATUS_VLAN_TPID_VALID (1 << 6) /* auxdata has valid tp_vlan_tpid */ +#define TP_STATUS_CSUM_VALID (1 << 7) /* Tx ring - header status */ #define TP_STATUS_AVAILABLE 0 diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c index 6ecf8dd..3f09dda 100644 --- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -1918,6 +1918,10 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev, if (skb->ip_summed == CHECKSUM_PARTIAL) status |= TP_STATUS_CSUMNOTREADY; + else if (skb->pkt_type != PACKET_OUTGOING && +(skb->ip_summed == CHECKSUM_COMPLETE || + skb_csum_unnecessary(skb))) + status |= TP_STATUS_CSUM_VALID; if (snaplen > res) snaplen = res; @@ -3015,6 +3019,11 @@ static int packet_recvmsg(struct kiocb *iocb, struct socket *sock, aux.tp_status = TP_STATUS_USER; if (skb->ip_summed == CHECKSUM_PARTIAL) aux.tp_status |= TP_STATUS_CSUMNOTREADY; + else if (skb->pkt_type != PACKET_OUTGOING && +(skb->ip_summed == CHECKSUM_COMPLETE || + skb_csum_unnecessary(skb))) + aux.tp_status |= TP_STATUS_CSUM_VALID; + These two sections are near duplicates. I'd move the entire status initialization, including existing TP_STATUS_USER and TP_STATUS_CSUMNOTREADY fields to a helper function. It's a bit unfortunately that we have to use an extra bit to add a signal that is a near inverse of the existing TP_STATUS_CSUMNOTREADY (bar CHECKSUM_NONE). I do not immediately see a better way, either, though. And tp_status has plenty room at 32 bits. aux.tp_len = PACKET_SKB_CB(skb)->origlen; aux.tp_snaplen = skb->len; aux.tp_mac = 0; -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 2/2] af_packet: pass checksum validation status to the user
Introduce TP_STATUS_CSUM_VALID tp_status flag to tell the af_packet user that at least the transport header checksum has been already validated. For now, the flag may be set for incoming packets only. Signed-off-by: Alexander Drozdov --- include/uapi/linux/if_packet.h | 1 + net/packet/af_packet.c | 9 + 2 files changed, 10 insertions(+) diff --git a/include/uapi/linux/if_packet.h b/include/uapi/linux/if_packet.h index da2d668..053bd10 100644 --- a/include/uapi/linux/if_packet.h +++ b/include/uapi/linux/if_packet.h @@ -99,6 +99,7 @@ struct tpacket_auxdata { #define TP_STATUS_VLAN_VALID (1 << 4) /* auxdata has valid tp_vlan_tci */ #define TP_STATUS_BLK_TMO (1 << 5) #define TP_STATUS_VLAN_TPID_VALID (1 << 6) /* auxdata has valid tp_vlan_tpid */ +#define TP_STATUS_CSUM_VALID (1 << 7) /* Tx ring - header status */ #define TP_STATUS_AVAILABLE 0 diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c index 6ecf8dd..3f09dda 100644 --- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -1918,6 +1918,10 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev, if (skb->ip_summed == CHECKSUM_PARTIAL) status |= TP_STATUS_CSUMNOTREADY; + else if (skb->pkt_type != PACKET_OUTGOING && +(skb->ip_summed == CHECKSUM_COMPLETE || + skb_csum_unnecessary(skb))) + status |= TP_STATUS_CSUM_VALID; if (snaplen > res) snaplen = res; @@ -3015,6 +3019,11 @@ static int packet_recvmsg(struct kiocb *iocb, struct socket *sock, aux.tp_status = TP_STATUS_USER; if (skb->ip_summed == CHECKSUM_PARTIAL) aux.tp_status |= TP_STATUS_CSUMNOTREADY; + else if (skb->pkt_type != PACKET_OUTGOING && +(skb->ip_summed == CHECKSUM_COMPLETE || + skb_csum_unnecessary(skb))) + aux.tp_status |= TP_STATUS_CSUM_VALID; + aux.tp_len = PACKET_SKB_CB(skb)->origlen; aux.tp_snaplen = skb->len; aux.tp_mac = 0; -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 1/2] af_packet: make tpacket_rcv to not set status value before run_filter
It is just an optimization. We don't need the value of status variable if the packet is filtered. Signed-off-by: Alexander Drozdov --- net/packet/af_packet.c | 7 --- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c index f8db706..6ecf8dd 100644 --- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -1910,14 +1910,15 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev, } } - if (skb->ip_summed == CHECKSUM_PARTIAL) - status |= TP_STATUS_CSUMNOTREADY; - snaplen = skb->len; res = run_filter(skb, sk, snaplen); if (!res) goto drop_n_restore; + + if (skb->ip_summed == CHECKSUM_PARTIAL) + status |= TP_STATUS_CSUMNOTREADY; + if (snaplen > res) snaplen = res; -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] netfilter: ipset: make ip_set_get_ip*_port to use skb_network_offset
All the ipset functions respect skb->network_header value, except for ip_set_get_ip4_port() & ip_set_get_ip6_port(). The functions should use skb_network_offset() to get the transport header offset. Signed-off-by: Alexander Drozdov --- net/netfilter/ipset/ip_set_getport.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/net/netfilter/ipset/ip_set_getport.c b/net/netfilter/ipset/ip_set_getport.c index 29fb01d..1981f02 100644 --- a/net/netfilter/ipset/ip_set_getport.c +++ b/net/netfilter/ipset/ip_set_getport.c @@ -98,7 +98,7 @@ ip_set_get_ip4_port(const struct sk_buff *skb, bool src, __be16 *port, u8 *proto) { const struct iphdr *iph = ip_hdr(skb); - unsigned int protooff = ip_hdrlen(skb); + unsigned int protooff = skb_network_offset(skb) + ip_hdrlen(skb); int protocol = iph->protocol; /* See comments at tcp_match in ip_tables.c */ @@ -135,7 +135,9 @@ ip_set_get_ip6_port(const struct sk_buff *skb, bool src, __be16 frag_off = 0; nexthdr = ipv6_hdr(skb)->nexthdr; - protoff = ipv6_skip_exthdr(skb, sizeof(struct ipv6hdr), &nexthdr, + protoff = ipv6_skip_exthdr(skb, + skb_network_offset(skb) + + sizeof(struct ipv6hdr), &nexthdr, &frag_off); if (protoff < 0 || (frag_off & htons(~0x7)) != 0) return false; -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] ipv4: ip_check_defrag should not assume that skb_network_offset is zero
ip_check_defrag() may be used by af_packet to defragment outgoing packets. skb_network_offset() of af_packet's outgoing packets is not zero. Signed-off-by: Alexander Drozdov --- net/ipv4/ip_fragment.c | 11 +++ 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c index 2c8d98e..145a50c 100644 --- a/net/ipv4/ip_fragment.c +++ b/net/ipv4/ip_fragment.c @@ -659,27 +659,30 @@ EXPORT_SYMBOL(ip_defrag); struct sk_buff *ip_check_defrag(struct sk_buff *skb, u32 user) { struct iphdr iph; + int netoff; u32 len; if (skb->protocol != htons(ETH_P_IP)) return skb; - if (skb_copy_bits(skb, 0, &iph, sizeof(iph)) < 0) + netoff = skb_network_offset(skb); + + if (skb_copy_bits(skb, netoff, &iph, sizeof(iph)) < 0) return skb; if (iph.ihl < 5 || iph.version != 4) return skb; len = ntohs(iph.tot_len); - if (skb->len < len || len < (iph.ihl * 4)) + if (skb->len < netoff + len || len < (iph.ihl * 4)) return skb; if (ip_is_fragment(&iph)) { skb = skb_share_check(skb, GFP_ATOMIC); if (skb) { - if (!pskb_may_pull(skb, iph.ihl*4)) + if (!pskb_may_pull(skb, netoff + iph.ihl * 4)) return skb; - if (pskb_trim_rcsum(skb, len)) + if (pskb_trim_rcsum(skb, netoff + len)) return skb; memset(IPCB(skb), 0, sizeof(struct inet_skb_parm)); if (ip_defrag(skb, user)) -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[RESEND PATCH] af_packet: don't pass empty blocks for PACKET_V3
Before da413eec729d ("packet: Fixed TPACKET V3 to signal poll when block is closed rather than every packet") poll listening for an af_packet socket was not signaled if there was no packets to process. After the patch poll is signaled evety time when block retire timer expires. That happens because af_packet closes the current block on timeout even if the block is empty. Passing empty blocks to the user not only wastes CPU but also wastes ring buffer space increasing probability of packets dropping on small timeouts. Signed-off-by: Alexander Drozdov Cc: Dan Collins Cc: Willem de Bruijn Cc: Guy Harris --- net/packet/af_packet.c | 10 +- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c index 9c28cec..1da46ef 100644 --- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -698,6 +698,10 @@ static void prb_retire_rx_blk_timer_expired(unsigned long data) if (pkc->last_kactive_blk_num == pkc->kactive_blk_num) { if (!frozen) { + if (!BLOCK_NUM_PKTS(pbd)) { + /* An empty block. Just refresh the timer. */ + goto refresh_timer; + } prb_retire_current_block(pkc, po, TP_STATUS_BLK_TMO); if (!prb_dispatch_next_block(pkc, po)) goto refresh_timer; @@ -798,7 +802,11 @@ static void prb_close_block(struct tpacket_kbdq_core *pkc1, h1->ts_last_pkt.ts_sec = last_pkt->tp_sec; h1->ts_last_pkt.ts_nsec = last_pkt->tp_nsec; } else { - /* Ok, we tmo'd - so get the current time */ + /* Ok, we tmo'd - so get the current time. +* +* It shouldn't really happen as we don't close empty +* blocks. See prb_retire_rx_blk_timer_expired(). +*/ struct timespec ts; getnstimeofday(&ts); h1->ts_last_pkt.ts_sec = ts.tv_sec; -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] af_packet: allow packets defragmentation not only for hash fanout type
Packets defragmentation was introduced for PACKET_FANOUT_HASH only, see 7736d33f4262 ("packet: Add pre-defragmentation support for ipv4 fanouts") It may be useful to have defragmentation enabled regardless of fanout type. Without that, the AF_PACKET user may have to: 1. Collect fragments from different rings 2. Defragment by itself Signed-off-by: Alexander Drozdov --- net/packet/af_packet.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c index 9c28cec..99fc628 100644 --- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -1349,14 +1349,14 @@ static int packet_rcv_fanout(struct sk_buff *skb, struct net_device *dev, return 0; } + if (fanout_has_flag(f, PACKET_FANOUT_FLAG_DEFRAG)) { + skb = ip_check_defrag(skb, IP_DEFRAG_AF_PACKET); + if (!skb) + return 0; + } switch (f->type) { case PACKET_FANOUT_HASH: default: - if (fanout_has_flag(f, PACKET_FANOUT_FLAG_DEFRAG)) { - skb = ip_check_defrag(skb, IP_DEFRAG_AF_PACKET); - if (!skb) - return 0; - } idx = fanout_demux_hash(f, skb, num); break; case PACKET_FANOUT_LB: -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[RESEND PATCH] ipv4: ip_check_defrag should correctly check return value of skb_copy_bits
skb_copy_bits() returns zero on success and a negative value on error, so it is needed to invert the condition in ip_check_defrag(). Fixes: 1bf3751ec90c ("ipv4: ip_check_defrag must not modify skb before unsharing") Signed-off-by: Alexander Drozdov Acked-by: Eric Dumazet Cc: Johannes Berg --- net/ipv4/ip_fragment.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c index e5b6d0d..2c8d98e 100644 --- a/net/ipv4/ip_fragment.c +++ b/net/ipv4/ip_fragment.c @@ -664,7 +664,7 @@ struct sk_buff *ip_check_defrag(struct sk_buff *skb, u32 user) if (skb->protocol != htons(ETH_P_IP)) return skb; - if (!skb_copy_bits(skb, 0, &iph, sizeof(iph))) + if (skb_copy_bits(skb, 0, &iph, sizeof(iph)) < 0) return skb; if (iph.ihl < 5 || iph.version != 4) -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] ipv4: ip_check_defrag should correctly check return value of skb_copy_bits
skb_copy_bits() returns zero on success and negative value on error, so it is needed to invert the condition in ip_check_defrag(). Fixes: 1bf3751ec90c ("ipv4: ip_check_defrag must not modify skb before unsharing") Signed-off-by: Alexander Drozdov --- net/ipv4/ip_fragment.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c index e5b6d0d..2c8d98e 100644 --- a/net/ipv4/ip_fragment.c +++ b/net/ipv4/ip_fragment.c @@ -664,7 +664,7 @@ struct sk_buff *ip_check_defrag(struct sk_buff *skb, u32 user) if (skb->protocol != htons(ETH_P_IP)) return skb; - if (!skb_copy_bits(skb, 0, &iph, sizeof(iph))) + if (skb_copy_bits(skb, 0, &iph, sizeof(iph)) < 0) return skb; if (iph.ihl < 5 || iph.version != 4) -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] af_packet: don't pass empty blocks for PACKET_V3
On 05.02.2015 23:01:38 +0300 Willem de Bruijn wrote: On Wed, Feb 4, 2015 at 9:58 PM, Alexander Drozdov wrote: Don't close an empty block on timeout. Its meaningless to pass it to the user. Moreover, passing empty blocks wastes CPU & buffer space increasing probability of packets dropping on small timeouts. Side effect of this patch is indefinite user-space wait in poll on idle links. But, I believe its better to set timeout for poll(2) when needed than to get empty blocks every millisecond when not needed. This change would break existing applications that have come to depend on the periodic signal. I don't disagree with the argument that the data ready signal should be sent only when a block is full or a timer expires and at least some data is waiting, but that is moot at this point. I missed something. As pointed by Guy Harris , before the previous patch periodic signal was not delivered. The previous patch (da413eec729dae5dc by Dan Collins ) is for 3.19 kernel only. Should we care about existing 3.19-only applications? Signed-off-by: Alexander Drozdov --- net/packet/af_packet.c | 10 +- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c index 9cfe2e1..9a2f70a 100644 --- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -698,6 +698,10 @@ static void prb_retire_rx_blk_timer_expired(unsigned long data) if (pkc->last_kactive_blk_num == pkc->kactive_blk_num) { if (!frozen) { + if (!BLOCK_NUM_PKTS(pbd)) { + /* An empty block. Just refresh the timer. */ + goto refresh_timer; + } prb_retire_current_block(pkc, po, TP_STATUS_BLK_TMO); if (!prb_dispatch_next_block(pkc, po)) goto refresh_timer; @@ -798,7 +802,11 @@ static void prb_close_block(struct tpacket_kbdq_core *pkc1, h1->ts_last_pkt.ts_sec = last_pkt->tp_sec; h1->ts_last_pkt.ts_nsec = last_pkt->tp_nsec; } else { - /* Ok, we tmo'd - so get the current time */ + /* Ok, we tmo'd - so get the current time. +* +* It shouldn't really happen as we don't close empty +* blocks. See prb_retire_rx_blk_timer_expired(). +*/ struct timespec ts; getnstimeofday(&ts); h1->ts_last_pkt.ts_sec = ts.tv_sec; -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] af_packet: don't pass empty blocks for PACKET_V3
On 06.02.2015 00:16:30 +0300 Guy Harris wrote: On Feb 5, 2015, at 12:01 PM, Willem de Bruijn wrote: On Wed, Feb 4, 2015 at 9:58 PM, Alexander Drozdov wrote: Don't close an empty block on timeout. Its meaningless to pass it to the user. Moreover, passing empty blocks wastes CPU & buffer space increasing probability of packets dropping on small timeouts. Side effect of this patch is indefinite user-space wait in poll on idle links. But, I believe its better to set timeout for poll(2) when needed than to get empty blocks every millisecond when not needed. This change would break existing applications that have come to depend on the periodic signal. I don't disagree with the argument that the data ready signal should be sent only when a block is full or a timer expires and at least some data is waiting, but that is moot at this point. For what it's worth, the BPF packet capture mechanism (which really needs a new name, to distinguish itself from the BPF packet filter language and its implementation(s), but I digress) has the same issue - when the timer expires, a wakeup is delivered even if there are no packets to read. *However*, if there are no packets available, the buffers aren't rotated, so the empty buffer is left around to be filled up with packets, rather than being made the hold buffer. Given that before the previous TPACKET_V3 change, wakeups were delivered when packets arrived rather than when a block was closed, presumably code using TPACKET_V3 was capable of dealing with wakeups being delivered when no new blocks had been made available to userland; could TPACKET_V3 work a bit more like BPF and deliver a wakeup when the timer expires *without* closing the empty block? Thank you all for your comments! I'll try to create two patches: 1. Wakeup by timeout without closing the empty block 2. Allow to not wakeup by timeout (the feature should be explicitly requested by a user) -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] af_packet: don't pass empty blocks for PACKET_V3
Don't close an empty block on timeout. Its meaningless to pass it to the user. Moreover, passing empty blocks wastes CPU & buffer space increasing probability of packets dropping on small timeouts. Side effect of this patch is indefinite user-space wait in poll on idle links. But, I believe its better to set timeout for poll(2) when needed than to get empty blocks every millisecond when not needed. Signed-off-by: Alexander Drozdov --- net/packet/af_packet.c | 10 +- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c index 9cfe2e1..9a2f70a 100644 --- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -698,6 +698,10 @@ static void prb_retire_rx_blk_timer_expired(unsigned long data) if (pkc->last_kactive_blk_num == pkc->kactive_blk_num) { if (!frozen) { + if (!BLOCK_NUM_PKTS(pbd)) { + /* An empty block. Just refresh the timer. */ + goto refresh_timer; + } prb_retire_current_block(pkc, po, TP_STATUS_BLK_TMO); if (!prb_dispatch_next_block(pkc, po)) goto refresh_timer; @@ -798,7 +802,11 @@ static void prb_close_block(struct tpacket_kbdq_core *pkc1, h1->ts_last_pkt.ts_sec = last_pkt->tp_sec; h1->ts_last_pkt.ts_nsec = last_pkt->tp_nsec; } else { - /* Ok, we tmo'd - so get the current time */ + /* Ok, we tmo'd - so get the current time. +* +* It shouldn't really happen as we don't close empty +* blocks. See prb_retire_rx_blk_timer_expired(). +*/ struct timespec ts; getnstimeofday(&ts); h1->ts_last_pkt.ts_sec = ts.tv_sec; -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/