[PATCH] packet: tpacket_snd(): fix signed/unsigned comparison

2015-07-28 Thread Alexander Drozdov
tpacket_fill_skb() can return a negative value (-errno) which
is stored in tp_len variable. In that case the following
condition will be (but shouldn't be) true:

tp_len > dev->mtu + dev->hard_header_len

as dev->mtu and dev->hard_header_len are both unsigned.

That may lead to just returning an incorrect EMSGSIZE errno
to the user.

Signed-off-by: Alexander Drozdov 
---
 net/packet/af_packet.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index c9e8741..d1d3625 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -2403,7 +2403,8 @@ static int tpacket_snd(struct packet_sock *po, struct 
msghdr *msg)
}
tp_len = tpacket_fill_skb(po, skb, ph, dev, size_max, proto,
  addr, hlen);
-   if (tp_len > dev->mtu + dev->hard_header_len) {
+   if (likely(tp_len >= 0) &&
+   tp_len > dev->mtu + dev->hard_header_len) {
struct ethhdr *ehdr;
/* Earlier code assumed this would be a VLAN pkt,
 * double-check this now that we have the actual
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RESUBMIT Patch 1/1] net: replace if()/BUG with BUG_ON

2015-06-17 Thread Alexander Drozdov

On Wed, 17 Jun 2015 09:36:11 +0200, Frans Klaver wrote:

On Wed, Jun 17, 2015 at 6:36 AM, Maninder Singh  wrote:

Use BUG_ON(condition) instead of if(condition)/BUG() .

Signed-off-by: Maninder Singh 
Reviewed-by: Akhilesh Kumar 
---
  net/packet/af_packet.c |3 +--
  1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index b5989c6..c91d405 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -547,8 +547,7 @@ static void prb_setup_retire_blk_timer(struct packet_sock 
*po, int tx_ring)
  {
 struct tpacket_kbdq_core *pkc;

-   if (tx_ring)
-   BUG();
+   BUG_ON(tx_ring);

 pkc = tx_ring ? GET_PBDQC_FROM_RB(&po->tx_ring) :
 GET_PBDQC_FROM_RB(&po->rx_ring);


I don't get this. We're not allowed to be using tx_ring, but we can
and do handle it? Does that still warrant a BUG() or BUG_ON()? It's
been in since the function introduction[0]. Can somebody explain?


TPACKET_V3 doesn't support tx for now, so the function is actualy never 
called with non-zero tx_ring. I think there were plans to add the

support. But retire timer for tx will not be needed anyway as it is
up to user to fill the ring buffer.


Thanks,
Frans

[0] f6fb8f100b80 (af-packet: TPACKET_V3 flexible buffer implementation.)


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH V2 2/2] af_packet: pass checksum validation status to the user

2015-03-22 Thread Alexander Drozdov
Introduce TP_STATUS_CSUM_VALID tp_status flag to tell the
af_packet user that at least the transport header checksum
has been already validated.

For now, the flag may be set for incoming packets only.

Signed-off-by: Alexander Drozdov 
Cc: Willem de Bruijn 
---
 Documentation/networking/packet_mmap.txt | 13 ++---
 include/uapi/linux/if_packet.h   |  1 +
 net/packet/af_packet.c   |  9 +
 3 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/Documentation/networking/packet_mmap.txt 
b/Documentation/networking/packet_mmap.txt
index a6d7cb9..daa015a 100644
--- a/Documentation/networking/packet_mmap.txt
+++ b/Documentation/networking/packet_mmap.txt
@@ -440,9 +440,10 @@ and the following flags apply:
 +++ Capture process:
  from include/linux/if_packet.h
 
- #define TP_STATUS_COPY  2 
- #define TP_STATUS_LOSING4 
- #define TP_STATUS_CSUMNOTREADY  8 
+ #define TP_STATUS_COPY  (1 << 1)
+ #define TP_STATUS_LOSING(1 << 2)
+ #define TP_STATUS_CSUMNOTREADY  (1 << 3)
+ #define TP_STATUS_CSUM_VALID(1 << 7)
 
 TP_STATUS_COPY: This flag indicates that the frame (and associated
 meta information) has been truncated because it's 
@@ -466,6 +467,12 @@ TP_STATUS_CSUMNOTREADY: currently it's used for outgoing 
IP packets which
 reading the packet we should not try to check the 
 checksum. 
 
+TP_STATUS_CSUM_VALID  : This flag indicates that at least the transport
+header checksum of the packet has been already
+validated on the kernel side. If the flag is not set
+then we are free to check the checksum by ourselves
+provided that TP_STATUS_CSUMNOTREADY is also not set.
+
 for convenience there are also the following defines:
 
  #define TP_STATUS_KERNEL0
diff --git a/include/uapi/linux/if_packet.h b/include/uapi/linux/if_packet.h
index da2d668..053bd10 100644
--- a/include/uapi/linux/if_packet.h
+++ b/include/uapi/linux/if_packet.h
@@ -99,6 +99,7 @@ struct tpacket_auxdata {
 #define TP_STATUS_VLAN_VALID   (1 << 4) /* auxdata has valid 
tp_vlan_tci */
 #define TP_STATUS_BLK_TMO  (1 << 5)
 #define TP_STATUS_VLAN_TPID_VALID  (1 << 6) /* auxdata has valid 
tp_vlan_tpid */
+#define TP_STATUS_CSUM_VALID   (1 << 7)
 
 /* Tx ring - header status */
 #define TP_STATUS_AVAILABLE  0
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 6ecf8dd..3f09dda 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -1918,6 +1918,10 @@ static int tpacket_rcv(struct sk_buff *skb, struct 
net_device *dev,
 
if (skb->ip_summed == CHECKSUM_PARTIAL)
status |= TP_STATUS_CSUMNOTREADY;
+   else if (skb->pkt_type != PACKET_OUTGOING &&
+(skb->ip_summed == CHECKSUM_COMPLETE ||
+ skb_csum_unnecessary(skb)))
+   status |= TP_STATUS_CSUM_VALID;
 
if (snaplen > res)
snaplen = res;
@@ -3015,6 +3019,11 @@ static int packet_recvmsg(struct kiocb *iocb, struct 
socket *sock,
aux.tp_status = TP_STATUS_USER;
if (skb->ip_summed == CHECKSUM_PARTIAL)
aux.tp_status |= TP_STATUS_CSUMNOTREADY;
+   else if (skb->pkt_type != PACKET_OUTGOING &&
+(skb->ip_summed == CHECKSUM_COMPLETE ||
+ skb_csum_unnecessary(skb)))
+   aux.tp_status |= TP_STATUS_CSUM_VALID;
+
aux.tp_len = PACKET_SKB_CB(skb)->origlen;
aux.tp_snaplen = skb->len;
aux.tp_mac = 0;
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH V2 1/2] af_packet: make tpacket_rcv to not set status value before run_filter

2015-03-22 Thread Alexander Drozdov
It is just an optimization. We don't need the value of status variable
if the packet is filtered.

Signed-off-by: Alexander Drozdov 
---
 net/packet/af_packet.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index f8db706..6ecf8dd 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -1910,14 +1910,15 @@ static int tpacket_rcv(struct sk_buff *skb, struct 
net_device *dev,
}
}
 
-   if (skb->ip_summed == CHECKSUM_PARTIAL)
-   status |= TP_STATUS_CSUMNOTREADY;
-
snaplen = skb->len;
 
res = run_filter(skb, sk, snaplen);
if (!res)
goto drop_n_restore;
+
+   if (skb->ip_summed == CHECKSUM_PARTIAL)
+   status |= TP_STATUS_CSUMNOTREADY;
+
if (snaplen > res)
snaplen = res;
 
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] af_packet: pass checksum validation status to the user

2015-03-19 Thread Alexander Drozdov


On 19.03.2015 21:50:03 +0300 Willem de Bruijn wrote:

On Thu, Mar 19, 2015 at 2:08 PM, Alexander Drozdov  wrote:

On Thu, 19 Mar 2015 11:29:32 -0400, Willem de Bruijn 
wrote:


On Thu, Mar 19, 2015 at 4:01 AM, Alexander Drozdov 
wrote:


Introduce TP_STATUS_CSUM_VALID tp_status flag to tell the
af_packet user that at least the transport header checksum
has been already validated.


This changes the interface slightly. Processes should be treating
this previously unused bit as reserved and other flags have
been added to the bitmap in this manner as well, so this should
then be safe here, too.


For now, the flag may be set for incoming packets only.


Why?


I can't figure out how af_packet could get that the outgoing
packet's checksum has been validated. "Checksumming on
output" from skbuff.h tells that skb->ip_summed should
equal to CHECKSUM_NONE in that case,


CHECKSUM_UNNECESSARY is a valid flag on the outgoing
path according to that documentation. And, indeed, I also see
no checksum scrubbing on forwarding paths in practice (but I
may be wrong there, only took a quick glance).


but that is not true for me (in my tests, skb->ip_summed ==
CHECKSUM_UNNECESSARY for forwarded packets in some
cases).


When you disable hardware checksum offload and generate
packets locally, you do see the expected CHECKSUM_NONE
value?


Yes, I do, but see below.


You cannot change the semantics of the flag afterwards.



I think the semantics of the flag won't be changed if one set the flag
for outgoing packets. If the flag is not set (for any directions)
then that is not mean that the packet checksum is invalid. The user just can
then
checksum the packet by itself. So, the user may check the flag for any
packet
right now.


I see. So it is a hint. Okay. It would be nice if it behaves as
expected in as many cases as possible from the outset. This
would include PACKET_OUTGOING and CHECKSUM_NONE.


I've just done some testing, and I've found that packets generated by
'nping --badsum' (socket(PF_INET, SOCK_RAW, IPPROTO_RAW)) have CHECKSUM_NONE
when they are viewed by af_packet. I've used rather old Linux kernel
for the tests, but isn't that a reason to not set TP_STATUS_CSUM_VALID for
outgoing packets right now?


Please also note the flag and semantics in
Documentation/networking/packet_mmap.txt


I'll do it and I'll resend the patches with the note.





Better to support both directions from the start.


Signed-off-by: Alexander Drozdov 
---
   include/uapi/linux/if_packet.h | 1 +
   net/packet/af_packet.c | 9 +
   2 files changed, 10 insertions(+)

diff --git a/include/uapi/linux/if_packet.h
b/include/uapi/linux/if_packet.h
index da2d668..053bd10 100644
--- a/include/uapi/linux/if_packet.h
+++ b/include/uapi/linux/if_packet.h
@@ -99,6 +99,7 @@ struct tpacket_auxdata {
   #define TP_STATUS_VLAN_VALID   (1 << 4) /* auxdata has valid
tp_vlan_tci */
   #define TP_STATUS_BLK_TMO  (1 << 5)
   #define TP_STATUS_VLAN_TPID_VALID  (1 << 6) /* auxdata has valid
tp_vlan_tpid */
+#define TP_STATUS_CSUM_VALID   (1 << 7)

   /* Tx ring - header status */
   #define TP_STATUS_AVAILABLE  0
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 6ecf8dd..3f09dda 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -1918,6 +1918,10 @@ static int tpacket_rcv(struct sk_buff *skb, struct
net_device *dev,

  if (skb->ip_summed == CHECKSUM_PARTIAL)
  status |= TP_STATUS_CSUMNOTREADY;
+   else if (skb->pkt_type != PACKET_OUTGOING &&
+(skb->ip_summed == CHECKSUM_COMPLETE ||
+ skb_csum_unnecessary(skb)))
+   status |= TP_STATUS_CSUM_VALID;

  if (snaplen > res)
  snaplen = res;
@@ -3015,6 +3019,11 @@ static int packet_recvmsg(struct kiocb *iocb,
struct socket *sock,
  aux.tp_status = TP_STATUS_USER;
  if (skb->ip_summed == CHECKSUM_PARTIAL)
  aux.tp_status |= TP_STATUS_CSUMNOTREADY;
+   else if (skb->pkt_type != PACKET_OUTGOING &&
+(skb->ip_summed == CHECKSUM_COMPLETE ||
+ skb_csum_unnecessary(skb)))
+   aux.tp_status |= TP_STATUS_CSUM_VALID;
+


These two sections are near duplicates. I'd move the entire status
initialization, including existing TP_STATUS_USER and
TP_STATUS_CSUMNOTREADY fields to a helper function.

It's a bit unfortunately that we have to use an extra bit to add a signal
that is a near inverse of the existing TP_STATUS_CSUMNOTREADY
(bar CHECKSUM_NONE). I do not immediately see a better way,
either, though. And tp_status has plenty room at 32 bits.


  aux.tp_len = PACKET_SKB_CB(skb)->origlen;
  aux.tp_snaplen = skb->len;
 

Re: [PATCH 2/2] af_packet: pass checksum validation status to the user

2015-03-19 Thread Alexander Drozdov

On Thu, 19 Mar 2015 11:29:32 -0400, Willem de Bruijn  wrote:

On Thu, Mar 19, 2015 at 4:01 AM, Alexander Drozdov  wrote:

Introduce TP_STATUS_CSUM_VALID tp_status flag to tell the
af_packet user that at least the transport header checksum
has been already validated.

This changes the interface slightly. Processes should be treating
this previously unused bit as reserved and other flags have
been added to the bitmap in this manner as well, so this should
then be safe here, too.


For now, the flag may be set for incoming packets only.

Why?

I can't figure out how af_packet could get that the outgoing
packet's checksum has been validated. "Checksumming on
output" from skbuff.h tells that skb->ip_summed should
equal to CHECKSUM_NONE in that case, but that is not true
for me (in my tests, skb->ip_summed ==
CHECKSUM_UNNECESSARY for forwarded packets in some
cases).


You cannot change the semantics of the flag afterwards.


I think the semantics of the flag won't be changed if one set the flag
for outgoing packets. If the flag is not set (for any directions)
then that is not mean that the packet checksum is invalid. The user just can 
then
checksum the packet by itself. So, the user may check the flag for any packet
right now.


Better to support both directions from the start.


Signed-off-by: Alexander Drozdov 
---
  include/uapi/linux/if_packet.h | 1 +
  net/packet/af_packet.c | 9 +
  2 files changed, 10 insertions(+)

diff --git a/include/uapi/linux/if_packet.h b/include/uapi/linux/if_packet.h
index da2d668..053bd10 100644
--- a/include/uapi/linux/if_packet.h
+++ b/include/uapi/linux/if_packet.h
@@ -99,6 +99,7 @@ struct tpacket_auxdata {
  #define TP_STATUS_VLAN_VALID   (1 << 4) /* auxdata has valid 
tp_vlan_tci */
  #define TP_STATUS_BLK_TMO  (1 << 5)
  #define TP_STATUS_VLAN_TPID_VALID  (1 << 6) /* auxdata has valid 
tp_vlan_tpid */
+#define TP_STATUS_CSUM_VALID   (1 << 7)

  /* Tx ring - header status */
  #define TP_STATUS_AVAILABLE  0
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 6ecf8dd..3f09dda 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -1918,6 +1918,10 @@ static int tpacket_rcv(struct sk_buff *skb, struct 
net_device *dev,

 if (skb->ip_summed == CHECKSUM_PARTIAL)
 status |= TP_STATUS_CSUMNOTREADY;
+   else if (skb->pkt_type != PACKET_OUTGOING &&
+(skb->ip_summed == CHECKSUM_COMPLETE ||
+ skb_csum_unnecessary(skb)))
+   status |= TP_STATUS_CSUM_VALID;

 if (snaplen > res)
 snaplen = res;
@@ -3015,6 +3019,11 @@ static int packet_recvmsg(struct kiocb *iocb, struct 
socket *sock,
 aux.tp_status = TP_STATUS_USER;
 if (skb->ip_summed == CHECKSUM_PARTIAL)
 aux.tp_status |= TP_STATUS_CSUMNOTREADY;
+   else if (skb->pkt_type != PACKET_OUTGOING &&
+(skb->ip_summed == CHECKSUM_COMPLETE ||
+ skb_csum_unnecessary(skb)))
+   aux.tp_status |= TP_STATUS_CSUM_VALID;
+

These two sections are near duplicates. I'd move the entire status
initialization, including existing TP_STATUS_USER and
TP_STATUS_CSUMNOTREADY fields to a helper function.

It's a bit unfortunately that we have to use an extra bit to add a signal
that is a near inverse of the existing TP_STATUS_CSUMNOTREADY
(bar CHECKSUM_NONE). I do not immediately see a better way,
either, though. And tp_status has plenty room at 32 bits.


 aux.tp_len = PACKET_SKB_CB(skb)->origlen;
 aux.tp_snaplen = skb->len;
 aux.tp_mac = 0;
--
1.9.1




--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 2/2] af_packet: pass checksum validation status to the user

2015-03-19 Thread Alexander Drozdov
Introduce TP_STATUS_CSUM_VALID tp_status flag to tell the
af_packet user that at least the transport header checksum
has been already validated.

For now, the flag may be set for incoming packets only.

Signed-off-by: Alexander Drozdov 
---
 include/uapi/linux/if_packet.h | 1 +
 net/packet/af_packet.c | 9 +
 2 files changed, 10 insertions(+)

diff --git a/include/uapi/linux/if_packet.h b/include/uapi/linux/if_packet.h
index da2d668..053bd10 100644
--- a/include/uapi/linux/if_packet.h
+++ b/include/uapi/linux/if_packet.h
@@ -99,6 +99,7 @@ struct tpacket_auxdata {
 #define TP_STATUS_VLAN_VALID   (1 << 4) /* auxdata has valid 
tp_vlan_tci */
 #define TP_STATUS_BLK_TMO  (1 << 5)
 #define TP_STATUS_VLAN_TPID_VALID  (1 << 6) /* auxdata has valid 
tp_vlan_tpid */
+#define TP_STATUS_CSUM_VALID   (1 << 7)
 
 /* Tx ring - header status */
 #define TP_STATUS_AVAILABLE  0
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 6ecf8dd..3f09dda 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -1918,6 +1918,10 @@ static int tpacket_rcv(struct sk_buff *skb, struct 
net_device *dev,
 
if (skb->ip_summed == CHECKSUM_PARTIAL)
status |= TP_STATUS_CSUMNOTREADY;
+   else if (skb->pkt_type != PACKET_OUTGOING &&
+(skb->ip_summed == CHECKSUM_COMPLETE ||
+ skb_csum_unnecessary(skb)))
+   status |= TP_STATUS_CSUM_VALID;
 
if (snaplen > res)
snaplen = res;
@@ -3015,6 +3019,11 @@ static int packet_recvmsg(struct kiocb *iocb, struct 
socket *sock,
aux.tp_status = TP_STATUS_USER;
if (skb->ip_summed == CHECKSUM_PARTIAL)
aux.tp_status |= TP_STATUS_CSUMNOTREADY;
+   else if (skb->pkt_type != PACKET_OUTGOING &&
+(skb->ip_summed == CHECKSUM_COMPLETE ||
+ skb_csum_unnecessary(skb)))
+   aux.tp_status |= TP_STATUS_CSUM_VALID;
+
aux.tp_len = PACKET_SKB_CB(skb)->origlen;
aux.tp_snaplen = skb->len;
aux.tp_mac = 0;
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 1/2] af_packet: make tpacket_rcv to not set status value before run_filter

2015-03-19 Thread Alexander Drozdov
It is just an optimization. We don't need the value of status variable
if the packet is filtered.

Signed-off-by: Alexander Drozdov 
---
 net/packet/af_packet.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index f8db706..6ecf8dd 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -1910,14 +1910,15 @@ static int tpacket_rcv(struct sk_buff *skb, struct 
net_device *dev,
}
}
 
-   if (skb->ip_summed == CHECKSUM_PARTIAL)
-   status |= TP_STATUS_CSUMNOTREADY;
-
snaplen = skb->len;
 
res = run_filter(skb, sk, snaplen);
if (!res)
goto drop_n_restore;
+
+   if (skb->ip_summed == CHECKSUM_PARTIAL)
+   status |= TP_STATUS_CSUMNOTREADY;
+
if (snaplen > res)
snaplen = res;
 
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] netfilter: ipset: make ip_set_get_ip*_port to use skb_network_offset

2015-03-06 Thread Alexander Drozdov
All the ipset functions respect skb->network_header value,
except for ip_set_get_ip4_port() & ip_set_get_ip6_port(). The
functions should use skb_network_offset() to get the transport
header offset.

Signed-off-by: Alexander Drozdov 
---
 net/netfilter/ipset/ip_set_getport.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/netfilter/ipset/ip_set_getport.c 
b/net/netfilter/ipset/ip_set_getport.c
index 29fb01d..1981f02 100644
--- a/net/netfilter/ipset/ip_set_getport.c
+++ b/net/netfilter/ipset/ip_set_getport.c
@@ -98,7 +98,7 @@ ip_set_get_ip4_port(const struct sk_buff *skb, bool src,
__be16 *port, u8 *proto)
 {
const struct iphdr *iph = ip_hdr(skb);
-   unsigned int protooff = ip_hdrlen(skb);
+   unsigned int protooff = skb_network_offset(skb) + ip_hdrlen(skb);
int protocol = iph->protocol;
 
/* See comments at tcp_match in ip_tables.c */
@@ -135,7 +135,9 @@ ip_set_get_ip6_port(const struct sk_buff *skb, bool src,
__be16 frag_off = 0;
 
nexthdr = ipv6_hdr(skb)->nexthdr;
-   protoff = ipv6_skip_exthdr(skb, sizeof(struct ipv6hdr), &nexthdr,
+   protoff = ipv6_skip_exthdr(skb,
+  skb_network_offset(skb) +
+   sizeof(struct ipv6hdr), &nexthdr,
   &frag_off);
if (protoff < 0 || (frag_off & htons(~0x7)) != 0)
return false;
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] ipv4: ip_check_defrag should not assume that skb_network_offset is zero

2015-03-04 Thread Alexander Drozdov
ip_check_defrag() may be used by af_packet to defragment outgoing packets.
skb_network_offset() of af_packet's outgoing packets is not zero.

Signed-off-by: Alexander Drozdov 
---
 net/ipv4/ip_fragment.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
index 2c8d98e..145a50c 100644
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -659,27 +659,30 @@ EXPORT_SYMBOL(ip_defrag);
 struct sk_buff *ip_check_defrag(struct sk_buff *skb, u32 user)
 {
struct iphdr iph;
+   int netoff;
u32 len;
 
if (skb->protocol != htons(ETH_P_IP))
return skb;
 
-   if (skb_copy_bits(skb, 0, &iph, sizeof(iph)) < 0)
+   netoff = skb_network_offset(skb);
+
+   if (skb_copy_bits(skb, netoff, &iph, sizeof(iph)) < 0)
return skb;
 
if (iph.ihl < 5 || iph.version != 4)
return skb;
 
len = ntohs(iph.tot_len);
-   if (skb->len < len || len < (iph.ihl * 4))
+   if (skb->len < netoff + len || len < (iph.ihl * 4))
return skb;
 
if (ip_is_fragment(&iph)) {
skb = skb_share_check(skb, GFP_ATOMIC);
if (skb) {
-   if (!pskb_may_pull(skb, iph.ihl*4))
+   if (!pskb_may_pull(skb, netoff + iph.ihl * 4))
return skb;
-   if (pskb_trim_rcsum(skb, len))
+   if (pskb_trim_rcsum(skb, netoff + len))
return skb;
memset(IPCB(skb), 0, sizeof(struct inet_skb_parm));
if (ip_defrag(skb, user))
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[RESEND PATCH] af_packet: don't pass empty blocks for PACKET_V3

2015-02-23 Thread Alexander Drozdov
Before da413eec729d ("packet: Fixed TPACKET V3 to signal poll when block is
closed rather than every packet") poll listening for an af_packet socket was
not signaled if there was no packets to process. After the patch poll is
signaled evety time when block retire timer expires. That happens because
af_packet closes the current block on timeout even if the block is empty.

Passing empty blocks to the user not only wastes CPU but also wastes ring
buffer space increasing probability of packets dropping on small timeouts.

Signed-off-by: Alexander Drozdov 
Cc: Dan Collins 
Cc: Willem de Bruijn 
Cc: Guy Harris 
---
 net/packet/af_packet.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 9c28cec..1da46ef 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -698,6 +698,10 @@ static void prb_retire_rx_blk_timer_expired(unsigned long 
data)
 
if (pkc->last_kactive_blk_num == pkc->kactive_blk_num) {
if (!frozen) {
+   if (!BLOCK_NUM_PKTS(pbd)) {
+   /* An empty block. Just refresh the timer. */
+   goto refresh_timer;
+   }
prb_retire_current_block(pkc, po, TP_STATUS_BLK_TMO);
if (!prb_dispatch_next_block(pkc, po))
goto refresh_timer;
@@ -798,7 +802,11 @@ static void prb_close_block(struct tpacket_kbdq_core *pkc1,
h1->ts_last_pkt.ts_sec = last_pkt->tp_sec;
h1->ts_last_pkt.ts_nsec = last_pkt->tp_nsec;
} else {
-   /* Ok, we tmo'd - so get the current time */
+   /* Ok, we tmo'd - so get the current time.
+*
+* It shouldn't really happen as we don't close empty
+* blocks. See prb_retire_rx_blk_timer_expired().
+*/
struct timespec ts;
getnstimeofday(&ts);
h1->ts_last_pkt.ts_sec = ts.tv_sec;
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] af_packet: allow packets defragmentation not only for hash fanout type

2015-02-19 Thread Alexander Drozdov
Packets defragmentation was introduced for PACKET_FANOUT_HASH only,
see 7736d33f4262 ("packet: Add pre-defragmentation support for ipv4
fanouts")

It may be useful to have defragmentation enabled regardless of
fanout type. Without that, the AF_PACKET user may have to:
1. Collect fragments from different rings
2. Defragment by itself

Signed-off-by: Alexander Drozdov 
---
 net/packet/af_packet.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 9c28cec..99fc628 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -1349,14 +1349,14 @@ static int packet_rcv_fanout(struct sk_buff *skb, 
struct net_device *dev,
return 0;
}
 
+   if (fanout_has_flag(f, PACKET_FANOUT_FLAG_DEFRAG)) {
+   skb = ip_check_defrag(skb, IP_DEFRAG_AF_PACKET);
+   if (!skb)
+   return 0;
+   }
switch (f->type) {
case PACKET_FANOUT_HASH:
default:
-   if (fanout_has_flag(f, PACKET_FANOUT_FLAG_DEFRAG)) {
-   skb = ip_check_defrag(skb, IP_DEFRAG_AF_PACKET);
-   if (!skb)
-   return 0;
-   }
idx = fanout_demux_hash(f, skb, num);
break;
case PACKET_FANOUT_LB:
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[RESEND PATCH] ipv4: ip_check_defrag should correctly check return value of skb_copy_bits

2015-02-19 Thread Alexander Drozdov
skb_copy_bits() returns zero on success and a negative value on error,
so it is needed to invert the condition in ip_check_defrag().

Fixes: 1bf3751ec90c ("ipv4: ip_check_defrag must not modify skb before 
unsharing")
Signed-off-by: Alexander Drozdov 
Acked-by: Eric Dumazet 
Cc: Johannes Berg 
---
 net/ipv4/ip_fragment.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
index e5b6d0d..2c8d98e 100644
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -664,7 +664,7 @@ struct sk_buff *ip_check_defrag(struct sk_buff *skb, u32 
user)
if (skb->protocol != htons(ETH_P_IP))
return skb;
 
-   if (!skb_copy_bits(skb, 0, &iph, sizeof(iph)))
+   if (skb_copy_bits(skb, 0, &iph, sizeof(iph)) < 0)
return skb;
 
if (iph.ihl < 5 || iph.version != 4)
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] ipv4: ip_check_defrag should correctly check return value of skb_copy_bits

2015-02-17 Thread Alexander Drozdov
skb_copy_bits() returns zero on success and negative value on error,
so it is needed to invert the condition in ip_check_defrag().

Fixes: 1bf3751ec90c ("ipv4: ip_check_defrag must not modify skb before 
unsharing")
Signed-off-by: Alexander Drozdov 
---
 net/ipv4/ip_fragment.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
index e5b6d0d..2c8d98e 100644
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -664,7 +664,7 @@ struct sk_buff *ip_check_defrag(struct sk_buff *skb, u32 
user)
if (skb->protocol != htons(ETH_P_IP))
return skb;
 
-   if (!skb_copy_bits(skb, 0, &iph, sizeof(iph)))
+   if (skb_copy_bits(skb, 0, &iph, sizeof(iph)) < 0)
return skb;
 
if (iph.ihl < 5 || iph.version != 4)
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] af_packet: don't pass empty blocks for PACKET_V3

2015-02-05 Thread Alexander Drozdov

On 05.02.2015 23:01:38 +0300 Willem de Bruijn wrote:

On Wed, Feb 4, 2015 at 9:58 PM, Alexander Drozdov  wrote:

Don't close an empty block on timeout. Its meaningless to
pass it to the user. Moreover, passing empty blocks wastes
CPU & buffer space increasing probability of packets
dropping on small timeouts.

Side effect of this patch is indefinite user-space wait
in poll on idle links. But, I believe its better to set
timeout for poll(2) when needed than to get empty blocks
every millisecond when not needed.

This change would break existing applications that have come
to depend on the periodic signal.

I don't disagree with the argument that the data ready signal
should be sent only when a block is full or a timer expires and
at least some data is waiting, but that is moot at this point.

I missed something. As pointed by Guy Harris ,
before the previous patch periodic signal was not delivered. The previous patch
(da413eec729dae5dc by Dan Collins ) is for 3.19 kernel 
only.
Should we care about existing 3.19-only applications?



Signed-off-by: Alexander Drozdov 
---
  net/packet/af_packet.c | 10 +-
  1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 9cfe2e1..9a2f70a 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -698,6 +698,10 @@ static void prb_retire_rx_blk_timer_expired(unsigned long 
data)

 if (pkc->last_kactive_blk_num == pkc->kactive_blk_num) {
 if (!frozen) {
+   if (!BLOCK_NUM_PKTS(pbd)) {
+   /* An empty block. Just refresh the timer. */
+   goto refresh_timer;
+   }
 prb_retire_current_block(pkc, po, TP_STATUS_BLK_TMO);
 if (!prb_dispatch_next_block(pkc, po))
 goto refresh_timer;
@@ -798,7 +802,11 @@ static void prb_close_block(struct tpacket_kbdq_core *pkc1,
 h1->ts_last_pkt.ts_sec = last_pkt->tp_sec;
 h1->ts_last_pkt.ts_nsec = last_pkt->tp_nsec;
 } else {
-   /* Ok, we tmo'd - so get the current time */
+   /* Ok, we tmo'd - so get the current time.
+*
+* It shouldn't really happen as we don't close empty
+* blocks. See prb_retire_rx_blk_timer_expired().
+*/
 struct timespec ts;
 getnstimeofday(&ts);
 h1->ts_last_pkt.ts_sec = ts.tv_sec;
--
1.9.1



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] af_packet: don't pass empty blocks for PACKET_V3

2015-02-05 Thread Alexander Drozdov

On 06.02.2015 00:16:30 +0300 Guy Harris  wrote:

On Feb 5, 2015, at 12:01 PM, Willem de Bruijn  wrote:


On Wed, Feb 4, 2015 at 9:58 PM, Alexander Drozdov  wrote:

Don't close an empty block on timeout. Its meaningless to
pass it to the user. Moreover, passing empty blocks wastes
CPU & buffer space increasing probability of packets
dropping on small timeouts.

Side effect of this patch is indefinite user-space wait
in poll on idle links. But, I believe its better to set
timeout for poll(2) when needed than to get empty blocks
every millisecond when not needed.

This change would break existing applications that have come
to depend on the periodic signal.

I don't disagree with the argument that the data ready signal
should be sent only when a block is full or a timer expires and
at least some data is waiting, but that is moot at this point.

For what it's worth, the BPF packet capture mechanism (which really needs a new 
name, to distinguish itself from the BPF packet filter language and its 
implementation(s), but I digress) has the same issue - when the timer expires, 
a wakeup is delivered even if there are no packets to read.

*However*, if there are no packets available, the buffers aren't rotated, so 
the empty buffer is left around to be filled up with packets, rather than being 
made the hold buffer.

Given that before the previous TPACKET_V3 change, wakeups were delivered when 
packets arrived rather than when a block was closed, presumably code using 
TPACKET_V3 was capable of dealing with wakeups being delivered when no new 
blocks had been made available to userland; could TPACKET_V3 work a bit more 
like BPF and deliver a wakeup when the timer expires *without* closing the 
empty block?

Thank you all for your comments! I'll try to create two patches:
1. Wakeup by timeout without closing the empty block
2. Allow to not wakeup by timeout (the feature should be explicitly requested 
by a user)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] af_packet: don't pass empty blocks for PACKET_V3

2015-02-04 Thread Alexander Drozdov
Don't close an empty block on timeout. Its meaningless to
pass it to the user. Moreover, passing empty blocks wastes
CPU & buffer space increasing probability of packets
dropping on small timeouts.

Side effect of this patch is indefinite user-space wait
in poll on idle links. But, I believe its better to set
timeout for poll(2) when needed than to get empty blocks
every millisecond when not needed.

Signed-off-by: Alexander Drozdov 
---
 net/packet/af_packet.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 9cfe2e1..9a2f70a 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -698,6 +698,10 @@ static void prb_retire_rx_blk_timer_expired(unsigned long 
data)
 
if (pkc->last_kactive_blk_num == pkc->kactive_blk_num) {
if (!frozen) {
+   if (!BLOCK_NUM_PKTS(pbd)) {
+   /* An empty block. Just refresh the timer. */
+   goto refresh_timer;
+   }
prb_retire_current_block(pkc, po, TP_STATUS_BLK_TMO);
if (!prb_dispatch_next_block(pkc, po))
goto refresh_timer;
@@ -798,7 +802,11 @@ static void prb_close_block(struct tpacket_kbdq_core *pkc1,
h1->ts_last_pkt.ts_sec = last_pkt->tp_sec;
h1->ts_last_pkt.ts_nsec = last_pkt->tp_nsec;
} else {
-   /* Ok, we tmo'd - so get the current time */
+   /* Ok, we tmo'd - so get the current time.
+*
+* It shouldn't really happen as we don't close empty
+* blocks. See prb_retire_rx_blk_timer_expired().
+*/
struct timespec ts;
getnstimeofday(&ts);
h1->ts_last_pkt.ts_sec = ts.tv_sec;
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/