Re: [Xen-devel] [PATCH v2 1/1] xen-netback: process malformed sk_buff correctly to avoid BUG_ON()

2018-03-28 Thread Dongli Zhang
Hi Eric,

On 03/29/2018 12:03 PM, Eric Dumazet wrote:
> 
> 
> On 03/28/2018 08:51 PM, Dongli Zhang wrote:
>> The "BUG_ON(!frag_iter)" in function xenvif_rx_next_chunk() is triggered if
>> the received sk_buff is malformed, that is, when the sk_buff has pattern
>> (skb->data_len && !skb_shinfo(skb)->nr_frags). Below is a sample call
>> stack:
>>
>> ...
> 
> 
>>
>> The issue is hit by xen-netback when there is bug with other networking
>> interface (e.g., dom0 physical NIC), who has generated and forwarded
>> malformed sk_buff to dom0 vifX.Y. It is possible to reproduce the issue on
>> purpose with below sample code in a kernel module:
>>
>> skb->dev = dev; // dev of vifX.Y
>> skb->len = 386;
>> skb->data_len = 352;
>> skb->tail = 98;
>> skb->end = 384;
>> skb_shinfo(skb)->nr_frags = 0;
>> dev->netdev_ops->ndo_start_xmit(skb, dev);
>>
> 
> This would be a serious bug in the provider of such skb.

/nods

> 
> Are you sure you do not have instead an skb with a chain of skbs ?
> 
> (skb_shinfo(skb)->frag_list would be not NULL)

I am sure the skb_shinfo(skb)->frag_list is NULL.

> 
> Maybe your driver is wrongly advertising NETIF_F_FRAGLIST
> 
> commit 2167ca029c244901831 would be the bug origin then...

Unlike the new linux version (whose BUG_ON() does not panic the server), the
BUG_ON() in prior old kernel version would panic xen dom0 server and then people
would always blame xen paravirtual driver.

Indeed, xen-netback did not process the malformed sk_buff appropriately on rx
path. The issue is not hit with old dom0 kernel, when I am running the debug
module (as shown in below link) to generate a malformed sk_buff on purpose.

https://lists.xenproject.org/archives/html/xen-devel/2018-03/msg03176.html

Dongli Zhang


Re: [Xen-devel] [PATCH v2 1/1] xen-netback: process malformed sk_buff correctly to avoid BUG_ON()

2018-03-28 Thread Dongli Zhang
Hi Eric,

On 03/29/2018 12:03 PM, Eric Dumazet wrote:
> 
> 
> On 03/28/2018 08:51 PM, Dongli Zhang wrote:
>> The "BUG_ON(!frag_iter)" in function xenvif_rx_next_chunk() is triggered if
>> the received sk_buff is malformed, that is, when the sk_buff has pattern
>> (skb->data_len && !skb_shinfo(skb)->nr_frags). Below is a sample call
>> stack:
>>
>> ...
> 
> 
>>
>> The issue is hit by xen-netback when there is bug with other networking
>> interface (e.g., dom0 physical NIC), who has generated and forwarded
>> malformed sk_buff to dom0 vifX.Y. It is possible to reproduce the issue on
>> purpose with below sample code in a kernel module:
>>
>> skb->dev = dev; // dev of vifX.Y
>> skb->len = 386;
>> skb->data_len = 352;
>> skb->tail = 98;
>> skb->end = 384;
>> skb_shinfo(skb)->nr_frags = 0;
>> dev->netdev_ops->ndo_start_xmit(skb, dev);
>>
> 
> This would be a serious bug in the provider of such skb.

/nods

> 
> Are you sure you do not have instead an skb with a chain of skbs ?
> 
> (skb_shinfo(skb)->frag_list would be not NULL)

I am sure the skb_shinfo(skb)->frag_list is NULL.

> 
> Maybe your driver is wrongly advertising NETIF_F_FRAGLIST
> 
> commit 2167ca029c244901831 would be the bug origin then...

Unlike the new linux version (whose BUG_ON() does not panic the server), the
BUG_ON() in prior old kernel version would panic xen dom0 server and then people
would always blame xen paravirtual driver.

Indeed, xen-netback did not process the malformed sk_buff appropriately on rx
path. The issue is not hit with old dom0 kernel, when I am running the debug
module (as shown in below link) to generate a malformed sk_buff on purpose.

https://lists.xenproject.org/archives/html/xen-devel/2018-03/msg03176.html

Dongli Zhang


Re: [PATCH v2 1/1] xen-netback: process malformed sk_buff correctly to avoid BUG_ON()

2018-03-28 Thread Eric Dumazet


On 03/28/2018 08:51 PM, Dongli Zhang wrote:
> The "BUG_ON(!frag_iter)" in function xenvif_rx_next_chunk() is triggered if
> the received sk_buff is malformed, that is, when the sk_buff has pattern
> (skb->data_len && !skb_shinfo(skb)->nr_frags). Below is a sample call
> stack:
> 
>...


> 
> The issue is hit by xen-netback when there is bug with other networking
> interface (e.g., dom0 physical NIC), who has generated and forwarded
> malformed sk_buff to dom0 vifX.Y. It is possible to reproduce the issue on
> purpose with below sample code in a kernel module:
> 
> skb->dev = dev; // dev of vifX.Y
> skb->len = 386;
> skb->data_len = 352;
> skb->tail = 98;
> skb->end = 384;
> skb_shinfo(skb)->nr_frags = 0;
> dev->netdev_ops->ndo_start_xmit(skb, dev);
>

This would be a serious bug in the provider of such skb.

Are you sure you do not have instead an skb with a chain of skbs ?

(skb_shinfo(skb)->frag_list would be not NULL)

Maybe your driver is wrongly advertising NETIF_F_FRAGLIST

commit 2167ca029c244901831 would be the bug origin then...





Re: [PATCH v2 1/1] xen-netback: process malformed sk_buff correctly to avoid BUG_ON()

2018-03-28 Thread Eric Dumazet


On 03/28/2018 08:51 PM, Dongli Zhang wrote:
> The "BUG_ON(!frag_iter)" in function xenvif_rx_next_chunk() is triggered if
> the received sk_buff is malformed, that is, when the sk_buff has pattern
> (skb->data_len && !skb_shinfo(skb)->nr_frags). Below is a sample call
> stack:
> 
>...


> 
> The issue is hit by xen-netback when there is bug with other networking
> interface (e.g., dom0 physical NIC), who has generated and forwarded
> malformed sk_buff to dom0 vifX.Y. It is possible to reproduce the issue on
> purpose with below sample code in a kernel module:
> 
> skb->dev = dev; // dev of vifX.Y
> skb->len = 386;
> skb->data_len = 352;
> skb->tail = 98;
> skb->end = 384;
> skb_shinfo(skb)->nr_frags = 0;
> dev->netdev_ops->ndo_start_xmit(skb, dev);
>

This would be a serious bug in the provider of such skb.

Are you sure you do not have instead an skb with a chain of skbs ?

(skb_shinfo(skb)->frag_list would be not NULL)

Maybe your driver is wrongly advertising NETIF_F_FRAGLIST

commit 2167ca029c244901831 would be the bug origin then...





[PATCH v2 1/1] xen-netback: process malformed sk_buff correctly to avoid BUG_ON()

2018-03-28 Thread Dongli Zhang
The "BUG_ON(!frag_iter)" in function xenvif_rx_next_chunk() is triggered if
the received sk_buff is malformed, that is, when the sk_buff has pattern
(skb->data_len && !skb_shinfo(skb)->nr_frags). Below is a sample call
stack:

[  438.652658] [ cut here ]
[  438.652660] kernel BUG at drivers/net/xen-netback/rx.c:325!
[  438.652714] invalid opcode:  [#1] SMP NOPTI
[  438.652813] CPU: 0 PID: 2492 Comm: vif1.0-q0-guest Tainted: G   O
 4.16.0-rc6+ #1
[  438.652896] RIP: e030:xenvif_rx_skb+0x3c2/0x5e0 [xen_netback]
[  438.652926] RSP: e02b:c90040877dc8 EFLAGS: 00010246
[  438.652956] RAX: 0160 RBX: 0022 RCX: 0001
[  438.652993] RDX: c900402890d0 RSI:  RDI: c90040889000
[  438.653029] RBP: 88002b460040 R08: c90040877de0 R09: 0100
[  438.653065] R10: 7ff0 R11: 0002 R12: c90040889000
[  438.653100] R13: 8000 R14: 0022 R15: 8000
[  438.653149] FS:  7f15603778c0() GS:88003040() 
knlGS:
[  438.653188] CS:  e033 DS:  ES:  CR0: 80050033
[  438.653219] CR2: 01832a08 CR3: 29c12000 CR4: 00042660
[  438.653262] Call Trace:
[  438.653284]  ? xen_hypercall_event_channel_op+0xa/0x20
[  438.653313]  xenvif_rx_action+0x41/0x80 [xen_netback]
[  438.653341]  xenvif_kthread_guest_rx+0xb2/0x2a8 [xen_netback]
[  438.653374]  ? __schedule+0x352/0x700
[  438.653398]  ? wait_woken+0x80/0x80
[  438.653421]  kthread+0xf3/0x130
[  438.653442]  ? xenvif_rx_action+0x80/0x80 [xen_netback]
[  438.653470]  ? kthread_destroy_worker+0x40/0x40
[  438.653497]  ret_from_fork+0x35/0x40

The issue is hit by xen-netback when there is bug with other networking
interface (e.g., dom0 physical NIC), who has generated and forwarded
malformed sk_buff to dom0 vifX.Y. It is possible to reproduce the issue on
purpose with below sample code in a kernel module:

skb->dev = dev; // dev of vifX.Y
skb->len = 386;
skb->data_len = 352;
skb->tail = 98;
skb->end = 384;
skb_shinfo(skb)->nr_frags = 0;
dev->netdev_ops->ndo_start_xmit(skb, dev);

This patch stops processing sk_buff immediately if it is detected as
malformed, that is, pkt->frag_iter is NULL but there is still remaining
pkt->remaining_len.

Signed-off-by: Dongli Zhang 

---
Changed since v1:
  * return XEN_NETIF_RSP_ERROR in response to netfront

 drivers/net/xen-netback/rx.c | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/net/xen-netback/rx.c b/drivers/net/xen-netback/rx.c
index b1cf7c6..a3d8ee9 100644
--- a/drivers/net/xen-netback/rx.c
+++ b/drivers/net/xen-netback/rx.c
@@ -358,6 +358,7 @@ static void xenvif_rx_data_slot(struct xenvif_queue *queue,
 {
unsigned int offset = 0;
unsigned int flags;
+   bool err = false;
 
do {
size_t len;
@@ -369,6 +370,15 @@ static void xenvif_rx_data_slot(struct xenvif_queue *queue,
offset += len;
pkt->remaining_len -= len;
 
+   if (unlikely(!pkt->frag_iter && pkt->remaining_len)) {
+   pkt->remaining_len = 0;
+   pkt->extra_count = 0;
+   err = true;
+   pr_err_ratelimited("malformed sk_buff at %s\n",
+  queue->name);
+   break;
+   }
+
} while (offset < XEN_PAGE_SIZE && pkt->remaining_len > 0);
 
if (pkt->remaining_len > 0)
@@ -392,7 +402,7 @@ static void xenvif_rx_data_slot(struct xenvif_queue *queue,
rsp->offset = 0;
rsp->flags = flags;
rsp->id = req->id;
-   rsp->status = (s16)offset;
+   rsp->status = likely(!err) ? (s16)offset : XEN_NETIF_RSP_ERROR;
 }
 
 static void xenvif_rx_extra_slot(struct xenvif_queue *queue,
-- 
2.7.4



[PATCH v2 1/1] xen-netback: process malformed sk_buff correctly to avoid BUG_ON()

2018-03-28 Thread Dongli Zhang
The "BUG_ON(!frag_iter)" in function xenvif_rx_next_chunk() is triggered if
the received sk_buff is malformed, that is, when the sk_buff has pattern
(skb->data_len && !skb_shinfo(skb)->nr_frags). Below is a sample call
stack:

[  438.652658] [ cut here ]
[  438.652660] kernel BUG at drivers/net/xen-netback/rx.c:325!
[  438.652714] invalid opcode:  [#1] SMP NOPTI
[  438.652813] CPU: 0 PID: 2492 Comm: vif1.0-q0-guest Tainted: G   O
 4.16.0-rc6+ #1
[  438.652896] RIP: e030:xenvif_rx_skb+0x3c2/0x5e0 [xen_netback]
[  438.652926] RSP: e02b:c90040877dc8 EFLAGS: 00010246
[  438.652956] RAX: 0160 RBX: 0022 RCX: 0001
[  438.652993] RDX: c900402890d0 RSI:  RDI: c90040889000
[  438.653029] RBP: 88002b460040 R08: c90040877de0 R09: 0100
[  438.653065] R10: 7ff0 R11: 0002 R12: c90040889000
[  438.653100] R13: 8000 R14: 0022 R15: 8000
[  438.653149] FS:  7f15603778c0() GS:88003040() 
knlGS:
[  438.653188] CS:  e033 DS:  ES:  CR0: 80050033
[  438.653219] CR2: 01832a08 CR3: 29c12000 CR4: 00042660
[  438.653262] Call Trace:
[  438.653284]  ? xen_hypercall_event_channel_op+0xa/0x20
[  438.653313]  xenvif_rx_action+0x41/0x80 [xen_netback]
[  438.653341]  xenvif_kthread_guest_rx+0xb2/0x2a8 [xen_netback]
[  438.653374]  ? __schedule+0x352/0x700
[  438.653398]  ? wait_woken+0x80/0x80
[  438.653421]  kthread+0xf3/0x130
[  438.653442]  ? xenvif_rx_action+0x80/0x80 [xen_netback]
[  438.653470]  ? kthread_destroy_worker+0x40/0x40
[  438.653497]  ret_from_fork+0x35/0x40

The issue is hit by xen-netback when there is bug with other networking
interface (e.g., dom0 physical NIC), who has generated and forwarded
malformed sk_buff to dom0 vifX.Y. It is possible to reproduce the issue on
purpose with below sample code in a kernel module:

skb->dev = dev; // dev of vifX.Y
skb->len = 386;
skb->data_len = 352;
skb->tail = 98;
skb->end = 384;
skb_shinfo(skb)->nr_frags = 0;
dev->netdev_ops->ndo_start_xmit(skb, dev);

This patch stops processing sk_buff immediately if it is detected as
malformed, that is, pkt->frag_iter is NULL but there is still remaining
pkt->remaining_len.

Signed-off-by: Dongli Zhang 

---
Changed since v1:
  * return XEN_NETIF_RSP_ERROR in response to netfront

 drivers/net/xen-netback/rx.c | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/net/xen-netback/rx.c b/drivers/net/xen-netback/rx.c
index b1cf7c6..a3d8ee9 100644
--- a/drivers/net/xen-netback/rx.c
+++ b/drivers/net/xen-netback/rx.c
@@ -358,6 +358,7 @@ static void xenvif_rx_data_slot(struct xenvif_queue *queue,
 {
unsigned int offset = 0;
unsigned int flags;
+   bool err = false;
 
do {
size_t len;
@@ -369,6 +370,15 @@ static void xenvif_rx_data_slot(struct xenvif_queue *queue,
offset += len;
pkt->remaining_len -= len;
 
+   if (unlikely(!pkt->frag_iter && pkt->remaining_len)) {
+   pkt->remaining_len = 0;
+   pkt->extra_count = 0;
+   err = true;
+   pr_err_ratelimited("malformed sk_buff at %s\n",
+  queue->name);
+   break;
+   }
+
} while (offset < XEN_PAGE_SIZE && pkt->remaining_len > 0);
 
if (pkt->remaining_len > 0)
@@ -392,7 +402,7 @@ static void xenvif_rx_data_slot(struct xenvif_queue *queue,
rsp->offset = 0;
rsp->flags = flags;
rsp->id = req->id;
-   rsp->status = (s16)offset;
+   rsp->status = likely(!err) ? (s16)offset : XEN_NETIF_RSP_ERROR;
 }
 
 static void xenvif_rx_extra_slot(struct xenvif_queue *queue,
-- 
2.7.4