Re: Fwd: Re: Kernel panic with 4.16-rc1 (and 4.16-rc2) running selftest

2018-02-26 Thread Khalid Aziz

On 02/23/2018 06:15 PM, Matthew Wilcox wrote:
> On Fri, Feb 23, 2018 Randy Dunlap wrote:
>> [add Matthew Wilcox; hopefully he can look/see]
>
> Thanks, Randy.  I don't understand why nobody else thought to cc the
> author of the patch that it was bisected to ...

Sorry, Willy. That was my fault. I should have cc'd you to begin with.

> Please try this patch.  It fixes ffe0, but there may be more things
> tested that it may not work for.

This patch fixes the problem. I do not see kernel panics with this patch
any more.

--
Khalid

> Chris Mi, what happened to that set of testcases you promised to write
> for me?
>
> diff --git a/lib/idr.c b/lib/idr.c
> index c98d77fcf393..10d9b8d47c33 100644
> --- a/lib/idr.c
> +++ b/lib/idr.c
> @@ -36,8 +36,8 @@ int idr_alloc_u32(struct idr *idr, void *ptr, u32 *nextid,
>  {
>     struct radix_tree_iter iter;
>     void __rcu **slot;
> -   int base = idr->idr_base;
> -   int id = *nextid;
> +   unsigned int base = idr->idr_base;
> +   unsigned int id = *nextid;
>
>     if (WARN_ON_ONCE(radix_tree_is_internal_node(ptr)))
>             return -EINVAL;





Re: Fwd: Re: Kernel panic with 4.16-rc1 (and 4.16-rc2) running selftest

2018-02-26 Thread Chris Mi

Hi Matthew,

Sorry for the late response. I'll add the idr test cases for the new 
APIs ASAP.


Thanks,
Chris

On 2/24/2018 10:46 AM, Matthew Wilcox wrote:
> On Sat, Feb 24, 2018 at 01:49:35AM +, Chris Mi wrote:
>> To verify this patch, the following is a sanity test case:
>>
>> # tc qdisc delete dev $link ingress > /dev/null 2>&1;
>> # tc qdisc add dev $link ingress;
>> # tc filter add dev $link prio 1 protocol ip handle 0x8001 parent :
>> flower skip_hw src_mac e4:11:0:0:0:2 dst_mac e4:12:0:0:0:2 action drop;
>> # tc filter show dev $link parent :
>>
>> filter pref 1 flower chain 0
>> filter pref 1 flower chain 0 handle 0x8001
>
> I added these tests to my local tree for now.
>
> diff --git a/tools/testing/radix-tree/idr-test.c b/tools/testing/radix-tree/idr-test.c
> index 44ef9eba5a7a..28d99325a32d 100644
> --- a/tools/testing/radix-tree/idr-test.c
> +++ b/tools/testing/radix-tree/idr-test.c
> @@ -178,6 +178,29 @@ void idr_get_next_test(int base)
>     idr_destroy(&idr);
>  }
>
> +void idr_u32_test(struct idr *idr, int base)
> +{
> +   assert(idr_is_empty(idr));
> +   idr_init_base(idr, base);
> +   u32 handle = 10;
> +   idr_alloc_u32(idr, NULL, &handle, handle, GFP_KERNEL);
> +   BUG_ON(handle != 10);
> +   idr_remove(idr, handle);
> +   assert(idr_is_empty(idr));
> +
> +   handle = 0x8001;
> +   idr_alloc_u32(idr, NULL, &handle, handle, GFP_KERNEL);
> +   BUG_ON(handle != 0x8001);
> +   idr_remove(idr, handle);
> +   assert(idr_is_empty(idr));
> +
> +   handle = 0xffe0;
> +   idr_alloc_u32(idr, NULL, &handle, handle, GFP_KERNEL);
> +   BUG_ON(handle != 0xffe0);
> +   idr_remove(idr, handle);
> +   assert(idr_is_empty(idr));
> +}
> +
>  void idr_checks(void)
>  {
>     unsigned long i;
> @@ -248,6 +271,9 @@ void idr_checks(void)
>     idr_get_next_test(0);
>     idr_get_next_test(1);
>     idr_get_next_test(4);
> +   idr_u32_test(&idr, 0);
> +   idr_u32_test(&idr, 1);
> +   idr_u32_test(&idr, 4);
>  }
>
>  /*




Re: Fwd: Re: Kernel panic with 4.16-rc1 (and 4.16-rc2) running selftest

2018-02-23 Thread Matthew Wilcox
On Sat, Feb 24, 2018 at 01:49:35AM +, Chris Mi wrote:
> To verify this patch, the following is a sanity test case:
> 
> # tc qdisc delete dev $link ingress > /dev/null 2>&1;
> # tc qdisc add dev $link ingress;
> # tc filter add dev $link prio 1 protocol ip handle 0x8001 parent : 
> flower skip_hw src_mac e4:11:0:0:0:2 dst_mac e4:12:0:0:0:2 action drop;
> # tc filter show dev $link parent :
> 
> filter pref 1 flower chain 0
> filter pref 1 flower chain 0 handle 0x8001

I added these tests to my local tree for now.

diff --git a/tools/testing/radix-tree/idr-test.c 
b/tools/testing/radix-tree/idr-test.c
index 44ef9eba5a7a..28d99325a32d 100644
--- a/tools/testing/radix-tree/idr-test.c
+++ b/tools/testing/radix-tree/idr-test.c
@@ -178,6 +178,29 @@ void idr_get_next_test(int base)
    idr_destroy(&idr);
 }
 
+void idr_u32_test(struct idr *idr, int base)
+{
+   assert(idr_is_empty(idr));
+   idr_init_base(idr, base);
+   u32 handle = 10;
+   idr_alloc_u32(idr, NULL, &handle, handle, GFP_KERNEL);
+   BUG_ON(handle != 10);
+   idr_remove(idr, handle);
+   assert(idr_is_empty(idr));
+
+   handle = 0x8001;
+   idr_alloc_u32(idr, NULL, &handle, handle, GFP_KERNEL);
+   BUG_ON(handle != 0x8001);
+   idr_remove(idr, handle);
+   assert(idr_is_empty(idr));
+
+   handle = 0xffe0;
+   idr_alloc_u32(idr, NULL, &handle, handle, GFP_KERNEL);
+   BUG_ON(handle != 0xffe0);
+   idr_remove(idr, handle);
+   assert(idr_is_empty(idr));
+}
+
 void idr_checks(void)
 {
unsigned long i;
@@ -248,6 +271,9 @@ void idr_checks(void)
idr_get_next_test(0);
idr_get_next_test(1);
idr_get_next_test(4);
+   idr_u32_test(&idr, 0);
+   idr_u32_test(&idr, 1);
+   idr_u32_test(&idr, 4);
 }
 
 /*


RE: Fwd: Re: Kernel panic with 4.16-rc1 (and 4.16-rc2) running selftest

2018-02-23 Thread Chris Mi
> -----Original Message-----
> From: Matthew Wilcox [mailto:wi...@infradead.org]
> Sent: Saturday, February 24, 2018 9:15 AM
> To: Cong Wang <xiyou.wangc...@gmail.com>; Khalid Aziz
> <khalid.a...@oracle.com>; linux-ker...@vger.kernel.org;
> netdev@vger.kernel.org
> Cc: Chris Mi <chr...@mellanox.com>
> Subject: Re: Fwd: Re: Kernel panic with 4.16-rc1 (and 4.16-rc2) running
> selftest
> 
> On Fri, Feb 23, 2018 Randy Dunlap wrote:
> > [add Matthew Wilcox; hopefully he can look/see]
> 
> Thanks, Randy.  I don't understand why nobody else thought to cc the author
> of the patch that it was bisected to ...
> 
> > On 02/23/2018 04:13 PM, Cong Wang wrote:
> > > On Fri, Feb 23, 2018 at 3:27 PM, Cong Wang
> > > <xiyou.wangc...@gmail.com>
> > wrote:
> > >> On Fri, Feb 23, 2018 at 11:00 AM, Randy Dunlap
> > >> <rdun...@infradead.org>
> > wrote:
> > >>> On 02/23/2018 08:05 AM, Khalid Aziz wrote:
> > >>>> Same selftest does not cause panic on 4.15. git bisect pointed to
> > commit 6ce711f2750031d12cec91384ac5cfa0a485b60a ("idr: Make 1-based
> > IDRs more efficient").
> > >>>> Kernel config is attached.
> > >>
> > >> Looks like something horribly wrong with u32 key id idr...
> > >
> > > Adding a few printk's, I got:
> > >
> > > [   31.231560] requested handle = ffe0
> > > [   31.232426] allocated handle = 0
> > > ...
> > > [   31.246475] requested handle = ffd0
> > > [   31.247555] allocated handle = 1
> > >
> > >
> > > So the bug is here where we can't allocate a specific handle:
> > >
> > > err = idr_alloc_u32(&tp_c->handle_idr, ht, &handle,
> > > handle, GFP_KERNEL);
> > > if (err) {
> > > kfree(ht);
> > > return err;
> > > }
> 
> Please try this patch.  It fixes ffe0, but there may be more things tested
> that it may not work for.
> 
> Chris Mi, what happened to that set of testcases you promised to write for
> me?
I promised to write it after the API is stabilized since you were going to 
change it.
I will inform the management about this new task and get back to you later.
> 
> diff --git a/lib/idr.c b/lib/idr.c
> index c98d77fcf393..10d9b8d47c33 100644
> --- a/lib/idr.c
> +++ b/lib/idr.c
> @@ -36,8 +36,8 @@ int idr_alloc_u32(struct idr *idr, void *ptr, u32 *nextid,  
> {
>   struct radix_tree_iter iter;
>   void __rcu **slot;
> - int base = idr->idr_base;
> - int id = *nextid;
> + unsigned int base = idr->idr_base;
> + unsigned int id = *nextid;
> 
>   if (WARN_ON_ONCE(radix_tree_is_internal_node(ptr)))
>   return -EINVAL;
To verify this patch, the following is a sanity test case:

# tc qdisc delete dev $link ingress > /dev/null 2>&1;
# tc qdisc add dev $link ingress;
# tc filter add dev $link prio 1 protocol ip handle 0x8001 parent : 
flower skip_hw src_mac e4:11:0:0:0:2 dst_mac e4:12:0:0:0:2 action drop;
# tc filter show dev $link parent :

filter pref 1 flower chain 0
filter pref 1 flower chain 0 handle 0x8001
  dst_mac e4:12:00:00:00:02
  src_mac e4:11:00:00:00:02
  eth_type ipv4
  skip_hw
  not_in_hw
action order 1: gact action drop
 random type none pass val 0
 index 1 ref 1 bind 1

Please make sure the handle is the same as the user specifies.
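
The check requested above (the handle shown by "tc filter show" must equal the handle the user passed to "tc filter add") could be automated roughly as follows. This is only a sketch: parse_tc_handle() and handle_matches() are hypothetical helper names, not part of tc or the kernel, and the parsing assumes the "handle 0x..." token appears on the line exactly as in the sample output.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: pull the hexadecimal value that follows the
 * "handle " token out of one line of "tc filter show" output.
 * Returns false if the line carries no parsable handle.
 */
static bool parse_tc_handle(const char *line, uint32_t *out)
{
	static const char key[] = "handle ";
	const char *p;
	char *end;
	unsigned long v;

	assert(line != NULL && out != NULL);

	p = strstr(line, key);
	if (!p)
		return false;
	p += sizeof(key) - 1;

	/* Base 16 also accepts an optional "0x" prefix. */
	v = strtoul(p, &end, 16);
	if (end == p || v > UINT32_MAX)
		return false;

	*out = (uint32_t)v;
	return true;
}

/* True only if the line shows exactly the handle that was requested. */
static bool handle_matches(const char *line, uint32_t requested)
{
	uint32_t shown;

	return parse_tc_handle(line, &shown) && shown == requested;
}
```

Feeding it the second line of the output above with the requested value 0x8001 should report a match; a line without a "handle" token, or with a different value, should not.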


Re: Fwd: Re: Kernel panic with 4.16-rc1 (and 4.16-rc2) running selftest

2018-02-23 Thread Matthew Wilcox
On Fri, Feb 23, 2018 Randy Dunlap wrote:
> [add Matthew Wilcox; hopefully he can look/see]

Thanks, Randy.  I don't understand why nobody else thought to cc the
author of the patch that it was bisected to ...

> On 02/23/2018 04:13 PM, Cong Wang wrote:
> > On Fri, Feb 23, 2018 at 3:27 PM, Cong Wang 
> wrote:
> >> On Fri, Feb 23, 2018 at 11:00 AM, Randy Dunlap 
> wrote:
> >>> On 02/23/2018 08:05 AM, Khalid Aziz wrote:
> >>>> Same selftest does not cause panic on 4.15. git bisect pointed to
> commit 6ce711f2750031d12cec91384ac5cfa0a485b60a ("idr: Make 1-based IDRs
> more efficient").
> >>>> Kernel config is attached.
> >>
> >> Looks like something horribly wrong with u32 key id idr...
> >
> > Adding a few printk's, I got:
> >
> > [   31.231560] requested handle = ffe0
> > [   31.232426] allocated handle = 0
> > ...
> > [   31.246475] requested handle = ffd0
> > [   31.247555] allocated handle = 1
> >
> >
> > So the bug is here where we can't allocate a specific handle:
> >
> > err = idr_alloc_u32(&tp_c->handle_idr, ht, &handle,
> > handle, GFP_KERNEL);
> > if (err) {
> > kfree(ht);
> > return err;
> > }

Please try this patch.  It fixes ffe0, but there may be more things
tested that it may not work for.

Chris Mi, what happened to that set of testcases you promised to write
for me?

diff --git a/lib/idr.c b/lib/idr.c
index c98d77fcf393..10d9b8d47c33 100644
--- a/lib/idr.c
+++ b/lib/idr.c
@@ -36,8 +36,8 @@ int idr_alloc_u32(struct idr *idr, void *ptr, u32 *nextid,
 {
struct radix_tree_iter iter;
void __rcu **slot;
-   int base = idr->idr_base;
-   int id = *nextid;
+   unsigned int base = idr->idr_base;
+   unsigned int id = *nextid;
 
if (WARN_ON_ONCE(radix_tree_is_internal_node(ptr)))
return -EINVAL;
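
For readers following the patch, the userspace sketch below illustrates how a signed local can turn a large requested ID into a start index of 0, which would match the "requested handle = ffe0 / allocated handle = 0" log quoted above. It is an illustration under assumptions, not the kernel's code: the clamp expression, the helper names start_index_signed() and start_index_unsigned(), and the sample value 0xffe00000 (chosen as an ID above INT_MAX) are all assumed for the example, and it relies on a 32-bit int with the wrapping conversion that gcc and clang implement.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Pre-patch shape (assumed): the requested ID and the IDR base are held
 * in signed ints, so any ID above INT_MAX is seen as negative.
 */
static uint32_t start_index_signed(uint32_t nextid, uint32_t idr_base)
{
	int base = (int)idr_base;
	int id = (int)nextid;	/* implementation-defined above INT_MAX */

	/* Assumed clamp: IDs below the base start the search at index 0. */
	id = (id < base) ? 0 : id - base;
	return (uint32_t)id;
}

/* Post-patch shape: same arithmetic with unsigned locals. */
static uint32_t start_index_unsigned(uint32_t nextid, uint32_t idr_base)
{
	unsigned int base = idr_base;
	unsigned int id = nextid;

	id = (id < base) ? 0 : id - base;
	return id;
}

/* Small self-check contrasting the two variants. */
static void check_start_index_examples(void)
{
	/* Small IDs behave the same either way. */
	assert(start_index_signed(10, 0) == 10);
	assert(start_index_unsigned(10, 0) == 10);

	/* An ID above INT_MAX collapses to 0 with signed locals... */
	assert(start_index_signed(0xffe00000u, 0) == 0);
	/* ...and survives intact with unsigned locals. */
	assert(start_index_unsigned(0xffe00000u, 0) == 0xffe00000u);
}
```

With signed locals the search would begin at index 0 and hand back the first free slot, rather than the ID the caller asked for.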


Re: Kernel panic with 4.16-rc1 (and 4.16-rc2) running selftest

2018-02-23 Thread Randy Dunlap
[add Matthew Wilcox; hopefully he can look/see]

On 02/23/2018 04:13 PM, Cong Wang wrote:
> On Fri, Feb 23, 2018 at 3:27 PM, Cong Wang  wrote:
>> On Fri, Feb 23, 2018 at 11:00 AM, Randy Dunlap  wrote:
>>> [adding netdev]
>>>
>>> On 02/23/2018 08:05 AM, Khalid Aziz wrote:
>>>> I am seeing a kernel panic with 4.16-rc1 and 4.16-rc2 kernels when running
>>>> selftests
>>>> from tools/testing/selftests. Last messages from selftest before kernel
>>>> panic are:
>>>>
>> ...
>>>> Same selftest does not cause panic on 4.15. git bisect pointed to commit
>>>> 6ce711f2750031d12cec91384ac5cfa0a485b60a ("idr: Make 1-based IDRs more
>>>> efficient").
>>>> Kernel config is attached.
>>
>> Looks like something horribly wrong with u32 key id idr...
> 
> Adding a few printk's, I got:
> 
> [   31.231560] requested handle = ffe0
> [   31.232426] allocated handle = 0
> ...
> [   31.246475] requested handle = ffd0
> [   31.247555] allocated handle = 1
> 
> 
> So the bug is here where we can't allocate a specific handle:
> 
> err = idr_alloc_u32(&tp_c->handle_idr, ht, &handle,
> handle, GFP_KERNEL);
> if (err) {
> kfree(ht);
> return err;
> }
> 


-- 
~Randy


Re: Kernel panic with 4.16-rc1 (and 4.16-rc2) running selftest

2018-02-23 Thread Cong Wang
On Fri, Feb 23, 2018 at 3:27 PM, Cong Wang  wrote:
> On Fri, Feb 23, 2018 at 11:00 AM, Randy Dunlap  wrote:
>> [adding netdev]
>>
>> On 02/23/2018 08:05 AM, Khalid Aziz wrote:
>>> I am seeing a kernel panic with 4.16-rc1 and 4.16-rc2 kernels when running 
>>> selftests
>>> from tools/testing/selftests. Last messages from selftest before kernel 
>>> panic are:
>>>
> ...
>>> Same selftest does not cause panic on 4.15. git bisect pointed to commit 
>>> 6ce711f2750031d12cec91384ac5cfa0a485b60a ("idr: Make 1-based IDRs more 
>>> efficient").
>>> Kernel config is attached.
>
> Looks like something horribly wrong with u32 key id idr...

Adding a few printk's, I got:

[   31.231560] requested handle = ffe0
[   31.232426] allocated handle = 0
...
[   31.246475] requested handle = ffd0
[   31.247555] allocated handle = 1


So the bug is here where we can't allocate a specific handle:

err = idr_alloc_u32(&tp_c->handle_idr, ht, &handle,
handle, GFP_KERNEL);
if (err) {
kfree(ht);
return err;
}


Re: Kernel panic with 4.16-rc1 (and 4.16-rc2) running selftest

2018-02-23 Thread Cong Wang
On Fri, Feb 23, 2018 at 11:00 AM, Randy Dunlap  wrote:
> [adding netdev]
>
> On 02/23/2018 08:05 AM, Khalid Aziz wrote:
>> I am seeing a kernel panic with 4.16-rc1 and 4.16-rc2 kernels when running 
>> selftests
>> from tools/testing/selftests. Last messages from selftest before kernel 
>> panic are:
>>
...
>> Same selftest does not cause panic on 4.15. git bisect pointed to commit 
>> 6ce711f2750031d12cec91384ac5cfa0a485b60a ("idr: Make 1-based IDRs more 
>> efficient").
>> Kernel config is attached.

Looks like something horribly wrong with u32 key id idr...


Re: Kernel panic with 4.16-rc1 (and 4.16-rc2) running selftest

2018-02-23 Thread Randy Dunlap
[adding netdev]

On 02/23/2018 08:05 AM, Khalid Aziz wrote:
> I am seeing a kernel panic with 4.16-rc1 and 4.16-rc2 kernels when running 
> selftests
> from tools/testing/selftests. Last messages from selftest before kernel panic 
> are:
> 
> 
> running psock_tpacket test
> 
> test: TPACKET_V1 with PACKET_RX_RING test: skip TPACKET_V1 PACKET_RX_RING 
> since user and kernel space have different bit width
> test: TPACKET_V1 with PACKET_TX_RING test: skip TPACKET_V1 PACKET_TX_RING 
> since user and kernel space have different bit width
> test: TPACKET_V2 with PACKET_RX_RING  100 pkts (14200 
> bytes)
> test: TPACKET_V2 with PACKET_TX_RING  100 pkts (14200 
> bytes)
> test: TPACKET_V3 with PACKET_RX_RING  100 pkts (14200 
> bytes)
> test: TPACKET_V3 with PACKET_TX_RING  100 pkts (14200 
> bytes)
> OK. All tests passed
> [PASS]
> ok 1..7 selftests: run_afpackettests [PASS]
> selftests: test_bpf.sh
> 
> test_bpf: [FAIL]
> not ok 1..8 selftests:  test_bpf.sh [FAIL]
> selftests: netdevice.sh
> 
> ok 1..9 selftests: netdevice.sh [PASS]
> selftests: rtnetlink.sh
> 
> PASS: policy routing
> PASS: route get
> 
> 
> Kernel panic message is below:
> 
> [  572.486722] BUG: unable to handle kernel paging request at 0600
> [  572.494498] IP: tcf_exts_dump_stats+0x10/0x30
> [  572.499360] PGD 80be413cb067 P4D 80be413cb067 PUD bead15c067 PMD 0 
> [  572.507126] Oops:  [#1] SMP PTI
> [  572.511010] Modules linked in: cls_u32 sch_htb dummy vfat fat ext4 mbcache 
> jbd2 intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul 
> crc32_pclmul ghash_clmulni_intel pcbc aesni_intel crypto_simd glue_helper 
> cryptd sg iTCO_wdt iTCO_vendor_support ioatdma ipmi_ssif pcspkr wmi i2c_i801 
> lpc_ich shpchp mfd_core ipmi_si ipmi_devintf ipmi_msghandler nfsd auth_rpcgss 
> nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod mgag200 
> drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm igb ahci 
> crc32c_intel nvme libahci dca drm megaraid_sas nvme_core i2c_algo_bit libata 
> bnxt_en i2c_core dm_mirror dm_region_hash dm_log dm_mod
> [  572.574377] CPU: 81 PID: 17886 Comm: tc Not tainted 4.16.0-rc2 #112
> [  572.581371] Hardware name: Oracle Corporation ORACLE SERVER X7-2/ASM, MB, 
> X7-2, BIOS 41017600 10/06/2017
> [  572.591957] RIP: 0010:tcf_exts_dump_stats+0x10/0x30
> [  572.597402] RSP: 0018:c900313b7928 EFLAGS: 00010206
> [  572.603226] RAX: 0600 RBX: 88bea9117db0 RCX: 
> 1ca4
> [  572.611191] RDX: 1ca3 RSI: 88bea90cf018 RDI: 
> 88be4fb6c000
> [  572.619157] RBP: 88be4fb6c000 R08: 00024800 R09: 
> a05697fb
> [  572.627121] R10: 88bebe064800 R11: ea02faa445c0 R12: 
> 88bea90ce034
> [  572.635087] R13: 88bea90cf000 R14: 88be9fe33300 R15: 
> 88bea90ce000
> [  572.643053] FS:  7f98ae464740() GS:88bebe04() 
> knlGS:
> [  572.652084] CS:  0010 DS:  ES:  CR0: 80050033
> [  572.658497] CR2: 0600 CR3: 00be41a94005 CR4: 
> 007606e0
> [  572.666462] DR0:  DR1:  DR2: 
> 
> [  572.674428] DR3:  DR6: fffe0ff0 DR7: 
> 0400
> [  572.682393] PKRU: 5554
> [  572.685413] Call Trace:
> [  572.688145]  u32_dump+0x2be/0x3c0 [cls_u32]
> [  572.692816]  tcf_fill_node.isra.29+0x15b/0x1f0
> [  572.69]  tfilter_notify+0xc1/0x150
> [  572.701952]  tc_ctl_tfilter+0x87d/0xbd0
> [  572.706238]  rtnetlink_rcv_msg+0x29c/0x310
> [  572.710813]  ? _cond_resched+0x15/0x30
> [  572.714999]  ? __kmalloc_node_track_caller+0x1b9/0x270
> [  572.720737]  ? rtnl_calcit.isra.28+0x100/0x100
> [  572.725697]  netlink_rcv_skb+0xd2/0x110
> [  572.729969]  netlink_unicast+0x17c/0x230
> [  572.734348]  netlink_sendmsg+0x2cd/0x3c0
> [  572.738719]  sock_sendmsg+0x30/0x40
> [  572.742612]  ___sys_sendmsg+0x27a/0x290
> [  572.746896]  ? do_wp_page+0x89/0x4c0
> [  572.750886]  ? page_add_new_anon_rmap+0x72/0xc0
> [  572.755944]  ? __handle_mm_fault+0x74b/0x1280
> [  572.760807]  __sys_sendmsg+0x51/0x90
> [  572.764800]  do_syscall_64+0x6e/0x1a0
> [  572.76]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
> [  572.774526] RIP: 0033:0x7f98ada843b0
> [  572.778515] RSP: 002b:7fff833a4f38 EFLAGS: 0246 ORIG_RAX: 
> 002e
> [  572.786963] RAX: ffda RBX: 5a8deb31 RCX: 
> 7f98ada843b0
> [  572.794929] RDX:  RSI: 7fff833a4f80 RDI: 
> 0003
> [  572.802892] RBP: 7fff833a4f80 R08:  R09: 
> 0001
> [  572.810856] R10: 7fff833a4320 R11: 0246 R12: 
> 
> [  572.818823] R13: 00650ba0 R14: 7fff833b11e8 R15: