On 06/26/2017 05:51 PM, 王志克 wrote:
Hi Greg,

The exact issue occurred on the 20th test of check-kmod (sometimes there are
other kernel issues: the kernel just hangs, without a panic). This is OVS 2.6.0
on CentOS 7.2 with kernel 3.10.0-327.el7.x86_64. Some info is below, which I
hope is helpful.

OK, I'll try with that kernel.  The three VMs I have that are running the test
are still up after running overnight.  So let me try the base install kernel.

Thanks,

- Greg


datapath-sanity

   1: datapath - ping between two ports               ok
   2: datapath - http between two ports               ok
   3: datapath - ping between two ports on vlan       ok
   4: datapath - ping6 between two ports              ok
   5: datapath - ping6 between two ports on vlan      ok
   6: datapath - ping over vxlan tunnel               FAILED (system-traffic.at:159)
   7: datapath - ping over gre tunnel                 FAILED (system-traffic.at:199)
   8: datapath - ping over geneve tunnel              skipped (system-traffic.at:213)
   9: datapath - basic truncate action                ok
  10: datapath - truncate and output to gre tunnel    FAILED (system-traffic.at:445)
  11: conntrack - controller                          FAILED (system-traffic.at:522)
  12: conntrack - IPv4 HTTP                           ok
  13: conntrack - IPv6 HTTP                           ok
  14: conntrack - IPv4 ping                           ok
  15: conntrack - IPv6 ping                           ok
  16: conntrack - commit, recirc                      ok
  17: conntrack - preserve registers                  ok
  18: conntrack - invalid                             ok
  19: conntrack - zones                               ok
  20: conntrack - zones from field ....(system crash...)


[root@localhost vmcore-127.0.0.1-2017-06-25-23:17:12]# ls
analyzer      backtrace  count      last_occurrence  os_info     runlevel  type  username  vmcore
architecture  component  event_log  machineid        os_release  time      uid   uuid      vmcore-dmesg.txt
[root@localhost vmcore-127.0.0.1-2017-06-25-23:17:12]# cat backtrace

Version: 3.10.0-327.el7.x86_64
BUG: unable to handle kernel paging request at ffffffffa0715ae8
IP: [<ffffffff8108e6a7>] get_next_timer_interrupt+0x97/0x270
PGD 194d067 PUD 194e063 PMD b746f067 PTE 0
Oops: 0000 [#1] SMP
Modules linked in: nf_nat_ftp nf_conntrack_ftp nf_conntrack_netlink nfnetlink
ip_gre ip_tunnel gre vxlan ip6_udp_tunnel udp_tunnel 8021q garp mrp veth
xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun ip6t_rpfilter
ip6t_REJECT ipt_REJECT xt_conntrack ebtable_nat ebtable_broute bridge stp llc
ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6
nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter
ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat
nf_conntrack iptable_mangle iptable_security iptable_raw iptable_filter
vmw_vsock_vmci_transport vsock bnep dm_mirror dm_region_hash dm_log dm_mod
snd_seq_midi snd_seq_midi_event snd_ens1371 snd_rawmidi coretemp
snd_ac97_codec ac97_bus crc32_pclmul snd_seq ghash_clmulni_intel ppdev
snd_seq_device cryptd btusb snd_pcm bluetooth snd_timer snd soundcore sg
vmw_balloon rfkill pcspkr parport_pc parport i2c_piix4 vmw_vmci shpchp nfsd
auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sr_mod cdrom
ata_generic sd_mod crc_t10dif crct10dif_generic pata_acpi crct10dif_pclmul
crct10dif_common crc32c_intel serio_raw vmwgfx drm_kms_helper ttm mptspi
scsi_transport_spi e1000 mptscsih mptbase drm i2c_core ata_piix libata
[last unloaded: openvswitch]
CPU: 1 PID: 0 Comm: swapper/1 Tainted: G           OE  ------------   3.10.0-327.el7.x86_64 #1
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
task: ffff8800b9a81700 ti: ffff8800b9a8c000 task.ti: ffff8800b9a8c000
RIP: 0010:[<ffffffff8108e6a7>]  [<ffffffff8108e6a7>] get_next_timer_interrupt+0x97/0x270
RSP: 0018:ffff8800b9a8fdd8  EFLAGS: 00010012
RAX: ffffffffa0715ad0 RBX: 00000863b6f08300 RCX: ffff8800b95a8d08
RDX: 00000000000000ce RSI: 00000000000000ce RDI: 0000000100882cce
RBP: ffff8800b9a8fe30 R08: 0000000000000202 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000001 R12: 0000000100882ccd
R13: 7fffffffffffffff R14: ffff8800b95a8000 R15: 0000000100882ccd
FS:  0000000000000000(0000) GS:ffff8800bb620000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffa0715ae8 CR3: 00000000b64d8000 CR4: 00000000003407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Stack:
  ffff8800b9f5e780 0000000000000000 ffff8800b9a8dfd8 ffff8800b9a8fe10
  ffff8800b9a8fe48 20cc1170855d3261 ffff8800bb62dbc0 00000863b6f08300
  0000000000000001 ffff8800bb62cf00 0000000100882ccd ffff8800b9a8fe88
Call Trace:
  [<ffffffff810e0978>] tick_nohz_stop_sched_tick+0x1e8/0x2e0
  [<ffffffff8101cd15>] ? native_sched_clock+0x35/0x80
  [<ffffffff810e0b0e>] __tick_nohz_idle_enter+0x9e/0x150
  [<ffffffff810e102d>] tick_nohz_idle_enter+0x3d/0x70
  [<ffffffff810d615e>] cpu_startup_entry+0x9e/0x290
  [<ffffffff810475fa>] start_secondary+0x1ba/0x230
Code: 18 49 8b 7e 10 48 39 cf 48 89 ca 78 5a 40 0f b6 d7 89 d6 48 63 c6 48 c1
e0 04 49 8d 0c 06 48 8b 41 28 48 83 c1 28 48 39 c8 74 0e <f6> 40 18 01 74 23
48 8b 00 48 39 c8 75 f2 83 c6 01 40 0f b6 f6
RIP  [<ffffffff8108e6a7>] get_next_timer_interrupt+0x97/0x270
  RSP <ffff8800b9a8fdd8>


Wang Zhike

-----Original Message-----
From: Greg Rose [mailto:gvrose8...@gmail.com]
Sent: June 27, 2017 6:26
To: 王志克
Cc: d...@openvswitch.org; Joe Stringer
Subject: Re: [ovs-dev] RE: RE: [PATCH] pkt reassemble: fix kernel panic for ovs
reassemble

On 06/26/2017 04:56 AM, 王志克 wrote:
> Hi Joe,
>
> I will check how to send the patch, probably tomorrow, since I am quite
> busy now.
>
> Regarding the crash, I can reproduce it even with official OVS, like
> OVS 2.6.0. (I just run check-kmod in a loop until the kernel panics.) So
> it is not related to the new fix.
>
> Br,
> Wang Zhike
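
For anyone wanting to reproduce this the same way, the loop can be scripted. The following is only a minimal sketch, not something from the OVS tree: the function name `run_loop` and the log file name are made up for illustration. It writes and syncs a line before and after each run, so that after a panic the log still shows which iteration was in flight.

```python
import os
import subprocess


def run_loop(cmd, iterations, log_path="check-kmod-loop.log"):
    """Run cmd `iterations` times and record each exit status durably.

    Hypothetical helper for illustration; not part of OVS.
    """
    statuses = []
    with open(log_path, "a") as log:
        for i in range(1, iterations + 1):
            # Record the start before running, so a crash mid-run is visible.
            log.write(f"iteration {i} start\n")
            log.flush()
            os.fsync(log.fileno())

            status = subprocess.run(cmd).returncode
            statuses.append(status)

            # Keep going on non-zero exit: test failures are not the crash.
            log.write(f"iteration {i} exit {status}\n")
            log.flush()
            os.fsync(log.fileno())
    return statuses
```

Assuming it is run as root from the top of a built OVS tree, a call such as `run_loop(["make", "check-kmod"], iterations=50)` would drive the loop. The loop deliberately does not stop on a failing exit status, because the test suite can report failures on every pass without the kernel crashing.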
I've been running 'make check-kmod' in a continuous loop on 3 virtual machines 
since this morning.  So far no kernel splats but plenty of errors:

This is on the Ubuntu machine running the 4.0 kernel:

ERROR: 66 tests were run,
24 failed unexpectedly.
23 tests were skipped.
## -------------------------------------- ##
## system-kmod-testsuite.log was created. ##
## -------------------------------------- ##

Please send `tests/system-kmod-testsuite.log' and all information you think 
might help:

     To: <b...@openvswitch.org>
        Subject: [openvswitch 2.7.90] system-kmod-testsuite: 16 17 35 57 58 59 
60 61 62 63 70 71 72 75 76 81 82 83 84 85 86 87 88 89 failed

CentOS 7.2 running the 4.9.24 kernel:

## ------------- ##
## Test results. ##
## ------------- ##

ERROR: 76 tests were run,
34 failed unexpectedly.
13 tests were skipped.
## -------------------------------------- ##
## system-kmod-testsuite.log was created. ##
## -------------------------------------- ##

Please send `tests/system-kmod-testsuite.log' and all information you think 
might help:

     To: <b...@openvswitch.org>
        Subject: [openvswitch 2.7.90] system-kmod-testsuite: 2 14 15 20 21 22 
23 24 25 26 27 28 29 30 31 32 47 48 49 50 51 57 59 60 61 62 70 71 75 76 84 85 
86 87 failed

CentOS 7.2 running the 4.10.17 kernel:

## ------------- ##
## Test results. ##
## ------------- ##

ERROR: 74 tests were run,
34 failed unexpectedly.
15 tests were skipped.
## -------------------------------------- ##
## system-kmod-testsuite.log was created. ##
## -------------------------------------- ##

Please send `tests/system-kmod-testsuite.log' and all information you think 
might help:

     To: <b...@openvswitch.org>
        Subject: [openvswitch 2.7.90] system-kmod-testsuite: 2 14 15 20 21 22 
23 24 25 26 27 28 29 30 31 32 47 48 49 50 51 57 59 60 61 62 70 71 75 76 84 85 
86 87 failed
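
The three failure lists are easier to compare as sets. In each run, tests run plus tests skipped comes to 89, so the test numbers should line up across machines. The sketch below only parses the Subject lines quoted above; `parse_failed` is a hypothetical helper name, not part of the test suite.

```python
def parse_failed(subject):
    """Return the failed test numbers from an autotest 'Subject:' line."""
    # Take everything after the testsuite name; the trailing word
    # 'failed' is dropped because it is not a number.
    body = subject.split("system-kmod-testsuite:", 1)[1]
    return {int(tok) for tok in body.split() if tok.isdigit()}


ubuntu_4_0 = parse_failed(
    "[openvswitch 2.7.90] system-kmod-testsuite: 16 17 35 57 58 59 "
    "60 61 62 63 70 71 72 75 76 81 82 83 84 85 86 87 88 89 failed"
)
centos_4_9 = parse_failed(
    "[openvswitch 2.7.90] system-kmod-testsuite: 2 14 15 20 21 22 "
    "23 24 25 26 27 28 29 30 31 32 47 48 49 50 51 57 59 60 61 62 70 71 75 76 84 85 "
    "86 87 failed"
)
centos_4_10 = parse_failed(
    "[openvswitch 2.7.90] system-kmod-testsuite: 2 14 15 20 21 22 "
    "23 24 25 26 27 28 29 30 31 32 47 48 49 50 51 57 59 60 61 62 70 71 75 76 84 85 "
    "86 87 failed"
)

# Tests that fail on all three machines.
common = sorted(ubuntu_4_0 & centos_4_9 & centos_4_10)

print(centos_4_9 == centos_4_10)  # True
print(common)  # [57, 59, 60, 61, 62, 70, 71, 75, 76, 84, 85, 86, 87]
```

The two CentOS runs fail exactly the same 34 tests, and 13 of those also fail on the Ubuntu machine.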

I confess to not spending a lot of time running check-kmod.  I certainly intend 
to in the future.

- Greg

>
> -----Original Message-----
> From: Joe Stringer [mailto:j...@ovn.org]
> Sent: June 24, 2017 5:15
> To: 王志克
> Cc: d...@openvswitch.org
> Subject: Re: RE: [ovs-dev] [PATCH] pkt reassemble: fix kernel panic for ovs
> reassemble
>
> Hi Wang Zhike,
>
> I'd like it if others like Greg could take a look as well, since this code
> is delicate. The more review it gets, the better. It seems like maybe the
> version of your email that goes to the list does not get the attachment.
> Perhaps you could try sending the patch using git send-email or putting the
> patch on GitHub instead, and linking to it here.
>
> For what it's worth, I did run your patch for a while and it seemed
> OK, but when I tried again today on an Ubuntu Trusty (Linux
> 3.13.0-119-generic) box, running make check-kmod, I saw an issue with
> get_next_timer_interrupt():
>
> [181250.892557] BUG: unable to handle kernel paging request at ffffffffa03317e0
> [181250.892557] IP: [<ffffffff81079606>] get_next_timer_interrupt+0x86/0x250
> [181250.892557] PGD 1c11067 PUD 1c12063 PMD 1381a2067 PTE 0
> [181250.892557] Oops: 0000 [#1] SMP
> [181250.892557] Modules linked in: nf_nat_ipv6 nf_nat_ipv4 nf_nat gre(-)
> nf_conntrack_ipv6 nf_conntrack_ipv4 nf_defrag_ipv6 nf_defrag_ipv4
> nf_conntrack_netlink nfnetlink nf_conntrack bonding 8021q garp stp mrp llc
> veth nfsd auth_rpcgss nfs_acl nfs lockd sunrpc fscache dm_crypt kvm_intel
> kvm serio_raw netconsole configfs crct10dif_pclmul crc32_pclmul
> ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper
> ablk_helper cryptd psmouse floppy ahci libahci [last unloaded: libcrc32c]
> [181250.892557] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G           OX 3.13.0-119-generic #166-Ubuntu
> [181250.892557] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> [181250.892557] task: ffffffff81c15480 ti: ffffffff81c00000 task.ti: ffffffff81c00000
> [181250.892557] RIP: 0010:[<ffffffff81079606>]  [<ffffffff81079606>] get_next_timer_interrupt+0x86/0x250
> [181250.892557] RSP: 0018:ffffffff81c01e00  EFLAGS: 00010002
> [181250.892557] RAX: ffffffffa03317c8 RBX: 0000000102b245da RCX: 00000000000000db
> [181250.892557] RDX: ffffffff81ebac58 RSI: 00000000000000db RDI: 0000000102b245db
> [181250.892557] RBP: ffffffff81c01e48 R08: 0000000000c88c1c R09: 0000000000000000
> [181250.892557] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000142b245d9
> [181250.892557] R13: ffffffff81eb9e80 R14: 0000000102b245da R15: 0000000000cd63e8
> [181250.892557] FS:  0000000000000000(0000) GS:ffff88013fc00000(0000) knlGS:0000000000000000
> [181250.892557] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [181250.892557] CR2: ffffffffa03317e0 CR3: 000000003707f000 CR4: 00000000000006f0
> [181250.892557] Stack:
> [181250.892557]  0000000000000000 ffffffff81c01e30 ffffffff810a3af5 ffff88013fc13bc0
> [181250.892557]  ffff88013fc0dce0 0000000102b245da 0000000000000000 00000063ae154000
> [181250.892557]  0000000000cd63e8 ffffffff81c01ea8 ffffffff810da655 0000a4d8c2cb6200
> [181250.892557] Call Trace:
> [181250.892557]  [<ffffffff810a3af5>] ? set_next_entity+0x95/0xb0
> [181250.892557]  [<ffffffff810da655>] tick_nohz_stop_sched_tick+0x1e5/0x340
> [181250.892557]  [<ffffffff810da851>] __tick_nohz_idle_enter+0xa1/0x160
> [181250.892557]  [<ffffffff810dab4d>] tick_nohz_idle_enter+0x3d/0x70
> [181250.892557]  [<ffffffff810c2af7>] cpu_startup_entry+0x87/0x2b0
> [181250.892557]  [<ffffffff8171b387>] rest_init+0x77/0x80
> [181250.892557]  [<ffffffff81d34f6a>] start_kernel+0x432/0x43d
> [181250.892557]  [<ffffffff81d34941>] ? repair_env_string+0x5c/0x5c
> [181250.892557]  [<ffffffff81d34120>] ? early_idt_handler_array+0x120/0x120
> [181250.892557]  [<ffffffff81d345ee>] x86_64_start_reservations+0x2a/0x2c
> [181250.892557]  [<ffffffff81d34733>] x86_64_start_kernel+0x143/0x152
> [181250.892557] Code: 8b 7d 10 4d 8b 75 18 4c 39 f7 78 5c 40 0f b6 cf
> 89 ce 48 63 c6 48 c1 e0 04 49 8d 54 05 00 48 8b 42 28 48 83 c2 28 48
> 39 d0 74 0e <f6> 40 18 01 74 24 48 8b 00 48 39 d0 75 f2 83 c6 01 40 0f
> b6 f6
> [181250.892557] RIP  [<ffffffff81079606>] get_next_timer_interrupt+0x86/0x250
> [181250.892557]  RSP <ffffffff81c01e00>
> [181250.892557] CR2: ffffffffa03317e0
>
> It seems like perhaps a fragment timer registered by OVS is still
> pending when the OVS module is unloaded, so the kernel may attempt to
> clean up an entry using OVS code after that code has been unloaded.
> This might be related to the IPv6 cvlan test, since that seems to be
> where my VM froze and went to 100% CPU, but I would think that the
> IPv6 fragmentation cleanup test is more likely to cause this, since it
> leaves fragments behind in the cache after the test finishes. I've only
> hit this when running all of the tests in make check-kmod.
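
One quick check on this theory is whether the faulting address lies in the range the kernel reserves for loadable modules. The sketch below assumes the x86_64 module mapping range of non-KASLR kernels from this era (the two constants are an assumption, not read from these machines), and the helper names are made up for illustration. A hit is consistent with a stale timer pointing into memory of an unloaded module, but does not prove it.

```python
import re

# Assumption: x86_64 module mapping space on non-KASLR kernels of this era.
MODULES_VADDR = 0xFFFFFFFFA0000000
MODULES_END = 0xFFFFFFFFFF000000


def fault_address(oops_text):
    """Return the faulting address from an unwrapped oops 'BUG:' line, or None."""
    m = re.search(r"paging request at\s+([0-9a-f]{16})", oops_text)
    return int(m.group(1), 16) if m else None


def in_module_space(addr):
    """True if addr falls inside the assumed module mapping range."""
    return MODULES_VADDR <= addr < MODULES_END


# Fault addresses and RAX values taken from the two traces in this thread.
for bug_line, rax in [
    ("BUG: unable to handle kernel paging request at ffffffffa0715ae8",
     0xFFFFFFFFA0715AD0),
    ("BUG: unable to handle kernel paging request at ffffffffa03317e0",
     0xFFFFFFFFA03317C8),
]:
    addr = fault_address(bug_line)
    # The faulting instruction bytes are <f6> 40 18 01, a byte test at
    # offset 0x18 from RAX, so RAX + 0x18 should equal the fault address.
    print(hex(addr), in_module_space(addr), rax + 0x18 == addr)
```

Both fault addresses land in that range, and both equal RAX + 0x18, while the instruction pointer itself (get_next_timer_interrupt) is in core kernel text. That fits the picture of the timer code walking a list entry that lives in module memory that is no longer mapped (PTE 0).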
>
> Cheers,
> Joe
>
> On 22 June 2017 at 17:53, 王志克 <wangzh...@jd.com> wrote:
>> Hi Joe,
>>
>> Please check the attachment. Thanks.
>>
>> Br,
>> Wang Zhike
>>
>> -----Original Message-----
>> From: Joe Stringer [mailto:j...@ovn.org]
>> Sent: June 23, 2017 8:20
>> To: 王志克
>> Cc: d...@openvswitch.org
>> Subject: Re: [ovs-dev] [PATCH] pkt reassemble: fix kernel panic for ovs
>> reassemble
>>
>> On 21 June 2017 at 18:54, 王志克 <wangzh...@jd.com> wrote:
>>> OVS and the kernel stack would add frag_queue to the same netns_frags
>>> list. As a result, OVS and the kernel may access the frag_queue without
>>> the correct lock. Also, struct ipq may be different on kernels older
>>> than 4.3, which leads to invalid pointer access.
>>>
>>> The fix creates specific netns_frags for ovs.
>>>
>>> Signed-off-by: wangzhike <wangzh...@jd.com>
>>> ---
>>
>> Hi,
>>
>> It looks like the whitespace has been corrupted in this version of the
>> patch that you sent, so I cannot apply it. Probably your email client
>> mistreats it when sending the email out. A reliable method to send patches
>> correctly via email is to use the command-line client 'git send-email'.
>> This is the preferred method. If you are unable to set that up, consider
>> attaching the patch to the email (or send a pull request on GitHub).
>>
>> Cheers,
>> Joe
> _______________________________________________
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>


