Just an update on this question.
The issue was resolved by a kernel update.

Upgrading multiple compute nodes from kernel 4.4.0-93 to 4.4.0-98 fixed the
soft lockup issue. This kernel change also does not seem to have broken
anything else in OpenStack.
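
For anyone checking their own nodes, here is a minimal sketch (not taken from
our tooling) that classifies a `uname -r` style release string against the two
versions reported in this thread. The function names and the labels
"affected", "fixed" and "unknown" are made up for illustration. Only 4.4.0-93
(problem seen) and 4.4.0-98 (problem gone) come from our testing; treating
every release at or after 4.4.0-98 as fixed is an assumption.

```python
import platform
import re

# From this thread: soft lockup seen on 4.4.0-93, gone on 4.4.0-98.
KNOWN_AFFECTED = {(4, 4, 0, 93)}
# Assumption: releases at or after 4.4.0-98 keep the fix.
FIXED_FROM = (4, 4, 0, 98)


def parse_release(release):
    """Turn a release such as '4.4.0-93-generic' into (4, 4, 0, 93)."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)-(\d+)", release)
    if match is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    return tuple(int(part) for part in match.groups())


def classify(release):
    """Return 'affected', 'fixed' or 'unknown' for a kernel release string."""
    version = parse_release(release)
    if version in KNOWN_AFFECTED:
        return "affected"
    if version >= FIXED_FROM:
        return "fixed"
    return "unknown"


def classify_running_kernel():
    """Classify the kernel of the machine this runs on (same string as uname -r)."""
    return classify(platform.release())
```

Calling `classify_running_kernel()` on each compute node gives a quick list of
which ones are still on the release where we saw the problem.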

-- Jim

On Fri, Nov 10, 2017 at 9:50 AM, Jim Okken <j...@jokken.com> wrote:

> ===== UPDATE 11/10 ======
> Hi again,
>
> Based on some advice from a member of this mailing list, we've been looking
> into the kernel and driver versions of our compute nodes.
>
> We also have plain non-OpenStack "KVM on Ubuntu" servers for testing.
>
> I looked at driver and kernel differences between these Ubuntu 16 w/ KVM
> systems and our OpenStack compute nodes. I found Ubuntu 16 w/ KVM was at
> kernel version 4.4.0-87 and that the OpenStack compute nodes were at
> 4.4.0-93. So I upgraded the Ubuntu 16 w/ KVM to 4.4.0-93 and was able to
> reproduce this problem (but only on the same HP hardware as our OpenStack
> compute nodes, and not on other hardware).
> Next I updated these Ubuntu 16 w/ KVM systems to 4.4.0-98 and the problem
> no longer occurred!
>
> I need to upgrade a few OpenStack compute nodes to 4.4.0-98 and test. Does
> anyone think this kernel change could break OpenStack?
>
> In the kernel change log I found a fix for a specific HP server in
> 4.4.0-98 (not the same as our server, but somewhat similar).
>
> thanks!
>
> -- Jim
>
> On Mon, Oct 23, 2017 at 10:25 PM, Jim Okken <j...@jokken.com> wrote:
>
>> ===== UPDATE 10/23 ======
>>
>> We have been trying different things to get better debug output. We
>> disabled rate-limiting in order to get better info in /var/log/messages.
>> For some reason (maybe unrelated) we didn't get the soft lockup during this
>> test, but this time we got openvswitch, br_netfilter, etc. in the call
>> trace in /var/log/messages.
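
To see at a glance which modules show up in a call trace like the ones pasted
further down, a sketch along these lines can tally frames per module. It is
only an illustration: the function name, the `include_unreliable` flag and the
"(core kernel)" label are made up here. Frames printed with a "?" are
addresses found on the stack that the kernel could not confirm as part of the
call chain, so the sketch lets you leave them out. It expects unwrapped log
lines (for example from the full log files); in mail-wrapped text a module tag
that slipped onto the next line will be missed.

```python
import re
from collections import Counter

# A frame looks like: [<ffffffffc054927b>] ? func+0x5b/0x150 [module]
FRAME = re.compile(
    r"\[<[0-9a-f]+>\]\s+(\?\s+)?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(\w+)\])?"
)


def modules_in_trace(lines, include_unreliable=True):
    """Count call-trace frames per kernel module.

    Frames without a [module] tag are counted under "(core kernel)".
    """
    counts = Counter()
    for line in lines:
        for match in FRAME.finditer(line):
            unreliable = match.group(1) is not None
            if unreliable and not include_unreliable:
                continue
            counts[match.group(3) or "(core kernel)"] += 1
    return counts
```

For example, `modules_in_trace(open("node-68.txt"))` would give a per-module
frame count for a downloaded copy of the full log.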
>>
>> Please advise in any way! thx!!
>>
>> Basically, we are running various types of SIP/RTP test traffic between 2
>> instances (on different compute nodes). This time, instead of one
>> hypervisor getting the errors, both hypervisors did, but neither got the
>> soft lockup.
>>
>> Log snippets below; full logs here:
>>
>> www.jokken.com/downloads/node-68.txt
>>
>> www.jokken.com/downloads/node-90.txt
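
Each skb_warn_bad_offload warning in these logs carries a detail line of the
form "<dev>: caps=(...) len=... data_len=... gso_size=... gso_type=...
ip_summed=...". The sketch below pulls those fields out of one unwrapped log
line and decodes the two numeric codes. The function and table names are made
up for illustration. The ip_summed names are the kernel's CHECKSUM_* values;
the gso_type bit table is an assumption based on include/linux/skbuff.h in
4.4-era kernels, and those bits were renumbered in later releases, so check it
against the source of the kernel you actually run.

```python
import re

# Assumption: SKB_GSO_* bit values as in 4.4-era include/linux/skbuff.h.
# Later kernels renumbered these, so verify before relying on the decode.
GSO_BITS_4_4 = {
    1 << 0: "SKB_GSO_TCPV4",
    1 << 1: "SKB_GSO_UDP",
    1 << 2: "SKB_GSO_DODGY",
    1 << 3: "SKB_GSO_TCP_ECN",
    1 << 4: "SKB_GSO_TCPV6",
}

IP_SUMMED = {
    0: "CHECKSUM_NONE",
    1: "CHECKSUM_UNNECESSARY",
    2: "CHECKSUM_COMPLETE",
    3: "CHECKSUM_PARTIAL",
}

DETAIL = re.compile(
    r"(?P<dev>\S+): caps=\(0x[0-9a-f]+, 0x[0-9a-f]+\)\s+"
    r"len=(?P<len>\d+)\s+data_len=(?P<data_len>\d+)\s+"
    r"gso_size=(?P<gso_size>\d+)\s+gso_type=(?P<gso_type>\d+)\s+"
    r"ip_summed=(?P<ip_summed>\d+)"
)


def parse_bad_offload(line):
    """Return the fields of a bad-offload detail line, or None if absent."""
    match = DETAIL.search(line)
    if match is None:
        return None
    info = {"dev": match.group("dev")}
    for key in ("len", "data_len", "gso_size", "gso_type", "ip_summed"):
        info[key] = int(match.group(key))
    # Decode the bitmask into flag names, lowest bit first.
    info["gso_flags"] = [
        name for bit, name in sorted(GSO_BITS_4_4.items())
        if info["gso_type"] & bit
    ]
    info["checksum"] = IP_SUMMED.get(info["ip_summed"], "unknown")
    return info
```

If that bit table applies to this kernel, gso_type=6 decodes to SKB_GSO_UDP
plus SKB_GSO_DODGY, i.e. a UDP offload packet from an untrusted source such as
a guest, which would fit the RTP test traffic.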
>>
>>
>> *node-68*
>>
>> 2017-10-20T17:48:37.031741+00:00 node-68 rsyslogd-2177: imuxsock[pid
>> 5085]: 40 messages lost due to rate-limiting
>>
>> 2017-10-20T17:58:36.281069+00:00 node-68 rsyslogd-2177: imuxsock[pid
>> 5085]: begin to drop messages due to rate-limiting
>>
>> 2017-10-20T17:58:37.548500+00:00 node-68 rsyslogd-2177: imuxsock[pid
>> 5085]: 41 messages lost due to rate-limiting
>>
>> 2017-10-20T18:08:36.180377+00:00 node-68 rsyslogd-2177: imuxsock[pid
>> 5085]: begin to drop messages due to rate-limiting
>>
>> 2017-10-20T18:08:37.058861+00:00 node-68 rsyslogd-2177: imuxsock[pid
>> 5085]: 40 messages lost due to rate-limiting
>>
>> 2017-10-20T18:18:36.175797+00:00 node-68 rsyslogd-2177: imuxsock[pid
>> 5085]: begin to drop messages due to rate-limiting
>>
>> 2017-10-20T18:18:37.583237+00:00 node-68 rsyslogd-2177: imuxsock[pid
>> 5085]: 41 messages lost due to rate-limiting
>>
>> 2017-10-20T18:28:36.172090+00:00 node-68 rsyslogd-2177: imuxsock[pid
>> 5085]: begin to drop messages due to rate-limiting
>>
>> 2017-10-20T18:28:37.125346+00:00 node-68 rsyslogd-2177: imuxsock[pid
>> 5085]: 40 messages lost due to rate-limiting
>>
>>
>>
>> ps -aef | grep 5080
>>
>> ceilome+ 5080 3502 0 Oct03 ? 01:32:57 ceilometer-polling - AgentManager(0)
>>
>>
>>
>> 2017-10-20T18:35:10.759230+00:00 node-68 rsyslogd: [origin
>> software="rsyslogd" swVersion="8.16.0" x-pid="3431" x-info="
>> http://www.rsyslog.com";] exiting on signal 15.
>>
>> 2017-10-20T18:35:10.790611+00:00 node-68 rsyslogd: [origin
>> software="rsyslogd" swVersion="8.16.0" x-pid="23851" x-info="
>> http://www.rsyslog.com";] start
>>
>> 2017-10-20T18:35:10.790395+00:00 node-68 rsyslogd: rsyslogd's groupid
>> changed to 108
>>
>> 2017-10-20T18:35:10.790455+00:00 node-68 rsyslogd: rsyslogd's userid
>> changed to 104
>>
>> 2017-10-20T18:35:10.790491+00:00 node-68 rsyslogd-2357: queue "action 0
>> queue": high water mark is set quite low at 8000. You should only set it
>> below 60% (600000) if you have a good reason for this. [v8.16.0 try
>> http://www.rsyslog.com/e/2357 ]
>>
>>
>>
>> Test starts: Fri Oct 20 18:52:48 2017
>>
>>
>>
>> 2017-10-20T18:56:20.408532+00:00 node-68 kernel: [1458996.797708]
>> ------------[ cut here ]------------
>>
>> 2017-10-20T18:56:20.408571+00:00 node-68 kernel: [1458996.797728]
>> WARNING: CPU: 27 PID: 0 at 
>> /build/linux-YyUNAI/linux-4.4.0/net/core/dev.c:2445
>> skb_warn_bad_offload+0xd1/0x120()
>>
>> 2017-10-20T18:56:20.408574+00:00 node-68 kernel: [1458996.797732]
>> qvofd385f05-cb: caps=(0x00000184075b59e9, 0x0000000000000000) len=2636
>> data_len=2594 gso_size=1480 gso_type=6 ip_summed=0
>>
>> 2017-10-20T18:56:20.408576+00:00 node-68 kernel: [1458996.797735]
>> Modules linked in: bonding binfmt_misc nf_conntrack_netlink vhost_net vhost
>> macvtap macvlan xt_mac xt_tcpudp xt_physdev br_netfilter xt_set
>> ip_set_hash_net ip_set nfnetlink veth ip6table_raw ebtable_filter ebtables
>> openvswitch ocfs2 quota_tree ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm
>> ocfs2_nodemanager ocfs2_stackglue configfs ip6table_filter ip6_tables
>> xt_multiport xt_conntrack iptable_filter xt_comment xt_CT iptable_raw
>> ip_tables x_tables xfs ipmi_ssif bridge intel_rapl x86_pkg_temp_thermal
>> intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel
>> aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd joydev
>> 8021q serio_raw input_leds garp mrp stp llc sb_edac edac_core hpilo ioatdma
>> lpc_ich shpchp dca 8250_fintek ipmi_si ipmi_msghandler acpi_power_meter
>> mac_hid kvm_intel kvm irqbypass ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad
>> ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi
>> nf_conntrack_proto_gre nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack_ipv4
>> nf_defrag_ipv4 nf_conntrack autofs4 dm_round_robin raid10 raid456
>> async_raid6_recov async_memcpy async_pq async_xor async_tx xor ses
>> enclosure raid6_pq libcrc32c raid1 raid0 multipath linear uas usb_storage
>> hid_generic usbhid hid psmouse lpfc ahci libahci be2net vxlan
>> scsi_transport_fc ip6_udp_tunnel udp_tunnel wmi fjes scsi_dh_emc
>> scsi_dh_rdac scsi_dh_alua dm_multipath
>>
>> 2017-10-20T18:56:20.408580+00:00 node-68 kernel: [1458996.797828] CPU:
>> 27 PID: 0 Comm: swapper/27 Tainted: G        W       4.4.0-93-generic
>> #116-Ubuntu
>>
>> 2017-10-20T18:56:20.408582+00:00 node-68 kernel: [1458996.797830]
>> Hardware name: HP ProLiant BL460c Gen9, BIOS I36 02/17/2017
>>
>> 2017-10-20T18:56:20.408583+00:00 node-68 kernel: [1458996.797832]
>> 0000000000000286 0fc821d2ff4865f6 ffff88103fc437d0 ffffffff813f9f83
>>
>> 2017-10-20T18:56:20.408615+00:00 node-68 kernel: [1458996.797835]
>> ffff88103fc43818 ffffffff81d6f780 ffff88103fc43808 ffffffff810812f2
>>
>> 2017-10-20T18:56:20.408623+00:00 node-68 kernel: [1458996.797838]
>> ffff88203343f200 ffff880f8e0d1000 0000000000000006 0000000000000006
>>
>> 2017-10-20T18:56:20.408624+00:00 node-68 kernel: [1458996.797840] Call
>> Trace:
>>
>> 2017-10-20T18:56:20.408625+00:00 node-68 kernel: [1458996.797842]
>> <IRQ>  [<ffffffff813f9f83>] dump_stack+0x63/0x90
>>
>> 2017-10-20T18:56:20.408626+00:00 node-68 kernel: [1458996.797859]
>> [<ffffffff810812f2>] warn_slowpath_common+0x82/0xc0
>>
>> 2017-10-20T18:56:20.408627+00:00 node-68 kernel: [1458996.797861]
>> [<ffffffff8108138c>] warn_slowpath_fmt+0x5c/0x80
>>
>> 2017-10-20T18:56:20.408632+00:00 node-68 kernel: [1458996.797865]
>> [<ffffffff814000a2>] ? ___ratelimit+0xa2/0xe0
>>
>> 2017-10-20T18:56:20.408635+00:00 node-68 kernel: [1458996.797867]
>> [<ffffffff81735cd1>] skb_warn_bad_offload+0xd1/0x120
>>
>> 2017-10-20T18:56:20.408636+00:00 node-68 kernel: [1458996.797870]
>> [<ffffffff817393dd>] __skb_gso_segment+0xfd/0x110
>>
>> 2017-10-20T18:56:20.408638+00:00 node-68 kernel: [1458996.797878]
>> [<ffffffffc054927b>] queue_gso_packets+0x5b/0x150 [openvswitch]
>>
>> 2017-10-20T18:56:20.408639+00:00 node-68 kernel: [1458996.797881]
>> [<ffffffffc0572e43>] ? br_nf_forward_ip+0x2a3/0x480 [br_netfilter]
>>
>> 2017-10-20T18:56:20.408640+00:00 node-68 kernel: [1458996.797884]
>> [<ffffffffc0572530>] ? br_validate_ipv4.isra.23+0x200/0x200
>> [br_netfilter]
>>
>> 2017-10-20T18:56:20.408641+00:00 node-68 kernel: [1458996.797889]
>> [<ffffffff8176e312>] ? nf_iterate+0x62/0x80
>>
>> 2017-10-20T18:56:20.408644+00:00 node-68 kernel: [1458996.797892]
>> [<ffffffff8176e3a3>] ? nf_hook_slow+0x73/0xd0
>>
>> 2017-10-20T18:56:20.408645+00:00 node-68 kernel: [1458996.797901]
>> [<ffffffffc07b11f4>] ? __br_forward+0x104/0x130 [bridge]
>>
>> 2017-10-20T18:56:20.408646+00:00 node-68 kernel: [1458996.797905]
>> [<ffffffffc05494a1>] ovs_dp_upcall+0x31/0x60 [openvswitch]
>>
>> 2017-10-20T18:56:20.408648+00:00 node-68 kernel: [1458996.797909]
>> [<ffffffffc05495da>] ovs_dp_process_packet+0x10a/0x130 [openvswitch]
>>
>> 2017-10-20T18:56:20.408649+00:00 node-68 kernel: [1458996.797914]
>> [<ffffffffc055267c>] ovs_vport_receive+0x6c/0xd0 [openvswitch]
>>
>> 2017-10-20T18:56:20.408650+00:00 node-68 kernel: [1458996.797917]
>> [<ffffffff8172ef06>] ? __skb_flow_dissect+0x6a6/0x9f0
>>
>> 2017-10-20T18:56:20.408653+00:00 node-68 kernel: [1458996.797920]
>> [<ffffffff8176e312>] ? nf_iterate+0x62/0x80
>>
>> 2017-10-20T18:56:20.408654+00:00 node-68 kernel: [1458996.797922]
>> [<ffffffff8172f2ea>] ? __skb_get_hash+0x9a/0x300
>>
>> 2017-10-20T18:56:20.408655+00:00 node-68 kernel: [1458996.797926]
>> [<ffffffff811ee1eb>] ? __slab_free+0xcb/0x2c0
>>
>> 2017-10-20T18:56:20.408656+00:00 node-68 kernel: [1458996.797930]
>> [<ffffffff81722a27>] ? skb_release_data+0xa7/0xd0
>>
>> 2017-10-20T18:56:20.408657+00:00 node-68 kernel: [1458996.797934]
>> [<ffffffffc0553899>] netdev_frame_hook+0xe9/0x150 [openvswitch]
>>
>> 2017-10-20T18:56:20.408658+00:00 node-68 kernel: [1458996.797937]
>> [<ffffffff817374a4>] __netif_receive_skb_core+0x364/0xa60
>>
>> 2017-10-20T18:56:20.408665+00:00 node-68 kernel: [1458996.797939]
>> [<ffffffff81722f00>] ? skb_complete_wifi_ack+0xa0/0xe0
>>
>> 2017-10-20T18:56:20.408666+00:00 node-68 kernel: [1458996.797942]
>> [<ffffffff81735fef>] ? __dev_kfree_skb_any+0x2f/0x40
>>
>> 2017-10-20T18:56:20.408679+00:00 node-68 kernel: [1458996.797947]
>> [<ffffffffc0090304>] ? be_get_new_eqd.isra.63+0x124/0x1f0 [be2net]
>>
>> 2017-10-20T18:56:20.408680+00:00 node-68 kernel: [1458996.797949]
>> [<ffffffff81737bb8>] __netif_receive_skb+0x18/0x60
>>
>> 2017-10-20T18:56:20.408681+00:00 node-68 kernel: [1458996.797951]
>> [<ffffffff817389a8>] process_backlog+0xa8/0x150
>>
>> 2017-10-20T18:56:20.408684+00:00 node-68 kernel: [1458996.797954]
>> [<ffffffff817380fe>] net_rx_action+0x21e/0x360
>>
>> 2017-10-20T18:56:20.408685+00:00 node-68 kernel: [1458996.797957]
>> [<ffffffff81085dd1>] __do_softirq+0x101/0x290
>>
>> 2017-10-20T18:56:20.408686+00:00 node-68 kernel: [1458996.797959]
>> [<ffffffff810860d3>] irq_exit+0xa3/0xb0
>>
>> 2017-10-20T18:56:20.408687+00:00 node-68 kernel: [1458996.797963]
>> [<ffffffff81050e03>] smp_call_function_single_interrupt+0x33/0x40
>>
>> 2017-10-20T18:56:20.408687+00:00 node-68 kernel: [1458996.797967]
>> [<ffffffff81844622>] call_function_single_interrupt+0x82/0x90
>>
>> 2017-10-20T18:56:20.408688+00:00 node-68 kernel: [1458996.797968]
>> <EOI>  [<ffffffff816d5ad1>] ? cpuidle_enter_state+0x111/0x2b0
>>
>> 2017-10-20T18:56:20.408691+00:00 node-68 kernel: [1458996.797973]
>> [<ffffffff816d5ca7>] cpuidle_enter+0x17/0x20
>>
>> 2017-10-20T18:56:20.408692+00:00 node-68 kernel: [1458996.797977]
>> [<ffffffff810c4772>] call_cpuidle+0x32/0x60
>>
>> 2017-10-20T18:56:20.408693+00:00 node-68 kernel: [1458996.797979]
>> [<ffffffff816d5c83>] ? cpuidle_select+0x13/0x20
>>
>> 2017-10-20T18:56:20.408694+00:00 node-68 kernel: [1458996.797982]
>> [<ffffffff810c4a30>] cpu_startup_entry+0x290/0x350
>>
>> 2017-10-20T18:56:20.408695+00:00 node-68 kernel: [1458996.797984]
>> [<ffffffff810517b4>] start_secondary+0x154/0x190
>>
>> 2017-10-20T18:56:20.408695+00:00 node-68 kernel: [1458996.797989] ---[
>> end trace d44d42b3ada78269 ]---
>>
>> 2017-10-20T19:00:19.679060+00:00 node-68 kernel: [1459236.052489]
>> ------------[ cut here ]------------
>>
>> 2017-10-20T19:00:19.679080+00:00 node-68 kernel: [1459236.052509]
>> WARNING: CPU: 27 PID: 0 at 
>> /build/linux-YyUNAI/linux-4.4.0/net/core/dev.c:2445
>> skb_warn_bad_offload+0xd1/0x120()
>>
>> 2017-10-20T19:00:19.679081+00:00 node-68 kernel: [1459236.052513]
>> qvofd385f05-cb: caps=(0x00000184075b59e9, 0x0000000000000000) len=2642
>> data_len=0 gso_size=1480 gso_type=6 ip_summed=0
>>
>> 2017-10-20T19:00:19.679082+00:00 node-68 kernel: [1459236.052515]
>> Modules linked in: bonding binfmt_misc nf_conntrack_netlink vhost_net vhost
>> macvtap macvlan xt_mac xt_tcpudp xt_physdev br_netfilter xt_set
>> ip_set_hash_net ip_set nfnetlink veth ip6table_raw ebtable_filter ebtables
>> openvswitch ocfs2 quota_tree ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm
>> ocfs2_nodemanager ocfs2_stackglue configfs ip6table_filter ip6_tables
>> xt_multiport xt_conntrack iptable_filter xt_comment xt_CT iptable_raw
>> ip_tables x_tables xfs ipmi_ssif bridge intel_rapl x86_pkg_temp_thermal
>> intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel
>> aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd joydev
>> 8021q serio_raw input_leds garp mrp stp llc sb_edac edac_core hpilo ioatdma
>> lpc_ich shpchp dca 8250_fintek ipmi_si ipmi_msghandler acpi_power_meter
>> mac_hid kvm_intel kvm irqbypass ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad
>> ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi
>> nf_conntrack_proto_gre nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack_ipv4
>> nf_defrag_ipv4 nf_conntrack autofs4 dm_round_robin raid10 raid456
>> async_raid6_recov async_memcpy async_pq async_xor async_tx xor ses
>> enclosure raid6_pq libcrc32c raid1 raid0 multipath linear uas usb_storage
>> hid_generic usbhid hid psmouse lpfc ahci libahci be2net vxlan
>> scsi_transport_fc ip6_udp_tunnel udp_tunnel wmi fjes scsi_dh_emc
>> scsi_dh_rdac scsi_dh_alua dm_multipath
>>
>> 2017-10-20T19:00:19.679084+00:00 node-68 kernel: [1459236.052606] CPU:
>> 27 PID: 0 Comm: swapper/27 Tainted: G        W       4.4.0-93-generic
>> #116-Ubuntu
>>
>> 2017-10-20T19:00:19.679098+00:00 node-68 kernel: [1459236.052609]
>> Hardware name: HP ProLiant BL460c Gen9, BIOS I36 02/17/2017
>>
>> 2017-10-20T19:00:19.679099+00:00 node-68 kernel: [1459236.052611]
>> 0000000000000286 0fc821d2ff4865f6 ffff88103fc437d0 ffffffff813f9f83
>>
>> 2017-10-20T19:00:19.679114+00:00 node-68 kernel: [1459236.052614]
>> ffff88103fc43818 ffffffff81d6f780 ffff88103fc43808 ffffffff810812f2
>>
>> 2017-10-20T19:00:19.679116+00:00 node-68 kernel: [1459236.052616]
>> ffff880f89469000 ffff880f8e0d1000 0000000000000006 0000000000000006
>>
>> 2017-10-20T19:00:19.679117+00:00 node-68 kernel: [1459236.052619] Call
>> Trace:
>>
>> 2017-10-20T19:00:19.679118+00:00 node-68 kernel: [1459236.052620]
>> <IRQ>  [<ffffffff813f9f83>] dump_stack+0x63/0x90
>>
>> 2017-10-20T19:00:19.679119+00:00 node-68 kernel: [1459236.052629]
>> [<ffffffff810812f2>] warn_slowpath_common+0x82/0xc0
>>
>> 2017-10-20T19:00:19.679119+00:00 node-68 kernel: [1459236.052633]
>> [<ffffffff8108138c>] warn_slowpath_fmt+0x5c/0x80
>>
>> 2017-10-20T19:00:19.679120+00:00 node-68 kernel: [1459236.052637]
>> [<ffffffff814000a2>] ? ___ratelimit+0xa2/0xe0
>>
>> 2017-10-20T19:00:19.679121+00:00 node-68 kernel: [1459236.052639]
>> [<ffffffff81735cd1>] skb_warn_bad_offload+0xd1/0x120
>>
>> 2017-10-20T19:00:19.679122+00:00 node-68 kernel: [1459236.052642]
>> [<ffffffff817393dd>] __skb_gso_segment+0xfd/0x110
>>
>> 2017-10-20T19:00:19.679123+00:00 node-68 kernel: [1459236.052649]
>> [<ffffffffc054927b>] queue_gso_packets+0x5b/0x150 [openvswitch]
>>
>> 2017-10-20T19:00:19.679124+00:00 node-68 kernel: [1459236.052653]
>> [<ffffffffc0572e43>] ? br_nf_forward_ip+0x2a3/0x480 [br_netfilter]
>>
>> 2017-10-20T19:00:19.679124+00:00 node-68 kernel: [1459236.052659]
>> [<ffffffffc009252d>] ? be_xmit_enqueue+0x5bd/0x630 [be2net]
>>
>> 2017-10-20T19:00:19.679125+00:00 node-68 kernel: [1459236.052662]
>> [<ffffffffc009269b>] ? be_xmit_flush+0xfb/0x110 [be2net]
>>
>> 2017-10-20T19:00:19.679126+00:00 node-68 kernel: [1459236.052665]
>> [<ffffffffc00929a0>] ? be_xmit+0x2f0/0x730 [be2net]
>>
>> 2017-10-20T19:00:19.679127+00:00 node-68 kernel: [1459236.052670]
>> [<ffffffffc05494a1>] ovs_dp_upcall+0x31/0x60 [openvswitch]
>>
>> 2017-10-20T19:00:19.679129+00:00 node-68 kernel: [1459236.052673]
>> [<ffffffffc05495da>] ovs_dp_process_packet+0x10a/0x130 [openvswitch]
>>
>> 2017-10-20T19:00:19.679129+00:00 node-68 kernel: [1459236.052678]
>> [<ffffffffc055267c>] ovs_vport_receive+0x6c/0xd0 [openvswitch]
>>
>> 2017-10-20T19:00:19.679130+00:00 node-68 kernel: [1459236.052684]
>> [<ffffffffc07b0eb0>] ? br_fdb_external_learn_del+0x120/0x120 [bridge]
>>
>> 2017-10-20T19:00:19.679131+00:00 node-68 kernel: [1459236.052688]
>> [<ffffffffc07b1196>] ? __br_forward+0xa6/0x130 [bridge]
>>
>> 2017-10-20T19:00:19.679132+00:00 node-68 kernel: [1459236.052693]
>> [<ffffffffc07b1040>] ? deliver_clone+0x50/0x50 [bridge]
>>
>> 2017-10-20T19:00:19.679133+00:00 node-68 kernel: [1459236.052698]
>> [<ffffffffc07b1727>] ? br_forward+0x87/0x90 [bridge]
>>
>> 2017-10-20T19:00:19.679134+00:00 node-68 kernel: [1459236.052702]
>> [<ffffffffc07b2860>] ? br_handle_frame_finish+0x3a0/0x620 [bridge]
>>
>> 2017-10-20T19:00:19.679135+00:00 node-68 kernel: [1459236.052706]
>> [<ffffffff811ee1eb>] ? __slab_free+0xcb/0x2c0
>>
>> 2017-10-20T19:00:19.679135+00:00 node-68 kernel: [1459236.052711]
>> [<ffffffffc07b2c54>] ? br_handle_frame+0x174/0x2b0 [bridge]
>>
>> 2017-10-20T19:00:19.679136+00:00 node-68 kernel: [1459236.052715]
>> [<ffffffffc0553899>] netdev_frame_hook+0xe9/0x150 [openvswitch]
>>
>> 2017-10-20T19:00:19.679137+00:00 node-68 kernel: [1459236.052717]
>> [<ffffffff817374a4>] __netif_receive_skb_core+0x364/0xa60
>>
>> 2017-10-20T19:00:19.679147+00:00 node-68 kernel: [1459236.052721]
>> [<ffffffff81722f00>] ? skb_complete_wifi_ack+0xa0/0xe0
>>
>> 2017-10-20T19:00:19.679148+00:00 node-68 kernel: [1459236.052722]
>> [<ffffffff81735fef>] ? __dev_kfree_skb_any+0x2f/0x40
>>
>> 2017-10-20T19:00:19.679149+00:00 node-68 kernel: [1459236.052723]
>> [<ffffffff81737bb8>] __netif_receive_skb+0x18/0x60
>>
>> 2017-10-20T19:00:19.679149+00:00 node-68 kernel: [1459236.052725]
>> [<ffffffff817389a8>] process_backlog+0xa8/0x150
>>
>> 2017-10-20T19:00:19.679150+00:00 node-68 kernel: [1459236.052726]
>> [<ffffffff817380fe>] net_rx_action+0x21e/0x360
>>
>> 2017-10-20T19:00:19.679155+00:00 node-68 kernel: [1459236.052728]
>> [<ffffffff81085dd1>] __do_softirq+0x101/0x290
>>
>> 2017-10-20T19:00:19.679157+00:00 node-68 kernel: [1459236.052730]
>> [<ffffffff810860d3>] irq_exit+0xa3/0xb0
>>
>> 2017-10-20T19:00:19.679158+00:00 node-68 kernel: [1459236.052733]
>> [<ffffffff81050e03>] smp_call_function_single_interrupt+0x33/0x40
>>
>> 2017-10-20T19:00:19.679158+00:00 node-68 kernel: [1459236.052738]
>> [<ffffffff81844622>] call_function_single_interrupt+0x82/0x90
>>
>> 2017-10-20T19:00:19.679159+00:00 node-68 kernel: [1459236.052739]
>> <EOI>  [<ffffffff816d5ad1>] ? cpuidle_enter_state+0x111/0x2b0
>>
>> 2017-10-20T19:00:19.679160+00:00 node-68 kernel: [1459236.052743]
>> [<ffffffff816d5ca7>] cpuidle_enter+0x17/0x20
>>
>> 2017-10-20T19:00:19.679162+00:00 node-68 kernel: [1459236.052746]
>> [<ffffffff810c4772>] call_cpuidle+0x32/0x60
>>
>> 2017-10-20T19:00:19.679163+00:00 node-68 kernel: [1459236.052747]
>> [<ffffffff816d5c83>] ? cpuidle_select+0x13/0x20
>>
>> 2017-10-20T19:00:19.679163+00:00 node-68 kernel: [1459236.052749]
>> [<ffffffff810c4a30>] cpu_startup_entry+0x290/0x350
>>
>> 2017-10-20T19:00:19.679164+00:00 node-68 kernel: [1459236.052750]
>> [<ffffffff810517b4>] start_secondary+0x154/0x190
>>
>> 2017-10-20T19:00:19.679165+00:00 node-68 kernel: [1459236.052753] ---[
>> end trace d44d42b3ada7826a ]---
>>
>>
>>
>>
>>
>> *node-90*
>>
>> 2017-10-20T18:04:40.933607+00:00 node-90 rsyslogd-2177: imuxsock[pid
>> 5001]: begin to drop messages due to rate-limiting
>>
>> 2017-10-20T18:04:42.868706+00:00 node-90 rsyslogd-2177: imuxsock[pid
>> 5001]: 41 messages lost due to rate-limiting
>>
>> 2017-10-20T18:14:40.927790+00:00 node-90 rsyslogd-2177: imuxsock[pid
>> 5001]: begin to drop messages due to rate-limiting
>>
>> 2017-10-20T18:14:42.537996+00:00 node-90 rsyslogd-2177: imuxsock[pid
>> 5001]: 41 messages lost due to rate-limiting
>>
>> 2017-10-20T18:24:40.921904+00:00 node-90 rsyslogd-2177: imuxsock[pid
>> 5001]: begin to drop messages due to rate-limiting
>>
>> 2017-10-20T18:24:42.091415+00:00 node-90 rsyslogd-2177: imuxsock[pid
>> 5001]: 41 messages lost due to rate-limiting
>>
>>
>>
>> ps -aef | grep 5001
>>
>> ceilome+ 5001 3401 0 Oct19 ? 00:19:09 ceilometer-polling - AgentManager(0)
>>
>>
>>
>>
>>
>> 2017-10-20T18:30:37.734912+00:00 node-90 rsyslogd: [origin
>> software="rsyslogd" swVersion="8.16.0" x-pid="3305" x-info="
>> http://www.rsyslog.com";] exiting on signal 15.
>>
>> 2017-10-20T18:30:37.834236+00:00 node-90 rsyslogd: [origin
>> software="rsyslogd" swVersion="8.16.0" x-pid="21427" x-info="
>> http://www.rsyslog.com";] start
>>
>> 2017-10-20T18:30:37.833919+00:00 node-90 rsyslogd: rsyslogd's groupid
>> changed to 108
>>
>> 2017-10-20T18:30:37.833993+00:00 node-90 rsyslogd: rsyslogd's userid
>> changed to 104
>>
>> 2017-10-20T18:30:37.834050+00:00 node-90 rsyslogd-2357: queue "action 0
>> queue": high water mark is set quite low at 8000. You should only set it
>> below 60% (600000) if you have a good reason for this. [v8.16.0 try
>> http://www.rsyslog.com/e/2357 ]
>>
>>
>>
>> Test starts: Fri Oct 20 18:52:48 2017
>>
>>
>>
>>
>>
>> 2017-10-20T18:56:20.421681+00:00 node-90 kernel: [97344.379555]
>> ------------[ cut here ]------------
>>
>> 2017-10-20T18:56:20.421718+00:00 node-90 kernel: [97344.379563] WARNING:
>> CPU: 30 PID: 18870 at /build/linux-YyUNAI/linux-4.4.0/net/core/dev.c:2445
>> skb_warn_bad_offload+0xd1/0x120()
>>
>> 2017-10-20T18:56:20.421719+00:00 node-90 kernel: [97344.379565]
>> qvo14d5a4ef-47: caps=(0x00000184075b59e9, 0x0000000000000000) len=2531
>> data_len=0 gso_size=1480 gso_type=6 ip_summed=0
>>
>> 2017-10-20T18:56:20.421720+00:00 node-90 kernel: [97344.379567] Modules
>> linked in: vhost_net vhost macvtap macvlan veth nf_conntrack_netlink
>> ip6table_raw xt_mac xt_tcpudp xt_physdev br_netfilter xt_set
>> ip_set_hash_net ip_set nfnetlink ebtable_filter ebtables openvswitch ocfs2
>> quota_tree ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager
>> ocfs2_stackglue configfs ip6table_filter ip6_tables xt_multiport
>> xt_conntrack iptable_filter xt_comment xt_CT iptable_raw ip_tables x_tables
>> xfs bridge 8021q garp mrp stp llc intel_rapl x86_pkg_temp_thermal
>> intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel
>> aesni_intel aes_x86_64 lrw gf128mul hpilo input_leds joydev kvm_intel
>> glue_helper ipmi_ssif kvm ablk_helper cryptd irqbypass ipmi_si shpchp
>> 8250_fintek ipmi_msghandler ioatdma serio_raw sb_edac lpc_ich edac_core dca
>> acpi_power_meter mac_hid ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core
>> ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi
>> nf_conntrack_proto_gre nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack_ipv4
>> nf_defrag_ipv4 nf_conntrack autofs4 raid10 raid456 async_raid6_recov
>> async_memcpy async_pq async_xor async_tx dm_round_robin xor ses enclosure
>> raid6_pq libcrc32c raid1 raid0 multipath linear hid_generic usbhid hid
>> psmouse lpfc ahci libahci be2net vxlan scsi_transport_fc ip6_udp_tunnel
>> udp_tunnel wmi fjes scsi_dh_emc scsi_dh_rdac scsi_dh_alua dm_multipath
>>
>> 2017-10-20T18:56:20.421723+00:00 node-90 kernel: [97344.379625] CPU: 30
>> PID: 18870 Comm: vhost-18868 Not tainted 4.4.0-93-generic #116-Ubuntu
>>
>> 2017-10-20T18:56:20.421871+00:00 node-90 kernel: [97344.379626] Hardware
>> name: HP ProLiant BL460c Gen9, BIOS I36 02/17/2017
>>
>> 2017-10-20T18:56:20.421876+00:00 node-90 kernel: [97344.379627]
>> 0000000000000286 f1812d601dc61f3e ffff88203f2837f0 ffffffff813f9f83
>>
>> 2017-10-20T18:56:20.421877+00:00 node-90 kernel: [97344.379629]
>> ffff88203f283838 ffffffff81d6f780 ffff88203f283828 ffffffff810812f2
>>
>> 2017-10-20T18:56:20.421895+00:00 node-90 kernel: [97344.379630]
>> ffff881fb3dfa700 ffff88202d5d1000 0000000000000006 0000000000000006
>>
>> 2017-10-20T18:56:20.421899+00:00 node-90 kernel: [97344.379632] Call
>> Trace:
>>
>> 2017-10-20T18:56:20.421900+00:00 node-90 kernel: [97344.379634]  <IRQ>
>> [<ffffffff813f9f83>] dump_stack+0x63/0x90
>>
>> 2017-10-20T18:56:20.421901+00:00 node-90 kernel: [97344.379642]
>> [<ffffffff810812f2>] warn_slowpath_common+0x82/0xc0
>>
>> 2017-10-20T18:56:20.421901+00:00 node-90 kernel: [97344.379643]
>> [<ffffffff8108138c>] warn_slowpath_fmt+0x5c/0x80
>>
>> 2017-10-20T18:56:20.421902+00:00 node-90 kernel: [97344.379646]
>> [<ffffffff814000a2>] ? ___ratelimit+0xa2/0xe0
>>
>> 2017-10-20T18:56:20.421904+00:00 node-90 kernel: [97344.379648]
>> [<ffffffff81735cd1>] skb_warn_bad_offload+0xd1/0x120
>>
>> 2017-10-20T18:56:20.421905+00:00 node-90 kernel: [97344.379650]
>> [<ffffffff817393dd>] __skb_gso_segment+0xfd/0x110
>>
>> 2017-10-20T18:56:20.421905+00:00 node-90 kernel: [97344.379656]
>> [<ffffffffc060227b>] queue_gso_packets+0x5b/0x150 [openvswitch]
>>
>> 2017-10-20T18:56:20.421906+00:00 node-90 kernel: [97344.379658]
>> [<ffffffffc04f1e43>] ? br_nf_forward_ip+0x2a3/0x480 [br_netfilter]
>>
>> 2017-10-20T18:56:20.421907+00:00 node-90 kernel: [97344.379660]
>> [<ffffffffc04f1530>] ? br_validate_ipv4.isra.23+0x200/0x200
>> [br_netfilter]
>>
>> 2017-10-20T18:56:20.421907+00:00 node-90 kernel: [97344.379666]
>> [<ffffffff8176e312>] ? nf_iterate+0x62/0x80
>>
>> 2017-10-20T18:56:20.421909+00:00 node-90 kernel: [97344.379668]
>> [<ffffffff8176e3a3>] ? nf_hook_slow+0x73/0xd0
>>
>> 2017-10-20T18:56:20.421910+00:00 node-90 kernel: [97344.379676]
>> [<ffffffffc041c1f4>] ? __br_forward+0x104/0x130 [bridge]
>>
>> 2017-10-20T18:56:20.421911+00:00 node-90 kernel: [97344.379679]
>> [<ffffffffc06024a1>] ovs_dp_upcall+0x31/0x60 [openvswitch]
>>
>> 2017-10-20T18:56:20.421911+00:00 node-90 kernel: [97344.379681]
>> [<ffffffffc06025da>] ovs_dp_process_packet+0x10a/0x130 [openvswitch]
>>
>> 2017-10-20T18:56:20.421912+00:00 node-90 kernel: [97344.379684]
>> [<ffffffffc060b67c>] ovs_vport_receive+0x6c/0xd0 [openvswitch]
>>
>> 2017-10-20T18:56:20.421912+00:00 node-90 kernel: [97344.379685]
>> [<ffffffffc04f21c9>] ? br_nf_pre_routing_finish+0x1a9/0x350
>> [br_netfilter]
>>
>> 2017-10-20T18:56:20.421915+00:00 node-90 kernel: [97344.379688]
>> [<ffffffffc041d4c0>] ? br_handle_local_finish+0xa0/0xa0 [bridge]
>>
>> 2017-10-20T18:56:20.421915+00:00 node-90 kernel: [97344.379690]
>> [<ffffffff8176e312>] ? nf_iterate+0x62/0x80
>>
>> 2017-10-20T18:56:20.421916+00:00 node-90 kernel: [97344.379692]
>> [<ffffffffc04f2f61>] ? br_nf_pre_routing+0x2e1/0x440 [br_netfilter]
>>
>> 2017-10-20T18:56:20.421916+00:00 node-90 kernel: [97344.379693]
>> [<ffffffffc04f2020>] ? br_nf_forward_ip+0x480/0x480 [br_netfilter]
>>
>> 2017-10-20T18:56:20.421917+00:00 node-90 kernel: [97344.379696]
>> [<ffffffffc041dcba>] ? br_handle_frame+0x1da/0x2b0 [bridge]
>>
>> 2017-10-20T18:56:20.421917+00:00 node-90 kernel: [97344.379699]
>> [<ffffffffc060c899>] netdev_frame_hook+0xe9/0x150 [openvswitch]
>>
>> 2017-10-20T18:56:20.421920+00:00 node-90 kernel: [97344.379700]
>> [<ffffffff817374a4>] __netif_receive_skb_core+0x364/0xa60
>>
>> 2017-10-20T18:56:20.421920+00:00 node-90 kernel: [97344.379702]
>> [<ffffffff81737bb8>] __netif_receive_skb+0x18/0x60
>>
>> 2017-10-20T18:56:20.421921+00:00 node-90 kernel: [97344.379703]
>> [<ffffffff817389a8>] process_backlog+0xa8/0x150
>>
>> 2017-10-20T18:56:20.421928+00:00 node-90 kernel: [97344.379704]
>> [<ffffffff817380fe>] net_rx_action+0x21e/0x360
>>
>> 2017-10-20T18:56:20.421929+00:00 node-90 kernel: [97344.379706]
>> [<ffffffff81085dd1>] __do_softirq+0x101/0x290
>>
>> 2017-10-20T18:56:20.421929+00:00 node-90 kernel: [97344.379709]
>> [<ffffffff81844f0c>] do_softirq_own_stack+0x1c/0x30
>>
>> 2017-10-20T18:56:20.421931+00:00 node-90 kernel: [97344.379710]  <EOI>
>> [<ffffffff81085818>] do_softirq.part.19+0x38/0x40
>>
>> 2017-10-20T18:56:20.421932+00:00 node-90 kernel: [97344.379713]
>> [<ffffffff81085fcd>] do_softirq+0x1d/0x20
>>
>> 2017-10-20T18:56:20.421932+00:00 node-90 kernel: [97344.379714]
>> [<ffffffff817367b3>] netif_rx_ni+0x33/0x80
>>
>> 2017-10-20T18:56:20.421933+00:00 node-90 kernel: [97344.379718]
>> [<ffffffff816063e6>] tun_get_user+0x506/0x880
>>
>> 2017-10-20T18:56:20.421933+00:00 node-90 kernel: [97344.379720]
>> [<ffffffff816067b1>] tun_sendmsg+0x51/0x70
>>
>> 2017-10-20T18:56:20.421934+00:00 node-90 kernel: [97344.379723]
>> [<ffffffffc0544f56>] handle_tx+0x306/0x4e0 [vhost_net]
>>
>> 2017-10-20T18:56:20.421938+00:00 node-90 kernel: [97344.379726]
>> [<ffffffffc0545165>] handle_tx_kick+0x15/0x20 [vhost_net]
>>
>> 2017-10-20T18:56:20.421938+00:00 node-90 kernel: [97344.379730]
>> [<ffffffffc052f723>] vhost_worker+0xf3/0x190 [vhost]
>>
>> 2017-10-20T18:56:20.421939+00:00 node-90 kernel: [97344.379733]
>> [<ffffffffc052f630>] ? vhost_poll_wakeup+0x30/0x30 [vhost]
>>
>> 2017-10-20T18:56:20.421939+00:00 node-90 kernel: [97344.379736]
>> [<ffffffff810a0c95>] kthread+0xe5/0x100
>>
>> 2017-10-20T18:56:20.421940+00:00 node-90 kernel: [97344.379738]
>> [<ffffffff810a0bb0>] ? kthread_create_on_node+0x1e0/0x1e0
>>
>> 2017-10-20T18:56:20.421942+00:00 node-90 kernel: [97344.379740]
>> [<ffffffff8184358f>] ret_from_fork+0x3f/0x70
>>
>> 2017-10-20T18:56:20.421943+00:00 node-90 kernel: [97344.379742]
>> [<ffffffff810a0bb0>] ? kthread_create_on_node+0x1e0/0x1e0
>>
>> 2017-10-20T18:56:20.421943+00:00 node-90 kernel: [97344.379743] ---[ end
>> trace d7e73079b38e57b3 ]---
>>
>> 2017-10-20T19:00:19.698016+00:00 node-90 kernel: [97583.653007]
>> ------------[ cut here ]------------
>>
>> 2017-10-20T19:00:19.698034+00:00 node-90 kernel: [97583.653016] WARNING:
>> CPU: 2 PID: 18870 at /build/linux-YyUNAI/linux-4.4.0/net/core/dev.c:2445
>> skb_warn_bad_offload+0xd1/0x120()
>>
>> 2017-10-20T19:00:19.698036+00:00 node-90 kernel: [97583.653018]
>> qvo14d5a4ef-47: caps=(0x00000184075b59e9, 0x0000000000000000) len=2531
>> data_len=0 gso_size=1480 gso_type=6 ip_summed=0
>>
>> 2017-10-20T19:00:19.698037+00:00 node-90 kernel: [97583.653019] Modules
>> linked in: vhost_net vhost macvtap macvlan veth nf_conntrack_netlink
>> ip6table_raw xt_mac xt_tcpudp xt_physdev br_netfilter xt_set
>> ip_set_hash_net ip_set nfnetlink ebtable_filter ebtables openvswitch ocfs2
>> quota_tree ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager
>> ocfs2_stackglue configfs ip6table_filter ip6_tables xt_multiport
>> xt_conntrack iptable_filter xt_comment xt_CT iptable_raw ip_tables x_tables
>> xfs bridge 8021q garp mrp stp llc intel_rapl x86_pkg_temp_thermal
>> intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel
>> aesni_intel aes_x86_64 lrw gf128mul hpilo input_leds joydev kvm_intel
>> glue_helper ipmi_ssif kvm ablk_helper cryptd irqbypass ipmi_si shpchp
>> 8250_fintek ipmi_msghandler ioatdma serio_raw sb_edac lpc_ich edac_core dca
>> acpi_power_meter mac_hid ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core
>> ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi
>> nf_conntrack_proto_gre nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack_ipv4
>> nf_defrag_ipv4 nf_conntrack autofs4 raid10 raid456 async_raid6_recov
>> async_memcpy async_pq async_xor async_tx dm_round_robin xor ses enclosure
>> raid6_pq libcrc32c raid1 raid0 multipath linear hid_generic usbhid hid
>> psmouse lpfc ahci libahci be2net vxlan scsi_transport_fc ip6_udp_tunnel
>> udp_tunnel wmi fjes scsi_dh_emc scsi_dh_rdac scsi_dh_alua dm_multipath
>>
>> 2017-10-20T19:00:19.698046+00:00 node-90 kernel: [97583.653082] CPU: 2
>> PID: 18870 Comm: vhost-18868 Tainted: G        W       4.4.0-93-generic
>> #116-Ubuntu
>>
>> 2017-10-20T19:00:19.698048+00:00 node-90 kernel: [97583.653083] Hardware
>> name: HP ProLiant BL460c Gen9, BIOS I36 02/17/2017
>>
>> 2017-10-20T19:00:19.698049+00:00 node-90 kernel: [97583.653084]
>> 0000000000000286 f1812d601dc61f3e ffff88103f8837f0 ffffffff813f9f83
>>
>> 2017-10-20T19:00:19.698064+00:00 node-90 kernel: [97583.653086]
>> ffff88103f883838 ffffffff81d6f780 ffff88103f883828 ffffffff810812f2
>>
>> 2017-10-20T19:00:19.698065+00:00 node-90 kernel: [97583.653088]
>> ffff881034e2fe00 ffff88202d5d1000 0000000000000006 0000000000000006
>>
>> 2017-10-20T19:00:19.698067+00:00 node-90 kernel: [97583.653090] Call
>> Trace:
>>
>> 2017-10-20T19:00:19.698069+00:00 node-90 kernel: [97583.653091]  <IRQ>
>> [<ffffffff813f9f83>] dump_stack+0x63/0x90
>>
>> 2017-10-20T19:00:19.698069+00:00 node-90 kernel: [97583.653098]
>> [<ffffffff810812f2>] warn_slowpath_common+0x82/0xc0
>>
>> 2017-10-20T19:00:19.698070+00:00 node-90 kernel: [97583.653100]
>> [<ffffffff8108138c>] warn_slowpath_fmt+0x5c/0x80
>>
>> 2017-10-20T19:00:19.698071+00:00 node-90 kernel: [97583.653102]
>> [<ffffffff814000a2>] ? ___ratelimit+0xa2/0xe0
>>
>> 2017-10-20T19:00:19.698071+00:00 node-90 kernel: [97583.653103]
>> [<ffffffff81735cd1>] skb_warn_bad_offload+0xd1/0x120
>>
>> 2017-10-20T19:00:19.698073+00:00 node-90 kernel: [97583.653105]
>> [<ffffffff817393dd>] __skb_gso_segment+0xfd/0x110
>>
>> 2017-10-20T19:00:19.698074+00:00 node-90 kernel: [97583.653111]
>> [<ffffffffc060227b>] queue_gso_packets+0x5b/0x150 [openvswitch]
>>
>> 2017-10-20T19:00:19.698075+00:00 node-90 kernel: [97583.653114]
>> [<ffffffffc04f1e43>] ? br_nf_forward_ip+0x2a3/0x480 [br_netfilter]
>>
>> 2017-10-20T19:00:19.698076+00:00 node-90 kernel: [97583.653116]
>> [<ffffffffc04f1530>] ? br_validate_ipv4.isra.23+0x200/0x200
>> [br_netfilter]
>>
>> 2017-10-20T19:00:19.698076+00:00 node-90 kernel: [97583.653120]
>> [<ffffffff8176e312>] ? nf_iterate+0x62/0x80
>>
>> 2017-10-20T19:00:19.698077+00:00 node-90 kernel: [97583.653122]
>> [<ffffffff8176e3a3>] ? nf_hook_slow+0x73/0xd0
>>
>> 2017-10-20T19:00:19.698079+00:00 node-90 kernel: [97583.653128]
>> [<ffffffffc041c1f4>] ? __br_forward+0x104/0x130 [bridge]
>>
>> 2017-10-20T19:00:19.698080+00:00 node-90 kernel: [97583.653131]
>> [<ffffffffc06024a1>] ovs_dp_upcall+0x31/0x60 [openvswitch]
>>
>> 2017-10-20T19:00:19.698081+00:00 node-90 kernel: [97583.653133]
>> [<ffffffffc06025da>] ovs_dp_process_packet+0x10a/0x130 [openvswitch]
>>
>> 2017-10-20T19:00:19.698081+00:00 node-90 kernel: [97583.653136]
>> [<ffffffffc060b67c>] ovs_vport_receive+0x6c/0xd0 [openvswitch]
>>
>> 2017-10-20T19:00:19.698082+00:00 node-90 kernel: [97583.653138]
>> [<ffffffffc04f21c9>] ? br_nf_pre_routing_finish+0x1a9/0x350
>> [br_netfilter]
>>
>> 2017-10-20T19:00:19.698083+00:00 node-90 kernel: [97583.653141]
>> [<ffffffffc041d4c0>] ? br_handle_local_finish+0xa0/0xa0 [bridge]
>>
>> 2017-10-20T19:00:19.698084+00:00 node-90 kernel: [97583.653143]
>> [<ffffffff8176e312>] ? nf_iterate+0x62/0x80
>>
>> 2017-10-20T19:00:19.698085+00:00 node-90 kernel: [97583.653144]
>> [<ffffffffc04f2f61>] ? br_nf_pre_routing+0x2e1/0x440 [br_netfilter]
>>
>> 2017-10-20T19:00:19.698086+00:00 node-90 kernel: [97583.653146]
>> [<ffffffffc04f2020>] ? br_nf_forward_ip+0x480/0x480 [br_netfilter]
>>
>> 2017-10-20T19:00:19.698086+00:00 node-90 kernel: [97583.653149]
>> [<ffffffffc041dcba>] ? br_handle_frame+0x1da/0x2b0 [bridge]
>>
>> 2017-10-20T19:00:19.698087+00:00 node-90 kernel: [97583.653152]
>> [<ffffffffc060c899>] netdev_frame_hook+0xe9/0x150 [openvswitch]
>>
>> 2017-10-20T19:00:19.698088+00:00 node-90 kernel: [97583.653154]
>> [<ffffffff817374a4>] __netif_receive_skb_core+0x364/0xa60
>>
>> 2017-10-20T19:00:19.698099+00:00 node-90 kernel: [97583.653156]
>> [<ffffffff8105a003>] ? x2apic_send_IPI_mask+0x13/0x20
>>
>> 2017-10-20T19:00:19.698099+00:00 node-90 kernel: [97583.653159]
>> [<ffffffff810508ba>] ? native_send_call_func_single_ipi+0x3a/0x40
>>
>> 2017-10-20T19:00:19.698100+00:00 node-90 kernel: [97583.653163]
>> [<ffffffff811046a5>] ? generic_exec_single+0x85/0x120
>>
>> 2017-10-20T19:00:19.698101+00:00 node-90 kernel: [97583.653167]
>> [<ffffffffc00a01f0>] ? be_eq_notify+0x60/0x70 [be2net]
>>
>> 2017-10-20T19:00:19.698101+00:00 node-90 kernel: [97583.653168]
>> [<ffffffff81737bb8>] __netif_receive_skb+0x18/0x60
>>
>> 2017-10-20T19:00:19.698102+00:00 node-90 kernel: [97583.653170]
>> [<ffffffff817389a8>] process_backlog+0xa8/0x150
>>
>> 2017-10-20T19:00:19.698104+00:00 node-90 kernel: [97583.653171]
>> [<ffffffff817380fe>] net_rx_action+0x21e/0x360
>>
>> 2017-10-20T19:00:19.698105+00:00 node-90 kernel: [97583.653173]
>> [<ffffffff81085dd1>] __do_softirq+0x101/0x290
>>
>> 2017-10-20T19:00:19.698106+00:00 node-90 kernel: [97583.653175]
>> [<ffffffff81844f0c>] do_softirq_own_stack+0x1c/0x30
>>
>> 2017-10-20T19:00:19.698107+00:00 node-90 kernel: [97583.653176]  <EOI>
>> [<ffffffff81085818>] do_softirq.part.19+0x38/0x40
>>
>> 2017-10-20T19:00:19.698108+00:00 node-90 kernel: [97583.653179]
>> [<ffffffff81085fcd>] do_softirq+0x1d/0x20
>>
>> 2017-10-20T19:00:19.698110+00:00 node-90 kernel: [97583.653181]
>> [<ffffffff817367b3>] netif_rx_ni+0x33/0x80
>>
>> 2017-10-20T19:00:19.698111+00:00 node-90 kernel: [97583.653184]
>> [<ffffffff816063e6>] tun_get_user+0x506/0x880
>>
>> 2017-10-20T19:00:19.698112+00:00 node-90 kernel: [97583.653185]
>> [<ffffffff816067b1>] tun_sendmsg+0x51/0x70
>>
>> 2017-10-20T19:00:19.698112+00:00 node-90 kernel: [97583.653188]
>> [<ffffffffc0544f56>] handle_tx+0x306/0x4e0 [vhost_net]
>>
>> 2017-10-20T19:00:19.698113+00:00 node-90 kernel: [97583.653190]
>> [<ffffffffc0545165>] handle_tx_kick+0x15/0x20 [vhost_net]
>>
>> 2017-10-20T19:00:19.698113+00:00 node-90 kernel: [97583.653193]
>> [<ffffffffc052f723>] vhost_worker+0xf3/0x190 [vhost]
>>
>> 2017-10-20T19:00:19.698115+00:00 node-90 kernel: [97583.653195]
>> [<ffffffffc052f630>] ? vhost_poll_wakeup+0x30/0x30 [vhost]
>>
>> 2017-10-20T19:00:19.698116+00:00 node-90 kernel: [97583.653198]
>> [<ffffffff810a0c95>] kthread+0xe5/0x100
>>
>> 2017-10-20T19:00:19.698117+00:00 node-90 kernel: [97583.653199]
>> [<ffffffff810a0bb0>] ? kthread_create_on_node+0x1e0/0x1e0
>>
>> 2017-10-20T19:00:19.698117+00:00 node-90 kernel: [97583.653203]
>> [<ffffffff8184358f>] ret_from_fork+0x3f/0x70
>>
>> 2017-10-20T19:00:19.698118+00:00 node-90 kernel: [97583.653204]
>> [<ffffffff810a0bb0>] ? kthread_create_on_node+0x1e0/0x1e0
>>
>> 2017-10-20T19:00:19.698123+00:00 node-90 kernel: [97583.653206] ---[ end
>> trace d7e73079b38e57b4 ]---
>>
>> -- Jim
>>
>> On Wed, Oct 18, 2017 at 11:37 PM, Jim Okken <j...@jokken.com> wrote:
>>
>>> hi all,
>>>
>>> please help us out with an issue we are seeing on multiple compute nodes
>>> running Newton (Ubuntu 16.04.3, kernel 4.4.0). After about 1 hour of running
>>> our VOIP test application, the instances become non-responsive and can't be
>>> pinged, and neither can the compute nodes.
>>>
>>> messages appear on the compute node console screens. a screenshot of
>>> them is hosted here:
>>>
>>> http://www.jokken.com/downloads/console.png
>>>
>>> i'll try to attach it also.
>>>
>>> The first compute node this was seen on was running 2 instances; the
>>> second was running only 1 instance. They were using only a portion of the
>>> total 40 vCPUs available, and the load was moderate. Cold booting these
>>> nodes makes all well again, until we run our application for about 1 hour.
>>>
>>> please let us know what you think. thanks!
>>>
>>> not a lot is shown in the DEBUG logging of Nova and Neutron on the
>>> compute node
>>>
>>> these logs are here:
>>>
>>> http://www.jokken.com/downloads/logs.zip
>>>
>>> i'll try to attach them too.
>>>
>>> https://ask.openstack.org/en/question/110748/soft-lockup-on-newton-compute-nodes/
>>>
>>> /var/log/messages on the compute node shows many repeats of these
>>> messages:
>>>
>>> 2017-10-18T20:49:26.462309+00:00 node-58 kernel: [1297007.624935]
>>> Modules linked in: binfmt_misc nf_conntrack_netlink vhost_net vhost macvtap
>>> macvlan ip6table_raw xt_mac xt_tcpudp xt_physdev br_netfilter xt_set
>>> ip_set_hash_net ip_set nfnetlink veth ebtable_filter ebtables openvswitch
>>> ocfs2 quota_tree ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager
>>> ocfs2_stackglue configfs ip6table_filter ip6_tables xt_multiport
>>> xt_conntrack iptable_filter xt_comment xt_CT iptable_raw ip_tables x_tables
>>> xfs ipmi_ssif 8021q garp mrp intel_rapl x86_pkg_temp_thermal
>>> intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel
>>> aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd
>>> serio_raw bridge stp llc sb_edac edac_core hpilo ioatdma lpc_ich shpchp dca
>>> ipmi_si 8250_fintek ipmi_msghandler acpi_power_meter mac_hid kvm_intel kvm
>>> irqbypass ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr
>>> iscsi_tcp libiscsi_tcp nf_conntrack_proto_gre nf_conntrack_ipv6
>>> nf_defrag_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack autofs4 raid10
>>> raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor
>>> raid6_pq libcrc32c raid1 raid0 multipath linear dm_round_robin ses
>>> enclosure uas usb_storage psmouse ahci lpfc be2iscsi libahci be2net
>>> iscsi_boot_sysfs libiscsi vxlan scsi_transport_fc ip6_udp_tunnel
>>> scsi_transport_iscsi udp_tunnel wmi fjes scsi_dh_emc scsi_dh_rdac
>>> scsi_dh_alua dm_multipath
>>>
>>> 2017-10-18T20:49:26.462311+00:00 node-58 kernel: [1297007.625008] CPU:
>>> 27 PID: 860 Comm: qemu-system-x86 Not tainted 4.4.0-93-generic #116-Ubuntu
>>>
>>> 2017-10-18T20:49:26.462313+00:00 node-58 kernel: [1297007.625009]
>>> Hardware name: HP ProLiant BL460c Gen9, BIOS I36 02/17/2017
>>>
>>> 2017-10-18T20:49:26.462314+00:00 node-58 kernel: [1297007.625010] task:
>>> ffff881faaaa7000 ti: ffff881fa3a34000 task.ti: ffff881fa3a34000
>>>
>>> 2017-10-18T20:49:26.462315+00:00 node-58 kernel: [1297007.625011] RIP:
>>> 0010:[<ffffffff810cb29c>]  [<ffffffff810cb29c>]
>>> native_queued_spin_lock_slowpath+0x15c/0x170
>>>
>>> 2017-10-18T20:49:26.462316+00:00 node-58 kernel: [1297007.625018] RSP:
>>> 0018:ffff883fff143c30  EFLAGS: 00000202
>>>
>>> 2017-10-18T20:49:26.462317+00:00 node-58 kernel: [1297007.625019] RAX:
>>> 0000000000000101 RBX: ffff881f677603f0 RCX: 0000000000000001
>>>
>>> 2017-10-18T20:49:26.462337+00:00 node-58 kernel: [1297007.625020] RDX:
>>> 0000000000000101 RSI: 0000000000000001 RDI: ffff881f677603ec
>>>
>>> 2017-10-18T20:49:26.462340+00:00 node-58 kernel: [1297007.625020] RBP:
>>> ffff883fff143c30 R08: 0000000000000101 R09: ffffffff81191e27
>>>
>>> 2017-10-18T20:49:26.462341+00:00 node-58 kernel: [1297007.625021] R10:
>>> ffffea00ffb09780 R11: 0000000000000a00 R12: ffff881f677603ec
>>>
>>> 2017-10-18T20:49:26.462342+00:00 node-58 kernel: [1297007.625022] R13:
>>> 0000000000000a00 R14: 00000000000a5000 R15: 0000000000000a00
>>>
>>> 2017-10-18T20:49:26.462343+00:00 node-58 kernel: [1297007.625023] FS:
>>> 00007f0c53fb3c00(0000) GS:ffff883fff140000(0000) knlGS:0000000000000000
>>>
>>> 2017-10-18T20:49:26.462343+00:00 node-58 kernel: [1297007.625024] CS:
>>> 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>
>>> 2017-10-18T20:49:26.462344+00:00 node-58 kernel: [1297007.625025] CR2:
>>> 00007fe018e2547e CR3: 0000003ec0b75000 CR4: 00000000001426e0
>>>
>>> 2017-10-18T20:49:26.462345+00:00 node-58 kernel: [1297007.625026] Stack:
>>>
>>> 2017-10-18T20:49:26.462347+00:00 node-58 kernel: [1297007.625026]
>>> ffff883fff143c40 ffffffff81842f71 ffff883fff143c60 ffffffff81841085
>>>
>>> 2017-10-18T20:49:26.462348+00:00 node-58 kernel: [1297007.625028]
>>> ffff881dc609ac00 ffff881f677604b0 ffff883fff143c70 ffffffff818410cb
>>>
>>> 2017-10-18T20:49:26.462349+00:00 node-58 kernel: [1297007.625029]
>>> ffff883fff143ca0 ffffffffc08c658d ffff883feff9d500 0000000000000a00
>>>
>>> 2017-10-18T20:49:26.462351+00:00 node-58 kernel: [1297007.625031] Call
>>> Trace:
>>>
>>> 2017-10-18T20:49:26.462353+00:00 node-58 kernel: [1297007.625032]
>>> <IRQ>
>>>
>>> 2017-10-18T20:49:26.462354+00:00 node-58 kernel: [1297007.625039]
>>> [<ffffffff81842f71>] _raw_spin_lock+0x21/0x30
>>>
>>> 2017-10-18T20:49:26.462356+00:00 node-58 kernel: [1297007.625041]
>>> [<ffffffff81841085>] __mutex_unlock_slowpath+0x25/0x50
>>>
>>> 2017-10-18T20:49:26.462356+00:00 node-58 kernel: [1297007.625042]
>>> [<ffffffff818410cb>] mutex_unlock+0x1b/0x20
>>>
>>> 2017-10-18T20:49:26.462357+00:00 node-58 kernel: [1297007.625076]
>>> [<ffffffffc08c658d>] ocfs2_dio_end_io+0x6d/0x80 [ocfs2]
>>>
>>> 2017-10-18T20:49:26.462358+00:00 node-58 kernel: [1297007.625080]
>>> [<ffffffff8124d34c>] dio_complete+0x11c/0x1c0
>>>
>>> 2017-10-18T20:49:26.462359+00:00 node-58 kernel: [1297007.625081]
>>> [<ffffffff8124d463>] dio_bio_end_aio+0x73/0x100
>>>
>>> 2017-10-18T20:49:26.462361+00:00 node-58 kernel: [1297007.625085]
>>> [<ffffffff813c2b9f>] bio_endio+0x3f/0x60
>>>
>>> 2017-10-18T20:49:26.462362+00:00 node-58 kernel: [1297007.625087]
>>> [<ffffffff813ca547>] blk_update_request+0x87/0x310
>>>
>>> 2017-10-18T20:49:26.462363+00:00 node-58 kernel: [1297007.625091]
>>> [<ffffffff816bae96>] end_clone_bio+0x46/0x70
>>>
>>> 2017-10-18T20:49:26.462363+00:00 node-58 kernel: [1297007.625092]
>>> [<ffffffff813c2b9f>] bio_endio+0x3f/0x60
>>>
>>> 2017-10-18T20:49:26.462364+00:00 node-58 kernel: [1297007.625093]
>>> [<ffffffff813ca547>] blk_update_request+0x87/0x310
>>>
>>> 2017-10-18T20:49:26.462365+00:00 node-58 kernel: [1297007.625097]
>>> [<ffffffff815c4583>] scsi_end_request+0x33/0x1d0
>>>
>>> 2017-10-18T20:49:26.462367+00:00 node-58 kernel: [1297007.625100]
>>> [<ffffffff815c7cb6>] scsi_io_completion+0x1b6/0x690
>>>
>>> 2017-10-18T20:49:26.462368+00:00 node-58 kernel: [1297007.625104]
>>> [<ffffffff810beb66>] ? rebalance_domains+0x166/0x2d0
>>>
>>> 2017-10-18T20:49:26.462368+00:00 node-58 kernel: [1297007.625107]
>>> [<ffffffff815be8df>] scsi_finish_command+0xcf/0x120
>>>
>>> 2017-10-18T20:49:26.462377+00:00 node-58 kernel: [1297007.625109]
>>> [<ffffffff815c7444>] scsi_softirq_done+0x124/0x150
>>>
>>> 2017-10-18T20:49:26.462378+00:00 node-58 kernel: [1297007.625112]
>>> [<ffffffff813d2437>] blk_done_softirq+0x87/0xb0
>>>
>>> 2017-10-18T20:49:26.462379+00:00 node-58 kernel: [1297007.625116]
>>> [<ffffffff81085dd1>] __do_softirq+0x101/0x290
>>>
>>> 2017-10-18T20:49:26.462381+00:00 node-58 kernel: [1297007.625118]
>>> [<ffffffff810860d3>] irq_exit+0xa3/0xb0
>>>
>>> 2017-10-18T20:49:26.462382+00:00 node-58 kernel: [1297007.625121]
>>> [<ffffffff81050e03>] smp_call_function_single_interrupt+0x33/0x40
>>>
>>> 2017-10-18T20:49:26.462382+00:00 node-58 kernel: [1297007.625124]
>>> [<ffffffff81844622>] call_function_single_interrupt+0x82/0x90
>>>
>>> 2017-10-18T20:49:26.462383+00:00 node-58 kernel: [1297007.625125]
>>> <EOI>
>>>
>>> 2017-10-18T20:49:26.462383+00:00 node-58 kernel: [1297007.625127]
>>> [<ffffffff81842f64>] ? _raw_spin_lock+0x14/0x30
>>>
>>> 2017-10-18T20:49:26.462385+00:00 node-58 kernel: [1297007.625129]
>>> [<ffffffff81840f72>] __mutex_lock_slowpath+0x72/0x130
>>>
>>> 2017-10-18T20:49:26.462387+00:00 node-58 kernel: [1297007.625142]
>>> [<ffffffffc08dd099>] ? ocfs2_inode_unlock+0x119/0x120 [ocfs2]
>>>
>>> 2017-10-18T20:49:26.462387+00:00 node-58 kernel: [1297007.625143]
>>> [<ffffffff8184104f>] mutex_lock+0x1f/0x30
>>>
>>> 2017-10-18T20:49:26.462388+00:00 node-58 kernel: [1297007.625155]
>>> [<ffffffffc08e677a>] ocfs2_file_write_iter+0x95a/0xdf0 [ocfs2]
>>>
>>> 2017-10-18T20:49:26.462388+00:00 node-58 kernel: [1297007.625158]
>>> [<ffffffff81224090>] ? poll_select_copy_remaining+0x140/0x140
>>>
>>> 2017-10-18T20:49:26.462389+00:00 node-58 kernel: [1297007.625169]
>>> [<ffffffffc08e5e20>] ? ocfs2_check_range_for_refcount+0x150/0x150
>>> [ocfs2]
>>>
>>> 2017-10-18T20:49:26.462391+00:00 node-58 kernel: [1297007.625171]
>>> [<ffffffff812601ba>] aio_run_iocb+0x26a/0x2d0
>>>
>>> 2017-10-18T20:49:26.462392+00:00 node-58 kernel: [1297007.625174]
>>> [<ffffffff8122d6b5>] ? __fget_light+0x25/0x60
>>>
>>> 2017-10-18T20:49:26.462394+00:00 node-58 kernel: [1297007.625175]
>>> [<ffffffff8122d703>] ? __fdget+0x13/0x20
>>>
>>> 2017-10-18T20:49:26.462395+00:00 node-58 kernel: [1297007.625177]
>>> [<ffffffff8126108f>] do_io_submit+0x25f/0x500
>>>
>>> 2017-10-18T20:49:26.462396+00:00 node-58 kernel: [1297007.625178]
>>> [<ffffffff81261340>] SyS_io_submit+0x10/0x20
>>>
>>> 2017-10-18T20:49:26.462398+00:00 node-58 kernel: [1297007.625181]
>>> [<ffffffff818431f2>] entry_SYSCALL_64_fastpath+0x16/0x71
>>>
>>> 2017-10-18T20:49:26.462399+00:00 node-58 kernel: [1297007.625181] Code:
>>> 01 48 8b 02 48 85 c0 75 0a f3 90 48 8b 02 48 85 c0 74 f6 c7 40 08 01 00 00
>>> 00 e9 63 ff ff ff 83 fa 01 75 07 e9 c4 fe ff ff f3 90 <8b> 07 84 c0 75 f8
>>> b8 01 00 00 00 66 89 07 5d c3 0f 1f 40 00 0f
>>>
>>>
>>>
>>
>
_______________________________________________
Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
Post to     : openstack@lists.openstack.org
Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
