[ https://ovirt-jira.atlassian.net/browse/OVIRT-736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Evgheni Dereveanchin reassigned OVIRT-736:
------------------------------------------

    Assignee: Evgheni Dereveanchin  (was: infra)

> soft lockup on el7-vm25
> -----------------------
>
>                 Key: OVIRT-736
>                 URL: https://ovirt-jira.atlassian.net/browse/OVIRT-736
>             Project: oVirt - virtualization made easy
>          Issue Type: Bug
>            Reporter: Evgheni Dereveanchin
>            Assignee: Evgheni Dereveanchin
>
> I've noticed some slaves going offline in Jenkins with 100% CPU reported on
> the Engine. They eventually return to a normal state. I checked the logs on
> el7-vm25.phx.ovirt.org, which had these symptoms, and there seems to be a
> soft lockup involving the qemu-kvm process:
>
> Sep 21 04:57:18 el7-vm25 kernel: BUG: soft lockup - CPU#0 stuck for 22s! [qemu-kvm:13768]
> Sep 21 04:57:18 el7-vm25 kernel: Modules linked in: nls_utf8 isofs loop dm_mod xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun ip6t_rpfilter ip6t_REJECT ipt_REJECT xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw iptable_filter aesni_intel lrw gf128mul glue_helper ppdev ablk_helper cryptd sg pcspkr parport_pc parport i2c_piix4 kvm_intel nfsd kvm auth_rpcgss nfs_acl lockd grace sunrpc ip_tables ext4 mbcache jbd2 sr_mod cdrom ata_generic pata_acpi virtio_blk virtio_console virtio_scsi virtio_net qxl syscopyarea sysfillrect sysimgblt drm_kms_helper
> Sep 21 04:57:18 el7-vm25 kernel: ttm ata_piix crc32c_intel libata serio_raw virtio_pci virtio_ring virtio drm i2c_core floppy
> Sep 21 04:57:18 el7-vm25 kernel: CPU: 0 PID: 13768 Comm: qemu-kvm Not tainted 3.10.0-327.28.3.el7.x86_64 #1
> Sep 21 04:57:18 el7-vm25 kernel: Hardware name: oVirt oVirt Node, BIOS 0.5.1 01/01/2011
> Sep 21 04:57:18 el7-vm25 kernel: task: ffff880210017300 ti: ffff8800363f8000 task.ti: ffff8800363f8000
> Sep 21 04:57:18 el7-vm25 kernel: RIP: 0010:[<ffffffff810e69da>] [<ffffffff810e69da>] generic_exec_single+0xfa/0x1a0
> Sep 21 04:57:18 el7-vm25 kernel: RSP: 0018:ffff8800363fbc40 EFLAGS: 00000202
> Sep 21 04:57:18 el7-vm25 kernel: RAX: 0000000000000020 RBX: ffff8800363fbc10 RCX: 0000000000000020
> Sep 21 04:57:18 el7-vm25 kernel: RDX: 00000000ffffffff RSI: 0000000000000020 RDI: 0000000000000282
> Sep 21 04:57:18 el7-vm25 kernel: RBP: ffff8800363fbc88 R08: ffffffff8165fbe0 R09: ffffea000357c4c0
> Sep 21 04:57:18 el7-vm25 kernel: R10: 0000000000003496 R11: 0000000000000206 R12: ffff880210017300
> Sep 21 04:57:18 el7-vm25 kernel: R13: ffff880210017300 R14: 0000000000000001 R15: ffff880210017300
> Sep 21 04:57:18 el7-vm25 kernel: FS: 00007fe7b288e700(0000) GS:ffff880216e00000(0000) knlGS:0000000000000000
> Sep 21 04:57:18 el7-vm25 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> Sep 21 04:57:18 el7-vm25 kernel: CR2: 00000000ffffffff CR3: 0000000211bcb000 CR4: 00000000000026f0
> Sep 21 04:57:18 el7-vm25 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> Sep 21 04:57:18 el7-vm25 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Sep 21 04:57:18 el7-vm25 kernel: Stack:
> Sep 21 04:57:18 el7-vm25 kernel: 0000000000000000 0000000000000000 ffffffff81065c90 ffff8800363fbd10
> Sep 21 04:57:18 el7-vm25 kernel: 0000000000000003 000000009347ffdf 0000000000000001 ffffffff81065c90
> Sep 21 04:57:18 el7-vm25 kernel: ffffffff81065c90 ffff8800363fbcb8 ffffffff810e6adf ffff8800363fbcb8
> Sep 21 04:57:18 el7-vm25 kernel: Call Trace:
> Sep 21 04:57:18 el7-vm25 kernel: [<ffffffff81065c90>] ? leave_mm+0x70/0x70
> Sep 21 04:57:18 el7-vm25 kernel: [<ffffffff81065c90>] ? leave_mm+0x70/0x70
> Sep 21 04:57:18 el7-vm25 kernel: [<ffffffff81065c90>] ? leave_mm+0x70/0x70
> Sep 21 04:57:18 el7-vm25 kernel: [<ffffffff810e6adf>] smp_call_function_single+0x5f/0xa0
> Sep 21 04:57:18 el7-vm25 kernel: [<ffffffff812f3015>] ? cpumask_next_and+0x35/0x50
> Sep 21 04:57:18 el7-vm25 kernel: [<ffffffff810e7083>] smp_call_function_many+0x223/0x260
> Sep 21 04:57:18 el7-vm25 kernel: [<ffffffff81065e58>] native_flush_tlb_others+0xb8/0xc0
> Sep 21 04:57:18 el7-vm25 kernel: [<ffffffff81065f26>] flush_tlb_mm_range+0x66/0x140
> Sep 21 04:57:18 el7-vm25 kernel: [<ffffffff811929d3>] tlb_flush_mmu.part.54+0x33/0xc0
> Sep 21 04:57:18 el7-vm25 kernel: [<ffffffff81193565>] tlb_finish_mmu+0x55/0x60
> Sep 21 04:57:18 el7-vm25 kernel: [<ffffffff81195afa>] zap_page_range+0x12a/0x170
> Sep 21 04:57:18 el7-vm25 kernel: [<ffffffff81192224>] SyS_madvise+0x394/0x820
> Sep 21 04:57:18 el7-vm25 kernel: [<ffffffff810aa86d>] ? hrtimer_nanosleep+0xad/0x170
> Sep 21 04:57:18 el7-vm25 kernel: [<ffffffff810e5820>] ? SyS_futex+0x80/0x180
> Sep 21 04:57:18 el7-vm25 kernel: [<ffffffff81646b49>] system_call_fastpath+0x16/0x1b
> Sep 21 04:57:18 el7-vm25 kernel: Code: 80 72 01 00 48 89 de 48 03 14 c5 20 c9 a5 81 48 89 df e8 7a 03 22 00 84 c0 75 46 45 85 ed 74 11 f6 43 20 01 74 0b 0f 1f 00 f3 90 <f6> 43 20 01 75 f8 31 c0 48 8b 7c 24 28 65 48 33 3c 25 28 00 00
>
> We need to find the root cause and fix the issue, as this is negatively
> affecting the jobs being run.

--
This message was sent by Atlassian JIRA
(v1000.350.2#100014)

_______________________________________________
Infra mailing list
Infra@ovirt.org
http://lists.ovirt.org/mailman/listinfo/infra
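[Editor's note] Read bottom-up, the call trace shows the qemu-kvm thread inside madvise() tearing down guest memory (SyS_madvise -> zap_page_range -> flush_tlb_mm_range -> smp_call_function_many), i.e. it appears to be spinning in generic_exec_single() waiting for other CPUs to acknowledge a TLB-shootdown IPI. On a nested/virtualized host like this oVirt Node VM, that wait commonly stalls when a sibling vCPU is not scheduled by the hypervisor; this is an inference from the trace, not a confirmed root cause. As a small triage aid, the soft-lockup banner can be parsed mechanically to track recurrences; a minimal sketch, where the helper name, output field names, and log path are my own assumptions and not from the ticket:

```shell
#!/bin/sh
# parse_lockup: extract CPU number, stall duration, command name, and PID
# from kernel soft-lockup banner lines fed on stdin (hypothetical helper).
parse_lockup() {
  sed -n 's/.*CPU#\([0-9]*\) stuck for \([0-9]*\)s! \[\(.*\):\([0-9]*\)\].*/cpu=\1 secs=\2 comm=\3 pid=\4/p'
}

# Demonstrate with the banner line quoted in this report:
echo 'BUG: soft lockup - CPU#0 stuck for 22s! [qemu-kvm:13768]' | parse_lockup
# -> cpu=0 secs=22 comm=qemu-kvm pid=13768

# On the affected guest one might then tally recurrences (log path assumed):
#   grep 'soft lockup' /var/log/messages | parse_lockup | sort | uniq -c
```

Counting these per day across the el7-vm* slaves would show whether the lockups correlate with load on the PHX hypervisors.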