Hi all,

So it seems this is not the same issue that Chris is seeing: I have now
run kvm_stat for my hanging domain, and I'm not seeing all-zero
counters. The CPU is still stuck at 100%, I can still log in, and I
again saw the time jump in the dmesg output (this guest was started with
acpi but still used the kvmclock; the other guest had used both acpi and
the acpi_pm clock).

I'm now switching this guest to acpi_pm as well.

dmesg in the guest again shows that nice time jump:
[    3.968415] Uniform CD-ROM driver Revision: 3.20
[    4.122814]  vda: vda1 vda2 < vda5 >
[    4.620176] kjournald starting.  Commit interval 5 seconds
[    4.626042] EXT3-fs: mounted filesystem with ordered data mode.
[    5.641077] udevd version 125 started
[    6.548858] input: Power Button (FF) as /class/input/input1
[    6.568119] ACPI: Power Button (FF) [PWRF]
[    6.847566] piix4_smbus 0000:00:01.3: Found 0000:00:01.3 device
[    7.065429] input: PC Speaker as /class/input/input2
[    7.229412] input: ImExPS/2 Generic Explorer Mouse as /class/input/input3
[    7.269947] Error: Driver 'pcspkr' is already registered, aborting...
[    7.277315] udev: renamed network interface eth0 to eth1
[    8.674616] Adding 489940k swap on /dev/vda5.  Priority:-1 extents:1 across:489940k
[    8.767526] EXT3 FS on vda1, internal journal
[   10.270122] loop: module loaded
[   10.461641] device-mapper: uevent: version 1.0.3
[   10.475258] device-mapper: ioctl: 4.13.0-ioctl (2007-10-18) initialised: [EMAIL PROTECTED]
[   11.221794] NET: Registered protocol family 10
[   11.224276] lo: Disabled Privacy Extensions
[   19.770963] warning: `ntpd' uses 32-bit capabilities (legacy support in use)
[   21.420169] eth1: no IPv6 routers present
[1266862591.699790] BUG: soft lockup - CPU#0 stuck for 1179853412s! [logcheck:4056]
[1266862591.699790] Modules linked in: video output ac battery ipv6
dm_snapshot dm_mirror dm_log dm_mod loop virtio_net virtio_balloon
snd_pcsp serio_raw psmouse snd_pcm snd_timer snd soundcore
snd_page_alloc i2c_piix4 i2c_core button evdev ext3 jbd mbcache
virtio_blk ide_cd_mod cdrom ata_generic libata scsi_mod dock
ide_pci_generic floppy virtio_pci uhci_hcd usbcore piix ide_core
thermal processor fan thermal_sys
[1266862591.699790]
[1266862591.699790] Pid: 4056, comm: logcheck Not tainted (2.6.26-1-486 #1)
[1266862591.699790] EIP: 0060:[<c0115324>] EFLAGS: 00000202 CPU: 0
[1266862591.699790] EIP is at ptep_set_access_flags+0x3e/0x6e
[1266862591.699790] EAX: 19070067 EBX: 09661cc0 ECX: ddb0d984 EDX: 09661cc0
[1266862591.699790] ESI: ddb0d984 EDI: 00000001 EBP: ddb0541c ESP: dededeb0
[1266862591.699790]  DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
[1266862591.699791] CR0: 8005003b CR2: 09661cc0 CR3: 1dc49000 CR4: 00000690
[1266862591.699791] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[1266862591.699791] DR6: ffff0ff0 DR7: 00000400
[1266862591.699791]  [<c0154c38>] ? do_wp_page+0x3db/0x434
[1266862591.699791]  [<c011314b>] ? pvclock_clocksource_read+0x4b/0xd0
[1266862591.699791]  [<c011314b>] ? pvclock_clocksource_read+0x4b/0xd0
[1266862591.699791]  [<c0155da9>] ? handle_mm_fault+0x55a/0x5d2
[1266862591.699791]  [<c0116b87>] ? __dequeue_entity+0x1f/0x71
[1266862591.699791]  [<c0113ac2>] ? do_page_fault+0x294/0x5ea
[1266862591.699791]  [<c011f275>] ? __do_softirq+0x3e/0x87
[1266862591.699791]  [<c011382e>] ? do_page_fault+0x0/0x5ea
[1266862591.699791]  [<c02a6a1a>] ? error_code+0x6a/0x70
[1266862591.699791]  =======================
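
A jump like this can also be spotted automatically. What follows is a minimal Python sketch, not part of the original report: it scans dmesg-style lines and reports every place where the bracketed timestamp moves forward by more than a threshold. The function name find_time_jumps and the 3600-second threshold are assumptions chosen purely for illustration.

```python
import re

# Matches the "[   21.420169]" prefix that the kernel puts on each dmesg line.
TIMESTAMP_RE = re.compile(r"^\[\s*(\d+\.\d+)\]")


def find_time_jumps(lines, threshold=3600.0):
    """Return (previous, current) timestamp pairs more than `threshold` seconds apart."""
    jumps = []
    previous = None
    for line in lines:
        match = TIMESTAMP_RE.match(line)
        if match is None:
            # Wrapped continuation lines carry no timestamp; skip them.
            continue
        current = float(match.group(1))
        if previous is not None and current - previous > threshold:
            jumps.append((previous, current))
        previous = current
    return jumps


# Three timestamped lines taken from the dmesg output above,
# plus one wrapped continuation line.
sample = [
    "[   19.770963] warning: `ntpd' uses 32-bit capabilities (legacy support in use)",
    "[   21.420169] eth1: no IPv6 routers present",
    "[1266862591.699790] BUG: soft lockup - CPU#0 stuck for 1179853412s!",
    "[logcheck:4056]",
]

for before, after in find_time_jumps(sample):
    print(f"timestamp jumped from {before:.6f} to {after:.6f}")
```

Feeding it the full output of dmesg in the guest (one line per list element) should flag the same spot, between the eth1 line and the soft lockup report.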


kvm_stat for the hanging guest, one column per sample ("-" marks values
that were cut off in the paste):

counter          s1      s2      s3      s4      s5      s6
efer_relo         0       0       0       0       0       0
exits          1848    1852    1848    1843    1825    1832
fpu_reloa         0       0       0       0       0       0
halt_exit         0       0       0       0       0       0
halt_wake         0       0       0       0       0       0
host_stat         5       5       5       6       6       6
hypercall         0       0       0       0       0       0
insn_emul       948     949     949     951     946     948
insn_emul         0       0       0       0       0       0
invlpg            0       0       0       0       0       0
io_exits          0       0       0       0       0       0
irq_exits       646     649     649     649     649       -
irq_injec       653     654     649     645     625       -
irq_windo         0       0       0       0       0       -
kvm_reque         0       0       0       0       0       -
largepage         0       0       0       0       0       -
mmio_exit         0       0       0       0       0       -
mmu_cache         0       0       0       0       0       -
mmu_flood         0       0       0       0       0       -
mmu_pde_z         0       0       0       0       0       -
mmu_pte_u         0       0       0       0       0       -
mmu_pte_w         0       0       0       0       0       -
mmu_recyc         0       0       0       0       0       -
mmu_shado         0       0       0       0       0       -
mmu_unsyn         0       0       0       0       0       -
nmi_injec         0       0       0       0       0       -
nmi_windo         0       0       0       0       0       -
pf_fixed          0       0       0       0       0       -
pf_guest          0       0       0       0       0       -
remote_tl         0       0       0       0       0       -
request_n         0       0       0       0       0       -
signal_ex         0       0       0       0       0       -
tlb_flush       150     149     151     149     150       -
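
To check a sample like this without reading it column by column, here is a rough Python sketch (not something from the original thread). It pairs the counter names with the values of one sample in the whitespace-separated, line-wrapped form kvm_stat output takes when pasted into mail, and lists the non-zero counters. The helper name nonzero_counters is hypothetical. Since kvm_stat truncates names and some therefore repeat (insn_emul appears twice), the column position is kept alongside each name.

```python
def nonzero_counters(header_text, sample_text):
    """Pair whitespace-separated counter names with values; return the non-zero ones."""
    names = header_text.split()
    values = [int(v) for v in sample_text.split()]
    if len(names) != len(values):
        raise ValueError(f"{len(names)} names but {len(values)} values")
    # Keep the column index because truncated names are not unique.
    return [(index, name, value)
            for index, (name, value) in enumerate(zip(names, values))
            if value != 0]


# Header and first sample copied from the kvm_stat output of the hanging guest.
header = """
 efer_relo      exits  fpu_reloa  halt_exit  halt_wake  host_stat
hypercall  insn_emul  insn_emul     invlpg   io_exits  irq_exits
irq_injec  irq_windo  kvm_reque  largepage  mmio_exit  mmu_cache
mmu_flood  mmu_pde_z  mmu_pte_u  mmu_pte_w  mmu_recyc  mmu_shado
mmu_unsyn  nmi_injec  nmi_windo   pf_fixed   pf_guest  remote_tl
request_n  signal_ex  tlb_flush
"""

first_sample = """
         0       1848          0          0          0          5
    0        948          0          0          0        646
653          0          0          0          0          0          0
        0          0          0          0          0          0
   0          0          0          0          0          0          0
       150
"""

for index, name, value in nonzero_counters(header, first_sample):
    print(f"column {index:2d}  {name:<10} {value}")
```

An empty result would correspond to the all-zero case Chris reported; here the list is not empty, which is why this looks like a different problem.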


I've tried a dd if=/dev/zero of=/tmp/zero.file bs=10M count=100 to
test I/O in the hanging guest, and now the console hangs there. Doing an
strace shows:
select(17, [4 7 9 10 11 12 14 16], [], [], {1, 0}) = 2 (in [12 14], left {1, 0})
read(12, 0x7fff94b745e0, 8)             = -1 EIO (Input/output error)
write(15, "\1\0\0\0\0\0\0\0"..., 8)     = 8
clock_gettime(CLOCK_MONOTONIC, {263807, 676112562}) = 0
clock_gettime(CLOCK_MONOTONIC, {263807, 676175416}) = 0
clock_gettime(CLOCK_MONOTONIC, {263807, 676237712}) = 0
timer_gettime(0, {it_interval={0, 0}, it_value={0, 9550245}}) = 0
read(14, "\2\0\0\0\0\0\0\0"..., 8)      = 8
select(17, [4 7 9 10 11 14 16], [], [], {1, 0}) = 1 (in [16], left {0, 992000})
read(16, "\16\0\0\0\0\0\0\0\376\377\377\377\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 128) = 128
rt_sigaction(SIGALRM, NULL, {0x405980, ~[KILL STOP RTMIN RT_1], SA_RESTORER, 0x7fe58c0aaa80}, 8) = 0
write(5, "\0"..., 1)                    = 1
read(16, 0x7fff94b74950, 128)           = -1 EAGAIN (Resource temporarily unavailable)
select(17, [4 7 9 10 11 14 16], [], [], {1, 0}) = 1 (in [4], left {1, 0})
read(4, "\0"..., 512)                   = 1
read(4, 0x7fff94b747e0, 512)            = -1 EAGAIN (Resource temporarily unavailable)
clock_gettime(CLOCK_MONOTONIC, {263807, 686388502}) = 0
clock_gettime(CLOCK_MONOTONIC, {263807, 686449959}) = 0
clock_gettime(CLOCK_MONOTONIC, {263807, 686511137}) = 0
clock_gettime(CLOCK_MONOTONIC, {263807, 686572035}) = 0

Should I start a separate thread for this issue, so as not to mix things
up with Chris's problem?

+rl

On Fri, Nov 21, 2008 at 8:32 PM, Marcelo Tosatti <[EMAIL PROTECTED]> wrote:
> On Thu, Nov 20, 2008 at 09:10:57AM -0800, [EMAIL PROTECTED] wrote:
>> On Wed, Nov 19, 2008 at 02:43:42PM -0800, [EMAIL PROTECTED] wrote:
>> > Thanks for the responses,
>> >
>> > I'm not sure if my problem is the same as Roland's, but it definitely
>> > sounds plausible.  I had been running ntpdate in the host to
>> > synchronize time every hour (in a cron job), so it sounds as if we
>> > could be seeing the same issue.
>> >
>>
>> Actually, with ntpdate taken out of crontab, I'm still seeing periodic
>> hangs, so it's either a different problem or I'm hitting it in a
>> different manner.
>>
>> OK, I installed kvm-79 and kernel 2.6.27.6, and here's the kvm-stat
>> output with 1 guest hung and 3 more operational:
>
> <snip>
>
>> If I shut down the 3 operational guests leaving just the hung guest, the
>> kvm-stat output is all 0s:
>>
>>  efer_relo      exits  fpu_reloa  halt_exit  halt_wake  host_stat  hypercall 
>>  insn_emul  insn_emul     invlpg   io_exits  irq_exits  irq_windo  largepage 
>>  mmio_exit  mmu_cache  mmu_flood  mmu_pde_z  mmu_pte_u  mmu_pte_w  mmu_recyc 
>>  mmu_shado  nmi_windo   pf_fixed   pf_guest  remote_tl  request_i  signal_ex 
>>  tlb_flush
>>          0          0          0          0          0          0          0 
>>          0          0          0          0          0          0          0 
>>          0          0          0          0          0          0          0 
>>          0          0          0          0          0          0          0 
>>          0
>
> So the guest is not actually running here, which means it's
> QEMU that it's hanging in.
>
>> The hung guest in this case was run with this command:
>>
>> sudo /usr/local/bin/qemu-system-x86_64             \
>>      -daemonize                                    \
>>      -no-kvm-irqchip                               \
>>      -hda Imgs/ndev_root.img                       \
>>      -m 1024                                       \
>>      -cdrom ISOs/ubuntu-8.10-server-amd64.iso      \
>>      -vnc :4                                       \
>>      -net nic,macaddr=DE:AD:BE:EF:04:04,model=e1000 \
>>      -net tap,ifname=tap4,script=/home/chris/kvm/qemu-ifup.sh \
>>      >>& Logs/ndev_run.log
>>
>>
>> I should also mention that when the guest is hung, I can still switch
>> to the monitor with ctrl-alt 2. So, at least it's a little bit alive.
>
> In coma perhaps.
>
>> I've also noticed that the behavior with the hung guest is slightly
>> different on kvm-79 than it was earlier. When the guest hangs, the kvm
>> process in the host doesn't spin at 100% busy any longer - the guest is
>> just unresponsive at both the network and VNC console.
>
>> Also, I've noticed that if I reset the guest from the monitor, the
>> guest will boot up again, and I can get through to it on the network,
>> but strangely, the mouse and keyboard will still be hung at the
>> VNC console (except that I can still switch back and forth to the
>> monitor).
>>
>> Hope some of this helps, let me know if you need me to provide any
>> other troubleshooting info.
>
> $ gdb -p pid-of-qemu
>
> (gdb) info threads
>
> Print the backtrace for every thread with:
>
> (gdb) thread N
> (gdb) bt
>
>



-- 
Roland Lammel
QuikIT - IT Lösungen - flexibel und schnell
Web: http://www.quikit.at
Email: [EMAIL PROTECTED]

"Enjoy your job, make lots of money, work within the law. Choose any two."