>On Thu, Nov 19, 2015 at 12:42:50PM +0000, Xulei (Stone) wrote: >> Kevin, >> >> After deeply analyzing, i think there may be 3 possible reasons: >> 1)wrong CountCPUs value. It seems CountCPUs++ in handle_smp() has no >> lock to protect. So, sometimes, 2 or more vcpu may get the same >> current value of CountCPUs. Then we'll get a single incrementation >> instead of 2 or more and "while (cmos_smp_count != CountCPUs)" will >> loop forever; > >The handle_smp() code is called from romlayout.S:entry_smp() which >does take a lock. So, all of handle_smp() should run synchronous. >
Ok, this possibility is ruled out! >> 2)wrong cmos_smp_count value. SeaBIOS rtc reads an incorrect number? > >Not sure - the last time there were problems in this area of the code >others used kvmtrace to try and track this down. Since you are >getting dprintf statements, you could also try outputting >cmos_smp_count prior to the loop (see patch below). > I'll test again with this patch, and observe the output. But frankly speaking, i don't think SeaBIOS may read an incorrect number. Because, QEMU set smp_cpus value in pc_cmos_init (cmos ox5f) at the 1st time (QEMU does not execute pc_cmos_init again during repetitive reboot). SeaBIOS works well for many times which means there is no reason that a time SeaBIOS suddenly reads an incorrect number. >> 3)yield() stuck. Is it possible that SeaBIOS is stuck during yield? >> I've tested, when yield() is running, SeaBIOS seems has not created >> some other threads except the main thread. So I don't know what's >> the function of yield() here.? > >The yield() allows hardware interrupts to occur. But note that >yield() isn't called in the loop - is is only called after the loop >completes. > >If you are only getting this on massive repetitive reboot requests, >there are some other possible explanations: > >- perhaps the SIPI is getting lost because one of the CPUs is still > resetting or still processing a SIPI from the last reboot? > Seems impossible. Seen from the qemu log and SeaBIOS log, the VM has booted successfully and can execute our "regular rebooting" daemon process before the last reboot. >- the seabios code itself may have been corrupted if the memcpy() in > qemu_prep_reset() got far enough along to clear HaveRunPost, but did > not get far enough along to fully complete the memcpy(). > BTW, my VM reboots every 220 second (reboot time interval = 220s). I think SeaBIOS has enough time to process all kinds of affairs, like SIPI and memcpy(). Kevin, I want to know whether it is possible that if my VM is stuck at QEMU (a point of pci device reset procedure? or whatever) SeaBIOS will hold at hanle_smp() and could not printf "Found %d cpu(s) max supported %d cpu(s)"? Is this possible? ================== bad QEMU log======= [2015-11-13 18:45:57] qemu_devices_reset:1941 reset all devices [2015-11-13 18:45:57] set_nmi_flag:71 set nmi val = 0 [2015-11-13 18:45:58] monitor_qapi_event_emit:483 {"timestamp": {"seconds": 1447411558, "microseconds": 650381}, "event": "VSERPORT_CHANGE", "data": {"open": false, "id": "channel0"}} [2015-11-13 18:45:58] monitor_qapi_event_emit:483 {"timestamp": {"seconds": 1447411558, "microseconds": 796285}, "event": "RESET"} [2015-11-13 18:45:58] qemu_devices_reset:1941 reset all devices [2015-11-13 18:45:59] set_nmi_flag:71 set nmi val = 0 [2015-11-13 18:46:00] monitor_qapi_event_emit:483 {"timestamp": {"seconds": 1447411560, "microseconds": 212196}, "event": "RESET"} [2015-11-13 18:46:00] qemu_reset_report:749 domain is rebooting [2015-11-13 18:46:00] monitor_qapi_event_emit:483 {"timestamp": {"seconds": 1447411558, "microseconds": 650558}, "event": "VSERPORT_CHANGE", "data": {"open": false, "id": "channel3"}} ================ good QEMU log========= [2015-11-13 18:42:12] qemu_devices_reset:1941 reset all devices [2015-11-13 18:42:12] set_nmi_flag:71 set nmi val = 0 [2015-11-13 18:42:13] monitor_qapi_event_emit:483 {"timestamp": {"seconds": 1447411333, "microseconds": 718617}, "event": "VSERPORT_CHANGE", "data": {"open": false, "id": "channel0"}} [2015-11-13 18:42:13] monitor_qapi_event_emit:483 {"timestamp": {"seconds": 1447411333, "microseconds": 848236}, "event": "RESET"} [2015-11-13 18:42:14] qemu_devices_reset:1941 reset all devices [2015-11-13 18:42:14] set_nmi_flag:71 set nmi val = 0 [2015-11-13 18:42:15] monitor_qapi_event_emit:483 {"timestamp": {"seconds": 1447411335, "microseconds": 280198}, "event": "RESET"} [2015-11-13 18:42:15] qemu_reset_report:749 domain is rebooting [2015-11-13 18:42:15] monitor_qapi_event_emit:483 {"timestamp": {"seconds": 1447411333, "microseconds": 718794}, "event": "VSERPORT_CHANGE", "data": {"open": false, "id": "channel3"}} [2015-11-13 18:42:15] virtio_set_status:524 virtio-blk device status is 3 that means DRIVER [2015-11-13 18:42:15] virtio_set_status:524 virtio-blk device status is 7 that means DRIVER OK [2015-11-13 18:42:15] virtio_set_status:524 virtio-blk device status is 3 that means DRIVER [2015-11-13 18:42:15] virtio_set_status:524 virtio-blk device status is 7 that means DRIVER OK [2015-11-13 18:42:23] virtio_set_status:524 virtio-serial device status is 1 that means ACKNOWLEDGE [2015-11-13 18:42:23] virtio_set_status:524 virtio-serial device status is 3 that means DRIVER [2015-11-13 18:42:23] handle_control_message:333 virtio serial port '-1' hanle control message event = 0, value = 1 [2015-11-13 18:42:23] send_control_event:225 virtio serial port 1 send control message event = 1, value = 1 [2015-11-13 18:42:23] send_control_event:225 virtio serial port 2 send control message event = 1, value = 1 [2015-11-13 18:42:23] send_control_event:225 virtio serial port 3 send control message event = 1, value = 1 [2015-11-13 18:42:23] send_control_event:225 virtio serial port 4 send control message event = 1, value = 1 [2015-11-13 18:42:23] virtio_set_status:524 virtio-serial device status is 7 that means DRIVER OK [2015-11-13 18:42:23] handle_control_message:333 virtio serial port '1' hanle control message event = 3, value = 1 [2015-11-13 18:42:23] send_control_event:225 virtio serial port 1 send control message event = 6, value = 1 [2015-11-13 18:42:23] handle_control_message:333 virtio serial port '2' hanle control message event = 3, value = 1 [2015-11-13 18:42:23] send_control_event:225 virtio serial port 2 send control message event = 6, value = 1 [2015-11-13 18:42:23] handle_control_message:333 virtio serial port '3' hanle control message event = 3, value = 1 [2015-11-13 18:42:23] send_control_event:225 virtio serial port 3 send control message event = 6, value = 1 [2015-11-13 18:42:23] handle_control_message:333 virtio serial port '4' hanle control message event = 3, value = 1 [2015-11-13 18:42:23] virtio_set_status:524 virtio-blk device status is 1 that means ACKNOWLEDGE [2015-11-13 18:42:23] virtio_set_status:524 virtio-blk device status is 1 that means ACKNOWLEDGE [2015-11-13 18:42:23] virtio_set_status:524 virtio-blk device status is 3 that means DRIVER [2015-11-13 18:42:23] virtio_set_status:524 virtio-blk device status is 7 that means DRIVER OK [2015-11-13 18:42:23] virtio_set_status:524 virtio-blk device status is 3 that means DRIVER [2015-11-13 18:42:23] virtio_set_status:524 virtio-blk device status is 7 that means DRIVER OK [2015-11-13 18:42:30] handle_control_message:333 virtio serial port '2' hanle control message event = 6, value = 1 [2015-11-13 18:42:30] monitor_qapi_event_emit:483 {"timestamp": {"seconds": 1447411350, "microseconds": 214826}, "event": "VSERPORT_CHANGE", "data": {"open": true, "id": "channel1"}} [2015-11-13 18:42:30] handle_control_message:333 virtio serial port '1' hanle control message event = 6, value = 1 [2015-11-13 18:42:30] handle_control_message:333 virtio serial port '3' hanle control message event = 6, value = 1 [2015-11-13 18:42:30] handle_control_message:333 virtio serial port '4' hanle control message event = 6, value = 1 [2015-11-13 18:42:31] monitor_qapi_event_emit:483 {"timestamp": {"seconds": 1447411350, "microseconds": 220665}, "event": "VSERPORT_CHANGE", "data": {"open": true, "id": "channel3"}} >If the failure is reproducible, the patch below could help narrow the >possibilities. > >-Kevin > > >--- a/src/fw/smp.c >+++ b/src/fw/smp.c >@@ -125,6 +125,7 @@ smp_setup(void) > > // Wait for other CPUs to process the SIPI. > u8 cmos_smp_count = rtc_read(CMOS_BIOS_SMP_COUNT) + 1; >+ dprintf(1, "cmos_smp_count=%d\n", cmos_smp_count); > while (cmos_smp_count != CountCPUs) > asm volatile( > // Release lock and allow other processors to use the stack. >@@ -136,6 +137,7 @@ smp_setup(void) > " jc 1b\n" > : "+m" (SMPLock), "+m" (SMPStack) > : : "cc", "memory"); >+ dprintf(1, "finish smp\n"); > yield(); > > // Restore memory. _______________________________________________ SeaBIOS mailing list SeaBIOS@seabios.org http://www.seabios.org/mailman/listinfo/seabios