Re: Cannot boot xen DomU > 2.6.23.1
On Jan 18, 2008 5:19 PM, Jeremy Fitzhardinge <[EMAIL PROTECTED]> wrote:
> > First of all this patch solves the lock-ups, it works as advertised :)
>
> OK, good.  I guess events are getting lost somewhere with vcpu_info
> placement.

> Would it be possible to map the eip and some top parts of the stack back
> to kernel symbols?  Seems to be the same place in both traces, which is
> interesting.

Can you tell me how, or give me some pointers?

> > Scenario 2 (have_vcpu_info_placement = 0)
> > --
> >
> > test1: no crash
> > test2: no crash, but occasionally I still get funny output like this
> >
> > 00AAZZ
> > 00AAZZ00AAZZ
> > 00AAZZ
> > 00AAZZ
> > AAZZ
> > 00AAZZ
> > 00AAZZ
> > 000AAZZ
> > 000AAZZ
> > 00AAZZ
> >
>
> Hm, I guess some of the output is getting dropped.  Does this happen
> with 2.6.18-xen?

Yes, it does:

00AAZZ
AAZZ
00AAZZ
00AAZZ
00AAZZ00AAZZ
00AAZZ

# uname -a
Linux builder 2.6.18-xen-r8 #3 SMP Thu Dec 20 15:07:20 CET 2007 i686 AMD Athlon(tm) X2 Dual Core Processor BE-2300 AuthenticAMD GNU/Linux

cheers

xming
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: Cannot boot xen DomU > 2.6.23.1
> OK, I misunderstood your original report to mean that something was
> complaining about "too much" output.  You're saying that lots of console
> output seems to lock the domain.

Sorry about that, and yes, that is the case.

> I've had a report about heavy disk IO seems to lock up as well.  Perhaps
> they're both related to high event rates.  Do you think you could try an
> IO-intensive workload to see if you can get a similar lockup?

An IO-intensive workload locks up too (see below).

> When the domain is locked up, what does /usr/lib/xen/bin/xenctx say?

See below.

> Hm.  Rather than backing out the structure-change patch, could you try
> this workaround:
>
> diff -r be3ca4e0e19e arch/x86/xen/enlighten.c
> --- a/arch/x86/xen/enlighten.c Thu Jan 17 14:25:07 2008 -0800
> +++ b/arch/x86/xen/enlighten.c Thu Jan 17 16:37:42 2008 -0800
> @@ -95,7 +95,7 @@ struct shared_info *HYPERVISOR_shared_in
>   *
>   * 0: not available, 1: available
>   */
> -static int have_vcpu_info_placement = 1;
> +static int have_vcpu_info_placement = 0;
>
>  static void __init xen_vcpu_setup(int cpu)
>  {

First of all this patch solves the lock-ups, it works as advertised :)
The DomU works as before. Just for the record, for people trying to apply
this to 2.6.23.x: you need to change /x86/ to /i386/ in the paths, because
the unified x86 tree only exists since 2.6.24.

I created 2 tests, one IO intensive and the other console output
intensive:

test1. bonnie++ -s 1024 -u nobody
test2. for i in `seq 1 5`; do echo 00AAZZ; done

In all cases where it crashed (hung) there was no oops/panic.
Scenario 1 (booted 2.6.23.14 as is)
--
(but with init=/bin/bash, otherwise I couldn't get a prompt)

test1: crashed

# /usr/lib/xen/bin/xenctx 108
eip: c037c0c7
esp: c0343f90
eax: ebx: 0001 ecx: edx: c0342000
esi: c0373004 edi: c1210df4 ebp: 1b7d
cs: 0061 ds: 007b fs: 00d8 gs:

Stack:
c0100add c0378980 c0101962 c0104821 c120a000 c0378df4 c0348cff 0025 c0348430 0004 9000 6df4 00ea1000 c0363be0 c0343fe8 c03dd007 c0343fec c0349868 c0343fe0 178bc1f1 2001 01020800 00060fb1 c03dd000

Code:
cc cc cc cc cc cc cc cc cc cc cc cc cc cc b8 06 00 00 00 cd 82 cc cc cc cc cc cc cc cc cc cc

Call Trace:
[] <-- [] [] [] [] [] [] [] [] [] [] [] [] [] [] [<178bc1f1>] []

test2: crashed after many, many retries, and sometimes with strange output

00AAZZ
00AAZZ
00AAA
00AAZZ
00AAZZ
00AAZZ
00AAZZ
00AAZZ
00AAZZ
00AAZZ
00AAZZ
00AAZZ
00AAZZ
00AAZZ
00AAZZ
AAZZ
00AAZZ
00AAZZ00AAZZ
00AAZZ

# /usr/lib/xen/bin/xenctx 113
eip: c037c0c7
esp: c0343f90
eax: ebx: 0001 ecx: edx: c0342000
esi: c0373004 edi: c1210df4 ebp: 1b7d
cs: 0061 ds: 007b fs: 00d8 gs:

Stack:
c0100add c0378980 c0101962 c0104821 c120a000 c0378df4 c0348cff 0025 c0348430 0004 9000 6df4 00ea1000 c0363be0 c0343fe8 c03dd007 c0343fec c0349868 c0343fe0 178bc1f1 2001 00020800 00060fb1 c03dd000

Code:
cc cc cc cc cc cc cc cc cc cc cc cc cc cc b8 06 00 00 00 cd 82 cc cc cc cc cc cc cc cc cc cc

Call Trace:
[] <-- [] [] [] [] [] [] [] [] [] [] [] [] [] [] [<178bc1f1>] []

Scenario 2 (have_vcpu_info_placement = 0)
--

test1: no crash
test2: no crash, but occasionally I still get funny output like this

00AAZZ
00AAZZ00AAZZ
00AAZZ
00AAZZ
AAZZ
00AAZZ
00AAZZ
000AAZZ
000AAZZ
00AAZZ
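Both dumps above show the same Code: bytes, with b8 06 00 00 00 cd 82 sitting in a run of cc padding. On 32-bit x86, b8 is "mov imm32, %eax" and cd is "int imm8", so this reads as "mov $6, %eax; int $0x82". The sketch below scans a byte dump for that pattern. Reading it as a Xen hypercall stub is an assumption on my part, and the helper name find_stub is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes copied from the "Code:" line of the xenctx dumps above. */
static const uint8_t code_dump[] = {
    0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc,
    0xcc, 0xcc, 0xcc, 0xcc, 0xb8, 0x06, 0x00, 0x00, 0x00, 0xcd,
    0x82, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc,
    0xcc,
};

/*
 * Hypothetical helper: look for "mov $imm32, %eax; int $imm8"
 * (b8 xx xx xx xx cd yy) in buf.
 * Returns the offset of the b8 byte, or -1 if the pattern is absent.
 * On success, *eax_value gets the immediate and *vector the int number.
 */
static long find_stub(const uint8_t *buf, size_t len,
                      uint32_t *eax_value, uint8_t *vector)
{
    for (size_t i = 0; i + 7 <= len; i++) {
        if (buf[i] != 0xb8 || buf[i + 5] != 0xcd)
            continue;
        /* The 32-bit immediate is stored little-endian. */
        *eax_value = (uint32_t)buf[i + 1]
                   | (uint32_t)buf[i + 2] << 8
                   | (uint32_t)buf[i + 3] << 16
                   | (uint32_t)buf[i + 4] << 24;
        *vector = buf[i + 6];
        return (long)i;
    }
    return -1;
}
```

Run over code_dump, find_stub() reports the pattern at offset 14 with an eax value of 6 and interrupt vector 0x82. That only says what the bytes around eip encode; it does not by itself explain why the vcpu never wakes up again.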
Re: Cannot boot xen DomU > 2.6.23.1
On Jan 18, 2008 6:26 PM, Jeremy Fitzhardinge <[EMAIL PROTECTED]> wrote:
> >> Would it be possible to map the eip and some top parts of the stack back
> >> to kernel symbols?  Seems to be the same place in both traces, which is
> >> interesting.

> Do "nm -n vmlinux" on the kernel to get an address-sorted list of
> symbols, and then look to see what's near the eip (c037c0c7) and near
> the top of the stack (c0100add, c0378980, c0101962, ...).  Some of these
> may be in data, or other strange places, but the ones which correspond
> to code are interesting.

OK, I have done some of them, but I still don't know what I should be looking
at. Do you mean code related to xen or code related to
have_vcpu_info_placement?
Please be patient with me :)

I'll just paste some of the results (around those addresses) here:

c037b000 B empty_zero_page
c037c000 B hypercall_page
c037d000 B system_state

c0100a00 t xen_cpuid
c0100a80 t xen_set_debugreg
c0100a90 t xen_get_debugreg
c0100aa0 t xen_save_fl
c0100ac0 t xen_irq_disable
c0100ad0 t xen_safe_halt
c0100af0 t xen_halt
c0100b20 t xen_store_tr
c0100b30 t cvt_gate_to_trap
c0100bb0 t xen_io_delay

c0378980 D per_cpu__irq_stat
c03789c0 d per_cpu__runqueues
c0378df4 D __per_cpu_end

c01018b0 t xen_flush_tlb_single
c0101940 t xen_idle
c0101980 T xen_setup_features
c01019c0 T xen_mc_flush
c0101aa0 T xen_mc_callback

c0104710 T kernel_thread
c01047c0 T cpu_idle
c0104840 T cpu_idle_wait
c0104940 T exit_thread

c0103fe4 T xen_irq_enable_direct
c0103ff1 T xen_irq_enable_direct_reloc
c0103ff5 T xen_irq_enable_direct_end
c0103ff8 T xen_irq_disable_direct
c0104000 T xen_irq_disable_direct_end
c0104004 T xen_save_fl_direct
c0104011 T xen_save_fl_direct_end
c0104014 T xen_restore_fl_direct
c010402b T xen_restore_fl_direct_reloc

c03483f0 t maxcpus
c0348430 t unknown_bootoption
c0348610 T parse_early_param

> >> Hm, I guess some of the output is getting dropped.  Does this happen
> >> with 2.6.18-xen?
> >
> > yes it does
>
> OK, good.  I Didn't Break It (tm) ;)

So no fix from you? :)

Thanks
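The lookup Jeremy describes (find the symbol at or just below each address in the sorted nm output) can be sketched as a small binary search. This is only an illustration: the table is a hand-picked subset of the nm lines quoted above, so results are only meaningful for addresses whose neighbouring symbols are in that subset, and the names struct sym, symtab and lookup are made up here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sym {
    uint32_t addr;
    const char *name;
};

/* Subset of the "nm -n vmlinux" output quoted above, sorted by address. */
static const struct sym symtab[] = {
    { 0xc0100ad0, "xen_safe_halt" },
    { 0xc0100af0, "xen_halt" },
    { 0xc0101940, "xen_idle" },
    { 0xc0101980, "xen_setup_features" },
    { 0xc01047c0, "cpu_idle" },
    { 0xc0104840, "cpu_idle_wait" },
    { 0xc0348430, "unknown_bootoption" },
    { 0xc0378980, "per_cpu__irq_stat" },
    { 0xc037c000, "hypercall_page" },
    { 0xc037d000, "system_state" },
};

/* Return the last symbol whose address is <= addr, or NULL if none. */
static const struct sym *lookup(uint32_t addr)
{
    size_t lo = 0, hi = sizeof symtab / sizeof symtab[0];

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (symtab[mid].addr <= addr)
            lo = mid + 1;   /* candidate is at mid or to its right */
        else
            hi = mid;       /* symbol starts after addr, go left */
    }
    return lo ? &symtab[lo - 1] : NULL;
}
```

With this table, the eip c037c0c7 resolves to hypercall_page+0xc7, and the first code addresses on the stack resolve to xen_safe_halt+0xd (c0100add), xen_idle+0x22 (c0101962) and cpu_idle+0x61 (c0104821). The second stack word, c0378980, is per_cpu__irq_stat itself, which is data rather than code.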
Re: Cannot boot xen DomU > 2.6.23.1
On Jan 20, 2008 7:37 PM, Jeremy Fitzhardinge <[EMAIL PROTECTED]> wrote:
> xming wrote:
> > ok I have done some of them, but I still don't know what I should be looking
> > at. Do you mean code related to xen or code related to
> > have_vcpu_info_placement?
> > Please be patient with me :)
> >
> > I just paste some of the result (around those addresses) here:
> >
>
> Thanks, that answers that particular question; the vcpu is blocked
> waiting for something to happen, which probably means it missed the
> event which was supposed to wake it up.  Why is another question.  At
> least there's a workaround, and that workaround gives me some clue where
> to look.

Want me to test it?

> BTW, is it an SMP or UP domain?  Does it make a difference?

It doesn't matter, I tried vcpu=1 and vcpu=2. Or do you want me to
recompile a UP kernel and try that?

> >> OK, good.  I Didn't Break It (tm) ;)
> >
> > So no fix from you? :)
>
> Maybe when I have nothing else to do.

I'll wait, or should I poke xen-devel?
Cannot boot xen DomU > 2.6.23.1
Hi,

I finally found the piece of code that prevents me from booting a Xen DomU
with a vanilla kernel > 2.6.23.1.

The problem is that every kernel (> 2.6.23.1, including the 2.6.24 RCs) will
just hang with "too much" console activity. Sometimes (well, most of the
time) the boot messages alone are too much. When I do manage to boot,
generating a lot of console output makes it hang: no oops, no more console
or network. Generating the same output through ssh does not hang the domU.

When I revert the following patch, things work as before. I tried this with
2.6.23.14 and 2.6.24-rc8. But I don't have the knowledge to understand the
reason behind this.

BTW, I am not subscribed.

--- a/include/xen/interface/vcpu.h
+++ b/include/xen/interface/vcpu.h
@@ -160,8 +160,9 @@ struct vcpu_set_singleshot_timer {
  */
 #define VCPUOP_register_vcpu_info 10 /* arg == struct vcpu_info */
 struct vcpu_register_vcpu_info {
-    uint32_t mfn; /* mfn of page to place vcpu_info */
-    uint32_t offset; /* offset within page */
+    uint64_t mfn; /* mfn of page to place vcpu_info */
+    uint32_t offset; /* offset within page */
+    uint32_t rsvd; /* unused */
 };

 #endif /* __XEN_PUBLIC_VCPU_H__ */

I am running Xen 3.1.2 PAE

# uname -a
Linux builder 2.6.24-rc8 #2 SMP Thu Jan 17 16:37:19 CET 2008 i686 AMD Athlon(tm) X2 Dual Core Processor BE-2300 AuthenticAMD GNU/Linux

# cat /proc/cpuinfo
processor : 0
vendor_id : AuthenticAMD
cpu family : 15
model : 107
model name : AMD Athlon(tm) X2 Dual Core Processor BE-2300
stepping : 1
cpu MHz : 1899.930
cache size : 512 KB
physical id : 0
siblings : 2
core id : 1
cpu cores : 2
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu tsc msr pae mce cx8 mca cmov pat pse36 clflush mmx fxsr sse sse2 ht nx mmxext fxsr_opt lm 3dnowext 3dnow pni cx16 lahf_lm cmp_legacy svm extapic cr8_legacy 3dnowprefetch ts fid vid ttp tm stc 100mhzsteps
bogomips : 3829.72
clflush size : 64

processor : 1
vendor_id : AuthenticAMD
cpu family : 15
model : 107
model name : AMD Athlon(tm) X2 Dual Core Processor BE-2300
stepping : 1
cpu MHz : 1899.930
cache size : 512 KB
physical id : 0
siblings : 2
core id : 1
cpu cores : 2
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu tsc msr pae mce cx8 mca cmov pat pse36 clflush mmx fxsr sse sse2 ht nx mmxext fxsr_opt lm 3dnowext 3dnow pni cx16 lahf_lm cmp_legacy svm extapic cr8_legacy 3dnowprefetch ts fid vid ttp tm stc 100mhzsteps
bogomips : 3829.72
clflush size : 64
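To make the effect of the quoted patch concrete, here is a small user-space sketch that compares the two layouts of struct vcpu_register_vcpu_info. It is not kernel code and not a diagnosis of the hang; the _old and _new suffixes and the read_as_old() helper are invented for illustration, and the byte-level result assumes a little-endian machine such as x86.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Layout before the patch: 8 bytes, offset field at byte 4. */
struct vcpu_register_vcpu_info_old {
    uint32_t mfn;     /* mfn of page to place vcpu_info */
    uint32_t offset;  /* offset within page */
};

/* Layout after the patch: 16 bytes, offset field at byte 8. */
struct vcpu_register_vcpu_info_new {
    uint64_t mfn;     /* mfn of page to place vcpu_info */
    uint32_t offset;  /* offset within page */
    uint32_t rsvd;    /* unused */
};

/*
 * Hypothetical helper: reinterpret the first 8 bytes of a new-layout
 * argument as if the reader only knew the old layout.
 */
static struct vcpu_register_vcpu_info_old
read_as_old(const struct vcpu_register_vcpu_info_new *n)
{
    struct vcpu_register_vcpu_info_old o;

    memcpy(&o, n, sizeof o);
    return o;
}
```

On a little-endian machine, a new-layout argument with mfn 0x12345 and offset 0x40 comes back from read_as_old() with the same mfn but offset 0, because the old layout reads its offset from what is now the upper half of mfn. The two sides of the VCPUOP_register_vcpu_info call therefore only see the same page offset when they agree on the layout.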
Re: Cannot boot xen DomU > 2.6.23.1
On Jan 17, 2008 6:15 PM, Jeremy Fitzhardinge <[EMAIL PROTECTED]> wrote:
> Uh, that's very strange.  That patch fixes an outright bug, unless
> you're using one specific changeset out of the xen-unstable mercurial
> tree.  What version of Xen are you using?  Is it one you've built from
> xenbits, or distributed by someone?  Are you using a 32 or 64 bit xen/dom0?
>
> I don't understand where the "too much console activity" message comes
> from.  Could you post an actual cut'n'paste of the message, or a
> screenshot?  Are you using a serial console on your dom0?

I am running Xen 3.1.2 from Gentoo (I also tried 3.1.1), 32 bit PAE, as I
wrote in the previous post:

> > I am running Xen 3.1.2 PAE

I never had any problem (at least not THIS problem) with any PV DomU
(2.6.18, 2.6.20, 2.6.21, 2.6.23-RCs and 2.6.23.1). The problem started with
2.6.23.3, and today I finally found time to track it down. This only affects
the PV domU, so I don't understand your question about the serial console of
Dom0.

The symptom is (with a lot of subjective judgment) that when there is a lot
of output, or output arrives too quickly, on the console of the domU (hvc0
connected with either "xm crea file.cfg -c" or "xm cons id"), the whole PV
domU hangs. It hangs at random places, sometimes right after init and
sometimes after I have logged in and just generate some output (on hvc0)
with something like "find /". IIRC I have never seen a hang before init.

When it hangs there is no more output on the console and the network to that
domU is dead too; dom0 is not affected. "xm list" still reports the domU as
"r", there is nothing special in the dom0 Xen logs, and no OOPS or panic is
reported in the domU. It seems to be running in an infinite loop.

So I can make screenshots, but they won't tell you anything: there is no
message, it's just dead.

Gentoo's xen is just a source tarball made from the mercurial repo, without
any patches (AFAIK).

cheers

Ming-Wei Shih