Re: Cannot boot xen DomU > 2.6.23.1

2008-01-18 Thread xming

On Jan 18, 2008 5:19 PM, Jeremy Fitzhardinge <[EMAIL PROTECTED]> wrote:

> > First of all this patch solves the lock-ups, it works as advertised :)
>
> OK, good.  I guess events are getting lost somewhere with vcpu_info
> placement.


> Would it be possible to map the eip and some top parts of the stack back
> to kernel symbols?  Seems to be the same place in both traces, which is
> interesting.

Can you tell me how, or give me some pointers?


> > Scenario 2 (have_vcpu_info_placement = 0)
> > --
> >
> > test1: no crash
> > test2: no crash, but occasionally I still get funny output like this
> >
> > 00AAZZ
> > 00AAZZ00AAZZ
> > 00AAZZ
> > 00AAZZ
> > AAZZ
> > 00AAZZ
> > 00AAZZ
> > 000AAZZ
> > 000AAZZ
> > 00AAZZ
> >
>
> Hm, I guess some of the output is getting dropped.  Does this happen
> with 2.6.18-xen?

yes it does

00AAZZ
AAZZ
00AAZZ
00AAZZ
00AAZZ00AAZZ
00AAZZ

# uname -a
Linux builder 2.6.18-xen-r8 #3 SMP Thu Dec 20 15:07:20 CET 2007 i686
AMD Athlon(tm) X2 Dual Core Processor BE-2300 AuthenticAMD GNU/Linux

cheers

xming
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Cannot boot xen DomU > 2.6.23.1

2008-01-18 Thread xming

> OK, I misunderstood your original report to mean that something was
> complaining about "too much" output.  You're saying that lots of console
> output seems to lock the domain.

Sorry about that, and yes that is the case.

> I've had a report about heavy disk IO seems to lock up as well.  Perhaps
> they're both related to high event rates.  Do you think you could try an
> IO-intensive workload to see if you can get a similar lockup?

IO-intensive locks up too (see below)

> When the domain is locked up, what does /usr/lib/xen/bin/xenctx say?

see below

> Hm.  Rather than backing out the structure-change patch, could you try
> this workaround:
>
> diff -r be3ca4e0e19e arch/x86/xen/enlighten.c
> --- a/arch/x86/xen/enlighten.c  Thu Jan 17 14:25:07 2008 -0800
> +++ b/arch/x86/xen/enlighten.c  Thu Jan 17 16:37:42 2008 -0800
> @@ -95,7 +95,7 @@ struct shared_info *HYPERVISOR_shared_in
>   *
>   * 0: not available, 1: available
>   */
> -static int have_vcpu_info_placement = 1;
> +static int have_vcpu_info_placement = 0;
>
>  static void __init xen_vcpu_setup(int cpu)
>  {

First of all, this patch solves the lock-ups; it works as advertised :) The
DomU works as before. For the record, anyone applying this to 2.6.23.x needs
to change /x86/ to /i386/ in the paths, since the unified x86 tree only
exists as of 2.6.24.

I tried to create 2 tests, one is IO intensive and the other is console
output intensive:

test1. bonnie++ -s 1024 -u nobody
test2. for i in `seq 1 5`; do echo 00AAZZ; done

In all cases where it crashed (hung) there was no oops/panic.

Scenario 1 (booted 2.6.23.14 as is)
--
(but with init=/bin/bash, otherwise I couldn't get a prompt)

test1: crashed

# /usr/lib/xen/bin/xenctx 108
eip: c037c0c7
esp: c0343f90
eax:    ebx: 0001   ecx:    edx: c0342000
esi: c0373004   edi: c1210df4   ebp: 1b7d
 cs: 0061   ds: 007b   fs: 00d8   gs: 

Stack:
 c0100add c0378980 c0101962 c0104821 c120a000 c0378df4 c0348cff 0025
 c0348430 0004 9000 6df4 00ea1000 c0363be0 c0343fe8 c03dd007
  c0343fec c0349868 c0343fe0 178bc1f1 2001 01020800 00060fb1
  c03dd000  

Code:
cc cc cc cc cc cc cc cc cc cc cc cc cc cc b8 06 00 00 00 cd 82  cc
cc cc cc cc cc cc cc cc cc

Call Trace:
  []  <--
  []
  []
  []
  []
  []
  []
  []
  []
  []
  []
  []
  []
  []
  []
  [<178bc1f1>]
  []

test2: crashed after many many retries and sometimes with strange output

00AAZZ
00AAZZ
00AAA
00AAZZ
00AAZZ
00AAZZ
00AAZZ
00AAZZ
00AAZZ
00AAZZ
00AAZZ
00AAZZ
00AAZZ
00AAZZ
00AAZZ
AAZZ
00AAZZ
00AAZZ00AAZZ
00AAZZ

# /usr/lib/xen/bin/xenctx 113
eip: c037c0c7
esp: c0343f90
eax:    ebx: 0001   ecx:    edx: c0342000
esi: c0373004   edi: c1210df4   ebp: 1b7d
 cs: 0061   ds: 007b   fs: 00d8   gs: 

Stack:
 c0100add c0378980 c0101962 c0104821 c120a000 c0378df4 c0348cff 0025
 c0348430 0004 9000 6df4 00ea1000 c0363be0 c0343fe8 c03dd007
  c0343fec c0349868 c0343fe0 178bc1f1 2001 00020800 00060fb1
  c03dd000  

Code:
cc cc cc cc cc cc cc cc cc cc cc cc cc cc b8 06 00 00 00 cd 82  cc
cc cc cc cc cc cc cc cc cc

Call Trace:
  []  <--
  []
  []
  []
  []
  []
  []
  []
  []
  []
  []
  []
  []
  []
  []
  [<178bc1f1>]
  []

Scenario 2 (have_vcpu_info_placement = 0)
--

test1: no crash
test2: no crash, but occasionally I still get funny output like this

00AAZZ
00AAZZ00AAZZ
00AAZZ
00AAZZ
AAZZ
00AAZZ
00AAZZ
000AAZZ
000AAZZ
00AAZZ
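
To put numbers on the "funny output", a small script can sort the console
lines into buckets. This is only a sketch: the function name and the bucket
labels are made up for illustration, and the input is the scenario 2 output
pasted above.

```python
from collections import Counter

EXPECTED = "00AAZZ"  # the string echoed by the test2 loop


def classify(line, expected=EXPECTED):
    """Label one console line relative to the expected string (hypothetical helper)."""
    if line == expected:
        return "ok"
    repeats, rest = divmod(len(line), len(expected))
    if line and rest == 0 and line == expected * repeats:
        return "merged"      # two or more copies with the newline missing
    if line and expected.endswith(line):
        return "truncated"   # leading characters missing
    return "other"           # anything else, e.g. an extra character


# Lines copied from the scenario 2 output above.
observed = """00AAZZ
00AAZZ00AAZZ
00AAZZ
00AAZZ
AAZZ
00AAZZ
00AAZZ
000AAZZ
000AAZZ
00AAZZ""".splitlines()

summary = Counter(classify(line) for line in observed)
for label in ("ok", "merged", "truncated", "other"):
    print(label, summary[label])
```

Run against a longer capture (for example the console log saved to a file),
the same counts would show whether the garbling rate differs between
kernels.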

Re: Cannot boot xen DomU > 2.6.23.1

2008-01-20 Thread xming

On Jan 18, 2008 6:26 PM, Jeremy Fitzhardinge <[EMAIL PROTECTED]> wrote:

> >> Would it be possible to map the eip and some top parts of the stack back
> >> to kernel symbols?  Seems to be the same place in both traces, which is
> >> interesting.

> Do "nm -n vmlinux" on the kernel to get an address-sorted list of
> symbols, and then look to see what's near the eip (c037c0c7) and near
> the top of the stack (c0100add, c0378980, c0101962, ...).  Some of these
> may be in data, or other strange places, but the ones which correspond
> to code are interesting.

OK, I have done some of them, but I still don't know what I should be looking
at. Do you mean code related to xen or code related to have_vcpu_info_placement?
Please be patient with me :)

I'll just paste some of the results (around those addresses) here:

c037b000 B empty_zero_page
c037c000 B hypercall_page
c037d000 B system_state

c0100a00 t xen_cpuid
c0100a80 t xen_set_debugreg
c0100a90 t xen_get_debugreg
c0100aa0 t xen_save_fl
c0100ac0 t xen_irq_disable
c0100ad0 t xen_safe_halt
c0100af0 t xen_halt
c0100b20 t xen_store_tr
c0100b30 t cvt_gate_to_trap
c0100bb0 t xen_io_delay


c0378980 D per_cpu__irq_stat
c03789c0 d per_cpu__runqueues
c0378df4 D __per_cpu_end

c01018b0 t xen_flush_tlb_single
c0101940 t xen_idle
c0101980 T xen_setup_features
c01019c0 T xen_mc_flush
c0101aa0 T xen_mc_callback

c0104710 T kernel_thread
c01047c0 T cpu_idle
c0104840 T cpu_idle_wait
c0104940 T exit_thread

c0103fe4 T xen_irq_enable_direct
c0103ff1 T xen_irq_enable_direct_reloc
c0103ff5 T xen_irq_enable_direct_end
c0103ff8 T xen_irq_disable_direct
c0104000 T xen_irq_disable_direct_end
c0104004 T xen_save_fl_direct
c0104011 T xen_save_fl_direct_end
c0104014 T xen_restore_fl_direct
c010402b T xen_restore_fl_direct_reloc

c03483f0 t maxcpus
c0348430 t unknown_bootoption
c0348610 T parse_early_param
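
The lookup itself can be automated. The sketch below takes an excerpt of the
listing above, finds the nearest symbol at or below each address, and prints
it as symbol+offset. The function names are made up for illustration, and
because only an excerpt is loaded, a closer symbol that is missing from it
would go unnoticed; feeding in the full "nm -n vmlinux" output avoids that.

```python
import bisect

# Excerpt of the "nm -n vmlinux" listing above (address, type, name).
NM_OUTPUT = """\
c0100ad0 t xen_safe_halt
c0100af0 t xen_halt
c0101940 t xen_idle
c0101980 T xen_setup_features
c01047c0 T cpu_idle
c0104840 T cpu_idle_wait
c0378980 D per_cpu__irq_stat
c03789c0 d per_cpu__runqueues
c037c000 B hypercall_page
c037d000 B system_state
"""


def parse_nm(text):
    """Turn nm output lines into a sorted list of (address, type, name)."""
    symbols = []
    for line in text.splitlines():
        addr, kind, name = line.split()
        symbols.append((int(addr, 16), kind, name))
    symbols.sort()
    return symbols


def resolve(symbols, addr):
    """Return 'name+0xoffset' for the nearest symbol at or below addr."""
    addrs = [entry[0] for entry in symbols]
    i = bisect.bisect_right(addrs, addr) - 1
    if i < 0:
        return None  # address lies below every known symbol
    base, _kind, name = symbols[i]
    return f"{name}+0x{addr - base:x}"


symbols = parse_nm(NM_OUTPUT)
# eip first, then the top stack words from the xenctx dumps.
for addr in (0xc037c0c7, 0xc0100add, 0xc0378980, 0xc0101962, 0xc0104821):
    print(f"{addr:08x}  {resolve(symbols, addr)}")
```

With this excerpt the eip lands in hypercall_page, and the code addresses at
the top of the stack land in xen_safe_halt, xen_idle and cpu_idle.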


> >> Hm, I guess some of the output is getting dropped.  Does this happen
> >> with 2.6.18-xen?

> > yes it does

> OK, good.  I Didn't Break It (tm) ;)

So no fix from you? :)

Thanks

Re: Cannot boot xen DomU > 2.6.23.1

2008-01-20 Thread xming

On Jan 20, 2008 7:37 PM, Jeremy Fitzhardinge <[EMAIL PROTECTED]> wrote:
> xming wrote:
> > ok I have done some of them, but I still don't know what I should be looking
> > at. Do you mean code related to xen or code related to 
> > have_vcpu_info_placement?
> > Please be patient with me :)
> >
> > I just paste some of the result (around those addresses) here:
> >
>
> Thanks, that answers that particular question; the vcpu is blocked
> waiting for something to happen, which probably means it missed the
> event which was supposed to wake it up.  Why is another question.  At
> least there's a workaround, and that workaround gives me some clue where
> to look.
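
For reference, the "blocked" reading can be cross-checked against the xenctx
dump with a little arithmetic. This is a sketch under one assumption that is
not stated anywhere in this thread: that the hypercall page is laid out as
32-byte stubs, one per hypercall number. The addresses and code bytes are
taken from the dumps and the nm listing earlier in the thread.

```python
HYPERCALL_PAGE = 0xc037c000  # "B hypercall_page" from the nm listing
EIP = 0xc037c0c7             # eip reported by xenctx in both traces
STUB_SIZE = 32               # ASSUMPTION: one 32-byte stub per hypercall

# Which stub does eip fall into, and how far into it?
index, within = divmod(EIP - HYPERCALL_PAGE, STUB_SIZE)

# The bytes that precede eip in the xenctx "Code:" dump.
stub = bytes.fromhex("b8 06 00 00 00 cd 82")
opcode = stub[0]                                # 0xb8: mov $imm32, %eax
imm = int.from_bytes(stub[1:5], "little")       # the immediate loaded into eax
trap = stub[5:7]                                # 0xcd 0x82: int $0x82

print(f"stub index {index}, offset {within} into the stub")
print(f"opcode {opcode:#x}, immediate {imm}, trap bytes {trap.hex()}")
```

Under that assumption the stub index and the immediate agree, and the offset
equals the length of the mov/int pair, so eip sits right after a trap into
the hypervisor. Which hypercall that number stands for depends on the Xen
interface headers in use; I am not asserting it here, only that it is
consistent with xen_safe_halt being at the top of the stack.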

Want me to test it?

> BTW, is it an SMP or UP domain?   Does it make a difference?

It doesn't make a difference; I tried vcpu=1 and vcpu=2. Or do you want me
to try recompiling a UP kernel?

> >> OK, good.  I Didn't Break It (tm) ;)
>
> > So no fix from you? :)
>
> Maybe when I have nothing else to do.

I'll wait, or should I poke xen-devel?

Cannot boot xen DomU > 2.6.23.1

2008-01-17 Thread xming

Hi,

I finally found the piece of code that prevents me from booting Xen DomU
with a vanilla kernel > 2.6.23.1.

The problem is that every kernel > 2.6.23.1 (including the 2.6.24 RCs)
just hangs with "too much" console activity. Sometimes (well, most of the
time) the boot messages alone are too much. When I do manage to boot,
generating a lot of console output makes it hang: no oops, no more
console/network. Generating the same output through ssh does not hang
the domU.

When I revert the following patch, things work as before; I tried this
with 2.6.23.14 and 2.6.24-rc8. But I don't have the knowledge to
understand the reason behind this.

BTW, I am not subscribed.

--- a/include/xen/interface/vcpu.h
+++ b/include/xen/interface/vcpu.h
@@ -160,8 +160,9 @@ struct vcpu_set_singleshot_timer {
  */
 #define VCPUOP_register_vcpu_info   10  /* arg == struct vcpu_info */
 struct vcpu_register_vcpu_info {
-    uint32_t mfn;    /* mfn of page to place vcpu_info */
-    uint32_t offset; /* offset within page */
+    uint64_t mfn;    /* mfn of page to place vcpu_info */
+    uint32_t offset; /* offset within page */
+    uint32_t rsvd;   /* unused */
 };

 #endif /* __XEN_PUBLIC_VCPU_H__ */
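
To show what the patch changes at the byte level, here is a sketch using
Python's struct module to model the two layouts (little-endian, no padding).
The mfn and offset values are made up, and this only illustrates the size
and field positions; it does not claim to explain the hang.

```python
import struct

OLD_LAYOUT = "<II"   # uint32_t mfn; uint32_t offset;
NEW_LAYOUT = "<QII"  # uint64_t mfn; uint32_t offset; uint32_t rsvd;

mfn, offset = 0x12345, 0x40  # made-up example values

old_bytes = struct.pack(OLD_LAYOUT, mfn, offset)
new_bytes = struct.pack(NEW_LAYOUT, mfn, offset, 0)

old_size = struct.calcsize(OLD_LAYOUT)
new_size = struct.calcsize(NEW_LAYOUT)
print("old struct size:", old_size)
print("new struct size:", new_size)

# If one side fills in the old layout and the other side reads the first
# eight bytes as a 64-bit mfn, the old offset field ends up in the upper
# half of the mfn.
(misread_mfn,) = struct.unpack("<Q", old_bytes[:8])
print("mfn as read through the new layout:", hex(misread_mfn))
```

So the two layouts are not interchangeable: guest and hypervisor have to
agree on which one is in use.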

I am running Xen 3.1.2 PAE

# uname -a
Linux builder 2.6.24-rc8 #2 SMP Thu Jan 17 16:37:19 CET 2008 i686
AMD Athlon(tm) X2 Dual Core Processor BE-2300 AuthenticAMD GNU/Linux

# cat /proc/cpuinfo
processor   : 0
vendor_id   : AuthenticAMD
cpu family  : 15
model   : 107
model name  : AMD Athlon(tm) X2 Dual Core Processor BE-2300
stepping: 1
cpu MHz : 1899.930
cache size  : 512 KB
physical id : 0
siblings: 2
core id : 1
cpu cores   : 2
fdiv_bug: no
hlt_bug : no
f00f_bug: no
coma_bug: no
fpu : yes
fpu_exception   : yes
cpuid level : 1
wp  : yes
flags   : fpu tsc msr pae mce cx8 mca cmov pat pse36 clflush
mmx fxsr sse sse2 ht nx mmxext fxsr_opt lm 3dnowext 3dnow pni cx16
lahf_lm cmp_legacy svm extapic cr8_legacy 3dnowprefetch ts fid vid ttp
tm stc 100mhzsteps
bogomips: 3829.72
clflush size: 64

processor   : 1
vendor_id   : AuthenticAMD
cpu family  : 15
model   : 107
model name  : AMD Athlon(tm) X2 Dual Core Processor BE-2300
stepping: 1
cpu MHz : 1899.930
cache size  : 512 KB
physical id : 0
siblings: 2
core id : 1
cpu cores   : 2
fdiv_bug: no
hlt_bug : no
f00f_bug: no
coma_bug: no
fpu : yes
fpu_exception   : yes
cpuid level : 1
wp  : yes
flags   : fpu tsc msr pae mce cx8 mca cmov pat pse36 clflush
mmx fxsr sse sse2 ht nx mmxext fxsr_opt lm 3dnowext 3dnow pni cx16
lahf_lm cmp_legacy svm extapic cr8_legacy 3dnowprefetch ts fid vid ttp
tm stc 100mhzsteps
bogomips: 3829.72
clflush size: 64

Re: Cannot boot xen DomU > 2.6.23.1

2008-01-17 Thread xming

On Jan 17, 2008 6:15 PM, Jeremy Fitzhardinge <[EMAIL PROTECTED]> wrote:

> Uh, that's very strange.  That patch fixes an outright bug, unless
> you're using one specific changeset out of the xen-unstable mercurial
> tree.  What version of Xen are you using?  Is it one you've built from
> xenbits, or distributed by someone?  Are you using a 32 or 64 bit xen/dom0?
>
> I don't understand where the "too much console activity" message comes
> from.  Could you post an actual cut'n'paste of the message, or a
> screenshot?  Are you using a serial console on your dom0?

I am running Xen 3.1.2 from Gentoo (I also tried 3.1.1), 32-bit PAE, as I
wrote in the previous post:

> > I am running Xen 3.1.2 PAE

I never had any problem (at least THIS problem) with any PV DomU (2.6.18,
2.6.20, 2.6.21, 2.6.23-RCs and 2.6.23.1). The problem started with 2.6.23.3
and today I finally found time to track it down.

This only affects PV domU, so I don't understand your question about the
serial console of Dom0.

The symptom is (with a lot of subjective judgment) that when there is a lot
of output (or output that comes too quickly) on the console of the domU
(hvc0, connected with either "xm crea file.cfg -c" or "xm cons id"), the
whole PV domU hangs. It hangs at random places, sometimes right after init
and sometimes after I have logged in and just generate some output (on hvc0)
with something like "find /". IIRC I have never seen a hang before init.

When it hangs there is no more output on the console and the network to
that domU is dead too; dom0 is not affected. "xm list" still reports the
domU as "r", there is nothing special in the dom0 Xen logs, and no
oops or panic is reported in the domU. It seems to be running in an
infinite loop.

So, I can make screenshots, but they won't tell you anything: there is no
message, it's just dead.

Gentoo's xen is just a source tarball made from the mercurial repo, without
any patches (AFAIK).

cheers

Ming-Wei Shih
