Re: Problems with the demo CD and qemu

2005-02-21 Thread Adam Lackorzynski

On Fri Feb 18, 2005 at 11:24:07 +0100, Cedric Roux wrote:
> (I just tried a boot without CONFIG_HANDLE_SEGMENTS; it crashed,
> and I did not try to find out why.)

Probably because Linux tries to use it and the kernel rejects it.

> It would be nice to have some info about this issue, though,
> just to understand what exactly is going on. Do you, Adam, or
> someone else, have pointers or info about it? A second question

Well, it's hard to stay calm when talking about this issue, and you'll
find some nice rants out there. But I'll try.
Once upon a time someone decided that fast access to thread-local
storage (TLS) is necessary for user programs. As x86 doesn't have many
registers, a segment register is used to point to the TLS: %gs. L4 also
uses %gs, to point to the UTCB. In Linux, TLS is implemented with 3 GDT
entries per thread, which are reloaded on every thread switch. glibc
still supports the old method with LDT entries, but that didn't work
very well the last time I tried. I guess nobody's using it and it's
buggy somehow, but I don't really know. Fiasco can play the LDT game,
which can also be useful for other things. Fortunately, TLS usage can be
disabled either by rm'ing the tls dirs or by using LD_ASSUME_KERNEL, so
we can live with it.
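
As a rough illustration of the GDT-versus-LDT distinction above: a
segment selector such as the value held in %gs encodes which descriptor
table it refers to. The sketch below decodes the standard x86 selector
layout (index, table indicator, requested privilege level). The struct
and function names are made up for this example and are not taken from
Fiasco, L4Linux or glibc; the sample value 0x0043 is the GS value shown
in the register dump in the first message of this thread.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Fields of an x86 segment selector (the 16-bit value in %gs, %cs, ...). */
struct selector_fields {
    unsigned index;   /* bits 15..3: descriptor index within the table */
    unsigned in_ldt;  /* bit 2 (TI): 0 = GDT, 1 = LDT */
    unsigned rpl;     /* bits 1..0: requested privilege level */
};

/* Hypothetical helper: split a selector into its architectural fields. */
struct selector_fields decode_selector(uint16_t sel)
{
    struct selector_fields f;
    f.index  = sel >> 3;
    f.in_ldt = (sel >> 2) & 1u;
    f.rpl    = sel & 3u;
    return f;
}

/* Hypothetical helper: print one decoded selector in readable form. */
void print_selector(const char *name, uint16_t sel)
{
    struct selector_fields f = decode_selector(sel);
    assert(name);
    printf("%s=%04x: index %u in %s, RPL %u\n",
           name, (unsigned)sel, f.index, f.in_ldt ? "LDT" : "GDT", f.rpl);
}
```

For example, print_selector("GS", 0x0043) reports index 8 in the GDT
with RPL 3, so that particular %gs value refers to a GDT entry rather
than an LDT entry.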

Standard pointers:
http://people.redhat.com/drepper/nptl-design.pdf
http://people.redhat.com/drepper/tls.pdf

> is: is CONFIG_HANDLE_SEGMENTS mandatory, and why? More generally,
> which options in the Fiasco configuration are mandatory and which
> are optional, and what's the best configuration (with speed
> constraints in mind)?

The option is mandatory right now; I'll need some support outside of
L4Linux to make it optional. For performance, Fiasco prints on startup
which options you should enable or disable, so just try those.




Adam
-- 
Adam [EMAIL PROTECTED]
  Lackorzynski http://os.inf.tu-dresden.de/~adam/

___
l4-hackers mailing list
l4-hackers@os.inf.tu-dresden.de
http://os.inf.tu-dresden.de/mailman/listinfo/l4-hackers


Re: Problems with the demo CD and qemu

2005-02-18 Thread Cedric Roux
On Mon, 14 Feb 2005, Adam Lackorzynski wrote:

> On Mon Feb 14, 2005 at 10:40:17 +0100, Cedric Roux wrote:
> 
> I know, but haven't found time yet to look deeper into it. Something
> broke, that's sure. Disabling TLS could help for the time being (rm -r
> /lib/tls...).

L4Linux is there, thanks.

The protection fault occurs when Fiasco calls switch_cpu:
there is a "pop gs" that generates the protection fault.
That's all I can do to help you. The TLS stuff is beyond my knowledge,
and I have run out of time for this. The system boots without the
TLS handling of the tls/libc (or libpthread), that's fine for me :)
(I just tried a boot without CONFIG_HANDLE_SEGMENTS; it crashed,
and I did not try to find out why.)

It would be nice to have some info about this issue, though,
just to understand what exactly is going on. Do you, Adam, or
someone else, have pointers or info about it? A second question
is: is CONFIG_HANDLE_SEGMENTS mandatory, and why? More generally,
which options in the Fiasco configuration are mandatory and which
are optional, and what's the best configuration (with speed
constraints in mind)?

Thanks,
Cedric.

> 
> 
> Adam
> 




Re: Problems with the demo CD and qemu

2005-02-14 Thread Adam Lackorzynski
On Mon Feb 14, 2005 at 10:40:17 +0100, Cedric Roux wrote:
> Maybe you can test this specific case and not call switch_to_irq_idle_loop
> when the calling thread is the IRQ one?

I guess that's broken then, I'll look into it.

> Now, I have some General Protection Faults occurring at l4linux boot time.
> I don't know where and I don't know why. Is it a known issue?
> I ask because I don't have much time to investigate it.
> And if it is a known issue, what should I do to fix this?
> (otherwise, I'll live with it and try to get to the bottom of it
> when I have time)

I know, but haven't found time yet to look deeper into it. Something
broke, that's sure. Disabling TLS could help for the time being (rm -r
/lib/tls...).


Adam


Re: Problems with the demo CD and qemu

2005-02-14 Thread Cedric Roux
On Sat, 12 Feb 2005, Adam Lackorzynski wrote:

> > My questions are:
> >   1 - why call this switch_to_irq_idle_loop? what's
> >   the purpose of it?
> 
> The purpose is to prevent interrupts from getting through. The tricky part
> here has been IRQ probing. I guess I need to reevaluate this issue...

Yes, but the point is that the IRQ thread calls switch_to_irq_idle_loop
to disable itself, which is weird :) (It can't be enabled again, because
the only one that would re-enable it is the thread itself, and it has
just disabled itself.)

Maybe you can test this specific case and not call switch_to_irq_idle_loop
when the calling thread is the IRQ one?
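
The check suggested above can be sketched as a small stand-alone model.
This is only an illustration of the control flow, not the L4Linux code:
thread_id_t, struct irq_line and both functions are hypothetical names,
and whether skipping the switch is actually safe is exactly the open
question here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical thread handle; the real L4 thread id type is not shown here. */
typedef int thread_id_t;

struct irq_line {
    thread_id_t irq_thread;  /* thread that services this IRQ */
    bool        idling;      /* true while the IRQ thread is parked */
};

/*
 * Model of the disable path with the proposed check: if the caller is
 * the IRQ thread itself, do not park it, because nobody else would wake
 * it up again. Returns true if the IRQ thread was parked.
 */
bool irq_line_disable(struct irq_line *line, thread_id_t caller)
{
    assert(line);
    if (caller == line->irq_thread)
        return false;        /* self-disable: skip the switch */
    line->idling = true;
    return true;
}

/* Model of the enable path: un-park the IRQ thread if it was parked. */
void irq_line_enable(struct irq_line *line)
{
    assert(line);
    line->idling = false;
}
```

In this model, a disable issued from the IRQ thread leaves the line
running, while a disable from any other thread parks it until the
matching enable.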

> 
> >   2 - if I remove this call, do I get a wrong system or
> >   is it ok? what do I lose if it is ok (speed?)?
> 
> Should be ok if it works on your system.

Mmm, not a very satisfying answer :)

> > (By the way, the l4linux kernel won't compile with 4k stacks,
> 
> It compiled for me as of today but I had to fix some small issues to
> make it actually work (but I only tested this slightly). Should hit CVS
> by tomorrow.
> 
> > you never call irq_ctx_init, maybe you should call it in
> > init_IRQ?)
> 
> No, l4linux has always worked more like the 4k-IRQ way, not as the old
> way in 8k-stacks.

OK, I used the wrong word. It compiled, but it didn't work at runtime.

> Adam

Anyway, thanks.

Now, I have some General Protection Faults occurring at l4linux boot time.
I don't know where and I don't know why. Is it a known issue?
I ask because I don't have much time to investigate it.
And if it is a known issue, what should I do to fix this?
(otherwise, I'll live with it and try to get to the bottom of it
when I have time)

Cedric.




Re: Problems with the demo CD and qemu

2005-02-12 Thread Adam Lackorzynski
On Fri Feb 11, 2005 at 20:05:31 +0100, Cedric Roux wrote:
> ethernet card (ne2k-pci) sends an IRQ (number 9).
> The IRQ thread passes wait_for_irq_message_hw then calls do_IRQ.
> do_IRQ does its stuff, then calls irq_exit.
> 
> In irq_exit, we have a softirq pending (don't ask me why, that's just
> the way it is), so we call do_softirq.
> 
> We then enter net_tx_action.
> 
> I'll skip the details. In short, we enter the TCP/IP stack,
> do some stuff, then go back into the ethernet driver code,
> in ei_start_xmit (8390.c).
> 
> This function calls disable_irq_nosync, which calls
> desc->handler->disable, which is in fact do_l4lx_irq_dev_disable.
> 
> This one will call switch_to_irq_idle_loop.
> 
> I don't exactly know what happens next (lack of time), but if
> I remove the call to switch_to_irq_idle_loop (and of course
> the corresponding call to switch_to_irq_thread) in
> do_l4lx_irq_dev_disable (respectively do_l4lx_irq_dev_enable)
> everything works fine (well, I don't get crashes when I do
> my telnet anymore).

Thanks for this ample explanation.
 
> My questions are:
>   1 - why call this switch_to_irq_idle_loop? what's
>   the purpose of it?

The purpose is to prevent interrupts from getting through. The tricky part
here has been IRQ probing. I guess I need to reevaluate this issue...

>   2 - if I remove this call, do I get a wrong system or
>   is it ok? what do I lose if it is ok (speed?)?

Should be ok if it works on your system.

>   3 - a comment in switch_to_irq_idle_loop says:
> /* Looks like interrupts are disabled multiple times in 2.6 */
>   shouldn't you use a counter in switch_to_irq_thread and
>   only do the switch if it's back to zero? (I mean, imagine 2
>   calls to switch_to_irq_idle_loop followed by 1 call to
>   switch_to_irq_thread, should it really come back from idle
>   at this point?)

That's not what I would expect from the hardware: disable just disables
it, no matter how often you do it.
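
The two readings of question 3 can be put side by side in a small model.
This is a sketch under assumptions, not the L4Linux implementation:
struct irq_gate and all four functions are hypothetical names. The
"flat" pair behaves as described in the answer above (disable is
idempotent), and the "counted" pair behaves as the question proposes
(re-enable only when the count is back to zero).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one interrupt line with two disable policies. */
struct irq_gate {
    unsigned depth;    /* outstanding disables (counting policy only) */
    bool     enabled;  /* whether interrupts currently get through */
};

/* Flat policy: disabling twice is the same as disabling once. */
void flat_disable(struct irq_gate *g) { g->enabled = false; }
void flat_enable(struct irq_gate *g)  { g->enabled = true; }

/* Counting policy: each disable must be matched by one enable. */
void counted_disable(struct irq_gate *g)
{
    g->depth++;
    g->enabled = false;
}

void counted_enable(struct irq_gate *g)
{
    assert(g->depth > 0);  /* unbalanced enable is a caller bug in this model */
    if (--g->depth == 0)
        g->enabled = true;
}
```

With two disables followed by one enable, the flat gate is enabled again
while the counted gate stays disabled until the second enable; that
difference is what the question is about.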


> (By the way, the l4linux kernel won't compile with 4k stacks,

It compiled for me as of today but I had to fix some small issues to
make it actually work (but I only tested this slightly). Should hit CVS
by tomorrow.

> you never call irq_ctx_init, maybe you should call it in
> init_IRQ?)

No, l4linux has always worked more like the 4k-IRQ way, not as the old
way in 8k-stacks.


Adam


Re: Problems with the demo CD and qemu

2005-02-11 Thread Cedric Roux
Hi again L4 Hackers,

here is what's going on.

ethernet card (ne2k-pci) sends an IRQ (number 9).

The IRQ thread passes wait_for_irq_message_hw then calls do_IRQ.

do_IRQ does its stuff, then calls irq_exit.

In irq_exit, we have a softirq pending (don't ask me why, that's just
the way it is), so we call do_softirq.

We then enter net_tx_action.

I'll skip the details. In short, we enter the TCP/IP stack,
do some stuff, then go back into the ethernet driver code,
in ei_start_xmit (8390.c).

This function calls disable_irq_nosync, which calls
desc->handler->disable, which is in fact do_l4lx_irq_dev_disable.

This one will call switch_to_irq_idle_loop.

I don't exactly know what happens next (lack of time), but if
I remove the call to switch_to_irq_idle_loop (and of course
the corresponding call to switch_to_irq_thread) in
do_l4lx_irq_dev_disable (respectively do_l4lx_irq_dev_enable)
everything works fine (well, I don't get crashes when I do
my telnet anymore).

My questions are:
  1 - why call this switch_to_irq_idle_loop? what's
  the purpose of it?
  2 - if I remove this call, do I get a wrong system or
  is it ok? what do I lose if it is ok (speed?)?
  3 - a comment in switch_to_irq_idle_loop says:
/* Looks like interrupts are disabled multiple times in 2.6 */
  shouldn't you use a counter in switch_to_irq_thread and
  only do the switch if it's back to zero? (I mean, imagine 2
  calls to switch_to_irq_idle_loop followed by 1 call to
  switch_to_irq_thread, should it really come back from idle
  at this point?)

Thank you in advance.

Best regards,
Cedric.

(By the way, the l4linux kernel won't compile with 4k stacks,
you never call irq_ctx_init, maybe you should call it in
init_IRQ?)

On Thu, 10 Feb 2005, Cedric Roux wrote:

> Hello L4 Hackers,
> 
> here follows a description of what I did. My questions come at the end
> of the message. Sorry for the length, but I wanted to be clear.

[SNIP]

> <0>Kernel panic: Aiee, killing interrupt handler!
> <0>In interrupt handler - not syncing
> 
> I would like to know:
>   1 - what's going on? I suspected some kind of weird IRQ firing because
>   of the use of qemu, but as far as my investigations have told me,
>   it doesn't seem to be that. I believed that one IRQ came too fast
>   after a first one, so that Linux was not yet out of the driver
>   code but interrupts were enabled, thus crashing everything.
>   I think I was wrong, no?
>   2 - how to solve this. What code/doc should I read to debug it, where
>   to dig. I am a bit confused for now.
> 
> Thank you in advance for your help.
> 
> Best regards,
> Cedric.




Problems with the demo CD and qemu

2005-02-10 Thread Cedric Roux
Hello L4 Hackers,

here follows a description of what I did. My questions come at the end
of the message. Sorry for the length, but I wanted to be clear.

perignan:/home/roux/qemu_try2>wget http://fabrice.bellard.free.fr/qemu/qemu-0.6.1.tar.gz
perignan:/home/roux/qemu_try2>tar -xvzf qemu-0.6.1.tar.gz
perignan:/home/roux/qemu_try2>cd qemu-0.6.1
perignan:/home/roux/qemu_try2/qemu-0.6.1>./configure --target-list=i386-softmmu --prefix=/home/roux/qemu_try2
Install prefix    /home/roux/qemu_try2
BIOS directory    /home/roux/qemu_try2/share/qemu
binary directory  /home/roux/qemu_try2/bin
Manual directory  /home/roux/qemu_try2/share/man
ELF interp prefix /usr/gnemul/qemu-%M
Source path       /home/roux/qemu_try2/qemu-0.6.1
C compiler        gcc
make              make
host CPU          i386
host big endian   no
target list       i386-softmmu
gprof enabled     no
static build      no
SDL support       yes
SDL static link   yes
mingw32 support   no
Adlib support     no
FMOD support      no
perignan:/home/roux/qemu_try2/qemu-0.6.1>make
... building output data ... everything was fine ...
perignan:/home/roux/qemu_try2/qemu-0.6.1>make install
... install output data ... everything was fine ...
perignan:/home/roux/qemu_try2/qemu-0.6.1>cd ..
perignan:/home/roux/qemu_try2>wget http://os.inf.tu-dresden.de/L4/LinuxOnL4/demo-cd/demo-cd-0.1.iso.bz2
... download ok ...
perignan:/home/roux/qemu_try2>bunzip2 demo-cd-0.1.iso.bz2
perignan:/home/roux/qemu_try2>bin/qemu -cdrom demo-cd-0.1.iso -boot d -serial stdio
Could not configure '/dev/rtc' to have a 1024 Hz timer. This is not a fatal
error, but for better emulation accuracy either use a 2.6 host Linux kernel or
type 'echo 1024 > /proc/sys/dev/rtc/max-user-freq' as root.
... a lot of output from l4 ...

Now we are inside qemu, and we just choose "L4Linux 2.6 from this cd [dope]".
Once Linux is up, we try a: telnet 10.0.2.2 (10.0.2.2 is the address
of the localhost, as set by qemu) inside the l4linux window,
and we get this:
<3>bad: scheduling while atomic!
 [<00407efe>] dump_stack+0x1e/0x20
 [<0061c3f0>] schedule+0x450/0x490
 [<00401b16>] l4x_idle+0x196/0x1a0
 [<004049c8>] cpu_idle+0x8/0x10
 [<006d0719>] start_kernel+0x1f9/0x260
 [<006d4571>] l4env_linux_startup+0x111/0x120
 [<00015205>] 0x15205
 [<00018198>] 0x18198
<3>bad: scheduling while atomic!
 [<00407efe>] dump_stack+0x1e/0x20
 [<0061c3f0>] schedule+0x450/0x490
 [<004022bf>] l4x_dispatch_message+0x79f/0x11b0
 [<00000000>] 0x0
<3>bad: scheduling while atomic!
 [<00407efe>] dump_stack+0x1e/0x20
 [<0061c3f0>] schedule+0x450/0x490
 [<0040bb99>] sys_sched_yield+0x39/0x50
 [<0044858a>] do_coredump+0xca/0x2b5
 [<0041744a>] get_signal_to_deliver+0x29a/0x310
 [<00406811>] do_signal+0x51/0x720
 [<00401fbe>] l4x_dispatch_message+0x49e/0x11b0
 [<00000000>] 0x0
<3>bad: scheduling while atomic!
 [<00407efe>] dump_stack+0x1e/0x20
 [<0061c3f0>] schedule+0x450/0x490
 [<0061c7b8>] schedule_timeout+0xa8/0xb0
 [<0044f253>] do_select+0x363/0x370
 [<0044f4d1>] sys_select+0x271/0x410
 [<004024c7>] l4x_dispatch_message+0x9a7/0x11b0
 [<00000000>] 0x0
<3>bad: scheduling while atomic!
 [<00407efe>] dump_stack+0x1e/0x20
 [<0061c3f0>] schedule+0x450/0x490
 [<0061c7b8>] schedule_timeout+0xa8/0xb0
 [<0044f253>] do_select+0x363/0x370
 [<0044f4d1>] sys_select+0x271/0x410
 [<004024c7>] l4x_dispatch_message+0x9a7/0x11b0
 [<00000000>] 0x0
<0>Kernel panic: Aiee, killing interrupt handler!
<0>In interrupt handler - not syncing

The error message is not always the same; I once got:
<3>bad: scheduling while atomic!
 [<00407efe>] dump_stack+0x1e/0x20
 [<0061c3f0>] schedule+0x450/0x490
 [<00401b16>] l4x_idle+0x196/0x1a0
 [<004049c8>] cpu_idle+0x8/0x10
 [<006d0719>] start_kernel+0x1f9/0x260
 [<006d4571>] l4env_linux_startup+0x111/0x120
 [<00015205>] 0x15205
 [<00018198>] 0x18198
KERNEL: c.3 (tcb=c0301800) killed:
Unhandled trap
EAX 030ded40 EBX 3293 ECX 00682f00 EDX 030ded48
ESI  EDI 00682f04 EBP 007d1f04 ESP 007d1f04
EIP 0040e27f EFLAGS 3206
CS 001b SS 0023 DS 0023 ES 0023 FS 0023 GS 0043
trapno 6, error , from user mode

We have here:
perignan:/home/roux/qemu_try2>uname -a
Linux perignan 2.4.20-8smp #1 SMP Thu Mar 13 17:45:54 EST 2003 i686 i686 i386 GNU/Linux
perignan:/home/roux/qemu_try2>gcc --version 
gcc (GCC) 3.2.2 20030222 (Red Hat Linux 3.2.2-5)
My CPU is a Pentium 4 running at 3GHz, I've got 1GB of RAM.
My X Window has a depth of 24, it's a TrueColor visual class.

If you need more information to solve this, just tell me.

I would like to know:
  1 - what's going on? I suspected some kind of weird IRQ firing because
  of the use of qemu, but as far as my investigations have told me,
  it doesn't seem to be that. I believed that one IRQ came too fast
  after a first one, so that Linux was not yet out of the driver
  code but interrupts were enabled, thus crashing everything.
  I think I was wrong, no?
  2 - how to solve this. What code/doc should I read to debug it, where
  to dig. I am a bit conf