Wessel, Jason wrote:
>>>>>> I'm also getting this with RT patch applied on x86_64 SMP machine
>>>>>>(with low-latency desktop kernel) after hitting initial breakpoint:
>>>>>>BUG: at kernel/softirq.c:647 __tasklet_action()
>>>>>>Call Trace:
>>>>>>[<ffffffff8022e61a>] __tasklet_action+0xe7/0x138
>>>>>>[<ffffffff8022e693>] tasklet_action+0x28/0x2a
>>>>>>[<ffffffff8022e892>] ksoftirqd+0x149/0x1f3
>>>>>>[<ffffffff8022e749>] ksoftirqd+0x0/0x1f3
>>>>>>[<ffffffff8023d324>] kthread+0xdc/0x113
>>>>>>[<ffffffff8020adf8>] child_rip+0xa/0x12
>>>>>>[<ffffffff8023d44f>] kthread_create+0x6a/0x15c
>>>>>>[<ffffffff8023d248>] kthread+0x0/0x113
>>>>>>[<ffffffff8020adee>] child_rip+0x0/0x12
>>>>>>---------------------------
>>>>>>| preempt count: 00000100 ]
>>>>>>| 0-level deep critical section nesting:
>>>>>>----------------------------------------
>>> Ugh, this one was really nasty. The actual reason has turned out to be
>>>that the KGDB's tasklet gets scheduled *before* per-CPU data gets
>>>replicated for each CPU, therefore it modifies the .data.percpu
>>>section itself. But the tasklet is actually run *after* the
>>>replication, so it gets into the tasklet lists on every CPU -- and so
>>>I get that BUG on every CPU! Any thoughts on how to avoid this
>>>nuisance? :-/
>> Looks like a design issue to me: KGDB (ab)uses tasklets
>>before per-CPU data gets replicated. This only happens on
>>x86_64 SMP machines because those don't have exception stack
>>setup by the time initial breakpoint is hit. What I don't
>>understand yet is why these BUGs don't show up without the
>>-rt patch...
The mainline code has the same TASKLET_STATE_SCHED bit check and BUG(), yet
it didn't seem to give the trace -- I'll investigate today...
> Is this during the boot cycle or attaching afterwards?
The former -- it's caused by the 'kgdbwait' option.
> The tasklet at
> runtime should only be used to break in initially.
And it is.
> It sounds like the problem might be elsewhere though.
No, it lies exactly where I've described. Quoting the boot log with some
of my printk() calls added:
Command line: BOOT_IMAGE=vmlinuz-headless ip=any root=/dev/nfs kgdbwait
[EMAIL PROTECTED]/,@192.168.222.1/ console=ttyS0,115200n1
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 000000000009ac00 (usable)
BIOS-e820: 000000000009ac00 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000d4000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 000000007ff70000 (usable)
BIOS-e820: 000000007ff70000 - 000000007ff73000 (ACPI data)
BIOS-e820: 000000007ff73000 - 000000007ff80000 (ACPI NVS)
BIOS-e820: 000000007ff80000 - 0000000080000000 (reserved)
BIOS-e820: 00000000fec00000 - 00000000fec00400 (reserved)
BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
BIOS-e820: 00000000fff80000 - 0000000100000000 (reserved)
__tasklet_common_schedule called on CPU0 with t = ffffffff80810ba0, head =
ffffffff808ec5c8, nr = 5
Schedule tasklet with next = 0000000000000000, state = 1, count = 0, func =
ffffffff80259598
This is where kgdb_early_init() schedules the tasklet -- note the value of
'head' arg, it's in the initial .data.percpu section.
KGDB cannot initialize I/O yet.
end_pfn_map = 1048576
[...]
Allocating PCI resources starting at 88000000 (gap: 80000000:7ec00000)
PERCPU: Allocating 34816 bytes of per cpu data
This is where per-CPU data gets allocated and copied from the .data.percpu
section. Our tasklet is still queued, so it gets into each CPU's data!
Built 1 zonelists. Total pages: 514985
Kernel command line: BOOT_IMAGE=vmlinuz-headless ip=any root=/dev/nfs kgdbwait
[EMAIL PROTECTED]/,@192.168.222.1/ console=ttyS0,115200n1
kgdboe: local port 6443
kgdboe: local IP 192.168.222.22
kgdboe: interface eth0
kgdboe: remote port 6442
kgdboe: remote IP 192.168.222.1
kgdboe: remote ethernet address ff:ff:ff:ff:ff:ff
Initializing CPU#0
WARNING: experimental RCU implementation.
PID hash table entries: 4096 (order: 12, 32768 bytes)
Extended CMOS year: 2000
TSC calibrated against PM_TIMER
time.c: Detected 1794.068 MHz processor.
tasklet_action: &tasklet_vec = ffff810002c155c8
Execute tasklet on CPU0 with next = 0000000000000000, state = 1, count = 0,
func = ffffffff80259598
And here it is executed at last on CPU0. And later it gets wrongly
re-executed on CPU1 (note the zero state, meaning the TASKLET_STATE_SCHED bit
was already cleared by that time):
[...]
SCSI subsystem initialized
Execute tasklet on CPU1 with next = 0000000000000000, state = 0, count = 0,
func = ffffffff80259598
BUG: at kernel/softirq.c:667 __tasklet_action()
Call Trace:
<IRQ> [<ffffffff8022e5b4>] __tasklet_action+0x108/0x159
[<ffffffff8022e663>] tasklet_action+0x5e/0x6c
[<ffffffff8022e028>] ___do_softirq+0xb1/0x183
[<ffffffff8022e13c>] __do_softirq+0x42/0x5b
[<ffffffff80243388>] tick_periodic+0x71/0x73
[<ffffffff8020b18c>] call_softirq+0x1c/0x28
[<ffffffff8020cfb9>] do_softirq+0x3d/0xb4
[<ffffffff8022e23c>] irq_exit+0x40/0x52
[<ffffffff80216f76>] smp_apic_timer_interrupt+0x49/0x64
[<ffffffff80208206>] default_idle+0x0/0x4c
[<ffffffff8020ac36>] apic_timer_interrupt+0x66/0x70
<EOI> [<ffffffff8020823d>] default_idle+0x37/0x4c
[<ffffffff802081bb>] enter_idle+0x22/0x24
[<ffffffff8020840c>] cpu_idle+0x58/0xa3
[<ffffffff808b094d>] start_secondary+0x2fe/0x30d
KGDB cannot initialize I/O yet.
(The debug output could have been more verbose but it was hard to
cut/paste the complete log that way; I've also skipped large irrelevant parts
of it).
> The tasklet should run on a single
> CPU and that CPU will have an exception to put KGDB into the exception
> context where it should try to obtain control of all the other
> processors via an IPI.
Yeah, that's clear. But in reality, in an x86_64 SMP kernel, the tasklet
gets executed on each CPU!
> Perhaps the RT code you have patched the kernel with has not changed the
> lock semantics in the kernel/kgdb.c to lock down all the processors?
No, that I have fixed (and it would have caused a different kind of BUG).
> Jason.
WBR, Sergei