htmldeveloper
at gmail
Oct 16, 2009, 10:43 PM
Post #1 of 5 (67 views)
Permalink
Today, both of my systems (2.6.32-rc4 from the Linus git tree and
linux-next) blocked indefinitely during bootup at:

kgdb: Registered I/O driver kgdbts.

The expected line:

kgdb: Unregistered I/O driver kgdbts, debugger disabled.

never comes up.
My bootup menu.lst:

title Fedora (2.6.26-rc4-next-20080530)
root (hd1,7)
kernel /boot/vmlinuz-2.6.26-rc4-next-20080530 ro root=UUID=d10fe8db-e7d4-4b42-b265-0109a3f3eedf
initrd /boot/initrd-2.6.26-rc4-next-20080530.img

title Fedora (2.6.32-rc4)
root (hd1,7)
kernel /boot/vmlinuz-2.6.32-rc4 ro root=UUID=d10fe8db-e7d4-4b42-b265-0109a3f3eedf
initrd /boot/initrd-2.6.32-rc4.img
and the kgdb-related options:
CONFIG_HAVE_ARCH_KGDB=y
CONFIG_KGDB=y
CONFIG_KGDB_SERIAL_CONSOLE=y
CONFIG_KGDB_TESTS=y
CONFIG_KGDB_TESTS_ON_BOOT=y
CONFIG_KGDB_TESTS_BOOT_STRING="y"
The same 2.6.32-rc4 image has booted previously without any problem.
So what could be the cause of this permanent wait?
--
Regards,
Peter Teoh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo[at]vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

htmldeveloper
at gmail
Oct 17, 2009, 12:40 AM
Post #2 of 5 (62 views)
Permalink
Sorry, after a reboot it now works. I don't know why. Sorry about that.
--
Regards,
Peter Teoh

jason.wessel
at windriver
Oct 19, 2009, 6:24 AM
Post #3 of 5 (50 views)
Permalink
Peter Teoh wrote:
> Sorry, after a reboot it now works. I don't know why. Sorry about that.
This is actually a real problem. It is a race condition, and there are
actually two separate problems.

1) When a process or kernel thread is put into the single step state,
kgdb expects it to hit the single step trap on the same processor the
single step request was made on.

On an SMP system a process or kernel thread can migrate to another
processor after kgdb resumes. This will result in a hard hang in the
cpu roundup part of kgdb.

2) Scheduler lock contention can cause a hard hang.

On an SMP system, kgdb for the x86 architecture single steps by running
only a single core. This is quite problematic if a scheduler lock is
held by a cpu which is in busy wait. The system will deadlock on the
single step operation from kgdb. This problem is easily observed on a
2 processor system by doing:
while [ 1 ] ; do find / 2> /dev/null > /dev/null; done &
while [ 1 ] ; do date > /dev/null ; done &
echo V1 > /sys/module/kgdbts/parameters/kgdbts
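
The deadlock in the second problem can be sketched with a toy model.
This is not kgdb code: the function, its parameters and the tick
counts are all hypothetical. It only assumes, as described above, that
the CPU being stepped needs a lock whose owner is another CPU, and that
the owner can release it only if the debugger lets that CPU run.

```python
# Toy model of the second problem; none of these names come from kgdb.

def ticks_until_lock_acquired(stepping_cpu, lock_owner, release_all_cpus,
                              owner_ticks_needed=3, max_ticks=1000):
    """Return the tick at which the stepping CPU gets the lock, or None."""
    # CPUs the debugger allows to execute while the single step is in flight.
    running = {stepping_cpu}
    if release_all_cpus:
        running.add(lock_owner)

    remaining = owner_ticks_needed  # work the owner must do before unlocking
    for tick in range(max_ticks):
        if lock_owner in running:
            remaining -= 1          # the owner makes progress only if it runs
        if remaining <= 0:
            return tick             # lock released, the stepping CPU takes it
        # otherwise the stepping CPU spins for another tick
    return None                     # never acquired: the model's "hard hang"


if __name__ == "__main__":
    # Only the stepping CPU runs: the owner never unlocks.
    print(ticks_until_lock_acquired(0, 1, release_all_cpus=False))
    # All CPUs released: the owner finishes and the step completes.
    print(ticks_until_lock_acquired(0, 1, release_all_cpus=True))
```

In the model, releasing every CPU is what lets the lock owner finish,
which is the direction of the fix proposed for the second problem.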
For the first problem, I have a fix which is in the linux-next branch,
and I will send a merge request to Linus to get it into the mainline
tree.
For the second problem, I am going to merge a change to release all the
processors to run, at the expense of missing a breakpoint. It is
possible to change the behavior of this dynamically, for someone who
might care about this behavior, until a longer term approach is
implemented. I have an experimental patch which implements the longer
term approach of using displaced stepping.
The experimental patch uses kprobes to manage software breakpoints. The
kprobe allows the breakpoint to remain planted while stepping around it
by using out of line instruction execution, where you emulate the
original instruction using memory elsewhere, followed by another trap
instruction.
Thanks,
Jason.

htmldeveloper
at gmail
Oct 19, 2009, 8:54 AM
Post #4 of 5 (50 views)
Permalink
Thank you for the explanation.
On Mon, Oct 19, 2009 at 9:24 AM, Jason Wessel
<jason.wessel[at]windriver.com> wrote:
> Peter Teoh wrote:
>> Sorry, after a reboot it now works. I don't know why. Sorry about that.
>
> This is actually a real problem. It is a race condition, and there are
> actually two separate problems.
>
> 1) When a process or kernel thread is put into the single step state,
> kgdb expects it to hit the single step trap on the same processor the
> single step request was made on.
>
Sorry if this is off topic, but can I ask: even if the present CPU is
in single step mode, all the other CPUs can be fully running and
executing all the time, correct? kgdb is not designed to handle more
than one CPU in single step mode, right? If that is wrong, then I
suppose there must be a way to switch among processors, which I don't
know how to do. I am not sure whether the same concept applies to kdb?
> On an SMP system a process or kernel thread can migrate to another
> processor after kgdb resumes. This will result in a hard hang in the
> cpu roundup part of kgdb.
Would it be OK to ask for slightly more detail on the reason for the
hard hang? I am trying to understand whether the same problem exists
in any other part of the kernel, e.g. kdb, or anywhere in the
suspend-resume cycle. Perhaps it could be generalized into a smatch or
sparse rule for recognizing a standard error pattern? Or perhaps some
kind of dynamic test could be built into the kernel source to identify
the problem?
>
> 2) Scheduler lock contention can cause a hard hang.
> [...]
> I have an experimental patch which implements the longer term
> approach of using displaced stepping.
> [...]
> Thanks,
> Jason.

Thank you for the detailed explanation, I appreciate it very much.
Let me take some time to understand it.
--
Regards,
Peter Teoh

jason.wessel
at windriver
Oct 19, 2009, 12:17 PM
Post #5 of 5 (49 views)
Permalink
Peter Teoh wrote:
> On Mon, Oct 19, 2009 at 9:24 AM, Jason Wessel
> <jason.wessel[at]windriver.com> wrote:
>
>> This is actually a real problem. It is a race condition, and there are
>> actually two separate problems.
>>
>> 1) When a process or kernel thread is put into the single step state,
>> kgdb expects it to hit the single step trap on the same processor the
>> single step request was made on.
>
> Sorry if this is off topic, but can I ask: even if the present CPU is
> in single step mode, all the other CPUs can be fully running and
> executing all the time, correct?
It is not quite that simple. The single step mode is a kernel task
state.

When kgdb does a single step on the x86 architecture, the HW single
step bit is set in the active kernel task, on CPU 2 for instance. Then
kgdb starts just that CPU. The problem case arises if an interrupt or
any kind of preemption occurs. The task may get scheduled onto a
different CPU at a later point, and deadlock ensues.
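
The migration case can be illustrated with a toy model. It is a sketch
under one stated assumption, not the kgdb implementation: the debugger
records which CPU the step was requested on and refuses to service the
step trap from any other CPU. The marker name and the retry bound are
hypothetical.

```python
# Toy model of the task-migration hang; names are illustrative only.

NO_CPU = -1  # hypothetical "no single step in flight" marker


def step_trap_accepted(step_cpu, trap_cpu, max_retries=1000):
    """Return True if the debugger accepts the step trap, else False."""
    cpu_doing_single_step = step_cpu  # recorded when the step was requested
    for _ in range(max_retries):
        if cpu_doing_single_step in (NO_CPU, trap_cpu):
            return True   # trap arrived where the debugger expected it
        # Wrong CPU: back off and retry. Nothing in this model ever resets
        # the marker, so the retry loop cannot make progress.
    return False          # stands in for the hard hang


if __name__ == "__main__":
    print(step_trap_accepted(step_cpu=2, trap_cpu=2))  # task stayed on CPU 2
    print(step_trap_accepted(step_cpu=2, trap_cpu=0))  # task migrated to CPU 0
```

The bounded loop only exists so the sketch terminates; in the situation
described above there is no such bound.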
> kgdb is not designed to handle more than one CPU in single step mode,
> right? If that is wrong, then I suppose there must be a way to switch
> among processors, which I don't know how to do. I am not sure whether
> the same concept applies to kdb?
Kgdb will not single step more than one task at a time. kdb has the
capability of switching CPUs, and in the kgdb+kdb merge branch I
implemented that functionality as well. Either way it still can only
single step one kernel thread at a time.
>> On an SMP system a process or kernel thread can migrate to another
>> processor after kgdb resumes. This will result in a hard hang in the
>> cpu roundup part of kgdb.
>
> Would it be OK to ask for slightly more detail on the reason for the
> hard hang? I am trying to understand whether the same problem exists
> in any other part of the kernel, e.g. kdb, or anywhere in the
> suspend-resume cycle. Perhaps it could be generalized into a smatch or
> sparse rule for recognizing a standard error pattern? Or perhaps some
> kind of dynamic test could be built into the kernel source to identify
> the problem?
This particular problem does not exist anywhere else in the kernel. It
is unique to the way kgdb deals with stopping and starting the system.

In kernel/kgdb.c the key is in anything that touches the variable
"kgdb_cpu_doing_single_step". It is up to each architecture that makes
use of kgdb to set/unset this variable. The x86 arch sets it, and what
it does is not allow the other CPUs to run when single stepping. If we
remove the set on the x86 arch, then you end up with the task migration
issue, so I was proposing putting in the fix to both issues, until a
displaced stepping solution with kprobes or another implementation is
completed.

You trade one problem for another, of course, by allowing the CPUs to
run.

The original problem was a "hard hang". The new problem is the
possibility of a missed breakpoint, for instance if you set a
breakpoint in a chunk of common code that can execute in parallel on
two different CPUs. The breakpoint gets removed, the single step HW
flag is set, and if another CPU or task runs through that chunk of
code, the breakpoint is missed. My preference is to trade the hard hang
away for the time being.
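
The missed breakpoint can be sketched the same way. This toy model is
not kgdb code and every name in it is hypothetical. It assumes two
CPUs share one code path, CPU 0 is being stepped over the breakpoint,
and CPU 1 passes through the same address once.

```python
# Toy model of the missed-breakpoint trade-off; names are illustrative.

def cpu1_traps_on_breakpoint(release_other_cpus):
    """Return True if CPU 1 hits the breakpoint, False if it misses it."""
    # The debugger lifts the breakpoint so CPU 0 can execute the
    # original instruction during the single step.
    planted = False
    cpu1_trapped = False

    if release_other_cpus:
        # CPU 1 runs through the address while the breakpoint is lifted.
        cpu1_trapped = planted

    # The step on CPU 0 completes and the breakpoint is put back.
    planted = True

    if not release_other_cpus:
        # CPU 1 was held during the step and only runs the code now.
        cpu1_trapped = planted

    return cpu1_trapped


if __name__ == "__main__":
    print(cpu1_traps_on_breakpoint(release_other_cpus=False))
    print(cpu1_traps_on_breakpoint(release_other_cpus=True))
```

The model leaves out the cost of holding the other CPUs, which is the
hard hang described earlier; it shows only the breakpoint side of the
trade.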
Jason.