Re: Serial debug broken in recent -CURRENT?

2003-10-14 Thread Greg 'groggy' Lehey
On Wednesday,  8 October 2003 at  2:08:55 +1000, Bruce Evans wrote:
 On Tue, 30 Sep 2003, Sam Leffler wrote:

 It reliably locks up for me when you break into a running system, set a
 breakpoint, and then continue.  Machine is UP+HTT.  Haven't tried other
 machines.

 This seems to be because rev.1.75 of db_interface.c disturbed some much
 larger bugs related to the ones that it fixed.  It takes miracles for
 entering ddb to even sort of work in the SMP case. 

Ah, interesting.  I hadn't thought that it might be related to SMP.

 If one of multiple CPUs in kdb_trap() somehow stops the others, then the
 others face different problems when they restart.  They can't just return
 because debugger traps are not restartable (by just returning).  They can't
 just proceed because the first CPU may have changed the state in such a way as
 to make proceeding in the normal way not work (e.g., it may have deleted
 a breakpoint).

 These problems are not correctly or completely fixed in:


 Index: db_interface.c
 ===
 RCS file: /home/ncvs/src/sys/i386/i386/db_interface.c,v
 retrieving revision 1.75
 diff -u -2 -r1.75 db_interface.c
 --- db_interface.c  7 Sep 2003 13:43:01 -   1.75
 +++ db_interface.c  7 Oct 2003 14:11:35 -
 ...
 This is supposed to stop the other CPUs either in kdb_trap() or normally.
 The timeouts are hopefully long enough for all the CPUs to stop in one
 of these ways.  But it doesn't always work.  One possible problem is
 that stop and start IPIs may be delivered out of order, so CPUs stopped
 in kdb_trap() may end up stopped (since we don't wait for them to see
 the stop IPI).

Correct.  This patch doesn't fix the problem on my system.  I've built
a single-processor kernel (with SMP and APIC_IO commented out), and that
*does* work with remote gdb, so it's almost certainly an SMP issue.  I
have a dump of a partially hung system if that's of any help.

Greg
--
See complete headers for address and phone numbers.




Re: Serial debug broken in recent -CURRENT?

2003-10-07 Thread Bruce Evans
On Tue, 30 Sep 2003, Sam Leffler wrote:

 It reliably locks up for me when you break into a running system, set a
 breakpoint, and then continue.  Machine is UP+HTT.  Haven't tried other
 machines.

This seems to be because rev.1.75 of db_interface.c disturbed some much
larger bugs related to the ones that it fixed.  It takes miracles for
entering ddb to even sort of work in the SMP case.  If multiple CPUs
call kdb_trap() concurrently, e.g., by all hitting the same breakpoint,
then after 1.75 they first race to stop each other.  Before 1.75, they
raced to clobber each other's registers before this.  The race to stop
each other cannot be won since all the CPUs have interrupts disabled,
so they cannot respond to IPIs.  It doesn't help that stop_cpus() is silent
about this.  It spins silently forever if a CPU can't be stopped, unless
DIAGNOSTIC is configured in which case it gives the plain broken behaviour
of warning and returning after not waiting for long enough.  But things
somehow worked better before 1.75.  I don't know exactly why.  1.75 only
changes the timing a little, and I would have thought that it reduced the
races by giving the other CPUs less time to enter ddb.  My tests mainly
used a breakpoint at ithread_schedule which is sure to be hit by multiple
CPUs quite often, but there wasn't enough interrupt activity for concurrent
entry to be the usual case.  Debugging printfs affected the races a lot --
turning on VERBOSE_CPUSTOP_ON_DDBBREAK mostly avoided the problem, but with
a syscons console it sometimes caused fatal traps in bcopy().
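To make the stop_cpus() complaint concrete: a wait that either spins forever or gives up without telling anyone is hard to debug, whereas a bounded wait that returns its outcome lets the caller report which CPUs never answered. The following is a minimal userland sketch in C11, not the kernel's code. mark_stopped(), wait_for_stopped() and the spin limit are hypothetical, and an atomic_uint bitmask stands in for the kernel's stopped_cpus. It assumes CPU numbers below 32, and it does nothing about the real problem described above, namely that CPUs spinning with interrupts disabled never see the stop IPI at all.

```c
#include <assert.h>
#include <stdatomic.h>

/*
 * Hypothetical userland sketch: a CPU that has stopped sets its bit in
 * *stopped with release ordering, loosely mirroring the kernel's
 * stopped_cpus mask.  Assumes cpu < 32.
 */
static void
mark_stopped(atomic_uint *stopped, unsigned int cpu)
{
	atomic_fetch_or_explicit(stopped, 1u << cpu, memory_order_release);
}

/*
 * Wait until every CPU in `map` has set its bit, checking the mask up to
 * `max_spins` times (at least once).  Returns 1 if all CPUs in `map`
 * stopped and 0 on timeout, so the caller knows which case happened
 * instead of hanging or silently carrying on.
 */
static int
wait_for_stopped(atomic_uint *stopped, unsigned int map,
    unsigned long max_spins)
{
	unsigned long spins = 0;

	assert(map != 0);
	while ((atomic_load_explicit(stopped, memory_order_acquire) & map) !=
	    map) {
		if (++spins >= max_spins)
			return (0);	/* timed out: some CPU never responded */
	}
	return (1);
}
```

For example, after CPUs 1 and 2 have called mark_stopped(), waiting on map 0x6 returns 1 at the first check, while waiting on 0xe returns 0 once the spin limit is reached, because CPU 3 never set its bit.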

If one of multiple CPUs in kdb_trap() somehow stops the others, then the
others face different problems when they restart.  They can't just return
because debugger traps are not restartable (by just returning).  They can't
just proceed because the first CPU may have changed the state in such a way as
to make proceeding in the normal way not work (e.g., it may have deleted
a breakpoint).
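The patch that follows tries, by its author's own account incompletely, to deal with these restart problems roughly as follows. The first CPU to compare-and-set kdb_trap_lock owns ddb. A later CPU that took a breakpoint trap backs its saved EIP up by one, waits until the lock is released, and then returns, so it re-executes whatever is at the breakpoint address by then (for example, the breakpoint again if it is still installed, or the original instruction if it was removed). Below is a minimal userland model of that control flow using C11 atomics, not kernel code. TOY_NOCPU, struct toy_frame and the gate_*() and loser_prepare_restart() helpers are hypothetical stand-ins for NOCPU, the trap frame and the inline logic in kdb_trap(). The one-byte rewind assumes i386, where int3 is a one-byte instruction and its trap saves an EIP pointing just past it.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>

/* Hypothetical "no owner" marker, standing in for the kernel's NOCPU. */
#define TOY_NOCPU UINT_MAX

/* Hypothetical stand-in for the saved trap frame; only the PC matters. */
struct toy_frame {
	unsigned long eip;
};

/*
 * Try to become the one CPU allowed into the debugger.  Returns 1 if `cpu`
 * now owns the gate or already owned it (nested entry), 0 if another CPU
 * got there first.
 */
static int
gate_try_enter(atomic_uint *owner, unsigned int cpu)
{
	unsigned int expected = TOY_NOCPU;

	if (atomic_compare_exchange_strong(owner, &expected, cpu))
		return (1);
	/* On failure the CAS wrote the current owner into `expected`. */
	return (expected == cpu);
}

/* The owner leaves the debugger; release ordering publishes its changes. */
static void
gate_leave(atomic_uint *owner)
{
	atomic_store_explicit(owner, TOY_NOCPU, memory_order_release);
}

/* A losing CPU spins here until the owner has left. */
static void
gate_wait_until_free(atomic_uint *owner)
{
	while (atomic_load_explicit(owner, memory_order_acquire) != TOY_NOCPU)
		;	/* spin */
}

/*
 * Prepare a losing CPU's frame so that returning restarts cleanly.  After
 * an i386 int3 trap the saved EIP points one byte past the 1-byte int3, so
 * backing it up by one re-executes whatever is at the breakpoint address.
 * Returns 0 for other trap types, which cannot be restarted this way.
 */
static int
loser_prepare_restart(struct toy_frame *f, int is_breakpoint)
{
	if (!is_breakpoint)
		return (0);
	f->eip--;
	return (1);
}
```

With the gate free, CPU 0 enters and CPU 1 is turned away, while CPU 0 can enter again (the patch likewise lets the owning CPU through by comparing kdb_trap_lock against its own cpuid). Once gate_leave() runs, CPU 1 can enter. A loser whose breakpoint was at 0x1000 arrives with a saved EIP of 0x1001 and is rewound to 0x1000.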

These problems are not correctly or completely fixed in:

%%%
Index: db_interface.c
===
RCS file: /home/ncvs/src/sys/i386/i386/db_interface.c,v
retrieving revision 1.75
diff -u -2 -r1.75 db_interface.c
--- db_interface.c  7 Sep 2003 13:43:01 -   1.75
+++ db_interface.c  7 Oct 2003 14:11:35 -
@@ -35,4 +35,5 @@
 #include <sys/reboot.h>
 #include <sys/cons.h>
+#include <sys/ktr.h>
 #include <sys/pcpu.h>
 #include <sys/proc.h>
@@ -41,4 +42,5 @@
 #include <machine/cpu.h>
 #ifdef SMP
+#include <machine/smp.h>
 #include <machine/smptests.h>	/** CPUSTOP_ON_DDBBREAK */
 #endif
@@ -73,4 +75,31 @@
 }
 
+/* XXX this is cloned from stop_cpus() since that function can hang. */
+static int
+attempt_to_stop_cpus(u_int map)
+{
+	int i;
+
+	if (!smp_started)
+		return 0;
+
+	CTR1(KTR_SMP, "attempt_to_stop_cpus(%x)", map);
+
+	/* send the stop IPI to all CPUs in map */
+	ipi_selected(map, IPI_STOP);
+
+	i = 0;
+	while ((atomic_load_acq_int(&stopped_cpus) & map) != map) {
+		/* spin */
+		i++;
+		if (i == 1) {
+			printf("timeout stopping cpus\n");
+			break;
+		}
+	}
+
+	return 1;
+}
+
 /*
  *  kdb_trap - field a TRACE or BPT trap
@@ -81,4 +110,6 @@
 	u_int ef;
 	volatile int ddb_mode = !(boothowto & RB_GDB);
+	static u_int kdb_trap_lock = NOCPU;
+	static u_int output_lock;
 
 	/*
@@ -103,16 +134,48 @@
 
 #ifdef SMP
+	if (atomic_cmpset_int(&kdb_trap_lock, NOCPU, PCPU_GET(cpuid)) == 0 &&
+	    kdb_trap_lock != PCPU_GET(cpuid)) {
+		while (atomic_cmpset_int(&output_lock, 0, 1) == 0)
+			;
+		db_printf(
+		    "concurrent ddb entry: type %d trap, code=%x cpu=%d\n",
+		    type, code, PCPU_GET(cpuid));
+		atomic_store_rel_int(&output_lock, 0);
+		if (type == T_BPTFLT)
+			regs->tf_eip--;
+		else {
+			while (atomic_cmpset_int(&output_lock, 0, 1) == 0)
+				;
+			db_printf(
+"concurrent ddb entry on non-breakpoint: too hard to handle properly\n");
+			atomic_store_rel_int(&output_lock, 0);
+		}
+		while (atomic_load_acq_int(&kdb_trap_lock) != NOCPU)
+			;
+		write_eflags(ef);
+		return (1);
+	}
+#endif
+
+#ifdef SMP
 #ifdef CPUSTOP_ON_DDBBREAK
 
+#define VERBOSE_CPUSTOP_ON_DDBBREAK
 #if defined(VERBOSE_CPUSTOP_ON_DDBBREAK)
+	while (atomic_cmpset_int(&output_lock, 0, 1) == 0)
+		;
 	db_printf("\nCPU%d stopping CPUs: 0x%08x...", PCPU_GET(cpuid),
 	    PCPU_GET(other_cpus));
+	atomic_store_rel_int(&output_lock, 0);
 #endif /* VERBOSE_CPUSTOP_ON_DDBBREAK */
 
 	/* We stop all CPUs except ourselves (obviously) */
-

Re: Serial debug broken in recent -CURRENT?

2003-09-30 Thread Bruce Evans
On Mon, 29 Sep 2003, Greg 'groggy' Lehey wrote:

 After building a new kernel, remote serial gdb no longer works.  When
 I issue a 'continue' command, I lose control of the system, but it
 doesn't continue running.  Has anybody else seen this?

It works as well as it did a few months ago here.  (Not very well compared
with ddb.  E.g., calling a function is usually fatal.)

Bruce
___
[EMAIL PROTECTED] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Serial debug broken in recent -CURRENT?

2003-09-30 Thread Greg 'groggy' Lehey
On Tuesday, 30 September 2003 at 16:23:35 +1000, Bruce Evans wrote:
 On Mon, 29 Sep 2003, Greg 'groggy' Lehey wrote:

 After building a new kernel, remote serial gdb no longer works.  When
 I issue a 'continue' command, I lose control of the system, but it
 doesn't continue running.  Has anybody else seen this?

 It works as well as it did a few months ago here.  (Not very well compared
 with ddb.  E.g., calling a function is usually fatal.)

Hmm, that's not what Sam or I are seeing.  How old is your kernel?
You *are* able to continue, right?  Everything else works for me.

Greg
--
NOTE: Due to the currently active Microsoft-based worms, I am limiting
all incoming mail to 131,072 bytes.  This is enough for normal mail,
but not for large attachments.  Please send these as URLs.




Re: Serial debug broken in recent -CURRENT?

2003-09-30 Thread Bruce Evans
On Tue, 30 Sep 2003, Greg 'groggy' Lehey wrote:

 On Tuesday, 30 September 2003 at 16:23:35 +1000, Bruce Evans wrote:
  On Mon, 29 Sep 2003, Greg 'groggy' Lehey wrote:
 
  After building a new kernel, remote serial gdb no longer works.  When
  I issue a 'continue' command, I lose control of the system, but it
  doesn't continue running.  Has anybody else seen this?
 
  It works as well as it did a few months ago here.  (Not very well compared
  with ddb.  E.g., calling a function is usually fatal.)

 Hmm, that's not what Sam or I are seeing.  How old is your kernel?
 You *are* able to continue, right?  Everything else works for me.

I didn't test with my kernel; I tested with almost-current SMP and !SMP
kernels (almost-current = 217 lines of patches; my kernel = 96934 lines
of patches).  They were about half an hour old when I tried it.  I tested
little more than continuing from Debugger().  I didn't test using optional
foot-shooting devices like acpi or modules.

Bruce


Re: Serial debug broken in recent -CURRENT?

2003-09-30 Thread Sam Leffler
On Tuesday 30 September 2003 04:01 am, Bruce Evans wrote:
 On Tue, 30 Sep 2003, Greg 'groggy' Lehey wrote:
  On Tuesday, 30 September 2003 at 16:23:35 +1000, Bruce Evans wrote:
   On Mon, 29 Sep 2003, Greg 'groggy' Lehey wrote:
   After building a new kernel, remote serial gdb no longer works.  When
   I issue a 'continue' command, I lose control of the system, but it
   doesn't continue running.  Has anybody else seen this?
  
   It works as well as it did a few months ago here.  (Not very well
   compared with ddb.  E.g., calling a function is usually fatal.)
 
  Hmm, that's not what Sam or I are seeing.  How old is your kernel?
  You *are* able to continue, right?  Everything else works for me.

 I didn't test with my kernel; I tested with almost-current SMP and !SMP
 kernels (almost-current = 217 lines of patches; my kernel = 96934 lines
 of patches).  They were about half an hour old when I tried it.  I tested
 little more than continuing from Debugger().  I didn't test using optional
 foot-shooting devices like acpi or modules.

It reliably locks up for me when you break into a running system, set a
breakpoint, and then continue.  Machine is UP+HTT.  Haven't tried other
machines.

Sam



Re: Serial debug broken in recent -CURRENT?

2003-09-30 Thread Andrew Gallatin

Sam Leffler writes:
  It reliably locks up for me when you break into a running system, set a
  breakpoint, and then continue.  Machine is UP+HTT.  Haven't tried other
  machines.

Perhaps related, perhaps a red herring: with a single HTT-enabled P4 and an
SMP kernel, if I break into the ddb debugger on a serial console, the
machine locks solid about 1 in 4 times.

This is with a kernel from mid August.  I have been too busy / too
wimpy to upgrade past ATAng.

Drew



Re: Serial debug broken in recent -CURRENT?

2003-09-30 Thread Greg 'groggy' Lehey
On Tuesday, 30 September 2003 at 16:13:09 -0400, Andrew Gallatin wrote:

 Sam Leffler writes:
 It reliably locks up for me when you break into a running system, set a
 breakpoint, and then continue.  Machine is UP+HTT.  Haven't tried other
 machines.

 Perhaps related, perhaps a red herring: with a single HTT-enabled P4 and an
 SMP kernel, if I break into the ddb debugger on a serial console, the
 machine locks solid about 1 in 4 times.

Hmm, that's the first suggestion that it's possibly transient.  My machine
is a 2-processor Celeron 500 (obviously not HTT :-).  I get the same
results when debugging over firewire, which suggests that the problem
isn't in the serial link handling.

Greg


Serial debug broken in recent -CURRENT?

2003-09-29 Thread Greg 'groggy' Lehey
After building a new kernel, remote serial gdb no longer works.  When
I issue a 'continue' command, I lose control of the system, but it
doesn't continue running.  Has anybody else seen this?

Greg


Re: Serial debug broken in recent -CURRENT?

2003-09-29 Thread Sam Leffler
On Monday 29 September 2003 01:30 am, Greg 'groggy' Lehey wrote:
 After building a new kernel, remote serial gdb no longer works.  When
 I issue a 'continue' command, I lose control of the system, but it
 doesn't continue running.  Has anybody else seen this?

Yes, I noticed this late last week.  I think it's been busted for 1 week.  I 
tried to pinpoint the commit but ran out of time.

Sam
