Re: svn commit: r345696 - head/lib/libvgl

2019-09-03 Thread Bruce Evans

On Sun, 31 Mar 2019, Konstantin Belousov wrote:


On Sun, Mar 31, 2019 at 10:27:54PM +1100, Bruce Evans wrote:

On Sat, 30 Mar 2019, Konstantin Belousov wrote:


On Sat, Mar 30, 2019 at 03:24:40PM +1100, Bruce Evans wrote:

On Fri, 29 Mar 2019, Konstantin Belousov wrote:


On Fri, Mar 29, 2019 at 03:57:09PM +, Bruce Evans wrote:

Author: bde
Date: Fri Mar 29 15:57:08 2019
New Revision: 345696
URL: https://svnweb.freebsd.org/changeset/base/345696

Log:
  Fix endless loops for handling SIGBUS and SIGSEGV.

  r80270 has the usual wrong fix for unsafe signal handling -- just set
  a flag and return to let an event loop check the flag and do safe
  handling.  This never works for signals like SIGBUS and SIGSEGV that
  repeat and works poorly for others unless the application has an event
  loop designed to support this.

  For these signals, clean up unsafely as before, except for arranging that
  nested signals are fatal and forcing a nested signal if the cleanup doesn't
  cause one.

Modified:
  head/lib/libvgl/main.c

Modified: head/lib/libvgl/main.c
==
--- head/lib/libvgl/main.c  Fri Mar 29 15:20:48 2019(r345695)
+++ head/lib/libvgl/main.c  Fri Mar 29 15:57:08 2019(r345696)
...
@@ -107,14 +107,22 @@ struct vt_mode smode;
 }

 static void
-VGLAbort(int arg __unused)
+VGLAbort(int arg)
 {
+  sigset_t mask;
+
   VGLAbortPending = 1;
   signal(SIGINT, SIG_IGN);
   signal(SIGTERM, SIG_IGN);
-  signal(SIGSEGV, SIG_IGN);
-  signal(SIGBUS, SIG_IGN);
   signal(SIGUSR2, SIG_IGN);
+  if (arg == SIGBUS || arg == SIGSEGV) {
+signal(arg, SIG_DFL);
+sigemptyset(&mask);
+sigaddset(&mask, arg);
+sigprocmask(SIG_UNBLOCK, &mask, NULL);
+VGLEnd();
+kill(getpid(), arg);

This of course misses the siginfo information from the real fault.


It is in the nested signal frame.


Why SIGBUS/SIGSEGV are caught at all ?


Otherwise, the screen is left in an application mode that the kernel
doesn't support (except to not write to the screen, so that the
application has full control).


Also, the keyboard may be left in a strange state.  Usually this is no
worse than termios raw mode, but it may be raw scancodes.  In raw scancodes
mode, Alt-Fn to switch to another vty to fix the problem.  libvgl disables
switching to another vty without going through a libvgl handler anyway.
The combination is usually enough to break switching to vty0 to run ddb,
though that should work for at least breakpoints in the kernel.  IIRC,
sc_cngrab() switches the keyboard mode, and my version of it restores
ignoring of the anti-switch flag.


...
I am more about not doing kill(2) from the handler, instead do plain
return to the context which caused the original signal.


That won't work well for SIGBUS's and SIGSEGV's related to the library.
If the above cleanup is not done, then it is too hard to debug the problem,
and if the above cleanup is done then it is impossible to debug the original
problem if it was for and invalid access in libvgl code.

The latter is a problem for the above cleanup too -- the cleanup clobbers
the libvgl state.

However, I could do a more limited cleanup of just restoring the screen and
keyboard state using ioctls.  ioctl() isn't async-signal safe in POSIX, but
most ioctls are async-signal safe in practice provided they don't use
global variables that were set too recently or too non-atomically.
VGLEnd() uses main globals established by VGLInit().  It depends on program
order which is not guaranteed in signal handlers for testing the flags set
by initialization.  The cleanup now also does:
- screen clearing.  This breaks debugging and is dangerous if the bug is
   in screen clearing.  Screen clearing is also very slow, and belongs
   in the kernel.  It is now done 4 times per libvgl use, but its
   slowness is limited by the kernel not clearing properly and the
   kernel not allowing the application to mmap() the whole physical
   frame buffer in some versions of FreeBSD (this breaks panning).
- munmap().  This breaks debugging.
- free() of backing store buffer and the main VGLDisplay data.  This breaks
   debugging and is unnecessary if we are going to exit.


You can only try to do async-safe calls from SIGSEGV handler, to not
break a limited chances to success. Also you should not modify in-memory
state for the library, for the same reason, and then it fits your goal
of keeping the state intact for debugging. Then you should return to
re-create the original situation causing and not kill(2).


That is what I suggested in "however".  However2, it is extra work to write
a different exit path.  When running the program directly under gdb, you
also have to tell gdb to not stop at any caught signal, since the screen
will be unusable then.


I find it weird to try to debug mode-switched code without serial console.


Running gdb on a remote system via a remote shell works almost as well.
It is inconvenient to

Re: svn commit: r345696 - head/lib/libvgl

2019-09-03 Thread Bruce Evans

On Sun, 31 Mar 2019, Konstantin Belousov wrote:


On Mon, Apr 01, 2019 at 12:04:45AM +1100, Bruce Evans wrote:

Serial consoles are not always available.

Better debuggers switch the screen mode as necessary.

I recently noticed another mode switching problem.  On i386, cycling
through about 50 modes to test them all takes a small fraction of a
second for each mode switch (done directly in syscons or by a vm86
BIOS call) even when the monitor takes too long to resync (the monitor
resyncs are coalesced, so 50 resyncs take about 5 seconds instead of
250).  But on amd64, mode switches to VESA mode and back take about
20 seconds.  They are not coalesced, and most of the system is stopped
waiting for them (at least remote network access is stopped).  amd64
can't use vm86, so it emulates BIOS calls.  Why does the emulation stop
the rest of the system and take so long?


How many CPUs do you have ?


8 on the machine that shows the problem -- an underdesktop with Haswell
graphics.

A laptop with Sandybridge graphics only waits for 0.25 seconds (sys time)
I don't know if it stops the system for that long (it would be 2 blockages
for half that long).  The laptop's integrated screen also syncs instantly.
The undermydesktop has an LED screen connected by DVI-D.


x86bios.c x86bios_call() does spinlock_enter() around emulator calls.
This disables interrupts.

If you have only one CPU, the consequences are obvious. If you have
more than one, then perhaps next IPI targeted to this CPU blocks the
originator, if IPI requires ack, which typically true for most common
TLB shutdown IPIs. Then both this CPU and originator block any other CPU
trying to send an IPI.


Serial console interrupts on another CPU work to enter ddb instantly and
show the emulator stopped.

Binding drivers to CPU unimproves scheduling in general, and here it might
result in iflib waiting for the CPU with interrupts disabled, but usually
the CPU with interrupts disabled is not the one(s) bound to by iflib, and
binding the graphics program to one not used by iflib doesn't help.

Removing the spinlock_enter/exit() using ddb makes no difference.


For this reason I have to come through several hoops to not disable
interrupts for duration of EFI runtime calls, otherwise spinlocks die
due to 'spin lock held too long' on other CPUs.


I turn off the 'spin lock held too long' panics in my fixes for console
and message buffer i/o.  Printing "spin lock held too long\n" takes 120
seconds at 2 bps.  Draining 16K of buffered messages takes 81920 seconds
at 2 bps.  The main fix is to cancel the timeouts here while any console
driver is making progress (hopefully printing messages about the actual
error).  2 bps is too slow to be useful, but it gives a good stress test
and a check that the timeouts are scaled properly by the console speed.

Disabling interrupts might be FUD.  vm86 on i386 only uses critical_enter()
in addition to its sleep lock.  But this is bad for scheduling too.

Bruce


___
svn-src-head@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"


Re: svn commit: r345696 - head/lib/libvgl

2019-09-03 Thread Konstantin Belousov
On Sat, Mar 30, 2019 at 03:24:40PM +1100, Bruce Evans wrote:
> On Fri, 29 Mar 2019, Konstantin Belousov wrote:
> 
> > On Fri, Mar 29, 2019 at 03:57:09PM +, Bruce Evans wrote:
> >> Author: bde
> >> Date: Fri Mar 29 15:57:08 2019
> >> New Revision: 345696
> >> URL: https://svnweb.freebsd.org/changeset/base/345696
> >>
> >> Log:
> >>   Fix endless loops for handling SIGBUS and SIGSEGV.
> >>
> >>   r80270 has the usual wrong fix for unsafe signal handling -- just set
> >>   a flag and return to let an event loop check the flag and do safe
> >>   handling.  This never works for signals like SIGBUS and SIGSEGV that
> >>   repeat and works poorly for others unless the application has an event
> >>   loop designed to support this.
> >>
> >>   For these signals, clean up unsafely as before, except for arranging that
> >>   nested signals are fatal and forcing a nested signal if the cleanup 
> >> doesn't
> >>   cause one.
> >>
> >> Modified:
> >>   head/lib/libvgl/main.c
> >>
> >> Modified: head/lib/libvgl/main.c
> >> ==
> >> --- head/lib/libvgl/main.c Fri Mar 29 15:20:48 2019(r345695)
> >> +++ head/lib/libvgl/main.c Fri Mar 29 15:57:08 2019(r345696)
> >> ...
> >> @@ -107,14 +107,22 @@ struct vt_mode smode;
> >>  }
> >>
> >>  static void
> >> -VGLAbort(int arg __unused)
> >> +VGLAbort(int arg)
> >>  {
> >> +  sigset_t mask;
> >> +
> >>VGLAbortPending = 1;
> >>signal(SIGINT, SIG_IGN);
> >>signal(SIGTERM, SIG_IGN);
> >> -  signal(SIGSEGV, SIG_IGN);
> >> -  signal(SIGBUS, SIG_IGN);
> >>signal(SIGUSR2, SIG_IGN);
> >> +  if (arg == SIGBUS || arg == SIGSEGV) {
> >> +signal(arg, SIG_DFL);
> >> +sigemptyset(&mask);
> >> +sigaddset(&mask, arg);
> >> +sigprocmask(SIG_UNBLOCK, &mask, NULL);
> >> +VGLEnd();
> >> +kill(getpid(), arg);
> > This of course misses the siginfo information from the real fault.
> 
> It is in the nested signal frame.
> 
> > Why SIGBUS/SIGSEGV are caught at all ?
> 
> Otherwise, the screen is left in an application mode that the kernel
> doesn't support (except to not write to the screen, so that the
> application has full control).
> 
> Of course, it is bad for libraries to catch signals.  The man page warns
> about installing signal handlers and trying to call VGLEnd() and exit(3)
> [from the signal handler] to end the program, although it is important
> to use VGLEnd() to end the program.  I haven't tried installing signal
> handlers in programs, but my main application is another graphics library
> that wraps libvgl, and it installs an atexit() handler which calls its
> cleanup function which calls VGLEnd().  libvgl should do the same.  The
> cleanup is safe in exit() provided calling exit() is safe.
> 
> The man page has general nonsense recommending setting a flag and not
> terminating the program in signal handlers.  This was added in the same
> commit that completely broke the SIGBUS and SIGSEGV handling using this
> method.  A much longer lesson is needed to describe how to use this
> method correctly.  SA_RESTART must be turned off for all signals, and
> this gives the complexity of handling EINTR returns from all syscalls
> affected by SA_RESTART ...
> 
> The man page has almost enough details about which signals the library
> catches.  It doesn't mention SIGUSR1 (used for screen switches) or
> SIGUSR2 (used for mouse signals).  I plan to add use of SIGIO to fix
> busy-waiting for keyboard events.  select() might work for the keyboard,
> but I doubt that it works for the mouse.
> 
> What should happen is if the application installs signal handlers and
> does unsafe things not including calling VGLEnd(), then nested SIGBUS's
> and SIGSEGV's should reach the above cleanup.  Applications should not
> catch SIGBUS or SIGSEGV unless they can clean up better than the above.
> It is possible to clean up better than the above, but not in applications.
> The above can check a soft signal mask flag like the one set by INTOFF()
> and not do so much when it is set.  When it is not set, I think restoring
> the screen mode is safe in signal handlers, and only things like free()
> and printf() are not safe in signal handlers.

I am more about not doing kill(2) from the handler, instead do plain
return to the context which caused the original signal.


___
svn-src-head@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"


Re: svn commit: r345696 - head/lib/libvgl

2019-09-03 Thread Konstantin Belousov
On Sun, Mar 31, 2019 at 10:27:54PM +1100, Bruce Evans wrote:
> On Sat, 30 Mar 2019, Konstantin Belousov wrote:
> 
> > On Sat, Mar 30, 2019 at 03:24:40PM +1100, Bruce Evans wrote:
> >> On Fri, 29 Mar 2019, Konstantin Belousov wrote:
> >>
> >>> On Fri, Mar 29, 2019 at 03:57:09PM +, Bruce Evans wrote:
>  Author: bde
>  Date: Fri Mar 29 15:57:08 2019
>  New Revision: 345696
>  URL: https://svnweb.freebsd.org/changeset/base/345696
> 
>  Log:
>    Fix endless loops for handling SIGBUS and SIGSEGV.
> 
>    r80270 has the usual wrong fix for unsafe signal handling -- just set
>    a flag and return to let an event loop check the flag and do safe
>    handling.  This never works for signals like SIGBUS and SIGSEGV that
>    repeat and works poorly for others unless the application has an event
>    loop designed to support this.
> 
>    For these signals, clean up unsafely as before, except for arranging 
>  that
>    nested signals are fatal and forcing a nested signal if the cleanup 
>  doesn't
>    cause one.
> 
>  Modified:
>    head/lib/libvgl/main.c
> 
>  Modified: head/lib/libvgl/main.c
>  ==
>  --- head/lib/libvgl/main.c   Fri Mar 29 15:20:48 2019
>  (r345695)
>  +++ head/lib/libvgl/main.c   Fri Mar 29 15:57:08 2019
>  (r345696)
>  ...
>  @@ -107,14 +107,22 @@ struct vt_mode smode;
>   }
> 
>   static void
>  -VGLAbort(int arg __unused)
>  +VGLAbort(int arg)
>   {
>  +  sigset_t mask;
>  +
> VGLAbortPending = 1;
> signal(SIGINT, SIG_IGN);
> signal(SIGTERM, SIG_IGN);
>  -  signal(SIGSEGV, SIG_IGN);
>  -  signal(SIGBUS, SIG_IGN);
> signal(SIGUSR2, SIG_IGN);
>  +  if (arg == SIGBUS || arg == SIGSEGV) {
>  +signal(arg, SIG_DFL);
>  +sigemptyset(&mask);
>  +sigaddset(&mask, arg);
>  +sigprocmask(SIG_UNBLOCK, &mask, NULL);
>  +VGLEnd();
>  +kill(getpid(), arg);
> >>> This of course misses the siginfo information from the real fault.
> >>
> >> It is in the nested signal frame.
> >>
> >>> Why SIGBUS/SIGSEGV are caught at all ?
> >>
> >> Otherwise, the screen is left in an application mode that the kernel
> >> doesn't support (except to not write to the screen, so that the
> >> application has full control).
> 
> Also, the keyboard may be left in a strange state.  Usually this is no
> worse than termios raw mode, but it may be raw scancodes.  In raw scancodes
> mode, Alt-Fn to switch to another vty to fix the problem.  libvgl disables
> switching to another vty without going through a libvgl handler anyway.
> The combination is usually enough to break switching to vty0 to run ddb,
> though that should work for at least breakpoints in the kernel.  IIRC,
> sc_cngrab() switches the keyboard mode, and my version of it restores
> ignoring of the anti-switch flag.
> 
> > ...
> > I am more about not doing kill(2) from the handler, instead do plain
> > return to the context which caused the original signal.
> 
> That won't work well for SIGBUS's and SIGSEGV's related to the library.
> If the above cleanup is not done, then it is too hard to debug the problem,
> and if the above cleanup is done then it is impossible to debug the original
> problem if it was for and invalid access in libvgl code.
> 
> The latter is a problem for the above cleanup too -- the cleanup clobbers
> the libvgl state.
> 
> However, I could do a more limited cleanup of just restoring the screen and
> keyboard state using ioctls.  ioctl() isn't async-signal safe in POSIX, but
> most ioctls are async-signal safe in practice provided they don't use
> global variables that were set too recently or too non-atomically.
> VGLEnd() uses main globals established by VGLInit().  It depends on program
> order which is not guaranteed in signal handlers for testing the flags set
> by initialization.  The cleanup now also does:
> - screen clearing.  This breaks debugging and is dangerous if the bug is
>in screen clearing.  Screen clearing is also very slow, and belongs
>in the kernel.  It is now done 4 times per libvgl use, but its
>slowness is limited by the kernel not clearing properly and the
>kernel not allowing the application to mmap() the whole physical
>frame buffer in some versions of FreeBSD (this breaks panning).
> - munmap().  This breaks debugging.
> - free() of backing store buffer and the main VGLDisplay data.  This breaks
>debugging and is unnecessary if we are going to exit.

You can only try to do async-safe calls from SIGSEGV handler, to not
break a limited chances to success. Also you should not modify in-memory
state for the library, for the same reason, and then it fits your goal
of keeping the state intact for debugging. Then you should return to
re-create the original situa

Re: svn commit: r345696 - head/lib/libvgl

2019-09-03 Thread Konstantin Belousov
On Mon, Apr 01, 2019 at 12:04:45AM +1100, Bruce Evans wrote:
> Serial consoles are not always available.
> 
> Better debuggers switch the screen mode as necessary.
> 
> I recently noticed another mode switching problem.  On i386, cycling
> through about 50 modes to test them all takes a small fraction of a
> second for each mode switch (done directly in syscons or by a vm86
> BIOS call) even when the monitor takes too long to resync (the monitor
> resyncs are coalesced, so 50 resyncs take about 5 seconds instead of
> 250).  But on amd64, mode switches to VESA mode and back take about
> 20 seconds.  They are not coalesced, and most of the system is stopped
> waiting for them (at least remote network access is stopped).  amd64
> can't use vm86, so it emulates BIOS calls.  Why does the emulation stop
> the rest of the system and take so long?

How many CPUs do you have ?

x86bios.c x86bios_call() does spinlock_enter() around emulator calls.
This disables interrupts.

If you have only one CPU, the consequences are obvious. If you have
more than one, then perhaps next IPI targeted to this CPU blocks the
originator, if IPI requires ack, which typically true for most common
TLB shutdown IPIs. Then both this CPU and originator block any other CPU
trying to send an IPI.

For this reason I have to come through several hoops to not disable
interrupts for duration of EFI runtime calls, otherwise spinlocks die
due to 'spin lock held too long' on other CPUs.


___
svn-src-head@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"


Re: svn commit: r345696 - head/lib/libvgl

2019-09-03 Thread Bruce Evans

On Sat, 30 Mar 2019, Konstantin Belousov wrote:


On Sat, Mar 30, 2019 at 03:24:40PM +1100, Bruce Evans wrote:

On Fri, 29 Mar 2019, Konstantin Belousov wrote:


On Fri, Mar 29, 2019 at 03:57:09PM +, Bruce Evans wrote:

Author: bde
Date: Fri Mar 29 15:57:08 2019
New Revision: 345696
URL: https://svnweb.freebsd.org/changeset/base/345696

Log:
  Fix endless loops for handling SIGBUS and SIGSEGV.

  r80270 has the usual wrong fix for unsafe signal handling -- just set
  a flag and return to let an event loop check the flag and do safe
  handling.  This never works for signals like SIGBUS and SIGSEGV that
  repeat and works poorly for others unless the application has an event
  loop designed to support this.

  For these signals, clean up unsafely as before, except for arranging that
  nested signals are fatal and forcing a nested signal if the cleanup doesn't
  cause one.

Modified:
  head/lib/libvgl/main.c

Modified: head/lib/libvgl/main.c
==
--- head/lib/libvgl/main.c  Fri Mar 29 15:20:48 2019(r345695)
+++ head/lib/libvgl/main.c  Fri Mar 29 15:57:08 2019(r345696)
...
@@ -107,14 +107,22 @@ struct vt_mode smode;
 }

 static void
-VGLAbort(int arg __unused)
+VGLAbort(int arg)
 {
+  sigset_t mask;
+
   VGLAbortPending = 1;
   signal(SIGINT, SIG_IGN);
   signal(SIGTERM, SIG_IGN);
-  signal(SIGSEGV, SIG_IGN);
-  signal(SIGBUS, SIG_IGN);
   signal(SIGUSR2, SIG_IGN);
+  if (arg == SIGBUS || arg == SIGSEGV) {
+signal(arg, SIG_DFL);
+sigemptyset(&mask);
+sigaddset(&mask, arg);
+sigprocmask(SIG_UNBLOCK, &mask, NULL);
+VGLEnd();
+kill(getpid(), arg);

This of course misses the siginfo information from the real fault.


It is in the nested signal frame.


Why SIGBUS/SIGSEGV are caught at all ?


Otherwise, the screen is left in an application mode that the kernel
doesn't support (except to not write to the screen, so that the
application has full control).


Also, the keyboard may be left in a strange state.  Usually this is no
worse than termios raw mode, but it may be raw scancodes.  In raw scancodes
mode, Alt-Fn to switch to another vty to fix the problem.  libvgl disables
switching to another vty without going through a libvgl handler anyway.
The combination is usually enough to break switching to vty0 to run ddb,
though that should work for at least breakpoints in the kernel.  IIRC,
sc_cngrab() switches the keyboard mode, and my version of it restores
ignoring of the anti-switch flag.


...
I am more about not doing kill(2) from the handler, instead do plain
return to the context which caused the original signal.


That won't work well for SIGBUS's and SIGSEGV's related to the library.
If the above cleanup is not done, then it is too hard to debug the problem,
and if the above cleanup is done then it is impossible to debug the original
problem if it was for and invalid access in libvgl code.

The latter is a problem for the above cleanup too -- the cleanup clobbers
the libvgl state.

However, I could do a more limited cleanup of just restoring the screen and
keyboard state using ioctls.  ioctl() isn't async-signal safe in POSIX, but
most ioctls are async-signal safe in practice provided they don't use
global variables that were set too recently or too non-atomically.
VGLEnd() uses main globals established by VGLInit().  It depends on program
order which is not guaranteed in signal handlers for testing the flags set
by initialization.  The cleanup now also does:
- screen clearing.  This breaks debugging and is dangerous if the bug is
  in screen clearing.  Screen clearing is also very slow, and belongs
  in the kernel.  It is now done 4 times per libvgl use, but its
  slowness is limited by the kernel not clearing properly and the
  kernel not allowing the application to mmap() the whole physical
  frame buffer in some versions of FreeBSD (this breaks panning).
- munmap().  This breaks debugging.
- free() of backing store buffer and the main VGLDisplay data.  This breaks
  debugging and is unnecessary if we are going to exit.

Bruce


___
svn-src-head@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"


svn commit: r345696 - head/lib/libvgl

2019-09-03 Thread Bruce Evans
Author: bde
Date: Fri Mar 29 15:57:08 2019
New Revision: 345696
URL: https://svnweb.freebsd.org/changeset/base/345696

Log:
  Fix endless loops for handling SIGBUS and SIGSEGV.
  
  r80270 has the usual wrong fix for unsafe signal handling -- just set
  a flag and return to let an event loop check the flag and do safe
  handling.  This never works for signals like SIGBUS and SIGSEGV that
  repeat and works poorly for others unless the application has an event
  loop designed to support this.
  
  For these signals, clean up unsafely as before, except for arranging that
  nested signals are fatal and forcing a nested signal if the cleanup doesn't
  cause one.

Modified:
  head/lib/libvgl/main.c

Modified: head/lib/libvgl/main.c
==
--- head/lib/libvgl/main.c  Fri Mar 29 15:20:48 2019(r345695)
+++ head/lib/libvgl/main.c  Fri Mar 29 15:57:08 2019(r345696)
@@ -31,9 +31,9 @@
 #include 
 __FBSDID("$FreeBSD$");
 
+#include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -107,14 +107,22 @@ struct vt_mode smode;
 }
 
 static void 
-VGLAbort(int arg __unused)
+VGLAbort(int arg)
 {
+  sigset_t mask;
+
   VGLAbortPending = 1;
   signal(SIGINT, SIG_IGN);
   signal(SIGTERM, SIG_IGN);
-  signal(SIGSEGV, SIG_IGN);
-  signal(SIGBUS, SIG_IGN);
   signal(SIGUSR2, SIG_IGN);
+  if (arg == SIGBUS || arg == SIGSEGV) {
+signal(arg, SIG_DFL);
+sigemptyset(&mask);
+sigaddset(&mask, arg);
+sigprocmask(SIG_UNBLOCK, &mask, NULL);
+VGLEnd();
+kill(getpid(), arg);
+  }
 }
 
 static void


___
svn-src-head@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"


Re: svn commit: r345696 - head/lib/libvgl

2019-09-03 Thread Bruce Evans

On Fri, 29 Mar 2019, Konstantin Belousov wrote:


On Fri, Mar 29, 2019 at 03:57:09PM +, Bruce Evans wrote:

Author: bde
Date: Fri Mar 29 15:57:08 2019
New Revision: 345696
URL: https://svnweb.freebsd.org/changeset/base/345696

Log:
  Fix endless loops for handling SIGBUS and SIGSEGV.

  r80270 has the usual wrong fix for unsafe signal handling -- just set
  a flag and return to let an event loop check the flag and do safe
  handling.  This never works for signals like SIGBUS and SIGSEGV that
  repeat and works poorly for others unless the application has an event
  loop designed to support this.

  For these signals, clean up unsafely as before, except for arranging that
  nested signals are fatal and forcing a nested signal if the cleanup doesn't
  cause one.

Modified:
  head/lib/libvgl/main.c

Modified: head/lib/libvgl/main.c
==
--- head/lib/libvgl/main.c  Fri Mar 29 15:20:48 2019(r345695)
+++ head/lib/libvgl/main.c  Fri Mar 29 15:57:08 2019(r345696)
...
@@ -107,14 +107,22 @@ struct vt_mode smode;
 }

 static void
-VGLAbort(int arg __unused)
+VGLAbort(int arg)
 {
+  sigset_t mask;
+
   VGLAbortPending = 1;
   signal(SIGINT, SIG_IGN);
   signal(SIGTERM, SIG_IGN);
-  signal(SIGSEGV, SIG_IGN);
-  signal(SIGBUS, SIG_IGN);
   signal(SIGUSR2, SIG_IGN);
+  if (arg == SIGBUS || arg == SIGSEGV) {
+signal(arg, SIG_DFL);
+sigemptyset(&mask);
+sigaddset(&mask, arg);
+sigprocmask(SIG_UNBLOCK, &mask, NULL);
+VGLEnd();
+kill(getpid(), arg);

This of course misses the siginfo information from the real fault.


It is in the nested signal frame.


Why SIGBUS/SIGSEGV are caught at all ?


Otherwise, the screen is left in an application mode that the kernel
doesn't support (except to not write to the screen, so that the
application has full control).

Of course, it is bad for libraries to catch signals.  The man page warns
about installing signal handlers and trying to call VGLEnd() and exit(3)
[from the signal handler] to end the program, although it is important
to use VGLEnd() to end the program.  I haven't tried installing signal
handlers in programs, but my main application is another graphics library
that wraps libvgl, and it installs an atexit() handler which calls its
cleanup function which calls VGLEnd().  libvgl should do the same.  The
cleanup is safe in exit() provided calling exit() is safe.

The man page has general nonsense recommending setting a flag and not
terminating the program in signal handlers.  This was added in the same
commit that completely broke the SIGBUS and SIGSEGV handling using this
method.  A much longer lesson is needed to describe how to use this
method correctly.  SA_RESTART must be turned off for all signals, and
this gives the complexity of handling EINTR returns from all syscalls
affected by SA_RESTART ...

The man page has almost enough details about which signals the library
catches.  It doesn't mention SIGUSR1 (used for screen switches) or
SIGUSR2 (used for mouse signals).  I plan to add use of SIGIO to fix
busy-waiting for keyboard events.  select() might work for the keyboard,
but I doubt that it works for the mouse.

What should happen is if the application installs signal handlers and
does unsafe things not including calling VGLEnd(), then nested SIGBUS's
and SIGSEGV's should reach the above cleanup.  Applications should not
catch SIGBUS or SIGSEGV unless they can clean up better than the above.
It is possible to clean up better than the above, but not in applications.
The above can check a soft signal mask flag like the one set by INTOFF()
and not do so much when it is set.  When it is not set, I think restoring
the screen mode is safe in signal handlers, and only things like free()
and printf() are not safe in signal handlers.




+  }
 }

 static void




Bruce


___
svn-src-head@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"


Re: svn commit: r345696 - head/lib/libvgl

2019-09-03 Thread Konstantin Belousov
On Fri, Mar 29, 2019 at 03:57:09PM +, Bruce Evans wrote:
> Author: bde
> Date: Fri Mar 29 15:57:08 2019
> New Revision: 345696
> URL: https://svnweb.freebsd.org/changeset/base/345696
> 
> Log:
>   Fix endless loops for handling SIGBUS and SIGSEGV.
>   
>   r80270 has the usual wrong fix for unsafe signal handling -- just set
>   a flag and return to let an event loop check the flag and do safe
>   handling.  This never works for signals like SIGBUS and SIGSEGV that
>   repeat and works poorly for others unless the application has an event
>   loop designed to support this.
>   
>   For these signals, clean up unsafely as before, except for arranging that
>   nested signals are fatal and forcing a nested signal if the cleanup doesn't
>   cause one.
> 
> Modified:
>   head/lib/libvgl/main.c
> 
> Modified: head/lib/libvgl/main.c
> ==
> --- head/lib/libvgl/main.cFri Mar 29 15:20:48 2019(r345695)
> +++ head/lib/libvgl/main.cFri Mar 29 15:57:08 2019(r345696)
> @@ -31,9 +31,9 @@
>  #include 
>  __FBSDID("$FreeBSD$");
>  
> +#include 
>  #include 
>  #include 
> -#include 
>  #include 
>  #include 
>  #include 
> @@ -107,14 +107,22 @@ struct vt_mode smode;
>  }
>  
>  static void 
> -VGLAbort(int arg __unused)
> +VGLAbort(int arg)
>  {
> +  sigset_t mask;
> +
>VGLAbortPending = 1;
>signal(SIGINT, SIG_IGN);
>signal(SIGTERM, SIG_IGN);
> -  signal(SIGSEGV, SIG_IGN);
> -  signal(SIGBUS, SIG_IGN);
>signal(SIGUSR2, SIG_IGN);
> +  if (arg == SIGBUS || arg == SIGSEGV) {
> +signal(arg, SIG_DFL);
> +sigemptyset(&mask);
> +sigaddset(&mask, arg);
> +sigprocmask(SIG_UNBLOCK, &mask, NULL);
> +VGLEnd();
> +kill(getpid(), arg);
This of course misses the siginfo information from the real fault.
Why SIGBUS/SIGSEGV are caught at all ?

> +  }
>  }
>  
>  static void


___
svn-src-head@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"


Re: svn commit: r345696 - head/lib/libvgl

2019-04-01 Thread Bruce Evans

On Sun, 31 Mar 2019, Konstantin Belousov wrote:


On Mon, Apr 01, 2019 at 12:04:45AM +1100, Bruce Evans wrote:

Serial consoles are not always available.

Better debuggers switch the screen mode as necessary.

I recently noticed another mode switching problem.  On i386, cycling
through about 50 modes to test them all takes a small fraction of a
second for each mode switch (done directly in syscons or by a vm86
BIOS call) even when the monitor takes too long to resync (the monitor
resyncs are coalesced, so 50 resyncs take about 5 seconds instead of
250).  But on amd64, mode switches to VESA mode and back take about
20 seconds.  They are not coalesced, and most of the system is stopped
waiting for them (at least remote network access is stopped).  amd64
can't use vm86, so it emulates BIOS calls.  Why does the emulation stop
the rest of the system and take so long?


How many CPUs do you have ?


8 on the machine that shows the problem -- an underdesktop with Haswell
graphics.

A laptop with Sandybridge graphics only waits for 0.25 seconds (sys time)
I don't know if it stops the system for that long (it would be 2 blockages
for half that long).  The laptop's integrated screen also syncs instantly.
The undermydesktop has an LED screen connected by DVI-D.


x86bios.c x86bios_call() does spinlock_enter() around emulator calls.
This disables interrupts.

If you have only one CPU, the consequences are obvious. If you have
more than one, then perhaps next IPI targeted to this CPU blocks the
originator, if IPI requires ack, which typically true for most common
TLB shutdown IPIs. Then both this CPU and originator block any other CPU
trying to send an IPI.


Serial console interrupts on another CPU work to enter ddb instantly and
show the emulator stopped.

Binding drivers to CPU unimproves scheduling in general, and here it might
result in iflib waiting for the CPU with interrupts disabled, but usually
the CPU with interrupts disabled is not the one(s) bound to by iflib, and
binding the graphics program to one not used by iflib doesn't help.

Removing the spinlock_enter/exit() using ddb makes no difference.


For this reason I have to come through several hoops to not disable
interrupts for duration of EFI runtime calls, otherwise spinlocks die
due to 'spin lock held too long' on other CPUs.


I turn off the 'spin lock held too long' panics in my fixes for console
and message buffer i/o.  Printing "spin lock held too long\n" takes 120
seconds at 2 bps.  Draining 16K of buffered messages takes 81920 seconds
at 2 bps.  The main fix is to cancel the timeouts here while any console
driver is making progress (hopefully printing messages about the actual
error).  2 bps is too slow to be useful, but it gives a good stress test
and a check that the timeouts are scaled properly by the console speed.

Disabling interrupts might be FUD.  vm86 on i386 only uses critical_enter()
in addition to its sleep lock.  But this is bad for scheduling too.

Bruce
___
svn-src-head@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"


Re: svn commit: r345696 - head/lib/libvgl

2019-03-31 Thread Konstantin Belousov
On Mon, Apr 01, 2019 at 12:04:45AM +1100, Bruce Evans wrote:
> Serial consoles are not always available.
> 
> Better debuggers switch the screen mode as necessary.
> 
> I recently noticed another mode switching problem.  On i386, cycling
> through about 50 modes to test them all takes a small fraction of a
> second for each mode switch (done directly in syscons or by a vm86
> BIOS call) even when the monitor takes too long to resync (the monitor
> resyncs are coalesced, so 50 resyncs take about 5 seconds instead of
> 250).  But on amd64, mode switches to VESA mode and back take about
> 20 seconds.  They are not coalesced, and most of the system is stopped
> waiting for them (at least remote network access is stopped).  amd64
> can't use vm86, so it emulates BIOS calls.  Why does the emulation stop
> the rest of the system and take so long?

How many CPUs do you have ?

x86bios.c x86bios_call() does spinlock_enter() around emulator calls.
This disables interrupts.

If you have only one CPU, the consequences are obvious. If you have
more than one, then perhaps next IPI targeted to this CPU blocks the
originator, if IPI requires ack, which typically true for most common
TLB shutdown IPIs. Then both this CPU and originator block any other CPU
trying to send an IPI.

For this reason I have to come through several hoops to not disable
interrupts for duration of EFI runtime calls, otherwise spinlocks die
due to 'spin lock held too long' on other CPUs.
___
svn-src-head@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"


Re: svn commit: r345696 - head/lib/libvgl

2019-03-31 Thread Konstantin Belousov
On Sun, Mar 31, 2019 at 10:27:54PM +1100, Bruce Evans wrote:
> On Sat, 30 Mar 2019, Konstantin Belousov wrote:
> 
> > On Sat, Mar 30, 2019 at 03:24:40PM +1100, Bruce Evans wrote:
> >> On Fri, 29 Mar 2019, Konstantin Belousov wrote:
> >>
> >>> On Fri, Mar 29, 2019 at 03:57:09PM +, Bruce Evans wrote:
>  Author: bde
>  Date: Fri Mar 29 15:57:08 2019
>  New Revision: 345696
>  URL: https://svnweb.freebsd.org/changeset/base/345696
> 
>  Log:
>    Fix endless loops for handling SIGBUS and SIGSEGV.
> 
>    r80270 has the usual wrong fix for unsafe signal handling -- just set
>    a flag and return to let an event loop check the flag and do safe
>    handling.  This never works for signals like SIGBUS and SIGSEGV that
>    repeat and works poorly for others unless the application has an event
>    loop designed to support this.
> 
>    For these signals, clean up unsafely as before, except for arranging 
>  that
>    nested signals are fatal and forcing a nested signal if the cleanup 
>  doesn't
>    cause one.
> 
>  Modified:
>    head/lib/libvgl/main.c
> 
>  Modified: head/lib/libvgl/main.c
>  ==
>  --- head/lib/libvgl/main.c   Fri Mar 29 15:20:48 2019
>  (r345695)
>  +++ head/lib/libvgl/main.c   Fri Mar 29 15:57:08 2019
>  (r345696)
>  ...
>  @@ -107,14 +107,22 @@ struct vt_mode smode;
>   }
> 
>   static void
>  -VGLAbort(int arg __unused)
>  +VGLAbort(int arg)
>   {
>  +  sigset_t mask;
>  +
> VGLAbortPending = 1;
> signal(SIGINT, SIG_IGN);
> signal(SIGTERM, SIG_IGN);
>  -  signal(SIGSEGV, SIG_IGN);
>  -  signal(SIGBUS, SIG_IGN);
> signal(SIGUSR2, SIG_IGN);
>  +  if (arg == SIGBUS || arg == SIGSEGV) {
>  +signal(arg, SIG_DFL);
>  +sigemptyset(&mask);
>  +sigaddset(&mask, arg);
>  +sigprocmask(SIG_UNBLOCK, &mask, NULL);
>  +VGLEnd();
>  +kill(getpid(), arg);
> >>> This of course misses the siginfo information from the real fault.
> >>
> >> It is in the nested signal frame.
> >>
> >>> Why SIGBUS/SIGSEGV are caught at all ?
> >>
> >> Otherwise, the screen is left in an application mode that the kernel
> >> doesn't support (except to not write to the screen, so that the
> >> application has full control).
> 
> Also, the keyboard may be left in a strange state.  Usually this is no
> worse than termios raw mode, but it may be raw scancodes.  In raw scancodes
> mode, Alt-Fn to switch to another vty to fix the problem.  libvgl disables
> switching to another vty without going through a libvgl handler anyway.
> The combination is usually enough to break switching to vty0 to run ddb,
> though that should work for at least breakpoints in the kernel.  IIRC,
> sc_cngrab() switches the keyboard mode, and my version of it restores
> ignoring of the anti-switch flag.
> 
> > ...
> > I am more about not doing kill(2) from the handler, instead do plain
> > return to the context which caused the original signal.
> 
> That won't work well for SIGBUS's and SIGSEGV's related to the library.
> If the above cleanup is not done, then it is too hard to debug the problem,
> and if the above cleanup is done then it is impossible to debug the original
> problem if it was for and invalid access in libvgl code.
> 
> The latter is a problem for the above cleanup too -- the cleanup clobbers
> the libvgl state.
> 
> However, I could do a more limited cleanup of just restoring the screen and
> keyboard state using ioctls.  ioctl() isn't async-signal safe in POSIX, but
> most ioctls are async-signal safe in practice provided they don't use
> global variables that were set too recently or too non-atomically.
> VGLEnd() uses main globals established by VGLInit().  It depends on program
> order which is not guaranteed in signal handlers for testing the flags set
> by initialization.  The cleanup now also does:
> - screen clearing.  This breaks debugging and is dangerous if the bug is
>in screen clearing.  Screen clearing is also very slow, and belongs
>in the kernel.  It is now done 4 times per libvgl use, but its
>slowness is limited by the kernel not clearing properly and the
>kernel not allowing the application to mmap() the whole physical
>frame buffer in some versions of FreeBSD (this breaks panning).
> - munmap().  This breaks debugging.
> - free() of backing store buffer and the main VGLDisplay data.  This breaks
>debugging and is unnecessary if we are going to exit.

You can only try to do async-safe calls from SIGSEGV handler, to not
break a limited chances to success. Also you should not modify in-memory
state for the library, for the same reason, and then it fits your goal
of keeping the state intact for debugging. Then you should return to
re-create the original situa

Re: svn commit: r345696 - head/lib/libvgl

2019-03-31 Thread Bruce Evans

On Sun, 31 Mar 2019, Konstantin Belousov wrote:


On Sun, Mar 31, 2019 at 10:27:54PM +1100, Bruce Evans wrote:

On Sat, 30 Mar 2019, Konstantin Belousov wrote:


On Sat, Mar 30, 2019 at 03:24:40PM +1100, Bruce Evans wrote:

On Fri, 29 Mar 2019, Konstantin Belousov wrote:


On Fri, Mar 29, 2019 at 03:57:09PM +, Bruce Evans wrote:

Author: bde
Date: Fri Mar 29 15:57:08 2019
New Revision: 345696
URL: https://svnweb.freebsd.org/changeset/base/345696

Log:
  Fix endless loops for handling SIGBUS and SIGSEGV.

  r80270 has the usual wrong fix for unsafe signal handling -- just set
  a flag and return to let an event loop check the flag and do safe
  handling.  This never works for signals like SIGBUS and SIGSEGV that
  repeat and works poorly for others unless the application has an event
  loop designed to support this.

  For these signals, clean up unsafely as before, except for arranging that
  nested signals are fatal and forcing a nested signal if the cleanup doesn't
  cause one.

Modified:
  head/lib/libvgl/main.c

Modified: head/lib/libvgl/main.c
==
--- head/lib/libvgl/main.c  Fri Mar 29 15:20:48 2019(r345695)
+++ head/lib/libvgl/main.c  Fri Mar 29 15:57:08 2019(r345696)
...
@@ -107,14 +107,22 @@ struct vt_mode smode;
 }

 static void
-VGLAbort(int arg __unused)
+VGLAbort(int arg)
 {
+  sigset_t mask;
+
   VGLAbortPending = 1;
   signal(SIGINT, SIG_IGN);
   signal(SIGTERM, SIG_IGN);
-  signal(SIGSEGV, SIG_IGN);
-  signal(SIGBUS, SIG_IGN);
   signal(SIGUSR2, SIG_IGN);
+  if (arg == SIGBUS || arg == SIGSEGV) {
+signal(arg, SIG_DFL);
+sigemptyset(&mask);
+sigaddset(&mask, arg);
+sigprocmask(SIG_UNBLOCK, &mask, NULL);
+VGLEnd();
+kill(getpid(), arg);

This of course misses the siginfo information from the real fault.


It is in the nested signal frame.


Why SIGBUS/SIGSEGV are caught at all ?


Otherwise, the screen is left in an application mode that the kernel
doesn't support (except to not write to the screen, so that the
application has full control).


Also, the keyboard may be left in a strange state.  Usually this is no
worse than termios raw mode, but it may be raw scancodes.  In raw scancodes
mode, Alt-Fn to switch to another vty to fix the problem.  libvgl disables
switching to another vty without going through a libvgl handler anyway.
The combination is usually enough to break switching to vty0 to run ddb,
though that should work for at least breakpoints in the kernel.  IIRC,
sc_cngrab() switches the keyboard mode, and my version of it restores
ignoring of the anti-switch flag.


...
I am more about not doing kill(2) from the handler, instead do plain
return to the context which caused the original signal.


That won't work well for SIGBUS's and SIGSEGV's related to the library.
If the above cleanup is not done, then it is too hard to debug the problem,
and if the above cleanup is done then it is impossible to debug the original
problem if it was for and invalid access in libvgl code.

The latter is a problem for the above cleanup too -- the cleanup clobbers
the libvgl state.

However, I could do a more limited cleanup of just restoring the screen and
keyboard state using ioctls.  ioctl() isn't async-signal safe in POSIX, but
most ioctls are async-signal safe in practice provided they don't use
global variables that were set too recently or too non-atomically.
VGLEnd() uses main globals established by VGLInit().  It depends on program
order which is not guaranteed in signal handlers for testing the flags set
by initialization.  The cleanup now also does:
- screen clearing.  This breaks debugging and is dangerous if the bug is
   in screen clearing.  Screen clearing is also very slow, and belongs
   in the kernel.  It is now done 4 times per libvgl use, but its
   slowness is limited by the kernel not clearing properly and the
   kernel not allowing the application to mmap() the whole physical
   frame buffer in some versions of FreeBSD (this breaks panning).
- munmap().  This breaks debugging.
- free() of backing store buffer and the main VGLDisplay data.  This breaks
   debugging and is unnecessary if we are going to exit.


You can only try to do async-safe calls from SIGSEGV handler, to not
break a limited chances to success. Also you should not modify in-memory
state for the library, for the same reason, and then it fits your goal
of keeping the state intact for debugging. Then you should return to
re-create the original situation causing and not kill(2).


That is what I suggested in "however".  However2, it is extra work to write
a different exit path.  When running the program directly under gdb, you
also have to tell gdb to not stop at any caught signal, since the screen
will be unusable then.


I find it weird to try to debug mode-switched code without serial console.


Running gdb on a remote system via a remote shell works almost as well.
It is inconvenient to

Re: svn commit: r345696 - head/lib/libvgl

2019-03-31 Thread Bruce Evans

On Sat, 30 Mar 2019, Konstantin Belousov wrote:


On Sat, Mar 30, 2019 at 03:24:40PM +1100, Bruce Evans wrote:

On Fri, 29 Mar 2019, Konstantin Belousov wrote:


On Fri, Mar 29, 2019 at 03:57:09PM +, Bruce Evans wrote:

Author: bde
Date: Fri Mar 29 15:57:08 2019
New Revision: 345696
URL: https://svnweb.freebsd.org/changeset/base/345696

Log:
  Fix endless loops for handling SIGBUS and SIGSEGV.

  r80270 has the usual wrong fix for unsafe signal handling -- just set
  a flag and return to let an event loop check the flag and do safe
  handling.  This never works for signals like SIGBUS and SIGSEGV that
  repeat and works poorly for others unless the application has an event
  loop designed to support this.

  For these signals, clean up unsafely as before, except for arranging that
  nested signals are fatal and forcing a nested signal if the cleanup doesn't
  cause one.

Modified:
  head/lib/libvgl/main.c

Modified: head/lib/libvgl/main.c
==
--- head/lib/libvgl/main.c  Fri Mar 29 15:20:48 2019(r345695)
+++ head/lib/libvgl/main.c  Fri Mar 29 15:57:08 2019(r345696)
...
@@ -107,14 +107,22 @@ struct vt_mode smode;
 }

 static void
-VGLAbort(int arg __unused)
+VGLAbort(int arg)
 {
+  sigset_t mask;
+
   VGLAbortPending = 1;
   signal(SIGINT, SIG_IGN);
   signal(SIGTERM, SIG_IGN);
-  signal(SIGSEGV, SIG_IGN);
-  signal(SIGBUS, SIG_IGN);
   signal(SIGUSR2, SIG_IGN);
+  if (arg == SIGBUS || arg == SIGSEGV) {
+signal(arg, SIG_DFL);
+sigemptyset(&mask);
+sigaddset(&mask, arg);
+sigprocmask(SIG_UNBLOCK, &mask, NULL);
+VGLEnd();
+kill(getpid(), arg);

This of course misses the siginfo information from the real fault.


It is in the nested signal frame.


Why SIGBUS/SIGSEGV are caught at all ?


Otherwise, the screen is left in an application mode that the kernel
doesn't support (except to not write to the screen, so that the
application has full control).


Also, the keyboard may be left in a strange state.  Usually this is no
worse than termios raw mode, but it may be raw scancodes.  In raw scancodes
mode, Alt-Fn to switch to another vty to fix the problem.  libvgl disables
switching to another vty without going through a libvgl handler anyway.
The combination is usually enough to break switching to vty0 to run ddb,
though that should work for at least breakpoints in the kernel.  IIRC,
sc_cngrab() switches the keyboard mode, and my version of it restores
ignoring of the anti-switch flag.


...
I am more about not doing kill(2) from the handler, instead do plain
return to the context which caused the original signal.


That won't work well for SIGBUS's and SIGSEGV's related to the library.
If the above cleanup is not done, then it is too hard to debug the problem,
and if the above cleanup is done then it is impossible to debug the original
problem if it was for and invalid access in libvgl code.

The latter is a problem for the above cleanup too -- the cleanup clobbers
the libvgl state.

However, I could do a more limited cleanup of just restoring the screen and
keyboard state using ioctls.  ioctl() isn't async-signal safe in POSIX, but
most ioctls are async-signal safe in practice provided they don't use
global variables that were set too recently or too non-atomically.
VGLEnd() uses main globals established by VGLInit().  It depends on program
order which is not guaranteed in signal handlers for testing the flags set
by initialization.  The cleanup now also does:
- screen clearing.  This breaks debugging and is dangerous if the bug is
  in screen clearing.  Screen clearing is also very slow, and belongs
  in the kernel.  It is now done 4 times per libvgl use, but its
  slowness is limited by the kernel not clearing properly and the
  kernel not allowing the application to mmap() the whole physical
  frame buffer in some versions of FreeBSD (this breaks panning).
- munmap().  This breaks debugging.
- free() of backing store buffer and the main VGLDisplay data.  This breaks
  debugging and is unnecessary if we are going to exit.

Bruce
___
svn-src-head@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"


Re: svn commit: r345696 - head/lib/libvgl

2019-03-30 Thread Konstantin Belousov
On Sat, Mar 30, 2019 at 03:24:40PM +1100, Bruce Evans wrote:
> On Fri, 29 Mar 2019, Konstantin Belousov wrote:
> 
> > On Fri, Mar 29, 2019 at 03:57:09PM +, Bruce Evans wrote:
> >> Author: bde
> >> Date: Fri Mar 29 15:57:08 2019
> >> New Revision: 345696
> >> URL: https://svnweb.freebsd.org/changeset/base/345696
> >>
> >> Log:
> >>   Fix endless loops for handling SIGBUS and SIGSEGV.
> >>
> >>   r80270 has the usual wrong fix for unsafe signal handling -- just set
> >>   a flag and return to let an event loop check the flag and do safe
> >>   handling.  This never works for signals like SIGBUS and SIGSEGV that
> >>   repeat and works poorly for others unless the application has an event
> >>   loop designed to support this.
> >>
> >>   For these signals, clean up unsafely as before, except for arranging that
> >>   nested signals are fatal and forcing a nested signal if the cleanup 
> >> doesn't
> >>   cause one.
> >>
> >> Modified:
> >>   head/lib/libvgl/main.c
> >>
> >> Modified: head/lib/libvgl/main.c
> >> ==
> >> --- head/lib/libvgl/main.c Fri Mar 29 15:20:48 2019(r345695)
> >> +++ head/lib/libvgl/main.c Fri Mar 29 15:57:08 2019(r345696)
> >> ...
> >> @@ -107,14 +107,22 @@ struct vt_mode smode;
> >>  }
> >>
> >>  static void
> >> -VGLAbort(int arg __unused)
> >> +VGLAbort(int arg)
> >>  {
> >> +  sigset_t mask;
> >> +
> >>VGLAbortPending = 1;
> >>signal(SIGINT, SIG_IGN);
> >>signal(SIGTERM, SIG_IGN);
> >> -  signal(SIGSEGV, SIG_IGN);
> >> -  signal(SIGBUS, SIG_IGN);
> >>signal(SIGUSR2, SIG_IGN);
> >> +  if (arg == SIGBUS || arg == SIGSEGV) {
> >> +signal(arg, SIG_DFL);
> >> +sigemptyset(&mask);
> >> +sigaddset(&mask, arg);
> >> +sigprocmask(SIG_UNBLOCK, &mask, NULL);
> >> +VGLEnd();
> >> +kill(getpid(), arg);
> > This of course misses the siginfo information from the real fault.
> 
> It is in the nested signal frame.
> 
> > Why SIGBUS/SIGSEGV are caught at all ?
> 
> Otherwise, the screen is left in an application mode that the kernel
> doesn't support (except to not write to the screen, so that the
> application has full control).
> 
> Of course, it is bad for libraries to catch signals.  The man page warns
> about installing signal handlers and trying to call VGLEnd() and exit(3)
> [from the signal handler] to end the program, although it is important
> to use VGLEnd() to end the program.  I haven't tried installing signal
> handlers in programs, but my main application is another graphics library
> that wraps libvgl, and it installs an atexit() handler which calls its
> cleanup function which calls VGLEnd().  libvgl should do the same.  The
> cleanup is safe in exit() provided calling exit() is safe.
> 
> The man page has general nonsense recommending setting a flag and not
> terminating the program in signal handlers.  This was added in the same
> commit that completely broke the SIGBUS and SIGSEGV handling using this
> method.  A much longer lesson is needed to describe how to use this
> method correctly.  SA_RESTART must be turned off for all signals, and
> this gives the complexity of handling EINTR returns from all syscalls
> affected by SA_RESTART ...
> 
> The man page has almost enough details about which signals the library
> catches.  It doesn't mention SIGUSR1 (used for screen switches) or
> SIGUSR2 (used for mouse signals).  I plan to add use of SIGIO to fix
> busy-waiting for keyboard events.  select() might work for the keyboard,
> but I doubt that it works for the mouse.
> 
> What should happen is if the application installs signal handlers and
> does unsafe things not including calling VGLEnd(), then nested SIGBUS's
> and SIGSEGV's should reach the above cleanup.  Applications should not
> catch SIGBUS or SIGSEGV unless they can clean up better than the above.
> It is possible to clean up better than the above, but not in applications.
> The above can check a soft signal mask flag like the one set by INTOFF()
> and not do so much when it is set.  When it is not set, I think restoring
> the screen mode is safe in signal handlers, and only things like free()
> and printf() are not safe in signal handlers.

I am more about not doing kill(2) from the handler, instead do plain
return to the context which caused the original signal.
___
svn-src-head@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"


Re: svn commit: r345696 - head/lib/libvgl

2019-03-29 Thread Bruce Evans

On Fri, 29 Mar 2019, Konstantin Belousov wrote:


On Fri, Mar 29, 2019 at 03:57:09PM +, Bruce Evans wrote:

Author: bde
Date: Fri Mar 29 15:57:08 2019
New Revision: 345696
URL: https://svnweb.freebsd.org/changeset/base/345696

Log:
  Fix endless loops for handling SIGBUS and SIGSEGV.

  r80270 has the usual wrong fix for unsafe signal handling -- just set
  a flag and return to let an event loop check the flag and do safe
  handling.  This never works for signals like SIGBUS and SIGSEGV that
  repeat and works poorly for others unless the application has an event
  loop designed to support this.

  For these signals, clean up unsafely as before, except for arranging that
  nested signals are fatal and forcing a nested signal if the cleanup doesn't
  cause one.

Modified:
  head/lib/libvgl/main.c

Modified: head/lib/libvgl/main.c
==
--- head/lib/libvgl/main.c  Fri Mar 29 15:20:48 2019(r345695)
+++ head/lib/libvgl/main.c  Fri Mar 29 15:57:08 2019(r345696)
...
@@ -107,14 +107,22 @@ struct vt_mode smode;
 }

 static void
-VGLAbort(int arg __unused)
+VGLAbort(int arg)
 {
+  sigset_t mask;
+
   VGLAbortPending = 1;
   signal(SIGINT, SIG_IGN);
   signal(SIGTERM, SIG_IGN);
-  signal(SIGSEGV, SIG_IGN);
-  signal(SIGBUS, SIG_IGN);
   signal(SIGUSR2, SIG_IGN);
+  if (arg == SIGBUS || arg == SIGSEGV) {
+signal(arg, SIG_DFL);
+sigemptyset(&mask);
+sigaddset(&mask, arg);
+sigprocmask(SIG_UNBLOCK, &mask, NULL);
+VGLEnd();
+kill(getpid(), arg);

This of course misses the siginfo information from the real fault.


It is in the nested signal frame.


Why SIGBUS/SIGSEGV are caught at all ?


Otherwise, the screen is left in an application mode that the kernel
doesn't support (except to not write to the screen, so that the
application has full control).

Of course, it is bad for libraries to catch signals.  The man page warns
about installing signal handlers and trying to call VGLEnd() and exit(3)
[from the signal handler] to end the program, although it is important
to use VGLEnd() to end the program.  I haven't tried installing signal
handlers in programs, but my main application is another graphics library
that wraps libvgl, and it installs an atexit() handler which calls its
cleanup function which calls VGLEnd().  libvgl should do the same.  The
cleanup is safe in exit() provided calling exit() is safe.

The man page has general nonsense recommending setting a flag and not
terminating the program in signal handlers.  This was added in the same
commit that completely broke the SIGBUS and SIGSEGV handling using this
method.  A much longer lesson is needed to describe how to use this
method correctly.  SA_RESTART must be turned off for all signals, and
this gives the complexity of handling EINTR returns from all syscalls
affected by SA_RESTART ...

The man page has almost enough details about which signals the library
catches.  It doesn't mention SIGUSR1 (used for screen switches) or
SIGUSR2 (used for mouse signals).  I plan to add use of SIGIO to fix
busy-waiting for keyboard events.  select() might work for the keyboard,
but I doubt that it works for the mouse.

What should happen is if the application installs signal handlers and
does unsafe things not including calling VGLEnd(), then nested SIGBUS's
and SIGSEGV's should reach the above cleanup.  Applications should not
catch SIGBUS or SIGSEGV unless they can clean up better than the above.
It is possible to clean up better than the above, but not in applications.
The above can check a soft signal mask flag like the one set by INTOFF()
and not do so much when it is set.  When it is not set, I think restoring
the screen mode is safe in signal handlers, and only things like free()
and printf() are not safe in signal handlers.




+  }
 }

 static void




Bruce
___
svn-src-head@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"


Re: svn commit: r345696 - head/lib/libvgl

2019-03-29 Thread Konstantin Belousov
On Fri, Mar 29, 2019 at 03:57:09PM +, Bruce Evans wrote:
> Author: bde
> Date: Fri Mar 29 15:57:08 2019
> New Revision: 345696
> URL: https://svnweb.freebsd.org/changeset/base/345696
> 
> Log:
>   Fix endless loops for handling SIGBUS and SIGSEGV.
>   
>   r80270 has the usual wrong fix for unsafe signal handling -- just set
>   a flag and return to let an event loop check the flag and do safe
>   handling.  This never works for signals like SIGBUS and SIGSEGV that
>   repeat and works poorly for others unless the application has an event
>   loop designed to support this.
>   
>   For these signals, clean up unsafely as before, except for arranging that
>   nested signals are fatal and forcing a nested signal if the cleanup doesn't
>   cause one.
> 
> Modified:
>   head/lib/libvgl/main.c
> 
> Modified: head/lib/libvgl/main.c
> ==
> --- head/lib/libvgl/main.cFri Mar 29 15:20:48 2019(r345695)
> +++ head/lib/libvgl/main.cFri Mar 29 15:57:08 2019(r345696)
> @@ -31,9 +31,9 @@
>  #include 
>  __FBSDID("$FreeBSD$");
>  
> +#include 
>  #include 
>  #include 
> -#include 
>  #include 
>  #include 
>  #include 
> @@ -107,14 +107,22 @@ struct vt_mode smode;
>  }
>  
>  static void 
> -VGLAbort(int arg __unused)
> +VGLAbort(int arg)
>  {
> +  sigset_t mask;
> +
>VGLAbortPending = 1;
>signal(SIGINT, SIG_IGN);
>signal(SIGTERM, SIG_IGN);
> -  signal(SIGSEGV, SIG_IGN);
> -  signal(SIGBUS, SIG_IGN);
>signal(SIGUSR2, SIG_IGN);
> +  if (arg == SIGBUS || arg == SIGSEGV) {
> +signal(arg, SIG_DFL);
> +sigemptyset(&mask);
> +sigaddset(&mask, arg);
> +sigprocmask(SIG_UNBLOCK, &mask, NULL);
> +VGLEnd();
> +kill(getpid(), arg);
This of course misses the siginfo information from the real fault.
Why SIGBUS/SIGSEGV are caught at all ?

> +  }
>  }
>  
>  static void
___
svn-src-head@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"


svn commit: r345696 - head/lib/libvgl

2019-03-29 Thread Bruce Evans
Author: bde
Date: Fri Mar 29 15:57:08 2019
New Revision: 345696
URL: https://svnweb.freebsd.org/changeset/base/345696

Log:
  Fix endless loops for handling SIGBUS and SIGSEGV.
  
  r80270 has the usual wrong fix for unsafe signal handling -- just set
  a flag and return to let an event loop check the flag and do safe
  handling.  This never works for signals like SIGBUS and SIGSEGV that
  repeat and works poorly for others unless the application has an event
  loop designed to support this.
  
  For these signals, clean up unsafely as before, except for arranging that
  nested signals are fatal and forcing a nested signal if the cleanup doesn't
  cause one.

Modified:
  head/lib/libvgl/main.c

Modified: head/lib/libvgl/main.c
==
--- head/lib/libvgl/main.c  Fri Mar 29 15:20:48 2019(r345695)
+++ head/lib/libvgl/main.c  Fri Mar 29 15:57:08 2019(r345696)
@@ -31,9 +31,9 @@
 #include 
 __FBSDID("$FreeBSD$");
 
+#include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -107,14 +107,22 @@ struct vt_mode smode;
 }
 
 static void 
-VGLAbort(int arg __unused)
+VGLAbort(int arg)
 {
+  sigset_t mask;
+
   VGLAbortPending = 1;
   signal(SIGINT, SIG_IGN);
   signal(SIGTERM, SIG_IGN);
-  signal(SIGSEGV, SIG_IGN);
-  signal(SIGBUS, SIG_IGN);
   signal(SIGUSR2, SIG_IGN);
+  if (arg == SIGBUS || arg == SIGSEGV) {
+signal(arg, SIG_DFL);
+sigemptyset(&mask);
+sigaddset(&mask, arg);
+sigprocmask(SIG_UNBLOCK, &mask, NULL);
+VGLEnd();
+kill(getpid(), arg);
+  }
 }
 
 static void
___
svn-src-head@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"