skipping locks, mutex_owned, usb

2011-08-23 Thread Andriy Gapon

Yes, the subject sounds quite hairy, so please let me try to explain it.
First, let's consider one concrete function:

static int
ukbd_poll(keyboard_t *kbd, int on)
{
struct ukbd_softc *sc = kbd-kb_data;

if (!mtx_owned(Giant)) {
/* XXX cludge */
int retval;
mtx_lock(Giant);
retval = ukbd_poll(kbd, on);
mtx_unlock(Giant);
return (retval);
}

if (on) {
sc-sc_flags |= UKBD_FLAG_POLLING;
sc-sc_poll_thread = curthread;
} else {
sc-sc_flags = ~UKBD_FLAG_POLLING;
ukbd_start_timer(sc);   /* start timer */
}
return (0);
}

This XXX cludge [sic] pattern is scattered around a few functions in the ukbd
code and perhaps other usb code:
func()
{
if (!mtx_owned(Giant)) {
mtx_lock(Giant);
func();
mtx_unlock(Giant);
return;
}

// etc ...
}

I have a few question directly and indirectly related to this pattern.

I.  [Why] do we need this pattern?
Can the code be re-written in a smarter (if not to say proper) way?

II.  ukbd_poll() in particular can be called from the kdb context (kdb_trap -
db_trap - db_command_loop - etc).  What would happen if the Giant is held by a
thread on some other CPU (which would be hard-stopped by kdp_trap)?  Won't we 
get
a lockup here?
So shouldn't this code actually check for kdb_active?

III.  With my stop_scheduler_on_panic patch ukbd_poll() produces infinite chains
of 'infinite' recursion because mtx_owned() always returns false.  This is 
because
I skip all lock/unlock/etc operations in the post-panic context.  I think that
it's a good philosophical question: what operations like mtx_owned(),
mtx_recursed(), mtx_trylock() 'should' return when we actually act as if no 
locks
exist at all?

My first knee-jerk reaction was to change mtx_owned() to return true in this
lock-less context, but _hypothetically_ there could exist some symmetric code
that does something like:
func()
{
if (mtx_owned(Giant)) {
mtx_unlock(Giant);
func();
mtx_lock(Giant);
return;
}

// etc ...

What do you think about this problem?
Should we re-write ukbd_poll() (and possibly usb code) or should mutex_owned() 
be
adjusted?


That question III actually brings another thought: perhaps the whole of idea of
skipping locks in a certain context is not a correct direction?
Perhaps, instead we should identify all the code that could be legitimately
executed in the after-panic and/or kdb contexts and make that could explicitly
aware of its execution context.  That is, instead of trying to make _lock,
_unlock, _owned, _trylock, etc do the right thing auto-magically, we should try 
to
make the calling code check panicstr and kdb_active and do the right thing on 
that
level (which would include not invoking those lock-related operations or other
inappropriate operations).

Thank you very much in advance for your insights and help!
-- 
Andriy Gapon
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: skipping locks, mutex_owned, usb

2011-08-23 Thread Hans Petter Selasky
On Tuesday 23 August 2011 14:09:15 Andriy Gapon wrote:
 Yes, the subject sounds quite hairy, so please let me try to explain it.
 First, let's consider one concrete function:
 
 static int
 ukbd_poll(keyboard_t *kbd, int on)
 {
 struct ukbd_softc *sc = kbd-kb_data;
 
 if (!mtx_owned(Giant)) {
 /* XXX cludge */
 int retval;
 mtx_lock(Giant);
 retval = ukbd_poll(kbd, on);
 mtx_unlock(Giant);
 return (retval);
 }
 
 if (on) {
 sc-sc_flags |= UKBD_FLAG_POLLING;
 sc-sc_poll_thread = curthread;
 } else {
 sc-sc_flags = ~UKBD_FLAG_POLLING;
 ukbd_start_timer(sc);   /* start timer */
 }
 return (0);
 }
 
 This XXX cludge [sic] pattern is scattered around a few functions in the
 ukbd code and perhaps other usb code:
 func()
 {
   if (!mtx_owned(Giant)) {
   mtx_lock(Giant);
 func();
 mtx_unlock(Giant);
   return;
   }
 
   // etc ...
 }
 
 I have a few question directly and indirectly related to this pattern.

Hi Andriy,

Thanks for bringing up this topic.

 
 I.  [Why] do we need this pattern?
 Can the code be re-written in a smarter (if not to say proper) way?

Unless that API's surrounding ukbd change, we are stuck with auto-magic Giant 
locking inside ukbd. The USB stack itself does not require that Giant is used 
for the USB keyboard driver. It is some times since I touched this topic. From 
what I remember some interfacing layers of ukbd are using Giant. Some are not.

Much of the existing non-USB keyboard code is written to get/put key-presses 
directly through legacy PCI/ISA registers. This approach is deferred in the 
USB stack, hence it would cause blocking delays of 1ms per polling operation.

Another corner case:

When you are scrolling the console having scroll lock locked, then the first 
printf on that console will directly call an IOCTL in the ukbd to switch off 
the scroll lock led. As you might be aware this can be dangerous. The ukbd is 
also sometimes printing stuff. The chance of deadlock and LOR is increasing.

Without having to push for multiple changes outside ukbd, this problem is 
simply hidden in UKBD by checking if Giant is locked, which is required for 
starting the USB transfers in UKBD. I think it would be a very bad idea to 
lock another lock in a sub-function of printf. See!

A multithread / taskqueue interface is needed for postponing keyboard events.

And what about polling and UKBD? This needs a separate calling path I think.

 
 II.  ukbd_poll() in particular can be called from the kdb context (kdb_trap
 - db_trap - db_command_loop - etc).  What would happen if the Giant is
 held by a thread on some other CPU (which would be hard-stopped by
 kdp_trap)?  Won't we get a lockup here?
 So shouldn't this code actually check for kdb_active?
 
 III.  With my stop_scheduler_on_panic patch ukbd_poll() produces infinite
 chains of 'infinite' recursion because mtx_owned() always returns false. 
 This is because I skip all lock/unlock/etc operations in the post-panic
 context.  I think that it's a good philosophical question: what operations
 like mtx_owned(), mtx_recursed(), mtx_trylock() 'should' return when we
 actually act as if no locks exist at all?
 
 My first knee-jerk reaction was to change mtx_owned() to return true in
 this lock-less context, but _hypothetically_ there could exist some
 symmetric code that does something like:
 func()
 {
   if (mtx_owned(Giant)) {
   mtx_unlock(Giant);
 func();
 mtx_lock(Giant);
   return;
   }
 
   // etc ...
 
 What do you think about this problem?
 Should we re-write ukbd_poll() (and possibly usb code) or should
 mutex_owned() be adjusted?

I think you've brought an interesing problem. I would suggest to change 
mtx_owned or make a new function to take an integer value to handle the case 
of SCHEDULER_STOPPED:

mtx_owned_default(struct mtx *, int return value in case of 
SCHEDULER_STOPPED);

 
 That question III actually brings another thought: perhaps the whole of
 idea of skipping locks in a certain context is not a correct direction?
 Perhaps, instead we should identify all the code that could be legitimately
 executed in the after-panic and/or kdb contexts and make that could
 explicitly aware of its execution context.  That is, instead of trying to
 make _lock, _unlock, _owned, _trylock, etc do the right thing
 auto-magically, we should try to make the calling code check panicstr and
 kdb_active and do the right thing on that level (which would include not
 invoking those lock-related operations or other inappropriate operations).

 Thank you very much in advance for your insights and help!

--HPS
___
freebsd-current@freebsd.org mailing list