Bug#479438: Bug#504063: Bug#511708: aptitude: [etch upgrade] TUI consistently blocks after doing one set of operations

2009-01-28 Thread Eddy Petrișor
2009/1/28 Daniel Burrows :
>  I'd like to thank everyone who replied on these bugs over the last
> couple years.  Especially the people who prodded me into trying out
> kernel 2.6.18-6.  I've tracked down the bug, or at least I think maybe
> I have.  There is definitely a bug here, but it seems to be failing
> even in cases where it "ought to" work; I suspect that 2.6.18-6 is also
> a bit broken.

[..]

>  So, the long and the short of it is: if we just lock the mutex when
> we actually need it, that is, when we're about to send a get_input_event
> to the main thread, there's no danger of being canceled and leaving the
> mutex locked.  I've attached a patch that does this; on my computer, it
> eliminates the hang.  I'll ask the release managers for permission to
> apply this patch in Lenny.

I can confirm this patch fixes the problem for me.


-- 
Regards,
EddyP
=
"Imagination is more important than knowledge" A.Einstein



-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org



Bug#479438: Bug#504063: Bug#511708: aptitude: [etch upgrade] TUI consistently blocks after doing one set of operations

2009-01-27 Thread Daniel Burrows
  I'd like to thank everyone who replied on these bugs over the last
couple years.  Especially the people who prodded me into trying out
kernel 2.6.18-6.  I've tracked down the bug, or at least I think maybe
I have.  There is definitely a bug here, but it seems to be failing
even in cases where it "ought to" work; I suspect that 2.6.18-6 is also
a bit broken.

  The problem stems from the fact that the input thread, for its entire
execution, holds a lock on a mutex.  The purpose of this mutex is to
guard a condition variable and an associated member variable of the
input thread, both of which are used to synchronize the input thread and
the foreground code that actually reads from the keyboard.  Reading
input in the foreground was introduced in the patch that introduced this
bug (previously we tried to read in the background; it turns out that
curses updates the display when you read keyboard input, so this led to
simultaneous calls to refresh() and corrupted output).

  Unfortunately, the threads wrapper used in cwidget does not provide
any way to push thread cleanup handlers for mutex locks, because the
authors of the POSIX threads spec cunningly made it impossible to embed
these in an RAII C++ object due to their use of a macro.  You probably
see where this is going...

  Because of the flakiness of pthread_cancel(), cwidget mostly avoids
it.  However, it is used in a few places, including when we shut down
the input thread.  IIRC, this is done because it's necessary to
interrupt a select(), and cancel() is the main tool at our disposal to
do that.  The problem is that, in the words of pthread_cleanup_push(3):

...if a thread exits OR IS CANCELLED while it owns a locked mutex,
   the mutex will remain locked forever and prevent other threads
   from executing normally.  (emphasis mine)

  Recall that the only reason this mutex is locked is that we need it
for the condition variable.  But if you look at the code of the input
thread, you'll see that the condition variable is only accessed in one
sub-clause of that code.  In fact, of the three cancellation points in
that function (counting select(), which isn't one on Linux, and the
bracketing pthread_testcancel() calls, which are), only one of them
occurs in the area where the mutex actually has to be locked, and that's
the line where we actually wait on the condition variable.  And it turns
out that wait() *DOES* handle cleanup from cancellation properly.

  One unexplained question is why the bug hit even when aptitude
suspended in response to a keystroke.  When aptitude was waiting for
a keystroke, the input thread would be blocked in select(), so no
cleanup handlers were installed and the scenario above makes sense.
But when aptitude was waiting for a keystroke, I thought that the
input thread would still be blocked waiting on the input event
condition, and as I noted earlier, that should install a proper cleanup
handler.  Another question is why we didn't see this bug in 2.6.26.
I can't think of anything, except that maybe the implementation of
mutexes changed somehow so that they get unlocked when a thread is
canceled.

  So, the long and the short of it is: if we just lock the mutex when
we actually need it, that is, when we're about to send a get_input_event
to the main thread, there's no danger of being canceled and leaving the
mutex locked.  I've attached a patch that does this; on my computer, it
eliminates the hang.  I'll ask the release managers for permission to
apply this patch in Lenny.

  Now I just need to reboot into 2.6.26 so that my wireless works and I
can actually send this mail.

  Daniel
diff --git a/src/cwidget/toplevel.cc b/src/cwidget/toplevel.cc
index c912ad6..e0b6ab2 100644
--- a/src/cwidget/toplevel.cc
+++ b/src/cwidget/toplevel.cc
@@ -1,6 +1,6 @@
 // toplevel.cc
 //
-//  Copyright 1999-2005, 2007-2008 Daniel Burrows
+//  Copyright 1999-2005, 2007-2009 Daniel Burrows
 //
 //  This program is free software; you can redistribute it and/or modify
 //  it under the terms of the GNU General Public License as published by
@@ -446,7 +446,6 @@ namespace cwidget
 
   void operator()()
   {
-	threads::mutex::lock l(input_event_mutex);
 	input_event_fired = false;
 
 	// Important note: this routine only blocks indefinitely in
@@ -480,6 +479,7 @@ namespace cwidget
 	  }
 	else
 	  {
+		threads::mutex::lock l(input_event_mutex);
 		post_event(new get_input_event(input_event_mutex,
 	   input_event_fired,
 	   input_event_condition));


Bug#479438: Bug#504063: Bug#511708: aptitude: [etch upgrade] TUI consistently blocks after doing one set of operations

2009-01-27 Thread Daniel Burrows
On Tue, Jan 27, 2009 at 09:46:22PM +0200, Eddy Petrișor 
 was heard to say:
> 2009/1/27 Daniel Burrows :
> >  This is for everyone who can reproduce the bug.
> >
> >  Could you build a libcwidget3 with the attached patch and see what's
> > produced in /tmp/cwidget.input.log when you reproduce the bug?  If I
> > start aptitude, run dpkg once, and exit, I get this:
> >
> > Starting input thread
> > Creating new input thread
> > Input thread created
> > Stopping input thread
> > Destroying existing input thread
> > Input thread destroyed
> > Starting input thread
> > Creating new input thread
> > Input thread created
> > Stopping input thread
> > Destroying existing input thread
> > Input thread destroyed
> 
> 
> Starting input thread
> Creating new input thread
> Input thread created
> Stopping input thread
> Destroying existing input thread
> Input thread destroyed
> Starting input thread
> Creating new input thread
> Input thread created

  Thanks.

  That's more interesting than I expected!  I was assuming that the
input thread wasn't restarting properly, but it looks like it is.  So
why aren't we responding to events after that?  We're responding to
some of them, or the display wouldn't redraw properly; only keyboard
events are being dropped.

  I'll send you another patch with some ad-hoc logging in a bit.

  Daniel



-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org



Bug#479438: Bug#504063: Bug#511708: aptitude: [etch upgrade] TUI consistently blocks after doing one set of operations

2009-01-27 Thread Eddy Petrișor
2009/1/27 Daniel Burrows :
>  This is for everyone who can reproduce the bug.
>
>  Could you build a libcwidget3 with the attached patch and see what's
> produced in /tmp/cwidget.input.log when you reproduce the bug?  If I
> start aptitude, run dpkg once, and exit, I get this:
>
> Starting input thread
> Creating new input thread
> Input thread created
> Stopping input thread
> Destroying existing input thread
> Input thread destroyed
> Starting input thread
> Creating new input thread
> Input thread created
> Stopping input thread
> Destroying existing input thread
> Input thread destroyed


Starting input thread
Creating new input thread
Input thread created
Stopping input thread
Destroying existing input thread
Input thread destroyed
Starting input thread
Creating new input thread
Input thread created


-- 
Regards,
EddyP
=
"Imagination is more important than knowledge" A.Einstein



-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org



Bug#479438: Bug#504063: Bug#511708: aptitude: [etch upgrade] TUI consistently blocks after doing one set of operations

2009-01-27 Thread Eddy Petrișor
2009/1/27 Rich Griffiths :
> Eddy Petrișor wrote:
>>
>> 
>> microwaverich, could you test once more, just to be sure, that under
>> each one of these setups aptitude blocks:
>>
>> 1) etch kernel (2.6.18-6-k7 - if you no longer have it, you can grab
>> it from http://packages.debian.org/etch/linux-image-2.6.18-6-k7)
>>official aptitude from lenny (0.4.11.11-1~lenny1)
>>
>> 2) lenny kernel (2.6.26 -
>> http://packages.debian.org/lenny/linux-image-2.6.26-1-686)
>>official aptitude from lenny (0.4.11.11-1~lenny1)
>>
>> 
>
> Eddy:
>
> I still have both kernels installed, so this was easy.

Thanks for testing.

> I rebooted to 2.6.18-6-k7, switched to console 2, and invoked aptitude.
>
> In the ncurses aptitude, I searched for xsoldier, hit '+' to select it, and
> 'g' twice to install.  After completing the install, aptitude did the usual
> thing with the cache and looked like it had returned to command mode, but it
> was frozen.  It would not respond to the keyboard, so I used ctrl-C to kill
> it.  The keystrokes I had entered while aptitude was frozen were then
> waiting there at the root '#' command prompt.

As expected.

> I rebooted to 2.6.26-1-686 and went through a similar procedure, except I
> used '-' to deselect xsoldier.  Aptitude worked fine, and exited normally
> using 'q', as it has the last few times I've used it.

OK, at least we know that the 2.6.26-1-686 kernel is not a good
candidate if one wants to catch the bug.

> I'm 99% sure (nothing is 100%) that this (behaving properly under the -686
> kernel) is not what was happening when I posted the problem on
> linux.debian.user.   I keep a written log of everything I do on the machine,
> and I noted the freeze-up problem even after switching from the k7 to 686
> kernel.

Weird.

> But it ain't like that no more!  I don't know what has changed.  Perhaps I
> did make a mistake somewhere along the way.

It is possible that maybe something else changed in the environment
and the bug is no longer triggered.

> I'm happy that the problem is 'fixed' for me.  But if I can still be of
> help, please feel free to ask.  I'm pleased to contribute in any way I can.

Thanks, your help is welcome.

-- 
Regards,
EddyP
=
"Imagination is more important than knowledge" A.Einstein



--
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org