Re: Weird rawhide desktop behavior

2012-03-26 Thread Adam Jackson
On Sat, 2012-03-24 at 11:58 -0600, Jonathan Corbet wrote:
 Here's a strange pathology that just bit me for the first time in a while,
 though I've seen it before.  I'm not sure where to file a bug on this
 one...

There are several levels of "X locked up" pathology; let's see if I can
shed some light here.  (For bonus points, someone who wanted to add this
kind of info to the wiki would be Way Cool.)

 In short: I'll be working away, minding my own business, when the desktop
 goes completely dead - no response to any key or mouse events.  That said,
 the X server is still running; the pointer still moves with the mouse.  I
 can also switch to another virtual console with alt-ctrl-Fn.  Sometimes
 things start working again after some time (measured in minutes);
 sometimes I lose patience and start over.  Today I went and made lunch and
 it never came back.

The pointer position (but not image) updates during a SIGIO handler if
you have hardware cursors enabled [1].  How do you know if you have
hardware cursors?  Short answer is, you do, unless you're running a dumb
driver like vesa/fbdev/modesetting.

So, class 1 lockup here is "I can't move the cursor", and boy are you in
trouble.  For KMS drivers this usually means X is waiting on a blocking
DRM ioctl; ps will show X in D state, and /proc/$(pidof Xorg)/wchan will
show you somewhere in ioctl land.  This is always a video driver bug,
and you will typically see something in dmesg when this happens.  Don't
bother trying to get an xserver backtrace here, ptrace can't attach to
D-state processes.
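
The class 1 check described above (process state plus wait channel) can
be scripted. What follows is a minimal sketch in Python, not part of the
original message: the function names are made up for illustration, and
it assumes a Linux /proc layout where /proc/<pid>/stat carries the state
letter after the parenthesised command name and /proc/<pid>/wchan may or
may not be readable on a given kernel.

```python
from pathlib import Path


def parse_state(stat_text: str) -> str:
    """Return the one-letter state field from the text of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or parentheses, so split after the last ')'.
    after_comm = stat_text.rsplit(")", 1)[1]
    return after_comm.split()[0]


def describe_process(pid: int, proc_root: str = "/proc") -> dict:
    """Collect the state and wait channel for one process (hypothetical helper)."""
    base = Path(proc_root) / str(pid)
    state = parse_state((base / "stat").read_text())
    try:
        wchan = (base / "wchan").read_text().strip()
    except OSError:
        wchan = ""  # not readable, or not provided by this kernel
    return {
        "pid": pid,
        "state": state,
        "wchan": wchan,
        # "D" is uninterruptible sleep, the state described for class 1
        "uninterruptible": state == "D",
    }
```

Calling describe_process() with the X server's PID (from pidof Xorg, run
over ssh or from another console) and seeing "uninterruptible": True
would be consistent with the class 1 case.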

Class 2 lockup is "I can move the cursor, but the image never changes":
if you mouse over a text entry field it doesn't change to the vertical
bar, and over a resize grip it doesn't change to a resize indicator.
Here, the X server is stuck somewhere away from the main
loop, but at least isn't stuck in the kernel.  gdb on X will work, and
will probably tell you where you're stuck.  This class is usually a
userspace bug, could be either the driver or the server.
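
For the class 2 case, a non-interactive gdb run is usually enough to see
where the server is stuck. The sketch below only builds the command
line; the helper name is hypothetical, and it assumes gdb and the
relevant debuginfo are installed. It should be run from somewhere other
than the stuck X session (an ssh login, for instance), since a server
that is not reaching its main loop will probably not process a VT
switch either.

```python
def gdb_backtrace_command(pid: int) -> list:
    """Build a gdb invocation that dumps all thread stacks and exits."""
    return [
        "gdb",
        "-p", str(pid),                # attach to the running process
        "-batch",                      # exit once the commands have run
        "-ex", "thread apply all bt",  # backtrace of every thread
    ]
```

The resulting list can be handed to subprocess.run() as root, with the
X server's PID substituted for the argument.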

Class 3 lockup is "I can move the cursor and it behaves normally, but I
can't type".  In this scenario X _is_ successfully going around its main
loop.  If you can VT switch, this is you; VT switch processing happens
while draining the event queue, which is driven off the main loop.  This
scenario has an outside chance of being an xserver bug, but typically
this is the server dutifully doing what clients have told it to do:
something takes a grab, and then deadlocks.  Sorry about X11, we keep
trying to get rid of it for a reason.
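
The three classes above amount to a small decision procedure. Here is a
minimal sketch of it in Python; the function name and the three yes/no
observations are illustrative labels, not anything defined by the X
server, and the returned hints only paraphrase the descriptions above.

```python
def classify_lockup(cursor_moves: bool, cursor_image_changes: bool,
                    keyboard_works: bool) -> tuple:
    """Map what the user observes to one of the three lockup classes.

    Returns (class_number, hint); class_number is 0 when nothing matches.
    """
    if not cursor_moves:
        return (1, "X is likely blocked in the kernel; check dmesg and "
                   "the process state, and suspect the video driver")
    if not cursor_image_changes:
        return (2, "X is stuck outside its main loop in userspace; "
                   "attach gdb to the server")
    if not keyboard_works:
        return (3, "X is running its main loop; look for a client "
                   "holding a grab")
    return (0, "none of the three lockup classes matches")
```

For the symptoms in the original report (pointer moves, VT switch
works, no keyboard response), the result would be class 3, provided the
cursor image also changes as it normally does.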

Class 3 here one could debug more readily if you had some of the
debugging key combos wired up in XKB:

http://cgit.freedesktop.org/xorg/xserver/commit/?id=7d2543a3cb3089241982ce4f8984fd723d5312a1

Sadly GNOME does not yet have UI for this, and I don't remember how to
drive setxkbmap to add them.  Note that the Ungrab and CloseGrab combos
allow you to defeat screensaver locking - i.e., they are security holes -
which is why they're not enabled by default.  You don't want to use them
anyway if you're debugging; you want PrintGrabs so you can then go
inspect the grabbing process to see why it's deadlocked.

 I've tried killing off applications to see if somebody has some sort of
 all-inclusive grab, but I can't find the right one if that's the case.  I
 can kill something like Firefox and verify that the process is gone, but
 the Firefox window remains on-screen when I return to X.

This is significant.  It means the compositor isn't repainting.  So
either:

a) the compositor isn't the client with the stuck grab, or
b) the compositor's internal grab logic is broken.

[1] - Why position but not image?  Because on most hardware position is
just one register to poke, but image updates require an image upload,
which isn't safe to do if the driver is in the middle of some other
accelerated rendering.  Why only for hardware cursor?  Because software
cursor rendering only caches the pixels behind the cursor on motion,
which means you could race with normal rendering.  Both of these you
could fix if you were willing to take much more of a mutex overhead than
you're probably okay with.

- ajax


-- 
test mailing list
test@lists.fedoraproject.org
To unsubscribe:
https://admin.fedoraproject.org/mailman/listinfo/test

Re: Weird rawhide desktop behavior

2012-03-25 Thread Adam Williamson
On Sat, 2012-03-24 at 21:09 +, Jóhann B. Guðmundsson wrote:
 On 03/24/2012 08:34 PM, Jonathan Corbet wrote:
  On Sat, 24 Mar 2012 12:23:26 -0700
  Adam Williamson awill...@redhat.com wrote:
 
  Jonathan, Chuck - if you try holding down a key that ought to do
  something for half a second instead of just pressing it, does it work?
  I'll try that next time the problem hits.  I don't have any real way to
  provoke it now, though, so I don't know when that will be...stay tuned.
 
 The best way to deal with this is to follow [1] then file a bug against 
 gnome-shell and attach the relevant output along with .xsession-errors

If it's a Shell bug. Mine isn't.
-- 
Adam Williamson
Fedora QA Community Monkey
IRC: adamw | Twitter: AdamW_Fedora | identi.ca: adamwfedora
http://www.happyassassin.net


Re: Weird rawhide desktop behavior

2012-03-25 Thread Adam Williamson
On Sat, 2012-03-24 at 19:32 -0700, Chuck Forsberg WA7KGX N2469R wrote:
 X wedged on me again while I was running Gnome.
 I rebooted and started Xfce.  It wedged after some minutes.
 
 I then installed the current Nvidia proprietary driver.  The computer
 has not wedged for a few hours.  This suggests the problem lies
 with the default X server for my GTX 460SE ... or an interaction
 between my karma and the ozone layer.

Oh, another possible cause if you're on GNOME 3.3.90 is a memory leak in
Shell. This was fixed in 3.3.92. So, one thing to try is updating to
3.3.92.


Re: Weird rawhide desktop behavior

2012-03-25 Thread stan
On Sat, 24 Mar 2012 11:58:15 -0600
Jonathan Corbet corbet...@lwn.net wrote:

 Anybody got a clue what's going on, or where I could look to get more
 information?

Another suggestion is to use the emergency recovery key sequence.  I
think it is compiled by default into the Fedora kernels.  Described
here:
http://en.wikipedia.org/wiki/Magic_SysRq_key#.E2.80.9CREISUB.E2.80.9D_.E2.80.93_safe_reboot
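
Being compiled in does not mean the key sequence is enabled at runtime;
that is governed by the kernel.sysrq sysctl. Below is a minimal sketch
in Python that decodes its value. The bit meanings are as I recall them
from the kernel's sysrq documentation and should be checked against the
documentation for the running kernel; the function names are
hypothetical.

```python
SYSRQ_BITS = {
    2: "console log level control",
    4: "keyboard control (SAK, unraw)",
    8: "debugging dumps of processes",
    16: "sync",
    32: "remount read-only",
    64: "signalling processes (term, kill, oom-kill)",
    128: "reboot/poweroff",
    256: "nicing of real-time tasks",
}


def decode_sysrq(value: int) -> list:
    """List the SysRq function groups enabled by a kernel.sysrq value."""
    if value == 0:
        return []                         # SysRq disabled entirely
    if value == 1:
        return list(SYSRQ_BITS.values())  # every function enabled
    # Any other value is treated as a bitmask of allowed groups.
    return [name for bit, name in SYSRQ_BITS.items() if value & bit]


def read_sysrq(path: str = "/proc/sys/kernel/sysrq") -> int:
    """Read the current setting from /proc."""
    with open(path) as handle:
        return int(handle.read().strip())
```

Running decode_sysrq(read_sysrq()) shows which parts of the REISUB
sequence the keyboard will honour: R needs keyboard control, E and I
need process signalling, S needs sync, U needs remount, and B needs
reboot.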

Re: Weird rawhide desktop behavior

2012-03-25 Thread stan
On Sun, 25 Mar 2012 16:01:26 +
Jóhann B. Guðmundsson johan...@gmail.com wrote:

 I would also like to point to our own documentation regarding SysRQ
 [1] which I created a while back for the QA community to use and
 improve.
 
 Don't hesitate to improve/add anything to that page.

 1. http://fedoraproject.org/wiki/QA/Sysrq

Thanks for the link!  Saved for my future reference. :-)

Re: Weird rawhide desktop behavior

2012-03-24 Thread Jóhann B. Guðmundsson

On 03/24/2012 05:58 PM, Jonathan Corbet wrote:

 Here's a strange pathology that just bit me for the first time in a while,
 though I've seen it before.  I'm not sure where to file a bug on this
 one...

 In short: I'll be working away, minding my own business, when the desktop
 goes completely dead - no response to any key or mouse events.  That said,
 the X server is still running; the pointer still moves with the mouse.  I
 can also switch to another virtual console with alt-ctrl-Fn.  Sometimes
 things start working again after some time (measured in minutes);
 sometimes I lose patience and start over.  Today I went and made lunch and
 it never came back.


Hmm

I think there was a lock screen bug mentioned upstream that fits this 
description...


JBG

Re: Weird rawhide desktop behavior

2012-03-24 Thread Jonathan Corbet
On Sat, 24 Mar 2012 12:23:26 -0700
Adam Williamson awill...@redhat.com wrote:

 Jonathan, Chuck - if you try holding down a key that ought to do
 something for half a second instead of just pressing it, does it work?

I'll try that next time the problem hits.  I don't have any real way to
provoke it now, though, so I don't know when that will be...stay tuned.

Thanks,

jon

Re: Weird rawhide desktop behavior

2012-03-24 Thread stan
On Sat, 24 Mar 2012 11:58:15 -0600
Jonathan Corbet corbet...@lwn.net wrote:

 Here's a strange pathology that just bit me for the first time in a
 while, though I've seen it before.  I'm not sure where to file a bug
 on this one...
 
 In short: I'll be working away, minding my own business, when the
 desktop goes completely dead - no response to any key or mouse
 events.
...
 Anybody got a clue what's going on, or where I could look to get more
 information?

This is old, and might be unrelated, but I used to have these lockups
starting around F11, F12.  For me, it was almost always Firefox related:
I had just clicked on something, and bingo, lock-up.  But it was other
things often enough that I couldn't say for sure it was Firefox.  So I
started compiling my own kernels, customized for my system.  And the
problem went away, and I haven't seen it again.  Maybe it is gone in the
generic kernels, maybe not, but it only takes about 10 to 15 minutes of
my time now to compile and install a kernel, so I just continue doing
it.  Such lockups only happened while X was running for me, and that
would seem to absolve the kernel.  So it is probably that the recompile
removes the troublesome code, or changes it enough that it no longer
fails.

My best guess at the time was that there was a race condition leading
to a deadlock.  One other thing you could try is boosting the priority
of your user to -1.  It seems counter-intuitive, but for a workstation
instead of a server, this makes sense because then your graphical user
experience doesn't get impacted by background processes as much, yet
they still run whenever there is spare CPU time (which is most of the
time).  The setting is in /etc/security/limits.conf, and as I said I
have my user set to -1.  This is especially important to prevent system
I/O from affecting your GUI experience as much.  Think of it this way: is
it more important to your user experience that log files get written, or
that the file you want to edit gets loaded?  It also might finesse the
race condition leading to your lockups, by shifting the priorities of
jobs in the interrupt chain.
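
To make the limits.conf setting concrete: entries in that file have the
form "<domain> <type> <item> <value>", and the item in question is
"priority". Here is a minimal sketch in Python that picks such entries
out of the file's text. The user name "alice" and the function name are
hypothetical, and the example content is invented for illustration, not
copied from a real system.

```python
EXAMPLE_LIMITS = """\
# <domain> <type> <item>    <value>
alice      -      priority  -1
@users     soft   nproc     4096
"""


def priority_entries(text: str) -> list:
    """Return (domain, type, value) for every 'priority' line."""
    entries = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue
        fields = line.split()
        if len(fields) == 4 and fields[2] == "priority":
            domain, limit_type, _item, value = fields
            entries.append((domain, limit_type, int(value)))
    return entries
```

Feeding it the contents of /etc/security/limits.conf shows whether a
priority entry exists for a given user; the setting is applied at
login, so a fresh session is needed before it takes effect.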


Re: Weird rawhide desktop behavior

2012-03-24 Thread Jóhann B. Guðmundsson

On 03/24/2012 08:34 PM, Jonathan Corbet wrote:

 On Sat, 24 Mar 2012 12:23:26 -0700
 Adam Williamson awill...@redhat.com wrote:

  Jonathan, Chuck - if you try holding down a key that ought to do
  something for half a second instead of just pressing it, does it work?

 I'll try that next time the problem hits.  I don't have any real way to
 provoke it now, though, so I don't know when that will be...stay tuned.


The best way to deal with this is to follow [1] then file a bug against 
gnome-shell and attach the relevant output along with .xsession-errors


JBG

1. https://live.gnome.org/GnomeShell/Debugging


Re: Weird rawhide desktop behavior

2012-03-24 Thread Lucas Meneghel Rodrigues
On Sat, Mar 24, 2012 at 2:58 PM, Jonathan Corbet corbet...@lwn.net wrote:
 Here's a strange pathology that just bit me for the first time in a while,
 though I've seen it before.  I'm not sure where to file a bug on this
 one...

 In short: I'll be working away, minding my own business, when the desktop
 goes completely dead - no response to any key or mouse events.  That said,
 the X server is still running; the pointer still moves with the mouse.  I
 can also switch to another virtual console with alt-ctrl-Fn.  Sometimes
 things start working again after some time (measured in minutes);
 sometimes I lose patience and start over.  Today I went and made lunch and
 it never came back.

I've noticed the same behavior on my box, that was freshly installed
with Fedora 17 last Friday. I'll see what I can do to gather
information about the problem as well.

-- 
Lucas

RE: Weird rawhide desktop behavior

2012-03-24 Thread John Dulaney


 Date: Sat, 24 Mar 2012 19:37:38 -0300
 Subject: Re: Weird rawhide desktop behavior
 From: look...@gmail.com
 To: test@lists.fedoraproject.org
 
 On Sat, Mar 24, 2012 at 2:58 PM, Jonathan Corbet corbet...@lwn.net wrote:
  Here's a strange pathology that just bit me for the first time in a while,
  though I've seen it before.  I'm not sure where to file a bug on this
  one...
 
  In short: I'll be working away, minding my own business, when the desktop
  goes completely dead - no response to any key or mouse events.  That said,
  the X server is still running; the pointer still moves with the mouse.  I
  can also switch to another virtual console with alt-ctrl-Fn.  Sometimes
  things start working again after some time (measured in minutes);
  sometimes I lose patience and start over.  Today I went and made lunch and
  it never came back.
 
 I've noticed the same behavior on my box, that was freshly installed
 with Fedora 17 last Friday. I'll see what I can do to gather
 information about the problem as well.
 
 -- 
 Lucas

I've been seeing similar behaviour in F16 for quite some time.  Usually, it
happens when I am in the middle of something else and don't have time to try
to track it down, and when I do have time for trying to figure it out, it
doesn't happen.  It does seem to occur more frequently when my box is a
little warm, so I assumed that it was hardware related.

John.

Re: Weird rawhide desktop behavior

2012-03-24 Thread Chuck Forsberg WA7KGX N2469R

X wedged on me again while I was running Gnome.
I rebooted and started Xfce.  It wedged after some minutes.

I then installed the current Nvidia proprietary driver.  The computer
has not wedged for a few hours.  This suggests the problem lies
with the default X server for my GTX 460SE ... or an interaction
between my karma and the ozone layer.

--
Chuck Forsberg WA7KGX N2469R c...@omen.com   www.omen.com
Developer of Industrial ZMODEM(Tm) for Embedded Applications
  Omen Technology Inc  The High Reliability Software
10255 NW Old Cornelius Pass Portland OR 97231   503-614-0430
