Re: Weird rawhide desktop behavior

2012-03-26 Thread Adam Jackson
On Sat, 2012-03-24 at 11:58 -0600, Jonathan Corbet wrote:
> Here's a strange pathology that just bit me for the first time in a while,
> though I've seen it before.  I'm not sure where to file a bug on this
> one...

There are several levels of "X locked up" pathology; let's see if I can
shed some light here.  (For bonus points, someone who wanted to add this
kind of info to the wiki would be Way Cool.)

> In short: I'll be working away, minding my own business, when the desktop
> goes completely dead - no response to any key or mouse events.  That said,
> the X server is still running; the pointer still moves with the mouse.  I
> can also switch to another virtual console with alt-ctrl-Fn.  Sometimes
> things start working again after some time (measured in minutes);
> sometimes I lose patience and start over.  Today I went and made lunch and
> it never came back.

The pointer position (but not image) updates during a SIGIO handler if
you have hardware cursors enabled [1].  How do you know if you have
hardware cursors?  Short answer is, you do, unless you're running a dumb
driver like vesa/fbdev/modesetting.

So, class 1 lockup here is "I can't move the cursor", and boy are you in
trouble.  For KMS drivers this usually means X is waiting on a blocking
DRM ioctl; ps will show X in D state, and /proc/$(pidof Xorg)/wchan will
show you somewhere in ioctl land.  This is always a video driver bug,
and you will typically see something in dmesg when this happens.  Don't
bother trying to get an xserver backtrace here, ptrace can't attach to
D-state processes.
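
A minimal sketch of that class 1 check, assuming a Linux /proc filesystem.
The helper names (parse_state, describe) are made up for illustration, and
you would pass in the X server's PID yourself, e.g. from `pidof Xorg`:

```python
from pathlib import Path


def parse_state(stat_line: str) -> str:
    """Return the one-letter process state from a /proc/<pid>/stat line.

    The command name is wrapped in parentheses and may itself contain
    spaces or parentheses, so split on the last ')' instead of on spaces.
    """
    after_comm = stat_line.rsplit(")", 1)[1]
    return after_comm.split()[0]


def describe(pid: int) -> str:
    """Summarise whether a process looks blocked in the kernel (class 1)."""
    proc = Path("/proc") / str(pid)
    state = parse_state((proc / "stat").read_text())
    try:
        wchan = (proc / "wchan").read_text().strip() or "?"
    except OSError:
        wchan = "?"  # wchan may be unreadable without enough privileges
    verdict = "blocked in the kernel" if state == "D" else "not in D state"
    return f"pid {pid}: state={state} wchan={wchan} ({verdict})"
```

Run it from a text console or over ssh, since the X session itself is
unusable at that point: `print(describe(1234))`, with 1234 replaced by the
real PID.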

Class 2 lockup is "I can move the cursor, but the image never changes",
as in, if you mouse over a text entry field it doesn't change to the
vertical bar, or over a resize grip it doesn't change to a resize
indicator.  Here, the X server is stuck somewhere away from the main
loop, but at least isn't stuck in the kernel.  gdb on X will work, and
will probably tell you where you're stuck.  This class is usually a
userspace bug, could be either the driver or the server.
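
As a rough sketch, the backtrace for a class 2 hang can be grabbed
non-interactively.  This assumes gdb (and ideally the X server's debuginfo)
is installed and that you run it as root from a text console or over ssh,
not from a terminal inside the stuck session.  The function names are
illustrative only:

```python
import subprocess


def gdb_backtrace_cmd(pid: int) -> list[str]:
    """Build a gdb command line that attaches, dumps all threads, and exits."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]


def capture_backtrace(pid: int) -> str:
    """Run gdb against the given PID and return whatever it printed."""
    result = subprocess.run(
        gdb_backtrace_cmd(pid),
        capture_output=True,
        text=True,
        check=False,  # gdb may exit non-zero yet still print useful output
    )
    return result.stdout
```

The output of `capture_backtrace(pid)` is the sort of thing worth attaching
to a bug report against the driver or the server.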

Class 3 lockup is "I can move the cursor and it behaves normally, but I
can't type".  In this scenario X _is_ successfully going around its main
loop.  If you can VT switch, this is you; VT switch processing happens
while draining the event queue, which is driven off the main loop.  This
scenario has an outside chance of being an xserver bug, but typically
this is the server dutifully doing what clients have told it to do:
something takes a grab, and then deadlocks.  Sorry about X11, we keep
trying to get rid of it for a reason.
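
The three classes boil down to two observations, which the following sketch
encodes.  The function name and the wording of the returned strings are
invented for illustration; the decision logic is only a restatement of the
descriptions above:

```python
def classify_lockup(cursor_moves: bool, cursor_image_changes: bool) -> str:
    """Map what the user can observe onto the three lockup classes."""
    if not cursor_moves:
        # X is probably blocked in a DRM ioctl: check ps for D state and dmesg.
        return "class 1: stuck in the kernel, likely a video driver bug"
    if not cursor_image_changes:
        # X is stuck away from its main loop but not in the kernel: use gdb.
        return "class 2: stuck in userspace, driver or server bug"
    # X is going around its main loop (VT switching works too).
    return "class 3: probably a client holding a grab and deadlocking"
```

For the symptoms in the original report (pointer moves, VT switch works),
`classify_lockup(True, True)` lands in class 3.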

Class 3 would be easier to debug if you had some of the debugging key
combos wired up in XKB:

http://cgit.freedesktop.org/xorg/xserver/commit/?id=7d2543a3cb3089241982ce4f8984fd723d5312a1

Sadly gnome does not yet have UI for this, and I don't remember how to
drive setxkbmap to add them.  Note that the Ungrab and CloseGrab combos
allow you to defeat screensaver locking - ie, they are security holes -
which is why they're not enabled by default.  You don't want to use them
when debugging anyway; you want PrintGrabs, so you can then go inspect
the grabbing process to see why it's deadlocked.

> I've tried killing off applications to see if somebody has some sort of
> all-inclusive grab, but I can't find the right one if that's the case.  I
> can kill something like Firefox and verify that the process is gone, but
> the Firefox window remains on-screen when I return to X.

This is significant.  It means the compositor isn't repainting.  So
either:

a) the compositor isn't the client with the stuck grab, or
b) the compositor's internal grab logic is broken.

[1] - Why position but not image?  Because on most hardware position is
just one register to poke, but image updates require an image upload,
which isn't safe to do if the driver is in the middle of some other
accelerated rendering.  Why only for hardware cursor?  Because software
cursor rendering only caches the pixels behind the cursor on motion,
which means you could race with normal rendering.  Both of these you
could fix if you were willing to take much more of a mutex overhead than
you're probably okay with.

- ajax


-- 
test mailing list
test@lists.fedoraproject.org
To unsubscribe:
https://admin.fedoraproject.org/mailman/listinfo/test

Re: Weird rawhide desktop behavior

2012-03-25 Thread stan
On Sun, 25 Mar 2012 16:01:26 +
"Jóhann B. Guðmundsson"  wrote:

> I would also like to point to our own documentation regarding SysRQ
> [1] which I created a while back for the QA community to use and
> improve.
> 
> Don't hesitate to improve or add anything to that page.

> 1.http://fedoraproject.org/wiki/QA/Sysrq

Thanks for the link!  Saved for future reference. :-)

Re: Weird rawhide desktop behavior

2012-03-25 Thread Jóhann B. Guðmundsson

On 03/25/2012 03:46 PM, stan wrote:
> On Sat, 24 Mar 2012 11:58:15 -0600
> Jonathan Corbet  wrote:
>
>> Anybody got a clue what's going on, or where I could look to get more
>> information?
>
> Another suggestion is to use the emergency recovery key sequence.  I
> think it is compiled by default into the Fedora kernels.  Described
> here:
> http://en.wikipedia.org/wiki/Magic_SysRq_key#.E2.80.9CREISUB.E2.80.9D_.E2.80.93_safe_reboot

I would also like to point to our own documentation regarding SysRQ [1]
which I created a while back for the QA community to use and improve.

Don't hesitate to improve or add anything to that page.

JBG

1.http://fedoraproject.org/wiki/QA/Sysrq

Re: Weird rawhide desktop behavior

2012-03-25 Thread stan
On Sat, 24 Mar 2012 11:58:15 -0600
Jonathan Corbet  wrote:

> Anybody got a clue what's going on, or where I could look to get more
> information?

Another suggestion is to use the emergency recovery key sequence.  I
think it is compiled by default into the Fedora kernels.  Described
here:
http://en.wikipedia.org/wiki/Magic_SysRq_key#.E2.80.9CREISUB.E2.80.9D_.E2.80.93_safe_reboot
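
Whether the key sequence actually does anything depends on the
kernel.sysrq setting, not only on it being compiled in.  A minimal sketch
for decoding that setting, assuming the bitmask meanings given in the
kernel's sysrq documentation (the helper names are made up for this
example):

```python
from pathlib import Path

# Bit values as described in the kernel's sysrq documentation.
SYSRQ_BITS = {
    2: "control of console logging level",
    4: "control of keyboard (SAK, unraw)",
    8: "debugging dumps of processes",
    16: "sync command",
    32: "remount read-only",
    64: "signalling of processes (term, kill, oom-kill)",
    128: "reboot/poweroff",
    256: "nicing of all RT tasks",
}


def decode_sysrq(value: int) -> list[str]:
    """Return the SysRq function groups enabled by a kernel.sysrq value."""
    if value == 0:
        return []  # SysRq disabled entirely
    if value == 1:
        return list(SYSRQ_BITS.values())  # 1 means "everything enabled"
    return [name for bit, name in SYSRQ_BITS.items() if value & bit]


def current_sysrq() -> int:
    """Read the running kernel's setting (Linux only)."""
    return int(Path("/proc/sys/kernel/sysrq").read_text())
```

`decode_sysrq(current_sysrq())` then lists what the running kernel will
honour; the full safe-reboot sequence needs the signalling, sync, remount,
and reboot groups.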

Re: Weird rawhide desktop behavior

2012-03-24 Thread Adam Williamson
On Sat, 2012-03-24 at 19:32 -0700, Chuck Forsberg WA7KGX N2469R wrote:
> X wedged on me again while I was running Gnome.
> I rebooted and started Xfce.  It wedged after some minutes.
> 
> I then installed the current Nvidia proprietary driver.  The computer
> has not wedged for a few hours.  This suggests the problem lies
> with the default X server for my GTX 460SE ... or an interaction
> between my karma and the ozone layer.

Oh, another possible cause if you're on GNOME 3.3.90 is a memory leak in
Shell. This was fixed in 3.3.92. So, one thing to try is updating to
3.3.92.
-- 
Adam Williamson
Fedora QA Community Monkey
IRC: adamw | Twitter: AdamW_Fedora | identi.ca: adamwfedora
http://www.happyassassin.net


Re: Weird rawhide desktop behavior

2012-03-24 Thread Adam Williamson
On Sat, 2012-03-24 at 21:09 +, "Jóhann B. Guðmundsson" wrote:
> On 03/24/2012 08:34 PM, Jonathan Corbet wrote:
> > On Sat, 24 Mar 2012 12:23:26 -0700
> > Adam Williamson  wrote:
> >
> >> Jonathan, Chuck - if you try holding down a key that ought to do
> >> something for half a second instead of just pressing it, does it work?
> > I'll try that next time the problem hits.  I don't have any real way to
> > provoke it now, though, so I don't know when that will be...stay tuned.
> 
> The best way to deal with this is to follow [1] then file a bug against 
> gnome-shell and attach the relevant output along with .xsession-errors

If it's a Shell bug. Mine isn't.
-- 
Adam Williamson
Fedora QA Community Monkey
IRC: adamw | Twitter: AdamW_Fedora | identi.ca: adamwfedora
http://www.happyassassin.net


Re: Weird rawhide desktop behavior

2012-03-24 Thread Chuck Forsberg WA7KGX N2469R

X wedged on me again while I was running Gnome.
I rebooted and started Xfce.  It wedged after some minutes.

I then installed the current Nvidia proprietary driver.  The computer
has not wedged for a few hours.  This suggests the problem lies
with the default X server for my GTX 460SE ... or an interaction
between my karma and the ozone layer.

--
Chuck Forsberg WA7KGX N2469R c...@omen.com   www.omen.com
Developer of Industrial ZMODEM(Tm) for Embedded Applications
  Omen Technology Inc  "The High Reliability Software"
10255 NW Old Cornelius Pass Portland OR 97231   503-614-0430


RE: Weird rawhide desktop behavior

2012-03-24 Thread John Dulaney


> Date: Sat, 24 Mar 2012 19:37:38 -0300
> Subject: Re: Weird rawhide desktop behavior
> From: look...@gmail.com
> To: test@lists.fedoraproject.org
> 
> On Sat, Mar 24, 2012 at 2:58 PM, Jonathan Corbet  wrote:
> > Here's a strange pathology that just bit me for the first time in a while,
> > though I've seen it before.  I'm not sure where to file a bug on this
> > one...
> >
> > In short: I'll be working away, minding my own business, when the desktop
> > goes completely dead - no response to any key or mouse events.  That said,
> > the X server is still running; the pointer still moves with the mouse.  I
> > can also switch to another virtual console with alt-ctrl-Fn.  Sometimes
> > things start working again after some time (measured in minutes);
> > sometimes I lose patience and start over.  Today I went and made lunch and
> > it never came back.
> 
> I've noticed the same behavior on my box, that was freshly installed
> with Fedora 17 last Friday. I'll see what I can do to gather
> information about the problem as well.
> 
> -- 
> Lucas

I've been seeing similar behaviour in F16 for quite some time.  Usually, it
happens when I am in the middle of something else and don't have time to
try to track it down, and when I do have time for trying to figure it out,
it doesn't happen.  It does seem to occur more frequently when my box is a
little warm, so I assumed that it was hardware related.

John.

Re: Weird rawhide desktop behavior

2012-03-24 Thread Lucas Meneghel Rodrigues
On Sat, Mar 24, 2012 at 2:58 PM, Jonathan Corbet  wrote:
> Here's a strange pathology that just bit me for the first time in a while,
> though I've seen it before.  I'm not sure where to file a bug on this
> one...
>
> In short: I'll be working away, minding my own business, when the desktop
> goes completely dead - no response to any key or mouse events.  That said,
> the X server is still running; the pointer still moves with the mouse.  I
> can also switch to another virtual console with alt-ctrl-Fn.  Sometimes
> things start working again after some time (measured in minutes);
> sometimes I lose patience and start over.  Today I went and made lunch and
> it never came back.

I've noticed the same behavior on my box, that was freshly installed
with Fedora 17 last Friday. I'll see what I can do to gather
information about the problem as well.

-- 
Lucas

Re: Weird rawhide desktop behavior

2012-03-24 Thread Jóhann B. Guðmundsson

On 03/24/2012 08:34 PM, Jonathan Corbet wrote:
> On Sat, 24 Mar 2012 12:23:26 -0700
> Adam Williamson  wrote:
>
>> Jonathan, Chuck - if you try holding down a key that ought to do
>> something for half a second instead of just pressing it, does it work?
>
> I'll try that next time the problem hits.  I don't have any real way to
> provoke it now, though, so I don't know when that will be...stay tuned.

The best way to deal with this is to follow [1] then file a bug against
gnome-shell and attach the relevant output along with .xsession-errors

JBG

1.https://live.gnome.org/GnomeShell/Debugging


Re: Weird rawhide desktop behavior

2012-03-24 Thread stan
On Sat, 24 Mar 2012 11:58:15 -0600
Jonathan Corbet  wrote:

> Here's a strange pathology that just bit me for the first time in a
> while, though I've seen it before.  I'm not sure where to file a bug
> on this one...
> 
> In short: I'll be working away, minding my own business, when the
> desktop goes completely dead - no response to any key or mouse
> events.
...
> Anybody got a clue what's going on, or where I could look to get more
> information?

This is old, and might be unrelated, but I used to have these lockups
starting around F11,F12.  For me, it was almost always firefox related,
I had just clicked on something, and bingo, lock-up.  But it was other
things often enough that I couldn't say for sure it was firefox. So I
started compiling my own kernels, customized for my system.  And the
problem went away, and I haven't seen it again.  Maybe it is gone in the
generic kernels, maybe not, but it only takes about 10 to 15 minutes of
my time now to compile and install a kernel, so I just continue doing
it.  Such lockups only happened while X was running for me, and that
would seem to absolve the kernel.  So it is probably that the recompile
removes the troublesome code, or changes it enough that it no longer
fails.

My best guess at the time was that there was a race condition leading
to a deadlock.  One other thing you could try is boosting the priority
of your user to -1.  It seems counter-intuitive, but for a workstation
instead of a server, this makes sense because then your graphical user
experience doesn't get impacted by background processes as much, yet
they still run if you have any CPU time (most of the time).  It is
in /etc/security/limits.conf, and as I said I have my user set to -1.
This is especially important to prevent system io from affecting your
gui experience as much. Think of it this way; is it more important to
your user experience that writing log files get done, or that the file
you want to edit gets loaded?  It also might finesse the race condition
leading to your lockups, by shifting the priorities of jobs in the
interrupt chain.
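
For reference, a small sketch of what that setting might look like.  It
assumes the pam_limits `priority` item and the `<domain> <type> <item>
<value>` line layout of limits.conf; the user name "alice" and the helper
names are placeholders:

```python
import os


def limits_conf_line(user: str, priority: int) -> str:
    """Format a limits.conf entry that sets a user's default priority.

    '-' in the type column applies the value as both soft and hard limit.
    """
    if not -20 <= priority <= 19:
        raise ValueError("priority must be between -20 and 19")
    return f"{user} - priority {priority}"


def current_priority() -> int:
    """Return the nice value of the calling process (Unix only)."""
    return os.getpriority(os.PRIO_PROCESS, 0)
```

`limits_conf_line("alice", -1)` gives the line to add, and after logging in
again `current_priority()` from a shell-launched Python should report
whether the new value took effect.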


Re: Weird rawhide desktop behavior

2012-03-24 Thread Jonathan Corbet
On Sat, 24 Mar 2012 12:23:26 -0700
Adam Williamson  wrote:

> Jonathan, Chuck - if you try holding down a key that ought to do
> something for half a second instead of just pressing it, does it work?

I'll try that next time the problem hits.  I don't have any real way to
provoke it now, though, so I don't know when that will be...stay tuned.

Thanks,

jon

Re: Weird rawhide desktop behavior

2012-03-24 Thread Adam Williamson
On Sat, 2012-03-24 at 11:43 -0700, Chuck Forsberg WA7KGX N2469R wrote:
> On 03/24/2012 11:12 AM, "Jóhann B. Guðmundsson" wrote:
> > On 03/24/2012 05:58 PM, Jonathan Corbet wrote:
> >> Here's a strange pathology that just bit me for the first time in a
> >> while, though I've seen it before.  I'm not sure where to file a bug
> >> on this one...
> >>
> >> In short: I'll be working away, minding my own business, when the
> >> desktop goes completely dead - no response to any key or mouse
> >> events.  That said, the X server is still running; the pointer still
> >> moves with the mouse.  I can also switch to another virtual console
> >> with alt-ctrl-Fn.  Sometimes things start working again after some
> >> time (measured in minutes); sometimes I lose patience and start over.
> >> Today I went and made lunch and it never came back.
> >
> > Hmm
> >
> > I think there was a lock screen bug mentioned upstream that fits this
> > description...
> >
> > JBG
> Same thing happened to me with RC1 Gnome on 64 bit.  Yum update was
> running in one window and I was surfing in another, each window taking
> up about half the 1080x1920 monitor.  All of a sudden Firefox stopped
> responding and Yum ground to a halt.  The mouse cursor still followed
> mouse movements, but the keyboard and mouse keys were dead.  No
> Ctrl-Alt-Fn did anything.  No LED response to CapsLock etc.  Some
> background process accessed the HD from time to time.  I had to use the
> hardware reset.
>
> Yum was hopelessly confused so I reinstalled RC1 and ran yum update
> from a console terminal.  Hooray for Xfce which works better now.
> Everyone I know of who uses Linux as a tool rather dislikes Gnome 3, so
> let's make sure Xfce works properly.

Josh - to me, the above sounds vaguely like that interrupt problem I
have, the one we were talking about the other day, where I have to hold
down keys for half a second before they register. Could it be the same?

Jonathan, Chuck - if you try holding down a key that ought to do
something for half a second instead of just pressing it, does it work?
-- 
Adam Williamson
Fedora QA Community Monkey
IRC: adamw | Twitter: AdamW_Fedora | identi.ca: adamwfedora
http://www.happyassassin.net


Re: Weird rawhide desktop behavior

2012-03-24 Thread Chuck Forsberg WA7KGX N2469R

On 03/24/2012 11:12 AM, "Jóhann B. Guðmundsson" wrote:
> On 03/24/2012 05:58 PM, Jonathan Corbet wrote:
>> Here's a strange pathology that just bit me for the first time in a
>> while, though I've seen it before.  I'm not sure where to file a bug
>> on this one...
>>
>> In short: I'll be working away, minding my own business, when the
>> desktop goes completely dead - no response to any key or mouse
>> events.  That said, the X server is still running; the pointer still
>> moves with the mouse.  I can also switch to another virtual console
>> with alt-ctrl-Fn.  Sometimes things start working again after some
>> time (measured in minutes); sometimes I lose patience and start over.
>> Today I went and made lunch and it never came back.
>
> Hmm
>
> I think there was a lock screen bug mentioned upstream that fits this
> description...
>
> JBG

Same thing happened to me with RC1 Gnome on 64 bit.  Yum update was
running in one window and I was surfing in another, each window taking
up about half the 1080x1920 monitor.  All of a sudden Firefox stopped
responding and Yum ground to a halt.  The mouse cursor still followed
mouse movements, but the keyboard and mouse keys were dead.  No
Ctrl-Alt-Fn did anything.  No LED response to CapsLock etc.  Some
background process accessed the HD from time to time.  I had to use the
hardware reset.

Yum was hopelessly confused so I reinstalled RC1 and ran yum update
from a console terminal.  Hooray for Xfce which works better now.
Everyone I know of who uses Linux as a tool rather dislikes Gnome 3, so
let's make sure Xfce works properly.

--
Chuck Forsberg WA7KGX N2469R c...@omen.com   www.omen.com
Developer of Industrial ZMODEM(Tm) for Embedded Applications
  Omen Technology Inc  "The High Reliability Software"
10255 NW Old Cornelius Pass Portland OR 97231   503-614-0430


Re: Weird rawhide desktop behavior

2012-03-24 Thread Jóhann B. Guðmundsson

On 03/24/2012 05:58 PM, Jonathan Corbet wrote:
> Here's a strange pathology that just bit me for the first time in a while,
> though I've seen it before.  I'm not sure where to file a bug on this
> one...
>
> In short: I'll be working away, minding my own business, when the desktop
> goes completely dead - no response to any key or mouse events.  That said,
> the X server is still running; the pointer still moves with the mouse.  I
> can also switch to another virtual console with alt-ctrl-Fn.  Sometimes
> things start working again after some time (measured in minutes);
> sometimes I lose patience and start over.  Today I went and made lunch and
> it never came back.

Hmm

I think there was a lock screen bug mentioned upstream that fits this
description...

JBG