[Mesa-dev] [Bug 86690] glmark2-es2-wayland shortly freezes on some frames with egl_dri2 backend (Nouveau/GK20A)

2014-12-03 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=86690

Alexandre Courbot gnu...@gmail.com changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
  Component|EGL/Wayland |Driver/nVidia (open)
   Assignee|wayland-bugs@lists.freedesk |aplatt...@nvidia.com
   |top.org |
Product|Mesa|xorg
 QA Contact|mesa-dev@lists.freedesktop. |xorg-t...@lists.x.org
   |org |

--- Comment #10 from Alexandre Courbot gnu...@gmail.com ---
After further confirmation (checking the GET/PUT pointers to make sure
pushbuffers are executed, and directly peeking the value of the screen fence
counter using PRAMIN), it appears that the fence value is indeed updated by the
GPU as expected, but user-space is reading an old value. Seems like our old
friend ARM memory coherency issue is back. :(

Moving this issue to Nouveau, since this is clearly a driver problem, and
taking ownership of it.

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 86690] glmark2-es2-wayland shortly freezes on some frames with egl_dri2 backend (Nouveau/GK20A)

2014-12-02 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=86690

--- Comment #7 from Alexandre Courbot gnu...@gmail.com ---
Some more input about this oddity.

We are really waiting for fences here. During the freezes I can see the EGL
client looping in nouveau_fence_wait(), and displaying the fence sequence and
the highest signaled fence by the screen (screen-fence.sequence_ack) shows
that sequence_ack  fence-sequence, indicating the GPU has not signaled this
fence yet.

I am starting to suspect a CPU scheduling issue here. Another very interesting
phenomenon happens if I disable all but 1 CPU core: the issue becomes
immediately visible (even worse than when perf top is running), for all Wayland
EGL clients that set eglSwapInterval to 0. Wayland itself (or any DRM program
for that instance) is still not affected, neither are Wayland clients not
touching eglSwapInterval *until* they exit, at which point nouveau_fence_wait()
will again endlessly loop on a fence. One way to release that fence is to
trigger a draw event in another client. Then the screen's sequence_ack finally
increases and the client goes through and exits. Doing the same in a client
that set eglSwapInterval(0) allows it to draw a few frames, and then it blocks
again.

I suspect that some event in Wayland remains stuck somewhere, which leads to
the EGL client to wait for a fence that Wayland fails to release when expected.
Here is a gdb backtrace of weston-simple-egl when waiting for the fence:

#0  0xb6d9298c in sched_yield ()
#1  0xb6a448ce in nouveau_fence_wait ()
#2  0xb6a45480 in nouveau_screen_fence_finish ()
#3  0xb69a351e in dri_flush ()
#4  0xb6fc399e in dri2_wl_swap_buffers_with_damage ()
#5  0xb6fc0f66 in dri2_swap_buffers_with_damage ()
#6  0xb6fb700c in eglSwapBuffersWithDamageEXT ()
#7  0xaa4a in redraw (data=0xbefff9e4, callback=0x0, time=0)
#8  0xb254 in main (argc=2, argv=0xbefffc64)

I guess the next step for me is to find how to get some output about Wayland
events as they happen and see which one remains stuck.

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 86690] glmark2-es2-wayland shortly freezes on some frames with egl_dri2 backend (Nouveau/GK20A)

2014-12-02 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=86690

--- Comment #8 from Michel Dänzer mic...@daenzer.net ---
(In reply to Alexandre Courbot from comment #7)
 I guess the next step for me is to find how to get some output about Wayland
 events as they happen and see which one remains stuck.

FWIW, I don't think these fences have anything to do with Wayland events. A
Gallium fence is created as the result of a flush and should signal when the
flushed operations have finished. It sounds like something in the nouveau
driver/winsys/kernel code is preventing that from happening.

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 86690] glmark2-es2-wayland shortly freezes on some frames with egl_dri2 backend (Nouveau/GK20A)

2014-12-02 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=86690

--- Comment #9 from Alexandre Courbot gnu...@gmail.com ---
(been reminded we should have a drink one of these days! ;))

Mmm, since this happens *only* for Wayland EGL programs (e.g. no problem with
DRM), I was hoping to see unexpected things happening at the Wayland level.

But what seems even stranger is that I think I can see the fence release
pushbuffers being queued *and* run by the GPU from the kernel side. So it may
also be that the screen's fence value is properly updated but the new value not
seen by user-space because of some memory incoherency.

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 86690] glmark2-es2-wayland shortly freezes on some frames with egl_dri2 backend (Nouveau/GK20A)

2014-12-01 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=86690

--- Comment #6 from Alexandre Courbot gnu...@gmail.com ---
I have tried reverting the following patches as suggested on
http://lists.freedesktop.org/archives/dri-devel/2014-November/072367.html:

809e9447: drm/nouveau: use shared fences for readable objects
055dffdf: drm/nouveau: bump driver patchlevel to 1.2.1
e3be4c23: drm/nouveau: specify if interruptible wait is desired in
nouveau_fence_sync
15a996bb: drm/nouveau: assign fence_chan-name correctly

Doing so helped with another (unrelated) issue I was having, but sadly did not
change anything to this problem or its symptoms. Looks like we are dealing with
something completely different, even though somehow fence-related.

Also FWIW, running weston-simple-egl with the -b option (eglSwapInterval 0)
leads to the same behavior as glmark2.

I will keep investigating where/why the fences are blocking.

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 86690] glmark2-es2-wayland shortly freezes on some frames with egl_dri2 backend (Nouveau/GK20A)

2014-11-28 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=86690

--- Comment #3 from Alexandre Courbot gnu...@gmail.com ---
Created attachment 110159
  -- https://bugs.freedesktop.org/attachment.cgi?id=110159action=edit
Output of strace -tt weston-simple-egl

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 86690] glmark2-es2-wayland shortly freezes on some frames with egl_dri2 backend (Nouveau/GK20A)

2014-11-28 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=86690

--- Comment #4 from Alexandre Courbot gnu...@gmail.com ---
As discussed on IRC, it appears that running perf top at the same time as any
Weston EGL client delivers the same behavior of frozen frames in said client,
without affecting Weston itself or other non-EGL clients. perf top reports a
high usage (~30% CPU time) of _raw_spin_unlock_irq() *only* when the condition
occurs.

Very strange. I have run strace on the EGL client for both conditions
(non-bound EGL program, and EGL program while perf top is running), and
noticed that these freezes correspond to long consecutive series of calls to
sched_yield(), e.g:

17:42:56.337425 sched_yield()   = 0
17:42:56.337830 sched_yield()   = 0
...
17:42:57.713679 sched_yield()   = 0
17:42:57.714122 sched_yield()   = 0

sched_yield() is never called in the case of a well-behaving client.

Grepping into Mesa, I found one loop in nouveau_fence_wait() where
sched_yield() is called. Removing this call does not fix the issue, but the
calls the sched_yield() are not visible in the trace anymore, which tends to
confirm this is where the misbehavior is happening.

The series of sched_yield() calls are always between the two following ioctls:

17:42:56.266742 ioctl(6, 0xc0406481, 0xbeeec788) = 0
17:42:56.268952 sched_yield()   = 0
17:42:56.269466 sched_yield()   = 0
...
17:42:56.281481 sched_yield()   = 0
17:42:56.281848 sched_yield()   = 0
17:42:56.282210 ioctl(6, 0xc0406481, 0xbeeec750) = 0

with the calls to sched_yield() removed, the strace log shows large delays
between these ioctls, hinting again that nouveau_fence_wait() is busy-waiting
in that loop:

17:56:00.248145 ioctl(6, 0xc0406481, 0xbecf2768) = 0
17:56:01.155312 ioctl(6, 0xc0406481, 0xbecf2730) = 0

What is really remarkable is that this is true for both cases: a non-capped EGL
client, or a (capped or non-capped) EGL client while perf top is running. The
same ioctls are involved, in the same pattern.

While this seems to make it clear that we are waiting for fences, I still
cannot imagine how perf top can have such a strong influence on this.

So it seems that in both cases we are waiting for fences. I have attached a
full strace of weston-simple-egl in case this could be helpful.

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 86690] glmark2-es2-wayland shortly freezes on some frames with egl_dri2 backend (Nouveau/GK20A)

2014-11-28 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=86690

--- Comment #5 from Pekka Paalanen ppaala...@gmail.com ---
Maarten Lankhorst suspects this might have something to do with
http://lists.freedesktop.org/archives/dri-devel/2014-November/072367.html

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 86690] glmark2-es2-wayland shortly freezes on some frames with egl_dri2 backend (Nouveau/GK20A)

2014-11-25 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=86690

Bug ID: 86690
   Summary: glmark2-es2-wayland shortly freezes on some frames
with egl_dri2 backend (Nouveau/GK20A)
   Product: Mesa
   Version: git
  Hardware: ARM
OS: Linux (All)
Status: NEW
  Severity: normal
  Priority: medium
 Component: EGL/Wayland
  Assignee: wayland-b...@lists.freedesktop.org
  Reporter: gnu...@gmail.com
QA Contact: mesa-dev@lists.freedesktop.org

When using the egl_dri2 driver, glmark2 will sometimes keep displaying the same
frame for up to 1/2 second before resuming normally. This also affects the
reported frame rate, which drops dramatically with each occurence.

This only seems to happen for applications that set eglSwapInterval to 0 in
order to exceed the monitor frame rate. With applications that do not set
eglSwapInterval (like weston-simple-egl) or the (recently removed) egl_gallium
driver that also does not allow more than 60fps, the issue is not visible.

Relevant comments from Pekka Paalanen when discussing this on the mailing-list:

I have a hunch (wl_buffer.release not delivered in time, and client
side EGL running out of available buffers), but confirming that would
require a Wayland protocol dump up to such hickup. You could try to get
one by setting the enviroment variable WAYLAND_DEBUG=client for
glmark2. It will be a flood, especially if glmark2 succeeds in running
at uncapped framerates. The trace will come to stderr, so you want to
redirect that to file. You need to find out where in the trace the
hickup happened. The timestamps are in milliseconds. I could then take
a look (will need the whole trace).

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 86690] glmark2-es2-wayland shortly freezes on some frames with egl_dri2 backend (Nouveau/GK20A)

2014-11-25 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=86690

--- Comment #1 from Alexandre Courbot gnu...@gmail.com ---
Created attachment 109990
  -- https://bugs.freedesktop.org/attachment.cgi?id=109990action=edit
Trace when running glmark2 with WAYLAND_DEBUG=client

Attached the trace suggested by Pekka. The stuttering is visible almost
immediatly after the bench started, and occured regularly until the end of the
trace.

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 86690] glmark2-es2-wayland shortly freezes on some frames with egl_dri2 backend (Nouveau/GK20A)

2014-11-25 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=86690

--- Comment #2 from Pekka Paalanen ppaala...@gmail.com ---
(In reply to Alexandre Courbot from comment #1)
 Created attachment 109990 [details]
 Trace when running glmark2 with WAYLAND_DEBUG=client
 
 Attached the trace suggested by Pekka. The stuttering is visible almost
 immediatly after the bench started, and occured regularly until the end of
 the trace.

Ok, here's a piece of what I detected as delays  50ms in the protocol dump
(timestamp, skip in ms):
2007458 55
2007628 56
2008779 62
2008926 142
2009104 128
2009638 529
2009733 84
2012659 105
2012769 62
2012838 64

It looks quite random in both occurrence and length. I checked the first six,
and the delay always happens between damage and commit requests. Due to how
libwayland-client prints these, these timestamp are when Mesa calls the
functions wl_surface_damage and wl_surface_commit.

In src/egl/drivers/dri2/platform_wayland.c (egl_dri2), in
dri2_wl_swap_buffers_with_damage() you see these function calls. The things
between them are flush and invalidate. One of these probably causes the stall.

If we look at egl_gallium in
src/gallium/state_trackers/egl/wayland/native_wayland.c, well, the code is
different. Swapbuffers and present seem to be two different hooks.

I'm not familiar with this code to say what the actual difference is. Maybe
egl_dri2 allows queueing many frames' worth of drawing to the GPU and at some
point the driver simply has to wait for it all to drain? And egl_gallium does
something slightly different?

Oh right, you said egl_gallium didn't respect swapinterval=0 but stayed
vsynced. That explains why the drawing never accumulates. Looks like
wayland_surface_swap_buffers() always waits for the frame callback, which
causes exactly the throttling to vsync rate.

I believe this would be a case of queueing too much GPU work and eventually
having to wait for it to drain. I'm not sure how or where you would start
fixing it. Buffer re-use could also be a factor.

I am fairly sure this is not a Weston bug. It seems to swapping between the two
wl_buffers just fine.

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev