On Tue, Mar 24, 2026 at 09:06:02AM -0700, Matthew Brost wrote:
> On Tue, Mar 24, 2026 at 10:23:45AM +0100, Boris Brezillon wrote:
> > On Mon, 23 Mar 2026 11:38:06 -0700
> > Matthew Brost <[email protected]> wrote:
> > 
> > > 
> > > Ok, getting stats is easier than I thought...
> > > 
> > > ./perf stat -a -e 
> > > context-switches,cpu-migrations,task-clock,cycles,instructions 
> > > /home/mbrost/xe/source/drivers.gpu.i915.igt-gpu-tools/build/tests/xe_exec_threads
> > >  --r threads-basic
> > > 
> > > This test creates one thread per engine instance (7 instances on this
> > > BMG device) and submits 1k exec IOCTLs per thread, each performing a DW
> > > write. Each exec IOCTL typically does not have unsignaled input 
> > > dependencies.
> > > 
> > > With IRQ putting of jobs off + no bypass (drm_dep_queue_flags = 0):
> > > 
> > >              8,449      context-switches
> > >                412      cpu-migrations
> > >           2,531.43 msec task-clock
> > >      1,847,846,588      cpu_atom/cycles/
> > >      1,847,856,947      cpu_core/cycles/
> > >    <not supported>      cpu_atom/instructions/
> > >        460,744,020      cpu_core/instructions/
> > > 
> > > With IRQ putting of jobs off + bypass (drm_dep_queue_flags =
> > > DRM_DEP_QUEUE_FLAGS_BYPASS_SUPPORTED):
> > > 
> > >              8,655      context-switches
> > >                229      cpu-migrations
> > >           2,571.33 msec task-clock
> > >        855,900,607      cpu_atom/cycles/
> > >        855,900,272      cpu_core/cycles/
> > >    <not supported>      cpu_atom/instructions/
> > >        403,651,469      cpu_core/instructions/
> > > 
> > > With IRQ putting of jobs on + bypass (drm_dep_queue_flags =
> > > DRM_DEP_QUEUE_FLAGS_BYPASS_SUPPORTED |
> > > DRM_DEP_QUEUE_FLAGS_JOB_PUT_IRQ_SAFE):
> > > 
> > >              5,361      context-switches
> > >                169      cpu-migrations
> > >           2,577.44 msec task-clock
> > >        685,769,153      cpu_atom/cycles/
> > >        685,768,407      cpu_core/cycles/
> > >    <not supported>      cpu_atom/instructions/
> > >        321,336,297      cpu_core/instructions/
> > 
> > Thanks for sharing those numbers. For completeness, can you also add the
> > "With IRQ putting of jobs on + no bypass" case?
> > 
> 
> Yes, I will also share a DRM sched baseline. I also figured out that
> power can be measured, and initial results confirm what I expected:
> less power.
> 
> I'm putting together a doc based on running glxgears and another
> benchmark on top of Ubuntu 24.10 + Wayland, which has explicit sync
> (linux-drm-syncobj, behaves like SurfaceFlinger when rendering flag to
> not pass in fences to draw jobs).
> 
> Almost have all the data. Will share here once I have it.
> 

Here are some numbers based on glxgears and weston-simple-egl.

5 configurations tested:
DRM sched
DRM dep (no opt flags)
DRM dep + bypass flag
DRM dep + IRQ-safe flag
DRM dep + bypass + IRQ-safe flags

Each configuration was run 3× on both glxgears and weston-simple-egl.
Raptor Lake CPU, BMG G21.

Summary:
DRM dep reduces power usage, CPU cycles, and context switches. Enabling
both the bypass and IRQ-safe flags further reduces all of these metrics.
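
As a rough sanity check of the context-switch part of that claim, here is
a minimal Python sketch that averages the three glxgears runs per
configuration and prints the reduction relative to DRM sched. The counts
are copied from the raw numbers in this message; the function name is made
up for illustration and is not part of any tool used here.

```python
from statistics import mean

# glxgears context-switch counts, three runs per configuration,
# copied from the raw perf stat output in this message.
context_switches = {
    "DRM sched": [71548, 71720, 70871],
    "DRM dep (no opt flags)": [66233, 68389, 67692],
    "DRM dep + bypass": [56788, 56388, 56167],
    "DRM dep + IRQ-safe": [60305, 59401, 59381],
    "DRM dep + bypass + IRQ-safe": [46934, 47691, 47129],
}


def reduction_vs_baseline(samples, baseline="DRM sched"):
    """Percent reduction of each configuration's mean relative to the baseline mean."""
    base = mean(samples[baseline])
    return {
        name: 100.0 * (base - mean(runs)) / base
        for name, runs in samples.items()
    }


if __name__ == "__main__":
    for name, pct in reduction_vs_baseline(context_switches).items():
        avg = mean(context_switches[name])
        print(f"{name:30s} mean={avg:9.0f}  reduction={pct:5.1f}%")
```

The same helper works for any other counter (cycles, Joules) by swapping in
the corresponding per-run values.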

I’d say this test case best models something like scrolling on a phone
or using a laptop for non-GPU-intensive workloads where the screen still
needs to refresh.

I’ve also run more intensive benchmarks (glmark2 and Unigine Heaven).
The results are somewhat noisy between boots, but I think the same
conclusion holds.

Raw numbers (bit of a firehose):

DRM sched:
root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 20s glxgears
Running synchronized to the vertical refresh.  The framerate should be
approximately the same as the monitor refresh rate.
303 frames in 5.0 seconds = 60.565 FPS
300 frames in 5.0 seconds = 60.000 FPS
301 frames in 5.0 seconds = 60.001 FPS

 Performance counter stats for 'system wide':

            71,548        context-switches
             1,466        cpu-migrations
        320,440.96 msec   task-clock
     9,140,249,815        cpu_atom/cycles/
     9,140,253,058        cpu_core/cycles/
   <not supported>        cpu_atom/instructions/
     7,071,794,806        cpu_core/instructions/
            168.76 Joules power/energy-pkg/
             57.78 Joules power/energy-cores/

      20.029126614 seconds time elapsed

root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 20s glxgears
Running synchronized to the vertical refresh.  The framerate should be
approximately the same as the monitor refresh rate.
304 frames in 5.0 seconds = 60.642 FPS
300 frames in 5.0 seconds = 59.988 FPS
301 frames in 5.0 seconds = 60.001 FPS

 Performance counter stats for 'system wide':

            71,720        context-switches
             1,581        cpu-migrations
        320,530.64 msec   task-clock
     8,990,313,521        cpu_atom/cycles/
     8,990,315,400        cpu_core/cycles/
   <not supported>        cpu_atom/instructions/
     6,988,827,285        cpu_core/instructions/
            172.15 Joules power/energy-pkg/
             58.33 Joules power/energy-cores/

      20.034862844 seconds time elapsed

root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 20s glxgears
Running synchronized to the vertical refresh.  The framerate should be
approximately the same as the monitor refresh rate.
304 frames in 5.0 seconds = 60.741 FPS
299 frames in 5.0 seconds = 59.798 FPS
299 frames in 5.0 seconds = 59.799 FPS

 Performance counter stats for 'system wide':

            70,871        context-switches
             1,980        cpu-migrations
        320,558.82 msec   task-clock
     8,861,481,467        cpu_atom/cycles/
     8,861,485,448        cpu_core/cycles/
   <not supported>        cpu_atom/instructions/
     6,665,294,516        cpu_core/instructions/
            167.82 Joules power/energy-pkg/
             56.97 Joules power/energy-cores/

      20.035713155 seconds time elapsed

root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 10 weston-simple-egl -f
Using config: r8g8b8a8
has EGL_EXT_buffer_age and EGL_EXT_swap_buffers_with_damage
has EGL_EXT_surface_compression
299 frames in 5 seconds: 59.799999 fps

 Performance counter stats for 'system wide':

            27,398        context-switches
               678        cpu-migrations
        160,255.17 msec   task-clock
     5,002,546,782        cpu_atom/cycles/
     5,002,549,920        cpu_core/cycles/
   <not supported>        cpu_atom/instructions/
     3,498,672,077        cpu_core/instructions/
             93.41 Joules power/energy-pkg/
             23.91 Joules power/energy-cores/

      10.017552274 seconds time elapsed

root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 10 weston-simple-egl -f
Using config: r8g8b8a8
has EGL_EXT_buffer_age and EGL_EXT_swap_buffers_with_damage
has EGL_EXT_surface_compression
300 frames in 5 seconds: 60.000000 fps

 Performance counter stats for 'system wide':

            27,322        context-switches
               580        cpu-migrations
        160,307.12 msec   task-clock
     4,783,734,059        cpu_atom/cycles/
     4,783,737,645        cpu_core/cycles/
   <not supported>        cpu_atom/instructions/
     3,224,510,206        cpu_core/instructions/
             91.89 Joules power/energy-pkg/
             23.28 Joules power/energy-cores/

      10.020629190 seconds time elapsed

root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 10 weston-simple-egl -f
Using config: r8g8b8a8
has EGL_EXT_buffer_age and EGL_EXT_swap_buffers_with_damage
has EGL_EXT_surface_compression
300 frames in 5 seconds: 60.000000 fps

 Performance counter stats for 'system wide':

            27,356        context-switches
               573        cpu-migrations
        160,362.30 msec   task-clock
     5,112,653,847        cpu_atom/cycles/
     5,112,658,503        cpu_core/cycles/
   <not supported>        cpu_atom/instructions/
     3,395,873,668        cpu_core/instructions/
             94.40 Joules power/energy-pkg/
             24.58 Joules power/energy-cores/

      10.023979647 seconds time elapsed

No opt (drm_dep_queue_flags = 0):
root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 20s glxgears
Running synchronized to the vertical refresh.  The framerate should be
approximately the same as the monitor refresh rate.
303 frames in 5.0 seconds = 60.597 FPS
300 frames in 5.0 seconds = 59.989 FPS
297 frames in 5.0 seconds = 59.232 FPS

 Performance counter stats for 'system wide':

            66,233        context-switches
             1,820        cpu-migrations
        320,586.39 msec   task-clock
     9,028,164,726        cpu_atom/cycles/
     9,028,178,052        cpu_core/cycles/
   <not supported>        cpu_atom/instructions/
     6,541,478,243        cpu_core/instructions/
            178.47 Joules power/energy-pkg/
             44.18 Joules power/energy-cores/

      20.036849235 seconds time elapsed

root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 20s glxgears
Running synchronized to the vertical refresh.  The framerate should be
approximately the same as the monitor refresh rate.
304 frames in 5.0 seconds = 60.691 FPS
297 frames in 5.0 seconds = 59.393 FPS
300 frames in 5.0 seconds = 59.803 FPS

 Performance counter stats for 'system wide':

            68,389        context-switches
             2,034        cpu-migrations
        320,457.18 msec   task-clock
     8,736,092,056        cpu_atom/cycles/
     8,736,096,958        cpu_core/cycles/
   <not supported>        cpu_atom/instructions/
     6,511,630,145        cpu_core/instructions/
            183.23 Joules power/energy-pkg/
             47.43 Joules power/energy-cores/

      20.031469459 seconds time elapsed

root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 20s glxgears
Running synchronized to the vertical refresh.  The framerate should be
approximately the same as the monitor refresh rate.
303 frames in 5.0 seconds = 60.458 FPS
299 frames in 5.0 seconds = 59.606 FPS
298 frames in 5.0 seconds = 59.590 FPS

 Performance counter stats for 'system wide':

            67,692        context-switches
             1,877        cpu-migrations
        320,524.05 msec   task-clock
     8,837,946,224        cpu_atom/cycles/
     8,837,949,628        cpu_core/cycles/
   <not supported>        cpu_atom/instructions/
     6,018,812,170        cpu_core/instructions/
            187.63 Joules power/energy-pkg/
             46.76 Joules power/energy-cores/

      20.034428856 seconds time elapsed

root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 10 weston-simple-egl -f
Using config: r8g8b8a8
has EGL_EXT_buffer_age and EGL_EXT_swap_buffers_with_damage
has EGL_EXT_surface_compression
299 frames in 5 seconds: 59.799999 fps

 Performance counter stats for 'system wide':

            27,259        context-switches
               313        cpu-migrations
        160,538.29 msec   task-clock
     5,079,653,975        cpu_atom/cycles/
     5,079,657,432        cpu_core/cycles/
   <not supported>        cpu_atom/instructions/
     3,166,877,411        cpu_core/instructions/
             90.72 Joules power/energy-pkg/
             21.70 Joules power/energy-cores/

      10.034716719 seconds time elapsed

root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 10 weston-simple-egl -f
Using config: r8g8b8a8
has EGL_EXT_buffer_age and EGL_EXT_swap_buffers_with_damage
has EGL_EXT_surface_compression
300 frames in 5 seconds: 60.000000 fps

 Performance counter stats for 'system wide':

            26,933        context-switches
               449        cpu-migrations
        160,334.74 msec   task-clock
     4,851,027,105        cpu_atom/cycles/
     4,851,054,678        cpu_core/cycles/
   <not supported>        cpu_atom/instructions/
     3,042,177,215        cpu_core/instructions/
             87.33 Joules power/energy-pkg/
             21.85 Joules power/energy-cores/

      10.021873082 seconds time elapsed

root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 10 weston-simple-egl -f
Using config: r8g8b8a8
has EGL_EXT_buffer_age and EGL_EXT_swap_buffers_with_damage
has EGL_EXT_surface_compression
299 frames in 5 seconds: 59.799999 fps

 Performance counter stats for 'system wide':

            27,101        context-switches
               351        cpu-migrations
        160,333.98 msec   task-clock
     4,903,047,240        cpu_atom/cycles/
     4,903,055,111        cpu_core/cycles/
   <not supported>        cpu_atom/instructions/
     2,884,284,727        cpu_core/instructions/
             87.68 Joules power/energy-pkg/
             21.36 Joules power/energy-cores/

      10.021938190 seconds time elapsed

Bypass (drm_dep_queue_flags = DRM_DEP_QUEUE_FLAGS_BYPASS_SUPPORTED):
root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 20s glxgears
Running synchronized to the vertical refresh.  The framerate should be
approximately the same as the monitor refresh rate.
304 frames in 5.0 seconds = 60.718 FPS
299 frames in 5.0 seconds = 59.615 FPS
299 frames in 5.0 seconds = 59.795 FPS

 Performance counter stats for 'system wide':

            56,788        context-switches
             2,576        cpu-migrations
        320,610.02 msec   task-clock
     9,056,383,522        cpu_atom/cycles/
     9,056,385,629        cpu_core/cycles/
   <not supported>        cpu_atom/instructions/
     6,285,652,796        cpu_core/instructions/
            164.29 Joules power/energy-pkg/
             44.70 Joules power/energy-cores/

      20.041318795 seconds time elapsed

root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 20s glxgears
Running synchronized to the vertical refresh.  The framerate should be
approximately the same as the monitor refresh rate.
304 frames in 5.0 seconds = 60.734 FPS
300 frames in 5.0 seconds = 59.983 FPS
300 frames in 5.0 seconds = 60.000 FPS

 Performance counter stats for 'system wide':

            56,388        context-switches
             2,326        cpu-migrations
        320,581.07 msec   task-clock
     8,789,215,827        cpu_atom/cycles/
     8,789,217,484        cpu_core/cycles/
   <not supported>        cpu_atom/instructions/
     6,251,346,200        cpu_core/instructions/
            162.67 Joules power/energy-pkg/
             44.30 Joules power/energy-cores/

      20.037648324 seconds time elapsed

root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 20s glxgears
Running synchronized to the vertical refresh.  The framerate should be
approximately the same as the monitor refresh rate.
305 frames in 5.0 seconds = 60.950 FPS
300 frames in 5.0 seconds = 59.993 FPS
300 frames in 5.0 seconds = 59.806 FPS

 Performance counter stats for 'system wide':

            56,167        context-switches
             2,434        cpu-migrations
        320,594.69 msec   task-clock
     8,700,873,664        cpu_atom/cycles/
     8,700,877,150        cpu_core/cycles/
   <not supported>        cpu_atom/instructions/
     6,405,556,662        cpu_core/instructions/
            162.55 Joules power/energy-pkg/
             43.33 Joules power/energy-cores/

      20.038448851 seconds time elapsed

root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 10 weston-simple-egl -f
Using config: r8g8b8a8
has EGL_EXT_buffer_age and EGL_EXT_swap_buffers_with_damage
has EGL_EXT_surface_compression
299 frames in 5 seconds: 59.799999 fps

 Performance counter stats for 'system wide':

            24,747        context-switches
             1,254        cpu-migrations
        160,543.42 msec   task-clock
     5,047,832,024        cpu_atom/cycles/
     5,047,823,996        cpu_core/cycles/
   <not supported>        cpu_atom/instructions/
     3,124,591,155        cpu_core/instructions/
             80.28 Joules power/energy-pkg/
             21.49 Joules power/energy-cores/

      10.034654628 seconds time elapsed

root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 10 weston-simple-egl -f
Using config: r8g8b8a8
has EGL_EXT_buffer_age and EGL_EXT_swap_buffers_with_damage
has EGL_EXT_surface_compression
300 frames in 5 seconds: 60.000000 fps

 Performance counter stats for 'system wide':

            24,953        context-switches
               921        cpu-migrations
        160,375.32 msec   task-clock
     5,197,283,835        cpu_atom/cycles/
     5,197,287,623        cpu_core/cycles/
   <not supported>        cpu_atom/instructions/
     3,393,363,950        cpu_core/instructions/
             83.36 Joules power/energy-pkg/
             21.92 Joules power/energy-cores/

      10.024899366 seconds time elapsed

root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 10 weston-simple-egl -f
Using config: r8g8b8a8
has EGL_EXT_buffer_age and EGL_EXT_swap_buffers_with_damage
has EGL_EXT_surface_compression
298 frames in 5 seconds: 59.599998 fps

 Performance counter stats for 'system wide':

            24,576        context-switches
               966        cpu-migrations
        160,339.37 msec   task-clock
     4,915,705,971        cpu_atom/cycles/
     4,915,709,503        cpu_core/cycles/
   <not supported>        cpu_atom/instructions/
     2,968,947,722        cpu_core/instructions/
             79.96 Joules power/energy-pkg/
             21.08 Joules power/energy-cores/

      10.022743041 seconds time elapsed

IRQ (drm_dep_queue_flags = DRM_DEP_QUEUE_FLAGS_JOB_PUT_IRQ_SAFE):
root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 20s glxgears
Running synchronized to the vertical refresh.  The framerate should be
approximately the same as the monitor refresh rate.
304 frames in 5.0 seconds = 60.643 FPS
298 frames in 5.0 seconds = 59.599 FPS
295 frames in 5.0 seconds = 58.998 FPS

 Performance counter stats for 'system wide':

            60,305        context-switches
             1,994        cpu-migrations
        320,528.79 msec   task-clock
     8,518,549,937        cpu_atom/cycles/
     8,518,573,906        cpu_core/cycles/
   <not supported>        cpu_atom/instructions/
     5,813,890,066        cpu_core/instructions/
            184.52 Joules power/energy-pkg/
             40.79 Joules power/energy-cores/

      20.032795872 seconds time elapsed

root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 20s glxgears
Running synchronized to the vertical refresh.  The framerate should be
approximately the same as the monitor refresh rate.
304 frames in 5.0 seconds = 60.759 FPS
299 frames in 5.0 seconds = 59.790 FPS
301 frames in 5.0 seconds = 60.003 FPS

 Performance counter stats for 'system wide':

            59,401        context-switches
             2,256        cpu-migrations
        320,475.03 msec   task-clock
     8,581,759,828        cpu_atom/cycles/
     8,581,763,986        cpu_core/cycles/
   <not supported>        cpu_atom/instructions/
     6,748,269,548        cpu_core/instructions/
            179.76 Joules power/energy-pkg/
             40.66 Joules power/energy-cores/

      20.029861532 seconds time elapsed

root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 20s glxgears
Running synchronized to the vertical refresh.  The framerate should be
approximately the same as the monitor refresh rate.
304 frames in 5.0 seconds = 60.653 FPS
298 frames in 5.0 seconds = 59.404 FPS
300 frames in 5.0 seconds = 59.990 FPS

 Performance counter stats for 'system wide':

            59,381        context-switches
             1,800        cpu-migrations
        320,616.35 msec   task-clock
     8,829,473,025        cpu_atom/cycles/
     8,829,477,019        cpu_core/cycles/
   <not supported>        cpu_atom/instructions/
     6,505,926,710        cpu_core/instructions/
            180.38 Joules power/energy-pkg/
             40.86 Joules power/energy-cores/

      20.040016190 seconds time elapsed

root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 10 weston-simple-egl -f
Using config: r8g8b8a8
has EGL_EXT_buffer_age and EGL_EXT_swap_buffers_with_damage
has EGL_EXT_surface_compression
298 frames in 5 seconds: 59.599998 fps

 Performance counter stats for 'system wide':

            27,341        context-switches
               786        cpu-migrations
        160,478.01 msec   task-clock
     4,681,440,843        cpu_atom/cycles/
     4,681,443,905        cpu_core/cycles/
   <not supported>        cpu_atom/instructions/
     2,969,039,615        cpu_core/instructions/
             91.74 Joules power/energy-pkg/
             20.84 Joules power/energy-cores/

      10.031116623 seconds time elapsed

root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 10 weston-simple-egl -f
Using config: r8g8b8a8
has EGL_EXT_buffer_age and EGL_EXT_swap_buffers_with_damage
has EGL_EXT_surface_compression
299 frames in 5 seconds: 59.799999 fps

 Performance counter stats for 'system wide':

            24,626        context-switches
               429        cpu-migrations
        160,367.44 msec   task-clock
     4,828,015,355        cpu_atom/cycles/
     4,828,019,887        cpu_core/cycles/
   <not supported>        cpu_atom/instructions/
     2,675,419,833        cpu_core/instructions/
             90.35 Joules power/energy-pkg/
             21.10 Joules power/energy-cores/

      10.024476921 seconds time elapsed

root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 10 weston-simple-egl -f
Using config: r8g8b8a8
has EGL_EXT_buffer_age and EGL_EXT_swap_buffers_with_damage
has EGL_EXT_surface_compression
300 frames in 5 seconds: 60.000000 fps

 Performance counter stats for 'system wide':

            24,679        context-switches
               340        cpu-migrations
        160,303.90 msec   task-clock
     4,500,129,961        cpu_atom/cycles/
     4,500,132,697        cpu_core/cycles/
   <not supported>        cpu_atom/instructions/
     2,766,150,592        cpu_core/instructions/
             88.01 Joules power/energy-pkg/
             19.76 Joules power/energy-cores/

      10.019653353 seconds time elapsed

IRQ plus bypass (drm_dep_queue_flags = DRM_DEP_QUEUE_FLAGS_BYPASS_SUPPORTED | 
DRM_DEP_QUEUE_FLAGS_JOB_PUT_IRQ_SAFE):
root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 20s glxgears
Running synchronized to the vertical refresh.  The framerate should be
approximately the same as the monitor refresh rate.
305 frames in 5.0 seconds = 60.958 FPS
299 frames in 5.0 seconds = 59.607 FPS
299 frames in 5.0 seconds = 59.603 FPS

 Performance counter stats for 'system wide':

            46,934        context-switches
             1,558        cpu-migrations
        320,569.83 msec   task-clock
     7,976,414,449        cpu_atom/cycles/
     7,976,417,934        cpu_core/cycles/
   <not supported>        cpu_atom/instructions/
     6,126,973,947        cpu_core/instructions/
            178.36 Joules power/energy-pkg/
             40.10 Joules power/energy-cores/

      20.037681420 seconds time elapsed

root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 20s glxgears
Running synchronized to the vertical refresh.  The framerate should be
approximately the same as the monitor refresh rate.
304 frames in 5.0 seconds = 60.696 FPS
299 frames in 5.0 seconds = 59.616 FPS
299 frames in 5.0 seconds = 59.781 FPS

 Performance counter stats for 'system wide':

            47,691        context-switches
             1,994        cpu-migrations
        320,602.83 msec   task-clock
     8,270,567,663        cpu_atom/cycles/
     8,270,572,484        cpu_core/cycles/
   <not supported>        cpu_atom/instructions/
     4,361,204,861        cpu_core/instructions/
            181.56 Joules power/energy-pkg/
             40.16 Joules power/energy-cores/

      20.038511163 seconds time elapsed

root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 20s glxgears
Running synchronized to the vertical refresh.  The framerate should be
approximately the same as the monitor refresh rate.
305 frames in 5.0 seconds = 60.911 FPS
298 frames in 5.0 seconds = 59.597 FPS
300 frames in 5.0 seconds = 59.803 FPS

 Performance counter stats for 'system wide':

            47,129        context-switches
             1,921        cpu-migrations
        320,491.09 msec   task-clock
     8,054,513,204        cpu_atom/cycles/
     8,054,518,711        cpu_core/cycles/
   <not supported>        cpu_atom/instructions/
     6,131,796,639        cpu_core/instructions/
            178.54 Joules power/energy-pkg/
             40.08 Joules power/energy-cores/

      20.032444923 seconds time elapsed

root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 10 weston-simple-egl -f
Using config: r8g8b8a8
has EGL_EXT_buffer_age and EGL_EXT_swap_buffers_with_damage
has EGL_EXT_surface_compression
299 frames in 5 seconds: 59.799999 fps

 Performance counter stats for 'system wide':

            21,991        context-switches
               286        cpu-migrations
        160,343.73 msec   task-clock
     4,497,475,288        cpu_atom/cycles/
     4,497,477,011        cpu_core/cycles/
   <not supported>        cpu_atom/instructions/
     3,042,007,163        cpu_core/instructions/
             89.14 Joules power/energy-pkg/
             20.09 Joules power/energy-cores/

      10.021642254 seconds time elapsed

root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 10 weston-simple-egl -f
Using config: r8g8b8a8
has EGL_EXT_buffer_age and EGL_EXT_swap_buffers_with_damage
has EGL_EXT_surface_compression
300 frames in 5 seconds: 60.000000 fps

 Performance counter stats for 'system wide':

            22,366        context-switches
               225        cpu-migrations
        160,386.68 msec   task-clock
     4,398,432,348        cpu_atom/cycles/
     4,398,435,205        cpu_core/cycles/
   <not supported>        cpu_atom/instructions/
     3,086,156,274        cpu_core/instructions/
             89.07 Joules power/energy-pkg/
             19.68 Joules power/energy-cores/

      10.024827902 seconds time elapsed

root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 10 weston-simple-egl -f
Using config: r8g8b8a8
has EGL_EXT_buffer_age and EGL_EXT_swap_buffers_with_damage
has EGL_EXT_surface_compression
299 frames in 5 seconds: 59.799999 fps

 Performance counter stats for 'system wide':

            22,515        context-switches
               286        cpu-migrations
        160,481.91 msec   task-clock
     4,447,740,222        cpu_atom/cycles/
     4,447,743,314        cpu_core/cycles/
   <not supported>        cpu_atom/instructions/
     3,217,285,071        cpu_core/instructions/
             90.15 Joules power/energy-pkg/
             19.65 Joules power/energy-cores/

      10.029135743 seconds time elapsed
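
For anyone who wants to aggregate logs like the ones above rather than
eyeball them, here is a minimal Python sketch. It assumes the perf stat
output for a single configuration and workload has been saved as plain
text; the regex and function name are illustrative only and are not part
of perf.

```python
import re
import sys
from collections import defaultdict
from statistics import mean

# Matches counter lines such as
#   "71,548        context-switches"
#   "320,440.96 msec   task-clock"
#   "168.76 Joules power/energy-pkg/"
# "<not supported>" lines, FPS lines and "seconds time elapsed" lines
# do not match and are skipped.
COUNTER_RE = re.compile(
    r"^\s*([\d,]+(?:\.\d+)?)\s+(?:msec\s+|Joules\s+)?(\S+)\s*$"
)


def parse_perf_stat(text):
    """Return {event: [value, ...]} with one value per perf stat run in text."""
    values = defaultdict(list)
    for line in text.splitlines():
        match = COUNTER_RE.match(line)
        if match:
            number = float(match.group(1).replace(",", ""))
            values[match.group(2)].append(number)
    return dict(values)


if __name__ == "__main__":
    # Usage (hypothetical file name): python3 summarize.py < drm_sched_glxgears.log
    for event, runs in parse_perf_stat(sys.stdin.read()).items():
        print(f"{event:28s} runs={len(runs)}  mean={mean(runs):,.2f}")
```

Feeding it one log per configuration keeps runs of different setups from
being averaged together.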

Matt

> > I'm a bit surprised by the difference in number of context switches
> > given I'd expect the local-CPU to be picked in priority, and so queuing
> > work items on the same wq from another work item to be almost free in
> > terms of scheduling. But I guess there's some load-balancing happening
> > when you execute jobs at such a high rate.
> > 
> > Also, I don't know if that's just noise or if it's reproducible, but
> > task-clock seems to be ~40 msec lower with the deferred cleanup and
> > no-bypass (higher throughput because you're not blocking the dequeuing
> > of the next job on the cleanup of the previous one, I suspect).
> 
> I think that is just noise of what the test is doing in user space -
> that bounces around a bit.
> 
> Matt
> 
> > 
