On 22/10/15 00:43, Rowley, Timothy O wrote:

On Oct 20, 2015, at 5:58 PM, Jose Fonseca <jfons...@vmware.com> wrote:

Thanks for the explanations.  It's closer now, but still a bit of gap:

$ KNOB_MAX_THREADS_PER_CORE=0 ./gloss
SWR create screen!
This processor supports AVX2.
--> numThreads = 3
1102 frames in 5.002 seconds = 220.312 FPS
1133 frames in 5.001 seconds = 226.555 FPS
1130 frames in 5.002 seconds = 225.91 FPS
^C
$ GALLIUM_DRIVER=llvmpipe LP_NUM_THREADS=2 ./gloss
1456 frames in 5 seconds = 291.2 FPS
1617 frames in 5.003 seconds = 323.206 FPS
1571 frames in 5.002 seconds = 314.074 FPS

A bit more of an apples to apples comparison might be single-threaded llvmpipe 
(LP_NUM_THREADS=1) and single-threaded swr (KNOB_SINGLE_THREADED=1).  Running 
gloss and glxgears (another favorite “benchmark” :) ) under these conditions 
shows swr running a bit slower, though a little closer than your numbers.


Indeed that seems a better comparison.

$ KNOB_SINGLE_THREADED=1 ./gloss
SWR create screen!
This processor supports AVX2.
733 frames in 5.003 seconds = 146.512 FPS
787 frames in 5.004 seconds = 157.274 FPS
793 frames in 5.005 seconds = 158.442 FPS
799 frames in 5.001 seconds = 159.768 FPS
787 frames in 5.005 seconds = 157.243 FPS
$ GALLIUM_DRIVER=llvmpipe LP_NUM_THREADS=0 ./gloss
939 frames in 5.002 seconds = 187.725 FPS
1032 frames in 5.001 seconds = 206.359 FPS
1017 frames in 5.002 seconds = 203.319 FPS
1021 frames in 5 seconds = 204.2 FPS
1039 frames in 5.002 seconds = 207.717 FPS
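
For anyone wanting to put a single number on the gap, here is a minimal sketch that averages the FPS lines from the two single-threaded runs above and reports the swr/llvmpipe ratio. The helper name `mean_fps` and the regular expression are illustrative assumptions, not part of either driver or of the demos; the log text is copied from the output above. With these numbers the ratio comes out at roughly 0.77.

```python
import re
from statistics import mean

# Matches lines such as "733 frames in 5.003 seconds = 146.512 FPS".
FPS_LINE = re.compile(r"(\d+) frames in ([\d.]+) seconds = ([\d.]+) FPS")


def mean_fps(log: str) -> float:
    """Average the reported FPS over every matching line in a log (hypothetical helper)."""
    rates = [float(m.group(3)) for m in FPS_LINE.finditer(log)]
    if not rates:
        raise ValueError("no FPS lines found")
    return mean(rates)


# Output of: KNOB_SINGLE_THREADED=1 ./gloss
SWR_LOG = """\
733 frames in 5.003 seconds = 146.512 FPS
787 frames in 5.004 seconds = 157.274 FPS
793 frames in 5.005 seconds = 158.442 FPS
799 frames in 5.001 seconds = 159.768 FPS
787 frames in 5.005 seconds = 157.243 FPS
"""

# Output of: GALLIUM_DRIVER=llvmpipe LP_NUM_THREADS=0 ./gloss
LLVMPIPE_LOG = """\
939 frames in 5.002 seconds = 187.725 FPS
1032 frames in 5.001 seconds = 206.359 FPS
1017 frames in 5.002 seconds = 203.319 FPS
1021 frames in 5 seconds = 204.2 FPS
1039 frames in 5.002 seconds = 207.717 FPS
"""

swr = mean_fps(SWR_LOG)
llvmpipe = mean_fps(LLVMPIPE_LOG)
ratio = swr / llvmpipe
print(f"swr {swr:.1f} FPS, llvmpipe {llvmpipe:.1f} FPS, ratio {ratio:.2f}")
```

The first sample of each run is noticeably lower than the rest (warm-up), so dropping it before averaging would be a reasonable variation.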

> Examining performance traces, we think swr’s concept of hot-tiles, the working memory representation of the render target, and the associated load/store functions contribute to most of the difference. We might be able to optimize those conversions; additionally fast clear would help these demos. For larger workloads this small per-frame cost doesn’t really affect the performance.

These initial observations from you and others regarding performance have been 
interesting.  Our performance work has been with large workloads on high core 
count configurations.  There, some of the decisions, such as a dedicated core 
for the application/API, might have cost performance a bit, but the percentage 
is much less than on dual- and quad-core processors.  We’ll look into some 
changes/tuning that will benefit both extremes, though we might have to end up 
conceding that llvmpipe will be faster at glxgears. :-)

I don't care for gears (it practically measures present/blit rate), but gloss, despite being simple, is sensitive to texturing performance.

Final thoughts: I understand this project has its own history, but I echo what 
Roland said -- it would be nice to unify with llvmpipe at one point, in some 
way or fashion.  Our (VMware's) focus has been desktop composition, but there's 
no reason why a single SW renderer can't satisfy both ends of the spectrum, 
especially for JIT-enabled renderers, since they can emit at runtime the code 
most suited for the workload.

We would be happy for someone to take some of the ideas from swr to speed up 
llvmpipe, but for now our development will continue on the swr core and driver. 
 We’re not planning on replacing llvmpipe - its intent of working on any 
architecture is admirable.  In the ideal world the solution would be something 
that combines the best traits of both rasterizers, but at this point the 
shortest path to having a performant solution for our customers is with swr.

Fair enough.

They do share a lot already: Mesa, the gallium state tracker, and gallivm. If further development in openswr is planned, it might require jumping through a few hoops, but I think it's worth figuring out what it would take to get this merged into master so that, whenever there are interface changes, openswr won't get the short end of the stick.

That said, it's really nice seeing Mesa and Gallium enabling this sort of 
experimentation with SW rendering.

Yes, we were quite happy with how fast we were able to get a new driver 
functioning with gallium.  The major thing slowing us was the documentation, 
which is not uniform in coverage.  There was a lot of reading other drivers’ 
source to figure out how things were supposed to work.

Yes, that's a fair comment.

Jose
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
