Hi,
In the previous status update I said that the r600-sb branch is not ready to
be merged yet, but recently I've done some cleanups and reworks, and though
I haven't finished everything that I planned initially, I think now it's in
a better state and may be considered for merging.
I'm interested to know if the people think that merging of the r600-sb
branch makes sense at all. I'll try to explain here why it makes sense to
me.
Although I understand that the development of llvm backend is a primary goal
for the r600g developers, it's a complicated process and may require quite
some time to achieve good results regarding the shader/compiler performance,
and at the same time this branch already works and provides good results in
many cases. That's why I think it makes sense to merge this branch as a
non-default backend at least as a temporary solution for shader performance
problems. We can always get rid of it if it becomes too much a maintenance
burden or when llvm backend catches up in terms of shader performance and
compilation speed/overhead.
Regarding the support and maintenance of this code, I'll try to do my best
to fix possible issues, and so far there are no known unfixed issues. I
tested it with many apps on evergreen and fixed all issues with other chips
that were reported to me on the list or privately after the last status
announce. There are no piglit regressions on evergreen when this branch is
used with both default and llvm backends.
This code was intentionally separated as much as possible from the other
parts of the driver, basically there are just two functions used from r600g,
and the shader code is passed to/from r600-sb as a hardware bytecode that is
not going to change. I think it won't require any modifications at all to
keep it in sync with the most changes in r600g.
Some work might be required though if we'll want to add support for the new
hw features that are currently unused, e.g. geometry shaders, new
instruction types for compute shaders, etc, but I think I'll be able to
catch up when it's implemented in the driver and default or llvm backend.
E.g. this branch already works for me on evergreen with some simple OpenCL
kernels, including bfgminer where it increases performance of the kernel
compiled with llvm backend by more than 20% for me.
Besides the performance benefits, I think that alternative backend also
might help with debugging of the default or llvm backend, in some cases it
helped me by exposing the bugs that are not very obvious otherwise, e.g. it
may be hard to compare the dumps from default and llvm backend to spot the
regression because they are too different, but after processing both shaders
with r600-sb the code is usually transformed to some more common form, and
often this makes it easier to compare and find the differences in shader
logic.
One additional feature that might help with llvm backend debugging is the
disassembler that works on the hardware bytecode instead of the internal
r600g bytecode structs. This results in the more readable shader dumps for
instructions passed in native hw encoding from llvm backend. I think this
also can help to catch more potential bugs related to bytecode building in
r600g/llvm. Currently r600-sb uses its bytecode disassembler for all shader
dumps, including the fetch shaders, even when optimization is not enabled.
Basically it can replace r600_bytecode_disasm and related code completely.
Below are some quick benchmarks for shader performance and compilation time,
to demonstrate that currently r600-sb might provide better performance for
users, at least in some cases.
As an example of the shaders with good optimization opportunities I used the
application that computes and renders atmospheric scattering effects, it was
mentioned in the previous thread:
http://lists.freedesktop.org/archives/mesa-dev/2013-February/034682.html
Here are current results for that app (Main.noprecompute, frames per second)
with default backend, default backend + r600-sb, and llvm backend:
def def+sb llvm
240 590 248
Another quick benchmark is an OpenCL kernel performance with bfgminer
(megahash/s):
llvm llvm+sb
68 87
One more benchmark is for compilation speed/overhead - I used two piglit
tests, first compiles a lot of shaders (IIRC more than thousand), second
compiles a few huge shaders. Result is a test run time in seconds, this
includes not only the compilation time but anyway shows the difference:
def def+sb llvm
tfb max-varyings 10 14 53
fp-long-alu 0.17 0.38 0.68
This is especially important for GL apps, because longer compilation time
results in the more significant freezes in the games etc. As for the quality
of the compiled code in this test, of course generally llvm backend is
already able to produce better code in some cases, but e.g. for the longest
shader from the fp-long-alu test both backends optimize it to the two alu
instructions.
Of course this branch won't magically make all applications faster, many
older apps are not really limited by the shader performance at all, but I
think it might improve performance for many relatively modern
applications/engines, e.g. for the applications based on the Unigine and
Source engines.
The branch itself can be found here:
http://cgit.freedesktop.org/~vadimg/mesa/log/?h=r600-sb
You might prefer to browse new files in a tree instead of reading a huge
patch:
http://cgit.freedesktop.org/~vadimg/mesa/tree/src/gallium/drivers/r600/sb?h=r600-sb
If you'd like to test it, currently the optimization for GL shaders is
enabled by default, can be disabled with R600_SB=0. Optimization for compute
shaders is not enabled by default because it's still very limited and
experimental, can be enabled with R600_SB_CL=1. Disassemble of the optimized
shaders is printed with R600_DUMP_SHADERS=2.
If you think that merging of the branch makes sense, any
comments/suggestions about what is required to prepare the branch for
merging are welcome.
Vadim
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev