I got a lot of good feedback at SIGGRAPH, and a recurring item was a request to 
improve performance. I heard you loud and clear, and the first installment 
arrives today.

I just made simultaneous checkins to the master branches of OIIO and OSL that 
include a major rewrite of the TextureSystem internals to use SSE instructions. 
With this first stab at it, I have approximately doubled the performance for 
color texture lookups. How much performance improvement you'll see in a render 
is variable -- it depends on the mix of 1 vs 3 or 4 channel lookups, exactly 
what machine architecture you are using, whether or not you are asking for 
derivatives of texture results, and of course what percentage of time in your 
render is texture math to begin with. I think there's still more performance to 
squeeze out: this is just the first stab at it, and I have only done 2D 
texture() so far, with no work yet on environment() or texture3d(). More is 
coming. (That includes using similar technology to speed up ImageBufAlgo 
functions, maketx, oiiotool, and so on. This is really just the very first bit.)
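
To illustrate why 3 or 4 channel lookups stand to gain the most, here is a 
minimal sketch of a bilinear blend that handles all four channels of a texel 
with one SSE instruction per arithmetic step. It is not the actual 
TextureSystem code: Texel4 and bilerp4 are hypothetical names invented for this 
example, and a scalar fallback is included for builds without SSE2.

```cpp
#include <cassert>
#if defined(__SSE2__)
#  include <emmintrin.h>   // SSE/SSE2 intrinsics
#endif

// Hypothetical type: one 4-channel texel, 16-byte aligned so it can be
// loaded into an SSE register with an aligned load.
struct alignas(16) Texel4 {
    float c[4];
};

// Hypothetical helper: bilinearly blend four neighboring texels.
// s and t are the fractional offsets in [0,1] along each axis.
inline Texel4 bilerp4(const Texel4& t00, const Texel4& t10,
                      const Texel4& t01, const Texel4& t11,
                      float s, float t)
{
    Texel4 out;
#if defined(__SSE2__)
    __m128 a  = _mm_load_ps(t00.c);
    __m128 b  = _mm_load_ps(t10.c);
    __m128 c  = _mm_load_ps(t01.c);
    __m128 d  = _mm_load_ps(t11.c);
    __m128 vs = _mm_set1_ps(s);
    __m128 vt = _mm_set1_ps(t);
    // lerp(x, y, w) = x + w * (y - x), applied to all 4 channels at once
    __m128 top = _mm_add_ps(a, _mm_mul_ps(vs, _mm_sub_ps(b, a)));
    __m128 bot = _mm_add_ps(c, _mm_mul_ps(vs, _mm_sub_ps(d, c)));
    __m128 r   = _mm_add_ps(top, _mm_mul_ps(vt, _mm_sub_ps(bot, top)));
    _mm_store_ps(out.c, r);
#else
    // Scalar fallback: same math, one channel at a time
    for (int i = 0; i < 4; ++i) {
        float top = t00.c[i] + s * (t10.c[i] - t00.c[i]);
        float bot = t01.c[i] + s * (t11.c[i] - t01.c[i]);
        out.c[i]  = top + t * (bot - top);
    }
#endif
    return out;
}
```

A single-channel lookup would fill only one of the four lanes in a layout like 
this, which is one reason the gain depends on the channel mix.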

In the process of this work, I've changed some aspects of the TextureSystem 
API, and that's why there is a simultaneous change to OSL. It's a true 
compatibility break, so you'll need to upgrade both simultaneously if you are 
working from the master branches. Additionally, in the process, I changed OSL's 
RendererServices API, so you may need to make minor changes on the renderer 
side to match it. (If you're using release branches, i.e. OIIO <= 1.4 and OSL 
<= 1.5, you don't need to worry about this at all, until the next major 
releases are cut, probably not for a few months.)

I also tagged dev releases in both projects JUST BEFORE merging in the SIMD 
changes. So if you are on master and don't want to take on the risky SIMD 
changes until the rest of us have tested more extensively, or if you try it and 
discover problems, look for the tags that mention "pre-SIMD"; those contain the 
latest fixes before the big change.

Those of you building OSL and OIIO from source will probably want to build both 
packages using the make/cmake option USE_SIMD=arch, where arch can be sse2, 
sse3, sse4.1, or sse4.2. Building on x86_64 automatically implies sse2. If you 
want anything beyond SSE2, you'll need to specify the build flag, and be sure 
not to specify a level more advanced than the machines you are deploying on 
support -- this is entirely a build-time choice, and the code does not 
currently do any runtime adjustment based on the machine it finds itself 
running on.
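
As a rough illustration of what "build-time choice" means, the sketch below 
reports which SSE level a translation unit was compiled for, using the macros 
that GCC and Clang predefine when given flags such as -msse4.2. It assumes 
(this is not confirmed above) that USE_SIMD ends up passing flags of that kind 
to the compiler; compiled_simd_level is a hypothetical helper, not an OIIO or 
OSL function.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: name the highest SSE level this code was compiled
// for. The answer is fixed when the code is built; nothing here inspects
// the CPU it is actually running on.
inline const char* compiled_simd_level()
{
#if defined(__SSE4_2__)
    return "sse4.2";
#elif defined(__SSE4_1__)
    return "sse4.1";
#elif defined(__SSE3__)
    return "sse3";
#elif defined(__SSE2__)
    return "sse2";
#else
    return "none";
#endif
}
```

Because the level is baked in at compile time, a binary built for sse4.2 can 
contain instructions that an older CPU does not implement, and it will fail on 
such a machine rather than fall back to a slower path.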

The next step that I'm already working on, now that I've got some helper 
classes written and my brain attuned to thinking about vector ops, is to have 
OSL pad and align all the "triples" (colors, points, and so on) to be 4-float 
vectors and generate SSE code for all their manipulation (in the end, the 4th 
components will be discarded). I'm hoping to get a significant speedup in 
shader execution as a result, but it's too early to predict how much. 
Additionally, there are all sorts of other areas where these techniques can be 
applied, such as internally to the computation of the various noises, and so on.
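
The padding idea can be sketched as follows. This is only an illustration 
under assumed names (PaddedTriple and madd are hypothetical, and none of this 
is OSL's generated code): a triple is stored in four aligned floats, arithmetic 
runs on all four lanes, and the caller simply ignores the fourth.

```cpp
#include <cassert>
#if defined(__SSE2__)
#  include <emmintrin.h>   // SSE/SSE2 intrinsics
#endif

// Hypothetical padded triple: x, y, z plus one unused lane, aligned to
// 16 bytes so it maps directly onto one SSE register.
struct alignas(16) PaddedTriple {
    float v[4];   // v[3] is padding; its value is never meaningful
};

// Hypothetical helper: compute a + k * b on all lanes at once.
inline PaddedTriple madd(const PaddedTriple& a, const PaddedTriple& b, float k)
{
    PaddedTriple out;
#if defined(__SSE2__)
    __m128 va = _mm_load_ps(a.v);
    __m128 vb = _mm_load_ps(b.v);
    __m128 vk = _mm_set1_ps(k);
    _mm_store_ps(out.v, _mm_add_ps(va, _mm_mul_ps(vk, vb)));
#else
    // Scalar fallback for builds without SSE2
    for (int i = 0; i < 4; ++i)
        out.v[i] = a.v[i] + k * b.v[i];
#endif
    return out;
}
```

The cost of this layout is one extra float of storage per triple; the benefit 
is that each operation on a color or point becomes a single aligned vector 
instruction instead of three scalar ones.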

I've also received many requests for "batch shading" -- computing texture 
and/or OSL shading for many points at once, hoping to use SIMD or other ways to 
amortize computation across the batches. I have plans for that as well, but I 
want to squeeze as much as possible out of the vector instructions first.

So stay tuned: I expect a series of continued performance improvements in both 
packages to come at a steady pace over the next several weeks and months.

--
Larry Gritz
[email protected]



_______________________________________________
Oiio-dev mailing list
[email protected]
http://lists.openimageio.org/listinfo.cgi/oiio-dev-openimageio.org
