I got a lot of good feedback at SIGGRAPH, and a recurring item was a request to improve performance. I heard you loud and clear, and the first installment arrives today.
I just made simultaneous checkins to the master branches of OIIO and OSL that include a major rewrite of the TextureSystem internals to use SSE instructions. With this first stab at it, I have approximately doubled the performance of color texture lookups.

How much improvement you'll see in a render will vary -- it depends on the mix of 1-channel versus 3- or 4-channel lookups, exactly what machine architecture you are using, whether or not you are asking for derivatives of texture results, and of course what percentage of your render time is spent on texture math to begin with.

I think there's still more performance to squeeze out. This is just the first stab at it, and I have only done 2D texture() so far, with no work yet on environment() or texture3d(). More is coming, including using similar technology to speed up ImageBufAlgo functions, maketx, oiiotool, and so on. This is really just the very first bit.

In the process of this work, I've changed some aspects of the TextureSystem API, and that's why there is a simultaneous change to OSL. It's a true compatibility break, so you'll need to upgrade both at the same time if you are working from the master branches. I also changed OSL's RendererServices API along the way, so you may need to make minor changes on the renderer side to match it. (If you're using release branches, i.e. OIIO <= 1.4 and OSL <= 1.5, you don't need to worry about any of this until the next major releases are cut, which probably won't be for a few months.)

I also tagged dev releases in both projects JUST BEFORE merging in the SIMD changes. So if you are on master and don't want to take on the risky SIMD changes until the rest of us have tested them more extensively, or if you try them and discover problems, look for the tags that mention "pre-SIMD". Those will have the latest fixes before the big change.

If you are building OSL and OIIO from source, you'll probably want to build both packages using the make/cmake option USE_SIMD=arch.
The arch value for USE_SIMD can be sse2, sse3, sse4.1, or sse4.2. If you are building on x86_64, that automatically implies sse2. But if you want anything beyond SSE2, you'll need to specify the build flag, and be sure not to specify a level more advanced than the machines you are deploying on -- this is entirely a build-time choice, and there is currently no runtime adjustment based on the machine the code finds itself running on.

The next step, which I'm already working on now that I've got some helper classes written and my brain attuned to thinking about vector ops, is to have OSL pad and align all the "triples" (colors, points, and so on) to be 4-float vectors and generate SSE code for all their manipulation (in the end, the 4th components will be discarded). I'm hoping to get a significant speedup in shader execution as a result, but it's too early to predict how much. There are also all sorts of other areas where these techniques can be applied, such as the internal computation of the various noises.

I've also received many requests for "batch shading" -- computing texture and/or OSL shading for many points at once, in the hope of using SIMD or other ways to amortize computation across the batches. I have plans for that as well, but I want to squeeze as much as possible out of the vector instructions first.

So stay tuned. I expect a steady series of performance improvements in both packages over the next several weeks and months.

--
Larry Gritz
[email protected]
