Replies inline...

On 16 January 2016 at 07:52, Larry Gritz <[email protected]> wrote:
> I'm sorry for the long delay here, I got sidetracked for quite a while
> trying to unravel a site-specific problem -- in the process of trying to
> benchmark different OpenEXR versions, I found out that I was getting vastly
> different speeds even on the same exr version depending on whether I built
> libIlmImf myself or used the system libraries. It seems to have boiled down
> to compiler releases (gcc 4.4 vs gcc 4.8 vs clang -- the latter two make
> much faster code for some reason), so it's important to do these kinds of
> benchmarks making certain that you used the same toolchain for each option
> you're benchmarking.
>

I was using GCC 4.7.2 for both tests, and building everything (IlmImf as well
as OpenEXR) in both cases, not using system libs (the system hasn't got them
installed). I've just done a *very* rough single-threaded test of just opening
a tiled image once, and the speeds of 1.7 and 2.0 are close to identical, so
maybe the discrepancy I can see (I tested it again within the renderer) is due
to the usage profile there, with multiple threads interacting with other
renderer work...

> Anyway, the long and short of it is that I'm unable to replicate Peter's
> results. For me, OpenEXR 2.2 is not any slower than 1.7 in my benchmarks.
> If anything, 2.2 is slightly faster. The identical benchmark using tiled,
> MIP-mapped TIFF files is still about 15% faster than OpenEXR, even when I
> use the compiler versions that give the best exr results.
>
> So I'm still very eager to get suggestions for what to try next, and if
> anybody more familiar with OpenEXR internals is interested in taking a
> deeper look at why performance may not be what we hope.
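Since the compiler turned out to matter as much as the OpenEXR version, a benchmark harness can at least tag each timing with the toolchain that built it. A minimal sketch, assuming GCC/Clang/MSVC predefined macros (`__clang_version__`, `__VERSION__`, `_MSC_VER`); `toolchain_id` and `time_best_of` are hypothetical helper names, not part of OpenEXR or OIIO. Note it identifies only the compiler of the benchmark binary itself: the compiler that built libIlmImf still has to be recorded separately (e.g. from its build log).

```cpp
#include <chrono>
#include <functional>
#include <limits>
#include <string>

// Hypothetical helper: identify the compiler that built *this* binary.
// Clang also defines __GNUC__ and __VERSION__, so it is checked first.
std::string toolchain_id() {
#if defined(__clang__)
    return std::string("clang ") + __clang_version__;
#elif defined(__GNUC__)
    return std::string("gcc ") + __VERSION__;
#elif defined(_MSC_VER)
    return "msvc " + std::to_string(_MSC_VER);
#else
    return "unknown compiler";
#endif
}

// Hypothetical helper: run `work` `runs` times and return the fastest
// wall-clock time in seconds. Taking the minimum filters out some noise
// from other processes and cold caches.
double time_best_of(int runs, const std::function<void()>& work) {
    double best = std::numeric_limits<double>::infinity();
    for (int i = 0; i < runs; ++i) {
        auto start = std::chrono::steady_clock::now();
        work();
        std::chrono::duration<double> elapsed =
            std::chrono::steady_clock::now() - start;
        if (elapsed.count() < best) best = elapsed.count();
    }
    return best;
}
```

Printing `toolchain_id()` next to every result makes it harder to accidentally compare a gcc 4.4 build against a clang build.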
I've got no evidence this *is* the issue for EXR reading, but in terms of
performance, I've long suspected that the use of IOStreams within OpenEXR
might account for some performance penalty compared to raw fread()s: streams
in C++ are generally slower, and getting the buffering right for
high-performance work is tricky, especially cross-platform.

Also, reading and writing of values in OpenEXR goes through ImfXdr.h's
conversion routines, which do bit-shifting, I assume for endianness
conversion. I guess the x86 port of OpenEXR had to do this conversion, whereas
the SGI versions didn't, and we're stuck with it now?

On top of that, in the multi-threaded scenario, while using a LUT for
half->float conversion is faster than not using it, it causes absolute havoc
through L1/L2 cache thrashing. I've sometimes found reading full-float EXRs
from disk faster than reading half EXRs because of this, but that's probably
only when the OS disk cache already holds them, so in general it's not a huge
issue given the I/O savings that halves give in most real-world usage at big
facilities...

Cheers,
Peter
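One way to test the IOStreams suspicion in isolation is to read the same file through `std::ifstream` and through `std::fread` using the same chunk size, since per-call overhead shows up mostly with small reads. A minimal sketch using only the standard library; `read_ifstream` and `read_fread` are hypothetical names. Repeated runs will mostly hit the OS page cache, which is arguably the case where stream overhead is most visible. Checking this inside OpenEXR itself would mean supplying a custom `Imf::IStream` subclass, which is not shown here.

```cpp
#include <cstdio>
#include <fstream>
#include <string>
#include <vector>

// Read a whole file in `chunk`-byte pieces through std::ifstream.
// Returns the number of bytes read (0 if the file can't be opened).
std::size_t read_ifstream(const std::string& path, std::size_t chunk) {
    std::ifstream in(path, std::ios::binary);
    std::vector<char> buf(chunk);
    std::size_t total = 0;
    while (in) {
        in.read(buf.data(), static_cast<std::streamsize>(chunk));
        std::streamsize got = in.gcount();  // may be short at end of file
        if (got <= 0) break;
        total += static_cast<std::size_t>(got);
    }
    return total;
}

// Same access pattern through C stdio.
std::size_t read_fread(const std::string& path, std::size_t chunk) {
    std::FILE* f = std::fopen(path.c_str(), "rb");
    if (!f) return 0;
    std::vector<char> buf(chunk);
    std::size_t total = 0, got;
    while ((got = std::fread(buf.data(), 1, chunk, f)) > 0) total += got;
    std::fclose(f);
    return total;
}
```

Timing both with small chunks (a few bytes, like header fields) and large ones (whole tiles) separates per-call overhead from raw throughput.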
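For the endianness question: the EXR file format stores values little-endian, so a shift-based decoder like the one described for ImfXdr.h produces the same result as a plain memory copy on x86. The question is only whether the compiler collapses the shifts into a single load, which is worth checking in the generated assembly for each toolchain. A minimal sketch of the two styles; the function names are hypothetical and this is not OpenEXR's actual code.

```cpp
#include <cstdint>
#include <cstring>

// Portable little-endian decode: correct on any host byte order.
inline std::uint32_t load_le32_shift(const unsigned char* p) {
    return static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16)
         | (static_cast<std::uint32_t>(p[3]) << 24);
}

// Plain copy: gives the same value ONLY on little-endian hosts (e.g. x86),
// where the file's byte order already matches memory order.
inline std::uint32_t load_le32_memcpy(const unsigned char* p) {
    std::uint32_t v;
    std::memcpy(&v, p, sizeof v);
    return v;
}
```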
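On the half->float LUT: IlmBase's `half` converts through a table with one 32-bit float per possible 16-bit pattern, i.e. 65,536 x 4 bytes = 256 KiB, which is comparable to a typical per-core L2 cache. That is consistent with the thrashing described above. A table-free alternative is plain bit manipulation; the sketch below (hypothetical name `half_bits_to_float`) handles zeros, subnormals, infinities and NaNs. Whether it actually beats the LUT depends on the workload and should be measured. On x86 CPUs with the F16C extension the conversion can also be done in hardware (e.g. the `_cvtsh_ss` intrinsic), where available.

```cpp
#include <cstdint>
#include <cstring>

// Convert an IEEE 754 binary16 bit pattern to float without a lookup table.
inline float half_bits_to_float(std::uint16_t h) {
    std::uint32_t sign = static_cast<std::uint32_t>(h & 0x8000u) << 16;
    std::uint32_t exp  = (h >> 10) & 0x1Fu;   // 5-bit exponent, bias 15
    std::uint32_t mant = h & 0x3FFu;          // 10-bit mantissa
    std::uint32_t bits;
    if (exp == 0) {
        if (mant == 0) {
            bits = sign;                       // signed zero
        } else {
            // Subnormal half: shift until the leading 1 reaches the
            // implicit-bit position, counting the shifts in `e`.
            int e = -1;
            do { mant <<= 1; ++e; } while ((mant & 0x400u) == 0);
            bits = sign | (static_cast<std::uint32_t>(127 - 15 - e) << 23)
                        | ((mant & 0x3FFu) << 13);
        }
    } else if (exp == 0x1F) {
        bits = sign | 0x7F800000u | (mant << 13);               // inf / NaN
    } else {
        bits = sign | ((exp + (127 - 15)) << 23) | (mant << 13);  // normal
    }
    float f;
    std::memcpy(&f, &bits, sizeof f);  // reinterpret bits without UB
    return f;
}
```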
_______________________________________________ Oiio-dev mailing list [email protected] http://lists.openimageio.org/listinfo.cgi/oiio-dev-openimageio.org
