I think just one TextureSystem overall should be fine. I don't think there is
any advantage to having it be per-thread, and you *really* wouldn't want to
have any accident where a per-thread TS inadvertently ended up with a separate
ImageCache per thread.
A bunch of suggestions, in no particular order, because I don't know how many
you are already doing:
Be sure you are preprocessing all your textures with maketx so that they are
tiled and MIP-mapped. That's definitely better than forcing it to emulate
tiling/mipmapping, which will happen if you use untiled, un-mipped textures.
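For reference, a typical maketx invocation looks like this (the output of maketx is tiled and MIP-mapped by default; the explicit tile size here is just an example):

```
maketx --tile 64 64 foo.exr -o foo.tx
```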
Note that there are two varieties of each call, for example,
bool texture (ustring filename, TextureOpt &options, ...)
and
bool texture (TextureHandle *texture_handle, Perthread *thread_info,
TextureOpt &options, ...)
You can reduce the per-call overhead somewhat if you use the latter call
-- that is, if each thread already knows its thread_info (which you can
retrieve ONCE per thread with get_thread_info()), and also if you pass the
handle rather than the filename (which you can retrieve ONCE per filename,
using get_texture_handle()).
And if you have to use the first variety of the call, where you look up by
filename and without knowing the per-thread info already, then at least ensure
that you are creating the ustring ONCE and passing it repeatedly, and not
inadvertently constructing a ustring every time.
In other words, this is the most wasteful thing to do:
texturesys->texture (ustring("foo.exr"), /* construct ustring every time */
options, s, t, ...);
and this is the most efficient thing to do:
// ONCE per thread: my_thread_info = texturesys->get_thread_info();
// ONCE per texture: handle = texturesys->get_texture_handle(filename);
// for each texture lookup:
texturesys->texture (handle, my_thread_info, options, ...);
Are your derivatives reasonable? If they are 0 or very small, you'll always be
sampling from the finest level of the MIP-map, which is probably not kind to
caches; that finest level will also tend to use bicubic sampling unless you
force bilinear everywhere (somewhat more math). If you are using correct derivs
and your textures are sized well to handle all your views (without forcing the
highest-res level), then you should be in good shape: as long as you're not
"magnifying"/blurring/on the top level, "SmartBicubic" will actually give you
bilinear most of the time.
Another difference you may be seeing is from our anisotropic texturing,
compared to your old engine. If you don't require the anisotropy, then you may
want to set options.mipmode to MipModeTrilinear rather than MipModeAniso (which
is the default).
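Assuming the current OIIO headers, that would look something like:

```
OIIO::TextureOpt options;
options.mipmode = OIIO::TextureOpt::MipModeTrilinear;  // skip anisotropic filtering
```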
What kind of hardware are you compiling for? Are you using appropriate USE_SIMD
flags? Because that can speed up the texture system quite a bit.
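For example, when configuring the OIIO build with CMake (the right flag values depend on your target hardware):

```
cmake -DUSE_SIMD=avx2,f16c ..
```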
I'm not sure how you are benchmarking, but make sure your benchmark run is long
enough (in time) that you are measuring the steady state, and not having it
dominated by initial texture read time. For example, if your prior system was
reading whole textures in one shot, and the new one is reading tiles on demand
(and reading multiple MIP levels as well), the total read time may be a bit
higher. That won't matter at all for a 1 hour render, but the increase in disk
read may show up as significant for a 15 second benchmark.
Assuming you're doing all this... well, you may just be seeing the overhead of
all the flexibility of TextureSystem. Remember that in some sense, it is NOT
designed to be the fastest possible texture implementation for texture sets
that fit in memory. Rather, it's supposed to be acceptable speed and degrade
gracefully as the texture set grows. In production, we routinely render frames
that reference many thousands of textures totalling many hundreds of GB (well
into the TB range), using a memory cache of perhaps only 2 or 4GB, and it
performs very, very well. Texture sets much larger than available memory are
the case where it really shines.
-- lg
> On May 11, 2017, at 7:26 AM, Stefan Werner <[email protected]> wrote:
>
> Hi,
>
> I’m in the middle of integrating OIIO’s TextureSys into a path tracer.
> Previously, textures were just loaded into memory in full, and lookups
> would always happen at the full resolution, without mip maps. When replacing
> that with TextureSys, I’m noticing a significant performance drop, up to the
> point where texture lookups (sample_bilinear() for example, sample_bicubic()
> even more) occupy 30% or more of the render time. This is with good cache hit
> rates, the cache size exceeds the size of all textures and the OIIO stats
> report a cache miss rate of < 0.01% (in addition, I tried hardcoding
> dsdx/dsdy/dtdx/dtdy to 0.01, just to be sure).
>
> I did expect some performance drop compared to the previous naive strategy,
> but this is a bit steeper than I expected. I am wondering if I am doing
> something wrong on my side and if there are some best practices on how to
> integrate OIIO into a path tracer. (I had it running in a REYES renderer
> years ago and don’t remember it being that slow.)
>
> I am creating one TextureSys instance per CPU thread, with a shared
> ImageCache - are separate caches per thread any better? I cache perthread
> data and do lookups using TextureHandle, not texture name. Do people
> generally use smartbicubic for path tracing or do you not see enough of a
> difference and stay with bilinear (as pbrt does)? For any diffuse/sss/smooth
> glossy/etc bounces, I use MipModeNoMIP/InterpClosest. I am observing this on
> macOS, Windows and Ubuntu, OIIO built with whatever compiler flags CMake
> picks for a Release build. Is it worth it forcing more aggressive
> optimisation (-O3 -lto -ffast-math…)?
>
> Thanks,
> Stefan
> _______________________________________________
> Oiio-dev mailing list
> [email protected]
> http://lists.openimageio.org/listinfo.cgi/oiio-dev-openimageio.org
--
Larry Gritz
[email protected]