jsnow:
> David Roundy wrote:
> >On Fri, Apr 18, 2008 at 02:09:28PM -0700, Jim Snow wrote:
> >
> >>On a particular scene with one instance of the single-threaded renderer
> >>running, it takes about 19 seconds to render an image. With two
> >>instances running, they each take about 23 seconds. This is on an
> >>Athlon-64 3800+ dual core, with 512kB of L2 cache per core. So, it
> >>seems my memory really is slowing things down noticeably.
> >
> >This doesn't mean there's no hope, it just means that you'll need to be
> >extra-clever if you're to get a speedup that is close to optimal. The key
> >to overcoming memory bandwidth issues is to think about cache use and
> >figure out how to improve it. For instance, O(N^3) matrix multiplication
> >can be done in close to O(N^2) time provided it's memory-limited, by
> >blocking memory accesses so that you access less memory at once.
> >
> >In the case of ray-tracing I've little idea where or how you could improve
> >memory access patterns, but this is what you should be thinking about (if
> >you want to optimize this code). Of course, improving overall scaling is
> >best (e.g. avoiding using lists if you need random access). Next I'd ask
> >if there are more efficient or compact data structures that you could be
> >using. If your program uses less memory, a greater fraction of that memory
> >will fit into cache. Switching to stricter data structures and turning on
> >-funbox-strict-fields (or whatever it's called) may help localize your
> >memory access. Even better if you could manage to use unboxed arrays, so
> >that your memory accesses really would be localized (assuming that you
> >actually do have localized memory-access patterns).
> >
> >Of course, also ask yourself how much memory your program is using in
> >total. If it's not much more than 512kB, for instance, we may have
> >misdiagnosed your problem.
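To make the suggestion about strict fields and unboxed arrays concrete, here is a minimal sketch. The types and helpers (Vec, Ray, pointAt, packVecs, vecAt) are hypothetical names made up for illustration and are not taken from the renderer being discussed. The UNPACK pragmas ask GHC to store each field inline as a raw Double, which is what -funbox-strict-fields does for every strict field in a module; whether this helps in practice depends on the program's actual access patterns.

```haskell
import Data.Array.Unboxed (UArray, listArray, (!))

-- Hypothetical vector type: three strict, unpacked Doubles stored inline
-- in the constructor instead of three pointers to boxed values.
data Vec = Vec {-# UNPACK #-} !Double
               {-# UNPACK #-} !Double
               {-# UNPACK #-} !Double
  deriving (Eq, Show)

-- Hypothetical ray type: both vectors are unpacked into the Ray itself,
-- so a Ray is six adjacent Doubles rather than a small tree of heap objects.
data Ray = Ray
  { origin    :: {-# UNPACK #-} !Vec
  , direction :: {-# UNPACK #-} !Vec
  } deriving (Eq, Show)

vadd :: Vec -> Vec -> Vec
vadd (Vec a b c) (Vec x y z) = Vec (a + x) (b + y) (c + z)

scale :: Double -> Vec -> Vec
scale k (Vec x y z) = Vec (k * x) (k * y) (k * z)

-- Point reached after travelling distance t along the ray.
pointAt :: Ray -> Double -> Vec
pointAt (Ray o d) t = o `vadd` scale t d

-- Pack a list of vectors into one flat unboxed array (x0 y0 z0 x1 y1 z1 ...),
-- so that neighbouring vectors are also neighbours in memory.
packVecs :: [Vec] -> UArray Int Double
packVecs vs =
  listArray (0, 3 * length vs - 1)
            (concatMap (\(Vec x y z) -> [x, y, z]) vs)

-- Read the i-th vector (0-based) back out of the flat array.
vecAt :: UArray Int Double -> Int -> Vec
vecAt arr i = Vec (arr ! (3 * i)) (arr ! (3 * i + 1)) (arr ! (3 * i + 2))
```

In GHCi, something like `vecAt (packVecs [Vec 1 2 3, Vec 4 5 6]) 1` reads the second vector back from the packed array.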
>
> Interestingly, switching between "Float" and "Double" doesn't make any
> noticeable difference in speed (though I see more rendering artifacts
> with Float). Transformation matrices are memory hogs, at 24 floats each
> (a 4x4 matrix and its inverse with the bottom rows omitted (they're
> always 0 0 0 1)). This may be one reason why many real-time ray tracers
> just stick with triangles; a triangle can be transformed by transforming
> its vertices, and then you can throw the matrix away.
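For reference, the 24-float layout described above (a 4x4 matrix and its inverse, each stored without the constant bottom row) might look roughly like the sketch below. All names here (M34, Xfm, xfmPoint, translation, bakeTriangle) are hypothetical and only illustrate the idea; they are not the actual renderer's types. The last function shows the triangle trick: transform the three vertices once and the matrix is no longer needed.

```haskell
data Vec = Vec !Double !Double !Double
  deriving (Eq, Show)

-- Top three rows of a 4x4 affine matrix in row-major order (12 numbers).
-- The bottom row is always 0 0 0 1, so it is not stored.
data M34 = M34 !Double !Double !Double !Double
               !Double !Double !Double !Double
               !Double !Double !Double !Double
  deriving (Eq, Show)

-- A transformation carries the matrix and its inverse: 24 numbers in total.
data Xfm = Xfm { forward :: !M34, inverse :: !M34 }
  deriving (Eq, Show)

-- Apply a matrix to a point (implicit w = 1, so translation applies).
xfmPoint :: M34 -> Vec -> Vec
xfmPoint (M34 a b c d e f g h i j k l) (Vec x y z) =
  Vec (a * x + b * y + c * z + d)
      (e * x + f * y + g * z + h)
      (i * x + j * y + k * z + l)

-- Apply a matrix to a direction (implicit w = 0, so translation is ignored).
xfmDir :: M34 -> Vec -> Vec
xfmDir (M34 a b c _ e f g _ i j k _) (Vec x y z) =
  Vec (a * x + b * y + c * z)
      (e * x + f * y + g * z)
      (i * x + j * y + k * z)

-- Build a translation together with its inverse.
translation :: Vec -> Xfm
translation (Vec tx ty tz) =
  Xfm (M34 1 0 0 tx      0 1 0 ty      0 0 1 tz)
      (M34 1 0 0 (-tx)   0 1 0 (-ty)   0 0 1 (-tz))

data Triangle = Triangle !Vec !Vec !Vec
  deriving (Eq, Show)

-- Transform the vertices once; afterwards the triangle needs no matrix.
bakeTriangle :: Xfm -> Triangle -> Triangle
bakeTriangle (Xfm m _) (Triangle p q r) =
  Triangle (xfmPoint m p) (xfmPoint m q) (xfmPoint m r)
```

Objects that cannot be baked this way (a sphere under a non-uniform scale, say) would have to keep the whole Xfm and map each ray through the inverse, which is where the 24 numbers per object come from.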
The only differences I'd expect to see here would be with
-fvia-C -fexcess-precision -O2 -optc-O2, which might trigger some SSE
stuff from the C compiler.

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe