Re: [Haskell-cafe] announce: Glome.hs-0.3 (Haskell raytracer)

Jim Snow Fri, 18 Apr 2008 16:13:43 -0700

David Roundy wrote:

On Fri, Apr 18, 2008 at 02:09:28PM -0700, Jim Snow wrote:

On a particular scene with one instance of the single-threaded renderer
running, it takes about 19 seconds to render an image.  With two
instances running, they each take about 23 seconds.  This is on an
Athlon-64 3800+ dual core, with 512kB of L2 cache per core.  So, it
seems my memory really is slowing things down noticeably.


This doesn't mean there's no hope, it just means that you'll need to be
extra-clever if you're to get a speedup that is close to optimal.  The key
to overcoming memory bandwidth issues is to think about cache use and
figure out how to improve it.  For instance, O(N^3) matrix multiplication
can be done in close to O(N^2) time provided it's memory-limited, by
blocking memory accesses so that you access less memory at once.

In the case of ray-tracing I've little idea where or how you could improve
memory access patterns, but this is what you should be thinking about (if
you want to optimize this code).  Of course, improving overall scaling is
best (e.g. avoiding using lists if you need random access).  Next I'd ask
if there are more efficent or compact data structures that you could be
using.  If your program uses less memory, a greater fraction of that memory
will fit into cache.  Switching to stricter data structures and turning on
-funbox-strict-fields (or whatever it's called) may help localize your
memory access.  Even better if you could manage to use unboxed arrays, so
that your memory accesses really would be localized (assuming that you
actually do have localize memory-access patterns).

Of course, also ask yourself how much memory your program is using in
total.  If it's not much more than 512kB, for instance, we may have
misdiagnosed your problem.

Interestingly, switching between "Float" and "Double" doesn't make anynoticeable difference in speed (though I see more rendering artifactswith Float). Transformation matrices are memory hogs, at 24 floats each(a 4x4 matrix and its inverse with the bottom rows omitted (they'realways 0 0 0 1)). This may be one reason why many real-time ray tracersjust stick with triangles; a triangle can be transformed by transformingits verticies, and then you can throw the matrix away.

There are a lot of tricks for making ray tracers more memory-coherent.You can trace packets of rays instead of single rays against whateveracceleration structure you may be using. Kd-tree nodes can be compactedto fit in a single cacheline if you arrange the tree in memory in aparticular way that allows you to omit some of the pointers. (I use BIHtrees, but the same ideas probably apply.) A lot of these sorts oftricks make the resulting code more complex and/or uglier.

Useful references: "What Every Programmer Needs to Know About Memory"http://lwn.net/Articles/250967/Siggraph presentation on optimizing ray tracers (warning: ppt)http://www.openrt.de/Siggraph05/UpdatedCourseNotes/Stoll_Realtime.ppt


Bryan O'Sullivan wrote:

Jim Snow wrote:

> The concurrency bug has to do with excessive memory use, and was
> discussed earlier here on the mailing list (search for Glome).
> http://hackage.haskell.org/trac/ghc/ticket/2185


Interesting.  I looked at your test case.  I can reproduce your problem
when I build with the threaded runtime and run with a single core, but
not if I use +RTS -N2.  Did you overlook the possibility that you may
not have told GHC how many cores to use?

I just tested it again. Memory usage behaves differently depending onhow many cores I tell it to run on, but it always used the least memorywhen I compiled without threading support. With -N1 memory usage growsfaster than -N2, but memory is smaller and doesn't grow larger with eachre-render (except by a very small amount) if I don't use parmap.

Also, your code is sprinkled with many more strictness annotations than
it needs.

        <b

That doesn't surprise me. I haven't really figured out why somethingsare faster strict or not strict, or where it doesn't matter or theannotations are redundant.


-jim

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] announce: Glome.hs-0.3 (Haskell raytracer)

Reply via email to