Re: [Haskell-cafe] announce: Glome.hs-0.3 (Haskell raytracer)

2008-04-26 Thread Andrew Coppin

david48 wrote:

On Sat, Apr 26, 2008 at 9:33 AM, Andrew Coppin
<[EMAIL PROTECTED]> wrote:

  

 Personally, I don't see the point in rendering a couple of million
mathematically flat surfaces,



What about speed ?
  


Speed is not always the most important thing. ;-)

["The best things come to those who wait" and all that...]

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] announce: Glome.hs-0.3 (Haskell raytracer)

2008-04-26 Thread Sebastian Sylvan
On Sat, Apr 26, 2008 at 10:21 AM, david48
<[EMAIL PROTECTED]<[EMAIL PROTECTED]>>
wrote:

> On Sat, Apr 26, 2008 at 9:33 AM, Andrew Coppin
> <[EMAIL PROTECTED]> wrote:
>
> >  Personally, I don't see the point in rendering a couple of million
> > mathematically flat surfaces,
>
> What about speed ?
>

If you tessellate the curve so that there is zero error, then it's probably
going to be faster to ray trace the actual curve, since you'll essentially
have loads of very small (1 pixel or thereabouts) triangles...



-- 
Sebastian Sylvan
+44(0)7857-300802
UIN: 44640862
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] announce: Glome.hs-0.3 (Haskell raytracer)

2008-04-26 Thread Ketil Malde
david48 <[EMAIL PROTECTED]> writes:

>> Personally, I don't see the point in rendering a couple of million
>> mathematically flat surfaces,

> What about speed ?

"If it doesn't have to be correct, it can be arbitrarily fast"

:-)

-k
-- 
If I haven't seen further, it is by standing in the footprints of giants
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] announce: Glome.hs-0.3 (Haskell raytracer)

2008-04-26 Thread david48
On Sat, Apr 26, 2008 at 9:33 AM, Andrew Coppin
<[EMAIL PROTECTED]> wrote:

>  Personally, I don't see the point in rendering a couple of million
> mathematically flat surfaces,

What about speed ?
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] announce: Glome.hs-0.3 (Haskell raytracer)

2008-04-26 Thread Andrew Coppin

Jim Snow wrote:
The Scene Description Language (SDL) is the best and worst thing about 
POV-Ray.


Indeed.

It's very intuitive and user-friendly, so a person can reasonably 
write a complex scene in pure code without using an external GUI editor.


Not to mention that there's a smooth and continuous path from "I 
rendered a reflective sphere on a chequered plane" to "I build a complex 
CSG object" to "I wrote my first animation" to "I just built a particle 
physics system / fractal terrain generator / inverse kinetics animator / 
whatever".


Unfortunately, the SDL is so complex it's well-nigh impossible to 
support in other third-party applications.


I don't think it would be "impossible" to support in 3rd party apps. A 
lot of work, certainly! Especially the way the parser does parsing and 
macro expansion in the same pass, so a macro body need not contain a 
syntactically-complete code fragment. I think if somebody sat down and 
wrote something that would do all the macro expansion, variable 
substitution and so forth to leave just few geometry descriptions, 
*those* would be quite easy to parse and manipulate for 3rd parties.


It's also slow. The scene has to be re-parsed for every frame.  For 
complex scenes, the scene parsing could take longer than the actual 
render.


For complex scenes, or scenes involving complex algorithms like fractal 
generation or something, parsing can indeed take far longer than 
rendering. The SDL is basically a scripting language. It's kinda 
flexible in its own way, but totally useless in speed terms. It also 
lacks all the features of a "real" programming language - abstraction, 
encapsulation, rich data types, powerful control structures... actually, 
all the things that Haskell excells at. I did start a project a while 
back to try and make it possible to describe POV-Ray scenes in Haskell, 
thus leveraging the full power of the language. But that never came to 
much...


[The "clock" variable is one way to animate, yes. Another is to use the 
"frame_number" variable. A common idiom is to generate an initial state 
at frame #1, and on each frame, read the state from disk, update it, 
write the updated version back to disk, and then render the frame. 
Obviously this destroys the ability to render frames in arbitrary order 
or distribute them across a render farm, but that's rather unavoidable. 
Note that the loading and saving must be hand-coded, from scratch, every 
time you want to do this. Tedious...]


Personally, I'd quite like to write my own ray tracer to address some 
of these limitations. But every time I try, I end up hitting design 
issues [Haskell works very differently to Java] or performance issues 
[which I don't even know how to begin debugging]. Not to mention that 
POV-Ray uses sophisticated techniques like volume bounding that I 
know nothing about. (There's nothing like using an inherantly 
superior algorithm to make your code orders of magnitude faster...)


I haven't really run into any issues where Haskell didn't let me do 
what I want, except for the performance counters thing mentioned way 
back at the beginning of this thread (and which I've more or less 
given up on for now, since the acceleration structure seems to be 
working okay and I have other things to work on).


Well, for example, Haskell doesn't have hetrogenous lists - which are 
trivial in any OOP language. That's quite awkward to get around. Also, 
both spheres and cylinders have a "radius" property, but then you end up 
with name clashes. Again, a non-issue in OOP languages. [I gather 
there's some new GHC language extension to partially solve this one - 
but only when types are statically known. In the general case, you'd 
have to write a "radius class" and define instances... yuck!]


A good acceleration structure definitely makes everything go a lot 
faster.  It's the difference between rendering a scene in a few 
seconds or ten minutes.


BIH is what I'm using.  It's relatively new.  Here's a paper about 
it:  http://ainc.de/Research/BIH.pdf


The actual constructor is based loosely on this pseudocode: 
http://ompf.org/forum/viewtopic.php?p=1411#p1411


I'll take a look. Certainly all the ray tracers I've written just test 
every ray against every object - O(n) complexity? Hmm, that's not 
good[tm]...


You don't need to drop support for the whole multitude of primitives, 
though the ones that are packet-friendly will render faster.  Many 
modern ray tracers spend most of their time traversing the 
acceleration structure rather than doing intersection tests with 
actual geometry, so all you really need to do to get most of the 
benefit of packet tracing is to make the acceleration structure 
support packets.  (For the actual geometry, you can fall back on 
mono-rays.)  I tried 2x2 packets out last night and got about a 10-15% 
speed improvement on the level-3 SPD sphereflake.  (Shadow and 
reflection rays are still mono rays; if I converted them over, the 
improvement wo

Re: [Haskell-cafe] announce: Glome.hs-0.3 (Haskell raytracer)

2008-04-26 Thread Andrew Coppin

Jon Harrop wrote:

On Thursday 24 April 2008 20:29:50 Andrew Coppin wrote:
  

2. It's the only program I've seen that can render *real* curves, not
fake trickery with triangle meshes.



What you called "fake trickery with triangle meshes" is the core of all modern 
computer graphics. Focussing on ray tracing instead of that is rather missing 
the point, IMHO.
  


Well, that rather depends on what your "point" is, doesn't it?

Personally, I don't see the point in rendering a couple of million 
mathematically flat surfaces, and then trying to blur the edges to give 
the false illusion of a curved surface - when you could just, you know, 
render a real curved surface. Sure, surface normal tricks can make 
lighting and reflections look almost correct, but they won't fix profile 
views, shadows and self-shadows, intersections, or any of the other 
artifacts caused by using triangles.


But anyway, this is wandering off-topic. Suffice it to say, for my 
humble little hobby graphics, I intend to stick to ray tracing. If you 
disagree, feel free to use something else for your stuff. :-)


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] announce: Glome.hs-0.3 (Haskell raytracer)

2008-04-24 Thread Jim Snow

Andrew Coppin wrote:


Wow. The POV-Ray guys are talking about Haskell [or rather, my 
personal addiction to it] and the Haskell guys are talking about 
POV-Ray... on the same day... How unlikely is that? ;-)


That's odd; I had a personal addiction to POV-Ray for a few years, and 
just now have started using Haskell.



I've been using POV-Ray for a long time. I like it for several reasons:

1. It's the only program I've ever seen [on a PC] that does ray 
tracing. [I'm sure there must be others...]
2. It's the only program I've seen that can render *real* curves, not 
fake trickery with triangle meshes.
3. It can render *arbitrary* mathematical surfaces. Want to render a 
3D slice of the 4D cubic Mandelbrot set? No problem!
4. It has a novel "scene description language", which does far more 
than describe scenes. It's Turing-complete, and you can build physics 
engines with it. [It's painfully slow though!]


The Scene Description Language (SDL) is the best and worst thing about 
POV-Ray.  It's very intuitive and user-friendly, so a person can 
reasonably write a complex scene in pure code without using an external 
GUI editor.  Unfortunately, the SDL is so complex it's well-nigh 
impossible to support in other third-party applications.  It's also 
slow.  I don't know if this is still the case, but the standard way of 
doing an animation was to reference a "clock" variable in your scene 
source code that went from 0 to 1; for instance, a command to make a 
swing swing back and forth might looks like this:


"rotate 15*sin((clock/2)*seconds*(2*pi)-((2/3)*pi))*x"

"seconds" here is a variable set to the number of seconds in the 
animation, and "x" is the X axis unit vector <1,0,0>.  The (2/3)*pi 
thing is to make it swing out of phase with the other swings.


(this rather obfuscatory example taken from an actual ancient povray 
source file, you can see a rendering here: 
http://syn.cs.pdx.edu/~jsnow/playground.png)


The scene then has to be re-parsed for every frame.  For complex scenes, 
the scene parsing could take longer than the actual render.


There are many other PC programs that do ray tracing, but POV-Ray is the 
only one I've had any experience with.


The POV-Ray team is currently working on the first multi-threaded 
version. [After years of saying it would never happen.] It's taking a 
while. (That's partly because the developers take product quality very 
seriously.) It should be interesting when it's done, but it's still 
taking a while.
Personally, I'd quite like to write my own ray tracer to address some 
of these limitations. But every time I try, I end up hitting design 
issues [Haskell works very differently to Java] or performance issues 
[which I don't even know how to begin debugging]. Not to mention that 
POV-Ray uses sophisticated techniques like volume bounding that I know 
nothing about. (There's nothing like using an inherantly superior 
algorithm to make your code orders of magnitude faster...)


I haven't really run into any issues where Haskell didn't let me do what 
I want, except for the performance counters thing mentioned way back at 
the beginning of this thread (and which I've more or less given up on 
for now, since the acceleration structure seems to be working okay and I 
have other things to work on).


I would certainly welcome any patches to Glome if you want to contribute 
in some way rather than write something from scratch.


A good acceleration structure definitely makes everything go a lot 
faster.  It's the difference between rendering a scene in a few seconds 
or ten minutes.


BIH is what I'm using.  It's relatively new.  Here's a paper about it:  
http://ainc.de/Research/BIH.pdf


The actual constructor is based loosely on this pseudocode: 
http://ompf.org/forum/viewtopic.php?p=1411#p1411



Evan Laforge wrote:

Not knowing anything about raytracing, one of the things I found
interesting about the paper was that he claimed that the speedups were
from using coherent ray packets, and that the shader model was
orthogonal, and enough much is spent raycasting that the shader code
to make much difference.  The implication was that you could graft a
packet style raycasting engine onto even povray and give it a nice
speed boost... though you might have to lose the nifty "real" shapes
to tessellated approximations.

Is this accurate?  True but reality is more complicated?
  
You don't need to drop support for the whole multitude of primitives, 
though the ones that are packet-friendly will render faster.  Many 
modern ray tracers spend most of their time traversing the acceleration 
structure rather than doing intersection tests with actual geometry, so 
all you really need to do to get most of the benefit of packet tracing 
is to make the acceleration structure support packets.  (For the actual 
geometry, you can fall back on mono-rays.)  I tried 2x2 packets out last 
night and got about a 10-15% speed improvement on the level-3 SPD 
sphereflake.  (Shadow and ref

Re: [Haskell-cafe] announce: Glome.hs-0.3 (Haskell raytracer)

2008-04-24 Thread Jon Harrop
On Thursday 24 April 2008 20:29:50 Andrew Coppin wrote:
> 2. It's the only program I've seen that can render *real* curves, not
> fake trickery with triangle meshes.

What you called "fake trickery with triangle meshes" is the core of all modern 
computer graphics. Focussing on ray tracing instead of that is rather missing 
the point, IMHO.

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?e
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] announce: Glome.hs-0.3 (Haskell raytracer)

2008-04-24 Thread Evan Laforge
>  I've ended up writing something more like POV-Ray than OpenRT, and that's
> fine with me.  I'd rather have something that's slower but more expressive
> than fast but inflexible.  (The former is also perhaps more easily
> attainable, particularly in Haskell.)

Not knowing anything about raytracing, one of the things I found
interesting about the paper was that he claimed that the speedups were
from using coherent ray packets, and that the shader model was
orthogonal, and enough much is spent raycasting that the shader code
to make much difference.  The implication was that you could graft a
packet style raycasting engine onto even povray and give it a nice
speed boost... though you might have to lose the nifty "real" shapes
to tessellated approximations.

Is this accurate?  True but reality is more complicated?
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] announce: Glome.hs-0.3 (Haskell raytracer)

2008-04-24 Thread Andrew Coppin
Jim Snow wrote: 
Those are good links.  It's good to see that the groups of people who 
know about Haskell and people who know about ray tracing do, in fact, 
overlap.


Background information for everyone else:  Wald's work is related to 
OpenRT, which is an OpenGL-like api for interactive ray tracing  
(despite the name, it is not, in fact, open source).  OpenRT makes for 
stiff competition.  Arauna (http://igad.nhtv.nl/~bikker/) is very 
impressive, as well.  On the other end of the spectrum, there's 
POV-Ray, which isn't known for its speed, but it is very flexible in 
the kinds of things it can render and can generate some fairly 
realistic images.  Unlike most real-time ray tracers, which only 
render triangles, POV-Ray can render native representations of 
spheres, cones, toruses, heightfields, julia fractals, and a few dozen 
other kinds of primitives.  Pbrt (http://www.pbrt.org/) is another 
renderer, more modern than POV-Ray, that focuses more on output 
quality than speed.


Wow. The POV-Ray guys are talking about Haskell [or rather, my personal 
addiction to it] and the Haskell guys are talking about POV-Ray... on 
the same day... How unlikely is that? ;-)


I've been using POV-Ray for a long time. I like it for several reasons:

1. It's the only program I've ever seen [on a PC] that does ray tracing. 
[I'm sure there must be others...]
2. It's the only program I've seen that can render *real* curves, not 
fake trickery with triangle meshes.
3. It can render *arbitrary* mathematical surfaces. Want to render a 3D 
slice of the 4D cubic Mandelbrot set? No problem!
4. It has a novel "scene description language", which does far more than 
describe scenes. It's Turing-complete, and you can build physics engines 
with it. [It's painfully slow though!]


Basically, as a maths addict, POV-Ray appeals to me. However, there are 
problems... The scene description language is basically a macro 
expansion language, and is chronically poor in data types and control 
structures. (The worst thing about Haskell is that IT MAKES YOU HATE 
NORMAL LANGUAGES!) Preserving complex scene state from frame to frame in 
an animation is notoriously tedious to code. Certain features are 
accessed in unintuitive ways to avoid breaking backwards compatibility. 
Overall, you tend to spend a lot of time saying which effects to 
simulate. [Not all of them matter for a given scene, so some can be left 
out - thus saving on speed!] I dislike such "clutter". For no objective 
reason.


The POV-Ray team is currently working on the first multi-threaded 
version. [After years of saying it would never happen.] It's taking a 
while. (That's partly because the developers take product quality very 
seriously.) It should be interesting when it's done, but it's still 
taking a while.


Personally, I'd quite like to write my own ray tracer to address some of 
these limitations. But every time I try, I end up hitting design issues 
[Haskell works very differently to Java] or performance issues [which I 
don't even know how to begin debugging]. Not to mention that POV-Ray 
uses sophisticated techniques like volume bounding that I know nothing 
about. (There's nothing like using an inherantly superior algorithm to 
make your code orders of magnitude faster...)



Simon Marlow wrote:
There's definitely a bug here, regardless of whether this example 
demonstrates it.  Use of par shouldn't introduce a space leak, and 
currently it can.


(fortunately I'm fixing it as we speak...)

Cheers,
Simon 

Oh, good.


Indeed - yay for Simon!

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] announce: Glome.hs-0.3 (Haskell raytracer)

2008-04-23 Thread Jim Snow

Derek Elkins wrote:


Ingo Wald's work on interactive coherent raytracing springs to mind.
http://www.sci.utah.edu/~wald/Publications/

"Interactive Rendering with Coherent Raytracing"
http://graphics.cs.uni-sb.de/~wald/Publications/EG2001_IRCRT/InteractiveRenderingWithCoherentRayTracing.pdf
is a decent, if dated, introduction.  He clearly has much more newer stuff as 
well.

  
Those are good links.  It's good to see that the groups of people who 
know about Haskell and people who know about ray tracing do, in fact, 
overlap.


Background information for everyone else:  Wald's work is related to 
OpenRT, which is an OpenGL-like api for interactive ray tracing  
(despite the name, it is not, in fact, open source).  OpenRT makes for 
stiff competition.  Arauna (http://igad.nhtv.nl/~bikker/) is very 
impressive, as well.  On the other end of the spectrum, there's POV-Ray, 
which isn't known for its speed, but it is very flexible in the kinds of 
things it can render and can generate some fairly realistic images.  
Unlike most real-time ray tracers, which only render triangles, POV-Ray 
can render native representations of spheres, cones, toruses, 
heightfields, julia fractals, and a few dozen other kinds of 
primitives.  Pbrt (http://www.pbrt.org/) is another renderer, more 
modern than POV-Ray, that focuses more on output quality than speed.


Unfortunately, all the notable ray tracers that I'm aware of are written 
in C or C++ (often with a little hand-coded SSE), and as you might 
imagine I find this to be a sad state of affairs.  Not that those are 
bad languages for this kind of work, but they shouldn't be the only option.


I've ended up writing something more like POV-Ray than OpenRT, and 
that's fine with me.  I'd rather have something that's slower but more 
expressive than fast but inflexible.  (The former is also perhaps more 
easily attainable, particularly in Haskell.) 

This isn't to say that I'm not interested in making it fast as well.  
There are plenty of ways to make my raytracer faster: Kd-trees built 
using a surface area heuristic performed better than naively-built BIH 
when I implemented them in my ocaml raytracer (though they take longer 
to construct).  If I can figure out how quaternions work, I could 
probably use them instead of transformation matricies to store cones in 
order to cut down on memory overhead (4 floats versus 24, if I 
understand correctly).  Ray packets, as described in Wald's paper linked 
above, might help as well.


Simon Marlow wrote:
There's definitely a bug here, regardless of whether this example 
demonstrates it.  Use of par shouldn't introduce a space leak, and 
currently it can.


(fortunately I'm fixing it as we speak...)

Cheers,
Simon 

Oh, good.

-jim
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] announce: Glome.hs-0.3 (Haskell raytracer)

2008-04-23 Thread Derek Elkins
On Fri, 2008-04-18 at 14:34 -0700, David Roundy wrote:
> On Fri, Apr 18, 2008 at 02:09:28PM -0700, Jim Snow wrote:
> > On a particular scene with one instance of the single-threaded renderer
> > running, it takes about 19 seconds to render an image.  With two
> > instances running, they each take about 23 seconds.  This is on an
> > Athlon-64 3800+ dual core, with 512kB of L2 cache per core.  So, it
> > seems my memory really is slowing things down noticeably.
> 
> This doesn't mean there's no hope, it just means that you'll need to be
> extra-clever if you're to get a speedup that is close to optimal.  The key
> to overcoming memory bandwidth issues is to think about cache use and
> figure out how to improve it.  For instance, O(N^3) matrix multiplication
> can be done in close to O(N^2) time provided it's memory-limited, by
> blocking memory accesses so that you access less memory at once.
> 
> In the case of ray-tracing I've little idea where or how you could improve
> memory access patterns, but this is what you should be thinking about (if
> you want to optimize this code).  Of course, improving overall scaling is
> best (e.g. avoiding using lists if you need random access).  Next I'd ask
> if there are more efficent or compact data structures that you could be
> using.  If your program uses less memory, a greater fraction of that memory
> will fit into cache.  Switching to stricter data structures and turning on
> -funbox-strict-fields (or whatever it's called) may help localize your
> memory access.  Even better if you could manage to use unboxed arrays, so
> that your memory accesses really would be localized (assuming that you
> actually do have localize memory-access patterns).
> 
> Of course, also ask yourself how much memory your program is using in
> total.  If it's not much more than 512kB, for instance, we may have
> misdiagnosed your problem.

Ingo Wald's work on interactive coherent raytracing springs to mind.
http://www.sci.utah.edu/~wald/Publications/

"Interactive Rendering with Coherent Raytracing"
http://graphics.cs.uni-sb.de/~wald/Publications/EG2001_IRCRT/InteractiveRenderingWithCoherentRayTracing.pdf
is a decent, if dated, introduction.  He clearly has much more newer stuff as 
well.

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] announce: Glome.hs-0.3 (Haskell raytracer)

2008-04-18 Thread Don Stewart
jsnow:
> David Roundy wrote:
> >On Fri, Apr 18, 2008 at 02:09:28PM -0700, Jim Snow wrote:
> >  
> >>On a particular scene with one instance of the single-threaded renderer
> >>running, it takes about 19 seconds to render an image.  With two
> >>instances running, they each take about 23 seconds.  This is on an
> >>Athlon-64 3800+ dual core, with 512kB of L2 cache per core.  So, it
> >>seems my memory really is slowing things down noticeably.
> >>
> >
> >This doesn't mean there's no hope, it just means that you'll need to be
> >extra-clever if you're to get a speedup that is close to optimal.  The key
> >to overcoming memory bandwidth issues is to think about cache use and
> >figure out how to improve it.  For instance, O(N^3) matrix multiplication
> >can be done in close to O(N^2) time provided it's memory-limited, by
> >blocking memory accesses so that you access less memory at once.
> >
> >In the case of ray-tracing I've little idea where or how you could improve
> >memory access patterns, but this is what you should be thinking about (if
> >you want to optimize this code).  Of course, improving overall scaling is
> >best (e.g. avoiding using lists if you need random access).  Next I'd ask
> >if there are more efficent or compact data structures that you could be
> >using.  If your program uses less memory, a greater fraction of that 
> >memory
> >will fit into cache.  Switching to stricter data structures and turning on
> >-funbox-strict-fields (or whatever it's called) may help localize your
> >memory access.  Even better if you could manage to use unboxed arrays, so
> >that your memory accesses really would be localized (assuming that you
> >actually do have localize memory-access patterns).
> >
> >Of course, also ask yourself how much memory your program is using in
> >total.  If it's not much more than 512kB, for instance, we may have
> >misdiagnosed your problem.
> >  
> Interestingly, switching between "Float" and "Double" doesn't make any 
> noticeable difference in speed (though I see more rendering artifacts 
> with Float).  Transformation matrices are memory hogs, at 24 floats each 
> (a 4x4 matrix and its inverse with the bottom rows omitted (they're 
> always 0 0 0 1)).  This may be one reason why many real-time ray tracers 
> just stick with triangles; a triangle can be transformed by transforming 
> its verticies, and then you can throw the matrix away.

The only differences I'd expect to see here would
be with -fvia-C -fexcess-precision -O2 -optc-O2

which might trigger some SSE stuff from the C compiler
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] announce: Glome.hs-0.3 (Haskell raytracer)

2008-04-18 Thread Jim Snow

David Roundy wrote:

On Fri, Apr 18, 2008 at 02:09:28PM -0700, Jim Snow wrote:
  

On a particular scene with one instance of the single-threaded renderer
running, it takes about 19 seconds to render an image.  With two
instances running, they each take about 23 seconds.  This is on an
Athlon-64 3800+ dual core, with 512kB of L2 cache per core.  So, it
seems my memory really is slowing things down noticeably.



This doesn't mean there's no hope, it just means that you'll need to be
extra-clever if you're to get a speedup that is close to optimal.  The key
to overcoming memory bandwidth issues is to think about cache use and
figure out how to improve it.  For instance, O(N^3) matrix multiplication
can be done in close to O(N^2) time provided it's memory-limited, by
blocking memory accesses so that you access less memory at once.

In the case of ray-tracing I've little idea where or how you could improve
memory access patterns, but this is what you should be thinking about (if
you want to optimize this code).  Of course, improving overall scaling is
best (e.g. avoiding using lists if you need random access).  Next I'd ask
if there are more efficent or compact data structures that you could be
using.  If your program uses less memory, a greater fraction of that memory
will fit into cache.  Switching to stricter data structures and turning on
-funbox-strict-fields (or whatever it's called) may help localize your
memory access.  Even better if you could manage to use unboxed arrays, so
that your memory accesses really would be localized (assuming that you
actually do have localize memory-access patterns).

Of course, also ask yourself how much memory your program is using in
total.  If it's not much more than 512kB, for instance, we may have
misdiagnosed your problem.
  
Interestingly, switching between "Float" and "Double" doesn't make any 
noticeable difference in speed (though I see more rendering artifacts 
with Float).  Transformation matrices are memory hogs, at 24 floats each 
(a 4x4 matrix and its inverse with the bottom rows omitted (they're 
always 0 0 0 1)).  This may be one reason why many real-time ray tracers 
just stick with triangles; a triangle can be transformed by transforming 
its verticies, and then you can throw the matrix away.


There are a lot of tricks for making ray tracers more memory-coherent.  
You can trace packets of rays instead of single rays against whatever 
acceleration structure you may be using.  Kd-tree nodes can be compacted 
to fit in a single cacheline if you arrange the tree in memory in a 
particular way that allows you to omit some of the pointers.  (I use BIH 
trees, but the same ideas probably apply.)  A lot of these sorts of 
tricks make the resulting code more complex and/or uglier.


Useful references: "What Every Programmer Needs to Know About Memory" 
http://lwn.net/Articles/250967/
Siggraph presentation on optimizing ray tracers (warning: ppt) 
http://www.openrt.de/Siggraph05/UpdatedCourseNotes/Stoll_Realtime.ppt


Bryan O'Sullivan wrote:

Jim Snow wrote:

  

> The concurrency bug has to do with excessive memory use, and was
> discussed earlier here on the mailing list (search for Glome).
> http://hackage.haskell.org/trac/ghc/ticket/2185



Interesting.  I looked at your test case.  I can reproduce your problem
when I build with the threaded runtime and run with a single core, but
not if I use +RTS -N2.  Did you overlook the possibility that you may
not have told GHC how many cores to use?

  
I just tested it again.  Memory usage behaves differently depending on 
how many cores I tell it to run on, but it always used the least memory 
when I compiled without threading support.  With -N1 memory usage grows 
faster than -N2, but memory is smaller and doesn't grow larger with each 
re-render (except by a very small amount) if I don't use parmap.

Also, your code is sprinkled with many more strictness annotations than
it needs.

  
That doesn't surprise me.  I haven't really figured out why somethings 
are faster strict or not strict, or where it doesn't matter or the 
annotations are redundant.


-jim

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] announce: Glome.hs-0.3 (Haskell raytracer)

2008-04-18 Thread Bryan O'Sullivan
Jim Snow wrote:

> The concurrency bug has to do with excessive memory use, and was
> discussed earlier here on the mailing list (search for Glome).
> http://hackage.haskell.org/trac/ghc/ticket/2185

Interesting.  I looked at your test case.  I can reproduce your problem
when I build with the threaded runtime and run with a single core, but
not if I use +RTS -N2.  Did you overlook the possibility that you may
not have told GHC how many cores to use?

Also, your code is sprinkled with many more strictness annotations than
it needs.

http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] announce: Glome.hs-0.3 (Haskell raytracer)

2008-04-18 Thread David Roundy
On Fri, Apr 18, 2008 at 02:09:28PM -0700, Jim Snow wrote:
> On a particular scene with one instance of the single-threaded renderer
> running, it takes about 19 seconds to render an image.  With two
> instances running, they each take about 23 seconds.  This is on an
> Athlon-64 3800+ dual core, with 512kB of L2 cache per core.  So, it
> seems my memory really is slowing things down noticeably.

This doesn't mean there's no hope, it just means that you'll need to be
extra-clever if you're to get a speedup that is close to optimal.  The key
to overcoming memory bandwidth issues is to think about cache use and
figure out how to improve it.  For instance, O(N^3) matrix multiplication
can be done in close to O(N^2) time provided it's memory-limited, by
blocking memory accesses so that you access less memory at once.

In the case of ray-tracing I've little idea where or how you could improve
memory access patterns, but this is what you should be thinking about (if
you want to optimize this code).  Of course, improving overall scaling is
best (e.g. avoiding using lists if you need random access).  Next I'd ask
if there are more efficent or compact data structures that you could be
using.  If your program uses less memory, a greater fraction of that memory
will fit into cache.  Switching to stricter data structures and turning on
-funbox-strict-fields (or whatever it's called) may help localize your
memory access.  Even better if you could manage to use unboxed arrays, so
that your memory accesses really would be localized (assuming that you
actually do have localize memory-access patterns).

Of course, also ask yourself how much memory your program is using in
total.  If it's not much more than 512kB, for instance, we may have
misdiagnosed your problem.
-- 
David Roundy
Department of Physics
Oregon State University
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] announce: Glome.hs-0.3 (Haskell raytracer)

2008-04-18 Thread Sebastian Sylvan
On Fri, Apr 18, 2008 at 9:10 PM, Jim Snow <[EMAIL PROTECTED]> wrote:

> The scene is shared between threads.  (Complex scenes can be quite large.)
>  I'm assuming this is handled as a read-only shared memory region or
> something similar, as one might expect, and is not actually copied.
>

Ah, when people say "shared memory concurrency", they usually mean shared
*mutable* memory concurrency, which this isn't.


-- 
Sebastian Sylvan
+44(0)7857-300802
UIN: 44640862
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] announce: Glome.hs-0.3 (Haskell raytracer)

2008-04-18 Thread Jim Snow

David Roundy wrote:

On Sat, Apr 19, 2008 at 12:19:19AM +0400, Bulat Ziganshin wrote:
  

Saturday, April 19, 2008, 12:10:23 AM, you wrote:


The other problem I had with concurrency is that I was getting about a
50% speedup instead of the 99% or so that I'd expect on two cores.  I 
  

2 cores doesn't guarantee 2x speedup. some programs are limited by
memory access speed and you still have just one memory :)



In fact, this is relatively easily tested (albeit crudely):  just run two
copies of your single-threaded program at the same time.  If they take
longer than when run one at a time, you can guess that you're
memory-limited, and you won't get such good performance from threading your
code.  But this is only a crude hint, since memory performance is strongly
dependent on cache behavior, and running one threaded job may either do
better or worse than two single-threaded jobs.  If you've got two separate CPUs
with two separate caches, the simultaneous single-threaded jobs should beat the
threaded job (meaning take less than twice as long), since each job should
have full access to one cache.  If you've got two cores sharing a single
cache, the behavior may be the opposite:  the threaded job uses less total
memory than the two single-threaded jobs, so more of the data may stay in
cache.

For reference, on a friend's dual quad-core Intel system (i.e. 8 cores
total), if he runs 8 simultaneous (identical) memory-intensive job he only
gets about five times the throughput of a job, meaning that each core is
running at something like 60% of it's CPU capacity due to memory
contention.  It's possible that your system is comparably limited, although
I'd be suprised, somehow it seems unlikely that your ray tracer is
stressing the cache all that much.
  

On a particular scene with one instance of the single-threaded renderer
running, it takes about 19 seconds to render an image.  With two
instances running, they each take about 23 seconds.  This is on an
Athlon-64 3800+ dual core, with 512kB of L2 cache per core.  So, it
seems my memory really is slowing things down noticeably.

-jim

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] announce: Glome.hs-0.3 (Haskell raytracer)

2008-04-18 Thread David Roundy
On Sat, Apr 19, 2008 at 12:19:19AM +0400, Bulat Ziganshin wrote:
> Saturday, April 19, 2008, 12:10:23 AM, you wrote:
> > The other problem I had with concurrency is that I was getting about a
> > 50% speedup instead of the 99% or so that I'd expect on two cores.  I 
> 
> 2 cores doesn't guarantee 2x speedup. some programs are limited by
> memory access speed and you still have just one memory :)

In fact, this is relatively easily tested (albeit crudely):  just run two
copies of your single-threaded program at the same time.  If they take
longer than when run one at a time, you can guess that you're
memory-limited, and you won't get such good performance from threading your
code.  But this is only a crude hint, since memory performance is strongly
dependent on cache behavior, and running one threaded job may either do
better or worse than two single-threaded jobs.  If you've got two separate CPUs
with two separate caches, the simultaneous single-threaded jobs should beat the
threaded job (meaning take less than twice as long), since each job should
have full access to one cache.  If you've got two cores sharing a single
cache, the behavior may be the opposite:  the threaded job uses less total
memory than the two single-threaded jobs, so more of the data may stay in
cache.

For reference, on a friend's dual quad-core Intel system (i.e. 8 cores
total), if he runs 8 simultaneous (identical) memory-intensive job he only
gets about five times the throughput of a job, meaning that each core is
running at something like 60% of it's CPU capacity due to memory
contention.  It's possible that your system is comparably limited, although
I'd be suprised, somehow it seems unlikely that your ray tracer is
stressing the cache all that much.
-- 
David Roundy
Department of Physics
Oregon State University
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re[2]: [Haskell-cafe] announce: Glome.hs-0.3 (Haskell raytracer)

2008-04-18 Thread Bulat Ziganshin
Hello Jim,

Saturday, April 19, 2008, 12:10:23 AM, you wrote:

> The other problem I had with concurrency is that I was getting about a
> 50% speedup instead of the 99% or so that I'd expect on two cores.  I 

2 cores doesn't guarantee 2x speedup. some programs are limited by
memory access speed and you still have just one memory :)

-- 
Best regards,
 Bulatmailto:[EMAIL PROTECTED]

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] announce: Glome.hs-0.3 (Haskell raytracer)

2008-04-18 Thread Jim Snow

Don Stewart wrote:

jsnow:
  

A new version of my raytracer is out. ...



Very impressive. Did you consider cabalising the Haskell code, so it 
can be easily distributed from hackage.haskell.org?


I note on the website you say:

"no threading (shared-memory concurrency is not supported by ocaml,
in haskell it's buggy)"

Could you elaborate on this? Shared memory concurrency is a sweet spot
in Haskell, and heavily utilised, so I think we'd all like to know more
details..

-- Don
  

The concurrency bug has to do with excessive memory use, and was discussed 
earlier here on the mailing list (search for Glome).
http://hackage.haskell.org/trac/ghc/ticket/2185


The other problem I had with concurrency is that I was getting about a 
50% speedup instead of the 99% or so that I'd expect on two cores.  I 
figured I'm probably doing something wrong.


I don't have any objection to using cabal, I just haven't gone to the 
trouble to figure it out yet.  Maybe in the next release.



Sebastian Sylvan wrote:
Not sure what you need shared memory concurrency for in this case as 
it seems to be a straightforward parallelism problem (i.e. the 
different threads would be different pixels, there is no sharing needed).


The scene is shared between threads.  (Complex scenes can be quite 
large.)  I'm assuming this is handled as a read-only shared memory 
region or something similar, as one might expect, and is not actually 
copied.


-jim
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] announce: Glome.hs-0.3 (Haskell raytracer)

2008-04-18 Thread Sebastian Sylvan
On Fri, Apr 18, 2008 at 7:43 PM, Don Stewart <[EMAIL PROTECTED]> wrote:

> jsnow:
> > A new version of my raytracer is out.  It now supports cones, cylinders,
> > disks, boxes, and planes as base primitives (previously it only
> > supported triangles and spheres), as well as transformations of
> > arbitrary objects (rotate, scale, translate) and the CSG operations
> > difference and intersection.  Perlin noise and Blinn highlights have
> > been added, as well.
> >
> > http://syn.cs.pdx.edu/~jsnow/glome/
> >
> > Glome can parse NFF-format scene files (see
> > http://tog.acm.org/resources/SPD/), but many features are only
> > accessible via raw Haskell, since NFF doesn't support very many kinds of
> > primitives.  I included a TestScene.hs file that demonstrates how to
> > create a scene with various kinds of geometry (including a crude attempt
> > at a recursively-defined oak tree) in haskell.  There isn't any
> > documentation yet, but many of the primitives have constructors that
> > resemble their equivalents in povray, so anyone familiar with povray's
> > syntax should be able to figure out what's going on.
>
> Very impressive. Did you consider cabalising the Haskell code, so it
> can be easily distributed from hackage.haskell.org?
>
> I note on the website you say:
>
>"no threading (shared-memory concurrency is not supported by ocaml,
>in haskell it's buggy)"
>
> Could you elaborate on this? Shared memory concurrency is a sweet spot
> in Haskell, and heavily utilised, so I think we'd all like to know more
> details..
>

Not sure what you need shared memory concurrency for in this case as it
seems to be a straightforward parallelism problem (i.e. the different
threads would be different pixels, there is no sharing needed).

-- 
Sebastian Sylvan
+44(0)7857-300802
UIN: 44640862
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] announce: Glome.hs-0.3 (Haskell raytracer)

2008-04-18 Thread Don Stewart
jsnow:
> A new version of my raytracer is out.  It now supports cones, cylinders, 
> disks, boxes, and planes as base primitives (previously it only 
> supported triangles and spheres), as well as transformations of 
> arbitrary objects (rotate, scale, translate) and the CSG operations 
> difference and intersection.  Perlin noise and Blinn highlights have 
> been added, as well.
> 
> http://syn.cs.pdx.edu/~jsnow/glome/
> 
> Glome can parse NFF-format scene files (see 
> http://tog.acm.org/resources/SPD/), but many features are only 
> accessible via raw Haskell, since NFF doesn't support very many kinds of 
> primitives.  I included a TestScene.hs file that demonstrates how to 
> create a scene with various kinds of geometry (including a crude attempt 
> at a recursively-defined oak tree) in haskell.  There isn't any 
> documentation yet, but many of the primitives have constructors that 
> resemble their equivalents in povray, so anyone familiar with povray's 
> syntax should be able to figure out what's going on.

Very impressive. Did you consider cabalising the Haskell code, so it 
can be easily distributed from hackage.haskell.org?

I note on the website you say:

"no threading (shared-memory concurrency is not supported by ocaml,
in haskell it's buggy)"

Could you elaborate on this? Shared memory concurrency is a sweet spot
in Haskell, and heavily utilised, so I think we'd all like to know more
details..

-- Don
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] announce: Glome.hs-0.3 (Haskell raytracer)

2008-04-17 Thread Jim Snow
A new version of my raytracer is out.  It now supports cones, cylinders, 
disks, boxes, and planes as base primitives (previously it only 
supported triangles and spheres), as well as transformations of 
arbitrary objects (rotate, scale, translate) and the CSG operations 
difference and intersection.  Perlin noise and Blinn highlights have 
been added, as well.


http://syn.cs.pdx.edu/~jsnow/glome/

Glome can parse NFF-format scene files (see 
http://tog.acm.org/resources/SPD/), but many features are only 
accessible via raw Haskell, since NFF doesn't support very many kinds of 
primitives.  I included a TestScene.hs file that demonstrates how to 
create a scene with various kinds of geometry (including a crude attempt 
at a recursively-defined oak tree) in haskell.  There isn't any 
documentation yet, but many of the primitives have constructors that 
resemble their equivalents in povray, so anyone familiar with povray's 
syntax should be able to figure out what's going on.


-jim
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe