So the parallelization of SmallPT uses `dynamic` scheduling:
    
    
    #pragma omp parallel for schedule(dynamic, 1) private(r)       // OpenMP
      for (int y=0; y<h; y++){                       // Loop over image rows
    
    

This matters a lot because some rays hit nothing and terminate immediately, 
while others bounce all over the place. Worse, an image often has wide 
expanses of "easy" regions (sky, walls, ...). With a static split, the threads 
assigned those regions finish early and sit idle, while the threads assigned 
the complex parts are left to do the remaining work all alone.

Example: 
[https://www.cs.brandeis.edu/~dilant/WebPage_TA160/initialsllides.pdf](https://www.cs.brandeis.edu/~dilant/WebPage_TA160/initialsllides.pdf)

Image: 
[https://gist.githubusercontent.com/mratsim/aaecdb8d77582bcfc2994b0ee66b99d5/raw/6cb890ee5ff68946133ba45fd9013ff555156fa2/2020-05-22_18-33.png](https://gist.githubusercontent.com/mratsim/aaecdb8d77582bcfc2994b0ee66b99d5/raw/6cb890ee5ff68946133ba45fd9013ff555156fa2/2020-05-22_18-33.png)

A runtime without any load balancing wouldn't be able to scale a raytracing 
application (which makes it an interesting load balancing benchmark).
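To put rough numbers on the imbalance described above, here is a small simulation, not SmallPT itself. It assumes a made-up image of 12 rows (8 cheap "sky" rows of cost 1 followed by 4 expensive rows of cost 10) and compares the finish time of a contiguous static split against a greedy "next free thread takes the next row" model of `schedule(dynamic, 1)`. The function names and the cost values are hypothetical, and the static model only approximates what a real OpenMP runtime does.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Finish time when rows are cut into contiguous, roughly equal blocks,
// one block per thread (an approximation of schedule(static)).
double static_makespan(const std::vector<double>& cost, int threads) {
    assert(threads > 0);
    int n = static_cast<int>(cost.size());
    int chunk = (n + threads - 1) / threads;  // ceil(n / threads)
    double worst = 0.0;
    for (int t = 0; t < threads; t++) {
        double busy = 0.0;
        int end = std::min(n, (t + 1) * chunk);
        for (int y = t * chunk; y < end; y++) busy += cost[y];
        worst = std::max(worst, busy);  // the slowest thread decides the total time
    }
    return worst;
}

// Finish time when each row goes to whichever thread becomes free first
// (a simple model of schedule(dynamic, 1)).
double dynamic_makespan(const std::vector<double>& cost, int threads) {
    assert(threads > 0);
    std::vector<double> busy(threads, 0.0);
    for (double c : cost) {
        *std::min_element(busy.begin(), busy.end()) += c;
    }
    return *std::max_element(busy.begin(), busy.end());
}

// Hypothetical image: 8 cheap rows (cost 1), then 4 expensive rows (cost 10).
std::vector<double> example_costs() {
    std::vector<double> cost(8, 1.0);
    cost.insert(cost.end(), 4, 10.0);
    return cost;
}
```

With 4 threads, `static_makespan(example_costs(), 4)` gives 30 (one thread gets three expensive rows) while `dynamic_makespan(example_costs(), 4)` gives 12, which is the ideal 48 / 4 for this toy input.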

I suspect the GCC team broke dynamic scheduling and it now defaults to static. 
An easy way to confirm that is to change the schedule from dynamic to static, 
rerun with GCC 8 and Clang, and see if the perf degradation matches GCC 10.

I leave that as an exercise to the reader (just joking).
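For anyone who does want to try it, one way to switch schedules without editing the source each time is OpenMP's `schedule(runtime)`, which reads the policy from the `OMP_SCHEDULE` environment variable. The sketch below is a stand-in, not SmallPT: `render_row` and `render_all` are hypothetical names, and the per-row cost is artificial (it simply grows with `y`) so that the rows are unbalanced.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for rendering one image row: the work grows with y,
// so later rows are more expensive than earlier ones.
double render_row(int y) {
    double acc = 0.0;
    for (int i = 0; i < (y + 1) * 1000; i++) acc += 1.0 / (1.0 + i);
    return acc;
}

// Renders h rows in parallel. schedule(runtime) defers the scheduling policy
// to the OMP_SCHEDULE environment variable. Build with -fopenmp; without it
// the pragma is ignored and the loop runs serially.
double render_all(int h) {
    assert(h > 0);
    std::vector<double> rows(h);
    #pragma omp parallel for schedule(runtime)
    for (int y = 0; y < h; y++) {
        rows[y] = render_row(y);  // each iteration writes its own slot
    }
    double total = 0.0;
    for (double r : rows) total += r;  // summed serially, so the result is deterministic
    return total;
}
```

Calling `render_all` from a small `main` and timing the binary once with `OMP_SCHEDULE="static"` and once with `OMP_SCHEDULE="dynamic,1"`, for each compiler, should show whether the two policies actually behave differently. The same change (`schedule(runtime)` in place of `schedule(dynamic, 1)`) would presumably work on the SmallPT pragma too, but I have not verified that here.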
