On Wed, 15 Apr 2020 20:59:40 GMT, Kevin Rushforth <k...@openjdk.org> wrote:

>> Here are the results on Phil's machine, which is a MacBook Pro with a
>> graphics accelerator (Nvidia, I think).
>> 
>> Without the patch:
>> 2000 quads average 8.805 fps
>> 
>> With the patch:
>> 2000 quads average 4.719 fps
>> 
>> Almost a 2x performance hit.
>
> Conclusion: The new shaders that support attenuation don't seem to have much 
> of a performance impact on machines with
> an Intel HD, but on systems with a graphics accelerator, it is a significant 
> slowdown.
> So we are left with the two choices of doubling the number of shaders (that 
> is, a set of shaders with attenuation and a
> set without) or living with the performance hit (which will only be a problem 
> on machines with a dedicated graphics
> accelerator for highly fill-limited scenes). The only way we can justify a 2x 
> drop in performance is if we are fairly
> certain that this is a corner case, and thus unlikely to hit real 
> applications.  If we do end up deciding to replicate
> the shaders, I don't think it is all that much work. I'm more worried about 
> how well it would scale to subsequent
> improvements, although we could easily decide that for, say, spotlights 
> attenuation is so common that you wouldn't
> create a version that doesn't do that.  In the D3D HLSL shaders, ifdefs are 
> used, so the work would be to restore the
> original code and add the new code under an ifdef. Then double the number of 
> lines of gradle (at that point, I'd do it
> in a for-each loop), then modify the logic that loads the shaders to pick the 
> right one.  For GLSL, the different parts
> of the shader are in different files, so it's a matter of creating new 
> versions of each of the three lighting shaders
> that handle attenuation and choosing the right one at runtime.

I discussed this with a graphics engineer. He said that a couple of
branches have no real performance impact, even on modern mobile devices,
and that on iOS 7, for example, using half floats instead of floats
improved shader execution dramatically. Desktops with modern NVIDIA, AMD,
and even Intel cards can process dozens of branches with no significant
performance degradation.

He suggested putting all the light types in a single shader file (looking
ahead here). He also suggested not permuting the shaders on the number of
lights, and instead passing that number in as a uniform and looping over
it. The permutations on the bump, specular, and self-illumination
components are correct (I'm not sure why we are not doing that for the
diffuse component). If we later add shadows, which is not on my near-term
to-do list, then we should permute there.
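
To make the trade-off concrete, here is a rough sketch in Java. It is not
the actual JavaFX shader-loading code: the class name, the variant naming
scheme, and the limit of three lights are assumptions for illustration
only. It enumerates the shader variants we would have to build with and
without permuting on the light count.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of JavaFX: lists shader variant names.
final class ShaderVariants {

    // Enumerates variants over the bump, specular, and self-illumination
    // toggles. If permuteOnLightCount is true, one variant is emitted per
    // light count (1..maxLights); otherwise a single variant is emitted
    // that would take the light count as a uniform and loop over it.
    static List<String> names(int maxLights, boolean permuteOnLightCount) {
        List<String> out = new ArrayList<>();
        boolean[] onOff = {false, true};
        int lightVariants = permuteOnLightCount ? maxLights : 1;
        for (boolean bump : onOff) {
            for (boolean specular : onOff) {
                for (boolean selfIllum : onOff) {
                    for (int i = 1; i <= lightVariants; i++) {
                        // Made-up naming scheme, for illustration only.
                        String lights = permuteOnLightCount
                                ? i + "Light"
                                : "NLights";
                        out.add("main"
                                + (bump ? "_bump" : "")
                                + (specular ? "_specular" : "")
                                + (selfIllum ? "_selfIllum" : "")
                                + "_" + lights);
                    }
                }
            }
        }
        return out;
    }
}
```

Under these assumptions, names(3, true) yields 24 variants and
names(3, false) yields 8, so every further feature we permute on multiplies
a smaller base when the light count is a uniform.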

It also depends on our target hardware. If we take into account hardware
from, say, 2005, then branching may cause a significant performance loss,
but supporting it hinders our ability to improve performance on newer
hardware. What is the policy here?

I have a Win10 laptop with a GeForce 610M that I will test this weekend to
see whether mobile NVIDIA cards have an issue.

-------------

PR: https://git.openjdk.java.net/jfx/pull/43
