On Wed, 15 Apr 2020 20:59:40 GMT, Kevin Rushforth <k...@openjdk.org> wrote:
>> Here are the results on Phil's machine, which is a Mac Book Pro with a
>> graphics accelerator (Nvidia, I think).
>>
>> Without the patch:
>> 2000 quads average 8.805 fps
>>
>> With the patch:
>> 2000 quads average 4.719 fps
>>
>> Almost a 2x performance hit.
>
> Conclusion: The new shaders that support attenuation don't seem to have
> much of a performance impact on machines with an Intel HD, but on systems
> with a graphics accelerator, it is a significant slowdown.
>
> So we are left with the two choices of doubling the number of shaders
> (that is, a set of shaders with attenuation and a set without) or living
> with the performance hit (which will only be a problem on machines with a
> dedicated graphics accelerator for highly fill-limited scenes). The only
> way we can justify a 2x drop in performance is if we are fairly certain
> that this is a corner case, and thus unlikely to hit real applications.
>
> If we do end up deciding to replicate the shaders, I don't think it is all
> that much work. I'm more worried about how well it would scale to
> subsequent improvements, although we could easily decide that for, say,
> spotlights attenuation is so common that you wouldn't create a version
> that doesn't do that. In the D3D HLSL shaders, ifdefs are used, so the
> work would be to restore the original code and add the new code under an
> ifdef. Then double the number of lines of gradle (at that point, I'd do it
> in a for-each loop), then modify the logic that loads the shaders to pick
> the right one. For GLSL, the different parts of the shader are in
> different files, so it's a matter of creating new versions of each of the
> three lighting shaders that handle attenuation and choosing the right one
> at runtime.

I discussed this with a graphics engineer.
He said that a couple of branches have no real performance impact even on
modern mobile devices, and that, for example, on iOS 7 using half floats
instead of floats improved shader execution dramatically. Desktops with
NVIDIA or AMD cards, and even modern Intel cards, can process dozens of
branches with no significant performance degradation.

He suggested putting all the light types in a single shader file (looking
ahead here). He also suggested not permuting the shaders on the number of
lights, and instead passing that number in as a uniform and looping over
it. The permutations on the bump, specular and self-illumination components
are correct (not sure why we are not doing that for the diffuse component).
If we later add shadows, which is not on my near-term to-do list, then we
should permute there.

It also depends on our target hardware. If we take into account hardware
from, say, 2005, then branching may cause a significant performance loss,
but supporting it hinders our ability to increase performance on newer
hardware. What is the policy here?

I have a Win10 laptop with a GeForce 610M that I will test this weekend to
see if the mobile NVIDIA cards have some issue.

-------------

PR: https://git.openjdk.java.net/jfx/pull/43
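
To put rough numbers on the permutation question, here is a small Java
sketch of how the shader count grows with each axis we permute on. The
class name, the method name and every axis size below are placeholders I
made up for illustration; they are not the actual counts in the JavaFX
shader set. The only point is that an on/off attenuation axis doubles the
total, while turning the light count into a uniform removes a factor.

```java
public class ShaderVariants {
    // Number of shader variants = product of the alternatives on each permuted axis.
    // With no axes at all there is exactly one shader.
    public static int variants(int... alternativesPerAxis) {
        int total = 1;
        for (int n : alternativesPerAxis) {
            total *= n;
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical axis sizes, chosen for illustration only.
        int bump = 2;        // with / without bump map
        int specular = 3;    // assumed number of specular modes
        int selfIllum = 2;   // with / without self-illumination
        int lightCounts = 3; // assumed: one compiled shader per supported light count

        // Permuting on everything, before any attenuation split.
        int baseline = variants(bump, specular, selfIllum, lightCounts);
        // Adding a separate attenuation / no-attenuation axis.
        int withAttenuationAxis = variants(bump, specular, selfIllum, lightCounts, 2);
        // Passing the light count as a uniform instead of permuting on it.
        int lightCountAsUniform = variants(bump, specular, selfIllum);

        System.out.println("baseline: " + baseline);
        System.out.println("with attenuation on/off axis: " + withAttenuationAxis);
        System.out.println("light count as a uniform: " + lightCountAsUniform);
    }
}
```

Whatever the real axis sizes are, the shape is the same: each feature we
decide to permute on multiplies the total, which is the scaling concern
with replicating the shaders for every later improvement.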