Re: [Mesa-dev] [PATCH 0/5] Add ARB_derivative_control support
On Wed, Aug 13, 2014 at 9:52 PM, Ilia Mirkin imir...@alum.mit.edu wrote:
> I left all the variants as separate operations in the glsl ir. However for gallium I only added the fine version, as it seems like DDX can do pretty much whatever it wants. I was on the fence about adding coarse versions as well and then using the FragmentShaderDerivative hint to select one or the other in the glsl -> tgsi conversion.
>
> In the case of nv50/nvc0, doing the fine version is pretty much the only (easy) way of doing derivatives. I haven't traced the blob to see how it handles things yet. In any case, on nv50/nvc0 all this is completely moot, at least for now. Curious about what the situation with other hardware is.

i965 already implements coarse and fine derivatives, selectable by the derivatives hint, coarse default.

The calculation of the derivative itself isn't faster for coarse derivatives, but it was discovered that if all of the samples of a sample_d are from the same LOD, it's a bunch faster on Haswell at least. See commit 848c0e72. And with coarse derivatives they are. Maybe other hardware has similar optimizations?

> Also, the extension spec claims to require GLSL 4.00, which seems a little extreme. Instead I restrict it to core contexts. Let me know if I should change this.

Making it core-only doesn't help, nor does it satisfy the GLSL >= 4.0 requirement in the spec. I'm not sure if we have a way to arbitrarily limit an extension to being exposed under certain GLSL versions... ?

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
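As a rough illustration of why coarse derivatives put every pixel of a 2x2 quad at the same LOD, here is a small Python model of one quad. Nothing in it is Mesa or i965 code: the function names, and the choice of taking the coarse difference from the top row, are assumptions made for the sketch. A coarse derivative is a single value broadcast to all four pixels, so any LOD derived from it is uniform across the quad, whereas a fine derivative may differ between the top and bottom rows.

```python
# Model of one 2x2 pixel quad, indexed as quad[row][col].
# Row 0 is treated as the top row (an arbitrary choice for this sketch).

def fine_dfdx(quad):
    """One horizontal difference per row, shared by both pixels in that row."""
    top = quad[0][1] - quad[0][0]
    bottom = quad[1][1] - quad[1][0]
    return [[top, top], [bottom, bottom]]

def coarse_dfdx(quad):
    """A single horizontal difference broadcast to all four pixels.
    Taking it from the top row is an assumption; an implementation
    may pick differently."""
    d = quad[0][1] - quad[0][0]
    return [[d, d], [d, d]]

def distinct_values(grid):
    """Sorted list of the distinct values found in a 2x2 grid."""
    return sorted({v for row in grid for v in row})

# A coordinate u = x * y sampled at pixels x in (2, 3), y in (4, 5).
u = [[x * y for x in (2, 3)] for y in (4, 5)]

print(distinct_values(coarse_dfdx(u)))  # [4]    -> one gradient per quad
print(distinct_values(fine_dfdx(u)))    # [4, 5] -> one gradient per row
```

With one gradient per quad, all four pixels would feed identical derivatives to a sample_d-style lookup, which is the condition described above as being faster on Haswell.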
Re: [Mesa-dev] [PATCH 0/5] Add ARB_derivative_control support
On 08/13/2014 11:58 PM, Matt Turner wrote:
> On Wed, Aug 13, 2014 at 9:52 PM, Ilia Mirkin imir...@alum.mit.edu wrote:
>> I left all the variants as separate operations in the glsl ir. However for gallium I only added the fine version, as it seems like DDX can do pretty much whatever it wants. I was on the fence about adding coarse versions as well and then using the FragmentShaderDerivative hint to select one or the other in the glsl -> tgsi conversion.
>>
>> In the case of nv50/nvc0, doing the fine version is pretty much the only (easy) way of doing derivatives. I haven't traced the blob to see how it handles things yet. In any case, on nv50/nvc0 all this is completely moot, at least for now. Curious about what the situation with other hardware is.
>
> i965 already implements coarse and fine derivatives, selectable by the derivatives hint, coarse default.

I don't think that's the same thing. The fine derivatives in i965 definitely do not meet this requirement:

    ...second-order fine derivatives, e.g., dFdxFine(dFdxFine(x)) will properly reflect the difference between the independent fine derivatives computed within the 2x2 square.

As it is now, dFdxFine(dFdxFine(x*x*x)) will always be zero in the i965 driver. Two pixels on the same line will have different dFdy, but the dFdx will be the same. Right?

Is there a piglit test for that specific part? (I haven't looked at the piglit list at all.)

> The calculation of the derivative itself isn't faster for coarse derivatives, but it was discovered that if all of the samples of a sample_d are from the same LOD, it's a bunch faster on Haswell at least. See commit 848c0e72. And with coarse derivatives they are. Maybe other hardware has similar optimizations?
>
>> Also, the extension spec claims to require GLSL 4.00, which seems a little extreme. Instead I restrict it to core contexts. Let me know if I should change this.
>
> Making it core-only doesn't help, nor does it satisfy the GLSL >= 4.0 requirement in the spec. I'm not sure if we have a way to arbitrarily limit an extension to being exposed under certain GLSL versions... ?
Re: [Mesa-dev] [PATCH 0/5] Add ARB_derivative_control support
On Thu, Aug 14, 2014 at 1:30 PM, Ian Romanick i...@freedesktop.org wrote:
> On 08/13/2014 11:58 PM, Matt Turner wrote:
>> On Wed, Aug 13, 2014 at 9:52 PM, Ilia Mirkin imir...@alum.mit.edu wrote:
>>> I left all the variants as separate operations in the glsl ir. However for gallium I only added the fine version, as it seems like DDX can do pretty much whatever it wants. I was on the fence about adding coarse versions as well and then using the FragmentShaderDerivative hint to select one or the other in the glsl -> tgsi conversion.
>>>
>>> In the case of nv50/nvc0, doing the fine version is pretty much the only (easy) way of doing derivatives. I haven't traced the blob to see how it handles things yet. In any case, on nv50/nvc0 all this is completely moot, at least for now. Curious about what the situation with other hardware is.
>>
>> i965 already implements coarse and fine derivatives, selectable by the derivatives hint, coarse default.
>
> I don't think that's the same thing. The fine derivatives in i965 definitely do not meet this requirement:
>
>     ...second-order fine derivatives, e.g., dFdxFine(dFdxFine(x)) will properly reflect the difference between the independent fine derivatives computed within the 2x2 square.
>
> As it is now, dFdxFine(dFdxFine(x*x*x)) will always be zero in the i965 driver. Two pixels on the same line will have different dFdy, but the dFdx will be the same. Right?

I sent a question about this to the list earlier (with no response other than my own), but I believe that to be a typo in the spec. Look at Issue 2, which explicitly talks about dFdxFine(dFdyFine(...)). There's no way to get second-order derivatives in a single variable with only 2 points, so it would want a larger block.

> Is there a piglit test for that specific part? (I haven't looked at the piglit list at all.)

There's a piglit test for dFdxFine(dFdyFine()).

>> The calculation of the derivative itself isn't faster for coarse derivatives, but it was discovered that if all of the samples of a sample_d are from the same LOD, it's a bunch faster on Haswell at least. See commit 848c0e72. And with coarse derivatives they are. Maybe other hardware has similar optimizations?
>>
>>> Also, the extension spec claims to require GLSL 4.00, which seems a little extreme. Instead I restrict it to core contexts. Let me know if I should change this.
>>
>> Making it core-only doesn't help, nor does it satisfy the GLSL >= 4.0 requirement in the spec. I'm not sure if we have a way to arbitrarily limit an extension to being exposed under certain GLSL versions... ?
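The argument above can be checked with a small model of a 2x2 quad. This is a sketch in Python, assuming fine derivatives are plain differences within a row (dFdx) or within a column (dFdy) of the quad; the helper names are made up for illustration and are not Mesa code.

```python
# Model of one 2x2 pixel quad, indexed as quad[row][col].

def fine_dfdx(quad):
    """Horizontal difference per row; both pixels of a row share it."""
    top = quad[0][1] - quad[0][0]
    bottom = quad[1][1] - quad[1][0]
    return [[top, top], [bottom, bottom]]

def fine_dfdy(quad):
    """Vertical difference per column; both pixels of a column share it."""
    left = quad[1][0] - quad[0][0]
    right = quad[1][1] - quad[0][1]
    return [[left, right], [left, right]]

def sample(f, x0=2, y0=4):
    """Evaluate f(x, y) on the quad whose first pixel is (x0, y0)."""
    return [[f(x, y) for x in (x0, x0 + 1)] for y in (y0, y0 + 1)]

cubic = sample(lambda x, y: x * x * x)
product = sample(lambda x, y: x * y)

# Second derivative in x: the first dFdx is constant along each row,
# so differencing it along the row again always gives zero.
print(fine_dfdx(fine_dfdx(cubic)))    # [[0, 0], [0, 0]]

# Mixed derivative: dFdy differs between columns, so dFdx of it can
# be non-zero (the mixed derivative of x*y is 1).
print(fine_dfdx(fine_dfdy(product)))  # [[1, 1], [1, 1]]
```

With only two samples per row there is a single difference to take, which is why the same-variable second derivative collapses to zero while the mixed one survives.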
Re: [Mesa-dev] [PATCH 0/5] Add ARB_derivative_control support
On 08/14/2014 10:30 AM, Ian Romanick wrote:
> On 08/13/2014 11:58 PM, Matt Turner wrote:
>> On Wed, Aug 13, 2014 at 9:52 PM, Ilia Mirkin imir...@alum.mit.edu wrote:
>>> I left all the variants as separate operations in the glsl ir. However for gallium I only added the fine version, as it seems like DDX can do pretty much whatever it wants. I was on the fence about adding coarse versions as well and then using the FragmentShaderDerivative hint to select one or the other in the glsl -> tgsi conversion.
>>>
>>> In the case of nv50/nvc0, doing the fine version is pretty much the only (easy) way of doing derivatives. I haven't traced the blob to see how it handles things yet. In any case, on nv50/nvc0 all this is completely moot, at least for now. Curious about what the situation with other hardware is.
>>
>> i965 already implements coarse and fine derivatives, selectable by the derivatives hint, coarse default.
>
> I don't think that's the same thing. The fine derivatives in i965 definitely do not meet this requirement:
>
>     ...second-order fine derivatives, e.g., dFdxFine(dFdxFine(x)) will properly reflect the difference between the independent fine derivatives computed within the 2x2 square.
>
> As it is now, dFdxFine(dFdxFine(x*x*x)) will always be zero in the i965 driver. Two pixels on the same line will have different dFdy, but the dFdx will be the same. Right?

Hm... the overview in the extension also says:

    For the fine-granularity derivative, two derivatives could be computed for each 2x2 group of pixels; one for the top row and one for the bottom row.

This matches the fine derivatives in the i965 driver, but it seems at odds with the second-order derivative line in the GLSL 4.50 spec. I guess I'll submit a spec bug...

> Is there a piglit test for that specific part? (I haven't looked at the piglit list at all.)
>
>> The calculation of the derivative itself isn't faster for coarse derivatives, but it was discovered that if all of the samples of a sample_d are from the same LOD, it's a bunch faster on Haswell at least. See commit 848c0e72. And with coarse derivatives they are. Maybe other hardware has similar optimizations?
>>
>>> Also, the extension spec claims to require GLSL 4.00, which seems a little extreme. Instead I restrict it to core contexts. Let me know if I should change this.
>>
>> Making it core-only doesn't help, nor does it satisfy the GLSL >= 4.0 requirement in the spec. I'm not sure if we have a way to arbitrarily limit an extension to being exposed under certain GLSL versions... ?
Re: [Mesa-dev] [PATCH 0/5] Add ARB_derivative_control support
On 08/14/2014 10:35 AM, Ilia Mirkin wrote:
> On Thu, Aug 14, 2014 at 1:30 PM, Ian Romanick i...@freedesktop.org wrote:
>> On 08/13/2014 11:58 PM, Matt Turner wrote:
>>> On Wed, Aug 13, 2014 at 9:52 PM, Ilia Mirkin imir...@alum.mit.edu wrote:
>>>> I left all the variants as separate operations in the glsl ir. However for gallium I only added the fine version, as it seems like DDX can do pretty much whatever it wants. I was on the fence about adding coarse versions as well and then using the FragmentShaderDerivative hint to select one or the other in the glsl -> tgsi conversion.
>>>>
>>>> In the case of nv50/nvc0, doing the fine version is pretty much the only (easy) way of doing derivatives. I haven't traced the blob to see how it handles things yet. In any case, on nv50/nvc0 all this is completely moot, at least for now. Curious about what the situation with other hardware is.
>>>
>>> i965 already implements coarse and fine derivatives, selectable by the derivatives hint, coarse default.
>>
>> I don't think that's the same thing. The fine derivatives in i965 definitely do not meet this requirement:
>>
>>     ...second-order fine derivatives, e.g., dFdxFine(dFdxFine(x)) will properly reflect the difference between the independent fine derivatives computed within the 2x2 square.
>>
>> As it is now, dFdxFine(dFdxFine(x*x*x)) will always be zero in the i965 driver. Two pixels on the same line will have different dFdy, but the dFdx will be the same. Right?
>
> I sent a question about this to the list earlier (with no response other than my own), but I believe that to be a typo in the spec. Look at Issue 2, which explicitly talks about dFdxFine(dFdyFine(...)). There's no way to get second-order derivatives in a single variable with only 2 points, so it would want a larger block.

Right... I'm at SIGGRAPH, so I haven't been keeping up with the mailing list very well. I did some research in the Khronos (non-public) mailing list archives, and I came to the same conclusion. I've submitted a spec bug, so hopefully this will be fixed soon.

>> Is there a piglit test for that specific part? (I haven't looked at the piglit list at all.)
>
> There's a piglit test for dFdxFine(dFdyFine()).

Excellent. :)

>>> The calculation of the derivative itself isn't faster for coarse derivatives, but it was discovered that if all of the samples of a sample_d are from the same LOD, it's a bunch faster on Haswell at least. See commit 848c0e72. And with coarse derivatives they are. Maybe other hardware has similar optimizations?
>>>
>>>> Also, the extension spec claims to require GLSL 4.00, which seems a little extreme. Instead I restrict it to core contexts. Let me know if I should change this.
>>>
>>> Making it core-only doesn't help, nor does it satisfy the GLSL >= 4.0 requirement in the spec. I'm not sure if we have a way to arbitrarily limit an extension to being exposed under certain GLSL versions... ?
[Mesa-dev] [PATCH 0/5] Add ARB_derivative_control support
I left all the variants as separate operations in the glsl ir. However for gallium I only added the fine version, as it seems like DDX can do pretty much whatever it wants. I was on the fence about adding coarse versions as well and then using the FragmentShaderDerivative hint to select one or the other in the glsl -> tgsi conversion.

In the case of nv50/nvc0, doing the fine version is pretty much the only (easy) way of doing derivatives. I haven't traced the blob to see how it handles things yet. In any case, on nv50/nvc0 all this is completely moot, at least for now. Curious about what the situation with other hardware is.

Also, the extension spec claims to require GLSL 4.00, which seems a little extreme. Instead I restrict it to core contexts. Let me know if I should change this.

I will try to send some piglits out for this soon, but it's all fairly straightforward...

Ilia Mirkin (5):
  mesa: add ARB_derivative_control extension bit
  glsl: add ARB_derivative control support
  gallium: add opcodes/cap for fine derivative support
  mesa/st: add support for emitting fine derivative opcodes
  nv50,nvc0: add support for fine derivatives

 docs/GL3.txt                                      |  2 +-
 src/gallium/auxiliary/tgsi/tgsi_info.c            |  3 ++
 src/gallium/auxiliary/tgsi/tgsi_util.c            |  2 +
 src/gallium/docs/source/screen.rst                |  2 +
 src/gallium/docs/source/tgsi.rst                  | 12 +-
 src/gallium/drivers/freedreno/freedreno_screen.c  |  1 +
 src/gallium/drivers/i915/i915_screen.c            |  1 +
 src/gallium/drivers/ilo/ilo_screen.c              |  1 +
 src/gallium/drivers/llvmpipe/lp_screen.c          |  1 +
 .../drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp |  4 ++
 src/gallium/drivers/nouveau/nv30/nv30_screen.c    |  1 +
 src/gallium/drivers/nouveau/nv50/nv50_screen.c    |  1 +
 src/gallium/drivers/nouveau/nvc0/nvc0_screen.c    |  1 +
 src/gallium/drivers/r300/r300_screen.c            |  1 +
 src/gallium/drivers/r600/r600_pipe.c              |  1 +
 src/gallium/drivers/radeonsi/si_pipe.c            |  1 +
 src/gallium/drivers/softpipe/sp_screen.c          |  1 +
 src/gallium/drivers/svga/svga_screen.c            |  1 +
 src/gallium/drivers/vc4/vc4_screen.c              |  1 +
 src/gallium/include/pipe/p_defines.h              |  1 +
 src/gallium/include/pipe/p_shader_tokens.h        |  5 ++-
 src/glsl/builtin_functions.cpp                    | 48 ++
 src/glsl/glcpp/glcpp-parse.y                      |  3 ++
 src/glsl/glsl_parser_extras.cpp                   |  1 +
 src/glsl/glsl_parser_extras.h                     |  2 +
 src/glsl/ir.h                                     |  4 ++
 src/glsl/ir_validate.cpp                          |  4 ++
 src/mesa/main/extensions.c                        |  1 +
 src/mesa/main/mtypes.h                            |  1 +
 src/mesa/state_tracker/st_extensions.c            |  3 +-
 src/mesa/state_tracker/st_glsl_to_tgsi.cpp        |  9 +++-
 31 files changed, 114 insertions(+), 6 deletions(-)

--
1.8.5.5
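The split described in the cover letter (separate operations per variant in the GLSL IR, but only an extra fine opcode on the gallium side) could be pictured roughly as the lookup below. This is a hypothetical sketch in Python, not the code from the series: the table, the function name and the opcode strings are illustrative assumptions. The only thing taken from the text is the idea that the unqualified and coarse built-ins can share the plain DDX/DDY opcode, since DDX "can do pretty much whatever it wants", while the fine built-ins get a dedicated one.

```python
# Hypothetical lowering table: GLSL derivative built-in -> TGSI-style
# opcode name. The strings are illustrative, not copied from Mesa.
LOWERING = {
    "dFdx":       "DDX",
    "dFdxCoarse": "DDX",       # coarse reuses the plain opcode
    "dFdxFine":   "DDX_FINE",  # fine gets a dedicated opcode
    "dFdy":       "DDY",
    "dFdyCoarse": "DDY",
    "dFdyFine":   "DDY_FINE",
}

def lower_derivative(builtin):
    """Return the opcode name for a derivative built-in, or raise
    ValueError if the name is not one of the six variants."""
    try:
        return LOWERING[builtin]
    except KeyError:
        raise ValueError(f"not a derivative built-in: {builtin}") from None

for name in ("dFdx", "dFdxCoarse", "dFdxFine"):
    print(name, "->", lower_derivative(name))
```

Under this arrangement a driver that always computes fine derivatives can treat both opcodes identically, which is consistent with the remark that the distinction is moot on nv50/nvc0 for now.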