Re: [Mesa-dev] [PATCH] clover: support CL_PROGRAM_BINARY_TYPE query
On Saturday, August 09, 2014 01:18:57 AM Ilia Mirkin wrote: On Fri, Aug 8, 2014 at 10:10 PM, EdB edb+m...@sigluy.net wrote: --- src/gallium/state_trackers/clover/api/program.cpp | 3 +++ src/gallium/state_trackers/clover/core/program.cpp | 8 src/gallium/state_trackers/clover/core/program.hpp | 1 + 3 files changed, 12 insertions(+) diff --git a/src/gallium/state_trackers/clover/api/program.cpp b/src/gallium/state_trackers/clover/api/program.cpp index b81ce69..0e9e3c9 100644 --- a/src/gallium/state_trackers/clover/api/program.cpp +++ b/src/gallium/state_trackers/clover/api/program.cpp @@ -266,6 +266,9 @@ clGetProgramBuildInfo(cl_program d_prog, cl_device_id d_dev, buf.as_string() = prog.build_log(dev); break; + case CL_PROGRAM_BINARY_TYPE: + buf.as_scalarcl_program_binary_type() = prog.binary_type(dev); break? Thanks + default: throw error(CL_INVALID_VALUE); } diff --git a/src/gallium/state_trackers/clover/core/program.cpp b/src/gallium/state_trackers/clover/core/program.cpp index e09c3aa..482df7e 100644 --- a/src/gallium/state_trackers/clover/core/program.cpp +++ b/src/gallium/state_trackers/clover/core/program.cpp @@ -103,6 +103,14 @@ program::build_log(const device dev) const { return _logs.count(dev) ? _logs.find(dev)-second : ; } +cl_program_binary_type +program::binary_type(const device dev) const { + if (!_binaries.count(dev)) + return CL_PROGRAM_BINARY_TYPE_NONE; + else + return CL_PROGRAM_BINARY_TYPE_EXECUTABLE; +} + const compat::vectormodule::symbol program::symbols() const { if (_binaries.empty()) diff --git a/src/gallium/state_trackers/clover/core/program.hpp b/src/gallium/state_trackers/clover/core/program.hpp index 1081454..b932b95 100644 --- a/src/gallium/state_trackers/clover/core/program.hpp +++ b/src/gallium/state_trackers/clover/core/program.hpp @@ -57,6 +57,7 @@ namespace clover { cl_build_status build_status(const device dev) const; std::string build_opts(const device dev) const; std::string build_log(const device dev) const; + cl_program_binary_type binary_type(const device dev) const; const compat::vectormodule::symbol symbols() const; -- 2.0.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH v2] clover: support CL_PROGRAM_BINARY_TYPE query
--- src/gallium/state_trackers/clover/api/program.cpp | 4 src/gallium/state_trackers/clover/core/program.cpp | 8 src/gallium/state_trackers/clover/core/program.hpp | 1 + 3 files changed, 13 insertions(+) diff --git a/src/gallium/state_trackers/clover/api/program.cpp b/src/gallium/state_trackers/clover/api/program.cpp index b81ce69..c3fe129 100644 --- a/src/gallium/state_trackers/clover/api/program.cpp +++ b/src/gallium/state_trackers/clover/api/program.cpp @@ -266,6 +266,10 @@ clGetProgramBuildInfo(cl_program d_prog, cl_device_id d_dev, buf.as_string() = prog.build_log(dev); break; + case CL_PROGRAM_BINARY_TYPE: + buf.as_scalarcl_program_binary_type() = prog.binary_type(dev); + break; + default: throw error(CL_INVALID_VALUE); } diff --git a/src/gallium/state_trackers/clover/core/program.cpp b/src/gallium/state_trackers/clover/core/program.cpp index e09c3aa..482df7e 100644 --- a/src/gallium/state_trackers/clover/core/program.cpp +++ b/src/gallium/state_trackers/clover/core/program.cpp @@ -103,6 +103,14 @@ program::build_log(const device dev) const { return _logs.count(dev) ? _logs.find(dev)-second : ; } +cl_program_binary_type +program::binary_type(const device dev) const { + if (!_binaries.count(dev)) + return CL_PROGRAM_BINARY_TYPE_NONE; + else + return CL_PROGRAM_BINARY_TYPE_EXECUTABLE; +} + const compat::vectormodule::symbol program::symbols() const { if (_binaries.empty()) diff --git a/src/gallium/state_trackers/clover/core/program.hpp b/src/gallium/state_trackers/clover/core/program.hpp index 1081454..b932b95 100644 --- a/src/gallium/state_trackers/clover/core/program.hpp +++ b/src/gallium/state_trackers/clover/core/program.hpp @@ -57,6 +57,7 @@ namespace clover { cl_build_status build_status(const device dev) const; std::string build_opts(const device dev) const; std::string build_log(const device dev) const; + cl_program_binary_type binary_type(const device dev) const; const compat::vectormodule::symbol symbols() const; -- 2.0.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 0/2] radeon: Use the DMA engine for buffer downloads
On Tuesday 04 March 2014, 02:08:58, Marek Olšák wrote: Could you please do this without changing u_upload_mgr? You can still use u_upload_alloc to allocate buffer memory in the driver and the map buffer read/write flags are not important with persistent coherent buffer mappings anyway. Since 150ac07b855b5c5f879bf6ce9ca421ccd1a6c938 we allocate CPU - GPU streaming buffers (i. e. those with PIPE_USAGE_STREAM) in VRAM. We should therefore set buffer.usage to PIPE_USAGE_STAGING in u_upload_alloc_buffer when we use u_upload_mgr for downloads - otherwise we won't get any performance improvements. Would it now be OK to change u_upload_mgr or do you have a better proposal? Ole Marek On Mon, Mar 3, 2014 at 9:29 PM, Niels Ole Salscheider niels_...@salscheider-online.de wrote: Using the DMA engine for buffer downloads vastly improves performance. This is because reads from VRAM by the CPU are slow because of the high latency of the PCIe bus. The first patch allows u_upload_mgr to be used for downloads, too. The second patch then uses u_upload_mgr in the radeon driver for downloads. I considered to rename u_upload_mgr to u_transfer_mgr since it might be confusing that an upload manager can be used for downloads. But then again we also have transfers so that u_transfer_mgr might also be confusing. Thus, I decided not to rename it for now. Without these patches, the buffer_bandwidth benchmark from uCLbench gives me: ./buffer_bandwidth --size=2000 --iterations=100 # device 0: AMD BARTS // type gpu (192 MB global memory, 64 KB constant memory, 32 KB local memory) 1/1 direct 2000 Bytes 759.29 MB/s(HD) 17.13 MB/s(DD) 14.61 MB/s(DH) With these paches, the read performance is much better: ./buffer_bandwidth --size=2000 --iterations=100 # device 0: AMD BARTS // type gpu (192 MB global memory, 64 KB constant memory, 32 KB local memory) 1/1 direct 2000 Bytes 759.90 MB/s(HD) 613.49 MB/s(DD) 1841.07 MB/s(DH) Judging by these numbers, it might even make sense to use the DMA engine for larger buffer downloads... Niels Ole Salscheider (2): util/u_upload_mgr: Allow to also use it for downloads radeon: Use transfer manager for buffer downloads src/gallium/auxiliary/hud/hud_context.c | 3 +- src/gallium/auxiliary/util/u_blitter.c | 3 +- src/gallium/auxiliary/util/u_upload_mgr.c | 49 +++- src/gallium/auxiliary/util/u_upload_mgr.h | 13 - src/gallium/auxiliary/util/u_vbuf.c | 3 +- src/gallium/auxiliary/vl/vl_compositor.c| 3 +- src/gallium/drivers/ilo/ilo_context.c | 3 +- src/gallium/drivers/r300/r300_context.c | 3 +- src/gallium/drivers/radeon/r600_buffer_common.c | 78 +++-- src/gallium/drivers/radeon/r600_pipe_common.c | 14 - src/gallium/drivers/radeon/r600_pipe_common.h | 1 + src/mesa/state_tracker/st_context.c | 9 ++- 12 files changed, 136 insertions(+), 46 deletions(-) -- 1.9.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] mesa/st: add support for dynamic sampler offsets
Acked-by: Marek Olšák marek.ol...@amd.com Marek On Sat, Aug 9, 2014 at 7:52 AM, Ilia Mirkin imir...@alum.mit.edu wrote: So... can I get a review on this? It's the last bit needed for ARB_gs5 (well, except for the actual setting of the extension bit to 1) The only additional change I have in my local version is that instead of + inst-sampler_array_size = + ir-sampler-as_dereference_array() +-array-variable_referenced()-type-length; I use ir-sampler-as_dereference_array()-array-type-array_size() I sent out a few piglits and they pass with the nvc0 driver on both fermi and kepler hw (which do textures a little differently from one another). -ilia On Wed, Aug 6, 2014 at 12:02 PM, Marek Olšák mar...@gmail.com wrote: On Wed, Aug 6, 2014 at 5:53 PM, Ilia Mirkin imir...@alum.mit.edu wrote: pc-MaxAddressRegs = pc-MaxNativeAddressRegs = _min(screen-get_shader_param(screen, sh, PIPE_SHADER_CAP_MAX_ADDRS), MAX_PROGRAM_ADDRESS_REGS); Not really sure what that's referring to... ARB_vp/fp or something? Yes, ARB_vp needs 1, ARB_fp doesn't support indirect addresing (expects 0). Anyways, this is definitely a bit of a violation of that. OTOH, so is the indirect UBO indexing and indirect GS input access (assuming that's allowed), since those would use ADDR[1] and every driver (except nv30) returns 1, and sometimes 0 -- including nv50/nvc0/r600/radeonsi. So... dunno what the proper way to proceed is. Fix drivers to claim higher numbers? Continue the tradition of ignoring it and relying on the fact that GPU's that don't support it also won't support the features that cause it to get used? You don't have to worry about that for now. We can clean it up later. Marek ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 0/2] radeon: Use the DMA engine for buffer downloads
You can try to do the allocation of the staging buffer with pipe_buffer_create instead of u_upload_mgr. You can also use u_suballocator, which is like a stripped out version of u_upload_mgr. You would need another instance of u_upload_mgr anyway, because we'd like to continue using PIPE_USAGE_STREAM for uploads. Marek On Sat, Aug 9, 2014 at 2:35 PM, Niels Ole Salscheider niels_...@salscheider-online.de wrote: On Tuesday 04 March 2014, 02:08:58, Marek Olšák wrote: Could you please do this without changing u_upload_mgr? You can still use u_upload_alloc to allocate buffer memory in the driver and the map buffer read/write flags are not important with persistent coherent buffer mappings anyway. Since 150ac07b855b5c5f879bf6ce9ca421ccd1a6c938 we allocate CPU - GPU streaming buffers (i. e. those with PIPE_USAGE_STREAM) in VRAM. We should therefore set buffer.usage to PIPE_USAGE_STAGING in u_upload_alloc_buffer when we use u_upload_mgr for downloads - otherwise we won't get any performance improvements. Would it now be OK to change u_upload_mgr or do you have a better proposal? Ole Marek On Mon, Mar 3, 2014 at 9:29 PM, Niels Ole Salscheider niels_...@salscheider-online.de wrote: Using the DMA engine for buffer downloads vastly improves performance. This is because reads from VRAM by the CPU are slow because of the high latency of the PCIe bus. The first patch allows u_upload_mgr to be used for downloads, too. The second patch then uses u_upload_mgr in the radeon driver for downloads. I considered to rename u_upload_mgr to u_transfer_mgr since it might be confusing that an upload manager can be used for downloads. But then again we also have transfers so that u_transfer_mgr might also be confusing. Thus, I decided not to rename it for now. Without these patches, the buffer_bandwidth benchmark from uCLbench gives me: ./buffer_bandwidth --size=2000 --iterations=100 # device 0: AMD BARTS // type gpu (192 MB global memory, 64 KB constant memory, 32 KB local memory) 1/1 direct 2000 Bytes 759.29 MB/s(HD) 17.13 MB/s(DD) 14.61 MB/s(DH) With these paches, the read performance is much better: ./buffer_bandwidth --size=2000 --iterations=100 # device 0: AMD BARTS // type gpu (192 MB global memory, 64 KB constant memory, 32 KB local memory) 1/1 direct 2000 Bytes 759.90 MB/s(HD) 613.49 MB/s(DD) 1841.07 MB/s(DH) Judging by these numbers, it might even make sense to use the DMA engine for larger buffer downloads... Niels Ole Salscheider (2): util/u_upload_mgr: Allow to also use it for downloads radeon: Use transfer manager for buffer downloads src/gallium/auxiliary/hud/hud_context.c | 3 +- src/gallium/auxiliary/util/u_blitter.c | 3 +- src/gallium/auxiliary/util/u_upload_mgr.c | 49 +++- src/gallium/auxiliary/util/u_upload_mgr.h | 13 - src/gallium/auxiliary/util/u_vbuf.c | 3 +- src/gallium/auxiliary/vl/vl_compositor.c| 3 +- src/gallium/drivers/ilo/ilo_context.c | 3 +- src/gallium/drivers/r300/r300_context.c | 3 +- src/gallium/drivers/radeon/r600_buffer_common.c | 78 +++-- src/gallium/drivers/radeon/r600_pipe_common.c | 14 - src/gallium/drivers/radeon/r600_pipe_common.h | 1 + src/mesa/state_tracker/st_context.c | 9 ++- 12 files changed, 136 insertions(+), 46 deletions(-) -- 1.9.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] mesa/st: add support for dynamic sampler offsets
On closer look, it looks to me like it wouldn't be all that difficult to make proper sampler array dcls (as you apparently get them quite easily from glsl) which is really my only problem with it but it's not really all that important so Reviewed-by: Roland Scheidegger srol...@vmware.com Am 09.08.2014 07:52, schrieb Ilia Mirkin: So... can I get a review on this? It's the last bit needed for ARB_gs5 (well, except for the actual setting of the extension bit to 1) The only additional change I have in my local version is that instead of + inst-sampler_array_size = + ir-sampler-as_dereference_array() +-array-variable_referenced()-type-length; I use ir-sampler-as_dereference_array()-array-type-array_size() I sent out a few piglits and they pass with the nvc0 driver on both fermi and kepler hw (which do textures a little differently from one another). -ilia On Wed, Aug 6, 2014 at 12:02 PM, Marek Olšák mar...@gmail.com wrote: On Wed, Aug 6, 2014 at 5:53 PM, Ilia Mirkin imir...@alum.mit.edu wrote: pc-MaxAddressRegs = pc-MaxNativeAddressRegs = _min(screen-get_shader_param(screen, sh, PIPE_SHADER_CAP_MAX_ADDRS), MAX_PROGRAM_ADDRESS_REGS); Not really sure what that's referring to... ARB_vp/fp or something? Yes, ARB_vp needs 1, ARB_fp doesn't support indirect addresing (expects 0). Anyways, this is definitely a bit of a violation of that. OTOH, so is the indirect UBO indexing and indirect GS input access (assuming that's allowed), since those would use ADDR[1] and every driver (except nv30) returns 1, and sometimes 0 -- including nv50/nvc0/r600/radeonsi. So... dunno what the proper way to proceed is. Fix drivers to claim higher numbers? Continue the tradition of ignoring it and relying on the fact that GPU's that don't support it also won't support the features that cause it to get used? You don't have to worry about that for now. We can clean it up later. Marek ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://urldefense.proofpoint.com/v1/url?u=http://lists.freedesktop.org/mailman/listinfo/mesa-devk=oIvRg1%2BdGAgOoM1BIlLLqw%3D%3D%0Ar=F4msKE2WxRzA%2BwN%2B25muztFm5TSPwE8HKJfWfR2NgfY%3D%0Am=Xj1ezgC9gYJQCVun05Qsf2AVkWdMTWcufiVpM6QH7Mk%3D%0As=03cd8d1ec34661de07f9c6aa8d6efef3163477a18a55dc4a89d2eaa7d73d16f5 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] mesa/st: add support for dynamic sampler offsets
On Sat, Aug 9, 2014 at 10:14 AM, Roland Scheidegger srol...@vmware.com wrote: On closer look, it looks to me like it wouldn't be all that difficult to make proper sampler array dcls (as you apparently get them quite easily If you can briefly outline how you think that should be done, I'd be happy to try to do it. Right now the samplers are declared in st_translate_program, which only has the program-samplers_used bitfield (where program is glsl_to_tgsi_visitor). One quick thing that occurs to me is to keep track of samplers in a map of index - array size (0 if it's not accessed indirectly or not an array), and then using the map to create the declarations. Does that seem reasonable? Either way, this seems like a nice-to-have rather than required, so I'm going to push this patch even if the other one isn't ready. from glsl) which is really my only problem with it but it's not really all that important so Reviewed-by: Roland Scheidegger srol...@vmware.com Thanks! Am 09.08.2014 07:52, schrieb Ilia Mirkin: So... can I get a review on this? It's the last bit needed for ARB_gs5 (well, except for the actual setting of the extension bit to 1) The only additional change I have in my local version is that instead of + inst-sampler_array_size = + ir-sampler-as_dereference_array() +-array-variable_referenced()-type-length; I use ir-sampler-as_dereference_array()-array-type-array_size() I sent out a few piglits and they pass with the nvc0 driver on both fermi and kepler hw (which do textures a little differently from one another). -ilia On Wed, Aug 6, 2014 at 12:02 PM, Marek Olšák mar...@gmail.com wrote: On Wed, Aug 6, 2014 at 5:53 PM, Ilia Mirkin imir...@alum.mit.edu wrote: pc-MaxAddressRegs = pc-MaxNativeAddressRegs = _min(screen-get_shader_param(screen, sh, PIPE_SHADER_CAP_MAX_ADDRS), MAX_PROGRAM_ADDRESS_REGS); Not really sure what that's referring to... ARB_vp/fp or something? Yes, ARB_vp needs 1, ARB_fp doesn't support indirect addresing (expects 0). Anyways, this is definitely a bit of a violation of that. OTOH, so is the indirect UBO indexing and indirect GS input access (assuming that's allowed), since those would use ADDR[1] and every driver (except nv30) returns 1, and sometimes 0 -- including nv50/nvc0/r600/radeonsi. So... dunno what the proper way to proceed is. Fix drivers to claim higher numbers? Continue the tradition of ignoring it and relying on the fact that GPU's that don't support it also won't support the features that cause it to get used? You don't have to worry about that for now. We can clean it up later. Marek ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://urldefense.proofpoint.com/v1/url?u=http://lists.freedesktop.org/mailman/listinfo/mesa-devk=oIvRg1%2BdGAgOoM1BIlLLqw%3D%3D%0Ar=F4msKE2WxRzA%2BwN%2B25muztFm5TSPwE8HKJfWfR2NgfY%3D%0Am=Xj1ezgC9gYJQCVun05Qsf2AVkWdMTWcufiVpM6QH7Mk%3D%0As=03cd8d1ec34661de07f9c6aa8d6efef3163477a18a55dc4a89d2eaa7d73d16f5 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] mesa/st: add support for dynamic sampler offsets
Am 09.08.2014 16:33, schrieb Ilia Mirkin: On Sat, Aug 9, 2014 at 10:14 AM, Roland Scheidegger srol...@vmware.com wrote: On closer look, it looks to me like it wouldn't be all that difficult to make proper sampler array dcls (as you apparently get them quite easily If you can briefly outline how you think that should be done, I'd be happy to try to do it. Right now the samplers are declared in st_translate_program, which only has the program-samplers_used bitfield (where program is glsl_to_tgsi_visitor). One quick thing that occurs to me is to keep track of samplers in a map of index - array size (0 if it's not accessed indirectly or not an array), and then using the map to create the declarations. Does that seem reasonable? Well I didn't think about it that much but nothing is stopping you from putting whatever you need into the glsl_to_tgsi_visitor to keep that array information instead of just filling out samplers_used. Either way, this seems like a nice-to-have rather than required, so I'm going to push this patch even if the other one isn't ready. Agreed. The address thingy is also something which should go away as discussed but really separate issue too. btw you get the actual sampler type from the constant in the sampler reg right? So if you have 4: TEX TEMP[1], TEMP[1], SAMP[ADDR[2].x+1], 2D that +1 in there says the 2nd declared sampler determines what kind of sampler this is (2d, 3d, shadow...). Roland from glsl) which is really my only problem with it but it's not really all that important so Reviewed-by: Roland Scheidegger srol...@vmware.com Thanks! Am 09.08.2014 07:52, schrieb Ilia Mirkin: So... can I get a review on this? It's the last bit needed for ARB_gs5 (well, except for the actual setting of the extension bit to 1) The only additional change I have in my local version is that instead of + inst-sampler_array_size = + ir-sampler-as_dereference_array() +-array-variable_referenced()-type-length; I use ir-sampler-as_dereference_array()-array-type-array_size() I sent out a few piglits and they pass with the nvc0 driver on both fermi and kepler hw (which do textures a little differently from one another). -ilia On Wed, Aug 6, 2014 at 12:02 PM, Marek Olšák mar...@gmail.com wrote: On Wed, Aug 6, 2014 at 5:53 PM, Ilia Mirkin imir...@alum.mit.edu wrote: pc-MaxAddressRegs = pc-MaxNativeAddressRegs = _min(screen-get_shader_param(screen, sh, PIPE_SHADER_CAP_MAX_ADDRS), MAX_PROGRAM_ADDRESS_REGS); Not really sure what that's referring to... ARB_vp/fp or something? Yes, ARB_vp needs 1, ARB_fp doesn't support indirect addresing (expects 0). Anyways, this is definitely a bit of a violation of that. OTOH, so is the indirect UBO indexing and indirect GS input access (assuming that's allowed), since those would use ADDR[1] and every driver (except nv30) returns 1, and sometimes 0 -- including nv50/nvc0/r600/radeonsi. So... dunno what the proper way to proceed is. Fix drivers to claim higher numbers? Continue the tradition of ignoring it and relying on the fact that GPU's that don't support it also won't support the features that cause it to get used? You don't have to worry about that for now. We can clean it up later. Marek ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://urldefense.proofpoint.com/v1/url?u=http://lists.freedesktop.org/mailman/listinfo/mesa-devk=oIvRg1%2BdGAgOoM1BIlLLqw%3D%3D%0Ar=F4msKE2WxRzA%2BwN%2B25muztFm5TSPwE8HKJfWfR2NgfY%3D%0Am=Xj1ezgC9gYJQCVun05Qsf2AVkWdMTWcufiVpM6QH7Mk%3D%0As=03cd8d1ec34661de07f9c6aa8d6efef3163477a18a55dc4a89d2eaa7d73d16f5 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] mesa/st: add support for dynamic sampler offsets
On Sat, Aug 9, 2014 at 11:12 AM, Roland Scheidegger srol...@vmware.com wrote: Am 09.08.2014 16:33, schrieb Ilia Mirkin: On Sat, Aug 9, 2014 at 10:14 AM, Roland Scheidegger srol...@vmware.com wrote: On closer look, it looks to me like it wouldn't be all that difficult to make proper sampler array dcls (as you apparently get them quite easily If you can briefly outline how you think that should be done, I'd be happy to try to do it. Right now the samplers are declared in st_translate_program, which only has the program-samplers_used bitfield (where program is glsl_to_tgsi_visitor). One quick thing that occurs to me is to keep track of samplers in a map of index - array size (0 if it's not accessed indirectly or not an array), and then using the map to create the declarations. Does that seem reasonable? Well I didn't think about it that much but nothing is stopping you from putting whatever you need into the glsl_to_tgsi_visitor to keep that array information instead of just filling out samplers_used. Either way, this seems like a nice-to-have rather than required, so I'm going to push this patch even if the other one isn't ready. Agreed. The address thingy is also something which should go away as discussed but really separate issue too. btw you get the actual sampler type from the constant in the sampler reg right? So if you have 4: TEX TEMP[1], TEMP[1], SAMP[ADDR[2].x+1], 2D that +1 in there says the 2nd declared sampler determines what kind of sampler this is (2d, 3d, shadow...). From what I can tell, samplers (in TGSI) have no attached semantics... they're just an index into an array that you're promised was set up correctly beforehand. For emitting the instructions, we look at the 2D from the tex instruction. BTW, I have a follow-up patch that I'll send shortly which generates FRAG PROPERTY FS_COLOR0_WRITES_ALL_CBUFS 1 DCL IN[0], GENERIC[0], PERSPECTIVE DCL OUT[0], COLOR DCL SAMP[0] DCL SAMP[1..3] DCL CONST[4] DCL TEMP[0..1], LOCAL DCL ADDR[0..2] IMM[0] FLT32 {0., 0., 0., 0.} 0: MOV TEMP[0].xy, IN[0].xyyy 1: TEX TEMP[0], TEMP[0], SAMP[0], 2D 2: MOV TEMP[1].xy, IN[0].xyyy 3: UARL ADDR[2].x, CONST[4]. 4: TEX TEMP[1], TEMP[1], SAMP[ADDR[2].x+1], 2D 5: MAD TEMP[0], TEMP[0], IMM[0]., TEMP[1] 6: MOV OUT[0], TEMP[0] 7: END Is that what you had in mind? ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] mesa: Fix glGetActiveAttribute for gl_VertexID when lowered.
The lower_vertex_id pass converts uses of the gl_VertexID system value to the gl_BaseVertex and gl_VertexIDMESA system values. Since gl_VertexID is no longer accessed, it would not be considered active. Of course, it should be, since the shader uses gl_VertexID. v2: Move the var-name dereference past the var != NULL check. Signed-off-by: Kenneth Graunke kenn...@whitecape.org --- src/mesa/main/shader_query.cpp | 14 +- 1 file changed, 13 insertions(+), 1 deletion(-) [15:31] anujp idr: gles3conform-v5-wip is not doing well. getactiveattribute_index_more_than_num_attribs test segfaults in src/mesa/main/shader_query.cpp:132. You can reproduce it on IVB too. [15:43] idr aw crap. [15:43] idr anujp: I guess Kayden's series needs a bit more work. :( This fixes that test. I haven't run other tests on v2 of this, but I don't really think I need to. I literally just moved the var_name declaration down a bit. diff --git a/src/mesa/main/shader_query.cpp b/src/mesa/main/shader_query.cpp index 4871d09..766ad29 100644 --- a/src/mesa/main/shader_query.cpp +++ b/src/mesa/main/shader_query.cpp @@ -93,6 +93,7 @@ is_active_attrib(const ir_variable *var) * and gl_InstanceID. */ return var-data.location == SYSTEM_VALUE_VERTEX_ID || + var-data.location == SYSTEM_VALUE_VERTEX_ID_ZERO_BASE || var-data.location == SYSTEM_VALUE_INSTANCE_ID; default: @@ -133,7 +134,18 @@ _mesa_GetActiveAttrib(GLhandleARB program, GLuint desired_index, continue; if (current_index == desired_index) { -_mesa_copy_string(name, maxLength, length, var-name); + const char *var_name = var-name; + + /* Since gl_VertexID may be lowered to gl_VertexIDMESA, we need to + * consider gl_VertexIDMESA as gl_VertexID for purposes of checking + * active attributes. + */ + if (var-data.mode == ir_var_system_value + var-data.location == SYSTEM_VALUE_VERTEX_ID_ZERO_BASE) { +var_name = gl_VertexID; + } + +_mesa_copy_string(name, maxLength, length, var_name); if (size) *size = (var-type-is_array()) ? var-type-length : 1; -- 2.0.2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] gbm: Fix gallium build when X11 is in a non-system directory
pipe-loader.h will include Xlib.h when HAVE_PIPE_LOADER_XLIB is set in the build. --- src/gallium/state_trackers/gbm/Makefile.am | 1 + 1 file changed, 1 insertion(+) diff --git a/src/gallium/state_trackers/gbm/Makefile.am b/src/gallium/state_trackers/gbm/Makefile.am index 4d9f3fe..50995b3 100644 --- a/src/gallium/state_trackers/gbm/Makefile.am +++ b/src/gallium/state_trackers/gbm/Makefile.am @@ -25,6 +25,7 @@ include $(top_srcdir)/src/gallium/Automake.inc AM_CFLAGS = \ $(GALLIUM_CFLAGS) \ + $(X11_INCLUDES) \ $(VISIBILITY_CFLAGS) AM_CPPFLAGS = \ -- 2.0.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] mesa: Fix glGetActiveAttribute for gl_VertexID when lowered.
On 08/09/2014 10:54 AM, Kenneth Graunke wrote: The lower_vertex_id pass converts uses of the gl_VertexID system value to the gl_BaseVertex and gl_VertexIDMESA system values. Since gl_VertexID is no longer accessed, it would not be considered active. Of course, it should be, since the shader uses gl_VertexID. v2: Move the var-name dereference past the var != NULL check. Right... which I didn't notice before because the var != NULL check is hidden in is_active_attrib. D'oh. Reviewed-by: Ian Romanick ian.d.roman...@intel.com Signed-off-by: Kenneth Graunke kenn...@whitecape.org --- src/mesa/main/shader_query.cpp | 14 +- 1 file changed, 13 insertions(+), 1 deletion(-) [15:31] anujp idr: gles3conform-v5-wip is not doing well. getactiveattribute_index_more_than_num_attribs test segfaults in src/mesa/main/shader_query.cpp:132. You can reproduce it on IVB too. [15:43] idr aw crap. [15:43] idr anujp: I guess Kayden's series needs a bit more work. :( This fixes that test. I haven't run other tests on v2 of this, but I don't really think I need to. I literally just moved the var_name declaration down a bit. diff --git a/src/mesa/main/shader_query.cpp b/src/mesa/main/shader_query.cpp index 4871d09..766ad29 100644 --- a/src/mesa/main/shader_query.cpp +++ b/src/mesa/main/shader_query.cpp @@ -93,6 +93,7 @@ is_active_attrib(const ir_variable *var) * and gl_InstanceID. */ return var-data.location == SYSTEM_VALUE_VERTEX_ID || + var-data.location == SYSTEM_VALUE_VERTEX_ID_ZERO_BASE || var-data.location == SYSTEM_VALUE_INSTANCE_ID; default: @@ -133,7 +134,18 @@ _mesa_GetActiveAttrib(GLhandleARB program, GLuint desired_index, continue; if (current_index == desired_index) { - _mesa_copy_string(name, maxLength, length, var-name); + const char *var_name = var-name; + + /* Since gl_VertexID may be lowered to gl_VertexIDMESA, we need to + * consider gl_VertexIDMESA as gl_VertexID for purposes of checking + * active attributes. + */ + if (var-data.mode == ir_var_system_value + var-data.location == SYSTEM_VALUE_VERTEX_ID_ZERO_BASE) { +var_name = gl_VertexID; + } + + _mesa_copy_string(name, maxLength, length, var_name); if (size) *size = (var-type-is_array()) ? var-type-length : 1; ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 0/4] BDW viewport extents + misc
I realize it hasn't even been a week yet, but my remaining 2 weeks until my sabbatical have just filled up, so if anyone needs me to rework this, the sooner you let me know the better. On Mon, Aug 04, 2014 at 12:24:00PM -0700, Ben Widawsky wrote: The patch commit messages and comments within the diffs explain the intricacies of viewport extents and clipping. So rather, here is the data for these patches. All of the following is for a Broadwell system (which introduced viewport extents). EGYPT PERF == No change WARSOW PERF === No change Add xonotic and trex to this list of no change. piglit == viewport extents only: spec/ARB_viewport_array/render-scissor/Render multi-viewport scissor test: fail pass spec/glsl-1.30/execution/built-in-functions/vs-max-ivec4-int: fail pass spec/ARB_viewport_array/render-scissor/Render multi-scissor rectangles: fail pass spec/glsl-1.50/execution/geometry/max-input-components: fail pass viewport extents + gb clipping: spec/ARB_viewport_array/render-scissor/Render multi-viewport scissor test: fail pass spec/glsl-1.30/execution/built-in-functions/vs-max-ivec4-int: fail pass spec/ARB_viewport_array/render-scissor/Render multi-scissor rectangles: fail pass all: spec/ARB_viewport_array/render-scissor/Render multi-viewport scissor test: fail pass spec/glsl-1.30/execution/built-in-functions/vs-max-ivec4-int: fail pass spec/ARB_viewport_array/render-scissor/Render multi-scissor rectangles: fail pass As you can observe, there are no wins found here other than conformance. Given our understanding of the hardware, we expect these patches to produce a performance improvements for certain applications (specifically those which define viewports smaller than the drawing rectangle, but some other caveats apply on top of that). Ben Widawsky (4): i965/guardband: Improve comments for guardband clipping i965: Viewport extents on GEN8 i965/guardband: Enable for all viewport dimensions (GEN8+) i965/clip: Removing scissor atom src/mesa/drivers/dri/i965/gen6_clip_state.c | 29 +- src/mesa/drivers/dri/i965/gen8_viewport_state.c | 52 ++--- 2 files changed, 57 insertions(+), 24 deletions(-) -- 2.0.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev -- Ben Widawsky, Intel Open Source Technology Center ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/4] i965/guardband: Improve comments for guardband clipping
On Monday, August 04, 2014 12:24:01 PM Ben Widawsky wrote: While working in this part of the code I had a great deal of trouble understanding what it was trying to do, and matching it with the spec. (mostly due bad wording in the PRM). To help future people, I've cleaned up the wording and provided some ascii art. --- src/mesa/drivers/dri/i965/gen8_viewport_state.c | 22 ++ 1 file changed, 18 insertions(+), 4 deletions(-) diff --git a/src/mesa/drivers/dri/i965/gen8_viewport_state.c b/src/mesa/drivers/dri/i965/gen8_viewport_state.c index b366246..b5171e0 100644 --- a/src/mesa/drivers/dri/i965/gen8_viewport_state.c +++ b/src/mesa/drivers/dri/i965/gen8_viewport_state.c @@ -71,10 +71,24 @@ gen8_upload_sf_clip_viewport(struct brw_context *brw) * maximum screen space coordinates of a small object may larger, but we * have no way to enforce the object size other than through clipping. * - * If you're surprised that we set clip to -gbx to +gbx and it seems like - * we'll end up with 16384 wide, note that for a 8192-wide render target, - * we'll end up with a normal (-1, 1) clip volume that just covers the - * drawable. + * The goal is to create the maximum sized guardband (8K x 8K) with the + * viewport rectangle in the center of the guardband. This looks weird + * because the hardware wants coordinates that are scaled to the viewport + * in NDC. In other words, an 8K x 8K viewport would have [-1,1] for X and Y. + * A 4K viewport would be [-2,2], 2K := [-4,4] etc. + * + * + * |Guardband | + * | | + * | | + * | |viewport | | + * | | | | + * | | | | + * | |__| | + * | | + * | | + * |__| + * */ const float maximum_guardband_extent = 8192; float gbx = maximum_guardband_extent / ctx-ViewportArray[i].Width; Reviewed-by: Kenneth Graunke kenn...@whitecape.org and you can probably add a: Reviewed-by: Chris Forbes chr...@ijw.co.nz or at least Acked-by based on #intel-gfx chatter: [Friday, August 01, 2014] [07:24:53 PM] bwidawsk Kayden, chrisf: useful p atch? http://sprunge.us/SPeA [Friday, August 01, 2014] [07:25:48 PM] Kaydenlooks reasonable to me [Friday, August 01, 2014] [07:26:48 PM] chrisfsame here signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 4/4] i965/clip: Removing scissor atom
On Monday, August 04, 2014 12:24:04 PM Ben Widawsky wrote: On GEN8, a change in scissor state does not effect anything for the clipper/sf hardware state. The hardware will always do the right thing once the viewport extents are programmed. We can therefore remove the unecessary state emission. Ken originally spotted this. Scissoring state affects the value of ctx-DrawBuffer-_Xmin, _Xmax, _Ymin, and _Ymax. So, _NEW_SCISSORS was actually necessary prior to your patch #2. Perhaps reword the commit message to be something like this: Now that we no longer use ctx-DrawBuffer-_Xmin and related fields to program the screen-space viewport extents, we don't depend on any scissoring state. So we can drop the _NEW_SCISSOR dependency. --- src/mesa/drivers/dri/i965/gen8_viewport_state.c | 8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/src/mesa/drivers/dri/i965/gen8_viewport_state.c b/src/mesa/drivers/dri/i965/gen8_viewport_state.c index 04a4530..d7e06c4 100644 --- a/src/mesa/drivers/dri/i965/gen8_viewport_state.c +++ b/src/mesa/drivers/dri/i965/gen8_viewport_state.c @@ -100,13 +100,17 @@ gen8_upload_sf_clip_viewport(struct brw_context *brw) vp[10] = -gby; /* y-min */ vp[11] = gby; /* y-max */ - /* _NEW_SCISSOR | _NEW_VIEWPORT | _NEW_BUFFERS: Screen Space Viewport + /* _NEW_VIEWPORT | _NEW_BUFFERS: Screen Space Viewport * The hardware will take the intersection of the drawing rectangle, * scissor rectangle, and the viewport extents. We don't need to be * smart, and can therefore just program the viewport extents. */ float viewport_Xmax = ctx-ViewportArray[i].X + ctx-ViewportArray[i].Width; float viewport_Ymax = ctx-ViewportArray[i].Y + ctx-ViewportArray[i].Height; + if (viewport_Ymax ctx-DrawBuffer-_Ymax || +viewport_Xmax ctx-DrawBuffer-_Xmax) { +perf_debug(Using viewport extents for savings\n); + } I suspect you didn't mean to add this :) Please drop it, as using ctx-DrawBuffer-_Xmax actually introduces a dependency on _NEW_SCISSOR again. :) With those fixed, this would get a: Reviewed-by: Kenneth Graunke kenn...@whitecape.org if (render_to_fbo) { vp[12] = ctx-ViewportArray[i].X; vp[13] = viewport_Xmax - 1; @@ -130,7 +134,7 @@ gen8_upload_sf_clip_viewport(struct brw_context *brw) const struct brw_tracked_state gen8_sf_clip_viewport = { .dirty = { - .mesa = _NEW_BUFFERS | _NEW_SCISSOR | _NEW_VIEWPORT, + .mesa = _NEW_BUFFERS | _NEW_VIEWPORT, .brw = BRW_NEW_BATCH, .cache = 0, }, signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 3/4] i965/guardband: Enable for all viewport dimensions (GEN8+)
On Monday, August 04, 2014 12:24:03 PM Ben Widawsky wrote: The goal of guardband clipping is to try to avoid 3d clipping because it is an expensive operation. When guardband clipping is disabled, all geometry that intersects the viewport is to the FF 3d clipper. Objects ^^^ is sent to? which are entirely enclosed within the viewport are said to be trivially accepted while those entirely outside of the viewport are, trivially rejected. When guardband clipping is turned on the above behavior is change such is changed or changes ^^^ that if the geometry is within the guardband, and intersects the viewport, it skips the 3d clipper. Prior to GEN8, this was problematic if the viewport was smaller than the screen as it could allow for rendering to occur outside of the viewport. That could be mitigated if the programmer specified a scissor region which was less than or equal to the viewport - but this is not required for correctness in OpenGL. In theory you could be clever with the guardband so as not to invoke this problem. We do not do this, and have no data that suggests we should bother (nor the converse data). With viewport extents in place on GEN8, it should be safe to turn on guardband clipping for all cases While here, add a comment to the code which confused me thoroughly. --- src/mesa/drivers/dri/i965/gen6_clip_state.c | 29 +++-- 1 file changed, 19 insertions(+), 10 deletions(-) diff --git a/src/mesa/drivers/dri/i965/gen6_clip_state.c b/src/mesa/drivers/dri/i965/gen6_clip_state.c index 52027e0..42e78a1 100644 --- a/src/mesa/drivers/dri/i965/gen6_clip_state.c +++ b/src/mesa/drivers/dri/i965/gen6_clip_state.c @@ -97,17 +97,26 @@ upload_clip_state(struct brw_context *brw) GEN6_USER_CLIP_CLIP_DISTANCES_SHIFT); dw2 |= GEN6_CLIP_GB_TEST; - for (unsigned i = 0; i ctx-Const.MaxViewports; i++) { - if (ctx-ViewportArray[i].X != 0 || - ctx-ViewportArray[i].Y != 0 || - ctx-ViewportArray[i].Width != (float) fb-Width || - ctx-ViewportArray[i].Height != (float) fb-Height) { - dw2 = ~GEN6_CLIP_GB_TEST; - if (brw-gen = 8) { -perf_debug(Disabling GB clipping due to lack of Gen8 viewport - clipping setup code. This should be fixed.\n); + /* If the viewport dimensions screen dimensions we have to disable screen dimensions sounds like 1920x1080 (etc) to me. How about drawable dimensions? +* guardband clipping. This is because as the code works today, the +* guardband rectangle is set to a fixed size. If that size is larger than +* the viewport (likely) Guardband clipping can potentially make geometry +* bypass the 3d clipping phase (geometry which intersects the viewport). +* This will possibly cause rendering to occur outside of the viewport, but +* inside the screen. +* +* On GEN8 we get a useful viewport extents test which allows us to +* ignore this entirely. +*/ The comment seems a bit choppy to me. How about something like: /* If the viewport dimensions are smaller than the drawable dimensions, * we have to disable guardband clipping prior to Gen8. We always program * the guardband to a fixed size, which is almost always larger than the * viewport. Any geometry which intersects the viewport but lies within * the guardband would bypass the 3D clipping stage, so it wouldn't be * clipped to the viewport. Rendering would happen beyond the viewport, * but still inside the drawable. * * Gen8+ introduces a viewport extents test which restricts rendering to * the viewport, so we can ignore this restriction. */ Regardless of whether you take my suggestion, this is: Reviewed-by: Kenneth Graunke kenn...@whitecape.org + if (brw-gen 8) { + for (unsigned i = 0; i ctx-Const.MaxViewports; i++) { + if (ctx-ViewportArray[i].X != 0 || + ctx-ViewportArray[i].Y != 0 || + ctx-ViewportArray[i].Width != (float) fb-Width || + ctx-ViewportArray[i].Height != (float) fb-Height) { +dw2 = ~GEN6_CLIP_GB_TEST; +break; } - break; } } signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/4] i965: Viewport extents on GEN8
On Monday, August 04, 2014 12:24:02 PM Ben Widawsky wrote: Viewport extents are a 3rd rectangle that defines which pixels get discarded as part of the rasterization process. This can potentially improve performance by reducing cache usage, and freeing up PS cycles. I'm not sure about cache effects...pretty sure it doesn't save PS cycles. It also permits the use of guardband clipping in all cases (see later patch). The actual pixels drawn to the screen are an intersection of the drawing rectangle, the viewport extents, and the scissor rectangle. Scissor rectangle is not super important this discussion as it should ^^^ to/for? always help do the right thing provided the programmer uses it. switch (viewport dimensions, drawrect dimension) { case viewport drawing rectangle: no effects; break; case viewport == drawing rectangle: no effects; break; case viewport drawing rectangle: Pixels (after the viewport transformation but before expensive rastersizing and shading operations) which are outside of the viewport are discarded. As we discussed, the 3D clipper normally gets involved and trims off any geometry outside of the viewport, but within the drawing rectangle. So, expensive pixel shading operations would not happen regardless. I think the main point of this patch is your earlier comment: The actual pixels drawn to the screen are an intersection of the drawing rectangle, the viewport extents, and the scissor rectangle. The previous code programmed the viewport extents to the intersection of the viewport, drawing rectangle, and scissor rectangle. This is unnecessary, because the hardware does that intersection for us. So we should simply program it to the viewport. Also, please do change the title to include some sort of verb. For example, i965: Simplify Gen8+ viewport extents programming. Commit messages aside! Your code looks good, and is: Reviewed-by: Kenneth Graunke kenn...@whitecape.org Thank you for doing this! } I am unable to find a test case where this improves performance, but in all my testing it doesn't hurt performance, and intuitively, it should not ever hurt performance. It also permits us to use the guardband more freely (see upcoming patch). v2: Updating commit message. --- src/mesa/drivers/dri/i965/gen8_viewport_state.c | 24 +++- 1 file changed, 15 insertions(+), 9 deletions(-) diff --git a/src/mesa/drivers/dri/i965/gen8_viewport_state.c b/src/mesa/drivers/dri/i965/gen8_viewport_state.c index b5171e0..04a4530 100644 --- a/src/mesa/drivers/dri/i965/gen8_viewport_state.c +++ b/src/mesa/drivers/dri/i965/gen8_viewport_state.c @@ -100,17 +100,23 @@ gen8_upload_sf_clip_viewport(struct brw_context *brw) vp[10] = -gby; /* y-min */ vp[11] = gby; /* y-max */ - /* _NEW_SCISSOR | _NEW_VIEWPORT | _NEW_BUFFERS: Screen Space Viewport */ + /* _NEW_SCISSOR | _NEW_VIEWPORT | _NEW_BUFFERS: Screen Space Viewport + * The hardware will take the intersection of the drawing rectangle, + * scissor rectangle, and the viewport extents. We don't need to be + * smart, and can therefore just program the viewport extents. + */ + float viewport_Xmax = ctx-ViewportArray[i].X + ctx-ViewportArray[i].Width; + float viewport_Ymax = ctx-ViewportArray[i].Y + ctx-ViewportArray[i].Height; if (render_to_fbo) { - vp[12] = ctx-DrawBuffer-_Xmin; - vp[13] = ctx-DrawBuffer-_Xmax - 1; - vp[14] = ctx-DrawBuffer-_Ymin; - vp[15] = ctx-DrawBuffer-_Ymax - 1; + vp[12] = ctx-ViewportArray[i].X; + vp[13] = viewport_Xmax - 1; + vp[14] = ctx-ViewportArray[i].Y; + vp[15] = viewport_Ymax - 1; } else { - vp[12] = ctx-DrawBuffer-_Xmin; - vp[13] = ctx-DrawBuffer-_Xmax - 1; - vp[14] = ctx-DrawBuffer-Height - ctx-DrawBuffer-_Ymax; - vp[15] = ctx-DrawBuffer-Height - ctx-DrawBuffer-_Ymin - 1; + vp[12] = ctx-ViewportArray[i].X; + vp[13] = viewport_Xmax - 1; + vp[14] = ctx-DrawBuffer-Height - viewport_Ymax; + vp[15] = ctx-DrawBuffer-Height - ctx-ViewportArray[i].Y - 1; } vp += 16; signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/6] r600g: remove useless r600_resource_va calls
On Thu, Aug 7, 2014 at 4:06 PM, Alex Deucher alexdeuc...@gmail.com wrote: On Wed, Aug 6, 2014 at 5:49 PM, Marek Olšák mar...@gmail.com wrote: From: Marek Olšák marek.ol...@amd.com R600-R700 don't support virtual memory. For consistency, it might be nice to use gpu_address here as well, but just set it to 0 for 6xx/7xx. Either way, series is: Well, r600_resource_va isn't used elsewhere in the file. These are only a few cases that seem to have been copied from the same evergreen code. I assumed somebody had forgotten to clean them up. Marek Reviewed-by: Alex Deucher alexander.deuc...@amd.com --- src/gallium/drivers/r600/r600_state.c | 27 +-- 1 file changed, 9 insertions(+), 18 deletions(-) diff --git a/src/gallium/drivers/r600/r600_state.c b/src/gallium/drivers/r600/r600_state.c index 258ffd1..607b199 100644 --- a/src/gallium/drivers/r600/r600_state.c +++ b/src/gallium/drivers/r600/r600_state.c @@ -595,25 +595,22 @@ texture_buffer_sampler_view(struct r600_pipe_sampler_view *view, unsigned width0, unsigned height0) { - struct pipe_context *ctx = view-base.context; struct r600_texture *tmp = (struct r600_texture*)view-base.texture; - uint64_t va; int stride = util_format_get_blocksize(view-base.format); unsigned format, num_format, format_comp, endian; - unsigned offset = view-base.u.buf.first_element * stride; + uint64_t offset = view-base.u.buf.first_element * stride; unsigned size = (view-base.u.buf.last_element - view-base.u.buf.first_element + 1) * stride; r600_vertex_data_type(view-base.format, format, num_format, format_comp, endian); - va = r600_resource_va(ctx-screen, view-base.texture) + offset; view-tex_resource = tmp-resource; - view-skip_mip_address_reloc = true; - view-tex_resource_words[0] = va; + + view-tex_resource_words[0] = offset; view-tex_resource_words[1] = size - 1; - view-tex_resource_words[2] = S_038008_BASE_ADDRESS_HI(va 32UL) | + view-tex_resource_words[2] = S_038008_BASE_ADDRESS_HI(offset 32UL) | S_038008_STRIDE(stride) | S_038008_DATA_FORMAT(format) | S_038008_NUM_FORMAT_ALL(num_format) | @@ -1105,8 +1102,7 @@ static void r600_init_depth_surface(struct r600_context *rctx, /* use htile only for first level */ if (rtex-htile_buffer !level) { - uint64_t va = r600_resource_va(rctx-screen-b.b, rtex-htile_buffer-b.b); - surf-db_htile_data_base = va 8; + surf-db_htile_data_base = 0; surf-db_htile_surface = S_028D24_HTILE_WIDTH(1) | S_028D24_HTILE_HEIGHT(1) | S_028D24_FULL_CACHE(1) | @@ -1944,7 +1940,6 @@ static void r600_emit_shader_stages(struct r600_context *rctx, struct r600_atom static void r600_emit_gs_rings(struct r600_context *rctx, struct r600_atom *a) { - struct pipe_screen *screen = rctx-b.b.screen; struct radeon_winsys_cs *cs = rctx-b.rings.gfx.cs; struct r600_gs_rings_state *state = (struct r600_gs_rings_state*)a; struct r600_resource *rbuffer; @@ -1955,8 +1950,7 @@ static void r600_emit_gs_rings(struct r600_context *rctx, struct r600_atom *a) if (state-enable) { rbuffer =(struct r600_resource*)state-esgs_ring.buffer; - r600_write_config_reg(cs, R_008C40_SQ_ESGS_RING_BASE, - (r600_resource_va(screen, rbuffer-b.b)) 8); + r600_write_config_reg(cs, R_008C40_SQ_ESGS_RING_BASE, 0); radeon_emit(cs, PKT3(PKT3_NOP, 0, 0)); radeon_emit(cs, r600_context_bo_reloc(rctx-b, rctx-b.rings.gfx, rbuffer, RADEON_USAGE_READWRITE, @@ -1965,8 +1959,7 @@ static void r600_emit_gs_rings(struct r600_context *rctx, struct r600_atom *a) state-esgs_ring.buffer_size 8); rbuffer =(struct r600_resource*)state-gsvs_ring.buffer; - r600_write_config_reg(cs, R_008C48_SQ_GSVS_RING_BASE, - (r600_resource_va(screen, rbuffer-b.b)) 8); + r600_write_config_reg(cs, R_008C48_SQ_GSVS_RING_BASE, 0); radeon_emit(cs, PKT3(PKT3_NOP, 0, 0)); radeon_emit(cs, r600_context_bo_reloc(rctx-b, rctx-b.rings.gfx, rbuffer, RADEON_USAGE_READWRITE, @@ -2644,8 +2637,7 @@ void r600_update_gs_state(struct pipe_context *ctx, struct r600_pipe_shader *sha r600_store_context_reg(cb, R_02887C_SQ_PGM_RESOURCES_GS, S_02887C_NUM_GPRS(rshader-bc.ngpr) |
Re: [Mesa-dev] [PATCH] r600g/compute: Fix Warnings
Hi Bruno, Sorry, I fixed the warnings by myself before I saw your patch. Marek On Thu, Aug 7, 2014 at 12:07 PM, Bruno Jiménez brunoji...@gmail.com wrote: I have followed the following convention: - Positions in the pool are now 'int' (start_in_dw and related) - Sizes are 'unsigned' (size_in_dw and related) - IDs are 'unsigned' The pool and item's status are left as uint32_t The shadow has been also left as a pointer to an uint32_t --- src/gallium/drivers/r600/compute_memory_pool.c | 56 +- src/gallium/drivers/r600/compute_memory_pool.h | 26 ++-- 2 files changed, 41 insertions(+), 41 deletions(-) diff --git a/src/gallium/drivers/r600/compute_memory_pool.c b/src/gallium/drivers/r600/compute_memory_pool.c index 0ee8ceb..8bcab00 100644 --- a/src/gallium/drivers/r600/compute_memory_pool.c +++ b/src/gallium/drivers/r600/compute_memory_pool.c @@ -76,7 +76,7 @@ static void compute_memory_pool_init(struct compute_memory_pool * pool, unsigned initial_size_in_dw) { - COMPUTE_DBG(pool-screen, * compute_memory_pool_init() initial_size_in_dw = %ld\n, + COMPUTE_DBG(pool-screen, * compute_memory_pool_init() initial_size_in_dw = %u\n, initial_size_in_dw); pool-size_in_dw = initial_size_in_dw; @@ -104,9 +104,9 @@ void compute_memory_pool_delete(struct compute_memory_pool* pool) * \param size_in_dw The size of the space we are looking for. * \return -1 on failure */ -int64_t compute_memory_prealloc_chunk( +int compute_memory_prealloc_chunk( struct compute_memory_pool* pool, - int64_t size_in_dw) + unsigned size_in_dw) { struct compute_memory_item *item; @@ -114,7 +114,7 @@ int64_t compute_memory_prealloc_chunk( assert(size_in_dw = pool-size_in_dw); - COMPUTE_DBG(pool-screen, * compute_memory_prealloc_chunk() size_in_dw = %ld\n, + COMPUTE_DBG(pool-screen, * compute_memory_prealloc_chunk() size_in_dw = %u\n, size_in_dw); LIST_FOR_EACH_ENTRY(item, pool-item_list, link) { @@ -139,13 +139,13 @@ int64_t compute_memory_prealloc_chunk( */ struct list_head *compute_memory_postalloc_chunk( struct compute_memory_pool* pool, - int64_t start_in_dw) + int start_in_dw) { struct compute_memory_item *item; struct compute_memory_item *next; struct list_head *next_link; - COMPUTE_DBG(pool-screen, * compute_memory_postalloc_chunck() start_in_dw = %ld\n, + COMPUTE_DBG(pool-screen, * compute_memory_postalloc_chunck() start_in_dw = %i\n, start_in_dw); /* Check if we can insert it in the front of the list */ @@ -181,12 +181,12 @@ struct list_head *compute_memory_postalloc_chunk( * \see compute_memory_finalize_pending */ int compute_memory_grow_defrag_pool(struct compute_memory_pool *pool, - struct pipe_context *pipe, int new_size_in_dw) + struct pipe_context *pipe, unsigned new_size_in_dw) { new_size_in_dw = align(new_size_in_dw, ITEM_ALIGNMENT); COMPUTE_DBG(pool-screen, * compute_memory_grow_defrag_pool() - new_size_in_dw = %d (%d bytes)\n, + new_size_in_dw = %u (%u bytes)\n, new_size_in_dw, new_size_in_dw * 4); assert(new_size_in_dw = pool-size_in_dw); @@ -274,17 +274,17 @@ int compute_memory_finalize_pending(struct compute_memory_pool* pool, { struct compute_memory_item *item, *next; - int64_t allocated = 0; - int64_t unallocated = 0; - int64_t last_pos; + unsigned allocated = 0; + unsigned unallocated = 0; + int last_pos; int err = 0; COMPUTE_DBG(pool-screen, * compute_memory_finalize_pending()\n); LIST_FOR_EACH_ENTRY(item, pool-item_list, link) { - COMPUTE_DBG(pool-screen, + list: offset = %i id = %i size = %i - (%i bytes)\n,item-start_in_dw, item-id, + COMPUTE_DBG(pool-screen, + list: offset = %i id = %u size = %u + (%u bytes)\n, item-start_in_dw, item-id, item-size_in_dw, item-size_in_dw * 4); } @@ -347,7 +347,7 @@ void compute_memory_defrag(struct compute_memory_pool *pool, struct pipe_context *pipe) { struct compute_memory_item *item; - int64_t last_pos; + int last_pos; COMPUTE_DBG(pool-screen, * compute_memory_defrag()\n); @@ -374,7 +374,7 @@ void compute_memory_defrag(struct compute_memory_pool *pool, */ int compute_memory_promote_item(struct compute_memory_pool *pool, struct compute_memory_item *item, struct pipe_context *pipe, - int64_t start_in_dw) + int start_in_dw) { struct pipe_screen *screen = (struct pipe_screen *)pool-screen; struct r600_context *rctx = (struct r600_context
[Mesa-dev] [PATCH] radeonsi: simplify constant buffer upload for big endian
From: Marek Olšák marek.ol...@amd.com Point util_memcpy_cpu_to_le32 to a buffer storage directly. --- src/gallium/drivers/radeonsi/si_descriptors.c | 15 +++ 1 file changed, 3 insertions(+), 12 deletions(-) diff --git a/src/gallium/drivers/radeonsi/si_descriptors.c b/src/gallium/drivers/radeonsi/si_descriptors.c index 81ad14b..bfd2b76 100644 --- a/src/gallium/drivers/radeonsi/si_descriptors.c +++ b/src/gallium/drivers/radeonsi/si_descriptors.c @@ -649,20 +649,11 @@ void si_upload_const_buffer(struct si_context *sctx, struct r600_resource **rbuf const uint8_t *ptr, unsigned size, uint32_t *const_offset) { if (SI_BIG_ENDIAN) { - uint32_t *tmpPtr; - unsigned i; - - if (!(tmpPtr = malloc(size))) { - R600_ERR(Failed to allocate BE swap buffer.\n); - return; - } + void *tmpPtr; + u_upload_alloc(sctx-b.uploader, 0, size, const_offset, + (struct pipe_resource**)rbuffer, tmpPtr); util_memcpy_cpu_to_le32(tmpPtr, ptr, size); - - u_upload_data(sctx-b.uploader, 0, size, tmpPtr, const_offset, - (struct pipe_resource**)rbuffer); - - free(tmpPtr); } else { u_upload_data(sctx-b.uploader, 0, size, ptr, const_offset, (struct pipe_resource**)rbuffer); -- 1.9.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 0/4] BDW viewport extents + misc
On Sat, Aug 09, 2014 at 12:07:58PM -0700, Ben Widawsky wrote: I realize it hasn't even been a week yet, but my remaining 2 weeks until my sabbatical have just filled up, so if anyone needs me to rework this, the sooner you let me know the better. Hi Ken. Thanks a lot for reviewing it. I meant to incorporate all the changes you requested. If you don't see one there, please let me know. I've pushed a rebased branch here: http://cgit.freedesktop.org/~bwidawsk/mesa/log/?h=bdw-extents Do you have any idea or comments about the fixed piglit tests? I haven't actually run the rebased branch yet to re-confirm the results. I was just curious... On Mon, Aug 04, 2014 at 12:24:00PM -0700, Ben Widawsky wrote: The patch commit messages and comments within the diffs explain the intricacies of viewport extents and clipping. So rather, here is the data for these patches. All of the following is for a Broadwell system (which introduced viewport extents). EGYPT PERF == No change WARSOW PERF === No change Add xonotic and trex to this list of no change. piglit == viewport extents only: spec/ARB_viewport_array/render-scissor/Render multi-viewport scissor test: fail pass spec/glsl-1.30/execution/built-in-functions/vs-max-ivec4-int: fail pass spec/ARB_viewport_array/render-scissor/Render multi-scissor rectangles: fail pass spec/glsl-1.50/execution/geometry/max-input-components: fail pass viewport extents + gb clipping: spec/ARB_viewport_array/render-scissor/Render multi-viewport scissor test: fail pass spec/glsl-1.30/execution/built-in-functions/vs-max-ivec4-int: fail pass spec/ARB_viewport_array/render-scissor/Render multi-scissor rectangles: fail pass all: spec/ARB_viewport_array/render-scissor/Render multi-viewport scissor test: fail pass spec/glsl-1.30/execution/built-in-functions/vs-max-ivec4-int: fail pass spec/ARB_viewport_array/render-scissor/Render multi-scissor rectangles: fail pass As you can observe, there are no wins found here other than conformance. Given our understanding of the hardware, we expect these patches to produce a performance improvements for certain applications (specifically those which define viewports smaller than the drawing rectangle, but some other caveats apply on top of that). Ben Widawsky (4): i965/guardband: Improve comments for guardband clipping i965: Viewport extents on GEN8 i965/guardband: Enable for all viewport dimensions (GEN8+) i965/clip: Removing scissor atom src/mesa/drivers/dri/i965/gen6_clip_state.c | 29 +- src/mesa/drivers/dri/i965/gen8_viewport_state.c | 52 ++--- 2 files changed, 57 insertions(+), 24 deletions(-) -- 2.0.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev -- Ben Widawsky, Intel Open Source Technology Center -- Ben Widawsky, Intel Open Source Technology Center ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 00/11] ARB_gpu_shader5 d/u sampler array indexing
The last big piece of ARB_gpu_shader5! This series adds support for using dynamically uniform expressions in sampler array indexing, as required by ARB_gpu_shader5. Gen7 is supported (including Haswell's high samplers). Gen8 support would be a straightforward addition for someone who has access to the hardware. Caveats: - Haswell sampler state pointer adjustment is not optimal. If we knew the possible range of samplers an instruction could reference in the generator, we could take the constant fixup path if the sampler array did not span a 16-sampler boundary. - The various shader-based workaround flags are taken from array index 0, in the non-const case. This causes wrong behavior if the flags differ between array elements. [Ken and I discussed this during planning, and this seemed a reasonable initial implementation.] ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 01/11] glsl: Allow dynamically uniform sampler array indexing with 4.0/gs5
Signed-off-by: Chris Forbes chr...@ijw.co.nz --- src/glsl/ast_array_index.cpp | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/src/glsl/ast_array_index.cpp b/src/glsl/ast_array_index.cpp index 50f9987..f8dca80 100644 --- a/src/glsl/ast_array_index.cpp +++ b/src/glsl/ast_array_index.cpp @@ -213,6 +213,9 @@ _mesa_ast_array_index_to_hir(void *mem_ctx, * as using a loop counter as the index to an array of samplers. If the * loop in unrolled, the code should compile correctly. Instead, emit a * warning. + * + * In GLSL 4.00 / ARB_gpu_shader5, this requirement is relaxed again to allow + * indexing with dynamically uniform expressions. */ if (array-type-element_type()-is_sampler()) { if (!state-is_version(130, 100)) { @@ -227,7 +230,7 @@ _mesa_ast_array_index_to_hir(void *mem_ctx, expressions will be forbidden in GLSL 1.30 and later); } -} else { +} else if (!state-is_version(400, 0) !state-ARB_gpu_shader5_enable) { _mesa_glsl_error(loc, state, sampler arrays indexed with non-constant expressions is forbidden in GLSL 1.30 and -- 2.0.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 02/11] mesa: Add a new function for getting the nonconst sampler array index
If the array index is not a constant expression, the existing support will assume a zero offset (giving us the sampler index of the base of the array). For dynamically uniform indexing of sampler arrays, we need both that and the indexing expression. Signed-off-by: Chris Forbes chr...@ijw.co.nz --- src/mesa/program/sampler.cpp | 11 +++ src/mesa/program/sampler.h | 3 +++ 2 files changed, 14 insertions(+) diff --git a/src/mesa/program/sampler.cpp b/src/mesa/program/sampler.cpp index e6532be..29a5408 100644 --- a/src/mesa/program/sampler.cpp +++ b/src/mesa/program/sampler.cpp @@ -134,3 +134,14 @@ _mesa_get_sampler_uniform_value(class ir_dereference *sampler, return shader_program-UniformStorage[location].sampler[shader].index + getname.offset; } + + +extern C class ir_rvalue * +_mesa_get_sampler_array_nonconst_index(class ir_dereference *sampler) +{ + ir_dereference_array *deref_arr = sampler-as_dereference_array(); + if (!deref_arr || deref_arr-array_index-as_constant()) + return NULL; + + return deref_arr-array_index; +} diff --git a/src/mesa/program/sampler.h b/src/mesa/program/sampler.h index 22467e9..8b7c3b6 100644 --- a/src/mesa/program/sampler.h +++ b/src/mesa/program/sampler.h @@ -27,3 +27,6 @@ int _mesa_get_sampler_uniform_value(class ir_dereference *sampler, struct gl_shader_program *shader_program, const struct gl_program *prog); + +class ir_rvalue * +_mesa_get_sampler_array_nonconst_index(class ir_dereference *sampler); -- 2.0.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 05/11] i965: Generalize sampler state pointer mangling for non-const
For now, assume that the addressed sampler can be in any of the 16-sampler banks. If we preserved range information this far, we could avoid emitting these instructions if the sampler were known to be contained within one bank. Signed-off-by: Chris Forbes chr...@ijw.co.nz --- src/mesa/drivers/dri/i965/brw_eu_emit.c | 14 +- 1 file changed, 13 insertions(+), 1 deletion(-) diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c b/src/mesa/drivers/dri/i965/brw_eu_emit.c index 29230a6..75e14c1 100644 --- a/src/mesa/drivers/dri/i965/brw_eu_emit.c +++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c @@ -2304,7 +2304,19 @@ void brw_adjust_sampler_state_pointer(struct brw_compile *p, brw_imm_ud(16 * (sampler / 16) * sampler_state_size)); } } else { - /* XXX: Non-const sampler array indexing case */ + /* Non-const sampler array indexing case */ + if (brw-gen 8 !brw-is_haswell) { + return; + } + + struct brw_reg temp = vec1(retype(scratch, BRW_REGISTER_TYPE_UD)); + + brw_AND(p, temp, sampler_index, brw_imm_ud(0x0f0)); + brw_SHL(p, temp, temp, brw_imm_ud(4)); + brw_ADD(p, + get_element_ud(header, 3), + get_element_ud(brw_vec8_grf(0, 0), 3), + temp); } } -- 2.0.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 06/11] i965/vec4: Add support for non-const sampler indices in generator
Signed-off-by: Chris Forbes chr...@ijw.co.nz --- src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 52 +++- 1 file changed, 51 insertions(+), 1 deletion(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp index d6ad8eb..c209e46 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp @@ -383,7 +383,57 @@ vec4_generator::generate_tex(vec4_instruction *inst, brw_mark_surface_used(prog_data-base, sampler + base_binding_table_index); } else { - /* XXX: Non-constant sampler index. */ + /* Non-constant sampler index. */ + /* Note: this clobbers `dst` as a temporary before emitting the send */ + + struct brw_reg addr = vec1(retype(brw_address_reg(0), BRW_REGISTER_TYPE_UD)); + struct brw_reg temp = vec1(retype(dst, BRW_REGISTER_TYPE_UD)); + + struct brw_reg sampler_reg = vec1(retype(sampler_index, BRW_REGISTER_TYPE_UD)); + + brw_push_insn_state(p); + brw_set_default_mask_control(p, BRW_MASK_DISABLE); + brw_set_default_access_mode(p, BRW_ALIGN_1); + + /* Some care required: `sampler` and `temp` may alias: + *addr = sampler 0xff + *temp = (sampler 8) 0xf00 + *addr = addr | temp + */ + brw_ADD(p, addr, sampler_reg, brw_imm_ud(base_binding_table_index)); + brw_SHL(p, temp, sampler_reg, brw_imm_ud(8u)); + brw_AND(p, temp, temp, brw_imm_ud(0x0f00)); + brw_AND(p, addr, addr, brw_imm_ud(0x0ff)); + brw_OR(p, addr, addr, temp); + + /* a0.0 |= descriptor */ + brw_inst *insn_or = brw_next_insn(p, BRW_OPCODE_OR); + brw_set_sampler_message(p, insn_or, + 0 /* surface */, + 0 /* sampler */, + msg_type, + 1 /* rlen */, + inst-mlen /* mlen */, + inst-header_present /* header */, + BRW_SAMPLER_SIMD_MODE_SIMD4X2, + return_format); + brw_inst_set_exec_size(p-brw, insn_or, BRW_EXECUTE_1); + brw_inst_set_src1_reg_type(p-brw, insn_or, BRW_REGISTER_TYPE_UD); + brw_set_src0(p, insn_or, addr); + brw_set_dest(p, insn_or, addr); + + + /* dst = send(offset, a0.0) */ + brw_inst *insn_send = brw_next_insn(p, BRW_OPCODE_SEND); + brw_set_dest(p, insn_send, dst); + brw_set_src0(p, insn_send, src); + brw_set_indirect_send_descriptor(p, insn_send, BRW_SFID_SAMPLER, addr); + + brw_pop_insn_state(p); + + /* visitor knows more than we do about the surface limit required, + * so has already done marking. + */ } } -- 2.0.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 07/11] i965/vec4: Add support for nonconst sampler indexing in VS visitor
--- src/mesa/drivers/dri/i965/brw_vec4.h | 2 +- src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 53 +++--- 2 files changed, 49 insertions(+), 6 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h b/src/mesa/drivers/dri/i965/brw_vec4.h index 9001286..4fe346c 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4.h +++ b/src/mesa/drivers/dri/i965/brw_vec4.h @@ -527,7 +527,7 @@ public: void emit_unpack_half_2x16(dst_reg dst, src_reg src0); uint32_t gather_channel(ir_texture *ir, uint32_t sampler); - src_reg emit_mcs_fetch(ir_texture *ir, src_reg coordinate, uint32_t sampler); + src_reg emit_mcs_fetch(ir_texture *ir, src_reg coordinate, src_reg sampler); void emit_gen6_gather_wa(uint8_t wa, dst_reg dst); void swizzle_result(ir_texture *ir, src_reg orig_val, uint32_t sampler); diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp index 4760790..eedd862 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp @@ -2299,7 +2299,7 @@ vec4_visitor::visit(ir_call *ir) } src_reg -vec4_visitor::emit_mcs_fetch(ir_texture *ir, src_reg coordinate, uint32_t sampler) +vec4_visitor::emit_mcs_fetch(ir_texture *ir, src_reg coordinate, src_reg sampler) { vec4_instruction *inst = new(mem_ctx) vec4_instruction(this, SHADER_OPCODE_TXF_MCS); inst-base_mrf = 2; @@ -2307,7 +2307,7 @@ vec4_visitor::emit_mcs_fetch(ir_texture *ir, src_reg coordinate, uint32_t sample inst-dst = dst_reg(this, glsl_type::uvec4_type); inst-dst.writemask = WRITEMASK_XYZW; - inst-src[1] = src_reg(sampler); + inst-src[1] = sampler; /* parameters are: u, v, r, lod; lod will always be zero due to api restrictions */ int param_base = inst-base_mrf; @@ -2324,12 +2324,55 @@ vec4_visitor::emit_mcs_fetch(ir_texture *ir, src_reg coordinate, uint32_t sample return src_reg(inst-dst); } +static bool +is_high_sampler(struct brw_context *brw, src_reg sampler) +{ + if (brw-gen 8 !brw-is_haswell) + return false; + + return sampler.file != IMM || sampler.fixed_hw_reg.dw1.ud = 16; +} + void vec4_visitor::visit(ir_texture *ir) { uint32_t sampler = _mesa_get_sampler_uniform_value(ir-sampler, shader_prog, prog); + ir_rvalue *nonconst_sampler_index = + _mesa_get_sampler_array_nonconst_index(ir-sampler); + + /* Handle non-constant sampler array indexing */ + src_reg sampler_reg; + if (nonconst_sampler_index) { + /* The highest sampler which may be used by this operation is + * the last element of the array. Mark it here, because the generator + * doesn't have enough information to determine the bound. + */ + uint32_t array_size = ir-sampler-as_dereference_array() + -array-type-array_size(); + + uint32_t max_used = sampler + array_size - 1; + if (ir-op == ir_tg4 brw-gen 8) { + max_used += prog_data-base.binding_table.gather_texture_start; + } else { + max_used += prog_data-base.binding_table.texture_start; + } + + brw_mark_surface_used(prog_data-base, max_used); + + /* Emit code to evaluate the actual indexing expression */ + nonconst_sampler_index-accept(this); + dst_reg temp(this, glsl_type::uint_type); + emit(ADD(temp, this-result, src_reg(sampler))); + sampler_reg = src_reg(temp); + } else { + /* Single sampler, or constant array index; the indexing expression + * is just an immediate. + */ + sampler_reg = src_reg(sampler); + } + /* When tg4 is used with the degenerate ZERO/ONE swizzles, don't bother * emitting anything other than setting up the constant result. */ @@ -2397,7 +2440,7 @@ vec4_visitor::visit(ir_texture *ir) sample_index_type = ir-lod_info.sample_index-type; if (brw-gen = 7 key-tex.compressed_multisample_layout_mask (1sampler)) - mcs = emit_mcs_fetch(ir, coordinate, sampler); + mcs = emit_mcs_fetch(ir, coordinate, sampler_reg); else mcs = src_reg(0u); break; @@ -2452,14 +2495,14 @@ vec4_visitor::visit(ir_texture *ir) */ inst-header_present = brw-gen 5 || inst-texture_offset != 0 || ir-op == ir_tg4 || - sampler = 16; + is_high_sampler(brw, sampler_reg); inst-base_mrf = 2; inst-mlen = inst-header_present + 1; /* always at least one */ inst-dst = dst_reg(this, ir-type); inst-dst.writemask = WRITEMASK_XYZW; inst-shadow_compare = ir-shadow_comparitor != NULL; - inst-src[1] = src_reg(sampler); + inst-src[1] = sampler_reg; /* MRF for the first parameter */ int param_base = inst-base_mrf + inst-header_present; -- 2.0.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 04/11] i965/vec4: Refactor generate_tex in prep for non-const samplers
Signed-off-by: Chris Forbes chr...@ijw.co.nz --- src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 47 +--- 1 file changed, 25 insertions(+), 22 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp index af7f8cc..d6ad8eb 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp @@ -314,11 +314,8 @@ vec4_generator::generate_tex(vec4_instruction *inst, assert(msg_type != -1); - assert(sampler_index.file == BRW_IMMEDIATE_VALUE); assert(sampler_index.type == BRW_REGISTER_TYPE_UD); - uint32_t sampler = sampler_index.dw1.ud; - /* Load the message header if present. If there's a texture offset, we need * to set it up explicitly and load the offset bitfield. Otherwise, we can * use an implied move from g0 to the first message register. @@ -363,25 +360,31 @@ vec4_generator::generate_tex(vec4_instruction *inst, break; } - uint32_t surface_index = ((inst-opcode == SHADER_OPCODE_TG4 || - inst-opcode == SHADER_OPCODE_TG4_OFFSET) - ? prog_data-base.binding_table.gather_texture_start - : prog_data-base.binding_table.texture_start) + sampler; - - brw_SAMPLE(p, - dst, - inst-base_mrf, - src, - surface_index, - sampler % 16, - msg_type, - 1, /* response length */ - inst-mlen, - inst-header_present, - BRW_SAMPLER_SIMD_MODE_SIMD4X2, - return_format); - - brw_mark_surface_used(prog_data-base, surface_index); + uint32_t base_binding_table_index = (inst-opcode == SHADER_OPCODE_TG4 || + inst-opcode == SHADER_OPCODE_TG4_OFFSET) + ? prog_data-base.binding_table.gather_texture_start + : prog_data-base.binding_table.texture_start; + + if (sampler_index.file == BRW_IMMEDIATE_VALUE) { + uint32_t sampler = sampler_index.dw1.ud; + + brw_SAMPLE(p, + dst, + inst-base_mrf, + src, + sampler + base_binding_table_index, + sampler % 16, + msg_type, + 1, /* response length */ + inst-mlen, + inst-header_present, + BRW_SAMPLER_SIMD_MODE_SIMD4X2, + return_format); + + brw_mark_surface_used(prog_data-base, sampler + base_binding_table_index); + } else { + /* XXX: Non-constant sampler index. */ + } } void -- 2.0.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 03/11] i965: Extract helper function for surface state pointer adjustment
Signed-off-by: Chris Forbes chr...@ijw.co.nz --- src/mesa/drivers/dri/i965/brw_eu.h | 5 src/mesa/drivers/dri/i965/brw_eu_emit.c | 35 src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 17 +--- 3 files changed, 41 insertions(+), 16 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_eu.h b/src/mesa/drivers/dri/i965/brw_eu.h index 4f2982d..84142ee 100644 --- a/src/mesa/drivers/dri/i965/brw_eu.h +++ b/src/mesa/drivers/dri/i965/brw_eu.h @@ -281,6 +281,11 @@ void brw_SAMPLE(struct brw_compile *p, unsigned simd_mode, unsigned return_format); +void brw_adjust_sampler_state_pointer(struct brw_compile *p, + struct brw_reg header, + struct brw_reg sampler_index, + struct brw_reg scratch); + void gen4_math(struct brw_compile *p, struct brw_reg dest, unsigned function, diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c b/src/mesa/drivers/dri/i965/brw_eu_emit.c index 7ea9313..29230a6 100644 --- a/src/mesa/drivers/dri/i965/brw_eu_emit.c +++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c @@ -2273,6 +2273,41 @@ void brw_SAMPLE(struct brw_compile *p, return_format); } +/* Adjust the message header's sampler state pointer to + * select the correct group of 16 samplers. + */ +void brw_adjust_sampler_state_pointer(struct brw_compile *p, + struct brw_reg header, + struct brw_reg sampler_index, + struct brw_reg scratch) +{ + /* The Sampler Index field can only store values between 0 and 15. +* However, we can add an offset to the Sampler State Pointer +* field, effectively selecting a different set of 16 samplers. +* +* The Sampler State Pointer needs to be aligned to a 32-byte +* offset, and each sampler state is only 16-bytes, so we can't +* exclusively use the offset - we have to use both. +*/ + + struct brw_context *brw = p-brw; + + if (sampler_index.file == BRW_IMMEDIATE_VALUE) { + const int sampler_state_size = 16; /* 16 bytes */ + uint32_t sampler = sampler_index.dw1.ud; + + if (sampler = 16) { + assert(brw-is_haswell || brw-gen = 8); + brw_ADD(p, + get_element_ud(header, 3), + get_element_ud(brw_vec8_grf(0, 0), 3), + brw_imm_ud(16 * (sampler / 16) * sampler_state_size)); + } + } else { + /* XXX: Non-const sampler array indexing case */ + } +} + /* All these variables are pretty confusing - we might be better off * using bitmasks and macros for this, in the old style. Or perhaps * just having the caller instantiate the fields in dword3 itself. diff --git a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp index ed6857d..af7f8cc 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp @@ -344,22 +344,7 @@ vec4_generator::generate_tex(vec4_instruction *inst, brw_imm_ud(inst-texture_offset)); } - if (sampler = 16) { -/* The Sampler Index field can only store values between 0 and 15. - * However, we can add an offset to the Sampler State Pointer - * field, effectively selecting a different set of 16 samplers. - * - * The Sampler State Pointer needs to be aligned to a 32-byte - * offset, and each sampler state is only 16-bytes, so we can't - * exclusively use the offset - we have to use both. - */ -const int sampler_state_size = 16; /* 16 bytes */ -assert(brw-gen = 8 || brw-is_haswell); -brw_ADD(p, -get_element_ud(header, 3), -get_element_ud(brw_vec8_grf(0, 0), 3), -brw_imm_ud(16 * (sampler / 16) * sampler_state_size)); - } + brw_adjust_sampler_state_pointer(p, header, sampler_index, dst); brw_pop_insn_state(p); } } -- 2.0.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 10/11] i965/fs: Add support for non-const sampler indices in generator
Signed-off-by: Chris Forbes chr...@ijw.co.nz --- src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 52 +- 1 file changed, 51 insertions(+), 1 deletion(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp index 54f3287..cb359ba 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp @@ -597,7 +597,57 @@ fs_generator::generate_tex(fs_inst *inst, struct brw_reg dst, struct brw_reg src brw_mark_surface_used(prog_data-base, sampler + base_binding_table_index); } else { - /* XXX: Non-const sampler index */ + /* Non-const sampler index */ + /* Note: this clobbers `dst` as a temporary before emitting the send */ + + struct brw_reg addr = vec1(retype(brw_address_reg(0), BRW_REGISTER_TYPE_UD)); + struct brw_reg temp = vec1(retype(dst, BRW_REGISTER_TYPE_UD)); + + struct brw_reg sampler_reg = vec1(retype(sampler_index, BRW_REGISTER_TYPE_UD)); + + brw_push_insn_state(p); + brw_set_default_mask_control(p, BRW_MASK_DISABLE); + brw_set_default_access_mode(p, BRW_ALIGN_1); + + /* Some care required: `sampler` and `temp` may alias: + *addr = sampler 0xff + *temp = (sampler 8) 0xf00 + *addr = addr | temp + */ + brw_ADD(p, addr, sampler_reg, brw_imm_ud(base_binding_table_index)); + brw_SHL(p, temp, sampler_reg, brw_imm_ud(8u)); + brw_AND(p, temp, temp, brw_imm_ud(0x0f00)); + brw_AND(p, addr, addr, brw_imm_ud(0x0ff)); + brw_OR(p, addr, addr, temp); + + /* a0.0 |= descriptor */ + brw_inst *insn_or = brw_next_insn(p, BRW_OPCODE_OR); + brw_set_sampler_message(p, insn_or, + 0 /* surface */, + 0 /* sampler */, + msg_type, + rlen, + inst-mlen /* mlen */, + inst-header_present /* header */, + simd_mode, + return_format); + brw_inst_set_exec_size(p-brw, insn_or, BRW_EXECUTE_1); + brw_inst_set_src1_reg_type(p-brw, insn_or, BRW_REGISTER_TYPE_UD); + brw_set_src0(p, insn_or, addr); + brw_set_dest(p, insn_or, addr); + + + /* dst = send(offset, a0.0) */ + brw_inst *insn_send = brw_next_insn(p, BRW_OPCODE_SEND); + brw_set_dest(p, insn_send, dst); + brw_set_src0(p, insn_send, src); + brw_set_indirect_send_descriptor(p, insn_send, BRW_SFID_SAMPLER, addr); + + brw_pop_insn_state(p); + + /* visitor knows more than we do about the surface limit required, + * so has already done marking. + */ } } -- 2.0.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 11/11] i965/fs: Add support for nonconst sampler indexing in FS visitor
Signed-off-by: Chris Forbes chr...@ijw.co.nz --- src/mesa/drivers/dri/i965/brw_fs.h | 4 +- src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 60 2 files changed, 55 insertions(+), 9 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h index dfb13ea..105834a 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.h +++ b/src/mesa/drivers/dri/i965/brw_fs.h @@ -380,8 +380,8 @@ public: fs_reg sample_index, uint32_t sampler); fs_inst *emit_texture_gen7(ir_texture *ir, fs_reg dst, fs_reg coordinate, fs_reg shadow_comp, fs_reg lod, fs_reg lod2, - fs_reg sample_index, fs_reg mcs, uint32_t sampler); - fs_reg emit_mcs_fetch(ir_texture *ir, fs_reg coordinate, uint32_t sampler); + fs_reg sample_index, fs_reg mcs, fs_reg sampler); + fs_reg emit_mcs_fetch(ir_texture *ir, fs_reg coordinate, fs_reg sampler); void emit_gen6_gather_wa(uint8_t wa, fs_reg dst); fs_reg fix_math_operand(fs_reg src); fs_inst *emit_math(enum opcode op, fs_reg dst, fs_reg src0); diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp index 0b3a2bb..8644893 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp @@ -1422,10 +1422,19 @@ fs_visitor::emit_texture_gen5(ir_texture *ir, fs_reg dst, fs_reg coordinate, return inst; } +static bool +is_high_sampler(struct brw_context *brw, fs_reg sampler) +{ + if (brw-gen 8 !brw-is_haswell) + return false; + + return sampler.file != IMM || sampler.fixed_hw_reg.dw1.ud = 16; +} + fs_inst * fs_visitor::emit_texture_gen7(ir_texture *ir, fs_reg dst, fs_reg coordinate, fs_reg shadow_c, fs_reg lod, fs_reg lod2, - fs_reg sample_index, fs_reg mcs, uint32_t sampler) + fs_reg sample_index, fs_reg mcs, fs_reg sampler) { int reg_width = dispatch_width / 8; bool header_present = false; @@ -1436,7 +1445,8 @@ fs_visitor::emit_texture_gen7(ir_texture *ir, fs_reg dst, fs_reg coordinate, } int length = 0; - if (ir-op == ir_tg4 || (ir-offset ir-op != ir_txf) || sampler = 16) { + if (ir-op == ir_tg4 || (ir-offset ir-op != ir_txf) || + is_high_sampler(brw, sampler)) { /* For general texture offsets (no txf workaround), we need a header to * put them in. Note that for SIMD16 we're making space for two actual * hardware registers here, so the emit will have to fix up for this. @@ -1611,7 +1621,7 @@ fs_visitor::emit_texture_gen7(ir_texture *ir, fs_reg dst, fs_reg coordinate, default: unreachable(not reached); } - fs_inst *inst = emit(opcode, dst, src_payload, fs_reg(sampler)); + fs_inst *inst = emit(opcode, dst, src_payload, sampler); inst-base_mrf = -1; if (reg_width == 2) inst-mlen = length * reg_width - header_present; @@ -1744,7 +1754,7 @@ fs_visitor::rescale_texcoord(ir_texture *ir, fs_reg coordinate, /* Sample from the MCS surface attached to this multisample texture. */ fs_reg -fs_visitor::emit_mcs_fetch(ir_texture *ir, fs_reg coordinate, uint32_t sampler) +fs_visitor::emit_mcs_fetch(ir_texture *ir, fs_reg coordinate, fs_reg sampler) { int reg_width = dispatch_width / 8; int length = ir-coordinate-type-vector_elements; @@ -1762,7 +1772,7 @@ fs_visitor::emit_mcs_fetch(ir_texture *ir, fs_reg coordinate, uint32_t sampler) emit(LOAD_PAYLOAD(payload, sources, length)); - fs_inst *inst = emit(SHADER_OPCODE_TXF_MCS, dest, payload, fs_reg(sampler)); + fs_inst *inst = emit(SHADER_OPCODE_TXF_MCS, dest, payload, sampler); inst-base_mrf = -1; inst-mlen = length * reg_width; inst-header_present = false; @@ -1780,6 +1790,42 @@ fs_visitor::visit(ir_texture *ir) uint32_t sampler = _mesa_get_sampler_uniform_value(ir-sampler, shader_prog, prog); + + ir_rvalue *nonconst_sampler_index = + _mesa_get_sampler_array_nonconst_index(ir-sampler); + + /* Handle non-constant sampler array indexing */ + fs_reg sampler_reg; + if (nonconst_sampler_index) { + /* The highest sampler which may be used by this operation is + * the last element of the array. Mark it here, because the generator + * doesn't have enough information to determine the bound. + */ + uint32_t array_size = ir-sampler-as_dereference_array() + -array-type-array_size(); + + uint32_t max_used = sampler + array_size - 1; + if (ir-op == ir_tg4 brw-gen 8) { + max_used += prog_data-base.binding_table.gather_texture_start; + } else { + max_used += prog_data-base.binding_table.texture_start; + } + + brw_mark_surface_used(prog_data-base, max_used); + + /* Emit code to evaluate the actual indexing expression */ +
[Mesa-dev] [PATCH 08/11] i965/fs: Use brw_adjust_sampler_state_pointer in fs generator too
Signed-off-by: Chris Forbes chr...@ijw.co.nz --- src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 17 + 1 file changed, 1 insertion(+), 16 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp index 28b3525..a6e653e 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp @@ -572,22 +572,7 @@ fs_generator::generate_tex(fs_inst *inst, struct brw_reg dst, struct brw_reg src brw_imm_ud(inst-texture_offset)); } - if (sampler = 16) { -/* The Sampler Index field can only store values between 0 and 15. - * However, we can add an offset to the Sampler State Pointer - * field, effectively selecting a different set of 16 samplers. - * - * The Sampler State Pointer needs to be aligned to a 32-byte - * offset, and each sampler state is only 16-bytes, so we can't - * exclusively use the offset - we have to use both. - */ -assert(brw-gen = 8 || brw-is_haswell); -const int sampler_state_size = 16; /* 16 bytes */ -brw_ADD(p, -get_element_ud(header_reg, 3), -get_element_ud(brw_vec8_grf(0, 0), 3), -brw_imm_ud(16 * (sampler / 16) * sampler_state_size)); - } + brw_adjust_sampler_state_pointer(p, header_reg, sampler_index, dst); brw_pop_insn_state(p); } } -- 2.0.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 09/11] i965/fs: Refactor generate_tex in prep for nonconst sampler indexing
Signed-off-by: Chris Forbes chr...@ijw.co.nz --- src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 47 ++ 1 file changed, 25 insertions(+), 22 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp index a6e653e..54f3287 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp @@ -537,11 +537,8 @@ fs_generator::generate_tex(fs_inst *inst, struct brw_reg dst, struct brw_reg src src.nr++; } - assert(sampler_index.file == BRW_IMMEDIATE_VALUE); assert(sampler_index.type == BRW_REGISTER_TYPE_UD); - uint32_t sampler = sampler_index.dw1.ud; - /* Load the message header if present. If there's a texture offset, * we need to set it up explicitly and load the offset bitfield. * Otherwise, we can use an implied move from g0 to the first message reg. @@ -577,25 +574,31 @@ fs_generator::generate_tex(fs_inst *inst, struct brw_reg dst, struct brw_reg src } } - uint32_t surface_index = ((inst-opcode == SHADER_OPCODE_TG4 || - inst-opcode == SHADER_OPCODE_TG4_OFFSET) - ? prog_data-base.binding_table.gather_texture_start - : prog_data-base.binding_table.texture_start) + sampler; - - brw_SAMPLE(p, - retype(dst, BRW_REGISTER_TYPE_UW), - inst-base_mrf, - src, - surface_index, - sampler % 16, - msg_type, - rlen, - inst-mlen, - inst-header_present, - simd_mode, - return_format); - - brw_mark_surface_used(prog_data-base, surface_index); + uint32_t base_binding_table_index = (inst-opcode == SHADER_OPCODE_TG4 || + inst-opcode == SHADER_OPCODE_TG4_OFFSET) + ? prog_data-base.binding_table.gather_texture_start + : prog_data-base.binding_table.texture_start; + + if (sampler_index.file == BRW_IMMEDIATE_VALUE) { + uint32_t sampler = sampler_index.dw1.ud; + + brw_SAMPLE(p, + retype(dst, BRW_REGISTER_TYPE_UW), + inst-base_mrf, + src, + sampler + base_binding_table_index, + sampler % 16, + msg_type, + rlen, + inst-mlen, + inst-header_present, + simd_mode, + return_format); + + brw_mark_surface_used(prog_data-base, sampler + base_binding_table_index); + } else { + /* XXX: Non-const sampler index */ + } } -- 2.0.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 0/4] BDW viewport extents + misc
On Saturday, August 09, 2014 06:15:34 PM Ben Widawsky wrote: On Sat, Aug 09, 2014 at 12:07:58PM -0700, Ben Widawsky wrote: I realize it hasn't even been a week yet, but my remaining 2 weeks until my sabbatical have just filled up, so if anyone needs me to rework this, the sooner you let me know the better. Hi Ken. Thanks a lot for reviewing it. I meant to incorporate all the changes you requested. If you don't see one there, please let me know. I've pushed a rebased branch here: http://cgit.freedesktop.org/~bwidawsk/mesa/log/?h=bdw-extents Looks okay to me - I'd go ahead and push these. Do you have any idea or comments about the fixed piglit tests? I haven't actually run the rebased branch yet to re-confirm the results. I was just curious... Oh, I missed that. vs-max-ivec4-int sounds bizarre, but the others sound believable. I recall there were a few ARB_viewport_array tests I hadn't figured out, and this seems like the right fix. signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] mesa/st: enable ARB_gpu_shader5 if the reported GLSL version = 400
The ARB_gpu_shader5 extension is made up of a lot of small sub-parts. Instead of adding PIPE_CAP's for each of these, just rely on the GLSL version reported by the pipe driver. The remaining extensions lend themselves naturally to being checked through a single CAP. Signed-off-by: Ilia Mirkin imir...@alum.mit.edu --- The only issue will happen when GLSL 4.00 is supported for real and so we can't rely on the st to clamp it down to 330. (Although perhaps the mesa version limiting logic would kick in then as well?) The alternative is to add a ARB_GS5 cap, or a bunch of little ones for the sub-parts, but that doesn't seem too useful. src/mesa/state_tracker/st_extensions.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/src/mesa/state_tracker/st_extensions.c b/src/mesa/state_tracker/st_extensions.c index 7ac4840..51fd71a 100644 --- a/src/mesa/state_tracker/st_extensions.c +++ b/src/mesa/state_tracker/st_extensions.c @@ -826,4 +826,7 @@ void st_init_extensions(struct st_context *st) } if (ctx-Const.MaxProgramTextureGatherComponents 0) ctx-Extensions.ARB_texture_gather = GL_TRUE; + + if (glsl_feature_level = 400) + ctx-Extensions.ARB_gpu_shader5 = GL_TRUE; } -- 1.8.5.5 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] radeonsi: simplify constant buffer upload for big endian
On 10.08.2014 06:54, Marek Olšák wrote: From: Marek Olšák marek.ol...@amd.com Point util_memcpy_cpu_to_le32 to a buffer storage directly. --- src/gallium/drivers/radeonsi/si_descriptors.c | 15 +++ 1 file changed, 3 insertions(+), 12 deletions(-) diff --git a/src/gallium/drivers/radeonsi/si_descriptors.c b/src/gallium/drivers/radeonsi/si_descriptors.c index 81ad14b..bfd2b76 100644 --- a/src/gallium/drivers/radeonsi/si_descriptors.c +++ b/src/gallium/drivers/radeonsi/si_descriptors.c @@ -649,20 +649,11 @@ void si_upload_const_buffer(struct si_context *sctx, struct r600_resource **rbuf const uint8_t *ptr, unsigned size, uint32_t *const_offset) { if (SI_BIG_ENDIAN) { - uint32_t *tmpPtr; - unsigned i; - - if (!(tmpPtr = malloc(size))) { - R600_ERR(Failed to allocate BE swap buffer.\n); - return; - } + void *tmpPtr; + u_upload_alloc(sctx-b.uploader, 0, size, const_offset, +(struct pipe_resource**)rbuffer, tmpPtr); util_memcpy_cpu_to_le32(tmpPtr, ptr, size); - - u_upload_data(sctx-b.uploader, 0, size, tmpPtr, const_offset, - (struct pipe_resource**)rbuffer); - - free(tmpPtr); } else { u_upload_data(sctx-b.uploader, 0, size, ptr, const_offset, (struct pipe_resource**)rbuffer); Since util_memcpy_cpu_to_le32() is memcpy() on little endian, is there any point keeping the separate (non-)SI_BIG_ENDIAN paths here? Either way, Reviewed-by: Michel Dänzer michel.daen...@amd.com -- Earthling Michel Dänzer| http://www.amd.com Libre software enthusiast |Mesa and X developer ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 3/4] docs: Mark off ARB_gpu_shader5 for i965
Signed-off-by: Chris Forbes chr...@ijw.co.nz --- docs/GL3.txt | 20 ++-- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/docs/GL3.txt b/docs/GL3.txt index 65facf5..29535ec 100644 --- a/docs/GL3.txt +++ b/docs/GL3.txt @@ -94,18 +94,18 @@ GL 4.0, GLSL 4.00: GL_ARB_draw_buffers_blendDONE (i965, nv50, nvc0, r600, radeonsi, softpipe) GL_ARB_draw_indirect DONE (i965, nvc0, radeonsi, softpipe, llvmpipe) - GL_ARB_gpu_shader5 started + GL_ARB_gpu_shader5 DONE (i965) - 'precise' qualifierDONE - - Dynamically uniform sampler array indices started (Chris) - - Dynamically uniform UBO array indices DONE (i965) + - Dynamically uniform sampler array indices DONE () + - Dynamically uniform UBO array indices DONE () - Implicit signed - unsigned conversionsDONE - - Fused multiply-add DONE (i965, nvc0) - - Packing/bitfield/conversion functions DONE (i965, nvc0, r600) - - Enhanced textureGather DONE (i965, nvc0, r600, radeonsi) - - Geometry shader instancing DONE (i965, nvc0) - - Geometry shader multiple streams DONE (i965, nvc0) - - Enhanced per-sample shadingDONE (i965, r600) - - Interpolation functionsDONE (i965) + - Fused multiply-add DONE (nvc0) + - Packing/bitfield/conversion functions DONE (nvc0, r600) + - Enhanced textureGather DONE (nvc0, r600, radeonsi) + - Geometry shader instancing DONE (nvc0) + - Geometry shader multiple streams DONE (nvc0) + - Enhanced per-sample shadingDONE (r600) + - Interpolation functionsDONE () - New overload resolution rules DONE GL_ARB_gpu_shader_fp64 started (Dave) GL_ARB_sample_shadingDONE (i965, nv50, nvc0, radeonsi) -- 2.0.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 01/11] i965/eu: Change gen == 7 to gen = 7 in a couple brw_eu_emit.c cases.
Broadwell is going to use the brw_eu_emit.c code soon. We want to get the fake MRF handling and URB HWord channel mask handling. We don't need the CMP thread switch workaround, though. Signed-off-by: Kenneth Graunke kenn...@whitecape.org --- src/mesa/drivers/dri/i965/brw_eu_emit.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c b/src/mesa/drivers/dri/i965/brw_eu_emit.c index e894abd..038bfc6 100644 --- a/src/mesa/drivers/dri/i965/brw_eu_emit.c +++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c @@ -97,7 +97,7 @@ gen7_convert_mrf_to_grf(struct brw_compile *p, struct brw_reg *reg) * registers required for messages with EOT. */ struct brw_context *brw = p-brw; - if (brw-gen == 7 reg-file == BRW_MESSAGE_REGISTER_FILE) { + if (brw-gen = 7 reg-file == BRW_MESSAGE_REGISTER_FILE) { reg-file = BRW_GENERAL_REGISTER_FILE; reg-nr += GEN7_MRF_HACK_START; } @@ -2265,7 +2265,7 @@ void brw_urb_WRITE(struct brw_compile *p, gen6_resolve_implied_move(p, src0, msg_reg_nr); - if (brw-gen == 7 !(flags BRW_URB_WRITE_USE_CHANNEL_MASKS)) { + if (brw-gen = 7 !(flags BRW_URB_WRITE_USE_CHANNEL_MASKS)) { /* Enable Channel Masks in the URB_WRITE_HWORD message header */ brw_push_insn_state(p); brw_set_default_access_mode(p, BRW_ALIGN_1); -- 2.0.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 11/11] i965/eu: Allow math on immediates on Broadwell.
Signed-off-by: Kenneth Graunke kenn...@whitecape.org --- src/mesa/drivers/dri/i965/brw_eu_emit.c | 9 ++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c b/src/mesa/drivers/dri/i965/brw_eu_emit.c index 0a66d0c..9d7b4a5 100644 --- a/src/mesa/drivers/dri/i965/brw_eu_emit.c +++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c @@ -1886,7 +1886,8 @@ void gen6_math(struct brw_compile *p, assert(dest.file == BRW_GENERAL_REGISTER_FILE || (brw-gen = 7 dest.file == BRW_MESSAGE_REGISTER_FILE)); - assert(src0.file == BRW_GENERAL_REGISTER_FILE); + assert(src0.file == BRW_GENERAL_REGISTER_FILE || + (brw-gen = 8 src0.file == BRW_IMMEDIATE_VALUE)); assert(dest.hstride == BRW_HORIZONTAL_STRIDE_1); if (brw-gen == 6) { @@ -1899,12 +1900,14 @@ void gen6_math(struct brw_compile *p, function == BRW_MATH_FUNCTION_INT_DIV_QUOTIENT_AND_REMAINDER) { assert(src0.type != BRW_REGISTER_TYPE_F); assert(src1.type != BRW_REGISTER_TYPE_F); - assert(src1.file == BRW_GENERAL_REGISTER_FILE); + assert(src1.file == BRW_GENERAL_REGISTER_FILE || + (brw-gen = 8 src1.file == BRW_IMMEDIATE_VALUE)); } else { assert(src0.type == BRW_REGISTER_TYPE_F); assert(src1.type == BRW_REGISTER_TYPE_F); if (function == BRW_MATH_FUNCTION_POW) { - assert(src1.file == BRW_GENERAL_REGISTER_FILE); + assert(src1.file == BRW_GENERAL_REGISTER_FILE || +(brw-gen = 8 src1.file == BRW_IMMEDIATE_VALUE)); } else { assert(src1.file == BRW_ARCHITECTURE_REGISTER_FILE src1.nr == BRW_ARF_NULL); -- 2.0.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 03/11] i965/eu: Emulate F32TO16 and F16TO32 on Broadwell.
When we combine the Gen4-7 and Gen8+ generators, we'll need to handle half float packing/unpacking functions somehow. The Gen8+ generator code today just emulates the behavior of the Gen7 F32TO16/F16TO32 instructions, including the align16 mode bugs. Rather than messing with fs_generator/vec4_generator, I decided to just emulate the instructions at the brw_eu_emit.c layer. Signed-off-by: Kenneth Graunke kenn...@whitecape.org --- src/mesa/drivers/dri/i965/brw_eu_emit.c | 47 +++-- 1 file changed, 45 insertions(+), 2 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c b/src/mesa/drivers/dri/i965/brw_eu_emit.c index 822c73c..5e14ba5 100644 --- a/src/mesa/drivers/dri/i965/brw_eu_emit.c +++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c @@ -1004,8 +1004,6 @@ ALU2(XOR) ALU2(SHR) ALU2(SHL) ALU2(ASR) -ALU1(F32TO16) -ALU1(F16TO32) ALU1(FRC) ALU1(RNDD) ALU2(MAC) @@ -1110,6 +1108,51 @@ brw_MUL(struct brw_compile *p, struct brw_reg dest, return brw_alu2(p, BRW_OPCODE_MUL, dest, src0, src1); } +brw_inst * +brw_F32TO16(struct brw_compile *p, struct brw_reg dst, struct brw_reg src) +{ + const struct brw_context *brw = p-brw; + + bool align16 = brw_inst_access_mode(brw, p-current) == BRW_ALIGN_16; + + if (align16) { + assert(dst.type == BRW_REGISTER_TYPE_UD); + } else { + assert(dst.type == BRW_REGISTER_TYPE_W || + dst.type == BRW_REGISTER_TYPE_UW || + dst.type == BRW_REGISTER_TYPE_HF); + } + + if (brw-gen = 8) { + if (align16) { + /* Emulate the Gen7 zeroing bug (see comments in vec4_visitor's + * emit_pack_half_2x16 method.) + */ + brw_MOV(p, retype(dst, BRW_REGISTER_TYPE_UD), brw_imm_ud(0u)); + } + return brw_MOV(p, retype(dst, BRW_REGISTER_TYPE_HF), src); + } else { + assert(brw-gen = 7); + return brw_alu1(p, BRW_OPCODE_F32TO16, dst, src); + } +} + +brw_inst * +brw_F16TO32(struct brw_compile *p, struct brw_reg dst, struct brw_reg src) +{ + const struct brw_context *brw = p-brw; + assert(src.type == BRW_REGISTER_TYPE_W || + src.type == BRW_REGISTER_TYPE_UW || + src.type == BRW_REGISTER_TYPE_HF); + + if (brw-gen = 8) { + return brw_MOV(p, dst, retype(src, BRW_REGISTER_TYPE_HF)); + } else { + assert(brw-gen = 7); + return brw_alu1(p, BRW_OPCODE_F16TO32, dst, src); + } +} + void brw_NOP(struct brw_compile *p) { -- 2.0.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 08/11] i965/eu: Set UIP on ELSE instructions on Broadwell.
Broadwell adds UIP on ELSE instructions. Signed-off-by: Kenneth Graunke kenn...@whitecape.org --- src/mesa/drivers/dri/i965/brw_eu_emit.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c b/src/mesa/drivers/dri/i965/brw_eu_emit.c index 214ff0a..665fc07 100644 --- a/src/mesa/drivers/dri/i965/brw_eu_emit.c +++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c @@ -1434,6 +1434,12 @@ patch_IF_ELSE(struct brw_compile *p, /* The IF instruction's UIP and ELSE's JIP should point to ENDIF */ brw_inst_set_uip(brw, if_inst, br * (endif_inst - if_inst)); brw_inst_set_jip(brw, else_inst, br * (endif_inst - else_inst)); + if (brw-gen = 8) { +/* Since we don't set branch_ctrl, the ELSE's JIP and UIP both + * should point to ENDIF. + */ +brw_inst_set_uip(brw, else_inst, br * (endif_inst - else_inst)); + } } } } -- 2.0.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 04/11] i965/eu: Explicitly disable instruction compaction on Broadwell for now.
Until now, it's been off implicitly: we never call the compactor function. When we merge the generators, we'll start calling it, so we should make it do nothing. Matt will enable instruction compaction properly later. Signed-off-by: Kenneth Graunke kenn...@whitecape.org --- src/mesa/drivers/dri/i965/brw_eu_compact.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/mesa/drivers/dri/i965/brw_eu_compact.c b/src/mesa/drivers/dri/i965/brw_eu_compact.c index eec6454..625cfbb 100644 --- a/src/mesa/drivers/dri/i965/brw_eu_compact.c +++ b/src/mesa/drivers/dri/i965/brw_eu_compact.c @@ -711,7 +711,7 @@ brw_compact_instructions(struct brw_compile *p, int start_offset, */ int old_ip[(p-next_insn_offset - start_offset) / 8]; - if (brw-gen 6) + if (brw-gen 6 || brw-gen = 8) return; int src_offset; -- 2.0.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 0/4] Enable ARB_gpu_shader5 for i965/Gen7
[Applies on top of the dynamically uniform UBO indexing and sampler indexing series sent to the list recently] All the pieces are now in place for ARB_gpu_shader5 on Ivybridge/Baytrail/Haswell, and Broadwell is close. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 02/11] i965/eu: Use Haswell atomic messages on Broadwell.
Signed-off-by: Kenneth Graunke kenn...@whitecape.org --- src/mesa/drivers/dri/i965/brw_eu_emit.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c b/src/mesa/drivers/dri/i965/brw_eu_emit.c index 038bfc6..822c73c 100644 --- a/src/mesa/drivers/dri/i965/brw_eu_emit.c +++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c @@ -2511,7 +2511,7 @@ brw_set_dp_untyped_atomic_message(struct brw_compile *p, atomic_op | /* Atomic Operation Type: BRW_AOP_* */ (response_length ? 1 5 : 0); /* Return data expected */ - if (brw-is_haswell) { + if (brw-gen = 8 || brw-is_haswell) { brw_set_message_descriptor(p, insn, HSW_SFID_DATAPORT_DATA_CACHE_1, msg_length, response_length, header_present, false); @@ -2574,7 +2574,7 @@ brw_set_dp_untyped_surface_read_message(struct brw_compile *p, (brw_inst_exec_size(brw, insn) == BRW_EXECUTE_16 ? 16 : 8); const unsigned num_channels = response_length / (dispatch_width / 8); - if (brw-is_haswell) { + if (brw-gen = 8 || brw-is_haswell) { brw_set_message_descriptor(p, insn, HSW_SFID_DATAPORT_DATA_CACHE_1, msg_length, response_length, header_present, false); -- 2.0.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] i965: Broadwell using brw_eu_emit.c, part 1
Hello, This is a series of preparatory work for getting Broadwell to use the brw_eu_emit.c code generator rather than gen8_generator. The hope is to drop the separate Gen8+ code completely. We couldn't do this originally because the old code was centered around a struct describing the Gen4-7 assembly format, but now that we've shifted to the new brw_inst API, there's no reason we can't share code. This doesn't actually get us there - I've still got some bugs to track down, but these patches have been in my tree for a long time and seem pretty solid. --Ken ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 09/11] i965/eu: Refactor jump distance scaling to use a helper function.
Different generations of hardware measure jump distances in different units. Previously, every function that needed to set a jump target open coded this scaling, or made a hardcoded assumption (i.e. just used 2). Most functions start with the number of instructions to jump, and scale up to the hardware-specific value. So, I made the function match that. Others start with a byte offset, and divide by a constant (8) to obtain the jump distance. This is actually 16 / 2 (the jump scale for Gen5-7). Signed-off-by: Kenneth Graunke kenn...@whitecape.org --- src/mesa/drivers/dri/i965/brw_eu.h | 2 ++ src/mesa/drivers/dri/i965/brw_eu_emit.c| 42 +- src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 8 +++-- 3 files changed, 35 insertions(+), 17 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_eu.h b/src/mesa/drivers/dri/i965/brw_eu.h index 7efc028..fdf6e4c 100644 --- a/src/mesa/drivers/dri/i965/brw_eu.h +++ b/src/mesa/drivers/dri/i965/brw_eu.h @@ -316,6 +316,8 @@ void brw_shader_time_add(struct brw_compile *p, struct brw_reg payload, uint32_t surf_index); +unsigned brw_jump_scale(const struct brw_context *brw); + /* If/else/endif. Works by manipulating the execution flags on each * channel. */ diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c b/src/mesa/drivers/dri/i965/brw_eu_emit.c index 665fc07..3d9c96a 100644 --- a/src/mesa/drivers/dri/i965/brw_eu_emit.c +++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c @@ -1170,6 +1170,27 @@ void brw_NOP(struct brw_compile *p) * Comparisons, if/else/endif */ +/** + * Return the generation-specific jump distance scaling factor. + * + * Given the number of instructions to jump, we need to scale by + * some number to obtain the actual jump distance to program in an + * instruction. + */ +unsigned +brw_jump_scale(const struct brw_context *brw) +{ + /* Ironlake and later measure jump targets in 64-bit data chunks (in order +* (to support compaction), so each 128-bit instruction requires 2 chunks. +*/ + if (brw-gen = 5) + return 2; + + /* Gen4 simply uses the number of 128-bit instructions. */ + return 1; +} + + brw_inst * brw_JMPI(struct brw_compile *p, struct brw_reg index, unsigned predicate_control) @@ -1376,12 +1397,7 @@ patch_IF_ELSE(struct brw_compile *p, assert(endif_inst != NULL); assert(else_inst == NULL || brw_inst_opcode(brw, else_inst) == BRW_OPCODE_ELSE); - unsigned br = 1; - /* Jump count is for 64bit data chunk each, so one 128bit instruction -* requires 2 chunks. -*/ - if (brw-gen = 5) - br = 2; + unsigned br = brw_jump_scale(brw); assert(brw_inst_opcode(brw, endif_inst) == BRW_OPCODE_ENDIF); brw_inst_set_exec_size(brw, endif_inst, brw_inst_exec_size(brw, if_inst)); @@ -1678,7 +1694,7 @@ brw_patch_break_cont(struct brw_compile *p, brw_inst *while_inst) struct brw_context *brw = p-brw; brw_inst *do_inst = get_inner_do_insn(p); brw_inst *inst; - int br = (brw-gen == 5) ? 2 : 1; + unsigned br = brw_jump_scale(brw); assert(brw-gen 6); @@ -1702,10 +1718,7 @@ brw_WHILE(struct brw_compile *p) { struct brw_context *brw = p-brw; brw_inst *insn, *do_insn; - unsigned br = 1; - - if (brw-gen = 5) - br = 2; + unsigned br = brw_jump_scale(brw); if (brw-gen = 7) { insn = next_insn(p, BRW_OPCODE_WHILE); @@ -2389,7 +2402,7 @@ brw_find_loop_end(struct brw_compile *p, int start_offset) { struct brw_context *brw = p-brw; int offset; - int scale = 8; + int scale = 16 / brw_jump_scale(brw); void *store = p-store; assert(brw-gen = 6); @@ -2421,7 +2434,8 @@ brw_set_uip_jip(struct brw_compile *p) { struct brw_context *brw = p-brw; int offset; - int scale = 8; + int br = brw_jump_scale(brw); + int scale = 16 / br; void *store = p-store; if (brw-gen 6) @@ -2461,7 +2475,7 @@ brw_set_uip_jip(struct brw_compile *p) case BRW_OPCODE_ENDIF: if (block_end_offset == 0) -brw_inst_set_jip(brw, insn, 2); +brw_inst_set_jip(brw, insn, 1 * br); else brw_inst_set_jip(brw, insn, (block_end_offset - offset) / scale); break; diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp index cec2e82..d3509a0 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp @@ -67,6 +67,8 @@ fs_generator::patch_discard_jumps_to_fb_writes() if (brw-gen 6 || this-discard_halt_patches.is_empty()) return false; + int scale = brw_jump_scale(brw); + /* There is a somewhat strange undocumented requirement of using * HALT, according to the simulator. If some channel has HALTed to * a particular UIP, then by the end of the program, every channel @@ -79,8 +81,8 @@ fs_generator::patch_discard_jumps_to_fb_writes() *
[Mesa-dev] [PATCH 05/11] i965/eu: Port Broadwell CMP destination type hack to brw_eu_emit.c.
See gen8_generator::CMP(). Signed-off-by: Kenneth Graunke kenn...@whitecape.org --- src/mesa/drivers/dri/i965/brw_eu_emit.c | 8 1 file changed, 8 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c b/src/mesa/drivers/dri/i965/brw_eu_emit.c index 5e14ba5..33684d4 100644 --- a/src/mesa/drivers/dri/i965/brw_eu_emit.c +++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c @@ -1785,6 +1785,14 @@ void brw_CMP(struct brw_compile *p, struct brw_context *brw = p-brw; brw_inst *insn = next_insn(p, BRW_OPCODE_CMP); + if (brw-gen = 8) { + /* The CMP instruction appears to behave erratically for floating point + * sources unless the destination type is also float. Overriding it to + * match src0 makes it work in all cases. + */ + dest.type = src0.type; + } + brw_inst_set_cond_modifier(brw, insn, conditional); brw_set_dest(p, insn, dest); brw_set_src0(p, insn, src0); -- 2.0.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 4/4] docs: Update relnotes for ARB_gpu_shader5, sort list
Signed-off-by: Chris Forbes chr...@ijw.co.nz --- docs/relnotes/10.3.html | 7 --- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/docs/relnotes/10.3.html b/docs/relnotes/10.3.html index f023ca6..c237601 100644 --- a/docs/relnotes/10.3.html +++ b/docs/relnotes/10.3.html @@ -44,21 +44,22 @@ Note: some of the new features are only available with certain drivers. /p ul +liGL_ARB_clear_texture on i965/li liGL_ARB_compressed_texture_pixel_storage on all drivers/li liGL_ARB_draw_indirect on nvc0, radeonsi/li liGL_ARB_explicit_uniform_location (all drivers that support GLSL)/li +liGL_ARB_fragment_layer_viewport on nv50, nvc0, llvmpipe, r600/li +liGL_ARB_gpu_shader5 on i965/gen7/li liGL_ARB_multi_draw_indirect on nvc0, radeonsi/li liGL_ARB_sample_shading on radeonsi/li +liGL_ARB_seamless_cubemap_per_texture on i965, llvmpipe, nvc0, r600, radeonsi, softpipe/li liGL_ARB_stencil_texturing on nv50, nvc0, r600, and radeonsi/li liGL_ARB_texture_cube_map_array on radeonsi/li liGL_ARB_texture_gather on r600, radeonsi/li liGL_ARB_texture_query_levels on nv50, nvc0, llvmpipe, r600, radeonsi, softpipe/li liGL_ARB_texture_query_lod on r600, radeonsi/li liGL_ARB_viewport_array on nvc0/li -liGL_ARB_seamless_cubemap_per_texture on i965, llvmpipe, nvc0, r600, radeonsi, softpipe/li -liGL_ARB_fragment_layer_viewport on nv50, nvc0, llvmpipe, r600/li liGL_AMD_vertex_shader_viewport_index on i965/gen7+, r600/li -liGL_ARB_clear_texture on i965/li liA new software rasterizer driver (kms_swrast_dri.so) that works with DRM drivers that don't have a full-fledged GEM (such as qxl or simpledrm)/li /ul -- 2.0.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 06/11] i965/eu: Make it clear that brw_find_loop_end only runs on Gen6+.
It has Gen6+ knowledge baked in, and indeed is only called for Gen6+, but it wasn't immediately obvious that this was the case. Signed-off-by: Kenneth Graunke kenn...@whitecape.org --- src/mesa/drivers/dri/i965/brw_eu_emit.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c b/src/mesa/drivers/dri/i965/brw_eu_emit.c index 33684d4..2e11eca 100644 --- a/src/mesa/drivers/dri/i965/brw_eu_emit.c +++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c @@ -2384,6 +2384,8 @@ brw_find_loop_end(struct brw_compile *p, int start_offset) int scale = 8; void *store = p-store; + assert(brw-gen = 6); + /* Always start after the instruction (such as a WHILE) we're trying to fix * up. */ -- 2.0.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/4] i965: Enable ARB_gpu_shader5 on Gen7
Signed-off-by: Chris Forbes chr...@ijw.co.nz --- src/mesa/drivers/dri/i965/intel_extensions.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/src/mesa/drivers/dri/i965/intel_extensions.c b/src/mesa/drivers/dri/i965/intel_extensions.c index 4ee8636..04a47af 100644 --- a/src/mesa/drivers/dri/i965/intel_extensions.c +++ b/src/mesa/drivers/dri/i965/intel_extensions.c @@ -325,4 +325,7 @@ intelInitExtensions(struct gl_context *ctx) if (brw-gen = 7) ctx-Extensions.ARB_shader_atomic_counters = true; + + if (brw-gen == 7) + ctx-Extensions.ARB_gpu_shader5 = true; } -- 2.0.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 10/11] i965/eu: Update jump distance scaling for Broadwell.
Broadwell measures jump distances in bytes, so we need to scale by 16. Signed-off-by: Kenneth Graunke kenn...@whitecape.org --- src/mesa/drivers/dri/i965/brw_eu_emit.c | 4 1 file changed, 4 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c b/src/mesa/drivers/dri/i965/brw_eu_emit.c index 3d9c96a..0a66d0c 100644 --- a/src/mesa/drivers/dri/i965/brw_eu_emit.c +++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c @@ -1180,6 +1180,10 @@ void brw_NOP(struct brw_compile *p) unsigned brw_jump_scale(const struct brw_context *brw) { + /* Broadwell measures jump targets in bytes. */ + if (brw-gen = 8) + return 16; + /* Ironlake and later measure jump targets in 64-bit data chunks (in order * (to support compaction), so each 128-bit instruction requires 2 chunks. */ -- 2.0.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/4] mesa: Make ARB_gpu_shader5 core-profile-only
Requires GLSL 1.50 or higher, which we only support in the core profile. Signed-off-by: Chris Forbes chr...@ijw.co.nz --- src/mesa/main/extensions.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/mesa/main/extensions.c b/src/mesa/main/extensions.c index 9ac8377..7a9214e 100644 --- a/src/mesa/main/extensions.c +++ b/src/mesa/main/extensions.c @@ -115,7 +115,7 @@ static const struct extension extension_table[] = { { GL_ARB_framebuffer_object, o(ARB_framebuffer_object), GL, 2005 }, { GL_ARB_framebuffer_sRGB,o(EXT_framebuffer_sRGB), GL, 1998 }, { GL_ARB_get_program_binary, o(dummy_true), GL, 2010 }, - { GL_ARB_gpu_shader5, o(ARB_gpu_shader5), GL, 2010 }, + { GL_ARB_gpu_shader5, o(ARB_gpu_shader5), GLC, 2010 }, { GL_ARB_half_float_pixel,o(dummy_true), GL, 2003 }, { GL_ARB_half_float_vertex, o(ARB_half_float_vertex), GL, 2008 }, { GL_ARB_instanced_arrays,o(ARB_instanced_arrays), GL, 2008 }, -- 2.0.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 07/11] i965/eu: Make it clear that brw_patch_break_count only runs on Gen4-5.
Signed-off-by: Kenneth Graunke kenn...@whitecape.org --- src/mesa/drivers/dri/i965/brw_eu_emit.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c b/src/mesa/drivers/dri/i965/brw_eu_emit.c index 2e11eca..214ff0a 100644 --- a/src/mesa/drivers/dri/i965/brw_eu_emit.c +++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c @@ -1674,6 +1674,8 @@ brw_patch_break_cont(struct brw_compile *p, brw_inst *while_inst) brw_inst *inst; int br = (brw-gen == 5) ? 2 : 1; + assert(brw-gen 6); + for (inst = while_inst - 1; inst != do_inst; inst--) { /* If the jump count is != 0, that means that this instruction has already * been patched because it's part of a loop inside of the one we're -- 2.0.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 03/11] i965/eu: Emulate F32TO16 and F16TO32 on Broadwell.
+ if (align16) { + /* Emulate the Gen7 zeroing bug (see comments in vec4_visitor's + * emit_pack_half_2x16 method.) + */ + brw_MOV(p, retype(dst, BRW_REGISTER_TYPE_UD), brw_imm_ud(0u)); + } + return brw_MOV(p, retype(dst, BRW_REGISTER_TYPE_HF), src); + } else { + assert(brw-gen = 7); This can be == 7. + return brw_alu1(p, BRW_OPCODE_F32TO16, dst, src); + } +} + +brw_inst * +brw_F16TO32(struct brw_compile *p, struct brw_reg dst, struct brw_reg src) +{ + const struct brw_context *brw = p-brw; + assert(src.type == BRW_REGISTER_TYPE_W || + src.type == BRW_REGISTER_TYPE_UW || + src.type == BRW_REGISTER_TYPE_HF); + + if (brw-gen = 8) { + return brw_MOV(p, dst, retype(src, BRW_REGISTER_TYPE_HF)); + } else { + assert(brw-gen = 7); Same here ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] i965: Broadwell using brw_eu_emit.c, part 1
For the series: Reviewed-by: Chris Forbes chr...@ijw.co.nz On Sun, Aug 10, 2014 at 9:28 AM, Kenneth Graunke kenn...@whitecape.org wrote: Hello, This is a series of preparatory work for getting Broadwell to use the brw_eu_emit.c code generator rather than gen8_generator. The hope is to drop the separate Gen8+ code completely. We couldn't do this originally because the old code was centered around a struct describing the Gen4-7 assembly format, but now that we've shifted to the new brw_inst API, there's no reason we can't share code. This doesn't actually get us there - I've still got some bugs to track down, but these patches have been in my tree for a long time and seem pretty solid. --Ken ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev