Re: [Mesa-dev] [PATCH] clover: support CL_PROGRAM_BINARY_TYPE query

2014-08-09 Thread EdB
On Saturday, August 09, 2014 01:18:57 AM Ilia Mirkin wrote:
 On Fri, Aug 8, 2014 at 10:10 PM, EdB edb+m...@sigluy.net wrote:
  ---
  
   src/gallium/state_trackers/clover/api/program.cpp  | 3 +++
   src/gallium/state_trackers/clover/core/program.cpp | 8 
   src/gallium/state_trackers/clover/core/program.hpp | 1 +
   3 files changed, 12 insertions(+)
  
  diff --git a/src/gallium/state_trackers/clover/api/program.cpp
  b/src/gallium/state_trackers/clover/api/program.cpp index
  b81ce69..0e9e3c9 100644
  --- a/src/gallium/state_trackers/clover/api/program.cpp
  +++ b/src/gallium/state_trackers/clover/api/program.cpp
  @@ -266,6 +266,9 @@ clGetProgramBuildInfo(cl_program d_prog, cl_device_id
  d_dev, 
 buf.as_string() = prog.build_log(dev);
 break;
  
  +   case CL_PROGRAM_BINARY_TYPE:
  +  buf.as_scalarcl_program_binary_type() = prog.binary_type(dev);
 
 break?

Thanks
 
  +
  
  default:
 throw error(CL_INVALID_VALUE);
  
  }
  
  diff --git a/src/gallium/state_trackers/clover/core/program.cpp
  b/src/gallium/state_trackers/clover/core/program.cpp index
  e09c3aa..482df7e 100644
  --- a/src/gallium/state_trackers/clover/core/program.cpp
  +++ b/src/gallium/state_trackers/clover/core/program.cpp
  @@ -103,6 +103,14 @@ program::build_log(const device dev) const {
  
  return _logs.count(dev) ? _logs.find(dev)-second : ;
   
   }
  
  +cl_program_binary_type
  +program::binary_type(const device dev) const {
  +   if (!_binaries.count(dev))
  +  return CL_PROGRAM_BINARY_TYPE_NONE;
  +   else
  +  return CL_PROGRAM_BINARY_TYPE_EXECUTABLE;
  +}
  +
  
   const compat::vectormodule::symbol 
   program::symbols() const {
   
  if (_binaries.empty())
  
  diff --git a/src/gallium/state_trackers/clover/core/program.hpp
  b/src/gallium/state_trackers/clover/core/program.hpp index
  1081454..b932b95 100644
  --- a/src/gallium/state_trackers/clover/core/program.hpp
  +++ b/src/gallium/state_trackers/clover/core/program.hpp
  @@ -57,6 +57,7 @@ namespace clover {
  
 cl_build_status build_status(const device dev) const;
 std::string build_opts(const device dev) const;
 std::string build_log(const device dev) const;
  
  +  cl_program_binary_type binary_type(const device dev) const;
  
 const compat::vectormodule::symbol symbols() const;
  
  --
  2.0.4
  
  ___
  mesa-dev mailing list
  mesa-dev@lists.freedesktop.org
  http://lists.freedesktop.org/mailman/listinfo/mesa-dev

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v2] clover: support CL_PROGRAM_BINARY_TYPE query

2014-08-09 Thread EdB
---
 src/gallium/state_trackers/clover/api/program.cpp  | 4 
 src/gallium/state_trackers/clover/core/program.cpp | 8 
 src/gallium/state_trackers/clover/core/program.hpp | 1 +
 3 files changed, 13 insertions(+)

diff --git a/src/gallium/state_trackers/clover/api/program.cpp 
b/src/gallium/state_trackers/clover/api/program.cpp
index b81ce69..c3fe129 100644
--- a/src/gallium/state_trackers/clover/api/program.cpp
+++ b/src/gallium/state_trackers/clover/api/program.cpp
@@ -266,6 +266,10 @@ clGetProgramBuildInfo(cl_program d_prog, cl_device_id 
d_dev,
   buf.as_string() = prog.build_log(dev);
   break;
 
+   case CL_PROGRAM_BINARY_TYPE:
+  buf.as_scalarcl_program_binary_type() = prog.binary_type(dev);
+  break;
+
default:
   throw error(CL_INVALID_VALUE);
}
diff --git a/src/gallium/state_trackers/clover/core/program.cpp 
b/src/gallium/state_trackers/clover/core/program.cpp
index e09c3aa..482df7e 100644
--- a/src/gallium/state_trackers/clover/core/program.cpp
+++ b/src/gallium/state_trackers/clover/core/program.cpp
@@ -103,6 +103,14 @@ program::build_log(const device dev) const {
return _logs.count(dev) ? _logs.find(dev)-second : ;
 }
 
+cl_program_binary_type
+program::binary_type(const device dev) const {
+   if (!_binaries.count(dev))
+  return CL_PROGRAM_BINARY_TYPE_NONE;
+   else
+  return CL_PROGRAM_BINARY_TYPE_EXECUTABLE;
+}
+
 const compat::vectormodule::symbol 
 program::symbols() const {
if (_binaries.empty())
diff --git a/src/gallium/state_trackers/clover/core/program.hpp 
b/src/gallium/state_trackers/clover/core/program.hpp
index 1081454..b932b95 100644
--- a/src/gallium/state_trackers/clover/core/program.hpp
+++ b/src/gallium/state_trackers/clover/core/program.hpp
@@ -57,6 +57,7 @@ namespace clover {
   cl_build_status build_status(const device dev) const;
   std::string build_opts(const device dev) const;
   std::string build_log(const device dev) const;
+  cl_program_binary_type binary_type(const device dev) const;
 
   const compat::vectormodule::symbol symbols() const;
 
-- 
2.0.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 0/2] radeon: Use the DMA engine for buffer downloads

2014-08-09 Thread Niels Ole Salscheider
On Tuesday 04 March 2014, 02:08:58, Marek Olšák wrote:
 Could you please do this without changing u_upload_mgr? You can still
 use u_upload_alloc to allocate buffer memory in the driver and the map
 buffer read/write flags are not important with persistent coherent
 buffer mappings anyway.

Since 150ac07b855b5c5f879bf6ce9ca421ccd1a6c938 we allocate CPU - GPU 
streaming buffers (i. e. those with PIPE_USAGE_STREAM) in VRAM.
We should therefore set buffer.usage to PIPE_USAGE_STAGING in 
u_upload_alloc_buffer when we use u_upload_mgr for downloads - otherwise we 
won't get any performance improvements.
Would it now be OK to change u_upload_mgr or do you have a better proposal?

Ole

 Marek
 
 On Mon, Mar 3, 2014 at 9:29 PM, Niels Ole Salscheider
 
 niels_...@salscheider-online.de wrote:
  Using the DMA engine for buffer downloads vastly improves performance.
  This is because reads from VRAM by the CPU are slow because of the high
  latency of the PCIe bus.
  
  The first patch allows u_upload_mgr to be used for downloads, too. The
  second patch then uses u_upload_mgr in the radeon driver for downloads.
  I considered to rename u_upload_mgr to u_transfer_mgr since it might be
  confusing that an upload manager can be used for downloads. But then
  again we also have transfers so that u_transfer_mgr might also be
  confusing. Thus, I decided not to rename it for now.
  
  Without these patches, the buffer_bandwidth benchmark from uCLbench gives
  me:
  
  ./buffer_bandwidth --size=2000 --iterations=100
  # device 0: AMD BARTS // type gpu (192 MB global memory, 64 KB constant
  memory, 
  32 KB local memory)
  
  1/1 direct 2000 Bytes   759.29 MB/s(HD) 17.13 MB/s(DD)
  
  14.61 MB/s(DH)
  
  With these paches, the read performance is much better:
  
  ./buffer_bandwidth --size=2000 --iterations=100
  # device 0: AMD BARTS // type gpu (192 MB global memory, 64 KB constant
  memory, 
  32 KB local memory)
  
  1/1 direct 2000 Bytes   759.90 MB/s(HD) 613.49 MB/s(DD)
  
  1841.07 MB/s(DH)
  
  Judging by these numbers, it might even make sense to use the DMA engine
  for larger buffer downloads...
  
  Niels Ole Salscheider (2):
util/u_upload_mgr: Allow to also use it for downloads
radeon: Use transfer manager for buffer downloads
   
   src/gallium/auxiliary/hud/hud_context.c |  3 +-
   src/gallium/auxiliary/util/u_blitter.c  |  3 +-
   src/gallium/auxiliary/util/u_upload_mgr.c   | 49 +++-
   src/gallium/auxiliary/util/u_upload_mgr.h   | 13 -
   src/gallium/auxiliary/util/u_vbuf.c |  3 +-
   src/gallium/auxiliary/vl/vl_compositor.c|  3 +-
   src/gallium/drivers/ilo/ilo_context.c   |  3 +-
   src/gallium/drivers/r300/r300_context.c |  3 +-
   src/gallium/drivers/radeon/r600_buffer_common.c | 78
   +++-- src/gallium/drivers/radeon/r600_pipe_common.c 
| 14 -
   src/gallium/drivers/radeon/r600_pipe_common.h   |  1 +
   src/mesa/state_tracker/st_context.c |  9 ++-
   12 files changed, 136 insertions(+), 46 deletions(-)
  
  --
  1.9.0
  
  ___
  mesa-dev mailing list
  mesa-dev@lists.freedesktop.org
  http://lists.freedesktop.org/mailman/listinfo/mesa-dev

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] mesa/st: add support for dynamic sampler offsets

2014-08-09 Thread Marek Olšák
Acked-by: Marek Olšák marek.ol...@amd.com

Marek

On Sat, Aug 9, 2014 at 7:52 AM, Ilia Mirkin imir...@alum.mit.edu wrote:
 So... can I get a review on this? It's the last bit needed for ARB_gs5
 (well, except for the actual setting of the extension bit to 1)

 The only additional change I have in my local version is that instead of

 +  inst-sampler_array_size =
 + ir-sampler-as_dereference_array()
 +-array-variable_referenced()-type-length;

 I use ir-sampler-as_dereference_array()-array-type-array_size()

 I sent out a few piglits and they pass with the nvc0 driver on both
 fermi and kepler hw (which do textures a little differently from one
 another).

   -ilia


 On Wed, Aug 6, 2014 at 12:02 PM, Marek Olšák mar...@gmail.com wrote:
 On Wed, Aug 6, 2014 at 5:53 PM, Ilia Mirkin imir...@alum.mit.edu wrote:
   pc-MaxAddressRegs = pc-MaxNativeAddressRegs =
  _min(screen-get_shader_param(screen, sh, 
 PIPE_SHADER_CAP_MAX_ADDRS),
   MAX_PROGRAM_ADDRESS_REGS);

 Not really sure what that's referring to... ARB_vp/fp or something?

 Yes, ARB_vp needs 1, ARB_fp doesn't support indirect addresing (expects 0).


 Anyways, this is definitely a bit of a violation of that. OTOH, so is
 the indirect UBO indexing and indirect GS input access (assuming
 that's allowed), since those would use ADDR[1] and every driver
 (except nv30) returns 1, and sometimes 0 -- including
 nv50/nvc0/r600/radeonsi.

 So... dunno what the proper way to proceed is. Fix drivers to claim
 higher numbers? Continue the tradition of ignoring it and relying on
 the fact that GPU's that don't support it also won't support the
 features that cause it to get used?

 You don't have to worry about that for now. We can clean it up later.

 Marek
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 0/2] radeon: Use the DMA engine for buffer downloads

2014-08-09 Thread Marek Olšák
You can try to do the allocation of the staging buffer with
pipe_buffer_create instead of u_upload_mgr. You can also use
u_suballocator, which is like a stripped out version of u_upload_mgr.
You would need another instance of u_upload_mgr anyway, because we'd
like to continue using PIPE_USAGE_STREAM for uploads.

Marek

On Sat, Aug 9, 2014 at 2:35 PM, Niels Ole Salscheider
niels_...@salscheider-online.de wrote:
 On Tuesday 04 March 2014, 02:08:58, Marek Olšák wrote:
 Could you please do this without changing u_upload_mgr? You can still
 use u_upload_alloc to allocate buffer memory in the driver and the map
 buffer read/write flags are not important with persistent coherent
 buffer mappings anyway.

 Since 150ac07b855b5c5f879bf6ce9ca421ccd1a6c938 we allocate CPU - GPU
 streaming buffers (i. e. those with PIPE_USAGE_STREAM) in VRAM.
 We should therefore set buffer.usage to PIPE_USAGE_STAGING in
 u_upload_alloc_buffer when we use u_upload_mgr for downloads - otherwise we
 won't get any performance improvements.
 Would it now be OK to change u_upload_mgr or do you have a better proposal?

 Ole

 Marek

 On Mon, Mar 3, 2014 at 9:29 PM, Niels Ole Salscheider

 niels_...@salscheider-online.de wrote:
  Using the DMA engine for buffer downloads vastly improves performance.
  This is because reads from VRAM by the CPU are slow because of the high
  latency of the PCIe bus.
 
  The first patch allows u_upload_mgr to be used for downloads, too. The
  second patch then uses u_upload_mgr in the radeon driver for downloads.
  I considered to rename u_upload_mgr to u_transfer_mgr since it might be
  confusing that an upload manager can be used for downloads. But then
  again we also have transfers so that u_transfer_mgr might also be
  confusing. Thus, I decided not to rename it for now.
 
  Without these patches, the buffer_bandwidth benchmark from uCLbench gives
  me:
 
  ./buffer_bandwidth --size=2000 --iterations=100
  # device 0: AMD BARTS // type gpu (192 MB global memory, 64 KB constant
  memory,
  32 KB local memory)
 
  1/1 direct 2000 Bytes   759.29 MB/s(HD) 17.13 MB/s(DD)
 
  14.61 MB/s(DH)
 
  With these paches, the read performance is much better:
 
  ./buffer_bandwidth --size=2000 --iterations=100
  # device 0: AMD BARTS // type gpu (192 MB global memory, 64 KB constant
  memory,
  32 KB local memory)
 
  1/1 direct 2000 Bytes   759.90 MB/s(HD) 613.49 MB/s(DD)
 
  1841.07 MB/s(DH)
 
  Judging by these numbers, it might even make sense to use the DMA engine
  for larger buffer downloads...
 
  Niels Ole Salscheider (2):
util/u_upload_mgr: Allow to also use it for downloads
radeon: Use transfer manager for buffer downloads
 
   src/gallium/auxiliary/hud/hud_context.c |  3 +-
   src/gallium/auxiliary/util/u_blitter.c  |  3 +-
   src/gallium/auxiliary/util/u_upload_mgr.c   | 49 +++-
   src/gallium/auxiliary/util/u_upload_mgr.h   | 13 -
   src/gallium/auxiliary/util/u_vbuf.c |  3 +-
   src/gallium/auxiliary/vl/vl_compositor.c|  3 +-
   src/gallium/drivers/ilo/ilo_context.c   |  3 +-
   src/gallium/drivers/r300/r300_context.c |  3 +-
   src/gallium/drivers/radeon/r600_buffer_common.c | 78
   +++-- src/gallium/drivers/radeon/r600_pipe_common.c
| 14 -
   src/gallium/drivers/radeon/r600_pipe_common.h   |  1 +
   src/mesa/state_tracker/st_context.c |  9 ++-
   12 files changed, 136 insertions(+), 46 deletions(-)
 
  --
  1.9.0
 
  ___
  mesa-dev mailing list
  mesa-dev@lists.freedesktop.org
  http://lists.freedesktop.org/mailman/listinfo/mesa-dev

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] mesa/st: add support for dynamic sampler offsets

2014-08-09 Thread Roland Scheidegger
On closer look, it looks to me like it wouldn't be all that difficult to
make proper sampler array dcls (as you apparently get them quite easily
from glsl) which is really my only problem with it but it's not really
all that important so
Reviewed-by: Roland Scheidegger srol...@vmware.com

Am 09.08.2014 07:52, schrieb Ilia Mirkin:
 So... can I get a review on this? It's the last bit needed for ARB_gs5
 (well, except for the actual setting of the extension bit to 1)
 
 The only additional change I have in my local version is that instead of
 
 +  inst-sampler_array_size =
 + ir-sampler-as_dereference_array()
 +-array-variable_referenced()-type-length;
 
 I use ir-sampler-as_dereference_array()-array-type-array_size()
 
 I sent out a few piglits and they pass with the nvc0 driver on both
 fermi and kepler hw (which do textures a little differently from one
 another).
 
   -ilia
 
 
 On Wed, Aug 6, 2014 at 12:02 PM, Marek Olšák mar...@gmail.com wrote:
 On Wed, Aug 6, 2014 at 5:53 PM, Ilia Mirkin imir...@alum.mit.edu wrote:
   pc-MaxAddressRegs = pc-MaxNativeAddressRegs =
  _min(screen-get_shader_param(screen, sh, 
 PIPE_SHADER_CAP_MAX_ADDRS),
   MAX_PROGRAM_ADDRESS_REGS);

 Not really sure what that's referring to... ARB_vp/fp or something?

 Yes, ARB_vp needs 1, ARB_fp doesn't support indirect addresing (expects 0).


 Anyways, this is definitely a bit of a violation of that. OTOH, so is
 the indirect UBO indexing and indirect GS input access (assuming
 that's allowed), since those would use ADDR[1] and every driver
 (except nv30) returns 1, and sometimes 0 -- including
 nv50/nvc0/r600/radeonsi.

 So... dunno what the proper way to proceed is. Fix drivers to claim
 higher numbers? Continue the tradition of ignoring it and relying on
 the fact that GPU's that don't support it also won't support the
 features that cause it to get used?

 You don't have to worry about that for now. We can clean it up later.

 Marek
 ___
 mesa-dev mailing list
 mesa-dev@lists.freedesktop.org
 https://urldefense.proofpoint.com/v1/url?u=http://lists.freedesktop.org/mailman/listinfo/mesa-devk=oIvRg1%2BdGAgOoM1BIlLLqw%3D%3D%0Ar=F4msKE2WxRzA%2BwN%2B25muztFm5TSPwE8HKJfWfR2NgfY%3D%0Am=Xj1ezgC9gYJQCVun05Qsf2AVkWdMTWcufiVpM6QH7Mk%3D%0As=03cd8d1ec34661de07f9c6aa8d6efef3163477a18a55dc4a89d2eaa7d73d16f5
 

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] mesa/st: add support for dynamic sampler offsets

2014-08-09 Thread Ilia Mirkin
On Sat, Aug 9, 2014 at 10:14 AM, Roland Scheidegger srol...@vmware.com wrote:
 On closer look, it looks to me like it wouldn't be all that difficult to
 make proper sampler array dcls (as you apparently get them quite easily

If you can briefly outline how you think that should be done, I'd be
happy to try to do it. Right now the samplers are declared in
st_translate_program, which only has the program-samplers_used
bitfield (where program is glsl_to_tgsi_visitor).

One quick thing that occurs to me is to keep track of samplers in a
map of index - array size (0 if it's not accessed indirectly or not
an array), and then using the map to create the declarations. Does
that seem reasonable?

Either way, this seems like a nice-to-have rather than required, so
I'm going to push this patch even if the other one isn't ready.

 from glsl) which is really my only problem with it but it's not really
 all that important so
 Reviewed-by: Roland Scheidegger srol...@vmware.com

Thanks!


 Am 09.08.2014 07:52, schrieb Ilia Mirkin:
 So... can I get a review on this? It's the last bit needed for ARB_gs5
 (well, except for the actual setting of the extension bit to 1)

 The only additional change I have in my local version is that instead of

 +  inst-sampler_array_size =
 + ir-sampler-as_dereference_array()
 +-array-variable_referenced()-type-length;

 I use ir-sampler-as_dereference_array()-array-type-array_size()

 I sent out a few piglits and they pass with the nvc0 driver on both
 fermi and kepler hw (which do textures a little differently from one
 another).

   -ilia


 On Wed, Aug 6, 2014 at 12:02 PM, Marek Olšák mar...@gmail.com wrote:
 On Wed, Aug 6, 2014 at 5:53 PM, Ilia Mirkin imir...@alum.mit.edu wrote:
   pc-MaxAddressRegs = pc-MaxNativeAddressRegs =
  _min(screen-get_shader_param(screen, sh, 
 PIPE_SHADER_CAP_MAX_ADDRS),
   MAX_PROGRAM_ADDRESS_REGS);

 Not really sure what that's referring to... ARB_vp/fp or something?

 Yes, ARB_vp needs 1, ARB_fp doesn't support indirect addresing (expects 0).


 Anyways, this is definitely a bit of a violation of that. OTOH, so is
 the indirect UBO indexing and indirect GS input access (assuming
 that's allowed), since those would use ADDR[1] and every driver
 (except nv30) returns 1, and sometimes 0 -- including
 nv50/nvc0/r600/radeonsi.

 So... dunno what the proper way to proceed is. Fix drivers to claim
 higher numbers? Continue the tradition of ignoring it and relying on
 the fact that GPU's that don't support it also won't support the
 features that cause it to get used?

 You don't have to worry about that for now. We can clean it up later.

 Marek
 ___
 mesa-dev mailing list
 mesa-dev@lists.freedesktop.org
 https://urldefense.proofpoint.com/v1/url?u=http://lists.freedesktop.org/mailman/listinfo/mesa-devk=oIvRg1%2BdGAgOoM1BIlLLqw%3D%3D%0Ar=F4msKE2WxRzA%2BwN%2B25muztFm5TSPwE8HKJfWfR2NgfY%3D%0Am=Xj1ezgC9gYJQCVun05Qsf2AVkWdMTWcufiVpM6QH7Mk%3D%0As=03cd8d1ec34661de07f9c6aa8d6efef3163477a18a55dc4a89d2eaa7d73d16f5


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] mesa/st: add support for dynamic sampler offsets

2014-08-09 Thread Roland Scheidegger
Am 09.08.2014 16:33, schrieb Ilia Mirkin:
 On Sat, Aug 9, 2014 at 10:14 AM, Roland Scheidegger srol...@vmware.com 
 wrote:
 On closer look, it looks to me like it wouldn't be all that difficult to
 make proper sampler array dcls (as you apparently get them quite easily
 
 If you can briefly outline how you think that should be done, I'd be
 happy to try to do it. Right now the samplers are declared in
 st_translate_program, which only has the program-samplers_used
 bitfield (where program is glsl_to_tgsi_visitor).
 
 One quick thing that occurs to me is to keep track of samplers in a
 map of index - array size (0 if it's not accessed indirectly or not
 an array), and then using the map to create the declarations. Does
 that seem reasonable?
 
Well I didn't think about it that much but nothing is stopping you from
putting whatever you need into the glsl_to_tgsi_visitor to keep that
array information instead of just filling out samplers_used.

 Either way, this seems like a nice-to-have rather than required, so
 I'm going to push this patch even if the other one isn't ready.
Agreed. The address thingy is also something which should go away as
discussed but really separate issue too.
btw you get the actual sampler type from the constant in the sampler reg
right? So if you have
4: TEX TEMP[1], TEMP[1], SAMP[ADDR[2].x+1], 2D
that +1 in there says the 2nd declared sampler determines what kind of
sampler this is (2d, 3d, shadow...).

Roland


 
 from glsl) which is really my only problem with it but it's not really
 all that important so
 Reviewed-by: Roland Scheidegger srol...@vmware.com
 
 Thanks!
 

 Am 09.08.2014 07:52, schrieb Ilia Mirkin:
 So... can I get a review on this? It's the last bit needed for ARB_gs5
 (well, except for the actual setting of the extension bit to 1)

 The only additional change I have in my local version is that instead of

 +  inst-sampler_array_size =
 + ir-sampler-as_dereference_array()
 +-array-variable_referenced()-type-length;

 I use ir-sampler-as_dereference_array()-array-type-array_size()

 I sent out a few piglits and they pass with the nvc0 driver on both
 fermi and kepler hw (which do textures a little differently from one
 another).

   -ilia


 On Wed, Aug 6, 2014 at 12:02 PM, Marek Olšák mar...@gmail.com wrote:
 On Wed, Aug 6, 2014 at 5:53 PM, Ilia Mirkin imir...@alum.mit.edu wrote:
   pc-MaxAddressRegs = pc-MaxNativeAddressRegs =
  _min(screen-get_shader_param(screen, sh, 
 PIPE_SHADER_CAP_MAX_ADDRS),
   MAX_PROGRAM_ADDRESS_REGS);

 Not really sure what that's referring to... ARB_vp/fp or something?

 Yes, ARB_vp needs 1, ARB_fp doesn't support indirect addresing (expects 0).


 Anyways, this is definitely a bit of a violation of that. OTOH, so is
 the indirect UBO indexing and indirect GS input access (assuming
 that's allowed), since those would use ADDR[1] and every driver
 (except nv30) returns 1, and sometimes 0 -- including
 nv50/nvc0/r600/radeonsi.

 So... dunno what the proper way to proceed is. Fix drivers to claim
 higher numbers? Continue the tradition of ignoring it and relying on
 the fact that GPU's that don't support it also won't support the
 features that cause it to get used?

 You don't have to worry about that for now. We can clean it up later.

 Marek
 ___
 mesa-dev mailing list
 mesa-dev@lists.freedesktop.org
 https://urldefense.proofpoint.com/v1/url?u=http://lists.freedesktop.org/mailman/listinfo/mesa-devk=oIvRg1%2BdGAgOoM1BIlLLqw%3D%3D%0Ar=F4msKE2WxRzA%2BwN%2B25muztFm5TSPwE8HKJfWfR2NgfY%3D%0Am=Xj1ezgC9gYJQCVun05Qsf2AVkWdMTWcufiVpM6QH7Mk%3D%0As=03cd8d1ec34661de07f9c6aa8d6efef3163477a18a55dc4a89d2eaa7d73d16f5



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] mesa/st: add support for dynamic sampler offsets

2014-08-09 Thread Ilia Mirkin
On Sat, Aug 9, 2014 at 11:12 AM, Roland Scheidegger srol...@vmware.com wrote:
 Am 09.08.2014 16:33, schrieb Ilia Mirkin:
 On Sat, Aug 9, 2014 at 10:14 AM, Roland Scheidegger srol...@vmware.com 
 wrote:
 On closer look, it looks to me like it wouldn't be all that difficult to
 make proper sampler array dcls (as you apparently get them quite easily

 If you can briefly outline how you think that should be done, I'd be
 happy to try to do it. Right now the samplers are declared in
 st_translate_program, which only has the program-samplers_used
 bitfield (where program is glsl_to_tgsi_visitor).

 One quick thing that occurs to me is to keep track of samplers in a
 map of index - array size (0 if it's not accessed indirectly or not
 an array), and then using the map to create the declarations. Does
 that seem reasonable?

 Well I didn't think about it that much but nothing is stopping you from
 putting whatever you need into the glsl_to_tgsi_visitor to keep that
 array information instead of just filling out samplers_used.

 Either way, this seems like a nice-to-have rather than required, so
 I'm going to push this patch even if the other one isn't ready.
 Agreed. The address thingy is also something which should go away as
 discussed but really separate issue too.
 btw you get the actual sampler type from the constant in the sampler reg
 right? So if you have
 4: TEX TEMP[1], TEMP[1], SAMP[ADDR[2].x+1], 2D
 that +1 in there says the 2nd declared sampler determines what kind of
 sampler this is (2d, 3d, shadow...).

From what I can tell, samplers (in TGSI) have no attached semantics...
they're just an index into an array that you're promised was set up
correctly beforehand. For emitting the instructions, we look at the 2D
from the tex instruction.

BTW, I have a follow-up patch that I'll send shortly which generates

FRAG
PROPERTY FS_COLOR0_WRITES_ALL_CBUFS 1
DCL IN[0], GENERIC[0], PERSPECTIVE
DCL OUT[0], COLOR
DCL SAMP[0]
DCL SAMP[1..3]
DCL CONST[4]
DCL TEMP[0..1], LOCAL
DCL ADDR[0..2]
IMM[0] FLT32 {0., 0., 0., 0.}
  0: MOV TEMP[0].xy, IN[0].xyyy
  1: TEX TEMP[0], TEMP[0], SAMP[0], 2D
  2: MOV TEMP[1].xy, IN[0].xyyy
  3: UARL ADDR[2].x, CONST[4].
  4: TEX TEMP[1], TEMP[1], SAMP[ADDR[2].x+1], 2D
  5: MAD TEMP[0], TEMP[0], IMM[0]., TEMP[1]
  6: MOV OUT[0], TEMP[0]
  7: END

Is that what you had in mind?
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] mesa: Fix glGetActiveAttribute for gl_VertexID when lowered.

2014-08-09 Thread Kenneth Graunke
The lower_vertex_id pass converts uses of the gl_VertexID system value
to the gl_BaseVertex and gl_VertexIDMESA system values.  Since
gl_VertexID is no longer accessed, it would not be considered active.

Of course, it should be, since the shader uses gl_VertexID.

v2: Move the var-name dereference past the var != NULL check.

Signed-off-by: Kenneth Graunke kenn...@whitecape.org
---
 src/mesa/main/shader_query.cpp | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

[15:31] anujp idr: gles3conform-v5-wip is not doing well. 
getactiveattribute_index_more_than_num_attribs test segfaults in 
src/mesa/main/shader_query.cpp:132. You can reproduce it on IVB too.
[15:43] idr aw crap.
[15:43] idr anujp: I guess Kayden's series needs a bit more work. :(

This fixes that test.  I haven't run other tests on v2 of this, but
I don't really think I need to.  I literally just moved the var_name
declaration down a bit.

diff --git a/src/mesa/main/shader_query.cpp b/src/mesa/main/shader_query.cpp
index 4871d09..766ad29 100644
--- a/src/mesa/main/shader_query.cpp
+++ b/src/mesa/main/shader_query.cpp
@@ -93,6 +93,7 @@ is_active_attrib(const ir_variable *var)
* and gl_InstanceID.
*/
   return var-data.location == SYSTEM_VALUE_VERTEX_ID ||
+ var-data.location == SYSTEM_VALUE_VERTEX_ID_ZERO_BASE ||
  var-data.location == SYSTEM_VALUE_INSTANCE_ID;
 
default:
@@ -133,7 +134,18 @@ _mesa_GetActiveAttrib(GLhandleARB program, GLuint 
desired_index,
  continue;
 
   if (current_index == desired_index) {
-_mesa_copy_string(name, maxLength, length, var-name);
+ const char *var_name = var-name;
+
+ /* Since gl_VertexID may be lowered to gl_VertexIDMESA, we need to
+  * consider gl_VertexIDMESA as gl_VertexID for purposes of checking
+  * active attributes.
+  */
+ if (var-data.mode == ir_var_system_value 
+ var-data.location == SYSTEM_VALUE_VERTEX_ID_ZERO_BASE) {
+var_name = gl_VertexID;
+ }
+
+_mesa_copy_string(name, maxLength, length, var_name);
 
 if (size)
*size = (var-type-is_array()) ? var-type-length : 1;
-- 
2.0.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] gbm: Fix gallium build when X11 is in a non-system directory

2014-08-09 Thread Eric Anholt
pipe-loader.h will include Xlib.h when HAVE_PIPE_LOADER_XLIB is set in the
build.
---
 src/gallium/state_trackers/gbm/Makefile.am | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/gallium/state_trackers/gbm/Makefile.am 
b/src/gallium/state_trackers/gbm/Makefile.am
index 4d9f3fe..50995b3 100644
--- a/src/gallium/state_trackers/gbm/Makefile.am
+++ b/src/gallium/state_trackers/gbm/Makefile.am
@@ -25,6 +25,7 @@ include $(top_srcdir)/src/gallium/Automake.inc
 
 AM_CFLAGS = \
$(GALLIUM_CFLAGS) \
+   $(X11_INCLUDES) \
$(VISIBILITY_CFLAGS)
 
 AM_CPPFLAGS = \
-- 
2.0.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] mesa: Fix glGetActiveAttribute for gl_VertexID when lowered.

2014-08-09 Thread Ian Romanick
On 08/09/2014 10:54 AM, Kenneth Graunke wrote:
 The lower_vertex_id pass converts uses of the gl_VertexID system value
 to the gl_BaseVertex and gl_VertexIDMESA system values.  Since
 gl_VertexID is no longer accessed, it would not be considered active.
 
 Of course, it should be, since the shader uses gl_VertexID.
 
 v2: Move the var-name dereference past the var != NULL check.

Right... which I didn't notice before because the var != NULL check is
hidden in is_active_attrib.  D'oh.

Reviewed-by: Ian Romanick ian.d.roman...@intel.com

 Signed-off-by: Kenneth Graunke kenn...@whitecape.org
 ---
  src/mesa/main/shader_query.cpp | 14 +-
  1 file changed, 13 insertions(+), 1 deletion(-)
 
 [15:31] anujp idr: gles3conform-v5-wip is not doing well. 
 getactiveattribute_index_more_than_num_attribs test segfaults in 
 src/mesa/main/shader_query.cpp:132. You can reproduce it on IVB too.
 [15:43] idr aw crap.
 [15:43] idr anujp: I guess Kayden's series needs a bit more work. :(
 
 This fixes that test.  I haven't run other tests on v2 of this, but
 I don't really think I need to.  I literally just moved the var_name
 declaration down a bit.
 
 diff --git a/src/mesa/main/shader_query.cpp b/src/mesa/main/shader_query.cpp
 index 4871d09..766ad29 100644
 --- a/src/mesa/main/shader_query.cpp
 +++ b/src/mesa/main/shader_query.cpp
 @@ -93,6 +93,7 @@ is_active_attrib(const ir_variable *var)
 * and gl_InstanceID.
 */
return var-data.location == SYSTEM_VALUE_VERTEX_ID ||
 + var-data.location == SYSTEM_VALUE_VERTEX_ID_ZERO_BASE ||
   var-data.location == SYSTEM_VALUE_INSTANCE_ID;
  
 default:
 @@ -133,7 +134,18 @@ _mesa_GetActiveAttrib(GLhandleARB program, GLuint 
 desired_index,
   continue;
  
if (current_index == desired_index) {
 -  _mesa_copy_string(name, maxLength, length, var-name);
 + const char *var_name = var-name;
 +
 + /* Since gl_VertexID may be lowered to gl_VertexIDMESA, we need to
 +  * consider gl_VertexIDMESA as gl_VertexID for purposes of checking
 +  * active attributes.
 +  */
 + if (var-data.mode == ir_var_system_value 
 + var-data.location == SYSTEM_VALUE_VERTEX_ID_ZERO_BASE) {
 +var_name = gl_VertexID;
 + }
 +
 +  _mesa_copy_string(name, maxLength, length, var_name);
  
if (size)
   *size = (var-type-is_array()) ? var-type-length : 1;
 

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 0/4] BDW viewport extents + misc

2014-08-09 Thread Ben Widawsky
I realize it hasn't even been a week yet, but my remaining 2 weeks until
my sabbatical have just filled up, so if anyone needs me to rework this,
the sooner you let me know the better.

On Mon, Aug 04, 2014 at 12:24:00PM -0700, Ben Widawsky wrote:
 The patch commit messages and comments within the diffs explain the 
 intricacies
 of viewport extents and clipping. So rather, here is the data for these
 patches. All of the following is for a Broadwell system (which introduced
 viewport extents).
 
 EGYPT PERF
 ==
 No change
 
 WARSOW PERF
 ===
 No change
 

Add xonotic and trex to this list of no change.

 piglit
 ==
 viewport extents only:
 spec/ARB_viewport_array/render-scissor/Render multi-viewport scissor test: 
 fail pass
 spec/glsl-1.30/execution/built-in-functions/vs-max-ivec4-int: fail pass
 spec/ARB_viewport_array/render-scissor/Render multi-scissor rectangles: fail 
 pass
 spec/glsl-1.50/execution/geometry/max-input-components: fail pass
 
 viewport extents + gb clipping:
 spec/ARB_viewport_array/render-scissor/Render multi-viewport scissor test: 
 fail pass
 spec/glsl-1.30/execution/built-in-functions/vs-max-ivec4-int: fail pass
 spec/ARB_viewport_array/render-scissor/Render multi-scissor rectangles: fail 
 pass
 
 all:
 spec/ARB_viewport_array/render-scissor/Render multi-viewport scissor test: 
 fail pass
 spec/glsl-1.30/execution/built-in-functions/vs-max-ivec4-int: fail pass
 spec/ARB_viewport_array/render-scissor/Render multi-scissor rectangles: fail 
 pass
 
 As you can observe, there are no wins found here other than conformance. Given
 our understanding of the hardware, we expect these patches to produce a
 performance improvements for certain applications (specifically those which
 define viewports smaller than the drawing rectangle, but some other caveats
 apply on top of that).
 
 Ben Widawsky (4):
   i965/guardband: Improve comments for guardband clipping
   i965: Viewport extents on GEN8
   i965/guardband: Enable for all viewport dimensions (GEN8+)
   i965/clip: Removing scissor atom
 
  src/mesa/drivers/dri/i965/gen6_clip_state.c | 29 +-
  src/mesa/drivers/dri/i965/gen8_viewport_state.c | 52 
 ++---
  2 files changed, 57 insertions(+), 24 deletions(-)
 
 -- 
 2.0.3
 
 ___
 mesa-dev mailing list
 mesa-dev@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/mesa-dev

-- 
Ben Widawsky, Intel Open Source Technology Center
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/4] i965/guardband: Improve comments for guardband clipping

2014-08-09 Thread Kenneth Graunke
On Monday, August 04, 2014 12:24:01 PM Ben Widawsky wrote:
 While working in this part of the code I had a great deal of trouble
 understanding what it was trying to do, and matching it with the spec.
 (mostly due bad wording in the PRM). To help future people, I've cleaned
 up the wording and provided some ascii art.
 ---
  src/mesa/drivers/dri/i965/gen8_viewport_state.c | 22 ++
  1 file changed, 18 insertions(+), 4 deletions(-)
 
 diff --git a/src/mesa/drivers/dri/i965/gen8_viewport_state.c 
 b/src/mesa/drivers/dri/i965/gen8_viewport_state.c
 index b366246..b5171e0 100644
 --- a/src/mesa/drivers/dri/i965/gen8_viewport_state.c
 +++ b/src/mesa/drivers/dri/i965/gen8_viewport_state.c
 @@ -71,10 +71,24 @@ gen8_upload_sf_clip_viewport(struct brw_context *brw)
 * maximum screen space coordinates of a small object may larger, but 
 we
 * have no way to enforce the object size other than through clipping.
 *
 -   * If you're surprised that we set clip to -gbx to +gbx and it seems 
 like
 -   * we'll end up with 16384 wide, note that for a 8192-wide render 
 target,
 -   * we'll end up with a normal (-1, 1) clip volume that just covers the
 -   * drawable.
 +   * The goal is to create the maximum sized guardband (8K x 8K) with the
 +   * viewport rectangle in the center of the guardband. This looks weird
 +   * because the hardware wants coordinates that are scaled to the 
 viewport
 +   * in NDC. In other words, an 8K x 8K viewport would have [-1,1] for X 
 and Y.
 +   * A 4K viewport would be [-2,2], 2K := [-4,4] etc.
 +   *
 +   * 
 +   * |Guardband |
 +   * |  |
 +   * |  |
 +   * | |viewport  | |
 +   * | |  | |
 +   * | |  | |
 +   * | |__| |
 +   * |  |
 +   * |  |
 +   * |__|
 +   *
 */
const float maximum_guardband_extent = 8192;
float gbx = maximum_guardband_extent / ctx-ViewportArray[i].Width;
 

Reviewed-by: Kenneth Graunke kenn...@whitecape.org

and you can probably add a:
Reviewed-by: Chris Forbes chr...@ijw.co.nz
or at least Acked-by

based on #intel-gfx chatter:

[Friday, August 01, 2014] [07:24:53 PM] bwidawsk  Kayden, chrisf: useful p
atch? http://sprunge.us/SPeA
[Friday, August 01, 2014] [07:25:48 PM] Kaydenlooks reasonable to me
[Friday, August 01, 2014] [07:26:48 PM] chrisfsame here

signature.asc
Description: This is a digitally signed message part.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 4/4] i965/clip: Removing scissor atom

2014-08-09 Thread Kenneth Graunke
On Monday, August 04, 2014 12:24:04 PM Ben Widawsky wrote:
 On GEN8, a change in scissor state does not effect anything for the
 clipper/sf hardware state. The hardware will always do the right thing
 once the viewport extents are programmed. We can therefore remove the
 unecessary state emission.
 
 Ken originally spotted this.

Scissoring state affects the value of ctx-DrawBuffer-_Xmin, _Xmax, _Ymin, and 
_Ymax.  So, _NEW_SCISSORS was actually necessary prior to your patch #2.

Perhaps reword the commit message to be something like this:

Now that we no longer use ctx-DrawBuffer-_Xmin and related fields to program 
the screen-space viewport extents, we don't depend on any scissoring state.  So 
we can drop the _NEW_SCISSOR dependency.

 ---
  src/mesa/drivers/dri/i965/gen8_viewport_state.c | 8 ++--
  1 file changed, 6 insertions(+), 2 deletions(-)
 
 diff --git a/src/mesa/drivers/dri/i965/gen8_viewport_state.c 
 b/src/mesa/drivers/dri/i965/gen8_viewport_state.c
 index 04a4530..d7e06c4 100644
 --- a/src/mesa/drivers/dri/i965/gen8_viewport_state.c
 +++ b/src/mesa/drivers/dri/i965/gen8_viewport_state.c
 @@ -100,13 +100,17 @@ gen8_upload_sf_clip_viewport(struct brw_context *brw)
vp[10] = -gby; /* y-min */
vp[11] =  gby; /* y-max */
  
 -  /* _NEW_SCISSOR | _NEW_VIEWPORT | _NEW_BUFFERS: Screen Space Viewport
 +  /* _NEW_VIEWPORT | _NEW_BUFFERS: Screen Space Viewport
 * The hardware will take the intersection of the drawing rectangle,
 * scissor rectangle, and the viewport extents. We don't need to be
 * smart, and can therefore just program the viewport extents.
 */
float viewport_Xmax = ctx-ViewportArray[i].X + 
 ctx-ViewportArray[i].Width;
float viewport_Ymax = ctx-ViewportArray[i].Y + 
 ctx-ViewportArray[i].Height;
 +  if (viewport_Ymax  ctx-DrawBuffer-_Ymax ||
 +viewport_Xmax  ctx-DrawBuffer-_Xmax) {
 +perf_debug(Using viewport extents for savings\n);
 +  }

I suspect you didn't mean to add this :)  Please drop it, as using 
ctx-DrawBuffer-_Xmax actually introduces a dependency on _NEW_SCISSOR again. 
:)

With those fixed, this would get a:
Reviewed-by: Kenneth Graunke kenn...@whitecape.org

if (render_to_fbo) {
   vp[12] = ctx-ViewportArray[i].X;
   vp[13] = viewport_Xmax - 1;
 @@ -130,7 +134,7 @@ gen8_upload_sf_clip_viewport(struct brw_context *brw)
  
  const struct brw_tracked_state gen8_sf_clip_viewport = {
 .dirty = {
 -  .mesa = _NEW_BUFFERS | _NEW_SCISSOR | _NEW_VIEWPORT,
 +  .mesa = _NEW_BUFFERS | _NEW_VIEWPORT,
.brw = BRW_NEW_BATCH,
.cache = 0,
 },
 


signature.asc
Description: This is a digitally signed message part.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 3/4] i965/guardband: Enable for all viewport dimensions (GEN8+)

2014-08-09 Thread Kenneth Graunke
On Monday, August 04, 2014 12:24:03 PM Ben Widawsky wrote:
 The goal of guardband clipping is to try to avoid 3d clipping because it
 is an expensive operation. When guardband clipping is disabled, all
 geometry that intersects the viewport is to the FF 3d clipper. Objects

 ^^^ is sent to?

 which are entirely enclosed within the viewport are said to be
 trivially accepted while those entirely outside of the viewport are,
 trivially rejected.
 
 When guardband clipping is turned on the above behavior is change such

   is changed or changes ^^^

 that if the geometry is within the guardband, and intersects the
 viewport, it skips the 3d clipper. Prior to GEN8, this was problematic
 if the viewport was smaller than the screen as it could allow for
 rendering to occur outside of the viewport. That could be mitigated if
 the programmer specified a scissor region which was less than or equal
 to the viewport - but this is not required for correctness in OpenGL. In
 theory you could be clever with the guardband so as not to invoke this
 problem. We do not do this, and have no data that suggests we should
 bother (nor the converse data).
 
 With viewport extents in place on GEN8, it should be safe to turn on
 guardband clipping for all cases
 
 While here, add a comment to the code which confused me thoroughly.
 ---
  src/mesa/drivers/dri/i965/gen6_clip_state.c | 29 
 +++--
  1 file changed, 19 insertions(+), 10 deletions(-)
 
 diff --git a/src/mesa/drivers/dri/i965/gen6_clip_state.c 
 b/src/mesa/drivers/dri/i965/gen6_clip_state.c
 index 52027e0..42e78a1 100644
 --- a/src/mesa/drivers/dri/i965/gen6_clip_state.c
 +++ b/src/mesa/drivers/dri/i965/gen6_clip_state.c
 @@ -97,17 +97,26 @@ upload_clip_state(struct brw_context *brw)
 GEN6_USER_CLIP_CLIP_DISTANCES_SHIFT);
  
 dw2 |= GEN6_CLIP_GB_TEST;
 -   for (unsigned i = 0; i  ctx-Const.MaxViewports; i++) {
 -  if (ctx-ViewportArray[i].X != 0 ||
 -  ctx-ViewportArray[i].Y != 0 ||
 -  ctx-ViewportArray[i].Width != (float) fb-Width ||
 -  ctx-ViewportArray[i].Height != (float) fb-Height) {
 - dw2 = ~GEN6_CLIP_GB_TEST;
 - if (brw-gen = 8) {
 -perf_debug(Disabling GB clipping due to lack of Gen8 viewport 
 -   clipping setup code.  This should be fixed.\n);
 +   /* If the viewport dimensions  screen dimensions we have to disable

screen dimensions sounds like 1920x1080 (etc) to me.  How about drawable 
dimensions?

 +* guardband clipping. This is because as the code works today, the
 +* guardband rectangle is set to a fixed size. If that size is larger than
 +* the viewport (likely) Guardband clipping can potentially make geometry
 +* bypass the 3d clipping phase (geometry which intersects the viewport).
 +* This will possibly cause rendering to occur outside of the viewport, 
 but
 +* inside the screen.
 +*
 +* On GEN8 we get a useful viewport extents test which allows us to
 +* ignore this entirely.
 +*/

The comment seems a bit choppy to me.  How about something like:

/* If the viewport dimensions are smaller than the drawable dimensions,
 * we have to disable guardband clipping prior to Gen8.  We always program
 * the guardband to a fixed size, which is almost always larger than the
 * viewport.  Any geometry which intersects the viewport but lies within
 * the guardband would bypass the 3D clipping stage, so it wouldn't be
 * clipped to the viewport.  Rendering would happen beyond the viewport,
 * but still inside the drawable.
 *
 * Gen8+ introduces a viewport extents test which restricts rendering to
 * the viewport, so we can ignore this restriction.
 */

Regardless of whether you take my suggestion, this is:
Reviewed-by: Kenneth Graunke kenn...@whitecape.org

 +   if (brw-gen  8) {
 +  for (unsigned i = 0; i  ctx-Const.MaxViewports; i++) {
 + if (ctx-ViewportArray[i].X != 0 ||
 + ctx-ViewportArray[i].Y != 0 ||
 + ctx-ViewportArray[i].Width != (float) fb-Width ||
 + ctx-ViewportArray[i].Height != (float) fb-Height) {
 +dw2 = ~GEN6_CLIP_GB_TEST;
 +break;
   }
 - break;
}
 }
  
 


signature.asc
Description: This is a digitally signed message part.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/4] i965: Viewport extents on GEN8

2014-08-09 Thread Kenneth Graunke
On Monday, August 04, 2014 12:24:02 PM Ben Widawsky wrote:
 Viewport extents are a 3rd rectangle that defines which pixels get
 discarded as part of the rasterization process. This can potentially
 improve performance by reducing cache usage, and freeing up PS cycles.

I'm not sure about cache effects...pretty sure it doesn't save PS cycles.

 It also permits the use of guardband clipping in all cases (see later
 patch). The actual pixels drawn to the screen are an intersection of the
 drawing rectangle, the viewport extents, and the scissor rectangle.
 
 Scissor rectangle is not super important this discussion as it should
 ^^^ to/for?
 always help do the right thing provided the programmer uses it.
 
 switch (viewport dimensions, drawrect dimension) {
case viewport  drawing rectangle: no effects; break;
case viewport == drawing rectangle: no effects; break;
case viewport  drawing rectangle:
   Pixels (after the viewport transformation but before expensive
   rastersizing and shading operations) which are outside of the
   viewport are discarded.

As we discussed, the 3D clipper normally gets involved and trims off any 
geometry outside of the viewport, but within the drawing rectangle.  So, 
expensive pixel shading operations would not happen regardless.

I think the main point of this patch is your earlier comment:
The actual pixels drawn to the screen are an intersection of the
 drawing rectangle, the viewport extents, and the scissor rectangle.

The previous code programmed the viewport extents to the intersection of the 
viewport, drawing rectangle, and scissor rectangle.  This is unnecessary, 
because the hardware does that intersection for us.  So we should simply 
program it to the viewport.

Also, please do change the title to include some sort of verb.  For example,

   i965: Simplify Gen8+ viewport extents programming.

Commit messages aside!  Your code looks good, and is:
Reviewed-by: Kenneth Graunke kenn...@whitecape.org

Thank you for doing this!

 }
 
 I am unable to find a test case where this improves performance, but in
 all my testing it doesn't hurt performance, and intuitively, it should
 not ever hurt performance. It also permits us to use the guardband more
 freely (see upcoming patch).
 
 v2: Updating commit message.
 ---
  src/mesa/drivers/dri/i965/gen8_viewport_state.c | 24 +++-
  1 file changed, 15 insertions(+), 9 deletions(-)
 
 diff --git a/src/mesa/drivers/dri/i965/gen8_viewport_state.c 
 b/src/mesa/drivers/dri/i965/gen8_viewport_state.c
 index b5171e0..04a4530 100644
 --- a/src/mesa/drivers/dri/i965/gen8_viewport_state.c
 +++ b/src/mesa/drivers/dri/i965/gen8_viewport_state.c
 @@ -100,17 +100,23 @@ gen8_upload_sf_clip_viewport(struct brw_context *brw)
vp[10] = -gby; /* y-min */
vp[11] =  gby; /* y-max */
  
 -  /* _NEW_SCISSOR | _NEW_VIEWPORT | _NEW_BUFFERS: Screen Space Viewport 
 */
 +  /* _NEW_SCISSOR | _NEW_VIEWPORT | _NEW_BUFFERS: Screen Space Viewport
 +   * The hardware will take the intersection of the drawing rectangle,
 +   * scissor rectangle, and the viewport extents. We don't need to be
 +   * smart, and can therefore just program the viewport extents.
 +   */
 +  float viewport_Xmax = ctx-ViewportArray[i].X + 
 ctx-ViewportArray[i].Width;
 +  float viewport_Ymax = ctx-ViewportArray[i].Y + 
 ctx-ViewportArray[i].Height;
if (render_to_fbo) {
 - vp[12] = ctx-DrawBuffer-_Xmin;
 - vp[13] = ctx-DrawBuffer-_Xmax - 1;
 - vp[14] = ctx-DrawBuffer-_Ymin;
 - vp[15] = ctx-DrawBuffer-_Ymax - 1;
 + vp[12] = ctx-ViewportArray[i].X;
 + vp[13] = viewport_Xmax - 1;
 + vp[14] = ctx-ViewportArray[i].Y;
 + vp[15] = viewport_Ymax - 1;
} else {
 - vp[12] = ctx-DrawBuffer-_Xmin;
 - vp[13] = ctx-DrawBuffer-_Xmax - 1;
 - vp[14] = ctx-DrawBuffer-Height - ctx-DrawBuffer-_Ymax;
 - vp[15] = ctx-DrawBuffer-Height - ctx-DrawBuffer-_Ymin - 1;
 + vp[12] = ctx-ViewportArray[i].X;
 + vp[13] = viewport_Xmax - 1;
 + vp[14] = ctx-DrawBuffer-Height - viewport_Ymax;
 + vp[15] = ctx-DrawBuffer-Height - ctx-ViewportArray[i].Y - 1;
}
  
vp += 16;
 


signature.asc
Description: This is a digitally signed message part.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/6] r600g: remove useless r600_resource_va calls

2014-08-09 Thread Marek Olšák
On Thu, Aug 7, 2014 at 4:06 PM, Alex Deucher alexdeuc...@gmail.com wrote:
 On Wed, Aug 6, 2014 at 5:49 PM, Marek Olšák mar...@gmail.com wrote:
 From: Marek Olšák marek.ol...@amd.com

 R600-R700 don't support virtual memory.

 For consistency, it might be nice to use gpu_address here as well, but
 just set it to 0 for 6xx/7xx.  Either way, series is:

Well, r600_resource_va isn't used elsewhere in the file. These are
only a few cases that seem to have been copied from the same evergreen
code. I assumed somebody had forgotten to clean them up.

Marek

 Reviewed-by: Alex Deucher alexander.deuc...@amd.com

 ---
  src/gallium/drivers/r600/r600_state.c | 27 +--
  1 file changed, 9 insertions(+), 18 deletions(-)

 diff --git a/src/gallium/drivers/r600/r600_state.c 
 b/src/gallium/drivers/r600/r600_state.c
 index 258ffd1..607b199 100644
 --- a/src/gallium/drivers/r600/r600_state.c
 +++ b/src/gallium/drivers/r600/r600_state.c
 @@ -595,25 +595,22 @@ texture_buffer_sampler_view(struct 
 r600_pipe_sampler_view *view,
 unsigned width0, unsigned height0)

  {
 -   struct pipe_context *ctx = view-base.context;
 struct r600_texture *tmp = (struct r600_texture*)view-base.texture;
 -   uint64_t va;
 int stride = util_format_get_blocksize(view-base.format);
 unsigned format, num_format, format_comp, endian;
 -   unsigned offset = view-base.u.buf.first_element * stride;
 +   uint64_t offset = view-base.u.buf.first_element * stride;
 unsigned size = (view-base.u.buf.last_element - 
 view-base.u.buf.first_element + 1) * stride;

 r600_vertex_data_type(view-base.format,
   format, num_format, format_comp,
   endian);

 -   va = r600_resource_va(ctx-screen, view-base.texture) + offset;
 view-tex_resource = tmp-resource;
 -
 view-skip_mip_address_reloc = true;
 -   view-tex_resource_words[0] = va;
 +
 +   view-tex_resource_words[0] = offset;
 view-tex_resource_words[1] = size - 1;
 -   view-tex_resource_words[2] = S_038008_BASE_ADDRESS_HI(va  32UL) |
 +   view-tex_resource_words[2] = S_038008_BASE_ADDRESS_HI(offset  
 32UL) |
 S_038008_STRIDE(stride) |
 S_038008_DATA_FORMAT(format) |
 S_038008_NUM_FORMAT_ALL(num_format) |
 @@ -1105,8 +1102,7 @@ static void r600_init_depth_surface(struct 
 r600_context *rctx,

 /* use htile only for first level */
 if (rtex-htile_buffer  !level) {
 -   uint64_t va = r600_resource_va(rctx-screen-b.b, 
 rtex-htile_buffer-b.b);
 -   surf-db_htile_data_base = va  8;
 +   surf-db_htile_data_base = 0;
 surf-db_htile_surface = S_028D24_HTILE_WIDTH(1) |
 S_028D24_HTILE_HEIGHT(1) |
 S_028D24_FULL_CACHE(1) |
 @@ -1944,7 +1940,6 @@ static void r600_emit_shader_stages(struct 
 r600_context *rctx, struct r600_atom

  static void r600_emit_gs_rings(struct r600_context *rctx, struct r600_atom 
 *a)
  {
 -   struct pipe_screen *screen = rctx-b.b.screen;
 struct radeon_winsys_cs *cs = rctx-b.rings.gfx.cs;
 struct r600_gs_rings_state *state = (struct r600_gs_rings_state*)a;
 struct r600_resource *rbuffer;
 @@ -1955,8 +1950,7 @@ static void r600_emit_gs_rings(struct r600_context 
 *rctx, struct r600_atom *a)

 if (state-enable) {
 rbuffer =(struct r600_resource*)state-esgs_ring.buffer;
 -   r600_write_config_reg(cs, R_008C40_SQ_ESGS_RING_BASE,
 -   (r600_resource_va(screen, rbuffer-b.b))  
 8);
 +   r600_write_config_reg(cs, R_008C40_SQ_ESGS_RING_BASE, 0);
 radeon_emit(cs, PKT3(PKT3_NOP, 0, 0));
 radeon_emit(cs, r600_context_bo_reloc(rctx-b, 
 rctx-b.rings.gfx, rbuffer,
   RADEON_USAGE_READWRITE,
 @@ -1965,8 +1959,7 @@ static void r600_emit_gs_rings(struct r600_context 
 *rctx, struct r600_atom *a)
 state-esgs_ring.buffer_size  8);

 rbuffer =(struct r600_resource*)state-gsvs_ring.buffer;
 -   r600_write_config_reg(cs, R_008C48_SQ_GSVS_RING_BASE,
 -   (r600_resource_va(screen, rbuffer-b.b))  
 8);
 +   r600_write_config_reg(cs, R_008C48_SQ_GSVS_RING_BASE, 0);
 radeon_emit(cs, PKT3(PKT3_NOP, 0, 0));
 radeon_emit(cs, r600_context_bo_reloc(rctx-b, 
 rctx-b.rings.gfx, rbuffer,
   RADEON_USAGE_READWRITE,
 @@ -2644,8 +2637,7 @@ void r600_update_gs_state(struct pipe_context *ctx, 
 struct r600_pipe_shader *sha
 r600_store_context_reg(cb, R_02887C_SQ_PGM_RESOURCES_GS,
S_02887C_NUM_GPRS(rshader-bc.ngpr) |

Re: [Mesa-dev] [PATCH] r600g/compute: Fix Warnings

2014-08-09 Thread Marek Olšák
Hi Bruno,

Sorry, I fixed the warnings by myself before I saw your patch.

Marek

On Thu, Aug 7, 2014 at 12:07 PM, Bruno Jiménez brunoji...@gmail.com wrote:
 I have followed the following convention:
 - Positions in the pool are now 'int' (start_in_dw and related)
 - Sizes are 'unsigned' (size_in_dw and related)
 - IDs are 'unsigned'

 The pool and item's status are left as uint32_t
 The shadow has been also left as a pointer to an uint32_t
 ---
  src/gallium/drivers/r600/compute_memory_pool.c | 56 
 +-
  src/gallium/drivers/r600/compute_memory_pool.h | 26 ++--
  2 files changed, 41 insertions(+), 41 deletions(-)

 diff --git a/src/gallium/drivers/r600/compute_memory_pool.c 
 b/src/gallium/drivers/r600/compute_memory_pool.c
 index 0ee8ceb..8bcab00 100644
 --- a/src/gallium/drivers/r600/compute_memory_pool.c
 +++ b/src/gallium/drivers/r600/compute_memory_pool.c
 @@ -76,7 +76,7 @@ static void compute_memory_pool_init(struct 
 compute_memory_pool * pool,
 unsigned initial_size_in_dw)
  {

 -   COMPUTE_DBG(pool-screen, * compute_memory_pool_init() 
 initial_size_in_dw = %ld\n,
 +   COMPUTE_DBG(pool-screen, * compute_memory_pool_init() 
 initial_size_in_dw = %u\n,
 initial_size_in_dw);

 pool-size_in_dw = initial_size_in_dw;
 @@ -104,9 +104,9 @@ void compute_memory_pool_delete(struct 
 compute_memory_pool* pool)
   * \param size_in_dw   The size of the space we are looking for.
   * \return -1 on failure
   */
 -int64_t compute_memory_prealloc_chunk(
 +int compute_memory_prealloc_chunk(
 struct compute_memory_pool* pool,
 -   int64_t size_in_dw)
 +   unsigned size_in_dw)
  {
 struct compute_memory_item *item;

 @@ -114,7 +114,7 @@ int64_t compute_memory_prealloc_chunk(

 assert(size_in_dw = pool-size_in_dw);

 -   COMPUTE_DBG(pool-screen, * compute_memory_prealloc_chunk() 
 size_in_dw = %ld\n,
 +   COMPUTE_DBG(pool-screen, * compute_memory_prealloc_chunk() 
 size_in_dw = %u\n,
 size_in_dw);

 LIST_FOR_EACH_ENTRY(item, pool-item_list, link) {
 @@ -139,13 +139,13 @@ int64_t compute_memory_prealloc_chunk(
   */
  struct list_head *compute_memory_postalloc_chunk(
 struct compute_memory_pool* pool,
 -   int64_t start_in_dw)
 +   int start_in_dw)
  {
 struct compute_memory_item *item;
 struct compute_memory_item *next;
 struct list_head *next_link;

 -   COMPUTE_DBG(pool-screen, * compute_memory_postalloc_chunck() 
 start_in_dw = %ld\n,
 +   COMPUTE_DBG(pool-screen, * compute_memory_postalloc_chunck() 
 start_in_dw = %i\n,
 start_in_dw);

 /* Check if we can insert it in the front of the list */
 @@ -181,12 +181,12 @@ struct list_head *compute_memory_postalloc_chunk(
   * \see compute_memory_finalize_pending
   */
  int compute_memory_grow_defrag_pool(struct compute_memory_pool *pool,
 -   struct pipe_context *pipe, int new_size_in_dw)
 +   struct pipe_context *pipe, unsigned new_size_in_dw)
  {
 new_size_in_dw = align(new_size_in_dw, ITEM_ALIGNMENT);

 COMPUTE_DBG(pool-screen, * compute_memory_grow_defrag_pool() 
 -   new_size_in_dw = %d (%d bytes)\n,
 +   new_size_in_dw = %u (%u bytes)\n,
 new_size_in_dw, new_size_in_dw * 4);

 assert(new_size_in_dw = pool-size_in_dw);
 @@ -274,17 +274,17 @@ int compute_memory_finalize_pending(struct 
 compute_memory_pool* pool,
  {
 struct compute_memory_item *item, *next;

 -   int64_t allocated = 0;
 -   int64_t unallocated = 0;
 -   int64_t last_pos;
 +   unsigned allocated = 0;
 +   unsigned unallocated = 0;
 +   int last_pos;

 int err = 0;

 COMPUTE_DBG(pool-screen, * compute_memory_finalize_pending()\n);

 LIST_FOR_EACH_ENTRY(item, pool-item_list, link) {
 -   COMPUTE_DBG(pool-screen,   + list: offset = %i id = %i size 
 = %i 
 -   (%i bytes)\n,item-start_in_dw, item-id,
 +   COMPUTE_DBG(pool-screen,   + list: offset = %i id = %u size 
 = %u 
 +   (%u bytes)\n, item-start_in_dw, item-id,
 item-size_in_dw, item-size_in_dw * 4);
 }

 @@ -347,7 +347,7 @@ void compute_memory_defrag(struct compute_memory_pool 
 *pool,
 struct pipe_context *pipe)
  {
 struct compute_memory_item *item;
 -   int64_t last_pos;
 +   int last_pos;

 COMPUTE_DBG(pool-screen, * compute_memory_defrag()\n);

 @@ -374,7 +374,7 @@ void compute_memory_defrag(struct compute_memory_pool 
 *pool,
   */
  int compute_memory_promote_item(struct compute_memory_pool *pool,
 struct compute_memory_item *item, struct pipe_context *pipe,
 -   int64_t start_in_dw)
 +   int start_in_dw)
  {
 struct pipe_screen *screen = (struct pipe_screen *)pool-screen;
 struct r600_context *rctx = (struct r600_context 

[Mesa-dev] [PATCH] radeonsi: simplify constant buffer upload for big endian

2014-08-09 Thread Marek Olšák
From: Marek Olšák marek.ol...@amd.com

Point util_memcpy_cpu_to_le32 to a buffer storage directly.
---
 src/gallium/drivers/radeonsi/si_descriptors.c | 15 +++
 1 file changed, 3 insertions(+), 12 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_descriptors.c 
b/src/gallium/drivers/radeonsi/si_descriptors.c
index 81ad14b..bfd2b76 100644
--- a/src/gallium/drivers/radeonsi/si_descriptors.c
+++ b/src/gallium/drivers/radeonsi/si_descriptors.c
@@ -649,20 +649,11 @@ void si_upload_const_buffer(struct si_context *sctx, 
struct r600_resource **rbuf
const uint8_t *ptr, unsigned size, uint32_t 
*const_offset)
 {
if (SI_BIG_ENDIAN) {
-   uint32_t *tmpPtr;
-   unsigned i;
-
-   if (!(tmpPtr = malloc(size))) {
-   R600_ERR(Failed to allocate BE swap buffer.\n);
-   return;
-   }
+   void *tmpPtr;
 
+   u_upload_alloc(sctx-b.uploader, 0, size, const_offset,
+  (struct pipe_resource**)rbuffer, tmpPtr);
util_memcpy_cpu_to_le32(tmpPtr, ptr, size);
-
-   u_upload_data(sctx-b.uploader, 0, size, tmpPtr, const_offset,
-   (struct pipe_resource**)rbuffer);
-
-   free(tmpPtr);
} else {
u_upload_data(sctx-b.uploader, 0, size, ptr, const_offset,
(struct pipe_resource**)rbuffer);
-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 0/4] BDW viewport extents + misc

2014-08-09 Thread Ben Widawsky
On Sat, Aug 09, 2014 at 12:07:58PM -0700, Ben Widawsky wrote:
 I realize it hasn't even been a week yet, but my remaining 2 weeks until
 my sabbatical have just filled up, so if anyone needs me to rework this,
 the sooner you let me know the better.

Hi Ken. Thanks a lot for reviewing it. I meant to incorporate all the
changes you requested. If you don't see one there, please let me know.

I've pushed a rebased branch here:
http://cgit.freedesktop.org/~bwidawsk/mesa/log/?h=bdw-extents

Do you have any idea or comments about the fixed piglit tests? I haven't
actually run the rebased branch yet to re-confirm the results. I was
just curious...

 
 On Mon, Aug 04, 2014 at 12:24:00PM -0700, Ben Widawsky wrote:
  The patch commit messages and comments within the diffs explain the 
  intricacies
  of viewport extents and clipping. So rather, here is the data for these
  patches. All of the following is for a Broadwell system (which introduced
  viewport extents).
  
  EGYPT PERF
  ==
  No change
  
  WARSOW PERF
  ===
  No change
  
 
 Add xonotic and trex to this list of no change.
 
  piglit
  ==
  viewport extents only:
  spec/ARB_viewport_array/render-scissor/Render multi-viewport scissor test: 
  fail pass
  spec/glsl-1.30/execution/built-in-functions/vs-max-ivec4-int: fail pass
  spec/ARB_viewport_array/render-scissor/Render multi-scissor rectangles: 
  fail pass
  spec/glsl-1.50/execution/geometry/max-input-components: fail pass
  
  viewport extents + gb clipping:
  spec/ARB_viewport_array/render-scissor/Render multi-viewport scissor test: 
  fail pass
  spec/glsl-1.30/execution/built-in-functions/vs-max-ivec4-int: fail pass
  spec/ARB_viewport_array/render-scissor/Render multi-scissor rectangles: 
  fail pass
  
  all:
  spec/ARB_viewport_array/render-scissor/Render multi-viewport scissor test: 
  fail pass
  spec/glsl-1.30/execution/built-in-functions/vs-max-ivec4-int: fail pass
  spec/ARB_viewport_array/render-scissor/Render multi-scissor rectangles: 
  fail pass
  
  As you can observe, there are no wins found here other than conformance. 
  Given
  our understanding of the hardware, we expect these patches to produce a
  performance improvements for certain applications (specifically those which
  define viewports smaller than the drawing rectangle, but some other caveats
  apply on top of that).
  
  Ben Widawsky (4):
i965/guardband: Improve comments for guardband clipping
i965: Viewport extents on GEN8
i965/guardband: Enable for all viewport dimensions (GEN8+)
i965/clip: Removing scissor atom
  
   src/mesa/drivers/dri/i965/gen6_clip_state.c | 29 +-
   src/mesa/drivers/dri/i965/gen8_viewport_state.c | 52 
  ++---
   2 files changed, 57 insertions(+), 24 deletions(-)
  
  -- 
  2.0.3
  
  ___
  mesa-dev mailing list
  mesa-dev@lists.freedesktop.org
  http://lists.freedesktop.org/mailman/listinfo/mesa-dev
 
 -- 
 Ben Widawsky, Intel Open Source Technology Center

-- 
Ben Widawsky, Intel Open Source Technology Center
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 00/11] ARB_gpu_shader5 d/u sampler array indexing

2014-08-09 Thread Chris Forbes
The last big piece of ARB_gpu_shader5!

This series adds support for using dynamically uniform expressions
in sampler array indexing, as required by ARB_gpu_shader5. Gen7 is
supported (including Haswell's high samplers). Gen8 support would
be a straightforward addition for someone who has access to the
hardware.

Caveats:

- Haswell sampler state pointer adjustment is not optimal. If we knew
  the possible range of samplers an instruction could reference in the
  generator, we could take the constant fixup path if the sampler array
  did not span a 16-sampler boundary.

- The various shader-based workaround flags are taken from array index 0,
  in the non-const case. This causes wrong behavior if the flags differ
  between array elements. [Ken and I discussed this during planning,
  and this seemed a reasonable initial implementation.]

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 01/11] glsl: Allow dynamically uniform sampler array indexing with 4.0/gs5

2014-08-09 Thread Chris Forbes
Signed-off-by: Chris Forbes chr...@ijw.co.nz
---
 src/glsl/ast_array_index.cpp | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/src/glsl/ast_array_index.cpp b/src/glsl/ast_array_index.cpp
index 50f9987..f8dca80 100644
--- a/src/glsl/ast_array_index.cpp
+++ b/src/glsl/ast_array_index.cpp
@@ -213,6 +213,9 @@ _mesa_ast_array_index_to_hir(void *mem_ctx,
* as using a loop counter as the index to an array of samplers.  If the
* loop in unrolled, the code should compile correctly.  Instead, emit a
* warning.
+   *
+   * In GLSL 4.00 / ARB_gpu_shader5, this requirement is relaxed again to 
allow
+   * indexing with dynamically uniform expressions.
*/
   if (array-type-element_type()-is_sampler()) {
 if (!state-is_version(130, 100)) {
@@ -227,7 +230,7 @@ _mesa_ast_array_index_to_hir(void *mem_ctx,
  expressions will be forbidden in GLSL 1.30 
  and later);
}
-} else {
+} else if (!state-is_version(400, 0)  
!state-ARB_gpu_shader5_enable) {
_mesa_glsl_error(loc, state,
 sampler arrays indexed with non-constant 
 expressions is forbidden in GLSL 1.30 and 
-- 
2.0.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 02/11] mesa: Add a new function for getting the nonconst sampler array index

2014-08-09 Thread Chris Forbes
If the array index is not a constant expression, the existing support
will assume a zero offset (giving us the sampler index of the base of
the array).

For dynamically uniform indexing of sampler arrays, we need both that
and the indexing expression.

Signed-off-by: Chris Forbes chr...@ijw.co.nz
---
 src/mesa/program/sampler.cpp | 11 +++
 src/mesa/program/sampler.h   |  3 +++
 2 files changed, 14 insertions(+)

diff --git a/src/mesa/program/sampler.cpp b/src/mesa/program/sampler.cpp
index e6532be..29a5408 100644
--- a/src/mesa/program/sampler.cpp
+++ b/src/mesa/program/sampler.cpp
@@ -134,3 +134,14 @@ _mesa_get_sampler_uniform_value(class ir_dereference 
*sampler,
return shader_program-UniformStorage[location].sampler[shader].index +
   getname.offset;
 }
+
+
+extern C class ir_rvalue *
+_mesa_get_sampler_array_nonconst_index(class ir_dereference *sampler)
+{
+   ir_dereference_array *deref_arr = sampler-as_dereference_array();
+   if (!deref_arr || deref_arr-array_index-as_constant())
+  return NULL;
+
+   return deref_arr-array_index;
+}
diff --git a/src/mesa/program/sampler.h b/src/mesa/program/sampler.h
index 22467e9..8b7c3b6 100644
--- a/src/mesa/program/sampler.h
+++ b/src/mesa/program/sampler.h
@@ -27,3 +27,6 @@ int
 _mesa_get_sampler_uniform_value(class ir_dereference *sampler,
struct gl_shader_program *shader_program,
const struct gl_program *prog);
+
+class ir_rvalue *
+_mesa_get_sampler_array_nonconst_index(class ir_dereference *sampler);
-- 
2.0.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 05/11] i965: Generalize sampler state pointer mangling for non-const

2014-08-09 Thread Chris Forbes
For now, assume that the addressed sampler can be in any of the
16-sampler banks. If we preserved range information this far, we
could avoid emitting these instructions if the sampler were known
to be contained within one bank.

Signed-off-by: Chris Forbes chr...@ijw.co.nz
---
 src/mesa/drivers/dri/i965/brw_eu_emit.c | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c 
b/src/mesa/drivers/dri/i965/brw_eu_emit.c
index 29230a6..75e14c1 100644
--- a/src/mesa/drivers/dri/i965/brw_eu_emit.c
+++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c
@@ -2304,7 +2304,19 @@ void brw_adjust_sampler_state_pointer(struct brw_compile 
*p,
  brw_imm_ud(16 * (sampler / 16) * sampler_state_size));
   }
} else {
-  /* XXX: Non-const sampler array indexing case */
+  /* Non-const sampler array indexing case */
+  if (brw-gen  8  !brw-is_haswell) {
+ return;
+  }
+
+  struct brw_reg temp = vec1(retype(scratch, BRW_REGISTER_TYPE_UD));
+
+  brw_AND(p, temp, sampler_index, brw_imm_ud(0x0f0));
+  brw_SHL(p, temp, temp, brw_imm_ud(4));
+  brw_ADD(p,
+  get_element_ud(header, 3),
+  get_element_ud(brw_vec8_grf(0, 0), 3),
+  temp);
}
 }
 
-- 
2.0.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 06/11] i965/vec4: Add support for non-const sampler indices in generator

2014-08-09 Thread Chris Forbes
Signed-off-by: Chris Forbes chr...@ijw.co.nz
---
 src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 52 +++-
 1 file changed, 51 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
index d6ad8eb..c209e46 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
@@ -383,7 +383,57 @@ vec4_generator::generate_tex(vec4_instruction *inst,
 
   brw_mark_surface_used(prog_data-base, sampler + 
base_binding_table_index);
} else {
-  /* XXX: Non-constant sampler index. */
+  /* Non-constant sampler index. */
+  /* Note: this clobbers `dst` as a temporary before emitting the send */
+
+  struct brw_reg addr = vec1(retype(brw_address_reg(0), 
BRW_REGISTER_TYPE_UD));
+  struct brw_reg temp = vec1(retype(dst, BRW_REGISTER_TYPE_UD));
+
+  struct brw_reg sampler_reg = vec1(retype(sampler_index, 
BRW_REGISTER_TYPE_UD));
+
+  brw_push_insn_state(p);
+  brw_set_default_mask_control(p, BRW_MASK_DISABLE);
+  brw_set_default_access_mode(p, BRW_ALIGN_1);
+
+  /* Some care required: `sampler` and `temp` may alias:
+   *addr = sampler  0xff
+   *temp = (sampler  8)  0xf00
+   *addr = addr | temp
+   */
+  brw_ADD(p, addr, sampler_reg, brw_imm_ud(base_binding_table_index));
+  brw_SHL(p, temp, sampler_reg, brw_imm_ud(8u));
+  brw_AND(p, temp, temp, brw_imm_ud(0x0f00));
+  brw_AND(p, addr, addr, brw_imm_ud(0x0ff));
+  brw_OR(p, addr, addr, temp);
+
+  /* a0.0 |= descriptor */
+  brw_inst *insn_or = brw_next_insn(p, BRW_OPCODE_OR);
+  brw_set_sampler_message(p, insn_or,
+  0 /* surface */,
+  0 /* sampler */,
+  msg_type,
+  1 /* rlen */,
+  inst-mlen /* mlen */,
+  inst-header_present /* header */,
+  BRW_SAMPLER_SIMD_MODE_SIMD4X2,
+  return_format);
+  brw_inst_set_exec_size(p-brw, insn_or, BRW_EXECUTE_1);
+  brw_inst_set_src1_reg_type(p-brw, insn_or, BRW_REGISTER_TYPE_UD);
+  brw_set_src0(p, insn_or, addr);
+  brw_set_dest(p, insn_or, addr);
+
+
+  /* dst = send(offset, a0.0) */
+  brw_inst *insn_send = brw_next_insn(p, BRW_OPCODE_SEND);
+  brw_set_dest(p, insn_send, dst);
+  brw_set_src0(p, insn_send, src);
+  brw_set_indirect_send_descriptor(p, insn_send, BRW_SFID_SAMPLER, addr);
+
+  brw_pop_insn_state(p);
+
+  /* visitor knows more than we do about the surface limit required,
+   * so has already done marking.
+   */
}
 }
 
-- 
2.0.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 07/11] i965/vec4: Add support for nonconst sampler indexing in VS visitor

2014-08-09 Thread Chris Forbes
---
 src/mesa/drivers/dri/i965/brw_vec4.h   |  2 +-
 src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 53 +++---
 2 files changed, 49 insertions(+), 6 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h 
b/src/mesa/drivers/dri/i965/brw_vec4.h
index 9001286..4fe346c 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.h
+++ b/src/mesa/drivers/dri/i965/brw_vec4.h
@@ -527,7 +527,7 @@ public:
void emit_unpack_half_2x16(dst_reg dst, src_reg src0);
 
uint32_t gather_channel(ir_texture *ir, uint32_t sampler);
-   src_reg emit_mcs_fetch(ir_texture *ir, src_reg coordinate, uint32_t 
sampler);
+   src_reg emit_mcs_fetch(ir_texture *ir, src_reg coordinate, src_reg sampler);
void emit_gen6_gather_wa(uint8_t wa, dst_reg dst);
void swizzle_result(ir_texture *ir, src_reg orig_val, uint32_t sampler);
 
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
index 4760790..eedd862 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
@@ -2299,7 +2299,7 @@ vec4_visitor::visit(ir_call *ir)
 }
 
 src_reg
-vec4_visitor::emit_mcs_fetch(ir_texture *ir, src_reg coordinate, uint32_t 
sampler)
+vec4_visitor::emit_mcs_fetch(ir_texture *ir, src_reg coordinate, src_reg 
sampler)
 {
vec4_instruction *inst = new(mem_ctx) vec4_instruction(this, 
SHADER_OPCODE_TXF_MCS);
inst-base_mrf = 2;
@@ -2307,7 +2307,7 @@ vec4_visitor::emit_mcs_fetch(ir_texture *ir, src_reg 
coordinate, uint32_t sample
inst-dst = dst_reg(this, glsl_type::uvec4_type);
inst-dst.writemask = WRITEMASK_XYZW;
 
-   inst-src[1] = src_reg(sampler);
+   inst-src[1] = sampler;
 
/* parameters are: u, v, r, lod; lod will always be zero due to api 
restrictions */
int param_base = inst-base_mrf;
@@ -2324,12 +2324,55 @@ vec4_visitor::emit_mcs_fetch(ir_texture *ir, src_reg 
coordinate, uint32_t sample
return src_reg(inst-dst);
 }
 
+static bool
+is_high_sampler(struct brw_context *brw, src_reg sampler)
+{
+   if (brw-gen  8  !brw-is_haswell)
+  return false;
+
+   return sampler.file != IMM || sampler.fixed_hw_reg.dw1.ud = 16;
+}
+
 void
 vec4_visitor::visit(ir_texture *ir)
 {
uint32_t sampler =
   _mesa_get_sampler_uniform_value(ir-sampler, shader_prog, prog);
 
+   ir_rvalue *nonconst_sampler_index =
+  _mesa_get_sampler_array_nonconst_index(ir-sampler);
+
+   /* Handle non-constant sampler array indexing */
+   src_reg sampler_reg;
+   if (nonconst_sampler_index) {
+  /* The highest sampler which may be used by this operation is
+   * the last element of the array. Mark it here, because the generator
+   * doesn't have enough information to determine the bound.
+   */
+  uint32_t array_size = ir-sampler-as_dereference_array()
+ -array-type-array_size();
+
+  uint32_t max_used = sampler + array_size - 1;
+  if (ir-op == ir_tg4  brw-gen  8) {
+ max_used += prog_data-base.binding_table.gather_texture_start;
+  } else {
+ max_used += prog_data-base.binding_table.texture_start;
+  }
+
+  brw_mark_surface_used(prog_data-base, max_used);
+
+  /* Emit code to evaluate the actual indexing expression */
+  nonconst_sampler_index-accept(this);
+  dst_reg temp(this, glsl_type::uint_type);
+  emit(ADD(temp, this-result, src_reg(sampler)));
+  sampler_reg = src_reg(temp);
+   } else {
+  /* Single sampler, or constant array index; the indexing expression
+   * is just an immediate.
+   */
+  sampler_reg = src_reg(sampler);
+   }
+
/* When tg4 is used with the degenerate ZERO/ONE swizzles, don't bother
 * emitting anything other than setting up the constant result.
 */
@@ -2397,7 +2440,7 @@ vec4_visitor::visit(ir_texture *ir)
   sample_index_type = ir-lod_info.sample_index-type;
 
   if (brw-gen = 7  key-tex.compressed_multisample_layout_mask  
(1sampler))
- mcs = emit_mcs_fetch(ir, coordinate, sampler);
+ mcs = emit_mcs_fetch(ir, coordinate, sampler_reg);
   else
  mcs = src_reg(0u);
   break;
@@ -2452,14 +2495,14 @@ vec4_visitor::visit(ir_texture *ir)
 */
inst-header_present =
   brw-gen  5 || inst-texture_offset != 0 || ir-op == ir_tg4 ||
-  sampler = 16;
+  is_high_sampler(brw, sampler_reg);
inst-base_mrf = 2;
inst-mlen = inst-header_present + 1; /* always at least one */
inst-dst = dst_reg(this, ir-type);
inst-dst.writemask = WRITEMASK_XYZW;
inst-shadow_compare = ir-shadow_comparitor != NULL;
 
-   inst-src[1] = src_reg(sampler);
+   inst-src[1] = sampler_reg;
 
/* MRF for the first parameter */
int param_base = inst-base_mrf + inst-header_present;
-- 
2.0.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 04/11] i965/vec4: Refactor generate_tex in prep for non-const samplers

2014-08-09 Thread Chris Forbes
Signed-off-by: Chris Forbes chr...@ijw.co.nz
---
 src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 47 +---
 1 file changed, 25 insertions(+), 22 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
index af7f8cc..d6ad8eb 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
@@ -314,11 +314,8 @@ vec4_generator::generate_tex(vec4_instruction *inst,
 
assert(msg_type != -1);
 
-   assert(sampler_index.file == BRW_IMMEDIATE_VALUE);
assert(sampler_index.type == BRW_REGISTER_TYPE_UD);
 
-   uint32_t sampler = sampler_index.dw1.ud;
-
/* Load the message header if present.  If there's a texture offset, we need
 * to set it up explicitly and load the offset bitfield.  Otherwise, we can
 * use an implied move from g0 to the first message register.
@@ -363,25 +360,31 @@ vec4_generator::generate_tex(vec4_instruction *inst,
   break;
}
 
-   uint32_t surface_index = ((inst-opcode == SHADER_OPCODE_TG4 ||
-  inst-opcode == SHADER_OPCODE_TG4_OFFSET)
-  ? prog_data-base.binding_table.gather_texture_start
-  : prog_data-base.binding_table.texture_start) + sampler;
-
-   brw_SAMPLE(p,
- dst,
- inst-base_mrf,
- src,
-  surface_index,
- sampler % 16,
- msg_type,
- 1, /* response length */
- inst-mlen,
- inst-header_present,
- BRW_SAMPLER_SIMD_MODE_SIMD4X2,
- return_format);
-
-   brw_mark_surface_used(prog_data-base, surface_index);
+   uint32_t base_binding_table_index = (inst-opcode == SHADER_OPCODE_TG4 ||
+ inst-opcode == SHADER_OPCODE_TG4_OFFSET)
+ ? prog_data-base.binding_table.gather_texture_start
+ : prog_data-base.binding_table.texture_start;
+
+   if (sampler_index.file == BRW_IMMEDIATE_VALUE) {
+  uint32_t sampler = sampler_index.dw1.ud;
+
+  brw_SAMPLE(p,
+ dst,
+ inst-base_mrf,
+ src,
+ sampler + base_binding_table_index,
+ sampler % 16,
+ msg_type,
+ 1, /* response length */
+ inst-mlen,
+ inst-header_present,
+ BRW_SAMPLER_SIMD_MODE_SIMD4X2,
+ return_format);
+
+  brw_mark_surface_used(prog_data-base, sampler + 
base_binding_table_index);
+   } else {
+  /* XXX: Non-constant sampler index. */
+   }
 }
 
 void
-- 
2.0.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 03/11] i965: Extract helper function for surface state pointer adjustment

2014-08-09 Thread Chris Forbes
Signed-off-by: Chris Forbes chr...@ijw.co.nz
---
 src/mesa/drivers/dri/i965/brw_eu.h   |  5 
 src/mesa/drivers/dri/i965/brw_eu_emit.c  | 35 
 src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 17 +---
 3 files changed, 41 insertions(+), 16 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_eu.h 
b/src/mesa/drivers/dri/i965/brw_eu.h
index 4f2982d..84142ee 100644
--- a/src/mesa/drivers/dri/i965/brw_eu.h
+++ b/src/mesa/drivers/dri/i965/brw_eu.h
@@ -281,6 +281,11 @@ void brw_SAMPLE(struct brw_compile *p,
unsigned simd_mode,
unsigned return_format);
 
+void brw_adjust_sampler_state_pointer(struct brw_compile *p,
+  struct brw_reg header,
+  struct brw_reg sampler_index,
+  struct brw_reg scratch);
+
 void gen4_math(struct brw_compile *p,
   struct brw_reg dest,
   unsigned function,
diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c 
b/src/mesa/drivers/dri/i965/brw_eu_emit.c
index 7ea9313..29230a6 100644
--- a/src/mesa/drivers/dri/i965/brw_eu_emit.c
+++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c
@@ -2273,6 +2273,41 @@ void brw_SAMPLE(struct brw_compile *p,
return_format);
 }
 
+/* Adjust the message header's sampler state pointer to
+ * select the correct group of 16 samplers.
+ */
+void brw_adjust_sampler_state_pointer(struct brw_compile *p,
+  struct brw_reg header,
+  struct brw_reg sampler_index,
+  struct brw_reg scratch)
+{
+   /* The Sampler Index field can only store values between 0 and 15.
+* However, we can add an offset to the Sampler State Pointer
+* field, effectively selecting a different set of 16 samplers.
+*
+* The Sampler State Pointer needs to be aligned to a 32-byte
+* offset, and each sampler state is only 16-bytes, so we can't
+* exclusively use the offset - we have to use both.
+*/
+
+   struct brw_context *brw = p-brw;
+
+   if (sampler_index.file == BRW_IMMEDIATE_VALUE) {
+  const int sampler_state_size = 16; /* 16 bytes */
+  uint32_t sampler = sampler_index.dw1.ud;
+
+  if (sampler = 16) {
+ assert(brw-is_haswell || brw-gen = 8);
+ brw_ADD(p,
+ get_element_ud(header, 3),
+ get_element_ud(brw_vec8_grf(0, 0), 3),
+ brw_imm_ud(16 * (sampler / 16) * sampler_state_size));
+  }
+   } else {
+  /* XXX: Non-const sampler array indexing case */
+   }
+}
+
 /* All these variables are pretty confusing - we might be better off
  * using bitmasks and macros for this, in the old style.  Or perhaps
  * just having the caller instantiate the fields in dword3 itself.
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
index ed6857d..af7f8cc 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
@@ -344,22 +344,7 @@ vec4_generator::generate_tex(vec4_instruction *inst,
 brw_imm_ud(inst-texture_offset));
  }
 
- if (sampler = 16) {
-/* The Sampler Index field can only store values between 0 and 
15.
- * However, we can add an offset to the Sampler State Pointer
- * field, effectively selecting a different set of 16 samplers.
- *
- * The Sampler State Pointer needs to be aligned to a 32-byte
- * offset, and each sampler state is only 16-bytes, so we can't
- * exclusively use the offset - we have to use both.
- */
-const int sampler_state_size = 16; /* 16 bytes */
-assert(brw-gen = 8 || brw-is_haswell);
-brw_ADD(p,
-get_element_ud(header, 3),
-get_element_ud(brw_vec8_grf(0, 0), 3),
-brw_imm_ud(16 * (sampler / 16) * sampler_state_size));
- }
+ brw_adjust_sampler_state_pointer(p, header, sampler_index, dst);
  brw_pop_insn_state(p);
   }
}
-- 
2.0.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 10/11] i965/fs: Add support for non-const sampler indices in generator

2014-08-09 Thread Chris Forbes
Signed-off-by: Chris Forbes chr...@ijw.co.nz
---
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 52 +-
 1 file changed, 51 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
index 54f3287..cb359ba 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
@@ -597,7 +597,57 @@ fs_generator::generate_tex(fs_inst *inst, struct brw_reg 
dst, struct brw_reg src
 
   brw_mark_surface_used(prog_data-base, sampler + 
base_binding_table_index);
} else {
-  /* XXX: Non-const sampler index */
+  /* Non-const sampler index */
+  /* Note: this clobbers `dst` as a temporary before emitting the send */
+
+  struct brw_reg addr = vec1(retype(brw_address_reg(0), 
BRW_REGISTER_TYPE_UD));
+  struct brw_reg temp = vec1(retype(dst, BRW_REGISTER_TYPE_UD));
+
+  struct brw_reg sampler_reg = vec1(retype(sampler_index, 
BRW_REGISTER_TYPE_UD));
+
+  brw_push_insn_state(p);
+  brw_set_default_mask_control(p, BRW_MASK_DISABLE);
+  brw_set_default_access_mode(p, BRW_ALIGN_1);
+
+  /* Some care required: `sampler` and `temp` may alias:
+   *addr = sampler  0xff
+   *temp = (sampler  8)  0xf00
+   *addr = addr | temp
+   */
+  brw_ADD(p, addr, sampler_reg, brw_imm_ud(base_binding_table_index));
+  brw_SHL(p, temp, sampler_reg, brw_imm_ud(8u));
+  brw_AND(p, temp, temp, brw_imm_ud(0x0f00));
+  brw_AND(p, addr, addr, brw_imm_ud(0x0ff));
+  brw_OR(p, addr, addr, temp);
+
+  /* a0.0 |= descriptor */
+  brw_inst *insn_or = brw_next_insn(p, BRW_OPCODE_OR);
+  brw_set_sampler_message(p, insn_or,
+  0 /* surface */,
+  0 /* sampler */,
+  msg_type,
+  rlen,
+  inst-mlen /* mlen */,
+  inst-header_present /* header */,
+  simd_mode,
+  return_format);
+  brw_inst_set_exec_size(p-brw, insn_or, BRW_EXECUTE_1);
+  brw_inst_set_src1_reg_type(p-brw, insn_or, BRW_REGISTER_TYPE_UD);
+  brw_set_src0(p, insn_or, addr);
+  brw_set_dest(p, insn_or, addr);
+
+
+  /* dst = send(offset, a0.0) */
+  brw_inst *insn_send = brw_next_insn(p, BRW_OPCODE_SEND);
+  brw_set_dest(p, insn_send, dst);
+  brw_set_src0(p, insn_send, src);
+  brw_set_indirect_send_descriptor(p, insn_send, BRW_SFID_SAMPLER, addr);
+
+  brw_pop_insn_state(p);
+
+  /* visitor knows more than we do about the surface limit required,
+   * so has already done marking.
+   */
}
 }
 
-- 
2.0.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 11/11] i965/fs: Add support for nonconst sampler indexing in FS visitor

2014-08-09 Thread Chris Forbes
Signed-off-by: Chris Forbes chr...@ijw.co.nz
---
 src/mesa/drivers/dri/i965/brw_fs.h   |  4 +-
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 60 
 2 files changed, 55 insertions(+), 9 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.h 
b/src/mesa/drivers/dri/i965/brw_fs.h
index dfb13ea..105834a 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.h
+++ b/src/mesa/drivers/dri/i965/brw_fs.h
@@ -380,8 +380,8 @@ public:
   fs_reg sample_index, uint32_t sampler);
fs_inst *emit_texture_gen7(ir_texture *ir, fs_reg dst, fs_reg coordinate,
   fs_reg shadow_comp, fs_reg lod, fs_reg lod2,
-  fs_reg sample_index, fs_reg mcs, uint32_t 
sampler);
-   fs_reg emit_mcs_fetch(ir_texture *ir, fs_reg coordinate, uint32_t sampler);
+  fs_reg sample_index, fs_reg mcs, fs_reg sampler);
+   fs_reg emit_mcs_fetch(ir_texture *ir, fs_reg coordinate, fs_reg sampler);
void emit_gen6_gather_wa(uint8_t wa, fs_reg dst);
fs_reg fix_math_operand(fs_reg src);
fs_inst *emit_math(enum opcode op, fs_reg dst, fs_reg src0);
diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
index 0b3a2bb..8644893 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
@@ -1422,10 +1422,19 @@ fs_visitor::emit_texture_gen5(ir_texture *ir, fs_reg 
dst, fs_reg coordinate,
return inst;
 }
 
+static bool
+is_high_sampler(struct brw_context *brw, fs_reg sampler)
+{
+   if (brw-gen  8  !brw-is_haswell)
+  return false;
+
+   return sampler.file != IMM || sampler.fixed_hw_reg.dw1.ud = 16;
+}
+
 fs_inst *
 fs_visitor::emit_texture_gen7(ir_texture *ir, fs_reg dst, fs_reg coordinate,
   fs_reg shadow_c, fs_reg lod, fs_reg lod2,
-  fs_reg sample_index, fs_reg mcs, uint32_t 
sampler)
+  fs_reg sample_index, fs_reg mcs, fs_reg sampler)
 {
int reg_width = dispatch_width / 8;
bool header_present = false;
@@ -1436,7 +1445,8 @@ fs_visitor::emit_texture_gen7(ir_texture *ir, fs_reg dst, 
fs_reg coordinate,
}
int length = 0;
 
-   if (ir-op == ir_tg4 || (ir-offset  ir-op != ir_txf) || sampler = 16) {
+   if (ir-op == ir_tg4 || (ir-offset  ir-op != ir_txf) ||
+   is_high_sampler(brw, sampler)) {
   /* For general texture offsets (no txf workaround), we need a header to
* put them in.  Note that for SIMD16 we're making space for two actual
* hardware registers here, so the emit will have to fix up for this.
@@ -1611,7 +1621,7 @@ fs_visitor::emit_texture_gen7(ir_texture *ir, fs_reg dst, 
fs_reg coordinate,
default:
   unreachable(not reached);
}
-   fs_inst *inst = emit(opcode, dst, src_payload, fs_reg(sampler));
+   fs_inst *inst = emit(opcode, dst, src_payload, sampler);
inst-base_mrf = -1;
if (reg_width == 2)
   inst-mlen = length * reg_width - header_present;
@@ -1744,7 +1754,7 @@ fs_visitor::rescale_texcoord(ir_texture *ir, fs_reg 
coordinate,
 
 /* Sample from the MCS surface attached to this multisample texture. */
 fs_reg
-fs_visitor::emit_mcs_fetch(ir_texture *ir, fs_reg coordinate, uint32_t sampler)
+fs_visitor::emit_mcs_fetch(ir_texture *ir, fs_reg coordinate, fs_reg sampler)
 {
int reg_width = dispatch_width / 8;
int length = ir-coordinate-type-vector_elements;
@@ -1762,7 +1772,7 @@ fs_visitor::emit_mcs_fetch(ir_texture *ir, fs_reg 
coordinate, uint32_t sampler)
 
emit(LOAD_PAYLOAD(payload, sources, length));
 
-   fs_inst *inst = emit(SHADER_OPCODE_TXF_MCS, dest, payload, fs_reg(sampler));
+   fs_inst *inst = emit(SHADER_OPCODE_TXF_MCS, dest, payload, sampler);
inst-base_mrf = -1;
inst-mlen = length * reg_width;
inst-header_present = false;
@@ -1780,6 +1790,42 @@ fs_visitor::visit(ir_texture *ir)
 
uint32_t sampler =
   _mesa_get_sampler_uniform_value(ir-sampler, shader_prog, prog);
+
+   ir_rvalue *nonconst_sampler_index =
+  _mesa_get_sampler_array_nonconst_index(ir-sampler);
+
+   /* Handle non-constant sampler array indexing */
+   fs_reg sampler_reg;
+   if (nonconst_sampler_index) {
+  /* The highest sampler which may be used by this operation is
+   * the last element of the array. Mark it here, because the generator
+   * doesn't have enough information to determine the bound.
+   */
+  uint32_t array_size = ir-sampler-as_dereference_array()
+ -array-type-array_size();
+
+  uint32_t max_used = sampler + array_size - 1;
+  if (ir-op == ir_tg4  brw-gen  8) {
+ max_used += prog_data-base.binding_table.gather_texture_start;
+  } else {
+ max_used += prog_data-base.binding_table.texture_start;
+  }
+
+  brw_mark_surface_used(prog_data-base, max_used);
+
+  /* Emit code to evaluate the actual indexing expression */
+  

[Mesa-dev] [PATCH 08/11] i965/fs: Use brw_adjust_sampler_state_pointer in fs generator too

2014-08-09 Thread Chris Forbes
Signed-off-by: Chris Forbes chr...@ijw.co.nz
---
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 17 +
 1 file changed, 1 insertion(+), 16 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
index 28b3525..a6e653e 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
@@ -572,22 +572,7 @@ fs_generator::generate_tex(fs_inst *inst, struct brw_reg 
dst, struct brw_reg src
brw_imm_ud(inst-texture_offset));
  }
 
- if (sampler = 16) {
-/* The Sampler Index field can only store values between 0 and 
15.
- * However, we can add an offset to the Sampler State Pointer
- * field, effectively selecting a different set of 16 samplers.
- *
- * The Sampler State Pointer needs to be aligned to a 32-byte
- * offset, and each sampler state is only 16-bytes, so we can't
- * exclusively use the offset - we have to use both.
- */
-assert(brw-gen = 8 || brw-is_haswell);
-const int sampler_state_size = 16; /* 16 bytes */
-brw_ADD(p,
-get_element_ud(header_reg, 3),
-get_element_ud(brw_vec8_grf(0, 0), 3),
-brw_imm_ud(16 * (sampler / 16) * sampler_state_size));
- }
+ brw_adjust_sampler_state_pointer(p, header_reg, sampler_index, dst);
  brw_pop_insn_state(p);
   }
}
-- 
2.0.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 09/11] i965/fs: Refactor generate_tex in prep for nonconst sampler indexing

2014-08-09 Thread Chris Forbes
Signed-off-by: Chris Forbes chr...@ijw.co.nz
---
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 47 ++
 1 file changed, 25 insertions(+), 22 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
index a6e653e..54f3287 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
@@ -537,11 +537,8 @@ fs_generator::generate_tex(fs_inst *inst, struct brw_reg 
dst, struct brw_reg src
   src.nr++;
}
 
-   assert(sampler_index.file == BRW_IMMEDIATE_VALUE);
assert(sampler_index.type == BRW_REGISTER_TYPE_UD);
 
-   uint32_t sampler = sampler_index.dw1.ud;
-
/* Load the message header if present.  If there's a texture offset,
 * we need to set it up explicitly and load the offset bitfield.
 * Otherwise, we can use an implied move from g0 to the first message reg.
@@ -577,25 +574,31 @@ fs_generator::generate_tex(fs_inst *inst, struct brw_reg 
dst, struct brw_reg src
   }
}
 
-   uint32_t surface_index = ((inst-opcode == SHADER_OPCODE_TG4 ||
-  inst-opcode == SHADER_OPCODE_TG4_OFFSET)
-  ? prog_data-base.binding_table.gather_texture_start
-  : prog_data-base.binding_table.texture_start) + sampler;
-
-   brw_SAMPLE(p,
- retype(dst, BRW_REGISTER_TYPE_UW),
- inst-base_mrf,
- src,
-  surface_index,
- sampler % 16,
- msg_type,
- rlen,
- inst-mlen,
- inst-header_present,
- simd_mode,
- return_format);
-
-   brw_mark_surface_used(prog_data-base, surface_index);
+   uint32_t base_binding_table_index = (inst-opcode == SHADER_OPCODE_TG4 ||
+ inst-opcode == SHADER_OPCODE_TG4_OFFSET)
+ ? prog_data-base.binding_table.gather_texture_start
+ : prog_data-base.binding_table.texture_start;
+
+   if (sampler_index.file == BRW_IMMEDIATE_VALUE) {
+  uint32_t sampler = sampler_index.dw1.ud;
+
+  brw_SAMPLE(p,
+ retype(dst, BRW_REGISTER_TYPE_UW),
+ inst-base_mrf,
+ src,
+ sampler + base_binding_table_index,
+ sampler % 16,
+ msg_type,
+ rlen,
+ inst-mlen,
+ inst-header_present,
+ simd_mode,
+ return_format);
+
+  brw_mark_surface_used(prog_data-base, sampler + 
base_binding_table_index);
+   } else {
+  /* XXX: Non-const sampler index */
+   }
 }
 
 
-- 
2.0.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 0/4] BDW viewport extents + misc

2014-08-09 Thread Kenneth Graunke
On Saturday, August 09, 2014 06:15:34 PM Ben Widawsky wrote:
 On Sat, Aug 09, 2014 at 12:07:58PM -0700, Ben Widawsky wrote:
  I realize it hasn't even been a week yet, but my remaining 2 weeks until
  my sabbatical have just filled up, so if anyone needs me to rework this,
  the sooner you let me know the better.
 
 Hi Ken. Thanks a lot for reviewing it. I meant to incorporate all the
 changes you requested. If you don't see one there, please let me know.
 
 I've pushed a rebased branch here:
 http://cgit.freedesktop.org/~bwidawsk/mesa/log/?h=bdw-extents

Looks okay to me - I'd go ahead and push these.

 Do you have any idea or comments about the fixed piglit tests? I haven't
 actually run the rebased branch yet to re-confirm the results. I was
 just curious...

Oh, I missed that.  vs-max-ivec4-int sounds bizarre, but the others sound 
believable.  I recall there were a few ARB_viewport_array tests I hadn't 
figured out, and this seems like the right fix.

signature.asc
Description: This is a digitally signed message part.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] mesa/st: enable ARB_gpu_shader5 if the reported GLSL version = 400

2014-08-09 Thread Ilia Mirkin
The ARB_gpu_shader5 extension is made up of a lot of small sub-parts.
Instead of adding PIPE_CAP's for each of these, just rely on the GLSL
version reported by the pipe driver. The remaining extensions lend
themselves naturally to being checked through a single CAP.

Signed-off-by: Ilia Mirkin imir...@alum.mit.edu
---

The only issue will happen when GLSL 4.00 is supported for real and so we
can't rely on the st to clamp it down to 330. (Although perhaps the mesa
version limiting logic would kick in then as well?)

The alternative is to add a ARB_GS5 cap, or a bunch of little ones for the
sub-parts, but that doesn't seem too useful.

 src/mesa/state_tracker/st_extensions.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/mesa/state_tracker/st_extensions.c 
b/src/mesa/state_tracker/st_extensions.c
index 7ac4840..51fd71a 100644
--- a/src/mesa/state_tracker/st_extensions.c
+++ b/src/mesa/state_tracker/st_extensions.c
@@ -826,4 +826,7 @@ void st_init_extensions(struct st_context *st)
}
if (ctx-Const.MaxProgramTextureGatherComponents  0)
   ctx-Extensions.ARB_texture_gather = GL_TRUE;
+
+   if (glsl_feature_level = 400)
+  ctx-Extensions.ARB_gpu_shader5 = GL_TRUE;
 }
-- 
1.8.5.5

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] radeonsi: simplify constant buffer upload for big endian

2014-08-09 Thread Michel Dänzer
On 10.08.2014 06:54, Marek Olšák wrote:
 From: Marek Olšák marek.ol...@amd.com
 
 Point util_memcpy_cpu_to_le32 to a buffer storage directly.
 ---
  src/gallium/drivers/radeonsi/si_descriptors.c | 15 +++
  1 file changed, 3 insertions(+), 12 deletions(-)
 
 diff --git a/src/gallium/drivers/radeonsi/si_descriptors.c 
 b/src/gallium/drivers/radeonsi/si_descriptors.c
 index 81ad14b..bfd2b76 100644
 --- a/src/gallium/drivers/radeonsi/si_descriptors.c
 +++ b/src/gallium/drivers/radeonsi/si_descriptors.c
 @@ -649,20 +649,11 @@ void si_upload_const_buffer(struct si_context *sctx, 
 struct r600_resource **rbuf
   const uint8_t *ptr, unsigned size, uint32_t 
 *const_offset)
  {
   if (SI_BIG_ENDIAN) {
 - uint32_t *tmpPtr;
 - unsigned i;
 -
 - if (!(tmpPtr = malloc(size))) {
 - R600_ERR(Failed to allocate BE swap buffer.\n);
 - return;
 - }
 + void *tmpPtr;
  
 + u_upload_alloc(sctx-b.uploader, 0, size, const_offset,
 +(struct pipe_resource**)rbuffer, tmpPtr);
   util_memcpy_cpu_to_le32(tmpPtr, ptr, size);
 -
 - u_upload_data(sctx-b.uploader, 0, size, tmpPtr, const_offset,
 - (struct pipe_resource**)rbuffer);
 -
 - free(tmpPtr);
   } else {
   u_upload_data(sctx-b.uploader, 0, size, ptr, const_offset,
   (struct pipe_resource**)rbuffer);
 

Since util_memcpy_cpu_to_le32() is memcpy() on little endian, is there
any point keeping the separate (non-)SI_BIG_ENDIAN paths here? Either way,

Reviewed-by: Michel Dänzer michel.daen...@amd.com


-- 
Earthling Michel Dänzer|  http://www.amd.com
Libre software enthusiast  |Mesa and X developer
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 3/4] docs: Mark off ARB_gpu_shader5 for i965

2014-08-09 Thread Chris Forbes
Signed-off-by: Chris Forbes chr...@ijw.co.nz
---
 docs/GL3.txt | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/docs/GL3.txt b/docs/GL3.txt
index 65facf5..29535ec 100644
--- a/docs/GL3.txt
+++ b/docs/GL3.txt
@@ -94,18 +94,18 @@ GL 4.0, GLSL 4.00:
 
   GL_ARB_draw_buffers_blendDONE (i965, nv50, nvc0, 
r600, radeonsi, softpipe)
   GL_ARB_draw_indirect DONE (i965, nvc0, 
radeonsi, softpipe, llvmpipe)
-  GL_ARB_gpu_shader5   started
+  GL_ARB_gpu_shader5   DONE (i965)
   - 'precise' qualifierDONE
-  - Dynamically uniform sampler array indices  started (Chris)
-  - Dynamically uniform UBO array indices  DONE (i965)
+  - Dynamically uniform sampler array indices  DONE ()
+  - Dynamically uniform UBO array indices  DONE ()
   - Implicit signed - unsigned conversionsDONE
-  - Fused multiply-add DONE (i965, nvc0)
-  - Packing/bitfield/conversion functions  DONE (i965, nvc0, r600)
-  - Enhanced textureGather DONE (i965, nvc0, r600, 
radeonsi)
-  - Geometry shader instancing DONE (i965, nvc0)
-  - Geometry shader multiple streams   DONE (i965, nvc0)
-  - Enhanced per-sample shadingDONE (i965, r600)
-  - Interpolation functionsDONE (i965)
+  - Fused multiply-add DONE (nvc0)
+  - Packing/bitfield/conversion functions  DONE (nvc0, r600)
+  - Enhanced textureGather DONE (nvc0, r600, 
radeonsi)
+  - Geometry shader instancing DONE (nvc0)
+  - Geometry shader multiple streams   DONE (nvc0)
+  - Enhanced per-sample shadingDONE (r600)
+  - Interpolation functionsDONE ()
   - New overload resolution rules  DONE
   GL_ARB_gpu_shader_fp64   started (Dave)
   GL_ARB_sample_shadingDONE (i965, nv50, nvc0, 
radeonsi)
-- 
2.0.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 01/11] i965/eu: Change gen == 7 to gen = 7 in a couple brw_eu_emit.c cases.

2014-08-09 Thread Kenneth Graunke
Broadwell is going to use the brw_eu_emit.c code soon.  We want to get
the fake MRF handling and URB HWord channel mask handling.

We don't need the CMP thread switch workaround, though.

Signed-off-by: Kenneth Graunke kenn...@whitecape.org
---
 src/mesa/drivers/dri/i965/brw_eu_emit.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c 
b/src/mesa/drivers/dri/i965/brw_eu_emit.c
index e894abd..038bfc6 100644
--- a/src/mesa/drivers/dri/i965/brw_eu_emit.c
+++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c
@@ -97,7 +97,7 @@ gen7_convert_mrf_to_grf(struct brw_compile *p, struct brw_reg 
*reg)
 * registers required for messages with EOT.
 */
struct brw_context *brw = p-brw;
-   if (brw-gen == 7  reg-file == BRW_MESSAGE_REGISTER_FILE) {
+   if (brw-gen = 7  reg-file == BRW_MESSAGE_REGISTER_FILE) {
   reg-file = BRW_GENERAL_REGISTER_FILE;
   reg-nr += GEN7_MRF_HACK_START;
}
@@ -2265,7 +2265,7 @@ void brw_urb_WRITE(struct brw_compile *p,
 
gen6_resolve_implied_move(p, src0, msg_reg_nr);
 
-   if (brw-gen == 7  !(flags  BRW_URB_WRITE_USE_CHANNEL_MASKS)) {
+   if (brw-gen = 7  !(flags  BRW_URB_WRITE_USE_CHANNEL_MASKS)) {
   /* Enable Channel Masks in the URB_WRITE_HWORD message header */
   brw_push_insn_state(p);
   brw_set_default_access_mode(p, BRW_ALIGN_1);
-- 
2.0.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 11/11] i965/eu: Allow math on immediates on Broadwell.

2014-08-09 Thread Kenneth Graunke
Signed-off-by: Kenneth Graunke kenn...@whitecape.org
---
 src/mesa/drivers/dri/i965/brw_eu_emit.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c 
b/src/mesa/drivers/dri/i965/brw_eu_emit.c
index 0a66d0c..9d7b4a5 100644
--- a/src/mesa/drivers/dri/i965/brw_eu_emit.c
+++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c
@@ -1886,7 +1886,8 @@ void gen6_math(struct brw_compile *p,
 
assert(dest.file == BRW_GENERAL_REGISTER_FILE ||
   (brw-gen = 7  dest.file == BRW_MESSAGE_REGISTER_FILE));
-   assert(src0.file == BRW_GENERAL_REGISTER_FILE);
+   assert(src0.file == BRW_GENERAL_REGISTER_FILE ||
+  (brw-gen = 8  src0.file == BRW_IMMEDIATE_VALUE));
 
assert(dest.hstride == BRW_HORIZONTAL_STRIDE_1);
if (brw-gen == 6) {
@@ -1899,12 +1900,14 @@ void gen6_math(struct brw_compile *p,
function == BRW_MATH_FUNCTION_INT_DIV_QUOTIENT_AND_REMAINDER) {
   assert(src0.type != BRW_REGISTER_TYPE_F);
   assert(src1.type != BRW_REGISTER_TYPE_F);
-  assert(src1.file == BRW_GENERAL_REGISTER_FILE);
+  assert(src1.file == BRW_GENERAL_REGISTER_FILE ||
+ (brw-gen = 8  src1.file == BRW_IMMEDIATE_VALUE));
} else {
   assert(src0.type == BRW_REGISTER_TYPE_F);
   assert(src1.type == BRW_REGISTER_TYPE_F);
   if (function == BRW_MATH_FUNCTION_POW) {
- assert(src1.file == BRW_GENERAL_REGISTER_FILE);
+ assert(src1.file == BRW_GENERAL_REGISTER_FILE ||
+(brw-gen = 8  src1.file == BRW_IMMEDIATE_VALUE));
   } else {
  assert(src1.file == BRW_ARCHITECTURE_REGISTER_FILE 
 src1.nr == BRW_ARF_NULL);
-- 
2.0.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 03/11] i965/eu: Emulate F32TO16 and F16TO32 on Broadwell.

2014-08-09 Thread Kenneth Graunke
When we combine the Gen4-7 and Gen8+ generators, we'll need to handle
half float packing/unpacking functions somehow.  The Gen8+ generator
code today just emulates the behavior of the Gen7 F32TO16/F16TO32
instructions, including the align16 mode bugs.

Rather than messing with fs_generator/vec4_generator, I decided to just
emulate the instructions at the brw_eu_emit.c layer.

Signed-off-by: Kenneth Graunke kenn...@whitecape.org
---
 src/mesa/drivers/dri/i965/brw_eu_emit.c | 47 +++--
 1 file changed, 45 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c 
b/src/mesa/drivers/dri/i965/brw_eu_emit.c
index 822c73c..5e14ba5 100644
--- a/src/mesa/drivers/dri/i965/brw_eu_emit.c
+++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c
@@ -1004,8 +1004,6 @@ ALU2(XOR)
 ALU2(SHR)
 ALU2(SHL)
 ALU2(ASR)
-ALU1(F32TO16)
-ALU1(F16TO32)
 ALU1(FRC)
 ALU1(RNDD)
 ALU2(MAC)
@@ -1110,6 +1108,51 @@ brw_MUL(struct brw_compile *p, struct brw_reg dest,
return brw_alu2(p, BRW_OPCODE_MUL, dest, src0, src1);
 }
 
+brw_inst *
+brw_F32TO16(struct brw_compile *p, struct brw_reg dst, struct brw_reg src)
+{
+   const struct brw_context *brw = p-brw;
+
+   bool align16 = brw_inst_access_mode(brw, p-current) == BRW_ALIGN_16;
+
+   if (align16) {
+  assert(dst.type == BRW_REGISTER_TYPE_UD);
+   } else {
+  assert(dst.type == BRW_REGISTER_TYPE_W ||
+ dst.type == BRW_REGISTER_TYPE_UW ||
+ dst.type == BRW_REGISTER_TYPE_HF);
+   }
+
+   if (brw-gen = 8) {
+  if (align16) {
+ /* Emulate the Gen7 zeroing bug (see comments in vec4_visitor's
+  * emit_pack_half_2x16 method.)
+  */
+ brw_MOV(p, retype(dst, BRW_REGISTER_TYPE_UD), brw_imm_ud(0u));
+  }
+  return brw_MOV(p, retype(dst, BRW_REGISTER_TYPE_HF), src);
+   } else {
+  assert(brw-gen = 7);
+  return brw_alu1(p, BRW_OPCODE_F32TO16, dst, src);
+   }
+}
+
+brw_inst *
+brw_F16TO32(struct brw_compile *p, struct brw_reg dst, struct brw_reg src)
+{
+   const struct brw_context *brw = p-brw;
+   assert(src.type == BRW_REGISTER_TYPE_W ||
+  src.type == BRW_REGISTER_TYPE_UW ||
+  src.type == BRW_REGISTER_TYPE_HF);
+
+   if (brw-gen = 8) {
+  return brw_MOV(p, dst, retype(src, BRW_REGISTER_TYPE_HF));
+   } else {
+  assert(brw-gen = 7);
+  return brw_alu1(p, BRW_OPCODE_F16TO32, dst, src);
+   }
+}
+
 
 void brw_NOP(struct brw_compile *p)
 {
-- 
2.0.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 08/11] i965/eu: Set UIP on ELSE instructions on Broadwell.

2014-08-09 Thread Kenneth Graunke
Broadwell adds UIP on ELSE instructions.

Signed-off-by: Kenneth Graunke kenn...@whitecape.org
---
 src/mesa/drivers/dri/i965/brw_eu_emit.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c 
b/src/mesa/drivers/dri/i965/brw_eu_emit.c
index 214ff0a..665fc07 100644
--- a/src/mesa/drivers/dri/i965/brw_eu_emit.c
+++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c
@@ -1434,6 +1434,12 @@ patch_IF_ELSE(struct brw_compile *p,
 /* The IF instruction's UIP and ELSE's JIP should point to ENDIF */
  brw_inst_set_uip(brw, if_inst, br * (endif_inst - if_inst));
  brw_inst_set_jip(brw, else_inst, br * (endif_inst - else_inst));
+ if (brw-gen = 8) {
+/* Since we don't set branch_ctrl, the ELSE's JIP and UIP both
+ * should point to ENDIF.
+ */
+brw_inst_set_uip(brw, else_inst, br * (endif_inst - else_inst));
+ }
   }
}
 }
-- 
2.0.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 04/11] i965/eu: Explicitly disable instruction compaction on Broadwell for now.

2014-08-09 Thread Kenneth Graunke
Until now, it's been off implicitly: we never call the compactor
function.  When we merge the generators, we'll start calling it, so we
should make it do nothing.

Matt will enable instruction compaction properly later.

Signed-off-by: Kenneth Graunke kenn...@whitecape.org
---
 src/mesa/drivers/dri/i965/brw_eu_compact.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_eu_compact.c 
b/src/mesa/drivers/dri/i965/brw_eu_compact.c
index eec6454..625cfbb 100644
--- a/src/mesa/drivers/dri/i965/brw_eu_compact.c
+++ b/src/mesa/drivers/dri/i965/brw_eu_compact.c
@@ -711,7 +711,7 @@ brw_compact_instructions(struct brw_compile *p, int 
start_offset,
 */
int old_ip[(p-next_insn_offset - start_offset) / 8];
 
-   if (brw-gen  6)
+   if (brw-gen  6 || brw-gen = 8)
   return;
 
int src_offset;
-- 
2.0.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 0/4] Enable ARB_gpu_shader5 for i965/Gen7

2014-08-09 Thread Chris Forbes
[Applies on top of the dynamically uniform UBO indexing and sampler
indexing series sent to the list recently]

All the pieces are now in place for ARB_gpu_shader5 on
Ivybridge/Baytrail/Haswell, and Broadwell is close.

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 02/11] i965/eu: Use Haswell atomic messages on Broadwell.

2014-08-09 Thread Kenneth Graunke
Signed-off-by: Kenneth Graunke kenn...@whitecape.org
---
 src/mesa/drivers/dri/i965/brw_eu_emit.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c 
b/src/mesa/drivers/dri/i965/brw_eu_emit.c
index 038bfc6..822c73c 100644
--- a/src/mesa/drivers/dri/i965/brw_eu_emit.c
+++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c
@@ -2511,7 +2511,7 @@ brw_set_dp_untyped_atomic_message(struct brw_compile *p,
   atomic_op | /* Atomic Operation Type: BRW_AOP_* */
   (response_length ? 1  5 : 0); /* Return data expected */
 
-   if (brw-is_haswell) {
+   if (brw-gen = 8 || brw-is_haswell) {
   brw_set_message_descriptor(p, insn, HSW_SFID_DATAPORT_DATA_CACHE_1,
  msg_length, response_length,
  header_present, false);
@@ -2574,7 +2574,7 @@ brw_set_dp_untyped_surface_read_message(struct 
brw_compile *p,
   (brw_inst_exec_size(brw, insn) == BRW_EXECUTE_16 ? 16 : 8);
const unsigned num_channels = response_length / (dispatch_width / 8);
 
-   if (brw-is_haswell) {
+   if (brw-gen = 8 || brw-is_haswell) {
   brw_set_message_descriptor(p, insn, HSW_SFID_DATAPORT_DATA_CACHE_1,
  msg_length, response_length,
  header_present, false);
-- 
2.0.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] i965: Broadwell using brw_eu_emit.c, part 1

2014-08-09 Thread Kenneth Graunke
Hello,

This is a series of preparatory work for getting Broadwell to use the
brw_eu_emit.c code generator rather than gen8_generator.  The hope is
to drop the separate Gen8+ code completely.  We couldn't do this
originally because the old code was centered around a struct describing
the Gen4-7 assembly format, but now that we've shifted to the new
brw_inst API, there's no reason we can't share code.

This doesn't actually get us there - I've still got some bugs to track
down, but these patches have been in my tree for a long time and seem
pretty solid.

--Ken

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 09/11] i965/eu: Refactor jump distance scaling to use a helper function.

2014-08-09 Thread Kenneth Graunke
Different generations of hardware measure jump distances in different
units.  Previously, every function that needed to set a jump target open
coded this scaling, or made a hardcoded assumption (i.e. just used 2).

Most functions start with the number of instructions to jump, and scale
up to the hardware-specific value.  So, I made the function match that.

Others start with a byte offset, and divide by a constant (8) to obtain
the jump distance.  This is actually 16 / 2 (the jump scale for Gen5-7).

Signed-off-by: Kenneth Graunke kenn...@whitecape.org
---
 src/mesa/drivers/dri/i965/brw_eu.h |  2 ++
 src/mesa/drivers/dri/i965/brw_eu_emit.c| 42 +-
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp |  8 +++--
 3 files changed, 35 insertions(+), 17 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_eu.h 
b/src/mesa/drivers/dri/i965/brw_eu.h
index 7efc028..fdf6e4c 100644
--- a/src/mesa/drivers/dri/i965/brw_eu.h
+++ b/src/mesa/drivers/dri/i965/brw_eu.h
@@ -316,6 +316,8 @@ void brw_shader_time_add(struct brw_compile *p,
  struct brw_reg payload,
  uint32_t surf_index);
 
+unsigned brw_jump_scale(const struct brw_context *brw);
+
 /* If/else/endif.  Works by manipulating the execution flags on each
  * channel.
  */
diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c 
b/src/mesa/drivers/dri/i965/brw_eu_emit.c
index 665fc07..3d9c96a 100644
--- a/src/mesa/drivers/dri/i965/brw_eu_emit.c
+++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c
@@ -1170,6 +1170,27 @@ void brw_NOP(struct brw_compile *p)
  * Comparisons, if/else/endif
  */
 
+/**
+ * Return the generation-specific jump distance scaling factor.
+ *
+ * Given the number of instructions to jump, we need to scale by
+ * some number to obtain the actual jump distance to program in an
+ * instruction.
+ */
+unsigned
+brw_jump_scale(const struct brw_context *brw)
+{
+   /* Ironlake and later measure jump targets in 64-bit data chunks (in order
+* (to support compaction), so each 128-bit instruction requires 2 chunks.
+*/
+   if (brw-gen = 5)
+  return 2;
+
+   /* Gen4 simply uses the number of 128-bit instructions. */
+   return 1;
+}
+
+
 brw_inst *
 brw_JMPI(struct brw_compile *p, struct brw_reg index,
  unsigned predicate_control)
@@ -1376,12 +1397,7 @@ patch_IF_ELSE(struct brw_compile *p,
assert(endif_inst != NULL);
assert(else_inst == NULL || brw_inst_opcode(brw, else_inst) == 
BRW_OPCODE_ELSE);
 
-   unsigned br = 1;
-   /* Jump count is for 64bit data chunk each, so one 128bit instruction
-* requires 2 chunks.
-*/
-   if (brw-gen = 5)
-  br = 2;
+   unsigned br = brw_jump_scale(brw);
 
assert(brw_inst_opcode(brw, endif_inst) == BRW_OPCODE_ENDIF);
brw_inst_set_exec_size(brw, endif_inst, brw_inst_exec_size(brw, if_inst));
@@ -1678,7 +1694,7 @@ brw_patch_break_cont(struct brw_compile *p, brw_inst 
*while_inst)
struct brw_context *brw = p-brw;
brw_inst *do_inst = get_inner_do_insn(p);
brw_inst *inst;
-   int br = (brw-gen == 5) ? 2 : 1;
+   unsigned br = brw_jump_scale(brw);
 
assert(brw-gen  6);
 
@@ -1702,10 +1718,7 @@ brw_WHILE(struct brw_compile *p)
 {
struct brw_context *brw = p-brw;
brw_inst *insn, *do_insn;
-   unsigned br = 1;
-
-   if (brw-gen = 5)
-  br = 2;
+   unsigned br = brw_jump_scale(brw);
 
if (brw-gen = 7) {
   insn = next_insn(p, BRW_OPCODE_WHILE);
@@ -2389,7 +2402,7 @@ brw_find_loop_end(struct brw_compile *p, int start_offset)
 {
struct brw_context *brw = p-brw;
int offset;
-   int scale = 8;
+   int scale = 16 / brw_jump_scale(brw);
void *store = p-store;
 
assert(brw-gen = 6);
@@ -2421,7 +2434,8 @@ brw_set_uip_jip(struct brw_compile *p)
 {
struct brw_context *brw = p-brw;
int offset;
-   int scale = 8;
+   int br = brw_jump_scale(brw);
+   int scale = 16 / br;
void *store = p-store;
 
if (brw-gen  6)
@@ -2461,7 +2475,7 @@ brw_set_uip_jip(struct brw_compile *p)
 
   case BRW_OPCODE_ENDIF:
  if (block_end_offset == 0)
-brw_inst_set_jip(brw, insn, 2);
+brw_inst_set_jip(brw, insn, 1 * br);
  else
 brw_inst_set_jip(brw, insn, (block_end_offset - offset) / scale);
 break;
diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
index cec2e82..d3509a0 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
@@ -67,6 +67,8 @@ fs_generator::patch_discard_jumps_to_fb_writes()
if (brw-gen  6 || this-discard_halt_patches.is_empty())
   return false;
 
+   int scale = brw_jump_scale(brw);
+
/* There is a somewhat strange undocumented requirement of using
 * HALT, according to the simulator.  If some channel has HALTed to
 * a particular UIP, then by the end of the program, every channel
@@ -79,8 +81,8 @@ fs_generator::patch_discard_jumps_to_fb_writes()
 * 

[Mesa-dev] [PATCH 05/11] i965/eu: Port Broadwell CMP destination type hack to brw_eu_emit.c.

2014-08-09 Thread Kenneth Graunke
See gen8_generator::CMP().

Signed-off-by: Kenneth Graunke kenn...@whitecape.org
---
 src/mesa/drivers/dri/i965/brw_eu_emit.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c 
b/src/mesa/drivers/dri/i965/brw_eu_emit.c
index 5e14ba5..33684d4 100644
--- a/src/mesa/drivers/dri/i965/brw_eu_emit.c
+++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c
@@ -1785,6 +1785,14 @@ void brw_CMP(struct brw_compile *p,
struct brw_context *brw = p-brw;
brw_inst *insn = next_insn(p, BRW_OPCODE_CMP);
 
+   if (brw-gen = 8) {
+  /* The CMP instruction appears to behave erratically for floating point
+   * sources unless the destination type is also float.  Overriding it to
+   * match src0 makes it work in all cases.
+   */
+  dest.type = src0.type;
+   }
+
brw_inst_set_cond_modifier(brw, insn, conditional);
brw_set_dest(p, insn, dest);
brw_set_src0(p, insn, src0);
-- 
2.0.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 4/4] docs: Update relnotes for ARB_gpu_shader5, sort list

2014-08-09 Thread Chris Forbes
Signed-off-by: Chris Forbes chr...@ijw.co.nz
---
 docs/relnotes/10.3.html | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/docs/relnotes/10.3.html b/docs/relnotes/10.3.html
index f023ca6..c237601 100644
--- a/docs/relnotes/10.3.html
+++ b/docs/relnotes/10.3.html
@@ -44,21 +44,22 @@ Note: some of the new features are only available with 
certain drivers.
 /p
 
 ul
+liGL_ARB_clear_texture on i965/li
 liGL_ARB_compressed_texture_pixel_storage on all drivers/li
 liGL_ARB_draw_indirect on nvc0, radeonsi/li
 liGL_ARB_explicit_uniform_location (all drivers that support GLSL)/li
+liGL_ARB_fragment_layer_viewport on nv50, nvc0, llvmpipe, r600/li
+liGL_ARB_gpu_shader5 on i965/gen7/li
 liGL_ARB_multi_draw_indirect on nvc0, radeonsi/li
 liGL_ARB_sample_shading on radeonsi/li
+liGL_ARB_seamless_cubemap_per_texture on i965, llvmpipe, nvc0, r600, 
radeonsi, softpipe/li
 liGL_ARB_stencil_texturing on nv50, nvc0, r600, and radeonsi/li
 liGL_ARB_texture_cube_map_array on radeonsi/li
 liGL_ARB_texture_gather on r600, radeonsi/li
 liGL_ARB_texture_query_levels on nv50, nvc0, llvmpipe, r600, radeonsi, 
softpipe/li
 liGL_ARB_texture_query_lod on r600, radeonsi/li
 liGL_ARB_viewport_array on nvc0/li
-liGL_ARB_seamless_cubemap_per_texture on i965, llvmpipe, nvc0, r600, 
radeonsi, softpipe/li
-liGL_ARB_fragment_layer_viewport on nv50, nvc0, llvmpipe, r600/li
 liGL_AMD_vertex_shader_viewport_index on i965/gen7+, r600/li
-liGL_ARB_clear_texture on i965/li
 liA new software rasterizer driver (kms_swrast_dri.so) that works with
 DRM drivers that don't have a full-fledged GEM (such as qxl or simpledrm)/li
 /ul
-- 
2.0.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 06/11] i965/eu: Make it clear that brw_find_loop_end only runs on Gen6+.

2014-08-09 Thread Kenneth Graunke
It has Gen6+ knowledge baked in, and indeed is only called for Gen6+,
but it wasn't immediately obvious that this was the case.

Signed-off-by: Kenneth Graunke kenn...@whitecape.org
---
 src/mesa/drivers/dri/i965/brw_eu_emit.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c 
b/src/mesa/drivers/dri/i965/brw_eu_emit.c
index 33684d4..2e11eca 100644
--- a/src/mesa/drivers/dri/i965/brw_eu_emit.c
+++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c
@@ -2384,6 +2384,8 @@ brw_find_loop_end(struct brw_compile *p, int start_offset)
int scale = 8;
void *store = p-store;
 
+   assert(brw-gen = 6);
+
/* Always start after the instruction (such as a WHILE) we're trying to fix
 * up.
 */
-- 
2.0.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/4] i965: Enable ARB_gpu_shader5 on Gen7

2014-08-09 Thread Chris Forbes
Signed-off-by: Chris Forbes chr...@ijw.co.nz
---
 src/mesa/drivers/dri/i965/intel_extensions.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/intel_extensions.c 
b/src/mesa/drivers/dri/i965/intel_extensions.c
index 4ee8636..04a47af 100644
--- a/src/mesa/drivers/dri/i965/intel_extensions.c
+++ b/src/mesa/drivers/dri/i965/intel_extensions.c
@@ -325,4 +325,7 @@ intelInitExtensions(struct gl_context *ctx)
 
if (brw-gen = 7)
   ctx-Extensions.ARB_shader_atomic_counters = true;
+
+   if (brw-gen == 7)
+  ctx-Extensions.ARB_gpu_shader5 = true;
 }
-- 
2.0.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 10/11] i965/eu: Update jump distance scaling for Broadwell.

2014-08-09 Thread Kenneth Graunke
Broadwell measures jump distances in bytes, so we need to scale by 16.

Signed-off-by: Kenneth Graunke kenn...@whitecape.org
---
 src/mesa/drivers/dri/i965/brw_eu_emit.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c 
b/src/mesa/drivers/dri/i965/brw_eu_emit.c
index 3d9c96a..0a66d0c 100644
--- a/src/mesa/drivers/dri/i965/brw_eu_emit.c
+++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c
@@ -1180,6 +1180,10 @@ void brw_NOP(struct brw_compile *p)
 unsigned
 brw_jump_scale(const struct brw_context *brw)
 {
+   /* Broadwell measures jump targets in bytes. */
+   if (brw-gen = 8)
+  return 16;
+
/* Ironlake and later measure jump targets in 64-bit data chunks (in order
 * (to support compaction), so each 128-bit instruction requires 2 chunks.
 */
-- 
2.0.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/4] mesa: Make ARB_gpu_shader5 core-profile-only

2014-08-09 Thread Chris Forbes
Requires GLSL 1.50 or higher, which we only support in the core profile.

Signed-off-by: Chris Forbes chr...@ijw.co.nz
---
 src/mesa/main/extensions.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mesa/main/extensions.c b/src/mesa/main/extensions.c
index 9ac8377..7a9214e 100644
--- a/src/mesa/main/extensions.c
+++ b/src/mesa/main/extensions.c
@@ -115,7 +115,7 @@ static const struct extension extension_table[] = {
{ GL_ARB_framebuffer_object,  o(ARB_framebuffer_object),  
GL, 2005 },
{ GL_ARB_framebuffer_sRGB,o(EXT_framebuffer_sRGB),
GL, 1998 },
{ GL_ARB_get_program_binary,  o(dummy_true),  
GL, 2010 },
-   { GL_ARB_gpu_shader5, o(ARB_gpu_shader5), 
GL, 2010 },
+   { GL_ARB_gpu_shader5, o(ARB_gpu_shader5), 
GLC, 2010 },
{ GL_ARB_half_float_pixel,o(dummy_true),  
GL, 2003 },
{ GL_ARB_half_float_vertex,   o(ARB_half_float_vertex),   
GL, 2008 },
{ GL_ARB_instanced_arrays,o(ARB_instanced_arrays),
GL, 2008 },
-- 
2.0.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 07/11] i965/eu: Make it clear that brw_patch_break_count only runs on Gen4-5.

2014-08-09 Thread Kenneth Graunke
Signed-off-by: Kenneth Graunke kenn...@whitecape.org
---
 src/mesa/drivers/dri/i965/brw_eu_emit.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c 
b/src/mesa/drivers/dri/i965/brw_eu_emit.c
index 2e11eca..214ff0a 100644
--- a/src/mesa/drivers/dri/i965/brw_eu_emit.c
+++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c
@@ -1674,6 +1674,8 @@ brw_patch_break_cont(struct brw_compile *p, brw_inst 
*while_inst)
brw_inst *inst;
int br = (brw-gen == 5) ? 2 : 1;
 
+   assert(brw-gen  6);
+
for (inst = while_inst - 1; inst != do_inst; inst--) {
   /* If the jump count is != 0, that means that this instruction has 
already
* been patched because it's part of a loop inside of the one we're
-- 
2.0.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 03/11] i965/eu: Emulate F32TO16 and F16TO32 on Broadwell.

2014-08-09 Thread Chris Forbes
 +  if (align16) {
 + /* Emulate the Gen7 zeroing bug (see comments in vec4_visitor's
 +  * emit_pack_half_2x16 method.)
 +  */
 + brw_MOV(p, retype(dst, BRW_REGISTER_TYPE_UD), brw_imm_ud(0u));
 +  }
 +  return brw_MOV(p, retype(dst, BRW_REGISTER_TYPE_HF), src);
 +   } else {
 +  assert(brw-gen = 7);

This can be == 7.

 +  return brw_alu1(p, BRW_OPCODE_F32TO16, dst, src);
 +   }
 +}
 +
 +brw_inst *
 +brw_F16TO32(struct brw_compile *p, struct brw_reg dst, struct brw_reg src)
 +{
 +   const struct brw_context *brw = p-brw;
 +   assert(src.type == BRW_REGISTER_TYPE_W ||
 +  src.type == BRW_REGISTER_TYPE_UW ||
 +  src.type == BRW_REGISTER_TYPE_HF);
 +
 +   if (brw-gen = 8) {
 +  return brw_MOV(p, dst, retype(src, BRW_REGISTER_TYPE_HF));
 +   } else {
 +  assert(brw-gen = 7);

Same here
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] i965: Broadwell using brw_eu_emit.c, part 1

2014-08-09 Thread Chris Forbes
For the series:

Reviewed-by: Chris Forbes chr...@ijw.co.nz

On Sun, Aug 10, 2014 at 9:28 AM, Kenneth Graunke kenn...@whitecape.org wrote:
 Hello,

 This is a series of preparatory work for getting Broadwell to use the
 brw_eu_emit.c code generator rather than gen8_generator.  The hope is
 to drop the separate Gen8+ code completely.  We couldn't do this
 originally because the old code was centered around a struct describing
 the Gen4-7 assembly format, but now that we've shifted to the new
 brw_inst API, there's no reason we can't share code.

 This doesn't actually get us there - I've still got some bugs to track
 down, but these patches have been in my tree for a long time and seem
 pretty solid.

 --Ken

 ___
 mesa-dev mailing list
 mesa-dev@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev