Re: [Mesa-dev] [PATCH] radeonsi: add flexible shader descriptor management and use it for sampler views
On Don, 2013-08-15 at 05:25 +0200, Marek Olšák wrote: (This should be applied before MSAA, which will need to be rebased.) It moves all sampler view descriptors to a buffer. It supports partial resource updates and it can also unbind resources (required for FMASK texturing). The buffer contains all sampler view descriptors for one shader stage, represented as an array. On top of that, there are N arrays in the buffer, which are used to emulate context registers as implemented by the previous ASICs (each array is a context). This uses the RCU synchronization approach to avoid read-after-write hazards as discussed in the thread: radeonsi: add FMASK texture binding slots and resource setup CP DMA is used to clear the descriptors at context initialization and to copy the descriptors from one context to the next. IMPORTANT: 128 resource contexts are needed, 64 doesn't work. Hmm, I suppose this might depend on the specific GPU? If I set SH_KCACHE_ACTION_ENA before every draw call, only 2 contexts are needed. But that hurts performance? The patch looks good to me, just a minor comment: +/* Emit a CP DMA packet to do a copy from one buffer to another. + * The size must fit in bits [20:0]. Notes: + * + * 1) Set sync to true if you want the 3D engine to wait until CP DMA is done. + * + * 2) Set raw_hazard_wait to true if the source data was used as a destination + *in a previous CP DMA packet. It's for preventing a read-after-write hazard + *between two CP DMA packets. + */ +static void si_emit_cp_dma_copy_buffer(struct r600_context *rctx, +uint64_t dst_va, uint64_t src_va, +unsigned size, +bool sync, bool raw_hazard_wait) [...] + /* Copy the descriptors to a new context slot. */ + si_emit_cp_dma_copy_buffer(rctx, +va_base + new_context_id * desc-context_size, +va_base + desc-current_context_id * desc-context_size, +desc-context_size, true, false); It's hard to remember / guess the meaning of such boolean parameters, so it might be better to use self-explanatory flags instead. -- Earthling Michel Dänzer | http://www.amd.com Libre software enthusiast | Debian, X and DRI developer ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] radeonsi: add flexible shader descriptor management and use it for sampler views
Am 15.08.2013 09:33, schrieb Michel Dänzer: On Don, 2013-08-15 at 05:25 +0200, Marek Olšák wrote: (This should be applied before MSAA, which will need to be rebased.) It moves all sampler view descriptors to a buffer. It supports partial resource updates and it can also unbind resources (required for FMASK texturing). The buffer contains all sampler view descriptors for one shader stage, represented as an array. On top of that, there are N arrays in the buffer, which are used to emulate context registers as implemented by the previous ASICs (each array is a context). This uses the RCU synchronization approach to avoid read-after-write hazards as discussed in the thread: radeonsi: add FMASK texture binding slots and resource setup CP DMA is used to clear the descriptors at context initialization and to copy the descriptors from one context to the next. IMPORTANT: 128 resource contexts are needed, 64 doesn't work. Hmm, I suppose this might depend on the specific GPU? That looks like the SQ is indeed capable of handling more request at once. Well there has to be an upper limit to this, we just have to figure it out from the docs. If I set SH_KCACHE_ACTION_ENA before every draw call, only 2 contexts are needed. But that hurts performance? Yeah, probably allot. Sounds like a complete pipeline drain to me. Don't fully understand it either, but I think the commands from the IB are read through the KCACHE as well. So when you flush it the CP completely stops until that command is complete. The patch looks good to me, just a minor comment: Also have some more comments on this, but going to write a separate mail on it. Christian. +/* Emit a CP DMA packet to do a copy from one buffer to another. + * The size must fit in bits [20:0]. Notes: + * + * 1) Set sync to true if you want the 3D engine to wait until CP DMA is done. + * + * 2) Set raw_hazard_wait to true if the source data was used as a destination + *in a previous CP DMA packet. It's for preventing a read-after-write hazard + *between two CP DMA packets. + */ +static void si_emit_cp_dma_copy_buffer(struct r600_context *rctx, + uint64_t dst_va, uint64_t src_va, + unsigned size, + bool sync, bool raw_hazard_wait) [...] + /* Copy the descriptors to a new context slot. */ + si_emit_cp_dma_copy_buffer(rctx, + va_base + new_context_id * desc-context_size, + va_base + desc-current_context_id * desc-context_size, + desc-context_size, true, false); It's hard to remember / guess the meaning of such boolean parameters, so it might be better to use self-explanatory flags instead. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] gallivm: already pass coords in the right place in the sampler interface
On Mit, 2013-08-14 at 18:35 +0200, srol...@vmware.com wrote: From: Roland Scheidegger srol...@vmware.com This makes things a bit nicer, and more importantly it fixes an issue where a downgraded array texture (due to view reduced to 1 layer and addressed with (non-array) samplec instruction) would use the wrong coord as shadow reference value. (This could also be fixed by passing target through the sampler interface much the same way as is done for size queries, might do this eventually anyway.) And if we'd ever want to support (shadow) cube map arrays, we'd need 5 coords in any case. v2: fix bugs (texel fetch using wrong layer coord for 1d, shadow tex using wrong shadow coord for 2d...). Plus need to project the shadow coord, and just for fun keep projecting the layer coord too. [...] @@ -1631,55 +1632,58 @@ emit_tex( struct lp_build_tgsi_soa_context *bld, } switch (inst-Texture.Texture) { - case TGSI_TEXTURE_1D: - num_coords = 1; - num_offsets = 1; - num_derivs = 1; - break; case TGSI_TEXTURE_1D_ARRAY: - num_coords = 2; + layer_coord = 1; + /* fallthrough */ + case TGSI_TEXTURE_1D: num_offsets = 1; num_derivs = 1; break; + case TGSI_TEXTURE_2D_ARRAY: + layer_coord = 2; + /* fallthrough */ case TGSI_TEXTURE_2D: case TGSI_TEXTURE_RECT: - num_coords = 2; num_offsets = 2; num_derivs = 2; break; - case TGSI_TEXTURE_SHADOW1D: case TGSI_TEXTURE_SHADOW1D_ARRAY: - num_coords = 3; + layer_coord = 1; + /* fallthrough */ + case TGSI_TEXTURE_SHADOW1D: + shadow_coord = 2; num_offsets = 1; num_derivs = 1; break; + case TGSI_TEXTURE_SHADOW2D_ARRAY: + layer_coord = 2; + shadow_coord = 3; + num_offsets = 2; + num_derivs = 2; + break; case TGSI_TEXTURE_SHADOW2D: case TGSI_TEXTURE_SHADOWRECT: - case TGSI_TEXTURE_2D_ARRAY: - num_coords = 3; + shadow_coord = 2; num_offsets = 2; num_derivs = 2; break; case TGSI_TEXTURE_CUBE: - num_coords = 3; num_offsets = 2; num_derivs = 3; break; case TGSI_TEXTURE_3D: - num_coords = 3; num_offsets = 3; num_derivs = 3; break; - case TGSI_TEXTURE_SHADOW2D_ARRAY: - num_coords = 4; - num_offsets = 2; - num_derivs = 2; - break; case TGSI_TEXTURE_SHADOWCUBE: - num_coords = 4; + shadow_coord = 3; num_offsets = 2; num_derivs = 3; break; + case TGSI_TEXTURE_CUBE_ARRAY: + case TGSI_TEXTURE_SHADOWCUBE_ARRAY: + case TGSI_TEXTURE_2D_MSAA: + case TGSI_TEXTURE_2D_ARRAY_MSAA: default: assert(0); return; Couldn't some of this code be simplified using tgsi_util_get_texture_coord_dim()? -- Earthling Michel Dänzer | http://www.amd.com Libre software enthusiast | Debian, X and DRI developer ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] radeonsi: add flexible shader descriptor management and use it for sampler views
Am 15.08.2013 05:25, schrieb Marek Olšák: (This should be applied before MSAA, which will need to be rebased.) It moves all sampler view descriptors to a buffer. It supports partial resource updates and it can also unbind resources (required for FMASK texturing). The buffer contains all sampler view descriptors for one shader stage, represented as an array. On top of that, there are N arrays in the buffer, which are used to emulate context registers as implemented by the previous ASICs (each array is a context). This uses the RCU synchronization approach to avoid read-after-write hazards as discussed in the thread: radeonsi: add FMASK texture binding slots and resource setup CP DMA is used to clear the descriptors at context initialization and to copy the descriptors from one context to the next. IMPORTANT: 128 resource contexts are needed, 64 doesn't work. If I set SH_KCACHE_ACTION_ENA before every draw call, only 2 contexts are needed. I don't have an explanation for this. --- The idea itself looks really good to me, but we should probably also move the all resources and samplers to the new model and then rip out the code that stores them directly into the IB. +/* Emit a CP DMA packet to do a copy from one buffer to another. + * The size must fit in bits [20:0]. Notes: + * + * 1) Set sync to true if you want the 3D engine to wait until CP DMA is done. + * + * 2) Set raw_hazard_wait to true if the source data was used as a destination + *in a previous CP DMA packet. It's for preventing a read-after-write hazard + *between two CP DMA packets. + */ +static void si_emit_cp_dma_copy_buffer(struct r600_context *rctx, + uint64_t dst_va, uint64_t src_va, + unsigned size, + bool sync, bool raw_hazard_wait) +{ + struct radeon_winsys_cs *cs = rctx-cs; + uint32_t sync_flag = sync ? PKT3_CP_DMA_CP_SYNC : 0; + uint32_t raw_wait = raw_hazard_wait ? PKT3_CP_DMA_CMD_RAW_WAIT : 0; + + assert(size); + assert((size ((121)-1)) == size); + + cs-buf[cs-cdw++] = PKT3(PKT3_CP_DMA, 4, 0); + cs-buf[cs-cdw++] = src_va; /* SRC_ADDR_LO [31:0] */ + cs-buf[cs-cdw++] = sync_flag | ((src_va 32) 0xff); /* CP_SYNC [31] | SRC_ADDR_HI [7:0] */ + cs-buf[cs-cdw++] = dst_va; /* DST_ADDR_LO [31:0] */ + cs-buf[cs-cdw++] = (dst_va 32) 0xff; /* DST_ADDR_HI [7:0] */ + cs-buf[cs-cdw++] = size | raw_wait; /* COMMAND [29:22] | BYTE_COUNT [20:0] */ +} + +/* Emit a CP DMA packet to clear a buffer. The size must fit in bits [20:0]. */ +static void si_emit_cp_dma_clear_buffer(struct r600_context *rctx, + uint64_t dst_va, unsigned size, + uint32_t clear_value, + bool sync, bool raw_hazard_wait) +{ + struct radeon_winsys_cs *cs = rctx-cs; + uint32_t sync_flag = sync ? PKT3_CP_DMA_CP_SYNC : 0; + uint32_t raw_wait = raw_hazard_wait ? PKT3_CP_DMA_CMD_RAW_WAIT : 0; + + assert(size); + assert((size ((121)-1)) == size); + + cs-buf[cs-cdw++] = PKT3(PKT3_CP_DMA, 4, 0); + cs-buf[cs-cdw++] = clear_value; /* DATA [31:0] */ + cs-buf[cs-cdw++] = sync_flag | PKT3_CP_DMA_SRC_SEL(2); /* CP_SYNC [31] | SRC_SEL[30:29] */ + cs-buf[cs-cdw++] = dst_va; /* DST_ADDR_LO [31:0] */ + cs-buf[cs-cdw++] = (dst_va 32) 0xff; /* DST_ADDR_HI [7:0] */ + cs-buf[cs-cdw++] = size | raw_wait; /* COMMAND [29:22] | BYTE_COUNT [20:0] */ +} Can we use some kind of macro or inline function instead of cs-buf[cs-cdw++] ? That should help of we need to port that over to a different CS mechanism. And IIRC the CP DMA is identical on all chipset generation (maybe excluding early R6xx, but I'm not 100% sure of that), so it might be a good idea to start sharing code again by putting this under src/gallium/drivers/radeon/radeon_cp_dma.c. Not necessary now, but more as a general idea. What do you think? Christian. + +static void si_init_descriptors(struct r600_context *rctx, + struct si_descriptors *desc, + unsigned shader_userdata_reg, + unsigned element_dw_size, + unsigned num_elements, + void (*emit_func)(struct r600_context *ctx, struct si_atom *state)) +{ + uint64_t va; + + desc-atom.emit = emit_func; + desc-shader_userdata_reg = shader_userdata_reg; + desc-element_dw_size = element_dw_size; + desc-num_elements = num_elements; + desc-context_size = num_elements * element_dw_size * 4; + + desc-buffer = (struct si_resource*) + pipe_buffer_create(rctx-context.screen, PIPE_BIND_CUSTOM, +
Re: [Mesa-dev] radeonsi: LLVM r187139 broke some piglit tests
On Fre, 2013-08-02 at 17:58 +0200, Michel Dänzer wrote: On Mit, 2013-07-31 at 08:42 -0700, Tom Stellard wrote: On Wed, Jul 31, 2013 at 01:04:01PM +0200, Michel Dänzer wrote: LLVM revision 187139 ('Allocate local registers in order for optimal coloring.') broke some derivative related piglit tests with the radeonsi driver. I'm attaching a diff between the bad and good generated code (as printed with RADEON_DUMP_SHADERS=1) for the glsl-derivs test. The only difference I can see is in which registers are used in which order. I wonder if we might be missing S_WAITCNT after DS_READ/WRITE instructions in some cases, but I haven't spotted any candidates for that in the bad code which aren't there in the good code as well. Can anyone else spot something I've missed? Shouldn't we be using the S_BARRIER instruction to keep the threads in sync? Doesn't seem to help unfortunately, but thanks for the good suggestion. I found one thing going wrong: DS_WRITE_B32 ends up using a VGPR register number instead of the $gds operand for encoding the GDS field (the asm output from llc even shows the VGPR name). If the VGPR number happens to be odd (i.e. to have the least significant bit set), the shader ends up writing to GDS instead of LDS. But I have no idea why this is happening, or how to fix it. :( -- Earthling Michel Dänzer | http://www.amd.com Libre software enthusiast | Debian, X and DRI developer ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] radeonsi: add flexible shader descriptor management and use it for sampler views
On Thu, Aug 15, 2013 at 9:33 AM, Michel Dänzer mic...@daenzer.net wrote: On Don, 2013-08-15 at 05:25 +0200, Marek Olšák wrote: (This should be applied before MSAA, which will need to be rebased.) It moves all sampler view descriptors to a buffer. It supports partial resource updates and it can also unbind resources (required for FMASK texturing). The buffer contains all sampler view descriptors for one shader stage, represented as an array. On top of that, there are N arrays in the buffer, which are used to emulate context registers as implemented by the previous ASICs (each array is a context). This uses the RCU synchronization approach to avoid read-after-write hazards as discussed in the thread: radeonsi: add FMASK texture binding slots and resource setup CP DMA is used to clear the descriptors at context initialization and to copy the descriptors from one context to the next. IMPORTANT: 128 resource contexts are needed, 64 doesn't work. Hmm, I suppose this might depend on the specific GPU? Very likely. I'm using SI VERDE. If I set SH_KCACHE_ACTION_ENA before every draw call, only 2 contexts are needed. But that hurts performance? I didn't test performance in that case, but presumably yes. The patch looks good to me, just a minor comment: +/* Emit a CP DMA packet to do a copy from one buffer to another. + * The size must fit in bits [20:0]. Notes: + * + * 1) Set sync to true if you want the 3D engine to wait until CP DMA is done. + * + * 2) Set raw_hazard_wait to true if the source data was used as a destination + *in a previous CP DMA packet. It's for preventing a read-after-write hazard + *between two CP DMA packets. + */ +static void si_emit_cp_dma_copy_buffer(struct r600_context *rctx, +uint64_t dst_va, uint64_t src_va, +unsigned size, +bool sync, bool raw_hazard_wait) [...] + /* Copy the descriptors to a new context slot. */ + si_emit_cp_dma_copy_buffer(rctx, +va_base + new_context_id * desc-context_size, +va_base + desc-current_context_id * desc-context_size, +desc-context_size, true, false); It's hard to remember / guess the meaning of such boolean parameters, so it might be better to use self-explanatory flags instead. I was thinking of that too. Alright, I'll add the flags. Marek ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] gallium, intel: Implements new __DRI_IMAGE_USE_LINEAR and PIPE_BIND_LINEAR flags to enforce no tiling.
Signed-off-by: Axel Davy axel.d...@ens.fr --- include/GL/internal/dri_interface.h | 1 + src/gallium/drivers/i915/i915_resource.c| 8 ++-- src/gallium/drivers/ilo/ilo_resource.c | 2 +- src/gallium/drivers/nv50/nv50_miptree.c | 3 +++ src/gallium/drivers/nvc0/nvc0_miptree.c | 3 +++ src/gallium/drivers/r300/r300_texture.c | 2 +- src/gallium/drivers/r600/r600_texture.c | 3 ++- src/gallium/drivers/radeonsi/r600_texture.c | 2 +- src/gallium/include/pipe/p_defines.h| 4 src/gallium/state_trackers/dri/drm/dri2.c | 2 ++ src/mesa/drivers/dri/i915/intel_screen.c| 3 +++ src/mesa/drivers/dri/i965/intel_screen.c| 3 +++ 12 files changed, 30 insertions(+), 6 deletions(-) diff --git a/include/GL/internal/dri_interface.h b/include/GL/internal/dri_interface.h index be31bb8..709fece 100644 --- a/include/GL/internal/dri_interface.h +++ b/include/GL/internal/dri_interface.h @@ -968,6 +968,7 @@ struct __DRIdri2ExtensionRec { #define __DRI_IMAGE_USE_SHARE 0x0001 #define __DRI_IMAGE_USE_SCANOUT0x0002 #define __DRI_IMAGE_USE_CURSOR 0x0004 /* Depricated */ +#define __DRI_IMAGE_USE_LINEAR 0x0008 /** diff --git a/src/gallium/drivers/i915/i915_resource.c b/src/gallium/drivers/i915/i915_resource.c index 314ebe9..627ed2b 100644 --- a/src/gallium/drivers/i915/i915_resource.c +++ b/src/gallium/drivers/i915/i915_resource.c @@ -12,8 +12,12 @@ i915_resource_create(struct pipe_screen *screen, if (template-target == PIPE_BUFFER) return i915_buffer_create(screen, template); else - return i915_texture_create(screen, template, FALSE); - + { + if (!(template-bind PIPE_BIND_LINEAR)) + return i915_texture_create(screen, template, FALSE); + else + return i915_texture_create(screen, template, TRUE); + } } static struct pipe_resource * diff --git a/src/gallium/drivers/ilo/ilo_resource.c b/src/gallium/drivers/ilo/ilo_resource.c index 5061f69..7dd3435 100644 --- a/src/gallium/drivers/ilo/ilo_resource.c +++ b/src/gallium/drivers/ilo/ilo_resource.c @@ -473,7 +473,7 @@ tex_layout_init_tiling(struct tex_layout *layout) * The cursor surface address must be 4K byte aligned. The cursor must * be in linear memory, it cannot be tiled. */ - if (unlikely(templ-bind PIPE_BIND_CURSOR)) + if (unlikely(templ-bind (PIPE_BIND_CURSOR | PIPE_BIND_LINEAR))) valid_tilings = tile_none; /* diff --git a/src/gallium/drivers/nv50/nv50_miptree.c b/src/gallium/drivers/nv50/nv50_miptree.c index 28be768..e44c843 100644 --- a/src/gallium/drivers/nv50/nv50_miptree.c +++ b/src/gallium/drivers/nv50/nv50_miptree.c @@ -326,6 +326,9 @@ nv50_miptree_create(struct pipe_screen *pscreen, pipe_reference_init(pt-reference, 1); pt-screen = pscreen; + if (pt-bind PIPE_BIND_LINEAR) + pt-flags |= NOUVEAU_RESOURCE_FLAG_LINEAR; + bo_config.nv50.memtype = nv50_mt_choose_storage_type(mt, TRUE); if (!nv50_miptree_init_ms_mode(mt)) { diff --git a/src/gallium/drivers/nvc0/nvc0_miptree.c b/src/gallium/drivers/nvc0/nvc0_miptree.c index 9e57d74..f359207 100644 --- a/src/gallium/drivers/nvc0/nvc0_miptree.c +++ b/src/gallium/drivers/nvc0/nvc0_miptree.c @@ -274,6 +274,9 @@ nvc0_miptree_create(struct pipe_screen *pscreen, } } + if (pt-bind PIPE_BIND_LINEAR) + pt-flags |= NOUVEAU_RESOURCE_FLAG_LINEAR; + bo_config.nvc0.memtype = nvc0_mt_choose_storage_type(mt, compressed); if (!nvc0_miptree_init_ms_mode(mt)) { diff --git a/src/gallium/drivers/r300/r300_texture.c b/src/gallium/drivers/r300/r300_texture.c index 13e9bc3..b7fb081 100644 --- a/src/gallium/drivers/r300/r300_texture.c +++ b/src/gallium/drivers/r300/r300_texture.c @@ -1079,7 +1079,7 @@ struct pipe_resource *r300_texture_create(struct pipe_screen *screen, enum radeon_bo_layout microtile, macrotile; if ((base-flags R300_RESOURCE_FLAG_TRANSFER) || -(base-bind PIPE_BIND_SCANOUT)) { +(base-bind (PIPE_BIND_SCANOUT | PIPE_BIND_LINEAR))) { microtile = RADEON_LAYOUT_LINEAR; macrotile = RADEON_LAYOUT_LINEAR; } else { diff --git a/src/gallium/drivers/r600/r600_texture.c b/src/gallium/drivers/r600/r600_texture.c index 36cca17..b81a432 100644 --- a/src/gallium/drivers/r600/r600_texture.c +++ b/src/gallium/drivers/r600/r600_texture.c @@ -609,7 +609,8 @@ struct pipe_resource *r600_texture_create(struct pipe_screen *screen, * because 422 formats are used for videos, which prefer linear buffers * for fast uploads anyway. */ if (!(templ-flags R600_RESOURCE_FLAG_TRANSFER) - desc-layout != UTIL_FORMAT_LAYOUT_SUBSAMPLED) { + (desc-layout != UTIL_FORMAT_LAYOUT_SUBSAMPLED) + !(templ-bind PIPE_BIND_LINEAR)) { if (templ-flags R600_RESOURCE_FLAG_FORCE_TILING) { array_mode = V_038000_ARRAY_2D_TILED_THIN1; } else if (!(templ-bind
Re: [Mesa-dev] [PATCH] radeonsi: add flexible shader descriptor management and use it for sampler views
On Thu, Aug 15, 2013 at 10:27 AM, Christian König deathsim...@vodafone.de wrote: Am 15.08.2013 05:25, schrieb Marek Olšák: (This should be applied before MSAA, which will need to be rebased.) It moves all sampler view descriptors to a buffer. It supports partial resource updates and it can also unbind resources (required for FMASK texturing). The buffer contains all sampler view descriptors for one shader stage, represented as an array. On top of that, there are N arrays in the buffer, which are used to emulate context registers as implemented by the previous ASICs (each array is a context). This uses the RCU synchronization approach to avoid read-after-write hazards as discussed in the thread: radeonsi: add FMASK texture binding slots and resource setup CP DMA is used to clear the descriptors at context initialization and to copy the descriptors from one context to the next. IMPORTANT: 128 resource contexts are needed, 64 doesn't work. If I set SH_KCACHE_ACTION_ENA before every draw call, only 2 contexts are needed. I don't have an explanation for this. --- The idea itself looks really good to me, but we should probably also move the all resources and samplers to the new model and then rip out the code that stores them directly into the IB. I'd like MSAA to land first, but yes, the plan is to eventually move all resources and samplers to the new model. +/* Emit a CP DMA packet to do a copy from one buffer to another. + * The size must fit in bits [20:0]. Notes: + * + * 1) Set sync to true if you want the 3D engine to wait until CP DMA is done. + * + * 2) Set raw_hazard_wait to true if the source data was used as a destination + *in a previous CP DMA packet. It's for preventing a read-after-write hazard + *between two CP DMA packets. + */ +static void si_emit_cp_dma_copy_buffer(struct r600_context *rctx, + uint64_t dst_va, uint64_t src_va, + unsigned size, + bool sync, bool raw_hazard_wait) +{ + struct radeon_winsys_cs *cs = rctx-cs; + uint32_t sync_flag = sync ? PKT3_CP_DMA_CP_SYNC : 0; + uint32_t raw_wait = raw_hazard_wait ? PKT3_CP_DMA_CMD_RAW_WAIT : 0; + + assert(size); + assert((size ((121)-1)) == size); + + cs-buf[cs-cdw++] = PKT3(PKT3_CP_DMA, 4, 0); + cs-buf[cs-cdw++] = src_va;/* SRC_ADDR_LO [31:0] */ + cs-buf[cs-cdw++] = sync_flag | ((src_va 32) 0xff); /* CP_SYNC [31] | SRC_ADDR_HI [7:0] */ + cs-buf[cs-cdw++] = dst_va;/* DST_ADDR_LO [31:0] */ + cs-buf[cs-cdw++] = (dst_va 32) 0xff; /* DST_ADDR_HI [7:0] */ + cs-buf[cs-cdw++] = size | raw_wait; /* COMMAND [29:22] | BYTE_COUNT [20:0] */ +} + +/* Emit a CP DMA packet to clear a buffer. The size must fit in bits [20:0]. */ +static void si_emit_cp_dma_clear_buffer(struct r600_context *rctx, + uint64_t dst_va, unsigned size, + uint32_t clear_value, + bool sync, bool raw_hazard_wait) +{ + struct radeon_winsys_cs *cs = rctx-cs; + uint32_t sync_flag = sync ? PKT3_CP_DMA_CP_SYNC : 0; + uint32_t raw_wait = raw_hazard_wait ? PKT3_CP_DMA_CMD_RAW_WAIT : 0; + + assert(size); + assert((size ((121)-1)) == size); + + cs-buf[cs-cdw++] = PKT3(PKT3_CP_DMA, 4, 0); + cs-buf[cs-cdw++] = clear_value; /* DATA [31:0] */ + cs-buf[cs-cdw++] = sync_flag | PKT3_CP_DMA_SRC_SEL(2); /* CP_SYNC [31] | SRC_SEL[30:29] */ + cs-buf[cs-cdw++] = dst_va;/* DST_ADDR_LO [31:0] */ + cs-buf[cs-cdw++] = (dst_va 32) 0xff; /* DST_ADDR_HI [7:0] */ + cs-buf[cs-cdw++] = size | raw_wait; /* COMMAND [29:22] | BYTE_COUNT [20:0] */ +} Can we use some kind of macro or inline function instead of cs-buf[cs-cdw++] ? That should help of we need to port that over to a different CS mechanism. How about this? static INLINE void r600_write_value(struct radeon_winsys_cs *cs, unsigned value) { cs-buf[cs-cdw++] = value; } And IIRC the CP DMA is identical on all chipset generation (maybe excluding early R6xx, but I'm not 100% sure of that), so it might be a good idea to start sharing code again by putting this under src/gallium/drivers/radeon/radeon_cp_dma.c. Not necessary now, but more as a general idea. What do you think? I agree. CP DMA is indeed identical on all chipsets. The copying is supported since R600 and the clearing is supported since Evergreen. Marek ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] gallium, intel: Implements new __DRI_IMAGE_USE_LINEAR and PIPE_BIND_LINEAR flags to enforce no tiling.
Signed-off-by: Axel Davy axel.d...@ens.fr --- include/GL/internal/dri_interface.h | 1 + src/gallium/drivers/i915/i915_resource.c| 8 ++-- src/gallium/drivers/ilo/ilo_resource.c | 2 +- src/gallium/drivers/nv50/nv50_miptree.c | 3 +++ src/gallium/drivers/nvc0/nvc0_miptree.c | 3 +++ src/gallium/drivers/r300/r300_texture.c | 2 +- src/gallium/drivers/r600/r600_texture.c | 3 ++- src/gallium/drivers/radeonsi/r600_texture.c | 2 +- src/gallium/include/pipe/p_defines.h| 4 src/gallium/state_trackers/dri/drm/dri2.c | 2 ++ src/mesa/drivers/dri/i915/intel_screen.c| 3 +++ src/mesa/drivers/dri/i965/intel_screen.c| 3 +++ 12 files changed, 30 insertions(+), 6 deletions(-) diff --git a/include/GL/internal/dri_interface.h b/include/GL/internal/dri_interface.h index be31bb8..709fece 100644 --- a/include/GL/internal/dri_interface.h +++ b/include/GL/internal/dri_interface.h @@ -968,6 +968,7 @@ struct __DRIdri2ExtensionRec { #define __DRI_IMAGE_USE_SHARE 0x0001 #define __DRI_IMAGE_USE_SCANOUT0x0002 #define __DRI_IMAGE_USE_CURSOR 0x0004 /* Depricated */ +#define __DRI_IMAGE_USE_LINEAR 0x0008 /** diff --git a/src/gallium/drivers/i915/i915_resource.c b/src/gallium/drivers/i915/i915_resource.c index 314ebe9..627ed2b 100644 --- a/src/gallium/drivers/i915/i915_resource.c +++ b/src/gallium/drivers/i915/i915_resource.c @@ -12,8 +12,12 @@ i915_resource_create(struct pipe_screen *screen, if (template-target == PIPE_BUFFER) return i915_buffer_create(screen, template); else - return i915_texture_create(screen, template, FALSE); - + { + if (!(template-bind PIPE_BIND_LINEAR)) + return i915_texture_create(screen, template, FALSE); + else + return i915_texture_create(screen, template, TRUE); + } } static struct pipe_resource * diff --git a/src/gallium/drivers/ilo/ilo_resource.c b/src/gallium/drivers/ilo/ilo_resource.c index 5061f69..7dd3435 100644 --- a/src/gallium/drivers/ilo/ilo_resource.c +++ b/src/gallium/drivers/ilo/ilo_resource.c @@ -473,7 +473,7 @@ tex_layout_init_tiling(struct tex_layout *layout) * The cursor surface address must be 4K byte aligned. The cursor must * be in linear memory, it cannot be tiled. */ - if (unlikely(templ-bind PIPE_BIND_CURSOR)) + if (unlikely(templ-bind (PIPE_BIND_CURSOR | PIPE_BIND_LINEAR))) valid_tilings = tile_none; /* diff --git a/src/gallium/drivers/nv50/nv50_miptree.c b/src/gallium/drivers/nv50/nv50_miptree.c index 28be768..e44c843 100644 --- a/src/gallium/drivers/nv50/nv50_miptree.c +++ b/src/gallium/drivers/nv50/nv50_miptree.c @@ -326,6 +326,9 @@ nv50_miptree_create(struct pipe_screen *pscreen, pipe_reference_init(pt-reference, 1); pt-screen = pscreen; + if (pt-bind PIPE_BIND_LINEAR) + pt-flags |= NOUVEAU_RESOURCE_FLAG_LINEAR; + bo_config.nv50.memtype = nv50_mt_choose_storage_type(mt, TRUE); if (!nv50_miptree_init_ms_mode(mt)) { diff --git a/src/gallium/drivers/nvc0/nvc0_miptree.c b/src/gallium/drivers/nvc0/nvc0_miptree.c index 9e57d74..f359207 100644 --- a/src/gallium/drivers/nvc0/nvc0_miptree.c +++ b/src/gallium/drivers/nvc0/nvc0_miptree.c @@ -274,6 +274,9 @@ nvc0_miptree_create(struct pipe_screen *pscreen, } } + if (pt-bind PIPE_BIND_LINEAR) + pt-flags |= NOUVEAU_RESOURCE_FLAG_LINEAR; + bo_config.nvc0.memtype = nvc0_mt_choose_storage_type(mt, compressed); if (!nvc0_miptree_init_ms_mode(mt)) { diff --git a/src/gallium/drivers/r300/r300_texture.c b/src/gallium/drivers/r300/r300_texture.c index 13e9bc3..b7fb081 100644 --- a/src/gallium/drivers/r300/r300_texture.c +++ b/src/gallium/drivers/r300/r300_texture.c @@ -1079,7 +1079,7 @@ struct pipe_resource *r300_texture_create(struct pipe_screen *screen, enum radeon_bo_layout microtile, macrotile; if ((base-flags R300_RESOURCE_FLAG_TRANSFER) || -(base-bind PIPE_BIND_SCANOUT)) { +(base-bind (PIPE_BIND_SCANOUT | PIPE_BIND_LINEAR))) { microtile = RADEON_LAYOUT_LINEAR; macrotile = RADEON_LAYOUT_LINEAR; } else { diff --git a/src/gallium/drivers/r600/r600_texture.c b/src/gallium/drivers/r600/r600_texture.c index 36cca17..b81a432 100644 --- a/src/gallium/drivers/r600/r600_texture.c +++ b/src/gallium/drivers/r600/r600_texture.c @@ -609,7 +609,8 @@ struct pipe_resource *r600_texture_create(struct pipe_screen *screen, * because 422 formats are used for videos, which prefer linear buffers * for fast uploads anyway. */ if (!(templ-flags R600_RESOURCE_FLAG_TRANSFER) - desc-layout != UTIL_FORMAT_LAYOUT_SUBSAMPLED) { + (desc-layout != UTIL_FORMAT_LAYOUT_SUBSAMPLED) + !(templ-bind PIPE_BIND_LINEAR)) { if (templ-flags R600_RESOURCE_FLAG_FORCE_TILING) { array_mode = V_038000_ARRAY_2D_TILED_THIN1; } else if (!(templ-bind
Re: [Mesa-dev] [PATCH] radeonsi: add flexible shader descriptor management and use it for sampler views
Am 15.08.2013 12:54, schrieb Marek Olšák: On Thu, Aug 15, 2013 at 10:27 AM, Christian König deathsim...@vodafone.de wrote: Am 15.08.2013 05:25, schrieb Marek Olšák: (This should be applied before MSAA, which will need to be rebased.) It moves all sampler view descriptors to a buffer. It supports partial resource updates and it can also unbind resources (required for FMASK texturing). The buffer contains all sampler view descriptors for one shader stage, represented as an array. On top of that, there are N arrays in the buffer, which are used to emulate context registers as implemented by the previous ASICs (each array is a context). This uses the RCU synchronization approach to avoid read-after-write hazards as discussed in the thread: radeonsi: add FMASK texture binding slots and resource setup CP DMA is used to clear the descriptors at context initialization and to copy the descriptors from one context to the next. IMPORTANT: 128 resource contexts are needed, 64 doesn't work. If I set SH_KCACHE_ACTION_ENA before every draw call, only 2 contexts are needed. I don't have an explanation for this. --- The idea itself looks really good to me, but we should probably also move the all resources and samplers to the new model and then rip out the code that stores them directly into the IB. I'd like MSAA to land first, but yes, the plan is to eventually move all resources and samplers to the new model. Sounds good. Can we use some kind of macro or inline function instead of cs-buf[cs-cdw++] ? That should help of we need to port that over to a different CS mechanism. How about this? static INLINE void r600_write_value(struct radeon_winsys_cs *cs, unsigned value) { cs-buf[cs-cdw++] = value; } I would name it more like radeon_emit(cs, value) and put it in radeon_winsys.h, but yeah that's the general idea. Christian. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] nouveau/video: use correct parameter name
On 15.08.2013 02:10, Emil Velikov wrote: Fix a typo introduced with commit d1ba1055d9 - vl: Add support for max level query v2 Cc: Rico Schüller kgbric...@web.de Cc: Christian König christian.koe...@amd.com Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68126 Signed-off-by: Emil Velikov emil.l.veli...@gmail.com --- FWIW the original patch could have introduced default params for all profiles, and let the individual driver provide their own function to ease duplication. A call to get_params(VIDEO_CAP_SUPPORTED) ensures that we do not request the CAP_MAX_LEVEL of a unsupported profile And what are the correct default values? Are these the values which most hardware can do (at this time) or the maximum values? (for reference only for H.264 see: http://en.wikipedia.org/wiki/H.264#Levels ) It's pretty much likely that some video decoding units may have different supported levels and then we would add that back in again. I'm not sure this is really better. Does something like the attached meet your needs? Thanks for the fix. Cheers Rico src/gallium/drivers/nouveau/nouveau_video.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/gallium/drivers/nouveau/nouveau_video.c b/src/gallium/drivers/nouveau/nouveau_video.c index 1563b22..5c4ec0f 100644 --- a/src/gallium/drivers/nouveau/nouveau_video.c +++ b/src/gallium/drivers/nouveau/nouveau_video.c @@ -863,7 +863,7 @@ nouveau_screen_get_video_param(struct pipe_screen *pscreen, case PIPE_VIDEO_CAP_SUPPORTS_PROGRESSIVE: return true; case PIPE_VIDEO_CAP_MAX_LEVEL: - return vl_level_supported(screen, profile); + return vl_level_supported(pscreen, profile); default: debug_printf(unknown video param: %d\n, param); return 0; commit d53abeff5304bb6148ddc1aae1d4810282042ee5 Author: Rico Schüller kgbric...@web.de Date: Wed Aug 14 13:00:33 2013 +0200 vl/decoder: Use defaults for the supported level. diff --git a/src/gallium/auxiliary/vl/vl_decoder.c b/src/gallium/auxiliary/vl/vl_decoder.c index 16f09b5..68c6e4c 100644 --- a/src/gallium/auxiliary/vl/vl_decoder.c +++ b/src/gallium/auxiliary/vl/vl_decoder.c @@ -54,6 +54,20 @@ vl_level_supported(struct pipe_screen *screen, enum pipe_video_profile profile) case PIPE_VIDEO_PROFILE_MPEG2_SIMPLE: case PIPE_VIDEO_PROFILE_MPEG2_MAIN: return 3; + case PIPE_VIDEO_PROFILE_MPEG4_SIMPLE: + return 3; + case PIPE_VIDEO_PROFILE_MPEG4_ADVANCED_SIMPLE: + return 5; + case PIPE_VIDEO_PROFILE_VC1_SIMPLE: + return 1; + case PIPE_VIDEO_PROFILE_VC1_MAIN: + return 2; + case PIPE_VIDEO_PROFILE_VC1_ADVANCED: + return 4; + case PIPE_VIDEO_PROFILE_MPEG4_AVC_BASELINE: + case PIPE_VIDEO_PROFILE_MPEG4_AVC_MAIN: + case PIPE_VIDEO_PROFILE_MPEG4_AVC_HIGH: + return 41; default: return 0; } diff --git a/src/gallium/auxiliary/vl/vl_decoder.h b/src/gallium/auxiliary/vl/vl_decoder.h index b0b4161..cb4811d 100644 --- a/src/gallium/auxiliary/vl/vl_decoder.h +++ b/src/gallium/auxiliary/vl/vl_decoder.h @@ -38,7 +38,7 @@ bool vl_profile_supported(struct pipe_screen *screen, enum pipe_video_profile profile); /** - * get the maximum supported level for the given profile with shader based decoding + * get the default maximum supported level for the given profile */ int vl_level_supported(struct pipe_screen *screen, enum pipe_video_profile profile); diff --git a/src/gallium/drivers/nv50/nv84_video.c b/src/gallium/drivers/nv50/nv84_video.c index 3602a6d..0d1af8e 100644 --- a/src/gallium/drivers/nv50/nv84_video.c +++ b/src/gallium/drivers/nv50/nv84_video.c @@ -779,20 +779,7 @@ nv84_screen_get_video_param(struct pipe_screen *pscreen, case PIPE_VIDEO_CAP_SUPPORTS_PROGRESSIVE: return false; case PIPE_VIDEO_CAP_MAX_LEVEL: - switch (profile) { - case PIPE_VIDEO_PROFILE_MPEG1: - return 0; - case PIPE_VIDEO_PROFILE_MPEG2_SIMPLE: - case PIPE_VIDEO_PROFILE_MPEG2_MAIN: - return 3; - case PIPE_VIDEO_PROFILE_MPEG4_AVC_BASELINE: - case PIPE_VIDEO_PROFILE_MPEG4_AVC_MAIN: - case PIPE_VIDEO_PROFILE_MPEG4_AVC_HIGH: - return 41; - default: - debug_printf(unknown video profile: %d\n, profile); - return 0; - } + return vl_level_supported(pscreen, profile); default: debug_printf(unknown video param: %d\n, param); return 0; diff --git a/src/gallium/drivers/nvc0/nvc0_video.c b/src/gallium/drivers/nvc0/nvc0_video.c index a871ab7..b987344 100644 --- a/src/gallium/drivers/nvc0/nvc0_video.c +++ b/src/gallium/drivers/nvc0/nvc0_video.c @@ -49,30 +49,7 @@ nvc0_screen_get_video_param(struct pipe_screen *pscreen, case PIPE_VIDEO_CAP_SUPPORTS_PROGRESSIVE: return false; case PIPE_VIDEO_CAP_MAX_LEVEL: - switch (profile) { - case PIPE_VIDEO_PROFILE_MPEG1: - return 0; - case PIPE_VIDEO_PROFILE_MPEG2_SIMPLE: -
Re: [Mesa-dev] [PATCH] gallivm: already pass coords in the right place in the sampler interface
Am 15.08.2013 10:16, schrieb Michel Dänzer: On Mit, 2013-08-14 at 18:35 +0200, srol...@vmware.com wrote: From: Roland Scheidegger srol...@vmware.com This makes things a bit nicer, and more importantly it fixes an issue where a downgraded array texture (due to view reduced to 1 layer and addressed with (non-array) samplec instruction) would use the wrong coord as shadow reference value. (This could also be fixed by passing target through the sampler interface much the same way as is done for size queries, might do this eventually anyway.) And if we'd ever want to support (shadow) cube map arrays, we'd need 5 coords in any case. v2: fix bugs (texel fetch using wrong layer coord for 1d, shadow tex using wrong shadow coord for 2d...). Plus need to project the shadow coord, and just for fun keep projecting the layer coord too. [...] @@ -1631,55 +1632,58 @@ emit_tex( struct lp_build_tgsi_soa_context *bld, } switch (inst-Texture.Texture) { - case TGSI_TEXTURE_1D: - num_coords = 1; - num_offsets = 1; - num_derivs = 1; - break; case TGSI_TEXTURE_1D_ARRAY: - num_coords = 2; + layer_coord = 1; + /* fallthrough */ + case TGSI_TEXTURE_1D: num_offsets = 1; num_derivs = 1; break; + case TGSI_TEXTURE_2D_ARRAY: + layer_coord = 2; + /* fallthrough */ case TGSI_TEXTURE_2D: case TGSI_TEXTURE_RECT: - num_coords = 2; num_offsets = 2; num_derivs = 2; break; - case TGSI_TEXTURE_SHADOW1D: case TGSI_TEXTURE_SHADOW1D_ARRAY: - num_coords = 3; + layer_coord = 1; + /* fallthrough */ + case TGSI_TEXTURE_SHADOW1D: + shadow_coord = 2; num_offsets = 1; num_derivs = 1; break; + case TGSI_TEXTURE_SHADOW2D_ARRAY: + layer_coord = 2; + shadow_coord = 3; + num_offsets = 2; + num_derivs = 2; + break; case TGSI_TEXTURE_SHADOW2D: case TGSI_TEXTURE_SHADOWRECT: - case TGSI_TEXTURE_2D_ARRAY: - num_coords = 3; + shadow_coord = 2; num_offsets = 2; num_derivs = 2; break; case TGSI_TEXTURE_CUBE: - num_coords = 3; num_offsets = 2; num_derivs = 3; break; case TGSI_TEXTURE_3D: - num_coords = 3; num_offsets = 3; num_derivs = 3; break; - case TGSI_TEXTURE_SHADOW2D_ARRAY: - num_coords = 4; - num_offsets = 2; - num_derivs = 2; - break; case TGSI_TEXTURE_SHADOWCUBE: - num_coords = 4; + shadow_coord = 3; num_offsets = 2; num_derivs = 3; break; + case TGSI_TEXTURE_CUBE_ARRAY: + case TGSI_TEXTURE_SHADOWCUBE_ARRAY: + case TGSI_TEXTURE_2D_MSAA: + case TGSI_TEXTURE_2D_ARRAY_MSAA: default: assert(0); return; Couldn't some of this code be simplified using tgsi_util_get_texture_coord_dim()? Hmm yes this code is a bunch of crap but whenever I try to simplify it it just gets more complicated. Maybe it was a mistake to adjust layer coord position here, could do that in the sampling code itself, which would mean tgsi_util_get_texture_coord_dim() would fit quite well. The different handling of derivs/offsets though sort of looks required to not fetch these components for explicit derivatives, texel offsets as for instance cube maps have 3 coords but only 2 offsets etc. Though I wonder if it's possible that these can end up as invalid pointers, I guess not so could probably stop caring about them and just fetch more components than potentially necessary (after all llvm will immediately drop them anyway as they won't get used). That definitely would clean things up. Roland ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] nouveau/video: use correct parameter name
On 15/08/13 13:41, Rico Schüller wrote: On 15.08.2013 02:10, Emil Velikov wrote: Fix a typo introduced with commit d1ba1055d9 - vl: Add support for max level query v2 Cc: Rico Schüller kgbric...@web.de Cc: Christian König christian.koe...@amd.com Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68126 Signed-off-by: Emil Velikov emil.l.veli...@gmail.com --- FWIW the original patch could have introduced default params for all profiles, and let the individual driver provide their own function to ease duplication. A call to get_params(VIDEO_CAP_SUPPORTED) ensures that we do not request the CAP_MAX_LEVEL of a unsupported profile And what are the correct default values? Are these the values which most hardware can do (at this time) or the maximum values? (for reference only for H.264 see: http://en.wikipedia.org/wiki/H.264#Levels ) It's pretty much likely that some video decoding units may have different supported levels and then we would add that back in again. I'm not sure this is really better. Does something like the attached meet your needs? I was thinking that the ones used by most hardware can be considered default. Anyway all this is a silly bikeshedding, which I could have omitted. Patch looks good, thanks. Emil Thanks for the fix. Cheers Rico src/gallium/drivers/nouveau/nouveau_video.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/gallium/drivers/nouveau/nouveau_video.c b/src/gallium/drivers/nouveau/nouveau_video.c index 1563b22..5c4ec0f 100644 --- a/src/gallium/drivers/nouveau/nouveau_video.c +++ b/src/gallium/drivers/nouveau/nouveau_video.c @@ -863,7 +863,7 @@ nouveau_screen_get_video_param(struct pipe_screen *pscreen, case PIPE_VIDEO_CAP_SUPPORTS_PROGRESSIVE: return true; case PIPE_VIDEO_CAP_MAX_LEVEL: - return vl_level_supported(screen, profile); + return vl_level_supported(pscreen, profile); default: debug_printf(unknown video param: %d\n, param); return 0; ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] radeonsi: add flexible shader descriptor management and use it for sampler views
On Wed, Aug 14, 2013 at 11:25 PM, Marek Olšák mar...@gmail.com wrote: (This should be applied before MSAA, which will need to be rebased.) It moves all sampler view descriptors to a buffer. It supports partial resource updates and it can also unbind resources (required for FMASK texturing). The buffer contains all sampler view descriptors for one shader stage, represented as an array. On top of that, there are N arrays in the buffer, which are used to emulate context registers as implemented by the previous ASICs (each array is a context). This uses the RCU synchronization approach to avoid read-after-write hazards as discussed in the thread: radeonsi: add FMASK texture binding slots and resource setup CP DMA is used to clear the descriptors at context initialization and to copy the descriptors from one context to the next. IMPORTANT: 128 resource contexts are needed, 64 doesn't work. If I set SH_KCACHE_ACTION_ENA before every draw call, only 2 contexts are needed. I don't have an explanation for this. --- src/gallium/drivers/radeonsi/Makefile.sources | 1 + src/gallium/drivers/radeonsi/r600_blit.c | 12 +- src/gallium/drivers/radeonsi/r600_hw_context.c | 14 ++ src/gallium/drivers/radeonsi/radeonsi_pipe.c | 7 +- src/gallium/drivers/radeonsi/radeonsi_pipe.h | 19 +- src/gallium/drivers/radeonsi/si_descriptors.c | 335 + src/gallium/drivers/radeonsi/si_state.c| 47 ++-- src/gallium/drivers/radeonsi/si_state.h| 56 + src/gallium/drivers/radeonsi/si_state_draw.c | 18 +- src/gallium/drivers/radeonsi/sid.h | 43 10 files changed, 500 insertions(+), 52 deletions(-) create mode 100644 src/gallium/drivers/radeonsi/si_descriptors.c diff --git a/src/gallium/drivers/radeonsi/Makefile.sources b/src/gallium/drivers/radeonsi/Makefile.sources index b3ffa72..68c8282 100644 --- a/src/gallium/drivers/radeonsi/Makefile.sources +++ b/src/gallium/drivers/radeonsi/Makefile.sources @@ -10,6 +10,7 @@ C_SOURCES := \ r600_translate.c \ radeonsi_pm4.c \ radeonsi_compute.c \ + si_descriptors.c \ si_state.c \ si_state_streamout.c \ si_state_draw.c \ diff --git a/src/gallium/drivers/radeonsi/r600_blit.c b/src/gallium/drivers/radeonsi/r600_blit.c index bab108e..5bd1a62 100644 --- a/src/gallium/drivers/radeonsi/r600_blit.c +++ b/src/gallium/drivers/radeonsi/r600_blit.c @@ -70,12 +70,12 @@ static void r600_blitter_begin(struct pipe_context *ctx, enum r600_blitter_op op if (op R600_SAVE_TEXTURES) { util_blitter_save_fragment_sampler_states( - rctx-blitter, rctx-ps_samplers.n_samplers, - (void**)rctx-ps_samplers.samplers); + rctx-blitter, rctx-samplers[PIPE_SHADER_FRAGMENT].n_samplers, + (void**)rctx-samplers[PIPE_SHADER_FRAGMENT].samplers); - util_blitter_save_fragment_sampler_views( - rctx-blitter, rctx-ps_samplers.n_views, - (struct pipe_sampler_view**)rctx-ps_samplers.views); + util_blitter_save_fragment_sampler_views(rctx-blitter, + util_bitcount(rctx-samplers[PIPE_SHADER_FRAGMENT].views.desc.enabled_mask), + rctx-samplers[PIPE_SHADER_FRAGMENT].views.views); } if ((op R600_DISABLE_RENDER_COND) rctx-current_render_cond) { @@ -224,7 +224,7 @@ void si_flush_depth_textures(struct r600_context *rctx, struct pipe_sampler_view *view; struct r600_texture *tex; - view = textures-views[i]-base; + view = textures-views.views[i]; if (!view) continue; tex = (struct r600_texture *)view-texture; diff --git a/src/gallium/drivers/radeonsi/r600_hw_context.c b/src/gallium/drivers/radeonsi/r600_hw_context.c index 25c972b..cf43089 100644 --- a/src/gallium/drivers/radeonsi/r600_hw_context.c +++ b/src/gallium/drivers/radeonsi/r600_hw_context.c @@ -114,9 +114,17 @@ err: void si_need_cs_space(struct r600_context *ctx, unsigned num_dw, boolean count_draw_in) { + int i; + /* The number of dwords we already used in the CS so far. */ num_dw += ctx-cs-cdw; + for (i = 0; i SI_NUM_ATOMS(ctx); i++) { + if (ctx-atoms.array[i]-dirty) { + num_dw += ctx-atoms.array[i]-num_dw; + } + } + if (count_draw_in) { /* The number of dwords all the dirty states would take. */ num_dw += ctx-pm4_dirty_cdwords; @@ -254,6 +262,10 @@ void si_context_flush(struct r600_context *ctx, unsigned flags) ctx-pm4_dirty_cdwords = 0; ctx-flags = 0; + /* The CS initialization should be emitted before everything else. */ +
[Mesa-dev] Bison 3 fixes
Can we have Bison 3 fixes in 9.2 branch so they make it into 9.2 release? eb7c8c7fb6e49a04f3fe84a6d438160dc4a14ac0 f043381334a0760ec118d07b6fb7785b5692572a de917b4c4c4dfc949d5f8e3d9eb2dd48b63a3de5 6d2a9220b832d9a0c0cf35fcc5b9de1542af267d 5ffa28df4e4cc22481b4ed41c78632f35765f41d ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] Bison 3 fixes
On 08/15/2013 03:55 PM, Armin K. wrote: Can we have Bison 3 fixes in 9.2 branch so they make it into 9.2 release? eb7c8c7fb6e49a04f3fe84a6d438160dc4a14ac0 f043381334a0760ec118d07b6fb7785b5692572a de917b4c4c4dfc949d5f8e3d9eb2dd48b63a3de5 6d2a9220b832d9a0c0cf35fcc5b9de1542af267d 5ffa28df4e4cc22481b4ed41c78632f35765f41d ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev It appears that only last one is missing. Others have been applied to 9.2 branch. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] osmesa: Copy shared library to LIB_DIR to fix two symlinks
--- src/mesa/drivers/osmesa/Makefile.am | 1 + 1 file changed, 1 insertion(+) diff --git a/src/mesa/drivers/osmesa/Makefile.am b/src/mesa/drivers/osmesa/Makefile.am index a1c0bb5..d625faa 100644 --- a/src/mesa/drivers/osmesa/Makefile.am +++ b/src/mesa/drivers/osmesa/Makefile.am @@ -56,6 +56,7 @@ all-local: lib@OSMESA_LIB@.la $(MKDIR_P) $(top_builddir)/$(LIB_DIR); ln -f .libs/lib@OSMESA_LIB@.so $(top_builddir)/$(LIB_DIR)/lib@OSMESA_LIB@.so; ln -f .libs/lib@OSMESA_LIB@.so.@OSMESA_VERSION@ $(top_builddir)/$(LIB_DIR)/lib@OSMESA_LIB@.so.@OSMESA_VERSION@; + cp .libs/lib@OSMESA_LIB@.so.@OSMESA_VERSION@.0.0 $(top_builddir)/$(LIB_DIR)/ endif pkgconfigdir = $(libdir)/pkgconfig -- 1.8.3.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] llvmpipe: fix stencil bug if we have both stencil and depth tests
From: Roland Scheidegger srol...@vmware.com This is a very well hidden bug found by accident (only the fixed glean tstencil2 test so far seems to hit it). We must use new mask with combined s_pass values and orig_mask values for zpass/zfail stencil ops, otherwise both the sfail op and one of zpass/zfail op are applied (probably not hit in most tests because some of the ops tend to be KEEP usually). Note: this is a candidate for the 9.2 branch. --- src/gallium/drivers/llvmpipe/lp_bld_depth.c | 27 +-- 1 file changed, 13 insertions(+), 14 deletions(-) diff --git a/src/gallium/drivers/llvmpipe/lp_bld_depth.c b/src/gallium/drivers/llvmpipe/lp_bld_depth.c index 06556dc..5c13ee5 100644 --- a/src/gallium/drivers/llvmpipe/lp_bld_depth.c +++ b/src/gallium/drivers/llvmpipe/lp_bld_depth.c @@ -836,7 +836,7 @@ lp_build_depth_stencil_test(struct gallivm_state *gallivm, LLVMValueRef stencil_vals = NULL; LLVMValueRef z_bitmask = NULL, stencil_shift = NULL; LLVMValueRef z_pass = NULL, s_pass_mask = NULL; - LLVMValueRef orig_mask = lp_build_mask_value(mask); + LLVMValueRef current_mask = lp_build_mask_value(mask); LLVMValueRef front_facing = NULL; boolean have_z, have_s; @@ -984,7 +984,7 @@ lp_build_depth_stencil_test(struct gallivm_state *gallivm, /* apply stencil-fail operator */ { - LLVMValueRef s_fail_mask = lp_build_andnot(s_bld, orig_mask, s_pass_mask); + LLVMValueRef s_fail_mask = lp_build_andnot(s_bld, current_mask, s_pass_mask); stencil_vals = lp_build_stencil_op(s_bld, stencil, S_FAIL_OP, stencil_refs, stencil_vals, s_fail_mask, front_facing); @@ -1032,6 +1032,11 @@ lp_build_depth_stencil_test(struct gallivm_state *gallivm, /* compare src Z to dst Z, returning 'pass' mask */ z_pass = lp_build_cmp(z_bld, depth-func, z_src, z_dst); + /* mask off bits that failed stencil test */ + if (s_pass_mask) { + current_mask = LLVMBuildAnd(builder, current_mask, s_pass_mask, ); + } + if (!stencil[0].enabled) { /* We can potentially skip all remaining operations here, but only * if stencil is disabled because we still need to update the stencil @@ -1041,25 +1046,19 @@ lp_build_depth_stencil_test(struct gallivm_state *gallivm, if (do_branch) { lp_build_mask_check(mask); -do_branch = FALSE; } } if (depth-writemask) { - LLVMValueRef zselectmask; + LLVMValueRef z_pass_mask; /* mask off bits that failed Z test */ - zselectmask = LLVMBuildAnd(builder, orig_mask, z_pass, ); - - /* mask off bits that failed stencil test */ - if (s_pass_mask) { -zselectmask = LLVMBuildAnd(builder, zselectmask, s_pass_mask, ); - } + z_pass_mask = LLVMBuildAnd(builder, current_mask, z_pass, ); /* Mix the old and new Z buffer values. * z_dst[i] = zselectmask[i] ? z_src[i] : z_dst[i] */ - z_dst = lp_build_select(z_bld, zselectmask, z_src, z_dst); + z_dst = lp_build_select(z_bld, z_pass_mask, z_src, z_dst); } if (stencil[0].enabled) { @@ -1067,13 +1066,13 @@ lp_build_depth_stencil_test(struct gallivm_state *gallivm, LLVMValueRef z_fail_mask, z_pass_mask; /* apply Z-fail operator */ - z_fail_mask = lp_build_andnot(s_bld, orig_mask, z_pass); + z_fail_mask = lp_build_andnot(s_bld, current_mask, z_pass); stencil_vals = lp_build_stencil_op(s_bld, stencil, Z_FAIL_OP, stencil_refs, stencil_vals, z_fail_mask, front_facing); /* apply Z-pass operator */ - z_pass_mask = LLVMBuildAnd(builder, orig_mask, z_pass, ); + z_pass_mask = LLVMBuildAnd(builder, current_mask, z_pass, ); stencil_vals = lp_build_stencil_op(s_bld, stencil, Z_PASS_OP, stencil_refs, stencil_vals, z_pass_mask, front_facing); @@ -1083,7 +1082,7 @@ lp_build_depth_stencil_test(struct gallivm_state *gallivm, /* No depth test: apply Z-pass operator to stencil buffer values which * passed the stencil test. */ - s_pass_mask = LLVMBuildAnd(builder, orig_mask, s_pass_mask, ); + s_pass_mask = LLVMBuildAnd(builder, current_mask, s_pass_mask, ); stencil_vals = lp_build_stencil_op(s_bld, stencil, Z_PASS_OP, stencil_refs, stencil_vals, s_pass_mask, front_facing); -- 1.7.9.5 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] radeonsi: add flexible shader descriptor management and use it for sampler views
On Don, 2013-08-15 at 12:36 +0200, Marek Olšák wrote: On Thu, Aug 15, 2013 at 9:33 AM, Michel Dänzer mic...@daenzer.net wrote: On Don, 2013-08-15 at 05:25 +0200, Marek Olšák wrote: (This should be applied before MSAA, which will need to be rebased.) It moves all sampler view descriptors to a buffer. It supports partial resource updates and it can also unbind resources (required for FMASK texturing). The buffer contains all sampler view descriptors for one shader stage, represented as an array. On top of that, there are N arrays in the buffer, which are used to emulate context registers as implemented by the previous ASICs (each array is a context). This uses the RCU synchronization approach to avoid read-after-write hazards as discussed in the thread: radeonsi: add FMASK texture binding slots and resource setup CP DMA is used to clear the descriptors at context initialization and to copy the descriptors from one context to the next. IMPORTANT: 128 resource contexts are needed, 64 doesn't work. Hmm, I suppose this might depend on the specific GPU? Very likely. I'm using SI VERDE. Then it seems likely more will be needed for Pitcairn or Tahiti due to multiple shader engines? -- Earthling Michel Dänzer | http://www.amd.com Libre software enthusiast | Debian, X and DRI developer ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] Removing egl_glx?
On 08/14/2013 10:04 PM, Chia-I Wu wrote: On Thu, Aug 15, 2013 at 10:03 AM, Kenneth Graunke kenn...@whitecape.org wrote: On 08/08/2013 03:13 PM, Chad Versace wrote: [snip] By the way, I talked to krh today, and he suggested that we delete egl_glx rather than allow it to bitrot. I'm in favor, but I don't know who uses that. GLX on EGL would be an interesting experiment. EGL on GLX is not useful, IMHO. That sounds good to me. If Dr Jakob also agrees, then let's plan to kill egl_glx for 9.3/10.0. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/3] i965/gen7: Set MOCS L3 cacheability for IVB/BYT
On 08/14/2013 12:50 AM, Ville Syrjälä wrote: On Wed, Aug 14, 2013 at 10:45:23AM +0300, Ville Syrjälä wrote: On Tue, Aug 13, 2013 at 05:46:55PM -0700, Chad Versace wrote: On 08/13/2013 03:31 PM, Vedran Rodic wrote: On Mon, Aug 12, 2013 at 3:07 PM, ville.syrj...@linux.intel.com wrote: From: Ville Syrjälä ville.syrj...@linux.intel.com For L3 cacheability, IVB won't consult the PTE for anything that has a relevant MOCS field. So even if you make everything L3 cacheable through the PTEs, MOCS will always override it. How do i know you ask? Well, BSpec says so for one, and more importantly I verified this by running some tests [...] I suspected this. Thanks for sharing your experimental evidence. For LLC cachebility the story will be different because there IVB MOCS can only say LLC cacheable or consult the PTE. So to make stuff uncached in LLC on IVB, we'd need to issues the set_caching ioctl to change the PTE to uncached, and after that we could use just the MOCS to select the LLC caching policy. Since the set_caching ioctl only needs to be issued once per object (or you could use the , there Sorry hit send by accident. Was going to say we could use the new create2 ioctl Chris has proposed that allows you to set the cache mode when creating the object. So there won't be a performance hit from extra ioctls getting issued all the time. I would like such a cache-control ioctl, as long the ioctl can also be used to change the object's cacheing policy in addition to setting it at object creation. This would be needed when an object's usage oscillates between texture surface and render target. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 67962] undefined reference to `wayland_drm_buffer_get'
https://bugs.freedesktop.org/show_bug.cgi?id=67962 U. Artie Eoff ullysses.a.e...@intel.com changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED --- Comment #6 from U. Artie Eoff ullysses.a.e...@intel.com --- Fixed in commit http://cgit.freedesktop.org/mesa/mesa/commit/?id=f423eba46e080b975a2b8366b490d99dee4729ad -- You are receiving this mail because: You are the assignee for the bug. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 67962] undefined reference to `wayland_drm_buffer_get'
https://bugs.freedesktop.org/show_bug.cgi?id=67962 U. Artie Eoff ullysses.a.e...@intel.com changed: What|Removed |Added Status|RESOLVED|VERIFIED -- You are receiving this mail because: You are the assignee for the bug. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] radeonsi: LLVM r187139 broke some piglit tests
On Thu, Aug 15, 2013 at 11:55:36AM +0200, Michel Dänzer wrote: On Fre, 2013-08-02 at 17:58 +0200, Michel Dänzer wrote: On Mit, 2013-07-31 at 08:42 -0700, Tom Stellard wrote: On Wed, Jul 31, 2013 at 01:04:01PM +0200, Michel Dänzer wrote: LLVM revision 187139 ('Allocate local registers in order for optimal coloring.') broke some derivative related piglit tests with the radeonsi driver. I'm attaching a diff between the bad and good generated code (as printed with RADEON_DUMP_SHADERS=1) for the glsl-derivs test. The only difference I can see is in which registers are used in which order. I wonder if we might be missing S_WAITCNT after DS_READ/WRITE instructions in some cases, but I haven't spotted any candidates for that in the bad code which aren't there in the good code as well. Can anyone else spot something I've missed? Shouldn't we be using the S_BARRIER instruction to keep the threads in sync? Doesn't seem to help unfortunately, but thanks for the good suggestion. I found one thing going wrong: DS_WRITE_B32 ends up using a VGPR register number instead of the $gds operand for encoding the GDS field (the asm output from llc even shows the VGPR name). If the VGPR number happens to be odd (i.e. to have the least significant bit set), the shader ends up writing to GDS instead of LDS. Ouch, that's a pretty bad bug. But I have no idea why this is happening, or how to fix it. :( I can take a look at it. -Tom -- Earthling Michel Dänzer | http://www.amd.com Libre software enthusiast | Debian, X and DRI developer ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] llvmpipe: fix stencil bug if we have both stencil and depth tests
- Original Message - From: Roland Scheidegger srol...@vmware.com This is a very well hidden bug found by accident (only the fixed glean tstencil2 test so far seems to hit it). We must use new mask with combined s_pass values and orig_mask values for zpass/zfail stencil ops, otherwise both the sfail op and one of zpass/zfail op are applied (probably not hit in most tests because some of the ops tend to be KEEP usually). Note: this is a candidate for the 9.2 branch. Looks good ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] mesa: Enable LTO by default on i965/libdricore release builds.
Chad Versace chad.vers...@linux.intel.com writes: On 08/08/2013 01:43 PM, Eric Anholt wrote: We can't just smash it on globally due to (probably resolvable) issues with the asm in glapi. And we don't want to penalize developers with longer build times for their normal debug environment. Due to libdricore making almost all of our symbols public, the effect is very small -- cairo-gl with INTEL_NO_HW=1 shows -0.798709% +/- 0.333703% change in runtime (n=30). --- If we were to avoid dricore, there's an additional 5% improvement available (see the megadriver branch of my tree). configure.ac | 25 + src/mesa/Makefile.am | 4 ++-- src/mesa/drivers/dri/i965/Makefile.am | 1 + src/mesa/libdricore/Makefile.am | 8 +++- src/mesa/program/Makefile.am | 4 ++-- 5 files changed, 37 insertions(+), 5 deletions(-) diff --git a/configure.ac b/configure.ac index 62d06e0..26c230d 100644 --- a/configure.ac +++ b/configure.ac @@ -314,6 +314,7 @@ AC_ARG_ENABLE([debug], [enable_debug=$enableval], [enable_debug=no] ) +enable_lto=yes if test x$enable_debug = xyes; then DEFINES_FOR_BUILD=$DEFINES_FOR_BUILD -DDEBUG if test x$GCC_FOR_BUILD = xyes; then @@ -330,7 +331,31 @@ if test x$enable_debug = xyes; then if test x$GXX = xyes; then CXXFLAGS=$CXXFLAGS -g -O0 fi + +# Disable LTO by default on debug builds, since it's so expensive at +# compile time. +enable_lto=no +fi I'd like to emit a configuration error if someone tries to enable LTO in a debug build. According the gcc-4.8 manpage, that's a really bad idea. Link-time optimization does not work well with generation of debugging information. Combining -flto with -g is currently experimental and expected to produce wrong results. Other than that, the patch looks good to me. Yeah, debug symbols appear to be quite scrambled. As is, I don't think we can ship with it -- breaking backtraces (and thus profiling) is not acceptable. pgpkub4MzCftN.pgp Description: PGP signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] configure: link against -lLLVM to determine build type
On Wed, Aug 14, 2013 at 09:46:44 -0700, Tom Stellard wrote: On Wed, Aug 14, 2013 at 09:08:55AM +0200, Maarten Lankhorst wrote: Op 14-08-13 04:37, Tom Stellard schreef: On Tue, Aug 13, 2013 at 05:53:52PM +0200, Maarten Lankhorst wrote: Fixes a build failure of 9.2 on ubuntu, because libLLVM-3.3.so is not present in /usr/lib/llvm-3.2/lib. I'm trying to understand the problem here, could you give a little more information about how Ubuntu packages LLVM? Where are the LLVM libraries installed and what does llvm-config --libdir --ldflags report? libLLVM-3.3.so gets installed in /usr/lib/x86_64-linux-gnu/libLLVM-3.3.so, there is no /usr/lib/llvm-3.2/lib/libLLVM-3.3.so All the other libs still get installed to /usr/lib/llvm-3.3/lib. ~$ llvm-config-3.3 --libdir --ldflags /usr/lib/llvm-3.3/lib -L/usr/lib/llvm-3.3/lib -lpthread -lffi -ldl -lm So, if /usr/lib/x86_64-linux-gnu/ is not in the ldflags, then how will the linker find it? It's in the default search path. Cheers, Julien ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] osmesa: Copy shared library to LIB_DIR to fix two symlinks
On Thu, Aug 15, 2013 at 7:19 AM, Armin K kre...@email.com wrote: --- src/mesa/drivers/osmesa/Makefile.am | 1 + 1 file changed, 1 insertion(+) diff --git a/src/mesa/drivers/osmesa/Makefile.am b/src/mesa/drivers/osmesa/Makefile.am index a1c0bb5..d625faa 100644 --- a/src/mesa/drivers/osmesa/Makefile.am +++ b/src/mesa/drivers/osmesa/Makefile.am @@ -56,6 +56,7 @@ all-local: lib@OSMESA_LIB@.la $(MKDIR_P) $(top_builddir)/$(LIB_DIR); ln -f .libs/lib@OSMESA_LIB@.so $(top_builddir)/$(LIB_DIR)/lib@OSMESA_LIB@.so; ln -f .libs/lib@OSMESA_LIB@.so.@OSMESA_VERSION@ $(top_builddir)/$(LIB_DIR)/lib@OSMESA_LIB@.so.@OSMESA_VERSION@; + cp .libs/lib@OSMESA_LIB@.so.@OSMESA_VERSION@.0.0 $(top_builddir)/$(LIB_DIR)/ endif pkgconfigdir = $(libdir)/pkgconfig -- 1.8.3.4 Just ln -f it like the others. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/3] i965/gen7: Set MOCS L3 cacheability for IVB/BYT
On Thu, Aug 15, 2013 at 08:08:12AM -0700, Chad Versace wrote: On 08/14/2013 12:50 AM, Ville Syrjälä wrote: On Wed, Aug 14, 2013 at 10:45:23AM +0300, Ville Syrjälä wrote: On Tue, Aug 13, 2013 at 05:46:55PM -0700, Chad Versace wrote: On 08/13/2013 03:31 PM, Vedran Rodic wrote: On Mon, Aug 12, 2013 at 3:07 PM, ville.syrj...@linux.intel.com wrote: From: Ville Syrjälä ville.syrj...@linux.intel.com For L3 cacheability, IVB won't consult the PTE for anything that has a relevant MOCS field. So even if you make everything L3 cacheable through the PTEs, MOCS will always override it. How do i know you ask? Well, BSpec says so for one, and more importantly I verified this by running some tests [...] I suspected this. Thanks for sharing your experimental evidence. For LLC cachebility the story will be different because there IVB MOCS can only say LLC cacheable or consult the PTE. So to make stuff uncached in LLC on IVB, we'd need to issues the set_caching ioctl to change the PTE to uncached, and after that we could use just the MOCS to select the LLC caching policy. Since the set_caching ioctl only needs to be issued once per object (or you could use the , there Sorry hit send by accident. Was going to say we could use the new create2 ioctl Chris has proposed that allows you to set the cache mode when creating the object. So there won't be a performance hit from extra ioctls getting issued all the time. I would like such a cache-control ioctl, as long the ioctl can also be used to change the object's cacheing policy in addition to setting it at object creation. This would be needed when an object's usage oscillates between texture surface and render target. We do have the set_caching ioctl. It's enough to flip the PTEs to UC and let MOCS manage things. I actually did a few experiments on my IVB. I made all Mesa's buffers UC via PTEs by patching libdrm to change the cache mode of each bo after allocation. Then I fiddled with the MOCS LLC bits in various ways. It definitely has an effect, sometimes making things slower, sometimes faster. xonotic again seemed to benefit. IIRC leaving everything LLC uncached was actually the fastest (w/ high quality at least) so we may be thrashing the LLC a bit there. But eg. reaction quake regressed quite a lot if most things were left as UC. I should probably run through a few MOCS combinations and collect a bit more data. But it's looking like some sensible heuristic has to be involved since different benchmarks show conflicting results. Maybe your LLC overcommit prevention approach would be the one. Are you planning to continue with that work? -- Ville Syrjälä Intel OTC ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] radeonsi: LLVM r187139 broke some piglit tests
On Thu, Aug 15, 2013 at 08:22:39AM -0700, Tom Stellard wrote: On Thu, Aug 15, 2013 at 11:55:36AM +0200, Michel Dänzer wrote: On Fre, 2013-08-02 at 17:58 +0200, Michel Dänzer wrote: On Mit, 2013-07-31 at 08:42 -0700, Tom Stellard wrote: On Wed, Jul 31, 2013 at 01:04:01PM +0200, Michel Dänzer wrote: LLVM revision 187139 ('Allocate local registers in order for optimal coloring.') broke some derivative related piglit tests with the radeonsi driver. I'm attaching a diff between the bad and good generated code (as printed with RADEON_DUMP_SHADERS=1) for the glsl-derivs test. The only difference I can see is in which registers are used in which order. I wonder if we might be missing S_WAITCNT after DS_READ/WRITE instructions in some cases, but I haven't spotted any candidates for that in the bad code which aren't there in the good code as well. Can anyone else spot something I've missed? Shouldn't we be using the S_BARRIER instruction to keep the threads in sync? Doesn't seem to help unfortunately, but thanks for the good suggestion. I found one thing going wrong: DS_WRITE_B32 ends up using a VGPR register number instead of the $gds operand for encoding the GDS field (the asm output from llc even shows the VGPR name). If the VGPR number happens to be odd (i.e. to have the least significant bit set), the shader ends up writing to GDS instead of LDS. Ouch, that's a pretty bad bug. But I have no idea why this is happening, or how to fix it. :( I can take a look at it. The attached patch should fix the problem, can you test? -Tom From c837737e2807b523fffac2ddf9aa9d8f293cf622 Mon Sep 17 00:00:00 2001 From: Tom Stellard thomas.stell...@amd.com Date: Thu, 15 Aug 2013 09:03:08 -0700 Subject: [PATCH] R600/SI: Fix incorrect encoding of DS_WRITE_B32 instructions The SIInsertWaits pass was overwriting the first operand (gds bit) of DS_WRITE_B32 with the second operand (value to write). This meant that any time the value to write was stored in an odd number VGPR, the gds bit would be set causing the instruction to write to GDS instead of LDS. --- lib/Target/R600/SIInsertWaits.cpp | 4 +--- lib/Target/R600/SIInstrInfo.td| 8 lib/Target/R600/SIInstructions.td | 6 +++--- test/CodeGen/R600/local-memory.ll | 4 ++-- 4 files changed, 10 insertions(+), 12 deletions(-) diff --git a/lib/Target/R600/SIInsertWaits.cpp b/lib/Target/R600/SIInsertWaits.cpp index ba202e3..c477be5 100644 --- a/lib/Target/R600/SIInsertWaits.cpp +++ b/lib/Target/R600/SIInsertWaits.cpp @@ -134,9 +134,7 @@ Counters SIInsertWaits::getHwCounts(MachineInstr MI) { // LGKM may uses larger values if (TSFlags SIInstrFlags::LGKM_CNT) { -MachineOperand Op = MI.getOperand(0); -if (!Op.isReg()) - Op = MI.getOperand(1); +const MachineOperand Op = MI.getOperand(0); assert(Op.isReg() First LGKM operand must be a register!); unsigned Reg = Op.getReg(); diff --git a/lib/Target/R600/SIInstrInfo.td b/lib/Target/R600/SIInstrInfo.td index ecc4718..1965ba0 100644 --- a/lib/Target/R600/SIInstrInfo.td +++ b/lib/Target/R600/SIInstrInfo.td @@ -342,8 +342,8 @@ class VOP3_64 bits9 op, string opName, listdag pattern : VOP3 class DS_Load_Helper bits8 op, string asm, RegisterClass regClass : DS op, (outs regClass:$vdst), - (ins i1imm:$gds, VReg_32:$addr, VReg_32:$data0, VReg_32:$data1, - i8imm:$offset0, i8imm:$offset1), + (ins VReg_32:$addr, VReg_32:$data0, VReg_32:$data1, + i8imm:$offset0, i8imm:$offset1, i1imm:$gds), asm# $vdst, $gds, $addr, $data0, $data1, $offset0, $offset1, [M0], [] { let mayLoad = 1; @@ -353,8 +353,8 @@ class DS_Load_Helper bits8 op, string asm, RegisterClass regClass : DS class DS_Store_Helper bits8 op, string asm, RegisterClass regClass : DS op, (outs), - (ins i1imm:$gds, VReg_32:$addr, VReg_32:$data0, VReg_32:$data1, - i8imm:$offset0, i8imm:$offset1), + (ins VReg_32:$addr, VReg_32:$data0, VReg_32:$data1, + i8imm:$offset0, i8imm:$offset1, i1imm:$gds), asm# $gds, $addr, $data0, $data1, $offset0, $offset1, [M0], [] { let mayStore = 1; diff --git a/lib/Target/R600/SIInstructions.td b/lib/Target/R600/SIInstructions.td index 4eb3566..9856fa6 100644 --- a/lib/Target/R600/SIInstructions.td +++ b/lib/Target/R600/SIInstructions.td @@ -1745,13 +1745,13 @@ def : Pat def : Pat (local_load i64:$src0), -(i32 (DS_READ_B32 0, (EXTRACT_SUBREG $src0, sub0), - (EXTRACT_SUBREG $src0, sub0), (EXTRACT_SUBREG $src0, sub0), 0, 0)) +(i32 (DS_READ_B32 (EXTRACT_SUBREG $src0, sub0), + (EXTRACT_SUBREG $src0, sub0), (EXTRACT_SUBREG $src0, sub0), 0, 0, 0)) ; def : Pat (local_store i32:$src1, i64:$src0), -(DS_WRITE_B32 0, (EXTRACT_SUBREG $src0, sub0), $src1, $src1, 0, 0) +(DS_WRITE_B32 (EXTRACT_SUBREG $src0, sub0), $src1, $src1, 0, 0, 0) ;
[Mesa-dev] [PATCH] radeonsi: add flexible shader descriptor management and use it for sampler views
(This should be applied before MSAA, which will need to be rebased.) It moves all sampler view descriptors to a buffer. It supports partial resource updates and it can also unbind resources (required for FMASK texturing). The buffer contains all sampler view descriptors for one shader stage, represented as an array. On top of that, there are N arrays in the buffer, which are used to emulate context registers as implemented by the previous ASICs (each array is a context). This uses the RCU synchronization approach to avoid read-after-write hazards as discussed in the thread: radeonsi: add FMASK texture binding slots and resource setup CP DMA is used to clear the descriptors at context initialization and to copy the descriptors from one context to the next. v2: - use PKT3_DMA_DATA on CIK (I'll test CIK later) - turn the bool CP DMA parameters into self-explanatory flags - add a nice simple API for packet emission to radeon_winsys.h - use 256 contexts, 128 causes texture corruption in openarena DISCUSSION: Maybe there is a synchronization issue and we don't actually need so many contexts? We can always flush KCACHE at the end of the ring if needed. --- src/gallium/drivers/radeonsi/Makefile.sources | 1 + src/gallium/drivers/radeonsi/r600_blit.c | 12 +- src/gallium/drivers/radeonsi/r600_hw_context.c | 22 +- src/gallium/drivers/radeonsi/radeonsi_pipe.c | 7 +- src/gallium/drivers/radeonsi/radeonsi_pipe.h | 19 +- src/gallium/drivers/radeonsi/si_descriptors.c | 355 + src/gallium/drivers/radeonsi/si_state.c| 47 +--- src/gallium/drivers/radeonsi/si_state.h| 56 src/gallium/drivers/radeonsi/si_state_draw.c | 18 +- src/gallium/drivers/radeonsi/sid.h | 54 src/gallium/winsys/radeon/drm/radeon_winsys.h | 12 + 11 files changed, 547 insertions(+), 56 deletions(-) create mode 100644 src/gallium/drivers/radeonsi/si_descriptors.c diff --git a/src/gallium/drivers/radeonsi/Makefile.sources b/src/gallium/drivers/radeonsi/Makefile.sources index b3ffa72..68c8282 100644 --- a/src/gallium/drivers/radeonsi/Makefile.sources +++ b/src/gallium/drivers/radeonsi/Makefile.sources @@ -10,6 +10,7 @@ C_SOURCES := \ r600_translate.c \ radeonsi_pm4.c \ radeonsi_compute.c \ + si_descriptors.c \ si_state.c \ si_state_streamout.c \ si_state_draw.c \ diff --git a/src/gallium/drivers/radeonsi/r600_blit.c b/src/gallium/drivers/radeonsi/r600_blit.c index bab108e..bdd9bb4 100644 --- a/src/gallium/drivers/radeonsi/r600_blit.c +++ b/src/gallium/drivers/radeonsi/r600_blit.c @@ -70,12 +70,12 @@ static void r600_blitter_begin(struct pipe_context *ctx, enum r600_blitter_op op if (op R600_SAVE_TEXTURES) { util_blitter_save_fragment_sampler_states( - rctx-blitter, rctx-ps_samplers.n_samplers, - (void**)rctx-ps_samplers.samplers); + rctx-blitter, rctx-samplers[PIPE_SHADER_FRAGMENT].n_samplers, + (void**)rctx-samplers[PIPE_SHADER_FRAGMENT].samplers); - util_blitter_save_fragment_sampler_views( - rctx-blitter, rctx-ps_samplers.n_views, - (struct pipe_sampler_view**)rctx-ps_samplers.views); + util_blitter_save_fragment_sampler_views(rctx-blitter, + util_last_bit(rctx-samplers[PIPE_SHADER_FRAGMENT].views.desc.enabled_mask), + rctx-samplers[PIPE_SHADER_FRAGMENT].views.views); } if ((op R600_DISABLE_RENDER_COND) rctx-current_render_cond) { @@ -224,7 +224,7 @@ void si_flush_depth_textures(struct r600_context *rctx, struct pipe_sampler_view *view; struct r600_texture *tex; - view = textures-views[i]-base; + view = textures-views.views[i]; if (!view) continue; tex = (struct r600_texture *)view-texture; diff --git a/src/gallium/drivers/radeonsi/r600_hw_context.c b/src/gallium/drivers/radeonsi/r600_hw_context.c index 25c972b..bc6ba0b 100644 --- a/src/gallium/drivers/radeonsi/r600_hw_context.c +++ b/src/gallium/drivers/radeonsi/r600_hw_context.c @@ -114,9 +114,17 @@ err: void si_need_cs_space(struct r600_context *ctx, unsigned num_dw, boolean count_draw_in) { + int i; + /* The number of dwords we already used in the CS so far. */ num_dw += ctx-cs-cdw; + for (i = 0; i SI_NUM_ATOMS(ctx); i++) { + if (ctx-atoms.array[i]-dirty) { + num_dw += ctx-atoms.array[i]-num_dw; + } + } + if (count_draw_in) { /* The number of dwords all the dirty states would take. */ num_dw += ctx-pm4_dirty_cdwords; @@ -254,6 +262,15 @@ void si_context_flush(struct r600_context *ctx, unsigned flags) ctx-pm4_dirty_cdwords
Re: [Mesa-dev] [PATCH 2/3] i965/gen7: Set MOCS L3 cacheability for IVB/BYT
On 08/15/2013 09:11 AM, Ville Syrjälä wrote: On Thu, Aug 15, 2013 at 08:08:12AM -0700, Chad Versace wrote: I would like such a cache-control ioctl, as long the ioctl can also be used to change the object's cacheing policy in addition to setting it at object creation. This would be needed when an object's usage oscillates between texture surface and render target. We do have the set_caching ioctl. It's enough to flip the PTEs to UC and let MOCS manage things. I actually did a few experiments on my IVB. I made all Mesa's buffers UC via PTEs by patching libdrm to change the cache mode of each bo after allocation. Then I fiddled with the MOCS LLC bits in various ways. It definitely has an effect, sometimes making things slower, sometimes faster. xonotic again seemed to benefit. IIRC leaving everything LLC uncached was actually the fastest (w/ high quality at least) so we may be thrashing the LLC a bit there. But eg. reaction quake regressed quite a lot if most things were left as UC. I should probably run through a few MOCS combinations and collect a bit more data. But it's looking like some sensible heuristic has to be involved since different benchmarks show conflicting results. Maybe your LLC overcommit prevention approach would be the one. Are you planning to continue with that work? I do plan to continue that work. I plan to return to it the week of Aug 26, because I need to first make more progress on Broadwell. My simple heuristic that prevents overcommit of the LLC, in its current form, gives varying results too. Some benchmarks benefit, some harmed. In each experiment, I set the LLC commit threshhold to 0.80, 1.00, or 1.25. (That is, for a given draw call, Mesa stops putting objects in the LLC when the draw call has filled that ratio of the LLC). Hopefully, to get consistent benefit across all apps, all we need is to choose a significantly higher or lower threshold than I've previously chosen. What I fear, though, is that since the GPU shares the LLC with the CPU (GPU-LLC=CPU-L3), to find a heuristic that's near-globally beneficial, we may need to consider the CPU load to intelligently choose the LLC commit threshold ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [RFC] Consolidate the remaining source files to Makefile.sources
Hello list Feeling inspired by the automake work done in mesa, I felt like finishing a few things that did not receive too much attention * use Makefile.sources wherever possible * cleanup the duplicated C{,PP,XX}FLAGS and factor out the the common ones into Automake.inc If anyone is interested I have pushed my preliminary work to the makefile.sources branch at https://github.com/evelikov/Mesa/ Currently I have gone through the following: gallium/drivers gallium/state_trackers/dri/drm gallium/state_trackers/dri/sw gallium/state_trackers/vega gallium/state_trackers/xorg Producing the following diff stat: 54 files changed, 352 insertions(+), 552 deletions(-) Pros: * patches such as gallium/dri-targets: hide all symbols except for __driDriverExtensions will be brought down to changing only with 2-3 lines * one less chance of breaking builds when someone forgets to update the SCons/Makefile.am/etc build, after adding/removing a file Cons: * Non nouveau changes will be only(lacking any other the hardware atm). Note that I will check the symbols for all drivers, to ensure that I do not cause chaos. Curious if anyone is interested and have any comments on this. Note: there may be some git rebase fails as I've started this ~3months ago Cheers Emil ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 02/15] i965/fs: Don't kill ACP entries due to their generating instruction.
On 12 August 2013 13:11, Kenneth Graunke kenn...@whitecape.org wrote: fs_copy_prop_dataflow::setup_kills() walks through each basic block's instructions, looking for instructions which overwrite registers used in ACP entries. This would be fine, except that it didn't exclude the instructions which generated the ACP entries in the first place. This meant that every copy was killed by its own generating instruction, making the dataflow analysis useless. I don't think this is actually a problem, because the bd[b].kills bits are only used to prevent liveness information from being propagated from bd[b].livein to bd[b].liveout, and the fs_copy_prop_dataflow constructor initializes bd[b].liveout with the set of ACP's that are generated from block b. So the only situation in which your patch would allow liveness information to be propagated from bd[b].livein to bd[b].liveout is for bits in bd[b].liveout that are already set. Or am I misunderstanding things? To fix this, this patch records the generating instruction in the ACP entry. It then skips the kill checks when processing that instruction. Using a void pointer ensures it will only be used for comparisons, and not dereferenced. Signed-off-by: Kenneth Graunke kenn...@whitecape.org --- src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp | 9 +++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp index d8d1546..379605c 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp @@ -42,6 +42,9 @@ namespace { /* avoid conflict with opt_copy_propagation_elements */ struct acp_entry : public exec_node { fs_reg dst; fs_reg src; + + /** MOV instruction which generated this copy/ACP entry. */ + void *inst; }; struct block_data { @@ -146,8 +149,9 @@ fs_copy_prop_dataflow::setup_kills() continue; for (int i = 0; i num_acp; i++) { -if (inst-overwrites_reg(acp[i]-dst) || -inst-overwrites_reg(acp[i]-src)) { +if (inst != acp[i]-inst +(inst-overwrites_reg(acp[i]-dst) || + inst-overwrites_reg(acp[i]-src))) { BITSET_SET(bd[b].kill, i); } } @@ -418,6 +422,7 @@ fs_visitor::opt_copy_propagate_local(void *mem_ctx, bblock_t *block, acp_entry *entry = ralloc(mem_ctx, acp_entry); entry-dst = inst-dst; entry-src = inst-src[0]; + entry-inst = inst; acp[entry-dst.reg % ACP_HASH_SIZE].push_tail(entry); } } -- 1.8.3.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] draw: handle nan clipdistance
If clipdistance for one of the vertices is nan (or inf) then the entire primitive should be discarded. Signed-off-by: Zack Rusin za...@vmware.com --- src/gallium/auxiliary/draw/draw_cliptest_tmp.h |2 +- src/gallium/auxiliary/draw/draw_llvm.c |3 ++ src/gallium/auxiliary/draw/draw_pipe_clip.c| 13 +- src/gallium/auxiliary/gallivm/lp_bld_arit.c| 53 src/gallium/auxiliary/gallivm/lp_bld_arit.h| 11 + 5 files changed, 79 insertions(+), 3 deletions(-) diff --git a/src/gallium/auxiliary/draw/draw_cliptest_tmp.h b/src/gallium/auxiliary/draw/draw_cliptest_tmp.h index e4500db..fc54810 100644 --- a/src/gallium/auxiliary/draw/draw_cliptest_tmp.h +++ b/src/gallium/auxiliary/draw/draw_cliptest_tmp.h @@ -140,7 +140,7 @@ static boolean TAG(do_cliptest)( struct pt_post_vs *pvs, clipdist = out-data[cd[0]][i]; else clipdist = out-data[cd[1]][i-4]; - if (clipdist 0) + if (clipdist 0 || util_is_inf_or_nan(clipdist)) mask |= 1 plane_idx; } else { if (dot4(clipvertex, plane[plane_idx]) 0) diff --git a/src/gallium/auxiliary/draw/draw_llvm.c b/src/gallium/auxiliary/draw/draw_llvm.c index 84e3392..1e9eadb 100644 --- a/src/gallium/auxiliary/draw/draw_llvm.c +++ b/src/gallium/auxiliary/draw/draw_llvm.c @@ -1261,6 +1261,7 @@ generate_clipmask(struct draw_llvm *llvm, if (clip_user) { LLVMValueRef planes_ptr = draw_jit_context_planes(gallivm, context_ptr); LLVMValueRef indices[3]; + LLVMValueRef is_nan; /* userclip planes */ while (ucp_enable) { @@ -1280,6 +1281,8 @@ generate_clipmask(struct draw_llvm *llvm, clipdist = LLVMBuildLoad(builder, outputs[cd[1]][i-4], ); } test = lp_build_compare(gallivm, f32_type, PIPE_FUNC_GREATER, zero, clipdist); +is_nan = lp_build_is_inf_or_nan(gallivm, vs_type, clipdist); +test = LLVMBuildOr(builder, test, is_nan, ); temp = lp_build_const_int_vec(gallivm, i32_type, 1 plane_idx); test = LLVMBuildAnd(builder, test, temp, ); mask = LLVMBuildOr(builder, mask, test, ); diff --git a/src/gallium/auxiliary/draw/draw_pipe_clip.c b/src/gallium/auxiliary/draw/draw_pipe_clip.c index b76e9a5..2f2aadb 100644 --- a/src/gallium/auxiliary/draw/draw_pipe_clip.c +++ b/src/gallium/auxiliary/draw/draw_pipe_clip.c @@ -104,7 +104,7 @@ static void interp_attr( float dst[4], float t, const float in[4], const float out[4] ) -{ +{ dst[0] = LINTERP( t, out[0], in[0] ); dst[1] = LINTERP( t, out[1], in[1] ); dst[2] = LINTERP( t, out[2], in[2] ); @@ -380,6 +380,9 @@ do_clip_tri( struct draw_stage *stage, dp_prev = getclipdist(clipper, vert_prev, plane_idx); clipmask = ~(1plane_idx); + if (util_is_inf_or_nan(dp_prev)) + return; //discard nan + assert(n MAX_CLIPPED_VERTICES); if (n = MAX_CLIPPED_VERTICES) return; @@ -392,6 +395,9 @@ do_clip_tri( struct draw_stage *stage, float dp = getclipdist(clipper, vert, plane_idx); + if (util_is_inf_or_nan(dp)) +return; //discard nan + if (!IS_NEGATIVE(dp_prev)) { assert(outcount MAX_CLIPPED_VERTICES); if (outcount = MAX_CLIPPED_VERTICES) @@ -522,6 +528,9 @@ do_clip_line( struct draw_stage *stage, const float dp0 = getclipdist(clipper, v0, plane_idx); const float dp1 = getclipdist(clipper, v1, plane_idx); + if (util_is_inf_or_nan(dp0) || util_is_inf_or_nan(dp1)) + return; //discard nan + if (dp1 0.0F) { float t = dp1 / (dp1 - dp0); t1 = MAX2(t1, t); @@ -594,7 +603,7 @@ clip_tri( struct draw_stage *stage, unsigned clipmask = (header-v[0]-clipmask | header-v[1]-clipmask | header-v[2]-clipmask); - + if (clipmask == 0) { /* no clipping needed */ stage-next-tri( stage-next, header ); diff --git a/src/gallium/auxiliary/gallivm/lp_bld_arit.c b/src/gallium/auxiliary/gallivm/lp_bld_arit.c index 98409c3..72b563e 100644 --- a/src/gallium/auxiliary/gallivm/lp_bld_arit.c +++ b/src/gallium/auxiliary/gallivm/lp_bld_arit.c @@ -3671,3 +3671,56 @@ lp_build_isfinite(struct lp_build_context *bld, return lp_build_compare(bld-gallivm, int_type, PIPE_FUNC_NOTEQUAL, intx, infornan32); } + +/* + * Returns true if the number is nan or inf or false otherwise. + * The input has to be a floating point vector. + */ +LLVMValueRef +lp_build_is_inf_or_nan(struct gallivm_state *gallivm, + const struct lp_type type, + LLVMValueRef x) +{ + LLVMBuilderRef builder = gallivm-builder; + struct lp_type int_type = lp_int_type(type); +
Re: [Mesa-dev] [PATCH] radeonsi: add flexible shader descriptor management and use it for sampler views
Am 15.08.2013 19:01, schrieb Marek Olšák: (This should be applied before MSAA, which will need to be rebased.) It moves all sampler view descriptors to a buffer. It supports partial resource updates and it can also unbind resources (required for FMASK texturing). The buffer contains all sampler view descriptors for one shader stage, represented as an array. On top of that, there are N arrays in the buffer, which are used to emulate context registers as implemented by the previous ASICs (each array is a context). This uses the RCU synchronization approach to avoid read-after-write hazards as discussed in the thread: radeonsi: add FMASK texture binding slots and resource setup CP DMA is used to clear the descriptors at context initialization and to copy the descriptors from one context to the next. v2: - use PKT3_DMA_DATA on CIK (I'll test CIK later) - turn the bool CP DMA parameters into self-explanatory flags - add a nice simple API for packet emission to radeon_winsys.h - use 256 contexts, 128 causes texture corruption in openarena DISCUSSION: Maybe there is a synchronization issue and we don't actually need so many contexts? We can always flush KCACHE at the end of the ring if needed. Well you definitely need to flush the texture cache when changing the descriptors, but I assume you already do so. Going to dig up the documentation for it, but 256 indeed sounds a bit much. Christian. --- src/gallium/drivers/radeonsi/Makefile.sources | 1 + src/gallium/drivers/radeonsi/r600_blit.c | 12 +- src/gallium/drivers/radeonsi/r600_hw_context.c | 22 +- src/gallium/drivers/radeonsi/radeonsi_pipe.c | 7 +- src/gallium/drivers/radeonsi/radeonsi_pipe.h | 19 +- src/gallium/drivers/radeonsi/si_descriptors.c | 355 + src/gallium/drivers/radeonsi/si_state.c| 47 +--- src/gallium/drivers/radeonsi/si_state.h| 56 src/gallium/drivers/radeonsi/si_state_draw.c | 18 +- src/gallium/drivers/radeonsi/sid.h | 54 src/gallium/winsys/radeon/drm/radeon_winsys.h | 12 + 11 files changed, 547 insertions(+), 56 deletions(-) create mode 100644 src/gallium/drivers/radeonsi/si_descriptors.c diff --git a/src/gallium/drivers/radeonsi/Makefile.sources b/src/gallium/drivers/radeonsi/Makefile.sources index b3ffa72..68c8282 100644 --- a/src/gallium/drivers/radeonsi/Makefile.sources +++ b/src/gallium/drivers/radeonsi/Makefile.sources @@ -10,6 +10,7 @@ C_SOURCES := \ r600_translate.c \ radeonsi_pm4.c \ radeonsi_compute.c \ + si_descriptors.c \ si_state.c \ si_state_streamout.c \ si_state_draw.c \ diff --git a/src/gallium/drivers/radeonsi/r600_blit.c b/src/gallium/drivers/radeonsi/r600_blit.c index bab108e..bdd9bb4 100644 --- a/src/gallium/drivers/radeonsi/r600_blit.c +++ b/src/gallium/drivers/radeonsi/r600_blit.c @@ -70,12 +70,12 @@ static void r600_blitter_begin(struct pipe_context *ctx, enum r600_blitter_op op if (op R600_SAVE_TEXTURES) { util_blitter_save_fragment_sampler_states( - rctx-blitter, rctx-ps_samplers.n_samplers, - (void**)rctx-ps_samplers.samplers); + rctx-blitter, rctx-samplers[PIPE_SHADER_FRAGMENT].n_samplers, + (void**)rctx-samplers[PIPE_SHADER_FRAGMENT].samplers); - util_blitter_save_fragment_sampler_views( - rctx-blitter, rctx-ps_samplers.n_views, - (struct pipe_sampler_view**)rctx-ps_samplers.views); + util_blitter_save_fragment_sampler_views(rctx-blitter, + util_last_bit(rctx-samplers[PIPE_SHADER_FRAGMENT].views.desc.enabled_mask), + rctx-samplers[PIPE_SHADER_FRAGMENT].views.views); } if ((op R600_DISABLE_RENDER_COND) rctx-current_render_cond) { @@ -224,7 +224,7 @@ void si_flush_depth_textures(struct r600_context *rctx, struct pipe_sampler_view *view; struct r600_texture *tex; - view = textures-views[i]-base; + view = textures-views.views[i]; if (!view) continue; tex = (struct r600_texture *)view-texture; diff --git a/src/gallium/drivers/radeonsi/r600_hw_context.c b/src/gallium/drivers/radeonsi/r600_hw_context.c index 25c972b..bc6ba0b 100644 --- a/src/gallium/drivers/radeonsi/r600_hw_context.c +++ b/src/gallium/drivers/radeonsi/r600_hw_context.c @@ -114,9 +114,17 @@ err: void si_need_cs_space(struct r600_context *ctx, unsigned num_dw, boolean count_draw_in) { + int i; + /* The number of dwords we already used in the CS so far. */ num_dw += ctx-cs-cdw; + for (i = 0; i SI_NUM_ATOMS(ctx); i++) { + if (ctx-atoms.array[i]-dirty) { + num_dw += ctx-atoms.array[i]-num_dw; + } + } + if (count_draw_in) {
Re: [Mesa-dev] radeonsi: LLVM r187139 broke some piglit tests
On Don, 2013-08-15 at 09:16 -0700, Tom Stellard wrote: On Thu, Aug 15, 2013 at 08:22:39AM -0700, Tom Stellard wrote: On Thu, Aug 15, 2013 at 11:55:36AM +0200, Michel Dänzer wrote: On Fre, 2013-08-02 at 17:58 +0200, Michel Dänzer wrote: On Mit, 2013-07-31 at 08:42 -0700, Tom Stellard wrote: On Wed, Jul 31, 2013 at 01:04:01PM +0200, Michel Dänzer wrote: LLVM revision 187139 ('Allocate local registers in order for optimal coloring.') broke some derivative related piglit tests with the radeonsi driver. I'm attaching a diff between the bad and good generated code (as printed with RADEON_DUMP_SHADERS=1) for the glsl-derivs test. The only difference I can see is in which registers are used in which order. I wonder if we might be missing S_WAITCNT after DS_READ/WRITE instructions in some cases, but I haven't spotted any candidates for that in the bad code which aren't there in the good code as well. Can anyone else spot something I've missed? Shouldn't we be using the S_BARRIER instruction to keep the threads in sync? Doesn't seem to help unfortunately, but thanks for the good suggestion. I found one thing going wrong: DS_WRITE_B32 ends up using a VGPR register number instead of the $gds operand for encoding the GDS field (the asm output from llc even shows the VGPR name). If the VGPR number happens to be odd (i.e. to have the least significant bit set), the shader ends up writing to GDS instead of LDS. Ouch, that's a pretty bad bug. But I have no idea why this is happening, or how to fix it. :( I can take a look at it. The attached patch should fix the problem, can you test? Thanks for finding my silly mistake. However, I'd like to preserve the ability to use these instructions for GDS access, and the logic in SIInsertWaits::getHwCounts() only really makes sense for SMRD anyway. How about this patch instead? It fixes the piglit regressions that prompted me to start this thread. -- Earthling Michel Dänzer | http://www.amd.com Libre software enthusiast | Debian, X and DRI developer From 4ad4d9bba31b06bbadb83fead3e53d989b80d83e Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Michel=20D=C3=A4nzer?= michel.daen...@amd.com Date: Thu, 15 Aug 2013 19:43:02 +0200 Subject: [PATCH] R600/SI: Fix broken encoding of DS_WRITE_B32 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The logic in SIInsertWaits::getHwCounts() only really made sense for SMRD instructions, and trying to shoehorn it into handling DS_WRITE_B32 caused it to corrupt the encoding of that by clobbering the first operand with the second one. Undo that damage and only apply the SMRD logic to that. Fixes some derivates related piglit regressions with radeonsi. Signed-off-by: Michel Dänzer michel.daen...@amd.com --- lib/Target/R600/SIInsertWaits.cpp | 22 ++ test/CodeGen/R600/local-memory.ll | 4 ++-- 2 files changed, 16 insertions(+), 10 deletions(-) diff --git a/lib/Target/R600/SIInsertWaits.cpp b/lib/Target/R600/SIInsertWaits.cpp index ba202e3..200e064 100644 --- a/lib/Target/R600/SIInsertWaits.cpp +++ b/lib/Target/R600/SIInsertWaits.cpp @@ -134,14 +134,20 @@ Counters SIInsertWaits::getHwCounts(MachineInstr MI) { // LGKM may uses larger values if (TSFlags SIInstrFlags::LGKM_CNT) { -MachineOperand Op = MI.getOperand(0); -if (!Op.isReg()) - Op = MI.getOperand(1); -assert(Op.isReg() First LGKM operand must be a register!); - -unsigned Reg = Op.getReg(); -unsigned Size = TRI-getMinimalPhysRegClass(Reg)-getSize(); -Result.Named.LGKM = Size 4 ? 2 : 1; +if (MI.getNumOperands() == 3) { + + // SMRD + MachineOperand Op = MI.getOperand(0); + assert(Op.isReg() First LGKM operand must be a register!); + + unsigned Reg = Op.getReg(); + unsigned Size = TRI-getMinimalPhysRegClass(Reg)-getSize(); + Result.Named.LGKM = Size 4 ? 2 : 1; + +} else { + // DS + Result.Named.LGKM = 1; +} } else { Result.Named.LGKM = 0; diff --git a/test/CodeGen/R600/local-memory.ll b/test/CodeGen/R600/local-memory.ll index 5458fb9..9ebb769 100644 --- a/test/CodeGen/R600/local-memory.ll +++ b/test/CodeGen/R600/local-memory.ll @@ -13,7 +13,7 @@ ; SI-CHECK-NEXT: .long 32768 ; EG-CHECK: LDS_WRITE -; SI-CHECK: DS_WRITE_B32 +; SI-CHECK: DS_WRITE_B32 0 ; GROUP_BARRIER must be the last instruction in a clause ; EG-CHECK: GROUP_BARRIER @@ -21,7 +21,7 @@ ; SI-CHECK: S_BARRIER ; EG-CHECK: LDS_READ_RET -; SI-CHECK: DS_READ_B32 +; SI-CHECK: DS_READ_B32 {{VGPR[0-9]+}}, 0 define void @local_memory(i32 addrspace(1)* %out) { entry: -- 1.8.4.rc2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] draw: handle nan clipdistance
Am 15.08.2013 19:12, schrieb Zack Rusin: If clipdistance for one of the vertices is nan (or inf) then the entire primitive should be discarded. Signed-off-by: Zack Rusin za...@vmware.com --- src/gallium/auxiliary/draw/draw_cliptest_tmp.h |2 +- src/gallium/auxiliary/draw/draw_llvm.c |3 ++ src/gallium/auxiliary/draw/draw_pipe_clip.c| 13 +- src/gallium/auxiliary/gallivm/lp_bld_arit.c| 53 src/gallium/auxiliary/gallivm/lp_bld_arit.h| 11 + 5 files changed, 79 insertions(+), 3 deletions(-) diff --git a/src/gallium/auxiliary/draw/draw_cliptest_tmp.h b/src/gallium/auxiliary/draw/draw_cliptest_tmp.h index e4500db..fc54810 100644 --- a/src/gallium/auxiliary/draw/draw_cliptest_tmp.h +++ b/src/gallium/auxiliary/draw/draw_cliptest_tmp.h @@ -140,7 +140,7 @@ static boolean TAG(do_cliptest)( struct pt_post_vs *pvs, clipdist = out-data[cd[0]][i]; else clipdist = out-data[cd[1]][i-4]; - if (clipdist 0) + if (clipdist 0 || util_is_inf_or_nan(clipdist)) mask |= 1 plane_idx; } else { if (dot4(clipvertex, plane[plane_idx]) 0) diff --git a/src/gallium/auxiliary/draw/draw_llvm.c b/src/gallium/auxiliary/draw/draw_llvm.c index 84e3392..1e9eadb 100644 --- a/src/gallium/auxiliary/draw/draw_llvm.c +++ b/src/gallium/auxiliary/draw/draw_llvm.c @@ -1261,6 +1261,7 @@ generate_clipmask(struct draw_llvm *llvm, if (clip_user) { LLVMValueRef planes_ptr = draw_jit_context_planes(gallivm, context_ptr); LLVMValueRef indices[3]; + LLVMValueRef is_nan; /* userclip planes */ while (ucp_enable) { @@ -1280,6 +1281,8 @@ generate_clipmask(struct draw_llvm *llvm, clipdist = LLVMBuildLoad(builder, outputs[cd[1]][i-4], ); } test = lp_build_compare(gallivm, f32_type, PIPE_FUNC_GREATER, zero, clipdist); +is_nan = lp_build_is_inf_or_nan(gallivm, vs_type, clipdist); I think this should either use is_inf_or_nan (if it only cares about nan, though I guess it really does care about inf too) or else the var should be named is_inf_or_nan. +test = LLVMBuildOr(builder, test, is_nan, ); temp = lp_build_const_int_vec(gallivm, i32_type, 1 plane_idx); test = LLVMBuildAnd(builder, test, temp, ); mask = LLVMBuildOr(builder, mask, test, ); diff --git a/src/gallium/auxiliary/draw/draw_pipe_clip.c b/src/gallium/auxiliary/draw/draw_pipe_clip.c index b76e9a5..2f2aadb 100644 --- a/src/gallium/auxiliary/draw/draw_pipe_clip.c +++ b/src/gallium/auxiliary/draw/draw_pipe_clip.c @@ -104,7 +104,7 @@ static void interp_attr( float dst[4], float t, const float in[4], const float out[4] ) -{ +{ dst[0] = LINTERP( t, out[0], in[0] ); dst[1] = LINTERP( t, out[1], in[1] ); dst[2] = LINTERP( t, out[2], in[2] ); @@ -380,6 +380,9 @@ do_clip_tri( struct draw_stage *stage, dp_prev = getclipdist(clipper, vert_prev, plane_idx); clipmask = ~(1plane_idx); + if (util_is_inf_or_nan(dp_prev)) + return; //discard nan + assert(n MAX_CLIPPED_VERTICES); if (n = MAX_CLIPPED_VERTICES) return; @@ -392,6 +395,9 @@ do_clip_tri( struct draw_stage *stage, float dp = getclipdist(clipper, vert, plane_idx); + if (util_is_inf_or_nan(dp)) +return; //discard nan + if (!IS_NEGATIVE(dp_prev)) { assert(outcount MAX_CLIPPED_VERTICES); if (outcount = MAX_CLIPPED_VERTICES) @@ -522,6 +528,9 @@ do_clip_line( struct draw_stage *stage, const float dp0 = getclipdist(clipper, v0, plane_idx); const float dp1 = getclipdist(clipper, v1, plane_idx); + if (util_is_inf_or_nan(dp0) || util_is_inf_or_nan(dp1)) + return; //discard nan + if (dp1 0.0F) { float t = dp1 / (dp1 - dp0); t1 = MAX2(t1, t); @@ -594,7 +603,7 @@ clip_tri( struct draw_stage *stage, unsigned clipmask = (header-v[0]-clipmask | header-v[1]-clipmask | header-v[2]-clipmask); - + if (clipmask == 0) { /* no clipping needed */ stage-next-tri( stage-next, header ); diff --git a/src/gallium/auxiliary/gallivm/lp_bld_arit.c b/src/gallium/auxiliary/gallivm/lp_bld_arit.c index 98409c3..72b563e 100644 --- a/src/gallium/auxiliary/gallivm/lp_bld_arit.c +++ b/src/gallium/auxiliary/gallivm/lp_bld_arit.c @@ -3671,3 +3671,56 @@ lp_build_isfinite(struct lp_build_context *bld, return lp_build_compare(bld-gallivm, int_type, PIPE_FUNC_NOTEQUAL, intx, infornan32); } + +/* + * Returns true if the number is nan or inf or false otherwise. + *
[Mesa-dev] [PATCH 1/4] glsl: Rename ubo_qualifiers_valid to ubo_qualifiers_allowed.
The variable means that UBO qualifiers are allowed in a particular context (e.g., not allowed in a struct field declaration), rather than a particular set of UBO qualifiers are valid. --- src/glsl/ast.h | 2 +- src/glsl/ast_to_hir.cpp | 6 +++--- src/glsl/glsl_parser.yy | 2 +- src/glsl/glsl_parser_extras.cpp | 2 +- 4 files changed, 6 insertions(+), 6 deletions(-) diff --git a/src/glsl/ast.h b/src/glsl/ast.h index 9b194db..b0446b5 100644 --- a/src/glsl/ast.h +++ b/src/glsl/ast.h @@ -640,7 +640,7 @@ public: * Flag indicating that these declarators are in a uniform block, * allowing UBO type qualifiers. */ - bool ubo_qualifiers_valid; + bool ubo_qualifiers_allowed; }; diff --git a/src/glsl/ast_to_hir.cpp b/src/glsl/ast_to_hir.cpp index 04b16c8..9bc713d 100644 --- a/src/glsl/ast_to_hir.cpp +++ b/src/glsl/ast_to_hir.cpp @@ -1929,7 +1929,7 @@ apply_type_qualifier_to_variable(const struct ast_type_qualifier *qual, ir_variable *var, struct _mesa_glsl_parse_state *state, YYLTYPE *loc, -bool ubo_qualifiers_valid, +bool ubo_qualifiers_allowed, bool is_parameter) { STATIC_ASSERT(sizeof(qual-flags.q) = sizeof(qual-flags.i)); @@ -2275,7 +2275,7 @@ apply_type_qualifier_to_variable(const struct ast_type_qualifier *qual, } if (qual-flags.q.row_major || qual-flags.q.column_major) { - if (!ubo_qualifiers_valid) { + if (!ubo_qualifiers_allowed) { _mesa_glsl_error(loc, state, uniform block layout qualifiers row_major and column_major can only be applied to uniform block @@ -2830,7 +2830,7 @@ ast_declarator_list::hir(exec_list *instructions, } apply_type_qualifier_to_variable( this-type-qualifier, var, state, - loc, this-ubo_qualifiers_valid, false); + loc, this-ubo_qualifiers_allowed, false); if (this-type-qualifier.flags.q.invariant) { if ((state-target == vertex_shader) diff --git a/src/glsl/glsl_parser.yy b/src/glsl/glsl_parser.yy index e3a57ea..0ed3453 100644 --- a/src/glsl/glsl_parser.yy +++ b/src/glsl/glsl_parser.yy @@ -2295,7 +2295,7 @@ member_declaration: $$ = new(ctx) ast_declarator_list(type); $$-set_location(yylloc); - $$-ubo_qualifiers_valid = true; + $$-ubo_qualifiers_allowed = true; $$-declarations.push_degenerate_list_at_head( $2-link); } diff --git a/src/glsl/glsl_parser_extras.cpp b/src/glsl/glsl_parser_extras.cpp index 88f0483..98fd510 100644 --- a/src/glsl/glsl_parser_extras.cpp +++ b/src/glsl/glsl_parser_extras.cpp @@ -1170,7 +1170,7 @@ ast_declarator_list::ast_declarator_list(ast_fully_specified_type *type) { this-type = type; this-invariant = false; - this-ubo_qualifiers_valid = false; + this-ubo_qualifiers_allowed = false; } void -- 1.8.3.2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/4] glsl: Drop duplicate error messages.
This same message is printed in the validate_matrix_layout_for_type function. --- src/glsl/ast_to_hir.cpp | 13 + 1 file changed, 1 insertion(+), 12 deletions(-) diff --git a/src/glsl/ast_to_hir.cpp b/src/glsl/ast_to_hir.cpp index 9bc713d..c9ffa30 100644 --- a/src/glsl/ast_to_hir.cpp +++ b/src/glsl/ast_to_hir.cpp @@ -2275,13 +2275,7 @@ apply_type_qualifier_to_variable(const struct ast_type_qualifier *qual, } if (qual-flags.q.row_major || qual-flags.q.column_major) { - if (!ubo_qualifiers_allowed) { -_mesa_glsl_error(loc, state, - uniform block layout qualifiers row_major and - column_major can only be applied to uniform block - members); - } else -validate_matrix_layout_for_type(state, loc, var-type); + validate_matrix_layout_for_type(state, loc, var-type); } } @@ -4415,11 +4409,6 @@ ast_process_structure_or_interface_block(exec_list *instructions, _mesa_glsl_error(loc, state, row_major and column_major can only be applied to uniform interface blocks); -} else if (!field_type-is_matrix() !field_type-is_record()) { - _mesa_glsl_error(loc, state, -uniform block layout qualifiers row_major and -column_major can only be applied to matrix and -structure types); } else validate_matrix_layout_for_type(state, loc, field_type); } -- 1.8.3.2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 3/4] glsl: Remove ubo_qualifiers_allowed variable.
No longer used. --- src/glsl/ast.h | 6 -- src/glsl/ast_to_hir.cpp | 5 ++--- src/glsl/glsl_parser.yy | 1 - src/glsl/glsl_parser_extras.cpp | 1 - 4 files changed, 2 insertions(+), 11 deletions(-) diff --git a/src/glsl/ast.h b/src/glsl/ast.h index b0446b5..33a00e0 100644 --- a/src/glsl/ast.h +++ b/src/glsl/ast.h @@ -635,12 +635,6 @@ public: * is used to note these cases when no type is specified. */ int invariant; - - /** -* Flag indicating that these declarators are in a uniform block, -* allowing UBO type qualifiers. -*/ - bool ubo_qualifiers_allowed; }; diff --git a/src/glsl/ast_to_hir.cpp b/src/glsl/ast_to_hir.cpp index c9ffa30..c2fdbd5 100644 --- a/src/glsl/ast_to_hir.cpp +++ b/src/glsl/ast_to_hir.cpp @@ -1929,7 +1929,6 @@ apply_type_qualifier_to_variable(const struct ast_type_qualifier *qual, ir_variable *var, struct _mesa_glsl_parse_state *state, YYLTYPE *loc, -bool ubo_qualifiers_allowed, bool is_parameter) { STATIC_ASSERT(sizeof(qual-flags.q) = sizeof(qual-flags.i)); @@ -2824,7 +2823,7 @@ ast_declarator_list::hir(exec_list *instructions, } apply_type_qualifier_to_variable( this-type-qualifier, var, state, - loc, this-ubo_qualifiers_allowed, false); + loc, false); if (this-type-qualifier.flags.q.invariant) { if ((state-target == vertex_shader) @@ -3334,7 +,7 @@ ast_parameter_declarator::hir(exec_list *instructions, * for function parameters the default mode is 'in'. */ apply_type_qualifier_to_variable( this-type-qualifier, var, state, loc, - false, true); + true); /* From page 17 (page 23 of the PDF) of the GLSL 1.20 spec: * diff --git a/src/glsl/glsl_parser.yy b/src/glsl/glsl_parser.yy index 0ed3453..d8e589d 100644 --- a/src/glsl/glsl_parser.yy +++ b/src/glsl/glsl_parser.yy @@ -2295,7 +2295,6 @@ member_declaration: $$ = new(ctx) ast_declarator_list(type); $$-set_location(yylloc); - $$-ubo_qualifiers_allowed = true; $$-declarations.push_degenerate_list_at_head( $2-link); } diff --git a/src/glsl/glsl_parser_extras.cpp b/src/glsl/glsl_parser_extras.cpp index 98fd510..8e87556 100644 --- a/src/glsl/glsl_parser_extras.cpp +++ b/src/glsl/glsl_parser_extras.cpp @@ -1170,7 +1170,6 @@ ast_declarator_list::ast_declarator_list(ast_fully_specified_type *type) { this-type = type; this-invariant = false; - this-ubo_qualifiers_allowed = false; } void -- 1.8.3.2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 4/4] glsl: Give a warning, not an error, for UBO qualifiers on struct fields.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=59648 --- src/glsl/ast_to_hir.cpp | 9 + 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/src/glsl/ast_to_hir.cpp b/src/glsl/ast_to_hir.cpp index c2fdbd5..f35c11f 100644 --- a/src/glsl/ast_to_hir.cpp +++ b/src/glsl/ast_to_hir.cpp @@ -1830,10 +1830,11 @@ validate_matrix_layout_for_type(struct _mesa_glsl_parse_state *state, const glsl_type *type) { if (!type-is_matrix() !type-is_record()) { - _mesa_glsl_error(loc, state, - uniform block layout qualifiers row_major and - column_major can only be applied to matrix and - structure types); + _mesa_glsl_warning(loc, state, + uniform block layout qualifiers row_major and + column_major applied to structure fields is not + strictly conformant and my be rejected by other + compilers); } else if (type-is_record()) { /* We allow 'layout(row_major)' on structure types because it's the only * way to get row-major layouts on matrices contained in structures. -- 1.8.3.2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] draw: handle nan clipdistance
I realize this function isn't used but it looks unnecessarily complicated - two constants one AND plus one comparison when you could simply do a single comparison (compare x with x with unordered not equal). This is actually doubly bad with AVX because the int comparison is going to use 4 instructions instead of 1 (extract/2 cmp/1 insert), well if this runs 8-wide at least. I'm going to kill that function, we already have lp_build_isnan that does the correct thing. Otherwise looks good. Though I'm not sure you really need to kill the prims if the clip distances are infinite? The d3d10 spec says Coordinates coming in to clipping with infinites at x, y, z may or may not result in a discarded primitive.. I liked handling them the same way as nan, otherwise we're just generating pointless primitives. I don't have a strong opinion though, wlk doesn't seem to test infinites. z ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 4/4] glsl: Give a warning, not an error, for UBO qualifiers on struct fields.
On Thu, Aug 15, 2013 at 11:27 AM, Matt Turner matts...@gmail.com wrote: Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=59648 --- src/glsl/ast_to_hir.cpp | 9 + 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/src/glsl/ast_to_hir.cpp b/src/glsl/ast_to_hir.cpp index c2fdbd5..f35c11f 100644 --- a/src/glsl/ast_to_hir.cpp +++ b/src/glsl/ast_to_hir.cpp @@ -1830,10 +1830,11 @@ validate_matrix_layout_for_type(struct _mesa_glsl_parse_state *state, const glsl_type *type) { if (!type-is_matrix() !type-is_record()) { - _mesa_glsl_error(loc, state, - uniform block layout qualifiers row_major and - column_major can only be applied to matrix and - structure types); + _mesa_glsl_warning(loc, state, + uniform block layout qualifiers row_major and + column_major applied to structure fields is not + strictly conformant and my be rejected by other typo in spelling of may. + compilers); } else if (type-is_record()) { /* We allow 'layout(row_major)' on structure types because it's the only * way to get row-major layouts on matrices contained in structures. -- 1.8.3.2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev This series is Reviewed-by: Anuj Phogat anuj.pho...@gmail.com ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] draw: handle nan clipdistance
Am 15.08.2013 20:27, schrieb Zack Rusin: I realize this function isn't used but it looks unnecessarily complicated - two constants one AND plus one comparison when you could simply do a single comparison (compare x with x with unordered not equal). This is actually doubly bad with AVX because the int comparison is going to use 4 instructions instead of 1 (extract/2 cmp/1 insert), well if this runs 8-wide at least. I'm going to kill that function, we already have lp_build_isnan that does the correct thing. Ah right. It actually uses OEQ then does a not, I think it would be easier to understand if it would just use UNE. I think at some point we might want to optimize these functions a bit, even if it's really llvm's fault. Consider this (a trivial select 1.0 if input is a nan otherwise 0.0) define 4 x float @une(4 x float %in) { entry: %0 = fcmp une 4 x float %in, %in %1 = sext 4 x i1 %0 to 4 x i32 %2 = and 4 x i32 %1, i32 1065353216,i32 1065353216,i32 1065353216,i32 1065353216 %3 = bitcast 4 x i32 %2 to 4 x float ret 4 x float %3 } This generates some neat code: vcmpunordps %xmm0, %xmm0, %xmm0 vandps .LCPI0_0(%rip), %xmm0, %xmm0 ret (FWIW the OEQ version using a xor generates much the same just vcmpordps and vandnotps instead). But now do it with a 1-vec (which we will generally never do, just for illustration): define 1 x float @une(1 x float %in) { entry: %0 = fcmp une 1 x float %in, %in %1 = sext 1 x i1 %0 to 1 x i32 %2 = and 1 x i32 %1, i32 1065353216 %3 = bitcast 1 x i32 %2 to 1 x float ret 1 x float %3 } Generates not-so-hot-code: vucomiss%xmm0, %xmm0 setp%al movzbl %al, %eax shll$31, %eax sarl$31, %eax andl$1065353216, %eax # imm = 0x3F80 vmovd %eax, %xmm0 ret We generally use true scalars instead: define float @une(float %in) { entry: %0 = fcmp une float %in, %in %1 = sext i1 %0 to i32 %2 = and i32 %1, 1065353216 %3 = bitcast i32 %2 to float ret float %3 } Which generates this abomination (don't ask me why 1 x float vs. float makes a difference in generated code): vucomiss%xmm0, %xmm0 setp%cl setne %al orb %cl, %al movl$-1, %ecx testb %al, %al movl$0, %eax cmovnel %ecx, %eax andl$1065353216, %eax # imm = 0x3F80 vmovd %eax, %xmm0 ret So, it looks like for the scalar case using int comparisons would be definitely better as it should avoid most of that ugliness (but I couldn't be bothered to write the test case for that). Now granted my guess is llvm simply should have used the simd unit no matter which version is used in the first place - nothing forbids it to use vector instructions for scalars. But no luck unfortunately... (It is possible newer llvm versions handle this better, didn't check, but I suspect shit like that could potentially make a performance difference if these functions are used enough.) Otherwise looks good. Though I'm not sure you really need to kill the prims if the clip distances are infinite? The d3d10 spec says Coordinates coming in to clipping with infinites at x, y, z may or may not result in a discarded primitive.. I liked handling them the same way as nan, otherwise we're just generating pointless primitives. I don't have a strong opinion though, wlk doesn't seem to test infinites. Ok whatever works. I just thought the test may be simpler but maybe not (given the above...) so that's just fine. Roland ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/3] i965/gen7: Set MOCS L3 cacheability for IVB/BYT
We do have the set_caching ioctl. It's enough to flip the PTEs to UC and let MOCS manage things. I actually did a few experiments on my IVB. I made all Mesa's buffers UC via PTEs by patching libdrm to change the cache mode of each bo after allocation. Then I fiddled with the MOCS LLC bits in various ways. It definitely has an effect, sometimes making things slower, sometimes faster. xonotic again seemed to benefit. IIRC leaving everything LLC uncached was actually the fastest (w/ high quality at least) so we may be thrashing the LLC a bit there. But eg. reaction quake regressed quite a lot if most things were left as UC. Can you share the libdrm patch? ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] radeonsi: LLVM r187139 broke some piglit tests
On Thu, Aug 15, 2013 at 07:50:10PM +0200, Michel Dänzer wrote: On Don, 2013-08-15 at 09:16 -0700, Tom Stellard wrote: On Thu, Aug 15, 2013 at 08:22:39AM -0700, Tom Stellard wrote: On Thu, Aug 15, 2013 at 11:55:36AM +0200, Michel Dänzer wrote: On Fre, 2013-08-02 at 17:58 +0200, Michel Dänzer wrote: On Mit, 2013-07-31 at 08:42 -0700, Tom Stellard wrote: On Wed, Jul 31, 2013 at 01:04:01PM +0200, Michel Dänzer wrote: LLVM revision 187139 ('Allocate local registers in order for optimal coloring.') broke some derivative related piglit tests with the radeonsi driver. I'm attaching a diff between the bad and good generated code (as printed with RADEON_DUMP_SHADERS=1) for the glsl-derivs test. The only difference I can see is in which registers are used in which order. I wonder if we might be missing S_WAITCNT after DS_READ/WRITE instructions in some cases, but I haven't spotted any candidates for that in the bad code which aren't there in the good code as well. Can anyone else spot something I've missed? Shouldn't we be using the S_BARRIER instruction to keep the threads in sync? Doesn't seem to help unfortunately, but thanks for the good suggestion. I found one thing going wrong: DS_WRITE_B32 ends up using a VGPR register number instead of the $gds operand for encoding the GDS field (the asm output from llc even shows the VGPR name). If the VGPR number happens to be odd (i.e. to have the least significant bit set), the shader ends up writing to GDS instead of LDS. Ouch, that's a pretty bad bug. But I have no idea why this is happening, or how to fix it. :( I can take a look at it. The attached patch should fix the problem, can you test? Thanks for finding my silly mistake. However, I'd like to preserve the ability to use these instructions for GDS access, and the logic in SIInsertWaits::getHwCounts() only really makes sense for SMRD anyway. How about this patch instead? It fixes the piglit regressions that prompted me to start this thread. -- Earthling Michel Dänzer | http://www.amd.com Libre software enthusiast | Debian, X and DRI developer From 4ad4d9bba31b06bbadb83fead3e53d989b80d83e Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Michel=20D=C3=A4nzer?= michel.daen...@amd.com Date: Thu, 15 Aug 2013 19:43:02 +0200 Subject: [PATCH] R600/SI: Fix broken encoding of DS_WRITE_B32 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The logic in SIInsertWaits::getHwCounts() only really made sense for SMRD instructions, and trying to shoehorn it into handling DS_WRITE_B32 caused it to corrupt the encoding of that by clobbering the first operand with the second one. Undo that damage and only apply the SMRD logic to that. Fixes some derivates related piglit regressions with radeonsi. Signed-off-by: Michel Dänzer michel.daen...@amd.com --- lib/Target/R600/SIInsertWaits.cpp | 22 ++ test/CodeGen/R600/local-memory.ll | 4 ++-- 2 files changed, 16 insertions(+), 10 deletions(-) diff --git a/lib/Target/R600/SIInsertWaits.cpp b/lib/Target/R600/SIInsertWaits.cpp index ba202e3..200e064 100644 --- a/lib/Target/R600/SIInsertWaits.cpp +++ b/lib/Target/R600/SIInsertWaits.cpp @@ -134,14 +134,20 @@ Counters SIInsertWaits::getHwCounts(MachineInstr MI) { // LGKM may uses larger values if (TSFlags SIInstrFlags::LGKM_CNT) { -MachineOperand Op = MI.getOperand(0); -if (!Op.isReg()) - Op = MI.getOperand(1); -assert(Op.isReg() First LGKM operand must be a register!); - -unsigned Reg = Op.getReg(); -unsigned Size = TRI-getMinimalPhysRegClass(Reg)-getSize(); -Result.Named.LGKM = Size 4 ? 2 : 1; +if (MI.getNumOperands() == 3) { We should add a TSFlag for SMRD like we do for MIMG and add a helper function isSMRD to SIInstrInfo and use it here. The number of operands for instructions tends to change from time to time. -Tom + + // SMRD + MachineOperand Op = MI.getOperand(0); + assert(Op.isReg() First LGKM operand must be a register!); + + unsigned Reg = Op.getReg(); + unsigned Size = TRI-getMinimalPhysRegClass(Reg)-getSize(); + Result.Named.LGKM = Size 4 ? 2 : 1; + +} else { + // DS + Result.Named.LGKM = 1; +} } else { Result.Named.LGKM = 0; diff --git a/test/CodeGen/R600/local-memory.ll b/test/CodeGen/R600/local-memory.ll index 5458fb9..9ebb769 100644 --- a/test/CodeGen/R600/local-memory.ll +++ b/test/CodeGen/R600/local-memory.ll @@ -13,7 +13,7 @@ ; SI-CHECK-NEXT: .long 32768 ; EG-CHECK: LDS_WRITE -; SI-CHECK: DS_WRITE_B32 +; SI-CHECK: DS_WRITE_B32 0 ; GROUP_BARRIER must be the
Re: [Mesa-dev] [PATCH 07/15] i965/fs: Create the COPY() set for use in copy propagation dataflow.
On 12 August 2013 13:11, Kenneth Graunke kenn...@whitecape.org wrote: This is the COPY set from Muchnick's textbook, which is necessary to do the dataflow algorithm correctly. I believe this can be done more easily. The only ACP entries that are in the hastable on exit from opt_copy_propagate_local() are the ones that made it to the end of the block. So we should be able to simply populate the new COPY set in the for loop at the bottom of the fs_copy_prop_dataflow constructor. We don't need to do the extra work you're doing in setup_initial_values(). Signed-off-by: Kenneth Graunke kenn...@whitecape.org --- .../drivers/dri/i965/brw_fs_copy_propagation.cpp | 27 ++ 1 file changed, 23 insertions(+), 4 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp index 7aff36b..2970af6 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp @@ -64,6 +64,13 @@ struct block_data { BITSET_WORD *liveout; /** +* Which entries in the fs_copy_prop_dataflow acp table are generated by +* instructions in this block which reach the end of the block without +* being killed. +*/ + BITSET_WORD *copy; + + /** * Which entries in the fs_copy_prop_dataflow acp table are killed over the * course of this block. */ @@ -113,6 +120,7 @@ fs_copy_prop_dataflow::fs_copy_prop_dataflow(void *mem_ctx, cfg_t *cfg, for (int b = 0; b cfg-num_blocks; b++) { bd[b].livein = rzalloc_array(bd, BITSET_WORD, bitset_words); bd[b].liveout = rzalloc_array(bd, BITSET_WORD, bitset_words); + bd[b].copy = rzalloc_array(bd, BITSET_WORD, bitset_words); bd[b].kill = rzalloc_array(bd, BITSET_WORD, bitset_words); for (int i = 0; i ACP_HASH_SIZE; i++) { @@ -148,15 +156,26 @@ fs_copy_prop_dataflow::setup_initial_values() if (inst-dst.file != GRF) continue; - /* Mark ACP entries which are killed by this instruction. */ for (int i = 0; i num_acp; i++) { -if (inst != acp[i]-inst -(inst-overwrites_reg(acp[i]-dst) || - inst-overwrites_reg(acp[i]-src))) { +if (inst == acp[i]-inst) { + /* Add this entry to the COPY set. */ + BITSET_SET(bd[b].copy, i); +} else if (inst-overwrites_reg(acp[i]-dst) || + inst-overwrites_reg(acp[i]-src)) { + /* The current instruction kills this copy. Add the entry to +* the KILL set. +*/ BITSET_SET(bd[b].kill, i); } } } + + /* Anything killed did not make it to the end of the block, so it + * shouldn't be in COPY. + */ + for (int i = 0; i bitset_words; i++) { + bd[b].copy[i] = ~bd[b].kill[i]; + } } } -- 1.8.3.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [Mesa-stable] Bison 3 fixes
On 08/15/2013 06:55 AM, Armin K. wrote: Can we have Bison 3 fixes in 9.2 branch so they make it into 9.2 release? eb7c8c7fb6e49a04f3fe84a6d438160dc4a14ac0 f043381334a0760ec118d07b6fb7785b5692572a de917b4c4c4dfc949d5f8e3d9eb2dd48b63a3de5 6d2a9220b832d9a0c0cf35fcc5b9de1542af267d 5ffa28df4e4cc22481b4ed41c78632f35765f41d It looks like this are all on the release branch already. If they're on the branch, they'll be in the release. ___ mesa-stable mailing list mesa-sta...@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-stable ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965/fs: Fix Sandybridge regressions from SEL optimization.
On Tue, Aug 13, 2013 at 4:47 PM, Kenneth Graunke kenn...@whitecape.org wrote: Sandybridge is the only platform that supports an IF instruction with an embedded comparison. In this case, we need to emit a CMP to go along with the SEL. Fixes regressions in Piglit's glsl-fs-atan-3, fs-unpackHalf2x16, fs-faceforward-float-float-float, isinf-and-isnan fs_basic, and isinf-and-isnan fs_fbo. Signed-off-by: Kenneth Graunke kenn...@whitecape.org --- src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 17 + 1 file changed, 13 insertions(+), 4 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp index a36c248..984b08a 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp @@ -1911,10 +1911,19 @@ fs_visitor::try_replace_with_sel() emit(MOV(src0, then_mov-src[0])); } - fs_inst *sel = emit(BRW_OPCODE_SEL, then_mov-dst, src0, else_mov-src[0]); - sel-predicate = if_inst-predicate; - sel-predicate_inverse = if_inst-predicate_inverse; - sel-conditional_mod = if_inst-conditional_mod; + fs_inst *sel; + if (if_inst-conditional_mod) { + /* Sandybridge-specific IF with embedded comparison */ + emit(CMP(reg_null_d, if_inst-src[0], if_inst-src[1], + if_inst-conditional_mod)); + sel = emit(BRW_OPCODE_SEL, then_mov-dst, src0, else_mov-src[0]); + sel-predicate = BRW_PREDICATE_NORMAL; + } else { + /* Separate CMP and IF instructions */ + sel = emit(BRW_OPCODE_SEL, then_mov-dst, src0, else_mov-src[0]); + sel-predicate = if_inst-predicate; + sel-predicate_inverse = if_inst-predicate_inverse; + } } } -- 1.8.3.4 Reviewed-by: Matt Turner matts...@gmail.com ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 13/15] i965/fs: Fully recompute liveout at each step.
On 12 August 2013 13:11, Kenneth Graunke kenn...@whitecape.org wrote: Since we start with an overestimation of livein (0x), successive steps may should actually take away values. This means we can't simply Is may should a Southernism? In any case, Reviewed-by: Paul Berry stereotype...@gmail.com OR in new liveout values; we need to recompute it from scratch at each iteration of the fixed-point algorithm. Signed-off-by: Kenneth Graunke kenn...@whitecape.org --- src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp index f5c8e4a..9522649 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp @@ -219,7 +219,7 @@ fs_copy_prop_dataflow::run() for (int i = 0; i bitset_words; i++) { const BITSET_WORD old_liveout = bd[b].liveout[i]; -bd[b].liveout[i] |= +bd[b].liveout[i] = bd[b].copy[i] | (bd[b].livein[i] ~bd[b].kill[i]); if (old_liveout != bd[b].liveout[i]) -- 1.8.3.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] i965 global copy propagation fixes
On 12 August 2013 13:11, Kenneth Graunke kenn...@whitecape.org wrote: Hello, This surprisingly large series fixes bugs in the data flow algorithm for copy propagation. As far as I can tell, there were a myriad of issues. The updated algorithm seems to follow the textbook and online materials much more closely. No Piglit regressions on Ivybridge. shader-db statistics are more or less a wash: total instructions in shared programs: 1569184 - 1569423 (0.02%) instructions in affected programs: 44961 - 45200 (0.53%) A bunch of things are helped by a small margin, while other things are hurt by a small margin (a couple of extra MOVs). The few hurt shaders I looked into appeared to suffer from compute-to-MRF not handling control flow. There may be other reasons as well. I'd still like to land it, though, as the old algorithm seems to be pretty broken. --Ken I sent comments on patch 2, 7, and 13. The remainder are: Reviewed-by: Paul Berry stereotype...@gmail.com ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965/fs: Fix Sandybridge regressions from SEL optimization.
On Tue, Aug 13, 2013 at 5:57 PM, Kenneth Graunke kenn...@whitecape.org wrote: On 08/13/2013 05:49 PM, Ian Romanick wrote: On 08/13/2013 04:47 PM, Kenneth Graunke wrote: Sandybridge is the only platform that supports an IF instruction with an embedded comparison. In this case, we need to emit a CMP to go along with the SEL. Fixes regressions in Piglit's glsl-fs-atan-3, fs-unpackHalf2x16, fs-faceforward-float-float-float, isinf-and-isnan fs_basic, and isinf-and-isnan fs_fbo. Signed-off-by: Kenneth Graunke kenn...@whitecape.org --- src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 17 + 1 file changed, 13 insertions(+), 4 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp index a36c248..984b08a 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp @@ -1911,10 +1911,19 @@ fs_visitor::try_replace_with_sel() emit(MOV(src0, then_mov-src[0])); } - fs_inst *sel = emit(BRW_OPCODE_SEL, then_mov-dst, src0, else_mov-src[0]); - sel-predicate = if_inst-predicate; - sel-predicate_inverse = if_inst-predicate_inverse; - sel-conditional_mod = if_inst-conditional_mod; + fs_inst *sel; + if (if_inst-conditional_mod) { + /* Sandybridge-specific IF with embedded comparison */ This doesn't appear to be SNB-specific code. Can you explain this? Is if_inst-conditional_mod only set on SNB? I really need to learn more about the back end... Normally, control flow looks like: cmp.l.f0(8) null g58,8,1F0F (+f0) if(8) For Sandybridge, the hardware designers extended IF to support built-in comparisons, so you can simply do: if.l(8)g58,8,1F0F They immediately dropped this with Ivybridge; it's not been present on any other platform. fs_inst::conditional_mod represents that conditional modifier (always = 0, never, less, equal, lequal, greater, notequal, gequal). Thanks for explaining Ken. Reviewed-by: Anuj Phogat anuj.pho...@gmail.com ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [Mesa-stable] Bison 3 fixes
On 16.8.2013 0:20, Ian Romanick wrote: On 08/15/2013 06:55 AM, Armin K. wrote: Can we have Bison 3 fixes in 9.2 branch so they make it into 9.2 release? eb7c8c7fb6e49a04f3fe84a6d438160dc4a14ac0 f043381334a0760ec118d07b6fb7785b5692572a de917b4c4c4dfc949d5f8e3d9eb2dd48b63a3de5 6d2a9220b832d9a0c0cf35fcc5b9de1542af267d 5ffa28df4e4cc22481b4ed41c78632f35765f41d It looks like this are all on the release branch already. If they're on the branch, they'll be in the release. ___ mesa-stable mailing list mesa-sta...@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-stable Last one is missing. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965/fs: Fix Sandybridge regressions from SEL optimization.
On 08/13/2013 05:57 PM, Kenneth Graunke wrote: On 08/13/2013 05:49 PM, Ian Romanick wrote: On 08/13/2013 04:47 PM, Kenneth Graunke wrote: Sandybridge is the only platform that supports an IF instruction with an embedded comparison. In this case, we need to emit a CMP to go along with the SEL. Fixes regressions in Piglit's glsl-fs-atan-3, fs-unpackHalf2x16, fs-faceforward-float-float-float, isinf-and-isnan fs_basic, and isinf-and-isnan fs_fbo. Signed-off-by: Kenneth Graunke kenn...@whitecape.org --- src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 17 + 1 file changed, 13 insertions(+), 4 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp index a36c248..984b08a 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp @@ -1911,10 +1911,19 @@ fs_visitor::try_replace_with_sel() emit(MOV(src0, then_mov-src[0])); } - fs_inst *sel = emit(BRW_OPCODE_SEL, then_mov-dst, src0, else_mov-src[0]); - sel-predicate = if_inst-predicate; - sel-predicate_inverse = if_inst-predicate_inverse; - sel-conditional_mod = if_inst-conditional_mod; + fs_inst *sel; + if (if_inst-conditional_mod) { + /* Sandybridge-specific IF with embedded comparison */ This doesn't appear to be SNB-specific code. Can you explain this? Is if_inst-conditional_mod only set on SNB? I really need to learn more about the back end... Normally, control flow looks like: cmp.l.f0(8) null g58,8,1F0F (+f0) if(8) For Sandybridge, the hardware designers extended IF to support built-in comparisons, so you can simply do: if.l(8)g58,8,1F0F They immediately dropped this with Ivybridge; it's not been present on any other platform. fs_inst::conditional_mod represents that conditional modifier (always = 0, never, less, equal, lequal, greater, notequal, gequal). In other words... yes to Is if_inst-conditional_mod only set on SNB? :) Reviewed-by: Ian Romanick ian.d.roman...@intel.com ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/2] i965: Emit MOVs for neg/abs.
On 12 August 2013 13:18, Matt Turner matts...@gmail.com wrote: Necessary to avoid combining a bitcast and a modifier into a single operation. Otherwise if safe, the MOV should be removed by copy-propagation or register coalescing. --- src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 4 ++-- src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-) This patch is: Reviewed-by: Paul Berry stereoytpe...@gmail.com diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp index ee7728c..fa4554b 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp @@ -361,12 +361,12 @@ fs_visitor::visit(ir_expression *ir) break; case ir_unop_neg: op[0].negate = !op[0].negate; - this-result = op[0]; + emit(MOV(this-result, op[0])); break; case ir_unop_abs: op[0].abs = true; op[0].negate = false; - this-result = op[0]; + emit(MOV(this-result, op[0])); break; case ir_unop_sign: temp = fs_reg(this, ir-type); diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp index 8d4a5d4..05c0091 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp @@ -1391,12 +1391,12 @@ vec4_visitor::visit(ir_expression *ir) break; case ir_unop_neg: op[0].negate = !op[0].negate; - this-result = op[0]; + emit(MOV(result_dst, op[0])); break; case ir_unop_abs: op[0].abs = true; op[0].negate = false; - this-result = op[0]; + emit(MOV(result_dst, op[0])); break; case ir_unop_sign: -- 1.8.3.2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/2] i965: Don't copy propagate bitcasts with source modifiers.
On 12 August 2013 13:18, Matt Turner matts...@gmail.com wrote: --- src/mesa/drivers/dri/i965/brw_fs.cpp| 13 + src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp | 3 +++ src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp | 10 ++ 3 files changed, 22 insertions(+), 4 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index a81e97f..f104f8c 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -2118,6 +2118,19 @@ fs_visitor::register_coalesce() } } + if (has_source_modifiers) { +for (int i = 0; i 3; i++) { + if (scan_inst-src[i].file == GRF + scan_inst-src[i].reg == inst-dst.reg + scan_inst-src[i].reg_offset == inst-dst.reg_offset + inst-dst.type != scan_inst-src[i].type) + { + return false; + } +} + } + + I don't understand the whole patch, but I'm pretty sure it's wrong to return false in this case, because that will exit the entire fs_visitor::register_coalesce() function. /* The gen6 MATH instruction can't handle source modifiers or * unusual register regions, so avoid coalescing those for * now. We should do something more specific. diff --git a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp index 234f8bd..a5cd858 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp @@ -221,6 +221,9 @@ fs_visitor::try_copy_propagate(fs_inst *inst, int arg, acp_entry *entry) entry-src.smear != -1) !can_do_source_mods(inst)) return false; + if (has_source_modifiers (entry-dst.type != inst-src[arg].type)) + return false; + inst-src[arg].file = entry-src.file; inst-src[arg].reg = entry-src.reg; inst-src[arg].reg_offset = entry-src.reg_offset; diff --git a/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp b/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp index c28d0de..2a2d403 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp @@ -206,14 +206,16 @@ vec4_visitor::try_copy_propagation(vec4_instruction *inst, int arg, if (inst-src[arg].negate) value.negate = !value.negate; - bool has_source_modifiers = (value.negate || value.abs || -value.swizzle != BRW_SWIZZLE_XYZW || -value.file == UNIFORM); + bool has_source_modifiers = value.negate || value.abs; /* gen6 math and gen7+ SENDs from GRFs ignore source modifiers on * instructions. */ - if (has_source_modifiers !can_do_source_mods(inst)) + if ((has_source_modifiers || value.file == UNIFORM || +value.swizzle != BRW_SWIZZLE_XYZW) !can_do_source_mods(inst)) + return false; + + if (has_source_modifiers (value.type != inst-src[arg].type)) return false; bool is_3src_inst = (inst-opcode == BRW_OPCODE_LRP || -- 1.8.3.2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/2] i965: Don't copy propagate bitcasts with source modifiers.
Previously, copy propagation would cause bitcast_f2u(abs(float)) to be performed in a single step, but the application of source modifiers (abs, neg) happens after type conversion, leading to incorrect results. That is, for bitcast_f2u(abs(float)) we would in fact generate code to do abs(bitcast_f2u(float)). For example, whereas bitcast_f2u(abs(float)) might result in a register argument such as (abs)g2.20,1,0UD v2: Set interfered = true and break in register_coalesce instead of returning false. --- src/mesa/drivers/dri/i965/brw_fs.cpp| 14 ++ src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp | 3 +++ src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp | 10 ++ 3 files changed, 23 insertions(+), 4 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index 69e544a..d111cbd 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -2118,6 +2118,20 @@ fs_visitor::register_coalesce() } } + if (has_source_modifiers) { +for (int i = 0; i 3; i++) { + if (scan_inst-src[i].file == GRF + scan_inst-src[i].reg == inst-dst.reg + scan_inst-src[i].reg_offset == inst-dst.reg_offset + inst-dst.type != scan_inst-src[i].type) + { + interfered = true; + break; + } +} + } + + /* The gen6 MATH instruction can't handle source modifiers or * unusual register regions, so avoid coalescing those for * now. We should do something more specific. diff --git a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp index 234f8bd..61b2617 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp @@ -221,6 +221,9 @@ fs_visitor::try_copy_propagate(fs_inst *inst, int arg, acp_entry *entry) entry-src.smear != -1) !can_do_source_mods(inst)) return false; + if (has_source_modifiers entry-dst.type != inst-src[arg].type) + return false; + inst-src[arg].file = entry-src.file; inst-src[arg].reg = entry-src.reg; inst-src[arg].reg_offset = entry-src.reg_offset; diff --git a/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp b/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp index c28d0de..fdbe96c 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp @@ -206,14 +206,16 @@ vec4_visitor::try_copy_propagation(vec4_instruction *inst, int arg, if (inst-src[arg].negate) value.negate = !value.negate; - bool has_source_modifiers = (value.negate || value.abs || -value.swizzle != BRW_SWIZZLE_XYZW || -value.file == UNIFORM); + bool has_source_modifiers = value.negate || value.abs; /* gen6 math and gen7+ SENDs from GRFs ignore source modifiers on * instructions. */ - if (has_source_modifiers !can_do_source_mods(inst)) + if ((has_source_modifiers || value.file == UNIFORM || +value.swizzle != BRW_SWIZZLE_XYZW) !can_do_source_mods(inst)) + return false; + + if (has_source_modifiers value.type != inst-src[arg].type) return false; bool is_3src_inst = (inst-opcode == BRW_OPCODE_LRP || -- 1.8.3.2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] gallivm: implement better control of per-quad/per-element/scalar lod
From: Roland Scheidegger srol...@vmware.com There's a new debug value used to disable per-quad lod optimizations in fragment shader (ignored for vs/gs as the results are just too wrong typically). Also trying to detect if a supplied lod value is really a scalar (if it's coming from immediate or constant file) in which case sampler code can use this to stay on per-quad-lod path (in fact for explicit lod could simplify even further and use same lod for both quads in the avx case but this is not implemented yet). Still need to actually implement per-element lod bias (and derivatives), and need to handle per-element lod in size queries. --- src/gallium/auxiliary/draw/draw_llvm_sample.c |8 +- src/gallium/auxiliary/gallivm/lp_bld_debug.h |3 +- src/gallium/auxiliary/gallivm/lp_bld_init.c |1 + src/gallium/auxiliary/gallivm/lp_bld_sample.h | 11 +- src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c |8 +- src/gallium/auxiliary/gallivm/lp_bld_tgsi.h |5 +- src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c | 158 - src/gallium/drivers/llvmpipe/lp_tex_sample.c |8 +- 8 files changed, 147 insertions(+), 55 deletions(-) diff --git a/src/gallium/auxiliary/draw/draw_llvm_sample.c b/src/gallium/auxiliary/draw/draw_llvm_sample.c index 97b0255..a6341fa 100644 --- a/src/gallium/auxiliary/draw/draw_llvm_sample.c +++ b/src/gallium/auxiliary/draw/draw_llvm_sample.c @@ -238,7 +238,7 @@ draw_llvm_sampler_soa_emit_fetch_texel(const struct lp_build_sampler_soa *base, const struct lp_derivatives *derivs, LLVMValueRef lod_bias, /* optional */ LLVMValueRef explicit_lod, /* optional */ - boolean scalar_lod, + enum lp_sampler_lod_property lod_property, LLVMValueRef *texel) { struct draw_llvm_sampler_soa *sampler = (struct draw_llvm_sampler_soa *)base; @@ -257,7 +257,7 @@ draw_llvm_sampler_soa_emit_fetch_texel(const struct lp_build_sampler_soa *base, coords, offsets, derivs, - lod_bias, explicit_lod, scalar_lod, + lod_bias, explicit_lod, lod_property, texel); } @@ -272,7 +272,7 @@ draw_llvm_sampler_soa_emit_size_query(const struct lp_build_sampler_soa *base, unsigned texture_unit, unsigned target, boolean is_sviewinfo, - boolean scalar_lod, + enum lp_sampler_lod_property lod_property, LLVMValueRef explicit_lod, /* optional */ LLVMValueRef *sizes_out) { @@ -287,7 +287,7 @@ draw_llvm_sampler_soa_emit_size_query(const struct lp_build_sampler_soa *base, texture_unit, target, is_sviewinfo, - scalar_lod, + lod_property, explicit_lod, sizes_out); } diff --git a/src/gallium/auxiliary/gallivm/lp_bld_debug.h b/src/gallium/auxiliary/gallivm/lp_bld_debug.h index 4f38edf..76c39af 100644 --- a/src/gallium/auxiliary/gallivm/lp_bld_debug.h +++ b/src/gallium/auxiliary/gallivm/lp_bld_debug.h @@ -43,7 +43,8 @@ #define GALLIVM_DEBUG_PERF (1 4) #define GALLIVM_DEBUG_NO_BRILINEAR (1 5) #define GALLIVM_DEBUG_NO_RHO_APPROX (1 6) -#define GALLIVM_DEBUG_GC(1 7) +#define GALLIVM_DEBUG_NO_QUAD_LOD (1 7) +#define GALLIVM_DEBUG_GC(1 8) #ifdef __cplusplus diff --git a/src/gallium/auxiliary/gallivm/lp_bld_init.c b/src/gallium/auxiliary/gallivm/lp_bld_init.c index e4cc058..61eadb8 100644 --- a/src/gallium/auxiliary/gallivm/lp_bld_init.c +++ b/src/gallium/auxiliary/gallivm/lp_bld_init.c @@ -80,6 +80,7 @@ static const struct debug_named_value lp_bld_debug_flags[] = { { perf, GALLIVM_DEBUG_PERF, NULL }, { no_brilinear, GALLIVM_DEBUG_NO_BRILINEAR, NULL }, { no_rho_approx, GALLIVM_DEBUG_NO_RHO_APPROX, NULL }, + { no_quad_lod, GALLIVM_DEBUG_NO_QUAD_LOD, NULL }, { gc, GALLIVM_DEBUG_GC, NULL }, DEBUG_NAMED_VALUE_END }; diff --git a/src/gallium/auxiliary/gallivm/lp_bld_sample.h b/src/gallium/auxiliary/gallivm/lp_bld_sample.h index 6d8fe88..a6901d1 100644 --- a/src/gallium/auxiliary/gallivm/lp_bld_sample.h +++ b/src/gallium/auxiliary/gallivm/lp_bld_sample.h @@ -61,6 +61,13 @@ struct lp_derivatives }; +enum lp_sampler_lod_property { + lp_sampler_lod_scalar, + lp_sampler_lod_perelement, + lp_sampler_lod_perquad +}; + + /** * Texture static state. * @@ -476,7 +483,7
[Mesa-dev] [PATCH 1/2] radeonsi: add FMASK texture binding slots and resource setup (v2)
v2: bind FMASK textures to shader resource slots 16..31 --- This depends on the patch: radeonsi: add flexible shader descriptor management and use it for sampler views src/gallium/drivers/radeonsi/r600_resource.h | 1 + src/gallium/drivers/radeonsi/r600_texture.c | 1 + src/gallium/drivers/radeonsi/radeonsi_pipe.h | 1 + src/gallium/drivers/radeonsi/si_descriptors.c | 5 ++- src/gallium/drivers/radeonsi/si_state.c | 52 +++ src/gallium/drivers/radeonsi/si_state.h | 10 -- 6 files changed, 67 insertions(+), 3 deletions(-) diff --git a/src/gallium/drivers/radeonsi/r600_resource.h b/src/gallium/drivers/radeonsi/r600_resource.h index e5dd36a..ab5c7b7 100644 --- a/src/gallium/drivers/radeonsi/r600_resource.h +++ b/src/gallium/drivers/radeonsi/r600_resource.h @@ -44,6 +44,7 @@ struct r600_fmask_info { unsigned offset; unsigned size; unsigned alignment; + unsigned pitch; unsigned bank_height; unsigned slice_tile_max; unsigned tile_mode_index; diff --git a/src/gallium/drivers/radeonsi/r600_texture.c b/src/gallium/drivers/radeonsi/r600_texture.c index 59e3604..1507239 100644 --- a/src/gallium/drivers/radeonsi/r600_texture.c +++ b/src/gallium/drivers/radeonsi/r600_texture.c @@ -463,6 +463,7 @@ static void r600_texture_get_fmask_info(struct r600_screen *rscreen, out-slice_tile_max -= 1; out-tile_mode_index = fmask.tiling_index[0]; + out-pitch = fmask.level[0].nblk_x; out-bank_height = fmask.bankh; out-alignment = MAX2(256, fmask.bo_alignment); out-size = fmask.bo_size; diff --git a/src/gallium/drivers/radeonsi/radeonsi_pipe.h b/src/gallium/drivers/radeonsi/radeonsi_pipe.h index 147368c..27913ed 100644 --- a/src/gallium/drivers/radeonsi/radeonsi_pipe.h +++ b/src/gallium/drivers/radeonsi/radeonsi_pipe.h @@ -83,6 +83,7 @@ struct si_pipe_sampler_view { struct pipe_sampler_viewbase; struct si_resource *resource; uint32_tstate[8]; + uint32_tfmask_state[8]; }; struct si_pipe_sampler_state { diff --git a/src/gallium/drivers/radeonsi/si_descriptors.c b/src/gallium/drivers/radeonsi/si_descriptors.c index f05c8f4..e171f8a 100644 --- a/src/gallium/drivers/radeonsi/si_descriptors.c +++ b/src/gallium/drivers/radeonsi/si_descriptors.c @@ -115,6 +115,9 @@ static void si_init_descriptors(struct r600_context *rctx, { uint64_t va; + assert(num_elements = sizeof(desc-enabled_mask)*8); + assert(num_elements = sizeof(desc-dirty_mask)*8); + desc-atom.emit = emit_func; desc-shader_userdata_reg = shader_userdata_reg; desc-element_dw_size = element_dw_size; @@ -263,7 +266,7 @@ static void si_init_sampler_views(struct r600_context *rctx, si_init_descriptors(rctx, views-desc, si_get_shader_user_data_base(shader) + SI_SGPR_RESOURCE * 4, - 8, 16, si_emit_sampler_views); + 8, NUM_SAMPLER_VIEWS, si_emit_sampler_views); } static void si_release_sampler_views(struct si_sampler_views *views) diff --git a/src/gallium/drivers/radeonsi/si_state.c b/src/gallium/drivers/radeonsi/si_state.c index b86872b..77a4339 100644 --- a/src/gallium/drivers/radeonsi/si_state.c +++ b/src/gallium/drivers/radeonsi/si_state.c @@ -2705,6 +2705,44 @@ static struct pipe_sampler_view *si_create_sampler_view(struct pipe_context *ctx view-state[6] = 0; view-state[7] = 0; + /* Initialize the sampler view for FMASK. */ + if (tmp-fmask.size) { + uint64_t va = r600_resource_va(ctx-screen, texture) + tmp-fmask.offset; + uint32_t fmask_format; + + switch (texture-nr_samples) { + case 2: + fmask_format = V_008F14_IMG_DATA_FORMAT_FMASK8_S2_F2; + break; + case 4: + fmask_format = V_008F14_IMG_DATA_FORMAT_FMASK8_S4_F4; + break; + case 8: + fmask_format = V_008F14_IMG_DATA_FORMAT_FMASK32_S8_F8; + break; + default: + assert(0); + } + + view-fmask_state[0] = va 8; + view-fmask_state[1] = S_008F14_BASE_ADDRESS_HI(va 40) | + S_008F14_DATA_FORMAT(fmask_format) | + S_008F14_NUM_FORMAT(V_008F14_IMG_NUM_FORMAT_UINT); + view-fmask_state[2] = S_008F18_WIDTH(width - 1) | + S_008F18_HEIGHT(height - 1); + view-fmask_state[3] = S_008F1C_DST_SEL_X(V_008F1C_SQ_SEL_X) | + S_008F1C_DST_SEL_Y(V_008F1C_SQ_SEL_X) | +
[Mesa-dev] [PATCH 2/2] radeonsi: implement texture fetching for compressed MSAA textures (v2)
v2: use resource slots 16..31 for FMASK textures --- src/gallium/drivers/radeonsi/radeonsi_shader.c | 121 - 1 file changed, 116 insertions(+), 5 deletions(-) diff --git a/src/gallium/drivers/radeonsi/radeonsi_shader.c b/src/gallium/drivers/radeonsi/radeonsi_shader.c index cbe10cc..3e62858 100644 --- a/src/gallium/drivers/radeonsi/radeonsi_shader.c +++ b/src/gallium/drivers/radeonsi/radeonsi_shader.c @@ -915,6 +915,12 @@ handle_semantic: /* ctx-shader-output[i].spi_sid = r600_spi_sid(ctx-shader-output[i]);*/ } +static const struct lp_build_tgsi_action txf_action; + +static void build_tex_intrinsic(const struct lp_build_tgsi_action * action, + struct lp_build_tgsi_context * bld_base, + struct lp_build_emit_data * emit_data); + static void tex_fetch_args( struct lp_build_tgsi_context * bld_base, struct lp_build_emit_data * emit_data) @@ -924,9 +930,11 @@ static void tex_fetch_args( const struct tgsi_full_instruction * inst = emit_data-inst; unsigned opcode = inst-Instruction.Opcode; unsigned target = inst-Texture.Texture; - unsigned sampler_src; + unsigned sampler_src, sampler_index; LLVMValueRef coords[4]; LLVMValueRef address[16]; + LLVMValueRef sample_index_rewrite = NULL; + LLVMValueRef sample_chan = NULL; int ref_pos; unsigned num_coords = tgsi_util_get_texture_coord_dim(target, ref_pos); unsigned count = 0; @@ -1021,9 +1029,99 @@ static void tex_fetch_args( } sampler_src = emit_data-inst-Instruction.NumSrcRegs - 1; + sampler_index = emit_data-inst-Src[sampler_src].Register.Index; + + /* Adjust the sample index according to FMASK. +* +* For uncompressed MSAA surfaces, FMASK should return 0x76543210, +* which is the identity mapping. Each nibble says which physical sample +* should be fetched to get that sample. +* +* For example, 0x1100 means there are only 2 samples stored and +* the second sample covers 3/4 of the pixel. When reading samples 0 +* and 1, return physical sample 0 (determined by the first two 0s +* in FMASK), otherwise return physical sample 1. +* +* The sample index should be adjusted as follows: +* sample_index = (fmask (sample_index * 4)) 0xF; +*/ + if (target == TGSI_TEXTURE_2D_MSAA || + target == TGSI_TEXTURE_2D_ARRAY_MSAA) { + struct lp_build_context *uint_bld = bld_base-uint_bld; + struct lp_build_emit_data txf_emit_data = *emit_data; + LLVMValueRef txf_address[16]; + unsigned txf_count = count; + + memcpy(txf_address, address, sizeof(address)); + + /* Pad to a power-of-two size. */ + while (txf_count util_next_power_of_two(txf_count)) + txf_address[txf_count++] = LLVMGetUndef(LLVMInt32TypeInContext(gallivm-context)); + + /* Read FMASK using TXF. */ + txf_emit_data.chan = 0; + txf_emit_data.dst_type = LLVMVectorType( + LLVMInt32TypeInContext(bld_base-base.gallivm-context), 4); + txf_emit_data.args[0] = lp_build_gather_values(gallivm, txf_address, txf_count); + txf_emit_data.args[1] = si_shader_ctx-resources[FMASK_TEX_OFFSET + sampler_index]; + txf_emit_data.args[2] = lp_build_const_int32(bld_base-base.gallivm, target); + txf_emit_data.arg_count = 3; + + build_tex_intrinsic(txf_action, bld_base, txf_emit_data); + + /* Initialize some constants. */ + if (target == TGSI_TEXTURE_2D_MSAA) { + sample_chan = LLVMConstInt(uint_bld-elem_type, 2, 0); + } else { + sample_chan = LLVMConstInt(uint_bld-elem_type, 3, 0); + } + + LLVMValueRef four = LLVMConstInt(uint_bld-elem_type, 4, 0); + LLVMValueRef F = LLVMConstInt(uint_bld-elem_type, 0xF, 0); + + /* Apply the formula. */ + LLVMValueRef fmask = + LLVMBuildExtractElement(gallivm-builder, + txf_emit_data.output[0], + uint_bld-zero, ); + + LLVMValueRef sample_index = + LLVMBuildExtractElement(gallivm-builder, + txf_emit_data.args[0], + sample_chan, ); + + LLVMValueRef sample_index4 = + LLVMBuildMul(gallivm-builder, sample_index, four, ); + + LLVMValueRef shifted_fmask = + LLVMBuildLShr(gallivm-builder, fmask, sample_index4, ); + + LLVMValueRef
Re: [Mesa-dev] [PATCH 1/2] i965: Emit MOVs for neg/abs.
On 08/12/2013 01:18 PM, Matt Turner wrote: Necessary to avoid combining a bitcast and a modifier into a single operation. Otherwise if safe, the MOV should be removed by copy-propagation or register coalescing. Has that been verified with shaderdb? --- src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 4 ++-- src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp index ee7728c..fa4554b 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp @@ -361,12 +361,12 @@ fs_visitor::visit(ir_expression *ir) break; case ir_unop_neg: op[0].negate = !op[0].negate; - this-result = op[0]; + emit(MOV(this-result, op[0])); break; case ir_unop_abs: op[0].abs = true; op[0].negate = false; - this-result = op[0]; + emit(MOV(this-result, op[0])); break; case ir_unop_sign: temp = fs_reg(this, ir-type); diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp index 8d4a5d4..05c0091 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp @@ -1391,12 +1391,12 @@ vec4_visitor::visit(ir_expression *ir) break; case ir_unop_neg: op[0].negate = !op[0].negate; - this-result = op[0]; + emit(MOV(result_dst, op[0])); break; case ir_unop_abs: op[0].abs = true; op[0].negate = false; - this-result = op[0]; + emit(MOV(result_dst, op[0])); break; case ir_unop_sign: ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 67925] ARB_vertex_array_bgra api-errors test fails when normalized is false
https://bugs.freedesktop.org/show_bug.cgi?id=67925 Kenneth Graunke kenn...@whitecape.org changed: What|Removed |Added Status|NEW |ASSIGNED Assignee|mesa-dev@lists.freedesktop. |kenn...@whitecape.org |org | QA Contact||e...@anholt.net Product|Mesa|piglit Version|git |unspecified Component|Mesa core |tests Blocks|67224 | --- Comment #2 from Kenneth Graunke kenn...@whitecape.org --- I picked Fredrik's version as I liked it best. It's now pushed to master as: commit 0e7a61a29f883c63a5439ac16eddeba3aaf4 Author: Fredrik Höglund fred...@kde.org Date: Fri Apr 12 17:36:06 2013 +0200 mesa: Update the BGRA vertex array error handling The error code was changed from INVALID_VALUE to INVALID_OPERATION in OpenGL 3.3. We should also generate an error when size is BGRA and normalized is FALSE. Reviewed-by: Kenneth Graunke kenn...@whitecape.org Since this uses INVALID_OPERATION, it remains to change the tests. I've posted patches for this: http://lists.freedesktop.org/archives/piglit/2013-August/006774.html http://lists.freedesktop.org/archives/piglit/2013-August/006775.html In the meantime, I've changed this bug to Piglit and removed it from the Mesa 9.2 release tracker bug (as it isn't a Mesa bug now). -- You are receiving this mail because: You are the assignee for the bug. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/4] glsl: Drop duplicate error messages.
On Thursday, August 15, 2013 11:27:14 AM Matt Turner wrote: This same message is printed in the validate_matrix_layout_for_type function. --- src/glsl/ast_to_hir.cpp | 13 + 1 file changed, 1 insertion(+), 12 deletions(-) diff --git a/src/glsl/ast_to_hir.cpp b/src/glsl/ast_to_hir.cpp index 9bc713d..c9ffa30 100644 --- a/src/glsl/ast_to_hir.cpp +++ b/src/glsl/ast_to_hir.cpp @@ -2275,13 +2275,7 @@ apply_type_qualifier_to_variable(const struct ast_type_qualifier *qual, } if (qual-flags.q.row_major || qual-flags.q.column_major) { - if (!ubo_qualifiers_allowed) { - _mesa_glsl_error(loc, state, - uniform block layout qualifiers row_major and - column_major can only be applied to uniform block - members); - } else - validate_matrix_layout_for_type(state, loc, var-type); + validate_matrix_layout_for_type(state, loc, var-type); Maybe I'm blind, but I don't see where row_major/column_major are only allowed for uniform blocks, as opposed to other interface blocks or variables in general. I thought that was part of what ubo_qualifiers_valid did, but clearly it doesn't. Regardless, assuming you fix the typo in patch 4's error message... This series is: Reviewed-by: Kenneth Graunke kenn...@whitecape.org ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 3/3] i965: Improve comments for driver hooks in intel_buffer_object.c.
On 14 August 2013 12:07, Kenneth Graunke kenn...@whitecape.org wrote: Consistently using a The ___ driver hook. line at the the top of each function's comment block makes it easy to see at a glance what function is being implemented. Signed-off-by: Kenneth Graunke kenn...@whitecape.org Series is: Reviewed-by: Paul Berry stereotype...@gmail.com ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] Direct3D 9 state tracker
Hello, one day I bought Mass Effect 3, but currently I'm not really able to play it on my Linux laptop featuring 3.0 Mesa 9.1.4. Gallium 0.4 on AMD CEDAR I tried the fglrx driver, but it gives more problems than it solves (yet the fps is there). With the radeon driver I manage to get to an ALMOST decent frame rate, meaning that is barely playable. In conclusion I would like to try this new D3D9, but would somebody be so kind to drop me 10 lines as a guideline to do it? Thanks in advance, Carlo. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] nouveau/video: use correct parameter name
Am 15.08.2013 14:54, schrieb Emil Velikov: On 15/08/13 13:41, Rico Schüller wrote: On 15.08.2013 02:10, Emil Velikov wrote: Fix a typo introduced with commit d1ba1055d9 - vl: Add support for max level query v2 Cc: Rico Schüller kgbric...@web.de Cc: Christian König christian.koe...@amd.com Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68126 Signed-off-by: Emil Velikov emil.l.veli...@gmail.com --- FWIW the original patch could have introduced default params for all profiles, and let the individual driver provide their own function to ease duplication. A call to get_params(VIDEO_CAP_SUPPORTED) ensures that we do not request the CAP_MAX_LEVEL of a unsupported profile And what are the correct default values? Are these the values which most hardware can do (at this time) or the maximum values? (for reference only for H.264 see: http://en.wikipedia.org/wiki/H.264#Levels ) It's pretty much likely that some video decoding units may have different supported levels and then we would add that back in again. I'm not sure this is really better. Does something like the attached meet your needs? I was thinking that the ones used by most hardware can be considered default. Anyway all this is a silly bikeshedding, which I could have omitted. Patch looks good, thanks. Only for the shader based MPEG2 implementation a sane fallback is needed, so the original implementation is actually already fine, cause if anybody things the levels should be different for a specific driver can just change that driver. Christian. Emil Thanks for the fix. Cheers Rico src/gallium/drivers/nouveau/nouveau_video.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/gallium/drivers/nouveau/nouveau_video.c b/src/gallium/drivers/nouveau/nouveau_video.c index 1563b22..5c4ec0f 100644 --- a/src/gallium/drivers/nouveau/nouveau_video.c +++ b/src/gallium/drivers/nouveau/nouveau_video.c @@ -863,7 +863,7 @@ nouveau_screen_get_video_param(struct pipe_screen *pscreen, case PIPE_VIDEO_CAP_SUPPORTS_PROGRESSIVE: return true; case PIPE_VIDEO_CAP_MAX_LEVEL: - return vl_level_supported(screen, profile); + return vl_level_supported(pscreen, profile); default: debug_printf(unknown video param: %d\n, param); return 0; ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] Direct3D 9 state tracker [amended]
Hello, one day I bought Mass Effect 3, but currently I'm not really able to play it on my Linux laptop featuring 3.0 Mesa 9.1.4. Gallium 0.4 on AMD CEDAR I tried the fglrx driver, but it gives more problems than it solves (yet the fps is there). With the radeon driver I manage to get to an ALMOST decent frame rate, meaning that is barely playable. In conclusion I would like to try this new D3D9, but would somebody be so kind to drop me 10 lines as a guideline to do it? Plese consider that Mass Effect 3 appear to run only on wine 5.8 i386 (my laptop is 64 bit), so I should patch that very version. Thanks in advance, Carlo 2013/8/15 Carlo Marchiori carlo.marchi...@gmail.com Hello, one day I bought Mass Effect 3, but currently I'm not really able to play it on my Linux laptop featuring 3.0 Mesa 9.1.4. Gallium 0.4 on AMD CEDAR I tried the fglrx driver, but it gives more problems than it solves (yet the fps is there). With the radeon driver I manage to get to an ALMOST decent frame rate, meaning that is barely playable. In conclusion I would like to try this new D3D9, but would somebody be so kind to drop me 10 lines as a guideline to do it? Thanks in advance, Carlo. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 09/10] i965: Shorten sampler loops in precompile key setup.
On 14 August 2013 18:55, Kenneth Graunke kenn...@whitecape.org wrote: Now that we have the number of samplers available, we don't need to iterate over all 16. This should be particularly helpful for vertex shaders. Signed-off-by: Kenneth Graunke kenn...@whitecape.org --- src/mesa/drivers/dri/i965/brw_fs.cpp | 2 +- src/mesa/drivers/dri/i965/brw_vs.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index 69e544a..27263fd 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -3143,7 +3143,7 @@ brw_fs_precompile(struct gl_context *ctx, struct gl_shader_program *prog) key.clamp_fragment_color = ctx-API == API_OPENGL_COMPAT; - for (int i = 0; i MAX_SAMPLERS; i++) { + for (unsigned i = 0; i brw-wm.sampler_count; i++) { Precompile is called during link, before the program is made current. So brw-wm is some other program. I think you need to do this instead: _mesa_fls(fp-Base.SamplersUsed). if (fp-Base.ShadowSamplers (1 i)) { /* Assume DEPTH_TEXTURE_MODE is the default: X, X, X, 1 */ key.tex.swizzles[i] = diff --git a/src/mesa/drivers/dri/i965/brw_vs.c b/src/mesa/drivers/dri/i965/brw_vs.c index 5b8173d..7df93c2 100644 --- a/src/mesa/drivers/dri/i965/brw_vs.c +++ b/src/mesa/drivers/dri/i965/brw_vs.c @@ -544,7 +544,7 @@ brw_vs_precompile(struct gl_context *ctx, struct gl_shader_program *prog) key.base.program_string_id = bvp-id; key.base.clamp_vertex_color = ctx-API == API_OPENGL_COMPAT; - for (int i = 0; i MAX_SAMPLERS; i++) { + for (unsigned i = 0; i brw-vs.sampler_count; i++) { Similarly, this should be _mesa_fls(vp-Base.SamplersUsed). With those fixes, this patch is: Reviewed-by: Paul Berry stereotype...@gmail.com ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 00/10] i965: Separate VS/FS sampler tables.
On 14 August 2013 18:55, Kenneth Graunke kenn...@whitecape.org wrote: Currently, i965 uploads a single SAMPLER_STATE table shared across all shader stages (VS, FS). This series splits it out, uploading a unique table for each stage. I think this may actually fix some bugs with vertex texturing: Piglit's fragment-and-vertex-texturing uses two textures (unit 0 and unit 1), one in each shader. vs-SamplersUsed and fs-SamplersUsed both only have one bit set (bit 0), but vs-SamplerUnits and fs-SamplerUnits map them differently (units 0 and 1). The existing code would select fs-SamplerUnits[0], ignoring vs-SamplerUnits[0]. If the two textures had, say, different wrap modes, this would probably illustrate the problem. It also just seems like a good idea. The border color code in particular is much nicer after this change, as it's not directly tied to brw-wm any longer (even though textures can be used in all shader stages). No observed Piglit changes on Ivybridge. Nice! I think this will make things easier for me in geometry shader land as well. I sent a comment on patch 9. The rest are: Reviewed-by: Paul Berry stereotype...@gmail.com Possible follow-up idea: ctx-vs and ctx-fs now have a lot of elements in common: scratch_bo, const_bo, prog_offset, state_offset, push_const_offset, bind_bo_offset, surf_offset, sampler_count, sampler_offset, and sdc_offset. Let's put those in their own struct, and then we can pass a pointer to that struct to upload_sampler_state_table virtual function. I might should try doing that as part of my upcoming geometry shader back-end series. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev