[Freedreno] [PATCH] drm/msm: avoid double-attaching hdmi/edp bridges

2020-03-11 Thread Ilia Mirkin
Both hdmi and edp are already attached in msm_*_bridge_init. A second
attachment returns -EBUSY, failing the driver load.

Tested with HDMI on IFC6410 (APQ8064 / MDP4), but the eDP case should be
analogous.
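
The failure mode can be sketched in plain C, outside the kernel. This is a toy model and not the DRM API: `toy_encoder`, `toy_bridge`, `toy_bridge_attach` and `toy_modeset_init` are hypothetical names, and the only behavior borrowed from the description above is that attaching an already-attached bridge fails with -EBUSY.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the real DRM structures. */
struct toy_encoder { int id; };
struct toy_bridge { struct toy_encoder *encoder; /* NULL until attached */ };

/* Attach a bridge to an encoder; refuse if the bridge is already attached. */
static int toy_bridge_attach(struct toy_encoder *enc, struct toy_bridge *bridge)
{
    if (!enc || !bridge)
        return -EINVAL;
    if (bridge->encoder)
        return -EBUSY;
    bridge->encoder = enc;
    return 0;
}

/* Models a bridge init that attaches, optionally followed by the
 * redundant second attach that the patch removes. */
static int toy_modeset_init(struct toy_encoder *enc, struct toy_bridge *bridge,
                            int attach_again)
{
    int ret = toy_bridge_attach(enc, bridge);     /* done during bridge init */
    if (ret)
        return ret;
    if (attach_again)
        ret = toy_bridge_attach(enc, bridge);     /* second attach: -EBUSY */
    return ret;
}
```

With `attach_again` set, the toy init fails the same way the driver load did; with it cleared, init succeeds.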

Fixes: 3ef2f119bd3ed (drm/msm: Use drm_attach_bridge() to attach a bridge to an encoder)
Cc: Boris Brezillon 
Signed-off-by: Ilia Mirkin 
---
 drivers/gpu/drm/msm/edp/edp.c   | 4 
 drivers/gpu/drm/msm/hdmi/hdmi.c | 4 
 2 files changed, 8 deletions(-)

diff --git a/drivers/gpu/drm/msm/edp/edp.c b/drivers/gpu/drm/msm/edp/edp.c
index ad4e963ccd9b..106a67473af5 100644
--- a/drivers/gpu/drm/msm/edp/edp.c
+++ b/drivers/gpu/drm/msm/edp/edp.c
@@ -178,10 +178,6 @@ int msm_edp_modeset_init(struct msm_edp *edp, struct 
drm_device *dev,
goto fail;
}
 
-   ret = drm_bridge_attach(encoder, edp->bridge, NULL);
-   if (ret)
-   goto fail;
-
priv->bridges[priv->num_bridges++]   = edp->bridge;
priv->connectors[priv->num_connectors++] = edp->connector;
 
diff --git a/drivers/gpu/drm/msm/hdmi/hdmi.c b/drivers/gpu/drm/msm/hdmi/hdmi.c
index 1a9b6289637d..737453b6e596 100644
--- a/drivers/gpu/drm/msm/hdmi/hdmi.c
+++ b/drivers/gpu/drm/msm/hdmi/hdmi.c
@@ -327,10 +327,6 @@ int msm_hdmi_modeset_init(struct hdmi *hdmi,
goto fail;
}
 
-   ret = drm_bridge_attach(encoder, hdmi->bridge, NULL);
-   if (ret)
-   goto fail;
-
priv->bridges[priv->num_bridges++]   = hdmi->bridge;
priv->connectors[priv->num_connectors++] = hdmi->connector;
 
-- 
2.24.1

___
Freedreno mailing list
Freedreno@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/freedreno


Re: [Freedreno] [Mesa-dev] [RFC 2/4] nir: Add a new ALU nir_op_imad

2019-01-25 Thread Ilia Mirkin
The specification in NIR has to be exact. Otherwise NIR will
constant-fold the op in a way that doesn't reflect what the hardware
would do, leading to subtle bugs.
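
A small C sketch of how the two can diverge. It assumes, purely for illustration, that the hardware instruction sign-extends the low 24 bits of both multiplicands before multiplying; the exact IMAD_S24 semantics are not stated in this thread, and `sext24`, `fold_imad` and `hw_imad24` are hypothetical helper names.

```c
#include <stdint.h>

/* Sign-extend the low 24 bits of x without relying on signed shifts. */
static int32_t sext24(uint32_t x)
{
    x &= 0xffffffu;
    return (int32_t)(x ^ 0x800000u) - 0x800000;
}

/* What a constant folder computes from the expression "src0 * src1 + src2". */
static int32_t fold_imad(int32_t a, int32_t b, int32_t c)
{
    return (int32_t)((uint32_t)a * (uint32_t)b + (uint32_t)c);
}

/* ASSUMED hardware behavior: 24-bit signed operands, 32-bit addend. */
static int32_t hw_imad24(int32_t a, int32_t b, int32_t c)
{
    int64_t prod = (int64_t)sext24((uint32_t)a) * sext24((uint32_t)b);
    return (int32_t)(uint32_t)((uint64_t)prod + (uint32_t)c);
}
```

For small inputs both agree, but as soon as an operand has bits set above bit 23 (for example `0x01000001`), the folded constant and the assumed hardware result differ.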

On Fri, Jan 25, 2019 at 11:06 AM Eduardo Lima Mitev  wrote:
>
> On 1/25/19 5:01 PM, Ilia Mirkin wrote:
> > On Fri, Jan 25, 2019 at 10:58 AM Ilia Mirkin  wrote:
> >>
> >> IMAD_S24 isn't src0 * src1 + src2 though. I think this could be called
> >> imad24, which I suspect exists on many GPUs (nv50-era NVIDIA definitely
> >> had this, and I think maxwell+ has a variant of this implemented by
> >> XMAD):
> >>
> >> (src0 * src1) & 0xff + src2
> >
> > And of course even that's wrong... the 24th bit has to get
> > sign-extended on that. Can express it with shifts.
> >
>
> IMAD_S24 is what is currently used in
> ir3_compiler_nir::get_image_offset(), so the pass doesn't change
> anything regarding computations.
>
> I agree that the nir opcode should hint at the bit limit, so probably
> nir_op_imad24. That is one of the open questions.
>
> Thanks,
>
> Eduardo
>
>


Re: [Freedreno] [Mesa-dev] [RFC 2/4] nir: Add a new ALU nir_op_imad

2019-01-25 Thread Ilia Mirkin
On Fri, Jan 25, 2019 at 10:58 AM Ilia Mirkin  wrote:
>
> IMAD_S24 isn't src0 * src1 + src2 though. I think this could be called
> imad24, which I suspect exists on many GPUs (nv50-era NVIDIA definitely
> had this, and I think maxwell+ has a variant of this implemented by
> XMAD):
>
> (src0 * src1) & 0xff + src2

And of course even that's wrong... the 24th bit has to get
sign-extended on that. Can express it with shifts.
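
A minimal sketch of the shift-based sign extension, assuming the intent is to treat the low 24 bits as a signed value. `sext24_shift` is a hypothetical name. The left shift is done on an unsigned value to avoid undefined behavior; the right shift of a negative signed value is implementation-defined in C, and this sketch assumes the usual arithmetic shift.

```c
#include <stdint.h>

/* Sign-extend bit 23 of x into bits 24..31 using a shift pair. */
static int32_t sext24_shift(int32_t x)
{
    /* Move bit 23 into the sign position, then shift back down.
     * Assumes >> on a negative int32_t is an arithmetic shift. */
    return (int32_t)((uint32_t)x << 8) >> 8;
}
```

For example, `0x00800000` has bit 23 set and comes out as the most negative 24-bit value, while `0x007fffff` is unchanged.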


Re: [Freedreno] [Mesa-dev] [RFC 2/4] nir: Add a new ALU nir_op_imad

2019-01-25 Thread Ilia Mirkin
IMAD_S24 isn't src0 * src1 + src2 though. I think this could be called
imad24, which I suspect exists on many GPUs (nv50-era NVIDIA definitely
had this, and I think maxwell+ has a variant of this implemented by
XMAD):

(src0 * src1) & 0xff + src2

Cheers,

  -ilia

On Fri, Jan 25, 2019 at 10:49 AM Eduardo Lima Mitev  wrote:
>
> ir3 compiler has an integer multiply-add instruction (IMAD_S24)
> that is used for different offset calculations in the backend.
> Since we intend to move some of these calculations to NIR, we need
> a new ALU op that can represent it.
> ---
>  src/compiler/nir/nir_opcodes.py | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/src/compiler/nir/nir_opcodes.py b/src/compiler/nir/nir_opcodes.py
> index d32005846a6..b61845fd514 100644
> --- a/src/compiler/nir/nir_opcodes.py
> +++ b/src/compiler/nir/nir_opcodes.py
> @@ -754,6 +754,7 @@ def triop_horiz(name, output_size, src1_size, src2_size, 
> src3_size, const_expr):
> [tuint, tuint, tuint], "", const_expr)
>
>  triop("ffma", tfloat, "src0 * src1 + src2")
> +triop("imad", tint, "src0 * src1 + src2")
>
>  triop("flrp", tfloat, "src0 * (1 - src2) + src1 * src2")
>
> --
> 2.20.1
>
> ___
> mesa-dev mailing list
> mesa-...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Freedreno] [Mesa-dev] [PATCH] freedreno/ir3: Make imageStore use num components from image format

2018-12-17 Thread Ilia Mirkin
Note that the format may not be known. I suspect that falls into your
"default" case.
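
One possible way to handle an unknown format, sketched with a hypothetical enum rather than the GL constants used in the patch: fall back to the full four components instead of asserting. This is an illustration of the concern, not necessarily how the patch was revised.

```c
/* Hypothetical format enum; TOY_FORMAT_NONE models "format not known". */
enum toy_image_format {
    TOY_FORMAT_NONE = 0,
    TOY_FORMAT_R32F,
    TOY_FORMAT_R32I,
    TOY_FORMAT_R32UI,
    TOY_FORMAT_RGBA32F,
    TOY_FORMAT_RGBA8
};

/* Number of components to store for a given image format. */
static unsigned toy_num_components(enum toy_image_format format)
{
    switch (format) {
    case TOY_FORMAT_R32F:
    case TOY_FORMAT_R32I:
    case TOY_FORMAT_R32UI:
        return 1;
    case TOY_FORMAT_RGBA32F:
    case TOY_FORMAT_RGBA8:
        return 4;
    case TOY_FORMAT_NONE:
    default:
        /* Unknown format: keep the conservative 4-component path. */
        return 4;
    }
}
```

The conservative fallback gives up the saved mov instructions for unknown formats but keeps the store correct.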
On Mon, Dec 17, 2018 at 3:41 PM Eduardo Lima Mitev  wrote:
>
> emit_intrinsic_store_image() always uses 4 components when
> collecting registers for the value. When the image has fewer than
> 4 components (e.g., r32f, r32i, r32ui), this results in extra mov
> instructions.
>
> This patch uses the actual number of components from the image format.
>
> For example, in a shader like:
>
> layout (r32f, binding=0) writeonly uniform imageBuffer u_image;
> ...
> void main(void) {
>...
>imageStore (u_image, some_offset, vec4(1.0));
>...
> }
>
> the instruction count is reduced by at least 3 instructions (note that
> the image format is r32f, 1 component only).
>
> This obviously reduces register pressure as well.
> ---
>  src/freedreno/ir3/ir3_compiler_nir.c | 34 ++--
>  1 file changed, 32 insertions(+), 2 deletions(-)
>
> diff --git a/src/freedreno/ir3/ir3_compiler_nir.c 
> b/src/freedreno/ir3/ir3_compiler_nir.c
> index 85f14f354d2..cc00602c249 100644
> --- a/src/freedreno/ir3/ir3_compiler_nir.c
> +++ b/src/freedreno/ir3/ir3_compiler_nir.c
> @@ -1251,6 +1251,35 @@ emit_intrinsic_load_image(struct ir3_context *ctx, 
> nir_intrinsic_instr *intr,
> ir3_split_dest(b, dst, sam, 0, 4);
>  }
>
> +/* Get the number of components of the different image formats supported
> + * by the GLES 3.1 spec.
> + */
> +static unsigned
> +get_num_components_for_glformat(GLuint format)
> +{
> +   switch (format) {
> +   case GL_R32F:
> +   case GL_R32I:
> +   case GL_R32UI:
> +   return 1;
> +
> +   case GL_RGBA32F:
> +   case GL_RGBA16F:
> +   case GL_RGBA8:
> +   case GL_RGBA8_SNORM:
> +   case GL_RGBA32I:
> +   case GL_RGBA16I:
> +   case GL_RGBA8I:
> +   case GL_RGBA32UI:
> +   case GL_RGBA16UI:
> +   case GL_RGBA8UI:
> +   return 4;
> +
> +   default:
> +   assert(!"Unsupported GL format for image");
> +   }
> +}
> +
>  /* src[] = { deref, coord, sample_index, value }. const_index[] = {} */
>  static void
>  emit_intrinsic_store_image(struct ir3_context *ctx, nir_intrinsic_instr 
> *intr)
> @@ -1262,6 +1291,7 @@ emit_intrinsic_store_image(struct ir3_context *ctx, 
> nir_intrinsic_instr *intr)
> struct ir3_instruction * const *coords = ir3_get_src(ctx, 
> &intr->src[1]);
> unsigned ncoords = get_image_coords(var, NULL);
> unsigned tex_idx = get_image_slot(ctx, 
> nir_src_as_deref(intr->src[0]));
> +   unsigned ncomp = 
> get_num_components_for_glformat(var->data.image.format);
>
> /* src0 is value
>  * src1 is coords
> @@ -1276,10 +1306,10 @@ emit_intrinsic_store_image(struct ir3_context *ctx, 
> nir_intrinsic_instr *intr)
>  */
>
> stib = ir3_STIB(b, create_immed(b, tex_idx), 0,
> -   ir3_create_collect(ctx, value, 4), 0,
> +   ir3_create_collect(ctx, value, ncomp), 0,
> ir3_create_collect(ctx, coords, ncoords), 0,
> offset, 0);
> -   stib->cat6.iim_val = 4;
> +   stib->cat6.iim_val = ncomp;
> stib->cat6.d = ncoords;
> stib->cat6.type = get_image_type(var);
> stib->cat6.typed = true;
> --
> 2.19.2
>


Re: [Freedreno] [PATCH] ir3_compiler/nir: fix imageSize() for buffer-backed images

2018-10-23 Thread Ilia Mirkin
On Tue, Oct 23, 2018 at 3:03 PM Eduardo Lima Mitev  wrote:
>
> GL_EXT_texture_buffer introduced texture buffers, which can be used
> in shaders through a new type imageBuffer.
>
> Because of how image access is implemented in freedreno, calling
> imageSize on an imageBuffer returns the size in bytes instead of texels,
> which is incorrect.
>
> This patch divides the imageSize result by the bytes-per-pixel
> of the image format when the image is buffer-backed.
>
> Fixes all tests under
> dEQP-GLES31.functional.image_load_store.buffer.image_size.*
>
> v2: Pre-compute and submit the log2 of the image format's bpp as shader
> constant instead of emitting the LOG2 instruction in code. (Rob Clark)
> ---
>  .../drivers/freedreno/ir3/ir3_compiler_nir.c  | 23 +++
>  .../drivers/freedreno/ir3/ir3_shader.c| 10 
>  2 files changed, 33 insertions(+)
>
> diff --git a/src/gallium/drivers/freedreno/ir3/ir3_compiler_nir.c 
> b/src/gallium/drivers/freedreno/ir3/ir3_compiler_nir.c
> index 197196383b0..7a3c8a8579c 100644
> --- a/src/gallium/drivers/freedreno/ir3/ir3_compiler_nir.c
> +++ b/src/gallium/drivers/freedreno/ir3/ir3_compiler_nir.c
> @@ -2035,6 +2035,29 @@ emit_intrinsic_image_size(struct ir3_context *ctx, 
> nir_intrinsic_instr *intr,
>
> split_dest(b, tmp, sam, 0, 4);
>
> +   /* get_size instruction returns size in bytes instead of texels
> +* for imageBuffer, so we need to divide it by the pixel size
> +* of the image format.
> +*
> +* TODO: This is at least true on a5xx. Check other gens.
> +*/
> +   enum glsl_sampler_dim dim =
> +   glsl_get_sampler_dim(glsl_without_array(var->type));
> +   if (dim == GLSL_SAMPLER_DIM_BUF) {
> +   /* Since all the possible values the divisor can take are
> +* power-of-two (4, 8, or 16), the division is implemented
> +* as a shift-right.
> +* During shader setup, the log2 of the image format's
> +* bytes-per-pixel should have been emitted in 2nd slot of
> +* image_dims. See ir3_shader::emit_image_dims().
> +*/
> +   unsigned cb = regid(ctx->so->constbase.image_dims, 0) +
> +   
> ctx->so->const_layout.image_dims.off[var->data.driver_location];
> +   struct ir3_instruction *aux = create_uniform(ctx, cb + 1);
> +
> +   tmp[0] = ir3_SHR_B(b, tmp[0], 0, aux, 0);
> +   }
> +
> for (unsigned i = 0; i < ncoords; i++)
> dst[i] = tmp[i];
>
> diff --git a/src/gallium/drivers/freedreno/ir3/ir3_shader.c 
> b/src/gallium/drivers/freedreno/ir3/ir3_shader.c
> index 9bf0a7f999c..de59e5888c6 100644
> --- a/src/gallium/drivers/freedreno/ir3/ir3_shader.c
> +++ b/src/gallium/drivers/freedreno/ir3/ir3_shader.c
> @@ -699,6 +699,16 @@ emit_image_dims(struct fd_context *ctx, const struct 
> ir3_shader_variant *v,
> } else {
> dims[off + 2] = 
> rsc->slices[lvl].size0;
> }
> +   } else {
> +   /* For buffer-backed images, the log2 of the 
> format's
> +* bytes-per-pixel is placed on the 2nd slot. 
> This is useful
> +* when emitting image_size instructions, for 
> which we need
> +* to divide by bpp for image buffers. Since 
> the bpp
> +* can only be power-of-two, the division is 
> implemented
> +* as a SHR, and for that it is handy to have 
> the log2 of
> +* bpp as a constant.
> +*/
> +   dims[off + 1] = log (dims[off + 0]) / log (2);

This is not the log function you are looking for. You're looking for
ilog2 or ffs.
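
A minimal sketch of the integer alternative, assuming the input is one of the power-of-two bytes-per-pixel values mentioned in the patch (4, 8 or 16). `ilog2_pow2` is a hypothetical helper, not an existing Mesa function. It avoids the floating-point `log()` division and the truncation back to an integer.

```c
#include <assert.h>
#include <stdint.h>

/* Integer log2 of a nonzero power of two, using only shifts. */
static unsigned ilog2_pow2(uint32_t v)
{
    unsigned n = 0;

    /* Precondition: exactly one bit set. */
    assert(v != 0 && (v & (v - 1)) == 0);

    while (v > 1) {
        v >>= 1;
        n++;
    }
    return n;
}
```

For a power of two, POSIX `ffs(v) - 1` yields the same value, since the lowest set bit is the only set bit.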

  -ilia


Re: [Freedreno] [PATCH 4/7] freedreno: a2xx: Support TEXTURE_RECT

2018-03-22 Thread Ilia Mirkin
On Thu, Mar 22, 2018 at 10:43 AM, Wladimir J. van der Laan
<laa...@gmail.com> wrote:
> Hello Ilia,
>
> On Thu, Jan 25, 2018 at 08:41:11AM -0500, Ilia Mirkin wrote:
>> Should you also expose PIPE_CAP_TEXTURE_RECTANGLE? (Or whatever it's
>> called... I forget.)
>
> I checked and I don't think a capability exists for this (anymore?).
>
> Everywhere, the assumption is made that all Gallium drivers support, or at
> least emulate, this.
>
> For example in src/mesa/state_tracker/st_extensions.c:
>
> extensions->NV_texture_rectangle = GL_TRUE;

You're probably right - texture rect is required as part of Gallium.
Probably always has been.


Re: [Freedreno] Whether A200 driver is supported by Linux Mainline Kernel

2018-03-16 Thread Ilia Mirkin
Hi Abhijit,

Looks like there may have been some duplication of effort...

https://github.com/laanwj/linux-freedreno-a2xx/commits/4.15-rc5-rdu1-kgsl

Seems to be based on 4.15 if the branch name is to be believed.

  -ilia

On Fri, Mar 16, 2018 at 8:54 AM, abhijit  wrote:
> Hi Wladimir,
>
> Can you please let me know if you have any plans to make changes for A200
> mainline?
>
> I have ported the KGSL driver along with other source files to 4.14.1 kernel
> and got it working. Can you let me know, if I can commit the code to your
> repository?
>
> Regards,
>Abhijit
>
> On Monday 07 August 2017 04:59 PM, Wladimir wrote:
>>>
>>> I guess that it is failing to find the old kgsl shim drm driver, which
>>> enabled allocation of GEM buffers for pixmaps.  I know Wladimir played
>>> a bit with this on imx5, but I think he was just using gbm/kms and not
>>> x11.  I guess he was using imx-drm for GEM buffer allocation?
>>
>>
>> Yes, for the a20x stuff I use the old GSL kernel driver
>> (forward-ported to 4.12, see
>> https://github.com/laanwj/linux-freedreno-a2xx).
>> This creates a /dev/gsl_kmod, which has an interface different from
>> the "newer" kgsl.
>>
>> And a mesa and libdrm that is patched to use this:
>>
>> https://github.com/laanwj/mesa-freedreno-a2xx
>> https://github.com/laanwj/libdrm-freedreno-a20x
>>
>> This fork of mesa doesn't support X11, only kms/gbm-based rendering.
>> All of this is very experimental.
>>
>> Regards,
>> Wladimir
>> .
>>


Re: [Freedreno] [PATCH 4/7] freedreno: a2xx: Support TEXTURE_RECT

2018-01-25 Thread Ilia Mirkin
Should you also expose PIPE_CAP_TEXTURE_RECTANGLE? (Or whatever it's
called... I forget.)

On Thu, Jan 25, 2018 at 8:29 AM, Wladimir J. van der Laan
 wrote:
> Denormalized texture coordinates are required for text rendering in
> GALLIUM_HUD.
>
> Signed-off-by: Wladimir J. van der Laan 
> ---
>  src/gallium/drivers/freedreno/a2xx/fd2_compiler.c | 3 ++-
>  src/gallium/drivers/freedreno/a2xx/ir-a2xx.c  | 1 +
>  src/gallium/drivers/freedreno/a2xx/ir-a2xx.h  | 1 +
>  3 files changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_compiler.c 
> b/src/gallium/drivers/freedreno/a2xx/fd2_compiler.c
> index 2ffd8cd..9f2fc61 100644
> --- a/src/gallium/drivers/freedreno/a2xx/fd2_compiler.c
> +++ b/src/gallium/drivers/freedreno/a2xx/fd2_compiler.c
> @@ -791,6 +791,7 @@ translate_tex(struct fd2_compile_context *ctx,
> instr = ir2_instr_create(next_exec_cf(ctx), IR2_FETCH);
> instr->fetch.opc = TEX_FETCH;
> instr->fetch.is_cube = (inst->Texture.Texture == TGSI_TEXTURE_3D);
> +   instr->fetch.is_rect = (inst->Texture.Texture == TGSI_TEXTURE_RECT);
> assert(inst->Texture.NumOffsets <= 1); // TODO what to do in other 
> cases?
>
> /* save off the tex fetch to be patched later with correct const_idx: 
> */
> @@ -802,7 +803,7 @@ translate_tex(struct fd2_compile_context *ctx,
> reg = add_src_reg(ctx, instr, coord);
>
> /* blob compiler always sets 3rd component to same as 1st for 2d: */
> -   if (inst->Texture.Texture == TGSI_TEXTURE_2D)
> +   if (inst->Texture.Texture == TGSI_TEXTURE_2D || inst->Texture.Texture 
> == TGSI_TEXTURE_RECT)
> reg->swizzle[2] = reg->swizzle[0];
>
> /* dst register needs to be marked for sync: */
> diff --git a/src/gallium/drivers/freedreno/a2xx/ir-a2xx.c 
> b/src/gallium/drivers/freedreno/a2xx/ir-a2xx.c
> index 163c282..3666a7e 100644
> --- a/src/gallium/drivers/freedreno/a2xx/ir-a2xx.c
> +++ b/src/gallium/drivers/freedreno/a2xx/ir-a2xx.c
> @@ -341,6 +341,7 @@ static int instr_emit_fetch(struct ir2_instruction *instr,
> tex->use_comp_lod = 1;
> tex->use_reg_lod = !instr->fetch.is_cube;
> tex->sample_location = SAMPLE_CENTER;
> +tex->tx_coord_denorm = instr->fetch.is_rect;
>
> if (instr->pred != IR2_PRED_NONE) {
> tex->pred_select = 1;
> diff --git a/src/gallium/drivers/freedreno/a2xx/ir-a2xx.h 
> b/src/gallium/drivers/freedreno/a2xx/ir-a2xx.h
> index 36ed204..c4b6c18 100644
> --- a/src/gallium/drivers/freedreno/a2xx/ir-a2xx.h
> +++ b/src/gallium/drivers/freedreno/a2xx/ir-a2xx.h
> @@ -74,6 +74,7 @@ struct ir2_instruction {
> unsigned const_idx;
> /* texture fetch specific: */
> bool is_cube : 1;
> +   bool is_rect : 1;
> /* vertex fetch specific: */
> unsigned const_idx_sel;
> enum a2xx_sq_surfaceformat fmt;
> --
> 2.7.4
>


Re: [Freedreno] [PATCH 1/7] freedreno: a2xx: Update rnndb header

2018-01-25 Thread Ilia Mirkin
On Thu, Jan 25, 2018 at 8:29 AM, Wladimir J. van der Laan
 wrote:
> Also update BLEND_ to BLEND2_ opcodes to accommodate.

Are you saying this doesn't compile right now? I would have expected
the accompanying change to a2xx.xml.h for that. Perhaps this landed
into the wrong commit?

Also it's odd that the formats are so different than originally
entered. Any opinion on how that happened?

>
> Signed-off-by: Wladimir J. van der Laan 
> ---
>  src/gallium/drivers/freedreno/a2xx/a2xx.xml.h | 33 
> +++
>  src/gallium/drivers/freedreno/a2xx/fd2_gmem.c |  4 ++--
>  2 files changed, 15 insertions(+), 22 deletions(-)
>
> diff --git a/src/gallium/drivers/freedreno/a2xx/a2xx.xml.h 
> b/src/gallium/drivers/freedreno/a2xx/a2xx.xml.h
> index 55a4355..279a652 100644
> --- a/src/gallium/drivers/freedreno/a2xx/a2xx.xml.h
> +++ b/src/gallium/drivers/freedreno/a2xx/a2xx.xml.h
> @@ -84,13 +84,12 @@ enum a2xx_sq_surfaceformat {
> FMT_5_5_5_1 = 13,
> FMT_8_8_8_8_A = 14,
> FMT_4_4_4_4 = 15,
> -   FMT_10_11_11 = 16,
> -   FMT_11_11_10 = 17,
> +   FMT_8_8_8 = 16,
> FMT_DXT1 = 18,
> FMT_DXT2_3 = 19,
> FMT_DXT4_5 = 20,
> +   FMT_10_10_10_2 = 21,
> FMT_24_8 = 22,
> -   FMT_24_8_FLOAT = 23,
> FMT_16 = 24,
> FMT_16_16 = 25,
> FMT_16_16_16_16 = 26,
> @@ -106,29 +105,23 @@ enum a2xx_sq_surfaceformat {
> FMT_32_FLOAT = 36,
> FMT_32_32_FLOAT = 37,
> FMT_32_32_32_32_FLOAT = 38,
> -   FMT_32_AS_8 = 39,
> -   FMT_32_AS_8_8 = 40,
> -   FMT_16_MPEG = 41,
> -   FMT_16_16_MPEG = 42,
> -   FMT_8_INTERLACED = 43,
> -   FMT_32_AS_8_INTERLACED = 44,
> -   FMT_32_AS_8_8_INTERLACED = 45,
> -   FMT_16_INTERLACED = 46,
> -   FMT_16_MPEG_INTERLACED = 47,
> -   FMT_16_16_MPEG_INTERLACED = 48,
> +   FMT_ATI_TC_RGB = 39,
> +   FMT_ATI_TC_RGBA = 40,
> +   FMT_ATI_TC_555_565_RGB = 41,
> +   FMT_ATI_TC_555_565_RGBA = 42,
> +   FMT_ATI_TC_RGBA_INTERP = 43,
> +   FMT_ATI_TC_555_565_RGBA_INTERP = 44,
> +   FMT_ETC1_RGBA_INTERP = 46,
> +   FMT_ETC1_RGB = 47,
> +   FMT_ETC1_RGBA = 48,
> FMT_DXN = 49,
> -   FMT_8_8_8_8_AS_16_16_16_16 = 50,
> -   FMT_DXT1_AS_16_16_16_16 = 51,
> -   FMT_DXT2_3_AS_16_16_16_16 = 52,
> -   FMT_DXT4_5_AS_16_16_16_16 = 53,
> +   FMT_2_3_3 = 51,
> FMT_2_10_10_10_AS_16_16_16_16 = 54,
> -   FMT_10_11_11_AS_16_16_16_16 = 55,
> -   FMT_11_11_10_AS_16_16_16_16 = 56,
> +   FMT_10_10_10_2_AS_16_16_16_16 = 55,
> FMT_32_32_32_FLOAT = 57,
> FMT_DXT3A = 58,
> FMT_DXT5A = 59,
> FMT_CTX1 = 60,
> -   FMT_DXT3A_AS_1_1_1_1 = 61,
>  };
>
>  enum a2xx_sq_ps_vtx_mode {
> diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_gmem.c 
> b/src/gallium/drivers/freedreno/a2xx/fd2_gmem.c
> index 0905ab6..46a7d18 100644
> --- a/src/gallium/drivers/freedreno/a2xx/fd2_gmem.c
> +++ b/src/gallium/drivers/freedreno/a2xx/fd2_gmem.c
> @@ -293,10 +293,10 @@ fd2_emit_tile_mem2gmem(struct fd_batch *batch, struct 
> fd_tile *tile)
> OUT_PKT3(ring, CP_SET_CONSTANT, 2);
> OUT_RING(ring, CP_REG(REG_A2XX_RB_BLEND_CONTROL));
> OUT_RING(ring, A2XX_RB_BLEND_CONTROL_COLOR_SRCBLEND(FACTOR_ONE) |
> -   
> A2XX_RB_BLEND_CONTROL_COLOR_COMB_FCN(BLEND_DST_PLUS_SRC) |
> +   
> A2XX_RB_BLEND_CONTROL_COLOR_COMB_FCN(BLEND2_DST_PLUS_SRC) |
> A2XX_RB_BLEND_CONTROL_COLOR_DESTBLEND(FACTOR_ZERO) |
> A2XX_RB_BLEND_CONTROL_ALPHA_SRCBLEND(FACTOR_ONE) |
> -   
> A2XX_RB_BLEND_CONTROL_ALPHA_COMB_FCN(BLEND_DST_PLUS_SRC) |
> +   
> A2XX_RB_BLEND_CONTROL_ALPHA_COMB_FCN(BLEND2_DST_PLUS_SRC) |
> A2XX_RB_BLEND_CONTROL_ALPHA_DESTBLEND(FACTOR_ZERO));
>
> OUT_PKT3(ring, CP_SET_CONSTANT, 3);
> --
> 2.7.4
>


[Freedreno] [PATCH] freedreno: set missing internal_format when importing texture

2017-12-21 Thread Ilia Mirkin
Fixes running piglits without -fbo. Probably lots of other stuff too.

Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu>
---
 src/gallium/drivers/freedreno/freedreno_resource.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/gallium/drivers/freedreno/freedreno_resource.c 
b/src/gallium/drivers/freedreno/freedreno_resource.c
index df00b514396..920e8736a81 100644
--- a/src/gallium/drivers/freedreno/freedreno_resource.c
+++ b/src/gallium/drivers/freedreno/freedreno_resource.c
@@ -840,6 +840,7 @@ fd_resource_from_handle(struct pipe_screen *pscreen,
if (!rsc->bo)
goto fail;
 
+   rsc->internal_format = tmpl->format;
rsc->cpp = util_format_get_blocksize(tmpl->format);
slice->pitch = handle->stride / rsc->cpp;
slice->offset = handle->offset;
-- 
2.13.6



Re: [Freedreno] freedreno 3D test applications

2017-12-20 Thread Ilia Mirkin
On Thu, Dec 21, 2017 at 12:43 AM, priyanka more
 wrote:
> Hi,
>
> I've cloned freedreno test applications from the link below.
> https://github.com/freedreno-zz/freedreno/tree/master/tests-3d
>
> I'm working on 3d test cases, I've executed test-cube.c test-caps.c on
> i.mx53 QSB environment, I got colored cube as output on display.
>
> I'm trying to execute other 3D test cases like test-es2gears.c test-vertex.c
> test-cube-textured.c and 2D test application test-fill.c, but I'm wondering
> what will be the output display of these applications.
>
> Will you please guide me in what would be the output display of these test
> applications?

The idea of these applications is actually not to display anything --
the fact that they do is largely coincidence.

The idea of these applications is to run GL commands against the blob
driver, record a trace, and then analyze it, to figure out how to
operate the underlying hardware. As parameters are changed, we look at
what changes in the command stream, and infer the meanings of various
registers based on how they're programmed relative to the GL commands
being run.
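
The comparison step can be sketched roughly as follows. This is a toy illustration, not one of the actual freedreno tools: `reg_write`, `last_value` and `diff_traces` are hypothetical names, and a "trace" here is just a list of register writes already decoded from a command stream.

```c
#include <stddef.h>
#include <stdint.h>

/* One decoded register write from a captured command stream. */
struct reg_write {
    uint32_t reg;
    uint32_t value;
};

/* Find the last value written to reg in a trace; returns 1 if found. */
static int last_value(const struct reg_write *t, size_t n, uint32_t reg,
                      uint32_t *out)
{
    for (size_t i = n; i-- > 0; ) {
        if (t[i].reg == reg) {
            *out = t[i].value;
            return 1;
        }
    }
    return 0;
}

/* Report registers whose final value in trace b is new or differs from
 * trace a. Stores up to max register numbers in changed[] and returns
 * the total count. Registers written only in a are not reported. */
static size_t diff_traces(const struct reg_write *a, size_t na,
                          const struct reg_write *b, size_t nb,
                          uint32_t *changed, size_t max)
{
    size_t count = 0;

    for (size_t i = 0; i < nb; i++) {
        uint32_t va;
        int later = 0;

        /* Only consider the final write to each register in b. */
        for (size_t j = i + 1; j < nb; j++) {
            if (b[j].reg == b[i].reg) {
                later = 1;
                break;
            }
        }
        if (later)
            continue;

        if (last_value(a, na, b[i].reg, &va) && va == b[i].value)
            continue;

        if (count < max)
            changed[count] = b[i].reg;
        count++;
    }
    return count;
}
```

Running the same test program twice with one GL parameter changed and diffing the two traces narrows down which registers that parameter controls.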

Tracing on the a2xx isn't an exact science right now. There are 20
different incompatible kgsl versions running around. The current
"state of the art" is a hacked up kgsl driver which prints stuff to
dmesg -- see e.g.
https://github.com/laanwj/linux-freedreno-a2xx/tree/4.12-rdu1-kgsl +
dmesg2rd from 
https://github.com/laanwj/freedreno/commit/24905de896b1d15aef9e66f465ec379b24061494.

I have some incomplete work in adapting the libwrap-based tracer to
that particular kgsl version, but ... it's incomplete. And it won't
work with $other kgsl ioctl version.

What's your goal?

  -ilia


Re: [Freedreno] [PATCH] freedreno/ir3: avoid using shr.b for immediate offset inputs

2017-11-26 Thread Ilia Mirkin
On Sun, Nov 26, 2017 at 1:29 PM, Rob Clark <robdcl...@gmail.com> wrote:
> On Sun, Nov 26, 2017 at 12:08 PM, Ilia Mirkin <imir...@alum.mit.edu> wrote:
>> Since this is all happening as a post-optimization fixup, and offsets
>> are generally immediates, we can just do the calculation directly.
>>
>> Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu>
>> ---
>>
>> Only very mildly tested. Noticed it when looking closely at our shaders,
>> wondering why it tries to shift 0 by a constant. This is why.
>
> not strictly against this, but a few thoughts:
>
> 1) I'm not sure how common in real life it is to access ssbo at
> hard-coded offsets.. I've noticed the funny shaders like shifting an
> immed zero by constant too, but figured it wasn't too likely to happen
> in real life.  Although undoing nir's shl w/ our shr might be useful.

I suspect it's moderately common. Any time you don't have a
variably-indexed array, that will happen.

>
> 2) if it is common, maybe support in ir3_cp to recognize the handful
> of instructions that are added when lowering nir instructions to ir3
> would be more beneficial (ie. ssbo load/store isn't the only one to
> add shl/shr/etc..  although the instructions added are a small subset
> of possible instructions so might be sane to make cp a bit more
> clever..
>
> 3) or, perhaps an even better idea is nir->nir pass that lowers things
> into ir3 specific nir instructions and then run nir's opt passes
> again.. that has been kinda on my todo list for a while

Yeah, that's clearly the right way to go. Having new instructions
added after opt is ... not a good idea. (This is why I've never warmed
up to the "frontend" vs "backend" concept -- the backend needs the
opts just as much.)

Happy to drop this until that happens. I just hated seeing

shr.b r0.x, 0, c0.x

(Where c0.x == 2, of course.)
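
The idea behind the patch, reduced to a toy model in C: when the byte offset is a compile-time constant, do the divide-by-four at compile time and emit an immediate; otherwise fall back to a runtime shift. `toy_src` and `toy_dword_offset` are hypothetical names and do not correspond to ir3 structures.

```c
#include <stdint.h>

/* Toy description of the source operand the backend would emit. */
struct toy_src {
    int is_immed;       /* 1 if the operand is a plain immediate */
    uint32_t immed;     /* immediate value when is_immed is set */
    int needs_shr;      /* 1 if a runtime shr.b by 2 must be emitted */
};

/* Convert a byte offset into a dword offset operand. */
static struct toy_src toy_dword_offset(int offset_is_const,
                                       uint32_t const_offset)
{
    struct toy_src s = { 0, 0, 0 };

    if (offset_is_const) {
        /* Fold the shift now; no instruction is needed. */
        s.is_immed = 1;
        s.immed = const_offset >> 2;
    } else {
        /* Offset only known at runtime: keep the shift. */
        s.needs_shr = 1;
    }
    return s;
}
```

A constant offset of 0 then becomes the immediate 0 directly, rather than a shift of 0 by a constant.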

  -ilia

>
> BR,
> -R
>
>>  src/gallium/drivers/freedreno/ir3/ir3_compiler_nir.c | 6 +-
>>  1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/src/gallium/drivers/freedreno/ir3/ir3_compiler_nir.c 
>> b/src/gallium/drivers/freedreno/ir3/ir3_compiler_nir.c
>> index c97df4f1d63..ab326c24aa7 100644
>> --- a/src/gallium/drivers/freedreno/ir3/ir3_compiler_nir.c
>> +++ b/src/gallium/drivers/freedreno/ir3/ir3_compiler_nir.c
>> @@ -1351,6 +1351,7 @@ emit_intrinsic_atomic_ssbo(struct ir3_context *ctx, 
>> nir_intrinsic_instr *intr)
>> ssbo = create_immed(b, const_offset->u32[0]);
>>
>> offset = get_src(ctx, &intr->src[1])[0];
>> +   const_offset = nir_src_as_const_value(intr->src[1]);
>>
>> /* src0 is data (or uvec2(data, compare))
>>  * src1 is offset
>> @@ -1359,7 +1360,10 @@ emit_intrinsic_atomic_ssbo(struct ir3_context *ctx, 
>> nir_intrinsic_instr *intr)
>>  * Note that nir already multiplies the offset by four
>>  */
>> src0 = get_src(ctx, &intr->src[2])[0];
>> -   src1 = ir3_SHR_B(b, offset, 0, create_immed(b, 2), 0);
>> +   if (const_offset)
>> +   src1 = create_immed(b, const_offset->u32[0] >> 2);
>> +   else
>> +   src1 = ir3_SHR_B(b, offset, 0, create_immed(b, 2), 0);
>> src2 = create_collect(b, (struct ir3_instruction*[]){
>> offset,
>> create_immed(b, 0),
>> --
>> 2.13.6
>>


[Freedreno] [PATCH] freedreno/ir3: avoid using shr.b for immediate offset inputs

2017-11-26 Thread Ilia Mirkin
Since this is all happening as a post-optimization fixup, and offsets
are generally immediates, we can just do the calculation directly.

Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu>
---

Only very mildly tested. Noticed it when looking closely at our shaders,
wondering why it tries to shift 0 by a constant. This is why.

 src/gallium/drivers/freedreno/ir3/ir3_compiler_nir.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/freedreno/ir3/ir3_compiler_nir.c 
b/src/gallium/drivers/freedreno/ir3/ir3_compiler_nir.c
index c97df4f1d63..ab326c24aa7 100644
--- a/src/gallium/drivers/freedreno/ir3/ir3_compiler_nir.c
+++ b/src/gallium/drivers/freedreno/ir3/ir3_compiler_nir.c
@@ -1351,6 +1351,7 @@ emit_intrinsic_atomic_ssbo(struct ir3_context *ctx, 
nir_intrinsic_instr *intr)
ssbo = create_immed(b, const_offset->u32[0]);
 
offset = get_src(ctx, &intr->src[1])[0];
+   const_offset = nir_src_as_const_value(intr->src[1]);
 
/* src0 is data (or uvec2(data, compare))
 * src1 is offset
@@ -1359,7 +1360,10 @@ emit_intrinsic_atomic_ssbo(struct ir3_context *ctx, 
nir_intrinsic_instr *intr)
 * Note that nir already multiplies the offset by four
 */
src0 = get_src(ctx, &intr->src[2])[0];
-   src1 = ir3_SHR_B(b, offset, 0, create_immed(b, 2), 0);
+   if (const_offset)
+   src1 = create_immed(b, const_offset->u32[0] >> 2);
+   else
+   src1 = ir3_SHR_B(b, offset, 0, create_immed(b, 2), 0);
src2 = create_collect(b, (struct ir3_instruction*[]){
offset,
create_immed(b, 0),
-- 
2.13.6



[Freedreno] [PATCH] freedreno/a4xx: add ARB_framebuffer_no_attachments support

2017-11-25 Thread Ilia Mirkin
Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu>
---
 docs/features.txt| 4 ++--
 src/gallium/drivers/freedreno/a4xx/fd4_screen.c  | 5 +
 src/gallium/drivers/freedreno/freedreno_screen.c | 2 +-
 3 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/docs/features.txt b/docs/features.txt
index 59f7a180700..01cd133ef01 100644
--- a/docs/features.txt
+++ b/docs/features.txt
@@ -172,7 +172,7 @@ GL 4.3, GLSL 4.30 -- all DONE: i965/gen8+, nvc0, radeonsi
   GL_KHR_debug  DONE (all drivers)
   GL_ARB_explicit_uniform_location  DONE (all drivers that 
support GLSL)
   GL_ARB_fragment_layer_viewportDONE (i965, nv50, 
r600, llvmpipe, softpipe)
-  GL_ARB_framebuffer_no_attachments DONE (freedreno/a5xx, 
i965, r600, softpipe)
+  GL_ARB_framebuffer_no_attachments DONE (freedreno, i965, 
r600, softpipe)
   GL_ARB_internalformat_query2  DONE (all drivers)
   GL_ARB_invalidate_subdata DONE (all drivers)
   GL_ARB_multi_draw_indirectDONE (freedreno, i965, 
r600, llvmpipe, softpipe, swr)
@@ -244,7 +244,7 @@ GLES3.1, GLSL ES 3.1 -- all DONE: i965/hsw+, nvc0, radeonsi
   GL_ARB_compute_shader DONE (freedreno/a5xx, 
i965/gen7+, softpipe)
   GL_ARB_draw_indirect  DONE (freedreno, 
i965/gen7+, r600, llvmpipe, softpipe, swr)
   GL_ARB_explicit_uniform_location  DONE (all drivers that 
support GLSL)
-  GL_ARB_framebuffer_no_attachments DONE (freedreno/a5xx, 
i965/gen7+, r600, softpipe)
+  GL_ARB_framebuffer_no_attachments DONE (freedreno, 
i965/gen7+, r600, softpipe)
   GL_ARB_program_interface_queryDONE (all drivers)
   GL_ARB_shader_atomic_counters DONE (freedreno/a5xx, 
i965/gen7+, r600, softpipe)
   GL_ARB_shader_image_load_storeDONE (freedreno/a5xx, 
i965/gen7+, r600, softpipe)
diff --git a/src/gallium/drivers/freedreno/a4xx/fd4_screen.c 
b/src/gallium/drivers/freedreno/a4xx/fd4_screen.c
index 6006bb96b3a..1b81f8db2f3 100644
--- a/src/gallium/drivers/freedreno/a4xx/fd4_screen.c
+++ b/src/gallium/drivers/freedreno/a4xx/fd4_screen.c
@@ -75,6 +75,11 @@ fd4_screen_is_format_supported(struct pipe_screen *pscreen,
PIPE_BIND_SHARED);
}
 
+   /* For ARB_framebuffer_no_attachments: */
+   if ((usage & PIPE_BIND_RENDER_TARGET) && (format == PIPE_FORMAT_NONE)) {
+   retval |= usage & PIPE_BIND_RENDER_TARGET;
+   }
+
if ((usage & PIPE_BIND_DEPTH_STENCIL) &&
(fd4_pipe2depth(format) != (enum a4xx_depth_format)~0) &&
(fd4_pipe2tex(format) != (enum a4xx_tex_fmt)~0)) {
diff --git a/src/gallium/drivers/freedreno/freedreno_screen.c b/src/gallium/drivers/freedreno/freedreno_screen.c
index 3fa28e3f310..aea56a180af 100644
--- a/src/gallium/drivers/freedreno/freedreno_screen.c
+++ b/src/gallium/drivers/freedreno/freedreno_screen.c
@@ -338,7 +338,7 @@ fd_screen_get_param(struct pipe_screen *pscreen, enum pipe_cap param)
return 0;
 
case PIPE_CAP_FRAMEBUFFER_NO_ATTACHMENT:
-   if (is_a5xx(screen))
+   if (is_a4xx(screen) || is_a5xx(screen))
return 1;
return 0;
 
-- 
2.13.6

___
Freedreno mailing list
Freedreno@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/freedreno


[Freedreno] [PATCH 2/2] freedreno/a4xx: add indirect draw support

2017-11-25 Thread Ilia Mirkin
This is a copy of the a5xx logic. Fails a few tests, but basic
functionality is there.

Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu>
---
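
For readers following along, here is a rough sketch of what the indirect
buffer is expected to hold, assuming it follows the command layouts from
ARB_draw_indirect. The struct and function names are illustrative only and
are not taken from the driver. The helper mirrors the
`idx->width0 - index_offset` computation in the patch, i.e. the number of
index-buffer bytes left after the starting offset.

```c
#include <assert.h>
#include <stdint.h>

/* Layout of one non-indexed indirect draw record (ARB_draw_indirect). */
struct draw_arrays_indirect_cmd {
   uint32_t count;
   uint32_t instance_count;
   uint32_t first;
   uint32_t base_instance;
};

/* Layout of one indexed indirect draw record (ARB_draw_indirect). */
struct draw_elements_indirect_cmd {
   uint32_t count;
   uint32_t instance_count;
   uint32_t first_index;
   int32_t  base_vertex;
   uint32_t base_instance;
};

/* Hypothetical helper: bytes of index data available from index_offset on. */
static uint32_t
index_bytes_available(uint32_t buffer_size, uint32_t index_offset)
{
   assert(index_offset <= buffer_size);
   return buffer_size - index_offset;
}
```

The records are 16 and 20 bytes respectively, so the hardware only needs the
buffer address plus, for indexed draws, the index buffer address and size.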
 docs/features.txt|  6 ++---
 src/gallium/drivers/freedreno/a4xx/fd4_draw.h| 29 
 src/gallium/drivers/freedreno/freedreno_screen.c |  4 
 3 files changed, 36 insertions(+), 3 deletions(-)

diff --git a/docs/features.txt b/docs/features.txt
index e4eac28a917..59f7a180700 100644
--- a/docs/features.txt
+++ b/docs/features.txt
@@ -110,7 +110,7 @@ GL 3.3, GLSL 3.30 --- all DONE: i965, nv50, nvc0, r600, radeonsi, llvmpipe, soft
 GL 4.0, GLSL 4.00 --- all DONE: i965/gen7+, nvc0, r600, radeonsi
 
   GL_ARB_draw_buffers_blend  DONE (freedreno, i965/gen6+, nv50, llvmpipe, softpipe, swr)
-  GL_ARB_draw_indirect  DONE (freedreno/a5xx, i965/gen7+, llvmpipe, softpipe, swr)
+  GL_ARB_draw_indirect  DONE (freedreno, i965/gen7+, llvmpipe, softpipe, swr)
   GL_ARB_gpu_shader5  DONE (i965/gen7+)
   - 'precise' qualifier  DONE
   - Dynamically uniform sampler array indices  DONE (softpipe)
@@ -175,7 +175,7 @@ GL 4.3, GLSL 4.30 -- all DONE: i965/gen8+, nvc0, radeonsi
   GL_ARB_framebuffer_no_attachments  DONE (freedreno/a5xx, i965, r600, softpipe)
   GL_ARB_internalformat_query2  DONE (all drivers)
   GL_ARB_invalidate_subdata  DONE (all drivers)
-  GL_ARB_multi_draw_indirect  DONE (freedreno/a5xx, i965, r600, llvmpipe, softpipe, swr)
+  GL_ARB_multi_draw_indirect  DONE (freedreno, i965, r600, llvmpipe, softpipe, swr)
   GL_ARB_program_interface_query  DONE (all drivers)
   GL_ARB_robust_buffer_access_behavior  DONE (i965)
   GL_ARB_shader_image_size  DONE (freedreno/a5xx, i965, r600, softpipe)
@@ -242,7 +242,7 @@ GLES3.1, GLSL ES 3.1 -- all DONE: i965/hsw+, nvc0, radeonsi
 
   GL_ARB_arrays_of_arrays  DONE (all drivers that support GLSL 1.30)
   GL_ARB_compute_shader  DONE (freedreno/a5xx, i965/gen7+, softpipe)
-  GL_ARB_draw_indirect  DONE (freedreno/a5xx, i965/gen7+, r600, llvmpipe, softpipe, swr)
+  GL_ARB_draw_indirect  DONE (freedreno, i965/gen7+, r600, llvmpipe, softpipe, swr)
   GL_ARB_explicit_uniform_location  DONE (all drivers that support GLSL)
   GL_ARB_framebuffer_no_attachments  DONE (freedreno/a5xx, i965/gen7+, r600, softpipe)
   GL_ARB_program_interface_query  DONE (all drivers)
diff --git a/src/gallium/drivers/freedreno/a4xx/fd4_draw.h b/src/gallium/drivers/freedreno/a4xx/fd4_draw.h
index 842a952719b..f7a7d92453b 100644
--- a/src/gallium/drivers/freedreno/a4xx/fd4_draw.h
+++ b/src/gallium/drivers/freedreno/a4xx/fd4_draw.h
@@ -112,6 +112,35 @@ fd4_draw_emit(struct fd_batch *batch, struct fd_ringbuffer *ring,
enum pc_di_src_sel src_sel;
uint32_t idx_size, idx_offset;
 
+   if (info->indirect) {
+   struct fd_resource *ind = fd_resource(info->indirect->buffer);
+
+   emit_marker(ring, 7);
+
+   if (info->index_size) {
+   struct pipe_resource *idx = info->index.resource;
+
+   OUT_PKT3(ring, CP_DRAW_INDX_INDIRECT, 4);
+   OUT_RINGP(ring, DRAW4(primtype, DI_SRC_SEL_DMA,
+   fd4_size2indextype(info->index_size), 0),
+   &batch->draw_patches);
+   OUT_RELOC(ring, fd_resource(idx)->bo, index_offset, 0, 0);
+   OUT_RING(ring, A4XX_CP_DRAW_INDX_INDIRECT_2_INDX_SIZE(
+   idx->width0 - index_offset));
+   OUT_RELOC(ring, ind->bo, info->indirect->offset, 0, 0);
+   } else {
+   OUT_PKT3(ring, CP_DRAW_INDIRECT, 2);
+   OUT_RINGP(ring, DRAW4(primtype, DI_SRC_SEL_AUTO_INDEX, 0, 0),
+   &batch->draw_patches);
+   OUT_RELOC(ring, ind->bo, info->indirect->offset, 0, 0);
+   }
+
+   emit_marker(ring, 7);
+   fd_reset_wfi(batch);
+
+   return;
+   }
+
if (info->index_size) {
assert(!info->has_user_indices);
 
diff --git a/src/gallium/drivers/freedreno/freedreno_screen.c b/src/gallium/drivers/freedreno/freedreno_screen.c
index 62e4a574b90..3fa28e3f310 100644
--- a/src/gallium/drivers/freedreno/freedreno_screen.c
+++ b/src/gallium/drivers/freedreno/freedr

[Freedreno] [PATCH 1/2] freedreno: regenerate pm4 header, adjust code for new names

2017-11-25 Thread Ilia Mirkin
Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu>
---
 src/gallium/drivers/freedreno/a5xx/fd5_compute.c |   6 +-
 src/gallium/drivers/freedreno/a5xx/fd5_draw.h|   2 +-
 src/gallium/drivers/freedreno/adreno_pm4.xml.h   | 277 ++-
 3 files changed, 171 insertions(+), 114 deletions(-)

diff --git a/src/gallium/drivers/freedreno/a5xx/fd5_compute.c b/src/gallium/drivers/freedreno/a5xx/fd5_compute.c
index 55cddadf600..f9fb599e785 100644
--- a/src/gallium/drivers/freedreno/a5xx/fd5_compute.c
+++ b/src/gallium/drivers/freedreno/a5xx/fd5_compute.c
@@ -165,9 +165,9 @@ fd5_launch_grid(struct fd_context *ctx, const struct pipe_grid_info *info)
OUT_PKT7(ring, CP_EXEC_CS_INDIRECT, 4);
OUT_RING(ring, 0x00000000);
OUT_RELOC(ring, rsc->bo, info->indirect_offset, 0, 0);  /* ADDR_LO/HI */
-   OUT_RING(ring, CP_EXEC_CS_INDIRECT_3_LOCALSIZEX(local_size[0] - 1) |
-   CP_EXEC_CS_INDIRECT_3_LOCALSIZEY(local_size[1] - 1) |
-   CP_EXEC_CS_INDIRECT_3_LOCALSIZEZ(local_size[2] - 1));
+   OUT_RING(ring, A5XX_CP_EXEC_CS_INDIRECT_3_LOCALSIZEX(local_size[0] - 1) |
+   A5XX_CP_EXEC_CS_INDIRECT_3_LOCALSIZEY(local_size[1] - 1) |
+   A5XX_CP_EXEC_CS_INDIRECT_3_LOCALSIZEZ(local_size[2] - 1));
} else {
OUT_PKT7(ring, CP_EXEC_CS, 4);
OUT_RING(ring, 0x00000000);
diff --git a/src/gallium/drivers/freedreno/a5xx/fd5_draw.h b/src/gallium/drivers/freedreno/a5xx/fd5_draw.h
index e33085fcd5b..d1069157e75 100644
--- a/src/gallium/drivers/freedreno/a5xx/fd5_draw.h
+++ b/src/gallium/drivers/freedreno/a5xx/fd5_draw.h
@@ -105,7 +105,7 @@ fd5_draw_emit(struct fd_batch *batch, struct fd_ringbuffer *ring,
&batch->draw_patches);
OUT_RELOC(ring, fd_resource(idx)->bo,
index_offset, 0, 0);
-   OUT_RING(ring, CP_DRAW_INDX_INDIRECT_3_MAX_INDICES(max_indicies));
+   OUT_RING(ring, A5XX_CP_DRAW_INDX_INDIRECT_3_MAX_INDICES(max_indicies));
OUT_RELOC(ring, ind->bo, info->indirect->offset, 0, 0);
} else {
OUT_PKT7(ring, CP_DRAW_INDIRECT, 3);
diff --git a/src/gallium/drivers/freedreno/adreno_pm4.xml.h b/src/gallium/drivers/freedreno/adreno_pm4.xml.h
index 99404e8c0c9..d6f49e7ccfa 100644
--- a/src/gallium/drivers/freedreno/adreno_pm4.xml.h
+++ b/src/gallium/drivers/freedreno/adreno_pm4.xml.h
@@ -8,15 +8,15 @@ http://github.com/freedreno/envytools/
 git clone https://github.com/freedreno/envytools.git
 
 The rules-ng-ng source files this header was generated from are:
-- /home/robclark/src/freedreno/envytools/rnndb/adreno.xml               (    431 bytes, from 2017-05-17 13:21:27)
-- /home/robclark/src/freedreno/envytools/rnndb/freedreno_copyright.xml  (   1572 bytes, from 2017-05-17 13:21:27)
-- /home/robclark/src/freedreno/envytools/rnndb/adreno/a2xx.xml          (  37162 bytes, from 2017-05-17 13:21:27)
-- /home/robclark/src/freedreno/envytools/rnndb/adreno/adreno_common.xml (  13324 bytes, from 2017-05-17 13:21:27)
-- /home/robclark/src/freedreno/envytools/rnndb/adreno/adreno_pm4.xml    (  33379 bytes, from 2017-11-14 21:00:47)
-- /home/robclark/src/freedreno/envytools/rnndb/adreno/a3xx.xml          (  83840 bytes, from 2017-05-17 13:21:27)
-- /home/robclark/src/freedreno/envytools/rnndb/adreno/a4xx.xml          ( 111898 bytes, from 2017-06-06 18:23:59)
-- /home/robclark/src/freedreno/envytools/rnndb/adreno/a5xx.xml          ( 143420 bytes, from 2017-11-16 20:29:34)
-- /home/robclark/src/freedreno/envytools/rnndb/adreno/ocmem.xml         (   1773 bytes, from 2017-05-17 13:21:27)
+- /home/ilia/src/freedreno/envytools/rnndb/adreno.xml               (    431 bytes, from 2017-11-18 20:43:22)
+- /home/ilia/src/freedreno/envytools/rnndb/freedreno_copyright.xml  (   1572 bytes, from 2016-02-11 01:04:14)
+- /home/ilia/src/freedreno/envytools/rnndb/adreno/a2xx.xml          (  36805 bytes, from 2017-11-18 20:48:10)
+- /home/ilia/src/freedreno/envytools/rnndb/adreno/adreno_common.xml (  15292 bytes, from 2017-11-19 20:45:26)
+- /home/ilia/src/freedreno/envytools/rnndb/adreno/adreno_pm4.xml    (  34349 bytes, from 2017-11-19 20:43:33)
+- /home/ilia/src/freedreno/envytools/rnndb/adreno/a3xx.xml          (  83840 bytes, from 2017-11-18 19:40:11)
+- /home/ilia/src/freedreno/envytools/rnndb/adreno/a4xx.xml          ( 112609 bytes, from 2017-11-19 04:47:10)
+- /home/ilia/src/freedreno/envytools/rnndb/adreno/a5xx.xml          ( 143017 bytes, from 2017-11-19 04:05:11)
+- /home/ilia/src/freedreno/envytools/rnndb/adreno/ocmem.xml         (   1773 bytes, from 2015-11-07 21:10:25)
 
 Copyright (C) 2013-2017 by the following authors:
 - Rob Clark <robdcl...@gmail.com> (robclark)
@@ -

[Freedreno] [PATCH v2 1/2] nir: allow texture offsets with cube maps

2017-11-25 Thread Ilia Mirkin
GL doesn't have this, but some hardware supports it. This is convenient
for lowering tg4 to plain texture calls, which is necessary on Adreno
A4xx hardware.

Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu>
Reviewed-by: Jason Ekstrand <ja...@jlekstrand.net>
---

v1 -> v2: shuffled code around to use an if ladder

 src/compiler/nir/nir.h | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
index f46f6147110..cf200fdc665 100644
--- a/src/compiler/nir/nir.h
+++ b/src/compiler/nir/nir.h
@@ -1364,8 +1364,7 @@ nir_tex_instr_src_size(const nir_tex_instr *instr, unsigned src)
if (instr->src[src].src_type == nir_tex_src_ms_mcs)
   return 4;
 
-   if (instr->src[src].src_type == nir_tex_src_offset ||
-   instr->src[src].src_type == nir_tex_src_ddx ||
+   if (instr->src[src].src_type == nir_tex_src_ddx ||
instr->src[src].src_type == nir_tex_src_ddy) {
   if (instr->is_array)
  return instr->coord_components - 1;
@@ -1373,6 +1372,18 @@ nir_tex_instr_src_size(const nir_tex_instr *instr, unsigned src)
  return instr->coord_components;
}
 
+   /* Usual APIs don't allow cube + offset, but we allow it, with 2 coords for
+* the offset, since a cube maps to a single face.
+*/
+   if (instr->src[src].src_type == nir_tex_src_offset) {
+  if (instr->sampler_dim == GLSL_SAMPLER_DIM_CUBE)
+ return 2;
+  else if (instr->is_array)
+ return instr->coord_components - 1;
+  else
+ return instr->coord_components;
+   }
+
return 1;
 }
 
-- 
2.13.6



[Freedreno] [PATCH v2 2/2] freedreno/ir3: add a pass to lower tg4 to txl, enable gather on a4xx

2017-11-25 Thread Ilia Mirkin
Unfortunately Adreno A4xx hardware returns incorrect results with the
GATHER4 opcodes. As a result, we have to lower to 4 individual texture
calls (txl since we have to force lod to 0). We achieve this using
offsets, including on cube maps which normally never have offsets.

Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu>
---

v1 -> v2: minor fixups in response to Rob Clark's feedback

 docs/features.txt  |   4 +-
 src/gallium/drivers/freedreno/Makefile.sources |   1 +
 src/gallium/drivers/freedreno/freedreno_screen.c   |   2 +-
 .../drivers/freedreno/ir3/ir3_compiler_nir.c   |   7 +-
 src/gallium/drivers/freedreno/ir3/ir3_nir.c|   2 +
 src/gallium/drivers/freedreno/ir3/ir3_nir.h|   1 +
 .../freedreno/ir3/ir3_nir_lower_tg4_to_tex.c   | 140 +
 src/gallium/drivers/freedreno/meson.build  |   1 +
 8 files changed, 153 insertions(+), 5 deletions(-)
 create mode 100644 src/gallium/drivers/freedreno/ir3/ir3_nir_lower_tg4_to_tex.c

diff --git a/docs/features.txt b/docs/features.txt
index d4ec38a236b..7f9a18ddd79 100644
--- a/docs/features.txt
+++ b/docs/features.txt
@@ -130,7 +130,7 @@ GL 4.0, GLSL 4.00 --- all DONE: i965/gen7+, nvc0, r600, radeonsi
   GL_ARB_tessellation_shader  DONE (i965/gen7+)
   GL_ARB_texture_buffer_object_rgb32  DONE (freedreno, i965/gen6+, llvmpipe, softpipe, swr)
   GL_ARB_texture_cube_map_array  DONE (i965/gen6+, nv50, llvmpipe, softpipe)
-  GL_ARB_texture_gather  DONE (freedreno/a5xx, i965/gen6+, nv50, llvmpipe, softpipe, swr)
+  GL_ARB_texture_gather  DONE (freedreno, i965/gen6+, nv50, llvmpipe, softpipe, swr)
   GL_ARB_texture_query_lod  DONE (freedreno, i965, nv50, llvmpipe, softpipe)
   GL_ARB_transform_feedback2  DONE (i965/gen6+, nv50, llvmpipe, softpipe, swr)
   GL_ARB_transform_feedback3  DONE (i965/gen7+, llvmpipe, softpipe, swr)
@@ -256,7 +256,7 @@ GLES3.1, GLSL ES 3.1 -- all DONE: i965/hsw+, nvc0, radeonsi
   GL_ARB_texture_multisample (Multisample textures)  DONE (i965/gen7+, nv50, r600, llvmpipe, softpipe)
   GL_ARB_texture_storage_multisample  DONE (all drivers that support GL_ARB_texture_multisample)
   GL_ARB_vertex_attrib_binding  DONE (all drivers)
-  GS5 Enhanced textureGather  DONE (i965/gen7+, r600)
+  GS5 Enhanced textureGather  DONE (freedreno, i965/gen7+, r600)
   GS5 Packing/bitfield/conversion functions  DONE (i965/gen6+, r600)
   GL_EXT_shader_integer_mix  DONE (all drivers that support GLSL)
 
diff --git a/src/gallium/drivers/freedreno/Makefile.sources b/src/gallium/drivers/freedreno/Makefile.sources
index b109a5a7a21..40c2eff0455 100644
--- a/src/gallium/drivers/freedreno/Makefile.sources
+++ b/src/gallium/drivers/freedreno/Makefile.sources
@@ -168,6 +168,7 @@ ir3_SOURCES := \
ir3/ir3_nir.c \
ir3/ir3_nir.h \
ir3/ir3_nir_lower_if_else.c \
+   ir3/ir3_nir_lower_tg4_to_tex.c \
ir3/ir3_print.c \
ir3/ir3_ra.c \
ir3/ir3_sched.c \
diff --git a/src/gallium/drivers/freedreno/freedreno_screen.c b/src/gallium/drivers/freedreno/freedreno_screen.c
index e61344fd104..62e4a574b90 100644
--- a/src/gallium/drivers/freedreno/freedreno_screen.c
+++ b/src/gallium/drivers/freedreno/freedreno_screen.c
@@ -264,7 +264,7 @@ fd_screen_get_param(struct pipe_screen *pscreen, enum pipe_cap param)
return 0;
 
case PIPE_CAP_MAX_TEXTURE_GATHER_COMPONENTS:
-   if (is_a5xx(screen))
+   if (is_a4xx(screen) || is_a5xx(screen))
return 4;
return 0;
 
diff --git a/src/gallium/drivers/freedreno/ir3/ir3_compiler_nir.c b/src/gallium/drivers/freedreno/ir3/ir3_compiler_nir.c
index da4aeaa7acb..c97df4f1d63 100644
--- a/src/gallium/drivers/freedreno/ir3/ir3_compiler_nir.c
+++ b/src/gallium/drivers/freedreno/ir3/ir3_compiler_nir.c
@@ -2399,9 +2399,12 @@ emit_tex(struct ir3_context *ctx, nir_tex_instr *tex)
 */
if (has_off | has_lod | has_bias) {
if (has_off) {
-   for (i = 0; i < coords; i++)
+   unsigned off_coords = coords;
+   if (tex->sampler_dim == GLSL_SAMPLER_DIM_CUBE)
+   off_coords--;
+   for (i = 0; i < off_coords; i++)
src1[nsrc1++] = off[i];
-   if (coords < 2)
+   if (off_coords < 2)
src1[nsrc1++] = create_immed(b, fui(0.0));
flags |= IR3_INSTR_O;
}
diff --git a/src/gallium/drivers/freedreno/ir3/ir3_nir.

Re: [Freedreno] [Mesa-dev] [PATCH 1/2] nir: allow texture offsets with cube maps

2017-11-20 Thread Ilia Mirkin
On Mon, Nov 20, 2017 at 7:08 PM, Jason Ekstrand <ja...@jlekstrand.net> wrote:
> On Mon, Nov 20, 2017 at 3:11 PM, Ilia Mirkin <imir...@alum.mit.edu> wrote:
>>
>> On Mon, Nov 20, 2017 at 5:16 PM, Jason Ekstrand <ja...@jlekstrand.net>
>> wrote:
>> > On Sun, Nov 19, 2017 at 11:54 AM, Ilia Mirkin <imir...@alum.mit.edu>
>> > wrote:
>> >>
>> >> GL doesn't have this, but some hardware supports it. This is convenient
>> >> for lowering tg4 to plain texture calls, which is necessary on Adreno
>> >> A4xx hardware.
>> >>
>> >> Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu>
>> >> ---
>> >>  src/compiler/nir/nir.h | 15 +--
>> >>  1 file changed, 13 insertions(+), 2 deletions(-)
>> >>
>> >> diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
>> >> index f46f6147110..64965ae16d6 100644
>> >> --- a/src/compiler/nir/nir.h
>> >> +++ b/src/compiler/nir/nir.h
>> >> @@ -1364,8 +1364,7 @@ nir_tex_instr_src_size(const nir_tex_instr
>> >> *instr,
>> >> unsigned src)
>> >> if (instr->src[src].src_type == nir_tex_src_ms_mcs)
>> >>return 4;
>> >>
>> >> -   if (instr->src[src].src_type == nir_tex_src_offset ||
>> >> -   instr->src[src].src_type == nir_tex_src_ddx ||
>> >> +   if (instr->src[src].src_type == nir_tex_src_ddx ||
>> >> instr->src[src].src_type == nir_tex_src_ddy) {
>> >>if (instr->is_array)
>> >>   return instr->coord_components - 1;
>> >> @@ -1373,6 +1372,18 @@ nir_tex_instr_src_size(const nir_tex_instr
>> >> *instr,
>> >> unsigned src)
>> >>   return instr->coord_components;
>> >> }
>> >>
>> >> +   /* Usual APIs don't allow cube + offset, but we allow it, with 2
>> >> coords for
>> >> +* the offset, since a cube maps to a single face.
>> >> +*/
>> >> +   if (instr->src[src].src_type == nir_tex_src_offset) {
>> >> +  unsigned ret = instr->coord_components;
>> >> +  if (instr->is_array)
>> >> + ret--;
>> >> +  if (instr->sampler_dim == GLSL_SAMPLER_DIM_CUBE)
>> >> + ret--;
>> >> +  return ret;
>> >
>> >
>> > I think I'd rather this look more like the one above:
>> >
>> > if (instr->is_array)
>> >return instr->coord_components;
>> > else if (instr->sampler_dim == GLSL_SAMPLER_DIM_CUBE)
>> >return 2;
>> > else
>> >return instr->coord_components - 1;
>> >
>> > It seems a bit cleaner and/or more explicit to me.
>>
>> OK. Your version is slightly wrong, but I get the idea. Will
>> do that in a v2. (An array should get -1, and a cube should always get 2,
>> even if it's an array.)
>
>
> I'd forgotten about cube arrays, yes, those would naturally be -2.  In that
> case, maybe -- for each subtraction is a good idea...

Well, rearranging it, we get

if (instr->sampler_dim == CUBE)
  return 2;
else if (instr->is_array)
  return comp - 1;
else
  return comp;

Happy to leave it alone too though.
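
For what it's worth, the two forms can be checked against each other on the
host. What follows is a minimal standalone sketch, not the NIR code itself:
`struct tex_desc`, `offset_size_decrement` and `offset_size_ladder` are
made-up names, and it assumes coord_components counts the array layer, so a
cube has 3 components and a cube array has 4.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the few nir_tex_instr fields involved. */
struct tex_desc {
   unsigned coord_components; /* includes the array layer, if any */
   bool is_array;
   bool is_cube;
};

/* v1 form: start from coord_components and subtract. */
static unsigned
offset_size_decrement(const struct tex_desc *t)
{
   unsigned ret = t->coord_components;
   if (t->is_array)
      ret--;
   if (t->is_cube)
      ret--;
   return ret;
}

/* Rearranged form: cube first, then array, then everything else. */
static unsigned
offset_size_ladder(const struct tex_desc *t)
{
   if (t->is_cube)
      return 2;
   else if (t->is_array)
      return t->coord_components - 1;
   else
      return t->coord_components;
}

/* Compare both forms over 1D/2D/3D/cube, with and without arrays. */
static void
check_offset_size_forms_agree(void)
{
   static const struct tex_desc cases[] = {
      { 1, false, false }, /* 1D */
      { 2, true,  false }, /* 1D array */
      { 2, false, false }, /* 2D */
      { 3, true,  false }, /* 2D array */
      { 3, false, false }, /* 3D */
      { 3, false, true  }, /* cube */
      { 4, true,  true  }, /* cube array */
   };
   for (unsigned i = 0; i < sizeof(cases) / sizeof(cases[0]); i++)
      assert(offset_size_decrement(&cases[i]) ==
             offset_size_ladder(&cases[i]));
}
```

Under that assumption both forms give 2 for cubes and cube arrays, so the
choice between them is purely about readability.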


Re: [Freedreno] [Mesa-dev] [PATCH 1/2] nir: allow texture offsets with cube maps

2017-11-20 Thread Ilia Mirkin
On Mon, Nov 20, 2017 at 5:16 PM, Jason Ekstrand <ja...@jlekstrand.net> wrote:
> On Sun, Nov 19, 2017 at 11:54 AM, Ilia Mirkin <imir...@alum.mit.edu> wrote:
>>
>> GL doesn't have this, but some hardware supports it. This is convenient
>> for lowering tg4 to plain texture calls, which is necessary on Adreno
>> A4xx hardware.
>>
>> Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu>
>> ---
>>  src/compiler/nir/nir.h | 15 +--
>>  1 file changed, 13 insertions(+), 2 deletions(-)
>>
>> diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
>> index f46f6147110..64965ae16d6 100644
>> --- a/src/compiler/nir/nir.h
>> +++ b/src/compiler/nir/nir.h
>> @@ -1364,8 +1364,7 @@ nir_tex_instr_src_size(const nir_tex_instr *instr,
>> unsigned src)
>> if (instr->src[src].src_type == nir_tex_src_ms_mcs)
>>return 4;
>>
>> -   if (instr->src[src].src_type == nir_tex_src_offset ||
>> -   instr->src[src].src_type == nir_tex_src_ddx ||
>> +   if (instr->src[src].src_type == nir_tex_src_ddx ||
>> instr->src[src].src_type == nir_tex_src_ddy) {
>>if (instr->is_array)
>>   return instr->coord_components - 1;
>> @@ -1373,6 +1372,18 @@ nir_tex_instr_src_size(const nir_tex_instr *instr,
>> unsigned src)
>>   return instr->coord_components;
>> }
>>
>> +   /* Usual APIs don't allow cube + offset, but we allow it, with 2
>> coords for
>> +* the offset, since a cube maps to a single face.
>> +*/
>> +   if (instr->src[src].src_type == nir_tex_src_offset) {
>> +  unsigned ret = instr->coord_components;
>> +  if (instr->is_array)
>> + ret--;
>> +  if (instr->sampler_dim == GLSL_SAMPLER_DIM_CUBE)
>> + ret--;
>> +  return ret;
>
>
> I think I'd rather this look more like the one above:
>
> if (instr->is_array)
>return instr->coord_components;
> else if (instr->sampler_dim == GLSL_SAMPLER_DIM_CUBE)
>return 2;
> else
>return instr->coord_components - 1;
>
> It seems a bit cleaner and/or more explicit to me.

OK. Your version is slightly wrong, but I get the idea. Will
do that in a v2. (An array should get -1, and a cube should always get 2,
even if it's an array.)

  -ilia

>
> Also, bonus points to anyone who converts this function to use a switch. :-P
>
> --Jason
>
>>
>> +   }
>> +
>> return 1;
>>  }
>>
>> --
>> 2.13.6
>>


[Freedreno] [PATCH] freedreno/a4xx: add stencil texturing support

2017-11-19 Thread Ilia Mirkin
Copied from a5xx, should be identical.

Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu>
---
 docs/features.txt|  6 ++---
 src/gallium/drivers/freedreno/a4xx/fd4_emit.c|  2 ++
 src/gallium/drivers/freedreno/a4xx/fd4_format.c  | 11 +---
 src/gallium/drivers/freedreno/a4xx/fd4_texture.c | 34 ++--
 4 files changed, 38 insertions(+), 15 deletions(-)

diff --git a/docs/features.txt b/docs/features.txt
index 99fb1715e0b..2d6e0b20fb5 100644
--- a/docs/features.txt
+++ b/docs/features.txt
@@ -180,7 +180,7 @@ GL 4.3, GLSL 4.30 -- all DONE: i965/gen8+, nvc0, radeonsi
   GL_ARB_robust_buffer_access_behavior  DONE (i965)
   GL_ARB_shader_image_size  DONE (freedreno/a5xx, i965, r600, softpipe)
   GL_ARB_shader_storage_buffer_object  DONE (freedreno/a5xx, i965, softpipe)
-  GL_ARB_stencil_texturing  DONE (freedreno/a5xx, i965/hsw+, nv50, r600, llvmpipe, softpipe, swr)
+  GL_ARB_stencil_texturing  DONE (freedreno, i965/hsw+, nv50, r600, llvmpipe, softpipe, swr)
   GL_ARB_texture_buffer_range  DONE (freedreno, nv50, i965, r600, llvmpipe)
   GL_ARB_texture_query_levels  DONE (all drivers that support GLSL 1.30)
   GL_ARB_texture_storage_multisample  DONE (all drivers that support GL_ARB_texture_multisample)
@@ -203,7 +203,7 @@ GL 4.4, GLSL 4.40 -- all DONE: i965/gen8+, nvc0, radeonsi
   GL_ARB_multi_bind  DONE (all drivers)
   GL_ARB_query_buffer_object  DONE (i965/hsw+)
   GL_ARB_texture_mirror_clamp_to_edge  DONE (i965, nv50, r600, llvmpipe, softpipe, swr)
-  GL_ARB_texture_stencil8  DONE (freedreno/a5xx, i965/hsw+, nv50, r600, llvmpipe, softpipe, swr)
+  GL_ARB_texture_stencil8  DONE (freedreno, i965/hsw+, nv50, r600, llvmpipe, softpipe, swr)
   GL_ARB_vertex_type_10f_11f_11f_rev  DONE (i965, nv50, r600, llvmpipe, softpipe, swr)
 
 GL 4.5, GLSL 4.50 -- all DONE: nvc0, radeonsi
@@ -252,7 +252,7 @@ GLES3.1, GLSL ES 3.1 -- all DONE: i965/hsw+, nvc0, radeonsi
   GL_ARB_shader_storage_buffer_object  DONE (freedreno/a5xx, i965/gen7+, softpipe)
   GL_ARB_shading_language_packing  DONE (all drivers)
   GL_ARB_separate_shader_objects  DONE (all drivers)
-  GL_ARB_stencil_texturing  DONE (freedreno/a5xx, nv50, r600, llvmpipe, softpipe, swr)
+  GL_ARB_stencil_texturing  DONE (freedreno, nv50, r600, llvmpipe, softpipe, swr)
   GL_ARB_texture_multisample (Multisample textures)  DONE (i965/gen7+, nv50, r600, llvmpipe, softpipe)
   GL_ARB_texture_storage_multisample  DONE (all drivers that support GL_ARB_texture_multisample)
   GL_ARB_vertex_attrib_binding  DONE (all drivers)
diff --git a/src/gallium/drivers/freedreno/a4xx/fd4_emit.c b/src/gallium/drivers/freedreno/a4xx/fd4_emit.c
index 0f7c6470330..8262b45daad 100644
--- a/src/gallium/drivers/freedreno/a4xx/fd4_emit.c
+++ b/src/gallium/drivers/freedreno/a4xx/fd4_emit.c
@@ -190,6 +190,8 @@ emit_textures(struct fd_context *ctx, struct fd_ringbuffer *ring,
OUT_RING(ring, view->texconst3);
if (view->base.texture) {
struct fd_resource *rsc = fd_resource(view->base.texture);
+   if (view->base.format == PIPE_FORMAT_X32_S8X24_UINT)
+   rsc = rsc->stencil;
OUT_RELOC(ring, rsc->bo, view->offset, view->texconst4, 0);
} else {
OUT_RING(ring, 0x00000000);
diff --git a/src/gallium/drivers/freedreno/a4xx/fd4_format.c b/src/gallium/drivers/freedreno/a4xx/fd4_format.c
index 3e1dc277850..75d24126149 100644
--- a/src/gallium/drivers/freedreno/a4xx/fd4_format.c
+++ b/src/gallium/drivers/freedreno/a4xx/fd4_format.c
@@ -211,10 +211,13 @@ static struct fd4_format formats[PIPE_FORMAT_COUNT] = {
VT(R11G11B10_FLOAT, 11_11_10_FLOAT, R11G11B10_FLOAT, WZYX),
_T(R9G9B9E5_FLOAT,  9_9_9_E5_FLOAT, NONE,WZYX),
 
-   _T(Z24X8_UNORM,   X8Z24_UNORM, R8G8B8A8_UNORM, WZYX),
-   _T(Z24_UNORM_S8_UINT, X8Z24_UNORM, R8G8B8A8_UNORM, WZYX),
-   _T(Z32_FLOAT, 32_FLOAT,   R8G8B8A8_UNORM, WZYX),
-   _T(Z32_FLOAT_S8X24_UINT, 32_FLOAT,R8G8B8A8_UNORM, WZYX),
+   _T(Z16_UNORM,16_UNORM, R16_UNORM,  WZYX),
+   _T(Z24X8_UNORM,  X8Z24_UNORM,  R8G8B8A8_UNORM, WZYX),
+   _T(X24S8_UINT,   8_8_8_8_UINT, R8G8B8A8_UINT,  XYZW),
+   _T(Z24_UNORM_S8_UINT,X8Z24_UNORM,  R8G8B8A8_UNORM, WZYX),
+   _T(

[Freedreno] [PATCH 1/2] nir: allow texture offsets with cube maps

2017-11-19 Thread Ilia Mirkin
GL doesn't have this, but some hardware supports it. This is convenient
for lowering tg4 to plain texture calls, which is necessary on Adreno
A4xx hardware.

Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu>
---
 src/compiler/nir/nir.h | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
index f46f6147110..64965ae16d6 100644
--- a/src/compiler/nir/nir.h
+++ b/src/compiler/nir/nir.h
@@ -1364,8 +1364,7 @@ nir_tex_instr_src_size(const nir_tex_instr *instr, unsigned src)
if (instr->src[src].src_type == nir_tex_src_ms_mcs)
   return 4;
 
-   if (instr->src[src].src_type == nir_tex_src_offset ||
-   instr->src[src].src_type == nir_tex_src_ddx ||
+   if (instr->src[src].src_type == nir_tex_src_ddx ||
instr->src[src].src_type == nir_tex_src_ddy) {
   if (instr->is_array)
  return instr->coord_components - 1;
@@ -1373,6 +1372,18 @@ nir_tex_instr_src_size(const nir_tex_instr *instr, unsigned src)
  return instr->coord_components;
}
 
+   /* Usual APIs don't allow cube + offset, but we allow it, with 2 coords for
+* the offset, since a cube maps to a single face.
+*/
+   if (instr->src[src].src_type == nir_tex_src_offset) {
+  unsigned ret = instr->coord_components;
+  if (instr->is_array)
+ ret--;
+  if (instr->sampler_dim == GLSL_SAMPLER_DIM_CUBE)
+ ret--;
+  return ret;
+   }
+
return 1;
 }
 
-- 
2.13.6



[Freedreno] [PATCH 2/2] freedreno/ir3: add a pass to lower tg4 to txl, enable gather on a4xx

2017-11-19 Thread Ilia Mirkin
Unfortunately Adreno A4xx hardware returns incorrect results with the
GATHER4 opcodes. As a result, we have to lower to 4 individual texture
calls (txl since we have to force lod to 0). We achieve this using
offsets, including on cube maps which normally never have offsets.

Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu>
---

This pass relies on the hw doing the "right thing", working with nonconst
offsets, and not having the usual limits (since the gather offset will in
effect get offset by another 1).

It fails two tests out of all the gather ones:

bin/zero-tex-coord textureGather
tests/spec/arb_gpu_shader5/execution/built-in-functions/fs-textureGatherOffset-uniform-array-offset.shader_test

We haven't fully investigated why yet, but this is a good start.

Note that the blob does this differently - they modify the source coordinate.
However this seems unnecessary given that the hw can be made to use the
offsets.

Also please note that my knowledge of nir is minimal. Please carefully check
that I used the right helpers/etc. This was largely a result of seeing what
doesn't result in assertions.
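
To make the lowering idea concrete, here is a host-side sketch of it. This is
not the pass itself. It assumes the GL component order for textureGather
(x = (i0, j1), y = (i1, j1), z = (i1, j0), w = (i0, j0)), uses a made-up 4x4
single-channel texture with clamp-to-edge addressing, and takes the base
texel (i0, j0) as already computed. `fetch_lod0` and `gather_via_offsets` are
hypothetical names.

```c
#include <assert.h>

#define TEX_W 4
#define TEX_H 4

/* Made-up single-channel texture; texel (x, y) holds 10*y + x. */
static const int texture[TEX_H][TEX_W] = {
   {  0,  1,  2,  3 },
   { 10, 11, 12, 13 },
   { 20, 21, 22, 23 },
   { 30, 31, 32, 33 },
};

static int
clamp_int(int v, int lo, int hi)
{
   return v < lo ? lo : (v > hi ? hi : v);
}

/* One plain lod-0 fetch with a texel offset and clamp-to-edge addressing.
 * This plays the role of a single txl call with an offset. */
static int
fetch_lod0(int x, int y, int off_x, int off_y)
{
   int tx = clamp_int(x + off_x, 0, TEX_W - 1);
   int ty = clamp_int(y + off_y, 0, TEX_H - 1);
   return texture[ty][tx];
}

/* Emulate a gather of the 2x2 footprint whose smallest-coordinate texel is
 * (i0, j0), using four offset fetches in GL's x/y/z/w order. */
static void
gather_via_offsets(int i0, int j0, int out[4])
{
   static const int offsets[4][2] = {
      { 0, 1 }, /* x: (i0, j1) */
      { 1, 1 }, /* y: (i1, j1) */
      { 1, 0 }, /* z: (i1, j0) */
      { 0, 0 }, /* w: (i0, j0) */
   };
   for (int i = 0; i < 4; i++)
      out[i] = fetch_lod0(i0, j0, offsets[i][0], offsets[i][1]);
}

/* Small self-check on an interior footprint. */
static void
check_gather_interior(void)
{
   int r[4];
   gather_via_offsets(1, 1, r);
   assert(r[0] == 21 && r[1] == 22 && r[2] == 12 && r[3] == 11);
}
```

Each of the four fetches adds at most one texel to whatever offset the shader
already supplied, which is where the "offset by another 1" remark above comes
from.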

 docs/features.txt  |   4 +-
 src/gallium/drivers/freedreno/Makefile.sources |   1 +
 src/gallium/drivers/freedreno/freedreno_screen.c   |   2 +-
 .../drivers/freedreno/ir3/ir3_compiler_nir.c   |   7 +-
 src/gallium/drivers/freedreno/ir3/ir3_nir.c|   2 +
 src/gallium/drivers/freedreno/ir3/ir3_nir.h|   1 +
 .../freedreno/ir3/ir3_nir_lower_tg4_to_tex.c   | 139 +
 src/gallium/drivers/freedreno/meson.build  |   1 +
 8 files changed, 152 insertions(+), 5 deletions(-)
 create mode 100644 src/gallium/drivers/freedreno/ir3/ir3_nir_lower_tg4_to_tex.c

diff --git a/docs/features.txt b/docs/features.txt
index 633d2593738..99fb1715e0b 100644
--- a/docs/features.txt
+++ b/docs/features.txt
@@ -130,7 +130,7 @@ GL 4.0, GLSL 4.00 --- all DONE: i965/gen7+, nvc0, r600, radeonsi
   GL_ARB_tessellation_shader  DONE (i965/gen7+)
   GL_ARB_texture_buffer_object_rgb32  DONE (freedreno, i965/gen6+, llvmpipe, softpipe, swr)
   GL_ARB_texture_cube_map_array  DONE (i965/gen6+, nv50, llvmpipe, softpipe)
-  GL_ARB_texture_gather  DONE (freedreno/a5xx, i965/gen6+, nv50, llvmpipe, softpipe, swr)
+  GL_ARB_texture_gather  DONE (freedreno, i965/gen6+, nv50, llvmpipe, softpipe, swr)
   GL_ARB_texture_query_lod  DONE (freedreno, i965, nv50, llvmpipe, softpipe)
   GL_ARB_transform_feedback2  DONE (i965/gen6+, nv50, llvmpipe, softpipe, swr)
   GL_ARB_transform_feedback3  DONE (i965/gen7+, llvmpipe, softpipe, swr)
@@ -256,7 +256,7 @@ GLES3.1, GLSL ES 3.1 -- all DONE: i965/hsw+, nvc0, radeonsi
   GL_ARB_texture_multisample (Multisample textures)  DONE (i965/gen7+, nv50, r600, llvmpipe, softpipe)
   GL_ARB_texture_storage_multisample  DONE (all drivers that support GL_ARB_texture_multisample)
   GL_ARB_vertex_attrib_binding  DONE (all drivers)
-  GS5 Enhanced textureGather  DONE (i965/gen7+, r600)
+  GS5 Enhanced textureGather  DONE (freedreno, i965/gen7+, r600)
   GS5 Packing/bitfield/conversion functions  DONE (i965/gen6+, r600)
   GL_EXT_shader_integer_mix  DONE (all drivers that support GLSL)
 
diff --git a/src/gallium/drivers/freedreno/Makefile.sources b/src/gallium/drivers/freedreno/Makefile.sources
index b109a5a7a21..40c2eff0455 100644
--- a/src/gallium/drivers/freedreno/Makefile.sources
+++ b/src/gallium/drivers/freedreno/Makefile.sources
@@ -168,6 +168,7 @@ ir3_SOURCES := \
ir3/ir3_nir.c \
ir3/ir3_nir.h \
ir3/ir3_nir_lower_if_else.c \
+   ir3/ir3_nir_lower_tg4_to_tex.c \
ir3/ir3_print.c \
ir3/ir3_ra.c \
ir3/ir3_sched.c \
diff --git a/src/gallium/drivers/freedreno/freedreno_screen.c b/src/gallium/drivers/freedreno/freedreno_screen.c
index e61344fd104..62e4a574b90 100644
--- a/src/gallium/drivers/freedreno/freedreno_screen.c
+++ b/src/gallium/drivers/freedreno/freedreno_screen.c
@@ -264,7 +264,7 @@ fd_screen_get_param(struct pipe_screen *pscreen, enum pipe_cap param)
return 0;
 
case PIPE_CAP_MAX_TEXTURE_GATHER_COMPONENTS:
-   if (is_a5xx(screen))
+   if (is_a4xx(screen) || is_a5xx(screen))
return 4;
return 0;
 
diff --git a/src/gallium/drivers/freedreno/ir3/ir3_compiler_nir.c b/src/gallium/drivers/freedreno/ir3/ir3_compiler_nir.c
index da4aeaa7acb..c97df4f1d63 100644
--- a/src/gallium/drivers/freedreno/ir3/ir3_compiler_nir.c
+++ b/src/gallium/drivers/freedreno/ir3/ir3_compiler_nir.c
@@ -2399,9 +2399,12 @@ emit_tex(struct ir3_context *ctx, ni

[Freedreno] [PATCH] a2xx: add support for a few 16-bit color rendering formats

2017-08-24 Thread Ilia Mirkin
The rest should be possible too, just needs some additional
investigation. Passes fbo-*-formats piglit tests.

Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu>
---
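
As an aside, the growing chain of `!=` comparisons in the format check could
also be written as a switch. The following is only a sketch of that shape: it
uses a stand-in enum (`enum fmt`) rather than the real `enum pipe_format`, and
`is_color_renderable` is a hypothetical helper that does not exist in the
driver.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the subset of enum pipe_format named in the patch. */
enum fmt {
   FMT_B8G8R8A8_UNORM,
   FMT_B8G8R8X8_UNORM,
   FMT_R8G8B8A8_UNORM,
   FMT_R8G8B8X8_UNORM,
   FMT_B5G6R5_UNORM,
   FMT_B5G5R5A1_UNORM,
   FMT_B5G5R5X1_UNORM,
   FMT_B4G4R4A4_UNORM,
   FMT_B4G4R4X4_UNORM,
   FMT_OTHER, /* anything not listed above */
};

/* Hypothetical helper: true for the formats accepted as render targets. */
static bool
is_color_renderable(enum fmt format)
{
   switch (format) {
   case FMT_B8G8R8A8_UNORM:
   case FMT_B8G8R8X8_UNORM:
   case FMT_R8G8B8A8_UNORM:
   case FMT_R8G8B8X8_UNORM:
   case FMT_B5G6R5_UNORM:
   case FMT_B5G5R5A1_UNORM:
   case FMT_B5G5R5X1_UNORM:
   case FMT_B4G4R4A4_UNORM:
   case FMT_B4G4R4X4_UNORM:
      return true;
   default:
      return false;
   }
}

/* Small self-check: a newly added 16-bit format passes, others do not. */
static void
check_renderable(void)
{
   assert(is_color_renderable(FMT_B5G6R5_UNORM));
   assert(!is_color_renderable(FMT_OTHER));
}
```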
 src/gallium/drivers/freedreno/a2xx/fd2_gmem.c   | 5 +++++
 src/gallium/drivers/freedreno/a2xx/fd2_screen.c | 7 ++++++-
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_gmem.c b/src/gallium/drivers/freedreno/a2xx/fd2_gmem.c
index aaba4127e0a..faf4dbccbc9 100644
--- a/src/gallium/drivers/freedreno/a2xx/fd2_gmem.c
+++ b/src/gallium/drivers/freedreno/a2xx/fd2_gmem.c
@@ -50,6 +50,11 @@ static uint32_t fmt2swap(enum pipe_format format)
switch (format) {
case PIPE_FORMAT_B8G8R8A8_UNORM:
case PIPE_FORMAT_B8G8R8X8_UNORM:
+   case PIPE_FORMAT_B5G6R5_UNORM:
+   case PIPE_FORMAT_B5G5R5A1_UNORM:
+   case PIPE_FORMAT_B5G5R5X1_UNORM:
+   case PIPE_FORMAT_B4G4R4A4_UNORM:
+   case PIPE_FORMAT_B4G4R4X4_UNORM:
/* TODO probably some more.. */
return 1;
default:
diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_screen.c b/src/gallium/drivers/freedreno/a2xx/fd2_screen.c
index 714948c1cef..8e176b1341f 100644
--- a/src/gallium/drivers/freedreno/a2xx/fd2_screen.c
+++ b/src/gallium/drivers/freedreno/a2xx/fd2_screen.c
@@ -54,7 +54,12 @@ fd2_screen_is_format_supported(struct pipe_screen *pscreen,
 
/* TODO figure out how to render to other formats.. */
if ((usage & PIPE_BIND_RENDER_TARGET) &&
-   ((format != PIPE_FORMAT_B8G8R8A8_UNORM) &&
+   ((format != PIPE_FORMAT_B5G6R5_UNORM) &&
+(format != PIPE_FORMAT_B5G5R5A1_UNORM) &&
+(format != PIPE_FORMAT_B5G5R5X1_UNORM) &&
+(format != PIPE_FORMAT_B4G4R4A4_UNORM) &&
+(format != PIPE_FORMAT_B4G4R4X4_UNORM) &&
+(format != PIPE_FORMAT_B8G8R8A8_UNORM) &&
 (format != PIPE_FORMAT_B8G8R8X8_UNORM) &&
 (format != PIPE_FORMAT_R8G8B8A8_UNORM) &&
 (format != PIPE_FORMAT_R8G8B8X8_UNORM))) {
-- 
2.13.5



[Freedreno] [PATCH] a2xx: only update rasterizer settings when they're there

2017-08-15 Thread Ilia Mirkin
The rasterizer being empty can happen e.g. during clears

Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu>
---
 src/gallium/drivers/freedreno/a2xx/fd2_emit.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_emit.c 
b/src/gallium/drivers/freedreno/a2xx/fd2_emit.c
index af9a69e86fb..42f27d278a2 100644
--- a/src/gallium/drivers/freedreno/a2xx/fd2_emit.c
+++ b/src/gallium/drivers/freedreno/a2xx/fd2_emit.c
@@ -222,7 +222,7 @@ fd2_emit_state(struct fd_context *ctx, const enum 
fd_dirty_3d_state dirty)
OUT_RING(ring, zsa->rb_alpha_ref);
}
 
-   if (dirty & (FD_DIRTY_RASTERIZER | FD_DIRTY_FRAMEBUFFER)) {
+   if (ctx->rasterizer && dirty & FD_DIRTY_RASTERIZER) {
struct fd2_rasterizer_stateobj *rasterizer =
fd2_rasterizer_stateobj(ctx->rasterizer);
OUT_PKT3(ring, CP_SET_CONSTANT, 3);
-- 
2.13.0



[Freedreno] [PATCH] freedreno/ir3: fix load_front_face conversion

2017-07-11 Thread Ilia Mirkin
The comments are correct - we get -1 and 0. However by adding 1, we
convert this into 0,1. This mostly works for conditionals, but when
negated, this will yield the wrong result. Instead just negate the
values (as they are backwards -- -1 means back instead of front).

Fixes tests/shaders/glsl-fs-frontfacing-not.shader_test and
dEQP-GLES3.functional.shaders.builtin_variable.frontfacing on A530.
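
To illustrate the difference, here is a minimal standalone sketch in plain C
(not ir3 code; the helper names are made up for this example). It models the
sign-extending widen followed by either the old add or the new bitwise not,
using the hw convention described above (-1 = back, 0 = front) and the NIR
convention that ~0 is true:

```c
#include <stdint.h>

/* Old lowering: widen S16 -> S32, then add 1.
 * hw front (0) -> 1, hw back (-1) -> 0. */
static int32_t frontface_add1(int16_t hw)
{
    return (int32_t)hw + 1;
}

/* New lowering: widen S16 -> S32, then bitwise not.
 * hw front (0) -> ~0, hw back (-1) -> 0. */
static int32_t frontface_not(int16_t hw)
{
    return ~(int32_t)hw;
}

/* Boolean negation as a bitwise not, which is only correct when the
 * input is exactly 0 or ~0. */
static int32_t bool_negate(int32_t b)
{
    return ~b;
}
```

With the add, a front-facing fragment yields 1, and bool_negate(1) is -2, which
is still non-zero, so "not front-facing" is wrongly true. With the not, front
yields ~0 and negates cleanly to 0.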

Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu>
---
 src/gallium/drivers/freedreno/ir3/ir3_compiler_nir.c | 9 +++--
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/src/gallium/drivers/freedreno/ir3/ir3_compiler_nir.c 
b/src/gallium/drivers/freedreno/ir3/ir3_compiler_nir.c
index ba1c64ee37c..764aeb49f1a 100644
--- a/src/gallium/drivers/freedreno/ir3/ir3_compiler_nir.c
+++ b/src/gallium/drivers/freedreno/ir3/ir3_compiler_nir.c
@@ -1546,14 +1546,11 @@ emit_intrinsic(struct ir3_compile *ctx, 
nir_intrinsic_instr *intr)
ctx->frag_face = create_input(b, 0);
ctx->frag_face->regs[0]->flags |= IR3_REG_HALF;
}
-   /* for fragface, we always get -1 or 0, but that is inverse
-* of what nir expects (where ~0 is true).  Unfortunately
-* trying to widen from half to full in add.s seems to do a
-* non-sign-extending widen (resulting in something that
-* gets interpreted as float Inf??)
+   /* for fragface, we get -1 for back and 0 for front. However 
this is
+* the inverse of what nir expects (where ~0 is true).
 */
dst[0] = ir3_COV(b, ctx->frag_face, TYPE_S16, TYPE_S32);
-   dst[0] = ir3_ADD_S(b, dst[0], 0, create_immed(b, 1), 0);
+   dst[0] = ir3_NOT_B(b, dst[0], 0);
break;
case nir_intrinsic_load_local_invocation_id:
if (!ctx->local_invocation_id) {
-- 
2.13.0



Re: [Freedreno] Whether A200 driver is supported by Linux Mainline Kernel

2017-07-10 Thread Ilia Mirkin
On Mon, Jul 10, 2017 at 10:53 AM, abhijit  wrote:
> Hi Rob,
>
> Thank you very much for your reply.
>
> I ensured that --enable-freedreno-kgsl is enabled in libdrm build and the
> same is copied to target
>
> The issue seems to be in mesa build
>
> I observed that there are two mechanisms by which an application can interact
> with the underlying DRM:
> 1. With DRI ($MESA_INSTALL_PATH/src/mesa/drivers/dri)
> 2. With Gallium driver ($MESA_INSTALL_PATH/src/gallium/drivers)
>
> The Freedreno driver is present only in case 2. For that reason I disabled DRI
> in the Mesa build and enabled the gallium-xlib interface, which enables case 2

You absolutely need --enable-dri. Gallium drivers are also DRI
drivers. The "src/mesa/drivers/dri" drivers are "classic" drivers,
while the others are "gallium" drivers. However they're all DRI
drivers.

You can build mesa --with-dri-drivers=""
--with-gallium-drivers="freedreno", but you have to leave DRI(3)
enabled.

  -ilia


[Freedreno] [PATCH 3/3] a5xx: enable formats newly added to the headers

2017-07-04 Thread Ilia Mirkin
This enables S3TC, BPTC, ETC2, and ASTC texture decoding. Additionally
this enables RGB32 texture buffer objects, as well as 11_11_10_FLOAT and
10_10_10_2 vertex formats (and related extensions).

Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu>
---
 src/gallium/drivers/freedreno/a5xx/fd5_format.c | 138 
 1 file changed, 69 insertions(+), 69 deletions(-)

diff --git a/src/gallium/drivers/freedreno/a5xx/fd5_format.c 
b/src/gallium/drivers/freedreno/a5xx/fd5_format.c
index 2255b1f7396..0c72ec0d13c 100644
--- a/src/gallium/drivers/freedreno/a5xx/fd5_format.c
+++ b/src/gallium/drivers/freedreno/a5xx/fd5_format.c
@@ -194,19 +194,19 @@ static struct fd5_format formats[PIPE_FORMAT_COUNT] = {
_T(A8R8G8B8_SRGB,8_8_8_8_UNORM, R8G8B8A8_UNORM, ZYXW),
_T(X8R8G8B8_SRGB,8_8_8_8_UNORM, R8G8B8A8_UNORM, ZYXW),
 
-   _T(R10G10B10A2_UNORM,   10_10_10_2_UNORM, NONE,  WZYX),
-   _T(B10G10R10A2_UNORM,   10_10_10_2_UNORM, NONE,  WXYZ),
+   VT(R10G10B10A2_UNORM,   10_10_10_2_UNORM, NONE,  WZYX),
+   VT(B10G10R10A2_UNORM,   10_10_10_2_UNORM, NONE,  WXYZ),
_T(B10G10R10X2_UNORM,   10_10_10_2_UNORM, NONE,  WXYZ),
-// V_(R10G10B10A2_SNORM,   10_10_10_2_SNORM, NONE,  WZYX),
-// V_(B10G10R10A2_SNORM,   10_10_10_2_SNORM, NONE,  WXYZ),
-   _T(R10G10B10A2_UINT,10_10_10_2_UINT,  R10G10B10A2_UINT,  WZYX),
-   _T(B10G10R10A2_UINT,10_10_10_2_UINT,  R10G10B10A2_UINT,  WXYZ),
-// V_(R10G10B10A2_USCALED, 10_10_10_2_UINT,  NONE,  WZYX),
-// V_(B10G10R10A2_USCALED, 10_10_10_2_UINT,  NONE,  WXYZ),
-// V_(R10G10B10A2_SSCALED, 10_10_10_2_SINT,  NONE,  WZYX),
-// V_(B10G10R10A2_SSCALED, 10_10_10_2_SINT,  NONE,  WXYZ),
-
-   _T(R11G11B10_FLOAT, 11_11_10_FLOAT, R11G11B10_FLOAT, WZYX),
+   V_(R10G10B10A2_SNORM,   10_10_10_2_SNORM, NONE,  WZYX),
+   V_(B10G10R10A2_SNORM,   10_10_10_2_SNORM, NONE,  WXYZ),
+   VT(R10G10B10A2_UINT,10_10_10_2_UINT,  R10G10B10A2_UINT,  WZYX),
+   VT(B10G10R10A2_UINT,10_10_10_2_UINT,  R10G10B10A2_UINT,  WXYZ),
+   V_(R10G10B10A2_USCALED, 10_10_10_2_UINT,  NONE,  WZYX),
+   V_(B10G10R10A2_USCALED, 10_10_10_2_UINT,  NONE,  WXYZ),
+   V_(R10G10B10A2_SSCALED, 10_10_10_2_SINT,  NONE,  WZYX),
+   V_(B10G10R10A2_SSCALED, 10_10_10_2_SINT,  NONE,  WXYZ),
+
+   VT(R11G11B10_FLOAT, 11_11_10_FLOAT, R11G11B10_FLOAT, WZYX),
_T(R9G9B9E5_FLOAT,  9_9_9_E5_FLOAT, NONE,WZYX),
 
_T(Z24X8_UNORM,   X8Z24_UNORM, R8G8B8A8_UNORM, WZYX),
@@ -248,11 +248,11 @@ static struct fd5_format formats[PIPE_FORMAT_COUNT] = {
_T(L32A32_SINT,32_32_SINT,  NONE,WZYX),
 
/* 96-bit */
-   V_(R32G32B32_UINT,32_32_32_UINT,  NONE, WZYX),
-   V_(R32G32B32_SINT,32_32_32_SINT,  NONE, WZYX),
+   VT(R32G32B32_UINT,32_32_32_UINT,  NONE, WZYX),
+   VT(R32G32B32_SINT,32_32_32_SINT,  NONE, WZYX),
V_(R32G32B32_USCALED, 32_32_32_UINT,  NONE, WZYX),
V_(R32G32B32_SSCALED, 32_32_32_SINT,  NONE, WZYX),
-   V_(R32G32B32_FLOAT,   32_32_32_FLOAT, NONE, WZYX),
+   VT(R32G32B32_FLOAT,   32_32_32_FLOAT, NONE, WZYX),
V_(R32G32B32_FIXED,   32_32_32_FIXED, NONE, WZYX),
 
/* 128-bit */
@@ -267,31 +267,31 @@ static struct fd5_format formats[PIPE_FORMAT_COUNT] = {
V_(R32G32B32A32_FIXED,   32_32_32_32_FIXED, NONE,   WZYX),
 
/* compressed */
-// _T(ETC1_RGB8, ETC1, NONE, WZYX),
-// _T(ETC2_RGB8, ETC2_RGB8, NONE, WZYX),
-// _T(ETC2_SRGB8, ETC2_RGB8, NONE, WZYX),
-// _T(ETC2_RGB8A1, ETC2_RGB8A1, NONE, WZYX),
-// _T(ETC2_SRGB8A1, ETC2_RGB8A1, NONE, WZYX),
-// _T(ETC2_RGBA8, ETC2_RGBA8, NONE, WZYX),
-// _T(ETC2_SRGBA8, ETC2_RGBA8, NONE, WZYX),
-// _T(ETC2_R11_UNORM, ETC2_R11_UNORM, NONE, WZYX),
-// _T(ETC2_R11_SNORM, ETC2_R11_SNORM, NONE, WZYX),
-// _T(ETC2_RG11_UNORM, ETC2_RG11_UNORM, NONE, WZYX),
-// _T(ETC2_RG11_SNORM, ETC2_RG11_SNORM, NONE, WZYX),
-
-// _T(DXT1_RGB,   DXT1, NONE, WZYX),
-// _T(DXT1_SRGB,  DXT1, NONE, WZYX),
-// _T(DXT1_RGBA,  DXT1, NONE, WZYX),
-// _T(DXT1_SRGBA, DXT1, NONE, WZYX),
-// _T(DXT3_RGBA,  DXT3, NONE, WZYX),
-// _T(DXT3_SRGBA, DXT3, NONE, WZYX),
-// _T(DXT5_RGBA,  DXT5, NONE, WZYX),
-// _T(DXT5_SRGBA, DXT5, NONE, WZYX),
-
-// _T(BPTC_RGBA_UNORM, BPTC,NONE, WZYX),
-// _T(BPTC_SRGBA,  BPTC,NONE, WZYX),
-// _T(BPTC_RGB_FLOAT,  BPTC_FLOAT,  NONE, WZYX),
-// _T(BPTC_RGB_UFLOAT, BPTC_UFLOAT, NONE, WZYX),
+   _T(ETC1_RGB8, ETC1, NONE, WZYX),
+   _T(ETC2_RGB8, ETC2_RGB8, NONE, WZYX),
+   _T(ETC2_SRGB8, ETC2_RGB8, NONE, WZYX),
+   _T(ETC2_RGB8A1, ETC2_RGB8A1, NONE, WZYX),
+   _T(ETC2_SRGB8A1, ETC2_RGB8A1, NONE, WZYX),
+   

[Freedreno] [PATCH 1/3] a5xx: update headers

2017-07-04 Thread Ilia Mirkin
Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu>
---
 src/gallium/drivers/freedreno/a5xx/a5xx.xml.h | 57 ++-
 1 file changed, 47 insertions(+), 10 deletions(-)

diff --git a/src/gallium/drivers/freedreno/a5xx/a5xx.xml.h 
b/src/gallium/drivers/freedreno/a5xx/a5xx.xml.h
index abcc53965ad..ee6146532b1 100644
--- a/src/gallium/drivers/freedreno/a5xx/a5xx.xml.h
+++ b/src/gallium/drivers/freedreno/a5xx/a5xx.xml.h
@@ -8,15 +8,10 @@ http://github.com/freedreno/envytools/
 git clone https://github.com/freedreno/envytools.git
 
 The rules-ng-ng source files this header was generated from are:
-- /home/robclark/src/freedreno/envytools/rnndb/adreno.xml   (
431 bytes, from 2017-05-17 13:21:27)
-- /home/robclark/src/freedreno/envytools/rnndb/freedreno_copyright.xml  (   
1572 bytes, from 2017-05-17 13:21:27)
-- /home/robclark/src/freedreno/envytools/rnndb/adreno/a2xx.xml  (  
37162 bytes, from 2017-05-17 13:21:27)
-- /home/robclark/src/freedreno/envytools/rnndb/adreno/adreno_common.xml (  
13324 bytes, from 2017-05-17 13:21:27)
-- /home/robclark/src/freedreno/envytools/rnndb/adreno/adreno_pm4.xml(  
31866 bytes, from 2017-06-02 15:50:23)
-- /home/robclark/src/freedreno/envytools/rnndb/adreno/a3xx.xml  (  
83840 bytes, from 2017-05-17 13:21:27)
-- /home/robclark/src/freedreno/envytools/rnndb/adreno/a4xx.xml  ( 
111898 bytes, from 2017-05-30 19:25:27)
-- /home/robclark/src/freedreno/envytools/rnndb/adreno/a5xx.xml  ( 
142603 bytes, from 2017-06-06 17:02:32)
-- /home/robclark/src/freedreno/envytools/rnndb/adreno/ocmem.xml (   
1773 bytes, from 2017-05-17 13:21:27)
+- /home/ilia/src/freedreno/envytools/rnndb/adreno/a5xx.xml  ( 141249 
bytes, from 2017-07-04 04:13:12)
+- /home/ilia/src/freedreno/envytools/rnndb/freedreno_copyright.xml  (   1572 
bytes, from 2016-02-11 01:04:14)
+- /home/ilia/src/freedreno/envytools/rnndb/adreno/adreno_common.xml (  13324 
bytes, from 2017-07-04 02:59:47)
+- /home/ilia/src/freedreno/envytools/rnndb/adreno/adreno_pm4.xml(  31866 
bytes, from 2017-07-04 02:59:47)
 
 Copyright (C) 2013-2017 by the following authors:
 - Rob Clark <robdcl...@gmail.com> (robclark)
@@ -119,6 +114,11 @@ enum a5xx_vtx_fmt {
VFMT5_8_8_8_8_SNORM = 50,
VFMT5_8_8_8_8_UINT = 51,
VFMT5_8_8_8_8_SINT = 52,
+   VFMT5_10_10_10_2_UNORM = 54,
+   VFMT5_10_10_10_2_SNORM = 57,
+   VFMT5_10_10_10_2_UINT = 58,
+   VFMT5_10_10_10_2_SINT = 59,
+   VFMT5_11_11_10_FLOAT = 66,
VFMT5_16_16_UNORM = 67,
VFMT5_16_16_SNORM = 68,
VFMT5_16_16_FLOAT = 69,
@@ -204,14 +204,45 @@ enum a5xx_tex_fmt {
TFMT5_32_32_FLOAT = 103,
TFMT5_32_32_UINT = 104,
TFMT5_32_32_SINT = 105,
+   TFMT5_32_32_32_UINT = 114,
+   TFMT5_32_32_32_SINT = 115,
+   TFMT5_32_32_32_FLOAT = 116,
TFMT5_32_32_32_32_FLOAT = 130,
TFMT5_32_32_32_32_UINT = 131,
TFMT5_32_32_32_32_SINT = 132,
TFMT5_X8Z24_UNORM = 160,
+   TFMT5_ETC2_RG11_UNORM = 171,
+   TFMT5_ETC2_RG11_SNORM = 172,
+   TFMT5_ETC2_R11_UNORM = 173,
+   TFMT5_ETC2_R11_SNORM = 174,
+   TFMT5_ETC1 = 175,
+   TFMT5_ETC2_RGB8 = 176,
+   TFMT5_ETC2_RGBA8 = 177,
+   TFMT5_ETC2_RGB8A1 = 178,
+   TFMT5_DXT1 = 179,
+   TFMT5_DXT3 = 180,
+   TFMT5_DXT5 = 181,
TFMT5_RGTC1_UNORM = 183,
TFMT5_RGTC1_SNORM = 184,
TFMT5_RGTC2_UNORM = 187,
TFMT5_RGTC2_SNORM = 188,
+   TFMT5_BPTC_UFLOAT = 190,
+   TFMT5_BPTC_FLOAT = 191,
+   TFMT5_BPTC = 192,
+   TFMT5_ASTC_4x4 = 193,
+   TFMT5_ASTC_5x4 = 194,
+   TFMT5_ASTC_5x5 = 195,
+   TFMT5_ASTC_6x5 = 196,
+   TFMT5_ASTC_6x6 = 197,
+   TFMT5_ASTC_8x5 = 198,
+   TFMT5_ASTC_8x6 = 199,
+   TFMT5_ASTC_8x8 = 200,
+   TFMT5_ASTC_10x5 = 201,
+   TFMT5_ASTC_10x6 = 202,
+   TFMT5_ASTC_10x8 = 203,
+   TFMT5_ASTC_10x10 = 204,
+   TFMT5_ASTC_12x10 = 205,
+   TFMT5_ASTC_12x12 = 206,
 };
 
 enum a5xx_tex_fetchsize {
@@ -3719,12 +3750,18 @@ static inline uint32_t 
A5XX_VFD_DECODE_INSTR_IDX(uint32_t val)
return ((val) << A5XX_VFD_DECODE_INSTR_IDX__SHIFT) & 
A5XX_VFD_DECODE_INSTR_IDX__MASK;
 }
 #define A5XX_VFD_DECODE_INSTR_INSTANCED
0x0002
-#define A5XX_VFD_DECODE_INSTR_FORMAT__MASK 0x3ff0
+#define A5XX_VFD_DECODE_INSTR_FORMAT__MASK 0x0ff0
 #define A5XX_VFD_DECODE_INSTR_FORMAT__SHIFT20
 static inline uint32_t A5XX_VFD_DECODE_INSTR_FORMAT(enum a5xx_vtx_fmt val)
 {
return ((val) << A5XX_VFD_DECODE_INSTR_FORMAT__SHIFT) & 
A5XX_VFD_DECODE_INSTR_FORMAT__MASK;
 }
+#define A5XX_VFD_DECODE_INSTR_SWAP__MASK   0x3000
+#define A5XX_VFD_DECODE_INSTR_SWAP__SHIFT  28
+static inline uint32_t A5XX_VFD_DECODE_INSTR_SWAP(enum a3xx_color_swap val)
+{
+   return ((val) <

[Freedreno] [PATCH] freedreno: pack texture buffer objects in 2d logical space

2016-09-04 Thread Ilia Mirkin
This artificially converts a buffer into a 8K x N 2D texture to fetch
texels from. As a result we can access up to 8K x 8K texels on a3xx, and
16K x 16K on a4xx. This could be further expanded into 3D space if
necessary, but 64M should be enough.

We have to check out-of-bounds conditions in the shader since otherwise
we wouldn't be able to prevent a situation where the last line of the
texture covers unallocated pages.
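
As a rough illustration of the packing, the sketch below computes the 2D
dimensions the same way the a3xx hunk does (MIN2 / DIV_ROUND_UP with a row
width of 8192) and then splits a linear texel index into (x, y) with a bounds
check. The struct, the function names, and the modulo/divide split are
assumptions made for this example; the actual shader-side code is in the ir3
changes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Row width used for a3xx in the patch; a4xx uses 16384 instead. */
#define TBO_ROW_WIDTH 8192u

struct tbo_dims {
    uint32_t width;
    uint32_t height;
};

/* Logical 2D size for a buffer holding `elements` texels. */
static struct tbo_dims tbo_pack_dims(uint32_t elements)
{
    struct tbo_dims d;
    d.width = elements < TBO_ROW_WIDTH ? elements : TBO_ROW_WIDTH;
    d.height = (elements + TBO_ROW_WIDTH - 1u) / TBO_ROW_WIDTH;
    return d;
}

/* Split a linear index into 2D coordinates. Returns false when the index
 * is out of bounds, i.e. when the fetch must not touch the (possibly
 * partially unallocated) last row. */
static bool tbo_index_to_coord(uint32_t index, uint32_t elements,
                               uint32_t *x, uint32_t *y)
{
    if (index >= elements)
        return false;
    *x = index % TBO_ROW_WIDTH;
    *y = index / TBO_ROW_WIDTH;
    return true;
}
```

For example, a 10000-element buffer becomes an 8192 x 2 texture, and index 9000
lands at x = 808, y = 1. Index 10000 is rejected even though (1808, 1) would
fall inside the 2D rectangle.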

Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu>
---

The limits we were previously allowing are too small. The spec requires at
least 64K.

 src/gallium/drivers/freedreno/a3xx/fd3_texture.c   | 15 +---
 src/gallium/drivers/freedreno/a4xx/fd4_texture.c   | 13 +++
 src/gallium/drivers/freedreno/freedreno_screen.c   |  7 ++--
 .../drivers/freedreno/ir3/ir3_compiler_nir.c   | 40 --
 src/gallium/drivers/freedreno/ir3/ir3_shader.c | 32 +
 src/gallium/drivers/freedreno/ir3/ir3_shader.h |  7 +++-
 6 files changed, 94 insertions(+), 20 deletions(-)

diff --git a/src/gallium/drivers/freedreno/a3xx/fd3_texture.c 
b/src/gallium/drivers/freedreno/a3xx/fd3_texture.c
index 94caaed..875bd49 100644
--- a/src/gallium/drivers/freedreno/a3xx/fd3_texture.c
+++ b/src/gallium/drivers/freedreno/a3xx/fd3_texture.c
@@ -194,10 +194,10 @@ tex_type(unsigned target)
switch (target) {
default:
assert(0);
-   case PIPE_BUFFER:
case PIPE_TEXTURE_1D:
case PIPE_TEXTURE_1D_ARRAY:
return A3XX_TEX_1D;
+   case PIPE_BUFFER:
case PIPE_TEXTURE_RECT:
case PIPE_TEXTURE_2D:
case PIPE_TEXTURE_2D_ARRAY:
@@ -238,11 +238,16 @@ fd3_sampler_view_create(struct pipe_context *pctx, struct 
pipe_resource *prsc,
so->texconst0 |= A3XX_TEX_CONST_0_SRGB;
 
if (prsc->target == PIPE_BUFFER) {
+   unsigned elements =
+   cso->u.buf.size / 
util_format_get_blocksize(cso->format);
lvl = 0;
so->texconst1 =

A3XX_TEX_CONST_1_FETCHSIZE(fd3_pipe2fetchsize(cso->format)) |
-   A3XX_TEX_CONST_1_WIDTH(cso->u.buf.size / 
util_format_get_blocksize(cso->format)) |
-   A3XX_TEX_CONST_1_HEIGHT(1);
+   A3XX_TEX_CONST_1_WIDTH(MIN2(elements, 8192)) |
+   A3XX_TEX_CONST_1_HEIGHT(DIV_ROUND_UP(elements, 8192));
+   so->texconst2 =
+   A3XX_TEX_CONST_2_PITCH(MIN2(elements, 8192) *
+  
util_format_get_blocksize(cso->format));
} else {
unsigned miplevels;
 
@@ -254,10 +259,10 @@ fd3_sampler_view_create(struct pipe_context *pctx, struct 
pipe_resource *prsc,

A3XX_TEX_CONST_1_FETCHSIZE(fd3_pipe2fetchsize(cso->format)) |
A3XX_TEX_CONST_1_WIDTH(u_minify(prsc->width0, lvl)) |
A3XX_TEX_CONST_1_HEIGHT(u_minify(prsc->height0, lvl));
+   so->texconst2 =
+   A3XX_TEX_CONST_2_PITCH(fd3_pipe2nblocksx(cso->format, 
rsc->slices[lvl].pitch) * rsc->cpp);
}
/* when emitted, A3XX_TEX_CONST_2_INDX() must be OR'd in: */
-   so->texconst2 =
-   A3XX_TEX_CONST_2_PITCH(fd3_pipe2nblocksx(cso->format, 
rsc->slices[lvl].pitch) * rsc->cpp);
switch (prsc->target) {
case PIPE_TEXTURE_1D_ARRAY:
case PIPE_TEXTURE_2D_ARRAY:
diff --git a/src/gallium/drivers/freedreno/a4xx/fd4_texture.c 
b/src/gallium/drivers/freedreno/a4xx/fd4_texture.c
index 4faecee..06645ca 100644
--- a/src/gallium/drivers/freedreno/a4xx/fd4_texture.c
+++ b/src/gallium/drivers/freedreno/a4xx/fd4_texture.c
@@ -195,10 +195,10 @@ tex_type(unsigned target)
switch (target) {
default:
assert(0);
-   case PIPE_BUFFER:
case PIPE_TEXTURE_1D:
case PIPE_TEXTURE_1D_ARRAY:
return A4XX_TEX_1D;
+   case PIPE_BUFFER:
case PIPE_TEXTURE_RECT:
case PIPE_TEXTURE_2D:
case PIPE_TEXTURE_2D_ARRAY:
@@ -249,15 +249,16 @@ fd4_sampler_view_create(struct pipe_context *pctx, struct 
pipe_resource *prsc,
}
 
if (cso->target == PIPE_BUFFER) {
-   unsigned elements = cso->u.buf.size / 
util_format_get_blocksize(cso->format);
-
+   unsigned elements =
+   cso->u.buf.size / 
util_format_get_blocksize(cso->format);
lvl = 0;
so->texconst1 =
-   A4XX_TEX_CONST_1_WIDTH(elements) |
-   A4XX_TEX_CONST_1_HEIGHT(1);
+   A4XX_TEX_CONST_1_WIDTH(MIN2(elements, 16384)) |
+   A4XX_TEX_CONST_1_HEIGHT(DIV_ROUND_UP(elements, 16384));
so->texconst2 =

A4XX_TEX_CONST_

[Freedreno] [PATCH 3/3] a3xx: use window scissor to simulate viewport xy clip

2016-08-30 Thread Ilia Mirkin
Unfortunately a3xx does not have a separate disable for depth clipping,
so when depth clamp is enabled, we disable the whole 3d clipper logic.
This in turn also gets rid of the xy clip that it would normally do.
When we detect this would happen, instead we integrate the viewport into
the window scissor. This may have slightly different behavior around
wide points, but it's unlikely that anything depends on this.
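
The intersection can be sketched in standalone C as below. This is only an
illustration of the arithmetic in the diff: the rect struct and helper names
are invented here, and the small floor/ceil helpers stand in for the
floorf()/ceilf()/fabsf() calls the patch uses, so that the example has no libm
dependency.

```c
/* Scissor rectangle, same field meaning as pipe_scissor_state. */
struct rect {
    int minx, miny, maxx, maxy;
};

static int imin(int a, int b) { return a < b ? a : b; }
static int imax(int a, int b) { return a > b ? a : b; }

/* Integer floor/ceil of a float (stand-ins for floorf/ceilf). */
static int floor_to_int(float v)
{
    int i = (int)v;
    return v < (float)i ? i - 1 : i;
}

static int ceil_to_int(float v)
{
    int i = (int)v;
    return v > (float)i ? i + 1 : i;
}

static float absf(float v) { return v < 0.0f ? -v : v; }

/* Shrink the scissor to the viewport box translate +/- |scale|. */
static struct rect clip_scissor_to_viewport(struct rect s,
                                            const float translate[2],
                                            const float scale[2])
{
    struct rect r;
    r.minx = imax(s.minx, floor_to_int(translate[0] - absf(scale[0])));
    r.miny = imax(s.miny, floor_to_int(translate[1] - absf(scale[1])));
    r.maxx = imin(s.maxx, ceil_to_int(translate[0] + absf(scale[0])));
    r.maxy = imin(s.maxy, ceil_to_int(translate[1] + absf(scale[1])));
    return r;
}
```

Rounding outwards (floor for the minimum, ceil for the maximum) keeps every
pixel the viewport partially covers, and taking the absolute value of the
scale handles y-flipped viewports.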

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97231
Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu>
Cc: mesa-sta...@lists.freedesktop.org
---
 src/gallium/drivers/freedreno/a3xx/fd3_emit.c | 36 +++
 1 file changed, 26 insertions(+), 10 deletions(-)

diff --git a/src/gallium/drivers/freedreno/a3xx/fd3_emit.c 
b/src/gallium/drivers/freedreno/a3xx/fd3_emit.c
index 7945184..6d223c0 100644
--- a/src/gallium/drivers/freedreno/a3xx/fd3_emit.c
+++ b/src/gallium/drivers/freedreno/a3xx/fd3_emit.c
@@ -629,19 +629,35 @@ fd3_emit_state(struct fd_context *ctx, struct 
fd_ringbuffer *ring,
OUT_RING(ring, val);
}
 
-   if (dirty & FD_DIRTY_SCISSOR) {
+   if (dirty & (FD_DIRTY_SCISSOR | FD_DIRTY_RASTERIZER | 
FD_DIRTY_VIEWPORT)) {
struct pipe_scissor_state *scissor = 
fd_context_get_scissor(ctx);
+   int minx = scissor->minx;
+   int miny = scissor->miny;
+   int maxx = scissor->maxx;
+   int maxy = scissor->maxy;
+
+   /* Unfortunately there is no separate depth clip disable, only 
an all
+* or nothing deal. So when we disable clipping, we must handle 
the
+* viewport clip via scissors.
+*/
+   if (!ctx->rasterizer->depth_clip) {
+   struct pipe_viewport_state *vp = &ctx->viewport;
+   minx = MAX2(minx, (int)floorf(vp->translate[0] - 
fabsf(vp->scale[0])));
+   miny = MAX2(miny, (int)floorf(vp->translate[1] - 
fabsf(vp->scale[1])));
+   maxx = MIN2(maxx, (int)ceilf(vp->translate[0] + 
fabsf(vp->scale[0])));
+   maxy = MIN2(maxy, (int)ceilf(vp->translate[1] + 
fabsf(vp->scale[1])));
+   }
 
OUT_PKT0(ring, REG_A3XX_GRAS_SC_WINDOW_SCISSOR_TL, 2);
-   OUT_RING(ring, A3XX_GRAS_SC_WINDOW_SCISSOR_TL_X(scissor->minx) |
-   
A3XX_GRAS_SC_WINDOW_SCISSOR_TL_Y(scissor->miny));
-   OUT_RING(ring, A3XX_GRAS_SC_WINDOW_SCISSOR_BR_X(scissor->maxx - 
1) |
-   A3XX_GRAS_SC_WINDOW_SCISSOR_BR_Y(scissor->maxy 
- 1));
-
-   ctx->batch->max_scissor.minx = 
MIN2(ctx->batch->max_scissor.minx, scissor->minx);
-   ctx->batch->max_scissor.miny = 
MIN2(ctx->batch->max_scissor.miny, scissor->miny);
-   ctx->batch->max_scissor.maxx = 
MAX2(ctx->batch->max_scissor.maxx, scissor->maxx);
-   ctx->batch->max_scissor.maxy = 
MAX2(ctx->batch->max_scissor.maxy, scissor->maxy);
+   OUT_RING(ring, A3XX_GRAS_SC_WINDOW_SCISSOR_TL_X(minx) |
+   A3XX_GRAS_SC_WINDOW_SCISSOR_TL_Y(miny));
+   OUT_RING(ring, A3XX_GRAS_SC_WINDOW_SCISSOR_BR_X(maxx - 1) |
+   A3XX_GRAS_SC_WINDOW_SCISSOR_BR_Y(maxy - 1));
+
+   ctx->batch->max_scissor.minx = 
MIN2(ctx->batch->max_scissor.minx, minx);
+   ctx->batch->max_scissor.miny = 
MIN2(ctx->batch->max_scissor.miny, miny);
+   ctx->batch->max_scissor.maxx = 
MAX2(ctx->batch->max_scissor.maxx, maxx);
+   ctx->batch->max_scissor.maxy = 
MAX2(ctx->batch->max_scissor.maxy, maxy);
}
 
if (dirty & FD_DIRTY_VIEWPORT) {
-- 
2.7.3



[Freedreno] [PATCH 2/3] a3xx: make use of software clipping when hw can't handle it

2016-08-30 Thread Ilia Mirkin
The hw clipper only handles up to 6 UCPs. If there are more than 6 UCPs,
or a clip vertex, or clip distances are in use, then we must use the
fallback discard-based clipping from the frag shader.
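
For readers unfamiliar with the fallback, the idea is roughly as follows: the
vertex stage computes one signed distance per enabled plane, the distances are
interpolated, and the frag shader discards when any enabled distance is
negative. The sketch below is a conceptual model in plain C with invented
names, not the actual lowering code.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_CLIP_PLANES 8

/* Vertex stage: signed distance of a clip-space vertex to one plane,
 * i.e. dot(plane, vertex). */
static float clip_distance(const float plane[4], const float vertex[4])
{
    return plane[0] * vertex[0] + plane[1] * vertex[1] +
           plane[2] * vertex[2] + plane[3] * vertex[3];
}

/* Fragment stage: discard if any enabled plane has a negative
 * interpolated distance. `ucp_enables` is a bitmask with one bit per
 * plane, like rasterizer->clip_plane_enable. */
static bool frag_should_discard(const float dist[MAX_CLIP_PLANES],
                                uint8_t ucp_enables)
{
    for (unsigned i = 0; i < MAX_CLIP_PLANES; i++) {
        if ((ucp_enables & (1u << i)) && dist[i] < 0.0f)
            return true;
    }
    return false;
}
```

Unlike the hw clipper, this does not cut primitives; it only kills the
fragments on the wrong side of a plane, which is why it works for any number
of planes and for shader-written clip distances.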

Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu>
Cc: mesa-sta...@lists.freedesktop.org
---
 src/gallium/drivers/freedreno/a3xx/fd3_draw.c|  3 +++
 src/gallium/drivers/freedreno/a3xx/fd3_emit.c| 12 
 src/gallium/drivers/freedreno/a3xx/fd3_program.c | 15 +++
 src/gallium/drivers/freedreno/a3xx/fd3_program.h |  3 +++
 src/gallium/drivers/freedreno/ir3/ir3_shader.c   |  6 ++
 src/gallium/drivers/freedreno/ir3/ir3_shader.h   |  1 +
 6 files changed, 36 insertions(+), 4 deletions(-)

diff --git a/src/gallium/drivers/freedreno/a3xx/fd3_draw.c 
b/src/gallium/drivers/freedreno/a3xx/fd3_draw.c
index a1594b6..d26786f 100644
--- a/src/gallium/drivers/freedreno/a3xx/fd3_draw.c
+++ b/src/gallium/drivers/freedreno/a3xx/fd3_draw.c
@@ -156,6 +156,9 @@ fd3_draw_vbo(struct fd_context *ctx, const struct 
pipe_draw_info *info)
.sprite_coord_mode = ctx->rasterizer->sprite_coord_mode,
};
 
+   if (fd3_needs_manual_clipping(ctx->prog.vp, ctx->rasterizer))
+   emit.key.ucp_enables = ctx->rasterizer->clip_plane_enable;
+
fixup_shader_state(ctx, &emit.key);
 
unsigned dirty = ctx->dirty;
diff --git a/src/gallium/drivers/freedreno/a3xx/fd3_emit.c 
b/src/gallium/drivers/freedreno/a3xx/fd3_emit.c
index e66836b..7945184 100644
--- a/src/gallium/drivers/freedreno/a3xx/fd3_emit.c
+++ b/src/gallium/drivers/freedreno/a3xx/fd3_emit.c
@@ -571,20 +571,24 @@ fd3_emit_state(struct fd_context *ctx, struct 
fd_ringbuffer *ring,
if (dirty & (FD_DIRTY_RASTERIZER | FD_DIRTY_PROG)) {
uint32_t val = fd3_rasterizer_stateobj(ctx->rasterizer)
->gras_cl_clip_cntl;
+   uint8_t planes = ctx->rasterizer->clip_plane_enable;
val |= COND(fp->writes_pos, 
A3XX_GRAS_CL_CLIP_CNTL_ZCLIP_DISABLE);
val |= COND(fp->frag_coord, A3XX_GRAS_CL_CLIP_CNTL_ZCOORD |
A3XX_GRAS_CL_CLIP_CNTL_WCOORD);
-   /* TODO only use if prog doesn't use clipvertex/clipdist */
-   val |= A3XX_GRAS_CL_CLIP_CNTL_NUM_USER_CLIP_PLANES(
-   
MIN2(util_bitcount(ctx->rasterizer->clip_plane_enable), 6));
+   if (!emit->key.ucp_enables)
+   val |= A3XX_GRAS_CL_CLIP_CNTL_NUM_USER_CLIP_PLANES(
+   MIN2(util_bitcount(planes), 6));
OUT_PKT0(ring, REG_A3XX_GRAS_CL_CLIP_CNTL, 1);
OUT_RING(ring, val);
}
 
-   if (dirty & (FD_DIRTY_RASTERIZER | FD_DIRTY_UCP)) {
+   if (dirty & (FD_DIRTY_RASTERIZER | FD_DIRTY_PROG | FD_DIRTY_UCP)) {
uint32_t planes = ctx->rasterizer->clip_plane_enable;
int count = 0;
 
+   if (emit->key.ucp_enables)
+   planes = 0;
+
while (planes && count < 6) {
int i = ffs(planes) - 1;
 
diff --git a/src/gallium/drivers/freedreno/a3xx/fd3_program.c 
b/src/gallium/drivers/freedreno/a3xx/fd3_program.c
index 485a4da..3146dc5 100644
--- a/src/gallium/drivers/freedreno/a3xx/fd3_program.c
+++ b/src/gallium/drivers/freedreno/a3xx/fd3_program.c
@@ -28,6 +28,7 @@
 
 #include "pipe/p_state.h"
 #include "util/u_string.h"
+#include "util/u_math.h"
 #include "util/u_memory.h"
 #include "util/u_inlines.h"
 #include "util/u_format.h"
@@ -85,6 +86,20 @@ fd3_vp_state_delete(struct pipe_context *pctx, void *hwcso)
delete_shader_stateobj(so);
 }
 
+bool
+fd3_needs_manual_clipping(const struct fd3_shader_stateobj *so,
+ const struct 
pipe_rasterizer_state *rast)
+{
+   uint64_t outputs = ir3_shader_outputs(so->shader);
+
+   return (!rast->depth_clip ||
+   util_bitcount(rast->clip_plane_enable) > 6 ||
+   outputs & ((1ULL << VARYING_SLOT_CLIP_VERTEX) |
+  (1ULL << VARYING_SLOT_CLIP_DIST0) |
+  (1ULL << VARYING_SLOT_CLIP_DIST1)));
+}
+
+
 static void
 emit_shader(struct fd_ringbuffer *ring, const struct ir3_shader_variant *so)
 {
diff --git a/src/gallium/drivers/freedreno/a3xx/fd3_program.h 
b/src/gallium/drivers/freedreno/a3xx/fd3_program.h
index b3fcc0c..b95df4c 100644
--- a/src/gallium/drivers/freedreno/a3xx/fd3_program.h
+++ b/src/gallium/drivers/freedreno/a3xx/fd3_program.h
@@ -44,4 +44,7 @@ void fd3_program_emit(struct fd_ringbuffer *ring, struct 
fd3_emit *emit,
 
 void fd3_prog_init(struct pipe_context *pctx);
 
+bool fd3_needs_manual_clipping(const struct fd3_shader_stat

[Freedreno] [PATCH] a3xx: make use of software clipping when hw can't handle it

2016-08-19 Thread Ilia Mirkin
The hw clipper only handles up to 6 UCPs. If there are more than 6 UCPs,
or a clip vertex, or clip distances are in use, then we must use the
fallback discard-based clipping from the frag shader.

Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu>
---
 src/gallium/drivers/freedreno/a3xx/fd3_draw.c|  4 
 src/gallium/drivers/freedreno/a3xx/fd3_emit.c| 12 
 src/gallium/drivers/freedreno/a3xx/fd3_program.c | 14 ++
 src/gallium/drivers/freedreno/a3xx/fd3_program.h |  2 ++
 src/gallium/drivers/freedreno/ir3/ir3_shader.c   |  6 ++
 src/gallium/drivers/freedreno/ir3/ir3_shader.h   |  1 +
 6 files changed, 35 insertions(+), 4 deletions(-)

diff --git a/src/gallium/drivers/freedreno/a3xx/fd3_draw.c 
b/src/gallium/drivers/freedreno/a3xx/fd3_draw.c
index a1594b6..bebe944 100644
--- a/src/gallium/drivers/freedreno/a3xx/fd3_draw.c
+++ b/src/gallium/drivers/freedreno/a3xx/fd3_draw.c
@@ -156,6 +156,10 @@ fd3_draw_vbo(struct fd_context *ctx, const struct 
pipe_draw_info *info)
.sprite_coord_mode = ctx->rasterizer->sprite_coord_mode,
};
 
+   if (fd3_needs_manual_clipping(
+   ctx->prog.vp, 
ctx->rasterizer->clip_plane_enable))
+   emit.key.ucp_enables = ctx->rasterizer->clip_plane_enable;
+
fixup_shader_state(ctx, &emit.key);
 
unsigned dirty = ctx->dirty;
diff --git a/src/gallium/drivers/freedreno/a3xx/fd3_emit.c 
b/src/gallium/drivers/freedreno/a3xx/fd3_emit.c
index e66836b..7945184 100644
--- a/src/gallium/drivers/freedreno/a3xx/fd3_emit.c
+++ b/src/gallium/drivers/freedreno/a3xx/fd3_emit.c
@@ -571,20 +571,24 @@ fd3_emit_state(struct fd_context *ctx, struct 
fd_ringbuffer *ring,
if (dirty & (FD_DIRTY_RASTERIZER | FD_DIRTY_PROG)) {
uint32_t val = fd3_rasterizer_stateobj(ctx->rasterizer)
->gras_cl_clip_cntl;
+   uint8_t planes = ctx->rasterizer->clip_plane_enable;
val |= COND(fp->writes_pos, 
A3XX_GRAS_CL_CLIP_CNTL_ZCLIP_DISABLE);
val |= COND(fp->frag_coord, A3XX_GRAS_CL_CLIP_CNTL_ZCOORD |
A3XX_GRAS_CL_CLIP_CNTL_WCOORD);
-   /* TODO only use if prog doesn't use clipvertex/clipdist */
-   val |= A3XX_GRAS_CL_CLIP_CNTL_NUM_USER_CLIP_PLANES(
-   
MIN2(util_bitcount(ctx->rasterizer->clip_plane_enable), 6));
+   if (!emit->key.ucp_enables)
+   val |= A3XX_GRAS_CL_CLIP_CNTL_NUM_USER_CLIP_PLANES(
+   MIN2(util_bitcount(planes), 6));
OUT_PKT0(ring, REG_A3XX_GRAS_CL_CLIP_CNTL, 1);
OUT_RING(ring, val);
}
 
-   if (dirty & (FD_DIRTY_RASTERIZER | FD_DIRTY_UCP)) {
+   if (dirty & (FD_DIRTY_RASTERIZER | FD_DIRTY_PROG | FD_DIRTY_UCP)) {
uint32_t planes = ctx->rasterizer->clip_plane_enable;
int count = 0;
 
+   if (emit->key.ucp_enables)
+   planes = 0;
+
while (planes && count < 6) {
int i = ffs(planes) - 1;
 
diff --git a/src/gallium/drivers/freedreno/a3xx/fd3_program.c 
b/src/gallium/drivers/freedreno/a3xx/fd3_program.c
index 485a4da..057a514 100644
--- a/src/gallium/drivers/freedreno/a3xx/fd3_program.c
+++ b/src/gallium/drivers/freedreno/a3xx/fd3_program.c
@@ -28,6 +28,7 @@
 
 #include "pipe/p_state.h"
 #include "util/u_string.h"
+#include "util/u_math.h"
 #include "util/u_memory.h"
 #include "util/u_inlines.h"
 #include "util/u_format.h"
@@ -85,6 +86,19 @@ fd3_vp_state_delete(struct pipe_context *pctx, void *hwcso)
delete_shader_stateobj(so);
 }
 
+bool
+fd3_needs_manual_clipping(const struct fd3_shader_stateobj *so,
+ uint8_t ucp_enables)
+{
+   uint64_t outputs = ir3_shader_outputs(so->shader);
+
+   return (util_bitcount(ucp_enables) > 6 ||
+   outputs & ((1ULL << VARYING_SLOT_CLIP_VERTEX) |
+  (1ULL << VARYING_SLOT_CLIP_DIST0) |
+  (1ULL << VARYING_SLOT_CLIP_DIST1)));
+}
+
+
 static void
 emit_shader(struct fd_ringbuffer *ring, const struct ir3_shader_variant *so)
 {
diff --git a/src/gallium/drivers/freedreno/a3xx/fd3_program.h 
b/src/gallium/drivers/freedreno/a3xx/fd3_program.h
index b3fcc0c..847830e 100644
--- a/src/gallium/drivers/freedreno/a3xx/fd3_program.h
+++ b/src/gallium/drivers/freedreno/a3xx/fd3_program.h
@@ -44,4 +44,6 @@ void fd3_program_emit(struct fd_ringbuffer *ring, struct 
fd3_emit *emit,
 
 void fd3_prog_init(struct pipe_context *pctx);
 
+bool fd3_needs_manual_clipping(const struct fd3_shader_stateobj *, uint8_t);
+
 #endif /* FD3_PROGRAM_H_ */
diff --git a/src/gall

[Freedreno] [PATCH] a4xx: add some comments around CL_NDRANGE values

2016-08-15 Thread Ilia Mirkin
---
 rnndb/adreno/a4xx.xml | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/rnndb/adreno/a4xx.xml b/rnndb/adreno/a4xx.xml
index 9f4c3d2..12f28b5 100644
--- a/rnndb/adreno/a4xx.xml
+++ b/rnndb/adreno/a4xx.xml
@@ -2107,12 +2107,12 @@ perhaps they should be taken with a grain of salt



-   
-   
+   
+   

-   
+   

-   
+   



-- 
2.7.3



[Freedreno] [PATCH] freedreno: fix depth clamping on a3xx/a4xx

2016-08-14 Thread Ilia Mirkin
We were previously ... not clamping. I guess this meant that everything
got clamped to 1/0, which was enough to pass the existing tests. Or
perhaps the clamping would only happen to the rasterized depth value and
not the frag shader's output depth value. Either way, this broke
dolphin's new depth implementation, which seems to work better with
this patch.
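
To make the clamp range concrete, here is a small standalone sketch of how the
depth range can be derived from the viewport transform and then encoded for a
fixed-point depth buffer. It mirrors what the diff does with the viewport z
translate/scale, but the struct and function names are invented, and the
(2^bits - 1) scale factor for unorm depth is an assumption of this example,
not something taken from the hw docs.

```c
#include <stdbool.h>
#include <stdint.h>

struct depth_range {
    float zmin, zmax;
};

/* Near/far depth from the viewport z transform. With clip_halfz the clip
 * volume is z in [0, 1], otherwise z in [-1, 1]. */
static struct depth_range viewport_depth_range(float translate_z,
                                               float scale_z,
                                               bool clip_halfz)
{
    float a = clip_halfz ? translate_z : translate_z - scale_z;
    float b = translate_z + scale_z;
    struct depth_range r;
    r.zmin = a < b ? a : b;
    r.zmax = a < b ? b : a;
    return r;
}

/* What the clamp does to a depth value outside the range. */
static float clamp_depth(float z, struct depth_range r)
{
    if (z < r.zmin)
        return r.zmin;
    if (z > r.zmax)
        return r.zmax;
    return z;
}

/* Assumed fixed-point encoding for 16- or 24-bit unorm depth buffers;
 * 32-bit float buffers would take the raw float bits instead. */
static uint32_t depth_to_unorm(float z, unsigned bits)
{
    return (uint32_t)(z * (float)((1u << bits) - 1u));
}
```

Taking min/max of the two endpoints keeps the range valid when the application
sets an inverted depth range (negative z scale).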

Tested on a4xx but not a3xx.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97231
Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu>
---
 src/gallium/drivers/freedreno/a3xx/a3xx.xml.h |  2 +-
 src/gallium/drivers/freedreno/a3xx/fd3_emit.c | 30 ++-
 src/gallium/drivers/freedreno/a4xx/a4xx.xml.h |  2 +-
 src/gallium/drivers/freedreno/a4xx/fd4_emit.c | 29 +-
 4 files changed, 59 insertions(+), 4 deletions(-)

diff --git a/src/gallium/drivers/freedreno/a3xx/a3xx.xml.h 
b/src/gallium/drivers/freedreno/a3xx/a3xx.xml.h
index dcb6dfb..bf787d1 100644
--- a/src/gallium/drivers/freedreno/a3xx/a3xx.xml.h
+++ b/src/gallium/drivers/freedreno/a3xx/a3xx.xml.h
@@ -1472,7 +1472,7 @@ static inline uint32_t A3XX_RB_DEPTH_CONTROL_ZFUNC(enum 
adreno_compare_func val)
 {
return ((val) << A3XX_RB_DEPTH_CONTROL_ZFUNC__SHIFT) & 
A3XX_RB_DEPTH_CONTROL_ZFUNC__MASK;
 }
-#define A3XX_RB_DEPTH_CONTROL_BF_ENABLE		0x00000080
+#define A3XX_RB_DEPTH_CONTROL_Z_CLAMP_ENABLE		0x00000080
 #define A3XX_RB_DEPTH_CONTROL_Z_TEST_ENABLE		0x80000000
 
 #define REG_A3XX_RB_DEPTH_CLEAR				0x00002101
diff --git a/src/gallium/drivers/freedreno/a3xx/fd3_emit.c b/src/gallium/drivers/freedreno/a3xx/fd3_emit.c
index 0fb2ee1..130223c 100644
--- a/src/gallium/drivers/freedreno/a3xx/fd3_emit.c
+++ b/src/gallium/drivers/freedreno/a3xx/fd3_emit.c
@@ -31,6 +31,7 @@
 #include "util/u_memory.h"
 #include "util/u_helpers.h"
 #include "util/u_format.h"
+#include "util/u_viewport.h"
 
 #include "freedreno_resource.h"
 #include "freedreno_query_hw.h"
@@ -536,7 +537,7 @@ fd3_emit_state(struct fd_context *ctx, struct fd_ringbuffer *ring,
 				A3XX_RB_STENCILREFMASK_BF_STENCILREF(sr->ref_value[1]));
}
 
-   if (dirty & (FD_DIRTY_ZSA | FD_DIRTY_PROG)) {
+   if (dirty & (FD_DIRTY_ZSA | FD_DIRTY_RASTERIZER | FD_DIRTY_PROG)) {
uint32_t val = fd3_zsa_stateobj(ctx->zsa)->rb_depth_control;
if (fp->writes_pos) {
val |= A3XX_RB_DEPTH_CONTROL_FRAG_WRITES_Z;
@@ -545,6 +546,9 @@ fd3_emit_state(struct fd_context *ctx, struct fd_ringbuffer *ring,
if (fp->has_kill) {
val |= A3XX_RB_DEPTH_CONTROL_EARLY_Z_DISABLE;
}
+   if (!ctx->rasterizer->depth_clip) {
+   val |= A3XX_RB_DEPTH_CONTROL_Z_CLAMP_ENABLE;
+   }
OUT_PKT0(ring, REG_A3XX_RB_DEPTH_CONTROL, 1);
OUT_RING(ring, val);
}
@@ -648,6 +652,30 @@ fd3_emit_state(struct fd_context *ctx, struct fd_ringbuffer *ring,
 		OUT_RING(ring, A3XX_GRAS_CL_VPORT_ZSCALE(ctx->viewport.scale[2]));
 	}
 
+	if (dirty & (FD_DIRTY_VIEWPORT | FD_DIRTY_RASTERIZER | FD_DIRTY_FRAMEBUFFER)) {
+		float zmin, zmax;
+		int depth = 24;
+		if (ctx->batch->framebuffer.zsbuf) {
+			depth = util_format_get_component_bits(
+					pipe_surface_format(ctx->batch->framebuffer.zsbuf),
+					UTIL_FORMAT_COLORSPACE_ZS, 0);
+		}
+		util_viewport_zmin_zmax(&ctx->viewport, ctx->rasterizer->clip_halfz,
+				&zmin, &zmax);
+
+		OUT_PKT0(ring, REG_A3XX_RB_Z_CLAMP_MIN, 2);
+		if (depth == 32) {
+			OUT_RING(ring, fui(zmin));
+			OUT_RING(ring, fui(zmax));
+		} else if (depth == 16) {
+			OUT_RING(ring, (uint32_t)(zmin * 0xffff));
+			OUT_RING(ring, (uint32_t)(zmax * 0xffff));
+		} else {
+			OUT_RING(ring, (uint32_t)(zmin * 0xffffff));
+			OUT_RING(ring, (uint32_t)(zmax * 0xffffff));
+		}
+	}
+
 	if (dirty & (FD_DIRTY_PROG | FD_DIRTY_FRAMEBUFFER | FD_DIRTY_BLEND_DUAL)) {
 		struct pipe_framebuffer_state *pfb = &ctx->batch->framebuffer;
 		int nr_cbufs = pfb->nr_cbufs;
diff --git a/src/gallium/drivers/freedreno/a4xx/a4xx.xml.h b/src/gallium/drivers/freedreno/a4xx/a4xx.xml.h
index d9a7bb5..8e8fedb 100644
--- a/src/gallium/drivers/freedreno/a4xx/a4xx.xml.h
+++ b/src/gallium/drivers/freedreno/a4xx/a4xx.xml.h
@@ -1376,7 +1376,7 @

[Freedreno] [PATCH] a3xx, a4xx: fix Z_CLAMP_ENABLE name in RB_DEPTH_CONTROL

2016-08-14 Thread Ilia Mirkin
This bit appears in the original revision of the db410c docs, and has
been tested to work on a4xx.
---
 rnndb/adreno/a3xx.xml | 2 +-
 rnndb/adreno/a4xx.xml | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/rnndb/adreno/a3xx.xml b/rnndb/adreno/a3xx.xml
index 6228804..980f711 100644
--- a/rnndb/adreno/a3xx.xml
+++ b/rnndb/adreno/a3xx.xml
@@ -979,7 +979,7 @@ xsi:schemaLocation="http://nouveau.freedesktop.org/ rules-ng.xsd">



-   
+   
Z_TEST_ENABLE bit is set for zfunc other than GL_ALWAYS or GL_NEVER


diff --git a/rnndb/adreno/a4xx.xml b/rnndb/adreno/a4xx.xml
index d8e88c1..9f4c3d2 100644
--- a/rnndb/adreno/a4xx.xml
+++ b/rnndb/adreno/a4xx.xml
@@ -1065,7 +1065,7 @@ perhaps they should be taken with a grain of salt



-   
+   


Z_TEST_ENABLE bit is set for zfunc other than GL_ALWAYS or GL_NEVER
-- 
2.7.3
