[Mesa-dev] Question about OSMesaCreateContextAttribs?

2016-06-30 Thread Mathias Fröhlich
Hi,

Is OSMesaCreateContextAttribs supposed to be an extension function
that should not be linked against but queried at run time using
OSMesaGetProcAddress, or is it a public API function?

Looking at the source, I would think it's public, since it's
listed in the public header and marked as GLAPI.
But it's currently missing from the export lists, so you
cannot actually link against it when building a shared object.

Which variant is the anticipated usage?
Do we want to add the function to the export lists or do
we want to provide an extension function typedef and
force using it as an extension?

Greetings

Mathias
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 09/36] i965/blorp: Make sample count asserts a bit more lazy

2016-06-30 Thread Pohjolainen, Topi
On Thu, Jun 30, 2016 at 06:57:39AM -0700, Jason Ekstrand wrote:
>On Jun 29, 2016 11:07 PM, "Pohjolainen, Topi" <topi.pohjolai...@intel.com> wrote:
>>
>> On Wed, Jun 29, 2016 at 05:37:28PM -0700, Jason Ekstrand wrote:
>> > ---
>> >  src/mesa/drivers/dri/i965/brw_blorp_blit.cpp | 10 +-
>> >  1 file changed, 5 insertions(+), 5 deletions(-)
>>
>> Could you add some rationale here? In the next patch you still use
>> MAX2(mt->num_samples, 1) and it looks like it should at least still work
>> without this.
> 
>Over time, these will all get replaced with isl_surf.samples which are
>1 for single sampled.  On the other hand, mt->num_samples is zero for
>single-sampled.

Does that mean that value zero becomes illegal once we get the value from
isl_surf?
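
The two conventions under discussion can be sketched as follows. This is only an illustration, assuming the behaviour described above (mt->num_samples is 0 for single-sampled, isl_surf.samples is 1); the helper names are hypothetical and do not exist in the driver.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper, not part of the patch: map the miptree convention
 * (0 means single-sampled) onto the isl convention (1 means single-sampled).
 * This mirrors the MAX2(mt->num_samples, 1) expression mentioned above.
 */
static unsigned
sketch_normalize_samples(unsigned num_samples)
{
   return num_samples > 1 ? num_samples : 1;
}

/* A check written as "<= 1" accepts both conventions, which is what the
 * relaxed asserts in the patch rely on.
 */
static bool
sketch_is_single_sampled(unsigned samples)
{
   return samples <= 1;
}
```

With a check of this shape, a key filled from either source passes the same sanity checks, so the asserts need not change again once the value comes from isl_surf.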

> 
>> >
>> > diff --git a/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp b/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
>> > index 1e15bd5..257db06 100644
>> > --- a/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
>> > +++ b/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
>> > @@ -1302,7 +1302,7 @@ brw_blorp_build_nir_shader(struct brw_context *brw,
>> > nir_ssa_def *src_pos, *dst_pos, *color;
>> >
>> > /* Sanity checks */
>> > -   if (key->dst_tiled_w && key->rt_samples > 0) {
>> > +   if (key->dst_tiled_w && key->rt_samples > 1) {
>> >/* If the destination image is W tiled and multisampled, then the thread
>> > * must be dispatched once per sample, not once per pixel.  This is
>> > * necessary because after conversion between W and Y tiling, there's no
>> > @@ -1333,13 +1333,13 @@ brw_blorp_build_nir_shader(struct brw_context *brw,
>> >
>> > /* Make sure layout is consistent with sample count */
>> > assert((key->tex_layout == INTEL_MSAA_LAYOUT_NONE) ==
>> > -  (key->tex_samples == 0));
>> > +  (key->tex_samples <= 1));
>> > assert((key->rt_layout == INTEL_MSAA_LAYOUT_NONE) ==
>> > -  (key->rt_samples == 0));
>> > +  (key->rt_samples <= 1));
>> > assert((key->src_layout == INTEL_MSAA_LAYOUT_NONE) ==
>> > -  (key->src_samples == 0));
>> > +  (key->src_samples <= 1));
>> > assert((key->dst_layout == INTEL_MSAA_LAYOUT_NONE) ==
>> > -  (key->dst_samples == 0));
>> > +  (key->dst_samples <= 1));
>> >
>> > nir_builder b;
>> > nir_builder_init_simple_shader(&b, NULL, MESA_SHADER_FRAGMENT, NULL);
>> > --
>> > 2.5.0.400.gff86faf
>> >


[Mesa-dev] [PATCH 3/4] i965: Make room in the batch epilogue for three more pipe controls.

2016-06-30 Thread Francisco Jerez
Review carefully, it sucks to have to keep track of the number of
command packet dwords emitted in the batch epilogue manually.  The
MI_REPORT_PERF_COUNT_BATCH_DWORDS calculation was obviously wrong.
---
 src/mesa/drivers/dri/i965/brw_performance_monitor.c | 10 +-
 src/mesa/drivers/dri/i965/intel_batchbuffer.h   | 10 +-
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_performance_monitor.c b/src/mesa/drivers/dri/i965/brw_performance_monitor.c
index cb1ab4c..f8e50e1 100644
--- a/src/mesa/drivers/dri/i965/brw_performance_monitor.c
+++ b/src/mesa/drivers/dri/i965/brw_performance_monitor.c
@@ -686,12 +686,12 @@ stop_oa_counters(struct brw_context *brw)
  * The amount of batch space it takes to emit an MI_REPORT_PERF_COUNT snapshot,
  * including the required PIPE_CONTROL flushes.
  *
- * Sandybridge is the worst case scenario: brw_emit_mi_flush
- * expands to three PIPE_CONTROLs which are 4 DWords each.  We have to flush
- * before and after MI_REPORT_PERF_COUNT, so multiply by two.  Finally, add
- * the 3 DWords for MI_REPORT_PERF_COUNT itself.
+ * Sandybridge is the worst case scenario: brw_emit_mi_flush expands to four
+ * PIPE_CONTROLs which are 5 DWords each.  We have to flush before and after
+ * MI_REPORT_PERF_COUNT, so multiply by two.  Finally, add the 3 DWords for
+ * MI_REPORT_PERF_COUNT itself.
  */
-#define MI_REPORT_PERF_COUNT_BATCH_DWORDS (2 * (3 * 4) + 3)
+#define MI_REPORT_PERF_COUNT_BATCH_DWORDS (2 * (4 * 5) + 3)
 
 /**
  * Emit an MI_REPORT_PERF_COUNT command packet.
diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.h b/src/mesa/drivers/dri/i965/intel_batchbuffer.h
index aa1dc38..67e8e8f 100644
--- a/src/mesa/drivers/dri/i965/intel_batchbuffer.h
+++ b/src/mesa/drivers/dri/i965/intel_batchbuffer.h
@@ -21,13 +21,13 @@ extern "C" {
  *   - Gen4-5 record ending occlusion query values (4 * 4 = 16 bytes)
  *   - Disabling OA counters on Gen6+ (3 DWords = 12 bytes)
  *   - Ending MI_REPORT_PERF_COUNT on Gen5+, plus associated PIPE_CONTROLs:
- * - Two sets of PIPE_CONTROLs, which become 3 PIPE_CONTROLs each on SNB,
- *   which are 5 DWords each ==> 2 * 3 * 5 * 4 = 120 bytes
+ * - Two sets of PIPE_CONTROLs, which become 4 PIPE_CONTROLs each on SNB,
+ *   which are 5 DWords each ==> 2 * 4 * 5 * 4 = 160 bytes
  * - 3 DWords for MI_REPORT_PERF_COUNT itself on Gen6+.  ==> 12 bytes.
  *   On Ironlake, it's 6 DWords, but we have some slack due to the lack of
  *   Sandybridge PIPE_CONTROL madness.
- *   - CC_STATE workaround on HSW (12 * 4 = 48 bytes)
- * - 5 dwords for initial mi_flush
+ *   - CC_STATE workaround on HSW (17 * 4 = 68 bytes)
+ * - 10 dwords for initial mi_flush
  * - 2 dwords for CC state setup
  * - 5 dwords for the required pipe control at the end
  *   - Restoring L3 configuration: (24 dwords = 96 bytes)
@@ -35,7 +35,7 @@ extern "C" {
  * - 7 dwords for L3 configuration set-up.
  * - 5 dwords for L3 atomic set-up (on HSW).
  */
-#define BATCH_RESERVED 248
+#define BATCH_RESERVED 308
 
 struct intel_batchbuffer;
 
-- 
2.9.0
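
The arithmetic behind the two updated constants can be checked with a short sketch. It uses only the figures quoted in the comments of this patch (4 PIPE_CONTROLs per flush on SNB, 5 DWords each, 10 dwords for the initial mi_flush); the function and enum names are hypothetical and exist only for this illustration.

```c
#include <assert.h>

enum {
   SKETCH_BYTES_PER_DWORD = 4,
   SKETCH_PIPE_CONTROL_DWORDS = 5,
   SKETCH_SNB_PIPE_CONTROLS_PER_FLUSH = 4,
   SKETCH_REPORT_PERF_COUNT_DWORDS = 3,
};

/* Two flushes (before and after) plus MI_REPORT_PERF_COUNT itself, in DWords. */
static int
sketch_report_perf_count_dwords(void)
{
   return 2 * (SKETCH_SNB_PIPE_CONTROLS_PER_FLUSH * SKETCH_PIPE_CONTROL_DWORDS) +
          SKETCH_REPORT_PERF_COUNT_DWORDS;
}

/* Bytes taken by the two sets of PIPE_CONTROLs for a given per-flush count. */
static int
sketch_flush_pair_bytes(int pipe_controls_per_flush)
{
   return 2 * pipe_controls_per_flush * SKETCH_PIPE_CONTROL_DWORDS *
          SKETCH_BYTES_PER_DWORD;
}

/* Bytes for the HSW CC_STATE workaround: the initial mi_flush, 2 dwords of
 * CC state setup and the 5-dword pipe control at the end.
 */
static int
sketch_cc_state_bytes(int mi_flush_dwords)
{
   return (mi_flush_dwords + 2 + 5) * SKETCH_BYTES_PER_DWORD;
}

/* New reservation: the old value plus the growth of the two items above. */
static int
sketch_batch_reserved(int old_reserved)
{
   return old_reserved +
          (sketch_flush_pair_bytes(4) - sketch_flush_pair_bytes(3)) +
          (sketch_cc_state_bytes(10) - sketch_cc_state_bytes(5));
}
```

Under these assumptions the snapshot takes 43 DWords, and the reservation grows by 40 + 20 bytes, from 248 to 308, which matches the new BATCH_RESERVED value.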



[Mesa-dev] [PATCH 1/4] i965: Emit SNB write cache flush W/A from brw_emit_pipe_control_flush.

2016-06-30 Thread Francisco Jerez
Shouldn't cause any functional changes at this point, but we have
forgotten to apply this workaround several times in the past.  Make
sure it doesn't happen again.
---
 src/mesa/drivers/dri/i965/brw_misc_state.c   |  9 -
 src/mesa/drivers/dri/i965/brw_pipe_control.c | 21 +++--
 src/mesa/drivers/dri/i965/intel_fbo.c|  8 
 src/mesa/drivers/dri/i965/intel_tex.c|  8 
 4 files changed, 11 insertions(+), 35 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_misc_state.c b/src/mesa/drivers/dri/i965/brw_misc_state.c
index 690c2f6..c3d341f 100644
--- a/src/mesa/drivers/dri/i965/brw_misc_state.c
+++ b/src/mesa/drivers/dri/i965/brw_misc_state.c
@@ -925,15 +925,6 @@ brw_emit_select_pipeline(struct brw_context *brw, enum brw_pipeline pipeline)
   const unsigned dc_flush =
  brw->gen >= 7 ? PIPE_CONTROL_DATA_CACHE_FLUSH : 0;
 
-  if (brw->gen == 6) {
- /* Hardware workaround: SNB B-Spec says:
-  *
-  *   Before a PIPE_CONTROL with Write Cache Flush Enable = 1, a
-  *   PIPE_CONTROL with any non-zero post-sync-op is required.
-  */
- brw_emit_post_sync_nonzero_flush(brw);
-  }
-
   brw_emit_pipe_control_flush(brw,
   PIPE_CONTROL_RENDER_TARGET_FLUSH |
   PIPE_CONTROL_DEPTH_CACHE_FLUSH |
diff --git a/src/mesa/drivers/dri/i965/brw_pipe_control.c b/src/mesa/drivers/dri/i965/brw_pipe_control.c
index 4672efd..586355d 100644
--- a/src/mesa/drivers/dri/i965/brw_pipe_control.c
+++ b/src/mesa/drivers/dri/i965/brw_pipe_control.c
@@ -109,6 +109,17 @@ brw_emit_pipe_control_flush(struct brw_context *brw, uint32_t flags)
   OUT_BATCH(0);
   ADVANCE_BATCH();
} else if (brw->gen >= 6) {
+  if (brw->gen == 6 &&
+  (flags & PIPE_CONTROL_RENDER_TARGET_FLUSH)) {
+ /* Hardware workaround: SNB B-Spec says:
+  *
+  *   [Dev-SNB{W/A}]: Before a PIPE_CONTROL with Write Cache Flush
+  *   Enable = 1, a PIPE_CONTROL with any non-zero post-sync-op is
+  *   required.
+  */
+ brw_emit_post_sync_nonzero_flush(brw);
+  }
+
   flags |= gen7_cs_stall_every_four_pipe_controls(brw, flags);
 
   BEGIN_BATCH(5);
@@ -325,16 +336,6 @@ brw_emit_mi_flush(struct brw_context *brw)
   PIPE_CONTROL_VF_CACHE_INVALIDATE |
   PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE |
   PIPE_CONTROL_CS_STALL;
-
- if (brw->gen == 6) {
-/* Hardware workaround: SNB B-Spec says:
- *
- * [Dev-SNB{W/A}]: Before a PIPE_CONTROL with Write Cache
- * Flush Enable =1, a PIPE_CONTROL with any non-zero
- * post-sync-op is required.
- */
-brw_emit_post_sync_nonzero_flush(brw);
- }
   }
   brw_emit_pipe_control_flush(brw, flags);
}
diff --git a/src/mesa/drivers/dri/i965/intel_fbo.c b/src/mesa/drivers/dri/i965/intel_fbo.c
index 939f9a0..707a9d2 100644
--- a/src/mesa/drivers/dri/i965/intel_fbo.c
+++ b/src/mesa/drivers/dri/i965/intel_fbo.c
@@ -1061,14 +1061,6 @@ brw_render_cache_set_check_flush(struct brw_context *brw, drm_intel_bo *bo)
   return;
 
if (brw->gen >= 6) {
-  if (brw->gen == 6) {
- /* [Dev-SNB{W/A}]: Before a PIPE_CONTROL with Write Cache
-  * Flush Enable = 1, a PIPE_CONTROL with any non-zero
-  * post-sync-op is required.
-  */
- brw_emit_post_sync_nonzero_flush(brw);
-  }
-
   brw_emit_pipe_control_flush(brw,
   PIPE_CONTROL_DEPTH_CACHE_FLUSH |
   PIPE_CONTROL_RENDER_TARGET_FLUSH |
diff --git a/src/mesa/drivers/dri/i965/intel_tex.c b/src/mesa/drivers/dri/i965/intel_tex.c
index a802d5a..8c32fe3 100644
--- a/src/mesa/drivers/dri/i965/intel_tex.c
+++ b/src/mesa/drivers/dri/i965/intel_tex.c
@@ -364,14 +364,6 @@ intel_texture_barrier(struct gl_context *ctx)
struct brw_context *brw = brw_context(ctx);
 
if (brw->gen >= 6) {
-  if (brw->gen == 6) {
- /* [Dev-SNB{W/A}]: Before a PIPE_CONTROL with Write Cache
-  * Flush Enable = 1, a PIPE_CONTROL with any non-zero
-  * post-sync-op is required.
-  */
- brw_emit_post_sync_nonzero_flush(brw);
-  }
-
   brw_emit_pipe_control_flush(brw,
   PIPE_CONTROL_DEPTH_CACHE_FLUSH |
   PIPE_CONTROL_RENDER_TARGET_FLUSH |
-- 
2.9.0
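
The effect of moving the workaround into the common emit path can be modelled with a small stand-alone sketch. Everything here is hypothetical (the struct, the flag values and the function names are invented for the illustration and are not the driver's); it only counts how often the workaround would be emitted.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag values for illustration only; the real bits are
 * defined in intel_reg.h and differ from these.
 */
#define SKETCH_RENDER_TARGET_FLUSH      (1u << 0)
#define SKETCH_TEXTURE_CACHE_INVALIDATE (1u << 1)

struct sketch_ctx {
   int gen;
   int post_sync_flushes;   /* number of workaround flushes emitted */
   int pipe_controls;       /* number of requested PIPE_CONTROLs emitted */
};

/* Stand-in for brw_emit_post_sync_nonzero_flush(). */
static void
sketch_emit_post_sync_nonzero_flush(struct sketch_ctx *ctx)
{
   ctx->post_sync_flushes++;
}

/* The workaround is applied here once, so callers cannot forget it. */
static void
sketch_emit_pipe_control_flush(struct sketch_ctx *ctx, uint32_t flags)
{
   if (ctx->gen == 6 && (flags & SKETCH_RENDER_TARGET_FLUSH))
      sketch_emit_post_sync_nonzero_flush(ctx);

   ctx->pipe_controls++;
}
```

In this model a caller on gen 6 that requests a render target flush gets the extra post-sync flush automatically, while other generations and flush-free requests are unaffected.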



[Mesa-dev] [PATCH 2/4] i965: Emit SKL VF cache invalidation W/A from brw_emit_pipe_control_flush.

2016-06-30 Thread Francisco Jerez
There were two places in the driver doing a pipe control VF cache
invalidation, and one of them was missing this workaround.  Move it down
into brw_emit_pipe_control_flush to make sure we don't miss it again.
---
 src/mesa/drivers/dri/i965/brw_pipe_control.c | 19 ++-
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_pipe_control.c b/src/mesa/drivers/dri/i965/brw_pipe_control.c
index 586355d..14a8f7c 100644
--- a/src/mesa/drivers/dri/i965/brw_pipe_control.c
+++ b/src/mesa/drivers/dri/i965/brw_pipe_control.c
@@ -100,6 +100,16 @@ brw_emit_pipe_control_flush(struct brw_context *brw, uint32_t flags)
   if (brw->gen == 8)
  gen8_add_cs_stall_workaround_bits(&flags);
 
+  if (brw->gen == 9 &&
+  (flags & PIPE_CONTROL_VF_CACHE_INVALIDATE)) {
+ /* Hardware workaround: SKL
+  *
+  * Emit Pipe Control with all bits set to zero before emitting
+  * a Pipe Control with VF Cache Invalidate set.
+  */
+ brw_emit_pipe_control_flush(brw, 0);
+  }
+
   BEGIN_BATCH(6);
   OUT_BATCH(_3DSTATE_PIPE_CONTROL | (6 - 2));
   OUT_BATCH(flags);
@@ -322,15 +332,6 @@ brw_emit_mi_flush(struct brw_context *brw)
} else {
   int flags = PIPE_CONTROL_NO_WRITE | PIPE_CONTROL_RENDER_TARGET_FLUSH;
   if (brw->gen >= 6) {
- if (brw->gen == 9) {
-/* Hardware workaround: SKL
- *
- * Emit Pipe Control with all bits set to zero before emitting
- * a Pipe Control with VF Cache Invalidate set.
- */
-brw_emit_pipe_control_flush(brw, 0);
- }
-
  flags |= PIPE_CONTROL_INSTRUCTION_INVALIDATE |
   PIPE_CONTROL_DEPTH_CACHE_FLUSH |
   PIPE_CONTROL_VF_CACHE_INVALIDATE |
-- 
2.9.0
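
The ordering this patch enforces can be illustrated with a small stand-alone model. The names and the flag value below are hypothetical and chosen only for the sketch; it records the flags of each emitted PIPE_CONTROL so the sequence can be inspected.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag value for illustration only; the real bit is defined
 * in intel_reg.h.
 */
#define SKETCH_VF_CACHE_INVALIDATE (1u << 0)
#define SKETCH_MAX_EMITTED 8

struct sketch_batch {
   int gen;
   int count;
   uint32_t emitted[SKETCH_MAX_EMITTED];   /* flags of each PIPE_CONTROL */
};

static void
sketch_flush(struct sketch_batch *batch, uint32_t flags)
{
   if (batch->gen == 9 && (flags & SKETCH_VF_CACHE_INVALIDATE)) {
      /* Emit an all-zero PIPE_CONTROL first.  The recursion terminates
       * because a zero flags word never has the VF invalidate bit set.
       */
      sketch_flush(batch, 0);
   }

   assert(batch->count < SKETCH_MAX_EMITTED);
   batch->emitted[batch->count++] = flags;
}
```

On gen 9 a VF cache invalidation therefore shows up as two entries, a zero one followed by the requested one, regardless of which caller asked for it.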



[Mesa-dev] [PATCH 4/4] i965: Fix remaining flush vs invalidate race conditions in brw_emit_pipe_control_flush.

2016-06-30 Thread Francisco Jerez
This hardware race condition has caused problems several times already
(see "i965: Fix cache pollution race during L3 partitioning set-up.",
"i965: Fix brw_render_cache_set_check_flush's PIPE_CONTROLs." and
"i965: intel_texture_barrier reimplemented").  The problem is that
whenever we attempt to both flush and invalidate multiple caches with
a single pipe control command the flush and invalidation happen in
reverse order, so the contents flushed from the R/W caches aren't
guaranteed to become visible from the invalidated caches after the
PIPE_CONTROL command completes execution if some concurrent rendering
workload happened to pollute any of the invalidated R/O caches in the
short window of time between the invalidation and flush.

This makes sure that brw_emit_pipe_control_flush() has the effect
expected by most callers of making the contents flushed from any R/W
caches visible from the invalidated R/O caches.
---
 src/mesa/drivers/dri/i965/brw_pipe_control.c | 18 ++
 src/mesa/drivers/dri/i965/intel_reg.h|  9 +
 2 files changed, 27 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_pipe_control.c b/src/mesa/drivers/dri/i965/brw_pipe_control.c
index 14a8f7c..05e8c05 100644
--- a/src/mesa/drivers/dri/i965/brw_pipe_control.c
+++ b/src/mesa/drivers/dri/i965/brw_pipe_control.c
@@ -96,6 +96,24 @@ gen7_cs_stall_every_four_pipe_controls(struct brw_context *brw, uint32_t flags)
 void
 brw_emit_pipe_control_flush(struct brw_context *brw, uint32_t flags)
 {
+   if (brw->gen >= 6 &&
+   (flags & PIPE_CONTROL_CACHE_FLUSH_BITS) &&
+   (flags & PIPE_CONTROL_CACHE_INVALIDATE_BITS)) {
+  /* A pipe control command with flush and invalidate bits set
+   * simultaneously is an inherently racy operation on Gen6+ if the
+   * contents of the flushed caches were intended to become visible from
+   * any of the invalidated caches.  Split it in two PIPE_CONTROLs, the
+   * first one should stall the pipeline to make sure that the flushed R/W
+   * caches are coherent with memory once the specified R/O caches are
+   * invalidated.  On pre-Gen6 hardware the (implicit) R/O cache
+   * invalidation seems to happen at the bottom of the pipeline together
+   * with any write cache flush, so this shouldn't be a concern.
+   */
+  brw_emit_pipe_control_flush(brw, (flags & PIPE_CONTROL_CACHE_FLUSH_BITS) |
+   PIPE_CONTROL_CS_STALL);
+  flags &= ~(PIPE_CONTROL_CACHE_FLUSH_BITS | PIPE_CONTROL_CS_STALL);
+   }
+
if (brw->gen >= 8) {
   if (brw->gen == 8)
  gen8_add_cs_stall_workaround_bits(&flags);
diff --git a/src/mesa/drivers/dri/i965/intel_reg.h b/src/mesa/drivers/dri/i965/intel_reg.h
index 95365fe..7a82be4 100644
--- a/src/mesa/drivers/dri/i965/intel_reg.h
+++ b/src/mesa/drivers/dri/i965/intel_reg.h
@@ -134,6 +134,15 @@
 #define PIPE_CONTROL_PPGTT_WRITE   (0 << 2)
 #define PIPE_CONTROL_GLOBAL_GTT_WRITE  (1 << 2)
 
+#define PIPE_CONTROL_CACHE_FLUSH_BITS \
+   (PIPE_CONTROL_DEPTH_CACHE_FLUSH | PIPE_CONTROL_DATA_CACHE_FLUSH | \
+PIPE_CONTROL_RENDER_TARGET_FLUSH)
+
+#define PIPE_CONTROL_CACHE_INVALIDATE_BITS \
+   (PIPE_CONTROL_STATE_CACHE_INVALIDATE | PIPE_CONTROL_CONST_CACHE_INVALIDATE | \
+PIPE_CONTROL_VF_CACHE_INVALIDATE | PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE | \
+PIPE_CONTROL_INSTRUCTION_INVALIDATE)
+
 /** @} */
 
 #define XY_SETUP_BLT_CMD   (CMD_2D | (0x01 << 22))
-- 
2.9.0
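
The splitting logic can be exercised in isolation with a small model. This is a sketch under stated assumptions: the flag values, the struct and the function names are hypothetical, and only the control flow follows the hunk above (flush with CS stall first, then the remaining bits).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag values for illustration only; the real bits are
 * defined in intel_reg.h.
 */
#define SK_DEPTH_CACHE_FLUSH        (1u << 0)
#define SK_RENDER_TARGET_FLUSH      (1u << 1)
#define SK_TEXTURE_CACHE_INVALIDATE (1u << 2)
#define SK_VF_CACHE_INVALIDATE      (1u << 3)
#define SK_CS_STALL                 (1u << 4)

#define SK_CACHE_FLUSH_BITS      (SK_DEPTH_CACHE_FLUSH | SK_RENDER_TARGET_FLUSH)
#define SK_CACHE_INVALIDATE_BITS (SK_TEXTURE_CACHE_INVALIDATE | SK_VF_CACHE_INVALIDATE)

#define SK_MAX_EMITTED 8

struct sk_batch {
   int gen;
   int count;
   uint32_t emitted[SK_MAX_EMITTED];   /* flags of each PIPE_CONTROL */
};

static void
sk_emit_pipe_control_flush(struct sk_batch *batch, uint32_t flags)
{
   if (batch->gen >= 6 &&
       (flags & SK_CACHE_FLUSH_BITS) &&
       (flags & SK_CACHE_INVALIDATE_BITS)) {
      /* First PIPE_CONTROL: flush the R/W caches and stall.  It carries no
       * invalidate bits, so this call does not recurse any further.
       */
      sk_emit_pipe_control_flush(batch,
                                 (flags & SK_CACHE_FLUSH_BITS) | SK_CS_STALL);
      /* Second PIPE_CONTROL: whatever is left, i.e. the invalidations. */
      flags &= ~(SK_CACHE_FLUSH_BITS | SK_CS_STALL);
   }

   assert(batch->count < SK_MAX_EMITTED);
   batch->emitted[batch->count++] = flags;
}
```

In this model a combined flush and invalidate request on gen 6 or later is recorded as two entries in flush-then-invalidate order, while requests with only one kind of bit, or on older generations, pass through unchanged.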



Re: [Mesa-dev] i965/blorp: Use flat vertex inputs instead of uniforms

2016-06-30 Thread Jason Ekstrand
I keep saying I'm going to review this "tomorrow" but I really mean it this
time!  I would have today but I spent the whole day arguing the finer
points of surface layout, HiZ, fast clears, and color compression with
Chad.  It was a fun day! :-)

--Jason
On Jun 23, 2016 12:17 PM, "Topi Pohjolainen" wrote:

> In addition to the actual vertex coordinates blorp will get another
> vertex input buffer providing the constants that are until now
> provided as uniforms. This will remove the need to configure push
> constants and their allocation in the pipeline.
>
> First three patches refactor the vertex data setup for blorp. The
> existing logic in blorp already supports all gens (gen6-gen9). I
> chose to change the core upload logic accordingly and simply use that
> for blorp.
>
> Patches 5-8 pack the constants in blorp programs into vec4s. By
> default compiler puts input variables two full hardware registers
> apart. Having them in vec4s drops the need to repack them.
>
> Last four patches take actual advantage of the change by dropping
> unnecessary pipeline reconfiguration.
>
> CC: Kenneth Graunke 
> CC: Jason Ekstrand 
>
> Topi Pohjolainen (18):
>   i965/draw: Expose vertex buffer state setup
>   i965: Unify vertex buffer setup
>   i965/blorp: Split vertex data and element setup
>   i965/blorp: Use core vertex buffer state setup
>   i965/blorp: Rename push constants to inputs
>   i965/blorp: Share input slot between pixel kill and blend/scaled
>   i965/blorp: Load tranformation coordinates as vec4
>   i965/blorp: Drop LOAD_UNIFORM macro
>   i965/blorp: Store input read mask
>   i965/blorp: Add support for flat input buffer
>   i965/blorp: Tell vertex fetcher about flat inputs
>   i965/blorp: Prepare for more than two vertex attributes
>   i965/blorp: Use flat inputs instead of uniforms
>   i965/blorp: Remove support for push constants
>   i965/urb: Allow blorp to record current settings
>   i965/blorp: Fix the size requirement for vertex elements
>   i965/blorp/gen7+: Stop trashing push constant allocation
>   i965/blorp/gen7+: Do not trigger push constant space reconfig
>
>  src/mesa/drivers/dri/i965/brw_blorp.c |  18 +-
>  src/mesa/drivers/dri/i965/brw_blorp.h |  81 ++---
>  src/mesa/drivers/dri/i965/brw_blorp_blit.cpp  | 135 +--
>  src/mesa/drivers/dri/i965/brw_blorp_clear.cpp |  15 +-
>  src/mesa/drivers/dri/i965/brw_context.h   |  12 +-
>  src/mesa/drivers/dri/i965/brw_draw.h  |  13 ++
>  src/mesa/drivers/dri/i965/brw_draw_upload.c   |  74 +---
>  src/mesa/drivers/dri/i965/gen6_blorp.c| 240 +-
>  src/mesa/drivers/dri/i965/gen7_blorp.c| 153 
>  src/mesa/drivers/dri/i965/gen7_urb.c  |  93 +-
>  src/mesa/drivers/dri/i965/gen8_blorp.c|  86 ++---
>  src/mesa/drivers/dri/i965/gen8_draw_upload.c  |  41 ++---
>  12 files changed, 448 insertions(+), 513 deletions(-)
>
> --
> 2.5.5
>
>


Re: [Mesa-dev] [PATCH v3] i965: intel_texture_barrier reimplemented

2016-06-30 Thread Francisco Jerez
Jason Ekstrand  writes:

> On Jun 30, 2016 9:25 PM, "Francisco Jerez"  wrote:
>>
>> Jason Ekstrand  writes:
>>
>> > Fwiw, I very much like the way I did this in the Vulkan driver where it
>> > splits it into two pipe controls automatically based on the input bits.
>> > (Look at genX_cmd_buffer.c cmd_buffer_apply_pipe_flushes.)  I very much
>> > doubt that this is the only place we have this problem in the GL driver.
>>
>> Yes, it definitely isn't.
>>
>> > We should probably fix it in brw_emit_pipe_control.
>>
>> I had been poking around with making brw_emit_pipe_control_flush (I
>> think that's what you meant by brw_emit_pipe_control?) catch the racy
>> invalidate+flush combinations and emit two pipe controls instead.  I'm
>> not terribly attached to the idea of hard-coding this much policy into
>> the rather low-level brw_emit_pipe_control_flush(), but it is sure less
>> code (and more idiot-proof) than fixing all of its users individually.
>>
>> In case other people don't like it I have another series that fixes the
>> remaining PIPE_CONTROL races manually and just adds an assertion to
>> brw_emit_pipe_control_flush() checking that the caller is not trying to
>> flush and invalidate at the same time in a single PIPE_CONTROL command,
>> I don't really care doing it one way or another.
>
> Since you've already written all that code, you should be in top form for
> answering this question:  Is there anywhere in the code where we emit a
> pipe control containing both flushes and invalidates but *don't* want the
> stall?  If not, then I'm a big fan of idiot-proof.

Nope, we always want the flush with CS stall followed by the R/O
cache invalidation as far as I'm aware.

>
>> I'll reply to this thread with the fixes I've written which apply on top
>> of Alejandro's patch -- I believe we want to land v3 of his fix anyway
>> because it replaces a call to brw_emit_mi_flush [AKA "I don't know how
>> to use pipe control, just flush everything for me" ;)] with the right
>> sequence of cache flushes and invalidations, which is always a good
>> thing.
>>
>> > On Jun 30, 2016 12:00 AM, "Alejandro Piñeiro" wrote:
>> >
>> > Fixes:
>> > GL44-CTS.texture_barrier_ARB.same-texel-rw-multipass
>> >
>> > On Haswell, Broadwell and Skylake (note that in order to execute that
>> > test, it is needed to override GL and GLSL versions).
>> >
>> > On gen6 this test was already working without this change. It keeps
>> > working after it.
>> >
>> > This commit replaces the call to brw_emit_mi_flush for gen6+ with two
>> > calls to brw_emit_pipe_control_flush:
>> >
>> >  * The first one with RENDER_TARGET_FLUSH and CS_STALL set to initiate
>> >a render cache flush after any concurrent rendering completes and
>> >cause the CS to stop parsing commands until the render cache
>> >becomes coherent with memory.
>> >
>> >  * The second one has TEXTURE_CACHE_INVALIDATE set (and no CS stall)
>> >to clean up any stale data from the sampler caches before rendering
>> >continues.
>> >
>> > Didn't touch gen4-5, basically because I don't have a way to test
>> > them.
>> >
>> > More info on commits:
>> > 0aa4f99f562a05880a779707cbcd46be459863bf
>> > 72473658c51d5e074ce219c1e6385a4cce29f467
>> >
>> > Thanks to Curro for helping to track this down, as the root cause was a
>> > hw race condition.
>> >
>> > v2: use two calls to pipe_control_flush instead of a combination of
>> > gen7_emit_cs_stall_flush and brw_emit_mi_flush calls (Curro)
>> > v3: no need to const cache invalidation (Curro)
>> > ---
>> >
>> > FWIW: checked with the CTS tests, and the piglit series, and confirmed
>> > that the const cache invalidation is not needed.
>> >
>> >  src/mesa/drivers/dri/i965/intel_tex.c | 21 -
>> >  1 file changed, 20 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/src/mesa/drivers/dri/i965/intel_tex.c
>> > b/src/mesa/drivers/dri/i965/intel_tex.c
>> > index cac33ac..a802d5a 100644
>> > --- a/src/mesa/drivers/dri/i965/intel_tex.c
>> > +++ b/src/mesa/drivers/dri/i965/intel_tex.c
>> > @@ -9,6 +9,7 @@
>> >  #include "intel_mipmap_tree.h"
>> >  #include "intel_tex.h"
>> >  #include "intel_fbo.h"
>> > +#include "intel_reg.h"
>> >
>> >  #define FILE_DEBUG_FLAG DEBUG_TEXTURE
>> >
>> > @@ -362,7 +363,25 @@ intel_texture_barrier(struct gl_context *ctx)
>> >  {
>> > struct brw_context *brw = brw_context(ctx);
>> >
>> > -   brw_emit_mi_flush(brw);
>> > +   if (brw->gen >= 6) {
>> > +  if (brw->gen == 6) {
>> > + /* [Dev-SNB{W/A}]: Before a PIPE_CONTROL with Write Cache
>> > +  * Flush Enable = 1, a PIPE_CONTROL with any non-zero
>> > +  * post-sync-op is required.
>> > +  */
>> > + brw_emit_post_sync_nonzero_flush(brw);
>> > +  }
>> > +
>> > +  brw_emit_pipe_control_flush(brw,
>> > +  PIPE_CONTROL_DEPTH_CACHE_FLUSH |
>> > +  

Re: [Mesa-dev] [PATCH v3] i965: intel_texture_barrier reimplemented

2016-06-30 Thread Jason Ekstrand
On Jun 30, 2016 9:25 PM, "Francisco Jerez"  wrote:
>
> Jason Ekstrand  writes:
>
> > Fwiw, I very much like the way I did this in the Vulkan driver where it
> > splits it into two pipe controls automatically based on the input bits.
> > (Look at genX_cmd_buffer.c cmd_buffer_apply_pipe_flushes.)  I very much
> > doubt that this is the only place we have this problem in the GL driver.
>
> Yes, it definitely isn't.
>
> > We should probably fix it in brw_emit_pipe_control.
>
> I had been poking around with making brw_emit_pipe_control_flush (I
> think that's what you meant by brw_emit_pipe_control?) catch the racy
> invalidate+flush combinations and emit two pipe controls instead.  I'm
> not terribly attached to the idea of hard-coding this much policy into
> the rather low-level brw_emit_pipe_control_flush(), but it is sure less
> code (and more idiot-proof) than fixing all of its users individually.
>
> In case other people don't like it I have another series that fixes the
> remaining PIPE_CONTROL races manually and just adds an assertion to
> brw_emit_pipe_control_flush() checking that the caller is not trying to
> flush and invalidate at the same time in a single PIPE_CONTROL command,
> I don't really care doing it one way or another.

Since you've already written all that code, you should be in top form for
answering this question:  Is there anywhere in the code where we emit a
pipe control containing both flushes and invalidates but *don't* want the
stall?  If not, then I'm a big fan of idiot-proof.

> I'll reply to this thread with the fixes I've written which apply on top
> of Alejandro's patch -- I believe we want to land v3 of his fix anyway
> because it replaces a call to brw_emit_mi_flush [AKA "I don't know how
> to use pipe control, just flush everything for me" ;)] with the right
> sequence of cache flushes and invalidations, which is always a good
> thing.
>
> > On Jun 30, 2016 12:00 AM, "Alejandro Piñeiro" wrote:
> >
> > Fixes:
> > GL44-CTS.texture_barrier_ARB.same-texel-rw-multipass
> >
> > On Haswell, Broadwell and Skylake (note that in order to execute that
> > test, it is needed to override GL and GLSL versions).
> >
> > On gen6 this test was already working without this change. It keeps
> > working after it.
> >
> > This commit replaces the call to brw_emit_mi_flush for gen6+ with two
> > calls to brw_emit_pipe_control_flush:
> >
> >  * The first one with RENDER_TARGET_FLUSH and CS_STALL set to initiate
> >a render cache flush after any concurrent rendering completes and
> >cause the CS to stop parsing commands until the render cache
> >becomes coherent with memory.
> >
> >  * The second one has TEXTURE_CACHE_INVALIDATE set (and no CS stall)
> >to clean up any stale data from the sampler caches before rendering
> >continues.
> >
> > Didn't touch gen4-5, basically because I don't have a way to test
> > them.
> >
> > More info on commits:
> > 0aa4f99f562a05880a779707cbcd46be459863bf
> > 72473658c51d5e074ce219c1e6385a4cce29f467
> >
> > Thanks to Curro for helping to track this down, as the root cause was a
> > hw race condition.
> >
> > v2: use two calls to pipe_control_flush instead of a combination of
> > gen7_emit_cs_stall_flush and brw_emit_mi_flush calls (Curro)
> > v3: no need to const cache invalidation (Curro)
> > ---
> >
> > FWIW: checked with the CTS tests, and the piglit series, and confirmed
> > that the const cache invalidation is not needed.
> >
> >  src/mesa/drivers/dri/i965/intel_tex.c | 21 -
> >  1 file changed, 20 insertions(+), 1 deletion(-)
> >
> > diff --git a/src/mesa/drivers/dri/i965/intel_tex.c
> > b/src/mesa/drivers/dri/i965/intel_tex.c
> > index cac33ac..a802d5a 100644
> > --- a/src/mesa/drivers/dri/i965/intel_tex.c
> > +++ b/src/mesa/drivers/dri/i965/intel_tex.c
> > @@ -9,6 +9,7 @@
> >  #include "intel_mipmap_tree.h"
> >  #include "intel_tex.h"
> >  #include "intel_fbo.h"
> > +#include "intel_reg.h"
> >
> >  #define FILE_DEBUG_FLAG DEBUG_TEXTURE
> >
> > @@ -362,7 +363,25 @@ intel_texture_barrier(struct gl_context *ctx)
> >  {
> > struct brw_context *brw = brw_context(ctx);
> >
> > -   brw_emit_mi_flush(brw);
> > +   if (brw->gen >= 6) {
> > +  if (brw->gen == 6) {
> > + /* [Dev-SNB{W/A}]: Before a PIPE_CONTROL with Write Cache
> > +  * Flush Enable = 1, a PIPE_CONTROL with any non-zero
> > +  * post-sync-op is required.
> > +  */
> > + brw_emit_post_sync_nonzero_flush(brw);
> > +  }
> > +
> > +  brw_emit_pipe_control_flush(brw,
> > +  PIPE_CONTROL_DEPTH_CACHE_FLUSH |
> > +  PIPE_CONTROL_RENDER_TARGET_FLUSH |
> > +  PIPE_CONTROL_CS_STALL);
> > +
> > +  brw_emit_pipe_control_flush(brw,
> > +  PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE);
> > +   } else {
> > +  brw_emit_mi_flush(brw);
> > +   }
> >  }
> >
> 

Re: [Mesa-dev] [PATCH v3] i965: intel_texture_barrier reimplemented

2016-06-30 Thread Francisco Jerez
Jason Ekstrand  writes:

> Fwiw, I very much like the way I did this in the Vulkan driver where it
> splits it into two pipe controls automatically based on the input bits.
> (Look at genX_cmd_buffer.c cmd_buffer_apply_pipe_flushes.)  I very much
> doubt that this is the only place we have this problem in the GL driver.

Yes, it definitely isn't.

> We should probably fix it in brw_emit_pipe_control.

I had been poking around with making brw_emit_pipe_control_flush (I
think that's what you meant by brw_emit_pipe_control?) catch the racy
invalidate+flush combinations and emit two pipe controls instead.  I'm
not terribly attached to the idea of hard-coding this much policy into
the rather low-level brw_emit_pipe_control_flush(), but it is sure less
code (and more idiot-proof) than fixing all of its users individually.

In case other people don't like it I have another series that fixes the
remaining PIPE_CONTROL races manually and just adds an assertion to
brw_emit_pipe_control_flush() checking that the caller is not trying to
flush and invalidate at the same time in a single PIPE_CONTROL command,
I don't really care doing it one way or another.

I'll reply to this thread with the fixes I've written which apply on top
of Alejandro's patch -- I believe we want to land v3 of his fix anyway
because it replaces a call to brw_emit_mi_flush [AKA "I don't know how
to use pipe control, just flush everything for me" ;)] with the right
sequence of cache flushes and invalidations, which is always a good
thing.

> On Jun 30, 2016 12:00 AM, "Alejandro Piñeiro"  wrote:
>
> Fixes:
> GL44-CTS.texture_barrier_ARB.same-texel-rw-multipass
>
> On Haswell, Broadwell and Skylake (note that in order to execute that
> test, it is needed to override GL and GLSL versions).
>
> On gen6 this test was already working without this change. It keeps
> working after it.
>
> This commit replaces the call to brw_emit_mi_flush for gen6+ with two
> calls to brw_emit_pipe_control_flush:
>
>  * The first one with RENDER_TARGET_FLUSH and CS_STALL set to initiate
>a render cache flush after any concurrent rendering completes and
>cause the CS to stop parsing commands until the render cache
>becomes coherent with memory.
>
>  * The second one has TEXTURE_CACHE_INVALIDATE set (and no CS stall)
>to clean up any stale data from the sampler caches before rendering
>continues.
>
> Didn't touch gen4-5, basically because I don't have a way to test
> them.
>
> More info on commits:
> 0aa4f99f562a05880a779707cbcd46be459863bf
> 72473658c51d5e074ce219c1e6385a4cce29f467
>
> Thanks to Curro for helping to track this down, as the root cause was a
> hw race condition.
>
> v2: use two calls to pipe_control_flush instead of a combination of
> gen7_emit_cs_stall_flush and brw_emit_mi_flush calls (Curro)
> v3: no need for const cache invalidation (Curro)
> ---
>
> FWIW: checked with the CTS tests, and the piglit series, and confirmed
> that the const cache invalidation is not needed.
>
>  src/mesa/drivers/dri/i965/intel_tex.c | 21 -
>  1 file changed, 20 insertions(+), 1 deletion(-)
>
> diff --git a/src/mesa/drivers/dri/i965/intel_tex.c
> b/src/mesa/drivers/dri/i965/intel_tex.c
> index cac33ac..a802d5a 100644
> --- a/src/mesa/drivers/dri/i965/intel_tex.c
> +++ b/src/mesa/drivers/dri/i965/intel_tex.c
> @@ -9,6 +9,7 @@
>  #include "intel_mipmap_tree.h"
>  #include "intel_tex.h"
>  #include "intel_fbo.h"
> +#include "intel_reg.h"
>
>  #define FILE_DEBUG_FLAG DEBUG_TEXTURE
>
> @@ -362,7 +363,25 @@ intel_texture_barrier(struct gl_context *ctx)
>  {
> struct brw_context *brw = brw_context(ctx);
>
> -   brw_emit_mi_flush(brw);
> +   if (brw->gen >= 6) {
> +  if (brw->gen == 6) {
> + /* [Dev-SNB{W/A}]: Before a PIPE_CONTROL with Write Cache
> +  * Flush Enable = 1, a PIPE_CONTROL with any non-zero
> +  * post-sync-op is required.
> +  */
> + brw_emit_post_sync_nonzero_flush(brw);
> +  }
> +
> +  brw_emit_pipe_control_flush(brw,
> +  PIPE_CONTROL_DEPTH_CACHE_FLUSH |
> +  PIPE_CONTROL_RENDER_TARGET_FLUSH |
> +  PIPE_CONTROL_CS_STALL);
> +
> +  brw_emit_pipe_control_flush(brw,
> +  PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE);
> +   } else {
> +  brw_emit_mi_flush(brw);
> +   }
>  }
>
>  void
> --
> 2.7.4
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev



Re: [Mesa-dev] [PATCH 6/8] nv50/ir: optimize ADD3(d, 0x0, b, c) to ADD(d, b, c)

2016-06-30 Thread Ilia Mirkin
On Thu, Jun 30, 2016 at 6:54 PM, Samuel Pitoiset
 wrote:
>
>
> On 07/01/2016 12:44 AM, Ilia Mirkin wrote:
>>
>> If moveSources doesn't move modifiers, we have a serious problem.
>> However it looks like it does:
>>
>> void
>> Instruction::setSrc(int s, const ValueRef& ref)
>> {
>>    setSrc(s, ref.get());
>>    srcs[s].mod = ref.mod;
>> }
>>
>> which is what moveSources calls.
>
>
> I was not sure about moveSources() because we have two variants and the
> other one doesn't move the modifiers.
>>
>>
>>
>> On Thu, Jun 30, 2016 at 6:26 PM, Samuel Pitoiset
>>  wrote:
>>>
>>> And ADD3(d, a, 0x0, c) to ADD(d, a, c) as well.
>>>
>>> Signed-off-by: Samuel Pitoiset 
>>> ---
>>>  src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp | 12
>>> +++-
>>>  1 file changed, 11 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
>>> b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
>>> index 1cf1fa3..517f779 100644
>>> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
>>> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
>>> @@ -1032,7 +1032,17 @@ ConstantFolding::opnd(Instruction *i,
>>> ImmediateValue , int s)
>>>  i->src(0).mod = Modifier(0);
>>>}
>>>break;
>>> -
>>> +   case OP_ADD3:
>>> +  if (i->usesFlags())
>>> + break;
>>
>>
>> Why? ADD can produce/consume a flag just fine.
>
>
> Well, this is loosely based on OP_ADD which does exactly the same check.

Right, because you can't get rid of ADD 0, $c. (There is no MOV $c
variant... or we don't support it ... we probably should, I think
that's what CSET/CSETP are.) However you can definitely flip the ADD3
into an ADD2 even if the carry flag is being added in.
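
For readers following the fold itself, a minimal sketch in plain C of what
the patch quoted below does: drop the zero source of a three-source add and
shift the remaining sources down, modifiers included. All toy_* types and
names are invented for illustration and do not exist in nv50_ir; the flag
handling discussed above is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-ins for the IR types; none of these names exist in nv50_ir. */
enum toy_op { TOY_ADD, TOY_ADD3 };

struct toy_src {
   int     value_id;   /* which value this source refers to */
   uint8_t mod;        /* source modifier bits (neg, abs, ...) */
};

struct toy_insn {
   enum toy_op    op;
   int            num_srcs;
   struct toy_src src[3];
};

/* Fold an ADD3 whose source in slot s is the immediate zero into a
 * two-source ADD. Sources after s shift down by one slot, and each
 * modifier travels with its source. Returns true if rewritten. */
static bool
toy_fold_add3_zero(struct toy_insn *i, int s)
{
   assert(s >= 0 && s < 3);
   if (i->op != TOY_ADD3)
      return false;

   for (int k = s; k < 2; k++)
      i->src[k] = i->src[k + 1];   /* struct copy moves value and mod */

   i->op = TOY_ADD;
   i->num_srcs = 2;
   return true;
}
```

Copying the whole source struct is the toy equivalent of the point made
earlier in the thread: whatever moves a source has to move its modifier too.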

>
>>
>>> +  if (imm0.isInteger(0)) {
>>> + i->op = OP_ADD;
>>> + for (int k = s; k < 2; k++) {
>>> +i->setSrc(k, i->getSrc(k + 1));
>>> +i->src(k).mod = i->src(k + 1).mod;
>>> + }
>>
>>
>> aka
>>
>> i->moveSources(s + 1, -1) ?
>
>
> Yes.
>
>
>>
>>> +  }
>>> +  break;
>>> case OP_DIV:
>>>if (s != 1 || (i->dType != TYPE_S32 && i->dType != TYPE_U32))
>>>   break;
>>> --
>>> 2.8.3
>>>


Re: [Mesa-dev] V3 On disk shader cache for i965 (Now with real world results!)

2016-06-30 Thread Timothy Arceri
On Thu, 2016-06-30 at 00:59 +0300, Grazvydas Ignotas wrote:
> On Wed, Jun 29, 2016 at 3:11 PM, Timothy Arceri
>  wrote:
> > On Wed, 2016-06-29 at 03:47 +0300, Grazvydas Ignotas wrote:
> > > On Tue, Jun 28, 2016 at 10:53 AM, Timothy Arceri
> > >  wrote:
> > > > On Mon, 2016-06-27 at 00:46 +1000, Timothy Arceri wrote:
> > > > > On Sun, 2016-06-26 at 16:15 +0300, Grazvydas Ignotas wrote:
> > > > > > Tried this while playing with apitrace and am getting
> > > > > > segfaults when running any trace with a cached (second) run.
> > > > > > Not sure if it's "wrong" traces I've chosen or what, you can
> > > > > > take one example from this bug:
> > > > > > https://bugs.freedesktop.org/show_bug.cgi?id=96425
> > > > > 
> > > > > Thanks for testing I'll take a look tomorrow.
> > > > 
> > > > The problem is the shaders were being detached after linking so
> > > > we had nothing to fall back to if we had a shader cache miss.
> > > > I've hacked something up and pushed it to the shader-cache19
> > > > branch that allows the trace to run. Not sure how it relates to
> > > > real game performance but the trace goes from 5FPS to 7FPS on the
> > > > second run on my machine, which looks good :)
> > > 
> > > Seems to work now and makes things a good deal faster. nice!
> > > 
> > > However I have a case of one trace's cache seemingly affecting
> > > another
> > > trace, are you interested in that?
> > 
> > Yes, very interested in any bugs :)
> > 
> > >  One of them (the one that gets
> > > broken) is from this bug:
> > > https://bugs.freedesktop.org/show_bug.cgi?id=92229
> > > Unfortunately the other "bad" one is my own and is over a
> > > gigabyte
> > > (even compressed), I'll need to trim it I guess.
> > 
> > If you're happy to upload it somewhere I'm happy to download it.
> 
> Ok then, it's here:
> https://drive.google.com/file/d/0Bz8fw_SGGDzsZVBMSWF6dlRCMFE/view?usp=sharing
> 
> Steps to reproduce:
> rm -rf ~/.cache/mesa
> MESA_GLSL_CACHE_ENABLE=1 glretrace -b Soma_slow_trim.x86_64.trace
> MESA_GLSL_CACHE_ENABLE=1 glretrace -b Soma.bin.x86_64.trace # from bug 92229
> 
> The first one should hit an assert due to reasons unrelated to cache;
> after that, playing the second one crashes on free() due to some
> corruption for me. If you remove the "bad" cache and just play the
> second one, it works with empty and its own full cache.

I couldn't hit a crash using this method. However these traces do hit a
bug with the previous hack I did that could be the problem you are
seeing. I've pushed a fix to the same branch.

Now I'm trying to track down a problem where the walls and floor in the
room are missing in the trace from the bug on the first run. They are
there in the second run so it seems like another bug in the fallback
code.

> 
> Gražvydas


Re: [Mesa-dev] [PATCH 1/8] nv50/ir: add preliminary support for OP_ADD3

2016-06-30 Thread Ilia Mirkin
On Thu, Jun 30, 2016 at 6:26 PM, Samuel Pitoiset
 wrote:
> This instruction is new in SM50 (Maxwell) and performs an add with
> three sources. Unfortunately, it only supports integers.
>
> Signed-off-by: Samuel Pitoiset 
> ---
>  src/gallium/drivers/nouveau/codegen/nv50_ir.h| 1 +
>  src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp| 1 +
>  src/gallium/drivers/nouveau/codegen/nv50_ir_target.cpp   | 3 +++
>  src/gallium/drivers/nouveau/codegen/nv50_ir_target_gm107.cpp | 4 
>  src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp  | 1 +
>  src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.cpp  | 5 -
>  6 files changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir.h 
> b/src/gallium/drivers/nouveau/codegen/nv50_ir.h
> index 94e54bb..6afe7c7 100644
> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir.h
> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir.h
> @@ -163,6 +163,7 @@ enum operation
> OP_SHFL, // warp shuffle
> OP_VOTE,
> OP_BUFQ, // buffer query
> +   OP_ADD3,

I would highly recommend placing this op closer to the other arithmetic ops.

> OP_LAST
>  };
>
> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp 
> b/src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp
> index 6469f71..d88bdce 100644
> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp
> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp
> @@ -192,6 +192,7 @@ const char *operationStr[OP_LAST + 1] =
> "shfl",
> "vote",
> "bufq",
> +   "add3",
> "(invalid)"
>  };
>
> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_target.cpp 
> b/src/gallium/drivers/nouveau/codegen/nv50_ir_target.cpp
> index cc98d32..8067787 100644
> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_target.cpp
> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_target.cpp
> @@ -57,6 +57,7 @@ const uint8_t Target::operationSrcNr[] =
> 3,  // SHFL
> 1,  // VOTE
> 1,  // BUFQ
> +   3,  // ADD3
> 0
>  };
>
> @@ -135,6 +136,8 @@ const OpClass Target::operationClass[] =
> OPCLASS_OTHER,
> // BUFQ
> OPCLASS_OTHER,
> +   // ADD3
> +   OPCLASS_ARITH,
> OPCLASS_PSEUDO // LAST
>  };
>
> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_gm107.cpp 
> b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_gm107.cpp
> index 92caeb2..5608b5e 100644
> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_gm107.cpp
> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_gm107.cpp
> @@ -61,6 +61,10 @@ TargetGM107::isOpSupported(operation op, DataType ty) const
> case OP_DIV:
> case OP_MOD:
>return false;
> +   case OP_ADD3:
> +  if (isFloatType(ty))
> + return false;
> +  break;
> default:
>break;
> }
> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp 
> b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp
> index b37ea73..e1a7963 100644
> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp
> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp
> @@ -437,6 +437,7 @@ TargetNV50::isOpSupported(operation op, DataType ty) const
> case OP_EXTBF:
> case OP_EXIT: // want exit modifier instead (on NOP if required)
> case OP_MEMBAR:
> +   case OP_ADD3:
>return false;
> case OP_SAD:
>return ty == TYPE_S32;
> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.cpp 
> b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.cpp
> index 932ec39..f895019 100644
> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.cpp
> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.cpp
> @@ -147,7 +147,8 @@ static const struct opProperties _initProps[] =
> { OP_SUSTP,   0x0, 0x0, 0x0, 0x0, 0x2, 0x0 },
> { OP_SUCLAMP, 0x0, 0x0, 0x0, 0x0, 0x2, 0x2 },
> { OP_SUBFM,   0x0, 0x0, 0x0, 0x0, 0x6, 0x2 },
> -   { OP_SUEAU,   0x0, 0x0, 0x0, 0x0, 0x6, 0x2 }
> +   { OP_SUEAU,   0x0, 0x0, 0x0, 0x0, 0x6, 0x2 },
> +   { OP_ADD3,0x7, 0x0, 0x0, 0x0, 0x2, 0x2 },

Should probably update the commutative bitmask too (which is where
you'll find that moving it somewhere into range will be beneficial),
so that load propagation can swap it into the right place. [And, in a
later commit, teach that swap function to try to look at arg2 as
well.]

>  };
>
>  void TargetNVC0::initOpInfo()
> @@ -415,6 +416,8 @@ TargetNVC0::isOpSupported(operation op, DataType ty) const
>return false;
> if (op == OP_POW || op == OP_SQRT || op == OP_DIV || op == OP_MOD)
>return false;
> +   if (op == OP_ADD3)
> +  return false;
> return true;
>  }
>
> --
> 2.8.3
>

[Mesa-dev] [PATCH] mesa: make attribute binding message more useful

2016-06-30 Thread Timothy Arceri
---
 src/mesa/main/shader_query.cpp | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/mesa/main/shader_query.cpp b/src/mesa/main/shader_query.cpp
index b5e1a44..a2a93b1 100644
--- a/src/mesa/main/shader_query.cpp
+++ b/src/mesa/main/shader_query.cpp
@@ -84,7 +84,8 @@ _mesa_BindAttribLocation(GLuint program, GLuint index,
}
 
if (index >= ctx->Const.Program[MESA_SHADER_VERTEX].MaxAttribs) {
-  _mesa_error(ctx, GL_INVALID_VALUE, "glBindAttribLocation(index)");
+  _mesa_error(ctx, GL_INVALID_VALUE, "glBindAttribLocation(%u >= %u)",
+  index, ctx->Const.Program[MESA_SHADER_VERTEX].MaxAttribs);
   return;
}
 
-- 
2.7.4



[Mesa-dev] [PATCH v2] nvc0: fix up image support for allowing multiple samples

2016-06-30 Thread Ilia Mirkin
Basically we just have to scale up the coordinates and then add the
relevant sample offset. The code to handle this was already largely
present from Christoph's earlier attempts to pipe images through back in
the dark ages, this just hooks it all up.
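
As a rough sketch of the idea in plain C: scale the pixel coordinates by the
per-axis sample shift, add a per-sample offset, and derive the sample count
as 1 << (ms_x + ms_y) the way handleSUQ does in the diff below. The struct,
the function names and in particular the 4x offset table are assumptions
made for illustration, not the driver's actual data layout.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of a multisample layout. */
struct sketch_ms_layout {
   uint32_t ms_x;                 /* log2 of samples per pixel in x */
   uint32_t ms_y;                 /* log2 of samples per pixel in y */
   const uint8_t (*offsets)[2];   /* per-sample {dx, dy}, illustrative */
};

/* Sample count as computed in handleSUQ: 1 << (ms_x + ms_y). */
static uint32_t
sketch_sample_count(const struct sketch_ms_layout *l)
{
   return 1u << (l->ms_x + l->ms_y);
}

/* Scale up the pixel coordinates, then add the sample's offset. */
static void
sketch_adjust_coords_ms(const struct sketch_ms_layout *l,
                        uint32_t x, uint32_t y, uint32_t sample,
                        uint32_t *out_x, uint32_t *out_y)
{
   assert(sample < sketch_sample_count(l));
   *out_x = (x << l->ms_x) + l->offsets[sample][0];
   *out_y = (y << l->ms_y) + l->offsets[sample][1];
}

/* Illustrative 4x layout: a 2x2 block of samples per pixel. */
static const uint8_t sketch_offsets_4x[4][2] = {
   { 0, 0 }, { 1, 0 }, { 0, 1 }, { 1, 1 },
};
```

Under that assumed 4x layout, pixel (3, 5) sample 2 lands at storage
coordinate (6, 11).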

Signed-off-by: Ilia Mirkin 
---

v1 -> v2:
 - fix sample count result
 - rework handleSUQ to properly handle the tex mask
 - add clamping of indirect address

 .../drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp  |  3 +
 .../nouveau/codegen/nv50_ir_lowering_nvc0.cpp  | 80 +-
 src/gallium/drivers/nouveau/nvc0/nvc0_compute.c| 24 +++
 src/gallium/drivers/nouveau/nvc0/nvc0_context.h|  2 +-
 src/gallium/drivers/nouveau/nvc0/nvc0_program.c| 20 +++---
 src/gallium/drivers/nouveau/nvc0/nvc0_screen.c | 20 ++
 src/gallium/drivers/nouveau/nvc0/nvc0_tex.c|  4 +-
 7 files changed, 106 insertions(+), 47 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
index 0fa5aa1..f3d7bee 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
@@ -2388,6 +2388,9 @@ Converter::getImageCoords(std::vector , 
int r, int s)
 
for (int c = 0; c < arg; ++c)
   coords.push_back(fetchSrc(s, c));
+
+   if (t.isMS())
+  coords.push_back(fetchSrc(s, 3));
 }
 
 // For raw loads, granularity is 4 byte.
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
index 67bd73b..0558ae0 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
@@ -1569,44 +1569,51 @@ static inline uint16_t getSuClampSubOp(const 
TexInstruction *su, int c)
 bool
 NVC0LoweringPass::handleSUQ(TexInstruction *suq)
 {
+   int mask = suq->tex.mask;
int dim = suq->tex.target.getDim();
int arg = dim + (suq->tex.target.isArray() || suq->tex.target.isCube());
-   uint8_t s = prog->driver->io.auxCBSlot;
Value *ind = suq->getIndirectR();
uint32_t base;
-   int c;
-
-   base = prog->driver->io.suInfoBase + suq->tex.r * NVE4_SU_INFO__STRIDE;
+   int c, d;
 
-   if (ind)
-  ind = bld.mkOp2v(OP_SHL, TYPE_U32, bld.getScratch(),
+   if (ind) {
+  ind = bld.mkOp2v(OP_ADD, TYPE_U32, bld.getSSA(),
+   ind, bld.mkImm(suq->tex.r));
+  ind = bld.mkOp2v(OP_AND, TYPE_U32, bld.getSSA(),
+   ind, bld.mkImm(7));
+  ind = bld.mkOp2v(OP_SHL, TYPE_U32, bld.getSSA(),
ind, bld.mkImm(6));
+  base = 0;
+   } else {
+  base = suq->tex.r * NVE4_SU_INFO__STRIDE;
+   }
 
-   for (c = 0; c < arg; ++c) {
-  if (suq->defExists(c)) {
- int offset;
+   for (c = 0, d = 0; c < 3; ++c, mask >>= 1) {
+  if (c >= arg || !(mask & 1))
+ continue;
 
- if (c == 1 && suq->tex.target == TEX_TARGET_1D_ARRAY) {
-offset = base + NVE4_SU_INFO_SIZE(2);
- } else {
-offset = base + NVE4_SU_INFO_SIZE(c);
- }
- bld.mkLoad(TYPE_U32, suq->getDef(c),
-bld.mkSymbol(FILE_MEMORY_CONST, s, TYPE_U32, offset), ind);
-  }
-   }
+  int offset;
 
-   if (suq->tex.target.isCube()) {
-  if (suq->defExists(2)) {
- bld.mkOp2(OP_DIV, TYPE_U32, suq->getDef(2), suq->getDef(2),
-   bld.loadImm(NULL, 6));
+  if (c == 1 && suq->tex.target == TEX_TARGET_1D_ARRAY) {
+ offset = NVE4_SU_INFO_SIZE(2);
+  } else {
+ offset = NVE4_SU_INFO_SIZE(c);
   }
+  bld.mkMov(suq->getDef(d++), loadSuInfo32(ind, base + offset));
+  if (c == 2 && suq->tex.target.isCube())
+ bld.mkOp2(OP_DIV, TYPE_U32, suq->getDef(d - 1), suq->getDef(d - 1),
+   bld.loadImm(NULL, 6));
}
 
-   if (suq->defExists(3)) {
-  // .w contains the number of samples for multi-sampled images but we
-  // don't support them for now.
-  bld.mkMov(suq->getDef(3), bld.loadImm(NULL, 1));
+   if (mask & 1) {
+  if (suq->tex.target.isMS()) {
+ Value *ms_x = loadSuInfo32(ind, base + NVE4_SU_INFO_MS(0));
+ Value *ms_y = loadSuInfo32(ind, base + NVE4_SU_INFO_MS(1));
+ Value *ms = bld.mkOp2v(OP_ADD, TYPE_U32, bld.getScratch(), ms_x, 
ms_y);
+ bld.mkOp2(OP_SHL, TYPE_U32, suq->getDef(d++), bld.loadImm(NULL, 1), 
ms);
+  } else {
+ bld.mkMov(suq->getDef(d++), bld.loadImm(NULL, 1));
+  }
}
 
bld.remove(suq);
@@ -1616,7 +1623,7 @@ NVC0LoweringPass::handleSUQ(TexInstruction *suq)
 void
 NVC0LoweringPass::adjustCoordinatesMS(TexInstruction *tex)
 {
-   const uint16_t base = tex->tex.r * NVE4_SU_INFO__STRIDE;
+   uint16_t base;
const int arg = tex->tex.target.getArgCount();
 
if (tex->tex.target == TEX_TARGET_2D_MS)
@@ -1632,13 +1639,18 @@ 

[Mesa-dev] [Bug 95346] Stellaris - Black/super dark planets

2016-06-30 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=95346

--- Comment #13 from Luchesar V. ILIEV  ---
Sorry for the spam, but I just realised that earlier game versions can be
tested. Going back as far as 1.0.0, the situation is the same: textures work
in-game, but are broken in the start new game screen. Unless I'm missing
something important, it thus seems that whatever brought the improvement must
be in Mesa. Whether intentional or not, kudos! Hopefully, the remaining
problems can be fixed too.



Re: [Mesa-dev] [PATCH 2/2] gallium/radeon: use max_alloc_size

2016-06-30 Thread Vedran Miletić

On 07/01/2016 02:52 AM, Vedran Miletić wrote:
> Had something similar in the works, Bas did as well, but this approach
> is cleaner.
>
> With these changes, in si_pipe.c and r600_pipe.c, you should not return
> max_const_buffer_size anymore, since it can exceed int limits, but
> instead something like
>
> MAX2(0x7ff, max_const_buffer_size)

That was supposed to be MIN2().
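
To spell out why the direction matters: an upper clamp needs the minimum of
the value and the limit, while a maximum would return at least the limit
every time. A minimal sketch in plain C follows; the helper name and the use
of INT_MAX as the limit are assumptions for illustration, not the constant
from the message.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

#define SKETCH_MIN2(a, b) ((a) < (b) ? (a) : (b))

/* Clamp a 64-bit size so it can be returned through an int-typed query.
 * INT_MAX is an assumed limit chosen for this sketch. */
static int
sketch_clamp_to_int(uint64_t size)
{
   uint64_t clamped = SKETCH_MIN2(size, (uint64_t)INT_MAX);
   assert(clamped <= (uint64_t)INT_MAX);
   return (int)clamped;
}
```

Small sizes pass through unchanged and anything larger saturates at the
limit, so the int return value can never wrap to a negative number.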


> Haven't tried running memory-heavy OpenCL programs, but clinfo other
> than const buffer size looks good.




Regards,
Vedran

--
Vedran Miletić
vedran.miletic.net


Re: [Mesa-dev] [PATCH 2/2] gallium/radeon: use max_alloc_size

2016-06-30 Thread Vedran Miletić

On 07/01/2016 01:29 AM, Marek Olšák wrote:

From: Marek Olšák 

also fix max_global_size to take a maximum of {vram_size, gart_size}
---
 src/gallium/drivers/r600/r600_pipe.c  | 2 +-
 src/gallium/drivers/radeon/r600_pipe_common.c | 9 +++--
 src/gallium/drivers/radeonsi/si_pipe.c| 2 +-
 3 files changed, 5 insertions(+), 8 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_pipe.c 
b/src/gallium/drivers/r600/r600_pipe.c
index 119c76b..55bbde1 100644
--- a/src/gallium/drivers/r600/r600_pipe.c
+++ b/src/gallium/drivers/r600/r600_pipe.c
@@ -301,7 +301,7 @@ static int r600_get_param(struct pipe_screen* pscreen, enum 
pipe_cap param)
return 0;

case PIPE_CAP_MAX_TEXTURE_BUFFER_SIZE:
-   return MIN2(rscreen->b.info.vram_size, 0x);
+   return MIN2(rscreen->b.info.max_alloc_size, 0x);

 case PIPE_CAP_MIN_MAP_BUFFER_ALIGNMENT:
 return R600_MAP_BUFFER_ALIGNMENT;
diff --git a/src/gallium/drivers/radeon/r600_pipe_common.c 
b/src/gallium/drivers/radeon/r600_pipe_common.c
index d7f1d41..f75fa6c 100644
--- a/src/gallium/drivers/radeon/r600_pipe_common.c
+++ b/src/gallium/drivers/radeon/r600_pipe_common.c
@@ -864,8 +864,8 @@ static int r600_get_compute_param(struct pipe_screen 
*screen,
 * 4 * MAX_MEM_ALLOC_SIZE.
 */
*max_global_size = MIN2(4 * max_mem_alloc_size,
-   rscreen->info.gart_size +
-   rscreen->info.vram_size);
+   MAX2(rscreen->info.gart_size,
+rscreen->info.vram_size));


Can't you also use info.max_alloc_size here?


}
return sizeof(uint64_t);

@@ -889,10 +889,7 @@ static int r600_get_compute_param(struct pipe_screen 
*screen,
if (ret) {
uint64_t *max_mem_alloc_size = ret;

-   /* XXX: The limit in older kernels is 256 MB.  We
-* should add a query here for newer kernels.
-*/
-   *max_mem_alloc_size = 256 * 1024 * 1024;
+   *max_mem_alloc_size = rscreen->info.max_alloc_size;
}
return sizeof(uint64_t);

diff --git a/src/gallium/drivers/radeonsi/si_pipe.c 
b/src/gallium/drivers/radeonsi/si_pipe.c
index ad2a86a..30e6253 100644
--- a/src/gallium/drivers/radeonsi/si_pipe.c
+++ b/src/gallium/drivers/radeonsi/si_pipe.c
@@ -419,7 +419,7 @@ static int si_get_param(struct pipe_screen* pscreen, enum 
pipe_cap param)
   HAVE_LLVM >= 0x0307 ? 410 : 330;

case PIPE_CAP_MAX_TEXTURE_BUFFER_SIZE:
-   return MIN2(sscreen->b.info.vram_size, 0x);
+   return MIN2(sscreen->b.info.max_alloc_size, 0x);

case PIPE_CAP_BUFFER_SAMPLER_VIEW_RGBA_ONLY:
return 0;



Had something similar in the works, Bas did as well, but this approach 
is cleaner.


With these changes, in si_pipe.c and r600_pipe.c, you should not return 
max_const_buffer_size anymore, since it can exceed int limits, but 
instead something like


MAX2(0x7ff, max_const_buffer_size)

Haven't tried running memory-heavy OpenCL programs, but clinfo other 
than const buffer size looks good.


--
Vedran Miletić
vedran.miletic.net


Re: [Mesa-dev] [PATCH 1/2] glsl: add driconf to zero-init unintialized vars

2016-06-30 Thread Jason Ekstrand
On Thu, Jun 30, 2016 at 11:42 AM, Kenneth Graunke 
wrote:

> On Saturday, June 25, 2016 8:37:47 AM PDT Rob Clark wrote:
> > From: Rob Clark 
> >
> > Some games are sloppy.. perhaps because it is defined behavior for DX or
> > perhaps because nv blob driver defaults things to zero.
> >
> > So add driconf param to force uninitialized variables to default to zero.
> >
> > This issue was observed with rust, from steam store.  But has surfaced
> > elsewhere in the past.
> >
> > Signed-off-by: Rob Clark 
> > ---
> > Note that I left out the drirc bit, since not entirely sure how to
> > identify this game.  (I don't actually have the game, just working off
> > of an apitrace)
> >
> > Possibly worth mentioning that for the shaders using uninitialized vars
> > having zero-initializers lets constant-propagation get rid of a whole
> > lot of instructions.  One shader I saw dropped to less than half of
> > its original instruction count.
> >
> > Second patch in the series is just fixing an i965 bug that was exposed
> > by this patch.
>
> I'm a bit surprised to see this at the GLSL IR level...handling it for
> nir_ssa_undef would probably be simpler.  But I suppose this works too.
>

That was my suggestion.  I figured the gallium people would want it too.


>
> >  src/compiler/glsl/ast_to_hir.cpp| 9 +
> >  src/compiler/glsl/glsl_parser_extras.cpp| 1 +
> >  src/compiler/glsl/glsl_parser_extras.h  | 1 +
> >  src/gallium/include/state_tracker/st_api.h  | 1 +
> >  src/gallium/state_trackers/dri/dri_screen.c | 2 ++
> >  src/mesa/drivers/dri/common/xmlpool/t_options.h | 5 -
> >  src/mesa/drivers/dri/i965/brw_context.c | 2 ++
> >  src/mesa/drivers/dri/i965/intel_screen.c| 4 
> >  src/mesa/main/mtypes.h  | 5 +
> >  src/mesa/state_tracker/st_extensions.c  | 2 ++
> >  10 files changed, 31 insertions(+), 1 deletion(-)
> >
> > diff --git a/src/compiler/glsl/ast_to_hir.cpp
> b/src/compiler/glsl/ast_to_hir.cpp
> > index 0cfce68..d2c284f 100644
> > --- a/src/compiler/glsl/ast_to_hir.cpp
> > +++ b/src/compiler/glsl/ast_to_hir.cpp
> > @@ -4697,6 +4697,15 @@ ast_declarator_list::hir(exec_list *instructions,
> >apply_layout_qualifier_to_variable(>type->qualifier, var,
> state,
> >   );
> >
> > +  if ((var->data.mode == ir_var_auto || var->data.mode ==
> ir_var_temporary)
> > +  && (var->type->base_type >= GLSL_TYPE_UINT)
> > +  && (var->type->base_type <= GLSL_TYPE_BOOL)
>
> I'd prefer:
>
>  && (var->type->is_numeric() || var->type->is_boolean())
>
> Either way,
> Reviewed-by: Kenneth Graunke 
>
> Patch 2 also gets an R-b with Jason's feedback (d[0] and u[0] on LHS)
> taken care of.
>
> > +  && state->zero_init) {
> > + const ir_constant_data data = {0};
> > + var->data.has_initializer = true;
> > + var->constant_initializer = new(var) ir_constant(var->type,
> );
> > +  }
> > +
> >if (this->type->qualifier.flags.q.invariant) {
> >   if (!is_varying_var(var, state->stage)) {
> >  _mesa_glsl_error(, state,
> > diff --git a/src/compiler/glsl/glsl_parser_extras.cpp
> b/src/compiler/glsl/glsl_parser_extras.cpp
> > index 09f7477..fc2859a 100644
> > --- a/src/compiler/glsl/glsl_parser_extras.cpp
> > +++ b/src/compiler/glsl/glsl_parser_extras.cpp
> > @@ -74,6 +74,7 @@ _mesa_glsl_parse_state::_mesa_glsl_parse_state(struct
> gl_context *_ctx,
> > /* Set default language version and extensions */
> > this->language_version = 110;
> > this->forced_language_version = ctx->Const.ForceGLSLVersion;
> > +   this->zero_init = ctx->Const.GLSLZeroInit;
> > this->es_shader = false;
> > this->ARB_texture_rectangle_enable = true;
> >
> > diff --git a/src/compiler/glsl/glsl_parser_extras.h
> b/src/compiler/glsl/glsl_parser_extras.h
> > index 8c43292..669b3d1 100644
> > --- a/src/compiler/glsl/glsl_parser_extras.h
> > +++ b/src/compiler/glsl/glsl_parser_extras.h
> > @@ -306,6 +306,7 @@ struct _mesa_glsl_parse_state {
> > bool es_shader;
> > unsigned language_version;
> > unsigned forced_language_version;
> > +   bool zero_init;
> > gl_shader_stage stage;
> >
> > /**
> > diff --git a/src/gallium/include/state_tracker/st_api.h
> b/src/gallium/include/state_tracker/st_api.h
> > index 41daa47..21d5177 100644
> > --- a/src/gallium/include/state_tracker/st_api.h
> > +++ b/src/gallium/include/state_tracker/st_api.h
> > @@ -242,6 +242,7 @@ struct st_config_options
> > unsigned force_glsl_version;
> > boolean force_s3tc_enable;
> > boolean allow_glsl_extension_directive_midshader;
> > +   boolean glsl_zero_init;
> >  };
> >
> >  /**
> > diff --git a/src/gallium/state_trackers/dri/dri_screen.c
> b/src/gallium/state_trackers/dri/dri_screen.c
> > index 2ac55c8..b16585a 100644
> > 

[Mesa-dev] [Bug 95346] Stellaris - Black/super dark planets

2016-06-30 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=95346

--- Comment #12 from Luchesar V. ILIEV  ---
Slight correction: the textures load correctly in-game; however the planets are
still broken as before (i.e. showing only the night texture) in the "New game"
screen, where the player selects who they want to play as. I'm not entirely
sure, but I believe Gražvydas' snapshots are exactly from this (still broken)
screen.



Re: [Mesa-dev] [PATCH] gallium/util: check for window cliprects in util_can_blit_via_copy_region()

2016-06-30 Thread Roland Scheidegger
On 01.07.2016 at 02:16, Brian Paul wrote:
> We can't blit with resource_copy_region() if there are window clip rects.
> ---
>  src/gallium/auxiliary/util/u_surface.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/src/gallium/auxiliary/util/u_surface.c 
> b/src/gallium/auxiliary/util/u_surface.c
> index e0234f8..a9ed006 100644
> --- a/src/gallium/auxiliary/util/u_surface.c
> +++ b/src/gallium/auxiliary/util/u_surface.c
> @@ -723,6 +723,7 @@ util_can_blit_via_copy_region(const struct pipe_blit_info 
> *blit,
> if ((blit->mask & mask) != mask ||
> blit->filter != PIPE_TEX_FILTER_NEAREST ||
> blit->scissor_enable ||
> +   blit->num_window_rectangles > 0 ||
> blit->alpha_blend) {
>return FALSE;
> }
> 

Reviewed-by: Roland Scheidegger 



[Mesa-dev] [PATCH] gallium/util: check for window cliprects in util_can_blit_via_copy_region()

2016-06-30 Thread Brian Paul
We can't blit with resource_copy_region() if there are window clip rects.
---
 src/gallium/auxiliary/util/u_surface.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/gallium/auxiliary/util/u_surface.c 
b/src/gallium/auxiliary/util/u_surface.c
index e0234f8..a9ed006 100644
--- a/src/gallium/auxiliary/util/u_surface.c
+++ b/src/gallium/auxiliary/util/u_surface.c
@@ -723,6 +723,7 @@ util_can_blit_via_copy_region(const struct pipe_blit_info 
*blit,
if ((blit->mask & mask) != mask ||
blit->filter != PIPE_TEX_FILTER_NEAREST ||
blit->scissor_enable ||
+   blit->num_window_rectangles > 0 ||
blit->alpha_blend) {
   return FALSE;
}
-- 
1.9.1



Re: [Mesa-dev] [PATCH 1/2] mesa: Drop -fno-builtin-memcmp.

2016-06-30 Thread Timothy Arceri
On Fri, 2016-07-01 at 01:26 +0200, Roland Scheidegger wrote:
> On 01.07.2016 at 00:59, Matt Turner wrote:
> > According to the referenced bug report, gcc-4.5 and newer do not
> > inline
> > memcmp(). I see no difference in performance of ipers with llvmpipe
> > on a
> > Sandybridge (which does not have "Enhanced REP MOVSB/STOSB") by
> > removing
> > this flag.
> > 
> > I attempted to confirm the problem with gcc-4.4, but it fails to
> > compile
> > for quite a few different reasons.
> > ---
> >  configure.ac | 8 
> >  1 file changed, 8 deletions(-)
> > 
> > diff --git a/configure.ac b/configure.ac
> > index bd8842d..95cfc36 100644
> > --- a/configure.ac
> > +++ b/configure.ac
> > @@ -304,10 +304,6 @@ if test "x$GCC" = xyes; then
> >  # We don't want floating-point math functions to set errno or
> > trap
> >  CFLAGS="$CFLAGS -fno-math-errno -fno-trapping-math"
> >  
> > -# gcc's builtin memcmp is slower than glibc's
> > -# http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43052
> > -CFLAGS="$CFLAGS -fno-builtin-memcmp"
> > -
> >  # Flags to help ensure that certain portions of the code --
> > and only those
> >  # portions -- can be built with MSVC:
> >  # - src/util, src/gallium/auxiliary,
> > rc/gallium/drivers/llvmpipe, and
> > @@ -343,10 +339,6 @@ if test "x$GXX" = xyes; then
> >  
> >  # Restore CXXFLAGS; VISIBILITY_CXXFLAGS are added to it where
> > needed.
> >  CXXFLAGS=$save_CXXFLAGS
> > -
> > -# gcc's builtin memcmp is slower than glibc's
> > -# http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43052
> > -CXXFLAGS="$CXXFLAGS -fno-builtin-memcmp"
> >  fi
> >  
> >  AC_SUBST([MSVC2013_COMPAT_CFLAGS])
> > 
> 
> FWIW we're using gcc 4.4.3...
> Technically the configure script only requires gcc 4.2 (albeit some
> parts have higher requirements) so if it doesn't compile that's a
> bug.

Maybe we should just bump the minimum to 4.4; I sent patches for this
back in Jan [1][2]. Reading the reply from Jonathan Gray again, it seems
really odd that we still have 4.2 as the minimum, since the "4.2.1"
version they use is a kind of enhanced version, so building on 4.2 is
unlikely to actually work on other systems.

"src/xenocara is built with gcc 4.2.1.  This has in the past been
patched for things like the gnu binary integer constants extension the
i965 code started to require a couple of major Mesa releases
ago.  FreeBSD also patched their gcc for the same."


[1] https://patchwork.freedesktop.org/patch/71701/
[2] https://patchwork.freedesktop.org/patch/71702/

> 
> But we're using scons to build so this should be ok (albeit I think
> different flags in configure.ac and scons are generally not a really
> good idea).
> 
> Roland
> 


Re: [Mesa-dev] [PATCH 16/24] i965: Use LZD to implement nir_op_ifind_msb on Gen < 7

2016-06-30 Thread Francisco Jerez
Ian Romanick  writes:

> From: Ian Romanick 
>
> Signed-off-by: Ian Romanick 
> ---
>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp   | 50 ++--
>  src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 52 
> +++---
>  2 files changed, 81 insertions(+), 21 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp 
> b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> index f15bf3e..f8db28a 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> @@ -623,8 +623,32 @@ fs_visitor::nir_emit_find_msb_using_lzd(const fs_builder 
> ,
>  bool is_signed)
>  {
> fs_inst *inst;
> +   fs_reg temp = src;
>  
> -   bld.LZD(retype(result, BRW_REGISTER_TYPE_UD), src);
> +   if (is_signed) {
> +  /* LZD of an absolute value source almost always does the right
> +   * thing.  There are two problem values:
> +   *

This comment seems somewhat misleading: there are many more problem
values if you just take the LZD of the absolute value of the argument.
Take the negative of any power of two, e.g. -8, represented as
0xfffffff8.  findMSB(-8) is supposed to give you 2 as result, but
'31 - lzd(abs(-8)) == 31 - 28 == 3'.
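
To make the difference concrete, here is a rough host-side sketch in plain C.
It is not driver code: lzd32(), find_msb_abs() and find_msb_not() are
hypothetical helpers that only emulate what the LZD, ASR and XOR instructions
would compute, assuming 32-bit two's complement integers.

```c
#include <assert.h>
#include <stdint.h>

/* Software stand-in for LZD: number of leading zero bits, 32 for zero. */
static int lzd32(uint32_t v)
{
   int n = 0;

   if (v == 0)
      return 32;

   while (!(v & 0x80000000u)) {
      v <<= 1;
      n++;
   }
   return n;
}

/* Naive variant: LZD of the absolute value of the source. */
static int find_msb_abs(int32_t v)
{
   uint32_t u = (uint32_t)v;
   uint32_t mag = v < 0 ? 0u - u : u;   /* wraps for INT32_MIN, like abs */

   return 31 - lzd32(mag);
}

/* Variant from the patch: XOR with the sign mask, i.e. a conditional
 * logical-not.  The mask is what an arithmetic shift right by 31 yields. */
static int find_msb_not(int32_t v)
{
   uint32_t u = (uint32_t)v;
   uint32_t mask = v < 0 ? 0xffffffffu : 0u;

   return 31 - lzd32(u ^ mask);
}

/* Spot checks for the values discussed in this thread. */
void find_msb_self_check(void)
{
   assert(find_msb_abs(-8) == 3);          /* wrong, should be 2 */
   assert(find_msb_not(-8) == 2);
   assert(find_msb_not(-1) == -1);
   assert(find_msb_not(0) == -1);
   assert(find_msb_not(INT32_MIN) == 30);
   assert(find_msb_not(8) == 3);
}
```

Every negative power of two goes wrong in the abs() variant, not only the
two values the comment lists, while the XOR variant handles them all.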

> +   * * 0x80000000.  Since abs(0x80000000) == 0x80000000, LZD returns
> +   *   0.  However, findMSB(int(0x80000000)) == 30.
> +   *
> +   * * 0xffffffff.  Since abs(0xffffffff) == 1, LZD returns
> +   *   31.  Section 8.8 (Integer Functions) of the GLSL 4.50 spec says:
> +   *
> +   *For a value of zero or negative one, -1 will be returned.
> +   *
> +   * For all negative number cases, including 0x80000000 and
> +   * 0xffffffff, the correct value is obtained from LZD if instead of
> +   * negating the (already negative) value the logical-not is used.  A
> +   * conditional logical-not can be achieved in two instructions.
> +   */
> +  temp = vgrf(glsl_type::int_type);
> +
> +  bld.ASR(temp, src, brw_imm_d(31));
> +  bld.XOR(temp, temp, src);
> +   }
> +
> +   bld.LZD(retype(result, BRW_REGISTER_TYPE_UD), temp);
>  
> /* LZD counts from the MSB side, while GLSL's findMSB() wants the count
>  * from the LSB side. Subtract the result from 31 to convert the MSB
> @@ -1339,17 +1363,23 @@ fs_visitor::nir_emit_alu(const fs_builder , 
> nir_alu_instr *instr)
>  
> case nir_op_ifind_msb: {
>assert(nir_dest_bit_size(instr->dest.dest) < 64);
> -  bld.FBH(retype(result, BRW_REGISTER_TYPE_UD), op[0]);
>  
> -  /* FBH counts from the MSB side, while GLSL's findMSB() wants the count
> -   * from the LSB side. If FBH didn't return an error (0xFFFFFFFF), then
> -   * subtract the result from 31 to convert the MSB count into an LSB 
> count.
> -   */
> -  bld.CMP(bld.null_reg_d(), result, brw_imm_d(-1), BRW_CONDITIONAL_NZ);
> +  if (devinfo->gen < 7) {
> + nir_emit_find_msb_using_lzd(bld, result, op[0], true);
> +  } else {
> + bld.FBH(retype(result, BRW_REGISTER_TYPE_UD), op[0]);
>  
> -  inst = bld.ADD(result, result, brw_imm_d(31));
> -  inst->predicate = BRW_PREDICATE_NORMAL;
> -  inst->src[0].negate = true;
> + /* FBH counts from the MSB side, while GLSL's findMSB() wants the
> +  * count from the LSB side. If FBH didn't return an error
> +  * (0xFFFFFFFF), then subtract the result from 31 to convert the MSB
> +  * count into an LSB count.
> +  */
> + bld.CMP(bld.null_reg_d(), result, brw_imm_d(-1), 
> BRW_CONDITIONAL_NZ);
> +
> + inst = bld.ADD(result, result, brw_imm_d(31));
> + inst->predicate = BRW_PREDICATE_NORMAL;
> + inst->src[0].negate = true;
> +  }
>break;
> }
>  
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp 
> b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
> index cd88fb6..2fc2cf2 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
> @@ -999,8 +999,32 @@ vec4_visitor::nir_emit_find_msb_using_lzd(const dst_reg 
> ,
>bool is_signed)
>  {
> vec4_instruction *inst;
> +   src_reg temp = src;
>  
> -   emit(BRW_OPCODE_LZD, retype(dst, BRW_REGISTER_TYPE_UD), src);
> +   if (is_signed) {
> +  /* LZD of an absolute value source almost always does the right
> +   * thing.  There are two problem values:
> +   *
> +   * * 0x80000000.  Since abs(0x80000000) == 0x80000000, LZD returns
> +   *   0.  However, findMSB(int(0x80000000)) == 30.
> +   *
> +   * * 0xffffffff.  Since abs(0xffffffff) == 1, LZD returns
> +   *   31.  Section 8.8 (Integer Functions) of the GLSL 4.50 spec says:
> +   *
> +   *For a value of zero or negative one, -1 will be returned.
> +   *
> +   * For all negative number cases, 

Re: [Mesa-dev] [PATCH 15/24] i965: Use LZD to implement nir_op_ufind_msb

2016-06-30 Thread Francisco Jerez
Ian Romanick  writes:

> On 06/30/2016 03:33 PM, Francisco Jerez wrote:
>> Matt Turner  writes:
>> 
>>> On Wed, Jun 29, 2016 at 2:04 PM, Ian Romanick  wrote:
 From: Ian Romanick 

 This uses one less instruction.
>>>
>>> Add FBH to the list of stupid instructions.
>>>
 Signed-off-by: Ian Romanick 
 ---
  src/mesa/drivers/dri/i965/brw_fs.h   |  4 
  src/mesa/drivers/dri/i965/brw_fs_generator.cpp   |  3 +++
  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 26 
 +++-
  src/mesa/drivers/dri/i965/brw_vec4.h |  4 
  src/mesa/drivers/dri/i965/brw_vec4_generator.cpp |  3 +++
  src/mesa/drivers/dri/i965/brw_vec4_nir.cpp   | 22 
  6 files changed, 61 insertions(+), 1 deletion(-)

 diff --git a/src/mesa/drivers/dri/i965/brw_fs.h 
 b/src/mesa/drivers/dri/i965/brw_fs.h
 index 4237197..22ce092 100644
 --- a/src/mesa/drivers/dri/i965/brw_fs.h
 +++ b/src/mesa/drivers/dri/i965/brw_fs.h
 @@ -237,6 +237,10 @@ public:
   nir_tex_instr *instr);
 void nir_emit_jump(const brw::fs_builder ,
nir_jump_instr *instr);
 +   void nir_emit_find_msb_using_lzd(const brw::fs_builder ,
 +const fs_reg ,
 +const fs_reg ,
 +bool is_signed);
 fs_reg get_nir_src(const nir_src );
 fs_reg get_nir_src_imm(const nir_src );
 fs_reg get_nir_dest(const nir_dest );
 diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp 
 b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
 index d25d26a..bda4a26 100644
 --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
 +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
 @@ -1761,6 +1761,9 @@ fs_generator::generate_code(const cfg_t *cfg, int 
 dispatch_width)
   /* FBL only supports UD type for dst. */
   brw_FBL(p, retype(dst, BRW_REGISTER_TYPE_UD), src[0]);
   break;
 +  case BRW_OPCODE_LZD:
 + brw_LZD(p, dst, src[0]);
 + break;
case BRW_OPCODE_CBIT:
   assert(devinfo->gen >= 7);
   /* CBIT only supports UD type for dst. */
 diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp 
 b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
 index b3f5dfd..f15bf3e 100644
 --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
 +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
 @@ -617,6 +617,25 @@ 
 fs_visitor::optimize_frontfacing_ternary(nir_alu_instr *instr,
  }

  void
 +fs_visitor::nir_emit_find_msb_using_lzd(const fs_builder ,
 +const fs_reg ,
 +const fs_reg ,
 +bool is_signed)
 +{
 +   fs_inst *inst;
 +
 +   bld.LZD(retype(result, BRW_REGISTER_TYPE_UD), src);
 +
 +   /* LZD counts from the MSB side, while GLSL's findMSB() wants the count
 +* from the LSB side. Subtract the result from 31 to convert the MSB
 +* count into an LSB count.  If no bits are set, LZD will return 32.
 +* 31-32 = -1, which is exactly what findMSB() is supposed to return.
 +*/
 +   inst = bld.ADD(result, retype(result, BRW_REGISTER_TYPE_D), 
 brw_imm_d(31));
 +   inst->src[0].negate = true;
 +}
>>>
>>> I'd personally be inclined to just inline these functions. I know they
>>> grow somewhat in the next patches... whatever your preference is.
>> 
>> It seems to grow quite a bit in PATCH 16, and it's used in multiple
>> places, right?  How about we keep it factored out but make it a
>> stand-alone function instead of a visitor method?  It doesn't look like
>> it uses *any* internal or external data structures of fs_visitor, it
>> doesn't even dereference the 'this' pointer at all AFAICT, so you could
>> likely improve encapsulation somewhat by making it a static function
>> local to the brw_*_nir.cpp source files.
>
> Okay... two problems:
>
> In the vec4 visitor, all of the emit() functions are in visitor, so this
> new function has to stay there.
>
Ah, right, we should finish the transition to the i965 IR builder in the
vec4 back-end someday.  You could potentially pass a vec4_builder as
argument to the function though, or a vec4_visitor pointer, it's up to
you.

> In the fs visitor, the next patch introduces a call to fs_visitor::vgrf.
>  See previous problem. :(
>
You can use fs_builder::vgrf() instead.


Re: [Mesa-dev] [PATCH 15/24] i965: Use LZD to implement nir_op_ufind_msb

2016-06-30 Thread Ian Romanick
On 06/30/2016 03:33 PM, Francisco Jerez wrote:
> Matt Turner  writes:
> 
>> On Wed, Jun 29, 2016 at 2:04 PM, Ian Romanick  wrote:
>>> From: Ian Romanick 
>>>
>>> This uses one less instruction.
>>
>> Add FBH to the list of stupid instructions.
>>
>>> Signed-off-by: Ian Romanick 
>>> ---
>>>  src/mesa/drivers/dri/i965/brw_fs.h   |  4 
>>>  src/mesa/drivers/dri/i965/brw_fs_generator.cpp   |  3 +++
>>>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 26 
>>> +++-
>>>  src/mesa/drivers/dri/i965/brw_vec4.h |  4 
>>>  src/mesa/drivers/dri/i965/brw_vec4_generator.cpp |  3 +++
>>>  src/mesa/drivers/dri/i965/brw_vec4_nir.cpp   | 22 
>>>  6 files changed, 61 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/src/mesa/drivers/dri/i965/brw_fs.h 
>>> b/src/mesa/drivers/dri/i965/brw_fs.h
>>> index 4237197..22ce092 100644
>>> --- a/src/mesa/drivers/dri/i965/brw_fs.h
>>> +++ b/src/mesa/drivers/dri/i965/brw_fs.h
>>> @@ -237,6 +237,10 @@ public:
>>>   nir_tex_instr *instr);
>>> void nir_emit_jump(const brw::fs_builder ,
>>>nir_jump_instr *instr);
>>> +   void nir_emit_find_msb_using_lzd(const brw::fs_builder ,
>>> +const fs_reg ,
>>> +const fs_reg ,
>>> +bool is_signed);
>>> fs_reg get_nir_src(const nir_src );
>>> fs_reg get_nir_src_imm(const nir_src );
>>> fs_reg get_nir_dest(const nir_dest );
>>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp 
>>> b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
>>> index d25d26a..bda4a26 100644
>>> --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
>>> +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
>>> @@ -1761,6 +1761,9 @@ fs_generator::generate_code(const cfg_t *cfg, int 
>>> dispatch_width)
>>>   /* FBL only supports UD type for dst. */
>>>   brw_FBL(p, retype(dst, BRW_REGISTER_TYPE_UD), src[0]);
>>>   break;
>>> +  case BRW_OPCODE_LZD:
>>> + brw_LZD(p, dst, src[0]);
>>> + break;
>>>case BRW_OPCODE_CBIT:
>>>   assert(devinfo->gen >= 7);
>>>   /* CBIT only supports UD type for dst. */
>>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp 
>>> b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>>> index b3f5dfd..f15bf3e 100644
>>> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>>> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>>> @@ -617,6 +617,25 @@ fs_visitor::optimize_frontfacing_ternary(nir_alu_instr 
>>> *instr,
>>>  }
>>>
>>>  void
>>> +fs_visitor::nir_emit_find_msb_using_lzd(const fs_builder ,
>>> +const fs_reg ,
>>> +const fs_reg ,
>>> +bool is_signed)
>>> +{
>>> +   fs_inst *inst;
>>> +
>>> +   bld.LZD(retype(result, BRW_REGISTER_TYPE_UD), src);
>>> +
>>> +   /* LZD counts from the MSB side, while GLSL's findMSB() wants the count
>>> +* from the LSB side. Subtract the result from 31 to convert the MSB
>>> +* count into an LSB count.  If no bits are set, LZD will return 32.
>>> +* 31-32 = -1, which is exactly what findMSB() is supposed to return.
>>> +*/
>>> +   inst = bld.ADD(result, retype(result, BRW_REGISTER_TYPE_D), 
>>> brw_imm_d(31));
>>> +   inst->src[0].negate = true;
>>> +}
>>
>> I'd personally be inclined to just inline these functions. I know they
>> grow somewhat in the next patches... whatever your preference is.
> 
> It seems to grow quite a bit in PATCH 16, and it's used in multiple
> places, right?  How about we keep it factored out but make it a
> stand-alone function instead of a visitor method?  It doesn't look like
> it uses *any* internal or external data structures of fs_visitor, it
> doesn't even dereference the 'this' pointer at all AFAICT, so you could
> likely improve encapsulation somewhat by making it a static function
> local to the brw_*_nir.cpp source files.

Okay... two problems:

In the vec4 visitor, all of the emit() functions are in visitor, so this
new function has to stay there.

In the fs visitor, the next patch introduces a call to fs_visitor::vgrf.
 See previous problem. :(



[Mesa-dev] [PATCH 7/8] radeonsi: keep using v_rcp_f32 for division in future LLVM (v2)

2016-06-30 Thread Marek Olšák
From: Marek Olšák 

This will be needed after some LLVM changes that haven't landed yet.

v2: - use LLVMIsConstant to fix an LLVM assertion failure.
  LLVMSetMetadata doesn't work with constants.
- don't set float metadata as string
---
 src/gallium/drivers/radeon/radeon_llvm.h   |  3 +++
 .../drivers/radeon/radeon_setup_tgsi_llvm.c| 29 --
 2 files changed, 30 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/radeon/radeon_llvm.h 
b/src/gallium/drivers/radeon/radeon_llvm.h
index ec16def..61afa7a 100644
--- a/src/gallium/drivers/radeon/radeon_llvm.h
+++ b/src/gallium/drivers/radeon/radeon_llvm.h
@@ -101,6 +101,9 @@ struct radeon_llvm_context {
LLVMValueRef main_fn;
LLVMTypeRef return_type;
 
+   unsigned fpmath_md_kind;
+   LLVMValueRef fpmath_md_2p5_ulp;
+
struct gallivm_state gallivm;
 };
 
diff --git a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c 
b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
index d183ff0..cf56c3b 100644
--- a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
+++ b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
@@ -1523,19 +1523,36 @@ static void emit_up2h(const struct lp_build_tgsi_action 
*action,
}
 }
 
+static void emit_fdiv(const struct lp_build_tgsi_action *action,
+ struct lp_build_tgsi_context *bld_base,
+ struct lp_build_emit_data *emit_data)
+{
+   struct radeon_llvm_context *ctx = radeon_llvm_context(bld_base);
+
+   emit_data->output[emit_data->chan] =
+   LLVMBuildFDiv(bld_base->base.gallivm->builder,
+ emit_data->args[0], emit_data->args[1], "");
+
+   /* Use v_rcp_f32 instead of precise division. */
+   if (HAVE_LLVM >= 0x0309 &&
+   !LLVMIsConstant(emit_data->output[emit_data->chan]))
+   LLVMSetMetadata(emit_data->output[emit_data->chan],
+   ctx->fpmath_md_kind, ctx->fpmath_md_2p5_ulp);
+}
+
 /* 1/sqrt is translated to rsq for f32 if fp32 denormals are not enabled in
  * the target machine. f64 needs global unsafe math flags to get rsq. */
 static void emit_rsq(const struct lp_build_tgsi_action *action,
 struct lp_build_tgsi_context *bld_base,
 struct lp_build_emit_data *emit_data)
 {
-   LLVMBuilderRef builder = bld_base->base.gallivm->builder;
LLVMValueRef sqrt =
lp_build_emit_llvm_unary(bld_base, TGSI_OPCODE_SQRT,
 emit_data->args[0]);
 
emit_data->output[emit_data->chan] =
-   LLVMBuildFDiv(builder, bld_base->base.one, sqrt, "");
+   lp_build_emit_llvm_binary(bld_base, TGSI_OPCODE_DIV,
+ bld_base->base.one, sqrt);
 }
 
 void radeon_llvm_context_init(struct radeon_llvm_context * ctx, const char 
*triple)
@@ -1586,6 +1603,13 @@ void radeon_llvm_context_init(struct radeon_llvm_context 
* ctx, const char *trip
bld_base->emit_fetch_funcs[TGSI_FILE_OUTPUT] = radeon_llvm_emit_fetch;
bld_base->emit_fetch_funcs[TGSI_FILE_SYSTEM_VALUE] = fetch_system_value;
 
+   /* metadata allowing 2.5 ULP */
+   ctx->fpmath_md_kind = LLVMGetMDKindIDInContext(ctx->gallivm.context,
+  "fpmath", 6);
+   LLVMValueRef arg = lp_build_const_float(>gallivm, 2.5);
+   ctx->fpmath_md_2p5_ulp = LLVMMDNodeInContext(ctx->gallivm.context,
+, 1);
+
/* Allocate outputs */
ctx->soa.outputs = ctx->outputs;
 
@@ -1615,6 +1639,7 @@ void radeon_llvm_context_init(struct radeon_llvm_context 
* ctx, const char *trip
bld_base->op_actions[TGSI_OPCODE_DFMA].emit = 
build_tgsi_intrinsic_nomem;
bld_base->op_actions[TGSI_OPCODE_DFMA].intr_name = "llvm.fma.f64";
bld_base->op_actions[TGSI_OPCODE_DFRAC].emit = emit_frac;
+   bld_base->op_actions[TGSI_OPCODE_DIV].emit = emit_fdiv;
bld_base->op_actions[TGSI_OPCODE_DNEG].emit = emit_dneg;
bld_base->op_actions[TGSI_OPCODE_DSEQ].emit = emit_dcmp;
bld_base->op_actions[TGSI_OPCODE_DSGE].emit = emit_dcmp;
-- 
2.7.4



[Mesa-dev] [PATCH 1/2] gallium/radeon: add radeon_info::max_alloc_size into the winsys interface

2016-06-30 Thread Marek Olšák
From: Marek Olšák 

---
 src/gallium/drivers/radeon/r600_pipe_common.c | 2 ++
 src/gallium/drivers/radeon/radeon_winsys.h| 1 +
 src/gallium/winsys/amdgpu/drm/amdgpu_winsys.c | 2 ++
 src/gallium/winsys/radeon/drm/radeon_drm_winsys.c | 1 +
 4 files changed, 6 insertions(+)

diff --git a/src/gallium/drivers/radeon/r600_pipe_common.c 
b/src/gallium/drivers/radeon/r600_pipe_common.c
index 74674e7..d7f1d41 100644
--- a/src/gallium/drivers/radeon/r600_pipe_common.c
+++ b/src/gallium/drivers/radeon/r600_pipe_common.c
@@ -1098,6 +1098,8 @@ bool r600_common_screen_init(struct r600_common_screen 
*rscreen,
printf("chip_class = %i\n", rscreen->info.chip_class);
printf("gart_size = %i MB\n", 
(int)DIV_ROUND_UP(rscreen->info.gart_size, 1024*1024));
printf("vram_size = %i MB\n", 
(int)DIV_ROUND_UP(rscreen->info.vram_size, 1024*1024));
+   printf("max_alloc_size = %i MB\n",
+  (int)DIV_ROUND_UP(rscreen->info.max_alloc_size, 
1024*1024));
printf("has_virtual_memory = %i\n", 
rscreen->info.has_virtual_memory);
printf("gfx_ib_pad_with_type2 = %i\n", 
rscreen->info.gfx_ib_pad_with_type2);
printf("has_sdma = %i\n", rscreen->info.has_sdma);
diff --git a/src/gallium/drivers/radeon/radeon_winsys.h 
b/src/gallium/drivers/radeon/radeon_winsys.h
index afb970e..1ba288a 100644
--- a/src/gallium/drivers/radeon/radeon_winsys.h
+++ b/src/gallium/drivers/radeon/radeon_winsys.h
@@ -253,6 +253,7 @@ struct radeon_info {
 uint32_tgart_page_size;
 uint64_tgart_size;
 uint64_tvram_size;
+uint64_tmax_alloc_size;
 boolhas_dedicated_vram;
 boolhas_virtual_memory;
 boolgfx_ib_pad_with_type2;
diff --git a/src/gallium/winsys/amdgpu/drm/amdgpu_winsys.c 
b/src/gallium/winsys/amdgpu/drm/amdgpu_winsys.c
index 72afc76..3f5a6f0 100644
--- a/src/gallium/winsys/amdgpu/drm/amdgpu_winsys.c
+++ b/src/gallium/winsys/amdgpu/drm/amdgpu_winsys.c
@@ -263,6 +263,8 @@ static bool do_winsys_init(struct amdgpu_winsys *ws, int fd)
/* Set hardware information. */
ws->info.gart_size = gtt.heap_size;
ws->info.vram_size = vram.heap_size;
+   /* TODO: the kernel reports vram/gart.max_allocation == 251 MB (bug?) */
+   ws->info.max_alloc_size = MAX2(ws->info.vram_size, ws->info.gart_size);
/* convert the shader clock from KHz to MHz */
ws->info.max_shader_clock = ws->amdinfo.max_engine_clk / 1000;
ws->info.max_se = ws->amdinfo.num_shader_engines;
diff --git a/src/gallium/winsys/radeon/drm/radeon_drm_winsys.c 
b/src/gallium/winsys/radeon/drm/radeon_drm_winsys.c
index c4d28ff..76a125c 100644
--- a/src/gallium/winsys/radeon/drm/radeon_drm_winsys.c
+++ b/src/gallium/winsys/radeon/drm/radeon_drm_winsys.c
@@ -372,6 +372,7 @@ static bool do_winsys_init(struct radeon_drm_winsys *ws)
 }
 ws->info.gart_size = gem_info.gart_size;
 ws->info.vram_size = gem_info.vram_size;
+ws->info.max_alloc_size = MAX2(ws->info.vram_size, ws->info.gart_size);
 
 /* Get max clock frequency info and convert it to MHz */
 radeon_get_drm_value(ws->fd, RADEON_INFO_MAX_SCLK, NULL,
-- 
2.7.4



[Mesa-dev] [PATCH 8/8] gallium/radeon: remove unused code - radeon_llvm_util.*

2016-06-30 Thread Marek Olšák
From: Marek Olšák 

---
 src/gallium/drivers/r600/evergreen_compute.c  |   3 -
 src/gallium/drivers/radeon/Makefile.sources   |   2 -
 src/gallium/drivers/radeon/radeon_llvm_util.c | 124 --
 src/gallium/drivers/radeon/radeon_llvm_util.h |  39 
 src/gallium/drivers/radeonsi/si_compute.c |   1 -
 5 files changed, 169 deletions(-)
 delete mode 100644 src/gallium/drivers/radeon/radeon_llvm_util.c
 delete mode 100644 src/gallium/drivers/radeon/radeon_llvm_util.h

diff --git a/src/gallium/drivers/r600/evergreen_compute.c 
b/src/gallium/drivers/r600/evergreen_compute.c
index 8db525e..96495f4 100644
--- a/src/gallium/drivers/r600/evergreen_compute.c
+++ b/src/gallium/drivers/r600/evergreen_compute.c
@@ -46,9 +46,6 @@
 #include "evergreen_compute_internal.h"
 #include "compute_memory_pool.h"
 #include "sb/sb_public.h"
-#ifdef HAVE_OPENCL
-#include "radeon/radeon_llvm_util.h"
-#endif
 #include "radeon/radeon_elf_util.h"
 #include 
 
diff --git a/src/gallium/drivers/radeon/Makefile.sources 
b/src/gallium/drivers/radeon/Makefile.sources
index 6fbed81..d6dd786 100644
--- a/src/gallium/drivers/radeon/Makefile.sources
+++ b/src/gallium/drivers/radeon/Makefile.sources
@@ -30,6 +30,4 @@ LLVM_C_FILES := \
radeon_llvm_emit.c \
radeon_llvm_emit.h \
radeon_llvm.h \
-   radeon_llvm_util.c \
-   radeon_llvm_util.h \
radeon_setup_tgsi_llvm.c
diff --git a/src/gallium/drivers/radeon/radeon_llvm_util.c 
b/src/gallium/drivers/radeon/radeon_llvm_util.c
deleted file mode 100644
index da19533..0000000
--- a/src/gallium/drivers/radeon/radeon_llvm_util.c
+++ /dev/null
@@ -1,124 +0,0 @@
-/*
- * Copyright 2012, 2013 Advanced Micro Devices, Inc.
- *
- * Permission is hereby granted, free of charge, to any person obtaining a
- * copy of this software and associated documentation files (the "Software"),
- * to deal in the Software without restriction, including without limitation
- * the rights to use, copy, modify, merge, publish, distribute, sublicense,
- * and/or sell copies of the Software, and to permit persons to whom the
- * Software is furnished to do so, subject to the following conditions:
- *
- * The above copyright notice and this permission notice (including the next
- * paragraph) shall be included in all copies or substantial portions of the
- * Software.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
- * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
- * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
- * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
- * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 
FROM,
- * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN 
THE
- * SOFTWARE.
- *
- * Authors: Tom Stellard 
- *
- */
-
-#include "radeon_llvm_util.h"
-#include "util/u_memory.h"
-
-#include 
-#include 
-#include 
-#include 
-#include 
-
-LLVMModuleRef radeon_llvm_parse_bitcode(LLVMContextRef ctx,
-   const char * bitcode, 
unsigned bitcode_len)
-{
-   LLVMMemoryBufferRef buf;
-   LLVMModuleRef module;
-
-   buf = LLVMCreateMemoryBufferWithMemoryRangeCopy((const char*)bitcode,
-   bitcode_len, "radeon");
-   LLVMParseBitcodeInContext(ctx, buf, , NULL);
-   LLVMDisposeMemoryBuffer(buf);
-   return module;
-}
-
-unsigned radeon_llvm_get_num_kernels(LLVMContextRef ctx,
-   const char *bitcode, unsigned bitcode_len)
-{
-   LLVMModuleRef mod = radeon_llvm_parse_bitcode(ctx, bitcode, 
bitcode_len);
-   return LLVMGetNamedMetadataNumOperands(mod, "opencl.kernels");
-}
-
-static void radeon_llvm_optimize(LLVMModuleRef mod)
-{
-#if HAVE_LLVM < 0x0309
-   const char *data_layout = LLVMGetDataLayout(mod);
-   LLVMTargetDataRef TD = LLVMCreateTargetData(data_layout);
-#endif
-   LLVMPassManagerBuilderRef builder = LLVMPassManagerBuilderCreate();
-   LLVMPassManagerRef pass_manager = LLVMCreatePassManager();
-
-   /* Functions calls are not supported yet, so we need to inline
-* everything.  The most efficient way to do this is to add
-* the always_inline attribute to all non-kernel functions
-* and then run the Always Inline pass.  The Always Inline
-* pass will automaically inline functions with this attribute
-* and does not perform the expensive cost analysis that the normal
-* inliner does.
-*/
-
-   LLVMValueRef fn;
-   for (fn = LLVMGetFirstFunction(mod); fn; fn = LLVMGetNextFunction(fn)) {
-   /* All the non-kernel functions have internal linkage */
-   if (LLVMGetLinkage(fn) == LLVMInternalLinkage) {
-   LLVMAddFunctionAttr(fn, LLVMAlwaysInlineAttribute);
- 

[Mesa-dev] [PATCH 2/2] gallium/radeon: use max_alloc_size

2016-06-30 Thread Marek Olšák
From: Marek Olšák 

also fix max_global_size to take a maximum of {vram_size, gart_size}
---
 src/gallium/drivers/r600/r600_pipe.c  | 2 +-
 src/gallium/drivers/radeon/r600_pipe_common.c | 9 +++--
 src/gallium/drivers/radeonsi/si_pipe.c| 2 +-
 3 files changed, 5 insertions(+), 8 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_pipe.c 
b/src/gallium/drivers/r600/r600_pipe.c
index 119c76b..55bbde1 100644
--- a/src/gallium/drivers/r600/r600_pipe.c
+++ b/src/gallium/drivers/r600/r600_pipe.c
@@ -301,7 +301,7 @@ static int r600_get_param(struct pipe_screen* pscreen, enum 
pipe_cap param)
return 0;
 
case PIPE_CAP_MAX_TEXTURE_BUFFER_SIZE:
-   return MIN2(rscreen->b.info.vram_size, 0xFFFFFFFF);
+   return MIN2(rscreen->b.info.max_alloc_size, 0xFFFFFFFF);
 
 case PIPE_CAP_MIN_MAP_BUFFER_ALIGNMENT:
 return R600_MAP_BUFFER_ALIGNMENT;
diff --git a/src/gallium/drivers/radeon/r600_pipe_common.c 
b/src/gallium/drivers/radeon/r600_pipe_common.c
index d7f1d41..f75fa6c 100644
--- a/src/gallium/drivers/radeon/r600_pipe_common.c
+++ b/src/gallium/drivers/radeon/r600_pipe_common.c
@@ -864,8 +864,8 @@ static int r600_get_compute_param(struct pipe_screen 
*screen,
 * 4 * MAX_MEM_ALLOC_SIZE.
 */
*max_global_size = MIN2(4 * max_mem_alloc_size,
-   rscreen->info.gart_size +
-   rscreen->info.vram_size);
+   MAX2(rscreen->info.gart_size,
+rscreen->info.vram_size));
}
return sizeof(uint64_t);
 
@@ -889,10 +889,7 @@ static int r600_get_compute_param(struct pipe_screen 
*screen,
if (ret) {
uint64_t *max_mem_alloc_size = ret;
 
-   /* XXX: The limit in older kernels is 256 MB.  We
-* should add a query here for newer kernels.
-*/
-   *max_mem_alloc_size = 256 * 1024 * 1024;
+   *max_mem_alloc_size = rscreen->info.max_alloc_size;
}
return sizeof(uint64_t);
 
diff --git a/src/gallium/drivers/radeonsi/si_pipe.c 
b/src/gallium/drivers/radeonsi/si_pipe.c
index ad2a86a..30e6253 100644
--- a/src/gallium/drivers/radeonsi/si_pipe.c
+++ b/src/gallium/drivers/radeonsi/si_pipe.c
@@ -419,7 +419,7 @@ static int si_get_param(struct pipe_screen* pscreen, enum 
pipe_cap param)
   HAVE_LLVM >= 0x0307 ? 410 : 330;
 
case PIPE_CAP_MAX_TEXTURE_BUFFER_SIZE:
-   return MIN2(sscreen->b.info.vram_size, 0xFFFFFFFF);
+   return MIN2(sscreen->b.info.max_alloc_size, 0xFFFFFFFF);
 
case PIPE_CAP_BUFFER_SAMPLER_VIEW_RGBA_ONLY:
return 0;
-- 
2.7.4
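
For readers following the two max_alloc_size patches, the resulting limits
can be sketched as a small stand-alone C program. This is only an
illustration: struct mem_info and the three helper functions are
hypothetical names, not the driver's, and the 32-bit cap on the texture
buffer size (0xffffffff) is assumed here.

```c
#include <assert.h>
#include <stdint.h>

#define MIN2(a, b) ((a) < (b) ? (a) : (b))
#define MAX2(a, b) ((a) > (b) ? (a) : (b))

/* Hypothetical stand-in for the three radeon_info fields involved. */
struct mem_info {
   uint64_t gart_size;
   uint64_t vram_size;
   uint64_t max_alloc_size;
};

/* What both winsys back-ends do after patch 1/2. */
static void fill_max_alloc_size(struct mem_info *info)
{
   info->max_alloc_size = MAX2(info->vram_size, info->gart_size);
}

/* PIPE_CAP_MAX_TEXTURE_BUFFER_SIZE, capped to 32 bits (assumed cap). */
static uint64_t max_texture_buffer_size(const struct mem_info *info)
{
   return MIN2(info->max_alloc_size, (uint64_t)0xffffffffu);
}

/* Compute max_global_size as written after patch 2/2. */
static uint64_t max_global_size(const struct mem_info *info)
{
   uint64_t max_mem_alloc_size = info->max_alloc_size;

   return MIN2(4 * max_mem_alloc_size,
               MAX2(info->gart_size, info->vram_size));
}

/* Example: 1 GiB of VRAM and 2 GiB of GART. */
void mem_limits_self_check(void)
{
   struct mem_info info = { 2ull << 30, 1ull << 30, 0 };

   fill_max_alloc_size(&info);
   assert(info.max_alloc_size == (2ull << 30));
   assert(max_texture_buffer_size(&info) == (2ull << 30));
   assert(max_global_size(&info) == (2ull << 30));
}
```

Under these assumptions max_global_size always ends up equal to
MAX2(gart_size, vram_size), since 4 * max_alloc_size is never the smaller
term as long as it does not overflow.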



[Mesa-dev] [PATCH 1/8] radeonsi: don't dump the shader key for non-monolithic shaders early

2016-06-30 Thread Marek Olšák
From: Marek Olšák 

It's always zero.
---
 src/gallium/drivers/radeonsi/si_shader.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/radeonsi/si_shader.c 
b/src/gallium/drivers/radeonsi/si_shader.c
index e2aae85..f2cdd6e 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -6522,7 +6522,8 @@ int si_compile_tgsi_shader(struct si_screen *sscreen,
 * conversion fails. */
if (r600_can_dump_shader(>b, sel->info.processor) &&
!(sscreen->b.debug_flags & DBG_NO_TGSI)) {
-   si_dump_shader_key(sel->type, >key, stderr);
+   if (is_monolithic)
+   si_dump_shader_key(sel->type, >key, stderr);
tgsi_dump(sel->tokens, 0);
si_dump_streamout(>so);
}
-- 
2.7.4



[Mesa-dev] [PATCH 5/8] radeonsi: don't interpolate colors if flatshading is enabled

2016-06-30 Thread Marek Olšák
From: Marek Olšák 

use v_interp_mov for those
---
 src/gallium/drivers/radeonsi/si_shader.c| 13 -
 src/gallium/drivers/radeonsi/si_shader.h|  2 +-
 src/gallium/drivers/radeonsi/si_state_shaders.c |  1 +
 3 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_shader.c 
b/src/gallium/drivers/radeonsi/si_shader.c
index 10dd12a..a408dee 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -1528,6 +1528,11 @@ static void declare_input_fs(
interp_param = get_interp_param(ctx, interp_param_idx);
}
 
+   if (decl->Semantic.Name == TGSI_SEMANTIC_COLOR &&
+   decl->Interp.Interpolate == TGSI_INTERPOLATE_COLOR &&
+   ctx->shader->key.ps.prolog.flatshade_colors)
+   interp_param = NULL; /* load the constant color */
+
interp_fs_input(ctx, input_index, decl->Semantic.Name,
decl->Semantic.Index, shader->selector->info.num_inputs,
shader->selector->info.colors_read, interp_param,
@@ -6455,6 +6460,7 @@ void si_dump_shader_key(unsigned shader, union 
si_shader_key *key, FILE *f)
 
case PIPE_SHADER_FRAGMENT:
fprintf(f, "  prolog.color_two_side = %u\n", 
key->ps.prolog.color_two_side);
+   fprintf(f, "  prolog.flatshade_colors = %u\n", 
key->ps.prolog.flatshade_colors);
fprintf(f, "  prolog.poly_stipple = %u\n", 
key->ps.prolog.poly_stipple);
fprintf(f, "  prolog.force_persp_sample_interp = %u\n", 
key->ps.prolog.force_persp_sample_interp);
fprintf(f, "  prolog.force_linear_sample_interp = %u\n", 
key->ps.prolog.force_linear_sample_interp);
@@ -7597,6 +7603,7 @@ static bool si_shader_select_ps_parts(struct si_screen 
*sscreen,
}
 
for (i = 0; i < 2; i++) {
+   unsigned interp = info->input_interpolate[color[i]];
unsigned location = 
info->input_interpolate_loc[color[i]];
 
if (!(info->colors_read & (0xf << i*4)))
@@ -7604,7 +7611,11 @@ static bool si_shader_select_ps_parts(struct si_screen 
*sscreen,
 
prolog_key.ps_prolog.color_attr_index[i] = color[i];
 
-   switch (info->input_interpolate[color[i]]) {
+   if (shader->key.ps.prolog.flatshade_colors &&
+   interp == TGSI_INTERPOLATE_COLOR)
+   interp = TGSI_INTERPOLATE_CONSTANT;
+
+   switch (interp) {
case TGSI_INTERPOLATE_CONSTANT:
prolog_key.ps_prolog.color_interp_vgpr_index[i] 
= -1;
break;
diff --git a/src/gallium/drivers/radeonsi/si_shader.h 
b/src/gallium/drivers/radeonsi/si_shader.h
index 3b7b3e1..14ef4e7 100644
--- a/src/gallium/drivers/radeonsi/si_shader.h
+++ b/src/gallium/drivers/radeonsi/si_shader.h
@@ -311,7 +311,7 @@ struct si_tcs_epilog_bits {
 /* Common PS bits between the shader key and the prolog key. */
 struct si_ps_prolog_bits {
unsignedcolor_two_side:1;
-   /* TODO: add a flatshade bit that skips interpolation for colors */
+   unsignedflatshade_colors:1;
unsignedpoly_stipple:1;
unsignedforce_persp_sample_interp:1;
unsignedforce_linear_sample_interp:1;
diff --git a/src/gallium/drivers/radeonsi/si_state_shaders.c 
b/src/gallium/drivers/radeonsi/si_state_shaders.c
index 42ebf69..766ef2c 100644
--- a/src/gallium/drivers/radeonsi/si_state_shaders.c
+++ b/src/gallium/drivers/radeonsi/si_state_shaders.c
@@ -924,6 +924,7 @@ static inline void si_shader_selector_key(struct 
pipe_context *ctx,
bool is_line = !is_poly && sctx->current_rast_prim != 
PIPE_PRIM_POINTS;
 
key->ps.prolog.color_two_side = rs->two_side && 
sel->info.colors_read;
+   key->ps.prolog.flatshade_colors = rs->flatshade && 
sel->info.colors_read;
 
if (sctx->queued.named.blend) {
key->ps.epilog.alpha_to_one = 
sctx->queued.named.blend->alpha_to_one &&
-- 
2.7.4
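
The flatshade handling in the patch above comes down to one override: when the rasterizer state requests flat shading and a color input is declared with the COLOR interpolation mode, the prolog treats it as CONSTANT. A minimal standalone sketch of that rule follows. The enum and the function name are hypothetical stand-ins for this illustration, not the TGSI definitions or the driver's code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the TGSI interpolation modes; the numeric
 * values are assumptions made for this sketch only. */
enum interp_mode {
    INTERP_CONSTANT,
    INTERP_LINEAR,
    INTERP_PERSPECTIVE,
    INTERP_COLOR,
    INTERP_MODE_COUNT
};

/* Return the interpolation mode the prolog should use for a color input.
 * Only inputs declared as COLOR are affected by the flatshade bit. */
static enum interp_mode
effective_color_interp(enum interp_mode declared, bool flatshade_colors)
{
    assert(declared < INTERP_MODE_COUNT);

    if (flatshade_colors && declared == INTERP_COLOR)
        return INTERP_CONSTANT;

    return declared;
}
```

Inputs with an explicit qualifier (for example PERSPECTIVE) keep their declared mode even when flat shading is enabled, which matches the condition in the patch.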



[Mesa-dev] [PATCH 4/8] radeonsi: enable the barycentric optimization in all cases

2016-06-30 Thread Marek Olšák
From: Marek Olšák 

Handle the bc_optimize SGPR bit if both CENTER and CENTROID are enabled.
This should increase the PS launch rate for big primitives with MSAA.
Based on discussion with SPI guys.
---
 src/gallium/drivers/radeonsi/si_shader.c| 118 +++-
 src/gallium/drivers/radeonsi/si_shader.h|   7 +-
 src/gallium/drivers/radeonsi/si_state_shaders.c |  18 ++--
 3 files changed, 125 insertions(+), 18 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_shader.c 
b/src/gallium/drivers/radeonsi/si_shader.c
index 4652fe8..10dd12a 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -1435,6 +1435,56 @@ static void interp_fs_input(struct si_shader_context 
*ctx,
}
 }
 
+/* LLVMGetParam with bc_optimize resolved. */
+static LLVMValueRef get_interp_param(struct si_shader_context *ctx,
+int interp_param_idx)
+{
+   LLVMBuilderRef builder = ctx->radeon_bld.gallivm.builder;
+   LLVMValueRef main_fn = ctx->radeon_bld.main_fn;
+   LLVMValueRef param = NULL;
+
+   /* Handle PRIM_MASK[31] (bc_optimize). */
+   if (ctx->is_monolithic &&
+   ((ctx->shader->key.ps.prolog.bc_optimize_for_persp &&
+ interp_param_idx == SI_PARAM_PERSP_CENTROID) ||
+(ctx->shader->key.ps.prolog.bc_optimize_for_linear &&
+ interp_param_idx == SI_PARAM_LINEAR_CENTROID))) {
+   /* The shader should do: if (PRIM_MASK[31]) CENTROID = CENTER;
+* The hw doesn't compute CENTROID if the whole wave only
+* contains fully-covered quads.
+*/
+   LLVMValueRef bc_optimize =
+   LLVMGetParam(main_fn, SI_PARAM_PRIM_MASK);
+   bc_optimize = LLVMBuildLShr(builder,
+   bc_optimize,
+   LLVMConstInt(ctx->i32, 31, 0), "");
+   bc_optimize = LLVMBuildTrunc(builder, bc_optimize, ctx->i1, "");
+
+   if (ctx->shader->key.ps.prolog.bc_optimize_for_persp &&
+   interp_param_idx == SI_PARAM_PERSP_CENTROID) {
+   param = LLVMBuildSelect(builder, bc_optimize,
+   LLVMGetParam(main_fn,
+
SI_PARAM_PERSP_CENTER),
+   LLVMGetParam(main_fn,
+
SI_PARAM_PERSP_CENTROID),
+   "");
+   }
+   if (ctx->shader->key.ps.prolog.bc_optimize_for_linear &&
+   interp_param_idx == SI_PARAM_LINEAR_CENTROID) {
+   param = LLVMBuildSelect(builder, bc_optimize,
+   LLVMGetParam(main_fn,
+
SI_PARAM_LINEAR_CENTER),
+   LLVMGetParam(main_fn,
+
SI_PARAM_LINEAR_CENTROID),
+   "");
+   }
+   }
+
+   if (!param)
+   param = LLVMGetParam(main_fn, interp_param_idx);
+   return param;
+}
+
 static void declare_input_fs(
struct radeon_llvm_context *radeon_bld,
unsigned input_index,
@@ -1475,7 +1525,7 @@ static void declare_input_fs(
else if (interp_param_idx) {
interp_param_idx = select_interp_param(ctx,
   interp_param_idx);
-   interp_param = LLVMGetParam(main_fn, interp_param_idx);
+   interp_param = get_interp_param(ctx, interp_param_idx);
}
 
interp_fs_input(ctx, input_index, decl->Semantic.Name,
@@ -5041,7 +5091,7 @@ static void build_interp_intrinsic(const struct 
lp_build_tgsi_action *action,
if (interp_param_idx == -1)
return;
else if (interp_param_idx)
-   interp_param = LLVMGetParam(ctx->radeon_bld.main_fn, 
interp_param_idx);
+   interp_param = get_interp_param(ctx, interp_param_idx);
else
interp_param = NULL;
 
@@ -6410,6 +6460,8 @@ void si_dump_shader_key(unsigned shader, union 
si_shader_key *key, FILE *f)
fprintf(f, "  prolog.force_linear_sample_interp = %u\n", 
key->ps.prolog.force_linear_sample_interp);
fprintf(f, "  prolog.force_persp_center_interp = %u\n", 
key->ps.prolog.force_persp_center_interp);
fprintf(f, "  prolog.force_linear_center_interp = %u\n", 
key->ps.prolog.force_linear_center_interp);
+   fprintf(f, "  prolog.bc_optimize_for_persp = %u\n", 
key->ps.prolog.bc_optimize_for_persp);
+   fprintf(f, "  prolog.bc_optimize_for_linear = %u\n", 

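The select built by get_interp_param in the patch above implements "if (PRIM_MASK[31]) CENTROID = CENTER". A scalar C sketch of the same decision may make the bit handling easier to follow. The struct and function names are hypothetical and exist only for this illustration; the real code emits LLVM IR rather than computing values on the CPU.

```c
#include <assert.h>
#include <stdint.h>

/* One (i,j) barycentric pair; a hypothetical type for this sketch. */
struct barycentric {
    float i;
    float j;
};

/* Emulates the LShr + Trunc + Select sequence from the patch:
 * bit 31 of PRIM_MASK is the bc_optimize flag. When it is set, the
 * hardware did not compute CENTROID, so CENTER must be used instead. */
static struct barycentric
resolve_centroid(uint32_t prim_mask,
                 const struct barycentric *center,
                 const struct barycentric *centroid)
{
    uint32_t bc_optimize;

    assert(center && centroid);

    bc_optimize = (prim_mask >> 31) & 1u;
    return bc_optimize ? *center : *centroid;
}
```
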
[Mesa-dev] [PATCH 2/8] radeonsi: split ps.prolog.force_persample_interp into persp and linear bits

2016-06-30 Thread Marek Olšák
From: Marek Olšák 

This reduces the number of v_mov's in the prolog.
---
 src/gallium/drivers/radeonsi/si_shader.c| 85 +++--
 src/gallium/drivers/radeonsi/si_shader.h|  3 +-
 src/gallium/drivers/radeonsi/si_state_shaders.c | 21 +++---
 3 files changed, 64 insertions(+), 45 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_shader.c 
b/src/gallium/drivers/radeonsi/si_shader.c
index f2cdd6e..438981c 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -1278,8 +1278,7 @@ static int lookup_interp_param_index(unsigned 
interpolate, unsigned location)
 static unsigned select_interp_param(struct si_shader_context *ctx,
unsigned param)
 {
-   if (!ctx->shader->key.ps.prolog.force_persample_interp ||
-   !ctx->is_monolithic)
+   if (!ctx->is_monolithic)
return param;
 
/* If the shader doesn't use center/centroid, just return the parameter.
@@ -1287,18 +1286,22 @@ static unsigned select_interp_param(struct 
si_shader_context *ctx,
 * If the shader only uses one set of (i,j), "si_emit_spi_ps_input" can
 * switch between center/centroid and sample without shader changes.
 */
-   switch (param) {
-   case SI_PARAM_PERSP_CENTROID:
-   case SI_PARAM_PERSP_CENTER:
-   return SI_PARAM_PERSP_SAMPLE;
-
-   case SI_PARAM_LINEAR_CENTROID:
-   case SI_PARAM_LINEAR_CENTER:
-   return SI_PARAM_LINEAR_SAMPLE;
-
-   default:
-   return param;
+   if (ctx->shader->key.ps.prolog.force_persp_sample_interp) {
+   switch (param) {
+   case SI_PARAM_PERSP_CENTROID:
+   case SI_PARAM_PERSP_CENTER:
+   return SI_PARAM_PERSP_SAMPLE;
+   }
+   }
+   if (ctx->shader->key.ps.prolog.force_linear_sample_interp) {
+   switch (param) {
+   case SI_PARAM_LINEAR_CENTROID:
+   case SI_PARAM_LINEAR_CENTER:
+   return SI_PARAM_LINEAR_SAMPLE;
+   }
}
+
+   return param;
 }
 
 /**
@@ -6389,7 +6392,8 @@ void si_dump_shader_key(unsigned shader, union 
si_shader_key *key, FILE *f)
case PIPE_SHADER_FRAGMENT:
fprintf(f, "  prolog.color_two_side = %u\n", 
key->ps.prolog.color_two_side);
fprintf(f, "  prolog.poly_stipple = %u\n", 
key->ps.prolog.poly_stipple);
-   fprintf(f, "  prolog.force_persample_interp = %u\n", 
key->ps.prolog.force_persample_interp);
+   fprintf(f, "  prolog.force_persp_sample_interp = %u\n", 
key->ps.prolog.force_persp_sample_interp);
+   fprintf(f, "  prolog.force_linear_sample_interp = %u\n", 
key->ps.prolog.force_linear_sample_interp);
fprintf(f, "  epilog.spi_shader_col_format = 0x%x\n", 
key->ps.epilog.spi_shader_col_format);
fprintf(f, "  epilog.color_is_int8 = 0x%X\n", 
key->ps.epilog.color_is_int8);
fprintf(f, "  epilog.last_cbuf = %u\n", 
key->ps.epilog.last_cbuf);
@@ -7230,9 +7234,9 @@ static bool si_compile_ps_prolog(struct si_screen 
*sscreen,
}
 
/* Force per-sample interpolation. */
-   if (key->ps_prolog.states.force_persample_interp) {
+   if (key->ps_prolog.states.force_persp_sample_interp) {
unsigned i, base = key->ps_prolog.num_input_sgprs;
-   LLVMValueRef persp_sample[2], linear_sample[2];
+   LLVMValueRef persp_sample[2];
 
/* Read PERSP_SAMPLE. */
for (i = 0; i < 2; i++)
@@ -7245,6 +7249,11 @@ static bool si_compile_ps_prolog(struct si_screen 
*sscreen,
for (i = 0; i < 2; i++)
ret = LLVMBuildInsertValue(gallivm->builder, ret,
   persp_sample[i], base + 4 + 
i, "");
+   }
+   if (key->ps_prolog.states.force_linear_sample_interp) {
+   unsigned i, base = key->ps_prolog.num_input_sgprs;
+   LLVMValueRef linear_sample[2];
+
/* Read LINEAR_SAMPLE. */
for (i = 0; i < 2; i++)
linear_sample[i] = LLVMGetParam(func, base + 6 + i);
@@ -7416,7 +7425,8 @@ static bool si_shader_select_ps_parts(struct si_screen 
*sscreen,
prolog_key.ps_prolog.num_input_vgprs = shader->info.num_input_vgprs;
prolog_key.ps_prolog.wqm = info->uses_derivatives &&
(prolog_key.ps_prolog.colors_read ||
-prolog_key.ps_prolog.states.force_persample_interp);
+prolog_key.ps_prolog.states.force_persp_sample_interp ||
+prolog_key.ps_prolog.states.force_linear_sample_interp);
 
if (info->colors_read) {
unsigned *color = shader->selector->color_attr_index;
@@ -7436,16 +7446,16 @@ static bool si_shader_select_ps_parts(struct si_screen 

[Mesa-dev] [PATCH 3/8] radeonsi: compute only one set of interpolation (i, j) when MSAA is disabled

2016-06-30 Thread Marek Olšák
From: Marek Olšák 

This should increase the PS launch rate for shaders using at least 2 pairs
of perspective (i,j) and same for linear.
---
 src/gallium/drivers/radeonsi/si_shader.c| 74 -
 src/gallium/drivers/radeonsi/si_shader.h|  4 +-
 src/gallium/drivers/radeonsi/si_state_shaders.c | 13 +
 3 files changed, 88 insertions(+), 3 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_shader.c 
b/src/gallium/drivers/radeonsi/si_shader.c
index 438981c..4652fe8 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -1300,6 +1300,20 @@ static unsigned select_interp_param(struct 
si_shader_context *ctx,
return SI_PARAM_LINEAR_SAMPLE;
}
}
+   if (ctx->shader->key.ps.prolog.force_persp_center_interp) {
+   switch (param) {
+   case SI_PARAM_PERSP_CENTROID:
+   case SI_PARAM_PERSP_SAMPLE:
+   return SI_PARAM_PERSP_CENTER;
+   }
+   }
+   if (ctx->shader->key.ps.prolog.force_linear_center_interp) {
+   switch (param) {
+   case SI_PARAM_LINEAR_CENTROID:
+   case SI_PARAM_LINEAR_SAMPLE:
+   return SI_PARAM_LINEAR_CENTER;
+   }
+   }
 
return param;
 }
@@ -6394,6 +6408,8 @@ void si_dump_shader_key(unsigned shader, union 
si_shader_key *key, FILE *f)
fprintf(f, "  prolog.poly_stipple = %u\n", 
key->ps.prolog.poly_stipple);
fprintf(f, "  prolog.force_persp_sample_interp = %u\n", 
key->ps.prolog.force_persp_sample_interp);
fprintf(f, "  prolog.force_linear_sample_interp = %u\n", 
key->ps.prolog.force_linear_sample_interp);
+   fprintf(f, "  prolog.force_persp_center_interp = %u\n", 
key->ps.prolog.force_persp_center_interp);
+   fprintf(f, "  prolog.force_linear_center_interp = %u\n", 
key->ps.prolog.force_linear_center_interp);
fprintf(f, "  epilog.spi_shader_col_format = 0x%x\n", 
key->ps.epilog.spi_shader_col_format);
fprintf(f, "  epilog.color_is_int8 = 0x%X\n", 
key->ps.epilog.color_is_int8);
fprintf(f, "  epilog.last_cbuf = %u\n", 
key->ps.epilog.last_cbuf);
@@ -7267,6 +7283,40 @@ static bool si_compile_ps_prolog(struct si_screen 
*sscreen,
   linear_sample[i], base + 10 
+ i, "");
}
 
+   /* Force center interpolation. */
+   if (key->ps_prolog.states.force_persp_center_interp) {
+   unsigned i, base = key->ps_prolog.num_input_sgprs;
+   LLVMValueRef persp_center[2];
+
+   /* Read PERSP_CENTER. */
+   for (i = 0; i < 2; i++)
+   persp_center[i] = LLVMGetParam(func, base + 2 + i);
+   /* Overwrite PERSP_SAMPLE. */
+   for (i = 0; i < 2; i++)
+   ret = LLVMBuildInsertValue(gallivm->builder, ret,
+  persp_center[i], base + i, 
"");
+   /* Overwrite PERSP_CENTROID. */
+   for (i = 0; i < 2; i++)
+   ret = LLVMBuildInsertValue(gallivm->builder, ret,
+  persp_center[i], base + 4 + 
i, "");
+   }
+   if (key->ps_prolog.states.force_linear_center_interp) {
+   unsigned i, base = key->ps_prolog.num_input_sgprs;
+   LLVMValueRef linear_center[2];
+
+   /* Read LINEAR_CENTER. */
+   for (i = 0; i < 2; i++)
+   linear_center[i] = LLVMGetParam(func, base + 8 + i);
+   /* Overwrite LINEAR_SAMPLE. */
+   for (i = 0; i < 2; i++)
+   ret = LLVMBuildInsertValue(gallivm->builder, ret,
+  linear_center[i], base + 6 + 
i, "");
+   /* Overwrite LINEAR_CENTROID. */
+   for (i = 0; i < 2; i++)
+   ret = LLVMBuildInsertValue(gallivm->builder, ret,
+  linear_center[i], base + 10 
+ i, "");
+   }
+
/* Tell LLVM to insert WQM instruction sequence when needed. */
if (key->ps_prolog.wqm) {
LLVMAddTargetDependentFunctionAttr(func,
@@ -7426,7 +7476,9 @@ static bool si_shader_select_ps_parts(struct si_screen 
*sscreen,
prolog_key.ps_prolog.wqm = info->uses_derivatives &&
(prolog_key.ps_prolog.colors_read ||
 prolog_key.ps_prolog.states.force_persp_sample_interp ||
-prolog_key.ps_prolog.states.force_linear_sample_interp);
+prolog_key.ps_prolog.states.force_linear_sample_interp ||
+prolog_key.ps_prolog.states.force_persp_center_interp ||
+prolog_key.ps_prolog.states.force_linear_center_interp);
 
if 

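The remapping done by select_interp_param after patches 2/8 and 3/8 can be summarized in a small standalone function. This is a sketch under assumptions: the enum values and struct are hypothetical stand-ins, the is_monolithic check is left out, and the linear cases are mapped to the linear center parameter, which is what the bit name suggests.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical parameter indices; the real SI_PARAM_* values differ. */
enum interp_param {
    PARAM_PERSP_SAMPLE,
    PARAM_PERSP_CENTER,
    PARAM_PERSP_CENTROID,
    PARAM_LINEAR_SAMPLE,
    PARAM_LINEAR_CENTER,
    PARAM_LINEAR_CENTROID,
    PARAM_OTHER
};

/* Subset of the prolog key bits used by the series. */
struct prolog_bits {
    bool force_persp_sample_interp;
    bool force_linear_sample_interp;
    bool force_persp_center_interp;
    bool force_linear_center_interp;
};

/* Redirect an (i,j) parameter according to the forced-interpolation bits.
 * Perspective and linear are handled independently of each other. */
static enum interp_param
select_param(const struct prolog_bits *bits, enum interp_param param)
{
    assert(bits);

    if (bits->force_persp_sample_interp &&
        (param == PARAM_PERSP_CENTROID || param == PARAM_PERSP_CENTER))
        return PARAM_PERSP_SAMPLE;

    if (bits->force_linear_sample_interp &&
        (param == PARAM_LINEAR_CENTROID || param == PARAM_LINEAR_CENTER))
        return PARAM_LINEAR_SAMPLE;

    if (bits->force_persp_center_interp &&
        (param == PARAM_PERSP_CENTROID || param == PARAM_PERSP_SAMPLE))
        return PARAM_PERSP_CENTER;

    if (bits->force_linear_center_interp &&
        (param == PARAM_LINEAR_CENTROID || param == PARAM_LINEAR_SAMPLE))
        return PARAM_LINEAR_CENTER;

    return param;
}
```

Splitting the bits per interpolation type means a shader that only needs, say, perspective sample interpolation leaves its linear parameters untouched.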
[Mesa-dev] [PATCH 6/8] radeonsi: remove an obsolete comment

2016-06-30 Thread Marek Olšák
From: Marek Olšák 

It's not true.
---
 src/gallium/drivers/radeonsi/si_shader.c | 5 -
 1 file changed, 5 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_shader.c 
b/src/gallium/drivers/radeonsi/si_shader.c
index a408dee..77d1a8b 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -1281,11 +1281,6 @@ static unsigned select_interp_param(struct 
si_shader_context *ctx,
if (!ctx->is_monolithic)
return param;
 
-   /* If the shader doesn't use center/centroid, just return the parameter.
-*
-* If the shader only uses one set of (i,j), "si_emit_spi_ps_input" can
-* switch between center/centroid and sample without shader changes.
-*/
if (ctx->shader->key.ps.prolog.force_persp_sample_interp) {
switch (param) {
case SI_PARAM_PERSP_CENTROID:
-- 
2.7.4



[Mesa-dev] [PATCH 0/8] RadeonSI: Interpolation optimizations and more

2016-06-30 Thread Marek Olšák
Hi,

These mainly reduce the number of (i,j) that the hardware has to
compute for each pixel shader, which should increase the PS launch
rate in those cases.

There are also some codegen improvements for interpolation and a few
interp-unrelated but shader-related changes.

Please review.

Marek


Re: [Mesa-dev] [PATCH 1/2] mesa: Drop -fno-builtin-memcmp.

2016-06-30 Thread Roland Scheidegger
On 01.07.2016 at 00:59, Matt Turner wrote:
> According to the referenced bug report, gcc-4.5 and newer do not inline
> memcmp(). I see no difference in performance of ipers with llvmpipe on a
> Sandybridge (which does not have "Enhanced REP MOVSB/STOSB") by removing
> this flag.
> 
> I attempted to confirm the problem with gcc-4.4, but it fails to compile
> for quite a few different reasons.
> ---
>  configure.ac | 8 
>  1 file changed, 8 deletions(-)
> 
> diff --git a/configure.ac b/configure.ac
> index bd8842d..95cfc36 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -304,10 +304,6 @@ if test "x$GCC" = xyes; then
>  # We don't want floating-point math functions to set errno or trap
>  CFLAGS="$CFLAGS -fno-math-errno -fno-trapping-math"
>  
> -# gcc's builtin memcmp is slower than glibc's
> -# http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43052
> -CFLAGS="$CFLAGS -fno-builtin-memcmp"
> -
>  # Flags to help ensure that certain portions of the code -- and only 
> those
>  # portions -- can be built with MSVC:
>  # - src/util, src/gallium/auxiliary, rc/gallium/drivers/llvmpipe, and
> @@ -343,10 +339,6 @@ if test "x$GXX" = xyes; then
>  
>  # Restore CXXFLAGS; VISIBILITY_CXXFLAGS are added to it where needed.
>  CXXFLAGS=$save_CXXFLAGS
> -
> -# gcc's builtin memcmp is slower than glibc's
> -# http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43052
> -CXXFLAGS="$CXXFLAGS -fno-builtin-memcmp"
>  fi
>  
>  AC_SUBST([MSVC2013_COMPAT_CFLAGS])
> 

FWIW we're using gcc 4.4.3...
Technically the configure script only requires gcc 4.2 (albeit some
parts have higher requirements) so if it doesn't compile that's a bug.

But we're using scons to build so this should be ok (albeit I think
different flags in configure.ac and scons are generally not a really
good idea).

Roland



[Mesa-dev] [PATCH 5/6] radeonsi: enable string markers and record apitrace call numbers

2016-06-30 Thread Marek Olšák
From: Marek Olšák 

---
 src/gallium/drivers/radeonsi/si_debug.c |  4 
 src/gallium/drivers/radeonsi/si_pipe.c  | 20 +++-
 src/gallium/drivers/radeonsi/si_pipe.h  |  1 +
 3 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/radeonsi/si_debug.c 
b/src/gallium/drivers/radeonsi/si_debug.c
index 112e686..220ce13 100644
--- a/src/gallium/drivers/radeonsi/si_debug.c
+++ b/src/gallium/drivers/radeonsi/si_debug.c
@@ -814,6 +814,10 @@ void si_check_vm_faults(struct r600_common_context *ctx,
fprintf(f, "Device name: %s\n\n", screen->get_name(screen));
fprintf(f, "Failing VM page: 0x%08x\n\n", addr);
 
+   if (sctx->apitrace_call_number)
+   fprintf(f, "Last apitrace call: %u\n\n",
+   sctx->apitrace_call_number);
+
switch (ring) {
case RING_GFX:
si_dump_debug_state(&sctx->b.b, f, 0);
diff --git a/src/gallium/drivers/radeonsi/si_pipe.c 
b/src/gallium/drivers/radeonsi/si_pipe.c
index 6c88fe3..f15e589 100644
--- a/src/gallium/drivers/radeonsi/si_pipe.c
+++ b/src/gallium/drivers/radeonsi/si_pipe.c
@@ -31,6 +31,7 @@
 #include "util/u_memory.h"
 #include "util/u_suballoc.h"
 #include "vl/vl_decoder.h"
+#include "../ddebug/dd_util.h"
 
 #define SI_LLVM_DEFAULT_FEATURES \
"+DumpCode,+vgpr-spilling,-fp32-denormals,+fp64-denormals"
@@ -106,6 +107,22 @@ si_amdgpu_get_reset_status(struct pipe_context *ctx)
return sctx->b.ws->ctx_query_reset_status(sctx->b.ctx);
 }
 
+/* Apitrace profiling:
+ *   1) qapitrace : Tools -> Profile: Measure CPU & GPU times
+ *   2) In the middle panel, zoom in (mouse wheel) on some bad draw call
+ *  and remember its number.
+ *   3) In Mesa, enable queries and performance counters around that draw
+ *  call and print the results.
+ *   4) glretrace --benchmark --markers ..
+ */
+static void si_emit_string_marker(struct pipe_context *ctx,
+ const char *string, int len)
+{
+   struct si_context *sctx = (struct si_context *)ctx;
+
+   dd_parse_apitrace_marker(string, len, &sctx->apitrace_call_number);
+}
+
 static struct pipe_context *si_create_context(struct pipe_screen *screen,
   void *priv, unsigned flags)
 {
@@ -125,6 +142,7 @@ static struct pipe_context *si_create_context(struct 
pipe_screen *screen,
sctx->b.b.screen = screen; /* this must be set first */
sctx->b.b.priv = priv;
sctx->b.b.destroy = si_destroy_context;
+   sctx->b.b.emit_string_marker = si_emit_string_marker;
sctx->b.set_atom_dirty = (void *)si_set_atom_dirty;
sctx->screen = sscreen; /* Easy accessing of screen/winsys. */
sctx->is_debug = (flags & PIPE_CONTEXT_DEBUG) != 0;
@@ -361,6 +379,7 @@ static int si_get_param(struct pipe_screen* pscreen, enum 
pipe_cap param)
case PIPE_CAP_ROBUST_BUFFER_ACCESS_BEHAVIOR:
case PIPE_CAP_GENERATE_MIPMAP:
case PIPE_CAP_POLYGON_OFFSET_UNITS_UNSCALED:
+   case PIPE_CAP_STRING_MARKER:
return 1;
 
case PIPE_CAP_RESOURCE_FROM_USER_MEMORY:
@@ -413,7 +432,6 @@ static int si_get_param(struct pipe_screen* pscreen, enum 
pipe_cap param)
case PIPE_CAP_DRAW_PARAMETERS:
case PIPE_CAP_MULTI_DRAW_INDIRECT:
case PIPE_CAP_MULTI_DRAW_INDIRECT_PARAMS:
-   case PIPE_CAP_STRING_MARKER:
case PIPE_CAP_QUERY_BUFFER_OBJECT:
case PIPE_CAP_CULL_DISTANCE:
case PIPE_CAP_PRIMITIVE_RESTART_FOR_PATCHES:
diff --git a/src/gallium/drivers/radeonsi/si_pipe.h 
b/src/gallium/drivers/radeonsi/si_pipe.h
index 3aff0ac..9d15cbf 100644
--- a/src/gallium/drivers/radeonsi/si_pipe.h
+++ b/src/gallium/drivers/radeonsi/si_pipe.h
@@ -330,6 +330,7 @@ struct si_context {
struct r600_resource*trace_buf;
unsignedtrace_id;
uint64_tdmesg_timestamp;
+   unsignedapitrace_call_number;
 
/* Other state */
bool need_check_render_feedback;
-- 
2.7.4



[Mesa-dev] [PATCH 2/6] ddebug: record and dump apitrace call numbers

2016-06-30 Thread Marek Olšák
From: Marek Olšák 

---
 src/gallium/drivers/ddebug/dd_context.c |  4 +++-
 src/gallium/drivers/ddebug/dd_draw.c|  4 
 src/gallium/drivers/ddebug/dd_pipe.h|  1 +
 src/gallium/drivers/ddebug/dd_util.h| 23 +++
 4 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/ddebug/dd_context.c 
b/src/gallium/drivers/ddebug/dd_context.c
index 5fe423b..98475b9 100644
--- a/src/gallium/drivers/ddebug/dd_context.c
+++ b/src/gallium/drivers/ddebug/dd_context.c
@@ -665,9 +665,11 @@ static void
 dd_context_emit_string_marker(struct pipe_context *_pipe,
   const char *string, int len)
 {
-   struct pipe_context *pipe = dd_context(_pipe)->pipe;
+   struct dd_context *dctx = dd_context(_pipe);
+   struct pipe_context *pipe = dctx->pipe;
 
pipe->emit_string_marker(pipe, string, len);
+   dd_parse_apitrace_marker(string, len, &dctx->apitrace_call_number);
 }
 
 static void
diff --git a/src/gallium/drivers/ddebug/dd_draw.c 
b/src/gallium/drivers/ddebug/dd_draw.c
index f0c8887..22337e0 100644
--- a/src/gallium/drivers/ddebug/dd_draw.c
+++ b/src/gallium/drivers/ddebug/dd_draw.c
@@ -99,6 +99,10 @@ dd_get_file_stream(struct dd_context *dctx)
fprintf(f, "Driver vendor: %s\n", screen->get_vendor(screen));
fprintf(f, "Device vendor: %s\n", screen->get_device_vendor(screen));
fprintf(f, "Device name: %s\n\n", screen->get_name(screen));
+
+   if (dctx->apitrace_call_number)
+  fprintf(f, "Last apitrace call: %u\n\n",
+  dctx->apitrace_call_number);
return f;
 }
 
diff --git a/src/gallium/drivers/ddebug/dd_pipe.h 
b/src/gallium/drivers/ddebug/dd_pipe.h
index c9bbd56..f94303d 100644
--- a/src/gallium/drivers/ddebug/dd_pipe.h
+++ b/src/gallium/drivers/ddebug/dd_pipe.h
@@ -114,6 +114,7 @@ struct dd_context
float tess_default_levels[6];
 
unsigned num_draw_calls;
+   unsigned apitrace_call_number;
 };
 
 
diff --git a/src/gallium/drivers/ddebug/dd_util.h 
b/src/gallium/drivers/ddebug/dd_util.h
index 093bdff..3649644 100644
--- a/src/gallium/drivers/ddebug/dd_util.h
+++ b/src/gallium/drivers/ddebug/dd_util.h
@@ -71,4 +71,27 @@ dd_get_debug_file(bool verbose)
return f;
 }
 
+static inline void
+dd_parse_apitrace_marker(const char *string, int len, unsigned *call_number)
+{
+   unsigned num;
+   char *s;
+
+   if (len <= 0)
+  return;
+
+   /* Make it zero-terminated. */
+   s = alloca(len + 1);
+   memcpy(s, string, len);
+   s[len] = 0;
+
+   /* Parse the number. */
+   errno = 0;
+   num = strtol(s, NULL, 10);
+   if (errno)
+  return;
+
+   *call_number = num;
+}
+
 #endif /* DD_UTIL_H */
-- 
2.7.4
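
The marker parsing in dd_parse_apitrace_marker above follows a common pattern: copy a length-delimited string into a NUL-terminated buffer, then convert it with strtol and check errno. Below is a standalone sketch of a slightly stricter variant. It is not the code from the patch: the function name, the fixed-size buffer (instead of alloca), the boolean return value, and the rejection of input without digits are all assumptions made for this illustration.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Longest marker prefix this sketch is willing to copy (an assumption). */
#define MARKER_MAX 63

/* Parse a leading decimal call number from a string that is not
 * necessarily NUL-terminated. Returns true and stores the number on
 * success; leaves *call_number untouched on failure. */
static bool
parse_call_number(const char *string, int len, unsigned *call_number)
{
    char buf[MARKER_MAX + 1];
    char *end;
    long num;

    assert(call_number);

    if (!string || len <= 0 || len > MARKER_MAX)
        return false;

    /* Make it zero-terminated. */
    memcpy(buf, string, (size_t)len);
    buf[len] = '\0';

    /* Parse the number and reject overflow, negatives and "no digits". */
    errno = 0;
    num = strtol(buf, &end, 10);
    if (errno || end == buf || num < 0 || (unsigned long)num > UINT_MAX)
        return false;

    *call_number = (unsigned)num;
    return true;
}
```

Checking the end pointer matters because strtol returns 0 without setting errno when the string contains no digits at all, which would otherwise be indistinguishable from a genuine call number 0.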



[Mesa-dev] [PATCH 6/6] radeonsi: print LLVM IRs to ddebug logs

2016-06-30 Thread Marek Olšák
From: Marek Olšák 

Getting LLVM IRs of hanging shaders has never been easier.
---
 src/gallium/drivers/radeon/r600_pipe_common.c   |  1 +
 src/gallium/drivers/radeon/r600_pipe_common.h   |  1 +
 src/gallium/drivers/radeonsi/si_pipe.c  |  3 +++
 src/gallium/drivers/radeonsi/si_pipe.h  |  1 +
 src/gallium/drivers/radeonsi/si_shader.c| 12 
 src/gallium/drivers/radeonsi/si_state_shaders.c |  9 -
 6 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/radeon/r600_pipe_common.c 
b/src/gallium/drivers/radeon/r600_pipe_common.c
index 5e981d6..74674e7 100644
--- a/src/gallium/drivers/radeon/r600_pipe_common.c
+++ b/src/gallium/drivers/radeon/r600_pipe_common.c
@@ -67,6 +67,7 @@ void radeon_shader_binary_clean(struct radeon_shader_binary 
*b)
FREE(b->global_symbol_offsets);
FREE(b->relocs);
FREE(b->disasm_string);
+   FREE(b->llvm_ir_string);
 }
 
 /*
diff --git a/src/gallium/drivers/radeon/r600_pipe_common.h 
b/src/gallium/drivers/radeon/r600_pipe_common.h
index c145dc3..1ad69f8 100644
--- a/src/gallium/drivers/radeon/r600_pipe_common.h
+++ b/src/gallium/drivers/radeon/r600_pipe_common.h
@@ -155,6 +155,7 @@ struct radeon_shader_binary {
 
/** Disassembled shader in a string. */
char *disasm_string;
+   char *llvm_ir_string;
 };
 
 void radeon_shader_binary_init(struct radeon_shader_binary *b);
diff --git a/src/gallium/drivers/radeonsi/si_pipe.c 
b/src/gallium/drivers/radeonsi/si_pipe.c
index f15e589..ad2a86a 100644
--- a/src/gallium/drivers/radeonsi/si_pipe.c
+++ b/src/gallium/drivers/radeonsi/si_pipe.c
@@ -139,6 +139,9 @@ static struct pipe_context *si_create_context(struct 
pipe_screen *screen,
if (sscreen->b.debug_flags & DBG_CHECK_VM)
flags |= PIPE_CONTEXT_DEBUG;
 
+   if (flags & PIPE_CONTEXT_DEBUG)
+   sscreen->record_llvm_ir = true; /* racy but not critical */
+
sctx->b.b.screen = screen; /* this must be set first */
sctx->b.b.priv = priv;
sctx->b.b.destroy = si_destroy_context;
diff --git a/src/gallium/drivers/radeonsi/si_pipe.h 
b/src/gallium/drivers/radeonsi/si_pipe.h
index 9d15cbf..fc7e73e 100644
--- a/src/gallium/drivers/radeonsi/si_pipe.h
+++ b/src/gallium/drivers/radeonsi/si_pipe.h
@@ -87,6 +87,7 @@ struct si_screen {
 
/* Whether shaders are monolithic (1-part) or separate (3-part). */
booluse_monolithic_shaders;
+   boolrecord_llvm_ir;
 
pipe_mutex  shader_parts_mutex;
struct si_shader_part   *vs_prologs;
diff --git a/src/gallium/drivers/radeonsi/si_shader.c 
b/src/gallium/drivers/radeonsi/si_shader.c
index bba6a55..e2aae85 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -6159,6 +6159,12 @@ void si_shader_dump(struct si_screen *sscreen, struct 
si_shader *shader,
struct pipe_debug_callback *debug, unsigned processor,
FILE *file)
 {
+   if (file != stderr && shader->binary.llvm_ir_string) {
+   fprintf(file, "\n%s - main shader part - LLVM IR:\n\n",
+   si_get_shader_name(shader, processor));
+   fprintf(file, "%s\n", shader->binary.llvm_ir_string);
+   }
+
if (file != stderr ||
(r600_can_dump_shader(&sscreen->b, processor) &&
 !(sscreen->b.debug_flags & DBG_NO_ASM))) {
@@ -6204,6 +6210,12 @@ int si_compile_llvm(struct si_screen *sscreen,
}
}
 
+   if (sscreen->record_llvm_ir) {
+   char *ir = LLVMPrintModuleToString(mod);
+   binary->llvm_ir_string = strdup(ir);
+   LLVMDisposeMessage(ir);
+   }
+
if (!si_replace_shader(count, binary)) {
r = radeon_llvm_compile(mod, binary, tm, debug);
if (r)
diff --git a/src/gallium/drivers/radeonsi/si_state_shaders.c 
b/src/gallium/drivers/radeonsi/si_state_shaders.c
index 4bcdeb6..ed14288 100644
--- a/src/gallium/drivers/radeonsi/si_state_shaders.c
+++ b/src/gallium/drivers/radeonsi/si_state_shaders.c
@@ -96,6 +96,8 @@ static uint32_t *read_chunk(uint32_t *ptr, void **data, 
unsigned *size)
 {
*size = *ptr++;
assert(*data == NULL);
+   if (!*size)
+   return ptr;
*data = malloc(*size);
return read_data(ptr, *data, *size);
 }
@@ -110,6 +112,8 @@ static void *si_get_shader_binary(struct si_shader *shader)
unsigned relocs_size = shader->binary.reloc_count *
   sizeof(shader->binary.relocs[0]);
unsigned disasm_size = strlen(shader->binary.disasm_string) + 1;
+   unsigned llvm_ir_size = shader->binary.llvm_ir_string ?
+   strlen(shader->binary.llvm_ir_string) + 1 : 0;
unsigned size =
4 + /* total size */
4 + /* 

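The read_chunk change above lets an optional field such as llvm_ir_string be stored as a zero-length chunk, in which case nothing is allocated and the data pointer stays NULL. A self-contained sketch of such a size-prefixed chunk format follows. The write_chunk helper, the align4 helper and the padding of payloads to whole dwords are assumptions of this sketch, not a copy of the driver's serialization code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Round a byte count up to a multiple of 4 (one dword). */
static unsigned align4(unsigned n)
{
    return (n + 3u) & ~3u;
}

/* Write a size dword followed by the payload, padded to a dword boundary.
 * A zero-size chunk consists of the size dword only. */
static uint32_t *write_chunk(uint32_t *ptr, const void *data, unsigned size)
{
    *ptr++ = size;
    if (size) {
        memset(ptr, 0, align4(size));
        memcpy(ptr, data, size);
        ptr += align4(size) / 4;
    }
    return ptr;
}

/* Read a chunk written by write_chunk. For a zero-size chunk, *data is
 * left as NULL and no memory is allocated. Returns NULL on allocation
 * failure, otherwise the position after the chunk. */
static uint32_t *read_chunk(uint32_t *ptr, void **data, unsigned *size)
{
    *size = *ptr++;
    assert(*data == NULL);
    if (!*size)
        return ptr;

    *data = malloc(*size);
    if (!*data)
        return NULL;

    memcpy(*data, ptr, *size);
    return ptr + align4(*size) / 4;
}
```

Without the early return, a zero-size chunk would call malloc(0), whose result is implementation-defined, so the reader could not rely on a NULL pointer to mean "field absent".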
[Mesa-dev] [PATCH 0/6] RadeonSI & ddebug: Apitrace interactions and better debugging

2016-06-30 Thread Marek Olšák
Hi,

This series adds apitrace call tracking into ddebug and radeonsi and
other improvements.

It will improve our debugging and profiling possibilities. Just set
GALLIUM_DDEBUG="apitrace [draw call number]" and you will get
a complete ddebug log with TGSI, LLVM IR (new!), and asm. Both radeonsi
and ddebug know the exact draw call number. It can be very powerful
with apitrace profiling or just debugging bad draw calls.

Also, the LLVM IR is newly printed into ddebug logs for all cases
(GPU hang, VM fault, apitrace mode, etc).

Please review.

Marek
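
For illustration, the GALLIUM_DDEBUG="apitrace [draw call number]" syntax described in this cover letter could be parsed roughly as sketched below. This is a hypothetical parser written for this note: the struct, the function name and the use of sscanf are assumptions, and only the "apitrace" prefix check with strncmp and the "verbose" substring check mirror what the ddebug patches show.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the parsed option. */
struct ddebug_opts {
    bool dump_apitrace;
    bool verbose;
    unsigned apitrace_dump_call;
};

/* Parse an option string of the form "apitrace [call#] [verbose]".
 * Returns false if the string does not select apitrace mode. */
static bool
parse_ddebug_option(const char *option, struct ddebug_opts *out)
{
    assert(out);
    memset(out, 0, sizeof(*out));

    if (!option || strncmp(option, "apitrace", 8) != 0)
        return false;

    out->dump_apitrace = true;
    out->verbose = strstr(option, "verbose") != NULL;

    /* The call number is optional; keep 0 when it is absent. */
    if (sscanf(option + 8, "%u", &out->apitrace_dump_call) != 1)
        out->apitrace_dump_call = 0;

    return true;
}
```

With this sketch, "apitrace 1234 verbose" selects call 1234 with verbose output, while a bare "apitrace" leaves the call number at 0.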


[Mesa-dev] [PATCH 3/6] ddebug: implement pipe_context::generate_mipmap

2016-06-30 Thread Marek Olšák
From: Marek Olšák 

---
 src/gallium/drivers/ddebug/dd_draw.c | 53 +++-
 1 file changed, 52 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/ddebug/dd_draw.c 
b/src/gallium/drivers/ddebug/dd_draw.c
index 22337e0..f0f6fb6 100644
--- a/src/gallium/drivers/ddebug/dd_draw.c
+++ b/src/gallium/drivers/ddebug/dd_draw.c
@@ -43,6 +43,7 @@ enum call_type
CALL_CLEAR_BUFFER,
CALL_CLEAR_RENDER_TARGET,
CALL_CLEAR_DEPTH_STENCIL,
+   CALL_GENERATE_MIPMAP,
 };
 
 struct call_resource_copy_region
@@ -72,6 +73,15 @@ struct call_clear_buffer
int clear_value_size;
 };
 
+struct call_generate_mipmap {
+   struct pipe_resource *res;
+   enum pipe_format format;
+   unsigned base_level;
+   unsigned last_level;
+   unsigned first_layer;
+   unsigned last_layer;
+};
+
 struct dd_call
 {
enum call_type type;
@@ -84,6 +94,7 @@ struct dd_call
   struct pipe_resource *flush_resource;
   struct call_clear clear;
   struct call_clear_buffer clear_buffer;
+  struct call_generate_mipmap generate_mipmap;
} info;
 };
 
@@ -425,6 +436,13 @@ dd_dump_blit(struct dd_context *dctx, struct 
pipe_blit_info *info, FILE *f)
 }
 
 static void
+dd_dump_generate_mipmap(struct dd_context *dctx, FILE *f)
+{
+   fprintf(f, "%s:\n", __func__+8);
+   /* TODO */
+}
+
+static void
 dd_dump_flush_resource(struct dd_context *dctx, struct pipe_resource *res,
FILE *f)
 {
@@ -521,6 +539,10 @@ dd_dump_call(struct dd_context *dctx, struct dd_call 
*call, unsigned flags)
   break;
case CALL_CLEAR_DEPTH_STENCIL:
   dd_dump_clear_depth_stencil(dctx, f);
+  break;
+   case CALL_GENERATE_MIPMAP:
+  dd_dump_generate_mipmap(dctx, f);
+  break;
}
 
dd_dump_driver_state(dctx, f, flags);
@@ -723,6 +745,35 @@ dd_context_blit(struct pipe_context *_pipe, const struct 
pipe_blit_info *info)
dd_after_draw(dctx, &call);
 }
 
+static boolean
+dd_context_generate_mipmap(struct pipe_context *_pipe,
+   struct pipe_resource *res,
+   enum pipe_format format,
+   unsigned base_level,
+   unsigned last_level,
+   unsigned first_layer,
+   unsigned last_layer)
+{
+   struct dd_context *dctx = dd_context(_pipe);
+   struct pipe_context *pipe = dctx->pipe;
+   struct dd_call call;
+   boolean result;
+
+   call.type = CALL_GENERATE_MIPMAP;
+   call.info.generate_mipmap.res = res;
+   call.info.generate_mipmap.format = format;
+   call.info.generate_mipmap.base_level = base_level;
+   call.info.generate_mipmap.last_level = last_level;
+   call.info.generate_mipmap.first_layer = first_layer;
+   call.info.generate_mipmap.last_layer = last_layer;
+
+   dd_before_draw(dctx);
+   result = pipe->generate_mipmap(pipe, res, format, base_level, last_level,
+  first_layer, last_layer);
+   dd_after_draw(dctx, &call);
+   return result;
+}
+
 static void
 dd_context_flush_resource(struct pipe_context *_pipe,
   struct pipe_resource *resource)
@@ -829,5 +880,5 @@ dd_init_draw_functions(struct dd_context *dctx)
CTX_INIT(clear_depth_stencil);
CTX_INIT(clear_buffer);
CTX_INIT(flush_resource);
-   /* launch_grid */
+   CTX_INIT(generate_mipmap);
 }
-- 
2.7.4



[Mesa-dev] [PATCH 4/6] ddebug: add an option to dump info about a specific apitrace call

2016-06-30 Thread Marek Olšák
From: Marek Olšák 

---
 src/gallium/drivers/ddebug/dd_draw.c   |  8 
 src/gallium/drivers/ddebug/dd_pipe.h   |  4 +++-
 src/gallium/drivers/ddebug/dd_screen.c | 20 ++--
 3 files changed, 29 insertions(+), 3 deletions(-)

diff --git a/src/gallium/drivers/ddebug/dd_draw.c 
b/src/gallium/drivers/ddebug/dd_draw.c
index f0f6fb6..f8047cc 100644
--- a/src/gallium/drivers/ddebug/dd_draw.c
+++ b/src/gallium/drivers/ddebug/dd_draw.c
@@ -617,6 +617,7 @@ dd_context_flush(struct pipe_context *_pipe,
"GPU hang detected in pipe->flush()");
   break;
case DD_DUMP_ALL_CALLS:
+   case DD_DUMP_APITRACE_CALL:
   pipe->flush(pipe, fence, flags);
   break;
default:
@@ -659,6 +660,13 @@ dd_after_draw(struct dd_context *dctx, struct dd_call 
*call)
 pipe->flush(pipe, NULL, 0);
  dd_dump_call(dctx, call, 0);
  break;
+  case DD_DUMP_APITRACE_CALL:
+ if (dscreen->apitrace_dump_call == dctx->apitrace_call_number) {
+dd_dump_call(dctx, call, 0);
+/* No need to continue. */
+exit(0);
+ }
+ break;
   default:
  assert(0);
   }
diff --git a/src/gallium/drivers/ddebug/dd_pipe.h b/src/gallium/drivers/ddebug/dd_pipe.h
index f94303d..69d5c4e 100644
--- a/src/gallium/drivers/ddebug/dd_pipe.h
+++ b/src/gallium/drivers/ddebug/dd_pipe.h
@@ -35,7 +35,8 @@
 
 enum dd_mode {
DD_DETECT_HANGS,
-   DD_DUMP_ALL_CALLS
+   DD_DUMP_ALL_CALLS,
+   DD_DUMP_APITRACE_CALL,
 };
 
 struct dd_screen
@@ -47,6 +48,7 @@ struct dd_screen
bool no_flush;
bool verbose;
unsigned skip_count;
+   unsigned apitrace_dump_call;
 };
 
 struct dd_query
diff --git a/src/gallium/drivers/ddebug/dd_screen.c b/src/gallium/drivers/ddebug/dd_screen.c
index 5a883bd..db535e9 100644
--- a/src/gallium/drivers/ddebug/dd_screen.c
+++ b/src/gallium/drivers/ddebug/dd_screen.c
@@ -292,9 +292,14 @@ ddebug_screen_create(struct pipe_screen *screen)
struct dd_screen *dscreen;
const char *option = debug_get_option("GALLIUM_DDEBUG", NULL);
bool dump_always = option && !strncmp(option, "always", 6);
+   bool dump_apitrace = option && !strncmp(option, "apitrace", 8);
bool no_flush = option && strstr(option, "noflush");
bool help = option && !strcmp(option, "help");
unsigned timeout = 0;
+   unsigned apitrace_dump_call = 0;
+
+   if (dump_apitrace)
+  no_flush = true;
 
if (help) {
   puts("Gallium driver debugger");
@@ -310,6 +315,9 @@ ddebug_screen_create(struct pipe_screen *screen)
   puts("fence timeout and dump context and driver information into");
   puts("$HOME/"DD_DIR"/ when a hang is detected.");
   puts("");
+  puts("  GALLIUM_DDEBUG=\"apitrace [call#] [verbose]\"");
+  puts("Dump apitrace draw call information into $HOME/"DD_DIR"/. Implies 'noflush'.");
+  puts("");
   puts("  If 'noflush' is specified, do not flush on every draw call. In hang");
   puts("  detection mode, this only detect hangs in pipe->flush.");
   puts("  If 'verbose' is specified, additional information is written to stderr.");
@@ -322,7 +330,10 @@ ddebug_screen_create(struct pipe_screen *screen)
 
if (!option)
   return screen;
-   if (!dump_always && sscanf(option, "%u", &timeout) != 1)
+   if (!dump_always && !dump_apitrace && sscanf(option, "%u", &timeout) != 1)
+  return screen;
+
+   if (dump_apitrace && sscanf(option+8, "%u", &apitrace_dump_call) != 1)
   return screen;
 
dscreen = CALLOC_STRUCT(dd_screen);
@@ -363,9 +374,11 @@ ddebug_screen_create(struct pipe_screen *screen)
 
dscreen->screen = screen;
dscreen->timeout_ms = timeout;
-   dscreen->mode = dump_always ? DD_DUMP_ALL_CALLS : DD_DETECT_HANGS;
+   dscreen->mode = dump_always ? DD_DUMP_ALL_CALLS :
+   dump_apitrace ? DD_DUMP_APITRACE_CALL : DD_DETECT_HANGS;
dscreen->no_flush = no_flush;
dscreen->verbose = strstr(option, "verbose") != NULL;
+   dscreen->apitrace_dump_call = apitrace_dump_call;
 
switch (dscreen->mode) {
case DD_DUMP_ALL_CALLS:
@@ -375,6 +388,9 @@ ddebug_screen_create(struct pipe_screen *screen)
   fprintf(stderr, "Gallium debugger active. "
   "The hang detection timout is %i ms.\n", timeout);
   break;
+   case DD_DUMP_APITRACE_CALL:
+  fprintf(stderr, "Gallium debugger active. Going to dump an apitrace call.\n");
+  break;
default:
   assert(0);
}
-- 
2.7.4
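
For readers following the option handling in ddebug_screen_create(), the prefix match followed by sscanf() can be tried out in isolation. The following is a minimal stand-alone sketch, not the Mesa code: parse_ddebug_option(), struct sketch_opts and the SKETCH_* enum values are hypothetical names that only mirror the fields touched by the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical mirror of the modes in the patch; not the Mesa enum. */
enum sketch_mode {
   SKETCH_DETECT_HANGS,
   SKETCH_DUMP_ALL_CALLS,
   SKETCH_DUMP_APITRACE_CALL
};

struct sketch_opts {
   enum sketch_mode mode;
   unsigned timeout_ms;          /* only meaningful for hang detection */
   unsigned apitrace_dump_call;  /* only meaningful for apitrace mode */
   bool no_flush;
   bool verbose;
};

/* Returns false when the string does not select any mode. */
static bool
parse_ddebug_option(const char *option, struct sketch_opts *out)
{
   assert(out != NULL);
   memset(out, 0, sizeof(*out));

   if (!option)
      return false;

   bool dump_always = !strncmp(option, "always", 6);
   bool dump_apitrace = !strncmp(option, "apitrace", 8);

   out->no_flush = strstr(option, "noflush") != NULL;
   out->verbose = strstr(option, "verbose") != NULL;

   if (dump_apitrace) {
      /* The call number follows the keyword; %u skips the blank. */
      if (sscanf(option + 8, "%u", &out->apitrace_dump_call) != 1)
         return false;
      out->mode = SKETCH_DUMP_APITRACE_CALL;
      out->no_flush = true; /* apitrace mode implies noflush */
   } else if (dump_always) {
      out->mode = SKETCH_DUMP_ALL_CALLS;
   } else {
      /* Otherwise the string must start with a timeout in ms. */
      if (sscanf(option, "%u", &out->timeout_ms) != 1)
         return false;
      out->mode = SKETCH_DETECT_HANGS;
   }
   return true;
}
```

With this sketch, "apitrace 42 verbose" selects the apitrace mode with call number 42, while a bare "apitrace" is rejected because no number follows the keyword.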



[Mesa-dev] [PATCH 1/6] ddebug: implement emit_string_marker

2016-06-30 Thread Marek Olšák
From: Marek Olšák 

and remove some obsolete comments
---
 src/gallium/drivers/ddebug/dd_context.c | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/src/gallium/drivers/ddebug/dd_context.c b/src/gallium/drivers/ddebug/dd_context.c
index f618f0c..5fe423b 100644
--- a/src/gallium/drivers/ddebug/dd_context.c
+++ b/src/gallium/drivers/ddebug/dd_context.c
@@ -662,6 +662,15 @@ dd_context_get_device_reset_status(struct pipe_context *_pipe)
 }
 
 static void
+dd_context_emit_string_marker(struct pipe_context *_pipe,
+  const char *string, int len)
+{
+   struct pipe_context *pipe = dd_context(_pipe)->pipe;
+
+   pipe->emit_string_marker(pipe, string, len);
+}
+
+static void
 dd_context_dump_debug_state(struct pipe_context *_pipe, FILE *stream,
 unsigned flags)
 {
@@ -761,15 +770,13 @@ dd_context_create(struct dd_screen *dscreen, struct pipe_context *pipe)
CTX_INIT(memory_barrier);
/* create_video_codec */
/* create_video_buffer */
-   /* create_compute_state */
-   /* bind_compute_state */
-   /* delete_compute_state */
/* set_compute_resources */
/* set_global_binding */
CTX_INIT(get_sample_position);
CTX_INIT(invalidate_resource);
CTX_INIT(get_device_reset_status);
CTX_INIT(dump_debug_state);
+   CTX_INIT(emit_string_marker);
 
dd_init_draw_functions(dctx);
 
-- 
2.7.4
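
The new hook is a plain pass-through: the debug context unwraps itself and forwards the call to the driver context it wraps. A minimal sketch of that pattern follows. The sketch_* types and the driver_emit_string_marker() recorder are hypothetical, cut-down stand-ins, not the real pipe_context or dd_context.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-in for pipe_context. */
struct sketch_pipe {
   void (*emit_string_marker)(struct sketch_pipe *pipe,
                              const char *string, int len);
};

/* Hypothetical stand-in for dd_context. */
struct sketch_wrapper {
   struct sketch_pipe base;   /* must be first so the cast below is valid */
   struct sketch_pipe *pipe;  /* the wrapped driver context */
};

static struct sketch_wrapper *
sketch_wrapper(struct sketch_pipe *_pipe)
{
   return (struct sketch_wrapper *)_pipe;
}

/* Forward the call unchanged to the wrapped context. */
static void
sketch_emit_string_marker(struct sketch_pipe *_pipe,
                          const char *string, int len)
{
   struct sketch_pipe *pipe = sketch_wrapper(_pipe)->pipe;

   assert(pipe->emit_string_marker != NULL);
   pipe->emit_string_marker(pipe, string, len);
}

static void
sketch_wrapper_init(struct sketch_wrapper *w, struct sketch_pipe *pipe)
{
   w->pipe = pipe;
   w->base.emit_string_marker = sketch_emit_string_marker;
}

/* A trivial "driver" hook that records the last marker length. */
static int last_marker_len = -1;

static void
driver_emit_string_marker(struct sketch_pipe *pipe,
                          const char *string, int len)
{
   (void)pipe;
   (void)string;
   last_marker_len = len;
}
```

Calling the hook through the wrapper's base ends up in driver_emit_string_marker(), which is all the CTX_INIT() line in the patch arranges for the real context.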



Re: [Mesa-dev] [PATCH 1/2] mesa: Drop -fno-builtin-memcmp.

2016-06-30 Thread Ian Romanick
On 06/30/2016 03:59 PM, Matt Turner wrote:
> According to the referenced bug report, gcc-4.5 and newer do not inline
> memcmp(). I see no difference in performance of ipers with llvmpipe on a
> Sandybridge (which does not have "Enhanced REP MOVSB/STOSB") by removing
> this flag.
> 
> I attempted to confirm the problem with gcc-4.4, but it fails to compile
> for quite a few different reasons.

You'd probably have to go back to a suitably old version of Mesa, and
it's probably not worth that much effort.

Series is

Reviewed-by: Ian Romanick 

> ---
>  configure.ac | 8 
>  1 file changed, 8 deletions(-)
> 
> diff --git a/configure.ac b/configure.ac
> index bd8842d..95cfc36 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -304,10 +304,6 @@ if test "x$GCC" = xyes; then
>  # We don't want floating-point math functions to set errno or trap
>  CFLAGS="$CFLAGS -fno-math-errno -fno-trapping-math"
>  
> -# gcc's builtin memcmp is slower than glibc's
> -# http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43052
> -CFLAGS="$CFLAGS -fno-builtin-memcmp"
> -
>  # Flags to help ensure that certain portions of the code -- and only 
> those
>  # portions -- can be built with MSVC:
>  # - src/util, src/gallium/auxiliary, rc/gallium/drivers/llvmpipe, and
> @@ -343,10 +339,6 @@ if test "x$GXX" = xyes; then
>  
>  # Restore CXXFLAGS; VISIBILITY_CXXFLAGS are added to it where needed.
>  CXXFLAGS=$save_CXXFLAGS
> -
> -# gcc's builtin memcmp is slower than glibc's
> -# http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43052
> -CXXFLAGS="$CXXFLAGS -fno-builtin-memcmp"
>  fi
>  
>  AC_SUBST([MSVC2013_COMPAT_CFLAGS])
> 



Re: [Mesa-dev] [PATCH v2 04/11] glsl: Pack integer and double varyings as flat even if interpolation mode is none

2016-06-30 Thread Kenneth Graunke
On Thursday, June 16, 2016 12:07:36 PM PDT Ian Romanick wrote:
> From: Ian Romanick 
> 
> v2: Also update varying_matches::compute_packing_class().  Suggested by
> Timothy Arceri.
> 
> Signed-off-by: Ian Romanick 
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96358
> Cc: "12.0" 
> Cc: Gregory Hainaut 
> Cc: Ilia Mirkin 

Reviewed-by: Kenneth Graunke 




Re: [Mesa-dev] [PATCH v2 03/11] mesa: Strip arrayness from interface block names in some IO validation

2016-06-30 Thread Kenneth Graunke
On Thursday, June 16, 2016 12:06:56 PM PDT Ian Romanick wrote:
> From: Ian Romanick 
> 
> Outputs from the vertex shader need to be able to match
> per-vertex-arrayed inputs of later stages.  Accomplish this by stripping
> one level of arrayness from the names and types of outputs going to a
> per-vertex-arrayed stage.
> 
> v2: Add missing checks for TESS_EVAL->GEOMETRY.  Noticed by Timothy
> Arceri.
> 
> Signed-off-by: Ian Romanick 
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96358
> Cc: "12.0" 
> Cc: Gregory Hainaut 
> Cc: Ilia Mirkin 

This is an annoying amount of code, but it looks OK to me and I don't
have any ideas of how to do it better.

Reviewed-by: Kenneth Graunke 




[Mesa-dev] [PATCH 2/2] mesa: Add -fno-math-errno -fno-trapping-math to CXXFLAGS.

2016-06-30 Thread Matt Turner
Not sure why I forgot to add them to CXXFLAGS in commit f55c408067 or
commit 875458b778. Cuts about 1k of .text.

   text    data     bss     dec     hex filename
5806354  287816   29384 6123554  5d7022 i965_dri.so before
5805497  287744   29384 6122625  5d6c81 i965_dri.so after
---
 configure.ac | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/configure.ac b/configure.ac
index 95cfc36..e8cd97f 100644
--- a/configure.ac
+++ b/configure.ac
@@ -339,6 +339,9 @@ if test "x$GXX" = xyes; then
 
 # Restore CXXFLAGS; VISIBILITY_CXXFLAGS are added to it where needed.
 CXXFLAGS=$save_CXXFLAGS
+
+# We don't want floating-point math functions to set errno or trap
+CXXFLAGS="$CXXFLAGS -fno-math-errno -fno-trapping-math"
 fi
 
 AC_SUBST([MSVC2013_COMPAT_CFLAGS])
-- 
2.7.3



Re: [Mesa-dev] [PATCH 18/24] i965: Update assertion to account for Gen < 7

2016-06-30 Thread Ian Romanick
On 06/30/2016 03:37 PM, Francisco Jerez wrote:
> Ian Romanick  writes:
> 
>> From: Ian Romanick 
>>
>> Previously SHADER_OPCODE_MULH could only exist on Gen7+, so the
>> assertion assumed the Gen7+ accumulator rules.  A future patch will
>> allow this instruction on at least Gen6, so update the assertion.
>>
>> Signed-off-by: Ian Romanick 
>> ---
>>  src/mesa/drivers/dri/i965/brw_fs.cpp | 6 +-
>>  1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
>> b/src/mesa/drivers/dri/i965/brw_fs.cpp
>> index 2f473cc..0a48df9 100644
>> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
>> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
>> @@ -43,6 +43,9 @@
>>  
>>  using namespace brw;
>>  
>> +static unsigned get_fpu_lowered_simd_width(const struct brw_device_info *,
>> +   const fs_inst *);
>> +
>>  void
>>  fs_inst::init(enum opcode opcode, uint8_t exec_size, const fs_reg ,
>>const fs_reg *src, unsigned sources)
>> @@ -3658,7 +3661,8 @@ fs_visitor::lower_integer_multiplication()
>>  
>>} else if (inst->opcode == SHADER_OPCODE_MULH) {
>>   /* Should have been lowered to 8-wide. */
>> - assert(inst->exec_size <= 8);
>> + assert(inst->exec_size <=
>> +(devinfo->gen >= 7 ? 8 : get_fpu_lowered_simd_width(devinfo, inst)));
> 
> Instead of get_fpu_lowered_simd_width you could call
> get_lowered_simd_width, which expands to the same conditional expression
> you have put in the assertion for opcode == SHADER_OPCODE_MULH.

Yes.  I like that much better.

>>   const fs_reg acc = retype(brw_acc_reg(inst->exec_size),
>> inst->dst.type);
>>   fs_inst *mul = ibld.MUL(acc, inst->src[0], inst->src[1]);
>> -- 
>> 2.5.5
>>


[Mesa-dev] [PATCH 1/2] mesa: Drop -fno-builtin-memcmp.

2016-06-30 Thread Matt Turner
According to the referenced bug report, gcc-4.5 and newer do not inline
memcmp(). I see no difference in performance of ipers with llvmpipe on a
Sandybridge (which does not have "Enhanced REP MOVSB/STOSB") by removing
this flag.

I attempted to confirm the problem with gcc-4.4, but it fails to compile
for quite a few different reasons.
---
 configure.ac | 8 
 1 file changed, 8 deletions(-)

diff --git a/configure.ac b/configure.ac
index bd8842d..95cfc36 100644
--- a/configure.ac
+++ b/configure.ac
@@ -304,10 +304,6 @@ if test "x$GCC" = xyes; then
 # We don't want floating-point math functions to set errno or trap
 CFLAGS="$CFLAGS -fno-math-errno -fno-trapping-math"
 
-# gcc's builtin memcmp is slower than glibc's
-# http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43052
-CFLAGS="$CFLAGS -fno-builtin-memcmp"
-
 # Flags to help ensure that certain portions of the code -- and only those
 # portions -- can be built with MSVC:
 # - src/util, src/gallium/auxiliary, rc/gallium/drivers/llvmpipe, and
@@ -343,10 +339,6 @@ if test "x$GXX" = xyes; then
 
 # Restore CXXFLAGS; VISIBILITY_CXXFLAGS are added to it where needed.
 CXXFLAGS=$save_CXXFLAGS
-
-# gcc's builtin memcmp is slower than glibc's
-# http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43052
-CXXFLAGS="$CXXFLAGS -fno-builtin-memcmp"
 fi
 
 AC_SUBST([MSVC2013_COMPAT_CFLAGS])
-- 
2.7.3



Re: [Mesa-dev] [PATCH 15/24] i965: Use LZD to implement nir_op_ufind_msb

2016-06-30 Thread Ian Romanick
On 06/30/2016 03:33 PM, Francisco Jerez wrote:
> Matt Turner  writes:
> 
>> On Wed, Jun 29, 2016 at 2:04 PM, Ian Romanick  wrote:
>>> From: Ian Romanick 
>>>
>>> This uses one less instruction.
>>
>> Add FBH to the list of stupid instructions.
>>
>>> Signed-off-by: Ian Romanick 
>>> ---
>>>  src/mesa/drivers/dri/i965/brw_fs.h   |  4 
>>>  src/mesa/drivers/dri/i965/brw_fs_generator.cpp   |  3 +++
>>>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 26 
>>> +++-
>>>  src/mesa/drivers/dri/i965/brw_vec4.h |  4 
>>>  src/mesa/drivers/dri/i965/brw_vec4_generator.cpp |  3 +++
>>>  src/mesa/drivers/dri/i965/brw_vec4_nir.cpp   | 22 
>>>  6 files changed, 61 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/src/mesa/drivers/dri/i965/brw_fs.h 
>>> b/src/mesa/drivers/dri/i965/brw_fs.h
>>> index 4237197..22ce092 100644
>>> --- a/src/mesa/drivers/dri/i965/brw_fs.h
>>> +++ b/src/mesa/drivers/dri/i965/brw_fs.h
>>> @@ -237,6 +237,10 @@ public:
>>>   nir_tex_instr *instr);
>>> void nir_emit_jump(const brw::fs_builder ,
>>>nir_jump_instr *instr);
>>> +   void nir_emit_find_msb_using_lzd(const brw::fs_builder ,
>>> +const fs_reg ,
>>> +const fs_reg ,
>>> +bool is_signed);
>>> fs_reg get_nir_src(const nir_src );
>>> fs_reg get_nir_src_imm(const nir_src );
>>> fs_reg get_nir_dest(const nir_dest );
>>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp 
>>> b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
>>> index d25d26a..bda4a26 100644
>>> --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
>>> +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
>>> @@ -1761,6 +1761,9 @@ fs_generator::generate_code(const cfg_t *cfg, int 
>>> dispatch_width)
>>>   /* FBL only supports UD type for dst. */
>>>   brw_FBL(p, retype(dst, BRW_REGISTER_TYPE_UD), src[0]);
>>>   break;
>>> +  case BRW_OPCODE_LZD:
>>> + brw_LZD(p, dst, src[0]);
>>> + break;
>>>case BRW_OPCODE_CBIT:
>>>   assert(devinfo->gen >= 7);
>>>   /* CBIT only supports UD type for dst. */
>>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp 
>>> b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>>> index b3f5dfd..f15bf3e 100644
>>> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>>> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>>> @@ -617,6 +617,25 @@ fs_visitor::optimize_frontfacing_ternary(nir_alu_instr 
>>> *instr,
>>>  }
>>>
>>>  void
>>> +fs_visitor::nir_emit_find_msb_using_lzd(const fs_builder ,
>>> +const fs_reg ,
>>> +const fs_reg ,
>>> +bool is_signed)
>>> +{
>>> +   fs_inst *inst;
>>> +
>>> +   bld.LZD(retype(result, BRW_REGISTER_TYPE_UD), src);
>>> +
>>> +   /* LZD counts from the MSB side, while GLSL's findMSB() wants the count
>>> +* from the LSB side. Subtract the result from 31 to convert the MSB
>>> +* count into an LSB count.  If no bits are set, LZD will return 32.
>>> +* 31-32 = -1, which is exactly what findMSB() is supposed to return.
>>> +*/
>>> +   inst = bld.ADD(result, retype(result, BRW_REGISTER_TYPE_D), 
>>> brw_imm_d(31));
>>> +   inst->src[0].negate = true;
>>> +}
>>
>> I'd personally be inclined to just inline these functions. I know they
>> grow somewhat in the next patches... whatever your preference is.
> 
> It seems to grow quite a bit in PATCH 16, and it's used in multiple
> places, right?  How about we keep it factored out but make it a
> stand-alone function instead of a visitor method?  It doesn't look like
> it uses *any* internal or external data structures of fs_visitor, it
> doesn't even dereference the 'this' pointer at all AFAICT, so you could
> likely improve encapsulation somewhat by making it a static function
> local to the brw_*_nir.cpp source files.

I originally had it inline, but I moved it out after seeing it grow.  It
didn't occur to me to have it not be a class function.  I can make that
change.
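
The conversion described in the quoted comment (count leading zeros, then subtract from 31) can be checked on the CPU. This is a minimal sketch under stated assumptions: lzd32() is a hypothetical software model of the LZD instruction as the comment describes it (32 for a zero input), and ufind_msb() is an illustrative name, not a Mesa function.

```c
#include <assert.h>
#include <stdint.h>

/* Software model of a leading-zero count: returns 32 for an input of 0. */
static int
lzd32(uint32_t v)
{
   int n = 0;

   for (uint32_t bit = 0x80000000u; bit != 0 && !(v & bit); bit >>= 1)
      n++;
   return n;
}

/* findMSB() for unsigned input: index of the highest set bit, or -1. */
static int
ufind_msb(uint32_t v)
{
   int n = lzd32(v);

   assert(n >= 0 && n <= 32);
   return 31 - n;   /* the patch emits this as an ADD of -lzd and 31 */
}
```

For an input of zero the count is 32, so 31 - 32 gives the -1 that findMSB() is expected to return, with no special case needed.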



Re: [Mesa-dev] [PATCH 16/24] i965: Use LZD to implement nir_op_ifind_msb on Gen < 7

2016-06-30 Thread Ian Romanick
On 06/30/2016 03:20 PM, Matt Turner wrote:
> On Wed, Jun 29, 2016 at 2:04 PM, Ian Romanick  wrote:
>> From: Ian Romanick 
>>
>> Signed-off-by: Ian Romanick 
>> ---
>>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp   | 50 ++--
>>  src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 52 
>> +++---
>>  2 files changed, 81 insertions(+), 21 deletions(-)
>>
>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp 
>> b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>> index f15bf3e..f8db28a 100644
>> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>> @@ -623,8 +623,32 @@ fs_visitor::nir_emit_find_msb_using_lzd(const 
>> fs_builder ,
>>  bool is_signed)
>>  {
>> fs_inst *inst;
>> +   fs_reg temp = src;
>>
>> -   bld.LZD(retype(result, BRW_REGISTER_TYPE_UD), src);
>> +   if (is_signed) {
>> +  /* LZD of an absolute value source almost always does the right
>> +   * thing.  There are two problem values:
>> +   *
>> +   * * 0x80000000.  Since abs(0x80000000) == 0x80000000, LZD returns
>> +   *   0.  However, findMSB(int(0x80000000)) == 30.
>> +   *
>> +   * * 0xffffffff.  Since abs(0xffffffff) == 1, LZD returns
>> +   *   31.  Section 8.8 (Integer Functions) of the GLSL 4.50 spec says:
>> +   *
>> +   *For a value of zero or negative one, -1 will be returned.
>> +   *
>> +   * For all negative number cases, including 0x80000000 and
>> +   * 0xffffffff, the correct value is obtained from LZD if instead of
> 
> Interesting, both the G45 and IVB docs (I didn't check others, I
> suspect they say the same) say
> 
> "If the source is signed, the abs source modifier must be used to
> convert any negative source value to a positive value."
> 
> I suppose that means you're supposed to retype the src to UD below?

Yes, probably.  The above instructions make sure that it is positive,
but we could still have a value of 0x80000000.  I can add the retype()
below.

>> +   * negating the (already negative) value the logical-not is used.  A
>> +   * conditional logical-not can be achieved in two instructions.
>> +   */
>> +  temp = vgrf(glsl_type::int_type);
>> +
>> +  bld.ASR(temp, src, brw_imm_d(31));
>> +  bld.XOR(temp, temp, src);
>> +   }
>> +
>> +   bld.LZD(retype(result, BRW_REGISTER_TYPE_UD), temp);
> 
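
The signed case from the quoted comment can be modelled the same way: a conditional logical-not of negative inputs, then the leading-zero count. This is a minimal sketch, assuming lzd32() as a hypothetical software model of LZD; ifind_msb() is an illustrative name. The arithmetic shift by 31 is spelled as an explicit mask here so that the sketch does not depend on how a C compiler shifts negative values.

```c
#include <assert.h>
#include <stdint.h>

/* Software model of a leading-zero count: returns 32 for an input of 0. */
static int
lzd32(uint32_t v)
{
   int n = 0;

   for (uint32_t bit = 0x80000000u; bit != 0 && !(v & bit); bit >>= 1)
      n++;
   return n;
}

/* findMSB() for signed input, following the ASR/XOR idea in the patch. */
static int
ifind_msb(int32_t v)
{
   /* What ASR by 31 would produce: all ones for negative input, else 0. */
   uint32_t sign_mask = (v < 0) ? 0xffffffffu : 0u;

   /* XOR with the mask is a logical-not only when the input is negative. */
   uint32_t temp = (uint32_t)v ^ sign_mask;
   int n = lzd32(temp);

   assert(n >= 1 && n <= 32);   /* bit 31 of temp is always clear */
   return 31 - n;
}
```

Both problem values from the comment come out as expected: 0x80000000 becomes 0x7fffffff and yields 30, and 0xffffffff becomes 0 and yields -1.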



Re: [Mesa-dev] [PATCH 15/24] i965: Use LZD to implement nir_op_ufind_msb

2016-06-30 Thread Matt Turner
On Thu, Jun 30, 2016 at 3:33 PM, Francisco Jerez  wrote:
> Matt Turner  writes:
>
>> On Wed, Jun 29, 2016 at 2:04 PM, Ian Romanick  wrote:
>>> From: Ian Romanick 
>>>
>>> This uses one less instruction.
>>
>> Add FBH to the list of stupid instructions.
>>
>>> Signed-off-by: Ian Romanick 
>>> ---
>>>  src/mesa/drivers/dri/i965/brw_fs.h   |  4 
>>>  src/mesa/drivers/dri/i965/brw_fs_generator.cpp   |  3 +++
>>>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 26 
>>> +++-
>>>  src/mesa/drivers/dri/i965/brw_vec4.h |  4 
>>>  src/mesa/drivers/dri/i965/brw_vec4_generator.cpp |  3 +++
>>>  src/mesa/drivers/dri/i965/brw_vec4_nir.cpp   | 22 
>>>  6 files changed, 61 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/src/mesa/drivers/dri/i965/brw_fs.h 
>>> b/src/mesa/drivers/dri/i965/brw_fs.h
>>> index 4237197..22ce092 100644
>>> --- a/src/mesa/drivers/dri/i965/brw_fs.h
>>> +++ b/src/mesa/drivers/dri/i965/brw_fs.h
>>> @@ -237,6 +237,10 @@ public:
>>>   nir_tex_instr *instr);
>>> void nir_emit_jump(const brw::fs_builder ,
>>>nir_jump_instr *instr);
>>> +   void nir_emit_find_msb_using_lzd(const brw::fs_builder ,
>>> +const fs_reg ,
>>> +const fs_reg ,
>>> +bool is_signed);
>>> fs_reg get_nir_src(const nir_src );
>>> fs_reg get_nir_src_imm(const nir_src );
>>> fs_reg get_nir_dest(const nir_dest );
>>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp 
>>> b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
>>> index d25d26a..bda4a26 100644
>>> --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
>>> +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
>>> @@ -1761,6 +1761,9 @@ fs_generator::generate_code(const cfg_t *cfg, int 
>>> dispatch_width)
>>>   /* FBL only supports UD type for dst. */
>>>   brw_FBL(p, retype(dst, BRW_REGISTER_TYPE_UD), src[0]);
>>>   break;
>>> +  case BRW_OPCODE_LZD:
>>> + brw_LZD(p, dst, src[0]);
>>> + break;
>>>case BRW_OPCODE_CBIT:
>>>   assert(devinfo->gen >= 7);
>>>   /* CBIT only supports UD type for dst. */
>>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp 
>>> b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>>> index b3f5dfd..f15bf3e 100644
>>> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>>> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>>> @@ -617,6 +617,25 @@ fs_visitor::optimize_frontfacing_ternary(nir_alu_instr 
>>> *instr,
>>>  }
>>>
>>>  void
>>> +fs_visitor::nir_emit_find_msb_using_lzd(const fs_builder ,
>>> +const fs_reg ,
>>> +const fs_reg ,
>>> +bool is_signed)
>>> +{
>>> +   fs_inst *inst;
>>> +
>>> +   bld.LZD(retype(result, BRW_REGISTER_TYPE_UD), src);
>>> +
>>> +   /* LZD counts from the MSB side, while GLSL's findMSB() wants the count
>>> +* from the LSB side. Subtract the result from 31 to convert the MSB
>>> +* count into an LSB count.  If no bits are set, LZD will return 32.
>>> +* 31-32 = -1, which is exactly what findMSB() is supposed to return.
>>> +*/
>>> +   inst = bld.ADD(result, retype(result, BRW_REGISTER_TYPE_D), 
>>> brw_imm_d(31));
>>> +   inst->src[0].negate = true;
>>> +}
>>
>> I'd personally be inclined to just inline these functions. I know they
>> grow somewhat in the next patches... whatever your preference is.
>
> It seems to grow quite a bit in PATCH 16, and it's used in multiple
> places, right?  How about we keep it factored out but make it a
> stand-alone function instead of a visitor method?  It doesn't look like
> it uses *any* internal or external data structures of fs_visitor, it
> doesn't even dereference the 'this' pointer at all AFAICT, so you could
> likely improve encapsulation somewhat by making it a static function
> local to the brw_*_nir.cpp source files.

That sounds like a fine plan to me.


Re: [Mesa-dev] [PATCH 6/8] nv50/ir: optimize ADD3(d, 0x0, b, c) to ADD(d, b, c)

2016-06-30 Thread Samuel Pitoiset



On 07/01/2016 12:44 AM, Ilia Mirkin wrote:

> If moveSources doesn't move modifiers, we have a serious problem.
> However it looks like it does:
>
> void
> Instruction::setSrc(int s, const ValueRef& ref)
> {
>    setSrc(s, ref.get());
>    srcs[s].mod = ref.mod;
> }
>
> which is what moveSources calls.

I was not sure about moveSources() because we have two variants and the
other one doesn't move the modifiers.

> On Thu, Jun 30, 2016 at 6:26 PM, Samuel Pitoiset
>  wrote:
>> And ADD3(d, a, 0x0, c) to ADD(d, a, c) as well.
>>
>> Signed-off-by: Samuel Pitoiset 
>> ---
>>  src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp | 12 +++-
>>  1 file changed, 11 insertions(+), 1 deletion(-)
>>
>> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp 
>> b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
>> index 1cf1fa3..517f779 100644
>> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
>> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
>> @@ -1032,7 +1032,17 @@ ConstantFolding::opnd(Instruction *i, ImmediateValue 
>> , int s)
>>  i->src(0).mod = Modifier(0);
>>    }
>>    break;
>> -
>> +   case OP_ADD3:
>> +  if (i->usesFlags())
>> + break;
>
> Why? ADD can produce/consume a flag just fine.

Well, this is loosely based on OP_ADD which does exactly the same check.

>> +  if (imm0.isInteger(0)) {
>> + i->op = OP_ADD;
>> + for (int k = s; k < 2; k++) {
>> +i->setSrc(k, i->getSrc(k + 1));
>> +i->src(k).mod = i->src(k + 1).mod;
>> + }
>
> aka
>
> i->moveSources(s + 1, -1) ?

Yes.

>> +  }
>> +  break;
>> case OP_DIV:
>>    if (s != 1 || (i->dType != TYPE_S32 && i->dType != TYPE_U32))
>>   break;
>> --
>> 2.8.3



Re: [Mesa-dev] [PATCH 3/8] nv50/ir: optimize ADD(ADD(a, b), c) to ADD3(a, b, c)

2016-06-30 Thread Ilia Mirkin
On Thu, Jun 30, 2016 at 6:47 PM, Samuel Pitoiset
 wrote:
>
>
> On 07/01/2016 12:40 AM, Ilia Mirkin wrote:
>>
>> Doesn't ADD3 only work for integers? I don't see anything here
>> preventing float adds from being merged here...
>
>
> isOpSupported() should do the job because I check if dtype is float.

Ah, indeed. Missed that.


Re: [Mesa-dev] [PATCH 3/8] nv50/ir: optimize ADD(ADD(a, b), c) to ADD3(a, b, c)

2016-06-30 Thread Samuel Pitoiset



On 07/01/2016 12:40 AM, Ilia Mirkin wrote:

> Doesn't ADD3 only work for integers? I don't see anything here
> preventing float adds from being merged here...

isOpSupported() should do the job because I check if dtype is float.

> On Thu, Jun 30, 2016 at 6:26 PM, Samuel Pitoiset
>  wrote:
>> Signed-off-by: Samuel Pitoiset 
>> ---
>>  .../drivers/nouveau/codegen/nv50_ir_peephole.cpp   | 55 ++
>>  1 file changed, 55 insertions(+)
>>
>> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp 
>> b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
>> index 3213188..928923c 100644
>> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
>> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
>> @@ -1531,6 +1531,7 @@ private:
>> void handleABS(Instruction *);
>> bool handleADD(Instruction *);
>> bool tryADDToMADOrSAD(Instruction *, operation toOp);
>> +   bool tryADDToADD3(Instruction *);
>> void handleMINMAX(Instruction *);
>> void handleRCP(Instruction *);
>> void handleSLCT(Instruction *);
>> @@ -1604,6 +1605,8 @@ AlgebraicOpt::handleADD(Instruction *add)
>>    changed = tryADDToMADOrSAD(add, OP_MAD);
>> if (!changed && prog->getTarget()->isOpSupported(OP_SAD, add->dType))
>>    changed = tryADDToMADOrSAD(add, OP_SAD);
>> +   if (!changed && prog->getTarget()->isOpSupported(OP_ADD3, add->dType))
>> +  changed = tryADDToADD3(add);
>> return changed;
>>  }
>>
>> @@ -1674,6 +1677,58 @@ AlgebraicOpt::tryADDToMADOrSAD(Instruction *add, 
>> operation toOp)
>> return true;
>>  }
>>
>> +// ADD(ADD(a,b), c) -> ADD3(a,b,c)
>> +bool
>> +AlgebraicOpt::tryADDToADD3(Instruction *add)
>> +{
>> +   Value *src0 = add->getSrc(0);
>> +   Value *src1 = add->getSrc(1);
>> +   const Modifier modBad = Modifier(~NV50_IR_MOD_NEG);
>> +   Modifier mod[4];
>> +   Value *src;
>> +   int s;
>> +
>> +   if (src0->refCount() == 1 &&
>> +   src0->getUniqueInsn() && src0->getUniqueInsn()->op == OP_ADD)
>> +  s = 0;
>> +   else
>> +   if (src1->refCount() == 1 &&
>> +   src1->getUniqueInsn() && src1->getUniqueInsn()->op == OP_ADD)
>> +  s = 1;
>> +   else
>> +  return false;
>> +
>> +   src = add->getSrc(s);
>> +
>> +   if (src->getUniqueInsn() && src->getUniqueInsn()->bb != add->bb)
>> +  return false;
>> +
>> +   if (src->getInsn()->saturate)
>> +  return false;
>> +
>> +   if (typeSizeof(add->dType) != typeSizeof(src->getInsn()->dType))
>> +  return false;
>> +
>> +   mod[0] = add->src(0).mod;
>> +   mod[1] = add->src(1).mod;
>> +   mod[2] = src->getUniqueInsn()->src(0).mod;
>> +   mod[3] = src->getUniqueInsn()->src(1).mod;
>> +
>> +   if (((mod[0] | mod[1]) | (mod[2] | mod[3])) & modBad)
>> +  return false;
>> +
>> +   add->op = OP_ADD3;
>> +   add->dType = src->getInsn()->dType;
>> +   add->sType = src->getInsn()->sType;
>> +
>> +   add->setSrc(s, src->getInsn()->getSrc(0));
>> +   add->src(s).mod = mod[s] ^ mod[2];
>> +   add->setSrc(2, src->getInsn()->getSrc(1));
>> +   add->src(2).mod = mod[3];
>> +
>> +   return true;
>> +}
>> +
>>  void
>>  AlgebraicOpt::handleMINMAX(Instruction *minmax)
>>  {
>> --
>> 2.8.3



Re: [Mesa-dev] [PATCH 6/8] nv50/ir: optimize ADD3(d, 0x0, b, c) to ADD(d, b, c)

2016-06-30 Thread Ilia Mirkin
If moveSources doesn't move modifiers, we have a serious problem.
However it looks like it does:

void
Instruction::setSrc(int s, const ValueRef& ref)
{
   setSrc(s, ref.get());
   srcs[s].mod = ref.mod;
}

which is what moveSources calls.


On Thu, Jun 30, 2016 at 6:26 PM, Samuel Pitoiset
 wrote:
> And ADD3(d, a, 0x0, c) to ADD(d, a, c) as well.
>
> Signed-off-by: Samuel Pitoiset 
> ---
>  src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp | 12 +++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp 
> b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
> index 1cf1fa3..517f779 100644
> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
> @@ -1032,7 +1032,17 @@ ConstantFolding::opnd(Instruction *i, ImmediateValue 
> , int s)
>  i->src(0).mod = Modifier(0);
>}
>break;
> -
> +   case OP_ADD3:
> +  if (i->usesFlags())
> + break;

Why? ADD can produce/consume a flag just fine.

> +  if (imm0.isInteger(0)) {
> + i->op = OP_ADD;
> + for (int k = s; k < 2; k++) {
> +i->setSrc(k, i->getSrc(k + 1));
> +i->src(k).mod = i->src(k + 1).mod;
> + }

aka

i->moveSources(s + 1, -1) ?

> +  }
> +  break;
> case OP_DIV:
>if (s != 1 || (i->dType != TYPE_S32 && i->dType != TYPE_U32))
>   break;
> --
> 2.8.3
>
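
The point about modifiers travelling with their sources can be seen in a small model of the fold. This is only a sketch under stated assumptions: struct sketch_src, struct sketch_insn and fold_add3_zero_src() are hypothetical and do not reflect the nv50_ir classes. Copying the whole source slot moves the value and its modifier together, which is the behaviour the setSrc(int, const ValueRef&) overload quoted above provides.

```c
#include <assert.h>

enum { SKETCH_OP_ADD, SKETCH_OP_ADD3 };

/* Hypothetical source slot: a value id plus its modifier bits. */
struct sketch_src {
   int value_id;
   unsigned mod;
};

/* Hypothetical instruction with up to three sources. */
struct sketch_insn {
   int op;
   struct sketch_src src[3];
   int num_srcs;
};

/* Drop source s (known to be zero) of a three-source add and shift the
 * later sources down, turning the instruction into a two-source add.
 */
static void
fold_add3_zero_src(struct sketch_insn *i, int s)
{
   assert(i->op == SKETCH_OP_ADD3 && s >= 0 && s < 3);

   for (int k = s; k < 2; k++)
      i->src[k] = i->src[k + 1];   /* copies value and modifier together */

   i->num_srcs = 2;
   i->op = SKETCH_OP_ADD;
}
```

Folding away the middle source of ADD3(a, 0, -c) in this model leaves ADD(a, -c), with the negate modifier still attached to c.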


Re: [Mesa-dev] [PATCH 3/8] nv50/ir: optimize ADD(ADD(a, b), c) to ADD3(a, b, c)

2016-06-30 Thread Ilia Mirkin
Doesn't ADD3 only work for integers? I don't see anything here
preventing float adds from being merged here...

On Thu, Jun 30, 2016 at 6:26 PM, Samuel Pitoiset
 wrote:
> Signed-off-by: Samuel Pitoiset 
> ---
>  .../drivers/nouveau/codegen/nv50_ir_peephole.cpp   | 55 
> ++
>  1 file changed, 55 insertions(+)
>
> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp 
> b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
> index 3213188..928923c 100644
> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
> @@ -1531,6 +1531,7 @@ private:
> void handleABS(Instruction *);
> bool handleADD(Instruction *);
> bool tryADDToMADOrSAD(Instruction *, operation toOp);
> +   bool tryADDToADD3(Instruction *);
> void handleMINMAX(Instruction *);
> void handleRCP(Instruction *);
> void handleSLCT(Instruction *);
> @@ -1604,6 +1605,8 @@ AlgebraicOpt::handleADD(Instruction *add)
>changed = tryADDToMADOrSAD(add, OP_MAD);
> if (!changed && prog->getTarget()->isOpSupported(OP_SAD, add->dType))
>changed = tryADDToMADOrSAD(add, OP_SAD);
> +   if (!changed && prog->getTarget()->isOpSupported(OP_ADD3, add->dType))
> +  changed = tryADDToADD3(add);
> return changed;
>  }
>
> @@ -1674,6 +1677,58 @@ AlgebraicOpt::tryADDToMADOrSAD(Instruction *add, 
> operation toOp)
> return true;
>  }
>
> +// ADD(ADD(a,b), c) -> ADD3(a,b,c)
> +bool
> +AlgebraicOpt::tryADDToADD3(Instruction *add)
> +{
> +   Value *src0 = add->getSrc(0);
> +   Value *src1 = add->getSrc(1);
> +   const Modifier modBad = Modifier(~NV50_IR_MOD_NEG);
> +   Modifier mod[4];
> +   Value *src;
> +   int s;
> +
> +   if (src0->refCount() == 1 &&
> +   src0->getUniqueInsn() && src0->getUniqueInsn()->op == OP_ADD)
> +  s = 0;
> +   else
> +   if (src1->refCount() == 1 &&
> +   src1->getUniqueInsn() && src1->getUniqueInsn()->op == OP_ADD)
> +  s = 1;
> +   else
> +  return false;
> +
> +   src = add->getSrc(s);
> +
> +   if (src->getUniqueInsn() && src->getUniqueInsn()->bb != add->bb)
> +  return false;
> +
> +   if (src->getInsn()->saturate)
> +  return false;
> +
> +   if (typeSizeof(add->dType) != typeSizeof(src->getInsn()->dType))
> +  return false;
> +
> +   mod[0] = add->src(0).mod;
> +   mod[1] = add->src(1).mod;
> +   mod[2] = src->getUniqueInsn()->src(0).mod;
> +   mod[3] = src->getUniqueInsn()->src(1).mod;
> +
> +   if (((mod[0] | mod[1]) | (mod[2] | mod[3])) & modBad)
> +  return false;
> +
> +   add->op = OP_ADD3;
> +   add->dType = src->getInsn()->dType;
> +   add->sType = src->getInsn()->sType;
> +
> +   add->setSrc(s, src->getInsn()->getSrc(0));
> +   add->src(s).mod = mod[s] ^ mod[2];
> +   add->setSrc(2, src->getInsn()->getSrc(1));
> +   add->src(2).mod = mod[3];
> +
> +   return true;
> +}
> +
>  void
>  AlgebraicOpt::handleMINMAX(Instruction *minmax)
>  {
> --
> 2.8.3
>


Re: [Mesa-dev] [PATCH 18/24] i965: Update assertion to account for Gen < 7

2016-06-30 Thread Francisco Jerez
Ian Romanick  writes:

> From: Ian Romanick 
>
> Previously SHADER_OPCODE_MULH could only exist on Gen7+, so the
> assertion assumed the Gen7+ accumulator rules.  A future patch will
> allow this instruction on at least Gen6, so update the assertion.
>
> Signed-off-by: Ian Romanick 
> ---
>  src/mesa/drivers/dri/i965/brw_fs.cpp | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
> b/src/mesa/drivers/dri/i965/brw_fs.cpp
> index 2f473cc..0a48df9 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> @@ -43,6 +43,9 @@
>  
>  using namespace brw;
>  
> +static unsigned get_fpu_lowered_simd_width(const struct brw_device_info *,
> +   const fs_inst *);
> +
>  void
>  fs_inst::init(enum opcode opcode, uint8_t exec_size, const fs_reg ,
>const fs_reg *src, unsigned sources)
> @@ -3658,7 +3661,8 @@ fs_visitor::lower_integer_multiplication()
>  
>} else if (inst->opcode == SHADER_OPCODE_MULH) {
>   /* Should have been lowered to 8-wide. */
> - assert(inst->exec_size <= 8);
> + assert(inst->exec_size <=
> +(devinfo->gen >= 7 ? 8 : get_fpu_lowered_simd_width(devinfo, 
> inst)));

Instead of get_fpu_lowered_simd_width you could call
get_lowered_simd_width, which expands to same conditional expression you
have put in the assertion for opcode == SHADER_OPCODE_MULH.

>   const fs_reg acc = retype(brw_acc_reg(inst->exec_size),
> inst->dst.type);
>   fs_inst *mul = ibld.MUL(acc, inst->src[0], inst->src[1]);
> -- 
> 2.5.5
>


Re: [Mesa-dev] [PATCH 15/24] i965: Use LZD to implement nir_op_ufind_msb

2016-06-30 Thread Francisco Jerez
Matt Turner  writes:

> On Wed, Jun 29, 2016 at 2:04 PM, Ian Romanick  wrote:
>> From: Ian Romanick 
>>
>> This uses one less instruction.
>
> Add FBH to the list of stupid instructions.
>
>> Signed-off-by: Ian Romanick 
>> ---
>>  src/mesa/drivers/dri/i965/brw_fs.h   |  4 
>>  src/mesa/drivers/dri/i965/brw_fs_generator.cpp   |  3 +++
>>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 26 
>> +++-
>>  src/mesa/drivers/dri/i965/brw_vec4.h |  4 
>>  src/mesa/drivers/dri/i965/brw_vec4_generator.cpp |  3 +++
>>  src/mesa/drivers/dri/i965/brw_vec4_nir.cpp   | 22 
>>  6 files changed, 61 insertions(+), 1 deletion(-)
>>
>> diff --git a/src/mesa/drivers/dri/i965/brw_fs.h 
>> b/src/mesa/drivers/dri/i965/brw_fs.h
>> index 4237197..22ce092 100644
>> --- a/src/mesa/drivers/dri/i965/brw_fs.h
>> +++ b/src/mesa/drivers/dri/i965/brw_fs.h
>> @@ -237,6 +237,10 @@ public:
>>   nir_tex_instr *instr);
>> void nir_emit_jump(const brw::fs_builder ,
>>nir_jump_instr *instr);
>> +   void nir_emit_find_msb_using_lzd(const brw::fs_builder &bld,
>> +const fs_reg &result,
>> +const fs_reg &src,
>> +bool is_signed);
>> fs_reg get_nir_src(const nir_src );
>> fs_reg get_nir_src_imm(const nir_src );
>> fs_reg get_nir_dest(const nir_dest );
>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp 
>> b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
>> index d25d26a..bda4a26 100644
>> --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
>> +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
>> @@ -1761,6 +1761,9 @@ fs_generator::generate_code(const cfg_t *cfg, int 
>> dispatch_width)
>>   /* FBL only supports UD type for dst. */
>>   brw_FBL(p, retype(dst, BRW_REGISTER_TYPE_UD), src[0]);
>>   break;
>> +  case BRW_OPCODE_LZD:
>> + brw_LZD(p, dst, src[0]);
>> + break;
>>case BRW_OPCODE_CBIT:
>>   assert(devinfo->gen >= 7);
>>   /* CBIT only supports UD type for dst. */
>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp 
>> b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>> index b3f5dfd..f15bf3e 100644
>> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>> @@ -617,6 +617,25 @@ fs_visitor::optimize_frontfacing_ternary(nir_alu_instr 
>> *instr,
>>  }
>>
>>  void
>> +fs_visitor::nir_emit_find_msb_using_lzd(const fs_builder &bld,
>> +const fs_reg &result,
>> +const fs_reg &src,
>> +bool is_signed)
>> +{
>> +   fs_inst *inst;
>> +
>> +   bld.LZD(retype(result, BRW_REGISTER_TYPE_UD), src);
>> +
>> +   /* LZD counts from the MSB side, while GLSL's findMSB() wants the count
>> +* from the LSB side. Subtract the result from 31 to convert the MSB
>> +* count into an LSB count.  If no bits are set, LZD will return 32.
>> +* 31-32 = -1, which is exactly what findMSB() is supposed to return.
>> +*/
>> +   inst = bld.ADD(result, retype(result, BRW_REGISTER_TYPE_D), 
>> brw_imm_d(31));
>> +   inst->src[0].negate = true;
>> +}
>
> I'd personally be inclined to just inline these functions. I know they
> grow somewhat in the next patches... whatever your preference is.

It seems to grow quite a bit in PATCH 16, and it's used in multiple
places, right?  How about we keep it factored out but make it a
stand-alone function instead of a visitor method?  It doesn't look like
it uses *any* internal or external data structures of fs_visitor, it
doesn't even dereference the 'this' pointer at all AFAICT, so you could
likely improve encapsulation somewhat by making it a static function
local to the brw_*_nir.cpp source files.
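
For reference, the arithmetic described in the quoted comment can be modelled on the CPU as a free function that touches no visitor state, in the spirit of the stand-alone helper suggested above. This is only a sketch: lzd32 and find_msb_via_lzd are hypothetical names that emulate the hardware LZD in plain C++, and they are not Mesa functions.

```cpp
#include <cstdint>

// Software stand-in for LZD (hypothetical helper): number of leading zero
// bits in a 32-bit value, 32 when the input is zero.
static int lzd32(uint32_t v)
{
   int n = 0;
   for (uint32_t bit = 0x80000000u; bit != 0 && !(v & bit); bit >>= 1)
      n++;
   return n;
}

// findMSB() for an unsigned input: 31 - LZD(v).  For v == 0 this gives
// 31 - 32 = -1, matching the behaviour described in the comment.
static int find_msb_via_lzd(uint32_t v)
{
   return 31 - lzd32(v);
}
```

Because the helper depends only on its arguments, nothing about it requires being a method of the visitor class.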



[Mesa-dev] [PATCH 6/8] nv50/ir: optimize ADD3(d, 0x0, b, c) to ADD(d, b, c)

2016-06-30 Thread Samuel Pitoiset
And ADD3(d, a, 0x0, c) to ADD(d, a, c) as well.

Signed-off-by: Samuel Pitoiset 
---
 src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
index 1cf1fa3..517f779 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
@@ -1032,7 +1032,17 @@ ConstantFolding::opnd(Instruction *i, ImmediateValue &imm0, int s)
 i->src(0).mod = Modifier(0);
   }
   break;
-
+   case OP_ADD3:
+  if (i->usesFlags())
+ break;
+  if (imm0.isInteger(0)) {
+ i->op = OP_ADD;
+ for (int k = s; k < 2; k++) {
+i->setSrc(k, i->getSrc(k + 1));
+i->src(k).mod = i->src(k + 1).mod;
+ }
+  }
+  break;
case OP_DIV:
   if (s != 1 || (i->dType != TYPE_S32 && i->dType != TYPE_U32))
  break;
-- 
2.8.3
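
As a rough illustration of the fold in this patch, here is a toy model of dropping a zero source from a three-source add. ToyInsn and foldZeroSource are hypothetical names for this sketch only and do not correspond to the nv50_ir classes; sources are modelled as plain integers, with the value 0 standing in for a zero immediate. The sketch also clears the third slot after shifting, which is an assumption about the intended end state rather than something taken from the patch.

```cpp
#include <array>
#include <cstdint>

// Toy instruction (hypothetical): up to three source slots, `count` in use.
struct ToyInsn {
   bool isAdd3;
   std::array<int32_t, 3> src;
   int count;
};

// If source `s` (0 or 1) of an ADD3 is zero, drop it, shift the later
// sources down by one slot and turn the instruction into a 2-source ADD.
static bool foldZeroSource(ToyInsn &insn, int s)
{
   if (!insn.isAdd3 || s < 0 || s > 1 || insn.src[s] != 0)
      return false;
   for (int k = s; k < 2; k++)
      insn.src[k] = insn.src[k + 1];
   insn.src[2] = 0;        // third slot is no longer used
   insn.count = 2;
   insn.isAdd3 = false;    // now a plain ADD
   return true;
}
```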



[Mesa-dev] [PATCH 7/8] nv50/ir: optimize ADD3(d, a, b, c) to ADD(d, c, a + b)

2016-06-30 Thread Samuel Pitoiset
This is similar to what we already do for MAD/FMA.

Signed-off-by: Samuel Pitoiset 
---
 src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
index 517f779..552672f 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
@@ -567,6 +567,14 @@ ConstantFolding::expr(Instruction *i,
  return;
   }
   break;
+   case OP_ADD3:
+  switch (i->dType) {
+  case TYPE_S32:
+  case TYPE_U32: res.data.u32 = a->data.u32 + b->data.u32; break;
+  default:
+ return;
+  }
+  break;
case OP_POW:
   switch (i->dType) {
   case TYPE_F32: res.data.f32 = pow(a->data.f32, b->data.f32); break;
@@ -683,7 +691,8 @@ ConstantFolding::expr(Instruction *i,
 
switch (i->op) {
case OP_MAD:
-   case OP_FMA: {
+   case OP_FMA:
+   case OP_ADD3: {
   ImmediateValue src0, src1 = *i->getSrc(0)->asImm();
 
   // Move the immediate into position 1, where we know it might be
-- 
2.8.3



[Mesa-dev] [PATCH 8/8] nv50/ir: optimize ADD3(d, a, b, c) to ADD(d, a, b + c)

2016-06-30 Thread Samuel Pitoiset
And ADD3(d, a, b, c) to ADD(d, b, a + c) as well.

This doesn't change the world but it can reduce the number of
instructions in some situations:

total instructions in shared programs :97191 -> 97175 (-0.02%)
total gprs used in shared programs:29196 -> 29196 (0.00%)
total local used in shared programs   :22128 -> 22128 (0.00%)

Signed-off-by: Samuel Pitoiset 
---
 .../drivers/nouveau/codegen/nv50_ir_peephole.cpp   | 62 ++
 1 file changed, 62 insertions(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
index 552672f..3ff3e64 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
@@ -336,6 +336,7 @@ private:
void expr(Instruction *, ImmediateValue&, ImmediateValue&);
void expr(Instruction *, ImmediateValue&, ImmediateValue&, ImmediateValue&);
void opnd(Instruction *, ImmediateValue&, int s);
+   void opnd2(Instruction *, ImmediateValue&, int, ImmediateValue&, int);
void opnd3(Instruction *, ImmediateValue&);
 
void unary(Instruction *, const ImmediateValue&);
@@ -391,6 +392,13 @@ ConstantFolding::visit(BasicBlock *bb)
  opnd(i, src1, 1);
   if (i->srcExists(2) && i->src(2).getImmediate(src2))
  opnd3(i, src2);
+  if (i->srcExists(2) &&
+  i->src(0).getImmediate(src0) && i->src(2).getImmediate(src2))
+ opnd2(i, src0, 0, src2, 2);
+  else
+  if (i->srcExists(2) &&
+  i->src(1).getImmediate(src1) && i->src(2).getImmediate(src2))
+ opnd2(i, src1, 1, src2, 2);
}
return true;
 }
@@ -922,6 +930,60 @@ ConstantFolding::opnd3(Instruction *i, ImmediateValue &imm2)
 }
 
 void
+ConstantFolding::opnd2(Instruction *i, ImmediateValue , int s0,
+   ImmediateValue , int s1)
+{
+   struct Storage *const a = , *const b = 
+   ImmediateValue src0, src1;
+   struct Storage res;
+   DataType type = i->dType;
+
+   memset(&res.data, 0, sizeof(res.data));
+
+   switch (i->op) {
+   case OP_ADD3:
+  switch (i->dType) {
+  case TYPE_S32:
+  case TYPE_U32: res.data.u32 = a->data.u32 + b->data.u32; break;
+  default:
+ return;
+  }
+  break;
+   default:
+  return;
+   }
+   ++foldCount;
+
+   i->op = OP_ADD;
+
+   if (s0 == 0) {
+  i->setSrc(0, i->getSrc(1));
+  i->src(0).mod = i->src(1).mod;
+   }
+
+   i->setSrc(1, new_ImmediateValue(i->bb->getProgram(), res.data.u32));
+   i->setSrc(2, NULL);
+
+   i->getSrc(1)->reg.data = res.data;
+   i->getSrc(1)->reg.type = type;
+   i->getSrc(1)->reg.size = typeSizeof(type);
+
+   src1 = *i->getSrc(1)->asImm();
+
+   // Move the immediate into position 1, where we know it might be
+   // emittable. However it might not be anyways, as there may be other
+   // restrictions, so move it into a separate LValue.
+   bld.setPosition(i, false);
+   i->setSrc(1, bld.mkMov(bld.getSSA(type), i->getSrc(1), type)->getDef(0));
+   i->src(1).mod = Modifier(0);
+
+   if (i->src(0).getImmediate(src0))
+  expr(i, src0, src1);
+   else
+  opnd(i, src1, 1);
+}
+
+void
 ConstantFolding::opnd(Instruction *i, ImmediateValue &imm0, int s)
 {
const int t = !s;
-- 
2.8.3
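
The effect of folding immediates in an integer ADD3 can be sketched outside the compiler as follows. Operand and foldAdd3 are hypothetical names used only for this illustration, and registers are modelled as opaque ids. The sum uses unsigned 32-bit arithmetic, so it wraps the same way the u32 addition in the patch does.

```cpp
#include <cstdint>
#include <vector>

// Toy operand (hypothetical): either a constant or an opaque register id.
struct Operand {
   bool isImm;      // true: `value` is a constant; false: a register id
   uint32_t value;
};

// Collapse all immediate sources of an integer ADD3 into one constant.
// Returns the remaining operands: registers first, then a single immediate
// holding the wrapping sum if any immediate was present.
static std::vector<Operand> foldAdd3(const Operand (&src)[3])
{
   std::vector<Operand> out;
   uint32_t sum = 0;
   bool sawImm = false;
   for (const Operand &o : src) {
      if (o.isImm) {
         sum += o.value;   // unsigned addition wraps modulo 2^32
         sawImm = true;
      } else {
         out.push_back(o);
      }
   }
   if (sawImm)
      out.push_back({true, sum});
   return out;
}
```

With two immediates the result has two operands, which corresponds to a plain ADD; with three immediates only the constant remains, which corresponds to the MOV case from patch 4/8.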



[Mesa-dev] [PATCH 5/8] nv50/ir: optimize ADD3(d, a, b, 0x0) to ADD(d, a, b)

2016-06-30 Thread Samuel Pitoiset
Signed-off-by: Samuel Pitoiset 
---
 src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp | 8 
 1 file changed, 8 insertions(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
index f136c58..1cf1fa3 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
@@ -899,6 +899,14 @@ ConstantFolding::opnd3(Instruction *i, ImmediateValue &imm2)
  return;
   }
   break;
+   case OP_ADD3:
+  if (imm2.isInteger(0)) {
+ i->op = OP_ADD;
+ i->setSrc(2, NULL);
+ foldCount++;
+ break;
+  }
+  break;
default:
   return;
}
-- 
2.8.3



[Mesa-dev] [PATCH 3/8] nv50/ir: optimize ADD(ADD(a, b), c) to ADD3(a, b, c)

2016-06-30 Thread Samuel Pitoiset
Signed-off-by: Samuel Pitoiset 
---
 .../drivers/nouveau/codegen/nv50_ir_peephole.cpp   | 55 ++
 1 file changed, 55 insertions(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
index 3213188..928923c 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
@@ -1531,6 +1531,7 @@ private:
void handleABS(Instruction *);
bool handleADD(Instruction *);
bool tryADDToMADOrSAD(Instruction *, operation toOp);
+   bool tryADDToADD3(Instruction *);
void handleMINMAX(Instruction *);
void handleRCP(Instruction *);
void handleSLCT(Instruction *);
@@ -1604,6 +1605,8 @@ AlgebraicOpt::handleADD(Instruction *add)
   changed = tryADDToMADOrSAD(add, OP_MAD);
if (!changed && prog->getTarget()->isOpSupported(OP_SAD, add->dType))
   changed = tryADDToMADOrSAD(add, OP_SAD);
+   if (!changed && prog->getTarget()->isOpSupported(OP_ADD3, add->dType))
+  changed = tryADDToADD3(add);
return changed;
 }
 
@@ -1674,6 +1677,58 @@ AlgebraicOpt::tryADDToMADOrSAD(Instruction *add, 
operation toOp)
return true;
 }
 
+// ADD(ADD(a,b), c) -> ADD3(a,b,c)
+bool
+AlgebraicOpt::tryADDToADD3(Instruction *add)
+{
+   Value *src0 = add->getSrc(0);
+   Value *src1 = add->getSrc(1);
+   const Modifier modBad = Modifier(~NV50_IR_MOD_NEG);
+   Modifier mod[4];
+   Value *src;
+   int s;
+
+   if (src0->refCount() == 1 &&
+   src0->getUniqueInsn() && src0->getUniqueInsn()->op == OP_ADD)
+  s = 0;
+   else
+   if (src1->refCount() == 1 &&
+   src1->getUniqueInsn() && src1->getUniqueInsn()->op == OP_ADD)
+  s = 1;
+   else
+  return false;
+
+   src = add->getSrc(s);
+
+   if (src->getUniqueInsn() && src->getUniqueInsn()->bb != add->bb)
+  return false;
+
+   if (src->getInsn()->saturate)
+  return false;
+
+   if (typeSizeof(add->dType) != typeSizeof(src->getInsn()->dType))
+  return false;
+
+   mod[0] = add->src(0).mod;
+   mod[1] = add->src(1).mod;
+   mod[2] = src->getUniqueInsn()->src(0).mod;
+   mod[3] = src->getUniqueInsn()->src(1).mod;
+
+   if (((mod[0] | mod[1]) | (mod[2] | mod[3])) & modBad)
+  return false;
+
+   add->op = OP_ADD3;
+   add->dType = src->getInsn()->dType;
+   add->sType = src->getInsn()->sType;
+
+   add->setSrc(s, src->getInsn()->getSrc(0));
+   add->src(s).mod = mod[s] ^ mod[2];
+   add->setSrc(2, src->getInsn()->getSrc(1));
+   add->src(2).mod = mod[3];
+
+   return true;
+}
+
 void
 AlgebraicOpt::handleMINMAX(Instruction *minmax)
 {
-- 
2.8.3
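
One detail worth checking when fusing nested adds is how a negate modifier on the outer operand propagates. The sketch below only verifies the arithmetic identity with wrapping 32-bit integers: a negate on the inner result has to be combined (XORed) into both inner sources. Src, nestedAdd and fusedAdd3 are hypothetical names for this illustration and are not part of nv50_ir.

```cpp
#include <cstdint>

// Toy source (hypothetical): a 32-bit value with an optional negate modifier.
struct Src {
   uint32_t value;
   bool neg;
};

// Apply the negate modifier using wrapping unsigned arithmetic.
static uint32_t applyNeg(uint32_t v, bool neg)
{
   return neg ? 0u - v : v;
}

// Reference: outer ADD whose first operand is the result of an inner
// ADD(a, b), optionally negated by the outer instruction.
static uint32_t nestedAdd(Src a, Src b, Src c, bool negInner)
{
   const uint32_t inner = applyNeg(a.value, a.neg) + applyNeg(b.value, b.neg);
   return applyNeg(inner, negInner) + applyNeg(c.value, c.neg);
}

// Fused three-source form: the outer negate is XORed into both a and b.
static uint32_t fusedAdd3(Src a, Src b, Src c, bool negInner)
{
   return applyNeg(a.value, a.neg != negInner) +
          applyNeg(b.value, b.neg != negInner) +
          applyNeg(c.value, c.neg);
}
```

For example, 10 - (5 + 3) and 10 + (-5) + (-3) both give 2, while negating only the first inner source would not.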



[Mesa-dev] [PATCH 1/8] nv50/ir: add preliminary support for OP_ADD3

2016-06-30 Thread Samuel Pitoiset
This instruction was introduced with SM50 (Maxwell) and performs an add
with three sources. Unfortunately, it only supports integers.

Signed-off-by: Samuel Pitoiset 
---
 src/gallium/drivers/nouveau/codegen/nv50_ir.h| 1 +
 src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp| 1 +
 src/gallium/drivers/nouveau/codegen/nv50_ir_target.cpp   | 3 +++
 src/gallium/drivers/nouveau/codegen/nv50_ir_target_gm107.cpp | 4 
 src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp  | 1 +
 src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.cpp  | 5 -
 6 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir.h 
b/src/gallium/drivers/nouveau/codegen/nv50_ir.h
index 94e54bb..6afe7c7 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir.h
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir.h
@@ -163,6 +163,7 @@ enum operation
OP_SHFL, // warp shuffle
OP_VOTE,
OP_BUFQ, // buffer query
+   OP_ADD3,
OP_LAST
 };
 
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp
index 6469f71..d88bdce 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp
@@ -192,6 +192,7 @@ const char *operationStr[OP_LAST + 1] =
"shfl",
"vote",
"bufq",
+   "add3",
"(invalid)"
 };
 
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_target.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_target.cpp
index cc98d32..8067787 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_target.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_target.cpp
@@ -57,6 +57,7 @@ const uint8_t Target::operationSrcNr[] =
3,  // SHFL
1,  // VOTE
1,  // BUFQ
+   3,  // ADD3
0
 };
 
@@ -135,6 +136,8 @@ const OpClass Target::operationClass[] =
OPCLASS_OTHER,
// BUFQ
OPCLASS_OTHER,
+   // ADD3
+   OPCLASS_ARITH,
OPCLASS_PSEUDO // LAST
 };
 
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_gm107.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_gm107.cpp
index 92caeb2..5608b5e 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_gm107.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_gm107.cpp
@@ -61,6 +61,10 @@ TargetGM107::isOpSupported(operation op, DataType ty) const
case OP_DIV:
case OP_MOD:
   return false;
+   case OP_ADD3:
+  if (isFloatType(ty))
+ return false;
+  break;
default:
   break;
}
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp
index b37ea73..e1a7963 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp
@@ -437,6 +437,7 @@ TargetNV50::isOpSupported(operation op, DataType ty) const
case OP_EXTBF:
case OP_EXIT: // want exit modifier instead (on NOP if required)
case OP_MEMBAR:
+   case OP_ADD3:
   return false;
case OP_SAD:
   return ty == TYPE_S32;
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.cpp
index 932ec39..f895019 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.cpp
@@ -147,7 +147,8 @@ static const struct opProperties _initProps[] =
{ OP_SUSTP,   0x0, 0x0, 0x0, 0x0, 0x2, 0x0 },
{ OP_SUCLAMP, 0x0, 0x0, 0x0, 0x0, 0x2, 0x2 },
{ OP_SUBFM,   0x0, 0x0, 0x0, 0x0, 0x6, 0x2 },
-   { OP_SUEAU,   0x0, 0x0, 0x0, 0x0, 0x6, 0x2 }
+   { OP_SUEAU,   0x0, 0x0, 0x0, 0x0, 0x6, 0x2 },
+   { OP_ADD3,0x7, 0x0, 0x0, 0x0, 0x2, 0x2 },
 };
 
 void TargetNVC0::initOpInfo()
@@ -415,6 +416,8 @@ TargetNVC0::isOpSupported(operation op, DataType ty) const
   return false;
if (op == OP_POW || op == OP_SQRT || op == OP_DIV || op == OP_MOD)
   return false;
+   if (op == OP_ADD3)
+  return false;
return true;
 }
 
-- 
2.8.3



[Mesa-dev] [PATCH 4/8] nv50/ir: optimize ADD3(d, a, b, c) to MOV(d, a + b + c)

2016-06-30 Thread Samuel Pitoiset
Signed-off-by: Samuel Pitoiset 
---
 src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
index 928923c..f136c58 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
@@ -759,6 +759,17 @@ ConstantFolding::expr(Instruction *i,
   }
   break;
}
+   case OP_ADD3: {
+  switch (i->dType) {
+  case TYPE_S32:
+  case TYPE_U32:
+ res.data.u32 = a->data.u32 + b->data.u32 + c->data.u32;
+ break;
+  default:
+ return;
+  }
+  break;
+   }
default:
   return;
}
-- 
2.8.3



[Mesa-dev] [PATCH 2/8] gm107/ir: add emission for OP_ADD3

2016-06-30 Thread Samuel Pitoiset
Signed-off-by: Samuel Pitoiset 
---
 .../drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp | 34 ++
 1 file changed, 34 insertions(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
index 2c5e8f6..c6dedc2 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
@@ -150,6 +150,7 @@ private:
void emitLOP();
void emitNOT();
void emitIADD();
+   void emitIADD3();
void emitIMUL();
void emitIMAD();
void emitIMNMX();
@@ -1718,6 +1719,36 @@ CodeEmitterGM107::emitIADD()
 }
 
 void
+CodeEmitterGM107::emitIADD3()
+{
+   switch (insn->src(1).getFile()) {
+   case FILE_GPR:
+  emitInsn(0x5cc0);
+  emitGPR (0x14, insn->src(1));
+  break;
+   case FILE_MEMORY_CONST:
+  emitInsn(0x4cc0);
+  emitCBUF(0x22, -1, 0x14, 16, 2, insn->src(1));
+  break;
+   case FILE_IMMEDIATE:
+  emitInsn(0x38c0);
+  emitIMMD(0x14, 19, insn->src(1));
+  break;
+   default:
+  assert(!"bad src1 file");
+  break;
+   }
+   emitNEG(0x33, insn->src(0));
+   emitNEG(0x32, insn->src(1));
+   emitNEG(0x31, insn->src(2));
+   emitX  (0x30);
+   emitCC (0x2f);
+   emitGPR(0x27, insn->src(2));
+   emitGPR(0x08, insn->src(0));
+   emitGPR(0x00, insn->def(0));
+}
+
+void
 CodeEmitterGM107::emitIMUL()
 {
if (insn->src(1).getFile() != FILE_IMMEDIATE) {
@@ -2930,6 +2961,9 @@ CodeEmitterGM107::emitInstruction(Instruction *i)
  emitIADD();
   }
   break;
+   case OP_ADD3:
+  emitIADD3();
+  break;
case OP_MUL:
   if (isFloatType(insn->dType)) {
  if (insn->dType == TYPE_F64)
-- 
2.8.3



Re: [Mesa-dev] [PATCH 00/24] Add MESA_shader_integer_functions

2016-06-30 Thread Matt Turner
On Thu, Jun 30, 2016 at 12:00 PM, Matt Turner  wrote:
> On Thu, Jun 30, 2016 at 11:37 AM, Matt Turner  wrote:
>> Patches 1-5 are
>>
>> Reviewed-by: Matt Turner 
>
> As are 6-14.

18-22 as well.

There's weirdness about LZD of signed sources. I sent a comment.


Re: [Mesa-dev] [PATCH] Make single-buffered GLES representation internally consistent

2016-06-30 Thread Stéphane Marchesin
On Thu, Jun 30, 2016 at 3:20 PM, Gurchetan Singh
 wrote:
> There are a few places in the code where clearing and reading are done on
> incorrect buffers for GLES contexts.  See comments for details.  This
> fixes 75 GLES3 dEQP tests on the surfaceless platform with no regressions.
>
> v2: Corrected unclear comment
> v3: Make the change in context.c instead of get.c
> v4: Removed whitespace

I looked for a better way than initializing from makecurrent, but
there doesn't seem to be one, so...

Reviewed-by: 

> ---
>  src/mesa/main/buffers.c | 14 --
>  src/mesa/main/clear.c   |  8 
>  src/mesa/main/context.c | 10 ++
>  3 files changed, 30 insertions(+), 2 deletions(-)
>
> diff --git a/src/mesa/main/buffers.c b/src/mesa/main/buffers.c
> index e8aedde..86696b8 100644
> --- a/src/mesa/main/buffers.c
> +++ b/src/mesa/main/buffers.c
> @@ -173,12 +173,22 @@ draw_buffer_enum_to_bitmask(const struct gl_context 
> *ctx, GLenum buffer)
>   * return -1 for an invalid buffer.
>   */
>  static gl_buffer_index
> -read_buffer_enum_to_index(GLenum buffer)
> +read_buffer_enum_to_index(const struct gl_context *ctx, GLenum buffer)
>  {
> switch (buffer) {
>case GL_FRONT:
>   return BUFFER_FRONT_LEFT;
>case GL_BACK:
> + if (_mesa_is_gles(ctx)) {
> +/* In draw_buffer_enum_to_bitmask, when GLES contexts draw to
> + * GL_BACK with a single-buffered configuration, we actually end
> + * up drawing to the sole front buffer in our internal
> + * representation.  For consistency, we must read from that
> + * front left buffer too.
> + */
> +if (!ctx->DrawBuffer->Visual.doubleBufferMode)
> +   return BUFFER_FRONT_LEFT;
> + }
>   return BUFFER_BACK_LEFT;
>case GL_RIGHT:
>   return BUFFER_FRONT_RIGHT;
> @@ -724,7 +734,7 @@ read_buffer(struct gl_context *ctx, struct gl_framebuffer 
> *fb,
>if (_mesa_is_gles3(ctx) && !is_legal_es3_readbuffer_enum(buffer))
>   srcBuffer = -1;
>else
> - srcBuffer = read_buffer_enum_to_index(buffer);
> + srcBuffer = read_buffer_enum_to_index(ctx, buffer);
>
>if (srcBuffer == -1) {
>   _mesa_error(ctx, GL_INVALID_ENUM,
> diff --git a/src/mesa/main/clear.c b/src/mesa/main/clear.c
> index 35b912c..a1bb36e 100644
> --- a/src/mesa/main/clear.c
> +++ b/src/mesa/main/clear.c
> @@ -267,6 +267,14 @@ make_color_buffer_mask(struct gl_context *ctx, GLint 
> drawbuffer)
>   mask |= BUFFER_BIT_FRONT_RIGHT;
>break;
> case GL_BACK:
> +  /* For GLES contexts with a single buffered configuration, we actually
> +   * only have a front renderbuffer, so any clear calls to GL_BACK should
> +   * affect that buffer. See draw_buffer_enum_to_bitmask for details.
> +   */
> +  if (_mesa_is_gles(ctx))
> + if (!ctx->DrawBuffer->Visual.doubleBufferMode)
> +if (att[BUFFER_FRONT_LEFT].Renderbuffer)
> +   mask |= BUFFER_BIT_FRONT_LEFT;
>if (att[BUFFER_BACK_LEFT].Renderbuffer)
>   mask |= BUFFER_BIT_BACK_LEFT;
>if (att[BUFFER_BACK_RIGHT].Renderbuffer)
> diff --git a/src/mesa/main/context.c b/src/mesa/main/context.c
> index 85cd779..8925626 100644
> --- a/src/mesa/main/context.c
> +++ b/src/mesa/main/context.c
> @@ -1676,6 +1676,16 @@ _mesa_make_current( struct gl_context *newCtx,
>   }
>   if (!newCtx->ReadBuffer || _mesa_is_winsys_fbo(newCtx->ReadBuffer)) 
> {
>  _mesa_reference_framebuffer(&newCtx->ReadBuffer, readBuffer);
> +/* In _mesa_initialize_window_framebuffer, for single-buffered
> + * visuals, the ColorReadBuffer is set to be GL_FRONT, even with
> + * GLES contexts. When calling read_buffer, we verify we are 
> reading
> + * from GL_BACK in is_legal_es3_readbuffer_enum.  But the 
> default is
> + * incorrect, and certain dEQP tests check this.  So fix it here.
> + */
> +if (_mesa_is_gles(newCtx) &&
> +   !newCtx->ReadBuffer->Visual.doubleBufferMode)
> +   if (newCtx->ReadBuffer->ColorReadBuffer == GL_FRONT)
> +  newCtx->ReadBuffer->ColorReadBuffer = GL_BACK;
>   }
>
>   /* XXX only set this flag if we're really changing the draw/read
> --
> 2.1.2
>


Re: [Mesa-dev] [PATCH 16/24] i965: Use LZD to implement nir_op_ifind_msb on Gen < 7

2016-06-30 Thread Matt Turner
On Wed, Jun 29, 2016 at 2:04 PM, Ian Romanick  wrote:
> From: Ian Romanick 
>
> Signed-off-by: Ian Romanick 
> ---
>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp   | 50 ++--
>  src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 52 
> +++---
>  2 files changed, 81 insertions(+), 21 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp 
> b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> index f15bf3e..f8db28a 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> @@ -623,8 +623,32 @@ fs_visitor::nir_emit_find_msb_using_lzd(const fs_builder &bld,
>  bool is_signed)
>  {
> fs_inst *inst;
> +   fs_reg temp = src;
>
> -   bld.LZD(retype(result, BRW_REGISTER_TYPE_UD), src);
> +   if (is_signed) {
> +  /* LZD of an absolute value source almost always does the right
> +   * thing.  There are two problem values:
> +   *
> +   * * 0x80000000.  Since abs(0x80000000) == 0x80000000, LZD returns
> +   *   0.  However, findMSB(int(0x80000000)) == 30.
> +   *
> +   * * 0xffffffff.  Since abs(0xffffffff) == 1, LZD returns
> +   *   31.  Section 8.8 (Integer Functions) of the GLSL 4.50 spec says:
> +   *
> +   *For a value of zero or negative one, -1 will be returned.
> +   *
> +   * For all negative number cases, including 0x80000000 and
> +   * 0xffffffff, the correct value is obtained from LZD if instead of

Interesting, both the G45 and IVB docs (I didn't check others, I
suspect they say the same) say

"If the source is signed, the abs source modifier must be used to
convert any negative source value to a positive value."

I suppose that means you're supposed to retype the src to UD below?

> +   * negating the (already negative) value the logical-not is used.  A
> +   * conditional logical-not can be achieved in two instructions.
> +   */
> +  temp = vgrf(glsl_type::int_type);
> +
> +  bld.ASR(temp, src, brw_imm_d(31));
> +  bld.XOR(temp, temp, src);
> +   }
> +
> +   bld.LZD(retype(result, BRW_REGISTER_TYPE_UD), temp);
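
The conditional logical-not described in the quoted comment can be checked on the CPU with a short sketch. countLeadingZeros32 and ifind_msb_via_lzd are hypothetical names that emulate LZD in plain C++; the sign mask is built with a comparison rather than a shift so the code does not rely on how the compiler shifts negative values.

```cpp
#include <cstdint>

// Software stand-in for LZD (hypothetical helper): leading zero count of a
// 32-bit value, 32 when the input is zero.
static int countLeadingZeros32(uint32_t v)
{
   int n = 0;
   for (uint32_t bit = 0x80000000u; bit != 0 && !(v & bit); bit >>= 1)
      n++;
   return n;
}

// findMSB() for a signed input: negative values are bitwise inverted
// (the ASR-by-31 plus XOR pair in the patch) before taking 31 - LZD.
static int ifind_msb_via_lzd(int32_t x)
{
   const uint32_t bits = static_cast<uint32_t>(x);
   // Equivalent of ASR by 31: all ones for negative x, zero otherwise.
   const uint32_t signMask = x < 0 ? 0xffffffffu : 0u;
   return 31 - countLeadingZeros32(bits ^ signMask);
}
```

Both problem values from the comment come out as expected: the most negative integer gives 30, and negative one gives -1.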


[Mesa-dev] [PATCH] Make single-buffered GLES representation internally consistent

2016-06-30 Thread Gurchetan Singh
There are a few places in the code where clearing and reading are done on
incorrect buffers for GLES contexts.  See comments for details.  This
fixes 75 GLES3 dEQP tests on the surfaceless platform with no regressions.

v2: Corrected unclear comment
v3: Make the change in context.c instead of get.c
v4: Removed whitespace
---
 src/mesa/main/buffers.c | 14 --
 src/mesa/main/clear.c   |  8 
 src/mesa/main/context.c | 10 ++
 3 files changed, 30 insertions(+), 2 deletions(-)

diff --git a/src/mesa/main/buffers.c b/src/mesa/main/buffers.c
index e8aedde..86696b8 100644
--- a/src/mesa/main/buffers.c
+++ b/src/mesa/main/buffers.c
@@ -173,12 +173,22 @@ draw_buffer_enum_to_bitmask(const struct gl_context *ctx, 
GLenum buffer)
  * return -1 for an invalid buffer.
  */
 static gl_buffer_index
-read_buffer_enum_to_index(GLenum buffer)
+read_buffer_enum_to_index(const struct gl_context *ctx, GLenum buffer)
 {
switch (buffer) {
   case GL_FRONT:
  return BUFFER_FRONT_LEFT;
   case GL_BACK:
+ if (_mesa_is_gles(ctx)) {
+/* In draw_buffer_enum_to_bitmask, when GLES contexts draw to
+ * GL_BACK with a single-buffered configuration, we actually end
+ * up drawing to the sole front buffer in our internal
+ * representation.  For consistency, we must read from that
+ * front left buffer too.
+ */
+if (!ctx->DrawBuffer->Visual.doubleBufferMode)
+   return BUFFER_FRONT_LEFT;
+ }
  return BUFFER_BACK_LEFT;
   case GL_RIGHT:
  return BUFFER_FRONT_RIGHT;
@@ -724,7 +734,7 @@ read_buffer(struct gl_context *ctx, struct gl_framebuffer *fb,
   if (_mesa_is_gles3(ctx) && !is_legal_es3_readbuffer_enum(buffer))
  srcBuffer = -1;
   else
- srcBuffer = read_buffer_enum_to_index(buffer);
+ srcBuffer = read_buffer_enum_to_index(ctx, buffer);
 
   if (srcBuffer == -1) {
  _mesa_error(ctx, GL_INVALID_ENUM,
diff --git a/src/mesa/main/clear.c b/src/mesa/main/clear.c
index 35b912c..a1bb36e 100644
--- a/src/mesa/main/clear.c
+++ b/src/mesa/main/clear.c
@@ -267,6 +267,14 @@ make_color_buffer_mask(struct gl_context *ctx, GLint drawbuffer)
  mask |= BUFFER_BIT_FRONT_RIGHT;
   break;
case GL_BACK:
+  /* For GLES contexts with a single buffered configuration, we actually
+   * only have a front renderbuffer, so any clear calls to GL_BACK should
+   * affect that buffer. See draw_buffer_enum_to_bitmask for details.
+   */
+  if (_mesa_is_gles(ctx))
+ if (!ctx->DrawBuffer->Visual.doubleBufferMode)
+if (att[BUFFER_FRONT_LEFT].Renderbuffer)
+   mask |= BUFFER_BIT_FRONT_LEFT;
   if (att[BUFFER_BACK_LEFT].Renderbuffer)
  mask |= BUFFER_BIT_BACK_LEFT;
   if (att[BUFFER_BACK_RIGHT].Renderbuffer)
diff --git a/src/mesa/main/context.c b/src/mesa/main/context.c
index 85cd779..8925626 100644
--- a/src/mesa/main/context.c
+++ b/src/mesa/main/context.c
@@ -1676,6 +1676,16 @@ _mesa_make_current( struct gl_context *newCtx,
  }
  if (!newCtx->ReadBuffer || _mesa_is_winsys_fbo(newCtx->ReadBuffer)) {
 _mesa_reference_framebuffer(&newCtx->ReadBuffer, readBuffer);
+/* In _mesa_initialize_window_framebuffer, for single-buffered
+ * visuals, the ColorReadBuffer is set to be GL_FRONT, even with
+ * GLES contexts. When calling read_buffer, we verify we are reading
+ * from GL_BACK in is_legal_es3_readbuffer_enum.  But the default is
+ * incorrect, and certain dEQP tests check this.  So fix it here.
+ */
+if (_mesa_is_gles(newCtx) &&
+   !newCtx->ReadBuffer->Visual.doubleBufferMode)
+   if (newCtx->ReadBuffer->ColorReadBuffer == GL_FRONT)
+  newCtx->ReadBuffer->ColorReadBuffer = GL_BACK;
  }
 
  /* XXX only set this flag if we're really changing the draw/read
-- 
2.1.2
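The rule this patch applies in buffers.c can be summarised with a small standalone model. This is only a sketch: model_context and model_buffer_index are hypothetical stand-ins for gl_context and gl_buffer_index, and nothing here is Mesa API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for Mesa's buffer indices and context state. */
enum model_buffer_index {
   MODEL_BUFFER_FRONT_LEFT,
   MODEL_BUFFER_BACK_LEFT
};

struct model_context {
   bool is_gles;          /* models _mesa_is_gles(ctx) */
   bool double_buffered;  /* models Visual.doubleBufferMode */
};

/* Resolve a GL_BACK read: single-buffered GLES only has a front buffer. */
static enum model_buffer_index
resolve_back_buffer(const struct model_context *ctx)
{
   if (ctx->is_gles && !ctx->double_buffered)
      return MODEL_BUFFER_FRONT_LEFT;
   return MODEL_BUFFER_BACK_LEFT;
}
```

Desktop GL contexts and double-buffered GLES contexts keep the usual GL_BACK mapping in this model; only the single-buffered GLES case is redirected.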



Re: [Mesa-dev] [PATCH 15/24] i965: Use LZD to implement nir_op_ufind_msb

2016-06-30 Thread Matt Turner
On Wed, Jun 29, 2016 at 2:04 PM, Ian Romanick  wrote:
> From: Ian Romanick 
>
> This uses one less instruction.

Add FBH to the list of stupid instructions.

> Signed-off-by: Ian Romanick 
> ---
>  src/mesa/drivers/dri/i965/brw_fs.h   |  4 
>  src/mesa/drivers/dri/i965/brw_fs_generator.cpp   |  3 +++
>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 26 
> +++-
>  src/mesa/drivers/dri/i965/brw_vec4.h |  4 
>  src/mesa/drivers/dri/i965/brw_vec4_generator.cpp |  3 +++
>  src/mesa/drivers/dri/i965/brw_vec4_nir.cpp   | 22 
>  6 files changed, 61 insertions(+), 1 deletion(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.h 
> b/src/mesa/drivers/dri/i965/brw_fs.h
> index 4237197..22ce092 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.h
> +++ b/src/mesa/drivers/dri/i965/brw_fs.h
> @@ -237,6 +237,10 @@ public:
>   nir_tex_instr *instr);
> void nir_emit_jump(const brw::fs_builder &bld,
>nir_jump_instr *instr);
> +   void nir_emit_find_msb_using_lzd(const brw::fs_builder &bld,
> +const fs_reg &result,
> +const fs_reg &src,
> +bool is_signed);
> fs_reg get_nir_src(const nir_src &src);
> fs_reg get_nir_src_imm(const nir_src &src);
> fs_reg get_nir_dest(const nir_dest &dest);
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp 
> b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> index d25d26a..bda4a26 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> @@ -1761,6 +1761,9 @@ fs_generator::generate_code(const cfg_t *cfg, int 
> dispatch_width)
>   /* FBL only supports UD type for dst. */
>   brw_FBL(p, retype(dst, BRW_REGISTER_TYPE_UD), src[0]);
>   break;
> +  case BRW_OPCODE_LZD:
> + brw_LZD(p, dst, src[0]);
> + break;
>case BRW_OPCODE_CBIT:
>   assert(devinfo->gen >= 7);
>   /* CBIT only supports UD type for dst. */
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp 
> b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> index b3f5dfd..f15bf3e 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> @@ -617,6 +617,25 @@ fs_visitor::optimize_frontfacing_ternary(nir_alu_instr 
> *instr,
>  }
>
>  void
> +fs_visitor::nir_emit_find_msb_using_lzd(const fs_builder &bld,
> +const fs_reg &result,
> +const fs_reg &src,
> +bool is_signed)
> +{
> +   fs_inst *inst;
> +
> +   bld.LZD(retype(result, BRW_REGISTER_TYPE_UD), src);
> +
> +   /* LZD counts from the MSB side, while GLSL's findMSB() wants the count
> +* from the LSB side. Subtract the result from 31 to convert the MSB
> +* count into an LSB count.  If no bits are set, LZD will return 32.
> +* 31-32 = -1, which is exactly what findMSB() is supposed to return.
> +*/
> +   inst = bld.ADD(result, retype(result, BRW_REGISTER_TYPE_D), brw_imm_d(31));
> +   inst->src[0].negate = true;
> +}

I'd personally be inclined to just inline these functions. I know they
grow somewhat in the next patches... whatever your preference is.
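As a quick check of the "31 minus LZD" conversion described in the quoted comment, here is a standalone C sketch. lzd32 is a software stand-in that assumes LZD returns 32 for a zero input, as the comment states; the function names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Software stand-in for LZD: count leading zeros, 32 for an input of 0. */
static int lzd32(uint32_t v)
{
   int n = 0;

   if (v == 0)
      return 32;
   while ((v & 0x80000000u) == 0) {
      v <<= 1;
      n++;
   }
   return n;
}

/* Unsigned findMSB(): negate the LZD count and add 31, like the ADD with
 * a negated first source and an immediate of 31 in the quoted patch.
 */
static int find_msb_unsigned(uint32_t value)
{
   int count = lzd32(value);

   return -count + 31;
}
```

The zero case needs no branch: 31 - 32 gives the -1 that findMSB() is expected to return.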


[Mesa-dev] [AppVeyor] mesa master #1718 completed

2016-06-30 Thread AppVeyor


Build mesa 1718 completed



Commit eb79b2b331 by Brian Paul on 5/11/2016 3:20 PM:

st/wgl: make own_mutex() non-static

Reviewed-by: Jose Fonseca




Re: [Mesa-dev] [PATCH v3] i965: intel_texture_barrier reimplemented

2016-06-30 Thread Francisco Jerez
Alejandro Piñeiro  writes:

> Fixes:
> GL44-CTS.texture_barrier_ARB.same-texel-rw-multipass
>
> On Haswell, Broadwell and Skylake (note that running this test requires
> overriding the GL and GLSL versions).
>
> On gen6 this test was already working without this change. It keeps
> working after it.
>
> This commit replaces the call to brw_emit_mi_flush for gen6+ with two
> calls to brw_emit_pipe_control_flush:
>
>  * The first one with RENDER_TARGET_FLUSH and CS_STALL set to initiate
>a render cache flush after any concurrent rendering completes and
>cause the CS to stop parsing commands until the render cache
>becomes coherent with memory.
>
>  * The second one has TEXTURE_CACHE_INVALIDATE set (and no CS stall)
>to clean up any stale data from the sampler caches before rendering
>continues.
>
> Didn't touch gen4-5, basically because I don't have a way to test
> them.
>
> More info on commits:
> 0aa4f99f562a05880a779707cbcd46be459863bf
> 72473658c51d5e074ce219c1e6385a4cce29f467
>
> Thanks to Curro for helping to track this down, as the root cause was a
> hw race condition.
>
> v2: use two calls to pipe_control_flush instead of a combination of
> gen7_emit_cs_stall_flush and brw_emit_mi_flush calls (Curro)
> v3: no need for const cache invalidation (Curro)
> ---
>
> FWIW: checked with the CTS tests, and the piglit series, and confirmed
> that the const cache invalidation is not needed.
>
>  src/mesa/drivers/dri/i965/intel_tex.c | 21 -
>  1 file changed, 20 insertions(+), 1 deletion(-)
>
> diff --git a/src/mesa/drivers/dri/i965/intel_tex.c 
> b/src/mesa/drivers/dri/i965/intel_tex.c
> index cac33ac..a802d5a 100644
> --- a/src/mesa/drivers/dri/i965/intel_tex.c
> +++ b/src/mesa/drivers/dri/i965/intel_tex.c
> @@ -9,6 +9,7 @@
>  #include "intel_mipmap_tree.h"
>  #include "intel_tex.h"
>  #include "intel_fbo.h"
> +#include "intel_reg.h"
>  
>  #define FILE_DEBUG_FLAG DEBUG_TEXTURE
>  
> @@ -362,7 +363,25 @@ intel_texture_barrier(struct gl_context *ctx)
>  {
> struct brw_context *brw = brw_context(ctx);
>  
> -   brw_emit_mi_flush(brw);
> +   if (brw->gen >= 6) {
> +  if (brw->gen == 6) {
> + /* [Dev-SNB{W/A}]: Before a PIPE_CONTROL with Write Cache
> +  * Flush Enable = 1, a PIPE_CONTROL with any non-zero
> +  * post-sync-op is required.
> +  */
> + brw_emit_post_sync_nonzero_flush(brw);
> +  }
> +
> +  brw_emit_pipe_control_flush(brw,
> +  PIPE_CONTROL_DEPTH_CACHE_FLUSH |
> +  PIPE_CONTROL_RENDER_TARGET_FLUSH |
> +  PIPE_CONTROL_CS_STALL);
> +
> +  brw_emit_pipe_control_flush(brw,
> +  PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE);
> +   } else {
> +  brw_emit_mi_flush(brw);
> +   }

Looks reasonable to me. Let's get this bug fixed on master; things can
be refactored later on:

Reviewed-by: Francisco Jerez 

>  }
>  
>  void
> -- 
> 2.7.4




Re: [Mesa-dev] Add gperf as a build dependency?

2016-06-30 Thread Kenneth Graunke
On Thursday, June 30, 2016 10:13:47 AM PDT Ian Romanick wrote:
> I think I might want to use gperf for something in Mesa, but I'm not
> 100% sure yet.  Before I proceed, is it even acceptable to add that as a
> build dependency?

Why not make it an optional dependency?




Re: [Mesa-dev] [PATCH shader-db 2/4] split-to-files.py: fix parsing compute shaders

2016-06-30 Thread Kenneth Graunke
On Thursday, June 30, 2016 5:03:24 PM PDT Marek Olšák wrote:
> From: Marek Olšák 
> 
> ---
>  split-to-files.py | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/split-to-files.py b/split-to-files.py
> index 7e14d89..0e1d729 100755
> --- a/split-to-files.py
> +++ b/split-to-files.py
> @@ -92,6 +92,8 @@ def write_shader_test(filename, shaders):
>  out.write("[tessellation control shader]\n")
>  elif stage == "tess eval" or stage == "tessellation evaluation":
>  out.write("[tessellation evaluation shader]\n")
> +elif stage == "compute":
> +out.write("[compute shader]\n")
>  else:
>  assert False, stage
>  out.write(shaders[(stage, num)])
> 

FWIW, I finally landed the MESA_SHADER_CAPTURE_PATH patch a while back,
which means you can just do:

$ mkdir /tmp/mesa
$ MESA_SHADER_CAPTURE_PATH=/tmp/mesa ./application

and it'll dump .shader_test files for you.

I'd be in favor of deleting split-to-files.py, personally, but if it's
still useful to you, feel free to land these two patches.




[Mesa-dev] [AppVeyor] mesa master #1716 failed

2016-06-30 Thread AppVeyor



Build mesa 1716 failed


Commit c8ea85 by Brian Paul on 6/28/2016 11:15 PM:

svga: use SVGA3D_vgpu10_BufferCopy() for buffer copies

So that we do copies host-side rather than in the guest with map/memcpy.

Tested with piglit arb_copy_buffer-subdata-sync test and new
arb_copy_buffer-intra-buffer-copy test.

Reviewed-by: Charmaine Lee
Acked-by: Roland Scheidegger




Re: [Mesa-dev] [PATCH v3] i965: intel_texture_barrier reimplemented

2016-06-30 Thread Alejandro Piñeiro
Ok, thanks for the pointers. I'll take a look tomorrow (it's 21:00 here).

Btw, which do you prefer: fixing the texture barrier first with a
patch like this and then importing Vulkan's approach, or skipping this
fix and going directly to importing Vulkan's approach?

BR

On 30/06/16 16:04, Jason Ekstrand wrote:
>
> Fwiw, I very much like the way I did this in the Vulkan driver where
> it splits it into two pipe controls automatically based on the input
> bits.  (Look at genX_cmd_buffer.c cmd_buffer_apply_pipe_flushes.)  I
> very much doubt that this is the only place we have this problem in
> the GL driver.  We should probably fix it in brw_emit_pipe_control.
>
> On Jun 30, 2016 12:00 AM, "Alejandro Piñeiro" wrote:
>
> Fixes:
> GL44-CTS.texture_barrier_ARB.same-texel-rw-multipass
>
> On Haswell, Broadwell and Skylake (note that running this test requires
> overriding the GL and GLSL versions).
>
> On gen6 this test was already working without this change. It keeps
> working after it.
>
> This commit replaces the call to brw_emit_mi_flush for gen6+ with two
> calls to brw_emit_pipe_control_flush:
>
>  * The first one with RENDER_TARGET_FLUSH and CS_STALL set to initiate
>a render cache flush after any concurrent rendering completes and
>cause the CS to stop parsing commands until the render cache
>becomes coherent with memory.
>
>  * The second one has TEXTURE_CACHE_INVALIDATE set (and no CS stall)
>to clean up any stale data from the sampler caches before rendering
>continues.
>
> Didn't touch gen4-5, basically because I don't have a way to test
> them.
>
> More info on commits:
> 0aa4f99f562a05880a779707cbcd46be459863bf
> 72473658c51d5e074ce219c1e6385a4cce29f467
>
> Thanks to Curro for helping to track this down, as the root cause was a
> hw race condition.
>
> v2: use two calls to pipe_control_flush instead of a combination of
> gen7_emit_cs_stall_flush and brw_emit_mi_flush calls (Curro)
> v3: no need for const cache invalidation (Curro)
> ---
>
> FWIW: checked with the CTS tests, and the piglit series, and confirmed
> that the const cache invalidation is not needed.
>
>  src/mesa/drivers/dri/i965/intel_tex.c | 21 -
>  1 file changed, 20 insertions(+), 1 deletion(-)
>
> diff --git a/src/mesa/drivers/dri/i965/intel_tex.c
> b/src/mesa/drivers/dri/i965/intel_tex.c
> index cac33ac..a802d5a 100644
> --- a/src/mesa/drivers/dri/i965/intel_tex.c
> +++ b/src/mesa/drivers/dri/i965/intel_tex.c
> @@ -9,6 +9,7 @@
>  #include "intel_mipmap_tree.h"
>  #include "intel_tex.h"
>  #include "intel_fbo.h"
> +#include "intel_reg.h"
>
>  #define FILE_DEBUG_FLAG DEBUG_TEXTURE
>
> @@ -362,7 +363,25 @@ intel_texture_barrier(struct gl_context *ctx)
>  {
> struct brw_context *brw = brw_context(ctx);
>
> -   brw_emit_mi_flush(brw);
> +   if (brw->gen >= 6) {
> +  if (brw->gen == 6) {
> + /* [Dev-SNB{W/A}]: Before a PIPE_CONTROL with Write Cache
> +  * Flush Enable = 1, a PIPE_CONTROL with any non-zero
> +  * post-sync-op is required.
> +  */
> + brw_emit_post_sync_nonzero_flush(brw);
> +  }
> +
> +  brw_emit_pipe_control_flush(brw,
> +  PIPE_CONTROL_DEPTH_CACHE_FLUSH |
> +  PIPE_CONTROL_RENDER_TARGET_FLUSH |
> +  PIPE_CONTROL_CS_STALL);
> +
> +  brw_emit_pipe_control_flush(brw,
> +  PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE);
> +   } else {
> +  brw_emit_mi_flush(brw);
> +   }
>  }
>
>  void
> --
> 2.7.4
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org 
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
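The automatic split suggested in the quoted reply (flush and stall first, invalidate second) could look roughly like the sketch below. It is a standalone model and does not reproduce the Vulkan driver's cmd_buffer_apply_pipe_flushes: the MODEL_* flag values and the split_flush helper are hypothetical, and the real PIPE_CONTROL_* bits differ.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag values; the real PIPE_CONTROL_* bits differ. */
#define MODEL_RENDER_TARGET_FLUSH      (1u << 0)
#define MODEL_DEPTH_CACHE_FLUSH        (1u << 1)
#define MODEL_CS_STALL                 (1u << 2)
#define MODEL_TEXTURE_CACHE_INVALIDATE (1u << 3)

#define MODEL_FLUSH_BITS \
   (MODEL_RENDER_TARGET_FLUSH | MODEL_DEPTH_CACHE_FLUSH)
#define MODEL_INVALIDATE_BITS (MODEL_TEXTURE_CACHE_INVALIDATE)

struct model_flush_plan {
   uint32_t first;   /* flags for the first PIPE_CONTROL, 0 if none */
   uint32_t second;  /* flags for the second PIPE_CONTROL, 0 if none */
};

/* When both a write-cache flush and a read-cache invalidate are requested,
 * emit the flush with a CS stall first and the invalidate afterwards.
 * Otherwise a single PIPE_CONTROL with the original flags is enough.
 */
static struct model_flush_plan split_flush(uint32_t flags)
{
   struct model_flush_plan plan = { flags, 0 };

   if ((flags & MODEL_FLUSH_BITS) && (flags & MODEL_INVALIDATE_BITS)) {
      plan.first = (flags & ~MODEL_INVALIDATE_BITS) | MODEL_CS_STALL;
      plan.second = flags & MODEL_INVALIDATE_BITS;
   }
   return plan;
}
```

A caller would emit plan.first and then, if it is non-zero, plan.second, which matches the two brw_emit_pipe_control_flush calls in the patch when both kinds of bits are present.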



[Mesa-dev] [PATCH 03/12] st/va: add encode entrypoint

2016-06-30 Thread Boyuan Zhang
Signed-off-by: Boyuan Zhang 
---
 src/gallium/state_trackers/va/config.c | 61 +++---
 src/gallium/state_trackers/va/context.c| 57 
 src/gallium/state_trackers/va/surface.c| 12 --
 src/gallium/state_trackers/va/va_private.h |  5 +++
 4 files changed, 103 insertions(+), 32 deletions(-)

diff --git a/src/gallium/state_trackers/va/config.c 
b/src/gallium/state_trackers/va/config.c
index 9ca0aa8..73704a1 100644
--- a/src/gallium/state_trackers/va/config.c
+++ b/src/gallium/state_trackers/va/config.c
@@ -34,6 +34,8 @@
 
 #include "va_private.h"
 
+#include "util/u_handle_table.h"
+
 DEBUG_GET_ONCE_BOOL_OPTION(mpeg4, "VAAPI_MPEG4_ENABLED", false)
 
 VAStatus
@@ -128,14 +130,27 @@ VAStatus
 vlVaCreateConfig(VADriverContextP ctx, VAProfile profile, VAEntrypoint 
entrypoint,
  VAConfigAttrib *attrib_list, int num_attribs, VAConfigID 
*config_id)
 {
+   vlVaDriver *drv;
+   vlVaConfig *config;
struct pipe_screen *pscreen;
enum pipe_video_profile p;
 
if (!ctx)
   return VA_STATUS_ERROR_INVALID_CONTEXT;
 
+   drv = VL_VA_DRIVER(ctx);
+
+   if (!drv)
+  return VA_STATUS_ERROR_INVALID_CONTEXT;
+
+   config = CALLOC(1, sizeof(vlVaConfig));
+   if (!config)
+  return VA_STATUS_ERROR_ALLOCATION_FAILED;
+
if (profile == VAProfileNone && entrypoint == VAEntrypointVideoProc) {
-  *config_id = PIPE_VIDEO_PROFILE_UNKNOWN;
+  config->entrypoint = VAEntrypointVideoProc;
+  config->profile = PIPE_VIDEO_PROFILE_UNKNOWN;
+  *config_id = handle_table_add(drv->htab, config);
   return VA_STATUS_SUCCESS;
}
 
@@ -150,7 +165,14 @@ vlVaCreateConfig(VADriverContextP ctx, VAProfile profile, 
VAEntrypoint entrypoin
if (entrypoint != VAEntrypointVLD)
   return VA_STATUS_ERROR_UNSUPPORTED_ENTRYPOINT;
 
-   *config_id = p;
+   if (entrypoint == VAEntrypointEncSlice || entrypoint == VAEntrypointEncPicture)
+  config->entrypoint = PIPE_VIDEO_ENTRYPOINT_ENCODE;
+   else
+  config->entrypoint = PIPE_VIDEO_ENTRYPOINT_BITSTREAM;
+
+   config->profile = p;
+
+   *config_id = handle_table_add(drv->htab, config);
 
return VA_STATUS_SUCCESS;
 }
@@ -158,9 +180,25 @@ vlVaCreateConfig(VADriverContextP ctx, VAProfile profile, 
VAEntrypoint entrypoin
 VAStatus
 vlVaDestroyConfig(VADriverContextP ctx, VAConfigID config_id)
 {
+   vlVaDriver *drv;
+   vlVaConfig *config;
+
if (!ctx)
   return VA_STATUS_ERROR_INVALID_CONTEXT;
 
+   drv = VL_VA_DRIVER(ctx);
+
+   if (!drv)
+  return VA_STATUS_ERROR_INVALID_CONTEXT;
+
+   config = handle_table_get(drv->htab, config_id);
+
+   if (!config)
+  return VA_STATUS_ERROR_INVALID_CONFIG;
+
+   FREE(config);
+   handle_table_remove(drv->htab, config_id);
+
return VA_STATUS_SUCCESS;
 }
 
@@ -168,18 +206,31 @@ VAStatus
 vlVaQueryConfigAttributes(VADriverContextP ctx, VAConfigID config_id, 
VAProfile *profile,
   VAEntrypoint *entrypoint, VAConfigAttrib 
*attrib_list, int *num_attribs)
 {
+   vlVaDriver *drv;
+   vlVaConfig *config;
+
if (!ctx)
   return VA_STATUS_ERROR_INVALID_CONTEXT;
 
-   *profile = PipeToProfile(config_id);
+   drv = VL_VA_DRIVER(ctx);
+
+   if (!drv)
+  return VA_STATUS_ERROR_INVALID_CONTEXT;
+
+   config = handle_table_get(drv->htab, config_id);
+
+   if (!config)
+  return VA_STATUS_ERROR_INVALID_CONFIG;
+
+   *profile = PipeToProfile(config->profile);
 
-   if (config_id == PIPE_VIDEO_PROFILE_UNKNOWN) {
+   if (config->profile == PIPE_VIDEO_PROFILE_UNKNOWN) {
   *entrypoint = VAEntrypointVideoProc;
   *num_attribs = 0;
   return VA_STATUS_SUCCESS;
}
 
-   *entrypoint = VAEntrypointVLD;
+   *entrypoint = config->entrypoint;
 
*num_attribs = 1;
attrib_list[0].type = VAConfigAttribRTFormat;
diff --git a/src/gallium/state_trackers/va/context.c 
b/src/gallium/state_trackers/va/context.c
index 402fbb2..b4334f4 100644
--- a/src/gallium/state_trackers/va/context.c
+++ b/src/gallium/state_trackers/va/context.c
@@ -195,18 +195,21 @@ vlVaCreateContext(VADriverContextP ctx, VAConfigID 
config_id, int picture_width,
 {
vlVaDriver *drv;
vlVaContext *context;
+   vlVaConfig *config;
int is_vpp;
 
if (!ctx)
   return VA_STATUS_ERROR_INVALID_CONTEXT;
 
-   is_vpp = config_id == PIPE_VIDEO_PROFILE_UNKNOWN && !picture_width &&
+   drv = VL_VA_DRIVER(ctx);
+   config = handle_table_get(drv->htab, config_id);
+
+   is_vpp = config->profile == PIPE_VIDEO_PROFILE_UNKNOWN && !picture_width &&
 !picture_height && !flag && !render_targets && !num_render_targets;
 
if (!(picture_width && picture_height) && !is_vpp)
   return VA_STATUS_ERROR_INVALID_IMAGE_FORMAT;
 
-   drv = VL_VA_DRIVER(ctx);
context = CALLOC(1, sizeof(vlVaContext));
if (!context)
   return VA_STATUS_ERROR_ALLOCATION_FAILED;
@@ -218,8 +221,8 @@ vlVaCreateContext(VADriverContextP ctx, VAConfigID 
config_id, int picture_width,
  return 

[Mesa-dev] [PATCH 07/12] st/va: get rate control method from configattrib

2016-06-30 Thread Boyuan Zhang
Signed-off-by: Boyuan Zhang 
---
 src/gallium/state_trackers/va/config.c | 11 +++
 src/gallium/state_trackers/va/context.c|  3 ++-
 src/gallium/state_trackers/va/va_private.h |  1 +
 3 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/src/gallium/state_trackers/va/config.c 
b/src/gallium/state_trackers/va/config.c
index 73704a1..ea838c0 100644
--- a/src/gallium/state_trackers/va/config.c
+++ b/src/gallium/state_trackers/va/config.c
@@ -172,6 +172,17 @@ vlVaCreateConfig(VADriverContextP ctx, VAProfile profile, 
VAEntrypoint entrypoin
 
config->profile = p;
 
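+   for (int i = 0; i < num_attribs; i++) {
+  if (attrib_list[i].type == VAConfigAttribRateControl) {
+ if (attrib_list[i].value == VA_RC_CBR)
+config->rc = PIPE_H264_ENC_RATE_CONTROL_METHOD_CONSTANT;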
+ else if (attrib_list[i].value == VA_RC_VBR)
+config->rc = PIPE_H264_ENC_RATE_CONTROL_METHOD_VARIABLE;
+ else
+config->rc = PIPE_H264_ENC_RATE_CONTROL_METHOD_DISABLE;
+  }
+   }
+
*config_id = handle_table_add(drv->htab, config);
 
return VA_STATUS_SUCCESS;
diff --git a/src/gallium/state_trackers/va/context.c 
b/src/gallium/state_trackers/va/context.c
index b4334f4..c67ed1f 100644
--- a/src/gallium/state_trackers/va/context.c
+++ b/src/gallium/state_trackers/va/context.c
@@ -274,7 +274,8 @@ vlVaCreateContext(VADriverContextP ctx, VAConfigID 
config_id, int picture_width,
 
context->desc.base.profile = config->profile;
context->desc.base.entry_point = config->entrypoint;
-
+   if (config->entrypoint == PIPE_VIDEO_ENTRYPOINT_ENCODE)
+  context->desc.h264enc.rate_ctrl.rate_ctrl_method = config->rc;
pipe_mutex_lock(drv->mutex);
*context_id = handle_table_add(drv->htab, context);
pipe_mutex_unlock(drv->mutex);
diff --git a/src/gallium/state_trackers/va/va_private.h 
b/src/gallium/state_trackers/va/va_private.h
index 723983d..ad9010a 100644
--- a/src/gallium/state_trackers/va/va_private.h
+++ b/src/gallium/state_trackers/va/va_private.h
@@ -246,6 +246,7 @@ typedef struct {
 typedef struct {
VAEntrypoint entrypoint;
enum pipe_video_profile profile;
+   enum pipe_h264_enc_rate_control_method rc;
 } vlVaConfig;
 
 typedef struct {
-- 
2.7.4
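The attribute scan this patch adds to vlVaCreateConfig can be modelled in isolation as below. This is a sketch under assumptions: the MODEL_* constants and struct model_attrib are hypothetical stand-ins for the VA-API and gallium definitions, and defaulting to "disable" when no rate control attribute is present is an assumption based on the config being zero-allocated.

```c
#include <assert.h>

/* Hypothetical stand-ins for the VA-API and gallium definitions. */
enum model_rc_method {
   MODEL_RC_DISABLE,
   MODEL_RC_CONSTANT,
   MODEL_RC_VARIABLE
};

#define MODEL_ATTRIB_RATE_CONTROL 5
#define MODEL_VA_RC_CBR 0x2u
#define MODEL_VA_RC_VBR 0x4u

struct model_attrib {
   int type;
   unsigned value;
};

/* Walk the attribute list and translate the rate control request. */
static enum model_rc_method
pick_rate_control(const struct model_attrib *attribs, int num_attribs)
{
   enum model_rc_method rc = MODEL_RC_DISABLE;

   for (int i = 0; i < num_attribs; i++) {
      if (attribs[i].type != MODEL_ATTRIB_RATE_CONTROL)
         continue;
      if (attribs[i].value == MODEL_VA_RC_CBR)
         rc = MODEL_RC_CONSTANT;
      else if (attribs[i].value == MODEL_VA_RC_VBR)
         rc = MODEL_RC_VARIABLE;
      else
         rc = MODEL_RC_DISABLE;
   }
   return rc;
}
```

As in the patch, the last rate control attribute in the list wins, and unrelated attributes are skipped.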



[Mesa-dev] [PATCH 10/12] radeon/omx: assign previous values to new structure

2016-06-30 Thread Boyuan Zhang
Signed-off-by: Boyuan Zhang 
---
 src/gallium/state_trackers/omx/vid_enc.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/src/gallium/state_trackers/omx/vid_enc.c 
b/src/gallium/state_trackers/omx/vid_enc.c
index d70439a..bbc7941 100644
--- a/src/gallium/state_trackers/omx/vid_enc.c
+++ b/src/gallium/state_trackers/omx/vid_enc.c
@@ -1006,6 +1006,14 @@ static void enc_ScaleInput(omx_base_PortType *port, 
struct pipe_video_buffer **v
priv->current_scale_buffer %= OMX_VID_ENC_NUM_SCALING_BUFFERS;
 }
 
+static void enc_GetPictureParamPreset(struct pipe_h264_enc_picture_desc 
*picture)
+{
+   picture->motion_est.enc_disable_sub_mode = 0x00fe;
+   picture->motion_est.enc_ime2_search_range_x = 0x0001;
+   picture->motion_est.enc_ime2_search_range_y = 0x0001;
+   picture->pic_ctrl.enc_constraint_set_flags = 0x0040;
+}
+
 static void enc_ControlPicture(omx_base_PortType *port, struct 
pipe_h264_enc_picture_desc *picture)
 {
OMX_COMPONENTTYPE* comp = port->standCompContainer;
@@ -1064,6 +1072,8 @@ static void enc_ControlPicture(omx_base_PortType *port, 
struct pipe_h264_enc_pic
picture->frame_num = priv->frame_num;
picture->ref_idx_l0 = priv->ref_idx_l0;
picture->ref_idx_l1 = priv->ref_idx_l1;
+   picture->enable_vui = (picture->rate_ctrl.frame_rate_num != 0);
+   enc_GetPictureParamPreset(picture);
 }
 
 static void enc_HandleTask(omx_base_PortType *port, struct encode_task *task,
-- 
2.7.4



[Mesa-dev] [PATCH 06/12] st/va: colorspace conversion when image is yv12 and surface is nv12

2016-06-30 Thread Boyuan Zhang
Signed-off-by: Boyuan Zhang 
---
 src/gallium/state_trackers/va/image.c | 48 +--
 1 file changed, 40 insertions(+), 8 deletions(-)

diff --git a/src/gallium/state_trackers/va/image.c 
b/src/gallium/state_trackers/va/image.c
index 3c8cc9c..1f68169 100644
--- a/src/gallium/state_trackers/va/image.c
+++ b/src/gallium/state_trackers/va/image.c
@@ -499,7 +499,7 @@ vlVaPutImage(VADriverContextP ctx, VASurfaceID surface, 
VAImageID image,
VAImage *vaimage;
struct pipe_sampler_view **views;
enum pipe_format format;
-   void *data[3];
+   uint8_t *data[3];
unsigned pitches[3], i, j;
 
if (!ctx)
@@ -539,7 +539,9 @@ vlVaPutImage(VADriverContextP ctx, VASurfaceID surface, 
VAImageID image,
   return VA_STATUS_ERROR_OPERATION_FAILED;
}
 
-   if (format != surf->buffer->buffer_format) {
+   if ((format != surf->buffer->buffer_format) &&
+  ((format != PIPE_FORMAT_YV12) || (surf->buffer->buffer_format != PIPE_FORMAT_NV12)) &&
+  ((format != PIPE_FORMAT_IYUV) || (surf->buffer->buffer_format != PIPE_FORMAT_NV12))) {
   struct pipe_video_buffer *tmp_buf;
   struct pipe_video_buffer templat = surf->templat;
 
@@ -581,12 +583,42 @@ vlVaPutImage(VADriverContextP ctx, VASurfaceID surface, 
VAImageID image,
   unsigned width, height;
   if (!views[i]) continue;
   vlVaVideoSurfaceSize(surf, i, &width, &height);
-  for (j = 0; j < views[i]->texture->array_size; ++j) {
- struct pipe_box dst_box = {0, 0, j, width, height, 1};
- drv->pipe->transfer_inline_write(drv->pipe, views[i]->texture, 0,
-PIPE_TRANSFER_WRITE, &dst_box,
-data[i] + pitches[i] * j,
-pitches[i] * views[i]->texture->array_size, 0);
+  if ((format == PIPE_FORMAT_YV12) || (format == PIPE_FORMAT_IYUV) &&
+ (surf->buffer->buffer_format == PIPE_FORMAT_NV12) && (i == 1)) {
+ struct pipe_transfer *transfer = NULL;
+ uint8_t *map = NULL;
+ struct pipe_box dst_box_1 = {0, 0, 0, width, height, 1};
+ map = drv->pipe->transfer_map(drv->pipe,
+   views[i]->texture,
+   0,
+   PIPE_TRANSFER_DISCARD_RANGE,
+   &dst_box_1, &transfer);
+ if (map == NULL)
+return VA_STATUS_ERROR_OPERATION_FAILED;
+
+ bool odd = false;
+ for (unsigned int k = 0; k < ((vaimage->offsets[1])/2) ; k++){
+if (odd == false) {
+   map[k] = data[i][k/2];
+   odd = true;
+}
+else {
+   map[k] = data[i+1][k/2];
+   odd = false;
+}
+ }
+ pipe_transfer_unmap(drv->pipe, transfer);
+ pipe_mutex_unlock(drv->mutex);
+ return VA_STATUS_SUCCESS;
+  }
+  else {
+ for (j = 0; j < views[i]->texture->array_size; ++j) {
+struct pipe_box dst_box = {0, 0, j, width, height, 1};
+drv->pipe->transfer_inline_write(drv->pipe, views[i]->texture, 0,
+   PIPE_TRANSFER_WRITE, &dst_box,
+   data[i] + pitches[i] * j,
+   pitches[i] * views[i]->texture->array_size, 0);
+ }
   }
}
pipe_mutex_unlock(drv->mutex);
-- 
2.7.4
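For reference, the general idea of turning two planar chroma planes into one interleaved NV12 chroma plane can be sketched as follows. This is not the loop from the patch: it takes the U and V planes as explicit arguments, ignores row pitches, and interleave_chroma_nv12 is a made-up name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Interleave planar U and V samples into an NV12-style UV plane.
 * Assumes tightly packed planes (no row padding) for simplicity.
 */
static void
interleave_chroma_nv12(uint8_t *dst_uv, const uint8_t *u, const uint8_t *v,
                       size_t chroma_samples)
{
   for (size_t k = 0; k < chroma_samples; k++) {
      dst_uv[2 * k] = u[k];      /* even bytes carry U */
      dst_uv[2 * k + 1] = v[k];  /* odd bytes carry V */
   }
}
```

Real surfaces usually have a row pitch larger than the visible width, so a production version would copy row by row using the source and destination pitches.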



[Mesa-dev] [PATCH 02/12] vl: add entry point

2016-06-30 Thread Boyuan Zhang
Signed-off-by: Boyuan Zhang 
---
 src/gallium/include/pipe/p_video_state.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/gallium/include/pipe/p_video_state.h 
b/src/gallium/include/pipe/p_video_state.h
index 9cd489b..040d2f1 100644
--- a/src/gallium/include/pipe/p_video_state.h
+++ b/src/gallium/include/pipe/p_video_state.h
@@ -131,6 +131,7 @@ enum pipe_h264_enc_rate_control_method
 struct pipe_picture_desc
 {
enum pipe_video_profile profile;
+   enum pipe_video_entrypoint entry_point;
 };
 
 struct pipe_quant_matrix
-- 
2.7.4



[Mesa-dev] [PATCH 04/12] st/va: use correct pitch value for deriveimage call

2016-06-30 Thread Boyuan Zhang
Signed-off-by: Boyuan Zhang 
---
 src/gallium/state_trackers/va/image.c | 55 ---
 1 file changed, 51 insertions(+), 4 deletions(-)

diff --git a/src/gallium/state_trackers/va/image.c 
b/src/gallium/state_trackers/va/image.c
index 1b956e3..c82b554 100644
--- a/src/gallium/state_trackers/va/image.c
+++ b/src/gallium/state_trackers/va/image.c
@@ -185,10 +185,12 @@ vlVaDeriveImage(VADriverContextP ctx, VASurfaceID 
surface, VAImage *image)
vlVaSurface *surf;
vlVaBuffer *img_buf;
VAImage *img;
+   struct pipe_sampler_view **views;
struct pipe_surface **surfaces;
int w;
int h;
int i;
+   int pitch[3];
 
if (!ctx)
   return VA_STATUS_ERROR_INVALID_CONTEXT;
@@ -220,6 +222,51 @@ vlVaDeriveImage(VADriverContextP ctx, VASurfaceID surface, 
VAImage *image)
w = align(surf->buffer->width, 2);
h = align(surf->buffer->height, 2);
 
+   switch (img->format.fourcc) {
+  case VA_FOURCC('N','V','1','2'):
+ img->num_planes = 2;
+ break;
+
+  case VA_FOURCC('I','4','2','0'):
+  case VA_FOURCC('Y','V','1','2'):
+ img->num_planes = 3;
+ break;
+
+  case VA_FOURCC('U','Y','V','Y'):
+  case VA_FOURCC('Y','U','Y','V'):
+  case VA_FOURCC('B','G','R','A'):
+  case VA_FOURCC('R','G','B','A'):
+  case VA_FOURCC('B','G','R','X'):
+  case VA_FOURCC('R','G','B','X'):
+ img->num_planes = 1;
+ break;
+
+  default:
+ /* VaDeriveImage is designed for contiguous planes. */
+ FREE(img);
+ return VA_STATUS_ERROR_INVALID_IMAGE_FORMAT;
+   }
+
+   views = surf->buffer->get_sampler_view_planes(surf->buffer);
+   if (!views)
+  return VA_STATUS_ERROR_OPERATION_FAILED;
+
+   for (i = 0; i < img->num_planes; i++) {
+  unsigned width, height;
+  if (!views[i]) continue;
+  vlVaVideoSurfaceSize(surf, i, &width, &height);
+  struct pipe_box box = {0, 0, 0, width, height, 1};
+  struct pipe_transfer *transfer;
+  uint8_t *map;
+  map = drv->pipe->transfer_map(drv->pipe, views[i]->texture, 0,
+PIPE_TRANSFER_READ, &box, &transfer);
+  if (!map)
+ return VA_STATUS_ERROR_OPERATION_FAILED;
+
+  pitch[i] = transfer->stride;
+  pipe_transfer_unmap(drv->pipe, transfer);
+   }
+
for (i = 0; i < ARRAY_SIZE(formats); ++i) {
   if (img->format.fourcc == formats[i].fourcc) {
  img->format = formats[i];
@@ -231,9 +278,9 @@ vlVaDeriveImage(VADriverContextP ctx, VASurfaceID surface, 
VAImage *image)
case VA_FOURCC('U','Y','V','Y'):
case VA_FOURCC('Y','U','Y','V'):
   img->num_planes = 1;
-  img->pitches[0] = w * 2;
+  img->pitches[0] = pitch[0] * 2;
   img->offsets[0] = 0;
-  img->data_size  = w * h * 2;
+  img->data_size  = pitch[0] * h * 2;
   break;
 
case VA_FOURCC('B','G','R','A'):
@@ -241,9 +288,9 @@ vlVaDeriveImage(VADriverContextP ctx, VASurfaceID surface, 
VAImage *image)
case VA_FOURCC('B','G','R','X'):
case VA_FOURCC('R','G','B','X'):
   img->num_planes = 1;
-  img->pitches[0] = w * 4;
+  img->pitches[0] = pitch[0] * 4;
   img->offsets[0] = 0;
-  img->data_size  = w * h * 4;
+  img->data_size  = pitch[0] * h * 4;
   break;
 
default:
-- 
2.7.4
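The fourcc-to-plane-count switch added by this patch can be tried out on its own with the sketch below. MODEL_FOURCC is a local macro written for this example (it packs the four characters little-endian, which is an assumption about how the real VA_FOURCC behaves), and returning 0 stands in for the VA_STATUS_ERROR_INVALID_IMAGE_FORMAT path.

```c
#include <assert.h>
#include <stdint.h>

/* Local helper for this sketch: pack four characters, first one lowest. */
#define MODEL_FOURCC(a, b, c, d) \
   ((uint32_t)(a) | ((uint32_t)(b) << 8) | \
    ((uint32_t)(c) << 16) | ((uint32_t)(d) << 24))

/* Number of planes per format, or 0 for formats the switch rejects. */
static unsigned plane_count_for_fourcc(uint32_t fourcc)
{
   switch (fourcc) {
   case MODEL_FOURCC('N', 'V', '1', '2'):
      return 2;
   case MODEL_FOURCC('I', '4', '2', '0'):
   case MODEL_FOURCC('Y', 'V', '1', '2'):
      return 3;
   case MODEL_FOURCC('U', 'Y', 'V', 'Y'):
   case MODEL_FOURCC('Y', 'U', 'Y', 'V'):
   case MODEL_FOURCC('B', 'G', 'R', 'A'):
   case MODEL_FOURCC('R', 'G', 'B', 'A'):
   case MODEL_FOURCC('B', 'G', 'R', 'X'):
   case MODEL_FOURCC('R', 'G', 'B', 'X'):
      return 1;
   default:
      return 0;
   }
}
```

The plane count bounds the loop that maps each plane to read its stride, so an unknown format is rejected before any mapping happens.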



[Mesa-dev] [Bug 95346] Stellaris - Black/super dark planets

2016-06-30 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=95346

--- Comment #11 from Luchesar V. ILIEV  ---
This seems to have been fixed with the latest Mesa from git. Either this, or
the 1.2 upgrade to Stellaris itself did the trick.

The only problem left now is the ugly cursor that uses a non-premultiplied
alpha.



Re: [Mesa-dev] [PATCH 00/24] Add MESA_shader_integer_functions

2016-06-30 Thread Matt Turner
On Thu, Jun 30, 2016 at 11:37 AM, Matt Turner  wrote:
> Patches 1-5 are
>
> Reviewed-by: Matt Turner 

As are 6-14.


Re: [Mesa-dev] Add gperf as a build dependency?

2016-06-30 Thread Jose Fonseca

On 30/06/16 18:13, Ian Romanick wrote:
> I think I might want to use gperf for something in Mesa, but I'm not
> 100% sure yet.  Before I proceed, is it even acceptable to add that as a
> build dependency?

I presume C code generated by gperf is freely usable and it's not
subject to GPL, right?  (I couldn't find any explicit mention of that in
the docs.)

It looks like the source code builds for Windows with MSVC.  So while it
will be a bit of a hassle to get it added to our build toolchain, that's
not a blocker.

Jose




[Mesa-dev] [PATCH 11/12] radeon/vce: handle newly added parameters

2016-06-30 Thread Boyuan Zhang
Signed-off-by: Boyuan Zhang 
---
 src/gallium/drivers/radeon/radeon_vce_52.c | 33 ++
 1 file changed, 20 insertions(+), 13 deletions(-)

diff --git a/src/gallium/drivers/radeon/radeon_vce_52.c 
b/src/gallium/drivers/radeon/radeon_vce_52.c
index 7d33313..7986eb8 100644
--- a/src/gallium/drivers/radeon/radeon_vce_52.c
+++ b/src/gallium/drivers/radeon/radeon_vce_52.c
@@ -48,13 +48,14 @@ static void get_rate_control_param(struct rvce_encoder 
*enc, struct pipe_h264_en
enc->enc_pic.rc.quant_i_frames = pic->quant_i_frames;
enc->enc_pic.rc.quant_p_frames = pic->quant_p_frames;
enc->enc_pic.rc.quant_b_frames = pic->quant_b_frames;
+   enc->enc_pic.rc.gop_size = pic->gop_size;
enc->enc_pic.rc.frame_rate_num = pic->rate_ctrl.frame_rate_num;
enc->enc_pic.rc.frame_rate_den = pic->rate_ctrl.frame_rate_den;
enc->enc_pic.rc.max_qp = 51;
enc->enc_pic.rc.vbv_buffer_size = pic->rate_ctrl.vbv_buffer_size;
-   enc->enc_pic.rc.vbv_buf_lv = 0;
-   enc->enc_pic.rc.fill_data_enable = 0;
-   enc->enc_pic.rc.enforce_hrd = 0;
+   enc->enc_pic.rc.vbv_buf_lv = pic->rate_ctrl.vbv_buf_lv;
+   enc->enc_pic.rc.fill_data_enable = pic->rate_ctrl.fill_data_enable;
+   enc->enc_pic.rc.enforce_hrd = pic->rate_ctrl.enforce_hrd;
enc->enc_pic.rc.target_bits_picture = 
pic->rate_ctrl.target_bits_picture;
enc->enc_pic.rc.peak_bits_picture_integer = 
pic->rate_ctrl.peak_bits_picture_integer;
enc->enc_pic.rc.peak_bits_picture_fraction = 
pic->rate_ctrl.peak_bits_picture_fraction;
@@ -62,13 +63,13 @@ static void get_rate_control_param(struct rvce_encoder 
*enc, struct pipe_h264_en
 
 static void get_motion_estimation_param(struct rvce_encoder *enc, struct 
pipe_h264_enc_picture_desc *pic)
 {
-   enc->enc_pic.me.motion_est_quarter_pixel = 0x00000000;
-   enc->enc_pic.me.enc_disable_sub_mode = 0x00fe;
-   enc->enc_pic.me.lsmvert = 0x00000000;
-   enc->enc_pic.me.enc_en_ime_overw_dis_subm = 0x00000000;
-   enc->enc_pic.me.enc_ime_overw_dis_subm_no = 0x00000000;
-   enc->enc_pic.me.enc_ime2_search_range_x = 0x0001;
-   enc->enc_pic.me.enc_ime2_search_range_y = 0x0001;
+   enc->enc_pic.me.motion_est_quarter_pixel = 
pic->motion_est.motion_est_quarter_pixel;
+   enc->enc_pic.me.enc_disable_sub_mode = 
pic->motion_est.enc_disable_sub_mode;
+   enc->enc_pic.me.lsmvert = pic->motion_est.lsmvert;
+   enc->enc_pic.me.enc_en_ime_overw_dis_subm = 
pic->motion_est.enc_en_ime_overw_dis_subm;
+   enc->enc_pic.me.enc_ime_overw_dis_subm_no = 
pic->motion_est.enc_ime_overw_dis_subm_no;
+   enc->enc_pic.me.enc_ime2_search_range_x = 
pic->motion_est.enc_ime2_search_range_x;
+   enc->enc_pic.me.enc_ime2_search_range_y = 
pic->motion_est.enc_ime2_search_range_y;
enc->enc_pic.me.enc_ime_decimation_search = 0x0001;
enc->enc_pic.me.motion_est_half_pixel = 0x0001;
enc->enc_pic.me.enc_search_range_x = 0x0010;
@@ -90,8 +91,8 @@ static void get_pic_control_param(struct rvce_encoder *enc, 
struct pipe_h264_enc
enc->enc_pic.pc.enc_max_num_ref_frames = enc->base.max_references + 1;
enc->enc_pic.pc.enc_num_default_active_ref_l0 = 0x0001;
enc->enc_pic.pc.enc_num_default_active_ref_l1 = 0x0001;
-   enc->enc_pic.pc.enc_cabac_enable = 0x00000000;
-   enc->enc_pic.pc.enc_constraint_set_flags = 0x0040;
+   enc->enc_pic.pc.enc_cabac_enable = pic->pic_ctrl.enc_cabac_enable;
+   enc->enc_pic.pc.enc_constraint_set_flags = 
pic->pic_ctrl.enc_constraint_set_flags;
enc->enc_pic.pc.enc_num_default_active_ref_l0 = 0x0001;
enc->enc_pic.pc.enc_num_default_active_ref_l1 = 0x0001;
 }
@@ -113,7 +114,7 @@ static void get_config_ext_param(struct rvce_encoder *enc)
 
 static void get_vui_param(struct rvce_encoder *enc, struct 
pipe_h264_enc_picture_desc *pic)
 {
-   enc->enc_pic.enable_vui = (pic->rate_ctrl.frame_rate_num != 0);
+   enc->enc_pic.enable_vui = pic->enable_vui;
enc->enc_pic.vui.video_format = 0x0005;
enc->enc_pic.vui.color_prim = 0x0002;
enc->enc_pic.vui.transfer_char = 0x0002;
@@ -149,10 +150,16 @@ void radeon_vce_52_get_param(struct rvce_encoder *enc, 
struct pipe_h264_enc_pict
 
enc->enc_pic.picture_type = pic->picture_type;
enc->enc_pic.frame_num = pic->frame_num;
+   enc->enc_pic.frame_num_cnt = pic->frame_num_cnt;
+   enc->enc_pic.p_remain = pic->p_remain;
+   enc->enc_pic.i_remain = pic->i_remain;
+   enc->enc_pic.gop_cnt = pic->gop_cnt;
enc->enc_pic.pic_order_cnt = pic->pic_order_cnt;
enc->enc_pic.ref_idx_l0 = pic->ref_idx_l0;
enc->enc_pic.ref_idx_l1 = pic->ref_idx_l1;
enc->enc_pic.not_referenced = pic->not_referenced;
+   enc->enc_pic.addrmode_arraymode_disrdo_distwoinstants = 
pic->ref_pic_mode;
+   enc->enc_pic.is_idr = pic->is_idr;
 }
 

[Mesa-dev] [PATCH 08/12] st/va: add functions for VAAPI encode

2016-06-30 Thread Boyuan Zhang
Signed-off-by: Boyuan Zhang 
---
 src/gallium/state_trackers/va/buffer.c |   6 +
 src/gallium/state_trackers/va/picture.c| 170 -
 src/gallium/state_trackers/va/va_private.h |   3 +
 3 files changed, 177 insertions(+), 2 deletions(-)

diff --git a/src/gallium/state_trackers/va/buffer.c 
b/src/gallium/state_trackers/va/buffer.c
index 7d3167b..dfcebbe 100644
--- a/src/gallium/state_trackers/va/buffer.c
+++ b/src/gallium/state_trackers/va/buffer.c
@@ -133,6 +133,12 @@ vlVaMapBuffer(VADriverContextP ctx, VABufferID buf_id, 
void **pbuff)
   if (!buf->derived_surface.transfer || !*pbuff)
  return VA_STATUS_ERROR_INVALID_BUFFER;
 
+  if (buf->type == VAEncCodedBufferType) {
+ ((VACodedBufferSegment*)buf->data)->buf = *pbuff;
+ ((VACodedBufferSegment*)buf->data)->size = buf->coded_size;
+ ((VACodedBufferSegment*)buf->data)->next = NULL;
+ *pbuff = buf->data;
+  }
} else {
   pipe_mutex_unlock(drv->mutex);
   *pbuff = buf->data;
diff --git a/src/gallium/state_trackers/va/picture.c 
b/src/gallium/state_trackers/va/picture.c
index 89ac024..26205b1 100644
--- a/src/gallium/state_trackers/va/picture.c
+++ b/src/gallium/state_trackers/va/picture.c
@@ -78,7 +78,8 @@ vlVaBeginPicture(VADriverContextP ctx, VAContextID 
context_id, VASurfaceID rende
   return VA_STATUS_SUCCESS;
}
 
-   context->decoder->begin_frame(context->decoder, context->target, 
&context->desc.base);
+   if (context->decoder->entrypoint != PIPE_VIDEO_ENTRYPOINT_ENCODE)
+  context->decoder->begin_frame(context->decoder, context->target, 
&context->desc.base);
 
return VA_STATUS_SUCCESS;
 }
@@ -278,6 +279,140 @@ handleVASliceDataBufferType(vlVaContext *context, 
vlVaBuffer *buf)
   num_buffers, (const void * const*)buffers, sizes);
 }
 
+static VAStatus
+handleVAEncMiscParameterTypeRateControl(vlVaContext *context, 
VAEncMiscParameterBuffer *misc)
+{
+   VAEncMiscParameterRateControl *rc = (VAEncMiscParameterRateControl 
*)misc->data;
+   if (context->desc.h264enc.rate_ctrl.rate_ctrl_method ==
+   PIPE_H264_ENC_RATE_CONTROL_METHOD_CONSTANT)
+  context->desc.h264enc.rate_ctrl.target_bitrate = rc->bits_per_second;
+   else
+  context->desc.h264enc.rate_ctrl.target_bitrate = rc->bits_per_second * 
rc->target_percentage;
+   context->desc.h264enc.rate_ctrl.peak_bitrate = rc->bits_per_second;
+   if (context->desc.h264enc.rate_ctrl.target_bitrate < 200)
+  context->desc.h264enc.rate_ctrl.vbv_buffer_size = 
MIN2((context->desc.h264enc.rate_ctrl.target_bitrate * 2.75), 200);
+   else
+  context->desc.h264enc.rate_ctrl.vbv_buffer_size = 
context->desc.h264enc.rate_ctrl.target_bitrate;
+
+   return VA_STATUS_SUCCESS;
+}
+
+static VAStatus
+handleVAEncSequenceParameterBufferType(vlVaDriver *drv, vlVaContext *context, 
vlVaBuffer *buf)
+{
+   VAEncSequenceParameterBufferH264 *h264 = (VAEncSequenceParameterBufferH264 
*)buf->data;
+   if (!context->decoder) {
+  context->templat.max_references = h264->max_num_ref_frames;
+  context->templat.level = h264->level_idc;
+  context->decoder = drv->pipe->create_video_codec(drv->pipe, 
&context->templat);
+  if (!context->decoder)
+ return VA_STATUS_ERROR_ALLOCATION_FAILED;
+   }
+   context->desc.h264enc.gop_size = h264->intra_idr_period;
+   return VA_STATUS_SUCCESS;
+}
+
+static VAStatus
+handleVAEncMiscParameterBufferType(vlVaContext *context, vlVaBuffer *buf)
+{
+   VAStatus vaStatus = VA_STATUS_SUCCESS;
+   VAEncMiscParameterBuffer *misc;
+   misc = buf->data;
+
+   switch (misc->type) {
+   case VAEncMiscParameterTypeRateControl:
+  vaStatus = handleVAEncMiscParameterTypeRateControl(context, misc);
+  break;
+
+   default:
+  break;
+   }
+
+   return vaStatus;
+}
+
+static VAStatus
+handleVAEncPictureParameterBufferType(vlVaDriver *drv, vlVaContext *context, 
vlVaBuffer *buf)
+{
+   VAEncPictureParameterBufferH264 *h264;
+   vlVaBuffer *coded_buf;
+
+   h264 = buf->data;
+   context->desc.h264enc.frame_num = h264->frame_num;
+   context->desc.h264enc.not_referenced = false;
+   context->desc.h264enc.is_idr = (h264->pic_fields.bits.idr_pic_flag == 1);
+   context->desc.h264enc.pic_order_cnt = h264->CurrPic.TopFieldOrderCnt / 2;
+   if (context->desc.h264enc.is_idr)
+  context->desc.h264enc.i_remain = 1;
+   else
+  context->desc.h264enc.i_remain = 0;
+
+   context->desc.h264enc.p_remain = context->desc.h264enc.gop_size - 
context->desc.h264enc.gop_cnt - context->desc.h264enc.i_remain;
+
+   coded_buf = handle_table_get(drv->htab, h264->coded_buf);
+   coded_buf->derived_surface.resource = pipe_buffer_create(drv->pipe->screen, 
PIPE_BIND_VERTEX_BUFFER,
+ PIPE_USAGE_STREAM, coded_buf->size);
+   context->coded_buf = coded_buf;
+
+   context->desc.h264enc.frame_idx[h264->CurrPic.picture_id] = h264->frame_num;
+   if (context->desc.h264enc.is_idr)
+  context->desc.h264enc.picture_type = 

[Mesa-dev] [PATCH 09/12] st/va: add preset values for VAAPI encode

2016-06-30 Thread Boyuan Zhang
Signed-off-by: Boyuan Zhang 
---
 src/gallium/state_trackers/va/picture.c | 36 +
 1 file changed, 36 insertions(+)

diff --git a/src/gallium/state_trackers/va/picture.c 
b/src/gallium/state_trackers/va/picture.c
index 26205b1..2d22e8b 100644
--- a/src/gallium/state_trackers/va/picture.c
+++ b/src/gallium/state_trackers/va/picture.c
@@ -95,6 +95,41 @@ vlVaGetReferenceFrame(vlVaDriver *drv, VASurfaceID 
surface_id,
   *ref_frame = NULL;
 }
 
+static void
+getEncParamPreset(vlVaContext *context)
+{
+   //motion estimation preset
+   context->desc.h264enc.motion_est.motion_est_quarter_pixel = 0x0001;
+   context->desc.h264enc.motion_est.lsmvert = 0x0002;
+   context->desc.h264enc.motion_est.enc_disable_sub_mode = 0x0078;
+   context->desc.h264enc.motion_est.enc_en_ime_overw_dis_subm = 0x0001;
+   context->desc.h264enc.motion_est.enc_ime_overw_dis_subm_no = 0x0001;
+   context->desc.h264enc.motion_est.enc_ime2_search_range_x = 0x0004;
+   context->desc.h264enc.motion_est.enc_ime2_search_range_y = 0x0004;
+
+   //pic control preset
+   context->desc.h264enc.pic_ctrl.enc_cabac_enable = 0x0001;
+   context->desc.h264enc.pic_ctrl.enc_constraint_set_flags = 0x0040;
+
+   //rate control
+   context->desc.h264enc.rate_ctrl.vbv_buffer_size = 2000;
+   if (context->desc.h264enc.rate_ctrl.frame_rate_num == 0) {
+  context->desc.h264enc.rate_ctrl.frame_rate_num = 30;
+  context->desc.h264enc.rate_ctrl.frame_rate_den = 1;
+   }
+   context->desc.h264enc.rate_ctrl.vbv_buf_lv = 48;
+   context->desc.h264enc.rate_ctrl.fill_data_enable = 1;
+   context->desc.h264enc.rate_ctrl.enforce_hrd = 1;
+   context->desc.h264enc.enable_vui = false;
+   context->desc.h264enc.rate_ctrl.target_bits_picture =
+  context->desc.h264enc.rate_ctrl.target_bitrate / 
context->desc.h264enc.rate_ctrl.frame_rate_num;
+   context->desc.h264enc.rate_ctrl.peak_bits_picture_integer =
+  context->desc.h264enc.rate_ctrl.peak_bitrate / 
context->desc.h264enc.rate_ctrl.frame_rate_num;
+   context->desc.h264enc.rate_ctrl.peak_bits_picture_fraction = 0;
+
+   context->desc.h264enc.ref_pic_mode = 0x0201;
+}
+
 static VAStatus
 handlePictureParameterBuffer(vlVaDriver *drv, vlVaContext *context, vlVaBuffer 
*buf)
 {
@@ -522,6 +557,7 @@ vlVaEndPicture(VADriverContextP ctx, VAContextID context_id)
 
if (context->decoder->entrypoint == PIPE_VIDEO_ENTRYPOINT_ENCODE) {
   coded_buf = context->coded_buf;
+  getEncParamPreset(context);
   context->decoder->begin_frame(context->decoder, context->target, 
&context->desc.base);
   context->decoder->encode_bitstream(context->decoder, context->target,
  coded_buf->derived_surface.resource, 
);
-- 
2.7.4
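
The preset above derives the per-picture bit budgets by dividing a bitrate by frame_rate_num alone, which matches the true frame rate when frame_rate_den is 1. A minimal sketch of the same arithmetic with the denominator taken into account is shown below. The helper name `bits_per_picture` and its parameters are made up for this illustration and are not part of the patch; the 30/1 fallback simply mirrors the default the preset applies when no frame rate is set.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of the patch: average bits available per
 * picture for a given bitrate (bits/second) and a frame rate expressed as
 * the fraction frame_rate_num / frame_rate_den (pictures/second). */
uint32_t
bits_per_picture(uint32_t bitrate, uint32_t frame_rate_num,
                 uint32_t frame_rate_den)
{
   /* Mirror the preset's fallback of 30/1 when no frame rate was supplied. */
   if (frame_rate_num == 0) {
      frame_rate_num = 30;
      frame_rate_den = 1;
   }
   assert(frame_rate_den != 0);

   /* bits/picture = (bits/second) / (pictures/second)
    *              = bitrate * den / num.
    * The 64-bit intermediate keeps the multiplication from overflowing. */
   return (uint32_t)(((uint64_t)bitrate * frame_rate_den) / frame_rate_num);
}
```

With a 6 Mbit/s target this gives 200000 bits per picture at 30/1 and 200200 at 30000/1001, so the two approaches only differ for fractional frame rates.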



[Mesa-dev] [PATCH 05/12] st/va: add nv12 i420 yv12 format to deriveimage call

2016-06-30 Thread Boyuan Zhang
Signed-off-by: Boyuan Zhang 
---
 src/gallium/state_trackers/va/image.c | 21 +
 1 file changed, 21 insertions(+)

diff --git a/src/gallium/state_trackers/va/image.c 
b/src/gallium/state_trackers/va/image.c
index c82b554..3c8cc9c 100644
--- a/src/gallium/state_trackers/va/image.c
+++ b/src/gallium/state_trackers/va/image.c
@@ -275,6 +275,27 @@ vlVaDeriveImage(VADriverContextP ctx, VASurfaceID surface, 
VAImage *image)
}
 
switch (img->format.fourcc) {
+   case VA_FOURCC('N','V','1','2'):
+  img->num_planes = 2;
+  img->pitches[0] = pitch[0];
+  img->offsets[0] = 0;
+  img->pitches[1] = pitch[1];
+  img->offsets[1] = pitch[0] * h;
+  img->data_size  = pitch[0] * h + pitch[1] * h / 2;
+  break;
+
+   case VA_FOURCC('I','4','2','0'):
+   case VA_FOURCC('Y','V','1','2'):
+  img->num_planes = 3;
+  img->pitches[0] = pitch[0];
+  img->offsets[0] = 0;
+  img->pitches[1] = pitch[1];
+  img->offsets[1] = pitch[0] * h;
+  img->pitches[2] = pitch[2];
+  img->offsets[2] = pitch[0] * h + pitch[1] * h / 4;
+  img->data_size  = pitch[0] * h + pitch[1] * h / 4 + pitch[2] * h / 4;
+  break;
+
case VA_FOURCC('U','Y','V','Y'):
case VA_FOURCC('Y','U','Y','V'):
   img->num_planes = 1;
-- 
2.7.4
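
For readers following the NV12 case in the hunk above, the plane layout it fills in can be sketched as a small standalone function. This is only an illustration under stated assumptions: the struct `nv12_layout` and the function `nv12_compute_layout` are hypothetical names, pitches are taken to be row strides in bytes, and the interleaved UV plane is assumed to start directly after the Y plane and to have half as many rows.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the result; not a libva or Mesa type. */
struct nv12_layout {
   uint32_t offsets[2];   /* byte offsets of the Y plane and the UV plane */
   uint32_t data_size;    /* total bytes covered by both planes */
};

/* Assumes pitch_y and pitch_uv are row strides in bytes, the Y plane has
 * `height` rows, and the interleaved UV plane has height / 2 rows (4:2:0)
 * and is placed immediately after the Y plane. */
struct nv12_layout
nv12_compute_layout(uint32_t pitch_y, uint32_t pitch_uv, uint32_t height)
{
   struct nv12_layout l;

   assert(height % 2 == 0);   /* 4:2:0 subsampling needs an even height */

   l.offsets[0] = 0;
   l.offsets[1] = pitch_y * height;
   l.data_size  = l.offsets[1] + pitch_uv * (height / 2);
   return l;
}
```

For a 1920x1080 surface with both pitches equal to 1920 bytes, the UV plane starts at byte 2073600 and the image covers 3110400 bytes, i.e. 1.5 bytes per pixel.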



[Mesa-dev] [PATCH 12/12] st/va: enable h264 VAAPI encode

2016-06-30 Thread Boyuan Zhang
Signed-off-by: Boyuan Zhang 
---
 src/gallium/state_trackers/va/config.c | 32 ++--
 1 file changed, 22 insertions(+), 10 deletions(-)

diff --git a/src/gallium/state_trackers/va/config.c 
b/src/gallium/state_trackers/va/config.c
index ea838c0..04d214d 100644
--- a/src/gallium/state_trackers/va/config.c
+++ b/src/gallium/state_trackers/va/config.c
@@ -74,6 +74,7 @@ vlVaQueryConfigEntrypoints(VADriverContextP ctx, VAProfile 
profile,
 {
struct pipe_screen *pscreen;
enum pipe_video_profile p;
+   int va_status = VA_STATUS_ERROR_UNSUPPORTED_PROFILE;
 
if (!ctx)
   return VA_STATUS_ERROR_INVALID_CONTEXT;
@@ -90,12 +91,18 @@ vlVaQueryConfigEntrypoints(VADriverContextP ctx, VAProfile 
profile,
   return VA_STATUS_ERROR_UNSUPPORTED_PROFILE;
 
pscreen = VL_VA_PSCREEN(ctx);
-   if (!pscreen->get_video_param(pscreen, p, PIPE_VIDEO_ENTRYPOINT_BITSTREAM, 
PIPE_VIDEO_CAP_SUPPORTED))
-  return VA_STATUS_ERROR_UNSUPPORTED_PROFILE;
-
-   entrypoint_list[(*num_entrypoints)++] = VAEntrypointVLD;
+   if (pscreen->get_video_param(pscreen, p, PIPE_VIDEO_ENTRYPOINT_BITSTREAM, 
PIPE_VIDEO_CAP_SUPPORTED)) {
+  entrypoint_list[(*num_entrypoints)++] = VAEntrypointVLD;
+  va_status = VA_STATUS_SUCCESS;
+   }
+   if (pscreen->get_video_param(pscreen, p, PIPE_VIDEO_ENTRYPOINT_ENCODE, 
PIPE_VIDEO_CAP_SUPPORTED) &&
+   p == PIPE_VIDEO_PROFILE_MPEG4_AVC_BASELINE) {
+  entrypoint_list[(*num_entrypoints)++] = VAEntrypointEncSlice;
+  entrypoint_list[(*num_entrypoints)++] = VAEntrypointEncPicture;
+  va_status = VA_STATUS_SUCCESS;
+   }
 
-   return VA_STATUS_SUCCESS;
+   return va_status;
 }
 
 VAStatus
@@ -114,7 +121,7 @@ vlVaGetConfigAttributes(VADriverContextP ctx, VAProfile 
profile, VAEntrypoint en
  value = VA_RT_FORMAT_YUV420;
  break;
   case VAConfigAttribRateControl:
- value = VA_RC_NONE;
+ value = VA_RC_CQP | VA_RC_CBR;
  break;
   default:
  value = VA_ATTRIB_NOT_SUPPORTED;
@@ -159,10 +166,15 @@ vlVaCreateConfig(VADriverContextP ctx, VAProfile profile, 
VAEntrypoint entrypoin
   return VA_STATUS_ERROR_UNSUPPORTED_PROFILE;
 
pscreen = VL_VA_PSCREEN(ctx);
-   if (!pscreen->get_video_param(pscreen, p, PIPE_VIDEO_ENTRYPOINT_BITSTREAM, 
PIPE_VIDEO_CAP_SUPPORTED))
-  return VA_STATUS_ERROR_UNSUPPORTED_PROFILE;
-
-   if (entrypoint != VAEntrypointVLD)
+   if (entrypoint == VAEntrypointVLD) {
+  if (!pscreen->get_video_param(pscreen, p, 
PIPE_VIDEO_ENTRYPOINT_BITSTREAM, PIPE_VIDEO_CAP_SUPPORTED))
+ return VA_STATUS_ERROR_UNSUPPORTED_PROFILE;
+   }
+   else if (entrypoint == VAEntrypointEncSlice) {
+  if (!pscreen->get_video_param(pscreen, p, PIPE_VIDEO_ENTRYPOINT_ENCODE, 
PIPE_VIDEO_CAP_SUPPORTED))
+ return VA_STATUS_ERROR_UNSUPPORTED_PROFILE;
+   }
+   else
   return VA_STATUS_ERROR_UNSUPPORTED_ENTRYPOINT;
 
if (entrypoint == VAEntrypointEncSlice || entrypoint == 
VAEntrypointEncPicture)
-- 
2.7.4



Re: [Mesa-dev] [PATCH 06/24] glsl: Add lowering pass for ir_unop_bit_count

2016-06-30 Thread Matt Turner
On Wed, Jun 29, 2016 at 2:04 PM, Ian Romanick wrote:
> From: Ian Romanick 
>
> Signed-off-by: Ian Romanick 
> ---
>  src/compiler/glsl/ir_optimization.h  |  1 +
>  src/compiler/glsl/lower_instructions.cpp | 53 
> 
>  2 files changed, 54 insertions(+)
>
> diff --git a/src/compiler/glsl/ir_optimization.h 
> b/src/compiler/glsl/ir_optimization.h
> index ba14e34..019909a 100644
> --- a/src/compiler/glsl/ir_optimization.h
> +++ b/src/compiler/glsl/ir_optimization.h
> @@ -42,6 +42,7 @@
>  #define SAT_TO_CLAMP   0x400
>  #define DOPS_TO_DFRAC  0x800
>  #define DFREXP_DLDEXP_TO_ARITH 0x1000
> +#define BIT_COUNT_TO_MATH 0x02000

No preference, but there's a leading zero in this constant.

>
>  /**
>   * \see class lower_packing_builtins_visitor
> diff --git a/src/compiler/glsl/lower_instructions.cpp 
> b/src/compiler/glsl/lower_instructions.cpp
> index 16d92d6..fc87605 100644
> --- a/src/compiler/glsl/lower_instructions.cpp
> +++ b/src/compiler/glsl/lower_instructions.cpp
> @@ -159,6 +159,7 @@ private:
> void dround_even_to_dfrac(ir_expression *);
> void dtrunc_to_dfrac(ir_expression *);
> void dsign_to_csel(ir_expression *);
> +   void bit_count_to_math(ir_expression *);
>  };
>
>  } /* anonymous namespace */
> @@ -954,6 +955,52 @@ lower_instructions_visitor::dsign_to_csel(ir_expression 
> *ir)
> this->progress = true;
>  }
>
> +void
> +lower_instructions_visitor::bit_count_to_math(ir_expression *ir)
> +{
> +   /* For more details, see:
> +*
> +* http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetPaallel

Lost the 'r' in Parallel in the link.
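
For reference, the technique behind the intended anchor (#CountBitsSetParallel) can be sketched in plain C as below. This is a standalone illustration of the general parallel bit-count approach, not the GLSL IR lowering from the patch; the function names are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: counts the set bits in a 32-bit word using the
 * "counting bits set, in parallel" technique. */
uint32_t
bit_count_parallel(uint32_t v)
{
   /* Each 2-bit field now holds the number of bits set in that field. */
   v = v - ((v >> 1) & 0x55555555u);
   /* Sum adjacent 2-bit counts into 4-bit fields. */
   v = (v & 0x33333333u) + ((v >> 2) & 0x33333333u);
   /* Sum adjacent 4-bit counts into bytes; each byte is at most 8. */
   v = (v + (v >> 4)) & 0x0F0F0F0Fu;
   /* The multiply accumulates all four byte counts into the top byte. */
   return (v * 0x01010101u) >> 24;
}

/* Small self-check; call it from a test or from main(). */
void
bit_count_self_check(void)
{
   assert(bit_count_parallel(0x00000000u) == 0);
   assert(bit_count_parallel(0x80000001u) == 2);
   assert(bit_count_parallel(0xFFFFFFFFu) == 32);
}
```

The appeal for a lowering pass is that the whole computation uses only shifts, masks, adds and one multiply, with no loop and no branches.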


[Mesa-dev] [PATCH 01/12] vl: add parameters for VAAPI encode

2016-06-30 Thread Boyuan Zhang
Signed-off-by: Boyuan Zhang 
---
 src/gallium/include/pipe/p_video_state.h | 36 
 1 file changed, 36 insertions(+)

diff --git a/src/gallium/include/pipe/p_video_state.h 
b/src/gallium/include/pipe/p_video_state.h
index d353be6..9cd489b 100644
--- a/src/gallium/include/pipe/p_video_state.h
+++ b/src/gallium/include/pipe/p_video_state.h
@@ -352,9 +352,29 @@ struct pipe_h264_enc_rate_control
unsigned frame_rate_num;
unsigned frame_rate_den;
unsigned vbv_buffer_size;
+   unsigned vbv_buf_lv;
unsigned target_bits_picture;
unsigned peak_bits_picture_integer;
unsigned peak_bits_picture_fraction;
+   unsigned fill_data_enable;
+   unsigned enforce_hrd;
+};
+
+struct pipe_h264_enc_motion_estimation
+{
+   unsigned motion_est_quarter_pixel;
+   unsigned enc_disable_sub_mode;
+   unsigned lsmvert;
+   unsigned enc_en_ime_overw_dis_subm;
+   unsigned enc_ime_overw_dis_subm_no;
+   unsigned enc_ime2_search_range_x;
+   unsigned enc_ime2_search_range_y;
+};
+
+struct pipe_h264_enc_pic_control
+{
+   unsigned enc_cabac_enable;
+   unsigned enc_constraint_set_flags;
 };
 
 struct pipe_h264_enc_picture_desc
@@ -363,17 +383,33 @@ struct pipe_h264_enc_picture_desc
 
struct pipe_h264_enc_rate_control rate_ctrl;
 
+   struct pipe_h264_enc_motion_estimation motion_est;
+   struct pipe_h264_enc_pic_control pic_ctrl;
+
unsigned quant_i_frames;
unsigned quant_p_frames;
unsigned quant_b_frames;
 
enum pipe_h264_enc_picture_type picture_type;
unsigned frame_num;
+   unsigned frame_num_cnt;
+   unsigned p_remain;
+   unsigned i_remain;
+   unsigned idr_pic_id;
+   unsigned gop_cnt;
unsigned pic_order_cnt;
unsigned ref_idx_l0;
unsigned ref_idx_l1;
+   unsigned gop_size;
+   unsigned ref_pic_mode;
 
bool not_referenced;
+   bool is_idr;
+   bool has_ref_pic_list;
+   bool enable_vui;
+   unsigned int ref_pic_list_0[32];
+   unsigned int ref_pic_list_1[32];
+   unsigned int frame_idx[32];
 };
 
 struct pipe_h265_sps
-- 
2.7.4



Re: [Mesa-dev] [PATCH 1/2] glsl: add driconf to zero-init unintialized vars

2016-06-30 Thread Kenneth Graunke
On Saturday, June 25, 2016 8:37:47 AM PDT Rob Clark wrote:
> From: Rob Clark 
> 
> Some games are sloppy, perhaps because it is defined behavior for DX or
> perhaps because the nv blob driver defaults things to zero.
> 
> So add a driconf param to force uninitialized variables to default to zero.
> 
> This issue was observed with Rust, from the Steam store, but has surfaced
> elsewhere in the past.
> 
> Signed-off-by: Rob Clark 
> ---
> Note that I left out the drirc bit, since not entirely sure how to
> identify this game.  (I don't actually have the game, just working off
> of an apitrace)
> 
> Possibly worth mentioning that for the shaders using uninitialized vars,
> having zero-initializers lets constant-propagation get rid of a whole
> lot of instructions.  One shader I saw dropped to less than half of
> its original instruction count.
> 
> Second patch in the series is just fixing an i965 bug that was exposed
> by this patch.

I'm a bit surprised to see this at the GLSL IR level...handling it for
nir_ssa_undef would probably be simpler.  But I suppose this works too.

>  src/compiler/glsl/ast_to_hir.cpp| 9 +
>  src/compiler/glsl/glsl_parser_extras.cpp| 1 +
>  src/compiler/glsl/glsl_parser_extras.h  | 1 +
>  src/gallium/include/state_tracker/st_api.h  | 1 +
>  src/gallium/state_trackers/dri/dri_screen.c | 2 ++
>  src/mesa/drivers/dri/common/xmlpool/t_options.h | 5 -
>  src/mesa/drivers/dri/i965/brw_context.c | 2 ++
>  src/mesa/drivers/dri/i965/intel_screen.c| 4 
>  src/mesa/main/mtypes.h  | 5 +
>  src/mesa/state_tracker/st_extensions.c  | 2 ++
>  10 files changed, 31 insertions(+), 1 deletion(-)
> 
> diff --git a/src/compiler/glsl/ast_to_hir.cpp 
> b/src/compiler/glsl/ast_to_hir.cpp
> index 0cfce68..d2c284f 100644
> --- a/src/compiler/glsl/ast_to_hir.cpp
> +++ b/src/compiler/glsl/ast_to_hir.cpp
> @@ -4697,6 +4697,15 @@ ast_declarator_list::hir(exec_list *instructions,
>apply_layout_qualifier_to_variable(>type->qualifier, var, state,
>   );
>  
> +  if ((var->data.mode == ir_var_auto || var->data.mode == 
> ir_var_temporary)
> +  && (var->type->base_type >= GLSL_TYPE_UINT)
> +  && (var->type->base_type <= GLSL_TYPE_BOOL)

I'd prefer:

 && (var->type->is_numeric() || var->type->is_boolean())

Either way,
Reviewed-by: Kenneth Graunke 

Patch 2 also gets an R-b with Jason's feedback (d[0] and u[0] on LHS)
taken care of.
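
The difference between the two forms of the type check can be sketched in plain C. Everything below is hypothetical: the enum is a simplified stand-in whose entries and ordering are assumptions, and the helper names are invented for the illustration. The range form is only correct while the scalar types stay contiguous in the enum, whereas the predicate form names the intent and keeps working if the enum is reordered.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified base-type enum. The real GLSL enum has more
 * entries; the ordering shown here is an assumption for the example. */
enum base_type {
   TYPE_UINT,
   TYPE_INT,
   TYPE_FLOAT,
   TYPE_DOUBLE,
   TYPE_BOOL,
   TYPE_SAMPLER,
   TYPE_STRUCT,
   TYPE_COUNT
};

bool
type_is_numeric(enum base_type t)
{
   return t == TYPE_UINT || t == TYPE_INT ||
          t == TYPE_FLOAT || t == TYPE_DOUBLE;
}

bool
type_is_boolean(enum base_type t)
{
   return t == TYPE_BOOL;
}

/* Range form: silently depends on the enum ordering above. */
bool
wants_zero_init_range(enum base_type t)
{
   return (int)t >= TYPE_UINT && (int)t <= TYPE_BOOL;
}

/* Predicate form: states which kinds of types are meant. */
bool
wants_zero_init_pred(enum base_type t)
{
   return type_is_numeric(t) || type_is_boolean(t);
}

/* With the ordering assumed above, both forms agree for every entry. */
void
zero_init_self_check(void)
{
   for (int t = 0; t < TYPE_COUNT; t++)
      assert(wants_zero_init_range((enum base_type)t) ==
             wants_zero_init_pred((enum base_type)t));
}
```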

> +  && state->zero_init) {
> + const ir_constant_data data = {0};
> + var->data.has_initializer = true;
> + var->constant_initializer = new(var) ir_constant(var->type, &data);
> +  }
> +
>if (this->type->qualifier.flags.q.invariant) {
>   if (!is_varying_var(var, state->stage)) {
>  _mesa_glsl_error(, state,
> diff --git a/src/compiler/glsl/glsl_parser_extras.cpp 
> b/src/compiler/glsl/glsl_parser_extras.cpp
> index 09f7477..fc2859a 100644
> --- a/src/compiler/glsl/glsl_parser_extras.cpp
> +++ b/src/compiler/glsl/glsl_parser_extras.cpp
> @@ -74,6 +74,7 @@ _mesa_glsl_parse_state::_mesa_glsl_parse_state(struct 
> gl_context *_ctx,
> /* Set default language version and extensions */
> this->language_version = 110;
> this->forced_language_version = ctx->Const.ForceGLSLVersion;
> +   this->zero_init = ctx->Const.GLSLZeroInit;
> this->es_shader = false;
> this->ARB_texture_rectangle_enable = true;
>  
> diff --git a/src/compiler/glsl/glsl_parser_extras.h 
> b/src/compiler/glsl/glsl_parser_extras.h
> index 8c43292..669b3d1 100644
> --- a/src/compiler/glsl/glsl_parser_extras.h
> +++ b/src/compiler/glsl/glsl_parser_extras.h
> @@ -306,6 +306,7 @@ struct _mesa_glsl_parse_state {
> bool es_shader;
> unsigned language_version;
> unsigned forced_language_version;
> +   bool zero_init;
> gl_shader_stage stage;
>  
> /**
> diff --git a/src/gallium/include/state_tracker/st_api.h 
> b/src/gallium/include/state_tracker/st_api.h
> index 41daa47..21d5177 100644
> --- a/src/gallium/include/state_tracker/st_api.h
> +++ b/src/gallium/include/state_tracker/st_api.h
> @@ -242,6 +242,7 @@ struct st_config_options
> unsigned force_glsl_version;
> boolean force_s3tc_enable;
> boolean allow_glsl_extension_directive_midshader;
> +   boolean glsl_zero_init;
>  };
>  
>  /**
> diff --git a/src/gallium/state_trackers/dri/dri_screen.c 
> b/src/gallium/state_trackers/dri/dri_screen.c
> index 2ac55c8..b16585a 100644
> --- a/src/gallium/state_trackers/dri/dri_screen.c
> +++ b/src/gallium/state_trackers/dri/dri_screen.c
> @@ -74,6 +74,7 @@ const __DRIconfigOptionsExtension gallium_config_options = {
>  
>DRI_CONF_SECTION_MISCELLANEOUS
>   DRI_CONF_ALWAYS_HAVE_DEPTH_BUFFER("false")
> + DRI_CONF_GLSL_ZERO_INIT("false")
>DRI_CONF_SECTION_END

Re: [Mesa-dev] [PATCH 00/24] Add MESA_shader_integer_functions

2016-06-30 Thread Matt Turner
Patches 1-5 are

Reviewed-by: Matt Turner 


[Mesa-dev] [PATCH 2/3] freedreno/a3xx: Fix sign compare warnings

2016-06-30 Thread Francesco Ansanelli
---
 src/gallium/drivers/freedreno/a3xx/fd3_screen.c |   14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/src/gallium/drivers/freedreno/a3xx/fd3_screen.c 
b/src/gallium/drivers/freedreno/a3xx/fd3_screen.c
index 4aea2fe..013b0ca 100644
--- a/src/gallium/drivers/freedreno/a3xx/fd3_screen.c
+++ b/src/gallium/drivers/freedreno/a3xx/fd3_screen.c
@@ -52,12 +52,12 @@ fd3_screen_is_format_supported(struct pipe_screen *pscreen,
}
 
if ((usage & PIPE_BIND_VERTEX_BUFFER) &&
-   (fd3_pipe2vtx(format) != ~0)) {
+   (fd3_pipe2vtx(format) != ~0u)) {
retval |= PIPE_BIND_VERTEX_BUFFER;
}
 
if ((usage & PIPE_BIND_SAMPLER_VIEW) &&
-   (fd3_pipe2tex(format) != ~0)) {
+   (fd3_pipe2tex(format) != ~0u)) {
retval |= PIPE_BIND_SAMPLER_VIEW;
}
 
@@ -66,8 +66,8 @@ fd3_screen_is_format_supported(struct pipe_screen *pscreen,
PIPE_BIND_SCANOUT |
PIPE_BIND_SHARED |
PIPE_BIND_BLENDABLE)) &&
-   (fd3_pipe2color(format) != ~0) &&
-   (fd3_pipe2tex(format) != ~0)) {
+   (fd3_pipe2color(format) != ~0u) &&
+   (fd3_pipe2tex(format) != ~0u)) {
retval |= usage & (PIPE_BIND_RENDER_TARGET |
PIPE_BIND_DISPLAY_TARGET |
PIPE_BIND_SCANOUT |
@@ -77,13 +77,13 @@ fd3_screen_is_format_supported(struct pipe_screen *pscreen,
}
 
if ((usage & PIPE_BIND_DEPTH_STENCIL) &&
-   (fd_pipe2depth(format) != ~0) &&
-   (fd3_pipe2tex(format) != ~0)) {
+   (fd_pipe2depth(format) != ~0u) &&
+   (fd3_pipe2tex(format) != ~0u)) {
retval |= PIPE_BIND_DEPTH_STENCIL;
}
 
if ((usage & PIPE_BIND_INDEX_BUFFER) &&
-   (fd_pipe2index(format) != ~0)) {
+   (fd_pipe2index(format) != ~0u)) {
retval |= PIPE_BIND_INDEX_BUFFER;
}
 
-- 
1.7.9.5
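
Since the patch carries no commit message, here is a minimal sketch of what the `~0` to `~0u` change addresses. The constant `~0` has type int (value -1), so comparing it with an unsigned value is the kind of comparison that GCC's -Wsign-compare reports; `~0u` is unsigned, so both operands share a signedness and the result is unchanged. The function names and the table below are hypothetical stand-ins, not the freedreno helpers.

```c
#include <assert.h>

/* Hypothetical sentinel meaning "no hardware equivalent". */
#define HW_FMT_INVALID (~0u)

/* Hypothetical stand-in for a format-translation helper: returns a
 * hardware format code, or HW_FMT_INVALID if there is none. */
unsigned
pipe_to_hw_format(unsigned pipe_format)
{
   /* Made-up table purely for the illustration. */
   static const unsigned table[] = { 0x10, HW_FMT_INVALID, 0x2a };
   const unsigned count = sizeof(table) / sizeof(table[0]);

   return pipe_format < count ? table[pipe_format] : HW_FMT_INVALID;
}

int
format_is_supported(unsigned pipe_format)
{
   /* Writing `!= ~0` here would compare unsigned against int -1; the
    * value converts to the same bit pattern, but the compiler may warn.
    * `~0u` keeps the comparison unsigned on both sides. */
   return pipe_to_hw_format(pipe_format) != ~0u;
}

/* Small self-check; call it from a test or from main(). */
void
format_self_check(void)
{
   assert(format_is_supported(0));
   assert(!format_is_supported(1));
}
```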



[Mesa-dev] [PATCH 1/3] freedreno/a2xx: Fix sign compare warnings

2016-06-30 Thread Francesco Ansanelli
---
 src/gallium/drivers/freedreno/a2xx/fd2_screen.c |8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_screen.c 
b/src/gallium/drivers/freedreno/a2xx/fd2_screen.c
index c2baa6f..fe4849b 100644
--- a/src/gallium/drivers/freedreno/a2xx/fd2_screen.c
+++ b/src/gallium/drivers/freedreno/a2xx/fd2_screen.c
@@ -61,7 +61,7 @@ fd2_screen_is_format_supported(struct pipe_screen *pscreen,
 
if ((usage & (PIPE_BIND_SAMPLER_VIEW |
PIPE_BIND_VERTEX_BUFFER)) &&
-   (fd2_pipe2surface(format) != ~0)) {
+   (fd2_pipe2surface(format) != ~0u)) {
retval |= usage & (PIPE_BIND_SAMPLER_VIEW |
PIPE_BIND_VERTEX_BUFFER);
}
@@ -70,7 +70,7 @@ fd2_screen_is_format_supported(struct pipe_screen *pscreen,
PIPE_BIND_DISPLAY_TARGET |
PIPE_BIND_SCANOUT |
PIPE_BIND_SHARED)) &&
-   (fd2_pipe2color(format) != ~0)) {
+   (fd2_pipe2color(format) != ~0u)) {
retval |= usage & (PIPE_BIND_RENDER_TARGET |
PIPE_BIND_DISPLAY_TARGET |
PIPE_BIND_SCANOUT |
@@ -78,12 +78,12 @@ fd2_screen_is_format_supported(struct pipe_screen *pscreen,
}
 
if ((usage & PIPE_BIND_DEPTH_STENCIL) &&
-   (fd_pipe2depth(format) != ~0)) {
+   (fd_pipe2depth(format) != ~0u)) {
retval |= PIPE_BIND_DEPTH_STENCIL;
}
 
if ((usage & PIPE_BIND_INDEX_BUFFER) &&
-   (fd_pipe2index(format) != ~0)) {
+   (fd_pipe2index(format) != ~0u)) {
retval |= PIPE_BIND_INDEX_BUFFER;
}
 
-- 
1.7.9.5


