Re: [Mesa-dev] [PATCH] i965: oa-counters: keep on reading reports until delimiting timestamp

2017-04-04 Thread Robert Bragg
On Tue, Apr 4, 2017 at 10:20 AM, Lionel Landwerlin <
lionel.g.landwer...@intel.com> wrote:

> On 03/04/17 21:04, Robert Bragg wrote:
>
>
>
> On Mar 30, 2017 16:16, "Lionel Landwerlin" <lionel.g.landwer...@intel.com>
> wrote:
>
> While exercising reading report with moderate load, we might have to
> wait for all the reports to land in the OA buffer, otherwise we might
> miss some reports. That means we need to keep on reading the OA stream
> until the last report we read has a timestamp older that the timestamp
> recorded by the MI_REPORT_PERF_COUNT at the end of the performance
> query.
>
>
> The expectation is that since we have received the second
> MI_REPORT_PERF_COUNT report that implies any periodic/automatic reports
> that the could possibly relate to a query must have been written to the
> OABUFFER. Since we read() as much as is available from i915 perf when we
> come to finish processing a query then we shouldn't miss anything.
>
> We shouldn't synchronously wait in a busy loop until we've found a report
> that has a timestamp >= our end MI_RPC because that could be a really
> significant amount of time (e.g. 50+ milliseconds on HSW). NB. the periodic
> samples are just to account for counter overflow. On Gen8+ the loop would
> likely block until the next context switch happens.
>
>
> Right, so if we don't get all the reports until the second
> MI_REPORT_PERF_COUNT, we might be dealing with a kernel issue.
>

Okey, so the realisation we've come to talking about this offline is that
the kernel's handling of the OA unit's tail pointer race condition breaks
this assumption :-/

Even though any necessary reports must have landed in the OABUFFER, the
kernel will throttle/delay access to the most recent reports in case the HW
tail pointer has got ahead of what's visible to the CPU in memory.

To handle this without blocking in a busy loop as above I guess we probably
need to drive our i915 perf reads() when the application asks if a query's
results are ready via brw_is_perf_query_ready().

It also isn't enough for brw_wait_perf_query to rely on
drm_intel_bo_wait_rendering() with the MI_RPC bo.

We might want to bump the periodic sampling frequency to a small multiple
of 5 milliseconds to at least reduce the worst-case latency for results
becoming ready. It wouldn't be that worthwhile sampling with a higher
frequency since the kernel only does tail pointer age checks at 200Hz
currently.


>
> I'm still suspicious of the code in mesa. I think in some cases (in
> particular with many applications running) it might loop forever.
>

Hope it's ok, but maybe there's an issue, can you maybe elaborate on what
details looks suspicious to you?


>
> Also the igt tests using roughly the same loop while (read() && errno !=
> EINTR) are pretty reliable at not finding all the reports we need from to
> the stream.
> So I would expect the same thing from mesa.
>

Right, I wrote the igt code first and when that looked to be working I
ported it to Mesa.

The issue with the tail pointer race handling by the kernel sounds like a
likely explanation for seeing issues here in both mesa and igt.



>
>
>
> Was this proposal based on a code inspection or did you maybe hit a
> problem correctly measuring something?
>
> Maybe if based on inspection we can find somewhere to put a comment to
> clarify the assumptions above?
>
> Br,
> - Robert
>
>
> Signed-off-by: Lionel Landwerlin <lionel.g.landwer...@intel.com>
> Cc: Robert Bragg <rob...@sixbynine.org>
> ---
>  src/mesa/drivers/dri/i965/brw_performance_query.c | 51
> ++-
>  1 file changed, 32 insertions(+), 19 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_performance_query.c
> b/src/mesa/drivers/dri/i965/brw_performance_query.c
> index 4a94e4b3cc..076c59e633 100644
> --- a/src/mesa/drivers/dri/i965/brw_performance_query.c
> +++ b/src/mesa/drivers/dri/i965/brw_performance_query.c
> @@ -647,38 +647,50 @@ discard_all_queries(struct brw_context *brw)
>  }
>
>  static bool
> -read_oa_samples(struct brw_context *brw)
> +read_oa_samples_until(struct brw_context *brw,
> +  uint32_t start_timestamp,
> +  uint32_t end_timestamp)
>  {
> -   while (1) {
> +   uint32_t last_timestamp = start_timestamp;
> +
> +   while (last_timestamp < end_timestamp) {
>struct brw_oa_sample_buf *buf = get_free_sample_buf(brw);
> +  uint32_t offset;
>int len;
>
>while ((len = read(brw->perfquery.oa_stream_fd, buf->buf,
> - sizeof(buf->buf))) < 0 && errno == EINTR)
> + sizeof(buf->buf))) < 0 &&
> + (errno == EINTR

Re: [Mesa-dev] [PATCH] i965: oa-counters: keep on reading reports until delimiting timestamp

2017-04-03 Thread Robert Bragg
On Mar 30, 2017 16:16, "Lionel Landwerlin" <lionel.g.landwer...@intel.com>
wrote:

While exercising reading report with moderate load, we might have to
wait for all the reports to land in the OA buffer, otherwise we might
miss some reports. That means we need to keep on reading the OA stream
until the last report we read has a timestamp older that the timestamp
recorded by the MI_REPORT_PERF_COUNT at the end of the performance
query.


The expectation is that since we have received the second
MI_REPORT_PERF_COUNT report that implies any periodic/automatic reports
that the could possibly relate to a query must have been written to the
OABUFFER. Since we read() as much as is available from i915 perf when we
come to finish processing a query then we shouldn't miss anything.

We shouldn't synchronously wait in a busy loop until we've found a report
that has a timestamp >= our end MI_RPC because that could be a really
significant amount of time (e.g. 50+ milliseconds on HSW). NB. the periodic
samples are just to account for counter overflow. On Gen8+ the loop would
likely block until the next context switch happens.

Was this proposal based on a code inspection or did you maybe hit a problem
correctly measuring something?

Maybe if based on inspection we can find somewhere to put a comment to
clarify the assumptions above?

Br,
- Robert


Signed-off-by: Lionel Landwerlin <lionel.g.landwer...@intel.com>
Cc: Robert Bragg <rob...@sixbynine.org>
---
 src/mesa/drivers/dri/i965/brw_performance_query.c | 51
++-
 1 file changed, 32 insertions(+), 19 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_performance_query.c
b/src/mesa/drivers/dri/i965/brw_performance_query.c
index 4a94e4b3cc..076c59e633 100644
--- a/src/mesa/drivers/dri/i965/brw_performance_query.c
+++ b/src/mesa/drivers/dri/i965/brw_performance_query.c
@@ -647,38 +647,50 @@ discard_all_queries(struct brw_context *brw)
 }

 static bool
-read_oa_samples(struct brw_context *brw)
+read_oa_samples_until(struct brw_context *brw,
+  uint32_t start_timestamp,
+  uint32_t end_timestamp)
 {
-   while (1) {
+   uint32_t last_timestamp = start_timestamp;
+
+   while (last_timestamp < end_timestamp) {
   struct brw_oa_sample_buf *buf = get_free_sample_buf(brw);
+  uint32_t offset;
   int len;

   while ((len = read(brw->perfquery.oa_stream_fd, buf->buf,
- sizeof(buf->buf))) < 0 && errno == EINTR)
+ sizeof(buf->buf))) < 0 &&
+ (errno == EINTR || errno == EAGAIN))
  ;

   if (len <= 0) {
  exec_list_push_tail(>perfquery.free_sample_buffers,
>link);

- if (len < 0) {
-if (errno == EAGAIN)
-   return true;
-else {
-   DBG("Error reading i915 perf samples: %m\n");
-   return false;
-}
- } else {
+ if (len < 0)
+DBG("Error reading i915 perf samples: %m\n");
+ else
 DBG("Spurious EOF reading i915 perf samples\n");
-return false;
- }
+
+ return false;
   }

   buf->len = len;
   exec_list_push_tail(>perfquery.sample_buffers, >link);
+
+  /* Go through the reports and update the last timestamp. */
+  while (offset < buf->len) {
+ const struct drm_i915_perf_record_header *header =
+(const struct drm_i915_perf_record_header *) >buf[offset];
+ uint32_t *report = (uint32_t *) (header + 1);
+
+ if (header->type == DRM_I915_PERF_RECORD_SAMPLE)
+last_timestamp = report[1];
+
+ offset += header->size;
+  }
}

-   unreachable("not reached");
-   return false;
+   return true;
 }

 /**
@@ -709,10 +721,6 @@ accumulate_oa_reports(struct brw_context *brw,

assert(o->Ready);

-   /* Collect the latest periodic OA reports from i915 perf */
-   if (!read_oa_samples(brw))
-  goto error;
-
drm_intel_bo_map(obj->oa.bo, false);
query_buffer = obj->oa.bo->virtual;

@@ -728,6 +736,11 @@ accumulate_oa_reports(struct brw_context *brw,
   goto error;
}

+   /* Collect the latest periodic OA reports from i915 perf */
+   if (!read_oa_samples_until(brw, start[1], end[1]))
+  goto error;
+
+
/* See if we have any periodic reports to accumulate too... */

/* N.B. The oa.samples_head was set when the query began and
--
2.11.0
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] anv/device: init timestampPeriod from devinfo

2017-03-17 Thread Robert Bragg
I've just pushed an old patch (was already reviewed, but I never followed up on
it) that added a timebase_scale to gen_device_info so we can use that instead
here instead.

--- >8 ---

Now that there's a timebase_scale in gen_device_info which is
effectively the 'period' this switches anv_GetPhysicalDeviceProperties
to using this common device info to initialize the timestampPeriod
device limit.

Signed-off-by: Robert Bragg <rob...@sixbynine.org>
---
 src/intel/vulkan/anv_device.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/src/intel/vulkan/anv_device.c b/src/intel/vulkan/anv_device.c
index fcfbd5fd6b..748d70327e 100644
--- a/src/intel/vulkan/anv_device.c
+++ b/src/intel/vulkan/anv_device.c
@@ -535,8 +535,6 @@ void anv_GetPhysicalDeviceProperties(
ANV_FROM_HANDLE(anv_physical_device, pdevice, physicalDevice);
const struct gen_device_info *devinfo = >info;
 
-   const float time_stamp_base = devinfo->gen >= 9 ? 83.333 : 80.0;
-
/* See assertions made when programming the buffer surface state. */
const uint32_t max_raw_buffer_sz = devinfo->gen >= 7 ?
   (1ul << 30) : (1ul << 27);
@@ -641,7 +639,7 @@ void anv_GetPhysicalDeviceProperties(
   .storageImageSampleCounts = VK_SAMPLE_COUNT_1_BIT,
   .maxSampleMaskWords   = 1,
   .timestampComputeAndGraphics  = false,
-  .timestampPeriod  = time_stamp_base,
+  .timestampPeriod  = devinfo->timebase_scale,
   .maxClipDistances = 8,
   .maxCullDistances = 8,
   .maxCombinedClipAndCullDistances  = 8,
-- 
2.12.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965: Allow a per gen timebase scale factor

2017-03-17 Thread Robert Bragg
On Fri, Jan 6, 2017 at 9:28 PM, Kenneth Graunke <kenn...@whitecape.org> wrote:
> On Friday, January 6, 2017 1:17:39 PM PST Kenneth Graunke wrote:
>> From: Robert Bragg <rob...@sixbynine.org>
>>
>> v2: (Ken) Update timebase_scale for platforms past Skylake/Broxton too.
>
> Hi Robert!
>
> Your patch had merge conflicts in gen_device_info.c at this point, so I
> fixed those and re-sent it.  It also looked like it didn't set
> timebase_scale for KBL, GLK, etc...it should now (via GEN9_FEATURES and
> GEN9_LP_FEATURES).
>
> [snip]
>
>> +/* As best we know currently, the Gen HW timestamps are 36bits across
>> + * all platforms, which we need to account for when calculating a
>> + * delta to measure elapsed time.
>> + *
>> + * The timestamps read via glGetTimestamp() / brw_get_timestamp() sometimes
>> + * only have 32bits due to a kernel bug and so in that case we make sure to
>> + * treat all raw timestamps as 32bits so they overflow consistently and 
>> remain
>> + * comparable.
>> + */
>> +uint64_t
>> +brw_raw_timestamp_delta(struct brw_context *brw, uint64_t time0, uint64_t 
>> time1)
>> +{
>> +   if (brw->screen->hw_has_timestamp == 2) {
>> +  /* Kernel clips timestamps to 32bits in this case */
>> +  return (uint32_t)time1 - (uint32_t)time0;
>
> Is this right?  intel_detect_timestamp() says
>
>  return 2; /* upper dword holds the low 32bits of the timestamp */
>
> but casting these to uint32_t should take the low DWords...

*very* late seeing this and following up, sorry.

I think this is ok. The timestamps we're dealing with here aren't from
the kernel, they are read via a PIPE_CONTROL. The point here is really
just being consistent with clipping raw timestamps to 32bits if we're
running on a buggy kernel. It looks like brw_get_timestamp() is doing
the right thing with reporting the upper dword when the timestamp is
queried from the kernel.

I've updated the comment above the function and above this specific
return statement to hopefully clarify this.

>
>> +   } else {
>> +  if (time0 > time1)
>> + return (1ULL << 36) + time1 - time0;
>> +  else
>> + return time1 - time0;
>> +   }
>> +}
>
> Otherwise, this looks good to me, and is:
> Reviewed-by: Kenneth Graunke <kenn...@whitecape.org>

Thanks, and to Matt: I've added extra braces with the nested
control-flow in brw_raw_timestamp_delta(), though I didn't add
trailing commas where noted. After seeing that the uses of the
_FEATURES macros are all followed by explicit commas it ends up
breaking compilation if the macros contain their own trailing comma.

Just pushed to master.

Br,
- Robert

>
> Could you try and hook up your fixed-point multiplier to fix query
> buffer objects as well?  I'd like to land these for the 17.0 release.
>
> Thanks for fixing this!
>
> --Ken
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] anv/device: fix timestampPeriod for Broxton

2017-03-17 Thread Robert Bragg
Broxton's reference clock for timestamps runs at 19.2MHz and gives us
a period of 52.083 nanoseconds.

Signed-off-by: Robert Bragg <rob...@sixbynine.org>
---
 src/intel/vulkan/anv_device.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/intel/vulkan/anv_device.c b/src/intel/vulkan/anv_device.c
index 014b2f7d9c..d9eba859a1 100644
--- a/src/intel/vulkan/anv_device.c
+++ b/src/intel/vulkan/anv_device.c
@@ -535,7 +535,8 @@ void anv_GetPhysicalDeviceProperties(
ANV_FROM_HANDLE(anv_physical_device, pdevice, physicalDevice);
const struct gen_device_info *devinfo = >info;
 
-   const float time_stamp_base = devinfo->gen >= 9 ? 83.333 : 80.0;
+   const float time_stamp_base = (devinfo->gen < 9 ? 80.0 :
+  (devinfo->is_broxton ? 52.083 : 83.333));
 
/* See assertions made when programming the buffer surface state. */
const uint32_t max_raw_buffer_sz = devinfo->gen >= 7 ?
-- 
2.12.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] i965: bounds checks while concatenating sysfs paths

2017-03-16 Thread Robert Bragg
This adds some missing return value checks for all uses of snprintf in
brw_performance_query.c. This also switches a use of strncpy + strncat
for snprintf for consistency and to avoiding the chance of the strncpy
leaving an unterminated string in the dest buffer if the src is too
long.

This issue with strncpy was picked up by Coverity.

CID: 1402201
Signed-off-by: Robert Bragg <rob...@sixbynine.org>
---
 src/mesa/drivers/dri/i965/brw_performance_query.c | 43 +--
 1 file changed, 32 insertions(+), 11 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_performance_query.c 
b/src/mesa/drivers/dri/i965/brw_performance_query.c
index 4052117ea5..2e04e091d2 100644
--- a/src/mesa/drivers/dri/i965/brw_performance_query.c
+++ b/src/mesa/drivers/dri/i965/brw_performance_query.c
@@ -1508,9 +1508,13 @@ enumerate_sysfs_metrics(struct brw_context *brw, const 
char *sysfs_dev_dir)
char buf[256];
DIR *metricsdir = NULL;
struct dirent *metric_entry;
+   int len;
 
-   strncpy(buf, sysfs_dev_dir, sizeof(buf));
-   strncat(buf, "/metrics", sizeof(buf));
+   len = snprintf(buf, sizeof(buf), "%s/metrics", sysfs_dev_dir);
+   if (len < 0 || len >= sizeof(buf)) {
+  DBG("Failed to concatenate path to sysfs metrics/ directory\n");
+  return;
+   }
 
metricsdir = opendir(buf);
if (!metricsdir) {
@@ -1533,8 +1537,12 @@ enumerate_sysfs_metrics(struct brw_context *brw, const 
char *sysfs_dev_dir)
  struct brw_perf_query_info *query;
  uint64_t id;
 
- snprintf(buf, sizeof(buf), "%s/metrics/%s/id",
-  sysfs_dev_dir, metric_entry->d_name);
+ len = snprintf(buf, sizeof(buf), "%s/metrics/%s/id",
+sysfs_dev_dir, metric_entry->d_name);
+ if (len < 0 || len >= sizeof(buf)) {
+DBG("Failed to concatenate path to sysfs metric id file\n");
+continue;
+ }
 
  if (!read_file_uint64(buf, )) {
 DBG("Failed to read metric set id from %s: %m", buf);
@@ -1561,8 +1569,13 @@ read_sysfs_drm_device_file_uint64(struct brw_context 
*brw,
   uint64_t *value)
 {
char buf[512];
+   int len;
 
-   snprintf(buf, sizeof(buf), "%s/%s", sysfs_dev_dir, file);
+   len = snprintf(buf, sizeof(buf), "%s/%s", sysfs_dev_dir, file);
+   if (len < 0 || len >= sizeof(buf)) {
+  DBG("Failed to concatenate sys filename to read u64 from\n");
+  return false;
+   }
 
return read_file_uint64(buf, value);
 }
@@ -1620,6 +1633,7 @@ get_sysfs_dev_dir(struct brw_context *brw,
int min, maj;
DIR *drmdir;
struct dirent *drm_entry;
+   int len;
 
assert(path_buf);
assert(path_buf_len);
@@ -1638,8 +1652,12 @@ get_sysfs_dev_dir(struct brw_context *brw,
   return false;
}
 
-   snprintf(path_buf, path_buf_len,
-"/sys/dev/char/%d:%d/device/drm", maj, min);
+   len = snprintf(path_buf, path_buf_len,
+  "/sys/dev/char/%d:%d/device/drm", maj, min);
+   if (len < 0 || len >= path_buf_len) {
+  DBG("Failed to concatenate sysfs path to drm device\n");
+  return false;
+   }
 
drmdir = opendir(path_buf);
if (!drmdir) {
@@ -1652,11 +1670,14 @@ get_sysfs_dev_dir(struct brw_context *brw,
drm_entry->d_type == DT_LNK) &&
   strncmp(drm_entry->d_name, "card", 4) == 0)
   {
- snprintf(path_buf, path_buf_len,
-  "/sys/dev/char/%d:%d/device/drm/%s",
-  maj, min, drm_entry->d_name);
+ len = snprintf(path_buf, path_buf_len,
+"/sys/dev/char/%d:%d/device/drm/%s",
+maj, min, drm_entry->d_name);
  closedir(drmdir);
- return true;
+ if (len < 0 || len >= path_buf_len)
+return false;
+ else
+return true;
   }
}
 
-- 
2.12.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965: avoid using a GNU make pattern rule

2017-03-16 Thread Robert Bragg
On Thu, Mar 16, 2017 at 1:50 PM, Emil Velikov  wrote:
> On 16 March 2017 at 02:49, Jonathan Gray  wrote:
>> % pattern rules are a GNU extension.  As there is only one file here
>> avoid patterns and globbing entirely to fix the build on non-GNU make.
>>
>> Signed-off-by: Jonathan Gray 
>> ---
>>  src/mesa/drivers/dri/i965/Makefile.am | 8 
>>  1 file changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/src/mesa/drivers/dri/i965/Makefile.am 
>> b/src/mesa/drivers/dri/i965/Makefile.am
>> index a83e3a6fa1..fee1ccbbf5 100644
>> --- a/src/mesa/drivers/dri/i965/Makefile.am
>> +++ b/src/mesa/drivers/dri/i965/Makefile.am
>> @@ -92,8 +92,8 @@ EXTRA_DIST = \
>>  # .c and .h files in one go so we don't hit problems with parallel
>>  # make and multiple invocations of the same script trying to write
>>  # to the same files.
>> -brw_oa_%.h: brw_oa_%.xml brw_oa.py Makefile.am
>> -   $(PYTHON2) $(PYTHON_FLAGS) $(srcdir)/brw_oa.py 
>> --header=$(builddir)/brw_oa_$(*).h --chipset="$(*)" $(srcdir)/brw_oa_$(*).xml
>> -brw_oa_%.c: brw_oa_%.xml brw_oa.py Makefile.am
>> -   $(PYTHON2) $(PYTHON_FLAGS) $(srcdir)/brw_oa.py 
>> --code=$(builddir)/brw_oa_$(*).c --chipset="$(*)" $(srcdir)/brw_oa_$(*).xml
> 
> Hmm that is not even remotely what was reviewed or suggested ... strange.
> 

Sorry, I didn't intend to change this under the radar, but I guess the
you didn't see this:
https://lists.freedesktop.org/archives/mesa-dev/2017-March/147200.html

>
>> +brw_oa_hsw.h: $(srcdir)/brw_oa_hsw.xml
>> +   $(PYTHON2) $(PYTHON_FLAGS) $(srcdir)/brw_oa.py 
>> --header=$(builddir)/brw_oa_hsw.h --chipset=hsw $(srcdir)/brw_oa_hsw.xml
>> +brw_oa_hsw.c: $(srcdir)/brw_oa_hsw.xml
>> +   $(PYTHON2) $(PYTHON_FLAGS) $(srcdir)/brw_oa.py 
>> --code=$(builddir)/brw_oa_hsw.c --chipset=hsw $(srcdir)/brw_oa_hsw.xml
>>
> Thank you Jonathan.
>
> We might need a generic rule as other generations are covered and/or
> even move the lot to another location.
> All that in due time, for now I'll add the missing brw_oa.py
> dependency and push this.

Just to mention; this change just caught me out when rebasing my gen 8
/ 9 changes and so I'm wondering about the best way to deal with this,
ideally without copying seven times for hsw, bdw, chv, sklgt2, sklgt3,
sklgt4, bxt + more later. For now I'm copy & pasting, and maybe it'll
just be easiest to stick with that.

I wouldn't guess the i915-perf kernel interface has been ported to any
other OS besides Linux, so maybe these can be annexed as Linux
specific somehow to avoid tripping up other OS builds.

It looks like there are a few other files in Mesa that use patterns,
such as ./src/mapi/glapi/gen/Makefile.am - I wonder how come that
hasn't proven to be a problem.

btw the recent change here removed Makefile.am from the list of
dependencies, which I think was reasonable to have originally (and
consistent with what automake normally does for rules it generates).

Regards,
- Robert

>
> Thanks
> Emil
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] android: i965: generate code for OA counter queries

2017-03-13 Thread Robert Bragg
Acked-by: Robert Bragg <rob...@sixbynine.org>

On Sun, Mar 12, 2017 at 11:01 PM, Mauro Rossi <issor.or...@gmail.com> wrote:
> Automake generation rules are replicated for android.
> $* macro was expected to return "hsw" but instead gives "hsw.{h,c}"
> so $(basename $*) is used as a workaround
> to set the correct --chipset option for brw_oa.py script.
>
> Build tested with nougat-x86
>
> Fixes: e565505 "i965: Add script to gen code for OA counter queries"
> ---
>  src/mesa/drivers/dri/i965/Android.mk | 17 +
>  1 file changed, 17 insertions(+)
>
> diff --git a/src/mesa/drivers/dri/i965/Android.mk 
> b/src/mesa/drivers/dri/i965/Android.mk
> index 7dea3c2..6668930 100644
> --- a/src/mesa/drivers/dri/i965/Android.mk
> +++ b/src/mesa/drivers/dri/i965/Android.mk
> @@ -221,5 +221,22 @@ LOCAL_GENERATED_SOURCES := \
> $(MESA_DRI_OPTIONS_H) \
> $(MESA_GEN_NIR_H)
>
> +LOCAL_MODULE_CLASS := SHARED_LIBRARIES
> +
> +intermediates := $(call local-generated-sources-dir)
> +
> +LOCAL_GENERATED_SOURCES += $(addprefix $(intermediates)/, \
> +   $(i965_oa_GENERATED_FILES))
> +
> +$(intermediates)/brw_oa_%.h: $(LOCAL_PATH)/brw_oa_%.xml 
> $(LOCAL_PATH)/brw_oa.py
> +   @echo "target Generated: $(PRIVATE_MODULE) <= $(notdir $(@))"
> +   @mkdir -p $(dir $@)
> +   $(hide) $(MESA_PYTHON2) $(word 2, $^) --header=$@ 
> --chipset=$(basename $*) $<
> +
> +$(intermediates)/brw_oa_%.c: $(LOCAL_PATH)/brw_oa_%.xml 
> $(LOCAL_PATH)/brw_oa.py
> +   @echo "target Generated: $(PRIVATE_MODULE) <= $(notdir $(@))"
> +   @mkdir -p $(dir $@)
> +   $(hide) $(MESA_PYTHON2) $(word 2, $^) --code=$@ --chipset=$(basename 
> $*) $<
> +
>  include $(MESA_COMMON_MK)
>  include $(BUILD_SHARED_LIBRARY)
> --
> 2.10.2
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v4] i965: Add script to gen code for OA counter queries

2017-03-08 Thread Robert Bragg
Although the updated targets for the brw_oa_.[ch] files fixed the
issue with parallel make it introduced an issue with there not being a rule to
rebuild the .c files if the .h file was up to date.

This makes two changes:
1) Use a pattern rule so we can avoid having to copy and paste more-or-less
the same targets for hsw,bdw,chv,skl.bxt...
2) Just go for simplicity and have orthogonal, separate targets + rules for
the .c and .h files. One rule passes --code, the other --header so with a
parallel make they won't be fighting to write the same files. The extra
xml parsing doesn't seem like something worth worrying about - probably
made up for by the switch to cElementTree.

Hope that's ok,
- Robert

--- >8 --- (git am --scissors)

Avoiding lots of error prone boilerplate and easing our ability to add +
maintain support for multiple OA performance counter queries for each
generation:

This adds a python script to generate code for building up
performance_queries from the metric sets and counters described in
brw_oa_hsw.xml as well as functions to normalize each counter based on
the RPN expressions given.

Although the XML file currently only includes a single metric set, the
code generated assumes there could be many sets.

The metrics as described in XML get translated into C structures
which are registered in a brw->perfquery.oa_metrics_table hash table
keyed by the GUID of the metric set in XML.

v2: numerous python style improvements (Dylan)
v3: Makefile.am fixups (Emil)
v4: Pattern rule for codegen + orthogonal .c and .h rules (Robert)

Signed-off-by: Robert Bragg <rob...@sixbynine.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwer...@intel.com>
Reviewed-by: Dylan Baker <dy...@pnwbakers.com>
Reviewed-by: Emil Velikov <emil.veli...@collabora.com>
---
 src/mesa/drivers/dri/i965/Makefile.am  |  15 +-
 src/mesa/drivers/dri/i965/Makefile.sources |   4 +
 src/mesa/drivers/dri/i965/brw_oa.py| 556 +
 3 files changed, 574 insertions(+), 1 deletion(-)
 create mode 100644 src/mesa/drivers/dri/i965/brw_oa.py

diff --git a/src/mesa/drivers/dri/i965/Makefile.am 
b/src/mesa/drivers/dri/i965/Makefile.am
index 4ff7e70b51..b9c66da995 100644
--- a/src/mesa/drivers/dri/i965/Makefile.am
+++ b/src/mesa/drivers/dri/i965/Makefile.am
@@ -93,7 +93,9 @@ BUILT_SOURCES = $(i965_compiler_GENERATED_FILES)
 CLEANFILES = $(BUILT_SOURCES)
 
 EXTRA_DIST = \
-   brw_nir_trig_workarounds.py
+   brw_nir_trig_workarounds.py \
+   brw_oa_hsw.xml \
+   brw_oa.py
 
 TEST_LIBS = \
libi965_compiler.la \
@@ -170,3 +172,14 @@ test_eu_validate_SOURCES = \
 test_eu_validate_LDADD = \
$(top_builddir)/src/gtest/libgtest.la \
$(TEST_LIBS)
+
+BUILT_SOURCES += $(i965_oa_GENERATED_FILES)
+
+# Note: we avoid using a multi target rule here and outputting both the
+# .c and .h files in one go so we don't hit problems with parallel
+# make and multiple invocations of the same script trying to write
+# to the same files.
+brw_oa_%.h: brw_oa_%.xml brw_oa.py Makefile.am
+   $(PYTHON2) $(PYTHON_FLAGS) $(srcdir)/brw_oa.py 
--header=$(builddir)/brw_oa_$(*).h --chipset="$(*)" $(srcdir)/brw_oa_$(*).xml
+brw_oa_%.c: brw_oa_%.xml brw_oa.py Makefile.am
+   $(PYTHON2) $(PYTHON_FLAGS) $(srcdir)/brw_oa.py 
--code=$(builddir)/brw_oa_$(*).c --chipset="$(*)" $(srcdir)/brw_oa_$(*).xml
diff --git a/src/mesa/drivers/dri/i965/Makefile.sources 
b/src/mesa/drivers/dri/i965/Makefile.sources
index b1776a8513..f9e5f6e594 100644
--- a/src/mesa/drivers/dri/i965/Makefile.sources
+++ b/src/mesa/drivers/dri/i965/Makefile.sources
@@ -262,3 +262,7 @@ i965_gen8_FILES = \
 
 i965_gen9_FILES = \
genX_blorp_exec.c
+
+i965_oa_GENERATED_FILES = \
+   brw_oa_hsw.h \
+   brw_oa_hsw.c
diff --git a/src/mesa/drivers/dri/i965/brw_oa.py 
b/src/mesa/drivers/dri/i965/brw_oa.py
new file mode 100644
index 00..e3fdd56174
--- /dev/null
+++ b/src/mesa/drivers/dri/i965/brw_oa.py
@@ -0,0 +1,556 @@
+# Copyright (c) 2015-2017 Intel Corporation
+#
+# Permission is hereby granted, free of charge, to any person obtaining a
+# copy of this software and associated documentation files (the "Software"),
+# to deal in the Software without restriction, including without limitation
+# the rights to use, copy, modify, merge, publish, distribute, sublicense,
+# and/or sell copies of the Software, and to permit persons to whom the
+# Software is furnished to do so, subject to the following conditions:
+#
+# The above copyright notice and this permission notice (including the next
+# paragraph) shall be included in all copies or substantial portions of the
+# Software.
+#
+# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+# THE AUTHORS OR COPYRIGHT HOLDERS BE LIA

Re: [Mesa-dev] [PATCH v3] i965: Add script to gen code for OA counter queries

2017-03-02 Thread Robert Bragg
On Mar 2, 2017 7:32 PM, "Emil Velikov" <emil.l.veli...@gmail.com> wrote:

On 2 March 2017 at 18:58, Robert Bragg <rob...@sixbynine.org> wrote:
> Adds R/b from Dylan and Makefile fixups from Emil, including fixing race
with
> parallel make builds (thanks). Just holding fast on the use of #pragma
once
> though :-)
>
I think you want the "required=True" for all the fields, but feel free
to do at a later stage.


The header and code args are checked like:

> if args.header:
>header_file = open(args.header, 'w')
>
> if args.code:
>c_file = open(args.code, 'w')

and the c() and h() funcs are then be NOOPs when there's no open file to
write to.

I just double checked that's the case and it works as expected to omit
either arg.


As-is
Reviewed-by: Emil Velikov <emil.veli...@collabora.com>


Thanks,
- Robert


-Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v3] i965: Add script to gen code for OA counter queries

2017-03-02 Thread Robert Bragg
Adds R/b from Dylan and Makefile fixups from Emil, including fixing race with
parallel make builds (thanks). Just holding fast on the use of #pragma once
though :-)

Br,
- Robert

--- >8 --- (git am --scissors)

Avoiding lots of error prone boilerplate and easing our ability to add +
maintain support for multiple OA performance counter queries for each
generation:

This adds a python script to generate code for building up
performance_queries from the metric sets and counters described in
brw_oa_hsw.xml as well as functions to normalize each counter based on
the RPN expressions given.

Although the XML file currently only includes a single metric set, the
code generated assumes there could be many sets.

The metrics as described in XML get translated into C structures
which are registered in a brw->perfquery.oa_metrics_table hash table
keyed by the GUID of the metric set in XML.

v2: numerous python style improvements (Dylan)
v3: Makefile.am fixups (Emil)

Signed-off-by: Robert Bragg <rob...@sixbynine.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwer...@intel.com>
Reviewed-by: Dylan Baker <dy...@pnwbakers.com>
---
 src/mesa/drivers/dri/i965/Makefile.am  |  17 +-
 src/mesa/drivers/dri/i965/Makefile.sources |   4 +
 src/mesa/drivers/dri/i965/brw_oa.py| 556 +
 3 files changed, 576 insertions(+), 1 deletion(-)
 create mode 100644 src/mesa/drivers/dri/i965/brw_oa.py

diff --git a/src/mesa/drivers/dri/i965/Makefile.am 
b/src/mesa/drivers/dri/i965/Makefile.am
index 4ff7e70b51..5508df134d 100644
--- a/src/mesa/drivers/dri/i965/Makefile.am
+++ b/src/mesa/drivers/dri/i965/Makefile.am
@@ -93,7 +93,9 @@ BUILT_SOURCES = $(i965_compiler_GENERATED_FILES)
 CLEANFILES = $(BUILT_SOURCES)
 
 EXTRA_DIST = \
-   brw_nir_trig_workarounds.py
+   brw_nir_trig_workarounds.py \
+   brw_oa_hsw.xml \
+   brw_oa.py
 
 TEST_LIBS = \
libi965_compiler.la \
@@ -170,3 +172,16 @@ test_eu_validate_SOURCES = \
 test_eu_validate_LDADD = \
$(top_builddir)/src/gtest/libgtest.la \
$(TEST_LIBS)
+
+BUILT_SOURCES += $(i965_oa_GENERATED_FILES)
+
+# XXX: note that we daisy chain these brw_oa_*.[ch] dependencies so
+# only the header directly depends on the script to be generated and
+# the .c file depends on the header instead of using a multi target
+# rule.  This ensures that a parallel make won't resort to running the
+# script twice (with the risk that the two runs trample on each other
+# generating the same files).
+#
+brw_oa_hsw.h: brw_oa_hsw.xml brw_oa.py Makefile.am
+   $(PYTHON2) $(PYTHON_FLAGS) $(srcdir)/brw_oa.py 
--header=$(builddir)/brw_oa_hsw.h --code=$(builddir)/brw_oa_hsw.c 
--chipset="hsw" $(srcdir)/brw_oa_hsw.xml
+brw_oa_hsw.c: brw_oa_hsw.h
diff --git a/src/mesa/drivers/dri/i965/Makefile.sources 
b/src/mesa/drivers/dri/i965/Makefile.sources
index b1776a8513..f9e5f6e594 100644
--- a/src/mesa/drivers/dri/i965/Makefile.sources
+++ b/src/mesa/drivers/dri/i965/Makefile.sources
@@ -262,3 +262,7 @@ i965_gen8_FILES = \
 
 i965_gen9_FILES = \
genX_blorp_exec.c
+
+i965_oa_GENERATED_FILES = \
+   brw_oa_hsw.h \
+   brw_oa_hsw.c
diff --git a/src/mesa/drivers/dri/i965/brw_oa.py 
b/src/mesa/drivers/dri/i965/brw_oa.py
new file mode 100644
index 00..e3fdd56174
--- /dev/null
+++ b/src/mesa/drivers/dri/i965/brw_oa.py
@@ -0,0 +1,556 @@
+# Copyright (c) 2015-2017 Intel Corporation
+#
+# Permission is hereby granted, free of charge, to any person obtaining a
+# copy of this software and associated documentation files (the "Software"),
+# to deal in the Software without restriction, including without limitation
+# the rights to use, copy, modify, merge, publish, distribute, sublicense,
+# and/or sell copies of the Software, and to permit persons to whom the
+# Software is furnished to do so, subject to the following conditions:
+#
+# The above copyright notice and this permission notice (including the next
+# paragraph) shall be included in all copies or substantial portions of the
+# Software.
+#
+# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+# THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+# IN THE SOFTWARE.
+
+import argparse
+import os
+import sys
+import textwrap
+
+import xml.etree.cElementTree as et
+
+max_values = {}
+read_funcs = {}
+
+c_file = None
+_c_indent = 0
+
+def c(*args):
+if c_file:
+code = ' '.join(map(str,args))
+for line in code.splitlines():
+text = ''.rjust(_c_indent) + line
+c_file.write(text.rstrip() + "\n")
+
+# indented, but no trailing newline..

Re: [Mesa-dev] [PATCH] Mesa: Fix performance query id check

2017-03-01 Thread Robert Bragg
Since seeing that I had indeed sent out a v2 patch and it was reviewed
by Plamena too (thanks) I've just gone a head and pushed that now
(though with an updated commit message instead of copy pasting the
original message).

Thanks,
- Robert


On Mon, Feb 27, 2017 at 3:43 PM, Robert Bragg <rob...@sixbynine.org> wrote:
> On Mon, Feb 27, 2017 at 2:47 PM, Juha-Pekka Heikkila
> <juhapekka.heikk...@gmail.com> wrote:
>> In queryid_valid() fix handling of zero index.
>>
>> CC: Robert Bragg <rob...@sixbynine.org>
>> Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikk...@gmail.com>
>> ---
>>  src/mesa/main/performance_query.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/src/mesa/main/performance_query.c 
>> b/src/mesa/main/performance_query.c
>> index aa10351..142d834 100644
>> --- a/src/mesa/main/performance_query.c
>> +++ b/src/mesa/main/performance_query.c
>> @@ -91,7 +91,7 @@ static inline bool
>>  queryid_valid(const struct gl_context *ctx, unsigned numQueries, GLuint 
>> queryid)
>>  {
>> GLuint index = queryid_to_index(queryid);
>> -   return index >= 0 && index < numQueries;
>> +   return queryid != 0 && index < numQueries;
>>  }
>>
>>  static inline GLuint
>> --
>> 2.7.4
>>
>
> Oh, I thought I'd sent out an updated patch, but apparently I made it
> but got distracted and didn't actually send it :-/
>
> Main differences with mine was that I added a quote from the spec
> about ID zero being reserved:
>
> -   GLuint index = queryid_to_index(queryid);
> -   return index >= 0 && index < numQueries;
> +   /* The GL_INTEL_performance_query spec says:
> +*
> +*  "Performance counter ids values start with 1. Performance counter id 0
> +*  is reserved as an invalid counter."
> +*/
> +   return queryid != 0 && queryid_to_index(queryid) < numQueries;
>
> And dropped the local variable which you say you'd prefer to keep. I
> suppose on the other hand it seems good to me to not pass an index of
> zero to queryid_to_index() before validation in case
> queryid_to_index() might one day gain its own paranoid assertion that
> the queryid is >= 0.
>
> I don't have a strong opinion here though, so your patch looks fine to
> land to me:
> Reviewed-by: Robert Bragg <rob...@sixbynine.org>
>
> Br,
> - Robert
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v2] i965: Add script to gen code for OA counter queries

2017-03-01 Thread Robert Bragg
Updated based on feedback from Emil, Dylan and Matt.
Adds R/b from Lionel

Thanks

--- >8 --- (git am --scissors)

Avoiding lots of error prone boilerplate and easing our ability to add +
maintain support for multiple OA performance counter queries for each
generation:

This adds a python script to generate code for building up
performance_queries from the metric sets and counters described in
brw_oa_hsw.xml as well as functions to normalize each counter based on
the RPN expressions given.

Although the XML file currently only includes a single metric set, the
code generated assumes there could be many sets.

The metrics as described in XML get translated into C structures
which are registered in a brw->perfquery.oa_metrics_table hash table
keyed by the GUID of the metric set in XML.

v2: numerous python style improvements (Dylan)

Signed-off-by: Robert Bragg <rob...@sixbynine.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwer...@intel.com>
---
 src/mesa/drivers/dri/i965/Makefile.am  |  16 +-
 src/mesa/drivers/dri/i965/Makefile.sources |   2 +
 src/mesa/drivers/dri/i965/brw_oa.py| 556 +
 3 files changed, 573 insertions(+), 1 deletion(-)
 create mode 100644 src/mesa/drivers/dri/i965/brw_oa.py

diff --git a/src/mesa/drivers/dri/i965/Makefile.am 
b/src/mesa/drivers/dri/i965/Makefile.am
index f87fa67ef8..e32fb16c44 100644
--- a/src/mesa/drivers/dri/i965/Makefile.am
+++ b/src/mesa/drivers/dri/i965/Makefile.am
@@ -93,7 +93,9 @@ BUILT_SOURCES = $(i965_compiler_GENERATED_FILES)
 CLEANFILES = $(BUILT_SOURCES)
 
 EXTRA_DIST = \
-   brw_nir_trig_workarounds.py
+   brw_nir_trig_workarounds.py \
+   brw_oa_hsw.xml \
+   brw_oa.py
 
 TEST_LIBS = \
libi965_compiler.la \
@@ -169,3 +171,15 @@ test_eu_validate_SOURCES = \
 test_eu_validate_LDADD = \
$(top_builddir)/src/gtest/libgtest.la \
$(TEST_LIBS)
+
+BUILT_SOURCES += \
+   brw_oa_hsw.h \
+   brw_oa_hsw.c
+
+brw_oa_hsw.h brw_oa_hsw.c: brw_oa_hsw.xml brw_oa.py Makefile.am
+   $(PYTHON2) $(PYTHON_FLAGS) $(srcdir)/brw_oa.py \
+  --header=$(builddir)/brw_oa_hsw.h \
+  --code=$(builddir)/brw_oa_hsw.c \
+  --chipset="hsw" \
+  $(srcdir)/brw_oa_hsw.xml
+brw_oa_hsw.c: brw_oa_hsw.h
diff --git a/src/mesa/drivers/dri/i965/Makefile.sources 
b/src/mesa/drivers/dri/i965/Makefile.sources
index 5278e86339..60acd15d41 100644
--- a/src/mesa/drivers/dri/i965/Makefile.sources
+++ b/src/mesa/drivers/dri/i965/Makefile.sources
@@ -135,6 +135,8 @@ i965_FILES = \
brw_nir_uniforms.cpp \
brw_object_purgeable.c \
brw_pipe_control.c \
+   brw_oa_hsw.h \
+   brw_oa_hsw.c \
brw_performance_query.h \
brw_performance_query.c \
brw_program.c \
diff --git a/src/mesa/drivers/dri/i965/brw_oa.py 
b/src/mesa/drivers/dri/i965/brw_oa.py
new file mode 100644
index 00..4c1286dc3a
--- /dev/null
+++ b/src/mesa/drivers/dri/i965/brw_oa.py
@@ -0,0 +1,556 @@
+# Copyright (c) 2015 Intel Corporation
+#
+# Permission is hereby granted, free of charge, to any person obtaining a
+# copy of this software and associated documentation files (the "Software"),
+# to deal in the Software without restriction, including without limitation
+# the rights to use, copy, modify, merge, publish, distribute, sublicense,
+# and/or sell copies of the Software, and to permit persons to whom the
+# Software is furnished to do so, subject to the following conditions:
+#
+# The above copyright notice and this permission notice (including the next
+# paragraph) shall be included in all copies or substantial portions of the
+# Software.
+#
+# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+# THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+# IN THE SOFTWARE.
+
+import argparse
+import os
+import sys
+import textwrap
+
+import xml.etree.cElementTree as et
+
+max_values = {}
+read_funcs = {}
+
+c_file = None
+_c_indent = 0
+
+def c(*args):
+if c_file:
+code = ' '.join(map(str,args))
+for line in code.splitlines():
+text = ''.rjust(_c_indent) + line
+c_file.write(text.rstrip() + "\n")
+
+# indented, but no trailing newline...
+def c_line_start(code):
+if c_file:
+c_file.write(''.rjust(_c_indent) + code)
+def c_raw(code):
+if c_file:
+c_file.write(code)
+
+def c_indent(n):
+global _c_indent
+_c_indent = _c_indent + n
+def c_outdent(n):
+global _c_indent
+_c_indent = _c_indent - n
+
+header_file = None
+_h_indent = 0
+
+def h(*args):
+if header_file:

Re: [Mesa-dev] [PATCH 5/8] i965: Add script to gen code for OA counter queries

2017-03-01 Thread Robert Bragg
On Wed, Mar 1, 2017 at 4:56 PM, Matt Turner <matts...@gmail.com> wrote:
> On Wed, Mar 1, 2017 at 8:49 AM, Robert Bragg <rob...@sixbynine.org> wrote:
>> On Fri, Feb 24, 2017 at 2:50 PM, Emil Velikov <emil.l.veli...@gmail.com> 
>> wrote:
>>> Hi Robert,
>>>
>>> There's a few minor comments below. Feel free to address here or as
>>> follow-up, if applicable.
>>>
>>> On 24 February 2017 at 13:58, Robert Bragg <rob...@sixbynine.org> wrote:
>>>> Avoiding lots of error prone boilerplate and easing our ability to add +
>>>> maintain support for multiple OA performance counter queries for each
>>>> generation:
>>>>
>>>> This adds a python script to generate code for building up
>>>> performance_queries from the metric sets and counters described in
>>>> brw_oa_hsw.xml as well as functions to normalize each counter based on
>>>> the RPN expressions given.
>>>>
>>>> Although the XML file currently only includes a single metric set, the
>>>> code generated assumes there could be many sets.
>>>>
>>>> The metrics as described in XML get translated into C structures
>>>> which are registered in a brw->perfquery.oa_metrics_table hash table
>>>> keyed by the GUID of the metric set in XML.
>>>>
>>> Mauro, Tapani, we have another piece which will break Androids. Feel
>>> free to send a fixup for Robert to squash.
>>
>> Ah, can you maybe give me a bit more of a hint about what I'm breaking
>> for Android? Just extra work running python scripts as part of an
>> Android build I guess and requiring some change to
>> ./src/mesa/drivers/dri/i965/Android.gen.mk?
>>
>>>
>>>> Signed-off-by: Robert Bragg <rob...@sixbynine.org>
>>>> ---
>>>>  src/mesa/drivers/dri/i965/Makefile.am  |  15 +-
>>>>  src/mesa/drivers/dri/i965/Makefile.sources |   2 +
>>>>  src/mesa/drivers/dri/i965/brw_oa.py| 543 
>>>> +
>>>>  3 files changed, 559 insertions(+), 1 deletion(-)
>>>>  create mode 100644 src/mesa/drivers/dri/i965/brw_oa.py
>>>>
>>>> diff --git a/src/mesa/drivers/dri/i965/Makefile.am 
>>>> b/src/mesa/drivers/dri/i965/Makefile.am
>>>> index f87fa67ef8..0130afff5f 100644
>>>> --- a/src/mesa/drivers/dri/i965/Makefile.am
>>>> +++ b/src/mesa/drivers/dri/i965/Makefile.am
>>>> @@ -93,7 +93,9 @@ BUILT_SOURCES = $(i965_compiler_GENERATED_FILES)
>>>>  CLEANFILES = $(BUILT_SOURCES)
>>>>
>>>>  EXTRA_DIST = \
>>>> -   brw_nir_trig_workarounds.py
>>>> +   brw_nir_trig_workarounds.py \
>>>> +   brw_oa_hsw.xml \
>>>> +   brw_oa.py
>>>>
>>>>  TEST_LIBS = \
>>>> libi965_compiler.la \
>>>> @@ -169,3 +171,14 @@ test_eu_validate_SOURCES = \
>>>>  test_eu_validate_LDADD = \
>>>> $(top_builddir)/src/gtest/libgtest.la \
>>>> $(TEST_LIBS)
>>>> +
>>>> +BUILT_SOURCES = \
>>>> +   brw_oa_hsw.h \
>>>> +   brw_oa_hsw.c
>>>> +
>>>> +brw_oa_hsw.h brw_oa_hsw.c: brw_oa_hsw.xml brw_oa.py Makefile
>>> Eric reminded me that this is a bit buggy/racy. Something like the
>>> following should be better:
>>>
>>> brw_oa_hsw.h: brw_oa_hsw.c
>>> brw_oa_hsw.c: brw_oa_hsw.xml brw_oa.py
>>
>> as one issue, I can see that brw_oa_hsw.h isn't declared as a
>> prerequisite for compiling brw_oa_hsw.c, am I missing more issues?
>> Based on that I'd imagine adding:
>>
>> brw_oa_hsw.c: brw_oa_hsw.h
>>
>> as a way to ensure we can't attempt to compile brw_oa_hsw.c before
>> brw_oa_hsw.h is generated.
>> Is there a reason for removing the Makefile prerequisite? If I were to
>> mess with the arguments passed to the script in the codegen rules I'd
>> like a Makefile change to result in re-running those rules.
>
> We depend on the Makefile in src/intel/Makefile.genxml.am, and one
> unintended side-effect of that is that after every ./configure the
> files that depend on Makefile have to be recompiled.
>
> In that case, contents of the Makefile end up in the resulting files,
> so I think the dependency is okay if a bit annoying. This patch,
> however, I'd remove the Makefile dependency, since the Makefile just
> contains the execution of the script. I'm not sure why the script
> arguments would change...

Ah ok, 

Re: [Mesa-dev] [PATCH 5/8] i965: Add script to gen code for OA counter queries

2017-03-01 Thread Robert Bragg
On Wed, Mar 1, 2017 at 3:57 PM, Robert Bragg <rob...@sixbynine.org> wrote:
> On Mon, Feb 27, 2017 at 8:01 PM, Dylan Baker <dy...@pnwbakers.com> wrote:

>>> +def output_rpn_equation_code(set, counter, equation, counter_vars):
>>> +c("/* RPN equation: " + equation + " */")
>>> +tokens = equation.split()
>>> +stack = []
>>> +tmp_id = 0
>>> +tmp = None
>>> +
>>> +for token in tokens:
>>> +stack.append(token)
>>> +while stack and stack[-1] in ops:
>>> +op = stack.pop()
>>> +argc, callback = ops[op]
>>> +args = []
>>> +for i in range(0, argc):
>>
>> Use xrange instead of range
>
> okey, can do.

I just realised that the use of xrange() became the only thing
stopping the script from being python version agnostic (python 3
removes xrange and plain range() behaves like xrange()) so I went back
to using range() here.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 5/8] i965: Add script to gen code for OA counter queries

2017-03-01 Thread Robert Bragg
On Fri, Feb 24, 2017 at 2:50 PM, Emil Velikov <emil.l.veli...@gmail.com> wrote:
> Hi Robert,
>
> There's a few minor comments below. Feel free to address here or as
> follow-up, if applicable.
>
> On 24 February 2017 at 13:58, Robert Bragg <rob...@sixbynine.org> wrote:
>> Avoiding lots of error prone boilerplate and easing our ability to add +
>> maintain support for multiple OA performance counter queries for each
>> generation:
>>
>> This adds a python script to generate code for building up
>> performance_queries from the metric sets and counters described in
>> brw_oa_hsw.xml as well as functions to normalize each counter based on
>> the RPN expressions given.
>>
>> Although the XML file currently only includes a single metric set, the
>> code generated assumes there could be many sets.
>>
>> The metrics as described in XML get translated into C structures
>> which are registered in a brw->perfquery.oa_metrics_table hash table
>> keyed by the GUID of the metric set in XML.
>>
> Mauro, Tapani, we have another piece which will break Androids. Feel
> free to send a fixup for Robert to squash.

Ah, can you maybe give me a bit more of a hint about what I'm breaking
for Android? Just extra work running python scripts as part of an
Android build I guess and requiring some change to
./src/mesa/drivers/dri/i965/Android.gen.mk?

>
>> Signed-off-by: Robert Bragg <rob...@sixbynine.org>
>> ---
>>  src/mesa/drivers/dri/i965/Makefile.am  |  15 +-
>>  src/mesa/drivers/dri/i965/Makefile.sources |   2 +
>>  src/mesa/drivers/dri/i965/brw_oa.py| 543 
>> +
>>  3 files changed, 559 insertions(+), 1 deletion(-)
>>  create mode 100644 src/mesa/drivers/dri/i965/brw_oa.py
>>
>> diff --git a/src/mesa/drivers/dri/i965/Makefile.am 
>> b/src/mesa/drivers/dri/i965/Makefile.am
>> index f87fa67ef8..0130afff5f 100644
>> --- a/src/mesa/drivers/dri/i965/Makefile.am
>> +++ b/src/mesa/drivers/dri/i965/Makefile.am
>> @@ -93,7 +93,9 @@ BUILT_SOURCES = $(i965_compiler_GENERATED_FILES)
>>  CLEANFILES = $(BUILT_SOURCES)
>>
>>  EXTRA_DIST = \
>> -   brw_nir_trig_workarounds.py
>> +   brw_nir_trig_workarounds.py \
>> +   brw_oa_hsw.xml \
>> +   brw_oa.py
>>
>>  TEST_LIBS = \
>> libi965_compiler.la \
>> @@ -169,3 +171,14 @@ test_eu_validate_SOURCES = \
>>  test_eu_validate_LDADD = \
>> $(top_builddir)/src/gtest/libgtest.la \
>> $(TEST_LIBS)
>> +
>> +BUILT_SOURCES = \
>> +   brw_oa_hsw.h \
>> +   brw_oa_hsw.c
>> +
>> +brw_oa_hsw.h brw_oa_hsw.c: brw_oa_hsw.xml brw_oa.py Makefile
> Eric reminded me that this is a bit buggy/racy. Something like the
> following should be better:
>
> brw_oa_hsw.h: brw_oa_hsw.c
> brw_oa_hsw.c: brw_oa_hsw.xml brw_oa.py

as one issue, I can see that brw_oa_hsw.h isn't declared as a
prerequisite for compiling brw_oa_hsw.c, am I missing more issues?
Based on that I'd imagine adding:

brw_oa_hsw.c: brw_oa_hsw.h

as a way to ensure we can't attempt to compile brw_oa_hsw.c before
brw_oa_hsw.h is generated.
Is there a reason for removing the Makefile prerequisite? If I were to
mess with the arguments passed to the script in the codegen rules I'd
like a Makefile change to result in re-running those rules.

I just noticed that I should also be using += for BUILT_SOURCES here, oops.

>  ...
>
>> +   $(PYTHON2) $(PYTHON_FLAGS) $(srcdir)/brw_oa.py \
>> +  --header=$(builddir)/brw_oa_hsw.h \
>> +  --code=$(builddir)/brw_oa_hsw.c \
>> +  --chipset="hsw" \
>> +  $(srcdir)/brw_oa_hsw.xml
>> diff --git a/src/mesa/drivers/dri/i965/Makefile.sources 
>> b/src/mesa/drivers/dri/i965/Makefile.sources
>> index 5278e86339..60acd15d41 100644
>> --- a/src/mesa/drivers/dri/i965/Makefile.sources
>> +++ b/src/mesa/drivers/dri/i965/Makefile.sources
>> @@ -135,6 +135,8 @@ i965_FILES = \
>> brw_nir_uniforms.cpp \
>> brw_object_purgeable.c \
>> brw_pipe_control.c \
>> +   brw_oa_hsw.h \
>> +   brw_oa_hsw.c \
> H is after C ;-)
>
>> brw_performance_query.h \
>> brw_performance_query.c \
>> brw_program.c \
>> diff --git a/src/mesa/drivers/dri/i965/brw_oa.py 
>> b/src/mesa/drivers/dri/i965/brw_oa.py
>
>> new file mode 100644
>> index 00..2c622531af
>> --- /dev/null
>> +++ b/src/mesa/drivers/dri/i965/brw_oa.py
>> @@ -0,0 +1,543 @@
>> +#!/usr/bin/env python2
> Thanks for omitting the execute bit !
> Maybe drop this, sinc

Re: [Mesa-dev] [PATCH 5/8] i965: Add script to gen code for OA counter queries

2017-03-01 Thread Robert Bragg
On Mon, Feb 27, 2017 at 8:01 PM, Dylan Baker <dy...@pnwbakers.com> wrote:
> I have some comments below, some of them are a bit of work, some of them 
> should
> be pretty easy.
>
> Dylan
>
> Quoting Robert Bragg (2017-02-24 05:58:00)
>> --- /dev/null
>> +++ b/src/mesa/drivers/dri/i965/brw_oa.py
>> @@ -0,0 +1,543 @@
>> +#!/usr/bin/env python2
>> +#
>> +# Copyright (c) 2015 Intel Corporation
>> +#
>> +# Permission is hereby granted, free of charge, to any person obtaining a
>> +# copy of this software and associated documentation files (the "Software"),
>> +# to deal in the Software without restriction, including without limitation
>> +# the rights to use, copy, modify, merge, publish, distribute, sublicense,
>> +# and/or sell copies of the Software, and to permit persons to whom the
>> +# Software is furnished to do so, subject to the following conditions:
>> +#
>> +# The above copyright notice and this permission notice (including the next
>> +# paragraph) shall be included in all copies or substantial portions of the
>> +# Software.
>> +#
>> +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>> +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>> +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
>> +# THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
>> +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
>> +# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER 
>> DEALINGS
>> +# IN THE SOFTWARE.
>> +
>> +import xml.etree.ElementTree as ET
>
> use cElementTree, and please import as "et" instead of "ET", ET suggests to me
> as a pythonista that this is a class not a module. (I know ElementTree is
> capitalized, but that's a legacy thing, modern python style is modules are 
> lower
> case with _, classes are CamelCase)

Using cElementTree sounds good.

If I grep Mesa I see for 6 different uses of ElementTree with 5 out of
6 importing as ET. Really I just copied the style/name from other
examples I found including the reference manual:
https://docs.python.org/2/library/xml.etree.elementtree.html and ET
does seem to be the common abbreviation used with ElementTree. I don't
really have a strong oppinion here though and can just make the
change.

>
>> +import argparse
>> +import sys
>
> Also, please sort the imports

okey, can do; the python style guide says:

"Imports should be grouped in the following order:

1. standard library imports
2. related third party imports
3. local application/library specific imports

You should put a blank line between each group of imports. "

I'm not actually sure if mesa's general style of alphanumerically
sorting everything should override, but luckily my only third party
import begins with x.

>
>> +
>> +def print_err(*args):
>> +sys.stderr.write(' '.join(map(str,args)) + '\n')
>
> unused function

ah yup, can remove; just copied from another codegen script based on
the same data.

>
>> +
>> +c_file = None
>> +_c_indent = 0
>> +
>> +def c(*args):
>> +if c_file:
>> +code = ' '.join(map(str,args))
>> +for line in code.splitlines():
>> +text = ''.rjust(_c_indent) + line
>> +c_file.write(text.rstrip() + "\n")
>> +
>> +# indented, but no trailing newline...
>> +def c_line_start(code):
>> +if c_file:
>> +c_file.write(''.rjust(_c_indent) + code)
>> +def c_raw(code):
>> +if c_file:
>> +c_file.write(code)
>> +
>> +def c_indent(n):
>> +global _c_indent
>> +_c_indent = _c_indent + n
>> +def c_outdent(n):
>> +global _c_indent
>> +_c_indent = _c_indent - n
>> +
>> +header_file = None
>> +_h_indent = 0
>> +
>> +def h(*args):
>> +if header_file:
>> +code = ' '.join(map(str,args))
>> +for line in code.splitlines():
>> +text = ''.rjust(_h_indent) + line
>> +header_file.write(text.rstrip() + "\n")
>> +
>> +def h_indent(n):
>> +global _c_indent
>> +_h_indent = _h_indent + n
>> +def h_outdent(n):
>> +global _c_indent
>> +_h_indent = _h_indent - n
>> +
>> +
>> +def emit_fadd(tmp_id, args):
>> +c("double tmp" + str(tmp_id) +" = " + args[1] + " + " + args[0] + ";")
>
> "double tmp {tmp_id} = {args[1]} + {ags[0]};".format(tmp_id=tmp_id, args=a

Re: [Mesa-dev] [PATCH 5/8] i965: Add script to gen code for OA counter queries

2017-02-28 Thread Robert Bragg
in the xml files and don't
really see much alternative to using a typesetting library like mathjax to
render the equations e.g. for a UI.

Definitely thanks for doing this experiment though!

Br,
- Robert

>
> Regardless, this series is :
>
> Reviewed-by: Lionel Landwerlin <lionel.g.landwer...@intel.com>
>
> Thanks!
>
> -
> Lionel
>
>
> On 24/02/17 13:58, Robert Bragg wrote:
>>
>> Avoiding lots of error prone boilerplate and easing our ability to add +
>> maintain support for multiple OA performance counter queries for each
>> generation:
>>
>> This adds a python script to generate code for building up
>> performance_queries from the metric sets and counters described in
>> brw_oa_hsw.xml as well as functions to normalize each counter based on
>> the RPN expressions given.
>>
>> Although the XML file currently only includes a single metric set, the
>> code generated assumes there could be many sets.
>>
>> The metrics as described in XML get translated into C structures
>> which are registered in a brw->perfquery.oa_metrics_table hash table
>> keyed by the GUID of the metric set in XML.
>>
>> Signed-off-by: Robert Bragg <rob...@sixbynine.org>
>> ---
>> src/mesa/drivers/dri/i965/Makefile.am | 15 +-
>> src/mesa/drivers/dri/i965/Makefile.sources | 2 +
>> src/mesa/drivers/dri/i965/brw_oa.py | 543
>> +
>> 3 files changed, 559 insertions(+), 1 deletion(-)
>> create mode 100644 src/mesa/drivers/dri/i965/brw_oa.py
>>
>> diff --git a/src/mesa/drivers/dri/i965/Makefile.am
>> b/src/mesa/drivers/dri/i965/Makefile.am
>> index f87fa67ef8..0130afff5f 100644
>> --- a/src/mesa/drivers/dri/i965/Makefile.am
>> +++ b/src/mesa/drivers/dri/i965/Makefile.am
>> @@ -93,7 +93,9 @@ BUILT_SOURCES = $(i965_compiler_GENERATED_FILES)
>> CLEANFILES = $(BUILT_SOURCES)
>> EXTRA_DIST = \
>> - brw_nir_trig_workarounds.py
>> + brw_nir_trig_workarounds.py \
>> + brw_oa_hsw.xml \
>> + brw_oa.py
>> TEST_LIBS = \
>> libi965_compiler.la \
>> @@ -169,3 +171,14 @@ test_eu_validate_SOURCES = \
>> test_eu_validate_LDADD = \
>> $(top_builddir)/src/gtest/libgtest.la \
>> $(TEST_LIBS)
>> +
>> +BUILT_SOURCES = \
>> + brw_oa_hsw.h \
>> + brw_oa_hsw.c
>> +
>> +brw_oa_hsw.h brw_oa_hsw.c: brw_oa_hsw.xml brw_oa.py Makefile
>> + $(PYTHON2) $(PYTHON_FLAGS) $(srcdir)/brw_oa.py \
>> + --header=$(builddir)/brw_oa_hsw.h \
>> + --code=$(builddir)/brw_oa_hsw.c \
>> + --chipset="hsw" \
>> + $(srcdir)/brw_oa_hsw.xml
>> diff --git a/src/mesa/drivers/dri/i965/Makefile.sources
>> b/src/mesa/drivers/dri/i965/Makefile.sources
>> index 5278e86339..60acd15d41 100644
>> --- a/src/mesa/drivers/dri/i965/Makefile.sources
>> +++ b/src/mesa/drivers/dri/i965/Makefile.sources
>> @@ -135,6 +135,8 @@ i965_FILES = \
>> brw_nir_uniforms.cpp \
>> brw_object_purgeable.c \
>> brw_pipe_control.c \
>> + brw_oa_hsw.h \
>> + brw_oa_hsw.c \
>> brw_performance_query.h \
>> brw_performance_query.c \
>> brw_program.c \
>> diff --git a/src/mesa/drivers/dri/i965/brw_oa.py
>> b/src/mesa/drivers/dri/i965/brw_oa.py
>> new file mode 100644
>> index 00..2c622531af
>> --- /dev/null
>> +++ b/src/mesa/drivers/dri/i965/brw_oa.py
>> @@ -0,0 +1,543 @@
>> +#!/usr/bin/env python2
>> +#
>> +# Copyright (c) 2015 Intel Corporation
>> +#
>> +# Permission is hereby granted, free of charge, to any person obtaining
a
>> +# copy of this software and associated documentation files (the
>> "Software"),
>> +# to deal in the Software without restriction, including without
>> limitation
>> +# the rights to use, copy, modify, merge, publish, distribute,
>> sublicense,
>> +# and/or sell copies of the Software, and to permit persons to whom the
>> +# Software is furnished to do so, subject to the following conditions:
>> +#
>> +# The above copyright notice and this permission notice (including the
>> next
>> +# paragraph) shall be included in all copies or substantial portions of
>> the
>> +# Software.
>> +#
>> +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS
>> OR
>> +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
>> MERCHANTABILITY,
>> +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT
>> SHALL
>> +# THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
>> OTHER
>> +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHE

Re: [Mesa-dev] [PATCH] Mesa: Fix performance query id check

2017-02-27 Thread Robert Bragg
On Mon, Feb 27, 2017 at 2:47 PM, Juha-Pekka Heikkila
<juhapekka.heikk...@gmail.com> wrote:
> In queryid_valid() fix handling of zero index.
>
> CC: Robert Bragg <rob...@sixbynine.org>
> Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikk...@gmail.com>
> ---
>  src/mesa/main/performance_query.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/mesa/main/performance_query.c 
> b/src/mesa/main/performance_query.c
> index aa10351..142d834 100644
> --- a/src/mesa/main/performance_query.c
> +++ b/src/mesa/main/performance_query.c
> @@ -91,7 +91,7 @@ static inline bool
>  queryid_valid(const struct gl_context *ctx, unsigned numQueries, GLuint 
> queryid)
>  {
> GLuint index = queryid_to_index(queryid);
> -   return index >= 0 && index < numQueries;
> +   return queryid != 0 && index < numQueries;
>  }
>
>  static inline GLuint
> --
> 2.7.4
>

Oh, I thought I'd sent out an updated patch, but apparently I made it
but got distracted and didn't actually send it :-/

Main differences with mine was that I added a quote from the spec
about ID zero being reserved:

-   GLuint index = queryid_to_index(queryid);
-   return index >= 0 && index < numQueries;
+   /* The GL_INTEL_performance_query spec says:
+*
+*  "Performance counter ids values start with 1. Performance counter id 0
+*  is reserved as an invalid counter."
+*/
+   return queryid != 0 && queryid_to_index(queryid) < numQueries;

And dropped the local variable which you say you'd prefer to keep. I
suppose on the other hand it seems good to me to not pass an index of
zero to queryid_to_index() before validation in case
queryid_to_index() might one day gain its own paranoid assertion that
the queryid is >= 0.

I don't have a strong opinion here though, so your patch looks fine to
land to me:
Reviewed-by: Robert Bragg <rob...@sixbynine.org>

Br,
- Robert
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] mesa: Fix performance query id check

2017-02-24 Thread Robert Bragg
In queryid_valid() index is unsigned so checking if it is less
than zero is useless. On queryid_to_index() is comment
saying 0 is reserved to be invalid thus rule it out.

This is a v2 of a fix for an issue identified by Juha-Pekka (thanks)
and the commit message is gratuitously stolen.

Cc: Juha-Pekka Heikkila <juhapekka.heikk...@gmail.com>
Signed-off-by: Robert Bragg <rob...@sixbynine.org>
---
 src/mesa/main/performance_query.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/src/mesa/main/performance_query.c 
b/src/mesa/main/performance_query.c
index aa103516a5..56f6a7da8b 100644
--- a/src/mesa/main/performance_query.c
+++ b/src/mesa/main/performance_query.c
@@ -90,8 +90,12 @@ index_to_queryid(unsigned index)
 static inline bool
 queryid_valid(const struct gl_context *ctx, unsigned numQueries, GLuint 
queryid)
 {
-   GLuint index = queryid_to_index(queryid);
-   return index >= 0 && index < numQueries;
+   /* The GL_INTEL_performance_query spec says:
+*
+*  "Performance counter ids values start with 1. Performance counter id 0
+*  is reserved as an invalid counter."
+*/
+   return queryid != 0 && queryid_to_index(queryid) < numQueries;
 }
 
 static inline GLuint
-- 
2.11.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] Mesa: Fix performance query id check

2017-02-24 Thread Robert Bragg
Ah, oops, I think the id types iterated a few times over the lifetime
of the original patch. E.g. until pretty much the last moment before
landing it pass a signed int id to the backend.

I suppose it can be return queryid != 0 && queryid_to_index(queryid) <
numQueries

I'll also look to update the local piglit tests I have to try and
catch this case, they currently check for an invalid queryid searching
from UINT_MAX down until it finds one not matching the id of any
discovered query.

I can send out a fix if you don't beat me to it,
Thanks,
- Robert


On Fri, Feb 24, 2017 at 1:21 PM, Juha-Pekka Heikkila
<juhapekka.heikk...@gmail.com> wrote:
> Oops. Original code can never fail on zero id but my patch is also wrong.
> Please ignore this patch.
>
> /Juha-Pekka
>
>
> On 24.02.2017 15:10, Juha-Pekka Heikkila wrote:
>>
>> In queryid_valid() index is unsigned so checking if it is less
>> than zero is useless. On queryid_to_index() is comment
>> saying 0 is reserved to be invalid thus rule it out.
>>
>> CC: Robert Bragg <rob...@sixbynine.org>
>> Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikk...@gmail.com>
>> ---
>>  src/mesa/main/performance_query.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/src/mesa/main/performance_query.c
>> b/src/mesa/main/performance_query.c
>> index aa10351..079d3c8 100644
>> --- a/src/mesa/main/performance_query.c
>> +++ b/src/mesa/main/performance_query.c
>> @@ -91,7 +91,7 @@ static inline bool
>>  queryid_valid(const struct gl_context *ctx, unsigned numQueries, GLuint
>> queryid)
>>  {
>> GLuint index = queryid_to_index(queryid);
>> -   return index >= 0 && index < numQueries;
>> +   return index != 0 && index < numQueries;
>>  }
>>
>>  static inline GLuint
>>
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/8] i965: XML description of Haswell OA metric set

2017-02-24 Thread Robert Bragg
In preparation for exposing Gen Observation Architecture performance
counters via INTEL_performance_query this adds an XML description for an
initial 'Render Metrics Basic Gen7.5' query and corresponding counters.

The intention is to auto generate code for building a query from these
counters as well as the code for normalizing the individual counters.

Note that the upstream for this XML data is currently GPU Top:

  https://github.com/rib/gputop

The files are maintained under gputop-data/ and they are themselves
derived from files in an internal 'MDAPI XML' schema. There are scripts
under gputop-scripts/ and make rules in gputop-data/Makefile.xml for
maintaining these files.

Signed-off-by: Robert Bragg <rob...@sixbynine.org>
---
 src/mesa/drivers/dri/i965/brw_oa_hsw.xml | 998 +++
 1 file changed, 998 insertions(+)
 create mode 100644 src/mesa/drivers/dri/i965/brw_oa_hsw.xml

diff --git a/src/mesa/drivers/dri/i965/brw_oa_hsw.xml 
b/src/mesa/drivers/dri/i965/brw_oa_hsw.xml
new file mode 100644
index 00..4947671263
--- /dev/null
+++ b/src/mesa/drivers/dri/i965/brw_oa_hsw.xml
@@ -0,0 +1,998 @@
+
+
+  
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ 

+ 

+ 

+ 

+
+  
+
+
-- 
2.11.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 3/8] i965: brw_context.h additions for OA unit query codegen

2017-02-24 Thread Robert Bragg
In preparation for generating code from the XML performance counter meta
data, this makes some additions to brw_context.h for this code to be
able to reference.

It adds a brw->perfquery.oa_metrics_table hash table for indexing built
up query descriptions by the GUID that is expected to be advertised by
the kernel (via sysfs) to be able to use that query.

It adds an 'OA_COUNTERS' brw_query_kind to be assigned to queries built
up by generated code.

It adds a brw->perfquery.sys_vars structure to have a consistent place
to represent the different system variables like $EuCoresTotalCount and
$EuSlicesTotalCount that are referenced by OA counter normalization
equations.

  Although extending + referencing gen_device_info for these variables
  was considered, these are some of the (mostly minor) reasons for
  going with a dedicated structure:

  - Currently we only need this info for the performance_query backend
and it might be a bit tedious to go back and initialize the state
for pre-Haswell devinfo structures.
  - Considering the $SubsliceMask then the requirement for how multiple
per-slice masks are packed only comes from how the variables are
references by availability tests in XML, and might not be a good
general representation for tracking subslice masks if another use
case arises.
  - If we used gen_device_info then we'd likely want to avoid making
assumptions about the C types during codegen and adding explicit
casts, while that's not necessary with a dedicated struct with all
members being uint64_t.
  - This structure and the code for initializing it is currently shared
(just through copy & paste) with a few other projects dealing with
OA counters, and that's been convenient so far.

Signed-off-by: Robert Bragg <rob...@sixbynine.org>
---
 src/mesa/drivers/dri/i965/brw_context.h | 21 +
 1 file changed, 21 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
b/src/mesa/drivers/dri/i965/brw_context.h
index 7ff7b74252..68d2fbfc23 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -652,6 +652,7 @@ struct shader_times;
 struct gen_l3_config;
 
 enum brw_query_kind {
+   OA_COUNTERS,
PIPELINE_STATS
 };
 
@@ -1142,6 +1143,26 @@ struct brw_context
} predicate;
 
struct {
+  /* Variables referenced in the XML meta data for OA performance
+   * counters, e.g in the normalization equations.
+   *
+   * All uint64_t for consistent operand types in generated code
+   */
+  struct {
+ uint64_t timestamp_frequency; /** $GpuTimestampFrequency */
+ uint64_t n_eus;   /** $EuCoresTotalCount */
+ uint64_t n_eu_slices; /** $EuSlicesTotalCount */
+ uint64_t subslice_mask;   /** $SubsliceeMask */
+ uint64_t gt_min_freq; /** $GpuMinFrequency */
+ uint64_t gt_max_freq; /** $GpuMaxFrequency */
+  } sys_vars;
+
+  /* OA metric sets, indexed by GUID, as know by Mesa at build time,
+   * to cross-reference with the GUIDs of configs advertised by the
+   * kernel at runtime
+   */
+  struct hash_table *oa_metrics_table;
+
   struct brw_perf_query_info *queries;
   int n_queries;
 
-- 
2.11.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 7/8] i965: Expose OA counters via INTEL_performance_query

2017-02-24 Thread Robert Bragg
This adds support for exposing basic Observation Architecture
performance counters on Haswell.

This support is based on the i915 perf kernel interface which is used
to configure the OA unit, allowing Mesa to emit MI_REPORT_PERF_COUNT
commands around queries to collect counter snapshots.

To take into account the small chance that some of the 32bit counters
could wrap around for long queries (~50 milliseconds for a GT3 Haswell @
1.1GHz) the implementation also collects periodic metrics.

Signed-off-by: Robert Bragg <rob...@sixbynine.org>
---
 src/mesa/drivers/dri/i965/brw_context.h   |   52 +
 src/mesa/drivers/dri/i965/brw_performance_query.c | 1119 -
 2 files changed, 1158 insertions(+), 13 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
b/src/mesa/drivers/dri/i965/brw_context.h
index cc5ec78f8c..93fdb648af 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -1178,7 +1178,59 @@ struct brw_context
   struct brw_perf_query_info *queries;
   int n_queries;
 
+  /* The i915 perf stream we open to setup + enable the OA counters */
+  int oa_stream_fd;
+
+  /* An i915 perf stream fd gives exclusive access to the OA unit that will
+   * report counter snapshots for a specific counter set/profile in a
+   * specific layout/format so we can only start OA queries that are
+   * compatible with the currently open fd...
+   */
+  int current_oa_metrics_set_id;
+  int current_oa_format;
+
+  /* List of buffers containing OA reports */
+  struct exec_list sample_buffers;
+
+  /* Cached list of empty sample buffers */
+  struct exec_list free_sample_buffers;
+
+  int n_active_oa_queries;
   int n_active_pipeline_stats_queries;
+
+  /* The number of queries depending on running OA counters which
+   * extends beyond brw_end_perf_query() since we need to wait until
+   * the last MI_RPC command has parsed by the GPU.
+   *
+   * Accurate accounting is important here as emitting an
+   * MI_REPORT_PERF_COUNT command while the OA unit is disabled will
+   * effectively hang the gpu.
+   */
+  int n_oa_users;
+
+  /* To help catch an spurious problem with the hardware or perf
+   * forwarding samples, we emit each MI_REPORT_PERF_COUNT command
+   * with a unique ID that we can explicitly check for...
+   */
+  int next_query_start_report_id;
+
+  /**
+   * An array of queries whose results haven't yet been assembled
+   * based on the data in buffer objects.
+   *
+   * These may be active, or have already ended.  However, the
+   * results have not been requested.
+   */
+  struct brw_perf_query_object **unaccumulated;
+  int unaccumulated_elements;
+  int unaccumulated_array_size;
+
+  /* The total number of query objects so we can relinquish
+   * our exclusive access to perf if the application deletes
+   * all of its objects. (NB: We only disable perf while
+   * there are no active queries)
+   */
+  int n_query_instances;
} perfquery;
 
int num_atoms[BRW_NUM_PIPELINES];
diff --git a/src/mesa/drivers/dri/i965/brw_performance_query.c 
b/src/mesa/drivers/dri/i965/brw_performance_query.c
index 6cd5bf4caa..64b3beeeae 100644
--- a/src/mesa/drivers/dri/i965/brw_performance_query.c
+++ b/src/mesa/drivers/dri/i965/brw_performance_query.c
@@ -26,12 +26,21 @@
  *
  * Implementation of the GL_INTEL_performance_query extension.
  *
- * Currently this driver only exposes the 64bit Pipeline Statistics
- * Registers for Gen6+, with support for Observability Counters to be
- * added later for Gen7.5+
+ * Currently there are two possible counter sources exposed here:
+ *
+ * On Gen6+ hardware we have numerous 64bit Pipeline Statistics Registers
+ * that we can snapshot at the beginning and end of a query.
+ *
+ * On Gen7.5+ we have Observability Architecture counters which are
+ * covered in separate document from the rest of the PRMs.  It is available at:
+ * https://01.org/linuxgraphics/documentation/driver-documentation-prms
+ * => 2013 Intel Core Processor Family => Observability Performance Counters
+ * (This one volume covers Sandybridge, Ivybridge, Baytrail, and Haswell,
+ * though notably we currently only support OA counters for Haswell+)
  */
 
 #include 
+#include 
 
 #include 
 #include 
@@ -40,6 +49,9 @@
 #include 
 #include 
 
+#include 
+#include 
+
 #include "main/hash.h"
 #include "main/macros.h"
 #include "main/mtypes.h"
@@ -47,14 +59,145 @@
 
 #include "util/bitset.h"
 #include "util/ralloc.h"
+#include "util/hash_table.h"
+#include "util/list.h"
 
 #include "brw_context.h"
 #include "brw_defines.h"
 #include "brw_performance_query.h"
+#include "brw_oa_hsw.h"
 #include "intel_batchbuffer.h"
 
 #de

[Mesa-dev] [PATCH 6/8] exec_list: Add a foreach_list_typed_from macro

2017-02-24 Thread Robert Bragg
This allows iterating list nodes from a given start point instead of
necessarily the list head.

Signed-off-by: Robert Bragg <rob...@sixbynine.org>
---
 src/compiler/glsl/list.h | 5 +
 1 file changed, 5 insertions(+)

diff --git a/src/compiler/glsl/list.h b/src/compiler/glsl/list.h
index 6afb9dcef9..47f53b865b 100644
--- a/src/compiler/glsl/list.h
+++ b/src/compiler/glsl/list.h
@@ -699,6 +699,11 @@ inline void exec_node::insert_before(exec_list *before)
(__node)->__field.next != NULL; \
(__node) = exec_node_data(__type, (__node)->__field.next, __field))
 
+#define foreach_list_typed_from(__type, __node, __field, __list, __start)  \
+   for (__type * __node = exec_node_data(__type, (__start), __field);  \
+   (__node)->__field.next != NULL;\
+   (__node) = exec_node_data(__type, (__node)->__field.next, __field))
+
 #define foreach_list_typed_reverse(__type, __node, __field, __list)\
for (__type * __node =\
exec_node_data(__type, (__list)->tail_sentinel.prev, __field);  \
-- 
2.11.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 4/8] i965: extend query/counter structs for OA queries

2017-02-24 Thread Robert Bragg
In preparation for generating code from brw_oa_hsw.xml for describing OA
performance counter queries this adds some OA specific members to
brw_perf_query that our generated code will initialize:

- The oa_metric_set_id is the ID we will pass to
  DRM_IOCTL_I915_PERF_OPEN, and is an ID got via sysfs under:
  /sys/class/drm//metrics/
---
 src/mesa/drivers/dri/i965/brw_context.h   | 12 
 src/mesa/drivers/dri/i965/brw_performance_query.h | 10 +-
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
b/src/mesa/drivers/dri/i965/brw_context.h
index 68d2fbfc23..cc5ec78f8c 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -660,9 +660,21 @@ struct brw_perf_query_info
 {
enum brw_query_kind kind;
const char *name;
+   const char *guid;
struct brw_perf_query_counter *counters;
int n_counters;
size_t data_size;
+
+   /* OA specific */
+   uint64_t oa_metrics_set_id;
+   int oa_format;
+
+   /* For indexing into the accumulator[] ... */
+   int gpu_time_offset;
+   int gpu_clock_offset;
+   int a_offset;
+   int b_offset;
+   int c_offset;
 };
 
 /**
diff --git a/src/mesa/drivers/dri/i965/brw_performance_query.h 
b/src/mesa/drivers/dri/i965/brw_performance_query.h
index 8f1f96060b..c9454f98ea 100644
--- a/src/mesa/drivers/dri/i965/brw_performance_query.h
+++ b/src/mesa/drivers/dri/i965/brw_performance_query.h
@@ -44,6 +44,14 @@ struct brw_perf_query_counter
size_t offset;
size_t size;
 
-   struct brw_pipeline_stat pipeline_stat;
+   union {
+  uint64_t (*oa_counter_read_uint64)(struct brw_context *brw,
+ const struct brw_perf_query_info 
*query,
+ uint64_t *accumulator);
+  float (*oa_counter_read_float)(struct brw_context *brw,
+ const struct brw_perf_query_info *query,
+ uint64_t *accumulator);
+  struct brw_pipeline_stat pipeline_stat;
+   };
 };
 
-- 
2.11.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 5/8] i965: Add script to gen code for OA counter queries

2017-02-24 Thread Robert Bragg
Avoiding lots of error prone boilerplate and easing our ability to add +
maintain support for multiple OA performance counter queries for each
generation:

This adds a python script to generate code for building up
performance_queries from the metric sets and counters described in
brw_oa_hsw.xml as well as functions to normalize each counter based on
the RPN expressions given.

Although the XML file currently only includes a single metric set, the
code generated assumes there could be many sets.

The metrics as described in XML get translated into C structures
which are registered in a brw->perfquery.oa_metrics_table hash table
keyed by the GUID of the metric set in XML.

Signed-off-by: Robert Bragg <rob...@sixbynine.org>
---
 src/mesa/drivers/dri/i965/Makefile.am  |  15 +-
 src/mesa/drivers/dri/i965/Makefile.sources |   2 +
 src/mesa/drivers/dri/i965/brw_oa.py| 543 +
 3 files changed, 559 insertions(+), 1 deletion(-)
 create mode 100644 src/mesa/drivers/dri/i965/brw_oa.py

diff --git a/src/mesa/drivers/dri/i965/Makefile.am 
b/src/mesa/drivers/dri/i965/Makefile.am
index f87fa67ef8..0130afff5f 100644
--- a/src/mesa/drivers/dri/i965/Makefile.am
+++ b/src/mesa/drivers/dri/i965/Makefile.am
@@ -93,7 +93,9 @@ BUILT_SOURCES = $(i965_compiler_GENERATED_FILES)
 CLEANFILES = $(BUILT_SOURCES)
 
 EXTRA_DIST = \
-   brw_nir_trig_workarounds.py
+   brw_nir_trig_workarounds.py \
+   brw_oa_hsw.xml \
+   brw_oa.py
 
 TEST_LIBS = \
libi965_compiler.la \
@@ -169,3 +171,14 @@ test_eu_validate_SOURCES = \
 test_eu_validate_LDADD = \
$(top_builddir)/src/gtest/libgtest.la \
$(TEST_LIBS)
+
+BUILT_SOURCES = \
+   brw_oa_hsw.h \
+   brw_oa_hsw.c
+
+brw_oa_hsw.h brw_oa_hsw.c: brw_oa_hsw.xml brw_oa.py Makefile
+   $(PYTHON2) $(PYTHON_FLAGS) $(srcdir)/brw_oa.py \
+  --header=$(builddir)/brw_oa_hsw.h \
+  --code=$(builddir)/brw_oa_hsw.c \
+  --chipset="hsw" \
+  $(srcdir)/brw_oa_hsw.xml
diff --git a/src/mesa/drivers/dri/i965/Makefile.sources 
b/src/mesa/drivers/dri/i965/Makefile.sources
index 5278e86339..60acd15d41 100644
--- a/src/mesa/drivers/dri/i965/Makefile.sources
+++ b/src/mesa/drivers/dri/i965/Makefile.sources
@@ -135,6 +135,8 @@ i965_FILES = \
brw_nir_uniforms.cpp \
brw_object_purgeable.c \
brw_pipe_control.c \
+   brw_oa_hsw.h \
+   brw_oa_hsw.c \
brw_performance_query.h \
brw_performance_query.c \
brw_program.c \
diff --git a/src/mesa/drivers/dri/i965/brw_oa.py 
b/src/mesa/drivers/dri/i965/brw_oa.py
new file mode 100644
index 00..2c622531af
--- /dev/null
+++ b/src/mesa/drivers/dri/i965/brw_oa.py
@@ -0,0 +1,543 @@
+#!/usr/bin/env python2
+#
+# Copyright (c) 2015 Intel Corporation
+#
+# Permission is hereby granted, free of charge, to any person obtaining a
+# copy of this software and associated documentation files (the "Software"),
+# to deal in the Software without restriction, including without limitation
+# the rights to use, copy, modify, merge, publish, distribute, sublicense,
+# and/or sell copies of the Software, and to permit persons to whom the
+# Software is furnished to do so, subject to the following conditions:
+#
+# The above copyright notice and this permission notice (including the next
+# paragraph) shall be included in all copies or substantial portions of the
+# Software.
+#
+# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+# THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+# IN THE SOFTWARE.
+
+import xml.etree.ElementTree as ET
+import argparse
+import sys
+
+def print_err(*args):
+sys.stderr.write(' '.join(map(str,args)) + '\n')
+
+c_file = None
+_c_indent = 0
+
+def c(*args):
+if c_file:
+code = ' '.join(map(str,args))
+for line in code.splitlines():
+text = ''.rjust(_c_indent) + line
+c_file.write(text.rstrip() + "\n")
+
+# indented, but no trailing newline...
+def c_line_start(code):
+if c_file:
+c_file.write(''.rjust(_c_indent) + code)
+def c_raw(code):
+if c_file:
+c_file.write(code)
+
+def c_indent(n):
+global _c_indent
+_c_indent = _c_indent + n
+def c_outdent(n):
+global _c_indent
+_c_indent = _c_indent - n
+
+header_file = None
+_h_indent = 0
+
+def h(*args):
+if header_file:
+code = ' '.join(map(str,args))
+for line in code.splitlines():
+text = ''.rjust(_h_indent) + line
+header_file.write(text.rstrip() + "\n")
+
+def h_indent(n):
+global _c_indent
+ 

[Mesa-dev] [PATCH 0/8] i965: Support OA counters via INTEL_performance_query

2017-02-24 Thread Robert Bragg
This builds on the recent i965/INTEL_performance_query work to add support for
Haswell OA unit performance counters (Gen 8 and 9 support to follow once
corresponding kernel supports lands).

Robert Bragg (8):
  main/performance_query: s/GLboolean/bool/
  i965: XML description of Haswell OA metric set
  i965: brw_context.h additions for OA unit query codegen
  i965: extend query/counter structs for OA queries
  i965: Add script to gen code for OA counter queries
  exec_list: Add a foreach_list_typed_from macro
  i965: Expose OA counters via INTEL_performance_query
  i965: Add more Haswell OA metrics sets

 src/compiler/glsl/list.h  |5 +
 src/mesa/drivers/dri/i965/Makefile.am |   15 +-
 src/mesa/drivers/dri/i965/Makefile.sources|2 +
 src/mesa/drivers/dri/i965/brw_context.h   |   85 +
 src/mesa/drivers/dri/i965/brw_oa.py   |  543 +++
 src/mesa/drivers/dri/i965/brw_oa_hsw.xml  | 4400 +
 src/mesa/drivers/dri/i965/brw_performance_query.c | 1123 +-
 src/mesa/drivers/dri/i965/brw_performance_query.h |   10 +-
 src/mesa/main/dd.h|8 +-
 9 files changed, 6170 insertions(+), 21 deletions(-)
 create mode 100644 src/mesa/drivers/dri/i965/brw_oa.py
 create mode 100644 src/mesa/drivers/dri/i965/brw_oa_hsw.xml

-- 
2.11.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/8] main/performance_query: s/GLboolean/bool/

2017-02-24 Thread Robert Bragg
Ideally would have caught these when adding the interface but this just
switches a few return type for the INTEL_performance_query backend
interface to bool instead of GLboolean.

Signed-off-by: Robert Bragg <rob...@sixbynine.org>
---
 src/mesa/drivers/dri/i965/brw_performance_query.c | 4 ++--
 src/mesa/main/dd.h| 8 
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_performance_query.c 
b/src/mesa/drivers/dri/i965/brw_performance_query.c
index 46847bef53..6cd5bf4caa 100644
--- a/src/mesa/drivers/dri/i965/brw_performance_query.c
+++ b/src/mesa/drivers/dri/i965/brw_performance_query.c
@@ -202,7 +202,7 @@ snapshot_statistics_registers(struct brw_context *brw,
 /**
  * Driver hook for glBeginPerfQueryINTEL().
  */
-static GLboolean
+static bool
 brw_begin_perf_query(struct gl_context *ctx,
  struct gl_perf_query_object *o)
 {
@@ -351,7 +351,7 @@ brw_wait_perf_query(struct gl_context *ctx, struct 
gl_perf_query_object *o)
drm_intel_bo_wait_rendering(bo);
 }
 
-static GLboolean
+static bool
 brw_is_perf_query_ready(struct gl_context *ctx,
 struct gl_perf_query_object *o)
 {
diff --git a/src/mesa/main/dd.h b/src/mesa/main/dd.h
index aba301cf22..63fc62103a 100644
--- a/src/mesa/main/dd.h
+++ b/src/mesa/main/dd.h
@@ -804,14 +804,14 @@ struct dd_function_table {
unsigned queryIndex);
void (*DeletePerfQuery)(struct gl_context *ctx,
struct gl_perf_query_object *obj);
-   GLboolean (*BeginPerfQuery)(struct gl_context *ctx,
-   struct gl_perf_query_object *obj);
+   bool (*BeginPerfQuery)(struct gl_context *ctx,
+  struct gl_perf_query_object *obj);
void (*EndPerfQuery)(struct gl_context *ctx,
 struct gl_perf_query_object *obj);
void (*WaitPerfQuery)(struct gl_context *ctx,
  struct gl_perf_query_object *obj);
-   GLboolean (*IsPerfQueryReady)(struct gl_context *ctx,
- struct gl_perf_query_object *obj);
+   bool (*IsPerfQueryReady)(struct gl_context *ctx,
+struct gl_perf_query_object *obj);
void (*GetPerfQueryData)(struct gl_context *ctx,
 struct gl_perf_query_object *obj,
 GLsizei dataSize,
-- 
2.11.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2] i965: Implement INTEL_performance_query backend

2017-02-22 Thread Robert Bragg
On Wed, Feb 22, 2017 at 1:26 AM, Kenneth Graunke <kenn...@whitecape.org> wrote:
> On Thursday, February 16, 2017 5:20:37 AM PST Robert Bragg wrote:
> [snip]
>> +   switch(obj->query->kind) {
>
> Space after "switch" please.

Oh, oops, repeated that in a few places.

>
> Patch 3 is:
> Reviewed-by: Kenneth Graunke <kenn...@whitecape.org>

Thanks
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/3] Model INTEL perf query backend after query object BE

2017-02-22 Thread Robert Bragg
On Wed, Feb 22, 2017 at 1:24 AM, Kenneth Graunke <kenn...@whitecape.org> wrote:
> On Wednesday, February 15, 2017 1:37:36 PM PST Robert Bragg wrote:
>> Instead of using the same backend interface as AMD_performance_monitor
>> this defines a dedicated INTEL_performance_query interface that is
>> modelled more on the ARB_query_buffer_object interface (considering the
>> similarity of the extensions) with the addition of vfuncs for
>> initializing and enumerating query and counter info.
>
> Patches 1 and 2's commit titles should start with "mesa: ".

Ah, yup okey.

>
>> Compared to the previous backend, some notable differences are:
>>
>> - The backend is free to represent counters using whatever data
>>   structures are optimal/convenient since queries and counters are
>>   enumerated via an iterator api instead of declaring them using
>>   structures directly shared with the frontend.
>>
>>   This is also done to help us support the full range of data and
>>   semantic types available with INTEL_performance_query which is awkward
>>   while using a structure shared with the AMD_performance_monitor
>>   backend since neither extension's types are a subset of the other.
>>
>> - The backend must support waiting for a query instead of the frontend
>>   simply using glFinish().
>>
>> - Objects go through 'Active' and 'Ready' states consistent with the
>>   query object backend (hopefully making them more familiar). There is
>>   no 'Ended' state (which used to show that a query has ended at least
>>   once for a given object). There is a new 'Used' state similar to the
>>   'EverBound' state of query objects, set when a query is first begun
>>   which implies that we are expecting to get results back for the object
>>   at some point.
>
> That's a little different from EverBound, which is used to answer stupid
> glIsFoo() queries - where glGenFoo() doesn't actually "create" a Foo,
> but glBindFoo() does.  An awkward concept.

Ok, makes sense now. I've updated the comment to note there's no
equivalent to EverBound needed here.

>
>> The INTEL_performance_query and AMD_performance_monitor extensions are
>> now completely orthogonal within Mesa main (though a driver could
>> optionally choose to implement both extensions within a unified backend
>> if that were convenient for the sake of sharing state/code).
>>
>> v2: (Samuel Pitoiset)
>> - init PerfQuery.NumQueries in frontend
>> - s/return_string/output_clipped_string/
>> - s/backed/backend/ typo
>> - remove redundant *bytesWritten = 0
>> v3:
>> - Add InitPerfQueryInfo for lazy probing of available queries
>>
>> Signed-off-by: Robert Bragg <rob...@sixbynine.org>
>> ---
>>  src/mesa/main/dd.h|  41 +++
>>  src/mesa/main/mtypes.h|  24 +-
>>  src/mesa/main/performance_query.c | 625 
>> ++
>>  src/mesa/main/performance_query.h |   6 +-
>>  4 files changed, 295 insertions(+), 401 deletions(-)
>>
>> diff --git a/src/mesa/main/dd.h b/src/mesa/main/dd.h
>> index 7ebd084ca3..e77df31cf2 100644
>> --- a/src/mesa/main/dd.h
>> +++ b/src/mesa/main/dd.h
>> @@ -780,6 +780,47 @@ struct dd_function_table {
>> /*@}*/
>>
>> /**
>> +* \name Performance Query objects
>> +*/
>> +   /*@{*/
>> +   GLuint (*InitPerfQueryInfo)(struct gl_context *ctx);
>> +   void (*GetPerfQueryInfo)(struct gl_context *ctx,
>> +int queryIndex,
>> +const char **name,
>> +GLuint *dataSize,
>> +GLuint *numCounters,
>> +GLuint *numActive);
>> +   void (*GetPerfCounterInfo)(struct gl_context *ctx,
>> +  int queryIndex,
>> +  int counterIndex,
>> +  const char **name,
>> +  const char **desc,
>> +  GLuint *offset,
>> +  GLuint *data_size,
>> +  GLuint *type_enum,
>> +  GLuint *data_type_enum,
>> +  GLuint64 *raw_max);
>> +   struct gl_perf_query_object * (*NewPerfQueryObject)(struct gl_context 
>> *ctx,
>> +   int queryIndex);
>> +   void (*DeletePerfQuery)(struct gl_context *ctx,
>> +   struct gl_perf_query_object *obj

Re: [Mesa-dev] [PATCH] anv: implement pipeline statistics queries

2017-02-16 Thread Robert Bragg
On Wed, Feb 15, 2017 at 11:04 PM, Ilia Mirkin <imir...@alum.mit.edu> wrote:
> On Tue, Jan 24, 2017 at 5:27 PM, Robert Bragg <rob...@sixbynine.org> wrote:
>>>>>> Depending on how strictly we consider that the queries should only 
>>>>>> measure
>>>>>> the commands they bracket then I think some stalling will be necessary to
>>>>>> serialize the work associated with a query and defer reading the end 
>>>>>> state
>>>>>> until after the relevant stages have completed their work.
>>>>>>
>>>>>> We aren't very precise about this in GL currently, but in Begin maybe we
>>>>>> should stall until everything >= the statistic-stage is idle and in End
>>>>>> stall until everything <= the statistic-stage is idle before reading
>>>>>> (where
>>>>>> 'statistic-stage' here is the pipeline stage associated with the pipeline
>>>>>> statistic being queried (or respectively the min/max stage for a set)).
>>>>>>
>>>>>> For reference in my implementation of INTEL_performance_query facing this
>>>>>> same question, I'm currently just stalling before and after queries:
>>>>>>
>>>>>>
>>>>>> https://github.com/rib/mesa/blob/wip/rib/oa-next/src/mesa/drivers/dri/i965/brw_performance_query.c#L994
>>>>>>
>>>>>> https://github.com/rib/mesa/blob/wip/rib/oa-next/src/mesa/drivers/dri/i965/brw_performance_query.c#L1136
>>>>>
>>>>> So that's essentially what I'm doing here, I think. (And what the GL
>>>>> driver does.)
>>
>> Yup, the upshot might just be a comment explaining the need for a
>> stall. I think we probably need a stall in CmdEndQuery too, otherwise
>> the command streamer may read the end counter before the work has
>> finished.
>
> Robert,
>
> Can you give me some examples of how I might implement this? I'm not
> so familiar with the Intel HW to know this offhand. Mostly hoping you
> can point me at a mapping of which bit in what command corresponds to
> which stage.

Heh, actually just after I sent out my series for
GL_INTEL_performance_query yesterday I of course remembered that I
needed to fold back command streamer synchronization from a later
patch to the one for pipeline statistics.

My last reply was just trying to suggest replacing the "TODO: This
might only be necessary for certain stats" comment - so nothing to
really implement. I had thought you might be missing a corresponding
stall in the CmdEndQuery but just checking it looks like you already
have one with the same TODO comment. Sorry I didn't double check that
at the time.

I'm not sure it's worth worrying about trying to apply fine grained
control over flushing, even though I suggested that idea originally.
After looking into that possibility more I don't think the HW actually
supports very detailed control (with one exception maybe being to use
DEPTH_STALL with occlusion queries).

My (limited) understanding is that a PIPE_CONTROL with CS_STALL and
STALL_AT_SCOREBOARD should generally suffice to stall the command
streamer until the pipeline has been drained. Since this is what you
are already doing my last reply was trying to say that it maybe just
needs a better comment to explain why we need:

+  /* TODO: This might only be necessary for certain stats */
+  anv_batch_emit(_buffer->batch, GENX(PIPE_CONTROL), pc) {
+ pc.CommandStreamerStallEnable = true;
+ pc.StallAtPixelScoreboard = true;
+  }

instead of "TODO: This might only be necessary for certain stats".


I don't know if it's a clear explanation, but feel free to steal
anything from my latest attempt to comment the need for stalling in
this patch for INTEL_performance_query.

https://lists.freedesktop.org/archives/mesa-dev/2017-February/144670.html

Btw, in case you ask, I've never found a good explanation of what
'stall at scoreboard' really means since I'm not really familiar with
what the scoreboard is. :-/ One impression I've got is that it's just
the least-restrictive way of satisfying the restrictions on using
CS_STALL. I think the scoreboard is something related to dependency
tracking while scheduling threads to execute on EUs so I currently
imagine it to mean "stall until there are no more threads left to
schedule for pixel shading" - maybe someone else knows better.

One other data point here is that the Intel driver on windows uses
PIPE_CONTROL + CS_STALL + STALL_AT_SCOREBOARD in its implementation of
INTEL_performance_query and query objects, so hopefully they've found
that to be enough. So even if this does nothing to explain why, all
things being equal it could be good to be consistent if we're ever
trying to compare metrics between different drivers.

Regards,
- Robert


>
> Thanks,
>
>   -ilia
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v2] i965: Implement INTEL_performance_query backend

2017-02-16 Thread Robert Bragg
I forgot to fold back the addition of pipeline stalls around
queries from a later patch (a detailed explanation is included
as a comment in the code below).

--- >8 --- (git am --scissor)

This adds a bare-bones backend for the INTEL_performance_query extension
that exposes pipeline statistics.

Although this could be considered redundant given that the same
statistics are already available via query objects, they are a simple
starting point for this extension and it's expected to be convenient for
tools wanting to have a single go to api to introspect what performance
counters are available, along with names, descriptions and semantic/data
types.

This code is derived from Kenneth Graunke's work, temporarily removed
while the frontend and backend interface were reworked.

Signed-off-by: Robert Bragg <rob...@sixbynine.org>
---
 src/mesa/drivers/dri/i965/Makefile.sources|   2 +
 src/mesa/drivers/dri/i965/brw_context.c   |   3 +
 src/mesa/drivers/dri/i965/brw_context.h   |  23 +
 src/mesa/drivers/dri/i965/brw_performance_query.c | 649 ++
 src/mesa/drivers/dri/i965/brw_performance_query.h |  49 ++
 src/mesa/drivers/dri/i965/intel_extensions.c  |   3 +
 6 files changed, 729 insertions(+)
 create mode 100644 src/mesa/drivers/dri/i965/brw_performance_query.c
 create mode 100644 src/mesa/drivers/dri/i965/brw_performance_query.h

diff --git a/src/mesa/drivers/dri/i965/Makefile.sources 
b/src/mesa/drivers/dri/i965/Makefile.sources
index dd546826d1..5278e86339 100644
--- a/src/mesa/drivers/dri/i965/Makefile.sources
+++ b/src/mesa/drivers/dri/i965/Makefile.sources
@@ -135,6 +135,8 @@ i965_FILES = \
brw_nir_uniforms.cpp \
brw_object_purgeable.c \
brw_pipe_control.c \
+   brw_performance_query.h \
+   brw_performance_query.c \
brw_program.c \
brw_program.h \
brw_program_cache.c \
diff --git a/src/mesa/drivers/dri/i965/brw_context.c 
b/src/mesa/drivers/dri/i965/brw_context.c
index c56a14e3d6..393adede8a 100644
--- a/src/mesa/drivers/dri/i965/brw_context.c
+++ b/src/mesa/drivers/dri/i965/brw_context.c
@@ -1139,6 +1139,9 @@ brwCreateContext(gl_api api,
_mesa_initialize_dispatch_tables(ctx);
_mesa_initialize_vbo_vtxfmt(ctx);
 
+   if (ctx->Extensions.INTEL_performance_query)
+  brw_init_performance_queries(brw);
+
vbo_use_buffer_objects(ctx);
vbo_always_unmap_buffers(ctx);
 
diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
b/src/mesa/drivers/dri/i965/brw_context.h
index 01e651b09f..a6d91bcce0 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -655,6 +655,19 @@ struct shader_times;
 
 struct gen_l3_config;
 
+enum brw_query_kind {
+   PIPELINE_STATS
+};
+
+struct brw_perf_query_info
+{
+   enum brw_query_kind kind;
+   const char *name;
+   struct brw_perf_query_counter *counters;
+   int n_counters;
+   size_t data_size;
+};
+
 /**
  * brw_context is derived from gl_context.
  */
@@ -1132,6 +1145,13 @@ struct brw_context
   bool supported;
} predicate;
 
+   struct {
+  struct brw_perf_query_info *queries;
+  int n_queries;
+
+  int n_active_pipeline_stats_queries;
+   } perfquery;
+
int num_atoms[BRW_NUM_PIPELINES];
const struct brw_tracked_state render_atoms[76];
const struct brw_tracked_state compute_atoms[11];
@@ -1434,6 +1454,9 @@ bool brw_render_target_supported(struct brw_context *brw,
  struct gl_renderbuffer *rb);
 uint32_t brw_depth_format(struct brw_context *brw, mesa_format format);
 
+/* brw_performance_query.c */
+void brw_init_performance_queries(struct brw_context *brw);
+
 /* intel_buffer_objects.c */
 int brw_bo_map(struct brw_context *brw, drm_intel_bo *bo, int write_enable,
const char *bo_name);
diff --git a/src/mesa/drivers/dri/i965/brw_performance_query.c 
b/src/mesa/drivers/dri/i965/brw_performance_query.c
new file mode 100644
index 00..f1b6f583bf
--- /dev/null
+++ b/src/mesa/drivers/dri/i965/brw_performance_query.c
@@ -0,0 +1,649 @@
+/*
+ * Copyright © 2013 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NON

[Mesa-dev] [PATCH 1/3] Separate INTEL_performance_query frontend

2017-02-15 Thread Robert Bragg
To allow the backend interfaces for AMD_performance_monitor and
INTEL_performance_query to evolve independently based on the more
specific requirements of each extension this starts by separating
the frontends of these extensions.

Even though there wasn't much tying these frontends together, this
separation intentionally copies what few helpers/utilities that were
shared between the two extensions, avoiding any re-factoring specific to
INTEL_performance_query so that the evolution will be easier to follow
later.

Signed-off-by: Robert Bragg <rob...@sixbynine.org>
---
 src/mapi/glapi/gen/gl_genexec.py|   1 +
 src/mesa/Makefile.sources   |   2 +
 src/mesa/main/context.c |   3 +
 src/mesa/main/mtypes.h  |  15 +
 src/mesa/main/performance_monitor.c | 590 ---
 src/mesa/main/performance_monitor.h |  39 --
 src/mesa/main/performance_query.c   | 784 
 src/mesa/main/performance_query.h   |  84 
 8 files changed, 889 insertions(+), 629 deletions(-)
 create mode 100644 src/mesa/main/performance_query.c
 create mode 100644 src/mesa/main/performance_query.h

diff --git a/src/mapi/glapi/gen/gl_genexec.py b/src/mapi/glapi/gen/gl_genexec.py
index 183e6ff741..e63393d26a 100644
--- a/src/mapi/glapi/gen/gl_genexec.py
+++ b/src/mapi/glapi/gen/gl_genexec.py
@@ -92,6 +92,7 @@ header = """/**
 #include "main/objectlabel.h"
 #include "main/objectpurge.h"
 #include "main/performance_monitor.h"
+#include "main/performance_query.h"
 #include "main/pipelineobj.h"
 #include "main/pixel.h"
 #include "main/pixelstore.h"
diff --git a/src/mesa/Makefile.sources b/src/mesa/Makefile.sources
index ee737b0d7f..ae701138f5 100644
--- a/src/mesa/Makefile.sources
+++ b/src/mesa/Makefile.sources
@@ -150,6 +150,8 @@ MAIN_FILES = \
main/pbo.h \
main/performance_monitor.c \
main/performance_monitor.h \
+   main/performance_query.c \
+   main/performance_query.h \
main/pipelineobj.c \
main/pipelineobj.h \
main/pixel.c \
diff --git a/src/mesa/main/context.c b/src/mesa/main/context.c
index 16e25a9bc2..f33ebe57e3 100644
--- a/src/mesa/main/context.c
+++ b/src/mesa/main/context.c
@@ -108,6 +108,7 @@
 #include "matrix.h"
 #include "multisample.h"
 #include "performance_monitor.h"
+#include "performance_query.h"
 #include "pipelineobj.h"
 #include "pixel.h"
 #include "pixelstore.h"
@@ -836,6 +837,7 @@ init_attrib_groups(struct gl_context *ctx)
_mesa_init_matrix( ctx );
_mesa_init_multisample( ctx );
_mesa_init_performance_monitors( ctx );
+   _mesa_init_performance_queries( ctx );
_mesa_init_pipeline( ctx );
_mesa_init_pixel( ctx );
_mesa_init_pixelstore( ctx );
@@ -1328,6 +1330,7 @@ _mesa_free_context_data( struct gl_context *ctx )
_mesa_free_varray_data(ctx);
_mesa_free_transform_feedback(ctx);
_mesa_free_performance_monitors(ctx);
+   _mesa_free_performance_queries(ctx);
 
_mesa_reference_buffer_object(ctx, >Pack.BufferObj, NULL);
_mesa_reference_buffer_object(ctx, >Unpack.BufferObj, NULL);
diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h
index 08bd929255..f3a24df589 100644
--- a/src/mesa/main/mtypes.h
+++ b/src/mesa/main/mtypes.h
@@ -1874,6 +1874,20 @@ struct gl_perf_monitor_state
 
 
 /**
+ * Context state for INTEL_performance_query.
+ */
+struct gl_perf_query_state
+{
+   /** Array of performance monitor groups (indexed by group ID) */
+   const struct gl_perf_monitor_group *Groups;
+   GLuint NumGroups;
+
+   /** The table of all performance query objects. */
+   struct _mesa_HashTable *Objects;
+};
+
+
+/**
  * Names of the various vertex/fragment program register files, etc.
  *
  * NOTE: first four tokens must fit into 2 bits (see t_vb_arbprogram.c)
@@ -4512,6 +4526,7 @@ struct gl_context
struct gl_transform_feedback_state TransformFeedback;
 
struct gl_perf_monitor_state PerfMonitor;
+   struct gl_perf_query_state PerfQuery;
 
struct gl_buffer_object *DrawIndirectBuffer; /** < GL_ARB_draw_indirect */
struct gl_buffer_object *ParameterBuffer; /** < GL_ARB_indirect_parameters 
*/
diff --git a/src/mesa/main/performance_monitor.c 
b/src/mesa/main/performance_monitor.c
index 43529b2b35..a4a2f9e597 100644
--- a/src/mesa/main/performance_monitor.c
+++ b/src/mesa/main/performance_monitor.c
@@ -685,593 +685,3 @@ _mesa_perf_monitor_counter_size(const struct 
gl_perf_monitor_counter *c)
   return 0;
}
 }
-
-extern void GLAPIENTRY
-_mesa_GetFirstPerfQueryIdINTEL(GLuint *queryId)
-{
-   GET_CURRENT_CONTEXT(ctx);
-   unsigned numGroups;
-
-   init_groups(ctx);
-
-   /* The GL_INTEL_performance_query spec says:
-*
-*"If queryId pointer is equal to 0, INVALID_VALUE error is generated."
-*/
-   if (!qu

[Mesa-dev] [PATCH 2/3] Model INTEL perf query backend after query object BE

2017-02-15 Thread Robert Bragg
Instead of using the same backend interface as AMD_performance_monitor
this defines a dedicated INTEL_performance_query interface that is
modelled more on the ARB_query_buffer_object interface (considering the
similarity of the extensions) with the addition of vfuncs for
initializing and enumerating query and counter info.

Compared to the previous backend, some notable differences are:

- The backend is free to represent counters using whatever data
  structures are optimal/convenient since queries and counters are
  enumerated via an iterator api instead of declaring them using
  structures directly shared with the frontend.

  This is also done to help us support the full range of data and
  semantic types available with INTEL_performance_query which is awkward
  while using a structure shared with the AMD_performance_monitor
  backend since neither extension's types are a subset of the other.

- The backend must support waiting for a query instead of the frontend
  simply using glFinish().

- Objects go through 'Active' and 'Ready' states consistent with the
  query object backend (hopefully making them more familiar). There is
  no 'Ended' state (which used to show that a query has ended at least
  once for a given object). There is a new 'Used' state similar to the
  'EverBound' state of query objects, set when a query is first begun
  which implies that we are expecting to get results back for the object
  at some point.

The INTEL_performance_query and AMD_performance_monitor extensions are
now completely orthogonal within Mesa main (though a driver could
optionally choose to implement both extensions within a unified backend
if that were convenient for the sake of sharing state/code).

v2: (Samuel Pitoiset)
- init PerfQuery.NumQueries in frontend
- s/return_string/output_clipped_string/
- s/backed/backend/ typo
- remove redundant *bytesWritten = 0
v3:
- Add InitPerfQueryInfo for lazy probing of available queries

Signed-off-by: Robert Bragg <rob...@sixbynine.org>
---
 src/mesa/main/dd.h|  41 +++
 src/mesa/main/mtypes.h|  24 +-
 src/mesa/main/performance_query.c | 625 ++
 src/mesa/main/performance_query.h |   6 +-
 4 files changed, 295 insertions(+), 401 deletions(-)

diff --git a/src/mesa/main/dd.h b/src/mesa/main/dd.h
index 7ebd084ca3..e77df31cf2 100644
--- a/src/mesa/main/dd.h
+++ b/src/mesa/main/dd.h
@@ -780,6 +780,47 @@ struct dd_function_table {
/*@}*/
 
/**
+* \name Performance Query objects
+*/
+   /*@{*/
+   GLuint (*InitPerfQueryInfo)(struct gl_context *ctx);
+   void (*GetPerfQueryInfo)(struct gl_context *ctx,
+int queryIndex,
+const char **name,
+GLuint *dataSize,
+GLuint *numCounters,
+GLuint *numActive);
+   void (*GetPerfCounterInfo)(struct gl_context *ctx,
+  int queryIndex,
+  int counterIndex,
+  const char **name,
+  const char **desc,
+  GLuint *offset,
+  GLuint *data_size,
+  GLuint *type_enum,
+  GLuint *data_type_enum,
+  GLuint64 *raw_max);
+   struct gl_perf_query_object * (*NewPerfQueryObject)(struct gl_context *ctx,
+   int queryIndex);
+   void (*DeletePerfQuery)(struct gl_context *ctx,
+   struct gl_perf_query_object *obj);
+   GLboolean (*BeginPerfQuery)(struct gl_context *ctx,
+   struct gl_perf_query_object *obj);
+   void (*EndPerfQuery)(struct gl_context *ctx,
+struct gl_perf_query_object *obj);
+   void (*WaitPerfQuery)(struct gl_context *ctx,
+ struct gl_perf_query_object *obj);
+   GLboolean (*IsPerfQueryReady)(struct gl_context *ctx,
+ struct gl_perf_query_object *obj);
+   void (*GetPerfQueryData)(struct gl_context *ctx,
+struct gl_perf_query_object *obj,
+GLsizei dataSize,
+GLuint *data,
+GLuint *bytesWritten);
+   /*@}*/
+
+
+   /**
 * \name GREMEDY debug/marker functions
 */
/*@{*/
diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h
index f3a24df589..e6cf1f4af6 100644
--- a/src/mesa/main/mtypes.h
+++ b/src/mesa/main/mtypes.h
@@ -1860,6 +1860,23 @@ struct gl_perf_monitor_group
 
 
 /**
+ * A query object instance as described in INTEL_performance_query.
+ *
+ * NB: We want to keep this and the corresponding backend structure
+ * relatively lean considering that applications may expect to
+ * allocate enough objects to be able to query around all draw calls
+ * in a

[Mesa-dev] [PATCH 3/3] i965: Implement INTEL_performance_query backend

2017-02-15 Thread Robert Bragg
This adds a bare-bones backend for the INTEL_performance_query extension
that exposes pipeline statistics.

Although this could be considered redundant given that the same
statistics are already available via query objects, they are a simple
starting point for this extension and it's expected to be convenient for
tools wanting to have a single go to api to introspect what performance
counters are available, along with names, descriptions and semantic/data
types.

This code is derived from Kenneth Graunke's work, temporarily removed
while the frontend and backend interface were reworked.

Signed-off-by: Robert Bragg <rob...@sixbynine.org>
---
 src/mesa/drivers/dri/i965/Makefile.sources|   2 +
 src/mesa/drivers/dri/i965/brw_context.c   |   3 +
 src/mesa/drivers/dri/i965/brw_context.h   |  23 +
 src/mesa/drivers/dri/i965/brw_performance_query.c | 595 ++
 src/mesa/drivers/dri/i965/brw_performance_query.h |  49 ++
 src/mesa/drivers/dri/i965/intel_extensions.c  |   3 +
 6 files changed, 675 insertions(+)
 create mode 100644 src/mesa/drivers/dri/i965/brw_performance_query.c
 create mode 100644 src/mesa/drivers/dri/i965/brw_performance_query.h

diff --git a/src/mesa/drivers/dri/i965/Makefile.sources 
b/src/mesa/drivers/dri/i965/Makefile.sources
index dd546826d1..5278e86339 100644
--- a/src/mesa/drivers/dri/i965/Makefile.sources
+++ b/src/mesa/drivers/dri/i965/Makefile.sources
@@ -135,6 +135,8 @@ i965_FILES = \
brw_nir_uniforms.cpp \
brw_object_purgeable.c \
brw_pipe_control.c \
+   brw_performance_query.h \
+   brw_performance_query.c \
brw_program.c \
brw_program.h \
brw_program_cache.c \
diff --git a/src/mesa/drivers/dri/i965/brw_context.c 
b/src/mesa/drivers/dri/i965/brw_context.c
index c56a14e3d6..393adede8a 100644
--- a/src/mesa/drivers/dri/i965/brw_context.c
+++ b/src/mesa/drivers/dri/i965/brw_context.c
@@ -1139,6 +1139,9 @@ brwCreateContext(gl_api api,
_mesa_initialize_dispatch_tables(ctx);
_mesa_initialize_vbo_vtxfmt(ctx);
 
+   if (ctx->Extensions.INTEL_performance_query)
+  brw_init_performance_queries(brw);
+
vbo_use_buffer_objects(ctx);
vbo_always_unmap_buffers(ctx);
 
diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
b/src/mesa/drivers/dri/i965/brw_context.h
index 01e651b09f..a6d91bcce0 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -655,6 +655,19 @@ struct shader_times;
 
 struct gen_l3_config;
 
+enum brw_query_kind {
+   PIPELINE_STATS
+};
+
+struct brw_perf_query_info
+{
+   enum brw_query_kind kind;
+   const char *name;
+   struct brw_perf_query_counter *counters;
+   int n_counters;
+   size_t data_size;
+};
+
 /**
  * brw_context is derived from gl_context.
  */
@@ -1132,6 +1145,13 @@ struct brw_context
   bool supported;
} predicate;
 
+   struct {
+  struct brw_perf_query_info *queries;
+  int n_queries;
+
+  int n_active_pipeline_stats_queries;
+   } perfquery;
+
int num_atoms[BRW_NUM_PIPELINES];
const struct brw_tracked_state render_atoms[76];
const struct brw_tracked_state compute_atoms[11];
@@ -1434,6 +1454,9 @@ bool brw_render_target_supported(struct brw_context *brw,
  struct gl_renderbuffer *rb);
 uint32_t brw_depth_format(struct brw_context *brw, mesa_format format);
 
+/* brw_performance_query.c */
+void brw_init_performance_queries(struct brw_context *brw);
+
 /* intel_buffer_objects.c */
 int brw_bo_map(struct brw_context *brw, drm_intel_bo *bo, int write_enable,
const char *bo_name);
diff --git a/src/mesa/drivers/dri/i965/brw_performance_query.c 
b/src/mesa/drivers/dri/i965/brw_performance_query.c
new file mode 100644
index 00..48f834878d
--- /dev/null
+++ b/src/mesa/drivers/dri/i965/brw_performance_query.c
@@ -0,0 +1,595 @@
+/*
+ * Copyright © 2013 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * 

[Mesa-dev] [PATCH 0/3] i965: INTEL_performance_query for pipeline stats

2017-02-15 Thread Robert Bragg
To hopefully make progress towards landing support for OA unit metrics exposed
via INTEL_performance_query the idea here is to first just tackle upstreaming
the backend rework with an initial implementation supporting pipeline
statistics.

In case anyone wants to look ahead, my branch with these patches as well as OA
unit support can be found at https://github.com/rib/mesa - wip/rib/oa-next

Regards,
- Robert

Robert Bragg (3):
  Separate INTEL_performance_query frontend
  Model INTEL perf query backend after query object BE
  i965: Implement INTEL_performance_query backend

 src/mapi/glapi/gen/gl_genexec.py  |   1 +
 src/mesa/Makefile.sources |   2 +
 src/mesa/drivers/dri/i965/Makefile.sources|   2 +
 src/mesa/drivers/dri/i965/brw_context.c   |   3 +
 src/mesa/drivers/dri/i965/brw_context.h   |  23 +
 src/mesa/drivers/dri/i965/brw_performance_query.c | 595 
 src/mesa/drivers/dri/i965/brw_performance_query.h |  49 ++
 src/mesa/drivers/dri/i965/intel_extensions.c  |   3 +
 src/mesa/main/context.c   |   3 +
 src/mesa/main/dd.h|  41 ++
 src/mesa/main/mtypes.h|  27 +
 src/mesa/main/performance_monitor.c   | 590 
 src/mesa/main/performance_monitor.h   |  39 --
 src/mesa/main/performance_query.c | 629 ++
 src/mesa/main/performance_query.h |  80 +++
 15 files changed, 1458 insertions(+), 629 deletions(-)
 create mode 100644 src/mesa/drivers/dri/i965/brw_performance_query.c
 create mode 100644 src/mesa/drivers/dri/i965/brw_performance_query.h
 create mode 100644 src/mesa/main/performance_query.c
 create mode 100644 src/mesa/main/performance_query.h

-- 
2.11.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] anv: implement pipeline statistics queries

2017-01-24 Thread Robert Bragg
On Tue, Jan 24, 2017 at 2:37 PM, Ilia Mirkin <imir...@alum.mit.edu> wrote:
> On Tue, Jan 24, 2017 at 5:27 PM, Robert Bragg <rob...@sixbynine.org> wrote:
>>>>>>> +/*
>>>>>>> + * GPR0 = GPR0 >> 2;
>>>>>>> + *
>>>>>>> + * Note that the upper 30 bits of GPR are lost!
>>>>>>> + */
>>>>>>> +static void
>>>>>>> +shr_gpr0_by_2_bits(struct anv_batch *batch)
>>>>>>> +{
>>>>>>> +   shl_gpr0_by_30_bits(batch);
>>>>>>> +   emit_load_alu_reg_reg32(batch, CS_GPR(0) + 4, CS_GPR(0));
>>>>>>> +   emit_load_alu_reg_imm32(batch, CS_GPR(0) + 4, 0);
>>
>>
>> I recently noticed from inspecting the original hsw_queryobj,c code
>> that this looks suspicious.
>>
>> Conceptually it makes sense to implement a right shift as lshift by
>> 32-n and then keeping the upper 32bits, but the emit_load_ functions
>> take a destination followed by a source and so it looks like after the
>> left shift it's copying the least significant 32bits of R0 over the
>> most significant and then setting the most significant 32bits of R0 to
>> zero. It looks like the first load_alu is redundant if the second one
>> just writes zero to the same location.
>>
>> Maybe I'm misreading something here though, this comment it just based
>> on inspection.
>
> What you're missing, I think, is that
>
> emit_load_alu_reg_reg32(batch, CS_GPR(0) + 4, CS_GPR(0));
>
> does CS_GPR(0) = CS_GPR(0) + 4, and not the inverse as one logically
> might have thought. I copied the semantics from the hsw_queryobj.c
> file, but I think they stink. But it stinks even more to have 2
> functions with inverted argument meanings.
>
> Does that make sense?

oh yeah sorry, not sure how I convinced myself it took dst then src.

>
> [So we have GPR0 which is a 64-bit entity, and do GPR0 <<= 30; GPR0_LO
> = GPR0_HI; GPR0_HI = 0; and then we can store GPR0 somewhere.]
>
> As for re-using your generalized shifter, I don't think that'd make
> sense to introduce in this change. It feels like a component on its
> own, which should be integrated (or not) separately. When/if it is,
> this and hsw_queryobj.c could migrate to using it.

Yup definitely, this code works for the current need so no need to
mess around with it here - thanks for clarifying my misreading.

- Robert

>
> Cheers,
>
>   -ilia
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] anv: implement pipeline statistics queries

2017-01-24 Thread Robert Bragg
Sorry for the delay responding here; some comments below...


On Tue, Jan 24, 2017 at 11:48 AM, Ilia Mirkin <imir...@alum.mit.edu> wrote:
> 2-month ping. [ok, it hasn't been 2 months on the dot, but ... close.]
>
> On Tue, Jan 10, 2017 at 5:49 PM, Ilia Mirkin <imir...@alum.mit.edu> wrote:
>> ping.
>>
>> On Thu, Dec 22, 2016 at 11:14 AM, Ilia Mirkin <imir...@alum.mit.edu> wrote:
>>> Ping? Any further comments/feedback/reviews?
>>>
>>>
>>> On Dec 5, 2016 11:22 AM, "Ilia Mirkin" <imir...@alum.mit.edu> wrote:
>>>
>>> On Mon, Dec 5, 2016 at 11:11 AM, Robert Bragg <rob...@sixbynine.org> wrote:
>>>>
>>>>
>>>> On Sun, Nov 27, 2016 at 7:23 PM, Ilia Mirkin <imir...@alum.mit.edu> wrote:
>>>>>
>>>>> The strategy is to just keep n anv_query_pool_slot entries per query
>>>>> instead of one. The available bit is only valid in the last one.
>>>>>
>>>>> Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu>
>>>>> ---
>>>>>
>>>>> I think this is in a pretty good state now. I've tested both the direct
>>>>> and
>>>>> buffer paths with a hacked up cube application, and I'm seeing
>>>>> non-ridiculous
>>>>> values for the various counters, although I haven't 100% verified them
>>>>> for
>>>>> accuracy.
>>>>>
>>>>> This also implements the hsw/bdw workaround for dividing frag invocations
>>>>> by 4,
>>>>> copied from hsw_queryobj. I tested this on SKL and it seem to divide the
>>>>> values
>>>>> as expected.
>>>>>
>>>>> The cube patch I've been testing with is at
>>>>> http://paste.debian.net/899374/
>>>>> You can flip between copying to a buffer and explicit retrieval by
>>>>> commenting
>>>>> out the relevant function calls.
>>>>>
>>>>>  src/intel/vulkan/anv_device.c  |   2 +-
>>>>>  src/intel/vulkan/anv_private.h |   4 +
>>>>>  src/intel/vulkan/anv_query.c   |  99 ++
>>>>>  src/intel/vulkan/genX_cmd_buffer.c | 260
>>>>> -
>>>>>  4 files changed, 308 insertions(+), 57 deletions(-)
>>>>>
>>>>>
>>>>> diff --git a/src/intel/vulkan/anv_device.c
>>>>> b/src/intel/vulkan/anv_device.c
>>>>> index 99eb73c..7ad1970 100644
>>>>> --- a/src/intel/vulkan/anv_device.c
>>>>> +++ b/src/intel/vulkan/anv_device.c
>>>>> @@ -427,7 +427,7 @@ void anv_GetPhysicalDeviceFeatures(
>>>>>.textureCompressionASTC_LDR   = pdevice->info.gen >=
>>>>> 9,
>>>>> /* FINISHME CHV */
>>>>>.textureCompressionBC = true,
>>>>>.occlusionQueryPrecise= true,
>>>>> -  .pipelineStatisticsQuery  = false,
>>>>> +  .pipelineStatisticsQuery  = true,
>>>>>.fragmentStoresAndAtomics = true,
>>>>>.shaderTessellationAndGeometryPointSize   = true,
>>>>>.shaderImageGatherExtended= false,
>>>>> diff --git a/src/intel/vulkan/anv_private.h
>>>>> b/src/intel/vulkan/anv_private.h
>>>>> index 2fc543d..7271609 100644
>>>>> --- a/src/intel/vulkan/anv_private.h
>>>>> +++ b/src/intel/vulkan/anv_private.h
>>>>> @@ -1763,6 +1763,8 @@ struct anv_render_pass {
>>>>> struct anv_subpass   subpasses[0];
>>>>>  };
>>>>>
>>>>> +#define ANV_PIPELINE_STATISTICS_COUNT 11
>>>>> +
>>>>>  struct anv_query_pool_slot {
>>>>> uint64_t begin;
>>>>> uint64_t end;
>>>>> @@ -1772,6 +1774,8 @@ struct anv_query_pool_slot {
>>>>>  struct anv_query_pool {
>>>>> VkQueryType  type;
>>>>> uint32_t slots;
>>>>> +   uint32_t pipeline_statistics;
>>>>> +   uint32_t slot_stride;
>>>>> struct anv_bobo;
>>>>>  };
>>>>>

Re: [Mesa-dev] Documenting with Sphinx

2016-12-15 Thread Robert Bragg
On Wed, Dec 14, 2016 at 7:17 PM, Jani Nikula
<jani.nik...@linux.intel.com> wrote:
> On Wed, 14 Dec 2016, Robert Bragg <rob...@sixbynine.org> wrote:
>> On Tue, Dec 13, 2016 at 11:08 PM, Rob Clark <robdcl...@gmail.com> wrote:
>>> On Tue, Dec 13, 2016 at 5:47 PM, Eric Anholt <e...@anholt.net> wrote:
>>>> Jason Ekstrand <ja...@jlekstrand.net> writes:
>>>>
>>>>> Hey All,
>>>>> I don't figure this will be terribly controversial (I'm about to be wrong,
>>>>> aren't I?) but how do people feel about switching our "primary"
>>>>> documentation focus, as far as we have one, to sphinx?  Right now, Gallium
>>>>> uses sphinx for documenting all of it's "public" interfaces.  A while ago,
>>>>> Connor wrote some nice NIR documentation using sphinx that never got
>>>>> merged.  I've also spent some time this year and written some additional
>>>>> documentation about Intel surface layout that I'd like to have a place to
>>>>> put.  The thing that's in common with each of these is that they contain
>>>>> quite a bit of prose about the general theory of how things work and not
>>>>> just code comments so sphinx seems like a fairly good fit.  There are also
>>>>> some sort-of "architectural" things in the Vulkan driver that I would like
>>>>> to document better and we don't have a good place for that right now.
>>>>>
>>>>> Here's what I have in mind:
>>>>>
>>>>> Each component that we care to document (this may not be everything!) 
>>>>> would
>>>>> have a docs/ or sphinx/ folder.  In that folder would be any side-band 
>>>>> (not
>>>>> in code comments) documentation.  We would also set up some sort of
>>>>> source-scraping tool to turn inline comments into sphinx documentation.  
>>>>> At
>>>>> the component maintainer's discretion, that code could either be imported
>>>>> into sphinx whole-sale or curated by importing bits into other
>>>>> documentation files in a more carefully curated manner.  Of course, a
>>>>> maintainer could also choose to do all of their documentation side-band
>>>>> (sphinx allows for this) or to not document at all. :-)
>>>
>>> So, +100 to sphinx... I've been quite happy with it (and would be nice
>>> to merge the NIR docs too while where at it).  Maybe someday we'll
>>> have a "mesa book" after all ;-)
>>>
>>>
>>>> I've written doxygen-style comments in my code for a long time, but
>>>> never really looked at the doxygen output, so I wouldn't say I'm wedded
>>>> to doxygen.  I think some side-band docs for architectural descriptions
>>>> (like NIR needs) would be great, and if sphinx would help with
>>>> encouraging those to be written and merged, then that's pretty good with
>>>> me.
>>>>
>>>> I'm a little skeptical of sphinx due to not being able to extract docs
>>>> From C comments in code in the base tool.  Apparently Breathe can chain
>>>> doxygen into sphinx, so hopefully a "let's get excited about sphinx"
>>>> patch series would convert our doxygen-based docs generation to using
>>>> that.
>>>
>>> So as far as I understand on kernel side we are *somehow* extracting
>>> out C comment docs and combining w/ sphinx stuff..  maybe there is
>>> something to re-use from that.
>>
>> The actual parsing, extraction and generation of reStructureText for
>> the kernel is handled by scripts/kernel-doc, which is a perl script
>> based around regex parsing of C code. Jani Nikula has also written a
>> sphinx extension for using this script which can be seen in
>> Documentation/sphinx/. As you can probably guess kernel-doc is pretty
>> linux kernel specific. The script existed before the recent adoption
>> of sphinx and has support for docbook and man page generation. Being
>> based on regex parsing it has a somewhat limited understanding of the
>> underlying types of symbols being documented.
>>
>> I recently raised the idea of using clang's python bindings to extract
>> kerneldoc on dri-devel due to some of the limitations of kernel-doc I
>> was hitting and pointed at an experimental tool I wrote last year for
>> generating reStructureText based on gtk-doc comments:
>> https://github.com/rib/clib/blob/master/site/rst-from-c.py Probably
>

Re: [Mesa-dev] Documenting with Sphinx

2016-12-14 Thread Robert Bragg
On Tue, Dec 13, 2016 at 11:08 PM, Rob Clark  wrote:
> On Tue, Dec 13, 2016 at 5:47 PM, Eric Anholt  wrote:
>> Jason Ekstrand  writes:
>>
>>> Hey All,
>>> I don't figure this will be terribly controversial (I'm about to be wrong,
>>> aren't I?) but how do people feel about switching our "primary"
>>> documentation focus, as far as we have one, to sphinx?  Right now, Gallium
>>> uses sphinx for documenting all of it's "public" interfaces.  A while ago,
>>> Connor wrote some nice NIR documentation using sphinx that never got
>>> merged.  I've also spent some time this year and written some additional
>>> documentation about Intel surface layout that I'd like to have a place to
>>> put.  The thing that's in common with each of these is that they contain
>>> quite a bit of prose about the general theory of how things work and not
>>> just code comments so sphinx seems like a fairly good fit.  There are also
>>> some sort-of "architectural" things in the Vulkan driver that I would like
>>> to document better and we don't have a good place for that right now.
>>>
>>> Here's what I have in mind:
>>>
>>> Each component that we care to document (this may not be everything!) would
>>> have a docs/ or sphinx/ folder.  In that folder would be any side-band (not
>>> in code comments) documentation.  We would also set up some sort of
>>> source-scraping tool to turn inline comments into sphinx documentation.  At
>>> the component maintainer's discretion, that code could either be imported
>>> into sphinx whole-sale or curated by importing bits into other
>>> documentation files in a more carefully curated manner.  Of course, a
>>> maintainer could also choose to do all of their documentation side-band
>>> (sphinx allows for this) or to not document at all. :-)
>
> So, +100 to sphinx... I've been quite happy with it (and would be nice
> to merge the NIR docs too while where at it).  Maybe someday we'll
> have a "mesa book" after all ;-)
>
>
>> I've written doxygen-style comments in my code for a long time, but
>> never really looked at the doxygen output, so I wouldn't say I'm wedded
>> to doxygen.  I think some side-band docs for architectural descriptions
>> (like NIR needs) would be great, and if sphinx would help with
>> encouraging those to be written and merged, then that's pretty good with
>> me.
>>
>> I'm a little skeptical of sphinx due to not being able to extract docs
>> From C comments in code in the base tool.  Apparently Breathe can chain
>> doxygen into sphinx, so hopefully a "let's get excited about sphinx"
>> patch series would convert our doxygen-based docs generation to using
>> that.
>
> So as far as I understand on kernel side we are *somehow* extracting
> out C comment docs and combining w/ sphinx stuff..  maybe there is
> something to re-use from that.

The actual parsing, extraction and generation of reStructureText for
the kernel is handled by scripts/kernel-doc, which is a perl script
based around regex parsing of C code. Jani Nikula has also written a
sphinx extension for using this script which can be seen in
Documentation/sphinx/. As you can probably guess kernel-doc is pretty
linux kernel specific. The script existed before the recent adoption
of sphinx and has support for docbook and man page generation. Being
based on regex parsing it has a somewhat limited understanding of the
underlying types of symbols being documented.

I recently raised the idea of using clang's python bindings to extract
kerneldoc on dri-devel due to some of the limitations of kernel-doc I
was hitting and pointed at an experimental tool I wrote last year for
generating reStructureText based on gtk-doc comments:
https://github.com/rib/clib/blob/master/site/rst-from-c.py Probably
not developed enough to be very useful here, but provides one basic
example of generating reStructuredText from structured comments.

Jani mentioned he had a bit of a side project going to use the same
clang bindings to support extracting doxygen comments, so maybe what
he's been working on could be useful for Mesa too, but I think maybe
that was for doxygen itself so maybe nothing to do with generating
reStructureText.

In the specific case of doxygen format comments, I have a vague
recollection that clang itself can actually parse and provide some
kind of AST for your comments, so maybe that would be convenient.

Br,
- Robert

>
> BR,
> -R
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] anv: implement pipeline statistics queries

2016-12-05 Thread Robert Bragg
On Sun, Nov 27, 2016 at 7:23 PM, Ilia Mirkin  wrote:

> The strategy is to just keep n anv_query_pool_slot entries per query
> instead of one. The available bit is only valid in the last one.
>
> Signed-off-by: Ilia Mirkin 
> ---
>
> I think this is in a pretty good state now. I've tested both the direct and
> buffer paths with a hacked up cube application, and I'm seeing
> non-ridiculous
> values for the various counters, although I haven't 100% verified them for
> accuracy.
>
> This also implements the hsw/bdw workaround for dividing frag invocations
> by 4,
> copied from hsw_queryobj. I tested this on SKL and it seem to divide the
> values
> as expected.
>
> The cube patch I've been testing with is at http://paste.debian.net/
> 899374/
> You can flip between copying to a buffer and explicit retrieval by
> commenting
> out the relevant function calls.
>
>  src/intel/vulkan/anv_device.c  |   2 +-
>  src/intel/vulkan/anv_private.h |   4 +
>  src/intel/vulkan/anv_query.c   |  99 ++
>  src/intel/vulkan/genX_cmd_buffer.c | 260 ++
> ++-
>  4 files changed, 308 insertions(+), 57 deletions(-)
>

> diff --git a/src/intel/vulkan/anv_device.c b/src/intel/vulkan/anv_device.c
> index 99eb73c..7ad1970 100644
> --- a/src/intel/vulkan/anv_device.c
> +++ b/src/intel/vulkan/anv_device.c
> @@ -427,7 +427,7 @@ void anv_GetPhysicalDeviceFeatures(
>.textureCompressionASTC_LDR   = pdevice->info.gen >= 9,
> /* FINISHME CHV */
>.textureCompressionBC = true,
>.occlusionQueryPrecise= true,
> -  .pipelineStatisticsQuery  = false,
> +  .pipelineStatisticsQuery  = true,
>.fragmentStoresAndAtomics = true,
>.shaderTessellationAndGeometryPointSize   = true,
>.shaderImageGatherExtended= false,
> diff --git a/src/intel/vulkan/anv_private.h b/src/intel/vulkan/anv_
> private.h
> index 2fc543d..7271609 100644
> --- a/src/intel/vulkan/anv_private.h
> +++ b/src/intel/vulkan/anv_private.h
> @@ -1763,6 +1763,8 @@ struct anv_render_pass {
> struct anv_subpass   subpasses[0];
>  };
>
> +#define ANV_PIPELINE_STATISTICS_COUNT 11
> +
>  struct anv_query_pool_slot {
> uint64_t begin;
> uint64_t end;
> @@ -1772,6 +1774,8 @@ struct anv_query_pool_slot {
>  struct anv_query_pool {
> VkQueryType  type;
> uint32_t slots;
> +   uint32_t pipeline_statistics;
> +   uint32_t slot_stride;
> struct anv_bobo;
>  };
>
> diff --git a/src/intel/vulkan/anv_query.c b/src/intel/vulkan/anv_query.c
> index 293257b..dc00859 100644
> --- a/src/intel/vulkan/anv_query.c
> +++ b/src/intel/vulkan/anv_query.c
> @@ -38,8 +38,10 @@ VkResult anv_CreateQueryPool(
> ANV_FROM_HANDLE(anv_device, device, _device);
> struct anv_query_pool *pool;
> VkResult result;
> -   uint32_t slot_size;
> -   uint64_t size;
> +   uint32_t slot_size = sizeof(struct anv_query_pool_slot);
> +   uint32_t slot_stride = 1;
> +   uint64_t size = pCreateInfo->queryCount * slot_size;
> +   uint32_t pipeline_statistics = 0;
>
> assert(pCreateInfo->sType == VK_STRUCTURE_TYPE_QUERY_POOL_
> CREATE_INFO);
>
> @@ -48,12 +50,16 @@ VkResult anv_CreateQueryPool(
> case VK_QUERY_TYPE_TIMESTAMP:
>break;
> case VK_QUERY_TYPE_PIPELINE_STATISTICS:
> -  return VK_ERROR_INCOMPATIBLE_DRIVER;
> +  pipeline_statistics = pCreateInfo->pipelineStatistics &
> + ((1 << ANV_PIPELINE_STATISTICS_COUNT) - 1);
> +  slot_stride = _mesa_bitcount(pipeline_statistics);
> +  size *= slot_stride;
> +  break;
> default:
>assert(!"Invalid query type");
> +  return VK_ERROR_INCOMPATIBLE_DRIVER;
> }
>
> -   slot_size = sizeof(struct anv_query_pool_slot);
> pool = vk_alloc2(>alloc, pAllocator, sizeof(*pool), 8,
>   VK_SYSTEM_ALLOCATION_SCOPE_OBJECT);
> if (pool == NULL)
> @@ -61,8 +67,9 @@ VkResult anv_CreateQueryPool(
>
> pool->type = pCreateInfo->queryType;
> pool->slots = pCreateInfo->queryCount;
> +   pool->pipeline_statistics = pipeline_statistics;
> +   pool->slot_stride = slot_stride;
>
> -   size = pCreateInfo->queryCount * slot_size;
> result = anv_bo_init_new(>bo, device, size);
> if (result != VK_SUCCESS)
>goto fail;
> @@ -95,6 +102,27 @@ void anv_DestroyQueryPool(
> vk_free2(>alloc, pAllocator, pool);
>  }
>
> +static void *
> +store_query_result(void *pData, VkQueryResultFlags flags,
> +   uint64_t result, uint64_t available)
> +{
> +   if (flags & VK_QUERY_RESULT_64_BIT) {
> +  uint64_t *dst = pData;
> +  *dst++ = result;
> +  if (flags & VK_QUERY_RESULT_WITH_AVAILABILITY_BIT)
> + 

[Mesa-dev] [PATCH] anv/pipeline: Move multiple shaders/module finishme

2016-11-04 Thread Robert Bragg
The heuristic expecting the entrypoint to be named 'main' was causing
false warnings for modules with only a single shader that happen to use
another name. We now count entrypoints before triggering this warning.

Signed-off-by: Robert Bragg <rob...@sixbynine.org>
---
 src/compiler/spirv/spirv_to_nir.c | 6 ++
 src/compiler/spirv/vtn_private.h  | 1 +
 src/intel/vulkan/anv_pipeline.c   | 4 
 3 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/src/compiler/spirv/spirv_to_nir.c 
b/src/compiler/spirv/spirv_to_nir.c
index 9c5d331..6adeaa6 100644
--- a/src/compiler/spirv/spirv_to_nir.c
+++ b/src/compiler/spirv/spirv_to_nir.c
@@ -2507,6 +2507,8 @@ vtn_handle_preamble_instruction(struct vtn_builder *b, 
SpvOp opcode,
   unsigned name_words;
   entry_point->name = vtn_string_literal(b, [3], count - 3, _words);
 
+  b->n_entry_points++;
+
   if (strcmp(entry_point->name, b->entry_point_name) != 0 ||
   stage_for_execution_model(w[1]) != b->entry_point_stage)
  break;
@@ -2995,6 +2997,10 @@ spirv_to_nir(const uint32_t *words, size_t word_count,
   return NULL;
}
 
+   if (b->n_entry_points > 1) {
+  vtn_warn("FINISHME: Multiple shaders per module not really supported");
+   }
+
b->shader = nir_shader_create(NULL, stage, options, NULL);
 
/* Set shader info defaults */
diff --git a/src/compiler/spirv/vtn_private.h b/src/compiler/spirv/vtn_private.h
index 6f34f09..5746898 100644
--- a/src/compiler/spirv/vtn_private.h
+++ b/src/compiler/spirv/vtn_private.h
@@ -377,6 +377,7 @@ struct vtn_builder {
gl_shader_stage entry_point_stage;
const char *entry_point_name;
struct vtn_value *entry_point;
+   int n_entry_points;
bool origin_upper_left;
bool pixel_center_integer;
 
diff --git a/src/intel/vulkan/anv_pipeline.c b/src/intel/vulkan/anv_pipeline.c
index 0aac711..9600cd3 100644
--- a/src/intel/vulkan/anv_pipeline.c
+++ b/src/intel/vulkan/anv_pipeline.c
@@ -90,10 +90,6 @@ anv_shader_compile_to_nir(struct anv_device *device,
   gl_shader_stage stage,
   const VkSpecializationInfo *spec_info)
 {
-   if (strcmp(entrypoint_name, "main") != 0) {
-  anv_finishme("Multiple shaders per module not really supported");
-   }
-
const struct brw_compiler *compiler =
   device->instance->physicalDevice.compiler;
const nir_shader_compiler_options *nir_options =
-- 
2.10.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] i965: Allow a per gen timebase scale factor

2016-10-27 Thread Robert Bragg
typo fixed (thanks)

--- >8 ---

Prior to Skylake the Gen HW timestamps were driven by a 12.5MHz clock
with the convenient property of being able to scale by an integer (80)
to nanosecond units.

For Skylake the frequency is 12MHz or a scale factor of 83.33

This updates gen_device_info to track a floating point timebase_scale
factor and makes corresponding _queryobj.c changes to no longer assume a
scale factor of 80 works across all gens.

Although the gen6_ code could have been been left alone, the changes
keep the code more comparable, and it now shares a few utility functions
for scaling raw timestamps and calculating deltas. The utility for
calculating deltas takes into account 32 or 36bit overflow depending on
the current kernel version.

Note: this leaves the timestamp handling of ARB_query_buffer_object
untouched, which continues to use an incorrect scale of 80 on Skylake
for now. This is more awkward to solve since the scaling is currently
done using a very limited uint64 ALU available to the command parser
that doesn't support multiply or divide where it's already taking a
large number of instructions just to effectively multiple by 80.

This fixes piglit arb_timer_query-timestamp-get on Skylake

Signed-off-by: Robert Bragg <rob...@sixbynine.org>
---
 src/intel/common/gen_device_info.c| 21 +---
 src/intel/common/gen_device_info.h| 24 ++
 src/mesa/drivers/dri/i965/brw_context.c   | 15 +
 src/mesa/drivers/dri/i965/brw_context.h   |  3 ++
 src/mesa/drivers/dri/i965/brw_queryobj.c  | 53 ---
 src/mesa/drivers/dri/i965/gen6_queryobj.c | 28 +---
 6 files changed, 115 insertions(+), 29 deletions(-)

diff --git a/src/intel/common/gen_device_info.c 
b/src/intel/common/gen_device_info.c
index 30df0b2..20594b0 100644
--- a/src/intel/common/gen_device_info.c
+++ b/src/intel/common/gen_device_info.c
@@ -35,6 +35,7 @@ static const struct gen_device_info gen_device_info_i965 = {
.urb = {
   .size = 256,
},
+   .timebase_scale = 80,
 };
 
 static const struct gen_device_info gen_device_info_g4x = {
@@ -50,6 +51,7 @@ static const struct gen_device_info gen_device_info_g4x = {
.urb = {
   .size = 384,
},
+   .timebase_scale = 80,
 };
 
 static const struct gen_device_info gen_device_info_ilk = {
@@ -64,6 +66,7 @@ static const struct gen_device_info gen_device_info_ilk = {
.urb = {
   .size = 1024,
},
+   .timebase_scale = 80,
 };
 
 static const struct gen_device_info gen_device_info_snb_gt1 = {
@@ -84,6 +87,7 @@ static const struct gen_device_info gen_device_info_snb_gt1 = 
{
   .max_vs_entries = 256,
   .max_gs_entries = 256,
},
+   .timebase_scale = 80,
 };
 
 static const struct gen_device_info gen_device_info_snb_gt2 = {
@@ -104,6 +108,7 @@ static const struct gen_device_info gen_device_info_snb_gt2 
= {
   .max_vs_entries = 256,
   .max_gs_entries = 256,
},
+   .timebase_scale = 80,
 };
 
 #define GEN7_FEATURES   \
@@ -112,7 +117,8 @@ static const struct gen_device_info gen_device_info_snb_gt2 
= {
.must_use_separate_stencil = true,   \
.has_llc = true, \
.has_pln = true, \
-   .has_surface_tile_offset = true
+   .has_surface_tile_offset = true, \
+   .timebase_scale = 80
 
 static const struct gen_device_info gen_device_info_ivb_gt1 = {
GEN7_FEATURES, .is_ivybridge = true, .gt = 1,
@@ -254,7 +260,8 @@ static const struct gen_device_info gen_device_info_hsw_gt3 
= {
.max_tcs_threads = 504,  \
.max_tes_threads = 504,  \
.max_gs_threads = 504,   \
-   .max_wm_threads = 384
+   .max_wm_threads = 384,   \
+   .timebase_scale = 80
 
 static const struct gen_device_info gen_device_info_bdw_gt1 = {
GEN8_FEATURES, .gt = 1,
@@ -351,16 +358,19 @@ static const struct gen_device_info 
gen_device_info_skl_gt1 = {
GEN9_FEATURES, .gt = 1,
.num_slices = 1,
.urb.size = 192,
+   .timebase_scale = 10.0 / 1200.0,
 };
 
 static const struct gen_device_info gen_device_info_skl_gt2 = {
GEN9_FEATURES, .gt = 2,
.num_slices = 1,
+   .timebase_scale = 10.0 / 1200.0,
 };
 
 static const struct gen_device_info gen_device_info_skl_gt3 = {
GEN9_FEATURES, .gt = 3,
.num_slices = 2,
+   .timebase_scale = 10.0 / 1200.0,
 };
 
 static const struct gen_device_info gen_device_info_skl_gt4 = {
@@ -375,6 +385,7 @@ static const struct gen_device_info gen_device_info_skl_gt4 
= {
 * only 1008KB of this will be used."
 */
.urb.size = 1008 / 3,
+   .timebase_scale = 10.0 / 1200.0,
 };
 
 static const struct gen_device_info gen_device_info_bxt = {
@@ -397,7 +408,8 @@ static const struct gen_device_info gen_device_info_bxt = {
   .max_tcs

[Mesa-dev] [PATCH] i965: Allow a per gen timebase scale factor

2016-10-27 Thread Robert Bragg
Prior to Skylake the Gen HW timestamps were driven by a 12.5MHz clock
with the convenient property of being able to scale by an integer (80)
to nanosecond units.

For Skylake the frequency is 12MHz or a scale factor of 83.33

This updates gen_device_info to track a floating point timebase_scale
factor and makes corresponding _queryobj.c changes to no longer assume a
scale factor of 80 works across all gens.

Although the gen6_ code could have been been left alone, the changes
keep the code more comparable, and it now shares a few utility functions
for scaling raw timestamps and calculating deltas. The utility for
calculating deltas takes into account 32 or 36bit overflow depending on
the current kernel version.

Note: this leaves the timestamp handling of ARB_query_buffer_object
untouched, which continues to use an incorrect scale of 80 on Skylake
for now. This is more awkward to solve since the scaling is currently
done using a very limited uint64 ALU available to the command parser
that doesn't support multiply or divide where it's already taking a
large number of instructions just to effectively multiple by 80.

This fixes piglit arb_timer_query-timestamp-get on Skylake

Signed-off-by: Robert Bragg <rob...@sixbynine.org>
---
 src/intel/common/gen_device_info.c| 21 +---
 src/intel/common/gen_device_info.h| 24 ++
 src/mesa/drivers/dri/i965/brw_context.c   | 15 +
 src/mesa/drivers/dri/i965/brw_context.h   |  3 ++
 src/mesa/drivers/dri/i965/brw_queryobj.c  | 53 ---
 src/mesa/drivers/dri/i965/gen6_queryobj.c | 28 +---
 6 files changed, 115 insertions(+), 29 deletions(-)

diff --git a/src/intel/common/gen_device_info.c 
b/src/intel/common/gen_device_info.c
index 30df0b2..20594b0 100644
--- a/src/intel/common/gen_device_info.c
+++ b/src/intel/common/gen_device_info.c
@@ -35,6 +35,7 @@ static const struct gen_device_info gen_device_info_i965 = {
.urb = {
   .size = 256,
},
+   .timebase_scale = 80,
 };
 
 static const struct gen_device_info gen_device_info_g4x = {
@@ -50,6 +51,7 @@ static const struct gen_device_info gen_device_info_g4x = {
.urb = {
   .size = 384,
},
+   .timebase_scale = 80,
 };
 
 static const struct gen_device_info gen_device_info_ilk = {
@@ -64,6 +66,7 @@ static const struct gen_device_info gen_device_info_ilk = {
.urb = {
   .size = 1024,
},
+   .timebase_scale = 80,
 };
 
 static const struct gen_device_info gen_device_info_snb_gt1 = {
@@ -84,6 +87,7 @@ static const struct gen_device_info gen_device_info_snb_gt1 = 
{
   .max_vs_entries = 256,
   .max_gs_entries = 256,
},
+   .timebase_scale = 80,
 };
 
 static const struct gen_device_info gen_device_info_snb_gt2 = {
@@ -104,6 +108,7 @@ static const struct gen_device_info gen_device_info_snb_gt2 
= {
   .max_vs_entries = 256,
   .max_gs_entries = 256,
},
+   .timebase_scale = 80,
 };
 
 #define GEN7_FEATURES   \
@@ -112,7 +117,8 @@ static const struct gen_device_info gen_device_info_snb_gt2 
= {
.must_use_separate_stencil = true,   \
.has_llc = true, \
.has_pln = true, \
-   .has_surface_tile_offset = true
+   .has_surface_tile_offset = true, \
+   .timebase_scale = 80
 
 static const struct gen_device_info gen_device_info_ivb_gt1 = {
GEN7_FEATURES, .is_ivybridge = true, .gt = 1,
@@ -254,7 +260,8 @@ static const struct gen_device_info gen_device_info_hsw_gt3 
= {
.max_tcs_threads = 504,  \
.max_tes_threads = 504,  \
.max_gs_threads = 504,   \
-   .max_wm_threads = 384
+   .max_wm_threads = 384,   \
+   .timebase_scale = 80
 
 static const struct gen_device_info gen_device_info_bdw_gt1 = {
GEN8_FEATURES, .gt = 1,
@@ -351,16 +358,19 @@ static const struct gen_device_info 
gen_device_info_skl_gt1 = {
GEN9_FEATURES, .gt = 1,
.num_slices = 1,
.urb.size = 192,
+   .timebase_scale = 10.0 / 1200.0,
 };
 
 static const struct gen_device_info gen_device_info_skl_gt2 = {
GEN9_FEATURES, .gt = 2,
.num_slices = 1,
+   .timebase_scale = 10.0 / 1200.0,
 };
 
 static const struct gen_device_info gen_device_info_skl_gt3 = {
GEN9_FEATURES, .gt = 3,
.num_slices = 2,
+   .timebase_scale = 10.0 / 1200.0,
 };
 
 static const struct gen_device_info gen_device_info_skl_gt4 = {
@@ -375,6 +385,7 @@ static const struct gen_device_info gen_device_info_skl_gt4 
= {
 * only 1008KB of this will be used."
 */
.urb.size = 1008 / 3,
+   .timebase_scale = 10.0 / 1200.0,
 };
 
 static const struct gen_device_info gen_device_info_bxt = {
@@ -397,7 +408,8 @@ static const struct gen_device_info gen_device_info_bxt = {
   .max_tcs_entries = 256,
   .max_tes_entries = 416,
   .max

[Mesa-dev] [RFC v2] Model INTEL perf query backend after query object BE

2015-05-15 Thread Robert Bragg
Instead of using the same backend interface as AMD_performance_monitor
this defines a dedicated INTEL_performance_query interface that is based
on the ARB_query_buffer_object interface (considering the similarity of
the extensions) with the addition of vfuncs for enumerating queries and
their counters.

Compared to the previous backend, some notable differences are:

- The backend is free to represent counters using whatever data
  structures are optimal/convenient since queries and counters are
  enumerated via an iterator api instead of declaring them using
  structures directly shared with the frontend.

  This is also done to help us support the full range of data and
  semantic types available with INTEL_performance_query which is awkward
  while using a structure shared with the AMD_performance_monitor
  backend since neither extension's types are a subset of the other.

- The backend must support waiting for a query instead of the frontend
  simply using glFinish().

- Objects go through 'Active' and 'Ready' states consistent with the
  query object backend (hopefully making them more familiar). There is
  no 'Ended' state (which used to show that a query has ended at least
  once for a given object). There is a new 'Used' state similar to the
  'EverBound' state of query objects, set when a query is first begun
  which implies that we are expecting to get results back for the object
  at some point.

The INTEL_performance_query and AMD_performance_monitor extensions are
now completely orthogonal within Mesa (though a driver could optionally
choose to implement both extensions within a unified backend if that
were convenient for the sake of sharing state/code).

v2: (Samuel Pitoiset)
- init PerfQuery.NumQueries in frontend
- s/return_string/output_clipped_string/
- s/backed/backend/ typo
- remove redundant *bytesWritten = 0

Signed-off-by: Robert Bragg rob...@sixbynine.org
---
 src/mesa/main/dd.h|  39 +++
 src/mesa/main/mtypes.h|  25 +-
 src/mesa/main/performance_query.c | 591 ++
 src/mesa/main/performance_query.h |   6 +-
 4 files changed, 275 insertions(+), 386 deletions(-)

diff --git a/src/mesa/main/dd.h b/src/mesa/main/dd.h
index 0c1a13f..4ba1524 100644
--- a/src/mesa/main/dd.h
+++ b/src/mesa/main/dd.h
@@ -759,6 +759,45 @@ struct dd_function_table {
 GLint *bytesWritten);
/*@}*/
 
+   /**
+* \name Performance Query objects
+*/
+   /*@{*/
+   void (*GetPerfQueryInfo)(struct gl_context *ctx,
+int queryIndex,
+const char **name,
+GLuint *dataSize,
+GLuint *numCounters,
+GLuint *numActive);
+   void (*GetPerfCounterInfo)(struct gl_context *ctx,
+  int queryIndex,
+  int counterIndex,
+  const char **name,
+  const char **desc,
+  GLuint *offset,
+  GLuint *data_size,
+  GLuint *type_enum,
+  GLuint *data_type_enum,
+  GLuint64 *raw_max);
+   struct gl_perf_query_object * (*NewPerfQueryObject)(struct gl_context *ctx,
+   int queryIndex);
+   void (*DeletePerfQuery)(struct gl_context *ctx,
+   struct gl_perf_query_object *obj);
+   GLboolean (*BeginPerfQuery)(struct gl_context *ctx,
+   struct gl_perf_query_object *obj);
+   void (*EndPerfQuery)(struct gl_context *ctx,
+struct gl_perf_query_object *obj);
+   void (*WaitPerfQuery)(struct gl_context *ctx,
+ struct gl_perf_query_object *obj);
+   GLboolean (*IsPerfQueryReady)(struct gl_context *ctx,
+ struct gl_perf_query_object *obj);
+   void (*GetPerfQueryData)(struct gl_context *ctx,
+struct gl_perf_query_object *obj,
+GLsizei dataSize,
+GLuint *data,
+GLuint *bytesWritten);
+   /*@}*/
+
 
/**
 * \name Vertex Array objects
diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h
index b1e5fa9..a26109d 100644
--- a/src/mesa/main/mtypes.h
+++ b/src/mesa/main/mtypes.h
@@ -2014,6 +2014,23 @@ struct gl_perf_monitor_group
 
 
 /**
+ * A query object instance as described in INTEL_performance_query.
+ *
+ * NB: We want to keep this and the corresponding backend structure
+ * relatively lean considering that applications may expect to
+ * allocate enough objects to be able to query around all draw calls
+ * in a frame.
+ */
+struct gl_perf_query_object
+{
+   GLuint Id;/** hash table ID/name */
+   GLuint Used:1;/** has been used for 1 or more queries

Re: [Mesa-dev] [RFC 4/6] i965: Implement INTEL_performance_query extension

2015-05-15 Thread Robert Bragg
On Mon, May 11, 2015 at 4:28 PM, Samuel Pitoiset
samuel.pitoi...@gmail.com wrote:


 On 05/06/2015 02:53 AM, Robert Bragg wrote:

 This adds a bare-bones backend for the INTEL_performance_query extension
 that exposes the pipeline statistics on gen 6 and 7 hardware.

 Although this could be considered redundant given that the same
 statistics are now available via query objects, they are a simple
 starting point for this extension and it's expected to be convenient for
 tools wanting to have a single go to api to introspect what performance
 counters are available, along with names, descriptions and semantic/data
 types.

 This code is derived from Kenneth Graunke's work, temporarily removed
 while the frontend and backend interface were reworked.

 Signed-off-by: Robert Bragg rob...@sixbynine.org
 ---
   src/mesa/drivers/dri/i965/Makefile.sources|   1 +
   src/mesa/drivers/dri/i965/brw_context.c   |   3 +
   src/mesa/drivers/dri/i965/brw_context.h   |  26 +
   src/mesa/drivers/dri/i965/brw_performance_query.c | 611
 ++
   src/mesa/drivers/dri/i965/intel_extensions.c  |   3 +
   5 files changed, 644 insertions(+)
   create mode 100644 src/mesa/drivers/dri/i965/brw_performance_query.c

 diff --git a/src/mesa/drivers/dri/i965/Makefile.sources
 b/src/mesa/drivers/dri/i965/Makefile.sources
 index 210314b..066364a 100644
 --- a/src/mesa/drivers/dri/i965/Makefile.sources
 +++ b/src/mesa/drivers/dri/i965/Makefile.sources
 @@ -81,6 +81,7 @@ i965_FILES = \
 brw_nir_analyze_boolean_resolves.c \
 brw_object_purgeable.c \
 brw_packed_float.c \
 +   brw_performance_query.c \
 brw_primitive_restart.c \
 brw_program.c \
 brw_program.h \
 diff --git a/src/mesa/drivers/dri/i965/brw_context.c
 b/src/mesa/drivers/dri/i965/brw_context.c
 index 80a4b0a..1350bc1 100644
 --- a/src/mesa/drivers/dri/i965/brw_context.c
 +++ b/src/mesa/drivers/dri/i965/brw_context.c
 @@ -884,6 +884,9 @@ brwCreateContext(gl_api api,
  _mesa_initialize_dispatch_tables(ctx);
  _mesa_initialize_vbo_vtxfmt(ctx);
   +   if (ctx-Extensions.INTEL_performance_query)
 +  brw_init_performance_queries(brw);
 +
  vbo_use_buffer_objects(ctx);
  vbo_always_unmap_buffers(ctx);
   diff --git a/src/mesa/drivers/dri/i965/brw_context.h
 b/src/mesa/drivers/dri/i965/brw_context.h
 index db65191..2cd963d 100644
 --- a/src/mesa/drivers/dri/i965/brw_context.h
 +++ b/src/mesa/drivers/dri/i965/brw_context.h
 @@ -953,6 +953,21 @@ struct brw_stage_state
  uint32_t sampler_offset;
   };
   +enum brw_query_kind {
 +   PIPELINE_STATS
 +};
 +
 +struct brw_perf_query
 +{
 +   enum brw_query_kind kind;
 +   const char *name;
 +   struct brw_perf_query_counter *counters;
 +   int n_counters;
 +   size_t data_size;
 +};
 +
 +#define MAX_PERF_QUERIES 3
 +#define MAX_PERF_QUERY_COUNTERS 150
 /**
* brw_context is derived from gl_context.
 @@ -1380,6 +1395,13 @@ struct brw_context
 bool begin_emitted;
  } query;
   +   struct {
 +  struct brw_perf_query queries[MAX_PERF_QUERIES];


 Why the number of active queries is limited to 3? Is that a hardware
 limitation?

No, this isn't a restriction on the number of active queries, rather
it's just that the backend doesn't support more than 3 query types
currently:

1) pipeline statistics
2) simple aggregate counters query
3) 3D counters query

We would increase this e.g. if we were to add a GPGPU counters query

It could be good though to double check that we don't forget to extend
this if we add new query types, e.g. by asserting that
brw-perfquery.n_queries  MAX_PERF_QUERIES within the add_XYZ_query()
functions.



 +  int n_queries;
 +
 +  int n_active_pipeline_stats_queries;
 +   } perfquery;
 +
  int num_atoms[BRW_NUM_PIPELINES];
  const struct brw_tracked_state render_atoms[57];
  const struct brw_tracked_state compute_atoms[1];
 @@ -1656,6 +1678,10 @@ bool brw_render_target_supported(struct brw_context
 *brw,
struct gl_renderbuffer *rb);
   uint32_t brw_depth_format(struct brw_context *brw, mesa_format format);
   +/* brw_performance_query.c */
 +void brw_init_performance_queries(struct brw_context *brw);
 +void brw_dump_perf_queries(struct brw_context *brw);
 +
   /* intel_buffer_objects.c */
   int brw_bo_map(struct brw_context *brw, drm_intel_bo *bo, int
 write_enable,
  const char *bo_name);
 diff --git a/src/mesa/drivers/dri/i965/brw_performance_query.c
 b/src/mesa/drivers/dri/i965/brw_performance_query.c
 new file mode 100644
 index 000..38447e8
 --- /dev/null
 +++ b/src/mesa/drivers/dri/i965/brw_performance_query.c
 @@ -0,0 +1,611 @@
 +/*
 + * Copyright © 2013 Intel Corporation
 + *
 + * Permission is hereby granted, free of charge, to any person obtaining
 a
 + * copy of this software and associated documentation files (the
 Software),
 + * to deal in the Software without restriction, including without

Re: [Mesa-dev] [RFC 0/6] i965: INTEL_performance_query re-work

2015-05-12 Thread Robert Bragg
On Wed, May 6, 2015 at 9:36 AM, Samuel Pitoiset
samuel.pitoi...@gmail.com wrote:


 On 05/06/2015 02:53 AM, Robert Bragg wrote:

 As we've learned more about the observability capabilities of Gen
 graphics we've found that it's not enough to only try and configure the
 OA unit from userspace without any dedicated support from the kernel.


 Hi Robert,

 Yeah, this is the same idea for performance counters on Nouveau.

 We also need to implement a dedicated support from the kernel for
 configuring/sampling hardware performance counters. Then, we can
 expose a list of available counters through a set of ioctls. Thus, mesa
 configures a hardware event by sending its configuration to the kernel.

Ok makes sense. Just for reference, if you look at my current i915_oa
kernel driver you'll see that our configurations are statically
declared in the driver itself (well there's currently only one 3D
configuration to start with) but certainly a topic that keeps coming
up is the idea of also exposing an interface for userspace to send the
kernel new counter configurations. There are a number of extra details
to consider if we want to enable that so I decided to defer that for
now.



 As it is currently the i965 backends for both AMD_performance_monitor
 and INTEL_performance_query aren't able to report normalized metrics
 useful to application developers due to the limitations of configuring
 the OA unit from userspace via LRIs.

 More recently we've developed a perf PMU (performance monitoring unit)
 driver within the drm i915 driver (i915_oa) that lets userspace
 configure and open an event fd via the perf_event_open syscall which
 provides us a more complete interface for configuring the Gen graphics
 OA unit.

 With help from the kernel we can support periodic sampling (where the
 hardware writes reports into a gpu mapped circular buffer that we can
 forward as perf samples), we can deal with the clock gating + PM
 limitations imposed by the observability hw and also manage + maintain
 the selection of performance counters.

 The perf_event_open(2) man page is a good starting point for anyone
 wanting to learn about the Linux perf interface. Something to beware of
 is that there's currently no precedent upstream for exposing device
 metrics via a perf PMU and although early feedback was sought for this
 work, some of this may be subject to change based on feedback from the
 core perf maintainers as well as the i915 drm driver maintainers.


 Performance counters on Nouveau won't be exposed (in the near future)
 by perf since they need to be tied to the command stream of the GPU,
 and perf only works with ioctl calls.

Yeah the reports we currently collect from perf are periodically
sampled by the hardware, not tied to the command stream.

For metrics synchronized with the CS we have an MI_REPORT_PERF_COUNT
command which generates a report following the same configuration and
report format as for periodic sampling, so even when we're only
interested in these MI_RPC based reports it's convenient to have a
common kernel interface for configuring the OA unit.

A few related things here:

Even when we're primarily interested in MI_RPC based reports around
work of interest we will combine that with periodic reports too due to
the possibility of our 32bit counters (on Haswell at least)
overflowing relatively quickly for some hw generations. Since we know
theoretically, the fasted that any counter can overflow for a
particular generation we can collect periodic samples, just below that
frequency.

We're experimenting with a model where the kernel is automatically
inserting MI_RPC commands around batches and forwarding those reports
via perf, in-line with periodic reports. This lets us track work
per-context with more flexibility than the HW single-context filtering
which is useful for workloads that are spread between multiple
contexts of interest. (I see lots of interest in this from teams
looking to profile media workloads). To also help track different
stages of work within a single context, in this case userspace has a
way of tagging work it's submitting and those tags can be matched via
the perf samples too.

For tooling it's very helpful to be able to collect periodic and CS
synchronized reports, potentially across multiple contexts, via one
event fd, as opposed to collecting per-context results in userspace
possibly across several drivers (e.g media + GL + CL) and then needing
to devise a scheme for forwarding all that data to tools in userspace.



 This PRM is a good starting point for anyone wanting to learn about the
 Gen graphics Observability hardware. Some important information is
 currently missing and this should be updated soon, but that's more
 directly related to the i915_oa perf driver. Notably though the report
 formats described here need to be understood by Mesa, since the perf
 samples simply forward the raw reports from the OA hardware.

 https://01.org/sites/default/files/documentation

Re: [Mesa-dev] [RFC 0/6] i965: INTEL_performance_query re-work

2015-05-12 Thread Robert Bragg
Petri Latval wrote some piglit test for the core
INTEL_performance_query behaviour (i.e not assuming anything about the
semantics of the counters themselves) that he sent to the list some
time last year, but unfortunately they fell through the cracks and
never landed.

The other week I revived those to check my code before sending it out,
I just need to fix up the commit messages to use Petri's comments +
keep his authorship and I can re send those to the list.

Regards,
- Robert

On Mon, May 11, 2015 at 4:09 PM, Samuel Pitoiset
samuel.pitoi...@gmail.com wrote:
 Did you write piglit tests according to what the spec says, btw ?

 On 05/06/2015 02:53 AM, Robert Bragg wrote:

 As we've learned more about the observability capabilities of Gen
 graphics we've found that it's not enough to only try and configure the
 OA unit from userspace without any dedicated support from the kernel.

 As it is currently the i965 backends for both AMD_performance_monitor
 and INTEL_performance_query aren't able to report normalized metrics
 useful to application developers due to the limitations of configuring
 the OA unit from userspace via LRIs.

 More recently we've developed a perf PMU (performance monitoring unit)
 driver within the drm i915 driver (i915_oa) that lets userspace
 configure and open an event fd via the perf_event_open syscall which
 provides us a more complete interface for configuring the Gen graphics
 OA unit.

 With help from the kernel we can support periodic sampling (where the
 hardware writes reports into a gpu mapped circular buffer that we can
 forward as perf samples), we can deal with the clock gating + PM
 limitations imposed by the observability hw and also manage + maintain
 the selection of performance counters.

 The perf_event_open(2) man page is a good starting point for anyone
 wanting to learn about the Linux perf interface. Something to beware of
 is that there's currently no precedent upstream for exposing device
 metrics via a perf PMU and although early feedback was sought for this
 work, some of this may be subject to change based on feedback from the
 core perf maintainers as well as the i915 drm driver maintainers.

 This PRM is a good starting point for anyone wanting to learn about the
 Gen graphics Observability hardware. Some important information is
 currently missing and this should be updated soon, but that's more
 directly related to the i915_oa perf driver. Notably though the report
 formats described here need to be understood by Mesa, since the perf
 samples simply forward the raw reports from the OA hardware.

 https://01.org/sites/default/files/documentation/
 observability_performance_counters_haswell.pdf

 This series re-works the i965 driver's support for exposing performance
 counters, taking advantage of this i915_oa perf event interface.

 A corresponding kernel branch with an initial i915_oa driver for Haswell
 can be found here:

 https://github.com/rib/linux  wip/rib/oa-hsw-4.0.0

 A corresponding libdrm branch can be found here:

 https://github.com/rib/drm  wip/rib/oa-hsw-4.0.0

 In case it's helpful to see another example using the i915_oa perf
 interface I've also been developing a 'gputop' tool that both lets me
 test the INTEL_performance_query interface to collect per-context
 metrics from Mesa and can also visualize system wide metrics (i.e.
 across all gpu contexts) using perf directly:

 https://github.com/rib/gputop

 Although I haven't updated the branches in a while, I could share some
 initial code adding support for Broadwell if anyone's interested to get
 a sense of what's involved in supporting later hardware generations.

 I still anticipate some (hopefully relatively minor) tweaking of
 implementation details based on review feedback for the i915_oa driver,
 but I hope that this is a good point to ask for some feedback on the
 Mesa changes.

 If it's more convenient, these patches can also be fetched from here:

 https://github.com/rib/mesa  wip/rib/oa-hsw-4.0.0

 Regards,
 - Robert

 Robert Bragg (6):
i965: Remove perf monitor/query backend
Separate INTEL_performance_query frontend
Model INTEL perf query backend after query object BE
i965: Implement INTEL_performance_query extension
i965: Expose OA counters via INTEL_performance_query
i965: Adds further support for 3D OA counters

   src/mapi/glapi/gen/gl_genexec.py   |1 +
   src/mesa/Makefile.sources  |2 +
   src/mesa/drivers/dri/i965/Makefile.sources |2 +-
   src/mesa/drivers/dri/i965/brw_context.c|5 +-
   src/mesa/drivers/dri/i965/brw_context.h|  101 +-
   .../drivers/dri/i965/brw_performance_monitor.c | 1472 
   src/mesa/drivers/dri/i965/brw_performance_query.c  | 2356
 
   src/mesa/drivers/dri/i965/intel_batchbuffer.c  |   10 +-
   src/mesa/drivers/dri/i965/intel_extensions.c   |   69 +-
   src/mesa/main/context.c

Re: [Mesa-dev] [RFC 3/6] Model INTEL perf query backend after query object BE

2015-05-12 Thread Robert Bragg
On Mon, May 11, 2015 at 4:11 PM, Samuel Pitoiset
samuel.pitoi...@gmail.com wrote:
 Patches 1 and 2 look fine to me.

 See my comments below for this one.


 On 05/06/2015 02:53 AM, Robert Bragg wrote:


 --- a/src/mesa/main/performance_query.c
 +++ b/src/mesa/main/performance_query.c
 @@ -1,5 +1,5 @@
   /*
 - * Copyright © 2012 Intel Corporation
 + * Copyright © 2012-2015 Intel Corporation
*
* Permission is hereby granted, free of charge, to any person obtaining
 a
* copy of this software and associated documentation files (the
 Software),
 @@ -24,14 +24,6 @@
   /**
* \file performance_query.c
* Core Mesa support for the INTEL_performance_query extension.
 - *
 - * In order to implement this extension, start by defining two enums:
 - * one for Groups, and one for Counters.  These will be used as indexes
 into
 - * arrays, so they should start at 0 and increment from there.
 - *
 - * Counter IDs need to be globally unique.  That is, you can't have
 counter 7
 - * in group A and counter 7 in group B.  A global enum of all available
 - * counters is a convenient way to guarantee this.
*/
 #include stdbool.h
 @@ -42,66 +34,21 @@
   #include macros.h
   #include mtypes.h
   #include performance_query.h
 -#include util/bitset.h
   #include util/ralloc.h
 void
   _mesa_init_performance_queries(struct gl_context *ctx)
   {
  ctx-PerfQuery.Objects = _mesa_NewHashTable();
 -   ctx-PerfQuery.NumGroups = 0;


 Why don't you initialize NumQueries here?

Hmm, yeah can't think of a good reason. It's initialized in the
backend, but it seems bad to not be initializing it here.


 @@ -112,34 +59,13 @@ _mesa_free_performance_queries(struct gl_context
 *ctx)
  _mesa_DeleteHashTable(ctx-PerfQuery.Objects);
   }
   -static inline struct gl_perf_monitor_object *
 -lookup_query(struct gl_context *ctx, GLuint id)
 -{
 -   return (struct gl_perf_monitor_object *)
 -  _mesa_HashLookup(ctx-PerfQuery.Objects, id);
 -}
 -
 -static inline const struct gl_perf_monitor_group *
 -get_group(const struct gl_context *ctx, GLuint id)
 +static inline struct gl_perf_query_object *
 +lookup_object(struct gl_context *ctx, GLuint id)
   {
 -   if (id = ctx-PerfQuery.NumGroups)
 -  return NULL;
 -
 -   return ctx-PerfQuery.Groups[id];
 -}
 -
 -static inline const struct gl_perf_monitor_counter *
 -get_counter(const struct gl_perf_monitor_group *group_obj, GLuint id)
 -{
 -   if (id = group_obj-NumCounters)
 -  return NULL;
 -
 -   return group_obj-Counters[id];
 +   return _mesa_HashLookup(ctx-PerfQuery.Objects, id);


 I guess _mesa_HashLookup() returns NULL when it cannot find the element,
 right?

I think that's right. The function description says \return pointer
to user's data or NULL if key not in table,
_mesa_HashLookup_unlocked() seems to imply as much.

 +static void
 +return_string(char *stringRet, GLuint stringMaxLen, const char *string)


 In my opinion, the name of this function is a bit ambiguous. :-)

Right, I'm not very happy with its naming either. Does
output_string_safe() seem clearer, or output_clipped_string() perhaps?


 @@ -274,15 +192,22 @@ _mesa_GetPerfQueryIdByNameINTEL(char *queryName,
 GLuint *queryId)
 return;
  }
   -   /* The specification does not state that this produces an error. */
 +   /* The specification does not state that this produces an error but
 +* to be consistent with glGetFirstPerfQueryIdINTEL we generate an
 +* INVALID_VALUE error */
  if (!queryId) {
 -  _mesa_warning(ctx, glGetPerfQueryIdByNameINTEL(queryId == NULL));
 +  _mesa_error(ctx, GL_INVALID_VALUE,
 +  glGetPerfQueryIdByNameINTEL(queryId == NULL));
 return;
  }
   -   for (i = 0; i  ctx-PerfQuery.NumGroups; ++i) {
 -  const struct gl_perf_monitor_group *group_obj = get_group(ctx, i);
 -  if (strcmp(group_obj-Name, queryName) == 0) {
 +   for (i = 0; i  ctx-PerfQuery.NumQueries; ++i) {
 +  const GLchar *name;
 +  GLuint ignore;
 +
 +  ctx-Driver.GetPerfQueryInfo(ctx, i, name, ignore, ignore,
 ignore);


 Why these parameters are currently ignored?

Driver.GetPerfQueryInfo() is used in two places within the frontend,
and it's mainly geared as the backend for glGetPerfQueryInfo to get a
query's name, report-size, the number of counters and number of active
queries of that type.

In this case though, we're only interested in the name of a query and
so we are going to ignore the extra info that function returns. For
the sake of avoiding the need for the backed to check for NULL
pointers the frontend gives the backend a dummy destination for the
info being ignored.


 @@ -294,18 +219,21 @@ _mesa_GetPerfQueryIdByNameINTEL(char *queryName,
 GLuint *queryId)
 extern void GLAPIENTRY
   _mesa_GetPerfQueryInfoINTEL(GLuint queryId,
 -GLuint queryNameLength, char *queryName,
 -GLuint *dataSize, GLuint *noCounters,
 -GLuint

[Mesa-dev] [RFC 4/6] i965: Implement INTEL_performance_query extension

2015-05-05 Thread Robert Bragg
This adds a bare-bones backend for the INTEL_performance_query extension
that exposes the pipeline statistics on gen 6 and 7 hardware.

Although this could be considered redundant given that the same
statistics are now available via query objects, they are a simple
starting point for this extension and it's expected to be convenient for
tools wanting to have a single go to api to introspect what performance
counters are available, along with names, descriptions and semantic/data
types.

This code is derived from Kenneth Graunke's work, temporarily removed
while the frontend and backend interface were reworked.

Signed-off-by: Robert Bragg rob...@sixbynine.org
---
 src/mesa/drivers/dri/i965/Makefile.sources|   1 +
 src/mesa/drivers/dri/i965/brw_context.c   |   3 +
 src/mesa/drivers/dri/i965/brw_context.h   |  26 +
 src/mesa/drivers/dri/i965/brw_performance_query.c | 611 ++
 src/mesa/drivers/dri/i965/intel_extensions.c  |   3 +
 5 files changed, 644 insertions(+)
 create mode 100644 src/mesa/drivers/dri/i965/brw_performance_query.c

diff --git a/src/mesa/drivers/dri/i965/Makefile.sources 
b/src/mesa/drivers/dri/i965/Makefile.sources
index 210314b..066364a 100644
--- a/src/mesa/drivers/dri/i965/Makefile.sources
+++ b/src/mesa/drivers/dri/i965/Makefile.sources
@@ -81,6 +81,7 @@ i965_FILES = \
brw_nir_analyze_boolean_resolves.c \
brw_object_purgeable.c \
brw_packed_float.c \
+   brw_performance_query.c \
brw_primitive_restart.c \
brw_program.c \
brw_program.h \
diff --git a/src/mesa/drivers/dri/i965/brw_context.c 
b/src/mesa/drivers/dri/i965/brw_context.c
index 80a4b0a..1350bc1 100644
--- a/src/mesa/drivers/dri/i965/brw_context.c
+++ b/src/mesa/drivers/dri/i965/brw_context.c
@@ -884,6 +884,9 @@ brwCreateContext(gl_api api,
_mesa_initialize_dispatch_tables(ctx);
_mesa_initialize_vbo_vtxfmt(ctx);
 
+   if (ctx-Extensions.INTEL_performance_query)
+  brw_init_performance_queries(brw);
+
vbo_use_buffer_objects(ctx);
vbo_always_unmap_buffers(ctx);
 
diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
b/src/mesa/drivers/dri/i965/brw_context.h
index db65191..2cd963d 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -953,6 +953,21 @@ struct brw_stage_state
uint32_t sampler_offset;
 };
 
+enum brw_query_kind {
+   PIPELINE_STATS
+};
+
+struct brw_perf_query
+{
+   enum brw_query_kind kind;
+   const char *name;
+   struct brw_perf_query_counter *counters;
+   int n_counters;
+   size_t data_size;
+};
+
+#define MAX_PERF_QUERIES 3
+#define MAX_PERF_QUERY_COUNTERS 150
 
 /**
  * brw_context is derived from gl_context.
@@ -1380,6 +1395,13 @@ struct brw_context
   bool begin_emitted;
} query;
 
+   struct {
+  struct brw_perf_query queries[MAX_PERF_QUERIES];
+  int n_queries;
+
+  int n_active_pipeline_stats_queries;
+   } perfquery;
+
int num_atoms[BRW_NUM_PIPELINES];
const struct brw_tracked_state render_atoms[57];
const struct brw_tracked_state compute_atoms[1];
@@ -1656,6 +1678,10 @@ bool brw_render_target_supported(struct brw_context *brw,
  struct gl_renderbuffer *rb);
 uint32_t brw_depth_format(struct brw_context *brw, mesa_format format);
 
+/* brw_performance_query.c */
+void brw_init_performance_queries(struct brw_context *brw);
+void brw_dump_perf_queries(struct brw_context *brw);
+
 /* intel_buffer_objects.c */
 int brw_bo_map(struct brw_context *brw, drm_intel_bo *bo, int write_enable,
const char *bo_name);
diff --git a/src/mesa/drivers/dri/i965/brw_performance_query.c 
b/src/mesa/drivers/dri/i965/brw_performance_query.c
new file mode 100644
index 000..38447e8
--- /dev/null
+++ b/src/mesa/drivers/dri/i965/brw_performance_query.c
@@ -0,0 +1,611 @@
+/*
+ * Copyright © 2013 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the Software),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE

[Mesa-dev] [RFC 5/6] i965: Expose OA counters via INTEL_performance_query

2015-05-05 Thread Robert Bragg
This adds support for exposing basic ('aggregate') Observation
Architecture performance counters on Haswell.

This support is based on the i915_oa perf kernel interface which is used
to configure the OA unit, allowing Mesa to emit MI_REPORT_PERF_COUNT
commands around queries to collect counter snapshots.

To take into account the small chance that some of the 32bit counters
could wrap around for long queries (~50 milliseconds for a GT3 Haswell @
1.1GHz) the implementation also collects periodic metrics via perf.

It seems worth noting that the structures chosen to describe the
counters were arrived at after a few iterations of creating other
limited intel_gpu_tools programs that similarly needed to describe,
normalise and display these counters and many of the utility functions
should be reusable as more sets of counters are exposed via the OA unit
as well as for future hardware generations.

It also seems worth noting that in the future we might want to aim to
factor out much of the code for describing and normalising OA counters
into a common library such as libdrm since we are already having
difficulties copying very similar code between multiple tools dealing
with these same counters; such as for looking at the OA counters for
system wide analysis. Until we've got a bit more experience dealing with
the counters though it seems best to defer this separation and avoid
having to design a suitable api for these different use cases.

Signed-off-by: Robert Bragg rob...@sixbynine.org
---
 src/mesa/drivers/dri/i965/brw_context.h   |   66 +
 src/mesa/drivers/dri/i965/brw_performance_query.c | 1383 -
 2 files changed, 1430 insertions(+), 19 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
b/src/mesa/drivers/dri/i965/brw_context.h
index 2cd963d..5f5a59b 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -954,6 +954,7 @@ struct brw_stage_state
 };
 
 enum brw_query_kind {
+   OA_COUNTERS,
PIPELINE_STATS
 };
 
@@ -964,10 +965,18 @@ struct brw_perf_query
struct brw_perf_query_counter *counters;
int n_counters;
size_t data_size;
+
+   /* OA specific */
+   int oa_metrics_set;
+   int oa_format;
+   struct brw_oa_counter *oa_counters;
+   int n_oa_counters;
 };
 
 #define MAX_PERF_QUERIES 3
 #define MAX_PERF_QUERY_COUNTERS 150
+#define MAX_OA_QUERY_COUNTERS 100
+#define MAX_RAW_OA_COUNTERS 62
 
 /**
  * brw_context is derived from gl_context.
@@ -1399,7 +1408,64 @@ struct brw_context
   struct brw_perf_query queries[MAX_PERF_QUERIES];
   int n_queries;
 
+  /* A common OA counter that we want to read directly in several places */
+  uint64_t (*read_oa_report_timestamp)(uint32_t *report);
+
+  /* Needed to normalize counters aggregated across all EUs */
+  int eu_count;
+
+  /* The i915_oa perf event we open to setup + enable the OA counters */
+  int perf_oa_event_fd;
+
+  /* An i915_oa perf event fd gives exclusive access to the OA unit that
+   * will report counter snapshots for a specific counter set/profile in a
+   * specific layout/format so we can only start OA queries that are
+   * compatible with the currently open fd... */
+  int perf_oa_metrics_set;
+  int perf_oa_format;
+
+  /* The mmaped circular buffer for collecting samples from perf */
+  uint8_t *perf_oa_mmap_base;
+  size_t perf_oa_buffer_size;
+  struct perf_event_mmap_page *perf_oa_mmap_page;
+
+  /* The system's page size */
+  unsigned int page_size;
+
+  int n_active_oa_queries;
   int n_active_pipeline_stats_queries;
+
+  /* The number of queries depending on running OA counters which
+   * extends beyond brw_end_perf_query() since we need to wait until
+   * the last MI_RPC command has been written.
+   *
+   * Accurate accounting is important here as emitting an
+   * MI_REPORT_PERF_COUNT command while the OA unit is disabled will
+   * effectively hang the gpu.
+   */
+  int n_oa_users;
+
+  /* To help catch an spurious problem with the hardware or perf
+   * forwarding samples, we emit each MI_REPORT_PERF_COUNT command
+   * with a unique ID that we can explicitly check for... */
+  int next_query_start_report_id;
+
+  /**
+   * An array of queries whose results haven't yet been assembled based on
+   * the data in buffer objects.
+   *
+   * These may be active, or have already ended.  However, the results
+   * have not been requested.
+   */
+  struct brw_perf_query_object **unresolved;
+  int unresolved_elements;
+  int unresolved_array_size;
+
+  /* The total number of query objects so we can relinquish
+   * our exclusive access to perf if the application deletes
+   * all of its objects. (NB: We only disable perf while
+   * there are no active queries) */
+  int n_query_instances;
} perfquery;
 
int num_atoms

[Mesa-dev] [RFC 2/6] Separate INTEL_performance_query frontend

2015-05-05 Thread Robert Bragg
To allow the backend interfaces for AMD_performance_monitor and
INTEL_performance_query to evolve independently based on the more
specific requirements of each extension this starts by separating
the frontends of these extensions.

Even though there wasn't much tying these frontends together, this
separation intentionally copies what few helpers/utilities
that were shared between the two extensions, avoiding any re-factoring
specific to INTEL_performance_query so that that the evolution
will be easier to follow later.

Signed-off-by: Robert Bragg rob...@sixbynine.org
---
 src/mapi/glapi/gen/gl_genexec.py|   1 +
 src/mesa/Makefile.sources   |   2 +
 src/mesa/main/context.c |   3 +
 src/mesa/main/mtypes.h  |  15 +
 src/mesa/main/performance_monitor.c | 579 ---
 src/mesa/main/performance_monitor.h |  39 --
 src/mesa/main/performance_query.c   | 766 
 src/mesa/main/performance_query.h   |  84 
 8 files changed, 871 insertions(+), 618 deletions(-)
 create mode 100644 src/mesa/main/performance_query.c
 create mode 100644 src/mesa/main/performance_query.h

diff --git a/src/mapi/glapi/gen/gl_genexec.py b/src/mapi/glapi/gen/gl_genexec.py
index 7151f0d..510af8e 100644
--- a/src/mapi/glapi/gen/gl_genexec.py
+++ b/src/mapi/glapi/gen/gl_genexec.py
@@ -87,6 +87,7 @@ header = /**
 #include main/multisample.h
 #include main/objectlabel.h
 #include main/performance_monitor.h
+#include main/performance_query.h
 #include main/pipelineobj.h
 #include main/pixel.h
 #include main/pixelstore.h
diff --git a/src/mesa/Makefile.sources b/src/mesa/Makefile.sources
index 1293d41..19605fc 100644
--- a/src/mesa/Makefile.sources
+++ b/src/mesa/Makefile.sources
@@ -140,6 +140,8 @@ MAIN_FILES = \
main/pbo.h \
main/performance_monitor.c \
main/performance_monitor.h \
+   main/performance_query.c \
+   main/performance_query.h \
main/pipelineobj.c \
main/pipelineobj.h \
main/pixel.c \
diff --git a/src/mesa/main/context.c b/src/mesa/main/context.c
index 4aaf8b1..84dc1c9 100644
--- a/src/mesa/main/context.c
+++ b/src/mesa/main/context.c
@@ -106,6 +106,7 @@
 #include matrix.h
 #include multisample.h
 #include performance_monitor.h
+#include performance_query.h
 #include pipelineobj.h
 #include pixel.h
 #include pixelstore.h
@@ -826,6 +827,7 @@ init_attrib_groups(struct gl_context *ctx)
_mesa_init_matrix( ctx );
_mesa_init_multisample( ctx );
_mesa_init_performance_monitors( ctx );
+   _mesa_init_performance_queries( ctx );
_mesa_init_pipeline( ctx );
_mesa_init_pixel( ctx );
_mesa_init_pixelstore( ctx );
@@ -1296,6 +1298,7 @@ _mesa_free_context_data( struct gl_context *ctx )
_mesa_free_varray_data(ctx);
_mesa_free_transform_feedback(ctx);
_mesa_free_performance_monitors(ctx);
+   _mesa_free_performance_queries(ctx);
 
_mesa_reference_buffer_object(ctx, ctx-Pack.BufferObj, NULL);
_mesa_reference_buffer_object(ctx, ctx-Unpack.BufferObj, NULL);
diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h
index fb41430..b1e5fa9 100644
--- a/src/mesa/main/mtypes.h
+++ b/src/mesa/main/mtypes.h
@@ -2028,6 +2028,20 @@ struct gl_perf_monitor_state
 
 
 /**
+ * Context state for INTEL_performance_query.
+ */
+struct gl_perf_query_state
+{
+   /** Array of performance monitor groups (indexed by group ID) */
+   const struct gl_perf_monitor_group *Groups;
+   GLuint NumGroups;
+
+   /** The table of all performance query objects. */
+   struct _mesa_HashTable *Objects;
+};
+
+
+/**
  * Names of the various vertex/fragment program register files, etc.
  *
  * NOTE: first four tokens must fit into 2 bits (see t_vb_arbprogram.c)
@@ -4229,6 +4243,7 @@ struct gl_context
struct gl_transform_feedback_state TransformFeedback;
 
struct gl_perf_monitor_state PerfMonitor;
+   struct gl_perf_query_state PerfQuery;
 
struct gl_buffer_object *DrawIndirectBuffer; /**  GL_ARB_draw_indirect */
 
diff --git a/src/mesa/main/performance_monitor.c 
b/src/mesa/main/performance_monitor.c
index 2d740da..742d9a5 100644
--- a/src/mesa/main/performance_monitor.c
+++ b/src/mesa/main/performance_monitor.c
@@ -666,582 +666,3 @@ _mesa_perf_monitor_counter_size(const struct 
gl_perf_monitor_counter *c)
   return 0;
}
 }
-
-extern void GLAPIENTRY
-_mesa_GetFirstPerfQueryIdINTEL(GLuint *queryId)
-{
-   GET_CURRENT_CONTEXT(ctx);
-   unsigned numGroups;
-
-   /* The GL_INTEL_performance_query spec says:
-*
-*If queryId pointer is equal to 0, INVALID_VALUE error is generated.
-*/
-   if (!queryId) {
-  _mesa_error(ctx, GL_INVALID_VALUE,
-  glGetFirstPerfQueryIdINTEL(queryId == NULL));
-  return;
-   }
-
-   numGroups = ctx-PerfMonitor.NumGroups;
-
-   /* The GL_INTEL_performance_query spec says:
-*
-*If the given hardware platform doesn't support any performance
-*queries, then the value of 0 is returned

[Mesa-dev] [RFC 6/6] i965: Adds further support for 3D OA counters

2015-05-05 Thread Robert Bragg
This uses the i915_oa '3D' metric set to expose many more interesting OA
counters including information about depth, alpha and stencil testing,
sampler usage/bottlneck stats and cache throughputs.

Signed-off-by: Robert Bragg rob...@sixbynine.org
---
 src/mesa/drivers/dri/i965/brw_performance_query.c | 402 +-
 1 file changed, 401 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_performance_query.c 
b/src/mesa/drivers/dri/i965/brw_performance_query.c
index bfe39f9..db915cd 100644
--- a/src/mesa/drivers/dri/i965/brw_performance_query.c
+++ b/src/mesa/drivers/dri/i965/brw_performance_query.c
@@ -1313,7 +1313,7 @@ brw_delete_perf_query(struct gl_context *ctx,
 
 
/**/
 
-/* Type safe wrapper for reading OA counter values */
+/* Type safe wrappers for reading OA counter values */
 
 static uint64_t
 read_uint64_oa_counter(struct brw_oa_counter *counter, uint64_t *accumulated)
@@ -1327,6 +1327,18 @@ read_uint64_oa_counter(struct brw_oa_counter *counter, 
uint64_t *accumulated)
return value;
 }
 
+static float
+read_float_oa_counter(struct brw_oa_counter *counter, uint64_t *accumulated)
+{
+   float value;
+
+   assert(counter-data_type == GL_PERFQUERY_COUNTER_DATA_FLOAT_INTEL);
+
+   counter-read(counter, accumulated, value);
+
+   return value;
+}
+
 
/**/
 
 /*
@@ -1467,6 +1479,71 @@ add_oa_counter_normalised_by_gpu_duration(struct 
brw_query_builder *builder,
 }
 
 static void
+read_hsw_samplers_busy_duration_cb(struct brw_oa_counter *counter,
+   uint64_t *accumulated,
+   void *value) /* float */
+{
+   uint64_t sampler0_busy = read_uint64_oa_counter(counter-reference0, 
accumulated);
+   uint64_t sampler1_busy = read_uint64_oa_counter(counter-reference1, 
accumulated);
+   uint64_t clk_delta = read_uint64_oa_counter(counter-reference2, 
accumulated);
+   float *ret = value;
+
+   if (!clk_delta) {
+  *ret = 0;
+  return;
+   }
+
+   *ret = ((double)(sampler0_busy + sampler1_busy) * 100.0) / 
((double)clk_delta * 2.0);
+}
+
+static struct brw_oa_counter *
+add_hsw_samplers_busy_duration_oa_counter(struct brw_query_builder *builder,
+  struct brw_oa_counter 
*sampler0_busy_raw,
+  struct brw_oa_counter 
*sampler1_busy_raw)
+{
+   struct brw_oa_counter *counter =
+  builder-query-oa_counters[builder-query-n_oa_counters++];
+
+   counter-reference0 = sampler0_busy_raw;
+   counter-reference1 = sampler1_busy_raw;
+   counter-reference2 = builder-gpu_core_clock;
+   counter-read = read_hsw_samplers_busy_duration_cb;
+   counter-data_type = GL_PERFQUERY_COUNTER_DATA_FLOAT_INTEL;
+
+   return counter;
+}
+
+static void
+read_hsw_slice_extrapolated_cb(struct brw_oa_counter *counter,
+   uint64_t *accumulated,
+   void *value) /* float */
+{
+   uint64_t counter0 = read_uint64_oa_counter(counter-reference0, 
accumulated);
+   uint64_t counter1 = read_uint64_oa_counter(counter-reference1, 
accumulated);
+   int eu_count = counter-config;
+   uint64_t *ret = value;
+
+   *ret = (counter0 + counter1) * eu_count;
+}
+
+static struct brw_oa_counter *
+add_hsw_slice_extrapolated_oa_counter(struct brw_query_builder *builder,
+  struct brw_oa_counter *counter0,
+  struct brw_oa_counter *counter1)
+{
+   struct brw_oa_counter *counter =
+  builder-query-oa_counters[builder-query-n_oa_counters++];
+
+   counter-reference0 = counter0;
+   counter-reference1 = counter1;
+   counter-config = builder-brw-perfquery.eu_count;
+   counter-read = read_hsw_slice_extrapolated_cb;
+   counter-data_type = GL_PERFQUERY_COUNTER_DATA_UINT64_INTEL;
+
+   return counter;
+}
+
+static void
 read_oa_counter_normalized_by_eu_duration_cb(struct brw_oa_counter *counter,
  uint64_t *accumulated,
  void *value) /* float */
@@ -1535,6 +1612,63 @@ add_average_thread_cycles_oa_counter(struct 
brw_query_builder *builder,
 }
 
 static void
+read_scaled_uint64_counter_cb(struct brw_oa_counter *counter,
+  uint64_t *accumulated,
+  void *value) /* uint64 */
+{
+   uint64_t delta = read_uint64_oa_counter(counter-reference0, accumulated);
+   uint64_t scale = counter-config;
+   uint64_t *ret = value;
+
+   *ret = delta * scale;
+}
+
+static struct brw_oa_counter *
+add_scaled_uint64_oa_counter(struct brw_query_builder *builder,
+ struct brw_oa_counter *input,
+ int scale)
+{
+   struct brw_oa_counter *counter =
+  builder-query-oa_counters[builder-query-n_oa_counters

[Mesa-dev] [RFC 3/6] Model INTEL perf query backend after query object BE

2015-05-05 Thread Robert Bragg
Instead of using the same backend interface as AMD_performance_monitor
this defines a dedicated INTEL_performance_query interface that is based
on the ARB_query_buffer_object interface (considering the similarity of
the extensions) with the addition of vfuncs for enumerating queries and
their counters.

Compared to the previous backend, some notable differences are:

- The backend is free to represent counters using whatever data
  structures are optimal/convenient since queries and counters are
  enumerated via an iterator api instead of declaring them using
  structures directly shared with the frontend.

  This is also done to help us support the full range of data and
  semantic types available with INTEL_performance_query which is awkward
  while using a structure shared with the AMD_performance_monitor
  backend since neither extension's types are a subset of the other.

- The backend must support waiting for a query instead of the frontend
  simply using glFinish().

- Objects go through 'Active' and 'Ready' states consistent with the
  query object backend (hopefully making them more familiar). There is
  no 'Ended' state (which used to show that a query has ended at least
  once for a given object). There is a new 'Used' state similar to the
  'EverBound' state of query objects, set when a query is first begun
  which implies that we are expecting to get results back for the object
  at some point.

The INTEL_performance_query and AMD_performance_monitor extensions are
now completely orthogonal within Mesa (though a driver could optionally
choose to implement both extensions within a unified backend if that
were convenient for the sake of sharing state/code).

Signed-off-by: Robert Bragg rob...@sixbynine.org
---
 src/mesa/main/dd.h|  39 +++
 src/mesa/main/mtypes.h|  25 +-
 src/mesa/main/performance_query.c | 590 ++
 src/mesa/main/performance_query.h |   6 +-
 4 files changed, 275 insertions(+), 385 deletions(-)

diff --git a/src/mesa/main/dd.h b/src/mesa/main/dd.h
index 0c1a13f..4ba1524 100644
--- a/src/mesa/main/dd.h
+++ b/src/mesa/main/dd.h
@@ -759,6 +759,45 @@ struct dd_function_table {
 GLint *bytesWritten);
/*@}*/
 
+   /**
+* \name Performance Query objects
+*/
+   /*@{*/
+   void (*GetPerfQueryInfo)(struct gl_context *ctx,
+int queryIndex,
+const char **name,
+GLuint *dataSize,
+GLuint *numCounters,
+GLuint *numActive);
+   void (*GetPerfCounterInfo)(struct gl_context *ctx,
+  int queryIndex,
+  int counterIndex,
+  const char **name,
+  const char **desc,
+  GLuint *offset,
+  GLuint *data_size,
+  GLuint *type_enum,
+  GLuint *data_type_enum,
+  GLuint64 *raw_max);
+   struct gl_perf_query_object * (*NewPerfQueryObject)(struct gl_context *ctx,
+   int queryIndex);
+   void (*DeletePerfQuery)(struct gl_context *ctx,
+   struct gl_perf_query_object *obj);
+   GLboolean (*BeginPerfQuery)(struct gl_context *ctx,
+   struct gl_perf_query_object *obj);
+   void (*EndPerfQuery)(struct gl_context *ctx,
+struct gl_perf_query_object *obj);
+   void (*WaitPerfQuery)(struct gl_context *ctx,
+ struct gl_perf_query_object *obj);
+   GLboolean (*IsPerfQueryReady)(struct gl_context *ctx,
+ struct gl_perf_query_object *obj);
+   void (*GetPerfQueryData)(struct gl_context *ctx,
+struct gl_perf_query_object *obj,
+GLsizei dataSize,
+GLuint *data,
+GLuint *bytesWritten);
+   /*@}*/
+
 
/**
 * \name Vertex Array objects
diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h
index b1e5fa9..a26109d 100644
--- a/src/mesa/main/mtypes.h
+++ b/src/mesa/main/mtypes.h
@@ -2014,6 +2014,23 @@ struct gl_perf_monitor_group
 
 
 /**
+ * A query object instance as described in INTEL_performance_query.
+ *
+ * NB: We want to keep this and the corresponding backend structure
+ * relatively lean considering that applications may expect to
+ * allocate enough objects to be able to query around all draw calls
+ * in a frame.
+ */
+struct gl_perf_query_object
+{
+   GLuint Id;/** hash table ID/name */
+   GLuint Used:1;/** has been used for 1 or more queries */
+   GLuint Active:1;  /** inside Begin/EndPerfQuery */
+   GLuint Ready:1;   /** result is ready? */
+};
+
+
+/**
  * Context state for AMD_performance_monitor

[Mesa-dev] [RFC 1/6] i965: Remove perf monitor/query backend

2015-05-05 Thread Robert Bragg
In its current state the unified i965 backend for
AMD_performance_monitor and INTEL_performance_query isn't able to report
meaningful Observation Architecture metrics since we haven't so far had
the necessary kernel support to fully configure the OA unit, nor the
corresponding support for normalizing the counters into a form that can
be usefully interpreted by application developers (as opposed to raw
values that may, for example, scale by the number of EUs there are).

So that we can focus on implementing just one of these extensions fully
and since we anticipate some significant backend changes as we look to
use the Linux perf interface as a means to configure the OA unit, this
patch removes the current backend. This will simplify our ability to
update the frontend infrastructure and backend interface before updating
our support for performance counters.

Signed-off-by: Robert Bragg rob...@sixbynine.org
---
 src/mesa/drivers/dri/i965/Makefile.sources |1 -
 src/mesa/drivers/dri/i965/brw_context.c|4 -
 src/mesa/drivers/dri/i965/brw_context.h|   43 -
 .../drivers/dri/i965/brw_performance_monitor.c | 1472 
 src/mesa/drivers/dri/i965/intel_batchbuffer.c  |   10 +-
 src/mesa/drivers/dri/i965/intel_extensions.c   |   70 -
 6 files changed, 1 insertion(+), 1599 deletions(-)
 delete mode 100644 src/mesa/drivers/dri/i965/brw_performance_monitor.c

diff --git a/src/mesa/drivers/dri/i965/Makefile.sources 
b/src/mesa/drivers/dri/i965/Makefile.sources
index 6d4659f..210314b 100644
--- a/src/mesa/drivers/dri/i965/Makefile.sources
+++ b/src/mesa/drivers/dri/i965/Makefile.sources
@@ -81,7 +81,6 @@ i965_FILES = \
brw_nir_analyze_boolean_resolves.c \
brw_object_purgeable.c \
brw_packed_float.c \
-   brw_performance_monitor.c \
brw_primitive_restart.c \
brw_program.c \
brw_program.h \
diff --git a/src/mesa/drivers/dri/i965/brw_context.c 
b/src/mesa/drivers/dri/i965/brw_context.c
index 0f2f9ad..80a4b0a 100644
--- a/src/mesa/drivers/dri/i965/brw_context.c
+++ b/src/mesa/drivers/dri/i965/brw_context.c
@@ -884,10 +884,6 @@ brwCreateContext(gl_api api,
_mesa_initialize_dispatch_tables(ctx);
_mesa_initialize_vbo_vtxfmt(ctx);
 
-   if (ctx-Extensions.AMD_performance_monitor) {
-  brw_init_performance_monitors(brw);
-   }
-
vbo_use_buffer_objects(ctx);
vbo_always_unmap_buffers(ctx);
 
diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
b/src/mesa/drivers/dri/i965/brw_context.h
index 8db1028..db65191 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -1380,43 +1380,6 @@ struct brw_context
   bool begin_emitted;
} query;
 
-   struct {
-  /** A map from pipeline statistics counter IDs to MMIO addresses. */
-  const int *statistics_registers;
-
-  /** The number of active monitors using OA counters. */
-  unsigned oa_users;
-
-  /**
-   * A buffer object storing OA counter snapshots taken at the start and
-   * end of each batch (creating bookends around the batch).
-   */
-  drm_intel_bo *bookend_bo;
-
-  /** The number of snapshots written to bookend_bo. */
-  int bookend_snapshots;
-
-  /**
-   * An array of monitors whose results haven't yet been assembled based on
-   * the data in buffer objects.
-   *
-   * These may be active, or have already ended.  However, the results
-   * have not been requested.
-   */
-  struct brw_perf_monitor_object **unresolved;
-  int unresolved_elements;
-  int unresolved_array_size;
-
-  /**
-   * Mapping from a uint32_t offset within an OA snapshot to the ID of
-   * the counter which MI_REPORT_PERF_COUNT stores there.
-   */
-  const int *oa_snapshot_layout;
-
-  /** Number of 32-bit entries in a hardware counter snapshot. */
-  int entries_per_oa_snapshot;
-   } perfmon;
-
int num_atoms[BRW_NUM_PIPELINES];
const struct brw_tracked_state render_atoms[57];
const struct brw_tracked_state compute_atoms[1];
@@ -1693,12 +1656,6 @@ bool brw_render_target_supported(struct brw_context *brw,
  struct gl_renderbuffer *rb);
 uint32_t brw_depth_format(struct brw_context *brw, mesa_format format);
 
-/* brw_performance_monitor.c */
-void brw_init_performance_monitors(struct brw_context *brw);
-void brw_dump_perf_monitors(struct brw_context *brw);
-void brw_perf_monitor_new_batch(struct brw_context *brw);
-void brw_perf_monitor_finish_batch(struct brw_context *brw);
-
 /* intel_buffer_objects.c */
 int brw_bo_map(struct brw_context *brw, drm_intel_bo *bo, int write_enable,
const char *bo_name);
diff --git a/src/mesa/drivers/dri/i965/brw_performance_monitor.c 
b/src/mesa/drivers/dri/i965/brw_performance_monitor.c
deleted file mode 100644
index 2c8cd49..000
--- a/src/mesa/drivers/dri/i965/brw_performance_monitor.c
+++ /dev/null

[Mesa-dev] [RFC 0/6] i965: INTEL_performance_query re-work

2015-05-05 Thread Robert Bragg
As we've learned more about the observability capabilities of Gen
graphics we've found that it's not enough to only try and configure the
OA unit from userspace without any dedicated support from the kernel.

As it is currently the i965 backends for both AMD_performance_monitor
and INTEL_performance_query aren't able to report normalized metrics
useful to application developers due to the limitations of configuring
the OA unit from userspace via LRIs.

More recently we've developed a perf PMU (performance monitoring unit)
driver within the drm i915 driver (i915_oa) that lets userspace
configure and open an event fd via the perf_event_open syscall which
provides us a more complete interface for configuring the Gen graphics
OA unit.

With help from the kernel we can support periodic sampling (where the
hardware writes reports into a gpu mapped circular buffer that we can
forward as perf samples), we can deal with the clock gating + PM
limitations imposed by the observability hw and also manage + maintain
the selection of performance counters.

The perf_event_open(2) man page is a good starting point for anyone
wanting to learn about the Linux perf interface. Something to beware of
is that there's currently no precedent upstream for exposing device
metrics via a perf PMU and although early feedback was sought for this
work, some of this may be subject to change based on feedback from the
core perf maintainers as well as the i915 drm driver maintainers.

This PRM is a good starting point for anyone wanting to learn about the
Gen graphics Observability hardware. Some important information is
currently missing and this should be updated soon, but that's more
directly related to the i915_oa perf driver. Notably though the report
formats described here need to be understood by Mesa, since the perf
samples simply forward the raw reports from the OA hardware.

https://01.org/sites/default/files/documentation/
observability_performance_counters_haswell.pdf

This series re-works the i965 driver's support for exposing performance
counters, taking advantage of this i915_oa perf event interface.

A corresponding kernel branch with an initial i915_oa driver for Haswell
can be found here:

https://github.com/rib/linux  wip/rib/oa-hsw-4.0.0

A corresponding libdrm branch can be found here:

https://github.com/rib/drm  wip/rib/oa-hsw-4.0.0

In case it's helpful to see another example using the i915_oa perf
interface I've also been developing a 'gputop' tool that both lets me
test the INTEL_performance_query interface to collect per-context
metrics from Mesa and can also visualize system wide metrics (i.e.
across all gpu contexts) using perf directly:

https://github.com/rib/gputop

Although I haven't updated the branches in a while, I could share some
initial code adding support for Broadwell if anyone's interested to get
a sense of what's involved in supporting later hardware generations.

I still anticipate some (hopefully relatively minor) tweaking of
implementation details based on review feedback for the i915_oa driver,
but I hope that this is a good point to ask for some feedback on the
Mesa changes.

If it's more convenient, these patches can also be fetched from here:

https://github.com/rib/mesa  wip/rib/oa-hsw-4.0.0

Regards,
- Robert

Robert Bragg (6):
  i965: Remove perf monitor/query backend
  Separate INTEL_performance_query frontend
  Model INTEL perf query backend after query object BE
  i965: Implement INTEL_performance_query extension
  i965: Expose OA counters via INTEL_performance_query
  i965: Adds further support for 3D OA counters

 src/mapi/glapi/gen/gl_genexec.py   |1 +
 src/mesa/Makefile.sources  |2 +
 src/mesa/drivers/dri/i965/Makefile.sources |2 +-
 src/mesa/drivers/dri/i965/brw_context.c|5 +-
 src/mesa/drivers/dri/i965/brw_context.h|  101 +-
 .../drivers/dri/i965/brw_performance_monitor.c | 1472 
 src/mesa/drivers/dri/i965/brw_performance_query.c  | 2356 
 src/mesa/drivers/dri/i965/intel_batchbuffer.c  |   10 +-
 src/mesa/drivers/dri/i965/intel_extensions.c   |   69 +-
 src/mesa/main/context.c|3 +
 src/mesa/main/dd.h |   39 +
 src/mesa/main/mtypes.h |   28 +
 src/mesa/main/performance_monitor.c|  579 -
 src/mesa/main/performance_monitor.h|   39 -
 src/mesa/main/performance_query.c  |  608 +
 src/mesa/main/performance_query.h  |   80 +
 16 files changed, 3197 insertions(+), 2197 deletions(-)
 delete mode 100644 src/mesa/drivers/dri/i965/brw_performance_monitor.c
 create mode 100644 src/mesa/drivers/dri/i965/brw_performance_query.c
 create mode 100644 src/mesa/main/performance_query.c
 create mode 100644 src/mesa/main/performance_query.h

-- 
2.3.2

___
mesa-dev

Re: [Mesa-dev] [PATCH 00/15] GL_AMD_performance_monitor

2015-04-14 Thread Robert Bragg
Hi Samuel,

On Tue, Mar 31, 2015 at 5:56 PM, Samuel Pitoiset
samuel.pitoi...@gmail.com wrote:
 Hello Robert,

 Sorry for the delay, I just saw your message few days ago, and I probably
 removed the mail by mistake too...

And then I was on holiday; so more delay :-)


 I have never heard about your work on this area, happy to know right now. :)

 Well, regarding the backend stuff, I would prefer to keep the same for both
 GL_AMD_performance_monitor and INTEL_performance_query.

My experience with the Intel backend where I initially aimed to update
both extensions behind one backend is that it was quite a hindrance
and there wasn't a clear benefit to it when there isn't really any
substantial code to speak of in the core infrastructure to share
between the extensions.

We should be careful not to talk cross purposes here though. In my
mind having orthogonal frontends and even different backend interfaces
wouldn't preclude a driver implementing both extensions with a unified
backend if desirable, there would just be two separate sets of entry
points for the frontend to interact with that unified backend.

Some of the issues I came across were:

The current design expects a common description of counters and their
types, but the current implementation doesn't fully support
INTEL_performance_query semantic types. Fixing this is awkward
because neither extension has data/semantic types that are a strict
subset of the other so to support all the types I imagine we'd also
need to introduce some mechanism for black/white listing counters for
each extension if we want to keep a common description. Then if we
wanted to utilize the full range of types for both extensions I have a
feeling a lot of the counters would end up being exclusively declared
for one extension or the other which would negate some of the benefit
of having a common structure.

The current infrastructure seems somewhat biased towards implementing
AMD_performance_monitor with the concept of groups and counter
selection which doesn't exist in the INTEL_performance_query extension
and that seems unfortunate when the selection mechanism looks to make
the allocation and tracking of query objects more costly/complex if we
don't need it for the INTEL_performance_query extension.

There's no substantial utility code associated with the core
infrastructure that the backends benefit from to help justify sharing
a backend for multiple extensions. The core support just does simple
frontend validation of user arguments to normalize things and handle
gl errors consistently before interacting with the backend so in
practice the INTEL_performance_query and AMD_performance_monitor code
is rather orthogonal.

I think the only things that connect the two extensions currently are
the shared declaration of counters and a tiny amount of utility code
for allocating/freeing monitor objects. Given the issue with the
counter types I found things became simpler if the counter
descriptions were instead moved into the backend. Given that
INTEL_performance_query doesn't need any active group/counter state
per object, the common object allocator also isn't ideal. So making
both of these changes (which seem to make sense even without the goal
of separating the extension) is enough to make the frontends
completely orthogonal. I also really like that with the counter
declarations in the backend that it's free to use whatever data
structures are appropriate for the various counters. As opposed to
statically declared arrays describing our counters, I needed to update
our backend to programatically build up the lists of available
counters and counter descriptions also necessarily became more
detailed so it was nice that this work could be self contained in the
backend and we can describe our Observation Architecture counters
differently from our pipeline statistics counters.

My thinking a.t.m is that if the current AMD_perfmon backend
architecture seems to be ok for your needs then it could be for the
best that the extensions can be easily made orthogonal so we can
develop support for both extensions without stepping on each other's
toes. Later if it's desirable to support both extensions in any driver
we can always evaluate what opportunities there are to have a common
backend interface if that could simplify things.

 Currently, I'm trying to implement GL_AMD_perfmon as a state tracker which
 is based on the query interface of Gallium and this looks quite good. Only
 minor changes in the current interface are required to do that.

 At this time, most of hardware performance counters are *only* exposed
 through the Gallium HUD and I think it's not very helpful for a large number
 of applications.
 I'm pretty sure that GL_AMD_perfmon will be very useful for exposing GPU
 counters and this is also a requirement for a GSoC project this year.

 So, with respect to your work, my question is : why do you want to get rid
 of AMD_perfmon in favour of INTEL_perf_query ?

From my pov, the 

Re: [Mesa-dev] [PATCH 00/15] GL_AMD_performance_monitor

2015-04-14 Thread Robert Bragg
On Tue, Mar 31, 2015 at 6:56 PM, Alex Deucher alexdeuc...@gmail.com wrote:
 On Thu, Mar 19, 2015 at 6:40 PM, Robert Bragg rob...@sixbynine.org wrote:
 Hi Samuel,

 I thought I should let you and others know of some related changes
 I've been playing around with to try and improve our support for the
 INTEL_performance_query extension in the i965 dri driver, since it's
 quite related to your work.

 Some of my work-in-progress changes can currently be seen on github here:
 https://github.com/rib/mesa/commits/wip/rib/i915_oa_perf

 I've only recently scrubbed up these mesa changes enough for reviewing
 to make much sense, so sorry I haven't sent out an RFC to the list
 before now about this.

 What I was finding is that it's not very helpful in practice to
 support the AMD_performance_monitor + INTEL_performance query
 extensions via the same core infrastructure and so since there weren't
 previously any other drivers implementing AMD_performance_monitor, my
 series currently just drops the combined infrastructure in favour of
 core support for just INTEL_performance_query.

 The main thing here that might be interesting to think about with
 respect to your work, is whether you might like to do anything
 differently with the main/performance_monitor.c support if you were
 the only user and if it strictly only dealt with the
 AMD_performance_monitor extension?

 I don't guess it would be controversial for us to look at removing the
 current i965 backend for both AMD_performance_monitor and
 INTEL_performance_query for an interim if that might be convenient for
 you to evolve the backend interface. (Currently the i965 backend
 doesn't report many usable metrics unfortunately - mainly just the
 pipeline statistics that are also accessible via query objects.) Ken
 and Petri who did much of the work here are already aware that I've
 been taking this approach as I ran it by them some time ago and their
 work still carries forward in the series I have but instead focused
 only on INTEL_performance_query for now.

 Anyway, I just thought I should throw that out there as a possibility
 to consider in case it might be helpful.

 I would prefer to keep support for AMD_performance_monitor in mesa.
 We may implement more extensive support for this extension in our
 radeon open source drivers and it would be nice to be compatible with
 our closed source drivers on both Linux and other OSes.

Right, especially given that Samuel has implemented a backend now, I
assume the core support will be staying.

I just deleted it in my own series where I was also removing the only
backend for it but I'll be rebasing my series without the patch to
drop the AMD_performance_monitor infrastructure.

To clarify; what I was suggesting above was also based on the
assumption that the core AMD_performance_monitor support is being
kept. I was talking about removing the intel backend if that might
make Samuel's life easier in case he has some ideas for changing the
backend interface. Such changes would be less effort if he doesn't
have to worry about updating the intel backend too. I know we'd need
to make some changes to the backend interface if we were to look at
updating the intel AMD_performance_monitor support so I wouldn't
really want the current backend lending weight to keeping the
interface as is.

I hope that provides some reassurance.

Regards,
- Robert


 Alex
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 00/15] GL_AMD_performance_monitor

2015-03-19 Thread Robert Bragg
Hi Samuel,

I thought I should let you and others know of some related changes
I've been playing around with to try and improve our support for the
INTEL_performance_query extension in the i965 dri driver, since it's
quite related to your work.

Some of my work-in-progress changes can currently be seen on github here:
https://github.com/rib/mesa/commits/wip/rib/i915_oa_perf

I've only recently scrubbed up these mesa changes enough for reviewing
to make much sense, so sorry I haven't sent out an RFC to the list
before now about this.

What I was finding is that it's not very helpful in practice to
support the AMD_performance_monitor + INTEL_performance query
extensions via the same core infrastructure and so since there weren't
previously any other drivers implementing AMD_performance_monitor, my
series currently just drops the combined infrastructure in favour of
core support for just INTEL_performance_query.

The main thing here that might be interesting to think about with
respect to your work, is whether you might like to do anything
differently with the main/performance_monitor.c support if you were
the only user and if it strictly only dealt with the
AMD_performance_monitor extension?

I don't guess it would be controversial for us to look at removing the
current i965 backend for both AMD_performance_monitor and
INTEL_performance_query for an interim if that might be convenient for
you to evolve the backend interface. (Currently the i965 backend
doesn't report many usable metrics unfortunately - mainly just the
pipeline statistics that are also accessible via query objects.) Ken
and Petri who did much of the work here are already aware that I've
been taking this approach as I ran it by them some time ago and their
work still carries forward in the series I have but instead focused
only on INTEL_performance_query for now.

Anyway, I just thought I should throw that out there as a possibility
to consider in case it might be helpful.

Kind regards,
- Robert


On Wed, Mar 18, 2015 at 4:00 PM, Samuel Pitoiset
samuel.pitoi...@gmail.com wrote:
 Bump, is someone want to make a review of this patch set ? Especially the
 core stuff...


 On 03/09/2015 10:09 PM, Samuel Pitoiset wrote:

 Hello,

 A series I have waited too long to re-submit, but I recently refactored
 the
 code and fixed some minor issues.

 This patchset enables GL_AMD_performance_monitor for svga, freedreno,
 r600,
 radeonsi and nvc0 drivers.

 This code has been tested with Nouveau (NVD9 and NVE7) but it should also
 work
 with other drivers. All piglit tests, including API and measurement tests
 are
 okay.

 Feel free to make a review.

 Christoph Bumiller (1):
st/mesa: implement GL_AMD_performance_monitor

 Samuel Pitoiset (14):
gallium: add pipe_screen::get_driver_query_group_info
gallium: add new fields to pipe_driver_query_info
gallium: add new numeric types to pipe_query_result
gallium: replace pipe_driver_query_info::max_value by a union
gallium: make pipe_context::begin_query return a boolean
gallium: add util_get_driver_query_group_info
svga: implement pipe_screen::get_driver_query_group_info
freedreno: implement pipe_screen::get_driver_query_group_info
radeon: implement pipe_screen::get_driver_query_group_info
nvc0: implement pipe_screen::get_driver_query_group_info
docs: mark GL_AMD_performance_monitor for the 10.6.0 release
nvc0: expose more driver-specific query groups
nvc0: make begin_query return false when all MP counters are used
nvc0: all queries use an unsigned 64-bits integer by default

   docs/relnotes/10.6.0.html  |   1 +
   src/gallium/auxiliary/Makefile.sources |   1 +
   src/gallium/auxiliary/hud/hud_driver_query.c   |   2 +-
   src/gallium/auxiliary/util/u_query.c   |  50 +++
   src/gallium/auxiliary/util/u_query.h   |  45 ++
   src/gallium/docs/source/screen.rst |  10 +
   src/gallium/drivers/freedreno/freedreno_query.c|  25 +-
   src/gallium/drivers/freedreno/freedreno_query.h|   3 +-
   src/gallium/drivers/freedreno/freedreno_query_hw.c |   3 +-
   src/gallium/drivers/freedreno/freedreno_query_sw.c |   3 +-
   src/gallium/drivers/galahad/glhd_context.c |   6 +-
   src/gallium/drivers/i915/i915_query.c  |   3 +-
   src/gallium/drivers/ilo/ilo_query.c|   3 +-
   src/gallium/drivers/llvmpipe/lp_query.c|   3 +-
   src/gallium/drivers/noop/noop_pipe.c   |   3 +-
   src/gallium/drivers/nouveau/nv30/nv30_query.c  |   5 +-
   src/gallium/drivers/nouveau/nv50/nv50_query.c  |   3 +-
   src/gallium/drivers/nouveau/nvc0/nvc0_query.c  |  98 -
   src/gallium/drivers/nouveau/nvc0/nvc0_screen.c |   1 +
   src/gallium/drivers/nouveau/nvc0/nvc0_screen.h |  14 +
   src/gallium/drivers/r300/r300_query.c  |   9 +-
   src/gallium/drivers/radeon/r600_pipe_common.c  |  25 +-
   

Re: [Mesa-dev] [PATCH] i965: Use the predicate enable bit for conditional rendering without stalling

2014-11-10 Thread Robert Bragg
On Mon, Nov 10, 2014 at 2:57 PM, Neil Roberts n...@linux.intel.com wrote:

 The bit I mentioned about OACONTROL was just saying that the method of
 detecting whether we can write to OACONTROL specifically doesn't work.
 This is because writing to a register that is not in the whitelist
 returns EINVAL and Mesa calls exit(1) when drm_intel_gem_bo_context_exec
 fails. I used a different method to detect whether we can write to
 MI_PREDICATE_SRC0 by manually calling drm_intel_gem_bo_context_exec to
 avoid the call to exit(1). I think the code to check whether OACONTROL
 is accessible is wrong because the process will have already exited
 before it gets a chance to check whether the load worked. In v2 of the
 patch I made it check the cmd parser version as you suggested. We should
 probably do something similar for OACONTROL, but that is a separate
 issue.

Hmm, that's awkward. I had been thinking we should aim remove the
OACONTROL tricks from the cmd parser, considering that it's not
currently possible to program the OA to get useful data this way, but
it sounds like we'd actually start making Mesa apps exit(1) if we were
to do that :-/

If we add more complete support for the OA via perf then it would be
nice if we just had one place responsible for programming OACONTROL so
I'd been thinking we could just drop it from the cmd parser white list
- assuming older versions of Mesa would handle that gracefully.

'Guess it won't be quite so simple :-)

Regards,
- Robert
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] meta: save and restore swizzle for _GenerateMipmap

2014-06-11 Thread Robert Bragg
On Tue, Jun 10, 2014 at 6:00 PM, Ian Romanick i...@freedesktop.org wrote:
 On 06/09/2014 07:48 AM, Robert Bragg wrote:
 This makes sure to use a no-op swizzle while iteratively rendering each
 level of a mipmap otherwise we may loose components and effectively
 apply the swizzle twice by the time these levels are sampled.

 Right... the swizzle state shouldn't affect mipmap generation at all.
 One minor nit below, but otherwise

 Reviewed-by: Ian Romanick ian.d.roman...@intel.com

 I second Chris's request for a piglit test.  It should be pretty easy to
 clone one of the existing tests/fbo/fbo-generatemipmap-*.c tests and
 modify it to test this case.

Thanks Chris and Ian for the review; I've landed the patch and sent a
test case to the piglit mailinglist.

--
Robert
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965/fs: initialize src as reg_undef for texture opcodes

2014-06-09 Thread Robert Bragg
Hey Tapani,

I came across this issue the other day too and can at least confirm
that I wrote an almost identical patch.

I was a bit unsure whether this was the best place to fix this issue
since it seems a bit unobvious, in isolation, why we emit these
texture ops with a place holder register, but maybe for others more
experienced than me with the code, this is fine.

Something I wasn't clear about, was whether the
gen6_resolve_implied_move() call in brw_SAMPLE is really needed. It
looks like the fs and vec4 visitors (where we emit these ops) both
explicitly setup an MRF payload and before 07af0ab both would have
effectively passed reg_undef registers through to generate_tex() which
would just result in the _resolve_move() being a no-op that returns a
register for msg_reg_nr.

As another possible approach; I smoke tested the following change
which also seemed to fixed this issue for me:

diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c
b/src/mesa/drivers/dri/i965/brw_eu_emit.c
index 2fa65e9..899fe47 100644
--- a/src/mesa/drivers/dri/i965/brw_eu_emit.c
+++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c
@@ -2317,7 +2317,7 @@ void brw_SAMPLE(struct brw_compile *p,
struct brw_instruction *insn;

if (msg_reg_nr != -1)
-  gen6_resolve_implied_move(p, src0, msg_reg_nr);
+  src0 = brw_message_reg(msg_reg_nr);

insn = next_insn(p, BRW_OPCODE_SEND);
insn-header.predicate_control = 0; /* XXX */

I'm not familiar enough with this to know if a change like that would
be right/acceptable though and haven't had a chance to try and run
piglit to see if that breaks something I'm not aware of.

--
Regards,
Robert


On Mon, Jun 9, 2014 at 10:30 AM, Tapani Pälli tapani.pa...@intel.com wrote:
 Commit 07af0ab changed fs_inst to have 0 sources for texture opcodes
 in emit_texture_gen5 (Ironlake, Sandybrige) while fs_generator still
 uses a single source from brw_reg struct. Patch sets src as reg_undef
 which matches the behavior before the constructor got changed.

 Signed-off-by: Tapani Pälli tapani.pa...@intel.com
 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=79534
 ---
  src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 18 +-
  1 file changed, 9 insertions(+), 9 deletions(-)

 diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp 
 b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
 index 0b4fbb7..3089932 100644
 --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
 +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
 @@ -1171,21 +1171,21 @@ fs_visitor::emit_texture_gen5(ir_texture *ir, fs_reg 
 dst, fs_reg coordinate,
 fs_inst *inst = NULL;
 switch (ir-op) {
 case ir_tex:
 -  inst = emit(SHADER_OPCODE_TEX, dst);
 +  inst = emit(SHADER_OPCODE_TEX, dst, reg_undef);
break;
 case ir_txb:
mlen = MAX2(mlen, header_present + 4 * reg_width);
emit(MOV(fs_reg(MRF, base_mrf + mlen), lod));
mlen += reg_width;

 -  inst = emit(FS_OPCODE_TXB, dst);
 +  inst = emit(FS_OPCODE_TXB, dst, reg_undef);
break;
 case ir_txl:
mlen = MAX2(mlen, header_present + 4 * reg_width);
emit(MOV(fs_reg(MRF, base_mrf + mlen), lod));
mlen += reg_width;

 -  inst = emit(SHADER_OPCODE_TXL, dst);
 +  inst = emit(SHADER_OPCODE_TXL, dst, reg_undef);
break;
 case ir_txd: {
mlen = MAX2(mlen, header_present + 4 * reg_width); /* skip over 'ai' */
 @@ -1215,17 +1215,17 @@ fs_visitor::emit_texture_gen5(ir_texture *ir, fs_reg 
 dst, fs_reg coordinate,
 case ir_txs:
emit(MOV(fs_reg(MRF, base_mrf + mlen, BRW_REGISTER_TYPE_UD), lod));
mlen += reg_width;
 -  inst = emit(SHADER_OPCODE_TXS, dst);
 +  inst = emit(SHADER_OPCODE_TXS, dst, reg_undef);
break;
 case ir_query_levels:
emit(MOV(fs_reg(MRF, base_mrf + mlen, BRW_REGISTER_TYPE_UD), 
 fs_reg(0u)));
mlen += reg_width;
 -  inst = emit(SHADER_OPCODE_TXS, dst);
 +  inst = emit(SHADER_OPCODE_TXS, dst, reg_undef);
break;
 case ir_txf:
mlen = header_present + 4 * reg_width;
emit(MOV(fs_reg(MRF, base_mrf + mlen - reg_width, 
 BRW_REGISTER_TYPE_UD), lod));
 -  inst = emit(SHADER_OPCODE_TXF, dst);
 +  inst = emit(SHADER_OPCODE_TXF, dst, reg_undef);
break;
 case ir_txf_ms:
mlen = header_present + 4 * reg_width;
 @@ -1235,13 +1235,13 @@ fs_visitor::emit_texture_gen5(ir_texture *ir, fs_reg 
 dst, fs_reg coordinate,
/* sample index */
emit(MOV(fs_reg(MRF, base_mrf + mlen, BRW_REGISTER_TYPE_UD), 
 sample_index));
mlen += reg_width;
 -  inst = emit(SHADER_OPCODE_TXF_CMS, dst);
 +  inst = emit(SHADER_OPCODE_TXF_CMS, dst, reg_undef);
break;
 case ir_lod:
 -  inst = emit(SHADER_OPCODE_LOD, dst);
 +  inst = emit(SHADER_OPCODE_LOD, dst, reg_undef);
break;
 case ir_tg4:
 -  inst = emit(SHADER_OPCODE_TG4, dst);
 +  inst = emit(SHADER_OPCODE_TG4, dst, reg_undef);
break;
 

[Mesa-dev] [PATCH] meta: save and restore swizzle for _GenerateMipmap

2014-06-09 Thread Robert Bragg
This makes sure to use a no-op swizzle while iteratively rendering each
level of a mipmap otherwise we may loose components and effectively
apply the swizzle twice by the time these levels are sampled.
---
 src/mesa/drivers/common/meta_generate_mipmap.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/src/mesa/drivers/common/meta_generate_mipmap.c 
b/src/mesa/drivers/common/meta_generate_mipmap.c
index d12806c..4b1718d 100644
--- a/src/mesa/drivers/common/meta_generate_mipmap.c
+++ b/src/mesa/drivers/common/meta_generate_mipmap.c
@@ -43,6 +43,7 @@
 #include main/varray.h
 #include main/viewport.h
 #include drivers/common/meta.h
+#include program/prog_instruction.h
 
 
 /**
@@ -168,6 +169,8 @@ _mesa_meta_GenerateMipmap(struct gl_context *ctx, GLenum 
target,
GLenum faceTarget;
GLuint dstLevel;
GLuint samplerSave;
+   GLint swizzle[4];
+   GLboolean swizzleSaved = GL_FALSE;
 
if (fallback_required(ctx, target, texObj)) {
   _mesa_generate_mipmap(ctx, target, texObj);
@@ -231,6 +234,13 @@ _mesa_meta_GenerateMipmap(struct gl_context *ctx, GLenum 
target,
 
_mesa_TexParameteri(target, GL_GENERATE_MIPMAP, GL_FALSE);
 
+   if (texObj-_Swizzle != SWIZZLE_NOOP) {
+  static const GLint swizzleNoop[4] = { GL_RED, GL_GREEN, GL_BLUE, 
GL_ALPHA };
+  memcpy(swizzle, texObj-Swizzle, sizeof(swizzle));
+  swizzleSaved = GL_TRUE;
+  _mesa_TexParameteriv(target, GL_TEXTURE_SWIZZLE_RGBA, swizzleNoop);
+   }
+
/* Silence valgrind warnings about reading uninitialized stack. */
memset(verts, 0, sizeof(verts));
 
@@ -347,4 +357,6 @@ _mesa_meta_GenerateMipmap(struct gl_context *ctx, GLenum 
target,
_mesa_TexParameteri(target, GL_TEXTURE_MAX_LEVEL, maxLevelSave);
if (genMipmapSave)
   _mesa_TexParameteri(target, GL_GENERATE_MIPMAP, genMipmapSave);
+   if (swizzleSaved)
+  _mesa_TexParameteriv(target, GL_TEXTURE_SWIZZLE_RGBA, swizzle);
 }
-- 
1.9.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 3/3] dri3: Add GLX_EXT_buffer_age support

2014-02-24 Thread Robert Bragg
Hi Adel,

Thanks for look at this; just a few comments...

 diff --git a/src/glx/glx_pbuffer.c b/src/glx/glx_pbuffer.c
 index 978730c..afb4206 100644
 --- a/src/glx/glx_pbuffer.c
 +++ b/src/glx/glx_pbuffer.c
 @@ -373,6 +373,14 @@ GetDrawableAttribute(Display * dpy, GLXDrawable drawable,
  if (!pdraw-textureFormat)
 pdraw-textureFormat =
determineTextureFormat((const int *) data, num_attributes);
 +
 +if (attribute == GLX_BACK_BUFFER_AGE_EXT) {
 +   struct glx_screen *psc = pdraw-psc;
 +
 +   if (psc-driScreen-getBufferAge != NULL)
 + *value = psc-driScreen-getBufferAge(pdraw);
 +}
 +
   }
  #endif

Since with dri3 this is a client side only attribute, shouldn't we be
able to simply skip the redundant X round trip above to query the
drawable's attributes?

Technically we're supposed to report a GLXBadDrawable if the drawable
isn't currently bound to the current thread's context. I think the
requirement to be bound to a context was added to the spec considering
that if an app were switching between a direct/indirect context the
age state may switch from being client-side to server-side and they
shouldn't have to be kept in sync. glXSwapBuffers has a similar
requirement so it should be possible to borrow some checks from there.

Apart from that, this patch looks good to me:

Reviewed-by: Robert Bragg rob...@sixbynine.org

--
Regards,
Robert
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 0/3] Implement EGL_EXT_swap_buffers_with_damage

2013-04-25 Thread Robert Bragg
This series adds support for the EGL_EXT_swap_buffers_with_damage[1] extension
for the egl wayland platform.

This extension is a natural counterpart to the EGL_EXT_buffer_age extension
that allows applications to inform the compositor what sub-regions of a buffer
have really changed with respect to the last presented buffer.

[1] http://www.khronos.org/registry/egl/extensions/EXT/
EGL_EXT_swap_buffers_with_damage.txt

Robert Bragg (3):
  egl: Update to revision 21254 of eglext.h
  egl: Add extension infrastructure for EGL_EXT_swap_buffers_with_damage
  egl/wayland: Implement EGL_EXT_swap_buffers_with_damage

 include/EGL/eglext.h| 45 ++---
 src/egl/drivers/dri2/platform_wayland.c | 27 +---
 src/egl/main/eglapi.c   | 32 +++
 src/egl/main/eglapi.h   |  7 +
 src/egl/main/egldisplay.h   |  1 +
 src/egl/main/eglmisc.c  |  1 +
 6 files changed, 107 insertions(+), 6 deletions(-)

-- 
1.8.2.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/3] egl: Add extension infrastructure for EGL_EXT_swap_buffers_with_damage

2013-04-25 Thread Robert Bragg
---
 src/egl/main/eglapi.c | 32 
 src/egl/main/eglapi.h |  7 +++
 src/egl/main/egldisplay.h |  1 +
 src/egl/main/eglmisc.c|  1 +
 4 files changed, 41 insertions(+)

diff --git a/src/egl/main/eglapi.c b/src/egl/main/eglapi.c
index bcc5465..30d4ee2 100644
--- a/src/egl/main/eglapi.c
+++ b/src/egl/main/eglapi.c
@@ -701,6 +701,35 @@ eglSwapBuffers(EGLDisplay dpy, EGLSurface surface)
 }
 
 
+#ifdef EGL_EXT_swap_buffers_with_damage
+
+EGLBoolean EGLAPIENTRY
+eglSwapBuffersWithDamageEXT(EGLDisplay dpy, EGLSurface surface,
+EGLint *rects, EGLint n_rects)
+{
+   _EGLContext *ctx = _eglGetCurrentContext();
+   _EGLDisplay *disp = _eglLockDisplay(dpy);
+   _EGLSurface *surf = _eglLookupSurface(surface, disp);
+   _EGLDriver *drv;
+   EGLBoolean ret;
+
+   _EGL_CHECK_SURFACE(disp, surf, EGL_FALSE, drv);
+
+   /* surface must be bound to current context in EGL 1.4 */
+   if (_eglGetContextHandle(ctx) == EGL_NO_CONTEXT ||
+   surf != ctx-DrawSurface)
+  RETURN_EGL_ERROR(disp, EGL_BAD_SURFACE, EGL_FALSE);
+
+   if (rects == NULL || n_rects = 0)
+  RETURN_EGL_ERROR(disp, EGL_BAD_PARAMETER, EGL_FALSE);
+
+   ret = drv-API.SwapBuffersWithDamageEXT(drv, disp, surf, rects, n_rects);
+
+   RETURN_EGL_EVAL(disp, ret);
+}
+
+#endif /* EGL_EXT_swap_buffers_with_damage */
+
 EGLBoolean EGLAPIENTRY
 eglCopyBuffers(EGLDisplay dpy, EGLSurface surface, EGLNativePixmapType target)
 {
@@ -939,6 +968,9 @@ eglGetProcAddress(const char *procname)
   { eglQueryWaylandBufferWL, (_EGLProc) eglQueryWaylandBufferWL },
 #endif
   { eglPostSubBufferNV, (_EGLProc) eglPostSubBufferNV },
+#ifdef EGL_EXT_swap_buffers_with_damage
+  { eglSwapBuffersWithDamageEXT, (_EGLProc) eglSwapBuffersWithDamageEXT 
},
+#endif
   { NULL, NULL }
};
EGLint i;
diff --git a/src/egl/main/eglapi.h b/src/egl/main/eglapi.h
index 85b8f1a..ee382d0 100644
--- a/src/egl/main/eglapi.h
+++ b/src/egl/main/eglapi.h
@@ -131,6 +131,9 @@ typedef EGLBoolean (*PostSubBufferNV_t)(_EGLDriver *drv, 
_EGLDisplay *disp, _EGL
 typedef EGLint (*QueryBufferAge_t)(_EGLDriver *drv,
_EGLDisplay *dpy, _EGLSurface *surface);
 
+#ifdef EGL_EXT_swap_buffers_with_damage
+typedef EGLBoolean (*SwapBuffersWithDamageEXT_t) (_EGLDriver *drv, _EGLDisplay 
*dpy, _EGLSurface *surface, const EGLint *rects, EGLint n_rects);
+#endif
 
 /**
  * The API dispatcher jumps through these functions
@@ -207,6 +210,10 @@ struct _egl_api
QueryWaylandBufferWL_t QueryWaylandBufferWL;
 #endif
 
+#ifdef EGL_EXT_swap_buffers_with_damage
+   SwapBuffersWithDamageEXT_t SwapBuffersWithDamageEXT;
+#endif /* EGL_EXT_swap_buffers_with_damage */
+
PostSubBufferNV_t PostSubBufferNV;
 
QueryBufferAge_t QueryBufferAge;
diff --git a/src/egl/main/egldisplay.h b/src/egl/main/egldisplay.h
index 4b33470..f990300 100644
--- a/src/egl/main/egldisplay.h
+++ b/src/egl/main/egldisplay.h
@@ -115,6 +115,7 @@ struct _egl_extensions
 
EGLBoolean EXT_create_context_robustness;
EGLBoolean EXT_buffer_age;
+   EGLBoolean EXT_swap_buffers_with_damage;
 };
 
 
diff --git a/src/egl/main/eglmisc.c b/src/egl/main/eglmisc.c
index 92b0eae..a843ce7 100644
--- a/src/egl/main/eglmisc.c
+++ b/src/egl/main/eglmisc.c
@@ -117,6 +117,7 @@ _eglUpdateExtensionsString(_EGLDisplay *dpy)
 
_EGL_CHECK_EXTENSION(EXT_create_context_robustness);
_EGL_CHECK_EXTENSION(EXT_buffer_age);
+   _EGL_CHECK_EXTENSION(EXT_swap_buffers_with_damage);
 
_EGL_CHECK_EXTENSION(NV_post_sub_buffer);
 #undef _EGL_CHECK_EXTENSION
-- 
1.8.2.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/3] egl: Update to revision 21254 of eglext.h

2013-04-25 Thread Robert Bragg
This pulls in EGL_EXT_swap_buffers_with_damage.
---
 include/EGL/eglext.h | 45 ++---
 1 file changed, 42 insertions(+), 3 deletions(-)

diff --git a/include/EGL/eglext.h b/include/EGL/eglext.h
index b2b5a80..1d68178 100644
--- a/include/EGL/eglext.h
+++ b/include/EGL/eglext.h
@@ -6,7 +6,7 @@ extern C {
 #endif
 
 /*
-** Copyright (c) 2007-2012 The Khronos Group Inc.
+** Copyright (c) 2007-2013 The Khronos Group Inc.
 **
 ** Permission is hereby granted, free of charge, to any person obtaining a
 ** copy of this software and/or associated documentation files (the
@@ -34,8 +34,8 @@ extern C {
 
 /* Header file version number */
 /* Current version at http://www.khronos.org/registry/egl/ */
-/* $Revision: 19987 $ on $Date: 2012-12-13 16:46:46 -0800 (Thu, 13 Dec 2012) $ 
*/
-#define EGL_EGLEXT_VERSION 14
+/* $Revision: 21254 $ on $Date: 2013-04-25 03:11:55 -0700 (Thu, 25 Apr 2013) $ 
*/
+#define EGL_EGLEXT_VERSION 16
 
 #ifndef EGL_KHR_config_attribs
 #define EGL_KHR_config_attribs 1
@@ -532,6 +532,45 @@ typedef EGLint (EGLAPIENTRYP 
PFNEGLDUPNATIVEFENCEFDANDROIDPROC)(EGLDisplay dpy,
 #define EGL_BUFFER_AGE_EXT 0x313D
 #endif
 
+#ifndef EGL_EXT_image_dma_buf_import
+#define EGL_EXT_image_dma_buf_import 1
+#define EGL_LINUX_DMA_BUF_EXT  0x3270
+#define EGL_LINUX_DRM_FOURCC_EXT   0x3271
+#define EGL_DMA_BUF_PLANE0_FD_EXT  0x3272
+#define EGL_DMA_BUF_PLANE0_OFFSET_EXT  0x3273
+#define EGL_DMA_BUF_PLANE0_PITCH_EXT   0x3274
+#define EGL_DMA_BUF_PLANE1_FD_EXT  0x3275
+#define EGL_DMA_BUF_PLANE1_OFFSET_EXT  0x3276
+#define EGL_DMA_BUF_PLANE1_PITCH_EXT   0x3277
+#define EGL_DMA_BUF_PLANE2_FD_EXT  0x3278
+#define EGL_DMA_BUF_PLANE2_OFFSET_EXT  0x3279
+#define EGL_DMA_BUF_PLANE2_PITCH_EXT   0x327A
+#define EGL_YUV_COLOR_SPACE_HINT_EXT   0x327B
+#define EGL_SAMPLE_RANGE_HINT_EXT  0x327C
+#define EGL_YUV_CHROMA_HORIZONTAL_SITING_HINT_EXT 0x327D
+#define EGL_YUV_CHROMA_VERTICAL_SITING_HINT_EXT 0x327E
+#define EGL_ITU_REC601_EXT 0x327F
+#define EGL_ITU_REC709_EXT 0x3280
+#define EGL_ITU_REC2020_EXT0x3281
+#define EGL_YUV_FULL_RANGE_EXT 0x3282
+#define EGL_YUV_NARROW_RANGE_EXT   0x3283
+#define EGL_YUV_CHROMA_SITING_0_EXT0x3284
+#define EGL_YUV_CHROMA_SITING_0_5_EXT  0x3285
+#endif
+
+#ifndef EGL_ARM_pixmap_multisample_discard
+#define EGL_ARM_pixmap_multisample_discard 1
+#define EGL_DISCARD_SAMPLES_ARM0x3286
+#endif
+
+#ifndef EGL_EXT_swap_buffers_with_damage
+#define EGL_EXT_swap_buffers_with_damage 1
+#ifdef EGL_EGLEXT_PROTOTYPES
+EGLAPI EGLBoolean EGLAPIENTRY eglSwapBuffersWithDamageEXT( EGLDisplay dpy, 
EGLSurface surface, EGLint *rects, EGLint n_rects);
+#endif /* EGL_EGLEXT_PROTOTYPES */
+typedef EGLBoolean (EGLAPIENTRYP 
PFNEGLSWAPBUFFERSWITHDAMAGEEXTPROC)(EGLDisplay dpy, EGLSurface surface, EGLint 
*rects, EGLint n_rects);
+#endif
+
 #include EGL/eglmesaext.h
 
 #ifdef __cplusplus
-- 
1.8.2.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 3/3] egl/wayland: Implement EGL_EXT_swap_buffers_with_damage

2013-04-25 Thread Robert Bragg
---
 src/egl/drivers/dri2/platform_wayland.c | 27 ---
 1 file changed, 24 insertions(+), 3 deletions(-)

diff --git a/src/egl/drivers/dri2/platform_wayland.c 
b/src/egl/drivers/dri2/platform_wayland.c
index e9a66af..8c12dd8 100644
--- a/src/egl/drivers/dri2/platform_wayland.c
+++ b/src/egl/drivers/dri2/platform_wayland.c
@@ -442,7 +442,11 @@ static const struct wl_callback_listener frame_listener = {
  * Called via eglSwapBuffers(), drv-API.SwapBuffers().
  */
 static EGLBoolean
-dri2_swap_buffers(_EGLDriver *drv, _EGLDisplay *disp, _EGLSurface *draw)
+dri2_swap_buffers_with_damage(_EGLDriver *drv,
+  _EGLDisplay *disp,
+  _EGLSurface *draw,
+  const EGLint *rects,
+  EGLint n_rects)
 {
struct dri2_egl_display *dri2_dpy = dri2_egl_display(disp);
struct dri2_egl_surface *dri2_surf = dri2_egl_surface(draw);
@@ -491,8 +495,13 @@ dri2_swap_buffers(_EGLDriver *drv, _EGLDisplay *disp, 
_EGLSurface *draw)
dri2_surf-dx = 0;
dri2_surf-dy = 0;
 
-   wl_surface_damage(dri2_surf-wl_win-surface, 0, 0,
- dri2_surf-base.Width, dri2_surf-base.Height);
+   for (i = 0; i  n_rects; i++) {
+  const int *rect = rects[i * 4];
+  wl_surface_damage(dri2_surf-wl_win-surface,
+rect[0],
+dri2_surf-base.Height - rect[1] - rect[3],
+rect[2], rect[3]);
+   }
 
wl_surface_commit(dri2_surf-wl_win-surface);
 
@@ -517,6 +526,15 @@ dri2_query_buffer_age(_EGLDriver *drv,
return dri2_surf-back-age;
 }
 
+static EGLBoolean
+dri2_swap_buffers(_EGLDriver *drv, _EGLDisplay *disp, _EGLSurface *draw)
+{
+   struct dri2_egl_surface *dri2_surf = dri2_egl_surface(draw);
+   EGLint damage[4] = { 0, 0, dri2_surf-base.Width, dri2_surf-base.Height };
+
+   return dri2_swap_buffers_with_damage (drv, disp, draw, damage, 1);
+}
+
 static int
 dri2_wayland_authenticate(_EGLDisplay *disp, uint32_t id)
 {
@@ -653,6 +671,7 @@ dri2_initialize_wayland(_EGLDriver *drv, _EGLDisplay *disp)
drv-API.CreateWindowSurface = dri2_create_window_surface;
drv-API.DestroySurface = dri2_destroy_surface;
drv-API.SwapBuffers = dri2_swap_buffers;
+   drv-API.SwapBuffersWithDamageEXT = dri2_swap_buffers_with_damage;
drv-API.Terminate = dri2_terminate;
drv-API.QueryBufferAge = dri2_query_buffer_age;
 
@@ -722,6 +741,8 @@ dri2_initialize_wayland(_EGLDriver *drv, _EGLDisplay *disp)
disp-Extensions.EXT_buffer_age = EGL_TRUE;
dri2_dpy-authenticate = dri2_wayland_authenticate;
 
+   disp-Extensions.EXT_swap_buffers_with_damage = EGL_TRUE;
+
/* we're supporting EGL 1.4 */
disp-VersionMajor = 1;
disp-VersionMinor = 4;
-- 
1.8.2.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v2] egl/wayland: Implement EGL_EXT_swap_buffers_with_damage

2013-04-25 Thread Robert Bragg
I've updated this patch to handle the case where 0 rectangles have been given
and updated the eglSwapBuffers shim to rely on that. Thanks to Eric for
noticing this.

-- 8 --

Reviewed-by: Eric Anholt e...@anholt.net

---
 src/egl/drivers/dri2/platform_wayland.c | 29 ++---
 1 file changed, 26 insertions(+), 3 deletions(-)

diff --git a/src/egl/drivers/dri2/platform_wayland.c 
b/src/egl/drivers/dri2/platform_wayland.c
index e9a66af..f7c43e1 100644
--- a/src/egl/drivers/dri2/platform_wayland.c
+++ b/src/egl/drivers/dri2/platform_wayland.c
@@ -442,7 +442,11 @@ static const struct wl_callback_listener frame_listener = {
  * Called via eglSwapBuffers(), drv-API.SwapBuffers().
  */
 static EGLBoolean
-dri2_swap_buffers(_EGLDriver *drv, _EGLDisplay *disp, _EGLSurface *draw)
+dri2_swap_buffers_with_damage(_EGLDriver *drv,
+  _EGLDisplay *disp,
+  _EGLSurface *draw,
+  const EGLint *rects,
+  EGLint n_rects)
 {
struct dri2_egl_display *dri2_dpy = dri2_egl_display(disp);
struct dri2_egl_surface *dri2_surf = dri2_egl_surface(draw);
@@ -491,8 +495,18 @@ dri2_swap_buffers(_EGLDriver *drv, _EGLDisplay *disp, 
_EGLSurface *draw)
dri2_surf-dx = 0;
dri2_surf-dy = 0;
 
-   wl_surface_damage(dri2_surf-wl_win-surface, 0, 0,
- dri2_surf-base.Width, dri2_surf-base.Height);
+   if (n_rects == 0) {
+  wl_surface_damage(dri2_surf-wl_win-surface, 0, 0,
+dri2_surf-base.Width, dri2_surf-base.Height);
+   } else {
+  for (i = 0; i  n_rects; i++) {
+ const int *rect = rects[i * 4];
+ wl_surface_damage(dri2_surf-wl_win-surface,
+   rect[0],
+   dri2_surf-base.Height - rect[1] - rect[3],
+   rect[2], rect[3]);
+  }
+   }
 
wl_surface_commit(dri2_surf-wl_win-surface);
 
@@ -517,6 +531,12 @@ dri2_query_buffer_age(_EGLDriver *drv,
return dri2_surf-back-age;
 }
 
+static EGLBoolean
+dri2_swap_buffers(_EGLDriver *drv, _EGLDisplay *disp, _EGLSurface *draw)
+{
+   return dri2_swap_buffers_with_damage (drv, disp, draw, NULL, 0);
+}
+
 static int
 dri2_wayland_authenticate(_EGLDisplay *disp, uint32_t id)
 {
@@ -653,6 +673,7 @@ dri2_initialize_wayland(_EGLDriver *drv, _EGLDisplay *disp)
drv-API.CreateWindowSurface = dri2_create_window_surface;
drv-API.DestroySurface = dri2_destroy_surface;
drv-API.SwapBuffers = dri2_swap_buffers;
+   drv-API.SwapBuffersWithDamageEXT = dri2_swap_buffers_with_damage;
drv-API.Terminate = dri2_terminate;
drv-API.QueryBufferAge = dri2_query_buffer_age;
 
@@ -722,6 +743,8 @@ dri2_initialize_wayland(_EGLDriver *drv, _EGLDisplay *disp)
disp-Extensions.EXT_buffer_age = EGL_TRUE;
dri2_dpy-authenticate = dri2_wayland_authenticate;
 
+   disp-Extensions.EXT_swap_buffers_with_damage = EGL_TRUE;
+
/* we're supporting EGL 1.4 */
disp-VersionMajor = 1;
disp-VersionMinor = 4;
-- 
1.8.2.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] egl-util: Adds probe_front_pixel_rgb function

2012-10-02 Thread Robert Bragg
On Thu, Sep 27, 2012 at 6:48 PM, Matt Turner matts...@gmail.com wrote:
 On Thu, Sep 20, 2012 at 8:59 AM, Robert Bragg rob...@sixbynine.org wrote:
 This adds an egl_probe_front_pixel_rgb function that is analogous to
 piglit_probe_pixel_rgba except it probes the front buffer instead of
 probing the back buffer.
 ---
  tests/egl/egl-util.c |   31 +++
  tests/egl/egl-util.h |4 
  2 files changed, 35 insertions(+), 0 deletions(-)

 I applied these two patches with the intention of committing them, but
 they cause egl-nok-swap-region to fail:

 Front Buffer Probe at (15,15)
   Expected: 1.00 0.00 0.00 1.00
   Observed: 0.00 1.00 0.00 0.00
 PIGLIT: {'result': 'fail' }

 The test (before and after) doesn't end on its own either. I have to
 close it in order to get the test result. I don't know if that's by
 design, but it's not good for automated piglit testing.

Hmm that's curious, I just rebased the patches on master, as well as
the corresponding mesa patch and the test works as expected for me. Is
it failing for you because you haven't got the mesa patch perhaps? (I
don't have commit access to land that myself) As for the test
blocking, I haven't seen that happen before so I don't know what could
be up there, but I guess it's not related to my changes.

kind regards,
- Robert
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/2] egl-util: Adds probe_front_pixel_rgb function

2012-09-20 Thread Robert Bragg
This adds an egl_probe_front_pixel_rgb function that is analogous to
piglit_probe_pixel_rgba except it probes the front buffer instead of
probing the back buffer.
---
 tests/egl/egl-util.c |   30 ++
 tests/egl/egl-util.h |4 
 2 files changed, 34 insertions(+), 0 deletions(-)

diff --git a/tests/egl/egl-util.c b/tests/egl/egl-util.c
index 20cd699..e26add4 100644
--- a/tests/egl/egl-util.c
+++ b/tests/egl/egl-util.c
@@ -37,6 +37,36 @@ static int automatic;
 
 int depth;
 
+int
+egl_probe_front_pixel_rgb(struct egl_state *state,
+ int x, int y, const float *expected)
+{
+   XImage *ximage = XGetImage(state-dpy, state-win,
+  x, state-height - y - 1, 1, 1, AllPlanes, 
ZPixmap);
+   unsigned long pixel = XGetPixel(ximage, 0, 0);
+   uint8_t *probe = (uint8_t *)pixel;
+   int pass = 1;
+
+   XDestroyImage(ximage);
+
+   /* NB: XGetPixel returns a normalized BGRA, byte per
+* component, pixel format */
+   if (probe[2] != expected[0]*255 ||
+   probe[1] != expected[1]*255 ||
+   probe[0] != expected[2]*255) {
+   pass = 0;
+   }
+
+   if (pass)
+   return 1;
+
+   printf(Front Buffer Probe at (%i,%i)\n, x, y);
+   printf(  Expected: %f %f %f %f\n, expected[0], expected[1], 
expected[2], expected[3]);
+   printf(  Observed: %f %f %f %f\n, probe[0]/255.0, probe[1]/255.0, 
probe[2]/255.0, probe[3]/255.0);
+
+   return 0;
+}
+
 void
 egl_init_test(struct egl_test *test)
 {
diff --git a/tests/egl/egl-util.h b/tests/egl/egl-util.h
index 87c2db3..e1caa94 100644
--- a/tests/egl/egl-util.h
+++ b/tests/egl/egl-util.h
@@ -62,4 +62,8 @@ egl_util_create_pixmap(struct egl_state *state,
 
 int egl_util_run(const struct egl_test *test, int argc, char *argv[]);
 
+int
+egl_probe_front_pixel_rgb(struct egl_state *state,
+ int x, int y, const float *expected);
+
 #endif
-- 
1.7.7.6

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/2] egl-nok-swap-region: probe front not back buffer

2012-09-20 Thread Robert Bragg
The egl-nok-swap-buffer was mistakenly reading the back buffer to test
the correctness of eglSwapRegionNOK but once the second glClear() to red
has been issued the whole back buffer is red and so the test would
always passed since it only probed points it expected to be red anyway.

This patch now uses egl_probe_front_pixel_rgb to read the front buffer
instead and in addition to checking that certain pixels should be red,
it also checks that other pixels are still green.
---
 tests/egl/egl-nok-swap-region.c |   39 +--
 1 files changed, 33 insertions(+), 6 deletions(-)

diff --git a/tests/egl/egl-nok-swap-region.c b/tests/egl/egl-nok-swap-region.c
index 9124161..dad2750 100644
--- a/tests/egl/egl-nok-swap-region.c
+++ b/tests/egl/egl-nok-swap-region.c
@@ -40,12 +40,35 @@ draw(struct egl_state *state)
 {
EGLint rects[] = { 
10, 10, 10, 10, 
-   20, 20, 20, 10,
-   40, 30, 10, 20,
+   20, 20, 20, 10, /* wide rect */
+   40, 30, 10, 20, /* tall rect */
50, 50, 10, 10
};
-   PFNEGLSWAPBUFFERSREGIONNOK swap_buffers_region;
+   float green[] = { 0.0, 1.0, 0.0, 1.0};
float red[] = { 1.0, 0.0, 0.0, 1.0};
+   struct {
+   int x, y;
+   const float *expected;
+   } probes[] = {
+   { 15, 15, red },
+   { 15, state-height - 15, green },
+
+   { 25, 25, red },
+   { 35, 25, red },
+   { 25, 35, green },
+   { 25, state-height - 25, green },
+
+   { 45, 35, red },
+   { 45, 45, red },
+   { 55, 35, green },
+   { 45, state-height - 35, green },
+
+   { 55, 55, red },
+   { 55, state-height - 55, green },
+
+   { 0, 0, NULL }
+   };
+   PFNEGLSWAPBUFFERSREGIONNOK swap_buffers_region;
int i;
 
swap_buffers_region = (PFNEGLSWAPBUFFERSREGIONNOK) 
@@ -64,9 +87,13 @@ draw(struct egl_state *state)
glClear(GL_COLOR_BUFFER_BIT);
swap_buffers_region(state-egl_dpy, state-surf, 4, rects);
 
-   for (i = 0; i  16; i += 4)
-   if (!piglit_probe_pixel_rgba(rects[i] + 5,
-rects[i + 1] + 5, red))
+   glFinish();
+
+   for (i = 0; probes[i].expected; i ++)
+   if (!egl_probe_front_pixel_rgb(state,
+  probes[i].x,
+  probes[i].y,
+  probes[i].expected))
return PIGLIT_FAIL;
 
return PIGLIT_PASS;
-- 
1.7.7.6

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/2] egl-nok-swap-region: probe front not back buffer

2012-09-20 Thread Robert Bragg
Ah, I only just realized there is a separate list for piglit tests,
sorry. Hopefully it's ok to review these here this time, but I'll
remember to send piglit patches to the right list in the future.

regards,
- Robert

On Thu, Sep 20, 2012 at 2:21 PM, Robert Bragg rob...@sixbynine.org wrote:
 The egl-nok-swap-buffer was mistakenly reading the back buffer to test
 the correctness of eglSwapRegionNOK but once the second glClear() to red
 has been issued the whole back buffer is red and so the test would
 always passed since it only probed points it expected to be red anyway.

 This patch now uses egl_probe_front_pixel_rgb to read the front buffer
 instead and in addition to checking that certain pixels should be red,
 it also checks that other pixels are still green.
 ---
  tests/egl/egl-nok-swap-region.c |   39 
 +--
  1 files changed, 33 insertions(+), 6 deletions(-)

 diff --git a/tests/egl/egl-nok-swap-region.c b/tests/egl/egl-nok-swap-region.c
 index 9124161..dad2750 100644
 --- a/tests/egl/egl-nok-swap-region.c
 +++ b/tests/egl/egl-nok-swap-region.c
 @@ -40,12 +40,35 @@ draw(struct egl_state *state)
  {
 EGLint rects[] = {
 10, 10, 10, 10,
 -   20, 20, 20, 10,
 -   40, 30, 10, 20,
 +   20, 20, 20, 10, /* wide rect */
 +   40, 30, 10, 20, /* tall rect */
 50, 50, 10, 10
 };
 -   PFNEGLSWAPBUFFERSREGIONNOK swap_buffers_region;
 +   float green[] = { 0.0, 1.0, 0.0, 1.0};
 float red[] = { 1.0, 0.0, 0.0, 1.0};
 +   struct {
 +   int x, y;
 +   const float *expected;
 +   } probes[] = {
 +   { 15, 15, red },
 +   { 15, state-height - 15, green },
 +
 +   { 25, 25, red },
 +   { 35, 25, red },
 +   { 25, 35, green },
 +   { 25, state-height - 25, green },
 +
 +   { 45, 35, red },
 +   { 45, 45, red },
 +   { 55, 35, green },
 +   { 45, state-height - 35, green },
 +
 +   { 55, 55, red },
 +   { 55, state-height - 55, green },
 +
 +   { 0, 0, NULL }
 +   };
 +   PFNEGLSWAPBUFFERSREGIONNOK swap_buffers_region;
 int i;

 swap_buffers_region = (PFNEGLSWAPBUFFERSREGIONNOK)
 @@ -64,9 +87,13 @@ draw(struct egl_state *state)
 glClear(GL_COLOR_BUFFER_BIT);
 swap_buffers_region(state-egl_dpy, state-surf, 4, rects);

 -   for (i = 0; i  16; i += 4)
 -   if (!piglit_probe_pixel_rgba(rects[i] + 5,
 -rects[i + 1] + 5, red))
 +   glFinish();
 +
 +   for (i = 0; probes[i].expected; i ++)
 +   if (!egl_probe_front_pixel_rgb(state,
 +  probes[i].x,
 +  probes[i].y,
 +  probes[i].expected))
 return PIGLIT_FAIL;

 return PIGLIT_PASS;
 --
 1.7.7.6

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/2] egl-util: Adds probe_front_pixel_rgb function

2012-09-20 Thread Robert Bragg
On Thu, Sep 20, 2012 at 3:34 PM, Brian Paul bri...@vmware.com wrote:
 On 09/20/2012 07:21 AM, Robert Bragg wrote:

 This adds an egl_probe_front_pixel_rgb function that is analogous to
 piglit_probe_pixel_rgba except it probes the front buffer instead of
 probing the back buffer.
 ---
   tests/egl/egl-util.c |   30 ++
   tests/egl/egl-util.h |4 
   2 files changed, 34 insertions(+), 0 deletions(-)

 diff --git a/tests/egl/egl-util.c b/tests/egl/egl-util.c
 index 20cd699..e26add4 100644
 --- a/tests/egl/egl-util.c
 +++ b/tests/egl/egl-util.c
 @@ -37,6 +37,36 @@ static int automatic;

   int depth;

 +int
 +egl_probe_front_pixel_rgb(struct egl_state *state,
 + int x, int y, const float *expected)
 +{
 +   XImage *ximage = XGetImage(state-dpy, state-win,
 +  x, state-height - y - 1, 1, 1,
 AllPlanes, ZPixmap);
 +   unsigned long pixel = XGetPixel(ximage, 0, 0);
 +   uint8_t *probe = (uint8_t *)pixel;
 +   int pass = 1;
 +
 +   XDestroyImage(ximage);
 +
 +   /* NB: XGetPixel returns a normalized BGRA, byte per
 +* component, pixel format */


 Is that always true?  Don't you need to examine the window's XVisualInfo's
 red/green/blue_mask vales to determine the position of the components in the
 ulong?

Yeah *maybe* you need to check the visual mask values to confirm the
order I'm not 100% sure. I didn't quite fully decipher how much
XGetPixel does for you. Certainly it tries to avoid you needing to
handle the full myriad of XImage pixel formats and stepping through
the code I could at least see it was reordering the components
depending on ximage-byte_order, but maybe the component order isn't
normalized.


 In any case, 8-bit BGRA can probably be safely assumed for now.



 +   if (probe[2] != expected[0]*255 ||
 +   probe[1] != expected[1]*255 ||
 +   probe[0] != expected[2]*255) {
 +   pass = 0;
 +   }


 The other pixel probing functions in piglit use fabs(probe - expected) 
 piglit_tolerance when comparing actual/expected values. Shouldn't you do
 that here too?

Yeah I did see that, and initially figured it'd be ok to assume that
for just clearing the framebuffer to red/green and reading that back
to compare at byte per component precision I could get away without
it.

Since this api could be useful for other things though, it could be
nice if piglit_set_tolerance_for_bits() affected it in the same way as
the piglit_probe_ functions.

thanks for the feedback, I'll send out an updated patch in a bit.

regards,
- Robert
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] egl-util: Adds probe_front_pixel_rgb function

2012-09-20 Thread Robert Bragg
This adds an egl_probe_front_pixel_rgb function that is analogous to
piglit_probe_pixel_rgba except it probes the front buffer instead of
probing the back buffer.
---
 tests/egl/egl-util.c |   31 +++
 tests/egl/egl-util.h |4 
 2 files changed, 35 insertions(+), 0 deletions(-)

diff --git a/tests/egl/egl-util.c b/tests/egl/egl-util.c
index 20cd699..4f06cab 100644
--- a/tests/egl/egl-util.c
+++ b/tests/egl/egl-util.c
@@ -37,6 +37,37 @@ static int automatic;
 
 int depth;
 
+int
+egl_probe_front_pixel_rgb(struct egl_state *state,
+ int x, int y, const float *expected)
+{
+   XImage *ximage = XGetImage(state-dpy, state-win,
+  x, state-height - y - 1, 1, 1, AllPlanes, 
ZPixmap);
+   unsigned long pixel = XGetPixel(ximage, 0, 0);
+   uint8_t *probe = (uint8_t *)pixel;
+   int pass = 1;
+   int i;
+
+   XDestroyImage(ximage);
+
+   /* NB: XGetPixel returns a normalized BGRA, byte per
+* component, pixel format */
+   for(i = 0; i  3; ++i) {
+   if (fabs(probe[2 - i]/255.0 - expected[i])  
piglit_tolerance[i]) {
+   pass = 0;
+   }
+   }
+
+   if (pass)
+   return 1;
+
+   printf(Front Buffer Probe at (%i,%i)\n, x, y);
+   printf(  Expected: %f %f %f %f\n, expected[0], expected[1], 
expected[2], expected[3]);
+   printf(  Observed: %f %f %f %f\n, probe[0]/255.0, probe[1]/255.0, 
probe[2]/255.0, probe[3]/255.0);
+
+   return 0;
+}
+
 void
 egl_init_test(struct egl_test *test)
 {
diff --git a/tests/egl/egl-util.h b/tests/egl/egl-util.h
index 87c2db3..e1caa94 100644
--- a/tests/egl/egl-util.h
+++ b/tests/egl/egl-util.h
@@ -62,4 +62,8 @@ egl_util_create_pixmap(struct egl_state *state,
 
 int egl_util_run(const struct egl_test *test, int argc, char *argv[]);
 
+int
+egl_probe_front_pixel_rgb(struct egl_state *state,
+ int x, int y, const float *expected);
+
 #endif
-- 
1.7.7.6

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] egl-nok-swap-region: probe front not back buffer

2012-09-20 Thread Robert Bragg
The egl-nok-swap-buffer was mistakenly reading the back buffer to test
the correctness of eglSwapRegionNOK but once the second glClear() to red
has been issued the whole back buffer is red and so the test would
always passed since it only probed points it expected to be red anyway.

This patch now uses egl_probe_front_pixel_rgb to read the front buffer
instead and in addition to checking that certain pixels should be red,
it also checks that other pixels are still green.
---

I've tweaked the style of this patch, compared to my last, to use the
ARRAY_SIZE macro instead of a sentinel array, as suggested by Brian.

 tests/egl/egl-nok-swap-region.c |   37 +++--
 1 files changed, 31 insertions(+), 6 deletions(-)

diff --git a/tests/egl/egl-nok-swap-region.c b/tests/egl/egl-nok-swap-region.c
index 9124161..b32bec9 100644
--- a/tests/egl/egl-nok-swap-region.c
+++ b/tests/egl/egl-nok-swap-region.c
@@ -40,12 +40,33 @@ draw(struct egl_state *state)
 {
EGLint rects[] = { 
10, 10, 10, 10, 
-   20, 20, 20, 10,
-   40, 30, 10, 20,
+   20, 20, 20, 10, /* wide rect */
+   40, 30, 10, 20, /* tall rect */
50, 50, 10, 10
};
-   PFNEGLSWAPBUFFERSREGIONNOK swap_buffers_region;
+   float green[] = { 0.0, 1.0, 0.0, 1.0};
float red[] = { 1.0, 0.0, 0.0, 1.0};
+   struct {
+   int x, y;
+   const float *expected;
+   } probes[] = {
+   { 15, 15, red },
+   { 15, state-height - 15, green },
+
+   { 25, 25, red },
+   { 35, 25, red },
+   { 25, 35, green },
+   { 25, state-height - 25, green },
+
+   { 45, 35, red },
+   { 45, 45, red },
+   { 55, 35, green },
+   { 45, state-height - 35, green },
+
+   { 55, 55, red },
+   { 55, state-height - 55, green },
+   };
+   PFNEGLSWAPBUFFERSREGIONNOK swap_buffers_region;
int i;
 
swap_buffers_region = (PFNEGLSWAPBUFFERSREGIONNOK) 
@@ -64,9 +85,13 @@ draw(struct egl_state *state)
glClear(GL_COLOR_BUFFER_BIT);
swap_buffers_region(state-egl_dpy, state-surf, 4, rects);
 
-   for (i = 0; i  16; i += 4)
-   if (!piglit_probe_pixel_rgba(rects[i] + 5,
-rects[i + 1] + 5, red))
+   glFinish();
+
+   for (i = 0; i  ARRAY_SIZE(probes); i++)
+   if (!egl_probe_front_pixel_rgb(state,
+  probes[i].x,
+  probes[i].y,
+  probes[i].expected))
return PIGLIT_FAIL;
 
return PIGLIT_PASS;
-- 
1.7.7.6

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] SwapBuffersRegionNOK: invert rectangles on y axis

2012-09-19 Thread Robert Bragg
On Tue, Sep 18, 2012 at 5:35 PM, Matt Turner matts...@gmail.com wrote:
 On Tue, Sep 18, 2012 at 8:10 AM, Robert Bragg rob...@sixbynine.org wrote:
 From: Robert Bragg rob...@linux.intel.com

 The EGL_NOK_swap_region2 spec states that the rectangles are specified
 with a bottom-left origin within a surface coordinate space also with a
 bottom left origin, so this patch ensures the rectangles are flipped
 before passing them on to dri2_copy_region.
 ---
  src/egl/drivers/dri2/platform_x11.c |3 +--
  1 files changed, 1 insertions(+), 2 deletions(-)

 diff --git a/src/egl/drivers/dri2/platform_x11.c 
 b/src/egl/drivers/dri2/platform_x11.c
 index 2533774..44e4373 100644
 --- a/src/egl/drivers/dri2/platform_x11.c
 +++ b/src/egl/drivers/dri2/platform_x11.c
 @@ -777,10 +777,9 @@ dri2_swap_buffers_region(_EGLDriver *drv, _EGLDisplay 
 *disp, _EGLSurface *draw,
 if (numRects  (int)ARRAY_SIZE(rectangles))
return dri2_copy_region(drv, disp, draw, dri2_surf-region);

 -   /* FIXME: Invert y here? */
 for (i = 0; i  numRects; i++) {
rectangles[i].x = rects[i * 4];
 -  rectangles[i].y = rects[i * 4 + 1];
 +  rectangles[i].y = dri2_surf-base.Height - rects[i * 4 + 1] - rects[i 
 * 4 + 3];
rectangles[i].width = rects[i * 4 + 2];
rectangles[i].height = rects[i * 4 + 3];
 }
 --
 1.7.7.6

 ___
 mesa-dev mailing list
 mesa-dev@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/mesa-dev

 Does this fix the egl-nok-texture-from-pixmap piglit test?

 I notice that when I run it it says:

 EGL_Y_INVERTED_NOK: FALSE
 Probe at (50,50)
   Expected: 0.50 0.00 0.50 1.00
   Observed: 0.00 0.00 0.00 1.00

No, I'm afraid it doesn't interact with this so this extension and
this test continue to fail for me with or without this patch.

kind regards,
- Robert
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] build: substitute X11_INCLUDES variable

2012-09-19 Thread Robert Bragg
There are a few automake files that reference $(X11_INCLUDES) such as
src/glx/Makefile.am but configure.ac wasn't declaring the variable for
substitution. This would break builds of glx if libxcb, for example, was
installed in its own prefix since AM_CFLAGS wouldn't coincidentally
list the needed include path in that case.
---
 configure.ac |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/configure.ac b/configure.ac
index 4193496..ef2455b 100644
--- a/configure.ac
+++ b/configure.ac
@@ -898,6 +898,7 @@ GLESv1_CM_PC_LIB_PRIV=-lm -lpthread $DLOPEN_LIBS
 GLESv2_LIB_DEPS=$LIBDRM_LIBS -lm -lpthread $DLOPEN_LIBS
 GLESv2_PC_LIB_PRIV=-lm -lpthread $DLOPEN_LIBS
 
+AC_SUBST([X11_INCLUDES])
 AC_SUBST([GL_LIB_DEPS])
 AC_SUBST([GL_PC_REQ_PRIV])
 AC_SUBST([GL_PC_LIB_PRIV])
-- 
1.7.7.6

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] build: substitute X11_INCLUDES variable

2012-09-19 Thread Robert Bragg
On Wed, Sep 19, 2012 at 5:26 PM, Matt Turner matts...@gmail.com wrote:
 On Wed, Sep 19, 2012 at 8:12 AM, Robert Bragg rob...@sixbynine.org wrote:
 There are a few automake files that reference $(X11_INCLUDES) such as
 src/glx/Makefile.am but configure.ac wasn't declaring the variable for
 substitution. This would break builds of glx if libxcb, for example, was
 installed in its own prefix since AM_CFLAGS wouldn't coincidentally
 list the needed include path in that case.
 ---
  configure.ac |1 +
  1 files changed, 1 insertions(+), 0 deletions(-)

 diff --git a/configure.ac b/configure.ac
 index 4193496..ef2455b 100644
 --- a/configure.ac
 +++ b/configure.ac
 @@ -898,6 +898,7 @@ GLESv1_CM_PC_LIB_PRIV=-lm -lpthread $DLOPEN_LIBS
  GLESv2_LIB_DEPS=$LIBDRM_LIBS -lm -lpthread $DLOPEN_LIBS
  GLESv2_PC_LIB_PRIV=-lm -lpthread $DLOPEN_LIBS

 +AC_SUBST([X11_INCLUDES])
  AC_SUBST([GL_LIB_DEPS])
  AC_SUBST([GL_PC_REQ_PRIV])
  AC_SUBST([GL_PC_LIB_PRIV])
 --
 1.7.7.6

 Reviewed-by: Matt Turner matts...@gmail.com

 Do you need me to commit this?

yes please
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] SwapBuffersRegionNOK: invert rectangles on y axis

2012-09-18 Thread Robert Bragg
From: Robert Bragg rob...@linux.intel.com

The EGL_NOK_swap_region2 spec states that the rectangles are specified
with a bottom-left origin within a surface coordinate space also with a
bottom left origin, so this patch ensures the rectangles are flipped
before passing them on to dri2_copy_region.
---
 src/egl/drivers/dri2/platform_x11.c |3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/src/egl/drivers/dri2/platform_x11.c 
b/src/egl/drivers/dri2/platform_x11.c
index 2533774..44e4373 100644
--- a/src/egl/drivers/dri2/platform_x11.c
+++ b/src/egl/drivers/dri2/platform_x11.c
@@ -777,10 +777,9 @@ dri2_swap_buffers_region(_EGLDriver *drv, _EGLDisplay 
*disp, _EGLSurface *draw,
if (numRects  (int)ARRAY_SIZE(rectangles))
   return dri2_copy_region(drv, disp, draw, dri2_surf-region);
 
-   /* FIXME: Invert y here? */
for (i = 0; i  numRects; i++) {
   rectangles[i].x = rects[i * 4];
-  rectangles[i].y = rects[i * 4 + 1];
+  rectangles[i].y = dri2_surf-base.Height - rects[i * 4 + 1] - rects[i * 
4 + 3];
   rectangles[i].width = rects[i * 4 + 2];
   rectangles[i].height = rects[i * 4 + 3];
}
-- 
1.7.7.6

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] glx/dri2: use GLX prefix to enable INTEL_swap_event

2012-02-08 Thread Robert Bragg
This adds the GLX_ prefix to the string we pass to
__glXEnableDirectExtension() otherwise it doesn't match the name we have
in known_glx_extensions[] in glxextensions.c and doesn't get enabled.

This mistake wasn't noticed before since GLX_INTEL_swap_event was being
implicitly enabled for GLX v1.4 but that was changed in commit
d3f7597bc9f6d5

Signed-off-by: Robert Bragg rob...@sixbynine.org
---
 src/glx/dri2_glx.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/src/glx/dri2_glx.c b/src/glx/dri2_glx.c
index 10b6f52..1fd8c99 100644
--- a/src/glx/dri2_glx.c
+++ b/src/glx/dri2_glx.c
@@ -950,7 +950,7 @@ dri2BindExtensions(struct dri2_screen *psc, const 
__DRIextension **extensions)
__glXEnableDirectExtension(psc-base, GLX_SGI_make_current_read);
 
/* FIXME: if DRI2 version supports it... */
-   __glXEnableDirectExtension(psc-base, INTEL_swap_event);
+   __glXEnableDirectExtension(psc-base, GLX_INTEL_swap_event);
 
if (psc-dri2-base.version = 3) {
   const unsigned mask = psc-dri2-getAPIMask(psc-driScreen);
-- 
1.7.7.5

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] glx/dri2: use GLX prefix to enable INTEL_swap_event

2012-02-08 Thread Robert Bragg
On Feb 8, 2012 8:38 PM, Chris Wilson ch...@chris-wilson.co.uk wrote:

 On Wed,  8 Feb 2012 19:49:54 +, Robert Bragg rob...@sixbynine.org
wrote:
  This adds the GLX_ prefix to the string we pass to
  __glXEnableDirectExtension() otherwise it doesn't match the name we have
  in known_glx_extensions[] in glxextensions.c and doesn't get enabled.
 
  This mistake wasn't noticed before since GLX_INTEL_swap_event was being
  implicitly enabled for GLX v1.4 but that was changed in commit
  d3f7597bc9f6d5
 
  Signed-off-by: Robert Bragg rob...@sixbynine.org

 This sounds related to
 https://bugs.freedesktop.org/show_bug.cgi?id=43667
 -Chris

I don't think this will fix that bug, the driver they are using is
reporting the extension it's just that it has some odd startup behaviour.

kind regards,
- Robert


 --
 Chris Wilson, Intel Open Source Technology Centre
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev