Re: time for amber2 branch?

2024-06-21 Thread Ian Romanick

On 6/20/24 7:20 AM, Erik Faye-Lund wrote:

On Wed, 2024-06-19 at 10:33 -0400, Mike Blumenkrantz wrote:

In looking at the gallium tree, I'm wondering if it isn't time for a
second amber branch to prune some of the drivers that cause pain when
doing big tree updates:

* nv30
* r300
* r600
* lima
* virgl
* tegra
* ???

There's nothing stopping these drivers from continuing to develop in
an amber branch, but the risk of them being broken by other tree
refactorings is lowered, and then we are able to delete lots of
legacy code in the main branch.

Thoughts?


When we did Amber, we had a lot better reason to do so than "these
drivers cause pain when doing big tree updates". The maintenance burden
imposed by the drivers proposed for removal here is much, much smaller,
and doesn't really let us massively clean up things in a way comparable
to last time.


I was going to say basically the same thing.


I'm not convinced that this is a good idea. Most (if not all) of these
drivers are still useful, and several of them are actively maintained.
Pulling them out of main makes very little sense to me.

What exactly are you hoping to gain from this? If it's just that
they're old hardware with fewer capabilities, perhaps we can address the
problems from that in a different way, by (for instance) introducing a
"legacy hw" gallium layer, so legacy HW details don't have to leak
out into the rest of gallium...




Re: Replacing NIR with SPIR-V?

2022-01-24 Thread Ian Romanick

On 1/23/22 12:10 PM, Dave Airlie wrote:

On Sun, 23 Jan 2022 at 22:58, Abel Bernabeu wrote:


Yes, NIR has arrays and structs and nir_deref to deal with them but, by the time
you get into the back-end, all the nir_derefs are gone and you're left with
load/store messages with actual addresses (either a 64-bit memory address or an
index+offset pair for a bound resource).  Again, unless you're going to dump
straight into LLVM, you really don't want to handle that in your back-end
unless you really have to.



That is the thing: there is already a community maintained LLVM backend for 
RISC-V and I need to see how to get value from that effort. And that is a very 
typical scenario for new architectures. There is already an LLVM backend for a 
programmable device and someone asks: could you do some graphics around this 
without spending millions?


No.

If you want something useful, it's going to cost millions over the
lifetime of creating it. This stuff is hard, it needs engineers who
understand it and they usually have to be paid.

RISC-V as-is isn't going to make a good compute core for a GPU. I
don't think any of the implementations are the right design. As long
as people get sucked into thinking it might, there'll be millions
wasted. SIMT vs SIMD is just making SSE-512 type decisions or
recreating Intel Larrabee efforts. Nobody has made an effective GPU in
this fashion. You'd need someone to create a new GPU with its own
instruction set (maybe derived from RISC-V), but with its own
specialised compute core.

The alternate more tractable project is to possibly make sw rendering
(with llvmpipe) on RISC-V more palatable, but that's really just
optimising llvmpipe and the LLVM backend and maybe finding a few
instructions to enhance things. It might be possible to use a texture
unit to speed things up and really for software rendering and hw
rendering, memory bandwidth is a lot of the problem to solve.


For the love of all that is good in the world, no! :) That was my 
original master's project that I gave up on.


Executive summary: There's a reason GPUs have huge piles of 
fixed-function blocks.  It's the only way to get enough power 
efficiency, and power consumption (and the heat it generates) is *the* 
problem.



Dave.




Re: git and Marge troubles this week

2022-01-07 Thread Ian Romanick
Blarg. That all sounds awful.  I think (hope!) I speak for everyone when 
I say that we all appreciate your and daniels' efforts to keep this big 
piece of machinery working.


If the problems persist much longer, should we consider pushing out the 
22.0 branch point?


On 1/6/22 9:36 PM, Emma Anholt wrote:

As you've probably noticed, there have been issues with git access
this week.  The fd.o sysadmins are desperately trying to stay on
vacation because they do deserve a break, but have still been working
on the problem and a couple of solutions haven't worked out yet.
Hopefully we'll have some news soon.

Due to these ongoing git timeouts, our CI runners have been getting
bogged down with stalled jobs and causing a lot of spurious failures
where the pipeline doesn't get all its jobs assigned to runners before
Marge gives up.  Today, I asked daniels to bump Marge's pipeline
timeout to 4 hours (up from 1).  To get MRs flowing at a similar rate
despite the longer total pipeline times, we also enabled batch mode as
described at 
https://github.com/smarkets/marge-bot/blob/master/README.md#batching-merge-requests.

It means there are now theoretical cases as described in the README
where Marge might merge a set of code that leaves main broken.
However, those cases are pretty obscure, and I expect that failure
rate to be much lower than the existing "you can merge flaky code"
failure rate and worth the risk.

Hopefully this gets us all productive again.


[Mesa-dev] One more thing to cut from the main branch...

2021-04-27 Thread Ian Romanick
If we're going to cut all the classic drivers and a handful of older
Gallium drivers... can we also cut Apple GLX?  Apple comes around every
couple years to fix breakages that have crept in, and we periodically
have compile breaks that need fixing (see
https://gitlab.freedesktop.org/mesa/mesa/-/issues/4702).  As far as I
can tell, having it in the main branch provides zero value to anyone...
including Apple.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v3 3/4] drm/i915/uapi: convert i915_query and friend to kernel doc

2021-04-15 Thread Ian Romanick
On 4/15/21 8:59 AM, Matthew Auld wrote:
> Add a note about the two-step process.
> 
> Suggested-by: Daniel Vetter 
> Signed-off-by: Matthew Auld 
> Cc: Joonas Lahtinen 
> Cc: Jordan Justen 
> Cc: Daniel Vetter 
> Cc: Kenneth Graunke 
> Cc: Jason Ekstrand 
> Cc: Dave Airlie 
> Cc: dri-de...@lists.freedesktop.org
> Cc: mesa-dev@lists.freedesktop.org
> ---
>  include/uapi/drm/i915_drm.h | 57 ++---
>  1 file changed, 46 insertions(+), 11 deletions(-)
> 
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index d9c954a5a456..ef36f1a0adde 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -2210,14 +2210,23 @@ struct drm_i915_perf_oa_config {
>   __u64 flex_regs_ptr;
>  };
>  
> +/**
> + * struct drm_i915_query_item - An individual query for the kernel to process.
> + *
> + * The behaviour is determined by the @query_id. Note that exactly what

Since we just had a big discussion about this on mesa-dev w.r.t. Mesa
code and documentation... does the kernel have a policy about which
flavor (pun intended) of English should be used?

> + * @data_ptr is also depends on the specific @query_id.
> + */
>  struct drm_i915_query_item {
> + /** @query_id: The id for this query */
>   __u64 query_id;
>  #define DRM_I915_QUERY_TOPOLOGY_INFO    1
>  #define DRM_I915_QUERY_ENGINE_INFO   2
>  #define DRM_I915_QUERY_PERF_CONFIG  3
>  /* Must be kept compact -- no holes and well documented */
>  
> - /*
> + /**
> +  * @length:
> +  *
>* When set to zero by userspace, this is filled with the size of the
>* data to be written at the data_ptr pointer. The kernel sets this
>* value to a negative value to signal an error on a particular query
> @@ -2225,21 +2234,26 @@ struct drm_i915_query_item {
>*/
>   __s32 length;
>  
> - /*
> + /**
> +  * @flags:
> +  *
>* When query_id == DRM_I915_QUERY_TOPOLOGY_INFO, must be 0.
>*
>* When query_id == DRM_I915_QUERY_PERF_CONFIG, must be one of the
> -  * following :
> -  * - DRM_I915_QUERY_PERF_CONFIG_LIST
> -  * - DRM_I915_QUERY_PERF_CONFIG_DATA_FOR_UUID
> -  * - DRM_I915_QUERY_PERF_CONFIG_FOR_UUID
> +  * following:
> +  *
> +  *  - DRM_I915_QUERY_PERF_CONFIG_LIST
> +  *  - DRM_I915_QUERY_PERF_CONFIG_DATA_FOR_UUID
> +  *  - DRM_I915_QUERY_PERF_CONFIG_FOR_UUID
>*/
>   __u32 flags;
>  #define DRM_I915_QUERY_PERF_CONFIG_LIST  1
>  #define DRM_I915_QUERY_PERF_CONFIG_DATA_FOR_UUID 2
>  #define DRM_I915_QUERY_PERF_CONFIG_DATA_FOR_ID   3
>  
> - /*
> + /**
> +  * @data_ptr:
> +  *
>* Data will be written at the location pointed by data_ptr when the
>* value of length matches the length of the data to be written by the
>* kernel.
> @@ -2247,16 +2261,37 @@ struct drm_i915_query_item {
>   __u64 data_ptr;
>  };
>  
> +/**
> + * struct drm_i915_query - Supply an array of drm_i915_query_item for the kernel
> + * to fill out.
> + *
> + * Note that this is generally a two step process for each drm_i915_query_item
> + * in the array:
> + *
> + *   1.) Call the DRM_IOCTL_I915_QUERY, giving it our array of
> + *   drm_i915_query_item, with drm_i915_query_item.size set to zero. The
> + *   kernel will then fill in the size, in bytes, which tells userspace how
> + *   memory it needs to allocate for the blob(say for an array of
> + *   properties).
> + *
> + *   2.) Next we call DRM_IOCTL_I915_QUERY again, this time with the
> + *   drm_i915_query_item.data_ptr equal to our newly allocated blob. Note
> + *   that the i915_query_item.size should still be the same as what the
> + *   kernel previously set. At this point the kernel can fill in the blob.
> + *
> + */
>  struct drm_i915_query {
> + /** @num_items: The number of elements in the @items_ptr array */
>   __u32 num_items;
>  
> - /*
> -  * Unused for now. Must be cleared to zero.
> + /**
> +  * @flags: Unused for now. Must be cleared to zero.
>*/
>   __u32 flags;
>  
> - /*
> -  * This points to an array of num_items drm_i915_query_item structures.
> + /**
> +  * @items_ptr: This points to an array of num_items drm_i915_query_item
> +  * structures.
>*/
>   __u64 items_ptr;
>  };
> 
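
The two-step process described in the patch's kernel-doc comment can be modelled with a small toy. This is only a sketch of the calling pattern, not the real uAPI: `fake_query_ioctl`, `QueryItem`, `FAKE_BLOBS`, and the payload are all hypothetical stand-ins, and the "length must match exactly" rule is taken from the comment text rather than from the kernel implementation.

```python
from dataclasses import dataclass
from typing import Optional

EINVAL = 22

# Hypothetical payloads keyed by query id; a real kernel builds these itself.
FAKE_BLOBS = {1: b"topology-info-blob"}


@dataclass
class QueryItem:
    """Toy stand-in for struct drm_i915_query_item."""
    query_id: int
    length: int = 0                     # 0 means "tell me the size"
    data: Optional[bytearray] = None    # stand-in for data_ptr


def fake_query_ioctl(items):
    """Toy model of the query ioctl: sizes on the first pass, data on the second."""
    for item in items:
        blob = FAKE_BLOBS.get(item.query_id)
        if blob is None:
            item.length = -EINVAL       # negative length signals a per-item error
        elif item.length == 0:
            item.length = len(blob)     # step 1: report the required size
        elif item.length == len(blob) and item.data is not None:
            item.data[:] = blob         # step 2: fill the caller's buffer
        else:
            item.length = -EINVAL


def query_blob(query_id):
    item = QueryItem(query_id)
    fake_query_ioctl([item])                    # step 1: length starts at zero
    if item.length < 0:
        raise OSError(-item.length, "query failed")
    item.data = bytearray(item.length)          # allocate what was asked for
    fake_query_ioctl([item])                    # step 2: same length, buffer set
    if item.length < 0:
        raise OSError(-item.length, "query failed")
    return bytes(item.data)
```

Calling `query_blob(1)` walks both steps for a single item; a real caller would typically batch several items into one array per call.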



Re: [Mesa-dev] [RFC] Concrete proposal to split classic

2021-03-30 Thread Ian Romanick
On 3/25/21 3:13 PM, Jason Ekstrand wrote:
> On Thu, Mar 25, 2021 at 4:32 PM Kenneth Graunke  wrote:
>>
>> On Thursday, March 25, 2021 2:04:45 PM PDT Ian Romanick wrote:
>>> On 3/25/21 10:49 AM, Jason Ekstrand wrote:
>>> Can you be more specific? Also, is there a reason why that work can't
>>> or shouldn't be done directly in the LTS branch?  As Ken pointed out,
>>
>> The bulk of things that I had going were to enable some extensions and
>> make those extensions non-optional.  ARB_framebuffer_object was at the
>> top of the list, but there were a couple of others.  I think ARB_fbo
>> also affected the Gallium nouveau driver.  Some of that was derailed
>> when I wasn't able to remove (classic) support for NV04 and NV05... and
>> I don't remember exactly where I left it.  I would expect some of those
>> kinds of changes to also happen in post-fork mainline too.
> 
> Hrm... That is a genuinely interesting extension on those platforms.
> From a "make it non-optional and dead-code" perspective, we can do
> that on mainline immediately post-fork easily enough.  Enabling it on
> legacy could be done separately.  If we wanted to also clean up the
> core on the legacy branch then, yeah, it'd have to be done twice.
> 
>>>> I'm not sure we want to totally declare those drivers dead. People can
>>>> still do feature or enhancement development if they want to, it just
>>>> happens in a different branch.
>>>>
>>>> I think we need to be more clear about what "LTS mode" means for
>>>> developers and users here. It isn't that they can never change or be
>>>> improved. It's that we've gotten to the point where what they're
>>>> getting from being in the active development tree is breakage, not
>>>> "free" improvements. We're suggesting changing the social contract
>>>> with users, so to speak, from "these drivers still pick up
>>>> improvements from core" to "we won't break these drivers when we make
>>>> improvements to core in master."
>>>
>>> That is interesting.  It doesn't sound very much like "long term stable"
>>> as in the original proposal.  What would the versioning scheme be?  In
>>> the original proposal, I would have expected versions go to 21.1.∞.  How
>>> would that work if some version adds a feature?  Would the versions (and
>>> branches from the branch?) follow the usual Mesa Year.Quarter.x scheme?
> 
> I think we should make a distinction between what users should expect
> and what devs are allowed to do.  It should be LTS from the
> perspective that users shouldn't expect new features.  Then again,
> they really shouldn't expect new features on those drivers anyway.
> ¯\_(ツ)_/¯
> 
>> That's a good point.  It's a bit expanded from "put out to pasture" so
>> maybe "-lts" isn't great.  We could call it "-classic", unless Marek
>> wants to include r300g in there too.  "-legacy" seems reasonable.  It
>> looks like NVIDIA has various "legacy" branches with a version number
>> based on the original version where things branched off from.
>>
>> So maybe we could do:
>>
>>- mesa-legacy21 21.1.X
>>
>> where X increments on every release without worrying whether it adds
>> features or simply bug fixes.  We figure true features and major
>> development will be pretty rare anyway.
> 
> Yeah, I don't know that we need to worry too much about stable vs.
> feature releases on the legacy branch.  If we still wanted a concept
> of major releases, we could slow it down to 1/year or something.
> Really, whatever makes it easiest for the release maintainers is fine
> with me.

Since I suspect Gentoo will be one of the few distros that carry this
branch for a long time, I asked Matt about this.  I think he's fine with
infrequent YY.MM or YY.QQ.xx type releases where the least significant
part is incremented for non-feature changes and the YY is incremented
every year regardless of the kind of change.

As long as I can continue to count on using the Intel CI from time to
time to test the legacy branch... I'm okay with splitting whenever.

>>>> So, unless there's a solid reason why such work needs to happen in the
>>>> master branch, I don't see a reason to hold this up for it.  As long
>>>> as you're committed to test r200 and i915 when you make said changes,
>>>> we can do a feature release on the LTS branch. Users and distros ju

Re: [Mesa-dev] [RFC] Concrete proposal to split classic

2021-03-25 Thread Ian Romanick
On 3/25/21 10:49 AM, Jason Ekstrand wrote:
> Can you be more specific? Also, is there a reason why that work can't
> or shouldn't be done directly in the LTS branch?  As Ken pointed out,

The bulk of things that I had going were to enable some extensions and
make those extensions non-optional.  ARB_framebuffer_object was at the
top of the list, but there were a couple of others.  I think ARB_fbo
also affected the Gallium nouveau driver.  Some of that was derailed
when I wasn't able to remove (classic) support for NV04 and NV05... and
I don't remember exactly where I left it.  I would expect some of those
kinds of changes to also happen in post-fork mainline too.

Some of the other stuff... would actually be easier after the split
because I wouldn't have to deal with Windows compilers.

> I'm not sure we want to totally declare those drivers dead. People can
> still do feature or enhancement development if they want to, it just
> happens in a different branch.
> 
> I think we need to be more clear about what "LTS mode" means for
> developers and users here. It isn't that they can never change or be
> improved. It's that we've gotten to the point where what they're
> getting from being in the active development tree is breakage, not
> "free" improvements. We're suggesting changing the social contract
> with users, so to speak, from "these drivers still pick up
> improvements from core" to "we won't break these drivers when we make
> improvements to core in master."

That is interesting.  It doesn't sound very much like "long term stable"
as in the original proposal.  What would the versioning scheme be?  In
the original proposal, I would have expected versions go to 21.1.∞.  How
would that work if some version adds a feature?  Would the versions (and
branches from the branch?) follow the usual Mesa Year.Quarter.x scheme?

> So, unless there's a solid reason why such work needs to happen in the
> master branch, I don't see a reason to hold this up for it.  As long
> as you're committed to test r200 and i915 when you make said changes,
> we can do a feature release on the LTS branch. Users and distros just
> shouldn't expect that to be a common thing.

This inverts the current testing problem.  Right now, r200 and i915 are
poorly tested, but 965G through Sandybridge are very well tested.  While
I can totally test core changes on r200 and i915, how would I verify
that those changes don't break, say, Ironlake?

> --Jason
> 
> On Tue, Mar 23, 2021 at 3:26 AM Ian Romanick  wrote:
>>
>> I would like to wait a couple more releases to do this.  I have a couple
>> things that I've been gradually working on for some of the non-i965
>> classic drivers that I'd like to land before they're put out to pasture.
>>  I talked to ajax about this a few weeks ago, and he was amenable at the
>> time.
>>
>> I can step up on testing at least r200 to make sure core changes aren't
>> making things explode.  That should also cover most of the problems that
>> could hit i915.  i965 gets good coverage in the Intel CI, so I don't
>> think that's as big of a problem.
>>
>> On 3/22/21 3:15 PM, Dylan Baker wrote:
>>> Hi list,
>>>
>>> We've talked about it a number of times, but I think it's time to
>>> discuss splitting the classic drivers off of the main development branch
>>> again, although this time I have a concrete plan for how this would
>>> work.
>>>
>>> First, why? Basically, all of the classic drivers are in maintenance
>>> mode (even i965). Second, many of them rely on code that no one works
>>> on, and very few people still understand. There is no CI for most of
>>> them, and the Intel CI is not integrated with gitlab, so it's easy to
>>> unintentionally break them, and this breakage usually isn't noticed
>>> until just before or just after a release. 21.0 was held up (in small
>>> part, also me just getting behind) because of such breakages.
>>>
>>> I know there is some interest in getting i915g in good enough shape that
>>> it could replace i915c, at least for the common case. I also am aware
>>> that Dave, Ilia, and Eric (with some pointers from Ken) have been
>>> working on a gallium driver to replace i965. Neither of those things are
>>> ready yet, but I've taken them into account.
>>>
>>> Here's the plan:
>>>
>>> 1) 21.1 release happens
>>> 2) we remove classic from master
>>> 3) 21.1 reaches EOL because of 21.2
>>> 4) we fork the 21.1 branc

Re: [Mesa-dev] [RFC] Concrete proposal to split classic

2021-03-23 Thread Ian Romanick
I would like to wait a couple more releases to do this.  I have a couple
things that I've been gradually working on for some of the non-i965
classic drivers that I'd like to land before they're put out to pasture.
 I talked to ajax about this a few weeks ago, and he was amenable at the
time.

I can step up on testing at least r200 to make sure core changes aren't
making things explode.  That should also cover most of the problems that
could hit i915.  i965 gets good coverage in the Intel CI, so I don't
think that's as big of a problem.

On 3/22/21 3:15 PM, Dylan Baker wrote:
> Hi list,
> 
> We've talked about it a number of times, but I think it's time to
> discuss splitting the classic drivers off of the main development branch
> again, although this time I have a concrete plan for how this would
> work.
> 
> First, why? Basically, all of the classic drivers are in maintenance
> mode (even i965). Second, many of them rely on code that no one works
> on, and very few people still understand. There is no CI for most of
> them, and the Intel CI is not integrated with gitlab, so it's easy to
> unintentionally break them, and this breakage usually isn't noticed
> until just before or just after a release. 21.0 was held up (in small
> part, also me just getting behind) because of such breakages.
> 
> I know there is some interest in getting i915g in good enough shape that
> it could replace i915c, at least for the common case. I also am aware
> that Dave, Ilia, and Eric (with some pointers from Ken) have been
> working on a gallium driver to replace i965. Neither of those things are
> ready yet, but I've taken them into account.
> 
> Here's the plan:
> 
> 1) 21.1 release happens
> 2) we remove classic from master
> 3) 21.1 reaches EOL because of 21.2
> 4) we fork the 21.1 branch into a "classic-lts"¹ branch
> 5) we disable all vulkan and gallium drivers in said branch, at least at
>the Meson level
> 6) We change the name and precedence of the glvnd loader file
> 7) apply any build fixups (turn off intel generators for versions >= 7.5,
>for example)
> 8) maintain that branch with build and critical bug fixes only
> 
> This gives distros and end users two options.
> 1) they can build *only* the legacy branch in the normal "Mesa provides
>libGL interfaces" fashion
> 2) They can use glvnd and install current mesa and the legacy branch in
>parallel
> 
> Because of glvnd, we can control which driver will get loaded first, and
> thus if we decide i915g or the i965 replacement is ready and turn it on
> by default it will be loaded by default. An end user who doesn't like
> this can add a new glvnd loader file that makes the classic drivers
> higher precedence and continue to use them.
> 
> Why fork from 21.1 instead of master?
> 
> First, it allows us to delete classic immediately, which will allow
> refactoring to happen earlier in the cycle, and for any fallout to be
> caught and hopefully fixed before the release. Second, it means that
> when a user is switched from 21.1 to the new classic-lts branch, there
> will be no regressions, and no one has to spend time figuring out what
> broke and fixing the lts branch.
> 
> When you say "build and critical bug fixes", what do you mean?
> 
> I mean updating Meson usage if we rely on something that is later
> deprecated and removed and would prevent building the branch, fixing
> reliance on some compiler behavior that changes, closing gaping exploitable
> security holes, that kind of thing.
> 
> footnotes
> ¹Or whatever color you like your bikeshed
> 
> 


Re: [Mesa-dev] docs: consistent language

2021-03-15 Thread Ian Romanick
On 3/15/21 5:44 AM, Erik Faye-Lund wrote:
> TLDR; I'm proposing to standardize on US English in our public-facing
> documentation.
> 
> I proposed an MR a while back that changed the only occurrence of the
> UK English spelling "optimisation" for the US English spelling
> "optimization", which is already used 34 times in the docs. I've done
> similar changes in the past.
> 
> But this time Ian brought up the good point that picking a preferred
> language should probably be done using group consensus on the mailing
> list rather than just picking what's currently most popular.

Over the years we've had quite a few contributions from folks in many
different English spelling parts of the world where the UK spelling is
the norm... though I don't think there have been that many from the UK
itself. :)  I suggested getting some group consensus because I didn't
want any of those people to feel left out or undervalued.

If anyone doesn't feel comfortable speaking out publicly about this,
please feel free to contact Erik or me privately.

> So, I hereby propose to pick US English as our default language for
> user-facing documentation.
> 
> I'm not proposing to force anyone to use a particular English variant
> for things like comments, commit messages or variable names. I don't
> think there's much value in enforcing that, and I think it'd be
> perfectly fine if any particular driver wanted to pick a particular
> variant of English.
> 
> The main reason I want to standardize on an English variant is that I'm
> trying to create a word-list for domain-specific/technical language to
> detect spelling mistakes in the docs more easily. And not having to add
> both US and UK English words makes this list easier to maintain. I'm
> not planning on going 100% systematically through the dictionary to
> detect UK English words that might be in my system dictionary, just to
> fix the words that currently cause me (a tiny amount of) pain ;)
> 
> The main reason why I'm proposing US English over for instance UK
> English is that this seems to be the dominant variant currently in the
> documentation. So it seems like the pragmatic choice to me.
> 
> Thoughts? Any objections to sticking to US English in the docs?
> 
> The MR in question:
> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8572
> 
> Ian's response:
> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8572#note_808593
> 
> Previous changes:
> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6864
> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6894
> 
> 


Re: [Mesa-dev] SpvOpSelect w/ float operands

2020-11-17 Thread Ian Romanick
On 11/17/20 9:25 AM, Brian Paul wrote:
> 
> Using the Intel Vulkan driver, we've found some cases where SpvOpSelect
> is returning -0.0 (negative zeros) instead of normal 0.0 depending on
> the arguments.

Do you have a specific test case that fails?

It seems like on some platforms there was an erratum about the version of
the SEL instruction that is used for min or max that can return the
wrong signed zero in some cases.

It's also possible that some optimizations are causing problems.  I
don't remember exactly how it works in SPIR-V, but does marking those
SPIR-V instructions as precise (that's what it was in GLSL) make a
difference?

> I'm wondering if "SpvOpSelect x, a, b" for floats is being implemented
> with something like "a*x + b*(1-x)" ?  That might explain where the
> negative zeros are coming from.
> 
> Our work-around is to implement selection with bitwise operations: (a &
> x) | (b & ~x)
> 
> It seems to me that SpvOpSelect shouldn't interpret the bits and just
> return an exact copy of the argument.
> 
> -Brian
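
To make the contrast in this thread concrete, here is a small sketch comparing a bitwise select (Brian's work-around) with an arithmetic blend of the form he speculates about. The helper names are made up for illustration, floats are treated as 32-bit values via `struct`, and nothing here is claimed to be what the Intel driver actually emits. In this toy the blend fails to reproduce the selected operand bit-for-bit: it loses the sign of a zero and turns an unselected infinity into NaN.

```python
import math
import struct


def f2u(value):
    """Bit pattern of a 32-bit float as an unsigned integer."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


def u2f(bits):
    """32-bit float with the given bit pattern."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def select_bitwise(cond, a, b):
    """(a & x) | (b & ~x) with x all-ones or all-zeros: an exact bit copy."""
    mask = 0xFFFFFFFF if cond else 0x00000000
    return u2f((f2u(a) & mask) | (f2u(b) & ~mask & 0xFFFFFFFF))


def select_blend(cond, a, b):
    """a*x + b*(1-x) with x in {0.0, 1.0}: interprets the operands as numbers."""
    x = 1.0 if cond else 0.0
    return a * x + b * (1.0 - x)


# Selecting -0.0 bitwise keeps the sign bit; blending adds +0.0 and loses it.
kept = f2u(select_bitwise(True, -0.0, 5.0))
lost = f2u(select_blend(True, -0.0, 5.0))

# An unselected infinity still poisons the blend, because inf * 0 is NaN.
poisoned = select_blend(True, 1.0, math.inf)
```

The bitwise form never looks at the operands as numbers, which is the property Brian argues SpvOpSelect should have.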


Re: [Mesa-dev] GLSLstd450NMin/NMax/NClamp

2020-11-17 Thread Ian Romanick
On 11/17/20 9:25 AM, Brian Paul wrote:
> 
> It appears these SPIR-V extension functions don't behave as they should
> on Intel (don't know about other Vulkan drivers).
> 
> They're supposed to be NaN-aware such that if one argument is NaN, the
> other argument is returned.  From our testing, it looks like NMax works
> as expected, but not NMin or NClamp.

Do you have some specific test cases that fail?

> Looking at the SPIR-V/nir/intel code it's hard to tell what's going on
> and whether these semantics are actually being followed.
> 
> Any comments?
> 
> -Brian
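
The NaN-aware behaviour Brian describes can be sketched as follows. This is a reference model only, with made-up function names; `nclamp` is written as NMin of NMax, which is my reading of the GLSL.std.450 description and should be checked against the spec. `naive_min` is a hypothetical compare-and-select shown purely to illustrate how a min can be right for one operand order and wrong for the other; it is not a claim about what any driver generates.

```python
import math


def nmin(a, b):
    """Minimum that returns the other operand when one operand is NaN."""
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return a if a < b else b


def nmax(a, b):
    """Maximum that returns the other operand when one operand is NaN."""
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return a if a > b else b


def nclamp(x, lo, hi):
    """Clamp built from the NaN-aware min and max above (assumed definition)."""
    return nmin(nmax(x, lo), hi)


def naive_min(a, b):
    """Plain compare-and-select: the result depends on which operand is NaN."""
    return a if a < b else b


nan = float("nan")
```

A test case along these lines, run with NaN in each operand position, is the kind of specific failing case asked for above.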


Re: [Mesa-dev] piglit merge access

2020-10-13 Thread Ian Romanick
As one of the people guilty of forgetting to push your MRs... this is a
very reasonable request, and I support it fully.

On 10/13/20 1:12 AM, andrey simiklit wrote:
> Hi,
> 
> I would like to request merge access for piglit gitlab. I have
> contributed a number of commits for mesa/piglit. It would help in my
> work because sometimes even already reviewed MRs remain not pushed for
> months.
> 
> Regards,
> Andrii (asimiklit).
> 


Re: [Mesa-dev] Rename "master" branch to "main"?

2020-08-03 Thread Ian Romanick
On 8/3/20 8:30 AM, Jason Ekstrand wrote:
> All,
> 
> I'm sure by now you've all seen the articles, LKML mails, and other
> chatter around inclusive language in software.  While mesa doesn't
> provide a whole lot of documentation (hah!), we do have a website, a
> code-base, and a git repo and this is something that we, as a project
> should consider.
> 
> What I'm proposing today is simply re-naming the primary Git branch
> from "master" to "main".  Why "main"?  Because that's what GitHub has
> chosen as their new default branch name and so it sounds to me
> like the most likely new default.
> 
> As far as impact on the project goes, if and when we rename the
> primary branch, the old "master" branch will be locked (no
> pushing/merging allowed) and all MRs will have to be re-targeted
> against the new branch.  Fortunately, that's very easy to do.  You
> just edit the MR and there's a little drop-down box at the top for
> which branch it targets.  I just tested this with one of mine and it
> seems to work ok.
> 
> As far as other bits of language in the code-base, I'm happy to see
> those cleaned up as people have opportunity.  I'm not aware of any
> particularly egregious offenders.  However, changing the name of the
> primary branch is something which will cause a brief hiccup in
> people's development process and so warrants broader discussion.

I looked at this a week or so ago.  My recollection is that there were a
couple other instances of master (that were references to field names in
a DRM header) and a couple instances of whitelist / blacklist.
Coordinating renaming the field from the DRM header will be fun, but
everything else should be trivial... and would make for good newbie
tasks. :)

> Thoughts?


Re: [Mesa-dev] nir: find_msb vs clz

2020-04-07 Thread Ian Romanick
On 4/1/20 11:52 AM, Eric Anholt wrote:
> I would generally be of the opinion that we should have NIR opcodes
> that match any common hardware instructions, and lowering in algebraic
> to help turn input patterns into clean sequences of hardware
> instructions.

There is quite a bit of benefit to having a single canonical
representation of things in the IR.  Whenever there are multiple ways of
doing the same thing, various passes need to be aware and handle all of
them.  I have two concrete examples.

In NIR there is a fsub(x, y) instruction, but we very quickly convert
that to fadd(x, fneg(y)).  If we didn't, every pattern in opt_algebraic
that handles fadd would also need a variant for fsub.  If a pattern had
four instances of fadd, it would need 16 variants.
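
A toy version of that canonicalization, using nested tuples as a stand-in for NIR (this is not Mesa's actual data structure or pass, and the function names are invented), shows both the rewrite and where the count of 16 comes from: each of the four fadd positions could independently be an fadd or an fsub.

```python
from itertools import product


def lower_fsub(expr):
    """Rewrite fsub(x, y) as fadd(x, fneg(y)) throughout a tuple expression."""
    if not isinstance(expr, tuple):
        return expr                      # leaf: a variable name or constant
    op, *args = expr
    args = [lower_fsub(arg) for arg in args]
    if op == "fsub":
        x, y = args
        return ("fadd", x, ("fneg", y))
    return (op, *args)


def pattern_variants(num_adds):
    """Every way to spell a pattern's fadd positions as fadd or fsub."""
    return list(product(("fadd", "fsub"), repeat=num_adds))


lowered = lower_fsub(("fsub", ("fsub", "a", "b"), "c"))
variant_count = len(pattern_variants(4))
```

After lowering, a pattern only ever has to match fadd, so one pattern covers what would otherwise need `variant_count` spellings.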

In NIR there is pack_half_2x16 and pack_half_2x16_split.  I just noticed
the other day that 1f72857739be added some optimization patterns for one
but not the other.  I'll have an MR soon that adds them.
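
For readers unfamiliar with the two opcodes, this sketch models them under the assumption that both follow GLSL packHalf2x16 semantics, with the first component in the low 16 bits. It is a model of the arithmetic only, not of NIR itself, and it illustrates why an optimization written for one form usually applies to the other: they compute the same value from differently shaped inputs.

```python
import struct


def f16_bits(value):
    """Bit pattern of a value converted to IEEE half precision."""
    return struct.unpack("<H", struct.pack("<e", value))[0]


def pack_half_2x16(vec2):
    """Takes one two-component vector; x lands in the low half-word."""
    x, y = vec2
    return f16_bits(x) | (f16_bits(y) << 16)


def pack_half_2x16_split(x, y):
    """Takes the two components as separate scalars; same packed result."""
    return f16_bits(x) | (f16_bits(y) << 16)


packed = pack_half_2x16((1.0, -2.0))
```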

It seems like most of the time when architecture-specific
details creep into NIR instructions, it is done to overcome deficiencies
in the backend IR.  I know that I have done this, and I don't think it's
a problem per se.  However, care should be taken.  I have tried to do
most of these kinds of lowering during much later optimization passes,
for example, to prevent the need for a combinatorial explosion in the
number of patterns in the main block of algebraic optimizations.

There definitely are problems with having a billion patterns in
opt_algebraic.  See !3765 for some discussion on this topic.  Also, as
the number of patterns increases, the size of the state transition
tables increases quadratically.  I suspect we're going to want / need to
refactor the single, giant table of algebraic optimizations in the not
too distant future.

Muchnick has this idea of "levels" of IR.  The IR itself (data
structures) is the same, but the set of allowable constructs changes as
the program proceeds through the phases of compilation.  We have some of
that now with source modifiers and 1-bit vs. 32-bit Booleans.  What we
lack is a way for passes to advertise what "levels" they support or to
enforce what features exist at a given time.  I don't know that we need
something that rigid, but right now you just have to know what kinds of
instructions should be able to exist at different points during
compilation.  It's easy to make mistakes, and it's difficult to detect
some classes of those mistakes.

I'm having a conversation with Rhys about this topic in !3151 right now.


Re: [Mesa-dev] nir: find_msb vs clz

2020-04-02 Thread Ian Romanick
On 4/1/20 11:39 AM, Erik Faye-Lund wrote:
> While working on the NIR to DXIL conversion code for D3D12, I've
> noticed that we're not exactly doing the best we could here.
> 
> First some background:
> 
> NIR currently has a few instructions that does kinda the same:
> 
> 1. nir_op_ufind_msb: Finds the index of the most significant bit,
> counting from the least significant bit. It returns -1 on zero-input.
> 
> 2. nir_op_ifind_msb: A signed version of ufind_msb; looks for the first
> non sign-bit. It's not terribly interesting in this context, as it can
> be trivially lowered if missing, and it doesn't seem like any hardware
> supports this natively. I'm just mentioning it for completeness.

These instructions map almost directly to GLSL findMSB().

> 3. nir_op_uclz: Counts the number of leading zeroes, counting from the
> most significant bit. It returns 32 on zero-input, and only exists in an
> unsigned 32-bit variation.
> 
> ufind_msb is kinda the O.G here, uclz was recently added, and is as far
> as I can see only used in an intel-specific SPIR-V instruction.
> 
> Additionally, there's the OpenCLstd_Clz SPIR-V instruction, which we
> lower to ufind_msb using nir_clz_u(), regardless if the backend
> supports nir_op_uclz or not.

That extension (mostly) brings a handful of OpenCL instructions to
graphics.  The only outlier is the instruction for 32-bit × 16-bit
multiplication.

> It seems only the nouveau's NV50 backend actually wants ufind_msb,
> everything else seems to convert ufind_msb to some clz-variant while
> emitting code. Some have to special-case on zero-input, and some
> not... 
> 
> All of this is not really awesome in my eyes.
> 
> So, while adding support for DXIL, I need to figure out how to map
> these (well, ufind_msb at least) onto the DXIL intrinsics. DXIL doesn't
> have a ufind_msb, but it has a firstbit_hi that is identical to
> nir_op_uclz... except that it returns -1 on zero-input :(
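
For reference, here is a sketch of the three behaviors as they are
described in this thread, written as plain C reference functions.  The
ref_* names are invented for illustration, and the firstbit_hi behavior
is taken from the description above rather than from the DXIL
documentation.

```c
#include <stdint.h>

/* Index of the most significant set bit, counted from the least
 * significant bit.  Returns -1 for a zero input. */
int ref_ufind_msb(uint32_t x)
{
   int index = -1;

   while (x != 0) {
      x >>= 1;
      index++;
   }

   return index;
}

/* Number of leading zero bits, counted from the most significant bit.
 * Returns 32 for a zero input. */
int ref_uclz(uint32_t x)
{
   return 31 - ref_ufind_msb(x);
}

/* firstbit_hi as described in the thread: the same count as uclz for
 * non-zero inputs, but -1 for a zero input. */
int ref_firstbit_hi(uint32_t x)
{
   return x == 0 ? -1 : ref_uclz(x);
}
```

Note that with these definitions ufind_msb(x) == 31 - uclz(x) holds for
every input, including zero.  Only the firstbit_hi flavor needs a
separate zero check.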

Here's the first question you should be asking: how often does this
occur in real shaders?  Is it worth caring about generating the optimal
thing?  As Jason pointed out in a different message, the GLSL findMSB
functions seem to occur within epsilon of never.  If the same is true
for the DXIL instructions, is it worth spending any effort?

Honestly, until there is a known user, I would just do the thing that
has the least impact and move on.  Cue the old quote about premature
optimization...

> For now, I'm lowering ufind_msb to something ufind_msb while emitting
> code, like everyone else. But this feels a bit dirty, *especially*
> since we have a clz-instruction that *almost* fits. And since we're
> targeting OpenCL, which uses clz as its primitive, we end up doing 32
> - (32 - x), and since that inner isub happens while emitting, we can't
> easily optimize it away without introducing an optimizing backend...

If 12+ years of graphics compiler experience has taught us anything, it
has taught us that you need an optimizing backend.  Maybe not as job #1,
but probably very, very soon.  Being able to recognize common code
patterns like 32 - fbh(x) and generating the right thing is not that
hard.  If you have a code-generator generator (!2680), it is trivial.
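
As a rough illustration of how small such a pattern match can be, here
is a sketch over a toy expression tree.  The struct, enum, and function
names are all hypothetical and unrelated to any real backend IR.  It
recognizes the C - (C - x) shape from the quoted text and replaces it
with x.

```c
#include <stddef.h>

enum example_op { EX_VALUE, EX_CONST, EX_ISUB };

struct example_expr {
   enum example_op op;
   int constant;                   /* Used by EX_CONST only. */
   const struct example_expr *a;   /* EX_ISUB computes a - b. */
   const struct example_expr *b;
};

/* If e has the shape isub(C, isub(C, x)) with the same constant C in
 * both places, return x.  Otherwise return e unchanged. */
const struct example_expr *
example_fold_double_sub(const struct example_expr *e)
{
   if (e->op != EX_ISUB || e->a->op != EX_CONST)
      return e;

   const struct example_expr *inner = e->b;

   if (inner->op != EX_ISUB || inner->a->op != EX_CONST)
      return e;

   if (inner->a->constant != e->a->constant)
      return e;

   return inner->b;
}
```

A real backend would run something like this over every instruction and
iterate to a fixed point, but the matching itself is no more complicated
than this.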

> The solution seems obvious; use nir_op_uclz instead.
> 
> But that's also a bit annoying, for a few reasons:
> 
> 1. Only *one* backend actually implements support for it. So this
> either means a lot of work, or making it an opt-in feature somehow.
> 
> 2. We would probably have to support lowering in either direction to
> support what all hardware prefers.
> 
> 3. That zero-case still needs special treatment in several backends, it
> seems. We could alternatively declare that nir_op_uclz is undefined for
> zero-input, and handle this when lowering...?
> 
> 4. It seems some (Intel?) hardware only supports 32-bit clz, so we
> would have to lower to something else for other bit-sizes. That's not
> too hard, though.
> 
> So yeah...
> 
> I guess the first step would be to add a switch to use nir_uclz()
> instead of nir_clz_u() when handling OpenCLstd_Clz in vtn.
> 
> Next, I guess I would add a lower_ufind_msb flag to
> nir_shader_compiler_options, and make nir_opt_algebraic.py lower
> ufind_msb to uclz.
> 
> Finally, we can start implementing support for this in more drivers,
> and flip on some switches.
> 
> I'm still not really sold on what to do about the special-case for
> zero... By making it undefined, I think we're just punishing all
> backends, just in the name of making the compiler backends a bit
> simpler, so that doesn't seem too good of an idea either.
> 
> Does anyone have a better idea? I would kinda love to optimize away the
> zero-case if it's obvious that it's impossible, e.g cases like "clz(x |
> 1)"... 
> 
> 

Re: [Mesa-dev] [PATCH kmscube] texturator: Use !! for boolean assignment

2020-03-31 Thread Ian Romanick
On 3/31/20 12:25 PM, Fabio Estevam wrote:
> The 'complemented' variable is a pointer to boolean. Use the !! operator
> to fix the following build warning:
> 
> ../texturator.c:603:45: warning: '*' in boolean context, suggest '&&' instead 
> [-Wint-in-bool-context]
>   *complemented = (((float)rgba[2]) / 255.0) / 0.25;
> 
> Signed-off-by: Fabio Estevam 
> ---
>  texturator.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/texturator.c b/texturator.c
> index a450dfe..d31b601 100644
> --- a/texturator.c
> +++ b/texturator.c
> @@ -602,7 +602,7 @@ static void extract_pix(uint8_t *rgba, int *slice, int 
> *level, bool *complemente
>  {
>   *slice = (((float)rgba[0]) / 255.0) * 8.0;
>   *level = (((float)rgba[1]) / 255.0) * 16.0;
> - *complemented = (((float)rgba[2]) / 255.0) / 0.25;
> + *complemented = !!(((float)rgba[2]) / 255.0) / 0.25;

I don't know how others feel, but I know Matt hates this idiom.  I'm not
terribly fond of it either.  I think we both prefer either casting to
bool or x != 0.0.  But... I don't feel that strongly.
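
For comparison, a sketch of the three spellings side by side.  The
function names are made up.  All three produce the same result for a
floating-point input; the difference is only in how clearly they state
the intent.

```c
#include <stdbool.h>

/* The double-negation idiom from the patch. */
bool nonzero_bangbang(double v)
{
   return !!v;
}

/* Explicit cast: conversion to bool yields false only for zero. */
bool nonzero_cast(double v)
{
   return (bool)v;
}

/* Explicit comparison against zero. */
bool nonzero_compare(double v)
{
   return v != 0.0;
}
```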

>  }
>  
>  static bool probe_pix(int x, int y, int w, int h, int s, int m)
> 



Re: [Mesa-dev] [RFC PATCH] Add GL_MESA_ieee_fp_alu_mode specification draft

2020-02-24 Thread Ian Romanick
On 2/24/20 11:21 AM, Ilia Mirkin wrote:
> On Mon, Feb 24, 2020 at 1:10 PM Ian Romanick  wrote:
>>
>> On 2/23/20 5:57 PM, Ilia Mirkin wrote:
>>> ---
>>>
>>> We talked about something like this a while back, but the end result
>>> was inconclusive. I added a TGSI MUL_ZERO_WINS shader property for nine.
>>> But it'd be nice for wine to be able to control this too.
>>>
>>> I couldn't actually find any evidence of the discussion from 2017 or so,
>>> so ... let's have another one.
>>>
>>>  docs/specs/MESA_ieee_fp_alu_mode.spec | 136 ++
>>>  1 file changed, 136 insertions(+)
>>>  create mode 100644 docs/specs/MESA_ieee_fp_alu_mode.spec
>>>
>>> diff --git a/docs/specs/MESA_ieee_fp_alu_mode.spec 
>>> b/docs/specs/MESA_ieee_fp_alu_mode.spec
>>> new file mode 100644
>>> index 000..cb274f06571
>>> --- /dev/null
>>> +++ b/docs/specs/MESA_ieee_fp_alu_mode.spec
>>> @@ -0,0 +1,136 @@
>>> +Name
>>> +
>>> +MESA_ieee_fp_alu_mode
>>> +
>>> +Name Strings
>>> +
>>> +GL_MESA_ieee_fp_alu_mode
>>> +
>>> +Contact
>>> +
>>> +Ilia Mirkin, ilia 'at' x.org
>>> +
>>> +IP Status
>>> +
>>> +No known IP issues.
>>> +
>>> +Status
>>> +
>>> +Proposed
>>> +
>>> +Version
>>> +
>>> +Number
>>> +
>>> +TBD
>>> +
>>> +Dependencies
>>> +
>>> +OpenGL 3.0 or OpenGL ES 3.0 is required.
>>> +
>>> +The extension is written against the OpenGL GL 3.0 and OpenGL ES 3.0
>>> +specifications.
>>> +
>>> +Overview
>>> +
>>> +Pre-GL3 hardware did not generally have full IEEE floating point operation
>>> +support. Among other things, 0 * Infinity would work out to 0, and NaN's
>>> +might not be generated, or otherwise be treated improperly. GL3-class and
>>> +later hardware introduced full IEEE FP support, including NaN, Infinity,
>>> +and the proper generation of these.
>>> +
>>> +Some software targeted at older hardware makes assumptions about how the
>>> +shader ALU works. And to accommodate these, GL3-class hardware has a way to
>>> +change how the shader ALU behaves. There are no standards around this, and
>>> +different hardware has different ways of dealing with it. However these
>>> +modes were designed specifically with such older software in mind.
>>> +
>>> +This extension introduces a way to configure a context to be in non-IEEE
>>> +ALU mode. This extension does not specify precisely what this means, as
>>> +each vendor has something different. Generally it means non-IEEE compliant
>>> +handling of multiplication, as well as any other unspecified changes.
>>
>> I think many of the other things are specified.  They're the non-IEEE
>> behaviors of GL_ARB_vertex_program and GL_ARB_fragment_program, and
>> those mimic the required behavior of early DX shader models.  There are
>> a bunch of cases that specify that zero is generated when IEEE would
>> require NaN.
>>
>> If there's just a small handful of things like this, we'd probably be
>> better adding a couple new built-in functions to do the job.  The
>> problem on Intel hardware is... we really, really don't want to switch
>> to non-IEEE mode because it changes how a bunch of things work, and we
>> haven't tested any of that in many years.  I'd much rather put in some
>> kind of work-arounds for things that don't want multiplication or pow()
>> to generate NaN.
> 
> So basically anything that ever involves multiplication needs to have
> these variants. Things like dot, the various crazy ops of days past
> whose names escape me but involve complex calculations, etc. Things
> like pow are questionable (depends on if they get decomposed or not),
> and things like rcp/rsq unquestionably produce NaN's (or Infinity,
> sorry not 100% sure but easily checked) on NVIDIA irrespective of that
> mode being enabled.
> 
> Also on Intel hardware, as you mention, the "non-ieee" mode is ...
> interesting, so to allow for that, I didn't want to say anything other
> than the positive cases. If you have no interest in exposing this

Re: [Mesa-dev] [RFC PATCH] Add GL_MESA_ieee_fp_alu_mode specification draft

2020-02-24 Thread Ian Romanick
On 2/23/20 5:57 PM, Ilia Mirkin wrote:
> ---
> 
> We talked about something like this a while back, but the end result
> was inconclusive. I added a TGSI MUL_ZERO_WINS shader property for nine.
> But it'd be nice for wine to be able to control this too.
> 
> I couldn't actually find any evidence of the discussion from 2017 or so,
> so ... let's have another one.
> 
>  docs/specs/MESA_ieee_fp_alu_mode.spec | 136 ++
>  1 file changed, 136 insertions(+)
>  create mode 100644 docs/specs/MESA_ieee_fp_alu_mode.spec
> 
> diff --git a/docs/specs/MESA_ieee_fp_alu_mode.spec 
> b/docs/specs/MESA_ieee_fp_alu_mode.spec
> new file mode 100644
> index 000..cb274f06571
> --- /dev/null
> +++ b/docs/specs/MESA_ieee_fp_alu_mode.spec
> @@ -0,0 +1,136 @@
> +Name
> +
> +MESA_ieee_fp_alu_mode
> +
> +Name Strings
> +
> +GL_MESA_ieee_fp_alu_mode
> +
> +Contact
> +
> +Ilia Mirkin, ilia 'at' x.org
> +
> +IP Status
> +
> +No known IP issues.
> +
> +Status
> +
> +Proposed
> +
> +Version
> +
> +Number
> +
> +TBD
> +
> +Dependencies
> +
> +OpenGL 3.0 or OpenGL ES 3.0 is required.
> +
> +The extension is written against the OpenGL GL 3.0 and OpenGL ES 3.0
> +specifications.
> +
> +Overview
> +
> +Pre-GL3 hardware did not generally have full IEEE floating point operation
> +support. Among other things, 0 * Infinity would work out to 0, and NaN's
> +might not be generated, or otherwise be treated improperly. GL3-class and
> +later hardware introduced full IEEE FP support, including NaN, Infinity,
> +and the proper generation of these.
> +
> +Some software targeted at older hardware makes assumptions about how the
> +shader ALU works. And to accommodate these, GL3-class hardware has a way to
> +change how the shader ALU behaves. There are no standards around this, and
> +different hardware has different ways of dealing with it. However these
> +modes were designed specifically with such older software in mind.
> +
> +This extension introduces a way to configure a context to be in non-IEEE
> +ALU mode. This extension does not specify precisely what this means, as
> +each vendor has something different. Generally it means non-IEEE compliant
> +handling of multiplication, as well as any other unspecified changes.

I think many of the other things are specified.  They're the non-IEEE
behaviors of GL_ARB_vertex_program and GL_ARB_fragment_program, and
those mimic the required behavior of early DX shader models.  There are
a bunch of cases that specify that zero is generated when IEEE would
require NaN.

If there's just a small handful of things like this, we'd probably be
better adding a couple new built-in functions to do the job.  The
problem on Intel hardware is... we really, really don't want to switch
to non-IEEE mode because it changes how a bunch of things work, and we
haven't tested any of that in many years.  I'd much rather put in some
kind of work-arounds for things that don't want multiplication or pow()
to generate NaN.

As for the mechanism, I'm very strongly in favor of something that would
be locked-in when the shader is compiled.  I really want to avoid any
potential that an external glEnable could trigger a recompile.

The more I think about it... having an extension that adds a handful of
built-in functions that give old shader model behavior would be a good
idea.  We could even test it. :)  I've looked at a lot of shaders, and
I've seen a lot of not-quite-what-they-wanted methods for avoiding NaN
behavior in a bunch of these functions.  Having a special version of
inversesqrt() that returns FLT_MAX for 0 would be useful to a lot of
users.  As part of the spec we could even provide canonical versions of
the functions so that users could copy-and-paste

#ifndef GL_MESA_foo

float inversesqrt_nonIEEE(float x)
{
...
}

#endif
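
To make the idea concrete, here is a sketch in C (not GLSL) of what one
such helper might compute: a multiply where a zero operand always wins,
matching the 0 * Infinity == 0 case called out in the draft.  The name
mul_zero_wins is hypothetical, and the handling of a NaN operand paired
with zero is an assumption, not something the draft specifies.

```c
#include <math.h>

/* Multiply where a zero operand forces a zero result, even when the
 * other operand is Infinity (or NaN, by assumption).  IEEE
 * multiplication would produce NaN for 0 * Infinity. */
float mul_zero_wins(float a, float b)
{
   if (a == 0.0f || b == 0.0f)
      return 0.0f;

   return a * b;
}
```

A shader-side version would have the same shape, which is what makes it
practical to test against every driver without touching any hardware
mode bits.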

> +
> +New Tokens
> +
> +Accepted by the <cap> parameter of Enable, Disable, and IsEnabled, by
> +the <pname> parameter of GetBooleanv, GetIntegerv, GetFloatv, and
> +GetDoublev:
> +
> +IEEE_FP_ALU_MODE_MESA  0x
> +
> +
> +Changes to GLSL Section 4.1.4 Floats:
> +
> +Add the following paragraph:
> +
> +In case that the shader is being executed in a context with
> +IEEE_FP_ALU_MODE_MESA disabled, multiplication shall produce the following
> +(non-IEEE-compliant) result:
> +
> +   float a = 0;
> +   float b = Infinity;
> +   float c = a * b; // c == 0
> +
> +There may be other implications from this mode being enabled, including
> +clamping of non-finite values, or anything else the hardware mode happens
> +to enable to achieve compatibility.
> +
> +New State
> +
> +(add to table 6.52, Miscellaneous, p.392)
> +
> +   Initial
> +Get Value  Type   Get Command   Value Description   Sec.   Attribute
> +

Re: [Mesa-dev] Merging experimental r600/nir code

2020-02-12 Thread Ian Romanick
On 2/12/20 10:46 AM, Marek Olšák wrote:
> How do you enable LTO+PGO? Is it something we could enable by default
> for release builds?

I'm assuming PGO is "profile guided optimization."  That requires a
cycle of build, run workloads to collect data, rebuild with collected
data.  It would be awesome if there were a reasonable way to do that in
distro builds, but I think it will continue to be a dream. :(

> Marek
> 
> On Wed, Feb 12, 2020 at 1:56 AM Dieter Nützel wrote:
> 
> Hello Gert,
> 
> your merge 'broke' LTO and then later on PGO compilation/linking.
> 
> I do generally compiling with '-Dgallium-drivers=r600,radeonsi,swrast'
> for testing radeonsi and (your) r600 work. ;-)
> 
> After your merge I get several warnings in 'addrlib' with LTO and
> even a
> compiler error (gcc (SUSE Linux) 9.2.1 20200128).
> 
> I had to disable 'r600' ('swrast' is needed for 'nine') to get a
> working
> LTO and even better PGO radeonsi driver.
> I'm preparing GREAT LTO+PGO (the later is the greater) numbers over the
> last 2 months. I'll send my results later, today.
> 
> Summary
> radeonsi is ~40% smaller and 16-20% faster with PGO (!!!).
> 
> Honza and the GCC people (Intel's ICC folks) do GREAT things.
> 'glmark2' numbers are better then 'vkmark'. (Hello, Marek.).
> 
> Need some sleep.
> 
> See my log, below.
> 
> Greetings and GREAT work!
> 
> -Dieter
> 
> Am 09.02.2020 15:46, schrieb Gert Wollny:
> > Am Donnerstag, den 23.01.2020, 20:31 +0100 schrieb Gert Wollny:
> >> has anybody any objections if I merge the r600/NIR code?
> >> Without explicitely setting the debug flag it doesn't change a
> >> thing, but it would be better to continue developing in-tree.
> > Okay, if nobody objects, I'll merge it Monday evening.
> >
> > Best,
> > Gert
> 
> [1425/1433] Linking target src/gallium/targets/dri/libgallium_dri.so.
> FAILED: src/gallium/targets/dri/libgallium_dri.so
> c++  -o src/gallium/targets/dri/libgallium_dri.so
> 'src/gallium/targets/dri/8381c20@@gallium_dri@sha/target.c.o' -flto
> -fprofile-generate -Wl,--as-needed -Wl,--no-undefined -Wl,-O1 -shared
> -fPIC -Wl,--start-group -Wl,-soname,libgallium_dri.so
> src/mesa/libmesa_gallium.a src/mesa/libmesa_common.a
> src/compiler/glsl/libglsl.a src/compiler/glsl/glcpp/libglcpp.a
> src/util/libmesa_util.a src/util/format/libmesa_format.a
> src/compiler/nir/libnir.a src/compiler/libcompiler.a
> src/mesa/libmesa_sse41.a src/mesa/drivers/dri/common/libdricommon.a
> src/mesa/drivers/dri/common/libmegadriver_stub.a
> src/gallium/state_trackers/dri/libdri.a
> src/gallium/auxiliary/libgalliumvl.a src/gallium/auxiliary/libgallium.a
> src/mapi/shared-glapi/libglapi.so.0.0.0
> src/gallium/auxiliary/pipe-loader/libpipe_loader_static.a
> src/loader/libloader.a src/util/libxmlconfig.a
> src/gallium/winsys/sw/null/libws_null.a
> src/gallium/winsys/sw/wrapper/libwsw.a
> src/gallium/winsys/sw/dri/libswdri.a
> src/gallium/winsys/sw/kms-dri/libswkmsdri.a
> src/gallium/drivers/llvmpipe/libllvmpipe.a
> src/gallium/drivers/softpipe/libsoftpipe.a
> src/gallium/drivers/r600/libr600.a
> src/gallium/winsys/radeon/drm/libradeonwinsys.a
> src/gallium/drivers/radeonsi/libradeonsi.a
> src/gallium/winsys/amdgpu/drm/libamdgpuwinsys.a
> src/amd/addrlib/libaddrlib.a src/amd/common/libamd_common.a
> src/amd/llvm/libamd_common_llvm.a -Wl,--build-id=sha1 -Wl,--gc-sections
> -Wl,--version-script /opt/mesa/src/gallium/targets/dri/dri.sym
> -Wl,--dynamic-list /opt/mesa/src/gallium/targets/dri/../dri-vdpau.dyn
> /usr/lib64/libdrm.so -L/usr/local/lib -lLLVM-10git -pthread
> /usr/lib64/libexpat.so
> /usr/lib64/gcc/x86_64-suse-linux/9/../../../../lib64/libz.so -lm
> /usr/lib64/gcc/x86_64-suse-linux/9/../../../../lib64/libzstd.so
> -L/usr/local/lib -lLLVM-10git /usr/lib64/libunwind.so -ldl -lsensors
> -L/usr/local/lib -lLLVM-10git /usr/lib64/libdrm_radeon.so
> /usr/lib64/libelf.so -L/usr/local/lib -lLLVM-10git -L/usr/local/lib
> -lLLVM-10git -L/usr/local/lib -lLLVM-10git /usr/lib64/libdrm_amdgpu.so
> -L/usr/local/lib -lLLVM-10git -Wl,--end-group
> 
> '-Wl,-rpath,$ORIGIN/../../../mesa:$ORIGIN/../../../compiler/glsl:$ORIGIN/../../../compiler/glsl/glcpp:$ORIGIN/../../../util:$ORIGIN/../../../util/format:$ORIGIN/../../../compiler/nir:$ORIGIN/../../../compiler:$ORIGIN/../../../mesa/drivers/dri/common:$ORIGIN/../../state_trackers/dri:$ORIGIN/../../auxiliary:$ORIGIN/../../../mapi/shared-glapi:$ORIGIN/../../auxiliary/pipe-loader:$ORIGIN/../../../loader:$ORIGIN/../../winsys/sw/null:$ORIGIN/../../winsys/sw/wrapper:$ORIGIN/../../winsys/sw/dri:$ORIGIN/../../winsys/sw/kms-dri:$ORIGIN/../../drivers/llvmpipe:$ORIGIN/../../drivers/softpipe:$ORIGIN/../../drivers/r600:$ORIGIN/../../winsys/radeon/drm:$ORIGIN/../../drivers/radeon

Re: [Mesa-dev] -fno-common build failures (default from upcoming gcc release 10)

2020-01-21 Thread Ian Romanick
On 1/20/20 6:41 AM, Stefan Dirsch wrote:
> Hi
> 
> Starting from the upcoming GCC release 10, the default of -fcommon option will
> change to -fno-common. Due to this we're going to see a lot of build failures

It seems like many of the places where this would occur in Mesa are
likely to be fine... but it would be easy enough to fix by sprinkling
'extern' around (if my understanding of the GCC docs is correct).  It
also seems unlikely that GCC will be able to apply any hypothetical
optimizations to the uses in Mesa.  I think most places where a global
is intended to be isolated, we already decorate it with static.
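
A sketch of the fix pattern, collapsed into a single file for brevity.
The variable and function names are invented.  With -fno-common, a plain
"int example_handle;" in a header that is included by several .c files
becomes a separate definition in each of them, and the link fails with
"multiple definition".  Declaring it extern in the header and defining
it exactly once avoids that under either setting.

```c
/* What the shared header would contain: a declaration only. */
extern int example_handle;

/* What exactly one .c file would contain: the single definition. */
int example_handle = 42;

/* Any file that includes the header can then use the variable. */
int read_example_handle(void)
{
   return example_handle;
}
```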

See also
https://gcc.gnu.org/onlinedocs/gcc/Code-Gen-Options.html#index-fcommon

We definitely should decide which behavior we want, and we should start
enforcing it sooner rather than later.

> in Mesa drivers like
> 
> multiple definition of `syncobj_handle'; 
> src/amd/vulkan/9198681@@vulkan_radeon@sha/meson-generated_.._radv_entrypoints.c.o
>  (symbol from plugin):(.text+0x0): first defined here
> [  213s]
> /usr/lib64/gcc/x86_64-suse-linux/9/../../../../x86_64-suse-linux/bin/ld:
> src/amd/vulkan/9198681@@vulkan_radeon@sha/radv_wsi_wayland.c.o (symbol from
> plugin): in function `radv_GetPhysicalDeviceWaylandPresentationSupportKHR':
> 
> I'm wondering if there is already anybody working on fixing these issues or is
> it recommended to workaround it by setting -fcommon manually somehow? How can
> the latter be done when using meson/ninja as build tool?

I know you can do it with buildtype=plain, but that may be overkill for
anything other than testing.

> Thanks,
> Stefan
> 
> Public Key available
> --
> Stefan Dirsch (Res. & Dev.)   SUSE Software Solutions Germany GmbH
> Tel: 0911-740 53 0Maxfeldstraße 5
> FAX: 0911-740 53 479  D-90409 Nürnberg
> http://www.suse.deGermany 
> 
> (HRB 36809, AG Nürnberg) Geschäftsführer: Felix Imendörffer
> 


Re: [Mesa-dev] Requiring a full author name when contributing to mesa?

2019-12-11 Thread Ian Romanick
On 12/11/19 2:27 PM, Timothy Arceri wrote:
> Hi,
> 
> So it seems lately we have been increasingly merging patches with made
> up names, or single names etc [1]. The latest submitted patch has the
> name Icecream95. This seems wrong to me from a point of keeping up the
> integrity of the project. I'm not a legal expert but it doesn't seem
> ideal to be amassing commits with these type of author tags from that
> point of view either.
> 
> Is it just me or do others agree we should at least require a proper
> name on the commits (as fake as that may be also)? Seems like a low bar
> to me.

First we don't allow single Unicode characters or emojis, and now you
want to require realistic looking names.  Where does it end, Tim?!?
WHERE DOES IT END???

🤣

But seriously... I definitely agree with the sentiment.  It seems really
lame to have a bunch of commits from clearly nonsense names.  Who are
these randos?  Where's the accountability?

As far as any possible legal aspects go, since we don't require (as far
as I'm aware) submitters to sign any sort of certificate of origin, I
don't know that Icecream95 is any better or worse than Ralphio Grant (a
realistic looking name that I just made up) or Ian Romanck.

It seems to me that this is a social problem, so it likely has a social
solution.  If we don't want people to be anonymous cowards with clearly
phony names, we should try to make the alternatives of being involved in
the community and using a real name more attractive.  We should do our
best to encourage people to "do the right thing."

I don't think it's very realistic for us to try to compel people to do
so, and I don't think there's much value in it... especially if it
drives away people making technically competent contributions.

> [1]
> https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3050#note_361924


Re: [Mesa-dev] Remove classic drivers or fork src/mesa for gallium?

2019-12-04 Thread Ian Romanick
On 12/3/19 10:48 PM, Marek Olšák wrote:
> On Wed., Dec. 4, 2019, 01:20 Tapani Pälli wrote:
> 
> Hi;
> 
> On 12/4/19 2:39 AM, Marek Olšák wrote:
> > Hi,
> >
> > Here are 2 proposals to simplify and better optimize the GL->Gallium
> > translation.
> >
> > 1) Move classic drivers to a fork of Mesa, and remove them from
> master.
> > Classic drivers won't share any code with master. glvnd will load
> them,
> > but glvnd is not ready for this yet.
> >
> > 2) Keep classic drivers. Fork src/mesa for Gallium. I think only
> > mesa/main, mesa/vbo, mesa/program, and drivers/dri/common need to be
> > forked and mesa/state_tracker moved.
> src/gallium/state-trackers/gl/ can
> > be the target location.
> >
> > Option 2 is more acceptable to people who want to keep classic
> drivers
> > in the tree and it can be done right now.
> >
> > Opinions?
> >
> 
> I'm still quite newbie with gallium side of things and I'd like to know
> more about the possible simplifications and optimizations that could be
> done if this happened. Is this more about 'cleaning up the tree' or are
> there also some performance opportunities we are missing right now with
> current state?
> 
> 
> It's possible to reduce CPU overhead even more by moving state
> translation from st_validate_state to GL functions. This is possible
> already, but at the cost of effectively adding dead code to classic
> drivers. In theory we could do slightly better without classic drivers,
> but we don't know for sure.

If someone wants to start on this, I would be more than happy to help
with the work on the old, poorly maintained drivers.  I have almost all
of the necessary hardware installed in (mostly) functioning systems.
The "big CPU system" I have in my house has a PCI Radeon 9000 installed,
and I use it fairly regularly.

> If we had nir_to_tgsi, we could remove TGSI support from st/mesa. Option
> 1 would leave st/mesa as the only consumer of Mesa IR and GLSL IR, so
> both IRs could be eliminated in favor of NIR more easily. Although I
> guess a simpler option is not to touch anything.
> 
> Marek

Re: [Mesa-dev] [PATCH] nir: no-op C99 _Pragma() with MSVC

2019-11-22 Thread Ian Romanick
On 11/22/19 6:49 PM, Brian Paul wrote:
> This fixes a build failure on MSVC.
> 
> BTW, it looks like clang supports _Pragma() but I don't know if it
> understands the "gcc unroll N" directive.

It probably doesn't, but that should be okay.  This just exists to speed
up the debug builds in the pre-merge CI.
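
An alternative that avoids redefining _Pragma itself is to hide the
directive behind a macro that expands to nothing on compilers that are
not known to accept it.  This is only a sketch: the EXAMPLE_* macro
names are made up, and the version check assumes "GCC unroll" first
appeared in GCC 8.

```c
#if defined(__GNUC__) && !defined(__clang__) && __GNUC__ >= 8
/* Stringify the argument so it can be handed to _Pragma(). */
#define EXAMPLE_PRAGMA(x) _Pragma(#x)
#define EXAMPLE_UNROLL(n) EXAMPLE_PRAGMA(GCC unroll n)
#else
/* Other compilers: the hint simply disappears. */
#define EXAMPLE_UNROLL(n)
#endif

/* Sum four integers.  The unroll request is only a hint, so the result
 * is the same whether or not the compiler honors it. */
int example_sum4(const int *v)
{
   int total = 0;

   EXAMPLE_UNROLL(4)
   for (int i = 0; i < 4; i++)
      total += v[i];

   return total;
}
```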

Reviewed-by: Ian Romanick 

> Signed-off-by: Brian Paul 
> ---
>  src/compiler/nir/nir_range_analysis.c | 7 +++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/src/compiler/nir/nir_range_analysis.c 
> b/src/compiler/nir/nir_range_analysis.c
> index df5d4da..d38bcc0 100644
> --- a/src/compiler/nir/nir_range_analysis.c
> +++ b/src/compiler/nir/nir_range_analysis.c
> @@ -218,6 +218,13 @@ analyze_constant(const struct nir_alu_instr *instr, 
> unsigned src,
>   */
>  #define ___ unknown
>  
> +
> +/* MSVC doesn't have C99's _Pragma() */
> +#ifdef _MSC_VER
> +#define _Pragma(x)
> +#endif
> +
> +
>  #ifndef NDEBUG
>  #define ASSERT_TABLE_IS_COMMUTATIVE(t)\
> do {   \
> 


Re: [Mesa-dev] [2/2] state_tracker: Handle texture view min level in st_generate_mipmap()

2019-11-01 Thread Ian Romanick
On 11/1/19 12:47 PM, Paul Gofman wrote:
> On 11/1/19 22:19, Ilia Mirkin wrote:
>> It looks like the _mesa_generate_mipmap fallback has a similar problem
>> (which would happen for glGenerateMipmap with e.g. a s3tc format).
> 
> It looks to me it doesn't. I hit this path in the tests I am running
> with 3D textures (I made sure of that by inserting debug output in the
> code), and it works as expected. Please mind that GetTexSubImage() and
> MapTextureImage() driver functions which are called from within
> _mesa_generate_mipmap() to access actual texture data add .MinLevel to
> the actual level they retrieve.
> 
>> I think this could all use some piglit tests that iterate through all
>> or at least many different formats, including both renderable and
>> non-renderable ones.
> So do you think it is required to add such tests prior to these fixes?

It's not strictly a requirement, but having some kind of a test is very
highly desirable.

> Regards,
> 
>     Paul.
> 
> 

Re: [Mesa-dev] Update on Khronos conformance submissions

2019-10-16 Thread Ian Romanick
On 10/15/19 1:38 AM, Daniel Vetter wrote:
> Hi all,
> 
> At XDC and with a few follow ups with Neil we've clarified the process
> for submitting conformance results to get the GL/VK trademarks and
> everything:
> 
> https://www.x.org/wiki/Khronos/
> 
> Big update is that we've had a misunderstanding around submissions by
> hardware vendors. Those are only a concern for hardware vendors (they
> need to be adopters and pay fees even if their implementation is
> based on X.org), we can submit anything that's open and conformant.
> Even if there's no corresponding submission by a hardware vendor, and
> including software-only renderers.
> 
> Hopefully we'll see a bunch more submissions in the future now!

It would be very cool (and validating) to see submissions for softpipe
and / or llvmpipe at the very least.

> Cheers, Daniel


Re: [Mesa-dev] [PATCH] st/nir: fix illegal designated initializer in st_glsl_to_nir.cpp

2019-09-11 Thread Ian Romanick
On 9/10/19 10:53 PM, Brian Paul wrote:
> IIRC, designated initializers are not legal C++.
> Fixes the MSVC build.
> 
> Fixes: 83fd1e58 ("glsl/nir: Add and use a gl_nir_link() function")
> ---
>  src/mesa/state_tracker/st_glsl_to_nir.cpp | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/src/mesa/state_tracker/st_glsl_to_nir.cpp b/src/mesa/state_tracker/st_glsl_to_nir.cpp
> index 280a778..d6a0264 100644
> --- a/src/mesa/state_tracker/st_glsl_to_nir.cpp
> +++ b/src/mesa/state_tracker/st_glsl_to_nir.cpp
> @@ -688,7 +688,7 @@ st_link_nir(struct gl_context *ctx,
>  */
> if (shader_program->data->spirv) {
>static const gl_nir_linker_options opts = {
> - .fill_parameters = true,
> + true /*fill_parameters */

Could we get a comment in the definition of gl_nir_linker_options to
remind people to either add options only to the end or double check all
of the places that initialize the structures?  If someone adds 'bool
do_foo_instead_of_bar' option at the beginning of that struct, it will
cause problems.

>};
>if (!gl_nir_link(ctx, shader_program, &opts))
>   return GL_FALSE;
> 
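The hazard described above can be shown with a minimal sketch. The two struct names below are hypothetical stand-ins (not the real gl_nir_linker_options), and do_foo_instead_of_bar is the made-up option from the reply. It only illustrates that a positional initializer binds by field order, so the same source line silently sets a different field once a member is added at the front.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the options struct as it exists today. */
struct sketch_linker_options_v1 {
   bool fill_parameters;
};

/* Hypothetical later revision: a new option was added at the beginning. */
struct sketch_linker_options_v2 {
   bool do_foo_instead_of_bar;
   bool fill_parameters;
};

/* Positional initializers bind by field order, not by name.  The comment
 * next to the value is not checked by the compiler.
 */
static const struct sketch_linker_options_v1 opts_v1 = {
   true /* fill_parameters */
};

/* Same initializer text, but "true" now lands in do_foo_instead_of_bar and
 * fill_parameters is zero-initialized.
 */
static const struct sketch_linker_options_v2 opts_v2 = {
   true /* fill_parameters */
};
```

Adding new options only at the end of the struct keeps existing positional initializers meaning what their comments say.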


Re: [Mesa-dev] Extension help

2019-08-02 Thread Ian Romanick
On 8/2/19 1:13 PM, Fritz Koenig wrote:
> Hi,
> 
> I would like to be able to use the MESA_framebuffer_flip_y extension
> in GLES 3.0.  The blocker is that FramebufferParameteri is not part of
> GLES 3.0.  I have explored a couple of ways of achieving this.
> 
> 1.   FramebufferParameteri was first provided by the GL extension
> ARB_framebuffer_no_attachments.  However,
> ARB_framebuffer_no_attachments was never a GLES extension.  An option
> would be to modify the ARB_framebuffer_no_attachments spec to make it
> an extension for GLES.
> 
> 2.  Change the MESA_framebuffer_flip_y spec to allow the extension to
> implement FramebufferParameteri.

GLES has more strict rules about what functions can be advertised than
desktop OpenGL.  Since glFramebufferParameteri is part of the reserved
namespace, I'm not 100% sure that it's legal to expose it in pre-3.1.  I
would suggest making a PR to the registry first.

My gut tells me you're going to have to add a FramebufferParameteriMESA
function that is an alias of FramebufferParameteri.

Has this extension already shipped in a Mesa release?

> I choose the 2nd option and put together a pr for it[1].
> 
> I didn't get the implementation quite right, and I'm looking for some
> help.  My change is failing the DispatchSanity_test.GLES3 test:
> 
> [ RUN  ] DispatchSanity_test.GLES3
> ../src/mesa/main/tests/dispatch_sanity.cpp:174: Failure
>   Expected: nop_table[i]
>   Which is: 0x563d50d6fa01
> To be equal to: table[i]
>   Which is: 0x563d50db158a
> i = 888 (FramebufferParameteri)
> ../src/mesa/main/tests/dispatch_sanity.cpp:174: Failure
>   Expected: nop_table[i]
>   Which is: 0x563d50d6fa01
> To be equal to: table[i]
>   Which is: 0x563d50db1add
> i = 889 (GetFramebufferParameteriv)
> [  FAILED  ] DispatchSanity_test.GLES3 (1 ms)
> 
> It appears there is a problem with the way that I'm
> defining/redefining FramebufferParameteri.  I've tried adding it to
> src/mapi/glapi/gen/es_EXT.xml, but the only way I was able to achieve
> that was to give it a different function name and an alias.
> 
> Does anyone have experience with this sort of backporting of
> extensions that could offer some insight on what I might be doing
> wrong?
> 
> Thanks.
> 
> -Fritz
> 
> [1]: https://gitlab.freedesktop.org/mesa/mesa/merge_requests/1560

Re: [Mesa-dev] [PATCH] egl: fix OpenGL 3.1 context creation

2019-08-02 Thread Ian Romanick
On 8/2/19 12:39 PM, Ian Romanick wrote:
> On 8/1/19 6:38 PM, Timothy Arceri wrote:
>> From the EGL_KHR_create_context spec:
>>
>>"* If OpenGL 3.1 is requested, the context returned may implement
>>any of the following versions:
>>
>>  * Version 3.1. The GL_ARB_compatibility extension may or may
>>not be implemented, as determined by the implementation.
>>  * The core profile of version 3.2 or greater."
>>
>> Fixes CTS tests:
>>
>> dEQP-EGL.functional.create_context_ext.gl_31.rgb888_depth_stencil
>> dEQP-EGL.functional.create_context_ext.robust_gl_31.rgb888_depth_stencil
>> dEQP-EGL.functional.create_context_ext.gl_31.rgb888_depth_no_stencil
>> dEQP-EGL.functional.create_context_ext.robust_gl_31.rgb888_depth_no_stencil
>> dEQP-EGL.functional.create_context_ext.gl_31.rgba_depth_no_stencil
>> dEQP-EGL.functional.create_context_ext.gl_31.rgb888_no_depth_no_stencil
>> dEQP-EGL.functional.create_context_ext.robust_gl_31.rgba_depth_no_stencil
>> dEQP-EGL.functional.create_context_ext.robust_gl_31.rgb888_no_depth_no_stencil
>> dEQP-EGL.functional.create_context_ext.gl_31.rgba_no_depth_no_stencil
>> dEQP-EGL.functional.create_context_ext.robust_gl_31.rgba_no_depth_no_stencil
>> dEQP-EGL.functional.create_context_ext.gl_31.rgba_depth_stencil
>> dEQP-EGL.functional.create_context_ext.robust_gl_31.rgba_depth_stencil
>> ---
>>  src/egl/drivers/dri2/egl_dri2.c | 3 +++
>>  1 file changed, 3 insertions(+)
>>
>> diff --git a/src/egl/drivers/dri2/egl_dri2.c b/src/egl/drivers/dri2/egl_dri2.c
>> index c712c106b06..918d61a1e9b 100644
>> --- a/src/egl/drivers/dri2/egl_dri2.c
>> +++ b/src/egl/drivers/dri2/egl_dri2.c
>> @@ -1245,6 +1245,9 @@ dri2_create_context(_EGLDriver *drv, _EGLDisplay *disp, _EGLConfig *conf,
>> && dri2_ctx->base.ClientMinorVersion >= 2))
>>&& dri2_ctx->base.Profile == EGL_CONTEXT_OPENGL_CORE_PROFILE_BIT_KHR)
>>   api = __DRI_API_OPENGL_CORE;
>> +  else if (dri2_ctx->base.ClientMajorVersion == 3 &&
>> +   dri2_ctx->base.ClientMinorVersion == 1)
>> + api = __DRI_API_OPENGL_CORE;
> 
> If my recollection of the way these are handled in the driver is
> correct, I think this will prevent us from ever exposing
> GL_ARB_compatibility when the context is created with EGL.  I /think/
> the API choice should be further conditioned by
> dri_screen->max_gl_compat_version.  Something like

It looks like dri2_convert_glx_attribs (src/glx/dri_common.c) has the
old behavior too... which makes me wonder how creating an OpenGL 3.1
context has ever worked.

Either way, EGL and GLX should behave the same for this kind of thing.

> 
>   else if (dri2_ctx->base.ClientMajorVersion == 3 &&
>dri2_ctx->base.ClientMinorVersion == 1)
>  api = dri2_dpy->dri_screen->max_gl_compat_version >= 31
> ? __DRI_API_OPENGL : __DRI_API_OPENGL_CORE;
> 
>>else
>>   api = __DRI_API_OPENGL;
>>break;
>>
> 

Re: [Mesa-dev] [PATCH] egl: fix OpenGL 3.1 context creation

2019-08-02 Thread Ian Romanick
On 8/1/19 6:38 PM, Timothy Arceri wrote:
> From the EGL_KHR_create_context spec:
> 
>"* If OpenGL 3.1 is requested, the context returned may implement
>any of the following versions:
> 
>  * Version 3.1. The GL_ARB_compatibility extension may or may
>not be implemented, as determined by the implementation.
>  * The core profile of version 3.2 or greater."
> 
> Fixes CTS tests:
> 
> dEQP-EGL.functional.create_context_ext.gl_31.rgb888_depth_stencil
> dEQP-EGL.functional.create_context_ext.robust_gl_31.rgb888_depth_stencil
> dEQP-EGL.functional.create_context_ext.gl_31.rgb888_depth_no_stencil
> dEQP-EGL.functional.create_context_ext.robust_gl_31.rgb888_depth_no_stencil
> dEQP-EGL.functional.create_context_ext.gl_31.rgba_depth_no_stencil
> dEQP-EGL.functional.create_context_ext.gl_31.rgb888_no_depth_no_stencil
> dEQP-EGL.functional.create_context_ext.robust_gl_31.rgba_depth_no_stencil
> dEQP-EGL.functional.create_context_ext.robust_gl_31.rgb888_no_depth_no_stencil
> dEQP-EGL.functional.create_context_ext.gl_31.rgba_no_depth_no_stencil
> dEQP-EGL.functional.create_context_ext.robust_gl_31.rgba_no_depth_no_stencil
> dEQP-EGL.functional.create_context_ext.gl_31.rgba_depth_stencil
> dEQP-EGL.functional.create_context_ext.robust_gl_31.rgba_depth_stencil
> ---
>  src/egl/drivers/dri2/egl_dri2.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/src/egl/drivers/dri2/egl_dri2.c b/src/egl/drivers/dri2/egl_dri2.c
> index c712c106b06..918d61a1e9b 100644
> --- a/src/egl/drivers/dri2/egl_dri2.c
> +++ b/src/egl/drivers/dri2/egl_dri2.c
> @@ -1245,6 +1245,9 @@ dri2_create_context(_EGLDriver *drv, _EGLDisplay *disp, _EGLConfig *conf,
> && dri2_ctx->base.ClientMinorVersion >= 2))
>&& dri2_ctx->base.Profile == EGL_CONTEXT_OPENGL_CORE_PROFILE_BIT_KHR)
>   api = __DRI_API_OPENGL_CORE;
> +  else if (dri2_ctx->base.ClientMajorVersion == 3 &&
> +   dri2_ctx->base.ClientMinorVersion == 1)
> + api = __DRI_API_OPENGL_CORE;

If my recollection of the way these are handled in the driver is
correct, I think this will prevent us from ever exposing
GL_ARB_compatibility when the context is created with EGL.  I /think/
the API choice should be further conditioned by
dri_screen->max_gl_compat_version.  Something like

  else if (dri2_ctx->base.ClientMajorVersion == 3 &&
   dri2_ctx->base.ClientMinorVersion == 1)
 api = dri2_dpy->dri_screen->max_gl_compat_version >= 31
? __DRI_API_OPENGL : __DRI_API_OPENGL_CORE;

>else
>   api = __DRI_API_OPENGL;
>break;
> 
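The selection logic suggested above can be sketched as a standalone function. This is only an illustration under stated assumptions: the enum and function names are hypothetical stand-ins for __DRI_API_OPENGL / __DRI_API_OPENGL_CORE and the code in dri2_create_context, and max_gl_compat_version is assumed to be encoded as major * 10 + minor (31 for OpenGL 3.1), as the ">= 31" comparison implies.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for __DRI_API_OPENGL and __DRI_API_OPENGL_CORE. */
enum sketch_api {
   SKETCH_API_OPENGL,
   SKETCH_API_OPENGL_CORE,
};

/* Pick the API for a desktop GL context request.  max_gl_compat_version is
 * assumed to be major * 10 + minor.
 */
static inline enum sketch_api
sketch_choose_gl_api(int major, int minor, bool core_profile_requested,
                     int max_gl_compat_version)
{
   /* 3.2 or later with the core profile bit: always core. */
   if ((major >= 4 || (major == 3 && minor >= 2)) && core_profile_requested)
      return SKETCH_API_OPENGL_CORE;

   /* 3.1: stay on the compatibility API only if the driver can actually
    * provide a 3.1 compatibility context; otherwise fall back to core.
    */
   if (major == 3 && minor == 1)
      return max_gl_compat_version >= 31 ? SKETCH_API_OPENGL
                                         : SKETCH_API_OPENGL_CORE;

   return SKETCH_API_OPENGL;
}
```

With this shape, a driver that reports a compatibility limit of 3.0 gets a core context for a 3.1 request, while a driver that reports 3.1 or higher can still expose GL_ARB_compatibility.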


Re: [Mesa-dev] Mesa (update-reviewers-for-vmware): i965/clear: clear_value better precision

2019-08-02 Thread Ian Romanick
On 8/2/19 9:12 AM, Brian Paul wrote:
> On 08/02/2019 09:56 AM, Eric Engestrom wrote:
>> On Friday, 2019-08-02 17:50:17 +0200, Michel Dänzer wrote:
>>> On 2019-08-02 5:37 p.m., Brian Paul wrote:
>>>> Ugh, I didn't mean to do this.  I'm trying to figure out how to make a
>>>> merge request with gitlab.
>>>
>>> Just push to a branch in your personal repository, and the output of git
>>> push contains an URL for creating an MR for it.
>>
>> Precisely, but just to be extra clear: your personal repo needs to be
>> forked from the main mesa repo [1], not just "a repo containing the mesa
>> git history".
>> GitLab needs to know the two are linked and pressing that "fork" button
>> is how you tell it.
> 
> Yeah, I just figured that out a few minutes ago.  After I figure out all
> the detailed steps from scratch I'll add it to the documentation.
> 
> I've really never done anything with gitlab, github, etc. and have been
> busy with non-Mesa work for over a year now.  I have a lot of catch-up
> to do.

Welcome back. :)

[Mesa-dev] [PATCH] mesa: Set minimum possible GLSL version

2019-07-04 Thread Ian Romanick
From: Ian Romanick 

Set the absolute minimum possible GLSL version.  API_OPENGL_CORE can
mean an OpenGL 3.0 forward-compatible context, so that implies a minimum
possible version of 1.30.  Otherwise, the minimum possible version is 1.20.
Since Mesa unconditionally advertises GL_ARB_shading_language_100 and
GL_ARB_shader_objects, every driver has GLSL 1.20... even if they don't
advertise any extensions to enable any shader stages (e.g.,
GL_ARB_vertex_shader).

Converts about 2,500 piglit tests from crash to skip on NV18.
---
 src/mesa/main/context.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/src/mesa/main/context.c b/src/mesa/main/context.c
index e5a89d9c2fc..516660d55d2 100644
--- a/src/mesa/main/context.c
+++ b/src/mesa/main/context.c
@@ -616,6 +616,17 @@ _mesa_init_constants(struct gl_constants *consts, gl_api api)
consts->MaxProgramMatrices = MAX_PROGRAM_MATRICES;
consts->MaxProgramMatrixStackDepth = MAX_PROGRAM_MATRIX_STACK_DEPTH;
 
+   /* Set the absolute minimum possible GLSL version.  API_OPENGL_CORE can
+* mean an OpenGL 3.0 forward-compatible context, so that implies a minimum
+* possible version of 1.30.  Otherwise, the minimum possible version is 1.20.
+* Since Mesa unconditionally advertises GL_ARB_shading_language_100 and
+* GL_ARB_shader_objects, every driver has GLSL 1.20... even if they don't
+* advertise any extensions to enable any shader stages (e.g.,
+* GL_ARB_vertex_shader).
+*/
+   consts->GLSLVersion = api == API_OPENGL_CORE ? 130 : 120;
+   consts->GLSLVersionCompat = consts->GLSLVersion;
+
/* Assume that if GLSL 1.30+ (or GLSL ES 3.00+) is supported that
 * gl_VertexID is implemented using a native hardware register with OpenGL
 * semantics.
-- 
2.21.0


Re: [Mesa-dev] [PATCH] mesa: avoid _mesa_problem invocation when running on drivers without glsl

2019-07-04 Thread Ian Romanick
On 7/4/19 4:21 PM, Ilia Mirkin wrote:
> For example wine might query GL_SHADING_LANGUAGE_VERSION on a driver
> that doesn't support GLSL. This is not a problem in itself, we can just
> return an INVALID_ENUM error.
> 
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109524
> Signed-off-by: Ilia Mirkin 
> ---
>  src/mesa/main/getstring.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/src/mesa/main/getstring.c b/src/mesa/main/getstring.c
> index 3d5ae0b694b..6c0dd9048da 100644
> --- a/src/mesa/main/getstring.c
> +++ b/src/mesa/main/getstring.c
> @@ -150,6 +150,8 @@ _mesa_GetString( GLenum name )
>case GL_SHADING_LANGUAGE_VERSION:
>   if (ctx->API == API_OPENGLES)
>  break;
> + if (_mesa_is_desktop_gl(ctx) && ctx->Const.GLSLVersion == 0)
> +break;

GLSL version should never be zero.  We advertise
GL_ARB_shading_language_100 in all drivers, so every driver has "GLSL"
even if it doesn't have vertex shaders or fragment shaders.  I thought I
sent out a patch some time ago that set GLSLVersion to 120 by default to
avoid problems like this.

>return shading_language_version(ctx);
>case GL_PROGRAM_ERROR_STRING_ARB:
>   if (ctx->API == API_OPENGL_COMPAT &&
> 


[Mesa-dev] What does WIP really mean in an MR?

2019-06-28 Thread Ian Romanick
After a conversation yesterday with a couple of the other Intel devs,
I've come to the conclusion that *everyone* interprets WIP to mean
something different.  I heard no less than four interpretations.

* This series is good.  It hasn't been reviewed, so don't click "merge."

* This series has some sketchy bits.  It probably isn't ready for review
unless you've been tagged for design feedback.

* This series has been reviewed.  Incorporation of detailed feedback is
in progress, but it's going to take some time.

* This series is good, but there are some questionable patches at the end.

Due to this lack of common understanding, we discovered at least one MR
that was ready to go but had been ignored for months. :(  This makes me
wonder if other MRs have similarly languished for no good reason.

Can we formulate some guidelines for how people should apply WIP to
their MRs and how people should interpret WIP when they see it on an MR?

Re: [Mesa-dev] [PATCH] intel_stub: Wrap fcntl64

2019-06-25 Thread Ian Romanick
On 6/25/19 10:23 AM, Jason Ekstrand wrote:
> I needed this patch today so I've pushed it.  FYI, we do have MRs for
> shader-db which are a bit easier to find than a needle in the hay-stack
> that is mesa-dev.

Ah... that's good to know.  After I sent this patch, I completely forgot
about it. :)

> --Jason
> 
> On Fri, Jun 7, 2019 at 11:27 AM Emil Velikov wrote:
> 
>     On Fri, 7 Jun 2019 at 00:30, Ian Romanick wrote:
> >
> > From: Ian Romanick
> >
> > This makes the wrapper work on glibc 2.29 on Fedora 30.
> > ---
> AFAICT this patch is for shader-db and looks spot on.
> Reviewed-by: Emil Velikov
> 
> -Emil

Re: [Mesa-dev] Possible bug in nir_algebraic?

2019-06-24 Thread Ian Romanick
On 6/22/19 5:10 AM, Connor Abbott wrote:
> I haven't thought about whether it's algebraically correct, but
> otherwise your pattern looks fine to me.

I'll have a proof when I send out the patch. :)

> If you haven't noticed already, I added some commented out code to
> nir_replace_instr() that will print out each pattern that's matched.
> The first thing I'd do is move that up to the beginning of the
> function, so that it prints potential matches instead of actual
> matches. If your pattern shows up as a potential match, then it's a
> problem with match_expression() and not the automaton, and at that
> point you can start setting breakpoints and stepping through it.

It looks like the automaton is matching, but match_expression is not.
I'll dig deeper.  Thanks for the tips.

> If it's not a potential match because the automaton filtered it out,
> then debugging is currently a little harder. You'll have to add some
> debugging code to ${pass_name}_pre_block() that prints out the
> assigned state for each ALU instruction. The state is an integer which
> is conceptually an index into TreeAutomaton.states for the constructed
> TreeAutomaton. So if an instruction has state n, then
> automaton.states[n] should be a set of potential partial matches for
> that instruction. Note that variables like a, b, c etc. are converted
> into __wildcard while constants are all converted into __constant. So
> for example, ssa_2 should have a set { __const, __wildcard } as it can
> be matched as a variable or as a constant (we actually explicitly
> construct this set in TreeAutomaton._build_table). ssa_83 should have
> a set { __wildcard, (neg __wildcard) } since it can be matched either
> as a variable or as something like (neg a). (yes, this is very similar
> to the subset construction for getting a DFA from an NFA...). Unless
> there's a bug, each of these "match sets" should contain the
> appropriate subset of your pattern until ssa_97 which should have the
> full pattern as one of the entries in the set. Let us know the details
> if that's not the case.
> 
> I think that we could definitely do better when it comes to debugging
> why the automaton didn't match something. We could emit the
> automaton's state list in C, and then have a debugging option to print
> the match set for each instruction so you'd know where something went
> awry. I didn't do that earlier since I didn't have a need for it while
> bringing up the automaton, but we could add it if it helps. That being
> said, hopefully you won't need it this time :)
> 
> Best,
> 
> Connor
> 
> On Sat, Jun 22, 2019 at 2:26 AM Ian Romanick  wrote:
>>
>> I have encountered what I believe to be a bug in nir_algebraic.  Since
>> the rewrite to use automata, I'm not sure how to begin debugging it.
>> I'm looking for some suggestions... even if the suggestion is, "Fix your
>> patterns."
>>
>> I have added a pattern like:
>>
>>(('~fadd@32', ('fmul', ('fadd', 1.0, ('fneg', a)),
>>   ('fadd', 1.0, ('fneg', a))),
>>  ('fmul', ('flrp', a, 1.0, a), b)),
>> ('flrp', 1.0, b, a), '!options->lower_flrp32'),
>>
>> While using NIR_PRINT=1, I see this in my instruction stream:
>>
>> vec1 32 ssa_2 = load_const (0x3f800000 /* 1.000000 */)
>> ...
>> vec1 32 ssa_196 = intrinsic load_uniform (ssa_195) (68, 4, 160)
>> vec1 32 ssa_83 = fneg ssa_196
>> vec1 32 ssa_84 = fadd ssa_83, ssa_2
>> vec1 32 ssa_85 = fmul ssa_84, ssa_84
>> ...
>> vec1 32 ssa_95 = flrp ssa_196, ssa_2, ssa_196
>> vec1 32 ssa_96 = fmul ssa_78, ssa_95
>> vec1 32 ssa_97 = fadd ssa_96, ssa_85
>>
>> But nir_opt_algebraic does not make any progress.  It sure looks like it
>> should trigger with a = ssa_196 and b = ssa_78.
>>
>> However, progress is made if I change the pattern to
>>
>>(('~fadd@32', ('fmul', ('fadd', 1.0, ('fneg', a)),
>>   c),
>>  ('fmul', ('flrp', a, 1.0, a), b)),
>> ('flrp', 1.0, b, a), '!options->lower_flrp32'),
>>
>> ssa_85 is definitely ('fmul', ssa_84, ssa_84), and ssa_84 is definitely
>> ('fadd', 1.0, ('fneg', ssa_196))... both times. :)


[Mesa-dev] Possible bug in nir_algebraic?

2019-06-21 Thread Ian Romanick
I have encountered what I believe to be a bug in nir_algebraic.  Since
the rewrite to use automata, I'm not sure how to begin debugging it.
I'm looking for some suggestions... even if the suggestion is, "Fix your
patterns."

I have added a pattern like:

   (('~fadd@32', ('fmul', ('fadd', 1.0, ('fneg', a)),
  ('fadd', 1.0, ('fneg', a))),
 ('fmul', ('flrp', a, 1.0, a), b)),
('flrp', 1.0, b, a), '!options->lower_flrp32'),

While using NIR_PRINT=1, I see this in my instruction stream:

vec1 32 ssa_2 = load_const (0x3f800000 /* 1.000000 */)
...
vec1 32 ssa_196 = intrinsic load_uniform (ssa_195) (68, 4, 160)
vec1 32 ssa_83 = fneg ssa_196
vec1 32 ssa_84 = fadd ssa_83, ssa_2
vec1 32 ssa_85 = fmul ssa_84, ssa_84
...
vec1 32 ssa_95 = flrp ssa_196, ssa_2, ssa_196
vec1 32 ssa_96 = fmul ssa_78, ssa_95
vec1 32 ssa_97 = fadd ssa_96, ssa_85

But nir_opt_algebraic does not make any progress.  It sure looks like it
should trigger with a = ssa_196 and b = ssa_78.

However, progress is made if I change the pattern to

   (('~fadd@32', ('fmul', ('fadd', 1.0, ('fneg', a)),
  c),
 ('fmul', ('flrp', a, 1.0, a), b)),
('flrp', 1.0, b, a), '!options->lower_flrp32'),

ssa_85 is definitely ('fmul', ssa_84, ssa_84), and ssa_84 is definitely
('fadd', 1.0, ('fneg', ssa_196))... both times. :)

Re: [Mesa-dev] [PATCH 1/2] util: Add util_is_power_of_two_minus_one

2019-06-14 Thread Ian Romanick
On 6/14/19 9:42 AM, Alyssa Rosenzweig wrote:
> Checks if a number is one less than a power of two. Equivalently, this
> checks if a number is all ones in binary. The latter definition is
> helpful in the context of masks.
> 
> The function is trivial; this is *the* canonical check and is
> arguably no less clean than calling util_is_power_of_two(x + 1) (the

Except it would have to be util_is_power_of_two_or_zero because
util_is_power_of_two(0xffffffff + 1) is false. :)

Is there actually a 2/2 for this?  We usually wouldn't land something
like this without a caller.

> latter function implemented similarly). Still, it's worth having a
> dedicated check for this; semantically, in the context of masks, this
> check is meaningful standalone, justifying an independent implementation
> from the existing util_is_power_of_two* utilities.
> 
> Signed-off-by: Alyssa Rosenzweig 
> Cc: Ian Romanick 
> Cc: Eduardo Lima Mitev 
> ---
>  src/util/bitscan.h | 9 +
>  1 file changed, 9 insertions(+)
> 
> diff --git a/src/util/bitscan.h b/src/util/bitscan.h
> index dc89ac93f28..632f7dd2e67 100644
> --- a/src/util/bitscan.h
> +++ b/src/util/bitscan.h
> @@ -158,6 +158,15 @@ util_is_power_of_two_nonzero(unsigned v)
>  #endif
>  }
>  
> +/* Determine if an unsigned value is one less than a power-of-two
> + */
> +
> +static inline bool
> +util_is_power_of_two_minus_one(unsigned v)
> +{
> +   return (v & (v + 1)) == 0;

This will return true for v == 0.  Is that the desired behavior?  I
mean, that is 2**0 - 1, but it is not "all ones in binary."  I think the
result may surprise people wanting to use this to detect a mask.  This
is also how we ended up with util_is_power_of_two_nonzero and
util_is_power_of_two_or_zero.

> +}
> +
>  /* For looping over a bitmask when you want to loop over consecutive bits
>   * manually, for example:
>   *
> 
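The zero case raised above is easy to see in a minimal sketch. The function names here are hypothetical (they are not the helpers in src/util/bitscan.h); the first uses the same expression as the patch, the second is one possible variant that rejects zero, and the third mirrors the usual "power of two or zero" bit trick.

```c
#include <assert.h>
#include <stdbool.h>

/* Same expression as the proposed helper: true for 0, 1, 3, 7, ... */
static inline bool
sketch_is_pot_minus_one(unsigned v)
{
   return (v & (v + 1)) == 0;
}

/* Hypothetical variant for mask detection: requires at least one set bit,
 * so only contiguous low-bit masks such as 0x1, 0x3, 0x7 pass.
 */
static inline bool
sketch_is_nonzero_low_mask(unsigned v)
{
   return v != 0 && (v & (v + 1)) == 0;
}

/* The classic check that also accepts zero.  Unsigned all-ones plus one
 * wraps to zero, which is why the "or_zero" flavor is the one that would
 * be needed for an (x + 1) formulation.
 */
static inline bool
sketch_is_pot_or_zero(unsigned v)
{
   return (v & (v - 1)) == 0;
}
```

Whether zero should count depends on the caller: as 2**0 - 1 it qualifies, but as "a mask with some bits set" it does not.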


[Mesa-dev] [PATCH] intel_stub: Wrap fcntl64

2019-06-06 Thread Ian Romanick
From: Ian Romanick 

This makes the wrapper work on glibc 2.29 on Fedora 30.
---
 intel_stub.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/intel_stub.c b/intel_stub.c
index 8b8db64..590792e 100644
--- a/intel_stub.c
+++ b/intel_stub.c
@@ -50,6 +50,7 @@ static int (*libc_fstat64)(int fd, struct stat64 *buf);
 static int (*libc__fxstat)(int ver, int fd, struct stat *buf);
 static int (*libc__fxstat64)(int ver, int fd, struct stat64 *buf);
 static int (*libc_fcntl)(int fd, int cmd, int param);
+static int (*libc_fcntl64)(int fd, int cmd, int param);
 static ssize_t (*libc_readlink)(const char *pathname, char *buf, size_t bufsiz);
 
 static int drm_fd = 0xBEEF;
@@ -198,6 +199,22 @@ fcntl(int fd, int cmd, ...)
return libc_fcntl(fd, cmd, param);
 }
 
+__attribute__ ((visibility ("default"))) int
+fcntl64(int fd, int cmd, ...)
+{
+   va_list args;
+   int param;
+
+   if (fd == drm_fd && cmd == F_DUPFD_CLOEXEC)
+   return drm_fd;
+
+   va_start(args, cmd);
+   param = va_arg(args, int);
+   va_end(args);
+
+   return libc_fcntl64(fd, cmd, param);
+}
+
 __attribute__ ((visibility ("default"))) void *
 mmap(void *addr, size_t len, int prot, int flags,
  int fildes, off_t off)
@@ -315,6 +332,7 @@ init(void)
libc_open64 = dlsym(RTLD_NEXT, "open64");
libc_close = dlsym(RTLD_NEXT, "close");
libc_fcntl = dlsym(RTLD_NEXT, "fcntl");
+   libc_fcntl64 = dlsym(RTLD_NEXT, "fcntl64");
libc_fstat = dlsym(RTLD_NEXT, "fstat");
libc_fstat64 = dlsym(RTLD_NEXT, "fstat64");
libc__fxstat = dlsym(RTLD_NEXT, "__fxstat");
-- 
2.21.0


Re: [Mesa-dev] [RFC PATCH] nir/algebraic: Simplify max(abs(a), 0.0) -> abs(a)

2019-05-24 Thread Ian Romanick
On 5/23/19 7:54 PM, Ilia Mirkin wrote:
> How does max(NaN, 0) work? IIRC there's some provision that this
> becomes 0, while abs(NaN) = NaN.

That is correct.  There are a couple other algebraic patterns that have
the same potential problem (e.g., the min(max()) -> sat() patterns), and
we just mark those as imprecise with ~.  I /think/ that should be
adequate here too.

> On Thu, May 23, 2019 at 10:47 PM Alyssa Rosenzweig wrote:
>>
>> I noticed this pattern in glmark's jellyfish scene.
>>
>> Assuming this is correct (it should be...?), could someone do a
>> shader-db run? Thank you!
>>
>> Signed-off-by: Alyssa Rosenzweig 
>> Cc: Ian Romanick 
>> ---
>>  src/compiler/nir/nir_opt_algebraic.py | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/src/compiler/nir/nir_opt_algebraic.py b/src/compiler/nir/nir_opt_algebraic.py
>> index 89d07aa1261..abd0b6591ce 100644
>> --- a/src/compiler/nir/nir_opt_algebraic.py
>> +++ b/src/compiler/nir/nir_opt_algebraic.py
>> @@ -377,6 +377,7 @@ optimizations = [
>> (('imax', a, a), a),
>> (('umin', a, a), a),
>> (('umax', a, a), a),
>> +   (('fmax', ('fabs', a), 0.0), ('fabs', a)),
>> (('fmax', ('fmax', a, b), b), ('fmax', a, b)),
>> (('umax', ('umax', a, b), b), ('umax', a, b)),
>> (('imax', ('imax', a, b), b), ('imax', a, b)),
>> --
>> 2.20.1
>>
> 
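The NaN difference discussed above can be illustrated with a small model. This is a sketch under an explicit assumption: fmax is modeled as returning the non-NaN operand when exactly one input is NaN (the behavior described in this thread, and the one C99 specifies for fmax). The helper names are hypothetical and are not NIR code.

```c
#include <assert.h>
#include <math.h>     /* only for the NAN macro */
#include <stdbool.h>

static inline bool
sketch_is_nan(float x)
{
   return x != x;     /* NaN is the only value that compares unequal to itself */
}

/* Assumed fmax semantics: if one operand is NaN, return the other one. */
static inline float
sketch_fmax(float a, float b)
{
   if (sketch_is_nan(a))
      return b;
   if (sketch_is_nan(b))
      return a;
   return a > b ? a : b;
}

/* abs() propagates NaN: the comparison is false, so x is returned as is.
 * The sign of zero is ignored in this sketch.
 */
static inline float
sketch_fabs(float x)
{
   return x < 0.0f ? -x : x;
}

/* True when replacing fmax(fabs(a), 0.0) with fabs(a) changes the result. */
static inline bool
sketch_rewrite_changes_result(float a)
{
   const float before = sketch_fmax(sketch_fabs(a), 0.0f);
   const float after = sketch_fabs(a);

   if (sketch_is_nan(before) || sketch_is_nan(after))
      return sketch_is_nan(before) != sketch_is_nan(after);

   return before != after;
}
```

Under that assumption the two forms only disagree for a NaN input, which is the case an inexact (~) pattern is allowed to ignore.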


Re: [Mesa-dev] [PATCH] i965/tex: ignore the diff between GL_TEXTURE_2D and GL_TEXTURE_RECTANGLE

2019-05-21 Thread Ian Romanick
On 5/21/19 4:36 AM, Olivier Fourdan wrote:
> Hi all,
> 
> On Thu, Jul 19, 2018 at 12:08 PM andrey simiklit
>  wrote:
>>> Ugh... not so good.  According to Olivier on the bug, this just makes the 
>>> assert go away and doesn't actually fix anything.  Likely this is needed 
>>> but not sufficient.
>>
>> So as far as I understand, Olivier found the bad commit in xorg glamor:
>> https://bugs.freedesktop.org/show_bug.cgi?id=107287
>>
>> So at the moment we should fix just this "assertion" issue for Intel because 
>> the "rendering" issue came from xorg/glamor and there is no "rendering" issue 
>> in the Intel part.
>> Please correct me if I am incorrect.
> 
> Reviving an old thread/patch here.
> 
> Andrey, I reckon your patch here is still much needed as it fixes the
> assert() issue:
> 
> intel_mipmap_tree.c:1301: intel_miptree_match_image: Assertion
> `image->TexObject->Target == mt->target' failed.
> 
> Which is still occurring even with current master.
> 
> My patch was to fix the rendering issue (landed a while ago before
> 18.1 iirc), but yours was never merged and is still needed, I can
> reproduce the assert() at will with the reproducer from
> https://bugs.freedesktop.org/show_bug.cgi?id=107117
> 
> Jason, can we reconsider Andrii's patch? It still applies cleanly
> (https://patchwork.freedesktop.org/patch/237490/)

Looking at the patch and the "Simple reproducer" in the bug, I think
this just papers over the issue.  It seems like the problem is somewhere
down inside the driver's handling of glXBindTexImageEXT.  My best guess
is that the texture is GL_TEXTURE_2D but the miptree backing it is
GL_TEXTURE_RECTANGLE.  It seems that the glXBindTexImageEXT handling
should mark the miptree as GL_TEXTURE_2D when binding the image to a
texture that is GL_TEXTURE_2D.  Or is that not possible for some
non-obvious reason?

> Cheers,
> Olivier

Re: [Mesa-dev] [PATCH] docs/features: don't list EXT extensions in a list for KHR/ARB/OES extensions

2019-05-17 Thread Ian Romanick
On 5/17/19 6:24 AM, Eric Engestrom wrote:
> On 2019-05-16 at 18:34, Ian Romanick  wrote:
>> On 5/15/19 7:39 AM, Gert Wollny wrote:
>>> How about moving these extensions to another (new) section? I think it
>>> is nice to have a one-stop place to find out what is supported. 
>>
>> Given the existence of mesamatrix.net, is that useful?
> 
> mesamatrix.net is nothing more than a pretty parser for this file. If you
> remove the information from this file, it won't be on the website anymore
> either ;-)
> 
>> When we started
>> this file, the purpose was to track work that people were doing to avoid
>> collisions and track progress towards closing the functionality gap with
>> the rest of the industry.  There's not a lot of new functionality work
>> being done, and there's not much of a functionality gap with the rest of
>> the industry.
>>
>> Given that it's unlikely there will ever be another GL version, ARB
>> extension, KHR extension, or OES extension, I'm honestly not sure how
>> much value this file has at all.
> 
> This file has contained other things as well for a while, which is why it was 
> eventually renamed from gl3.txt to features.txt a few years ago.

When I made commit f926cf5bd0a ("docs: Rename GL3.txt to features.txt")
in 2016, it was because we finished OpenGL 3.x, and had been using the
file to track progress on OpenGL 4.x and OpenGL ES 3.x features for some
time.

> I don't have a stake in this steak, but to me the issue with this patch is 
> that I don't see what's gained by removing this information?

I think Marek's point is that there are zero other EXT / vendor
extensions in features.txt that are not also part of some OpenGL or
OpenGL ES version.  I think if reviewers had been paying attention to
features.txt, none of these would have landed in the first place.
Looking at the logs, it seems that many of these changes were either
unreviewed or were reviewed by Marek.  There's some irony there. :)

I also believe that leaving these oddball extensions invites more
clutter in this file.

>>> Best, 
>>> Gert
>>>
>>> On Tue, 2019-05-14 at 16:07 -0400, Marek Olšák wrote:
>>>> From: Marek Olšák 
>>>>
>>>> ---
>>>>  docs/features.txt | 10 --
>>>>  1 file changed, 10 deletions(-)
>>>>
>>>> diff --git a/docs/features.txt b/docs/features.txt
>>>> index 38d6186dbe1..b1799550a0c 100644
>>>> --- a/docs/features.txt
>>>> +++ b/docs/features.txt
>>>> @@ -309,30 +309,20 @@ Khronos, ARB, and OES extensions that are not
>>>> part of any OpenGL or OpenGL ES ve
>>>>GL_ARB_seamless_cubemap_per_texture   DONE
>>>> (freedreno, i965, nvc0, radeonsi, r600, softpipe, swr, virgl)
>>>>GL_ARB_shader_ballot  DONE
>>>> (i965/gen8+, nvc0, radeonsi)
>>>>GL_ARB_shader_clock   DONE
>>>> (i965/gen7+, nv50, nvc0, r600, radeonsi, virgl)
>>>>GL_ARB_shader_stencil_export  DONE
>>>> (i965/gen9+, r600, radeonsi, softpipe, llvmpipe, swr, virgl)
>>>>GL_ARB_shader_viewport_layer_arrayDONE
>>>> (i965/gen6+, nvc0, radeonsi)
>>>>GL_ARB_sparse_buffer  DONE
>>>> (radeonsi/CIK+)
>>>>GL_ARB_sparse_texture not started
>>>>GL_ARB_sparse_texture2not started
>>>>GL_ARB_sparse_texture_clamp   not started
>>>>GL_ARB_texture_filter_minmax  not started
>>>> -  GL_EXT_memory_object  DONE
>>>> (radeonsi)
>>>> -  GL_EXT_memory_object_fd   DONE
>>>> (radeonsi)
>>>> -  GL_EXT_memory_object_win32not started
>>>> -  GL_EXT_render_snorm   DONE (i965,
>>>> radeonsi)
>>>> -  GL_EXT_semaphore  DONE
>>>> (radeonsi)
>>>> -  GL_EXT_semaphore_fd   DONE
>>>> (radeonsi)
>>>> -  GL_EXT_semaphore_win32not started
>>>> -  GL_EXT_sRGB_write_control DONE (all
>>>> drivers that support GLES 3.0+)
>>>> -  GL_EXT_texture_norm16 DONE
>>>> (freedr

Re: [Mesa-dev] [PATCH] docs/features: don't list EXT extensions in a list for KHR/ARB/OES extensions

2019-05-16 Thread Ian Romanick
On 5/15/19 7:39 AM, Gert Wollny wrote:
> How about moving these extensions to another (new) section? I think it
> is nice to have a one-stop place to find out what is supported. 

Given the existence of mesamatrix.net, is that useful?  When we started
this file, the purpose was to track work that people were doing to avoid
collisions and track progress towards closing the functionality gap with
the rest of the industry.  There's not a lot of new functionality work
being done, and there's not much of a functionality gap with the rest of
the industry.

Given that it's unlikely there will ever be another GL version, ARB
extension, KHR extension, or OES extension, I'm honestly not sure how
much value this file has at all.

> Best, 
> Gert
> 
> On Tue, 2019-05-14 at 16:07 -0400, Marek Olšák wrote:
>> From: Marek Olšák 
>>
>> ---
>>  docs/features.txt | 10 --
>>  1 file changed, 10 deletions(-)
>>
>> diff --git a/docs/features.txt b/docs/features.txt
>> index 38d6186dbe1..b1799550a0c 100644
>> --- a/docs/features.txt
>> +++ b/docs/features.txt
>> @@ -309,30 +309,20 @@ Khronos, ARB, and OES extensions that are not
>> part of any OpenGL or OpenGL ES ve
>>GL_ARB_seamless_cubemap_per_texture   DONE
>> (freedreno, i965, nvc0, radeonsi, r600, softpipe, swr, virgl)
>>GL_ARB_shader_ballot  DONE
>> (i965/gen8+, nvc0, radeonsi)
>>GL_ARB_shader_clock   DONE
>> (i965/gen7+, nv50, nvc0, r600, radeonsi, virgl)
>>GL_ARB_shader_stencil_export  DONE
>> (i965/gen9+, r600, radeonsi, softpipe, llvmpipe, swr, virgl)
>>GL_ARB_shader_viewport_layer_arrayDONE
>> (i965/gen6+, nvc0, radeonsi)
>>GL_ARB_sparse_buffer  DONE
>> (radeonsi/CIK+)
>>GL_ARB_sparse_texture not started
>>GL_ARB_sparse_texture2not started
>>GL_ARB_sparse_texture_clamp   not started
>>GL_ARB_texture_filter_minmax  not started
>> -  GL_EXT_memory_object  DONE
>> (radeonsi)
>> -  GL_EXT_memory_object_fd   DONE
>> (radeonsi)
>> -  GL_EXT_memory_object_win32not started
>> -  GL_EXT_render_snorm   DONE (i965,
>> radeonsi)
>> -  GL_EXT_semaphore  DONE
>> (radeonsi)
>> -  GL_EXT_semaphore_fd   DONE
>> (radeonsi)
>> -  GL_EXT_semaphore_win32not started
>> -  GL_EXT_sRGB_write_control DONE (all
>> drivers that support GLES 3.0+)
>> -  GL_EXT_texture_norm16 DONE
>> (freedreno, i965, r600, radeonsi, nvc0)
>> -  GL_EXT_texture_sRGB_R8DONE (all
>> drivers that support GLES 3.0+)
>>GL_KHR_blend_equation_advanced_coherent   DONE
>> (i965/gen9+)
>>GL_KHR_texture_compression_astc_hdr   DONE
>> (i965/bxt)
>>GL_KHR_texture_compression_astc_sliced_3d DONE
>> (i965/gen9+, radeonsi)
>>GL_OES_depth_texture_cube_map DONE (all
>> drivers that support GLSL 1.30+)
>>GL_OES_EGL_image  DONE (all
>> drivers)
>>GL_OES_EGL_image_external DONE (all
>> drivers)
>>GL_OES_EGL_image_external_essl3   DONE (all
>> drivers)
>>GL_OES_required_internalformatDONE (all
>> drivers)
>>GL_OES_surfaceless_contextDONE (all
>> drivers)
>>GL_OES_texture_compression_astc   DONE (core
>> only)
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> 


Re: [Mesa-dev] [PATCH] docs/features: don't list EXT extensions in a list for KHR/ARB/OES extensions

2019-05-14 Thread Ian Romanick
Reviewed-by: Ian Romanick 

On 5/14/19 1:07 PM, Marek Olšák wrote:
> From: Marek Olšák 
> 
> ---
>  docs/features.txt | 10 --
>  1 file changed, 10 deletions(-)
> 
> diff --git a/docs/features.txt b/docs/features.txt
> index 38d6186dbe1..b1799550a0c 100644
> --- a/docs/features.txt
> +++ b/docs/features.txt
> @@ -309,30 +309,20 @@ Khronos, ARB, and OES extensions that are not part of 
> any OpenGL or OpenGL ES ve
>GL_ARB_seamless_cubemap_per_texture   DONE (freedreno, 
> i965, nvc0, radeonsi, r600, softpipe, swr, virgl)
>GL_ARB_shader_ballot  DONE (i965/gen8+, 
> nvc0, radeonsi)
>GL_ARB_shader_clock   DONE (i965/gen7+, 
> nv50, nvc0, r600, radeonsi, virgl)
>GL_ARB_shader_stencil_export  DONE (i965/gen9+, 
> r600, radeonsi, softpipe, llvmpipe, swr, virgl)
>GL_ARB_shader_viewport_layer_arrayDONE (i965/gen6+, 
> nvc0, radeonsi)
>GL_ARB_sparse_buffer  DONE (radeonsi/CIK+)
>GL_ARB_sparse_texture not started
>GL_ARB_sparse_texture2not started
>GL_ARB_sparse_texture_clamp   not started
>GL_ARB_texture_filter_minmax  not started
> -  GL_EXT_memory_object  DONE (radeonsi)
> -  GL_EXT_memory_object_fd   DONE (radeonsi)
> -  GL_EXT_memory_object_win32not started
> -  GL_EXT_render_snorm   DONE (i965, radeonsi)
> -  GL_EXT_semaphore  DONE (radeonsi)
> -  GL_EXT_semaphore_fd   DONE (radeonsi)
> -  GL_EXT_semaphore_win32not started
> -  GL_EXT_sRGB_write_control DONE (all drivers 
> that support GLES 3.0+)
> -  GL_EXT_texture_norm16 DONE (freedreno, 
> i965, r600, radeonsi, nvc0)
> -  GL_EXT_texture_sRGB_R8DONE (all drivers 
> that support GLES 3.0+)
>GL_KHR_blend_equation_advanced_coherent   DONE (i965/gen9+)
>GL_KHR_texture_compression_astc_hdr   DONE (i965/bxt)
>GL_KHR_texture_compression_astc_sliced_3d DONE (i965/gen9+, 
> radeonsi)
>GL_OES_depth_texture_cube_map DONE (all drivers 
> that support GLSL 1.30+)
>GL_OES_EGL_image  DONE (all drivers)
>GL_OES_EGL_image_external DONE (all drivers)
>GL_OES_EGL_image_external_essl3   DONE (all drivers)
>GL_OES_required_internalformatDONE (all drivers)
>GL_OES_surfaceless_contextDONE (all drivers)
>GL_OES_texture_compression_astc   DONE (core only)
> 



Re: [Mesa-dev] [PATCH] st/mesa: fix uninitialized lower_flrp_progress variable

2019-05-08 Thread Ian Romanick

I sent an MR for this and the other cases earlier this morning.

On May 8, 2019 9:20:16 AM Brian Paul  wrote:


The 'progress' variable is initialized to false in other locations.
This fixes a new Coverity warning.
---
src/mesa/state_tracker/st_glsl_to_nir.cpp | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)


diff --git a/src/mesa/state_tracker/st_glsl_to_nir.cpp 
b/src/mesa/state_tracker/st_glsl_to_nir.cpp

index 0a67d45..5706425 100644
--- a/src/mesa/state_tracker/st_glsl_to_nir.cpp
+++ b/src/mesa/state_tracker/st_glsl_to_nir.cpp
@@ -338,7 +338,7 @@ st_nir_opts(nir_shader *nir, bool scalar)
NIR_PASS(progress, nir, nir_opt_constant_folding);


if (lower_flrp != 0) {
- bool lower_flrp_progress;
+ bool lower_flrp_progress = false;


NIR_PASS(lower_flrp_progress, nir, nir_lower_flrp,
 lower_flrp,
--
2.7.4



Re: [Mesa-dev] [PATCH 1/7] compiler: Add enums for blend state

2019-05-07 Thread Ian Romanick
This commit is

Reviewed-by: Ian Romanick 

On 5/5/19 7:26 PM, Alyssa Rosenzweig wrote:
> We add enums corresponding to (GLES) blend state to shader_enums.h,
> complementing the existing advanced blending enums in the file. This
> allows us to represent blending state in a driver-agnostic, API-agnostic
> way to permit lowering.
> 
> Signed-off-by: Alyssa Rosenzweig 
> Cc: Eric Anholt 
> Cc: Kenneth Graunke 
> ---
>  src/compiler/shader_enums.h | 21 +
>  1 file changed, 21 insertions(+)
> 
> diff --git a/src/compiler/shader_enums.h b/src/compiler/shader_enums.h
> index ac293af4519..47b1ca01dd6 100644
> --- a/src/compiler/shader_enums.h
> +++ b/src/compiler/shader_enums.h
> @@ -753,6 +753,27 @@ enum gl_advanced_blend_mode
> BLEND_ALL= 0x7fff,
>  };
>  
> +enum blend_func
> +{
> +   BLEND_FUNC_ADD,
> +   BLEND_FUNC_SUBTRACT,
> +   BLEND_FUNC_REVERSE_SUBTRACT,
> +   BLEND_FUNC_MIN,
> +   BLEND_FUNC_MAX,
> +};
> +
> +enum blend_factor
> +{
> +   BLEND_FACTOR_ZERO,
> +   BLEND_FACTOR_SRC_COLOR,
> +   BLEND_FACTOR_DST_COLOR,
> +   BLEND_FACTOR_SRC_ALPHA,
> +   BLEND_FACTOR_DST_ALPHA,
> +   BLEND_FACTOR_CONSTANT_COLOR,
> +   BLEND_FACTOR_CONSTANT_ALPHA,
> +   BLEND_FACTOR_SRC_ALPHA_SATURATE,
> +};
> +
>  enum gl_tess_spacing
>  {
> TESS_SPACING_UNSPECIFIED,

Re: [Mesa-dev] [PATCH 3/7] nir: Add blend_const_color_rgba sysval

2019-05-07 Thread Ian Romanick
This commit is

Reviewed-by: Ian Romanick 

On 5/5/19 7:26 PM, Alyssa Rosenzweig wrote:
> This represents a float vec4 constant color, as passed to glBlendColor.
> While the existing 4 shader sysvals are retained to minimize code churn,
> a single vectorized intrinsic is required for efficient blending on
> vector architectures. (This may also apply to architectures like
> Bifrost where ALU is scalar but load/store is vector; it largely depends
> on how blending is implemented per-driver.)
> 
> Signed-off-by: Alyssa Rosenzweig 
> Cc: Eric Anholt 
> Cc: Kenneth Graunke 
> ---
>  src/compiler/nir/nir_intrinsics.py | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/src/compiler/nir/nir_intrinsics.py 
> b/src/compiler/nir/nir_intrinsics.py
> index 3a0470c2ca1..df459a3cdec 100644
> --- a/src/compiler/nir/nir_intrinsics.py
> +++ b/src/compiler/nir/nir_intrinsics.py
> @@ -568,11 +568,14 @@ system_value("viewport_z_offset", 1)
>  system_value("viewport_scale", 3)
>  system_value("viewport_offset", 3)
>  
> -# Blend constant color values.  Float values are clamped.#
> +# Blend constant color values.  Float values are clamped. Vectored versions are
> +# provided as well for driver convenience
> +
>  system_value("blend_const_color_r_float", 1)
>  system_value("blend_const_color_g_float", 1)
>  system_value("blend_const_color_b_float", 1)
>  system_value("blend_const_color_a_float", 1)
> +system_value("blend_const_color_rgba", 4)
>  system_value("blend_const_color_rgba_unorm", 1)
>  system_value("blend_const_color__unorm", 1)

Re: [Mesa-dev] [PATCH 4/7] nir: Add nir_lower_blend pass

2019-05-07 Thread Ian Romanick
On 5/5/19 7:26 PM, Alyssa Rosenzweig wrote:
> This new lowering pass implements the OpenGL ES blend pipeline in
> shaders, applicable to hardware lacking full-featured blending hardware
> (including Midgard/Bifrost and vc4). This pass is run on a fragment
> shader, rewriting the store to a blended version, loading in the
> framebuffer destination color and constant color via intrinsics as
> necessary. This pass is sufficient for OpenGL ES 2.0 and is verified to
> pass dEQP's blend tests. That said, at present it has the following
> limitations:
> 
>  - MRT is not supported.
>  - Logic ops are not supported.

Logic ops seem... challenging to emulate in the shader.  That shader
would need the destination colors in the framebuffer storage format, and
I'm not sure that's always possible (maybe?).  We (and I think everyone
else) removed GL_EXT_blend_logic_op because nobody's hardware could
handle the interactions with GL_EXT_blend_equation_separate.  It would be
cool to add it back. :)

It might also be fun to add support for GL_AMD_blend_minmax_factor and
GL_SGIX_blend_alpha_minmax.  Looking at this lowering pass, it seems
like most of the work would be adding tests.

> MRT support is on my TODO list but paused until MRT is implemented in
> the rest of the driver. Both changes should be fairly trivial.
> 
> It also includes MIN/MAX modes, so in conjunction with the advanced
> blend mode lowering it should be sufficient for ES3, though this has not
> been thoroughly tested. It is an open question whether the current GLSL
> IR based advanced blend lowering should be NIRified and merged into this
> pass.

Having all of the lowering related to blending in one place seems like a
good idea.
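
To make the per-channel state in this patch concrete, here is a minimal Python sketch of what a lowered blend would compute for one render target. It is not the pass itself: the dictionary keys only mirror the fields of nir_lower_blend_channel, and the helper names (factor_value, blend_channel, blend) are hypothetical. It assumes an inverted factor means 1 - factor (so ZERO inverted acts as ONE), and that MIN/MAX ignore the factors as in GL.

```python
# Hypothetical model of the blend state; colors are (r, g, b, a) float tuples.

def factor_value(factor, invert, src, dst, const, channel):
    """Return the blend factor for one channel, optionally as 1 - factor."""
    if factor == "ZERO":
        value = 0.0
    elif factor == "SRC_COLOR":
        value = src[channel]
    elif factor == "DST_COLOR":
        value = dst[channel]
    elif factor == "SRC_ALPHA":
        value = src[3]
    elif factor == "DST_ALPHA":
        value = dst[3]
    elif factor == "CONSTANT_COLOR":
        value = const[channel]
    elif factor == "CONSTANT_ALPHA":
        value = const[3]
    elif factor == "SRC_ALPHA_SATURATE":
        # GL defines the alpha factor as 1 and the RGB factor as min(As, 1 - Ad).
        value = 1.0 if channel == 3 else min(src[3], 1.0 - dst[3])
    else:
        raise ValueError(factor)
    return 1.0 - value if invert else value


def blend_channel(state, src, dst, const, channel):
    """Blend a single channel according to one func/factor description."""
    func = state["func"]
    if func == "MIN":
        return min(src[channel], dst[channel])
    if func == "MAX":
        return max(src[channel], dst[channel])
    s = src[channel] * factor_value(state["src_factor"], state["invert_src_factor"],
                                    src, dst, const, channel)
    d = dst[channel] * factor_value(state["dst_factor"], state["invert_dst_factor"],
                                    src, dst, const, channel)
    if func == "ADD":
        return s + d
    if func == "SUBTRACT":
        return s - d
    if func == "REVERSE_SUBTRACT":
        return d - s
    raise ValueError(func)


def blend(rgb_state, alpha_state, src, dst, const=(0.0, 0.0, 0.0, 0.0)):
    """Apply separate RGB and alpha blend state to a source/destination pair."""
    rgb = tuple(blend_channel(rgb_state, src, dst, const, ch) for ch in range(3))
    return rgb + (blend_channel(alpha_state, src, dst, const, 3),)
```

With func ADD, src_factor SRC_ALPHA, and dst_factor SRC_ALPHA inverted, this reduces to the usual "source over" blend, which is the kind of case a test for the lowering could start from.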

>  ...Dual-source blending is not supported, Ryan.
> 
> Signed-off-by: Alyssa Rosenzweig 
> Cc: Eric Anholt 
> Cc: Kenneth Graunke 
> ---
>  src/compiler/Makefile.sources  |   1 +
>  src/compiler/nir/meson.build   |   1 +
>  src/compiler/nir/nir.h |  22 +++
>  src/compiler/nir/nir_lower_blend.c | 214 +
>  4 files changed, 238 insertions(+)
>  create mode 100644 src/compiler/nir/nir_lower_blend.c
> 
> diff --git a/src/compiler/Makefile.sources b/src/compiler/Makefile.sources
> index 9bebc3d8867..d68b9550b02 100644
> --- a/src/compiler/Makefile.sources
> +++ b/src/compiler/Makefile.sources
> @@ -238,6 +238,7 @@ NIR_FILES = \
>   nir/nir_lower_bit_size.c \
>   nir/nir_lower_bool_to_float.c \
>   nir/nir_lower_bool_to_int32.c \
> + nir/nir_lower_blend.c \
>   nir/nir_lower_clamp_color_outputs.c \
>   nir/nir_lower_clip.c \
>   nir/nir_lower_clip_cull_distance_arrays.c \
> diff --git a/src/compiler/nir/meson.build b/src/compiler/nir/meson.build
> index a8faeb9c018..73ab62d4b46 100644
> --- a/src/compiler/nir/meson.build
> +++ b/src/compiler/nir/meson.build
> @@ -116,6 +116,7 @@ files_libnir = files(
>'nir_lower_array_deref_of_vec.c',
>'nir_lower_atomics_to_ssbo.c',
>'nir_lower_bitmap.c',
> +  'nir_lower_blend.c',
>'nir_lower_bool_to_float.c',
>'nir_lower_bool_to_int32.c',
>'nir_lower_clamp_color_outputs.c',
> diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
> index 37161e83e4d..8b68faed819 100644
> --- a/src/compiler/nir/nir.h
> +++ b/src/compiler/nir/nir.h
> @@ -3447,6 +3447,28 @@ typedef enum  {
>  
>  bool nir_lower_to_source_mods(nir_shader *shader, 
> nir_lower_to_source_mods_flags options);
>  
> +/* These structs encapsulates the blend state such that it can be lowered
                  encapsulate
> + * cleanly */

*/ on its own line.  There's at least one more instance of this below.

> +
> +typedef struct {
> +  enum blend_func func;
> +
> +  enum blend_factor src_factor;
> +  bool invert_src_factor;
> +
> +  enum blend_factor dst_factor;
> +  bool invert_dst_factor;
> +} nir_lower_blend_channel;
> +
> +typedef struct {
> +   struct {
> +  nir_lower_blend_channel rgb;
> +  nir_lower_blend_channel alpha;
> +   } rt[8];
> +} nir_lower_blend_options;
> +
> +void nir_lower_blend(nir_shader *shader, nir_lower_blend_options options);
> +
>  bool nir_lower_gs_intrinsics(nir_shader *shader);
>  
>  typedef unsigned (*nir_lower_bit_size_callback)(const nir_alu_instr *, void 
> *);
> diff --git a/src/compiler/nir/nir_lower_blend.c 
> b/src/compiler/nir/nir_lower_blend.c
> new file mode 100644
> index 000..5a874f08834
> --- /dev/null
> +++ b/src/compiler/nir/nir_lower_blend.c
> @@ -0,0 +1,214 @@
> +/*
> + * Copyright (C) 2019 Alyssa Rosenzweig
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the foll

Re: [Mesa-dev] Mesa (master): nir/flrp: Lower flrp(±1, b, c) and flrp(a, ±1, c) differently

2019-05-07 Thread Ian Romanick
On 5/7/19 8:20 AM, Samuel Pitoiset wrote:
> This introduces glitches with Talos and Serious Sam 2017 with RADV...
> 
> Are you able to reproduce the problem with ANV?

Probably not very easily.  If you can figure out which shader it is, it
should be easy to figure out the problem from before / after NIR.
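
For anyone comparing before / after NIR by hand, the identity behind the (b*c ± c) + a form in the commit quoted below can be checked numerically. This is a minimal Python sketch, not driver code; the function names are hypothetical, and the mapping of a = +1 to "subtract c" comes from expanding a*(1 - c) + b*c, not from the patch text.

```python
def flrp(a, b, c):
    """Reference linear interpolation: a*(1 - c) + b*c."""
    return a * (1.0 - c) + b * c


def flrp_expanded(a, b, c):
    """Expanded form for a == +1 or a == -1: (b*c -/+ c) + a.

    a == +1:  1*(1 - c) + b*c == (b*c - c) + 1
    a == -1: -1*(1 - c) + b*c == (b*c + c) - 1
    """
    if a not in (1.0, -1.0):
        raise ValueError("expansion only valid for a == +1 or a == -1")
    b_times_c = b * c
    inner_sum = b_times_c - c if a == 1.0 else b_times_c + c
    return inner_sum + a
```

If a shader's output differs between the two functions for the same inputs, the expansion (or the choice of sign) is the place to look.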

> On 5/7/19 8:01 AM, GitLab Mirror wrote:
>> Module: Mesa
>> Branch: master
>> Commit: 5b908db604b2f47bb8382047533e556db8d5f52b
>> URL:   
>> http://cgit.freedesktop.org/mesa/mesa/commit/?id=5b908db604b2f47bb8382047533e556db8d5f52b
>>
>>
>> Author: Ian Romanick 
>> Date:   Tue Aug 21 17:17:24 2018 -0700
>>
>> nir/flrp: Lower flrp(±1, b, c) and flrp(a, ±1, c) differently
>>
>> No changes on any other Intel platforms.
>>
>> v2: Rebase on 424372e5dd5 ("nir: Use the flrp lowering pass instead of
>> nir_opt_algebraic")
>>
>> Iron Lake and GM45 had similar results. (Iron Lake shown)
>> total instructions in shared programs: 8189888 -> 8153912 (-0.44%)
>> instructions in affected programs: 1199037 -> 1163061 (-3.00%)
>> helped: 4124
>> HURT: 10
>> helped stats (abs) min: 1 max: 40 x̄: 8.73 x̃: 9
>> helped stats (rel) min: 0.20% max: 86.96% x̄: 4.96% x̃: 3.02%
>> HURT stats (abs)   min: 1 max: 2 x̄: 1.20 x̃: 1
>> HURT stats (rel)   min: 1.06% max: 3.92% x̄: 1.62% x̃: 1.06%
>> 95% mean confidence interval for instructions value: -8.84 -8.56
>> 95% mean confidence interval for instructions %-change: -5.12% -4.77%
>> Instructions are helped.
>>
>> total cycles in shared programs: 188606710 -> 188426964 (-0.10%)
>> cycles in affected programs: 27505596 -> 27325850 (-0.65%)
>> helped: 4026
>> HURT: 77
>> helped stats (abs) min: 2 max: 646 x̄: 44.99 x̃: 46
>> helped stats (rel) min: <.01% max: 94.58% x̄: 2.35% x̃: 0.85%
>> HURT stats (abs)   min: 2 max: 376 x̄: 17.79 x̃: 6
>> HURT stats (rel)   min: <.01% max: 2.60% x̄: 0.22% x̃: 0.04%
>> 95% mean confidence interval for cycles value: -44.75 -42.87
>> 95% mean confidence interval for cycles %-change: -2.44% -2.17%
>> Cycles are helped.
>>
>> LOST:   3
>> GAINED: 35
>>
>> Reviewed-by: Matt Turner 
>>
>> ---
>>
>>   src/compiler/nir/nir_lower_flrp.c | 134
>> ++
>>   1 file changed, 134 insertions(+)
>>
>> diff --git a/src/compiler/nir/nir_lower_flrp.c
>> b/src/compiler/nir/nir_lower_flrp.c
>> index 952068ec9cc..c041fefc52b 100644
>> --- a/src/compiler/nir/nir_lower_flrp.c
>> +++ b/src/compiler/nir/nir_lower_flrp.c
>> @@ -137,6 +137,89 @@ replace_with_fast(struct nir_builder *bld, struct
>> u_vector *dead_flrp,
>>  append_flrp_to_dead_list(dead_flrp, alu);
>>   }
>>   +/**
>> + * Replace flrp(a, b, c) with (b*c ± c) + a
>> + */
>> +static void
>> +replace_with_expanded_ffma_and_add(struct nir_builder *bld,
>> +   struct u_vector *dead_flrp,
>> +   struct nir_alu_instr *alu, bool
>> subtract_c)
>> +{
>> +   nir_ssa_def *const a = nir_ssa_for_alu_src(bld, alu, 0);
>> +   nir_ssa_def *const b = nir_ssa_for_alu_src(bld, alu, 1);
>> +   nir_ssa_def *const c = nir_ssa_for_alu_src(bld, alu, 2);
>> +
>> +   nir_ssa_def *const b_times_c = nir_fmul(bld, b, c);
>> +   nir_instr_as_alu(b_times_c->parent_instr)->exact = alu->exact;
>> +
>> +   nir_ssa_def *inner_sum;
>> +
>> +   if (subtract_c) {
>> +  nir_ssa_def *const neg_c = nir_fneg(bld, c);
>> +  nir_instr_as_alu(neg_c->parent_instr)->exact = alu->exact;
>> +
>> +  inner_sum = nir_fadd(bld, b_times_c, neg_c);
>> +   } else {
>> +  inner_sum = nir_fadd(bld, b_times_c, c);
>> +   }
>> +
>> +   nir_instr_as_alu(inner_sum->parent_instr)->exact = alu->exact;
>> +
>> +   nir_ssa_def *const outer_sum = nir_fadd(bld, inner_sum, a);
>> +   nir_instr_as_alu(outer_sum->parent_instr)->exact = alu->exact;
>> +
>> +   nir_ssa_def_rewrite_uses(&alu->dest.dest.ssa,
>> nir_src_for_ssa(outer_sum));
>> +
>> +   /* DO NOT REMOVE the original flrp yet.  Many of the lowering
>> choices are
>> +    * based on other uses of the sources.  Removing the flrp may
>> cause the
>> +    * last flrp in a sequence to make a different, incorrect choice.
>> +    */
>> +   append_flrp_to_dead_list(dead_flrp, alu);
>> +}
>> +
>> +/**
>> + * Determines whether a swizzled source is constant w/ all components
>> t

Re: [Mesa-dev] [PATCH 1/2] nir: Add inverted bitwise ops

2019-04-25 Thread Ian Romanick
On 4/25/19 3:37 PM, Alyssa Rosenzweig wrote:
> In addition to the familiar iand/ior/ixor, some architectures feature
> destination-inverted versions inand/inor/inxor. Certain
> architectures also have source-inverted forms, dubbed iandnot/iornot
> here. Midgard has all of these opcodes natively. Many arches have
> comparable features to implement some/all of the above. Paired with De
> Morgan's Laws, these opcodes allow anything of the form
> "~? (~?a [&|] ~?b)" to complete in one instruction.
> 
> This can be used to simplify some backend-specific code on affected
> architectures, e.g. 8eb36c91 ("intel/fs: Emit logical-not of operands on
> Gen8+").
> 
> Signed-off-by: Alyssa Rosenzweig 
> Cc: Ian Romanick 
> Cc: Kenneth Graunke 
> ---
>  src/compiler/nir/nir.h|  4 
>  src/compiler/nir/nir_opcodes.py   | 18 ++
>  src/compiler/nir/nir_opt_algebraic.py | 12 
>  3 files changed, 34 insertions(+)
> 
> diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
> index e878a63409d..3e01ec2cc06 100644
> --- a/src/compiler/nir/nir.h
> +++ b/src/compiler/nir/nir.h
> @@ -2318,6 +2318,10 @@ typedef struct nir_shader_compiler_options {
> bool lower_hadd;
> bool lower_add_sat;
>  
> +   /* Set if inand/inor/inxor and iandnot/iornot supported respectively */
> +   bool bitwise_dest_invertable;
> +   bool bitwise_src_invertable;
> +
> /**
>  * Should nir_lower_io() create load_interpolated_input intrinsics?
>  *
> diff --git a/src/compiler/nir/nir_opcodes.py b/src/compiler/nir/nir_opcodes.py
> index d35d820aa5b..f9d92afb53e 100644
> --- a/src/compiler/nir/nir_opcodes.py
> +++ b/src/compiler/nir/nir_opcodes.py
> @@ -690,6 +690,24 @@ binop("iand", tuint, commutative + associative, "src0 & 
> src1")
>  binop("ior", tuint, commutative + associative, "src0 | src1")
>  binop("ixor", tuint, commutative + associative, "src0 ^ src1")
>  
> +# inverted bitwise logic operators
> +#
> +# These variants of the above include bitwise NOTs either on the result of 
> the
> +# whole expression or on the latter operand. On some hardware (e.g. Midgard),
> +# these are native ops. On other hardware (e.g. Intel Gen8+), these can be
> +# implemented as modifiers of the standard three. Along with appropriate
> +# algebraic passes, these should permit any permutation of inverses on AND/OR
> +# to execute in a single cycle. For example, ~(a & ~b) = ~(~(~a | ~(~b))) = 
> ~a
> +# | b = b | ~a = iornot(b, a).
> +
> +binop("inand", tuint, commutative, "~(src0 & src1)")
> +binop("inor", tuint, commutative, "~(src0 | src1)")
> +binop("inxor", tuint, commutative, "~(src0 ^ src1)")
> +binop("iandnot", tuint, "", "src0 & (~src1)")
> +binop("iornot", tuint, "", "src0 | (~src1)")
> +
> +
> +
>  
>  # floating point logic operators
>  #
> diff --git a/src/compiler/nir/nir_opt_algebraic.py 
> b/src/compiler/nir/nir_opt_algebraic.py
> index dad0545594f..6cb3e8cb950 100644
> --- a/src/compiler/nir/nir_opt_algebraic.py
> +++ b/src/compiler/nir/nir_opt_algebraic.py
> @@ -1052,6 +1052,18 @@ late_optimizations = [
> (('fmax', ('fadd(is_used_once)', '#c', a), ('fadd(is_used_once)', '#c', 
> b)), ('fadd', c, ('fmax', a, b))),
>  
> (('bcsel', a, 0, ('b2f32', ('inot', 'b@bool'))), ('b2f32', ('inot', 
> ('ior', a, b)))),
> +
> +   # We don't want to deal with inverted forms, so run this late. Any
> +   # combination of inverts on flags or output should result in a single
> +   # instruction if these are supported; cases not explicitly handled would
> +   # have been simplified via De Morgan's Law
> +   (('inot', ('iand', a, b)), ('inand', a, b), 
> 'options->bitwise_dest_invertable'),
> +   (('inot', ('ior', a, b)), ('inor', a, b), 
> 'options->bitwise_dest_invertable'),
> +   (('inot', ('ixor', a, b)), ('inxor', a, b), 
> 'options->bitwise_dest_invertable'),
> +   (('iand', ('inot', a), b), ('iandnot', b, a), 
> 'options->bitwise_src_invertable'),
> +   (('iand', a, ('inot', b)), ('iandnot', a, b), 
> 'options->bitwise_src_invertable'),

iand and ior are commutative, so you don't need both.
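
As an illustration of why one rule is enough, here is a toy Python matcher that tries both operand orders for commutative opcodes. It is a sketch of the idea only, not the real nir_algebraic code: the tuple encoding copies the patterns above, while match, COMMUTATIVE, rewrite_iandnot, and the 32-bit helpers are hypothetical names. The last helper checks the identity from the opcode comment, ~(a & ~b) = iornot(b, a), assuming iornot means src0 | ~src1.

```python
# Expressions are nested tuples such as ('iand', ('inot', 'x'), 'y').
# Strings inside a pattern are variables; strings inside an expression are leaves.
COMMUTATIVE = {'iand', 'ior', 'ixor'}
MASK = 0xFFFFFFFF  # model 32-bit unsigned values


def match(pattern, expr, env):
    """Return variable bindings if expr matches pattern, else None."""
    if isinstance(pattern, str):
        # Pattern variable: bind it, or check it against an earlier binding.
        if pattern in env:
            return env if env[pattern] == expr else None
        return {**env, pattern: expr}
    if (not isinstance(expr, tuple) or len(expr) != len(pattern)
            or expr[0] != pattern[0]):
        return None
    orders = [expr[1:]]
    if pattern[0] in COMMUTATIVE and len(expr) == 3:
        orders.append((expr[2], expr[1]))  # also try the swapped operands
    for operands in orders:
        bound = env
        for sub_pattern, sub_expr in zip(pattern[1:], operands):
            bound = match(sub_pattern, sub_expr, bound)
            if bound is None:
                break
        else:
            return bound
    return None


def rewrite_iandnot(expr):
    """Apply the single rule iand(a, inot(b)) -> iandnot(a, b)."""
    env = match(('iand', 'a', ('inot', 'b')), expr, {})
    return ('iandnot', env['a'], env['b']) if env is not None else expr


def iornot(src0, src1):
    """Assumed semantics: src0 | ~src1, truncated to 32 bits."""
    return (src0 | ~src1) & MASK


def not_and_not(a, b):
    """~(a & ~b), truncated to 32 bits."""
    return ~(a & (~b & MASK)) & MASK
```

Both ('iand', 'x', ('inot', 'y')) and ('iand', ('inot', 'y'), 'x') rewrite to the same ('iandnot', 'x', 'y') with only the one rule, which is the behavior the second pattern was trying to add by hand.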

> +   (('ior

[Mesa-dev] [PATCH] glsl: Silence many unused parameter warnings in glsl/ir.h

2019-04-18 Thread Ian Romanick
From: Ian Romanick 

Every file that included glsl/ir.h had a warning like:

src/compiler/glsl/ir.h: In member function ‘virtual bool 
ir_rvalue::is_lvalue(const _mesa_glsl_parse_state*) const’:
src/compiler/glsl/ir.h:236:64: warning: unused parameter ‘state’ 
[-Wunused-parameter]
virtual bool is_lvalue(const struct _mesa_glsl_parse_state *state = NULL) 
const
^
Cc: Samuel Pitoiset 
Fixes: fa4ebf6b8d9 ("glsl: add _mesa_glsl_parse_state object to is_lvalue()")
---
 src/compiler/glsl/ir.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/compiler/glsl/ir.h b/src/compiler/glsl/ir.h
index 4dc2a5f6bc7..6f7b4542b1d 100644
--- a/src/compiler/glsl/ir.h
+++ b/src/compiler/glsl/ir.h
@@ -233,7 +233,7 @@ public:
 
ir_rvalue *as_rvalue_to_saturate();
 
-   virtual bool is_lvalue(const struct _mesa_glsl_parse_state *state = NULL) 
const
+   virtual bool is_lvalue(const struct _mesa_glsl_parse_state * = NULL) const
{
   return false;
}
-- 
2.17.2


Re: [Mesa-dev] [PATCH v3 1/2] nir: Add nir_lower_viewport_transform

2019-04-12 Thread Ian Romanick
This patch is

Reviewed-by: Ian Romanick 

On 4/12/19 4:46 PM, Alyssa Rosenzweig wrote:
> On Mali hardware (supported by Panfrost and Lima), the fixed-function
> transformation from world-space to screen-space coordinates is done in
> the vertex shader prior to writing out the gl_Position varying, rather
> than in dedicated hardware. This commit adds a shared NIR pass for
> implementing coordinate transformation and lowering gl_Position writes
> into screen-space gl_Position writes.
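
For readers who want the arithmetic spelled out, a minimal Python sketch of the transformation described above follows. It uses the standard viewport equations (perspective divide, then scale and offset per axis); the function names are hypothetical, and how the real pass obtains its 3-component scale and offset, or what it stores in the w component, is not shown here.

```python
def viewport_scale_offset(x, y, width, height, near=0.0, far=1.0):
    """Derive 3-component scale and offset from a viewport and depth range."""
    scale = (width / 2.0, height / 2.0, (far - near) / 2.0)
    offset = (x + width / 2.0, y + height / 2.0, (near + far) / 2.0)
    return scale, offset


def lower_position(clip, scale, offset):
    """Map a clip-space (x, y, z, w) position to screen-space (x, y, z)."""
    cx, cy, cz, cw = clip
    # Perspective divide gives normalized device coordinates in [-1, 1].
    ndc = (cx / cw, cy / cw, cz / cw)
    # The viewport transform is a per-component multiply-add.
    return tuple(n * s + o for n, s, o in zip(ndc, scale, offset))
```

For an 800x600 viewport at the origin, the clip-space origin lands at the center of the screen and the (+1, -1) NDC corner lands at (800, 0).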
> 
> v2: Run directly on derefs before io/vars are lowered to cleanup the
> code substantially. Thank you to Qiang for this suggestion!
> 
> v3: Bikeshed continues.
> 
> Signed-off-by: Alyssa Rosenzweig 
> Suggested-by: Qiang Yu 
> Cc: Jason Ekstrand 
> Cc: Eric Anholt 
> ---
>  src/compiler/nir/meson.build  |   1 +
>  src/compiler/nir/nir.h|   1 +
>  .../nir/nir_lower_viewport_transform.c| 101 ++
>  3 files changed, 103 insertions(+)
>  create mode 100644 src/compiler/nir/nir_lower_viewport_transform.c
> 
> diff --git a/src/compiler/nir/meson.build b/src/compiler/nir/meson.build
> index c65f2ff62ff..c274361bdc4 100644
> --- a/src/compiler/nir/meson.build
> +++ b/src/compiler/nir/meson.build
> @@ -151,6 +151,7 @@ files_libnir = files(
>'nir_lower_vars_to_ssa.c',
>'nir_lower_var_copies.c',
>'nir_lower_vec_to_movs.c',
> +  'nir_lower_viewport_transform.c',
>'nir_lower_wpos_center.c',
>'nir_lower_wpos_ytransform.c',
>'nir_lower_bit_size.c',
> diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
> index bc72d8f83f5..0f6ed734efa 100644
> --- a/src/compiler/nir/nir.h
> +++ b/src/compiler/nir/nir.h
> @@ -3124,6 +3124,7 @@ void nir_lower_io_to_scalar(nir_shader *shader, 
> nir_variable_mode mask);
>  void nir_lower_io_to_scalar_early(nir_shader *shader, nir_variable_mode 
> mask);
>  bool nir_lower_io_to_vector(nir_shader *shader, nir_variable_mode mask);
>  
> +void nir_lower_viewport_transform(nir_shader *shader);
>  bool nir_lower_uniforms_to_ubo(nir_shader *shader, int multiplier);
>  
>  typedef struct nir_lower_subgroups_options {
> diff --git a/src/compiler/nir/nir_lower_viewport_transform.c 
> b/src/compiler/nir/nir_lower_viewport_transform.c
> new file mode 100644
> index 000..66085b8da5a
> --- /dev/null
> +++ b/src/compiler/nir/nir_lower_viewport_transform.c
> @@ -0,0 +1,101 @@
> +/*
> + * Copyright (C) 2019 Alyssa Rosenzweig
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER 
> DEALINGS
> + * IN THE SOFTWARE.
> + */
> +
> +/* On some hardware (particularly, all current versions of Mali GPUs),
> + * vertex shaders do not output gl_Position in world-space. Instead, they
> + * output gl_Position in transformed screen space via the "pseudo"
> + * position varying. Thus, this pass finds writes to gl_Position and
> + * changes them to transformed writes, still to gl_Position. The
> + * outputted screen space is still written back to VARYING_SLOT_POS,
> + * which is semantically ambiguous but nevertheless a good match for
> + * Gallium/NIR/Mali.
> + *
> + * Implements coordinate transformation as defined in section 12.5
> + * "Coordinate Transformation" of the OpenGL ES 3.2 full specification.
> + *
> + * This pass must run before lower_vars/lower_io such that derefs are
> + * still in place.
> + */
> +
> +#include "nir/nir.h"
> +#include "nir/nir_builder.h"
> +
> +void
> +nir_lower_viewport_transform(nir_shader *shader)
>

Re: [Mesa-dev] [PATCH v2 1/2] nir: Add nir_lower_viewport_transform

2019-04-12 Thread Ian Romanick
On 4/12/19 5:11 PM, Ian Romanick wrote:
> On 4/8/19 5:34 AM, Thomas Helland wrote:
>> On Mon, Apr 8, 2019 at 06:30, Alyssa Rosenzweig wrote:
>>>
>>> On Mali hardware (supported by Panfrost and Lima), the fixed-function
>>> transformation from world-space to screen-space coordinates is done in
>>> the vertex shader prior to writing out the gl_Position varying, rather
>>> than in dedicated hardware. This commit adds a shared NIR pass for
>>> implementing coordinate transformation and lowering gl_Position writes
>>> into screen-space gl_Position writes.
>>>
>>> v2: Run directly on derefs before io/vars are lowered to cleanup the
>>> code substantially. Thank you to Qiang for this suggestion!
>>>
>>> Signed-off-by: Alyssa Rosenzweig 
>>> Suggested-by: Qiang Yu 
>>> Cc: Jason Ekstrand 
>>> Cc: Eric Anholt 
>>> ---
>>>  src/compiler/nir/meson.build  |  1 +
>>>  src/compiler/nir/nir.h|  1 +
>>>  .../nir/nir_lower_viewport_transform.c| 98 +++
>>>  3 files changed, 100 insertions(+)
>>>  create mode 100644 src/compiler/nir/nir_lower_viewport_transform.c
>>>
>>> diff --git a/src/compiler/nir/meson.build b/src/compiler/nir/meson.build
>>> index c65f2ff62ff..c274361bdc4 100644
>>> --- a/src/compiler/nir/meson.build
>>> +++ b/src/compiler/nir/meson.build
>>> @@ -151,6 +151,7 @@ files_libnir = files(
>>>'nir_lower_vars_to_ssa.c',
>>>'nir_lower_var_copies.c',
>>>'nir_lower_vec_to_movs.c',
>>> +  'nir_lower_viewport_transform.c',
>>>'nir_lower_wpos_center.c',
>>>'nir_lower_wpos_ytransform.c',
>>>'nir_lower_bit_size.c',
>>> diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
>>> index bc72d8f83f5..0f6ed734efa 100644
>>> --- a/src/compiler/nir/nir.h
>>> +++ b/src/compiler/nir/nir.h
>>> @@ -3124,6 +3124,7 @@ void nir_lower_io_to_scalar(nir_shader *shader, 
>>> nir_variable_mode mask);
>>>  void nir_lower_io_to_scalar_early(nir_shader *shader, nir_variable_mode 
>>> mask);
>>>  bool nir_lower_io_to_vector(nir_shader *shader, nir_variable_mode mask);
>>>
>>> +void nir_lower_viewport_transform(nir_shader *shader);
>>>  bool nir_lower_uniforms_to_ubo(nir_shader *shader, int multiplier);
>>>
>>>  typedef struct nir_lower_subgroups_options {
>>> diff --git a/src/compiler/nir/nir_lower_viewport_transform.c 
>>> b/src/compiler/nir/nir_lower_viewport_transform.c
>>> new file mode 100644
>>> index 000..9646b72c053
>>> --- /dev/null
>>> +++ b/src/compiler/nir/nir_lower_viewport_transform.c
>>> @@ -0,0 +1,98 @@
>>> +/*
>>> + * Copyright (C) 2019 Alyssa Rosenzweig
>>> + *
>>> + * Permission is hereby granted, free of charge, to any person obtaining a
>>> + * copy of this software and associated documentation files (the 
>>> "Software"),
>>> + * to deal in the Software without restriction, including without 
>>> limitation
>>> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
>>> + * and/or sell copies of the Software, and to permit persons to whom the
>>> + * Software is furnished to do so, subject to the following conditions:
>>> + *
>>> + * The above copyright notice and this permission notice (including the 
>>> next
>>> + * paragraph) shall be included in all copies or substantial portions of 
>>> the
>>> + * Software.
>>> + *
>>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS 
>>> OR
>>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
>>> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR 
>>> OTHER
>>> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
>>> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER 
>>> DEALINGS
>>> + * IN THE SOFTWARE.
>>> + */
>>> +
>>> +/* On some hardware (particularly, all current versions of Mali GPUs),
>>> + * vertex shaders do not output gl_Position in world-space. Instead, they
>>> + * output gl_Position in transformed screen space via the "pseudo"
>>> + * position varying.

Re: [Mesa-dev] [PATCH v2 1/2] nir: Add nir_lower_viewport_transform

2019-04-12 Thread Ian Romanick
On 4/8/19 5:34 AM, Thomas Helland wrote:
> man. 8. apr. 2019 kl. 06:30 skrev Alyssa Rosenzweig :
>>
>> On Mali hardware (supported by Panfrost and Lima), the fixed-function
>> transformation from world-space to screen-space coordinates is done in
>> the vertex shader prior to writing out the gl_Position varying, rather
>> than in dedicated hardware. This commit adds a shared NIR pass for
>> implementing coordinate transformation and lowering gl_Position writes
>> into screen-space gl_Position writes.
>>
>> v2: Run directly on derefs before io/vars are lowered to cleanup the
>> code substantially. Thank you to Qiang for this suggestion!
>>
>> Signed-off-by: Alyssa Rosenzweig 
>> Suggested-by: Qiang Yu 
>> Cc: Jason Ekstrand 
>> Cc: Eric Anholt 
>> ---
>>  src/compiler/nir/meson.build  |  1 +
>>  src/compiler/nir/nir.h|  1 +
>>  .../nir/nir_lower_viewport_transform.c| 98 +++
>>  3 files changed, 100 insertions(+)
>>  create mode 100644 src/compiler/nir/nir_lower_viewport_transform.c
>>
>> diff --git a/src/compiler/nir/meson.build b/src/compiler/nir/meson.build
>> index c65f2ff62ff..c274361bdc4 100644
>> --- a/src/compiler/nir/meson.build
>> +++ b/src/compiler/nir/meson.build
>> @@ -151,6 +151,7 @@ files_libnir = files(
>>'nir_lower_vars_to_ssa.c',
>>'nir_lower_var_copies.c',
>>'nir_lower_vec_to_movs.c',
>> +  'nir_lower_viewport_transform.c',
>>'nir_lower_wpos_center.c',
>>'nir_lower_wpos_ytransform.c',
>>'nir_lower_bit_size.c',
>> diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
>> index bc72d8f83f5..0f6ed734efa 100644
>> --- a/src/compiler/nir/nir.h
>> +++ b/src/compiler/nir/nir.h
>> @@ -3124,6 +3124,7 @@ void nir_lower_io_to_scalar(nir_shader *shader, 
>> nir_variable_mode mask);
>>  void nir_lower_io_to_scalar_early(nir_shader *shader, nir_variable_mode 
>> mask);
>>  bool nir_lower_io_to_vector(nir_shader *shader, nir_variable_mode mask);
>>
>> +void nir_lower_viewport_transform(nir_shader *shader);
>>  bool nir_lower_uniforms_to_ubo(nir_shader *shader, int multiplier);
>>
>>  typedef struct nir_lower_subgroups_options {
>> diff --git a/src/compiler/nir/nir_lower_viewport_transform.c 
>> b/src/compiler/nir/nir_lower_viewport_transform.c
>> new file mode 100644
>> index 000..9646b72c053
>> --- /dev/null
>> +++ b/src/compiler/nir/nir_lower_viewport_transform.c
>> @@ -0,0 +1,98 @@
>> +/*
>> + * Copyright (C) 2019 Alyssa Rosenzweig
>> + *
>> + * Permission is hereby granted, free of charge, to any person obtaining a
>> + * copy of this software and associated documentation files (the 
>> "Software"),
>> + * to deal in the Software without restriction, including without limitation
>> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
>> + * and/or sell copies of the Software, and to permit persons to whom the
>> + * Software is furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice (including the next
>> + * paragraph) shall be included in all copies or substantial portions of the
>> + * Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS 
>> OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
>> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR 
>> OTHER
>> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
>> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER 
>> DEALINGS
>> + * IN THE SOFTWARE.
>> + */
>> +
>> +/* On some hardware (particularly, all current versions of Mali GPUs),
>> + * vertex shaders do not output gl_Position in world-space. Instead, they
>> + * output gl_Position in transformed screen space via the "pseudo"
>> + * position varying. Thus, this pass finds writes to gl_Position and
>> + * changes them to transformed writes, still to gl_Position. The
>> + * outputted screen space is still written back to VARYING_SLOT_POS,
>> + * which is semantically ambiguous but nevertheless a good match for
>> + * Gallium/NIR/Mali.
>> + *
>> + * Implements coordinate transformation as defined in section 12.5
>> + * "Coordinate Transformation" of the OpenGL ES 3.2 full specification.
>> + *
>> + * This pass must run before lower_vars/lower_io such that derefs are
>> + * still in place.
>> + */
>> +
>> +#include "nir/nir.h"
>> +#include "nir/nir_builder.h"
>> +
>> +void
>> +nir_lower_viewport_transform(nir_shader *shader)
>> +{
>> +   assert(shader->info.stage == MESA_SHADER_VERTEX);
>> +
>> +   nir_foreach_function(func, shader) {
>> +  nir_foreach_block(block, func->impl) {
>> + nir_foreach_instr_safe(instr, block) {
>> +if (instr->type != nir_instr_type_intrinsic) continue;
>> +
>> +nir_intrinsic_instr *intr = nir_instr_as_intrinsic(instr);
>> +  

Re: [Mesa-dev] new dispatch generator broke with Marek's parallel compile commit

2019-04-04 Thread Ian Romanick
On 4/3/19 10:20 AM, Emil Velikov wrote:
> On Tue, 2 Apr 2019 at 20:00, Ian Romanick  wrote:
>>
>> On 4/2/19 4:43 AM, Emil Velikov wrote:
>>> On Tue, 2 Apr 2019 at 04:55, Dave Airlie  wrote:
>>>>
>>>> On Tue, 2 Apr 2019 at 11:24, Dave Airlie  wrote:
>>>>>
>>>>> Marek's commit to add ARB_parallel_shader_compile broke some es1 tests
>>>>> in the Intel CI.
>>>>>
>>>>> It appears that whatever generates the es1api isn't consistent, for
>>>>> example glTranslatex on my local system is 1405 in es1api but is 1406
>>>>> in the gl api.
>>>>>
>>>>> I'm no expert on this area, Emil any ideas?
>>>>
>>>> This seems to be due to the new registry xml parser, I'm not sure how
>>>> broken it is, but it seems like it's a bit busted, and nobody tested
>>>> the scenario where a new function gets introduced in the middle.
>>>>
>>>> It looks like static_data.py has a limit on the offsets it cares
>>>> about, I thought adding static offsets for these functions would help
>>>> here, but it appears currently it all just works by luck, that the
>>>> static offsets work out to be the same as ones generated by gl_XML.py
>>>> for values above MAX_OFFSETS.
>>>>
>>>> I've got a hacky patch that makes it work here, that increases
>>>> MAX_OFFSETS to 1420, adds a new entry to the end for the new APIs, but
>>>> really I think the current code is broken, and is happening to work
>>>> out, but I'm hoping I'm just missing something obvious and it'll be a
>>>> one line fix for Emil.
>>>>
>>> As you have noticed the old generator would add entries to the glapi
>>> table in arbitrary order.
>>> Meaning that the ABI between dri/glapi/libGL* would break every now and 
>>> then.
>>>
>>> In more detail - libGL* would expect glFooBar at offset X, while the
>>> function is at Y according to glapi and the dri module sets the
>>> dispatch at Y. Latter uses a combination of fixed offset and dynamic
>>> offset lookup.
>>
>> This doesn't make sense to me.  Can you explain more?  There are only
>> two parts.  There's the loader, and there's the driver.  The loader
>> assigns locations either at compile-time or at run-time.  The driver
>> queries the locations of functions that it will provide.  All functions
>> that the loader knows about should realistically have a dispatch
>> location set at compile time.  The loader should only generate new
>> locations at run-time if the application or driver asks for a function
>> that it does not know.
>>
>> For things that don't have a static dispatch location set in the XML,
>> the loader is free to assign any location it pleases, but that location
>> must be consistent across all APIs because a single application can
>> create contexts from every possible API in the same address space, but
>> glXGetProcAddress or eglGetProcAddress are API agnostic.
>>
>> If there is any possibility for a single function to have different
>> dispatch locations in different APIs, then the single most fundamental
>> invariant of the entire dispatch table system is being violated.  That
>> is a pretty catastrophic failure.
>>
> Note: I'm not 100% sure if the following is an issue or design decision.
> I'm inclined towards the former, although I could be wrong.
> 
> Loader references known functions entries by glapi_table offset,
> they're resolved at build-time.
> At some point a developer:
>  - changes the include order in our XMLs
>  - adds a new XML/entrypoints (say glFooBar) in a "special" order
>  - flips the aliasing order of glFoo and glFooARB
> 
> Since our old parser validates the entrypoints in the order they're
> presented**  - glFooBar gets an offset of X and shifts (some of) the
> existing entrypoints by 1.
> Say glHamSandwich - moves from X to X+1
> 
> The driver populates the entrypoints by offset, hence they're resolved
> at build-time.

There are a small number of functions that have locations explicitly set
in our XML.  These are the ones with the static_offset tag.  All of the
other functions must be queried by the driver.  The driver has its own
separate table that it uses to manage the mapping of names to dispatch
offsets.  This is the "remap table."  The remap table has fixed offsets
for each function, and each location in the table stores that location
in the dispatch table of

Re: [Mesa-dev] new dispatch generator broke with Marek's parallel compile commit

2019-04-02 Thread Ian Romanick
On 4/2/19 4:43 AM, Emil Velikov wrote:
> On Tue, 2 Apr 2019 at 04:55, Dave Airlie  wrote:
>>
>> On Tue, 2 Apr 2019 at 11:24, Dave Airlie  wrote:
>>>
>>> Marek's commit to add ARB_parallel_shader_compile broke some es1 tests
>>> in the Intel CI.
>>>
>>> It appears that whatever generates the es1api isn't consistent, for
>>> example glTranslatex on my local system is 1405 in es1api but is 1406
>>> in the gl api.
>>>
>>> I'm no expert on this area, Emil any ideas?
>>
>> This seems to be due to the new registry xml parser, I'm not sure how
>> broken it is, but it seems like it's a bit busted, and nobody tested
>> the scenario where a new function gets introduced in the middle.
>>
>> It looks like static_data.py has a limit on the offsets it cares
>> about, I thought adding static offsets for these functions would help
>> here, but it appears currently it all just works by luck, that the
>> static offsets work out to be the same as ones generated by gl_XML.py
>> for values above MAX_OFFSETS.
>>
>> I've got a hacky patch that makes it work here, that increases
>> MAX_OFFSETS to 1420, adds a new entry to the end for the new APIs, but
>> really I think the current code is broken, and is happening to work
>> out, but I'm hoping I'm just missing something obvious and it'll be a
>> one line fix for Emil.
>>
> As you have noticed the old generator would add entries to the glapi
> table in arbitrary order.
> Meaning that the ABI between dri/glapi/libGL* would break every now and then.
> 
> In more detail - libGL* would expect glFooBar at offset X, while the
> function is at Y according to glapi and the dri module sets the
> dispatch at Y. Latter uses a combination of fixed offset and dynamic
> offset lookup.

This doesn't make sense to me.  Can you explain more?  There are only
two parts.  There's the loader, and there's the driver.  The loader
assigns locations either at compile-time or at run-time.  The driver
queries the locations of functions that it will provide.  All functions
that the loader knows about should realistically have a dispatch
location set at compile time.  The loader should only generate new
locations at run-time if the application or driver asks for a function
that it does not know.

For things that don't have a static dispatch location set in the XML,
the loader is free to assign any location it pleases, but that location
must be consistent across all APIs because a single application can
create contexts from every possible API in the same address space, but
glXGetProcAddress or eglGetProcAddress are API agnostic.

If there is any possibility for a single function to have different
dispatch locations in different APIs, then the single most fundamental
invariant of the entire dispatch table system is being violated.  That
is a pretty catastrophic failure.
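
The invariant can be sketched in C. This is only an illustration under stated assumptions, not Mesa's glapi code: `dispatch_offset_for`, `static_names`, `STATIC_COUNT`, and `MAX_SLOTS` are hypothetical names. The point is that a single process-wide name-to-offset map backs every API, so a name resolves to the same slot no matter which API's context asks for it.

```c
#include <string.h>

/* Hypothetical sketch, not Mesa's glapi implementation. */
#define MAX_SLOTS    16
#define STATIC_COUNT 2

/* Functions the loader knows at build time get fixed slots. */
static const char *const static_names[STATIC_COUNT] = { "glFoo", "glBar" };

/* Names first requested at run time.  There is exactly one such table
 * per process, shared by every API, so offsets cannot diverge. */
static const char *dynamic_names[MAX_SLOTS - STATIC_COUNT];
static int dynamic_count = 0;

/* Return the dispatch slot for `name`, or -1 if the table is full.
 * The caller must keep `name` alive (string literals are fine). */
int dispatch_offset_for(const char *name)
{
   for (int i = 0; i < STATIC_COUNT; i++)
      if (strcmp(static_names[i], name) == 0)
         return i;

   for (int i = 0; i < dynamic_count; i++)
      if (strcmp(dynamic_names[i], name) == 0)
         return STATIC_COUNT + i;

   if (dynamic_count == MAX_SLOTS - STATIC_COUNT)
      return -1;

   dynamic_names[dynamic_count] = name;
   return STATIC_COUNT + dynamic_count++;
}
```

If each API kept its own copy of the run-time table, the same name could land in different slots depending on request order, which is exactly the failure described above.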

> Currently ES* is ported to the new generator and I have some patches
> for libGL and glapi, but no DRI modules just yet.
> A reasonable short term fix is to update the old generator to honour
> the full static_data table.
> 
> I'll have a look at that and updating the libGL/libglapi patches.
> 
> Thanks
> Emil

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] util: no-op __builtin_types_compatible_p() for non-GCC compilers

2019-03-29 Thread Ian Romanick
On 3/29/19 1:54 PM, Brian Paul wrote:
> On 03/29/2019 12:58 PM, Ian Romanick wrote:
>> On 3/29/19 9:57 AM, Brian Paul wrote:
>>> __builtin_types_compatible_p() is GCC-specific and breaks the
>>> MSVC build.
>>>
>>> This intrinsic has been in u_vector_foreach() for a long time, but
>>> that macro has only recently been used in code
>>> (nir/nir_opt_comparison_pre.c) that's built with MSVC.
>>>
>>> Fixes: 2cf59861a ("nir: Add partial redundancy elimination for
>>> compares")
>>> ---
>>>   src/util/u_vector.h | 4 
>>>   1 file changed, 4 insertions(+)
>>>
>>> diff --git a/src/util/u_vector.h b/src/util/u_vector.h
>>> index cd8a95d..6807748 100644
>>> --- a/src/util/u_vector.h
>>> +++ b/src/util/u_vector.h
>>> @@ -80,6 +80,10 @@ u_vector_finish(struct u_vector *queue)
>>>  free(queue->data);
>>>   }
>>>   +#ifndef __GNUC__
>>> +#define __builtin_types_compatible_p(x) 1
>>> +#endif
>>> +
>>>   #define u_vector_foreach(elem,
>>> queue)  \
>>>  STATIC_ASSERT(__builtin_types_compatible_p(__typeof__(queue),
>>> struct u_vector *)); \
>>
>> The way this GCC builtin is used here, this should be fine.  However,
>> in case it's being used elsewhere, we should #undef it afterwards.
> 
> That doesn't seem to work.  When u_vector_foreach() is instantiated
> later, __builtin_types_compatible_p is undefined and we error out.

Ah, right.  u_vector_foreach is itself a macro, so the
__builtin_types_compatible_p macro isn't evaluated until u_vector_foreach
is evaluated.  By that point __builtin_types_compatible_p would be undefined.

This should be fine as-is, then.  Hopefully this won't result in a
weird, hard to debug compile failure later on.
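
A minimal C sketch of the expansion order, with hypothetical macro names (`HELPER`, `WRAPPER`) standing in for the real ones. A macro body is stored as tokens, and any macro named inside it is only looked up when the outer macro is used:

```c
/* Hypothetical names; this only illustrates when expansion happens. */
#define HELPER(x) 1
#define WRAPPER(x) HELPER(x)   /* HELPER is not expanded at this point */

#undef HELPER
#define HELPER(x) 2            /* the definition in effect at the use site */

int wrapper_value(void)
{
   return WRAPPER(0);          /* expands to HELPER(0), then to 2 */
}
```

Without the second `#define HELPER`, `WRAPPER(0)` would expand to a call to an undeclared `HELPER`, which matches the error seen when the #undef was tried.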

Reviewed-by: Ian Romanick 

> -Brian
> 
> 
>   I'd
>> hate to mask some other kind of bug that may be introduced later.
>>
>>>  for (uint32_t __u_vector_offset =
>>> (queue)->tail;    \
>>>

Re: [Mesa-dev] [PATCH] util: no-op __builtin_types_compatible_p() for non-GCC compilers

2019-03-29 Thread Ian Romanick
On 3/29/19 9:57 AM, Brian Paul wrote:
> __builtin_types_compatible_p() is GCC-specific and breaks the
> MSVC build.
> 
> This intrinsic has been in u_vector_foreach() for a long time, but
> that macro has only recently been used in code
> (nir/nir_opt_comparison_pre.c) that's built with MSVC.
> 
> Fixes: 2cf59861a ("nir: Add partial redundancy elimination for compares")
> ---
>  src/util/u_vector.h | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/src/util/u_vector.h b/src/util/u_vector.h
> index cd8a95d..6807748 100644
> --- a/src/util/u_vector.h
> +++ b/src/util/u_vector.h
> @@ -80,6 +80,10 @@ u_vector_finish(struct u_vector *queue)
> free(queue->data);
>  }
>  
> +#ifndef __GNUC__
> +#define __builtin_types_compatible_p(x) 1
> +#endif
> +
>  #define u_vector_foreach(elem, queue)  \
> STATIC_ASSERT(__builtin_types_compatible_p(__typeof__(queue), struct 
> u_vector *)); \

The way this GCC builtin is used here, this should be fine.  However,
in case it's being used elsewhere, we should #undef it afterwards.  I'd
hate to mask some other kind of bug that may be introduced later.

> for (uint32_t __u_vector_offset = (queue)->tail;  
>   \
> 


Re: [Mesa-dev] [PATCH 3/3] intel/compiler: implement more algebraic optimizations

2019-02-28 Thread Ian Romanick
On 2/28/19 4:47 AM, Iago Toral wrote:
> On Wed, 2019-02-27 at 17:04 -0800, Ian Romanick wrote:
>> On 2/27/19 4:45 AM, Iago Toral Quiroga wrote:
>>> Now that we propagate constants to the first source of 2src
>>> instructions we
>>> see more opportunities of constant folding in the backend.
>>>
>>> Shader-db results on KBL:
>>>
>>> total instructions in shared programs: 14965607 -> 14855983 (-
>>> 0.73%)
>>> instructions in affected programs: 3988102 -> 3878478 (-2.75%)
>>> helped: 14292
>>> HURT: 59
>>>
>>> total cycles in shared programs: 344324295 -> 340656008 (-1.07%)
>>> cycles in affected programs: 247527740 -> 243859453 (-1.48%)
>>> helped: 14056
>>> HURT: 3314
>>>
>>> total loops in shared programs: 4283 -> 4283 (0.00%)
>>> loops in affected programs: 0 -> 0
>>> helped: 0
>>> HURT: 0
>>>
>>> total spills in shared programs: 27812 -> 24350 (-12.45%)
>>> spills in affected programs: 24921 -> 21459 (-13.89%)
>>> helped: 345
>>> HURT: 19
>>>
>>> total fills in shared programs: 24173 -> 22032 (-8.86%)
>>> fills in affected programs: 21124 -> 18983 (-10.14%)
>>> helped: 355
>>> HURT: 25
>>
>> Ignore my previous questions about nir_opt_constant_folding after
>> nir_opt_algebraic_late.  I had done that because I added a bunch of
>> things to nir_opt_algebraic_late that created my constant folding
>> opportunities.
>>
>> This is the combined changes for this patch and the previous
>> patch.  For
>> this patch alone, I got:
>>
>> total instructions in shared programs: 15306213 -> 15221518 (-0.55%)
>> instructions in affected programs: 2911451 -> 2826756 (-2.91%)
>> helped: 13121
>> HURT: 44
>> helped stats (abs) min: 1 max: 51 x̄: 6.66 x̃: 6
>> helped stats (rel) min: <.01% max: 16.67% x̄: 4.27% x̃: 3.30%
>> HURT stats (abs)   min: 3 max: 453 x̄: 61.16 x̃: 5
>> HURT stats (rel)   min: 0.20% max: 151.00% x̄: 31.57% x̃: 19.23%
>> 95% mean confidence interval for instructions value: -6.61 -6.26
>> 95% mean confidence interval for instructions %-change: -4.23% -4.07%
>> Instructions are helped.
>>
>> total cycles in shared programs: 375419164 -> 372829148 (-0.69%)
>> cycles in affected programs: 146769299 -> 144179283 (-1.76%)
>> helped: 10992
>> HURT: 1833
>> helped stats (abs) min: 1 max: 56127 x̄: 250.29 x̃: 18
>> helped stats (rel) min: <.01% max: 40.52% x̄: 3.11% x̃: 2.58%
>> HURT stats (abs)   min: 1 max: 1718 x̄: 87.93 x̃: 42
>> HURT stats (rel)   min: <.01% max: 139.33% x̄: 7.74% x̃: 3.08%
>> 95% mean confidence interval for cycles value: -248.21 -155.69
>> 95% mean confidence interval for cycles %-change: -1.67% -1.44%
>> Cycles are helped.
>>
>> total spills in shared programs: 28828 -> 2 (0.21%)
>> spills in affected programs: 2037 -> 2097 (2.95%)
>> helped: 0
>> HURT: 24
>>
>> total fills in shared programs: 35542 -> 35639 (0.27%)
>> fills in affected programs: 3078 -> 3175 (3.15%)
>> helped: 2
>> HURT: 26
>>
>> I decided to look at some of the hurt shaders... it looks like some
>> of
>> the Unigine geometry shaders really took a beating (+150%
>> instructions).
>> Note the "max" in the "instructions in affected programs" above.
> 
> I am seeing quite different results on my KBL laptop:
> 
> total instructions in shared programs: 14945933 -> 14858158 (-0.59%)
> instructions in affected programs: 2842901 -> 2755126 (-3.09%)
> helped: 13196
> HURT: 5
> 
> instructions HURT:   shaders/closed/steam/deus-ex-mankind-
> divided/274.shader_test CS SIMD8: 1535 -> 1538 (0.20%)
> instructions HURT:   shaders/closed/steam/deus-ex-mankind-
> divided/184.shader_test CS SIMD8: 1535 -> 1538 (0.20%)
> instructions HURT:   shaders/dolphin/ubershaders/147.shader_test FS
> SIMD8: 3481 -> 3491 (0.29%)
> instructions HURT:   shaders/dolphin/ubershaders/156.shader_test FS
> SIMD8: 3465 -> 3475 (0.29%)
> instructions HURT:   shaders/dolphin/ubershaders/138.shader_test FS
> SIMD8: 3465 -> 3475 (0.29%)
> 
> Did you test on a different gen? Can you paste here the paths of some
> of the GS shaders where you see the big regressions so I can verify I
> have them in my shader-db?
> 
> Also, how did you test this patch exactly? When I was going to capture
> the reference shader-db results for patch 2 in this series so I could
> extract the results for patch 3 by comparing agai

Re: [Mesa-dev] [PATCH 3/3] intel/compiler: implement more algebraic optimizations

2019-02-27 Thread Ian Romanick
On 2/27/19 4:45 AM, Iago Toral Quiroga wrote:
> Now that we propagate constants to the first source of 2src instructions we
> see more opportunities of constant folding in the backend.
> 
> Shader-db results on KBL:
> 
> total instructions in shared programs: 14965607 -> 14855983 (-0.73%)
> instructions in affected programs: 3988102 -> 3878478 (-2.75%)
> helped: 14292
> HURT: 59
> 
> total cycles in shared programs: 344324295 -> 340656008 (-1.07%)
> cycles in affected programs: 247527740 -> 243859453 (-1.48%)
> helped: 14056
> HURT: 3314
> 
> total loops in shared programs: 4283 -> 4283 (0.00%)
> loops in affected programs: 0 -> 0
> helped: 0
> HURT: 0
> 
> total spills in shared programs: 27812 -> 24350 (-12.45%)
> spills in affected programs: 24921 -> 21459 (-13.89%)
> helped: 345
> HURT: 19
> 
> total fills in shared programs: 24173 -> 22032 (-8.86%)
> fills in affected programs: 21124 -> 18983 (-10.14%)
> helped: 355
> HURT: 25

Ignore my previous questions about nir_opt_constant_folding after
nir_opt_algebraic_late.  I had done that because I added a bunch of
things to nir_opt_algebraic_late that created my constant folding
opportunities.

This is the combined changes for this patch and the previous patch.  For
this patch alone, I got:

total instructions in shared programs: 15306213 -> 15221518 (-0.55%)
instructions in affected programs: 2911451 -> 2826756 (-2.91%)
helped: 13121
HURT: 44
helped stats (abs) min: 1 max: 51 x̄: 6.66 x̃: 6
helped stats (rel) min: <.01% max: 16.67% x̄: 4.27% x̃: 3.30%
HURT stats (abs)   min: 3 max: 453 x̄: 61.16 x̃: 5
HURT stats (rel)   min: 0.20% max: 151.00% x̄: 31.57% x̃: 19.23%
95% mean confidence interval for instructions value: -6.61 -6.26
95% mean confidence interval for instructions %-change: -4.23% -4.07%
Instructions are helped.

total cycles in shared programs: 375419164 -> 372829148 (-0.69%)
cycles in affected programs: 146769299 -> 144179283 (-1.76%)
helped: 10992
HURT: 1833
helped stats (abs) min: 1 max: 56127 x̄: 250.29 x̃: 18
helped stats (rel) min: <.01% max: 40.52% x̄: 3.11% x̃: 2.58%
HURT stats (abs)   min: 1 max: 1718 x̄: 87.93 x̃: 42
HURT stats (rel)   min: <.01% max: 139.33% x̄: 7.74% x̃: 3.08%
95% mean confidence interval for cycles value: -248.21 -155.69
95% mean confidence interval for cycles %-change: -1.67% -1.44%
Cycles are helped.

total spills in shared programs: 28828 -> 28888 (0.21%)
spills in affected programs: 2037 -> 2097 (2.95%)
helped: 0
HURT: 24

total fills in shared programs: 35542 -> 35639 (0.27%)
fills in affected programs: 3078 -> 3175 (3.15%)
helped: 2
HURT: 26

I decided to look at some of the hurt shaders... it looks like some of
the Unigine geometry shaders really took a beating (+150% instructions).
Note the "max" in the "instructions in affected programs" above.
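
For reference, the percentages in these reports are consistent with a plain relative change. A small sketch, assuming that is how the report tool computes them (`pct_change` is a hypothetical helper, not part of shader-db):

```c
/* Relative change in percent going from `before` to `after`. */
double pct_change(double before, double after)
{
   return (after - before) / before * 100.0;
}
```

Plugging in the affected-program instruction counts above, pct_change(2911451, 2826756) lands at roughly -2.91, matching the reported figure.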

More comments below by SHL...

> LOST:   0
> GAINED: 5
> ---
>  src/intel/compiler/brw_fs.cpp | 203 --
>  1 file changed, 195 insertions(+), 8 deletions(-)
> 
> diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
> index 2358acbeb59..b2b60237c82 100644
> --- a/src/intel/compiler/brw_fs.cpp
> +++ b/src/intel/compiler/brw_fs.cpp
> @@ -2583,9 +2583,55 @@ fs_visitor::opt_algebraic()
>   break;
>  
>case BRW_OPCODE_MUL:
> - if (inst->src[1].file != IMM)
> + if (inst->src[0].file != IMM && inst->src[1].file != IMM)
>  continue;
>  
> + /* Constant folding */
> + if (inst->src[0].file == IMM && inst->src[1].file == IMM) {
> +assert(inst->src[0].type == inst->src[1].type);
> +bool local_progress = true;
> +switch (inst->src[0].type) {
> +case BRW_REGISTER_TYPE_HF: {
> +   float v1 = _mesa_half_to_float(inst->src[0].ud & 0xffffu);
> +   float v2 = _mesa_half_to_float(inst->src[1].ud & 0xffffu);
> +   inst->src[0] = brw_imm_w(_mesa_float_to_half(v1 * v2));
> +   break;
> +}
> +case BRW_REGISTER_TYPE_W: {
> +   int16_t v1 = inst->src[0].ud & 0xffffu;
> +   int16_t v2 = inst->src[1].ud & 0xffffu;
> +   inst->src[0] = brw_imm_w(v1 * v2);
> +   break;
> +}
> +case BRW_REGISTER_TYPE_UW: {
> +   uint16_t v1 = inst->src[0].ud & 0xffffu;
> +   uint16_t v2 = inst->src[1].ud & 0xffffu;
> +   inst->src[0] = brw_imm_uw(v1 * v2);
> +   break;
> +}
> +case BRW_REGISTER_TYPE_F:
> +   inst->src[0].f *= inst->src[1].f;
> +   break;
> +case BRW_REGISTER_TYPE_D:
> +   inst->src[0].d *= inst->src[1].d;
> +   break;
> +case BRW_REGISTER_TYPE_UD:
> +   inst->src[0].ud *= inst->src[1].ud;
> +   break;
> +default:
> +   local_progress = false;
> +   break;
> +};
> +
> +if (

Re: [Mesa-dev] [PATCH 3/3] intel/compiler: implement more algebraic optimizations

2019-02-27 Thread Ian Romanick
On 2/27/19 4:45 AM, Iago Toral Quiroga wrote:
> Now that we propagate constants to the first source of 2src instructions we
> see more opportunities of constant folding in the backend.

All the benefit of the series is from more constant folding?  Once upon
a time, I had a patch that added another call to
nir_opt_constant_folding after we call nir_opt_algebraic_late.  My
recollection is that it hurt vec4 shaders, but it helped scalar shaders
quite a bit.  How does doing that affect these results?

Hrm... I can collect that data.

> Shader-db results on KBL:
> 
> total instructions in shared programs: 14965607 -> 14855983 (-0.73%)
> instructions in affected programs: 3988102 -> 3878478 (-2.75%)
> helped: 14292
> HURT: 59
> 
> total cycles in shared programs: 344324295 -> 340656008 (-1.07%)
> cycles in affected programs: 247527740 -> 243859453 (-1.48%)
> helped: 14056
> HURT: 3314
> 
> total loops in shared programs: 4283 -> 4283 (0.00%)
> loops in affected programs: 0 -> 0
> helped: 0
> HURT: 0
> 
> total spills in shared programs: 27812 -> 24350 (-12.45%)
> spills in affected programs: 24921 -> 21459 (-13.89%)
> helped: 345
> HURT: 19
> 
> total fills in shared programs: 24173 -> 22032 (-8.86%)
> fills in affected programs: 21124 -> 18983 (-10.14%)
> helped: 355
> HURT: 25
> 
> LOST:   0
> GAINED: 5
> ---
>  src/intel/compiler/brw_fs.cpp | 203 --
>  1 file changed, 195 insertions(+), 8 deletions(-)
> 
> diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
> index 2358acbeb59..b2b60237c82 100644
> --- a/src/intel/compiler/brw_fs.cpp
> +++ b/src/intel/compiler/brw_fs.cpp
> @@ -2583,9 +2583,55 @@ fs_visitor::opt_algebraic()
>   break;
>  
>case BRW_OPCODE_MUL:
> - if (inst->src[1].file != IMM)
> + if (inst->src[0].file != IMM && inst->src[1].file != IMM)
>  continue;
>  
> + /* Constant folding */
> + if (inst->src[0].file == IMM && inst->src[1].file == IMM) {
> +assert(inst->src[0].type == inst->src[1].type);
> +bool local_progress = true;
> +switch (inst->src[0].type) {
> +case BRW_REGISTER_TYPE_HF: {
> +   float v1 = _mesa_half_to_float(inst->src[0].ud & 0xffffu);
> +   float v2 = _mesa_half_to_float(inst->src[1].ud & 0xffffu);
> +   inst->src[0] = brw_imm_w(_mesa_float_to_half(v1 * v2));
> +   break;
> +}
> +case BRW_REGISTER_TYPE_W: {
> +   int16_t v1 = inst->src[0].ud & 0xffffu;
> +   int16_t v2 = inst->src[1].ud & 0xffffu;
> +   inst->src[0] = brw_imm_w(v1 * v2);
> +   break;
> +}
> +case BRW_REGISTER_TYPE_UW: {
> +   uint16_t v1 = inst->src[0].ud & 0xffffu;
> +   uint16_t v2 = inst->src[1].ud & 0xffffu;
> +   inst->src[0] = brw_imm_uw(v1 * v2);
> +   break;
> +}
> +case BRW_REGISTER_TYPE_F:
> +   inst->src[0].f *= inst->src[1].f;
> +   break;
> +case BRW_REGISTER_TYPE_D:
> +   inst->src[0].d *= inst->src[1].d;
> +   break;
> +case BRW_REGISTER_TYPE_UD:
> +   inst->src[0].ud *= inst->src[1].ud;
> +   break;
> +default:
> +   local_progress = false;
> +   break;
> +};
> +
> +if (local_progress) {
> +   inst->opcode = BRW_OPCODE_MOV;
> +   inst->src[1] = reg_undef;
> +   progress = true;
> +   break;
> +}
> + }
> +
> +
>   /* a * 1.0 = a */
>   if (inst->src[1].is_one()) {
>  inst->opcode = BRW_OPCODE_MOV;
> @@ -2594,6 +2640,14 @@ fs_visitor::opt_algebraic()
>  break;
>   }
>  
> + if (inst->src[0].is_one()) {
> +inst->opcode = BRW_OPCODE_MOV;
> +inst->src[0] = inst->src[1];
> +inst->src[1] = reg_undef;
> +progress = true;
> +break;
> + }
> +
>   /* a * -1.0 = -a */
>   if (inst->src[1].is_negative_one()) {
>  inst->opcode = BRW_OPCODE_MOV;
> @@ -2603,27 +2657,160 @@ fs_visitor::opt_algebraic()
>  break;
>   }
>  
> - if (inst->src[0].file == IMM) {
> -assert(inst->src[0].type == BRW_REGISTER_TYPE_F);
> + if (inst->src[0].is_negative_one()) {
> +inst->opcode = BRW_OPCODE_MOV;
> +inst->src[0] = inst->src[1];
> +inst->src[0].negate = !inst->src[1].negate;
> +inst->src[1] = reg_undef;
> +progress = true;
> +break;
> + }
> +
> + /* a * 0 = 0 (this is not exact for floating point) */
> + if (inst->src[1].is_zero() &&
> + brw_reg_type_is_integer(inst->src[1].type)) {
> +inst->opcode = BRW_OPCODE_MOV;
> +  

Re: [Mesa-dev] [PATCH 3/9] nir: Add a new ALU nir_op_imad24_ir3

2019-02-25 Thread Ian Romanick
On 2/13/19 1:29 PM, Eduardo Lima Mitev wrote:
> ir3 compiler has an integer multiply-add instruction (MAD_S24)
> that is used for different offset calculations in the backend.
> Since we intend to move some of these calculations to NIR, we need
> a new ALU op that can directly represent it.
> ---
>  src/compiler/nir/nir_opcodes.py | 16 
>  1 file changed, 16 insertions(+)
> 
> diff --git a/src/compiler/nir/nir_opcodes.py b/src/compiler/nir/nir_opcodes.py
> index d32005846a6..abbb3627a33 100644
> --- a/src/compiler/nir/nir_opcodes.py
> +++ b/src/compiler/nir/nir_opcodes.py
> @@ -892,3 +892,19 @@ dst.w = src3.x;
>  """)
>  
>  
> +# Freedreno-specific opcode that maps directly to ir3_MAD_S24.
> +# It is emitted by ir3_nir_lower_io_offsets pass when computing
> +# byte-offsets for image store and atomics.
> +#
> +# The nir_algebraic expression below is: get 23 bits of the
> +# two factors as unsigned and multiply them. If either of the
> +# two was negative, invert sign of the product. Then add it src2.
> +# @FIXME: I suspect there is a simpler expression for this.
> +triop("imad24_ir3", tint, """
> +unsigned f0 = ((unsigned) src0) & 0x7f;
> +unsigned f1 = ((unsigned) src1) & 0x7f;
> +dst = f0 * f1;

How about (((int)src0 << 8) >> 8) * (((int)src1 << 8) >> 8) + src2?  The
trick is making sure the implementation matches what the hardware does
in all cases.  My expression will produce different results than yours
for cases like 0xf01f * 2.  0x3e vs -0x3e.  "Correct"
depends entirely on what real hardware would produce.  If I had to
guess, I would guess that the hardware would produce 0x3e since it
likely just ignores the upper 8 bits of the sources.
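
To make the difference concrete, here is a rough sketch of both interpretations in plain C. The 23-bit mask value 0x7fffff is inferred from the patch comment ("get 23 bits"), the function names are made up for this example, and the sample inputs are chosen only for illustration. Which result is "correct" still depends on what the hardware does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sign-extend the low 24 bits of x; the upper 8 bits are ignored. */
static int32_t
sext24(uint32_t x)
{
   x &= 0xffffffu;
   return (int32_t)(x ^ 0x800000u) - 0x800000;
}

/* Interpretation 1: both factors are signed 24-bit values.  The result is
 * returned as a 32-bit pattern so that wraparound stays well defined.
 */
static uint32_t
imad24_sign_extend(uint32_t src0, uint32_t src1, uint32_t src2)
{
   const int64_t product = (int64_t)sext24(src0) * sext24(src1);
   return (uint32_t)((uint64_t)product + src2);
}

/* Interpretation 2: multiply the low 23 bits as unsigned, then negate if
 * the signs of the full 32-bit sources differ (assumed reading of the
 * expression in the patch).
 */
static uint32_t
imad24_mask_negate(uint32_t src0, uint32_t src1, uint32_t src2)
{
   const uint32_t f0 = src0 & 0x7fffffu;
   const uint32_t f1 = src1 & 0x7fffffu;
   uint32_t dst = f0 * f1;   /* wraps modulo 2^32 */

   /* A product of two non-zero values is negative when the sign bits differ. */
   const bool negative =
      ((src0 ^ src1) >> 31) != 0 && src0 != 0 && src1 != 0;

   if (negative)
      dst = 0u - dst;

   return dst + src2;
}
```

With src0 = 0xff00001f and src1 = 2, the first version yields 0x3e and the second yields -0x3e, because only the second looks at bit 31 of the source.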

> +if (src0 * src1 < 0)
> +   dst = -dst;
> +dst += src2;
> +""")
> 

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] nir/algebraic: Replace a-fract(a) with floor(a)

2019-02-25 Thread Ian Romanick
On 2/23/19 4:11 PM, Timothy Arceri wrote:
> 
> 
> On 23/2/19 4:09 pm, Ian Romanick wrote:
>> From: Ian Romanick 
>>
>> I noticed this while looking at a shader that was affected by Tim's
>> "more loop unrolling" series.
>>
>> All Gen6+ platforms had similar results. (Skylake shown)
>> total instructions in shared programs: 15437001 -> 15435259 (-0.01%)
>> instructions in affected programs: 213651 -> 211909 (-0.82%)
>> helped: 988
>> HURT: 0
>> helped stats (abs) min: 1 max: 27 x̄: 1.76 x̃: 1
>> helped stats (rel) min: 0.15% max: 11.54% x̄: 1.14% x̃: 0.59%
>> 95% mean confidence interval for instructions value: -1.89 -1.63
>> 95% mean confidence interval for instructions %-change: -1.23% -1.05%
>> Instructions are helped.
>>
>> total cycles in shared programs: 383007378 -> 382997063 (<.01%)
>> cycles in affected programs: 1650825 -> 1640510 (-0.62%)
>> helped: 679
>> HURT: 302
> 
> Why the hurt on Gen6+? Is this something that should be in the late
> optimisations pass?

As far as I can tell, it's just because our scheduler is terrible.  In
all the fragment shaders that I looked at (some hurt shaders were from
other stages), only one of the SIMD8 or SIMD16 version would be hurt.
In many of those cases, the other SIMD width is improved (e.g.,
shaders/closed/steam/brutal-legend/3990.shader_test).

Often it looks like the scheduler decides to differently schedule a SEND
that occurs somewhere early in the shader.  Once that happens, everything
is different. :(

I looked at one vertex shader that was hurt (from Goat Simulator).  In
that case, both the floor and fract are used.  The optimization
eliminates the add, and it should allow better scheduling.  In the area
of the FRC and RNDD instructions, the scheduler does the right thing.
However, later in the shader a MAD and an ADD get scheduled
differently, and that makes it slightly worse.

In light of this, I tried adding some "is_used_once" mark-up, and that
did not fix all the cycles regressions.  It also did a lot more harm
than good on SKL:

total cycles in shared programs: 382997063 -> 382998953 (<.01%)
cycles in affected programs: 549527 -> 551417 (0.34%)
helped: 82
HURT: 241
helped stats (abs) min: 1 max: 26 x̄: 6.88 x̃: 6
helped stats (rel) min: 0.06% max: 2.04% x̄: 0.56% x̃: 0.44%
HURT stats (abs)   min: 1 max: 120 x̄: 10.18 x̃: 14
HURT stats (rel)   min: 0.04% max: 3.86% x̄: 0.63% x̃: 0.52%
95% mean confidence interval for cycles value: 4.44 7.26
95% mean confidence interval for cycles %-change: 0.24% 0.42%
Cycles are HURT.
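
For reference, the identity behind the new rule can be checked numerically. This is only a sketch: it defines fract as a - floor(a), which is the lowering the existing lower_ffract rule in the quoted patch uses, and it uses a small stand-in for floorf() (sketch_floor, a made-up name) so the example does not depend on libm.

```c
#include <assert.h>

/* Stand-in for floorf().  Only valid for values that fit in a long. */
static float
sketch_floor(float a)
{
   const float t = (float)(long)a;   /* truncates toward zero */
   return (t > a) ? t - 1.0f : t;
}

/* fract expressed as a - floor(a). */
static float
sketch_fract(float a)
{
   return a - sketch_floor(a);
}

/* The pattern matched by the new rule: fadd(a, fneg(ffract(a))). */
static float
a_minus_fract(float a)
{
   return a + -sketch_fract(a);
}
```

For sample values such as 2.75, -1.25 and 5.0, a_minus_fract() and sketch_floor() agree exactly, which is why the add can be dropped when the backend has a native floor.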


>> helped stats (abs) min: 1 max: 348 x̄: 23.39 x̃: 14
>> helped stats (rel) min: 0.04% max: 28.77% x̄: 1.61% x̃: 0.98%
>> HURT stats (abs)   min: 1 max: 250 x̄: 18.43 x̃: 7
>> HURT stats (rel)   min: 0.04% max: 25.86% x̄: 1.41% x̃: 0.53%
>> 95% mean confidence interval for cycles value: -13.05 -7.98
>> 95% mean confidence interval for cycles %-change: -0.86% -0.50%
>> Cycles are helped.
>>
>> Iron Lake and GM45 had similar results. (GM45 shown)
>> total instructions in shared programs: 5043616 -> 5043010 (-0.01%)
>> instructions in affected programs: 119691 -> 119085 (-0.51%)
>> helped: 432
>> HURT: 0
>> helped stats (abs) min: 1 max: 27 x̄: 1.40 x̃: 1
>> helped stats (rel) min: 0.10% max: 8.11% x̄: 0.66% x̃: 0.39%
>> 95% mean confidence interval for instructions value: -1.58 -1.23
>> 95% mean confidence interval for instructions %-change: -0.72% -0.59%
>> Instructions are helped.
>>
>> total cycles in shared programs: 128139812 -> 128135762 (<.01%)
>> cycles in affected programs: 3829724 -> 3825674 (-0.11%)
>> helped: 602
>> HURT: 0
>> helped stats (abs) min: 2 max: 486 x̄: 6.73 x̃: 6
>> helped stats (rel) min: 0.02% max: 4.85% x̄: 0.19% x̃: 0.10%
>> 95% mean confidence interval for cycles value: -8.40 -5.05
>> 95% mean confidence interval for cycles %-change: -0.22% -0.16%
>> Cycles are helped.
>> ---
>>   src/compiler/nir/nir_opt_algebraic.py | 1 +
>>   1 file changed, 1 insertion(+)
>>
>> diff --git a/src/compiler/nir/nir_opt_algebraic.py
>> b/src/compiler/nir/nir_opt_algebraic.py
>> index ba27d702b5d..c8fc938cc8f 100644
>> --- a/src/compiler/nir/nir_opt_algebraic.py
>> +++ b/src/compiler/nir/nir_opt_algebraic.py
>> @@ -127,6 +127,7 @@ optimizations = [
>>  (('flrp@32', a, b, c), ('fadd', ('fmul', c, ('fsub', b, a)), a),
>> 'options->lower_flrp32'),
>>  (('flrp@64', a, b, c), ('fadd', ('fmul', c, ('fsub', b, a)), a),
>> 'options->lower_flrp64'),

[Mesa-dev] [PATCH] nir/algebraic: Replace a-fract(a) with floor(a)

2019-02-22 Thread Ian Romanick
From: Ian Romanick 

I noticed this while looking at a shader that was affected by Tim's
"more loop unrolling" series.

All Gen6+ platforms had similar results. (Skylake shown)
total instructions in shared programs: 15437001 -> 15435259 (-0.01%)
instructions in affected programs: 213651 -> 211909 (-0.82%)
helped: 988
HURT: 0
helped stats (abs) min: 1 max: 27 x̄: 1.76 x̃: 1
helped stats (rel) min: 0.15% max: 11.54% x̄: 1.14% x̃: 0.59%
95% mean confidence interval for instructions value: -1.89 -1.63
95% mean confidence interval for instructions %-change: -1.23% -1.05%
Instructions are helped.

total cycles in shared programs: 383007378 -> 382997063 (<.01%)
cycles in affected programs: 1650825 -> 1640510 (-0.62%)
helped: 679
HURT: 302
helped stats (abs) min: 1 max: 348 x̄: 23.39 x̃: 14
helped stats (rel) min: 0.04% max: 28.77% x̄: 1.61% x̃: 0.98%
HURT stats (abs)   min: 1 max: 250 x̄: 18.43 x̃: 7
HURT stats (rel)   min: 0.04% max: 25.86% x̄: 1.41% x̃: 0.53%
95% mean confidence interval for cycles value: -13.05 -7.98
95% mean confidence interval for cycles %-change: -0.86% -0.50%
Cycles are helped.

Iron Lake and GM45 had similar results. (GM45 shown)
total instructions in shared programs: 5043616 -> 5043010 (-0.01%)
instructions in affected programs: 119691 -> 119085 (-0.51%)
helped: 432
HURT: 0
helped stats (abs) min: 1 max: 27 x̄: 1.40 x̃: 1
helped stats (rel) min: 0.10% max: 8.11% x̄: 0.66% x̃: 0.39%
95% mean confidence interval for instructions value: -1.58 -1.23
95% mean confidence interval for instructions %-change: -0.72% -0.59%
Instructions are helped.

total cycles in shared programs: 128139812 -> 128135762 (<.01%)
cycles in affected programs: 3829724 -> 3825674 (-0.11%)
helped: 602
HURT: 0
helped stats (abs) min: 2 max: 486 x̄: 6.73 x̃: 6
helped stats (rel) min: 0.02% max: 4.85% x̄: 0.19% x̃: 0.10%
95% mean confidence interval for cycles value: -8.40 -5.05
95% mean confidence interval for cycles %-change: -0.22% -0.16%
Cycles are helped.
---
 src/compiler/nir/nir_opt_algebraic.py | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/compiler/nir/nir_opt_algebraic.py 
b/src/compiler/nir/nir_opt_algebraic.py
index ba27d702b5d..c8fc938cc8f 100644
--- a/src/compiler/nir/nir_opt_algebraic.py
+++ b/src/compiler/nir/nir_opt_algebraic.py
@@ -127,6 +127,7 @@ optimizations = [
(('flrp@32', a, b, c), ('fadd', ('fmul', c, ('fsub', b, a)), a), 
'options->lower_flrp32'),
(('flrp@64', a, b, c), ('fadd', ('fmul', c, ('fsub', b, a)), a), 
'options->lower_flrp64'),
(('ffloor', a), ('fsub', a, ('ffract', a)), 'options->lower_ffloor'),
+   (('fadd', a, ('fneg', ('ffract', a))), ('ffloor', a), 
'!options->lower_ffloor'),
(('ffract', a), ('fsub', a, ('ffloor', a)), 'options->lower_ffract'),
(('fceil', a), ('fneg', ('ffloor', ('fneg', a))), 'options->lower_fceil'),
(('~fadd', ('fmul', a, ('fadd', 1.0, ('fneg', ('b2f', 'c@1', ('fmul', 
b, ('b2f', c))), ('bcsel', c, b, a), 'options->lower_flrp32'),
-- 
2.14.4


Re: [Mesa-dev] [PATCH] nir/builder: Don't emit no-op swizzles

2019-02-22 Thread Ian Romanick
Reviewed-by: Ian Romanick 

On 2/22/19 4:03 PM, Jason Ekstrand wrote:
> The nir_swizzle helper is used some on its own but it's also called by
> nir_channel and nir_channels which are used everywhere.  It's pretty
> quick to check while we're walking the swizzle anyway whether or not
> it's an identity swizzle.  If it is, we now don't bother emitting the
> instruction.  Sure, copy-prop will clean it up for us but there's no
> sense making more work for the optimizer than we have to.
> ---
>  src/compiler/nir/nir_builder.h | 10 +-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/src/compiler/nir/nir_builder.h b/src/compiler/nir/nir_builder.h
> index c6e80e729a8..253ca5941cb 100644
> --- a/src/compiler/nir/nir_builder.h
> +++ b/src/compiler/nir/nir_builder.h
> @@ -497,8 +497,16 @@ nir_swizzle(nir_builder *build, nir_ssa_def *src, const 
> unsigned *swiz,
> assert(num_components <= NIR_MAX_VEC_COMPONENTS);
> nir_alu_src alu_src = { NIR_SRC_INIT };
> alu_src.src = nir_src_for_ssa(src);
> -   for (unsigned i = 0; i < num_components && i < NIR_MAX_VEC_COMPONENTS; 
> i++)
> +
> +   bool is_identity_swizzle = true;
> +   for (unsigned i = 0; i < num_components && i < NIR_MAX_VEC_COMPONENTS; 
> i++) {
> +  if (swiz[i] != i)
> + is_identity_swizzle = false;
>alu_src.swizzle[i] = swiz[i];
> +   }
> +
> +   if (num_components == src->num_components && is_identity_swizzle)
> +  return src;
>  
> return use_fmov ? nir_fmov_alu(build, alu_src, num_components) :
>   nir_imov_alu(build, alu_src, num_components);
> 
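
The check in the patch above can be sketched in isolation as follows. This is a standalone illustration with a hypothetical function name (swizzle_is_noop), not the NIR helper itself. A swizzle is a no-op only when it keeps the component count and maps every channel to itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true when applying swiz to a source with src_components channels
 * would reproduce the source unchanged: same width, and channel i reads
 * channel i.
 */
static bool
swizzle_is_noop(const unsigned *swiz, unsigned num_components,
                unsigned src_components)
{
   assert(swiz != NULL);

   /* Selecting fewer (or more) channels is never an identity. */
   if (num_components != src_components)
      return false;

   for (unsigned i = 0; i < num_components; i++) {
      if (swiz[i] != i)
         return false;
   }

   return true;
}
```

Note the width comparison: taking .xy of a vec4 walks as an identity through the loop but still has to emit a mov, which is why the patch also compares num_components against src->num_components.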


[Mesa-dev] [PATCH] spirv: Add missing break

2019-02-13 Thread Ian Romanick
From: Ian Romanick 

Fixes: c6465fec0c5 ("spirv: add SpvCapabilityInt64Atomics")
CID: 1442555
---
 src/compiler/spirv/spirv_to_nir.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/compiler/spirv/spirv_to_nir.c 
b/src/compiler/spirv/spirv_to_nir.c
index 1cbc926c818..5e8eb222555 100644
--- a/src/compiler/spirv/spirv_to_nir.c
+++ b/src/compiler/spirv/spirv_to_nir.c
@@ -3590,6 +3590,7 @@ vtn_handle_preamble_instruction(struct vtn_builder *b, 
SpvOp opcode,
 
   case SpvCapabilityInt64Atomics:
  spv_check_supported(int64_atomics, cap);
+ break;
 
   case SpvCapabilityInt8:
  spv_check_supported(int8, cap);
-- 
2.14.5
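
For context, a toy sketch of the kind of fallthrough this removes. The enum and function names are invented for the example and have nothing to do with the real spirv_to_nir code. Without the break, handling the first capability also runs the check that belongs to the next case.

```c
#include <assert.h>

enum sketch_cap { SKETCH_CAP_INT64_ATOMICS, SKETCH_CAP_INT8 };

/* Returns a bitmask of the checks that ran for a capability: bit 0 for the
 * int64-atomics check, bit 1 for the int8 check.  with_break selects
 * between the fixed and the unfixed control flow.
 */
static unsigned
checks_run(enum sketch_cap cap, int with_break)
{
   unsigned ran = 0;

   switch (cap) {
   case SKETCH_CAP_INT64_ATOMICS:
      ran |= 1u << 0;
      if (with_break)
         break;
      /* fall through - this is what happens without the break */
   case SKETCH_CAP_INT8:
      ran |= 1u << 1;
      break;
   }

   return ran;
}
```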


Re: [Mesa-dev] [PATCH 2/4] nir: turn ssa check into an assert

2019-02-13 Thread Ian Romanick
On 2/13/19 12:00 AM, Timothy Arceri wrote:
> Everything should be in ssa form when this is called. Checking
> for it here is expensive so turn this into an assert instead.
> 
> Do the cheap thing first and check if we can even progress with
> this instruction type.
> ---
>  src/compiler/nir/nir_instr_set.c | 14 +++---
>  1 file changed, 11 insertions(+), 3 deletions(-)
> 
> diff --git a/src/compiler/nir/nir_instr_set.c 
> b/src/compiler/nir/nir_instr_set.c
> index 61476c0b03f..c795efbca6a 100644
> --- a/src/compiler/nir/nir_instr_set.c
> +++ b/src/compiler/nir/nir_instr_set.c
> @@ -498,6 +498,16 @@ dest_is_ssa(nir_dest *dest, void *data)
> return dest->is_ssa;
>  }
>  
> +static bool
> +instr_each_src_and_dest_is_ssa(nir_instr *instr)
> +{
> +   if (!nir_foreach_dest(instr, dest_is_ssa, NULL) ||
> +   !nir_foreach_src(instr, src_is_ssa, NULL))
> +  return false;
> +
> +   return true;

I get that this is trying to be obvious about just moving code, but I'd
like this a lot better if it more directly matched the name:

   return nir_foreach_dest(instr, dest_is_ssa, NULL) &&
  nir_foreach_src(instr, src_is_ssa, NULL);

> +}
> +
>  /* This function determines if uses of an instruction can safely be rewritten
>   * to use another identical instruction instead. Note that this function must
>   * be kept in sync with hash_instr() and nir_instrs_equal() -- only
> @@ -509,9 +519,7 @@ static bool
>  instr_can_rewrite(nir_instr *instr)
>  {
> /* We only handle SSA. */
> -   if (!nir_foreach_dest(instr, dest_is_ssa, NULL) ||
> -   !nir_foreach_src(instr, src_is_ssa, NULL))
> -  return false;
> +   assert(instr_each_src_and_dest_is_ssa(instr));
>  
> switch (instr->type) {
> case nir_instr_type_alu:

Re: [Mesa-dev] [PATCH] nir: move ALU instruction before the jump instruction

2019-02-13 Thread Ian Romanick
On 2/13/19 9:59 AM, Juan A. Suarez Romero wrote:
> On Wed, 2019-02-13 at 09:16 -0800, Ian Romanick wrote:
>> On 2/13/19 7:53 AM, Juan A. Suarez Romero wrote:
>>> On Tue, 2019-02-12 at 16:22 -0800, Ian Romanick wrote:
>>>> On 2/12/19 12:58 AM, Juan A. Suarez Romero wrote:
>>>>> opt_split_alu_of_phi moves ALU instruction to the end of continue block.
>>>>>
>>>>> But if the continue block ends with a jump instruction (an explicit
>>>>> "continue" instruction) then the ALU must be inserted before the jump,
>>>>> as it is illegal to add instructions after the jump.
>>>>
>>>> I'm assuming you found this by inspection?  Since this pass only
>>>> operates when the first block of the loop only has two predecessors (the
>>>> block before the loop and the implicit continue at the end of the loop),
>>>> this shouldn't be a problem in practice... or were you able to trigger
>>>> it somehow?
>>>
>>> Found when dealing with the SPIR-V code that I've sent in 
>>> https://lists.freedesktop.org/archives/mesa-dev/2019-February/214906.html
>>>
>>>
>>> The obtained NIR code has an explicit continue at the end of the loop (see 
>>> http://paste.debian.net/1067619/, in particular the loop with header 
>>> block_2).
>>
>> That loop is a mess.  Wow... I'm impressed. :) I see a continue inside
>> an else-clause at line 107 and a break at line 112.  I now understand
>> the fundamental problem.  Am I correct that this problem would also
>> occur before 8fb8ebfbb05?  And that's what "nir: allow stitching of
>> non-empty block" fixes?
> 
> Yeah :) It's a SPIR-V code generated with some fuzzy tool.
> 
> 
> And yes, even with this patch applied, it will break later when stitching the
> code. 
> 
> Fortunately, I got rid of the "nir: allow stitching of non-empty block" patch
> and sent instead another one that removes one of the jumps to avoid the issue.
> 
>> Did nir_validate trip on this?  If not, it seems like it should...
>> though that might cause problems when nir_validate is run immediately
>> following translation into NIR.
> 
> The NIR code seems valid. The error (assert) was caught by nir_instr_insert,
> when applying the optimization.

Having instructions in a block after an unconditional jump seems bogus.
 It's technically valid, but it can never be correct.  I'm pretty sure
we scrub that from the GLSL frontend, and I'm not sure it's legal in
SPIR-V.  I'll look into it more.

>> It seems like we could craft a couple *simple* piglit tests that
>> exercise this.  I don't want to rely on a giant, ugly test case as our
>> only coverage for this issue.  I'm sure debugging this was not fun, and
>> I don't want someone to have to go through that again should a similar
>> issue creep back in.  I can take a stab at that if you don't already
>> have something ready.
>>
> 
> I didn't write a piglit test for this, as this test will end up in the Vulkan
> CTS.

Right.  We've always had a preference for simpler tests that poke at one
specific thing.  I'll come up with a couple tests, and I'll CC you on them.

Re: [Mesa-dev] [PATCH] nir: move ALU instruction before the jump instruction

2019-02-13 Thread Ian Romanick
On 2/13/19 7:53 AM, Juan A. Suarez Romero wrote:
> On Tue, 2019-02-12 at 16:22 -0800, Ian Romanick wrote:
>> On 2/12/19 12:58 AM, Juan A. Suarez Romero wrote:
>>> opt_split_alu_of_phi moves ALU instruction to the end of continue block.
>>>
>>> But if the continue block ends with a jump instruction (an explicit
>>> "continue" instruction) then the ALU must be inserted before the jump,
>>> as it is illegal to add instructions after the jump.
>>
>> I'm assuming you found this by inspection?  Since this pass only
>> operates when the first block of the loop only has two predecessors (the
>> block before the loop and the implicit continue at the end of the loop),
>> this shouldn't be a problem in practice... or were you able to trigger
>> it somehow?
> 
> Found when dealing with the SPIR-V code that I've sent in 
> https://lists.freedesktop.org/archives/mesa-dev/2019-February/214906.html
> 
> 
> The obtained NIR code has an explicit continue at the end of the loop (see 
> http://paste.debian.net/1067619/, in particular the loop with header block_2).

That loop is a mess.  Wow... I'm impressed. :) I see a continue inside
an else-clause at line 107 and a break at line 112.  I now understand
the fundamental problem.  Am I correct that this problem would also
occur before 8fb8ebfbb05?  And that's what "nir: allow stitching of
non-empty block" fixes?

Did nir_validate trip on this?  If not, it seems like it should...
though that might cause problems when nir_validate is run immediately
following translation into NIR.

It seems like we could craft a couple *simple* piglit tests that
exercise this.  I don't want to rely on a giant, ugly test case as our
only coverage for this issue.  I'm sure debugging this was not fun, and
I don't want someone to have to go through that again should a similar
issue creep back in.  I can take a stab at that if you don't already
have something ready.

Either way, I think this change is obviously correct.  This patch is

Reviewed-by: Ian Romanick 

>   J.A.
> 
> 
> 
>>> CC: Ian Romanick 
>>> Fixes: 0881e90c099 ("nir: Split ALU instructions in loops that read phis")
>>> ---
>>>  src/compiler/nir/nir_opt_if.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/src/compiler/nir/nir_opt_if.c b/src/compiler/nir/nir_opt_if.c
>>> index 9afb901be14..932af9e37ab 100644
>>> --- a/src/compiler/nir/nir_opt_if.c
>>> +++ b/src/compiler/nir/nir_opt_if.c
>>> @@ -488,7 +488,7 @@ opt_split_alu_of_phi(nir_builder *b, nir_loop *loop)
>>>*
>>>* Insert the new instruction at the end of the continue block.
>>>*/
>>> - b->cursor = nir_after_block(continue_block);
>>> + b->cursor = nir_after_block_before_jump(continue_block);
>>>  
>>>   nir_ssa_def *const alu_copy =
>>>  clone_alu_and_replace_src_defs(b, alu, continue_srcs);


Re: [Mesa-dev] [PATCH] nir: move ALU instruction before the jump instruction

2019-02-12 Thread Ian Romanick
On 2/12/19 12:58 AM, Juan A. Suarez Romero wrote:
> opt_split_alu_of_phi moves ALU instruction to the end of continue block.
> 
> But if the continue block ends with a jump instruction (an explicit
> "continue" instruction) then the ALU must be inserted before the jump,
> as it is illegal to add instructions after the jump.

I'm assuming you found this by inspection?  Since this pass only
operates when the first block of the loop only has two predecessors (the
block before the loop and the implicit continue at the end of the loop),
this shouldn't be a problem in practice... or were you able to trigger
it somehow?

> CC: Ian Romanick 
> Fixes: 0881e90c099 ("nir: Split ALU instructions in loops that read phis")
> ---
>  src/compiler/nir/nir_opt_if.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/src/compiler/nir/nir_opt_if.c b/src/compiler/nir/nir_opt_if.c
> index 9afb901be14..932af9e37ab 100644
> --- a/src/compiler/nir/nir_opt_if.c
> +++ b/src/compiler/nir/nir_opt_if.c
> @@ -488,7 +488,7 @@ opt_split_alu_of_phi(nir_builder *b, nir_loop *loop)
>*
>* Insert the new instruction at the end of the continue block.
>*/
> - b->cursor = nir_after_block(continue_block);
> + b->cursor = nir_after_block_before_jump(continue_block);
>  
>   nir_ssa_def *const alu_copy =
>  clone_alu_and_replace_src_defs(b, alu, continue_srcs);
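
To illustrate the cursor change in the quoted patch, here is a toy model of a block as a flat instruction array. All of the types and names below are hypothetical and much simpler than the real NIR structures. The point is only that an insert "at the end" has to land before a trailing jump so that the jump stays last.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum sketch_instr_kind { SKETCH_ALU, SKETCH_JUMP };

#define SKETCH_MAX_INSTRS 8

struct sketch_block {
   enum sketch_instr_kind instrs[SKETCH_MAX_INSTRS];
   size_t count;
};

static bool
block_ends_in_jump(const struct sketch_block *b)
{
   return b->count > 0 && b->instrs[b->count - 1] == SKETCH_JUMP;
}

/* Insert an ALU instruction at the end of the block, but before a trailing
 * jump if there is one, so the jump remains the last instruction.
 */
static void
insert_after_block_before_jump(struct sketch_block *b)
{
   assert(b->count < SKETCH_MAX_INSTRS);

   if (block_ends_in_jump(b)) {
      /* Shift the jump down one slot and put the ALU where it was. */
      b->instrs[b->count] = SKETCH_JUMP;
      b->instrs[b->count - 1] = SKETCH_ALU;
   } else {
      b->instrs[b->count] = SKETCH_ALU;
   }

   b->count++;
}
```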

Re: [Mesa-dev] [PATCH] nir: allow stitching of non-empty block

2019-02-08 Thread Ian Romanick
On 2/8/19 5:21 AM, Juan A. Suarez Romero wrote:
> On Sat, 2019-01-26 at 08:37 -0800, Jason Ekstrand wrote:
>> This makes me a bit nervous. I'll have to look at it in more detail.
>>
> 
> Did you have time to take a look at this?

Is there a test case that hits this?  Was it found by inspection?  I'm
curious what the back story is...

>   J.A.
> 
>> On January 25, 2019 09:37:52 "Juan A. Suarez Romero"  
>> wrote:
>>
>>> When stitching two blocks A and B, where A's last instruction is a jump,
>>> it is not required that B is empty; it can be plainly removed.
>>>
>>> This can happen in a situation like this:
>>>
>>> vec1 1 ssa_1 = load_const (true)
>>> vec1 1 ssa_2 = load_const (false)
>>> block block_1:
>>> [...]
>>> loop {
>>>  vec1 ssa_3 = phi block_1: ssa_2, block_4: ssa_1
>>>  if ssa_3 {
>>>block block_2:
>>>[...]
>>>break
>>>  } else {
>>>block block_3:
>>>  }
>>>  vec1 ssa_4 = 
>>>  if ssa_4 {
>>>block block_4:
>>>continue
>>>  } else {
>>>block block_5:
>>>  }
>>>  block block_6:
>>>  [...]
>>> }
>>>
>>> And opt_peel_loop_initial_if is applied. In this case, we would be
>>> ending up stitching block_2 (which finalizes with a jump) with
>>> block_4, which is not empty.
>>>
>>> CC: Jason Ekstrand 
>>> ---
>>> src/compiler/nir/nir_control_flow.c | 1 -
>>> 1 file changed, 1 deletion(-)
>>>
>>> diff --git a/src/compiler/nir/nir_control_flow.c 
>>> b/src/compiler/nir/nir_control_flow.c
>>> index ddba2e55b45..27508f230d6 100644
>>> --- a/src/compiler/nir/nir_control_flow.c
>>> +++ b/src/compiler/nir/nir_control_flow.c
>>> @@ -550,7 +550,6 @@ stitch_blocks(nir_block *before, nir_block *after)
>>> */
>>>
>>>if (nir_block_ends_in_jump(before)) {
>>> -  assert(exec_list_is_empty(&after->instr_list));
>>>   if (after->successors[0])
>>>  remove_phi_src(after->successors[0], after);
>>>   if (after->successors[1])
>>> --
>>> 2.20.1
>>
>>
>>
> 


Re: [Mesa-dev] [PATCH 05/12] nir: rename global/local to private/function memory

2019-01-09 Thread Ian Romanick
On 1/8/19 9:57 PM, Kenneth Graunke wrote:
> On Tuesday, December 4, 2018 10:26:43 AM PST Karol Herbst wrote:
>> the naming is a bit confusing no matter how you look at it. Within SPIR-V
>> "global" memory is memory accessible from all threads. glsl "global" memory
>> normally refers to shader thread private memory declared at global scope. As
>> we already use "shared" for memory shared across all threads of a work group
>> the solution where everybody could be happy with is to rename "global" to
>> "private" and use "global" later for memory usually stored within system
>> accessible memory (be it VRAM or system RAM if keeping SVM in mind).
>> glsl "local" memory is memory only accessible within a function, while SPIR-V
>> "local" memory is memory accessible within the same workgroup.
>>
>> v2: rename local to function as well
>>
>> Signed-off-by: Karol Herbst 
> 
> I strongly dislike this patch, and I think we ought to revert it.
> 
> This probably makes sense from an OpenCL memory-model view of the world,
> but it's really confusing from a compiler or general programming point
> of view.
> 
> /Everybody/ knows what a local variable is.  It's one of the most used
> concepts in programming.  Calling it nir_var_function is very confusing.
> The variable is a...function?  Maybe it's a function pointer?  Neither
> of those things even exist in GLSL, so...what the heck is it?
> 
> Renaming global scope variables to "private" is also confusing IMO.
> They're certainly not private to a function.  They're globally
> accessible by anything in the whole shader.  I'll admit "global" isn't
> a great name either.

It seems like the concepts we're after are function local and thread
local, so why not nir_var_thread_local (for old nir_var_global) and
nir_var_function_local (for old nir_var_local)?  When "global" is
reintroduced to mean thread global, we could add it as
nir_var_thread_global.  That seems to match at least one reasonable view
of a storage hierarchy.
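
Roughly what I have in mind, as a sketch only. The sketch_ prefix marks these as illustrative names rather than anything in nir.h, and the "shared" entry and the ranking helper are my own additions to show the hierarchy.

```c
#include <assert.h>

/* Hypothetical mode flags, ordered from narrowest to widest visibility. */
enum sketch_variable_mode {
   sketch_var_function_local = 1 << 0,   /* visible within one function */
   sketch_var_thread_local   = 1 << 1,   /* visible to one shader thread */
   sketch_var_shared         = 1 << 2,   /* visible to one workgroup */
   sketch_var_thread_global  = 1 << 3,   /* visible to all threads */
};

/* Rank a single mode by how widely the storage is visible.  Returns -1 for
 * anything that is not exactly one of the modes above.
 */
static int
visibility_rank(enum sketch_variable_mode mode)
{
   switch (mode) {
   case sketch_var_function_local: return 0;
   case sketch_var_thread_local:   return 1;
   case sketch_var_shared:         return 2;
   case sketch_var_thread_global:  return 3;
   }

   return -1;
}
```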

> I think we need to discuss this more.  There are people with large
> series of outstanding work that now have to rebase on this flag day
> rename, and I don't think two people is enough consensus for renaming
> a core IR concept.  Can we find names we're all happy with?
> 
> --Ken





Re: [Mesa-dev] [PATCH 6/7] util/queue: add util_queue_adjust_num_threads

2019-01-03 Thread Ian Romanick
This patch is

Reviewed-by: Ian Romanick 

On 11/28/18 6:59 PM, Marek Olšák wrote:
> From: Marek Olšák 
> 
> for ARB_parallel_shader_compile
> ---
>  src/util/u_queue.c | 50 --
>  src/util/u_queue.h |  8 
>  2 files changed, 52 insertions(+), 6 deletions(-)
> 
> diff --git a/src/util/u_queue.c b/src/util/u_queue.c
> index 612ad5e83c6..383a9c09919 100644
> --- a/src/util/u_queue.c
> +++ b/src/util/u_queue.c
> @@ -27,42 +27,43 @@
>  #include "u_queue.h"
>  
>  #include 
>  
>  #include "util/os_time.h"
>  #include "util/u_string.h"
>  #include "util/u_thread.h"
>  #include "u_process.h"
>  
>  static void
> -util_queue_kill_threads(struct util_queue *queue, unsigned keep_num_threads);
> +util_queue_kill_threads(struct util_queue *queue, unsigned keep_num_threads,
> +bool finish_locked);
>  
>  /
>   * Wait for all queues to assert idle when exit() is called.
>   *
>   * Otherwise, C++ static variable destructors can be called while threads
>   * are using the static variables.
>   */
>  
>  static once_flag atexit_once_flag = ONCE_FLAG_INIT;
>  static struct list_head queue_list;
>  static mtx_t exit_mutex = _MTX_INITIALIZER_NP;
>  
>  static void
>  atexit_handler(void)
>  {
> struct util_queue *iter;
>  
> mtx_lock(&exit_mutex);
> /* Wait for all queues to assert idle. */
> LIST_FOR_EACH_ENTRY(iter, &queue_list, head) {
> -  util_queue_kill_threads(iter, 0);
> +  util_queue_kill_threads(iter, 0, false);
> }
> mtx_unlock(&exit_mutex);
>  }
>  
>  static void
>  global_init(void)
>  {
> LIST_INITHEAD(&queue_list);
> atexit(atexit_handler);
>  }
> @@ -333,20 +334,53 @@ util_queue_create_thread(struct util_queue *queue, 
> unsigned index)
> *
> * Note that Linux only allows decreasing the priority. The original
> * priority can't be restored.
> */
>pthread_setschedparam(queue->threads[index], SCHED_IDLE, &sched_param);
>  #endif
> }
> return true;
>  }
>  
> +void
> +util_queue_adjust_num_threads(struct util_queue *queue, unsigned num_threads)
> +{
> +   num_threads = MIN2(num_threads, queue->max_threads);
> +   num_threads = MAX2(num_threads, 1);
> +
> +   mtx_lock(&queue->finish_lock);
> +   unsigned old_num_threads = queue->num_threads;
> +
> +   if (num_threads == old_num_threads) {
> +  mtx_unlock(&queue->finish_lock);
> +  return;
> +   }
> +
> +   if (num_threads < old_num_threads) {
> +  util_queue_kill_threads(queue, num_threads, true);
> +  mtx_unlock(&queue->finish_lock);
> +  return;
> +   }
> +
> +   /* Create threads.
> +*
> +* We need to update num_threads first, because threads terminate
> +* when thread_index < num_threads.
> +*/
> +   queue->num_threads = num_threads;
> +   for (unsigned i = old_num_threads; i < num_threads; i++) {
> +  if (!util_queue_create_thread(queue, i))
> + break;
> +   }
> +   mtx_unlock(&queue->finish_lock);
> +}
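
The control flow of the function above boils down to a clamp followed by a three-way decision. Here is a standalone sketch of just that logic, with hypothetical names and no threads or locking involved.

```c
#include <assert.h>

enum sketch_action {
   SKETCH_NOTHING,
   SKETCH_KILL_THREADS,
   SKETCH_CREATE_THREADS,
};

/* Clamp the request to [1, max_threads], like the MIN2 / MAX2 pair in the
 * quoted function.
 */
static unsigned
clamp_num_threads(unsigned requested, unsigned max_threads)
{
   assert(max_threads >= 1);

   if (requested > max_threads)
      requested = max_threads;
   if (requested < 1)
      requested = 1;

   return requested;
}

/* Decide what has to happen to move from current to the clamped request. */
static enum sketch_action
plan_adjustment(unsigned current, unsigned requested, unsigned max_threads)
{
   const unsigned target = clamp_num_threads(requested, max_threads);

   if (target == current)
      return SKETCH_NOTHING;

   return target < current ? SKETCH_KILL_THREADS : SKETCH_CREATE_THREADS;
}
```

One consequence of the clamp is that a request above max_threads on a queue already running at max_threads is a no-op.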
> +
>  bool
>  util_queue_init(struct util_queue *queue,
>  const char *name,
>  unsigned max_jobs,
>  unsigned num_threads,
>  unsigned flags)
>  {
> unsigned i;
>  
> /* Form the thread name from process_name and name, limited to 13
> @@ -371,20 +405,21 @@ util_queue_init(struct util_queue *queue,
> memset(queue, 0, sizeof(*queue));
>  
> if (process_len) {
>util_snprintf(queue->name, sizeof(queue->name), "%.*s:%s",
>  process_len, process_name, name);
> } else {
>util_snprintf(queue->name, sizeof(queue->name), "%s", name);
> }
>  
> queue->flags = flags;
> +   queue->max_threads = num_threads;
> queue->num_threads = num_threads;
> queue->max_jobs = max_jobs;
>  
> queue->jobs = (struct util_queue_job*)
>   calloc(max_jobs, sizeof(struct util_queue_job));
> if (!queue->jobs)
>goto fail;
>  
> (void) mtx_init(&queue->lock, mtx_plain);
> (void) mtx_init(&queue->finish_lock, mtx_plain);
> @@ -422,48 +457,51 @@ fail:
>cnd_destroy(&queue->has_queued_cond);
>mtx_destroy(&queue->lock);
>free(queue->jobs);
> }
> /* also util_queue_is_initialized can be used to 

Re: [Mesa-dev] [PATCH 5/7] util/queue: hold a lock when reading num_threads in util_queue_finish

2019-01-03 Thread Ian Romanick
This patch is

Reviewed-by: Ian Romanick 

On 11/28/18 6:59 PM, Marek Olšák wrote:
> From: Marek Olšák 
> 
> ---
>  src/util/u_queue.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/src/util/u_queue.c b/src/util/u_queue.c
> index 5aaf60ae78e..612ad5e83c6 100644
> --- a/src/util/u_queue.c
> +++ b/src/util/u_queue.c
> @@ -582,29 +582,29 @@ util_queue_finish_execute(void *data, int num_thread)
> util_barrier_wait(barrier);
>  }
>  
>  /**
>   * Wait until all previously added jobs have completed.
>   */
>  void
>  util_queue_finish(struct util_queue *queue)
>  {
> util_barrier barrier;
> -   struct util_queue_fence *fences = malloc(queue->num_threads * 
> sizeof(*fences));
> -
> -   util_barrier_init(&barrier, queue->num_threads);
> +   struct util_queue_fence *fences;
>  
> /* If 2 threads were adding jobs for 2 different barries at the same time,
>  * a deadlock would happen, because 1 barrier requires that all threads
>  * wait for it exclusively.
>  */
> mtx_lock(&queue->finish_lock);
> +   fences = malloc(queue->num_threads * sizeof(*fences));
> +   util_barrier_init(&barrier, queue->num_threads);
>  
> for (unsigned i = 0; i < queue->num_threads; ++i) {
>util_queue_fence_init(&fences[i]);
>util_queue_add_job(queue, &barrier, &fences[i], 
> util_queue_finish_execute, NULL);
> }
>  
> for (unsigned i = 0; i < queue->num_threads; ++i) {
>util_queue_fence_wait(&fences[i]);
>util_queue_fence_destroy(&fences[i]);
> }
> 

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 4/7] util/queue: add ability to kill a subset of threads

2019-01-03 Thread Ian Romanick
On 11/28/18 6:59 PM, Marek Olšák wrote:
> From: Marek Olšák 
> 
> for ARB_parallel_shader_compile
> ---
>  src/util/u_queue.c | 49 +-
>  src/util/u_queue.h |  5 ++---
>  2 files changed, 33 insertions(+), 21 deletions(-)
> 
> diff --git a/src/util/u_queue.c b/src/util/u_queue.c
> index 48c5c79552d..5aaf60ae78e 100644
> --- a/src/util/u_queue.c
> +++ b/src/util/u_queue.c
> @@ -26,42 +26,43 @@
>  
>  #include "u_queue.h"
>  
>  #include 
>  
>  #include "util/os_time.h"
>  #include "util/u_string.h"
>  #include "util/u_thread.h"
>  #include "u_process.h"
>  
> -static void util_queue_killall_and_wait(struct util_queue *queue);
> +static void
> +util_queue_kill_threads(struct util_queue *queue, unsigned keep_num_threads);
>  
>  /
>   * Wait for all queues to assert idle when exit() is called.
>   *
>   * Otherwise, C++ static variable destructors can be called while threads
>   * are using the static variables.
>   */
>  
>  static once_flag atexit_once_flag = ONCE_FLAG_INIT;
>  static struct list_head queue_list;
>  static mtx_t exit_mutex = _MTX_INITIALIZER_NP;
>  
>  static void
>  atexit_handler(void)
>  {
> struct util_queue *iter;
>  
> mtx_lock(&exit_mutex);
> /* Wait for all queues to assert idle. */
> LIST_FOR_EACH_ENTRY(iter, &queue_list, head) {
> -  util_queue_killall_and_wait(iter);
> +  util_queue_kill_threads(iter, 0);
> }
> mtx_unlock(&exit_mutex);
>  }
>  
>  static void
>  global_init(void)
>  {
> LIST_INITHEAD(&queue_list);
> atexit(atexit_handler);
>  }
> @@ -259,55 +260,58 @@ util_queue_thread_func(void *input)
>u_thread_setname(name);
> }
>  
> while (1) {
>struct util_queue_job job;
>  
>mtx_lock(&queue->lock);
>assert(queue->num_queued >= 0 && queue->num_queued <= queue->max_jobs);
>  
>/* wait if the queue is empty */
> -  while (!queue->kill_threads && queue->num_queued == 0)
> +  while (thread_index < queue->num_threads && queue->num_queued == 0)
>   cnd_wait(&queue->has_queued_cond, &queue->lock);
>  
> -  if (queue->kill_threads) {
> +  /* only kill threads that are above "num_threads" */
> +  if (thread_index >= queue->num_threads) {
>   mtx_unlock(&queue->lock);
>   break;
>}
>  
>job = queue->jobs[queue->read_idx];
>memset(&queue->jobs[queue->read_idx], 0, sizeof(struct 
> util_queue_job));
>queue->read_idx = (queue->read_idx + 1) % queue->max_jobs;
>  
>queue->num_queued--;
>cnd_signal(&queue->has_space_cond);
>mtx_unlock(&queue->lock);
>  
>if (job.job) {
>   job.execute(job.job, thread_index);
>   util_queue_fence_signal(job.fence);
>   if (job.cleanup)
>  job.cleanup(job.job, thread_index);
>}
> }
>  
> -   /* signal remaining jobs before terminating */
> +   /* signal remaining jobs if all threads are being terminated */
> mtx_lock(&queue->lock);
> -   for (unsigned i = queue->read_idx; i != queue->write_idx;
> -i = (i + 1) % queue->max_jobs) {
> -  if (queue->jobs[i].job) {
> - util_queue_fence_signal(queue->jobs[i].fence);
> - queue->jobs[i].job = NULL;
> +   if (queue->num_threads == 0) {
> +  for (unsigned i = queue->read_idx; i != queue->write_idx;
> +   i = (i + 1) % queue->max_jobs) {
> + if (queue->jobs[i].job) {
> +util_queue_fence_signal(queue->jobs[i].fence);
> +queue->jobs[i].job = NULL;
> + }
>}
> +  queue->read_idx = queue->write_idx;
> +  queue->num_queued = 0;
> }
> -   queue->read_idx = queue->write_idx;
> -   queue->num_queued = 0;
> mtx_unlock(&queue->lock);
> return 0;
>  }
>  
>  static bool
>  util_queue_create_thread(struct util_queue *queue, unsigned index)
>  {
> struct thread_input *input =
>(struct thread_input *) malloc(sizeof(struct thread_input));
> input->queue = queue;
> @@ -418,60 +422,69 @@ fail:
>cnd_destroy(&queue->has_queued_cond);
>mtx_destroy(&queue->lock);
>free(queue->jobs);
> }
> /* also util_queue_is_initialized can be used to check for success */
> memset(queue, 0, sizeof(*queue));
> return false;
>  }
>  
>  static void
> -util_queue_killall_and_wait(struct util_queue *queue)
> +util_queue_kill_threads(struct util_queue *queue, unsigned keep_num_threads)
>  {
> unsigned i;
>  
> /* Signal all threads to terminate. */
> +   mtx_lock(&queue->finish_lock);
> +
> +   if (keep_num_threads >= queue->num_threads) {
> +  mtx_unlock(&queue->finish_lock);
> +  return;
> +   }
> +
> mtx_lock(&queue->lock);
> -   queue->kill_threads = 1;
> +   unsigned old_num_threads = queue->num_threads;
> +   queue->num_threads = keep_num_threads;

Shouldn't this still be set below, after the threads are joined?

Re: [Mesa-dev] [PATCH 3/7] util/queue: move thread creation into a separate function

2019-01-03 Thread Ian Romanick
Looks to be a simple enough refactor.  This patch is

Reviewed-by: Ian Romanick 

On 11/28/18 6:59 PM, Marek Olšák wrote:
> From: Marek Olšák 
> 
> ---
>  src/util/u_queue.c | 56 ++
>  1 file changed, 32 insertions(+), 24 deletions(-)
> 
> diff --git a/src/util/u_queue.c b/src/util/u_queue.c
> index 3812c824b6d..48c5c79552d 100644
> --- a/src/util/u_queue.c
> +++ b/src/util/u_queue.c
> @@ -298,20 +298,51 @@ util_queue_thread_func(void *input)
>   util_queue_fence_signal(queue->jobs[i].fence);
>   queue->jobs[i].job = NULL;
>}
> }
> queue->read_idx = queue->write_idx;
> queue->num_queued = 0;
> mtx_unlock(&queue->lock);
> return 0;
>  }
>  
> +static bool
> +util_queue_create_thread(struct util_queue *queue, unsigned index)
> +{
> +   struct thread_input *input =
> +  (struct thread_input *) malloc(sizeof(struct thread_input));
> +   input->queue = queue;
> +   input->thread_index = index;
> +
> +   queue->threads[index] = u_thread_create(util_queue_thread_func, input);
> +
> +   if (!queue->threads[index]) {
> +  free(input);
> +  return false;
> +   }
> +
> +   if (queue->flags & UTIL_QUEUE_INIT_USE_MINIMUM_PRIORITY) {
> +#if defined(__linux__) && defined(SCHED_IDLE)
> +  struct sched_param sched_param = {0};
> +
> +  /* The nice() function can only set a maximum of 19.
> +   * SCHED_IDLE is the same as nice = 20.
> +   *
> +   * Note that Linux only allows decreasing the priority. The original
> +   * priority can't be restored.
> +   */
> +  pthread_setschedparam(queue->threads[index], SCHED_IDLE, &sched_param);
> +#endif
> +   }
> +   return true;
> +}
> +
>  bool
>  util_queue_init(struct util_queue *queue,
>  const char *name,
>  unsigned max_jobs,
>  unsigned num_threads,
>  unsigned flags)
>  {
> unsigned i;
>  
> /* Form the thread name from process_name and name, limited to 13
> @@ -357,53 +388,30 @@ util_queue_init(struct util_queue *queue,
> queue->num_queued = 0;
> cnd_init(&queue->has_queued_cond);
> cnd_init(&queue->has_space_cond);
>  
> queue->threads = (thrd_t*) calloc(num_threads, sizeof(thrd_t));
> if (!queue->threads)
>goto fail;
>  
> /* start threads */
> for (i = 0; i < num_threads; i++) {
> -  struct thread_input *input =
> - (struct thread_input *) malloc(sizeof(struct thread_input));
> -  input->queue = queue;
> -  input->thread_index = i;
> -
> -  queue->threads[i] = u_thread_create(util_queue_thread_func, input);
> -
> -  if (!queue->threads[i]) {
> - free(input);
> -
> +  if (!util_queue_create_thread(queue, i)) {
>   if (i == 0) {
>  /* no threads created, fail */
>  goto fail;
>   } else {
>  /* at least one thread created, so use it */
>  queue->num_threads = i;
>  break;
>   }
>}
> -
> -  if (flags & UTIL_QUEUE_INIT_USE_MINIMUM_PRIORITY) {
> -   #if defined(__linux__) && defined(SCHED_IDLE)
> - struct sched_param sched_param = {0};
> -
> - /* The nice() function can only set a maximum of 19.
> -  * SCHED_IDLE is the same as nice = 20.
> -  *
> -  * Note that Linux only allows decreasing the priority. The original
> -  * priority can't be restored.
> -  */
> - pthread_setschedparam(queue->threads[i], SCHED_IDLE, &sched_param);
> -   #endif
> -  }
> }
>  
> add_to_atexit_list(queue);
> return true;
>  
>  fail:
> free(queue->threads);
>  
> if (queue->jobs) {
>cnd_destroy(&queue->has_space_cond);
> 



Re: [Mesa-dev] [PATCH 1/7] mesa: implement ARB/KHR_parallel_shader_compile

2019-01-03 Thread Ian Romanick
On 1/3/19 11:40 AM, Ian Romanick wrote:
> On 11/28/18 6:59 PM, Marek Olšák wrote:
>> From: Marek Olšák 
>>
>> Tested by piglit.
> 
> It doesn't look like there are any piglit test

Ignore that.  I started typing, checked the piglit list, then forgot to
delete it.

>> ---
>>  docs/features.txt   |  2 +-
>>  docs/relnotes/19.0.0.html   |  2 ++
>>  src/mapi/glapi/gen/gl_API.xml   | 15 ++-
>>  src/mesa/main/dd.h  |  7 +++
>>  src/mesa/main/extensions_table.h|  2 ++
>>  src/mesa/main/get_hash_params.py|  3 +++
>>  src/mesa/main/hint.c| 12 
>>  src/mesa/main/hint.h|  4 
>>  src/mesa/main/mtypes.h  |  1 +
>>  src/mesa/main/shaderapi.c   | 10 ++
>>  src/mesa/main/tests/dispatch_sanity.cpp |  4 
>>  11 files changed, 60 insertions(+), 2 deletions(-)
>>
>> diff --git a/docs/features.txt b/docs/features.txt
>> index 8999e42519c..7b827de6a92 100644
>> --- a/docs/features.txt
>> +++ b/docs/features.txt
>> @@ -295,21 +295,21 @@ GLES3.2, GLSL ES 3.2 -- all DONE: i965/gen9+, 
>> radeonsi, virgl
>>GL_OES_texture_storage_multisample_2d_array   DONE (all drivers 
>> that support GL_ARB_texture_multisample)
>>  
>>  Khronos, ARB, and OES extensions that are not part of any OpenGL or OpenGL 
>> ES version:
>>  
>>GL_ARB_bindless_texture   DONE (nvc0, 
>> radeonsi)
>>GL_ARB_cl_event   not started
>>GL_ARB_compute_variable_group_sizeDONE (nvc0, 
>> radeonsi)
>>GL_ARB_ES3_2_compatibilityDONE (i965/gen8+, 
>> radeonsi, virgl)
>>GL_ARB_fragment_shader_interlock  DONE (i965)
>>GL_ARB_gpu_shader_int64   DONE (i965/gen8+, 
>> nvc0, radeonsi, softpipe, llvmpipe)
>> -  GL_ARB_parallel_shader_compilenot started, but 
>> Chia-I Wu did some related work in 2014
>> +  GL_ARB_parallel_shader_compileDONE (all drivers)
>>GL_ARB_post_depth_coverageDONE (i965, nvc0)
>>GL_ARB_robustness_isolation   not started
>>GL_ARB_sample_locations   DONE (nvc0)
>>GL_ARB_seamless_cubemap_per_texture   DONE (freedreno, 
>> i965, nvc0, radeonsi, r600, softpipe, swr, virgl)
>>GL_ARB_shader_ballot  DONE (i965/gen8+, 
>> nvc0, radeonsi)
>>GL_ARB_shader_clock   DONE (i965/gen7+, 
>> nv50, nvc0, r600, radeonsi, virgl)
>>GL_ARB_shader_stencil_export  DONE (i965/gen9+, 
>> r600, radeonsi, softpipe, llvmpipe, swr, virgl)
>>GL_ARB_shader_viewport_layer_arrayDONE (i965/gen6+, 
>> nvc0, radeonsi)
>>GL_ARB_sparse_buffer  DONE (radeonsi/CIK+)
>>GL_ARB_sparse_texture not started
>> diff --git a/docs/relnotes/19.0.0.html b/docs/relnotes/19.0.0.html
>> index bc1776e8f4e..540482bca5f 100644
>> --- a/docs/relnotes/19.0.0.html
>> +++ b/docs/relnotes/19.0.0.html
>> @@ -33,24 +33,26 @@ Compatibility contexts may report a lower version 
>> depending on each driver.
>>  SHA256 checksums
>>  
>>  TBD.
>>  
>>  
>>  
>>  New features
>>  
>>  
>>  GL_AMD_texture_texture4 on all GL 4.0 drivers.
>> +GL_ARB_parallel_shader_compile on all drivers.
>>  GL_EXT_shader_implicit_conversions on all drivers (ES extension).
>>  GL_EXT_texture_compression_bptc on all GL 4.0 drivers (ES 
>> extension).
>>  GL_EXT_texture_compression_rgtc on all GL 3.0 drivers (ES 
>> extension).
>>  GL_EXT_texture_view on drivers supporting texture views (ES 
>> extension).
>> +GL_KHR_parallel_shader_compile on all drivers.
>>  GL_OES_texture_view on drivers supporting texture views (ES 
>> extension).
>>  
>>  
>>  Bug fixes
>>  
>>  
>>  TBD
>>  
>>  
>>  Changes
>> diff --git a/src/mapi/glapi/gen/gl_API.xml b/src/mapi/glapi/gen/gl_API.xml
>> index f4d0808f13b..4ce691b361b 100644
>> --- a/src/mapi/glapi/gen/gl_API.xml
>> +++ b/src/mapi/glapi/gen/gl_API.xml
>> @@ -8402,21 +8402,34 @@
>>  
>>  
>>  
>>  
>>  
>>  

Re: [Mesa-dev] [PATCH 1/7] mesa: implement ARB/KHR_parallel_shader_compile

2019-01-03 Thread Ian Romanick
On 11/28/18 6:59 PM, Marek Olšák wrote:
> From: Marek Olšák 
> 
> Tested by piglit.

It doesn't look like there are any piglit test

> ---
>  docs/features.txt   |  2 +-
>  docs/relnotes/19.0.0.html   |  2 ++
>  src/mapi/glapi/gen/gl_API.xml   | 15 ++-
>  src/mesa/main/dd.h  |  7 +++
>  src/mesa/main/extensions_table.h|  2 ++
>  src/mesa/main/get_hash_params.py|  3 +++
>  src/mesa/main/hint.c| 12 
>  src/mesa/main/hint.h|  4 
>  src/mesa/main/mtypes.h  |  1 +
>  src/mesa/main/shaderapi.c   | 10 ++
>  src/mesa/main/tests/dispatch_sanity.cpp |  4 
>  11 files changed, 60 insertions(+), 2 deletions(-)
> 
> diff --git a/docs/features.txt b/docs/features.txt
> index 8999e42519c..7b827de6a92 100644
> --- a/docs/features.txt
> +++ b/docs/features.txt
> @@ -295,21 +295,21 @@ GLES3.2, GLSL ES 3.2 -- all DONE: i965/gen9+, radeonsi, 
> virgl
>GL_OES_texture_storage_multisample_2d_array   DONE (all drivers 
> that support GL_ARB_texture_multisample)
>  
>  Khronos, ARB, and OES extensions that are not part of any OpenGL or OpenGL 
> ES version:
>  
>GL_ARB_bindless_texture   DONE (nvc0, radeonsi)
>GL_ARB_cl_event   not started
>GL_ARB_compute_variable_group_sizeDONE (nvc0, radeonsi)
>GL_ARB_ES3_2_compatibilityDONE (i965/gen8+, 
> radeonsi, virgl)
>GL_ARB_fragment_shader_interlock  DONE (i965)
>GL_ARB_gpu_shader_int64   DONE (i965/gen8+, 
> nvc0, radeonsi, softpipe, llvmpipe)
> -  GL_ARB_parallel_shader_compilenot started, but 
> Chia-I Wu did some related work in 2014
> +  GL_ARB_parallel_shader_compileDONE (all drivers)
>GL_ARB_post_depth_coverageDONE (i965, nvc0)
>GL_ARB_robustness_isolation   not started
>GL_ARB_sample_locations   DONE (nvc0)
>GL_ARB_seamless_cubemap_per_texture   DONE (freedreno, 
> i965, nvc0, radeonsi, r600, softpipe, swr, virgl)
>GL_ARB_shader_ballot  DONE (i965/gen8+, 
> nvc0, radeonsi)
>GL_ARB_shader_clock   DONE (i965/gen7+, 
> nv50, nvc0, r600, radeonsi, virgl)
>GL_ARB_shader_stencil_export  DONE (i965/gen9+, 
> r600, radeonsi, softpipe, llvmpipe, swr, virgl)
>GL_ARB_shader_viewport_layer_arrayDONE (i965/gen6+, 
> nvc0, radeonsi)
>GL_ARB_sparse_buffer  DONE (radeonsi/CIK+)
>GL_ARB_sparse_texture not started
> diff --git a/docs/relnotes/19.0.0.html b/docs/relnotes/19.0.0.html
> index bc1776e8f4e..540482bca5f 100644
> --- a/docs/relnotes/19.0.0.html
> +++ b/docs/relnotes/19.0.0.html
> @@ -33,24 +33,26 @@ Compatibility contexts may report a lower version 
> depending on each driver.
>  SHA256 checksums
>  
>  TBD.
>  
>  
>  
>  New features
>  
>  
>  GL_AMD_texture_texture4 on all GL 4.0 drivers.
> +GL_ARB_parallel_shader_compile on all drivers.
>  GL_EXT_shader_implicit_conversions on all drivers (ES extension).
>  GL_EXT_texture_compression_bptc on all GL 4.0 drivers (ES extension).
>  GL_EXT_texture_compression_rgtc on all GL 3.0 drivers (ES extension).
>  GL_EXT_texture_view on drivers supporting texture views (ES 
> extension).
> +GL_KHR_parallel_shader_compile on all drivers.
>  GL_OES_texture_view on drivers supporting texture views (ES 
> extension).
>  
>  
>  Bug fixes
>  
>  
>  TBD
>  
>  
>  Changes
> diff --git a/src/mapi/glapi/gen/gl_API.xml b/src/mapi/glapi/gen/gl_API.xml
> index f4d0808f13b..4ce691b361b 100644
> --- a/src/mapi/glapi/gen/gl_API.xml
> +++ b/src/mapi/glapi/gen/gl_API.xml
> @@ -8402,21 +8402,34 @@
>  
>  
>  
>  
>  
>  
>  
>  
>   xmlns:xi="http://www.w3.org/2001/XInclude"/>
>  
> -
> +
> +
> +
> +
> +
> +
> +
> +
> + alias="MaxShaderCompilerThreadsKHR">
> +
> +
> +
> +
> +
>  
>   xmlns:xi="http://www.w3.org/2001/XInclude"/>
>  
>  
>  
>  
>  
>  
>  
>  
> diff --git a/src/mesa/main/dd.h b/src/mesa/main/dd.h
> index f14c3e04e91..92b6ecac33c 100644
> --- a/src/mesa/main/dd.h
> +++ b/src/mesa/main/dd.h
> @@ -1292,20 +1292,27 @@ struct dd_function_table {
> /**
>  * Called to initialize gl_program::driver_cache_blob (and size) with a
>  * ralloc allocated buffer.
>  *
>  * This buffer will be saved and restored as part of the gl_program
>  * serialization and deserialization.
>  */
> void (*ShaderCacheSerializeDriverBlob)(struct gl_context *ctx,
>struct gl_program *prog

Re: [Mesa-dev] [PATCH 05/16] nir: combine fmul and fadd across ffma operations

2019-01-02 Thread Ian Romanick
On 12/19/18 8:39 AM, Jonathan Marek wrote:
> This works by moving the fadd up across the ffma operations, so that it
> can eventually be combined with an fmul. I'm not sure it works in all
> cases, but it works in all the common cases.
> 
> This will only affect freedreno since it is the only driver using the
> fuse_ffma option.

tl;dr: Optimal generation of FFMAs is much more difficult than you would
think it should be.  You should collect some actual data before landing
this.

Any change to ffma generation is likely to have massive, unforeseen
changes to lots of shaders.  Seemingly simple, obvious changes result in
changes to live ranges, register pressure, scheduling, constant folding,
and on, and on.

I took this patch, substituted !options->lower_ffma for
options->fuse_ffma in the pattern you added, and ran it through
shader-db for Skylake and Haswell.  As I expected, the results were just
all over the place (see below).  Notice that register spills are helped
on one platform but hurt on the other.

There are some simple rules in nir_opt_algebraic for generating and
reassociating ffmas.  Given the complex interactions with live ranges,
register pressure, and scheduling, I feel like ffma generation should
happen much, much later in the process... it should almost certainly be
deep in the backend where register pressure and scheduling information
are available.

The Intel compiler has its own pass for ffma generation, and I've found
that it makes really, really bad choices due to lack of this information.
For example, consider a sequence like

(shaderInputA * uniformB) + (texture(...) * shaderInputC)

There are two ways to generate an ffma from that.  One will schedule
well, and the other will be horrible.  You /probably/ want

ffma(texture(...), shaderInputC, (shaderInputA * uniformB))

so that the first multiply can happen during the latency of the texture
lookup.  But maybe not.  Maybe shaderInputA and uniformB are still live
after the multiply and storing the result of the multiply pushes
register pressure too high.
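
As a rough illustration of the two choices, here is a minimal sketch in C.
It is not code from any Mesa backend: the struct, the function names, and
ffma_model() are made up for this example, and ffma_model() is an ordinary
unfused multiply-add standing in for the hardware instruction.

```c
#include <assert.h>

/* Hypothetical stand-ins for the operands in the expression above. */
struct operands {
   float input_a;   /* shaderInputA */
   float uniform_b; /* uniformB */
   float tex;       /* result of texture(...), assumed to arrive late */
   float input_c;   /* shaderInputC */
};

/* Models ffma(a, b, c) = a * b + c.  Not actually fused. */
static float
ffma_model(float a, float b, float c)
{
   return a * b + c;
}

/* Multiply the cheap operands first; the texture result is only needed
 * by the final ffma, so the first multiply can overlap the lookup.
 */
static float
fuse_texture_last(const struct operands *o)
{
   float early = o->input_a * o->uniform_b;
   return ffma_model(o->tex, o->input_c, early);
}

/* The other ordering: the multiply itself has to wait for the texture
 * result before the ffma can be issued.
 */
static float
fuse_texture_first(const struct operands *o)
{
   float late = o->tex * o->input_c;
   return ffma_model(o->input_a, o->uniform_b, late);
}

/* Both orderings produce the same value in this unfused model. */
static void
check_orderings_agree(const struct operands *o)
{
   assert(fuse_texture_last(o) == fuse_texture_first(o));
}
```

The two functions return the same value; what differs on real hardware is
which multiply can overlap the texture latency and which temporary has to
stay live, and the message argues that this information is only available
deep in the backend.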

Right now our ffma pass is greedy.  If it sees a*b+c, it will always
generate ffma(a, b, c), regardless of whether or not c is also a
multiply.  In one of my experiments, I flipped the logic so a*b+c*d
would always generate ffma(c, d, a*b).  The number of helped and hurt
shaders was very close to even.  Some shaders were helped by a huge
amount, and other shaders were hurt by an equally huge amount.  I also
tried not generating an ffma at all for the a*b+c*d case.  My
recollection is that a few shaders were helped by a large amount, and
many thousands of shaders were hurt by small amounts.

If I add it all up, I probably spent several weeks last year poking at
changes like this in our ffma pass.  It began to feel like the old woman
who swallowed a fly.  Every change helped some things, but it made other
things fall off a cliff.  The next fix helped a few of the things
damaged by the previous change, but it made other things fall off a
different cliff.  I eventually abandoned the project.  If I ever pick it
back up, it will be as a pass that occurs closer to scheduling and
register allocation.

Skylake
total instructions in shared programs: 15031138 -> 15035206 (0.03%)
instructions in affected programs: 1230624 -> 1234692 (0.33%)
helped: 1428
HURT: 1067
helped stats (abs) min: 1 max: 671 x̄: 7.08 x̃: 3
helped stats (rel) min: 0.04% max: 24.72% x̄: 2.30% x̃: 1.78%
HURT stats (abs)   min: 1 max: 1601 x̄: 13.29 x̃: 4
HURT stats (rel)   min: 0.05% max: 352.64% x̄: 4.42% x̃: 2.35%
95% mean confidence interval for instructions value: 0.03 3.23
95% mean confidence interval for instructions %-change: 0.24% 0.91%
Instructions are HURT.

total cycles in shared programs: 369712682 -> 370166527 (0.12%)
cycles in affected programs: 128542483 -> 128996328 (0.35%)
helped: 1679
HURT: 2639
helped stats (abs) min: 1 max: 27317 x̄: 162.81 x̃: 18
helped stats (rel) min: <.01% max: 60.25% x̄: 2.34% x̃: 1.38%
HURT stats (abs)   min: 1 max: 57100 x̄: 275.56 x̃: 58
HURT stats (rel)   min: <.01% max: 147.37% x̄: 8.62% x̃: 5.01%
95% mean confidence interval for cycles value: 61.86 148.35
95% mean confidence interval for cycles %-change: 4.06% 4.66%
Cycles are HURT.

total spills in shared programs: 10158 -> 9688 (-4.63%)
spills in affected programs: 1829 -> 1359 (-25.70%)
helped: 140
HURT: 3

total fills in shared programs: 22117 -> 21371 (-3.37%)
fills in affected programs: 2575 -> 1829 (-28.97%)
helped: 140
HURT: 3

LOST:   7
GAINED: 0


Haswell
total instructions in shared programs: 13625863 -> 13635875 (0.07%)
instructions in affected programs: 1554579 -> 1564591 (0.64%)
helped: 844
HURT: 1651
helped stats (abs) min: 1 max: 96 x̄: 4.16 x̃: 3
helped stats (rel) min: 0.04% max: 10.26% x̄: 1.91% x̃: 1.90%
HURT stats (abs)   min: 1 max: 1602 x̄: 8.19 x̃: 5
HURT stats (rel)   min: 0.10% max: 346.00% x̄: 2.97% x̃: 1.45%
95% mean confidence interval for instructions value: 2.70 5.33
95% mean confidenc

Re: [Mesa-dev] [PATCH 04/16] nir: add nir_lower_bool_to_float

2019-01-02 Thread Ian Romanick
On 12/19/18 9:25 AM, Dylan Baker wrote:
> Quoting Jonathan Marek (2018-12-19 08:39:53)
>> Mainly a copy of nir_lower_bool_to_int32, but with float opcodes.
>>
>> Signed-off-by: Jonathan Marek 
>> ---
>>  src/compiler/Makefile.sources  |   1 +
>>  src/compiler/nir/meson.build   |   3 +-
>>  src/compiler/nir/nir.h |   1 +
>>  src/compiler/nir/nir_lower_bool_to_float.c | 165 +
>>  4 files changed, 169 insertions(+), 1 deletion(-)
>>  create mode 100644 src/compiler/nir/nir_lower_bool_to_float.c
>>
>> diff --git a/src/compiler/Makefile.sources b/src/compiler/Makefile.sources
>> index ef47bdb33b..39eaedc658 100644
>> --- a/src/compiler/Makefile.sources
>> +++ b/src/compiler/Makefile.sources
>> @@ -231,6 +231,7 @@ NIR_FILES = \
>> nir/nir_lower_atomics_to_ssbo.c \
>> nir/nir_lower_bitmap.c \
>> nir/nir_lower_bit_size.c \
>> +   nir/nir_lower_bool_to_float.c \
>> nir/nir_lower_bool_to_int32.c \
>> nir/nir_lower_clamp_color_outputs.c \
>> nir/nir_lower_clip.c \
>> diff --git a/src/compiler/nir/meson.build b/src/compiler/nir/meson.build
>> index e252f64539..f1016104af 100644
>> --- a/src/compiler/nir/meson.build
>> +++ b/src/compiler/nir/meson.build
>> @@ -114,6 +114,7 @@ files_libnir = files(
>>'nir_lower_alpha_test.c',
>>'nir_lower_atomics_to_ssbo.c',
>>'nir_lower_bitmap.c',
>> +  'nir_lower_bool_to_float.c',
>>'nir_lower_bool_to_int32.c',
>>'nir_lower_clamp_color_outputs.c',
>>'nir_lower_clip.c',
>> @@ -248,7 +249,7 @@ if with_tests
>>include_directories : [inc_common],
>>dependencies : [dep_thread, idep_gtest, idep_nir],
>>link_with : libmesa_util,
>> -), 
>> +),
> 
> This looks like stray whitespace?

It's deleting a stray (incorrect?) whitespace.  I'm usually not fond of
slipping unrelated changes into a commit... but who's going to send a
1-line patch that deletes a single space character? :)

> 
> other than that, for the build system bits:
> Reviewed-by: Dylan Baker 
> 
>>  suite : ['compiler', 'nir'],
>>)
>>  
>> diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
>> index 54f9c64a3a..f6d0bdf7ec 100644
>> --- a/src/compiler/nir/nir.h
>> +++ b/src/compiler/nir/nir.h
>> @@ -2905,6 +2905,7 @@ void nir_lower_alpha_test(nir_shader *shader, enum 
>> compare_func func,
>>bool alpha_to_one);
>>  bool nir_lower_alu(nir_shader *shader);
>>  bool nir_lower_alu_to_scalar(nir_shader *shader);
>> +bool nir_lower_bool_to_float(nir_shader *shader);
>>  bool nir_lower_bool_to_int32(nir_shader *shader);
>>  bool nir_lower_load_const_to_scalar(nir_shader *shader);
>>  bool nir_lower_read_invocation_to_scalar(nir_shader *shader);
>> diff --git a/src/compiler/nir/nir_lower_bool_to_float.c 
>> b/src/compiler/nir/nir_lower_bool_to_float.c
>> new file mode 100644
>> index 00..2756a1815f
>> --- /dev/null
>> +++ b/src/compiler/nir/nir_lower_bool_to_float.c
>> @@ -0,0 +1,165 @@
>> +/*
>> + * Copyright © 2018 Intel Corporation
>> + *
>> + * Permission is hereby granted, free of charge, to any person obtaining a
>> + * copy of this software and associated documentation files (the 
>> "Software"),
>> + * to deal in the Software without restriction, including without limitation
>> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
>> + * and/or sell copies of the Software, and to permit persons to whom the
>> + * Software is furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice (including the next
>> + * paragraph) shall be included in all copies or substantial portions of the
>> + * Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS 
>> OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
>> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR 
>> OTHER
>> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
>> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER 
>> DEALINGS
>> + * IN THE SOFTWARE.
>> + */
>> +
>> +#include "nir.h"
>> +
>> +static bool
>> +assert_ssa_def_is_not_1bit(nir_ssa_def *def, UNUSED void *unused)
>> +{
>> +   assert(def->bit_size > 1);
>> +   return true;
>> +}
>> +
>> +static bool
>> +rewrite_1bit_ssa_def_to_32bit(nir_ssa_def *def, void *_progress)
>> +{
>> +   bool *progress = _progress;
>> +   if (def->bit_size == 1) {
>> +  def->bit_size = 32;
>> +  *progress = true;
>> +   }
>> +   return true;
>> +}
>> +
>> +static bool
>> +lower_alu_instr(nir_alu_instr *alu)
>> +{
>> +   const nir_op_info *op_info = &nir_op_infos[alu->op];
>> +
>> +   switch (alu->op) {
>> +   case nir_op_vec2:
>> +   case nir_op_vec3:
>> +   case nir_op_vec4:
>> +  /* These we expect to have boo

[Mesa-dev] MR: nir/algebraic: Don't put quotes around floating point literals

2018-12-18 Thread Ian Romanick
https://gitlab.freedesktop.org/mesa/mesa/merge_requests/30

NOTE: This is still running through our CI, and I'm trying to come up
with a piglit test case that will reproduce it.

The quotation marks around 1.0 cause it to be treated as a string
instead of a floating point value. The generator then treats it as an
arbitrary variable replacement, so any iand involving a ('ineg', ('b2i',
a)) matches.

Signed-off-by: Ian Romanick 
Fixes: 6bcd2af0863 ("nir/algebraic: Add some optimizations for D3D-style
Booleans")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109075


[Mesa-dev] MR: Revert "nir/lower_indirect: Bail early if modes == 0"

2018-12-18 Thread Ian Romanick
https://gitlab.freedesktop.org/mesa/mesa/merge_requests/26

"There's no point in walking the program if we're never going to
actually lower anything."

Except we might lower compacted local arrays.  In that case, modes will
be 0, but there is still lowering to be done.

This reverts commit 7f75cf2a.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109081
Suggested-by: Kenneth Graunke kenn...@whitecape.org
Cc: Jason Ekstrand ja...@jlekstrand.net
Cc: Kenneth Graunke kenn...@whitecape.org


[Mesa-dev] MR: NIR: fsign optimizations and a couple 1-bit Boolean optimizations too.

2018-12-18 Thread Ian Romanick
https://gitlab.freedesktop.org/mesa/mesa/merge_requests/27

This series mostly improves code generation for cases of x*fsign(y).  I
had some difficulty getting everything working after rebasing on the
1-bit Boolean changes.  In that process, I noticed a couple things that
led to the last two patches.

shader-db results across the whole series for Skylake:

total instructions in shared programs: 14956605 -> 14941001 (-0.10%)
instructions in affected programs: 991239 -> 975635 (-1.57%)
helped: 4710
HURT: 0
helped stats (abs) min: 1 max: 221 x̄: 3.31 x̃: 2
helped stats (rel) min: 0.07% max: 10.00% x̄: 1.85% x̃: 1.38%
95% mean confidence interval for instructions value: -3.49 -3.14
95% mean confidence interval for instructions %-change: -1.89% -1.82%
Instructions are helped.

total cycles in shared programs: 359972474 -> 359757117 (-0.06%)
cycles in affected programs: 16235692 -> 16020335 (-1.33%)
helped: 4723
HURT: 473
helped stats (abs) min: 1 max: 2086 x̄: 50.91 x̃: 26
helped stats (rel) min: 0.01% max: 29.14% x̄: 2.81% x̃: 2.34%
HURT stats (abs)   min: 1 max: 1836 x̄: 53.03 x̃: 10
HURT stats (rel)   min: <.01% max: 34.36% x̄: 2.54% x̃: 0.57%
95% mean confidence interval for cycles value: -44.91 -37.98
95% mean confidence interval for cycles %-change: -2.40% -2.24%
Cycles are helped.

total spills in shared programs: 10056 -> 10055 (<.01%)
spills in affected programs: 54 -> 53 (-1.85%)
helped: 1
HURT: 0

total fills in shared programs: 21788 -> 21787 (<.01%)
fills in affected programs: 151 -> 150 (-0.66%)
helped: 1
HURT: 0


Re: [Mesa-dev] MR: NIR: Partial redundancy elimination for compares

2018-12-17 Thread Ian Romanick
I was really hoping to avoid having review discussion for a patch series
on both the mailing list and the MR, but I guess that ship has sailed.

On 12/17/18 2:39 PM, Roland Scheidegger wrote:
> Am 17.12.18 um 23:27 schrieb Roland Scheidegger:
>> Am 17.12.18 um 23:07 schrieb Ilia Mirkin:
>>> On Mon, Dec 17, 2018 at 5:05 PM Ian Romanick  wrote:
>>>>
>>>> https://gitlab.freedesktop.org/mesa/mesa/merge_requests/22
>>>>
>>>> This series adds a new optimization pass that tries to replace code
>>>> sequences like
>>>>
>>>> if (x < y) {
>>>> z = y - x;
>>>> ...
>>>> }
>>>>
>>>> with a sequence like
>>>>
>>>> t = x - y;
>>>> if (t < 0) {
>>>> z = -t;
>>>> ...
>>>> }
>>>
>>> Is it worth worrying about infinities? e.g. if x = -Infinity, y =
>>> Infinity, "x < y" will be true, but "x - y < 0" will not be (pretty
>>> sure it'll be a NaN, which is not < 0).
>>
>> I was wondering the same, but I think this should still work.
>> -Inf - Inf = -Inf, so no problem there.
> 
> Although it looks like the optimization might be a bit problematic? For
> it to work you really need not just be able to use flags generated by
> sub, but also you need to be able to eliminate the negation one way or
> another (e.g. free input negate going into another operation, turning
> subsequent adds into subs, ...). But well maybe that's often possible.

The optimization handles various combinations of 'x < y', 'x > y', 'x <=
y', and 'x >= y' combined with various combinations of 'x - y' and 'y -
x'.  Not all of the cases require a negation in the replacement.

Right now this is only enabled in the i965 driver, and negation is free
on that architecture.  If this is enabled on other architectures,
someone might want to add a flag that limits the kind of replacements to
avoid the negation... even that may be overly conservative.  In the
example case, it's possible that the only use of z is in an expression
like 'w - z'.
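
As a concrete illustration of the example case, here is a minimal C sketch of
the two shapes. The function names are made up, the real pass operates on NIR
rather than C, and the sketch assumes IEEE 754 floats without flush-to-zero.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>
#include <stddef.h>

/* Original shape: compare, then subtract inside the taken branch. */
static bool
diff_before(float x, float y, float *z)
{
   assert(z != NULL);

   if (x < y) {
      *z = y - x;
      return true;
   }

   return false;
}

/* Rewritten shape: one subtract feeds both the compare and the result. */
static bool
diff_after(float x, float y, float *z)
{
   assert(z != NULL);

   const float t = x - y;

   if (t < 0.0f) {
      *z = -t;   /* the negation is the part that may or may not be free */
      return true;
   }

   return false;
}
```

With x = -INFINITY and y = INFINITY, t is -INFINITY, so both versions take
the branch and store +INFINITY. With x and y both INFINITY, t is NaN and
neither version takes the branch.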

> Roland
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] PSA: Please send MRs to the mailing list

2018-12-17 Thread Ian Romanick
On 12/16/18 4:54 PM, Ilia Mirkin wrote:
> A diffstat would also be nice as it would better inform people reading
> emails whether they need to care or not.

That's a good idea.  I tagged the message that I just sent with MR: to
make it more obvious that it's for a merge request.  I saw Jason's
messages, but I didn't realize they were merge request notices until
after he started this thread. :)

> On Sun, Dec 16, 2018 at 7:53 PM Jason Ekstrand  wrote:
>>
>> One of these days, we will hopefully have a script to just do this for us.  
>> In the meantime, manual isn't too bad.
>>
>> On Sun, Dec 16, 2018 at 6:52 PM Jason Ekstrand  wrote:
>>>
>>> I don't know if it was actually in the doc that Jordan wrote up but it's 
>>> courteous of you to send a quick e-mail to the mailing list when you create 
>>> a new MR so that people who aren't regularly trolling the list of MRs are 
>>> at least aware that it exists.  Of the 20 MRs that have been posted so far, 
>>> I think I'm the only one doing this.  I'm a big fan of MRs but I also don't 
>>> want us MR fans to anger the list. :-)
>>>
>>> --Jason
>>


[Mesa-dev] MR: NIR: Partial redundancy elimination for compares

2018-12-17 Thread Ian Romanick
https://gitlab.freedesktop.org/mesa/mesa/merge_requests/22

This series adds a new optimization pass that tries to replace code
sequences like

if (x < y) {
z = y - x;
...
}

with a sequence like

t = x - y;
if (t < 0) {
z = -t;
...
}

On architectures where the subtract can generate the flags used by the
if-statement, this saves an instruction. It's also possible that moving
an instruction out of the if-statement will allow
nir_opt_peephole_select to convert the whole thing to a bcsel.

Currently only floating point compares and adds are supported. Adding
support for integer will be a challenge due to integer overflow. There
are a couple possible solutions, but they may not apply to all
architectures.
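
The integer problem can be sketched in a few lines of C. This is only an
illustration with made-up function names: it models a 32-bit wrapping
subtract and tests the sign bit of the result, which is the naive integer
analogue of the float rewrite above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Reference result: the compare the source program asked for. */
static bool
less_direct(int32_t x, int32_t y)
{
   return x < y;
}

/*
 * Naive rewrite: test the sign of a wrapping x - y.  The subtract is done
 * on uint32_t because signed overflow is undefined behavior in C.
 */
static bool
less_via_sub(int32_t x, int32_t y)
{
   const uint32_t t = (uint32_t)x - (uint32_t)y;

   return (t >> 31) != 0;
}

/* The two agree for small values and disagree once x - y overflows. */
static void
overflow_example(void)
{
   assert(less_direct(1, 3) && less_via_sub(1, 3));
   assert(less_direct(INT32_MIN, 1));
   assert(!less_via_sub(INT32_MIN, 1));   /* difference wraps to INT32_MAX */
}
```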

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] Let's talk about -DDEBUG

2018-12-12 Thread Ian Romanick
On 12/12/18 4:13 PM, Dylan Baker wrote:
> Quoting Rob Clark (2018-12-12 15:52:47)
>> On Wed, Dec 12, 2018 at 6:25 PM Dylan Baker  wrote:
>>>
>>> In the autotools discussion I've come to realize that we also need to talk
>>> about the -DDEBUG guard. It seems that there are two different uses, and
>>> thus two different asks about it:
>>>
>>> - Nine (and RadeonSI?) use -DDEBUG to hide generic debugging
>>> - NIR and Intel (at least) use -DDEBUG to hide really expensive checks that
>>>   are useful, but necessarily tank performance.
>>>
>>> The first group would like -DDEBUG in debugoptimized builds, the second
>>> obviously doesn't.
>>>
>>> Is the right solution to move the first group to !NDEBUG, or would it be
>>> better to split DEBUG into two different defines such as DEBUG_MESSAGES and
>>> EXPENSIVE_VALIDATION (paint the bikeshed whatever color you like), with the
>>> first for both debug and debugoptimized and the second only in debug builds?
>>
>> I guess my use cases for !=release builds are:
>>
>> + I want all the expensive checking because I'm not in it to win the
>>   deqp/piglit fps race
>> + I want debug syms for profiling and/or valgrind, but otherwise
>>   want something close to a release build but with debug syms
>>
>>
>> That said, I can get behind replacing DEBUG with !NDEBUG or
>> EXPENSIVE_DEBUG or whatever permutation of that color folks prefer
>>
>>
>> BR,
>> -R
> 
> I guess I should have covered that:
> 
> autotools had effectively two build types "debug" and "not debug", "debug" set
> "-DDEBUG -g -O2", "not debug" set -DNDEBUG
> 
> Meson has 4 build types, and a separate toggle for NDEBUG:
> debug: -O0 -DDEBUG (we add -DDEBUG)
> debugoptimized: -O2 -g
> release: -O2
> plain: (nothing)

About 5 minutes into using meson, I started using plain builds with my
own flags set.  I can configure it *exactly* the way I want without
bothering or being bothered by anyone.  I haven't looked back since.
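
For readers following along, the split proposed earlier in the thread could
look roughly like the C sketch below. DEBUG_MESSAGES and EXPENSIVE_VALIDATION
are the placeholder names from Dylan's mail, not existing Mesa defines, and
the functions are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

#ifdef EXPENSIVE_VALIDATION
/* Stand-in for a costly whole-structure check (debug builds only). */
static void
validate_sorted(const int *v, size_t n)
{
   for (size_t i = 1; i < n; i++) {
      if (v[i - 1] > v[i])
         abort();
   }
}
#endif

/* Returns the smallest element of a sorted, non-empty array. */
static int
smallest(const int *v, size_t n)
{
   /* Cheap check: present unless the build defines NDEBUG. */
   assert(n > 0);

#ifdef EXPENSIVE_VALIDATION
   validate_sorted(v, n);
#endif

#ifdef DEBUG_MESSAGES
   /* Generic debugging output (debug and debugoptimized builds). */
   fprintf(stderr, "smallest: n=%zu\n", n);
#endif

   return v[0];
}
```

With a plain build, each guard is then a matter of passing -DDEBUG_MESSAGES,
-DEXPENSIVE_VALIDATION, or -DNDEBUG in the C flags.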

> Meson doesn't define NDEBUG by default, so if you want to turn off asserts you
> need to add -Db_ndebug=true
> 
> autotools debug is roughly between meson's debugoptimized and debug, while
> autotools non-debug corresponds to meson's plain buildtype.
> 
> Dylan
> 
> 


Re: [Mesa-dev] [PATCH v2] docs: Document GitLab merge request process (email alternative)

2018-12-11 Thread Ian Romanick
It's fairly common for Mesa developers to have several patch series on
the mailing list at a time.  I believe most people will author these as
a continuous stream with either implicit dependencies (i.e., commit
messages in the second series with shader-db results that are impacted
by the first series) or explicit dependencies (i.e., second series uses
interfaces added or changed by the first).  When I do this, I still like
to send the series out as separate sets of patches that accomplish
separate, logical tasks.

We can't be the only project where people work like this, so what is the
common practice in other projects?  I've thought of several possible
solutions, but each seems fatally flawed.

- A single, possibly giant, MR defeats the ability to have separate
logical work units.

- Splitting into multiple MRs based from master may not be practical or
possible without including some patches in both series.

- Waiting to send the second series out may increase the lag in the
review process or lead to the submitter forgetting to submit. :)

On 12/5/18 3:32 PM, Jordan Justen wrote:
> This documents a process for using GitLab Merge Requests as a second
> way to submit code changes for Mesa. Only one of the two methods is
> allowed for each patch series.
> 
> We will *not* require all patches to be emailed. Some code changes may
> be reviewed and merged without any discussion on the mesa-dev email
> list.
> 
> v2:
>  * No longer require email. Allow submitter to choose email or a
>GitLab merge request.
>  * Various feedback from Brian, Daniel, Dylan, Eric, Erik, Jason,
>Matt, Michel and Rob.
> 
> Signed-off-by: Jordan Justen 
> ---
>  docs/submittingpatches.html | 76 ++---
>  1 file changed, 71 insertions(+), 5 deletions(-)
> 
> diff --git a/docs/submittingpatches.html b/docs/submittingpatches.html
> index 92d954a2d09..21175988d0b 100644
> --- a/docs/submittingpatches.html
> +++ b/docs/submittingpatches.html
> @@ -21,7 +21,7 @@
>  Basic guidelines
>  Patch formatting
>  Testing Patches
> -Mailing Patches
> +Submitting Patches
>  Reviewing Patches
>  Nominating a commit for a stable branch
>  Criteria for accepting patches to the stable 
> branch
> @@ -42,8 +42,10 @@ components.
>  git bisect.)
>  Patches should be properly formatted.
>  Patches should be sufficiently tested before 
> submitting.
> -Patches should be submitted to mesa-dev
> -for review using git send-email.
> +Patches should be submitted
> +to mesa-dev or with
> +a merge request
> +for review.
>  
>  
>  
> @@ -180,10 +182,19 @@ run.
>  
>  
>  
> -Mailing Patches
> +Submitting Patches
>  
>  
> -Patches should be sent to the mesa-dev mailing list for review:
> +Patches may be submitted to the Mesa project by
> +email or with a
> +GitLab merge request. To prevent
> +duplicate code review, only use one method to submit your changes.
> +
> +
> +Mailing Patches
> +
> +
> +Patches may be sent to the mesa-dev mailing list for review:
>  https://lists.freedesktop.org/mailman/listinfo/mesa-dev";>
>  mesa-dev@lists.freedesktop.org.
>  When submitting a patch make sure to use
> @@ -217,8 +228,63 @@ disabled before sending your patches. (Note that you may 
> need to contact
>  your email administrator for this.)
>  
>  
> +GitLab Merge Requests
> +
> +
> +  https://gitlab.freedesktop.org/mesa/mesa";>GitLab Merge
> +  Requests (MR) can also be used to submit patches for Mesa.
> +
> +
> +
> +  If the MR may have interest for most of the Mesa community, you can
> +  send an email to the mesa-dev email list including a link to the MR.
> +  Don't send the patch to mesa-dev, just the MR link.
> +
> +
> +  Add labels to your MR to help reviewers find it. For example:
> +  
> +Mesa changes affecting all drivers: mesa
> +Hardware vendor specific code: amd, intel, nvidia, ...
> +Driver specific code: anvil, freedreno, i965, iris, radeonsi,
> +  radv, vc4, ...
> +Other tag examples: gallium, util
> +  
> +
> +
> +  If you revise your patches based on code review and push an update
> +  to your branch, you should maintain a clean history
> +  in your patches. There should not be "fixup" patches in the history.
> +  The series should be buildable and functional after every commit
> +  whenever you push the branch.
> +
> +
> +  It is your responsibility to keep the MR alive and making progress,
> +  as there are no guarantees that a Mesa dev will independently take
> +  interest in it.
> +
> +
> +  Some other notes:
> +  
> +Make changes and update your branch based on feedback
> +Old, stale MR may be closed, but you can reopen it if you
> +  still want to pursue the changes
> +You should periodically check to see if your MR needs to be
> +  rebased
> +Make sure your MR is closed if your patches get pushed outside
> +  of GitLab
> +  
> +
> +
>  Reviewing Patches
>  
> +
> +  To participate in code review, you should monitor the
> +  https://lists.freedesktop.org/mailman/listinfo/

Re: [Mesa-dev] [PATCH 1/6] mesa: wire up InvalidateFramebuffer

2018-12-11 Thread Ian Romanick
On 12/11/18 3:15 PM, Rob Clark wrote:
> On Tue, Dec 11, 2018 at 6:06 PM Ian Romanick  wrote:
>>
>> On 12/11/18 2:50 PM, Rob Clark wrote:
>>> And before someone actually starts implementing DiscardFramebuffer()
>>> let's rework the interface to something that is actually usable.
>>>
>>> Signed-off-by: Rob Clark 
>>> ---
>>>  src/mesa/main/dd.h   |  5 +--
>>>  src/mesa/main/fbobject.c | 79 ++--
>>>  2 files changed, 77 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/src/mesa/main/dd.h b/src/mesa/main/dd.h
>>> index f14c3e04e91..1214eeaa474 100644
>>> --- a/src/mesa/main/dd.h
>>> +++ b/src/mesa/main/dd.h
>>> @@ -784,9 +784,8 @@ struct dd_function_table {
>>> GLint srcX0, GLint srcY0, GLint srcX1, GLint 
>>> srcY1,
>>> GLint dstX0, GLint dstY0, GLint dstX1, GLint 
>>> dstY1,
>>> GLbitfield mask, GLenum filter);
>>> -   void (*DiscardFramebuffer)(struct gl_context *ctx,
>>> -  GLenum target, GLsizei numAttachments,
>>> -  const GLenum *attachments);
>>> +   void (*DiscardFramebuffer)(struct gl_context *ctx, struct 
>>> gl_framebuffer *fb,
>>> +  struct gl_renderbuffer_attachment *att);
>>>
>>> /**
>>>  * \name Functions for GL_ARB_sample_locations
>>> diff --git a/src/mesa/main/fbobject.c b/src/mesa/main/fbobject.c
>>> index 23e49396199..f931e8f76b1 100644
>>> --- a/src/mesa/main/fbobject.c
>>> +++ b/src/mesa/main/fbobject.c
>>> @@ -4637,6 +4637,67 @@ invalid_enum:
>>> return;
>>>  }
>>>
>>> +static struct gl_renderbuffer_attachment *
>>> +get_fb_attachment(struct gl_context *ctx, struct gl_framebuffer *fb,
>>> +  const GLenum attachment)
>>> +{
>>> +   GLint idx;
>>
>> Nancy Reagan says "Just say no... to silly GL types."  I'd also be
>> inclined to move this...
> 
> defn agree.. that just seemed to be the 'when in Rome' convention
> 
> But I don't mind bucking convention

This used to be the convention, but I think we stopped using GL types
for things that aren't exposed to the application some time ago.  I
think Eric (wisely) led that charge.  Then it became another one of
those cases where we didn't go back and change everything.

Which reminds me...

>>> +
>>> +   switch (attachment) {
>>> +   case GL_COLOR:
>>> +   case GL_COLOR_ATTACHMENT0_EXT:
>>> +   case GL_COLOR_ATTACHMENT1_EXT:
>>> +   case GL_COLOR_ATTACHMENT2_EXT:
>>> +   case GL_COLOR_ATTACHMENT3_EXT:
>>> +   case GL_COLOR_ATTACHMENT4_EXT:
>>> +   case GL_COLOR_ATTACHMENT5_EXT:
>>> +   case GL_COLOR_ATTACHMENT6_EXT:
>>> +   case GL_COLOR_ATTACHMENT7_EXT:
>>> +   case GL_COLOR_ATTACHMENT8_EXT:
>>> +   case GL_COLOR_ATTACHMENT9_EXT:
>>> +   case GL_COLOR_ATTACHMENT10_EXT:
>>> +   case GL_COLOR_ATTACHMENT11_EXT:
>>> +   case GL_COLOR_ATTACHMENT12_EXT:
>>> +   case GL_COLOR_ATTACHMENT13_EXT:
>>> +   case GL_COLOR_ATTACHMENT14_EXT:
>>> +   case GL_COLOR_ATTACHMENT15_EXT:

s/_EXT// here and elsewhere. :)

>>> +  if (attachment == GL_COLOR) {
>>> + idx = 0;
>>> +  } else {
>>> + idx = attachment - GL_COLOR_ATTACHMENT0_EXT;
>>> +  }
>>
>> ...here and do
>>
>>   const unsigned idx = attachment == GL_COLOR ? 0 : attachment - GL_COLOR_ATTACHMENT0_EXT;
>>
> 
> mostly just trying to keep it in 80(ish) columns

Yeah... I'll usually break these at the " = " if they get too long.  I
don't know if we actually have a proper coding style guideline for this.
I know Matt does something different from what I do.  Not sure about
Ken or Jason. *shrug*
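
Putting those suggestions together (a plain C type, the non-EXT enum names,
the declaration at its point of use, and the break at the " = "), the index
computation might look like the sketch below. The helper name is made up, and
the GL tokens are defined locally only so that the snippet compiles without
GL headers.

```c
#include <assert.h>

/* Standard GL token values, defined locally to keep the sketch standalone. */
#define GL_COLOR              0x1800
#define GL_COLOR_ATTACHMENT0  0x8CE0
#define GL_COLOR_ATTACHMENT15 0x8CEF

/* Index of a color attachment; GL_COLOR means the first color buffer. */
static unsigned
color_attachment_index(unsigned attachment)
{
   assert(attachment == GL_COLOR ||
          (attachment >= GL_COLOR_ATTACHMENT0 &&
           attachment <= GL_COLOR_ATTACHMENT15));

   const unsigned idx =
      attachment == GL_COLOR ? 0 : attachment - GL_COLOR_ATTACHMENT0;

   return idx;
}
```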

>> but that's just my personal preference.
>>
>>> +  return &fb->Attachment[BUFFER_COLOR0 + idx];
>>> +   case GL_DEPTH:
>>> +   case GL_DEPTH_ATTACHMENT_EXT:
>>> +   case GL_DEPTH_STENCIL_ATTACHMENT:
>>> +  return &fb->Attachment[BUFFER_DEPTH];
>>> +   case GL_STENCIL:
>>> +   case GL_STENCIL_ATTACHMENT_EXT:
>>> +  return &fb->Attachment[BUFFER_STENCIL];
>>> +   default:
>>> +  return NULL;
>>> +   }
>>> +}
>>> +
>>> +static void
>>> +discard_fra

Re: [Mesa-dev] [PATCH 2/6] mesa: wire up InvalidateSubFramebuffer

2018-12-11 Thread Ian Romanick
On 12/11/18 2:50 PM, Rob Clark wrote:
> Signed-off-by: Rob Clark 
> ---
>  src/mesa/main/dd.h   |  3 +++
>  src/mesa/main/fbobject.c | 34 +-
>  2 files changed, 36 insertions(+), 1 deletion(-)
> 
> diff --git a/src/mesa/main/dd.h b/src/mesa/main/dd.h
> index 1214eeaa474..c7112677223 100644
> --- a/src/mesa/main/dd.h
> +++ b/src/mesa/main/dd.h
> @@ -786,6 +786,9 @@ struct dd_function_table {
> GLbitfield mask, GLenum filter);
> void (*DiscardFramebuffer)(struct gl_context *ctx, struct gl_framebuffer 
> *fb,
>struct gl_renderbuffer_attachment *att);
> +   void (*DiscardSubFramebuffer)(struct gl_context *ctx, struct 
> gl_framebuffer *fb,
> + struct gl_renderbuffer_attachment *att, 
> GLint x,
> + GLint y, GLsizei width, GLsizei height);

After looking at the rest of the series... I wonder if some higher layer
should be responsible for detecting the case where the subrect size of
the DiscardSubFramebuffer is the size of the entire attachment (which
may be different than the renderable size of the framebuffer) and call
DiscardFramebuffer instead.  It seems like many implementations won't do
anything for DiscardSubFramebuffer but will do something for
DiscardFramebuffer.

Maybe just leave a note in discard_sub_framebuffer so that the first
person working on a driver that would benefit from this will implement it?
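
A rough sketch of what that higher-layer check could look like follows. The
struct, enum, and function names are hypothetical stand-ins rather than Mesa
code, and the sketch assumes width and height have already been validated as
non-negative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the dimensions of one attachment. */
struct attachment_size {
   int width;
   int height;
};

enum discard_path {
   DISCARD_WHOLE,   /* would call the DiscardFramebuffer hook */
   DISCARD_RECT,    /* would call the DiscardSubFramebuffer hook */
};

/* True when the rectangle covers every pixel of the attachment. */
static bool
rect_covers_attachment(const struct attachment_size *att,
                       int x, int y, int width, int height)
{
   assert(att != NULL);

   /* Widen before adding so x + width cannot overflow. */
   const int64_t x1 = (int64_t)x + width;
   const int64_t y1 = (int64_t)y + height;

   return x <= 0 && y <= 0 && x1 >= att->width && y1 >= att->height;
}

/* Pick the whole-attachment path whenever the rectangle allows it. */
static enum discard_path
choose_discard_path(const struct attachment_size *att,
                    int x, int y, int width, int height)
{
   return rect_covers_attachment(att, x, y, width, height)
      ? DISCARD_WHOLE : DISCARD_RECT;
}
```

The comparison is against the attachment's own size, not the framebuffer's
renderable size, since the two may differ.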

>  
> /**
>  * \name Functions for GL_ARB_sample_locations
> diff --git a/src/mesa/main/fbobject.c b/src/mesa/main/fbobject.c
> index f931e8f76b1..8ef5eb747c0 100644
> --- a/src/mesa/main/fbobject.c
> +++ b/src/mesa/main/fbobject.c
> @@ -4699,12 +4699,41 @@ discard_framebuffer(struct gl_context *ctx, struct 
> gl_framebuffer *fb,
> }
>  }
>  
> +static void
> +discard_sub_framebuffer(struct gl_context *ctx, struct gl_framebuffer *fb,
> +GLsizei numAttachments, const GLenum *attachments,
> +GLint x, GLint y, GLsizei width, GLsizei height)
> +{
> +   GLint i;
> +
> +   if (!ctx->Driver.DiscardSubFramebuffer)
> +  return;
> +
> +   for (i = 0; i < numAttachments; i++) {
> +  struct gl_renderbuffer_attachment *att =
> +get_fb_attachment(ctx, fb, attachments[i]);
> +
> +  if (!att)
> + continue;
> +
> +  ctx->Driver.DiscardSubFramebuffer(ctx, fb, att, x, y, width, height);
> +   }
> +}
> +
>  void GLAPIENTRY
>  _mesa_InvalidateSubFramebuffer_no_error(GLenum target, GLsizei 
> numAttachments,
>  const GLenum *attachments, GLint x,
>  GLint y, GLsizei width, GLsizei 
> height)
>  {
> -   /* no-op */
> +   struct gl_framebuffer *fb;
> +   GET_CURRENT_CONTEXT(ctx);
> +
> +   fb = get_framebuffer_target(ctx, target);
> +   if (!fb)
> +  return;
> +
> +   discard_sub_framebuffer(ctx, fb, numAttachments, attachments,
> +   x, y, width, height);
>  }
>  
>  
> @@ -4727,6 +4756,9 @@ _mesa_InvalidateSubFramebuffer(GLenum target, GLsizei 
> numAttachments,
> invalidate_framebuffer_storage(ctx, fb, numAttachments, attachments,
>x, y, width, height,
>"glInvalidateSubFramebuffer");
> +
> +   discard_sub_framebuffer(ctx, fb, numAttachments, attachments,
> +   x, y, width, height);
>  }
>  
>  
> 



Re: [Mesa-dev] [PATCH 2/6] mesa: wire up InvalidateSubFramebuffer

2018-12-11 Thread Ian Romanick
On 12/11/18 2:50 PM, Rob Clark wrote:
> Signed-off-by: Rob Clark 
> ---
>  src/mesa/main/dd.h   |  3 +++
>  src/mesa/main/fbobject.c | 34 +-
>  2 files changed, 36 insertions(+), 1 deletion(-)
> 
> diff --git a/src/mesa/main/dd.h b/src/mesa/main/dd.h
> index 1214eeaa474..c7112677223 100644
> --- a/src/mesa/main/dd.h
> +++ b/src/mesa/main/dd.h
> @@ -786,6 +786,9 @@ struct dd_function_table {
> GLbitfield mask, GLenum filter);
> void (*DiscardFramebuffer)(struct gl_context *ctx, struct gl_framebuffer 
> *fb,
>struct gl_renderbuffer_attachment *att);
> +   void (*DiscardSubFramebuffer)(struct gl_context *ctx, struct 
> gl_framebuffer *fb,
> + struct gl_renderbuffer_attachment *att, 
> GLint x,
> + GLint y, GLsizei width, GLsizei height);
>  
> /**
>  * \name Functions for GL_ARB_sample_locations
> diff --git a/src/mesa/main/fbobject.c b/src/mesa/main/fbobject.c
> index f931e8f76b1..8ef5eb747c0 100644
> --- a/src/mesa/main/fbobject.c
> +++ b/src/mesa/main/fbobject.c
> @@ -4699,12 +4699,41 @@ discard_framebuffer(struct gl_context *ctx, struct 
> gl_framebuffer *fb,
> }
>  }
>  
> +static void
> +discard_sub_framebuffer(struct gl_context *ctx, struct gl_framebuffer *fb,
> +GLsizei numAttachments, const GLenum *attachments,
> +GLint x, GLint y, GLsizei width, GLsizei height)
> +{
> +   GLint i;
> +
> +   if (!ctx->Driver.DiscardSubFramebuffer)
> +  return;
> +
> +   for (i = 0; i < numAttachments; i++) {

Same comment here about moving the declaration and changing the type.
With that,

Reviewed-by: Ian Romanick 

> +  struct gl_renderbuffer_attachment *att =
> +get_fb_attachment(ctx, fb, attachments[i]);
> +
> +  if (!att)
> + continue;
> +
> +  ctx->Driver.DiscardSubFramebuffer(ctx, fb, att, x, y, width, height);
> +   }
> +}
> +
>  void GLAPIENTRY
>  _mesa_InvalidateSubFramebuffer_no_error(GLenum target, GLsizei 
> numAttachments,
>  const GLenum *attachments, GLint x,
>  GLint y, GLsizei width, GLsizei 
> height)
>  {
> -   /* no-op */
> +   struct gl_framebuffer *fb;
> +   GET_CURRENT_CONTEXT(ctx);
> +
> +   fb = get_framebuffer_target(ctx, target);
> +   if (!fb)
> +  return;
> +
> +   discard_sub_framebuffer(ctx, fb, numAttachments, attachments,
> +   x, y, width, height);
>  }
>  
>  
> @@ -4727,6 +4756,9 @@ _mesa_InvalidateSubFramebuffer(GLenum target, GLsizei 
> numAttachments,
> invalidate_framebuffer_storage(ctx, fb, numAttachments, attachments,
>x, y, width, height,
>"glInvalidateSubFramebuffer");
> +
> +   discard_sub_framebuffer(ctx, fb, numAttachments, attachments,
> +   x, y, width, height);
>  }
>  
>  
> 



Re: [Mesa-dev] [PATCH 1/6] mesa: wire up InvalidateFramebuffer

2018-12-11 Thread Ian Romanick
On 12/11/18 2:50 PM, Rob Clark wrote:
> And before someone actually starts implementing DiscardFramebuffer()
> let's rework the interface to something that is actually usable.
> 
> Signed-off-by: Rob Clark 
> ---
>  src/mesa/main/dd.h   |  5 +--
>  src/mesa/main/fbobject.c | 79 ++--
>  2 files changed, 77 insertions(+), 7 deletions(-)
> 
> diff --git a/src/mesa/main/dd.h b/src/mesa/main/dd.h
> index f14c3e04e91..1214eeaa474 100644
> --- a/src/mesa/main/dd.h
> +++ b/src/mesa/main/dd.h
> @@ -784,9 +784,8 @@ struct dd_function_table {
> GLint srcX0, GLint srcY0, GLint srcX1, GLint 
> srcY1,
> GLint dstX0, GLint dstY0, GLint dstX1, GLint 
> dstY1,
> GLbitfield mask, GLenum filter);
> -   void (*DiscardFramebuffer)(struct gl_context *ctx,
> -  GLenum target, GLsizei numAttachments,
> -  const GLenum *attachments);
> +   void (*DiscardFramebuffer)(struct gl_context *ctx, struct gl_framebuffer 
> *fb,
> +  struct gl_renderbuffer_attachment *att);
>  
> /**
>  * \name Functions for GL_ARB_sample_locations
> diff --git a/src/mesa/main/fbobject.c b/src/mesa/main/fbobject.c
> index 23e49396199..f931e8f76b1 100644
> --- a/src/mesa/main/fbobject.c
> +++ b/src/mesa/main/fbobject.c
> @@ -4637,6 +4637,67 @@ invalid_enum:
> return;
>  }
>  
> +static struct gl_renderbuffer_attachment *
> +get_fb_attachment(struct gl_context *ctx, struct gl_framebuffer *fb,
> +  const GLenum attachment)
> +{
> +   GLint idx;

Nancy Reagan says "Just say no... to silly GL types."  I'd also be
inclined to move this...

> +
> +   switch (attachment) {
> +   case GL_COLOR:
> +   case GL_COLOR_ATTACHMENT0_EXT:
> +   case GL_COLOR_ATTACHMENT1_EXT:
> +   case GL_COLOR_ATTACHMENT2_EXT:
> +   case GL_COLOR_ATTACHMENT3_EXT:
> +   case GL_COLOR_ATTACHMENT4_EXT:
> +   case GL_COLOR_ATTACHMENT5_EXT:
> +   case GL_COLOR_ATTACHMENT6_EXT:
> +   case GL_COLOR_ATTACHMENT7_EXT:
> +   case GL_COLOR_ATTACHMENT8_EXT:
> +   case GL_COLOR_ATTACHMENT9_EXT:
> +   case GL_COLOR_ATTACHMENT10_EXT:
> +   case GL_COLOR_ATTACHMENT11_EXT:
> +   case GL_COLOR_ATTACHMENT12_EXT:
> +   case GL_COLOR_ATTACHMENT13_EXT:
> +   case GL_COLOR_ATTACHMENT14_EXT:
> +   case GL_COLOR_ATTACHMENT15_EXT:
> +  if (attachment == GL_COLOR) {
> + idx = 0;
> +  } else {
> + idx = attachment - GL_COLOR_ATTACHMENT0_EXT;
> +  }

...here and do

  const unsigned idx = attachment == GL_COLOR ? 0 : attachment - GL_COLOR_ATTACHMENT0_EXT;

but that's just my personal preference.

> +  return &fb->Attachment[BUFFER_COLOR0 + idx];
> +   case GL_DEPTH:
> +   case GL_DEPTH_ATTACHMENT_EXT:
> +   case GL_DEPTH_STENCIL_ATTACHMENT:
> +  return &fb->Attachment[BUFFER_DEPTH];
> +   case GL_STENCIL:
> +   case GL_STENCIL_ATTACHMENT_EXT:
> +  return &fb->Attachment[BUFFER_STENCIL];
> +   default:
> +  return NULL;
> +   }
> +}
> +
> +static void
> +discard_framebuffer(struct gl_context *ctx, struct gl_framebuffer *fb,
> +GLsizei numAttachments, const GLenum *attachments)
> +{
> +   GLint i;
> +
> +   if (!ctx->Driver.DiscardFramebuffer)
> +  return;
> +
> +   for (i = 0; i < numAttachments; i++) {

I'd definitely move 'int i' inside the for.

With at least s/GLint/int/ or similar, this patch is

Reviewed-by: Ian Romanick 
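
For illustration, the loop shape with a plain int counter scoped to the for
statement might look like the sketch below. Every type and function here is a
hypothetical stand-in for the Mesa structures in the patch, reduced so the
snippet is self-contained.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the Mesa types touched by the patch. */
struct attachment {
   int discarded;
};

struct driver_hooks {
   void (*discard)(struct attachment *att);   /* may be NULL */
};

/* Stand-in lookup: returns NULL for slots that do not exist. */
static struct attachment *
lookup_attachment(struct attachment *atts, unsigned count, unsigned slot)
{
   return slot < count ? &atts[slot] : NULL;
}

static void
discard_attachments(const struct driver_hooks *hooks,
                    struct attachment *atts, unsigned count,
                    int num_slots, const unsigned *slots)
{
   assert(num_slots >= 0);

   /* Nothing to do when the driver does not implement the hook. */
   if (!hooks->discard)
      return;

   for (int i = 0; i < num_slots; i++) {
      struct attachment *att = lookup_attachment(atts, count, slots[i]);

      if (!att)
         continue;

      hooks->discard(att);
   }
}

/* Example hook: records that it was called. */
static void
mark_discarded(struct attachment *att)
{
   att->discarded++;
}
```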

> +  struct gl_renderbuffer_attachment *att =
> +get_fb_attachment(ctx, fb, attachments[i]);
> +
> +  if (!att)
> + continue;
> +
> +  ctx->Driver.DiscardFramebuffer(ctx, fb, att);
> +   }
> +}
>  
>  void GLAPIENTRY
>  _mesa_InvalidateSubFramebuffer_no_error(GLenum target, GLsizei 
> numAttachments,
> @@ -4695,14 +4756,23 @@ _mesa_InvalidateNamedFramebufferSubData(GLuint 
> framebuffer,
> invalidate_framebuffer_storage(ctx, fb, numAttachments, attachments,
>x, y, width, height,
>"glInvalidateNamedFramebufferSubData");
> -}
>  
> +   discard_sub_framebuffer(ctx, fb, numAttachments, attachments,
> +   x, y, width, height);
> +}
>  
>  void GLAPIENTRY
>  _mesa_InvalidateFramebuffer_no_error(GLenum target, GLsizei numAttachments,
>   const GLenum *attachments)
>  {
> -   /* no-op */
> +   struct gl_framebuffer *fb;
> +   GET_CURRENT_CONTEXT(ctx);
> +
> +   fb = get_framebuffer_target(ctx, target);

Re: [Mesa-dev] [PATCH] mesa: add EXT_debug_label support

2018-12-10 Thread Ian Romanick
On 12/10/18 5:52 PM, Timothy Arceri wrote:
> On 11/12/18 11:35 am, Ian Romanick wrote:
>> It seems like someone already sent out patches to implement this, and we
>> decided to not take it for some reason.  Maybe it was Rob?
> 
> I discovered a thread from the beginning of 2017 titled "feature.txt &
> EXT_debug_label extension". But couldn't find any implementation.
> 
> There was a reply from yourself, but it seems incorrect to me:
> 
> "I checked both extensions, and they're not "just" aliases.  The EXT adds
> a single function with an enum to select the kind of object.  The KHR
> adds a function per kind of object.  It would be easy enough to add, but
> it seems more valuable to suggest the developer use the more broadly
> supported extension."

That's weird for a couple reasons.  One, that's not even the discussion
that I was thinking of.  I'll check in the morning to see if I can find
it.  Two, I was clearly full of it... I really don't see how I came to that
conclusion.  I don't even see any other related extensions that I could
have been confusing either thing with.

>> On 12/10/18 4:08 PM, Timothy Arceri wrote:
>>> KHR_debug already provides superior functionality but this
>>> extension is still in use and adding support for it seems fairly
>>> harmless. For example it seems to be used by Unity as seen in the
>>> Parkitect trace attached to Mesa bug #108919.
>>> ---
>>>   src/mapi/glapi/gen/gl_API.xml    | 17 +
>>>   src/mesa/main/extensions_table.h |  1 +
>>>   src/mesa/main/objectlabel.c  |  6 ++
>>>   3 files changed, 24 insertions(+)
>>>
>>> diff --git a/src/mapi/glapi/gen/gl_API.xml
>>> b/src/mapi/glapi/gen/gl_API.xml
>>> index f1def8090d..75423c4edb 100644
>>> --- a/src/mapi/glapi/gen/gl_API.xml
>>> +++ b/src/mapi/glapi/gen/gl_API.xml
>>> @@ -12973,6 +12973,23 @@
>>>   >> value="0x904B" />
>>>   
>>>   +
>>> +  
>>
>> Since these are just aliases, I don't think any changes needed in
>> dispatch-sanity... but did you run 'make check' anyway? :)
>>
> 
> Yes :) Passed as expected.
> 
> 
>>> +    
>>> +    
>>> +    
>>> +    
>>> +  
>>> +
>>> +  
>>> +    
>>> +    
>>> +    
>>> +    
>>> +    
>>> +  
>>> +
>>> +
>>>   >> xmlns:xi="http://www.w3.org/2001/XInclude"/>
>>>     
>>> diff --git a/src/mesa/main/extensions_table.h
>>> b/src/mesa/main/extensions_table.h
>>> index dad38124d5..b68f6781c4 100644
>>> --- a/src/mesa/main/extensions_table.h
>>> +++ b/src/mesa/main/extensions_table.h
>>> @@ -217,6 +217,7 @@ EXT(EXT_compiled_vertex_array   ,
>>> dummy_true
>>>   EXT(EXT_compressed_ETC1_RGB8_sub_texture    ,
>>> OES_compressed_ETC1_RGB8_texture   ,  x ,  x , ES1, ES2, 2014)
>>>   EXT(EXT_copy_image  ,
>>> OES_copy_image ,  x ,  x ,  x ,  30, 2014)
>>>   EXT(EXT_copy_texture    ,
>>> dummy_true , GLL,  x ,  x ,  x , 1995)
>>> +EXT(EXT_debug_label ,
>>> dummy_true , GLL, GLC,  x ,  x , 2013)
>>>   EXT(EXT_depth_bounds_test   ,
>>> EXT_depth_bounds_test  , GLL, GLC,  x ,  x , 2002)
>>>   EXT(EXT_discard_framebuffer ,
>>> dummy_true ,  x ,  x , ES1, ES2, 2009)
>>>   EXT(EXT_disjoint_timer_query    ,
>>> EXT_disjoint_timer_query   ,  x ,  x ,  x , ES2, 2016)
>>> diff --git a/src/mesa/main/objectlabel.c b/src/mesa/main/objectlabel.c
>>> index 1e3022ee54..9d4cc1871e 100644
>>> --- a/src/mesa/main/objectlabel.c
>>> +++ b/src/mesa/main/objectlabel.c
>>> @@ -139,6 +139,7 @@ get_label_pointer(struct gl_context *ctx, GLenum
>>> identifier, GLuint name,
>>>    switch (identifier) {
>>>  case GL_BUFFER:
>>> +   case GL_BUFFER_OBJECT_EXT:
>>>     {
>>>    struct gl_buffer_object *bufObj =
>>> _mesa_lookup_bufferobj(ctx, name);
>>>    if (bufObj)
>>> @@ -146,6 +147,7 @@ get_label_pointer(struct gl_context *ctx, GLenum
>>> identifier, GLuint name,
>>>     }
>>>     break;
>>>  case GL_SHADER:
>

Re: [Mesa-dev] [PATCH] mesa: add EXT_debug_label support

2018-12-10 Thread Ian Romanick
It seems like someone already sent out patches to implement this, and we
decided to not take it for some reason.  Maybe it was Rob?

On 12/10/18 4:08 PM, Timothy Arceri wrote:
> KHR_debug already provides superior functionality but this
> extension is still in use and adding support for it seems fairly
> harmless. For example it seems to be used by Unity as seen in the
> Parkitect trace attached to Mesa bug #108919.
> ---
>  src/mapi/glapi/gen/gl_API.xml| 17 +
>  src/mesa/main/extensions_table.h |  1 +
>  src/mesa/main/objectlabel.c  |  6 ++
>  3 files changed, 24 insertions(+)
> 
> diff --git a/src/mapi/glapi/gen/gl_API.xml b/src/mapi/glapi/gen/gl_API.xml
> index f1def8090d..75423c4edb 100644
> --- a/src/mapi/glapi/gen/gl_API.xml
> +++ b/src/mapi/glapi/gen/gl_API.xml
> @@ -12973,6 +12973,23 @@
>   />
>  
>  
> +
> +  

Since these are just aliases, I don't think any changes are needed in
dispatch-sanity... but did you run 'make check' anyway? :)

> +
> +
> +
> +
> +  
> +
> +  
> +
> +
> +
> +
> +
> +  
> +
> +
>   xmlns:xi="http://www.w3.org/2001/XInclude"/>
>  
>  
> diff --git a/src/mesa/main/extensions_table.h 
> b/src/mesa/main/extensions_table.h
> index dad38124d5..b68f6781c4 100644
> --- a/src/mesa/main/extensions_table.h
> +++ b/src/mesa/main/extensions_table.h
> @@ -217,6 +217,7 @@ EXT(EXT_compiled_vertex_array   , dummy_true
>  EXT(EXT_compressed_ETC1_RGB8_sub_texture, 
> OES_compressed_ETC1_RGB8_texture   ,  x ,  x , ES1, ES2, 2014)
>  EXT(EXT_copy_image  , OES_copy_image 
> ,  x ,  x ,  x ,  30, 2014)
>  EXT(EXT_copy_texture, dummy_true 
> , GLL,  x ,  x ,  x , 1995)
> +EXT(EXT_debug_label , dummy_true 
> , GLL, GLC,  x ,  x , 2013)
>  EXT(EXT_depth_bounds_test   , EXT_depth_bounds_test  
> , GLL, GLC,  x ,  x , 2002)
>  EXT(EXT_discard_framebuffer , dummy_true 
> ,  x ,  x , ES1, ES2, 2009)
>  EXT(EXT_disjoint_timer_query, EXT_disjoint_timer_query   
> ,  x ,  x ,  x , ES2, 2016)
> diff --git a/src/mesa/main/objectlabel.c b/src/mesa/main/objectlabel.c
> index 1e3022ee54..9d4cc1871e 100644
> --- a/src/mesa/main/objectlabel.c
> +++ b/src/mesa/main/objectlabel.c
> @@ -139,6 +139,7 @@ get_label_pointer(struct gl_context *ctx, GLenum 
> identifier, GLuint name,
>  
> switch (identifier) {
> case GL_BUFFER:
> +   case GL_BUFFER_OBJECT_EXT:
>{
>   struct gl_buffer_object *bufObj = _mesa_lookup_bufferobj(ctx, name);
>   if (bufObj)
> @@ -146,6 +147,7 @@ get_label_pointer(struct gl_context *ctx, GLenum 
> identifier, GLuint name,
>}
>break;
> case GL_SHADER:
> +   case GL_SHADER_OBJECT_EXT:
>{
>   struct gl_shader *shader = _mesa_lookup_shader(ctx, name);
>   if (shader)
> @@ -153,6 +155,7 @@ get_label_pointer(struct gl_context *ctx, GLenum 
> identifier, GLuint name,
>}
>break;
> case GL_PROGRAM:
> +   case GL_PROGRAM_OBJECT_EXT:
>{
>   struct gl_shader_program *program =
>  _mesa_lookup_shader_program(ctx, name);
> @@ -161,6 +164,7 @@ get_label_pointer(struct gl_context *ctx, GLenum 
> identifier, GLuint name,
>}
>break;
> case GL_VERTEX_ARRAY:
> +   case GL_VERTEX_ARRAY_OBJECT_EXT:
>{
>   struct gl_vertex_array_object *obj = _mesa_lookup_vao(ctx, name);
>   if (obj)
> @@ -168,6 +172,7 @@ get_label_pointer(struct gl_context *ctx, GLenum 
> identifier, GLuint name,
>}
>break;
> case GL_QUERY:
> +   case GL_QUERY_OBJECT_EXT:
>{
>   struct gl_query_object *query = _mesa_lookup_query_object(ctx, 
> name);
>   if (query)
> @@ -225,6 +230,7 @@ get_label_pointer(struct gl_context *ctx, GLenum 
> identifier, GLuint name,
>}
>break;
> case GL_PROGRAM_PIPELINE:
> +   case GL_PROGRAM_PIPELINE_OBJECT_EXT:
>{
>   struct gl_pipeline_object *pipe =
>  _mesa_lookup_pipeline_object(ctx, name);
> 

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2] glapi: fixup EXT_multisampled_render_to_texture dispatch

2018-12-10 Thread Ian Romanick
Reviewed-by: Ian Romanick 

On 12/10/18 10:14 AM, Emil Velikov wrote:
> From: "Kristian H. Kristensen" 
> 
> There's a few missing and convoluted bits:
> 
>  - FramebufferTexture2DMultisampleEXT
> Missing sanity check, should be desktop="false"
> 
>  - RenderbufferStorageMultisampleEXT
> Missing sanity check, is aliased to RenderbufferStorageMultisample.
> Thus it's set only when desktop GL or GLES2 v3.0+, while the extension
> is GLES2 2.0+.
> 
> If we flip the aliasing we'll break indirect GLX, so loosen the version
> to 2.0. Not perfect, yet this is the most sane thing I could think of.
> 
> v2: [Emil] Fixup RenderbufferStorageMultisampleEXT, commit message
> 
> Cc: Kristian H. Kristensen 
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108974
> Fixes: 1b331ae505e ("mesa: Add core support for 
> EXT_multisampled_render_to_texture{,2}")
> Reviewed-by: Emil Velikov 
> Signed-off-by: Emil Velikov 
> ---
>  src/mapi/glapi/gen/ARB_framebuffer_object.xml  | 10 +-
>  .../glapi/gen/EXT_multisampled_render_to_texture.xml   |  2 +-
>  src/mapi/glapi/gen/es_EXT.xml  |  2 ++
>  src/mapi/glapi/gen/gl_API.xml  |  2 --
>  src/mesa/main/tests/dispatch_sanity.cpp|  6 +-
>  5 files changed, 17 insertions(+), 5 deletions(-)
> 
> diff --git a/src/mapi/glapi/gen/ARB_framebuffer_object.xml 
> b/src/mapi/glapi/gen/ARB_framebuffer_object.xml
> index bd0793c8ece..295175c8816 100644
> --- a/src/mapi/glapi/gen/ARB_framebuffer_object.xml
> +++ b/src/mapi/glapi/gen/ARB_framebuffer_object.xml
> @@ -172,7 +172,15 @@
>   
>  
>  
> -
> +
> +
>  
>  
>  
> diff --git a/src/mapi/glapi/gen/EXT_multisampled_render_to_texture.xml 
> b/src/mapi/glapi/gen/EXT_multisampled_render_to_texture.xml
> index 555b008bd33..d76ecd47d0e 100644
> --- a/src/mapi/glapi/gen/EXT_multisampled_render_to_texture.xml
> +++ b/src/mapi/glapi/gen/EXT_multisampled_render_to_texture.xml
> @@ -20,7 +20,7 @@
>  
>  -->
>  
> -
> + desktop="false">
>  
>  
>  
> diff --git a/src/mapi/glapi/gen/es_EXT.xml b/src/mapi/glapi/gen/es_EXT.xml
> index bbc4a1a1118..917fed62f98 100644
> --- a/src/mapi/glapi/gen/es_EXT.xml
> +++ b/src/mapi/glapi/gen/es_EXT.xml
> @@ -810,6 +810,8 @@
>  
>  
>  
> + xmlns:xi="http://www.w3.org/2001/XInclude"/>
> +
>  
>  
>  
> diff --git a/src/mapi/glapi/gen/gl_API.xml b/src/mapi/glapi/gen/gl_API.xml
> index f1def8090de..f4d0808f13b 100644
> --- a/src/mapi/glapi/gen/gl_API.xml
> +++ b/src/mapi/glapi/gen/gl_API.xml
> @@ -8175,8 +8175,6 @@
>  
>   xmlns:xi="http://www.w3.org/2001/XInclude"/>
>  
> - xmlns:xi="http://www.w3.org/2001/XInclude"/>
> -
>   xmlns:xi="http://www.w3.org/2001/XInclude"/>
>  
>  
> diff --git a/src/mesa/main/tests/dispatch_sanity.cpp 
> b/src/mesa/main/tests/dispatch_sanity.cpp
> index fb2acfbdeea..307639a4a4e 100644
> --- a/src/mesa/main/tests/dispatch_sanity.cpp
> +++ b/src/mesa/main/tests/dispatch_sanity.cpp
> @@ -2236,6 +2236,10 @@ const struct function gles2_functions_possible[] = {
> /* GL_NV_conservative_raster_pre_snap_triangles */
> { "glConservativeRasterParameteriNV", 20, -1 },
>  
> +   /* GL_EXT_multisampled_render_to_texture */
> +   { "glRenderbufferStorageMultisampleEXT", 20, -1 },
> +   { "glFramebufferTexture2DMultisampleEXT", 20, -1 },
> +
> { NULL, 0, -1 }
>  };
>  
> @@ -2330,7 +2334,7 @@ const struct function gles3_functions_possible[] = {
> // glProgramParameteri aliases glProgramParameteriEXT in GLES 2
> // We check for the aliased -NV version in GLES 2
> // { "glReadBuffer", 30, -1 },
> -   { "glRenderbufferStorageMultisample", 30, -1 },
> +   // glRenderbufferStorageMultisample aliases 
> glRenderbufferStorageMultisampleEXT in GLES 2
> { "glResumeTransformFeedback", 30, -1 },
> { "glSamplerParameterf", 30, -1 },
> { "glSamplerParameterfv", 30, -1 },
> 



Re: [Mesa-dev] [PATCH 05/28] Revert "spirv: Don’t check for NaN for most OpFOrd* comparisons"

2018-12-06 Thread Ian Romanick
On 12/05/2018 10:12 AM, Connor Abbott wrote:
> This won't work, since this optimization in nir_opt_algebraic will undo it:
> 
> # For any float comparison operation, "cmp", if you have "a == a && a cmp b"
> # then the "a == a" is redundant because it's equivalent to "a is not NaN"
> # and, if a is a NaN then the second comparison will fail anyway.
> for op in ['flt', 'fge', 'feq']:
>optimizations += [
>   (('iand', ('feq', a, a), (op, a, b)), (op, a, b)),
>   (('iand', ('feq', a, a), (op, b, a)), (op, b, a)),
>]
> 
> and then this optimization might change the behavior for NaN's:
> 
># Comparison simplifications
>(('~inot', ('flt', a, b)), ('fge', a, b)),
>(('~inot', ('fge', a, b)), ('flt', a, b)),
>(('~inot', ('feq', a, b)), ('fne', a, b)),
>(('~inot', ('fne', a, b)), ('feq', a, b)),
> 
> The topic of NaN's and comparisons in NIR has been discussed several
> times before, most recently with this thread:
> https://lists.freedesktop.org/archives/mesa-dev/2018-December/210780.html

These two optimizations do not play well together.  Each by itself is
valid, but the combination is not.  As I mentioned in the previous
thread, I believe the first change should mark the resulting (op, a, b)
as precise.  That would prevent the second optimization from triggering.
I don't think nir_opt_algebraic has a way to do this, but it doesn't seem
like it would be that hard to add.  Regardless of what happens with the
rest, breaking the cumulative effect of these optimizations is necessary.
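
The interaction can be reproduced outside NIR.  What follows is a minimal
Python sketch, not Mesa code: flt, fge, and feq are stand-ins that assume
IEEE 754 ordered semantics (which Python's float comparisons provide), and
stage0, stage1, and stage2 are hypothetical names for the expression before
any rewrite, after the first rule, and after the second rule.

```python
import math

# Stand-ins for the NIR opcodes, assuming IEEE 754 ordered semantics:
# each one returns False when either operand is NaN.
def flt(a, b): return a < b
def fge(a, b): return a >= b
def feq(a, b): return a == b

def stage0(a, b):
    # Original expression: !(a == a && a >= b)
    return not (feq(a, a) and fge(a, b))

def stage1(a, b):
    # After the first rule drops the "a == a" term: !(a >= b)
    return not fge(a, b)

def stage2(a, b):
    # After the inexact rule rewrites inot(fge(a, b)) to flt(a, b)
    return flt(a, b)

cases = [(1.0, 2.0), (2.0, 1.0), (math.nan, 1.0), (1.0, math.nan)]
for a, b in cases:
    print(a, b, stage0(a, b), stage1(a, b), stage2(a, b))
```

In this sketch stage0 and stage1 agree on every input, so the first rewrite
is harmless by itself.  stage2 differs from both only when an operand is
NaN, which is exactly the case the explicit "a == a" was there to cover.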

> Given that this language got added to the Vulkan spec: "By default,
> the implementation may perform optimizations on half, single, or
> double-precision floating-point instructions respectively that ignore
> sign of a zero, or assume that arguments and results are not Nans or
> ±∞" I think we should probably do the following:
> 
> - Fix the CTS tests that prompted this whole block of code to only
> check the result of comparing NaN when this extension is available and
> NaN's are preserved.
> - nir_op_fne must be unordered already, regardless of what
> floating-point options the user asked for, since it's used to
> implement isNan() already. We should probably also define nir_op_feq
> to be ordered. So we don't have to do anything with them, and !(a ==
> b) == (a != b) is guaranteed.
> - Define fgt and fle to be ordered, as this is what every backend
> already does. No need to add unnecessary NaN checks.
> - Disable the comparison simplifications (except for the fne one,
> which shouldn't be marked imprecise as it is now) when preserve-nan is
> set. I think there are a few other imprecise transforms that also need
> to be disabled.

Even with the work that I've been doing this week, removing these
optimizations still hurts (quite dramatically in a few cases) over 1500
shaders in shader-db.  By and large, applications go to great lengths to
avoid things that could generate NaN.  If a calculation generates NaN in
a graphics shader, you've already lost.  As a result, the shaders hurt by
this change already work as-is.  Adding an extra 8% instructions to a
working shader doesn't help anyone.

Based on the commit message in d55835b8bdf0, my hypothesis is that
disabling the combination of the two sets of optimizations won't
negatively affect any shaders.

> - (optional) Add fgtu and fleu opcodes for unordered comparisons. This
> might help backends which can do these in only one instruction. Even
> if we don't do this, these can be implemented as not (fle a, b) and
> not (fgt a, b) respectively, which is fewer instructions than the
> current lowering.
> - (optional) Add fequ and fneo opcodes that do unordered equal and
> ordered not-equal, respectively. Otherwise they have to be implemented
> with explicit NaN checks like now.
> 
> On Wed, Dec 5, 2018 at 4:56 PM Samuel Iglesias Gonsálvez
>  wrote:
>>
>> This reverts commit c4ab1bdcc9710e3c7cc7115d3be9c69b7e7712ef. We need
>> to check the arguments looking for NaNs, because they can introduce
>> failures in tests for FOrd*, especially when running
>> VK_KHR_shader_float_control tests in CTS.
>>
>> Signed-off-by: Samuel Iglesias Gonsálvez 
>> ---
>>  src/compiler/spirv/vtn_alu.c | 17 +++--
>>  1 file changed, 11 insertions(+), 6 deletions(-)
>>
>> diff --git a/src/compiler/spirv/vtn_alu.c b/src/compiler/spirv/vtn_alu.c
>> index dc6fedc9129..629b57560ca 100644
>> --- a/src/compiler/spirv/vtn_alu.c
>> +++ b/src/compiler/spirv/vtn_alu.c
>> @@ -535,18 +535,23 @@ vtn_handle_alu(struct vtn_builder *b, SpvOp opcode,
>>break;
>> }
>>
>> -   case SpvOpFOrdNotEqual: {
>> -  /* For all the SpvOpFOrd* comparisons apart from NotEqual, the value
>> -   * from the ALU will probably already be false if the operands are not
>> -   * ordered so we don’t need to handle it specially.
>> -   */
>> +   case SpvOpFOrdEqual:
>> +   case SpvOpFOrdNotEqual:
>> +   case SpvOpFOrdLessThan:
>> +   case SpvOpFOrdGreaterThan:
>> +   case SpvOpFOrdLessThanEqual:
>> +   ca

Re: [Mesa-dev] [PATCH] spirv: Silence a couple of warnings

2018-12-05 Thread Ian Romanick
These look good to me.  I don't remember whether or not that kind of
initializer makes MSVC happy, but meh.

Reviewed-by: Ian Romanick 

On 12/05/2018 01:13 PM, Jason Ekstrand wrote:
> They occur when building with clang:
> 
> [1/40] Compiling C object 
> 'src/compiler/nir/s...piler@nir@@nir@sta/.._spirv_vtn_glsl450.c.o'.
> ../src/compiler/spirv/vtn_glsl450.c:845:39: warning: implicit conversion from 
> enumeration type 'SpvOp' (aka 'enum SpvOp_') to different enumeration type 
> 'enum GLSLstd450' [-Wenum-conversion]
>   handle_glsl450_interpolation(b, ext_opcode, w, count);
>   ^~
> 1 warning generated.
> [2/40] Compiling C object 
> 'src/compiler/nir/s...iler@nir@@nir@sta/.._spirv_spirv_to_nir.c.o'.
> ../src/compiler/spirv/spirv_to_nir.c:2167:30: warning: suggest braces around 
> initialization of subobject [-Wmissing-braces]
>  nir_tex_src none = {0};
>  ^
>  {}
> ../src/compiler/spirv/spirv_to_nir.c:2167:30: warning: suggest braces around 
> initialization of subobject [-Wmissing-braces]
>  nir_tex_src none = {0};
>  ^
>  {}
> 2 warnings generated.
> ---
>  src/compiler/spirv/spirv_to_nir.c | 2 +-
>  src/compiler/spirv/vtn_glsl450.c  | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/src/compiler/spirv/spirv_to_nir.c 
> b/src/compiler/spirv/spirv_to_nir.c
> index 22efaa276d9..2ce2cbd1c4d 100644
> --- a/src/compiler/spirv/spirv_to_nir.c
> +++ b/src/compiler/spirv/spirv_to_nir.c
> @@ -2164,7 +2164,7 @@ vtn_handle_texture(struct vtn_builder *b, SpvOp opcode,
>   (*p++) = vtn_tex_src(b, w[idx++], nir_tex_src_offset);
>  
>if (operands & SpvImageOperandsConstOffsetsMask) {
> - nir_tex_src none = {0};
> + nir_tex_src none = { .src_type = 0 };
>   gather_offsets = vtn_ssa_value(b, w[idx++]);
>   (*p++) = none;
>}
> diff --git a/src/compiler/spirv/vtn_glsl450.c 
> b/src/compiler/spirv/vtn_glsl450.c
> index b54aeb9b217..dcfc85a579c 100644
> --- a/src/compiler/spirv/vtn_glsl450.c
> +++ b/src/compiler/spirv/vtn_glsl450.c
> @@ -842,7 +842,7 @@ vtn_handle_glsl450_instruction(struct vtn_builder *b, 
> SpvOp ext_opcode,
> case GLSLstd450InterpolateAtCentroid:
> case GLSLstd450InterpolateAtSample:
> case GLSLstd450InterpolateAtOffset:
> -  handle_glsl450_interpolation(b, ext_opcode, w, count);
> +  handle_glsl450_interpolation(b, (enum GLSLstd450)ext_opcode, w, count);
>break;
>  
> default:
> 



Re: [Mesa-dev] Make Jordan an Owner of the mesa project?

2018-12-04 Thread Ian Romanick

Ack from me.

On December 3, 2018 4:49:09 PM Jason Ekstrand  wrote:
Jordan has requested to be made an Owner of the mesa project.  As much as I 
may be the guy who pushed to get everything set up, I don't want to do this 
sort of thing on my own.  As such, I'm asking for some ACKs.  If I can get 
5 ACKs (at least 2 non-intel) from other Owners and no NAKs, I'll click the 
button.


Personally, I think the answer here is absurdly obvious.  Jordan is one of 
the most involved people in the community. :-D


As a side-note, does this seem like a reasonable process for adding people 
as Owners?


--Jason




Re: [Mesa-dev] [PATCH 1/2] nir: add a compiler option for disabling float comparison simplifications

2018-12-03 Thread Ian Romanick
On 11/29/2018 07:47 AM, Connor Abbott wrote:
> On Thu, Nov 29, 2018 at 4:22 PM Jason Ekstrand  wrote:
>>
>> Can you provide some context for this?  Those rules are already flagged 
>> "inexact" (that's what the ~ means) so they won't apply to anything that's 
>> "precise" or "invariant".
> 
> I think the concern is that this isn't allowed in SPIR-V, even without
> exact or invariant. We even go out of our way to do the correct thing
> in the frontend by inserting an "&& a == a" or "|| a != a", but then
> opt_algebraic removes it with another rule and then this rule can flip
> it from ordered to unordered. The spec says that operations don't have

I think this is the real problem.  There's this block in
nir_opt_algebraic:

# For any float comparison operation, "cmp", if you have "a == a && a cmp b"
# then the "a == a" is redundant because it's equivalent to "a is not NaN"
# and, if a is a NaN then the second comparison will fail anyway.
for op in ['flt', 'fge', 'feq']:
   optimizations += [
  (('iand', ('feq', a, a), (op, a, b)), (op, a, b)),
  (('iand', ('feq', a, a), (op, b, a)), (op, b, a)),
   ]

The assumption is that dropping the 'a == a' will not affect the result,
but if the whole tree was actually '!(a == a && a >= b)' dropping the 'a
== a' does affect the result.  At the very least, I think this should be
modified (somehow?) to mark the replacement expression as precise.  If
we don't already, we should also make sure the isnan function marks the
comparison precise.
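
To make the suggestion concrete, here is a toy Python rewriter.  It is only
a sketch and does not reflect how nir_algebraic is implemented: the Expr
class, its exact flag (standing in for "precise"), and both rule functions
are hypothetical, and it assumes that an inexact rule refuses to fire when
the comparison it would rewrite is marked exact.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Expr:
    op: str
    args: tuple
    exact: bool = False  # plays the role of "precise" in this sketch

def drop_redundant_nan_check(e):
    """iand(feq(a, a), cmp(a, b)) -> cmp(a, b), with the result marked exact."""
    if e.op != "iand":
        return e
    lhs, rhs = e.args
    if (isinstance(lhs, Expr) and lhs.op == "feq"
            and lhs.args[0] == lhs.args[1]
            and isinstance(rhs, Expr) and rhs.op in ("flt", "fge", "feq")
            and lhs.args[0] in rhs.args):
        return Expr(rhs.op, rhs.args, exact=True)
    return e

INVERSE = {"flt": "fge", "fge": "flt"}

def simplify_inot(e):
    """Inexact rule: inot(flt(a, b)) -> fge(a, b), skipped for exact operands."""
    if e.op != "inot":
        return e
    (inner,) = e.args
    if isinstance(inner, Expr) and inner.op in INVERSE and not inner.exact:
        return Expr(INVERSE[inner.op], inner.args)
    return e

# !(a == a && a >= b): the NaN check is dropped, but the comparison stays.
guarded = Expr("iand", (Expr("feq", ("a", "a")), Expr("fge", ("a", "b"))))
protected = Expr("inot", (drop_redundant_nan_check(guarded),))
print(simplify_inot(protected))

# A plain !(a >= b) with no exact flag is still simplified to a < b.
plain = Expr("inot", (Expr("fge", ("a", "b")),))
print(simplify_inot(plain))
```

The point of the sketch is only that the exactness information has to
survive the first rewrite; how that would be spelled in the real rule
syntax is left open.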

> to produce NaN, but it doesn't say anything on comparisons other than
> the generic "everything must follow IEEE rules" and an entry in the
> table that says "produces correct results." Then again, I can't find
> anything in GLSL allowing these transforms either, so maybe we just
> need to get rid of them.
> 
>>
>> On Thu, Nov 29, 2018 at 9:18 AM Samuel Pitoiset  
>> wrote:
>>>
>>> It's correct in GLSL because the behaviour is undefined in
>>> presence of NaNs. But this seems incorrect in Vulkan.
>>>
>>> Signed-off-by: Samuel Pitoiset 
>>> ---
>>>  src/compiler/nir/nir.h| 6 ++
>>>  src/compiler/nir/nir_opt_algebraic.py | 8 
>>>  2 files changed, 10 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
>>> index db935c8496b..4107c293962 100644
>>> --- a/src/compiler/nir/nir.h
>>> +++ b/src/compiler/nir/nir.h
>>> @@ -2188,6 +2188,12 @@ typedef struct nir_shader_compiler_options {
>>> /* Set if nir_lower_wpos_ytransform() should also invert gl_PointCoord. 
>>> */
>>> bool lower_wpos_pntc;
>>>
>>> +   /* If false, lower ~inot(flt(a,b)) -> fge(a,b) and variants.
>>> +* In presence of NaNs, this is correct in GLSL because the 
>>> behaviour is
>>> +* undefined. In Vulkan, doing these transformations is incorrect.
>>> +*/
>>> +   bool exact_float_comparisons;
>>> +
>>> /**
>>>  * Should nir_lower_io() create load_interpolated_input intrinsics?
>>>  *
>>> diff --git a/src/compiler/nir/nir_opt_algebraic.py 
>>> b/src/compiler/nir/nir_opt_algebraic.py
>>> index f2a7be0c403..3750874407b 100644
>>> --- a/src/compiler/nir/nir_opt_algebraic.py
>>> +++ b/src/compiler/nir/nir_opt_algebraic.py
>>> @@ -154,10 +154,10 @@ optimizations = [
>>> (('ishl', ('imul', a, '#b'), '#c'), ('imul', a, ('ishl', b, c))),
>>>
>>> # Comparison simplifications
>>> -   (('~inot', ('flt', a, b)), ('fge', a, b)),
>>> -   (('~inot', ('fge', a, b)), ('flt', a, b)),
>>> -   (('~inot', ('feq', a, b)), ('fne', a, b)),
>>> -   (('~inot', ('fne', a, b)), ('feq', a, b)),
>>> +   (('~inot', ('flt', a, b)), ('fge', a, b), 
>>> '!options->exact_float_comparisons'),
>>> +   (('~inot', ('fge', a, b)), ('flt', a, b), 
>>> '!options->exact_float_comparisons'),
>>> +   (('~inot', ('feq', a, b)), ('fne', a, b), 
>>> '!options->exact_float_comparisons'),
>>> +   (('~inot', ('fne', a, b)), ('feq', a, b), 
>>> '!options->exact_float_comparisons'),
>>
>>
>> The feq/fne one is actually completely safe.  fne is defined to be !feq even 
>> when NaN is considered.
>>
>> --Jason
>>
>>>
>>> (('inot', ('ilt', a, b)), ('ige', a, b)),
>>> (('inot', ('ult', a, b)), ('uge', a, b)),
>>> (('inot', ('ige', a, b)), ('ilt', a, b)),
>>> --
>>> 2.19.2
>>>


Re: [Mesa-dev] [Mesa-stable] [PATCH] mesa: Revert INTEL_fragment_shader_ordering support

2018-12-03 Thread Ian Romanick
On 12/03/2018 08:10 AM, Emil Velikov wrote:
> On Thu, 29 Nov 2018 at 23:54, Matt Turner  wrote:
>>
>> This extension is not properly tested (testing for
>> GL_ARB_fragment_shader_interlock is not sufficient), and since this was
>> noted in review on August 28th no tests have been sent.
>>
>> Revert "i965: Add INTEL_fragment_shader_ordering support."
>> Revert "mesa: Add GL/GLSL plumbing for INTEL_fragment_shader_ordering"
>>
>> This reverts commit 03ecec9ed2099f6e2b62994b33dc948dc731e7b8.
>> This reverts commit 119435c8778dd26cb7c8bcde9f04b3982239fe60.
>>
> Kind of unfortunate but I can see where you're coming from.
> No tests, thus one cannot verify the extension works correctly and
> ensures it stays OK.
> 
> Fwiw I couldn't spot any dEQP tests either :-(
> 
>> Cc: mesa-sta...@lists.freedesktop.org
>> ---
>> Emil: I just noticed that this was never reverted from master (and it
>> needs to be removed before the 18.3 release)
>>
> Are you planning to push this to master? Or you'd like it reverted
> only for the 18.3 branch?

master and 18.3.

> Thanks
> Emil


Re: [Mesa-dev] [PATCH 1/2] nir: add a compiler option for disabling float comparison simplifications

2018-12-03 Thread Ian Romanick
On 12/01/2018 05:58 AM, Connor Abbott wrote:
> On Fri, Nov 30, 2018 at 10:18 PM Ian Romanick  wrote:
>>
>> On 11/29/2018 07:47 AM, Connor Abbott wrote:
>>> On Thu, Nov 29, 2018 at 4:22 PM Jason Ekstrand  wrote:
>>>>
>>>> Can you provide some context for this?  Those rules are already flagged 
>>>> "inexact" (that's what the ~ means) so they won't apply to anything that's 
>>>> "precise" or "invariant".
>>>
>>> I think the concern is that this isn't allowed in SPIR-V, even without
>>> exact or invariant. We even go out of our way to do the correct thing
>>> in the frontend by inserting an "&& a == a" or "|| a != a", but then
>>
>> If you're that paranoid about it, why not just mark the operations as
>> precise?  That's literally why it exists.
> 
> Yes, that's right. And if we decide we need to get it correct for
> SPIR-V, then what it's doing now is broken anyways...
> 
>>
>>> opt_algebraic removes it with another rule and then this rule can flip
>>> it from ordered to unordered. The spec says that operations don't have
>>> to produce NaN, but it doesn't say anything on comparisons other than
>>> the generic "everything must follow IEEE rules" and an entry in the
>>> table that says "produces correct results." Then again, I can't find
>>> anything in GLSL allowing these transforms either, so maybe we just
>>> need to get rid of them.
>>
>> What I hear you saying is, "The behavior isn't defined."  Unless you can
>> point to a CTS test or an application that has incorrect behavior, I'm
>> going to oppose removing this pretty strongly.  *Every* GLSL compiler
>> does this.
> 
> No, I don't think ARB_shader_precision definitely says that the
> behavior is undefined. While it does say that you don't have to
> produce NaN's, it also says that intBitsToFloat() must produce a NaN
> given the right input, it otherwise just says that comparisons must
> produce the "correct result," with no exception for NaN's. "correct
> result" does not mean "the behavior is undefined." It never refers
> back to the IEEE spec or says what "correct result" means, but one
> could only assume it's referring to the required unsignaling
> comparisons (Table 5.1 and 5.3 in IEEE 754-2008), which is also what C
> defines them to be. Those rules haven't changed much since, and
> they're basically the same for Vulkan.

GLSL 4.6 says:

(In section 4.1.4 Floating-Point Variables) "While encodings are
logically IEEE 754, operations (addition, multiplication, etc.) are not
necessarily performed as required by IEEE 754. See section 4.7.1 “Range
and Precision” for more details on precision and usage of NaNs (Not a
Number) and Infs (positive or negative infinities)."

(In section 4.7.1 Range and Precision) "NaNs are not required to be
generated.  Support for signaling NaNs is not required and exceptions
are never raised. Operations and built-in functions that operate on a
NaN are not required to return a NaN as the result."

(In the description of intBitsToFloat) "If a NaN is passed in, it will
not signal, and the resulting value is unspecified."

(In the description of packDouble2x32) "If an IEEE 754 Inf or NaN is
created, it will not signal, and the resulting floating-point value is
unspecified."

Aside from the description of the isnan function, that is all the
mention of NaN in the spec.  If you are depending on any particular NaN
behavior, you have already lost.  I will emphasize "operations... are
not necessarily performed as required by IEEE 754."
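
For contrast, this is the strict IEEE 754 behavior that the text above does
not promise for GLSL.  A small Python sketch: int_bits_to_float is a
hypothetical helper that borrows the GLSL built-in's name, and it relies on
Python floats following IEEE 754 comparison rules.

```python
import math
import struct

def int_bits_to_float(bits):
    # Reinterpret a 32-bit pattern as an IEEE 754 binary32 value.
    return struct.unpack("<f", struct.pack("<I", bits))[0]

quiet_nan = int_bits_to_float(0x7FC00000)  # a quiet NaN bit pattern

print(math.isnan(quiet_nan))
print(quiet_nan == quiet_nan)   # ordered equality is false for NaN
print(quiet_nan != quiet_nan)   # the usual "a != a" isnan idiom
print(quiet_nan < 1.0, quiet_nan >= 1.0)
```

Under IEEE 754 every ordered comparison with a NaN is false and only "!="
is true.  A GLSL shader that depends on any of these results is relying on
something the quoted sections leave unspecified.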

> As others have said, there are currently Vulkan CTS tests that
> actually check comparisons with NaN, and we currently pass them
> basically by dumb luck because of the brokenness I mentioned (see mesa
> commit e062eb6415de3aa51b43f30d638ce8215efc0511 which introduced the
> extra checks for NaN and cites the CTS tests). It would probably be an
> uphill battle to change the CTS tests, partially because one can argue
> that it actually is required, but also because of the CL-over-Vulkan
> efforts, as well as DXVK and VKD3D which are emulating API's that need
> comparisons with NaN to work correctly. Also, according to
> https://patchwork.freedesktop.org/patch/206486/, apparently
> Wolfenstein 2 actually does care about it and breaks if you change
> ordered to unordered -- again, we're getting it right by dumb luck.
> And it's probably likely that some DX game does it, and we also get it
> 
