Re: [Mesa-dev] [RFC] nir: compiler options for addressing modes

2015-04-15 Thread Rob Clark
On Wed, Apr 15, 2015 at 10:32 AM, Connor Abbott  wrote:
>>
>> But more immediately, I hit a sort of snag:  I cannot seem to narrow
>> from 32b to 16b at the same time I move to address register.  Which
>> ends up meaning I need a mov from 32b to 16b followed by a 2nd mov to
>> get it into address register...  which sort of defeats the purpose of
>> this whole exercise.  I need to do some more r/e around this, but it
>> may end up being better the way it was before.  And if we end up
>> needing to do the shl in half-precision registers, then solving this
>> in NIR would (I think) require NIR to be aware of half-precision.
>> Which sounds useful, but -EBIGGER_FIRES
>
> I don't quite understand... is this just a problem with using
> registers? Would the entire sequence of operations need to be in 16
> bits, or can you have whatever instruction computed your address do
> the conversion to 16-bit as part of the output? If it's the latter,
> you can just re-emit a 16-bit-outputting version of it and use that,
> although it's a bit of a hack.

Well, the problem if the shl is done in NIR, I commenly end up with:

  cov.f32s32 r0.x, r0.x
  shl.b r0.x, r0.x, 2
  cov.s32s16 hr0.x, r0.x
  mova a0.x, hr0.x

vs

  cov.f32s16 hr0.x, r0.x
  shl.b hr0.x, hr0.x, 2
  mova a0.x, hr0.x

(in both cases, with four nop's between each if I don't have other
instructions I can schedule in between, so the extra instruction here
could translate to 4 cycles in a lot of cases)

I tried to see if I could do something like 'shl.b hr0.x, r0.x, 2' but
that didn't seem to work.. possibly the narrowing on alu op's only
works for float.  (The ISA has all these fun holes like that, where
the instruction encoding supports things, but the designers apparently
didn't think it was worth it to spend transistors for less common
cases)


> Long-term, support for half-precision in NIR is definitely in the
> cards, but it'll probably have to wait until fp64 support as they're
> both very similar wrt changes we have to make in the IR. Unless
> someone has a burning desire to do half-precision first :).

It would be useful..  or at least my understanding is half-precision
should be faster for me.  Although I don't see a particularly good way
to solve that until we figure out how to get TGSI out from the middle.
I guess I need to take a look at the tie-ins between
vbo/uniform/texture/etc state in mesa st and tgsi, and figure out how
to make that work without a glsl_to_tgsi pass so drivers can request
NIR directly.  That probably comes after flow control in ir3 backend,
but *eventually* I'll be really interested in proper mediump support.

BR,
-R
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [RFC] nir: compiler options for addressing modes

2015-04-15 Thread Connor Abbott
On Tue, Apr 14, 2015 at 7:08 PM, Rob Clark  wrote:
> On Tue, Apr 14, 2015 at 6:24 PM, Connor Abbott  wrote:
>> On Tue, Apr 14, 2015 at 5:16 PM, Rob Clark  wrote:
>>> On Tue, Apr 14, 2015 at 4:59 PM, Jason Ekstrand  
>>> wrote:
> +   /**
> +* Addressing mode for corresponding _indirect intrinsics:
> +*/
> +   nir_addressing_mode var_addressing_mode;
> +   nir_addressing_mode input_addressing_mode;
> +   nir_addressing_mode output_addressing_mode;
> +   nir_addressing_mode uniform_addressing_mode;
> +   nir_addressing_mode ubo_addressing_mode;

 What is var_addressing_mode?  Sorry for not bringing that up before.
>>>
>>>
>>> well, originally in my thinking it was for load_var/store_var..  but
>>> perhaps that doesn't make sense (given lower_io).  Maybe it makes more
>>> sense to define is as applying to var_local/var_global (where the
>>> others apply to shader_in/shader_out/uniform and their equivalent
>>> intrinsic forms)?
>>>
>>> Maybe it's a bit weird since I don't lower vars to regs before feeding
>>> to my ir3 frontend, but the whole load_var/store_var for array access,
>>> and ssa for everything else form works kind of nicely for me.
>>>
>>> BR,
>>> -R
>>
>> I don't think we should be letting the driver define the stride of
>> variable array accesses. Variables are supposed to be structured,
>> backend-independent things that core NIR can manipulate and optimize
>> as it pleases; it shouldn't need to know anything about how the driver
>> will index the data. For doing the kinds of optimizations you're
>> talking about, you have registers that are backend-dependent, and core
>> NIR (other than the lower_locals_to_regs) doesn't need to know what
>> the indices mean. What you're doing right now is a hack, and if you
>> want to get the benefits of optimizing the index expression in core
>> NIR you should be using lower_locals_to_regs(). Having scalars be SSA
>> values and arrays be registers can't be that much more complicated
>> than having arrays be variables, and that's how it was set up to work
>> from the beginning.
>
> well, it is pretty convenient for me to have direct and indirect array
> access come via intrinsics, since that gives me a nice single point to
> do all the magic I need to do to set up instruction dependencies for
> scheduling and register assignment where the arrays get allocated in
> registers.  Possibly that means we need an option to lower array
> access to some new sort of intrinsic?  Not sure, I'll play with the
> lower_locals_to_regs without first coming out of SSA.. maybe if then
> the only things in regs are array accesses, I can achieve the same
> result.

Yeah, it seems like some sort of load_register intrinsic might be more
useful to you... there are a few reasons I never added it from the
beginning:

- You can't have a pointer to the nir_register * that contains useful
info like the number of vector components. You would have to search
through the list of registers, which takes O(n) time and is a lot more
annoying.
- There's another usecase for registers, namely backends that don't
support SSA at all; there, the possibility of register reads/writes
that are arbitrarily ordered compared to how they're used seems like
no fun. Having more than one way to represent a register load/store
doesn't seem like a great idea either.

although I can't imagine supporting the current way of loading/storing
registers would be that much more complicated. Variables definitely
aren't what you want.

>
> But more immediately, I hit a sort of snag:  I cannot seem to narrow
> from 32b to 16b at the same time I move to address register.  Which
> ends up meaning I need a mov from 32b to 16b followed by a 2nd mov to
> get it into address register...  which sort of defeats the purpose of
> this whole exercise.  I need to do some more r/e around this, but it
> may end up being better the way it was before.  And if we end up
> needing to do the shl in half-precision registers, then solving this
> in NIR would (I think) require NIR to be aware of half-precision.
> Which sounds useful, but -EBIGGER_FIRES

I don't quite understand... is this just a problem with using
registers? Would the entire sequence of operations need to be in 16
bits, or can you have whatever instruction computed your address do
the conversion to 16-bit as part of the output? If it's the latter,
you can just re-emit a 16-bit-outputting version of it and use that,
although it's a bit of a hack.

Long-term, support for half-precision in NIR is definitely in the
cards, but it'll probably have to wait until fp64 support as they're
both very similar wrt changes we have to make in the IR. Unless
someone has a burning desire to do half-precision first :).

>
> The other problem is that currently ttn gives addr src in float, which
> is how things are in tgsi land.  I'm not sure if changing this will be
> a problem for Eric.
>
> An interesting alternative solu

Re: [Mesa-dev] [RFC] nir: compiler options for addressing modes

2015-04-14 Thread Rob Clark
On Tue, Apr 14, 2015 at 7:08 PM, Rob Clark  wrote:
> On Tue, Apr 14, 2015 at 6:24 PM, Connor Abbott  wrote:
>> On Tue, Apr 14, 2015 at 5:16 PM, Rob Clark  wrote:
>>> On Tue, Apr 14, 2015 at 4:59 PM, Jason Ekstrand  
>>> wrote:
> +   /**
> +* Addressing mode for corresponding _indirect intrinsics:
> +*/
> +   nir_addressing_mode var_addressing_mode;
> +   nir_addressing_mode input_addressing_mode;
> +   nir_addressing_mode output_addressing_mode;
> +   nir_addressing_mode uniform_addressing_mode;
> +   nir_addressing_mode ubo_addressing_mode;

 What is var_addressing_mode?  Sorry for not bringing that up before.
>>>
>>>
>>> well, originally in my thinking it was for load_var/store_var..  but
>>> perhaps that doesn't make sense (given lower_io).  Maybe it makes more
>>> sense to define is as applying to var_local/var_global (where the
>>> others apply to shader_in/shader_out/uniform and their equivalent
>>> intrinsic forms)?
>>>
>>> Maybe it's a bit weird since I don't lower vars to regs before feeding
>>> to my ir3 frontend, but the whole load_var/store_var for array access,
>>> and ssa for everything else form works kind of nicely for me.
>>>
>>> BR,
>>> -R
>>
>> I don't think we should be letting the driver define the stride of
>> variable array accesses. Variables are supposed to be structured,
>> backend-independent things that core NIR can manipulate and optimize
>> as it pleases; it shouldn't need to know anything about how the driver
>> will index the data. For doing the kinds of optimizations you're
>> talking about, you have registers that are backend-dependent, and core
>> NIR (other than the lower_locals_to_regs) doesn't need to know what
>> the indices mean. What you're doing right now is a hack, and if you
>> want to get the benefits of optimizing the index expression in core
>> NIR you should be using lower_locals_to_regs(). Having scalars be SSA
>> values and arrays be registers can't be that much more complicated
>> than having arrays be variables, and that's how it was set up to work
>> from the beginning.
>
> well, it is pretty convenient for me to have direct and indirect array
> access come via intrinsics, since that gives me a nice single point to
> do all the magic I need to do to set up instruction dependencies for
> scheduling and register assignment where the arrays get allocated in
> registers.  Possibly that means we need an option to lower array
> access to some new sort of intrinsic?  Not sure, I'll play with the
> lower_locals_to_regs without first coming out of SSA.. maybe if then
> the only things in regs are array accesses, I can achieve the same
> result.
>
> But more immediately, I hit a sort of snag:  I cannot seem to narrow
> from 32b to 16b at the same time I move to address register.  Which
> ends up meaning I need a mov from 32b to 16b followed by a 2nd mov to
> get it into address register...  which sort of defeats the purpose of
> this whole exercise.  I need to do some more r/e around this, but it
> may end up being better the way it was before.  And if we end up
> needing to do the shl in half-precision registers, then solving this
> in NIR would (I think) require NIR to be aware of half-precision.
> Which sounds useful, but -EBIGGER_FIRES
>
> The other problem is that currently ttn gives addr src in float, which
> is how things are in tgsi land.  I'm not sure if changing this will be
> a problem for Eric.

ahh, disregard that last para.. I had ARL vs UARL confusion.. but
still the narrowing to half precision is an issue for me.

BR,
-R

> An interesting alternative solution to consider, is to allow the
> backend to lower to driver specific specific alu opcodes.. and then
> somehow run those through the other generic NIR opt passes.  I'm not
> quite sure yet *how* that will work (esp. considering my need for
> doing some things in half precision), but if we come up with something
> it would help in other cases too (such as lowering integer multiply)
>
> For now, I think I'll go back to having ttn UBO support not depending
> on this patch, since in the short term I need to sort out if/else
> flattening and flow control so that we can drop the tgsi f/e.
>
> BR,
> -R
>
>> Connor
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [RFC] nir: compiler options for addressing modes

2015-04-14 Thread Rob Clark
On Tue, Apr 14, 2015 at 6:24 PM, Connor Abbott  wrote:
> On Tue, Apr 14, 2015 at 5:16 PM, Rob Clark  wrote:
>> On Tue, Apr 14, 2015 at 4:59 PM, Jason Ekstrand  wrote:
 +   /**
 +* Addressing mode for corresponding _indirect intrinsics:
 +*/
 +   nir_addressing_mode var_addressing_mode;
 +   nir_addressing_mode input_addressing_mode;
 +   nir_addressing_mode output_addressing_mode;
 +   nir_addressing_mode uniform_addressing_mode;
 +   nir_addressing_mode ubo_addressing_mode;
>>>
>>> What is var_addressing_mode?  Sorry for not bringing that up before.
>>
>>
>> well, originally in my thinking it was for load_var/store_var..  but
>> perhaps that doesn't make sense (given lower_io).  Maybe it makes more
>> sense to define is as applying to var_local/var_global (where the
>> others apply to shader_in/shader_out/uniform and their equivalent
>> intrinsic forms)?
>>
>> Maybe it's a bit weird since I don't lower vars to regs before feeding
>> to my ir3 frontend, but the whole load_var/store_var for array access,
>> and ssa for everything else form works kind of nicely for me.
>>
>> BR,
>> -R
>
> I don't think we should be letting the driver define the stride of
> variable array accesses. Variables are supposed to be structured,
> backend-independent things that core NIR can manipulate and optimize
> as it pleases; it shouldn't need to know anything about how the driver
> will index the data. For doing the kinds of optimizations you're
> talking about, you have registers that are backend-dependent, and core
> NIR (other than the lower_locals_to_regs) doesn't need to know what
> the indices mean. What you're doing right now is a hack, and if you
> want to get the benefits of optimizing the index expression in core
> NIR you should be using lower_locals_to_regs(). Having scalars be SSA
> values and arrays be registers can't be that much more complicated
> than having arrays be variables, and that's how it was set up to work
> from the beginning.

well, it is pretty convenient for me to have direct and indirect array
access come via intrinsics, since that gives me a nice single point to
do all the magic I need to do to set up instruction dependencies for
scheduling and register assignment where the arrays get allocated in
registers.  Possibly that means we need an option to lower array
access to some new sort of intrinsic?  Not sure, I'll play with the
lower_locals_to_regs without first coming out of SSA.. maybe if then
the only things in regs are array accesses, I can achieve the same
result.

But more immediately, I hit a sort of snag:  I cannot seem to narrow
from 32b to 16b at the same time I move to address register.  Which
ends up meaning I need a mov from 32b to 16b followed by a 2nd mov to
get it into address register...  which sort of defeats the purpose of
this whole exercise.  I need to do some more r/e around this, but it
may end up being better the way it was before.  And if we end up
needing to do the shl in half-precision registers, then solving this
in NIR would (I think) require NIR to be aware of half-precision.
Which sounds useful, but -EBIGGER_FIRES

The other problem is that currently ttn gives addr src in float, which
is how things are in tgsi land.  I'm not sure if changing this will be
a problem for Eric.

An interesting alternative solution to consider, is to allow the
backend to lower to driver specific specific alu opcodes.. and then
somehow run those through the other generic NIR opt passes.  I'm not
quite sure yet *how* that will work (esp. considering my need for
doing some things in half precision), but if we come up with something
it would help in other cases too (such as lowering integer multiply)

For now, I think I'll go back to having ttn UBO support not depending
on this patch, since in the short term I need to sort out if/else
flattening and flow control so that we can drop the tgsi f/e.

BR,
-R

> Connor
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [RFC] nir: compiler options for addressing modes

2015-04-14 Thread Connor Abbott
On Tue, Apr 14, 2015 at 5:16 PM, Rob Clark  wrote:
> On Tue, Apr 14, 2015 at 4:59 PM, Jason Ekstrand  wrote:
>>> +   /**
>>> +* Addressing mode for corresponding _indirect intrinsics:
>>> +*/
>>> +   nir_addressing_mode var_addressing_mode;
>>> +   nir_addressing_mode input_addressing_mode;
>>> +   nir_addressing_mode output_addressing_mode;
>>> +   nir_addressing_mode uniform_addressing_mode;
>>> +   nir_addressing_mode ubo_addressing_mode;
>>
>> What is var_addressing_mode?  Sorry for not bringing that up before.
>
>
> well, originally in my thinking it was for load_var/store_var..  but
> perhaps that doesn't make sense (given lower_io).  Maybe it makes more
> sense to define is as applying to var_local/var_global (where the
> others apply to shader_in/shader_out/uniform and their equivalent
> intrinsic forms)?
>
> Maybe it's a bit weird since I don't lower vars to regs before feeding
> to my ir3 frontend, but the whole load_var/store_var for array access,
> and ssa for everything else form works kind of nicely for me.
>
> BR,
> -R

I don't think we should be letting the driver define the stride of
variable array accesses. Variables are supposed to be structured,
backend-independent things that core NIR can manipulate and optimize
as it pleases; it shouldn't need to know anything about how the driver
will index the data. For doing the kinds of optimizations you're
talking about, you have registers that are backend-dependent, and core
NIR (other than the lower_locals_to_regs) doesn't need to know what
the indices mean. What you're doing right now is a hack, and if you
want to get the benefits of optimizing the index expression in core
NIR you should be using lower_locals_to_regs(). Having scalars be SSA
values and arrays be registers can't be that much more complicated
than having arrays be variables, and that's how it was set up to work
from the beginning.

Connor
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [RFC] nir: compiler options for addressing modes

2015-04-14 Thread Rob Clark
On Tue, Apr 14, 2015 at 4:59 PM, Jason Ekstrand  wrote:
>> +   /**
>> +* Addressing mode for corresponding _indirect intrinsics:
>> +*/
>> +   nir_addressing_mode var_addressing_mode;
>> +   nir_addressing_mode input_addressing_mode;
>> +   nir_addressing_mode output_addressing_mode;
>> +   nir_addressing_mode uniform_addressing_mode;
>> +   nir_addressing_mode ubo_addressing_mode;
>
> What is var_addressing_mode?  Sorry for not bringing that up before.


well, originally in my thinking it was for load_var/store_var..  but
perhaps that doesn't make sense (given lower_io).  Maybe it makes more
sense to define is as applying to var_local/var_global (where the
others apply to shader_in/shader_out/uniform and their equivalent
intrinsic forms)?

Maybe it's a bit weird since I don't lower vars to regs before feeding
to my ir3 frontend, but the whole load_var/store_var for array access,
and ssa for everything else form works kind of nicely for me.

BR,
-R
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [RFC] nir: compiler options for addressing modes

2015-04-14 Thread Jason Ekstrand
On Tue, Apr 14, 2015 at 12:56 PM, Rob Clark  wrote:
> On Tue, Apr 14, 2015 at 3:32 PM, Jason Ekstrand  wrote:
>> On Tue, Apr 14, 2015 at 11:54 AM, Rob Clark  wrote:
>>> On Tue, Apr 14, 2015 at 1:56 PM, Eric Anholt  wrote:
 Rob Clark  writes:

> From: Rob Clark 
>
> Add compiler options so driver can request the units for address value,
> for the various _indirect intrinsics.  This way, the front end can
> insert the appropriate multiply/shift operations to get the address into
> the units that the driver needs, avoiding need for driver to insert
> these on the backend (and loose out on some optimizing potential).
>
> The addressing modes are specified per var/input/output/uniform/ubo.
> I'm not sure about other drivers, but for freedreno we want byte
> addressing for ubo's and register addressing for the rest.
>
> NOTE: so far just updated ttn and freedreno, and included everything in
> one commit to make it easier to see how it fits together.  The driver
> patches would naturally be separate (since drivers not setting these
> options get the current/default behavior, ie. splitting out won't hurt
> bisectability).  Also the ttn part could be split out as long as it
> comes before freedreno or vc4 parts.

> diff --git a/src/glsl/nir/nir.h b/src/glsl/nir/nir.h
> index 6531237..e8bce9f 100644
> --- a/src/glsl/nir/nir.h
> +++ b/src/glsl/nir/nir.h
> @@ -1374,6 +1374,23 @@ typedef struct nir_function {
> exec_node_data(nir_function_overload, \
>exec_list_get_head(&(func)->overload_list), node)
>
> +/**
> + * Addressing mode used by the backend for various sorts of indirect
> + * access.  This allows the frontend to generate the appropriate
> + * multiply or shift operations to convert the address param for the
> + * corresponding _indirect intrinsic, rather than relying on the
> + * driver to insert them after the various other NIR optimization
> + * passes.
> + */
> +typedef enum {
> +   /** No address conversion, units in elements of array member: */
> +   nir_addressing_mode_none = 0,
> +   /** Address converted to units of registers (ie. float/int32): */
> +   nir_addressing_mode_register = 1,
> +   /** Address converted to units of bytes: */
> +   nir_addressing_mode_byte = 2,
> +} nir_addressing_mode;
> +
>  typedef struct nir_shader_compiler_options {
> bool lower_ffma;
> bool lower_flrp;
> @@ -1393,6 +1410,16 @@ typedef struct nir_shader_compiler_options {
>  * are simulated by floats.)
>  */
> bool native_integers;
> +
> +   /**
> +* Addressing mode for corresponding _indirect intrinsics:
> +*/
> +   nir_addressing_mode var_addressing_mode;
> +   nir_addressing_mode input_addressing_mode;
> +   nir_addressing_mode output_addressing_mode;
> +   nir_addressing_mode uniform_addressing_mode;
> +   nir_addressing_mode ubo_addressing_mode;

What is var_addressing_mode?  Sorry for not bringing that up before.

> +
>  } nir_shader_compiler_options;
>
>  typedef struct nir_shader {
> --
> 2.1.0

 This seems like a good idea, as it'll mean we can have passes that do
 things like scalarizing uniform loads, which I think could be productive
 for us.

 However, I find the "register" size name confusing -- it seems you mean
 a channel in a register, while on both the pieces of hardware I've
 worked on recently, a register is 16 dwords wide.  I think I'd call it
 "dword" or "float", with a slight preference for dword.
>>>
>>> yeah, I was having trouble picking a good name for that one.. I'd
>>> originally had float, but that sounds funny when it could just as
>>> easily be uint.  (And also, I have half-precision registers that would
>>> be nice to use some day, etc.)  I think "dword" is a better
>>> alternative.
>>
>> I'm not sure what I think of this.  However, having things better
>> defined is a good idea and this may be a step in the right direction.
>>
>> Yes, please use either dword or float for 32-bits.  Also, what is this
>> addressing_mode_none?  If we want a vec4 addressing mode, we should
>> just call it vec4.
>
> well, tbh, it is vec4 for ttn, but I wasn't completely clear on how it
> behaves if you actually have vec2/vec3..  so "none" just means "do
> what you were doing before, whatever that was"..
>
> if glsl_to_nir is vec4 based, then I can rename it to vec4..
>
>>
 Also, should this apply to just the indirect variants? I'd think they
 would apply to the const_index component, as well.  Either that, or we
 make const_index always be bytes, maybe.
>>>
>>> Hmm.. possibly.. since const_index is already bytes on ubo's, which
>>> isn't very consistent w/ dwords/slots in the other cases, it might
>>> make sense..  otoh, that wouldn't change th

Re: [Mesa-dev] [RFC] nir: compiler options for addressing modes

2015-04-14 Thread Ilia Mirkin
On Tue, Apr 14, 2015 at 3:47 PM, Jason Ekstrand  wrote:
> On Tue, Apr 14, 2015 at 12:44 PM, Ilia Mirkin  wrote:
>> On Tue, Apr 14, 2015 at 3:32 PM, Jason Ekstrand  wrote:
>>> On Tue, Apr 14, 2015 at 11:54 AM, Rob Clark  wrote:
 On Tue, Apr 14, 2015 at 1:56 PM, Eric Anholt  wrote:
> Rob Clark  writes:
>
>> From: Rob Clark 
>>
>> Add compiler options so driver can request the units for address value,
>> for the various _indirect intrinsics.  This way, the front end can
>> insert the appropriate multiply/shift operations to get the address into
>> the units that the driver needs, avoiding need for driver to insert
>> these on the backend (and loose out on some optimizing potential).
>>
>> The addressing modes are specified per var/input/output/uniform/ubo.
>> I'm not sure about other drivers, but for freedreno we want byte
>> addressing for ubo's and register addressing for the rest.
>>
>> NOTE: so far just updated ttn and freedreno, and included everything in
>> one commit to make it easier to see how it fits together.  The driver
>> patches would naturally be separate (since drivers not setting these
>> options get the current/default behavior, ie. splitting out won't hurt
>> bisectability).  Also the ttn part could be split out as long as it
>> comes before freedreno or vc4 parts.
>
>> diff --git a/src/glsl/nir/nir.h b/src/glsl/nir/nir.h
>> index 6531237..e8bce9f 100644
>> --- a/src/glsl/nir/nir.h
>> +++ b/src/glsl/nir/nir.h
>> @@ -1374,6 +1374,23 @@ typedef struct nir_function {
>> exec_node_data(nir_function_overload, \
>>exec_list_get_head(&(func)->overload_list), node)
>>
>> +/**
>> + * Addressing mode used by the backend for various sorts of indirect
>> + * access.  This allows the frontend to generate the appropriate
>> + * multiply or shift operations to convert the address param for the
>> + * corresponding _indirect intrinsic, rather than relying on the
>> + * driver to insert them after the various other NIR optimization
>> + * passes.
>> + */
>> +typedef enum {
>> +   /** No address conversion, units in elements of array member: */
>> +   nir_addressing_mode_none = 0,
>> +   /** Address converted to units of registers (ie. float/int32): */
>> +   nir_addressing_mode_register = 1,
>> +   /** Address converted to units of bytes: */
>> +   nir_addressing_mode_byte = 2,
>> +} nir_addressing_mode;
>> +
>>  typedef struct nir_shader_compiler_options {
>> bool lower_ffma;
>> bool lower_flrp;
>> @@ -1393,6 +1410,16 @@ typedef struct nir_shader_compiler_options {
>>  * are simulated by floats.)
>>  */
>> bool native_integers;
>> +
>> +   /**
>> +* Addressing mode for corresponding _indirect intrinsics:
>> +*/
>> +   nir_addressing_mode var_addressing_mode;
>> +   nir_addressing_mode input_addressing_mode;
>> +   nir_addressing_mode output_addressing_mode;
>> +   nir_addressing_mode uniform_addressing_mode;
>> +   nir_addressing_mode ubo_addressing_mode;
>> +
>>  } nir_shader_compiler_options;
>>
>>  typedef struct nir_shader {
>> --
>> 2.1.0
>
> This seems like a good idea, as it'll mean we can have passes that do
> things like scalarizing uniform loads, which I think could be productive
> for us.
>
> However, I find the "register" size name confusing -- it seems you mean
> a channel in a register, while on both the pieces of hardware I've
> worked on recently, a register is 16 dwords wide.  I think I'd call it
> "dword" or "float", with a slight preference for dword.

 yeah, I was having trouble picking a good name for that one.. I'd
 originally had float, but that sounds funny when it could just as
 easily be uint.  (And also, I have half-precision registers that would
 be nice to use some day, etc.)  I think "dword" is a better
 alternative.
>>>
>>> I'm not sure what I think of this.  However, having things better
>>> defined is a good idea and this may be a step in the right direction.
>>>
>>> Yes, please use either dword or float for 32-bits.  Also, what is this
>>> addressing_mode_none?  If we want a vec4 addressing mode, we should
>>> just call it vec4.
>>>
> Also, should this apply to just the indirect variants? I'd think they
> would apply to the const_index component, as well.  Either that, or we
> make const_index always be bytes, maybe.

 Hmm.. possibly.. since const_index is already bytes on ubo's, which
 isn't very consistent w/ dwords/slots in the other cases, it might
 make sense..  otoh, that wouldn't change the code the driver emitted.
 So I guess I could really go either way.
>>>
>>> Yes.  The indirect source and const_index should have the same units.
>>> If your hardware 

Re: [Mesa-dev] [RFC] nir: compiler options for addressing modes

2015-04-14 Thread Rob Clark
On Tue, Apr 14, 2015 at 3:32 PM, Jason Ekstrand  wrote:
> On Tue, Apr 14, 2015 at 11:54 AM, Rob Clark  wrote:
>> On Tue, Apr 14, 2015 at 1:56 PM, Eric Anholt  wrote:
>>> Rob Clark  writes:
>>>
 From: Rob Clark 

 Add compiler options so driver can request the units for address value,
 for the various _indirect intrinsics.  This way, the front end can
 insert the appropriate multiply/shift operations to get the address into
 the units that the driver needs, avoiding need for driver to insert
 these on the backend (and loose out on some optimizing potential).

 The addressing modes are specified per var/input/output/uniform/ubo.
 I'm not sure about other drivers, but for freedreno we want byte
 addressing for ubo's and register addressing for the rest.

 NOTE: so far just updated ttn and freedreno, and included everything in
 one commit to make it easier to see how it fits together.  The driver
 patches would naturally be separate (since drivers not setting these
 options get the current/default behavior, ie. splitting out won't hurt
 bisectability).  Also the ttn part could be split out as long as it
 comes before freedreno or vc4 parts.
>>>
 diff --git a/src/glsl/nir/nir.h b/src/glsl/nir/nir.h
 index 6531237..e8bce9f 100644
 --- a/src/glsl/nir/nir.h
 +++ b/src/glsl/nir/nir.h
 @@ -1374,6 +1374,23 @@ typedef struct nir_function {
 exec_node_data(nir_function_overload, \
exec_list_get_head(&(func)->overload_list), node)

 +/**
 + * Addressing mode used by the backend for various sorts of indirect
 + * access.  This allows the frontend to generate the appropriate
 + * multiply or shift operations to convert the address param for the
 + * corresponding _indirect intrinsic, rather than relying on the
 + * driver to insert them after the various other NIR optimization
 + * passes.
 + */
 +typedef enum {
 +   /** No address conversion, units in elements of array member: */
 +   nir_addressing_mode_none = 0,
 +   /** Address converted to units of registers (ie. float/int32): */
 +   nir_addressing_mode_register = 1,
 +   /** Address converted to units of bytes: */
 +   nir_addressing_mode_byte = 2,
 +} nir_addressing_mode;
 +
  typedef struct nir_shader_compiler_options {
 bool lower_ffma;
 bool lower_flrp;
 @@ -1393,6 +1410,16 @@ typedef struct nir_shader_compiler_options {
  * are simulated by floats.)
  */
 bool native_integers;
 +
 +   /**
 +* Addressing mode for corresponding _indirect intrinsics:
 +*/
 +   nir_addressing_mode var_addressing_mode;
 +   nir_addressing_mode input_addressing_mode;
 +   nir_addressing_mode output_addressing_mode;
 +   nir_addressing_mode uniform_addressing_mode;
 +   nir_addressing_mode ubo_addressing_mode;
 +
  } nir_shader_compiler_options;

  typedef struct nir_shader {
 --
 2.1.0
>>>
>>> This seems like a good idea, as it'll mean we can have passes that do
>>> things like scalarizing uniform loads, which I think could be productive
>>> for us.
>>>
>>> However, I find the "register" size name confusing -- it seems you mean
>>> a channel in a register, while on both the pieces of hardware I've
>>> worked on recently, a register is 16 dwords wide.  I think I'd call it
>>> "dword" or "float", with a slight preference for dword.
>>
>> yeah, I was having trouble picking a good name for that one.. I'd
>> originally had float, but that sounds funny when it could just as
>> easily be uint.  (And also, I have half-precision registers that would
>> be nice to use some day, etc.)  I think "dword" is a better
>> alternative.
>
> I'm not sure what I think of this.  However, having things better
> defined is a good idea and this may be a step in the right direction.
>
> Yes, please use either dword or float for 32-bits.  Also, what is this
> addressing_mode_none?  If we want a vec4 addressing mode, we should
> just call it vec4.

well, tbh, it is vec4 for ttn, but I wasn't completely clear on how it
behaves if you actually have vec2/vec3..  so "none" just means "do
what you were doing before, whatever that was"..

if glsl_to_nir is vec4 based, then I can rename it to vec4..

>
>>> Also, should this apply to just the indirect variants? I'd think they
>>> would apply to the const_index component, as well.  Either that, or we
>>> make const_index always be bytes, maybe.
>>
>> Hmm.. possibly.. since const_index is already bytes on ubo's, which
>> isn't very consistent w/ dwords/slots in the other cases, it might
>> make sense..  otoh, that wouldn't change the code the driver emitted.
>> So I guess I could really go either way.
>
> Yes.  The indirect source and const_index should have the same units.
> If your hardware takes a const_index in bytes and an indirect in
> dwords, that's not NIR

Re: [Mesa-dev] [RFC] nir: compiler options for addressing modes

2015-04-14 Thread Jason Ekstrand
On Tue, Apr 14, 2015 at 12:44 PM, Ilia Mirkin  wrote:
> On Tue, Apr 14, 2015 at 3:32 PM, Jason Ekstrand  wrote:
>> On Tue, Apr 14, 2015 at 11:54 AM, Rob Clark  wrote:
>>> On Tue, Apr 14, 2015 at 1:56 PM, Eric Anholt  wrote:
 Rob Clark  writes:

> From: Rob Clark 
>
> Add compiler options so driver can request the units for address value,
> for the various _indirect intrinsics.  This way, the front end can
> insert the appropriate multiply/shift operations to get the address into
> the units that the driver needs, avoiding need for driver to insert
> these on the backend (and loose out on some optimizing potential).
>
> The addressing modes are specified per var/input/output/uniform/ubo.
> I'm not sure about other drivers, but for freedreno we want byte
> addressing for ubo's and register addressing for the rest.
>
> NOTE: so far just updated ttn and freedreno, and included everything in
> one commit to make it easier to see how it fits together.  The driver
> patches would naturally be separate (since drivers not setting these
> options get the current/default behavior, ie. splitting out won't hurt
> bisectability).  Also the ttn part could be split out as long as it
> comes before freedreno or vc4 parts.

> diff --git a/src/glsl/nir/nir.h b/src/glsl/nir/nir.h
> index 6531237..e8bce9f 100644
> --- a/src/glsl/nir/nir.h
> +++ b/src/glsl/nir/nir.h
> @@ -1374,6 +1374,23 @@ typedef struct nir_function {
> exec_node_data(nir_function_overload, \
>exec_list_get_head(&(func)->overload_list), node)
>
> +/**
> + * Addressing mode used by the backend for various sorts of indirect
> + * access.  This allows the frontend to generate the appropriate
> + * multiply or shift operations to convert the address param for the
> + * corresponding _indirect intrinsic, rather than relying on the
> + * driver to insert them after the various other NIR optimization
> + * passes.
> + */
> +typedef enum {
> +   /** No address conversion, units in elements of array member: */
> +   nir_addressing_mode_none = 0,
> +   /** Address converted to units of registers (ie. float/int32): */
> +   nir_addressing_mode_register = 1,
> +   /** Address converted to units of bytes: */
> +   nir_addressing_mode_byte = 2,
> +} nir_addressing_mode;
> +
>  typedef struct nir_shader_compiler_options {
> bool lower_ffma;
> bool lower_flrp;
> @@ -1393,6 +1410,16 @@ typedef struct nir_shader_compiler_options {
>  * are simulated by floats.)
>  */
> bool native_integers;
> +
> +   /**
> +* Addressing mode for corresponding _indirect intrinsics:
> +*/
> +   nir_addressing_mode var_addressing_mode;
> +   nir_addressing_mode input_addressing_mode;
> +   nir_addressing_mode output_addressing_mode;
> +   nir_addressing_mode uniform_addressing_mode;
> +   nir_addressing_mode ubo_addressing_mode;
> +
>  } nir_shader_compiler_options;
>
>  typedef struct nir_shader {
> --
> 2.1.0

 This seems like a good idea, as it'll mean we can have passes that do
 things like scalarizing uniform loads, which I think could be productive
 for us.

 However, I find the "register" size name confusing -- it seems you mean
 a channel in a register, while on both the pieces of hardware I've
 worked on recently, a register is 16 dwords wide.  I think I'd call it
 "dword" or "float", with a slight preference for dword.
>>>
>>> yeah, I was having trouble picking a good name for that one.. I'd
>>> originally had float, but that sounds funny when it could just as
>>> easily be uint.  (And also, I have half-precision registers that would
>>> be nice to use some day, etc.)  I think "dword" is a better
>>> alternative.
>>
>> I'm not sure what I think of this.  However, having things better
>> defined is a good idea and this may be a step in the right direction.
>>
>> Yes, please use either dword or float for 32-bits.  Also, what is this
>> addressing_mode_none?  If we want a vec4 addressing mode, we should
>> just call it vec4.
>>
 Also, should this apply to just the indirect variants? I'd think they
 would apply to the const_index component, as well.  Either that, or we
 make const_index always be bytes, maybe.
>>>
>>> Hmm.. possibly.. since const_index is already bytes on ubo's, which
>>> isn't very consistent w/ dwords/slots in the other cases, it might
>>> make sense..  otoh, that wouldn't change the code the driver emitted.
>>> So I guess I could really go either way.
>>
>> Yes.  The indirect source and const_index should have the same units.
>> If your hardware takes a const_index in bytes and an indirect in
>> dwords, that's not NIR's problem.  We need to keep it consistent.
>
> One issue is that for UBO's, adreno (or at

Re: [Mesa-dev] [RFC] nir: compiler options for addressing modes

2015-04-14 Thread Ilia Mirkin
On Tue, Apr 14, 2015 at 3:32 PM, Jason Ekstrand  wrote:
> On Tue, Apr 14, 2015 at 11:54 AM, Rob Clark  wrote:
>> On Tue, Apr 14, 2015 at 1:56 PM, Eric Anholt  wrote:
>>> Rob Clark  writes:
>>>
 From: Rob Clark 

 Add compiler options so driver can request the units for address value,
 for the various _indirect intrinsics.  This way, the front end can
 insert the appropriate multiply/shift operations to get the address into
 the units that the driver needs, avoiding need for driver to insert
 these on the backend (and loose out on some optimizing potential).

 The addressing modes are specified per var/input/output/uniform/ubo.
 I'm not sure about other drivers, but for freedreno we want byte
 addressing for ubo's and register addressing for the rest.

 NOTE: so far just updated ttn and freedreno, and included everything in
 one commit to make it easier to see how it fits together.  The driver
 patches would naturally be separate (since drivers not setting these
 options get the current/default behavior, ie. splitting out won't hurt
 bisectability).  Also the ttn part could be split out as long as it
 comes before freedreno or vc4 parts.
>>>
 diff --git a/src/glsl/nir/nir.h b/src/glsl/nir/nir.h
 index 6531237..e8bce9f 100644
 --- a/src/glsl/nir/nir.h
 +++ b/src/glsl/nir/nir.h
 @@ -1374,6 +1374,23 @@ typedef struct nir_function {
 exec_node_data(nir_function_overload, \
exec_list_get_head(&(func)->overload_list), node)

 +/**
 + * Addressing mode used by the backend for various sorts of indirect
 + * access.  This allows the frontend to generate the appropriate
 + * multiply or shift operations to convert the address param for the
 + * corresponding _indirect intrinsic, rather than relying on the
 + * driver to insert them after the various other NIR optimization
 + * passes.
 + */
 +typedef enum {
 +   /** No address conversion, units in elements of array member: */
 +   nir_addressing_mode_none = 0,
 +   /** Address converted to units of registers (ie. float/int32): */
 +   nir_addressing_mode_register = 1,
 +   /** Address converted to units of bytes: */
 +   nir_addressing_mode_byte = 2,
 +} nir_addressing_mode;
 +
  typedef struct nir_shader_compiler_options {
 bool lower_ffma;
 bool lower_flrp;
 @@ -1393,6 +1410,16 @@ typedef struct nir_shader_compiler_options {
  * are simulated by floats.)
  */
 bool native_integers;
 +
 +   /**
 +* Addressing mode for corresponding _indirect intrinsics:
 +*/
 +   nir_addressing_mode var_addressing_mode;
 +   nir_addressing_mode input_addressing_mode;
 +   nir_addressing_mode output_addressing_mode;
 +   nir_addressing_mode uniform_addressing_mode;
 +   nir_addressing_mode ubo_addressing_mode;
 +
  } nir_shader_compiler_options;

  typedef struct nir_shader {
 --
 2.1.0
>>>
>>> This seems like a good idea, as it'll mean we can have passes that do
>>> things like scalarizing uniform loads, which I think could be productive
>>> for us.
>>>
>>> However, I find the "register" size name confusing -- it seems you mean
>>> a channel in a register, while on both the pieces of hardware I've
>>> worked on recently, a register is 16 dwords wide.  I think I'd call it
>>> "dword" or "float", with a slight preference for dword.
>>
>> yeah, I was having trouble picking a good name for that one.. I'd
>> originally had float, but that sounds funny when it could just as
>> easily be uint.  (And also, I have half-precision registers that would
>> be nice to use some day, etc.)  I think "dword" is a better
>> alternative.
>
> I'm not sure what I think of this.  However, having things better
> defined is a good idea and this may be a step in the right direction.
>
> Yes, please use either dword or float for 32-bits.  Also, what is this
> addressing_mode_none?  If we want a vec4 addressing mode, we should
> just call it vec4.
>
>>> Also, should this apply to just the indirect variants? I'd think they
>>> would apply to the const_index component, as well.  Either that, or we
>>> make const_index always be bytes, maybe.
>>
>> Hmm.. possibly.. since const_index is already bytes on ubo's, which
>> isn't very consistent w/ dwords/slots in the other cases, it might
>> make sense..  otoh, that wouldn't change the code the driver emitted.
>> So I guess I could really go either way.
>
> Yes.  The indirect source and const_index should have the same units.
> If your hardware takes a const_index in bytes and an indirect in
> dwords, that's not NIR's problem.  We need to keep it consistent.

One issue is that for UBO's, adreno (or at least the way I implemented
it) wants the offsets in bytes, but for regular uniforms, it wants it
in 32-bit units. UBO loads are done from memory, while regular
un

Re: [Mesa-dev] [RFC] nir: compiler options for addressing modes

2015-04-14 Thread Kenneth Graunke
On Tuesday, April 14, 2015 12:32:47 PM Jason Ekstrand wrote:
> On Tue, Apr 14, 2015 at 11:54 AM, Rob Clark  wrote:
> > On Tue, Apr 14, 2015 at 1:56 PM, Eric Anholt  wrote:
[snip]
> >> This seems like a good idea, as it'll mean we can have passes that do
> >> things like scalarizing uniform loads, which I think could be productive
> >> for us.
> >>
> >> However, I find the "register" size name confusing -- it seems you mean
> >> a channel in a register, while on both the pieces of hardware I've
> >> worked on recently, a register is 16 dwords wide.  I think I'd call it
> >> "dword" or "float", with a slight preference for dword.
> >
> > yeah, I was having trouble picking a good name for that one.. I'd
> > originally had float, but that sounds funny when it could just as
> > easily be uint.  (And also, I have half-precision registers that would
> > be nice to use some day, etc.)  I think "dword" is a better
> > alternative.
> 
> I'm not sure what I think of this.  However, having things better
> defined is a good idea and this may be a step in the right direction.
> 
> Yes, please use either dword or float for 32-bits.  Also, what is this
> addressing_mode_none?  If we want a vec4 addressing mode, we should
> just call it vec4.

Using "vec1" for dword and "vec4" for 4-dwords might not be bad either.
Otherwise, "dword" seems fine...

--Ken


signature.asc
Description: This is a digitally signed message part.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [RFC] nir: compiler options for addressing modes

2015-04-14 Thread Jason Ekstrand
On Tue, Apr 14, 2015 at 11:54 AM, Rob Clark  wrote:
> On Tue, Apr 14, 2015 at 1:56 PM, Eric Anholt  wrote:
>> Rob Clark  writes:
>>
>>> From: Rob Clark 
>>>
>>> Add compiler options so driver can request the units for address value,
>>> for the various _indirect intrinsics.  This way, the front end can
>>> insert the appropriate multiply/shift operations to get the address into
>>> the units that the driver needs, avoiding need for driver to insert
>>> these on the backend (and loose out on some optimizing potential).
>>>
>>> The addressing modes are specified per var/input/output/uniform/ubo.
>>> I'm not sure about other drivers, but for freedreno we want byte
>>> addressing for ubo's and register addressing for the rest.
>>>
>>> NOTE: so far just updated ttn and freedreno, and included everything in
>>> one commit to make it easier to see how it fits together.  The driver
>>> patches would naturally be separate (since drivers not setting these
>>> options get the current/default behavior, ie. splitting out won't hurt
>>> bisectability).  Also the ttn part could be split out as long as it
>>> comes before freedreno or vc4 parts.
>>
>>> diff --git a/src/glsl/nir/nir.h b/src/glsl/nir/nir.h
>>> index 6531237..e8bce9f 100644
>>> --- a/src/glsl/nir/nir.h
>>> +++ b/src/glsl/nir/nir.h
>>> @@ -1374,6 +1374,23 @@ typedef struct nir_function {
>>> exec_node_data(nir_function_overload, \
>>>exec_list_get_head(&(func)->overload_list), node)
>>>
>>> +/**
>>> + * Addressing mode used by the backend for various sorts of indirect
>>> + * access.  This allows the frontend to generate the appropriate
>>> + * multiply or shift operations to convert the address param for the
>>> + * corresponding _indirect intrinsic, rather than relying on the
>>> + * driver to insert them after the various other NIR optimization
>>> + * passes.
>>> + */
>>> +typedef enum {
>>> +   /** No address conversion, units in elements of array member: */
>>> +   nir_addressing_mode_none = 0,
>>> +   /** Address converted to units of registers (ie. float/int32): */
>>> +   nir_addressing_mode_register = 1,
>>> +   /** Address converted to units of bytes: */
>>> +   nir_addressing_mode_byte = 2,
>>> +} nir_addressing_mode;
>>> +
>>>  typedef struct nir_shader_compiler_options {
>>> bool lower_ffma;
>>> bool lower_flrp;
>>> @@ -1393,6 +1410,16 @@ typedef struct nir_shader_compiler_options {
>>>  * are simulated by floats.)
>>>  */
>>> bool native_integers;
>>> +
>>> +   /**
>>> +* Addressing mode for corresponding _indirect intrinsics:
>>> +*/
>>> +   nir_addressing_mode var_addressing_mode;
>>> +   nir_addressing_mode input_addressing_mode;
>>> +   nir_addressing_mode output_addressing_mode;
>>> +   nir_addressing_mode uniform_addressing_mode;
>>> +   nir_addressing_mode ubo_addressing_mode;
>>> +
>>>  } nir_shader_compiler_options;
>>>
>>>  typedef struct nir_shader {
>>> --
>>> 2.1.0
>>
>> This seems like a good idea, as it'll mean we can have passes that do
>> things like scalarizing uniform loads, which I think could be productive
>> for us.
>>
>> However, I find the "register" size name confusing -- it seems you mean
>> a channel in a register, while on both the pieces of hardware I've
>> worked on recently, a register is 16 dwords wide.  I think I'd call it
>> "dword" or "float", with a slight preference for dword.
>
> yeah, I was having trouble picking a good name for that one.. I'd
> originally had float, but that sounds funny when it could just as
> easily be uint.  (And also, I have half-precision registers that would
> be nice to use some day, etc.)  I think "dword" is a better
> alternative.

I'm not sure what I think of this.  However, having things better
defined is a good idea and this may be a step in the right direction.

Yes, please use either dword or float for 32-bits.  Also, what is this
addressing_mode_none?  If we want a vec4 addressing mode, we should
just call it vec4.

>> Also, should this apply to just the indirect variants? I'd think they
>> would apply to the const_index component, as well.  Either that, or we
>> make const_index always be bytes, maybe.
>
> Hmm.. possibly.. since const_index is already bytes on ubo's, which
> isn't very consistent w/ dwords/slots in the other cases, it might
> make sense..  otoh, that wouldn't change the code the driver emitted.
> So I guess I could really go either way.

Yes.  The indirect source and const_index should have the same units.
If your hardware takes a const_index in bytes and an indirect in
dwords, that's not NIR's problem.  We need to keep it consistent.

>> This patch should probably edit the comment in nir_intrinsics.h about
>> addressing, too.
>
> hmm, yeah, that comment doesn't really make any sense since (afaict)
> NIR doesn't *really* know if you are vec or scalar backend..

We also need to update nir_lower_io, nir_lower_locals_to_regs and
nir_lower_globals_to_regs.  Having these things in global NIR
structures and the

Re: [Mesa-dev] [RFC] nir: compiler options for addressing modes

2015-04-14 Thread Rob Clark
On Tue, Apr 14, 2015 at 1:56 PM, Eric Anholt  wrote:
> Rob Clark  writes:
>
>> From: Rob Clark 
>>
>> Add compiler options so driver can request the units for address value,
>> for the various _indirect intrinsics.  This way, the front end can
>> insert the appropriate multiply/shift operations to get the address into
>> the units that the driver needs, avoiding need for driver to insert
>> these on the backend (and loose out on some optimizing potential).
>>
>> The addressing modes are specified per var/input/output/uniform/ubo.
>> I'm not sure about other drivers, but for freedreno we want byte
>> addressing for ubo's and register addressing for the rest.
>>
>> NOTE: so far just updated ttn and freedreno, and included everything in
>> one commit to make it easier to see how it fits together.  The driver
>> patches would naturally be separate (since drivers not setting these
>> options get the current/default behavior, ie. splitting out won't hurt
>> bisectability).  Also the ttn part could be split out as long as it
>> comes before freedreno or vc4 parts.
>
>> diff --git a/src/glsl/nir/nir.h b/src/glsl/nir/nir.h
>> index 6531237..e8bce9f 100644
>> --- a/src/glsl/nir/nir.h
>> +++ b/src/glsl/nir/nir.h
>> @@ -1374,6 +1374,23 @@ typedef struct nir_function {
>> exec_node_data(nir_function_overload, \
>>exec_list_get_head(&(func)->overload_list), node)
>>
>> +/**
>> + * Addressing mode used by the backend for various sorts of indirect
>> + * access.  This allows the frontend to generate the appropriate
>> + * multiply or shift operations to convert the address param for the
>> + * corresponding _indirect intrinsic, rather than relying on the
>> + * driver to insert them after the various other NIR optimization
>> + * passes.
>> + */
>> +typedef enum {
>> +   /** No address conversion, units in elements of array member: */
>> +   nir_addressing_mode_none = 0,
>> +   /** Address converted to units of registers (ie. float/int32): */
>> +   nir_addressing_mode_register = 1,
>> +   /** Address converted to units of bytes: */
>> +   nir_addressing_mode_byte = 2,
>> +} nir_addressing_mode;
>> +
>>  typedef struct nir_shader_compiler_options {
>> bool lower_ffma;
>> bool lower_flrp;
>> @@ -1393,6 +1410,16 @@ typedef struct nir_shader_compiler_options {
>>  * are simulated by floats.)
>>  */
>> bool native_integers;
>> +
>> +   /**
>> +* Addressing mode for corresponding _indirect intrinsics:
>> +*/
>> +   nir_addressing_mode var_addressing_mode;
>> +   nir_addressing_mode input_addressing_mode;
>> +   nir_addressing_mode output_addressing_mode;
>> +   nir_addressing_mode uniform_addressing_mode;
>> +   nir_addressing_mode ubo_addressing_mode;
>> +
>>  } nir_shader_compiler_options;
>>
>>  typedef struct nir_shader {
>> --
>> 2.1.0
>
> This seems like a good idea, as it'll mean we can have passes that do
> things like scalarizing uniform loads, which I think could be productive
> for us.
>
> However, I find the "register" size name confusing -- it seems you mean
> a channel in a register, while on both the pieces of hardware I've
> worked on recently, a register is 16 dwords wide.  I think I'd call it
> "dword" or "float", with a slight preference for dword.

yeah, I was having trouble picking a good name for that one.. I'd
originally had float, but that sounds funny when it could just as
easily be uint.  (And also, I have half-precision registers that would
be nice to use some day, etc.)  I think "dword" is a better
alternative.

> Also, should this apply to just the indirect variants? I'd think they
> would apply to the const_index component, as well.  Either that, or we
> make const_index always be bytes, maybe.

Hmm.. possibly.. since const_index is already bytes on ubo's, which
isn't very consistent w/ dwords/slots in the other cases, it might
make sense..  otoh, that wouldn't change the code the driver emitted.
So I guess I could really go either way.

> This patch should probably edit the comment in nir_intrinsics.h about
> addressing, too.

hmm, yeah, that comment doesn't really make any sense since (afaict)
NIR doesn't *really* know if you are vec or scalar backend..

BR,
-R
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [RFC] nir: compiler options for addressing modes

2015-04-14 Thread Eric Anholt
Rob Clark  writes:

> From: Rob Clark 
>
> Add compiler options so driver can request the units for address value,
> for the various _indirect intrinsics.  This way, the front end can
> insert the appropriate multiply/shift operations to get the address into
> the units that the driver needs, avoiding need for driver to insert
> these on the backend (and loose out on some optimizing potential).
>
> The addressing modes are specified per var/input/output/uniform/ubo.
> I'm not sure about other drivers, but for freedreno we want byte
> addressing for ubo's and register addressing for the rest.
>
> NOTE: so far just updated ttn and freedreno, and included everything in
> one commit to make it easier to see how it fits together.  The driver
> patches would naturally be separate (since drivers not setting these
> options get the current/default behavior, ie. splitting out won't hurt
> bisectability).  Also the ttn part could be split out as long as it
> comes before freedreno or vc4 parts.

> diff --git a/src/glsl/nir/nir.h b/src/glsl/nir/nir.h
> index 6531237..e8bce9f 100644
> --- a/src/glsl/nir/nir.h
> +++ b/src/glsl/nir/nir.h
> @@ -1374,6 +1374,23 @@ typedef struct nir_function {
> exec_node_data(nir_function_overload, \
>exec_list_get_head(&(func)->overload_list), node)
>  
> +/**
> + * Addressing mode used by the backend for various sorts of indirect
> + * access.  This allows the frontend to generate the appropriate
> + * multiply or shift operations to convert the address param for the
> + * corresponding _indirect intrinsic, rather than relying on the
> + * driver to insert them after the various other NIR optimization
> + * passes.
> + */
> +typedef enum {
> +   /** No address conversion, units in elements of array member: */
> +   nir_addressing_mode_none = 0,
> +   /** Address converted to units of registers (ie. float/int32): */
> +   nir_addressing_mode_register = 1,
> +   /** Address converted to units of bytes: */
> +   nir_addressing_mode_byte = 2,
> +} nir_addressing_mode;
> +
>  typedef struct nir_shader_compiler_options {
> bool lower_ffma;
> bool lower_flrp;
> @@ -1393,6 +1410,16 @@ typedef struct nir_shader_compiler_options {
>  * are simulated by floats.)
>  */
> bool native_integers;
> +
> +   /**
> +* Addressing mode for corresponding _indirect intrinsics:
> +*/
> +   nir_addressing_mode var_addressing_mode;
> +   nir_addressing_mode input_addressing_mode;
> +   nir_addressing_mode output_addressing_mode;
> +   nir_addressing_mode uniform_addressing_mode;
> +   nir_addressing_mode ubo_addressing_mode;
> +
>  } nir_shader_compiler_options;
>  
>  typedef struct nir_shader {
> -- 
> 2.1.0

This seems like a good idea, as it'll mean we can have passes that do
things like scalarizing uniform loads, which I think could be productive
for us.

However, I find the "register" size name confusing -- it seems you mean
a channel in a register, while on both the pieces of hardware I've
worked on recently, a register is 16 dwords wide.  I think I'd call it
"dword" or "float", with a slight preference for dword.

Also, should this apply to just the indirect variants? I'd think they
would apply to the const_index component, as well.  Either that, or we
make const_index always be bytes, maybe.

This patch should probably edit the comment in nir_intrinsics.h about
addressing, too.


signature.asc
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [RFC] nir: compiler options for addressing modes

2015-04-14 Thread Rob Clark
From: Rob Clark 

Add compiler options so driver can request the units for address value,
for the various _indirect intrinsics.  This way, the front end can
insert the appropriate multiply/shift operations to get the address into
the units that the driver needs, avoiding need for driver to insert
these on the backend (and loose out on some optimizing potential).

The addressing modes are specified per var/input/output/uniform/ubo.
I'm not sure about other drivers, but for freedreno we want byte
addressing for ubo's and register addressing for the rest.

NOTE: so far just updated ttn and freedreno, and included everything in
one commit to make it easier to see how it fits together.  The driver
patches would naturally be separate (since drivers not setting these
options get the current/default behavior, ie. splitting out won't hurt
bisectability).  Also the ttn part could be split out as long as it
comes before freedreno or vc4 parts.

As for gtn or intel backends, I guess there are enough others who
actually know their way around that code to add support, if appropriate.
---
 src/gallium/auxiliary/nir/tgsi_to_nir.c| 54 +++---
 .../drivers/freedreno/ir3/ir3_compiler_nir.c   | 20 +++-
 src/glsl/nir/nir.h | 27 +++
 3 files changed, 81 insertions(+), 20 deletions(-)

diff --git a/src/gallium/auxiliary/nir/tgsi_to_nir.c 
b/src/gallium/auxiliary/nir/tgsi_to_nir.c
index 0d4bd7c..a370a01 100644
--- a/src/gallium/auxiliary/nir/tgsi_to_nir.c
+++ b/src/gallium/auxiliary/nir/tgsi_to_nir.c
@@ -257,7 +257,21 @@ ttn_emit_immediate(struct ttn_compile *c)
 }
 
 static nir_src *
-ttn_src_for_indirect(struct ttn_compile *c, struct tgsi_ind_register 
*indirect);
+ttn_src_for_indirect(struct ttn_compile *c, struct tgsi_ind_register *indirect,
+ unsigned file);
+
+/* everything in tgsi-land is vec4's, so we can convert to left-shift: */
+static unsigned ttn_addr_mode_to_shift(nir_addressing_mode mode)
+{
+   switch (mode) {
+   case nir_addressing_mode_none:  return 0;
+   case nir_addressing_mode_register:  return 2;
+   case nir_addressing_mode_byte:  return 4;
+   default:
+  unreachable("bad addressing mode");
+  return ~0;
+   }
+}
 
 /* generate either a constant or indirect deref chain for accessing an
  * array variable.
@@ -275,7 +289,8 @@ ttn_array_deref(struct ttn_compile *c, nir_intrinsic_instr 
*instr,
 
if (indirect) {
   arr->deref_array_type = nir_deref_array_type_indirect;
-  arr->indirect = *ttn_src_for_indirect(c, indirect);
+  arr->indirect = *ttn_src_for_indirect(c, indirect,
+TGSI_FILE_TEMPORARY);
} else {
   arr->deref_array_type = nir_deref_array_type_direct;
}
@@ -383,7 +398,7 @@ ttn_src_for_file_and_index(struct ttn_compile *c, unsigned 
file, unsigned index,
   load->const_index[0] = index;
   load->const_index[1] = 1;
   if (indirect) {
- load->src[0] = *ttn_src_for_indirect(c, indirect);
+ load->src[0] = *ttn_src_for_indirect(c, indirect, file);
   }
   nir_ssa_dest_init(&load->instr, &load->dest, 4, NULL);
   nir_instr_insert_after_cf_list(b->cf_node_list, &load->instr);
@@ -400,8 +415,24 @@ ttn_src_for_file_and_index(struct ttn_compile *c, unsigned 
file, unsigned index,
return src;
 }
 
+static nir_addressing_mode
+ttn_addr_mode_for_file(struct ttn_compile *c, unsigned file)
+{
+   const struct nir_shader_compiler_options *options = c->s->options;
+   switch (file) {
+   case TGSI_FILE_CONSTANT:  return options->uniform_addressing_mode;
+   case TGSI_FILE_INPUT: return options->input_addressing_mode;
+   case TGSI_FILE_OUTPUT:return options->output_addressing_mode;
+   case TGSI_FILE_TEMPORARY: return options->var_addressing_mode;
+   default:
+  unreachable("invalid file");
+  return 0;
+   }
+}
+
 static nir_src *
-ttn_src_for_indirect(struct ttn_compile *c, struct tgsi_ind_register *indirect)
+ttn_src_for_indirect(struct ttn_compile *c, struct tgsi_ind_register *indirect,
+ unsigned file)
 {
nir_builder *b = &c->build;
nir_alu_src src;
@@ -411,8 +442,18 @@ ttn_src_for_indirect(struct ttn_compile *c, struct 
tgsi_ind_register *indirect)
src.src = ttn_src_for_file_and_index(c,
 indirect->File,
 indirect->Index, NULL);
+
+   nir_ssa_def *addr = nir_imov_alu(b, src, 1);
+
+   /* do we need to convert address from vec4? */
+   nir_addressing_mode addr_mode = ttn_addr_mode_for_file(c, file);
+   unsigned shift = ttn_addr_mode_to_shift(addr_mode);
+
+   if (shift)
+  addr = nir_ishl(b, addr, nir_imm_int(b, shift));
+
nir_src *result = ralloc(b->shader, nir_src);
-   *result = nir_src_for_ssa(nir_imov_alu(b, src, 1));
+   *result = nir_src_for_ssa(addr);
return result;
 }
 
@@ -481,7 +522,8 @@ ttn_get_dest(struct ttn_compile *c, struct 
tgsi