OpenGL 4.0

Samuel Iglesias Gonsálvez Tue, 13 Dec 2016 04:22:43 -0800

On Sun, 2016-12-11 at 15:00 -0800, Matt Turner wrote:
> i965/vec4: handle 32 and 64 bit channels in liveness analysis
> 
>       Please indent the returned multiline expressions in
>       var_from_reg() like we do elsewhere, so that the second line
>       begins on the same column as the first line.
> 
>       */ goes on its own line.
> 
>       I'm having a hard time reviewing this one. The logic is rather
>       complex. I'll ask someone to help me review it on Monday at the
>       office.
>


OK, no problem!

> i965/vec4: add a horiz_offset() helper
> i965: move the group field from fs_inst to backend_instruction.
> i965/vec4: add a SIMD lowering pass
> 
>       In the commit message, you say
> 
>           For now the pass only handles the gen7 restriction where
> any
>           instruction that writes 2 registers also needs to read 2
>           registers.  This affects double-precision instructions
>           reading uniforms, for example. Later patches will extend
> the
>           lowering pass adding a few more cases.
> 
>       But the rule about if-writing-two-regs, must-read-two-regs
>       says that scalar sources are an exception:
> 
>               "When source is scalar, the source registers are not
>                incremented."
> 
>       I don't see any code that allows us to avoid splitting an
>       instruction if it's writing two registers but sourcing a scalar
>       uniform. Maybe this doesn't apply because we have to use a non
>       scalar swizzle (.xy) to access a single fp64 component?
> 

Right, however, this is necessary in align16 because uniforms are
vectors (hstride != 0) so they are not scalars.

> i965/vec4: make the generator set correct NibCtrl for SIMD4 DF
> instructions
> i965/vec4: dump NibCtrl for instructions with execsize != 8
> i965/disasm: print NibCtrl for instructions with execsize < 8
> i965/vec4: teach CSE about exec_size, group and doubles
> i965/vec4: teach cmod propagation about different execution sizes
> i965/vec4: split double-precision bcsel
> 
>       bcsel is the NIR opcode. I'd change references to bcsel to SEL.
> 
>       Very interesting find...
> 
> i965/vec4: add a scalarization pass for double-precision instructions
> 
>       Don't indent case inside a switch.
> 
> i965/vec4: translate 64-bit swizzles to 32-bit
> i965/vec4: implement access to DF source components Z/W
> 
>       Wow, bien hecho!
> 
> i965/disasm: fix subreg for dst in Align16 mode
> i965/vec4: teach register coalescing about 64-bit
> i965/vec4: fix pack_uniform_registers for doubles
> i965/vec4: fix indentation in pack_uniform_registers
> i965/vec4: Skip swizzle to subnr in 3src instructions with DF
> operands
> 
>       s/need/needs/ in the comment.
> 
> i965/vec4/nir: do not emit 64-bit MAD
> i965/vec4: do not emit 64-bit MAD
> 
>       I might change the name of this commit to "i965/vec4: Lower
>       64-bit MAD" or "i965/vec4: Lower DF MAD"
> 
>       I think I'd change the name of the function as well, maybe to
>       lower_64bit_mad[_to_mul_add] or something.
> 

OK, we will do the rename.

> i965/vec4: support multiple dispatch widths and groups in the IR
> builder.
> i965/vec4: Add a shuffle_64bit_data helper
> 
>       I was initially confused by r0.0:DF/r0.1:DF, thinking that .1
> in
>       r0.1:DF was a subreg offset. But I think it's actually the
>       register offset (i.e., .offset)?
> 
>       If that's the case, I think it would be clearer just to
>       increment the register number in the example:
> 
>               r0.0:DF  x0 y0 z0 w0
>               r1.0:DF  x1 y1 z1 w1
> 
>       s/opperation/operation/ in the comment.
> 
>       On the multiline bld.group(...), I think Curro's style is to
>       align with the '.'. For instance,
> 
>       inst = bld.group(4, for_write ? 1 : 0)
>                 .MOV(writemask(dst, WRITEMASK_ZW),
>                      swizzle(byte_offset(src, REG_SIZE),
> BRW_SWIZZLE_XYXY));
> 
>       so that group and MOV align, with the '.' on the same line as
>       the MOV.
> 

Regarding the example, yes, you are right. We are going to fix it.
Thanks for the rest of suggestions, we will do them too :-)


> i965/vec4: Fix UBO loads for 64-bit data
> i965/vec4: Fix SSBO loads for 64-bit data
> i965/vec4: Fix SSBO stores for 64-bit data
> i965/vec4: don't constant propagate 64-bit immediates
> i965/vec4: prevent copy-propagation from values with a different type
> size
> i965/vec4: Prevent copy propagation from violating pre-gen8
> restrictions
> 
>       Similar comment as before about being allowed to write two
>       registers while sourcing a scalar. Maybe doesn't apply because
>       of the double swizzle.
> 

Same reply as to the other question.

> i965/vec4: don't propagate single-precision uniforms into 4-wide
> instructions
> i965/vec4: don't copy propagate misaligned registers
> i965/vec4: extend the DWORD multiply DepCtrl restriction to all gen8
> platforms
> 
>       I don't see this in the BDW PRMs, and the internal
> documentation
>       says that it applies to "CHV, BXT"
> 
>       I suggest dropping this patch (or replacing it with one that
>       adds || devinfo->is_broxton).
> 

It is present in BDW PRMs, Volume 7: 3D-Media-GPGPU, "Special
Requirements for Handling Double Precision Data Types",
page 837:
    
"When source or destination datatype is 64b or operation is integer
DWord multiply, DepCtrl must not be used."

Regarding Broxton, there is no public PRMs here [0]. So if your
internal docs say it is needed, then we must fix this patch.

[0] https://01.org/linuxgraphics/documentation/hardware-specification-p
rms


> i965/vec4: Do not use DepCtrl with 64-bit instructions
> i965/vec4: do not split scratch read/write opcodes
> i965/vec4: fix scratch offset for 64bit data
> i965/vec4: fix scratch reads for 64bit data
> i965/vec4: fix scratch writes for 64bit data
> i965/vec4: fix move_uniform_array_access_to_pull_constant() for 64-
> bit data
> i965/vec4: fix indentation in move_push_constants_to_pull_constants()
> i965/vec4: fix move_push_constants_to_pull_constants() for 64-bit
> data
> i965/vec4: make emit_pull_constant_load support 64-bit loads
> i965/vec4: fix indentation in lower_attributes_to_hw_regs()
> i965/vec4: fix attribute setup for doubles
> i965/vec4: fix store output for 64-bit types
> i965/vec4/gs: fix input loading for 64bit data
> i965/vec4/tcs: fix input loading for 64-bit data
> i965/vec4/tcs: fix outputs for 64-bit data
> i965/vec4/tes: fix input loading for 64bit data types
> i965/vec4/tes: fix setup_payload() for 64bit data types
> i965/vec4/tes: consider register offsets during attribute setup
> i965/vec4: dump subnr for FIXED_GRF
> i965/vec4: split instructions that read 64-bit interleaved attributes
> 
>       I think you want to make stage_uses_interleaved_attributes
>       static.
> 
>       Also don't indent case inside switch.
> 
> i965/vec4/scalarize_df: do not scalarize swizzles that we can support
> natively
> i965/vec4/scalarize_df: support more swizzles via vstride=0
> i965/vec4: prevent src/dst hazards during 64-bit register allocation
> i965/vec4: run scalarize_df() after spilling
> 
>       Calling scalarize_df() on code that's already been scalarized
>       seems a little scary...
> 
>       Are you sure that scalarize_df() will not modify any code that
>       has already been scalarized?
> 

Yes, it is safe. The scalarization pass operates on logical channels, 
so for something like:

                mov r0.xy:df r1.zw:df

it produces:

                mov r0.x:df r1.z:dfmov r0.y:df r1.w:df

and if we run the pass again nothing will change for these instructions
because they only operate on a single channel, so we end up with the
same thing. It would have been a different story if we also translated
the logical channels into 32-bit channels during the scalarization
process, but that won't happen until right before codegen precisely so
we can avoid this sort of problems (specifically we do it in
convert_to_hw_regs).

> i965/vec4: support basic spilling of 64-bit registers
> i965/vec4: avoid spilling of registers that mix 32-bit and 64-bit
> access
> i965/vec4: prevent spilling of DOUBLE_TO_SINGLE destination
> i965/vec4: adjust spilling costs for 64-bit registers.
> i965/vec4: enable ARB_gpu_shader_fp64 for Haswell
> i965/gen7: expose OpenGL 4.0 on Haswell
> 
>       There's been a lot of discussion about this patch that I
> haven't
>       been involved in. Presumably given my review of the previous
> 100
>       patches, someone previously involved can be bothered to
> rereview
>       this one...

OK, no problem. Thanks for reviewing all the other patches! We will
ping Curro about this one since he was the one who triggered the v3.

Sam

signature.asc
Description: This is a digitally signed message part

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH v2 000/103] i965 Haswell ARB_gpu_shader_fp64 / OpenGL 4.0

Reply via email to