OpenGL 4.0

Matt Turner Sun, 11 Dec 2016 15:00:22 -0800

i965/vec4: handle 32 and 64 bit channels in liveness analysis


        Please indent the returned multiline expressions in
        var_from_reg() like we do elsewhere, so that the second line
        begins on the same column as the first line.

        */ goes on its own line.

        I'm having a hard time reviewing this one. The logic is rather
        complex. I'll ask someone to help me review it on Monday at the
        office.

i965/vec4: add a horiz_offset() helper
i965: move the group field from fs_inst to backend_instruction.
i965/vec4: add a SIMD lowering pass

        In the commit message, you say

            For now the pass only handles the gen7 restriction where any
            instruction that writes 2 registers also needs to read 2
            registers.  This affects double-precision instructions
            reading uniforms, for example. Later patches will extend the
            lowering pass adding a few more cases.

        But the rule about if-writing-two-regs, must-read-two-regs
        says that scalar sources are an exception:

                "When source is scalar, the source registers are not
                 incremented."

        I don't see any code that allows us to avoid splitting an
        instruction if it's writing two registers but sourcing a scalar
        uniform. Maybe this doesn't apply because we have to use a non
        scalar swizzle (.xy) to access a single fp64 component?

i965/vec4: make the generator set correct NibCtrl for SIMD4 DF instructions
i965/vec4: dump NibCtrl for instructions with execsize != 8
i965/disasm: print NibCtrl for instructions with execsize < 8
i965/vec4: teach CSE about exec_size, group and doubles
i965/vec4: teach cmod propagation about different execution sizes
i965/vec4: split double-precision bcsel

        bcsel is the NIR opcode. I'd change references to bcsel to SEL.

        Very interesting find...

i965/vec4: add a scalarization pass for double-precision instructions

        Don't indent case inside a switch.

i965/vec4: translate 64-bit swizzles to 32-bit
i965/vec4: implement access to DF source components Z/W

        Wow, bien hecho!

i965/disasm: fix subreg for dst in Align16 mode
i965/vec4: teach register coalescing about 64-bit
i965/vec4: fix pack_uniform_registers for doubles
i965/vec4: fix indentation in pack_uniform_registers
i965/vec4: Skip swizzle to subnr in 3src instructions with DF operands

        s/need/needs/ in the comment.

i965/vec4/nir: do not emit 64-bit MAD
i965/vec4: do not emit 64-bit MAD

        I might change the name of this commit to "i965/vec4: Lower
        64-bit MAD" or "i965/vec4: Lower DF MAD"

        I think I'd change the name of the function as well, maybe to
        lower_64bit_mad[_to_mul_add] or something.

i965/vec4: support multiple dispatch widths and groups in the IR builder.
i965/vec4: Add a shuffle_64bit_data helper

        I was initially confused by r0.0:DF/r0.1:DF, thinking that .1 in
        r0.1:DF was a subreg offset. But I think it's actually the
        register offset (i.e., .offset)?

        If that's the case, I think it would be clearer just to
        increment the register number in the example:

                r0.0:DF  x0 y0 z0 w0
                r1.0:DF  x1 y1 z1 w1

        s/opperation/operation/ in the comment.

        On the multiline bld.group(...), I think Curro's style is to
        align with the '.'. For instance,

        inst = bld.group(4, for_write ? 1 : 0)
                  .MOV(writemask(dst, WRITEMASK_ZW),
                       swizzle(byte_offset(src, REG_SIZE), BRW_SWIZZLE_XYXY));

        so that group and MOV align, with the '.' on the same line as
        the MOV.

i965/vec4: Fix UBO loads for 64-bit data
i965/vec4: Fix SSBO loads for 64-bit data
i965/vec4: Fix SSBO stores for 64-bit data
i965/vec4: don't constant propagate 64-bit immediates
i965/vec4: prevent copy-propagation from values with a different type size
i965/vec4: Prevent copy propagation from violating pre-gen8 restrictions

        Similar comment as before about being allowed to write two
        registers while sourcing a scalar. Maybe doesn't apply because
        of the double swizzle.

i965/vec4: don't propagate single-precision uniforms into 4-wide instructions
i965/vec4: don't copy propagate misaligned registers
i965/vec4: extend the DWORD multiply DepCtrl restriction to all gen8 platforms

        I don't see this in the BDW PRMs, and the internal documentation
        says that it applies to "CHV, BXT"

        I suggest dropping this patch (or replacing it with one that
        adds || devinfo->is_broxton).

i965/vec4: Do not use DepCtrl with 64-bit instructions
i965/vec4: do not split scratch read/write opcodes
i965/vec4: fix scratch offset for 64bit data
i965/vec4: fix scratch reads for 64bit data
i965/vec4: fix scratch writes for 64bit data
i965/vec4: fix move_uniform_array_access_to_pull_constant() for 64-bit data
i965/vec4: fix indentation in move_push_constants_to_pull_constants()
i965/vec4: fix move_push_constants_to_pull_constants() for 64-bit data
i965/vec4: make emit_pull_constant_load support 64-bit loads
i965/vec4: fix indentation in lower_attributes_to_hw_regs()
i965/vec4: fix attribute setup for doubles
i965/vec4: fix store output for 64-bit types
i965/vec4/gs: fix input loading for 64bit data
i965/vec4/tcs: fix input loading for 64-bit data
i965/vec4/tcs: fix outputs for 64-bit data
i965/vec4/tes: fix input loading for 64bit data types
i965/vec4/tes: fix setup_payload() for 64bit data types
i965/vec4/tes: consider register offsets during attribute setup
i965/vec4: dump subnr for FIXED_GRF
i965/vec4: split instructions that read 64-bit interleaved attributes

        I think you want to make stage_uses_interleaved_attributes
        static.

        Also don't indent case inside switch.

i965/vec4/scalarize_df: do not scalarize swizzles that we can support natively
i965/vec4/scalarize_df: support more swizzles via vstride=0
i965/vec4: prevent src/dst hazards during 64-bit register allocation
i965/vec4: run scalarize_df() after spilling

        Calling scalarize_df() on code that's already been scalarized
        seems a little scary...

        Are you sure that scalarize_df() will not modify any code that
        has already been scalarized?

i965/vec4: support basic spilling of 64-bit registers
i965/vec4: avoid spilling of registers that mix 32-bit and 64-bit access
i965/vec4: prevent spilling of DOUBLE_TO_SINGLE destination
i965/vec4: adjust spilling costs for 64-bit registers.
i965/vec4: enable ARB_gpu_shader_fp64 for Haswell
i965/gen7: expose OpenGL 4.0 on Haswell

        There's been a lot of discussion about this patch that I haven't
        been involved in. Presumably given my review of the previous 100
        patches, someone previously involved can be bothered to rereview
        this one...

signature.asc
Description: Digital signature

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH v2 000/103] i965 Haswell ARB_gpu_shader_fp64 / OpenGL 4.0

Reply via email to