https://bugs.freedesktop.org/show_bug.cgi?id=106774

            Bug ID: 106774
           Summary: NIR "copy propagates" loads of SSBOs
           Product: Mesa
           Version: git
          Hardware: Other
                OS: All
            Status: NEW
          Severity: normal
          Priority: medium
         Component: glsl-compiler
          Assignee: mesa-dev@lists.freedesktop.org
          Reporter: i...@freedesktop.org
        QA Contact: intel-3d-b...@lists.freedesktop.org

Created attachment 139956
  --> https://bugs.freedesktop.org/attachment.cgi?id=139956&action=edit
Test case that exhibits the issue

If a value loaded from an SSBO is used multiple times, the ssbo_load gets copy
propagated to each use.  SSBO loads are expensive, so, while this may be
technically correct, it is also bad.

Arguments can be made that the test case is not correct, but that is an
orthogonal issue.  The core of the attached test case is:

    layout(binding = 0) buffer bufblock {
        int value;
    };

...

        int f;

        /* This is an open-coded atomicAdd. */
        do {
            f = value;
        } while (f != atomicCompSwap(value, f, f + 4));

For this code, the resulting NIR is (note the four load_ssbo operations):

        loop {
                block block_1:
                /* preds: block_0 block_4 */
                vec1 32 ssa_4 = load_const (0x00000001 /* 0.000000 */)
                vec1 32 ssa_5 = iadd ssa_0, ssa_4
                vec1 32 ssa_6 = intrinsic load_ssbo (ssa_5, ssa_0) () ()
                vec1 32 ssa_7 = load_const (0x00000001 /* 0.000000 */)
                vec1 32 ssa_8 = iadd ssa_0, ssa_7
                vec1 32 ssa_9 = intrinsic load_ssbo (ssa_8, ssa_0) () ()
                vec1 32 ssa_10 = load_const (0x00000001 /* 0.000000 */)
                vec1 32 ssa_11 = iadd ssa_0, ssa_10
                vec1 32 ssa_12 = intrinsic load_ssbo (ssa_11, ssa_0) () ()
                vec1 32 ssa_13 = iadd ssa_12, ssa_1
                vec1 32 ssa_14 = load_const (0x00000001 /* 0.000000 */)
                vec1 32 ssa_15 = iadd ssa_0, ssa_14
                vec1 32 ssa_16 = intrinsic ssbo_atomic_comp_swap (ssa_15,
ssa_0, ssa_9, ssa_13) () ()
                vec1 32 ssa_17 = load_const (0x00000001 /* 0.000000 */)
                vec1 32 ssa_18 = iadd ssa_0, ssa_17
                vec1 32 ssa_19 = intrinsic load_ssbo (ssa_18, ssa_0) () ()
                vec1 32 ssa_20 = ieq ssa_19, ssa_16
                /* succs: block_2 block_3 */
                if ssa_20 {
                        block block_2:
                        /* preds: block_1 */
                        break
                        /* succs: block_5 */
                } else {
                        block block_3:
                        /* preds: block_1 */
                        /* succs: block_4 */
                }
                block block_4:
                /* preds: block_3 */
                /* succs: block_1 */
        }

The multiple reads get a mishmosh of values (written by other threads), and 

Annotating 'value' with volatile helps, but the generated code still wrong. 
There should definitely *NOT* be a load_ssbo after the ssbo_atomic_comp_swap!

        loop {
                block block_1:
                /* preds: block_0 block_4 */
                vec1 32 ssa_4 = intrinsic load_ssbo (ssa_2, ssa_0) () ()
                vec1 32 ssa_5 = intrinsic load_ssbo (ssa_2, ssa_0) () ()
                vec1 32 ssa_6 = intrinsic load_ssbo (ssa_2, ssa_0) () ()
                vec1 32 ssa_7 = iadd ssa_6, ssa_1
                vec1 32 ssa_8 = intrinsic ssbo_atomic_comp_swap (ssa_2, ssa_0,
ssa_5, ssa_7) () ()
                vec1 32 ssa_9 = intrinsic load_ssbo (ssa_2, ssa_0) () ()
                vec1 32 ssa_10 = ieq ssa_9, ssa_8
                /* succs: block_2 block_3 */
                if ssa_10 {
                        block block_2:
                        /* preds: block_1 */
                        break
                        /* succs: block_5 */
                } else {
                        block block_3:
                        /* preds: block_1 */
                        /* succs: block_4 */
                }
                block block_4:
                /* preds: block_3 */
                /* succs: block_1 */
        }

Completely re-writing the loop as

        int f = value;
        int a;
        do {
                a = f;
        } while ((f = atomicCompSwap(value, f, f + 4)) != a);

produces the expected results:

        vec1 32 ssa_4 = intrinsic load_ssbo (ssa_2, ssa_0) () ()
        /* succs: block_1 */
        loop {
                block block_1:
                /* preds: block_0 block_4 */
                vec1 32 ssa_5 = phi block_0: ssa_4, block_4: ssa_7
                vec1 32 ssa_6 = iadd ssa_5, ssa_1
                vec1 32 ssa_7 = intrinsic ssbo_atomic_comp_swap (ssa_2, ssa_0,
ssa_5, ssa_6) () ()
                vec1 32 ssa_8 = ieq ssa_7, ssa_5
                /* succs: block_2 block_3 */
                if ssa_8 {
                        block block_2:
                        /* preds: block_1 */
                        break
                        /* succs: block_5 */
                } else {
                        block block_3:
                        /* preds: block_1 */
                        /* succs: block_4 */
                }
                block block_4:
                /* preds: block_3 */
                /* succs: block_1 */
        }

-- 
You are receiving this mail because:
You are the assignee for the bug.
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Reply via email to