Re: [Mesa-dev] [PATCH 1/3] nvc0: fix blit triangle size to fully cover FB's > 8192x8192
What about nv50? Is there the same issue? Don't you need to update nvc0_clear_buffer() accordingly? On 03/13/2016 04:07 AM, Ilia Mirkin wrote: The idea is that a single triangle will cover the whole area being drawn, allowing the blit shader to do its work. However the max fb size is 16384x16384, which means that the triangle we draw needs to be twice that in order to cover the whole area fully. Increase the size of the triangle to 32768x32768. This fixes a number of dEQP tests that were failing because a blit was involved which would miss some of the resulting texture. Signed-off-by: Ilia Mirkin Cc: "11.1 11.2" --- src/gallium/drivers/nouveau/nvc0/nvc0_surface.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_surface.c b/src/gallium/drivers/nouveau/nvc0/nvc0_surface.c index ccfc9e2..f2ad4bf 100644 --- a/src/gallium/drivers/nouveau/nvc0/nvc0_surface.c +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_surface.c @@ -1215,8 +1215,8 @@ nvc0_blit_3d(struct nvc0_context *nvc0, const struct pipe_blit_info *info) x0 = (float)info->src.box.x - x_range * (float)info->dst.box.x; y0 = (float)info->src.box.y - y_range * (float)info->dst.box.y; - x1 = x0 + 16384.0f * x_range; - y1 = y0 + 16384.0f * y_range; + x1 = x0 + 32768.0f * x_range; + y1 = y0 + 32768.0f * y_range; x0 *= (float)(1 << nv50_miptree(src)->ms_x); x1 *= (float)(1 << nv50_miptree(src)->ms_x); @@ -1327,14 +1327,14 @@ nvc0_blit_3d(struct nvc0_context *nvc0, const struct pipe_blit_info *info) *(vbuf++) = fui(y0); *(vbuf++) = fui(z); - *(vbuf++) = fui(16384 << nv50_miptree(dst)->ms_x); + *(vbuf++) = fui(32768 << nv50_miptree(dst)->ms_x); *(vbuf++) = fui(0.0f); *(vbuf++) = fui(x1); *(vbuf++) = fui(y0); *(vbuf++) = fui(z); *(vbuf++) = fui(0.0f); - *(vbuf++) = fui(16384 << nv50_miptree(dst)->ms_y); + *(vbuf++) = fui(32768 << nv50_miptree(dst)->ms_y); *(vbuf++) = fui(x0); *(vbuf++) = fui(y1); *(vbuf++) = fui(z); ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/3] nv50/ir: avoid folding mul + add if the mul has a dnz
This doesn't seem crazy. Reviewed-by: Samuel Pitoiset On 03/13/2016 04:07 AM, Ilia Mirkin wrote: Signed-off-by: Ilia Mirkin --- src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp index 6192c06..66e7b2e 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp @@ -1635,11 +1635,10 @@ AlgebraicOpt::tryADDToMADOrSAD(Instruction *add, operation toOp) if (src->getUniqueInsn() && src->getUniqueInsn()->bb != add->bb) return false; - if (src->getInsn()->saturate) + if (src->getInsn()->saturate || src->getInsn()->postFactor || + src->getInsn()->dnz) return false; - if (src->getInsn()->postFactor) - return false; if (toOp == OP_SAD) { ImmediateValue imm; if (!src->getInsn()->src(2).getImmediate(imm)) ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 3/3] nv50, nvc0: handle SQRT lowering inside the driver
On 03/13/2016 04:07 AM, Ilia Mirkin wrote: First off, st/mesa lowers DSQRT incorrectly (it uses CMP to attempt to find out whether the input is less than 0). Secondly the current approach (x * rsq(x)) behaves poorly for x = inf - a NaN is produced instead of inf. When I had a look at this precision thing, this seemed a bit weird to me but I did not investigate much of time on it. Instead we switch to the less accurate rcp(rsq(x)) method - this behaves nicely for all valid inputs. We still don't do this for DSQRT since the RSQ/RCP ops are *really* inaccurate, and don't even have Newton-Raphson steps right now. Eventually we should have a separate library function for DSQRT that does it more precisely (and perhaps move this lowering to the post-opt phase). This fixes a number of dEQP precision tests that were expecting better behavior for infinite inputs. Yep, this fixes distance, length, refract and sqrt deqp precision tests (37 tests). But they are still some fails with atan2, idexp and tanh. Are you going to fix them too? Except that you forgot to remove unused variables: codegen/nv50_ir_lowering_nvc0.cpp: In member function ‘bool nv50_ir::NVC0LoweringPass::handleSQRT(nv50_ir::Instruction*)’: codegen/nv50_ir_lowering_nvc0.cpp:1785:20: warning: unused variable ‘mov’ [-Wunused-variable] Instruction *mov, *rsq; ^ codegen/nv50_ir_lowering_nvc0.cpp:1785:26: warning: unused variable ‘rsq’ [-Wunused-variable] Instruction *mov, *rsq; ^ CC nvc0/nvc0_surface.lo This patch is: Reviewed-by: Samuel Pitoiset Tested-by: Samuel Pitoiset Signed-off-by: Ilia Mirkin --- .../drivers/nouveau/codegen/nv50_ir_build_util.cpp | 6 +++- .../drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp | 2 ++ .../nouveau/codegen/nv50_ir_lowering_nv50.cpp | 7 ++--- .../nouveau/codegen/nv50_ir_lowering_nvc0.cpp | 32 +++--- src/gallium/drivers/nouveau/nv50/nv50_screen.c | 2 +- src/gallium/drivers/nouveau/nvc0/nvc0_screen.c | 2 +- 6 files changed, 28 insertions(+), 23 deletions(-) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_build_util.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_build_util.cpp index f58cf97..84ebfdb 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_build_util.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_build_util.cpp @@ -585,6 +585,7 @@ BuildUtil::split64BitOpPostRA(Function *fn, Instruction *i, return NULL; srcNr = 2; break; + case OP_SELP: srcNr = 3; break; default: // TODO when needed return NULL; @@ -601,7 +602,10 @@ BuildUtil::split64BitOpPostRA(Function *fn, Instruction *i, for (int s = 0; s < srcNr; ++s) { if (lo->getSrc(s)->reg.size < 8) { - hi->setSrc(s, zero); + if (s == 2) +hi->setSrc(s, lo->getSrc(s)); + else +hi->setSrc(s, zero); } else { if (lo->getSrc(s)->refCount() > 1) lo->setSrc(s, cloneShallow(fn, lo->getSrc(s))); diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp index b06d86a..d284446 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp @@ -616,6 +616,7 @@ static nv50_ir::operation translateOpcode(uint opcode) NV50_IR_OPCODE_CASE(RCP, RCP); NV50_IR_OPCODE_CASE(RSQ, RSQ); + NV50_IR_OPCODE_CASE(SQRT, SQRT); NV50_IR_OPCODE_CASE(MUL, MUL); NV50_IR_OPCODE_CASE(ADD, ADD); @@ -2689,6 +2690,7 @@ Converter::handleInstruction(const struct tgsi_full_instruction *insn) case TGSI_OPCODE_FLR: case TGSI_OPCODE_TRUNC: case TGSI_OPCODE_RCP: + case TGSI_OPCODE_SQRT: case TGSI_OPCODE_IABS: case TGSI_OPCODE_INEG: case TGSI_OPCODE_NOT: diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp index 8752b0c..12c5f69 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp @@ -1203,10 +1203,9 @@ NV50LoweringPreSSA::handleDIV(Instruction *i) bool NV50LoweringPreSSA::handleSQRT(Instruction *i) { - Instruction *rsq = bld.mkOp1(OP_RSQ, TYPE_F32, -bld.getSSA(), i->getSrc(0)); - i->op = OP_MUL; - i->setSrc(1, rsq->getDef(0)); + bld.setPosition(i, true); + i->op = OP_RSQ; + bld.mkOp1(OP_RCP, i->dType, i->getDef(0), i->getDef(0)); return true; } diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp index d181f15..29b77c9 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp @@ -1778,22 +1778,22 @@ NVC0Lowerin
[Mesa-dev] [Bug 94522] weston eglSwapBuffers crash in kms_swrast_dri.so on GMA500
https://bugs.freedesktop.org/show_bug.cgi?id=94522 Bug ID: 94522 Summary: weston eglSwapBuffers crash in kms_swrast_dri.so on GMA500 Product: Mesa Version: 11.1 Hardware: x86 (IA32) OS: All Status: NEW Severity: normal Priority: medium Component: EGL/Wayland Assignee: wayland-b...@lists.freedesktop.org Reporter: comicfan...@gmail.com QA Contact: mesa-dev@lists.freedesktop.org mesa version 11.1.2 llvm 3.7.1 cpu: intel ATOM z520 :32bit only,supports sse sse2 ssse3 but not sse4 nor avx weston crash when calling eglSwapBuffers backtrace shows crash thread receives SIGBUS Thread 2 "llvmpipe-0" received signal SIGBUS, Bus error. #0 0xb755e633 in ?? () #1 0xb6e2960b in lp_rast_shade_tile (task=0x80a9ab4, arg=...) at lp_rast.c:352 #2 0xb6e29981 in do_rasterize_bin (bin=, x=, y=, task=0x80a9ab4) at lp_rast.c:609 #3 rasterize_bin (y=, x=, bin=, task=0x80a9ab4) at lp_rast.c:628 #4 rasterize_scene (task=task@entry=0x80a9ab4, scene=) at lp_rast.c:688 #5 0xb6e2a10a in thread_function (init_data=0x80a9ab4) at lp_rast.c:828 #6 0xb6e29f25 in impl_thrd_routine (p=0x80a2498) at ../../../../include/c11/threads_posix.h:87 #7 0xb7c6f291 in start_thread () from target:/usr/lib/libpthread.so.0 #8 0xb7d75d7e in clone () from target:/usr/lib/libc.so.6 eglSwapBuffers calling thread: 0xb7fdbc11 in __kernel_vsyscall () #1 0xb7c74aab in pthread_cond_wait@@GLIBC_2.3.2 () from target:/usr/lib/libpthread.so.0 #2 0xb7d8248d in pthread_cond_wait@@GLIBC_2.3.2 () from target:/usr/lib/libc.so.6 #3 0xb6e2a55a in cnd_wait (mtx=0x80a9b64, cond=0x80a9b7c) at ../../../../include/c11/threads_posix.h:159 #4 pipe_semaphore_wait (sema=0x80a9b64) at ../../../../src/gallium/auxiliary/os/os_thread.h:259 #5 lp_rast_finish (rast=0x80a9aa8) at lp_rast.c:771 #6 0xb6e35aab in lp_setup_rasterize_scene (setup=0x811cac8) at lp_setup.c:180 #7 set_scene_state (setup=setup@entry=0x811cac8, new_state=new_state@entry=SETUP_FLUSHED, reason=0xb6f44308 <__func__.14289> "do_flush") at lp_setup.c:330 #8 0xb6e3666f in lp_setup_flush (setup=0x811cac8, fence=0x0, reason=0xb6f44308 <__func__.14289> "do_flush") at lp_setup.c:359 #9 0xb6e287f5 in llvmpipe_flush (pipe=0x80fc1a0, fence=0x0, reason=0xb6f44308 <__func__.14289> "do_flush") at lp_flush.c:55 #10 0xb6e27e33 in do_flush (pipe=0x80fc1a0, fence=0x0, flags=1) at lp_context.c:113 #11 0xb69b43b1 in st_flush (st=0x81ee428, fence=0x0, flags=1) at state_tracker/st_cb_flush.c:87 #12 0xb69fe5eb in st_context_flush (stctxi=0x81ee428, flags=2, fence=0x0) at state_tracker/st_manager.c:504 #13 0xb6ad192d in dri_flush (cPriv=0x80cb130, dPriv=0x80fb9b0, flags=5, reason=__DRI2_THROTTLE_SWAPBUFFER) at dri_drawable.c:538 #14 0xb753b82b in dri2_flush_drawable_for_swapbuffers (disp=0x80b7458, draw=0x80fb7f0) at drivers/dri2/egl_dri2.c:1318 #15 0xb75417e0 in dri2_drm_swap_buffers (drv=0x80b56c0, disp=0x80b7458, draw=0x80fb7f0) at drivers/dri2/platform_drm.c:441 #16 0xb7538db8 in dri2_swap_buffers (drv=0x80b56c0, dpy=0x80b7458, surf=0x80fb7f0) at drivers/dri2/egl_dri2.c:1333 #17 0xb75331b4 in eglSwapBuffers (dpy=0x80b7458, surface=0x80fb7f0) at main/eglapi.c:1010 another thread backtrace: #0 0xb6e2b9e8 in do_block_16_32_1 (c=, y=, x=, plane=0xb35b10f4, tri=0x812beb0, task=0x80a9bb0) at lp_rast_tri_tmp.h:136 #1 lp_rast_triangle_32_1 (task=0x80a9bb0, arg=...) at lp_rast_tri_tmp.h:232 #2 0xb6e29981 in do_rasterize_bin (bin=, x=, y=, task=0x80a9bb0) at lp_rast.c:609 #3 rasterize_bin (y=, x=, bin=, task=0x80a9bb0) at lp_rast.c:628 #4 rasterize_scene (task=task@entry=0x80a9bb0, scene=) at lp_rast.c:688 #5 0xb6e2a10a in thread_function (init_data=0x80a9bb0) at lp_rast.c:828 #6 0xb6e29f25 in impl_thrd_routine (p=0x8091c30) at ../../../../include/c11/threads_posix.h:87 #7 0xb7c6f291 in start_thread () from target:/usr/lib/libpthread.so.0 #8 0xb7d75d7e in clone () from target:/usr/lib/libc.so.6 -- You are receiving this mail because: You are the QA Contact for the bug.___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 94522] weston eglSwapBuffers crash in kms_swrast_dri.so on GMA500
https://bugs.freedesktop.org/show_bug.cgi?id=94522 --- Comment #1 from comicfans44 --- local variable and args in crash thread: (gdb) info locals color = {0xb2298100 , 0xc01064b3 , 0xb7d6de89 "[=\001\360\377\377\017\203\213\227\363\377\303f\220f\220f\220f\220f\220e\203=\f", 0xb7c72c0b <__pthread_mutex_unlock_usercnt+11> "\201\307\365\003\001", 0xb715b000 "\360]\225", 0xb2d6a008 "0\303\017\b\260y,\b", 0xb2d6a1fc "", 0xb2d6a868 "\300\367\021\b\360B-\b\360B-\b\300\367\021\b\230X-\b\230X-\b\300\367\021\b0r-\b0r-\b"} depth = i = stride = {4096, 3000410620, 134911960, 0, 3066076233, 0, 2621440, 3084396197} depth_stride = scene = 0xb2d6a008 inputs = 0x812c030 state = 0x811f7c0 variant = 0x8202c30 tile_x = 64 tile_y = 448 x = 0 y = 12 (gdb) info args task = 0x80a9ab4 arg = {shade_tile = 0x812c030, triangle = {tri = 0x812c030, plane_mask = 10}, set_state = 0x812c030, clear_rb = 0x812c030, clear_zstencil = {value = 43085119536, mask = 18446744030745985025}, state = 0x812c030, fence = 0x812c030, query_obj = 0x812c030} -- You are receiving this mail because: You are the QA Contact for the bug.___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 01/11] nv50/ir: Check for valid insn instead of def size
On Tesla cards, the first register $r0 contains the thread id; later generations use a specialised register for it. In order to prevent the register from being given to anyone, and thus lose the thread id information, an lvalue is created to represent $r0 and is passed as an argument to the `main` function. However, since the inputs and outputs of a function are stored as value definitions, a definition is added onto the previously created lvalue without it being associated to an instruction. Therefore, checking the number of definitions of an lvalue do not ensure that it is associated to an instruction. Signed-off-by: Pierre Moreau --- src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp index 62b0aa1..7c319df 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp @@ -853,7 +853,7 @@ isShortRegOp(Instruction *insn) static bool isShortRegVal(LValue *lval) { - if (lval->defs.size() == 0) + if (lval->getInsn() == NULL) return false; for (Value::DefCIterator def = lval->defs.begin(); def != lval->defs.end(); ++def) @@ -1467,7 +1467,7 @@ GCRA::allocateRegisters(ArrayList& insns) nodes[i].init(regs, lval); RIG.insert(&nodes[i]); - if (lval->inFile(FILE_GPR) && lval->defs.size() > 0 && + if (lval->inFile(FILE_GPR) && lval->getInsn() != NULL && prog->getTarget()->getChipset() < 0xc0) { Instruction *insn = lval->getInsn(); if (insn->op == OP_MAD || insn->op == OP_SAD) -- 2.7.2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 01/11] nv50/ir: Check for valid insn instead of def size
01/11? Where are the other patches? On 03/13/2016 02:16 PM, Pierre Moreau wrote: On Tesla cards, the first register $r0 contains the thread id; later generations use a specialised register for it. In order to prevent the register from being given to anyone, and thus lose the thread id information, an lvalue is created to represent $r0 and is passed as an argument to the `main` function. However, since the inputs and outputs of a function are stored as value definitions, a definition is added onto the previously created lvalue without it being associated to an instruction. Therefore, checking the number of definitions of an lvalue do not ensure that it is associated to an instruction. Signed-off-by: Pierre Moreau --- src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp index 62b0aa1..7c319df 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp @@ -853,7 +853,7 @@ isShortRegOp(Instruction *insn) static bool isShortRegVal(LValue *lval) { - if (lval->defs.size() == 0) + if (lval->getInsn() == NULL) return false; for (Value::DefCIterator def = lval->defs.begin(); def != lval->defs.end(); ++def) @@ -1467,7 +1467,7 @@ GCRA::allocateRegisters(ArrayList& insns) nodes[i].init(regs, lval); RIG.insert(&nodes[i]); - if (lval->inFile(FILE_GPR) && lval->defs.size() > 0 && + if (lval->inFile(FILE_GPR) && lval->getInsn() != NULL && prog->getTarget()->getChipset() < 0xc0) { Instruction *insn = lval->getInsn(); if (insn->op == OP_MAD || insn->op == OP_SAD) ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 01/11] nv50/ir: Check for valid insn instead of def size
Hum… Something went wrong, sorry. This is the same as the previous patch and not the updated version… Pierre On 02:16 PM - Mar 13 2016, Pierre Moreau wrote: > On Tesla cards, the first register $r0 contains the thread id; later > generations use a specialised register for it. In order to prevent the > register > from being given to anyone, and thus lose the thread id information, an lvalue > is created to represent $r0 and is passed as an argument to the `main` > function. > > However, since the inputs and outputs of a function are stored as value > definitions, a definition is added onto the previously created lvalue without > it being associated to an instruction. Therefore, checking the number of > definitions of an lvalue do not ensure that it is associated to an > instruction. > > Signed-off-by: Pierre Moreau > --- > src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp > b/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp > index 62b0aa1..7c319df 100644 > --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp > +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp > @@ -853,7 +853,7 @@ isShortRegOp(Instruction *insn) > static bool > isShortRegVal(LValue *lval) > { > - if (lval->defs.size() == 0) > + if (lval->getInsn() == NULL) >return false; > for (Value::DefCIterator def = lval->defs.begin(); > def != lval->defs.end(); ++def) > @@ -1467,7 +1467,7 @@ GCRA::allocateRegisters(ArrayList& insns) > nodes[i].init(regs, lval); > RIG.insert(&nodes[i]); > > - if (lval->inFile(FILE_GPR) && lval->defs.size() > 0 && > + if (lval->inFile(FILE_GPR) && lval->getInsn() != NULL && > prog->getTarget()->getChipset() < 0xc0) { > Instruction *insn = lval->getInsn(); > if (insn->op == OP_MAD || insn->op == OP_SAD) > -- > 2.7.2 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH v2] nv50/ir: Check for valid insn instead of def size
Functions arguments get a definition from the function itself, a definition which is therefore not linked to any instruction. If a value ends up having a definition but no linked instruction, the register allocation pass can skip that value since it is not being used. This fixes a null pointer dereference during the register allocation pass, if a function had unused arguments. v2: Rewrite commit message based on Ilia Mirkin's comment Signed-off-by: Pierre Moreau --- src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp index 62b0aa1..7c319df 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp @@ -853,7 +853,7 @@ isShortRegOp(Instruction *insn) static bool isShortRegVal(LValue *lval) { - if (lval->defs.size() == 0) + if (lval->getInsn() == NULL) return false; for (Value::DefCIterator def = lval->defs.begin(); def != lval->defs.end(); ++def) @@ -1467,7 +1467,7 @@ GCRA::allocateRegisters(ArrayList& insns) nodes[i].init(regs, lval); RIG.insert(&nodes[i]); - if (lval->inFile(FILE_GPR) && lval->defs.size() > 0 && + if (lval->inFile(FILE_GPR) && lval->getInsn() != NULL && prog->getTarget()->getChipset() < 0xc0) { Instruction *insn = lval->getInsn(); if (insn->op == OP_MAD || insn->op == OP_SAD) -- 2.7.2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] require libdrm 2.4.66 for AMD drivers
since 737b6ed13e8f813987b5566004f0f45e9c55f1e8 src/gallium/winsys/amdgpu/drm/amdgpu_winsys.c not longer compile: error: unknown type name ‘drmDevicePtr’ --- configure.ac | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/configure.ac b/configure.ac index 49be147..55b1c95 100644 --- a/configure.ac +++ b/configure.ac @@ -69,8 +69,8 @@ AC_SUBST([OPENCL_VERSION]) dnl Versions for external dependencies LIBDRM_REQUIRED=2.4.60 -LIBDRM_RADEON_REQUIRED=2.4.56 -LIBDRM_AMDGPU_REQUIRED=2.4.63 +LIBDRM_RADEON_REQUIRED=2.4.66 +LIBDRM_AMDGPU_REQUIRED=2.4.66 LIBDRM_INTEL_REQUIRED=2.4.61 LIBDRM_NVVIEUX_REQUIRED=2.4.66 LIBDRM_NOUVEAU_REQUIRED=2.4.66 -- 2.5.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 03/10] st/glsl_to_tgsi: set memory access type on image intrinsics
From: Nicolai Hähnle This is required to preserve the image variable's coherent/restrict/volatile qualifiers in TGSI. --- src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 7 +++ 1 file changed, 7 insertions(+) diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp index fcfd8b7..18cea60 100644 --- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp +++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp @@ -3616,6 +3616,13 @@ glsl_to_tgsi_visitor::visit_image_intrinsic(ir_call *ir) inst->image_format = st_mesa_format_to_pipe_format(st_context(ctx), _mesa_get_shader_image_format(imgvar->data.image_format)); + + if (imgvar->data.image_coherent) + inst->buffer_access |= TGSI_MEMORY_COHERENT; + if (imgvar->data.image_restrict) + inst->buffer_access |= TGSI_MEMORY_RESTRICT; + if (imgvar->data.image_volatile) + inst->buffer_access |= TGSI_MEMORY_VOLATILE; } void -- 2.5.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 01/10] tgsi: add Texture and Format to tgsi_instruction_memory
From: Nicolai Hähnle Frontends should have this information readily available, and it simplifies image LOAD/STORE/ATOM* handling especially with indirect image access. --- src/gallium/auxiliary/tgsi/tgsi_dump.c | 8 src/gallium/include/pipe/p_shader_tokens.h | 4 +++- 2 files changed, 11 insertions(+), 1 deletion(-) diff --git a/src/gallium/auxiliary/tgsi/tgsi_dump.c b/src/gallium/auxiliary/tgsi/tgsi_dump.c index f232f38..c8b91bb 100644 --- a/src/gallium/auxiliary/tgsi/tgsi_dump.c +++ b/src/gallium/auxiliary/tgsi/tgsi_dump.c @@ -637,6 +637,14 @@ iter_instruction( TXT(", "); ENM(bit, tgsi_memory_names); } + if (inst->Memory.Texture) { + TXT( ", " ); + ENM( inst->Memory.Texture, tgsi_texture_names ); + } + if (inst->Memory.Format) { + TXT( ", " ); + TXT( util_format_name(inst->Memory.Format) ); + } } switch (inst->Instruction.Opcode) { diff --git a/src/gallium/include/pipe/p_shader_tokens.h b/src/gallium/include/pipe/p_shader_tokens.h index 9d4a96a..34e491e 100644 --- a/src/gallium/include/pipe/p_shader_tokens.h +++ b/src/gallium/include/pipe/p_shader_tokens.h @@ -743,7 +743,9 @@ struct tgsi_dst_register struct tgsi_instruction_memory { unsigned Qualifier : 3; /* TGSI_MEMORY_ */ - unsigned Padding : 29; + unsigned Texture : 8; /* only for images: TGSI_TEXTURE_ */ + unsigned Format: 10; /* only for images: PIPE_FORMAT_ */ + unsigned Padding : 11; }; #define TGSI_MEMBAR_SHADER_BUFFER (1 << 0) -- 2.5.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 07/10] st/mesa: set image access flags in st_bind_images
From: Nicolai Hähnle --- src/mesa/state_tracker/st_atom_image.c | 15 +++ 1 file changed, 15 insertions(+) diff --git a/src/mesa/state_tracker/st_atom_image.c b/src/mesa/state_tracker/st_atom_image.c index bf7486b..e96d10a1 100644 --- a/src/mesa/state_tracker/st_atom_image.c +++ b/src/mesa/state_tracker/st_atom_image.c @@ -70,6 +70,21 @@ st_bind_images(struct st_context *st, struct gl_shader *shader, img->resource = stObj->pt; img->format = st_mesa_format_to_pipe_format(st, u->_ActualFormat); + + switch (u->Access) { + case GL_READ_ONLY: + img->access = PIPE_IMAGE_ACCESS_READ; + break; + case GL_WRITE_ONLY: + img->access = PIPE_IMAGE_ACCESS_WRITE; + break; + case GL_READ_WRITE: + img->access = PIPE_IMAGE_ACCESS_READ_WRITE; + break; + default: + unreachable("bad gl_image_unit::Access"); + } + if (stObj->pt->target == PIPE_BUFFER) { unsigned base, size; unsigned f, n; -- 2.5.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 00/10] gallium: more preparation for shader images support
Hi, this is probably my last batch of hardware-independent patches that need to be pushed before the series that finally adds ARB_shader_image_load_store for radeonsi. Patches 1 to 7 add various fields to TGSI and pipe_image_view to communicate state and flags to the driver that are so far not communicated at all or are only communicated in a less convenient form. Patches 8 & 9 are two simple utility functions that I am using in my radeonsi series. Patch 10 already implements early_fragment_tests for radeonsi, even though it cannot be used yet. Please review! Thanks, Nicolai -- src/gallium/auxiliary/tgsi/tgsi_build.c | 15 src/gallium/auxiliary/tgsi/tgsi_build.h | 5 +++ src/gallium/auxiliary/tgsi/tgsi_dump.c | 8 + src/gallium/auxiliary/tgsi/tgsi_strings.c| 1 + src/gallium/auxiliary/tgsi/tgsi_ureg.c | 12 +-- src/gallium/auxiliary/tgsi/tgsi_ureg.h | 8 +++-- src/gallium/auxiliary/util/u_inlines.h | 10 ++ src/gallium/docs/source/tgsi.rst | 6 .../drivers/radeonsi/si_state_shaders.c | 3 ++ src/gallium/include/pipe/p_defines.h | 8 + src/gallium/include/pipe/p_shader_tokens.h | 7 ++-- src/gallium/include/pipe/p_state.h | 3 +- src/mesa/state_tracker/st_atom_image.c | 15 src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 34 +- 14 files changed, 118 insertions(+), 17 deletions(-) ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 02/10] st/glsl_to_tgsi: provide Texture and Format information for image ops
From: Nicolai Hähnle --- src/gallium/auxiliary/tgsi/tgsi_ureg.c | 12 +--- src/gallium/auxiliary/tgsi/tgsi_ureg.h | 8 ++-- src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 24 +++- 3 files changed, 30 insertions(+), 14 deletions(-) diff --git a/src/gallium/auxiliary/tgsi/tgsi_ureg.c b/src/gallium/auxiliary/tgsi/tgsi_ureg.c index e1a7278..ab1d034 100644 --- a/src/gallium/auxiliary/tgsi/tgsi_ureg.c +++ b/src/gallium/auxiliary/tgsi/tgsi_ureg.c @@ -1242,7 +1242,9 @@ ureg_emit_texture_offset(struct ureg_program *ureg, void ureg_emit_memory(struct ureg_program *ureg, unsigned extended_token, - unsigned qualifier) + unsigned qualifier, + unsigned texture, + unsigned format) { union tgsi_any_token *out, *insn; @@ -1253,6 +1255,8 @@ ureg_emit_memory(struct ureg_program *ureg, out[0].value = 0; out[0].insn_memory.Qualifier = qualifier; + out[0].insn_memory.Texture = texture; + out[0].insn_memory.Format = format; } void @@ -1413,7 +1417,9 @@ ureg_memory_insn(struct ureg_program *ureg, unsigned nr_dst, const struct ureg_src *src, unsigned nr_src, - unsigned qualifier) + unsigned qualifier, + unsigned texture, + unsigned format) { struct ureg_emit_insn_result insn; unsigned i; @@ -1430,7 +1436,7 @@ ureg_memory_insn(struct ureg_program *ureg, nr_dst, nr_src); - ureg_emit_memory(ureg, insn.extended_token, qualifier); + ureg_emit_memory(ureg, insn.extended_token, qualifier, texture, format); for (i = 0; i < nr_dst; i++) ureg_emit_dst(ureg, dst[i]); diff --git a/src/gallium/auxiliary/tgsi/tgsi_ureg.h b/src/gallium/auxiliary/tgsi/tgsi_ureg.h index 6a3b5dd..04a62a6 100644 --- a/src/gallium/auxiliary/tgsi/tgsi_ureg.h +++ b/src/gallium/auxiliary/tgsi/tgsi_ureg.h @@ -541,7 +541,9 @@ ureg_memory_insn(struct ureg_program *ureg, unsigned nr_dst, const struct ureg_src *src, unsigned nr_src, - unsigned qualifier); + unsigned qualifier, + unsigned texture, + unsigned format); /*** * Internal instruction helpers, don't call these directly: @@ -582,7 +584,9 @@ ureg_emit_texture_offset(struct ureg_program *ureg, void ureg_emit_memory(struct ureg_program *ureg, unsigned insn_token, - unsigned qualifier); + unsigned qualifier, + unsigned texture, + unsigned format); void ureg_emit_dst( struct ureg_program *ureg, diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp index 92cd775..fcfd8b7 100644 --- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp +++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp @@ -5572,7 +5572,7 @@ compile_tgsi_instruction(struct st_translate *t, int num_dst; int num_src; - unsigned tex_target; + unsigned tex_target = 0; num_dst = num_inst_dst_regs(inst); num_src = num_inst_src_regs(inst); @@ -5647,32 +5647,38 @@ compile_tgsi_instruction(struct st_translate *t, for (i = num_src - 1; i >= 0; i--) src[i + 1] = src[i]; num_src++; - if (inst->buffer.file == PROGRAM_MEMORY) + if (inst->buffer.file == PROGRAM_MEMORY) { src[0] = t->shared_memory; - else if (inst->buffer.file == PROGRAM_BUFFER) + } else if (inst->buffer.file == PROGRAM_BUFFER) { src[0] = t->buffers[inst->buffer.index]; - else + } else { src[0] = t->images[inst->buffer.index]; + tex_target = st_translate_texture_target(inst->tex_target, inst->tex_shadow); + } if (inst->buffer.reladdr) src[0] = ureg_src_indirect(src[0], ureg_src(t->address[2])); assert(src[0].File != TGSI_FILE_NULL); ureg_memory_insn(ureg, inst->op, dst, num_dst, src, num_src, - inst->buffer_access); + inst->buffer_access, + tex_target, inst->image_format); break; case TGSI_OPCODE_STORE: - if (inst->buffer.file == PROGRAM_MEMORY) + if (inst->buffer.file == PROGRAM_MEMORY) { dst[0] = ureg_dst(t->shared_memory); - else if (inst->buffer.file == PROGRAM_BUFFER) + } else if (inst->buffer.file == PROGRAM_BUFFER) { dst[0] = ureg_dst(t->buffers[inst->buffer.index]); - else + } else { dst[0] = ureg_dst(t->images[inst->buffer.index]); + tex_target = st_translate_texture_target(inst->tex_target, inst->tex_shadow); + } dst[0] = ureg_writemask(dst[0], inst->dst[0].writemask); if (inst->buffer.reladdr) dst[0] = ureg_dst_indir
[Mesa-dev] [PATCH 04/10] tgsi: add TGSI_PROPERTY_FS_EARLY_DEPTH_STENCIL
From: Nicolai Hähnle --- src/gallium/auxiliary/tgsi/tgsi_strings.c | 1 + src/gallium/docs/source/tgsi.rst | 6 ++ src/gallium/include/pipe/p_shader_tokens.h | 3 ++- 3 files changed, 9 insertions(+), 1 deletion(-) diff --git a/src/gallium/auxiliary/tgsi/tgsi_strings.c b/src/gallium/auxiliary/tgsi/tgsi_strings.c index b15ae69..6bd1a2e 100644 --- a/src/gallium/auxiliary/tgsi/tgsi_strings.c +++ b/src/gallium/auxiliary/tgsi/tgsi_strings.c @@ -144,6 +144,7 @@ const char *tgsi_property_names[TGSI_PROPERTY_COUNT] = "TES_POINT_MODE", "NUM_CLIPDIST_ENABLED", "NUM_CULLDIST_ENABLED", + "FS_EARLY_DEPTH_STENCIL", }; const char *tgsi_return_type_names[TGSI_RETURN_TYPE_COUNT] = diff --git a/src/gallium/docs/source/tgsi.rst b/src/gallium/docs/source/tgsi.rst index 489cbb0..af2df22 100644 --- a/src/gallium/docs/source/tgsi.rst +++ b/src/gallium/docs/source/tgsi.rst @@ -3206,6 +3206,12 @@ NUM_CULLDIST_ENABLED How many cull distance scalar outputs are enabled. +FS_EARLY_DEPTH_STENCIL +"" + +Whether depth test, stencil test, and occlusion query should run before +the fragment shader (regardless of fragment shader side effects). Corresponds +to GLSL early_fragment_tests. Texture Sampling and Texture Formats diff --git a/src/gallium/include/pipe/p_shader_tokens.h b/src/gallium/include/pipe/p_shader_tokens.h index 34e491e..7a34841 100644 --- a/src/gallium/include/pipe/p_shader_tokens.h +++ b/src/gallium/include/pipe/p_shader_tokens.h @@ -277,7 +277,8 @@ union tgsi_immediate_data #define TGSI_PROPERTY_TES_POINT_MODE 14 #define TGSI_PROPERTY_NUM_CLIPDIST_ENABLED 15 #define TGSI_PROPERTY_NUM_CULLDIST_ENABLED 16 -#define TGSI_PROPERTY_COUNT 17 +#define TGSI_PROPERTY_FS_EARLY_DEPTH_STENCIL 17 +#define TGSI_PROPERTY_COUNT 18 struct tgsi_property { unsigned Type : 4; /**< TGSI_TOKEN_TYPE_PROPERTY */ -- 2.5.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 08/10] gallium/u_inlines: add util_copy_image_view
From: Nicolai Hähnle --- src/gallium/auxiliary/util/u_inlines.h | 10 ++ 1 file changed, 10 insertions(+) diff --git a/src/gallium/auxiliary/util/u_inlines.h b/src/gallium/auxiliary/util/u_inlines.h index d081203..0e80cef 100644 --- a/src/gallium/auxiliary/util/u_inlines.h +++ b/src/gallium/auxiliary/util/u_inlines.h @@ -622,6 +622,16 @@ util_copy_constant_buffer(struct pipe_constant_buffer *dst, } } +static inline void +util_copy_image_view(struct pipe_image_view *dst, + const struct pipe_image_view *src) +{ + pipe_resource_reference(&dst->resource, src->resource); + dst->format = src->format; + dst->access = src->access; + dst->u = src->u; +} + static inline unsigned util_max_layer(const struct pipe_resource *r, unsigned level) { -- 2.5.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 09/10] tgsi: add tgsi_full_src_register_from_dst helper function
From: Nicolai Hähnle --- src/gallium/auxiliary/tgsi/tgsi_build.c | 15 +++ src/gallium/auxiliary/tgsi/tgsi_build.h | 5 + 2 files changed, 20 insertions(+) diff --git a/src/gallium/auxiliary/tgsi/tgsi_build.c b/src/gallium/auxiliary/tgsi/tgsi_build.c index cfe9b92..e5355f5 100644 --- a/src/gallium/auxiliary/tgsi/tgsi_build.c +++ b/src/gallium/auxiliary/tgsi/tgsi_build.c @@ -1425,3 +1425,18 @@ tgsi_build_full_property( return size; } + +struct tgsi_full_src_register +tgsi_full_src_register_from_dst(const struct tgsi_full_dst_register *dst) +{ + struct tgsi_full_src_register src; + src.Register = tgsi_default_src_register(); + src.Register.File = dst->Register.File; + src.Register.Indirect = dst->Register.Indirect; + src.Register.Dimension = dst->Register.Dimension; + src.Register.Index = dst->Register.Index; + src.Indirect = dst->Indirect; + src.Dimension = dst->Dimension; + src.DimIndirect = dst->DimIndirect; + return src; +} diff --git a/src/gallium/auxiliary/tgsi/tgsi_build.h b/src/gallium/auxiliary/tgsi/tgsi_build.h index c5127e1..34d181a 100644 --- a/src/gallium/auxiliary/tgsi/tgsi_build.h +++ b/src/gallium/auxiliary/tgsi/tgsi_build.h @@ -30,6 +30,8 @@ struct tgsi_token; +struct tgsi_full_dst_register; +struct tgsi_full_src_register; #if defined __cplusplus @@ -111,6 +113,9 @@ tgsi_build_full_instruction( struct tgsi_instruction_predicate tgsi_default_instruction_predicate(void); +struct tgsi_full_src_register +tgsi_full_src_register_from_dst(const struct tgsi_full_dst_register *dst); + #if defined __cplusplus } #endif -- 2.5.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 05/10] st/glsl_to_tgsi: set FS_EARLY_DEPTH_STENCIL when required
From: Nicolai Hähnle --- src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 3 +++ 1 file changed, 3 insertions(+) diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp index 18cea60..1841405 100644 --- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp +++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp @@ -6121,6 +6121,9 @@ st_translate_program( } if (procType == TGSI_PROCESSOR_FRAGMENT) { + if (program->shader->EarlyFragmentTests) + ureg_property(ureg, TGSI_PROPERTY_FS_EARLY_DEPTH_STENCIL, 1); + if (proginfo->InputsRead & VARYING_BIT_POS) { /* Must do this after setting up t->inputs. */ emit_wpos(st_context(ctx), t, proginfo, ureg, -- 2.5.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 06/10] gallium: add access field to pipe_image_view
From: Nicolai Hähnle This allows drivers to make smarter decisions e.g. about whether the image has to be decompressed. --- src/gallium/include/pipe/p_defines.h | 8 src/gallium/include/pipe/p_state.h | 3 ++- 2 files changed, 10 insertions(+), 1 deletion(-) diff --git a/src/gallium/include/pipe/p_defines.h b/src/gallium/include/pipe/p_defines.h index a490b3b..bdd76ab 100644 --- a/src/gallium/include/pipe/p_defines.h +++ b/src/gallium/include/pipe/p_defines.h @@ -541,6 +541,14 @@ enum pipe_reset_status PIPE_HANDLE_USAGE_WRITE) /** + * pipe_image_view access flags. + */ +#define PIPE_IMAGE_ACCESS_READ (1 << 0) +#define PIPE_IMAGE_ACCESS_WRITE (1 << 1) +#define PIPE_IMAGE_ACCESS_READ_WRITE (PIPE_IMAGE_ACCESS_READ | \ + PIPE_IMAGE_ACCESS_WRITE) + +/** * Implementation capabilities/limits which are queried through * pipe_screen::get_param() */ diff --git a/src/gallium/include/pipe/p_state.h b/src/gallium/include/pipe/p_state.h index c568c48..2e720ce 100644 --- a/src/gallium/include/pipe/p_state.h +++ b/src/gallium/include/pipe/p_state.h @@ -393,13 +393,14 @@ struct pipe_sampler_view /** - * A description of a writable buffer or texture that can be bound to a shader + * A description of a buffer or texture image that can be bound to a shader * stage. */ struct pipe_image_view { struct pipe_resource *resource; /**< resource into which this is a view */ enum pipe_format format; /**< typed PIPE_FORMAT_x */ + unsigned access; /**< PIPE_IMAGE_ACCESS_x */ union { struct { -- 2.5.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 10/10] radeonsi: set DEPTH_BEFORE_SHADER based on FS_EARLY_DEPTH_STENCIL
From: Nicolai Hähnle --- src/gallium/drivers/radeonsi/si_state_shaders.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/src/gallium/drivers/radeonsi/si_state_shaders.c b/src/gallium/drivers/radeonsi/si_state_shaders.c index 321b87d..5fe1f79 100644 --- a/src/gallium/drivers/radeonsi/si_state_shaders.c +++ b/src/gallium/drivers/radeonsi/si_state_shaders.c @@ -1154,6 +1154,9 @@ static void *si_create_shader_selector(struct pipe_context *ctx, break; } + if (sel->info.properties[TGSI_PROPERTY_FS_EARLY_DEPTH_STENCIL]) + sel->db_shader_control |= S_02880C_DEPTH_BEFORE_SHADER(1); + /* Compile the main shader part for use with a prolog and/or epilog. */ if (sel->type != PIPE_SHADER_GEOMETRY && !sscreen->use_monolithic_shaders) { -- 2.5.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] radeonsi: avoid crash when a sampler state is bound for a buffer texture
On 12.03.2016 19:09, Ilia Mirkin wrote: On Fri, Mar 11, 2016 at 11:17 AM, Nicolai Hähnle wrote: From: Nicolai Hähnle Sampler states don't really make sense with buffer textures, but the PBO upload code sets one because apparently nouveau needs it. It would be nice to work that out at some point, but in any case being defensive here is a good idea. Sampler states are set in regular GL as well if you have a regular buffer texture too, no? I suppose you're right - we just haven't had many users of buffer textures before PBO upload, so nobody noticed this. I'll change the comment before pushing. Cheers, Nicolai Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94284 Cc: mesa-sta...@lists.freedesktop.org --- src/gallium/drivers/radeonsi/si_descriptors.c | 1 + 1 file changed, 1 insertion(+) diff --git a/src/gallium/drivers/radeonsi/si_descriptors.c b/src/gallium/drivers/radeonsi/si_descriptors.c index 9aa4877..f5ad113 100644 --- a/src/gallium/drivers/radeonsi/si_descriptors.c +++ b/src/gallium/drivers/radeonsi/si_descriptors.c @@ -324,6 +324,7 @@ static void si_bind_sampler_states(struct pipe_context *ctx, unsigned shader, */ if (samplers->views.views[i] && samplers->views.views[i]->texture && + samplers->views.views[i]->texture->target != PIPE_BUFFER && ((struct r600_texture*)samplers->views.views[i]->texture)->fmask.size) continue; -- 2.5.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 00/10] gallium: more preparation for shader images support
I can't say I'm a huge fan of adding the format/etc info per-instruction, but I'm also not opposed to it if it makes your life a lot easier. Patches 1-9 are Reviewed-by: Ilia Mirkin Thanks for taking care of the access/early depth bits :) -ilia On Sun, Mar 13, 2016 at 10:29 AM, Nicolai Hähnle wrote: > Hi, > > this is probably my last batch of hardware-independent patches that need > to be pushed before the series that finally adds ARB_shader_image_load_store > for radeonsi. > > Patches 1 to 7 add various fields to TGSI and pipe_image_view to communicate > state and flags to the driver that are so far not communicated at all or are > only communicated in a less convenient form. > > Patches 8 & 9 are two simple utility functions that I am using in my radeonsi > series. > > Patch 10 already implements early_fragment_tests for radeonsi, even though it > cannot be used yet. > > Please review! > > Thanks, > Nicolai > -- > src/gallium/auxiliary/tgsi/tgsi_build.c | 15 > src/gallium/auxiliary/tgsi/tgsi_build.h | 5 +++ > src/gallium/auxiliary/tgsi/tgsi_dump.c | 8 + > src/gallium/auxiliary/tgsi/tgsi_strings.c| 1 + > src/gallium/auxiliary/tgsi/tgsi_ureg.c | 12 +-- > src/gallium/auxiliary/tgsi/tgsi_ureg.h | 8 +++-- > src/gallium/auxiliary/util/u_inlines.h | 10 ++ > src/gallium/docs/source/tgsi.rst | 6 > .../drivers/radeonsi/si_state_shaders.c | 3 ++ > src/gallium/include/pipe/p_defines.h | 8 + > src/gallium/include/pipe/p_shader_tokens.h | 7 ++-- > src/gallium/include/pipe/p_state.h | 3 +- > src/mesa/state_tracker/st_atom_image.c | 15 > src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 34 +- > 14 files changed, 118 insertions(+), 17 deletions(-) > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/3] nvc0: fix blit triangle size to fully cover FB's > 8192x8192
On Sun, Mar 13, 2016 at 7:43 AM, Samuel Pitoiset wrote: > What about nv50? Is there the same issue? Yes, but nv50's max fb size is 8192x8192, and there the blit triangle's size is 16384x16384, so it all works out. (Also nv50 has some funny business wrt MS textures that ends up being a bit different on nvc0) > Don't you need to update nvc0_clear_buffer() accordingly? I don't think so, but perhaps I'm missing something? clear_buffer doesn't draw any triangles... the point of this blit is to draw a triangle that rasterizes over the full fb. By making the w/h of the tri 2x the fb size, we ensure that the full fb is covered... here's some ASCII art that might help: +-+-/ | | / | | / +-/ | / | / / note that the width of a char is 2x the height, which is why it looks a bit funny. The square in the middle is the fb, and the larger triangle is defined by the position coordinates that we're providing. > > > On 03/13/2016 04:07 AM, Ilia Mirkin wrote: >> >> The idea is that a single triangle will cover the whole area being >> drawn, allowing the blit shader to do its work. However the max fb size >> is 16384x16384, which means that the triangle we draw needs to be twice >> that in order to cover the whole area fully. Increase the size of the >> triangle to 32768x32768. >> >> This fixes a number of dEQP tests that were failing because a blit was >> involved which would miss some of the resulting texture. >> >> Signed-off-by: Ilia Mirkin >> Cc: "11.1 11.2" >> --- >> src/gallium/drivers/nouveau/nvc0/nvc0_surface.c | 8 >> 1 file changed, 4 insertions(+), 4 deletions(-) >> >> diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_surface.c >> b/src/gallium/drivers/nouveau/nvc0/nvc0_surface.c >> index ccfc9e2..f2ad4bf 100644 >> --- a/src/gallium/drivers/nouveau/nvc0/nvc0_surface.c >> +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_surface.c >> @@ -1215,8 +1215,8 @@ nvc0_blit_3d(struct nvc0_context *nvc0, const struct >> pipe_blit_info *info) >> x0 = (float)info->src.box.x - x_range * (float)info->dst.box.x; >> y0 = (float)info->src.box.y - y_range * (float)info->dst.box.y; >> >> - x1 = x0 + 16384.0f * x_range; >> - y1 = y0 + 16384.0f * y_range; >> + x1 = x0 + 32768.0f * x_range; >> + y1 = y0 + 32768.0f * y_range; >> >> x0 *= (float)(1 << nv50_miptree(src)->ms_x); >> x1 *= (float)(1 << nv50_miptree(src)->ms_x); >> @@ -1327,14 +1327,14 @@ nvc0_blit_3d(struct nvc0_context *nvc0, const >> struct pipe_blit_info *info) >> *(vbuf++) = fui(y0); >> *(vbuf++) = fui(z); >> >> - *(vbuf++) = fui(16384 << nv50_miptree(dst)->ms_x); >> + *(vbuf++) = fui(32768 << nv50_miptree(dst)->ms_x); >> *(vbuf++) = fui(0.0f); >> *(vbuf++) = fui(x1); >> *(vbuf++) = fui(y0); >> *(vbuf++) = fui(z); >> >> *(vbuf++) = fui(0.0f); >> - *(vbuf++) = fui(16384 << nv50_miptree(dst)->ms_y); >> + *(vbuf++) = fui(32768 << nv50_miptree(dst)->ms_y); >> *(vbuf++) = fui(x0); >> *(vbuf++) = fui(y1); >> *(vbuf++) = fui(z); >> > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 3/3] nv50, nvc0: handle SQRT lowering inside the driver
On Sun, Mar 13, 2016 at 7:52 AM, Samuel Pitoiset wrote: > > > On 03/13/2016 04:07 AM, Ilia Mirkin wrote: >> >> First off, st/mesa lowers DSQRT incorrectly (it uses CMP to attempt to >> find out whether the input is less than 0). Secondly the current >> approach (x * rsq(x)) behaves poorly for x = inf - a NaN is produced >> instead of inf. >> > > When I had a look at this precision thing, this seemed a bit weird to me but > I did not investigate much of time on it. > >> Instead we switch to the less accurate rcp(rsq(x)) method - this behaves >> nicely for all valid inputs. We still don't do this for DSQRT since the >> RSQ/RCP ops are *really* inaccurate, and don't even have Newton-Raphson >> steps right now. Eventually we should have a separate library function >> for DSQRT that does it more precisely (and perhaps move this lowering to >> the post-opt phase). >> >> This fixes a number of dEQP precision tests that were expecting better >> behavior for infinite inputs. > > > Yep, this fixes distance, length, refract and sqrt deqp precision tests (37 > tests). But they are still some fails with atan2, idexp and tanh. > Are you going to fix them too? Not today. (that's "ldexp" btw) I glanced at atan2 - looks like it's returning results out of range... perhaps a min/max on the poly generated by the lowering logic is in order. Didn't investigate too closely. > > Except that you forgot to remove unused variables: > codegen/nv50_ir_lowering_nvc0.cpp: In member function ‘bool > nv50_ir::NVC0LoweringPass::handleSQRT(nv50_ir::Instruction*)’: > codegen/nv50_ir_lowering_nvc0.cpp:1785:20: warning: unused variable ‘mov’ > [-Wunused-variable] >Instruction *mov, *rsq; > ^ > codegen/nv50_ir_lowering_nvc0.cpp:1785:26: warning: unused variable ‘rsq’ > [-Wunused-variable] >Instruction *mov, *rsq; > ^ > CC nvc0/nvc0_surface.lo Yeah, I noticed that about 5s after I sent the patch out... already fixed locally. > > This patch is: > > Reviewed-by: Samuel Pitoiset > Tested-by: Samuel Pitoiset Thanks! > > >> >> Signed-off-by: Ilia Mirkin >> --- >> .../drivers/nouveau/codegen/nv50_ir_build_util.cpp | 6 +++- >> .../drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp | 2 ++ >> .../nouveau/codegen/nv50_ir_lowering_nv50.cpp | 7 ++--- >> .../nouveau/codegen/nv50_ir_lowering_nvc0.cpp | 32 >> +++--- >> src/gallium/drivers/nouveau/nv50/nv50_screen.c | 2 +- >> src/gallium/drivers/nouveau/nvc0/nvc0_screen.c | 2 +- >> 6 files changed, 28 insertions(+), 23 deletions(-) >> >> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_build_util.cpp >> b/src/gallium/drivers/nouveau/codegen/nv50_ir_build_util.cpp >> index f58cf97..84ebfdb 100644 >> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_build_util.cpp >> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_build_util.cpp >> @@ -585,6 +585,7 @@ BuildUtil::split64BitOpPostRA(Function *fn, >> Instruction *i, >>return NULL; >> srcNr = 2; >> break; >> + case OP_SELP: srcNr = 3; break; >> default: >> // TODO when needed >> return NULL; >> @@ -601,7 +602,10 @@ BuildUtil::split64BitOpPostRA(Function *fn, >> Instruction *i, >> >> for (int s = 0; s < srcNr; ++s) { >> if (lo->getSrc(s)->reg.size < 8) { >> - hi->setSrc(s, zero); >> + if (s == 2) >> +hi->setSrc(s, lo->getSrc(s)); >> + else >> +hi->setSrc(s, zero); >> } else { >>if (lo->getSrc(s)->refCount() > 1) >> lo->setSrc(s, cloneShallow(fn, lo->getSrc(s))); >> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp >> b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp >> index b06d86a..d284446 100644 >> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp >> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp >> @@ -616,6 +616,7 @@ static nv50_ir::operation translateOpcode(uint opcode) >> >> NV50_IR_OPCODE_CASE(RCP, RCP); >> NV50_IR_OPCODE_CASE(RSQ, RSQ); >> + NV50_IR_OPCODE_CASE(SQRT, SQRT); >> >> NV50_IR_OPCODE_CASE(MUL, MUL); >> NV50_IR_OPCODE_CASE(ADD, ADD); >> @@ -2689,6 +2690,7 @@ Converter::handleInstruction(const struct >> tgsi_full_instruction *insn) >> case TGSI_OPCODE_FLR: >> case TGSI_OPCODE_TRUNC: >> case TGSI_OPCODE_RCP: >> + case TGSI_OPCODE_SQRT: >> case TGSI_OPCODE_IABS: >> case TGSI_OPCODE_INEG: >> case TGSI_OPCODE_NOT: >> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp >> b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp >> index 8752b0c..12c5f69 100644 >> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp >> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp >> @@ -1203,10 +1203,9 @@ NV50LoweringPreSSA::handleDIV(Instruction *i) >> bool >> NV50LoweringPreSSA::handleSQRT(Instruction *i) >
[Mesa-dev] [PATCH] configure.ac require libdrm 2.4.65 for amdgpu because of drmGetDevice
From: Marek Olšák --- configure.ac | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/configure.ac b/configure.ac index 49be147..d768db6 100644 --- a/configure.ac +++ b/configure.ac @@ -70,7 +70,7 @@ AC_SUBST([OPENCL_VERSION]) dnl Versions for external dependencies LIBDRM_REQUIRED=2.4.60 LIBDRM_RADEON_REQUIRED=2.4.56 -LIBDRM_AMDGPU_REQUIRED=2.4.63 +LIBDRM_AMDGPU_REQUIRED=2.4.65 LIBDRM_INTEL_REQUIRED=2.4.61 LIBDRM_NVVIEUX_REQUIRED=2.4.66 LIBDRM_NOUVEAU_REQUIRED=2.4.66 -- 2.5.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH v2 09/10] i965: Add and use is_scheduling_barrier() function.
--- .../drivers/dri/i965/brw_schedule_instructions.cpp | 21 + 1 file changed, 17 insertions(+), 4 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp b/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp index 98fa5e3..befa9ff 100644 --- a/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp +++ b/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp @@ -892,6 +892,14 @@ fs_instruction_scheduler::is_compressed(fs_inst *inst) return inst->exec_size == 16; } +static bool +is_scheduling_barrier(const fs_inst *inst) +{ + return inst->opcode == FS_OPCODE_PLACEHOLDER_HALT || + inst->is_control_flow() || + inst->has_side_effects(); +} + void fs_instruction_scheduler::calculate_deps() { @@ -917,9 +925,7 @@ fs_instruction_scheduler::calculate_deps() foreach_in_list(schedule_node, n, &instructions) { fs_inst *inst = (fs_inst *)n->inst; - if (inst->opcode == FS_OPCODE_PLACEHOLDER_HALT || - inst->is_control_flow() || - inst->has_side_effects()) + if (is_scheduling_barrier(inst)) add_barrier_deps(n); /* read-after-write deps. */ @@ -1131,6 +1137,13 @@ fs_instruction_scheduler::calculate_deps() } } +static bool +is_scheduling_barrier(const vec4_instruction *inst) +{ + return inst->is_control_flow() || + inst->has_side_effects(); +} + void vec4_instruction_scheduler::calculate_deps() { @@ -1152,7 +1165,7 @@ vec4_instruction_scheduler::calculate_deps() foreach_in_list(schedule_node, n, &instructions) { vec4_instruction *inst = (vec4_instruction *)n->inst; - if (inst->is_control_flow() || inst->has_side_effects()) + if (is_scheduling_barrier(inst)) add_barrier_deps(n); /* read-after-write deps. */ -- 2.4.10 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH v2 10/10] i965: Don't add barrier deps for FB write messages.
Ken did this earlier, and this is just me reimplementing his patch a little differently. --- src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp b/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp index befa9ff..8d92584 100644 --- a/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp +++ b/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp @@ -897,7 +897,8 @@ is_scheduling_barrier(const fs_inst *inst) { return inst->opcode == FS_OPCODE_PLACEHOLDER_HALT || inst->is_control_flow() || - inst->has_side_effects(); + inst->eot || + (inst->has_side_effects() && inst->opcode != FS_OPCODE_FB_WRITE); } void -- 2.4.10 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] softpipe: fix misleading TGSI_QUAD_SIZE usage
From: Roland Scheidegger All these img filter loops iterate through NUM_CHANNELS, not QUAD_SIZE. In practice both are of course the same unchangeable value (4), but it makes the code look a bit confusing. Moreover, some of the functions were actually given an array of 4 values according to the declaration, yet the code was addressing values 0/4/8/12 out of it, so fix this by just saying it's a pointer to floats like the other functions. While here, also add comment about not quite correct filtering. There's no actual code difference. --- src/gallium/drivers/softpipe/sp_tex_sample.c | 53 +++- 1 file changed, 29 insertions(+), 24 deletions(-) diff --git a/src/gallium/drivers/softpipe/sp_tex_sample.c b/src/gallium/drivers/softpipe/sp_tex_sample.c index e6bdd50..5703ca2 100644 --- a/src/gallium/drivers/softpipe/sp_tex_sample.c +++ b/src/gallium/drivers/softpipe/sp_tex_sample.c @@ -1047,7 +1047,7 @@ img_filter_2d_linear_repeat_POT(const struct sp_sampler_view *sp_sview, } /* interpolate R, G, B, A */ - for (c = 0; c < TGSI_QUAD_SIZE; c++) { + for (c = 0; c < TGSI_NUM_CHANNELS; c++) { rgba[TGSI_NUM_CHANNELS*c] = lerp_2d(xw, yw, tx[0][c], tx[1][c], tx[2][c], tx[3][c]); @@ -1063,7 +1063,7 @@ static inline void img_filter_2d_nearest_repeat_POT(const struct sp_sampler_view *sp_sview, const struct sp_sampler *sp_samp, const struct img_filter_args *args, - float rgba[TGSI_QUAD_SIZE]) + float *rgba) { const unsigned xpot = pot_level_size(sp_sview->xpot, args->level); const unsigned ypot = pot_level_size(sp_sview->ypot, args->level); @@ -1085,7 +1085,7 @@ img_filter_2d_nearest_repeat_POT(const struct sp_sampler_view *sp_sview, addr.bits.z = sp_sview->base.u.tex.first_layer; out = get_texel_2d_no_border(sp_sview, addr, x0, y0); - for (c = 0; c < TGSI_QUAD_SIZE; c++) + for (c = 0; c < TGSI_NUM_CHANNELS; c++) rgba[TGSI_NUM_CHANNELS*c] = out[c]; if (DEBUG_TEX) { @@ -1098,7 +1098,7 @@ static inline void img_filter_2d_nearest_clamp_POT(const struct sp_sampler_view *sp_sview, const struct sp_sampler *sp_samp, const struct img_filter_args *args, -float rgba[TGSI_QUAD_SIZE]) +float *rgba) { const unsigned xpot = pot_level_size(sp_sview->xpot, args->level); const unsigned ypot = pot_level_size(sp_sview->ypot, args->level); @@ -1128,7 +1128,7 @@ img_filter_2d_nearest_clamp_POT(const struct sp_sampler_view *sp_sview, y0 = ypot - 1; out = get_texel_2d_no_border(sp_sview, addr, x0, y0); - for (c = 0; c < TGSI_QUAD_SIZE; c++) + for (c = 0; c < TGSI_NUM_CHANNELS; c++) rgba[TGSI_NUM_CHANNELS*c] = out[c]; if (DEBUG_TEX) { @@ -1141,7 +1141,7 @@ static void img_filter_1d_nearest(const struct sp_sampler_view *sp_sview, const struct sp_sampler *sp_samp, const struct img_filter_args *args, - float rgba[TGSI_QUAD_SIZE]) + float *rgba) { const struct pipe_resource *texture = sp_sview->base.texture; const int width = u_minify(texture->width0, args->level); @@ -1159,7 +1159,7 @@ img_filter_1d_nearest(const struct sp_sampler_view *sp_sview, out = get_texel_1d_array(sp_sview, sp_samp, addr, x, sp_sview->base.u.tex.first_layer); - for (c = 0; c < TGSI_QUAD_SIZE; c++) + for (c = 0; c < TGSI_NUM_CHANNELS; c++) rgba[TGSI_NUM_CHANNELS*c] = out[c]; if (DEBUG_TEX) { @@ -1191,7 +1191,7 @@ img_filter_1d_array_nearest(const struct sp_sampler_view *sp_sview, sp_samp->nearest_texcoord_s(args->s, width, args->offset[0], &x); out = get_texel_1d_array(sp_sview, sp_samp, addr, x, layer); - for (c = 0; c < TGSI_QUAD_SIZE; c++) + for (c = 0; c < TGSI_NUM_CHANNELS; c++) rgba[TGSI_NUM_CHANNELS*c] = out[c]; if (DEBUG_TEX) { @@ -1225,7 +1225,7 @@ img_filter_2d_nearest(const struct sp_sampler_view *sp_sview, sp_samp->nearest_texcoord_t(args->t, height, args->offset[1], &y); out = get_texel_2d(sp_sview, sp_samp, addr, x, y); - for (c = 0; c < TGSI_QUAD_SIZE; c++) + for (c = 0; c < TGSI_NUM_CHANNELS; c++) rgba[TGSI_NUM_CHANNELS*c] = out[c]; if (DEBUG_TEX) { @@ -1260,7 +1260,7 @@ img_filter_2d_array_nearest(const struct sp_sampler_view *sp_sview, sp_samp->nearest_texcoord_t(args->t, height, args->offset[1], &y); out = get_texel_2d_array(sp_sview, sp_samp, addr, x, y, layer); - for (c = 0; c < TGSI_QUAD_SIZE; c++) + for (c = 0; c < TGSI_NUM_CHANNELS; c++) rgba[TGSI_NUM_CHANNELS*c] = out[c]; if (DEBUG_TEX) { @@ -1304,7 +1304,7 @@ img_filter_cube_nearest(const struct sp_sampl
Re: [Mesa-dev] [PATCH v2 10/10] i965: Don't add barrier deps for FB write messages.
Matt Turner writes: > Ken did this earlier, and this is just me reimplementing his patch a > little differently. Heh, it seems a little mean to Ken to revert his patch only to implement almost the same thing a few commits later. I suggest you squash the revert and this patch into PATCHv2 9 before pushing. Either way the end result looks reasonable so the three patches (4, 9, 10) are: Reviewed-by: Francisco Jerez Thanks. > --- > src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp > b/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp > index befa9ff..8d92584 100644 > --- a/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp > +++ b/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp > @@ -897,7 +897,8 @@ is_scheduling_barrier(const fs_inst *inst) > { > return inst->opcode == FS_OPCODE_PLACEHOLDER_HALT || >inst->is_control_flow() || > - inst->has_side_effects(); > + inst->eot || > + (inst->has_side_effects() && inst->opcode != FS_OPCODE_FB_WRITE); > } > > void > -- > 2.4.10 signature.asc Description: PGP signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] draw: fix line stippling
From: Roland Scheidegger The logic was comparing actual ints, not true/false values. This meant that it was emitting always multiple line segments instead of just one even if the stipple test had the same result, which looks inefficient, and the segments also overlapped thus breaking line aa as well. (In practice, with the no-op default line stipple pattern, for a 10-pixel long line from 0-9 it was emitting 10 segments, with the individual segments ranging from 0-1, 0-2, 0-3 and so on.) This fixes https://bugs.freedesktop.org/show_bug.cgi?id=94193 CC: --- src/gallium/auxiliary/draw/draw_pipe_stipple.c | 30 +- 1 file changed, 15 insertions(+), 15 deletions(-) diff --git a/src/gallium/auxiliary/draw/draw_pipe_stipple.c b/src/gallium/auxiliary/draw/draw_pipe_stipple.c index dcf05aa..04854ad 100644 --- a/src/gallium/auxiliary/draw/draw_pipe_stipple.c +++ b/src/gallium/auxiliary/draw/draw_pipe_stipple.c @@ -108,11 +108,11 @@ emit_segment(struct draw_stage *stage, struct prim_header *header, } -static inline unsigned +static inline boolean stipple_test(int counter, ushort pattern, int factor) { int b = (counter / factor) & 0xf; - return (1 << b) & pattern; + return !!((1 << b) & pattern); } @@ -126,7 +126,7 @@ stipple_line(struct draw_stage *stage, struct prim_header *header) const float *pos0 = v0->data[pos]; const float *pos1 = v1->data[pos]; float start = 0; - int state = 0; + boolean state = 0; float x0 = pos0[0]; float x1 = pos1[0]; @@ -143,29 +143,29 @@ stipple_line(struct draw_stage *stage, struct prim_header *header) stipple->counter = 0; - /* XXX ToDo: intead of iterating pixel-by-pixel, use a look-up table. + /* XXX ToDo: instead of iterating pixel-by-pixel, use a look-up table. */ for (i = 0; i < length; i++) { - int result = stipple_test( (int) stipple->counter+i, - (ushort) stipple->pattern, stipple->factor ); + boolean result = stipple_test((int)stipple->counter + i, +(ushort)stipple->pattern, stipple->factor); if (result != state) { /* changing from "off" to "on" or vice versa */ -if (state) { - if (start != i) { + if (state) { +if (start != i) { /* finishing an "on" segment */ - emit_segment( stage, header, start / length, i / length ); + emit_segment(stage, header, start / length, i / length); } -} -else { + } + else { /* starting an "on" segment */ - start = (float) i; -} -state = result; +start = (float)i; + } + state = result; } } if (state && start < length) - emit_segment( stage, header, start / length, 1.0 ); + emit_segment(stage, header, start / length, 1.0); stipple->counter += length; } -- 2.1.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] squash: Fix up VPM read optimization.
- Do not reorder instructions reading packed sources (fixes piglit regressions with draw-vertices and draw-vertices-user). --- Eric, Rhys, with this bit we should be good to go. src/gallium/drivers/vc4/vc4_opt_vpm.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/gallium/drivers/vc4/vc4_opt_vpm.c b/src/gallium/drivers/vc4/vc4_opt_vpm.c index a4ee6af..d15b0c1 100644 --- a/src/gallium/drivers/vc4/vc4_opt_vpm.c +++ b/src/gallium/drivers/vc4/vc4_opt_vpm.c @@ -77,7 +77,8 @@ qir_opt_vpm(struct vc4_compile *c) continue; for (int j = 0; j < qir_get_op_nsrc(inst->op); j++) { -if (inst->src[j].file != QFILE_TEMP) +if (inst->src[j].file != QFILE_TEMP || +inst->src[j].pack) continue; uint32_t temp = inst->src[j].index; -- 2.6.2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] nv50, nvc0: Set only NEW_CP_GLOBALS upon binding
Signed-off-by: Pierre Moreau --- src/gallium/drivers/nouveau/nv50/nv50_state.c | 2 +- src/gallium/drivers/nouveau/nvc0/nvc0_state.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/src/gallium/drivers/nouveau/nv50/nv50_state.c b/src/gallium/drivers/nouveau/nv50/nv50_state.c index c73e3ba..b9efb3f 100644 --- a/src/gallium/drivers/nouveau/nv50/nv50_state.c +++ b/src/gallium/drivers/nouveau/nv50/nv50_state.c @@ -1246,7 +1246,7 @@ nv50_set_global_bindings(struct pipe_context *pipe, nouveau_bufctx_reset(nv50->bufctx_cp, NV50_BIND_CP_GLOBAL); - nv50->dirty_cp = NV50_NEW_CP_GLOBALS; + nv50->dirty_cp |= NV50_NEW_CP_GLOBALS; } void diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_state.c b/src/gallium/drivers/nouveau/nvc0/nvc0_state.c index c279093..36e3546 100644 --- a/src/gallium/drivers/nouveau/nvc0/nvc0_state.c +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_state.c @@ -1343,7 +1343,7 @@ nvc0_set_global_bindings(struct pipe_context *pipe, nouveau_bufctx_reset(nvc0->bufctx_cp, NVC0_BIND_CP_GLOBAL); - nvc0->dirty_cp = NVC0_NEW_CP_GLOBALS; + nvc0->dirty_cp |= NVC0_NEW_CP_GLOBALS; } void -- 2.7.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] nv50: Mark compute states as dirty on context switch
Signed-off-by: Pierre Moreau --- src/gallium/drivers/nouveau/nv50/nv50_state_validate.c | 1 + 1 file changed, 1 insertion(+) diff --git a/src/gallium/drivers/nouveau/nv50/nv50_state_validate.c b/src/gallium/drivers/nouveau/nv50/nv50_state_validate.c index 5536978..d06ba4a 100644 --- a/src/gallium/drivers/nouveau/nv50/nv50_state_validate.c +++ b/src/gallium/drivers/nouveau/nv50/nv50_state_validate.c @@ -437,6 +437,7 @@ nv50_switch_pipe_context(struct nv50_context *ctx_to) ctx_to->state = ctx_to->screen->save_state; ctx_to->dirty = ~0; + ctx_to->dirty_cp = ~0; ctx_to->viewports_dirty = ~0; ctx_to->scissors_dirty = ~0; -- 2.7.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] nv50, nvc0: Set only NEW_CP_GLOBALS upon binding
This is there since ... 2013 ... but this was never really used because it's compute-related, that might explain why you are the first one to hit the issue. :-) Luckily, this doesn't affect compute shaders on Fermi because globals buffers are validated *after* all other things. Good catch! Reviewed-by: Samuel Pitoiset On 03/13/2016 10:11 PM, Pierre Moreau wrote: Signed-off-by: Pierre Moreau --- src/gallium/drivers/nouveau/nv50/nv50_state.c | 2 +- src/gallium/drivers/nouveau/nvc0/nvc0_state.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/src/gallium/drivers/nouveau/nv50/nv50_state.c b/src/gallium/drivers/nouveau/nv50/nv50_state.c index c73e3ba..b9efb3f 100644 --- a/src/gallium/drivers/nouveau/nv50/nv50_state.c +++ b/src/gallium/drivers/nouveau/nv50/nv50_state.c @@ -1246,7 +1246,7 @@ nv50_set_global_bindings(struct pipe_context *pipe, nouveau_bufctx_reset(nv50->bufctx_cp, NV50_BIND_CP_GLOBAL); - nv50->dirty_cp = NV50_NEW_CP_GLOBALS; + nv50->dirty_cp |= NV50_NEW_CP_GLOBALS; } void diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_state.c b/src/gallium/drivers/nouveau/nvc0/nvc0_state.c index c279093..36e3546 100644 --- a/src/gallium/drivers/nouveau/nvc0/nvc0_state.c +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_state.c @@ -1343,7 +1343,7 @@ nvc0_set_global_bindings(struct pipe_context *pipe, nouveau_bufctx_reset(nvc0->bufctx_cp, NVC0_BIND_CP_GLOBAL); - nvc0->dirty_cp = NVC0_NEW_CP_GLOBALS; + nvc0->dirty_cp |= NVC0_NEW_CP_GLOBALS; } void ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 08/10] i965: Remove NOP insertion kludge in scheduler.
Matt Turner writes: > Instead of removing every instruction in add_insts_from_block(), just > move the instruction to its scheduled location. This is a step towards > doing both bottom-up and top-down scheduling without conflicts. > > Note that this patch changes cycle counts for programs because it begins > including control flow instructions in the estimates. > --- > .../drivers/dri/i965/brw_schedule_instructions.cpp | 25 > +- > 1 file changed, 5 insertions(+), 20 deletions(-) > > diff --git a/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp > b/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp > index 46b45a5..98fa5e3 100644 > --- a/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp > +++ b/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp > @@ -783,26 +783,13 @@ schedule_node::schedule_node(backend_instruction *inst, > void > instruction_scheduler::add_insts_from_block(bblock_t *block) > { > - /* Removing the last instruction from a basic block removes the block as > -* well, so put a NOP at the end to keep it alive. > -*/ > - if (!block->end()->is_control_flow()) { > - backend_instruction *nop = new(mem_ctx) backend_instruction(); > - nop->opcode = BRW_OPCODE_NOP; > - block->end()->insert_after(block, nop); > - } > - Thanks for removing this hack, I had been meaning to do the same for a while. What was really bothering me about it is the fact that it inserts a naked backend_instruction into the basic block's instruction list, which everything else assumes contains elements of a concrete instruction class, fs_inst or vec4_instruction depending on the back-end -- So thanks a lot, Reviewed-by: Francisco Jerez > - foreach_inst_in_block_safe(backend_instruction, inst, block) { > - if (inst->opcode == BRW_OPCODE_NOP || inst->is_control_flow()) > - continue; > - > + foreach_inst_in_block(backend_instruction, inst, block) { >schedule_node *n = new(mem_ctx) schedule_node(inst, this); > > - this->instructions_to_schedule++; > - > - inst->remove(block); >instructions.push_tail(n); > } > + > + this->instructions_to_schedule = block->end_ip - block->start_ip + 1; > } > > /** Recursive computation of the delay member of a node. */ > @@ -1463,7 +1450,6 @@ void > instruction_scheduler::schedule_instructions(bblock_t *block) > { > const struct brw_device_info *devinfo = bs->devinfo; > - backend_instruction *inst = block->end(); > time = 0; > if (!post_reg_alloc) >reg_pressure = reg_pressure_in[block->num]; > @@ -1482,7 +1468,8 @@ instruction_scheduler::schedule_instructions(bblock_t > *block) >/* Schedule this instruction. */ >assert(chosen); >chosen->remove(); > - inst->insert_before(block, chosen->inst); > + chosen->inst->exec_node::remove(); > + block->instructions.push_tail(chosen->inst); >instructions_to_schedule--; > >if (!post_reg_alloc) { > @@ -1551,8 +1538,6 @@ instruction_scheduler::schedule_instructions(bblock_t > *block) >} > } > > - if (block->end()->opcode == BRW_OPCODE_NOP) > - block->end()->remove(block); > assert(instructions_to_schedule == 0); > > block->cycle_count = time; > -- > 2.4.10 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev signature.asc Description: PGP signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] nv50: Mark compute states as dirty on context switch
Well, without a new validation path for compute on Tesla this won't change anything because nv50_state_validate() is 3d-related and it should be never called by compute. On 03/13/2016 10:11 PM, Pierre Moreau wrote: Signed-off-by: Pierre Moreau --- src/gallium/drivers/nouveau/nv50/nv50_state_validate.c | 1 + 1 file changed, 1 insertion(+) diff --git a/src/gallium/drivers/nouveau/nv50/nv50_state_validate.c b/src/gallium/drivers/nouveau/nv50/nv50_state_validate.c index 5536978..d06ba4a 100644 --- a/src/gallium/drivers/nouveau/nv50/nv50_state_validate.c +++ b/src/gallium/drivers/nouveau/nv50/nv50_state_validate.c @@ -437,6 +437,7 @@ nv50_switch_pipe_context(struct nv50_context *ctx_to) ctx_to->state = ctx_to->screen->save_state; ctx_to->dirty = ~0; + ctx_to->dirty_cp = ~0; ctx_to->viewports_dirty = ~0; ctx_to->scissors_dirty = ~0; ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] nv50, nvc0: Set only NEW_CP_GLOBALS upon binding
On 03/13/2016 10:18 PM, Samuel Pitoiset wrote: This is there since ... 2013 ... but this was never really used because it's compute-related, that might explain why you are the first one to hit the issue. :-) Luckily, this doesn't affect compute shaders on Fermi because globals buffers are validated *after* all other things. Err, because we don't need to bind global buffers. Good catch! Reviewed-by: Samuel Pitoiset On 03/13/2016 10:11 PM, Pierre Moreau wrote: Signed-off-by: Pierre Moreau --- src/gallium/drivers/nouveau/nv50/nv50_state.c | 2 +- src/gallium/drivers/nouveau/nvc0/nvc0_state.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/src/gallium/drivers/nouveau/nv50/nv50_state.c b/src/gallium/drivers/nouveau/nv50/nv50_state.c index c73e3ba..b9efb3f 100644 --- a/src/gallium/drivers/nouveau/nv50/nv50_state.c +++ b/src/gallium/drivers/nouveau/nv50/nv50_state.c @@ -1246,7 +1246,7 @@ nv50_set_global_bindings(struct pipe_context *pipe, nouveau_bufctx_reset(nv50->bufctx_cp, NV50_BIND_CP_GLOBAL); - nv50->dirty_cp = NV50_NEW_CP_GLOBALS; + nv50->dirty_cp |= NV50_NEW_CP_GLOBALS; } void diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_state.c b/src/gallium/drivers/nouveau/nvc0/nvc0_state.c index c279093..36e3546 100644 --- a/src/gallium/drivers/nouveau/nvc0/nvc0_state.c +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_state.c @@ -1343,7 +1343,7 @@ nvc0_set_global_bindings(struct pipe_context *pipe, nouveau_bufctx_reset(nvc0->bufctx_cp, NVC0_BIND_CP_GLOBAL); - nvc0->dirty_cp = NVC0_NEW_CP_GLOBALS; + nvc0->dirty_cp |= NVC0_NEW_CP_GLOBALS; } void ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] Small changes in GL3.txt
Hi! With the recent commits, I'm having some trouble parsing the GL3.txt text file for mesamatrix.net. With these set of 4 patches, I'm proposing to rationalize a little bit this file. The content is exactly the same. (Note: I'm not really used to propose patches, so if this format is wrong, I'll be glad to do it the proper way) # gl3_txt_1_howto.patch Add some explanations on how to read (and edit) GL3.txt. It tends to give some sort of easy to follow standard. # gl3_txt_2_realign_status.patch Realign the Status column to the same column number. # gl3_txt_3_renormalize_some_extensions.patch The comment for GL_ARB_texture_buffer_object was between "DONE" and the brackets, it's an exception compared to the way it's done for other extensions. The status of GL_KHR_robustness was "90% done", it is now "in progress" and "90% done" is in the extension comment. # gl3_txt_4_renormalize_older_extensions.patch On older extensions, there is an explanation first and the extension name in brackets. I inverted that so we have the extension first and then the explanation in brackets (it will help me later for the "all drivers that support "). I might have more patches later, but I'd like to see if these are OK for you first. Cheers! Romain diff --git a/docs/GL3.txt b/docs/GL3.txt index ee7faca..1ed5c1a 100644 --- a/docs/GL3.txt +++ b/docs/GL3.txt @@ -1,13 +1,28 @@ +# Status of OpenGL extensions in Mesa -Status of OpenGL 3.x features in Mesa +Here's how to read this file: +all DONE: , ... +All the extensions are done for the given list of drivers. -Note: when an item is marked as "DONE" it means all the core Mesa -infrastructure is complete but it may be the case that few (if any) drivers -implement the features. +DONE +The extension is done for Mesa and no implementation is necessary on the +driver-side. +DONE () +The extension is done for Mesa and all the drivers in the "all DONE" list. -OpenGL Core and Compatibility context support +DONE (, ...) +The extension is done for Mesa, all the drivers in the "all DONE" list, and +all the drivers in the brackets. + +in progress +The extension is started but not finished yet. + +not started +The extension isn't started yet. + +# OpenGL Core and Compatibility context support OpenGL 3.1 and later versions are only supported with the Core profile. There are no plans to support GL_ARB_compatibility. The last supported OpenGL diff --git a/docs/GL3.txt b/docs/GL3.txt index 1ed5c1a..17522d5 100644 --- a/docs/GL3.txt +++ b/docs/GL3.txt @@ -30,8 +30,8 @@ version with all deprecated features is 3.0. Some of the later GL features are exposed in the 3.0 context as extensions. -Feature Status -- +Feature Status +--- GL 3.0, GLSL 1.30 --- all DONE: i965, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe @@ -109,170 +109,170 @@ GL 3.3, GLSL 3.30 --- all DONE: i965, nv50, nvc0, r600, radeonsi, llvmpipe, soft GL 4.0, GLSL 4.00 --- all DONE: nvc0, r600, radeonsi - GL_ARB_draw_buffers_blendDONE (i965, nv50, llvmpipe, softpipe) - GL_ARB_draw_indirect DONE (i965, llvmpipe, softpipe) - GL_ARB_gpu_shader5 DONE (i965) - - 'precise' qualifierDONE - - Dynamically uniform sampler array indices DONE (softpipe) - - Dynamically uniform UBO array indices DONE () - - Implicit signed -> unsigned conversionsDONE - - Fused multiply-add DONE () - - Packing/bitfield/conversion functions DONE (softpipe) - - Enhanced textureGather DONE (softpipe) - - Geometry shader instancing DONE (llvmpipe, softpipe) - - Geometry shader multiple streams DONE () - - Enhanced per-sample shadingDONE () - - Interpolation functionsDONE () - - New overload resolution rules DONE - GL_ARB_gpu_shader_fp64 DONE (llvmpipe, softpipe) - GL_ARB_sample_shadingDONE (i965, nv50) - GL_ARB_shader_subroutine DONE (i965, nv50, llvmpipe, softpipe) - GL_ARB_tessellation_shader DONE (i965) - GL_ARB_texture_buffer_object_rgb32 DONE (i965, llvmpipe, softpipe) - GL_ARB_texture_cube_map_arrayDONE (i965, nv50, llvmpipe, softpipe) - GL_ARB_texture_gatherDONE (i965, nv50, llvmpipe, softpipe) - GL_ARB_texture_query_lod DONE (i965, nv50, softpipe) - GL_ARB_transform_feedback2
Re: [Mesa-dev] [PATCH] configure.ac require libdrm 2.4.65 for amdgpu because of drmGetDevice
Hi Marek, On 13 March 2016 at 16:46, Marek Olšák wrote: > From: Marek Olšák > The struct is available in 2.4.65, although the functions (drm{Get,Free}Device) are available since 2.4.66 afaics. Mildly related: Not sure it there are any non PCI ATI/AMD devices, then again one should check drmDevice::bus_type prior to accessing drmDevice::bus_info.pci. > src/gallium/winsys/amdgpu/drm/amdgpu_winsys.c not longer compile: > error: unknown type name ‘drmDevicePtr’ > --- > configure.ac | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/configure.ac b/configure.ac > index 49be147..d768db6 100644 > --- a/configure.ac > +++ b/configure.ac > @@ -70,7 +70,7 @@ AC_SUBST([OPENCL_VERSION]) > dnl Versions for external dependencies > LIBDRM_REQUIRED=2.4.60 > LIBDRM_RADEON_REQUIRED=2.4.56 > -LIBDRM_AMDGPU_REQUIRED=2.4.63 > +LIBDRM_AMDGPU_REQUIRED=2.4.65 LIBDRM_REQUIRED should be bumped, as it's the one that provides the symbol(s). -Emil ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] configure.ac require libdrm 2.4.65 for amdgpu because of drmGetDevice
Reviewed-by: Edward O'Callaghan On 2016-03-14 03:46, Marek Olšák wrote: From: Marek Olšák --- configure.ac | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/configure.ac b/configure.ac index 49be147..d768db6 100644 --- a/configure.ac +++ b/configure.ac @@ -70,7 +70,7 @@ AC_SUBST([OPENCL_VERSION]) dnl Versions for external dependencies LIBDRM_REQUIRED=2.4.60 LIBDRM_RADEON_REQUIRED=2.4.56 -LIBDRM_AMDGPU_REQUIRED=2.4.63 +LIBDRM_AMDGPU_REQUIRED=2.4.65 LIBDRM_INTEL_REQUIRED=2.4.61 LIBDRM_NVVIEUX_REQUIRED=2.4.66 LIBDRM_NOUVEAU_REQUIRED=2.4.66 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] spirv: Fix structure splitting with per-vertex interface arrays.
We want to use interface_type, not vtn_var->type. They're normally equivalent, but for geometry/tessellation per-vertex interface arrays, we need to unwrap a level. Otherwise, we tried to iterate a structure members but instead used an array length. If the array length was longer than the number of fields in the structure, we'd crash. --- src/compiler/nir/spirv/vtn_variables.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/compiler/nir/spirv/vtn_variables.c b/src/compiler/nir/spirv/vtn_variables.c index 31bf416..1628042 100644 --- a/src/compiler/nir/spirv/vtn_variables.c +++ b/src/compiler/nir/spirv/vtn_variables.c @@ -923,7 +923,7 @@ var_decoration_cb(struct vtn_builder *b, struct vtn_value *val, int member, vtn_var->var->data.explicit_location = true; } else { assert(vtn_var->members); - unsigned length = glsl_get_length(vtn_var->type->type); + unsigned length = glsl_get_length(val->type->type); for (unsigned i = 0; i < length; i++) { vtn_var->members[i]->data.location = location; vtn_var->members[i]->data.explicit_location = true; -- 2.7.2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] Mesa include guard style. (Was: [PATCH] i965/cfg: Remove redundant #pragma once.)
Ian Romanick writes: > On 03/11/2016 03:46 PM, Eric Anholt wrote: >> Ian Romanick writes: >> >>> On 03/10/2016 05:53 PM, Francisco Jerez wrote: Iago Toral writes: > On Wed, 2016-03-09 at 19:04 -0800, Francisco Jerez wrote: >> Matt Turner writes: >> >>> On Wed, Mar 9, 2016 at 1:37 PM, Francisco Jerez >>> wrote: Iago Toral writes: > On Tue, 2016-03-08 at 17:42 -0800, Francisco Jerez wrote: >> brw_cfg.h already has include guards, remove the "#pragma once" which >> is redundant and non-standard. > > FWIW, I think using both #pragma once and include guards is a way to > keep portability while still getting the performance advantage of > #pragma once where it is supported. > It's highly unlikely to make any significant difference on any reasonably modern compiler. I cannot measure any change in compilation time locally from my cleanup. > Also it seems that we do the same thing in many other files... > Really? I'm not aware of any other file where we use both. >>> >>> There are quite a few in glsl/ >> >> Heh, apparently you're right. Anyway it seems rather pointless to use >> '#pragma once' in a bunch of scattered header files with the expectation >> to gain some speed, the improvement from a single header file is so >> minuscule (if it will make any difference at all on a modern compiler >> and compilation workload, which I doubt) that we would have to use it >> universally in order to have the chance to measure any improvement. >> >> Can we please just decide for one of the include guard styles and use it >> consistently? Given that the majority of header files in the Mesa >> codebase use old-school define guards, that it's the only standard >> option, that it has well-defined semantics in presence of file copies >> and hardlinks, and that the performance argument against it is rather >> dubious (although I definitely find '#pragma once' prettier and more >> concise), I'd vote for using preprocessor define guards universally. >> >> What do other people think? > > I think we have to use define guards necessarily since #pragma once is > not standard even it it has wide support. So the question is whether we > want to use only define guards or define guards plus #pragma once. I am > fine with doing only define guards as you propose. *Shrug* I have the impression that the only real advantage of '#pragma once' is that you no longer need to do the ifndef/define dance, so I don't think I can see much benefit in doing both. >>> >>> Several compilers will cache the file name where '#pragma once' occurs >>> and never read that file again. A #include of a file previously seen >>> with '#pragma once' becomes a no-op. Since the file is never read, the >>> compiler avoids all the I/O and the parsing. That is true of MSVC and, >>> I thought, some versions of GCC. As Iago points out, some compilers >>> ignore the #pragma altogether. Since Mesa supports (or does it?) some >>> of these compilers, we have to have the ifdef/define/endif guards. >> >> Compilers have noticed that ifdef/define/endif is a thing and optimized >> it, anyway. >> >> https://gcc.gnu.org/onlinedocs/cppinternals/Guard-Macros.html > > That's cool! I don't think GCC did that when I looked into this in > 2010. It sounds like the #pragma actually breaks the GCC optimization, > so let's get rid of them all. Shall I take that as an Acked-by for my patch? It doesn't get rid of them all but it's a start. ;) signature.asc Description: PGP signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] squash: Fix up VPM read optimization.
On 13 March 2016 at 17:09, Varad Gautam wrote: > - Do not reorder instructions reading packed sources (fixes piglit > regressions with draw-vertices and draw-vertices-user). > --- > Eric, Rhys, with this bit we should be good to go. > > src/gallium/drivers/vc4/vc4_opt_vpm.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/src/gallium/drivers/vc4/vc4_opt_vpm.c > b/src/gallium/drivers/vc4/vc4_opt_vpm.c > index a4ee6af..d15b0c1 100644 > --- a/src/gallium/drivers/vc4/vc4_opt_vpm.c > +++ b/src/gallium/drivers/vc4/vc4_opt_vpm.c > @@ -77,7 +77,8 @@ qir_opt_vpm(struct vc4_compile *c) > continue; > > for (int j = 0; j < qir_get_op_nsrc(inst->op); j++) { > -if (inst->src[j].file != QFILE_TEMP) > +if (inst->src[j].file != QFILE_TEMP || > +inst->src[j].pack) > continue; > > uint32_t temp = inst->src[j].index; > -- > 2.6.2 > > No piglit regressions, so this gets my Tested-by: Rhys Kidd This applies to the squashed set of [0], [1], and [2] plus this patch. Given I believe you do not yet have Mesa commit access, perhaps ask Eric if he'd commit that set for you. Thanks for your contribution Varad! [0] https://patchwork.freedesktop.org/patch/76069/ [1] https://patchwork.freedesktop.org/patch/76070/ [2] https://patchwork.freedesktop.org/patch/76528/ ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] LLVMInitializeAMDGPU* undefined?
2016-03-11 11:50 GMT+08:00 Jan Vesely : > On Fri, 2016-03-11 at 10:09 +0800, Chih-Wei Huang wrote: >> cc1: some warnings being treated as errors >> >> >> But I'm still not sure whether if it should be fixed on the mesa >> side. >> >> So my question is, what kind of fix do we want (i.e., acceptable by >> the upstream)? > > this is the result of using llvm without AMDGPU backend. the header > "llvm-c/Target.h" includes "llvm/Config/Targets.def" and declares > functions based on backends listed there (selected at llvm build time). > > Your "llvm/Config/Targets.def" needs to include this line: > LLVM_TARGET(AMDGPU) Ah, thank you for pointed it out. Originally I thought adding Android.mk for AMDGPU is enough. I'll try to add this. Anything else we need to do to "enable" AMDGPU backend? > otherwise the functions are not declared. > probably the only patch needed on mesa side is to detect this at > configure time. Android doesn't need configure. What we need is to detect Android version then we know the llvm version it uses and what backends it has. The patch is already in android-x86 codebase but probably not submited to mesa-dev yet. -- Chih-Wei Android-x86 project http://www.android-x86.org ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] get: reconcile aliasing enums for MaxCombinedShaderOutputResources
On Thursday, March 10, 2016 6:26:43 PM PDT Nicolai Hähnle wrote: > From: Nicolai Hähnle > > The enums MAX_COMBINED_IMAGE_UNITS_AND_FRAGMENT_OUTPUTS and > MAX_COMBINED_SHADER_OUTPUT_RESOURCES are equal and should therefore only > appear once. > > Noticed while implementing ARB_shader_image_load_store without previously > implementing SSBO. > --- > src/mesa/main/get.c | 7 +++ > src/mesa/main/get_hash_params.py | 6 -- > 2 files changed, 11 insertions(+), 2 deletions(-) > > diff --git a/src/mesa/main/get.c b/src/mesa/main/get.c > index 67c4f99..b0fadc9 100644 > --- a/src/mesa/main/get.c > +++ b/src/mesa/main/get.c > @@ -384,6 +384,13 @@ static const int extra_ARB_shader_storage_buffer_object_and_geometry_shader[] = > EXTRA_END > }; > > +static const int extra_ARB_shader_image_load_store_shader_storage_buffer_object_es31[] = { > + EXT(ARB_shader_image_load_store), > + EXT(ARB_shader_storage_buffer_object), > + EXTRA_API_ES31, > + EXTRA_END > +}; > + > static const int extra_ARB_framebuffer_no_attachments_and_geometry_shader[] = { > EXTRA_EXT_FB_NO_ATTACH_GS, > EXTRA_END > diff --git a/src/mesa/main/get_hash_params.py b/src/mesa/main/ get_hash_params.py > index f9d66f8..12c2189 100644 > --- a/src/mesa/main/get_hash_params.py > +++ b/src/mesa/main/get_hash_params.py > @@ -495,9 +495,12 @@ descriptor=[ >[ "MAX_COMBINED_SHADER_STORAGE_BLOCKS", "CONTEXT_INT(Const.MaxCombinedShaderStorageBlocks), extra_ARB_shader_storage_buffer_object_es31" ], >[ "MAX_SHADER_STORAGE_BLOCK_SIZE", "CONTEXT_INT(Const.MaxShaderStorageBlockSize), extra_ARB_shader_storage_buffer_object_es31" ], >[ "MAX_SHADER_STORAGE_BUFFER_BINDINGS", "CONTEXT_INT(Const.MaxShaderStorageBufferBindings), extra_ARB_shader_storage_buffer_object_es31" ], > - [ "MAX_COMBINED_SHADER_OUTPUT_RESOURCES", "CONTEXT_INT(Const.MaxCombinedShaderOutputResources), extra_ARB_shader_storage_buffer_object_es31" ], >[ "SHADER_STORAGE_BUFFER_OFFSET_ALIGNMENT", "CONTEXT_INT(Const.ShaderStorageBufferOffsetAlignment), extra_ARB_shader_storage_buffer_object_es31" ], >[ "SHADER_STORAGE_BUFFER_BINDING", "LOC_CUSTOM, TYPE_INT, 0, extra_ARB_shader_storage_buffer_object_es31" ], > + > + # GL_ARB_shader_image_load_store / GL_ARB_shader_storage_buffer_object / GLES 3.1 > + # (MAX_COMBINED_IMAGE_UNITS_AND_FRAGMENT_OUTPUTS in GL_ARB_shader_image_load_store) > + [ "MAX_COMBINED_SHADER_OUTPUT_RESOURCES", "CONTEXT_INT(Const.MaxCombinedShaderOutputResources), extra_ARB_shader_image_load_store_shader_storage_buffer_object_es31" ], > ]}, > > # Enums in OpenGL Core profile and ES 3.1 > @@ -841,7 +844,6 @@ descriptor=[ >[ "MAX_PROGRAM_TEXTURE_GATHER_COMPONENTS_ARB", "CONTEXT_INT(Const.MaxProgramTextureGatherComponents), extra_ARB_texture_gather"], > > # GL_ARB_shader_image_load_store > - [ "MAX_COMBINED_IMAGE_UNITS_AND_FRAGMENT_OUTPUTS", "CONTEXT_INT(Const.MaxCombinedShaderOutputResources), extra_ARB_shader_image_load_store" ], >[ "MAX_IMAGE_SAMPLES", "CONTEXT_INT(Const.MaxImageSamples), extra_ARB_shader_image_load_store" ], > > # GL_EXT_polygon_offset_clamp > Reviewed-by: Kenneth Graunke signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 01/30] i965/fs: Add missing analysis invalidation in opt_sampler_eot().
Bug found by the liveness analysis validation pass that will be introduced in a later commit. opt_sampler_eot() was allocating registers and inserting and removing instructions, which makes the cached liveness analysis calculation inconsistent with the shader IR, so it must be invalidated. Cc: mesa-sta...@lists.freedesktop.org --- src/mesa/drivers/dri/i965/brw_fs.cpp | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index ff67cf4..42bc5e2 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -2441,8 +2441,10 @@ fs_visitor::opt_sampler_eot() * we have enough space, but it will make sure the dead code eliminator kills * the instruction that this will replace. */ - if (tex_inst->header_size != 0) + if (tex_inst->header_size != 0) { + invalidate_live_intervals(); return true; + } fs_reg send_header = ibld.vgrf(BRW_REGISTER_TYPE_F, load_payload->sources + 1); @@ -2473,6 +2475,7 @@ fs_visitor::opt_sampler_eot() tex_inst->insert_before(cfg->blocks[cfg->num_blocks - 1], new_load_payload); tex_inst->src[0] = send_header; + invalidate_live_intervals(); return true; } -- 2.7.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 19/30] i965/fs: Pass single backend_shader argument to the fs_live_variables constructor.
This removes the dependency of fs_live_variables on fs_visitor. The IR analysis framework requires the analysis result to be constructible with a single argument -- The second argument was redundant anyway. --- src/mesa/drivers/dri/i965/brw_fs_live_variables.cpp | 12 ++-- src/mesa/drivers/dri/i965/brw_fs_live_variables.h | 6 ++ 2 files changed, 8 insertions(+), 10 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs_live_variables.cpp b/src/mesa/drivers/dri/i965/brw_fs_live_variables.cpp index 6da026e..4b0943f 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_live_variables.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_live_variables.cpp @@ -238,22 +238,22 @@ fs_live_variables::compute_start_end() } } -fs_live_variables::fs_live_variables(fs_visitor *v, const cfg_t *cfg) - : v(v), cfg(cfg) +fs_live_variables::fs_live_variables(const backend_shader *s) + : cfg(s->cfg) { mem_ctx = ralloc_context(NULL); - num_vgrfs = v->alloc.count; + num_vgrfs = s->alloc.count; num_vars = 0; var_from_vgrf = rzalloc_array(mem_ctx, int, num_vgrfs); for (int i = 0; i < num_vgrfs; i++) { var_from_vgrf[i] = num_vars; - num_vars += v->alloc.sizes[i]; + num_vars += s->alloc.sizes[i]; } vgrf_from_var = rzalloc_array(mem_ctx, int, num_vars); for (int i = 0; i < num_vgrfs; i++) { - for (unsigned j = 0; j < v->alloc.sizes[i]; j++) { + for (unsigned j = 0; j < s->alloc.sizes[i]; j++) { vgrf_from_var[var_from_vgrf[i] + j] = i; } } @@ -323,7 +323,7 @@ fs_visitor::calculate_live_intervals() if (this->live_intervals) return; - this->live_intervals = new(mem_ctx) fs_live_variables(this, cfg); + this->live_intervals = new(mem_ctx) fs_live_variables(this); } bool diff --git a/src/mesa/drivers/dri/i965/brw_fs_live_variables.h b/src/mesa/drivers/dri/i965/brw_fs_live_variables.h index c0943c7..e1cd12c 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_live_variables.h +++ b/src/mesa/drivers/dri/i965/brw_fs_live_variables.h @@ -32,7 +32,7 @@ #include "util/bitset.h" struct cfg_t; -class fs_visitor; +struct backend_shader; namespace brw { @@ -66,7 +66,7 @@ public: DECLARE_RALLOC_CXX_OPERATORS(fs_live_variables) - fs_live_variables(fs_visitor *v, const cfg_t *cfg); + fs_live_variables(const backend_shader *s); ~fs_live_variables(); bool vars_interfere(int a, int b) const; @@ -118,10 +118,8 @@ protected: void compute_live_variables(); void compute_start_end(); - fs_visitor *v; const cfg_t *cfg; void *mem_ctx; - }; } /* namespace brw */ -- 2.7.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 11/30] i965/ir: Introduce simple IR analysis pass framework.
Motivated in detail in the source code. The only piece missing here from the analysis pass infrastructure is some sort of mechanism to broadcast changes in the IR to all existing analysis passes, which will be addressed by a future commit. The analysis_dependency_class enum might seem a bit silly at this point, more interesting dependency categories will be defined later on. --- src/mesa/drivers/dri/i965/Makefile.sources | 1 + src/mesa/drivers/dri/i965/brw_ir_analysis.h | 146 2 files changed, 147 insertions(+) create mode 100644 src/mesa/drivers/dri/i965/brw_ir_analysis.h diff --git a/src/mesa/drivers/dri/i965/Makefile.sources b/src/mesa/drivers/dri/i965/Makefile.sources index 1c6d2e9..ba6f841 100644 --- a/src/mesa/drivers/dri/i965/Makefile.sources +++ b/src/mesa/drivers/dri/i965/Makefile.sources @@ -39,6 +39,7 @@ i965_compiler_FILES = \ brw_interpolation_map.c \ brw_ir.h \ brw_ir_allocator.h \ + brw_ir_analysis.h \ brw_ir_fs.h \ brw_ir_vec4.h \ brw_nir.h \ diff --git a/src/mesa/drivers/dri/i965/brw_ir_analysis.h b/src/mesa/drivers/dri/i965/brw_ir_analysis.h new file mode 100644 index 000..31989ec --- /dev/null +++ b/src/mesa/drivers/dri/i965/brw_ir_analysis.h @@ -0,0 +1,146 @@ +/* -*- c++ -*- */ +/* + * Copyright © 2016 Intel Corporation + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice (including the next + * paragraph) shall be included in all copies or substantial portions of the + * Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS + * IN THE SOFTWARE. + */ + +#ifndef BRW_IR_ANALYSIS_H +#define BRW_IR_ANALYSIS_H + +namespace brw { + /** +* Bitset of state categories that can influence the result of IR analysis +* passes. +*/ + enum analysis_dependency_class { + /** + * The analysis doesn't depend on the IR, its result is effectively a + * constant during the compilation. + */ + DEPENDENCY_NOTHING = 0, + /** + * The analysis depends on the program being literally the same (good + * luck...), any change in the input invalidates previous analysis + * computations. + */ + DEPENDENCY_EVERYTHING = ~0 + }; + + inline analysis_dependency_class + operator|(analysis_dependency_class x, analysis_dependency_class y) + { + return static_cast( + static_cast(x) | static_cast(y)); + } +} + +/** + * Instantiate a program analysis class \p L which can calculate an object of + * type \p T as result. \p C is a closure that encapsulates whatever + * information is required as argument to run the analysis pass. The purpose + * of this class is to make sure that: + * + * - The analysis pass is executed lazily whenever it's needed and multiple + *executions are optimized out as long as the cached result remains marked + *up-to-date. + * + * - There is no way to access the cached analysis result without first + *calling L::require(), which makes sure that the analysis pass is rerun + *if necessary. + * + * - The cached result doesn't become inconsistent with the program for as + *long as it remains marked up-to-date. (This is only enforced in debug + *builds for performance reasons) + * + * The requirements on \p T are the following: + * + * - Constructible with a single argument, as in 'x = T(c)' for \p c of type + *\p C. + * + * - 'x.dependency_class()' on const \p x returns a bitset of + *brw::analysis_dependency_class specifying the set of IR objects that are + *required to remain invariant for the cached analysis result to be + *considered valid. + * + * - 'x.validate(c)' on const \p x returns a boolean result specifying + *whether the analysis result \p x is consistent with the input IR. This + *is currently only used for validation in debug builds. + */ +#define BRW_ANALYSIS(L, T, C) \ + class L {\ + public: \ +
[Mesa-dev] [PATCH 05/30] i965/ir: Move base IR definitions into a separate header file.
This pulls out the i965 IR definitions into a separate file and leaves the top-level backend_shader structure and back-end compiler entry points in brw_shader.h. The purpose is to keep things tidy and prevent a nasty circular dependency between brw_cfg.h and brw_shader.h. The logical dependency between these data structures looks like: backend_shader (brw_shader.h) -> cfg_t (brw_cfg.h) -> bblock_t (brw_cfg.h) -> backend_instruction (brw_shader.h) This circular header dependency is currently resolved by using forward declarations of cfg_t/bblock_t in brw_shader.h and having brw_cfg.h include brw_shader.h, which seems backwards and won't work at all when the forward declarations of cfg_t/bblock_t are no longer sufficient in a future commit. --- src/mesa/drivers/dri/i965/Makefile.sources | 1 + src/mesa/drivers/dri/i965/brw_ir.h | 164 + src/mesa/drivers/dri/i965/brw_shader.h | 138 +--- 3 files changed, 167 insertions(+), 136 deletions(-) create mode 100644 src/mesa/drivers/dri/i965/brw_ir.h diff --git a/src/mesa/drivers/dri/i965/Makefile.sources b/src/mesa/drivers/dri/i965/Makefile.sources index 4689588..1c6d2e9 100644 --- a/src/mesa/drivers/dri/i965/Makefile.sources +++ b/src/mesa/drivers/dri/i965/Makefile.sources @@ -37,6 +37,7 @@ i965_compiler_FILES = \ brw_fs_visitor.cpp \ brw_inst.h \ brw_interpolation_map.c \ + brw_ir.h \ brw_ir_allocator.h \ brw_ir_fs.h \ brw_ir_vec4.h \ diff --git a/src/mesa/drivers/dri/i965/brw_ir.h b/src/mesa/drivers/dri/i965/brw_ir.h new file mode 100644 index 000..1e4d8c7 --- /dev/null +++ b/src/mesa/drivers/dri/i965/brw_ir.h @@ -0,0 +1,164 @@ +/* -*- c++ -*- */ +/* + * Copyright © 2010-2016 Intel Corporation + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice (including the next + * paragraph) shall be included in all copies or substantial portions of the + * Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS + * IN THE SOFTWARE. + */ + +#ifndef BRW_IR_H +#define BRW_IR_H + +#include "brw_reg.h" +#include "compiler/glsl/list.h" + +#define MAX_SAMPLER_MESSAGE_SIZE 11 +#define MAX_VGRF_SIZE 16 + +#ifdef __cplusplus +struct backend_reg : private brw_reg +{ + backend_reg() {} + backend_reg(const struct brw_reg ®) : brw_reg(reg) {} + + const brw_reg &as_brw_reg() const + { + assert(file == ARF || file == FIXED_GRF || file == MRF || file == IMM); + assert(reg_offset == 0); + return static_cast(*this); + } + + brw_reg &as_brw_reg() + { + assert(file == ARF || file == FIXED_GRF || file == MRF || file == IMM); + assert(reg_offset == 0); + return static_cast(*this); + } + + bool equals(const backend_reg &r) const; + + bool is_zero() const; + bool is_one() const; + bool is_negative_one() const; + bool is_null() const; + bool is_accumulator() const; + bool in_range(const backend_reg &r, unsigned n) const; + + /** +* Offset within the virtual register. +* +* In the scalar backend, this is in units of a float per pixel for pre- +* register allocation registers (i.e., one register in SIMD8 mode and two +* registers in SIMD16 mode). +* +* For uniforms, this is in units of 1 float. +*/ + uint16_t reg_offset; + + using brw_reg::type; + using brw_reg::file; + using brw_reg::negate; + using brw_reg::abs; + using brw_reg::address_mode; + using brw_reg::subnr; + using brw_reg::nr; + + using brw_reg::swizzle; + using brw_reg::writemask; + using brw_reg::indirect_offset; + using brw_reg::vstride; + using brw_reg::width; + using brw_reg::hstride; + + using brw_reg::f; + using brw_reg::d; + using brw_reg::ud; +}; + +struct bblock_t; +#endif + +#ifdef __cplusplus +struct backend_instruction : public exec_node { + bool is_3src() const; + bool is_tex() const; + bool is_math() const; + bool is_control_flow() const; + bool is_commutative() const; + bool can_do_source_mods() const; + bool can_do_saturate() const; + bool can_do_cmod() const; + bool r
[Mesa-dev] [PATCH 15/30] i965/ir: Mark virtual_grf_interferes and vars_interfere as const.
--- src/mesa/drivers/dri/i965/brw_fs.h| 2 +- src/mesa/drivers/dri/i965/brw_fs_live_variables.cpp | 4 ++-- src/mesa/drivers/dri/i965/brw_fs_live_variables.h | 2 +- src/mesa/drivers/dri/i965/brw_vec4.h | 2 +- src/mesa/drivers/dri/i965/brw_vec4_live_variables.cpp | 2 +- 5 files changed, 6 insertions(+), 6 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h index e47a8e5..4f5d085 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.h +++ b/src/mesa/drivers/dri/i965/brw_fs.h @@ -162,7 +162,7 @@ public: bool remove_duplicate_mrf_writes(); bool opt_sampler_eot(); - bool virtual_grf_interferes(int a, int b); + bool virtual_grf_interferes(int a, int b) const; void schedule_instructions(instruction_scheduler_mode mode); void insert_gen4_send_dependency_workarounds(); void insert_gen4_pre_send_dependency_workarounds(bblock_t *block, diff --git a/src/mesa/drivers/dri/i965/brw_fs_live_variables.cpp b/src/mesa/drivers/dri/i965/brw_fs_live_variables.cpp index 4ebefe4..676619a 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_live_variables.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_live_variables.cpp @@ -333,14 +333,14 @@ fs_visitor::calculate_live_intervals() } bool -fs_live_variables::vars_interfere(int a, int b) +fs_live_variables::vars_interfere(int a, int b) const { return !(end[b] <= start[a] || end[a] <= start[b]); } bool -fs_visitor::virtual_grf_interferes(int a, int b) +fs_visitor::virtual_grf_interferes(int a, int b) const { return !(virtual_grf_end[a] <= virtual_grf_start[b] || virtual_grf_end[b] <= virtual_grf_start[a]); diff --git a/src/mesa/drivers/dri/i965/brw_fs_live_variables.h b/src/mesa/drivers/dri/i965/brw_fs_live_variables.h index 6b49cfc..fb5aee7 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_live_variables.h +++ b/src/mesa/drivers/dri/i965/brw_fs_live_variables.h @@ -69,7 +69,7 @@ public: fs_live_variables(fs_visitor *v, const cfg_t *cfg); ~fs_live_variables(); - bool vars_interfere(int a, int b); + bool vars_interfere(int a, int b) const; int var_from_reg(const fs_reg ®) const { return var_from_vgrf[reg.nr] + reg.reg_offset; diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h b/src/mesa/drivers/dri/i965/brw_vec4.h index c3f9ca5..bb0329f0 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4.h +++ b/src/mesa/drivers/dri/i965/brw_vec4.h @@ -143,7 +143,7 @@ public: bool dead_code_eliminate(); int var_range_start(unsigned v, unsigned n) const; int var_range_end(unsigned v, unsigned n) const; - bool virtual_grf_interferes(int a, int b); + bool virtual_grf_interferes(int a, int b) const; bool opt_cmod_propagation(); bool opt_copy_propagation(bool do_constant_prop = true); bool opt_cse_local(bblock_t *block); diff --git a/src/mesa/drivers/dri/i965/brw_vec4_live_variables.cpp b/src/mesa/drivers/dri/i965/brw_vec4_live_variables.cpp index 8e1efa4..ecf5dee 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_live_variables.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_live_variables.cpp @@ -339,7 +339,7 @@ vec4_visitor::var_range_end(unsigned v, unsigned n) const } bool -vec4_visitor::virtual_grf_interferes(int a, int b) +vec4_visitor::virtual_grf_interferes(int a, int b) const { return !((var_range_end(4 * alloc.offsets[a], 4 * alloc.sizes[a]) <= var_range_start(4 * alloc.offsets[b], 4 * alloc.sizes[b])) || -- 2.7.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 06/30] i965/ir: Reverse inclusion dependency between brw_cfg.h and brw_shader.h.
This reflects the natural dependency relationship between brw_cfg.h and brw_shader.h. brw_cfg.h only requires the base IR definitions which are now part of a separate header. --- src/mesa/drivers/dri/i965/brw_cfg.cpp | 1 + src/mesa/drivers/dri/i965/brw_cfg.h| 5 +++-- src/mesa/drivers/dri/i965/brw_predicated_break.cpp | 2 +- src/mesa/drivers/dri/i965/brw_shader.h | 4 +--- 4 files changed, 6 insertions(+), 6 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_cfg.cpp b/src/mesa/drivers/dri/i965/brw_cfg.cpp index 5d46615..4fcee72 100644 --- a/src/mesa/drivers/dri/i965/brw_cfg.cpp +++ b/src/mesa/drivers/dri/i965/brw_cfg.cpp @@ -26,6 +26,7 @@ */ #include "brw_cfg.h" +#include "brw_shader.h" /** @file brw_cfg.cpp * diff --git a/src/mesa/drivers/dri/i965/brw_cfg.h b/src/mesa/drivers/dri/i965/brw_cfg.h index a2ca6b1..0813798 100644 --- a/src/mesa/drivers/dri/i965/brw_cfg.h +++ b/src/mesa/drivers/dri/i965/brw_cfg.h @@ -28,7 +28,7 @@ #ifndef BRW_CFG_H #define BRW_CFG_H -#include "brw_shader.h" +#include "brw_ir.h" struct bblock_t; @@ -46,7 +46,8 @@ struct bblock_link { struct bblock_t *block; }; -struct backend_instruction; +struct backend_shader; +struct cfg_t; struct bblock_t { #ifdef __cplusplus diff --git a/src/mesa/drivers/dri/i965/brw_predicated_break.cpp b/src/mesa/drivers/dri/i965/brw_predicated_break.cpp index 607715d..8aeaa24 100644 --- a/src/mesa/drivers/dri/i965/brw_predicated_break.cpp +++ b/src/mesa/drivers/dri/i965/brw_predicated_break.cpp @@ -21,7 +21,7 @@ * IN THE SOFTWARE. */ -#include "brw_cfg.h" +#include "brw_shader.h" using namespace brw; diff --git a/src/mesa/drivers/dri/i965/brw_shader.h b/src/mesa/drivers/dri/i965/brw_shader.h index 3f38dc1..81d3f96 100644 --- a/src/mesa/drivers/dri/i965/brw_shader.h +++ b/src/mesa/drivers/dri/i965/brw_shader.h @@ -27,9 +27,7 @@ #include "brw_reg.h" #include "brw_defines.h" #include "brw_context.h" -#include "brw_ir.h" - -struct cfg_t; +#include "brw_cfg.h" #ifdef __cplusplus #include "brw_ir_allocator.h" -- 2.7.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 18/30] i965/vec4: Restructure live intervals computation code.
This makes the structure of the vec4 live intervals calculation more similar to the FS back-end liveness analysis code. The non-CF-aware start/end computation is moved into the same pass that calculates the block-local def/use sets, which saves quite a bit of code, while the CF-aware start/end computation is moved into a separate compute_start_end() function as is done in the FS back-end. --- .../drivers/dri/i965/brw_vec4_live_variables.cpp | 133 - .../drivers/dri/i965/brw_vec4_live_variables.h | 1 + 2 files changed, 54 insertions(+), 80 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4_live_variables.cpp b/src/mesa/drivers/dri/i965/brw_vec4_live_variables.cpp index d2a7adf..7042371 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_live_variables.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_live_variables.cpp @@ -30,6 +30,8 @@ using namespace brw; +#define MAX_INSTRUCTION (1 << 30) + /** @file brw_vec4_live_variables.cpp * * Support for computing at the basic block level which variables @@ -40,7 +42,7 @@ using namespace brw; */ /** - * Sets up the use[] and def[] arrays. + * Sets up the use/def arrays and block-local approximation of the live ranges. * * The basic-block-level live variable analysis needs to know which * variables get used before they're completely defined, and which @@ -73,13 +75,17 @@ vec4_live_variables::setup_def_use() foreach_inst_in_block(vec4_instruction, inst, block) { struct block_data *bd = &block_data[block->num]; -/* Set use[] for this instruction */ + /* Set up the instruction uses. */ for (unsigned int i = 0; i < 3; i++) { if (inst->src[i].file == VGRF) { for (unsigned j = 0; j < inst->regs_read(i); j++) { for (int c = 0; c < 4; c++) { const unsigned v = var_from_reg(alloc, offset(inst->src[i], j), c); + + start[v] = MIN2(start[v], ip); + end[v] = ip; + if (!BITSET_TEST(bd->def, v)) BITSET_SET(bd->use, v); } @@ -93,18 +99,23 @@ vec4_live_variables::setup_def_use() } } -/* Check for unconditional writes to whole registers. These - * are the things that screen off preceding definitions of a - * variable, and thus qualify for being in def[]. - */ -if (inst->dst.file == VGRF && -(!inst->predicate || inst->opcode == BRW_OPCODE_SEL)) { + /* Set up the instruction defs. */ + if (inst->dst.file == VGRF) { for (unsigned i = 0; i < inst->regs_written; i++) { for (int c = 0; c < 4; c++) { if (inst->dst.writemask & (1 << c)) { const unsigned v = var_from_reg(alloc, offset(inst->dst, i), c); - if (!BITSET_TEST(bd->use, v)) + + start[v] = MIN2(start[v], ip); + end[v] = ip; + + /* Check for unconditional register writes, these are the + * things that screen off preceding definitions of a + * variable, and thus qualify for being in def[]. + */ + if ((!inst->predicate || inst->opcode == BRW_OPCODE_SEL) && + !BITSET_TEST(bd->use, v)) BITSET_SET(bd->def, v); } } @@ -182,6 +193,30 @@ vec4_live_variables::compute_live_variables() } } +/** + * Extend the start/end ranges for each variable to account for the + * new information calculated from control flow. + */ +void +vec4_live_variables::compute_start_end() +{ + foreach_block (block, cfg) { + const struct block_data &bd = block_data[block->num]; + + for (int i = 0; i < num_vars; i++) { + if (BITSET_TEST(bd.livein, i)) { +start[i] = MIN2(start[i], block->start_ip); +end[i] = MAX2(end[i], block->start_ip); + } + + if (BITSET_TEST(bd.liveout, i)) { +start[i] = MIN2(start[i], block->end_ip); +end[i] = MAX2(end[i], block->end_ip); + } + } + } +} + vec4_live_variables::vec4_live_variables(const simple_allocator &alloc, cfg_t *cfg) : alloc(alloc), cfg(cfg) @@ -189,6 +224,14 @@ vec4_live_variables::vec4_live_variables(const simple_allocator &alloc, mem_ctx = ralloc_context(NULL); num_vars = alloc.total_size * 4; + start = ralloc_array(mem_ctx, int, num_vars); + end = ralloc_array(mem_ctx, int, num_vars); + + for (int i = 0; i < num_vars; i++) { + start[i] = MAX_INSTRUCTION; + end[i] = -1; + } + block_data = rzalloc_array(mem_ctx, struct block_data, cfg->num_blocks); bitset_words = BITSET_WORDS(num_vars); @@ -206,6 +249,7 @@ vec4_live_var
[Mesa-dev] [PATCH 22/30] i965/vec4: Add live interval validation pass.
This could be improved somewhat with additional validation of the calculated live in/out sets and by checking that the calculated live intervals are minimal (which isn't strictly necessary to guarantee the correctness of the program). This should be good enough though to catch accidental use of stale liveness results due to missing or incorrect analysis invalidation. --- .../drivers/dri/i965/brw_vec4_live_variables.cpp | 43 ++ .../drivers/dri/i965/brw_vec4_live_variables.h | 3 ++ 2 files changed, 46 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_vec4_live_variables.cpp b/src/mesa/drivers/dri/i965/brw_vec4_live_variables.cpp index 297d027..a7c991f 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_live_variables.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_live_variables.cpp @@ -294,6 +294,49 @@ vec4_visitor::invalidate_live_intervals() live_intervals = NULL; } +static bool +check_register_live_range(const vec4_live_variables *live, int ip, + unsigned var, unsigned n) +{ + for (unsigned j = 0; j < n; j += 4) { + if (var + j >= unsigned(live->num_vars) || + live->start[var + j] > ip || live->end[var + j] < ip) + return false; + } + + return true; +} + +bool +vec4_live_variables::validate(const backend_shader *s) const +{ + unsigned ip = 0; + + foreach_block_and_inst(block, vec4_instruction, inst, s->cfg) { + for (unsigned c = 0; c < 4; c++) { + if (inst->dst.writemask & (1 << c)) { +for (unsigned i = 0; i < 3; i++) { + if (inst->src[i].file == VGRF && + !check_register_live_range(this, ip, + var_from_reg(alloc, inst->src[i], c), + inst->regs_read(i))) + return false; +} + +if (inst->dst.file == VGRF && +!check_register_live_range(this, ip, + var_from_reg(alloc, inst->dst, c), + inst->regs_written)) + return false; + } + } + + ip++; + } + + return true; +} + int vec4_live_variables::var_range_start(unsigned v, unsigned n) const { diff --git a/src/mesa/drivers/dri/i965/brw_vec4_live_variables.h b/src/mesa/drivers/dri/i965/brw_vec4_live_variables.h index 7384f04..f6cefa0 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_live_variables.h +++ b/src/mesa/drivers/dri/i965/brw_vec4_live_variables.h @@ -68,6 +68,9 @@ public: vec4_live_variables(const backend_shader *s); ~vec4_live_variables(); + bool + validate(const backend_shader *s) const; + int num_vars; int bitset_words; -- 2.7.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 17/30] i965/vec4: Move all live interval analysis results into vec4_live_variables.
This moves the following methods that are currently defined in vec4_visitor (even though they are side products of the liveness analysis computation) and are already implemented in brw_vec4_live_variables.cpp: > int var_range_start(unsigned v, unsigned n) const; > int var_range_end(unsigned v, unsigned n) const; > bool virtual_grf_interferes(int a, int b) const; > int *virtual_grf_start; > int *virtual_grf_end; It makes sense for them to be part of the vec4_live_variables object, because they have the same lifetime as other liveness analysis results and because this will allow some extra validation to happen wherever they are accessed in order to make sure that we only ever use up-to-date liveness analysis results. The naming of the virtual_grf_start/end arrays was rather misleading, they were indexed by variable rather than by vgrf, this renames them start/end to match the FS liveness analysis pass. The churn in the definition of var_range_start/end is just in order to avoid a collision between the start/end arrays and local variables declared with the same name. --- src/mesa/drivers/dri/i965/brw_vec4.cpp | 3 +- src/mesa/drivers/dri/i965/brw_vec4.h | 5 src/mesa/drivers/dri/i965/brw_vec4_cse.cpp | 3 +- .../drivers/dri/i965/brw_vec4_live_variables.cpp | 32 +- .../drivers/dri/i965/brw_vec4_live_variables.h | 11 .../drivers/dri/i965/brw_vec4_reg_allocate.cpp | 2 +- src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 2 -- 7 files changed, 35 insertions(+), 23 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp index cf5b114..7c2c9cc 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp @@ -1061,7 +1061,8 @@ vec4_visitor::opt_register_coalesce() /* Can't coalesce this GRF if someone else was going to * read it later. */ - if (var_range_end(var_from_reg(alloc, inst->src[0]), 4) > ip) + if (live_intervals->var_range_end( + var_from_reg(alloc, inst->src[0]), 4) > ip) continue; /* We need to check interference with the final destination between this diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h b/src/mesa/drivers/dri/i965/brw_vec4.h index bb0329f0..35bd265 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4.h +++ b/src/mesa/drivers/dri/i965/brw_vec4.h @@ -101,8 +101,6 @@ public: int first_non_payload_grf; unsigned int max_grf; - int *virtual_grf_start; - int *virtual_grf_end; brw::vec4_live_variables *live_intervals; dst_reg userplane[MAX_CLIP_PLANES]; @@ -141,9 +139,6 @@ public: bool opt_vector_float(); bool opt_reduce_swizzle(); bool dead_code_eliminate(); - int var_range_start(unsigned v, unsigned n) const; - int var_range_end(unsigned v, unsigned n) const; - bool virtual_grf_interferes(int a, int b) const; bool opt_cmod_propagation(); bool opt_copy_propagation(bool do_constant_prop = true); bool opt_cse_local(bblock_t *block); diff --git a/src/mesa/drivers/dri/i965/brw_vec4_cse.cpp b/src/mesa/drivers/dri/i965/brw_vec4_cse.cpp index 38dd3ed..3a1dfda 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_cse.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_cse.cpp @@ -246,7 +246,8 @@ vec4_visitor::opt_cse_local(bblock_t *block) * more -- a sure sign they'll fail operands_match(). */ if (src->file == VGRF) { - if (var_range_end(var_from_reg(alloc, *src), 4) < ip) { + if (live_intervals->var_range_end( + var_from_reg(alloc, *src), 4) < ip) { entry->remove(); ralloc_free(entry); break; diff --git a/src/mesa/drivers/dri/i965/brw_vec4_live_variables.cpp b/src/mesa/drivers/dri/i965/brw_vec4_live_variables.cpp index ecf5dee..d2a7adf 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_live_variables.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_live_variables.cpp @@ -240,10 +240,6 @@ vec4_visitor::calculate_live_intervals() int *start = ralloc_array(mem_ctx, int, this->alloc.total_size * 4); int *end = ralloc_array(mem_ctx, int, this->alloc.total_size * 4); - ralloc_free(this->virtual_grf_start); - ralloc_free(this->virtual_grf_end); - this->virtual_grf_start = start; - this->virtual_grf_end = end; for (unsigned i = 0; i < this->alloc.total_size * 4; i++) { start[i] = MAX_INSTRUCTION; @@ -290,6 +286,11 @@ vec4_visitor::calculate_live_intervals() * this point we're distilling it down to vgrfs. */ this->live_intervals = new(mem_ctx) vec4_live_variables(alloc, cfg); + /* XXX -- This belongs in the constructor of vec4_live_variables, will be +* cleaned up later. +*/ + this->live_intervals->start = start; + this->live_intervals->end = end; foreach_block (block, cfg) { const struct vec4_live_variables::block_data *bd =
[Mesa-dev] [PATCH 10/30] i965/vec4: Reverse inclusion dependency between brw_vec4_live_variables.h and brw_vec4.h.
brw_vec4.h (in particular vec4_visitor) is logically a user of the live variables analysis pass, not the other way around. brw_vec4_live_variables.h requires the definition of some VEC4 IR data structures to compile, but those can be obtained directly from brw_ir_vec4.h without including brw_vec4.h. --- src/mesa/drivers/dri/i965/brw_vec4.cpp| 1 - src/mesa/drivers/dri/i965/brw_vec4.h | 4 +--- src/mesa/drivers/dri/i965/brw_vec4_live_variables.cpp | 2 +- src/mesa/drivers/dri/i965/brw_vec4_live_variables.h | 3 +-- 4 files changed, 3 insertions(+), 7 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp index baf72a2..a3d1f7c 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp @@ -27,7 +27,6 @@ #include "brw_vs.h" #include "brw_nir.h" #include "brw_vec4_builder.h" -#include "brw_vec4_live_variables.h" #include "brw_dead_control_flow.h" #include "program/prog_parameter.h" diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h b/src/mesa/drivers/dri/i965/brw_vec4.h index d43a5a8..b56b35d 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4.h +++ b/src/mesa/drivers/dri/i965/brw_vec4.h @@ -29,6 +29,7 @@ #ifdef __cplusplus #include "brw_ir_vec4.h" +#include "brw_vec4_live_variables.h" #endif #include "compiler/glsl/ir.h" @@ -52,9 +53,6 @@ brw_vec4_generate_assembly(const struct brw_compiler *compiler, } /* extern "C" */ namespace brw { - -class vec4_live_variables; - /** * The vertex shader front-end. * diff --git a/src/mesa/drivers/dri/i965/brw_vec4_live_variables.cpp b/src/mesa/drivers/dri/i965/brw_vec4_live_variables.cpp index b4476ef..8e1efa4 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_live_variables.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_live_variables.cpp @@ -25,7 +25,7 @@ * */ -#include "brw_cfg.h" +#include "brw_vec4.h" #include "brw_vec4_live_variables.h" using namespace brw; diff --git a/src/mesa/drivers/dri/i965/brw_vec4_live_variables.h b/src/mesa/drivers/dri/i965/brw_vec4_live_variables.h index ae8e406..a6e04b2 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_live_variables.h +++ b/src/mesa/drivers/dri/i965/brw_vec4_live_variables.h @@ -28,12 +28,11 @@ #ifndef BRW_VEC4_LIVE_VARIABLES_H #define BRW_VEC4_LIVE_VARIABLES_H +#include "brw_ir_vec4.h" #include "util/bitset.h" -#include "brw_vec4.h" namespace brw { - class vec4_live_variables { public: struct block_data { -- 2.7.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 07/30] i965/ir: Nest definition of live variables block_data structures.
These two structures have exactly the same name which prevents the two files from being included at the same time and could cause serious trouble in the future if it ever leads to a (silent) violation of the C++ one definition rule. --- src/mesa/drivers/dri/i965/brw_fs_live_variables.h | 52 +++--- .../drivers/dri/i965/brw_vec4_live_variables.cpp | 3 +- .../drivers/dri/i965/brw_vec4_live_variables.h | 51 ++--- 3 files changed, 54 insertions(+), 52 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs_live_variables.h b/src/mesa/drivers/dri/i965/brw_fs_live_variables.h index 96cadea..882315a 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_live_variables.h +++ b/src/mesa/drivers/dri/i965/brw_fs_live_variables.h @@ -32,34 +32,34 @@ struct cfg_t; namespace brw { -struct block_data { - /** -* Which variables are defined before being used in the block. -* -* Note that for our purposes, "defined" means unconditionally, completely -* defined. -*/ - BITSET_WORD *def; - - /** -* Which variables are used before being defined in the block. -*/ - BITSET_WORD *use; - - /** Which defs reach the entry point of the block. */ - BITSET_WORD *livein; - - /** Which defs reach the exit point of the block. */ - BITSET_WORD *liveout; - - BITSET_WORD flag_def[1]; - BITSET_WORD flag_use[1]; - BITSET_WORD flag_livein[1]; - BITSET_WORD flag_liveout[1]; -}; - class fs_live_variables { public: + struct block_data { + /** + * Which variables are defined before being used in the block. + * + * Note that for our purposes, "defined" means unconditionally, completely + * defined. + */ + BITSET_WORD *def; + + /** + * Which variables are used before being defined in the block. + */ + BITSET_WORD *use; + + /** Which defs reach the entry point of the block. */ + BITSET_WORD *livein; + + /** Which defs reach the exit point of the block. */ + BITSET_WORD *liveout; + + BITSET_WORD flag_def[1]; + BITSET_WORD flag_use[1]; + BITSET_WORD flag_livein[1]; + BITSET_WORD flag_liveout[1]; + }; + DECLARE_RALLOC_CXX_OPERATORS(fs_live_variables) fs_live_variables(fs_visitor *v, const cfg_t *cfg); diff --git a/src/mesa/drivers/dri/i965/brw_vec4_live_variables.cpp b/src/mesa/drivers/dri/i965/brw_vec4_live_variables.cpp index 57d5fbb..b4476ef 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_live_variables.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_live_variables.cpp @@ -292,7 +292,8 @@ vec4_visitor::calculate_live_intervals() this->live_intervals = new(mem_ctx) vec4_live_variables(alloc, cfg); foreach_block (block, cfg) { - struct block_data *bd = &live_intervals->block_data[block->num]; + const struct vec4_live_variables::block_data *bd = + &live_intervals->block_data[block->num]; for (int i = 0; i < live_intervals->num_vars; i++) { if (BITSET_TEST(bd->livein, i)) { diff --git a/src/mesa/drivers/dri/i965/brw_vec4_live_variables.h b/src/mesa/drivers/dri/i965/brw_vec4_live_variables.h index 12d281e..f55bed7 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_live_variables.h +++ b/src/mesa/drivers/dri/i965/brw_vec4_live_variables.h @@ -30,34 +30,35 @@ namespace brw { -struct block_data { - /** -* Which variables are defined before being used in the block. -* -* Note that for our purposes, "defined" means unconditionally, completely -* defined. -*/ - BITSET_WORD *def; - - /** -* Which variables are used before being defined in the block. -*/ - BITSET_WORD *use; - - /** Which defs reach the entry point of the block. */ - BITSET_WORD *livein; - - /** Which defs reach the exit point of the block. */ - BITSET_WORD *liveout; - - BITSET_WORD flag_def[1]; - BITSET_WORD flag_use[1]; - BITSET_WORD flag_livein[1]; - BITSET_WORD flag_liveout[1]; -}; class vec4_live_variables { public: + struct block_data { + /** + * Which variables are defined before being used in the block. + * + * Note that for our purposes, "defined" means unconditionally, completely + * defined. + */ + BITSET_WORD *def; + + /** + * Which variables are used before being defined in the block. + */ + BITSET_WORD *use; + + /** Which defs reach the entry point of the block. */ + BITSET_WORD *livein; + + /** Which defs reach the exit point of the block. */ + BITSET_WORD *liveout; + + BITSET_WORD flag_def[1]; + BITSET_WORD flag_use[1]; + BITSET_WORD flag_livein[1]; + BITSET_WORD flag_liveout[1]; + }; + DECLARE_RALLOC_CXX_OPERATORS(vec4_live_variables) vec4_live_variables(const simple_allocator &alloc, cfg_t *cfg); -- 2.7.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/m
[Mesa-dev] [PATCH 02/30] i965/fs: Add missing analysis invalidation in fixup_3src_null_dest().
Bug found by the liveness analysis validation pass that will be introduced in a later commit. fixup_3src_null_dest() was allocating registers which makes the cached liveness analysis calculation incomplete, so it must be invalidated. Cc: mesa-sta...@lists.freedesktop.org --- src/mesa/drivers/dri/i965/brw_fs.cpp | 6 ++ 1 file changed, 6 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index 42bc5e2..86d2bd9 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -5190,12 +5190,18 @@ fs_visitor::optimize() void fs_visitor::fixup_3src_null_dest() { + bool progress = false; + foreach_block_and_inst_safe (block, fs_inst, inst, cfg) { if (inst->is_3src() && inst->dst.is_null()) { inst->dst = fs_reg(VGRF, alloc.allocate(dispatch_width / 8), inst->dst.type); + progress = true; } } + + if (progress) + invalidate_live_intervals(); } void -- 2.7.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 00/30] i965: IR analysis pass framework.
The purpose of this series is to introduce some lightweight infrastructure intended to make room for additional analysis passes in the i965 back-end without increasing the clutter of the visitor objects, to reduce duplication of logic between analysis passes, and to prevent some mistakes that are fairly difficult to avoid under the current (rather informal) approach. Patches 1-3 of this series fix some long-standing bugs that were uncovered by the additional validation done by the analysis pass framework. The bugs could be hit during either shader-db or piglit runs. All the three caused us to use inconsistent live interval metadata in some cases and could potentially lead to program miscompilation. I don't know whether there are any real-life workloads where it actually caused corruption, in doubt it seems sensible to back-port them to stable branches. Patches 5-10 is mostly trivial header file churn intended to clean things up and prevent circular header dependencies in cases where the header inclusion order was the opposite of their logical dependency order. Patches 11-13 is the core of this series. They introduce the general framework that will be put to use in the following patches. Patch 14 instruments the whole back-end with analysis metadata invalidation and might be one of the more difficult to review. On the bright side it's no longer a particularly critical thing to get right, because any mistake in the invalidation instrumentation will eventually trigger an assertion failure if it leads to a situation where we attempt to use analysis metadata that has become inconsistent with the program, as demonstrated in patches 1-3. The rest of the series is mainly about converting existing analysis passes to the new framework. Patches 16-25 port the VEC4 and FS liveness analysis passes, patches 26-27 port the dominance analysis pass and patch 29 ports the register pressure calculation in the FS back-end. For a branch in testable form see: https://cgit.freedesktop.org/~currojerez/mesa/log/?h=i965-ir-analysis Enjoy. [PATCH 01/30] i965/fs: Add missing analysis invalidation in opt_sampler_eot(). [PATCH 02/30] i965/fs: Add missing analysis invalidation in fixup_3src_null_dest(). [PATCH 03/30] i965/vec4: Consider removal of no-op MOVs as progress during register coalesce. [PATCH 04/30] i965/fs: Restrict inequality that can only hold equal in saturate propagation. [PATCH 05/30] i965/ir: Move base IR definitions into a separate header file. [PATCH 06/30] i965/ir: Reverse inclusion dependency between brw_cfg.h and brw_shader.h. [PATCH 07/30] i965/ir: Nest definition of live variables block_data structures. [PATCH 08/30] i965/ir: Add include guards to the live variables header files. [PATCH 09/30] i965/fs: Reverse inclusion dependency between brw_fs_live_variables.h and brw_fs.h. [PATCH 10/30] i965/vec4: Reverse inclusion dependency between brw_vec4_live_variables.h and brw_vec4.h. [PATCH 11/30] i965/ir: Introduce simple IR analysis pass framework. [PATCH 12/30] i965/ir: Introduce backend_shader method to propagate IR changes to analysis passes. [PATCH 13/30] i965/ir: Define more detailed analysis dependency classes. [PATCH 14/30] i965/ir: Pass detailed dependency classes to invalidate_analysis(). [PATCH 15/30] i965/ir: Mark virtual_grf_interferes and vars_interfere as const. [PATCH 16/30] i965/fs: Move all live interval analysis results into fs_live_variables. [PATCH 17/30] i965/vec4: Move all live interval analysis results into vec4_live_variables. [PATCH 18/30] i965/vec4: Restructure live intervals computation code. [PATCH 19/30] i965/fs: Pass single backend_shader argument to the fs_live_variables constructor. [PATCH 20/30] i965/vec4: Pass single backend_shader argument to the vec4_live_variables constructor. [PATCH 21/30] i965/fs: Add live interval validation pass. [PATCH 22/30] i965/vec4: Add live interval validation pass. [PATCH 23/30] i965/fs: Switch liveness analysis to IR analysis framework. [PATCH 24/30] i965/vec4: Switch liveness analysis to IR analysis framework. [PATCH 25/30] i965/ir: Drop invalidate_live_intervals(). [PATCH 26/30] i965/ir: Move idom tree calculation and related logic into analysis object. [PATCH 27/30] i965/ir: Move dominance tree data structure into idom_tree object. [PATCH 28/30] i965/ir: Simplify new_idom reduction in dominance tree calculation. [PATCH 29/30] i965/fs: Move register pressure calculation into IR analysis object. [PATCH 30/30] i965/fs: Calculate num_instructions in O(1) during register pressure calculation. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 20/30] i965/vec4: Pass single backend_shader argument to the vec4_live_variables constructor.
The IR analysis framework requires the analysis result to be constructible with a single argument. --- src/mesa/drivers/dri/i965/brw_vec4_live_variables.cpp | 7 +++ src/mesa/drivers/dri/i965/brw_vec4_live_variables.h | 4 +++- 2 files changed, 6 insertions(+), 5 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4_live_variables.cpp b/src/mesa/drivers/dri/i965/brw_vec4_live_variables.cpp index 7042371..297d027 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_live_variables.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_live_variables.cpp @@ -217,9 +217,8 @@ vec4_live_variables::compute_start_end() } } -vec4_live_variables::vec4_live_variables(const simple_allocator &alloc, - cfg_t *cfg) - : alloc(alloc), cfg(cfg) +vec4_live_variables::vec4_live_variables(const backend_shader *s) + : alloc(s->alloc), cfg(s->cfg) { mem_ctx = ralloc_context(NULL); @@ -285,7 +284,7 @@ vec4_visitor::calculate_live_intervals() * The control flow-aware analysis was done at a channel level, while at * this point we're distilling it down to vgrfs. */ - this->live_intervals = new(mem_ctx) vec4_live_variables(alloc, cfg); + this->live_intervals = new(mem_ctx) vec4_live_variables(this); } void diff --git a/src/mesa/drivers/dri/i965/brw_vec4_live_variables.h b/src/mesa/drivers/dri/i965/brw_vec4_live_variables.h index f7a6932..7384f04 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_live_variables.h +++ b/src/mesa/drivers/dri/i965/brw_vec4_live_variables.h @@ -31,6 +31,8 @@ #include "brw_ir_vec4.h" #include "util/bitset.h" +struct backend_shader; + namespace brw { class vec4_live_variables { @@ -63,7 +65,7 @@ public: DECLARE_RALLOC_CXX_OPERATORS(vec4_live_variables) - vec4_live_variables(const simple_allocator &alloc, cfg_t *cfg); + vec4_live_variables(const backend_shader *s); ~vec4_live_variables(); int num_vars; -- 2.7.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 08/30] i965/ir: Add include guards to the live variables header files.
--- src/mesa/drivers/dri/i965/brw_fs_live_variables.h | 5 + src/mesa/drivers/dri/i965/brw_vec4_live_variables.h | 5 + 2 files changed, 10 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_fs_live_variables.h b/src/mesa/drivers/dri/i965/brw_fs_live_variables.h index 882315a..a5c1764 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_live_variables.h +++ b/src/mesa/drivers/dri/i965/brw_fs_live_variables.h @@ -25,6 +25,9 @@ * */ +#ifndef BRW_FS_LIVE_VARIABLES_H +#define BRW_FS_LIVE_VARIABLES_H + #include "brw_fs.h" #include "util/bitset.h" @@ -113,3 +116,5 @@ protected: }; } /* namespace brw */ + +#endif diff --git a/src/mesa/drivers/dri/i965/brw_vec4_live_variables.h b/src/mesa/drivers/dri/i965/brw_vec4_live_variables.h index f55bed7..ae8e406 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_live_variables.h +++ b/src/mesa/drivers/dri/i965/brw_vec4_live_variables.h @@ -25,6 +25,9 @@ * */ +#ifndef BRW_VEC4_LIVE_VARIABLES_H +#define BRW_VEC4_LIVE_VARIABLES_H + #include "util/bitset.h" #include "brw_vec4.h" @@ -99,3 +102,5 @@ var_from_reg(const simple_allocator &alloc, const dst_reg ®, } } /* namespace brw */ + +#endif -- 2.7.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 14/30] i965/ir: Pass detailed dependency classes to invalidate_analysis().
Have fun reading through the whole back-end optimizer to verify whether I've missed any dependency flags -- Or alternatively, just trust that any mistake here will trigger an assertion failure during analysis pass validation if it ever poses a problem for the consistency of any of the analysis passes managed by the framework. --- .../drivers/dri/i965/brw_dead_control_flow.cpp | 2 +- src/mesa/drivers/dri/i965/brw_fs.cpp | 41 +++--- .../drivers/dri/i965/brw_fs_cmod_propagation.cpp | 2 +- .../drivers/dri/i965/brw_fs_combine_constants.cpp | 2 +- .../drivers/dri/i965/brw_fs_copy_propagation.cpp | 3 +- src/mesa/drivers/dri/i965/brw_fs_cse.cpp | 2 +- .../dri/i965/brw_fs_dead_code_eliminate.cpp| 2 +- src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp | 2 +- .../drivers/dri/i965/brw_fs_register_coalesce.cpp | 2 +- src/mesa/drivers/dri/i965/brw_fs_sel_peephole.cpp | 2 +- src/mesa/drivers/dri/i965/brw_predicated_break.cpp | 2 +- .../drivers/dri/i965/brw_schedule_instructions.cpp | 4 +-- src/mesa/drivers/dri/i965/brw_vec4.cpp | 15 .../drivers/dri/i965/brw_vec4_cmod_propagation.cpp | 2 +- .../drivers/dri/i965/brw_vec4_copy_propagation.cpp | 3 +- src/mesa/drivers/dri/i965/brw_vec4_cse.cpp | 2 +- .../dri/i965/brw_vec4_dead_code_eliminate.cpp | 2 +- .../drivers/dri/i965/brw_vec4_reg_allocate.cpp | 2 +- 18 files changed, 48 insertions(+), 44 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_dead_control_flow.cpp b/src/mesa/drivers/dri/i965/brw_dead_control_flow.cpp index 65e009c..75c7be3 100644 --- a/src/mesa/drivers/dri/i965/brw_dead_control_flow.cpp +++ b/src/mesa/drivers/dri/i965/brw_dead_control_flow.cpp @@ -111,7 +111,7 @@ dead_control_flow_eliminate(backend_shader *s) } if (progress) - s->invalidate_analysis(DEPENDENCY_EVERYTHING); + s->invalidate_analysis(DEPENDENCY_BLOCKS | DEPENDENCY_INSTRUCTIONS); return progress; } diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index 60924b6..9183d1f 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -1843,7 +1843,7 @@ fs_visitor::split_virtual_grfs() } } } - invalidate_analysis(DEPENDENCY_EVERYTHING); + invalidate_analysis(DEPENDENCY_INSTRUCTION_DETAIL | DEPENDENCY_VARIABLES); } /** @@ -1884,7 +1884,7 @@ fs_visitor::compact_virtual_grfs() } else { remap_table[i] = new_index; alloc.sizes[new_index] = alloc.sizes[i]; - invalidate_analysis(DEPENDENCY_EVERYTHING); + invalidate_analysis(DEPENDENCY_INSTRUCTION_DETAIL | DEPENDENCY_VARIABLES); ++new_index; } } @@ -2088,7 +2088,7 @@ fs_visitor::demote_pull_constants() inst->src[i].reg_offset = 0; } } - invalidate_analysis(DEPENDENCY_EVERYTHING); + invalidate_analysis(DEPENDENCY_INSTRUCTIONS | DEPENDENCY_VARIABLES); } bool @@ -2316,7 +2316,8 @@ fs_visitor::opt_algebraic() } if (progress) - invalidate_analysis(DEPENDENCY_EVERYTHING); + invalidate_analysis(DEPENDENCY_INSTRUCTION_DATA_FLOW | + DEPENDENCY_INSTRUCTION_DETAIL); return progress; } @@ -2367,7 +2368,7 @@ fs_visitor::opt_zero_samples() } if (progress) - invalidate_analysis(DEPENDENCY_EVERYTHING); + invalidate_analysis(DEPENDENCY_INSTRUCTION_DETAIL); return progress; } @@ -2446,7 +2447,7 @@ fs_visitor::opt_sampler_eot() * the instruction that this will replace. */ if (tex_inst->header_size != 0) { - invalidate_analysis(DEPENDENCY_EVERYTHING); + invalidate_analysis(DEPENDENCY_INSTRUCTIONS); return true; } @@ -2479,7 +2480,7 @@ fs_visitor::opt_sampler_eot() tex_inst->insert_before(cfg->blocks[cfg->num_blocks - 1], new_load_payload); tex_inst->src[0] = send_header; - invalidate_analysis(DEPENDENCY_EVERYTHING); + invalidate_analysis(DEPENDENCY_INSTRUCTIONS | DEPENDENCY_VARIABLES); return true; } @@ -2532,7 +2533,7 @@ fs_visitor::opt_register_renaming() } if (progress) { - invalidate_analysis(DEPENDENCY_EVERYTHING); + invalidate_analysis(DEPENDENCY_INSTRUCTION_DETAIL | DEPENDENCY_VARIABLES); for (unsigned i = 0; i < ARRAY_SIZE(delta_xy); i++) { if (delta_xy[i].file == VGRF && remap[delta_xy[i].nr] != -1) { @@ -2580,7 +2581,7 @@ fs_visitor::opt_redundant_discard_jumps() } if (progress) - invalidate_analysis(DEPENDENCY_EVERYTHING); + invalidate_analysis(DEPENDENCY_INSTRUCTIONS); return progress; } @@ -2740,7 +2741,7 @@ fs_visitor::compute_to_mrf() } if (progress) - invalidate_analysis(DEPENDENCY_EVERYTHING); + invalidate_analysis(DEPENDENCY_INSTRUCTIONS); return progress; } @@ -2790,7 +2791,7 @@ fs_visitor::eliminate_find_live_channel() } if (progress) - invalidate_analysis(DE
[Mesa-dev] [PATCH 12/30] i965/ir: Introduce backend_shader method to propagate IR changes to analysis passes.
The invalidate_analysis() method knows what analysis passes there are in the back-end and calls their invalidate() method to report changes in the IR. For the moment it just calls invalidate_live_intervals() (which will eventually be fully replaced by this function) if anything changed. This makes all optimization passes invalidate DEPENDENCY_EVERYTHING, which is clearly far from ideal -- The dependency classes passed to invalidate_analysis() will be refined in a future commit. --- .../drivers/dri/i965/brw_dead_control_flow.cpp | 4 +- src/mesa/drivers/dri/i965/brw_fs.cpp | 49 ++ src/mesa/drivers/dri/i965/brw_fs.h | 1 + .../drivers/dri/i965/brw_fs_cmod_propagation.cpp | 4 +- .../drivers/dri/i965/brw_fs_combine_constants.cpp | 2 +- .../drivers/dri/i965/brw_fs_copy_propagation.cpp | 4 +- src/mesa/drivers/dri/i965/brw_fs_cse.cpp | 2 +- .../dri/i965/brw_fs_dead_code_eliminate.cpp| 4 +- src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp | 2 +- .../drivers/dri/i965/brw_fs_register_coalesce.cpp | 4 +- src/mesa/drivers/dri/i965/brw_fs_sel_peephole.cpp | 2 +- src/mesa/drivers/dri/i965/brw_predicated_break.cpp | 2 +- .../drivers/dri/i965/brw_schedule_instructions.cpp | 4 +- src/mesa/drivers/dri/i965/brw_shader.cpp | 7 src/mesa/drivers/dri/i965/brw_shader.h | 2 + src/mesa/drivers/dri/i965/brw_vec4.cpp | 21 +++--- src/mesa/drivers/dri/i965/brw_vec4.h | 1 + .../drivers/dri/i965/brw_vec4_cmod_propagation.cpp | 2 +- .../drivers/dri/i965/brw_vec4_copy_propagation.cpp | 2 +- src/mesa/drivers/dri/i965/brw_vec4_cse.cpp | 2 +- .../dri/i965/brw_vec4_dead_code_eliminate.cpp | 2 +- .../drivers/dri/i965/brw_vec4_reg_allocate.cpp | 2 +- 22 files changed, 84 insertions(+), 41 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_dead_control_flow.cpp b/src/mesa/drivers/dri/i965/brw_dead_control_flow.cpp index 2c1abaf..65e009c 100644 --- a/src/mesa/drivers/dri/i965/brw_dead_control_flow.cpp +++ b/src/mesa/drivers/dri/i965/brw_dead_control_flow.cpp @@ -29,6 +29,8 @@ #include "brw_shader.h" #include "brw_cfg.h" +using namespace brw; + /* Look for and eliminate dead control flow: * * - if/endif @@ -109,7 +111,7 @@ dead_control_flow_eliminate(backend_shader *s) } if (progress) - s->invalidate_live_intervals(); + s->invalidate_analysis(DEPENDENCY_EVERYTHING); return progress; } diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index 86d2bd9..60924b6 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -1843,7 +1843,7 @@ fs_visitor::split_virtual_grfs() } } } - invalidate_live_intervals(); + invalidate_analysis(DEPENDENCY_EVERYTHING); } /** @@ -1884,7 +1884,7 @@ fs_visitor::compact_virtual_grfs() } else { remap_table[i] = new_index; alloc.sizes[new_index] = alloc.sizes[i]; - invalidate_live_intervals(); + invalidate_analysis(DEPENDENCY_EVERYTHING); ++new_index; } } @@ -2088,7 +2088,7 @@ fs_visitor::demote_pull_constants() inst->src[i].reg_offset = 0; } } - invalidate_live_intervals(); + invalidate_analysis(DEPENDENCY_EVERYTHING); } bool @@ -2314,6 +2314,10 @@ fs_visitor::opt_algebraic() } } } + + if (progress) + invalidate_analysis(DEPENDENCY_EVERYTHING); + return progress; } @@ -2363,7 +2367,7 @@ fs_visitor::opt_zero_samples() } if (progress) - invalidate_live_intervals(); + invalidate_analysis(DEPENDENCY_EVERYTHING); return progress; } @@ -2442,7 +2446,7 @@ fs_visitor::opt_sampler_eot() * the instruction that this will replace. */ if (tex_inst->header_size != 0) { - invalidate_live_intervals(); + invalidate_analysis(DEPENDENCY_EVERYTHING); return true; } @@ -2475,7 +2479,7 @@ fs_visitor::opt_sampler_eot() tex_inst->insert_before(cfg->blocks[cfg->num_blocks - 1], new_load_payload); tex_inst->src[0] = send_header; - invalidate_live_intervals(); + invalidate_analysis(DEPENDENCY_EVERYTHING); return true; } @@ -2528,7 +2532,7 @@ fs_visitor::opt_register_renaming() } if (progress) { - invalidate_live_intervals(); + invalidate_analysis(DEPENDENCY_EVERYTHING); for (unsigned i = 0; i < ARRAY_SIZE(delta_xy); i++) { if (delta_xy[i].file == VGRF && remap[delta_xy[i].nr] != -1) { @@ -2576,7 +2580,7 @@ fs_visitor::opt_redundant_discard_jumps() } if (progress) - invalidate_live_intervals(); + invalidate_analysis(DEPENDENCY_EVERYTHING); return progress; } @@ -2736,7 +2740,7 @@ fs_visitor::compute_to_mrf() } if (progress) - invalidate_live_intervals(); + invalidate_analysis(DEPENDENCY_EVERYTHING); return progr
[Mesa-dev] [PATCH 24/30] i965/vec4: Switch liveness analysis to IR analysis framework.
This involves wrapping vec4_live_variables in a BRW_ANALYSIS object and hooking it up to invalidate_analysis() so it's properly invalidated. Seems like a lot of churn but it's fairly straightforward. The vec4_visitor invalidate_ and calculate_live_intervals() methods are no longer necessary after this change. --- src/mesa/drivers/dri/i965/brw_vec4.cpp | 7 ++--- src/mesa/drivers/dri/i965/brw_vec4.h | 6 ++-- src/mesa/drivers/dri/i965/brw_vec4_cse.cpp | 10 +++--- .../dri/i965/brw_vec4_dead_code_eliminate.cpp | 10 +++--- .../drivers/dri/i965/brw_vec4_live_variables.cpp | 36 ++ .../drivers/dri/i965/brw_vec4_live_variables.h | 11 +-- .../drivers/dri/i965/brw_vec4_reg_allocate.cpp | 5 ++- src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 3 +- 8 files changed, 29 insertions(+), 59 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp index 7c2c9cc..6695ddb 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp @@ -1018,8 +1018,7 @@ vec4_visitor::opt_register_coalesce() { bool progress = false; int next_ip = 0; - - calculate_live_intervals(); + const vec4_live_variables &live = live_analysis.require(); foreach_block_and_inst_safe (block, vec4_instruction, inst, cfg) { int ip = next_ip; @@ -1061,8 +1060,7 @@ vec4_visitor::opt_register_coalesce() /* Can't coalesce this GRF if someone else was going to * read it later. */ - if (live_intervals->var_range_end( - var_from_reg(alloc, inst->src[0]), 4) > ip) + if (live.var_range_end(var_from_reg(alloc, inst->src[0]), 4) > ip) continue; /* We need to check interference with the final destination between this @@ -1878,6 +1876,7 @@ void vec4_visitor::invalidate_analysis(brw::analysis_dependency_class c) { backend_shader::invalidate_analysis(c); + live_analysis.invalidate(c); } bool diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h b/src/mesa/drivers/dri/i965/brw_vec4.h index 35bd265..c95e741 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4.h +++ b/src/mesa/drivers/dri/i965/brw_vec4.h @@ -101,7 +101,8 @@ public: int first_non_payload_grf; unsigned int max_grf; - brw::vec4_live_variables *live_intervals; + BRW_ANALYSIS(live_analysis, brw::vec4_live_variables, +backend_shader *) live_analysis; dst_reg userplane[MAX_CLIP_PLANES]; bool need_all_constants_in_pull_buffer; @@ -132,7 +133,6 @@ public: void move_push_constants_to_pull_constants(); void split_uniform_registers(); void pack_uniform_registers(); - void calculate_live_intervals(); void invalidate_live_intervals(); virtual void invalidate_analysis(brw::analysis_dependency_class c); void split_virtual_grfs(); @@ -141,7 +141,7 @@ public: bool dead_code_eliminate(); bool opt_cmod_propagation(); bool opt_copy_propagation(bool do_constant_prop = true); - bool opt_cse_local(bblock_t *block); + bool opt_cse_local(bblock_t *block, const vec4_live_variables &live); bool opt_cse(); bool opt_algebraic(); bool opt_register_coalesce(); diff --git a/src/mesa/drivers/dri/i965/brw_vec4_cse.cpp b/src/mesa/drivers/dri/i965/brw_vec4_cse.cpp index 3a1dfda..04e03ab 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_cse.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_cse.cpp @@ -134,7 +134,7 @@ instructions_match(vec4_instruction *a, vec4_instruction *b) } bool -vec4_visitor::opt_cse_local(bblock_t *block) +vec4_visitor::opt_cse_local(bblock_t *block, const vec4_live_variables &live) { bool progress = false; exec_list aeb; @@ -246,8 +246,7 @@ vec4_visitor::opt_cse_local(bblock_t *block) * more -- a sure sign they'll fail operands_match(). */ if (src->file == VGRF) { - if (live_intervals->var_range_end( - var_from_reg(alloc, *src), 4) < ip) { + if (live.var_range_end(var_from_reg(alloc, *src), 4) < ip) { entry->remove(); ralloc_free(entry); break; @@ -268,11 +267,10 @@ bool vec4_visitor::opt_cse() { bool progress = false; - - calculate_live_intervals(); + const vec4_live_variables &live = live_analysis.require(); foreach_block (block, cfg) { - progress = opt_cse_local(block) || progress; + progress = opt_cse_local(block, live) || progress; } if (progress) diff --git a/src/mesa/drivers/dri/i965/brw_vec4_dead_code_eliminate.cpp b/src/mesa/drivers/dri/i965/brw_vec4_dead_code_eliminate.cpp index eefb48a..113b253 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_dead_code_eliminate.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_dead_code_eliminate.cpp @@ -69,17 +69,15 @@ bool vec4_visitor::dead_code_eliminate() { bool progress = false; - - calculate_live_intervals(); - - int num_var
[Mesa-dev] [PATCH 04/30] i965/fs: Restrict inequality that can only hold equal in saturate propagation.
Should have no functional change. The IP value of an instruction that reads src_var cannot possibly be after the end of the live interval of the variable it's reading from, by the definition of live interval. Might save future readers a momentary WTF while trying to understand this code. --- src/mesa/drivers/dri/i965/brw_fs_saturate_propagation.cpp | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs_saturate_propagation.cpp b/src/mesa/drivers/dri/i965/brw_fs_saturate_propagation.cpp index dc2b0c8..f59fdbd 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_saturate_propagation.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_saturate_propagation.cpp @@ -73,7 +73,7 @@ opt_saturate_propagation_local(fs_visitor *v, bblock_t *block) if (scan_inst->saturate) { inst->saturate = false; progress = true; -} else if (src_end_ip <= ip || inst->dst.equals(inst->src[0])) { +} else if (src_end_ip == ip || inst->dst.equals(inst->src[0])) { if (scan_inst->can_do_saturate()) { if (scan_inst->dst.type != inst->dst.type) { scan_inst->dst.type = inst->dst.type; -- 2.7.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 09/30] i965/fs: Reverse inclusion dependency between brw_fs_live_variables.h and brw_fs.h.
brw_fs.h (in particular fs_visitor) is logically a user of the live variables analysis pass, not the other way around. brw_fs_live_variables.h requires the definition of some FS IR data structures to compile, but those can be obtained directly from brw_ir_fs.h without including brw_fs.h. The dependency of fs_live_variables on fs_visitor is rather accidental and will be removed in a future commit, a forward declaration is enough for the moment. --- src/mesa/drivers/dri/i965/brw_fs.h | 2 +- src/mesa/drivers/dri/i965/brw_fs_live_variables.cpp | 2 +- src/mesa/drivers/dri/i965/brw_fs_live_variables.h | 3 ++- 3 files changed, 4 insertions(+), 3 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h index d4acc87..3448275 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.h +++ b/src/mesa/drivers/dri/i965/brw_fs.h @@ -30,6 +30,7 @@ #include "brw_shader.h" #include "brw_ir_fs.h" #include "brw_fs_builder.h" +#include "brw_fs_live_variables.h" #include "compiler/glsl/ir.h" #include "compiler/nir/nir.h" @@ -39,7 +40,6 @@ namespace { } namespace brw { - class fs_live_variables; } struct brw_gs_compile; diff --git a/src/mesa/drivers/dri/i965/brw_fs_live_variables.cpp b/src/mesa/drivers/dri/i965/brw_fs_live_variables.cpp index 66b70a9..4ebefe4 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_live_variables.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_live_variables.cpp @@ -25,7 +25,7 @@ * */ -#include "brw_cfg.h" +#include "brw_fs.h" #include "brw_fs_live_variables.h" using namespace brw; diff --git a/src/mesa/drivers/dri/i965/brw_fs_live_variables.h b/src/mesa/drivers/dri/i965/brw_fs_live_variables.h index a5c1764..6b49cfc 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_live_variables.h +++ b/src/mesa/drivers/dri/i965/brw_fs_live_variables.h @@ -28,10 +28,11 @@ #ifndef BRW_FS_LIVE_VARIABLES_H #define BRW_FS_LIVE_VARIABLES_H -#include "brw_fs.h" +#include "brw_ir_fs.h" #include "util/bitset.h" struct cfg_t; +class fs_visitor; namespace brw { -- 2.7.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 27/30] i965/ir: Move dominance tree data structure into idom_tree object.
It makes sense to keep the result of analysis passes independent from the IR itself. Instead of representing the idom tree as a pointer in each basic block pointing to its immediate dominator, the whole dominator tree is represented separately from the IR as an array of pointers inside the idom_tree object. This has the advantage that it's no longer possible to use stale dominance results by accident without having called require() beforehand, which makes sure that the idom tree is recalculated if necessary. --- src/mesa/drivers/dri/i965/brw_cfg.cpp | 51 ++- src/mesa/drivers/dri/i965/brw_cfg.h | 22 +-- 2 files changed, 47 insertions(+), 26 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_cfg.cpp b/src/mesa/drivers/dri/i965/brw_cfg.cpp index f586dd0..f951f44 100644 --- a/src/mesa/drivers/dri/i965/brw_cfg.cpp +++ b/src/mesa/drivers/dri/i965/brw_cfg.cpp @@ -54,7 +54,7 @@ link(void *mem_ctx, bblock_t *block) } bblock_t::bblock_t(cfg_t *cfg) : - cfg(cfg), idom(NULL), start_ip(0), end_ip(0), num(0) + cfg(cfg), start_ip(0), end_ip(0), num(0) { instructions.make_empty(); parents.make_empty(); @@ -426,8 +426,9 @@ cfg_t::dump(backend_shader *s) const idom_tree *idom = (s ? &s->idom_analysis.require() : NULL); foreach_block (block, this) { - if (block->idom) - fprintf(stderr, "START B%d IDOM(B%d)", block->num, block->idom->num); + if (idom && idom->parent(block)) + fprintf(stderr, "START B%d IDOM(B%d)", block->num, + idom->parent(block)->num); else fprintf(stderr, "START B%d IDOM(none)", block->num); @@ -455,14 +456,14 @@ cfg_t::dump(backend_shader *s) * (less than 1000 nodes) that this algorithm is significantly faster than * others like Lengauer-Tarjan. */ -idom_tree::idom_tree(const backend_shader *s) +idom_tree::idom_tree(const backend_shader *s) : + num_parents(s->cfg->num_blocks), + parents(new bblock_t *[num_parents]()) { - foreach_block(block, s->cfg) { - block->idom = NULL; - } - s->cfg->blocks[0]->idom = s->cfg->blocks[0]; - bool changed; + + parents[0] = s->cfg->blocks[0]; + do { changed = false; @@ -471,24 +472,29 @@ idom_tree::idom_tree(const backend_shader *s) continue; bblock_t *new_idom = NULL; - foreach_list_typed(bblock_link, parent, link, &block->parents) { -if (parent->block->idom) { + foreach_list_typed(bblock_link, parent_link, link, &block->parents) { +if (parent(parent_link->block)) { if (new_idom == NULL) { - new_idom = parent->block; - } else if (parent->block->idom != NULL) { - new_idom = intersect(parent->block, new_idom); + new_idom = parent_link->block; + } else if (parent(parent_link->block) != NULL) { + new_idom = intersect(parent_link->block, new_idom); } } } - if (block->idom != new_idom) { -block->idom = new_idom; + if (parent(block) != new_idom) { +parents[block->num] = new_idom; changed = true; } } } while (changed); } +idom_tree::~idom_tree() +{ + delete parents; +} + bblock_t * idom_tree::intersect(bblock_t *b1, bblock_t *b2) const { @@ -498,23 +504,20 @@ idom_tree::intersect(bblock_t *b1, bblock_t *b2) const */ while (b1->num != b2->num) { while (b1->num > b2->num) - b1 = b1->idom; + b1 = parent(b1); while (b2->num > b1->num) - b2 = b2->idom; + b2 = parent(b2); } assert(b1); return b1; } void -idom_tree::dump(const backend_shader *s) const +idom_tree::dump() const { printf("digraph DominanceTree {\n"); - foreach_block(block, s->cfg) { - if (block->idom) { - printf("\t%d -> %d\n", block->idom->num, block->num); - } - } + for (unsigned i = 0; i < num_parents; i++) + printf("\t%d -> %d\n", parents[i]->num, i); printf("}\n"); } diff --git a/src/mesa/drivers/dri/i965/brw_cfg.h b/src/mesa/drivers/dri/i965/brw_cfg.h index 1c9207c..fe6aa44 100644 --- a/src/mesa/drivers/dri/i965/brw_cfg.h +++ b/src/mesa/drivers/dri/i965/brw_cfg.h @@ -84,7 +84,6 @@ struct bblock_t { struct exec_node link; struct cfg_t *cfg; - struct bblock_t *idom; int start_ip; int end_ip; @@ -349,6 +348,7 @@ namespace brw { */ struct idom_tree { idom_tree(const backend_shader *s); + ~idom_tree(); bool validate(const backend_shader *) const @@ -363,11 +363,29 @@ namespace brw { return DEPENDENCY_BLOCKS; } + const bblock_t * + parent(const bblock_t *b) const + { + assert(unsigned(b->num) < num_parents); + return parents[b->num]; + } + + bblock_t * + parent(bblock_t *b) const + { + assert(unsigned
[Mesa-dev] [PATCH 25/30] i965/ir: Drop invalidate_live_intervals().
--- src/mesa/drivers/dri/i965/brw_fs.h| 1 - src/mesa/drivers/dri/i965/brw_fs_live_variables.cpp | 8 src/mesa/drivers/dri/i965/brw_shader.cpp | 2 -- src/mesa/drivers/dri/i965/brw_shader.h| 1 - src/mesa/drivers/dri/i965/brw_vec4.h | 1 - src/mesa/drivers/dri/i965/brw_vec4_live_variables.cpp | 8 6 files changed, 21 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h index ac5e8cf..7a1916d 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.h +++ b/src/mesa/drivers/dri/i965/brw_fs.h @@ -142,7 +142,6 @@ public: bool compact_virtual_grfs(); void assign_constant_locations(); void demote_pull_constants(); - void invalidate_live_intervals(); virtual void invalidate_analysis(brw::analysis_dependency_class c); void calculate_register_pressure(); void validate(); diff --git a/src/mesa/drivers/dri/i965/brw_fs_live_variables.cpp b/src/mesa/drivers/dri/i965/brw_fs_live_variables.cpp index 0d5bef2..5c1e00b 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_live_variables.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_live_variables.cpp @@ -345,14 +345,6 @@ fs_live_variables::validate(const backend_shader *s) const return true; } -void -fs_visitor::invalidate_live_intervals() -{ - /* XXX -- Leave this around for the moment to keep the fs_vistor object -* concrete. -*/ -} - bool fs_live_variables::vars_interfere(int a, int b) const { diff --git a/src/mesa/drivers/dri/i965/brw_shader.cpp b/src/mesa/drivers/dri/i965/brw_shader.cpp index 5b05725..4d701d9 100644 --- a/src/mesa/drivers/dri/i965/brw_shader.cpp +++ b/src/mesa/drivers/dri/i965/brw_shader.cpp @@ -1049,8 +1049,6 @@ backend_shader::calculate_cfg() void backend_shader::invalidate_analysis(brw::analysis_dependency_class c) { - if (c) - invalidate_live_intervals(); } /** diff --git a/src/mesa/drivers/dri/i965/brw_shader.h b/src/mesa/drivers/dri/i965/brw_shader.h index 47e6f02..a244f48 100644 --- a/src/mesa/drivers/dri/i965/brw_shader.h +++ b/src/mesa/drivers/dri/i965/brw_shader.h @@ -83,7 +83,6 @@ public: void calculate_cfg(); - virtual void invalidate_live_intervals() = 0; virtual void invalidate_analysis(brw::analysis_dependency_class c); }; diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h b/src/mesa/drivers/dri/i965/brw_vec4.h index c95e741..a21eb48 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4.h +++ b/src/mesa/drivers/dri/i965/brw_vec4.h @@ -133,7 +133,6 @@ public: void move_push_constants_to_pull_constants(); void split_uniform_registers(); void pack_uniform_registers(); - void invalidate_live_intervals(); virtual void invalidate_analysis(brw::analysis_dependency_class c); void split_virtual_grfs(); bool opt_vector_float(); diff --git a/src/mesa/drivers/dri/i965/brw_vec4_live_variables.cpp b/src/mesa/drivers/dri/i965/brw_vec4_live_variables.cpp index 665cf58..03830ab 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_live_variables.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_live_variables.cpp @@ -256,14 +256,6 @@ vec4_live_variables::~vec4_live_variables() ralloc_free(mem_ctx); } -void -vec4_visitor::invalidate_live_intervals() -{ - /* XXX -- Leave this around for the moment to keep the vec4_vistor object -* concrete. -*/ -} - static bool check_register_live_range(const vec4_live_variables *live, int ip, unsigned var, unsigned n) -- 2.7.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 13/30] i965/ir: Define more detailed analysis dependency classes.
I've deliberately separated this from the general analysis pass infrastructure in order to discuss it independently. The dependency classes defined here refer to state changes of several objects of the program IR, and are fully orthogonal and expected to change less often than the set of analysis passes present in the compiler back-end. The objective is to avoid unnecessary coupling between optimization and analysis passes in the back-end. By doing things in this way the set of flags to be passed to invalidate_analysis() can be determined from knowledge of a single optimization pass and a small set of well specified dependency classes alone -- IOW there is no need to audit all analysis passes to find out which ones might be affected by certain kind of program transformation performed by an optimization pass, as well as the converse, there is no need to audit all optimization passes when writing a new analysis pass to find out which ones can potentially invalidate the result of the analysis. The set of dependency classes defined here is rather conservative and mainly based on the requirements of the few analysis passes already part of the back-end. I've also used them without difficulty with a few additional analysis passes I've written but haven't yet sent for review. --- src/mesa/drivers/dri/i965/brw_ir_analysis.h | 46 + 1 file changed, 46 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_ir_analysis.h b/src/mesa/drivers/dri/i965/brw_ir_analysis.h index 31989ec..f008ac7 100644 --- a/src/mesa/drivers/dri/i965/brw_ir_analysis.h +++ b/src/mesa/drivers/dri/i965/brw_ir_analysis.h @@ -37,6 +37,52 @@ namespace brw { */ DEPENDENCY_NOTHING = 0, /** + * The analysis depends on the set of instructions in the program and + * their naming. Note that because instructions are named sequentially + * by IP this implies a dependency on the control flow edges between + * instructions. This will be signaled whenever instructions are + * inserted, removed or reordered in the program. + */ + DEPENDENCY_INSTRUCTION_IDENTITY = 0x1, + /** + * The analysis is sensitive to the detailed semantics of instructions + * in the program, where "detailed" means any change in the instruction + * data structures other than the linked-list pointers (which are + * already covered by DEPENDENCY_INSTRUCTION_IDENTITY). E.g. changing + * the negate or abs flags of an instruction source would signal this + * flag alone because it would preserve all other instruction dependency + * classes. + */ + DEPENDENCY_INSTRUCTION_DETAIL = 0x2, + /** + * The analysis depends on the set of data flow edges between + * instructions. This will be signaled whenever the dataflow relation + * between instructions has potentially changed, e.g. when the VGRF + * index of an instruction source or destination changes (in which case + * it will appear in combination with DEPENDENCY_INSTRUCTION_DETAIL), or + * when data-dependent instructions are reordered (in which case it will + * appear in combination with DEPENDENCY_INSTRUCTION_IDENTITY). + */ + DEPENDENCY_INSTRUCTION_DATA_FLOW = 0x4, + /** + * The analysis depends on all instruction dependency classes. These + * will typically be signaled simultaneously when inserting or removing + * instructions in the program (or if you're feeling too lazy to read + * through your optimization pass to figure out which of the instruction + * dependency classes above it invalidates). + */ + DEPENDENCY_INSTRUCTIONS = 0x7, + /** + * The analysis depends on the set of VGRFs in the program and their + * naming. This will be signaled when VGRFs are allocated or released. + */ + DEPENDENCY_VARIABLES = 0x8, + /** + * The analysis depends on the set of basic blocks in the program, their + * control flow edges and naming. + */ + DEPENDENCY_BLOCKS = 0x10, + /** * The analysis depends on the program being literally the same (good * luck...), any change in the input invalidates previous analysis * computations. -- 2.7.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 03/30] i965/vec4: Consider removal of no-op MOVs as progress during register coalesce.
Bug found by the liveness analysis validation pass that will be introduced in a later commit. The no-op MOV check in opt_register_coalesce() was removing instructions which makes the cached liveness analysis calculation inconsistent with the shader IR. We were failing to set progress to true in that case though, which means that invalidate_live_intervals() wouldn't necessarily be called at the end of the function. Cc: mesa-sta...@lists.freedesktop.org --- src/mesa/drivers/dri/i965/brw_vec4.cpp | 1 + 1 file changed, 1 insertion(+) diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp index cf62ed9..baf72a2 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp @@ -1051,6 +1051,7 @@ vec4_visitor::opt_register_coalesce() if (is_nop_mov) { inst->remove(block); +progress = true; continue; } } -- 2.7.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 26/30] i965/ir: Move idom tree calculation and related logic into analysis object.
This only does half of the work. The actual representation of the idom tree is left untouched, but the computation algorithm is moved into a separate analysis result class wrapped in a BRW_ANALYSIS object, along with the intersect() and dump_domtree() auxiliary functions in order to keep things tidy. --- src/mesa/drivers/dri/i965/brw_cfg.cpp | 44 ++ src/mesa/drivers/dri/i965/brw_cfg.h| 38 --- .../drivers/dri/i965/brw_fs_combine_constants.cpp | 4 +- src/mesa/drivers/dri/i965/brw_shader.cpp | 3 +- src/mesa/drivers/dri/i965/brw_shader.h | 2 + 5 files changed, 59 insertions(+), 32 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_cfg.cpp b/src/mesa/drivers/dri/i965/brw_cfg.cpp index 4fcee72..f586dd0 100644 --- a/src/mesa/drivers/dri/i965/brw_cfg.cpp +++ b/src/mesa/drivers/dri/i965/brw_cfg.cpp @@ -34,6 +34,8 @@ * blocks with successor/predecessor edges connecting them. */ +using namespace brw; + static bblock_t * pop_stack(exec_list *list) { @@ -158,7 +160,6 @@ cfg_t::cfg_t(exec_list *instructions) block_list.make_empty(); blocks = NULL; num_blocks = 0; - idom_dirty = true; bblock_t *cur = NULL; int ip = 0; @@ -384,7 +385,6 @@ cfg_t::remove_block(bblock_t *block) this->blocks[this->num_blocks - 1]->num = this->num_blocks - 2; this->num_blocks--; - idom_dirty = true; } bblock_t * @@ -423,8 +423,7 @@ cfg_t::make_block_array() void cfg_t::dump(backend_shader *s) { - if (idom_dirty) - calculate_idom(); + const idom_tree *idom = (s ? &s->idom_analysis.require() : NULL); foreach_block (block, this) { if (block->idom) @@ -456,19 +455,18 @@ cfg_t::dump(backend_shader *s) * (less than 1000 nodes) that this algorithm is significantly faster than * others like Lengauer-Tarjan. */ -void -cfg_t::calculate_idom() +idom_tree::idom_tree(const backend_shader *s) { - foreach_block(block, this) { + foreach_block(block, s->cfg) { block->idom = NULL; } - blocks[0]->idom = blocks[0]; + s->cfg->blocks[0]->idom = s->cfg->blocks[0]; bool changed; do { changed = false; - foreach_block(block, this) { + foreach_block(block, s->cfg) { if (block->num == 0) continue; @@ -489,12 +487,10 @@ cfg_t::calculate_idom() } } } while (changed); - - idom_dirty = false; } bblock_t * -cfg_t::intersect(bblock_t *b1, bblock_t *b2) +idom_tree::intersect(bblock_t *b1, bblock_t *b2) const { /* Note, the comparisons here are the opposite of what the paper says * because we index blocks from beginning -> end (i.e. reverse post-order) @@ -511,26 +507,26 @@ cfg_t::intersect(bblock_t *b1, bblock_t *b2) } void -cfg_t::dump_cfg() +idom_tree::dump(const backend_shader *s) const { - printf("digraph CFG {\n"); - for (int b = 0; b < num_blocks; b++) { - bblock_t *block = this->blocks[b]; - - foreach_list_typed_safe (bblock_link, child, link, &block->children) { - printf("\t%d -> %d\n", b, child->block->num); + printf("digraph DominanceTree {\n"); + foreach_block(block, s->cfg) { + if (block->idom) { + printf("\t%d -> %d\n", block->idom->num, block->num); } } printf("}\n"); } void -cfg_t::dump_domtree() +cfg_t::dump_cfg() { - printf("digraph DominanceTree {\n"); - foreach_block(block, this) { - if (block->idom) { - printf("\t%d -> %d\n", block->idom->num, block->num); + printf("digraph CFG {\n"); + for (int b = 0; b < num_blocks; b++) { + bblock_t *block = this->blocks[b]; + + foreach_list_typed_safe (bblock_link, child, link, &block->children) { + printf("\t%d -> %d\n", b, child->block->num); } } printf("}\n"); diff --git a/src/mesa/drivers/dri/i965/brw_cfg.h b/src/mesa/drivers/dri/i965/brw_cfg.h index 0813798..1c9207c 100644 --- a/src/mesa/drivers/dri/i965/brw_cfg.h +++ b/src/mesa/drivers/dri/i965/brw_cfg.h @@ -29,6 +29,9 @@ #define BRW_CFG_H #include "brw_ir.h" +#ifdef __cplusplus +#include "brw_ir_analysis.h" +#endif struct bblock_t; @@ -272,12 +275,9 @@ struct cfg_t { bblock_t *new_block(); void set_next_block(bblock_t **cur, bblock_t *block, int ip); void make_block_array(); - void calculate_idom(); - static bblock_t *intersect(bblock_t *b1, bblock_t *b2); void dump(backend_shader *s); void dump_cfg(); - void dump_domtree(); #endif void *mem_ctx; @@ -286,8 +286,6 @@ struct cfg_t { struct bblock_t **blocks; int num_blocks; - bool idom_dirty; - unsigned cycle_count; }; @@ -344,4 +342,34 @@ struct cfg_t { !__scan_inst->is_head_sentinel(); \ __scan_inst = (__type *)__scan_inst->prev) +#ifdef __cplusplus +namespace brw { + /** +* Immediate dominator tree analysis of a shader. +*/ + struct idom_tree { + idom_tree(const backend_shader *s); + +
[Mesa-dev] [PATCH 30/30] i965/fs: Calculate num_instructions in O(1) during register pressure calculation.
And mark the variable declaration as const. --- src/mesa/drivers/dri/i965/brw_fs.cpp | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index 6de47fa..e8fad87 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -5064,9 +5064,8 @@ fs_visitor::setup_cs_payload() brw::register_pressure::register_pressure(const fs_visitor *v) { const fs_live_variables &live = v->live_analysis.require(); - unsigned num_instructions = 0; - foreach_block(block, v->cfg) - num_instructions += block->instructions.length(); + const unsigned num_instructions = v->cfg->num_blocks ? + v->cfg->blocks[v->cfg->num_blocks - 1]->end_ip + 1 : 0; regs_live_at_ip = new unsigned[num_instructions](); -- 2.7.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 29/30] i965/fs: Move register pressure calculation into IR analysis object.
This defines a new BRW_ANALYSIS object which wraps the register pressure computation code along with its result. For the rationale see the previous commits converting the liveness and dominance analysis passes to the IR analysis framework. --- src/mesa/drivers/dri/i965/brw_fs.cpp | 27 +-- src/mesa/drivers/dri/i965/brw_fs.h | 32 +--- src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 6 ++ 3 files changed, 47 insertions(+), 18 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index 030e66f..6de47fa 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -4636,11 +4636,11 @@ fs_visitor::dump_instructions(const char *name) } if (cfg) { - calculate_register_pressure(); - int ip = 0, max_pressure = 0; + const register_pressure &rp = regpressure_analysis.require(); + unsigned ip = 0, max_pressure = 0; foreach_block_and_inst(block, backend_instruction, inst, cfg) { - max_pressure = MAX2(max_pressure, regs_live_at_ip[ip]); - fprintf(file, "{%3d} %4d: ", regs_live_at_ip[ip], ip); + max_pressure = MAX2(max_pressure, rp.regs_live_at_ip[ip]); + fprintf(file, "{%3d} %4d: ", rp.regs_live_at_ip[ip], ip); dump_instruction(inst, file); ip++; } @@ -5061,27 +5061,32 @@ fs_visitor::setup_cs_payload() } } -void -fs_visitor::calculate_register_pressure() +brw::register_pressure::register_pressure(const fs_visitor *v) { - const fs_live_variables &live = live_analysis.require(); + const fs_live_variables &live = v->live_analysis.require(); unsigned num_instructions = 0; - foreach_block(block, cfg) + foreach_block(block, v->cfg) num_instructions += block->instructions.length(); - regs_live_at_ip = rzalloc_array(mem_ctx, int, num_instructions); + regs_live_at_ip = new unsigned[num_instructions](); - for (unsigned reg = 0; reg < alloc.count; reg++) { + for (unsigned reg = 0; reg < v->alloc.count; reg++) { for (int ip = live.vgrf_start[reg]; ip <= live.vgrf_end[reg]; ip++) - regs_live_at_ip[ip] += alloc.sizes[reg]; + regs_live_at_ip[ip] += v->alloc.sizes[reg]; } } +brw::register_pressure::~register_pressure() +{ + delete regs_live_at_ip; +} + void fs_visitor::invalidate_analysis(brw::analysis_dependency_class c) { backend_shader::invalidate_analysis(c); live_analysis.invalidate(c); + regpressure_analysis.invalidate(c); } void diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h index 7a1916d..b913d3f 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.h +++ b/src/mesa/drivers/dri/i965/brw_fs.h @@ -39,7 +39,34 @@ namespace { struct acp_entry; } +class fs_visitor; + namespace brw { + /** +* Register pressure analysis of a shader. Estimates how many registers +* are live at any point of the program in GRF units. +*/ + struct register_pressure { + register_pressure(const fs_visitor *v); + ~register_pressure(); + + analysis_dependency_class + dependency_class() const + { + return (DEPENDENCY_INSTRUCTION_IDENTITY | + DEPENDENCY_INSTRUCTION_DATA_FLOW | + DEPENDENCY_VARIABLES); + } + + bool + validate(const fs_visitor *) const + { + /* FINISHME */ + return true; + } + + unsigned *regs_live_at_ip; + }; } struct brw_gs_compile; @@ -143,7 +170,6 @@ public: void assign_constant_locations(); void demote_pull_constants(); virtual void invalidate_analysis(brw::analysis_dependency_class c); - void calculate_register_pressure(); void validate(); bool opt_algebraic(); bool opt_redundant_discard_jumps(); @@ -330,8 +356,8 @@ public: BRW_ANALYSIS(live_analysis, brw::fs_live_variables, backend_shader *) live_analysis; - - int *regs_live_at_ip; + BRW_ANALYSIS(regpressure_analysis, brw::register_pressure, +fs_visitor *) regpressure_analysis; /** Number of uniform variable components visited. */ unsigned uniforms; diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp index ede39df..0388974 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp @@ -973,7 +973,7 @@ fs_visitor::fs_visitor(const struct brw_compiler *compiler, void *log_data, : backend_shader(compiler, log_data, mem_ctx, shader, prog_data), key(key), gs_compile(NULL), prog_data(prog_data), prog(prog), input_vue_map(input_vue_map), - live_analysis(this), + live_analysis(this), regpressure_analysis(this), dispatch_width(dispatch_width), shader_time_index(shader_time_index), bld(fs_builder(this, dispatch_width).at_end()) @@ -991,7 +991,7 @@ fs_visitor::fs_visitor(const st
[Mesa-dev] [PATCH 16/30] i965/fs: Move all live interval analysis results into fs_live_variables.
This moves the following methods that are currently defined in fs_visitor (even though they are side products of the liveness analysis computation) and are already implemented in brw_fs_live_variables.cpp: > bool virtual_grf_interferes(int a, int b) const; > int *virtual_grf_start; > int *virtual_grf_end; It makes sense for them to be part of the fs_live_variables object, because they have the same lifetime as other liveness analysis results and because this will allow some extra validation to happen wherever they are accessed in order to make sure that we only ever use up-to-date liveness analysis results. This shortens the virtual_grf prefix in order to compensate for the slightly increased lexical overhead from the live_intervals pointer dereference. --- src/mesa/drivers/dri/i965/brw_fs.cpp | 5 +-- src/mesa/drivers/dri/i965/brw_fs.h | 3 -- src/mesa/drivers/dri/i965/brw_fs_cse.cpp | 3 +- .../drivers/dri/i965/brw_fs_live_variables.cpp | 40 +- src/mesa/drivers/dri/i965/brw_fs_live_variables.h | 8 + src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp | 4 +-- src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 2 -- .../drivers/dri/i965/brw_schedule_instructions.cpp | 4 +-- 8 files changed, 34 insertions(+), 35 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index 9183d1f..afa1db9 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -2627,7 +2627,7 @@ fs_visitor::compute_to_mrf() /* Can't compute-to-MRF this GRF if someone else was going to * read it later. */ - if (this->virtual_grf_end[inst->src[0].nr] > ip) + if (live_intervals->vgrf_end[inst->src[0].nr] > ip) continue; /* Found a move of a GRF to a MRF. Let's see if we can go @@ -5074,7 +5074,8 @@ fs_visitor::calculate_register_pressure() regs_live_at_ip = rzalloc_array(mem_ctx, int, num_instructions); for (unsigned reg = 0; reg < alloc.count; reg++) { - for (int ip = virtual_grf_start[reg]; ip <= virtual_grf_end[reg]; ip++) + for (int ip = live_intervals->vgrf_start[reg]; + ip <= live_intervals->vgrf_end[reg]; ip++) regs_live_at_ip[ip] += alloc.sizes[reg]; } } diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h index 4f5d085..82d5d63 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.h +++ b/src/mesa/drivers/dri/i965/brw_fs.h @@ -162,7 +162,6 @@ public: bool remove_duplicate_mrf_writes(); bool opt_sampler_eot(); - bool virtual_grf_interferes(int a, int b) const; void schedule_instructions(instruction_scheduler_mode mode); void insert_gen4_send_dependency_workarounds(); void insert_gen4_pre_send_dependency_workarounds(bblock_t *block, @@ -329,8 +328,6 @@ public: int *param_size; - int *virtual_grf_start; - int *virtual_grf_end; brw::fs_live_variables *live_intervals; int *regs_live_at_ip; diff --git a/src/mesa/drivers/dri/i965/brw_fs_cse.cpp b/src/mesa/drivers/dri/i965/brw_fs_cse.cpp index a484caf..d112378 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_cse.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_cse.cpp @@ -328,7 +328,8 @@ fs_visitor::opt_cse_local(bblock_t *block) /* Kill any AEB entries using registers that don't get reused any * more -- a sure sign they'll fail operands_match(). */ -if (src_reg->file == VGRF && virtual_grf_end[src_reg->nr] < ip) { +if (src_reg->file == VGRF && +live_intervals->vgrf_end[src_reg->nr] < ip) { entry->remove(); ralloc_free(entry); break; diff --git a/src/mesa/drivers/dri/i965/brw_fs_live_variables.cpp b/src/mesa/drivers/dri/i965/brw_fs_live_variables.cpp index 676619a..6da026e 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_live_variables.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_live_variables.cpp @@ -265,6 +265,13 @@ fs_live_variables::fs_live_variables(fs_visitor *v, const cfg_t *cfg) end[i] = -1; } + vgrf_start = ralloc_array(mem_ctx, int, num_vgrfs); + vgrf_end = ralloc_array(mem_ctx, int, num_vgrfs); + for (int i = 0; i < num_vgrfs; i++) { + vgrf_start[i] = MAX_INSTRUCTION; + vgrf_end[i] = -1; + } + block_data= rzalloc_array(mem_ctx, struct block_data, cfg->num_blocks); bitset_words = BITSET_WORDS(num_vars); @@ -283,6 +290,13 @@ fs_live_variables::fs_live_variables(fs_visitor *v, const cfg_t *cfg) setup_def_use(); compute_live_variables(); compute_start_end(); + + /* Merge the per-component live ranges to whole VGRF live ranges. */ + for (int i = 0; i < num_vars; i++) { + const unsigned vgrf = vgrf_from_var[i]; + vgrf_start[vgrf] = MIN2(vgrf_start[vgrf], start[i]); + vgrf_end[vgrf] = MAX2(vgrf_end[vgrf], end[i]); + } } fs_live_variables::~fs_live_variables()
[Mesa-dev] [PATCH 23/30] i965/fs: Switch liveness analysis to IR analysis framework.
This involves wrapping fs_live_variables in a BRW_ANALYSIS object and hooking it up to invalidate_analysis() so it's properly invalidated. Seems like a lot of churn but it's fairly straightforward. The fs_visitor invalidate_ and calculate_live_intervals() methods are no longer necessary after this change. --- src/mesa/drivers/dri/i965/brw_fs.cpp | 12 src/mesa/drivers/dri/i965/brw_fs.h | 10 --- src/mesa/drivers/dri/i965/brw_fs_cse.cpp | 10 +++ .../dri/i965/brw_fs_dead_code_eliminate.cpp| 18 ++-- .../drivers/dri/i965/brw_fs_live_variables.cpp | 20 ++--- src/mesa/drivers/dri/i965/brw_fs_live_variables.h | 11 ++-- src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp | 10 +++ .../drivers/dri/i965/brw_fs_register_coalesce.cpp | 33 +- .../dri/i965/brw_fs_saturate_propagation.cpp | 13 + src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 3 +- .../drivers/dri/i965/brw_schedule_instructions.cpp | 19 ++--- 11 files changed, 72 insertions(+), 87 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index afa1db9..030e66f 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -2596,7 +2596,7 @@ fs_visitor::compute_to_mrf() if (devinfo->gen >= 7) return false; - calculate_live_intervals(); + const fs_live_variables &live = live_analysis.require(); foreach_block_and_inst_safe(block, fs_inst, inst, cfg) { int ip = next_ip; @@ -2627,7 +2627,7 @@ fs_visitor::compute_to_mrf() /* Can't compute-to-MRF this GRF if someone else was going to * read it later. */ - if (live_intervals->vgrf_end[inst->src[0].nr] > ip) + if (live.vgrf_end[inst->src[0].nr] > ip) continue; /* Found a move of a GRF to a MRF. Let's see if we can go @@ -5064,9 +5064,7 @@ fs_visitor::setup_cs_payload() void fs_visitor::calculate_register_pressure() { - invalidate_live_intervals(); - calculate_live_intervals(); - + const fs_live_variables &live = live_analysis.require(); unsigned num_instructions = 0; foreach_block(block, cfg) num_instructions += block->instructions.length(); @@ -5074,8 +5072,7 @@ fs_visitor::calculate_register_pressure() regs_live_at_ip = rzalloc_array(mem_ctx, int, num_instructions); for (unsigned reg = 0; reg < alloc.count; reg++) { - for (int ip = live_intervals->vgrf_start[reg]; - ip <= live_intervals->vgrf_end[reg]; ip++) + for (int ip = live.vgrf_start[reg]; ip <= live.vgrf_end[reg]; ip++) regs_live_at_ip[ip] += alloc.sizes[reg]; } } @@ -5084,6 +5081,7 @@ void fs_visitor::invalidate_analysis(brw::analysis_dependency_class c) { backend_shader::invalidate_analysis(c); + live_analysis.invalidate(c); } void diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h index 82d5d63..ac5e8cf 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.h +++ b/src/mesa/drivers/dri/i965/brw_fs.h @@ -132,7 +132,9 @@ public: void assign_regs_trivial(); void calculate_payload_ranges(int payload_node_count, int *payload_last_use_ip); - void setup_payload_interference(struct ra_graph *g, int payload_reg_count, + void setup_payload_interference(struct ra_graph *g, + const brw::fs_live_variables &live, + int payload_reg_count, int first_payload_node); int choose_spill_reg(struct ra_graph *g); void spill_reg(int spill_reg); @@ -142,13 +144,12 @@ public: void demote_pull_constants(); void invalidate_live_intervals(); virtual void invalidate_analysis(brw::analysis_dependency_class c); - void calculate_live_intervals(); void calculate_register_pressure(); void validate(); bool opt_algebraic(); bool opt_redundant_discard_jumps(); bool opt_cse(); - bool opt_cse_local(bblock_t *block); + bool opt_cse_local(const brw::fs_live_variables &live, bblock_t *block); bool opt_copy_propagate(); bool try_copy_propagate(fs_inst *inst, int arg, acp_entry *entry); bool try_constant_propagate(fs_inst *inst, acp_entry *entry); @@ -328,7 +329,8 @@ public: int *param_size; - brw::fs_live_variables *live_intervals; + BRW_ANALYSIS(live_analysis, brw::fs_live_variables, +backend_shader *) live_analysis; int *regs_live_at_ip; diff --git a/src/mesa/drivers/dri/i965/brw_fs_cse.cpp b/src/mesa/drivers/dri/i965/brw_fs_cse.cpp index d112378..f2c257e 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_cse.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_cse.cpp @@ -222,7 +222,7 @@ create_copy_instr(const fs_builder &bld, fs_inst *inst, fs_reg src, bool negate) } bool -fs_visitor::opt_cse_local(bblock_t *block) +fs_visitor::opt_cse_local(const f
[Mesa-dev] [PATCH 28/30] i965/ir: Simplify new_idom reduction in dominance tree calculation.
Trivial, just use a few less tokens to do the same thing. --- src/mesa/drivers/dri/i965/brw_cfg.cpp | 7 ++- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_cfg.cpp b/src/mesa/drivers/dri/i965/brw_cfg.cpp index f951f44..8701d16 100644 --- a/src/mesa/drivers/dri/i965/brw_cfg.cpp +++ b/src/mesa/drivers/dri/i965/brw_cfg.cpp @@ -474,11 +474,8 @@ idom_tree::idom_tree(const backend_shader *s) : bblock_t *new_idom = NULL; foreach_list_typed(bblock_link, parent_link, link, &block->parents) { if (parent(parent_link->block)) { - if (new_idom == NULL) { - new_idom = parent_link->block; - } else if (parent(parent_link->block) != NULL) { - new_idom = intersect(parent_link->block, new_idom); - } + new_idom = (new_idom ? intersect(new_idom, parent_link->block) : + parent_link->block); } } -- 2.7.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 21/30] i965/fs: Add live interval validation pass.
This could be improved somewhat with additional validation of the calculated live in/out sets and by checking that the calculated live intervals are minimal (which isn't strictly necessary to guarantee the correctness of the program). This should be good enough though to catch accidental use of stale liveness results due to missing or incorrect analysis invalidation. --- .../drivers/dri/i965/brw_fs_live_variables.cpp | 41 ++ src/mesa/drivers/dri/i965/brw_fs_live_variables.h | 3 ++ 2 files changed, 44 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_fs_live_variables.cpp b/src/mesa/drivers/dri/i965/brw_fs_live_variables.cpp index 4b0943f..215349a 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_live_variables.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_live_variables.cpp @@ -304,6 +304,47 @@ fs_live_variables::~fs_live_variables() ralloc_free(mem_ctx); } +static bool +check_register_live_range(const fs_live_variables *live, int ip, + const fs_reg ®, unsigned n) +{ + const unsigned var = live->var_from_reg(reg); + + if (var + n > unsigned(live->num_vars) || + live->vgrf_start[reg.nr] > ip || live->vgrf_end[reg.nr] < ip) + return false; + + for (unsigned j = 0; j < n; j++) { + if (live->start[var + j] > ip || live->end[var + j] < ip) + return false; + } + + return true; +} + +bool +fs_live_variables::validate(const backend_shader *s) const +{ + int ip = 0; + + foreach_block_and_inst(block, fs_inst, inst, s->cfg) { + for (unsigned i = 0; i < inst->sources; i++) { + if (inst->src[i].file == VGRF && + !check_register_live_range(this, ip, +inst->src[i], inst->regs_read(i))) +return false; + } + + if (inst->dst.file == VGRF && + !check_register_live_range(this, ip, inst->dst, inst->regs_written)) + return false; + + ip++; + } + + return true; +} + void fs_visitor::invalidate_live_intervals() { diff --git a/src/mesa/drivers/dri/i965/brw_fs_live_variables.h b/src/mesa/drivers/dri/i965/brw_fs_live_variables.h index e1cd12c..c2a3c63 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_live_variables.h +++ b/src/mesa/drivers/dri/i965/brw_fs_live_variables.h @@ -69,6 +69,9 @@ public: fs_live_variables(const backend_shader *s); ~fs_live_variables(); + bool + validate(const backend_shader *s) const; + bool vars_interfere(int a, int b) const; bool vgrfs_interfere(int a, int b) const; int var_from_reg(const fs_reg ®) const -- 2.7.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 00/30] i965: IR analysis pass framework.
1-4 are Reviewed-by: Matt Turner ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 21/30] i965/fs: Add live interval validation pass.
On Sun, Mar 13, 2016 at 8:47 PM, Francisco Jerez wrote: > This could be improved somewhat with additional validation of the > calculated live in/out sets and by checking that the calculated live > intervals are minimal (which isn't strictly necessary to guarantee the > correctness of the program). This should be good enough though to > catch accidental use of stale liveness results due to missing or > incorrect analysis invalidation. > --- > .../drivers/dri/i965/brw_fs_live_variables.cpp | 41 > ++ > src/mesa/drivers/dri/i965/brw_fs_live_variables.h | 3 ++ > 2 files changed, 44 insertions(+) > > diff --git a/src/mesa/drivers/dri/i965/brw_fs_live_variables.cpp > b/src/mesa/drivers/dri/i965/brw_fs_live_variables.cpp > index 4b0943f..215349a 100644 > --- a/src/mesa/drivers/dri/i965/brw_fs_live_variables.cpp > +++ b/src/mesa/drivers/dri/i965/brw_fs_live_variables.cpp > @@ -304,6 +304,47 @@ fs_live_variables::~fs_live_variables() > ralloc_free(mem_ctx); > } > > +static bool > +check_register_live_range(const fs_live_variables *live, int ip, > + const fs_reg ®, unsigned n) > +{ > + const unsigned var = live->var_from_reg(reg); > + > + if (var + n > unsigned(live->num_vars) || > + live->vgrf_start[reg.nr] > ip || live->vgrf_end[reg.nr] < ip) > + return false; > + > + for (unsigned j = 0; j < n; j++) { > + if (live->start[var + j] > ip || live->end[var + j] < ip) > + return false; > + } > + > + return true; > +} > + > +bool > +fs_live_variables::validate(const backend_shader *s) const > +{ > + int ip = 0; > + > + foreach_block_and_inst(block, fs_inst, inst, s->cfg) { > + for (unsigned i = 0; i < inst->sources; i++) { > + if (inst->src[i].file == VGRF && > + !check_register_live_range(this, ip, > +inst->src[i], inst->regs_read(i))) > +return false; > + } > + > + if (inst->dst.file == VGRF && > + !check_register_live_range(this, ip, inst->dst, inst->regs_written)) Looks like the indentation is slightly off on this line. > + return false; > + > + ip++; > + } > + > + return true; > +} > + > void > fs_visitor::invalidate_live_intervals() > { > diff --git a/src/mesa/drivers/dri/i965/brw_fs_live_variables.h > b/src/mesa/drivers/dri/i965/brw_fs_live_variables.h > index e1cd12c..c2a3c63 100644 > --- a/src/mesa/drivers/dri/i965/brw_fs_live_variables.h > +++ b/src/mesa/drivers/dri/i965/brw_fs_live_variables.h > @@ -69,6 +69,9 @@ public: > fs_live_variables(const backend_shader *s); > ~fs_live_variables(); > > + bool > + validate(const backend_shader *s) const; Return type on its own line -- intentional? Others, seen below, are on the same line. I'd have put it on the same line. > + > bool vars_interfere(int a, int b) const; > bool vgrfs_interfere(int a, int b) const; > int var_from_reg(const fs_reg ®) const > -- ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 21/30] i965/fs: Add live interval validation pass.
On Sun, Mar 13, 2016 at 8:47 PM, Francisco Jerez wrote: > This could be improved somewhat with additional validation of the > calculated live in/out sets and by checking that the calculated live > intervals are minimal (which isn't strictly necessary to guarantee the > correctness of the program). This should be good enough though to > catch accidental use of stale liveness results due to missing or > incorrect analysis invalidation. > --- > .../drivers/dri/i965/brw_fs_live_variables.cpp | 41 > ++ > src/mesa/drivers/dri/i965/brw_fs_live_variables.h | 3 ++ > 2 files changed, 44 insertions(+) > > diff --git a/src/mesa/drivers/dri/i965/brw_fs_live_variables.cpp > b/src/mesa/drivers/dri/i965/brw_fs_live_variables.cpp > index 4b0943f..215349a 100644 > --- a/src/mesa/drivers/dri/i965/brw_fs_live_variables.cpp > +++ b/src/mesa/drivers/dri/i965/brw_fs_live_variables.cpp > @@ -304,6 +304,47 @@ fs_live_variables::~fs_live_variables() > ralloc_free(mem_ctx); > } > > +static bool > +check_register_live_range(const fs_live_variables *live, int ip, > + const fs_reg ®, unsigned n) > +{ > + const unsigned var = live->var_from_reg(reg); > + > + if (var + n > unsigned(live->num_vars) || > + live->vgrf_start[reg.nr] > ip || live->vgrf_end[reg.nr] < ip) > + return false; > + > + for (unsigned j = 0; j < n; j++) { > + if (live->start[var + j] > ip || live->end[var + j] < ip) > + return false; Braces in nested control flow. Elsewhere in this patch as well, and also in the next patch. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] Reminder: 2016 X.Org Board of Directors Elections Nomination period is NOW
We are seeking nominations for candidates for election to the X.Org Foundation Board of Directors. All X.Org Foundation members are eligible for election to the board. Nominations for the 2016 election are now open and will remain open until 23:59 UTC on 15 March 2016. The Board consists of directors elected from the membership. Each year, an election is held to bring the total number of directors to eight. The four members receiving the highest vote totals will serve as directors for two year terms. The directors who received two year terms starting in 2015 were Peter Hutterer, Martin Peres, Rob Clark and Daniel Vetter, They will continue to serve until their term ends in 2017. Current directors whose term expires in 2016 are Alex Deucher, Matt Dew, Egbert Eich and Keith Packard. A director is expected to participate in the fortnightly IRC meeting to discuss current business and to attend the annual meeting of the X.Org Foundation, which will be held at a location determined in advance by the Board of Directors. A member may nominate themselves or any other member they feel is qualified. Nominations should be sent to the Election Committee at electi...@x.org. Nominees shall be required to be current members of the X.Org Foundation, and submit a personal statement of up to 200 words that will be provided to prospective voters. The collected statements, along with the statement of contribution to the X.Org Foundation in the members account page on http://members.x.org, will be made available to all voters to help them make their voting decisions. Nominations, membership applications or renewals and completed personal statements must be received no later than 23:59 UTC on 15 March 2016. The slate of candidates will be published 17 March 2016 and candidate Q&A will begin then. The deadline for Xorg membership applications and renewals is 17 March 2016. Cheers, Peter, on behalf of the X.Org BoD ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] Clean up preprocessor checks for execinfo.h
Fixes building with debug enabled on musl systems. --- configure.ac| 1 + src/gallium/auxiliary/util/u_debug_symbol.c | 4 ++-- src/mapi/glapi/gen/gl_gentable.py | 2 +- 3 files changed, 4 insertions(+), 3 deletions(-) diff --git a/configure.ac b/configure.ac index 49be147..1369937 100644 --- a/configure.ac +++ b/configure.ac @@ -758,6 +758,7 @@ AC_CHECK_HEADER([xlocale.h], [DEFINES="$DEFINES -DHAVE_XLOCALE_H"]) AC_CHECK_HEADER([sys/sysctl.h], [DEFINES="$DEFINES -DHAVE_SYS_SYSCTL_H"]) AC_CHECK_FUNC([strtof], [DEFINES="$DEFINES -DHAVE_STRTOF"]) AC_CHECK_FUNC([mkostemp], [DEFINES="$DEFINES -DHAVE_MKOSTEMP"]) +AC_CHECK_HEADER([execinfo.h], [DEFINES="$DEFINES -DHAVE_EXECINFO_H"]) dnl Check to see if dlopen is in default libraries (like Solaris, which dnl has it in libc), or if libdl is needed to get it. diff --git a/src/gallium/auxiliary/util/u_debug_symbol.c b/src/gallium/auxiliary/util/u_debug_symbol.c index 10efdd5..4026089 100644 --- a/src/gallium/auxiliary/util/u_debug_symbol.c +++ b/src/gallium/auxiliary/util/u_debug_symbol.c @@ -219,7 +219,7 @@ debug_symbol_name_dbghelp(const void *addr, char* buf, unsigned size) #endif /* PIPE_OS_WINDOWS */ -#if defined(__GLIBC__) && !defined(__UCLIBC__) +#if defined(HAVE_EXECINFO_H) #include @@ -240,7 +240,7 @@ debug_symbol_name_glibc(const void *addr, char* buf, unsigned size) return TRUE; } -#endif /* defined(__GLIBC__) && !defined(__UCLIBC__) */ +#endif /* defined(HAVE_EXECINFO_H) */ void diff --git a/src/mapi/glapi/gen/gl_gentable.py b/src/mapi/glapi/gen/gl_gentable.py index 7cd475a..0d1ebd4 100644 --- a/src/mapi/glapi/gen/gl_gentable.py +++ b/src/mapi/glapi/gen/gl_gentable.py @@ -44,7 +44,7 @@ header = """/* GLXEXT is the define used in the xserver when the GLX extension i #endif #if (defined(GLXEXT) && defined(HAVE_BACKTRACE)) \\ - || (!defined(GLXEXT) && defined(DEBUG) && !defined(__CYGWIN__) && !defined(__MINGW32__) && !defined(__OpenBSD__) && !defined(__NetBSD__) && !defined(__DragonFly__)) + || (!defined(GLXEXT) && defined(DEBUG) && defined(HAVE_EXECINFO_H)) #define USE_BACKTRACE #endif -- 2.7.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 01/14] nir: Add explicitly sized types
-BEGIN PGP SIGNED MESSAGE- Hash: SHA256 On 12/03/16 17:26, Connor Abbott wrote: > On Fri, Mar 11, 2016 at 2:33 AM, Samuel Iglesias Gonsálvez > wrote: > > > On 11/03/16 01:08, Jason Ekstrand wrote: On Thu, Mar 10, 2016 at 4:00 PM, Connor Abbott wrote: > On Mon, Mar 7, 2016 at 3:45 AM, Samuel Iglesias Gonsálvez > wrote: >> From: Jason Ekstrand >> >> v2: Fix size/type mask to properly handle 8-bit types. >> >> Signed-off-by: Juan A. Suarez Romero >> --- src/compiler/nir/nir.h | 17 >> - 1 file changed, 16 insertions(+), 1 >> deletion(-) >> >> diff --git a/src/compiler/nir/nir.h >> b/src/compiler/nir/nir.h index cccb3a4..659e98c 100644 >> --- a/src/compiler/nir/nir.h +++ b/src/compiler/nir/nir.h >> @@ -605,9 +605,24 @@ typedef enum { nir_type_float, >> nir_type_int, nir_type_uint, - nir_type_bool + >> nir_type_bool, + nir_type_bool32 =32 | >> nir_type_bool, + nir_type_int8 = 8 | >> nir_type_int, + nir_type_int16 = 16 | nir_type_int, + >> nir_type_int32 = 32 | nir_type_int, + nir_type_int64 = >> 64 | nir_type_int, + nir_type_uint8 = 8 | >> nir_type_uint, + nir_type_uint16 =16 | nir_type_uint, >> + nir_type_uint32 = 32 | nir_type_uint, + >> nir_type_uint64 =64 | nir_type_uint, + >> nir_type_float16 = 16 | nir_type_float, + >> nir_type_float32 = 32 | nir_type_float, + >> nir_type_float64 = 64 | nir_type_float, } >> nir_alu_type; >> >> +#define NIR_ALU_TYPE_SIZE_MASK 0xfff8 +#define >> NIR_ALU_TYPE_BASE_TYPE_MASK 0x0007 > > So I'm not really the one to be reviewing this series > (after all, I wrote most of it :) ) but one thing that I > never quite liked, and didn't get around to fixing, is how > we use these raw constants all over the place. Perhaps we > could make things more readable by adding > nir_get_sized_type(), nir_get_unsized_type(), and > nir_type_size() helpers and then use those instead of > or-ing/and-ing things together everywhere. > Agreed. > > Agreed. We saw it too but, as this is used in a lot in the fp64 > patches, we were thinking on apply one patch at the end of the fp64 > series adding those helper functions (maybe just macros like > NIR_GET_UNSIZED_TYPE and NIR_GET_TYPE_SIZE) and adapting the users > of the mask. > >> I should probably mention, in general we tend to prefer inline >> functions over macros where possible since it's clearer what >> their argument types and return type are and they tend to >> integrate better with gdb. > Thanks for the advice! We will use inline functions then. Sam > > However, we can add them here and modify the rest of fp64 patches > if you prefer it. > > Sam > > >> + typedef enum { NIR_OP_IS_COMMUTATIVE = (1 << 0), >> NIR_OP_IS_ASSOCIATIVE = (1 << 1), -- 2.7.0 >> >> ___ mesa-dev >> mailing list mesa-dev@lists.freedesktop.org >> https://lists.freedesktop.org/mailman/listinfo/mesa-dev > ___ mesa-dev > mailing list mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev > > -BEGIN PGP SIGNATURE- Version: GnuPG v2 iQIcBAEBCAAGBQJW5lSfAAoJEH/0ujLxfcNDeVMP/jv3UO2VNeW2R/3BX1d7KTni GJ/0lH4BxEEDlJNUoxCOoxRGVmUVI8D/NNRKWg9lWjP1aPSKcpNLXCSzw5C3YHql YzklKjnNi6nSZdzjjDWOtC7wJvOrLUhbst2ng6N9vu9SkjGTT2TOjp6AMSc0O5G6 ptnH+mdqKqBb31rh22w5LhlJnjIA2qmAsVpR2XZ8RCTaRZw/whKnQ+RN3MBSW7qs ENMoeBeaG8pKdJc7w8YzfO1jM4Tgic8cFlPq2rWF3eeG0s9pACLmCOFroyBjJvD9 ysQkUQoNyG25ElVMA8jdhuI19CemHN7EK1BxmA+U0RQyUQfMmax4g4TUFLloALnw FapL4FI0j4nuER9liW/EED/NVgn78cLpMMOYyVjxOEt3GjAZ9WwIHHtDr9ZTXmdu u9pyciB724XQJNkcaMQiAF1vJgsnR3WdBuHKgOul8B3EV8urRtUYLOv+4eqVbaIT p5E3n1On5zxyifeWX/Nq2JDFA029K74G+6bVN3VnyZNUv5dMw4Ry+fNY6CrAdYqY cvTs9WMxWhsPwK3CRKVYImquKngoVdkXazA7sePRF44AGBMobFPN4tF2DOtRVfOq EtvbP7nJ3UJSA527KKdnUG4scf/1XDVkvPVK48/nkVVpURIZPj4fkXZvdAdUsutk VZR5zXfg5JuF1xNiV6gH =DTjV -END PGP SIGNATURE- ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev