Unfortunately, we don't know whether a particular load targets a real 2d
image (as a cube face or 2d array element would be) or a layer of a 3d
image.
Since we pass in the TIC reference, the instruction's type has to match
what's in the TIC (as determined experimentally). In order to properly
support bindless images, this also can't be resolved at compile time by
looking at the current bindings and generating appropriate code.
As a result, all plain 2d loads are converted into a pair of 2d/3d loads,
predicated so that only one of them actually executes, and the results
are merged.
This goes somewhat against the current flow, so for GM107 we do the OOB
handling directly in the surface processing logic. Perhaps the other
gens should do something similar, but that is left to another change.
This fixes dEQP tests like image_load_store.3d.*_single_layer and GL-CTS
tests like shader_image_load_store.non-layered_binding without breaking
anything else.
Signed-off-by: Ilia Mirkin
---
OK, first of all -- to whoever thought that binding single layers of a 3d
image and telling the shader they were regular 2d images was a good idea --
I disagree.
This change feels super super dirty, but I honestly don't see a materially
cleaner way of handling it. Instead of being able to reuse the OOB
handling, it's put in with the coord processing (!), and the surface
conversion function is seriously hacked up.
But splitting it up is harder, since a lot of information has to flow
from stage to stage, like when to do what kind of access, and cloning
the surface op is much easier in the coord processing stage.
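
To make the 2d/3d load pair from the commit message easier to picture,
here is a small host-side C++ model of the idea. It is only a sketch and
not code from this patch: FakeImage, LoadStats, predicatedLoad and
loadPlain2d are hypothetical names, and the is3dLayer flag stands in for
whatever run-time predicate the lowered shader actually evaluates.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Vec4 = std::array<uint32_t, 4>;

// Hypothetical stand-in for an image descriptor; not a real nouveau type.
struct FakeImage {
    bool is3dLayer;  // assumption: only known at run time, not at compile time
    Vec4 texel;      // value a load at the requested coordinate would return
};

// Counts how many loads of each kind actually ran.
struct LoadStats {
    int loads2d = 0;
    int loads3d = 0;
};

// Models a predicated load: the image is only touched when pred is true.
// A skipped load yields zeros, which is a modelling choice of this sketch.
Vec4 predicatedLoad(bool pred, const FakeImage &img, int &counter)
{
    Vec4 out{};
    if (pred) {
        out = img.texel;
        ++counter;
    }
    return out;
}

// One plain 2d load becomes a 2d/3d pair guarded by complementary predicates.
Vec4 loadPlain2d(const FakeImage &img, LoadStats &stats)
{
    const bool p3d = img.is3dLayer;
    const int before = stats.loads2d + stats.loads3d;

    const Vec4 as2d = predicatedLoad(!p3d, img, stats.loads2d);
    const Vec4 as3d = predicatedLoad(p3d, img, stats.loads3d);

    // Complementary predicates mean exactly one of the two loads executed.
    assert(stats.loads2d + stats.loads3d == before + 1);

    // Merge: pick each component from whichever load ran.
    Vec4 merged{};
    for (std::size_t i = 0; i < merged.size(); ++i)
        merged[i] = p3d ? as3d[i] : as2d[i];
    return merged;
}
```

Calling loadPlain2d on an image with is3dLayer set bumps only loads3d,
and on a plain 2d image only loads2d; the caller sees the same four
components either way.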
.../nouveau/codegen/nv50_ir_emit_gm107.cpp | 34 ++-
.../nouveau/codegen/nv50_ir_lowering_nvc0.cpp | 206 +-
.../nouveau/codegen/nv50_ir_lowering_nvc0.h | 4 +-
src/gallium/drivers/nouveau/nvc0/nvc0_tex.c | 10 +-
4 files changed, 201 insertions(+), 53 deletions(-)
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
index 6eefe8f0025..e244bd0d610 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
@@ -122,6 +122,8 @@ private:
void emitSAM();
void emitRAM();
+ void emitPSETP();
+
void emitMOV();
void emitS2R();
void emitCS2R();
@@ -690,6 +692,31 @@ CodeEmitterGM107::emitRAM()
* predicate/cc
**/
+void
+CodeEmitterGM107::emitPSETP()
+{
+
+ emitInsn(0x5090);
+
+ switch (insn->op) {
+ case OP_AND: emitField(0x18, 3, 0); break;
+ case OP_OR: emitField(0x18, 3, 1); break;
+ case OP_XOR: emitField(0x18, 3, 2); break;
+ default:
+ assert(!"unexpected operation");
+ break;
+ }
+
+ // emitINV (0x2a);
+ emitPRED(0x27); // TODO: support 3-arg
+ emitINV (0x20, insn->src(1));
+ emitPRED(0x1d, insn->src(1));
+ emitINV (0x0f, insn->src(0));
+ emitPRED(0x0c, insn->src(0));
+ emitPRED(0x03, insn->def(0));
+ emitPRED(0x00);
+}
+
/***
* movement / conversion
**/
@@ -3557,7 +3584,12 @@ CodeEmitterGM107::emitInstruction(Instruction *i)
case OP_AND:
case OP_OR:
case OP_XOR:
- emitLOP();
+ switch (insn->def(0).getFile()) {
+ case FILE_GPR: emitLOP(); break;
+ case FILE_PREDICATE: emitPSETP(); break;
+ default:
+ assert(!"invalid bool op");
+ }
break;
case OP_NOT:
emitNOT();
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
index 1f702a987d8..0f68a9a229f 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
@@ -1802,6 +1802,9 @@ NVC0LoweringPass::loadSuInfo32(Value *ptr, int slot, uint32_t off, bool bindless
{
uint32_t base = slot * NVC0_SU_INFO__STRIDE;
+ // We don't upload surface info for bindless for GM107+
+ assert(!bindless || targ->getChipset() < NVISA_GM107_CHIPSET);
+
if (ptr) {
ptr = bld.mkOp2v(OP_ADD, TYPE_U32, bld.getSSA(), ptr, bld.mkImm(slot));
if (bindless)
@@ -2204,7 +2207,7 @@ getDestType(const ImgType type) {
}
void
-NVC0LoweringPass::convertSurfaceFormat(TexInstruction *su)
+NVC0LoweringPass::convertSurfaceFormat(TexInstruction *su, Instruction **loaded)
{
const TexInstruction::ImgFormatDesc *format = su->tex.format;
int width = format->bits[0] + format->bits[1] +
@@ -2223,21 +2226,38 @@ NVC0LoweringPass::convertSurfaceFormat(TexInstruction *su)
if (width < 32)
untypedDst[0] = bld.getSSA();
- for (int i = 0; i < 4; i++) {
- typedDst[i] = su->getDef(i);
+ if (loaded && loaded[0]) {
+ for (int i = 0; i < 4; i++) {
+ if