Am 09.11.2017 um 18:43 schrieb Jan Vesely: > On Thu, 2017-11-09 at 18:39 +0100, Nicolai Hähnle wrote: >> On 09.11.2017 18:26, Roland Scheidegger wrote: >>> Am 09.11.2017 um 18:19 schrieb Jan Vesely: >>>> On Thu, 2017-11-09 at 03:58 +0100, srol...@vmware.com wrote: >>>>> From: Roland Scheidegger <srol...@vmware.com> >>>>> >>>>> I believe this is the safe thing to do, especially ever since the driver >>>>> actually generates NaNs for muls too. >>>>> Albeit since the radeon ISA docs are inaccurate/wrong there, I'm not >>>>> entirely sure what the non-dx10 versions do, >>>> >>>> non-dx10 version return nan if one of the operands is nan (tested on >>>> Turks). >>> >>> Yes, I've modified a piglit test and came to the same conclusion. (I >>> will put that up for review shortly.) >>> I don't know why you'd ever want that, though (ieee fmin/fmax also >>> should return non-nan). >> >> My guess is that DX9-level hardware had that behavior by virtue of just >> not caring about NaN at all, and the hardware folks were just being >> conservative in adding a new opcode rather than changing the behavior of >> the old one. I don't think GCN has the old-style min/max. > > Looks like it. > There's v_max_legacy_f32 at least on SI/CI. comment says "D.f = > max(S0.f, S1.f) (DX9 rules for NaN)" The problem with dx9 rules for NaN is that noone really seems to know what they are exactly - in that sense even GL is an improvement since it's at least obvious you can do whatever floats your boat :-). Albeit generally, in d3d9 you should never generate a NaN in the first place, hence how you promote it doesn't really matter.
Roland > > Jan >> >> Cheers, >> Nicolai >> >>> >>> Roland >>> >>> >>>> >>>> Jan >>>> >>>>> but (as required by dx10) >>>>> the dx10 versions should pick a non-nan source over a nan source. >>>>> Other drivers presumably do the same (radeonsi, llvmpipe). >>>>> This was shown to make some difference for bug 103544, albeit it is not >>>>> required to fix it. >>>>> --- >>>>> src/gallium/drivers/r600/r600_shader.c | 12 ++++++------ >>>>> src/gallium/drivers/r600/sb/sb_expr.cpp | 2 ++ >>>>> 2 files changed, 8 insertions(+), 6 deletions(-) >>>>> >>>>> diff --git a/src/gallium/drivers/r600/r600_shader.c >>>>> b/src/gallium/drivers/r600/r600_shader.c >>>>> index 188fbc9d47..6a755bb3fd 100644 >>>>> --- a/src/gallium/drivers/r600/r600_shader.c >>>>> +++ b/src/gallium/drivers/r600/r600_shader.c >>>>> @@ -8844,8 +8844,8 @@ static const struct r600_shader_tgsi_instruction >>>>> r600_shader_tgsi_instruction[] >>>>> [TGSI_OPCODE_DP3] = { ALU_OP2_DOT4_IEEE, tgsi_dp}, >>>>> [TGSI_OPCODE_DP4] = { ALU_OP2_DOT4_IEEE, tgsi_dp}, >>>>> [TGSI_OPCODE_DST] = { ALU_OP0_NOP, tgsi_opdst}, >>>>> - [TGSI_OPCODE_MIN] = { ALU_OP2_MIN, tgsi_op2}, >>>>> - [TGSI_OPCODE_MAX] = { ALU_OP2_MAX, tgsi_op2}, >>>>> + [TGSI_OPCODE_MIN] = { ALU_OP2_MIN_DX10, tgsi_op2}, >>>>> + [TGSI_OPCODE_MAX] = { ALU_OP2_MAX_DX10, tgsi_op2}, >>>>> [TGSI_OPCODE_SLT] = { ALU_OP2_SETGT, tgsi_op2_swap}, >>>>> [TGSI_OPCODE_SGE] = { ALU_OP2_SETGE, tgsi_op2}, >>>>> [TGSI_OPCODE_MAD] = { ALU_OP3_MULADD_IEEE, tgsi_op3}, >>>>> @@ -9042,8 +9042,8 @@ static const struct r600_shader_tgsi_instruction >>>>> eg_shader_tgsi_instruction[] = >>>>> [TGSI_OPCODE_DP3] = { ALU_OP2_DOT4_IEEE, tgsi_dp}, >>>>> [TGSI_OPCODE_DP4] = { ALU_OP2_DOT4_IEEE, tgsi_dp}, >>>>> [TGSI_OPCODE_DST] = { ALU_OP0_NOP, tgsi_opdst}, >>>>> - [TGSI_OPCODE_MIN] = { ALU_OP2_MIN, tgsi_op2}, >>>>> - [TGSI_OPCODE_MAX] = { ALU_OP2_MAX, tgsi_op2}, >>>>> + [TGSI_OPCODE_MIN] = { ALU_OP2_MIN_DX10, tgsi_op2}, >>>>> + [TGSI_OPCODE_MAX] = { ALU_OP2_MAX_DX10, tgsi_op2}, >>>>> [TGSI_OPCODE_SLT] = { ALU_OP2_SETGT, tgsi_op2_swap}, >>>>> [TGSI_OPCODE_SGE] = { ALU_OP2_SETGE, tgsi_op2}, >>>>> [TGSI_OPCODE_MAD] = { ALU_OP3_MULADD_IEEE, tgsi_op3}, >>>>> @@ -9265,8 +9265,8 @@ static const struct r600_shader_tgsi_instruction >>>>> cm_shader_tgsi_instruction[] = >>>>> [TGSI_OPCODE_DP3] = { ALU_OP2_DOT4_IEEE, tgsi_dp}, >>>>> [TGSI_OPCODE_DP4] = { ALU_OP2_DOT4_IEEE, tgsi_dp}, >>>>> [TGSI_OPCODE_DST] = { ALU_OP0_NOP, tgsi_opdst}, >>>>> - [TGSI_OPCODE_MIN] = { ALU_OP2_MIN, tgsi_op2}, >>>>> - [TGSI_OPCODE_MAX] = { ALU_OP2_MAX, tgsi_op2}, >>>>> + [TGSI_OPCODE_MIN] = { ALU_OP2_MIN_DX10, tgsi_op2}, >>>>> + [TGSI_OPCODE_MAX] = { ALU_OP2_MAX_DX10, tgsi_op2}, >>>>> [TGSI_OPCODE_SLT] = { ALU_OP2_SETGT, tgsi_op2_swap}, >>>>> [TGSI_OPCODE_SGE] = { ALU_OP2_SETGE, tgsi_op2}, >>>>> [TGSI_OPCODE_MAD] = { ALU_OP3_MULADD_IEEE, tgsi_op3}, >>>>> diff --git a/src/gallium/drivers/r600/sb/sb_expr.cpp >>>>> b/src/gallium/drivers/r600/sb/sb_expr.cpp >>>>> index 3dd3a4815b..7a5d62c8e8 100644 >>>>> --- a/src/gallium/drivers/r600/sb/sb_expr.cpp >>>>> +++ b/src/gallium/drivers/r600/sb/sb_expr.cpp >>>>> @@ -753,7 +753,9 @@ bool expr_handler::fold_alu_op2(alu_node& n) { >>>>> n.bc.src[0].abs == n.bc.src[1].abs) { >>>>> switch (n.bc.op) { >>>>> case ALU_OP2_MIN: // (MIN x, x) => (MOV x) >>>>> + case ALU_OP2_MIN_DX10: >>>>> case ALU_OP2_MAX: >>>>> + case ALU_OP2_MAX_DX10: >>>>> convert_to_mov(n, v0, n.bc.src[0].neg, >>>>> n.bc.src[0].abs); >>>>> return fold_alu_op1(n); >>>>> case ALU_OP2_ADD: // (ADD x, x) => (MUL x, 2) >>> >>> _______________________________________________ >>> mesa-dev mailing list >>> mesa-dev@lists.freedesktop.org >>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev >>> >> >> _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev