[clang-tools-extra] [llvm] [clang] [InstCombine] Convert or concat to fshl if opposite or concat exists (PR #68502)
goldsteinn wrote: For my money this was merged prematurely. There are still outstanding concerns about whether this transform is desirable, as well there is an outstanding comment about the implementation itself. I'm fairly agnostic about this code getting in, but I think it should be reverted until some degree of consensus is reached. It's painful to have comments/a PR languish without replies, but don't think the answer is to just push. https://github.com/llvm/llvm-project/pull/68502 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang-tools-extra] [llvm] [clang] [InstCombine] Convert or concat to fshl if opposite or concat exists (PR #68502)
HaohaiWen wrote: I'd like to merge it. Please let me know if you have more concern. https://github.com/llvm/llvm-project/pull/68502 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang-tools-extra] [clang] [InstCombine] Convert or concat to fshl if opposite or concat exists (PR #68502)
HaohaiWen wrote: > > Thanks for the updated example! > > > > To explain what I meant in first comment using this example: We would > > perform the transform https://alive2.llvm.org/ce/z/nllcB_, which does not > > depend at all on how `%yx` is constructed, and whether there is any way to > > form the `fshl` separately. If the `%yx` is appropriately constructed, the > > `fshl` can be removed (https://alive2.llvm.org/ce/z/B_KOwv, another missing > > transform). > > > > Is this not a viable approach? Is there a concern here that generating both > > fshl and bitreverse may be non-profitable for targets without bitreverse? > > Or maybe supporting this makes the matching too expensive? > > It's absolutely a feasible solution. > > -- > > Solution1: > First optimize bitreverse then eliminate redundant fshl: > https://alive2.llvm.org/ce/z/g_gWf3 > This requires > a) First teach collectBitParts to not only search until unknown opcode, but > also try to use itself as root. > b) Teach recognizeBSwapOrBitReverseIdiom to recognize bit pattern [n/2-1, > ..., 1, 0, n-1, n-2, n/2]. Then insert bitreverse and fshl. > c) Teach instcombine to remove redundant fshl if opposite concat exists. This > requires to scan def-users chains. > > Advantage: > 1). Even if we can't eliminate fshl, we can still optimize a bunch of IR to > fshl+bitreverse. Don't know whether its profitable for most targets. > > -- > Solution2: > First optimize or to fshl then optimize bitreverse: > https://alive2.llvm.org/ce/z/WbzJVo > This requires > a) What we did in this PR. This requires to scan def-users chains. > b) same as step a) in Solution 1. > > Advantage: > 1). Can optimize more opposite concat pattern to fshl. It's beneficial for > targets with cycle rotate instruction (e.g. rol in x86). > 2). More easily for implementation. Do not requires step b) in Solution1. > > -- > > Both solutions requires to scan def-users chains. I don't think this is an > issue. > Both solutions can handle my cases. Solution2 is easier to implementation. > Any concern about this PR? > I think b) in solution1 can be implemented in the future if we want both > advantages of solution1 and 2. InstCombine will always first try to match > fshl then bitreverse. Therefore with solution2 and b) of solution1, we don't > need to implent c) in solution1 at all. > > Ref for bitreverse optimization: > https://github.com/llvm/llvm-project/blob/38b34c61e028751b6778493d6185d07a8af1a3b5/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp#L2686 Comments? https://github.com/llvm/llvm-project/pull/68502 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang-tools-extra] [clang] [InstCombine] Convert or concat to fshl if opposite or concat exists (PR #68502)
nikic wrote: Thanks for the updated example! To explain what I meant in first comment using this example: We would perform the transform https://alive2.llvm.org/ce/z/nllcB_, which does not depend at all on how `%yx` is constructed, and whether there is any way to form the `fshl` separately. If the `%yx` is appropriately constructed, the `fshl` can be removed (https://alive2.llvm.org/ce/z/B_KOwv, another missing transform). Is this not a viable approach? Is there a concern here that generating both fshl and bitreverse may be non-profitable for targets without bitreverse? Or maybe supporting this makes the matching too expensive? https://github.com/llvm/llvm-project/pull/68502 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang-tools-extra] [llvm] [clang] [InstCombine] Convert or concat to fshl if opposite or concat exists (PR #68502)
@@ -2727,105 +2727,161 @@ Instruction *InstCombinerImpl::matchBSwapOrBitReverse(Instruction &I, } /// Match UB-safe variants of the funnel shift intrinsic. -static Instruction *matchFunnelShift(Instruction &Or, InstCombinerImpl &IC) { +static Instruction *matchFunnelShift(Instruction &Or, InstCombinerImpl &IC, + const DominatorTree &DT) { // TODO: Can we reduce the code duplication between this and the related // rotate matching code under visitSelect and visitTrunc? unsigned Width = Or.getType()->getScalarSizeInBits(); - // First, find an or'd pair of opposite shifts: - // or (lshr ShVal0, ShAmt0), (shl ShVal1, ShAmt1) - BinaryOperator *Or0, *Or1; - if (!match(Or.getOperand(0), m_BinOp(Or0)) || - !match(Or.getOperand(1), m_BinOp(Or1))) + Instruction *Or0, *Or1; + if (!match(Or.getOperand(0), m_Instruction(Or0)) || + !match(Or.getOperand(1), m_Instruction(Or1))) return nullptr; - Value *ShVal0, *ShVal1, *ShAmt0, *ShAmt1; - if (!match(Or0, m_OneUse(m_LogicalShift(m_Value(ShVal0), m_Value(ShAmt0 || - !match(Or1, m_OneUse(m_LogicalShift(m_Value(ShVal1), m_Value(ShAmt1 || - Or0->getOpcode() == Or1->getOpcode()) -return nullptr; + bool IsFshl = true; // Sub on LSHR. + SmallVector FShiftArgs; - // Canonicalize to or(shl(ShVal0, ShAmt0), lshr(ShVal1, ShAmt1)). - if (Or0->getOpcode() == BinaryOperator::LShr) { -std::swap(Or0, Or1); -std::swap(ShVal0, ShVal1); -std::swap(ShAmt0, ShAmt1); - } - assert(Or0->getOpcode() == BinaryOperator::Shl && - Or1->getOpcode() == BinaryOperator::LShr && - "Illegal or(shift,shift) pair"); - - // Match the shift amount operands for a funnel shift pattern. This always - // matches a subtraction on the R operand. - auto matchShiftAmount = [&](Value *L, Value *R, unsigned Width) -> Value * { -// Check for constant shift amounts that sum to the bitwidth. -const APInt *LI, *RI; -if (match(L, m_APIntAllowUndef(LI)) && match(R, m_APIntAllowUndef(RI))) - if (LI->ult(Width) && RI->ult(Width) && (*LI + *RI) == Width) -return ConstantInt::get(L->getType(), *LI); - -Constant *LC, *RC; -if (match(L, m_Constant(LC)) && match(R, m_Constant(RC)) && -match(L, m_SpecificInt_ICMP(ICmpInst::ICMP_ULT, APInt(Width, Width))) && -match(R, m_SpecificInt_ICMP(ICmpInst::ICMP_ULT, APInt(Width, Width))) && -match(ConstantExpr::getAdd(LC, RC), m_SpecificIntAllowUndef(Width))) - return ConstantExpr::mergeUndefsWith(LC, RC); - -// (shl ShVal, X) | (lshr ShVal, (Width - x)) iff X < Width. -// We limit this to X < Width in case the backend re-expands the intrinsic, -// and has to reintroduce a shift modulo operation (InstCombine might remove -// it after this fold). This still doesn't guarantee that the final codegen -// will match this original pattern. -if (match(R, m_OneUse(m_Sub(m_SpecificInt(Width), m_Specific(L) { - KnownBits KnownL = IC.computeKnownBits(L, /*Depth*/ 0, &Or); - return KnownL.getMaxValue().ult(Width) ? L : nullptr; + // First, find an or'd pair of opposite shifts: + // or (lshr ShVal0, ShAmt0), (shl ShVal1, ShAmt1) + if (isa(Or0) && isa(Or1)) { +Value *ShVal0, *ShVal1, *ShAmt0, *ShAmt1; +if (!match(Or0, + m_OneUse(m_LogicalShift(m_Value(ShVal0), m_Value(ShAmt0 || +!match(Or1, + m_OneUse(m_LogicalShift(m_Value(ShVal1), m_Value(ShAmt1 || +Or0->getOpcode() == Or1->getOpcode()) + return nullptr; + +// Canonicalize to or(shl(ShVal0, ShAmt0), lshr(ShVal1, ShAmt1)). +if (Or0->getOpcode() == BinaryOperator::LShr) { + std::swap(Or0, Or1); + std::swap(ShVal0, ShVal1); + std::swap(ShAmt0, ShAmt1); } +assert(Or0->getOpcode() == BinaryOperator::Shl && + Or1->getOpcode() == BinaryOperator::LShr && + "Illegal or(shift,shift) pair"); + +// Match the shift amount operands for a funnel shift pattern. This always +// matches a subtraction on the R operand. +auto matchShiftAmount = [&](Value *L, Value *R, unsigned Width) -> Value * { + // Check for constant shift amounts that sum to the bitwidth. + const APInt *LI, *RI; + if (match(L, m_APIntAllowUndef(LI)) && match(R, m_APIntAllowUndef(RI))) +if (LI->ult(Width) && RI->ult(Width) && (*LI + *RI) == Width) + return ConstantInt::get(L->getType(), *LI); + + Constant *LC, *RC; + if (match(L, m_Constant(LC)) && match(R, m_Constant(RC)) && + match(L, +m_SpecificInt_ICMP(ICmpInst::ICMP_ULT, APInt(Width, Width))) && + match(R, +m_SpecificInt_ICMP(ICmpInst::ICMP_ULT, APInt(Width, Width))) && + match(ConstantExpr::getAdd(LC, RC), m_SpecificIntAllowUndef(Width))) +return ConstantExpr::mergeUndefsWith(LC, RC); + + // (shl ShVal, X) | (lshr ShVal, (Width - x)) iff X < Width. +
[clang-tools-extra] [llvm] [clang] [InstCombine] Convert or concat to fshl if opposite or concat exists (PR #68502)
HaohaiWen wrote: > Yes, I understand that this transform is only a step towards handling the > full pattern. I'm asking for a complete, working example of the original > motivating case. The snippets posted in [#68502 > (comment)](https://github.com/llvm/llvm-project/pull/68502#discussion_r1351618002) > do not appear to be correct, or I failed to assemble them correctly. Please > provide complete src and tgt functions that verify with alive2. Oh... sorry, I made a wrong IR example. The real case looks like this: https://alive2.llvm.org/ce/z/-DXnJc Both %x and %y are i16. The first or/fshl swaps half of of i32. The rest swaps byte, 4bit, 2bit. I extended the fshl transformation to apply for both symmetric and asymmetric combination. https://github.com/llvm/llvm-project/pull/68502 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang-tools-extra] [llvm] [clang] [InstCombine] Convert or concat to fshl if opposite or concat exists (PR #68502)
HaohaiWen wrote: I'd like to merge it if no any more comments. https://github.com/llvm/llvm-project/pull/68502 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang-tools-extra] [llvm] [clang] [InstCombine] Convert or concat to fshl if opposite or concat exists (PR #68502)
HaohaiWen wrote: > > Any more comments? I'd like to merge it if no objection. > > I think there are outstanding objections, or at least blocking concerns from > nikic. gentle ping @nikic https://github.com/llvm/llvm-project/pull/68502 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [InstCombine] Convert or concat to fshl if opposite or concat exists (PR #68502)
goldsteinn wrote: > Any more comments? I'd like to merge it if no objection. I think there are outstanding objections, or at least blocking concerns from nikic. https://github.com/llvm/llvm-project/pull/68502 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [InstCombine] Convert or concat to fshl if opposite or concat exists (PR #68502)
HaohaiWen wrote: Any more comments? I'd like to merge it if no objection. https://github.com/llvm/llvm-project/pull/68502 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [InstCombine] Convert or concat to fshl if opposite or concat exists (PR #68502)
nikic wrote: > See most recent comments about missing tests. That being said code looks > functionally correct to me. Still not 100% sure this is a desirable change. > Will defer to @nikic about that. I'm also skeptical about accepting this optimization. Looking at the motivating case in https://github.com/llvm/llvm-project/pull/68502#discussion_r1351618002, this seems like a bad approach to the problem: It means that in order to fold the pattern to `bitreverse(%xy)`, we must just so happen to have the right `%xy` lying around in the IR, even though it doesn't have any relation to the main pattern (it's not used inside it, just injected via an extra use). It sounds to me like the better way to handle that case would be to support matching a variant of plain bitreverse in the form of `bitreverse(rot(%yx))`. Replacing the `rot(%yx)` by `%xy` is then an extra optimization opportunity, but it's no longer a precondition to performing the transform. https://github.com/llvm/llvm-project/pull/68502 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [InstCombine] Convert or concat to fshl if opposite or concat exists (PR #68502)
goldsteinn wrote: See most recent comments about missing tests. That being said code looks functionally correct to me. Still not 100% sure this is a desirable change. Will defer to @nikic about that. https://github.com/llvm/llvm-project/pull/68502 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [InstCombine] Convert or concat to fshl if opposite or concat exists (PR #68502)
@@ -2840,6 +2841,60 @@ static Instruction *matchFunnelShift(Instruction &Or, InstCombinerImpl &IC) { return nullptr; FShiftArgs = {ShVal0, ShVal1, ShAmt}; + } else if (isa(Or0) || isa(Or1)) { +// If there are two 'or' instructions concat variables in opposite order: +// +// Slot1 and Slot2 are all zero bits. +// | Slot1 | Low | Slot2 | High | +// LowHigh = or (shl (zext Low), ZextLowShlAmt), (zext High) +// | Slot2 | High | Slot1 | Low | +// HighLow = or (shl (zext High), ZextHighShlAmt), (zext Low) +// +// the latter 'or' can be safely convert to +// -> HighLow = fshl LowHigh, LowHigh, ZextHighShlAmt +// if ZextLowShlAmt + ZextHighShlAmt == Width. +if (!isa(Or1)) + std::swap(Or0, Or1); + +Value *High, *ZextHigh, *Low; +const APInt *ZextHighShlAmt; +if (!match(Or0, + m_OneUse(m_Shl(m_Value(ZextHigh), m_APInt(ZextHighShlAmt) + return nullptr; + +if (!match(Or1, m_ZExt(m_Value(Low))) || +!match(ZextHigh, m_ZExt(m_Value(High + return nullptr; + +unsigned HighSize = High->getType()->getScalarSizeInBits(); +unsigned LowSize = Low->getType()->getScalarSizeInBits(); +// Make sure High does not overlap with Low and most significant bits of +// High aren't shifted out. +if (ZextHighShlAmt->ult(LowSize) || ZextHighShlAmt->ugt(Width - HighSize)) + return nullptr; + +for (User *U : ZextHigh->users()) { + Value *X, *Y; + if (!match(U, m_Or(m_Value(X), m_Value(Y +continue; + + if (!isa(Y)) +std::swap(X, Y); + + const APInt *ZextLowShlAmt; + if (!match(X, m_Shl(m_Specific(Or1), m_APInt(ZextLowShlAmt))) || + !match(Y, m_Specific(ZextHigh)) || !DT.dominates(U, &Or)) +continue; + + // Make sure Low does not overlap with High and most significant bits of + // Low aren't shifted out and we can rotate shift LowHigh to HighLow. + if (ZextLowShlAmt->ult(HighSize) || ZextLowShlAmt->ugt(Width - LowSize) || + *ZextLowShlAmt + *ZextHighShlAmt != Width) goldsteinn wrote: Think you are missing negative test for this (also for non-dominating). https://github.com/llvm/llvm-project/pull/68502 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [InstCombine] Convert or concat to fshl if opposite or concat exists (PR #68502)
@@ -2840,6 +2841,60 @@ static Instruction *matchFunnelShift(Instruction &Or, InstCombinerImpl &IC) { return nullptr; FShiftArgs = {ShVal0, ShVal1, ShAmt}; + } else if (isa(Or0) || isa(Or1)) { +// If there are two 'or' instructions concat variables in opposite order: +// +// Slot1 and Slot2 are all zero bits. +// | Slot1 | Low | Slot2 | High | +// LowHigh = or (shl (zext Low), ZextLowShlAmt), (zext High) +// | Slot2 | High | Slot1 | Low | +// HighLow = or (shl (zext High), ZextHighShlAmt), (zext Low) +// +// the latter 'or' can be safely convert to +// -> HighLow = fshl LowHigh, LowHigh, ZextHighShlAmt +// if ZextLowShlAmt + ZextHighShlAmt == Width. +if (!isa(Or1)) + std::swap(Or0, Or1); + +Value *High, *ZextHigh, *Low; +const APInt *ZextHighShlAmt; +if (!match(Or0, + m_OneUse(m_Shl(m_Value(ZextHigh), m_APInt(ZextHighShlAmt) + return nullptr; + +if (!match(Or1, m_ZExt(m_Value(Low))) || +!match(ZextHigh, m_ZExt(m_Value(High + return nullptr; + +unsigned HighSize = High->getType()->getScalarSizeInBits(); +unsigned LowSize = Low->getType()->getScalarSizeInBits(); +// Make sure High does not overlap with Low and most significant bits of +// High aren't shifted out. +if (ZextHighShlAmt->ult(LowSize) || ZextHighShlAmt->ugt(Width - HighSize)) goldsteinn wrote: Think you are missing a negative test for this. https://github.com/llvm/llvm-project/pull/68502 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [InstCombine] Convert or concat to fshl if opposite or concat exists (PR #68502)
@@ -2840,6 +2841,46 @@ static Instruction *matchFunnelShift(Instruction &Or, InstCombinerImpl &IC) { return nullptr; FShiftArgs = {ShVal0, ShVal1, ShAmt}; + } else if (isa(Or0) || isa(Or1)) { +// If there are two 'or' instructions concat variables in opposite order, +// the latter one can be safely convert to fshl. +// +// LowHigh = or (shl (zext Low), Width - ZextHighShlAmt), (zext High) +// HighLow = or (shl (zext High), ZextHighShlAmt), (zext Low) +// -> +// HighLow = fshl LowHigh, LowHigh, ZextHighShlAmt +if (!isa(Or1)) + std::swap(Or0, Or1); + +Value *High, *ZextHigh, *Low; +const APInt *ZextHighShlAmt; +if (!match(Or0, + m_OneUse(m_Shl(m_Value(ZextHigh), m_APInt(ZextHighShlAmt) + return nullptr; + +if (!match(Or1, m_ZExt(m_Value(Low))) || +!match(ZextHigh, m_ZExt(m_Value(High goldsteinn wrote: Ahh, I see, although could handle masking `and`. Really its just a knownbits check though. https://github.com/llvm/llvm-project/pull/68502 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [InstCombine] Convert or concat to fshl if opposite or concat exists (PR #68502)
@@ -354,6 +354,48 @@ define <2 x i64> @fshl_select_vector(<2 x i64> %x, <2 x i64> %y, <2 x i64> %sham ret <2 x i64> %r } +; Convert 'or concat' to fshl if opposite 'or concat' exists. + +define i32 @fshl_concat(i8 %x, i24 %y, ptr %addr) { +; CHECK-LABEL: @fshl_concat( +; CHECK-NEXT:[[ZEXT_X:%.*]] = zext i8 [[X:%.*]] to i32 +; CHECK-NEXT:[[SLX:%.*]] = shl nuw i32 [[ZEXT_X]], 24 +; CHECK-NEXT:[[ZEXT_Y:%.*]] = zext i24 [[Y:%.*]] to i32 +; CHECK-NEXT:[[XY:%.*]] = or i32 [[SLX]], [[ZEXT_Y]] +; CHECK-NEXT:store i32 [[XY]], ptr [[ADDR:%.*]], align 4 +; CHECK-NEXT:[[YX:%.*]] = call i32 @llvm.fshl.i32(i32 [[XY]], i32 [[XY]], i32 8) +; CHECK-NEXT:ret i32 [[YX]] +; + %zext.x = zext i8 %x to i32 + %slx = shl nuw i32 %zext.x, 24 + %zext.y = zext i24 %y to i32 + %xy = or i32 %zext.y, %slx + store i32 %xy, ptr %addr, align 4 + %sly = shl nuw i32 %zext.y, 8 + %yx = or i32 %zext.x, %sly + ret i32 %yx +} + +define <2 x i32> @fshl_concat_vector(<2 x i8> %x, <2 x i24> %y, ptr %addr) { +; CHECK-LABEL: @fshl_concat_vector( +; CHECK-NEXT:[[ZEXT_X:%.*]] = zext <2 x i8> [[X:%.*]] to <2 x i32> +; CHECK-NEXT:[[SLX:%.*]] = shl nuw <2 x i32> [[ZEXT_X]], +; CHECK-NEXT:[[ZEXT_Y:%.*]] = zext <2 x i24> [[Y:%.*]] to <2 x i32> +; CHECK-NEXT:[[XY:%.*]] = or <2 x i32> [[SLX]], [[ZEXT_Y]] +; CHECK-NEXT:store <2 x i32> [[XY]], ptr [[ADDR:%.*]], align 4 +; CHECK-NEXT:[[YX:%.*]] = call <2 x i32> @llvm.fshl.v2i32(<2 x i32> [[XY]], <2 x i32> [[XY]], <2 x i32> ) +; CHECK-NEXT:ret <2 x i32> [[YX]] +; + %zext.x = zext <2 x i8> %x to <2 x i32> + %slx = shl nuw <2 x i32> %zext.x, + %zext.y = zext <2 x i24> %y to <2 x i32> + %xy = or <2 x i32> %slx, %zext.y + store <2 x i32> %xy, ptr %addr, align 4 + %sly = shl nuw <2 x i32> %zext.y, + %yx = or <2 x i32> %sly, %zext.x + ret <2 x i32> %yx +} HaohaiWen wrote: Done. https://github.com/llvm/llvm-project/pull/68502 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [InstCombine] Convert or concat to fshl if opposite or concat exists (PR #68502)
@@ -2840,6 +2841,46 @@ static Instruction *matchFunnelShift(Instruction &Or, InstCombinerImpl &IC) { return nullptr; FShiftArgs = {ShVal0, ShVal1, ShAmt}; + } else if (isa(Or0) || isa(Or1)) { +// If there are two 'or' instructions concat variables in opposite order, +// the latter one can be safely convert to fshl. +// +// LowHigh = or (shl (zext Low), Width - ZextHighShlAmt), (zext High) +// HighLow = or (shl (zext High), ZextHighShlAmt), (zext Low) +// -> +// HighLow = fshl LowHigh, LowHigh, ZextHighShlAmt +if (!isa(Or1)) + std::swap(Or0, Or1); + +Value *High, *ZextHigh, *Low; +const APInt *ZextHighShlAmt; +if (!match(Or0, + m_OneUse(m_Shl(m_Value(ZextHigh), m_APInt(ZextHighShlAmt) + return nullptr; + +if (!match(Or1, m_ZExt(m_Value(Low))) || +!match(ZextHigh, m_ZExt(m_Value(High HaohaiWen wrote: If we only match (shl i32, 8), we can't guarantee it's reverse concat since we don't know if the most significant 8bit in i32 is zero. ``` %zext.x = zext i8 %x to i32 %slx = shl nuw i32 %zext.x, 24 %zext.y = zext i24 %y to i32 %xy = or i32 %zext.y, %slx#[x[7:0], y[23:0]] %sly = shl nuw i32 %zext.y, 8 %yx = or i32 %zext.x, %sly#[y[23:0], x[7:0]] ``` If not match zext: ``` %zext.x = zext i8 %x to i32 %slx = shl nuw i32 %zext.x, 24 %xy = or i32 %y, %slx #[unknown, y[23,0]] y[31:24] may not be zero. %sly = shl nuw i32 %y, 8 #[y[23:0], 0,0,0,0,0,0,0,0] %yx = or i32 %zext.x, %sly #[y[23:0], x[7:0]] ``` https://github.com/llvm/llvm-project/pull/68502 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [InstCombine] Convert or concat to fshl if opposite or concat exists (PR #68502)
https://github.com/HaohaiWen updated https://github.com/llvm/llvm-project/pull/68502 >From 5b3b1bbb5b263bc5711adde031d85b1461ccbab6 Mon Sep 17 00:00:00 2001 From: Haohai Wen Date: Sat, 7 Oct 2023 13:48:32 +0800 Subject: [PATCH 1/3] [InstCombine] Refactor matchFunnelShift to allow more pattern (NFC) Current implementation of matchFunnelShift only allows opposite shift pattern. Refactor it to allow more pattern. --- .../InstCombine/InstCombineAndOrXor.cpp | 172 ++ 1 file changed, 93 insertions(+), 79 deletions(-) diff --git a/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp b/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp index cbdab3e9c5fb91d..b04e070fd19d7d1 100644 --- a/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp +++ b/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp @@ -2732,100 +2732,114 @@ static Instruction *matchFunnelShift(Instruction &Or, InstCombinerImpl &IC) { // rotate matching code under visitSelect and visitTrunc? unsigned Width = Or.getType()->getScalarSizeInBits(); - // First, find an or'd pair of opposite shifts: - // or (lshr ShVal0, ShAmt0), (shl ShVal1, ShAmt1) - BinaryOperator *Or0, *Or1; - if (!match(Or.getOperand(0), m_BinOp(Or0)) || - !match(Or.getOperand(1), m_BinOp(Or1))) -return nullptr; - - Value *ShVal0, *ShVal1, *ShAmt0, *ShAmt1; - if (!match(Or0, m_OneUse(m_LogicalShift(m_Value(ShVal0), m_Value(ShAmt0 || - !match(Or1, m_OneUse(m_LogicalShift(m_Value(ShVal1), m_Value(ShAmt1 || - Or0->getOpcode() == Or1->getOpcode()) + Instruction *Or0, *Or1; + if (!match(Or.getOperand(0), m_Instruction(Or0)) || + !match(Or.getOperand(1), m_Instruction(Or1))) return nullptr; - // Canonicalize to or(shl(ShVal0, ShAmt0), lshr(ShVal1, ShAmt1)). - if (Or0->getOpcode() == BinaryOperator::LShr) { -std::swap(Or0, Or1); -std::swap(ShVal0, ShVal1); -std::swap(ShAmt0, ShAmt1); - } - assert(Or0->getOpcode() == BinaryOperator::Shl && - Or1->getOpcode() == BinaryOperator::LShr && - "Illegal or(shift,shift) pair"); - - // Match the shift amount operands for a funnel shift pattern. This always - // matches a subtraction on the R operand. - auto matchShiftAmount = [&](Value *L, Value *R, unsigned Width) -> Value * { -// Check for constant shift amounts that sum to the bitwidth. -const APInt *LI, *RI; -if (match(L, m_APIntAllowUndef(LI)) && match(R, m_APIntAllowUndef(RI))) - if (LI->ult(Width) && RI->ult(Width) && (*LI + *RI) == Width) -return ConstantInt::get(L->getType(), *LI); - -Constant *LC, *RC; -if (match(L, m_Constant(LC)) && match(R, m_Constant(RC)) && -match(L, m_SpecificInt_ICMP(ICmpInst::ICMP_ULT, APInt(Width, Width))) && -match(R, m_SpecificInt_ICMP(ICmpInst::ICMP_ULT, APInt(Width, Width))) && -match(ConstantExpr::getAdd(LC, RC), m_SpecificIntAllowUndef(Width))) - return ConstantExpr::mergeUndefsWith(LC, RC); - -// (shl ShVal, X) | (lshr ShVal, (Width - x)) iff X < Width. -// We limit this to X < Width in case the backend re-expands the intrinsic, -// and has to reintroduce a shift modulo operation (InstCombine might remove -// it after this fold). This still doesn't guarantee that the final codegen -// will match this original pattern. -if (match(R, m_OneUse(m_Sub(m_SpecificInt(Width), m_Specific(L) { - KnownBits KnownL = IC.computeKnownBits(L, /*Depth*/ 0, &Or); - return KnownL.getMaxValue().ult(Width) ? L : nullptr; -} + bool IsFshl = true; // Sub on LSHR. + SmallVector FShiftArgs; -// For non-constant cases, the following patterns currently only work for -// rotation patterns. -// TODO: Add general funnel-shift compatible patterns. -if (ShVal0 != ShVal1) + // First, find an or'd pair of opposite shifts: + // or (lshr ShVal0, ShAmt0), (shl ShVal1, ShAmt1) + if (isa(Or0) && isa(Or1)) { +Value *ShVal0, *ShVal1, *ShAmt0, *ShAmt1; +if (!match(Or0, + m_OneUse(m_LogicalShift(m_Value(ShVal0), m_Value(ShAmt0 || +!match(Or1, + m_OneUse(m_LogicalShift(m_Value(ShVal1), m_Value(ShAmt1 || +Or0->getOpcode() == Or1->getOpcode()) return nullptr; -// For non-constant cases we don't support non-pow2 shift masks. -// TODO: Is it worth matching urem as well? -if (!isPowerOf2_32(Width)) - return nullptr; +// Canonicalize to or(shl(ShVal0, ShAmt0), lshr(ShVal1, ShAmt1)). +if (Or0->getOpcode() == BinaryOperator::LShr) { + std::swap(Or0, Or1); + std::swap(ShVal0, ShVal1); + std::swap(ShAmt0, ShAmt1); +} +assert(Or0->getOpcode() == BinaryOperator::Shl && + Or1->getOpcode() == BinaryOperator::LShr && + "Illegal or(shift,shift) pair"); + +// Match the shift amount operands for a funnel shift pattern. This always +// matches a subtraction on the R operand. +auto matchShiftAmount = [&](Value *L
[clang] [InstCombine] Convert or concat to fshl if opposite or concat exists (PR #68502)
@@ -2840,6 +2841,46 @@ static Instruction *matchFunnelShift(Instruction &Or, InstCombinerImpl &IC) { return nullptr; FShiftArgs = {ShVal0, ShVal1, ShAmt}; + } else if (isa(Or0) || isa(Or1)) { +// If there are two 'or' instructions concat variables in opposite order, +// the latter one can be safely convert to fshl. +// +// LowHigh = or (shl (zext Low), Width - ZextHighShlAmt), (zext High) +// HighLow = or (shl (zext High), ZextHighShlAmt), (zext Low) +// -> +// HighLow = fshl LowHigh, LowHigh, ZextHighShlAmt +if (!isa(Or1)) + std::swap(Or0, Or1); + +Value *High, *ZextHigh, *Low; +const APInt *ZextHighShlAmt; +if (!match(Or0, + m_OneUse(m_Shl(m_Value(ZextHigh), m_APInt(ZextHighShlAmt) + return nullptr; + +if (!match(Or1, m_ZExt(m_Value(Low))) || +!match(ZextHigh, m_ZExt(m_Value(High goldsteinn wrote: Do you actually need the shifted value to be `zext`? I.e if you have `(shl (zext i24 to i32), 8)` that is bitwise equivilent to `(shl i32, 8)` https://github.com/llvm/llvm-project/pull/68502 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [InstCombine] Convert or concat to fshl if opposite or concat exists (PR #68502)
@@ -354,6 +354,48 @@ define <2 x i64> @fshl_select_vector(<2 x i64> %x, <2 x i64> %y, <2 x i64> %sham ret <2 x i64> %r } +; Convert 'or concat' to fshl if opposite 'or concat' exists. + +define i32 @fshl_concat(i8 %x, i24 %y, ptr %addr) { +; CHECK-LABEL: @fshl_concat( +; CHECK-NEXT:[[ZEXT_X:%.*]] = zext i8 [[X:%.*]] to i32 +; CHECK-NEXT:[[SLX:%.*]] = shl nuw i32 [[ZEXT_X]], 24 +; CHECK-NEXT:[[ZEXT_Y:%.*]] = zext i24 [[Y:%.*]] to i32 +; CHECK-NEXT:[[XY:%.*]] = or i32 [[SLX]], [[ZEXT_Y]] +; CHECK-NEXT:store i32 [[XY]], ptr [[ADDR:%.*]], align 4 +; CHECK-NEXT:[[YX:%.*]] = call i32 @llvm.fshl.i32(i32 [[XY]], i32 [[XY]], i32 8) +; CHECK-NEXT:ret i32 [[YX]] +; + %zext.x = zext i8 %x to i32 + %slx = shl nuw i32 %zext.x, 24 + %zext.y = zext i24 %y to i32 + %xy = or i32 %zext.y, %slx + store i32 %xy, ptr %addr, align 4 + %sly = shl nuw i32 %zext.y, 8 + %yx = or i32 %zext.x, %sly + ret i32 %yx +} + +define <2 x i32> @fshl_concat_vector(<2 x i8> %x, <2 x i24> %y, ptr %addr) { +; CHECK-LABEL: @fshl_concat_vector( +; CHECK-NEXT:[[ZEXT_X:%.*]] = zext <2 x i8> [[X:%.*]] to <2 x i32> +; CHECK-NEXT:[[SLX:%.*]] = shl nuw <2 x i32> [[ZEXT_X]], +; CHECK-NEXT:[[ZEXT_Y:%.*]] = zext <2 x i24> [[Y:%.*]] to <2 x i32> +; CHECK-NEXT:[[XY:%.*]] = or <2 x i32> [[SLX]], [[ZEXT_Y]] +; CHECK-NEXT:store <2 x i32> [[XY]], ptr [[ADDR:%.*]], align 4 +; CHECK-NEXT:[[YX:%.*]] = call <2 x i32> @llvm.fshl.v2i32(<2 x i32> [[XY]], <2 x i32> [[XY]], <2 x i32> ) +; CHECK-NEXT:ret <2 x i32> [[YX]] +; + %zext.x = zext <2 x i8> %x to <2 x i32> + %slx = shl nuw <2 x i32> %zext.x, + %zext.y = zext <2 x i24> %y to <2 x i32> + %xy = or <2 x i32> %slx, %zext.y + store <2 x i32> %xy, ptr %addr, align 4 + %sly = shl nuw <2 x i32> %zext.y, + %yx = or <2 x i32> %sly, %zext.x + ret <2 x i32> %yx +} goldsteinn wrote: Can you also add an `i8`/`i8` test? Can you also add some todo/negative tests? Missing `zext` on low/high. shift amount not matching. https://github.com/llvm/llvm-project/pull/68502 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [InstCombine] Convert or concat to fshl if opposite or concat exists (PR #68502)
https://github.com/goldsteinn edited https://github.com/llvm/llvm-project/pull/68502 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [InstCombine] Convert or concat to fshl if opposite or concat exists (PR #68502)
https://github.com/goldsteinn edited https://github.com/llvm/llvm-project/pull/68502 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [InstCombine] Convert or concat to fshl if opposite or concat exists (PR #68502)
@@ -2840,6 +2841,46 @@ static Instruction *matchFunnelShift(Instruction &Or, InstCombinerImpl &IC) { return nullptr; FShiftArgs = {ShVal0, ShVal1, ShAmt}; + } else if (isa(Or0) || isa(Or1)) { +// If there are two 'or' instructions concat variables in opposite order, +// the latter one can be safely convert to fshl. +// +// LowHigh = or (shl (zext Low), Width - ZextHighShlAmt), (zext High) +// HighLow = or (shl (zext High), ZextHighShlAmt), (zext Low) +// -> +// HighLow = fshl LowHigh, LowHigh, ZextHighShlAmt goldsteinn wrote: Comment doesn't seem to match the code. Seem to matching: `LowHigh = (zext (or Shl, Low, Width-ZExtHighShlAmt), High))`, (likewise for high). Not `zext` on the `or` operands. https://github.com/llvm/llvm-project/pull/68502 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [InstCombine] Convert or concat to fshl if opposite or concat exists (PR #68502)
https://github.com/HaohaiWen updated https://github.com/llvm/llvm-project/pull/68502 >From 5b3b1bbb5b263bc5711adde031d85b1461ccbab6 Mon Sep 17 00:00:00 2001 From: Haohai Wen Date: Sat, 7 Oct 2023 13:48:32 +0800 Subject: [PATCH 1/2] [InstCombine] Refactor matchFunnelShift to allow more pattern (NFC) Current implementation of matchFunnelShift only allows opposite shift pattern. Refactor it to allow more pattern. --- .../InstCombine/InstCombineAndOrXor.cpp | 172 ++ 1 file changed, 93 insertions(+), 79 deletions(-) diff --git a/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp b/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp index cbdab3e9c5fb91d..b04e070fd19d7d1 100644 --- a/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp +++ b/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp @@ -2732,100 +2732,114 @@ static Instruction *matchFunnelShift(Instruction &Or, InstCombinerImpl &IC) { // rotate matching code under visitSelect and visitTrunc? unsigned Width = Or.getType()->getScalarSizeInBits(); - // First, find an or'd pair of opposite shifts: - // or (lshr ShVal0, ShAmt0), (shl ShVal1, ShAmt1) - BinaryOperator *Or0, *Or1; - if (!match(Or.getOperand(0), m_BinOp(Or0)) || - !match(Or.getOperand(1), m_BinOp(Or1))) -return nullptr; - - Value *ShVal0, *ShVal1, *ShAmt0, *ShAmt1; - if (!match(Or0, m_OneUse(m_LogicalShift(m_Value(ShVal0), m_Value(ShAmt0 || - !match(Or1, m_OneUse(m_LogicalShift(m_Value(ShVal1), m_Value(ShAmt1 || - Or0->getOpcode() == Or1->getOpcode()) + Instruction *Or0, *Or1; + if (!match(Or.getOperand(0), m_Instruction(Or0)) || + !match(Or.getOperand(1), m_Instruction(Or1))) return nullptr; - // Canonicalize to or(shl(ShVal0, ShAmt0), lshr(ShVal1, ShAmt1)). - if (Or0->getOpcode() == BinaryOperator::LShr) { -std::swap(Or0, Or1); -std::swap(ShVal0, ShVal1); -std::swap(ShAmt0, ShAmt1); - } - assert(Or0->getOpcode() == BinaryOperator::Shl && - Or1->getOpcode() == BinaryOperator::LShr && - "Illegal or(shift,shift) pair"); - - // Match the shift amount operands for a funnel shift pattern. This always - // matches a subtraction on the R operand. - auto matchShiftAmount = [&](Value *L, Value *R, unsigned Width) -> Value * { -// Check for constant shift amounts that sum to the bitwidth. -const APInt *LI, *RI; -if (match(L, m_APIntAllowUndef(LI)) && match(R, m_APIntAllowUndef(RI))) - if (LI->ult(Width) && RI->ult(Width) && (*LI + *RI) == Width) -return ConstantInt::get(L->getType(), *LI); - -Constant *LC, *RC; -if (match(L, m_Constant(LC)) && match(R, m_Constant(RC)) && -match(L, m_SpecificInt_ICMP(ICmpInst::ICMP_ULT, APInt(Width, Width))) && -match(R, m_SpecificInt_ICMP(ICmpInst::ICMP_ULT, APInt(Width, Width))) && -match(ConstantExpr::getAdd(LC, RC), m_SpecificIntAllowUndef(Width))) - return ConstantExpr::mergeUndefsWith(LC, RC); - -// (shl ShVal, X) | (lshr ShVal, (Width - x)) iff X < Width. -// We limit this to X < Width in case the backend re-expands the intrinsic, -// and has to reintroduce a shift modulo operation (InstCombine might remove -// it after this fold). This still doesn't guarantee that the final codegen -// will match this original pattern. -if (match(R, m_OneUse(m_Sub(m_SpecificInt(Width), m_Specific(L) { - KnownBits KnownL = IC.computeKnownBits(L, /*Depth*/ 0, &Or); - return KnownL.getMaxValue().ult(Width) ? L : nullptr; -} + bool IsFshl = true; // Sub on LSHR. + SmallVector FShiftArgs; -// For non-constant cases, the following patterns currently only work for -// rotation patterns. -// TODO: Add general funnel-shift compatible patterns. -if (ShVal0 != ShVal1) + // First, find an or'd pair of opposite shifts: + // or (lshr ShVal0, ShAmt0), (shl ShVal1, ShAmt1) + if (isa(Or0) && isa(Or1)) { +Value *ShVal0, *ShVal1, *ShAmt0, *ShAmt1; +if (!match(Or0, + m_OneUse(m_LogicalShift(m_Value(ShVal0), m_Value(ShAmt0 || +!match(Or1, + m_OneUse(m_LogicalShift(m_Value(ShVal1), m_Value(ShAmt1 || +Or0->getOpcode() == Or1->getOpcode()) return nullptr; -// For non-constant cases we don't support non-pow2 shift masks. -// TODO: Is it worth matching urem as well? -if (!isPowerOf2_32(Width)) - return nullptr; +// Canonicalize to or(shl(ShVal0, ShAmt0), lshr(ShVal1, ShAmt1)). +if (Or0->getOpcode() == BinaryOperator::LShr) { + std::swap(Or0, Or1); + std::swap(ShVal0, ShVal1); + std::swap(ShAmt0, ShAmt1); +} +assert(Or0->getOpcode() == BinaryOperator::Shl && + Or1->getOpcode() == BinaryOperator::LShr && + "Illegal or(shift,shift) pair"); + +// Match the shift amount operands for a funnel shift pattern. This always +// matches a subtraction on the R operand. +auto matchShiftAmount = [&](Value *L