[clang] [CIR][AArch64] Upstream vector-shift-right-and-insert NEON builtins (PR #196776)

Vicky Nguyen via cfe-commits Mon, 25 May 2026 12:53:43 -0700

================
@@ -369,12 +369,96 @@ static mlir::Value emitCommonNeonSISDBuiltinExpr(
   case NEON::BI__builtin_neon_vmaxv_f32:
   case NEON::BI__builtin_neon_vmaxvq_f32:
   case NEON::BI__builtin_neon_vmaxvq_f64:
-    return emitNeonCall(cgf.cgm, cgf.getBuilder(),
-                        {cgf.convertType(expr->getArg(0)->getType())}, ops,
-                        llvmIntrName, cgf.convertType(expr->getType()), loc);
+  case NEON::BI__builtin_neon_vsrid_n_s64:
+  case NEON::BI__builtin_neon_vsrid_n_u64:
+    break;
   }
 
-  return nullptr;
+  // Generic handling based on TypeModifier flags, mirroring
+  // EmitCommonNeonSISDBuiltinExpr + LookupNeonLLVMIntrinsic in ARM.cpp.
+  //
+  // The TypeModifier encodes how the intrinsic's argument and return types
+  // relate to the builtin's scalar types. For SISD builtins the key flags
+  // are:
+  //   - VectorizeArgTypes: wrap each arg type into a fixed-width vector
+  //   - Use64BitVectors / Use128BitVectors: choose the vector width
+  //     (when neither is set the vector has 1 element)
+  //   - AddRetType / VectorizeRetType: analogous flags for the return type
+  //
+  // ARM.cpp doesn't need to know about specific builtins like
+  // `vsrid_n_{s,u}64` because it lets LLVM resolve the intrinsic's
+  // signature (via `CGM.getIntrinsic`) and then walks the resolved
+  // Function* formal parameter types. CIR has no LLVMContext here, so
+  // we derive the same argument/result types directly from the Clang
+  // operand types.
+
+  unsigned modifier = info.TypeModifier;
+  CIRGenBuilderTy &builder = cgf.getBuilder();
+  mlir::Type argTy = cgf.convertType(expr->getArg(0)->getType());
+  mlir::Type resultTy = cgf.convertType(expr->getType());
+
+  int vectorSize = 0;
+  if (modifier & Use64BitVectors)
+    vectorSize = 64;
+  else if (modifier & Use128BitVectors)
+    vectorSize = 128;
+
+  auto wrapAsVector = [&](mlir::Type ty) -> cir::VectorType {
+    unsigned bits = cgf.cgm.getDataLayout().getTypeSizeInBits(ty);
+    unsigned elts = vectorSize ? vectorSize / bits : 1;
+    return cir::VectorType::get(ty, elts);
+  };
+
+  // Determine the vectorized data type.
+  cir::VectorType vecArgTy;
+  if (modifier & VectorizeArgTypes)
+    vecArgTy = wrapAsVector(argTy);
+
+  // Determine the intrinsic result type: `VectorizeRetType` returns a
+  // vector; otherwise, if data args are vectorized and `AddRetType` is
+  // unset, use a vector return with the same shape as those args.
+  mlir::Type funcResTy = resultTy;
+  if (modifier & VectorizeRetType)
+    funcResTy = wrapAsVector(resultTy);
+  else if (vecArgTy && !(modifier & AddRetType))
+    funcResTy = wrapAsVector(resultTy);
+
+  // Build the arg types for `emitNeonCall`.
+  llvm::SmallVector<mlir::Type> argTypes;
+  argTypes.reserve(ops.size());
+  for (mlir::Value op : ops) {
+    if (vecArgTy && op.getType() == argTy)
+      argTypes.push_back(vecArgTy);
+    else
+      argTypes.push_back(op.getType());
+  }
----------------
iamvickynguyen wrote:


You're right to question this! It isn't obvious at first glance :smile:

So far in this PR, only the `vsri*` builtins need a bitcast. For those, we need to loop
over all the operands and check their types: `vsri*` has one overloaded type but two
operands of that type, and both need to be bitcast.
https://github.com/llvm/llvm-project/blob/e7887d5470e080c55fd69de83786fbf060d3955a/llvm/include/llvm/IR/IntrinsicsAArch64.td#L433-L433

https://github.com/llvm/llvm-project/blob/e7887d5470e080c55fd69de83786fbf060d3955a/llvm/include/llvm/IR/IntrinsicsAArch64.td#L200-L203
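To make the "one overloaded type, two operands" point concrete, here is a minimal sketch of the per-operand mapping that the loop in the quoted hunk performs. `ToyType` and `mapOperandTypes` are hypothetical names standing in for `mlir::Type` and the loop body. The sketch assumes `vsrid_n_s64` reaches this code with operands `i64, i64, i32`: two data operands plus a 32-bit shift immediate.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a CIR type: element type name plus lane count.
// lanes == 0 means "scalar"; lanes >= 1 means a vector with that many lanes.
struct ToyType {
  std::string elt;
  unsigned lanes;
  bool operator==(const ToyType &o) const {
    return elt == o.elt && lanes == o.lanes;
  }
};

// Mirrors the shape of the loop in the hunk: every operand whose type equals
// the builtin's first-argument type gets the vectorized type, and all other
// operands (such as the i32 shift amount) keep their own type.
std::vector<ToyType> mapOperandTypes(const std::vector<ToyType> &opTypes,
                                     const ToyType &argTy,
                                     const ToyType &vecArgTy) {
  assert(vecArgTy.elt == argTy.elt && "vectorizing must keep the element type");
  std::vector<ToyType> result;
  result.reserve(opTypes.size());
  for (const ToyType &t : opTypes)
    result.push_back(t == argTy ? vecArgTy : t);
  return result;
}
```

With `argTy = i64` and `vecArgTy = <1 x i64>`, both data operands map to the vector type while the shift amount keeps `i32`. Matching on the operand type rather than on a fixed operand position is what catches the second `i64` operand as well.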

If we used `Add1ArgType` / `Add2ArgTypes` the way `ARM.cpp` does, only the first
operand would be bitcast and the second would stay `i64`. In `ARM.cpp`, when
`Add1ArgType` is set, `ArgType` is pushed once into the overload-type list passed to
`getIntrinsic`.
https://github.com/llvm/llvm-project/blob/e7887d5470e080c55fd69de83786fbf060d3955a/clang/lib/CodeGen/TargetBuiltins/ARM.cpp?plain=1#L1103-L1110

In `ARM.cpp`, the per-operand bitcasts happen later, in `EmitNeonCall`:
https://github.com/llvm/llvm-project/blob/e7887d5470e080c55fd69de83786fbf060d3955a/clang/lib/CodeGen/TargetBuiltins/ARM.cpp?plain=1#L427-L440
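A rough model of why one overload-list entry is enough in `ARM.cpp`: `getIntrinsic` expands every parameter that refers back to an overload slot, so a single `ArgType` fills in both data operands, and `EmitNeonCall` can then bitcast against the resolved parameter list. The sketch below is a hypothetical toy resolver, not LLVM's API. It assumes the linked TableGen class declares the two data operands as `LLVMMatchType<0>` and the shift amount as `llvm_i32_ty`.

```cpp
#include <cassert>
#include <string>
#include <variant>
#include <vector>

// Hypothetical toy model of an overloaded intrinsic signature, loosely
// following TableGen's LLVMMatchType<N>: a parameter either refers back to
// overload slot N or has a fixed type (represented here by its name).
struct MatchOverload {
  unsigned slot;
};
using ParamSpec = std::variant<MatchOverload, std::string>;

// Resolve every formal parameter type from the overload-type list, the way
// an intrinsic lookup fills in a concrete signature.
std::vector<std::string>
resolveParams(const std::vector<ParamSpec> &params,
              const std::vector<std::string> &overloadTypes) {
  std::vector<std::string> resolved;
  resolved.reserve(params.size());
  for (const ParamSpec &p : params) {
    if (const auto *m = std::get_if<MatchOverload>(&p)) {
      assert(m->slot < overloadTypes.size() && "missing overload type");
      resolved.push_back(overloadTypes[m->slot]);
    } else {
      resolved.push_back(std::get<std::string>(p));
    }
  }
  return resolved;
}
```

The overload list has one entry but resolves to three parameter types. CIR has no resolved `Function*` to walk, so applying that one-entry list directly to the operands would only cover the first one; working from the operand types avoids that.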

The CIR loop does the same thing.

If we used `Add1ArgType` / `Add2ArgTypes` for this, `test_vsrid_n_s64` and 
`test_vsrid_n_u64` would fail.
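For reference, the flag handling in the quoted hunk (the `wrapAsVector` lane count and the choice of intrinsic result type) reduces to the small decision logic below. This is a minimal sketch: the enumerator values are hypothetical stand-ins for the real `TypeModifier` flags, and types are reduced to their element width in bits.

```cpp
#include <cassert>

// Hypothetical flag values for illustration only; the real TypeModifier
// enumerators in clang's NEON codegen have their own values.
enum : unsigned {
  AddRetType = 1u << 0,
  VectorizeRetType = 1u << 1,
  VectorizeArgTypes = 1u << 2,
  Use64BitVectors = 1u << 3,
  Use128BitVectors = 1u << 4,
};

// Lane count chosen by the wrapAsVector lambda: fill a 64- or 128-bit vector
// when the corresponding flag is set, otherwise use a single lane.
unsigned laneCount(unsigned modifier, unsigned eltBits) {
  assert(eltBits != 0 && "element type must have a size");
  unsigned vectorSize = 0;
  if (modifier & Use64BitVectors)
    vectorSize = 64;
  else if (modifier & Use128BitVectors)
    vectorSize = 128;
  return vectorSize ? vectorSize / eltBits : 1;
}

// Whether the intrinsic result is vectorized: VectorizeRetType forces it;
// otherwise it follows vectorized args unless AddRetType is set.
bool resultIsVector(unsigned modifier) {
  if (modifier & VectorizeRetType)
    return true;
  return (modifier & VectorizeArgTypes) && !(modifier & AddRetType);
}
```

With neither width flag set, a 64-bit scalar becomes a single-lane vector, matching the "1 element" case described in the hunk's comment.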


https://github.com/llvm/llvm-project/pull/196776
_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
