[PATCH] D72114: [MS] Overhaul how clang passes overaligned args on x86_32

John McCall via Phabricator via cfe-commits Tue, 07 Jan 2020 00:08:55 -0800

rjmccall added inline comments.


================
Comment at: clang/test/CodeGen/x86_32-arguments-win32.c:77
+// CHECK-LABEL: define dso_local void @receive_vec_256(<8 x float> inreg %x, 
<8 x float> inreg %y, <8 x float> inreg %z, <8 x float>* %0, <8 x float>* %1)
+// CHECK-LABEL: define dso_local void @receive_vec_512(<16 x float> inreg %x, 
<16 x float> inreg %y, <16 x float> inreg %z, <16 x float>* %0, <16 x float>* 
%1)
+// CHECK-LABEL: define dso_local void @receive_vec_1024(<32 x float>* %0, <32 
x float>* %1, <32 x float>* %2, <32 x float>* %3, <32 x float>* %4)
----------------
AntonYudintsev wrote:
> rjmccall wrote:
> > rnk wrote:
> > > rjmccall wrote:
> > > > rnk wrote:
> > > > > craig.topper wrote:
> > > > > > What happens in the backend with inreg if 512-bit vectors aren't 
> > > > > > legal?
> > > > > LLVM splits the vector up using the largest legal vector size. As 
> > > > > many pieces as possible are passed in available XMM/YMM registers, 
> > > > > and the rest are passed in memory. MSVC, of course, assumes the user 
> > > > > wanted the larger vector size, and uses whatever vector instructions 
> > > > > it needs to move the arguments around.
> > > > > 
> > > > > Previously, I advocated for a model where calling an Intel intrinsic 
> > > > > function had the effect of implicitly marking the caller with the 
> > > > > target attributes of the intrinsic. This falls down if the user tries 
> > > > > to write a single function that conditionally branches between code 
> > > > > that uses different instruction set extensions. You can imagine the 
> > > > > SSE2 codepath accidentally using AVX instructions because the 
> > > > > compiler thinks they are better. I'm told that ICC models CPU 
> > > > > micro-architectural features in the CFG, but I don't ever expect that 
> > > > > LLVM will do that. If we're stuck with per-function CPU feature 
> > > > > settings, it seems like it would be nice to try to do what the user 
> > > > > asked by default, and warn the user if we see them doing a cpuid 
> > > > > check in a function that has been implicitly blessed with some target 
> > > > > attributes. You could imagine doing a similar thing when large vector 
> > > > > type variables are used: if a large vector argument or local is used, 
> > > > > implicitly enable the appropriate target features to move vectors of 
> > > > > that size around.
> > > > > 
> > > > > This idea didn't get anywhere, and the current situation has 
> > > > > persisted.
> > > > > 
> > > > > ---
> > > > > 
> > > > > You know, maybe we should just keep clang the way it is, and just set 
> > > > > up a warning in the backend that says "hey, I split your large 
> > > > > vector. You probably didn't want that." And then we just continue 
> > > > > doing what we do now. Nobody likes backend warnings, but it seems 
> > > > > better than the current direction of the frontend knowing every 
> > > > > detail of x86 vector extensions.
> > > > If target attributes affect ABI, it seems really dangerous to 
> > > > implicitly set attributes based on what intrinsics are called.
> > > > 
> > > > The local CPU-testing problem seems similar to the problems with local 
> > > > `#pragma STDC FENV_ACCESS` blocks that the constrained-FP people are 
> > > > looking into.  They both have a "this operation is normally fully 
> > > > optimizable, but we might need to be more careful in specific 
> > > > functions" aspect to them.  I wonder if there's a reasonable way to 
> > > > unify the approaches, or at least benefit from lessons learned.
> > > I agree, we wouldn't want intrinsic usage to change ABI. But, does 
> > > anybody actually want the behavior that LLVM implements today where large 
> > > vectors get split across registers and memory? I think most users who 
> > > pass a 512 bit vector want it to either be passed in ZMM registers or 
> > > fail to compile. Why do we even support passing 1024 bit vectors? Could 
> > > we make that an error?
> > > 
> > > Anyway, major redesigns aside, should clang do something when large 
> > > vectors are passed? Maybe we should warn here? Passing by address is 
> > > usually the safest way to pass something, so that's an option. 
> > > Implementing splitting logic in clang doesn't seem worth it.
> > > I agree, we wouldn't want intrinsic usage to change ABI. But, does 
> > > anybody actually want the behavior that LLVM implements today where large 
> > > vectors get split across registers and memory?
> > 
> > I take it you're implying that the actual (Windows-only?) platform ABI 
> > doesn't say anything about this because other compilers don't allow large 
> > vectors.  How large are the vectors we do have ABI rules for?  Do they have 
> > the problem as the SysV ABI where the ABI rules are sensitive to compiler 
> > flags?
> > 
> > Anyway, I didn't realize the i386 Windows ABI *ever* used registers for 
> > arguments.  (Whether you can convince LLVM to do so  for a function 
> > signature that Clang isn't supposed to emit for ABI-conforming functions is 
> > a separate story.)   You're saying it uses them for vectors?  Presumably up 
> > to some limit, and only when they're non-variadic arguments?
> > You're saying it uses them for vectors? Presumably up to some limit, and 
> > only when they're non-variadic arguments?
> 
> Sorry to cut in (I am the one who report the issue, and so looking forward 
> for this patch to be merged).
> Yes, MSVC/x86 win ABI uses three registers for first three arguments.
> 
> https://godbolt.org/z/PZ3dBa
Interesting.  And anything that would caused the stack to be used is still 
banned: passing more than 3 vectors or passing them as variadic arguments.

I guess they chose not to implement stack realignment when they implemented 
this, and rather than being stuck with a suboptimal ABI, they just banned the 
cases that would have required it.  Technically that means that they haven't 
committed to an ABI, so even though LLVM is perfectly happy to realign the 
stack when required, we shouldn't actually take advantage of that here, and 
instead we should honor the same restriction.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D72114/new/

https://reviews.llvm.org/D72114



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D72114: [MS] Overhaul how clang passes overaligned args on x86_32

Reply via email to