[PATCH] D79279: Add overloaded versions of builtin mem* functions

Richard Smith - zygoloid via Phabricator via cfe-commits Wed, 05 Aug 2020 11:43:09 -0700

rsmith added a comment.

Two observations that are new to me in this review:

1. We already treat all builtins as being overloaded on address space.
2. The revised patch treats `__builtin_mem*_overloaded` as being overloaded 
*only* on address space, volatility, and atomicity. (We've tuned the design to 
a point where the other qualifiers don't matter any more.)

So, we're adding three features here: overloading on (a) address space, (b) 
volatility, and (c) atomicity. (a) is already available in the 
non-`_overloaded` form, and we seem to be approaching agreement that (b) should 
be available in the non-`_overloaded` form too. So that only leaves (c), which 
is really not `_overloaded` but `_atomic`.

Based on those observations I'm now thinking that we might prefer a somewhat 
different approach (but one that should require only minimal changes to the 
patch in front of us). Specifically:

1. Stop treating lib builtins (eg, plain `memcpy`) as overloaded on address 
space. That's a (pre-existing) conformance bug, at least for the Embedded C TR.
2. Keep `__builtin_` forms of lib builtins overloaded on address space. (No 
change.)
3. Also overload `__builtin_` forms of lib builtins on volatility where it 
makes sense, instead of adding new builtin names `__builtin_mem*_overloaded`.
4. Add a new name for the builtin for the atomic forms of `memcpy` and `memset` 
(`__builtin_memcpy_unordered_atomic` maybe?).
5. Convert the "trivial types" check from an error to a warning and apply it to 
all the mem* overloads. (Though I think we might actually already have such a 
check, so this might only require extending it to cover the atomic builtin.)

What do you think?

================
Comment at: clang/docs/LanguageExtensions.rst:2434
+* ``volatile``
+* ``restrict``
+* ``__unaligned``
----------------
Does `restrict` really make sense here? It seems like the only difference it 
could possibly make would be to treat `memcpy` as `memmove` if either operand 
is marked restrict, but
(a) if the caller wants that, they can just use `memcpy` directly, and
(b) it's not correct to propagate restrict-ness from the caller to the callee 
anyway, because restrict-ness is really a property of the declaration of the 
identifier in its scope, not a property of its type:
```
void f(void *restrict p) {
  __builtin_memmove(p, p + 1, 4);
}
```
(c) a restrict qualifier on the pointee type is irrelevant to memcpy and a 
restrict qualifier on the pointer type isn't part of QUAL.

================
Comment at: clang/docs/LanguageExtensions.rst:2435
+* ``restrict``
+* ``__unaligned``
+* non-default address spaces
----------------
I don't think ``__unaligned`` matters any more. We always take the actual 
alignment inferred from the pointer arguments, just like we do for 
non-overloaded `memcpy`.

================
Comment at: clang/docs/LanguageExtensions.rst:2455-2456
+
+The overloaded builtins expect both destination and source to be trivially
+copyable types.
+
----------------

================
Comment at: clang/lib/CodeGen/CGBuiltin.cpp:2639-2640
   case Builtin::BI__builtin_mempcpy: {
+    QualType DestTy = getPtrArgType(CGM, E, 0);
+    QualType SrcTy = getPtrArgType(CGM, E, 1);
     Address Dest = EmitPointerWithAlignment(E->getArg(0));
----------------
jfb wrote:
> rsmith wrote:
> > jfb wrote:
> > > rsmith wrote:
> > > > Looking through implicit conversions in `getPtrArgType` here will 
> > > > change the code we generate for cases like:
> > > > 
> > > > ```
> > > > void f(volatile void *p, volatile void *q) {
> > > >   memcpy(p, q, 4);
> > > > }
> > > > ```
> > > > 
> > > > ... (in C, where we permit such implicit conversions) to use a volatile 
> > > > memcpy intrinsic. Is that an intentional change?
> > > I'm confused... what's the difference that this makes for the 
> > > pre-existing builtins? My intent was to get the `QualType` 
> > > unconditionally, but I can conditionalize it if needed... However this 
> > > ought to make no difference:
> > > ```
> > > static QualType getPtrArgType(CodeGenModule &CGM, const CallExpr *E,
> > >                               unsigned ArgNo) {
> > >   QualType ArgTy = E->getArg(ArgNo)->IgnoreImpCasts()->getType();
> > >   if (ArgTy->isArrayType())
> > >     return CGM.getContext().getAsArrayType(ArgTy)->getElementType();
> > >   if (ArgTy->isObjCObjectPointerType())
> > >     return 
> > > ArgTy->castAs<clang::ObjCObjectPointerType>()->getPointeeType();
> > >   return ArgTy->castAs<clang::PointerType>()->getPointeeType();
> > > }
> > > ```
> > > and indeed I can't see the example you provided change in IR from one to 
> > > the other. The issue I'm working around is that getting it 
> > > unconditionally would make ObjC code sad when `id` is passed in as I 
> > > outlined above.
> > The example I gave should produce a non-volatile memcpy, and used to do so 
> > (we passed `false` as the fourth parameter to `CreateMemCpy`). With this 
> > patch, `getPtrArgType`will strip off the implicit conversion from `volatile 
> > void*` to `void*` in the argument type, so `isVolatile` below will be 
> > `true`, so I think it will now create a volatile memcpy for the same 
> > testcase. If that's not what's happening, then I'd like to understand why 
> > not :)
> > 
> > I'm not saying that's necessarily a bad change, but it is a change, and 
> > it's one we should make intentionally if we make it at all.
> Oh yes, sorry I thought you were talking about something that `getPtrArgType` 
> did implicitly! Indeed the C code behaves differently in that it doesn't just 
> strip `volatile` anymore.
> 
> I'm not super thrilled by the default C behavior, and I think this new 
> behavior removes a gotcha, and is in fact what I was going for in the first 
> iteration of the patch. Now that I've separated the builtin I agree that it's 
> a bit odd... but it seems like the right thing to do anyways? But it no 
> longer matches the C library function to do so.
> 
> FWIW, this currently "works as you'd expect":
> ```
> void f(__attribute__((address_space(32))) void *dst, const void *src, int sz) 
> {
>     __builtin_memcpy(dst, src, sz);
> }
> ```
> https://godbolt.org/z/dcWxcK
> 
> and I think that's completely accidental because the C library function 
> doesn't (and, as John pointed out earlier, the builtin is meant to act like 
> the C function).
> FWIW, this currently "works as you'd expect":
>
> ```
> void f(__attribute__((address_space(32))) void *dst, const void *src, int sz) 
> {
>     __builtin_memcpy(dst, src, sz);
> }
> ```

The same is true even if you remove the `__builtin_` (and add a suitable 
include), and that seems like a bug to me. It looks like we have special logic 
that treats *all* builtins taking pointers as being overloaded on address 
space, which seems wrong at least for lib builtins. C TR 18037:2008 is quite 
explicit about this. Section 5.1.4 says:

"""
The standard C library (ISO/IEC 9899:1999 clause 7 - Libraries) is unchanged; 
the library's
functions and objects continue to be declared only with regard to the generic 
address space. One
consequence is that pointers into named address spaces cannot be passed as 
arguments to library
functions except in the special case that the named address spaces are subsets 
of the generic
address space.
"""

We could retain that special rule for `__builtin_`-spelled variants of lib 
builtins. If we do, then maybe we shouldn't be adding 
`__builtin_memcpy_overloaded` at all and should only extend the behavior of 
`__builtin_memcpy` to also propagate volatility (and add a new builtin for the 
atomic case).

Regarding volatile, consider:

```
void maybe_volatile_memcpy(volatile void *dst, const volatile void *src, int 
sz, _Bool is_volatile) {
  if (is_volatile) {
#ifdef __clang__
    __builtin_memcpy_overloaded(dst, src, sz);
#elif __GNUC__
    // ...
#else
    // volatile char copy loop
#endif
  }
  memcpy(dst, src, sz);
}
```

With this patch, the above code will always perform a volatile `memcpy`. I 
think that's a surprise. A call to `memcpy` should follow the C semantics, even 
if we choose to change the semantics of `__builtin_memcpy`.

================
Comment at: clang/lib/Sema/SemaChecking.cpp:5732
+
+    if (!DstValTy.isTriviallyCopyableType(Context) && !DstValTy->isVoidType())
+      return ExprError(Diag(TheCall->getBeginLoc(),
----------------
You need to call `Sema::isCompleteType` first before asking this question, in 
order to trigger class instantiation when necessary in C++. (Likewise for the 
checks in the previous function.)

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D79279/new/

https://reviews.llvm.org/D79279

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D79279: Add overloaded versions of builtin mem* functions

Reply via email to