[clang] Fix codegen of consteval functions returning an empty class. (PR #93115)

2024-05-22 Thread Richard Smith via cfe-commits

https://github.com/zygoloid approved this pull request.

Looks good. Do we also need to worry about overwriting tail padding here?

https://github.com/llvm/llvm-project/pull/93115
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang] Implement CWG2351 `void{}` (PR #78060)

2024-05-21 Thread Richard Smith via cfe-commits

https://github.com/zygoloid approved this pull request.

LG

https://github.com/llvm/llvm-project/pull/78060
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Implement provisional wording for CWG2398 regarding packs (PR #90820)

2024-05-16 Thread Richard Smith via cfe-commits

https://github.com/zygoloid commented:

I think we should go ahead with this. The behavior here is subtle but I think 
it does make sense, and we're in the process of proposing this change to WG21.

https://github.com/llvm/llvm-project/pull/90820
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Enable C++17 relaxed template template argument matching by default (PR #89807)

2024-05-15 Thread Richard Smith via cfe-commits

zygoloid wrote:

> The immediate deprecation causes a few issues

On the one hand: waiting to deprecate something that we know we're going to 
deprecate usually doesn't help anyone. We delay getting the message out to our 
users, we sometimes forget to do it for the next release, and at best it means 
it takes an extra release cycle to deliver whatever kind of benefit we were 
aiming for. So I think that we should generally start issuing deprecation 
warnings as soon as the new thing is ready and we know that we plan to remove 
the old thing.

On the other hand: a warning without a dedicated warning flag is, for some of 
our users, the same thing as an error. And, for some of our users, a 
deprecation error is the same thing as removing the feature immediately. So 
that doesn't work in this case :)

Would it be reasonable to add a 
`-Wno-deprecated-relaxed-template-template-args` flag (or something like that) 
for this specific deprecation?

https://github.com/llvm/llvm-project/pull/89807
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang] Add attribute for consteval builtins; Declare constexpr builtins as constexpr in C++ (PR #91894)

2024-05-13 Thread Richard Smith via cfe-commits


@@ -14,13 +14,18 @@ void __builtin_va_copy(double d);
 // expected-error@+2 {{cannot redeclare builtin function '__builtin_va_end'}}
 // expected-note@+1 {{'__builtin_va_end' is a builtin with type}}
 void __builtin_va_end(__builtin_va_list);
-// RUN: %clang_cc1 %s -fsyntax-only -verify 
-// RUN: %clang_cc1 %s -fsyntax-only -verify -x c
 
 void __va_start(__builtin_va_list*, ...);
 
+  void *__builtin_assume_aligned(const void *, size_t, ...);
 #ifdef __cplusplus
-void *__builtin_assume_aligned(const void *, size_t, ...) noexcept;
-#else
-void *__builtin_assume_aligned(const void *, size_t, ...);
+constexpr void *__builtin_assume_aligned(const void *, size_t, ...);
+  void *__builtin_assume_aligned(const void *, size_t, ...) noexcept;
+constexpr void *__builtin_assume_aligned(const void *, size_t, ...) noexcept;
+  void *__builtin_assume_aligned(const void *, size_t, ...) throw();
+constexpr void *__builtin_assume_aligned(const void *, size_t, ...) throw();
+

zygoloid wrote:

I don't think there are particularly good non-historical reasons why we allow 
non-`LIBBUILTIN` builtins to be redeclared -- I think it's allowed primarily 
for compatibility with GCC and existing code, and it's not clear to me why GCC 
allows it. There were some examples of code redeclaring builtins in [this prior 
review thread](https://reviews.llvm.org/D45383).

https://github.com/llvm/llvm-project/pull/91894
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang][Sema] Fix malformed AST for anonymous class access in template. (PR #90842)

2024-05-13 Thread Richard Smith via cfe-commits

https://github.com/zygoloid approved this pull request.

Nothing further on my side, LGTM

https://github.com/llvm/llvm-project/pull/90842
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang][CWG1815] Support lifetime extension of temporary created by aggregate initialization using a default member initializer (PR #87933)

2024-05-12 Thread Richard Smith via cfe-commits

zygoloid wrote:

> I'd like to proposal a separate PR for static analyzer. #91879 WDYT?

That sounds good to me.

https://github.com/llvm/llvm-project/pull/87933
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Analyzer][CFG] Correctly handle rebuilt default arg and default init expression (PR #91879)

2024-05-12 Thread Richard Smith via cfe-commits


@@ -2433,6 +2429,30 @@ CFGBlock *CFGBuilder::VisitChildren(Stmt *S) {
   return B;
 }
 
+CFGBlock *CFGBuilder::VisitCXXDefaultArgExpr(CXXDefaultArgExpr *Arg,
+ AddStmtChoice asc) {
+  if (Arg->hasRewrittenInit()) {
+if (asc.alwaysAdd(*this, Arg)) {
+  autoCreateBlock();
+  appendStmt(Block, Arg);
+}
+return VisitStmt(Arg->getExpr(), asc);
+  }
+  return VisitStmt(Arg, asc);

zygoloid wrote:

Actually, are we safe from that even if the default argument is rewritten? Do 
we guarantee to recreate all subexpressions in that case?

https://github.com/llvm/llvm-project/pull/91879
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Analyzer][CFG] Correctly handle rebuilt default arg and default init expression (PR #91879)

2024-05-12 Thread Richard Smith via cfe-commits


@@ -2433,6 +2429,30 @@ CFGBlock *CFGBuilder::VisitChildren(Stmt *S) {
   return B;
 }
 
+CFGBlock *CFGBuilder::VisitCXXDefaultArgExpr(CXXDefaultArgExpr *Arg,
+ AddStmtChoice asc) {
+  if (Arg->hasRewrittenInit()) {
+if (asc.alwaysAdd(*this, Arg)) {
+  autoCreateBlock();
+  appendStmt(Block, Arg);
+}
+return VisitStmt(Arg->getExpr(), asc);
+  }
+  return VisitStmt(Arg, asc);

zygoloid wrote:

I think it'd be useful to keep some of the old comment here: we can't add the 
default argument if it's not rewritten because we could end up with the same 
expression appearing multiple times.

https://github.com/llvm/llvm-project/pull/91879
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang][CWG1815] Support lifetime extension of temporary created by aggregate initialization using a default member initializer (PR #87933)

2024-05-08 Thread Richard Smith via cfe-commits

https://github.com/zygoloid approved this pull request.

Looks great!

I think it'd be good to understand what's happening with the static analyzer 
test, and update the comment to explain. But given that this is fixing a 
conformance / wrong code bug, I don't think we should block this PR on fixing 
any issue that turns up in the analyzer, unless it turns out that there's still 
a conformance issue here.

https://github.com/llvm/llvm-project/pull/87933
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang][CWG1815] Support lifetime extension of temporary created by aggregate initialization using a default member initializer (PR #87933)

2024-05-03 Thread Richard Smith via cfe-commits


@@ -711,6 +711,26 @@ void InitListChecker::FillInEmptyInitForField(unsigned 
Init, FieldDecl *Field,
   if (VerifyOnly)
 return;
 
+  // Enter a lifetime extension context, then we can support lifetime
+  // extension of temporary created by aggregate initialization using a
+  // default member initializer (DR1815 https://wg21.link/CWG1815).
+  //
+  // In a lifetime extension context, BuildCXXDefaultInitExpr will clone 
the
+  // initializer expression on each use that would lifetime extend its
+  // temporaries.
+  EnterExpressionEvaluationContext LifetimeExtensionContext(
+  SemaRef, Sema::ExpressionEvaluationContext::PotentiallyEvaluated,
+  /*LambdaContextDecl=*/nullptr,
+  Sema::ExpressionEvaluationContextRecord::EK_Other, true);
+
+  // Lifetime extension in default-member-init.
+  auto  = SemaRef.ExprEvalContexts.back();
+
+  // Just copy previous record, make sure we haven't forget anything.
+  LastRecord =
+  SemaRef.ExprEvalContexts[SemaRef.ExprEvalContexts.size() - 2];
+  LastRecord.InLifetimeExtendingContext = true;

zygoloid wrote:

This doesn't look right -- this would lifetime-extend *all* temporaries in the 
initializer. Only those that are actually bound to a reference member should be 
extended here.

Perhaps instead of these changes involving `InLifetimeExtendedContext`, you 
could change `BuildCXXDefaultInitExpr` to always rebuild the initializer if it 
contains any temporaries (if the initializer expression is an 
`ExprWithCleanups`). Then make sure the normal lifetime extension code recurses 
into the default initializer and does lifetime extension when warranted.

https://github.com/llvm/llvm-project/pull/87933
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang][CWG1815] Support lifetime extension of temporary created by aggregate initialization using a default member initializer (PR #87933)

2024-05-03 Thread Richard Smith via cfe-commits


@@ -206,13 +206,10 @@ namespace cwg1814 { // cwg1814: yes
 #endif
 }
 
-namespace cwg1815 { // cwg1815: no
+namespace cwg1815 { // cwg1815: yes

zygoloid wrote:

A test for constant evaluation would be nice here too. Maybe:
```c++
struct C { const int  = 0; };
constexpr C c = {};
static_assert(c.r == 0);
```
or
```c++
constexpr int f() {
  A a = {};
  return a.r;
}
static_assert(f() == 0);
```

https://github.com/llvm/llvm-project/pull/87933
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang][CWG1815] Support lifetime extension of temporary created by aggregate initialization using a default member initializer (PR #87933)

2024-05-03 Thread Richard Smith via cfe-commits


@@ -10698,7 +10698,7 @@ C++ defect report implementation 
status
 https://cplusplus.github.io/CWG/issues/1815.html;>1815
 CD4
 Lifetime extension in aggregate initialization
-No
+Clang 19

zygoloid wrote:

```suggestion
Clang 19
```
We use a different CSS class here for features that haven't been in an official 
release yet.

https://github.com/llvm/llvm-project/pull/87933
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang][CWG1815] Support lifetime extension of temporary created by aggregate initialization using a default member initializer (PR #87933)

2024-05-03 Thread Richard Smith via cfe-commits


@@ -122,7 +122,7 @@ void aggregateWithReferences() {
   clang_analyzer_dump(viaReference.ry); // expected-warning-re 
{{_extended_object{Composite, viaReference, S{{[0-9]+}}} }}
 
   // clang does not currently implement extending lifetime of object bound to 
reference members of aggregates,
-  // that are created from default member initializer (see 
`warn_unsupported_lifetime_extension` from `-Wdangling`)
+  // that are created from default member initializer (see 
`warn_unsupported_lifetime_extension` from `-Wdangling`) (Supported now).
   RefAggregate defaultInitExtended{i}; // clang-bug does not extend `Composite`
   clang_analyzer_dump(defaultInitExtended.ry); // expected-warning {{Unknown }}

zygoloid wrote:

If the bug is fixed, does the analyzer output not change here? In any case, the 
comment should explain whatever the new behavior is, and shouldn't be 
mentioning a warning that no longer exists.

https://github.com/llvm/llvm-project/pull/87933
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang][CWG1815] Support lifetime extension of temporary created by aggregate initialization using a default member initializer (PR #87933)

2024-05-03 Thread Richard Smith via cfe-commits


@@ -269,6 +269,26 @@ void init_capture_init_list() {
   // CHECK: }
 }
 
+void check_dr1815() { // dr1815: yes
+#if __cplusplus >= 201402L
+
+  struct A {
+int & = 0;
+~A() {}
+  };
+
+  struct B {
+A & = A{};
+~B() {}
+  };
+
+  // CHECK: void @_Z12check_dr1815v()
+  // CHECK: call void @_ZZ12check_dr1815vEN1BD1Ev(
+  // CHECK: call void @_ZZ12check_dr1815vEN1AD1Ev(
+  B a = {};

zygoloid wrote:

This check of destructor ordering is a bit too subtle for my tastes. How about:
```suggestion
  // CHECK: void @_Z12check_dr1815v()
  B a = {};
  // CHECK: call {{.*}}some_other_function
  extern void some_other_function();
  some_other_function();
  // CHECK: call void @_ZZ12check_dr1815vEN1BD1Ev(
  // CHECK: call void @_ZZ12check_dr1815vEN1AD1Ev(
```
... to really drive home that the `A` temporary isn't destroyed until the end 
of the function?

https://github.com/llvm/llvm-project/pull/87933
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang][CWG1815] Support lifetime extension of temporary created by aggregate initialization using a default member initializer (PR #87933)

2024-05-03 Thread Richard Smith via cfe-commits


@@ -8194,25 +8216,18 @@ void Sema::checkInitializerLifetime(const 
InitializedEntity ,
   }
 
   switch (shouldLifetimeExtendThroughPath(Path)) {
+  case PathLifetimeKind::ShouldExtend:

zygoloid wrote:

Do we need separate `ShouldExtend` and `Extend` kinds still, or can we combine 
them?

https://github.com/llvm/llvm-project/pull/87933
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang][CWG1815] Support lifetime extension of temporary created by aggregate initialization using a default member initializer (PR #87933)

2024-05-03 Thread Richard Smith via cfe-commits

https://github.com/zygoloid edited 
https://github.com/llvm/llvm-project/pull/87933
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang][CWG1815] Support lifetime extension of temporary created by aggregate initialization using a default member initializer (PR #87933)

2024-05-03 Thread Richard Smith via cfe-commits

https://github.com/zygoloid commented:

Thanks for working on this.

https://github.com/llvm/llvm-project/pull/87933
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang][Sema] Fix malformed AST for anonymous class access in template. (PR #90842)

2024-05-02 Thread Richard Smith via cfe-commits

https://github.com/zygoloid commented:

Another possibility to consider: when [transforming a member 
access](https://github.com/llvm/llvm-project/blob/ff210b94d449de8ebe1f32cf0d7763ba63b27b39/clang/lib/Sema/TreeTransform.h#L11950),
 strip off any implicit member accesses from the base expression before 
transforming the base. That way, we'll rebuild the implicit member access from 
scratch, which might be a little more work, but should get details like this 
one right. (We can then probably also remove the logic to deal with an absent 
`DeclName` in `RebuildMemberExpr` since that shouldn't happen any more.)

https://github.com/llvm/llvm-project/pull/90842
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [SemaCXX] Implement CWG2351 `void{}` (PR #78060)

2024-04-29 Thread Richard Smith via cfe-commits

zygoloid wrote:

> Note that the AST for the expression `T{}` looks like:
> 
> ```
> // using T = int;
> CXXFunctionalCastExpr 'T':'int' functional cast to T 
> `-InitListExpr 'T':'int'
> // using T = const int;
> CXXFunctionalCastExpr 'int' functional cast to T 
> `-InitListExpr 'int'
> // using T = void;
> CXXFunctionalCastExpr 'void' functional cast to T 
> `-InitListExpr 'void'
> // using T = const void;
> CXXFunctionalCastExpr 'void' functional cast to T 
> `-InitListExpr 'void'
> ```
> 
> (Since the `InitListExpr` already has `void` before anything is done, the 
> type doesn't need to be adjusted)

Nonetheless, I think it'd be more reasonable to use `CK_ToVoid` here rather 
than `CK_NoOp`. The `void` type for the `InitListExpr` is just a placeholder 
and not meant to mean that it's a real expression of type `void`. (We really 
ought to use a better placeholder there, so we can print the type out as 
`` instead of as `void` in diagnostics.)

https://github.com/llvm/llvm-project/pull/78060
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] deprecate frelaxed-template-template-args, make it on by default (PR #89807)

2024-04-23 Thread Richard Smith via cfe-commits


@@ -8343,58 +8343,52 @@ bool 
Sema::CheckTemplateTemplateArgument(TemplateTemplateParmDecl *Param,
   // C++1z [temp.arg.template]p3: (DR 150)
   //   A template-argument matches a template template-parameter P when P
   //   is at least as specialized as the template-argument A.
-  // FIXME: We should enable RelaxedTemplateTemplateArgs by default as it is a
-  //  defect report resolution from C++17 and shouldn't be introduced by
-  //  concepts.
-  if (getLangOpts().RelaxedTemplateTemplateArgs) {

zygoloid wrote:

I think the goal should be to remove the flags eventually, but given that it's 
easy to keep the flags functioning, I think it'd be a good idea to do so for at 
least long enough for people to tell us if this change breaks something. Maybe 
one release cycle with the default flipped and the flags working but warning, 
then we remove them if people aren't complaining?

https://github.com/llvm/llvm-project/pull/89807
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][modules] HeaderSearch::MarkFileModuleHeader creates extra HeaderFileInfo, breaks PCM reuse (PR #89005)

2024-04-19 Thread Richard Smith via cfe-commits

zygoloid wrote:

> Unfortunately with this patch I'm still seeing the same 
> source-location-exhausted error.

Can you try applying #89428 as well? I think we would need both in order to 
prune the module map from the serialized SLocEntries.

https://github.com/llvm/llvm-project/pull/89005
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][modules] Only avoid pruning module maps when asked to (PR #89428)

2024-04-19 Thread Richard Smith via cfe-commits

https://github.com/zygoloid approved this pull request.

Thank you!

https://github.com/llvm/llvm-project/pull/89428
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][modules] HeaderSearch::MarkFileModuleHeader creates extra HeaderFileInfo, breaks PCM reuse (PR #89005)

2024-04-17 Thread Richard Smith via cfe-commits


@@ -1313,6 +1313,14 @@ OptionalFileEntryRef 
HeaderSearch::LookupSubframeworkHeader(
 // File Info Management.
 
//===--===//
 
+static bool
+headerFileInfoModuleBitsMatchRole(const HeaderFileInfo *HFI,
+  ModuleMap::ModuleHeaderRole Role) {
+  return (HFI->isModuleHeader == ModuleMap::isModular(Role)) &&
+ (HFI->isTextualModuleHeader ==
+  ((Role & ModuleMap::TextualHeader) != 0));

zygoloid wrote:

If a file is in multiple modules, we use the logic in [`findModuleForHeader` 
and 
`isBetterKnownHeader`](https://github.com/llvm/llvm-project/blob/main/clang/lib/Lex/ModuleMap.cpp#L578)
 to pick between them when deciding how to handle a `#include`: prefer the 
declaration from the current module over anything else, prefer available over 
non-available, public over private, non-textual over textual, and (if we're 
allowing excluded headers at all) non-excluded over excluded.

If there's still a tie, then yeah, we make an arbitrary decision. (Not 
non-deterministic -- we prefer the first-loaded module map -- but also not 
likely to be predictable or useful. But it should generally not matter which 
one we pick -- the header should have been parsed and interpreted in the same 
way regardless -- unless different `.pcm`s have different compiler flags, in 
which case we have an ODR violation.)

The point of `textual header` -- at least in our configuration -- is to mark a 
header as being intentionally part of a module, so that 
`[no_undeclared_includes]` / `-fstrict-modules-decluse` can diagnose a 
`#include` of a file that wasn't a declared dependency. This means that it can 
be reasonable for a header to be a textual header in one module and a 
non-textual header in another module -- in particular, that would mean that 
code with a direct dependency on the first module is permitted to include the 
header, but that the header is not built as part of building that library. If 
that compilation is passed a `.pcm` containing the same header as a non-textual 
header, the desired behavior is that we perform an import for that header 
(treat it as non-textual for `#include` handling) but use the `textual header` 
declaration when determining whether the inclusion is permitted.

https://github.com/llvm/llvm-project/pull/89005
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][modules] HeaderSearch::MarkFileModuleHeader creates extra HeaderFileInfo, breaks PCM reuse (PR #89005)

2024-04-17 Thread Richard Smith via cfe-commits


@@ -1313,6 +1313,14 @@ OptionalFileEntryRef 
HeaderSearch::LookupSubframeworkHeader(
 // File Info Management.
 
//===--===//
 
+static bool
+headerFileInfoModuleBitsMatchRole(const HeaderFileInfo *HFI,
+  ModuleMap::ModuleHeaderRole Role) {
+  return (HFI->isModuleHeader == ModuleMap::isModular(Role)) &&
+ (HFI->isTextualModuleHeader ==
+  ((Role & ModuleMap::TextualHeader) != 0));

zygoloid wrote:

Should this be:
```suggestion
  return (HFI->isModuleHeader == ModuleMap::isModular(Role)) &&
 (HFI->isModuleHeader ||
  HFI->isTextualModuleHeader ==
  ((Role & ModuleMap::TextualHeader) != 0));
```
It looks like you're considering "modular header" and "textual header" to be 
mutually exclusive in `HeaderFileInfo` when merging, so presumably we should do 
the same here. I don't think this would make a difference for our use case, 
though.

Incidentally, this choice of meaning for `isTextualModuleHeader` seems a little 
confusing to me from a modeling perspective, given that a header can absolutely 
be modular in one module and textual in another, but I think it's fine from a 
correctness standpoint. I think it'd be clearer to either use a three-value 
enum here, or to independently track whether the header is textual or modular 
and change the check below to `isTextualModuleHeader && !isModuleHeader`. But 
that's not relevant for this PR.

https://github.com/llvm/llvm-project/pull/89005
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][modules] Only compute affecting module maps with implicit search (PR #87849)

2024-04-16 Thread Richard Smith via cfe-commits


@@ -161,8 +161,13 @@ static TypeCode getTypeCodeForTypeClass(Type::TypeClass 
id) {
 
 namespace {
 
-std::set GetAffectingModuleMaps(const Preprocessor ,
-   Module *RootModule) {
+std::optional>
+GetAffectingModuleMaps(const Preprocessor , Module *RootModule) {
+  // Without implicit module map search, there's no good reason to know about
+  // any module maps that are not affecting.
+  if (!PP.getHeaderSearchInfo().getHeaderSearchOpts().ImplicitModuleMaps)
+return std::nullopt;

zygoloid wrote:

Yes, that sounds great, thank you. In our case, it's not that the module map 
files are unused, it's that we specify the same `-fmodule-map-file=` flag to 
all of our compilations, and so we *usually* get the information from a `.pcm` 
file rather than from the module map file, and it turns out that pruning out 
the repeated module maps make a rather large difference to our source location 
address space usage :)

All that said, we have some other ideas for how to address the problems we're 
seeing with source location address space usage. Maybe we can get to a point 
where we're not relying on this at all -- I think this patch is probably the 
right behavior, even though I think it's contributing to a regression for us, 
so I think this new behavior should be the default, and we can clean up the 
flag once we don't need it any more.

https://github.com/llvm/llvm-project/pull/87849
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][modules] Only compute affecting module maps with implicit search (PR #87849)

2024-04-16 Thread Richard Smith via cfe-commits


@@ -161,8 +161,13 @@ static TypeCode getTypeCodeForTypeClass(Type::TypeClass 
id) {
 
 namespace {
 
-std::set GetAffectingModuleMaps(const Preprocessor ,
-   Module *RootModule) {
+std::optional>
+GetAffectingModuleMaps(const Preprocessor , Module *RootModule) {
+  // Without implicit module map search, there's no good reason to know about
+  // any module maps that are not affecting.
+  if (!PP.getHeaderSearchInfo().getHeaderSearchOpts().ImplicitModuleMaps)
+return std::nullopt;

zygoloid wrote:

Hm, I see that adding this check was the whole point of the patch. I don't 
think this works -- it's not feasible for the build system to know whether a 
module map file would affect a compilation or not.

https://github.com/llvm/llvm-project/pull/87849
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][modules] Only compute affecting module maps with implicit search (PR #87849)

2024-04-16 Thread Richard Smith via cfe-commits


@@ -161,8 +161,13 @@ static TypeCode getTypeCodeForTypeClass(Type::TypeClass 
id) {
 
 namespace {
 
-std::set GetAffectingModuleMaps(const Preprocessor ,
-   Module *RootModule) {
+std::optional>
+GetAffectingModuleMaps(const Preprocessor , Module *RootModule) {
+  // Without implicit module map search, there's no good reason to know about
+  // any module maps that are not affecting.
+  if (!PP.getHeaderSearchInfo().getHeaderSearchOpts().ImplicitModuleMaps)
+return std::nullopt;

zygoloid wrote:

I don't think this is correct -- we can know about non-affecting module map 
files because they were specified on the command line using 
`-fmodule-map-file=` and either weren't actually used or (as happens in our 
case) we got the same information from a PCM file and so didn't need the module 
map file.

I think this check is a regression; can it be removed?

https://github.com/llvm/llvm-project/pull/87849
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][modules] Headers meant to be included multiple times can be completely invisible in clang module builds (PR #83660)

2024-04-16 Thread Richard Smith via cfe-commits


@@ -1403,94 +1421,146 @@ bool 
HeaderSearch::isFileMultipleIncludeGuarded(FileEntryRef File) const {
 void HeaderSearch::MarkFileModuleHeader(FileEntryRef FE,
 ModuleMap::ModuleHeaderRole Role,
 bool isCompilingModuleHeader) {
-  bool isModularHeader = ModuleMap::isModular(Role);
-
   // Don't mark the file info as non-external if there's nothing to change.
   if (!isCompilingModuleHeader) {
-if (!isModularHeader)
+if ((Role & ModuleMap::ExcludedHeader))
   return;
 auto *HFI = getExistingFileInfo(FE);
 if (HFI && HFI->isModuleHeader)
   return;
   }
 
   auto  = getFileInfo(FE);
-  HFI.isModuleHeader |= isModularHeader;
+  HFI.mergeModuleMembership(Role);
   HFI.isCompilingModuleHeader |= isCompilingModuleHeader;

zygoloid wrote:

It looks to me like we're now calling `getFileInfo(FE)` even in cases where the 
info doesn't change, for a textual header. I think that's what's causing the 
regression we're seeing -- we're now considering a repeatedly-used module map 
file to be affecting, even though it isn't, because we end up with a local 
`HeaderFileInfo` instead of an imported one.

https://github.com/llvm/llvm-project/pull/83660
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Modules] Add -cc1 -flate-module-map-file to load module maps after PCMs (PR #88893)

2024-04-16 Thread Richard Smith via cfe-commits

zygoloid wrote:

Have you tried changing the behavior of the existing `-fmodule-map-file=` to 
load the file after we load module files? If so, what kinds of things do you 
see breaking or changing? If we can avoid it, I think it would be better to 
only have a single mode here, and loading the module files first seems to make 
a lot of sense given that they can make it unnecessary to load the module map 
files at all.

https://github.com/llvm/llvm-project/pull/88893
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang] Extend lifetime of temporaries in mem-default-init for P2718R0 (PR #86960)

2024-04-10 Thread Richard Smith via cfe-commits

https://github.com/zygoloid edited 
https://github.com/llvm/llvm-project/pull/86960
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang] Extend lifetime of temporaries in mem-default-init for P2718R0 (PR #86960)

2024-04-10 Thread Richard Smith via cfe-commits


@@ -1230,11 +1230,26 @@ CodeGenFunction::EmitCXXForRangeStmt(const 
CXXForRangeStmt ,
   JumpDest LoopExit = getJumpDestInCurrentScope("for.end");
 
   LexicalScope ForScope(*this, S.getSourceRange());
+  const DeclStmt *RangeDS = cast(S.getRangeStmt());
+  const VarDecl *RangeVar = cast(RangeDS->getSingleDecl());
+  if (getLangOpts().CPlusPlus23)

zygoloid wrote:

Thr quoted rule is [class.base.init]/11, and in that context is describing the 
behavior of the initialization of class members in a constructor.

Aggregate initialization is handled by the general rule in [class.temporary] 
but see also the [note in 
[dcl.init.general]](https://eel.is/c++draft/dcl.init#general-16.6.2.2.sentence-8)
 that explicitly mentions this case.

https://github.com/llvm/llvm-project/pull/86960
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang][Sema] Skip checking anonymous enum in using enum declaration (PR #87144)

2024-04-07 Thread Richard Smith via cfe-commits


@@ -1537,6 +1537,10 @@ void Sema::PushOnScopeChains(NamedDecl *D, Scope *S, 
bool AddToContext) {
   cast(D)->isFunctionTemplateSpecialization())
 return;
 
+  if (isa(D) && D->getDeclName().isEmpty()) {

zygoloid wrote:

The reserved identifier check appears to be incorrect. For example, for:
```c++
enum __A {
x, y
};
void f() {
using enum __A;
enum __A e;
}
```
Clang produces two warnings: one on the declaration of `enum __A` and one on 
the `using enum __A`. The second warning is a bug -- the 
`-Wreserved-identifier` warning is only supposed to fire on the declaration of 
a reserved name, and `using enum __A` does not declare the name `__A`, it only 
references it. (Note that there's no warning on the use of `enum __A` on the 
next line, which again is only a reference, not a declaration.)

https://github.com/llvm/llvm-project/pull/87144
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang][Sema] Skip checking anonymous enum in using enum declaration (PR #87144)

2024-04-07 Thread Richard Smith via cfe-commits


@@ -1537,6 +1537,10 @@ void Sema::PushOnScopeChains(NamedDecl *D, Scope *S, 
bool AddToContext) {
   cast(D)->isFunctionTemplateSpecialization())
 return;
 
+  if (isa(D) && D->getDeclName().isEmpty()) {

zygoloid wrote:

For what name? The `using enum` declaration doesn't introduce the name of the 
enum.

https://github.com/llvm/llvm-project/pull/87144
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang][Sema] Skip checking anonymous enum in using enum declaration (PR #87144)

2024-04-07 Thread Richard Smith via cfe-commits


@@ -1537,6 +1537,10 @@ void Sema::PushOnScopeChains(NamedDecl *D, Scope *S, 
bool AddToContext) {
   cast(D)->isFunctionTemplateSpecialization())
 return;
 
+  if (isa(D) && D->getDeclName().isEmpty()) {

zygoloid wrote:

Are those useful checks? It seems surprising to me that we'd push a `using 
enum` declaration into scope given that it never introduces any name of its own 
(the enumerators are handled separately). Are we really performing checks here 
that we need for a named enum but that are wrong pr unnecessary for an 
anonymous enum?

https://github.com/llvm/llvm-project/pull/87144
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang][Sema] Skip checking anonymous enum in using enum declaration (PR #87144)

2024-04-06 Thread Richard Smith via cfe-commits


@@ -1537,6 +1537,10 @@ void Sema::PushOnScopeChains(NamedDecl *D, Scope *S, 
bool AddToContext) {
   cast(D)->isFunctionTemplateSpecialization())
 return;
 
+  if (isa(D) && D->getDeclName().isEmpty()) {

zygoloid wrote:

Why do we push a `UsingEnumDecl` into the scope at all? Would it make sense for 
the caller to just add the declaration to the `DeclContext` instead of calling 
`PushOnScopeChains`?

https://github.com/llvm/llvm-project/pull/87144
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [BitInt] Expose a _BitInt literal suffix in C++ (PR #86586)

2024-04-01 Thread Richard Smith via cfe-commits


@@ -1117,19 +1118,37 @@ NumericLiteralParser::NumericLiteralParser(StringRef 
TokSpelling,
   if (isImaginary) break;   // Cannot be repeated.
   isImaginary = true;
   continue;  // Success.
+case '_':
+  if (isFPConstant)
+break; // Invalid for floats
+  if (HasSize)
+break;
+  if (possibleBitInt)
+break; // Cannot be repeated.
+  if (LangOpts.CPlusPlus && s[1] == '_') {
+// Scan ahead to find possible rest of BitInt suffix
+for (const char *c = s; c != ThisTokEnd; ++c) {
+  if (*c == 'w' || *c == 'W') {
+possibleBitInt = true;
+++s;

zygoloid wrote:

FYI, per CWG1810 / [over.literal]/1, it's now IFNDR rather than UB:

> Some literal suffix identifiers are reserved for future standardization; see 
> [usrlit.suffix]. A declaration whose literal-operator-id uses such a literal 
> suffix identifier is ill-formed, no diagnostic required.

... but that doesn't make a difference here. (The "NDR" part is just to allow 
implementations to permit the standard library to declare these literal 
operators.)

https://github.com/llvm/llvm-project/pull/86586
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] Resolve FIXME: Look at E, not M (PR #85541)

2024-03-29 Thread Richard Smith via cfe-commits

zygoloid wrote:

IIRC the issue here is that in C++98, the MTE is in a "weird" place in the AST, 
because of the different rules governing how and when temporaries are formed. 
For example, for `const int  = C().a[0];`, we'll form an MTE wrapping just 
the `C()` expression, converting it to an xvalue, in C++11 onwards. But in 
C++98, xvalues mostly don't exist, the MTE wraps the whole `C().a[0]`, and you 
need to call `skipRValueSubobjectAdjustments` to get the actual expression that 
initializes the temporary, which I guess is what `E` is here.

So I think the idea is that `E` is always the expression that initializes the 
temporary, and the type of `M` in C++98 might be some subobject of that, with a 
type that doesn't reflect properties of the temporary. So we should generally 
look at properties of `E`, not `M`, to understand properties of the temporary 
object. But in C++11 onwards I think it doesn't matter, because we always 
materialize immediately.

(That's mostly from memory, I might have got some details wrong.)

https://github.com/llvm/llvm-project/pull/85541
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [clang][UBSan] Add implicit conversion check for bitfields (PR #75481)

2024-03-27 Thread Richard Smith via cfe-commits

zygoloid wrote:

I don't think we've established an explicit policy, and we've made ABI-breaking 
changes previously. I think we should avoid breaks within a major release 
version (don't backport this) to avoid giving packagers headaches: the ubsan 
runtime is installed in a versioned directory, but I don't think that includes 
the patch version.

Having an ABI break here seems OK and in line with historic precedent. (But we 
ought to document our ABI policy for our runtime libraries.)

https://github.com/llvm/llvm-project/pull/75481
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][ExprConst] Fix second arg of __builtin_{clzg,ctzg} not always being evaluated (PR #86742)

2024-03-26 Thread Richard Smith via cfe-commits


@@ -0,0 +1,39 @@
+// RUN: %clang_cc1 -fsyntax-only -verify %s
+// expected-no-diagnostics
+
+constexpr int increment(int& x) {
+  x++;
+  return x;
+}
+
+constexpr int test_clzg_0() {
+  int x = 0;
+  [[maybe_unused]] int unused = __builtin_clzg(0U, increment(x));
+  return x;
+}
+
+static_assert(test_clzg_0() == 1);
+
+constexpr int test_clzg_1() {
+  int x = 0;
+  [[maybe_unused]] int unused = __builtin_clzg(1U, increment(x));
+  return x;
+}
+
+static_assert(test_clzg_1() == 1);
+
+constexpr int test_ctzg_0() {
+  int x = 0;
+  [[maybe_unused]] int unused = __builtin_ctzg(0U, increment(x));
+  return x;
+}
+
+static_assert(test_ctzg_0() == 1);
+
+constexpr int test_ctzg_1() {
+  int x = 0;
+  [[maybe_unused]] int unused = __builtin_ctzg(1U, increment(x));

zygoloid wrote:

Can you cast to `void` instead?

```suggestion
  (void)__builtin_clzg(0U, increment(x));
  return x;
}

static_assert(test_clzg_0() == 1);

constexpr int test_clzg_1() {
  int x = 0;
  (void)__builtin_clzg(1U, increment(x));
  return x;
}

static_assert(test_clzg_1() == 1);

constexpr int test_ctzg_0() {
  int x = 0;
  (void)__builtin_ctzg(0U, increment(x));
  return x;
}

static_assert(test_ctzg_0() == 1);

constexpr int test_ctzg_1() {
  int x = 0;
  (void)__builtin_ctzg(1U, increment(x));
```

https://github.com/llvm/llvm-project/pull/86742
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [codegen] Emit missing cleanups for stmt-expr and coro suspensions [take-2] (PR #85398)

2024-03-26 Thread Richard Smith via cfe-commits


@@ -2503,6 +2506,29 @@ Stmt *BlockExpr::getBody() {
 // Generic Expression Routines
 
//===--===//
 
+bool Expr::mayBranchOut() const {

zygoloid wrote:

Computing this seems a little expensive in general. I wonder if we could track 
a bit on `FunctionDecl` that indicates whether it contains any branches out of 
statement expressions, and skip calling this if the enclosing function is not a 
coroutine and does not contain branches out of statement expressions.

https://github.com/llvm/llvm-project/pull/85398
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [HLSL] Implement array temporary support (PR #79382)

2024-03-13 Thread Richard Smith via cfe-commits


@@ -3173,41 +3174,46 @@ class ArrayType : public Type, public 
llvm::FoldingSetNode {
 return T->getTypeClass() == ConstantArray ||
T->getTypeClass() == VariableArray ||
T->getTypeClass() == IncompleteArray ||
-   T->getTypeClass() == DependentSizedArray;
+   T->getTypeClass() == DependentSizedArray ||
+   T->getTypeClass() == ArrayParameter;
   }
 };
 
 /// Represents the canonical version of C arrays with a specified constant 
size.
 /// For example, the canonical type for 'int A[4 + 4*100]' is a
 /// ConstantArrayType where the element type is 'int' and the size is 404.
-class ConstantArrayType final
-: public ArrayType,
-  private llvm::TrailingObjects {

zygoloid wrote:

As a general principle we try to keep our AST nodes small. This change isn't 
going to make a big difference by itself, but abandoning that principle and 
making a bunch of changes like this would. I think it's worth putting in a 
small amount of effort here, but if it doesn't work out nicely then it's fine 
to remove the optimization. We can find other places to save space.

https://github.com/llvm/llvm-project/pull/79382
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [HLSL] Implement array temporary support (PR #79382)

2024-03-12 Thread Richard Smith via cfe-commits


@@ -3173,41 +3174,46 @@ class ArrayType : public Type, public 
llvm::FoldingSetNode {
 return T->getTypeClass() == ConstantArray ||
T->getTypeClass() == VariableArray ||
T->getTypeClass() == IncompleteArray ||
-   T->getTypeClass() == DependentSizedArray;
+   T->getTypeClass() == DependentSizedArray ||
+   T->getTypeClass() == ArrayParameter;
   }
 };
 
 /// Represents the canonical version of C arrays with a specified constant 
size.
 /// For example, the canonical type for 'int A[4 + 4*100]' is a
 /// ConstantArrayType where the element type is 'int' and the size is 404.
-class ConstantArrayType final
-: public ArrayType,
-  private llvm::TrailingObjects {

zygoloid wrote:

The size expression is usually not part of a constant array type. Normally, all 
we want as part of the type node is the bound, and the size expression if 
needed can be found on the `ArrayTypeLoc`. The rare special case being handled 
here is that if the array bound is instantiation-dependent but not dependent, 
then we need to preserve the expression in the type (not just in the `TypeLoc`) 
because it affects (for example) whether two template declarations involving 
such a type are considered to declare the same template. Because this is a very 
rare case, and `ConstantArrayType`s can be relatively common (eg, as the types 
of string literals), it seemed preferable to not store the `Expr*` if it's not 
needed, but in absolute terms it's probably a minor difference in memory usage, 
so it's probably fine to remove this optimization.

If we want to keep the optimization, perhaps we could replace the `Size` and 
`SizeExpr` fields with a pointer union that can hold either a <= 63-bit array 
bound (the common case) or a pointer to a slab-allocated pair of `APInt` and 
`const Expr*` (in the case where the bound doesn't fit in 63 bits or the size 
expression is instantiation-dependent).

https://github.com/llvm/llvm-project/pull/79382
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [Clang] Correct __builtin_dynamic_object_size for subobject types (PR #83204)

2024-03-12 Thread Richard Smith via cfe-commits


@@ -26996,18 +26996,38 @@ class, structure, array, or other object.
 Arguments:
 ""
 
-The ``llvm.objectsize`` intrinsic takes four arguments. The first argument is a
-pointer to or into the ``object``. The second argument determines whether
-``llvm.objectsize`` returns 0 (if true) or -1 (if false) when the object size 
is
-unknown. The third argument controls how ``llvm.objectsize`` acts when ``null``
-in address space 0 is used as its pointer argument. If it's ``false``,
-``llvm.objectsize`` reports 0 bytes available when given ``null``. Otherwise, 
if
-the ``null`` is in a non-zero address space or if ``true`` is given for the
-third argument of ``llvm.objectsize``, we assume its size is unknown. The 
fourth
-argument to ``llvm.objectsize`` determines if the value should be evaluated at
-runtime.
+The ``llvm.objectsize`` intrinsic takes six arguments:
+
+- The first argument is a pointer to or into the ``object``.
+- The second argument controls which value to return when the size is unknown:
+
+  - If it's ``false``, ``llvm.objectsize`` returns ``-1``.
+  - If it's ``true``, ``llvm.objectsize`` returns ``0``.
+
+- The third argument controls how ``llvm.objectsize`` acts when ``null`` in
+  address space 0 is used as its pointer argument:
+
+  - If it's ``false``, ``llvm.objectsize`` reports 0 bytes available when given
+``null``.
+  - If it's ``true``, or the ``null`` pointer is in a non-zero address space,
+the size is assumed to be unknown.
+
+- The fourth argument to ``llvm.objectsize`` determines if the value should be
+  evaluated at runtime.
+- The fifth argument controls which size ``llvm.objectsize`` returns:
+
+  - If it's ``false``, ``llvm.objectsize`` returns the size of the closest
+surrounding subobject.
+  - If it's ``true``, ``llvm.objectsize`` returns the size of the whole object.
+
+- If non-zero, the sixth and seventh arguments encode the size and offset
+  information, respectively, of the original subobject's layout and is used
+  when the fifth argument is ``false``.
+- The seventh argument encodes the offset information of the original
+  subobject's layout and is used when the fifth argument is ``false``.

zygoloid wrote:

> > in some simple test cases it does appear to work.
> 
> Please don't be insulting. I didn't just run a couple of "simple" tests and 
> called it a day. I had many rather convoluted examples and they all worked.

I apologize, I didn't mean to imply you hadn't been diligent here.

https://github.com/llvm/llvm-project/pull/83204
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [Clang] Correct __builtin_dynamic_object_size for subobject types (PR #83204)

2024-03-12 Thread Richard Smith via cfe-commits

https://github.com/zygoloid edited 
https://github.com/llvm/llvm-project/pull/83204
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [Clang] Correct __builtin_dynamic_object_size for subobject types (PR #83204)

2024-03-12 Thread Richard Smith via cfe-commits


@@ -26996,18 +26996,38 @@ class, structure, array, or other object.
 Arguments:
 ""
 
-The ``llvm.objectsize`` intrinsic takes four arguments. The first argument is a
-pointer to or into the ``object``. The second argument determines whether
-``llvm.objectsize`` returns 0 (if true) or -1 (if false) when the object size 
is
-unknown. The third argument controls how ``llvm.objectsize`` acts when ``null``
-in address space 0 is used as its pointer argument. If it's ``false``,
-``llvm.objectsize`` reports 0 bytes available when given ``null``. Otherwise, 
if
-the ``null`` is in a non-zero address space or if ``true`` is given for the
-third argument of ``llvm.objectsize``, we assume its size is unknown. The 
fourth
-argument to ``llvm.objectsize`` determines if the value should be evaluated at
-runtime.
+The ``llvm.objectsize`` intrinsic takes six arguments:
+
+- The first argument is a pointer to or into the ``object``.
+- The second argument controls which value to return when the size is unknown:
+
+  - If it's ``false``, ``llvm.objectsize`` returns ``-1``.
+  - If it's ``true``, ``llvm.objectsize`` returns ``0``.
+
+- The third argument controls how ``llvm.objectsize`` acts when ``null`` in
+  address space 0 is used as its pointer argument:
+
+  - If it's ``false``, ``llvm.objectsize`` reports 0 bytes available when given
+``null``.
+  - If it's ``true``, or the ``null`` pointer is in a non-zero address space,
+the size is assumed to be unknown.
+
+- The fourth argument to ``llvm.objectsize`` determines if the value should be
+  evaluated at runtime.
+- The fifth argument controls which size ``llvm.objectsize`` returns:
+
+  - If it's ``false``, ``llvm.objectsize`` returns the size of the closest
+surrounding subobject.
+  - If it's ``true``, ``llvm.objectsize`` returns the size of the whole object.
+
+- If non-zero, the sixth and seventh arguments encode the size and offset
+  information, respectively, of the original subobject's layout and is used
+  when the fifth argument is ``false``.
+- The seventh argument encodes the offset information of the original
+  subobject's layout and is used when the fifth argument is ``false``.

zygoloid wrote:

> First you tell me that I can't use LLVM's IR to determine the subobject, even 
> though I did and it worked just fine

It definitely doesn't work fine, but sure, in some simple test cases it does 
appear to work.

> now you're saying that I can't use the front-end's knowledge about the 
> structure.

I'm not. I'm saying that you can't assume that the LLVM IR idea of the complete 
object lines up with some enclosing subobject you've found in the frontend. 
You're still trying to use the IR notion of complete object and subobjects, and 
that still doesn't work for the same reasons as before.

You can absolutely compute where the subobject is in the frontend, and pass 
that information onto LLVM. But you'll need to pass it on in a way that is 
meaningful to LLVM. For example, if you pass a pointer and size to the 
intrinsic describing the complete range of addresses covering the subobject, 
that should be fine. But an offset is not useful if the frontend and middle-end 
can't agree on what it's an offset relative to.

> In your example, you've explicitly lied to the compiler about the types being 
> passed in.

Sorry, that was a typo in my example: the call in `h()` should be `f()`.

> I have no clue what you mean by pass a "pointer to `p->n` to the intrinsic" 
> as that's already the first argument in the intrinsic.

I mean, pass a pointer to the start of the subobject. For example, for 
`p->arr[i]`, you'd pass in `>arr`.

https://github.com/llvm/llvm-project/pull/83204
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [Clang] Correct __builtin_dynamic_object_size for subobject types (PR #83204)

2024-03-12 Thread Richard Smith via cfe-commits


@@ -26996,18 +26996,38 @@ class, structure, array, or other object.
 Arguments:
 ""
 
-The ``llvm.objectsize`` intrinsic takes four arguments. The first argument is a
-pointer to or into the ``object``. The second argument determines whether
-``llvm.objectsize`` returns 0 (if true) or -1 (if false) when the object size 
is
-unknown. The third argument controls how ``llvm.objectsize`` acts when ``null``
-in address space 0 is used as its pointer argument. If it's ``false``,
-``llvm.objectsize`` reports 0 bytes available when given ``null``. Otherwise, 
if
-the ``null`` is in a non-zero address space or if ``true`` is given for the
-third argument of ``llvm.objectsize``, we assume its size is unknown. The 
fourth
-argument to ``llvm.objectsize`` determines if the value should be evaluated at
-runtime.
+The ``llvm.objectsize`` intrinsic takes six arguments:
+
+- The first argument is a pointer to or into the ``object``.
+- The second argument controls which value to return when the size is unknown:
+
+  - If it's ``false``, ``llvm.objectsize`` returns ``-1``.
+  - If it's ``true``, ``llvm.objectsize`` returns ``0``.
+
+- The third argument controls how ``llvm.objectsize`` acts when ``null`` in
+  address space 0 is used as its pointer argument:
+
+  - If it's ``false``, ``llvm.objectsize`` reports 0 bytes available when given
+``null``.
+  - If it's ``true``, or the ``null`` pointer is in a non-zero address space,
+the size is assumed to be unknown.
+
+- The fourth argument to ``llvm.objectsize`` determines if the value should be
+  evaluated at runtime.
+- The fifth argument controls which size ``llvm.objectsize`` returns:
+
+  - If it's ``false``, ``llvm.objectsize`` returns the size of the closest
+surrounding subobject.
+  - If it's ``true``, ``llvm.objectsize`` returns the size of the whole object.
+
+- If non-zero, the sixth and seventh arguments encode the size and offset
+  information, respectively, of the original subobject's layout and is used
+  when the fifth argument is ``false``.
+- The seventh argument encodes the offset information of the original
+  subobject's layout and is used when the fifth argument is ``false``.

zygoloid wrote:

You're passing the difference between the address of some arbitrary class 
object and a field. The address of that class object is not the base address 
that the LLVM intrinsic will calculate. So the offset you're passing in is not 
meaningful. And there's no way it even *can* be meaningful. Consider something 
like this:

```c++
struct A {
  int n;
};
struct B {
  int data[100];
  A a;
};
int f(A *p) { return __builtin_dynamic_object_size(>n, 1); }
int g() {
  A a;
  return f();
}
int h() {
  B b;
  return f();
}
```

After inlining, the BDOS intrinsic in `g` will see a base pointer of ``, and 
the intrinsic in `h` will see a base pointer of ``. The offset from the base 
pointer to the `n` subobject is different in each case. So there's no way you 
can compute a constant offset in Clang's lowering and pass it to the intrinsic.

What you *can* do is pass a pointer to `p->n` to the intrinsic as the start of 
the subobject. That'll do the right thing regardless of what LLVM computes to 
be the complete enclosing object.

https://github.com/llvm/llvm-project/pull/83204
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [Clang] Correct __builtin_dynamic_object_size for subobject types (PR #83204)

2024-03-12 Thread Richard Smith via cfe-commits


@@ -26996,18 +26996,38 @@ class, structure, array, or other object.
 Arguments:
 ""
 
-The ``llvm.objectsize`` intrinsic takes four arguments. The first argument is a
-pointer to or into the ``object``. The second argument determines whether
-``llvm.objectsize`` returns 0 (if true) or -1 (if false) when the object size 
is
-unknown. The third argument controls how ``llvm.objectsize`` acts when ``null``
-in address space 0 is used as its pointer argument. If it's ``false``,
-``llvm.objectsize`` reports 0 bytes available when given ``null``. Otherwise, 
if
-the ``null`` is in a non-zero address space or if ``true`` is given for the
-third argument of ``llvm.objectsize``, we assume its size is unknown. The 
fourth
-argument to ``llvm.objectsize`` determines if the value should be evaluated at
-runtime.
+The ``llvm.objectsize`` intrinsic takes six arguments:
+
+- The first argument is a pointer to or into the ``object``.
+- The second argument controls which value to return when the size is unknown:
+
+  - If it's ``false``, ``llvm.objectsize`` returns ``-1``.
+  - If it's ``true``, ``llvm.objectsize`` returns ``0``.
+
+- The third argument controls how ``llvm.objectsize`` acts when ``null`` in
+  address space 0 is used as its pointer argument:
+
+  - If it's ``false``, ``llvm.objectsize`` reports 0 bytes available when given
+``null``.
+  - If it's ``true``, or the ``null`` pointer is in a non-zero address space,
+the size is assumed to be unknown.
+
+- The fourth argument to ``llvm.objectsize`` determines if the value should be
+  evaluated at runtime.
+- The fifth argument controls which size ``llvm.objectsize`` returns:
+
+  - If it's ``false``, ``llvm.objectsize`` returns the size of the closest
+surrounding subobject.
+  - If it's ``true``, ``llvm.objectsize`` returns the size of the whole object.
+
+- If non-zero, the sixth and seventh arguments encode the size and offset
+  information, respectively, of the original subobject's layout and is used
+  when the fifth argument is ``false``.
+- The seventh argument encodes the offset information of the original
+  subobject's layout and is used when the fifth argument is ``false``.

zygoloid wrote:

As far as I can see, the "offset" value you're passing in is a difference 
between two pointers both of which are unknown to the middle-end -- it's the 
difference from the start of the two-levels-out enclosing class subobject to 
the one-level-out enclosing field subobject, I think. I could be 
misunderstanding something, but I don't see how you can use that information to 
get correct results from the builtin. Can you explain how the LLVM intrinsic 
uses this information?

https://github.com/llvm/llvm-project/pull/83204
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [Clang] Correct __builtin_dynamic_object_size for subobject types (PR #83204)

2024-03-11 Thread Richard Smith via cfe-commits


@@ -1052,11 +1053,143 @@ CodeGenFunction::emitFlexibleArrayMemberSize(const 
Expr *E, unsigned Type,
   return Builder.CreateSelect(Cmp, Res, ConstantInt::get(ResType, 0, 
IsSigned));
 }
 
+namespace {
+
+/// SubobjectFinder - A simple visitor to find the "sub-object" pointed to by a
+/// __builtin_dynamic_object_size call. Information gathered from the
+/// sub-object is used by the back-end to determine the correct size when the
+/// 'TYPE' of the __bdos call has the least significant bit set (i.e. asking
+/// for the sub-object size).
+///
+/// The expectation is that we'll eventually hit one of three expression types:
+///
+///   1. DeclRefExpr - This is the expression for the base of the structure.
+///   2. MemberExpr - This is the field in the structure.
+///   3. CompoundLiteralExpr - This is for people who create something
+///  heretical like (struct foo has a flexible array member):
+///
+///(struct foo){ 1, 2 }.blah[idx];
+///
+/// All other expressions can be correctly handled with the current code.
+struct SubobjectFinder
+: public ConstStmtVisitor {
+  SubobjectFinder() = default;
+
+  
//======//
+  //Visitor Methods
+  
//======//
+
+  const Expr *VisitStmt(const Stmt *S) { return nullptr; }
+
+  const Expr *VisitDeclRefExpr(const DeclRefExpr *E) { return E; }
+  const Expr *VisitMemberExpr(const MemberExpr *E) { return E; }
+  const Expr *VisitCompoundLiteralExpr(const CompoundLiteralExpr *E) {
+return E;
+  }
+
+  const Expr *VisitArraySubscriptExpr(const ArraySubscriptExpr *E) {
+return Visit(E->getBase());
+  }
+  const Expr *VisitCastExpr(const CastExpr *E) {

zygoloid wrote:

Does GCC look through explicit casts? I wonder if this should be restricted to 
`ImplicitCastExpr`s.

https://github.com/llvm/llvm-project/pull/83204
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [Clang] Correct __builtin_dynamic_object_size for subobject types (PR #83204)

2024-03-11 Thread Richard Smith via cfe-commits


@@ -1052,11 +1053,143 @@ CodeGenFunction::emitFlexibleArrayMemberSize(const 
Expr *E, unsigned Type,
   return Builder.CreateSelect(Cmp, Res, ConstantInt::get(ResType, 0, 
IsSigned));
 }
 
+namespace {
+
+/// SubobjectFinder - A simple visitor to find the "sub-object" pointed to by a
+/// __builtin_dynamic_object_size call. Information gathered from the
+/// sub-object is used by the back-end to determine the correct size when the
+/// 'TYPE' of the __bdos call has the least significant bit set (i.e. asking
+/// for the sub-object size).
+///
+/// The expectation is that we'll eventually hit one of three expression types:
+///
+///   1. DeclRefExpr - This is the expression for the base of the structure.
+///   2. MemberExpr - This is the field in the structure.
+///   3. CompoundLiteralExpr - This is for people who create something
+///  heretical like (struct foo has a flexible array member):
+///
+///(struct foo){ 1, 2 }.blah[idx];
+///
+/// All other expressions can be correctly handled with the current code.
+struct SubobjectFinder
+: public ConstStmtVisitor {
+  SubobjectFinder() = default;
+
+  
//======//
+  //Visitor Methods
+  
//======//
+
+  const Expr *VisitStmt(const Stmt *S) { return nullptr; }
+
+  const Expr *VisitDeclRefExpr(const DeclRefExpr *E) { return E; }
+  const Expr *VisitMemberExpr(const MemberExpr *E) { return E; }
+  const Expr *VisitCompoundLiteralExpr(const CompoundLiteralExpr *E) {
+return E;
+  }
+
+  const Expr *VisitArraySubscriptExpr(const ArraySubscriptExpr *E) {
+return Visit(E->getBase());
+  }
+  const Expr *VisitCastExpr(const CastExpr *E) {
+return Visit(E->getSubExpr());
+  }
+  const Expr *VisitParenExpr(const ParenExpr *E) {
+return Visit(E->getSubExpr());
+  }
+  const Expr *VisitUnaryAddrOf(const clang::UnaryOperator *E) {
+return Visit(E->getSubExpr());
+  }
+  const Expr *VisitUnaryDeref(const clang::UnaryOperator *E) {
+return Visit(E->getSubExpr());
+  }

zygoloid wrote:

I think you'll need to be more careful when walking through address-of / 
dereferences -- the set of things you should step over when traversing a 
pointer to an object is different from the set of things you should step over 
when traversing an object lvalue. For example, the bounds to use for 
`*p->member` will be computed as the bounds of `member`, which isn't correct. I 
think you could address this by either having separate traversals for pointers 
versus lvalues, or by avoiding (for example) stepping through lvalue-to-rvalue 
conversions when stepping over `CastExpr`s -- and in fact, the latter seems 
like a good idea in general, given that a `CastExpr` could do pretty much 
anything to the pointer / lvalue. In general, I think it's only really safe to 
step over casts that are a no-op for address purposes. Bitcasts seem OK, 
address space conversions seem OK, etc. but a lot of cast kinds are not going 
to be reasonable to step over.

https://github.com/llvm/llvm-project/pull/83204
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [Clang] Correct __builtin_dynamic_object_size for subobject types (PR #83204)

2024-03-11 Thread Richard Smith via cfe-commits


@@ -1052,11 +1053,143 @@ CodeGenFunction::emitFlexibleArrayMemberSize(const 
Expr *E, unsigned Type,
   return Builder.CreateSelect(Cmp, Res, ConstantInt::get(ResType, 0, 
IsSigned));
 }
 
+namespace {
+
+/// SubobjectFinder - A simple visitor to find the "sub-object" pointed to by a
+/// __builtin_dynamic_object_size call. Information gathered from the
+/// sub-object is used by the back-end to determine the correct size when the
+/// 'TYPE' of the __bdos call has the least significant bit set (i.e. asking
+/// for the sub-object size).
+///
+/// The expectation is that we'll eventually hit one of three expression types:
+///
+///   1. DeclRefExpr - This is the expression for the base of the structure.
+///   2. MemberExpr - This is the field in the structure.
+///   3. CompoundLiteralExpr - This is for people who create something
+///  heretical like (struct foo has a flexible array member):
+///
+///(struct foo){ 1, 2 }.blah[idx];
+///
+/// All other expressions can be correctly handled with the current code.
+struct SubobjectFinder
+: public ConstStmtVisitor {
+  SubobjectFinder() = default;
+
+  
//======//
+  //Visitor Methods
+  
//======//
+
+  const Expr *VisitStmt(const Stmt *S) { return nullptr; }
+
+  const Expr *VisitDeclRefExpr(const DeclRefExpr *E) { return E; }
+  const Expr *VisitMemberExpr(const MemberExpr *E) { return E; }
+  const Expr *VisitCompoundLiteralExpr(const CompoundLiteralExpr *E) {
+return E;
+  }
+
+  const Expr *VisitArraySubscriptExpr(const ArraySubscriptExpr *E) {
+return Visit(E->getBase());
+  }
+  const Expr *VisitCastExpr(const CastExpr *E) {
+return Visit(E->getSubExpr());
+  }
+  const Expr *VisitParenExpr(const ParenExpr *E) {
+return Visit(E->getSubExpr());
+  }
+  const Expr *VisitUnaryAddrOf(const clang::UnaryOperator *E) {
+return Visit(E->getSubExpr());
+  }
+  const Expr *VisitUnaryDeref(const clang::UnaryOperator *E) {
+return Visit(E->getSubExpr());
+  }
+};
+
+} // end anonymous namespace
+
+/// getFieldInfo - Gather the size and offset of the field \p VD in \p RD.
+static std::pair getFieldInfo(CodeGenFunction ,
+  const RecordDecl *RD,
+  const ValueDecl *VD,
+  uint64_t Offset = 0) {
+  if (!RD)
+return std::make_pair(0, 0);
+
+  ASTContext  = CGF.getContext();
+  const ASTRecordLayout  = Ctx.getASTRecordLayout(RD);
+  unsigned FieldNo = 0;
+
+  for (const Decl *D : RD->decls()) {
+if (const auto *Record = dyn_cast(D)) {
+  std::pair Res =
+  getFieldInfo(CGF, Record->getDefinition(), VD,
+   Offset + Layout.getFieldOffset(FieldNo));
+  if (Res.first != 0)
+return Res;
+  continue;
+}
+
+if (const auto *FD = dyn_cast(D); FD == VD) {
+  Offset += Layout.getFieldOffset(FieldNo);
+  return 
std::make_pair(Ctx.getTypeSizeInChars(FD->getType()).getQuantity(),
+Ctx.toCharUnitsFromBits(Offset).getQuantity());
+}
+
+if (isa(D))
+  ++FieldNo;
+  }

zygoloid wrote:

This work recursively looping through fields is not necessary: this function 
only succeeds when `VD` is a `FieldDecl`, so you can `dyn_cast` it to that 
type, then get the enclosing `DeclContext` to find the record, and use 
`FieldDecl::getFieldIndex` to find the field number.

https://github.com/llvm/llvm-project/pull/83204
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [Clang] Correct __builtin_dynamic_object_size for subobject types (PR #83204)

2024-03-11 Thread Richard Smith via cfe-commits


@@ -1052,11 +1053,143 @@ CodeGenFunction::emitFlexibleArrayMemberSize(const 
Expr *E, unsigned Type,
   return Builder.CreateSelect(Cmp, Res, ConstantInt::get(ResType, 0, 
IsSigned));
 }
 
+namespace {
+
+/// SubobjectFinder - A simple visitor to find the "sub-object" pointed to by a
+/// __builtin_dynamic_object_size call. Information gathered from the
+/// sub-object is used by the back-end to determine the correct size when the
+/// 'TYPE' of the __bdos call has the least significant bit set (i.e. asking
+/// for the sub-object size).
+///
+/// The expectation is that we'll eventually hit one of three expression types:
+///
+///   1. DeclRefExpr - This is the expression for the base of the structure.
+///   2. MemberExpr - This is the field in the structure.
+///   3. CompoundLiteralExpr - This is for people who create something
+///  heretical like (struct foo has a flexible array member):
+///
+///(struct foo){ 1, 2 }.blah[idx];
+///
+/// All other expressions can be correctly handled with the current code.
+struct SubobjectFinder
+: public ConstStmtVisitor {
+  SubobjectFinder() = default;
+
+  
//======//
+  //Visitor Methods
+  
//======//
+
+  const Expr *VisitStmt(const Stmt *S) { return nullptr; }
+
+  const Expr *VisitDeclRefExpr(const DeclRefExpr *E) { return E; }
+  const Expr *VisitMemberExpr(const MemberExpr *E) { return E; }
+  const Expr *VisitCompoundLiteralExpr(const CompoundLiteralExpr *E) {
+return E;
+  }
+
+  const Expr *VisitArraySubscriptExpr(const ArraySubscriptExpr *E) {
+return Visit(E->getBase());
+  }
+  const Expr *VisitCastExpr(const CastExpr *E) {
+return Visit(E->getSubExpr());
+  }
+  const Expr *VisitParenExpr(const ParenExpr *E) {
+return Visit(E->getSubExpr());
+  }
+  const Expr *VisitUnaryAddrOf(const clang::UnaryOperator *E) {
+return Visit(E->getSubExpr());
+  }
+  const Expr *VisitUnaryDeref(const clang::UnaryOperator *E) {
+return Visit(E->getSubExpr());
+  }
+};
+
+} // end anonymous namespace
+
+/// getFieldInfo - Gather the size and offset of the field \p VD in \p RD.
+static std::pair getFieldInfo(CodeGenFunction ,
+  const RecordDecl *RD,
+  const ValueDecl *VD,
+  uint64_t Offset = 0) {
+  if (!RD)
+return std::make_pair(0, 0);
+
+  ASTContext  = CGF.getContext();
+  const ASTRecordLayout  = Ctx.getASTRecordLayout(RD);
+  unsigned FieldNo = 0;
+
+  for (const Decl *D : RD->decls()) {
+if (const auto *Record = dyn_cast(D)) {
+  std::pair Res =
+  getFieldInfo(CGF, Record->getDefinition(), VD,
+   Offset + Layout.getFieldOffset(FieldNo));
+  if (Res.first != 0)
+return Res;
+  continue;
+}
+
+if (const auto *FD = dyn_cast(D); FD == VD) {
+  Offset += Layout.getFieldOffset(FieldNo);
+  return 
std::make_pair(Ctx.getTypeSizeInChars(FD->getType()).getQuantity(),
+Ctx.toCharUnitsFromBits(Offset).getQuantity());
+}
+
+if (isa(D))
+  ++FieldNo;
+  }
+
+  return std::make_pair(0, 0);
+}
+
+/// getSubobjectInfo - Find the sub-object that \p E points to. If it lives
+/// inside a struct, return the "size" and "offset" of that sub-object.
+static std::pair getSubobjectInfo(CodeGenFunction ,
+  const Expr *E) {
+  const Expr *Subobject = SubobjectFinder().Visit(E);
+  if (!Subobject)
+return std::make_pair(0, 0);
+
+  const RecordDecl *OuterRD = nullptr;
+  const ValueDecl *VD = nullptr;
+
+  if (const auto *DRE = dyn_cast(Subobject)) {
+// We're pointing to the beginning of the struct.
+VD = DRE->getDecl();
+QualType Ty = VD->getType();
+if (Ty->isPointerType())
+  Ty = Ty->getPointeeType();
+OuterRD = Ty->getAsRecordDecl();

zygoloid wrote:

If I'm reading this correctly, I think this case is redundant: `getFieldInfo` 
only succeeds when `VD` is a field, but we're not going to have an evaluated 
`DeclRefExpr` that names a field. Can we return `0, 0` in this case, like we do 
for compound literals? I think the only case when we have non-zero values to 
return is when we've found a `FieldDecl`.

https://github.com/llvm/llvm-project/pull/83204
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [Clang] Correct __builtin_dynamic_object_size for subobject types (PR #83204)

2024-03-11 Thread Richard Smith via cfe-commits


@@ -1052,11 +1053,143 @@ CodeGenFunction::emitFlexibleArrayMemberSize(const 
Expr *E, unsigned Type,
   return Builder.CreateSelect(Cmp, Res, ConstantInt::get(ResType, 0, 
IsSigned));
 }
 
+namespace {
+
+/// SubobjectFinder - A simple visitor to find the "sub-object" pointed to by a
+/// __builtin_dynamic_object_size call. Information gathered from the
+/// sub-object is used by the back-end to determine the correct size when the
+/// 'TYPE' of the __bdos call has the least significant bit set (i.e. asking
+/// for the sub-object size).
+///
+/// The expectation is that we'll eventually hit one of three expression types:
+///
+///   1. DeclRefExpr - This is the expression for the base of the structure.
+///   2. MemberExpr - This is the field in the structure.
+///   3. CompoundLiteralExpr - This is for people who create something
+///  heretical like (struct foo has a flexible array member):
+///
+///(struct foo){ 1, 2 }.blah[idx];
+///
+/// All other expressions can be correctly handled with the current code.
+struct SubobjectFinder
+: public ConstStmtVisitor {
+  SubobjectFinder() = default;
+
+  
//======//
+  //Visitor Methods
+  
//======//
+
+  const Expr *VisitStmt(const Stmt *S) { return nullptr; }
+
+  const Expr *VisitDeclRefExpr(const DeclRefExpr *E) { return E; }
+  const Expr *VisitMemberExpr(const MemberExpr *E) { return E; }
+  const Expr *VisitCompoundLiteralExpr(const CompoundLiteralExpr *E) {
+return E;
+  }
+
+  const Expr *VisitArraySubscriptExpr(const ArraySubscriptExpr *E) {
+return Visit(E->getBase());
+  }
+  const Expr *VisitCastExpr(const CastExpr *E) {
+return Visit(E->getSubExpr());
+  }

zygoloid wrote:

A derived-to-base cast is a traversal to a subobject in C++, so we should 
presumably terminate the traversal when we reach one and use the offset and 
size of the base class as the subobject.

That'd be a pretty big delta from what you have here, but it'd be a good idea 
to add a FIXME here.

https://github.com/llvm/llvm-project/pull/83204
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [Clang] Correct __builtin_dynamic_object_size for subobject types (PR #83204)

2024-03-11 Thread Richard Smith via cfe-commits


@@ -26996,18 +26996,38 @@ class, structure, array, or other object.
 Arguments:
 ""
 
-The ``llvm.objectsize`` intrinsic takes four arguments. The first argument is a
-pointer to or into the ``object``. The second argument determines whether
-``llvm.objectsize`` returns 0 (if true) or -1 (if false) when the object size 
is
-unknown. The third argument controls how ``llvm.objectsize`` acts when ``null``
-in address space 0 is used as its pointer argument. If it's ``false``,
-``llvm.objectsize`` reports 0 bytes available when given ``null``. Otherwise, 
if
-the ``null`` is in a non-zero address space or if ``true`` is given for the
-third argument of ``llvm.objectsize``, we assume its size is unknown. The 
fourth
-argument to ``llvm.objectsize`` determines if the value should be evaluated at
-runtime.
+The ``llvm.objectsize`` intrinsic takes six arguments:
+
+- The first argument is a pointer to or into the ``object``.
+- The second argument controls which value to return when the size is unknown:
+
+  - If it's ``false``, ``llvm.objectsize`` returns ``-1``.
+  - If it's ``true``, ``llvm.objectsize`` returns ``0``.
+
+- The third argument controls how ``llvm.objectsize`` acts when ``null`` in
+  address space 0 is used as its pointer argument:
+
+  - If it's ``false``, ``llvm.objectsize`` reports 0 bytes available when given
+``null``.
+  - If it's ``true``, or the ``null`` pointer is in a non-zero address space,
+the size is assumed to be unknown.
+
+- The fourth argument to ``llvm.objectsize`` determines if the value should be
+  evaluated at runtime.
+- The fifth argument controls which size ``llvm.objectsize`` returns:
+
+  - If it's ``false``, ``llvm.objectsize`` returns the size of the closest
+surrounding subobject.
+  - If it's ``true``, ``llvm.objectsize`` returns the size of the whole object.
+
+- If non-zero, the sixth and seventh arguments encode the size and offset
+  information, respectively, of the original subobject's layout and is used
+  when the fifth argument is ``false``.
+- The seventh argument encodes the offset information of the original
+  subobject's layout and is used when the fifth argument is ``false``.

zygoloid wrote:

I think the information you're passing in here isn't quite what we'd want. If 
I'm reading the code correctly, the offset you're passing in is the field 
offset relative to the immediately-enclosing record type, which doesn't give us 
any information about either where the pointer is within the subobject, or 
where the subobject is within the complete object, so this doesn't seem like it 
can be enough information to produce a correct result.

Rather than passing in the offset of the subobject (relative to an unknown 
anchor point), I think it would be more useful to pass in a pointer to the 
start of the subobject. Or to pass in the offset from the start of the 
subobject to the pointer argument, but that would likely be harder for the 
frontend to calculate (eg, you'd need to work out the offset produced by array 
indexing).

https://github.com/llvm/llvm-project/pull/83204
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [clang][UBSan] Add implicit conversion check for bitfields (PR #75481)

2024-03-11 Thread Richard Smith via cfe-commits


@@ -5571,11 +5571,52 @@ LValue CodeGenFunction::EmitBinaryOperatorLValue(const 
BinaryOperator *E) {
   break;
 }
 
-RValue RV = EmitAnyExpr(E->getRHS());
+llvm::Value *Previous = nullptr;
+RValue RV;
+QualType SrcType = E->getRHS()->getType();
+// Check if LHS is a bitfield, if RHS contains an implicit cast expression
+// we want to extract that value and potentially (if the bitfield sanitizer
+// is enabled) use it to check for an implicit conversion.
+if (E->getLHS()->refersToBitField()) {
+  // Get the RHS before scalar conversion.
+  if (auto *ICE = GetOriginalRHSForBitfieldSanitizer(E)) {

zygoloid wrote:

What do you think about renaming this to `GetOriginalRHSForBitfieldAssignment`, 
given that we now use it unconditionally for all lowering of bitfield 
assignments?

https://github.com/llvm/llvm-project/pull/75481
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [clang][UBSan] Add implicit conversion check for bitfields (PR #75481)

2024-03-11 Thread Richard Smith via cfe-commits

https://github.com/zygoloid edited 
https://github.com/llvm/llvm-project/pull/75481
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [clang][UBSan] Add implicit conversion check for bitfields (PR #75481)

2024-03-11 Thread Richard Smith via cfe-commits

https://github.com/zygoloid commented:

Looks good to me. @AaronBallman Did you have remaining concerns here?

https://github.com/llvm/llvm-project/pull/75481
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [clang][UBSan] Add implicit conversion check for bitfields (PR #75481)

2024-03-11 Thread Richard Smith via cfe-commits


@@ -555,13 +555,11 @@ static void 
handleImplicitConversion(ImplicitConversionData *Data,
  ReportOptions Opts, ValueHandle Src,
  ValueHandle Dst) {
   SourceLocation Loc = Data->Loc.acquire();
-  ErrorType ET = ErrorType::GenericUB;
-
   const TypeDescriptor  = Data->FromType;
   const TypeDescriptor  = Data->ToType;
-
   bool SrcSigned = SrcTy.isSignedIntegerTy();
   bool DstSigned = DstTy.isSignedIntegerTy();
+  ErrorType ET = ErrorType::GenericUB;

zygoloid wrote:

Ah, OK. That makes sense.

https://github.com/llvm/llvm-project/pull/75481
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [clang][UBSan] Add implicit conversion check for bitfields (PR #75481)

2024-03-08 Thread Richard Smith via cfe-commits

zygoloid wrote:

> I'm not opposed to the changes, though I do find it somewhat strange to add 
> them to UBSan given that they're not undefined behavior (even though there's 
> precedent).

This is adding checks to `-fsanitize=implicit-conversion`, which is not part of 
`-fsanitize=undefined`. It's sort of ambiguous whether UBSan means 
`-fsanitize=undefined` (that is, checks for UB that can be done without a big 
runtime), or whether it means any of the checks that the UBSan infrastructure 
and runtime power, but the established precedent is we also use the UBSan 
infrastructure to do non-UB data-loss checks like this, and this patch seems 
like a logical extension of the existing behavior.

I'd be happier if we drew a brighter distinction between the UB checks and the 
non-UB checks that our sanitizers perform. For example, perhaps the non-UB 
sanitizers shouldn't be listed in 
https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html but in a different 
page, or perhaps at least in a different section from the other checks. But I 
think that's independent of this PR, and in the context of this PR, I don't 
think we should be re-litigating whether this kind of 
data-loss-through-conversion check is in scope.

https://github.com/llvm/llvm-project/pull/75481
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [clang][UBSan] Add implicit conversion check for bitfields (PR #75481)

2024-03-08 Thread Richard Smith via cfe-commits


@@ -555,13 +555,11 @@ static void 
handleImplicitConversion(ImplicitConversionData *Data,
  ReportOptions Opts, ValueHandle Src,
  ValueHandle Dst) {
   SourceLocation Loc = Data->Loc.acquire();
-  ErrorType ET = ErrorType::GenericUB;
-
   const TypeDescriptor  = Data->FromType;
   const TypeDescriptor  = Data->ToType;
-
   bool SrcSigned = SrcTy.isSignedIntegerTy();
   bool DstSigned = DstTy.isSignedIntegerTy();
+  ErrorType ET = ErrorType::GenericUB;

zygoloid wrote:

This seems suspect, given that this handler is detecting things that aren't UB.

https://github.com/llvm/llvm-project/pull/75481
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [clang][UBSan] Add implicit conversion check for bitfields (PR #75481)

2024-03-08 Thread Richard Smith via cfe-commits


@@ -5571,11 +5571,50 @@ LValue CodeGenFunction::EmitBinaryOperatorLValue(const 
BinaryOperator *E) {
   break;
 }
 
-RValue RV = EmitAnyExpr(E->getRHS());
+llvm::Value *Previous = nullptr;
+RValue RV;
+QualType SrcType = E->getRHS()->getType();
+// Check if LHS is a bitfield and sanitizer checks are enabled

zygoloid wrote:

(nit, throughout) Coding style is to end comments with a period.

https://github.com/llvm/llvm-project/pull/75481
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [clang][UBSan] Add implicit conversion check for bitfields (PR #75481)

2024-03-08 Thread Richard Smith via cfe-commits


@@ -5571,11 +5571,50 @@ LValue CodeGenFunction::EmitBinaryOperatorLValue(const 
BinaryOperator *E) {
   break;
 }
 
-RValue RV = EmitAnyExpr(E->getRHS());
+llvm::Value *Previous = nullptr;
+RValue RV;
+QualType SrcType = E->getRHS()->getType();
+// Check if LHS is a bitfield and sanitizer checks are enabled
+if (E->getLHS()->refersToBitField() &&
+SanOpts.hasOneOf(SanitizerKind::ImplicitConversion |
+ SanitizerKind::ImplicitBitfieldConversion)) {

zygoloid wrote:

```suggestion
SanOpts.hasOneOf(SanitizerKind::ImplicitBitfieldConversion)) {
```
No need to check for the containing group; that should be handled automatically.

https://github.com/llvm/llvm-project/pull/75481
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [clang][UBSan] Add implicit conversion check for bitfields (PR #75481)

2024-03-08 Thread Richard Smith via cfe-commits


@@ -5571,11 +5571,50 @@ LValue CodeGenFunction::EmitBinaryOperatorLValue(const 
BinaryOperator *E) {
   break;
 }
 
-RValue RV = EmitAnyExpr(E->getRHS());
+llvm::Value *Previous = nullptr;
+RValue RV;
+QualType SrcType = E->getRHS()->getType();
+// Check if LHS is a bitfield and sanitizer checks are enabled
+if (E->getLHS()->refersToBitField() &&
+SanOpts.hasOneOf(SanitizerKind::ImplicitConversion |
+ SanitizerKind::ImplicitBitfieldConversion)) {
+  // Get the RHS before scalar conversion
+  if (auto *ICE = GetOriginalRHSForBitfieldSanitizer(E)) {
+SrcType = ICE->getSubExpr()->getType();
+Previous = EmitScalarExpr(ICE->getSubExpr());
+// Pass default ScalarConversionOpts to avoid emitting
+// integer sanitizer checks as E refers to bitfield
+llvm::Value *RHS = EmitScalarConversion(
+Previous, SrcType, ICE->getType(), ICE->getExprLoc());
+RV = RValue::get(RHS);
+  }
+}
+
+// Otherwise, visit RHS as usual
+if (!Previous)
+  RV = EmitAnyExpr(E->getRHS());

zygoloid wrote:

Would it be correct to do the above, more-elaborate lowering instead of this 
regardless of whether a sanitizer is enabled? I'd have more confidence that 
we're performing a correct emission if we used the same code regardless of 
sanitizer mode.

https://github.com/llvm/llvm-project/pull/75481
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [clang][UBSan] Add implicit conversion check for bitfields (PR #75481)

2024-03-08 Thread Richard Smith via cfe-commits


@@ -5571,11 +5571,50 @@ LValue CodeGenFunction::EmitBinaryOperatorLValue(const 
BinaryOperator *E) {
   break;
 }
 
-RValue RV = EmitAnyExpr(E->getRHS());
+llvm::Value *Previous = nullptr;
+RValue RV;
+QualType SrcType = E->getRHS()->getType();
+// Check if LHS is a bitfield and sanitizer checks are enabled
+if (E->getLHS()->refersToBitField() &&
+SanOpts.hasOneOf(SanitizerKind::ImplicitConversion |
+ SanitizerKind::ImplicitBitfieldConversion)) {
+  // Get the RHS before scalar conversion
+  if (auto *ICE = GetOriginalRHSForBitfieldSanitizer(E)) {
+SrcType = ICE->getSubExpr()->getType();
+Previous = EmitScalarExpr(ICE->getSubExpr());
+// Pass default ScalarConversionOpts to avoid emitting
+// integer sanitizer checks as E refers to bitfield
+llvm::Value *RHS = EmitScalarConversion(
+Previous, SrcType, ICE->getType(), ICE->getExprLoc());
+RV = RValue::get(RHS);
+  }
+}
+
+// Otherwise, visit RHS as usual
+if (!Previous)
+  RV = EmitAnyExpr(E->getRHS());
+
 LValue LV = EmitCheckedLValue(E->getLHS(), TCK_Store);
+
 if (RV.isScalar())
   EmitNullabilityCheck(LV, RV.getScalarVal(), E->getExprLoc());
-EmitStoreThroughLValue(RV, LV);
+
+// Passing a pointer EmitStoreThroughBitfieldLValue will emit the result
+// If sanitizer checks are not used, we do not use the result
+// Hence, use EmitStoreThroughLValue instead
+if (LV.isBitField() && RV.isScalar() &&
+SanOpts.hasOneOf(SanitizerKind::ImplicitBitfieldConversion)) {
+  // If the expression contained an implicit conversion, make sure
+  // to use the value before the scalar conversion.
+  llvm::Value *Src = Previous ? Previous : RV.getScalarVal();
+  QualType DstType = E->getLHS()->getType();
+  llvm::Value *Result;
+  EmitStoreThroughBitfieldLValue(RV, LV, );
+  EmitBitfieldConversionCheck(Src, SrcType, Result, DstType,
+  LV.getBitFieldInfo(), E->getExprLoc());
+} else
+  EmitStoreThroughLValue(RV, LV);

zygoloid wrote:

Same here: can we use the same code regardless, and just make a choice between 
passing in `` and passing in `nullptr` to 
`EmitStoreThroughBitfieldLValue`, and a choice as to whether we 
`EmitBitfieldConversionCheck`, based on the sanitizer mode?

https://github.com/llvm/llvm-project/pull/75481
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [clang][UBSan] Add implicit conversion check for bitfields (PR #75481)

2024-03-08 Thread Richard Smith via cfe-commits


@@ -4564,15 +4737,41 @@ Value *ScalarExprEmitter::VisitBinAssign(const 
BinaryOperator *E) {
   case Qualifiers::OCL_None:
 // __block variables need to have the rhs evaluated first, plus
 // this should improve codegen just a little.
-RHS = Visit(E->getRHS());
+Value *Previous = nullptr;
+QualType SrcType = E->getRHS()->getType();
+// Check if LHS is a bitfield and sanitizer checks are enabled
+if (E->getLHS()->refersToBitField() &&
+CGF.SanOpts.hasOneOf(SanitizerKind::ImplicitConversion |
+ SanitizerKind::ImplicitBitfieldConversion)) {

zygoloid wrote:

```suggestion
CGF.SanOpts.hasOneOf(SanitizerKind::ImplicitBitfieldConversion)) {
```

https://github.com/llvm/llvm-project/pull/75481
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [clang][UBSan] Add implicit conversion check for bitfields (PR #75481)

2024-03-08 Thread Richard Smith via cfe-commits

https://github.com/zygoloid commented:

Generally this looks good to me, thanks.

https://github.com/llvm/llvm-project/pull/75481
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [clang][UBSan] Add implicit conversion check for bitfields (PR #75481)

2024-03-08 Thread Richard Smith via cfe-commits

https://github.com/zygoloid edited 
https://github.com/llvm/llvm-project/pull/75481
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][test] Add test for incompatible cv-qualified reference types in conversion function template (PR #81950)

2024-03-06 Thread Richard Smith via cfe-commits

https://github.com/zygoloid closed 
https://github.com/llvm/llvm-project/pull/81950
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][test] Add test for incompatible cv-qualified reference types in conversion function template (PR #81950)

2024-03-05 Thread Richard Smith via cfe-commits

https://github.com/zygoloid approved this pull request.


https://github.com/llvm/llvm-project/pull/81950
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang-format] Handle common C++ non-keyword types as such (PR #83709)

2024-03-05 Thread Richard Smith via cfe-commits

zygoloid wrote:

> +1 for this, it will be the 95%/5% rule, you might think all cases of 
> (identifier) is a cast, but I'm pretty sure it won't be

Specifically, what I suggested was: when disambiguating whether 
`(identifier)&...` or `(identifier)*...` is a cast vs a multiplication or 
bitwise and (that is, the case that the previous patch changed, resulting in 
the regression we're trying to fix), treat it as a cast unless `identifier` is 
a macro parameter name. I think that will be right >99% of the time. And even 
when clang-format gets it wrong, the programmer can fix that by removing the 
redundant and non-idiomatic parentheses around the identifier.

> I'm with @owenca on this, this patch will move us closer, meaning users don't 
> need to add common types to TypeNames

As I said before, I think special-casing these typedefs is a good thing - it's 
reasonable for clang-format to assume they're type names. But I don't think 
this moves us much closer to fixing the bug, because most casts won't be to one 
of these typedefs.

I do wonder, though -- there's a lot of churn in this patch adding and wiring 
through a new parameter. Can the `TypeNames` mechanism be prepopulated with 
this list of types instead?

https://github.com/llvm/llvm-project/pull/83709
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang-format] Handle common C++ non-keyword types as such (PR #83709)

2024-03-04 Thread Richard Smith via cfe-commits

zygoloid wrote:

> It does fix the example given.

#83400 has 6 real-world examples. This patch fixes none of them. It also has a 
reduced testcase, which this patch does fix. But fixing the reduced testcase 
without fixing the real-world examples is not fixing the bug.


https://github.com/llvm/llvm-project/pull/83709
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang-format] Handle common C++ non-keyword types as such (PR #83709)

2024-03-03 Thread Richard Smith via cfe-commits

https://github.com/zygoloid edited 
https://github.com/llvm/llvm-project/pull/83709
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang-format] Handle common C++ non-keyword types as such (PR #83709)

2024-03-03 Thread Richard Smith via cfe-commits


@@ -34,9 +34,15 @@ const char *getTokenTypeName(TokenType Type) {
   return nullptr;
 }
 
+// Sorted common C++ non-keyword types.
+static SmallVector CppNonKeywordTypes = {
+"byte",   "int16_t",  "int32_t",  "int64_t",  "int8_t",
+"size_t", "uint16_t", "uint32_t", "uint64_t", "uint8_t",

zygoloid wrote:

On codesearch.isocpp.org, `(intptr_t)` is more common than `(int8_t)` and 
`(int16_t)`, and similar to `(int32_t)`. But why should this general function 
that's used in various places in clang-format care about how often the type is 
cast with a C-style cast? That doesn't make sense to me.

I find it extremely unlikely that anyone would use `intN_t` as anything other 
than a type; I think your concerns about regressions are unfounded. In 
contrast, there are 518 hits for `(intptr_t)&` in isocpp codesearch, which 
seems pretty logical to me because casts to `(intptr_t)` will be casts from 
pointers.

https://github.com/llvm/llvm-project/pull/83709
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang-format] Handle common C++ non-keyword types as such (PR #83709)

2024-03-03 Thread Richard Smith via cfe-commits

zygoloid wrote:

> This patch does not only fix formatting of C-casting to a C++ standard type. 
> It correctly identifies (most of) such types and might have fixed other kinds 
> of bugs.

Sure, this patch seems like a good change. But it does not fix #83400.

https://github.com/llvm/llvm-project/pull/83709
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang-format] Handle common C++ non-keyword types as such (PR #83709)

2024-03-03 Thread Richard Smith via cfe-commits


@@ -34,9 +34,15 @@ const char *getTokenTypeName(TokenType Type) {
   return nullptr;
 }
 
+// Sorted common C++ non-keyword types.
+static SmallVector CppNonKeywordTypes = {
+"byte",   "int16_t",  "int32_t",  "int64_t",  "int8_t",
+"size_t", "uint16_t", "uint32_t", "uint64_t", "uint8_t",
+};
+
 // FIXME: This is copy from Sema. Put it in a common place and remove
 // duplication.

zygoloid wrote:

Before this patch, the two functions were intended to do the same thing, but 
have apparently diverged. After this patch they are not intended to do the same 
thing. The FIXME is wrong, can you fix it please?

https://github.com/llvm/llvm-project/pull/83709
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang-format] Handle common C++ non-keyword types as such (PR #83709)

2024-03-03 Thread Richard Smith via cfe-commits

zygoloid wrote:

> > Another possibility to consider for the original bug: `(single_identifier)` 
> > is almost certainly a cast, not redundant parentheses, unless 
> > `single_identifier` names a macro argument. So I wonder if that would be a 
> > better heuristic to use to fix the regression.
> 
> I don’t think that would work as`(identifier)` can be a number of constructs 
> depending on the context. It can be part of a function/macro call, function 
> declaration, macro definition, C++ cast, C cast, `sizeof` operand, etc.

OK, but the context is deciding whether `(identifier)` or `(identifier)*x` is 
a binary operator or a cast. Given no other information, the fact that it's a 
single identifier seems like a very strong signal that it's a cast.

I don't think this PR fixes #83400 -- handling a small handful of common types 
won't address the same issue for the much larger class of cases where the type 
name is not one of these types. Eg, in #83400 itself the examples included:

```diff
-   static _FORCE_INLINE_ m_type get(const Variant *v) { return 
(m_type)*VariantInternal::get_int(v); }\
+   static _FORCE_INLINE_ m_type get(const Variant *v) { return 
(m_type) * VariantInternal::get_int(v); }  \
```

But looking for a parenthesized single identifier addresses all of the examples 
in the issue.

https://github.com/llvm/llvm-project/pull/83709
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang-format] Handle common C++ non-keyword types as such (PR #83709)

2024-03-03 Thread Richard Smith via cfe-commits


@@ -34,9 +34,15 @@ const char *getTokenTypeName(TokenType Type) {
   return nullptr;
 }
 
+// Sorted common C++ non-keyword types.
+static SmallVector CppNonKeywordTypes = {
+"byte",   "int16_t",  "int32_t",  "int64_t",  "int8_t",
+"size_t", "uint16_t", "uint32_t", "uint64_t", "uint8_t",
+};
+
 // FIXME: This is copy from Sema. Put it in a common place and remove
 // duplication.

zygoloid wrote:

This function is no longer simply copy/pasted from `Sema`. It doesn't even do 
the same thing as the Sema function any more.

https://github.com/llvm/llvm-project/pull/83709
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang-format] Handle common C++ non-keyword types as such (PR #83709)

2024-03-03 Thread Richard Smith via cfe-commits


@@ -34,9 +34,15 @@ const char *getTokenTypeName(TokenType Type) {
   return nullptr;
 }
 
+// Sorted common C++ non-keyword types.
+static SmallVector CppNonKeywordTypes = {
+"byte",   "int16_t",  "int32_t",  "int64_t",  "int8_t",
+"size_t", "uint16_t", "uint32_t", "uint64_t", "uint8_t",

zygoloid wrote:

According to codesearch.isocpp.org, `intptr_t`, `uintptr_t`, and `ptrdiff_t` 
are all more common than `int8_t`. Also I'd note that POSIX reserves all 
identifiers ending in `_t` for use as types, and C reserves all identifiers 
starting with `int` or `uint` and ending in `_t` for use as types.

https://github.com/llvm/llvm-project/pull/83709
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang-format] Handle common C++ non-keyword types as such (PR #83709)

2024-03-03 Thread Richard Smith via cfe-commits

zygoloid wrote:

Another possibility to consider for the original bug: `(single_identifier)` is 
almost certainly a cast, not redundant parentheses, unless `single_identifier` 
names a macro argument. So I wonder if that would be a better heuristic to use 
to fix the regression.

https://github.com/llvm/llvm-project/pull/83709
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang-format] Handle common C++ non-keyword types as such (PR #83709)

2024-03-03 Thread Richard Smith via cfe-commits


@@ -34,9 +34,15 @@ const char *getTokenTypeName(TokenType Type) {
   return nullptr;
 }
 
+// Sorted common C++ non-keyword types.
+static SmallVector CppNonKeywordTypes = {
+"byte",   "int16_t",  "int32_t",  "int64_t",  "int8_t",
+"size_t", "uint16_t", "uint32_t", "uint64_t", "uint8_t",
+};
+
 // FIXME: This is copy from Sema. Put it in a common place and remove
 // duplication.

zygoloid wrote:

Looks like this comment is no longer correct.

https://github.com/llvm/llvm-project/pull/83709
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang-format] Handle common C++ non-keyword types as such (PR #83709)

2024-03-03 Thread Richard Smith via cfe-commits


@@ -66,13 +72,17 @@ bool FormatToken::isSimpleTypeSpecifier() const {
   case tok::kw_decltype:
   case tok::kw__Atomic:
 return true;
+  case tok::identifier:
+return IsCpp && std::binary_search(CppNonKeywordTypes.begin(),
+   CppNonKeywordTypes.end(), TokenText);

zygoloid wrote:

Should this really be C++-only? All of the types in your list are standard 
library header types in C too, except for `byte` (which, as noted above, 
probably shouldn't be there).

https://github.com/llvm/llvm-project/pull/83709
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang-format] Handle common C++ non-keyword types as such (PR #83709)

2024-03-03 Thread Richard Smith via cfe-commits


@@ -34,9 +34,15 @@ const char *getTokenTypeName(TokenType Type) {
   return nullptr;
 }
 
+// Sorted common C++ non-keyword types.
+static SmallVector CppNonKeywordTypes = {
+"byte",   "int16_t",  "int32_t",  "int64_t",  "int8_t",
+"size_t", "uint16_t", "uint32_t", "uint64_t", "uint8_t",

zygoloid wrote:

Maybe also `ptrdiff_t`, `intptr_t`, `uintptr_t`. And there's also `float_t`, 
`double_t`, `uint_least32_t`, ...

I wonder if the best heuristic to use here would be simply to look for an 
identifier that matches `[a-z0-9_]*_t`.

https://github.com/llvm/llvm-project/pull/83709
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang-format] Handle common C++ non-keyword types as such (PR #83709)

2024-03-03 Thread Richard Smith via cfe-commits


@@ -34,9 +34,15 @@ const char *getTokenTypeName(TokenType Type) {
   return nullptr;
 }
 
+// Sorted common C++ non-keyword types.
+static SmallVector CppNonKeywordTypes = {
+"byte",   "int16_t",  "int32_t",  "int64_t",  "int8_t",

zygoloid wrote:

Does `byte` really belong here? It's not available as an unqualified name, only 
as `std::byte`. I could see an argument for including `nullptr_t` here because 
it is available unqualified, but even that seems a little hard to justify since 
I don't think people write that often. I also expect people use `byte` as a 
variable name frequently; the same doesn't seem to be true for these other 
names.

https://github.com/llvm/llvm-project/pull/83709
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Sequence C++20 Parenthesized List Init (PR #83476)

2024-02-29 Thread Richard Smith via cfe-commits

https://github.com/zygoloid commented:

Thanks!

[I'm not sure when I'll have time to circle back to this, so I'd be happy for 
someone else to finish the review and approve.]

https://github.com/llvm/llvm-project/pull/83476
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Sequence C++20 Parenthesized List Init (PR #83476)

2024-02-29 Thread Richard Smith via cfe-commits

https://github.com/zygoloid edited 
https://github.com/llvm/llvm-project/pull/83476
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Sequence C++20 Parenthesized List Init (PR #83476)

2024-02-29 Thread Richard Smith via cfe-commits


@@ -17626,6 +17626,25 @@ class SequenceChecker : public 
ConstEvaluatedExprVisitor {
 for (unsigned I = 0; I < Elts.size(); ++I)
   Tree.merge(Elts[I]);
   }
+
+  void VisitCXXParenListInitExpr(const CXXParenListInitExpr *PLIE) {
+// C++20 parenthesized list initializations are sequenced. See C++20
+// [decl.init]p17.5 and [decl.init]p17.6.2.2

zygoloid wrote:

I think you mean `dcl.init.general`, not `decl.init` here, and the wording is 
in p16 not p17 in C++20 and the current working paper.
```suggestion
// [dcl.init.general]p16.5 and [dcl.init.general]p16.6.2.2.
```

https://github.com/llvm/llvm-project/pull/83476
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Sequence C++20 Parenthesized List Init (PR #83476)

2024-02-29 Thread Richard Smith via cfe-commits


@@ -17626,6 +17626,25 @@ class SequenceChecker : public 
ConstEvaluatedExprVisitor {
 for (unsigned I = 0; I < Elts.size(); ++I)
   Tree.merge(Elts[I]);
   }
+
+  void VisitCXXParenListInitExpr(const CXXParenListInitExpr *PLIE) {
+// C++20 parenthesized list initializations are sequenced. See C++20
+// [decl.init]p17.5 and [decl.init]p17.6.2.2
+SmallVector Elts;
+SequenceTree::Seq Parent = Region;
+for (const Expr *E : PLIE->getInitExprs()) {
+  if (!E)
+continue;
+  Region = Tree.allocate(Parent);
+  Elts.push_back(Region);
+  Visit(E);
+}
+
+// Forget that the initializers are sequenced.
+Region = Parent;
+for (unsigned I = 0; I < Elts.size(); ++I)
+  Tree.merge(Elts[I]);
+  }

zygoloid wrote:

Is it feasible to factor out the common code between this and 
`VisitInitListExpr`? This looks nearly identical.

https://github.com/llvm/llvm-project/pull/83476
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Add __builtin_start_object_lifetime builtin. (PR #82776)

2024-02-23 Thread Richard Smith via cfe-commits


@@ -896,6 +896,12 @@ def Launder : Builtin {
   let Prototype = "void*(void*)";
 }
 
+def StartObjectLifeTime : Builtin {

zygoloid wrote:

Nit: lifetime is one word not two, so the `T` should not be capitalized.

https://github.com/llvm/llvm-project/pull/82776
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang-format] adds a space after not inside macros (PR #78176)

2024-02-04 Thread Richard Smith via cfe-commits


@@ -4842,7 +4842,7 @@ bool TokenAnnotator::spaceRequiredBefore(const 
AnnotatedLine ,
 return true;
   }
   if (Left.is(TT_UnaryOperator)) {
-if (Right.isNot(tok::l_paren)) {
+if (!Right.isOneOf(tok::r_paren, tok::l_paren, tok::exclaim)) {

zygoloid wrote:

Another data point: the [Python style 
guide](https://peps.python.org/pep-0008/#other-recommendations) also recommends 
always putting a space after `not`.

https://github.com/llvm/llvm-project/pull/78176
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang-format] adds a space after not inside macros (PR #78176)

2024-02-04 Thread Richard Smith via cfe-commits

https://github.com/zygoloid edited 
https://github.com/llvm/llvm-project/pull/78176
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang-format] adds a space after not inside macros (PR #78176)

2024-02-04 Thread Richard Smith via cfe-commits

https://github.com/zygoloid edited 
https://github.com/llvm/llvm-project/pull/78176
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang-format] adds a space after not inside macros (PR #78176)

2024-02-04 Thread Richard Smith via cfe-commits

https://github.com/zygoloid edited 
https://github.com/llvm/llvm-project/pull/78176
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang-format] adds a space after not inside macros (PR #78176)

2024-02-04 Thread Richard Smith via cfe-commits


@@ -4842,7 +4842,7 @@ bool TokenAnnotator::spaceRequiredBefore(const 
AnnotatedLine ,
 return true;
   }
   if (Left.is(TT_UnaryOperator)) {
-if (Right.isNot(tok::l_paren)) {
+if (!Right.isOneOf(tok::r_paren, tok::l_paren, tok::exclaim)) {

zygoloid wrote:

For `not!`, I think adding a space is the right choice for a C++ code 
formatter: `not !b` should probably be formatted with a space, not as `not!b`, 
even though it's a pretty strange thing to write.

I wonder if this should instead be:
```suggestion
if (!Right.isOneOf(tok::r_paren, tok::l_paren, tok::comma)) {
```
... with the `)` and `,` cases firing only in places where we can locally tell 
this isn't actually a `not` operator at all.

https://github.com/llvm/llvm-project/pull/78176
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang] Convert __builtin_dynamic_object_size into a calculation (PR #80256)

2024-02-01 Thread Richard Smith via cfe-commits


@@ -1051,6 +1052,145 @@ CodeGenFunction::emitFlexibleArrayMemberSize(const Expr 
*E, unsigned Type,
   return Builder.CreateSelect(Cmp, Res, ConstantInt::get(ResType, 0, 
IsSigned));
 }
 
+namespace {
+
+/// \p StructBaseExpr returns the base \p Expr with a structure or union type.
+struct StructBaseExpr : public ConstStmtVisitor {
+  StructBaseExpr() = default;
+
+  
//======//
+  //Visitor Methods
+  
//======//
+
+  const Expr *VisitStmt(const Stmt *S) { return nullptr; }
+
+  const Expr *Visit(const Expr *E) {
+QualType Ty = E->getType();
+if (Ty->isStructureType() || Ty->isUnionType())
+  return E;
+
+return ConstStmtVisitor::Visit(E);
+  }
+
+  const Expr *VisitDeclRefExpr(const DeclRefExpr *E) { return E; }
+
+  const Expr *VisitMemberExpr(const MemberExpr *E) {
+return Visit(E->getBase());
+  }
+  const Expr *VisitArraySubscriptExpr(const ArraySubscriptExpr *E) {
+return Visit(E->getBase());
+  }
+  const Expr *VisitCastExpr(const CastExpr *E) {
+return Visit(E->getSubExpr());
+  }
+  const Expr *VisitParenExpr(const ParenExpr *E) {
+return Visit(E->getSubExpr());
+  }
+  const Expr *VisitUnaryAddrOf(const clang::UnaryOperator *E) {
+return Visit(E->getSubExpr());
+  }
+  const Expr *VisitUnaryDeref(const clang::UnaryOperator *E) {
+return Visit(E->getSubExpr());
+  }
+};
+
+} // end anonymous namespace
+
+/// The offset of a field from the beginning of the record.
+llvm::Value *
+CodeGenFunction::tryEmitObjectSizeCalculation(const Expr *E, unsigned Type,
+  llvm::IntegerType *ResType) {
+  if ((Type & 0x01) != 0)
+// We handle only the whole object size.
+return nullptr;
+
+  E = E->IgnoreParenImpCasts();
+
+  const Expr *Base = StructBaseExpr().Visit(E);
+  if (!Base)
+return nullptr;
+
+  const RecordDecl *RD = Base->getType()->getAsRecordDecl();
+  if (!RD)
+return nullptr;
+
+  // Get the full size of the struct.
+  ASTContext  = getContext();
+  const RecordDecl *OuterRD = RD->getOuterLexicalRecordContext();
+  const clang::Type *RT = OuterRD->getTypeForDecl();
+  CharUnits RecordSize = Ctx.getTypeSizeInChars(RT);
+
+  Value *Res = nullptr;
+
+  if (const auto *U = dyn_cast(E);
+  U && (U->getOpcode() == UO_AddrOf || U->getOpcode() == UO_Deref))
+E = U->getSubExpr()->IgnoreParenImpCasts();
+
+  if (const auto *ASE = dyn_cast(E)) {
+const Expr *Idx = ASE->getIdx();
+Base = ASE->getBase()->IgnoreParenImpCasts();
+
+if (const auto *ME = dyn_cast(Base);
+ME && ME->getType()->isConstantArrayType()) {
+  // The simple case:
+  //
+  // struct s {
+  // int arr[42];
+  // char c;
+  // /* others */
+  // };
+  //
+  // __builtin_dynamic_object_size(>arr[idx], 0);
+  //
+  // We can translate the __builtin_dynamic_object_call into:
+  //
+  // sizeof(struct s) - offsetof(arr) - (idx * sizeof(int))
+  //

zygoloid wrote:

I assume we only need to do this for `Type == 1` (for `0` and `2`, we can just 
use the LLVM intrinsic, and you're bailing out for `3`). In that case, I don't 
think we need to care about `sizeof s` here -- for a pointer to an array 
subscript expression where we know the array bound, we can use `max(sizeof(arr) 
- sizeof(arr[0]) * idx, 0)`. Type 1 shouldn't permit overwriting the unrelated 
`c` field in the example above.

https://github.com/llvm/llvm-project/pull/80256
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang] Convert __builtin_dynamic_object_size into a calculation (PR #80256)

2024-02-01 Thread Richard Smith via cfe-commits


@@ -1051,6 +1052,145 @@ CodeGenFunction::emitFlexibleArrayMemberSize(const Expr 
*E, unsigned Type,
   return Builder.CreateSelect(Cmp, Res, ConstantInt::get(ResType, 0, 
IsSigned));
 }
 
+namespace {
+
+/// \p StructBaseExpr returns the base \p Expr with a structure or union type.
+struct StructBaseExpr : public ConstStmtVisitor {
+  StructBaseExpr() = default;
+
+  
//======//
+  //Visitor Methods
+  
//======//
+
+  const Expr *VisitStmt(const Stmt *S) { return nullptr; }
+
+  const Expr *Visit(const Expr *E) {
+QualType Ty = E->getType();
+if (Ty->isStructureType() || Ty->isUnionType())
+  return E;
+
+return ConstStmtVisitor::Visit(E);
+  }
+
+  const Expr *VisitDeclRefExpr(const DeclRefExpr *E) { return E; }
+
+  const Expr *VisitMemberExpr(const MemberExpr *E) {
+return Visit(E->getBase());
+  }
+  const Expr *VisitArraySubscriptExpr(const ArraySubscriptExpr *E) {
+return Visit(E->getBase());
+  }
+  const Expr *VisitCastExpr(const CastExpr *E) {
+return Visit(E->getSubExpr());
+  }
+  const Expr *VisitParenExpr(const ParenExpr *E) {
+return Visit(E->getSubExpr());
+  }
+  const Expr *VisitUnaryAddrOf(const clang::UnaryOperator *E) {
+return Visit(E->getSubExpr());
+  }
+  const Expr *VisitUnaryDeref(const clang::UnaryOperator *E) {
+return Visit(E->getSubExpr());
+  }

zygoloid wrote:

Looking through both casts and dereferences seems like it won't be correct:

```
struct A { int ***p; } a;
int n = __builtin_dynamic_object_size(***a.p, 0);
```
... looks like it'll base the result on the size of `A`, which is unrelated.

I think you should be more conservative about the kinds of casts you'll walk 
through here: bitcasts and pointer casts seem potentially OK to step through, 
but things like lvalue-to-rvalue conversions will be problematic.

But looking at what you're doing with this below, I think it might be possible 
to avoid doing this walk entirely.

https://github.com/llvm/llvm-project/pull/80256
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[llvm] [clang] [Clang] Correct __builtin_dynamic_object_size for subobject types (PR #78526)

2024-01-30 Thread Richard Smith via cfe-commits

zygoloid wrote:

It might be academic at this point, but for what it's worth, 
`__builtin_dynamic_object_size` is not a GCC builtin that clang copied, it's [a 
clang builtin](https://reviews.llvm.org/D56760) that GCC copied.

https://github.com/llvm/llvm-project/pull/78526
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang-tools-extra] [mlir] [llvm] [compiler-rt] [clang] Fix a bug with qualified name lookup into current instantiation (PR #73018)

2024-01-29 Thread Richard Smith via cfe-commits

zygoloid wrote:

> I'm not sure I understand why gcc is giving an error in 
> https://godbolt.org/z/qGfnzhfsK. It doesn't give an error with `typename` 
> keyword and I think there shouldn't be any difference since `Y::E` refers to 
> current instantiation and is being used by class member declaration so no 
> `typename` required. cc @shafik @cor3ntin @zygoloid for an opinion.

This looks like a GCC bug to me. I think the presence or absence of `typename` 
isn't supposed to make a difference in an implicit typename context.

https://github.com/llvm/llvm-project/pull/73018
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[llvm] [clang] [Clang] Correct __builtin_dynamic_object_size for subobject types (PR #78526)

2024-01-19 Thread Richard Smith via cfe-commits

zygoloid wrote:

> My answer for the question "what's the semantics of GCC's builtin X?" has 
> always been "whatever GCC does." It's the best we can rely upon. But then we 
> get into situations like this, where you and @nikic have one interpretation 
> of their documentation and I have another. I can point to their behavior to 
> back up my claim, but in the end it's probably not exactly clear even to GCC.

[@nikic 
demonstrated](https://github.com/llvm/llvm-project/pull/78526#issuecomment-1900439850)
 that our current behavior is already compatible with GCC's behavior. If GCC's 
behavior is the spec, then we are allowed to return 48 rather than only 40 or 
-1 (or presumably 0 if `argc` is out of bounds) for the original example, 
because in some cases GCC does so.

> My concern is that we want to use this for code hardening. Without precise 
> object sizes, we're hampered in our goal. The unfortunate reality is that we 
> can only get that size via these `__builtin_[dynamic_]object_size` functions.

That's a totally understandable desire, but I think it's not realistic to 
expect precise *sub*object sizes in the same cases that GCC can provide them, 
due to the different architectural choices in the two compilers. If we had a 
mid-level IR for Clang that still had frontend information, we could do better 
by evaluating BOS there, so maybe that's one long term path forward to 
consider. And in the short term, while there are cases where we won't be able 
to match GCC, I think Clang should do better than it currently does in the 
frontend, specifically in cases like the one in the bug report where there's an 
obvious better answer that doesn't require any sophisticated analysis to 
discover.

> > Here, `f` ideally would return 4, but at the LLVM IR level, `p` and `q` are 
> > identical values and the `>a` operation is a no-op. In cases like this, 
> > the best we can realistically do is to return 8.
> 
> The sub-object for `>a` and even `>b` is `struct X`, not the integers 
> themselves. If you want that, you'll have to use casts: `&((char 
> *)p->b)[2];`. (I had to take care to get that correct.) So `f` should return 
> `8` (note it's likely to get `8` from the `alloc_size` attribute on `malloc` 
> in your example).

GCC disagrees with you: https://godbolt.org/z/s4P74oEqx

https://github.com/llvm/llvm-project/pull/78526
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [Clang] Correct __builtin_dynamic_object_size for subobject types (PR #78526)

2024-01-19 Thread Richard Smith via cfe-commits

zygoloid wrote:

> Perhaps we need clarification on what GCC means by "may point to multiple 
> objects" in this instance. To me that means either "get me the size of the 
> largest of these multiple objects" or "size of the smallest." In my eyes, 
> that means pointing to a union field.

Per @nikic's example, it seems reasonably clear to me that GCC's intended 
semantics are to get either the best upper bound or the best lower bound that 
the compiler is able to compute. I mean sure, we can ask, and it does no harm 
to do so, but it is not reasonable to expect that a quantity like this coming 
from the vagaries of GCC's implementation details will be exactly the same for 
all examples when computed by a different implementation. (It's not even the 
same across optimization levels in GCC.)

> I know that we lose precise struct information going to LLVM IR. If that's 
> what's needed here, there are ways to pass this information along. We retain 
> this information via DWARF. We could use similar metadata for this instance. 
> Would that be acceptable?

In theory, yes, we could preserve enough information in metadata to compute 
this in the middle-end. But we would need to emit a *lot* of metadata, just on 
the off-chance that after optimization we happen to have a builtin_object_size 
query that points to each object that we emit, so in practice I don't think 
there's any chance we can do this. We can't use DWARF, because it won't 
necessarily be available (and typically won't be available in the interesting 
case where we find the object size only after optimization), and in any case, 
the presence or absence of DWARF isn't supposed to affect the executable code. 
Also, we'd need to annotate things that simply don't exist at all in the LLVM 
IR:

```
typedef struct X { int a, b; } X;
int f(void) {
  X *p = malloc(sizeof(X));
  int *q = >a;
  return __builtin_object_size(q, 1);
}
```

Here, `f` ideally would return 4, but at the LLVM IR level, `p` and `q` are 
identical values and the `>a` operation is a no-op. In cases like this, the 
best we can realistically do is to return 8.

https://github.com/llvm/llvm-project/pull/78526
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [Clang] Correct __builtin_dynamic_object_size for subobject types (PR #78526)

2024-01-19 Thread Richard Smith via cfe-commits

zygoloid wrote:

Taking a step back, while this patch is not the right direction, we can and 
should do better for the original example. Probably the best way to do that is 
to analyze the operand to `__builtin_[dynamic_]object_size` in the frontend and 
compute a better bound based on the form of the expression. It looks like it 
should be feasible to produce a tighter bound for an array-to-pointer-decay 
expression like `f.bar[argc]` in subobject mode, as:

- `select llvm.is.constant(argc) and (argc < 0 or argc >= 2), 0, 
sizeof(f.bar[argc])` for the non-dynamic case, and just
- `select (argc < 0 or argc >= 2), 0, sizeof(f.bar[argc])` for the dynamic case.

A possibly simpler alternative would be for the frontend to pass an upper bound 
on the result to the LLVM builtin in mode 1, so Clang could say "I know the 
result will never be more than 40" and LLVM could provide either that size or 
the complete object size, whichever is smaller. That wouldn't give as tight a 
bound for the argc == 2 case, though.

https://github.com/llvm/llvm-project/pull/78526
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


  1   2   3   4   5   6   7   8   9   10   >