jfb added a comment.

Can you add a link to bug 40605 in the commit message?

> I'm not quite sure how to show the resulting difference in code. Do you mean 
> we need Clang to support both modes and to compare the resulting assembly?

I only meant tests that show codegen, as you've now added auto-var-init.cpp. 
There might be interesting cases which I wasn't testing when I wrote it, so if 
you think of any please do make sure you add some. You mentioned a Linux struct 
that used to crash? That would be a useful test (I imagine it's one with the 
`Loc` adjustment).



================
Comment at: tools/clang/lib/CodeGen/CGDecl.cpp:1143
+    const llvm::StructLayout *Layout =
+        CGM.getDataLayout().getStructLayout(cast<llvm::StructType>(Ty));
+    for (unsigned i = 0; i != constant->getNumOperands(); i++) {
----------------
glider wrote:
> jfb wrote:
> > I think you need a heuristic here, where you don't emit multiple stores if 
> > `getNumOperands` is greater than some number. In that case you'd keep doing 
> > memcpy instead. Note that a struct containing sub-structs (or arrays) 
> > should also count the number of sub-structs (so just checking 
> > `getNumOperands` isn't sufficient).
> > 
> > Other parts of this code chose 6 stores...
> Do we ever need to not split these stores? Can't the  MemCpyOpt pass take 
> care of them later on?
You're assuming the optimizer is running at all :-)
In general your approach seems to care about `-O2`, and I agree that's usually 
what we want to tune. However, clang also support `-O0` which won't touch these 
stores at all, and in `-O0` it's usually nice to just see a copy from a global 
while debugging. Further, configurations such as `-Os` and `-Oz` would be 
hampered by excessive codegen.

You also need to consider the compile-time hit the expansion you're making 
would add. More code means more time to compile, and I'd expect little to no 
win from emitting more than a handful of stores. If someone has a large thing 
on the stack, and it can be optimized, then we should teach the optimizer to 
deal with memcpy better. I don't think e.g. a 400+ byte struct makes sense to 
initialize with a bunch of store.

So I think you want to check what the size of the struct is (check out when 
x86-64 and ARM64 inline memcpy / memset, I assume those are decent guesses for 
size to inline).

Further, your current code sill do something silly for code like this:
```
struct S { int j; char arr[8]; float f; };
```
Before you'd have a single `memcpy`, but now it'll be a store, a `memset`, and 
a store. I think you really want to have the same heuristic for arrays (so 
you'd have just stores), and you want to make sure that if you're going to 
split up a struct you do so for all the fields inside that struct.


================
Comment at: tools/clang/lib/CodeGen/CGDecl.cpp:1765
+  emitStoresForConstant(CGM, D, Loc, isVolatile, Builder, constant,
+                        /*forInit*/ false);
 }
----------------
glider wrote:
> jfb wrote:
> > Can you explain why this case needs to be different? IIUC it's because the 
> > types can differ which makes the struct's elements not match up the ones 
> > from the constant in the loop above? I think the code above should 
> > automatically handle that case instead of passing in a flag here.
> > 
> > You also definitely need to test the case where that happens :)
> I just thought we shouldn't break the constants that we didn't create with 
> -ftrivial-var-auto-init.
> I'll take a closer look though (indeed, this crashes on one of the structs 
> from the Linux kernel)
Yeah I think it's from the `Loc` adjustment above. I think your change is worth 
doing for all cases, not just trivial var auto-init. It'll lead us to optimize 
all of it much better IMO (and support things like `-O0` `-Os` and `-Oz` 
uniformly well too).


================
Comment at: tools/clang/test/CodeGenCXX/auto-var-init.cpp:37
 // PATTERN: @__const.test_small_custom.custom = private unnamed_addr constant 
%struct.small { i8 42 }, align 1
 // ZERO: @__const.test_small_custom.custom = private unnamed_addr constant 
%struct.small { i8 42 }, align 1
 struct small { char c; };
----------------
How come `@__const.test_small_custom.custom` still gets emitted? I guess custom 
initialization goes through a different code path?


================
Comment at: tools/clang/test/CodeGenCXX/auto-var-init.cpp:496
 // ZERO-LABEL: @test_smallpartinit_uninit()
 // ZERO: call void @llvm.memset{{.*}}, i8 0,
 
----------------
I wonder if (maybe in a separate patch) you also want to avoid `memset` when 
something is pretty small.


================
Comment at: tools/clang/test/CodeGenCXX/auto-var-init.cpp:664
+// PATTERN: store i32 -1431655766, {{.*}} align 4
+// PATTERN: call void @llvm.memset.{{.*}}(i8* align 4 {{.*}}, i8 0, i64 0, i1 
false)
 // ZERO-LABEL: @test_arraytail_uninit()
----------------
This one is particularly silly since it's a zero-sized `memset` :)


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D57898/new/

https://reviews.llvm.org/D57898



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to