[PATCH] D49771: CodeGen: use non-zero memset when possible for automatic variables

2018-08-01 Thread Joerg Sonnenberger via Phabricator via cfe-commits
joerg added a comment.

There are two different considerations here:
(1) Create less target code
(2) Create less IR

If this code can significantly reduce the amount of IR, it can be useful in 
general. That's why the existing memset logic is helpful.


Repository:
  rL LLVM

https://reviews.llvm.org/D49771



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D49771: CodeGen: use non-zero memset when possible for automatic variables

2018-07-31 Thread JF Bastien via Phabricator via cfe-commits
jfb added a comment.

In https://reviews.llvm.org/D49771#1183641, @mehdi_amini wrote:

> > I'm worried, however, about generating a bunch more code than needed from 
> > clang in the hopes that the compiler will clean it up later.
>
> Isn't that a strong design component of clang/LLVM? Clang does not try to 
> generate "smart" code; it leaves it up to LLVM to clean things up.


The code around this one, and lack of code in LLVM, seem to disagree. :-)




[PATCH] D49771: CodeGen: use non-zero memset when possible for automatic variables

2018-07-31 Thread Mehdi AMINI via Phabricator via cfe-commits
mehdi_amini added a comment.

> I'm worried, however, about generating a bunch more code than needed from 
> clang in the hopes that the compiler will clean it up later.

Isn't that a strong design component of clang/LLVM? Clang does not try to 
generate "smart" code; it leaves it up to LLVM to clean things up.




[PATCH] D49771: CodeGen: use non-zero memset when possible for automatic variables

2018-07-31 Thread JF Bastien via Phabricator via cfe-commits
jfb added a comment.

In https://reviews.llvm.org/D49771#1183562, @mehdi_amini wrote:

> In https://reviews.llvm.org/D49771#1183380, @jfb wrote:
>
> > In https://reviews.llvm.org/D49771#1181008, @mehdi_amini wrote:
> >
> > > I'm curious: isn't this the kind of optimization we should expect LLVM 
> > > to provide?
> >
> >
> > Maybe? It seems obvious to do here since we know we'll probably want to be 
> > doing it, and I have another patch I'm working on which will make it that 
> > much more obviously useful to have here. The middle-end can definitely 
> > figure it out but it just seems like more work, later, so in the meantime 
> > we'd be looking at more stuff.
>
>
> I'm not asking where it is easier to do, but where it makes the most 
> sense :)


What I mean by "easy" is: we know we're likely to want this type of code, 
there's not much pattern recognition needed on our part here. Were we to wait 
we'd need to do more work. I believe this statement will become truer over time.

> Doing this in LLVM in general means catching more patterns (e.g. after 
> inlining), and also catching them from multiple frontends. So in general 
> I'm worried when I see optimizations implemented in the frontend instead of 
> the middle end.

Agreed, LLVM could also do it, and it would likely be useful to do so. I'm 
worried, however, about generating a bunch more code than needed from clang in 
the hopes that the compiler will clean it up later.




[PATCH] D49771: CodeGen: use non-zero memset when possible for automatic variables

2018-07-31 Thread Mehdi AMINI via Phabricator via cfe-commits
mehdi_amini added a comment.

In https://reviews.llvm.org/D49771#1183380, @jfb wrote:

> In https://reviews.llvm.org/D49771#1181008, @mehdi_amini wrote:
>
> > I'm curious: isn't this the kind of optimization we should expect LLVM to 
> > provide?
>
>
> Maybe? It seems obvious to do here since we know we'll probably want to be 
> doing it, and I have another patch I'm working on which will make it that 
> much more obviously useful to have here. The middle-end can definitely figure 
> it out but it just seems like more work, later, so in the meantime we'd be 
> looking at more stuff.


I'm not asking where it is easier to do, but where it makes the most sense :)
Doing this in LLVM in general means catching more patterns (e.g. after 
inlining), and also catching them from multiple frontends. So in general I'm 
worried when I see optimizations implemented in the frontend instead of the 
middle end.




[PATCH] D49771: CodeGen: use non-zero memset when possible for automatic variables

2018-07-31 Thread JF Bastien via Phabricator via cfe-commits
jfb added a comment.

In https://reviews.llvm.org/D49771#1181008, @mehdi_amini wrote:

> I'm curious: isn't this the kind of optimization we should expect LLVM to 
> provide?


Maybe? It seems obvious to do here since we know we'll probably want to be 
doing it, and I have another patch I'm working on which will make it that much 
more obviously useful to have here. The middle-end can definitely figure it out 
but it just seems like more work, later, so in the meantime we'd be looking at 
more stuff.




[PATCH] D49771: CodeGen: use non-zero memset when possible for automatic variables

2018-07-30 Thread Mehdi AMINI via Phabricator via cfe-commits
mehdi_amini added a comment.

I'm curious: isn't this the kind of optimization we should expect LLVM to 
provide?




[PATCH] D49771: CodeGen: use non-zero memset when possible for automatic variables

2018-07-24 Thread JF Bastien via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rL337887: CodeGen: use non-zero memset when possible for 
automatic variables (authored by jfb, committed by ).
Herald added a subscriber: llvm-commits.

Repository:
  rL LLVM

https://reviews.llvm.org/D49771

Files:
  cfe/trunk/lib/CodeGen/CGDecl.cpp
  cfe/trunk/test/CodeGen/init.c

Index: cfe/trunk/lib/CodeGen/CGDecl.cpp
===
--- cfe/trunk/lib/CodeGen/CGDecl.cpp
+++ cfe/trunk/lib/CodeGen/CGDecl.cpp
@@ -948,6 +948,113 @@
  canEmitInitWithFewStoresAfterBZero(Init, StoreBudget);
 }
 
+/// A byte pattern.
+///
+/// Can be "any" pattern if the value was padding or known to be undef.
+/// Can be "none" pattern if a sequence doesn't exist.
+class BytePattern {
+  uint8_t Val;
+  enum class ValueType : uint8_t { Specific, Any, None } Type;
+  BytePattern(ValueType Type) : Type(Type) {}
+
+public:
+  BytePattern(uint8_t Value) : Val(Value), Type(ValueType::Specific) {}
+  static BytePattern Any() { return BytePattern(ValueType::Any); }
+  static BytePattern None() { return BytePattern(ValueType::None); }
+  bool isAny() const { return Type == ValueType::Any; }
+  bool isNone() const { return Type == ValueType::None; }
+  bool isValued() const { return Type == ValueType::Specific; }
+  uint8_t getValue() const {
+    assert(isValued());
+    return Val;
+  }
+  BytePattern merge(const BytePattern Other) const {
+    if (isNone() || Other.isNone())
+      return None();
+    if (isAny())
+      return Other;
+    if (Other.isAny())
+      return *this;
+    if (getValue() == Other.getValue())
+      return *this;
+    return None();
+  }
+};
+
+/// Figures out whether the constant can be initialized with memset.
+static BytePattern constantIsRepeatedBytePattern(llvm::Constant *C) {
+  if (isa<llvm::ConstantAggregateZero>(C) || isa<llvm::ConstantPointerNull>(C))
+    return BytePattern(0x00);
+  if (isa<llvm::UndefValue>(C))
+    return BytePattern::Any();
+
+  if (isa<llvm::ConstantInt>(C)) {
+    auto *Int = cast<llvm::ConstantInt>(C);
+    if (Int->getBitWidth() % 8 != 0)
+      return BytePattern::None();
+    const llvm::APInt &Value = Int->getValue();
+    if (Value.isSplat(8))
+      return BytePattern(Value.getLoBits(8).getLimitedValue());
+    return BytePattern::None();
+  }
+
+  if (isa<llvm::ConstantFP>(C)) {
+    auto *FP = cast<llvm::ConstantFP>(C);
+    llvm::APInt Bits = FP->getValueAPF().bitcastToAPInt();
+    if (Bits.getBitWidth() % 8 != 0)
+      return BytePattern::None();
+    if (!Bits.isSplat(8))
+      return BytePattern::None();
+    return BytePattern(Bits.getLimitedValue() & 0xFF);
+  }
+
+  if (isa<llvm::ConstantVector>(C)) {
+    llvm::Constant *Splat = cast<llvm::ConstantVector>(C)->getSplatValue();
+    if (Splat)
+      return constantIsRepeatedBytePattern(Splat);
+    return BytePattern::None();
+  }
+
+  if (isa<llvm::ConstantArray>(C) || isa<llvm::ConstantStruct>(C)) {
+    BytePattern Pattern(BytePattern::Any());
+    for (unsigned I = 0, E = C->getNumOperands(); I != E; ++I) {
+      llvm::Constant *Elt = cast<llvm::Constant>(C->getOperand(I));
+      Pattern = Pattern.merge(constantIsRepeatedBytePattern(Elt));
+      if (Pattern.isNone())
+        return Pattern;
+    }
+    return Pattern;
+  }
+
+  if (llvm::ConstantDataSequential *CDS =
+          dyn_cast<llvm::ConstantDataSequential>(C)) {
+    BytePattern Pattern(BytePattern::Any());
+    for (unsigned I = 0, E = CDS->getNumElements(); I != E; ++I) {
+      llvm::Constant *Elt = CDS->getElementAsConstant(I);
+      Pattern = Pattern.merge(constantIsRepeatedBytePattern(Elt));
+      if (Pattern.isNone())
+        return Pattern;
+    }
+    return Pattern;
+  }
+
+  // BlockAddress, ConstantExpr, and everything else is scary.
+  return BytePattern::None();
+}
+
+/// Decide whether we should use memset to initialize a local variable instead
+/// of using a memcpy from a constant global. Assumes we've already decided to
+/// not use bzero.
+/// FIXME We could be more clever, as we are for bzero above, and generate
+///       memset followed by stores. It's unclear that's worth the effort.
+static BytePattern shouldUseMemSetToInitialize(llvm::Constant *Init,
+                                               uint64_t GlobalSize) {
+  uint64_t SizeLimit = 32;
+  if (GlobalSize <= SizeLimit)
+    return BytePattern::None();
+  return constantIsRepeatedBytePattern(Init);
+}
+
 /// EmitAutoVarDecl - Emit code and set up an entry in LocalDeclMap for a
 /// variable declaration with auto, register, or no storage class specifier.
 /// These turn into simple stack objects, or GlobalValues depending on target.
@@ -1401,41 +1508,49 @@
   if (Loc.getType() != BP)
     Loc = Builder.CreateBitCast(Loc, BP);
 
-  // If the initializer is all or mostly zeros, codegen with bzero then do a
-  // few stores afterward.
-  if (shouldUseBZeroPlusStoresToInitialize(
-          constant,
-          CGM.getDataLayout().getTypeAllocSize(constant->getType()))) {
+  // If the initializer is all or mostly the same, codegen with bzero / memset
+  // then do a few stores afterward.
+  uint64_t ConstantSize =
+      CGM.getDataLayout().getTypeAllocSize(constant->getType());

[PATCH] D49771: CodeGen: use non-zero memset when possible for automatic variables

2018-07-24 Thread JF Bastien via Phabricator via cfe-commits
jfb marked an inline comment as done.
jfb added a comment.

Addressed all comments.




Comment at: lib/CodeGen/CGDecl.cpp:956-957
+class BytePattern {
+  uint8_t Val;
+  enum class ValueType { Specific, Any, None } Type;
+  BytePattern(ValueType Type) : Type(Type) {}

bogner wrote:
> Probably makes sense to swap the order of these or give the enum class a 
> smaller underlying type than int.
I defined the enum class' storage type as `uint8_t`.
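
To illustrate why the underlying type matters here, a minimal standalone sketch follows. The struct names `WidePattern` and `NarrowPattern` are hypothetical and only mirror the two members of `BytePattern`; the sizes asserted are what mainstream ABIs produce, not something the standard guarantees.

```cpp
#include <cstdint>

// Hypothetical stand-in for BytePattern before the change: the unscoped
// underlying type of an enum class defaults to int, so the enum member
// forces int alignment and padding after Val.
struct WidePattern {
  uint8_t Val;
  enum class ValueType { Specific, Any, None } Type;
};

// Hypothetical stand-in for BytePattern after the change: a uint8_t
// underlying type lets both members pack into two bytes.
struct NarrowPattern {
  uint8_t Val;
  enum class ValueType : uint8_t { Specific, Any, None } Type;
};

// Assumption: a mainstream ABI where uint8_t has size and alignment 1.
static_assert(sizeof(NarrowPattern) == 2, "expected two packed bytes");
static_assert(sizeof(WidePattern) > sizeof(NarrowPattern),
              "int-sized enum should make the struct larger");
```

Swapping the member order, the other option bogner mentions, would not shrink the object on such ABIs, since the int-sized enum still sets the alignment; it would only move the padding to the tail.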


Repository:
  rC Clang

https://reviews.llvm.org/D49771





[PATCH] D49771: CodeGen: use non-zero memset when possible for automatic variables

2018-07-24 Thread JF Bastien via Phabricator via cfe-commits
jfb updated this revision to Diff 157191.
jfb marked 2 inline comments as done.
jfb added a comment.

- Use short to test padding between array elements.
- Define enum class storage type; swap order of if / else to make it more 
readable.


Repository:
  rC Clang

https://reviews.llvm.org/D49771

Files:
  lib/CodeGen/CGDecl.cpp
  test/CodeGen/init.c

Index: test/CodeGen/init.c
===
--- test/CodeGen/init.c
+++ test/CodeGen/init.c
@@ -140,6 +140,72 @@
   // CHECK: call void @bar
 }
 
+void nonzeroMemseti8() {
+  char arr[33] = { 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, };
+  // CHECK-LABEL: @nonzeroMemseti8(
+  // CHECK-NOT: store
+  // CHECK-NOT: memcpy
+  // CHECK: call void @llvm.memset.p0i8.i32(i8* {{.*}}, i8 42, i32 33, i1 false)
+}
+
+void nonzeroMemseti16() {
+  unsigned short arr[17] = { 0x4242, 0x4242, 0x4242, 0x4242, 0x4242, 0x4242, 0x4242, 0x4242, 0x4242, 0x4242, 0x4242, 0x4242, 0x4242, 0x4242, 0x4242, 0x4242, 0x4242, };
+  // CHECK-LABEL: @nonzeroMemseti16(
+  // CHECK-NOT: store
+  // CHECK-NOT: memcpy
+  // CHECK: call void @llvm.memset.p0i8.i32(i8* {{.*}}, i8 66, i32 34, i1 false)
+}
+
+void nonzeroMemseti32() {
+  unsigned arr[9] = { 0xF0F0F0F0, 0xF0F0F0F0, 0xF0F0F0F0, 0xF0F0F0F0, 0xF0F0F0F0, 0xF0F0F0F0, 0xF0F0F0F0, 0xF0F0F0F0, 0xF0F0F0F0, };
+  // CHECK-LABEL: @nonzeroMemseti32(
+  // CHECK-NOT: store
+  // CHECK-NOT: memcpy
+  // CHECK: call void @llvm.memset.p0i8.i32(i8* {{.*}}, i8 -16, i32 36, i1 false)
+}
+
+void nonzeroMemseti64() {
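+  unsigned long long arr[7] = { 0xAAAAAAAAAAAAAAAA, 0xAAAAAAAAAAAAAAAA, 0xAAAAAAAAAAAAAAAA, 0xAAAAAAAAAAAAAAAA, 0xAAAAAAAAAAAAAAAA, 0xAAAAAAAAAAAAAAAA, 0xAAAAAAAAAAAAAAAA, };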
+  // CHECK-LABEL: @nonzeroMemseti64(
+  // CHECK-NOT: store
+  // CHECK-NOT: memcpy
+  // CHECK: call void @llvm.memset.p0i8.i32(i8* {{.*}}, i8 -86, i32 56, i1 false)
+}
+
+void nonzeroMemsetf32() {
+  float arr[9] = { 0x1.cacacap+75, 0x1.cacacap+75, 0x1.cacacap+75, 0x1.cacacap+75, 0x1.cacacap+75, 0x1.cacacap+75, 0x1.cacacap+75, 0x1.cacacap+75, 0x1.cacacap+75, };
+  // CHECK-LABEL: @nonzeroMemsetf32(
+  // CHECK-NOT: store
+  // CHECK-NOT: memcpy
+  // CHECK: call void @llvm.memset.p0i8.i32(i8* {{.*}}, i8 101, i32 36, i1 false)
+}
+
+void nonzeroMemsetf64() {
+  double arr[7] = { 0x1.4444444444444p+69, 0x1.4444444444444p+69, 0x1.4444444444444p+69, 0x1.4444444444444p+69, 0x1.4444444444444p+69, 0x1.4444444444444p+69, 0x1.4444444444444p+69, };
+  // CHECK-LABEL: @nonzeroMemsetf64(
+  // CHECK-NOT: store
+  // CHECK-NOT: memcpy
+  // CHECK: call void @llvm.memset.p0i8.i32(i8* {{.*}}, i8 68, i32 56, i1 false)
+}
+
+void nonzeroPaddedUnionMemset() {
+  union U { char c; int i; };
+  union U arr[9] = { 0xF0, 0xF0, 0xF0, 0xF0, 0xF0, 0xF0, 0xF0, 0xF0, 0xF0, };
+  // CHECK-LABEL: @nonzeroPaddedUnionMemset(
+  // CHECK-NOT: store
+  // CHECK-NOT: memcpy
+  // CHECK: call void @llvm.memset.p0i8.i32(i8* {{.*}}, i8 -16, i32 36, i1 false)
+}
+
+void nonzeroNestedMemset() {
+  union U { char c; int i; };
+  struct S { union U u; short i; };
+  struct S arr[5] = { { {0xF0}, 0xF0F0 }, { {0xF0}, 0xF0F0 }, { {0xF0}, 0xF0F0 }, { {0xF0}, 0xF0F0 }, { {0xF0}, 0xF0F0 }, };
+  // CHECK-LABEL: @nonzeroNestedMemset(
+  // CHECK-NOT: store
+  // CHECK-NOT: memcpy
+  // CHECK: call void @llvm.memset.p0i8.i32(i8* {{.*}}, i8 -16, i32 40, i1 false)
+}
 
 // PR9257
 struct test11S {
Index: lib/CodeGen/CGDecl.cpp
===
--- lib/CodeGen/CGDecl.cpp
+++ lib/CodeGen/CGDecl.cpp
@@ -948,6 +948,113 @@
  canEmitInitWithFewStoresAfterBZero(Init, StoreBudget);
 }
 
+/// A byte pattern.
+///
+/// Can be "any" pattern if the value was padding or known to be undef.
+/// Can be "none" pattern if a sequence doesn't exist.
+class BytePattern {
+  uint8_t Val;
+  enum class ValueType : uint8_t { Specific, Any, None } Type;
+  BytePattern(ValueType Type) : Type(Type) {}
+
+public:
+  BytePattern(uint8_t Value) : Val(Value), Type(ValueType::Specific) {}
+  static BytePattern Any() { return BytePattern(ValueType::Any); }
+  static BytePattern None() { return BytePattern(ValueType::None); }
+  bool isAny() const { return Type == ValueType::Any; }
+  bool isNone() const { return Type == ValueType::None; }
+  bool isValued() const { return Type == ValueType::Specific; }
+  uint8_t getValue() const {
+    assert(isValued());
+    return Val;
+  }
+  BytePattern merge(const BytePattern Other) const {
+    if (isNone() || Other.isNone())
+      return None();
+    if (isAny())
+      return Other;
+    if (Other.isAny())
+      return *this;
+    if (getValue() == Other.getValue())
+      return *this;
+    return None();
+  }
+};
+
+/// Figures out whether the constant can be initialized with memset.
+static BytePattern constantIsRepeatedBytePattern(llvm::Constant *C) {
+  if (isa<llvm::ConstantAggregateZero>(C) || isa<llvm::ConstantPointerNull>(C))
+    return BytePattern(0x0

[PATCH] D49771: CodeGen: use non-zero memset when possible for automatic variables

2018-07-24 Thread Justin Bogner via Phabricator via cfe-commits
bogner accepted this revision.
bogner added a comment.
This revision is now accepted and ready to land.

Seems straightforward and correct to me.




Comment at: lib/CodeGen/CGDecl.cpp:956-957
+class BytePattern {
+  uint8_t Val;
+  enum class ValueType { Specific, Any, None } Type;
+  BytePattern(ValueType Type) : Type(Type) {}

Probably makes sense to swap the order of these or give the enum class a 
smaller underlying type than int.



Comment at: lib/CodeGen/CGDecl.cpp:996-998
+if (!Value.isSplat(8))
+  return BytePattern::None();
+return BytePattern(Value.getLoBits(8).getLimitedValue());

Very much a nitpick, but this would be slightly easier to follow written in the 
order without a negation.
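
For readers without the LLVM headers at hand, here is a minimal sketch of the same check written in the positive order suggested above. `byteSplat` is a hypothetical helper that stands in for `APInt::isSplat(8)` on a fixed 64-bit value; it is not code from the patch.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical helper: returns the repeated byte if all eight bytes of V are
// identical, std::nullopt otherwise. Positive condition first, failure last.
std::optional<uint8_t> byteSplat(uint64_t V) {
  const uint8_t Low = static_cast<uint8_t>(V & 0xFF);
  // Replicate the low byte into every byte position and compare.
  const uint64_t Expected = Low * UINT64_C(0x0101010101010101);
  if (V == Expected)
    return Low;
  return std::nullopt;
}
```

With this sketch, `byteSplat(0xAAAAAAAAAAAAAAAA)` yields `0xAA`, while a value such as `0x4444000000000000` yields no pattern.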




[PATCH] D49771: CodeGen: use non-zero memset when possible for automatic variables

2018-07-24 Thread Arthur O'Dwyer via Phabricator via cfe-commits
Quuxplusone added inline comments.



Comment at: test/CodeGen/init.c:202
+  union U { char c; int i; };
+  struct S { union U u; int i; };
+  struct S arr[5] = { { {0xF0}, 0xF0F0F0F0 }, { {0xF0}, 0xF0F0F0F0 }, { {0xF0}, 0xF0F0F0F0 }, { {0xF0}, 0xF0F0F0F0 }, { {0xF0}, 0xF0F0F0F0 }, };

Drive-by suggestion: If you make this `struct S { union U u; short s; };` then 
you'll also be testing the case of "padding between struct fields", which is 
otherwise untested here.
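
A small sketch of the layout difference this suggestion relies on, assuming a typical ABI with 4-byte `int` and 2-byte `short`. The names `WithInt`, `WithShort`, and `TailPaddingWithShort` are hypothetical; on such ABIs the uncovered bytes end up after the `short` member.

```cpp
#include <cstddef>

union U { char c; int i; };

// Layout used in the original test: every byte after the union is covered
// by the int member.
struct WithInt { U u; int i; };

// Suggested layout: the short covers only part of the last word, so the
// struct also contains bytes that no field initializes.
struct WithShort { U u; short s; };

// Number of bytes in WithShort that follow the last field.
constexpr std::size_t TailPaddingWithShort =
    sizeof(WithShort) - (offsetof(WithShort, s) + sizeof(short));
```

On the assumed ABI both structs are 8 bytes, which is consistent with the 40-byte memset the test expects for five elements, and `TailPaddingWithShort` is 2. Those two bytes, like the three unused bytes of `U` when only `c` is initialized, are the kind of "any" bytes the pattern merge has to tolerate.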




[PATCH] D49771: CodeGen: use non-zero memset when possible for automatic variables

2018-07-24 Thread JF Bastien via Phabricator via cfe-commits
jfb created this revision.
jfb added a reviewer: dexonsmith.
Herald added a subscriber: cfe-commits.

Right now automatic variables are either initialized with bzero followed by a 
few stores, or memcpy'd from a synthesized global. We end up encountering a 
fair amount of code where memset of non-zero byte patterns would be better than 
memcpy from a global because it touches less memory and generates a smaller 
binary. The optimizer could reason about this, but it's not really worth it 
when clang already knows.

This code could definitely be more clever but I'm not sure it's worth it. In 
particular we could track a histogram of bytes seen and figure out (as we do 
with bzero) if a memset could be followed by a handful of stores. Similarly, we 
could tune the heuristics for GlobalSize, but using the same as for bzero seems 
conservatively OK for now.
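
The decision described above can be sketched without any LLVM types. This is a simplified illustration under stated assumptions, not the patch itself: `repeatedByte`, `chooseInit`, and `InitStrategy` are hypothetical names, the initializer is modeled as a flat byte image, and padding/undef bytes (the "any" case) are left out.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical helper: returns the byte value if every byte of Image is
// identical, std::nullopt if the bytes differ or Image is empty.
std::optional<uint8_t> repeatedByte(const std::vector<uint8_t> &Image) {
  if (Image.empty())
    return std::nullopt;
  for (uint8_t B : Image)
    if (B != Image.front())
      return std::nullopt;
  return Image.front();
}

enum class InitStrategy { Memset, Other };

// Only objects larger than the 32-byte limit are considered for memset,
// reusing the same threshold as the bzero heuristic. "Other" stands for
// whatever path would be taken otherwise (stores, memcpy from a global).
InitStrategy chooseInit(const std::vector<uint8_t> &Image) {
  constexpr std::size_t SizeLimit = 32;
  if (Image.size() <= SizeLimit)
    return InitStrategy::Other;
  return repeatedByte(Image) ? InitStrategy::Memset : InitStrategy::Other;
}
```

For example, a 33-byte image of all `42` selects `Memset`, while the same pattern at 32 bytes, or a 33-byte image with one differing byte, selects `Other`.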

rdar://problem/42563091


Repository:
  rC Clang

https://reviews.llvm.org/D49771

Files:
  lib/CodeGen/CGDecl.cpp
  test/CodeGen/init.c

Index: test/CodeGen/init.c
===
--- test/CodeGen/init.c
+++ test/CodeGen/init.c
@@ -140,6 +140,72 @@
   // CHECK: call void @bar
 }
 
+void nonzeroMemseti8() {
+  char arr[33] = { 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, };
+  // CHECK-LABEL: @nonzeroMemseti8(
+  // CHECK-NOT: store
+  // CHECK-NOT: memcpy
+  // CHECK: call void @llvm.memset.p0i8.i32(i8* {{.*}}, i8 42, i32 33, i1 false)
+}
+
+void nonzeroMemseti16() {
+  unsigned short arr[17] = { 0x4242, 0x4242, 0x4242, 0x4242, 0x4242, 0x4242, 0x4242, 0x4242, 0x4242, 0x4242, 0x4242, 0x4242, 0x4242, 0x4242, 0x4242, 0x4242, 0x4242, };
+  // CHECK-LABEL: @nonzeroMemseti16(
+  // CHECK-NOT: store
+  // CHECK-NOT: memcpy
+  // CHECK: call void @llvm.memset.p0i8.i32(i8* {{.*}}, i8 66, i32 34, i1 false)
+}
+
+void nonzeroMemseti32() {
+  unsigned arr[9] = { 0xF0F0F0F0, 0xF0F0F0F0, 0xF0F0F0F0, 0xF0F0F0F0, 0xF0F0F0F0, 0xF0F0F0F0, 0xF0F0F0F0, 0xF0F0F0F0, 0xF0F0F0F0, };
+  // CHECK-LABEL: @nonzeroMemseti32(
+  // CHECK-NOT: store
+  // CHECK-NOT: memcpy
+  // CHECK: call void @llvm.memset.p0i8.i32(i8* {{.*}}, i8 -16, i32 36, i1 false)
+}
+
+void nonzeroMemseti64() {
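+  unsigned long long arr[7] = { 0xAAAAAAAAAAAAAAAA, 0xAAAAAAAAAAAAAAAA, 0xAAAAAAAAAAAAAAAA, 0xAAAAAAAAAAAAAAAA, 0xAAAAAAAAAAAAAAAA, 0xAAAAAAAAAAAAAAAA, 0xAAAAAAAAAAAAAAAA, };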
+  // CHECK-LABEL: @nonzeroMemseti64(
+  // CHECK-NOT: store
+  // CHECK-NOT: memcpy
+  // CHECK: call void @llvm.memset.p0i8.i32(i8* {{.*}}, i8 -86, i32 56, i1 false)
+}
+
+void nonzeroMemsetf32() {
+  float arr[9] = { 0x1.cacacap+75, 0x1.cacacap+75, 0x1.cacacap+75, 0x1.cacacap+75, 0x1.cacacap+75, 0x1.cacacap+75, 0x1.cacacap+75, 0x1.cacacap+75, 0x1.cacacap+75, };
+  // CHECK-LABEL: @nonzeroMemsetf32(
+  // CHECK-NOT: store
+  // CHECK-NOT: memcpy
+  // CHECK: call void @llvm.memset.p0i8.i32(i8* {{.*}}, i8 101, i32 36, i1 false)
+}
+
+void nonzeroMemsetf64() {
+  double arr[7] = { 0x1.4444444444444p+69, 0x1.4444444444444p+69, 0x1.4444444444444p+69, 0x1.4444444444444p+69, 0x1.4444444444444p+69, 0x1.4444444444444p+69, 0x1.4444444444444p+69, };
+  // CHECK-LABEL: @nonzeroMemsetf64(
+  // CHECK-NOT: store
+  // CHECK-NOT: memcpy
+  // CHECK: call void @llvm.memset.p0i8.i32(i8* {{.*}}, i8 68, i32 56, i1 false)
+}
+
+void nonzeroPaddedUnionMemset() {
+  union U { char c; int i; };
+  union U arr[9] = { 0xF0, 0xF0, 0xF0, 0xF0, 0xF0, 0xF0, 0xF0, 0xF0, 0xF0, };
+  // CHECK-LABEL: @nonzeroPaddedUnionMemset(
+  // CHECK-NOT: store
+  // CHECK-NOT: memcpy
+  // CHECK: call void @llvm.memset.p0i8.i32(i8* {{.*}}, i8 -16, i32 36, i1 false)
+}
+
+void nonzeroNestedMemset() {
+  union U { char c; int i; };
+  struct S { union U u; int i; };
+  struct S arr[5] = { { {0xF0}, 0xF0F0F0F0 }, { {0xF0}, 0xF0F0F0F0 }, { {0xF0}, 0xF0F0F0F0 }, { {0xF0}, 0xF0F0F0F0 }, { {0xF0}, 0xF0F0F0F0 }, };
+  // CHECK-LABEL: @nonzeroNestedMemset(
+  // CHECK-NOT: store
+  // CHECK-NOT: memcpy
+  // CHECK: call void @llvm.memset.p0i8.i32(i8* {{.*}}, i8 -16, i32 40, i1 false)
+}
 
 // PR9257
 struct test11S {
Index: lib/CodeGen/CGDecl.cpp
===
--- lib/CodeGen/CGDecl.cpp
+++ lib/CodeGen/CGDecl.cpp
@@ -948,6 +948,113 @@
  canEmitInitWithFewStoresAfterBZero(Init, StoreBudget);
 }
 
+/// A byte pattern.
+///
+/// Can be "any" pattern if the value was padding or known to be undef.
+/// Can be "none" pattern if a sequence doesn't exist.
+class BytePattern {
+  uint8_t Val;
+  enum class ValueType { Specific, Any, None } Type;
+  BytePattern(ValueType Type) : Type(Type) {}
+
+public:
+  BytePattern(uint8_t Value) : Val(Value), Type(ValueType::Specific) {}
+  static BytePattern Any() { return BytePattern(ValueType::Any); }
+  static BytePattern None() { return BytePattern(ValueType::None); }
+  bool isAny() const { return Type == ValueType::Any; }
+  bool isNone() const {