[PATCH] D53524: [ThinLTO] Enable LTOUnit only when it is needed

2018-11-20 Thread Teresa Johnson via Phabricator via cfe-commits
tejohnson abandoned this revision.
tejohnson added a comment.

Abandoned in favor of new approach in 
https://reviews.llvm.org/D53890/https://reviews.llvm.org/D53891.


Repository:
  rC Clang

https://reviews.llvm.org/D53524



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D53524: [ThinLTO] Enable LTOUnit only when it is needed

2018-11-09 Thread Peter Collingbourne via Phabricator via cfe-commits
pcc added inline comments.



Comment at: docs/LTOVisibility.rst:9
 unit's *LTO unit* is the subset of the linkage unit that is linked together
-using link-time optimization; in the case where LTO is not being used, the
-linkage unit's LTO unit is empty. Each linkage unit has only a single LTO unit.
+using link-time optimization; in the case where LTO units are not being used,
+the linkage unit's LTO unit is empty. Each linkage unit has only a single LTO

It's a little confusing to talk about "LTO units" as a property of a 
translation unit when there is only one LTO unit per linkage unit. I think this 
should say that an LTO unit is the subset of the linkage unit compiled with 
certain flags. Then in the rest of the document you can talk about translation 
units that are either part of or not part of the LTO unit.


Repository:
  rC Clang

https://reviews.llvm.org/D53524



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D53524: [ThinLTO] Enable LTOUnit only when it is needed

2018-11-01 Thread Teresa Johnson via Phabricator via cfe-commits
tejohnson updated this revision to Diff 172181.
tejohnson added a comment.

Address comments:
Promote -flto-unit to clang driver option (and test it)
Adjust LTOVisibility.rst to reflect change of default and new option.


Repository:
  rC Clang

https://reviews.llvm.org/D53524

Files:
  docs/LTOVisibility.rst
  include/clang/Driver/CC1Options.td
  include/clang/Driver/Options.td
  include/clang/Driver/SanitizerArgs.h
  lib/Driver/SanitizerArgs.cpp
  lib/Driver/ToolChains/Clang.cpp
  test/Driver/lto-unit.c

Index: test/Driver/lto-unit.c
===
--- test/Driver/lto-unit.c
+++ test/Driver/lto-unit.c
@@ -1,9 +1,18 @@
-// RUN: %clang -target x86_64-unknown-linux -### %s -flto=full 2>&1 | FileCheck --check-prefix=UNIT %s
-// RUN: %clang -target x86_64-unknown-linux -### %s -flto=thin 2>&1 | FileCheck --check-prefix=UNIT %s
-// RUN: %clang -target x86_64-apple-darwin13.3.0 -### %s -flto=full 2>&1 | FileCheck --check-prefix=UNIT %s
-// RUN: %clang -target x86_64-apple-darwin13.3.0 -### %s -flto=thin 2>&1 | FileCheck --check-prefix=NOUNIT %s
-// RUN: %clang -target x86_64-scei-ps4 -### %s -flto=full 2>&1 | FileCheck --check-prefix=UNIT %s
-// RUN: %clang -target x86_64-scei-ps4 -### %s -flto=thin 2>&1 | FileCheck --check-prefix=NOUNIT %s
+// RUN: %clang -target x86_64-unknown-linux -### %s -flto=full -fwhole-program-vtables 2>&1 | FileCheck --check-prefix=UNIT %s
+// RUN: %clang -target x86_64-unknown-linux -### %s -flto=thin -fwhole-program-vtables 2>&1 | FileCheck --check-prefix=UNIT %s
+// RUN: %clang -target x86_64-unknown-linux -### %s -flto=full -flto-unit 2>&1 | FileCheck --check-prefix=UNIT %s
+// RUN: %clang -target x86_64-unknown-linux -### %s -flto=thin -flto-unit 2>&1 | FileCheck --check-prefix=UNIT %s
+// RUN: %clang -target x86_64-apple-darwin13.3.0 -### %s -flto=full -fwhole-program-vtables 2>&1 | FileCheck --check-prefix=UNIT %s
+// RUN: %clang -target x86_64-apple-darwin13.3.0 -### %s -flto=thin -fwhole-program-vtables 2>&1 | FileCheck --check-prefix=NOUNIT %s
+// RUN: %clang -target x86_64-apple-darwin13.3.0 -### %s -flto=full -flto-unit 2>&1 | FileCheck --check-prefix=UNIT %s
+// RUN: %clang -target x86_64-apple-darwin13.3.0 -### %s -flto=thin -flto-unit 2>&1 | FileCheck --check-prefix=NOUNIT %s
+// RUN: %clang -target x86_64-scei-ps4 -### %s -flto=full -fwhole-program-vtables 2>&1 | FileCheck --check-prefix=UNIT %s
+// RUN: %clang -target x86_64-scei-ps4 -### %s -flto=thin -fwhole-program-vtables 2>&1 | FileCheck --check-prefix=NOUNIT %s
+// RUN: %clang -target x86_64-scei-ps4 -### %s -flto=full -flto-unit 2>&1 | FileCheck --check-prefix=UNIT %s
+// RUN: %clang -target x86_64-scei-ps4 -### %s -flto=thin -flto-unit 2>&1 | FileCheck --check-prefix=NOUNIT %s
 
 // UNIT: "-flto-unit"
 // NOUNIT-NOT: "-flto-unit"
+
+// RUN: %clang -target x86_64-unknown-linux -### %s -flto-unit 2>&1 | FileCheck --check-prefix=NO-LTO %s
+// NO-LTO: invalid argument '-flto-unit' only allowed with '-flto'
Index: lib/Driver/ToolChains/Clang.cpp
===
--- lib/Driver/ToolChains/Clang.cpp
+++ lib/Driver/ToolChains/Clang.cpp
@@ -3364,21 +3364,6 @@
 // the use-list order, since serialization to bitcode is part of the flow.
 if (JA.getType() == types::TY_LLVM_BC)
   CmdArgs.push_back("-emit-llvm-uselists");
-
-// Device-side jobs do not support LTO.
-bool isDeviceOffloadAction = !(JA.isDeviceOffloading(Action::OFK_None) ||
-   JA.isDeviceOffloading(Action::OFK_Host));
-
-if (D.isUsingLTO() && !isDeviceOffloadAction) {
-  Args.AddLastArg(CmdArgs, options::OPT_flto, options::OPT_flto_EQ);
-
-  // The Darwin and PS4 linkers currently use the legacy LTO API, which
-  // does not support LTO unit features (CFI, whole program vtable opt)
-  // under ThinLTO.
-  if (!(RawTriple.isOSDarwin() || RawTriple.isPS4()) ||
-  D.getLTOMode() == LTOK_Full)
-CmdArgs.push_back("-flto-unit");
-}
   }
 
   if (const Arg *A = Args.getLastArg(options::OPT_fthinlto_index_EQ)) {
@@ -4980,6 +4965,31 @@
 CmdArgs.push_back("-fwhole-program-vtables");
   }
 
+  bool LTOUnit = Args.hasArg(options::OPT_flto_unit);
+  if (LTOUnit && !D.isUsingLTO())
+D.Diag(diag::err_drv_argument_only_allowed_with) << "-flto-unit"
+ << "-flto";
+
+  // Device-side jobs do not support LTO.
+  bool isDeviceOffloadAction = !(JA.isDeviceOffloading(Action::OFK_None) ||
+ JA.isDeviceOffloading(Action::OFK_Host));
+
+  if (D.isUsingLTO() &&
+  (isa(JA) || isa(JA)) &&
+  !isDeviceOffloadAction) {
+Args.AddLastArg(CmdArgs, options::OPT_flto, options::OPT_flto_EQ);
+
+// Enable LTO unit if need for CFI or whole program vtable optimization.
+// The Darwin and PS4 linkers currently use the legacy LTO API, which
+// does not support LTO unit features (CFI, 

[PATCH] D53524: [ThinLTO] Enable LTOUnit only when it is needed

2018-10-30 Thread Teresa Johnson via Phabricator via cfe-commits
tejohnson added a comment.

In https://reviews.llvm.org/D53524#1279288, @tejohnson wrote:

> In https://reviews.llvm.org/D53524#1276038, @pcc wrote:
>
> > In https://reviews.llvm.org/D53524#1274505, @tejohnson wrote:
> >
> > > In https://reviews.llvm.org/D53524#1271387, @tejohnson wrote:
> > >
> > > > Can we detect that TUs compiled with -flto-unit are being mixed with 
> > > > those not built without -flto-unit at the thin link time and issue an 
> > > > error?
> > >
> > >
> > > This would be doable pretty easily. E.g. add a flag at the index level 
> > > that the module would have been split but wasn't. Users who get the error 
> > > and want to support always-enabled CFI could opt in via -flto-unit.
> >
> >
> > Yes. I don't think we should make a change like this unless there is 
> > something like that in place, though. The documentation (LTOVisibility.rst) 
> > needs to be updated too.
>
>
> Ok, let me work on that now and we can get that in before this one.


Mailed https://reviews.llvm.org/D53890 for this part.


Repository:
  rC Clang

https://reviews.llvm.org/D53524



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D53524: [ThinLTO] Enable LTOUnit only when it is needed

2018-10-29 Thread Peter Collingbourne via Phabricator via cfe-commits
pcc added a comment.

In https://reviews.llvm.org/D53524#1279288, @tejohnson wrote:

> In https://reviews.llvm.org/D53524#1276038, @pcc wrote:
>
> > In https://reviews.llvm.org/D53524#1274505, @tejohnson wrote:
> >
> > > In https://reviews.llvm.org/D53524#1271387, @tejohnson wrote:
> > >
> > > > In https://reviews.llvm.org/D53524#1271357, @pcc wrote:
> > > >
> > > > > The reason why LTO unit is always enabled is so that you can link 
> > > > > translation units compiled with `-fsanitize=cfi` and/or 
> > > > > `-fwhole-program-vtables` against translation units compiled without 
> > > > > CFI/WPD. With this change we will see miscompiles in the translation 
> > > > > units compiled with CFI/WPD if they use vtables in the translation 
> > > > > units compiled without CFI/WPD. If we really need this option I think 
> > > > > it should be an opt out.
> > > >
> > > >
> > > > Is there an important use case for support thing mixing and matching? 
> > > > The issue is that it comes at a cost to all ThinLTO compiles for codes 
> > > > with vtables by requiring them all to process IR during the thin link.
> > >
> > >
> > > Ping on the question of why this mode needs to be default. If it was a 
> > > matter of a few percent overhead that would be one thing, but we're 
> > > talking a *huge* overhead (as noted off-patch for my app I'm seeing >20x 
> > > thin link time currently, and with improvements to the hashing to always 
> > > get successful splitting we could potentially get down to closer to 2x - 
> > > still a big overhead). This kind of overhead should be opt-in. The 
> > > average ThinLTO user is not going to realize they are paying a big 
> > > overhead because CFI is always pre-enabled.
> >
> >
> > Well, the intent was always that the overhead would be minimal, which is 
> > why things are set up the way that they are. But it doesn't sound like 
> > anyone is going to have the time to fully address the performance problems 
> > that you've seen any time soon, so maybe it would be fine to introduce the 
> > -flto-unit flag. I guess we can always change the flag so that it has no 
> > effect if/when the performance problem is addressed.
>
>
> Just to clarify, since there is already a -flto-unit flag: it is currently a 
> cc1 flag, did you want it made into a driver option as well?


Yes, that's what I had in mind.

 Can we detect that TUs compiled with -flto-unit are being mixed with those 
 not built without -flto-unit at the thin link time and issue an error?
>>> 
>>> This would be doable pretty easily. E.g. add a flag at the index level that 
>>> the module would have been split but wasn't. Users who get the error and 
>>> want to support always-enabled CFI could opt in via -flto-unit.
>> 
>> Yes. I don't think we should make a change like this unless there is 
>> something like that in place, though. The documentation (LTOVisibility.rst) 
>> needs to be updated too.
> 
> Ok, let me work on that now and we can get that in before this one.




Repository:
  rC Clang

https://reviews.llvm.org/D53524



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D53524: [ThinLTO] Enable LTOUnit only when it is needed

2018-10-29 Thread Teresa Johnson via Phabricator via cfe-commits
tejohnson added a comment.

In https://reviews.llvm.org/D53524#1276038, @pcc wrote:

> In https://reviews.llvm.org/D53524#1274505, @tejohnson wrote:
>
> > In https://reviews.llvm.org/D53524#1271387, @tejohnson wrote:
> >
> > > In https://reviews.llvm.org/D53524#1271357, @pcc wrote:
> > >
> > > > The reason why LTO unit is always enabled is so that you can link 
> > > > translation units compiled with `-fsanitize=cfi` and/or 
> > > > `-fwhole-program-vtables` against translation units compiled without 
> > > > CFI/WPD. With this change we will see miscompiles in the translation 
> > > > units compiled with CFI/WPD if they use vtables in the translation 
> > > > units compiled without CFI/WPD. If we really need this option I think 
> > > > it should be an opt out.
> > >
> > >
> > > Is there an important use case for support thing mixing and matching? The 
> > > issue is that it comes at a cost to all ThinLTO compiles for codes with 
> > > vtables by requiring them all to process IR during the thin link.
> >
> >
> > Ping on the question of why this mode needs to be default. If it was a 
> > matter of a few percent overhead that would be one thing, but we're talking 
> > a *huge* overhead (as noted off-patch for my app I'm seeing >20x thin link 
> > time currently, and with improvements to the hashing to always get 
> > successful splitting we could potentially get down to closer to 2x - still 
> > a big overhead). This kind of overhead should be opt-in. The average 
> > ThinLTO user is not going to realize they are paying a big overhead because 
> > CFI is always pre-enabled.
>
>
> Well, the intent was always that the overhead would be minimal, which is why 
> things are set up the way that they are. But it doesn't sound like anyone is 
> going to have the time to fully address the performance problems that you've 
> seen any time soon, so maybe it would be fine to introduce the -flto-unit 
> flag. I guess we can always change the flag so that it has no effect if/when 
> the performance problem is addressed.


Just to clarify, since there is already a -flto-unit flag: it is currently a 
cc1 flag, did you want it made into a driver option as well?

> 
> 
>>> Can we detect that TUs compiled with -flto-unit are being mixed with those 
>>> not built without -flto-unit at the thin link time and issue an error?
>> 
>> This would be doable pretty easily. E.g. add a flag at the index level that 
>> the module would have been split but wasn't. Users who get the error and 
>> want to support always-enabled CFI could opt in via -flto-unit.
> 
> Yes. I don't think we should make a change like this unless there is 
> something like that in place, though. The documentation (LTOVisibility.rst) 
> needs to be updated too.

Ok, let me work on that now and we can get that in before this one.


Repository:
  rC Clang

https://reviews.llvm.org/D53524



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D53524: [ThinLTO] Enable LTOUnit only when it is needed

2018-10-25 Thread Peter Collingbourne via Phabricator via cfe-commits
pcc added a comment.

In https://reviews.llvm.org/D53524#1274505, @tejohnson wrote:

> In https://reviews.llvm.org/D53524#1271387, @tejohnson wrote:
>
> > In https://reviews.llvm.org/D53524#1271357, @pcc wrote:
> >
> > > The reason why LTO unit is always enabled is so that you can link 
> > > translation units compiled with `-fsanitize=cfi` and/or 
> > > `-fwhole-program-vtables` against translation units compiled without 
> > > CFI/WPD. With this change we will see miscompiles in the translation 
> > > units compiled with CFI/WPD if they use vtables in the translation units 
> > > compiled without CFI/WPD. If we really need this option I think it should 
> > > be an opt out.
> >
> >
> > Is there an important use case for support thing mixing and matching? The 
> > issue is that it comes at a cost to all ThinLTO compiles for codes with 
> > vtables by requiring them all to process IR during the thin link.
>
>
> Ping on the question of why this mode needs to be default. If it was a matter 
> of a few percent overhead that would be one thing, but we're talking a *huge* 
> overhead (as noted off-patch for my app I'm seeing >20x thin link time 
> currently, and with improvements to the hashing to always get successful 
> splitting we could potentially get down to closer to 2x - still a big 
> overhead). This kind of overhead should be opt-in. The average ThinLTO user 
> is not going to realize they are paying a big overhead because CFI is always 
> pre-enabled.


Well, the intent was always that the overhead would be minimal, which is why 
things are set up the way that they are. But it doesn't sound like anyone is 
going to have the time to fully address the performance problems that you've 
seen any time soon, so maybe it would be fine to introduce the -flto-unit flag. 
I guess we can always change the flag so that it has no effect if/when the 
performance problem is addressed.

>> Can we detect that TUs compiled with -flto-unit are being mixed with those 
>> not built without -flto-unit at the thin link time and issue an error?
> 
> This would be doable pretty easily. E.g. add a flag at the index level that 
> the module would have been split but wasn't. Users who get the error and want 
> to support always-enabled CFI could opt in via -flto-unit.

Yes. I don't think we should make a change like this unless there is something 
like that in place, though. The documentation (LTOVisibility.rst) needs to be 
updated too.


Repository:
  rC Clang

https://reviews.llvm.org/D53524



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D53524: [ThinLTO] Enable LTOUnit only when it is needed

2018-10-24 Thread Teresa Johnson via Phabricator via cfe-commits
tejohnson added a comment.

In https://reviews.llvm.org/D53524#1271387, @tejohnson wrote:

> In https://reviews.llvm.org/D53524#1271357, @pcc wrote:
>
> > The reason why LTO unit is always enabled is so that you can link 
> > translation units compiled with `-fsanitize=cfi` and/or 
> > `-fwhole-program-vtables` against translation units compiled without 
> > CFI/WPD. With this change we will see miscompiles in the translation units 
> > compiled with CFI/WPD if they use vtables in the translation units compiled 
> > without CFI/WPD. If we really need this option I think it should be an opt 
> > out.
>
>
> Is there an important use case for support thing mixing and matching? The 
> issue is that it comes at a cost to all ThinLTO compiles for codes with 
> vtables by requiring them all to process IR during the thin link.


Ping on the question of why this mode needs to be default. If it was a matter 
of a few percent overhead that would be one thing, but we're talking a *huge* 
overhead (as noted off-patch for my app I'm seeing >20x thin link time 
currently, and with improvements to the hashing to always get successful 
splitting we could potentially get down to closer to 2x - still a big 
overhead). This kind of overhead should be opt-in. The average ThinLTO user is 
not going to realize they are paying a big overhead because CFI is always 
pre-enabled.

> Can we detect that TUs compiled with -flto-unit are being mixed with those 
> not built without -flto-unit at the thin link time and issue an error?

This would be doable pretty easily. E.g. add a flag at the index level that the 
module would have been split but wasn't. Users who get the error and want to 
support always-enabled CFI could opt in via -flto-unit.


Repository:
  rC Clang

https://reviews.llvm.org/D53524



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D53524: [ThinLTO] Enable LTOUnit only when it is needed

2018-10-22 Thread Teresa Johnson via Phabricator via cfe-commits
tejohnson added a comment.

In https://reviews.llvm.org/D53524#1271357, @pcc wrote:

> The reason why LTO unit is always enabled is so that you can link translation 
> units compiled with `-fsanitize=cfi` and/or `-fwhole-program-vtables` against 
> translation units compiled without CFI/WPD. With this change we will see 
> miscompiles in the translation units compiled with CFI/WPD if they use 
> vtables in the translation units compiled without CFI/WPD. If we really need 
> this option I think it should be an opt out.


Is there an important use case for support thing mixing and matching? The issue 
is that it comes at a cost to all ThinLTO compiles for codes with vtables by 
requiring them all to process IR during the thin link. Can we detect that TUs 
compiled with -flto-unit are being mixed with those not built without 
-flto-unit at the thin link time and issue an error?


Repository:
  rC Clang

https://reviews.llvm.org/D53524



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D53524: [ThinLTO] Enable LTOUnit only when it is needed

2018-10-22 Thread Peter Collingbourne via Phabricator via cfe-commits
pcc requested changes to this revision.
pcc added a comment.
This revision now requires changes to proceed.

The reason why LTO unit is always enabled is so that you can link translation 
units compiled with `-fsanitize=cfi` and/or `-fwhole-program-vtables` against 
translation units compiled without CFI/WPD. With this change we will see 
miscompiles in the translation units compiled with CFI/WPD if they use vtables 
in the translation units compiled without CFI/WPD. If we really need this 
option I think it should be an opt out.


Repository:
  rC Clang

https://reviews.llvm.org/D53524



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D53524: [ThinLTO] Enable LTOUnit only when it is needed

2018-10-22 Thread Teresa Johnson via Phabricator via cfe-commits
tejohnson created this revision.
tejohnson added a reviewer: pcc.
Herald added subscribers: dexonsmith, steven_wu, inglorion, mehdi_amini.

Currently, -flto-unit is specified whenever LTO options are used
(unless using the old LTO API). This causes vtable defs to be processed
using regular LTO, which is needed for CFI and whole program vtable
optimizations, since they need to modify the vtables in a whole program
manner.

However, this causes non-negligible overhead due to the regular
LTO processing. Since this isn't needed when not using CFI or
-fwhole-program-vtables, only enable -flto-unit in those cases.
Otherwise all ThinLTO compiles pay the overhead, even when not needed.


Repository:
  rC Clang

https://reviews.llvm.org/D53524

Files:
  include/clang/Driver/SanitizerArgs.h
  lib/Driver/SanitizerArgs.cpp
  lib/Driver/ToolChains/Clang.cpp
  test/Driver/lto-unit.c


Index: test/Driver/lto-unit.c
===
--- test/Driver/lto-unit.c
+++ test/Driver/lto-unit.c
@@ -1,9 +1,9 @@
-// RUN: %clang -target x86_64-unknown-linux -### %s -flto=full 2>&1 | 
FileCheck --check-prefix=UNIT %s
-// RUN: %clang -target x86_64-unknown-linux -### %s -flto=thin 2>&1 | 
FileCheck --check-prefix=UNIT %s
-// RUN: %clang -target x86_64-apple-darwin13.3.0 -### %s -flto=full 2>&1 | 
FileCheck --check-prefix=UNIT %s
-// RUN: %clang -target x86_64-apple-darwin13.3.0 -### %s -flto=thin 2>&1 | 
FileCheck --check-prefix=NOUNIT %s
-// RUN: %clang -target x86_64-scei-ps4 -### %s -flto=full 2>&1 | FileCheck 
--check-prefix=UNIT %s
-// RUN: %clang -target x86_64-scei-ps4 -### %s -flto=thin 2>&1 | FileCheck 
--check-prefix=NOUNIT %s
+// RUN: %clang -target x86_64-unknown-linux -### %s -flto=full 
-fwhole-program-vtables 2>&1 | FileCheck --check-prefix=UNIT %s
+// RUN: %clang -target x86_64-unknown-linux -### %s -flto=thin 
-fwhole-program-vtables 2>&1 | FileCheck --check-prefix=UNIT %s
+// RUN: %clang -target x86_64-apple-darwin13.3.0 -### %s -flto=full 
-fwhole-program-vtables 2>&1 | FileCheck --check-prefix=UNIT %s
+// RUN: %clang -target x86_64-apple-darwin13.3.0 -### %s -flto=thin 
-fwhole-program-vtables 2>&1 | FileCheck --check-prefix=NOUNIT %s
+// RUN: %clang -target x86_64-scei-ps4 -### %s -flto=full 
-fwhole-program-vtables 2>&1 | FileCheck --check-prefix=UNIT %s
+// RUN: %clang -target x86_64-scei-ps4 -### %s -flto=thin 
-fwhole-program-vtables 2>&1 | FileCheck --check-prefix=NOUNIT %s
 
 // UNIT: "-flto-unit"
 // NOUNIT-NOT: "-flto-unit"
Index: lib/Driver/ToolChains/Clang.cpp
===
--- lib/Driver/ToolChains/Clang.cpp
+++ lib/Driver/ToolChains/Clang.cpp
@@ -3364,21 +3364,6 @@
 // the use-list order, since serialization to bitcode is part of the flow.
 if (JA.getType() == types::TY_LLVM_BC)
   CmdArgs.push_back("-emit-llvm-uselists");
-
-// Device-side jobs do not support LTO.
-bool isDeviceOffloadAction = !(JA.isDeviceOffloading(Action::OFK_None) ||
-   JA.isDeviceOffloading(Action::OFK_Host));
-
-if (D.isUsingLTO() && !isDeviceOffloadAction) {
-  Args.AddLastArg(CmdArgs, options::OPT_flto, options::OPT_flto_EQ);
-
-  // The Darwin and PS4 linkers currently use the legacy LTO API, which
-  // does not support LTO unit features (CFI, whole program vtable opt)
-  // under ThinLTO.
-  if (!(RawTriple.isOSDarwin() || RawTriple.isPS4()) ||
-  D.getLTOMode() == LTOK_Full)
-CmdArgs.push_back("-flto-unit");
-}
   }
 
   if (const Arg *A = Args.getLastArg(options::OPT_fthinlto_index_EQ)) {
@@ -4980,6 +4965,25 @@
 CmdArgs.push_back("-fwhole-program-vtables");
   }
 
+  // Device-side jobs do not support LTO.
+  bool isDeviceOffloadAction = !(JA.isDeviceOffloading(Action::OFK_None) ||
+ JA.isDeviceOffloading(Action::OFK_Host));
+
+  if (D.isUsingLTO() &&
+  (isa(JA) || isa(JA)) &&
+  !isDeviceOffloadAction) {
+Args.AddLastArg(CmdArgs, options::OPT_flto, options::OPT_flto_EQ);
+
+// Enable LTO unit if need for CFI or whole program vtable optimization.
+// The Darwin and PS4 linkers currently use the legacy LTO API, which
+// does not support LTO unit features (CFI, whole program vtable opt)
+// under ThinLTO.
+bool SupportsLTOUnit = !(RawTriple.isOSDarwin() || RawTriple.isPS4()) ||
+   D.getLTOMode() == LTOK_Full;
+if ((WholeProgramVTables || Sanitize.needsLTO()) && SupportsLTOUnit)
+  CmdArgs.push_back("-flto-unit");
+  }
+
   if (Arg *A = Args.getLastArg(options::OPT_fexperimental_isel,
options::OPT_fno_experimental_isel)) {
 CmdArgs.push_back("-mllvm");
Index: lib/Driver/SanitizerArgs.cpp
===
--- lib/Driver/SanitizerArgs.cpp
+++ lib/Driver/SanitizerArgs.cpp
@@ -207,6 +207,10 @@
   return Sanitizers.Mask & NeedsUnwindTables;
 }