On Thu, Jun 25, 2020 at 10:03:13AM +0200, Peter Zijlstra wrote:
> On Wed, Jun 24, 2020 at 02:31:36PM -0700, Nick Desaulniers wrote:
> > On Wed, Jun 24, 2020 at 2:15 PM Peter Zijlstra <pet...@infradead.org> wrote:
> > >
> > > On Wed, Jun 24, 2020 at 01:31:38PM -0700, Sami Tolvanen wrote:
> > > > This patch series adds support for building x86_64 and arm64 kernels
> > > > with Clang's Link Time Optimization (LTO).
> > > >
> > > > In addition to performance, the primary motivation for LTO is to allow
> > > > Clang's Control-Flow Integrity (CFI) to be used in the kernel. Google's
> > > > Pixel devices have shipped with LTO+CFI kernels since 2018.
> > > >
> > > > Most of the patches are build system changes for handling LLVM bitcode,
> > > > which Clang produces with LTO instead of ELF object files, postponing
> > > > ELF processing until a later stage, and ensuring initcall ordering.
> > > >
> > > > Note that first objtool patch in the series is already in linux-next,
> > > > but as it's needed with LTO, I'm including it also here to make testing
> > > > easier.
> > >
> > > I'm very sad that yet again, memory ordering isn't addressed. LTO vastly
> > > increases the range of the optimizer to wreck things.
> >
> > Hi Peter, could you expand on the issue for the folks on the thread?
> > I'm happy to try to hack something up in LLVM if we check that X does
> > or does not happen; maybe we can even come up with some concrete test
> > cases that can be added to LLVM's codebase?
>
> I'm sure Will will respond, but the basic issue is the trainwreck C11
> made of dependent loads.
>
> Anyway, here's a link to the last time this came up:
>
>   https://lore.kernel.org/linux-arm-kernel/20171116174830.gx3...@linux.vnet.ibm.com/

Another good read:

  https://lore.kernel.org/lkml/20150520005510.ga23...@linux.vnet.ibm.com/

and having (partially) re-read that, I now worry intensely about things
like latch_tree_find(), cyc2ns_read_begin(), __ktime_get_fast_ns().

It looks like kernel/time/sched_clock.c uses raw_read_seqcount(), which
deviates from the above patterns by, for some reason, using a primitive
that includes an extra smp_rmb().

And these are just the few things I could remember off the top of my
head; who knows what else is out there.
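
For readers following along, the shape of the latch-style readers named
above can be sketched roughly in plain C11. This is only an illustration
under stated assumptions, not the kernel's code: struct latch_u64,
latch_init(), latch_write() and latch_read() are hypothetical names, a
single writer is assumed, and the kernel itself uses READ_ONCE() plus the
hardware address dependency rather than C11 atomics. The point is that the
data load is ordered after the sequence load only because its address
(data[seq & 1]) depends on the loaded value. The closest C11 spelling is
memory_order_consume, which GCC and Clang currently treat as acquire; a
plain or relaxed load gives the optimizer no obligation to keep the
dependency intact.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical two-copy latch holding one 64-bit value. */
struct latch_u64 {
	atomic_uint seq;
	_Atomic uint64_t data[2];
};

static void latch_init(struct latch_u64 *l, uint64_t val)
{
	atomic_init(&l->seq, 0);
	atomic_init(&l->data[0], val);
	atomic_init(&l->data[1], val);
}

/* Single writer assumed. Default (seq_cst) stores keep the sketch simple. */
static void latch_write(struct latch_u64 *l, uint64_t val)
{
	unsigned int seq = atomic_load_explicit(&l->seq, memory_order_relaxed);

	/* seq becomes odd: readers are steered to data[1] while data[0] changes. */
	atomic_store(&l->seq, seq + 1);
	atomic_store(&l->data[0], val);

	/* seq becomes even: readers are steered to data[0] while data[1] changes. */
	atomic_store(&l->seq, seq + 2);
	atomic_store(&l->data[1], val);
}

static uint64_t latch_read(struct latch_u64 *l)
{
	unsigned int seq;
	uint64_t val;

	do {
		/* The index below is computed from this loaded value. */
		seq = atomic_load_explicit(&l->seq, memory_order_consume);

		/* Dependent load: the address is derived from seq. */
		val = atomic_load_explicit(&l->data[seq & 1],
					   memory_order_relaxed);

		/* Order the data load before the re-check of seq. */
		atomic_thread_fence(memory_order_acquire);
	} while (atomic_load_explicit(&l->seq, memory_order_relaxed) != seq);

	return val;
}
```

After latch_init(&l, 5), a call to latch_write(&l, 42) leaves seq at 2 and
both copies updated, so latch_read(&l) picks data[0]. The sketch is only
exercised single-threaded here; it shows where the dependency sits, not a
proof that any particular compiler preserves it.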