On Sat, Jun 06, 2020 at 04:13:33PM -0700, Guenter Roeck wrote:
> On 6/5/20 9:15 AM, Eric Biggers wrote:
> > On Fri, Jun 05, 2020 at 09:41:54AM +0200, Peter Zijlstra wrote:
> >> On Thu, Jun 04, 2020 at 05:24:33PM -0700, Eric Biggers wrote:
> >>> On Thu, Jun 04, 2020 at 07:18:37AM -0700, Guenter Roeck wrote:
> >>>> On Tue, May 26, 2020 at 06:11:04PM +0200, Peter Zijlstra wrote:
> >>>>> The recent commit: 90b5363acd47 ("sched: Clean up scheduler_ipi()")
> >>>>> got smp_call_function_single_async() subtly wrong. Even though it will
> >>>>> return -EBUSY when trying to re-use a csd, that condition is not
> >>>>> atomic and still requires external serialization.
> >>>>>
> >>>>> The change in ttwu_queue_remote() got this wrong.
> >>>>>
> >>>>> While on first reading ttwu_queue_remote() has an atomic test-and-set
> >>>>> that appears to serialize the use, the matching 'release' is not in
> >>>>> the right place to actually guarantee this serialization.
> >>>>>
> >>>>> The actual race is vs the sched_ttwu_pending() call in the idle loop;
> >>>>> that can run the wakeup-list without consuming the CSD.
> >>>>>
> >>>>> Instead of trying to chain the lists, merge them.
> >>>>>
> >>>>> Signed-off-by: Peter Zijlstra (Intel) <pet...@infradead.org>
> >>>>> ---
> >>>> ...
> >>>>> + /*
> >>>>> + * Assert the CSD_TYPE_TTWU layout is similar enough
> >>>>> + * for task_struct to be on the @call_single_queue.
> >>>>> + */
> >>>>> + BUILD_BUG_ON(offsetof(struct task_struct, wake_entry_type) -
> >>>>> offsetof(struct task_struct, wake_entry) !=
> >>>>> + offsetof(struct __call_single_data, flags) -
> >>>>> offsetof(struct __call_single_data, llist));
> >>>>> +
> >>>>
> >>>> There is no guarantee in C that
> >>>>
> >>>> type1 a;
> >>>> type2 b;
> >>>>
> >>>> in two different data structures means that offsetof(b) - offsetof(a)
> >>>> is the same in both data structures unless attributes such as
> >>>> __attribute__((__packed__)) are used.
> >>>>
> >>>> As result, this does and will cause a variety of build errors depending
> >>>> on the compiler version and compile flags.
> >>>>
> >>>> Guenter
> >>>
> >>> Yep, this breaks the build for me.
> >>
> >> -ENOCONFIG
> >
> > For me, the problem seems to be randstruct. To reproduce, you can use
> > (on x86_64):
> >
> > make defconfig
> > echo CONFIG_GCC_PLUGIN_RANDSTRUCT=y >> .config
> > make olddefconfig
> > make kernel/smp.o
> >
>
> I confirmed that disabling CONFIG_GCC_PLUGIN_RANDSTRUCT "fixes" the problem
> in my test builds. Maybe it would make sense to mark that configuration option
> for the time being as BROKEN.
>

Still occurring on Linus' tree.  This needs to be fixed.  (And not by removing
support for randstruct; that's not a "fix"...)

Shouldn't the kbuild test robot have caught this?

- Eric