On Wed, 25 Nov 2015, Thomas Gleixner wrote:
> The problem is actually in the vector assignment code.
>
> > [001] 22.936764: __assign_irq_vector : cpu 44 : vector=134 -> 0xffff88102a8196f8
>
> No interrupt happened so far. So nothing cleans up the vector on cpu 1
>
> > [044] 61.670267: __assign_irq_vector : cpu 34 : vector=123 -> 0xffff88102a8196f8
>
> Now that moves it from 44 to 34 and ignores that cpu 1 still has the
> vector assigned. __assign_irq_vector unconditionally overwrites
> data->old_domain, so the bit of cpu 1 is lost ....
>
> I'm staring into the code to figure out a fix ....

Just to figure out that my analysis was completely wrong.

__assign_irq_vector()
{
	if (d->move_in_progress)
		return -EBUSY;
	...

So that cannot happen. Now the question is:

> [001] 22.936764: __assign_irq_vector : cpu 44 : vector=134 -> 0xffff88102a8196f8

So CPU1 still sees data->move_in_progress:

[001] 54.636722: smp_irq_move_cleanup_interrupt : data->move_in_progress : vector=145 0xffff88102a8196f8

So why does __assign_irq_vector not see it, even though cpu1 never
received a cleanup vector with data->move_in_progress == 0?

> [044] 61.670267: __assign_irq_vector : cpu 34 : vector=123 -> 0xffff88102a8196f8

Ahhhhh.

__send_cleanup_vector()
{
	send_IPI()
	move_in_progress = 0;
}

So if CPU1 gets the IPI _BEFORE_ move_in_progress is set to 0, and
does not get another IPI before the next move ..... That has been that
way forever.

Duh. Working on a real fix this time.

Thanks,

	tglx