On Wed, 25 Nov 2015, Thomas Gleixner wrote:
> The problem is actually in the vector assignment code.
> 
> >   [001]    22.936764: __assign_irq_vector : cpu 44 : vector=134 -> 
> > 0xffff88102a8196f8
> 
> No interrupt happened so far. So nothing cleans up the vector on cpu 1
> 
> >   [044]    61.670267: __assign_irq_vector : cpu 34 : vector=123 -> 
> > 0xffff88102a8196f8
>  
> Now that moves it from 44 to 34 and ignores that cpu 1 still has the
> vector assigned. __assign_irq_vector unconditionally overwrites
> data->old_domain, so the bit of cpu 1 is lost ....
> 
> I'm staring into the code to figure out a fix ....

Just to figure out that my analysis was completely wrong.

__assign_irq_vector()
{
        if (d->move_in_progress)
                return -EBUSY;
...

So that cannot happen. Now the question is:

>   [001]    22.936764: __assign_irq_vector : cpu 44 : vector=134 -> 
> 0xffff88102a8196f8

So CPU1 still sees data->move_in_progress set:

  [001]    54.636722: smp_irq_move_cleanup_interrupt : data->move_in_progress : vector=145 0xffff88102a8196f8

And why does __assign_irq_vector not see it, even though cpu1 never
received a cleanup vector with data->move_in_progress == 0?

>   [044]    61.670267: __assign_irq_vector : cpu 34 : vector=123 -> 
> 0xffff88102a8196f8

Ahhhhh.

__send_cleanup_vector()
{
        send_IPI()
        move_in_progress = 0; 
}

So if CPU1 gets the IPI _BEFORE_ move_in_progress is set to 0, it
skips the cleanup, and if it does not get another IPI before the next
move, the old vector stays assigned on CPU1. It has been that way
forever.
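
The ordering problem can be replayed in a small single-threaded model.
This is only a sketch: the sim_* names and the struct are made up for
illustration and are not the kernel's code. The handler's early return
is an assumption taken from the trace above, where
smp_irq_move_cleanup_interrupt logs data->move_in_progress and leaves
the vector alone.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-irq state; not the kernel's struct. */
struct sim_irq_data {
        bool move_in_progress;  /* a move has been started, cleanup not yet requested */
        bool old_vector_stale;  /* the old CPU still has the old vector assigned */
};

/*
 * Models the cleanup IPI handler on the old CPU. Assumption: it does
 * nothing while move_in_progress is still set.
 */
static void sim_cleanup_handler(struct sim_irq_data *d)
{
        if (d->move_in_progress)
                return;
        d->old_vector_stale = false;
}

/*
 * Models __send_cleanup_vector() as quoted above: the IPI is sent first
 * and the flag is cleared afterwards. The second argument picks which
 * side of the flag update the old CPU handles the IPI on.
 */
static void sim_send_cleanup(struct sim_irq_data *d, bool ipi_handled_before_clear)
{
        if (ipi_handled_before_clear) {
                sim_cleanup_handler(d);         /* handler runs too early */
                d->move_in_progress = false;
        } else {
                d->move_in_progress = false;
                sim_cleanup_handler(d);         /* handler sees the cleared flag */
        }
}

/* Returns true if the old vector is left behind after one cleanup round. */
static bool sim_move_leaks_vector(bool ipi_handled_before_clear)
{
        struct sim_irq_data d = {
                .move_in_progress = true,
                .old_vector_stale = true,
        };

        sim_send_cleanup(&d, ipi_handled_before_clear);
        return d.old_vector_stale;
}
```

sim_move_leaks_vector(true) models the IPI being handled before the
flag is cleared and reports the old vector as left behind, while
sim_move_leaks_vector(false) models the other order and reports it as
cleaned up. In both cases move_in_progress ends up 0, so the -EBUSY
check in __assign_irq_vector() would let the next move proceed.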

Duh. Working on a real fix this time.

Thanks,

        tglx