On Fri, Jan 25, 2013 at 11:53 PM, Wang YanQing <udkni...@gmail.com> wrote:
> I get below warning every day with 3.7,
> one or two times per day.
>
> [ 2235.186027] WARNING: at
> /mnt/sda7/kernel/linux/arch/x86/kernel/apic/ipi.c:109
> default_send_IPI_mask_logical+0x2f/0xb8()
> [ 2235.186030] Hardware name: Aspire 4741
> [ 2235.186032] empty IPI mask
> [ 2235.186079] [<c1015cbc>] native_send_call_func_ipi+0x4f/0x57
> [ 2235.186087] [<c1053453>] smp_call_function_many+0x191/0x1a9
> [ 2235.186097] [<c101e074>] native_flush_tlb_others+0x21/0x24
> [ 2235.186101] [<c101e0da>] flush_tlb_page+0x63/0x89
> [ 2235.186105] [<c101d360>] ptep_set_access_flags+0x20/0x26
> [ 2235.186111] [<c108fadd>] do_wp_page+0x234/0x502
> [ 2235.186121] [<c1090825>] handle_pte_fault+0x50d/0x54c
> [ 2235.186148] [<c1090934>] handle_mm_fault+0xd0/0xe2
> [ 2235.186153] [<c12dd143>] __do_page_fault+0x411/0x42d
> [ 2235.186166] [<c12dd167>] do_page_fault+0x8/0xa
> [ 2235.186170] [<c12db31a>] error_code+0x5a/0x60
>
> This patch fix it.
>
> This patch also fix some system hang problem:
> If the data->cpumask been cleared after pass
>
> if (WARN_ONCE(!mask, "empty IPI mask"))
> return;
> then the problem 83d349f3 fix will happen again.

Hmm. We have very consciously tried to avoid the extra copy, although I'm not entirely sure why (it might possibly hurt on the MAXSMP configuration). See for example commit 723aae25d5cd ("smp_call_function_many: handle concurrent clearing of mask") which fixed another version of this problem.

But I do agree that it looks like the copy is required, simply because - as you say - once we've done the "list_add_rcu()" to add it to the queue, we can have (another) IPI to the target CPU that can now see it and clear the mask. So by the time we get to actually send the IPI, the mask might have been cleared by another IPI.

So I do agree that your patch seems correct, but I really really want to run it by other people. Guys? Original patch on lkml.

The other possible fix might be to take the &call_function.lock earlier in generic_smp_call_function_interrupt(), so that we can never clear the bit while somebody is adding entries to the list... But I think it very much tries to avoid that on purpose right now, with only the last CPU responding to that IPI taking the lock.

So copying the IPI mask seems to be the reasonable approach. Comments?

Linus
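
To make the race described above concrete, here is a small userspace sketch, not the kernel code and not the actual patch. Everything in it is a hypothetical stand-in: mask_t plays the role of a cpumask, struct call_entry loosely mimics the queued call data, "published" stands in for the list_add_rcu() step, and remote_cpu_responds() simulates a target CPU that handles an earlier IPI, sees the newly queued entry, and clears its own bit. The interleaving is forced deterministically rather than produced by real concurrency.

```c
#include <stdio.h>

/* Hypothetical stand-in for a cpumask: one bit per CPU. */
typedef unsigned long mask_t;

/* Hypothetical queue entry, loosely modeled on the queued call data. */
struct call_entry {
    mask_t cpumask;   /* bits still pending; cleared by responding CPUs */
    int    published; /* nonzero once other CPUs can see the entry */
};

/* Simulates a target CPU handling an earlier IPI: once the entry is
 * visible on the queue, the CPU clears its own bit in the shared mask. */
static void remote_cpu_responds(struct call_entry *e, int cpu)
{
    if (e->published)
        e->cpumask &= ~(1UL << cpu);
}

/* Stand-in for the arch IPI send. Returns the number of CPUs that would
 * be signalled, or -1 for an empty mask (the case that hits the WARN). */
static int send_ipi_mask(mask_t mask)
{
    int n = 0;

    if (!mask) {
        fprintf(stderr, "empty IPI mask\n");
        return -1;
    }
    for (; mask; mask &= mask - 1) /* clear lowest set bit each pass */
        n++;
    return n;
}

/* Forces the bad interleaving: every CPU responds after publication. */
static void race_window(struct call_entry *e)
{
    int cpu;

    for (cpu = 0; cpu < (int)(8 * sizeof(mask_t)); cpu++)
        remote_cpu_responds(e, cpu);
}

/* Racy variant: reads the shared mask after the entry was published. */
int call_many_shared(mask_t targets)
{
    struct call_entry e = { targets, 0 };

    e.published = 1;          /* stands in for list_add_rcu() */
    race_window(&e);          /* targets clear their bits here */
    return send_ipi_mask(e.cpumask);
}

/* Copying variant: takes a private snapshot before publishing. */
int call_many_copied(mask_t targets)
{
    struct call_entry e = { targets, 0 };
    mask_t snapshot = e.cpumask; /* private copy, immune to clearing */

    e.published = 1;
    race_window(&e);
    return send_ipi_mask(snapshot);
}
```

With targets 0x6 (CPUs 1 and 2), call_many_shared() ends up sending with an empty mask, while call_many_copied() still signals both CPUs. The cost of the second variant is the extra mask copy, which is the overhead mentioned above as a possible concern for MAXSMP.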