On Thu, Sep 13, 2012 at 02:05:17PM +0530, Srivatsa S. Bhat wrote:
> On 09/12/2012 06:06 PM, Srivatsa S. Bhat wrote:
> > On 07/19/2012 10:45 PM, Paul E. McKenney wrote:
> >> On Thu, Jul 19, 2012 at 05:39:30PM +0530, Srivatsa S. Bhat wrote:
> >>> Hi Paul,
> >>>
> >>> While running a CPU hotplug stress test on v3.5-rc7+
> >>> (mainline commit 8a7298b7805ab) I hit this warning.
> >>> I haven't tried to debug this yet...
> >>>
> >>> Line number 1550 maps to:
> >>>
> >>> WARN_ON_ONCE(cpu_is_offline(smp_processor_id()));
> >>>
> >>> inside rcu_do_batch().
> >>
> >> Hello, Srivatsa,
> >>
> >> I believe that you need commit a16b7a69 (Prevent __call_rcu() from
> >> invoking RCU core on offline CPUs), which is currently in -tip, queued
> >> for 3.6.  Please see below for the patch.
> >>
> >> Does this help?
> >>
> > 
> > Hi Paul,
> > 
> > I am hitting the cpu_is_offline() warning in rcu_do_batch() (see 2 of the
> > examples below) occasionally while testing CPU hotplug on Thomas'
> > smp/hotplug branch in -tip. It does contain the commit that you had
> > mentioned above.
> > 
> 
> I also hit some writeback related problems during some of these runs. But I
> was not able to reproduce them after that occurrence. (Adding relevant
> people to CC.)
>
> I hit the divide error shown below during the CPU hotplug test run, and the
> general

I've hit the divide error before, and I've had no more luck than you in
reproducing it. I'm tempted to add a check as a workaround:

+       if (WARN_ON(!denominator))
+               return dirty;

By the way, any chance you could share your CPU hotplug test scripts?
They'd be a valuable addition to my 0day boot test system.

> protection fault subsequently, while trying to shutdown the machine after
> the test.

Thanks,
Fengguang

> [  522.987310] SMP alternatives: switching to SMP code
> [  522.999101] smpboot: Booting Node 1 Processor 7 APIC 0x16
> [  524.083872] SMP alternatives: lockdep: fixing up alternatives
> [  524.090053] smpboot: Booting Node 0 Processor 8 APIC 0x1
> [  525.148720] SMP alternatives: lockdep: fixing up alternatives
> [  525.154970] smpboot: Booting Node 0 Processor 9 APIC 0x3
> [  526.024180] divide error: 0000 [#1] SMP 
> [  526.028144] Modules linked in: ipv6 cpufreq_conservative cpufreq_userspace 
> cpufreq_powersave acpi_cpufreq mperf fuse loop dm_mod iTCO_wdt 
> iTCO_vendor_support coretemp kvm_intel kvm cdc_ether pcspkr usbnet shpchp 
> pci_hotplug i2c_i801 i2c_core ioatdma mii crc32c_intel serio_raw microcode 
> lpc_ich mfd_core i7core_edac bnx2 dca edac_core tpm_tis tpm sg tpm_bios 
> rtc_cmos button uhci_hcd ehci_hcd usbcore usb_common sd_mod crc_t10dif edd 
> ext3 mbcache jbd fan processor mptsas mptscsih mptbase scsi_transport_sas 
> scsi_mod thermal thermal_sys hwmon
> [  526.028145] CPU 9 
> [  526.028145] Pid: 2235, comm: flush-8:0 Not tainted 
> 3.6.0-rc1-tglx-hotplug-0.0.0.28.36b5ec9-default #1 IBM IBM System x 
> -[7870C4Q]-/68Y8033     
> [  526.028145] RIP: 0010:[<ffffffff811276f6>]  [<ffffffff811276f6>] 
> bdi_dirty_limit+0x66/0xc0
> [  526.028145] RSP: 0018:ffff8811530bfcc0  EFLAGS: 00010206
> [  526.028145] RAX: 0000000000b9877e RBX: 00000000001a8112 RCX: 
> 28f5c28f5c28f5c3
> [  526.028145] RDX: 0000000000000000 RSI: 0000000000b9877e RDI: 
> 0000000000000000
> [  526.028145] RBP: ffff8811530bfce0 R08: 0000000000000010 R09: 
> 0000000000000000
> [  526.028145] R10: 0000000000000000 R11: 0000000000000000 R12: 
> ffff8808d4408e20
> [  526.028145] R13: ffff8808d4408e20 R14: ffff8808d44091a0 R15: 
> 0000000000000000
> [  526.028145] FS:  0000000000000000(0000) GS:ffff8808ddd40000(0000) 
> knlGS:0000000000000000
> [  526.028145] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [  526.028145] CR2: 00007fa35dd4eb60 CR3: 0000000001a0c000 CR4: 
> 00000000000007e0
> [  526.028145] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 
> 0000000000000000
> [  526.028145] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 
> 0000000000000400
> [  526.028145] Process flush-8:0 (pid: 2235, threadinfo ffff8811530be000, 
> task ffff88115315c5e0)
> [  526.028145] Stack:
> [  526.028145]  0400000000000000 0000000000000007 ffff8808d4408e20 
> ffffffffffffffee
> [  526.028145]  ffff8811530bfd10 ffffffff811ae95c 0000000000350225 
> 00000000001a8112
> [  526.028145]  0000000000000000 0000000000000002 ffff8811530bfdc0 
> ffffffff811b0620
> [  526.028145] Call Trace:
> [  526.209272] SMP alternatives: lockdep: fixing up alternatives
> [  526.209275] smpboot: Booting Node 0 Processor 10 APIC 0x5
> [  526.220012]  [<ffffffff811ae95c>] over_bground_thresh+0x7c/0x90
> [  526.220012]  [<ffffffff811b0620>] wb_do_writeback+0x170/0x310
> [  526.220012]  [<ffffffff811b08eb>] bdi_writeback_thread+0x12b/0x420
> [  526.220012]  [<ffffffff811b07c0>] ? wb_do_writeback+0x310/0x310
> [  526.220012]  [<ffffffff8106deae>] kthread+0xde/0xf0
> [  526.220012]  [<ffffffff814c6184>] kernel_thread_helper+0x4/0x10
> [  526.220012]  [<ffffffff814bc1f0>] ? retint_restore_args+0x13/0x13
> [  526.220012]  [<ffffffff8106ddd0>] ? __init_kthread_worker+0x70/0x70
> [  526.220012]  [<ffffffff814c6180>] ? gs_change+0x13/0x13
> [  526.220012] Code: 28 5c 8f c2 f5 28 8b 7d e0 48 89 c6 48 0f af f3 48 c1 ee 
> 02 48 89 f0 48 f7 e1 48 89 d6 31 d2 48 c1 ee 02 48 0f af 75 e8 48 89 f0 <48> 
> f7 f7 41 8b 94 24 74 02 00 00 48 0f af d3 48 89 c7 48 c1 ea 
> [  526.220012] RIP  [<ffffffff811276f6>] bdi_dirty_limit+0x66/0xc0
> [  526.220012]  RSP <ffff8811530bfcc0>
> [  526.304469] ---[ end trace bcfc7ab74bdb11a5 ]---
> [  527.330948] SMP alternatives: lockdep: fixing up alternatives
> 
> 
> ----
> 
> [ 1941.614775] SMP alternatives: lockdep: fixing up alternatives
> [ 1941.620614] smpboot: Booting Node 1 Processor 5 APIC 0x12
> [ 1941.657424] SMP alternatives: lockdep: fixing up alternatives
> [ 1941.663215] smpboot: Booting Node 1 Processor 6 APIC 0x14
> 
> [ 1992.721819] general protection fault: 0000 [#2] SMP 
> [ 1992.724844] Modules linked in: ipv6 cpufreq_conservative cpufreq_userspace 
> cpufreq_powersave acpi_cpufreq mperf fuse loop dm_mod iTCO_wdt 
> iTCO_vendor_support coretemp kvm_intel kvm cdc_ether pcspkr usbnet shpchp 
> pci_hotplug i2c_i801 i2c_core ioatdma mii crc32c_intel serio_raw microcode 
> lpc_ich mfd_core i7core_edac bnx2 dca edac_core tpm_tis tpm sg tpm_bios 
> rtc_cmos button uhci_hcd ehci_hcd usbcore usb_common sd_mod crc_t10dif edd 
> ext3 mbcache jbd fan processor mptsas mptscsih mptbase
> scsi_transport_sas scsi_mod thermal thermal_sys hwmon
> [ 1992.726995] CPU 8 
> [ 1992.726995] Pid: 19654, comm: shutdown Tainted: G      D      
> 3.6.0-rc1-tglx-hotplug-0.0.0.28.36b5ec9-default #1 IBM IBM System x 
> -[7870C4Q]-/68Y8033     
> [ 1992.726995] RIP: 0010:[<ffffffff810843d7>]  [<ffffffff810843d7>] 
> try_to_wake_up+0x57/0x2f0
> [ 1992.726995] RSP: 0018:ffff8808d47e5e58  EFLAGS: 00010002
> [ 1992.726995] RAX: 6b6b6b6b6b6b6b6b RBX: 000000000000000f RCX: 
> 000000006b6b6b6b
> [ 1992.726995] RDX: 000000006b6c6b6b RSI: ffffffff817a7fbf RDI: 
> ffff88115315cdd0
> [ 1992.726995] RBP: ffff8808d47e5e98 R08: 0000000000000002 R09: 
> 0000000000000001
> [ 1992.726995] R10: 0000000000000000 R11: 0000000000000001 R12: 
> ffff88115315c5e0
> [ 1992.726995] R13: 000000006b6b6b6b R14: ffff8808d44091a0 R15: 
> 0000000000000000
> [ 1992.726995] FS:  00007f32b1beb700(0000) GS:ffff8808ddd00000(0000) 
> knlGS:0000000000000000
> [ 1992.726995] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 1992.726995] CR2: 00007f32b173d1a0 CR3: 0000001153bd0000 CR4: 
> 00000000000007e0
> [ 1992.726995] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 
> 0000000000000000
> [ 1992.726995] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 
> 0000000000000400
> [ 1992.726995] Process shutdown (pid: 19654, threadinfo ffff8808d47e4000, 
> task ffff8808d4b045e0)
> [ 1992.726995] Stack:
> [ 1992.726995]  0000000000000246 ffff88115315cdd0 0000000000000286 
> 0000000000000000
> [ 1992.726995]  ffff8808d4408e20 ffff8808d5433288 ffff8808d44091a0 
> 00746c6168206d65
> [ 1992.726995]  ffff8808d47e5ea8 ffffffff810846a0 ffff8808d47e5ed8 
> ffffffff811adce1
> [ 1992.726995] Call Trace:
> [ 1992.726995]  [<ffffffff810846a0>] wake_up_process+0x10/0x20
> [ 1992.726995]  [<ffffffff811adce1>] bdi_queue_work+0xd1/0x1f0
> [ 1992.726995]  [<ffffffff811ae7d9>] __bdi_start_writeback+0x79/0x160
> [ 1992.726995]  [<ffffffff811af1b0>] wakeup_flusher_threads+0x120/0x1e0
> [ 1992.726995]  [<ffffffff811af0ca>] ? wakeup_flusher_threads+0x3a/0x1e0
> [ 1992.726995]  [<ffffffff811b45b2>] sys_sync+0x22/0x90
> [ 1992.726995]  [<ffffffff814c4fb9>] system_call_fastpath+0x16/0x1b
> [ 1992.726995] Code: 31 ed 48 89 c7 48 89 45 c8 e8 46 72 43 00 48 89 45 d0 49 
> 8b 04 24 85 c3 0f 84 02 02 00 00 45 8b 6c 24 2c 49 8b 44 24 08 45 85 ed <44> 
> 8b 70 18 74 75 8b 1d dd ea 9d 00 85 db 0f 85 2d 02 00 00 49  
> [ 1992.726995] RIP  [<ffffffff810843d7>] try_to_wake_up+0x57/0x2f0
> [ 1992.726995]  RSP <ffff8808d47e5e58>
> [ 1992.726995] ---[ end trace bcfc7ab74bdb11a6 ]---
> [ 1992.726995] Kernel panic - not syncing: Fatal exception in interrupt
> 
> 