Re: [ANNOUNCE] 3.14-rt1
On 05/02/2014 04:37 AM, Sebastian Andrzej Siewior wrote:
> * Fernando Lopez-Lezcano | 2014-04-26 11:29:04 [-0700]:
> > Saw this a moment ago (3.14.1 + rt1, Fedora 19 laptop - I think I
> > have seen something similar in 3.12.x-rt):
>
> Yes, you did: https://lkml.org/lkml/2014/3/7/163
> You did not test the patch I've sent. Care to do so?

I did patch my kernel and (I think) I did not see the problem again. I
did get some very occasional hangs that seemed to be video related, but
I could not see what had caused them.

> > Apr 26 11:16:11 localhost kernel: [ 96.323248] [ cut here ]
> > Apr 26 11:16:11 localhost kernel: [ 96.323262] WARNING: CPU: 0 PID: 2051 at lib/list_debug.c:59 __list_del_entry+0xa1/0xd0()
> > Apr 26 11:16:11 localhost kernel: [ 96.323264] list_del corruption. prev->next should be 8802101196a0, but was 0001
> > Apr 26 11:16:11 localhost kernel: [ 96.323266] Modules linked in:
>
> and please send backtrace information properly formatted. This is
> terribly hard to read.

Sorry about that, I will attach files in the future. I re-patched
3.14.3-rt5 with a slightly tweaked version of your patch. Will see what
happens and report back.

-- Fernando
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [ANNOUNCE] 3.14-rt1
* Fernando Lopez-Lezcano | 2014-04-26 11:29:04 [-0700]:
> Saw this a moment ago (3.14.1 + rt1, Fedora 19 laptop - I think I
> have seen something similar in 3.12.x-rt):

Yes, you did: https://lkml.org/lkml/2014/3/7/163
You did not test the patch I've sent. Care to do so?

> Apr 26 11:16:11 localhost kernel: [ 96.323248] [ cut here ]
> Apr 26 11:16:11 localhost kernel: [ 96.323262] WARNING: CPU: 0 PID: 2051 at lib/list_debug.c:59 __list_del_entry+0xa1/0xd0()
> Apr 26 11:16:11 localhost kernel: [ 96.323264] list_del corruption. prev->next should be 8802101196a0, but was 0001
> Apr 26 11:16:11 localhost kernel: [ 96.323266] Modules linked in:

and please send backtrace information properly formatted. This is
terribly hard to read.

Sebastian
Re: [ANNOUNCE] 3.14-rt1
On Fri, 2014-05-02 at 12:09 +0200, Sebastian Andrzej Siewior wrote:
> * Mike Galbraith | 2014-04-19 16:46:06 [+0200]:
>
> > Hi Sebastian,
>
> Hi Mike,
>
> > This hunk in hotplug-light-get-online-cpus.patch looks like a bug.
> >
> > @@ -333,7 +449,7 @@ static int __ref _cpu_down(unsigned int
> > 		/* CPU didn't die: tell everyone. Can't complain. */
> > 		smpboot_unpark_threads(cpu);
> > 		cpu_notify_nofail(CPU_DOWN_FAILED | mod, hcpu);
> > -		goto out_release;
> > +		goto out_cancel;
> > 	}
> > 	BUG_ON(cpu_online(cpu));
>
> Yes, it looks like it. v3.12-rt did not have this…

I just sent a set of dinky patches with which patch to fold them into,
along with my way of dealing with the new stopper_lock.

-Mike
Re: [ANNOUNCE] 3.14-rt1
* Mike Galbraith | 2014-04-21 05:31:18 [+0200]:
> Another little bug. This hunk of patches/stomp-machine-raw-lock.patch
> should be while (atomic_read(&done.nr_todo))

Thanks, fixed up.

Sebastian
Re: [ANNOUNCE] 3.14-rt1
* Mike Galbraith | 2014-04-19 16:46:06 [+0200]:
> Hi Sebastian,

Hi Mike,

> This hunk in hotplug-light-get-online-cpus.patch looks like a bug.
>
> @@ -333,7 +449,7 @@ static int __ref _cpu_down(unsigned int
> 		/* CPU didn't die: tell everyone. Can't complain. */
> 		smpboot_unpark_threads(cpu);
> 		cpu_notify_nofail(CPU_DOWN_FAILED | mod, hcpu);
> -		goto out_release;
> +		goto out_cancel;
> 	}
> 	BUG_ON(cpu_online(cpu));

Yes, it looks like it. v3.12-rt did not have this…

Sebastian
Re: [ANNOUNCE] 3.14-rt1
On Thu, 2014-05-01 at 14:42 -0400, Steven Rostedt wrote:
> On Thu, 01 May 2014 19:36:18 +0200
> Mike Galbraith wrote:
>
> > Hah! I knew you were just hiding, you sneaky little SOB ;-)
>
> What's this from? A new bug that had all the patches applied? Or was
> this without one of the patches?

It's with all patches applied. It's not new; it has muddied the water
during other hunting expeditions. You may never see it in a box with a
sane topology, that box is kinda special.

-Mike
Re: [ANNOUNCE] 3.14-rt1
On Thu, 01 May 2014 19:36:18 +0200
Mike Galbraith wrote:

> On Wed, 2014-04-30 at 11:48 -0400, Steven Rostedt wrote:
> > On Wed, 30 Apr 2014 17:15:57 +0200
> > Mike Galbraith wrote:
> >
> > > On Wed, 2014-04-30 at 11:11 -0400, Steven Rostedt wrote:
> > > > > Another little bug. This hunk of patches/stomp-machine-raw-lock.patch
> > > > > should be while (atomic_read(&done.nr_todo))
> > > > >
> > > > > @@ -647,7 +671,7 @@ int stop_machine_from_inactive_cpu(int (
> > > > > 	ret = multi_cpu_stop(&msdata);
> > > > >
> > > > > 	/* Busy wait for completion. */
> > > > > -	while (!completion_done(&done.completion))
> > > > > +	while (!atomic_read(&done.nr_todo))
> > >                  ^--- that ! needs to go away
> > > >
> > > > I don't see this in the code. That is, there is no "completion_done()"
> > > > in stop_machine_from_inactive_cpu(). It is already an atomic_read().
> > >
> > > Yes, but it should read "while (atomic_read(&done.nr_todo))"
> >
> > Ah, this would have been better if you had sent a patch. I misread what
> > you talked about.
> >
> > Yes, this was the culprit of my failures. After removing the '!', it
> > worked.
>
> Hah! I knew you were just hiding, you sneaky little SOB ;-)

What's this from? A new bug that had all the patches applied? Or was
this without one of the patches?

-- Steve

> [50661.070049] smpboot: Booting Node 0 Processor 15 APIC 0x36
> [50661.142381] kvm: enabling virtualization on CPU15
> [50661.142397] BUG: unable to handle kernel NULL pointer dereference at (null)
> [50661.142417] IP: [] wake_up_process+0x1/0x40
> [50661.142420] PGD 0
> [50661.142422] Oops: [#1] PREEMPT SMP
> [50661.142470] Modules linked in: nfsd(F) lockd(F) nfs_acl(F) auth_rpcgss(F) sunrpc(F) autofs4(F) binfmt_misc(F) edd(F) af_packet(F) bridge(F) stp(F) llc(F) cpufreq_conservative(F) cpufreq_ondemand(F) cpufreq_userspace(F) cpufreq_powersave(F) pcc_cpufreq(F) fuse(F) loop(F) md_mod(F) dm_mod(F) iTCO_wdt(F) iTCO_vendor_support(F) gpio_ich(F) vhost_net(F) macvtap(F) macvlan(F) vhost(F) tun(F) i7core_edac(F) netxen_nic(F) kvm_intel(F) joydev(F) shpchp(F) edac_core(F) hid_generic(F) kvm(F) ipmi_si(F) sr_mod(F) ipmi_msghandler(F) bnx2(F) cdrom(F) sg(F) hpilo(F) hpwdt(F) ehci_pci(F) lpc_ich(F) mfd_core(F) acpi_power_meter(F) pcspkr(F) button(F) ext4(F) jbd2(F) mbcache(F) crc16(F) usbhid(F) uhci_hcd(F) ehci_hcd(F) usbcore(F) sd_mod(F) usb_common(F) thermal(F) processor(F) scsi_dh_rdac(F) scsi_dh_alua(F) scsi_dh_emc(F)
> [50661.142475] scsi_dh_hp_sw(F) scsi_dh(F) ata_generic(F) ata_piix(F) libata(F) cciss(F) hpsa(F) scsi_mod(F)
> [50661.142479] CPU: 39 PID: 283 Comm: migration/39 Tainted: GF 3.14.2-rt1 #667
> [50661.142481] Hardware name: Hewlett-Packard ProLiant DL980 G7, BIOS P66 07/07/2010
> [50661.142482] task: 880274515bb0 ti: 88027454e000 task.ti: 88027454e000
> [50661.142486] RIP: 0010:[] [] wake_up_process+0x1/0x40
> [50661.142487] RSP: 0018:88027454fda8 EFLAGS: 00010002
> [50661.142488] RAX: 8001 RBX: 880275581eb8 RCX:
> [50661.142488] RDX: 81aacec0 RSI: 0100 RDI:
> [50661.142489] RBP: 8802772ee9b0 R08: R09: 81aacec0
> [50661.142490] R10: R11: 8103d640 R12: 810f26c0
> [50661.142490] R13: 880275581e88 R14: 8802772ee9b8 R15: 88027454e010
> [50661.142492] FS: () GS:8802772e() knlGS:
> [50661.142493] CS: 0010 DS: ES: CR0: 8005003b
> [50661.142494] CR2: CR3: 01a0f000 CR4: 07e0
> [50661.142494] Stack:
> [50661.142505] 880275581eb8 810f2555 880274515bb0 0005
> [50661.142508] 0001 0001 0140 0001
> [50661.142512] 880274515bb0 88027454e000 8802772f4020 0005
> [50661.142512] Call Trace:
> [50661.142526] [] ? cpu_stopper_thread+0x125/0x1a0
> [50661.142530] [] ? smpboot_thread_fn+0x23d/0x320
> [50661.142533] [] ? smpboot_create_threads+0x70/0x70
> [50661.142535] [] ? smpboot_create_threads+0x70/0x70
> [50661.142543] [] ? kthread+0xd2/0xe0
> [50661.142545] [] ? kthreadd+0x330/0x330
> [50661.142553] [] ? ret_from_fork+0x7c/0xb0
> [50661.142555] [] ? kthreadd+0x330/0x330
> [50661.142568] Code: fd ff ff 0f 1f 80 00 00 00 00 31 d2 e9 09 fd ff ff 66 0f 1f 84 00 00 00 00 00 ba 08 00 00 00 be 0f 00 00 00 e9 f1 fc ff ff 90 53 <48> 8b 07 48 89 fb a8 0c 75 08 48 8b 47 08 a8 0c 74 11 be ba 06
> [50661.142570] RIP [] wake_up_process+0x1/0x40
> [50661.142570] RSP
> [50661.142571] CR2:
Re: [ANNOUNCE] 3.14-rt1
On Wed, 2014-04-30 at 11:48 -0400, Steven Rostedt wrote:
> On Wed, 30 Apr 2014 17:15:57 +0200
> Mike Galbraith wrote:
>
> > On Wed, 2014-04-30 at 11:11 -0400, Steven Rostedt wrote:
> > > > Another little bug. This hunk of patches/stomp-machine-raw-lock.patch
> > > > should be while (atomic_read(&done.nr_todo))
> > > >
> > > > @@ -647,7 +671,7 @@ int stop_machine_from_inactive_cpu(int (
> > > > 	ret = multi_cpu_stop(&msdata);
> > > >
> > > > 	/* Busy wait for completion. */
> > > > -	while (!completion_done(&done.completion))
> > > > +	while (!atomic_read(&done.nr_todo))
> >                  ^--- that ! needs to go away
> > >
> > > I don't see this in the code. That is, there is no "completion_done()"
> > > in stop_machine_from_inactive_cpu(). It is already an atomic_read().
> >
> > Yes, but it should read "while (atomic_read(&done.nr_todo))"
>
> Ah, this would have been better if you had sent a patch. I misread what
> you talked about.
>
> Yes, this was the culprit of my failures. After removing the '!', it
> worked.

Hah! I knew you were just hiding, you sneaky little SOB ;-)

[50661.070049] smpboot: Booting Node 0 Processor 15 APIC 0x36
[50661.142381] kvm: enabling virtualization on CPU15
[50661.142397] BUG: unable to handle kernel NULL pointer dereference at (null)
[50661.142417] IP: [] wake_up_process+0x1/0x40
[50661.142420] PGD 0
[50661.142422] Oops: [#1] PREEMPT SMP
[50661.142470] Modules linked in: nfsd(F) lockd(F) nfs_acl(F) auth_rpcgss(F) sunrpc(F) autofs4(F) binfmt_misc(F) edd(F) af_packet(F) bridge(F) stp(F) llc(F) cpufreq_conservative(F) cpufreq_ondemand(F) cpufreq_userspace(F) cpufreq_powersave(F) pcc_cpufreq(F) fuse(F) loop(F) md_mod(F) dm_mod(F) iTCO_wdt(F) iTCO_vendor_support(F) gpio_ich(F) vhost_net(F) macvtap(F) macvlan(F) vhost(F) tun(F) i7core_edac(F) netxen_nic(F) kvm_intel(F) joydev(F) shpchp(F) edac_core(F) hid_generic(F) kvm(F) ipmi_si(F) sr_mod(F) ipmi_msghandler(F) bnx2(F) cdrom(F) sg(F) hpilo(F) hpwdt(F) ehci_pci(F) lpc_ich(F) mfd_core(F) acpi_power_meter(F) pcspkr(F) button(F) ext4(F) jbd2(F) mbcache(F) crc16(F) usbhid(F) uhci_hcd(F) ehci_hcd(F) usbcore(F) sd_mod(F) usb_common(F) thermal(F) processor(F) scsi_dh_rdac(F) scsi_dh_alua(F) scsi_dh_emc(F)
[50661.142475] scsi_dh_hp_sw(F) scsi_dh(F) ata_generic(F) ata_piix(F) libata(F) cciss(F) hpsa(F) scsi_mod(F)
[50661.142479] CPU: 39 PID: 283 Comm: migration/39 Tainted: GF 3.14.2-rt1 #667
[50661.142481] Hardware name: Hewlett-Packard ProLiant DL980 G7, BIOS P66 07/07/2010
[50661.142482] task: 880274515bb0 ti: 88027454e000 task.ti: 88027454e000
[50661.142486] RIP: 0010:[] [] wake_up_process+0x1/0x40
[50661.142487] RSP: 0018:88027454fda8 EFLAGS: 00010002
[50661.142488] RAX: 8001 RBX: 880275581eb8 RCX:
[50661.142488] RDX: 81aacec0 RSI: 0100 RDI:
[50661.142489] RBP: 8802772ee9b0 R08: R09: 81aacec0
[50661.142490] R10: R11: 8103d640 R12: 810f26c0
[50661.142490] R13: 880275581e88 R14: 8802772ee9b8 R15: 88027454e010
[50661.142492] FS: () GS:8802772e() knlGS:
[50661.142493] CS: 0010 DS: ES: CR0: 8005003b
[50661.142494] CR2: CR3: 01a0f000 CR4: 07e0
[50661.142494] Stack:
[50661.142505] 880275581eb8 810f2555 880274515bb0 0005
[50661.142508] 0001 0001 0140 0001
[50661.142512] 880274515bb0 88027454e000 8802772f4020 0005
[50661.142512] Call Trace:
[50661.142526] [] ? cpu_stopper_thread+0x125/0x1a0
[50661.142530] [] ? smpboot_thread_fn+0x23d/0x320
[50661.142533] [] ? smpboot_create_threads+0x70/0x70
[50661.142535] [] ? smpboot_create_threads+0x70/0x70
[50661.142543] [] ? kthread+0xd2/0xe0
[50661.142545] [] ? kthreadd+0x330/0x330
[50661.142553] [] ? ret_from_fork+0x7c/0xb0
[50661.142555] [] ? kthreadd+0x330/0x330
[50661.142568] Code: fd ff ff 0f 1f 80 00 00 00 00 31 d2 e9 09 fd ff ff 66 0f 1f 84 00 00 00 00 00 ba 08 00 00 00 be 0f 00 00 00 e9 f1 fc ff ff 90 53 <48> 8b 07 48 89 fb a8 0c 75 08 48 8b 47 08 a8 0c 74 11 be ba 06
[50661.142570] RIP [] wake_up_process+0x1/0x40
[50661.142570] RSP
[50661.142571] CR2:
Re: [ANNOUNCE] 3.14-rt1
On Wed, 2014-04-30 at 11:48 -0400, Steven Rostedt wrote:
> On Wed, 30 Apr 2014 17:15:57 +0200
> Mike Galbraith wrote:
>
> > On Wed, 2014-04-30 at 11:11 -0400, Steven Rostedt wrote:
> > > > Another little bug. This hunk of patches/stomp-machine-raw-lock.patch
> > > > should be while (atomic_read(&done.nr_todo))
> > > >
> > > > @@ -647,7 +671,7 @@ int stop_machine_from_inactive_cpu(int (
> > > > 	ret = multi_cpu_stop(&msdata);
> > > >
> > > > 	/* Busy wait for completion. */
> > > > -	while (!completion_done(&done.completion))
> > > > +	while (!atomic_read(&done.nr_todo))
> >                  ^--- that ! needs to go away
> > >
> > > I don't see this in the code. That is, there is no "completion_done()"
> > > in stop_machine_from_inactive_cpu(). It is already an atomic_read().
> >
> > Yes, but it should read "while (atomic_read(&done.nr_todo))"
>
> Ah, this would have been better if you had sent a patch. I misread what
> you talked about.
>
> Yes, this was the culprit of my failures. After removing the '!', it
> worked.
>
> Care to send a patch :-)

I figured those two were just edit patch, done, but yeah, I can do that.

-Mike
Re: [ANNOUNCE] 3.14-rt1
On Wed, 30 Apr 2014 17:15:57 +0200
Mike Galbraith wrote:

> On Wed, 2014-04-30 at 11:11 -0400, Steven Rostedt wrote:
> > > Another little bug. This hunk of patches/stomp-machine-raw-lock.patch
> > > should be while (atomic_read(&done.nr_todo))
> > >
> > > @@ -647,7 +671,7 @@ int stop_machine_from_inactive_cpu(int (
> > > 	ret = multi_cpu_stop(&msdata);
> > >
> > > 	/* Busy wait for completion. */
> > > -	while (!completion_done(&done.completion))
> > > +	while (!atomic_read(&done.nr_todo))
>                 ^--- that ! needs to go away
> >
> > I don't see this in the code. That is, there is no "completion_done()"
> > in stop_machine_from_inactive_cpu(). It is already an atomic_read().
>
> Yes, but it should read "while (atomic_read(&done.nr_todo))"

Ah, this would have been better if you had sent a patch. I misread what
you talked about.

Yes, this was the culprit of my failures. After removing the '!', it
worked.

Care to send a patch :-)

-- Steve
Re: [ANNOUNCE] 3.14-rt1
I fired off a 100 iteration run on a 64 core box. If it's still alive
in the morning, it should still be busy as hell.

-Mike
Re: [ANNOUNCE] 3.14-rt1
On Wed, 2014-04-30 at 11:11 -0400, Steven Rostedt wrote:
> > Another little bug. This hunk of patches/stomp-machine-raw-lock.patch
> > should be while (atomic_read(&done.nr_todo))
> >
> > @@ -647,7 +671,7 @@ int stop_machine_from_inactive_cpu(int (
> > 	ret = multi_cpu_stop(&msdata);
> >
> > 	/* Busy wait for completion. */
> > -	while (!completion_done(&done.completion))
> > +	while (!atomic_read(&done.nr_todo))
               ^--- that ! needs to go away
>
> I don't see this in the code. That is, there is no "completion_done()"
> in stop_machine_from_inactive_cpu(). It is already an atomic_read().

Yes, but it should read "while (atomic_read(&done.nr_todo))"

-Mike
Re: [ANNOUNCE] 3.14-rt1
On Wed, 30 Apr 2014 16:54:46 +0200
Mike Galbraith wrote:

> On Wed, 2014-04-30 at 10:33 -0400, Steven Rostedt wrote:
> > On Wed, 30 Apr 2014 10:19:19 -0400
> > Steven Rostedt wrote:
> >
> > > I'm testing it now. But could you please post them as regular patches.
> > > They were attachments to this thread, and were not something that stood
> > > out.
> >
> > With your two patches, it still crashes exactly the same way. I
> > probably should remove my debug just in case, but I think this box has
> > another problem with it.
>
> You killed this hunk of hotplug-light-get-online-cpus.patch
>
> @@ -333,7 +449,7 @@ static int __ref _cpu_down(unsigned int
> 		/* CPU didn't die: tell everyone. Can't complain. */
> 		smpboot_unpark_threads(cpu);
> 		cpu_notify_nofail(CPU_DOWN_FAILED | mod, hcpu);
> -		goto out_release;
> +		goto out_cancel;

I added this, but it only happens on the failed case, which I don't
think is an issue with what I'm dealing with.

> 	}
> 	BUG_ON(cpu_online(cpu));
>
> ..and fixed this too?
>
> Another little bug. This hunk of patches/stomp-machine-raw-lock.patch
> should be while (atomic_read(&done.nr_todo))
>
> @@ -647,7 +671,7 @@ int stop_machine_from_inactive_cpu(int (
> 	ret = multi_cpu_stop(&msdata);
>
> 	/* Busy wait for completion. */
> -	while (!completion_done(&done.completion))
> +	while (!atomic_read(&done.nr_todo))

I don't see this in the code. That is, there is no "completion_done()"
in stop_machine_from_inactive_cpu(). It is already an atomic_read().

-- Steve

> 		cpu_relax();
>
> 	mutex_unlock(&stop_cpus_mutex);
Re: [ANNOUNCE] 3.14-rt1
On Wed, 2014-04-30 at 10:33 -0400, Steven Rostedt wrote:
> On Wed, 30 Apr 2014 10:19:19 -0400
> Steven Rostedt wrote:
>
> > I'm testing it now. But could you please post them as regular patches.
> > They were attachments to this thread, and were not something that stood
> > out.
>
> With your two patches, it still crashes exactly the same way. I
> probably should remove my debug just in case, but I think this box has
> another problem with it.

You killed this hunk of hotplug-light-get-online-cpus.patch

@@ -333,7 +449,7 @@ static int __ref _cpu_down(unsigned int
		/* CPU didn't die: tell everyone. Can't complain. */
		smpboot_unpark_threads(cpu);
		cpu_notify_nofail(CPU_DOWN_FAILED | mod, hcpu);
-		goto out_release;
+		goto out_cancel;
	}
	BUG_ON(cpu_online(cpu));

..and fixed this too?

Another little bug. This hunk of patches/stomp-machine-raw-lock.patch
should be while (atomic_read(&done.nr_todo))

@@ -647,7 +671,7 @@ int stop_machine_from_inactive_cpu(int (
	ret = multi_cpu_stop(&msdata);

	/* Busy wait for completion. */
-	while (!completion_done(&done.completion))
+	while (!atomic_read(&done.nr_todo))
		cpu_relax();

	mutex_unlock(&stop_cpus_mutex);
Re: [ANNOUNCE] 3.14-rt1
On Wed, 2014-04-30 at 10:19 -0400, Steven Rostedt wrote:
> On Wed, 30 Apr 2014 16:00:03 +0200
> Mike Galbraith wrote:
>
> > On Wed, 2014-04-30 at 09:15 -0400, Steven Rostedt wrote:
> > > On Wed, 30 Apr 2014 15:06:29 +0200
> > > Mike Galbraith wrote:
> > >
> > > > The End.. I hope. I've had enough hotplug entertainment for a while.
> > >
> > > Not for me. 3.14-rt stress-cpu-hotplug crashes quickly. But it's a
> > > different issue than what my patch addressed. I'm still debugging it.
> >
> > If you didn't fix the two bugs I showed, and (wisely) didn't look at the
> > beautiful lglock patches I posted (no frozen shark, I'm disappointed;),
> > your patch won't help.
>
> Mike,
>
> I'm testing it now. But could you please post them as regular patches.
> They were attachments to this thread, and were not something that stood
> out.

They were meant to not stick out :) I showed what I did to deal with
that damn lglock, but showing them at all felt more akin to chumming
the waters for frozen sharks than posting patches. 'spose I could try
to muster up some courage; showing them put a pretty big dent in my
supply.

-Mike
Re: [ANNOUNCE] 3.14-rt1
On Wed, 30 Apr 2014 10:19:19 -0400
Steven Rostedt wrote:

> I'm testing it now. But could you please post them as regular patches.
> They were attachments to this thread, and were not something that stood
> out.

With your two patches, it still crashes exactly the same way. I
probably should remove my debug just in case, but I think this box has
another problem with it.

-- Steve
Re: [ANNOUNCE] 3.14-rt1
On Wed, 30 Apr 2014 16:00:03 +0200
Mike Galbraith wrote:

> On Wed, 2014-04-30 at 09:15 -0400, Steven Rostedt wrote:
> > On Wed, 30 Apr 2014 15:06:29 +0200
> > Mike Galbraith wrote:
> >
> > > The End.. I hope. I've had enough hotplug entertainment for a while.
> >
> > Not for me. 3.14-rt stress-cpu-hotplug crashes quickly. But it's a
> > different issue than what my patch addressed. I'm still debugging it.
>
> If you didn't fix the two bugs I showed, and (wisely) didn't look at the
> beautiful lglock patches I posted (no frozen shark, I'm disappointed;),
> your patch won't help.

Mike,

I'm testing it now. But could you please post them as regular patches.
They were attachments to this thread, and were not something that stood
out.

Thanks,

-- Steve
Re: [ANNOUNCE] 3.14-rt1
On Wed, 2014-04-30 at 09:15 -0400, Steven Rostedt wrote:
> On Wed, 30 Apr 2014 15:06:29 +0200
> Mike Galbraith wrote:
>
> > The End.. I hope. I've had enough hotplug entertainment for a while.
>
> Not for me. 3.14-rt stress-cpu-hotplug crashes quickly. But it's a
> different issue than what my patch addressed. I'm still debugging it.

If you didn't fix the two bugs I showed, and (wisely) didn't look at the
beautiful lglock patches I posted (no frozen shark, I'm disappointed;),
your patch won't help.

-Mike
Re: [ANNOUNCE] 3.14-rt1
On Wed, 30 Apr 2014 15:06:29 +0200
Mike Galbraith wrote:

> The End.. I hope. I've had enough hotplug entertainment for a while.

Not for me. 3.14-rt stress-cpu-hotplug crashes quickly. But it's a
different issue than what my patch addressed. I'm still debugging it.

-- Steve
Re: [ANNOUNCE] 3.14-rt1
On Wed, 2014-04-30 at 09:43 +0200, Mike Galbraith wrote:
> On Tue, 2014-04-29 at 20:13 -0400, Steven Rostedt wrote:
> > On Tue, 29 Apr 2014 07:21:09 +0200
> > Mike Galbraith wrote:
> >
> > > On Mon, 2014-04-28 at 16:37 +0200, Mike Galbraith wrote:
> > > > > Seems that migrate_disable() must be called before taking the lock as
> > > > > it is done in every other location.
> > > >
> > > > And for tasklist_lock, seems you also MUST do that prior to trylock as
> > > > well, else you'll run afoul of the hotplug beast.
> > >
> > > Bah. Futzing with dmesg while stress script is running is either a very
> > > bad idea, or a very good test. Both virgin 3.10-rt and 3.12-rt with new
> > > bugs squashed will deadlock.
> > >
> > > Too bad I kept on testing, I liked the notion that hotplug was solid ;-)
> >
> > I was able to stress cpu hotplug on 3.12-rt after applying the
> > following patch.
> >
> > If there's no complaints about it, I'm going to add this to the 3.12-rt
> > stable tree. As without it, it fails horribly with the cpu hotplug
> > stress test, and I won't release a stable kernel that does that.
>
> My local boxen are happy, and the 64 core box with 14-rt seems happy as
> well, though I couldn't let it burn for long.

And 3.12 looks stable on the 64 core DL980 as well. (If it survived a
24 hour busy+stress session I'd still likely fall outta my chair
though)

My kinda sorta 3.12-rt enterprise-to-be kernel wasn't stable on the
DL980, while appearing just fine on small boxen, which made me suspect
that there was still a big box something lurking, only raising its ugly
head in the fatter kernel. That wasn't an rt problem after all; someone
in enterprise land just didn't stack their goody pile quite high enough
while wedging upstream into the stable base kernel, which bent rt.

The End.. I hope. I've had enough hotplug entertainment for a while.

-Mike
Re: [ANNOUNCE] 3.14-rt1
On Tue, 2014-04-29 at 20:13 -0400, Steven Rostedt wrote:
> On Tue, 29 Apr 2014 07:21:09 +0200
> Mike Galbraith wrote:
>
> > On Mon, 2014-04-28 at 16:37 +0200, Mike Galbraith wrote:
> > > > Seems that migrate_disable() must be called before taking the lock as
> > > > it is done in every other location.
> > >
> > > And for tasklist_lock, seems you also MUST do that prior to trylock as
> > > well, else you'll run afoul of the hotplug beast.
> >
> > Bah. Futzing with dmesg while stress script is running is either a very
> > bad idea, or a very good test. Both virgin 3.10-rt and 3.12-rt with new
> > bugs squashed will deadlock.
> >
> > Too bad I kept on testing, I liked the notion that hotplug was solid ;-)
>
> I was able to stress cpu hotplug on 3.12-rt after applying the
> following patch.
>
> If there's no complaints about it, I'm going to add this to the 3.12-rt
> stable tree. As without it, it fails horribly with the cpu hotplug
> stress test, and I won't release a stable kernel that does that.

My local boxen are happy, and the 64 core box with 14-rt seems happy as
well, though I couldn't let it burn for long.

BTW, that dmesg business went into hiding. I didn't have time to put
virgin 10-rt back on and play around poking both kernels this, that and
the other way again, but it seems there's some phase-of-moon factor
there.

-Mike
Re: [ANNOUNCE] 3.14-rt1
On Tue, 29 Apr 2014 07:21:09 +0200
Mike Galbraith wrote:

> On Mon, 2014-04-28 at 16:37 +0200, Mike Galbraith wrote:
> > > Seems that migrate_disable() must be called before taking the lock as
> > > it is done in every other location.
> >
> > And for tasklist_lock, seems you also MUST do that prior to trylock as
> > well, else you'll run afoul of the hotplug beast.
>
> Bah. Futzing with dmesg while stress script is running is either a very
> bad idea, or a very good test. Both virgin 3.10-rt and 3.12-rt with new
> bugs squashed will deadlock.
>
> Too bad I kept on testing, I liked the notion that hotplug was solid ;-)

I was able to stress cpu hotplug on 3.12-rt after applying the
following patch.

If there's no complaints about it, I'm going to add this to the 3.12-rt
stable tree. As without it, it fails horribly with the cpu hotplug
stress test, and I won't release a stable kernel that does that.

-- Steve

Signed-off-by: Steven Rostedt

diff --git a/kernel/rt.c b/kernel/rt.c
index bb72347..4f2a613 100644
--- a/kernel/rt.c
+++ b/kernel/rt.c
@@ -180,12 +180,15 @@ EXPORT_SYMBOL(_mutex_unlock);
  */
 int __lockfunc rt_write_trylock(rwlock_t *rwlock)
 {
-	int ret = rt_mutex_trylock(&rwlock->lock);
+	int ret;
+
+	migrate_disable();
+	ret = rt_mutex_trylock(&rwlock->lock);
 
-	if (ret) {
+	if (ret)
 		rwlock_acquire(&rwlock->dep_map, 0, 1, _RET_IP_);
-		migrate_disable();
-	}
+	else
+		migrate_enable();
 
 	return ret;
 }
@@ -212,11 +215,12 @@ int __lockfunc rt_read_trylock(rwlock_t *rwlock)
 	 * write locked.
 	 */
 	if (rt_mutex_owner(lock) != current) {
+		migrate_disable();
 		ret = rt_mutex_trylock(lock);
-		if (ret) {
+		if (ret)
 			rwlock_acquire(&rwlock->dep_map, 0, 1, _RET_IP_);
-			migrate_disable();
-		}
+		else
+			migrate_enable();
 	} else if (!rwlock->read_depth) {
 		ret = 0;
 	}
@@ -245,8 +249,8 @@ void __lockfunc rt_read_lock(rwlock_t *rwlock)
 	 */
 	if (rt_mutex_owner(lock) != current) {
 		rwlock_acquire(&rwlock->dep_map, 0, 0, _RET_IP_);
-		__rt_spin_lock(lock);
 		migrate_disable();
+		__rt_spin_lock(lock);
 	}
 	rwlock->read_depth++;
 }
Re: [ANNOUNCE] 3.14-rt1
On Mon, 2014-04-28 at 16:37 +0200, Mike Galbraith wrote:
> > Seems that migrate_disable() must be called before taking the lock as
> > it is done in every other location.
>
> And for tasklist_lock, seems you also MUST do that prior to trylock as
> well, else you'll run afoul of the hotplug beast.

Bah. Futzing with dmesg while stress script is running is either a very
bad idea, or a very good test. Both virgin 3.10-rt and 3.12-rt with new
bugs squashed will deadlock.

Too bad I kept on testing, I liked the notion that hotplug was solid ;-)

-Mike
Re: [ANNOUNCE] 3.14-rt1
On Mon, 2014-04-28 at 16:37 +0200, Mike Galbraith wrote: > On Mon, 2014-04-28 at 10:18 -0400, Steven Rostedt wrote: > > On Mon, 28 Apr 2014 11:09:46 +0200 > > Mike Galbraith wrote: > > > > > migrate_disable-pushd-down-in-atomic_dec_and_spin_lo.patch > > > > > > bug: migrate_disable() after blocking is too late. > > > > > > @@ -1028,12 +1028,12 @@ int atomic_dec_and_spin_lock(atomic_t *a > > > /* Subtract 1 from counter unless that drops it to 0 (ie. it was > > > 1) */ > > > if (atomic_add_unless(atomic, -1, 1)) > > > return 0; > > > - migrate_disable(); > > > rt_spin_lock(lock); > > > - if (atomic_dec_and_test(atomic)) > > > + if (atomic_dec_and_test(atomic)){ > > > + migrate_disable(); > > > > Makes sense, as the CPU can go offline right after the lock is grabbed > > and before the migrate_disable() is called. > > > > Seems that migrate_disable() must be called before taking the lock as > > it is done in every other location. > > And for tasklist_lock, seems you also MUST do that prior to trylock as > well, else you'll run afoul of the hotplug beast. This lockdep gripe is from the deadlocked crashdump with only the clearly busted bits patched up. [ 193.033224] == [ 193.033225] [ INFO: possible circular locking dependency detected ] [ 193.033227] 3.12.18-rt25 #19 Not tainted [ 193.033227] --- [ 193.033228] boot.kdump/5422 is trying to acquire lock: [ 193.033237] (&hp->lock){+.+...}, at: [] pin_current_cpu+0x84/0x1d0 [ 193.033238] but task is already holding lock: [ 193.033241] (tasklist_lock){+.+...}, at: [] do_wait+0xbb/0x2a0 [ 193.033242] which lock already depends on the new lock. 
[ 193.033242] the existing dependency chain (in reverse order) is: [ 193.033244] -> #1 (tasklist_lock){+.+...}: [ 193.033248][] check_prevs_add+0xf8/0x180 [ 193.033250][] validate_chain.isra.45+0x5aa/0x750 [ 193.033252][] __lock_acquire+0x3f6/0x9f0 [ 193.033253][] lock_acquire+0x8c/0x160 [ 193.033257][] rt_write_lock+0x2c/0x40 [ 193.033260][] _cpu_down+0x219/0x440 [ 193.033261][] cpu_down+0x30/0x50 [ 193.033264][] cpu_subsys_offline+0x1c/0x30 [ 193.033267][] device_offline+0x95/0xc0 [ 193.033269][] online_store+0x40/0x80 [ 193.033271][] dev_attr_store+0x13/0x30 [ 193.033274][] sysfs_write_file+0xf0/0x170 [ 193.033277][] vfs_write+0xc8/0x1d0 [ 193.033279][] SyS_write+0x50/0xa0 [ 193.033282][] system_call_fastpath+0x16/0x1b [ 193.033284] -> #0 (&hp->lock){+.+...}: [ 193.033286][] check_prev_add+0x7bd/0x7d0 [ 193.033287][] check_prevs_add+0xf8/0x180 [ 193.033289][] validate_chain.isra.45+0x5aa/0x750 [ 193.033291][] __lock_acquire+0x3f6/0x9f0 [ 193.033293][] lock_acquire+0x8c/0x160 [ 193.033295][] rt_spin_lock+0x55/0x70 [ 193.033296][] pin_current_cpu+0x84/0x1d0 [ 193.033299][] migrate_disable+0x81/0x100 [ 193.033301][] rt_read_lock+0x47/0x60 [ 193.033303][] do_wait+0xbb/0x2a0 [ 193.033305][] SyS_wait4+0x9e/0x100 [ 193.033307][] system_call_fastpath+0x16/0x1b [ 193.033307] other info that might help us debug this: [ 193.033308] Possible unsafe locking scenario: [ 193.033309]CPU0CPU1 [ 193.033309] [ 193.033310] lock(tasklist_lock); [ 193.033312]lock(&hp->lock); [ 193.033313]lock(tasklist_lock); [ 193.033314] lock(&hp->lock); [ 193.033315] *** DEADLOCK *** [ 193.033316] 1 lock held by boot.kdump/5422: [ 193.033319] #0: (tasklist_lock){+.+...}, at: [] do_wait+0xbb/0x2a0 [ 193.033320] stack backtrace: [ 193.033322] CPU: 0 PID: 5422 Comm: boot.kdump Not tainted 3.12.18-rt25 #19 [ 193.033323] Hardware name: MEDIONPC MS-7502/MS-7502, BIOS 6.00 PG 12/26/2007 [ 193.033326] 880200550818 8802004e5ad8 8155538c [ 193.033328] 8802004e5b28 8154d0df 8802004e5b18 [ 193.00] 8802004e5b50 
880200550818 8802005507e0 880200550818 [ 193.01] Call Trace: [ 193.05] [] dump_stack+0x4f/0x91 [ 193.07] [] print_circular_bug+0xd3/0xe4 [ 193.09] [] check_prev_add+0x7bd/0x7d0 [ 193.033342] [] ? sched_clock_local+0x25/0x90 [ 193.033344] [] ? sched_clock_cpu+0xa8/0x120 [ 193.033346] [] check_prevs_add+0xf8/0x180 [ 193.033348] [] validate_chain.isra.45+0x5aa/0x750 [ 193.033350] [] __lock_acquire+0x3f6/0x9f0 [ 193.033352] [] ? rt_spin_lock_slowlock+0x231/0x280 [ 1
Re: [ANNOUNCE] 3.14-rt1
On Mon, 2014-04-28 at 10:18 -0400, Steven Rostedt wrote:
> On Mon, 28 Apr 2014 11:09:46 +0200
> Mike Galbraith wrote:
>
> > migrate_disable-pushd-down-in-atomic_dec_and_spin_lo.patch
> >
> > bug: migrate_disable() after blocking is too late.
> >
> > @@ -1028,12 +1028,12 @@ int atomic_dec_and_spin_lock(atomic_t *a
> > 	/* Subtract 1 from counter unless that drops it to 0 (ie. it was
> > 	1) */
> > 	if (atomic_add_unless(atomic, -1, 1))
> > 		return 0;
> > -	migrate_disable();
> > 	rt_spin_lock(lock);
> > -	if (atomic_dec_and_test(atomic))
> > +	if (atomic_dec_and_test(atomic)){
> > +		migrate_disable();
>
> Makes sense, as the CPU can go offline right after the lock is grabbed
> and before the migrate_disable() is called.
>
> Seems that migrate_disable() must be called before taking the lock as
> it is done in every other location.

And for tasklist_lock, seems you also MUST do that prior to trylock as
well, else you'll run afoul of the hotplug beast.

-Mike
Re: [ANNOUNCE] 3.14-rt1
On Mon, 28 Apr 2014 11:09:46 +0200
Mike Galbraith wrote:

> migrate_disable-pushd-down-in-atomic_dec_and_spin_lo.patch
>
> bug: migrate_disable() after blocking is too late.
>
> @@ -1028,12 +1028,12 @@ int atomic_dec_and_spin_lock(atomic_t *a
> 	/* Subtract 1 from counter unless that drops it to 0 (ie. it was 1) */
> 	if (atomic_add_unless(atomic, -1, 1))
> 		return 0;
> -	migrate_disable();
> 	rt_spin_lock(lock);
> -	if (atomic_dec_and_test(atomic))
> +	if (atomic_dec_and_test(atomic)){
> +		migrate_disable();

Makes sense, as the CPU can go offline right after the lock is grabbed
and before the migrate_disable() is called.

Seems that migrate_disable() must be called before taking the lock as
it is done in every other location.

-- Steve

> 		return 1;
> +	}
> 	rt_spin_unlock(lock);
> -	migrate_enable();
> 	return 0;
> }
> EXPORT_SYMBOL(atomic_dec_and_spin_lock);
>
> read_lock-migrate_disable-pushdown-to-rt_read_lock.patch
>
> bug: ditto.
>
> @@ -244,8 +246,10 @@ void __lockfunc rt_read_lock(rwlock_t *r
> 	/*
> 	 * recursive read locks succeed when current owns the lock
> 	 */
> -	if (rt_mutex_owner(lock) != current)
> +	if (rt_mutex_owner(lock) != current) {
> 		__rt_spin_lock(lock);
> +		migrate_disable();
> +	}
> 	rwlock->read_depth++;
> }
>
> Moving that migrate_disable() up will likely fix my hotplug troubles.
> I'll find out when I get back from physical torture (therapy) session.
>
> -Mike
Re: [ANNOUNCE] 3.14-rt1
On Mon, 2014-04-28 at 07:09 +0200, Mike Galbraith wrote:
> Hi Nicholas,
>
> On Sat, 2014-04-26 at 15:58 +0200, Mike Galbraith wrote:
> > On Sat, 2014-04-26 at 10:38 +0200, Mike Galbraith wrote:
> > > On Fri, 2014-04-25 at 09:40 +0200, Mike Galbraith wrote:
> > >
> > > > Hotplug can still deadlock in rt trees too, and will if you beat it
> > > > hard.
> > >
> > > Box actually deadlocks like so.
> >
> > ...
> >
> > 3.12-rt looks a bit busted migrate_disable/enable() wise.
> >
> > /me eyeballs 3.10-rt (looks better), confirms 3.10-rt hotplug works,
> > swipes working code, confirms 3.12-rt now works. Yup, that was it.
>
> My boxen, including 64 core DL980 that ran hotplug stress for 3 hours
> yesterday with pre-pushdown rwlocks, say the migrate_disable/enable
> pushdown patches are very definitely busted.

migrate_disable-pushd-down-in-atomic_dec_and_spin_lo.patch

bug: migrate_disable() after blocking is too late.

@@ -1028,12 +1028,12 @@ int atomic_dec_and_spin_lock(atomic_t *a
 	/* Subtract 1 from counter unless that drops it to 0 (ie. it was 1) */
 	if (atomic_add_unless(atomic, -1, 1))
 		return 0;
-	migrate_disable();
 	rt_spin_lock(lock);
-	if (atomic_dec_and_test(atomic))
+	if (atomic_dec_and_test(atomic)){
+		migrate_disable();
 		return 1;
+	}
 	rt_spin_unlock(lock);
-	migrate_enable();
 	return 0;
 }
 EXPORT_SYMBOL(atomic_dec_and_spin_lock);

read_lock-migrate_disable-pushdown-to-rt_read_lock.patch

bug: ditto.

@@ -244,8 +246,10 @@ void __lockfunc rt_read_lock(rwlock_t *r
 	/*
 	 * recursive read locks succeed when current owns the lock
 	 */
-	if (rt_mutex_owner(lock) != current)
+	if (rt_mutex_owner(lock) != current) {
 		__rt_spin_lock(lock);
+		migrate_disable();
+	}
 	rwlock->read_depth++;
 }

Moving that migrate_disable() up will likely fix my hotplug troubles.
I'll find out when I get back from physical torture (therapy) session.

-Mike
Re: [ANNOUNCE] 3.14-rt1
Hi Nicholas,

On Sat, 2014-04-26 at 15:58 +0200, Mike Galbraith wrote:
> On Sat, 2014-04-26 at 10:38 +0200, Mike Galbraith wrote:
> > On Fri, 2014-04-25 at 09:40 +0200, Mike Galbraith wrote:
> >
> > > Hotplug can still deadlock in rt trees too, and will if you beat it
> > > hard.
> >
> > Box actually deadlocks like so.
>
> ...
>
> 3.12-rt looks a bit busted migrate_disable/enable() wise.
>
> /me eyeballs 3.10-rt (looks better), confirms 3.10-rt hotplug works,
> swipes working code, confirms 3.12-rt now works. Yup, that was it.

My boxen, including 64 core DL980 that ran hotplug stress for 3 hours
yesterday with pre-pushdown rwlocks, say the migrate_disable/enable
pushdown patches are very definitely busted.

Instead of whacking selective bits, as I did to verify that the rwlock
changes were indeed causing hotplug stress deadlock woes, I'm eyeballing
the lot, twiddling primitives to look like I think they should, after
which I'll let my boxen express their opinions of the result.

-Mike
Re: [ANNOUNCE] 3.14-rt1
On 04/11/2014 11:57 AM, Sebastian Andrzej Siewior wrote:

Dear RT folks!

I'm pleased to announce the v3.14-rt1 patch set.

Changes since v3.12.15-rt25
- I dropped the sparc64 patches I had in the queue. They did not apply
  cleanly, the code in v3.14 changed in the MMU area. Here is where I
  remembered that it was not working perfectly either.

Saw this a moment ago (3.14.1 + rt1, Fedora 19 laptop - I think I have
seen something similar in 3.12.x-rt):

Apr 26 11:16:11 localhost kernel: [ 96.323248] [ cut here ]
Apr 26 11:16:11 localhost kernel: [ 96.323262] WARNING: CPU: 0 PID: 2051 at lib/list_debug.c:59 __list_del_entry+0xa1/0xd0()
Apr 26 11:16:11 localhost kernel: [ 96.323264] list_del corruption. prev->next should be 8802101196a0, but was 0001
Apr 26 11:16:11 localhost kernel: [ 96.323266] Modules linked in: fuse ipt_MASQUERADE xt_CHECKSUM tun ip6t_rpfilter ip6t_REJECT xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw rfcomm ip6table_filter bnep ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw iTCO_wdt iTCO_vendor_support coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel uvcvideo videobuf2_vmalloc microcode videobuf2_memops snd_hda_codec_hdmi videobuf2_core videodev media serio_raw btusb bluetooth intel_ips i2c_i801 6lowpan_iphc snd_hda_codec_conexant snd_hda_codec_generic arc4 iwldvm mac80211 iwlwifi lpc_ich sdhci_pci mfd_core sdhci cfg80211 mmc_core snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm e1000e snd_timer ptp mei_me pps_core mei shpchp thinkpad_acpi snd ppdev soundcore rfkill parport_pc parport acpi_cpufreq uinput firewire_ohci nouveau firewire_core crc_itu_t i2c_algo_bit drm_kms_helper ttm drm mxm_wmi i2c_core wmi video
Apr 26 11:16:11 localhost kernel: [ 96.323331] CPU: 0 PID: 2051 Comm: cinnamon Not tainted 3.14.1-200.rt1.1.fc19.ccrma.x86_64+rt #1
Apr 26 11:16:11 localhost kernel: [ 96.323332] Hardware name: LENOVO 4313CTO/4313CTO, BIOS 6MET64WW (1.27 ) 07/15/2010
Apr 26 11:16:11 localhost kernel: [ 96.323334] 8a5c11dc 8800ae715a88 81707fca
Apr 26 11:16:11 localhost kernel: [ 96.323336] 8800ae715ad0 8800ae715ac0 8108d03d 8802101196a0
Apr 26 11:16:11 localhost kernel: [ 96.323337] 880210119b50 880210119b50 880210119b40 88021a615648
Apr 26 11:16:11 localhost kernel: [ 96.323338] Call Trace:
Apr 26 11:16:11 localhost kernel: [ 96.323345] [] dump_stack+0x4d/0x82
Apr 26 11:16:11 localhost kernel: [ 96.323351] [] warn_slowpath_common+0x7d/0xc0
Apr 26 11:16:11 localhost kernel: [ 96.323352] [] warn_slowpath_fmt+0x5c/0x80
Apr 26 11:16:11 localhost kernel: [ 96.323354] [] __list_del_entry+0xa1/0xd0
Apr 26 11:16:11 localhost kernel: [ 96.323355] [] list_del+0xd/0x30
Apr 26 11:16:11 localhost kernel: [ 96.323393] [] nouveau_fence_signal+0x53/0x80 [nouveau]
Apr 26 11:16:11 localhost kernel: [ 96.323414] [] nouveau_fence_update+0x48/0xa0 [nouveau]
Apr 26 11:16:11 localhost kernel: [ 96.323435] [] nouveau_fence_sync+0x45/0x80 [nouveau]
Apr 26 11:16:11 localhost kernel: [ 96.323456] [] validate_list+0xd8/0x2e0 [nouveau]
Apr 26 11:16:11 localhost kernel: [ 96.323478] [] nouveau_gem_ioctl_pushbuf+0xaa3/0x13e0 [nouveau]
Apr 26 11:16:11 localhost kernel: [ 96.323500] [] drm_ioctl+0x4f2/0x620 [drm]
Apr 26 11:16:11 localhost kernel: [ 96.323506] [] ? migrate_enable+0x94/0x1c0
Apr 26 11:16:11 localhost kernel: [ 96.323527] [] nouveau_drm_ioctl+0x4e/0x90 [nouveau]
Apr 26 11:16:11 localhost kernel: [ 96.323530] [] do_vfs_ioctl+0x2e0/0x4c0
Apr 26 11:16:11 localhost kernel: [ 96.323533] [] ? file_has_perm+0xa6/0xb0
Apr 26 11:16:11 localhost kernel: [ 96.323535] [] SyS_ioctl+0x81/0xa0
Apr 26 11:16:11 localhost kernel: [ 96.323538] [] system_call_fastpath+0x16/0x1b
Apr 26 11:16:11 localhost kernel: [ 96.323569] ---[ end trace 0002 ]---

-- Fernando
Re: [ANNOUNCE] 3.14-rt1
On Sat, 2014-04-26 at 10:38 +0200, Mike Galbraith wrote: > On Fri, 2014-04-25 at 09:40 +0200, Mike Galbraith wrote: > > > Hotplug can still deadlock in rt trees too, and will if you beat it > > hard. > > Box actually deadlocks like so. ... 3.12-rt looks a bit busted migrate_disable/enable() wise. /me eyeballs 3.10-rt (looks better), confirms 3.10-rt hotplug works, swipes working code, confirms 3.12-rt now works. Yup, that was it. When I fix lg_global_lock() (I think it and Medusa are both busted) I bet a nickle 3.14-rt will work. Hm, actually, rt_write_trylock() in swiped 3.10-rt code below (and some others) look busted to me. migrate_disable() _after_ grabbing a lock is too late, no? --- include/linux/rwlock_rt.h | 32 kernel/rt.c | 21 +++-- 2 files changed, 39 insertions(+), 14 deletions(-) --- a/include/linux/rwlock_rt.h +++ b/include/linux/rwlock_rt.h @@ -33,50 +33,72 @@ extern void __rt_rwlock_init(rwlock_t *r #define read_lock_irqsave(lock, flags) \ do {\ typecheck(unsigned long, flags);\ + migrate_disable(); \ flags = rt_read_lock_irqsave(lock); \ } while (0) #define write_lock_irqsave(lock, flags)\ do {\ typecheck(unsigned long, flags);\ + migrate_disable(); \ flags = rt_write_lock_irqsave(lock);\ } while (0) -#define read_lock(lock)rt_read_lock(lock) +#define read_lock(lock)\ + do {\ + migrate_disable(); \ + rt_read_lock(lock); \ + } while (0) #define read_lock_bh(lock) \ do {\ local_bh_disable(); \ + migrate_disable(); \ rt_read_lock(lock); \ } while (0) #define read_lock_irq(lock)read_lock(lock) -#define write_lock(lock) rt_write_lock(lock) +#define write_lock(lock) \ + do {\ + migrate_disable(); \ + rt_write_lock(lock);\ + } while (0) #define write_lock_bh(lock)\ do {\ local_bh_disable(); \ + migrate_disable(); \ rt_write_lock(lock);\ } while (0) #define write_lock_irq(lock) write_lock(lock) -#define read_unlock(lock) rt_read_unlock(lock) +#define read_unlock(lock) \ + do {\ + rt_read_unlock(lock); \ + migrate_enable(); \ + } while (0) #define 
read_unlock_bh(lock) \ do {\ rt_read_unlock(lock); \ + migrate_enable(); \ local_bh_enable(); \ } while (0) #define read_unlock_irq(lock) read_unlock(lock) -#define write_unlock(lock) rt_write_unlock(lock) +#define write_unlock(lock) \ + do {\ + rt_write_unlock(lock); \ + migrate_enable(); \ + } while (0) #define write_unlock_bh(lock) \ do {\ rt_write_unlock(lock); \ + migrate_enable(); \ local_bh_enable(); \ } while (0) @@ -87,6 +109,7 @@ extern void __rt_rwlock_init(rwlock_t *r typecheck(unsigned long, flags);\ (void) flags; \ rt_read_unlock(lock); \ + migrate_enable(); \ } while (0) #define write_unlock_irqrestore(lock, flags) \ @@ -94,6 +117,7 @@ extern void __rt_rwlock_init(rwlock_t *r typecheck(unsigned long, flags);\ (void) flags; \ rt_write_unlock(lock); \ + migrate_enable(); \ } while (0) #endif --- a/kernel/rt.c +++ b/kernel
Re: [ANNOUNCE] 3.14-rt1
On Fri, 2014-04-25 at 09:40 +0200, Mike Galbraith wrote:
> Hotplug can still deadlock in rt trees too, and will if you beat it
> hard.

Box actually deadlocks like so.

CPU3 boot.kdump
  sys_wait4
  do_wait
  read_lock(&tasklist_lock)
  rt_read_lock
  __rt_spin_lock(lock)
  migrate_disable()
  pin_current_cpu()
    if (hp->grab_lock) {
      preempt_enable();   <== hmm
      hotplug_lock(hp);

hp = &__get_cpu_var(hotplug_pcp);   <== hmm

struct hotplug_pcp {
  unplug = 0x8800b7d0e540,
  sync_tsk = 0x0,
  refcount = 0,   <== hmm
  grab_lock = 1,
  ...
  lock = {
    ...
    owner = 0x8802039f0001,   stress-cpu-hotplug_stress.sh?!? <=== he's way over yonder.

Yo, dude, would you please NOT take percpu locks with you?

CPU0 stress-cpu-hotplug_stress.sh
  sysfs_write_file
  dev_attr_store
  online_store
  device_offline
  cpu_subsys_offline
  cpu_down
  _cpu_down
  cpu_hotplug_begin
    mutex_lock(&cpu_hotplug.lock);
  ...
  check_for_tasks
    write_lock_irq(&tasklist_lock);   held by CPU3 boot.kdump over there ===>

CPU0 kworker/0:0
  cpuset_hotplug_workfn+0x23e/0x380
  rebuild_sched_domains+0x15/0x30
  rebuild_sched_domains_locked+0x17/0x80
  get_online_cpus+0x35/0x50
  mutex_lock(&cpu_hotplug.lock);   held by stress-cpu-hotplug_stress.sh

twiddle twiddle twiddle...

INFO: task kworker/0:0:4 blocked for more than 120 seconds.
Re: [ANNOUNCE] 3.14-rt1
On Sat, 2014-04-19 at 16:46 +0200, Mike Galbraith wrote: > Hi Sebastian, > > On Fri, 2014-04-11 at 20:57 +0200, Sebastian Andrzej Siewior wrote: > > Dear RT folks! > > > > I'm pleased to announce the v3.14-rt1 patch set. > > This hunk in hotplug-light-get-online-cpus.patch looks like a bug. > > @@ -333,7 +449,7 @@ static int __ref _cpu_down(unsigned int > /* CPU didn't die: tell everyone. Can't complain. */ > smpboot_unpark_threads(cpu); > cpu_notify_nofail(CPU_DOWN_FAILED | mod, hcpu); > - goto out_release; > + goto out_cancel; > } > BUG_ON(cpu_online(cpu)); ... BTW, the reason I was eyeballing this stuff is because I was highly interested in what you were going to do here... # XXX stomp-machine-deal-clever-with-stopper-lock.patch ...with that bloody lglock. What I did is attached for your amusement. (warning: viewing may induce "Medussa" syndrome:) Hotplug can still deadlock in rt trees too, and will if you beat it hard. The splat below is virgin 3.12-rt (where wonderful lock doesn't yet exist) while running Stevens stress-cpu-hotplug.sh, which is still plenty deadly when liberally applied. [ 161.951908] CPU0 attaching NULL sched-domain. [ 161.970417] CPU2 attaching NULL sched-domain. [ 161.976594] CPU3 attaching NULL sched-domain. 
[ 161.981044] CPU0 attaching sched-domain: [ 161.985010] domain 0: span 0,3 level CPU [ 161.990627] groups: 0 (cpu_power = 997) 3 (cpu_power = 1021) [ 162.000609] CPU3 attaching sched-domain: [ 162.007723] domain 0: span 0,3 level CPU [ 162.012756] groups: 3 (cpu_power = 1021) 0 (cpu_power = 997) [ 162.025533] smpboot: CPU 2 is now offline [ 162.036113] [ 162.036114] == [ 162.036115] [ INFO: possible circular locking dependency detected ] [ 162.036116] 3.12.17-rt25 #14 Not tainted [ 162.036117] --- [ 162.036118] boot.kdump/6853 is trying to acquire lock: [ 162.036126] (&hp->lock){+.+...}, at: [] pin_current_cpu+0x84/0x1d0 [ 162.036126] [ 162.036126] but task is already holding lock: [ 162.036131] (&mm->mmap_sem){+.}, at: [] __do_page_fault+0x14c/0x5d0 [ 162.036132] [ 162.036132] which lock already depends on the new lock. [ 162.036132] [ 162.036133] [ 162.036133] the existing dependency chain (in reverse order) is: [ 162.036135] [ 162.036135] -> #2 (&mm->mmap_sem){+.}: [ 162.036138][] check_prevs_add+0xf8/0x180 [ 162.036140][] validate_chain.isra.45+0x5aa/0x750 [ 162.036142][] __lock_acquire+0x3f6/0x9f0 [ 162.036143][] lock_acquire+0x8c/0x160 [ 162.036146][] might_fault+0x83/0xb0 [ 162.036149][] sel_loadlut+0x11/0x70 [ 162.036152][] tioclinux+0x23d/0x2c0 [ 162.036153][] vt_ioctl+0x86c/0x11f0 [ 162.036155][] tty_ioctl+0x2a8/0x940 [ 162.036158][] do_vfs_ioctl+0x81/0x340 [ 162.036159][] SyS_ioctl+0x4b/0x90 [ 162.036162][] system_call_fastpath+0x16/0x1b [ 162.036164] [ 162.036164] -> #1 (console_lock){+.+.+.}: [ 162.036165][] check_prevs_add+0xf8/0x180 [ 162.036167][] validate_chain.isra.45+0x5aa/0x750 [ 162.036169][] __lock_acquire+0x3f6/0x9f0 [ 162.036171][] lock_acquire+0x8c/0x160 [ 162.036173][] console_lock+0x6f/0x80 [ 162.036174][] console_cpu_notify+0x1d/0x30 [ 162.036176][] notifier_call_chain+0x4d/0x70 [ 162.036179][] __raw_notifier_call_chain+0x9/0x10 [ 162.036181][] __cpu_notify+0x1b/0x30 [ 162.036182][] cpu_notify_nofail+0x10/0x20 [ 162.036185][] 
_cpu_down+0x20d/0x440 [ 162.036186][] cpu_down+0x30/0x50 [ 162.036188][] cpu_subsys_offline+0x1c/0x30 [ 162.036191][] device_offline+0x95/0xc0 [ 162.036192][] online_store+0x40/0x80 [ 162.036194][] dev_attr_store+0x13/0x30 [ 162.036197][] sysfs_write_file+0xf0/0x170 [ 162.036200][] vfs_write+0xc8/0x1d0 [ 162.036202][] SyS_write+0x50/0xa0 [ 162.036203][] system_call_fastpath+0x16/0x1b [ 162.036205] [ 162.036205] -> #0 (&hp->lock){+.+...}: [ 162.036207][] check_prev_add+0x7bd/0x7d0 [ 162.036209][] check_prevs_add+0xf8/0x180 [ 162.036210][] validate_chain.isra.45+0x5aa/0x750 [ 162.036212][] __lock_acquire+0x3f6/0x9f0 [ 162.036214][] lock_acquire+0x8c/0x160 [ 162.036216][] rt_spin_lock+0x55/0x70 [ 162.036218][] pin_current_cpu+0x84/0x1d0 [ 162.036220][] migrate_disable+0x81/0x100 [ 162.036222][] handle_pte_fault+0xf8/0x1c0 [ 162.036223][] __handle_mm_fault+0x106/0x1b0 [ 162.036225][] handle_mm_fault+0x22/0x30 [ 162.036227][] __do_page_fault+0x1b1/0x5d0 [ 162.036229][] do_page_fault+0x9/0x10 [ 162.036230][] page_fault+0x22/0x30 [ 162.036232][] ret_from_f
Re: [ANNOUNCE] 3.14-rt1
On Thu, 2014-04-24 at 09:12 +0200, Sebastian Andrzej Siewior wrote:
> On 04/24/2014 06:06 AM, Mike Galbraith wrote:
> > Turning lockdep on, it says it's busted.
>
> http://www.spinics.net/lists/linux-rt-users/msg11179.html

I was heading toward the same conclusion while regression testing.
Guess I can stop that.

-Mike
Re: [ANNOUNCE] 3.14-rt1
On 04/24/2014 06:06 AM, Mike Galbraith wrote:
> Turning lockdep on, it says it's busted.

http://www.spinics.net/lists/linux-rt-users/msg11179.html

Sebastian
Re: [ANNOUNCE] 3.14-rt1
Turning lockdep on, it says it's busted. (I'll go stare at it, maybe the beast will blink first for a change) [0.00] Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar [0.00] ... MAX_LOCKDEP_SUBCLASSES: 8 [0.00] ... MAX_LOCK_DEPTH: 48 [0.00] ... MAX_LOCKDEP_KEYS:8191 [0.00] ... CLASSHASH_SIZE: 4096 [0.00] ... MAX_LOCKDEP_ENTRIES: 16384 [0.00] ... MAX_LOCKDEP_CHAINS: 32768 [0.00] ... CHAINHASH_SIZE: 16384 [0.00] memory used by lock dependency info: 6367 kB [0.00] per task-struct memory footprint: 2688 bytes [0.00] [0.00] | Locking API testsuite: [0.00] [0.00] | spin |wlock |rlock |mutex | wsem | rsem | [0.00] -- [0.00] A-A deadlock: ok | ok |FAILED| [0.00] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.14.1-rt1 #16 [0.00] Hardware name: MEDIONPC MS-7502/MS-7502, BIOS 6.00 PG 12/26/2007 [0.00] 0002 81a01f28 815e12a5 810b7727 [0.00] 0001 81a01f58 815e1db2 [0.00] 81a01f68 [0.00] Call Trace: [0.00] [] dump_stack+0x4f/0x7c [0.00] [] ? console_trylock_for_printk+0x37/0xf0 [0.00] [] dotest+0x5f/0xc7 [0.00] [] locking_selftest+0xdf/0xb30 [0.00] [] start_kernel+0x215/0x327 [0.00] [] ? repair_env_string+0x5a/0x5a [0.00] [] ? memblock_reserve+0x49/0x4e [0.00] [] x86_64_start_reservations+0x2a/0x2c [0.00] [] x86_64_start_kernel+0xf0/0xf7 [0.00] ok | ok | ok | [0.00] A-B-B-A deadlock: ok | ok |FAILED| [0.00] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.14.1-rt1 #16 [0.00] Hardware name: MEDIONPC MS-7502/MS-7502, BIOS 6.00 PG 12/26/2007 [0.00] 0002 81a01f28 815e12a5 810b7727 [0.00] 0001 81a01f58 815e1db2 [0.00] 81a01f68 [0.00] Call Trace: [0.00] [] dump_stack+0x4f/0x7c [0.00] [] ? console_trylock_for_printk+0x37/0xf0 [0.00] [] dotest+0x5f/0xc7 [0.00] [] locking_selftest+0x16e/0xb30 [0.00] [] start_kernel+0x215/0x327 [0.00] [] ? repair_env_string+0x5a/0x5a [0.00] [] ? 
memblock_reserve+0x49/0x4e [0.00] [] x86_64_start_reservations+0x2a/0x2c [0.00] [] x86_64_start_kernel+0xf0/0xf7 [0.00] ok | ok | ok | [0.00] A-B-B-C-C-A deadlock: ok | ok |FAILED| [0.00] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.14.1-rt1 #16 [0.00] Hardware name: MEDIONPC MS-7502/MS-7502, BIOS 6.00 PG 12/26/2007 [0.00] 0002 81a01f28 815e12a5 810b7727 [0.00] 0001 81a01f58 815e1db2 [0.00] 81a01f68 [0.00] Call Trace: [0.00] [] dump_stack+0x4f/0x7c [0.00] [] ? console_trylock_for_printk+0x37/0xf0 [0.00] [] dotest+0x5f/0xc7 [0.00] [] locking_selftest+0x1fd/0xb30 [0.00] [] start_kernel+0x215/0x327 [0.00] [] ? repair_env_string+0x5a/0x5a [0.00] [] ? memblock_reserve+0x49/0x4e [0.00] [] x86_64_start_reservations+0x2a/0x2c [0.00] [] x86_64_start_kernel+0xf0/0xf7 [0.00] ok | ok | ok | [0.00] A-B-C-A-B-C deadlock: ok | ok |FAILED| [0.00] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.14.1-rt1 #16 [0.00] Hardware name: MEDIONPC MS-7502/MS-7502, BIOS 6.00 PG 12/26/2007 [0.00] 0002 81a01f28 815e12a5 810b7727 [0.00] 0001 81a01f58 815e1db2 [0.00] 81a01f68 [0.00] Call Trace: [0.00] [] dump_stack+0x4f/0x7c [0.00] [] ? console_trylock_for_printk+0x37/0xf0 [0.00] [] dotest+0x5f/0xc7 [0.00] [] locking_selftest+0x28c/0xb30 [0.00] [] start_kernel+0x215/0x327 [0.00] [] ? repair_env_string+0x5a/0x5a [0.00] [] ? memblock_reserve+0x49/0x4e [0.00] [] x86_64_start_reservations+0x2a/0x2c [0.00] [] x86_64_start_kernel+0xf0/0xf7 [0.00] ok | ok | ok | [0.00] A-B-B-C-C-D-D-A deadlock: ok | ok |FAILED| [0.00] CPU: 0 PID: 0 C
Re: [ANNOUNCE] 3.14-rt1
On Wed, 23 Apr 2014 12:37:05 +0200 Mike Galbraith wrote:
> On Fri, 2014-04-11 at 20:57 +0200, Sebastian Andrzej Siewior wrote:
>
> > This -RT series didn't crashed within ~4h testing on my ARM and
> > x86-32.
> > x86-64 crashed after I started hackbench. I figured out that the crash
> > does not happen with lazy-preempt disabled. Therefore the last but one
> > patch in the queue disables lazy preempt on x86-64. With this change the
> > test box survived ~2h without a crash. I look at this later but it looks
> > good now.
>
> I think the below fixes it (in a more or less minimalist way), but it's
> not very pretty. Methinks it would be prettier to either clone the x86
> percpu + fold logic, or neutralize that optimization completely when
> PREEMPT_LAZY is enabled.
>
> x86_32 bit is completely untested, x86_64 hasn't exploded.. yet :)
>

This patch makes sense to me.

Acked-by: Steven Rostedt

-- Steve
Re: [ANNOUNCE] 3.14-rt1
On Fri, 2014-04-11 at 20:57 +0200, Sebastian Andrzej Siewior wrote: > This -RT series didn't crashed within ~4h testing on my ARM and > x86-32. > x86-64 crashed after I started hackbench. I figured out that the crash > does not happen with lazy-preempt disabled. Therefore the last but one > patch in the queue disables lazy preempt on x86-64. With this change the > test box survived ~2h without a crash. I look at this later but it looks > good now. I think the below fixes it (in a more or less minimalist way), but it's not very pretty. Methinks it would be prettier to either clone the x86 percpu + fold logic, or neutralize that optimization completely when PREEMPT_LAZY is enabled. x86_32 bit is completely untested, x86_64 hasn't exploded.. yet :) --- include/linux/preempt.h|3 +-- arch/x86/include/asm/preempt.h |8 arch/x86/kernel/asm-offsets.c |1 + arch/x86/kernel/entry_32.S |9 ++--- arch/x86/kernel/entry_64.S |7 +-- 5 files changed, 21 insertions(+), 7 deletions(-) --- a/include/linux/preempt.h +++ b/include/linux/preempt.h @@ -126,8 +126,7 @@ do { \ #define preempt_enable_notrace() \ do { \ barrier(); \ - if (unlikely(__preempt_count_dec_and_test() || \ - test_thread_flag(TIF_NEED_RESCHED_LAZY))) \ + if (unlikely(__preempt_count_dec_and_test())) \ __preempt_schedule_context(); \ } while (0) #else --- a/arch/x86/include/asm/preempt.h +++ b/arch/x86/include/asm/preempt.h @@ -94,7 +94,11 @@ static __always_inline bool __preempt_co { if (preempt_count_dec_and_test()) return true; +#ifdef CONFIG_PREEMPT_LAZY return test_thread_flag(TIF_NEED_RESCHED_LAZY); +#else + return false; +#endif } /* @@ -102,8 +106,12 @@ static __always_inline bool __preempt_co */ static __always_inline bool should_resched(void) { +#ifdef CONFIG_PREEMPT_LAZY return unlikely(!__this_cpu_read_4(__preempt_count) || \ test_thread_flag(TIF_NEED_RESCHED_LAZY)); +#else + return unlikely(!__this_cpu_read_4(__preempt_count)); +#endif } #ifdef CONFIG_PREEMPT --- a/arch/x86/kernel/asm-offsets.c +++ 
b/arch/x86/kernel/asm-offsets.c @@ -72,4 +72,5 @@ void common(void) { BLANK(); DEFINE(PTREGS_SIZE, sizeof(struct pt_regs)); + DEFINE(_PREEMPT_ENABLED, PREEMPT_ENABLED); } --- a/arch/x86/kernel/entry_32.S +++ b/arch/x86/kernel/entry_32.S @@ -365,19 +365,22 @@ ENTRY(resume_kernel) need_resched: # preempt count == 0 + NEED_RS set? cmpl $0,PER_CPU_VAR(__preempt_count) +#ifndef CONFIG_PREEMPT_LAZY + jnz restore_all +#else jz test_int_off # atleast preempt count == 0 ? - cmpl $_TIF_NEED_RESCHED,PER_CPU_VAR(__preempt_count) + cmpl $_PREEMPT_ENABLED,PER_CPU_VAR(__preempt_count) jne restore_all cmpl $0,TI_preempt_lazy_count(%ebp) # non-zero preempt_lazy_count ? jnz restore_all - testl $_TIF_NEED_RESCHED_LAZY, %ecx + testl $_TIF_NEED_RESCHED_LAZY, TI_flags(%ebp) jz restore_all - test_int_off: +#endif testl $X86_EFLAGS_IF,PT_EFLAGS(%esp)# interrupts off (exception path) ? jz restore_all call preempt_schedule_irq --- a/arch/x86/kernel/entry_64.S +++ b/arch/x86/kernel/entry_64.S @@ -1104,10 +1104,13 @@ ENTRY(native_iret) /* rcx: threadinfo. interrupts off. */ ENTRY(retint_kernel) cmpl $0,PER_CPU_VAR(__preempt_count) +#ifndef CONFIG_PREEMPT_LAZY + jnz retint_restore_args +#else jz check_int_off # atleast preempt count == 0 ? - cmpl $_TIF_NEED_RESCHED,PER_CPU_VAR(__preempt_count) + cmpl $_PREEMPT_ENABLED,PER_CPU_VAR(__preempt_count) jnz retint_restore_args cmpl $0, TI_preempt_lazy_count(%rcx) @@ -1115,8 +1118,8 @@ ENTRY(retint_kernel) bt $TIF_NEED_RESCHED_LAZY,TI_flags(%rcx) jnc retint_restore_args - check_int_off: +#endif bt $9,EFLAGS-ARGOFFSET(%rsp) /* interrupts off? */ jnc retint_restore_args call preempt_schedule_irq -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [ANNOUNCE] 3.14-rt1
On Sat, 2014-04-19 at 16:46 +0200, Mike Galbraith wrote:
> Hi Sebastian,
>
> On Fri, 2014-04-11 at 20:57 +0200, Sebastian Andrzej Siewior wrote:
> > Dear RT folks!
> >
> > I'm pleased to announce the v3.14-rt1 patch set.
>
> This hunk in hotplug-light-get-online-cpus.patch looks like a bug.
>
> @@ -333,7 +449,7 @@ static int __ref _cpu_down(unsigned int
> 		/* CPU didn't die: tell everyone.  Can't complain. */
> 		smpboot_unpark_threads(cpu);
> 		cpu_notify_nofail(CPU_DOWN_FAILED | mod, hcpu);
> -		goto out_release;
> +		goto out_cancel;
> 	}
> 	BUG_ON(cpu_online(cpu));

Another little bug.  This hunk of patches/stomp-machine-raw-lock.patch
should be while (atomic_read(&done.nr_todo))

@@ -647,7 +671,7 @@ int stop_machine_from_inactive_cpu(int (
 	ret = multi_cpu_stop(&msdata);

 	/* Busy wait for completion. */
-	while (!completion_done(&done.completion))
+	while (!atomic_read(&done.nr_todo))
 		cpu_relax();

 	mutex_unlock(&stop_cpus_mutex);
Re: [ANNOUNCE] 3.14-rt1
Hi Sebastian,

On Fri, 2014-04-11 at 20:57 +0200, Sebastian Andrzej Siewior wrote:
> Dear RT folks!
>
> I'm pleased to announce the v3.14-rt1 patch set.

This hunk in hotplug-light-get-online-cpus.patch looks like a bug.

@@ -333,7 +449,7 @@ static int __ref _cpu_down(unsigned int
 		/* CPU didn't die: tell everyone.  Can't complain. */
 		smpboot_unpark_threads(cpu);
 		cpu_notify_nofail(CPU_DOWN_FAILED | mod, hcpu);
-		goto out_release;
+		goto out_cancel;
 	}
 	BUG_ON(cpu_online(cpu));

> x86-64 crashed after I started hackbench. I figured out that the crash
> does not happen with lazy-preempt disabled. Therefore the last but one
> patch in the queue disables lazy preempt on x86-64. With this change the
> test box survived ~2h without a crash. I look at this later but it looks
> good now.

Ah, I had trouble there a while back too.  I'll try to scrape up cycles
for a round 2, see who begs for mercy this time, it or me again.

	-Mike
Re: [ANNOUNCE] 3.14-rt1
On 11.04.2014 22:57, Sebastian Andrzej Siewior wrote:
> Dear RT folks!
>
> I'm pleased to announce the v3.14-rt1 patch set.

Hurray!

--
Pavel.