Re: [ANNOUNCE] 3.14-rt1
On 05/02/2014 04:37 AM, Sebastian Andrzej Siewior wrote:
> * Fernando Lopez-Lezcano | 2014-04-26 11:29:04 [-0700]:
> > Saw this a moment ago (3.14.1 + rt1, Fedora 19 laptop - I think I
> > have seen something similar in 3.12.x-rt):
>
> Yes, you did: https://lkml.org/lkml/2014/3/7/163
> You did not test the patch I've sent. Care to do so?

I did patch my kernel and (I think) I did not see the problem again. I
did get some very occasional hangs that seemed to be video related, but
I could not see what had caused them.

> > Apr 26 11:16:11 localhost kernel: [ 96.323248] [ cut here ]
> > Apr 26 11:16:11 localhost kernel: [ 96.323262] WARNING: CPU: 0 PID: 2051 at lib/list_debug.c:59 __list_del_entry+0xa1/0xd0()
> > Apr 26 11:16:11 localhost kernel: [ 96.323264] list_del corruption. prev->next should be 8802101196a0, but was 0001
> > Apr 26 11:16:11 localhost kernel: [ 96.323266] Modules linked in:
>
> and please send backtrace information properly formatted. This is
> terribly hard to read.

Sorry about that, I will attach files in the future. I re-patched
3.14.3-rt5 with a slightly tweaked version of your patch. Will see what
happens and report back.

-- Fernando
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [ANNOUNCE] 3.14-rt1
* Fernando Lopez-Lezcano | 2014-04-26 11:29:04 [-0700]:
> Saw this a moment ago (3.14.1 + rt1, Fedora 19 laptop - I think I
> have seen something similar in 3.12.x-rt):

Yes, you did: https://lkml.org/lkml/2014/3/7/163
You did not test the patch I've sent. Care to do so?

> Apr 26 11:16:11 localhost kernel: [ 96.323248] [ cut here ]
> Apr 26 11:16:11 localhost kernel: [ 96.323262] WARNING: CPU: 0 PID: 2051 at lib/list_debug.c:59 __list_del_entry+0xa1/0xd0()
> Apr 26 11:16:11 localhost kernel: [ 96.323264] list_del corruption. prev->next should be 8802101196a0, but was 0001
> Apr 26 11:16:11 localhost kernel: [ 96.323266] Modules linked in:

and please send backtrace information properly formatted. This is
terribly hard to read.

Sebastian
Re: [ANNOUNCE] 3.14-rt1
On Fri, 2014-05-02 at 12:09 +0200, Sebastian Andrzej Siewior wrote:
> * Mike Galbraith | 2014-04-19 16:46:06 [+0200]:
>
> > Hi Sebastian,
>
> Hi Mike,
>
> > This hunk in hotplug-light-get-online-cpus.patch looks like a bug.
> >
> > @@ -333,7 +449,7 @@ static int __ref _cpu_down(unsigned int
> > 		/* CPU didn't die: tell everyone. Can't complain. */
> > 		smpboot_unpark_threads(cpu);
> > 		cpu_notify_nofail(CPU_DOWN_FAILED | mod, hcpu);
> > -		goto out_release;
> > +		goto out_cancel;
> > 	}
> > 	BUG_ON(cpu_online(cpu));
>
> Yes, it looks like it. v3.12-rt did not have this…

I just sent a set of dinky patches with which patch to fold them into,
along with my way of dealing with the new stopper_lock.

-Mike
Re: [ANNOUNCE] 3.14-rt1
* Mike Galbraith | 2014-04-21 05:31:18 [+0200]:
> Another little bug. This hunk of patches/stomp-machine-raw-lock.patch
> should be while (atomic_read(&done.nr_todo))

Thanks, fixed up.

Sebastian
Re: [ANNOUNCE] 3.14-rt1
* Mike Galbraith | 2014-04-19 16:46:06 [+0200]:
> Hi Sebastian,

Hi Mike,

> This hunk in hotplug-light-get-online-cpus.patch looks like a bug.
>
> @@ -333,7 +449,7 @@ static int __ref _cpu_down(unsigned int
> 		/* CPU didn't die: tell everyone. Can't complain. */
> 		smpboot_unpark_threads(cpu);
> 		cpu_notify_nofail(CPU_DOWN_FAILED | mod, hcpu);
> -		goto out_release;
> +		goto out_cancel;
> 	}
> 	BUG_ON(cpu_online(cpu));

Yes, it looks like it. v3.12-rt did not have this…

Sebastian
Re: [ANNOUNCE] 3.14-rt1
On Thu, 2014-05-01 at 14:42 -0400, Steven Rostedt wrote:
> On Thu, 01 May 2014 19:36:18 +0200
> Mike Galbraith wrote:
>
> > Hah! I knew you were just hiding, you sneaky little SOB ;-)
>
> What's this from? A new bug that had all the patches applied? Or was
> this without one of the patches?

It's with all patches applied. It's not new; it has muddied the water
during other hunting expeditions. You may never see it in a box with a
sane topology, that box is kinda special.

-Mike
Re: [ANNOUNCE] 3.14-rt1
On Thu, 01 May 2014 19:36:18 +0200
Mike Galbraith wrote:

> On Wed, 2014-04-30 at 11:48 -0400, Steven Rostedt wrote:
> > On Wed, 30 Apr 2014 17:15:57 +0200
> > Mike Galbraith wrote:
> >
> > > On Wed, 2014-04-30 at 11:11 -0400, Steven Rostedt wrote:
> > > > > Another little bug. This hunk of patches/stomp-machine-raw-lock.patch
> > > > > should be while (atomic_read(&done.nr_todo))
> > > > >
> > > > > @@ -647,7 +671,7 @@ int stop_machine_from_inactive_cpu(int (
> > > > > 	ret = multi_cpu_stop(&msdata);
> > > > >
> > > > > 	/* Busy wait for completion. */
> > > > > -	while (!completion_done(&done.completion))
> > > > > +	while (!atomic_read(&done.nr_todo))
> > >                  ^--- that ! needs to go away
> > > >
> > > > I don't see this in the code. That is, there is no "completion_done()"
> > > > in stop_machine_from_inactive_cpu(). It is already an atomic_read().
> > >
> > > Yes, but it should read "while (atomic_read(&done.nr_todo))"
> >
> > Ah, this would have been better if you had sent a patch. I misread what
> > you talked about.
> >
> > Yes, this was the culprit of my failures. After removing the '!', it
> > worked.
>
> Hah! I knew you were just hiding, you sneaky little SOB ;-)

What's this from? A new bug that had all the patches applied? Or was
this without one of the patches?

-- Steve

> [50661.070049] smpboot: Booting Node 0 Processor 15 APIC 0x36
> [50661.142381] kvm: enabling virtualization on CPU15
> [50661.142397] BUG: unable to handle kernel NULL pointer dereference at (null)
> [50661.142417] IP: [] wake_up_process+0x1/0x40
> [50661.142420] PGD 0
> [50661.142422] Oops: [#1] PREEMPT SMP
> [50661.142470] Modules linked in: nfsd(F) lockd(F) nfs_acl(F) auth_rpcgss(F) sunrpc(F) autofs4(F) binfmt_misc(F) edd(F) af_packet(F) bridge(F) stp(F) llc(F) cpufreq_conservative(F) cpufreq_ondemand(F) cpufreq_userspace(F) cpufreq_powersave(F) pcc_cpufreq(F) fuse(F) loop(F) md_mod(F) dm_mod(F) iTCO_wdt(F) iTCO_vendor_support(F) gpio_ich(F) vhost_net(F) macvtap(F) macvlan(F) vhost(F) tun(F) i7core_edac(F) netxen_nic(F) kvm_intel(F) joydev(F) shpchp(F) edac_core(F) hid_generic(F) kvm(F) ipmi_si(F) sr_mod(F) ipmi_msghandler(F) bnx2(F) cdrom(F) sg(F) hpilo(F) hpwdt(F) ehci_pci(F) lpc_ich(F) mfd_core(F) acpi_power_meter(F) pcspkr(F) button(F) ext4(F) jbd2(F) mbcache(F) crc16(F) usbhid(F) uhci_hcd(F) ehci_hcd(F) usbcore(F) sd_mod(F) usb_common(F) thermal(F) processor(F) scsi_dh_rdac(F) scsi_dh_alua(F) scsi_dh_emc(F)
> [50661.142475] scsi_dh_hp_sw(F) scsi_dh(F) ata_generic(F) ata_piix(F) libata(F) cciss(F) hpsa(F) scsi_mod(F)
> [50661.142479] CPU: 39 PID: 283 Comm: migration/39 Tainted: GF 3.14.2-rt1 #667
> [50661.142481] Hardware name: Hewlett-Packard ProLiant DL980 G7, BIOS P66 07/07/2010
> [50661.142482] task: 880274515bb0 ti: 88027454e000 task.ti: 88027454e000
> [50661.142486] RIP: 0010:[] [] wake_up_process+0x1/0x40
> [50661.142487] RSP: 0018:88027454fda8 EFLAGS: 00010002
> [50661.142488] RAX: 8001 RBX: 880275581eb8 RCX:
> [50661.142488] RDX: 81aacec0 RSI: 0100 RDI:
> [50661.142489] RBP: 8802772ee9b0 R08: R09: 81aacec0
> [50661.142490] R10: R11: 8103d640 R12: 810f26c0
> [50661.142490] R13: 880275581e88 R14: 8802772ee9b8 R15: 88027454e010
> [50661.142492] FS: () GS:8802772e() knlGS:
> [50661.142493] CS: 0010 DS: ES: CR0: 8005003b
> [50661.142494] CR2: CR3: 01a0f000 CR4: 07e0
> [50661.142494] Stack:
> [50661.142505] 880275581eb8 810f2555 880274515bb0 0005
> [50661.142508] 0001 0001 0140 0001
> [50661.142512] 880274515bb0 88027454e000 8802772f4020 0005
> [50661.142512] Call Trace:
> [50661.142526] [] ? cpu_stopper_thread+0x125/0x1a0
> [50661.142530] [] ? smpboot_thread_fn+0x23d/0x320
> [50661.142533] [] ? smpboot_create_threads+0x70/0x70
> [50661.142535] [] ? smpboot_create_threads+0x70/0x70
> [50661.142543] [] ? kthread+0xd2/0xe0
> [50661.142545] [] ? kthreadd+0x330/0x330
> [50661.142553] [] ? ret_from_fork+0x7c/0xb0
> [50661.142555] [] ? kthreadd+0x330/0x330
> [50661.142568] Code: fd ff ff 0f 1f 80 00 00 00 00 31 d2 e9 09 fd ff ff 66 0f 1f 84 00 00 00 00 00 ba 08 00 00 00 be 0f 00 00 00 e9 f1 fc ff ff 90 53 <48> 8b 07 48 89 fb a8 0c 75 08 48 8b 47 08 a8 0c 74 11 be ba 06
> [50661.142570] RIP [] wake_up_process+0x1/0x40
> [50661.142570] RSP
> [50661.142571] CR2:
Re: [ANNOUNCE] 3.14-rt1
On Wed, 2014-04-30 at 11:48 -0400, Steven Rostedt wrote:
> On Wed, 30 Apr 2014 17:15:57 +0200
> Mike Galbraith wrote:
>
> > On Wed, 2014-04-30 at 11:11 -0400, Steven Rostedt wrote:
> > > > Another little bug. This hunk of patches/stomp-machine-raw-lock.patch
> > > > should be while (atomic_read(&done.nr_todo))
> > > >
> > > > @@ -647,7 +671,7 @@ int stop_machine_from_inactive_cpu(int (
> > > > 	ret = multi_cpu_stop(&msdata);
> > > >
> > > > 	/* Busy wait for completion. */
> > > > -	while (!completion_done(&done.completion))
> > > > +	while (!atomic_read(&done.nr_todo))
> >                  ^--- that ! needs to go away
> > >
> > > I don't see this in the code. That is, there is no "completion_done()"
> > > in stop_machine_from_inactive_cpu(). It is already an atomic_read().
> >
> > Yes, but it should read "while (atomic_read(&done.nr_todo))"
>
> Ah, this would have been better if you had sent a patch. I misread what
> you talked about.
>
> Yes, this was the culprit of my failures. After removing the '!', it
> worked.

Hah! I knew you were just hiding, you sneaky little SOB ;-)

[50661.070049] smpboot: Booting Node 0 Processor 15 APIC 0x36
[50661.142381] kvm: enabling virtualization on CPU15
[50661.142397] BUG: unable to handle kernel NULL pointer dereference at (null)
[50661.142417] IP: [] wake_up_process+0x1/0x40
[50661.142420] PGD 0
[50661.142422] Oops: [#1] PREEMPT SMP
[50661.142470] Modules linked in: nfsd(F) lockd(F) nfs_acl(F) auth_rpcgss(F) sunrpc(F) autofs4(F) binfmt_misc(F) edd(F) af_packet(F) bridge(F) stp(F) llc(F) cpufreq_conservative(F) cpufreq_ondemand(F) cpufreq_userspace(F) cpufreq_powersave(F) pcc_cpufreq(F) fuse(F) loop(F) md_mod(F) dm_mod(F) iTCO_wdt(F) iTCO_vendor_support(F) gpio_ich(F) vhost_net(F) macvtap(F) macvlan(F) vhost(F) tun(F) i7core_edac(F) netxen_nic(F) kvm_intel(F) joydev(F) shpchp(F) edac_core(F) hid_generic(F) kvm(F) ipmi_si(F) sr_mod(F) ipmi_msghandler(F) bnx2(F) cdrom(F) sg(F) hpilo(F) hpwdt(F) ehci_pci(F) lpc_ich(F) mfd_core(F) acpi_power_meter(F) pcspkr(F) button(F) ext4(F) jbd2(F) mbcache(F) crc16(F) usbhid(F) uhci_hcd(F) ehci_hcd(F) usbcore(F) sd_mod(F) usb_common(F) thermal(F) processor(F) scsi_dh_rdac(F) scsi_dh_alua(F) scsi_dh_emc(F)
[50661.142475] scsi_dh_hp_sw(F) scsi_dh(F) ata_generic(F) ata_piix(F) libata(F) cciss(F) hpsa(F) scsi_mod(F)
[50661.142479] CPU: 39 PID: 283 Comm: migration/39 Tainted: GF 3.14.2-rt1 #667
[50661.142481] Hardware name: Hewlett-Packard ProLiant DL980 G7, BIOS P66 07/07/2010
[50661.142482] task: 880274515bb0 ti: 88027454e000 task.ti: 88027454e000
[50661.142486] RIP: 0010:[] [] wake_up_process+0x1/0x40
[50661.142487] RSP: 0018:88027454fda8 EFLAGS: 00010002
[50661.142488] RAX: 8001 RBX: 880275581eb8 RCX:
[50661.142488] RDX: 81aacec0 RSI: 0100 RDI:
[50661.142489] RBP: 8802772ee9b0 R08: R09: 81aacec0
[50661.142490] R10: R11: 8103d640 R12: 810f26c0
[50661.142490] R13: 880275581e88 R14: 8802772ee9b8 R15: 88027454e010
[50661.142492] FS: () GS:8802772e() knlGS:
[50661.142493] CS: 0010 DS: ES: CR0: 8005003b
[50661.142494] CR2: CR3: 01a0f000 CR4: 07e0
[50661.142494] Stack:
[50661.142505] 880275581eb8 810f2555 880274515bb0 0005
[50661.142508] 0001 0001 0140 0001
[50661.142512] 880274515bb0 88027454e000 8802772f4020 0005
[50661.142512] Call Trace:
[50661.142526] [] ? cpu_stopper_thread+0x125/0x1a0
[50661.142530] [] ? smpboot_thread_fn+0x23d/0x320
[50661.142533] [] ? smpboot_create_threads+0x70/0x70
[50661.142535] [] ? smpboot_create_threads+0x70/0x70
[50661.142543] [] ? kthread+0xd2/0xe0
[50661.142545] [] ? kthreadd+0x330/0x330
[50661.142553] [] ? ret_from_fork+0x7c/0xb0
[50661.142555] [] ? kthreadd+0x330/0x330
[50661.142568] Code: fd ff ff 0f 1f 80 00 00 00 00 31 d2 e9 09 fd ff ff 66 0f 1f 84 00 00 00 00 00 ba 08 00 00 00 be 0f 00 00 00 e9 f1 fc ff ff 90 53 <48> 8b 07 48 89 fb a8 0c 75 08 48 8b 47 08 a8 0c 74 11 be ba 06
[50661.142570] RIP [] wake_up_process+0x1/0x40
[50661.142570] RSP
[50661.142571] CR2:
Re: [ANNOUNCE] 3.14-rt1
On Wed, 2014-04-30 at 11:48 -0400, Steven Rostedt wrote:
> On Wed, 30 Apr 2014 17:15:57 +0200
> Mike Galbraith wrote:
>
> > On Wed, 2014-04-30 at 11:11 -0400, Steven Rostedt wrote:
> > > > Another little bug. This hunk of patches/stomp-machine-raw-lock.patch
> > > > should be while (atomic_read(&done.nr_todo))
> > > >
> > > > @@ -647,7 +671,7 @@ int stop_machine_from_inactive_cpu(int (
> > > > 	ret = multi_cpu_stop(&msdata);
> > > >
> > > > 	/* Busy wait for completion. */
> > > > -	while (!completion_done(&done.completion))
> > > > +	while (!atomic_read(&done.nr_todo))
> >                  ^--- that ! needs to go away
> > >
> > > I don't see this in the code. That is, there is no "completion_done()"
> > > in stop_machine_from_inactive_cpu(). It is already an atomic_read().
> >
> > Yes, but it should read "while (atomic_read(&done.nr_todo))"
>
> Ah, this would have been better if you had sent a patch. I misread what
> you talked about.
>
> Yes, this was the culprit of my failures. After removing the '!', it
> worked.
>
> Care to send a patch :-)

I figured those two were just edit patch, done, but yeah, I can do that.

-Mike
Re: [ANNOUNCE] 3.14-rt1
On Wed, 30 Apr 2014 17:15:57 +0200
Mike Galbraith wrote:

> On Wed, 2014-04-30 at 11:11 -0400, Steven Rostedt wrote:
> > > Another little bug. This hunk of patches/stomp-machine-raw-lock.patch
> > > should be while (atomic_read(&done.nr_todo))
> > >
> > > @@ -647,7 +671,7 @@ int stop_machine_from_inactive_cpu(int (
> > > 	ret = multi_cpu_stop(&msdata);
> > >
> > > 	/* Busy wait for completion. */
> > > -	while (!completion_done(&done.completion))
> > > +	while (!atomic_read(&done.nr_todo))
>                 ^--- that ! needs to go away
> >
> > I don't see this in the code. That is, there is no "completion_done()"
> > in stop_machine_from_inactive_cpu(). It is already an atomic_read().
>
> Yes, but it should read "while (atomic_read(&done.nr_todo))"

Ah, this would have been better if you had sent a patch. I misread what
you talked about.

Yes, this was the culprit of my failures. After removing the '!', it
worked.

Care to send a patch :-)

-- Steve
Re: [ANNOUNCE] 3.14-rt1
I fired off a 100 iteration run on a 64 core box. If it's still alive
in the morning, it should still be busy as hell.

-Mike
Re: [ANNOUNCE] 3.14-rt1
On Wed, 2014-04-30 at 11:11 -0400, Steven Rostedt wrote:
> > Another little bug. This hunk of patches/stomp-machine-raw-lock.patch
> > should be while (atomic_read(&done.nr_todo))
> >
> > @@ -647,7 +671,7 @@ int stop_machine_from_inactive_cpu(int (
> > 	ret = multi_cpu_stop(&msdata);
> >
> > 	/* Busy wait for completion. */
> > -	while (!completion_done(&done.completion))
> > +	while (!atomic_read(&done.nr_todo))
               ^--- that ! needs to go away
>
> I don't see this in the code. That is, there is no "completion_done()"
> in stop_machine_from_inactive_cpu(). It is already an atomic_read().

Yes, but it should read "while (atomic_read(&done.nr_todo))"

-Mike
Re: [ANNOUNCE] 3.14-rt1
On Wed, 30 Apr 2014 16:54:46 +0200
Mike Galbraith wrote:

> On Wed, 2014-04-30 at 10:33 -0400, Steven Rostedt wrote:
> > On Wed, 30 Apr 2014 10:19:19 -0400
> > Steven Rostedt wrote:
> >
> > > I'm testing it now. But could you please post them as regular patches.
> > > They were attachments to this thread, and were not something that stood
> > > out.
> >
> > With your two patches, it still crashes exactly the same way. I
> > probably should remove my debug just in case, but I think this box has
> > another problem with it.
>
> You killed this hunk of hotplug-light-get-online-cpus.patch
>
> @@ -333,7 +449,7 @@ static int __ref _cpu_down(unsigned int
> 		/* CPU didn't die: tell everyone. Can't complain. */
> 		smpboot_unpark_threads(cpu);
> 		cpu_notify_nofail(CPU_DOWN_FAILED | mod, hcpu);
> -		goto out_release;
> +		goto out_cancel;

I added this, but it only happens on the failed case, which I don't
think is an issue with what I'm dealing with.

> 	}
> 	BUG_ON(cpu_online(cpu));
>
> ..and fixed this too?
>
> Another little bug. This hunk of patches/stomp-machine-raw-lock.patch
> should be while (atomic_read(&done.nr_todo))
>
> @@ -647,7 +671,7 @@ int stop_machine_from_inactive_cpu(int (
> 	ret = multi_cpu_stop(&msdata);
>
> 	/* Busy wait for completion. */
> -	while (!completion_done(&done.completion))
> +	while (!atomic_read(&done.nr_todo))

I don't see this in the code. That is, there is no "completion_done()"
in stop_machine_from_inactive_cpu(). It is already an atomic_read().

-- Steve

> 		cpu_relax();
>
> 	mutex_unlock(&stop_cpus_mutex);
Re: [ANNOUNCE] 3.14-rt1
On Wed, 2014-04-30 at 10:33 -0400, Steven Rostedt wrote:
> On Wed, 30 Apr 2014 10:19:19 -0400
> Steven Rostedt wrote:
>
> > I'm testing it now. But could you please post them as regular patches.
> > They were attachments to this thread, and were not something that stood
> > out.
>
> With your two patches, it still crashes exactly the same way. I
> probably should remove my debug just in case, but I think this box has
> another problem with it.

You killed this hunk of hotplug-light-get-online-cpus.patch

@@ -333,7 +449,7 @@ static int __ref _cpu_down(unsigned int
		/* CPU didn't die: tell everyone. Can't complain. */
		smpboot_unpark_threads(cpu);
		cpu_notify_nofail(CPU_DOWN_FAILED | mod, hcpu);
-		goto out_release;
+		goto out_cancel;
	}
	BUG_ON(cpu_online(cpu));

..and fixed this too?

Another little bug. This hunk of patches/stomp-machine-raw-lock.patch
should be while (atomic_read(&done.nr_todo))

@@ -647,7 +671,7 @@ int stop_machine_from_inactive_cpu(int (
	ret = multi_cpu_stop(&msdata);

	/* Busy wait for completion. */
-	while (!completion_done(&done.completion))
+	while (!atomic_read(&done.nr_todo))
		cpu_relax();

	mutex_unlock(&stop_cpus_mutex);
Re: [ANNOUNCE] 3.14-rt1
On Wed, 2014-04-30 at 10:19 -0400, Steven Rostedt wrote:
> On Wed, 30 Apr 2014 16:00:03 +0200
> Mike Galbraith wrote:
>
> > On Wed, 2014-04-30 at 09:15 -0400, Steven Rostedt wrote:
> > > On Wed, 30 Apr 2014 15:06:29 +0200
> > > Mike Galbraith wrote:
> > >
> > > > The End.. I hope. I've had enough hotplug entertainment for a while.
> > >
> > > Not for me. 3.14-rt stress-cpu-hotplug crashes quickly. But it's a
> > > different issue than what my patch addressed. I'm still debugging it.
> >
> > If you didn't fix the two bugs I showed, and (wisely) didn't look at the
> > beautiful lglock patches I posted (no frozen shark, I'm disappointed;),
> > your patch won't help.
>
> Mike,
>
> I'm testing it now. But could you please post them as regular patches.
> They were attachments to this thread, and were not something that stood
> out.

They were meant to not stick out :) I showed what I did to deal with
that damn lglock, but showing them at all felt more akin to chumming
the waters for frozen sharks than posting patches. 'spose I could try
to muster up some courage; showing them put a pretty big dent in my
supply.

-Mike
Re: [ANNOUNCE] 3.14-rt1
On Wed, 30 Apr 2014 10:19:19 -0400
Steven Rostedt wrote:

> I'm testing it now. But could you please post them as regular patches.
> They were attachments to this thread, and were not something that stood
> out.

With your two patches, it still crashes exactly the same way. I
probably should remove my debug just in case, but I think this box has
another problem with it.

-- Steve
Re: [ANNOUNCE] 3.14-rt1
On Wed, 30 Apr 2014 16:00:03 +0200
Mike Galbraith wrote:

> On Wed, 2014-04-30 at 09:15 -0400, Steven Rostedt wrote:
> > On Wed, 30 Apr 2014 15:06:29 +0200
> > Mike Galbraith wrote:
> >
> > > The End.. I hope. I've had enough hotplug entertainment for a while.
> >
> > Not for me. 3.14-rt stress-cpu-hotplug crashes quickly. But it's a
> > different issue than what my patch addressed. I'm still debugging it.
>
> If you didn't fix the two bugs I showed, and (wisely) didn't look at the
> beautiful lglock patches I posted (no frozen shark, I'm disappointed;),
> your patch won't help.

Mike,

I'm testing it now. But could you please post them as regular patches.
They were attachments to this thread, and were not something that stood
out.

Thanks,

-- Steve
Re: [ANNOUNCE] 3.14-rt1
On Wed, 2014-04-30 at 09:15 -0400, Steven Rostedt wrote:
> On Wed, 30 Apr 2014 15:06:29 +0200
> Mike Galbraith wrote:
>
> > The End.. I hope. I've had enough hotplug entertainment for a while.
>
> Not for me. 3.14-rt stress-cpu-hotplug crashes quickly. But it's a
> different issue than what my patch addressed. I'm still debugging it.

If you didn't fix the two bugs I showed, and (wisely) didn't look at the
beautiful lglock patches I posted (no frozen shark, I'm disappointed;),
your patch won't help.

-Mike
Re: [ANNOUNCE] 3.14-rt1
On Wed, 30 Apr 2014 15:06:29 +0200
Mike Galbraith wrote:

> The End.. I hope. I've had enough hotplug entertainment for a while.

Not for me. 3.14-rt stress-cpu-hotplug crashes quickly. But it's a
different issue than what my patch addressed. I'm still debugging it.

-- Steve
Re: [ANNOUNCE] 3.14-rt1
On Wed, 2014-04-30 at 09:43 +0200, Mike Galbraith wrote:
> On Tue, 2014-04-29 at 20:13 -0400, Steven Rostedt wrote:
> > On Tue, 29 Apr 2014 07:21:09 +0200
> > Mike Galbraith wrote:
> >
> > > On Mon, 2014-04-28 at 16:37 +0200, Mike Galbraith wrote:
> > > > > Seems that migrate_disable() must be called before taking the lock as
> > > > > it is done in every other location.
> > > >
> > > > And for tasklist_lock, seems you also MUST do that prior to trylock as
> > > > well, else you'll run afoul of the hotplug beast.
> > >
> > > Bah. Futzing with dmesg while stress script is running is either a very
> > > bad idea, or a very good test. Both virgin 3.10-rt and 3.12-rt with new
> > > bugs squashed will deadlock.
> > >
> > > Too bad I kept on testing, I liked the notion that hotplug was solid ;-)
> >
> > I was able to stress cpu hotplug on 3.12-rt after applying the
> > following patch.
> >
> > If there's no complaints about it, I'm going to add this to the 3.12-rt
> > stable tree. As without it, it fails horribly with the cpu hotplug
> > stress test, and I won't release a stable kernel that does that.
>
> My local boxen are happy, and the 64 core box with 14-rt seems happy as
> well, though I couldn't let it burn for long.

And 3.12 looks stable on the 64 core DL980 as well. (If it survived a
24 hour busy+stress session I'd still likely fall outta my chair
though)

My kinda sorta 3.12-rt enterprise-to-be kernel wasn't stable on the
DL980, while appearing just fine on small boxen, which made me suspect
that there was still a big box something lurking, only raising its ugly
head in the fatter kernel. That wasn't an rt problem after all; someone
in enterprise land just didn't stack their goody pile quite high enough
while wedging upstream into the stable base kernel, which bent rt.

The End.. I hope. I've had enough hotplug entertainment for a while.

-Mike
Re: [ANNOUNCE] 3.14-rt1
On Tue, 2014-04-29 at 20:13 -0400, Steven Rostedt wrote:
> On Tue, 29 Apr 2014 07:21:09 +0200
> Mike Galbraith wrote:
>
> > On Mon, 2014-04-28 at 16:37 +0200, Mike Galbraith wrote:
> > > > Seems that migrate_disable() must be called before taking the lock as
> > > > it is done in every other location.
> > >
> > > And for tasklist_lock, seems you also MUST do that prior to trylock as
> > > well, else you'll run afoul of the hotplug beast.
> >
> > Bah. Futzing with dmesg while stress script is running is either a very
> > bad idea, or a very good test. Both virgin 3.10-rt and 3.12-rt with new
> > bugs squashed will deadlock.
> >
> > Too bad I kept on testing, I liked the notion that hotplug was solid ;-)
>
> I was able to stress cpu hotplug on 3.12-rt after applying the
> following patch.
>
> If there's no complaints about it, I'm going to add this to the 3.12-rt
> stable tree. As without it, it fails horribly with the cpu hotplug
> stress test, and I won't release a stable kernel that does that.

My local boxen are happy, and the 64 core box with 14-rt seems happy as
well, though I couldn't let it burn for long.

BTW, that dmesg business went into hiding. I didn't have time to put
virgin 10-rt back on and play around poking both kernels this, that and
the other way again, but it seems there's some phase-of-moon factor
there.

-Mike
Re: [ANNOUNCE] 3.14-rt1
On Tue, 29 Apr 2014 07:21:09 +0200
Mike Galbraith wrote:

> On Mon, 2014-04-28 at 16:37 +0200, Mike Galbraith wrote:
> > > Seems that migrate_disable() must be called before taking the lock as
> > > it is done in every other location.
> >
> > And for tasklist_lock, seems you also MUST do that prior to trylock as
> > well, else you'll run afoul of the hotplug beast.
>
> Bah. Futzing with dmesg while stress script is running is either a very
> bad idea, or a very good test. Both virgin 3.10-rt and 3.12-rt with new
> bugs squashed will deadlock.
>
> Too bad I kept on testing, I liked the notion that hotplug was solid ;-)

I was able to stress cpu hotplug on 3.12-rt after applying the
following patch.

If there's no complaints about it, I'm going to add this to the 3.12-rt
stable tree. As without it, it fails horribly with the cpu hotplug
stress test, and I won't release a stable kernel that does that.

-- Steve

Signed-off-by: Steven Rostedt

diff --git a/kernel/rt.c b/kernel/rt.c
index bb72347..4f2a613 100644
--- a/kernel/rt.c
+++ b/kernel/rt.c
@@ -180,12 +180,15 @@ EXPORT_SYMBOL(_mutex_unlock);
  */
 int __lockfunc rt_write_trylock(rwlock_t *rwlock)
 {
-	int ret = rt_mutex_trylock(&rwlock->lock);
+	int ret;
+
+	migrate_disable();
+	ret = rt_mutex_trylock(&rwlock->lock);
 
-	if (ret) {
+	if (ret)
 		rwlock_acquire(&rwlock->dep_map, 0, 1, _RET_IP_);
-		migrate_disable();
-	}
+	else
+		migrate_enable();
 
 	return ret;
 }
@@ -212,11 +215,12 @@ int __lockfunc rt_read_trylock(rwlock_t *rwlock)
 	 * write locked.
 	 */
 	if (rt_mutex_owner(lock) != current) {
+		migrate_disable();
 		ret = rt_mutex_trylock(lock);
-		if (ret) {
+		if (ret)
 			rwlock_acquire(&rwlock->dep_map, 0, 1, _RET_IP_);
-			migrate_disable();
-		}
+		else
+			migrate_enable();
 	} else if (!rwlock->read_depth) {
 		ret = 0;
 	}
@@ -245,8 +249,8 @@ void __lockfunc rt_read_lock(rwlock_t *rwlock)
 	 */
 	if (rt_mutex_owner(lock) != current) {
 		rwlock_acquire(&rwlock->dep_map, 0, 0, _RET_IP_);
-		__rt_spin_lock(lock);
 		migrate_disable();
+		__rt_spin_lock(lock);
 	}
 	rwlock->read_depth++;
 }
Re: [ANNOUNCE] 3.14-rt1
On Mon, 2014-04-28 at 16:37 +0200, Mike Galbraith wrote:
> > Seems that migrate_disable() must be called before taking the lock as
> > it is done in every other location.
>
> And for tasklist_lock, seems you also MUST do that prior to trylock as
> well, else you'll run afoul of the hotplug beast.

Bah. Futzing with dmesg while stress script is running is either a very
bad idea, or a very good test. Both virgin 3.10-rt and 3.12-rt with new
bugs squashed will deadlock.

Too bad I kept on testing, I liked the notion that hotplug was solid ;-)

-Mike
Re: [ANNOUNCE] 3.14-rt1
On Mon, 2014-04-28 at 16:37 +0200, Mike Galbraith wrote: > On Mon, 2014-04-28 at 10:18 -0400, Steven Rostedt wrote: > > On Mon, 28 Apr 2014 11:09:46 +0200 > > Mike Galbraith wrote: > > > > > migrate_disable-pushd-down-in-atomic_dec_and_spin_lo.patch > > > > > > bug: migrate_disable() after blocking is too late. > > > > > > @@ -1028,12 +1028,12 @@ int atomic_dec_and_spin_lock(atomic_t *a > > > /* Subtract 1 from counter unless that drops it to 0 (ie. it was > > > 1) */ > > > if (atomic_add_unless(atomic, -1, 1)) > > > return 0; > > > - migrate_disable(); > > > rt_spin_lock(lock); > > > - if (atomic_dec_and_test(atomic)) > > > + if (atomic_dec_and_test(atomic)){ > > > + migrate_disable(); > > > > Makes sense, as the CPU can go offline right after the lock is grabbed > > and before the migrate_disable() is called. > > > > Seems that migrate_disable() must be called before taking the lock as > > it is done in every other location. > > And for tasklist_lock, seems you also MUST do that prior to trylock as > well, else you'll run afoul of the hotplug beast. This lockdep gripe is from the deadlocked crashdump with only the clearly busted bits patched up. [ 193.033224] == [ 193.033225] [ INFO: possible circular locking dependency detected ] [ 193.033227] 3.12.18-rt25 #19 Not tainted [ 193.033227] --- [ 193.033228] boot.kdump/5422 is trying to acquire lock: [ 193.033237] (&hp->lock){+.+...}, at: [] pin_current_cpu+0x84/0x1d0 [ 193.033238] but task is already holding lock: [ 193.033241] (tasklist_lock){+.+...}, at: [] do_wait+0xbb/0x2a0 [ 193.033242] which lock already depends on the new lock. 
[ 193.033242] the existing dependency chain (in reverse order) is: [ 193.033244] -> #1 (tasklist_lock){+.+...}: [ 193.033248][] check_prevs_add+0xf8/0x180 [ 193.033250][] validate_chain.isra.45+0x5aa/0x750 [ 193.033252][] __lock_acquire+0x3f6/0x9f0 [ 193.033253][] lock_acquire+0x8c/0x160 [ 193.033257][] rt_write_lock+0x2c/0x40 [ 193.033260][] _cpu_down+0x219/0x440 [ 193.033261][] cpu_down+0x30/0x50 [ 193.033264][] cpu_subsys_offline+0x1c/0x30 [ 193.033267][] device_offline+0x95/0xc0 [ 193.033269][] online_store+0x40/0x80 [ 193.033271][] dev_attr_store+0x13/0x30 [ 193.033274][] sysfs_write_file+0xf0/0x170 [ 193.033277][] vfs_write+0xc8/0x1d0 [ 193.033279][] SyS_write+0x50/0xa0 [ 193.033282][] system_call_fastpath+0x16/0x1b [ 193.033284] -> #0 (&hp->lock){+.+...}: [ 193.033286][] check_prev_add+0x7bd/0x7d0 [ 193.033287][] check_prevs_add+0xf8/0x180 [ 193.033289][] validate_chain.isra.45+0x5aa/0x750 [ 193.033291][] __lock_acquire+0x3f6/0x9f0 [ 193.033293][] lock_acquire+0x8c/0x160 [ 193.033295][] rt_spin_lock+0x55/0x70 [ 193.033296][] pin_current_cpu+0x84/0x1d0 [ 193.033299][] migrate_disable+0x81/0x100 [ 193.033301][] rt_read_lock+0x47/0x60 [ 193.033303][] do_wait+0xbb/0x2a0 [ 193.033305][] SyS_wait4+0x9e/0x100 [ 193.033307][] system_call_fastpath+0x16/0x1b [ 193.033307] other info that might help us debug this: [ 193.033308] Possible unsafe locking scenario: [ 193.033309]CPU0CPU1 [ 193.033309] [ 193.033310] lock(tasklist_lock); [ 193.033312]lock(&hp->lock); [ 193.033313]lock(tasklist_lock); [ 193.033314] lock(&hp->lock); [ 193.033315] *** DEADLOCK *** [ 193.033316] 1 lock held by boot.kdump/5422: [ 193.033319] #0: (tasklist_lock){+.+...}, at: [] do_wait+0xbb/0x2a0 [ 193.033320] stack backtrace: [ 193.033322] CPU: 0 PID: 5422 Comm: boot.kdump Not tainted 3.12.18-rt25 #19 [ 193.033323] Hardware name: MEDIONPC MS-7502/MS-7502, BIOS 6.00 PG 12/26/2007 [ 193.033326] 880200550818 8802004e5ad8 8155538c [ 193.033328] 8802004e5b28 8154d0df 8802004e5b18 [ 193.00] 8802004e5b50 
880200550818 8802005507e0 880200550818 [ 193.01] Call Trace: [ 193.05] [] dump_stack+0x4f/0x91 [ 193.07] [] print_circular_bug+0xd3/0xe4 [ 193.09] [] check_prev_add+0x7bd/0x7d0 [ 193.033342] [] ? sched_clock_local+0x25/0x90 [ 193.033344] [] ? sched_clock_cpu+0xa8/0x120 [ 193.033346] [] check_prevs_add+0xf8/0x180 [ 193.033348] [] validate_chain.isra.45+0x5aa/0x750 [ 193.033350] [] __lock_acquire+0x3f6/0x9f0 [ 193.033352] [] ? rt_spin_lock_slowlock+0x231/0x280 [ 1
Re: [ANNOUNCE] 3.14-rt1
On Mon, 2014-04-28 at 10:18 -0400, Steven Rostedt wrote:
> On Mon, 28 Apr 2014 11:09:46 +0200
> Mike Galbraith wrote:
>
> > migrate_disable-pushd-down-in-atomic_dec_and_spin_lo.patch
> >
> > bug: migrate_disable() after blocking is too late.
> >
> > @@ -1028,12 +1028,12 @@ int atomic_dec_and_spin_lock(atomic_t *a
> > 	/* Subtract 1 from counter unless that drops it to 0 (ie. it was
> > 	1) */
> > 	if (atomic_add_unless(atomic, -1, 1))
> > 		return 0;
> > -	migrate_disable();
> > 	rt_spin_lock(lock);
> > -	if (atomic_dec_and_test(atomic))
> > +	if (atomic_dec_and_test(atomic)){
> > +		migrate_disable();
>
> Makes sense, as the CPU can go offline right after the lock is grabbed
> and before the migrate_disable() is called.
>
> Seems that migrate_disable() must be called before taking the lock as
> it is done in every other location.

And for tasklist_lock, seems you also MUST do that prior to trylock as
well, else you'll run afoul of the hotplug beast.

-Mike
Re: [ANNOUNCE] 3.14-rt1
On Mon, 28 Apr 2014 11:09:46 +0200
Mike Galbraith wrote:

> migrate_disable-pushd-down-in-atomic_dec_and_spin_lo.patch
>
> bug: migrate_disable() after blocking is too late.
>
> @@ -1028,12 +1028,12 @@ int atomic_dec_and_spin_lock(atomic_t *a
> 	/* Subtract 1 from counter unless that drops it to 0 (ie. it was 1) */
> 	if (atomic_add_unless(atomic, -1, 1))
> 		return 0;
> -	migrate_disable();
> 	rt_spin_lock(lock);
> -	if (atomic_dec_and_test(atomic))
> +	if (atomic_dec_and_test(atomic)){
> +		migrate_disable();

Makes sense, as the CPU can go offline right after the lock is grabbed
and before the migrate_disable() is called.

Seems that migrate_disable() must be called before taking the lock as
it is done in every other location.

-- Steve

> 		return 1;
> +	}
> 	rt_spin_unlock(lock);
> -	migrate_enable();
> 	return 0;
> }
> EXPORT_SYMBOL(atomic_dec_and_spin_lock);
>
> read_lock-migrate_disable-pushdown-to-rt_read_lock.patch
>
> bug: ditto.
>
> @@ -244,8 +246,10 @@ void __lockfunc rt_read_lock(rwlock_t *r
> 	/*
> 	 * recursive read locks succeed when current owns the lock
> 	 */
> -	if (rt_mutex_owner(lock) != current)
> +	if (rt_mutex_owner(lock) != current) {
> 		__rt_spin_lock(lock);
> +		migrate_disable();
> +	}
> 	rwlock->read_depth++;
> }
>
> Moving that migrate_disable() up will likely fix my hotplug troubles.
> I'll find out when I get back from physical torture (therapy) session.
>
> -Mike
Re: [ANNOUNCE] 3.14-rt1
On Mon, 2014-04-28 at 07:09 +0200, Mike Galbraith wrote:
> Hi Nicholas,
>
> On Sat, 2014-04-26 at 15:58 +0200, Mike Galbraith wrote:
> > On Sat, 2014-04-26 at 10:38 +0200, Mike Galbraith wrote:
> > > On Fri, 2014-04-25 at 09:40 +0200, Mike Galbraith wrote:
> > >
> > > > Hotplug can still deadlock in rt trees too, and will if you beat it
> > > > hard.
> > >
> > > Box actually deadlocks like so.
> >
> > ...
> >
> > 3.12-rt looks a bit busted migrate_disable/enable() wise.
> >
> > /me eyeballs 3.10-rt (looks better), confirms 3.10-rt hotplug works,
> > swipes working code, confirms 3.12-rt now works. Yup, that was it.
>
> My boxen, including 64 core DL980 that ran hotplug stress for 3 hours
> yesterday with pre-pushdown rwlocks, say the migrate_disable/enable
> pushdown patches are very definitely busted.

migrate_disable-pushd-down-in-atomic_dec_and_spin_lo.patch

bug: migrate_disable() after blocking is too late.

@@ -1028,12 +1028,12 @@ int atomic_dec_and_spin_lock(atomic_t *a
 	/* Subtract 1 from counter unless that drops it to 0 (ie. it was 1) */
 	if (atomic_add_unless(atomic, -1, 1))
 		return 0;
-	migrate_disable();
 	rt_spin_lock(lock);
-	if (atomic_dec_and_test(atomic))
+	if (atomic_dec_and_test(atomic)){
+		migrate_disable();
 		return 1;
+	}
 	rt_spin_unlock(lock);
-	migrate_enable();
 	return 0;
 }
 EXPORT_SYMBOL(atomic_dec_and_spin_lock);

read_lock-migrate_disable-pushdown-to-rt_read_lock.patch

bug: ditto.

@@ -244,8 +246,10 @@ void __lockfunc rt_read_lock(rwlock_t *r
 	/*
 	 * recursive read locks succeed when current owns the lock
 	 */
-	if (rt_mutex_owner(lock) != current)
+	if (rt_mutex_owner(lock) != current) {
 		__rt_spin_lock(lock);
+		migrate_disable();
+	}
 	rwlock->read_depth++;
 }

Moving that migrate_disable() up will likely fix my hotplug troubles.
I'll find out when I get back from physical torture (therapy) session.

-Mike
Re: [ANNOUNCE] 3.14-rt1
Hi Nicholas,

On Sat, 2014-04-26 at 15:58 +0200, Mike Galbraith wrote:
> On Sat, 2014-04-26 at 10:38 +0200, Mike Galbraith wrote:
> > On Fri, 2014-04-25 at 09:40 +0200, Mike Galbraith wrote:
> >
> > > Hotplug can still deadlock in rt trees too, and will if you beat it
> > > hard.
> >
> > Box actually deadlocks like so.
>
> ...
>
> 3.12-rt looks a bit busted migrate_disable/enable() wise.
>
> /me eyeballs 3.10-rt (looks better), confirms 3.10-rt hotplug works,
> swipes working code, confirms 3.12-rt now works. Yup, that was it.

My boxen, including 64 core DL980 that ran hotplug stress for 3 hours
yesterday with pre-pushdown rwlocks, say the migrate_disable/enable
pushdown patches are very definitely busted.

Instead of whacking selective bits, as I did to verify that the rwlock
changes were indeed causing hotplug stress deadlock woes, I'm eyeballing
the lot, twiddling primitives to look like I think they should, after
which I'll let my boxen express their opinions of the result.

-Mike
Re: [ANNOUNCE] 3.14-rt1
On 04/11/2014 11:57 AM, Sebastian Andrzej Siewior wrote:

Dear RT folks!

I'm pleased to announce the v3.14-rt1 patch set.

Changes since v3.12.15-rt25
- I dropped the sparc64 patches I had in the queue. They did not apply
  cleanly, the code in v3.14 changed in the MMU area. Here is where I
  remembered that it was not working perfectly either.

Saw this a moment ago (3.14.1 + rt1, Fedora 19 laptop - I think I have
seen something similar in 3.12.x-rt):

Apr 26 11:16:11 localhost kernel: [ 96.323248] [ cut here ]
Apr 26 11:16:11 localhost kernel: [ 96.323262] WARNING: CPU: 0 PID: 2051 at lib/list_debug.c:59 __list_del_entry+0xa1/0xd0()
Apr 26 11:16:11 localhost kernel: [ 96.323264] list_del corruption. prev->next should be 8802101196a0, but was 0001
Apr 26 11:16:11 localhost kernel: [ 96.323266] Modules linked in: fuse ipt_MASQUERADE xt_CHECKSUM tun ip6t_rpfilter ip6t_REJECT xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw rfcomm ip6table_filter bnep ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw iTCO_wdt iTCO_vendor_support coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel uvcvideo videobuf2_vmalloc microcode videobuf2_memops snd_hda_codec_hdmi videobuf2_core videodev media serio_raw btusb bluetooth intel_ips i2c_i801 6lowpan_iphc snd_hda_codec_conexant snd_hda_codec_generic arc4 iwldvm mac80211 iwlwifi lpc_ich sdhci_pci mfd_core sdhci cfg80211 mmc_core snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm e1000e snd_timer ptp mei_me pps_core mei shpchp thinkpad_acpi snd ppdev soundcore rfkill parport_pc parport acpi_cpufreq uinput firewire_ohci nouveau firewire_core crc_itu_t i2c_algo_bit drm_kms_helper ttm drm mxm_wmi i2c_core wmi video
Apr 26 11:16:11 localhost kernel: [ 96.323331] CPU: 0 PID: 2051 Comm: cinnamon Not tainted 3.14.1-200.rt1.1.fc19.ccrma.x86_64+rt #1
Apr 26 11:16:11 localhost kernel: [ 96.323332] Hardware name: LENOVO 4313CTO/4313CTO, BIOS 6MET64WW (1.27 ) 07/15/2010
Apr 26 11:16:11 localhost kernel: [ 96.323334] 8a5c11dc 8800ae715a88 81707fca
Apr 26 11:16:11 localhost kernel: [ 96.323336] 8800ae715ad0 8800ae715ac0 8108d03d 8802101196a0
Apr 26 11:16:11 localhost kernel: [ 96.323337] 880210119b50 880210119b50 880210119b40 88021a615648
Apr 26 11:16:11 localhost kernel: [ 96.323338] Call Trace:
Apr 26 11:16:11 localhost kernel: [ 96.323345] [] dump_stack+0x4d/0x82
Apr 26 11:16:11 localhost kernel: [ 96.323351] [] warn_slowpath_common+0x7d/0xc0
Apr 26 11:16:11 localhost kernel: [ 96.323352] [] warn_slowpath_fmt+0x5c/0x80
Apr 26 11:16:11 localhost kernel: [ 96.323354] [] __list_del_entry+0xa1/0xd0
Apr 26 11:16:11 localhost kernel: [ 96.323355] [] list_del+0xd/0x30
Apr 26 11:16:11 localhost kernel: [ 96.323393] [] nouveau_fence_signal+0x53/0x80 [nouveau]
Apr 26 11:16:11 localhost kernel: [ 96.323414] [] nouveau_fence_update+0x48/0xa0 [nouveau]
Apr 26 11:16:11 localhost kernel: [ 96.323435] [] nouveau_fence_sync+0x45/0x80 [nouveau]
Apr 26 11:16:11 localhost kernel: [ 96.323456] [] validate_list+0xd8/0x2e0 [nouveau]
Apr 26 11:16:11 localhost kernel: [ 96.323478] [] nouveau_gem_ioctl_pushbuf+0xaa3/0x13e0 [nouveau]
Apr 26 11:16:11 localhost kernel: [ 96.323500] [] drm_ioctl+0x4f2/0x620 [drm]
Apr 26 11:16:11 localhost kernel: [ 96.323506] [] ? migrate_enable+0x94/0x1c0
Apr 26 11:16:11 localhost kernel: [ 96.323527] [] nouveau_drm_ioctl+0x4e/0x90 [nouveau]
Apr 26 11:16:11 localhost kernel: [ 96.323530] [] do_vfs_ioctl+0x2e0/0x4c0
Apr 26 11:16:11 localhost kernel: [ 96.323533] [] ? file_has_perm+0xa6/0xb0
Apr 26 11:16:11 localhost kernel: [ 96.323535] [] SyS_ioctl+0x81/0xa0
Apr 26 11:16:11 localhost kernel: [ 96.323538] [] system_call_fastpath+0x16/0x1b
Apr 26 11:16:11 localhost kernel: [ 96.323569] ---[ end trace 0002 ]---

-- Fernando
Re: [ANNOUNCE] 3.14-rt1
On Sat, 2014-04-26 at 10:38 +0200, Mike Galbraith wrote: > On Fri, 2014-04-25 at 09:40 +0200, Mike Galbraith wrote: > > > Hotplug can still deadlock in rt trees too, and will if you beat it > > hard. > > Box actually deadlocks like so. ... 3.12-rt looks a bit busted migrate_disable/enable() wise. /me eyeballs 3.10-rt (looks better), confirms 3.10-rt hotplug works, swipes working code, confirms 3.12-rt now works. Yup, that was it. When I fix lg_global_lock() (I think it and Medusa are both busted) I bet a nickle 3.14-rt will work. Hm, actually, rt_write_trylock() in swiped 3.10-rt code below (and some others) look busted to me. migrate_disable() _after_ grabbing a lock is too late, no? --- include/linux/rwlock_rt.h | 32 kernel/rt.c | 21 +++-- 2 files changed, 39 insertions(+), 14 deletions(-) --- a/include/linux/rwlock_rt.h +++ b/include/linux/rwlock_rt.h @@ -33,50 +33,72 @@ extern void __rt_rwlock_init(rwlock_t *r #define read_lock_irqsave(lock, flags) \ do {\ typecheck(unsigned long, flags);\ + migrate_disable(); \ flags = rt_read_lock_irqsave(lock); \ } while (0) #define write_lock_irqsave(lock, flags)\ do {\ typecheck(unsigned long, flags);\ + migrate_disable(); \ flags = rt_write_lock_irqsave(lock);\ } while (0) -#define read_lock(lock)rt_read_lock(lock) +#define read_lock(lock)\ + do {\ + migrate_disable(); \ + rt_read_lock(lock); \ + } while (0) #define read_lock_bh(lock) \ do {\ local_bh_disable(); \ + migrate_disable(); \ rt_read_lock(lock); \ } while (0) #define read_lock_irq(lock)read_lock(lock) -#define write_lock(lock) rt_write_lock(lock) +#define write_lock(lock) \ + do {\ + migrate_disable(); \ + rt_write_lock(lock);\ + } while (0) #define write_lock_bh(lock)\ do {\ local_bh_disable(); \ + migrate_disable(); \ rt_write_lock(lock);\ } while (0) #define write_lock_irq(lock) write_lock(lock) -#define read_unlock(lock) rt_read_unlock(lock) +#define read_unlock(lock) \ + do {\ + rt_read_unlock(lock); \ + migrate_enable(); \ + } while (0) #define 
read_unlock_bh(lock) \ do {\ rt_read_unlock(lock); \ + migrate_enable(); \ local_bh_enable(); \ } while (0) #define read_unlock_irq(lock) read_unlock(lock) -#define write_unlock(lock) rt_write_unlock(lock) +#define write_unlock(lock) \ + do {\ + rt_write_unlock(lock); \ + migrate_enable(); \ + } while (0) #define write_unlock_bh(lock) \ do {\ rt_write_unlock(lock); \ + migrate_enable(); \ local_bh_enable(); \ } while (0) @@ -87,6 +109,7 @@ extern void __rt_rwlock_init(rwlock_t *r typecheck(unsigned long, flags);\ (void) flags; \ rt_read_unlock(lock); \ + migrate_enable(); \ } while (0) #define write_unlock_irqrestore(lock, flags) \ @@ -94,6 +117,7 @@ extern void __rt_rwlock_init(rwlock_t *r typecheck(unsigned long, flags);\ (void) flags; \ rt_write_unlock(lock); \ + migrate_enable(); \ } while (0) #endif --- a/kernel/rt.c +++ b/kernel
Re: [ANNOUNCE] 3.14-rt1
On Fri, 2014-04-25 at 09:40 +0200, Mike Galbraith wrote:
> Hotplug can still deadlock in rt trees too, and will if you beat it
> hard.

Box actually deadlocks like so.

CPU3 boot.kdump
  sys_wait4
  do_wait
  read_lock(&tasklist_lock)
  rt_read_lock
  __rt_spin_lock(lock)
  migrate_disable()
  pin_current_cpu()
    if (hp->grab_lock) {
      preempt_enable();   <== hmm
      hotplug_lock(hp);

hp = &__get_cpu_var(hotplug_pcp);   <== hmm

struct hotplug_pcp {
  unplug = 0x8800b7d0e540,
  sync_tsk = 0x0,
  refcount = 0,   <== hmm
  grab_lock = 1,
  ...
  lock = {
    ...
    owner = 0x8802039f0001,   stress-cpu-hotplug_stress.sh?!? <=== he's way over yonder.

Yo, dude, would you please NOT take percpu locks with you?

CPU0 stress-cpu-hotplug_stress.sh
  sysfs_write_file
  dev_attr_store
  online_store
  device_offline
  cpu_subsys_offline
  cpu_down
  _cpu_down
  cpu_hotplug_begin
    mutex_lock(&cpu_hotplug.lock);
  ...
  check_for_tasks
    write_lock_irq(&tasklist_lock);   held by CPU3 boot.kdump over there ===>

CPU0 kworker/0:0
  cpuset_hotplug_workfn+0x23e/0x380
  rebuild_sched_domains+0x15/0x30
  rebuild_sched_domains_locked+0x17/0x80
  get_online_cpus+0x35/0x50
  mutex_lock(&cpu_hotplug.lock);   held by stress-cpu-hotplug_stress.sh

twiddle twiddle twiddle...

INFO: task kworker/0:0:4 blocked for more than 120 seconds.
Re: [ANNOUNCE] 3.14-rt1
On Sat, 2014-04-19 at 16:46 +0200, Mike Galbraith wrote: > Hi Sebastian, > > On Fri, 2014-04-11 at 20:57 +0200, Sebastian Andrzej Siewior wrote: > > Dear RT folks! > > > > I'm pleased to announce the v3.14-rt1 patch set. > > This hunk in hotplug-light-get-online-cpus.patch looks like a bug. > > @@ -333,7 +449,7 @@ static int __ref _cpu_down(unsigned int > /* CPU didn't die: tell everyone. Can't complain. */ > smpboot_unpark_threads(cpu); > cpu_notify_nofail(CPU_DOWN_FAILED | mod, hcpu); > - goto out_release; > + goto out_cancel; > } > BUG_ON(cpu_online(cpu)); ... BTW, the reason I was eyeballing this stuff is because I was highly interested in what you were going to do here... # XXX stomp-machine-deal-clever-with-stopper-lock.patch ...with that bloody lglock. What I did is attached for your amusement. (warning: viewing may induce "Medussa" syndrome:) Hotplug can still deadlock in rt trees too, and will if you beat it hard. The splat below is virgin 3.12-rt (where wonderful lock doesn't yet exist) while running Stevens stress-cpu-hotplug.sh, which is still plenty deadly when liberally applied. [ 161.951908] CPU0 attaching NULL sched-domain. [ 161.970417] CPU2 attaching NULL sched-domain. [ 161.976594] CPU3 attaching NULL sched-domain. 
[ 161.981044] CPU0 attaching sched-domain: [ 161.985010] domain 0: span 0,3 level CPU [ 161.990627] groups: 0 (cpu_power = 997) 3 (cpu_power = 1021) [ 162.000609] CPU3 attaching sched-domain: [ 162.007723] domain 0: span 0,3 level CPU [ 162.012756] groups: 3 (cpu_power = 1021) 0 (cpu_power = 997) [ 162.025533] smpboot: CPU 2 is now offline [ 162.036113] [ 162.036114] == [ 162.036115] [ INFO: possible circular locking dependency detected ] [ 162.036116] 3.12.17-rt25 #14 Not tainted [ 162.036117] --- [ 162.036118] boot.kdump/6853 is trying to acquire lock: [ 162.036126] (&hp->lock){+.+...}, at: [] pin_current_cpu+0x84/0x1d0 [ 162.036126] [ 162.036126] but task is already holding lock: [ 162.036131] (&mm->mmap_sem){+.}, at: [] __do_page_fault+0x14c/0x5d0 [ 162.036132] [ 162.036132] which lock already depends on the new lock. [ 162.036132] [ 162.036133] [ 162.036133] the existing dependency chain (in reverse order) is: [ 162.036135] [ 162.036135] -> #2 (&mm->mmap_sem){+.}: [ 162.036138][] check_prevs_add+0xf8/0x180 [ 162.036140][] validate_chain.isra.45+0x5aa/0x750 [ 162.036142][] __lock_acquire+0x3f6/0x9f0 [ 162.036143][] lock_acquire+0x8c/0x160 [ 162.036146][] might_fault+0x83/0xb0 [ 162.036149][] sel_loadlut+0x11/0x70 [ 162.036152][] tioclinux+0x23d/0x2c0 [ 162.036153][] vt_ioctl+0x86c/0x11f0 [ 162.036155][] tty_ioctl+0x2a8/0x940 [ 162.036158][] do_vfs_ioctl+0x81/0x340 [ 162.036159][] SyS_ioctl+0x4b/0x90 [ 162.036162][] system_call_fastpath+0x16/0x1b [ 162.036164] [ 162.036164] -> #1 (console_lock){+.+.+.}: [ 162.036165][] check_prevs_add+0xf8/0x180 [ 162.036167][] validate_chain.isra.45+0x5aa/0x750 [ 162.036169][] __lock_acquire+0x3f6/0x9f0 [ 162.036171][] lock_acquire+0x8c/0x160 [ 162.036173][] console_lock+0x6f/0x80 [ 162.036174][] console_cpu_notify+0x1d/0x30 [ 162.036176][] notifier_call_chain+0x4d/0x70 [ 162.036179][] __raw_notifier_call_chain+0x9/0x10 [ 162.036181][] __cpu_notify+0x1b/0x30 [ 162.036182][] cpu_notify_nofail+0x10/0x20 [ 162.036185][] 
_cpu_down+0x20d/0x440 [ 162.036186][] cpu_down+0x30/0x50 [ 162.036188][] cpu_subsys_offline+0x1c/0x30 [ 162.036191][] device_offline+0x95/0xc0 [ 162.036192][] online_store+0x40/0x80 [ 162.036194][] dev_attr_store+0x13/0x30 [ 162.036197][] sysfs_write_file+0xf0/0x170 [ 162.036200][] vfs_write+0xc8/0x1d0 [ 162.036202][] SyS_write+0x50/0xa0 [ 162.036203][] system_call_fastpath+0x16/0x1b [ 162.036205] [ 162.036205] -> #0 (&hp->lock){+.+...}: [ 162.036207][] check_prev_add+0x7bd/0x7d0 [ 162.036209][] check_prevs_add+0xf8/0x180 [ 162.036210][] validate_chain.isra.45+0x5aa/0x750 [ 162.036212][] __lock_acquire+0x3f6/0x9f0 [ 162.036214][] lock_acquire+0x8c/0x160 [ 162.036216][] rt_spin_lock+0x55/0x70 [ 162.036218][] pin_current_cpu+0x84/0x1d0 [ 162.036220][] migrate_disable+0x81/0x100 [ 162.036222][] handle_pte_fault+0xf8/0x1c0 [ 162.036223][] __handle_mm_fault+0x106/0x1b0 [ 162.036225][] handle_mm_fault+0x22/0x30 [ 162.036227][] __do_page_fault+0x1b1/0x5d0 [ 162.036229][] do_page_fault+0x9/0x10 [ 162.036230][] page_fault+0x22/0x30 [ 162.036232][] ret_from_f
Re: [ANNOUNCE] 3.14-rt1
On Thu, 2014-04-24 at 09:12 +0200, Sebastian Andrzej Siewior wrote:
> On 04/24/2014 06:06 AM, Mike Galbraith wrote:
> > Turning lockdep on, it says it's busted.
>
> http://www.spinics.net/lists/linux-rt-users/msg11179.html

I was heading toward the same conclusion while regression testing.
Guess I can stop that.

-Mike
Re: [ANNOUNCE] 3.14-rt1
On 04/24/2014 06:06 AM, Mike Galbraith wrote:
> Turning lockdep on, it says it's busted.

http://www.spinics.net/lists/linux-rt-users/msg11179.html

Sebastian
Re: [ANNOUNCE] 3.14-rt1
Turning lockdep on, it says it's busted. (I'll go stare at it, maybe the beast will blink first for a change) [0.00] Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar [0.00] ... MAX_LOCKDEP_SUBCLASSES: 8 [0.00] ... MAX_LOCK_DEPTH: 48 [0.00] ... MAX_LOCKDEP_KEYS:8191 [0.00] ... CLASSHASH_SIZE: 4096 [0.00] ... MAX_LOCKDEP_ENTRIES: 16384 [0.00] ... MAX_LOCKDEP_CHAINS: 32768 [0.00] ... CHAINHASH_SIZE: 16384 [0.00] memory used by lock dependency info: 6367 kB [0.00] per task-struct memory footprint: 2688 bytes [0.00] [0.00] | Locking API testsuite: [0.00] [0.00] | spin |wlock |rlock |mutex | wsem | rsem | [0.00] -- [0.00] A-A deadlock: ok | ok |FAILED| [0.00] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.14.1-rt1 #16 [0.00] Hardware name: MEDIONPC MS-7502/MS-7502, BIOS 6.00 PG 12/26/2007 [0.00] 0002 81a01f28 815e12a5 810b7727 [0.00] 0001 81a01f58 815e1db2 [0.00] 81a01f68 [0.00] Call Trace: [0.00] [] dump_stack+0x4f/0x7c [0.00] [] ? console_trylock_for_printk+0x37/0xf0 [0.00] [] dotest+0x5f/0xc7 [0.00] [] locking_selftest+0xdf/0xb30 [0.00] [] start_kernel+0x215/0x327 [0.00] [] ? repair_env_string+0x5a/0x5a [0.00] [] ? memblock_reserve+0x49/0x4e [0.00] [] x86_64_start_reservations+0x2a/0x2c [0.00] [] x86_64_start_kernel+0xf0/0xf7 [0.00] ok | ok | ok | [0.00] A-B-B-A deadlock: ok | ok |FAILED| [0.00] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.14.1-rt1 #16 [0.00] Hardware name: MEDIONPC MS-7502/MS-7502, BIOS 6.00 PG 12/26/2007 [0.00] 0002 81a01f28 815e12a5 810b7727 [0.00] 0001 81a01f58 815e1db2 [0.00] 81a01f68 [0.00] Call Trace: [0.00] [] dump_stack+0x4f/0x7c [0.00] [] ? console_trylock_for_printk+0x37/0xf0 [0.00] [] dotest+0x5f/0xc7 [0.00] [] locking_selftest+0x16e/0xb30 [0.00] [] start_kernel+0x215/0x327 [0.00] [] ? repair_env_string+0x5a/0x5a [0.00] [] ? 
memblock_reserve+0x49/0x4e [0.00] [] x86_64_start_reservations+0x2a/0x2c [0.00] [] x86_64_start_kernel+0xf0/0xf7 [0.00] ok | ok | ok | [0.00] A-B-B-C-C-A deadlock: ok | ok |FAILED| [0.00] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.14.1-rt1 #16 [0.00] Hardware name: MEDIONPC MS-7502/MS-7502, BIOS 6.00 PG 12/26/2007 [0.00] 0002 81a01f28 815e12a5 810b7727 [0.00] 0001 81a01f58 815e1db2 [0.00] 81a01f68 [0.00] Call Trace: [0.00] [] dump_stack+0x4f/0x7c [0.00] [] ? console_trylock_for_printk+0x37/0xf0 [0.00] [] dotest+0x5f/0xc7 [0.00] [] locking_selftest+0x1fd/0xb30 [0.00] [] start_kernel+0x215/0x327 [0.00] [] ? repair_env_string+0x5a/0x5a [0.00] [] ? memblock_reserve+0x49/0x4e [0.00] [] x86_64_start_reservations+0x2a/0x2c [0.00] [] x86_64_start_kernel+0xf0/0xf7 [0.00] ok | ok | ok | [0.00] A-B-C-A-B-C deadlock: ok | ok |FAILED| [0.00] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.14.1-rt1 #16 [0.00] Hardware name: MEDIONPC MS-7502/MS-7502, BIOS 6.00 PG 12/26/2007 [0.00] 0002 81a01f28 815e12a5 810b7727 [0.00] 0001 81a01f58 815e1db2 [0.00] 81a01f68 [0.00] Call Trace: [0.00] [] dump_stack+0x4f/0x7c [0.00] [] ? console_trylock_for_printk+0x37/0xf0 [0.00] [] dotest+0x5f/0xc7 [0.00] [] locking_selftest+0x28c/0xb30 [0.00] [] start_kernel+0x215/0x327 [0.00] [] ? repair_env_string+0x5a/0x5a [0.00] [] ? memblock_reserve+0x49/0x4e [0.00] [] x86_64_start_reservations+0x2a/0x2c [0.00] [] x86_64_start_kernel+0xf0/0xf7 [0.00] ok | ok | ok | [0.00] A-B-B-C-C-D-D-A deadlock: ok | ok |FAILED| [0.00] CPU: 0 PID: 0 C
Re: [ANNOUNCE] 3.14-rt1
On Wed, 23 Apr 2014 12:37:05 +0200 Mike Galbraith wrote:
> On Fri, 2014-04-11 at 20:57 +0200, Sebastian Andrzej Siewior wrote:
>
> > This -RT series didn't crashed within ~4h testing on my ARM and
> > x86-32.
> > x86-64 crashed after I started hackbench. I figured out that the crash
> > does not happen with lazy-preempt disabled. Therefore the last but one
> > patch in the queue disables lazy preempt on x86-64. With this change the
> > test box survived ~2h without a crash. I look at this later but it looks
> > good now.
>
> I think the below fixes it (in a more or less minimalist way), but it's
> not very pretty. Methinks it would be prettier to either clone the x86
> percpu + fold logic, or neutralize that optimization completely when
> PREEMPT_LAZY is enabled.
>
> x86_32 bit is completely untested, x86_64 hasn't exploded.. yet :)
>

This patch makes sense to me.

Acked-by: Steven Rostedt

-- Steve
Re: [ANNOUNCE] 3.14-rt1
On Fri, 2014-04-11 at 20:57 +0200, Sebastian Andrzej Siewior wrote: > This -RT series didn't crashed within ~4h testing on my ARM and > x86-32. > x86-64 crashed after I started hackbench. I figured out that the crash > does not happen with lazy-preempt disabled. Therefore the last but one > patch in the queue disables lazy preempt on x86-64. With this change the > test box survived ~2h without a crash. I look at this later but it looks > good now. I think the below fixes it (in a more or less minimalist way), but it's not very pretty. Methinks it would be prettier to either clone the x86 percpu + fold logic, or neutralize that optimization completely when PREEMPT_LAZY is enabled. x86_32 bit is completely untested, x86_64 hasn't exploded.. yet :) --- include/linux/preempt.h|3 +-- arch/x86/include/asm/preempt.h |8 arch/x86/kernel/asm-offsets.c |1 + arch/x86/kernel/entry_32.S |9 ++--- arch/x86/kernel/entry_64.S |7 +-- 5 files changed, 21 insertions(+), 7 deletions(-) --- a/include/linux/preempt.h +++ b/include/linux/preempt.h @@ -126,8 +126,7 @@ do { \ #define preempt_enable_notrace() \ do { \ barrier(); \ - if (unlikely(__preempt_count_dec_and_test() || \ - test_thread_flag(TIF_NEED_RESCHED_LAZY))) \ + if (unlikely(__preempt_count_dec_and_test())) \ __preempt_schedule_context(); \ } while (0) #else --- a/arch/x86/include/asm/preempt.h +++ b/arch/x86/include/asm/preempt.h @@ -94,7 +94,11 @@ static __always_inline bool __preempt_co { if (preempt_count_dec_and_test()) return true; +#ifdef CONFIG_PREEMPT_LAZY return test_thread_flag(TIF_NEED_RESCHED_LAZY); +#else + return false; +#endif } /* @@ -102,8 +106,12 @@ static __always_inline bool __preempt_co */ static __always_inline bool should_resched(void) { +#ifdef CONFIG_PREEMPT_LAZY return unlikely(!__this_cpu_read_4(__preempt_count) || \ test_thread_flag(TIF_NEED_RESCHED_LAZY)); +#else + return unlikely(!__this_cpu_read_4(__preempt_count)); +#endif } #ifdef CONFIG_PREEMPT --- a/arch/x86/kernel/asm-offsets.c +++ 
b/arch/x86/kernel/asm-offsets.c @@ -72,4 +72,5 @@ void common(void) { BLANK(); DEFINE(PTREGS_SIZE, sizeof(struct pt_regs)); + DEFINE(_PREEMPT_ENABLED, PREEMPT_ENABLED); } --- a/arch/x86/kernel/entry_32.S +++ b/arch/x86/kernel/entry_32.S @@ -365,19 +365,22 @@ ENTRY(resume_kernel) need_resched: # preempt count == 0 + NEED_RS set? cmpl $0,PER_CPU_VAR(__preempt_count) +#ifndef CONFIG_PREEMPT_LAZY + jnz restore_all +#else jz test_int_off # atleast preempt count == 0 ? - cmpl $_TIF_NEED_RESCHED,PER_CPU_VAR(__preempt_count) + cmpl $_PREEMPT_ENABLED,PER_CPU_VAR(__preempt_count) jne restore_all cmpl $0,TI_preempt_lazy_count(%ebp) # non-zero preempt_lazy_count ? jnz restore_all - testl $_TIF_NEED_RESCHED_LAZY, %ecx + testl $_TIF_NEED_RESCHED_LAZY, TI_flags(%ebp) jz restore_all - test_int_off: +#endif testl $X86_EFLAGS_IF,PT_EFLAGS(%esp)# interrupts off (exception path) ? jz restore_all call preempt_schedule_irq --- a/arch/x86/kernel/entry_64.S +++ b/arch/x86/kernel/entry_64.S @@ -1104,10 +1104,13 @@ ENTRY(native_iret) /* rcx: threadinfo. interrupts off. */ ENTRY(retint_kernel) cmpl $0,PER_CPU_VAR(__preempt_count) +#ifndef CONFIG_PREEMPT_LAZY + jnz retint_restore_args +#else jz check_int_off # atleast preempt count == 0 ? - cmpl $_TIF_NEED_RESCHED,PER_CPU_VAR(__preempt_count) + cmpl $_PREEMPT_ENABLED,PER_CPU_VAR(__preempt_count) jnz retint_restore_args cmpl $0, TI_preempt_lazy_count(%rcx) @@ -1115,8 +1118,8 @@ ENTRY(retint_kernel) bt $TIF_NEED_RESCHED_LAZY,TI_flags(%rcx) jnc retint_restore_args - check_int_off: +#endif bt $9,EFLAGS-ARGOFFSET(%rsp) /* interrupts off? */ jnc retint_restore_args call preempt_schedule_irq -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [ANNOUNCE] 3.14-rt1
On Sat, 2014-04-19 at 16:46 +0200, Mike Galbraith wrote:
> Hi Sebastian,
>
> On Fri, 2014-04-11 at 20:57 +0200, Sebastian Andrzej Siewior wrote:
> > Dear RT folks!
> >
> > I'm pleased to announce the v3.14-rt1 patch set.
>
> This hunk in hotplug-light-get-online-cpus.patch looks like a bug.
>
> @@ -333,7 +449,7 @@ static int __ref _cpu_down(unsigned int
> 		/* CPU didn't die: tell everyone.  Can't complain. */
> 		smpboot_unpark_threads(cpu);
> 		cpu_notify_nofail(CPU_DOWN_FAILED | mod, hcpu);
> -		goto out_release;
> +		goto out_cancel;
> 	}
> 	BUG_ON(cpu_online(cpu));

Another little bug.  This hunk of patches/stomp-machine-raw-lock.patch
should be while (atomic_read(&done.nr_todo))

@@ -647,7 +671,7 @@ int stop_machine_from_inactive_cpu(int (
 	ret = multi_cpu_stop(&msdata);

 	/* Busy wait for completion. */
-	while (!completion_done(&done.completion))
+	while (!atomic_read(&done.nr_todo))
 		cpu_relax();

 	mutex_unlock(&stop_cpus_mutex);
Re: [ANNOUNCE] 3.14-rt1
Hi Sebastian,

On Fri, 2014-04-11 at 20:57 +0200, Sebastian Andrzej Siewior wrote:
> Dear RT folks!
>
> I'm pleased to announce the v3.14-rt1 patch set.

This hunk in hotplug-light-get-online-cpus.patch looks like a bug.

@@ -333,7 +449,7 @@ static int __ref _cpu_down(unsigned int
 		/* CPU didn't die: tell everyone.  Can't complain. */
 		smpboot_unpark_threads(cpu);
 		cpu_notify_nofail(CPU_DOWN_FAILED | mod, hcpu);
-		goto out_release;
+		goto out_cancel;
 	}
 	BUG_ON(cpu_online(cpu));

> x86-64 crashed after I started hackbench. I figured out that the crash
> does not happen with lazy-preempt disabled. Therefore the last but one
> patch in the queue disables lazy preempt on x86-64. With this change the
> test box survived ~2h without a crash. I look at this later but it looks
> good now.

Ah, I had trouble there a while back too.  I'll try to scrape up cycles
for a round 2, see who begs for mercy this time, it or me again.

	-Mike
Re: [ANNOUNCE] 3.14-rt1
On 11.04.2014 22:57, Sebastian Andrzej Siewior wrote:
> Dear RT folks!
>
> I'm pleased to announce the v3.14-rt1 patch set.

Hurray!

--
Pavel.