Re: [PATCH] xen: add steal_clock support on x86

2016-05-18 Thread Juergen Gross
On 18/05/16 17:45, David Vrabel wrote:
> On 18/05/16 16:42, Juergen Gross wrote:
>> On 18/05/16 17:25, Boris Ostrovsky wrote:
>>> On 05/18/2016 10:53 AM, Juergen Gross wrote:
 On 18/05/16 16:46, Boris Ostrovsky wrote:
> On 05/18/2016 08:15 AM, Juergen Gross wrote:
>>  }
>>  
>> +void __init xen_time_setup_guest(void)
>> +{
>> +	pv_time_ops.steal_clock = xen_steal_clock;
>> +
>> +	static_key_slow_inc(&paravirt_steal_enabled);
>> +	/*
>> +	 * We can't set paravirt_steal_rq_enabled as this would require the
>> +	 * capability to read another cpu's runstate info.
>> +	 */
>> +}
> Won't we be accounting for stolen cycles twice now --- once from
> steal_account_process_tick()->steal_clock() and second time from
> do_stolen_accounting()?
 Uuh, yes.

 I guess I should rip do_stolen_accounting() out, too? 
>>>
>>> I don't think PARAVIRT_TIME_ACCOUNTING is always selected for Xen. If
>>
>> This is easy to accomplish. :-)
>>
>>> that's indeed the case then we should ifndef do_stolen_accounting(). Or
>>> maybe check for paravirt_steal_enabled.
>>
>> Is this really a sensible thing to do? There is a mechanism used by KVM
>> to do the stolen accounting. I think we should use it instead of having
>> a second implementation doing the same thing in case the generic one
>> isn't enabled.
> 
> I agree.
> 
> Although I don't think selecting PARAVIRT_TIME_ACCOUNTING is necessary -- I
> don't think it's essential (or is it?).

Not doing so will change behavior in case I rip out
do_stolen_accounting(). What about "default y if XEN"?


Juergen
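
For context, the generic mechanism referred to in this thread is the
scheduler's steal-time hook. Once paravirt_steal_enabled is set,
steal_account_process_tick() in kernel/sched/cputime.c consumes the
registered steal_clock op roughly like this (a simplified sketch;
cputime conversion and clamping are omitted):

	if (static_key_false(&paravirt_steal_enabled)) {
		u64 steal;

		/* Ask the backend (here: xen_steal_clock) for the
		 * cumulative stolen time of this CPU. */
		steal = paravirt_steal_clock(smp_processor_id());
		steal -= this_rq()->prev_steal_time;
		this_rq()->prev_steal_time += steal;

		/* These are the same cycles do_stolen_accounting()
		 * counts, hence the double accounting noted above. */
		account_steal_time(steal);
	}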



Re: [PATCH] xen: add steal_clock support on x86

2016-05-18 Thread David Vrabel
On 18/05/16 16:42, Juergen Gross wrote:
> On 18/05/16 17:25, Boris Ostrovsky wrote:
>> On 05/18/2016 10:53 AM, Juergen Gross wrote:
>>> On 18/05/16 16:46, Boris Ostrovsky wrote:
 On 05/18/2016 08:15 AM, Juergen Gross wrote:
>  }
>  
> +void __init xen_time_setup_guest(void)
> +{
> +	pv_time_ops.steal_clock = xen_steal_clock;
> +
> +	static_key_slow_inc(&paravirt_steal_enabled);
> +	/*
> +	 * We can't set paravirt_steal_rq_enabled as this would require the
> +	 * capability to read another cpu's runstate info.
> +	 */
> +}
 Won't we be accounting for stolen cycles twice now --- once from
 steal_account_process_tick()->steal_clock() and second time from
 do_stolen_accounting()?
>>> Uuh, yes.
>>>
>>> I guess I should rip do_stolen_accounting() out, too? 
>>
>> I don't think PARAVIRT_TIME_ACCOUNTING is always selected for Xen. If
> 
> This is easy to accomplish. :-)
> 
>> that's indeed the case then we should ifndef do_stolen_accounting(). Or
>> maybe check for paravirt_steal_enabled.
> 
> Is this really a sensible thing to do? There is a mechanism used by KVM
> to do the stolen accounting. I think we should use it instead of having
> a second implementation doing the same thing in case the generic one
> isn't enabled.

I agree.

Although I don't think selecting PARAVIRT_TIME_ACCOUNTING is necessary -- I
don't think it's essential (or is it?).

David


Re: Crashes in -next due to 'phy: add support for a reset-gpio specification'

2016-05-18 Thread Guenter Roeck
On Tue, May 17, 2016 at 10:01:37PM -0700, Florian Fainelli wrote:
> Le 17/05/2016 21:37, Guenter Roeck a écrit :
> > Hi,
> > 
> > my xtensa qemu tests crash in -next as follows.
> > 
> > [ ... ]
> > 
> > [9.366256] libphy: ethoc-mdio: probed
> > [9.367389]  (null): could not attach to PHY
> > [9.368555]  (null): failed to probe MDIO bus
> > [9.371540] Unable to handle kernel paging request at virtual address
> > 001c
> > [9.371540]  pc = d0320926, ra = 903209d1
> > [9.375358] Oops: sig: 11 [#1]
> > [9.376081] PREEMPT
> > [9.377080] CPU: 0 PID: 1 Comm: swapper Not tainted
> > 4.6.0-next-20160517 #1
> > [9.378397] task: d7c2c000 ti: d7c3 task.ti: d7c3
> > [9.379394] a00: 903209d1 d7c31bd0 d7fb5810 0001 
> >  d7f45c00 d7c31bd0
> > [9.382298] a08:     00060100
> > d04b0c10 d7f45dfc d7c31bb0
> > [9.385732] pc: d0320926, ps: 00060110, depc: 0018, excvaddr:
> > 001c
> > [9.387061] lbeg: d0322e35, lend: d0322e57 lcount: , sar:
> > 0011
> > [9.388173]
> > Stack: d7c31be0 00060700 d7f45c00 d7c31bd0 9021d509 d7c31c30 d7f45c00
> > 
> >d0485dcc d0485dcc d7fb5810 d7c2c000  d7c31c30 d7f45c00
> > d025befc
> >d0485dcc d7c3 d7f45c34 d7c31bf0 9021c985 d7c31c50 d7f45c00
> > d7f45c34
> > [9.396652] Call Trace:
> > [9.397469]  [] __device_release_driver+0x7d/0x98
> > [9.398869]  [] device_release_driver+0x15/0x20
> > [9.400247]  [] bus_remove_device+0xc1/0xd4
> > [9.401569]  [] device_del+0x109/0x15c
> > [9.402794]  [] phy_mdio_device_remove+0xd/0x18
> > [9.404124]  [] mdiobus_unregister+0x40/0x5c
> > [9.405444]  [] ethoc_probe+0x534/0x5b8
> > [9.406742]  [] platform_drv_probe+0x28/0x48
> > [9.408122]  [] driver_probe_device+0x101/0x234
> > [9.409499]  [] __driver_attach+0x7d/0x98
> > [9.410809]  [] bus_for_each_dev+0x30/0x5c
> > [9.412104]  [] driver_attach+0x14/0x18
> > [9.413385]  [] bus_add_driver+0xc9/0x198
> > [9.414686]  [] driver_register+0x70/0xa0
> > [9.416001]  [] __platform_driver_register+0x24/0x28
> > [9.417463]  [] ethoc_driver_init+0x10/0x14
> > [9.418824]  [] do_one_initcall+0x80/0x1ac
> > [9.420083]  [] kernel_init_freeable+0x131/0x198
> > [9.421504]  [] kernel_init+0xc/0xb0
> > [9.422693]  [] ret_from_kernel_thread+0x8/0xc
> > 
> > Bisect points to commit da47b4572056 ("phy: add support for a reset-gpio
> > specification").
> > Bisect log is attached. Reverting the patch fixes the problem.
> 
> Aside from what you pointed out, this patch was still in discussion when
> it got merged, since we got a concurrent patch from Sergei which tries
> to deal with the same kind of problem.
> 
> Do you mind sending a revert, or should I do that first thing in the morning?
> 
I don't think I'll find the time to do that today, and also I would like
to hear from Dave what his preferences are.

Note that this now also affects mainline.

Guenter


Re: [PATCH] xen: add steal_clock support on x86

2016-05-18 Thread Boris Ostrovsky
On 05/18/2016 10:53 AM, Juergen Gross wrote:
> On 18/05/16 16:46, Boris Ostrovsky wrote:
>> On 05/18/2016 08:15 AM, Juergen Gross wrote:
>>>  }
>>>  
>>> +void __init xen_time_setup_guest(void)
>>> +{
>>> +	pv_time_ops.steal_clock = xen_steal_clock;
>>> +
>>> +	static_key_slow_inc(&paravirt_steal_enabled);
>>> +	/*
>>> +	 * We can't set paravirt_steal_rq_enabled as this would require the
>>> +	 * capability to read another cpu's runstate info.
>>> +	 */
>>> +}
>> Won't we be accounting for stolen cycles twice now --- once from
>> steal_account_process_tick()->steal_clock() and second time from
>> do_stolen_accounting()?
> Uuh, yes.
>
> I guess I should rip do_stolen_accounting() out, too? 

I don't think PARAVIRT_TIME_ACCOUNTING is always selected for Xen. If
that's indeed the case then we should ifndef do_stolen_accounting(). Or
maybe check for paravirt_steal_enabled.

-boris

> It is a
> Xen-specific hack, so I guess nobody will cry. Maybe it would be a
> good idea to select CONFIG_PARAVIRT_TIME_ACCOUNTING for XEN then?
>
>
> Juergen
>




Re: [PATCH] xen: add steal_clock support on x86

2016-05-18 Thread Juergen Gross
On 18/05/16 17:25, Boris Ostrovsky wrote:
> On 05/18/2016 10:53 AM, Juergen Gross wrote:
>> On 18/05/16 16:46, Boris Ostrovsky wrote:
>>> On 05/18/2016 08:15 AM, Juergen Gross wrote:
  }
  
 +void __init xen_time_setup_guest(void)
 +{
 +	pv_time_ops.steal_clock = xen_steal_clock;
 +
 +	static_key_slow_inc(&paravirt_steal_enabled);
 +	/*
 +	 * We can't set paravirt_steal_rq_enabled as this would require the
 +	 * capability to read another cpu's runstate info.
 +	 */
 +}
>>> Won't we be accounting for stolen cycles twice now --- once from
>>> steal_account_process_tick()->steal_clock() and second time from
>>> do_stolen_accounting()?
>> Uuh, yes.
>>
>> I guess I should rip do_stolen_accounting() out, too? 
> 
> I don't think PARAVIRT_TIME_ACCOUNTING is always selected for Xen. If

This is easy to accomplish. :-)

> that's indeed the case then we should ifndef do_stolen_accounting(). Or
> maybe check for paravirt_steal_enabled.

Is this really a sensible thing to do? There is a mechanism used by KVM
to do the stolen accounting. I think we should use it instead of having
a second implementation doing the same thing in case the generic one
isn't enabled.

Juergen


Re: [PATCH 2/2] MIPS: CPS: Copy EVA configuration when starting secondary VPs.

2016-05-18 Thread Matt Redfearn



On 18/05/16 16:04, Paul Burton wrote:

On Wed, May 18, 2016 at 03:45:22PM +0100, Matt Redfearn wrote:

When starting secondary VPEs which support EVA and the SegCtl registers,
copy the memory segmentation configuration from the running VPE to ensure
that all VPEs in the core have a consistent virtual memory map.

The EVA configuration of secondary cores is dealt with when starting the
core via the CM.

Signed-off-by: Matt Redfearn 
---

  arch/mips/kernel/cps-vec.S | 16 
  1 file changed, 16 insertions(+)

diff --git a/arch/mips/kernel/cps-vec.S b/arch/mips/kernel/cps-vec.S
index ac81edd44563..07b3274c8ae1 100644
--- a/arch/mips/kernel/cps-vec.S
+++ b/arch/mips/kernel/cps-vec.S
@@ -431,6 +431,22 @@ LEAF(mips_cps_boot_vpes)
	mfc0	t0, CP0_CONFIG
	mttc0	t0, CP0_CONFIG

+	/* Copy the EVA config from this VPE if the CPU supports it */
+	mfc0	t0, CP0_CONFIG, 1
+	bgez	t0, 1f
+	 mfc0	t0, CP0_CONFIG, 2
+	bgez	t0, 1f
+	 mfc0	t0, CP0_CONFIG, 3
+	and	t0, t0, MIPS_CONF3_SC
+	beqz	t0, 1f
+	 nop

Hi Matt,

The checks here aren't *quite* right since they do the mfc0 of the next
register in the delay slot which will happen even if the M bit of the
preceding register wasn't set. There are other cases in cps-vec.S where
I've made that mistake... Luckily, in this particular case, we know that
we have MT ASE support which means we know that Config3 exists. So I
think you can just remove the checks of Config1.M & Config2.M and just
read Config3 straight away.

Good point - thanks Paul :-)

Matt


Thanks,
 Paul


+	mfc0	t0, CP0_SEGCTL0
+	mttc0	t0, CP0_SEGCTL0
+	mfc0	t0, CP0_SEGCTL1
+	mttc0	t0, CP0_SEGCTL1
+	mfc0	t0, CP0_SEGCTL2
+	mttc0	t0, CP0_SEGCTL2
+1:
/* Ensure no software interrupts are pending */
mttc0   zero, CP0_CAUSE
mttc0   zero, CP0_STATUS
--
2.5.0
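
For reference, with the Config1.M/Config2.M checks dropped as Paul
suggests, the added sequence would shrink to something like this (a
sketch of the suggested simplification, not a posted revision):

	/* MT ASE support implies Config3 exists, so test Config3.SC
	 * directly instead of walking the Config chain. */
	mfc0	t0, CP0_CONFIG, 3
	and	t0, t0, MIPS_CONF3_SC
	beqz	t0, 1f
	 nop
	mfc0	t0, CP0_SEGCTL0
	mttc0	t0, CP0_SEGCTL0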





[PATCH V3] drm/tegra: Fix crash caused by reference count imbalance

2016-05-18 Thread Jon Hunter
Commit d2307dea14a4 ("drm/atomic: use connector references (v3)") added
reference counting for DRM connectors and this caused a crash when
exercising system suspend on Tegra114 Dalmore.

The Tegra DSI driver implements a Tegra specific function,
tegra_dsi_connector_duplicate_state(), to duplicate the connector state
and destroys the state using the generic helper function,
drm_atomic_helper_connector_destroy_state(). Following commit
d2307dea14a4 ("drm/atomic: use connector references (v3)") there is
now an imbalance in the connector reference count because the Tegra
function to duplicate state does not take a reference when duplicating
the state information. However, the generic helper function to destroy
the state information assumes a reference has been taken and during
system suspend, when the connector state is destroyed, this leads to a
crash because we attempt to put the reference for an object that has
already been freed.

Fix this by calling __drm_atomic_helper_connector_duplicate_state() from
tegra_dsi_connector_duplicate_state() to ensure that we take a reference
on a connector if crtc is set. Note that this will also copy the
connector state a 2nd time, but this should be harmless.

By fixing tegra_dsi_connector_duplicate_state() to take a reference,
although a crash was no longer seen, it was then observed that after
each system suspend-resume cycle, the reference would be one greater
than before the suspend-resume cycle. Following commit d2307dea14a4
("drm/atomic: use connector references (v3)"), it was found that we
also need to put the reference when calling the function
tegra_dsi_connector_reset() before freeing the state. Fix this by
updating tegra_dsi_connector_reset() to call the function
__drm_atomic_helper_connector_destroy_state() in order to put the
reference for the connector.

Fixes: d2307dea14a4 ("drm/atomic: use connector references (v3)")

Signed-off-by: Jon Hunter <jonath...@nvidia.com>
Reviewed-by: Daniel Vetter <daniel.vet...@ffwll.ch>
Acked-by: Thierry Reding <tred...@nvidia.com>
---

V3 changes:
- Dropped WARN_ON

V2 changes:
- Updated to next-20160518
- Replaced open coding of call to drm_connector_reference() with
  __drm_atomic_helper_connector_duplicate_state() per Daniel's feedback.

 drivers/gpu/drm/tegra/dsi.c | 15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/tegra/dsi.c b/drivers/gpu/drm/tegra/dsi.c
index 44e102799195..d1239ebc190f 100644
--- a/drivers/gpu/drm/tegra/dsi.c
+++ b/drivers/gpu/drm/tegra/dsi.c
@@ -745,13 +745,17 @@ static void tegra_dsi_soft_reset(struct tegra_dsi *dsi)
 
 static void tegra_dsi_connector_reset(struct drm_connector *connector)
 {
-   struct tegra_dsi_state *state =
-   kzalloc(sizeof(*state), GFP_KERNEL);
+   struct tegra_dsi_state *state = kzalloc(sizeof(*state), GFP_KERNEL);
 
-	if (state) {
+	if (!state)
+		return;
+
+	if (connector->state) {
+		__drm_atomic_helper_connector_destroy_state(connector->state);
 		kfree(connector->state);
-		__drm_atomic_helper_connector_reset(connector, &state->base);
 	}
+
+	__drm_atomic_helper_connector_reset(connector, &state->base);
 }
 
 static struct drm_connector_state *
@@ -764,6 +768,9 @@ tegra_dsi_connector_duplicate_state(struct drm_connector *connector)
 	if (!copy)
 		return NULL;
 
+	__drm_atomic_helper_connector_duplicate_state(connector,
+						      &copy->base);
+
 	return &copy->base;
 }
 
-- 
2.1.4
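
The pairing the fix relies on looks roughly like this in the DRM
atomic helpers of that era (a simplified sketch showing only the
reference handling; see commit d2307dea14a4 for the real code):

	void __drm_atomic_helper_connector_duplicate_state(struct drm_connector *connector,
							   struct drm_connector_state *state)
	{
		memcpy(state, connector->state, sizeof(*state));
		if (state->crtc)
			drm_connector_reference(connector);	/* get */
	}

	void __drm_atomic_helper_connector_destroy_state(struct drm_connector_state *state)
	{
		if (state->crtc)
			drm_connector_unreference(state->connector);	/* put */
	}

Because the get happens in the duplicate helper and the put in the
destroy helper, a driver that open-codes one side while using the
generic helper for the other ends up with exactly the imbalance
described above.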




Re: [PATCH v6 16/20] IB/fmr_pool: Convert the cleanup thread into kthread worker API

2016-05-18 Thread Doug Ledford
On 04/14/2016 11:14 AM, Petr Mladek wrote:
> Kthreads are currently implemented as an infinite loop. Each
> has its own variant of checks for terminating, freezing, and
> awakening. In many cases it is unclear which state a kthread
> is in, and sometimes the checks are done the wrong way.
> 
> The plan is to convert kthreads into the kthread_worker or workqueue
> API. It allows splitting the functionality into separate operations
> and helps to give it a better structure. It also defines a clean state
> in which no locks are taken and no IRQs are blocked, so the kthread
> might sleep or even be safely migrated.
> 
> The kthread worker API is useful when we want to have a dedicated
> single thread for the work. It helps to make sure that it is
> available when needed. It also allows better control, e.g. defining
> a scheduling priority.
> 
> This patch converts the fmr_pool kthread into the kthread worker
> API because I am not sure how busy the thread is. It may well be
> that it does not need a dedicated kthread and a workqueue would be
> perfectly fine. Either way, the conversion between the kthread
> worker API and workqueues is pretty trivial.
> 
> The patch moves one iteration from the kthread into the work function.
> It preserves the check for a spurious queuing (wake up). Then it
> processes one request. Finally, it re-queues itself if more requests
> are pending.
> 
> Otherwise, wake_up_process() is replaced by queuing the work.
> 
> Important: The change is only compile tested. I did not find an easy
> way to check it in real life.

I had to do some digging myself to figure out how to move forward on
this patch.  The issue is that your conversion touches the fmr_pool code
as your target.  That code is slowly being phased out.  Right now, only
two drivers in the IB core support fmr: mthca and mlx4.  The generally
preferred method of mem management is fr instead of fmr.  The mlx4
driver supports both fr and fmr, while the mthca driver is fmr only.  All
of the other drivers are fr only.  The only code that uses the fmr pools
is the upper layer iSER and SRP drivers.  So, if you have mthca
hardware, you can test fmr using either iSER or SRP clients.  If you
have mlx4 hardware, you can still test fmr by using the SRP client and
setting prefer_fr to false when you load the module.

Now that I know that, I can provide testing of this patch when the
overall patchset is ready to be submitted next.

-- 
Doug Ledford 
  GPG KeyID: 0E572FDD
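
For readers unfamiliar with the target API, the shape of the conversion
Petr describes looks roughly like this (a sketch using the current
kthread worker API names; the pool state and helpers are illustrative,
not the actual fmr_pool code):

	#include <linux/kthread.h>
	#include <linux/list.h>

	static struct kthread_worker *cleanup_worker;
	static struct kthread_work cleanup_work;
	static LIST_HEAD(dirty_list);		/* hypothetical pool state */

	static void process_one_request(void);	/* hypothetical: one iteration
						 * of the old kthread loop */

	static void fmr_cleanup_fn(struct kthread_work *work)
	{
		/* Preserve the old check for a spurious queuing/wake-up. */
		if (list_empty(&dirty_list))
			return;

		process_one_request();

		/* Re-queue ourselves instead of looping if work remains. */
		if (!list_empty(&dirty_list))
			kthread_queue_work(cleanup_worker, &cleanup_work);
	}

	/* Setup replaces kthread_run(); every former wake_up_process(thread)
	 * becomes kthread_queue_work(cleanup_worker, &cleanup_work). */
	static int __init fmr_cleanup_setup(void)
	{
		cleanup_worker = kthread_create_worker(0, "fmr_cleanup");
		if (IS_ERR(cleanup_worker))
			return PTR_ERR(cleanup_worker);
		kthread_init_work(&cleanup_work, fmr_cleanup_fn);
		return 0;
	}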







[RFC PATCH 10/15] x86: Use ISO atomics

2016-05-18 Thread David Howells
Make x86 use the ISO intrinsic atomics.

This boots fine, however it can't NOP out the LOCK prefixes if the number
of online CPUs is 1.

Without this patch, according to size -A, .text for my test kernel is:

.text 6268981   18446744071578845184

with this patch:

.text 6268277   18446744071578845184

There are still some underoptimisations to be dealt with.

Signed-off-by: David Howells 
---

 arch/x86/include/asm/atomic.h |  224 +
 1 file changed, 4 insertions(+), 220 deletions(-)

diff --git a/arch/x86/include/asm/atomic.h b/arch/x86/include/asm/atomic.h
index 3e8674288198..bec5c111b82e 100644
--- a/arch/x86/include/asm/atomic.h
+++ b/arch/x86/include/asm/atomic.h
@@ -13,230 +13,14 @@
  * resource counting etc..
  */
 
-#define ATOMIC_INIT(i) { (i) }
-
-/**
- * atomic_read - read atomic variable
- * @v: pointer of type atomic_t
- *
- * Atomically reads the value of @v.
- */
-static __always_inline int atomic_read(const atomic_t *v)
-{
-   return READ_ONCE((v)->counter);
-}
-
-/**
- * atomic_set - set atomic variable
- * @v: pointer of type atomic_t
- * @i: required value
- *
- * Atomically sets the value of @v to @i.
- */
-static __always_inline void atomic_set(atomic_t *v, int i)
-{
-   WRITE_ONCE(v->counter, i);
-}
-
-/**
- * atomic_add - add integer to atomic variable
- * @i: integer value to add
- * @v: pointer of type atomic_t
- *
- * Atomically adds @i to @v.
- */
-static __always_inline void atomic_add(int i, atomic_t *v)
-{
-   asm volatile(LOCK_PREFIX "addl %1,%0"
-: "+m" (v->counter)
-: "ir" (i));
-}
-
-/**
- * atomic_sub - subtract integer from atomic variable
- * @i: integer value to subtract
- * @v: pointer of type atomic_t
- *
- * Atomically subtracts @i from @v.
- */
-static __always_inline void atomic_sub(int i, atomic_t *v)
-{
-   asm volatile(LOCK_PREFIX "subl %1,%0"
-: "+m" (v->counter)
-: "ir" (i));
-}
-
-/**
- * atomic_sub_and_test - subtract value from variable and test result
- * @i: integer value to subtract
- * @v: pointer of type atomic_t
- *
- * Atomically subtracts @i from @v and returns
- * true if the result is zero, or false for all
- * other cases.
- */
-static __always_inline int atomic_sub_and_test(int i, atomic_t *v)
-{
-   GEN_BINARY_RMWcc(LOCK_PREFIX "subl", v->counter, "er", i, "%0", "e");
-}
-
-/**
- * atomic_inc - increment atomic variable
- * @v: pointer of type atomic_t
- *
- * Atomically increments @v by 1.
- */
-static __always_inline void atomic_inc(atomic_t *v)
-{
-   asm volatile(LOCK_PREFIX "incl %0"
-: "+m" (v->counter));
-}
-
-/**
- * atomic_dec - decrement atomic variable
- * @v: pointer of type atomic_t
- *
- * Atomically decrements @v by 1.
- */
-static __always_inline void atomic_dec(atomic_t *v)
-{
-   asm volatile(LOCK_PREFIX "decl %0"
-: "+m" (v->counter));
-}
-
-/**
- * atomic_dec_and_test - decrement and test
- * @v: pointer of type atomic_t
- *
- * Atomically decrements @v by 1 and
- * returns true if the result is 0, or false for all other
- * cases.
- */
-static __always_inline int atomic_dec_and_test(atomic_t *v)
-{
-   GEN_UNARY_RMWcc(LOCK_PREFIX "decl", v->counter, "%0", "e");
-}
-
-/**
- * atomic_inc_and_test - increment and test
- * @v: pointer of type atomic_t
- *
- * Atomically increments @v by 1
- * and returns true if the result is zero, or false for all
- * other cases.
- */
-static __always_inline int atomic_inc_and_test(atomic_t *v)
-{
-   GEN_UNARY_RMWcc(LOCK_PREFIX "incl", v->counter, "%0", "e");
-}
-
-/**
- * atomic_add_negative - add and test if negative
- * @i: integer value to add
- * @v: pointer of type atomic_t
- *
- * Atomically adds @i to @v and returns true
- * if the result is negative, or false when
- * result is greater than or equal to zero.
- */
-static __always_inline int atomic_add_negative(int i, atomic_t *v)
-{
-   GEN_BINARY_RMWcc(LOCK_PREFIX "addl", v->counter, "er", i, "%0", "s");
-}
-
-/**
- * atomic_add_return - add integer and return
- * @i: integer value to add
- * @v: pointer of type atomic_t
- *
- * Atomically adds @i to @v and returns @i + @v
- */
-static __always_inline int atomic_add_return(int i, atomic_t *v)
-{
-	return i + xadd(&v->counter, i);
-}
-
-/**
- * atomic_sub_return - subtract integer and return
- * @v: pointer of type atomic_t
- * @i: integer value to subtract
- *
- * Atomically subtracts @i from @v and returns @v - @i
- */
-static __always_inline int atomic_sub_return(int i, atomic_t *v)
-{
-   return atomic_add_return(-i, v);
-}
-
-#define atomic_inc_return(v)  (atomic_add_return(1, v))
-#define atomic_dec_return(v)  (atomic_sub_return(1, v))
-
-static __always_inline int atomic_cmpxchg(atomic_t *v, int old, int new)
-{
-	return cmpxchg(&v->counter, old, new);
-}
-
-static inline int atomic_xchg(atomic_t *v, int new)
-{
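
For comparison, the ISO replacements that stand in for the deleted
inline assembly follow the pattern of the iso-atomic headers added
earlier in the series. A rough sketch using the GCC/Clang __atomic
builtins (not David's actual header):

	static __always_inline int atomic_read(const atomic_t *v)
	{
		return __atomic_load_n(&v->counter, __ATOMIC_RELAXED);
	}

	static __always_inline void atomic_add(int i, atomic_t *v)
	{
		/* No value is returned, so relaxed ordering suffices. */
		__atomic_fetch_add(&v->counter, i, __ATOMIC_RELAXED);
	}

	static __always_inline int atomic_add_return(int i, atomic_t *v)
	{
		return __atomic_add_fetch(&v->counter, i, __ATOMIC_SEQ_CST);
	}

	static __always_inline int atomic_cmpxchg(atomic_t *v, int old, int new)
	{
		/* On failure the builtin writes the observed value into
		 * 'old', so returning 'old' matches cmpxchg() semantics. */
		__atomic_compare_exchange_n(&v->counter, &old, new, false,
					    __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
		return old;
	}

The compiler emits the lock-prefixed instructions itself here, which is
why, as noted above, the kernel can no longer patch out the LOCK
prefixes when only one CPU is online.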
[RFC PATCH 06/15] Provide 16-bit ISO atomics

2016-05-18 Thread David Howells
Provide an implementation of atomic_inc_short() using ISO atomics.

Signed-off-by: David Howells 
---

 include/asm-generic/iso-atomic16.h |   27 +++
 1 file changed, 27 insertions(+)
 create mode 100644 include/asm-generic/iso-atomic16.h

diff --git a/include/asm-generic/iso-atomic16.h b/include/asm-generic/iso-atomic16.h
new file mode 100644
index ..383baaae7208
--- /dev/null
+++ b/include/asm-generic/iso-atomic16.h
@@ -0,0 +1,27 @@
+/* Use ISO C++11 intrinsics to implement 16-bit atomic ops.
+ *
+ * Copyright (C) 2016 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowe...@redhat.com)
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public Licence
+ * as published by the Free Software Foundation; either version
+ * 2 of the Licence, or (at your option) any later version.
+ */
+
+#ifndef _ASM_GENERIC_ISO_ATOMIC16_H
+#define _ASM_GENERIC_ISO_ATOMIC16_H
+
+/**
+ * atomic_inc_short - increment of a short integer
+ * @v: pointer of type short int
+ *
+ * Atomically adds 1 to @v
+ * Returns the new value of @v
+ */
+static __always_inline short int atomic_inc_short(short int *v)
+{
+   return __atomic_add_fetch(v, 1, __ATOMIC_SEQ_CST);
+}
+
+#endif /* _ASM_GENERIC_ISO_ATOMIC16_H */



[RFC PATCH 11/15] x86: Use ISO bitops

2016-05-18 Thread David Howells
Make x86 use the ISO intrinsic bitops.

This boots fine, however it can't NOP out the LOCK prefixes if the number
of online CPUs is 1.

Without this patch, according to size -A, .text for my test kernel is:

.text 6268277   18446744071578845184

with this patch:

.text 6273589   18446744071578845184

There are still some underoptimisations to be dealt with.

Signed-off-by: David Howells 
---

 arch/x86/include/asm/bitops.h |  143 +
 1 file changed, 2 insertions(+), 141 deletions(-)

diff --git a/arch/x86/include/asm/bitops.h b/arch/x86/include/asm/bitops.h
index 7766d1cf096e..f89aec3a41d2 100644
--- a/arch/x86/include/asm/bitops.h
+++ b/arch/x86/include/asm/bitops.h
@@ -25,6 +25,8 @@
 # error "Unexpected BITS_PER_LONG"
 #endif
 
+#include 
+
 #define BIT_64(n)  (U64_C(1) << (n))
 
 /*
@@ -54,35 +56,6 @@
 #define CONST_MASK(nr) (1 << ((nr) & 7))
 
 /**
- * set_bit - Atomically set a bit in memory
- * @nr: the bit to set
- * @addr: the address to start counting from
- *
- * This function is atomic and may not be reordered.  See __set_bit()
- * if you do not require the atomic guarantees.
- *
- * Note: there are no guarantees that this function will not be reordered
- * on non x86 architectures, so if you are writing portable code,
- * make sure not to rely on its reordering guarantees.
- *
- * Note that @nr may be almost arbitrarily large; this function is not
- * restricted to acting on a single-word quantity.
- */
-static __always_inline void
-set_bit(long nr, volatile unsigned long *addr)
-{
-   if (IS_IMMEDIATE(nr)) {
-   asm volatile(LOCK_PREFIX "orb %1,%0"
-   : CONST_MASK_ADDR(nr, addr)
-   : "iq" ((u8)CONST_MASK(nr))
-   : "memory");
-   } else {
-   asm volatile(LOCK_PREFIX "bts %1,%0"
-   : BITOP_ADDR(addr) : "Ir" (nr) : "memory");
-   }
-}
-
-/**
  * __set_bit - Set a bit in memory
  * @nr: the bit to set
  * @addr: the address to start counting from
@@ -96,44 +69,6 @@ static __always_inline void __set_bit(long nr, volatile unsigned long *addr)
asm volatile("bts %1,%0" : ADDR : "Ir" (nr) : "memory");
 }
 
-/**
- * clear_bit - Clears a bit in memory
- * @nr: Bit to clear
- * @addr: Address to start counting from
- *
- * clear_bit() is atomic and may not be reordered.  However, it does
- * not contain a memory barrier, so if it is used for locking purposes,
- * you should call smp_mb__before_atomic() and/or smp_mb__after_atomic()
- * in order to ensure changes are visible on other processors.
- */
-static __always_inline void
-clear_bit(long nr, volatile unsigned long *addr)
-{
-   if (IS_IMMEDIATE(nr)) {
-   asm volatile(LOCK_PREFIX "andb %1,%0"
-   : CONST_MASK_ADDR(nr, addr)
-   : "iq" ((u8)~CONST_MASK(nr)));
-   } else {
-   asm volatile(LOCK_PREFIX "btr %1,%0"
-   : BITOP_ADDR(addr)
-   : "Ir" (nr));
-   }
-}
-
-/*
- * clear_bit_unlock - Clears a bit in memory
- * @nr: Bit to clear
- * @addr: Address to start counting from
- *
- * clear_bit() is atomic and implies release semantics before the memory
- * operation. It can be used for an unlock.
- */
-static __always_inline void clear_bit_unlock(long nr, volatile unsigned long *addr)
-{
-   barrier();
-   clear_bit(nr, addr);
-}
-
 static __always_inline void __clear_bit(long nr, volatile unsigned long *addr)
 {
asm volatile("btr %1,%0" : ADDR : "Ir" (nr));
@@ -172,54 +107,6 @@ static __always_inline void __change_bit(long nr, volatile unsigned long *addr)
 }
 
 /**
- * change_bit - Toggle a bit in memory
- * @nr: Bit to change
- * @addr: Address to start counting from
- *
- * change_bit() is atomic and may not be reordered.
- * Note that @nr may be almost arbitrarily large; this function is not
- * restricted to acting on a single-word quantity.
- */
-static __always_inline void change_bit(long nr, volatile unsigned long *addr)
-{
-   if (IS_IMMEDIATE(nr)) {
-   asm volatile(LOCK_PREFIX "xorb %1,%0"
-   : CONST_MASK_ADDR(nr, addr)
-   : "iq" ((u8)CONST_MASK(nr)));
-   } else {
-   asm volatile(LOCK_PREFIX "btc %1,%0"
-   : BITOP_ADDR(addr)
-   : "Ir" (nr));
-   }
-}
-
-/**
- * test_and_set_bit - Set a bit and return its old value
- * @nr: Bit to set
- * @addr: Address to count from
- *
- * This operation is atomic and cannot be reordered.
- * It also implies a memory barrier.
- */
-static __always_inline int test_and_set_bit(long nr, volatile unsigned long *addr)
-{
-   GEN_BINARY_RMWcc(LOCK_PREFIX "bts", *addr, "Ir", nr, "%0", "c");
-}
-
-/**
- * test_and_set_bit_lock - Set a bit and return its old value for lock
- * @nr: Bit to set
- * 
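
The ISO counterparts that replace these bitops follow the same pattern;
a rough sketch using the __atomic builtins (again, not the series'
actual header):

	static __always_inline void set_bit(long nr, volatile unsigned long *addr)
	{
		__atomic_fetch_or(addr + nr / BITS_PER_LONG,
				  1UL << (nr % BITS_PER_LONG), __ATOMIC_RELAXED);
	}

	static __always_inline int test_and_set_bit(long nr, volatile unsigned long *addr)
	{
		unsigned long mask = 1UL << (nr % BITS_PER_LONG);
		unsigned long old;

		old = __atomic_fetch_or(addr + nr / BITS_PER_LONG, mask,
					__ATOMIC_SEQ_CST);
		return (old & mask) != 0;
	}

Note the always-word-sized RMW: unlike the "orb"/"bts" forms above, the
compiler does not narrow the access for immediate bit numbers, which is
one plausible source of the .text growth David measured.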

Re: [PATCH 3/3] ACPI: ARM64: support for ACPI_TABLE_UPGRADE

2016-05-18 Thread Aleksey Makarov


On 05/17/2016 03:46 PM, Mark Rutland wrote:
> On Tue, May 17, 2016 at 03:06:03PM +0300, Aleksey Makarov wrote:

[...]

>> diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
>> index feab2ee..1d5e24f 100644
>> --- a/arch/arm64/kernel/setup.c
>> +++ b/arch/arm64/kernel/setup.c
>> @@ -261,11 +261,13 @@ void __init setup_arch(char **cmdline_p)
>>  efi_init();
>>  arm64_memblock_init();
>>  
>> +paging_init();
>> +
>> +early_initrd_acpi_init();
> 
> Do we actually need full paging up here?
> 
> Why can we not use fixmap based copying as we do for the other cases
> prior to paging_init?

The implementation of the feature, acpi_table_initrd_init()
(drivers/acpi/tables.c), works with the initrd data represented as an
array in virtual memory.  It uses some library utilities to find the
redefined tables in that array and iterates over it to copy the data
to newly allocated memory.  So we would need to rewrite it considerably
to access the initrd data via the fixmap.

On x86, linear kernel memory is already mapped by the time
early_initrd_acpi_init() and acpi_boot_table_init() are called, so I
think we can just move this mapping one function earlier too.

Is it ok?

I will address your other comments in the next version of the patchset.

Thank you
Aleksey Makarov

>> +
>>  /* Parse the ACPI tables for possible boot-time configuration */
>>  acpi_boot_table_init();
>>  
>> -paging_init();
>> -
>>  if (acpi_disabled)
>>  unflatten_device_tree();
>>  
>> diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig
>> index c204344..ec694c0 100644
>> --- a/drivers/acpi/Kconfig
>> +++ b/drivers/acpi/Kconfig
>> @@ -313,7 +313,7 @@ config ACPI_CUSTOM_DSDT
>>  
>>  config ACPI_TABLE_UPGRADE
>>  bool "Allow upgrading ACPI tables via initrd"
>> -depends on BLK_DEV_INITRD && X86
>> +depends on BLK_DEV_INITRD && (X86 || ARM64)
>>  default y
>>  help
> 
> Perhaps ARCH_HAS_ACPI_TABLE_UPGRADE?
> 
>>This option provides functionality to upgrade arbitrary ACPI tables
>> diff --git a/drivers/acpi/tables.c b/drivers/acpi/tables.c
>> index 82be84a..c35034d8 100644
>> --- a/drivers/acpi/tables.c
>> +++ b/drivers/acpi/tables.c
>> @@ -482,6 +482,14 @@ static DECLARE_BITMAP(acpi_initrd_installed, NR_ACPI_INITRD_TABLES);
>>  
>>  #define MAP_CHUNK_SIZE   (NR_FIX_BTMAPS << PAGE_SHIFT)
>>  
>> +#if defined(CONFIG_X86)
>> +#define MAX_PHYS_ACPI_TABLES (max_low_pfn_mapped << PAGE_SHIFT)
>> +#elif defined(CONFIG_ARM64)
>> +#define MAX_PHYS_ACPI_TABLES PFN_PHYS(max_pfn)
>> +#else
>> +#error "MAX_PHYS_ACPI_TABLES is not defiend for this architecture"
>> +#endif
> 
> s/defiend/defined/
> 
> This should be defined in an arch-specific header if it is actually
> arch-specific.
> 
> Thanks,
> Mark.
> 
>> +
>>  static void __init acpi_table_initrd_init(void *data, size_t size)
>>  {
>>  int sig, no, table_nr = 0, total_offset = 0;
>> @@ -541,7 +549,7 @@ static void __init acpi_table_initrd_init(void *data, 
>> size_t size)
>>  return;
>>  
>>  acpi_tables_addr =
>> -memblock_find_in_range(0, max_low_pfn_mapped << PAGE_SHIFT,
>> +memblock_find_in_range(0, MAX_PHYS_ACPI_TABLES,
>> all_tables_size, PAGE_SIZE);
>>  if (!acpi_tables_addr) {
>>  WARN_ON(1);
>> -- 
>> 2.8.2
>>
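
For reference, the ARCH_HAS_* Kconfig pattern suggested above would look
roughly like this (a sketch, not the posted patch; X86 and ARM64 would
then select the symbol from their own Kconfig files):

config ARCH_HAS_ACPI_TABLE_UPGRADE
	def_bool n

config ACPI_TABLE_UPGRADE
	bool "Allow upgrading ACPI tables via initrd"
	depends on BLK_DEV_INITRD && ARCH_HAS_ACPI_TABLE_UPGRADE
	default y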


Re: [PATCH] mlx5: avoid unused variable warning

2016-05-18 Thread Amir Vadai
On Wed, May 18, 2016 at 04:21:07PM +0200, Arnd Bergmann wrote:
> When CONFIG_NET_CLS_ACT is disabled, we get a new warning in the mlx5
> ethernet driver because the tc_for_each_action() loop never references
> the iterator:
> 
> mellanox/mlx5/core/en_tc.c: In function 'mlx5e_stats_flower':
> mellanox/mlx5/core/en_tc.c:431:20: error: unused variable 'a' 
> [-Werror=unused-variable]
>   struct tc_action *a;
> 
> This changes the dummy tc_for_each_action() macro by adding a
> cast to void, letting the compiler know that the variable is
> intentionally declared but not used here. I could not come up
> with a nicer workaround, but this seems to do the trick.
> 
> Signed-off-by: Arnd Bergmann 
> Fixes: aad7e08d39bd ("net/mlx5e: Hardware offloaded flower filter statistics 
> support")
> Fixes: 00175aec941e ("net/sched: Macro instead of CONFIG_NET_CLS_ACT ifdef")
> ---
Acked-By: Amir Vadai 

Thanks Arnd.
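
For reference, the (void)-cast idiom being acked here looks roughly like
this (a sketch of the pattern, not the posted diff; the real macro lives
in the net/sched headers):

#ifdef CONFIG_NET_CLS_ACT
#define tc_for_each_action(_a, _exts) \
	list_for_each_entry(_a, &(_exts)->actions, list)
#else
/* Dummy loop that never runs; the (void) cast still references _a,
 * which is enough to silence -Werror=unused-variable when
 * CONFIG_NET_CLS_ACT is disabled. */
#define tc_for_each_action(_a, _exts) \
	while ((void)(_a), 0)
#endif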


[RFC PATCH 08/15] Provide an implementation of bitops using C++11 atomics

2016-05-18 Thread David Howells
Provide an implementation of various atomic bit functions using the
__atomic_fetch_{and,or,xor}() C++11 atomics.  This involves creating a mask
and adjusting the address in each function as the bit number can be greater
than the number of bits in a word and can also be negative.
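
A minimal sketch of that pattern (it mirrors the unsigned-long helpers
this series adds; BITS_PER_LONG and _BITOPS_LONG_SHIFT are the usual
kernel constants):

static __always_inline
bool iso_test_and_set_bit_sketch(long bit, volatile unsigned long *addr)
{
	unsigned long mask = 1UL << (bit & (BITS_PER_LONG - 1));
	unsigned long old;

	/* Arithmetic shift, so a negative bit number indexes backwards. */
	addr += bit >> _BITOPS_LONG_SHIFT;
	old = __atomic_fetch_or(addr, mask, __ATOMIC_ACQ_REL);
	return old & mask;
}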

On some arches, such as those that use LL/SC type constructs or CMPXCHG,
AND/OR/XOR are normally used anyway so this is not a problem.  On
powerpc64, for example:

.test_and_set_bit:
clrlwi  r10,r3,26
rldicl  r3,r3,58,6
rldicr  r3,r3,3,60
li  r9,1
sld r9,r9,r10
add r4,r4,r3
hwsync
retry:  ldarx   r3,0,r4
or  r10,r3,r9   <--- OR is used here
stdcx.  r10,0,r4
bne-retry
hwsync
and r9,r9,r3
addic   r3,r9,-1
subfe   r3,r3,r9
blr

However, on some arches such as x86, there are bit wangling instructions
that take a bit number and may even correctly handle bit numbers outside
the range 0...BITS_PER_LONG-1.  Even on x86, if the result isn't of
interest, it is apparently faster to use LOCKed AND/OR/XOR than to use
LOCKed BTR/BTS/BTC.

However, the current gcc implementation always uses a CMPXCHG loop rather
than using BTR/BTS/BTC, though it will still use AND/OR/XOR if it can:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49244

With this fixed,

bool try_test_and_change_bit(unsigned long *p)
{
return test_and_change_bit(83, p);
}

becomes:

foo_test_and_change_bit:
lock btcq $0x13,0x8(%rdi)
setb   %al
retq

There are three things to note here:

 (1) The SETB instruction is generated by the compiler.

 (2) Because the memory variable is a long, BTCQ is used - which uses an
 extra byte of opcode.  We could use BTCL instead as the inline asm
 does... except that that generates technically broken code since the
 bit number is a *64-bit* number and BTCL only takes a 32-bit bit
 number.  I'm not sure that that is worth worrying about, however.

 (3) The compiler divided the constant bit number between the immediate
 value to the BTCQ instruction and the displacement on the memory
 address.  The immediate value is, however, an imm8, so the split is
 unnecessary unless the bit number falls outside the range -128..127,
 I think.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49244#c20

Next there's:

const char *foo(unsigned long *p);
const char *foo_test_and_change_bit_2(unsigned long *p)
{
return test_and_change_bit(83, p) ? foo(p) : "bar";
}

becomes:

foo_test_and_change_bit_2:
lock btcq $0x13,0x8(%rdi)
jae    .L5
jmpq   foo
.L5:   mov    $.LC0,%eax
retq

with the compiler generating a conditional jump around a tail-call to
foo().

Next, we can use a variable as the bit number rather than a constant.

bool foo_test_and_change_bit_3(unsigned long *p, long n)
{
return test_and_change_bit(n, p);
}

becomes:

foo_test_and_change_bit_3:
mov    %rsi,%rax    }
and    $0x3f,%esi   } Splitting the bit number
sar    $0x6,%rax    }
lock btc %rsi,(%rdi,%rax,8)
setb   %al
retq

As can be seen, the compiler again splits the bit number up unnecessarily
since the CPU will do that.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49244#c13


Finally, if we ignore the result:

void foo_test_and_change_bit_4(unsigned long *p)
{
test_and_change_bit(83, p);
}
void foo_test_and_change_bit_5(unsigned long *p, long n)
{
test_and_change_bit(n, p);
}

the compiler will use XOR instead of BTC:

foo_test_and_change_bit_4:
lock xorq $0x80000,0x8(%rdi)
retq
foo_test_and_change_bit_5:
mov    %rsi,%rdx
mov    %esi,%ecx
mov    $0x1,%eax
sar    $0x6,%rdx
shl    %cl,%rax
lock xor %rax,(%rdi,%rdx,8)
retq

Also, the compiler won't optimise __atomic_load[_n]() to either a TEST or a
BT instruction:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49244#c13


For arm64, if compiled with -march=armv8-a+lse this will generate LDCLR,
LDSET and LDEOR instructions for __atomic_fetch_{and,or,xor}() with
appropriate {,A,L,AL} suffixes.  However, because LDCLR is actually an
AND-NOT rather than an AND, the operand is negated by the compiler to
undo the negation by the CPU.  This combines with the ~ operator in the
code that generates the mask:

static 


[RFC PATCH 12/15] x86: Use ISO xchg(), cmpxchg() and friends

2016-05-18 Thread David Howells
Make x86 use the ISO intrinsic xchg(), cmpxchg() and similar functions.

This boots fine; however, it can't NOP out the LOCK prefixes when the
number of online CPUs is 1.

Without this patch, according to size -A, .text for my test kernel is:

.text 6273589   18446744071578845184

with this patch:

.text 6273013   18446744071578845184

There are still some underoptimisations to be dealt with.
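
For reference, the ISO replacement is built on the C++11 intrinsics; a
minimal sketch (hypothetical name — the real wrapper, with size checks
and ordering variants, comes from the generic ISO headers added earlier
in the series):

#define cmpxchg_sketch(ptr, old, new)					\
({									\
	__typeof__(*(ptr)) __old = (old);				\
	/* On failure __old is overwritten with the value actually	\
	 * seen, so returning it matches cmpxchg()'s convention of	\
	 * returning the initial memory contents. */			\
	__atomic_compare_exchange_n((ptr), &__old, (new), false,	\
				    __ATOMIC_SEQ_CST,			\
				    __ATOMIC_RELAXED);			\
	__old;								\
})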

Signed-off-by: David Howells 
---

 arch/x86/include/asm/cmpxchg.h|   99 -
 arch/x86/include/asm/cmpxchg_32.h |3 -
 arch/x86/include/asm/cmpxchg_64.h |6 --
 3 files changed, 1 insertion(+), 107 deletions(-)

diff --git a/arch/x86/include/asm/cmpxchg.h b/arch/x86/include/asm/cmpxchg.h
index 9733361fed6f..cde270af1b94 100644
--- a/arch/x86/include/asm/cmpxchg.h
+++ b/arch/x86/include/asm/cmpxchg.h
@@ -2,6 +2,7 @@
 #define ASM_X86_CMPXCHG_H
 
 #include <linux/compiler.h>
+#include <asm-generic/iso-cmpxchg.h>
 #include <linux/types.h>
 #include <asm/alternative.h> /* Provides LOCK_PREFIX */
 
@@ -9,12 +10,8 @@
  * Non-existant functions to indicate usage errors at link time
  * (or compile-time if the compiler implements __compiletime_error().
  */
-extern void __xchg_wrong_size(void)
-   __compiletime_error("Bad argument size for xchg");
 extern void __cmpxchg_wrong_size(void)
__compiletime_error("Bad argument size for cmpxchg");
-extern void __xadd_wrong_size(void)
-   __compiletime_error("Bad argument size for xadd");
 extern void __add_wrong_size(void)
__compiletime_error("Bad argument size for add");
 
@@ -34,48 +31,6 @@ extern void __add_wrong_size(void)
 #define__X86_CASE_Q-1  /* sizeof will never return -1 
*/
 #endif
 
-/* 
- * An exchange-type operation, which takes a value and a pointer, and
- * returns the old value.
- */
-#define __xchg_op(ptr, arg, op, lock)  \
-   ({  \
-   __typeof__ (*(ptr)) __ret = (arg);  \
-   switch (sizeof(*(ptr))) {   \
-   case __X86_CASE_B:  \
-   asm volatile (lock #op "b %b0, %1\n"\
- : "+q" (__ret), "+m" (*(ptr)) \
- : : "memory", "cc");  \
-   break;  \
-   case __X86_CASE_W:  \
-   asm volatile (lock #op "w %w0, %1\n"\
- : "+r" (__ret), "+m" (*(ptr)) \
- : : "memory", "cc");  \
-   break;  \
-   case __X86_CASE_L:  \
-   asm volatile (lock #op "l %0, %1\n" \
- : "+r" (__ret), "+m" (*(ptr)) \
- : : "memory", "cc");  \
-   break;  \
-   case __X86_CASE_Q:  \
-   asm volatile (lock #op "q %q0, %1\n"\
- : "+r" (__ret), "+m" (*(ptr)) \
- : : "memory", "cc");  \
-   break;  \
-   default:\
-   __ ## op ## _wrong_size();  \
-   }   \
-   __ret;  \
-   })
-
-/*
- * Note: no "lock" prefix even on SMP: xchg always implies lock anyway.
- * Since this is generally used to protect other memory information, we
- * use "asm volatile" and "memory" clobbers to prevent gcc from moving
- * information around.
- */
-#define xchg(ptr, v)   __xchg_op((ptr), (v), xchg, "")
-
 /*
  * Atomic compare and exchange.  Compare OLD with MEM, if identical,
  * store NEW in MEM.  Return the initial value in MEM.  Success is
@@ -129,9 +84,6 @@ extern void __add_wrong_size(void)
__ret;  \
 })
 
-#define __cmpxchg(ptr, old, new, size) \
-   __raw_cmpxchg((ptr), (old), (new), (size), LOCK_PREFIX)
-
 #define __sync_cmpxchg(ptr, old, new, size)\
__raw_cmpxchg((ptr), (old), (new), (size), "lock; ")
 
@@ -144,9 +96,6 @@ extern void __add_wrong_size(void)
 # include <asm/cmpxchg_64.h>
 #endif
 
-#define cmpxchg(ptr, old, new) \
-   __cmpxchg(ptr, old, new, sizeof(*(ptr)))
-
 #define 

[RFC PATCH 13/15] x86: Improve spinlocks using ISO C++11 intrinsic atomics

2016-05-18 Thread David Howells
Make some improvements to the x86 spinlock implementation using the ISO
C++11 intrinsic atomic wrappers.  Note that since the spinlocks use
cmpxchg(), xadd() and __add(), it already makes some use of this.

 (1) Use acquire variants in spin_lock() and spin_trylock() paths.  On x86
 this shouldn't actually make a difference.

 (2) Use try_cmpxchg*() rather than cmpxchg*().  This should produce
 slightly more efficient code as a comparison can be discarded in
 favour of using the value in the Z flag.
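
The difference in use, roughly (try_cmpxchg*() returns a bool rather
than the old value; names as introduced earlier in the series):

	/* Before: compare the returned old value by hand. */
	if (cmpxchg(&lock->head_tail, old.head_tail, new.head_tail) ==
	    old.head_tail)
		/* won the lock */;

	/* After: the CMPXCHG Z flag is consumed directly. */
	if (try_cmpxchg_acquire(&lock->head_tail, old.head_tail,
				new.head_tail))
		/* won the lock */;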

Note that, as mentioned before, the LOCK prefixes are currently
mandatory, but a gcc ticket is open to try to get them listed for
suppression.

Signed-off-by: David Howells 
---

 arch/x86/include/asm/qspinlock.h |2 +-
 arch/x86/include/asm/spinlock.h  |   18 ++
 2 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/qspinlock.h b/arch/x86/include/asm/qspinlock.h
index eaba08076030..b56b8c2a9ed2 100644
--- a/arch/x86/include/asm/qspinlock.h
+++ b/arch/x86/include/asm/qspinlock.h
@@ -55,7 +55,7 @@ static inline bool virt_spin_lock(struct qspinlock *lock)
do {
 		while (atomic_read(&lock->val) != 0)
 			cpu_relax();
-	} while (atomic_cmpxchg(&lock->val, 0, _Q_LOCKED_VAL) != 0);
+	} while (!atomic_try_cmpxchg_acquire(&lock->val, 0, _Q_LOCKED_VAL));
 
return true;
 }
diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index be0a05913b91..1cbddef2b3f3 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -81,7 +81,7 @@ static inline void 
__ticket_check_and_clear_slowpath(arch_spinlock_t *lock,
new.tickets.tail = old.tickets.tail;
 
/* try to clear slowpath flag when there are no contenders */
-	cmpxchg(&lock->head_tail, old.head_tail, new.head_tail);
+	try_cmpxchg_acquire(&lock->head_tail, old.head_tail, new.head_tail);
}
 }
 
@@ -107,7 +107,7 @@ static __always_inline void arch_spin_lock(arch_spinlock_t 
*lock)
 {
register struct __raw_tickets inc = { .tail = TICKET_LOCK_INC };
 
-	inc = xadd(&lock->tickets, inc);
+	inc = xadd_acquire(&lock->tickets, inc);
if (likely(inc.head == inc.tail))
goto out;
 
@@ -128,19 +128,21 @@ out:
barrier();  /* make sure nothing creeps before the lock is taken */
 }
 
-static __always_inline int arch_spin_trylock(arch_spinlock_t *lock)
+static __always_inline bool arch_spin_trylock(arch_spinlock_t *lock)
 {
arch_spinlock_t old, new;
 
old.tickets = READ_ONCE(lock->tickets);
if (!__tickets_equal(old.tickets.head, old.tickets.tail))
-   return 0;
+   return false;
 
new.head_tail = old.head_tail + (TICKET_LOCK_INC << TICKET_SHIFT);
new.head_tail &= ~TICKET_SLOWPATH_FLAG;
 
-   /* cmpxchg is a full barrier, so nothing can move before it */
-	return cmpxchg(&lock->head_tail, old.head_tail, new.head_tail) == old.head_tail;
+	/* Insert an acquire barrier with the cmpxchg so that nothing
+	 * can move before it.
+	 */
+	return try_cmpxchg_acquire(&lock->head_tail, old.head_tail, new.head_tail);
 }
 
 static __always_inline void arch_spin_unlock(arch_spinlock_t *lock)
@@ -151,14 +153,14 @@ static __always_inline void 
arch_spin_unlock(arch_spinlock_t *lock)
 
BUILD_BUG_ON(((__ticket_t)NR_CPUS) != NR_CPUS);
 
-	head = xadd(&lock->tickets.head, TICKET_LOCK_INC);
+	head = xadd_release(&lock->tickets.head, TICKET_LOCK_INC);
 
if (unlikely(head & TICKET_SLOWPATH_FLAG)) {
head &= ~TICKET_SLOWPATH_FLAG;
__ticket_unlock_kick(lock, (head + TICKET_LOCK_INC));
}
} else
-	__add(&lock->tickets.head, TICKET_LOCK_INC, UNLOCK_LOCK_PREFIX);
+	add_release(&lock->tickets.head, TICKET_LOCK_INC, UNLOCK_LOCK_PREFIX);
 }
 
 static inline int arch_spin_is_locked(arch_spinlock_t *lock)



[RFC PATCH 09/15] Make the ISO bitops use 32-bit values internally

2016-05-18 Thread David Howells
Make the ISO bitops use 32-bit values internally so that on x86 we emit the
smaller BTRL/BTSL/BTCL instructions rather than BTRQ/BTSQ/BTCQ (which
require a prefix).

However, if we're going to do this, we really need to change the bit
numbers for test_bit(), set_bit(), test_and_set_bit(), etc. to be int
rather than long because BTR/BTS/BTC take a bit number that's the same size
as the memory variable size.

This means that BTSQ, for example, has a bit number in the range
-2^63..2^63-1 whereas BTSL only has a range of -2^31..2^31-1.  So,
technically, the current inline-asm set_bit() and co. for non-constant bit
number are implemented incorrectly as they can't handle the full range of
the long bit number.  However, in practice, it's probably not a problem.
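
For example, with 32-bit accesses, bit 83 splits into word index
83 >> 5 = 2 and mask 1U << (83 & 31) = 1U << 19; a negative bit number
such as -1 splits into index -1 and mask 1U << 31, which is why the
shift must be arithmetic.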

Signed-off-by: David Howells 
---

 include/asm-generic/iso-bitops.h |   57 +-
 1 file changed, 32 insertions(+), 25 deletions(-)

diff --git a/include/asm-generic/iso-bitops.h b/include/asm-generic/iso-bitops.h
index 64d5067e3a67..e87b91965e67 100644
--- a/include/asm-generic/iso-bitops.h
+++ b/include/asm-generic/iso-bitops.h
@@ -18,11 +18,12 @@
 static __always_inline
 bool test_bit(long bit, const volatile unsigned long *addr)
 {
-   unsigned long mask = 1UL << (bit & (BITS_PER_LONG - 1));
-   unsigned long old;
+   const volatile unsigned int *addr32 = (const volatile unsigned int 
*)addr;
+   unsigned int mask = 1U << (bit & (32 - 1));
+   unsigned int old;
 
-   addr += bit >> _BITOPS_LONG_SHIFT;
-   old = __atomic_load_n(addr, __ATOMIC_RELAXED);
+   addr32 += bit >> 5;
+   old = __atomic_load_n(addr32, __ATOMIC_RELAXED);
return old & mask;
 }
 
@@ -44,10 +45,11 @@ bool test_bit(long bit, const volatile unsigned long *addr)
 static __always_inline
 void iso_set_bit(long bit, volatile unsigned long *addr, int memorder)
 {
-   unsigned long mask = 1UL << (bit & (BITS_PER_LONG - 1));
+   volatile unsigned int *addr32 = (volatile unsigned int *)addr;
+   unsigned int mask = 1U << (bit & (32 - 1));
 
-   addr += bit >> _BITOPS_LONG_SHIFT;
-   __atomic_fetch_or(addr, mask, memorder);
+   addr32 += bit >> 5;
+   __atomic_fetch_or(addr32, mask, memorder);
 }
 
 #define set_bit(b, a) iso_set_bit((b), (a), __ATOMIC_ACQ_REL)
@@ -75,10 +77,11 @@ void iso_set_bit(long bit, volatile unsigned long *addr, 
int memorder)
 static __always_inline
 void iso_clear_bit(long bit, volatile unsigned long *addr, int memorder)
 {
-   unsigned long mask = 1UL << (bit & (BITS_PER_LONG - 1));
+   volatile unsigned int *addr32 = (volatile unsigned int *)addr;
+   unsigned int mask = 1U << (bit & (32 - 1));
 
-   addr += bit >> _BITOPS_LONG_SHIFT;
-   __atomic_fetch_and(addr, ~mask, memorder);
+   addr32 += bit >> 5;
+   __atomic_fetch_and(addr32, ~mask, memorder);
 }
 
 #define clear_bit(b, a) iso_clear_bit((b), (a), __ATOMIC_ACQ_REL)
@@ -105,10 +108,11 @@ void iso_clear_bit(long bit, volatile unsigned long 
*addr, int memorder)
 static __always_inline
 void iso_change_bit(long bit, volatile unsigned long *addr, int memorder)
 {
-   unsigned long mask = 1UL << (bit & (BITS_PER_LONG - 1));
+   volatile unsigned int *addr32 = (volatile unsigned int *)addr;
+   unsigned int mask = 1U << (bit & (32 - 1));
 
-   addr += bit >> _BITOPS_LONG_SHIFT;
-   __atomic_fetch_xor(addr, mask, memorder);
+   addr32 += bit >> 5;
+   __atomic_fetch_xor(addr32, mask, memorder);
 }
 
 #define change_bit(b, a) iso_change_bit((b), (a), __ATOMIC_ACQ_REL)
@@ -124,11 +128,12 @@ void iso_change_bit(long bit, volatile unsigned long 
*addr, int memorder)
 static __always_inline
 bool iso_test_and_set_bit(long bit, volatile unsigned long *addr, int memorder)
 {
-   unsigned long mask = 1UL << (bit & (BITS_PER_LONG - 1));
-   unsigned long old;
+   volatile unsigned int *addr32 = (volatile unsigned int *)addr;
+   unsigned int mask = 1U << (bit & (32 - 1));
+   unsigned int old;
 
-   addr += bit >> _BITOPS_LONG_SHIFT;
-   old = __atomic_fetch_or(addr, mask, memorder);
+   addr32 += bit >> 5;
+   old = __atomic_fetch_or(addr32, mask, memorder);
return old & mask;
 }
 
@@ -146,11 +151,12 @@ bool iso_test_and_set_bit(long bit, volatile unsigned 
long *addr, int memorder)
 static __always_inline
 bool iso_test_and_clear_bit(long bit, volatile unsigned long *addr, int 
memorder)
 {
-   unsigned long mask = 1UL << (bit & (BITS_PER_LONG - 1));
-   unsigned long old;
+   volatile unsigned int *addr32 = (volatile unsigned int *)addr;
+   unsigned int mask = 1U << (bit & (32 - 1));
+   unsigned int old;
 
-   addr += bit >> _BITOPS_LONG_SHIFT;
-   old = __atomic_fetch_and(addr, ~mask, memorder);
+   addr32 += bit >> 5;
+   old = __atomic_fetch_and(addr32, ~mask, memorder);
return old & mask;
 }
 
@@ -168,11 +174,12 @@ bool iso_test_and_clear_bit(long 



[RFC PATCH 14/15] x86: Make the mutex implementation use ISO atomic ops

2016-05-18 Thread David Howells
Make the x86 mutex implementation use ISO atomic ops rather than inline
assembly.

Signed-off-by: David Howells 
---

 arch/x86/include/asm/mutex.h |6 +--
 arch/x86/include/asm/mutex_iso.h |   73 ++
 2 files changed, 74 insertions(+), 5 deletions(-)
 create mode 100644 arch/x86/include/asm/mutex_iso.h

diff --git a/arch/x86/include/asm/mutex.h b/arch/x86/include/asm/mutex.h
index 7d3a48275394..831bc5a73886 100644
--- a/arch/x86/include/asm/mutex.h
+++ b/arch/x86/include/asm/mutex.h
@@ -1,5 +1 @@
-#ifdef CONFIG_X86_32
-# include <asm/mutex_32.h>
-#else
-# include <asm/mutex_64.h>
-#endif
+#include <asm/mutex_iso.h>
diff --git a/arch/x86/include/asm/mutex_iso.h b/arch/x86/include/asm/mutex_iso.h
new file mode 100644
index 000000000000..ffdbdb411989
--- /dev/null
+++ b/arch/x86/include/asm/mutex_iso.h
@@ -0,0 +1,73 @@
+/* ISO atomics based mutex
+ *
+ * Copyright (C) 2016 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowe...@redhat.com)
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public Licence
+ * as published by the Free Software Foundation; either version
+ * 2 of the Licence, or (at your option) any later version.
+ */
+#ifndef _ASM_X86_MUTEX_ISO_H
+#define _ASM_X86_MUTEX_ISO_H
+
+/**
+ * __mutex_fastpath_lock - decrement and call function if negative
+ * @v: pointer of type atomic_t
+ * @fail_fn: function to call if the result is negative
+ *
+ * Atomically decrements @v and calls <fail_fn> if the result is negative.
+ */
+static inline void __mutex_fastpath_lock(atomic_t *v,
+void (*fail_fn)(atomic_t *))
+{
+   if (atomic_dec_return_acquire(v) < 0)
+   fail_fn(v);
+}
+
+/**
+ *  __mutex_fastpath_lock_retval - try to take the lock by moving the count
+ * from 1 to a 0 value
+ * @v: pointer of type atomic_t
+ *
+ * Change the count from 1 to a value lower than 1. This function returns 0
+ * if the fastpath succeeds, or -1 otherwise.
+ */
+static inline int __mutex_fastpath_lock_retval(atomic_t *v)
+{
+   return unlikely(atomic_dec_return(v) < 0) ? -1 : 0;
+}
+
+/**
+ * __mutex_fastpath_unlock - increment and call function if nonpositive
+ * @v: pointer of type atomic_t
+ * @fail_fn: function to call if the result is nonpositive
+ *
+ * Atomically increments @v and calls <fail_fn> if the result is nonpositive.
+ */
+static inline void __mutex_fastpath_unlock(atomic_t *v,
+  void (*fail_fn)(atomic_t *))
+{
+   if (atomic_inc_return_release(v) <= 0)
+   fail_fn(v);
+}
+
+#define __mutex_slowpath_needs_to_unlock() 1
+
+/**
+ * __mutex_fastpath_trylock - try to acquire the mutex, without waiting
+ *
+ *  @v: pointer of type atomic_t
+ *  @fail_fn: fallback function
+ *
+ * Change the count from 1 to 0 and return true (success), or return false
+ * (failure) if it wasn't 1 originally. [the fallback function is never used on
+ * x86_64, because all x86_64 CPUs have a CMPXCHG instruction.]
+ */
+static inline bool __mutex_fastpath_trylock(atomic_t *v,
+   int (*fail_fn)(atomic_t *))
+{
+   return likely(atomic_try_cmpxchg(v, 1, 0));
+}
+
+#endif /* _ASM_X86_MUTEX_ISO_H */



drivers/of: crash on boot

2016-05-18 Thread Sasha Levin
Hi Rhyland,

I'm seeing a crash on boot that seems to have been caused by
"drivers/of: Fix depth when unflattening devicetree":

[   61.145229] 
==

[   61.147588] BUG: KASAN: stack-out-of-bounds in 
unflatten_dt_nodes+0x11d2/0x1290 at addr 88005b30777c

[   61.150490] Read of size 4 by task swapper/0/1

[   61.151892] page:ea00016cc1c0 count:0 mapcount:0 mapping:  
(null) index:0x0

[   61.154313] flags: 0x1f8000()

[   61.155460] page dumped because: kasan: bad access detected

[   61.157174] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 
4.6.0-next-20160518-sasha-00032-gab479e0-dirty #3090

[   61.160149]  11000b660e83 8a2fe4e6 88005b3074a0 
a3049c42

[   61.162473]   fbfff5c6e404 41b58ab3 
adceb660

[   61.164827]  a3049ad0 88005b307480 a16ecb83 
88003f501ebc

[   61.167133] Call Trace:

[   61.167904] dump_stack (lib/dump_stack.c:53)
[   61.169541] ? arch_local_irq_restore (./arch/x86/include/asm/paravirt.h:134)
[   61.171470] ? __dump_page (mm/debug.c:62)
[   61.173221] kasan_report_error (include/linux/kasan.h:28 
mm/kasan/report.c:211 mm/kasan/report.c:277)
[   61.175067] ? fdt_next_node (lib/../scripts/dtc/libfdt/fdt.c:163)
[   61.176905] ? unflatten_dt_nodes (drivers/of/fdt.c:417)
[   61.178852] __asan_report_load4_noabort (mm/kasan/report.c:318)
[   61.180850] ? unflatten_dt_nodes (drivers/of/fdt.c:417)
[   61.182766] unflatten_dt_nodes (drivers/of/fdt.c:417)
[   61.184697] ? reverse_nodes (drivers/of/fdt.c:396)
[   61.186439] ? set_pageblock_migratetype (mm/page_alloc.c:589)
[   61.188473] ? kernel_poison_pages (mm/page_poison.c:163)
[   61.190344] ? lookup_page_ext (mm/page_ext.c:200)
[   61.192168] ? get_page_from_freelist (mm/page_alloc.c:1747 
mm/page_alloc.c:3003)
[   61.194178] ? get_from_free_list (lib/idr.c:79)
[   61.196069] ? ida_get_new_above (lib/idr.c:1002)
[   61.197884] ? idr_get_empty_slot (lib/idr.c:933)
[   61.199802] ? split_free_page (mm/page_alloc.c:2901)
[   61.201598] ? ___might_sleep (kernel/sched/core.c:7520 (discriminator 1))
[   61.203346] ? __alloc_pages_nodemask (mm/page_alloc.c:3804)
[   61.205328] ? __alloc_pages_slowpath (mm/page_alloc.c:3749)
[   61.207386] ? alloc_pages_current (mm/mempolicy.c:2078)
[   61.209281] ? kasan_unpoison_shadow (mm/kasan/kasan.c:59)
[   61.211155] ? kasan_kmalloc_large (mm/kasan/kasan.c:612)
[   61.213015] ? of_fdt_unflatten_tree (drivers/of/fdt.c:513)
[   61.214929] __unflatten_device_tree (drivers/of/fdt.c:488)
[   61.216901] of_fdt_unflatten_tree (drivers/of/fdt.c:541)
[   61.218841] of_unittest (drivers/of/unittest.c:924 
drivers/of/unittest.c:1936)
[   61.220556] ? initcall_blacklisted (init/main.c:725)
[   61.222494] ? try_to_run_init_process (init/main.c:708)
[   61.224682] ? of_unittest_overlay (drivers/of/unittest.c:1931)
[   61.227059] ? kobject_add (lib/kobject.c:396)
[   61.229113] ? kobject_add_internal (lib/kobject.c:396)
[   61.231455] ? of_unittest_overlay (drivers/of/unittest.c:1931)
[   61.233865] do_one_initcall (init/main.c:770)
[   61.236005] ? initcall_blacklisted (init/main.c:759)
[   61.238354] ? ___might_sleep (kernel/sched/core.c:7522)
[   61.240504] kernel_init_freeable (init/main.c:834 init/main.c:843 
init/main.c:861 init/main.c:1008)
[   61.242798] ? start_kernel (init/main.c:978)
[   61.244919] ? compat_start_thread (arch/x86/kernel/process_64.c:259)
[   61.247174] kernel_init (init/main.c:936)
[   61.249162] ret_from_fork (arch/x86/entry/entry_64.S:390)
[   61.251170] ? rest_init (init/main.c:931)
[   61.253104] Memory state around the buggy address:

[   61.254888]  88005b307600: 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 
f1

[   61.257551]  88005b307680: 04 f4 f4 f4 f2 f2 f2 f2 04 f4 f4 f4 f2 f2 f2 
f2

[   61.260255] >88005b307700: 04 f4 f4 f4 f2 f2 f2 f2 00 f4 f4 f4 f2 f2 f2 
f2

[   61.262911] ^

[   61.265529]  88005b307780: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00

[   61.268218]  88005b307800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00

[   61.270874] 
==

[   61.273558] Disabling lock debugging due to kernel taint

[   61.275648] 
==

[   61.278303] BUG: KASAN: stack-out-of-bounds in 
unflatten_dt_nodes+0x1236/0x1290 at addr 88005b307898

[   61.281794] Read of size 8 by task swapper/0/1

[   61.283483] page:ea00016cc1c0 count:0 mapcount:0 mapping:  
(null) index:0x0

[   61.286454] flags: 0x1f8000()

[   61.287817] page dumped because: kasan: bad access detected

[   61.289904] CPU: 0 PID: 1 Comm: swapper/0 Tainted: GB   
4.6.0-next-20160518-sasha-00032-gab479e0-dirty #3090

[   61.293896]  11000b660e83 8a2fe4e6 88005b3074a0 
fff


Re: [PATCH v2 omap 3/6] arm: Add _rcuidle tracepoints to allow clk_core_disable() use from idle

2016-05-18 Thread Mason
On 16/05/2016 20:50, Paul E. McKenney wrote:

>> stack backtrace:
>> CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.6.0-rc5-next-20160426+ #1114
>> Hardware name: Generic OMAP36xx (Flattened Device Tree)
>> [] (unwind_backtrace) from [] (show_stack+0x10/0x14)
>> [] (show_stack) from [] (dump_stack+0xb0/0xe4)
>> [] (dump_stack) from [] (clk_core_disable+0x17c/0x348)
>> [] (clk_core_disable) from [] (clk_disable+0x24/0x30)
>> [] (clk_disable) from [] (_disable_clocks+0x18/0x7c)
>> [] (_disable_clocks) from [] (_idle+0x12c/0x230)
>> [] (_idle) from [] (omap_hwmod_idle+0x24/0x44)
>> [] (omap_hwmod_idle) from [] (omap_device_idle+0x3c/0x90)
>> [] (omap_device_idle) from [] (__rpm_callback+0x2c/0x60)
>> [] (__rpm_callback) from [] (rpm_callback+0x20/0x80)
>> [] (rpm_callback) from [] (rpm_suspend+0x100/0x768)
>> [] (rpm_suspend) from [] (__pm_runtime_suspend+0x64/0x84)
>> [] (__pm_runtime_suspend) from [] 
>> (omap2_gpio_prepare_for_idle+0x5
>> c/0x70)
>> [] (omap2_gpio_prepare_for_idle) from [] 
>> (omap_sram_idle+0x140/0x2
>> 44)

Wouldn't the stack trace look better without the word wrap?  :-)

Regards.


Re: [PATCH] usb: echi-hcd: Add register access check in shutdown

2016-05-18 Thread Srinivas Kandagatla



On 18/05/16 15:56, Alan Stern wrote:

On Wed, 18 May 2016, Srinivas Kandagatla wrote:


This patch adds a check in ehci_shutdown(), to make sure
that the register access is available before accessing registers.

The use case is simple, for boards like DB410c where the usb host
or device functionality is decided based on the micro-usb cable
presence. If the board boots up with micro-usb connected and the
host driver is probed, but the ehci_setup() has not been done yet,
then a system shutdown would trigger below NULL pointer exception
without this patch.


How can that happen?  While the host driver is being probed, the probing
thread holds the device lock.  But the system shutdown routine acquires
the device lock before invoking the ->shutdown callback.  Therefore the
two things cannot happen concurrently.


No, I did not mean that they happen concurrently. I meant that the host
driver is up, but ehci_setup() has not been done yet.


Will change the comment if it's misleading the reader.




Unable to handle kernel NULL pointer dereference at virtual address
0008


...


--- a/drivers/usb/host/ehci-hcd.c
+++ b/drivers/usb/host/ehci-hcd.c
@@ -368,6 +368,9 @@ static void ehci_shutdown(struct usb_hcd *hcd)
  {
struct ehci_hcd *ehci = hcd_to_ehci(hcd);

+   if (!HCD_HW_ACCESSIBLE(hcd))
+   return;
+
spin_lock_irq(&ehci->lock);
ehci->shutdown = true;
ehci->rh_state = EHCI_RH_STOPPING;


This doesn't seem like the right place.  What you really should do is
skip calling ehci_silence_controller() if the hardware isn't
accessible.  That's where the hardware gets touched, not in
ehci_shutdown().

Yep, that should work as well.

Will send a v2 patch.

thanks,
srini



Alan Stern
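
For reference, a minimal sketch of the shape Alan is suggesting (hypothetical, not the actual v2 patch; the rest of ehci_shutdown()'s teardown is elided):

static void ehci_shutdown(struct usb_hcd *hcd)
{
	struct ehci_hcd *ehci = hcd_to_ehci(hcd);

	spin_lock_irq(&ehci->lock);
	ehci->shutdown = true;
	ehci->rh_state = EHCI_RH_STOPPING;
	spin_unlock_irq(&ehci->lock);

	/* The software state above is always safe to update; only the
	 * register writes inside ehci_silence_controller() require the
	 * hardware to be reachable. */
	if (HCD_HW_ACCESSIBLE(hcd))
		ehci_silence_controller(ehci);
}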




Re: [PATCH v3] net: sock: move ->sk_shutdown out of bitfields.

2016-05-18 Thread Eric Dumazet
On Wed, 2016-05-18 at 17:36 +0300, Andrey Ryabinin wrote:
> ->sk_shutdown bits share one bitfield with some other bits in sock struct,
> such as ->sk_no_check_[r,t]x, ->sk_userlocks ...
> sock_setsockopt() may write to these bits, while holding the socket lock.
> 
> In case of AF_UNIX sockets, we change ->sk_shutdown bits while holding only
> unix_state_lock(). So concurrent setsockopt() and shutdown() may lead
> to corrupting these bits.
> 
> Fix this by moving ->sk_shutdown bits out of bitfield into a separate byte.
> This will not change the 'struct sock' size since ->sk_shutdown moved into
> previously unused 16-bit hole.
> 
> Signed-off-by: Andrey Ryabinin 
> Suggested-by: Hannes Frederic Sowa 
> ---
> 
> Changes since v1:
>  - move sk_shutdown into a separate byte instead of locking
>   AF_UNIX sockets.
> 
> Changes since v2:
>  - reorder bitfields, so that sk_type/sk_protocol still fit in
> separate 2-bytes/byte.
>   

I suggested to use an explicit padding, to clearly show that two bits
are available there. You also could add a comment to remind the rules.

Something like :

diff --git a/include/net/sock.h b/include/net/sock.h
index c9c8b19df27c558354687119db60c0716909ea3f..4e239a7ef4f67b1988fe119d81566d1465a2b596 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -382,8 +382,10 @@ struct sock {
atomic_t sk_omem_alloc;
int sk_sndbuf;
struct sk_buff_head sk_write_queue;
+
+   /* because of non atomicity rules, all changes are protected by socket lock */
kmemcheck_bitfield_begin(flags);
-   unsigned int    sk_shutdown  : 2,
+   unsigned int    sk_padding   : 2,
sk_no_check_tx : 1,
sk_no_check_rx : 1,
sk_userlocks : 4,
@@ -391,6 +393,7 @@ struct sock {
sk_type  : 16;
 #define SK_PROTOCOL_MAX U8_MAX
kmemcheck_bitfield_end(flags);
+
int sk_wmem_queued;
gfp_t   sk_allocation;
u32 sk_pacing_rate; /* bytes per second */
@@ -418,6 +421,7 @@ struct sock {
struct timer_list   sk_timer;
ktime_t sk_stamp;
u16 sk_tsflags;
+   u8  sk_shutdown;
u32 sk_tskey;
struct socket   *sk_socket;
void*sk_user_data;





Re: [RFC PATCH 01/15] cmpxchg_local() is not signed-value safe, so fix generic atomics

2016-05-18 Thread Arnd Bergmann
On Wednesday 18 May 2016 16:10:45 David Howells wrote:
> cmpxchg_local() is not signed-value safe because on a 64-bit machine signed
> int arguments to it may be sign-extended to signed long _before_ begin cast
> to unsigned long.  This potentially causes comparisons to fail when dealing
> with negative values.
> 
> Fix the generic atomic functions that are implemented in terms of cmpxchg()
> to cast their arguments to unsigned int before calling cmpxchg().
> 
> Signed-off-by: David Howells 
> 

Isn't the problem you describe something that cmpxchg() could prevent instead?

Arnd
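
The conversion hazard is easy to see in isolation; a minimal LP64 sketch (not from the patch set):

#include <stdio.h>

int main(void)
{
	int old = -1;

	/* Sign-extended straight to 64 bits: 0xffffffffffffffff. */
	unsigned long direct = (unsigned long)old;

	/* Truncated to 32 bits first, as the patch does: 0xffffffff. */
	unsigned long via_uint = (unsigned long)(unsigned int)old;

	/* A 64-bit compare of 'direct' against a zero-extended 32-bit
	 * value read from memory can never match, so the exchange
	 * spuriously fails. */
	printf("%#lx vs %#lx\n", direct, via_uint);
	return 0;
}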



Re: [PATCH] bcache: bch_writeback_thread() is not freezable

2016-05-18 Thread Jiri Kosina
On Mon, 2 May 2016, Jiri Kosina wrote:

> > From: Jiri Kosina 
> > 
> > bch_writeback_thread() is calling try_to_freeze(), but that's just an 
> > expensive no-op given the fact that the thread is not marked freezable.
> > 
> > I/O helper kthreads, exactly such as the bcache writeback thread, actually 
> > shouldn't be freezable, because they are potentially necessary for 
> > finalizing the image write-out.
> > 
> > Signed-off-by: Jiri Kosina 
> 
> Let's send a friendly ping on these three patches after two weeks...

After a month, I think, it's time for another ping.

-- 
Jiri Kosina
SUSE Labs
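
For context, the freezer hand-shake the changelog refers to; a hypothetical kthread body (do_work() is a placeholder) showing why the call being removed was a no-op:

static int example_io_thread(void *data)
{
	/*
	 * Kernel threads start with PF_NOFREEZE set, so try_to_freeze()
	 * returns false immediately unless the thread opts in first:
	 */
	set_freezable();

	while (!kthread_should_stop()) {
		do_work();		/* placeholder for the real I/O work */
		try_to_freeze();	/* only now does this actually park */
	}
	return 0;
}

The patch goes the other way: since an I/O helper must not park during image write-out, it drops try_to_freeze() rather than adding set_freezable().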




Re: [PATCH] mm: add config option to select the initial overcommit mode

2016-05-18 Thread Sebastian Frias
Hi Michal,

On 05/17/2016 10:16 PM, Michal Hocko wrote:
> On Tue 17-05-16 18:16:58, Sebastian Frias wrote:
> [...]
>> From reading Documentation/cgroup-v1/memory.txt (and from a few
>> replies here talking about cgroups), it looks like the OOM-killer is
>> still being actively discussed, well, there's also "cgroup-v2".
>> My understanding is that cgroup's memory control will pause processes
>> in a given cgroup until the OOM situation is solved for that cgroup,
>> right?
> 
> It will be blocked waiting either for some external action which would
> result in the OOM condition going away or any other charge release. You have
> to configure memcg for that though. The default behavior is to invoke
> the same OOM killer algorithm which is just reduced to tasks from the
> memcg (hierarchy).

Ok, I see, thanks!

> 
>> If that is right, it means that there is indeed a way to deal
>> with an OOM situation (stack expansion, COW failure, 'memory hog',
>> etc.) in a better way than the OOM-killer, right?
>> In which case, do you guys know if there is a way to make the whole
>> system behave as if it was inside a cgroup? (*)
> 
> No it is not. You have to realize that the system wide and the memcg OOM
> situations are quite different. There is usually quite some memory free
> when you hit the memcg OOM so the administrator can actually do
> something. 

Ok, so it works like the 5% reserved for 'root' on filesystems?

> The global OOM means there is _no_ memory at all. Many kernel
> operations will need some memory to do something useful. Let's say you
> would want to do an educated guess about who to kill - most proc APIs
> will need to allocate. And this is just a beginning. Things are getting
> really nasty when you get deeper and deeper. E.g. the OOM killer has to
> give the oom victim access to memory reserves so that the task can exit
> because that path needs to allocate as well. 

Really? I would have thought that once that SIGKILL is sent, the victim process 
is not expected to do anything else and thus its memory could be claimed 
immediately.
Or is the OOM-killer more of an OOM-terminator? (i.e.: sends SIGTERM)

> So even if you wanted to
> give userspace some chance to resolve the OOM situation you would either
> need some special API to tell "this process is really special and it can
> access memory reserves and it has an absolute priority etc." or have an
> in-kernel fallback to do something or your system could lock up really
> easily.
> 

I see, so basically at least two cgroups would be needed, one reserved for 
handling the OOM situation through some API and another for the "rest of the 
system".
Basically just like the 5% reserved for 'root' on filesystems.
Do you think that would work?

Best regards,

Sebastian



Re: [PATCH] mm: add config option to select the initial overcommit mode

2016-05-18 Thread Sebastian Frias
Hi Austin,

On 05/17/2016 07:29 PM, Austin S. Hemmelgarn wrote:
>> I see the difference, your answer seems a bit like the one from Austin, 
>> basically:
>> - killing a process is a sort of kernel protection attempting to deal 
>> "automatically" with some situation, like deciding what is a 'memory hog', 
>> or what is 'in infinite loop', "usually" in a correct way.
>> It seems there's people who think its better to avoid having to take such 
>> decisions and/or they should be decided by the user, because "usually" != 
>> "always".
> FWIW, it's really easy to see what's using a lot of memory, it's impossible 
> to tell if something is stuck in an infinite loop without looking deep into 
> the process state and possibly even at the source code (and even then it can 
> be almost impossible to be certain).  This is why we have an OOM-Killer, and 
> not an infinite-loop-killer.
> 
> Again I reiterate, if a system is properly provisioned (that is, if you have 
> put in enough RAM and possibly swap space to do what you want to use it for), 
> the only reason the OOM-killer should be invoked is due to a bug. 

Are you sure that's the only possible reason?
I mean, what if somebody keeps opening tabs on Firefox?
If malloc() returned NULL maybe Firefox could say "hey, you have too many tabs 
open, please close some to free memory".

> The non-default overcommit options still have the same issues they just 
> change how and when they happen (overcommit=never will fire sooner, 
> overcommit=always will fire later), and also can impact memory allocation 
> performance (I have numbers somewhere that I can't find right now that 
> demonstrated that overcommit=never gave more deterministic and (on average) 
> marginally better malloc() performance, and simple logic would suggest that 
> overcommit=always would make malloc() perform better too).
>> And people who see that as a nice thing but complex thing to do.
>> In this thread we've tried to explain why this heuristic (and/or OOM-killer) 
>> is/was needed and/or its history, which has been very enlightening by the 
>> way.
>>
>> From reading Documentation/cgroup-v1/memory.txt (and from a few replies here 
>> talking about cgroups), it looks like the OOM-killer is still being actively 
>> discussed, well, there's also "cgroup-v2".
>> My understanding is that cgroup's memory control will pause processes in a 
>> given cgroup until the OOM situation is solved for that cgroup, right?
>> If that is right, it means that there is indeed a way to deal with an OOM 
>> situation (stack expansion, COW failure, 'memory hog', etc.) in a better way 
>> than the OOM-killer, right?
>> In which case, do you guys know if there is a way to make the whole system 
>> behave as if it was inside a cgroup? (*)
> No, not with the process freeze behavior, because getting the group running 
> again requires input from an external part of the system, which by definition 
> doesn't exist if the group is the entire system; 

Do you mean that it pauses all processes in the cgroup?
I thought it would pause on a case-by-case basis, like first process to reach 
the limit gets paused, and so on.

Honestly I thought it would work a bit like the filesystems, where 'root' 
usually has 5% reserved, so that a process (or processes) filling the disk does 
not disrupt the system to the point of preventing 'root' from performing 
administrative actions.

That makes me think, why is disk space handled differently than memory in this 
case? I mean, why is disk space exhaustion handled differently than memory 
exhaustion?
We could imagine that both resources are required for proper system and process 
operation, so if OOM-killer is there to attempt to keep the system working at 
all costs (even if that means sacrificing processes), why isn't there an 
OOFS-killer (out-of-free-space killer)?

> and, because our GUI isn't built into the kernel, we can't pause things and 
> pop up a little dialog asking the user what to do to resolve the issue.

:-) Yeah, I was thinking that could be handled with the cgroups' notification 
system + the reserved space (like on filesystems)
Maybe I was too optimistic (naive or just plain ignorant) about this.

Best regards,

Sebastian
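
To make the malloc()-returns-NULL scenario concrete: with vm.overcommit_memory=2 the commit limit is enforced at allocation time, so an application really can see the failure and degrade gracefully. A toy sketch (under the default heuristic mode the same program is more likely to be OOM-killed than to reach the error path):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
	const size_t chunk = 64UL << 20;	/* 64 MiB per attempt */
	size_t total = 0;

	for (;;) {
		void *p = malloc(chunk);
		if (!p) {
			/* The "close some tabs" moment: free caches,
			 * refuse the new document, warn the user... */
			printf("allocation failed after %zu MiB\n",
			       total >> 20);
			return 0;
		}
		memset(p, 0xa5, chunk);	/* touch it so it is really backed */
		total += chunk;
	}
}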





Re: linux-next: Tree for May 18 (rcu/tree)

2016-05-18 Thread Randy Dunlap
On 05/17/16 21:30, Stephen Rothwell wrote:
> Hi all,
> 
> Please do not add any v4.8 destined material to your linux-next included
> branches until after v4.7-rc1 has been released.
> 
> Changes since 20160517:
> 

on i386:

In file included from ../kernel/rcu/tree.c:4209:0:
../kernel/rcu/tree_plugin.h: In function 'rcu_boost_kthread_setaffinity':
../kernel/rcu/tree_plugin.h:1168:2: error: implicit declaration of function 'for_each_leaf_node_cpu_bit' [-Werror=implicit-function-declaration]
  for_each_leaf_node_cpu_bit(rnp, cpu, bit)
  ^
../kernel/rcu/tree_plugin.h:1169:3: error: expected ';' before 'if'
   if ((mask & bit) && cpu != outgoingcpu)
   ^
../kernel/rcu/tree_plugin.h:1159:16: warning: unused variable 'mask' [-Wunused-variable]
  unsigned long mask = rcu_rnp_online_cpus(rnp);
^


Full randconfig file is attached.




-- 
~Randy
#
# Automatically generated file; DO NOT EDIT.
# Linux/i386 4.6.0 Kernel Configuration
#
# CONFIG_64BIT is not set
CONFIG_X86_32=y
CONFIG_X86=y
CONFIG_INSTRUCTION_DECODER=y
CONFIG_OUTPUT_FORMAT="elf32-i386"
CONFIG_ARCH_DEFCONFIG="arch/x86/configs/i386_defconfig"
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_MMU=y
CONFIG_ARCH_MMAP_RND_BITS_MIN=8
CONFIG_ARCH_MMAP_RND_BITS_MAX=16
CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MIN=8
CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MAX=16
CONFIG_NEED_DMA_MAP_STATE=y
CONFIG_NEED_SG_DMA_LENGTH=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_ARCH_HAS_CPU_RELAX=y
CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y
CONFIG_HAVE_SETUP_PER_CPU_AREA=y
CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y
CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK=y
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_ARCH_SUSPEND_POSSIBLE=y
CONFIG_ARCH_WANT_HUGE_PMD_SHARE=y
CONFIG_ARCH_WANT_GENERAL_HUGETLB=y
CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_X86_32_SMP=y
CONFIG_X86_32_LAZY_GS=y
CONFIG_ARCH_HWEIGHT_CFLAGS="-fcall-saved-ecx -fcall-saved-edx"
CONFIG_ARCH_SUPPORTS_UPROBES=y
CONFIG_FIX_EARLYCON_MEM=y
CONFIG_DEBUG_RODATA=y
CONFIG_PGTABLE_LEVELS=3
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"
CONFIG_CONSTRUCTORS=y
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_EXTABLE_SORT=y

#
# General setup
#
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_CROSS_COMPILE=""
# CONFIG_COMPILE_TEST is not set
CONFIG_LOCALVERSION=""
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG_HAVE_KERNEL_BZIP2=y
CONFIG_HAVE_KERNEL_LZMA=y
CONFIG_HAVE_KERNEL_XZ=y
CONFIG_HAVE_KERNEL_LZO=y
CONFIG_HAVE_KERNEL_LZ4=y
CONFIG_KERNEL_GZIP=y
# CONFIG_KERNEL_BZIP2 is not set
# CONFIG_KERNEL_LZMA is not set
# CONFIG_KERNEL_XZ is not set
# CONFIG_KERNEL_LZO is not set
# CONFIG_KERNEL_LZ4 is not set
CONFIG_DEFAULT_HOSTNAME="(none)"
CONFIG_SYSVIPC=y
# CONFIG_CROSS_MEMORY_ATTACH is not set
# CONFIG_FHANDLE is not set
CONFIG_USELIB=y
CONFIG_HAVE_ARCH_AUDITSYSCALL=y

#
# IRQ subsystem
#
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_IRQ_SHOW=y
CONFIG_GENERIC_PENDING_IRQ=y
CONFIG_GENERIC_IRQ_CHIP=y
CONFIG_IRQ_DOMAIN=y
CONFIG_IRQ_DOMAIN_HIERARCHY=y
CONFIG_IRQ_DOMAIN_DEBUG=y
CONFIG_IRQ_FORCED_THREADING=y
CONFIG_SPARSE_IRQ=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_ARCH_CLOCKSOURCE_DATA=y
CONFIG_CLOCKSOURCE_VALIDATE_LAST_CYCLE=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=y
CONFIG_GENERIC_CMOS_UPDATE=y

#
# Timers subsystem
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ_COMMON=y
# CONFIG_HZ_PERIODIC is not set
CONFIG_NO_HZ_IDLE=y
CONFIG_NO_HZ=y
CONFIG_HIGH_RES_TIMERS=y

#
# CPU/Task time and stats accounting
#
# CONFIG_TICK_CPU_ACCOUNTING is not set
CONFIG_IRQ_TIME_ACCOUNTING=y
CONFIG_BSD_PROCESS_ACCT=y
# CONFIG_BSD_PROCESS_ACCT_V3 is not set

#
# RCU Subsystem
#
CONFIG_PREEMPT_RCU=y
CONFIG_RCU_EXPERT=y
CONFIG_SRCU=y
# CONFIG_TASKS_RCU is not set
CONFIG_RCU_STALL_COMMON=y
CONFIG_RCU_FANOUT=32
CONFIG_RCU_FANOUT_LEAF=16
# CONFIG_RCU_FAST_NO_HZ is not set
CONFIG_TREE_RCU_TRACE=y
CONFIG_RCU_BOOST=y
CONFIG_RCU_KTHREAD_PRIO=1
CONFIG_RCU_BOOST_DELAY=500
# CONFIG_RCU_NOCB_CPU is not set
# CONFIG_RCU_EXPEDITE_BOOT is not set
CONFIG_BUILD_BIN2C=y
CONFIG_IKCONFIG=y
CONFIG_LOG_BUF_SHIFT=17
CONFIG_LOG_CPU_MAX_BUF_SHIFT=12
CONFIG_NMI_LOG_BUF_SHIFT=13
CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y
CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH=y
CONFIG_CGROUPS=y
CONFIG_PAGE_COUNTER=y
CONFIG_MEMCG=y
CONFIG_CGROUP_SCHED=y
CONFIG_FAIR_GROUP_SCHED=y
CONFIG_CFS_BANDWIDTH=y
# CONFIG_RT_GROUP_SCHED is not set
# CONFIG_CGROUP_PIDS is not set
CONFIG_CGROUP_FREEZER=y
# CONFIG_CPUSETS is not set
# CONFIG_CGROUP_DEVICE is not set
# CONFIG_CGROUP_CPUACCT is not set
# CONFIG_CGROUP_PERF is not set
# CONFIG_CGROUP_DEBUG is not set
CONFIG_CHECKPOINT_RESTORE=y
# CONFIG_NAMESPACES is not set
CONFIG_SCHED_AUTOGROUP=y
# CONFIG_SYSFS_DEPRECATED is not set
CONFIG_RELAY=y
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=""
CONFIG_RD_GZIP=y
# 


Re: [RFC 06/13] mm, thp: remove __GFP_NORETRY from khugepaged and madvised allocations

2016-05-18 Thread Michal Hocko
On Wed 18-05-16 13:59:53, Vlastimil Babka wrote:
> On 05/13/2016 02:05 PM, Michal Hocko wrote:
> > On Fri 13-05-16 10:23:31, Vlastimil Babka wrote:
> >> On 05/12/2016 06:20 PM, Michal Hocko wrote:
> >>> On Tue 10-05-16 09:35:56, Vlastimil Babka wrote:
> >>> [...]
>  diff --git a/include/linux/gfp.h b/include/linux/gfp.h
>  index 570383a41853..0cb09714d960 100644
>  --- a/include/linux/gfp.h
>  +++ b/include/linux/gfp.h
>  @@ -256,8 +256,7 @@ struct vm_area_struct;
> #define GFP_HIGHUSER  (GFP_USER | __GFP_HIGHMEM)
> #define GFP_HIGHUSER_MOVABLE  (GFP_HIGHUSER | __GFP_MOVABLE)
> #define GFP_TRANSHUGE ((GFP_HIGHUSER_MOVABLE | __GFP_COMP | \
>  - __GFP_NOMEMALLOC | __GFP_NORETRY | 
>  __GFP_NOWARN) & \
>  - ~__GFP_RECLAIM)
>  + __GFP_NOMEMALLOC | __GFP_NOWARN) & 
>  ~__GFP_RECLAIM)
> >>>
> >>> I am not sure this is the right thing to do. I think we should keep
> >>> __GFP_NORETRY and clear it where we want a stronger semantic. This is
> >>> just too subtle that all callsites are doing the right thing.
> >>
> >> That would complicate alloc_hugepage_direct_gfpmask() a bit, but if you
> >> think it's worth it, I can turn the default around, OK.
> > 
> > Hmm, on the other hand it is true that GFP_TRANSHUGE is clearing both
> > reclaim flags by default and then overwrites that. This is just too
> > ugly. Can we make GFP_TRANSHUGE to only define flags we care about and
> > then tweak those that should go away at the callsites which matter now
> > that we do not rely on is_thp_gfp_mask?
>  
> So the following patch attempts what you suggest, if I understand you
> correctly. GFP_TRANSHUGE includes all possible flag, and then they are
> removed as needed. I don't really think it helps code readability
> though.

yeah it is ugly as _hell_. I do not think this deserves too much time
to discuss as the flag is mostly internal but one last proposal would be
to define different THP allocations context explicitly. Some callers
would still need some additional meddling but maybe it would be slightly
better to read. Dunno. Anyway if you think this is not really an
improvement then I won't insist on any change to your original patch.
---
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 570383a41853..e7926b466107 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -255,9 +255,14 @@ struct vm_area_struct;
 #define GFP_DMA32  __GFP_DMA32
 #define GFP_HIGHUSER   (GFP_USER | __GFP_HIGHMEM)
 #define GFP_HIGHUSER_MOVABLE   (GFP_HIGHUSER | __GFP_MOVABLE)
-#define GFP_TRANSHUGE  ((GFP_HIGHUSER_MOVABLE | __GFP_COMP | \
+
+/* Optimistic or latency sensitive THP allocation - page fault path */
+#define GFP_TRANSHUGE_LIGHT ((GFP_HIGHUSER_MOVABLE | __GFP_COMP | \
 __GFP_NOMEMALLOC | __GFP_NORETRY | __GFP_NOWARN) & \
 ~__GFP_RECLAIM)
+/* More serious THP allocation request - kcompactd */
+#define GFP_TRANSHUGE ((GFP_TRANSHUGE_LIGHT | __GFP_DIRECT_RECLAIM) & \
+   ~__GFP_NORETRY)
 
 /* Convert GFP flags to their corresponding migrate type */
 #define GFP_MOVABLE_MASK (__GFP_RECLAIMABLE|__GFP_MOVABLE)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 1a4d4c807d92..937b89c6c0aa 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -216,7 +216,7 @@ struct page *get_huge_zero_page(void)
if (likely(atomic_inc_not_zero(&huge_zero_refcount)))
return READ_ONCE(huge_zero_page);
 
-   zero_page = alloc_pages((GFP_TRANSHUGE | __GFP_ZERO) & ~__GFP_MOVABLE,
+   zero_page = alloc_pages((GFP_TRANSHUGE_LIGHT | __GFP_ZERO) & ~__GFP_MOVABLE,
HPAGE_PMD_ORDER);
if (!zero_page) {
count_vm_event(THP_ZERO_PAGE_ALLOC_FAILED);
@@ -888,23 +888,31 @@ static int __do_huge_pmd_anonymous_page(struct mm_struct *mm,
  */
 static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma)
 {
-   gfp_t reclaim_flags = 0;
+   gfp_t gfp_mask = GFP_TRANSHUGE_LIGHT;
 
if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_REQ_MADV_FLAG, &transparent_hugepage_flags) &&
(vma->vm_flags & VM_HUGEPAGE))
-   reclaim_flags = __GFP_DIRECT_RECLAIM;
+   gfp_mask = GFP_TRANSHUGE;
else if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_FLAG, &transparent_hugepage_flags))
-   reclaim_flags = __GFP_KSWAPD_RECLAIM;
-   else if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG, &transparent_hugepage_flags))
-   reclaim_flags = __GFP_DIRECT_RECLAIM;
+   gfp_mask |= __GFP_KSWAPD_RECLAIM;
+   else if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG, &transparent_hugepage_flags)) {
+   if (vma->vm_flags & VM_HUGEPAGE)
+   gfp_mask = GFP_TRANSHUGE;
+   else
+   gfp_mask = GFP_TRANSHUGE | __GFP_NORETRY;
+   }
 
-   return GFP_TRANSHUGE | reclaim_flags;
+   return gfp_mask;
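
To make the intended split concrete, hypothetical callers under this proposal (the function names are illustrative; the GFP choices follow the comments in the diff above):

/* Page-fault path: optimistic and latency-sensitive. */
static struct page *thp_fault_alloc(void)
{
	return alloc_pages(GFP_TRANSHUGE_LIGHT, HPAGE_PMD_ORDER);
}

/* khugepaged/kcompactd-style collapse: may enter direct reclaim. */
static struct page *thp_collapse_alloc(void)
{
	return alloc_pages(GFP_TRANSHUGE, HPAGE_PMD_ORDER);
}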


Re: [PATCH] mm: fix duplicate words and typos

2016-05-18 Thread Randy Dunlap
On 05/17/16 19:35, Li Peng wrote:
> Signed-off-by: Li Peng 
> ---
>  mm/memcontrol.c | 2 +-
>  mm/page_alloc.c | 6 +++---
>  mm/vmscan.c | 7 +++
>  mm/zswap.c  | 2 +-
>  4 files changed, 8 insertions(+), 9 deletions(-)

> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 142cb61..8ff5a79 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c

> @@ -3267,8 +3267,7 @@ static int balance_pgdat(pg_data_t *pgdat, int order, int classzone_idx)
>   /*
>* There should be no need to raise the scanning
>* priority if enough pages are already being scanned
> -  * that that high watermark would be met at 100%
> -  * efficiency.
> +  * that high watermark would be met at 100% efficiency.

I think that this one wasn't wrong, just confusing.  Maybe change it to:
* that the high watermark would be met at 100% efficiency.

>*/
>   if (kswapd_shrink_zone(zone, end_zone, ))
>   raise_priority = false;
> diff --git a/mm/zswap.c b/mm/zswap.c
> index de0f119b..6d829d7 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -928,7 +928,7 @@ static int zswap_writeback_entry(struct zpool *pool, unsigned long handle)
>   * a load may happening concurrently
>   * it is safe and okay to not free the entry
>   * if we free the entry in the following put
> - * it it either okay to return !0
> + * it either okay to return !0

That's still confusing.  Needs some kind of help.

>   */
>  fail:
>   spin_lock(&tree->lock);
> 


-- 
~Randy



[RFC PATCH 04/15] Convert 32-bit ISO atomics into a template

2016-05-18 Thread David Howells
Convert the 32-bit ISO C++11 atomic implementation into a template that can
then be overloaded with #defines to produce 32-bit, long and 64-bit
atomics.

This is then overloaded to produce the 32-bit atomic_t implementation.

This should probably be merged with the previous patch if we want to use
the template approach.  The template approach has the advantage that it can
produce all of the different sets of atomics from the same code, but the
disadvantage that the template is much less readable.

Signed-off-by: David Howells 
---

 include/asm-generic/iso-atomic-template.h |  572 +
 include/asm-generic/iso-atomic.h  |  387 
 2 files changed, 579 insertions(+), 380 deletions(-)
 create mode 100644 include/asm-generic/iso-atomic-template.h

diff --git a/include/asm-generic/iso-atomic-template.h b/include/asm-generic/iso-atomic-template.h
new file mode 100644
index ..76da163ec559
--- /dev/null
+++ b/include/asm-generic/iso-atomic-template.h
@@ -0,0 +1,572 @@
+/* Use ISO C++11 intrinsics to implement atomic ops.
+ *
+ * This file is a template.  The #includer needs to #define the following
+ * items:
+ *
+ * atomic_val  - counter type (eg. int, long, long long)
+ * atomic_prefix(x) - prefix (eg. atomic, atomic64, atomic_long)
+ * __atomic_prefix(x)  - prefix with "__" prepended
+ *
+ * Copyright (C) 2016 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowe...@redhat.com)
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public Licence
+ * as published by the Free Software Foundation; either version
+ * 2 of the Licence, or (at your option) any later version.
+ */
+
+/**
+ * atomic_read - read atomic variable
+ * @v: pointer to atomic variable
+ *
+ * Atomically reads the value of @v.
+ */
+static __always_inline atomic_val atomic_prefix(_read)(const atomic_prefix(_t) *v)
+{
+   return __atomic_load_n(&v->counter, __ATOMIC_RELAXED);
+}
+
+static __always_inline atomic_val atomic_prefix(_read_acquire)(const atomic_prefix(_t) *v)
+{
+   return __atomic_load_n(&v->counter, __ATOMIC_ACQUIRE);
+}
+
+/**
+ * atomic_set - set atomic variable
+ * @v: pointer to atomic variable
+ * @i: required value
+ *
+ * Atomically sets the value of @v to @i.
+ */
+static __always_inline void atomic_prefix(_set)(atomic_prefix(_t) *v, atomic_val i)
+{
+   __atomic_store_n(&v->counter, i, __ATOMIC_RELAXED);
+}
+
+static __always_inline void atomic_prefix(_set_release)(atomic_prefix(_t) *v, atomic_val i)
+{
+   __atomic_store_n(&v->counter, i, __ATOMIC_RELEASE);
+}
+
+/**
+ * atomic_add - add integer to atomic variable
+ * @i: integer value to add
+ * @v: pointer to atomic variable
+ *
+ * Atomically adds @i to @v.
+ */
+static __always_inline void atomic_prefix(_add)(atomic_val i, atomic_prefix(_t) *v)
+{
+   __atomic_add_fetch(&v->counter, i, __ATOMIC_RELAXED);
+}
+
+static __always_inline void atomic_prefix(_inc)(atomic_prefix(_t) *v)
+{
+   __atomic_add_fetch(&v->counter, 1, __ATOMIC_RELAXED);
+}
+
+/**
+ * atomic_sub - subtract integer from atomic variable
+ * @i: integer value to subtract
+ * @v: pointer to atomic variable
+ *
+ * Atomically subtracts @i from @v.
+ */
+static __always_inline void atomic_prefix(_sub)(atomic_val i, atomic_prefix(_t) *v)
+{
+   __atomic_sub_fetch(&v->counter, i, __ATOMIC_RELAXED);
+}
+
+static __always_inline void atomic_prefix(_dec)(atomic_prefix(_t) *v)
+{
+   __atomic_sub_fetch(&v->counter, 1, __ATOMIC_RELAXED);
+}
+
+/**
+ * atomic_add_return - add integer and return
+ * @i: integer value to add
+ * @v: pointer to atomic variable
+ *
+ * Atomically adds @i to @v and returns @i + @v.
+ */
+static __always_inline
+atomic_val atomic_prefix(_add_return)(atomic_val i, atomic_prefix(_t) *v)
+{
+   return __atomic_add_fetch(&v->counter, i, __ATOMIC_SEQ_CST);
+}
+
+static __always_inline
+atomic_val atomic_prefix(_inc_return)(atomic_prefix(_t) *v)
+{
+   return __atomic_add_fetch(&v->counter, 1, __ATOMIC_SEQ_CST);
+}
+
+static __always_inline
+atomic_val atomic_prefix(_add_return_relaxed)(atomic_val i, atomic_prefix(_t) *v)
+{
+   return __atomic_add_fetch(&v->counter, i, __ATOMIC_RELAXED);
+}
+
+static __always_inline
+atomic_val atomic_prefix(_inc_return_relaxed)(atomic_prefix(_t) *v)
+{
+   return __atomic_add_fetch(&v->counter, 1, __ATOMIC_RELAXED);
+}
+
+static __always_inline
+atomic_val atomic_prefix(_add_return_acquire)(atomic_val i, atomic_prefix(_t) *v)
+{
+   return __atomic_add_fetch(&v->counter, i, __ATOMIC_ACQUIRE);
+}
+
+static __always_inline
+atomic_val atomic_prefix(_inc_return_acquire)(atomic_prefix(_t) *v)
+{
+   return __atomic_add_fetch(&v->counter, 1, __ATOMIC_ACQUIRE);
+}
+
+static __always_inline
+atomic_val atomic_prefix(_add_return_release)(atomic_val i, atomic_prefix(_t) *v)
+{
+   return __atomic_add_fetch(&v->counter, i, __ATOMIC_RELEASE);
+}
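
For illustration, this is roughly how an includer instantiates the template to get the plain 32-bit atomic_t operations (a sketch following the contract in the header comment; the exact file layout may differ):

#define atomic_val		int
#define atomic_prefix(x)	atomic##x
#define __atomic_prefix(x)	__atomic##x
#include <asm-generic/iso-atomic-template.h>
#undef atomic_val
#undef atomic_prefix
#undef __atomic_prefix

With these definitions, atomic_prefix(_add_return) expands to atomic_add_return() operating on an int counter; an atomic64 flavour would swap in long long and atomic64##x.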

[RFC PATCH 07/15] Provide cmpxchg(), xchg(), xadd() and __add() based on ISO C++11 intrinsics

2016-05-18 Thread David Howells
Provide implementations of cmpxchg(), xchg(), xadd() and __add() using ISO
C++11 intrinsics.  Also provide variants with less strong ordering.

The __atomic_compare_exchange_n() function returns a bool indicating
whether or not the exchange took place.  On x86, for example, the CMPXCHG
instruction sticks this value into the Z flag and using the bool return
allows the compiler to use this value directly.  I've added two new macros
for this:

 (1) bool cmpxchg_return(TYPE *ptr, TYPE old, TYPE new, TYPE *_orig)

 This is like cmpxchg() except that the current value is returned in
 *_orig and true is returned if the exchange happens, false otherwise.

 (2) bool try_cmpxchg(TYPE *ptr, TYPE old, TYPE new)

 Like cmpxchg_return() except that the current value is discarded.

I've marked cmpxchg() as deprecated as the compiler generates better code
if it is allowed to use the extra bit of information rather than comparing
old with the current value.

To illustrate the differences on x86_64, I compiled the following with -Os
and -fomit-frame-pointer:

const char *kernel_cmpxchg(long *ptr, long old, long new)
{
	long cur;
	cur = cmpxchg(ptr, old, new);
	return cur == old ? "foo" : "bar";
}

#define ISO__try_cmpxchg(ptr, old, new, mem)\
({  \
__typeof__((ptr)) __ptr = (ptr);\
__typeof__(*__ptr) __orig = (old);  \
__atomic_compare_exchange_n(__ptr,  \
&__orig, (new), \
false,  \
mem,\
__ATOMIC_RELAXED);  \
})
#define ISO_try_cmpxchg(ptr, old, new) \
ISO__try_cmpxchg((ptr), (old), (new), __ATOMIC_SEQ_CST)
const char *iso_try_cmpxchg(long *ptr, long old, long new)
{
	return ISO_try_cmpxchg(ptr, old, new) ? "foo" : "bar";
}

which gives:

kernel_cmpxchg:
	mov	%rsi,%rax
	lock cmpxchg %rdx,(%rdi)
	mov	$.LC1,%edx
	cmp	%rax,%rsi		<--- Redundant instruction
	mov	$.LC0,%eax
	cmovne	%rdx,%rax
	retq

iso_cmpxchg_return:
	mov	%rsi,%rax
	mov	%rsi,-0x8(%rsp)		<--- Unoptimisation
	lock cmpxchg %rdx,(%rdi)
	mov	$.LC1,%edx
	mov	$.LC0,%eax
	cmovne	%rdx,%rax
	retq

As can be seen, barring an unoptimisation, the ISO try_cmpxchg() code can
dispense with the CMP instruction required for the kernel inline asm and
use the Z flag resulting from the CMPXCHG directly.

This unoptimisation is an issue in gcc's __atomic_compare_exchange[_n]() on
several arches, including x86, arm64 and powerpc64, whereby it tries to use
the stack to pass the old value rather than bunging this into a register.

This appears to be because __atomic_compare_exchange[_n]() returns a bool
indicating whether the exchange happened or not and passes back the current
value read from the memory variable through a pointer in the argument list.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70825

Signed-off-by: David Howells 
---

 include/asm-generic/iso-cmpxchg.h |  180 +
 include/linux/atomic.h|3 +
 2 files changed, 183 insertions(+)
 create mode 100644 include/asm-generic/iso-cmpxchg.h

diff --git a/include/asm-generic/iso-cmpxchg.h b/include/asm-generic/iso-cmpxchg.h
new file mode 100644
index ..c84e213c0b6b
--- /dev/null
+++ b/include/asm-generic/iso-cmpxchg.h
@@ -0,0 +1,180 @@
+/* Use ISO C++11 intrinsics to implement cmpxchg() and xchg().
+ *
+ * Copyright (C) 2016 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowe...@redhat.com)
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public Licence
+ * as published by the Free Software Foundation; either version
+ * 2 of the Licence, or (at your option) any later version.
+ */
+
+#ifndef _ASM_GENERIC_ISO_CMPXCHG_H
+#define _ASM_GENERIC_ISO_CMPXCHG_H
+
+/**
+ * cmpxchg_return - Check variable and replace if expected value.
+ * @ptr: Pointer to target variable
+ * @old: The value to check for
+ * @new: The value to replace with
+ * @_orig: A pointer to a variable in which to place the current value
+ *
+ * Atomically checks the contents of *@ptr, and if the same as @old, replaces
+ * it with @new.  Return true if the exchange happened, false if it didn't.
+ * The original value read from *@ptr is returned in *@_orig.
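
A usage sketch for the proposed cmpxchg_return() (hypothetical caller, not part of the patch): because the current value is deposited back through the fourth argument on failure, the classic compare-and-swap retry loop needs no separate reload.

static long example_add_unless_negative(long *counter, long delta)
{
	long old = *counter;
	long new;

	do {
		if (old < 0)
			return old;	/* give up without writing */
		new = old + delta;
		/* On failure the freshly read value lands in 'old'
		 * and the loop retries against that. */
	} while (!cmpxchg_return(counter, old, new, &old));

	return new;
}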

[RFC PATCH 04/15] Convert 32-bit ISO atomics into a template

2016-05-18 Thread David Howells
Convert the 32-bit ISO C++11 atomic implementation into a template that can
then be overloaded with #defines to produce 32-bit, long and 64-bit
atomics.

This is then overloaded to produce the 32-bit atomic_t implementation.

This should probably be merged with the previous patch if we want to use
the template approach.  The template approach has the advantage that it can
produce all of the different sets of atomics from the same code, but the
disadvantage that the template is much less readable.

Signed-off-by: David Howells 
---

 include/asm-generic/iso-atomic-template.h |  572 +
 include/asm-generic/iso-atomic.h  |  387 
 2 files changed, 579 insertions(+), 380 deletions(-)
 create mode 100644 include/asm-generic/iso-atomic-template.h

diff --git a/include/asm-generic/iso-atomic-template.h 
b/include/asm-generic/iso-atomic-template.h
new file mode 100644
index ..76da163ec559
--- /dev/null
+++ b/include/asm-generic/iso-atomic-template.h
@@ -0,0 +1,572 @@
+/* Use ISO C++11 intrinsics to implement atomic ops.
+ *
+ * This file is a template.  The #includer needs to #define the following
+ * items:
+ *
+ * atomic_val  - counter type (eg. int, long, long long)
+ * atomic_prefix(x)- prefix (eg. atomic, atomic64, atomic_long)
+ * __atomic_prefix(x)  - prefix with "__" prepended
+ *
+ * Copyright (C) 2016 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowe...@redhat.com)
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public Licence
+ * as published by the Free Software Foundation; either version
+ * 2 of the Licence, or (at your option) any later version.
+ */
+
+/**
+ * atomic_read - read atomic variable
+ * @v: pointer to atomic variable
+ *
+ * Atomically reads the value of @v.
+ */
+static __always_inline atomic_val atomic_prefix(_read)(const atomic_prefix(_t) *v)
+{
+   return __atomic_load_n(&v->counter, __ATOMIC_RELAXED);
+}
+
+static __always_inline atomic_val atomic_prefix(_read_acquire)(const atomic_prefix(_t) *v)
+{
+   return __atomic_load_n(&v->counter, __ATOMIC_ACQUIRE);
+}
+
+/**
+ * atomic_set - set atomic variable
+ * @v: pointer to atomic variable
+ * @i: required value
+ *
+ * Atomically sets the value of @v to @i.
+ */
+static __always_inline void atomic_prefix(_set)(atomic_prefix(_t) *v, atomic_val i)
+{
+   __atomic_store_n(&v->counter, i, __ATOMIC_RELAXED);
+}
+
+static __always_inline void atomic_prefix(_set_release)(atomic_prefix(_t) *v, atomic_val i)
+{
+   __atomic_store_n(&v->counter, i, __ATOMIC_RELEASE);
+}
+
+/**
+ * atomic_add - add integer to atomic variable
+ * @i: integer value to add
+ * @v: pointer to atomic variable
+ *
+ * Atomically adds @i to @v.
+ */
+static __always_inline void atomic_prefix(_add)(atomic_val i, atomic_prefix(_t) *v)
+{
+   __atomic_add_fetch(&v->counter, i, __ATOMIC_RELAXED);
+}
+
+static __always_inline void atomic_prefix(_inc)(atomic_prefix(_t) *v)
+{
+   __atomic_add_fetch(&v->counter, 1, __ATOMIC_RELAXED);
+}
+
+/**
+ * atomic_sub - subtract integer from atomic variable
+ * @i: integer value to subtract
+ * @v: pointer to atomic variable
+ *
+ * Atomically subtracts @i from @v.
+ */
+static __always_inline void atomic_prefix(_sub)(atomic_val i, atomic_prefix(_t) *v)
+{
+   __atomic_sub_fetch(&v->counter, i, __ATOMIC_RELAXED);
+}
+
+static __always_inline void atomic_prefix(_dec)(atomic_prefix(_t) *v)
+{
+   __atomic_sub_fetch(&v->counter, 1, __ATOMIC_RELAXED);
+}
+
+/**
+ * atomic_add_return - add integer and return
+ * @i: integer value to add
+ * @v: pointer to atomic variable
+ *
+ * Atomically adds @i to @v and returns @i + @v.
+ */
+static __always_inline
+atomic_val atomic_prefix(_add_return)(atomic_val i, atomic_prefix(_t) *v)
+{
+   return __atomic_add_fetch(&v->counter, i, __ATOMIC_SEQ_CST);
+}
+
+static __always_inline
+atomic_val atomic_prefix(_inc_return)(atomic_prefix(_t) *v)
+{
+   return __atomic_add_fetch(&v->counter, 1, __ATOMIC_SEQ_CST);
+}
+
+static __always_inline
+atomic_val atomic_prefix(_add_return_relaxed)(atomic_val i, atomic_prefix(_t) *v)
+{
+   return __atomic_add_fetch(&v->counter, i, __ATOMIC_RELAXED);
+}
+
+static __always_inline
+atomic_val atomic_prefix(_inc_return_relaxed)(atomic_prefix(_t) *v)
+{
+   return __atomic_add_fetch(&v->counter, 1, __ATOMIC_RELAXED);
+}
+
+static __always_inline
+atomic_val atomic_prefix(_add_return_acquire)(atomic_val i, atomic_prefix(_t) *v)
+{
+   return __atomic_add_fetch(&v->counter, i, __ATOMIC_ACQUIRE);
+}
+
+static __always_inline
+atomic_val atomic_prefix(_inc_return_acquire)(atomic_prefix(_t) *v)
+{
+   return __atomic_add_fetch(&v->counter, 1, __ATOMIC_ACQUIRE);
+}
+
+static __always_inline
+atomic_val atomic_prefix(_add_return_release)(atomic_val i, atomic_prefix(_t) *v)
+{
+   return __atomic_add_fetch(&v->counter, i, __ATOMIC_RELEASE);
+}
+
+static 

Re: [RFC PATCH 2/3] arch/powerpc : optprobes for powerpc core

2016-05-18 Thread Masami Hiramatsu
On Wed, 18 May 2016 02:09:37 +0530
Anju T  wrote:

> Instruction slot for detour buffer is allocated from
> the reserved area. For the time being 64KB is reserved
> in memory for this purpose. ppc_get_optinsn_slot() and
> ppc_free_optinsn_slot() are geared towards the allocation and freeing
> of memory from this area.

Thank you for porting optprobe on ppc!!

I have some comments on this patch.

> 
> Signed-off-by: Anju T 
> ---
>  arch/powerpc/kernel/optprobes.c | 463 
> 
>  1 file changed, 463 insertions(+)
>  create mode 100644 arch/powerpc/kernel/optprobes.c
> 
> diff --git a/arch/powerpc/kernel/optprobes.c b/arch/powerpc/kernel/optprobes.c
> new file mode 100644
> index 000..50a60c1
> --- /dev/null
> +++ b/arch/powerpc/kernel/optprobes.c
> @@ -0,0 +1,463 @@
> +/*
> + * Code for Kernel probes Jump optimization.
> + *
> + * Copyright 2016, Anju T, IBM Corp.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version
> + * 2 of the License, or (at your option) any later version.
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +/* Reserve an area to allocate slots for detour buffer */
> +extern void  optprobe_trampoline_holder(void)
> +{
> + asm volatile(".global optinsn_slot\n"
> + "optinsn_slot:\n"
> + ".space 65536");
> +}

Would we better move this into optprobes_head.S?

> +
> +#define SLOT_SIZE 65536
> +#define TMPL_CALL_HDLR_IDX   \
> + (optprobe_template_call_handler - optprobe_template_entry)
> +#define TMPL_EMULATE_IDX \
> + (optprobe_template_call_emulate - optprobe_template_entry)
> +#define TMPL_RET_BRANCH_IDX  \
> + (optprobe_template_ret_branch - optprobe_template_entry)
> +#define TMPL_RET_IDX \
> + (optprobe_template_ret - optprobe_template_entry)
> +#define TMPL_OP1_IDX \
> + (optprobe_template_op_address1 - optprobe_template_entry)
> +#define TMPL_OP2_IDX \
> + (optprobe_template_op_address2 - optprobe_template_entry)
> +#define TMPL_INSN_IDX\
> + (optprobe_template_insn - optprobe_template_entry)
> +#define TMPL_END_IDX \
> + (optprobe_template_end - optprobe_template_entry)
> +
> +struct kprobe_ppc_insn_page {
> + struct list_head list;
> + kprobe_opcode_t *insns; /* Page of instruction slots */
> + struct kprobe_insn_cache *cache;
> + int nused;
> + int ngarbage;
> + char slot_used[];
> +};
> +
> +#define PPC_KPROBE_INSN_PAGE_SIZE(slots) \
> + (offsetof(struct kprobe_ppc_insn_page, slot_used) + \
> + (sizeof(char) * (slots)))
> +
> +enum ppc_kprobe_slot_state {
> + SLOT_CLEAN = 0,
> + SLOT_DIRTY = 1,
> + SLOT_USED = 2,
> +};
> +
> +static struct kprobe_insn_cache kprobe_ppc_optinsn_slots = {
> + .mutex = __MUTEX_INITIALIZER(kprobe_ppc_optinsn_slots.mutex),
> + .pages = LIST_HEAD_INIT(kprobe_ppc_optinsn_slots.pages),
> + /* .insn_size is initialized later */
> + .nr_garbage = 0,
> +};
> +
> +static int ppc_slots_per_page(struct kprobe_insn_cache *c)
> +{
> + /*
> +  * Here the #slots per page differs from x86 as we have
> +  * only 64KB reserved.
> +  */
> + return SLOT_SIZE / (c->insn_size * sizeof(kprobe_opcode_t));
> +}
> +
> +/* Return 1 if all garbages are collected, otherwise 0. */
> +static int collect_one_slot(struct kprobe_ppc_insn_page *kip, int idx)
> +{
> + kip->slot_used[idx] = SLOT_CLEAN;
> + kip->nused--;
> + return 0;
> +}
> +
> +static int collect_garbage_slots(struct kprobe_insn_cache *c)
> +{
> + struct kprobe_ppc_insn_page *kip, *next;
> +
> + /* Ensure no-one is interrupted on the garbages */
> + synchronize_sched();
> +
> + list_for_each_entry_safe(kip, next, &c->pages, list) {
> + int i;
> +
> + if (kip->ngarbage == 0)
> + continue;
> + kip->ngarbage = 0;  /* we will collect all garbages */
> + for (i = 0; i < ppc_slots_per_page(c); i++) {
> + if (kip->slot_used[i] == SLOT_DIRTY &&
> + collect_one_slot(kip, i))
> + break;
> + }
> + }
> + c->nr_garbage = 0;
> + return 0;
> +}
> +
> +kprobe_opcode_t  *__ppc_get_optinsn_slot(struct kprobe_insn_cache *c)
> +{
> + struct kprobe_ppc_insn_page *kip;
> + kprobe_opcode_t *slot = NULL;
> +
> + mutex_lock(&c->mutex);
> + list_for_each_entry(kip, &c->pages, list) {
> + if (kip->nused < ppc_slots_per_page(c)) {
> + int i;
> +
> + for (i = 0; i < ppc_slots_per_page(c); i++) {
> + if (kip->slot_used[i] == SLOT_CLEAN) {
> + kip->slot_used[i] = SLOT_USED;
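
To make the arithmetic of ppc_slots_per_page() above concrete (an
illustrative calculation; the real insn_size is set elsewhere in the series):

	/* 64KB reserved; if one detour buffer needs, say, 20 slots of
	 * 4-byte kprobe_opcode_t each: */
	slots = 65536 / (20 * sizeof(kprobe_opcode_t));	/* = 819 */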


Re: [PATCH v6 2/8] debugfs: prevent access to removed files' private data

2016-05-18 Thread Sasha Levin
On 05/18/2016 11:01 AM, Nicolai Stange wrote:
> Thanks a million for reporting!
> 
> 1.) Do you have lockdep enabled?

Yup, nothing there.

> 2.) Does this happen before or after userspace init has been spawned,
> i.e. does the lockup happen at debugfs file creation time or
> possibly at usage time?

So I looked closer, and it seems to happen after starting syzkaller, which
as far as I know tries to open many different debugfs files.

Is there debug code I can add that'll help us figure out what's up?


Thanks,
Sasha



[PATCH] Input: pwm-beeper - fix: scheduling while atomic

2016-05-18 Thread Manfred Schlaegl
PWM config may sleep, so defer it using a worker.

Trigger:
On a Freescale i.MX53 based board we ran into "BUG: scheduling while
atomic" because input_inject_event locks interrupts, but
imx_pwm_config_v2 sleeps.

Tested on Freescale i.MX53 SoC with 4.6.0 and 4.1.24.

Unmodified applicable to
 * 4.6(stable)
 * 4.5.4  (stable)
 * 4.4.10 (longterm)
 * 4.1.24 (longterm)

Modified applicable to
 * 3.18.33 (longterm)

Signed-off-by: Manfred Schlaegl 
---
 drivers/input/misc/pwm-beeper.c | 54 +++--
 1 file changed, 36 insertions(+), 18 deletions(-)

diff --git a/drivers/input/misc/pwm-beeper.c b/drivers/input/misc/pwm-beeper.c
index f2261ab..7d783c8 100644
--- a/drivers/input/misc/pwm-beeper.c
+++ b/drivers/input/misc/pwm-beeper.c
@@ -20,21 +20,41 @@
 #include 
 #include 
 #include 
+#include 
 
 struct pwm_beeper {
struct input_dev *input;
struct pwm_device *pwm;
+   struct work_struct work;
unsigned long period;
 };
 
 #define HZ_TO_NANOSECONDS(x) (1000000000UL/(x))
 
+static void __pwm_beeper_set(struct pwm_beeper *beeper)
+{
+   unsigned long period = beeper->period;
+
+   pwm_config(beeper->pwm, period / 2, period);
+
+   if (period == 0)
+   pwm_disable(beeper->pwm);
+   else
+   pwm_enable(beeper->pwm);
+}
+
+static void pwm_beeper_work(struct work_struct *work)
+{
+   struct pwm_beeper *beeper =
+   container_of(work, struct pwm_beeper, work);
+
+   __pwm_beeper_set(beeper);
+}
+
 static int pwm_beeper_event(struct input_dev *input,
unsigned int type, unsigned int code, int value)
 {
-   int ret = 0;
struct pwm_beeper *beeper = input_get_drvdata(input);
-   unsigned long period;
 
if (type != EV_SND || value < 0)
return -EINVAL;
@@ -49,18 +69,12 @@ static int pwm_beeper_event(struct input_dev *input,
return -EINVAL;
}
 
-   if (value == 0) {
-   pwm_disable(beeper->pwm);
-   } else {
-   period = HZ_TO_NANOSECONDS(value);
-   ret = pwm_config(beeper->pwm, period / 2, period);
-   if (ret)
-   return ret;
-   ret = pwm_enable(beeper->pwm);
-   if (ret)
-   return ret;
-   beeper->period = period;
-   }
+   if (value == 0)
+   beeper->period = 0;
+   else
+   beeper->period = HZ_TO_NANOSECONDS(value);
+
+   schedule_work(&beeper->work);
 
return 0;
 }
@@ -87,6 +101,8 @@ static int pwm_beeper_probe(struct platform_device *pdev)
goto err_free;
}
 
+   INIT_WORK(&beeper->work, pwm_beeper_work);
+
beeper->input = input_allocate_device();
if (!beeper->input) {
dev_err(&pdev->dev, "Failed to allocate input device\n");
@@ -133,6 +149,8 @@ static int pwm_beeper_remove(struct platform_device *pdev)
 {
struct pwm_beeper *beeper = platform_get_drvdata(pdev);
 
+   cancel_work_sync(&beeper->work);
+
input_unregister_device(beeper->input);
 
pwm_disable(beeper->pwm);
@@ -147,6 +165,8 @@ static int __maybe_unused pwm_beeper_suspend(struct device *dev)
 {
struct pwm_beeper *beeper = dev_get_drvdata(dev);
 
+   cancel_work_sync(&beeper->work);
+
if (beeper->period)
pwm_disable(beeper->pwm);
 
@@ -157,10 +177,8 @@ static int __maybe_unused pwm_beeper_resume(struct device *dev)
 {
struct pwm_beeper *beeper = dev_get_drvdata(dev);
 
-   if (beeper->period) {
-   pwm_config(beeper->pwm, beeper->period / 2, beeper->period);
-   pwm_enable(beeper->pwm);
-   }
+   if (beeper->period)
+   __pwm_beeper_set(beeper);
 
return 0;
 }
-- 
2.1.4
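
The shape of the fix generalises to any driver that must start sleeping work
from an atomic event callback; a minimal sketch of the pattern (all names here
are invented):

	struct foo {
		struct work_struct work;
		unsigned long period;	/* parameter cached for the worker */
	};

	/* process context, may sleep */
	static void foo_work(struct work_struct *work)
	{
		struct foo *f = container_of(work, struct foo, work);

		foo_configure_hw(f->period);	/* the sleeping call */
	}

	/* may run in atomic context: cache the parameter and defer */
	static void foo_event(struct foo *f, unsigned long period)
	{
		f->period = period;
		schedule_work(&f->work);
	}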






Re: [PATCH] rcu: tree: correctly handle sparse possible CPUs

2016-05-18 Thread Andrey Ryabinin
2016-05-16 19:48 GMT+03:00 Mark Rutland :

>  /*
> + * Iterate over all possible CPUs in a leaf RCU node.
> + */
> +#define for_each_leaf_node_possible_cpu(rnp, cpu) \
> +   for ((cpu) = rnp->grplo; \
> +cpu <= rnp->grphi; \
> +cpu = cpumask_next((cpu), cpu_possible_mask))
> +
> +/*
> + * Iterate over all possible CPUs in a leaf RCU node, at each step providing 
> a
> + * bit for comparison against rcu_node bitmasks.
> + */
> +#define for_each_leaf_node_possible_cpu_bit(rnp, cpu, bit) \
> +   for ((cpu) = rnp->grplo, (bit) = 1; \
> +cpu <= rnp->grphi; \
> +cpu = cpumask_next((cpu), cpu_possible_mask), \
> +  (bit) = 1UL << (cpu - rnp->grplo))
> +


[0.163652] UBSAN: Undefined behaviour in ../kernel/rcu/tree.c:2912:3
[0.164000] shift exponent 64 is too large for 64-bit type 'long
unsigned int'




[RFC PATCH 03/15] Provide atomic_t functions implemented with ISO-C++11 atomics

2016-05-18 Thread David Howells
Provide an implementation of the atomic_t functions implemented with
ISO-C++11 atomics.  This has some advantages over using inline assembly:

 (1) The compiler has a much better idea of what is going on and can
 optimise appropriately, whereas with inline assembly, the content is
 an indivisible black box as far as the compiler is concerned.

 For example, with powerpc64, the compiler can put the barrier in a
 potentially more favourable position.  Take the inline-asm variant of
 test_and_set_bit() - this has to (a) generate a mask and an address
 and (b) interpolate a memory barrier before doing the atomic ops.

 Here's a disassembly of the current inline-asm variant and then the
 ISO variant:

.current_test_and_set_bit:
clrlwi  r10,r3,26   }
rldicl  r3,r3,58,6  } Fiddling with regs to make
rldicr  r3,r3,3,60  } mask and address
li  r9,1}
sld r9,r9,r10   }
add r4,r4,r3}
hwsync  <--- Release barrier
retry:  ldarx   r3,0,r4 }
or  r10,r3,r9   } Atomic region
stdcx.  r10,0,r4}
bne-retry   }
hwsync  <--- Acquire barrier
and r9,r9,r3
addic   r3,r9,-1
subfe   r3,r3,r9
blr

.iso_test_and_set_bit:
hwsync  <--- Release barrier
clrlwi  r10,r3,26   }
sradi   r3,r3,6 } Fiddling with regs to make
li  r9,1} mask and address
rldicr  r3,r3,3,60  }
sld r9,r9,r10   }
add r4,r4,r3}
retry:  ldarx   r3,0,r4 }
or  r10,r3,r9   } Atomic region
stdcx.  r10,0,r4}
bne retry   }
isync   <--- Acquire barrier
and r9,r9,r3
addic   r3,r9,-1
subfe   r3,r3,r9
blr

 Moving the barrier up in the ISO case would seem to give the CPU a
 better chance of doing the barrier simultaneously with the register
 fiddling.

 Things to note here:

 (a) A BNE rather than a BNE- is emitted after the STDCX instruction.
 I've logged this as:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71162

 (b) An ISYNC instruction is emitted as the Acquire barrier with
 __ATOMIC_SEQ_CST, but I'm not sure this is strong enough.

 (c) An LWSYNC instruction is emitted as the Release barrier with
 __ATOMIC_ACQ_REL or __ATOMIC_RELEASE.  Is this strong enough that
 we could use these memorders instead?


 (2) The compiler can pick the appropriate instruction to use, depending on
 the size of the memory location, the operation being applied, the
 value of the operand being applied (if applicable), whether the return
 value is used and whether any condition flags resulting from the
 atomic op are used.

 For instance, on x86_64, atomic_add() can use any of ADDL, SUBL, INCL,
 DECL, XADDL or CMPXCHGL.  atomic_add_return() can use ADDL, SUBL, INCL
 or DECL if we only care about the representation of the return value
 encoded in the flags (zero, carry, sign).  If we actually want the
 return value, atomic_add_return() will use XADDL.

 So with __atomic_fetch_add(), if the return value isn't being used and
 if the operand is 1, INCL will be used; if >1, ADDL will be used; if
 -1, DECL will be used; and if <-1, SUBL will be used.

 Things to note here:

 (a) If unpatched, gcc will emit an XADDL instruction rather than a
 DECL instruction if it is told to subtract 1.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70821

 (b) The x86 arch keeps a list of LOCK prefixes and NOP's them out when
 only one CPU is online, but gcc doesn't do this yet.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70973


 (3) The compiler can make use of extra information provided by the atomic
 instructions involved that can't otherwise be passed out of inline
 assembly - such as the condition flag values.

 Take atomic_dec_and_test(), for example.  I can call it as follows:

const char *xxx_atomic_dec_and_test(atomic_t *counter)
{
if (atomic_dec_and_test(counter))
return "foo";
return "bar";
}

 For original, non-asm-goto code, I see:

nogoto_atomic_dec_and_test:
lock decl (%rdi)
sete   %al
mov$.LC1,%edx
test   %al,%al
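
For reference, a test_and_set_bit() along the lines discussed in (1) can be
sketched with the fetch-or intrinsic (an illustration; the series' own
implementation may differ in detail):

	static __always_inline int iso_test_and_set_bit(long nr,
							volatile unsigned long *addr)
	{
		unsigned long mask = 1UL << (nr & (BITS_PER_LONG - 1));
		unsigned long old;

		addr += nr / BITS_PER_LONG;
		old = __atomic_fetch_or(addr, mask, __ATOMIC_SEQ_CST);
		return (old & mask) != 0;
	}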


[RFC PATCH 02/15] tty: ldsem_cmpxchg() should use cmpxchg() not atomic_long_cmpxchg()

2016-05-18 Thread David Howells
ldsem_cmpxchg() should use cmpxchg() not atomic_long_cmpxchg() since the
variable it is wangling isn't an atomic_long_t.

Signed-off-by: David Howells 
cc: Peter Hurley 
cc: Greg Kroah-Hartman 
---

 drivers/tty/tty_ldsem.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/tty/tty_ldsem.c b/drivers/tty/tty_ldsem.c
index 1bf8ed13f827..dc27b285ba4f 100644
--- a/drivers/tty/tty_ldsem.c
+++ b/drivers/tty/tty_ldsem.c
@@ -86,7 +86,7 @@ static inline long ldsem_atomic_update(long delta, struct ld_semaphore *sem)
  */
 static inline int ldsem_cmpxchg(long *old, long new, struct ld_semaphore *sem)
 {
-   long tmp = atomic_long_cmpxchg(&sem->count, *old, new);
+   long tmp = cmpxchg(&sem->count, *old, new);
if (tmp == *old) {
*old = new;
return 1;



[RFC PATCH 00/15] Provide atomics and bitops implemented with ISO C++11 atomics

2016-05-18 Thread David Howells

Here's a set of patches to provide kernel atomics and bitops implemented
with ISO C++11 atomic intrinsics.  The second part of the set makes the x86
arch use the implementation.

Note that the x86 patches are very rough.  It would need to be made
compile-time conditional in some way and the old code can't really be
thrown away that easily - but it's a good way to make sure I'm not using
that code any more.


There are some advantages to using ISO C++11 atomics:

 (1) The compiler can make use of extra information, such as condition
 flags, that are tricky to get out of inline assembly in an efficient
 manner.  This should reduce the number of instructions required in
 some cases - such as in x86 where we use SETcc to store the condition
 inside the inline asm and then CMP outside to put it back again.

 Whilst this can be alleviated by the use of asm-goto constructs, this
 adds mandatory conditional jumps where the use of CMOVcc and SETcc
 might be better.

 (2) The compiler inserts memory barriers for us and can move them earlier,
 within reason, thereby affording a greater chance of the CPU being
 able to execute the memory barrier instruction simultaneously with
 register-only instructions.

 (3) The compiler can automatically switch between different forms of an
 atomic instruction depending on operand size, thereby eliminating the
 need for large switch statements with individual blocks of inline asm.

 (4) The compiler can automatically switch between different available
 atomic instructions depending on the values in the operands (INC vs
 ADD) and whether the return value is examined (ADD vs XADD) and how it
 is examined (ADD+Jcc vs XADD+CMP+Jcc).


There are some disadvantages also:

 (1) It's not available in gcc before gcc-4.7 and there will be some
 seriously suboptimal code production before gcc-7.1.

 (2) The compiler might misoptimise - for example, it might generate a
 CMPXCHG loop rather than a BTR instruction to clear a bit.

 (3) The C++11 memory model permits atomic instructions to be merged if
 appropriate - for example, two adjacent __atomic_read() calls might
 get merged if the return value of the first isn't examined.  Making
 the pointers volatile alleviates this.  Note that gcc doesn't do this
 yet.

 (4) The C++11 memory barriers are, in general, weaker than the kernel's
 memory barriers are defined to be.  Whether this actually matters is
 arch dependent.  Further, the C++11 memory barriers are
 acquire/release, but some arches actually have load/store instead -
 which may be a problem.

 (5) On x86, the compiler doesn't tell you where the LOCK prefixes are so
 they cannot be suppressed if only one CPU is online.


Things to be considered:

 (1) We could weaken the kernel memory model for the benefit of arches
 that have instructions that employ explicit acquire/release barriers -
 but that may cause data races to occur based on assumptions we've
 already made.  Note, however, that powerpc already seems to have a
 weaker memory model.

 (2) There are three sets of atomics (atomic_t, atomic64_t and
 atomic_long_t).  I can either put each in a separate file all nicely
 laid out (see patch 3) or I can make a template header (see patch 4)
 and use #defines with it to make all three atomics from one piece of
 code.  Which is best?  The first is definitely easier to read, but the
 second might be easier to maintain.

 (3) I've added cmpxchg_return() and try_cmpxchg() to replace cmpxchg().
 I've also provided atomicX_t variants of these.  These return the
 boolean return value from the __atomic_compare_exchange_n() function
 (which is carried in the Z flag on x86).  Should this be rolled out
 even without the ISO atomic implementation?

 (4) The current x86_64 bitops (set_bit() and co.) are technically broken.
 The set_bit() function, for example, takes a 64-bit signed bit number
 but implements this with BTSL, presumably because it's a shorter
 instruction.

 The BTS instruction's bit number operand, however, is sized according
 to the memory operand, so the upper 32 bits of the bit number are
 discarded by BTSL.  We should really be using BTSQ to implement this
 correctly (and gcc does just that).  In practice, though, it would
 seem unlikely that this will ever be a problem as I think we're
 unlikely to have a bitset with more than ~2 billion bits in it within
 the kernel (it would be >256MiB in size).

 Patch 9 casts the pointers internally in the bitops functions to
 persuade the kernel to use 32-bit bitops - but this only really
 matters on x86.  Should set_bit() and co. take int rather than long
 bit number arguments to make the point?
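
As a concrete illustration of the truncation described in (4) (an invented
example, not code from the series):

	/* BTSL only sees the low 32 bits of the bit number, so this
	 * sets bit 5 rather than bit 2^32 + 5: */
	set_bit(0x100000005UL, bitmap);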


I've opened a number of gcc bugzillas to improve the code generated by the
atomics:

 (*) 


[RFC PATCH 01/15] cmpxchg_local() is not signed-value safe, so fix generic atomics

2016-05-18 Thread David Howells
cmpxchg_local() is not signed-value safe because on a 64-bit machine signed
int arguments to it may be sign-extended to signed long _before_ being cast
to unsigned long.  This potentially causes comparisons to fail when dealing
with negative values.

Fix the generic atomic functions that are implemented in terms of cmpxchg()
to cast their arguments to unsigned int before calling cmpxchg().
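
A short illustration of the failure mode (an invented example):

	int i = -1;
	unsigned long a = (unsigned long)i;               /* 0xffffffffffffffff */
	unsigned long b = (unsigned long)(unsigned int)i; /* 0x00000000ffffffff */

A cmpxchg() that compares a 32-bit counter against the first form can never
match, hence the explicit casts in the hunks below.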

Signed-off-by: David Howells 
---

 include/asm-generic/atomic.h |   17 +++--
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/include/asm-generic/atomic.h b/include/asm-generic/atomic.h
index 74f1a3704d7a..e6c71c52edfe 100644
--- a/include/asm-generic/atomic.h
+++ b/include/asm-generic/atomic.h
@@ -37,28 +37,33 @@
 
 #ifdef CONFIG_SMP
 
-/* we can build all atomic primitives from cmpxchg */
+/*
+ * We can build all atomic primitives from cmpxchg(), but we have to beware of
+ * implicit casting of signed int parameters to signed long and thence to
+ * unsigned long on a 64-bit machine if we don't explicitly cast to unsigned
+ * int.
+ */
 
 #define ATOMIC_OP(op, c_op)\
 static inline void atomic_##op(int i, atomic_t *v) \
 {  \
-   int c, old; \
+   unsigned int c, old;\
\
c = v->counter; \
-   while ((old = cmpxchg(&v->counter, c, c c_op i)) != c)  \
+   while ((old = cmpxchg(&v->counter, c, c c_op (unsigned int)i)) != c) \
c = old;\
 }
 
 #define ATOMIC_OP_RETURN(op, c_op) \
 static inline int atomic_##op##_return(int i, atomic_t *v) \
 {  \
-   int c, old; \
+   unsigned int c, old;\
\
c = v->counter; \
-   while ((old = cmpxchg(&v->counter, c, c c_op i)) != c)  \
+   while ((old = cmpxchg(&v->counter, c, c c_op (unsigned int)i)) != c) \
c = old;\
\
-   return c c_op i;\
+   return c c_op (unsigned int)i;  \
 }
 
 #else






[RFC v2 5/7] iio: inv_mpu6050: Add support for auxiliary I2C master

2016-05-18 Thread Crestez Dan Leonard
The MPU has an auxiliary I2C bus for connecting external
sensors. This bus has two operating modes:
* bypass: which connects the primary and auxiliary busses
together. This is already supported via an i2c mux.
* master: where the MPU acts as a master to any external
connected sensors. This is implemented by this patch.

This I2C master mode also works when the MPU itself is connected via
SPI.

I2C master supports up to 5 slaves. Slaves 0-3 have a common operating
mode while slave 4 is different. This patch implements an i2c adapter
using slave 4.

Signed-off-by: Crestez Dan Leonard 
---
 .../devicetree/bindings/iio/imu/inv_mpu6050.txt|  66 +-
 drivers/iio/imu/inv_mpu6050/inv_mpu_core.c | 258 -
 drivers/iio/imu/inv_mpu6050/inv_mpu_iio.h  |  36 +++
 drivers/iio/imu/inv_mpu6050/inv_mpu_trigger.c  |   8 -
 4 files changed, 355 insertions(+), 13 deletions(-)

diff --git a/Documentation/devicetree/bindings/iio/imu/inv_mpu6050.txt b/Documentation/devicetree/bindings/iio/imu/inv_mpu6050.txt
index a9fc11e..778d076 100644
--- a/Documentation/devicetree/bindings/iio/imu/inv_mpu6050.txt
+++ b/Documentation/devicetree/bindings/iio/imu/inv_mpu6050.txt
@@ -1,16 +1,31 @@
 InvenSense MPU-6050 Six-Axis (Gyro + Accelerometer) MEMS MotionTracking Device
 
-http://www.invensense.com/mems/gyro/mpu6050.html
-
 Required properties:
- - compatible : should be "invensense,mpu6050"
- - reg : the I2C address of the sensor
+ - compatible : one of "invensense,mpu6000", "invensense,mpu6050",
+   "invensense,mpu6000" or "invensense,mpu9150"
+ - reg : the I2C or SPI address of the sensor
  - interrupt-parent : should be the phandle for the interrupt controller
  - interrupts : interrupt mapping for GPIO IRQ
 
 Optional properties:
  - mount-matrix: an optional 3x3 mounting rotation matrix
+ - invensense,i2c-aux-master: operate aux i2c in "master" mode (default is bypass).
+
+The MPU has an auxiliary i2c bus for additional sensors. Devices attached this
+way can be described as for a regular linux i2c bus.
+
+It is possible to interact with aux devices in "bypass" or "master" mode. In
+"bypass" mode the auxiliary SDA/SCL lines are connected directly to the main 
i2c
+interface. In "master" mode access to aux devices is done by instructing the
+MPU to read or write i2c bytes.
+
+In "bypass" mode aux devices are listed behind a "i2c@0" node with reg = <0>;
+In "master" mode aux devices are listed behind a "i2c@1" node with reg = <1>;
 
+The master and bypass modes are not supported at the same time. The
+"invensense,i2c-aux-master" property must be set to activate master mode.
+Bypass mode is generally faster but master mode also works when the MPU is
+connected via SPI.
 
 Example:
mpu6050@68 {
@@ -28,3 +43,46 @@ Example:
   "0",   /* y2 */
   "0.984807753012208";   /* z2 */
};
+
+Example describing mpu9150 (which includes an ak8975 on chip):
+   mpu9150@68 {
+   compatible = "invensense,mpu9150";
+   reg = <0x68>;
+   interrupt-parent = <>;
+   interrupts = <18 1>;
+
+   #address-cells = <1>;
+   #size-cells = <0>;
+   i2c@0 {
+   reg = <0>;
+   #address-cells = <1>;
+   #size-cells = <0>;
+
+   ak8975@0c {
+   compatible = "ak,ak8975";
+   reg = <0x0c>;
+   };
+   };
+   };
+
+Example describing a mpu6500 via SPI with an hmc5883l on auxiliary i2c:
+   mpu6500@0 {
+   compatible = "invensense,mpu6500";
+   reg = <0x0>;
+   spi-max-frequency = <100>;
+   interrupt-parent = <>;
+   interrupts = <31 1>;
+
+   #address-cells = <1>;
+   #size-cells = <0>;
+   i2c@1 {
+   reg = <1>;
+   #address-cells = <1>;
+   #size-cells = <0>;
+
+   hmc5883l@1e {
+   compatible = "honeywell,hmc5883l";
+   reg = <0x1e>;
+   };
+   };
+   };
diff --git a/drivers/iio/imu/inv_mpu6050/inv_mpu_core.c b/drivers/iio/imu/inv_mpu6050/inv_mpu_core.c
index 5918c23..76683b8 100644
--- a/drivers/iio/imu/inv_mpu6050/inv_mpu_core.c
+++ b/drivers/iio/imu/inv_mpu6050/inv_mpu_core.c
@@ -25,6 +25,8 @@
 #include 
 #include 
 #include 
+#include 
+
 #include "inv_mpu_iio.h"
 
 /*
@@ -57,6 +59,7 @@ static const struct inv_mpu6050_reg_map reg_set_6500 = {
.int_pin_cfg= INV_MPU6050_REG_INT_PIN_CFG,
.accl_offset= INV_MPU6500_REG_ACCEL_OFFSET,
.gyro_offset= INV_MPU6050_REG_GYRO_OFFSET,
+   .mst_status = INV_MPU6050_REG_I2C_MST_STATUS,
 };
 
 

[PATCH v2 6/7] iio: inv_mpu6050: Reformat sample for active scan mask

2016-05-18 Thread Crestez Dan Leonard
Right now it is possible to enable only some of the x/y/z channels; for
example, you can enable accel_z without x or y. But if you do that, what
you actually get is only the x channel.

Fix this by reformatting the hardware sample to only include the
requested channels.
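
To make the failure mode concrete, a standalone sketch of the repacking
this patch performs (made-up sample values; byte order and timestamp
handling omitted):

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
            /* The FIFO always emits whole frames: ax ay az gx gy gz. */
            uint16_t hw[6] = { 100, 200, 300, 400, 500, 600 };
            unsigned long mask = (1UL << 2) | (1UL << 4); /* accel_z, gyro_y */
            uint16_t out[6];
            int i, pos = 0;

            for (i = 0; i < 6; i++)
                    if (mask & (1UL << i))
                            out[pos++] = hw[i];

            /* Prints 300 then 500: the channels the user asked for. */
            for (i = 0; i < pos; i++)
                    printf("%u\n", (unsigned)out[i]);
            return 0;
    }

Without this repacking step, the buffer demux would hand the first
entries of the frame (accel_x) to a consumer that asked for accel_z.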

Signed-off-by: Crestez Dan Leonard 
---
 drivers/iio/imu/inv_mpu6050/inv_mpu_iio.h  |   8 +++
 drivers/iio/imu/inv_mpu6050/inv_mpu_ring.c | 107 -
 2 files changed, 112 insertions(+), 3 deletions(-)

diff --git a/drivers/iio/imu/inv_mpu6050/inv_mpu_iio.h b/drivers/iio/imu/inv_mpu6050/inv_mpu_iio.h
index 85f9b50..ec1401d 100644
--- a/drivers/iio/imu/inv_mpu6050/inv_mpu_iio.h
+++ b/drivers/iio/imu/inv_mpu6050/inv_mpu_iio.h
@@ -152,6 +152,11 @@ struct inv_mpu6050_state {
/* Value of I2C_MST_STATUS after slv4_done */
u8 slv4_done_status;
 #endif
+
+#define INV_MPU6050_MAX_SCAN_ELEMENTS 7
+   unsigned int scan_offsets[INV_MPU6050_MAX_SCAN_ELEMENTS];
+   unsigned int scan_lengths[INV_MPU6050_MAX_SCAN_ELEMENTS];
+   bool realign_required;
 };
 
 /*register and associated bit definition*/
@@ -274,6 +279,9 @@ enum inv_mpu6050_scan {
INV_MPU6050_SCAN_TIMESTAMP,
 };
 
+#define INV_MPU6050_SCAN_MASK_ACCEL0x07
+#define INV_MPU6050_SCAN_MASK_GYRO 0x38
+
 enum inv_mpu6050_filter_e {
INV_MPU6050_FILTER_256HZ_NOLPF2 = 0,
INV_MPU6050_FILTER_188HZ,
diff --git a/drivers/iio/imu/inv_mpu6050/inv_mpu_ring.c b/drivers/iio/imu/inv_mpu6050/inv_mpu_ring.c
index 56ee1e2..49e503c 100644
--- a/drivers/iio/imu/inv_mpu6050/inv_mpu_ring.c
+++ b/drivers/iio/imu/inv_mpu6050/inv_mpu_ring.c
@@ -24,6 +24,90 @@
 #include 
 #include "inv_mpu_iio.h"
 
+static void inv_mpu_get_scan_offsets(
+   struct iio_dev *indio_dev,
+   const unsigned long *mask,
+   const unsigned int masklen,
+   unsigned int *scan_offsets)
+{
+   unsigned int pos = 0;
+
+   if (*mask & INV_MPU6050_SCAN_MASK_ACCEL) {
+   scan_offsets[INV_MPU6050_SCAN_ACCL_X] = pos + 0;
+   scan_offsets[INV_MPU6050_SCAN_ACCL_Y] = pos + 2;
+   scan_offsets[INV_MPU6050_SCAN_ACCL_Z] = pos + 4;
+   pos += 6;
+   }
+   if (*mask & INV_MPU6050_SCAN_MASK_GYRO) {
+   scan_offsets[INV_MPU6050_SCAN_GYRO_X] = pos + 0;
+   scan_offsets[INV_MPU6050_SCAN_GYRO_Y] = pos + 2;
+   scan_offsets[INV_MPU6050_SCAN_GYRO_Z] = pos + 4;
+   pos += 6;
+   }
+}
+
+/* This is slowish but relatively common */
+static const struct iio_chan_spec *
+iio_chan_by_scan_index(struct iio_dev *indio_dev, int index)
+{
+   int i;
+
+   for (i = 0; i < indio_dev->num_channels; ++i)
+   if (indio_dev->channels[i].scan_index == index)
+   return &indio_dev->channels[i];
+
+   return NULL;
+}
+
+static int iio_check_scan_offsets_aligned(
+   struct iio_dev *indio_dev,
+   const unsigned long *mask,
+   unsigned int masklen,
+   unsigned int *scan_offsets)
+{
+   int scan_index;
+   unsigned int pos = 0;
+   const struct iio_chan_spec *chan;
+
+   for_each_set_bit(scan_index, mask, masklen) {
+   chan = iio_chan_by_scan_index(indio_dev, scan_index);
+   if (scan_offsets[scan_index] != pos)
+   return false;
+   pos = ALIGN(pos + chan->scan_type.storagebits / 8,
+   chan->scan_type.storagebits / 8);
+   }
+
+   return true;
+}
+
+static void iio_get_scan_lengths(struct iio_dev *indio_dev, unsigned int *scan_length)
+{
+   int i;
+   const struct iio_chan_spec *chan;
+
+   for (i = 0; i < indio_dev->num_channels; ++i) {
+   chan = &indio_dev->channels[i];
+   if (chan->scan_index < 0)
+   continue;
+   scan_length[chan->scan_index] = chan->scan_type.storagebits / 8;
+   }
+}
+
+static void iio_realign_sample(
+   struct iio_dev *indio_dev,
+   const u8 *ibuf, u8 *obuf,
+   const unsigned int *scan_offset,
+   const unsigned int *scan_lengths)
+{
+   unsigned int pos = 0;
+   int i;
+
+   for_each_set_bit(i, indio_dev->active_scan_mask, indio_dev->masklength) {
+   memcpy(&obuf[pos], &ibuf[scan_offset[i]], scan_lengths[i]);
+   pos = ALIGN(pos + scan_lengths[i], scan_lengths[i]);
+   }
+}
+
 static void inv_clear_kfifo(struct inv_mpu6050_state *st)
 {
unsigned long flags;
@@ -38,7 +122,9 @@ int inv_reset_fifo(struct iio_dev *indio_dev)
 {
int result;
u8 d;
-   struct inv_mpu6050_state  *st = iio_priv(indio_dev);
+   struct inv_mpu6050_state *st = iio_priv(indio_dev);
+   const unsigned long *mask = indio_dev->active_scan_mask;
+   unsigned int masklen = indio_dev->masklength;
 
/* disable interrupt */

Re: [PATCH] ARM: exynos: don't select keyboard driver

2016-05-18 Thread Krzysztof Kozlowski
On 05/18/2016 04:17 PM, Arnd Bergmann wrote:
> The samsung-keypad driver is implicitly selected by ARCH_EXYNOS4 (why?),
> but this fails if CONFIG_INPUT is a loadable module:
> 
> drivers/input/built-in.o: In function `samsung_keypad_remove':
> drivers/input/keyboard/samsung-keypad.c:461: undefined reference to 
> `input_unregister_device'
> drivers/input/built-in.o: In function `samsung_keypad_irq':
> drivers/input/keyboard/samsung-keypad.c:137: undefined reference to 
> `input_event'
> drivers/input/built-in.o: In function `samsung_keypad_irq':
> include/linux/input.h:389: undefined reference to `input_event'
> drivers/input/built-in.o: In function `samsung_keypad_probe':
> drivers/input/keyboard/samsung-keypad.c:358: undefined reference to 
> `devm_input_allocate_device'
> drivers/input/built-in.o:(.debug_addr+0x34): undefined reference to 
> `input_set_capability'
> 
> This removes the 'select' as suggested by Krzysztof Kozlowski and
> instead enables the driver from the defconfig files.
> 
> The problem does not happen on mainline kernels, as we don't normally
> build built-in input drivers when CONFIG_INPUT=m, but I am experimenting
> with a patch to change this, and the samsung keypad driver showed up
> as one example that was silently broken before.
> 
> Signed-off-by: Arnd Bergmann 
> Link: https://lkml.org/lkml/2016/2/14/55
> ---
>  arch/arm/configs/exynos_defconfig   | 1 +
>  arch/arm/configs/multi_v7_defconfig | 1 +
>  arch/arm/mach-exynos/Kconfig| 1 -
>  3 files changed, 2 insertions(+), 1 deletion(-)

Thanks for update.

Applied for v4.7 (same reason as in plat-samsung/devs.c).

Best regards,
Krzysztof



[RFC v2 7/7] iio: inv_mpu6050: Expose channels from slave sensors

2016-05-18 Thread Crestez Dan Leonard
This works by copying the iio_chan_specs from the slave device and
republishing them as if they belonged to the MPU itself. All
read/write_raw operations are forwarded to the other driver.

The original device is still registered with Linux as a normal driver
and works normally; you can poke at it to configure things like sample
rates and scaling factors.

Buffer values are read from the MPU fifo, allowing a much higher
sampling rate.
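
A sketch of the forwarding idea (not the patch itself): for each
republished channel the driver keeps a pointer back to the slave's
iio_dev and original iio_chan_spec, and delegates everything it cannot
answer from its own registers. The inv_mpu_ext_sensor struct and the
inv_mpu_find_ext() lookup are hypothetical names:

    static int inv_mpu_ext_read_raw(struct iio_dev *indio_dev,
                                    struct iio_chan_spec const *chan,
                                    int *val, int *val2, long mask)
    {
            struct inv_mpu6050_state *st = iio_priv(indio_dev);
            struct inv_mpu_ext_sensor *ext = inv_mpu_find_ext(st, chan);

            /* Config reads (scale, offset, ...) go to the slave driver. */
            return ext->orig_indio_dev->info->read_raw(ext->orig_indio_dev,
                                                       ext->orig_chan,
                                                       val, val2, mask);
    }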

Signed-off-by: Crestez Dan Leonard 
---
 .../devicetree/bindings/iio/imu/inv_mpu6050.txt|  47 ++-
 drivers/iio/imu/inv_mpu6050/inv_mpu_core.c | 387 -
 drivers/iio/imu/inv_mpu6050/inv_mpu_iio.h  |  74 +++-
 drivers/iio/imu/inv_mpu6050/inv_mpu_ring.c |  65 +++-
 drivers/iio/imu/inv_mpu6050/inv_mpu_trigger.c  |   9 +
 5 files changed, 556 insertions(+), 26 deletions(-)

diff --git a/Documentation/devicetree/bindings/iio/imu/inv_mpu6050.txt b/Documentation/devicetree/bindings/iio/imu/inv_mpu6050.txt
index 778d076..07d572a 100644
--- a/Documentation/devicetree/bindings/iio/imu/inv_mpu6050.txt
+++ b/Documentation/devicetree/bindings/iio/imu/inv_mpu6050.txt
@@ -11,8 +11,8 @@ Optional properties:
  - mount-matrix: an optional 3x3 mounting rotation matrix
  - invensense,i2c-aux-master: operate aux i2c in "master" mode (default is 
bypass).
 
-The MPU has an auxiliary i2c bus for additional sensors. Devices attached this
-way can be described as for a regular linux i2c bus.
+It is possible to attach auxiliary sensors to the MPU and have them be handled
+by linux. Those auxiliary sensors are described as an i2c bus.
 
 It is possible to interact with aux devices in "bypass" or "master" mode. In
 "bypass" mode the auxiliary SDA/SCL lines are connected directly to the main 
i2c
@@ -25,7 +25,31 @@ In "master" mode aux devices are listed behind a "i2c@1" 
node with reg = <1>;
 The master and bypass modes are not supported at the same time. The
 "invensense,i2c-aux-master" property must be set to activate master mode.
 Bypass mode is generally faster but master mode also works when the MPU is
-connected via SPI.
+connected via SPI. Enabling master mode is required for automated external
+readings.
+
+
+It is possible to configure the MPU to automatically fetch readings for
+devices connected on the auxiliary i2c bus without CPU intervention. When this
+is done the driver will present the channels of the slaved devices as if they
+belong to the MPU device itself.
+
+Reads and writes of config values (like scaling factors) are forwarded to the
+other driver while data reads are handled from MPU registers. The channels are
+also available through the MPU's iio triggered buffer mechanism. This feature
+can allow sampling up to 24 bytes from 4 slaves at a much higher rate.
+
+This is specified in device tree as an "invensense,external-sensors" node with
+children indexed by slave ids 0 to 3. The child nodes identifying each external
+sensor reading have the following properties:
+ - reg: 0 to 3 slave index
+ - invensense,addr : the I2C address to read from
+ - invensense,reg : the I2C register to read from
+ - invensense,len : read length from the device
+ - invensense,external-channels : list of slaved channels specified by 
scan_index.
+
+The sum of storagebits for external channels must equal the read length. Only
+16bit channels are currently supported.
 
 Example:
mpu6050@68 {
@@ -65,7 +89,8 @@ Example describing mpu9150 (which includes an ak8975 on chip):
};
};
 
-Example describing a mpu6500 via SPI with an hmc5883l on auxiliary i2c:
+Example describing a mpu6500 via SPI with an hmc5883l on aux i2c configured for
+automatic external readings as slave 0:
mpu6500@0 {
compatible = "invensense,mpu6500";
reg = <0x0>;
@@ -73,6 +98,8 @@ Example describing a mpu6500 via SPI with an hmc5883l on 
auxiliary i2c:
interrupt-parent = <>;
interrupts = <31 1>;
 
+   invensense,i2c-aux-master;
+
#address-cells = <1>;
#size-cells = <0>;
i2c@1 {
@@ -85,4 +112,16 @@ Example describing a mpu6500 via SPI with an hmc5883l on 
auxiliary i2c:
reg = <0x1e>;
};
};
+
+   invensense,external-sensors {
+   #address-cells = <1>;
+   #size-cells = <0>;
+   hmc5833l@0 {
+   reg = <0>;
+   invensense,addr = <0x1e>;
+   invensense,reg = <3>;
+   invensense,len = <6>;
+   invensense,external-channels = <0 1 2>;
+   };
+   };
};
diff --git a/drivers/iio/imu/inv_mpu6050/inv_mpu_core.c b/drivers/iio/imu/inv_mpu6050/inv_mpu_core.c
index 76683b8..715b2e4 100644
--- 

[PATCH] thermal: add the note for set_trip_temp

2016-05-18 Thread Caesar Wang
Fixes commit 60f9ce3ada53
("thermal: of-thermal: allow setting trip_temp on hardware")

Signed-off-by: Caesar Wang 
Cc: Zhang Rui 
Cc: Eduardo Valentin 
Cc: linux...@vger.kernel.org

---

 include/linux/thermal.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/linux/thermal.h b/include/linux/thermal.h
index e45abe7..ee517be 100644
--- a/include/linux/thermal.h
+++ b/include/linux/thermal.h
@@ -335,6 +335,8 @@ struct thermal_genl_event {
  * @get_trend: a pointer to a function that reads the sensor temperature trend.
  * @set_emul_temp: a pointer to a function that sets sensor emulated
  *temperature.
+ * @set_trip_temp: a pointer to a function that sets the trip temperature on
+ *hardware.
  */
 struct thermal_zone_of_device_ops {
int (*get_temp)(void *, int *);
-- 
2.7.4
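
For context, a minimal sketch of a sensor driver wiring up the callback
documented here; the my_sensor_*() helpers are hypothetical, and the
(void *data, int trip, int temp) signature is assumed to follow the
callback added by the commit being documented:

    static int my_sensor_set_trip_temp(void *data, int trip, int temp)
    {
            struct my_sensor *s = data;

            /* Program trip point 'trip' (temp in millidegrees Celsius). */
            return my_sensor_write_trip_reg(s, trip, temp);
    }

    static const struct thermal_zone_of_device_ops my_sensor_thermal_ops = {
            .get_temp       = my_sensor_get_temp,
            .set_trip_temp  = my_sensor_set_trip_temp,
    };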



Re: UBSAN whinge in ihci-hub.c

2016-05-18 Thread Andrey Ryabinin
2016-05-18 17:40 GMT+03:00 Alan Stern :

> All right, I'm getting very tired of all these bug reports.  Besides,
> Andrey has a point: Unless you're Linus, arguing against the C standard
> is futile.  (Even though the language dialect used in the kernel is not
> standard C.)
>
> Does this patch make UBSAN happy?  The runtime overhead is minimal.
>

It does. However, you could fool ubsan much more easily:
 u32 __iomem *hostpc_reg = ehci->regs->hostpc +
(wIndex & 0xff) - 1;
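
A small userspace illustration of the point, assuming a compiler with
-fsanitize=undefined: both lines below compute a pointer one element
before the array when idx is 0, which is undefined behavior, but only
the array-index form is reliably caught by the bounds check:

    #include <stdio.h>

    int main(void)
    {
            int arr[4] = { 0 };
            int idx = 0;

            int *a = &arr[idx - 1];   /* reported by UBSAN */
            int *b = arr + idx - 1;   /* same UB, typically not reported */

            printf("%p %p\n", (void *)a, (void *)b);
            return 0;
    }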



> Alan Stern
>
>
>
> Index: usb-4.x/drivers/usb/host/ehci-hub.c
> ===
> --- usb-4.x.orig/drivers/usb/host/ehci-hub.c
> +++ usb-4.x/drivers/usb/host/ehci-hub.c
> @@ -872,14 +872,17 @@ int ehci_hub_control(
>  ) {
> struct ehci_hcd *ehci = hcd_to_ehci (hcd);
> int ports = HCS_N_PORTS (ehci->hcs_params);
> -   u32 __iomem *status_reg = &ehci->regs->port_status[
> -   (wIndex & 0xff) - 1];
> -   u32 __iomem *hostpc_reg = &ehci->regs->hostpc[(wIndex & 0xff) - 1];
> +   u32 __iomem *status_reg, *hostpc_reg;
> u32 temp, temp1, status;
> unsigned long   flags;
> int retval = 0;
> unsignedselector;
>
> +   temp = wIndex & 0xff;
> +   temp -= (temp > 0);
> +   status_reg = &ehci->regs->port_status[temp];
> +   hostpc_reg = &ehci->regs->hostpc[temp];
> +
> /*
>  * FIXME:  support SetPortFeatures USB_PORT_FEAT_INDICATOR.
>  * HCS_INDICATOR may say we can change LEDs to off/amber/green.
>


Re: [PATCH v6 0/3] auxdisplay: Introduce ht16k33 driver

2016-05-18 Thread Greg Kroah-Hartman
On Wed, May 18, 2016 at 02:23:16PM +0200, Robin van der Gracht wrote:
> This patchset adds a new driver to the auxdisplay subsystem. It also adds
> devicetree bindings documentation and a new vendor prefix.
> 
> I added myself as maintainer to the MAINTAINERS file.

First off, if you want me to apply patches, put me in the to: line, and
say so, otherwise I don't know.

Secondly, it's the middle of the merge window, and we can't take new
patches into our trees (go read Documentation/development_model please),
so this will have to wait until after 4.7-rc1 is out.

Thirdly, I need an ack for the DT-related change before I can accept
that, hopefully you included the correct people on it.

And 4th, what is with the insane number of people on cc:?  Use
get_maintainer.pl correctly please, and don't just hit everyone you can
possibly think of with a cc: for no good reason.

I'll put this in my "to-review" queue to look at after 4.7-rc1 is out.

thanks,

greg k-h


[PATCH v2 0/7] iio: inv_mpu6050: Support i2c master and external readings

2016-05-18 Thread Crestez Dan Leonard
This series attempts to implement support for external readings in i2c master
mode. Previous version here:
https://www.spinics.net/lists/linux-iio/msg24502.html

Patches 1 and 6 should be considered bugfixes.
Patches 2, 3 and 4 add regmap support and are mostly unchanged from the
previous version.
Patch 5 adds i2c master support.
Patch 7 adds external readings (which only work in i2c master mode).

It might make sense to wait for Peter Rosin's patches cleaning up i2c mux
locking before patch 5.

Notable differences since previous versions:
 * Prefer inv_mpu_ over inv_mpu6050_ prefix
 * Remove indirection for SLV4 register names
 * Use the registered vendor prefix 'invensense'
 * Rearrange samples for scan mask instead of validating

For i2c master:
 * Explicit read/write when caching addr
 * Support I2C_FUNC_SMBUS_BYTE (making i2cdetect work)
 * Handle xfer errors as reported in status registers

For external channels support:
 * Only enable i2c slaves when required
 * Also forward write_raw for external channels
 * Drop handling read_raw from EXT_SENS registers
 * List external channels by scan index
 * Allow external channels with arbitrary sizes

Crestez Dan Leonard (7):
  iio: inv_mpu6050: Do burst reads using spi/i2c directly
  iio: inv_mpu6050: Initial regcache support
  iio: inv_mpu6050: Only toggle DATA_RDY_EN in inv_reset_fifo
  iio: inv_mpu6050: Cache non-volatile bits of user_ctrl
  iio: inv_mpu6050: Add support for auxiliary I2C master
  iio: inv_mpu6050: Reformat sample for active scan mask
  iio: inv_mpu6050: Expose channels from slave sensors

 .../devicetree/bindings/iio/imu/inv_mpu6050.txt| 105 +++-
 drivers/iio/imu/inv_mpu6050/inv_mpu_core.c | 692 -
 drivers/iio/imu/inv_mpu6050/inv_mpu_i2c.c  |   5 -
 drivers/iio/imu/inv_mpu6050/inv_mpu_iio.h  | 115 +++-
 drivers/iio/imu/inv_mpu6050/inv_mpu_ring.c | 221 ++-
 drivers/iio/imu/inv_mpu6050/inv_mpu_spi.c  |   5 -
 drivers/iio/imu/inv_mpu6050/inv_mpu_trigger.c  |  24 +-
 7 files changed,  insertions(+), 56 deletions(-)

-- 
2.5.5



Re: [PATCH 2/2] MIPS: CPS: Copy EVA configuration when starting secondary VPs.

2016-05-18 Thread Paul Burton
On Wed, May 18, 2016 at 03:45:22PM +0100, Matt Redfearn wrote:
> When starting secondary VPEs which support EVA and the SegCtl registers,
> copy the memory segmentation configuration from the running VPE to ensure
> that all VPEs in the core have a consistent virtual memory map.
> 
> The EVA configuration of secondary cores is dealt with when starting the
> core via the CM.
> 
> Signed-off-by: Matt Redfearn 
> ---
> 
>  arch/mips/kernel/cps-vec.S | 16 
>  1 file changed, 16 insertions(+)
> 
> diff --git a/arch/mips/kernel/cps-vec.S b/arch/mips/kernel/cps-vec.S
> index ac81edd44563..07b3274c8ae1 100644
> --- a/arch/mips/kernel/cps-vec.S
> +++ b/arch/mips/kernel/cps-vec.S
> @@ -431,6 +431,22 @@ LEAF(mips_cps_boot_vpes)
>  mfc0	t0, CP0_CONFIG
>  mttc0	t0, CP0_CONFIG
>  
> +	/* Copy the EVA config from this VPE if the CPU supports it */
> +	mfc0	t0, CP0_CONFIG, 1
> +	bgez	t0, 1f
> +	 mfc0	t0, CP0_CONFIG, 2
> +	bgez	t0, 1f
> +	 mfc0	t0, CP0_CONFIG, 3
> +	and	t0, t0, MIPS_CONF3_SC
> +	beqz	t0, 1f
> +	 nop

Hi Matt,

The checks here aren't *quite* right since they do the mfc0 of the next
register in the delay slot which will happen even if the M bit of the
preceding register wasn't set. There are other cases in cps-vec.S where
I've made that mistake... Luckily, in this particular case, we know that
we have MT ASE support which means we know that Config3 exists. So I
think you can just remove the checks of Config1.M & Config2.M and just
read Config3 straight away.

Thanks,
Paul

> +	mfc0	t0, CP0_SEGCTL0
> +	mttc0	t0, CP0_SEGCTL0
> +	mfc0	t0, CP0_SEGCTL1
> +	mttc0	t0, CP0_SEGCTL1
> +	mfc0	t0, CP0_SEGCTL2
> +	mttc0	t0, CP0_SEGCTL2
> +1:
>   /* Ensure no software interrupts are pending */
>   mttc0   zero, CP0_CAUSE
>   mttc0   zero, CP0_STATUS
> -- 
> 2.5.0
> 



[PATCH v2 4/7] iio: inv_mpu6050: Cache non-volatile bits of user_ctrl

2016-05-18 Thread Crestez Dan Leonard
Signed-off-by: Crestez Dan Leonard 
---
 drivers/iio/imu/inv_mpu6050/inv_mpu_iio.h | 2 ++
 drivers/iio/imu/inv_mpu6050/inv_mpu_ring.c| 9 ++---
 drivers/iio/imu/inv_mpu6050/inv_mpu_trigger.c | 4 +++-
 3 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/drivers/iio/imu/inv_mpu6050/inv_mpu_iio.h b/drivers/iio/imu/inv_mpu6050/inv_mpu_iio.h
index 297b0ef..bd2c0fd 100644
--- a/drivers/iio/imu/inv_mpu6050/inv_mpu_iio.h
+++ b/drivers/iio/imu/inv_mpu6050/inv_mpu_iio.h
@@ -80,6 +80,7 @@ enum inv_devices {
  *  @enable:   master enable state.
  *  @accl_fifo_enable: enable accel data output
  *  @gyro_fifo_enable: enable gyro data output
+ *  @user_ctrl:The non-volatile bits of user_ctrl
  *  @fifo_rate:FIFO update rate.
  */
 struct inv_mpu6050_chip_config {
@@ -89,6 +90,7 @@ struct inv_mpu6050_chip_config {
unsigned int enable:1;
unsigned int accl_fifo_enable:1;
unsigned int gyro_fifo_enable:1;
+   u8 user_ctrl;
u16 fifo_rate;
 };
 
diff --git a/drivers/iio/imu/inv_mpu6050/inv_mpu_ring.c b/drivers/iio/imu/inv_mpu6050/inv_mpu_ring.c
index 3fc0b71..56ee1e2 100644
--- a/drivers/iio/imu/inv_mpu6050/inv_mpu_ring.c
+++ b/drivers/iio/imu/inv_mpu6050/inv_mpu_ring.c
@@ -53,13 +53,15 @@ int inv_reset_fifo(struct iio_dev *indio_dev)
if (result)
goto reset_fifo_fail;
/* disable fifo reading */
-   result = regmap_write(st->map, st->reg->user_ctrl, 0);
+   st->chip_config.user_ctrl &= ~INV_MPU6050_BIT_FIFO_EN;
+   result = regmap_write(st->map, st->reg->user_ctrl,
+ st->chip_config.user_ctrl);
if (result)
goto reset_fifo_fail;
 
/* reset FIFO*/
result = regmap_write(st->map, st->reg->user_ctrl,
- INV_MPU6050_BIT_FIFO_RST);
+ st->chip_config.user_ctrl | INV_MPU6050_BIT_FIFO_RST);
if (result)
goto reset_fifo_fail;
 
@@ -76,8 +78,9 @@ int inv_reset_fifo(struct iio_dev *indio_dev)
return result;
}
/* enable FIFO reading and I2C master interface*/
+   st->chip_config.user_ctrl |= INV_MPU6050_BIT_FIFO_EN;
result = regmap_write(st->map, st->reg->user_ctrl,
- INV_MPU6050_BIT_FIFO_EN);
+ st->chip_config.user_ctrl);
if (result)
goto reset_fifo_fail;
/* enable sensor output to FIFO */
diff --git a/drivers/iio/imu/inv_mpu6050/inv_mpu_trigger.c b/drivers/iio/imu/inv_mpu6050/inv_mpu_trigger.c
index 1a6bad3..fc55923 100644
--- a/drivers/iio/imu/inv_mpu6050/inv_mpu_trigger.c
+++ b/drivers/iio/imu/inv_mpu6050/inv_mpu_trigger.c
@@ -74,7 +74,9 @@ static int inv_mpu6050_set_enable(struct iio_dev *indio_dev, bool enable)
if (result)
return result;
 
-   result = regmap_write(st->map, st->reg->user_ctrl, 0);
+   st->chip_config.user_ctrl &= ~INV_MPU6050_BIT_FIFO_EN;
+   result = regmap_write(st->map, st->reg->user_ctrl,
+ st->chip_config.user_ctrl);
if (result)
return result;
 
-- 
2.5.5
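
The pattern here is a shadow register: keep the bits that must survive in
memory and OR in one-shot bits (like FIFO_RST) only for the write that
needs them. Reduced to a sketch with illustrative names:

    #include <linux/regmap.h>

    /* 'shadow' holds the non-volatile bits; 'oneshot' bits are written
     * once but never stored, so later writes do not re-trigger them. */
    static int write_shadowed(struct regmap *map, unsigned int reg,
                              u8 *shadow, u8 set, u8 clear, u8 oneshot)
    {
            *shadow = (*shadow | set) & ~clear;
            return regmap_write(map, reg, *shadow | oneshot);
    }

This matters for the rest of the series because user_ctrl also carries
the I2C master enable bit, which must not be lost when the FIFO is
toggled.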



[PATCH v2 3/7] iio: inv_mpu6050: Only toggle DATA_RDY_EN in inv_reset_fifo

2016-05-18 Thread Crestez Dan Leonard
Signed-off-by: Crestez Dan Leonard 
---
 drivers/iio/imu/inv_mpu6050/inv_mpu_ring.c| 13 -
 drivers/iio/imu/inv_mpu6050/inv_mpu_trigger.c |  3 ++-
 2 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/drivers/iio/imu/inv_mpu6050/inv_mpu_ring.c b/drivers/iio/imu/inv_mpu6050/inv_mpu_ring.c
index 8455af0..3fc0b71 100644
--- a/drivers/iio/imu/inv_mpu6050/inv_mpu_ring.c
+++ b/drivers/iio/imu/inv_mpu6050/inv_mpu_ring.c
@@ -41,7 +41,8 @@ int inv_reset_fifo(struct iio_dev *indio_dev)
struct inv_mpu6050_state  *st = iio_priv(indio_dev);
 
/* disable interrupt */
-   result = regmap_write(st->map, st->reg->int_enable, 0);
+   result = regmap_update_bits(st->map, st->reg->int_enable,
+   INV_MPU6050_BIT_DATA_RDY_EN, 0);
if (result) {
dev_err(regmap_get_device(st->map), "int_enable failed %d\n",
result);
@@ -68,8 +69,9 @@ int inv_reset_fifo(struct iio_dev *indio_dev)
/* enable interrupt */
if (st->chip_config.accl_fifo_enable ||
st->chip_config.gyro_fifo_enable) {
-   result = regmap_write(st->map, st->reg->int_enable,
- INV_MPU6050_BIT_DATA_RDY_EN);
+   result = regmap_update_bits(st->map, st->reg->int_enable,
+   INV_MPU6050_BIT_DATA_RDY_EN,
+   INV_MPU6050_BIT_DATA_RDY_EN);
if (result)
return result;
}
@@ -92,8 +94,9 @@ int inv_reset_fifo(struct iio_dev *indio_dev)
 
 reset_fifo_fail:
dev_err(regmap_get_device(st->map), "reset fifo failed %d\n", result);
-   result = regmap_write(st->map, st->reg->int_enable,
- INV_MPU6050_BIT_DATA_RDY_EN);
+   result = regmap_update_bits(st->map, st->reg->int_enable,
+   INV_MPU6050_BIT_DATA_RDY_EN,
+   INV_MPU6050_BIT_DATA_RDY_EN);
 
return result;
 }
diff --git a/drivers/iio/imu/inv_mpu6050/inv_mpu_trigger.c b/drivers/iio/imu/inv_mpu6050/inv_mpu_trigger.c
index e8818d4..1a6bad3 100644
--- a/drivers/iio/imu/inv_mpu6050/inv_mpu_trigger.c
+++ b/drivers/iio/imu/inv_mpu6050/inv_mpu_trigger.c
@@ -69,7 +69,8 @@ static int inv_mpu6050_set_enable(struct iio_dev *indio_dev, bool enable)
if (result)
return result;
 
-   result = regmap_write(st->map, st->reg->int_enable, 0);
+   result = regmap_update_bits(st->map, st->reg->int_enable,
+   INV_MPU6050_BIT_DATA_RDY_EN, 0);
if (result)
return result;
 
-- 
2.5.5
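
For reference, regmap_update_bits() is a read-modify-write that touches
only the masked bits, which is what lets this patch toggle DATA_RDY_EN
without clobbering other interrupt-enable bits. What it does, reduced to
a sketch (the real helper also takes the map's lock and skips the write
when nothing changes):

    #include <linux/regmap.h>

    static int sketch_update_bits(struct regmap *map, unsigned int reg,
                                  unsigned int mask, unsigned int val)
    {
            unsigned int tmp;
            int ret;

            ret = regmap_read(map, reg, &tmp);
            if (ret)
                    return ret;
            tmp = (tmp & ~mask) | (val & mask);
            return regmap_write(map, reg, tmp);
    }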


