Re: [PATCH] xen: add steal_clock support on x86
On 18/05/16 17:45, David Vrabel wrote:
> On 18/05/16 16:42, Juergen Gross wrote:
>> On 18/05/16 17:25, Boris Ostrovsky wrote:
>>> On 05/18/2016 10:53 AM, Juergen Gross wrote:
>>>> On 18/05/16 16:46, Boris Ostrovsky wrote:
>>>>> On 05/18/2016 08:15 AM, Juergen Gross wrote:
>>>>>> 	}
>>>>>>
>>>>>> +void __init xen_time_setup_guest(void)
>>>>>> +{
>>>>>> +	pv_time_ops.steal_clock = xen_steal_clock;
>>>>>> +
>>>>>> +	static_key_slow_inc(&paravirt_steal_enabled);
>>>>>> +	/*
>>>>>> +	 * We can't set paravirt_steal_rq_enabled as this would require the
>>>>>> +	 * capability to read another cpu's runstate info.
>>>>>> +	 */
>>>>>> +}
>>>>> Won't we be accounting for stolen cycles twice now --- once from
>>>>> steal_account_process_tick()->steal_clock() and second time from
>>>>> do_stolen_accounting()?
>>>> Uuh, yes.
>>>>
>>>> I guess I should rip do_stolen_accounting() out, too?
>>>
>>> I don't think PARAVIRT_TIME_ACCOUNTING is always selected for Xen. If
>>
>> This is easy to accomplish. :-)
>>
>>> that's indeed the case then we should ifndef do_stolen_accounting(). Or
>>> maybe check for paravirt_steal_enabled.
>>
>> Is this really a sensible thing to do? There is a mechanism used by KVM
>> to do the stolen accounting. I think we should use it instead of having
>> a second implementation doing the same thing in case the generic one
>> isn't enabled.
>
> I agree.
>
> Although I don't think selecting PARAVIRT_TIME_ACC' is necessary -- I
> don't think it's essential (or is it?).

Not doing so will change behavior in case I rip out do_stolen_accounting().

What about "default y if XEN"?


Juergen
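The double-accounting concern in this thread can be modelled in a few lines of plain C: the generic path charges only the delta of a monotonically increasing steal counter, exactly once per tick. This is an illustrative userspace sketch, not the kernel code; all names here are made up for the model.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Toy model of the generic steal-time accounting path discussed above.
 * steal_clock() reports total stolen time in ns since boot; the tick
 * charges only the delta since the last tick, exactly once.
 */
static uint64_t steal_total_ns;   /* what a hypervisor's steal_clock() would report */
static uint64_t steal_accounted;  /* per-cpu "already charged" cursor */

static uint64_t steal_clock(void)
{
	return steal_total_ns;
}

/* One accounting tick: returns the stolen ns charged by this tick. */
static uint64_t steal_account_tick(void)
{
	uint64_t now = steal_clock();
	uint64_t delta = now - steal_accounted;

	steal_accounted = now;
	return delta;
}
```

If a second, independent path (like do_stolen_accounting()) kept its own cursor over the same counter, every stolen nanosecond would be charged twice, which is exactly the duplication Boris points out.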
Re: [PATCH] xen: add steal_clock support on x86
On 18/05/16 16:42, Juergen Gross wrote:
> On 18/05/16 17:25, Boris Ostrovsky wrote:
>> On 05/18/2016 10:53 AM, Juergen Gross wrote:
>>> On 18/05/16 16:46, Boris Ostrovsky wrote:
>>>> On 05/18/2016 08:15 AM, Juergen Gross wrote:
>>>>> 	}
>>>>>
>>>>> +void __init xen_time_setup_guest(void)
>>>>> +{
>>>>> +	pv_time_ops.steal_clock = xen_steal_clock;
>>>>> +
>>>>> +	static_key_slow_inc(&paravirt_steal_enabled);
>>>>> +	/*
>>>>> +	 * We can't set paravirt_steal_rq_enabled as this would require the
>>>>> +	 * capability to read another cpu's runstate info.
>>>>> +	 */
>>>>> +}
>>>> Won't we be accounting for stolen cycles twice now --- once from
>>>> steal_account_process_tick()->steal_clock() and second time from
>>>> do_stolen_accounting()?
>>> Uuh, yes.
>>>
>>> I guess I should rip do_stolen_accounting() out, too?
>>
>> I don't think PARAVIRT_TIME_ACCOUNTING is always selected for Xen. If
>
> This is easy to accomplish. :-)
>
>> that's indeed the case then we should ifndef do_stolen_accounting(). Or
>> maybe check for paravirt_steal_enabled.
>
> Is this really a sensible thing to do? There is a mechanism used by KVM
> to do the stolen accounting. I think we should use it instead of having
> a second implementation doing the same thing in case the generic one
> isn't enabled.

I agree.

Although I don't think selecting PARAVIRT_TIME_ACC' is necessary -- I
don't think it's essential (or is it?).

David
Re: Crashes in -next due to 'phy: add support for a reset-gpio specification'
On Tue, May 17, 2016 at 10:01:37PM -0700, Florian Fainelli wrote:
> Le 17/05/2016 21:37, Guenter Roeck a écrit :
>> Hi,
>>
>> my xtensa qemu tests crash in -next as follows.
>>
>> [ ... ]
>>
>> [    9.366256] libphy: ethoc-mdio: probed
>> [    9.367389] (null): could not attach to PHY
>> [    9.368555] (null): failed to probe MDIO bus
>> [    9.371540] Unable to handle kernel paging request at virtual address 001c
>> [    9.371540]  pc = d0320926, ra = 903209d1
>> [    9.375358] Oops: sig: 11 [#1]
>> [    9.376081] PREEMPT
>> [    9.377080] CPU: 0 PID: 1 Comm: swapper Not tainted 4.6.0-next-20160517 #1
>> [    9.378397] task: d7c2c000 ti: d7c3 task.ti: d7c3
>> [    9.379394] a00: 903209d1 d7c31bd0 d7fb5810 0001 d7f45c00 d7c31bd0
>> [    9.382298] a08: 00060100 d04b0c10 d7f45dfc d7c31bb0
>> [    9.385732] pc: d0320926, ps: 00060110, depc: 0018, excvaddr: 001c
>> [    9.387061] lbeg: d0322e35, lend: d0322e57 lcount: , sar: 0011
>> [    9.388173]
>> Stack: d7c31be0 00060700 d7f45c00 d7c31bd0 9021d509 d7c31c30 d7f45c00
>>        d0485dcc d0485dcc d7fb5810 d7c2c000 d7c31c30 d7f45c00 d025befc
>>        d0485dcc d7c3 d7f45c34 d7c31bf0 9021c985 d7c31c50 d7f45c00 d7f45c34
>> [    9.396652] Call Trace:
>> [    9.397469]  [] __device_release_driver+0x7d/0x98
>> [    9.398869]  [] device_release_driver+0x15/0x20
>> [    9.400247]  [] bus_remove_device+0xc1/0xd4
>> [    9.401569]  [] device_del+0x109/0x15c
>> [    9.402794]  [] phy_mdio_device_remove+0xd/0x18
>> [    9.404124]  [] mdiobus_unregister+0x40/0x5c
>> [    9.405444]  [] ethoc_probe+0x534/0x5b8
>> [    9.406742]  [] platform_drv_probe+0x28/0x48
>> [    9.408122]  [] driver_probe_device+0x101/0x234
>> [    9.409499]  [] __driver_attach+0x7d/0x98
>> [    9.410809]  [] bus_for_each_dev+0x30/0x5c
>> [    9.412104]  [] driver_attach+0x14/0x18
>> [    9.413385]  [] bus_add_driver+0xc9/0x198
>> [    9.414686]  [] driver_register+0x70/0xa0
>> [    9.416001]  [] __platform_driver_register+0x24/0x28
>> [    9.417463]  [] ethoc_driver_init+0x10/0x14
>> [    9.418824]  [] do_one_initcall+0x80/0x1ac
>> [    9.420083]  [] kernel_init_freeable+0x131/0x198
>> [    9.421504]  [] kernel_init+0xc/0xb0
>> [    9.422693]  [] ret_from_kernel_thread+0x8/0xc
>>
>> Bisect points to commit da47b4572056 ("phy: add support for a reset-gpio
>> specification"). Bisect log is attached. Reverting the patch fixes the
>> problem.
>
> Aside from what you pointed out, this patch was still in discussion when
> it got merged, since we got a concurrent patch from Sergei which tries
> to deal with the same kind of problem.
>
> Do you mind sending a revert, or I can do that first thing in the morning.

I don't think I'll find the time to do that today, and also I would like
to hear from Dave what his preferences are.

Note that this now also affects mainline.

Guenter
Re: [PATCH] xen: add steal_clock support on x86
On 05/18/2016 10:53 AM, Juergen Gross wrote:
> On 18/05/16 16:46, Boris Ostrovsky wrote:
>> On 05/18/2016 08:15 AM, Juergen Gross wrote:
>>> 	}
>>>
>>> +void __init xen_time_setup_guest(void)
>>> +{
>>> +	pv_time_ops.steal_clock = xen_steal_clock;
>>> +
>>> +	static_key_slow_inc(&paravirt_steal_enabled);
>>> +	/*
>>> +	 * We can't set paravirt_steal_rq_enabled as this would require the
>>> +	 * capability to read another cpu's runstate info.
>>> +	 */
>>> +}
>> Won't we be accounting for stolen cycles twice now --- once from
>> steal_account_process_tick()->steal_clock() and second time from
>> do_stolen_accounting()?
> Uuh, yes.
>
> I guess I should rip do_stolen_accounting() out, too?

I don't think PARAVIRT_TIME_ACCOUNTING is always selected for Xen. If
that's indeed the case then we should ifndef do_stolen_accounting(). Or
maybe check for paravirt_steal_enabled.

-boris

> It is a Xen-specific hack, so I guess nobody will cry. Maybe it would
> be a good idea to select CONFIG_PARAVIRT_TIME_ACCOUNTING for XEN then?
>
>
> Juergen
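Boris's first suggestion amounts to compiling the Xen-local fallback only when the generic accounting is absent. A hedged sketch of that gating, with a plain macro standing in for the Kconfig symbol and dummy bodies in place of the real accounting code:

```c
#include <assert.h>

/* Stand-in for CONFIG_PARAVIRT_TIME_ACCOUNTING being enabled. */
#define CONFIG_PARAVIRT_TIME_ACCOUNTING 1

static int do_stolen_accounting_calls;

#ifndef CONFIG_PARAVIRT_TIME_ACCOUNTING
/* Legacy Xen-only fallback: compiled only when the generic path is out. */
static void do_stolen_accounting(void)
{
	do_stolen_accounting_calls++;
}
#endif

static void timer_tick(void)
{
#ifndef CONFIG_PARAVIRT_TIME_ACCOUNTING
	do_stolen_accounting();
#endif
	/* the generic steal_account_process_tick() path runs elsewhere */
}
```

With the config symbol set, the fallback is compiled out entirely, so the same stolen cycles can never be charged by both paths.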
Re: [PATCH] xen: add steal_clock support on x86
On 18/05/16 17:25, Boris Ostrovsky wrote:
> On 05/18/2016 10:53 AM, Juergen Gross wrote:
>> On 18/05/16 16:46, Boris Ostrovsky wrote:
>>> On 05/18/2016 08:15 AM, Juergen Gross wrote:
>>>> 	}
>>>>
>>>> +void __init xen_time_setup_guest(void)
>>>> +{
>>>> +	pv_time_ops.steal_clock = xen_steal_clock;
>>>> +
>>>> +	static_key_slow_inc(&paravirt_steal_enabled);
>>>> +	/*
>>>> +	 * We can't set paravirt_steal_rq_enabled as this would require the
>>>> +	 * capability to read another cpu's runstate info.
>>>> +	 */
>>>> +}
>>> Won't we be accounting for stolen cycles twice now --- once from
>>> steal_account_process_tick()->steal_clock() and second time from
>>> do_stolen_accounting()?
>> Uuh, yes.
>>
>> I guess I should rip do_stolen_accounting() out, too?
>
> I don't think PARAVIRT_TIME_ACCOUNTING is always selected for Xen. If

This is easy to accomplish. :-)

> that's indeed the case then we should ifndef do_stolen_accounting(). Or
> maybe check for paravirt_steal_enabled.

Is this really a sensible thing to do? There is a mechanism used by KVM
to do the stolen accounting. I think we should use it instead of having
a second implementation doing the same thing in case the generic one
isn't enabled.


Juergen
Re: [PATCH 2/2] MIPS: CPS: Copy EVA configuration when starting secondary VPs.
On 18/05/16 16:04, Paul Burton wrote:
> On Wed, May 18, 2016 at 03:45:22PM +0100, Matt Redfearn wrote:
>> When starting secondary VPEs which support EVA and the SegCtl
>> registers, copy the memory segmentation configuration from the running
>> VPE to ensure that all VPEs in the core have a consistent virtual
>> memory map.
>>
>> The EVA configuration of secondary cores is dealt with when starting
>> the core via the CM.
>>
>> Signed-off-by: Matt Redfearn
>> ---
>>  arch/mips/kernel/cps-vec.S | 16 ++++++++++++++++
>>  1 file changed, 16 insertions(+)
>>
>> diff --git a/arch/mips/kernel/cps-vec.S b/arch/mips/kernel/cps-vec.S
>> index ac81edd44563..07b3274c8ae1 100644
>> --- a/arch/mips/kernel/cps-vec.S
>> +++ b/arch/mips/kernel/cps-vec.S
>> @@ -431,6 +431,22 @@ LEAF(mips_cps_boot_vpes)
>>  	mfc0	t0, CP0_CONFIG
>>  	mttc0	t0, CP0_CONFIG
>>
>> +	/* Copy the EVA config from this VPE if the CPU supports it */
>> +	mfc0	t0, CP0_CONFIG, 1
>> +	bgez	t0, 1f
>> +	 mfc0	t0, CP0_CONFIG, 2
>> +	bgez	t0, 1f
>> +	 mfc0	t0, CP0_CONFIG, 3
>> +	and	t0, t0, MIPS_CONF3_SC
>> +	beqz	t0, 1f
>> +	 nop
>
> Hi Matt,
>
> The checks here aren't *quite* right since they do the mfc0 of the next
> register in the delay slot which will happen even if the M bit of the
> preceding register wasn't set. There are other cases in cps-vec.S where
> I've made that mistake...
>
> Luckily, in this particular case, we know that we have MT ASE support
> which means we know that Config3 exists. So I think you can just remove
> the checks of Config1.M & Config2.M and just read Config3 straight away.

Good point - thanks Paul :-)

Matt

> Thanks,
>     Paul
>
>> +	mfc0	t0, CP0_SEGCTL0
>> +	mttc0	t0, CP0_SEGCTL0
>> +	mfc0	t0, CP0_SEGCTL1
>> +	mttc0	t0, CP0_SEGCTL1
>> +	mfc0	t0, CP0_SEGCTL2
>> +	mttc0	t0, CP0_SEGCTL2
>> +1:	/* Ensure no software interrupts are pending */
>>  	mttc0	zero, CP0_CAUSE
>>  	mttc0	zero, CP0_STATUS
>> --
>> 2.5.0
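Paul's point is that an instruction placed in a branch delay slot executes whether or not the branch is taken, so each `mfc0` of the next Config register runs even when the preceding register's M bit is clear. Because MT ASE support guarantees that Config3 exists, the whole walk collapses to testing Config3.SC directly. A C model of the collapsed check — the bit position used here is illustrative for the sketch, not taken from mipsregs.h:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit position for this sketch only; the real value is
 * MIPS_CONF3_SC in arch/mips/include/asm/mipsregs.h. */
#define MODEL_CONF3_SC (1u << 4)

/* On a VPE-capable (MT ASE) CPU, Config3 is architecturally present,
 * so segmentation-control support reduces to a single bit test. */
static int cpu_supports_segctl(uint32_t config3)
{
	return (config3 & MODEL_CONF3_SC) != 0;
}
```

This mirrors the corrected assembly Paul suggests: one `mfc0` of Config3, one `and`, one branch, with nothing load-bearing left in a delay slot.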
[PATCH V3] drm/tegra: Fix crash caused by reference count imbalance
Commit d2307dea14a4 ("drm/atomic: use connector references (v3)") added
reference counting for DRM connectors and this caused a crash when
exercising system suspend on Tegra114 Dalmore.

The Tegra DSI driver implements a Tegra specific function,
tegra_dsi_connector_duplicate_state(), to duplicate the connector state,
and destroys the state using the generic helper function,
drm_atomic_helper_connector_destroy_state(). Following commit
d2307dea14a4 ("drm/atomic: use connector references (v3)") there is now
an imbalance in the connector reference count because the Tegra function
to duplicate state does not take a reference when duplicating the state
information. However, the generic helper function to destroy the state
information assumes a reference has been taken and during system
suspend, when the connector state is destroyed, this leads to a crash
because we attempt to put the reference for an object that has already
been freed.

Fix this by calling __drm_atomic_helper_connector_duplicate_state() from
tegra_dsi_connector_duplicate_state() to ensure that we take a reference
on a connector if crtc is set. Note that this will also copy the
connector state a 2nd time, but this should be harmless.

By fixing tegra_dsi_connector_duplicate_state() to take a reference,
although a crash was no longer seen, it was then observed that after
each system suspend-resume cycle, the reference would be one greater
than before the suspend-resume cycle. Following commit d2307dea14a4
("drm/atomic: use connector references (v3)"), it was found that we also
need to put the reference when calling tegra_dsi_connector_reset()
before freeing the state. Fix this by updating
tegra_dsi_connector_reset() to call
__drm_atomic_helper_connector_destroy_state() in order to put the
reference for the connector.

Fixes: d2307dea14a4 ("drm/atomic: use connector references (v3)")
Signed-off-by: Jon Hunter <jonath...@nvidia.com>
Reviewed-by: Daniel Vetter <daniel.vet...@ffwll.ch>
Acked-by: Thierry Reding <tred...@nvidia.com>
---
V3 changes:
- Dropped WARN_ON
V2 changes:
- Updated to next-20160518
- Replaced open coding of call to drm_connector_reference() with
  __drm_atomic_helper_connector_duplicate_state() per Daniel's feedback.

 drivers/gpu/drm/tegra/dsi.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/tegra/dsi.c b/drivers/gpu/drm/tegra/dsi.c
index 44e102799195..d1239ebc190f 100644
--- a/drivers/gpu/drm/tegra/dsi.c
+++ b/drivers/gpu/drm/tegra/dsi.c
@@ -745,13 +745,17 @@ static void tegra_dsi_soft_reset(struct tegra_dsi *dsi)
 
 static void tegra_dsi_connector_reset(struct drm_connector *connector)
 {
-	struct tegra_dsi_state *state =
-		kzalloc(sizeof(*state), GFP_KERNEL);
+	struct tegra_dsi_state *state = kzalloc(sizeof(*state), GFP_KERNEL);
 
-	if (state) {
+	if (!state)
+		return;
+
+	if (connector->state) {
+		__drm_atomic_helper_connector_destroy_state(connector->state);
 		kfree(connector->state);
-		__drm_atomic_helper_connector_reset(connector, &state->base);
 	}
+
+	__drm_atomic_helper_connector_reset(connector, &state->base);
 }
 
 static struct drm_connector_state *
@@ -764,6 +768,9 @@ tegra_dsi_connector_duplicate_state(struct drm_connector *connector)
 	if (!copy)
 		return NULL;
 
+	__drm_atomic_helper_connector_duplicate_state(connector,
+						      &copy->base);
+
 	return &copy->base;
 }
-- 
2.1.4
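The imbalance the commit message describes reduces to a simple invariant: every destroy that puts a reference must be matched by a duplicate that takes one. A toy C model of that pairing — the names are illustrative, not the DRM helpers themselves:

```c
#include <assert.h>

struct object { int refcount; };

static void get_ref(struct object *obj) { obj->refcount++; }
static void put_ref(struct object *obj) { obj->refcount--; }

/* Broken variant: duplicates state without taking a reference,
 * like the pre-fix tegra_dsi_connector_duplicate_state(). */
static void duplicate_state_broken(struct object *obj) { (void)obj; }

/* Fixed variant: take the reference when duplicating, as
 * __drm_atomic_helper_connector_duplicate_state() does. */
static void duplicate_state_fixed(struct object *obj) { get_ref(obj); }

/* Destroy always puts a reference, like the generic
 * drm_atomic_helper_connector_destroy_state() assumes it may. */
static void destroy_state(struct object *obj) { put_ref(obj); }
```

With the broken duplicate, each duplicate/destroy cycle leaks one put, so the count eventually drops past the owner's own reference and the object is freed while still in use — the crash seen on suspend.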
Re: [PATCH v6 16/20] IB/fmr_pool: Convert the cleanup thread into kthread worker API
On 04/14/2016 11:14 AM, Petr Mladek wrote:
> Kthreads are currently implemented as an infinite loop. Each
> has its own variant of checks for terminating, freezing,
> awakening. In many cases it is unclear to say in which state
> it is and sometimes it is done a wrong way.
>
> The plan is to convert kthreads into kthread_worker or workqueues
> API. It allows to split the functionality into separate operations.
> It helps to make a better structure. Also it defines a clean state
> where no locks are taken, IRQs blocked, the kthread might sleep
> or even be safely migrated.
>
> The kthread worker API is useful when we want to have a dedicated
> single thread for the work. It helps to make sure that it is
> available when needed. Also it allows a better control, e.g.
> define a scheduling priority.
>
> This patch converts the fmr_pool kthread into the kthread worker
> API because I am not sure how busy the thread is. It is well
> possible that it does not need a dedicated kthread and workqueues
> would be perfectly fine. Well, the conversion between kthread
> worker API and workqueues is pretty trivial.
>
> The patch moves one iteration from the kthread into the work function.
> It preserves the check for a spurious queuing (wake up). Then it
> processes one request. Finally, it re-queues itself if more requests
> are pending.
>
> Otherwise, wake_up_process() is replaced by queuing the work.
>
> Important: The change is only compile tested. I did not find an easy
> way how to check it in a real life.

I had to do some digging myself to figure out how to move forward on
this patch. The issue is that your conversion touches the fmr_pool code
as your target. That code is slowly being phased out. Right now, only
two drivers in the IB core support fmr: mthca and mlx4. The generally
preferred method of mem management is fr instead of fmr. The mlx4
driver supports both fr and fmr, while the mthca driver is fmr only.
All of the other drivers are fr only.

The only code that uses the fmr pools are the upper layer iSER and SRP
drivers. So, if you have mthca hardware, you can test fmr using either
iSER or SRP clients. If you have mlx4 hardware, you can still test fmr
by using the SRP client and setting prefer_fr to false when you load
the module.

Now that I know that, I can provide testing of this patch when the
overall patchset is ready to be submitted next.

-- 
Doug Ledford
GPG KeyID: 0E572FDD

signature.asc
Description: OpenPGP digital signature
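The flow Petr's commit message describes — handle one request per invocation, re-queue if more remain, tolerate spurious queuing — can be modelled in a few lines of userspace C. This sketch mirrors only the described behaviour; it is not the kernel's kthread_worker API, and all names are made up:

```c
#include <assert.h>

static int pending;     /* number of queued requests */
static int processed;   /* requests handled so far */
static int work_queued; /* is the work item on the worker's list? */

static void queue_work(void) { work_queued = 1; }

/* One invocation of the work function, as the patch structures it. */
static void work_fn(void)
{
	work_queued = 0;

	if (pending == 0)  /* spurious queuing / wakeup: nothing to do */
		return;

	pending--;         /* process exactly one request */
	processed++;

	if (pending > 0)   /* more requests pending: re-queue ourselves */
		queue_work();
}

/* Drain loop standing in for the dedicated worker thread. */
static void run_worker(void)
{
	while (work_queued)
		work_fn();
}
```

The re-queue at the end replaces the old kthread's infinite loop, and the early return preserves the original check for a spurious wake-up.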
[RFC PATCH 10/15] x86: Use ISO atomics
Make x86 use the ISO intrinsic atomics. This boots fine, however it can't NOP out the LOCK prefixes if the number of online CPUs is 1. Without this patch, according to size -A, .text for my test kernel is: .text 6268981 18446744071578845184 with this patch: .text 6268277 18446744071578845184 There are still some underoptimisations to be dealt with. Signed-off-by: David Howells--- arch/x86/include/asm/atomic.h | 224 + 1 file changed, 4 insertions(+), 220 deletions(-) diff --git a/arch/x86/include/asm/atomic.h b/arch/x86/include/asm/atomic.h index 3e8674288198..bec5c111b82e 100644 --- a/arch/x86/include/asm/atomic.h +++ b/arch/x86/include/asm/atomic.h @@ -13,230 +13,14 @@ * resource counting etc.. */ -#define ATOMIC_INIT(i) { (i) } - -/** - * atomic_read - read atomic variable - * @v: pointer of type atomic_t - * - * Atomically reads the value of @v. - */ -static __always_inline int atomic_read(const atomic_t *v) -{ - return READ_ONCE((v)->counter); -} - -/** - * atomic_set - set atomic variable - * @v: pointer of type atomic_t - * @i: required value - * - * Atomically sets the value of @v to @i. - */ -static __always_inline void atomic_set(atomic_t *v, int i) -{ - WRITE_ONCE(v->counter, i); -} - -/** - * atomic_add - add integer to atomic variable - * @i: integer value to add - * @v: pointer of type atomic_t - * - * Atomically adds @i to @v. - */ -static __always_inline void atomic_add(int i, atomic_t *v) -{ - asm volatile(LOCK_PREFIX "addl %1,%0" -: "+m" (v->counter) -: "ir" (i)); -} - -/** - * atomic_sub - subtract integer from atomic variable - * @i: integer value to subtract - * @v: pointer of type atomic_t - * - * Atomically subtracts @i from @v. 
- */ -static __always_inline void atomic_sub(int i, atomic_t *v) -{ - asm volatile(LOCK_PREFIX "subl %1,%0" -: "+m" (v->counter) -: "ir" (i)); -} - -/** - * atomic_sub_and_test - subtract value from variable and test result - * @i: integer value to subtract - * @v: pointer of type atomic_t - * - * Atomically subtracts @i from @v and returns - * true if the result is zero, or false for all - * other cases. - */ -static __always_inline int atomic_sub_and_test(int i, atomic_t *v) -{ - GEN_BINARY_RMWcc(LOCK_PREFIX "subl", v->counter, "er", i, "%0", "e"); -} - -/** - * atomic_inc - increment atomic variable - * @v: pointer of type atomic_t - * - * Atomically increments @v by 1. - */ -static __always_inline void atomic_inc(atomic_t *v) -{ - asm volatile(LOCK_PREFIX "incl %0" -: "+m" (v->counter)); -} - -/** - * atomic_dec - decrement atomic variable - * @v: pointer of type atomic_t - * - * Atomically decrements @v by 1. - */ -static __always_inline void atomic_dec(atomic_t *v) -{ - asm volatile(LOCK_PREFIX "decl %0" -: "+m" (v->counter)); -} - -/** - * atomic_dec_and_test - decrement and test - * @v: pointer of type atomic_t - * - * Atomically decrements @v by 1 and - * returns true if the result is 0, or false for all other - * cases. - */ -static __always_inline int atomic_dec_and_test(atomic_t *v) -{ - GEN_UNARY_RMWcc(LOCK_PREFIX "decl", v->counter, "%0", "e"); -} - -/** - * atomic_inc_and_test - increment and test - * @v: pointer of type atomic_t - * - * Atomically increments @v by 1 - * and returns true if the result is zero, or false for all - * other cases. - */ -static __always_inline int atomic_inc_and_test(atomic_t *v) -{ - GEN_UNARY_RMWcc(LOCK_PREFIX "incl", v->counter, "%0", "e"); -} - -/** - * atomic_add_negative - add and test if negative - * @i: integer value to add - * @v: pointer of type atomic_t - * - * Atomically adds @i to @v and returns true - * if the result is negative, or false when - * result is greater than or equal to zero. 
- */ -static __always_inline int atomic_add_negative(int i, atomic_t *v) -{ - GEN_BINARY_RMWcc(LOCK_PREFIX "addl", v->counter, "er", i, "%0", "s"); -} - -/** - * atomic_add_return - add integer and return - * @i: integer value to add - * @v: pointer of type atomic_t - * - * Atomically adds @i to @v and returns @i + @v - */ -static __always_inline int atomic_add_return(int i, atomic_t *v) -{ - return i + xadd(&v->counter, i); -} - -/** - * atomic_sub_return - subtract integer and return - * @v: pointer of type atomic_t - * @i: integer value to subtract - * - * Atomically subtracts @i from @v and returns @v - @i - */ -static __always_inline int atomic_sub_return(int i, atomic_t *v) -{ - return atomic_add_return(-i, v); -} - -#define atomic_inc_return(v) (atomic_add_return(1, v)) -#define atomic_dec_return(v) (atomic_sub_return(1, v)) - -static __always_inline int atomic_cmpxchg(atomic_t *v, int old, int new) -{ - return cmpxchg(&v->counter, old, new); -} - -static inline int
[RFC PATCH 06/15] Provide 16-bit ISO atomics
Provide an implementation of atomic_inc_short() using ISO atomics. Signed-off-by: David Howells--- include/asm-generic/iso-atomic16.h | 27 +++ 1 file changed, 27 insertions(+) create mode 100644 include/asm-generic/iso-atomic16.h diff --git a/include/asm-generic/iso-atomic16.h b/include/asm-generic/iso-atomic16.h new file mode 100644 index ..383baaae7208 --- /dev/null +++ b/include/asm-generic/iso-atomic16.h @@ -0,0 +1,27 @@ +/* Use ISO C++11 intrinsics to implement 16-bit atomic ops. + * + * Copyright (C) 2016 Red Hat, Inc. All Rights Reserved. + * Written by David Howells (dhowe...@redhat.com) + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public Licence + * as published by the Free Software Foundation; either version + * 2 of the Licence, or (at your option) any later version. + */ + +#ifndef _ASM_GENERIC_ISO_ATOMIC16_H +#define _ASM_GENERIC_ISO_ATOMIC16_H + +/** + * atomic_inc_short - increment of a short integer + * @v: pointer to type int + * + * Atomically adds 1 to @v + * Returns the new value of @v + */ +static __always_inline short int atomic_inc_short(short int *v) +{ + return __atomic_add_fetch(v, 1, __ATOMIC_SEQ_CST); +} + +#endif /* _ASM_GENERIC_ISO_ATOMIC16_H */
[RFC PATCH 11/15] x86: Use ISO bitops
Make x86 use the ISO intrinsic bitops. This boots fine, however it can't NOP out the LOCK prefixes if the number of online CPUs is 1. Without this patch, according to size -A, .text for my test kernel is: .text 6268277 18446744071578845184 with this patch: .text 6273589 18446744071578845184 There are still some underoptimisations to be dealt with. Signed-off-by: David Howells--- arch/x86/include/asm/bitops.h | 143 + 1 file changed, 2 insertions(+), 141 deletions(-) diff --git a/arch/x86/include/asm/bitops.h b/arch/x86/include/asm/bitops.h index 7766d1cf096e..f89aec3a41d2 100644 --- a/arch/x86/include/asm/bitops.h +++ b/arch/x86/include/asm/bitops.h @@ -25,6 +25,8 @@ # error "Unexpected BITS_PER_LONG" #endif +#include + #define BIT_64(n) (U64_C(1) << (n)) /* @@ -54,35 +56,6 @@ #define CONST_MASK(nr) (1 << ((nr) & 7)) /** - * set_bit - Atomically set a bit in memory - * @nr: the bit to set - * @addr: the address to start counting from - * - * This function is atomic and may not be reordered. See __set_bit() - * if you do not require the atomic guarantees. - * - * Note: there are no guarantees that this function will not be reordered - * on non x86 architectures, so if you are writing portable code, - * make sure not to rely on its reordering guarantees. - * - * Note that @nr may be almost arbitrarily large; this function is not - * restricted to acting on a single-word quantity. 
- */ -static __always_inline void -set_bit(long nr, volatile unsigned long *addr) -{ - if (IS_IMMEDIATE(nr)) { - asm volatile(LOCK_PREFIX "orb %1,%0" - : CONST_MASK_ADDR(nr, addr) - : "iq" ((u8)CONST_MASK(nr)) - : "memory"); - } else { - asm volatile(LOCK_PREFIX "bts %1,%0" - : BITOP_ADDR(addr) : "Ir" (nr) : "memory"); - } -} - -/** * __set_bit - Set a bit in memory * @nr: the bit to set * @addr: the address to start counting from @@ -96,44 +69,6 @@ static __always_inline void __set_bit(long nr, volatile unsigned long *addr) asm volatile("bts %1,%0" : ADDR : "Ir" (nr) : "memory"); } -/** - * clear_bit - Clears a bit in memory - * @nr: Bit to clear - * @addr: Address to start counting from - * - * clear_bit() is atomic and may not be reordered. However, it does - * not contain a memory barrier, so if it is used for locking purposes, - * you should call smp_mb__before_atomic() and/or smp_mb__after_atomic() - * in order to ensure changes are visible on other processors. - */ -static __always_inline void -clear_bit(long nr, volatile unsigned long *addr) -{ - if (IS_IMMEDIATE(nr)) { - asm volatile(LOCK_PREFIX "andb %1,%0" - : CONST_MASK_ADDR(nr, addr) - : "iq" ((u8)~CONST_MASK(nr))); - } else { - asm volatile(LOCK_PREFIX "btr %1,%0" - : BITOP_ADDR(addr) - : "Ir" (nr)); - } -} - -/* - * clear_bit_unlock - Clears a bit in memory - * @nr: Bit to clear - * @addr: Address to start counting from - * - * clear_bit() is atomic and implies release semantics before the memory - * operation. It can be used for an unlock. 
- */ -static __always_inline void clear_bit_unlock(long nr, volatile unsigned long *addr) -{ - barrier(); - clear_bit(nr, addr); -} - static __always_inline void __clear_bit(long nr, volatile unsigned long *addr) { asm volatile("btr %1,%0" : ADDR : "Ir" (nr)); @@ -172,54 +107,6 @@ static __always_inline void __change_bit(long nr, volatile unsigned long *addr) } /** - * change_bit - Toggle a bit in memory - * @nr: Bit to change - * @addr: Address to start counting from - * - * change_bit() is atomic and may not be reordered. - * Note that @nr may be almost arbitrarily large; this function is not - * restricted to acting on a single-word quantity. - */ -static __always_inline void change_bit(long nr, volatile unsigned long *addr) -{ - if (IS_IMMEDIATE(nr)) { - asm volatile(LOCK_PREFIX "xorb %1,%0" - : CONST_MASK_ADDR(nr, addr) - : "iq" ((u8)CONST_MASK(nr))); - } else { - asm volatile(LOCK_PREFIX "btc %1,%0" - : BITOP_ADDR(addr) - : "Ir" (nr)); - } -} - -/** - * test_and_set_bit - Set a bit and return its old value - * @nr: Bit to set - * @addr: Address to count from - * - * This operation is atomic and cannot be reordered. - * It also implies a memory barrier. - */ -static __always_inline int test_and_set_bit(long nr, volatile unsigned long *addr) -{ - GEN_BINARY_RMWcc(LOCK_PREFIX "bts", *addr, "Ir", nr, "%0", "c"); -} - -/** - * test_and_set_bit_lock - Set a bit and return its old value for lock - *
Re: [PATCH 3/3] ACPI: ARM64: support for ACPI_TABLE_UPGRADE
On 05/17/2016 03:46 PM, Mark Rutland wrote: > On Tue, May 17, 2016 at 03:06:03PM +0300, Aleksey Makarov wrote: [...] >> diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c >> index feab2ee..1d5e24f 100644 >> --- a/arch/arm64/kernel/setup.c >> +++ b/arch/arm64/kernel/setup.c >> @@ -261,11 +261,13 @@ void __init setup_arch(char **cmdline_p) >> efi_init(); >> arm64_memblock_init(); >> >> +paging_init(); >> + >> +early_initrd_acpi_init(); > > Do we actually need full paging up here? > > Why can we not use fixmap based copying as we do for the other cases > prior to paging_init? The implementation of the feature acpi_table_initrd_init() (drivers/acpi/tables.c) works with initrd data represented as an array in virtual memory. It uses some library utility to find the redefined tables in that array and iterates over it to copy the data to new allocated memory. So we need to rewrite it considerably to access the initrd data via fixmap. In x86 arch, linear kernel memory is already mapped by the time when early_initrd_acpi_init() and acpi_boot_table_init() are called so I think that we can just move this mapping one function earlier too. Is it ok? I will address your other comments in the next version of the patchset. Thank you Aleksey Makarov >> + >> /* Parse the ACPI tables for possible boot-time configuration */ >> acpi_boot_table_init(); >> >> -paging_init(); >> - >> if (acpi_disabled) >> unflatten_device_tree(); >> >> diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig >> index c204344..ec694c0 100644 >> --- a/drivers/acpi/Kconfig >> +++ b/drivers/acpi/Kconfig >> @@ -313,7 +313,7 @@ config ACPI_CUSTOM_DSDT >> >> config ACPI_TABLE_UPGRADE >> bool "Allow upgrading ACPI tables via initrd" >> -depends on BLK_DEV_INITRD && X86 >> +depends on BLK_DEV_INITRD && (X86 || ARM64) >> default y >> help > > Perhaps ARCH_HAS_ACPI_TABLE_UPGRADE? 
> >>This option provides functionality to upgrade arbitrary ACPI tables >> diff --git a/drivers/acpi/tables.c b/drivers/acpi/tables.c >> index 82be84a..c35034d8 100644 >> --- a/drivers/acpi/tables.c >> +++ b/drivers/acpi/tables.c >> @@ -482,6 +482,14 @@ static DECLARE_BITMAP(acpi_initrd_installed, >> NR_ACPI_INITRD_TABLES); >> >> #define MAP_CHUNK_SIZE (NR_FIX_BTMAPS << PAGE_SHIFT) >> >> +#if defined(CONFIG_X86) >> +#define MAX_PHYS_ACPI_TABLES (max_low_pfn_mapped << PAGE_SHIFT) >> +#elif defined(CONFIG_ARM64) >> +#define MAX_PHYS_ACPI_TABLES PFN_PHYS(max_pfn) >> +#else >> +#error "MAX_PHYS_ACPI_TABLES is not defiend for this architecture" >> +#endif > > s/defiend/defined/ > > This should be defined in an arch-specific header if it is actually > arch-specific. > > Thanks, > Mark. > >> + >> static void __init acpi_table_initrd_init(void *data, size_t size) >> { >> int sig, no, table_nr = 0, total_offset = 0; >> @@ -541,7 +549,7 @@ static void __init acpi_table_initrd_init(void *data, >> size_t size) >> return; >> >> acpi_tables_addr = >> -memblock_find_in_range(0, max_low_pfn_mapped << PAGE_SHIFT, >> +memblock_find_in_range(0, MAX_PHYS_ACPI_TABLES, >> all_tables_size, PAGE_SIZE); >> if (!acpi_tables_addr) { >> WARN_ON(1); >> -- >> 2.8.2 >>
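Mark's ARCH_HAS_ACPI_TABLE_UPGRADE suggestion would keep the architecture list out of drivers/acpi/Kconfig. A sketch of what that could look like; the symbol name comes from the review comment, but the exact wiring is an assumption, not the merged patch:

```
config ARCH_HAS_ACPI_TABLE_UPGRADE
	def_bool n

config ACPI_TABLE_UPGRADE
	bool "Allow upgrading ACPI tables via initrd"
	depends on BLK_DEV_INITRD && ARCH_HAS_ACPI_TABLE_UPGRADE
	default y
```

Each supporting architecture would then `select ARCH_HAS_ACPI_TABLE_UPGRADE` from its own Kconfig, and MAX_PHYS_ACPI_TABLES could likewise move into an arch header, as Mark suggests below.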
Re: [PATCH] mlx5: avoid unused variable warning
On Wed, May 18, 2016 at 04:21:07PM +0200, Arnd Bergmann wrote: > When CONFIG_NET_CLS_ACT is disabled, we get a new warning in the mlx5 > ethernet driver because the tc_for_each_action() loop never references > the iterator: > > mellanox/mlx5/core/en_tc.c: In function 'mlx5e_stats_flower': > mellanox/mlx5/core/en_tc.c:431:20: error: unused variable 'a' > [-Werror=unused-variable] > struct tc_action *a; > > This changes the dummy tc_for_each_action() macro by adding a > cast to void, letting the compiler know that the variable is > intentionally declared but not used here. I could not come up > with a nicer workaround, but this seems to do the trick. > > Signed-off-by: Arnd Bergmann> Fixes: aad7e08d39bd ("net/mlx5e: Hardware offloaded flower filter statistics > support") > Fixes: 00175aec941e ("net/sched: Macro instead of CONFIG_NET_CLS_ACT ifdef") > --- Acked-By: Amir Vadai Thanks Arnd.
[RFC PATCH 08/15] Provide an implementation of bitops using C++11 atomics
Provide an implementation of various atomic bit functions using the __atomic_fetch_{and,or,xor}() C++11 atomics. This involves creating a mask and adjusting the address in each function as the bit number can be greater than the number of bits in a word and can also be negative. On some arches, such as those that use LL/SC type constructs or CMPXCHG, AND/OR/XOR are normally used anyway so this is not a problem. On powerpc64, for example: .test_and_set_bit: clrlwi r10,r3,26 rldicl r3,r3,58,6 rldicr r3,r3,3,60 li r9,1 sld r9,r9,r10 add r4,r4,r3 hwsync retry: ldarx r3,0,r4 or r10,r3,r9 <--- OR is used here stdcx. r10,0,r4 bne-retry hwsync and r9,r9,r3 addic r3,r9,-1 subfe r3,r3,r9 blr However, on some arches such as x86, there are bit wangling instructions that take a bit number and may even correctly handle bit numbers outside the range 0...BITS_PER_LONG-1. Even on x86, if the result isn't of interest, it is apparently faster to use LOCKed AND/OR/XOR than to use LOCKed BTR/BTS/BTC. However, the current gcc implementation always uses a CMPXCHG loop rather than using BTR/BTS/BTC, though it will still use AND/OR/XOR if it can: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49244 With this fixed, bool try_test_and_change_bit(unsigned long *p) { return test_and_change_bit(83, p); } becomes: foo_test_and_change_bit: lock btcq $0x13,0x8(%rdi) setb %al retq There are three things to note here: (1) The SETB instruction is generated by the compiler. (2) Because the memory variable is a long, BTCQ is used - which uses an extra byte of opcode. We could use BTCL instead as the inline asm does... except that that generates technically broken code since the bit number is a *64-bit* number and BTCL only takes a 32-bit bit number. I'm not sure that that is worth worrying about, however. (3) The compiler divided the constant bit number between the immediate value to the BTCQ instruction and the displacement on the memory address. 
The immediate value is, however, an imm8 so this is unnecessary outside the range 127..-128 I think. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49244#c20 Next there's: const char *foo(unsigned long *p); const char *foo_test_and_change_bit_2(unsigned long *p) { return test_and_change_bit(83, p) ? foo(p) : "bar"; } becomes: foo_test_and_change_bit_2: lock btcq $0x13,0x8(%rdi) jae .L5 jmpq foo .L5: mov $.LC0,%eax retq with the compiler generating a conditional jump around a tail-call to foo(). Next, we can use a variable as the bit number rather than a constant. bool foo_test_and_change_bit_3(unsigned long *p, long n) { return test_and_change_bit(n, p); } becomes: foo_test_and_change_bit_3: mov %rsi,%rax } and $0x3f,%esi } Splitting the bit number sar $0x6,%rax } lock btc %rsi,(%rdi,%rax,8) setb %al retq As can be seen, the compiler again splits the bit number up unnecessarily since the CPU will do that. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49244#c13 Finally, if we ignore the result: void foo_test_and_change_bit_4(unsigned long *p) { test_and_change_bit(83, p); } void foo_test_and_change_bit_5(unsigned long *p, long n) { test_and_change_bit(n, p); } the compiler will use XOR instead of BTC: foo_test_and_change_bit_4: lock xorq $0x8,0x8(%rdi) retq foo_test_and_change_bit_5: mov %rsi,%rdx mov %esi,%ecx mov $0x1,%eax sar $0x6,%rdx shl %cl,%rax lock xor %rax,(%rdi,%rdx,8) retq Also, the compiler won't optimise __atomic_load[_n]() to either a TEST or a BT instruction: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49244#c13 For arm64, if compiled with -march=armv8-a+lse this will generate LDCLR, LDSET and LDEOR instructions for __atomic_fetch_{and,or,xor}() with appropriate {,A,L,AL} suffixes. However, because LDCLR is actually an AND-NOT not an AND, the operand is negated by the compiler to undo the negation by the CPU. This combined with the ~ operator in the code to generate the mask: static
Re: [PATCH] mlx5: avoid unused variable warning
On Wed, May 18, 2016 at 04:21:07PM +0200, Arnd Bergmann wrote: > When CONFIG_NET_CLS_ACT is disabled, we get a new warning in the mlx5 > ethernet driver because the tc_for_each_action() loop never references > the iterator: > > mellanox/mlx5/core/en_tc.c: In function 'mlx5e_stats_flower': > mellanox/mlx5/core/en_tc.c:431:20: error: unused variable 'a' > [-Werror=unused-variable] > struct tc_action *a; > > This changes the dummy tc_for_each_action() macro by adding a > cast to void, letting the compiler know that the variable is > intentionally declared but not used here. I could not come up > with a nicer workaround, but this seems to do the trick. > > Signed-off-by: Arnd Bergmann > Fixes: aad7e08d39bd ("net/mlx5e: Hardware offloaded flower filter statistics > support") > Fixes: 00175aec941e ("net/sched: Macro instead of CONFIG_NET_CLS_ACT ifdef") > --- Acked-By: Amir Vadai Thanks Arnd.
[RFC PATCH 08/15] Provide an implementation of bitops using C++11 atomics
Provide an implementation of various atomic bit functions using the
__atomic_fetch_{and,or,xor}() C++11 atomics. This involves creating a mask and
adjusting the address in each function, as the bit number can be greater than
the number of bits in a word and can also be negative.

On some arches, such as those that use LL/SC-type constructs or CMPXCHG,
AND/OR/XOR are normally used anyway, so this is not a problem. On powerpc64,
for example:

	.test_and_set_bit:
		clrlwi	r10,r3,26
		rldicl	r3,r3,58,6
		rldicr	r3,r3,3,60
		li	r9,1
		sld	r9,r9,r10
		add	r4,r4,r3
		hwsync
	retry:
		ldarx	r3,0,r4
		or	r10,r3,r9	<--- OR is used here
		stdcx.	r10,0,r4
		bne-	retry
		hwsync
		and	r9,r9,r3
		addic	r3,r9,-1
		subfe	r3,r3,r9
		blr

However, on some arches such as x86, there are bit-wangling instructions that
take a bit number and may even correctly handle bit numbers outside the range
0...BITS_PER_LONG-1. Even on x86, if the result isn't of interest, it is
apparently faster to use LOCKed AND/OR/XOR than LOCKed BTR/BTS/BTC. However,
the current gcc implementation always uses a CMPXCHG loop rather than
BTR/BTS/BTC, though it will still use AND/OR/XOR if it can:

	https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49244

With this fixed,

	bool try_test_and_change_bit(unsigned long *p)
	{
		return test_and_change_bit(83, p);
	}

becomes:

	foo_test_and_change_bit:
		lock btcq $0x13,0x8(%rdi)
		setb   %al
		retq

There are three things to note here:

 (1) The SETB instruction is generated by the compiler.

 (2) Because the memory variable is a long, BTCQ is used - which uses an extra
     byte of opcode. We could use BTCL instead, as the inline asm does...
     except that that generates technically broken code, since the bit number
     is a *64-bit* number and BTCL only takes a 32-bit bit number. I'm not
     sure that that is worth worrying about, however.

 (3) The compiler divided the constant bit number between the immediate value
     to the BTCQ instruction and the displacement on the memory address. The
     immediate value is, however, an imm8, so this is unnecessary outside the
     range 127..-128, I think.

	https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49244#c20

Next there's:

	const char *foo(unsigned long *p);
	const char *foo_test_and_change_bit_2(unsigned long *p)
	{
		return test_and_change_bit(83, p) ? foo(p) : "bar";
	}

which becomes:

	foo_test_and_change_bit_2:
		lock btcq $0x13,0x8(%rdi)
		jae    .L5
		jmpq   foo
	.L5:
		mov    $.LC0,%eax
		retq

with the compiler generating a conditional jump around a tail-call to foo().

Next, we can use a variable as the bit number rather than a constant:

	bool foo_test_and_change_bit_3(unsigned long *p, long n)
	{
		return test_and_change_bit(n, p);
	}

which becomes:

	foo_test_and_change_bit_3:
		mov    %rsi,%rax	}
		and    $0x3f,%esi	} Splitting the bit number
		sar    $0x6,%rax	}
		lock btc %rsi,(%rdi,%rax,8)
		setb   %al
		retq

As can be seen, the compiler again splits the bit number up unnecessarily,
since the CPU will do that:

	https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49244#c13

Finally, if we ignore the result:

	void foo_test_and_change_bit_4(unsigned long *p)
	{
		test_and_change_bit(83, p);
	}
	void foo_test_and_change_bit_5(unsigned long *p, long n)
	{
		test_and_change_bit(n, p);
	}

the compiler will use XOR instead of BTC:

	foo_test_and_change_bit_4:
		lock xorq $0x80000,0x8(%rdi)
		retq
	foo_test_and_change_bit_5:
		mov    %rsi,%rdx
		mov    %esi,%ecx
		mov    $0x1,%eax
		sar    $0x6,%rdx
		shl    %cl,%rax
		lock xor %rax,(%rdi,%rdx,8)
		retq

Also, the compiler won't optimise __atomic_load[_n]() to either a TEST or a BT
instruction:

	https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49244#c13

For arm64, if compiled with -march=armv8-a+lse, this will generate LDCLR,
LDSET and LDEOR instructions for __atomic_fetch_{and,or,xor}() with the
appropriate {,A,L,AL} suffixes. However, because LDCLR is actually an AND-NOT
rather than an AND, the operand is negated by the compiler to undo the
negation by the CPU. This is combined with the ~ operator in the code to
generate the mask: static
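To make the mask-and-adjust scheme described above concrete, here is a minimal userspace sketch built on the same __atomic_fetch_or() builtin. This is illustrative only, not the patch's code: the function name and the power-of-two BITS_PER_LONG assumption are mine.

```c
#include <stdbool.h>
#include <limits.h>

#define BITS_PER_LONG ((long)(sizeof(unsigned long) * CHAR_BIT))

/* Sketch of a C11-atomics test_and_set_bit(): build a mask from the low
 * bits of the bit number, adjust the word address by the high bits, then
 * do one atomic OR.  Assumes BITS_PER_LONG is 32 or 64. */
static inline bool iso_test_and_set_bit(long bit, volatile unsigned long *addr)
{
	unsigned long mask = 1UL << (bit & (BITS_PER_LONG - 1));
	unsigned long old;

	addr += bit >> (BITS_PER_LONG == 64 ? 6 : 5);	/* word index */
	old = __atomic_fetch_or(addr, mask, __ATOMIC_ACQ_REL);
	return old & mask;	/* was the bit already set? */
}
```

Because the shift of `bit` is arithmetic, negative bit numbers index backwards from `addr`, which is the behaviour the commit message relies on.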
[RFC PATCH 12/15] x86: Use ISO xchg(), cmpxchg() and friends
Make x86 use the ISO intrinsic xchg(), cmpxchg() and similar functions. This
boots fine; however, it can't NOP out the LOCK prefixes if the number of
online CPUs is 1.

Without this patch, according to size -A, .text for my test kernel is:

	.text	6273589	18446744071578845184

with this patch:

	.text	6273013	18446744071578845184

There are still some underoptimisations to be dealt with.

Signed-off-by: David Howells
---
 arch/x86/include/asm/cmpxchg.h    | 99 -
 arch/x86/include/asm/cmpxchg_32.h |  3 -
 arch/x86/include/asm/cmpxchg_64.h |  6 --
 3 files changed, 1 insertion(+), 107 deletions(-)

diff --git a/arch/x86/include/asm/cmpxchg.h b/arch/x86/include/asm/cmpxchg.h
index 9733361fed6f..cde270af1b94 100644
--- a/arch/x86/include/asm/cmpxchg.h
+++ b/arch/x86/include/asm/cmpxchg.h
@@ -2,6 +2,7 @@
 #define ASM_X86_CMPXCHG_H

 #include
+#include
 #include
 #include /* Provides LOCK_PREFIX */

@@ -9,12 +10,8 @@
  * Non-existant functions to indicate usage errors at link time
  * (or compile-time if the compiler implements __compiletime_error().
  */
-extern void __xchg_wrong_size(void)
-	__compiletime_error("Bad argument size for xchg");
 extern void __cmpxchg_wrong_size(void)
 	__compiletime_error("Bad argument size for cmpxchg");
-extern void __xadd_wrong_size(void)
-	__compiletime_error("Bad argument size for xadd");
 extern void __add_wrong_size(void)
 	__compiletime_error("Bad argument size for add");

@@ -34,48 +31,6 @@ extern void __add_wrong_size(void)
 #define __X86_CASE_Q	-1	/* sizeof will never return -1 */
 #endif

-/*
- * An exchange-type operation, which takes a value and a pointer, and
- * returns the old value.
- */
-#define __xchg_op(ptr, arg, op, lock)					\
-	({								\
-		__typeof__ (*(ptr)) __ret = (arg);			\
-		switch (sizeof(*(ptr))) {				\
-		case __X86_CASE_B:					\
-			asm volatile (lock #op "b %b0, %1\n"		\
-				      : "+q" (__ret), "+m" (*(ptr))	\
-				      : : "memory", "cc");		\
-			break;						\
-		case __X86_CASE_W:					\
-			asm volatile (lock #op "w %w0, %1\n"		\
-				      : "+r" (__ret), "+m" (*(ptr))	\
-				      : : "memory", "cc");		\
-			break;						\
-		case __X86_CASE_L:					\
-			asm volatile (lock #op "l %0, %1\n"		\
-				      : "+r" (__ret), "+m" (*(ptr))	\
-				      : : "memory", "cc");		\
-			break;						\
-		case __X86_CASE_Q:					\
-			asm volatile (lock #op "q %q0, %1\n"		\
-				      : "+r" (__ret), "+m" (*(ptr))	\
-				      : : "memory", "cc");		\
-			break;						\
-		default:						\
-			__ ## op ## _wrong_size();			\
-		}							\
-		__ret;							\
-	})
-
-/*
- * Note: no "lock" prefix even on SMP: xchg always implies lock anyway.
- * Since this is generally used to protect other memory information, we
- * use "asm volatile" and "memory" clobbers to prevent gcc from moving
- * information around.
- */
-#define xchg(ptr, v)	__xchg_op((ptr), (v), xchg, "")
-
 /*
  * Atomic compare and exchange. Compare OLD with MEM, if identical,
  * store NEW in MEM. Return the initial value in MEM. Success is
@@ -129,9 +84,6 @@ extern void __add_wrong_size(void)
 	__ret;								\
 })

-#define __cmpxchg(ptr, old, new, size)					\
-	__raw_cmpxchg((ptr), (old), (new), (size), LOCK_PREFIX)
-
 #define __sync_cmpxchg(ptr, old, new, size)				\
 	__raw_cmpxchg((ptr), (old), (new), (size), "lock; ")

@@ -144,9 +96,6 @@ extern void __add_wrong_size(void)
 # include
 #endif

-#define cmpxchg(ptr, old, new)						\
-	__cmpxchg(ptr, old, new, sizeof(*(ptr)))
-
 #define
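The ISO cmpxchg() family that replaces the deleted inline-asm macros can be modelled in plain C with the __atomic_compare_exchange_n() builtin the series is based on. This is a userspace sketch, not the kernel's wrapper; the names iso_cmpxchg()/iso_try_cmpxchg() are assumptions.

```c
#include <stdbool.h>

/* try-style compare-exchange: returns true on success; on failure the
 * observed value is written back through *old, which is exactly what
 * try_cmpxchg()-style callers want (the result lands in the Z flag). */
static inline bool iso_try_cmpxchg(unsigned long *ptr, unsigned long *old,
				   unsigned long new)
{
	return __atomic_compare_exchange_n(ptr, old, new, false,
					   __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}

/* classic cmpxchg(): returns the value that was in memory beforehand. */
static inline unsigned long iso_cmpxchg(unsigned long *ptr, unsigned long old,
					unsigned long new)
{
	iso_try_cmpxchg(ptr, &old, new);	/* updates 'old' on failure */
	return old;
}
```

Note how the classic form falls out of the try form for free: on success the builtin leaves the expected value untouched, which already equals the prior memory contents.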
[RFC PATCH 13/15] x86: Improve spinlocks using ISO C++11 intrinsic atomics
Make some improvements to the x86 spinlock implementation using the ISO C++11
intrinsic atomic wrappers. Note that since the spinlocks use cmpxchg(), xadd()
and __add(), they already make some use of this.

 (1) Use acquire variants in the spin_lock() and spin_trylock() paths. On x86
     this shouldn't actually make a difference.

 (2) Use try_cmpxchg*() rather than cmpxchg*(). This should produce slightly
     more efficient code, as a comparison can be discarded in favour of using
     the value in the Z flag.

Note that, as mentioned before, the LOCK prefix is currently mandatory, but a
gcc ticket is open to try and get these listed for suppression.

Signed-off-by: David Howells
---
 arch/x86/include/asm/qspinlock.h |  2 +-
 arch/x86/include/asm/spinlock.h  | 18 ++
 2 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/qspinlock.h b/arch/x86/include/asm/qspinlock.h
index eaba08076030..b56b8c2a9ed2 100644
--- a/arch/x86/include/asm/qspinlock.h
+++ b/arch/x86/include/asm/qspinlock.h
@@ -55,7 +55,7 @@ static inline bool virt_spin_lock(struct qspinlock *lock)
 	do {
 		while (atomic_read(&lock->val) != 0)
 			cpu_relax();
-	} while (atomic_cmpxchg(&lock->val, 0, _Q_LOCKED_VAL) != 0);
+	} while (!atomic_try_cmpxchg_acquire(&lock->val, 0, _Q_LOCKED_VAL));

 	return true;
 }

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index be0a05913b91..1cbddef2b3f3 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -81,7 +81,7 @@ static inline void __ticket_check_and_clear_slowpath(arch_spinlock_t *lock,
 		new.tickets.tail = old.tickets.tail;

 		/* try to clear slowpath flag when there are no contenders */
-		cmpxchg(&lock->head_tail, old.head_tail, new.head_tail);
+		try_cmpxchg_acquire(&lock->head_tail, old.head_tail, new.head_tail);
 	}
 }

@@ -107,7 +107,7 @@ static __always_inline void arch_spin_lock(arch_spinlock_t *lock)
 {
 	register struct __raw_tickets inc = { .tail = TICKET_LOCK_INC };

-	inc = xadd(&lock->tickets, inc);
+	inc = xadd_acquire(&lock->tickets, inc);
 	if (likely(inc.head == inc.tail))
 		goto out;

@@ -128,19 +128,21 @@ out:
 	barrier();	/* make sure nothing creeps before the lock is taken */
 }

-static __always_inline int arch_spin_trylock(arch_spinlock_t *lock)
+static __always_inline bool arch_spin_trylock(arch_spinlock_t *lock)
 {
 	arch_spinlock_t old, new;

 	old.tickets = READ_ONCE(lock->tickets);
 	if (!__tickets_equal(old.tickets.head, old.tickets.tail))
-		return 0;
+		return false;

 	new.head_tail = old.head_tail + (TICKET_LOCK_INC << TICKET_SHIFT);
 	new.head_tail &= ~TICKET_SLOWPATH_FLAG;

-	/* cmpxchg is a full barrier, so nothing can move before it */
-	return cmpxchg(&lock->head_tail, old.head_tail, new.head_tail) == old.head_tail;
+	/* Insert an acquire barrier with the cmpxchg so that nothing
+	 * can move before it.
+	 */
+	return try_cmpxchg_acquire(&lock->head_tail, old.head_tail, new.head_tail);
 }

 static __always_inline void arch_spin_unlock(arch_spinlock_t *lock)
@@ -151,14 +153,14 @@ static __always_inline void arch_spin_unlock(arch_spinlock_t *lock)
 		BUILD_BUG_ON(((__ticket_t)NR_CPUS) != NR_CPUS);

-		head = xadd(&lock->tickets.head, TICKET_LOCK_INC);
+		head = xadd_release(&lock->tickets.head, TICKET_LOCK_INC);

 		if (unlikely(head & TICKET_SLOWPATH_FLAG)) {
 			head &= ~TICKET_SLOWPATH_FLAG;
 			__ticket_unlock_kick(lock, (head + TICKET_LOCK_INC));
 		}
 	} else
-		__add(&lock->tickets.head, TICKET_LOCK_INC, UNLOCK_LOCK_PREFIX);
+		add_release(&lock->tickets.head, TICKET_LOCK_INC, UNLOCK_LOCK_PREFIX);
 }

 static inline int arch_spin_is_locked(arch_spinlock_t *lock)
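The acquire-on-lock / release-on-unlock discipline this patch applies can be illustrated with a toy test-and-set lock in plain C. This is deliberately not the kernel's ticket lock; the type and function names are illustrative only.

```c
#include <stdbool.h>

/* Toy lock: 0 = free, 1 = held.  Locking is an acquire RMW (later
 * accesses cannot move before it); unlocking is a release store
 * (earlier accesses cannot move after it). */
typedef struct { unsigned int val; } toy_spinlock_t;

static inline bool toy_spin_trylock(toy_spinlock_t *lock)
{
	unsigned int expected = 0;
	return __atomic_compare_exchange_n(&lock->val, &expected, 1, false,
					   __ATOMIC_ACQUIRE, __ATOMIC_RELAXED);
}

static inline void toy_spin_unlock(toy_spinlock_t *lock)
{
	__atomic_store_n(&lock->val, 0, __ATOMIC_RELEASE);
}
```

On x86, every LOCKed RMW is already a full barrier, which is why the commit message notes the acquire variants "shouldn't actually make a difference" there; on weakly ordered arches they permit cheaper barriers.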
[RFC PATCH 09/15] Make the ISO bitops use 32-bit values internally
Make the ISO bitops use 32-bit values internally so that on x86 we emit the
smaller BTRL/BTSL/BTCL instructions rather than BTRQ/BTSQ/BTCQ (which require
a prefix).

However, if we're going to do this, we really need to change the bit numbers
for test_bit(), set_bit(), test_and_set_bit(), etc. to be int rather than
long, because BTR/BTS/BTC take a bit number that's the same size as the memory
variable. This means that BTSQ, for example, has a bit number in the range
-2^63..2^63-1, whereas BTSL only has a range of -2^31..2^31-1.

So, technically, the current inline-asm set_bit() and co. for a non-constant
bit number are implemented incorrectly, as they can't handle the full range of
the long bit number. In practice, however, it's probably not a problem.

Signed-off-by: David Howells
---
 include/asm-generic/iso-bitops.h | 57 +-
 1 file changed, 32 insertions(+), 25 deletions(-)

diff --git a/include/asm-generic/iso-bitops.h b/include/asm-generic/iso-bitops.h
index 64d5067e3a67..e87b91965e67 100644
--- a/include/asm-generic/iso-bitops.h
+++ b/include/asm-generic/iso-bitops.h
@@ -18,11 +18,12 @@
 static __always_inline
 bool test_bit(long bit, const volatile unsigned long *addr)
 {
-	unsigned long mask = 1UL << (bit & (BITS_PER_LONG - 1));
-	unsigned long old;
+	const volatile unsigned int *addr32 = (const volatile unsigned int *)addr;
+	unsigned int mask = 1U << (bit & (32 - 1));
+	unsigned int old;

-	addr += bit >> _BITOPS_LONG_SHIFT;
-	old = __atomic_load_n(addr, __ATOMIC_RELAXED);
+	addr32 += bit >> 5;
+	old = __atomic_load_n(addr32, __ATOMIC_RELAXED);
 	return old & mask;
 }

@@ -44,10 +45,11 @@ bool test_bit(long bit, const volatile unsigned long *addr)
 static __always_inline
 void iso_set_bit(long bit, volatile unsigned long *addr, int memorder)
 {
-	unsigned long mask = 1UL << (bit & (BITS_PER_LONG - 1));
+	volatile unsigned int *addr32 = (volatile unsigned int *)addr;
+	unsigned int mask = 1U << (bit & (32 - 1));

-	addr += bit >> _BITOPS_LONG_SHIFT;
-	__atomic_fetch_or(addr, mask, memorder);
+	addr32 += bit >> 5;
+	__atomic_fetch_or(addr32, mask, memorder);
 }

 #define set_bit(b, a) iso_set_bit((b), (a), __ATOMIC_ACQ_REL)

@@ -75,10 +77,11 @@ void iso_set_bit(long bit, volatile unsigned long *addr, int memorder)
 static __always_inline
 void iso_clear_bit(long bit, volatile unsigned long *addr, int memorder)
 {
-	unsigned long mask = 1UL << (bit & (BITS_PER_LONG - 1));
+	volatile unsigned int *addr32 = (volatile unsigned int *)addr;
+	unsigned int mask = 1U << (bit & (32 - 1));

-	addr += bit >> _BITOPS_LONG_SHIFT;
-	__atomic_fetch_and(addr, ~mask, memorder);
+	addr32 += bit >> 5;
+	__atomic_fetch_and(addr32, ~mask, memorder);
 }

 #define clear_bit(b, a) iso_clear_bit((b), (a), __ATOMIC_ACQ_REL)

@@ -105,10 +108,11 @@ void iso_clear_bit(long bit, volatile unsigned long *addr, int memorder)
 static __always_inline
 void iso_change_bit(long bit, volatile unsigned long *addr, int memorder)
 {
-	unsigned long mask = 1UL << (bit & (BITS_PER_LONG - 1));
+	volatile unsigned int *addr32 = (volatile unsigned int *)addr;
+	unsigned int mask = 1U << (bit & (32 - 1));

-	addr += bit >> _BITOPS_LONG_SHIFT;
-	__atomic_fetch_xor(addr, mask, memorder);
+	addr32 += bit >> 5;
+	__atomic_fetch_xor(addr32, mask, memorder);
 }

 #define change_bit(b, a) iso_change_bit((b), (a), __ATOMIC_ACQ_REL)

@@ -124,11 +128,12 @@ void iso_change_bit(long bit, volatile unsigned long *addr, int memorder)
 static __always_inline
 bool iso_test_and_set_bit(long bit, volatile unsigned long *addr, int memorder)
 {
-	unsigned long mask = 1UL << (bit & (BITS_PER_LONG - 1));
-	unsigned long old;
+	volatile unsigned int *addr32 = (volatile unsigned int *)addr;
+	unsigned int mask = 1U << (bit & (32 - 1));
+	unsigned int old;

-	addr += bit >> _BITOPS_LONG_SHIFT;
-	old = __atomic_fetch_or(addr, mask, memorder);
+	addr32 += bit >> 5;
+	old = __atomic_fetch_or(addr32, mask, memorder);
 	return old & mask;
 }

@@ -146,11 +151,12 @@ bool iso_test_and_set_bit(long bit, volatile unsigned long *addr, int memorder)
 static __always_inline
 bool iso_test_and_clear_bit(long bit, volatile unsigned long *addr, int memorder)
 {
-	unsigned long mask = 1UL << (bit & (BITS_PER_LONG - 1));
-	unsigned long old;
+	volatile unsigned int *addr32 = (volatile unsigned int *)addr;
+	unsigned int mask = 1U << (bit & (32 - 1));
+	unsigned int old;

-	addr += bit >> _BITOPS_LONG_SHIFT;
-	old = __atomic_fetch_and(addr, ~mask, memorder);
+	addr32 += bit >> 5;
+	old = __atomic_fetch_and(addr32, ~mask, memorder);
 	return old & mask;
 }

@@ -168,11 +174,12 @@ bool iso_test_and_clear_bit(long
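The 32-bit internal addressing this patch adopts (a `bit >> 5` word index plus a mask built from the low five bits) can be sketched in userspace as follows. The name iso32_test_and_change_bit() is illustrative, not the patch's.

```c
#include <stdbool.h>

/* Sketch of the 32-bit-internal form: index unsigned ints rather than
 * unsigned longs, so on x86 the compiler can emit BTCL/BTSL/BTRL (or
 * XORL/ORL/ANDL) instead of the REX-prefixed 64-bit forms. */
static inline bool iso32_test_and_change_bit(long bit,
					     volatile unsigned long *addr)
{
	volatile unsigned int *addr32 = (volatile unsigned int *)addr;
	unsigned int mask = 1U << (bit & (32 - 1));
	unsigned int old;

	addr32 += bit >> 5;	/* 32-bit word containing the bit */
	old = __atomic_fetch_xor(addr32, mask, __ATOMIC_ACQ_REL);
	return old & mask;
}
```

One subtlety worth keeping in mind: which 32-bit half of a long a given bit lands in now depends on endianness, so mixing these with long-based accessors is only safe on little-endian layouts like x86.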
[RFC PATCH 14/15] x86: Make the mutex implementation use ISO atomic ops
Make the x86 mutex implementation use ISO atomic ops rather than inline
assembly.

Signed-off-by: David Howells
---
 arch/x86/include/asm/mutex.h     |  6 +--
 arch/x86/include/asm/mutex_iso.h | 73 ++
 2 files changed, 74 insertions(+), 5 deletions(-)
 create mode 100644 arch/x86/include/asm/mutex_iso.h

diff --git a/arch/x86/include/asm/mutex.h b/arch/x86/include/asm/mutex.h
index 7d3a48275394..831bc5a73886 100644
--- a/arch/x86/include/asm/mutex.h
+++ b/arch/x86/include/asm/mutex.h
@@ -1,5 +1 @@
-#ifdef CONFIG_X86_32
-# include
-#else
-# include
-#endif
+#include

diff --git a/arch/x86/include/asm/mutex_iso.h b/arch/x86/include/asm/mutex_iso.h
new file mode 100644
index ..ffdbdb411989
--- /dev/null
+++ b/arch/x86/include/asm/mutex_iso.h
@@ -0,0 +1,73 @@
+/* ISO atomics based mutex
+ *
+ * Copyright (C) 2016 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowe...@redhat.com)
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public Licence
+ * as published by the Free Software Foundation; either version
+ * 2 of the Licence, or (at your option) any later version.
+ */
+#ifndef _ASM_X86_MUTEX_ISO_H
+#define _ASM_X86_MUTEX_ISO_H
+
+/**
+ * __mutex_fastpath_lock - decrement and call function if negative
+ * @v: pointer of type atomic_t
+ * @fail_fn: function to call if the result is negative
+ *
+ * Atomically decrements @v and calls @fail_fn if the result is negative.
+ */
+static inline void __mutex_fastpath_lock(atomic_t *v,
+					 void (*fail_fn)(atomic_t *))
+{
+	if (atomic_dec_return_acquire(v) < 0)
+		fail_fn(v);
+}
+
+/**
+ * __mutex_fastpath_lock_retval - try to take the lock by moving the count
+ * from 1 to a 0 value
+ * @v: pointer of type atomic_t
+ *
+ * Change the count from 1 to a value lower than 1. This function returns 0
+ * if the fastpath succeeds, or -1 otherwise.
+ */
+static inline int __mutex_fastpath_lock_retval(atomic_t *v)
+{
+	return unlikely(atomic_dec_return(v) < 0) ? -1 : 0;
+}
+
+/**
+ * __mutex_fastpath_unlock - increment and call function if nonpositive
+ * @v: pointer of type atomic_t
+ * @fail_fn: function to call if the result is nonpositive
+ *
+ * Atomically increments @v and calls @fail_fn if the result is nonpositive.
+ */
+static inline void __mutex_fastpath_unlock(atomic_t *v,
+					   void (*fail_fn)(atomic_t *))
+{
+	if (atomic_inc_return_release(v) <= 0)
+		fail_fn(v);
+}
+
+#define __mutex_slowpath_needs_to_unlock() 1
+
+/**
+ * __mutex_fastpath_trylock - try to acquire the mutex, without waiting
+ * @v: pointer of type atomic_t
+ * @fail_fn: fallback function
+ *
+ * Change the count from 1 to 0 and return true (success), or return false
+ * (failure) if it wasn't 1 originally. [the fallback function is never used on
+ * x86_64, because all x86_64 CPUs have a CMPXCHG instruction.]
+ */
+static inline bool __mutex_fastpath_trylock(atomic_t *v,
+					    int (*fail_fn)(atomic_t *))
+{
+	return likely(atomic_try_cmpxchg(v, 1, 0));
+}
+
+#endif /* _ASM_X86_MUTEX_ISO_H */
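The counter protocol these fastpath helpers implement (1 = unlocked, 0 = locked, negative = locked with waiters) can be modelled in plain C. Here atomic_dec_return_acquire()/atomic_inc_return_release() are stood in for by the corresponding GCC builtins, and all names are illustrative, not kernel API.

```c
/* Toy model of the mutex fastpath counter.  The "slowpath" here just
 * counts invocations; the real kernel slowpath blocks or wakes waiters. */
static int toy_slowpath_calls;

static void toy_fail_fn(int *v)
{
	(void)v;
	toy_slowpath_calls++;
}

/* lock: 1 -> 0 is the uncontended fast path; anything below 0 means
 * someone already holds the lock, so fall back to the slowpath. */
static void toy_mutex_lock(int *v)
{
	if (__atomic_sub_fetch(v, 1, __ATOMIC_ACQUIRE) < 0)
		toy_fail_fn(v);
}

/* unlock: 0 -> 1 is the fast path; a nonpositive result after the
 * increment means there are waiters to wake. */
static void toy_mutex_unlock(int *v)
{
	if (__atomic_add_fetch(v, 1, __ATOMIC_RELEASE) <= 0)
		toy_fail_fn(v);
}
```

The acquire/release pairing mirrors the patch: taking the mutex must not let later accesses float above the decrement, and releasing it must not let earlier accesses sink below the increment.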
drivers/of: crash on boot
Hi Rhyland,

I'm seeing a crash on boot that seems to have been caused by "drivers/of: Fix
depth when unflattening devicetree":

[   61.145229] ==================================================================
[   61.147588] BUG: KASAN: stack-out-of-bounds in unflatten_dt_nodes+0x11d2/0x1290 at addr 88005b30777c
[   61.150490] Read of size 4 by task swapper/0/1
[   61.151892] page:ea00016cc1c0 count:0 mapcount:0 mapping: (null) index:0x0
[   61.154313] flags: 0x1f8000()
[   61.155460] page dumped because: kasan: bad access detected
[   61.157174] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.6.0-next-20160518-sasha-00032-gab479e0-dirty #3090
[   61.160149]  11000b660e83 8a2fe4e6 88005b3074a0 a3049c42
[   61.162473]  fbfff5c6e404 41b58ab3 adceb660
[   61.164827]  a3049ad0 88005b307480 a16ecb83 88003f501ebc
[   61.167133] Call Trace:
[   61.167904] dump_stack (lib/dump_stack.c:53)
[   61.169541] ? arch_local_irq_restore (./arch/x86/include/asm/paravirt.h:134)
[   61.171470] ? __dump_page (mm/debug.c:62)
[   61.173221] kasan_report_error (include/linux/kasan.h:28 mm/kasan/report.c:211 mm/kasan/report.c:277)
[   61.175067] ? fdt_next_node (lib/../scripts/dtc/libfdt/fdt.c:163)
[   61.176905] ? unflatten_dt_nodes (drivers/of/fdt.c:417)
[   61.178852] __asan_report_load4_noabort (mm/kasan/report.c:318)
[   61.180850] ? unflatten_dt_nodes (drivers/of/fdt.c:417)
[   61.182766] unflatten_dt_nodes (drivers/of/fdt.c:417)
[   61.184697] ? reverse_nodes (drivers/of/fdt.c:396)
[   61.186439] ? set_pageblock_migratetype (mm/page_alloc.c:589)
[   61.188473] ? kernel_poison_pages (mm/page_poison.c:163)
[   61.190344] ? lookup_page_ext (mm/page_ext.c:200)
[   61.192168] ? get_page_from_freelist (mm/page_alloc.c:1747 mm/page_alloc.c:3003)
[   61.194178] ? get_from_free_list (lib/idr.c:79)
[   61.196069] ? ida_get_new_above (lib/idr.c:1002)
[   61.197884] ? idr_get_empty_slot (lib/idr.c:933)
[   61.199802] ? split_free_page (mm/page_alloc.c:2901)
[   61.201598] ? ___might_sleep (kernel/sched/core.c:7520 (discriminator 1))
[   61.203346] ? __alloc_pages_nodemask (mm/page_alloc.c:3804)
[   61.205328] ? __alloc_pages_slowpath (mm/page_alloc.c:3749)
[   61.207386] ? alloc_pages_current (mm/mempolicy.c:2078)
[   61.209281] ? kasan_unpoison_shadow (mm/kasan/kasan.c:59)
[   61.211155] ? kasan_kmalloc_large (mm/kasan/kasan.c:612)
[   61.213015] ? of_fdt_unflatten_tree (drivers/of/fdt.c:513)
[   61.214929] __unflatten_device_tree (drivers/of/fdt.c:488)
[   61.216901] of_fdt_unflatten_tree (drivers/of/fdt.c:541)
[   61.218841] of_unittest (drivers/of/unittest.c:924 drivers/of/unittest.c:1936)
[   61.220556] ? initcall_blacklisted (init/main.c:725)
[   61.222494] ? try_to_run_init_process (init/main.c:708)
[   61.224682] ? of_unittest_overlay (drivers/of/unittest.c:1931)
[   61.227059] ? kobject_add (lib/kobject.c:396)
[   61.229113] ? kobject_add_internal (lib/kobject.c:396)
[   61.231455] ? of_unittest_overlay (drivers/of/unittest.c:1931)
[   61.233865] do_one_initcall (init/main.c:770)
[   61.236005] ? initcall_blacklisted (init/main.c:759)
[   61.238354] ? ___might_sleep (kernel/sched/core.c:7522)
[   61.240504] kernel_init_freeable (init/main.c:834 init/main.c:843 init/main.c:861 init/main.c:1008)
[   61.242798] ? start_kernel (init/main.c:978)
[   61.244919] ? compat_start_thread (arch/x86/kernel/process_64.c:259)
[   61.247174] kernel_init (init/main.c:936)
[   61.249162] ret_from_fork (arch/x86/entry/entry_64.S:390)
[   61.251170] ? rest_init (init/main.c:931)
[   61.253104] Memory state around the buggy address:
[   61.254888]  88005b307600: 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[   61.257551]  88005b307680: 04 f4 f4 f4 f2 f2 f2 f2 04 f4 f4 f4 f2 f2 f2 f2
[   61.260255] >88005b307700: 04 f4 f4 f4 f2 f2 f2 f2 00 f4 f4 f4 f2 f2 f2 f2
[   61.262911]                ^
[   61.265529]  88005b307780: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   61.268218]  88005b307800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   61.270874] ==================================================================
[   61.273558] Disabling lock debugging due to kernel taint
[   61.275648] ==================================================================
[   61.278303] BUG: KASAN: stack-out-of-bounds in unflatten_dt_nodes+0x1236/0x1290 at addr 88005b307898
[   61.281794] Read of size 8 by task swapper/0/1
[   61.283483] page:ea00016cc1c0 count:0 mapcount:0 mapping: (null) index:0x0
[   61.286454] flags: 0x1f8000()
[   61.287817] page dumped because: kasan: bad access detected
[   61.289904] CPU: 0 PID: 1 Comm: swapper/0 Tainted: GB 4.6.0-next-20160518-sasha-00032-gab479e0-dirty #3090
[   61.293896]  11000b660e83 8a2fe4e6 88005b3074a0 fff
Re: [PATCH v2 omap 3/6] arm: Add _rcuidle tracepoints to allow clk_core_disable() use from idle
On 16/05/2016 20:50, Paul E. McKenney wrote: >> stack backtrace: >> CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.6.0-rc5-next-20160426+ #1114 >> Hardware name: Generic OMAP36xx (Flattened Device Tree) >> [] (unwind_backtrace) from [] (show_stack+0x10/0x14) >> [] (show_stack) from [] (dump_stack+0xb0/0xe4) >> [] (dump_stack) from [] (clk_core_disable+0x17c/0x348) >> [] (clk_core_disable) from [] (clk_disable+0x24/0x30) >> [] (clk_disable) from [] (_disable_clocks+0x18/0x7c) >> [] (_disable_clocks) from [] (_idle+0x12c/0x230) >> [] (_idle) from [] (omap_hwmod_idle+0x24/0x44) >> [] (omap_hwmod_idle) from [] (omap_device_idle+0x3c/0x90) >> [] (omap_device_idle) from [] (__rpm_callback+0x2c/0x60) >> [] (__rpm_callback) from [] (rpm_callback+0x20/0x80) >> [] (rpm_callback) from [] (rpm_suspend+0x100/0x768) >> [] (rpm_suspend) from [] (__pm_runtime_suspend+0x64/0x84) >> [] (__pm_runtime_suspend) from [] >> (omap2_gpio_prepare_for_idle+0x5 >> c/0x70) >> [] (omap2_gpio_prepare_for_idle) from [] >> (omap_sram_idle+0x140/0x2 >> 44) Wouldn't the stack trace look better without the word wrap? :-) Regards.
Re: [PATCH] usb: echi-hcd: Add register access check in shutdown
On 18/05/16 15:56, Alan Stern wrote: On Wed, 18 May 2016, Srinivas Kandagatla wrote: This patch adds a check in ehci_shutdown(), to make sure that the register access is available before accessing registers. The use case is simple, for boards like DB410c where the usb host or device functionality is decided based on the micro-usb cable presence. If the board boots up with micro-usb connected and the host driver is probed, but the ehci_setup() has not been done yet, then a system shutdown would trigger below NULL pointer exception without this patch. How can that happen? While the host driver is probed, the probing thread holds the device lock. But the system shutdown routine acquires the device lock before invoking the ->shutdown callback. Therefore the two things cannot happen concurrently. No, I did not mean them happening concurrently, I mean that the host driver is up, however ehci_setup() is not done yet. Will change the comments if it's misleading the reader. Unable to handle kernel NULL pointer dereference at virtual address 0008 ...

--- a/drivers/usb/host/ehci-hcd.c
+++ b/drivers/usb/host/ehci-hcd.c
@@ -368,6 +368,9 @@ static void ehci_shutdown(struct usb_hcd *hcd)
 {
 	struct ehci_hcd	*ehci = hcd_to_ehci(hcd);
 
+	if (!HCD_HW_ACCESSIBLE(hcd))
+		return;
+
 	spin_lock_irq(&ehci->lock);
 	ehci->shutdown = true;
 	ehci->rh_state = EHCI_RH_STOPPING;

This doesn't seem like the right place. What you really should do is skip calling ehci_silence_controller() if the hardware isn't accessible. That's where the hardware gets touched, not in ehci_shutdown(). Yep, that should work as well. Will send a v2 patch. thanks, srini Alan Stern
Re: [PATCH v3] net: sock: move ->sk_shutdown out of bitfields.
On Wed, 2016-05-18 at 17:36 +0300, Andrey Ryabinin wrote: > ->sk_shutdown bits share one bitfield with some other bits in sock struct, > such as ->sk_no_check_[r,t]x, ->sk_userlocks ... > sock_setsockopt() may write to these bits, while holding the socket lock. > > In case of AF_UNIX sockets, we change ->sk_shutdown bits while holding only > unix_state_lock(). So concurrent setsockopt() and shutdown() may lead > to corrupting these bits. > > Fix this by moving ->sk_shutdown bits out of bitfield into a separate byte. > This will not change the 'struct sock' size since ->sk_shutdown moved into > previously unused 16-bit hole. > > Signed-off-by: Andrey Ryabinin > Suggested-by: Hannes Frederic Sowa > --- > > Changes since v1: > - move sk_shutdown into a separate byte instead of locking > AF_UNIX sockets. > > Changes since v2: > - reorder bitfields, so that sk_type/sk_protocol still fit in > separate 2-bytes/byte. > I suggested to use an explicit padding, to clearly show that two bits are available there. You also could add a comment to remind the rules. Something like:

diff --git a/include/net/sock.h b/include/net/sock.h
index c9c8b19df27c558354687119db60c0716909ea3f..4e239a7ef4f67b1988fe119d81566d1465a2b596 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -382,8 +382,10 @@ struct sock {
 	atomic_t		sk_omem_alloc;
 	int			sk_sndbuf;
 	struct sk_buff_head	sk_write_queue;
+
+	/* because of non atomicity rules, all changes are protected by socket lock */
 	kmemcheck_bitfield_begin(flags);
-	unsigned int		sk_shutdown : 2,
+	unsigned int		sk_padding : 2,
 				sk_no_check_tx : 1,
 				sk_no_check_rx : 1,
 				sk_userlocks : 4,
@@ -391,6 +393,7 @@ struct sock {
 				sk_type : 16;
 #define SK_PROTOCOL_MAX U8_MAX
 	kmemcheck_bitfield_end(flags);
+
 	int			sk_wmem_queued;
 	gfp_t			sk_allocation;
 	u32			sk_pacing_rate; /* bytes per second */
@@ -418,6 +421,7 @@ struct sock {
 	struct timer_list	sk_timer;
 	ktime_t			sk_stamp;
 	u16			sk_tsflags;
+	u8			sk_shutdown;
 	u32			sk_tskey;
 	struct socket		*sk_socket;
 	void			*sk_user_data;
Re: [RFC PATCH 01/15] cmpxchg_local() is not signed-value safe, so fix generic atomics
On Wednesday 18 May 2016 16:10:45 David Howells wrote: > cmpxchg_local() is not signed-value safe because on a 64-bit machine signed > int arguments to it may be sign-extended to signed long _before_ being cast > to unsigned long. This potentially causes comparisons to fail when dealing > with negative values. > > Fix the generic atomic functions that are implemented in terms of cmpxchg() > to cast their arguments to unsigned int before calling cmpxchg(). > > Signed-off-by: David Howells > Isn't the problem you describe something that cmpxchg() could prevent instead? Arnd
Re: [PATCH] bcache: bch_writeback_thread() is not freezable
On Mon, 2 May 2016, Jiri Kosina wrote: > > From: Jiri Kosina > > > > bch_writeback_thread() is calling try_to_freeze(), but that's just an > > expensive no-op given the fact that the thread is not marked freezable. > > > > I/O helper kthreads, exactly such as the bcache writeback thread, actually > > shouldn't be freezable, because they are potentially necessary for > > finalizing the image write-out. > > > > Signed-off-by: Jiri Kosina > > Let's send a friendly ping on these three patches after two weeks... After a month, I think, it's time for another ping. -- Jiri Kosina SUSE Labs
Re: [PATCH] mm: add config option to select the initial overcommit mode
Hi Michal, On 05/17/2016 10:16 PM, Michal Hocko wrote: > On Tue 17-05-16 18:16:58, Sebastian Frias wrote: > [...] >> From reading Documentation/cgroup-v1/memory.txt (and from a few >> replies here talking about cgroups), it looks like the OOM-killer is >> still being actively discussed, well, there's also "cgroup-v2". >> My understanding is that cgroup's memory control will pause processes >> in a given cgroup until the OOM situation is solved for that cgroup, >> right? > > It will be blocked waiting either for some external action which would > result in OOM condition going away or any other charge release. You have > to configure memcg for that though. The default behavior is to invoke > the same OOM killer algorithm which is just reduced to tasks from the > memcg (hierarchy). Ok, I see, thanks! > >> If that is right, it means that there is indeed a way to deal >> with an OOM situation (stack expansion, COW failure, 'memory hog', >> etc.) in a better way than the OOM-killer, right? >> In which case, do you guys know if there is a way to make the whole >> system behave as if it was inside a cgroup? (*) > > No it is not. You have to realize that the system wide and the memcg OOM > situations are quite different. There is usually quite some memory free > when you hit the memcg OOM so the administrator can actually do > something. Ok, so it works like the 5% reserved for 'root' on filesystems? > The global OOM means there is _no_ memory at all. Many kernel > operations will need some memory to do something useful. Let's say you > would want to do an educated guess about who to kill - most proc APIs > will need to allocate. And this is just a beginning. Things are getting > really nasty when you get deeper and deeper. E.g. the OOM killer has to > give the oom victim access to memory reserves so that the task can exit > because that path needs to allocate as well. Really? 
I would have thought that once SIGKILL is sent, the victim process is not expected to do anything else and thus its memory could be claimed immediately. Or is the OOM-killer more of an OOM-terminator? (i.e.: sends SIGTERM) > So even if you wanted to > give userspace some chance to resolve the OOM situation you would either > need some special API to tell "this process is really special and it can > access memory reserves and it has an absolute priority etc." or have a > in kernel fallback to do something or your system could lockup really > easily. > I see, so basically at least two cgroups would be needed, one reserved for handling the OOM situation through some API and another for the "rest of the system". Basically just like the 5% reserved for 'root' on filesystems. Do you think that would work? Best regards, Sebastian
Re: [PATCH] mm: add config option to select the initial overcommit mode
Hi Austin, On 05/17/2016 07:29 PM, Austin S. Hemmelgarn wrote: >> I see the difference, your answer seems a bit like the one from Austin, >> basically: >> - killing a process is a sort of kernel protection attempting to deal >> "automatically" with some situation, like deciding what is a 'memory hog', >> or what is 'in infinite loop', "usually" in a correct way. >> It seems there are people who think it's better to avoid having to take such >> decisions and/or they should be decided by the user, because "usually" != >> "always". > FWIW, it's really easy to see what's using a lot of memory, it's impossible > to tell if something is stuck in an infinite loop without looking deep into > the process state and possibly even at the source code (and even then it can > be almost impossible to be certain). This is why we have an OOM-killer, and > not an infinite-loop-killer. > > Again I reiterate, if a system is properly provisioned (that is, if you have > put in enough RAM and possibly swap space to do what you want to use it for), > the only reason the OOM-killer should be invoked is due to a bug. Are you sure that's the only possible reason? I mean, what if somebody keeps opening tabs on Firefox? If malloc() returned NULL maybe Firefox could say "hey, you have too many tabs open, please close some to free memory". > The non-default overcommit options still have the same issues they just > change how and when they happen (overcommit=never will fire sooner, > overcommit=always will fire later), and also can impact memory allocation > performance (I have numbers somewhere that I can't find right now that > demonstrated that overcommit=never gave more deterministic and (on average) > marginally better malloc() performance, and simple logic would suggest that > overcommit=always would make malloc() perform better too). >> And people who see that as a nice but complex thing to do. 
>> In this thread we've tried to explain why this heuristic (and/or OOM-killer) >> is/was needed and/or its history, which has been very enlightening by the >> way. >> >> From reading Documentation/cgroup-v1/memory.txt (and from a few replies here >> talking about cgroups), it looks like the OOM-killer is still being actively >> discussed, well, there's also "cgroup-v2". >> My understanding is that cgroup's memory control will pause processes in a >> given cgroup until the OOM situation is solved for that cgroup, right? >> If that is right, it means that there is indeed a way to deal with an OOM >> situation (stack expansion, COW failure, 'memory hog', etc.) in a better way >> than the OOM-killer, right? >> In which case, do you guys know if there is a way to make the whole system >> behave as if it was inside a cgroup? (*) > No, not with the process freeze behavior, because getting the group running > again requires input from an external part of the system, which by definition > doesn't exist if the group is the entire system; Do you mean that it pauses all processes in the cgroup? I thought it would pause on a case-by-case basis, like first process to reach the limit gets paused, and so on. Honestly I thought it would work a bit like the filesystems, where 'root' usually has 5% reserved, so that a process (or processes) filling the disk does not disrupt the system to the point of preventing 'root' from performing administrative actions. That makes me think, why is disk space handled differently than memory in this case? I mean, why is disk space exhaustion handled differently than memory exhaustion? We could imagine that both resources are required for proper system and process operation, so if OOM-killer is there to attempt to keep the system working at all costs (even if that means sacrificing processes), why isn't there an OOFS-killer (out-of-free-space killer)? 
> and, because our GUI isn't built into the kernel, we can't pause things and pop up a little dialog asking the user what to do to resolve the issue. :-) Yeah, I was thinking that could be handled with the cgroups' notification system + the reserved space (like on filesystems). Maybe I was too optimistic (naive or just plain ignorant) about this. Best regards, Sebastian
Re: [PATCH] mm: add config option to select the initial overcommit mode
Hi Austin,

On 05/17/2016 07:29 PM, Austin S. Hemmelgarn wrote:
>> I see the difference, your answer seems a bit like the one from Austin,
>> basically:
>> - killing a process is a sort of kernel protection attempting to deal
>> "automatically" with some situation, like deciding what is a 'memory hog',
>> or what is 'in infinite loop', "usually" in a correct way.
>> It seems there are people who think it's better to avoid having to take
>> such decisions and/or they should be decided by the user, because
>> "usually" != "always".
> FWIW, it's really easy to see what's using a lot of memory, but it's
> impossible to tell if something is stuck in an infinite loop without
> looking deep into the process state and possibly even at the source code
> (and even then it can be almost impossible to be certain). This is why we
> have an OOM-killer, and not an infinite-loop-killer.
>
> Again I reiterate, if a system is properly provisioned (that is, if you
> have put in enough RAM and possibly swap space to do what you want to use
> it for), the only reason the OOM-killer should be invoked is due to a bug.

Are you sure that's the only possible reason? I mean, what if somebody keeps
opening tabs in Firefox? If malloc() returned NULL, maybe Firefox could say
"hey, you have too many tabs open, please close some to free memory".

> The non-default overcommit options still have the same issues; they just
> change how and when they happen (overcommit=never will fire sooner,
> overcommit=always will fire later), and they can also impact memory
> allocation performance (I have numbers somewhere that I can't find right
> now that demonstrated that overcommit=never gave more deterministic and
> (on average) marginally better malloc() performance, and simple logic
> would suggest that overcommit=always would make malloc() perform better
> too).

>> And people who see that as a nice but complex thing to do.
>> In this thread we've tried to explain why this heuristic (and/or
>> OOM-killer) is/was needed and/or its history, which has been very
>> enlightening by the way.
>>
>> From reading Documentation/cgroup-v1/memory.txt (and from a few replies
>> here talking about cgroups), it looks like the OOM-killer is still being
>> actively discussed; well, there's also "cgroup-v2".
>> My understanding is that cgroup's memory control will pause processes in
>> a given cgroup until the OOM situation is solved for that cgroup, right?
>> If that is right, it means that there is indeed a way to deal with an OOM
>> situation (stack expansion, COW failure, 'memory hog', etc.) in a better
>> way than the OOM-killer, right?
>> In which case, do you guys know if there is a way to make the whole
>> system behave as if it was inside a cgroup? (*)
> No, not with the process freeze behavior, because getting the group
> running again requires input from an external part of the system, which by
> definition doesn't exist if the group is the entire system.

Do you mean that it pauses all processes in the cgroup? I thought it would
pause on a case-by-case basis, like the first process to reach the limit
gets paused, and so on.

Honestly, I thought it would work a bit like the filesystems, where 'root'
usually has 5% reserved, so that a process (or processes) filling the disk
does not disrupt the system to the point of preventing 'root' from
performing administrative actions.

That makes me think: why is disk space exhaustion handled differently than
memory exhaustion? We could imagine that both resources are required for
proper system and process operation, so if the OOM-killer is there to
attempt to keep the system working at all costs (even if that means
sacrificing processes), why isn't there an OOFS-killer (out-of-free-space
killer)?

> and, because our GUI isn't built into the kernel, we can't pause things
> and pop up a little dialog asking the user what to do to resolve the
> issue. :-)

Yeah, I was thinking that could be handled with the cgroups' notification
system + the reserved space (like on filesystems).

Maybe I was too optimistic (naive, or just plain ignorant) about this.

Best regards,
Sebastian
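The point about malloc() returning NULL can be made concrete with a small
userspace sketch (the function name here is invented for illustration).
Under the default heuristic overcommit a large request may "succeed" and
only fault later, triggering the OOM-killer; with vm.overcommit_memory=2
the failure surfaces at the call site, where the program can react, e.g. by
shedding caches:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical application-side reaction to allocation failure.
 * With strict accounting, malloc() can actually return NULL here
 * instead of the process being OOM-killed at some later page fault.
 */
static char *grow_cache(size_t n)
{
	char *p = malloc(n);

	if (!p)
		fprintf(stderr,
			"cannot grow cache by %zu bytes; dropping old entries instead\n",
			n);
	return p;
}
```

Even without overcommit=never, an absurdly large request (larger than the
address space) fails immediately, so the error path above is exercisable in
a plain test.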
Re: linux-next: Tree for May 18 (rcu/tree)
On 05/17/16 21:30, Stephen Rothwell wrote: > Hi all, > > Please do not add any v4.8 destined material to your linux-next included > branches until after v4.7-rc1 has been released. > > Changes since 20160517: > on i386: In file included from ../kernel/rcu/tree.c:4209:0: ../kernel/rcu/tree_plugin.h: In function 'rcu_boost_kthread_setaffinity': ../kernel/rcu/tree_plugin.h:1168:2: error: implicit declaration of function 'for_each_leaf_node_cpu_bit' [-Werror=implicit-function-declaration] for_each_leaf_node_cpu_bit(rnp, cpu, bit) ^ ../kernel/rcu/tree_plugin.h:1169:3: error: expected ';' before 'if' if ((mask & bit) && cpu != outgoingcpu) ^ ../kernel/rcu/tree_plugin.h:1159:16: warning: unused variable 'mask' [-Wunused-variable] unsigned long mask = rcu_rnp_online_cpus(rnp); ^ Full randconfig file is attached. -- ~Randy # # Automatically generated file; DO NOT EDIT. # Linux/i386 4.6.0 Kernel Configuration # # CONFIG_64BIT is not set CONFIG_X86_32=y CONFIG_X86=y CONFIG_INSTRUCTION_DECODER=y CONFIG_OUTPUT_FORMAT="elf32-i386" CONFIG_ARCH_DEFCONFIG="arch/x86/configs/i386_defconfig" CONFIG_LOCKDEP_SUPPORT=y CONFIG_STACKTRACE_SUPPORT=y CONFIG_MMU=y CONFIG_ARCH_MMAP_RND_BITS_MIN=8 CONFIG_ARCH_MMAP_RND_BITS_MAX=16 CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MIN=8 CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MAX=16 CONFIG_NEED_DMA_MAP_STATE=y CONFIG_NEED_SG_DMA_LENGTH=y CONFIG_GENERIC_ISA_DMA=y CONFIG_GENERIC_BUG=y CONFIG_GENERIC_HWEIGHT=y CONFIG_ARCH_MAY_HAVE_PC_FDC=y CONFIG_RWSEM_XCHGADD_ALGORITHM=y CONFIG_GENERIC_CALIBRATE_DELAY=y CONFIG_ARCH_HAS_CPU_RELAX=y CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y CONFIG_HAVE_SETUP_PER_CPU_AREA=y CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK=y CONFIG_ARCH_HIBERNATION_POSSIBLE=y CONFIG_ARCH_SUSPEND_POSSIBLE=y CONFIG_ARCH_WANT_HUGE_PMD_SHARE=y CONFIG_ARCH_WANT_GENERAL_HUGETLB=y CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y CONFIG_X86_32_SMP=y CONFIG_X86_32_LAZY_GS=y 
CONFIG_ARCH_HWEIGHT_CFLAGS="-fcall-saved-ecx -fcall-saved-edx" CONFIG_ARCH_SUPPORTS_UPROBES=y CONFIG_FIX_EARLYCON_MEM=y CONFIG_DEBUG_RODATA=y CONFIG_PGTABLE_LEVELS=3 CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config" CONFIG_CONSTRUCTORS=y CONFIG_IRQ_WORK=y CONFIG_BUILDTIME_EXTABLE_SORT=y # # General setup # CONFIG_INIT_ENV_ARG_LIMIT=32 CONFIG_CROSS_COMPILE="" # CONFIG_COMPILE_TEST is not set CONFIG_LOCALVERSION="" # CONFIG_LOCALVERSION_AUTO is not set CONFIG_HAVE_KERNEL_GZIP=y CONFIG_HAVE_KERNEL_BZIP2=y CONFIG_HAVE_KERNEL_LZMA=y CONFIG_HAVE_KERNEL_XZ=y CONFIG_HAVE_KERNEL_LZO=y CONFIG_HAVE_KERNEL_LZ4=y CONFIG_KERNEL_GZIP=y # CONFIG_KERNEL_BZIP2 is not set # CONFIG_KERNEL_LZMA is not set # CONFIG_KERNEL_XZ is not set # CONFIG_KERNEL_LZO is not set # CONFIG_KERNEL_LZ4 is not set CONFIG_DEFAULT_HOSTNAME="(none)" CONFIG_SYSVIPC=y # CONFIG_CROSS_MEMORY_ATTACH is not set # CONFIG_FHANDLE is not set CONFIG_USELIB=y CONFIG_HAVE_ARCH_AUDITSYSCALL=y # # IRQ subsystem # CONFIG_GENERIC_IRQ_PROBE=y CONFIG_GENERIC_IRQ_SHOW=y CONFIG_GENERIC_PENDING_IRQ=y CONFIG_GENERIC_IRQ_CHIP=y CONFIG_IRQ_DOMAIN=y CONFIG_IRQ_DOMAIN_HIERARCHY=y CONFIG_IRQ_DOMAIN_DEBUG=y CONFIG_IRQ_FORCED_THREADING=y CONFIG_SPARSE_IRQ=y CONFIG_CLOCKSOURCE_WATCHDOG=y CONFIG_ARCH_CLOCKSOURCE_DATA=y CONFIG_CLOCKSOURCE_VALIDATE_LAST_CYCLE=y CONFIG_GENERIC_TIME_VSYSCALL=y CONFIG_GENERIC_CLOCKEVENTS=y CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=y CONFIG_GENERIC_CMOS_UPDATE=y # # Timers subsystem # CONFIG_TICK_ONESHOT=y CONFIG_NO_HZ_COMMON=y # CONFIG_HZ_PERIODIC is not set CONFIG_NO_HZ_IDLE=y CONFIG_NO_HZ=y CONFIG_HIGH_RES_TIMERS=y # # CPU/Task time and stats accounting # # CONFIG_TICK_CPU_ACCOUNTING is not set CONFIG_IRQ_TIME_ACCOUNTING=y CONFIG_BSD_PROCESS_ACCT=y # CONFIG_BSD_PROCESS_ACCT_V3 is not set # # RCU Subsystem # CONFIG_PREEMPT_RCU=y CONFIG_RCU_EXPERT=y CONFIG_SRCU=y # CONFIG_TASKS_RCU is not set CONFIG_RCU_STALL_COMMON=y CONFIG_RCU_FANOUT=32 CONFIG_RCU_FANOUT_LEAF=16 
# CONFIG_RCU_FAST_NO_HZ is not set CONFIG_TREE_RCU_TRACE=y CONFIG_RCU_BOOST=y CONFIG_RCU_KTHREAD_PRIO=1 CONFIG_RCU_BOOST_DELAY=500 # CONFIG_RCU_NOCB_CPU is not set # CONFIG_RCU_EXPEDITE_BOOT is not set CONFIG_BUILD_BIN2C=y CONFIG_IKCONFIG=y CONFIG_LOG_BUF_SHIFT=17 CONFIG_LOG_CPU_MAX_BUF_SHIFT=12 CONFIG_NMI_LOG_BUF_SHIFT=13 CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH=y CONFIG_CGROUPS=y CONFIG_PAGE_COUNTER=y CONFIG_MEMCG=y CONFIG_CGROUP_SCHED=y CONFIG_FAIR_GROUP_SCHED=y CONFIG_CFS_BANDWIDTH=y # CONFIG_RT_GROUP_SCHED is not set # CONFIG_CGROUP_PIDS is not set CONFIG_CGROUP_FREEZER=y # CONFIG_CPUSETS is not set # CONFIG_CGROUP_DEVICE is not set # CONFIG_CGROUP_CPUACCT is not set # CONFIG_CGROUP_PERF is not set # CONFIG_CGROUP_DEBUG is not set CONFIG_CHECKPOINT_RESTORE=y # CONFIG_NAMESPACES is not set CONFIG_SCHED_AUTOGROUP=y # CONFIG_SYSFS_DEPRECATED is not set CONFIG_RELAY=y CONFIG_BLK_DEV_INITRD=y CONFIG_INITRAMFS_SOURCE="" CONFIG_RD_GZIP=y #
Re: [RFC 06/13] mm, thp: remove __GFP_NORETRY from khugepaged and madvised allocations
On Wed 18-05-16 13:59:53, Vlastimil Babka wrote:
> On 05/13/2016 02:05 PM, Michal Hocko wrote:
> > On Fri 13-05-16 10:23:31, Vlastimil Babka wrote:
> >> On 05/12/2016 06:20 PM, Michal Hocko wrote:
> >>> On Tue 10-05-16 09:35:56, Vlastimil Babka wrote:
> >>> [...]
> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> index 570383a41853..0cb09714d960 100644
> --- a/include/linux/gfp.h
> +++ b/include/linux/gfp.h
> @@ -256,8 +256,7 @@ struct vm_area_struct;
>  #define GFP_HIGHUSER	(GFP_USER | __GFP_HIGHMEM)
>  #define GFP_HIGHUSER_MOVABLE	(GFP_HIGHUSER | __GFP_MOVABLE)
>  #define GFP_TRANSHUGE	((GFP_HIGHUSER_MOVABLE | __GFP_COMP | \
> -			 __GFP_NOMEMALLOC | __GFP_NORETRY | __GFP_NOWARN) & \
> -			 ~__GFP_RECLAIM)
> +			 __GFP_NOMEMALLOC | __GFP_NOWARN) & ~__GFP_RECLAIM)
> >>>
> >>> I am not sure this is the right thing to do. I think we should keep
> >>> __GFP_NORETRY and clear it where we want a stronger semantic. This is
> >>> just too subtle that all callsites are doing the right thing.
> >>
> >> That would complicate alloc_hugepage_direct_gfpmask() a bit, but if you
> >> think it's worth it, I can turn the default around, OK.
> >
> > Hmm, on the other hand it is true that GFP_TRANSHUGE is clearing both
> > reclaim flags by default and then overwrites that. This is just too
> > ugly. Can we make GFP_TRANSHUGE only define flags we care about and
> > then tweak those that should go away at the callsites which matter now
> > that we do not rely on is_thp_gfp_mask?
>
> So the following patch attempts what you suggest, if I understand you
> correctly. GFP_TRANSHUGE includes all possible flags, and then they are
> removed as needed. I don't really think it helps code readability
> though.

Yeah, it is ugly as _hell_. I do not think this deserves too much time to
discuss as the flag is mostly internal, but one last proposal would be to
define the different THP allocation contexts explicitly.

Some callers would still need some additional meddling, but maybe it would
be slightly better to read. Dunno. Anyway, if you think this is not really
an improvement then I won't insist on any change to your original patch.

---
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 570383a41853..e7926b466107 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -255,9 +255,14 @@ struct vm_area_struct;
 #define GFP_DMA32	__GFP_DMA32
 #define GFP_HIGHUSER	(GFP_USER | __GFP_HIGHMEM)
 #define GFP_HIGHUSER_MOVABLE	(GFP_HIGHUSER | __GFP_MOVABLE)
-#define GFP_TRANSHUGE	((GFP_HIGHUSER_MOVABLE | __GFP_COMP | \
+
+/* Optimistic or latency sensitive THP allocation - page fault path */
+#define GFP_TRANSHUGE_LIGHT	((GFP_HIGHUSER_MOVABLE | __GFP_COMP | \
 			 __GFP_NOMEMALLOC | __GFP_NORETRY | __GFP_NOWARN) & \
 			 ~__GFP_RECLAIM)
+/* More serious THP allocation request - kcompactd */
+#define GFP_TRANSHUGE	(GFP_TRANSHUGE_LIGHT | __GFP_DIRECT_RECLAIM) & \
+			~__GFP_NORETRY

 /* Convert GFP flags to their corresponding migrate type */
 #define GFP_MOVABLE_MASK (__GFP_RECLAIMABLE|__GFP_MOVABLE)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 1a4d4c807d92..937b89c6c0aa 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -216,7 +216,7 @@ struct page *get_huge_zero_page(void)
 	if (likely(atomic_inc_not_zero(&huge_zero_refcount)))
 		return READ_ONCE(huge_zero_page);

-	zero_page = alloc_pages((GFP_TRANSHUGE | __GFP_ZERO) & ~__GFP_MOVABLE,
+	zero_page = alloc_pages((GFP_TRANSHUGE_LIGHT | __GFP_ZERO) & ~__GFP_MOVABLE,
 			HPAGE_PMD_ORDER);
 	if (!zero_page) {
 		count_vm_event(THP_ZERO_PAGE_ALLOC_FAILED);
@@ -888,23 +888,31 @@ static int __do_huge_pmd_anonymous_page(struct mm_struct *mm,
  */
 static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma)
 {
-	gfp_t reclaim_flags = 0;
+	gfp_t gfp_mask = GFP_TRANSHUGE_LIGHT;

 	if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_REQ_MADV_FLAG,
 		     &transparent_hugepage_flags) &&
 	    (vma->vm_flags & VM_HUGEPAGE))
-		reclaim_flags = __GFP_DIRECT_RECLAIM;
+		gfp_mask = GFP_TRANSHUGE;
 	else if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_FLAG,
 		     &transparent_hugepage_flags))
-		reclaim_flags = __GFP_KSWAPD_RECLAIM;
-	else if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG,
-		     &transparent_hugepage_flags))
-		reclaim_flags = __GFP_DIRECT_RECLAIM;
+		gfp_mask |= __GFP_KSWAPD_RECLAIM;
+	else if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG,
+		     &transparent_hugepage_flags)) {
+		if (vma->vm_flags & VM_HUGEPAGE)
+			gfp_mask = GFP_TRANSHUGE;
+		else
+			gfp_mask = GFP_TRANSHUGE | __GFP_NORETRY;
+	}

-	return GFP_TRANSHUGE | reclaim_flags;
+	return
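The mask plumbing being debated can be sketched outside the kernel. The flag
values below are made-up stand-ins, not the real __GFP_* bits; the point is
how a "light" mask is widened into the heavier one by adding direct reclaim
and clearing NORETRY, and why the macro body is safest when fully
parenthesised so later `|` and `&` combinations bind as intended:

```c
#include <assert.h>

/* Illustrative flag values only -- not the kernel's __GFP_* bits. */
#define F_NORETRY        0x01u
#define F_DIRECT_RECLAIM 0x02u
#define F_KSWAPD_RECLAIM 0x04u
#define F_NOWARN         0x08u

#define THP_LIGHT	(F_NORETRY | F_NOWARN)
/* Outer parentheses keep "& ~F_NORETRY" bound to this expression even
 * when the macro is later combined with other operators. */
#define THP_HEAVY	((THP_LIGHT | F_DIRECT_RECLAIM) & ~F_NORETRY)

/* Mirrors the madvised vs. kswapd branching discussed above. */
static unsigned int pick_mask(int madvised, int kswapd_defrag)
{
	unsigned int mask = THP_LIGHT;

	if (madvised)
		mask = THP_HEAVY;		/* retry hard, reclaim directly */
	else if (kswapd_defrag)
		mask |= F_KSWAPD_RECLAIM;	/* only wake background reclaim */
	return mask;
}
```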
Re: [PATCH] mm: fix duplicate words and typos
On 05/17/16 19:35, Li Peng wrote:
> Signed-off-by: Li Peng
> ---
>  mm/memcontrol.c | 2 +-
>  mm/page_alloc.c | 6 +++---
>  mm/vmscan.c     | 7 +++
>  mm/zswap.c      | 2 +-
>  4 files changed, 8 insertions(+), 9 deletions(-)

> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 142cb61..8ff5a79 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -3267,8 +3267,7 @@ static int balance_pgdat(pg_data_t *pgdat, int order,
> int classzone_idx)
> 			/*
> 			 * There should be no need to raise the scanning
> 			 * priority if enough pages are already being scanned
> -			 * that that high watermark would be met at 100%
> -			 * efficiency.
> +			 * that high watermark would be met at 100% efficiency.

I think that this one wasn't wrong, just confusing. Maybe change it to:

			 * that the high watermark would be met at 100% efficiency.

> 			 */
> 			if (kswapd_shrink_zone(zone, end_zone, &sc))
> 				raise_priority = false;

> diff --git a/mm/zswap.c b/mm/zswap.c
> index de0f119b..6d829d7 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -928,7 +928,7 @@ static int zswap_writeback_entry(struct zpool *pool,
> unsigned long handle)
> 	 * a load may happening concurrently
> 	 * it is safe and okay to not free the entry
> 	 * if we free the entry in the following put
> -	 * it it either okay to return !0
> +	 * it either okay to return !0

That's still confusing. Needs some kind of help.

> 	 */
> fail:
> 	spin_lock(&tree->lock);

--
~Randy
[RFC PATCH 04/15] Convert 32-bit ISO atomics into a template
Convert the 32-bit ISO C++11 atomic implementation into a template that can
then be overloaded with #defines to produce 32-bit, long and 64-bit atomics.
This is then overloaded to produce the 32-bit atomic_t implementation.

This should probably be merged with the previous patch if we want to use the
template approach. The template approach has the advantage that it can
produce all of the different sets of atomics from the same code, but the
disadvantage that the template is much less readable.

Signed-off-by: David Howells
---
 include/asm-generic/iso-atomic-template.h | 572 +
 include/asm-generic/iso-atomic.h          | 387
 2 files changed, 579 insertions(+), 380 deletions(-)
 create mode 100644 include/asm-generic/iso-atomic-template.h

diff --git a/include/asm-generic/iso-atomic-template.h b/include/asm-generic/iso-atomic-template.h
new file mode 100644
index ..76da163ec559
--- /dev/null
+++ b/include/asm-generic/iso-atomic-template.h
@@ -0,0 +1,572 @@
+/* Use ISO C++11 intrinsics to implement atomic ops.
+ *
+ * This file is a template. The #includer needs to #define the following
+ * items:
+ *
+ *	atomic_val		- counter type (eg. int, long, long long)
+ *	atomic_prefix(x)	- prefix (eg. atomic, atomic64, atomic_long)
+ *	__atomic_prefix(x)	- prefix with "__" prepended
+ *
+ * Copyright (C) 2016 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowe...@redhat.com)
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public Licence
+ * as published by the Free Software Foundation; either version
+ * 2 of the Licence, or (at your option) any later version.
+ */
+
+/**
+ * atomic_read - read atomic variable
+ * @v: pointer to atomic variable
+ *
+ * Atomically reads the value of @v.
+ */
+static __always_inline atomic_val atomic_prefix(_read)(const atomic_prefix(_t) *v)
+{
+	return __atomic_load_n(&v->counter, __ATOMIC_RELAXED);
+}
+
+static __always_inline atomic_val atomic_prefix(_read_acquire)(const atomic_prefix(_t) *v)
+{
+	return __atomic_load_n(&v->counter, __ATOMIC_ACQUIRE);
+}
+
+/**
+ * atomic_set - set atomic variable
+ * @v: pointer to atomic variable
+ * @i: required value
+ *
+ * Atomically sets the value of @v to @i.
+ */
+static __always_inline void atomic_prefix(_set)(atomic_prefix(_t) *v, atomic_val i)
+{
+	__atomic_store_n(&v->counter, i, __ATOMIC_RELAXED);
+}
+
+static __always_inline void atomic_prefix(_set_release)(atomic_prefix(_t) *v, atomic_val i)
+{
+	__atomic_store_n(&v->counter, i, __ATOMIC_RELEASE);
+}
+
+/**
+ * atomic_add - add integer to atomic variable
+ * @i: integer value to add
+ * @v: pointer to atomic variable
+ *
+ * Atomically adds @i to @v.
+ */
+static __always_inline void atomic_prefix(_add)(atomic_val i, atomic_prefix(_t) *v)
+{
+	__atomic_add_fetch(&v->counter, i, __ATOMIC_RELAXED);
+}
+
+static __always_inline void atomic_prefix(_inc)(atomic_prefix(_t) *v)
+{
+	__atomic_add_fetch(&v->counter, 1, __ATOMIC_RELAXED);
+}
+
+/**
+ * atomic_sub - subtract integer from atomic variable
+ * @i: integer value to subtract
+ * @v: pointer to atomic variable
+ *
+ * Atomically subtracts @i from @v.
+ */
+static __always_inline void atomic_prefix(_sub)(atomic_val i, atomic_prefix(_t) *v)
+{
+	__atomic_sub_fetch(&v->counter, i, __ATOMIC_RELAXED);
+}
+
+static __always_inline void atomic_prefix(_dec)(atomic_prefix(_t) *v)
+{
+	__atomic_sub_fetch(&v->counter, 1, __ATOMIC_RELAXED);
+}
+
+/**
+ * atomic_add_return - add integer and return
+ * @i: integer value to add
+ * @v: pointer to atomic variable
+ *
+ * Atomically adds @i to @v and returns @i + @v.
+ */
+static __always_inline
+atomic_val atomic_prefix(_add_return)(atomic_val i, atomic_prefix(_t) *v)
+{
+	return __atomic_add_fetch(&v->counter, i, __ATOMIC_SEQ_CST);
+}
+
+static __always_inline
+atomic_val atomic_prefix(_inc_return)(atomic_prefix(_t) *v)
+{
+	return __atomic_add_fetch(&v->counter, 1, __ATOMIC_SEQ_CST);
+}
+
+static __always_inline
+atomic_val atomic_prefix(_add_return_relaxed)(atomic_val i, atomic_prefix(_t) *v)
+{
+	return __atomic_add_fetch(&v->counter, i, __ATOMIC_RELAXED);
+}
+
+static __always_inline
+atomic_val atomic_prefix(_inc_return_relaxed)(atomic_prefix(_t) *v)
+{
+	return __atomic_add_fetch(&v->counter, 1, __ATOMIC_RELAXED);
+}
+
+static __always_inline
+atomic_val atomic_prefix(_add_return_acquire)(atomic_val i, atomic_prefix(_t) *v)
+{
+	return __atomic_add_fetch(&v->counter, i, __ATOMIC_ACQUIRE);
+}
+
+static __always_inline
+atomic_val atomic_prefix(_inc_return_acquire)(atomic_prefix(_t) *v)
+{
+	return __atomic_add_fetch(&v->counter, 1, __ATOMIC_ACQUIRE);
+}
+
+static __always_inline
+atomic_val atomic_prefix(_add_return_release)(atomic_val i, atomic_prefix(_t) *v)
+{
+	return __atomic_add_fetch(&v->counter, i, __ATOMIC_RELEASE);
+}
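For readers unfamiliar with the builtins the template wraps, here is a
hedged userspace miniature. The `my_*` type and function names are invented
for illustration; only `__atomic_load_n`, `__atomic_add_fetch` and the
`__ATOMIC_*` memory-order constants are real GCC/Clang intrinsics:

```c
/*
 * Userspace sketch of the template above: relaxed ordering for the
 * void ops, seq_cst for the value-returning op, mirroring the kernel
 * convention that only value-returning atomics imply full barriers.
 */
typedef struct { int counter; } my_atomic_t;

static inline int my_atomic_read(const my_atomic_t *v)
{
	return __atomic_load_n(&v->counter, __ATOMIC_RELAXED);
}

static inline void my_atomic_add(int i, my_atomic_t *v)
{
	__atomic_add_fetch(&v->counter, i, __ATOMIC_RELAXED);
}

static inline int my_atomic_add_return(int i, my_atomic_t *v)
{
	return __atomic_add_fetch(&v->counter, i, __ATOMIC_SEQ_CST);
}
```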
[RFC PATCH 07/15] Provide cmpxchg(), xchg(), xadd() and __add() based on ISO C++11 intrinsics
Provide implementations of cmpxchg(), xchg(), xadd() and __add() using ISO
C++11 intrinsics. Also provide variants with less strong ordering.

The __atomic_compare_exchange_n() function returns a bool indicating whether
or not the exchange took place. On x86, for example, the CMPXCHG instruction
sticks this value into the Z flag and using the bool return allows the
compiler to use this value directly.

I've added two new macros for this:

 (1) bool cmpxchg_return(TYPE *ptr, TYPE old, TYPE new, TYPE *_orig)

     This is like cmpxchg() except that the current value is returned in
     *_orig and true is returned if the exchange happens, false otherwise.

 (2) bool try_cmpxchg(TYPE *ptr, TYPE old, TYPE new)

     Like cmpxchg_return() except that the current value is discarded.

I've marked cmpxchg() as deprecated as the compiler generates better code if
it is allowed to use the extra bit of information rather than comparing old
with the current value.

To illustrate the differences on x86_64, I compiled the following with -Os
and -fomit-frame-pointer:

	const char *kernel_cmpxchg(long *ptr, long old, long new)
	{
		long cur;
		cur = cmpxchg(ptr, old, new);
		return cur == old ? "foo" : "bar";
	}

	#define ISO__try_cmpxchg(ptr, old, new, mem)		\
	({							\
		__typeof__((ptr)) __ptr = (ptr);		\
		__typeof__(*__ptr) __orig;			\
		__atomic_compare_exchange_n(__ptr,		\
					    &__orig, (new),	\
					    false,		\
					    mem,		\
					    __ATOMIC_RELAXED);	\
	})

	#define ISO_try_cmpxchg(ptr, old, new)			\
		ISO__try_cmpxchg((ptr), (old), (new), __ATOMIC_SEQ_CST)

	const char *iso_try_cmpxchg(long *ptr, long old, long new)
	{
		return ISO_try_cmpxchg(ptr, old, new) ? "foo" : "bar";
	}

which gives:

	kernel_cmpxchg:
		mov	%rsi,%rax
		lock cmpxchg %rdx,(%rdi)
		mov	$.LC1,%edx
		cmp	%rax,%rsi	<--- Redundant instruction
		mov	$.LC0,%eax
		cmovne	%rdx,%rax
		retq

	iso_cmpxchg_return:
		mov	%rsi,%rax
		mov	%rsi,-0x8(%rsp)	<--- Unoptimisation
		lock cmpxchg %rdx,(%rdi)
		mov	$.LC1,%edx
		mov	$.LC0,%eax
		cmovne	%rdx,%rax
		retq

As can be seen, barring an unoptimisation, the ISO try_cmpxchg() code can
dispense with the CMP instruction required for the kernel inline asm and use
the Z flag resulting from the CMPXCHG directly.

This unoptimisation is an issue in gcc's __atomic_compare_exchange[_n]() on
several arches, including x86, arm64 and powerpc64, whereby it tries to use
the stack to pass the old value rather than bunging this into a register.
This appears to be because __atomic_compare_exchange[_n]() returns a bool
indicating whether the exchange happened or not and passes back the current
value read from the memory variable through a pointer in the argument list.

	https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70825

Signed-off-by: David Howells
---
 include/asm-generic/iso-cmpxchg.h | 180 +
 include/linux/atomic.h            |   3 +
 2 files changed, 183 insertions(+)
 create mode 100644 include/asm-generic/iso-cmpxchg.h

diff --git a/include/asm-generic/iso-cmpxchg.h b/include/asm-generic/iso-cmpxchg.h
new file mode 100644
index ..c84e213c0b6b
--- /dev/null
+++ b/include/asm-generic/iso-cmpxchg.h
@@ -0,0 +1,180 @@
+/* Use ISO C++11 intrinsics to implement cmpxchg() and xchg().
+ *
+ * Copyright (C) 2016 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowe...@redhat.com)
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public Licence
+ * as published by the Free Software Foundation; either version
+ * 2 of the Licence, or (at your option) any later version.
+ */
+
+#ifndef _ASM_GENERIC_ISO_CMPXCHG_H
+#define _ASM_GENERIC_ISO_CMPXCHG_H
+
+/**
+ * cmpxchg_return - Check variable and replace if expected value.
+ * @ptr: Pointer to target variable
+ * @old: The value to check for
+ * @new: The value to replace with
+ * @_orig: A pointer to a variable in which to place the current value
+ *
+ * Atomically checks the contents of *@ptr, and if the same as @old, replaces
+ * it with @new. Return true if the exchange happened, false if it didn't.
+ * The original value read from
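The bool-returning shape the patch exploits can be seen with the raw builtin
in plain userspace C. The `my_try_cmpxchg` name is invented for this sketch;
`__atomic_compare_exchange_n` is the real GCC/Clang intrinsic:

```c
#include <stdbool.h>

/*
 * Like the proposed try_cmpxchg(): returns whether the exchange
 * happened, so the caller never re-compares old with the current
 * value. The builtin writes the observed value back through its
 * second argument (here the local copy of old, which we discard).
 */
static inline bool my_try_cmpxchg(long *ptr, long old, long new_val)
{
	return __atomic_compare_exchange_n(ptr, &old, new_val,
					   false,	/* strong variant */
					   __ATOMIC_SEQ_CST,
					   __ATOMIC_RELAXED);
}
```

On x86 the compiler can branch or cmov directly on the Z flag produced by
CMPXCHG, which is the code-size win the posting's disassembly demonstrates.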
[RFC PATCH 04/15] Convert 32-bit ISO atomics into a template
Convert the 32-bit ISO C++11 atomic implementation into a template that can then be overloaded with #defines to produce 32-bit, long and 64-bit atomics. This is then overloaded to produce the 32-bit atomic_t implementation. This should probably be merged with the previous patch if we want to use the template approach. The template approach has the advantage that it can produce all of the different sets of atomics from the same code, but the disadvantage that the template is much less readable. Signed-off-by: David Howells --- include/asm-generic/iso-atomic-template.h | 572 + include/asm-generic/iso-atomic.h | 387 2 files changed, 579 insertions(+), 380 deletions(-) create mode 100644 include/asm-generic/iso-atomic-template.h diff --git a/include/asm-generic/iso-atomic-template.h b/include/asm-generic/iso-atomic-template.h new file mode 100644 index ..76da163ec559 --- /dev/null +++ b/include/asm-generic/iso-atomic-template.h @@ -0,0 +1,572 @@ +/* Use ISO C++11 intrinsics to implement atomic ops. + * + * This file is a template. The #includer needs to #define the following + * items: + * + * atomic_val - counter type (eg. int, long, long long) + * atomic_prefix(x)- prefix (eg. atomic, atomic64, atomic_long) + * __atomic_prefix(x) - prefix with "__" prepended + * + * Copyright (C) 2016 Red Hat, Inc. All Rights Reserved. + * Written by David Howells (dhowe...@redhat.com) + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public Licence + * as published by the Free Software Foundation; either version + * 2 of the Licence, or (at your option) any later version. + */ + +/** + * atomic_read - read atomic variable + * @v: pointer to atomic variable + * + * Atomically reads the value of @v. 
+ */ +static __always_inline atomic_val atomic_prefix(_read)(const atomic_prefix(_t) *v) +{ + return __atomic_load_n(&v->counter, __ATOMIC_RELAXED); +} + +static __always_inline atomic_val atomic_prefix(_read_acquire)(const atomic_prefix(_t) *v) +{ + return __atomic_load_n(&v->counter, __ATOMIC_ACQUIRE); +} + +/** + * atomic_set - set atomic variable + * @v: pointer to atomic variable + * @i: required value + * + * Atomically sets the value of @v to @i. + */ +static __always_inline void atomic_prefix(_set)(atomic_prefix(_t) *v, atomic_val i) +{ + __atomic_store_n(&v->counter, i, __ATOMIC_RELAXED); +} + +static __always_inline void atomic_prefix(_set_release)(atomic_prefix(_t) *v, atomic_val i) +{ + __atomic_store_n(&v->counter, i, __ATOMIC_RELEASE); +} + +/** + * atomic_add - add integer to atomic variable + * @i: integer value to add + * @v: pointer to atomic variable + * + * Atomically adds @i to @v. + */ +static __always_inline void atomic_prefix(_add)(atomic_val i, atomic_prefix(_t) *v) +{ + __atomic_add_fetch(&v->counter, i, __ATOMIC_RELAXED); +} + +static __always_inline void atomic_prefix(_inc)(atomic_prefix(_t) *v) +{ + __atomic_add_fetch(&v->counter, 1, __ATOMIC_RELAXED); +} + +/** + * atomic_sub - subtract integer from atomic variable + * @i: integer value to subtract + * @v: pointer to atomic variable + * + * Atomically subtracts @i from @v. + */ +static __always_inline void atomic_prefix(_sub)(atomic_val i, atomic_prefix(_t) *v) +{ + __atomic_sub_fetch(&v->counter, i, __ATOMIC_RELAXED); +} + +static __always_inline void atomic_prefix(_dec)(atomic_prefix(_t) *v) +{ + __atomic_sub_fetch(&v->counter, 1, __ATOMIC_RELAXED); +} + +/** + * atomic_add_return - add integer and return + * @i: integer value to add + * @v: pointer to atomic variable + * + * Atomically adds @i to @v and returns @i + @v. 
+ */ +static __always_inline +atomic_val atomic_prefix(_add_return)(atomic_val i, atomic_prefix(_t) *v) +{ + return __atomic_add_fetch(&v->counter, i, __ATOMIC_SEQ_CST); +} + +static __always_inline +atomic_val atomic_prefix(_inc_return)(atomic_prefix(_t) *v) +{ + return __atomic_add_fetch(&v->counter, 1, __ATOMIC_SEQ_CST); +} + +static __always_inline +atomic_val atomic_prefix(_add_return_relaxed)(atomic_val i, atomic_prefix(_t) *v) +{ + return __atomic_add_fetch(&v->counter, i, __ATOMIC_RELAXED); +} + +static __always_inline +atomic_val atomic_prefix(_inc_return_relaxed)(atomic_prefix(_t) *v) +{ + return __atomic_add_fetch(&v->counter, 1, __ATOMIC_RELAXED); +} + +static __always_inline +atomic_val atomic_prefix(_add_return_acquire)(atomic_val i, atomic_prefix(_t) *v) +{ + return __atomic_add_fetch(&v->counter, i, __ATOMIC_ACQUIRE); +} + +static __always_inline +atomic_val atomic_prefix(_inc_return_acquire)(atomic_prefix(_t) *v) +{ + return __atomic_add_fetch(&v->counter, 1, __ATOMIC_ACQUIRE); +} + +static __always_inline +atomic_val atomic_prefix(_add_return_release)(atomic_val i, atomic_prefix(_t) *v) +{ + return __atomic_add_fetch(&v->counter, i, __ATOMIC_RELEASE); +} + +static
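The #define-based overloading the patch describes can be sketched in miniature. The names below follow the template's conventions (atomic_val, atomic_prefix()), but this hand-expanded 64-bit instantiation is purely illustrative, not the patch's actual wiring:

```c
#include <assert.h>

/* Toy instantiation of the template for a 64-bit atomic type. The real
 * iso-atomic-template.h would be #included once per width with these
 * macros defined by the includer. */
#define atomic_val		long long
#define atomic_prefix(x)	atomic64##x

typedef struct { atomic_val counter; } atomic_prefix(_t);

static inline atomic_val atomic_prefix(_read)(const atomic_prefix(_t) *v)
{
	/* Expands to atomic64_read() */
	return __atomic_load_n(&v->counter, __ATOMIC_RELAXED);
}

static inline void atomic_prefix(_set)(atomic_prefix(_t) *v, atomic_val i)
{
	/* Expands to atomic64_set() */
	__atomic_store_n(&v->counter, i, __ATOMIC_RELAXED);
}
```

Redefining atomic_val and atomic_prefix() before the next inclusion would produce the int and long variants from the same body, which is the readability-vs-duplication trade-off the changelog mentions.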
[RFC PATCH 07/15] Provide cmpxchg(), xchg(), xadd() and __add() based on ISO C++11 intrinsics
Provide implementations of cmpxchg(), xchg(), xadd() and __add() using ISO C++11 intrinsics. Also provide variants with less strong ordering. The __atomic_compare_exchange_n() function returns a bool indicating whether or not the exchange took place. On x86, for example, the CMPXCHG instruction sticks this value into the Z flag and using the bool return allows the compiler to use this value directly. I've added two new macros for this: (1) bool cmpxchg_return(TYPE *ptr, TYPE old, TYPE new, TYPE *_orig) This is like cmpxchg() except that the current value is returned in *_orig and true is returned if the exchange happens, false otherwise. (2) bool try_cmpxchg(TYPE *ptr, TYPE old, TYPE new) Like cmpxchg_return() except that the current value is discarded. I've marked cmpxchg() as deprecated as the compiler generates better code if it is allowed to use the extra bit of information rather than comparing old with the current value. To illustrate the differences on x86_64, I compiled the following with -Os and -fomit-frame-pointer: const char *kernel_cmpxchg(long *ptr, long old, long new) { long cur; cur = cmpxchg(ptr, old, new); return cur == old ? "foo" : "bar"; } #define ISO__try_cmpxchg(ptr, old, new, mem)\ ({ \ __typeof__((ptr)) __ptr = (ptr);\ __typeof__(*__ptr) __orig = (old); \ __atomic_compare_exchange_n(__ptr, \ &__orig, (new), \ false, \ mem,\ __ATOMIC_RELAXED); \ }) #define ISO_try_cmpxchg(ptr, old, new) \ ISO__try_cmpxchg((ptr), (old), (new), __ATOMIC_SEQ_CST) const char *iso_try_cmpxchg(long *ptr, long old, long new) { return ISO_try_cmpxchg(ptr, old, new) ? 
"foo" : "bar"; } which gives: kernel_cmpxchg: mov%rsi,%rax lock cmpxchg %rdx,(%rdi) mov$.LC1,%edx cmp%rax,%rsi<--- Redundant instruction mov$.LC0,%eax cmovne %rdx,%rax retq iso_cmpxchg_return: mov%rsi,%rax mov%rsi,-0x8(%rsp) <--- Unoptimisation lock cmpxchg %rdx,(%rdi) mov$.LC1,%edx mov$.LC0,%eax cmovne %rdx,%rax retq As can be seen, barring an unoptimisation, the ISO try_cmpxchg() code can dispense with the CMP instruction required for the kernel inline asm and use the Z flag resulting from the CMPXCHG directly. This unoptimisation is an issus in gcc's __atomic_compare_exchange[_n]() on several arches, including x86, arm64 and powerpc64, whereby it tries to use the stack to pass the old value rather than bunging this into a register. This appears to be because __atomic_compare_exchange[_n]() returns a bool indicating whether the exchange happened or not and passes back the current value read from the memory variable through a pointer in the argument list. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70825 Signed-off-by: David Howells --- include/asm-generic/iso-cmpxchg.h | 180 + include/linux/atomic.h|3 + 2 files changed, 183 insertions(+) create mode 100644 include/asm-generic/iso-cmpxchg.h diff --git a/include/asm-generic/iso-cmpxchg.h b/include/asm-generic/iso-cmpxchg.h new file mode 100644 index ..c84e213c0b6b --- /dev/null +++ b/include/asm-generic/iso-cmpxchg.h @@ -0,0 +1,180 @@ +/* Use ISO C++11 intrinsics to implement cmpxchg() and xchg(). + * + * Copyright (C) 2016 Red Hat, Inc. All Rights Reserved. + * Written by David Howells (dhowe...@redhat.com) + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public Licence + * as published by the Free Software Foundation; either version + * 2 of the Licence, or (at your option) any later version. + */ + +#ifndef _ASM_GENERIC_ISO_CMPXCHG_H +#define _ASM_GENERIC_ISO_CMPXCHG_H + +/** + * cmpxchg_return - Check variable and replace if expected value. 
+ * @ptr: Pointer to target variable + * @old: The value to check for + * @new: The value to replace with + * @_orig: A pointer to a variable in which to place the current value + * + * Atomically checks the contents of *@ptr, and if the same as @old, replaces + * it with @new. Return true if the exchange happened, false if it didn't. + * The original value read from *@ptr is stored in
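The semantics of the two proposed macros can be sketched in userspace directly on the intrinsic. These are illustrative per-type functions, not the kernel macros themselves; memory orders follow the patch's defaults:

```c
#include <assert.h>
#include <stdbool.h>

/* Sketch of try_cmpxchg(): the bool result comes straight from
 * __atomic_compare_exchange_n(), the current value is discarded. */
static bool try_cmpxchg_long(long *ptr, long old, long new_val)
{
	/* On failure the intrinsic writes the current value into 'old',
	 * but that local copy is simply dropped here. */
	return __atomic_compare_exchange_n(ptr, &old, new_val,
					   false, __ATOMIC_SEQ_CST,
					   __ATOMIC_RELAXED);
}

/* Sketch of cmpxchg_return(): the value actually read from *ptr is
 * passed back through *orig whether or not the exchange happened. */
static bool cmpxchg_return_long(long *ptr, long old, long new_val, long *orig)
{
	*orig = old;
	return __atomic_compare_exchange_n(ptr, orig, new_val,
					   false, __ATOMIC_SEQ_CST,
					   __ATOMIC_RELAXED);
}
```

Because the caller branches on the returned bool rather than re-comparing old with the current value, the compiler can consume the Z flag from CMPXCHG directly, which is exactly the CMP elimination shown in the disassembly above.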
Re: [RFC PATCH 2/3] arch/powerpc : optprobes for powerpc core
On Wed, 18 May 2016 02:09:37 +0530 Anju Twrote: > Instruction slot for detour buffer is allocated from > the reserved area. For the time being 64KB is reserved > in memory for this purpose. ppc_get_optinsn_slot() and > ppc_free_optinsn_slot() are geared towards the allocation and freeing > of memory from this area. Thank you for porting optprobe on ppc!! I have some comments on this patch. > > Signed-off-by: Anju T > --- > arch/powerpc/kernel/optprobes.c | 463 > > 1 file changed, 463 insertions(+) > create mode 100644 arch/powerpc/kernel/optprobes.c > > diff --git a/arch/powerpc/kernel/optprobes.c b/arch/powerpc/kernel/optprobes.c > new file mode 100644 > index 000..50a60c1 > --- /dev/null > +++ b/arch/powerpc/kernel/optprobes.c > @@ -0,0 +1,463 @@ > +/* > + * Code for Kernel probes Jump optimization. > + * > + * Copyright 2016, Anju T, IBM Corp. > + * > + * This program is free software; you can redistribute it and/or > + * modify it under the terms of the GNU General Public License > + * as published by the Free Software Foundation; either version > + * 2 of the License, or (at your option) any later version. > + */ > + > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > + > +/* Reserve an area to allocate slots for detour buffer */ > +extern void optprobe_trampoline_holder(void) > +{ > + asm volatile(".global optinsn_slot\n" > + "optinsn_slot:\n" > + ".space 65536"); > +} Would we better move this into optprobes_head.S? 
> + > +#define SLOT_SIZE 65536 > +#define TMPL_CALL_HDLR_IDX \ > + (optprobe_template_call_handler - optprobe_template_entry) > +#define TMPL_EMULATE_IDX \ > + (optprobe_template_call_emulate - optprobe_template_entry) > +#define TMPL_RET_BRANCH_IDX \ > + (optprobe_template_ret_branch - optprobe_template_entry) > +#define TMPL_RET_IDX \ > + (optprobe_template_ret - optprobe_template_entry) > +#define TMPL_OP1_IDX \ > + (optprobe_template_op_address1 - optprobe_template_entry) > +#define TMPL_OP2_IDX \ > + (optprobe_template_op_address2 - optprobe_template_entry) > +#define TMPL_INSN_IDX\ > + (optprobe_template_insn - optprobe_template_entry) > +#define TMPL_END_IDX \ > + (optprobe_template_end - optprobe_template_entry) > + > +struct kprobe_ppc_insn_page { > + struct list_head list; > + kprobe_opcode_t *insns; /* Page of instruction slots */ > + struct kprobe_insn_cache *cache; > + int nused; > + int ngarbage; > + char slot_used[]; > +}; > + > +#define PPC_KPROBE_INSN_PAGE_SIZE(slots) \ > + (offsetof(struct kprobe_ppc_insn_page, slot_used) + \ > + (sizeof(char) * (slots))) > + > +enum ppc_kprobe_slot_state { > + SLOT_CLEAN = 0, > + SLOT_DIRTY = 1, > + SLOT_USED = 2, > +}; > + > +static struct kprobe_insn_cache kprobe_ppc_optinsn_slots = { > + .mutex = __MUTEX_INITIALIZER(kprobe_ppc_optinsn_slots.mutex), > + .pages = LIST_HEAD_INIT(kprobe_ppc_optinsn_slots.pages), > + /* .insn_size is initialized later */ > + .nr_garbage = 0, > +}; > + > +static int ppc_slots_per_page(struct kprobe_insn_cache *c) > +{ > + /* > + * Here the #slots per page differs from x86 as we have > + * only 64KB reserved. > + */ > + return SLOT_SIZE / (c->insn_size * sizeof(kprobe_opcode_t)); > +} > + > +/* Return 1 if all garbages are collected, otherwise 0. 
*/ > +static int collect_one_slot(struct kprobe_ppc_insn_page *kip, int idx) > +{ > + kip->slot_used[idx] = SLOT_CLEAN; > + kip->nused--; > + return 0; > +} > + > +static int collect_garbage_slots(struct kprobe_insn_cache *c) > +{ > + struct kprobe_ppc_insn_page *kip, *next; > + > + /* Ensure no-one is interrupted on the garbages */ > + synchronize_sched(); > + > + list_for_each_entry_safe(kip, next, &c->pages, list) { > + int i; > + > + if (kip->ngarbage == 0) > + continue; > + kip->ngarbage = 0; /* we will collect all garbages */ > + for (i = 0; i < ppc_slots_per_page(c); i++) { > + if (kip->slot_used[i] == SLOT_DIRTY && > + collect_one_slot(kip, i)) > + break; > + } > + } > + c->nr_garbage = 0; > + return 0; > +} > + > +kprobe_opcode_t *__ppc_get_optinsn_slot(struct kprobe_insn_cache *c) > +{ > + struct kprobe_ppc_insn_page *kip; > + kprobe_opcode_t *slot = NULL; > + > + mutex_lock(&c->mutex); > + list_for_each_entry(kip, &c->pages, list) { > + if (kip->nused < ppc_slots_per_page(c)) { > + int i; > + > + for (i = 0; i < ppc_slots_per_page(c); i++) { > + if (kip->slot_used[i] == SLOT_CLEAN) { > +
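The slot accounting in ppc_slots_per_page() reduces to simple arithmetic over the 64KB reservation. A sketch, assuming kprobe_opcode_t is a 4-byte ppc instruction word (the insn_size argument here stands in for c->insn_size):

```c
#include <assert.h>

#define SLOT_SIZE 65536		/* 64KB reserved in optprobe_trampoline_holder() */

/* Number of detour-buffer slots that fit in the reserved area when each
 * slot holds insn_size opcodes of 4 bytes each. */
static int ppc_slots_per_page(int insn_size)
{
	return SLOT_SIZE / (insn_size * 4 /* sizeof(kprobe_opcode_t) */);
}
```

So a template of, say, 32 opcodes per optprobe yields 512 slots from the single reserved area, which is why the comment in the patch notes the count differs from x86's page-at-a-time scheme.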
Re: [PATCH v6 2/8] debugfs: prevent access to removed files' private data
On 05/18/2016 11:01 AM, Nicolai Stange wrote: > Thanks a million for reporting! > > 1.) Do you have lockdep enabled? Yup, nothing there. > 2.) Does this happen before or after userspace init has been spawned, > i.e. does the lockup happen at debugfs file creation time or > possibly at usage time? So I looked closer, and it seems to happen after starting syzkaller, which as far as I know tries to open many different debugfs files. Is there debug code I can add that'll help us figure out what's up? Thanks, Sasha
[PATCH] Input: pwm-beeper - fix: scheduling while atomic
Pwm config may sleep so defer it using a worker. Trigger: On a Freescale i.MX53 based board we ran into "BUG: scheduling while atomic" because input_inject_event locks interrupts, but imx_pwm_config_v2 sleeps. Tested on Freescale i.MX53 SoC with 4.6.0 and 4.1.24. Unmodified applicable to * 4.6(stable) * 4.5.4 (stable) * 4.4.10 (longterm) * 4.1.24 (longterm) Modified applicable to * 3.18.33 (longterm) Signed-off-by: Manfred Schlaegl --- drivers/input/misc/pwm-beeper.c | 54 +++-- 1 file changed, 36 insertions(+), 18 deletions(-) diff --git a/drivers/input/misc/pwm-beeper.c b/drivers/input/misc/pwm-beeper.c index f2261ab..7d783c8 100644 --- a/drivers/input/misc/pwm-beeper.c +++ b/drivers/input/misc/pwm-beeper.c @@ -20,21 +20,41 @@ #include #include #include +#include struct pwm_beeper { struct input_dev *input; struct pwm_device *pwm; + struct work_struct work; unsigned long period; }; #define HZ_TO_NANOSECONDS(x) (1000000000UL/(x)) +static void __pwm_beeper_set(struct pwm_beeper *beeper) +{ + unsigned long period = beeper->period; + + pwm_config(beeper->pwm, period / 2, period); + + if (period == 0) + pwm_disable(beeper->pwm); + else + pwm_enable(beeper->pwm); +} + +static void pwm_beeper_work(struct work_struct *work) +{ + struct pwm_beeper *beeper = + container_of(work, struct pwm_beeper, work); + + __pwm_beeper_set(beeper); +} + static int pwm_beeper_event(struct input_dev *input, unsigned int type, unsigned int code, int value) { - int ret = 0; struct pwm_beeper *beeper = input_get_drvdata(input); - unsigned long period; if (type != EV_SND || value < 0) return -EINVAL; @@ -49,18 +69,12 @@ static int pwm_beeper_event(struct input_dev *input, return -EINVAL; } - if (value == 0) { - pwm_disable(beeper->pwm); - } else { - period = HZ_TO_NANOSECONDS(value); - ret = pwm_config(beeper->pwm, period / 2, period); - if (ret) - return ret; - ret = pwm_enable(beeper->pwm); - if (ret) - return ret; - beeper->period = period; - } + if (value == 0) + beeper->period = 0; + else + 
beeper->period = HZ_TO_NANOSECONDS(value); + + schedule_work(&beeper->work); return 0; } @@ -87,6 +101,8 @@ static int pwm_beeper_probe(struct platform_device *pdev) goto err_free; } + INIT_WORK(&beeper->work, pwm_beeper_work); + beeper->input = input_allocate_device(); if (!beeper->input) { dev_err(&pdev->dev, "Failed to allocate input device\n"); @@ -133,6 +149,8 @@ static int pwm_beeper_remove(struct platform_device *pdev) { struct pwm_beeper *beeper = platform_get_drvdata(pdev); + cancel_work_sync(&beeper->work); + input_unregister_device(beeper->input); pwm_disable(beeper->pwm); @@ -147,6 +165,8 @@ static int __maybe_unused pwm_beeper_suspend(struct device *dev) { struct pwm_beeper *beeper = dev_get_drvdata(dev); + cancel_work_sync(&beeper->work); + if (beeper->period) pwm_disable(beeper->pwm); @@ -157,10 +177,8 @@ static int __maybe_unused pwm_beeper_resume(struct device *dev) { struct pwm_beeper *beeper = dev_get_drvdata(dev); - if (beeper->period) { - pwm_config(beeper->pwm, beeper->period / 2, beeper->period); - pwm_enable(beeper->pwm); - } + if (beeper->period) + __pwm_beeper_set(beeper); return 0; } -- 2.1.4
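The shape of the fix is a general pattern: the atomic-context handler only records state, and the sleeping calls run later from a worker. A userspace analogue (all names illustrative; schedule_work() is stood in for by calling the worker directly):

```c
#include <assert.h>

/* Userspace analogue of the pwm-beeper fix: the event handler, which may
 * run with interrupts disabled, only stores the requested period; the
 * sleeping pwm_config()/pwm_enable() equivalents run in the worker. */
static unsigned long requested_period;	/* written by the event handler */
static unsigned long applied_period;	/* written by the worker */
static int pwm_enabled;

static void beeper_event(int value)	/* atomic context: no sleeping */
{
	requested_period = value ? 1000000000UL / value : 0;
	/* schedule_work(&beeper->work) would go here */
}

static void beeper_work(void)		/* process context: may sleep */
{
	applied_period = requested_period;
	pwm_enabled = requested_period != 0;
}
```

The cancel_work_sync() calls added in remove() and suspend() matter for the same reason a real driver needs them: the deferred worker must not run after the device state it touches has been torn down.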
Re: [PATCH] rcu: tree: correctly handle sparse possible CPUs
2016-05-16 19:48 GMT+03:00 Mark Rutland: > /* > + * Iterate over all possible CPUs in a leaf RCU node. > + */ > +#define for_each_leaf_node_possible_cpu(rnp, cpu) \ > + for ((cpu) = rnp->grplo; \ > +cpu <= rnp->grphi; \ > +cpu = cpumask_next((cpu), cpu_possible_mask)) > + > +/* > + * Iterate over all possible CPUs in a leaf RCU node, at each step providing > a > + * bit for comparison against rcu_node bitmasks. > + */ > +#define for_each_leaf_node_possible_cpu_bit(rnp, cpu, bit) \ > + for ((cpu) = rnp->grplo, (bit) = 1; \ > +cpu <= rnp->grphi; \ > +cpu = cpumask_next((cpu), cpu_possible_mask), \ > + (bit) = 1UL << (cpu - rnp->grplo)) > + [0.163652] UBSAN: Undefined behaviour in ../kernel/rcu/tree.c:2912:3 [0.164000] shift exponent 64 is too large for 64-bit type 'long unsigned int'
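The UBSAN splat arises because a for-loop's step expression runs before the termination test, so when cpumask_next() steps past grphi the shift exponent can reach the type width (64), which is undefined. A guarded form of the bit computation (a sketch of the problem and one possible guard, not necessarily the eventual fix) avoids the bad shift:

```c
#include <assert.h>

/* Compute the rcu_node bit for a leaf CPU, guarding against the iterator
 * having stepped past grphi (e.g. cpumask_next() returning nr_cpu_ids).
 * Shifting 1UL by >= 64 is undefined behaviour, hence the bound check. */
static unsigned long leaf_node_cpu_bit(int cpu, int grplo, int grphi)
{
	return (cpu >= grplo && cpu <= grphi) ? 1UL << (cpu - grplo) : 0;
}
```

Equivalently, the macro could compute the bit inside the loop body after the `cpu <= rnp->grphi` test has passed, rather than in the step expression.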
[RFC PATCH 03/15] Provide atomic_t functions implemented with ISO-C++11 atomics
Provide an implementation of the atomic_t functions implemented with ISO-C++11 atomics. This has some advantages over using inline assembly: (1) The compiler has a much better idea of what is going on and can optimise appropriately, whereas with inline assembly, the content is an indivisible black box as far as the compiler is concerned. For example, with powerpc64, the compiler can put the barrier in a potentially more favourable position. Take the inline-asm variant of test_and_set_bit() - this has to (a) generate a mask and an address and (b) interpolate a memory barrier before doing the atomic ops. Here's a disassembly of the current inline-asm variant and then the ISO variant: .current_test_and_set_bit: clrlwi r10,r3,26 } rldicl r3,r3,58,6 } Fiddling with regs to make rldicr r3,r3,3,60 } mask and address li r9,1} sld r9,r9,r10 } add r4,r4,r3} hwsync <--- Release barrier retry: ldarx r3,0,r4 } or r10,r3,r9 } Atomic region stdcx. r10,0,r4} bne-retry } hwsync <--- Acquire barrier and r9,r9,r3 addic r3,r9,-1 subfe r3,r3,r9 blr .iso_test_and_set_bit: hwsync <--- Release barrier clrlwi r10,r3,26 } sradi r3,r3,6 } Fiddling with regs to make li r9,1} mask and address rldicr r3,r3,3,60 } sld r9,r9,r10 } add r4,r4,r3} retry: ldarx r3,0,r4 } or r10,r3,r9 } Atomic region stdcx. r10,0,r4} bne retry } isync <--- Acquire barrier and r9,r9,r3 addic r3,r9,-1 subfe r3,r3,r9 blr Moving the barrier up in the ISO case would seem to give the CPU a better chance of doing the barrier simultaneously with the register fiddling. Things to note here: (a) A BNE rather than a BNE- is emitted after the STDCX instruction. I've logged this as: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71162 (b) An ISYNC instruction is emitted as the Acquire barrier with __ATOMIC_SEQ_CST, but I'm not sure this is strong enough. (c) An LWSYNC instruction is emitted as the Release barrier with __ATOMIC_ACQ_REL or __ATOMIC_RELEASE. Is this strong enough that we could use these memorders instead? 
(2) The compiler can pick the appropriate instruction to use, depending on the size of the memory location, the operation being applied, the value of the operand being applied (if applicable), whether the return value is used and whether any condition flags resulting from the atomic op are used. For instance, on x86_64, atomic_add() can use any of ADDL, SUBL, INCL, DECL, XADDL or CMPXCHGL. atomic_add_return() can use ADDL, SUBL, INCL or DECL if we only care about the representation of the return value encoded in the flags (zero, carry, sign). If we actually want the return value, atomic_add_return() will use XADDL. So with __atomic_fetch_add(), if the return value isn't being used and if the operand is 1, INCL will be used; if >1, ADDL will be used; if -1, DECL will be used; and if <-1, SUBL will be used. Things to note here: (a) If unpatched, gcc will emit an XADDL instruction rather than a DECL instruction if it is told to subtract 1. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70821 (b) The x86 arch keeps a list of LOCK prefixes and NOP's them out when only one CPU is online, but gcc doesn't do this yet. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70973 (3) The compiler can make use of extra information provided by the atomic instructions involved that can't otherwise be passed out of inline assembly - such as the condition flag values. Take atomic_dec_and_test(), for example. I can call it as follows: const char *xxx_atomic_dec_and_test(atomic_t *counter) { if (atomic_dec_and_test(counter)) return "foo"; return "bar"; } For original, non-asm-goto code, I see: nogoto_atomic_dec_and_test: lock decl (%rdi) sete %al mov $.LC1,%edx test %al,%al
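The dec-and-test case in point (3) can be written against the intrinsic directly. A sketch for a plain int counter (kernel naming and atomic_t wrapping omitted): because the "== 0" test is visible to the compiler, it can branch or SETcc straight off the flags produced by LOCK DECL instead of re-testing a value extracted from inline asm:

```c
#include <assert.h>

/* ISO-intrinsic form of atomic_dec_and_test(): returns nonzero iff the
 * decrement took the counter to zero. */
static int iso_dec_and_test(int *counter)
{
	return __atomic_sub_fetch(counter, 1, __ATOMIC_SEQ_CST) == 0;
}
```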
[RFC PATCH 02/15] tty: ldsem_cmpxchg() should use cmpxchg() not atomic_long_cmpxchg()
ldsem_cmpxchg() should use cmpxchg() not atomic_long_cmpxchg() since the variable it is wangling isn't an atomic_long_t.

Signed-off-by: David Howells
cc: Peter Hurley
cc: Greg Kroah-Hartman
---
 drivers/tty/tty_ldsem.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/tty/tty_ldsem.c b/drivers/tty/tty_ldsem.c
index 1bf8ed13f827..dc27b285ba4f 100644
--- a/drivers/tty/tty_ldsem.c
+++ b/drivers/tty/tty_ldsem.c
@@ -86,7 +86,7 @@ static inline long ldsem_atomic_update(long delta, struct ld_semaphore *sem)
  */
 static inline int ldsem_cmpxchg(long *old, long new, struct ld_semaphore *sem)
 {
-	long tmp = atomic_long_cmpxchg(&sem->count, *old, new);
+	long tmp = cmpxchg(&sem->count, *old, new);
 	if (tmp == *old) {
 		*old = new;
 		return 1;
[RFC PATCH 00/15] Provide atomics and bitops implemented with ISO C++11 atomics
Here's a set of patches to provide kernel atomics and bitops implemented with ISO C++11 atomic intrinsics. The second part of the set makes the x86 arch use the implementation.

Note that the x86 patches are very rough. It would need to be made compile-time conditional in some way and the old code can't really be thrown away that easily - but it's a good way to make sure I'm not using that code any more.

There are some advantages to using ISO C++11 atomics:

(1) The compiler can make use of extra information, such as condition flags, that are tricky to get out of inline assembly in an efficient manner. This should reduce the number of instructions required in some cases - such as in x86 where we use SETcc to store the condition inside the inline asm and then CMP outside to put it back again. Whilst this can be alleviated by the use of asm-goto constructs, this adds mandatory conditional jumps where the use of CMOVcc and SETcc might be better.

(2) The compiler inserts memory barriers for us and can move them earlier, within reason, thereby affording a greater chance of the CPU being able to execute the memory barrier instruction simultaneously with register-only instructions.

(3) The compiler can automatically switch between different forms of an atomic instruction depending on operand size, thereby eliminating the need for large switch statements with individual blocks of inline asm.

(4) The compiler can automatically switch between different available atomic instructions depending on the values in the operands (INC vs ADD) and whether the return value is examined (ADD vs XADD) and how it is examined (ADD+Jcc vs XADD+CMP+Jcc).

There are some disadvantages also:

(1) It's not available in gcc before gcc-4.7 and there will be some seriously suboptimal code production before gcc-7.1.

(2) The compiler might misoptimise - for example, it might generate a CMPXCHG loop rather than a BTR instruction to clear a bit.
(3) The C++11 memory model permits atomic instructions to be merged if appropriate - for example, two adjacent __atomic_read() calls might get merged if the return value of the first isn't examined. Making the pointers volatile alleviates this. Note that gcc doesn't do this yet.

(4) The C++11 memory barriers are, in general, weaker than the kernel's memory barriers are defined to be. Whether this actually matters is arch dependent. Further, the C++11 memory barriers are acquire/release, but some arches actually have load/store instead - which may be a problem.

(5) On x86, the compiler doesn't tell you where the LOCK prefixes are so they cannot be suppressed if only one CPU is online.

Things to be considered:

(1) We could weaken the kernel memory model for the benefit of arches that have instructions that employ explicit acquire/release barriers - but that may cause data races to occur based on assumptions we've already made. Note, however, that powerpc already seems to have a weaker memory model.

(2) There are three sets of atomics (atomic_t, atomic64_t and atomic_long_t). I can either put each in a separate file all nicely laid out (see patch 3) or I can make a template header (see patch 4) and use #defines with it to make all three atomics from one piece of code. Which is best? The first is definitely easier to read, but the second might be easier to maintain.

(3) I've added cmpxchg_return() and try_cmpxchg() to replace cmpxchg(). I've also provided atomicX_t variants of these. These return the boolean return value from the __atomic_compare_exchange_n() function (which is carried in the Z flag on x86). Should this be rolled out even without the ISO atomic implementation?

(4) The current x86_64 bitops (set_bit() and co.) are technically broken. The set_bit() function, for example, takes a 64-bit signed bit number but implements this with BTSL, presumably because it's a shorter instruction.
The BTS instruction's bit number operand, however, is sized according to the memory operand, so the upper 32 bits of the bit number are discarded by BTSL. We should really be using BTSQ to implement this correctly (and gcc does just that).

In practice, though, it would seem unlikely that this will ever be a problem as I think we're unlikely to have a bitset with more than ~2 billion bits in it within the kernel (it would be >256MiB in size).

Patch 9 casts the pointers internally in the bitops functions to persuade the kernel to use 32-bit bitops - but this only really matters on x86. Should set_bit() and co. take int rather than long bit number arguments to make the point?

I've opened a number of gcc bugzillas to improve the code generated by the atomics:

(*)
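The try_cmpxchg() shape proposed in consideration (3) above can be sketched in userspace as follows. The names try_cmpxchg_int and fetch_inc are illustrative; the kernel-side signatures may differ:

```c
#include <stdbool.h>

/* Sketch of the proposed try_cmpxchg(): returns the boolean success
 * flag from __atomic_compare_exchange_n() (carried in the Z flag on
 * x86) and, on failure, leaves the observed value in *old so the
 * caller can retry without reloading. */
static bool try_cmpxchg_int(int *ptr, int *old, int new_val)
{
	return __atomic_compare_exchange_n(ptr, old, new_val,
					   false, /* strong CAS */
					   __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}

/* Typical caller loop: an atomic fetch-and-increment built on it. */
static int fetch_inc(int *ptr)
{
	int old = *ptr;

	while (!try_cmpxchg_int(ptr, &old, old + 1))
		;	/* 'old' now holds the freshly observed value */
	return old;
}
```

The win over cmpxchg() is that the caller branches on the flag instead of re-comparing old values, saving a CMP per loop iteration on x86.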
[RFC PATCH 01/15] cmpxchg_local() is not signed-value safe, so fix generic atomics
cmpxchg_local() is not signed-value safe because on a 64-bit machine signed int arguments to it may be sign-extended to signed long _before_ being cast to unsigned long. This potentially causes comparisons to fail when dealing with negative values.

Fix the generic atomic functions that are implemented in terms of cmpxchg() to cast their arguments to unsigned int before calling cmpxchg().

Signed-off-by: David Howells
---
 include/asm-generic/atomic.h | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/include/asm-generic/atomic.h b/include/asm-generic/atomic.h
index 74f1a3704d7a..e6c71c52edfe 100644
--- a/include/asm-generic/atomic.h
+++ b/include/asm-generic/atomic.h
@@ -37,28 +37,33 @@

 #ifdef CONFIG_SMP

-/* we can build all atomic primitives from cmpxchg */
+/*
+ * We can build all atomic primitives from cmpxchg(), but we have to beware of
+ * implicit casting of signed int parameters to signed long and thence to
+ * unsigned long on a 64-bit machine if we don't explicitly cast to unsigned
+ * int.
+ */

 #define ATOMIC_OP(op, c_op)						\
 static inline void atomic_##op(int i, atomic_t *v)			\
 {									\
-	int c, old;							\
+	unsigned int c, old;						\
 									\
 	c = v->counter;							\
-	while ((old = cmpxchg(&v->counter, c, c c_op i)) != c)		\
+	while ((old = cmpxchg(&v->counter, c, c c_op (unsigned int)i)) != c) \
 		c = old;						\
 }

 #define ATOMIC_OP_RETURN(op, c_op)					\
 static inline int atomic_##op##_return(int i, atomic_t *v)		\
 {									\
-	int c, old;							\
+	unsigned int c, old;						\
 									\
 	c = v->counter;							\
-	while ((old = cmpxchg(&v->counter, c, c c_op i)) != c)		\
+	while ((old = cmpxchg(&v->counter, c, c c_op (unsigned int)i)) != c) \
 		c = old;						\
 									\
-	return c c_op i;						\
+	return c c_op (unsigned int)i;					\
 }

 #else
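The hazard being fixed can be demonstrated in isolation. The following is a userspace sketch assuming an LP64 target where long is 64 bits; the helper names are mine, purely for illustration:

```c
/* -1 as int, widened two ways.  Direct conversion to unsigned long
 * sign-extends, producing 0xffffffffffffffff; going through unsigned
 * int first zero-extends, producing 0x00000000ffffffff.  A cmpxchg()
 * implementation comparing in unsigned long sees these as unequal,
 * which is how negative operands break the cmpxchg()-based atomics. */
static unsigned long widen_direct(int i)
{
	return (unsigned long)i;		/* sign-extends */
}

static unsigned long widen_via_uint(int i)
{
	return (unsigned long)(unsigned int)i;	/* zero-extends */
}
```

This is exactly why the patch inserts the (unsigned int) casts before the values reach cmpxchg().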
[RFC v2 5/7] iio: inv_mpu6050: Add support for auxiliary I2C master
The MPU has an auxiliary I2C bus for connecting external sensors. This bus has two operating modes:

* bypass: which connects the primary and auxiliary busses together. This is already supported via an i2c mux.

* master: where the MPU acts as a master to any external connected sensors. This is implemented by this patch. This I2C master mode also works when the MPU itself is connected via SPI.

I2C master supports up to 5 slaves. Slaves 0-3 have a common operating mode while slave 4 is different. This patch implements an i2c adapter using slave 4.

Signed-off-by: Crestez Dan Leonard
---
 .../devicetree/bindings/iio/imu/inv_mpu6050.txt |  66 +-
 drivers/iio/imu/inv_mpu6050/inv_mpu_core.c      | 258 -
 drivers/iio/imu/inv_mpu6050/inv_mpu_iio.h       |  36 +++
 drivers/iio/imu/inv_mpu6050/inv_mpu_trigger.c   |   8 -
 4 files changed, 355 insertions(+), 13 deletions(-)

diff --git a/Documentation/devicetree/bindings/iio/imu/inv_mpu6050.txt b/Documentation/devicetree/bindings/iio/imu/inv_mpu6050.txt
index a9fc11e..778d076 100644
--- a/Documentation/devicetree/bindings/iio/imu/inv_mpu6050.txt
+++ b/Documentation/devicetree/bindings/iio/imu/inv_mpu6050.txt
@@ -1,16 +1,31 @@
 InvenSense MPU-6050 Six-Axis (Gyro + Accelerometer) MEMS MotionTracking Device

-http://www.invensense.com/mems/gyro/mpu6050.html
-
 Required properties:
- - compatible : should be "invensense,mpu6050"
- - reg : the I2C address of the sensor
+ - compatible : one of "invensense,mpu6000", "invensense,mpu6050",
+	"invensense,mpu6500" or "invensense,mpu9150"
+ - reg : the I2C or SPI address of the sensor
 - interrupt-parent : should be the phandle for the interrupt controller
 - interrupts : interrupt mapping for GPIO IRQ

 Optional properties:
 - mount-matrix: an optional 3x3 mounting rotation matrix
+ - invensense,i2c-aux-master: operate aux i2c in "master" mode (default is bypass).
+
+The MPU has an auxiliary i2c bus for additional sensors. Devices attached this
+way can be described as for a regular linux i2c bus.
+
+It is possible to interact with aux devices in "bypass" or "master" mode. In
+"bypass" mode the auxiliary SDA/SCL lines are connected directly to the main i2c
+interface. In "master" mode access to aux devices is done by instructing the
+MPU to read or write i2c bytes.
+
+In "bypass" mode aux devices are listed behind a "i2c@0" node with reg = <0>;
+In "master" mode aux devices are listed behind a "i2c@1" node with reg = <1>;
+The master and bypass modes are not supported at the same time. The
+"invensense,i2c-aux-master" property must be set to activate master mode.
+Bypass mode is generally faster but master mode also works when the MPU is
+connected via SPI.

 Example:
 	mpu6050@68 {
@@ -28,3 +43,46 @@ Example:
 		"0", /* y2 */
 		"0.984807753012208"; /* z2 */
 	};
+
+Example describing mpu9150 (which includes an ak8975 on chip):
+	mpu9150@68 {
+		compatible = "invensense,mpu9150";
+		reg = <0x68>;
+		interrupt-parent = <>;
+		interrupts = <18 1>;
+
+		#address-cells = <1>;
+		#size-cells = <0>;
+		i2c@0 {
+			reg = <0>;
+			#address-cells = <1>;
+			#size-cells = <0>;
+
+			ak8975@0c {
+				compatible = "ak,ak8975";
+				reg = <0x0c>;
+			};
+		};
+	};
+
+Example describing a mpu6500 via SPI with an hmc5883l on auxiliary i2c:
+	mpu6500@0 {
+		compatible = "invensense,mpu6500";
+		reg = <0x0>;
+		spi-max-frequency = <100>;
+		interrupt-parent = <>;
+		interrupts = <31 1>;
+
+		#address-cells = <1>;
+		#size-cells = <0>;
+		i2c@1 {
+			reg = <1>;
+			#address-cells = <1>;
+			#size-cells = <0>;
+
+			hmc5883l@1e {
+				compatible = "honeywell,hmc5883l";
+				reg = <0x1e>;
+			};
+		};
+	};

diff --git a/drivers/iio/imu/inv_mpu6050/inv_mpu_core.c b/drivers/iio/imu/inv_mpu6050/inv_mpu_core.c
index 5918c23..76683b8 100644
--- a/drivers/iio/imu/inv_mpu6050/inv_mpu_core.c
+++ b/drivers/iio/imu/inv_mpu6050/inv_mpu_core.c
@@ -25,6 +25,8 @@
 #include
 #include
 #include
+#include
+
 #include "inv_mpu_iio.h"

 /*
@@ -57,6 +59,7 @@ static const struct inv_mpu6050_reg_map reg_set_6500 = {
 	.int_pin_cfg	= INV_MPU6050_REG_INT_PIN_CFG,
 	.accl_offset	= INV_MPU6500_REG_ACCEL_OFFSET,
 	.gyro_offset	= INV_MPU6050_REG_GYRO_OFFSET,
+	.mst_status	= INV_MPU6050_REG_I2C_MST_STATUS,
 };
[PATCH v2 6/7] iio: inv_mpu6050: Reformat sample for active scan mask
Right now it is possible to only enable some of the x/y/z channels, for example you can enable accel_z without x or y - but if you actually do that, what you get is only the x channel. Fix this by reformatting the hardware sample to only include the requested channels.

Signed-off-by: Crestez Dan Leonard
---
 drivers/iio/imu/inv_mpu6050/inv_mpu_iio.h  |   8 +++
 drivers/iio/imu/inv_mpu6050/inv_mpu_ring.c | 107 -
 2 files changed, 112 insertions(+), 3 deletions(-)

diff --git a/drivers/iio/imu/inv_mpu6050/inv_mpu_iio.h b/drivers/iio/imu/inv_mpu6050/inv_mpu_iio.h
index 85f9b50..ec1401d 100644
--- a/drivers/iio/imu/inv_mpu6050/inv_mpu_iio.h
+++ b/drivers/iio/imu/inv_mpu6050/inv_mpu_iio.h
@@ -152,6 +152,11 @@ struct inv_mpu6050_state {
 	/* Value of I2C_MST_STATUS after slv4_done */
 	u8 slv4_done_status;
 #endif
+
+#define INV_MPU6050_MAX_SCAN_ELEMENTS 7
+	unsigned int scan_offsets[INV_MPU6050_MAX_SCAN_ELEMENTS];
+	unsigned int scan_lengths[INV_MPU6050_MAX_SCAN_ELEMENTS];
+	bool realign_required;
 };

 /*register and associated bit definition*/
@@ -274,6 +279,9 @@ enum inv_mpu6050_scan {
 	INV_MPU6050_SCAN_TIMESTAMP,
 };

+#define INV_MPU6050_SCAN_MASK_ACCEL	0x07
+#define INV_MPU6050_SCAN_MASK_GYRO	0x38
+
 enum inv_mpu6050_filter_e {
 	INV_MPU6050_FILTER_256HZ_NOLPF2 = 0,
 	INV_MPU6050_FILTER_188HZ,

diff --git a/drivers/iio/imu/inv_mpu6050/inv_mpu_ring.c b/drivers/iio/imu/inv_mpu6050/inv_mpu_ring.c
index 56ee1e2..49e503c 100644
--- a/drivers/iio/imu/inv_mpu6050/inv_mpu_ring.c
+++ b/drivers/iio/imu/inv_mpu6050/inv_mpu_ring.c
@@ -24,6 +24,90 @@
 #include
 #include "inv_mpu_iio.h"

+static void inv_mpu_get_scan_offsets(
+	struct iio_dev *indio_dev,
+	const unsigned long *mask,
+	const unsigned int masklen,
+	unsigned int *scan_offsets)
+{
+	unsigned int pos = 0;
+
+	if (*mask & INV_MPU6050_SCAN_MASK_ACCEL) {
+		scan_offsets[INV_MPU6050_SCAN_ACCL_X] = pos + 0;
+		scan_offsets[INV_MPU6050_SCAN_ACCL_Y] = pos + 2;
+		scan_offsets[INV_MPU6050_SCAN_ACCL_Z] = pos + 4;
+		pos += 6;
+	}
+	if (*mask & INV_MPU6050_SCAN_MASK_GYRO) {
+		scan_offsets[INV_MPU6050_SCAN_GYRO_X] = pos + 0;
+		scan_offsets[INV_MPU6050_SCAN_GYRO_Y] = pos + 2;
+		scan_offsets[INV_MPU6050_SCAN_GYRO_Z] = pos + 4;
+		pos += 6;
+	}
+}
+
+/* This is slowish but relatively common */
+static const struct iio_chan_spec *
+iio_chan_by_scan_index(struct iio_dev *indio_dev, int index)
+{
+	int i;
+
+	for (i = 0; i < indio_dev->num_channels; ++i)
+		if (indio_dev->channels[i].scan_index == index)
+			return &indio_dev->channels[i];
+
+	return NULL;
+}
+
+static int iio_check_scan_offsets_aligned(
+	struct iio_dev *indio_dev,
+	const unsigned long *mask,
+	unsigned int masklen,
+	unsigned int *scan_offsets)
+{
+	int scan_index;
+	unsigned int pos = 0;
+	const struct iio_chan_spec *chan;
+
+	for_each_set_bit(scan_index, mask, masklen) {
+		chan = iio_chan_by_scan_index(indio_dev, scan_index);
+		if (scan_offsets[scan_index] != pos)
+			return false;
+		pos = ALIGN(pos + chan->scan_type.storagebits / 8,
+			    chan->scan_type.storagebits / 8);
+	}
+
+	return true;
+}
+
+static void iio_get_scan_lengths(struct iio_dev *indio_dev, unsigned int *scan_length)
+{
+	int i;
+	const struct iio_chan_spec *chan;
+
+	for (i = 0; i < indio_dev->num_channels; ++i) {
+		chan = &indio_dev->channels[i];
+		if (chan->scan_index < 0)
+			continue;
+		scan_length[chan->scan_index] = chan->scan_type.storagebits / 8;
+	}
+}
+
+static void iio_realign_sample(
+	struct iio_dev *indio_dev,
+	const u8 *ibuf, u8 *obuf,
+	const unsigned int *scan_offset,
+	const unsigned int *scan_lengths)
+{
+	unsigned int pos = 0;
+	int i;
+
+	for_each_set_bit(i, indio_dev->active_scan_mask, indio_dev->masklength) {
+		memcpy(&obuf[pos], &ibuf[scan_offset[i]], scan_lengths[i]);
+		pos = ALIGN(pos + scan_lengths[i], scan_lengths[i]);
+	}
+}
+
 static void inv_clear_kfifo(struct inv_mpu6050_state *st)
 {
 	unsigned long flags;
@@ -38,7 +122,9 @@ int inv_reset_fifo(struct iio_dev *indio_dev)
 {
 	int result;
 	u8 d;
-	struct inv_mpu6050_state *st = iio_priv(indio_dev);
+	struct inv_mpu6050_state *st = iio_priv(indio_dev);
+	const unsigned long *mask = indio_dev->active_scan_mask;
+	unsigned int masklen = indio_dev->masklength;

 	/* disable interrupt */
Re: [PATCH] ARM: exynos: don't select keyboard driver
On 05/18/2016 04:17 PM, Arnd Bergmann wrote:
> The samsung-keypad driver is implicitly selected by ARCH_EXYNOS4 (why?),
> but this fails if CONFIG_INPUT is a loadable module:
>
> drivers/input/built-in.o: In function `samsung_keypad_remove':
> drivers/input/keyboard/samsung-keypad.c:461: undefined reference to `input_unregister_device'
> drivers/input/built-in.o: In function `samsung_keypad_irq':
> drivers/input/keyboard/samsung-keypad.c:137: undefined reference to `input_event'
> drivers/input/built-in.o: In function `samsung_keypad_irq':
> include/linux/input.h:389: undefined reference to `input_event'
> drivers/input/built-in.o: In function `samsung_keypad_probe':
> drivers/input/keyboard/samsung-keypad.c:358: undefined reference to `devm_input_allocate_device'
> drivers/input/built-in.o:(.debug_addr+0x34): undefined reference to `input_set_capability'
>
> This removes the 'select' as suggested by Krzysztof Kozlowski and
> instead enables the driver from the defconfig files.
>
> The problem does not happen on mainline kernels, as we don't normally
> build built-in input drivers when CONFIG_INPUT=m, but I am experimenting
> with a patch to change this, and the samsung keypad driver showed up
> as one example that was silently broken before.
>
> Signed-off-by: Arnd Bergmann
> Link: https://lkml.org/lkml/2016/2/14/55
> ---
>  arch/arm/configs/exynos_defconfig   | 1 +
>  arch/arm/configs/multi_v7_defconfig | 1 +
>  arch/arm/mach-exynos/Kconfig        | 1 -
>  3 files changed, 2 insertions(+), 1 deletion(-)

Thanks for the update. Applied for v4.7 (same reason as in plat-samsung/devs.c).

Best regards,
Krzysztof
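For context, the shape of the change is roughly the following. This is a sketch reconstructed from the diffstat above, not a quote of the patch itself, so the exact Kconfig lines may differ:

```
# arch/arm/mach-exynos/Kconfig: drop the implicit select from ARCH_EXYNOS4
-	select KEYBOARD_SAMSUNG if INPUT_KEYBOARD

# arch/arm/configs/exynos_defconfig and multi_v7_defconfig: enable the
# driver explicitly instead, so CONFIG_INPUT=m builds no longer try to
# link a built-in keypad driver against modular input core symbols
+CONFIG_KEYBOARD_SAMSUNG=y
```

Moving the choice into the defconfigs keeps the driver enabled for the boards that want it while letting Kconfig's normal dependency handling (KEYBOARD_SAMSUNG depends on INPUT) prevent the invalid built-in/module combination.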
[RFC v2 7/7] iio: inv_mpu6050: Expose channels from slave sensors
This works by copying the iio_chan_specs from the slave device and republishing them as if they belonged to the MPU itself. All read/write_raw operations are forwarded to the other driver. The original device is still registered with linux as a normal driver and works normally and you can poke at it to configure stuff like sample rates and scaling factors. Buffer values are read from the MPU fifo, allowing a much higher sampling rate. Signed-off-by: Crestez Dan Leonard--- .../devicetree/bindings/iio/imu/inv_mpu6050.txt| 47 ++- drivers/iio/imu/inv_mpu6050/inv_mpu_core.c | 387 - drivers/iio/imu/inv_mpu6050/inv_mpu_iio.h | 74 +++- drivers/iio/imu/inv_mpu6050/inv_mpu_ring.c | 65 +++- drivers/iio/imu/inv_mpu6050/inv_mpu_trigger.c | 9 + 5 files changed, 556 insertions(+), 26 deletions(-) diff --git a/Documentation/devicetree/bindings/iio/imu/inv_mpu6050.txt b/Documentation/devicetree/bindings/iio/imu/inv_mpu6050.txt index 778d076..07d572a 100644 --- a/Documentation/devicetree/bindings/iio/imu/inv_mpu6050.txt +++ b/Documentation/devicetree/bindings/iio/imu/inv_mpu6050.txt @@ -11,8 +11,8 @@ Optional properties: - mount-matrix: an optional 3x3 mounting rotation matrix - invensense,i2c-aux-master: operate aux i2c in "master" mode (default is bypass). -The MPU has an auxiliary i2c bus for additional sensors. Devices attached this -way can be described as for a regular linux i2c bus. +It is possible to attach auxiliary sensors to the MPU and have them be handled +by linux. Those auxiliary sensors are described as an i2c bus. It is possible to interact with aux devices in "bypass" or "master" mode. In "bypass" mode the auxiliary SDA/SCL lines are connected directly to the main i2c @@ -25,7 +25,31 @@ In "master" mode aux devices are listed behind a "i2c@1" node with reg = <1>; The master and bypass modes are not supported at the same time. The "invensense,i2c-aux-master" property must be set to activate master mode. 
Bypass mode is generally faster but master mode also works when the MPU is -connected via SPI. +connected via SPI. Enabling master mode is required for automated external +readings. + + +It is possible to configure the MPU to automatically fetch reading for +devices connected on the auxiliary i2c bus without CPU intervention. When this +is done the driver will present the channels of the slaved devices as if they +belong to the MPU device itself. + +Reads and write to config values (like scaling factors) are forwarded to the +other driver while data reads are handled from MPU registers. The channels are +also available through the MPU's iio triggered buffer mechanism. This feature +can allow sampling up to 24 bytes from 4 slaves at a much higher rate. + +This is specified in device tree as an "invensense,external-sensors" node with +children indexed by slave ids 0 to 3. The child nodes identifying each external +sensor reading have the following properties: + - reg: 0 to 3 slave index + - invensense,addr : the I2C address to read from + - invensense,reg : the I2C register to read from + - invensense,len : read length from the device + - invensense,external-channels : list of slaved channels specified by scan_index. + +The sum of storagebits for external channels must equal the read length. Only +16bit channels are currently supported. 
Example: mpu6050@68 { @@ -65,7 +89,8 @@ Example describing mpu9150 (which includes an ak9875 on chip): }; }; -Example describing a mpu6500 via SPI with an hmc5883l on auxiliary i2c: +Example describing a mpu6500 via SPI with an hmc5883l on aux i2c configured for +automatic external readings as slave 0: mpu6500@0 { compatible = "invensense,mpu6500"; reg = <0x0>; @@ -73,6 +98,8 @@ Example describing a mpu6500 via SPI with an hmc5883l on auxiliary i2c: interrupt-parent = <>; interrupts = <31 1>; + invensense,i2c-aux-master; + #address-cells = <1>; #size-cells = <0>; i2c@1 { @@ -85,4 +112,16 @@ Example describing a mpu6500 via SPI with an hmc5883l on auxiliary i2c: reg = <0x1e>; }; }; + + invensense,external-sensors { + #address-cells = <1>; + #size-cells = <0>; + hmc5833l@0 { + reg = <0>; + invensense,addr = <0x1e>; + invensense,reg = <3>; + invensense,len = <6>; + invensense,external-channels = <0 1 2>; + }; + }; }; diff --git a/drivers/iio/imu/inv_mpu6050/inv_mpu_core.c b/drivers/iio/imu/inv_mpu6050/inv_mpu_core.c index 76683b8..715b2e4 100644 ---
[PATCH] thermal: add the note for set_trip_temp
Fixes commit 60f9ce3ada53 ("thermal: of-thermal: allow setting trip_temp on hardware") Signed-off-by: Caesar Wang Cc: Zhang Rui Cc: Eduardo Valentin Cc: linux...@vger.kernel.org --- include/linux/thermal.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/include/linux/thermal.h b/include/linux/thermal.h index e45abe7..ee517be 100644 --- a/include/linux/thermal.h +++ b/include/linux/thermal.h @@ -335,6 +335,8 @@ struct thermal_genl_event { * @get_trend: a pointer to a function that reads the sensor temperature trend. * @set_emul_temp: a pointer to a function that sets sensor emulated * temperature. + * @set_trip_temp: a pointer to a function that sets the trip temperature on + * hardware. */ struct thermal_zone_of_device_ops { int (*get_temp)(void *, int *); -- 2.7.4
Re: UBSAN whinge in ihci-hub.c
2016-05-18 17:40 GMT+03:00 Alan Stern : > All right, I'm getting very tired of all these bug reports. Besides, > Andrey has a point: Unless you're Linus, arguing against the C standard > is futile. (Even though the language dialect used in the kernel is not > standard C.) > > Does this patch make UBSAN happy? The runtime overhead is minimal. > It does. However, you could fool ubsan way more easily: u32 __iomem *hostpc_reg = ehci->regs->hostpc + (wIndex & 0xff) - 1; > Alan Stern > > > > Index: usb-4.x/drivers/usb/host/ehci-hub.c > === > --- usb-4.x.orig/drivers/usb/host/ehci-hub.c > +++ usb-4.x/drivers/usb/host/ehci-hub.c > @@ -872,14 +872,17 @@ int ehci_hub_control( > ) { > struct ehci_hcd *ehci = hcd_to_ehci (hcd); > int ports = HCS_N_PORTS (ehci->hcs_params); > - u32 __iomem *status_reg = &ehci->regs->port_status[ > - (wIndex & 0xff) - 1]; > - u32 __iomem *hostpc_reg = &ehci->regs->hostpc[(wIndex & 0xff) - > 1]; > + u32 __iomem *status_reg, *hostpc_reg; > u32 temp, temp1, status; > unsigned long flags; > int retval = 0; > unsigned selector; > > + temp = wIndex & 0xff; > + temp -= (temp > 0); > + status_reg = &ehci->regs->port_status[temp]; > + hostpc_reg = &ehci->regs->hostpc[temp]; > + > /* > * FIXME: support SetPortFeatures USB_PORT_FEAT_INDICATOR. > * HCS_INDICATOR may say we can change LEDs to off/amber/green. >
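The UB being discussed is the pointer computation itself: forming `&regs[(wIndex & 0xff) - 1]` with `wIndex == 0` creates a pointer one element before the array, which is undefined even if it is never dereferenced on that path. A minimal userspace model of the patch's clamp-then-index fix follows; `port_status_reg` and `N_PORTS` are hypothetical stand-ins for the EHCI register arrays, not kernel code.

```c
#include <assert.h>
#include <stdint.h>

#define N_PORTS 8	/* stand-in for HCS_N_PORTS(ehci->hcs_params) */

/*
 * Clamp the 1-based wIndex to a valid 0-based array index *before*
 * forming the pointer, so the computed address always stays inside
 * the array. This mirrors the "temp -= (temp > 0)" trick in the
 * patch: wIndex 0 maps to slot 0, wIndex n maps to slot n - 1.
 */
static uint32_t *port_status_reg(uint32_t *regs, unsigned int windex)
{
	unsigned int temp = windex & 0xff;

	temp -= (temp > 0);	/* 0 -> 0, n -> n - 1 */
	return &regs[temp];
}
```

The branchless decrement keeps the runtime overhead to a single compare, which is why the quoted patch calls the cost minimal.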
Re: [PATCH v6 0/3] auxdisplay: Introduce ht16k33 driver
On Wed, May 18, 2016 at 02:23:16PM +0200, Robin van der Gracht wrote: > This patchset adds a new driver to the auxdisplay subsystem. It also adds > devicetree bindings documentation and a new vendor prefix. > > I added myself as maintainer to the MAINTAINERS file. First off, if you want me to apply patches, put me in the to: line, and say so, otherwise I don't know. Secondly, it's the middle of the merge window, and we can't take new patches into our trees (go read Documentation/development_model please), so this will have to wait until after 4.7-rc1 is out. Thirdly, I need an ack for the DT-related change before I can accept that, hopefully you included the correct people on it. And 4th, what is with the insane number of people on cc:? Use get_maintainer.pl correctly please, and don't just hit everyone you can possibly think of with a cc: for no good reason. I'll put this in my "to-review" queue to look at after 4.7-rc1 is out. thanks, greg k-h
[PATCH v2 0/7] iio: inv_mpu6050: Support i2c master and external readings
This series attempts to implement support for external readings in i2c master mode. Previous version here: https://www.spinics.net/lists/linux-iio/msg24502.html Patches 1 and 6 should be considered bugfixes. Patches 2,3,4 add regmap support and are mostly unchanged from v2 Patch 5 adds i2c master support Patch 6 adds external readings (which only works in i2c master mode) It might make sense to wait for Peter Rosin's patches cleaning up i2c mux locking before patch 5. Notable differences since previous versions: * Prefer inv_mpu_ over inv_mpu6050_ prefix * Remove indirection for SLV4 register names * Use the registered vendor prefix 'invensense' * Rearrange samples for scan mask instead of validating For i2c master: * Explicit read/write when caching addr * Support I2C_FUNC_SMBUS_BYTE (making i2cdetect work) * Handle xfer errors as reported in status registers For external channels support: * Only enable i2c slaves when required * Also forward write_raw for external channels * Drop handling read_raw from EXT_SENS registers * List external channels by scan index * Allow external channels with arbitrary sizes Crestez Dan Leonard (7): iio: inv_mpu6050: Do burst reads using spi/i2c directly iio: inv_mpu6050: Initial regcache support iio: inv_mpu6050: Only toggle DATA_RDY_EN in inv_reset_fifo iio: inv_mpu6050: Cache non-volatile bits of user_ctrl iio: inv_mpu6050: Add support for auxiliary I2C master iio: inv_mpu6050: Reformat sample for active scan mask iio: inv_mpu6050: Expose channels from slave sensors .../devicetree/bindings/iio/imu/inv_mpu6050.txt| 105 +++- drivers/iio/imu/inv_mpu6050/inv_mpu_core.c | 692 - drivers/iio/imu/inv_mpu6050/inv_mpu_i2c.c | 5 - drivers/iio/imu/inv_mpu6050/inv_mpu_iio.h | 115 +++- drivers/iio/imu/inv_mpu6050/inv_mpu_ring.c | 221 ++- drivers/iio/imu/inv_mpu6050/inv_mpu_spi.c | 5 - drivers/iio/imu/inv_mpu6050/inv_mpu_trigger.c | 24 +- 7 files changed, insertions(+), 56 deletions(-) -- 2.5.5
Re: [PATCH 2/2] MIPS: CPS: Copy EVA configuration when starting secondary VPs.
On Wed, May 18, 2016 at 03:45:22PM +0100, Matt Redfearn wrote: > When starting secondary VPEs which support EVA and the SegCtl registers, > copy the memory segmentation configuration from the running VPE to ensure > that all VPEs in the core have a consistent virtual memory map. > > The EVA configuration of secondary cores is dealt with when starting the > core via the CM. > > Signed-off-by: Matt Redfearn > --- > > arch/mips/kernel/cps-vec.S | 16 > 1 file changed, 16 insertions(+) > > diff --git a/arch/mips/kernel/cps-vec.S b/arch/mips/kernel/cps-vec.S > index ac81edd44563..07b3274c8ae1 100644 > --- a/arch/mips/kernel/cps-vec.S > +++ b/arch/mips/kernel/cps-vec.S > @@ -431,6 +431,22 @@ LEAF(mips_cps_boot_vpes) > mfc0 t0, CP0_CONFIG > mttc0 t0, CP0_CONFIG > > + /* Copy the EVA config from this VPE if the CPU supports it */ > + mfc0 t0, CP0_CONFIG, 1 > + bgez t0, 1f > + mfc0 t0, CP0_CONFIG, 2 > + bgez t0, 1f > + mfc0 t0, CP0_CONFIG, 3 > + and t0, t0, MIPS_CONF3_SC > + beqz t0, 1f > + nop Hi Matt, The checks here aren't *quite* right since they do the mfc0 of the next register in the delay slot which will happen even if the M bit of the preceding register wasn't set. There are other cases in cps-vec.S where I've made that mistake... Luckily, in this particular case, we know that we have MT ASE support which means we know that Config3 exists. So I think you can just remove the checks of Config1.M & Config2.M and just read Config3 straight away. Thanks, Paul > + mfc0 t0, CP0_SEGCTL0 > + mttc0 t0, CP0_SEGCTL0 > + mfc0 t0, CP0_SEGCTL1 > + mttc0 t0, CP0_SEGCTL1 > + mfc0 t0, CP0_SEGCTL2 > + mttc0 t0, CP0_SEGCTL2 > +1: > /* Ensure no software interrupts are pending */ > mttc0 zero, CP0_CAUSE > mttc0 zero, CP0_STATUS > -- > 2.5.0 >
[PATCH v2 4/7] iio: inv_mpu6050: Cache non-volatile bits of user_ctrl
Signed-off-by: Crestez Dan Leonard--- drivers/iio/imu/inv_mpu6050/inv_mpu_iio.h | 2 ++ drivers/iio/imu/inv_mpu6050/inv_mpu_ring.c| 9 ++--- drivers/iio/imu/inv_mpu6050/inv_mpu_trigger.c | 4 +++- 3 files changed, 11 insertions(+), 4 deletions(-) diff --git a/drivers/iio/imu/inv_mpu6050/inv_mpu_iio.h b/drivers/iio/imu/inv_mpu6050/inv_mpu_iio.h index 297b0ef..bd2c0fd 100644 --- a/drivers/iio/imu/inv_mpu6050/inv_mpu_iio.h +++ b/drivers/iio/imu/inv_mpu6050/inv_mpu_iio.h @@ -80,6 +80,7 @@ enum inv_devices { * @enable: master enable state. * @accl_fifo_enable: enable accel data output * @gyro_fifo_enable: enable gyro data output + * @user_ctrl:The non-volatile bits of user_ctrl * @fifo_rate:FIFO update rate. */ struct inv_mpu6050_chip_config { @@ -89,6 +90,7 @@ struct inv_mpu6050_chip_config { unsigned int enable:1; unsigned int accl_fifo_enable:1; unsigned int gyro_fifo_enable:1; + u8 user_ctrl; u16 fifo_rate; }; diff --git a/drivers/iio/imu/inv_mpu6050/inv_mpu_ring.c b/drivers/iio/imu/inv_mpu6050/inv_mpu_ring.c index 3fc0b71..56ee1e2 100644 --- a/drivers/iio/imu/inv_mpu6050/inv_mpu_ring.c +++ b/drivers/iio/imu/inv_mpu6050/inv_mpu_ring.c @@ -53,13 +53,15 @@ int inv_reset_fifo(struct iio_dev *indio_dev) if (result) goto reset_fifo_fail; /* disable fifo reading */ - result = regmap_write(st->map, st->reg->user_ctrl, 0); + st->chip_config.user_ctrl &= ~INV_MPU6050_BIT_FIFO_EN; + result = regmap_write(st->map, st->reg->user_ctrl, + st->chip_config.user_ctrl); if (result) goto reset_fifo_fail; /* reset FIFO*/ result = regmap_write(st->map, st->reg->user_ctrl, - INV_MPU6050_BIT_FIFO_RST); + st->chip_config.user_ctrl | INV_MPU6050_BIT_FIFO_RST); if (result) goto reset_fifo_fail; @@ -76,8 +78,9 @@ int inv_reset_fifo(struct iio_dev *indio_dev) return result; } /* enable FIFO reading and I2C master interface*/ + st->chip_config.user_ctrl |= INV_MPU6050_BIT_FIFO_EN; result = regmap_write(st->map, st->reg->user_ctrl, - INV_MPU6050_BIT_FIFO_EN); + st->chip_config.user_ctrl); if 
(result) goto reset_fifo_fail; /* enable sensor output to FIFO */ diff --git a/drivers/iio/imu/inv_mpu6050/inv_mpu_trigger.c b/drivers/iio/imu/inv_mpu6050/inv_mpu_trigger.c index 1a6bad3..fc55923 100644 --- a/drivers/iio/imu/inv_mpu6050/inv_mpu_trigger.c +++ b/drivers/iio/imu/inv_mpu6050/inv_mpu_trigger.c @@ -74,7 +74,9 @@ static int inv_mpu6050_set_enable(struct iio_dev *indio_dev, bool enable) if (result) return result; - result = regmap_write(st->map, st->reg->user_ctrl, 0); + st->chip_config.user_ctrl &= ~INV_MPU6050_BIT_FIFO_EN; + result = regmap_write(st->map, st->reg->user_ctrl, + st->chip_config.user_ctrl); if (result) return result; -- 2.5.5
[PATCH v2 3/7] iio: inv_mpu6050: Only toggle DATA_RDY_EN in inv_reset_fifo
Signed-off-by: Crestez Dan Leonard--- drivers/iio/imu/inv_mpu6050/inv_mpu_ring.c| 13 - drivers/iio/imu/inv_mpu6050/inv_mpu_trigger.c | 3 ++- 2 files changed, 10 insertions(+), 6 deletions(-) diff --git a/drivers/iio/imu/inv_mpu6050/inv_mpu_ring.c b/drivers/iio/imu/inv_mpu6050/inv_mpu_ring.c index 8455af0..3fc0b71 100644 --- a/drivers/iio/imu/inv_mpu6050/inv_mpu_ring.c +++ b/drivers/iio/imu/inv_mpu6050/inv_mpu_ring.c @@ -41,7 +41,8 @@ int inv_reset_fifo(struct iio_dev *indio_dev) struct inv_mpu6050_state *st = iio_priv(indio_dev); /* disable interrupt */ - result = regmap_write(st->map, st->reg->int_enable, 0); + result = regmap_update_bits(st->map, st->reg->int_enable, + INV_MPU6050_BIT_DATA_RDY_EN, 0); if (result) { dev_err(regmap_get_device(st->map), "int_enable failed %d\n", result); @@ -68,8 +69,9 @@ int inv_reset_fifo(struct iio_dev *indio_dev) /* enable interrupt */ if (st->chip_config.accl_fifo_enable || st->chip_config.gyro_fifo_enable) { - result = regmap_write(st->map, st->reg->int_enable, - INV_MPU6050_BIT_DATA_RDY_EN); + result = regmap_update_bits(st->map, st->reg->int_enable, + INV_MPU6050_BIT_DATA_RDY_EN, + INV_MPU6050_BIT_DATA_RDY_EN); if (result) return result; } @@ -92,8 +94,9 @@ int inv_reset_fifo(struct iio_dev *indio_dev) reset_fifo_fail: dev_err(regmap_get_device(st->map), "reset fifo failed %d\n", result); - result = regmap_write(st->map, st->reg->int_enable, - INV_MPU6050_BIT_DATA_RDY_EN); + result = regmap_update_bits(st->map, st->reg->int_enable, + INV_MPU6050_BIT_DATA_RDY_EN, + INV_MPU6050_BIT_DATA_RDY_EN); return result; } diff --git a/drivers/iio/imu/inv_mpu6050/inv_mpu_trigger.c b/drivers/iio/imu/inv_mpu6050/inv_mpu_trigger.c index e8818d4..1a6bad3 100644 --- a/drivers/iio/imu/inv_mpu6050/inv_mpu_trigger.c +++ b/drivers/iio/imu/inv_mpu6050/inv_mpu_trigger.c @@ -69,7 +69,8 @@ static int inv_mpu6050_set_enable(struct iio_dev *indio_dev, bool enable) if (result) return result; - result = regmap_write(st->map, st->reg->int_enable, 0); + 
result = regmap_update_bits(st->map, st->reg->int_enable, + INV_MPU6050_BIT_DATA_RDY_EN, 0); if (result) return result; -- 2.5.5
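The change in patch 3/7 swaps `regmap_write()` of the whole `int_enable` register for `regmap_update_bits()`, a read-modify-write confined to the masked bit, so clearing DATA_RDY_EN no longer clobbers any other interrupt-enable bits. The toy model below illustrates that semantic; `update_bits` is a hypothetical stand-in, not the regmap API.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Model of the regmap_update_bits() semantic: change only the bits
 * selected by mask, leaving the rest of the register value intact.
 * regmap_write(reg, 0) would instead zero every bit in the register,
 * which is exactly the behavior the patch is avoiding.
 */
static void update_bits(uint8_t *reg, uint8_t mask, uint8_t val)
{
	*reg = (*reg & ~mask) | (val & mask);
}
```

Clearing bit 0 of a register holding 0xA5 this way yields 0xA4: the other seven bits survive, unlike a plain write of 0.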
[PATCH v2 4/7] iio: inv_mpu6050: Cache non-volatile bits of user_ctrl
Signed-off-by: Crestez Dan Leonard --- drivers/iio/imu/inv_mpu6050/inv_mpu_iio.h | 2 ++ drivers/iio/imu/inv_mpu6050/inv_mpu_ring.c| 9 ++--- drivers/iio/imu/inv_mpu6050/inv_mpu_trigger.c | 4 +++- 3 files changed, 11 insertions(+), 4 deletions(-) diff --git a/drivers/iio/imu/inv_mpu6050/inv_mpu_iio.h b/drivers/iio/imu/inv_mpu6050/inv_mpu_iio.h index 297b0ef..bd2c0fd 100644 --- a/drivers/iio/imu/inv_mpu6050/inv_mpu_iio.h +++ b/drivers/iio/imu/inv_mpu6050/inv_mpu_iio.h @@ -80,6 +80,7 @@ enum inv_devices { * @enable: master enable state. * @accl_fifo_enable: enable accel data output * @gyro_fifo_enable: enable gyro data output + * @user_ctrl:The non-volatile bits of user_ctrl * @fifo_rate:FIFO update rate. */ struct inv_mpu6050_chip_config { @@ -89,6 +90,7 @@ struct inv_mpu6050_chip_config { unsigned int enable:1; unsigned int accl_fifo_enable:1; unsigned int gyro_fifo_enable:1; + u8 user_ctrl; u16 fifo_rate; }; diff --git a/drivers/iio/imu/inv_mpu6050/inv_mpu_ring.c b/drivers/iio/imu/inv_mpu6050/inv_mpu_ring.c index 3fc0b71..56ee1e2 100644 --- a/drivers/iio/imu/inv_mpu6050/inv_mpu_ring.c +++ b/drivers/iio/imu/inv_mpu6050/inv_mpu_ring.c @@ -53,13 +53,15 @@ int inv_reset_fifo(struct iio_dev *indio_dev) if (result) goto reset_fifo_fail; /* disable fifo reading */ - result = regmap_write(st->map, st->reg->user_ctrl, 0); + st->chip_config.user_ctrl &= ~INV_MPU6050_BIT_FIFO_EN; + result = regmap_write(st->map, st->reg->user_ctrl, + st->chip_config.user_ctrl); if (result) goto reset_fifo_fail; /* reset FIFO*/ result = regmap_write(st->map, st->reg->user_ctrl, - INV_MPU6050_BIT_FIFO_RST); + st->chip_config.user_ctrl | INV_MPU6050_BIT_FIFO_RST); if (result) goto reset_fifo_fail; @@ -76,8 +78,9 @@ int inv_reset_fifo(struct iio_dev *indio_dev) return result; } /* enable FIFO reading and I2C master interface*/ + st->chip_config.user_ctrl |= INV_MPU6050_BIT_FIFO_EN; result = regmap_write(st->map, st->reg->user_ctrl, - INV_MPU6050_BIT_FIFO_EN); + st->chip_config.user_ctrl); if 
(result) goto reset_fifo_fail; /* enable sensor output to FIFO */ diff --git a/drivers/iio/imu/inv_mpu6050/inv_mpu_trigger.c b/drivers/iio/imu/inv_mpu6050/inv_mpu_trigger.c index 1a6bad3..fc55923 100644 --- a/drivers/iio/imu/inv_mpu6050/inv_mpu_trigger.c +++ b/drivers/iio/imu/inv_mpu6050/inv_mpu_trigger.c @@ -74,7 +74,9 @@ static int inv_mpu6050_set_enable(struct iio_dev *indio_dev, bool enable) if (result) return result; - result = regmap_write(st->map, st->reg->user_ctrl, 0); + st->chip_config.user_ctrl &= ~INV_MPU6050_BIT_FIFO_EN; + result = regmap_write(st->map, st->reg->user_ctrl, + st->chip_config.user_ctrl); if (result) return result; -- 2.5.5
[PATCH v2 3/7] iio: inv_mpu6050: Only toggle DATA_RDY_EN in inv_reset_fifo
Signed-off-by: Crestez Dan Leonard
---
 drivers/iio/imu/inv_mpu6050/inv_mpu_ring.c    | 13 +++++++++----
 drivers/iio/imu/inv_mpu6050/inv_mpu_trigger.c |  3 ++-
 2 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/drivers/iio/imu/inv_mpu6050/inv_mpu_ring.c b/drivers/iio/imu/inv_mpu6050/inv_mpu_ring.c
index 8455af0..3fc0b71 100644
--- a/drivers/iio/imu/inv_mpu6050/inv_mpu_ring.c
+++ b/drivers/iio/imu/inv_mpu6050/inv_mpu_ring.c
@@ -41,7 +41,8 @@ int inv_reset_fifo(struct iio_dev *indio_dev)
 	struct inv_mpu6050_state *st = iio_priv(indio_dev);

 	/* disable interrupt */
-	result = regmap_write(st->map, st->reg->int_enable, 0);
+	result = regmap_update_bits(st->map, st->reg->int_enable,
+				    INV_MPU6050_BIT_DATA_RDY_EN, 0);
 	if (result) {
 		dev_err(regmap_get_device(st->map), "int_enable failed %d\n",
 			result);
@@ -68,8 +69,9 @@ int inv_reset_fifo(struct iio_dev *indio_dev)
 	/* enable interrupt */
 	if (st->chip_config.accl_fifo_enable ||
 	    st->chip_config.gyro_fifo_enable) {
-		result = regmap_write(st->map, st->reg->int_enable,
-				      INV_MPU6050_BIT_DATA_RDY_EN);
+		result = regmap_update_bits(st->map, st->reg->int_enable,
+					    INV_MPU6050_BIT_DATA_RDY_EN,
+					    INV_MPU6050_BIT_DATA_RDY_EN);
 		if (result)
 			return result;
 	}
@@ -92,8 +94,9 @@ int inv_reset_fifo(struct iio_dev *indio_dev)
 reset_fifo_fail:
 	dev_err(regmap_get_device(st->map), "reset fifo failed %d\n", result);
-	result = regmap_write(st->map, st->reg->int_enable,
-			      INV_MPU6050_BIT_DATA_RDY_EN);
+	result = regmap_update_bits(st->map, st->reg->int_enable,
+				    INV_MPU6050_BIT_DATA_RDY_EN,
+				    INV_MPU6050_BIT_DATA_RDY_EN);
 	return result;
 }

diff --git a/drivers/iio/imu/inv_mpu6050/inv_mpu_trigger.c b/drivers/iio/imu/inv_mpu6050/inv_mpu_trigger.c
index e8818d4..1a6bad3 100644
--- a/drivers/iio/imu/inv_mpu6050/inv_mpu_trigger.c
+++ b/drivers/iio/imu/inv_mpu6050/inv_mpu_trigger.c
@@ -69,7 +69,8 @@ static int inv_mpu6050_set_enable(struct iio_dev *indio_dev, bool enable)
 		if (result)
 			return result;

-		result = regmap_write(st->map, st->reg->int_enable, 0);
+		result = regmap_update_bits(st->map, st->reg->int_enable,
+					    INV_MPU6050_BIT_DATA_RDY_EN, 0);
 		if (result)
 			return result;
-- 
2.5.5