from:"Russell King \- ARM Linux"

Re: [Call for participation] Bi-Weekly KVM/ARM Technical Sync-up

2013-08-21 Thread Russell King - ARM Linux

On Wed, Aug 21, 2013 at 05:09:39PM -0700, Christoffer Dall wrote:
 Linaro is going to host a bi-weekly sync-up call for technical issues on
 KVM/ARM development.  The KVM 32-bit and 64-bit maintainers as well as
 the QEMU ARM maintainer will typically be on the call.
 
 The first call will be held Tuesday August 27th.

I'll point out that I don't do Tuesdays for phone calls (it's one of the
days I regularly take as weekend time) so you'll never be able to
invite me if you keep this on Tuesdays.

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: [PATCH 1/5] clk: allow reentrant calls into the clk framework

2013-03-18 Thread Russell King - ARM Linux

On Mon, Mar 18, 2013 at 01:15:51PM -0700, Mike Turquette wrote:
 Quoting Ulf Hansson (2013-02-28 01:54:34)
  On 28 February 2013 05:49, Mike Turquette mturque...@linaro.org wrote:
   @@ -703,10 +744,29 @@ int clk_enable(struct clk *clk)
   unsigned long flags;
   int ret;
  
   +   /* this call re-enters if it is from the same context */
   +   if (spin_is_locked(enable_lock) || 
   mutex_is_locked(prepare_lock)) {
   +   if ((void *) atomic_read(context) == get_current()) {
   +   ret = __clk_enable(clk);
   +   goto out;
   +   }
   +   }
  
  I beleive the clk_enable|disable code will be racy. What do you think
  about this scenario:
  
  1. Thread 1, calls clk_prepare - clk is not reentrant - mutex_lock
  - set_context to thread1.
  2. Thread 2, calls clk_enable - above if will mean that get_current
  returns thread 1 context and then clk_enable continues -
  spin_lock_irqsave - set_context to thread 2.
  3. Thread 1 continues and triggers a reentancy for clk_prepare - clk
  is not reentrant (since thread 2 has set a new context) - mutex_lock
  and we will hang forever.
  
  Do you think above scenario could happen?
  
  I think the solution would be to invent another static atomic_t
  context; which is used only for fast path functions
  (clk_enable|disable). That should do the trick I think.
 
 Ulf,
 
 You are correct.  In fact I have a branch that has two separate context
 pointers, one for mutex-protected functions and one for
 spinlock-protected functions.  Somehow I managed to discard that change
 before settling on the final version that was published.

Err.

Do not forget one very important point.

Any clock which has clk_enable() called on it must have had clk_prepare()
already called _and_ completed.  A second clk_prepare() call on the same
clock should be a no-op other than to increase the prepare reference count
on it.

If you do anything else, you are going to get into sticky problems.

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: [RFC 1/1] clk: Add notifier support in clk_prepare_enable/clk_disable_unprepare

2013-03-15 Thread Russell King - ARM Linux

On Tue, Mar 12, 2013 at 06:47:53PM -0700, Bill Huang wrote:
 On Tue, 2013-03-12 at 21:40 +0800, Russell King - ARM Linux wrote:
  On Tue, Mar 12, 2013 at 05:37:41AM -0700, Bill Huang wrote:
   Add the below four notifier events so drivers which are interested in
   knowing the clock status can act accordingly. This is extremely useful
   in some of the DVFS (Dynamic Voltage Frequency Scaling) design.
   
   PRE_CLK_ENABLE
   POST_CLK_ENABLE
   PRE_CLK_DISABLE
   POST_CLK_DISABLE
   
   Signed-off-by: Bill Huang bilhu...@nvidia.com
  
  NAK.  *Sigh* NO, this is the wrong level to be doing stuff like this.
  
  The *ONLY* thing that clk_prepare_enable() and clk_prepare_disable() should
  *EVER* be doing is calling clk_prepare(), clk_enable(), clk_disable() and
  clk_unprepare().  Those two functions are *merely* helpers for drivers
  who don't wish to make the individual calls.
  
  Drivers are still completely free to call the individual functions, at
  which point your proposal breaks horribly - and they _do_ call the
  individual functions.

 I'm proposing to give device driver a choice when it knows that some
 driver might be interested in knowing its clock's enabled/disabled state
 change at runtime, this is very important for centralized DVFS core
 driver. It is not meant to be covering all cases especially for drivers
 which is not part of the DVFS, so we don't care if it is calling
 clk_enable/disable directly or not.

But you're not giving drivers a choice.  You're giving them an ultimatum.
Either they use clk_prepare_enable() which must only be called from non-
atomic contexts and have the notifiers, or if they need to use the
individual functions (which is what they _should_ be doing but people
are too lazy to properly convert stuff) they don't get the option of
the notifiers at all.

This sucks totally, design wise.

The whole point of clk_prepare_enable() is that it is a helper function
to _only_ do the clk_prepare() call followed by a clk_enable() call and
_nothing_ _else_ _what_ _so_ _ever_.

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: [RFC 1/1] clk: Add notifier support in clk_prepare_enable/clk_disable_unprepare

2013-03-15 Thread Russell King - ARM Linux

On Tue, Mar 12, 2013 at 10:42:30PM -0600, Stephen Warren wrote:
 I believe the point Russell is making is not that the idea behind this
 patch is wrong, but simply that the function where you put the hooks is
 wrong. The hooks should at least be in clk_enable/clk_disable and not

Indeed, remembering that clk_enable/clk_disable can be called from
atomic contexts.

If the hook needs to be non-atomic (iow, it can schedule) then it can't
go into clk_enable/clk_disable, and must go into clk_prepare/clk_unprepare,
which is the schedulable half of that API.

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: [RFC 1/1] clk: Add notifier support in clk_prepare_enable/clk_disable_unprepare

2013-03-15 Thread Russell King - ARM Linux

On Tue, Mar 12, 2013 at 10:40:04PM -0700, Bill Huang wrote:
 That will be too bad, it looks like we deadlock in the mechanism, we
 cannot change existing drivers behavior (that means some call
 clk_disable/enable directly, some are not), and we cannot hook notifier
 in clk_disable/enable either, that means there seems no any chance to
 get what we want, any idea?

Look, the whole point is:

- Drivers can call clk_enable/clk_disable from their atomic regions to
  control the clock.  Drivers which do this also call clk_prepare/
  clk_unprepare from a schedulable context to perform any operations
  necessary to allow the clock to be used.

- Drivers which only ever control the clock from a schedulable context
  *can* use clk_prepare_enable()/clk_disable_unprepare() to control
  their clock, which simplifies the coding in the driver.

The whole point here is to cater for what is found on different SoCs and
not need to keep rewriting the drivers between different SoCs.

So, the idea is that:

- clk_prepare() does whatever is needed to prepare a clock for use which
  may require waiting for the clock to be in a state which it can be
  enabled.  In other words, if there is a PLL, the PLL is setup and
  we wait for it to report that it has locked.

- clk_enable() is about turning the clock output on so that the device
  receives the clock.

Now, in the case of a PLL directly feeding a device, it's entirely possible
that clk_prepare() ends up providing the clock signal to the device, and
clk_enable() does absolutely nothing.

Or, if the clock has a gate on it, it's entirely possible that clk_prepare()
does nothing, and clk_enable() unmasks the gate to allow the clock to be
provided to the device - which can happen from atomic contexts.

The whole point about the separation of these two functions is that device
driver writers _can_ code their drivers for both situations and not care
about how the SoC implements the clocking at all.

Why did we end up with this split in the first place?  Because we ran into
the problem that some SoCs required a sleeping clk_enable() and others
didn't, and the whole thing was turning into an incompatible mess.

So, please.  Realise that clk_prepare() and clk_enable() are the _official_
APIs, and that clk_prepare_enable() is merely a helper function for drivers
to allow them to automate the calling of those two functions in succession
with _no_ _further_ _processing_ at all.

So, if your hooks need to be callable from schedulable contexts, then you
need to put them inside clk_prepare().  If your hooks are callable from
atomic contexts, then they can go into clk_enable().  But what you can
not do is put them into clk_prepare_enable().

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: [RFC 1/1] clk: Add notifier support in clk_prepare_enable/clk_disable_unprepare

2013-03-15 Thread Russell King - ARM Linux

On Thu, Mar 14, 2013 at 11:22:47PM -0600, Stephen Warren wrote:
 Is clk_set_rate() only legal to call in non-atomic contexts then? The
 header file doesn't say, although I guess since many other functions
 explicitly say they can't, then by omission it can...

I think when all this was discussed alongside the clk_prepare/clk_unprepare
stuff, that yes, it was decided that clk_set_rate() was non-atomic only.

And yes, that is most definitely the case, because the first thing 
clk_set_rate() does is take a mutex - which will trigger a might_sleep()
complaint if ever called from an atomic context.

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: [RFC v2 1/1] clk: Add notifier support in clk_prepare/clk_unprepare

2013-03-15 Thread Russell King - ARM Linux

On Thu, Mar 14, 2013 at 02:31:04AM -0700, Bill Huang wrote:
 Add the below two notifier events so drivers which are interested in
 knowing the clock status can act accordingly. This is extremely useful
 in some of the DVFS (Dynamic Voltage Frequency Scaling) design.
 
 CLK_PREPARED
 CLK_UNPREPARED
 
 Signed-off-by: Bill Huang bilhu...@nvidia.com
 ---
  drivers/clk/clk.c   |3 +++
  include/linux/clk.h |2 ++
  2 files changed, 5 insertions(+)
 
 diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
 index ed87b24..3292cec 100644
 --- a/drivers/clk/clk.c
 +++ b/drivers/clk/clk.c
 @@ -550,6 +550,7 @@ void clk_unprepare(struct clk *clk)
  {
   mutex_lock(prepare_lock);
   __clk_unprepare(clk);
 + __clk_notify(clk, CLK_UNPREPARED, clk-rate, clk-rate);
   mutex_unlock(prepare_lock);
  }
  EXPORT_SYMBOL_GPL(clk_unprepare);
 @@ -598,6 +599,8 @@ int clk_prepare(struct clk *clk)
  
   mutex_lock(prepare_lock);
   ret = __clk_prepare(clk);
 + if (!ret)
 + __clk_notify(clk, CLK_PREPARED, clk-rate, clk-rate);

So, on prepare, we notify after we've prepared the clock.  On unprepare,
we notify after the clock has been shut down.  Are you sure that's the
correct ordering?  Would it not be better to view it in a stack-like
fashion, iow:

get
prepare
notify-prepare
enable
disable
notify-unprepare
unprepare
put

?

 diff --git a/include/linux/clk.h b/include/linux/clk.h
 index b3ac22d..16c1d92 100644
 --- a/include/linux/clk.h
 +++ b/include/linux/clk.h
 @@ -43,6 +43,8 @@ struct clk;
  #define PRE_RATE_CHANGE  BIT(0)
  #define POST_RATE_CHANGE BIT(1)
  #define ABORT_RATE_CHANGEBIT(2)
 +#define CLK_PREPARED BIT(3)
 +#define CLK_UNPREPARED   BIT(4)

This implies that we're only going to have a maximum of 32 reason codes
here.  Is that enough?

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: [RFC 1/1] clk: Add notifier support in clk_prepare_enable/clk_disable_unprepare

2013-03-12 Thread Russell King - ARM Linux

On Tue, Mar 12, 2013 at 05:37:41AM -0700, Bill Huang wrote:
 Add the below four notifier events so drivers which are interested in
 knowing the clock status can act accordingly. This is extremely useful
 in some of the DVFS (Dynamic Voltage Frequency Scaling) design.
 
 PRE_CLK_ENABLE
 POST_CLK_ENABLE
 PRE_CLK_DISABLE
 POST_CLK_DISABLE
 
 Signed-off-by: Bill Huang bilhu...@nvidia.com

NAK.  *Sigh* NO, this is the wrong level to be doing stuff like this.

The *ONLY* thing that clk_prepare_enable() and clk_prepare_disable() should
*EVER* be doing is calling clk_prepare(), clk_enable(), clk_disable() and
clk_unprepare().  Those two functions are *merely* helpers for drivers
who don't wish to make the individual calls.

Drivers are still completely free to call the individual functions, at
which point your proposal breaks horribly - and they _do_ call the
individual functions.

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: [PATCH v9 2/2] video: drm: exynos: Add pinctrl support to fimd

2013-02-28 Thread Russell King - ARM Linux

On Thu, Feb 28, 2013 at 11:03:57PM +0100, Sylwester Nawrocki wrote:
 Please just use IS_ERR(), let's stop this IS_ERR_OR_NULL() insanity.

Yes, indeed.

On that topic (and off-topic for this thread, sorry) I've committed
a set of patches to remove most users of IS_ERR_OR_NULL() from arch/arm.
These are the last remaining ones there - and I don't want to see any
more appearing:

arch/arm/plat-samsung/clock.c:  if (IS_ERR_OR_NULL(clk))
arch/arm/plat-samsung/clock.c:  if (!IS_ERR_OR_NULL(clk)  clk-ops  
clk-ops-round_rate)
arch/arm/plat-samsung/clock.c:  if (IS_ERR_OR_NULL(clk))
arch/arm/plat-samsung/clock.c:  if (IS_ERR_OR_NULL(clk) || 
IS_ERR_OR_NULL(parent))
arch/arm/mach-imx/devices/platform-ipu-core.c:  if 
(IS_ERR_OR_NULL(imx_ipu_coredev))
arch/arm/mach-imx/devices/platform-ipu-core.c:  if 
(IS_ERR_OR_NULL(imx_ipu_coredev))
arch/arm/kernel/smp_twd.c:   * We use IS_ERR_OR_NULL() here, 
because if the clock stubs
arch/arm/kernel/smp_twd.c:  if (!IS_ERR_OR_NULL(twd_clk))

They currently all legal uses of it - though I'm sure that the samsung
clock uses can be reduced to just IS_ERR().  The IMX use looks valid
in that imx_ipu_coredev really can be an error pointer (on failure) or
NULL if the platform device hasn't yet been created.  The TWD one is
explained in the comments in the code (if people had to write explanations
for using IS_ERR_OR_NULL(), we'd probably have it used correctly!)

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: [RFC/PATCH 2/5] kernel_cpustat: convert to atomic 64-bit accessors

2013-02-21 Thread Russell King - ARM Linux

On Thu, Feb 21, 2013 at 10:53:07PM +0100, Frederic Weisbecker wrote:
 That too should be kcpustat_this_cpu_set(), or kcpustat_this_cpu_add()
 FWIW. But we probably don't need the overhead of atomic_add() that
 does a LOCK.
 atomic_set(var, atomic_read(var) + delta) would be better. All we need

You mean atomic64_set() and atomic64_read().  Looking at the generic
version in lib/atomic64.c, atomic64_add() is cheaper for 32-bit arches
because it doesn't involve taking the lock twice.

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: [Xen-devel] [PATCH 02/24] xen/arm: hypercalls

2012-07-27 Thread Russell King - ARM Linux

On Fri, Jul 27, 2012 at 03:39:31PM +0100, Ian Campbell wrote:
 On Fri, 2012-07-27 at 15:21 +0100, Russell King - ARM Linux wrote:
  On Fri, Jul 27, 2012 at 02:02:18PM +0100, Stefano Stabellini wrote:
  +/**
  + * hypercall.h
  + *
  + * Linux-specific hypervisor handling.
  + *
  + * Stefano Stabellini stefano.stabell...@eu.citrix.com, Citrix, 
  2012
  + *
  + * This program is free software; you can redistribute it and/or
  + * modify it under the terms of the GNU General Public License 
  version 2
  + * as published by the Free Software Foundation; or, when 
  distributed
  + * separately from the Linux kernel or incorporated into other
  + * software packages, subject to the following license:
  + *
  + * Permission is hereby granted, free of charge, to any person 
  obtaining a copy
  + * of this source file (the Software), to deal in the Software 
  without
  
  Erm, is that an additional restriction on the GPL which prevents me from
  shipping this code on a CD and charging for the act of creating the CD
  and shipping it?  That would technically make the above statement
  incompatible with the GPL.
 
 There's an or in there.
 
 The non-GPL alternative license is the standard one applied by upstream
 Xen to the interface headers:
 
 http://xenbits.xen.org/hg/xen-unstable.hg/file/tip/xen/include/public/COPYING
 
 It's the X11/MIT license IIRC, which the FSF say is GPL compatible.
 http://www.gnu.org/licenses/license-list.html#X11License
 
 The same license is used a few other places in the kernel, e.g. the DRM
 code.

Ok, but be aware that you won't be able to take code from the Linux
kernel and place it in a file marked with that license header (because
the code authors haven't given permission for it to be placed under any
other license other than GPLv2.)

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: [PATCH] ARM: EXYNOS: PD: Fix duplicate variable

2012-05-13 Thread Russell King - ARM Linux

On Wed, May 09, 2012 at 01:50:28PM +0100, Sangwook Lee wrote:
 struct generic_pm_domain already has a field for name. Use that field
 instead of creating another field in struct exynos_pm_domain

Argh.  No.

 @@ -99,7 +98,7 @@ static __init int exynos_pm_dt_parse_domains(void)
  
   if (of_get_property(np, samsung,exynos4210-pd-off, NULL))
   pd-is_off = true;
 - pd-name = np-name;
 + pd-pd.name = (char *)np-name;

Why this cast?  Why can't pd-pd.name be correctly typed in the first place?

I assume this is because np-name is const.  Never EVER cast away const.
It's wrong to do so.

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: Making ARM multiplatform kernels DT-only?

2012-05-04 Thread Russell King - ARM Linux

On Thu, May 03, 2012 at 11:31:13PM -0700, Deepak Saxena wrote:
 On 3 May 2012 07:04, Russell King - ARM Linux li...@arm.linux.org.uk wrote:
  On Thu, May 03, 2012 at 01:50:35PM +, Arnd Bergmann wrote:
  Hi everyone,
 
  I've been discussing multiplatform kernels with a few people recently,
  and we will have a lot of discussion sessions about this at Linaro
  Connect in Hong Kong.
 
  One question that came up repeatedly is whether we should support all
  possible board files for each platform in a multiplatform kernel,
  or just the ones that are already using DT probing. I would like
  to get a quick poll of opinions on that and I've tried to put those
  people on Cc that would be most impacted by this, i.e. the maintainers
  for platforms that have both DT and non-DT board files at the moment.
 
  My feeling is that we should just mandate DT booting for multiplatform
  kernels, because it significantly reduces the combinatorial space
  at compile time, avoids a lot of legacy board files that we cannot
  test anyway, reduces the total kernel size and gives an incentive
  for people to move forward to DT with their existing boards.
 
  The counterargument is that we won't be able to support all the
  boards we currently do when the user switches on multiplatform,
  but I think that is acceptable.
  Note that I would still want to allow users to build platforms
  separately in order to enable the ATAG style board files, even
  for platforms that are not multiplatform capable.
 
  I'm basing my comments off mach-zynq.
 
  How about we take the following steps towards it?
 
  1. create arch/arm/include/mach/ which contains standardized headers
    for DT based implementations.  This must include all headers included
    by asm/ or linux/ includes.  This will also be the only mach/ header
    directory included for code outside of arch/arm/mach-*.  This also
    acts as the 'default' set of mach/* includes for stuff like timex.h
    and the empty hardware.h
 
  2. DT based mach-* directories do not have an include directory; their
    include files must be located in the main include/ heirarchy if shared
    with other parts of the kernel, otherwise they must be in the mach-*
    directory.
 
 What do we do about all the mach-specific driver related headers that are
 currently littered all over arch/arm/mach*? Things like mach/mx3fb.h and
 mach/ohci.h?

Such platforms with that kind of stuff haven't fully converted to DT,
because these sorts of things are platform dependent yet aren't represented
in DT.

 Arnd (or maybe Nicolas?) suggested something along the lines
 of scripting something to figure out the constants required for a
 specific driver and pulling them out of mach/*.h and into that
 driver as a starting point.

But that doesn't solve the problem.  Take a USB driver where the register
layouts are different.  To have it work on two different platforms, you'd
need to build the driver twice.  Not only that, you'd also need to figure
out some way to register only one copy of that.

So really, the header file thing is just a sign of larger blocking issues
to getting those kinds of drivers working on a multiplatform kernel, and
no amount of header munging will sort it out.  The fact of that situation
is the driver concerned is _not_ multiplatform and shouldn't be built in
that situation until it is fixed.

  3. Allow build multiple mach-* directories (which we already do... see
    the samsung stuff.)
 
  We still have irqs.h being SoC dependent, and we still haven't taken
  debug-macros.S far enough along to get rid of that.  Then there's also
  the problem of uncompress.h.  The last piece of the puzzle is the common
  clock stuff.
 
 There's also some stuff outside of arch/arm that needs cleanup
 including USB driver
 refactoring and some other bits that I can't recall atm. Right now we
 can build one and
 only one host controller   inteface, even as modules. Not too
 difficult of a problem to
 solve and not critical to get the SOCs booting, but needed to be
 usable on real devices.

That's already a problem today, and has nothing to do with this current
issue - what I'm saying is the problem can't be made to go away through
header file stuff, so this issue is not relevant to this discussion.

That's not to say it doesn't need to be resolved, because it does.  It's
just no reason to argue against what I've said.

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: Making ARM multiplatform kernels DT-only?

2012-05-04 Thread Russell King - ARM Linux

On Thu, May 03, 2012 at 10:38:15PM -0700, Deepak Saxena wrote:
 I'm of the opinion that we support DT only platforms for
 multi-platform but this is based on the approach of only caring for
 multi-platform for newer systems and not worrying too much for legacy
 HW.

You do realise that you're advocating that we forcefully regress stuff
that works today.  Sorry, that's not going to happen.

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: Making ARM multiplatform kernels DT-only?

2012-05-04 Thread Russell King - ARM Linux

On Fri, May 04, 2012 at 03:20:57PM +0100, Wookey wrote:
 Debian tries very hard not to support anything in the kernel that
 upstream don't support in the kernel because otherwise it's way too
 much work. The current list of supplied arm kernels is:
 
 iop32x (ThecusN2100, intel SS4000, GLAN tank)
 ixp4xx (Linksys NSLU2)
 kirkwood (*plugs, QNAP NAS, OPenRD)
 orion5x (QNAP NAS, HP mv2120)
 versatile
 mx5
 omap
 
 because that's a good compromise between coverage and 'building 20-odd
 images'. I have no idea how much of that lot is going to get DTified,
 but I'm guessing the older stuff won't be?

Well, my understanding is that there's DT patches around for Versatile.
OMAP and MX5 are both heading for DT.  I'm less certain about the Orion
and Kirkwood stuff, but as they're only about 4 years old, I would hope
that there was some active movement for these.

The iop32x and ixp4xx stuff hasn't seen much in the way of maintenance
so its highly likely that these won't be converted to DT unless someone
with the hardware decides to step up and do it.

So, that means your list should reduce down to five kernels, or three if
the Orion/Kirkwood stuff gets converted to DT.

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: Making ARM multiplatform kernels DT-only?

2012-05-04 Thread Russell King - ARM Linux

On Fri, May 04, 2012 at 11:39:30AM -0500, Rob Herring wrote:
 Many of the headers are simply platform_data structs which may still be
 needed on DT platforms, but could be moved elsewhere.

Those should be in include/linux/platform.

  Then there's also the problem of uncompress.h.  The last piece of the
  puzzle is the common clock stuff.
 
 The smp/hotplug/localtimer related functions are still global. Marc Z
 has posted patches for this, but I haven't seen recent activity. This
 and clocks were the 2 main issues I saw trying to build 2 platforms
 together. highbank and picoxcell could be built together since only
 highbank has clocks and smp.
 
 gpio.h is still required, but empty for most platforms.

Those empty gpio.h files are definitely a candidate for going into
arch/arm/include/mach/gpio.h, and then all those 12-byte mach/gpio.h can
be deleted (13 files).

We've not had any progress on the gpio.h issue since I did the last round
of cleanup; the next stage was to persuade SoC maintainers to get rid of
their optimized versions which aren't compatible with multi-platform
kernels.

I don't know if folk are expecting me to push that forwards or whether
there's someone else working on that aspect of it...

So this issue really does need to be progressed too.

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: Making ARM multiplatform kernels DT-only?

2012-05-04 Thread Russell King - ARM Linux

On Fri, May 04, 2012 at 10:03:40PM +0200, Linus Walleij wrote:
 On Fri, May 4, 2012 at 4:35 PM, Russell King - ARM Linux
 li...@arm.linux.org.uk wrote:
 
  Well, my understanding is that there's DT patches around for Versatile.
 
 Is there? There is some in-tree stuff, but haven't seen any other
 sign of patches.
[...]
 make it clear that noone ever tested an MMC card on a Versatile
 booted on real hardware using DT. And I strongly suspect there
 are more instances like that, it seems AACI, GPIO and I2C and
 I guess whatever you cannot test on QEMU is just unsupported.

Given that we've converted stuff like MMCI to use DT, it _should_ be
the case that we can just add the appropriate DT description to the
existing Versatile DT - we might need to create some new GPIO devices
for things like SYSMCI and get rid of the status function, or we
could just use the auxdata stuff in the mean time.

Either way, we've solved these problems on other platforms, and the
shared code has already been fixed or is being fixed for stuff like
Versatile Express.

Lets also not forget that the VIC code has already been converted to
DT.

So I don't think there's a big problem here - I think most of the
pieces of the jigsaw are in place through converting other platforms.

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: [PATCH 1/3] ARM: imx: Add common imx cpuidle init functionality.

2012-04-24 Thread Russell King - ARM Linux

On Tue, Apr 24, 2012 at 09:38:43AM +0800, Shawn Guo wrote:
 On Mon, Apr 23, 2012 at 10:45:02AM -0500, Rob Lee wrote:
   Let me try last time.  What about having a late_initcall hook in
   machine_desc?
  
   Also fine with me.
  
  
  Shall I add Shawn's patch to my imx cpuidle patchset or should the
  arch/arm/kernel/setup.c and arch.h changes be submitted separately?
  If separately, Shawn, did you want to submit this patch or should I?
  
 Strange.  Russell is not in the Cc list.  I remember I added Russell
 into Cc when I propose the idea.  Added him again.

I didn't see any message in this thread cc'd to me, but that's not to
say I hadn't already read this patch.  I don't have any comment against
it, but I do wonder how often this hook would be used.

We do seem to have quite a number of late_initcall()s in arch/arm/mach-*,
so it seems to be a good idea - provided someone's willing to convert all
those users of late_initcall()s.

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: [PATCH 1/3] ARM: imx: Add common imx cpuidle init functionality.

2012-04-24 Thread Russell King - ARM Linux

On Tue, Apr 24, 2012 at 10:40:35AM -0500, Rob Lee wrote:
 Thanks for the attention on this.  From what I've understood, I will
 send another submission that includes the imx cpuidle patchset and
 Shawn's device tree late initcall patchset and I'll provide
 explanation of the two separate patchsets in the cover sheet.

What I also want to see _along with_ the addition of the init_late
callback is a patch set from the _same_ people fixing up the rest
of arch/arm/mach-* to use it - you know, like what I do when I
introduce these kinds of things.

Otherwise I'm going to NAK the change.  We can't have new stuff
introduced in this way and the older platforms ignored.  Forward
progress across the board please.

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: [PATCH v7 1/3] Documentation: common clk API

2012-03-21 Thread Russell King - ARM Linux

On Wed, Mar 21, 2012 at 12:41:41PM -0700, Saravana Kannan wrote:
 The meaning of clk_enable/disable has been changed and they won't work  
 without calling clk_prepare/unprepare. So, these are definitely new  
 APIs. If it weren't new APIs, then none of the general drivers would  
 need to change.

Yes and no.  I disagree that the meaning of clk_enable/disable() has
changed.  It hasn't.  What has changed is the preconditions for calling
those functions, and necessarily so in the interest of being able to
unify the different implementations.

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: [PATCH-WIP 01/13] xen/arm: use r12 to pass the hypercall number to the hypervisor

2012-03-01 Thread Russell King - ARM Linux

On Wed, Feb 29, 2012 at 12:58:26PM +, Dave Martin wrote:
 On Wed, Feb 29, 2012 at 09:56:02AM +, Ian Campbell wrote:
  On Wed, 2012-02-29 at 09:34 +, Dave Martin wrote:
   On Tue, Feb 28, 2012 at 12:28:29PM +, Stefano Stabellini wrote:
  
I don't have a very strong opinion on which register we should use, but
I would like to avoid r7 if it is already actively used by gcc.
   
   But there is no framepointer for Thumb-2 code (?)
  
  Peter Maydell suggested there was:
   r7 is (used by gcc as) the Thumb frame pointer; I don't know if this
   makes it worth avoiding in this context.
  
  Sounds like it might be a gcc-ism, possibly a non-default option?
  
  Anyway, I think r12 will be fine for our purposes so the point is rather
  moot.
 
 Just had a chat with some tools guys -- apparently, when passing register
 arguments to gcc inline asms there really isn't a guarantee that those
 variables will be in the expected registers on entry to the inline asm.

The best you can do is:

register unsigned int foo asm(r7) = value;

asm(blah %0 : : r (foo));

and ensure that your assembly checks that %0 _is_ r7 and produces a build
error if not.  See the __asmeq() macro in asm/system.h to find out how to
do that.

This feature has been missing from ARM GCC for quite a long time - it's
used extensively on x86 GCC, where they have one register class per
register, so they can do stuff like:

asm(blah %0 : : a (value));

and be guaranteed that %0 will be eax.

 If you need a specific register, this means that you must set up that
 register explicitly inside the asm if you want a guarantee that the
 code will work:
 
   asm volatile (
   movw   r12, %[hvc_num]\n\t
   ...
   hvc#0
   :: [hvc_num] i (NUMBER) : r12
   );
 
 Of course, if you need to set up more than about 5 or 6 registers in
 this way, the doubled register footprint means that the compiler will
 have to start spilling stuff to the stack.

No it won't - it will barf instead - think about it.  If you're clobbering
r0 - r5, but need to pass in six values in registers, gcc can't use r0-r5
for that, so it must use the remaining registers.  It gets rather unhappy
with that, and starts erroring out (iirc 'too many reloads' or some similar
error.)  I've been there.

If you want to do it that way, your only option is to store them to memory
and pass the address of the block into the assembly, and reload them there.
Which is extremely sucky and inefficient.

Practically, the register variable plus asm() does seem to work, and seems
to work for virtually all gcc versions out there (there have been the odd
buggy version, but those bugs appear to get fixed.)


___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: [PATCH-WIP 01/13] xen/arm: use r12 to pass the hypercall number to the hypervisor

2012-03-01 Thread Russell King - ARM Linux

On Wed, Feb 29, 2012 at 02:44:24PM +, Ian Campbell wrote:
  If you need a specific register, this means that you must set up that
  register explicitly inside the asm if you want a guarantee that the
  code will work:
  
  asm volatile (
  movw   r12, %[hvc_num]\n\t
 
 Is gcc (or gas?) smart enough to optimise this away if it turns out that
 %[hvc_num] == r12?

No, and it won't do, because %[hvc_num] is specified in these operands:

  ...
  hvc#0
  :: [hvc_num] i (NUMBER) : r12

to be an integer, not a register.

 How are system calls implemented on the userspace side? I confess I
 don't know what the ARM syscall ABI looks like -- is it all registers or
 is some of it on the stack? It sounds like the solution ought to be
 pretty similar though.

All registers.  We have a few which take a pointer to an in memory array,
but those are for some old multiplexed syscalls.

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: [PATCH-WIP 01/13] xen/arm: use r12 to pass the hypercall number to the hypervisor

2012-03-01 Thread Russell King - ARM Linux

On Thu, Mar 01, 2012 at 10:27:02AM +, Dave Martin wrote:
 So, where there's a compelling reason to inline these things, we can use
 the existing techniques if we're alert to the risks.  But in cases where
 there isn't a compelling reason, aren't we just inviting fragility
 unnecessarily?

The practical experience from the kernel suggests that there isn't a
problem - that's not to say that future versions of gcc won't become
a problem, and that the compiler guys may refuse to fix it.

I think it's a feature that we should be pressing gcc guys for - it's
fairly fundamental to any programming which requires interfaces that
require certain args in certain registers, or receive results in
certain registers.

The options over this are basically:
1. refusing to upgrade to any version of gcc which does not allow
   registers-in-asm
2. doing the store-to-memory reload-in-asm thing
3. hand-coding veneers for every call to marshall the registers

Each of those has its down sides, but I suspect with (1), it may be
possible to have enough people applying pressure to the compiler guys
that they finally see sense on this matter.

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: [PATCH v2 3/4] arm/dts: OMAP4: Add mmc controller nodes and board data

2012-02-24 Thread Russell King - ARM Linux

On Fri, Feb 24, 2012 at 03:56:52PM +0530, Rajendra Nayak wrote:
 I don't know if we can, but even if we could, we take care of the same
 bootargs working on two boards (say sdp and panda) *if* on sdp I have my
 filesystem on eMMC and on panda I have it on external card.
 What happens if I want to use my filesystem on both boards on external
 card?

This is why the bootargs should be part of the information the boot
loader passes to the kernel, and not part of the kernel configuration
or DT.

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: [PATCH] gic : check if there are pending interrupts

2012-02-24 Thread Russell King - ARM Linux

On Fri, Feb 24, 2012 at 02:45:48PM +0100, Daniel Lezcano wrote:
 The following patch checks if there are pending interrupts on the gic.
 
 This function is needed for example for the ux500 cpuidle driver.
 When the A9 cores and the gic are decoupled from the PRCMU, the idle
 routine has to check if an interrupt is pending on the gic before entering
 in retention mode.

So what happens if an interrupt becomes pending after you've checked it
but before you've gone into retention mode?  This basically sounds like
one big racy idea to me.

Hardware which allows you to go into retention mode with interrupts
pending which should prevent it (or at least cause it to re-awake) just
sounds like a massive fail to me.

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: [PATCH v2 6/6] mmc: omap_hsmmc: Don't expect MMC1 to always have vmmc supply

2012-02-21 Thread Russell King - ARM Linux

On Tue, Feb 21, 2012 at 05:43:54PM +0530, S, Venkatraman wrote:
 On Tue, Feb 21, 2012 at 3:33 PM, Rajendra Nayak rna...@ti.com wrote:
  @@ -324,8 +302,8 @@ static int omap_hsmmc_reg_get(struct omap_hsmmc_host
  *host)
                         mmc_slot(host).ocr_mask = ocr_value;
                 } else {
                         if (!(mmc_slot(host).ocr_mask  ocr_value)) {
  -                               pr_err(MMC%d ocrmask %x is not
  supported\n,
  -                                       host-id,
  mmc_slot(host).ocr_mask);
  +                               pr_err(MMC ocrmask %x is not
  supported\n,
  +                                       mmc_slot(host).ocr_mask);

You're dropping the MMC number from these error messages.  It would be
much better to fix them instead.  Use dev_info(mmc_dev(host-mmc), blah
rather than pr_err().

Drivers should not be using pr_* unless they really do not have a struct
device.

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: Query: Multiple Mappings to Mem and ARMV6+

2012-02-20 Thread Russell King - ARM Linux

On Mon, Feb 20, 2012 at 09:48:56AM -0800, viresh kumar wrote:
 On Feb 20, 2012 4:31 PM, Catalin Marinas catalin.mari...@arm.com wrote:
 
  On 16 February 2012 18:14, viresh kumar viresh.li...@gmail.com wrote:
   On Thu, Feb 16, 2012 at 9:48 AM, Catalin Marinas
   catalin.mari...@arm.com wrote:
   The DMA API implementation on ARM takes care of the cache cleaning and
   invalidating.
  
   I believe that this is the reason why we have cache re-invalidation
   (we invalidated
   it in dma_map_*() earlier) in dma_unmap_*() calls for ARMv6+ for
   DMA_FROM_DEVICE.
   Am i Correct?
 
  Yes.
 
 
 But why isn't keeping only the second one sufficient? Why don't we remove
 it from dma_map_* routines?

Please don't think for a moment that anyone likes the idea of having to
walk over the cache twice for every DMA operation.  We don't.  But I can
assure you that it's very very necessary.

The first run through the affected cache lines on dma_map_*() is there
to get rid of any cache lines for the buffer which may be marked 'dirty'.
A dirty cache line can be evicted (written back) to memory at _any_ time.
If this occurs while the DMA controller is reading data from a device,
the results depend on the order which the particular cache line is written
by the DMA controller, and by the cache line eviction.

If the cache line eviction happens after the DMA controller has written,
the DMA'd data will be overwritten by old stale data.

So, we must get rid of these dirty cache lines.  We can do that either
by cleaning the cache lines or invalidating them.  We chose to invalidate
them because it makes very little difference.

For the case where the DMA controller needs to read from memory, we
obviously have to ensure that all data is pushed out of the cache for it
to be visible to the DMA controller.  For this case, merely cleaning the
cache is sufficient.

The second run through the affected cache lines is to prevent speculative
prefetches occuring while the DMA controller is on operation, which could
result in old data (before the DMA controller has written its data) being
read by the CPU, resulting again in old stale data.

I hope it's now clear why we need to run over the buffer twice.

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: Query: Multiple Mappings to Mem and ARMV6+

2012-02-16 Thread Russell King - ARM Linux

On Thu, Feb 16, 2012 at 05:15:20PM +, Catalin Marinas wrote:
 On Thu, Feb 16, 2012 at 04:41:02PM +, viresh kumar wrote:
  Sorry for starting the long old thread again, but i have to start it as i
  was a bit confused. :(
  
  We know that we can't have multiple mappings with different attributes
  to the same physical memory on ARMv6+ machines due to speculative
  prefetch.
  
  So, we have following kind of mappings in kernel now (please correct me
  if i am wrong):
  - Low Mem: Mapped at boot time to - Normal Cacheable - Bufferable
  - ioremap() - blocked on Low Mem, so that we don't create Device type 
  mapping
  to same mem
  - dma_alloc_coherent() and others:
   - Without DMA_MEM_BUFFERABLE selected - gives strongly ordered mem
(i.e. Non cacheable - Non Bufferable)
   - With DMA_MEM_BUFFERABLE selected - gives Normal - Non cacheable -
 Bufferable mapping
   - Maybe some other too...
  
  I have a doubt with the last mapping mentioned above. We have two
  mappings possibly to the same physical memory, with different
  attributes: One is Cacheable and other one is not.
  
  Is this allowed by ARM? Because the patch in which Russell blocked
  ioremap on Low Mem, he clearly mentioned that these attributes are
  also important and they should be same.
 
 Section A3.5.7 in the latest ARM ARM (revC) clarifies the mismatched

There's a new version?

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: Query: Multiple Mappings to Mem and ARMV6+

2012-02-16 Thread Russell King - ARM Linux

On Thu, Feb 16, 2012 at 05:29:28PM +, Catalin Marinas wrote:
 On Thu, Feb 16, 2012 at 05:22:42PM +, Russell King - ARM Linux wrote:
  On Thu, Feb 16, 2012 at 05:15:20PM +, Catalin Marinas wrote:
   On Thu, Feb 16, 2012 at 04:41:02PM +, viresh kumar wrote:
Sorry for starting the long old thread again, but i have to start it as 
i
was a bit confused. :(

We know that we can't have multiple mappings with different attributes
to the same physical memory on ARMv6+ machines due to speculative
prefetch.

So, we have following kind of mappings in kernel now (please correct me
if i am wrong):
- Low Mem: Mapped at boot time to - Normal Cacheable - Bufferable
- ioremap() - blocked on Low Mem, so that we don't create Device type 
mapping
to same mem
- dma_alloc_coherent() and others:
 - Without DMA_MEM_BUFFERABLE selected - gives strongly ordered 
mem
  (i.e. Non cacheable - Non Bufferable)
 - With DMA_MEM_BUFFERABLE selected - gives Normal - Non 
cacheable -
   Bufferable mapping
 - Maybe some other too...

I have a doubt with the last mapping mentioned above. We have two
mappings possibly to the same physical memory, with different
attributes: One is Cacheable and other one is not.

Is this allowed by ARM? Because the patch in which Russell blocked
ioremap on Low Mem, he clearly mentioned that these attributes are
also important and they should be same.
   
   Section A3.5.7 in the latest ARM ARM (revC) clarifies the mismatched
  
  There's a new version?
 
 Rev C has been available for a while. I thought you got it already:

No, if no one tells me when a new version is available, then, unless I
waste time regularly polling the website below, I have no way to know.

 http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0406c/index.html
 
 It contains the LPAE and virtualisation extensions.
 
 -- 
 Catalin

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: [PATCH] sched: generalize CONFIG_IRQ_TIME_ACCOUNTING for X86 and ARM

2012-02-08 Thread Russell King - ARM Linux

On Wed, Feb 08, 2012 at 04:48:34AM -0800, Dmitry Antipov wrote:
 Generalize CONFIG_IRQ_TIME_ACCOUNTING  between X86 and
 ARM, move noirqtime= option to common debugging code.
 For a bit of backward compatibility, tsc=noirqtime
 is preserved, but issues a warning.
 
 Suggested-by: Venki Pallipadi ve...@google.com
 Signed-off-by: Dmitry Antipov dmitry.anti...@linaro.org
 ---
  arch/arm/kernel/sched_clock.c |3 +++
  arch/x86/Kconfig  |   11 ---
  arch/x86/kernel/tsc.c |7 ---
  include/linux/sched.h |2 ++
  lib/Kconfig.debug |   12 
  lib/Makefile  |2 ++
  lib/irqtime.c |   12 
  7 files changed, 35 insertions(+), 14 deletions(-)
  create mode 100644 lib/irqtime.c
 
 diff --git a/arch/arm/kernel/sched_clock.c b/arch/arm/kernel/sched_clock.c
 index 5416c7c..56d2a9d 100644
 --- a/arch/arm/kernel/sched_clock.c
 +++ b/arch/arm/kernel/sched_clock.c
 @@ -162,5 +162,8 @@ void __init sched_clock_postinit(void)
   if (read_sched_clock == jiffy_sched_clock_read)
   setup_sched_clock(jiffy_sched_clock_read, 32, HZ);
  
 + if (!no_sched_irq_time)
 + enable_sched_clock_irqtime();

Why are you placing this here?  sched_clock is available from the point
that it's registered, which should be before the first sched_clock()
call.

 +config IRQ_TIME_ACCOUNTING
 + bool Fine granularity task level IRQ time accounting
 + depends on (X86 || (ARM  HAVE_SCHED_CLOCK))

Even though it's not bad here, please get out of the habbit of throwing
unnecessary parens into the mix.  It can make stuff more difficult to
read and therefore confirm correctness.  (I've spent many a time
rewriting if() statements because of paren overuse.)

This could have been written:

depends on X86 || (ARM  HAVE_SCHED_CLOCK)

However, ARM will always have HAVE_SCHED_CLOCK after the next merge window,
so this can become a much simpler:

depends on X86 || ARM

Apart from these two points, the rest of the patch looks fine to me but
the ultimate decision about its acceptability is up to other people.

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: [PATCH] sched: generalize CONFIG_IRQ_TIME_ACCOUNTING for X86 and ARM

2012-02-08 Thread Russell King - ARM Linux

On Wed, Feb 08, 2012 at 07:15:26AM -0800, Dmitry Antipov wrote:
 On 02/08/2012 05:18 AM, Russell King - ARM Linux wrote:
 Why are you placing this here?  sched_clock is available from the point
 that it's registered, which should be before the first sched_clock()
 call.

 This is just because I'm thinking about:

 if (read_sched_clock == jiffy_sched_clock_read)
 setup_sched_clock(jiffy_sched_clock_read, 32, HZ);
 else if (!no_sched_irq_time)
   enable_sched_clock_irqtime();

 I suppose that fine granularity task irq time accounting
 makes no sense if sched_clock() granularity is poor.

Let me put it a different way - is there a reason not to do this in
setup_sched_clock() so that it becomes available as soon as sched_clock()
has been initialized by a platform?

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: [PATCH V6 0/7] add a generic cpufreq driver

2012-02-03 Thread Russell King - ARM Linux

On Mon, Jan 09, 2012 at 10:52:48PM +0800, Richard Zhao wrote:
 On Fri, Jan 06, 2012 at 08:53:37AM +0800, Richard Zhao wrote:
  On Thu, Jan 05, 2012 at 06:16:54AM +0800, Richard Zhao wrote:
   hi Russell,
   
   May I have your ACK, you merge it?
  Russell, ping
 would you have time to look at this patch series?

Sorry, kept missing this.  Yes, patch 1 is fine, and _that_ patch alone
gets:

Acked-by: Russell King rmk+ker...@arm.linux.org.uk

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: [PATCH] mx53_loco: add DA9053 PMIC support

2012-01-16 Thread Russell King - ARM Linux

On Tue, Jan 17, 2012 at 01:10:53AM +0800, Ying-Chun Liu (PaulLiu) wrote:
 diff --git a/arch/arm/mach-mx5/board-mx53_loco.c 
 b/arch/arm/mach-mx5/board-mx53_loco.c
 index fd8b524..61dd8c9 100644
 --- a/arch/arm/mach-mx5/board-mx53_loco.c
 +++ b/arch/arm/mach-mx5/board-mx53_loco.c
 @@ -23,10 +23,21 @@
  #include linux/delay.h
  #include linux/gpio.h

You have this ^^^ include, which includes asm/gpio.h, and in turn
mach/gpio.h, but...

  #include linux/i2c.h
 +#include linux/module.h
 +#include linux/platform_device.h
 +#include linux/irq.h
 +#include linux/interrupt.h
 +#include linux/err.h
 +#include linux/regulator/machine.h
 +#include linux/regulator/fixed.h
 +#include linux/mfd/da9052/da9052.h
 +#include linux/mfd/da9052/pdata.h
  
  #include mach/common.h
  #include mach/hardware.h
  #include mach/iomux-mx53.h
 +#include mach/irqs.h
 +#include mach/gpio.h

you include mach/gpio.h here?  I don't see why you need mach/irqs.h
either.

That brings into this another question: you're adding 9 new linux/
includes above.  How many of those do you _actually_ need?  Eg, I
don't see you adding anything from linux/err.h to this file, so why
do you need this header?

Please don't add include files unnecessarily.

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: [PATCH] ARM: pl330: fix null pointer dereference in pl330_chan_ctrl()

2012-01-13 Thread Russell King - ARM Linux

On Fri, Jan 13, 2012 at 12:36:31PM +, Mans Rullgard wrote:
 This fixes the thrd-req_running field being accessed before thrd
 is checked for null.  The error was introduced in abb959f.
 
 Signed-off-by: Mans Rullgard mans.rullg...@linaro.org

I don't know what's happening with the PL330 driver, but there's patches
around to remove this file and merge it with the DMA engine driver.

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: [PATCH v4 4/7] cpufreq: add clk-reg cpufreq driver

2012-01-03 Thread Russell King - ARM Linux

On Mon, Dec 26, 2011 at 09:44:52PM +0800, Richard Zhao wrote:
 On Mon, Dec 26, 2011 at 11:10:30AM +, Mark Brown wrote:
  The *call* is there in the regulator subsystem, it's just that none of
  the drivers back it up with an actual implementation yet.  Which turns
  out to be a good thing as cpufreq can't currently understand variable
  latencies and the governors don't deal well with non-trivial latencies
  anyway.
 but clk API don't have such calls. and many SoCs only adjust clk frequencies, 
 using
 one single voltage.

That's because it's often not known - especially in the case of PLLs,
data sheets don't tend to specify how long it takes for the PLL to relock
after a requested change.  If it's important that the PLL be locked,
there will be a bit to poll (or they'll cause the CPU itself to stall
while the PLL is not locked.)

So, for these kinds of situations, how do you suggest that the clk API
provides this information?

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: [PATCH v4 4/7] cpufreq: add clk-reg cpufreq driver

2012-01-03 Thread Russell King - ARM Linux

On Tue, Jan 03, 2012 at 09:25:30PM +0800, Richard Zhao wrote:
 Hi Russel,
 
 On 3 January 2012 17:06, Russell King - ARM Linux
 li...@arm.linux.org.uk wrote:
  On Mon, Dec 26, 2011 at 09:44:52PM +0800, Richard Zhao wrote:
  On Mon, Dec 26, 2011 at 11:10:30AM +, Mark Brown wrote:
   The *call* is there in the regulator subsystem, it's just that none of
   the drivers back it up with an actual implementation yet.  Which turns
   out to be a good thing as cpufreq can't currently understand variable
   latencies and the governors don't deal well with non-trivial latencies
   anyway.
  but clk API don't have such calls. and many SoCs only adjust clk 
  frequencies, using
  one single voltage.
 
  That's because it's often not known - especially in the case of PLLs,
  data sheets don't tend to specify how long it takes for the PLL to relock
  after a requested change.  If it's important that the PLL be locked,
  there will be a bit to poll (or they'll cause the CPU itself to stall
  while the PLL is not locked.)
 
  So, for these kinds of situations, how do you suggest that the clk API
  provides this information?
 In latest v6 version, I get clk transition latency from dt property, and get
 regulator transition latency from regulator API.
 Could you please help review other arm common changes in v6 version?

You didn't get my point: how do you specify a clock transition latency
for a clock with a PLL when the data sheets don't tell you what that is,
and they instead give you a bit to poll?

Do you:

(a) make up some number and hope that it's representative
(b) not specify any transition latency
(c) think about the problem _now_ and define what it means for a clock
without a transition latency.

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: Is Pandaboard cpuhotplug working stably?

2011-12-22 Thread Russell King - ARM Linux

On Thu, Dec 22, 2011 at 02:19:23PM +0530, Shilimkar, Santosh wrote:
 + Peter Z
 
 On Wed, Dec 21, 2011 at 3:37 PM, Russell King - ARM Linux
 li...@arm.linux.org.uk wrote:
  On Wed, Dec 21, 2011 at 05:59:07PM +0800, Barry Song wrote:
  2011/12/21 Russell King - ARM Linux li...@arm.linux.org.uk:
   cpu hotplug is basically totally buggered - the preconditions placed
   upon the bringup code path are basically impossible to satisfy in any
   shape or form at the moment.
  
   There's the requirement that the secondary CPU is marked online and
   active before interrupts are enabled for the thread migration stuff
   to behave correctly.  However, this is incompatible with 
   smp_call_function()
   which will wait for online CPUs to respond to an IPI - which this one
   won't because interrupts are disabled.
  
   I think there was some discussion about how to fix this but I don't
   recall the details.
 
  thanks, Russell. then could i think this is an ARM-kernel-specific bug
  which exists on all ARM SMP chips for the moment?
  and that bug doesn't happen on x86:
 
  I don't think so.  There's nothing ARM specific about it.
 
 There are few patches floating around for this issue. I posted one version
 long back [1] and then there was one more form Thomas G.
 The most recent is from one is from Peter Z [2] which is moving the
 fix for the cup online race to core code.
 
 Can you try Peter's patch with your test-case ?
 
 Regards,
 Santosh
 
 [1] https://lkml.org/lkml/2011/6/20/79
 [2] https://lkml.org/lkml/2011/12/15/255

[1] is already fixed - and is not the latest problem with this code.
Fixing the problem in [1] actually itself created the latest problem
with smp_call_function() which wasn't there before this change.  Patch
[2] refers to this problem and proposes a fix for it.

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: Is Pandaboard cpuhotplug working stably?

2011-12-21 Thread Russell King - ARM Linux

On Wed, Dec 21, 2011 at 05:23:48PM +0800, Barry Song wrote:
 Hi guys,
 i tried cpuhotplug on pandaboard for both
 Pandroid_Froyo_L27.8.2_release_pkg and Linaro 11.11. It has failed to
 work stably.
 On Pandroid_Froyo_L27.8.2_release_pkg, unplugging cpu1 works well:
 # echo 0  /sys/devices/system/cpu/cpu1/online
 CPU1: shutdown
 
 if i enable the cpu1 again by echo 1 
 /sys/devices/system/cpu/cpu1/online, the system will restore to 3
 random status: hang, normal, panic.
 
 Using  Linaro 11.11 release, echo 0 
 /sys/devices/system/cpu/cpu1/online will make system hang and the
 whole system will not be able to reset by pressing reset key, the only
 way to reset system is pulling out AV power.
 
 i am sorry i can't get more time to debug and find more clues. just
 want to ask people whether this is a version the cpuhotplug works
 normal on?

cpu hotplug is basically totally buggered - the preconditions placed
upon the bringup code path are basically impossible to satisfy in any
shape or form at the moment.

There's the requirement that the secondary CPU is marked online and
active before interrupts are enabled for the thread migration stuff
to behave correctly.  However, this is incompatible with smp_call_function()
which will wait for online CPUs to respond to an IPI - which this one
won't because interrupts are disabled.

I think there was some discussion about how to fix this but I don't
recall the details.

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: Is Pandaboard cpuhotplug working stably?

2011-12-21 Thread Russell King - ARM Linux

On Wed, Dec 21, 2011 at 05:59:07PM +0800, Barry Song wrote:
 2011/12/21 Russell King - ARM Linux li...@arm.linux.org.uk:
  cpu hotplug is basically totally buggered - the preconditions placed
  upon the bringup code path are basically impossible to satisfy in any
  shape or form at the moment.
 
  There's the requirement that the secondary CPU is marked online and
  active before interrupts are enabled for the thread migration stuff
  to behave correctly.  However, this is incompatible with smp_call_function()
  which will wait for online CPUs to respond to an IPI - which this one
  won't because interrupts are disabled.
 
  I think there was some discussion about how to fix this but I don't
  recall the details.
 
 thanks, Russell. then could i think this is an ARM-kernel-specific bug
 which exists on all ARM SMP chips for the moment?
 and that bug doesn't happen on x86:

I don't think so.  There's nothing ARM specific about it.

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: [PATCH v4 3/6] clk: introduce the common clock framework

2011-12-17 Thread Russell King - ARM Linux

On Fri, Dec 16, 2011 at 04:45:48PM -0800, Turquette, Mike wrote:
 On Wed, Dec 14, 2011 at 5:18 AM, Thomas Gleixner t...@linutronix.de wrote:
  On Tue, 13 Dec 2011, Mike Turquette wrote:
  +void __clk_unprepare(struct clk *clk)
  +{
  +     if (!clk)
  +             return;
  +
  +     if (WARN_ON(clk-prepare_count == 0))
  +             return;
  +
  +     if (--clk-prepare_count  0)
  +             return;
  +
  +     WARN_ON(clk-enable_count  0);
 
  So this leaves the clock enable count set. I'm a bit wary about
  that. Shouldn't it either return (including bumping the prepare_count
  again) or call clk_disable() ?

No it should not.

 I've hit this in my port of OMAP.  It comes from this simple situation:
 
 driver 1 (adapted for clk_prepare/clk_unprepare):
 clk_prepare(clk);
 clk_enable(clk);
 
 ...
 
 driver2 (not adapted for clk_prepare/clk_unprepare):
 clk_enable(clk);

So this is basically buggy.  Look, it's quite simple.  Convert _all_
your drivers to clk_prepare/clk_unprepare _before_ you start switching
your platform to use these new functions.  You can do that _today_
without exception.

We must refuse to merge _any_ user which does this the old way - and
we should have been doing this since my commit was merged into mainline
to allow drivers to be converted.

And stop trying to think of ways around this inside clk_prepare/
clk_unprepare/clk_enable/clk_disable.  You can't do it.  Just fix _all_
the drivers.  Now.  Before you start implementing clk_prepare/clk_unprepare.

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: [RFC V1 1/4] cpufreq: add arm soc generic cpufreq driver

2011-12-16 Thread Russell King - ARM Linux

On Thu, Dec 15, 2011 at 12:50:07PM -0600, Mark Langsdorf wrote:
 I'd prefer to see clk_get90 replaced with of_clk_get() and
 get_this_cpu_node() from the clk-cpufreq driver by Jamie Iles that
 I resubmitted yesterday.

Why isn't of_clk_get() hidden inside clk_get() ?

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: [RFC V1 1/4] cpufreq: add arm soc generic cpufreq driver

2011-12-15 Thread Russell King - ARM Linux

On Thu, Dec 15, 2011 at 07:16:35PM +0800, Richard Zhao wrote:
 +#ifdef CONFIG_SMP
 + /* loops_per_jiffy is not updated by the cpufreq core for SMP systems.
 +  * So update it for all CPUs.
 +  */
 + for_each_possible_cpu(cpu)
 + per_cpu(cpu_data, cpu).loops_per_jiffy =
 + cpufreq_scale(per_cpu(cpu_data, cpu).loops_per_jiffy,
 + freqs.old, freqs.new);

NAK.  If you think this is a good solution, you're very wrong.  If this
is what's in the core cpufreq code, then it too is very broken.

I've seen this exact method result in the loops_per_jiffy being totally
buggered over time by the constant scaling up and down.  The only way
to do this _sensibly_ is to record the _initial_ loops_per_jiffy and
_inital_ frequency, and scale from that.

That way you get consistent results irrespective of the scalings you
do over time, rather than something which continually deteriorates with
every frequency change.

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: [PATCH v3 3/5] clk: introduce the common clock framework

2011-12-05 Thread Russell King - ARM Linux

On Mon, Dec 05, 2011 at 02:15:56PM -0700, Paul Walmsley wrote:
 The types associated with clock rates in the clock interface 
 (include/linux/clk.h) are inconsistent, and we should fix this. 

Rubbish.  They're different with good reason.  Rates are primerily
unsigned quantities - and should be treated as such.

The exception is clk_round_rate() which returns the rate, but also
_may_ return an error.  Therefore, its return type has to be signed.

 We could fix the immediate problem by changing the prototype of 
 clk_round_rate() to pass the rounded rate back to the caller via a pointer 
 in one of the arguments, and return an error code (if any) via the return 
 value:
 
 int clk_round_rate(struct clk *clk, unsigned long rate, unsigned long 
*rounded_rate);

Yes that might have been a better solution.

 But I'd propose that we instead increase the size of struct clk.rate to be 
 s64:
 
 s64 clk_round_rate(struct clk *clk, s64 desired_rate);
 int clk_set_rate(struct clk *clk, s64 rate);
 s64 clk_get_rate(struct clk *clk);
 
 struct clk {
 ...
  s64 rate;
 ...
 };
 
 That way the clock framework can accommodate current clock rates, as well 
 as any conceivable future clock rate.  (Some production CPUs are already 
 running at clock rates greater than 4 GiHZ[1].  RF devices with 4 GiHz+ 
 clock rates are also common, such as 802.11a devices running in the 5.8 
 GHz band, and drivers for those may eventually wish to use the clock 
 framework.)

Yuck.  You are aware that 64-bit math on 32-bit CPUs sucks?  So burdening
_everything_ with 64-bit rate quantities is absurd.  As for making then
64-bit signed quantities, that's asking for horrid code from gcc for the
majority of cases.

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: [PATCH v3 3/5] clk: introduce the common clock framework

2011-12-02 Thread Russell King - ARM Linux

On Fri, Dec 02, 2011 at 10:13:10AM -0700, Paul Walmsley wrote:
 Hi Russell,
 
 On Thu, 1 Dec 2011, Russell King - ARM Linux wrote:
 
  On Wed, Nov 30, 2011 at 06:20:50PM -0700, Paul Walmsley wrote:
   1. When a clock user calls clk_enable() on a clock, the clock framework 
   should prevent other users of the clock from changing the clock's rate.  
   This should persist until the clock user calls clk_disable() (but see 
   also 
   #2 below).  This will ensure that clock users can rely on the rate 
   returned by clk_get_rate(), as long as it's called between clk_enable() 
   and clk_disable().  And since the clock's rate is guaranteed to remain 
   the 
   same during this time, code that cannot tolerate clock rate changes 
   without special handling (such as driver code for external I/O devices) 
   will work safely without further modification.
  
  So, if you have a PLL whose parent clock is not used by anything else.
  You want to program it to a certain rate.
  
  You call clk_disable() on the PLL clock.
 
 The approach described wouldn't require the PLL to be disabled before 
 changing its rate.  If there are no other users of the PLL, or if the 
 other users of the PLL have indicated that it's safe for others to change 
 the PLL's rate, the clock framework would allow the PLL rate change, even 
 if the PLL is enabled.  (modulo any notifier activity, and assuming that 
 the underlying PLL hardware allows its frequency to be reprogrammed while 
 the PLL is enabled)
 
  This walks up the tree and disables the parent.  You then try to set the 
  rate using clk_set_rate(). clk_set_rate() in this circumstance can't 
  wait for the PLL to lock because it can't - there's no reference clock 
  for it.
 
 As an aside, this seems like a good time to mention that the behavior of 
 clk_set_rate() on a disabled clock needs to be clarified.

It's more complicated than that.  Clocks have more states than just
enabled and disabled.

There is:
- unprepared
- prepared and disabled
- prepared and enabled

Implementations can chose at which point to enable their PLLs and wait
for them to lock - but if they want to sleep while waiting, they must
do this in the prepare method, not the enable method (remember, enable
is to be callable from atomic contexts.)

So, it's entirely possible that a prepared clock will have the PLLs up
and running, which means that clk_set_rate() can also wait for the PLL
to stablize (which would be a good idea.)

Now... we can say that PLLs should be locked when the prepare method
returns, or whenever clk_set_rate() returns.  The problem with that is
there's a race condition between clk_enable() and clk_set_rate() - if
we allow clk_set_rate() to sleep waiting for the PLL to lock, a
concurrent clk_enable() can't be prevented from returning because
that would involve holding a spinlock...

I think this is just one of those unsolvable issues - if you have
concurrent users of a PLL there's not much you can do - race free - to
prevent the PLL changing beneath some user.

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: [PATCH v3 3/5] clk: introduce the common clock framework

2011-12-01 Thread Russell King - ARM Linux

On Thu, Dec 01, 2011 at 11:30:16AM -0700, Paul Walmsley wrote:
 The intention behind the clk_{allow,block}_rate_change() proposal was to 
 allow the current user of the clock to change its rate without having to 
 call clk_{allow,block}_rate_change(), if that driver was the sole user of 
 the clock.

And how does a driver know that?

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: [PATCH v3 5/5] clk: export tree topology and clk data via sysfs

2011-11-23 Thread Russell King - ARM Linux

On Wed, Nov 23, 2011 at 08:59:04AM -0800, Tony Lindgren wrote:
 ..let's plan on getting rid of the early usage of clocks instead so
 you don't have the issue of deferring stuff.

No - we have too many platforms already using them early to do a change
like this - and to do such a change will force them to invent some other
way.  That's just stupid.

Keep the clk API as a fundamental thing which should be initialized early
so we don't have to invent new clk APIs to get around its unavailability.

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: [PATCH v3 5/5] clk: export tree topology and clk data via sysfs

2011-11-23 Thread Russell King - ARM Linux

On Wed, Nov 23, 2011 at 10:55:19AM -0800, Tony Lindgren wrote:
 What else are you aware of that is really needed early for clocks other
 than clockevent?

TWD will lose its auto-calibration.  Then there's various clock source
and clock event implementations.  These all call for the clk API to be
up and running at init_early time.

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: [PATCH v2 1/7] clk: Add a generic clock infrastructure

2011-10-17 Thread Russell King - ARM Linux

On Mon, Oct 17, 2011 at 01:05:39PM +0200, Sascha Hauer wrote:
 It's not a problem to associate multiple clocks to a device, we can do
 this now already. What a driver can't do now is 
 give-me-all-clocks-I-need(dev),
 but this problem should not be solved at clock core level but at the
 clkdev level.

I don't think it even needs solving at the clkdev level.

In order to get all clocks for a specific device, we'd need variable
length arrays to store the individual struct clk pointers, which isn't
going to be all that nice.  We'd need clk_get_alldev() and
clk_put_alldev() to deal with these variable length arrays - and as
far as the driver is concerned that's an opaque object - it can't know
anything about any particular individual struct clk in the array.

Then we'd need operations to operate on the array itself, which means
much more API baggage.

Also, if it did need to know about one particular struct clk, then
there's problems with avoiding that struct clk in the alldev() versions
(or we'll see drivers doing clk_get() followed by clk_disable() to
work-around any enabling done via the array versions.)

 The fact is that the different clocks for a device are really different
 clocks. A dumb driver may want to request/enable all relevant clocks at
 once while a more sophisticated driver may want to enable the clock for
 accessing registers in the probe function and a baud clock on device
 open time.

I don't think you can get away from drivers having to know about their
individual clocks in some form or other - and I don't think a dumb
driver should be a justification for creating an API.  If a dumb driver
wants to get the clocks for a device it has to behave as if it were
a smart driver and request each one (maybe having an array itself)
such as this:

static const char *con_ids[NR_CLKS] = {
foo,
bar,
};

struct driver_priv {
struct clk *clk[NR_CLKS];
};


for (err = i = 0; i  NR_CLKS; i++) {
priv-clk[i] = clk_get(dev, con_ids[i];
if (IS_ERR(priv-clk[i])) {
err = PTR_ERR(priv-clk[i]);
break;
}
}

if (err) {
for (i = 0; i  NR_CLKS  !IS_ERR(priv-clk[i]); i++)
clk_put(priv-clk[i]);
return err;
}

This approach also has the advantage that we know what order the clocks
will be in the array, and so we can sensibly index the array to obtain
a particular clock.

However, going back to the bus fabric case, a driver probably doesn't
have the knowledge - it's a platform topology thing - one which can't
be represented by a clock tree.

It might help if we represented our busses closer to reality - like PCI
does - rather than a flattened set of platform devices.  Couple that with
runtime PM and a driver for the bus fabric, and it should fall out
fairly naturally from the infrastructure we already have.

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: [PATCH v2 4/7] clk: Add simple gated clock

2011-10-13 Thread Russell King - ARM Linux

On Thu, Sep 22, 2011 at 03:26:59PM -0700, Mike Turquette wrote:
 diff --git a/drivers/clk/clk-gate.c b/drivers/clk/clk-gate.c
 new file mode 100644
 index 000..a1d8e79
 --- /dev/null
 +++ b/drivers/clk/clk-gate.c
 @@ -0,0 +1,78 @@
 +/*
 + * Copyright (C) 2010-2011 Canonical Ltd jeremy.k...@canonical.com
 + *
 + * This program is free software; you can redistribute it and/or modify
 + * it under the terms of the GNU General Public License version 2 as
 + * published by the Free Software Foundation.
 + *
 + * Simple clk gate implementation
 + */
 +
 +#include linux/clk.h
 +#include linux/module.h
 +#include asm/io.h

linux/io.h please.

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: [PATCH v2 1/7] clk: Add a generic clock infrastructure

2011-10-13 Thread Russell King - ARM Linux

On Thu, Sep 22, 2011 at 03:26:56PM -0700, Mike Turquette wrote:
   struct clk_hw_ops {
   int (*prepare)(struct clk_hw *);
   void(*unprepare)(struct clk_hw *);
   int (*enable)(struct clk_hw *);
   void(*disable)(struct clk_hw *);
   unsigned long   (*recalc_rate)(struct clk_hw *);
   int (*set_rate)(struct clk_hw *,
   unsigned long, unsigned long *);
   long(*round_rate)(struct clk_hw *, unsigned long);
   int (*set_parent)(struct clk_hw *, struct clk *);
   struct clk *(*get_parent)(struct clk_hw *);
   };
 
 Platform clock code can register a clock through clk_register, passing a
 set of operations, and a pointer to hardware-specific data:
 
   struct clk_hw_foo {
   struct clk_hw clk;
   void __iomem *enable_reg;
   };

Eww, no, this really isn't going to scale for platforms like OMAP - to
have all the operations indirected through a set of function pointers
for every clock just because the enable register (or enable bit) is
in a different position is completely absurd.

I'm not soo concerned about the increase in memory usage (for 100 to 200
clock definitions its only 4 to 8k) but what worries me the most is
initializing these damned things.  It's an awful lot of initializers,
which means the potential for an awful lot of conflicts should something
change in this structure.

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: [PATCH] ARM: S3C2410: Remove section mismatch warning

2011-10-03 Thread Russell King - ARM Linux

On Mon, Oct 03, 2011 at 03:10:41PM +0530, Tushar Behera wrote:
 Some of the functions and structures did not have _init or __initdata
 attributes, even though they were referenced from functions / structures
 with those attribute, resulting in section mismatches.

Firstly - it's a good idea to include the warnings which you're fixing
in the commit log text, so that people know exactly what is being fixed.

 diff --git a/arch/arm/mach-s3c2410/usb-simtec.c 
 b/arch/arm/mach-s3c2410/usb-simtec.c
 index 29bd3d9..3a1028c 100644
 --- a/arch/arm/mach-s3c2410/usb-simtec.c
 +++ b/arch/arm/mach-s3c2410/usb-simtec.c
 @@ -104,7 +104,7 @@ static struct s3c2410_hcd_info usb_simtec_info __initdata 
 = {
  };
  
  
 -int usb_simtec_init(void)
 +int __init usb_simtec_init(void)
  {
   int ret;
  

This one looks fine.

 diff --git a/arch/arm/mach-s3c2416/irq.c b/arch/arm/mach-s3c2416/irq.c
 index 28ad20d..153cb2f 100644
 --- a/arch/arm/mach-s3c2416/irq.c
 +++ b/arch/arm/mach-s3c2416/irq.c
 @@ -234,7 +234,7 @@ static int __init s3c2416_irq_add(struct sys_device 
 *sysdev)
   return 0;
  }
  
 -static struct sysdev_driver s3c2416_irq_driver = {
 +static struct sysdev_driver s3c2416_irq_driver __initdata = {
   .add= s3c2416_irq_add,
  };
  

I remain entirely unconvinced that this is correct.  As a result of
the sysdev_driver_register(s3c2416_sysclass, s3c2416_irq_driver);
call, this structure is placed on a list.

If this structure is marked __initdata, then the memory behind the
structure will be freed and overwritten - however, it's still on a
list which might be walked.  Such a walk would cause a kernel oops
or might even be an exploitable security hole if that page ends up
in userspace - especially as said structure contains function calls
which would be called in privileged mode.

The same comment applies to the other sysdev driver structures you're
marking __initdata too.

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: [PATCH v2 1/7] clk: Add a generic clock infrastructure

2011-10-03 Thread Russell King - ARM Linux

On Mon, Oct 03, 2011 at 05:31:08PM +0100, Mark Brown wrote:
 I dunno, I get the impression that some of this is due to the current
 limitations of the clock API rather than due to a lack of clocks -
 perhaps that's specific to the applications I look at, though.
 applications 

The clk API per-se has nothing to do with how clocks are registered.
There are two things that are the clk API:

1. clkdev - which deals with translating devices + connection IDs to
   struct clk.  This has no ordering requirements wrt requiring
   parents to be initialized before their children (it has no care
   about that at all because it's not within its definition.)

   For this, registration is about connecting device + connection IDs
   to a struct clk.

2. the driver API, defining how the opaque struct clk is looked up,
   obtained and then manipulated.  This has no 'registration' stuff.

So, whether clocks are a tree or flat is unspecified.  It's unspecified
whether there's any particular order required.

In fact, with a clock tree, it's entirely possible that only the leaf
clocks will be 'registered' with clkdev.  How the rest of the clock tree
is initialized is beyond the scope of the driver clk API.

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: [PATCH v2] ASoC: omap: convert per-board modules to platform drivers

2011-09-09 Thread Russell King - ARM Linux

On Fri, Sep 09, 2011 at 09:11:52AM -0700, Mark Brown wrote:
 On Fri, Sep 09, 2011 at 10:41:56AM +0100, Russell King - ARM Linux wrote:
  On Thu, Sep 08, 2011 at 04:59:04PM -0700, Mark Brown wrote:
 
   The problem is that someone has to manually go and add the device to
   every board that needs one and people find that tedious and slightly
   inelegant
 
  Sheesh.  So now you're arguing against your statement above?  Please
  stop wasting my time.
 
 There's two things going on here - what we are doing and what people
 would like to be able to do.  What we are doing is explicitly adding
 devices, what people would like to do is infer the devices from the
 board type.  Personally I'm totally happy with explicitly adding an
 audio device, but not everyone is and I do understand where they're
 coming from.

Well, with DT, there won't be any 'board type' anymore.  There won't be
any 'machine_is_xxx()' to sort it out anymore.  Using DT, all that will
be history - it's all got to be sorted out by either devices or device
properties.

So, given that, I don't see the logic of having two methods - it might
as well be dealt with by devices and [platform data for non-DT | DT
properties], and which then means we have everything working the same
way irrespective of what the backing data for the platform actually is.

Therefore, as we are heading for DT, I'd definitely say that having
machine_is_xxx() outside of arch/arm is a bug, no less and no more.

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: [PATCH v3] ASoC: omap: convert per-board modules to platform drivers

2011-09-09 Thread Russell King - ARM Linux

On Fri, Sep 09, 2011 at 12:38:51PM +0100, Mans Rullgard wrote:
 This converts the per-board modules to platform drivers for a
 device created by in main platform setup.  These drivers call
 snd_soc_register_card() directly instead of going via a soc-audio
 device and the corresponding driver in soc-core.
 
 Signed-off-by: Mans Rullgard mans.rullg...@linaro.org
 ---
 This version uses a table to assign the device name in omap_init_audio().
 It is certainly less ugly than the previous patch.
 
 Again, tested only on Beagle.
 ---
  arch/arm/mach-omap2/devices.c |   34 +++
  sound/soc/omap/am3517evm.c|   55 +++---
  sound/soc/omap/igep0020.c |   52 +++--
  sound/soc/omap/n810.c |   73 
 ++---
  sound/soc/omap/omap3beagle.c  |   55 +++---
  sound/soc/omap/omap3evm.c |   56 ---
  sound/soc/omap/omap3pandora.c |   70 ---
  sound/soc/omap/overo.c|   56 +++
  sound/soc/omap/rx51.c |   55 +--
  sound/soc/omap/sdp3430.c  |   65 ++--
  sound/soc/omap/sdp4430.c  |   60 +
  sound/soc/omap/zoom2.c|   68 +-
  12 files changed, 458 insertions(+), 241 deletions(-)

I don't think this is an improvement.  Just look at the diffstat - it almost
doubles the number of lines of code.

One thing here which is utterly silly is:

 +static struct {
 + int machine;
 + const char *name;
 +} soc_device_names[] = {
 + { MACH_TYPE_OMAP3517EVM,am3517evm-soc-audio   },
 + { MACH_TYPE_IGEP0020,   igep2-soc-audio   },
 + { MACH_TYPE_NOKIA_N810, n8x1-soc-audio},
 + { MACH_TYPE_NOKIA_N810_WIMAX,   n8x1-soc-audio},
 + { MACH_TYPE_OMAP3_BEAGLE,   omap3beagle-soc-audio },
 + { MACH_TYPE_DEVKIT8000, omap3beagle-soc-audio },
 + { MACH_TYPE_OMAP3EVM,   omap3evm-soc-audio},
 + { MACH_TYPE_OMAP3_PANDORA,  pandora-soc-audio },
 + { MACH_TYPE_OVERO,  overo-soc-audio,  },
 + { MACH_TYPE_CM_T35, overo-soc-audio,  },
 + { MACH_TYPE_NOKIA_RX51, rx51-soc-audio,   },
 + { MACH_TYPE_OMAP_3430SDP,   sdp3430-soc-audio,},
 + { MACH_TYPE_OMAP_4430SDP,   sdp4430-soc-audio,},
 + { MACH_TYPE_OMAP_ZOOM2, zoom2-soc-audio,  },
 +};

So you're using the machine ID to select the name of the device.
(That's not really DT compatible.)

 +static int __init am3517evm_soc_init(void)
 +{
 + if (!machine_is_omap3517evm())
 + return -ENODEV;

But then you conditionalize the registration of the drivers on the
platform as well.  Why?  It's pointless.  If you don't have the core
code register the struct device for this platform then this driver
won't be bound to a device, and therefore the probe function won't be
called.

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: [PATCH v2] ASoC: omap: convert per-board modules to platform drivers

2011-09-09 Thread Russell King - ARM Linux

On Fri, Sep 09, 2011 at 12:30:11PM -0700, Mark Brown wrote:
 On Fri, Sep 09, 2011 at 08:01:35PM +0100, Russell King - ARM Linux wrote:
 
  Well, with DT, there won't be any 'board type' anymore.  There won't be
  any 'machine_is_xxx()' to sort it out anymore.  Using DT, all that will
  be history - it's all got to be sorted out by either devices or device
  properties.
 
 There is a board type - there's a root compatible property for the board
 which fulfils this purpose - so the situation with and without device
 tree is essentially the same.

So instead of a table of machine id numbers and soc device strings, you're
going to have a table of board 'compatible' strings and soc device strings,
and you're going to have to manually create the struct device with that
name.

That's just twisted and utterly insane - adding more code for precisely
zero benefit what so ever.  Think about it - the device tree is *already*
creating platform devices for entries in the device tree file.  What's
the point of having this special ASoC code look up the platform compatible
property in a table of strings to find a different string to manually
create a device with.  Why not just _add_ the bloody device to the DT
file in the first place?  That's partly what DT is there for - to create
platform specific struct devices.

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: [PATCH v2] ASoC: omap: convert per-board modules to platform drivers

2011-09-08 Thread Russell King - ARM Linux

On Thu, Sep 08, 2011 at 03:29:11PM -0700, Mark Brown wrote:
 On Thu, 2011-09-08 at 22:28 +0200, Arnd Bergmann wrote:
  On Thursday 08 September 2011 20:05:48 Mans Rullgard wrote:
 
   I had the same thought, but I couldn't find a suitable string anywhere.
   Are you suggesting an if(machine_is_foo()) cascade in omap_init_audio()?
 
   I'll be the first to agree this patch is not particularly pretty.
 
  My general feeling is that practically every time someone writes
  machine_is_*(), they are doing it wrong. There are of course exceptions,
  but I would strongly recommend to have the initialization calling up
  from the board file into more general functions instead of having all
  boards calling the same function which then goes to board specific
  code again.
 
 I have to agree, that seems tasteless. I'd expect something like
 triggering registration of devices based off walking down a table of
 machine IDs or something. One other issue to consider here is that we
 don't want to discourage people from sharing machine drivers while we
 can so it can't be completely automatic, it 

This problem has been solved (before DT) for _ages_ through the use
of platform devices in the platform support files, registered by the
.init_machine callback.

Where it went wrong is when ASOC and PCMCIA became rather complicated
that you could no longer just pass a platform data structure, but
instead needed some complex platform specific code - which started the
demand to have mini platform specific chunks of code in drivers/pcmcia/
and sound/.

With DT of course, all devices get instantiated from the device tree,
so there should not be any more platform specific chunks of code in
these locations (ha, it couldn't be solved with platform data so I
suspect it will continue to persist, forever unsolved.)

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: [PATCH v2] ASoC: omap: convert per-board modules to platform drivers

2011-09-08 Thread Russell King - ARM Linux

On Thu, Sep 08, 2011 at 03:47:31PM -0700, Mark Brown wrote:
 On Thu, Sep 08, 2011 at 11:37:20PM +0100, Russell King - ARM Linux wrote:
 
  With DT of course, all devices get instantiated from the device tree,
  so there should not be any more platform specific chunks of code in
  these locations (ha, it couldn't be solved with platform data so I
  suspect it will continue to persist, forever unsolved.)
 
 That's not the case at all for audio, the PCB schematic for the audio
 subsystem on a device like a smartphone is a sufficiently interesting
 piece of hardware to be a device with a driver in its own right.  The
 ASoC machine drivers aren't about instantiating devices, they are about
 controlling the interrelationships between the various devices in the
 audio subsystem.
 
 What will happen for device tree is that there will be a device in the
 device tree for the ASoC board.

Sounds like you just solved the machine_is_xxx() problem in ASoC land too
there.  If you're _already_ going for separate devices to describe the
ASoC stuff on the board, then there's no reason that couldn't have already
been done to eliminate the machine_is_xxx() usage in ASoC - rather than
complaining about machine_is_xxx() not being a very good solution.

As I said, the problem was solved years ago, and all the component parts
have been there also for years.

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: [PATCH v9 00/12] use nonblock mmc requests to minimize latency

2011-08-28 Thread Russell King - ARM Linux

On Sun, Aug 28, 2011 at 12:50:54PM +0200, Per Forlin wrote:
 On 26 August 2011 18:28, Santosh santosh.shilim...@ti.com wrote:
  + Balaji,
 
  On Friday 26 August 2011 09:49 PM, Russell King - ARM Linux wrote:
 
  I'm not sure who's responsible for this, but the nonblocking MMC stuff is
  broken on OMAPs HSMMC:
 
  mmcblk0: error -84 transferring data, sector 149201, nr 64, cmd response
  0x900, card status 0xb00
  mmcblk0: retrying using single block read
  [ cut here ]
  WARNING: at /home/rmk/git/linux-2.6-rmk/lib/dma-debug.c:811
  check_unmap+0x1ac/0x764()
  omap_hsmmc omap_hsmmc.0: DMA-API: device driver tries to free DMA memory
  it has not allocated [device address=0x80933000] [size=20480 bytes]
  Modules linked in:
  Backtrace:
  [c0017874] (dump_backtrace+0x0/0x10c) from [c02ce8ac]
  (dump_stack+0x18/0x1c)
   r7:c1abfcb8 r6:c0186248 r5:c037de51 r4:032b
  [c02ce894] (dump_stack+0x0/0x1c) from [c0039ed4]
  (warn_slowpath_common+0x58/0x70)
  [c0039e7c] (warn_slowpath_common+0x0/0x70) from [c0039f90]
  (warn_slowpath_fmt+0x38/0x40)
   r8:c1abfd50 r7: r6:5000 r5: r4:80933000
  [c0039f58] (warn_slowpath_fmt+0x0/0x40) from [c0186248]
  (check_unmap+0x1ac/0x764)
   r3:c0367d55 r2:c037e24d
  [c018609c] (check_unmap+0x0/0x764) from [c0186978]
  (debug_dma_unmap_sg+0x100/0x134)
  [c0186878] (debug_dma_unmap_sg+0x0/0x134) from [c0019770]
  (dma_unmap_sg+0x24/0x7c)
  [c001974c] (dma_unmap_sg+0x0/0x7c) from [c0207220]
  (omap_hsmmc_post_req+0x48/0x54)
  [c02071d8] (omap_hsmmc_post_req+0x0/0x54) from [c01fb644]
  (mmc_start_req+0x9c/0x128)
   r4:c1a76000
  [c01fb5a8] (mmc_start_req+0x0/0x128) from [c02049fc]
  (mmc_blk_issue_rw_rq+0x80/0x460)
   r8:c1ab5c00 r7:0001 r6:c1ab5824 r5:c1ab5824 r4:c1ab5c00
  [c020497c] (mmc_blk_issue_rw_rq+0x0/0x460) from [c02050a8]
  (mmc_blk_issue_rq+0x2cc/0x2fc)
  [c0204ddc] (mmc_blk_issue_rq+0x0/0x2fc) from [c02056c4]
  (mmc_queue_thread+0xa0/0x104)
  [c0205624] (mmc_queue_thread+0x0/0x104) from [c0054f8c]
  (kthread+0x88/0x90)
  [c0054f04] (kthread+0x0/0x90) from [c003d9a4] (do_exit+0x0/0x618)
   r7:0013 r6:c003d9a4 r5:c0054f04 r4:c1a7bc7c
  ---[ end trace 3314ad56daf5d14f ]---
 
  Luckily thinks continue to work, but clearly releasing DMA mappings which
  drivers don't own is bad news.  Unfortunately, I don't have the bandwidth
  to be able to investigate this at present - well, I could do but then I'd
  have to drop working on the OMAP4 vs generic suspend/resume code and learn
  about something I've no current clue about.
 
  Please continue your help on generic suspend.
 
  Can someone please investigate and fix whatever is broken.
 
  Will find somebody to look at this issue.
 
 I had a look at this and my guess is that the same mapped DMA memory
 is unmapped twice. First the memory is unmapped in
 omap_hsmmc_dma_cleanup() due to the mmc error, then later in
 omap_hsmmc_post_req(). Here is a patch to resolve that. I haven't had
 the chance to test it yet though.
 
 ---
  drivers/mmc/host/omap_hsmmc.c |7 +--
  1 files changed, 5 insertions(+), 2 deletions(-)
 
 diff --git a/drivers/mmc/host/omap_hsmmc.c b/drivers/mmc/host/omap_hsmmc.c
 index 21e4a79..7f40767 100644
 --- a/drivers/mmc/host/omap_hsmmc.c
 +++ b/drivers/mmc/host/omap_hsmmc.c
 @@ -1011,6 +1011,7 @@ static void omap_hsmmc_dma_cleanup(struct
 omap_hsmmc_host *host, int errno)
 host-data-sg_len,
 omap_hsmmc_get_dma_dir(host, host-data));
 omap_free_dma(dma_ch);
 +   data-host_cookie = 0;
 }
 host-data = NULL;
  }
 @@ -1576,8 +1577,10 @@ static void omap_hsmmc_post_req(struct mmc_host
 *mmc, struct mmc_request *mrq,
 struct mmc_data *data = mrq-data;
 
 if (host-use_dma) {
 -   dma_unmap_sg(mmc_dev(host-mmc), data-sg, data-sg_len,
 -omap_hsmmc_get_dma_dir(host, data));
 +   if (data-host_cookie)
 +   dma_unmap_sg(mmc_dev(host-mmc), data-sg,
 +data-sg_len,
 +omap_hsmmc_get_dma_dir(host, data));
 data-host_cookie = 0;
 
 
 I also checked the mmci code for this same issue and mmci doesn't have
 the same bug.
 MMCI works fine in 3.1-rc2 even though the code is wrong. I propose
 this change.
 @@ -529,7 +529,7 @@ static void mmci_post_request(struct mmc_host
 *mmc, struct mmc_request *mrq,
 if (chan) {
 if (err)
 dmaengine_terminate_all(chan);
 -   if (err || data-host_cookie)
 +   if (data-host_cookie)
 
 I'll post these patches as soon as I have managed to test them or
 sooner if I get a tested-by.

Looking at MMCI, are you sure all those DMA engine terminate calls are
correct?

For example, why are we terminating all DMA engine transfers in the
post request function?  Surely that should've already been taken care

Re: [PATCH v9 00/12] use nonblock mmc requests to minimize latency

2011-08-26 Thread Russell King - ARM Linux

I'm not sure who's responsible for this, but the nonblocking MMC stuff is
broken on OMAPs HSMMC:

mmcblk0: error -84 transferring data, sector 149201, nr 64, cmd response 0x900, 
card status 0xb00   
mmcblk0: retrying using single block read   
[ cut here ]
WARNING: at /home/rmk/git/linux-2.6-rmk/lib/dma-debug.c:811 
check_unmap+0x1ac/0x764()
omap_hsmmc omap_hsmmc.0: DMA-API: device driver tries to free DMA memory it has 
not allocated [device address=0x80933000] [size=20480 bytes]
Modules linked in:  
Backtrace:  
[c0017874] (dump_backtrace+0x0/0x10c) from [c02ce8ac] (dump_stack+0x18/0x1c)
 r7:c1abfcb8 r6:c0186248 r5:c037de51 r4:032b
[c02ce894] (dump_stack+0x0/0x1c) from [c0039ed4] 
(warn_slowpath_common+0x58/0x70)
[c0039e7c] (warn_slowpath_common+0x0/0x70) from [c0039f90] 
(warn_slowpath_fmt+0x38/0x40)   
 
 r8:c1abfd50 r7: r6:5000 r5: r4:80933000
[c0039f58] (warn_slowpath_fmt+0x0/0x40) from [c0186248] 
(check_unmap+0x1ac/0x764)   

 r3:c0367d55 r2:c037e24d
[c018609c] (check_unmap+0x0/0x764) from [c0186978] 
(debug_dma_unmap_sg+0x100/0x134)
 
[c0186878] (debug_dma_unmap_sg+0x0/0x134) from [c0019770] 
(dma_unmap_sg+0x24/0x7c)
  
[c001974c] (dma_unmap_sg+0x0/0x7c) from [c0207220] 
(omap_hsmmc_post_req+0x48/0x54) 
 
[c02071d8] (omap_hsmmc_post_req+0x0/0x54) from [c01fb644] 
(mmc_start_req+0x9c/0x128)  
  
 r4:c1a76000
[c01fb5a8] (mmc_start_req+0x0/0x128) from [c02049fc] 
(mmc_blk_issue_rw_rq+0x80/0x460)
   
 r8:c1ab5c00 r7:0001 r6:c1ab5824 r5:c1ab5824 r4:c1ab5c00
[c020497c] (mmc_blk_issue_rw_rq+0x0/0x460) from [c02050a8] 
(mmc_blk_issue_rq+0x2cc/0x2fc)  
 
[c0204ddc] (mmc_blk_issue_rq+0x0/0x2fc) from [c02056c4] 
(mmc_queue_thread+0xa0/0x104)   

[c0205624] (mmc_queue_thread+0x0/0x104) from [c0054f8c] (kthread+0x88/0x90) 
[c0054f04] (kthread+0x0/0x90) from [c003d9a4] (do_exit+0x0/0x618)   
 r7:0013 r6:c003d9a4 r5:c0054f04 r4:c1a7bc7c
---[ end trace 3314ad56daf5d14f ]---

Luckily thinks continue to work, but clearly releasing DMA mappings which
drivers don't own is bad news.  Unfortunately, I don't have the bandwidth
to be able to investigate this at present - well, I could do but then I'd
have to drop working on the OMAP4 vs generic suspend/resume code and learn
about something I've no current clue about.

Can someone please investigate and fix whatever is broken.

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: [RFC PATCH 3/4] ARM: EXYNOS4: Add support AFTR mode cpuidle state on EXYNOS4210

2011-08-21 Thread Russell King - ARM Linux

On Fri, Aug 19, 2011 at 06:39:59PM +0530, Amit Daniel Kachhap wrote:
 +ENTRY(exynos4_enter_lp)
 + stmfd   sp!, { r3 - r12, lr }
 +
 + adr r0, sleep_save_misc
 +
 + mrc p15, 0, r2, c15, c0, 0  @ read power control register
 + str r2, [r0], #4
 +
 + mrc p15, 0, r2, c15, c0, 1  @ read diagnostic register
 + str r2, [r0], #4
 +
 + ldr r3, =resume_with_mmu
 + bl  cpu_suspend
 +
 + mov r0, sp
 + bl  exynos4_cpu_lp
 +
 + /* Restore original sp */
 + mov r0, sp
 + add r0, r0, #4
 + ldr sp, [r0]
 +
 + mov r0, #0
 + b   early_wakeup

This is based upon old kernel code.  Clearly hasn't been tested with
anything later than 3.0.

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: [PATCH v5] Add ARM cpu topology definition

2011-08-08 Thread Russell King - ARM Linux

On Tue, Jul 05, 2011 at 02:16:23PM +0100, Russell King - ARM Linux wrote:
 Looks fine now, and so can go to my patch system.  Many thanks.

Ping.

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: [PATCH v5] Add ARM cpu topology definition

2011-08-08 Thread Russell King - ARM Linux

On Mon, Aug 08, 2011 at 04:42:42PM +0530, RACHANA TEWARI wrote:
 Does anyone know way to initiate a bash script(to run on linux box) from
 windows Written in C#?
 
 How to write these Scripts...

This has nothing to do with CPU topology.  Please do not topic hijack.
This is also off-topic for these mailing lists.  You may wish to
consider posting your query to more appropriate mailing lists or forums
(eg, those dealing with C#).  Thanks.

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: [PATCH] ARM: do not mark CPU 0 as hotpluggable

2011-07-21 Thread Russell King - ARM Linux

On Wed, Jul 20, 2011 at 04:32:25PM -0700, Mike Turquette wrote:
 A quick poll of the ARM platforms that implement CPU Hotplug support
 shows that every platform treats CPU 0 as a special case that cannot be
 hotplugged.  In fact every platform has identical code for
 platform_cpu_die which returns -EPERM in the case of CPU 0.

Are you sure that's just not because everyone copied what Realview has
been doing (highly likely)?

I suspect that there's no reason that CPU0 can't be taken down, especially
on those platforms which don't take the CPU fully offline but just put it
into a WFI loop.

Those which restart the CPUs through the boot loader probably detect CPU0
as the boot CPU, so they probably can't take CPU0 down.

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: [PATCH] ARM: do not mark CPU 0 as hotpluggable

2011-07-21 Thread Russell King - ARM Linux

On Thu, Jul 21, 2011 at 11:03:04AM +0530, Santosh Shilimkar wrote:
 Just talking on behalf of OMAP, we can't offline CPU0 and limitation
 would remain in future OMAPs too.

Apart from the broken IRQ migration, and the way CPU0 immediately
reawakes if it is offlined on OMAP4 (even when IRQs are migrated off
CPU0) because omap_read_auxcoreboot0() returns 0, is there any other
reason?

With fixed IRQ migration and forcing CPU0 into an infinite WFI loop,
I can offline CPU0 and still have a running system.


___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: [RFC PATCH 05/17] ARM: kernel: save/restore kernel IF

2011-07-12 Thread Russell King - ARM Linux

On Tue, Jul 12, 2011 at 11:12:57AM +0100, Lorenzo Pieralisi wrote:
 Thank you very much Russell for this recap.
 
 On Mon, Jul 11, 2011 at 07:40:10PM +0100, Russell King - ARM Linux wrote:
  On Mon, Jul 11, 2011 at 03:00:47PM +0100, Lorenzo Pieralisi wrote:
   Well, short answer is no. On SMP we do need to save CPU registers 
   but if just one single cpu is shutdown L2 is still on. 
   cpu_suspend saves regs on the stack that has to be cleaned from
   L2 before shutting a CPU down which make things more complicated than
   they should.
  
  Hang on.  Please explain something to me here.  You've mentioned a few
  times that cpu_suspend() can't be used because of the L2 cache.  Why
  is this the case?
  
  OMAP appears to have code in its sleep path - which has been converted
  to cpu_suspend() support - to deal with the L2 issues.
  
 
 OMAP4, it is SMP configs I am talking about.

Seriously, that's not something which really concerns me at present
because suspend/resume in any form is not supported there in any form
in mainline.

All my comments are based on the mainline kernel.  That's what I work
with.  Everything elsewhere is not my concern.

So, let me say again.  OMAP suspend/resume support _in_ _mainline_ will
be converted at the next merge window.  As that's only OMAP3 _in_
_mainline_ which has the need for saving state etc, that's the only
OMAP code I have access to, and therefore that's the only thing which
I've been able to fix.

OMAP4 suspend/resume support doesn't exist in mainline and therefore
doesn't exist for me.

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: [RFC PATCH 05/17] ARM: kernel: save/restore kernel IF

2011-07-11 Thread Russell King - ARM Linux

On Mon, Jul 11, 2011 at 01:05:20PM -0700, Santosh Shilimkar wrote:
 (Just to add few more points on top of what Colin already commented)

 On 7/11/2011 11:40 AM, Russell King - ARM Linux wrote:
 On Mon, Jul 11, 2011 at 03:00:47PM +0100, Lorenzo Pieralisi wrote:
 Well, short answer is no. On SMP we do need to save CPU registers
 but if just one single cpu is shutdown L2 is still on.
 cpu_suspend saves regs on the stack that has to be cleaned from
 L2 before shutting a CPU down which make things more complicated than
 they should.

 Hang on.  Please explain something to me here.  You've mentioned a few
 times that cpu_suspend() can't be used because of the L2 cache.  Why
 is this the case?

 OMAP appears to have code in its sleep path - which has been converted
 to cpu_suspend() support - to deal with the L2 issues.

 This part is not done yet Russell. Infact the cpu_resume() function
 need an update to work with L2 enable configuration.

The code does deal with L2 cache enable in the resume path...

 However, lets recap.  What we do in cpu_suspend() in order is:

 - Save the ABI registers onto the stack, and some private state
 - Save the CPU specific state onto the stack
 We need to disable C bit here to avoid any speculative artifacts
 during further operations before WFI.

Which you are doing.

 - Flush the L1 cache
 - Call the platform specific suspend finisher
 Also finisher function should issue the WFI and just in
 case for some reason CPU doesn't hit the targeted low
 power state, finisher function takes care of things
 like enabling C bit, SMP bit etc.

You're restoring the C bit in the non-off paths already which follow a
failed WFI.  You're not touching the SMP bit there though - was it
ever reset?  I don't think so.

 On resume, with the MMU and caches off:

 - Platform defined entry point is called, which _may_ be cpu_resume
directly.
 - Platform initial code is executed to do whatever that requires
 - cpu_resume will be called
 - cpu_resume loads the previously saved private state
 - The CPU specific state is restored
 - Page table is modified to permit 1:1 mapping for MMU enable
 1:1 idmap used here should be mapped as non-cached to avoid L2 related
 issues. I faced similar issue in OMAP sleep code earlier and later
 thanks to Colin,  we got the fixed my making use of non-cached idmap.

It is specified that if the main control register C bit is clear,
accesses will be uncached.

Whether you get cache hits or not is probably implementation dependent,
and provided that the state information is cleaned from the L2 cache,
it doesn't matter if we hit the L2 cached copy or the RAM copy.  It's
the same data.

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: [PATCH 01/17] ARM: proc: add definition of cpu_reset for ARMv6 and ARMv7 cores

2011-07-10 Thread Russell King - ARM Linux

On Sun, Jul 10, 2011 at 12:00:24PM +0100, Will Deacon wrote:
 I've had a look at a bunch of the cpu_*_reset definitions and I can't see
 any reason why they wouldn't be callable with the flat mapping in place. In
 fact, there's a scary comment for xscale:

However, that flat mapping doesn't save us, because this only covers
space below PAGE_OFFSET virtual.  We're executing these generally from
virtual space at addresses greater than this, which means when the MMU
is turned off, the instruction stream disappears.

 @ CAUTION: MMU turned off from this point. We count on the pipeline
 @ already containing those two last instructions to survive.
 
 which I think would disappear if the code was called via the ID map.

Yes, and everything pre-ARMv6 fundamentally relies upon the instructions
following the MCR to turn the MMU off already being in the CPUs pipeline.
Pre-ARMv6 relies upon this kind of behaviour from the CPU:

fetch   decode  execute
mcr
mov pc  mcr
nop mov pc  mcr
nop nop mov pc
inst0 --- flushed ---

where inst0 is the instruction at the target of the mov pc branch.
May not be well documented in the architecture manual, but there have
been documentation of this level against CPUs in the past.  Maybe
back in ARMv3 times, but the trick has proven to work all the way up
to ARMv5, and these are of course CPU specific files rather than
architecture specific.

It's curious that with ARMs move to a more relaxed model, that turning
the MMU off has visibly changed from a fairly weak operation to an
effective strong instruction barrier.  These now have visibly this
behaviour:

fetch   decode  execute
mcr
mov pc  mcr
inst0 mov pc  mcr
-- flushed --
mov pc*
* - attempt to reload this instruction fails because MMU is now off.

I'm not saying there that the pipeline is flushed (it may be) - but this
represents the _observed_ behaviour.

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: [RFC PATCH 05/17] ARM: kernel: save/restore kernel IF

2011-07-09 Thread Russell King - ARM Linux

On Sat, Jul 09, 2011 at 09:38:15AM +0100, Russell King - ARM Linux wrote:
 On Thu, Jul 07, 2011 at 04:50:18PM +0100, Lorenzo Pieralisi wrote:
  +static int late_init(void)
  +{
  +   int rc;
  +   struct sr_cluster *cluster;
  +   int cluster_index, cpu_index = sr_platform_get_cpu_index();
 
 Stop this madness, and use the standard linux APIs like smp_processor_id
 here.  It might actually help you find bugs, like the fact that you're
 in a preemptible context here, and so could be rescheduled onto any other
 CPU in the system _after_ you've read the MPIDR register.
 
  +
  +   cluster_index = sr_platform_get_cluster_index();
  +   cluster = main_table.cluster_table + cluster_index;
  +   main_table.os_mmu_context[cluster_index][cpu_index] =
  +   current-active_mm-pgd;
  +   cpu_switch_mm(main_table.fw_mmu_context, current-active_mm);
  +   rc = sr_platform_init();
  +   cpu_switch_mm(main_table.os_mmu_context[cluster_index][cpu_index],
  +   current-active_mm);
 
 CPU numbers are unique in the system, why do you need a 'cluster_index'
 to save this?  In fact why do you even need to save it in a structure
 at all?
 
 Plus, cluster is not used, please get rid of it.

Oh, that's another thing.  This thread is about introducing idle support
not cluster support.  Cluster support is surely something different (esp.
as it seems from the above code that you're trying to support several
clusters of MPCore CPUs, each with physical 0-N CPU numbers.)

Cluster support should be an entirely separate patch series.

If that is not what this cluster stuff in this patch is about, then it's
just written badly and reinforces the need for it to be rewritten - in
that case there's no need for a 2D array.

In any case, smp_processor_id() will be (and must be) unique in any given
running kernel across all CPUs, even if you have clusters of N CPUs all
physically numbered 0-N.

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: [RFC PATCH 06/17] ARM: kernel: save/restore generic infrastructure

2011-07-09 Thread Russell King - ARM Linux

The idea of splitting a large patch up into smaller patches is to do
it in a logical way so that:

1. Each patch is self-contained, adding a single new - and where possible
   complete - feature or bug fix.
2. Ease of review.

Carving your big patch up by file does not do either of this, because
all patches interact with each other.  There's people within ARM Ltd who
have been dealing with patch sets for some time who you could ask advice
from - or who you could ask to review your patches before you send them
out on the mailing list.

On Thu, Jul 07, 2011 at 04:50:19PM +0100, Lorenzo Pieralisi wrote:
 diff --git a/arch/arm/kernel/sr.h b/arch/arm/kernel/sr.h
 new file mode 100644
 index 000..6b24e53
 --- /dev/null
 +++ b/arch/arm/kernel/sr.h
 @@ -0,0 +1,162 @@
 +#define SR_NR_CLUSTERS 1
 +
 +#define STACK_SIZE 512
 +
 +#define CPU_A5   0x4100c050

Not used in this patch - should it be in some other patch?

 +#define CPU_A8   0x4100c080

Not used in this patch - should it be in some other patch?

 +#define CPU_A9   0x410fc090

Not used in this patch - and if this is generic code then it should not
be specific to any CPU.  Please get rid of all these definitions, and
if they're used you need to rework these patches to have proper separation
of generic code from CPU specific code.

 +#define L2_DATA_SIZE 16

Not used in this patch.

 +#define CONTEXT_SPACE_UNCACHED   (2 * PAGE_SIZE)
 +#define PA(f) ((typeof(f) *)virt_to_phys((void *)f))

This is just messed up.  Physical addresses aren't pointers, they're
numeric.  PA() is not used in this patch either, so it appears it can
be deleted.

 +extern int lookup_arch(void);

This doesn't exist in this patch, should it be in another patch or
deleted?

 +/*
 + * Global variables
 + */
 +extern struct sr_main_table main_table;
 +extern unsigned long idle_save_context;
 +extern unsigned long idle_restore_context;
 +extern unsigned long idle_mt;
 +extern void *context_memory_uncached;

Why does this need to be a global?

 +
 +/*
 + * Context save/restore
 + */
 +typedef u32 (sr_save_context_t)
 + (struct sr_cluster *,
 + struct sr_cpu*, u32);

Fits on one line so should be on one line.

 +typedef u32 (sr_restore_context_t)
 + (struct sr_cluster *,
 + struct sr_cpu*);

Fits on one line so should be on one line.

 +
 +extern sr_save_context_t sr_save_context;

This doesn't exist in this patch, should it be in another patch?

 +extern sr_restore_context_t sr_restore_context;

Ditto.

 +
 +
 +extern struct sr_arch *get_arch(void);

Ditto.

 +
 +
 +/*
 + * 1:1 mappings
 + */
 +
 +extern int linux_sr_setup_translation_tables(void);

Ditto.

 +
 +/*
 + * dumb memory allocator
 + */
 +
 +extern void *get_memory(unsigned int size);
 +
 +/*
 + * Entering/Leaving C-states function entries
 + */
 +
 +extern int sr_platform_enter_cstate(unsigned cpu_index, struct sr_cpu *cpu,
 + struct sr_cluster *cluster);
 +extern int sr_platform_leave_cstate(unsigned cpu_index, struct sr_cpu *cpu,
 + struct sr_cluster *cluster);

See comment at the bottom - would inline functions here be better
or maybe even place them at the callsite to make the code easier to
understand if they're only used at one location.

 +
 +/* save/restore main table */
 +extern struct sr_main_table main_table;

Why do we have two 'main_table' declarations in this same header file?

 +
 +/*
 + * Init functions
 + */
 +
 +extern int sr_platform_runtime_init(void);

Not defined in this patch.

 +extern int sr_platform_init(void);
 +extern int sr_context_init(void);
 +
 +
 +/*
 + * v7 specific
 + */
 +
 +extern char *cpu_v7_suspend_size;

No - stop going underneath the covers of existing APIs to get what you
want.  Use cpu_suspend() and cpu_resume() directly.  Note that they've
changed to be more flexible and those patches have been on this mailing
list, and will be going in for 3.1.

If they still don't do what you need, I'm going to be *pissed* because
you've obviously known that they don't yet you haven't taken the time
to get involved on this mailing list with the discussions over it.

 +extern void scu_cpu_mode(void __iomem *base, int state);

Not defined in this patch - should it be in another patch or deleted?

 +
 +/*
 + * These arrays keep suitable stack pointers for CPUs.
 + *
 + * The memory must be 8-byte aligned.
 + */
 +
 +extern unsigned long platform_cpu_stacks[CONFIG_NR_CPUS];

Ditto.

 +extern unsigned long platform_cpu_nc_stacks[CONFIG_NR_CPUS];

Ditto.

And should these be per-cpu variables?  In any case, CONFIG_NR_CPUS
doesn't look right here, NR_CPUS is probably what you want.

 +#endif
 diff --git a/arch/arm/kernel/sr_context.c b/arch/arm/kernel/sr_context.c
 new file mode 100644
 index 000..25eaa43
 --- /dev/null
 +++ b/arch/arm/kernel/sr_context.c
 @@ -0,0 +1,23 @@
 +/*
 + * Copyright (C) 2008-2011 ARM Limited
 + * Author(s): Jon Callan, Lorenzo Pieralisi
 + *
 + * This program is

Re: [PATCH 01/17] ARM: proc: add definition of cpu_reset for ARMv6 and ARMv7 cores

2011-07-09 Thread Russell King - ARM Linux

On Thu, Jul 07, 2011 at 04:50:14PM +0100, Lorenzo Pieralisi wrote:
 From: Will Deacon will.dea...@arm.com
 
 This patch adds simple definitions of cpu_reset for ARMv6 and ARMv7
 cores, which disable the MMU via the SCTLR.

This really needs fixing properly, so that we have this well defined
across all supported ARM cores.  Requiring ARMv6 and ARMv7 to have this
code called with a flat mapping (which may overlap a section boundary)
vs ARMv5 and lower code which doesn't is just silly.

With any API, we need consistency.  So if ARMv6 and v7 require a flat
mapping, we need to ensure that ARMv5 and lower is happy to deal with
that code being also called with a flat mapping.

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: [PATCH 02/17] ARM: Add cpu power management notifiers

2011-07-09 Thread Russell King - ARM Linux

On Thu, Jul 07, 2011 at 04:50:15PM +0100, Lorenzo Pieralisi wrote:
 During some CPU power modes entered during idle, hotplug and
 suspend, peripherals located in the CPU power domain, such as
 the GIC and VFP, may be powered down.  Add a notifier chain
 that allows drivers for those peripherals to be notified
 before and after they may be reset.

I defer comment on this until I see what the CPU_COMPLEX_* stuff is used
for.

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: [PATCH 03/17] ARM: gic: Use cpu pm notifiers to save gic state

2011-07-09 Thread Russell King - ARM Linux

On Thu, Jul 07, 2011 at 04:50:16PM +0100, Lorenzo Pieralisi wrote:
 From: Colin Cross ccr...@android.com
 
 When the cpu is powered down in a low power mode, the gic cpu
 interface may be reset, and when the cpu complex is powered
 down, the gic distributor may also be reset.
 
 This patch uses CPU_PM_ENTER and CPU_PM_EXIT notifiers to save
 and restore the gic cpu interface registers, and the
 CPU_COMPLEX_PM_ENTER and CPU_COMPLEX_PM_EXIT notifiers to save
 and restore the gic distributor registers.
 
 Signed-off-by: Colin Cross ccr...@android.com

Lost attributations and original author.  This is based in part on code
from Gary King, according to patch 6646/1 in the patch system.

Moreover, how now that we have genirq dealing with the suspend/restore
issues, how much of this is actually required.  And should this be
GIC specific or should there be a way of asking genirq to take care of
some of this for us?

We need to _reduce_ the amount of code needed to support this stuff, and
if core code almost-but-not-quite does what we need then we need to talk
to the maintainers of that code to see whether it can be changed.

Because adding 212 lines to save and restore the state of every interrupt
controller that we may have just isn't on.  We need this properly
abstracted and dealt with in a generic way.

 ---
  arch/arm/common/gic.c |  212 
 +
  1 files changed, 212 insertions(+), 0 deletions(-)
 
 diff --git a/arch/arm/common/gic.c b/arch/arm/common/gic.c
 index 4ddd0a6..8d62e07 100644
 --- a/arch/arm/common/gic.c
 +++ b/arch/arm/common/gic.c
 @@ -29,6 +29,7 @@
  #include linux/cpumask.h
  #include linux/io.h
  
 +#include asm/cpu_pm.h
  #include asm/irq.h
  #include asm/mach/irq.h
  #include asm/hardware/gic.h
 @@ -42,6 +43,17 @@ struct gic_chip_data {
   unsigned int irq_offset;
   void __iomem *dist_base;
   void __iomem *cpu_base;
 +#ifdef CONFIG_CPU_PM
 + u32 saved_spi_enable[DIV_ROUND_UP(1020, 32)];
 + u32 saved_spi_conf[DIV_ROUND_UP(1020, 16)];
 + u32 saved_spi_pri[DIV_ROUND_UP(1020, 4)];
 + u32 saved_spi_target[DIV_ROUND_UP(1020, 4)];
 + u32 __percpu *saved_ppi_enable;
 + u32 __percpu *saved_ppi_conf;
 + u32 __percpu *saved_ppi_pri;
 +#endif
 +
 + unsigned int gic_irqs;
  };
  
  /*
 @@ -283,6 +295,8 @@ static void __init gic_dist_init(struct gic_chip_data 
 *gic,
   if (gic_irqs  1020)
   gic_irqs = 1020;
  
 + gic-gic_irqs = gic_irqs;
 +
   /*
* Set all global interrupts to be level triggered, active low.
*/
 @@ -350,6 +364,203 @@ static void __cpuinit gic_cpu_init(struct gic_chip_data 
 *gic)
   writel_relaxed(1, base + GIC_CPU_CTRL);
  }
  
 +#ifdef CONFIG_CPU_PM
 +/*
 + * Saves the GIC distributor registers during suspend or idle.  Must be 
 called
 + * with interrupts disabled but before powering down the GIC.  After calling
 + * this function, no interrupts will be delivered by the GIC, and another
 + * platform-specific wakeup source must be enabled.
 + */
 +static void gic_dist_save(unsigned int gic_nr)
 +{
 + unsigned int gic_irqs;
 + void __iomem *dist_base;
 + int i;
 +
 + if (gic_nr = MAX_GIC_NR)
 + BUG();
 +
 + gic_irqs = gic_data[gic_nr].gic_irqs;
 + dist_base = gic_data[gic_nr].dist_base;
 +
 + if (!dist_base)
 + return;
 +
 + for (i = 0; i  DIV_ROUND_UP(gic_irqs, 16); i++)
 + gic_data[gic_nr].saved_spi_conf[i] =
 + readl_relaxed(dist_base + GIC_DIST_CONFIG + i * 4);
 +
 + for (i = 0; i  DIV_ROUND_UP(gic_irqs, 4); i++)
 + gic_data[gic_nr].saved_spi_pri[i] =
 + readl_relaxed(dist_base + GIC_DIST_PRI + i * 4);
 +
 + for (i = 0; i  DIV_ROUND_UP(gic_irqs, 4); i++)
 + gic_data[gic_nr].saved_spi_target[i] =
 + readl_relaxed(dist_base + GIC_DIST_TARGET + i * 4);
 +
 + for (i = 0; i  DIV_ROUND_UP(gic_irqs, 32); i++)
 + gic_data[gic_nr].saved_spi_enable[i] =
 + readl_relaxed(dist_base + GIC_DIST_ENABLE_SET + i * 4);
 +
 + writel_relaxed(0, dist_base + GIC_DIST_CTRL);
 +}
 +
 +/*
 + * Restores the GIC distributor registers during resume or when coming out of
 + * idle.  Must be called before enabling interrupts.  If a level interrupt
 + * that occured while the GIC was suspended is still present, it will be
 + * handled normally, but any edge interrupts that occured will not be seen by
 + * the GIC and need to be handled by the platform-specific wakeup source.
 + */
 +static void gic_dist_restore(unsigned int gic_nr)
 +{
 + unsigned int gic_irqs;
 + unsigned int i;
 + void __iomem *dist_base;
 +
 + if (gic_nr = MAX_GIC_NR)
 + BUG();
 +
 + gic_irqs = gic_data[gic_nr].gic_irqs;
 + dist_base = gic_data[gic_nr].dist_base;
 +
 + if (!dist_base)
 + return;
 +
 + writel_relaxed(0, dist_base + GIC_DIST_CTRL);
 +
 +

Re: [PATCH 04/17] ARM: vfp: Use cpu pm notifiers to save vfp state

2011-07-09 Thread Russell King - ARM Linux

On Thu, Jul 07, 2011 at 04:50:17PM +0100, Lorenzo Pieralisi wrote:
 From: Colin Cross ccr...@android.com
 
 When the cpu is powered down in a low power mode, the vfp
 registers may be reset.
 
 This patch uses CPU_PM_ENTER and CPU_PM_EXIT notifiers to save
 and restore the cpu's vfp registers.
 
 Signed-off-by: Colin Cross ccr...@android.com
 ---
  arch/arm/vfp/vfpmodule.c |   40 
  1 files changed, 40 insertions(+), 0 deletions(-)
 
 diff --git a/arch/arm/vfp/vfpmodule.c b/arch/arm/vfp/vfpmodule.c
 index f25e7ec..6f08dbe 100644
 --- a/arch/arm/vfp/vfpmodule.c
 +++ b/arch/arm/vfp/vfpmodule.c
 @@ -21,6 +21,7 @@
  #include asm/cputype.h
  #include asm/thread_notify.h
  #include asm/vfp.h
 +#include asm/cpu_pm.h
  
  #include vfpinstr.h
  #include vfp.h
 @@ -169,6 +170,44 @@ static struct notifier_block vfp_notifier_block = {
   .notifier_call  = vfp_notifier,
  };
  
 +#ifdef CONFIG_CPU_PM
 +static int vfp_cpu_pm_notifier(struct notifier_block *self, unsigned long 
 cmd,
 + void *v)
 +{
 + u32 fpexc = fmrx(FPEXC);
 + unsigned int cpu = smp_processor_id();
 +
 + switch (cmd) {
 + case CPU_PM_ENTER:
 + if (last_VFP_context[cpu]) {
 + fmxr(FPEXC, fpexc | FPEXC_EN);
 + vfp_save_state(last_VFP_context[cpu], fpexc);
 + /* force a reload when coming back from idle */
 + last_VFP_context[cpu] = NULL;
 + fmxr(FPEXC, fpexc  ~FPEXC_EN);
 + }
 + break;

This doesn't look right.  On SMP setups, we always save the state of an
enabled VFP on thread switches.  That means the saved context in every
thread is always up to date for all threads, except _possibly_ for the
currently executing thread on the CPU.

On UP setups, we only save the state when we need to, so we need to do
something like the above.

However, we're growing more and more functions in the VFP code dealing
with saving and restoring state, and it's starting to become really silly
and confusing about which function is called for what and why, and why
it's different from another function doing something similar.

We need to sort this out so we have a _sane_ approach to this, rather
than inventing more and more creative ways to save VFP state and
restore it later.

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: [PATCH 04/17] ARM: vfp: Use cpu pm notifiers to save vfp state

2011-07-09 Thread Russell King - ARM Linux

On Sat, Jul 09, 2011 at 11:44:08AM +0100, Russell King - ARM Linux wrote:
 We need to sort this out so we have a _sane_ approach to this, rather
 than inventing more and more creative ways to save VFP state and
 restore it later.

And here, let's prove that the current code is just soo bloody complex
that it needs redoing.  In the following code, 'last_VFP_context' is
renamed to 'vfp_current_hw_state' for clarity.

void vfp_sync_hwstate(struct thread_info *thread)
{
unsigned int cpu = get_cpu();

/*
 * If the thread we're interested in is the current owner of the
 * hardware VFP state, then we need to save its state.
 */
if (vfp_current_hw_state[cpu] == thread-vfpstate) {
u32 fpexc = fmrx(FPEXC);

/*
 * Save the last VFP state on this CPU.
 */
fmxr(FPEXC, fpexc | FPEXC_EN);
vfp_save_state(thread-vfpstate, fpexc | FPEXC_EN);
fmxr(FPEXC, fpexc);
}

Here, 'thread' is the thread we're interested in ensuring that we have
up to date context in thread-vfpstate.  On entry to this function, we
can be running on any CPU in the system, and 'thread' could have been
running on any other CPU in the system.

What this code is saying is: if the currrent CPU's hardware VFP state was
owned by this thread, then update the current VFP state.  So far it looks
sane.

Now, lets consider what with thread migration.  First, lets define three
threads.

Thread 1, we'll call 'interesting_thread' which is a thread which is
running on CPU0, using VFP (so vfp_current_hw_state[0] =
interesting_thread-vfpstate) and gets migrated off to CPU1, where
it continues execution of VFP instructions.

Thread 2, we'll call 'new_cpu0_thread' which is the thread which takes
over on CPU0.  This has also been using VFP, and last used VFP on CPU0,
but doesn't use it again.

The following code will be executed twice:

cpu = thread-cpu;

/*
 * On SMP, if VFP is enabled, save the old state in
 * case the thread migrates to a different CPU. The
 * restoring is done lazily.
 */
if ((fpexc  FPEXC_EN)  vfp_current_hw_state[cpu]) {
vfp_save_state(vfp_current_hw_state[cpu], fpexc);
vfp_current_hw_state[cpu]-hard.cpu = cpu;
}
/*
 * Thread migration, just force the reloading of the
 * state on the new CPU in case the VFP registers
 * contain stale data.
 */
if (thread-vfpstate.hard.cpu != cpu)
vfp_current_hw_state[cpu] = NULL;

The first execution will be on CPU0 to switch away from 'interesting_thread'.
interesting_thread-cpu will be 0.

So, vfp_current_hw_state[0] points at interesting_thread-vfpstate.
The hardware state will be saved, along with the CPU number (0) that
it was executing on.

'thread' will be 'new_cpu0_thread' with new_cpu0_thread-cpu = 0.
Also, because it was executing on CPU0, new_cpu0_thread-vfpstate.hard.cpu = 0,
and so the thread migration check is not triggered.

This means that vfp_current_hw_state[0] remains pointing at interesting_thread.

The second execution will be on CPU1 to switch _to_ 'interesting_thread'.
So, 'thread' will be 'interesting_thread' and interesting_thread-cpu now
will be 1.  The previous thread executing on CPU1 is not relevant to this
so we shall ignore that.

We get to the thread migration check.  Here, we discover that
interesting_thread-vfpstate.hard.cpu = 0, yet interesting_thread-cpu is
now 1, indicating thread migration.  We set vfp_current_hw_state[1] to
NULL.

So, at this point vfp_current_hw_state[] contains the following:

[0] = interesting_thread
[1] = NULL

Our interesting thread now executes a VFP instruction, takes a fault
which loads the state into the VFP hardware.  Now, through the assembly
we now have:

[0] = interesting_thread
[1] = interesting_thread

CPU1 stops due to ptrace (and so saves its VFP state) using the thread
switch code above), and CPU0 calls vfp_sync_hwstate().

if (vfp_current_hw_state[cpu] == thread-vfpstate) {
vfp_save_state(thread-vfpstate, fpexc | FPEXC_EN);

BANG, we corrupt interesting_thread's VFP state by overwriting the
more up-to-date state saved by CPU1 with the old VFP state from CPU0.

I think this is not the only problem with this code, and it's in
desperate need of being cleaned up.  Until such time, adding more
state saving code is just going to be extremely hairy.

Finally, as far as saving state for _idle_ goes (in other words, while
the CPU's idle loop in cpu_idle() is running), take a moment to consider
the following: the idle thread being a kernel thread does not use VFP.
It has no useful VFP state.  So:

1. On SMP, because we've switched away from any userland thread, we have

Re: [PATCH 03/17] ARM: gic: Use cpu pm notifiers to save gic state

2011-07-09 Thread Russell King - ARM Linux

On Sat, Jul 09, 2011 at 04:01:19PM -0700, Colin Cross wrote:
 On Sat, Jul 9, 2011 at 3:33 PM, Russell King - ARM Linux
 li...@arm.linux.org.uk wrote:
  On Sat, Jul 09, 2011 at 03:10:56PM -0700, Colin Cross wrote:
  This is necessary for cpuidle states that lose the GIC registers, not
  just suspend, because the GIC is in the cpu's power domain.  We could
  avoid saving and restoring all the GIC registers in suspend and idle
  by reusing the initialization functions, and then having the core irq
  code call the unmask, set_type, and set_affinity functions on each irq
  to reconfigure it, but that will be very inefficient - it will convert
  each register write in the restore functions to a read-modify-write
  per interrupt in that register.  Santosh is already complaining that
  this commong GIC restore code will be slower than the automatic DMA to
  restore the GIC registers that OMAP4 supports.
 
  Well, we need to come up with something sensible - a way of doing this
  which doesn't require every interrupt controller driver (of which we as
  an architecture have many) to have lots of support added.
 
  If the current way is inefficient and is noticably so, then let's talk
  to Thomas about finding a way around that - maybe having the generic
  code make one suspend/resume callback per irq gc chip rather than doing
  it per-IRQ.  We can then reuse the same paths for suspend/resume as for
  idle state saving.
 
 
 Are you referring to moving the gic driver to be gc chip?  Otherwise,
 I don't understand your suggestion - how is callback per chip any
 different than what this patch implements?  It just gets it's
 notification through a cpu_pm notifier, which works in idle and
 suspend, instead of a syscore op like the gc driver does.
 
 This patch does save and restore some registers that are never
 modified after init, so they don't need to be saved.

The point is that we should aim to get to the point where, if an
interrupt controller supports PM, then it supports _all_ PM out the
box and doesn't require additional code for cpu idle PM vs system
suspend PM.

In other words, all we should need to do is provide genirq with a
couple of functions for 'save state' and 'restore state'.

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: [RFC PATCH 00/17] ARM: common idle infrastructure

2011-07-07 Thread Russell King - ARM Linux

On Thu, Jul 07, 2011 at 04:50:13PM +0100, Lorenzo Pieralisi wrote:
 This patchset is a first attempt at providing a consolidation of idle
 code for the ARM processor architecture and a request for comment on
 the provided methodology.
 It relies and it is based on kernel features such as suspend/resume,
 pm notifiers and common code for cpu_reset().

Please delay this for the next merge window - there's too much going on
at the moment to even start looking at this patch set.

Thanks.

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: [PATCH v5] Add ARM cpu topology definition

2011-07-05 Thread Russell King - ARM Linux

Looks fine now, and so can go to my patch system.  Many thanks.

On Mon, Jul 04, 2011 at 06:49:45PM +0200, Vincent Guittot wrote:
 The affinity between ARM processors is defined in the MPIDR register.
 We can identify which processors are in the same cluster,
 and which ones have performance interdependency. We can define the
 cpu topology of ARM platform, that is then used by sched_mc and sched_smt.
 
 The default state of sched_mc and sched_smt config is disable.
 When enabled, the behavior of the scheduler can be modified with
 sched_mc_power_savings and sched_smt_power_savings sysfs interfaces.
 
 Changes since v4 :
 *  Remove unnecessary parentheses and blank lines
 
 Changes since v3 :
 * Update the format of printk message
 * Remove blank line
 
 Changes since v2 :
 * Update the commit message and some comments
 
 Changes since v1 :
 * Update the commit message
 * Add read_cpuid_mpidr in arch/arm/include/asm/cputype.h
 * Modify header of arch/arm/kernel/topology.c
 * Modify tests and manipulation of MPIDR's bitfields
 * Modify the place and dependancy of the config
 * Modify Noop functions
 
 Signed-off-by: Vincent Guittot vincent.guit...@linaro.org
 Reviewed-by: Amit Kucheria amit.kuche...@linaro.org
 ---
  arch/arm/Kconfig|   25 +++
  arch/arm/include/asm/cputype.h  |6 ++
  arch/arm/include/asm/topology.h |   33 +
  arch/arm/kernel/Makefile|1 +
  arch/arm/kernel/smp.c   |5 ++
  arch/arm/kernel/topology.c  |  148 
 +++
  6 files changed, 218 insertions(+), 0 deletions(-)
  create mode 100644 arch/arm/kernel/topology.c
 
 diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
 index 9adc278..f327e55 100644
 --- a/arch/arm/Kconfig
 +++ b/arch/arm/Kconfig
 @@ -1344,6 +1344,31 @@ config SMP_ON_UP
  
 If you don't know what to do here, say Y.
  
 +config ARM_CPU_TOPOLOGY
 + bool Support cpu topology definition
 + depends on SMP  CPU_V7
 + default y
 + help
 +   Support ARM cpu topology definition. The MPIDR register defines
 +   affinity between processors which is then used to describe the cpu
 +   topology of an ARM System.
 +
 +config SCHED_MC
 + bool Multi-core scheduler support
 + depends on ARM_CPU_TOPOLOGY
 + help
 +   Multi-core scheduler support improves the CPU scheduler's decision
 +   making when dealing with multi-core CPU chips at a cost of slightly
 +   increased overhead in some places. If unsure say N here.
 +
 +config SCHED_SMT
 + bool SMT scheduler support
 + depends on ARM_CPU_TOPOLOGY
 + help
 +   Improves the CPU scheduler's decision making when dealing with
 +   MultiThreading at a cost of slightly increased overhead in some
 +   places. If unsure say N here.
 +
  config HAVE_ARM_SCU
   bool
   depends on SMP
 diff --git a/arch/arm/include/asm/cputype.h b/arch/arm/include/asm/cputype.h
 index cd4458f..cb47d28 100644
 --- a/arch/arm/include/asm/cputype.h
 +++ b/arch/arm/include/asm/cputype.h
 @@ -8,6 +8,7 @@
  #define CPUID_CACHETYPE  1
  #define CPUID_TCM2
  #define CPUID_TLBTYPE3
 +#define CPUID_MPIDR  5
  
  #define CPUID_EXT_PFR0   c1, 0
  #define CPUID_EXT_PFR1   c1, 1
 @@ -70,6 +71,11 @@ static inline unsigned int __attribute_const__ 
 read_cpuid_tcmstatus(void)
   return read_cpuid(CPUID_TCM);
  }
  
 +static inline unsigned int __attribute_const__ read_cpuid_mpidr(void)
 +{
 + return read_cpuid(CPUID_MPIDR);
 +}
 +
  /*
   * Intel's XScale3 core supports some v6 features (supersections, L2)
   * but advertises itself as v5 as it does not support the v6 ISA.  For
 diff --git a/arch/arm/include/asm/topology.h b/arch/arm/include/asm/topology.h
 index accbd7c..a7e457e 100644
 --- a/arch/arm/include/asm/topology.h
 +++ b/arch/arm/include/asm/topology.h
 @@ -1,6 +1,39 @@
  #ifndef _ASM_ARM_TOPOLOGY_H
  #define _ASM_ARM_TOPOLOGY_H
  
 +#ifdef CONFIG_ARM_CPU_TOPOLOGY
 +
 +#include linux/cpumask.h
 +
 +struct cputopo_arm {
 + int thread_id;
 + int core_id;
 + int socket_id;
 + cpumask_t thread_sibling;
 + cpumask_t core_sibling;
 +};
 +
 +extern struct cputopo_arm cpu_topology[NR_CPUS];
 +
 +#define topology_physical_package_id(cpu)(cpu_topology[cpu].socket_id)
 +#define topology_core_id(cpu)(cpu_topology[cpu].core_id)
 +#define topology_core_cpumask(cpu)   (cpu_topology[cpu].core_sibling)
 +#define topology_thread_cpumask(cpu) (cpu_topology[cpu].thread_sibling)
 +
 +#define mc_capable() (cpu_topology[0].socket_id != -1)
 +#define smt_capable()(cpu_topology[0].thread_id != -1)
 +
 +void init_cpu_topology(void);
 +void store_cpu_topology(unsigned int cpuid);
 +const struct cpumask *cpu_coregroup_mask(unsigned int cpu);
 +
 +#else
 +
 +static inline void init_cpu_topology(void) { }
 +static inline void store_cpu_topology(unsigned int cpuid) { }
 +
 +#endif
 +
  #include asm-generic/topology.h
  
  #endif /* _ASM_ARM_TOPOLOGY_H

Re: [PATCH v6 00/11] mmc: use nonblock mmc requests to minimize latency

2011-07-03 Thread Russell King - ARM Linux

On Tue, Jun 28, 2011 at 09:22:20AM +0300, saeed bishara wrote:
 On Mon, Jun 27, 2011 at 2:02 PM, Russell King - ARM Linux
 li...@arm.linux.org.uk wrote:
  On Mon, Jun 27, 2011 at 01:34:48PM +0300, saeed bishara wrote:
  Russell,
    I'm curious about the correctness of this patch for systems with
  outer cache. shouldn't the dsb be issued before the outer cache
  maintenance?
 
  Maybe we should do two passes over SG lists then - one for the inner and
  another for the outer cache?
 
  In effect we could do three passes:
 
  1. Calculate the total size of the SG list to determine whether full
    cache flush is more efficient.
  2. Flush inner cache
    Then dsb()
  3. Flush outer cache
    Another dsb()
 
 looking at l2x0 cache, it seems to me that it would be possible to do
 asynchronous l2 cache maintenance for
 range operations, so maybe that can be utilized in order to
 parallelism between inner cache and outer cache maintenance,
 so the flow can be:
 2. Flush sg buffer from inner cache
 3. dsb()
 4. start Flush outer cache for that buffer
 5. Flush next sg buffer from inner cache
 6. dsb
 7. goto 4.
 8. when no more buffers left, wait for outer cache operations

I'm not sure how practical that is given the architecture of the L2x0
controllers - where we need to wait for the previous operation to
complete by reading the operation specific register and then issue a
sync.

Having looked at this again, I think trying to do any optimization of
this code will be fraught, because of the dmabounce stuff getting in
the way.  If only dmabounce didn't exist... if only crap hardware didn't
exist...  dmabounce really needs to die.

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: [PATCH v8 00/12] use nonblock mmc requests to minimize latency

2011-07-02 Thread Russell King - ARM Linux

On Fri, Jul 01, 2011 at 06:44:43PM +0200, Arnd Bergmann wrote:
 On Thursday 30 June 2011, Russell King - ARM Linux wrote:
  We've been here before - with PCMCIA's card insertion code, where you
  have to go through a sequence of events (insert, power up, reset, etc).
  The PCMCIA code used to have a collection of small functions to do
  each step, one chained after the other in a state machine fashion.
  The result was horrid.  That's exactly what you'll end up with here.
  
  Threads have their place, and this is one of them.
 
 Ok, fair enough. The performance enhancement is certainly here already
 with getting the cache management operations out of the hot path,
 and for the fully asynchronous case it's not getting better by trying
 to be smarter.
 
 At least for ARM, the overhead of the DMA mapping operations will
 dwarf the overhead of the extra context switches for the foreseeable
 future, so we don't need to bother.
 
 Things might be different for coherent low-end CPU cores like Atom
 when mmc device become much faster and block access becomes CPU
 bound.

One other thing to be considered here is whether this idea should be
limited to just MMC or whether it should be extended further, to
move the DMA mapping stuff out of the hot path for other block devices
too.

There are ARM systems with SATA which do 28MB/s - which could be
improved by this technique.


___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: [PATCH v4] Add ARM cpu topology definition

2011-07-01 Thread Russell King - ARM Linux

On Fri, Jul 01, 2011 at 07:43:30AM +0200, Vincent Guittot wrote:
 Changes since v3 :
 * Update the format of printk message
 * Remove blank line

Can I trouble you to check the patch for more instances of the 'blank line
at end of function' thing... Also, let's get rid of unnecessary parens.

 diff --git a/arch/arm/include/asm/topology.h b/arch/arm/include/asm/topology.h
 index accbd7c..63a7454 100644
 --- a/arch/arm/include/asm/topology.h
 +++ b/arch/arm/include/asm/topology.h
 @@ -1,6 +1,39 @@
  #ifndef _ASM_ARM_TOPOLOGY_H
  #define _ASM_ARM_TOPOLOGY_H
  
 +#ifdef CONFIG_ARM_CPU_TOPOLOGY
 +
 +#include linux/cpumask.h
 +
 +struct cputopo_arm {
 + int thread_id;
 + int core_id;
 + int socket_id;
 + cpumask_t thread_sibling;
 + cpumask_t core_sibling;
 +};
 +
 +extern struct cputopo_arm cpu_topology[NR_CPUS];
 +
 +#define topology_physical_package_id(cpu)(cpu_topology[cpu].socket_id)
 +#define topology_core_id(cpu)(cpu_topology[cpu].core_id)
 +#define topology_core_cpumask(cpu)   ((cpu_topology[cpu].core_sibling))
 +#define topology_thread_cpumask(cpu) ((cpu_topology[cpu].thread_sibling))

The inner-most parens aren't required.

 +
 +#define mc_capable() (cpu_topology[0].socket_id != -1)
 +#define smt_capable()(cpu_topology[0].thread_id != -1)
 +
 +void init_cpu_topology(void);
 +void store_cpu_topology(unsigned int cpuid);
 +const struct cpumask *cpu_coregroup_mask(unsigned int cpu);
 +
 +#else
 +
 +static inline void init_cpu_topology(void) { };
 +static inline void store_cpu_topology(unsigned int cpuid) { };

Functions don't need a ; after the }

 +
 +#endif
 +
  #include asm-generic/topology.h
  
  #endif /* _ASM_ARM_TOPOLOGY_H */
 diff --git a/arch/arm/kernel/Makefile b/arch/arm/kernel/Makefile
 index a5b31af..816a481 100644
 --- a/arch/arm/kernel/Makefile
 +++ b/arch/arm/kernel/Makefile
 @@ -61,6 +61,7 @@ obj-$(CONFIG_IWMMXT)+= iwmmxt.o
  obj-$(CONFIG_CPU_HAS_PMU)+= pmu.o
  obj-$(CONFIG_HW_PERF_EVENTS) += perf_event.o
  AFLAGS_iwmmxt.o  := -Wa,-mcpu=iwmmxt
 +obj-$(CONFIG_ARM_CPU_TOPOLOGY)  += topology.o
  
  ifneq ($(CONFIG_ARCH_EBSA110),y)
obj-y  += io.o
 diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
 index 344e52b..3e8dc3b 100644
 --- a/arch/arm/kernel/smp.c
 +++ b/arch/arm/kernel/smp.c
 @@ -31,6 +31,7 @@
  #include asm/cacheflush.h
  #include asm/cpu.h
  #include asm/cputype.h
 +#include asm/topology.h
  #include asm/mmu_context.h
  #include asm/pgtable.h
  #include asm/pgalloc.h
 @@ -268,6 +269,9 @@ static void __cpuinit smp_store_cpu_info(unsigned int 
 cpuid)
   struct cpuinfo_arm *cpu_info = per_cpu(cpu_data, cpuid);
  
   cpu_info-loops_per_jiffy = loops_per_jiffy;
 +
 + store_cpu_topology(cpuid);
 +

Don't need this blank line.

  }
  
  /*
 @@ -354,6 +358,8 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
  {
   unsigned int ncores = num_possible_cpus();
  
 + init_cpu_topology();
 +
   smp_store_cpu_info(smp_processor_id());
  
   /*
 diff --git a/arch/arm/kernel/topology.c b/arch/arm/kernel/topology.c
 new file mode 100644
 index 000..e8f3b95
 --- /dev/null
 +++ b/arch/arm/kernel/topology.c
 @@ -0,0 +1,149 @@
 +/*
 + * arch/arm/kernel/topology.c
 + *
 + * Copyright (C) 2011 Linaro Limited.
 + * Written by: Vincent Guittot
 + *
 + * based on arch/sh/kernel/topology.c
 + *
 + * This file is subject to the terms and conditions of the GNU General Public
 + * License.  See the file COPYING in the main directory of this archive
 + * for more details.
 + */
 +
 +#include linux/cpu.h
 +#include linux/cpumask.h
 +#include linux/init.h
 +#include linux/percpu.h
 +#include linux/node.h
 +#include linux/nodemask.h
 +#include linux/sched.h
 +
 +#include asm/cputype.h
 +#include asm/topology.h
 +
 +#define MPIDR_SMP_BITMASK (0x3  30)
 +#define MPIDR_SMP_VALUE (0x2  30)
 +
 +#define MPIDR_MT_BITMASK (0x1  24)
 +
 +/*
 + * These masks reflect the current use of the affinity levels.
 + * The affinity level can be up to 16 bits according to ARM ARM
 + */
 +
 +#define MPIDR_LEVEL0_MASK 0x3
 +#define MPIDR_LEVEL0_SHIFT 0
 +
 +#define MPIDR_LEVEL1_MASK 0xF
 +#define MPIDR_LEVEL1_SHIFT 8
 +
 +#define MPIDR_LEVEL2_MASK 0xFF
 +#define MPIDR_LEVEL2_SHIFT 16
 +
 +struct cputopo_arm cpu_topology[NR_CPUS];
 +
 +const struct cpumask *cpu_coregroup_mask(unsigned int cpu)
 +{
 + return (cpu_topology[cpu].core_sibling);

These parens aren't required.

 +}
 +
 +/*
 + * store_cpu_topology is called at boot when only one cpu is running
 + * and with the mutex cpu_hotplug.lock locked, when several cpus have booted,
 + * which prevents simultaneous write access to cpu_topology array
 + */
 +void store_cpu_topology(unsigned int cpuid)
 +{
 + struct cputopo_arm *cpuid_topo = (cpu_topology[cpuid]);

Neither are these.

 + unsigned int mpidr;
 + unsigned int cpu;
 +
 + /* If the cpu topology has been already set, just return */
 + if (cpuid_topo-core_id

Re: [PATCH v8 00/12] use nonblock mmc requests to minimize latency

2011-06-30 Thread Russell King - ARM Linux

On Thu, Jun 30, 2011 at 03:12:46PM +0200, Arnd Bergmann wrote:
 I think this looks good enough to merge into the linux-mmc tree, the code is
 clean and the benefits are clear.
 
 Acked-by: Arnd Bergmann a...@arndb.de
 
 One logical follow-up as both a cleanup and performance optimization would be
 to get rid of the mmc_queue_thread completely. When mmc_blk_issue_rq() is
 non-blocking always, you can call it directly from the mmc_request()
 function, instead of waking up another thread to do it for you.

It isn't anywhere near that - because you need to wait for the
request to complete, then analyze the results and if there has been
an error, send more commands and wait for their response.

To do all that in an asynchronous fashion will just create a mess of
small little functions with hard to understand code.  It's far better
to do all that in a clear procedural way in a thread.

We've been here before - with PCMCIA's card insertion code, where you
have to go through a sequence of events (insert, power up, reset, etc).
The PCMCIA code used to have a collection of small functions to do
each step, one chained after the other in a state machine fashion.
The result was horrid.  That's exactly what you'll end up with here.

Threads have their place, and this is one of them.

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: [PATCH v6 00/11] mmc: use nonblock mmc requests to minimize latency

2011-06-27 Thread Russell King - ARM Linux

On Mon, Jun 27, 2011 at 01:34:48PM +0300, saeed bishara wrote:
 Russell,
   I'm curious about the correctness of this patch for systems with
 outer cache. shouldn't the dsb be issued before the outer cache
 maintenance?

Maybe we should do two passes over SG lists then - one for the inner and
another for the outer cache?

In effect we could do three passes:

1. Calculate the total size of the SG list to determine whether full
   cache flush is more efficient.
2. Flush inner cache
   Then dsb()
3. Flush outer cache
   Another dsb()


___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: [PATCH v6 00/11] mmc: use nonblock mmc requests to minimize latency

2011-06-23 Thread Russell King - ARM Linux

On Tue, Jun 21, 2011 at 11:26:27AM +0200, Per Forlin wrote:
 Here are the results.

It looks like this patch is either a no-op or slightly worse.  As
people have been telling me that dsb is rather expensive, and this
patch results in less dsbs, I'm finding these results hard to believe.
It seems to be saying that dsb is an effective no-op on your platform.

So either people are wrong about dsb being expensive, the patch is
wrong, or there's something wrong with these results/test method.

You do have an error in the ported patch, as that hasn't updated the
v7 cache cleaning code to remove the dsb() there, but that would only
affect the write tests.

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: [PATCH v6 00/11] mmc: use nonblock mmc requests to minimize latency

2011-06-21 Thread Russell King - ARM Linux

On Sun, Jun 19, 2011 at 11:17:26PM +0200, Per Forlin wrote:
 How significant is the cache maintenance over head?

Per,

Can you measure how much difference this has before and after your
patch set please?  This moves the dsb() out of the individual cache
maintanence functions, such that we will only perform one dsb() per
dma_*_sg call rather than one per SG entry.

Thanks.

 arch/arm/include/asm/dma-mapping.h |   11 +++
 arch/arm/mm/cache-fa.S |6 --
 arch/arm/mm/cache-v4wb.S   |2 --
 arch/arm/mm/cache-v6.S |6 --
 arch/arm/mm/cache-v7.S |3 ---
 arch/arm/mm/dma-mapping.c  |8 
 6 files changed, 19 insertions(+), 17 deletions(-)

diff --git a/arch/arm/include/asm/dma-mapping.h 
b/arch/arm/include/asm/dma-mapping.h
index 4fff837..853eba5 100644
--- a/arch/arm/include/asm/dma-mapping.h
+++ b/arch/arm/include/asm/dma-mapping.h
@@ -115,6 +115,11 @@ static inline void __dma_page_dev_to_cpu(struct page 
*page, unsigned long off,
___dma_page_dev_to_cpu(page, off, size, dir);
 }
 
+static inline void __dma_sync(void)
+{
+   dsb();
+}
+
 /*
  * Return whether the given device DMA address mask can be supported
  * properly.  For example, if your device can only drive the low 24-bits
@@ -378,6 +383,7 @@ static inline dma_addr_t dma_map_single(struct device *dev, 
void *cpu_addr,
BUG_ON(!valid_dma_direction(dir));
 
addr = __dma_map_single(dev, cpu_addr, size, dir);
+   __dma_sync();
debug_dma_map_page(dev, virt_to_page(cpu_addr),
(unsigned long)cpu_addr  ~PAGE_MASK, size,
dir, addr, true);
@@ -407,6 +413,7 @@ static inline dma_addr_t dma_map_page(struct device *dev, 
struct page *page,
BUG_ON(!valid_dma_direction(dir));
 
addr = __dma_map_page(dev, page, offset, size, dir);
+   __dma_sync();
debug_dma_map_page(dev, page, offset, size, dir, addr, false);
 
return addr;
@@ -431,6 +438,7 @@ static inline void dma_unmap_single(struct device *dev, 
dma_addr_t handle,
 {
debug_dma_unmap_page(dev, handle, size, dir, true);
__dma_unmap_single(dev, handle, size, dir);
+   __dma_sync();
 }
 
 /**
@@ -452,6 +460,7 @@ static inline void dma_unmap_page(struct device *dev, 
dma_addr_t handle,
 {
debug_dma_unmap_page(dev, handle, size, dir, false);
__dma_unmap_page(dev, handle, size, dir);
+   __dma_sync();
 }
 
 /**
@@ -484,6 +493,7 @@ static inline void dma_sync_single_range_for_cpu(struct 
device *dev,
return;
 
__dma_single_dev_to_cpu(dma_to_virt(dev, handle) + offset, size, dir);
+   __dma_sync();
 }
 
 static inline void dma_sync_single_range_for_device(struct device *dev,
@@ -498,6 +508,7 @@ static inline void dma_sync_single_range_for_device(struct 
device *dev,
return;
 
__dma_single_cpu_to_dev(dma_to_virt(dev, handle) + offset, size, dir);
+   __dma_sync();
 }
 
 static inline void dma_sync_single_for_cpu(struct device *dev,
diff --git a/arch/arm/mm/cache-fa.S b/arch/arm/mm/cache-fa.S
index 1fa6f71..6eeb734 100644
--- a/arch/arm/mm/cache-fa.S
+++ b/arch/arm/mm/cache-fa.S
@@ -179,8 +179,6 @@ fa_dma_inv_range:
add r0, r0, #CACHE_DLINESIZE
cmp r0, r1
blo 1b
-   mov r0, #0
-   mcr p15, 0, r0, c7, c10, 4  @ drain write buffer
mov pc, lr
 
 /*
@@ -197,8 +195,6 @@ fa_dma_clean_range:
add r0, r0, #CACHE_DLINESIZE
cmp r0, r1
blo 1b
-   mov r0, #0  
-   mcr p15, 0, r0, c7, c10, 4  @ drain write buffer
mov pc, lr
 
 /*
@@ -212,8 +208,6 @@ ENTRY(fa_dma_flush_range)
add r0, r0, #CACHE_DLINESIZE
cmp r0, r1
blo 1b
-   mov r0, #0  
-   mcr p15, 0, r0, c7, c10, 4  @ drain write buffer
mov pc, lr
 
 /*
diff --git a/arch/arm/mm/cache-v4wb.S b/arch/arm/mm/cache-v4wb.S
index f40c696..523c0cb 100644
--- a/arch/arm/mm/cache-v4wb.S
+++ b/arch/arm/mm/cache-v4wb.S
@@ -194,7 +194,6 @@ v4wb_dma_inv_range:
add r0, r0, #CACHE_DLINESIZE
cmp r0, r1
blo 1b
-   mcr p15, 0, r0, c7, c10, 4  @ drain write buffer
mov pc, lr
 
 /*
@@ -211,7 +210,6 @@ v4wb_dma_clean_range:
add r0, r0, #CACHE_DLINESIZE
cmp r0, r1
blo 1b
-   mcr p15, 0, r0, c7, c10, 4  @ drain write buffer
mov pc, lr
 
 /*
diff --git a/arch/arm/mm/cache-v6.S b/arch/arm/mm/cache-v6.S
index 73b4a8b..7a842dd 100644
--- a/arch/arm/mm/cache-v6.S
+++ b/arch/arm/mm/cache-v6.S
@@ -239,8 +239,6 @@ v6_dma_inv_range:
strlo   r2, [r0]@ write for ownership
 #endif
blo 1b
-   mov r0, #0
-   mcr p15, 0, r0, c7, c10, 4  @ drain write buffer
mov pc, lr
 
 /*
@@ -262,8 +260,6 @@

Re: [RFC] Add Arm cpu topology definition

2011-06-16 Thread Russell King - ARM Linux

On Thu, Jun 16, 2011 at 10:49:13AM +0200, Vincent Guittot wrote:
 @@ -219,6 +219,24 @@ source kernel/Kconfig.freezer
 
  menu System Type
 
 +config SCHED_MC
 + bool Multi-core scheduler support
 + depends on SMP  ARM_CPU_TOPOLOGY
 + default n
 + help
 +   Multi-core scheduler support improves the CPU scheduler's decision
 +   making when dealing with multi-core CPU chips at a cost of slightly
 +   increased overhead in some places. If unsure say N here.
 +
 +config SCHED_SMT
 + bool SMT scheduler support
 + depends on SMP  ARM_CPU_TOPOLOGY
 + default n
 + help
 +   Improves the CPU scheduler's decision making when dealing with
 +   MultiThreading at a cost of slightly increased overhead in some
 +   places. If unsure say N here.

Why place these in the system type menu?  Wouldn't it be better to
place them along side ARM_CPU_TOPOLOGY, and place that along side
the SMP option too?

 +
  config MMU
   bool MMU-based Paged Memory Management Support
   default y
 @@ -1062,6 +1080,14 @@ if !MMU
  source arch/arm/Kconfig-nommu
  endif
 
 +config ARM_CPU_TOPOLOGY
 + bool Support cpu topology definition
 + depends on SMP  CPU_V7
 + help
 +   Support Arm cpu topology definition. The MPIDR register defines
 +   affinity between processors which is used to set the cpu
 +   topology of an Arm System.

Is there a reason you'd want to disable this?

Please also note that it's ARM not Arm.

 diff --git a/arch/arm/include/asm/topology.h b/arch/arm/include/asm/topology.h
 index accbd7c..cb90d0a 100644
 --- a/arch/arm/include/asm/topology.h
 +++ b/arch/arm/include/asm/topology.h
 @@ -1,6 +1,39 @@
  #ifndef _ASM_ARM_TOPOLOGY_H
  #define _ASM_ARM_TOPOLOGY_H
 
 +#ifdef CONFIG_ARM_CPU_TOPOLOGY
 +
 +#include linux/cpumask.h
 +
 +struct cputopo_arm {
 + int thread_id;
 + int core_id;
 + int socket_id;
 + cpumask_t thread_sibling;
 + cpumask_t core_sibling;
 +};
 +
 +extern struct cputopo_arm cpu_topology[NR_CPUS];
 +
 +#define topology_physical_package_id(cpu)(cpu_topology[cpu].socket_id)
 +#define topology_core_id(cpu)(cpu_topology[cpu].core_id)
 +#define topology_core_cpumask(cpu)   ((cpu_topology[cpu].core_sibling))
 +#define topology_thread_cpumask(cpu) ((cpu_topology[cpu].thread_sibling))

The thing which naggs me about this is that its a static array, and we
know from x86 that static arrays of per-cpu data have various issues
(cache line bouncing, as well as limitations when we end up with large
numbers of CPUs.)

Lastly, x86 seems to do this:

#define arch_provides_topology_pointers yes

and the only effect I can find of that define is in drivers/base/topology.c:

#ifdef arch_provides_topology_pointers
...
unsigned int cpu = dev-id; \
return show_cpumap(0, topology_##name(cpu), buf);   \
...
#else
...
return show_cpumap(0, topology_##name(dev-id), buf);   \
...
#endif

dev-id is type 'u32' which devolves to 'unsigned int' on all supported
arches.  So it looks to me like this arch_provides_topology_pointers
define is completely pointless and we just have useless obfuscation for
the hell of it.

That's not a criticism of your patch, it's pointing out a difference
between x86 and our implementation.

 +#include linux/cpu.h
 +#include linux/cpumask.h
 +#include linux/init.h
 +#include linux/percpu.h
 +#include linux/node.h
 +#include linux/nodemask.h
 +#include linux/sched.h
 +
 +#include asm/cacheflush.h
 +#include asm/topology.h
 +
 +#define hard_smp_mpidr() \
 + ({ \
 + unsigned int cpunum; \
 + __asm__(mrc p15, 0, %0, c0, c0, 5 \
 + : =r (cpunum)); \
 + cpunum; \
 + })

Please add a definition for CPUID_MPIDR to arch/arm/include/asm/cputype.h
and a read_cpuid_mpidr() function, and use that instead.

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: [PATCH v6 0/5] Basic ARM devicetree support

2011-05-11 Thread Russell King - ARM Linux

On Wed, May 11, 2011 at 10:44:49PM +0200, Grant Likely wrote:
 Right now it merges cleanly with linux-next and the resulting tree
 builds and boots at least on qemu.  Unless you really object, I'm
 going to ask Stephen to add the following branch to the /end/ of the
 list of trees for linux-next so it can easily be dropped it if it
 causes any problems.

As far as the set of five patches looks fine to me, I don't have any
objections against them.  So I think we can merge them for .40.

What I've always worried about is the platform stuff, and that's
something I'm going to continue worrying about because I don't think
we have sufficient review capacity to ensure that we don't end up
with lots of stupidities.

Eg, we need to properly sort out how we're going to represent stuff
so we don't end up with X IRQ controllers, Y clock events, etc.  In
other words, I don't want to see DT growing an AT91 IRQ controller,
PXA IRQ controller, SA1100 IRQ controller, etc.

One of the things we must deal with is how do we reduce the amount
of IRQ controller code, clock event code, etc that we have in the
kernel tree.  That means coming up with some generic representation
of these facilities and having the right DT properties in place to
be able to describe to the generic representation what's required of
it.

If we can't do that, then DT isn't solving the problem which Linus
has complained about, and we will still be in the situation where
platform X wants its own IRQ controller, clock event, etc code.

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: Booting Integrator/AP with ARM920T

2011-03-21 Thread Russell King - ARM Linux

On Mon, Mar 21, 2011 at 11:17:19AM +0100, Linus Walleij wrote:
 Does anyone beside me *ever* try to actually boot a recent kernel
 on this thing?
 
 How do you do it? Do I need patched U-boots etc, or can I just load
 a Z-record image of vmlinux into RAM over the serial console?
 
 Needless to say, ARM Ltd. has deleted the web pages for official
 support of this thing as far as I can see. I was going to try out a
 few patches affecting mach-integrator...

That's going back quite a few years... and yes, it wasn't trivial.  How
it used to be done was with a boot loader called 'milo' which read the
kernel out of flash - you had to program it and the kernel using 'afu'
via the debugger.

Looking at the various scripts I have around, the other way I did it was
using Angel (and presumably armsd).  I have this .angelrc file:

device /dev/ttyS3
options 9600 8N1
baud 115200
base 0x8000
entry 0x8000
#otherfile ramdisk_img.gz
#otherbase 0xc080
r0 0x00
r1 21
exec .start-minicom

and .start-minicom is:
#!/bin/sh
exec ~/bin/start-minicom int

Not sure if any of this helps, but might provide some hints.

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: Yet another memory provider: can linaro organize a meeting?

2011-03-10 Thread Russell King - ARM Linux

On Thu, Mar 10, 2011 at 03:14:11PM +0100, Marek Szyprowski wrote:
 Hello,
 
 On Tuesday, March 08, 2011 9:14 AM Hans Verkuil wrote:
 
  We had a discussion yesterday regarding ways in which linaro can assist
  V4L2 development. One topic was that of sorting out memory providers like
  GEM and HWMEM.
  
  Today I learned of yet another one: UMP from ARM.
  
  http://blogs.arm.com/multimedia/249-making-the-mali-gpu-device-driver-open-
  source/page__cid__133__show__newcomment/
 
 I really wonder what's the opinion of ARM Linux maintainer on this memory
 allocator. Russell - could you comment on it?

First I've heard about it.  I'll have to do some reading first, but I'm
rather busy at the present time.

As far as DMA memory allocation goes, I do have that patch laying around
which preallocates the DMA coherent and writecombine memory, but inspite
of sending it to the mailing list, there was very little in the way of
feedback.

Someone was going to go through the various platforms and work out which
could be reduced down to 1MB coherent/1MB writecombine, but I never saw
any follow-up to that.

I've been debating about just throwing it in the kernel for this coming
merge window anyway - I suspect most people just don't care how DMA
memory is provided, so long as it works and works reliably.

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: [RFC] ARM: dma-mapping: outer cache is invalidated twice

2011-02-15 Thread Russell King - ARM Linux

On Tue, Feb 15, 2011 at 02:14:55PM +0100, Per Forlin wrote:
 outer_inv_range () is called twice for DMA_FROM_DEVICE.
 The first time to get rid of potential writebacks and the second
 time to get rid of any stale speculative prefetches

Correct.

 outer_inv_range() is a rather expensive operation. In the first case
 isn't it enough to just call cache_sync()?

No.  If the CPU speculatively fetches data from the DMA buffer after
it's been mapped for DMA, it will bring data into the L2 cache.  This
data may or may not be up to date with the DMA buffer contents once
DMA has completed.

As there is no way to know, we have to invalidate the L2 cache (and the
L1 cache) after the DMA has completed to avoid any possibility of data
corruption.

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: [RFC] ARM: dma-mapping: outer cache is invalidated twice

2011-02-15 Thread Russell King - ARM Linux

On Tue, Feb 15, 2011 at 02:54:21PM +0100, Per Forlin wrote:
 I don't fully understand this yet. I think you are right but I need a
 little help to get there myself.
 I agree, the cache (L1 and L2) must be invalidated after the DMA has 
 completed.
 Before starting the DMA the write buffers must be drained (cache_sync).
 
 Why invalidate the cache before starting the DMA?

Think about what happens if you have dirty cache lines in the DMA region.
These can be evicted when other cache lines are loaded, which will result
in them overwriting contents of memory.  If the DMA device has already
written to that memory, the result is data corruption.

So, the invalidate prior to DMA is to get rid of any dirty cache lines
which could be written back to memory.

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: [PATCH] arm: Improve MMC performance on Versatile Express

2011-02-01 Thread Russell King - ARM Linux

On Tue, Feb 01, 2011 at 01:34:59PM +, Pawel Moll wrote:
  And to prove the point, I have MMCI running at up to 4Mbps, an 8 fold
  increase over what the current fixed upper-rate implementation does.
  The adaptive rate implementation is just a proof of concept at the
  moment and requires further work to improve the rate selection algorithm.
 
 Great, I've terribly glad you managed to have a go at this (I honestly
 wanted to, but simply had no time). I'm looking forward to see the
 patches and will be more than happy to backport them for the sake of the
 Linaro guys using 2.6.35 and 2.6.37 right now.
 
 On our side we did extend the FIFO and performed some tests (not very
 extensive yet though). The change seems not to break anything and help
 in the pathological (heavy USB traffic) scenario.
 
 When I get your changes and some official FPGA release, I'll try to push
 the bandwidth limits even further - hopefully changes will complement.

You can't push it any further without increasing the CPU/bus clock rates.
My measurements show that it takes the CPU in the region of 6-9us to
unload 32 bytes from the FIFO, which gives a theoretical limit of 2.8
to 4.2Mbps, depending on how the platform booted (some reboots its
consistently in the order of 6us, some boots its consistently around 9us.)

  The real solution to this is for there to be proper working DMA support
  implemented on ARM platforms,
 
 In case of VE this is all about getting an engine into the test chips,
 what didn't happen for A9 (the request lines are routed between the
 motherboard and the tile and IO FPGA can - theoretically - use the MMCI
 requests). As far as I'm told this cell is simply huge (silicon-wise)
 and therefore it's the first candidate to cut down when area is
 scarce... Anyway, I've spoken to guys around and asked them to keep the
 problem in mind, so we may get something with the next releases.

Bear in mind that PL18x + PL08x doesn't work.  Catalin forwarded my
concerns over this to ARM Support - where I basically ask how to program
the hardware up to DMA a single 64K transfer off a MMC card into a set
of scattered memory locations.

I've yet to have a response, so I'll take it that it's just possible
(the TRMs say as much).

The problem is that for a transfer, the MMCI produces BREQ * n + LBREQ,
and the DMAC will only listen for a LBREQ if it's in peripheral flow
control.  If it's in peripheral flow control, then it ignores the transfer
length field in the control register, only moving to the next LLI when it
sees LBREQ or LSREQ.

It ignores LBREQ and LSREQ in DMAC flow control mode..  You can DMA almost
all the data too/from the MMCI, but you miss the last half-fifo-size worth
of data.  While you can unload that manually for a read, you can't load
it manually for a write.

With peripheral flow control, you can only DMA the requested data to a
single contiguous buffer without breaking the MMC request into much
smaller chunks.  As Peter Pearse's PL08x code seems to suggest, the
maximum size of those chunks is 1K.

This seems to be a fundamental problem with the way each primecell has
been designed.

So, I do hope that someone decides to implement something more reasonable
if Versatile Express were to get a DMA controller.  If it's another PL08x
then it isn't worth it - it won't work.

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: [PATCH] arm: Improve MMC performance on Versatile Express

2011-01-24 Thread Russell King - ARM Linux

On Mon, Jan 24, 2011 at 12:27:16PM -, Pawel Moll wrote:
 So - we'll try to enlarge FIFO. For the moment - playing with interrupts
 affinity seem to be a viable workaround.

I don't think enlarging the FIFO will help too much.  The issue is
whether the CPU can keep up with the data rate coming off the card.
If it can't, then no matter how large the FIFO is, it will eventually
overflow.

The real answer is to avoid PIO mode, and use DMA support.  However,
I've had problems using DMA on the ARM development boards.  You can
find details my DMA issues internally within ARM by talking to Catalin.

The alternative answer, I believe implemented by some of ARMs silicon
partners, is to turn the card clock off when the FIFO becomes full/empty
to stop it sending more data.  I think this violates some of the MMC/SD
requirements, but it seems to work for the silicon partners just fine.

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: [PATCH] arm: Improve MMC performance on Versatile Express

2011-01-24 Thread Russell King - ARM Linux

On Mon, Jan 24, 2011 at 04:13:14PM +, Pawel Moll wrote:
  I don't think enlarging the FIFO will help too much.  The issue is
  whether the CPU can keep up with the data rate coming off the card.
  If it can't, then no matter how large the FIFO is, it will
  eventually overflow.
 
 Yes, I realize that. But so far the only time when problem happens is
 the timeout caused by ISP1761 handler.

That's fixable by having the block driver retry.

I have patches lodged with cjb which rework the MMC block driver error
handling, and I now have MMC rootfs entirely usable with the ISP1761
driver.  With ISP1761 also fixed (god knows why the driver maintainer
isn't doing anything with that patch) then the problem goes away.

So really this is a non-issue.

 If we substantially increase the FIFO depth, we'll just have much more
 margin - enough to mostly work. To my taste, the 1.3ms required to
 service the USB interrupt is already waaay to long. I'd consider any
 substantially larger latencies pathological, so making sure that we
 have margin of 3, maybe even 5 ms sounds good enough to me.

Well, fixing MMCI because ISP1761 is buggy is not the way forward.  The
answer is to fix the broken ISP1761 driver.

We know as it currently stands in mainline that it's utterly unusable
with certain serial devices.  That doesn't mean we go off and fix MMCI.

  The real answer is to avoid PIO mode, and use DMA support.
  However,
  I've had problems using DMA on the ARM development boards.  You can
 
 Well, using DMA won't be easy on VE, will it? ;-) Besides even with
 this, in some extreme situation, the bus could be potentially stalled
 long enough to cause an underrun. Yes, very unlikely, but not impossible.

I'm running SD rootfs alongside NFS/ssh/NTP and the buggy ISP1761.  With the
pile of 'fixes' patches I have here (which are currently stuck with various
maintainers) I have a completely stable system.

I've also had it running gstreamer on CLCD with AACI too, again with SD
rootfs.  Provided the video is already loaded off the SD card (because
the transfer rate is too slow) it's fine.

The only remaining problem I have with it is !£$%% busybox shell which
doesn't like readonly rootfs with command history.  The last command is
/sbin/reboot - I'm sure you can imagine what keeps on happening,
particularly at the most annoying points in time.

  The alternative answer, I believe implemented by some of ARMs silicon
  partners, is to turn the card clock off when the FIFO becomes full/empty
  to stop it sending more data.  I think this violates some of the MMC/SD
  requirements, but it seems to work for the silicon partners just fine.
 
 Oh no, that's absolutely legal - see JEDEC 84-A441, p. 7.7:
 The MultiMediaCard bus clock signal can be used by the host to put the
 card into energy saving mode, or to control the data flow (to avoid
 under-run or over-run conditions) on the bus.
 
 So this would be the technically correct fix. The problem is that this
 would require even more MMCI modifications then enlarging FIFO, so it's
 unlikely to happen.

Well as I see it, it's pointless enlarging the FIFO.  ARM Ltd isn't going
to give me updated FPGA images for the Integrator/CP, Versatile PB926,
Realview EB and Versatile Express.

Given that the problem is already fixed via a set of patches, I see no
reason to mess about with the hardware, thereby making the driver more
complicated for *no* benefit.

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: [PATCH] arm: Improve MMC performance on Versatile Express

2011-01-24 Thread Russell King - ARM Linux

On Mon, Jan 24, 2011 at 04:24:00PM +, Russell King - ARM Linux wrote:
 On Mon, Jan 24, 2011 at 04:13:14PM +, Pawel Moll wrote:
   The real answer is to avoid PIO mode, and use DMA support.
   However,
   I've had problems using DMA on the ARM development boards.  You can
  
  Well, using DMA won't be easy on VE, will it? ;-) Besides even with
  this, in some extreme situation, the bus could be potentially stalled
  long enough to cause an underrun. Yes, very unlikely, but not impossible.
 
 I'm running SD rootfs alongside NFS/ssh/NTP and the buggy ISP1761.  With the
 pile of 'fixes' patches I have here (which are currently stuck with various
 maintainers) I have a completely stable system.
 
 I've also had it running gstreamer on CLCD with AACI too, again with SD
 rootfs.  Provided the video is already loaded off the SD card (because
 the transfer rate is too slow) it's fine.
 
 The only remaining problem I have with it is !£$%% busybox shell which
 doesn't like readonly rootfs with command history.  The last command is
 /sbin/reboot - I'm sure you can imagine what keeps on happening,
 particularly at the most annoying points in time.

I should also point out that my Versatile Express has been on continuously
for the last two to three weeks, running off SD rootfs, and since fixing
the ISP1761 driver and MMC block driver error handling there's been zero
issues with MMC or ISP1761.

So, MMCI overruns is now a complete non-issue for me.

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: [PATCH] arm: Improve MMC performance on Versatile Express

2011-01-24 Thread Russell King - ARM Linux

On Mon, Jan 24, 2011 at 04:39:21PM -, Pawel Moll wrote:
  Well, fixing MMCI because ISP1761 is buggy is not the way forward.
  The answer is to fix the broken ISP1761 driver.
 
 Totally agree. The thing is that you are the only person right now who
 doesn't have any problems.

That's not my problem.  A simple search of the linux-usb mailing list
will give you the patch:

http://marc.info/?l=linux-usbm=129469114904445w=2

 The others can quickly benefit from the workaround.

Others _can_ quickly benefit from the ISP1761 fix by applying the above
patch, which not only fixes the MMCI overruns, but also allows USB to
serial devices to work with Versatile Express.  The above patch is
the correct solution.

For christ sake, that patch is a fix for a *known* errata issued by the
ISP1761 manufacturer.

  Well as I see it, it's pointless enlarging the FIFO.  ARM Ltd isn't
  going to give me updated FPGA images for the Integrator/CP, Versatile
  PB926, Realview EB and Versatile Express.
 
 If we succeed with this, you (and all other users) will get the
 improved FPGA image. But for VE only indeed.

That's utterly useless and completely pointless.

Look, stop jumping off a cliff with this and come back to reality.  The
ISP1761 is buggy.  We have a fix.  We have a tested fix.  We have a tested
fix out on mailing lists for everyone to use.  It fixes more than MMC
overruns and makes the ISP1761 usable.

The only problem is that it's taking *FAR* too long for it to get into
mainline for no apparant reason.  That doesn't stop anyone taking the
tested patch in the URL above and gaining proper USB functionality.

The right thing to do is to follow up on my reply on the ISP1761 thread
to find out why it's taking soo long for it to get into Linus' tree.

I repeat, with the proper set of patches in mainline THERE IS NO ISSUE
WITH MMCI.  THE HARDWARE DOES NOT NEED FIXING.  Please listen to me.

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: [PATCH] arm: Improve MMC performance on Versatile Express

2011-01-24 Thread Russell King - ARM Linux

My final mail on this subject.  I'm adding Philippe and Catalin so that
they're in the loop on this.

Take a mainline kernel.  Apply the attached three patches.  MMC will then
work without any problems.  No hacks required.  There is *absolutely* *no*
need to waste time with hardware modifications.

With this you will find that MMCI FIFO underruns/overruns are no longer
any problem.
diff --git a/drivers/mmc/card/block.c b/drivers/mmc/card/block.c
index bfc8a8a..264a6bb 100644
--- a/drivers/mmc/card/block.c
+++ b/drivers/mmc/card/block.c
@@ -245,6 +245,20 @@ static u32 mmc_sd_num_wr_blocks(struct mmc_card *card)
return result;
 }
 
+static void send_stop(struct mmc_card *card, struct request *req)
+{
+   struct mmc_command cmd;
+   int err;
+
+   memset(cmd, 0, sizeof(struct mmc_command));
+   cmd.opcode = MMC_STOP_TRANSMISSION;
+   cmd.flags = MMC_RSP_SPI_R1B | MMC_RSP_R1B | MMC_CMD_AC;
+   err = mmc_wait_for_cmd(card-host, cmd, 0);
+   if (err)
+   pr_err(%s: error %d sending stop command\n,
+  req-rq_disk-disk_name, err);
+}
+
 static u32 get_card_status(struct mmc_card *card, struct request *req)
 {
struct mmc_command cmd;
@@ -255,9 +269,9 @@ static u32 get_card_status(struct mmc_card *card, struct 
request *req)
if (!mmc_host_is_spi(card-host))
cmd.arg = card-rca  16;
cmd.flags = MMC_RSP_SPI_R2 | MMC_RSP_R1 | MMC_CMD_AC;
-   err = mmc_wait_for_cmd(card-host, cmd, 0);
+   err = mmc_wait_for_cmd(card-host, cmd, 2);
if (err)
-   printk(KERN_ERR %s: error %d sending status command,
+   pr_err(%s: error %d sending status command\n,
   req-rq_disk-disk_name, err);
return cmd.resp[0];
 }
@@ -336,7 +350,7 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct 
request *req)
struct mmc_blk_data *md = mq-data;
struct mmc_card *card = md-queue.card;
struct mmc_blk_request brq;
-   int ret = 1, disable_multi = 0;
+   int ret = 1, disable_multi = 0, retry = 0;
 
mmc_claim_host(card-host);
 
@@ -432,6 +446,53 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, 
struct request *req)
 * programming mode even when things go wrong.
 */
if (brq.cmd.error || brq.data.error || brq.stop.error) {
+   status = get_card_status(card, req);
+
+   /* First print what's up */
+   if (brq.cmd.error)
+   pr_err(%s: error %d sending read/write 
command, card status %#x\n,
+  req-rq_disk-disk_name, brq.cmd.error,
+  status);
+
+   if (brq.data.error)
+   pr_err(%s: error %d transferring data, sector 
%u, nr %u, cmd response %#x, card status %#x\n,
+  req-rq_disk-disk_name, brq.data.error,
+  (unsigned)blk_rq_pos(req),
+  (unsigned)blk_rq_sectors(req),
+  brq.cmd.resp[0], status);
+
+   if (brq.stop.error)
+   pr_err(%s: error %d sending stop command, 
original cmd response %#x, card status %#x\n,
+  req-rq_disk-disk_name, brq.stop.error,
+  brq.cmd.resp[0], status);
+
+   /*
+* Now check the current card state.  If it is
+* in some data transfer mode, tell it to stop
+* (and hopefully transition back to TRAN.)
+*/
+   if (R1_CURRENT_STATE(status) == R1_STATE_DATA ||
+   R1_CURRENT_STATE(status) == R1_STATE_RCV)
+   send_stop(card, req);
+
+   /*
+* r/w cmd failure - get_card_status() should
+* tell us why the command was not accepted
+*/
+   if (brq.cmd.error  retry  2) {
+   /*
+* if it was a r/w cmd crc error, or illegal
+* command (eg, issued in wrong state) then
+* retry - we should have corrected the
+* state problem above.
+*/
+   if (status  (R1_COM_CRC_ERROR |
+ R1_ILLEGAL_COMMAND)) {
+   retry++;
+   continue;
+   }
+   }
+
if (brq.data.blocks  1  rq_data_dir(req) == READ) {

Re: [PATCH] arm: Improve MMC performance on Versatile Express

2011-01-24 Thread Russell King - ARM Linux

On Mon, Jan 24, 2011 at 07:59:03PM +, Pawel Moll wrote:
  If you're flooding the system with USB traffic, enlargening the
  FIFO size won't help.  Making the FIFO larger just decreases the
  _interrupt_ _latency_ requirements.  It doesn't mean you can
  cope with the amount of data being transferred.
 
 On VE both ISP and MMCI are sharing the same static memory interface,

What has that to do with it?  If the static memory controller was the
bottleneck, don't you think that two CPUs running in parallel, one
reading data from the ISP1761 and the other reading the MMCI would
suffer bus starvation?  Your HACK HACK HACK patch shows that clearly
isn't the case.

You've already told me that you've measured the ISP1761 interrupt
handler taking 1.3ms to empty data off of the chip.  If that's 60K
of data, that's a data rate of around 47MiB/s.

At 521kHz transfer rate, it takes about 490us for MMCI to half-fill
its FIFO, or 980us to fully fill it.  It takes (measured) about 6-9us
to unload 32 bytes of data from the FIFO.

Translating the CPU read rate, that's a data rate of around 4MiB/s.

So I put it to you that there's plenty of bus bandwidth present to
service both the ISP1761 and MMCI.  What we're lacking is CPU
bandwidth.

I guess you haven't thought about moving MMCI to an adaptive clocking
solution?  What I'm suggesting is halve the clock rate on FIFO error
and retry.  Increase the clock rate on each successful transfer up
to the maximum provided by the core MMC code.

That should _significantly_ increase the achievable PIO data rate
way beyond what a deeper FIFO could ever hope to achieve, and will
allow it to adapt to situations where you load the system up beyond
the maximum latency which the MMCI can stand.

This would benefit a whole range of platforms, improving performance
across the board, which as you've already indicated merely going for
a deeper FIFO would be unable to do.

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: [PATCH] arm: Improve MMC performance on Versatile Express

2011-01-21 Thread Russell King - ARM Linux

On Fri, Jan 21, 2011 at 02:59:35PM -0500, Nicolas Pitre wrote:
 On Fri, 21 Jan 2011, Pawel Moll wrote:
 
  VE suffers from serious problem with MMC transfers - low performance,
  errors when other IO peripherals (especially USB) are used at the
  same time etc.
  
  It all boils down to the MMC controller short FIFO and - in result -
  timing constrains. The most problematic case - USB driver hogging
  CPU and MMC FIFO under/overruns in the result - can be mitigated on
  SMP system by distributing interrupts handling for these peripherals
  between cores.
 
 Wouldn't the ultimate solution be to simply use FIQs to service the MMC FIFO?

Could you suggest how to route an arbitary interrupt to the FIQ using
the GIC?

On systems which implement security extensions, the non-secure world can
only use IRQs and not FIQs.  The secure world can use FIQs - in which
case interrupts marked as secure interrupts can go to FIQ.

On systems without security extensions, the GIC only supports IRQ, and
you need a second GIC implemented for FIQs.  I'm not aware of any system
doing that.

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: RFC: Dynamic hwcaps

2010-12-07 Thread Russell King - ARM Linux

On Tue, Dec 07, 2010 at 10:45:42AM +, Dave Martin wrote:
 Yes-- though I didn't elaborate on it.  You need a packager that can
 understand, say, that a binary built for ARMv5 EABI can interoperate
 with ARMv7 binaries etc.
 Again, I've heard it suggested that RPM can handle this, but I haven't
 looked at it in detail myself.

That is indeed the case - as on x86, it used to be common to build the
majority of the distribution for i386, and glibc and a few other bits
for a range of ix86 CPUs.

rpm and yum know that i386 is compatible with i486, which is compatible
with i586 etc, so it will install an i386 package on i686 if no i486,
i586 or i686 package is available.

It does the same for ARM with ARMv3, ARMv4 etc.

 The dynamic hwcaps approach doesn't really solve that problem:

Has anyone investigated whether it is possible to power down things like
Neon etc meanwhile leaving the rest of the CPU running?  I've not seen
anything in the ARM documentation to suggest that's the case.

Even in MPCore based systems, the interface between the SCU and individual
processors by default doesn't have the necessary clamps built in to allow
individual CPUs to be powered off, and I'm not aware of any designs which
decided to enable this feature (as there's a performance penalty).  So I'd
be really surprised if there was any support for powering down Neon
separately from the host CPU.

If that's the case, it's entirely pointless discussing what userspace can
or can't do - if you have Neon available and can't power it down, and it's
faster for doing something, you might as well use it so you can put the
main CPU into WFI mode or get on with some other useful work.

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: RFC: Dynamic hwcaps

2010-12-07 Thread Russell King - ARM Linux

On Tue, Dec 07, 2010 at 03:06:51PM +, Dave Martin wrote:
 Hi,
 
 On Tue, Dec 7, 2010 at 11:04 AM, Russell King - ARM Linux
 li...@arm.linux.org.uk wrote:
  On Tue, Dec 07, 2010 at 10:45:42AM +, Dave Martin wrote:
  Yes-- though I didn't elaborate on it.  You need a packager that can
  understand, say, that a binary built for ARMv5 EABI can interoperate
  with ARMv7 binaries etc.
  Again, I've heard it suggested that RPM can handle this, but I haven't
  looked at it in detail myself.
 
  That is indeed the case - as on x86, it used to be common to build the
  majority of the distribution for i386, and glibc and a few other bits
  for a range of ix86 CPUs.
 
  rpm and yum know that i386 is compatible with i486, which is compatible
  with i586 etc, so it will install an i386 package on i686 if no i486,
  i586 or i686 package is available.
 
  It does the same for ARM with ARMv3, ARMv4 etc.
 
 That sounds plausible.

That sounds like doubt.

I've used rpm extensively over the last 10 years or so, both on x86 and
ARM.  I've built many versions of Red Hat and Fedora for ARM.  My ARM
machines here (including the one which is going to send this email) run
the result of that - and is currently a mixture of ARMv3 and ARMv4
Fedora packages.

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

1 2 >

1 - 100 of 104 matches

Mail list logo