date:20121116

Re: new execve/kernel_thread design

2012-11-16 Thread Michal Simek

2012/11/15 Al Viro :
> On Thu, Nov 15, 2012 at 05:41:16PM +0100, Michal Simek wrote:
>> Here is the branch based on rc5 (information below)
>> and here is gitweb.
>> http://developer.petalogix.com/git/gitweb.cgi?p=linux-2.6-microblaze.git;a=shortlog;h=refs/heads/viro/arch-microblaze-rc5
>>
>> I have also looked at your sys_fork / sys_vfork / sys_clone unification
>> and I have fixed it for Microblaze.
>>
>> I have also run some tests on it, to be sure.
>>
>> I would add sys_execve/kernel_execve/kernel_thread patches to my next branch.
>> Are you OK with that?
>
> Umm...  In principle - yes, but I have a couple of questions about those.

sure.
BTW: that generic sys_fork / sys_vfork / sys_clone will go through your tree.


> 1) What's that set_fs(USER_DS) in start_thread() for?  Note that we do the
> same thing in flush_old_exec(), at the same time we remove PF_KTHREAD from
> current->flags.

ok. Will remove it.

> While we are at it, if we *ever* hit do_signal() with KERNEL_DS, we are
> very deep in trouble.  set_fs(USER_DS) in setup_{rt_,}frame() is pointless.

I have seen that several of your signal patches are there.

Do you have a set of tests that should be run against them?


> 2) your definition of current_pt_regs() is an exact copy of the one in
> include/linux/ptrace.h; why is "microblaze: Define current_pt_regs"
> needed at all?  IOW, I'd rather add #include <linux/ptrace.h> to
> arch/microblaze/kernel/process.c instead...

Agree. Fixed.

I have updated that branch; I can also send you the patches if you like.

Thanks,
Michal



-- 
Michal Simek, Ing. (M.Eng)
w: www.monstr.eu p: +42-0-721842854
Maintainer of Linux kernel 2.6 Microblaze Linux - http://www.monstr.eu/fdt/
Microblaze U-BOOT custodian
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v8 1/3] Runtime Interpreted Power Sequences

2012-11-16 Thread Srinivas KANDAGATLA

Hi Alex,
I am looking forward to this feature being mainlined, but I have a
comment on the way the types are tied to the power seq infrastructure.
I know your use cases are limited to the types "delay", "pwm", "gpio"
and "regulator". However, there are instances where devices can be
powered up or reset by writing to special registers or sysconfs or
something else.
So my suggestion would be to make these types register themselves
dynamically with the power_seq infrastructure so that in future this can
be extended to other types as well.
This trivial change can make a lot of difference for future chips
which do things a bit differently.
ST Microelectronics chips fit into this category, and I guess other
vendors have similar chips.

What do you think?
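
To make the suggestion concrete, here is a minimal userspace sketch of a
registry in which step types add themselves at runtime. Nothing in it comes
from the posted patch: struct step_type_ops, step_type_register(),
step_type_find() and the "sysconf" example type are hypothetical names, and a
kernel version would additionally need locking and module reference counting.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: what one kind of step ("gpio", "sysconf", ...) provides. */
struct step_type_ops {
	const char *name;              /* value of the step's "type" property */
	int (*run)(void *step_data);   /* execute one step, 0 on success */
};

#define MAX_STEP_TYPES 16

static const struct step_type_ops *step_types[MAX_STEP_TYPES];
static size_t nr_step_types;

/* Look up a registered type by name; returns NULL if it is unknown. */
const struct step_type_ops *step_type_find(const char *name)
{
	size_t i;

	assert(name != NULL);
	for (i = 0; i < nr_step_types; i++)
		if (strcmp(step_types[i]->name, name) == 0)
			return step_types[i];
	return NULL;
}

/* Register a type; returns 0, or -1 if the name is taken or the table is full. */
int step_type_register(const struct step_type_ops *ops)
{
	assert(ops != NULL && ops->name != NULL && ops->run != NULL);
	if (nr_step_types == MAX_STEP_TYPES || step_type_find(ops->name))
		return -1;
	step_types[nr_step_types++] = ops;
	return 0;
}

/* Example provider: a made-up "sysconf" step that only counts its invocations. */
static int sysconf_runs;

static int sysconf_run(void *step_data)
{
	(void)step_data;
	sysconf_runs++;
	return 0;
}

const struct step_type_ops sysconf_step_type = {
	.name = "sysconf",
	.run = sysconf_run,
};
```

With such a table, the sequence interpreter would call step_type_find() on
each step's "type" string instead of switching over a fixed list, so a vendor
driver could add a new type without touching the core.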


Thanks,
srini

On 16/11/12 06:38, Alexandre Courbot wrote:
> Some device drivers (e.g. panel or backlights) need to follow precise
> sequences for powering on and off, involving GPIOs, regulators, PWMs
> with a precise powering order and delays to respect between steps.
> These sequences are device-specific, and do not belong to a particular
> driver - therefore they have been performed by board-specific hook
> functions so far.
>
> With the advent of the device tree and of ARM kernels that are not
> board-tied, we cannot rely on these board-specific hooks anymore but
> need a way to implement these sequences in a portable manner. This patch
> introduces a simple interpreter that can execute such power sequences
> encoded either as platform data or within the device tree.
>
> Signed-off-by: Alexandre Courbot 
> Reviewed-by: Stephen Warren 
> Reviewed-by: Mark Brown 
> ---
>  .../devicetree/bindings/power/power_seq.txt| 121 +++
>  Documentation/power/power_seq.txt  | 253 ++
>  drivers/power/Kconfig  |   1 +
>  drivers/power/Makefile |   1 +
>  drivers/power/power_seq/Kconfig|   2 +
>  drivers/power/power_seq/Makefile   |   1 +
>  drivers/power/power_seq/power_seq.c| 376 +
>  drivers/power/power_seq/power_seq_delay.c  |  65 
>  drivers/power/power_seq/power_seq_gpio.c   |  94 ++
>  drivers/power/power_seq/power_seq_pwm.c|  82 +
>  drivers/power/power_seq/power_seq_regulator.c  |  83 +
>  include/linux/power_seq.h  | 203 +++
>  12 files changed, 1282 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/power/power_seq.txt
>  create mode 100644 Documentation/power/power_seq.txt
>  create mode 100644 drivers/power/power_seq/Kconfig
>  create mode 100644 drivers/power/power_seq/Makefile
>  create mode 100644 drivers/power/power_seq/power_seq.c
>  create mode 100644 drivers/power/power_seq/power_seq_delay.c
>  create mode 100644 drivers/power/power_seq/power_seq_gpio.c
>  create mode 100644 drivers/power/power_seq/power_seq_pwm.c
>  create mode 100644 drivers/power/power_seq/power_seq_regulator.c
>  create mode 100644 include/linux/power_seq.h
>
> diff --git a/Documentation/devicetree/bindings/power/power_seq.txt b/Documentation/devicetree/bindings/power/power_seq.txt
> new file mode 100644
> index 000..7880a6c
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/power/power_seq.txt
> @@ -0,0 +1,121 @@
> +Runtime Interpreted Power Sequences
> +===================================
> +
> +Power sequences are sequential descriptions of actions to be performed on
> +power-related resources. Having these descriptions in a well-defined data format
> +allows us to take much of the board- or device- specific power control code out
> +of the kernel and place it into the device tree instead, making kernels less
> +board-dependent.
> +
> +A device typically makes use of multiple power sequences, for different purposes
> +such as powering on and off. All the power sequences of a given device are
> +grouped into a set. In the device tree, this set is a sub-node of the device
> +node named "power-sequences".
> +
> +Power Sequences Structure
> +-------------------------
> +Every device that makes use of power sequences must have a "power-sequences"
> +node into which individual power sequences are declared as sub-nodes. The name
> +of the node becomes the name of the sequence within the power sequences
> +framework.
> +
> +Similarly, each power sequence declares its steps as sub-nodes of itself. Steps
> +must be named sequentially, with the first step named step0, the second step1,
> +etc. Failure to follow this rule will result in a parsing error.
> +
> +Power Sequences Steps
> +---------------------
> +Steps of a sequence describe an action to be performed on a resource. They
> +always include a "type" property which indicates what kind of resource this
> +step works on. Depending on the resource type, additional properties are defined
> +to control the action to be performed

Re: [PATCH v2 1/3] gpio / ACPI: add ACPI support

2012-11-16 Thread Mika Westerberg

On Fri, Nov 16, 2012 at 08:54:48AM +0200, Mika Westerberg wrote:
> On Fri, Nov 16, 2012 at 02:34:22AM +0100, Rafael J. Wysocki wrote:
> > On Thursday, November 15, 2012 01:03:15 PM Mika Westerberg wrote:
> > > From: Mathias Nyman 
> > > 
> > > Add support for translating ACPI GPIO pin numbers to Linux GPIO API pins.
> > > Needs a gpio controller driver with the acpi handler hook set.
> > > 
> > > Drivers can use acpi_get_gpio() to translate ACPI5 GpioIO and GpioInt
> > > resources to Linux GPIO's.
> > > 
> > > Signed-off-by: Mathias Nyman 
> > > Signed-off-by: Mika Westerberg 
> > > ---
> > >  drivers/gpio/Kconfig|4 
> > >  drivers/gpio/Makefile   |1 +
> > >  drivers/gpio/gpiolib-acpi.c |   56 +++
> > >  include/linux/acpi_gpio.h   |   19 +++
> > >  4 files changed, 80 insertions(+)
> > >  create mode 100644 drivers/gpio/gpiolib-acpi.c
> > >  create mode 100644 include/linux/acpi_gpio.h
> > > 
> > > diff --git a/drivers/gpio/Kconfig b/drivers/gpio/Kconfig
> > > index f11d8e3..5c9b384 100644
> > > --- a/drivers/gpio/Kconfig
> > > +++ b/drivers/gpio/Kconfig
> > > @@ -49,6 +49,10 @@ config OF_GPIO
> > >   def_bool y
> > >   depends on OF
> > >  
> > > +config GPIO_ACPI
> > > + def_bool y
> > > + depends on ACPI
> > > +
> > >  config DEBUG_GPIO
> > >   bool "Debug GPIO calls"
> > >   depends on DEBUG_KERNEL
> > > diff --git a/drivers/gpio/Makefile b/drivers/gpio/Makefile
> > > index 9aeed67..420dbac 100644
> > > --- a/drivers/gpio/Makefile
> > > +++ b/drivers/gpio/Makefile
> > > @@ -4,6 +4,7 @@ ccflags-$(CONFIG_DEBUG_GPIO)  += -DDEBUG
> > >  
> > >  obj-$(CONFIG_GPIOLIB)+= gpiolib.o devres.o
> > >  obj-$(CONFIG_OF_GPIO)+= gpiolib-of.o
> > > +obj-$(CONFIG_GPIO_ACPI)  += gpiolib-acpi.o
> > >  
> > >  # Device drivers. Generally keep list sorted alphabetically
> > >  obj-$(CONFIG_GPIO_GENERIC)   += gpio-generic.o
> > > diff --git a/drivers/gpio/gpiolib-acpi.c b/drivers/gpio/gpiolib-acpi.c
> > > new file mode 100644
> > > index 000..8ef9831
> > > --- /dev/null
> > > +++ b/drivers/gpio/gpiolib-acpi.c
> > > @@ -0,0 +1,56 @@
> > > +/*
> > > + * ACPI helpers for GPIO API
> > > + *
> > > + * Copyright (C) 2012, Intel Corporation
> > > + * Authors: Mathias Nyman 
> > > + *  Mika Westerberg 
> > > + *
> > > + * This program is free software; you can redistribute it and/or modify
> > > + * it under the terms of the GNU General Public License version 2 as
> > > + * published by the Free Software Foundation.
> > > + */
> > > +
> > > +#include 
> > > +#include 
> > > +#include 
> > > +#include 
> > > +#include 
> > > +
> > > +static int acpi_gpiochip_find(struct gpio_chip *gc, void *data)
> > > +{
> > > + acpi_handle handle = data;
> > > +
> > > + if (!gc->dev)
> > > + return false;
> > > +
> > > + return gc->dev->acpi_handle == handle;
> > 
> > I'd prefer DEVICE_ACPI_HANDLE() to be used in such places, we may want to
> > replace it with something else in the future or make it work differently.
> 
> Sure but then we need to make it available for drivers as well when
> !CONFIG_ACPI. Something like below is needed.

One more thing, sometimes we want to assign the handle like in the case of
SPI controller driver we set the master->dev.acpi_handle. In that case we
can't use DEVICE_ACPI_HANDLE() as is. Should we just do something like:

master->dev.acpi_handle = pdev->dev.acpi_handle;

or should we introduce some new macro that supports this?


Re: [patch,v3 00/10] make I/O path allocations more numa-friendly

2012-11-16 Thread Bart Van Assche


On 11/09/12 20:17, Jeff Moyer wrote:

> This patch set makes memory allocations for data structures used in
> the I/O path more numa friendly by allocating them from the same numa
> node as the storage device.  I've only converted a handful of drivers
> at this point.  My testing showed that, for workloads where the I/O
> processes were not tied to the numa node housing the device, a speedup
> of around 6% was observed.  When the I/O processes were tied to the
> numa node of the device, there was no measurable difference in my test
> setup.  Given my relatively low-end setup[1], I wouldn't be surprised
> if others could show a more significant performance advantage.
>
> Comments would be greatly appreciated.


Sorry but I'm not familiar with any of the SCSI LLDs modified via this 
patch series. But I'm fine with the SCSI core patches in this series. So 
if you want you can add the following to the first five patches in this 
series:


Reviewed-by: Bart Van Assche 

Bart.


Re: [PATCH 13/14] misc: tsl2550: Remove redundant check on unsigned variable

2012-11-16 Thread Arnd Bergmann

On Friday 16 November 2012, Tushar Behera wrote:
> No need to check whether unsigned variable is less than 0.
> 
> CC: Arnd Bergmann 
> Signed-off-by: Tushar Behera 

Acked-by: Arnd Bergmann 
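
The point of the patch can be illustrated with a small sketch. This is not
the tsl2550 code: in_range() and its parameters are hypothetical, and only
the upper bound needs testing because an unsigned value can never compare
below zero.

```c
#include <assert.h>

/* Hypothetical range check: valid values are 0..max inclusive. */
int in_range(unsigned long val, unsigned long max)
{
	/*
	 * "val < 0" would always be false for an unsigned type, so it is
	 * omitted. A negative number converted to unsigned long becomes a
	 * very large value and is rejected by the upper-bound test instead.
	 */
	return val <= max;
}
```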

Re: Re: [RFC PATCH 0/2] kvm/vmx: Output TSC offset

2012-11-16 Thread Yoshihiro YUNOMAE


Hi Marcelo,

Thank you for commenting on my patch set.

(2012/11/16 12:19), Marcelo Tosatti wrote:

> On Wed, Nov 14, 2012 at 10:36:21AM +0900, Yoshihiro YUNOMAE wrote:
>
> [...]
>
>> In this summary, I suggest the patch which TSC offset for each guest can be
>> output on the host.
>
> The guest TSC can change (for example if TSC scaling is used). Moreover
> TSC offset can change, and you'd have to monitor that. What

Yes, that's true. Changing TSC offset is the key point to use TSC for
merging trace data of guests and the host.

> about a module option so that tsc_offset is written as zero (to be
> used as debugging tool). Then the following restrictions apply:
>
> - TSC must be synchronized across CPUs/VCPUS.
> - TSC must be reliable.
>
> Would that suffice? (a module option to kvm.ko, say zero_tsc_offset).


As you say, the guest TSC can change, so the guest TSC needs to meet these
two restrictions for the trace data to be merged in chronological order.

However, I think the zero-TSC-offset method is not enough.
I will use TSC values as tracing timestamps not only for debugging
but also for failure analysis in production. With a zero TSC offset
there will normally be no problem. However, if the guest executes
write_tsc or is live-migrated, the TSC offset will change. So, after
all, we need to monitor the TSC offset value.
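
As a minimal sketch of why the offset matters for merging traces: assuming
VMX TSC offsetting without TSC scaling, the guest reads the host TSC plus the
offset, so a guest timestamp can be placed on the host time line by
subtracting the offset that was in effect when the event was recorded. The
function name guest_to_host_tsc() is made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumption: guest_tsc = host_tsc + tsc_offset in 64-bit wrapping
 * arithmetic (no TSC scaling). tsc_offset must be the value that was
 * active when the guest event was recorded; if the offset changed in
 * between, a different value applies to later events.
 */
uint64_t guest_to_host_tsc(uint64_t guest_tsc, uint64_t tsc_offset)
{
	return guest_tsc - tsc_offset;   /* unsigned subtraction wraps modulo 2^64 */
}
```

Because the subtraction wraps, a "negative" offset stored as a 64-bit
two's-complement value needs no special handling.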

Thank you,

--
Yoshihiro YUNOMAE
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: yoshihiro.yunomae...@hitachi.com



Re: [PATCH v2 1/3] gpio / ACPI: add ACPI support

2012-11-16 Thread Mika Westerberg

On Fri, Nov 16, 2012 at 10:05:49AM +0200, Mika Westerberg wrote:
> 
> One more thing, sometimes we want to assign the handle like in the case of
> SPI controller driver we set the master->dev.acpi_handle. In that case we
> can't use DEVICE_ACPI_HANDLE() as is. Should we just do something like:
> 
>   master->dev.acpi_handle = pdev->dev.acpi_handle;
> 
> or should we introduce some new macro that supports this?

Or we could just drop the cast from the macro and use the same.

#define DEVICE_ACPI_HANDLE(dev) ((dev)->acpi_handle)
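
A small userspace sketch of why dropping the cast helps: in C the result of a
cast is not an lvalue, whereas a parenthesized member access is, so the
cast-free macro can appear on either side of an assignment. The struct device
below is a stand-in with only the one field, and inherit_acpi_handle() is a
hypothetical helper, not an existing kernel function.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel structure; only the field used here. */
struct device {
	void *acpi_handle;
};

/* Without a cast the expansion is a plain member access, hence an lvalue. */
#define DEVICE_ACPI_HANDLE(dev) ((dev)->acpi_handle)

/* Copy the handle from a parent to a child, e.g. platform device to SPI master. */
void inherit_acpi_handle(struct device *child, struct device *parent)
{
	DEVICE_ACPI_HANDLE(child) = DEVICE_ACPI_HANDLE(parent);
}
```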

Re: [PATCH] dt: platform: Extract device name from device tree blob

2012-11-16 Thread lee.jo...@linaro.org

On Thu, 15 Nov 2012, Per Förlin wrote:

> On 11/15/2012 04:52 PM, Grant Likely wrote:
> > On Mon, 12 Nov 2012 23:54:40 +0100, Per Förlin wrote:
> >> On 11/12/2012 04:20 PM, Grant Likely wrote:
> >>> On Mon, Nov 12, 2012 at 2:59 PM, Per Forlin wrote:
> >>>> Add support to extract device name from device tree blob.
> >>>> If the property "dev-name" is set in the DTS this name will
> >>>> be used when creating the device.
> >>>> The auxdata_lookup has precedence and will override
> >>>> the "dev-name" property.
> >>>
> >>> Using a 'dev-name' property has the same problem that the 'cell-index'
> >>> properties have in that it is encoding part of the global namespace
> >>> local to the node and it becomes easy to create collisions. Instead of
> >>> this check to see if one of the properties in /aliases points to the
> >>> node and use that for the name.
> >>>
> >>> g.
> >>>
> >> Thanks Grant for your feedback,
> >>
> >> Extract from exynos5250.dtsi:
> >> ---
> >>aliases {
> >>spi0 = &spi_0;
> >>spi1 = &spi_1;
> >>spi2 = &spi_2;
> >>};
> >>
> >>spi_0: spi@12d2 {
> >> ...
> >>};
> >>
> >>spi_1: spi@12d3 {
> >> ...
> >>};
> >>
> >>spi_2: spi@12d4 {
> >> ...
> >>};
> >> ---
> >>
> >> Alias refers to the device node. The device node is not aware of the alias.
> >>
> >> How to get a device name from the aliases.
> >> 1. Traverse all aliases for each device node (time consuming if there are 
> >> many aliases)
> >> 2. Make a new function of_alias_get_name(), today there is only 
> >> of_alias_get_id()
> >> 3. The functionality of setting device_name based on alias name needs to 
> >> be optional because one may want to use aliases without changing the name 
> >> of the device.
> >> All this is feasible but perhaps not optimal.
> >>
> >> I don't really see how come name space is a big issue in this case. The 
> >> name space of "dev-name" is local to the device node. A child device node 
> >> can use the same dev-name as the parent (unless I'm mistaken which happens 
> >> quite often). Introducing yet another property name pollutes the name 
> >> space of the device node. Still I think the pros are stronger than the 
> >> cons.
> >>
> >> Do you still prefer to use the name of the Alias? Could you please 
> >> elaborate a bit more how this can be done in practice?
> >> I would agree with you if there was a reference from the device node to 
> >> the alias.
> > 
> > Oh, I see what you're trying to do. As Lee pointed out you're trying to
> > make the Linux internal way of matching up clocks and regulators happy.
> > That is very much a Linux-kernel internal thing and should be solved in
> > the kernel. Trying to solve it with fixed names in the device tree will
> > cause problems down the road.
> > 
> > I thought you were wanting to have logical names for the devices that
> > make sense to the user which is how aliases is used now.
> > 
> > So, no, don't do this.
> > 
> > g.
> > 
> Hi,
> 
> I came to the same conclusion when I dug into it some more. I replied to my 
> own comment and concluded (https://lkml.org/lkml/2012/11/13/309).
> The solution is to move clocks and regulators into the DTS. When this is done 
> there will be no need for setting a specific device-name (all those 
> auxdata_lookup can be removed)

Right, something I will do when we have all the pieces in place.

FWIW, all of the regulators are in the DTS(I) files and are happy.
We're still working on Clocks, I believe Ulf will deal with these.

-- 
Lee Jones
Linaro ST-Ericsson Landing Team Lead
Linaro.org │ Open source software for ARM SoCs
Follow Linaro: Facebook | Twitter | Blog

Re: [PATCH] ARM: add get_user() support for 8 byte types

2012-11-16 Thread Arnd Bergmann

On Thursday 15 November 2012, Rob Clark wrote:
> 
> From: Rob Clark 
> 
> A new atomic modeset/pageflip ioctl being developed in DRM requires
> get_user() to work for 64bit types (in addition to just put_user()).
> 
> v1: original
> v2: pass correct size to check_uaccess, and better handling of narrowing
> double word read with __get_user_xb() (Russell King's suggestion)
> v3: explain in comment about why this works for narrowing fetch to 1,
> 2, or 4 byte type on ARM.
> 
> Signed-off-by: Rob Clark 

Acked-by: Arnd Bergmann 

[PATCH v2] PCI-AER: Do not report successful error recovery for devices with AER-unaware drivers

2012-11-16 Thread Pandarathil, Vijaymohan R

When an error is detected on a PCIe device which does not have an
AER-aware driver, prevent AER infrastructure from reporting
successful error recovery.

This is because the report_error_detected() function that gets
called in the first phase of recovery process allows forward
progress even when the driver for the device does not have AER
capabilities. It seems that all the callbacks (in the pci_error_handlers
structure) that drivers register for error recovery are optional.
So the intention of the infrastructure design seems to be to allow
forward progress even when a specific callback has not been
registered by a driver. However, if the error handler structure itself
has not been registered, it doesn't make sense to allow forward
progress.

As a result of the current design, in the case of a single device
having an AER-unaware driver or in the case of any function in a
multi-function card having an AER-unaware driver, a successful
recovery is reported.

A typical scenario in which this happens is when a PCI device is detached
from a KVM host and the pci-stub driver on the host claims the
device. The pci-stub driver does not have error handling capabilities
but the AER infrastructure still reports that the device recovered
successfully.

The changes proposed here leave the device(s) in an unrecovered state
if the driver for the device, or for any device in the subtree,
does not have an error handler structure registered. This reflects
the true state of the device and prevents a partial recovery (or no
recovery at all) from being reported as successful.

v2:
  - Made changes so that all devices in the subtree have the error
state set correctly.

Reviewed-by: Linas Vepstas
Reviewed-by: Myron Stowe
Reviewed-by: Bjorn Helgaas
Signed-off-by: Vijay Mohan Pandarathil

---
 drivers/pci/pcie/aer/aerdrv.h  |  5 -
 drivers/pci/pcie/aer/aerdrv_core.c | 21 ++---
 include/linux/pci.h|  3 +++
 3 files changed, 25 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/pcie/aer/aerdrv.h b/drivers/pci/pcie/aer/aerdrv.h
index 94a7598..22f840f 100644
--- a/drivers/pci/pcie/aer/aerdrv.h
+++ b/drivers/pci/pcie/aer/aerdrv.h
@@ -87,6 +87,9 @@ struct aer_broadcast_data {
 static inline pci_ers_result_t merge_result(enum pci_ers_result orig,
enum pci_ers_result new)
 {
+   if (new == PCI_ERS_RESULT_NO_AER_DRIVER)
+   return PCI_ERS_RESULT_NO_AER_DRIVER;
+
if (new == PCI_ERS_RESULT_NONE)
return orig;
 
@@ -97,7 +100,7 @@ static inline pci_ers_result_t merge_result(enum pci_ers_result orig,
break;
case PCI_ERS_RESULT_DISCONNECT:
if (new == PCI_ERS_RESULT_NEED_RESET)
-   orig = new;
+   orig = PCI_ERS_RESULT_NEED_RESET;
break;
default:
break;
diff --git a/drivers/pci/pcie/aer/aerdrv_core.c b/drivers/pci/pcie/aer/aerdrv_core.c
index 06bad96..c1b8fdd 100644
--- a/drivers/pci/pcie/aer/aerdrv_core.c
+++ b/drivers/pci/pcie/aer/aerdrv_core.c
@@ -231,11 +231,26 @@ static int report_error_detected(struct pci_dev *dev, void *data)
   dev->driver ?
   "no AER-aware driver" : "no driver");
}
-   return 0;
+
+   /*
+* If there's any device in the subtree that does not
+* have an error_detected callback, returning
+* PCI_ERS_RESULT_NO_AER_DRIVER prevents calling of
+* the subsequent mmio_enabled/slot_reset/resume
+* callbacks of "any" device in the subtree. All the
+* devices in the subtree are left in the error state
+* without recovery.
+*/
+
+   if (!(dev->hdr_type & PCI_HEADER_TYPE_BRIDGE))
+   vote = PCI_ERS_RESULT_NO_AER_DRIVER;
+   else
+   vote = PCI_ERS_RESULT_NONE;
+   } else {
+   err_handler = dev->driver->err_handler;
+   vote = err_handler->error_detected(dev, result_data->state);
}
 
-   err_handler = dev->driver->err_handler;
-   vote = err_handler->error_detected(dev, result_data->state);
result_data->result = merge_result(result_data->result, vote);
return 0;
 }
diff --git a/include/linux/pci.h b/include/linux/pci.h
index ee21795..fb7e869 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -538,6 +538,9 @@ enum pci_ers_result {
 
/* Device driver is fully recovered and operational */
PCI_ERS_RESULT_RECOVERED = (__force pci_ers_result_t) 5,
+
+   /* No AER capabilities registered for the driver */
+   PCI_ERS_RESULT_NO_AER_DRIVER = (__force pci_ers_result_t) 6,
 };
 
 /* PCI bus error event callbacks */
-- 
1.7.11.3
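
The effect of the new vote on the merged result can be modelled in userspace.
The sketch below is an illustration only: enum ers_result and merge_vote()
are stand-in names with arbitrary numeric values, and the
CAN_RECOVER/RECOVERED arm is an assumption about the part of merge_result()
that the diff context does not show.

```c
#include <assert.h>

/* Stand-in for enum pci_ers_result; numeric values are arbitrary here. */
enum ers_result {
	ERS_NONE,
	ERS_CAN_RECOVER,
	ERS_NEED_RESET,
	ERS_DISCONNECT,
	ERS_RECOVERED,
	ERS_NO_AER_DRIVER
};

/* Fold one device's vote into the running result for the subtree. */
enum ers_result merge_vote(enum ers_result orig, enum ers_result vote)
{
	if (vote == ERS_NO_AER_DRIVER)
		return ERS_NO_AER_DRIVER;   /* overrides whatever was collected */
	if (vote == ERS_NONE)
		return orig;                /* no opinion: keep the current result */

	switch (orig) {
	case ERS_CAN_RECOVER:
	case ERS_RECOVERED:
		return vote;                /* assumed arm, not visible in the diff */
	case ERS_DISCONNECT:
		return vote == ERS_NEED_RESET ? ERS_NEED_RESET : orig;
	default:
		return orig;                /* includes ERS_NO_AER_DRIVER: it sticks */
	}
}
```

In this model a single device without an error handler structure is enough to
keep the whole subtree from ending up in the recovered state, which is the
behaviour the commit message describes.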


Re: [PATCH] usb: dwc3: core: move dwc3_cache_hwparams before dwc3_alloc_event_buffers

2012-11-16 Thread Felipe Balbi

Hi,

On Fri, Nov 16, 2012 at 12:07:54PM +0530, Kishon Vijay Abraham I wrote:
> commit 392142 moved event buffer allocation out of dwc3_core_init() but
> event buffer allocation uses the cached copy of hwparams to determine
> the number of event buffers and the caching is done in dwc3_core_init.
> So moved dwc3_cache_hwparams function before dwc3_alloc_event_buffers so
> that dwc3_alloc_event_buffers sees the correct number of event buffers.
> 
> Signed-off-by: Kishon Vijay Abraham I 

My bad, what a regression I caused :-)

Greg, can you take this one ?

Signed-off-by: Felipe Balbi 

> ---
>  drivers/usb/dwc3/core.c |4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c
> index b923183..88e8d31 100644
> --- a/drivers/usb/dwc3/core.c
> +++ b/drivers/usb/dwc3/core.c
> @@ -356,8 +356,6 @@ static int __devinit dwc3_core_init(struct dwc3 *dwc)
>  
>   dwc3_core_soft_reset(dwc);
>  
> - dwc3_cache_hwparams(dwc);
> -
>   reg = dwc3_readl(dwc->regs, DWC3_GCTL);
>   reg &= ~DWC3_GCTL_SCALEDOWN_MASK;
>   reg &= ~DWC3_GCTL_DISSCRAMBLE;
> @@ -498,6 +496,8 @@ static int __devinit dwc3_probe(struct platform_device *pdev)
>   pm_runtime_get_sync(dev);
>   pm_runtime_forbid(dev);
>  
> + dwc3_cache_hwparams(dwc);
> +
>   ret = dwc3_alloc_event_buffers(dwc, DWC3_EVENT_BUFFERS_SIZE);
>   if (ret) {
>   dev_err(dwc->dev, "failed to allocate event buffers\n");
> -- 
> 1.7.9.5
> 
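
The ordering problem fixed by the patch can be reproduced in miniature. The
sketch below is not the dwc3 code: struct ctrl and both functions are
hypothetical, and they only show that an allocation sized from a cached
hardware parameter sees zero until the caching step has run.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical miniature of the driver state; not the real struct dwc3. */
struct ctrl {
	unsigned int hw_num_bufs;   /* what the hardware register would report */
	unsigned int num_bufs;      /* cached copy, 0 until cache_hwparams() runs */
	unsigned char *bufs;
};

/* Copy the hardware-reported count into the cached field. */
void cache_hwparams(struct ctrl *c)
{
	c->num_bufs = c->hw_num_bufs;
}

/* Allocate num_bufs buffers of len bytes; returns the count, or -1 on failure. */
int alloc_event_buffers(struct ctrl *c, size_t len)
{
	if (c->num_bufs == 0)
		return 0;   /* nothing cached yet, so no buffers are allocated */
	c->bufs = calloc(c->num_bufs, len);
	return c->bufs ? (int)c->num_bufs : -1;
}
```

Calling alloc_event_buffers() before cache_hwparams() yields zero buffers;
calling it afterwards yields the hardware-reported number.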

-- 
balbi



Re: [PATCH 0/9 v4] use efficient this_cpu_* helper

2012-11-16 Thread Shan Wei

Hi Tejun Heo:

Tejun Heo said, at 2012/11/15 22:53:
> On Thu, Nov 15, 2012 at 02:19:38PM +, Christoph Lameter wrote:
>> Tejun: Could you pick up this patchset?
> 
> Sure, but, Shan, when posting patchset, please make the patches
> replies to the head message; otherwise, it's pretty difficult to track
> what's going on with the patchset as a whole.  I see that some patches
> are being picked up by respective subsystems.  If you have patches
> left, please let me know.

OK, next time I will do as you suggest.

This patchset touches several subsystems, e.g. network, rcu and trace.
The best way to avoid conflicts is for the subsystem maintainers to pick the
patches up into their own trees. I will remind them about each patch that has
not yet been applied and add you to the recipient list.

Best Regards
Shan Wei

> 
> Thanks.
> 


Re: [PATCH v8 1/3] Runtime Interpreted Power Sequences

2012-11-16 Thread Alex Courbot

Hi Srinivas,

On Friday 16 November 2012 15:58:29 Srinivas KANDAGATLA wrote:
> Hi Alex,
> I am looking forward to this feature being mainlined,

*cough* Ack *cough* :)

> but I have a
> comment on the way the types are tied to the power seq infrastructure.
> I know your use cases are limited to the types "delay", "pwm", "gpio"
> and "regulator". However, there are instances where devices can be
> powered up or reset by writing to special registers or sysconfs or
> something else.
> So my suggestion would be to make these types register themselves
> dynamically with the power_seq infrastructure so that in future this can
> be extended to other types as well.
> This trivial change can make a lot of difference for future chips
> which do things a bit differently.
> ST Microelectronics chips fit into this category, and I guess other
> vendors have similar chips.

The current implementation is (purposely) minimal and will certainly be 
extended. There are other aspects of regulators for instance that should also 
be controllable (voltage comes to mind). And I am totally open to supporting 
new kinds of resources as usage broadens. For this first version I just wanted 
to introduce the feature and minimize the impact should anything (DT 
bindings?) need to change.

I am a little bit skeptical about the purpose of directly accessing registers 
(or any part of the address space) from power sequences. It should at least be 
possible to involve some kind of abstraction. Not necessarily one of the 
currently supported types - but at least something.

The reason is that I'd like to try and avoid direct references to resources 
within sequences as much as possible to make them reusable. If your system has 
two identical devices, you should not need to duplicate their sequences just 
to change a register range from the few steps that make use of it. If you can 
do the same job with, say, a regulator, you can just give it a name, get it at 
runtime using regulator_get() and define it outside of the sequence, in our 
device node.

Of course there might be scenarios where you really need to access a register 
and there is no way to do otherwise, in this case I am open to discussion. But 
before resorting to this I'd like to make sure that the existing abstractions
cannot cover the case already.

Alex.


[PATCH v6] staging: ste_rmi4: Convert to Type-B support

2012-11-16 Thread Alexandra Chin

Convert to MT-B because Synaptics touch devices are capable
of tracking identifiable fingers.

Signed-off-by: Alexandra Chin 
---
This patch was tested on Pandaboard.

Changes from v6:
- Incorporated Henrik's review comments
  *remove irrelevant changes within the patch

Changes from v5:
- Incorporated Henrik's review comments
  *rollback to v3 from v4
  *fix odd line break in v3

Changes from v4:
- Incorporated Henrik's review comments
  *split function synpatics_rmi4_touchscreen_report
  *split function synaptics_rmi4_i2c_query_device

Changes from v3:
- Incorporated Henrik's review comments
  *remove 'else' after an error path return
  *add input_mt_sync_frame() for pointer emulation effects
  *correct names of touchscreen
- Replace printk with dev_err

Changes from v2:
- Incorporated Henrik's review comments
  *directly report finger state with Type-B
- Against 3.7-rcX
  *call input_mt_init_slots with INPUT_MT_DIRECT flag
---
 drivers/staging/ste_rmi4/synaptics_i2c_rmi4.c |   76 +++--
 1 files changed, 33 insertions(+), 43 deletions(-)

diff --git a/drivers/staging/ste_rmi4/synaptics_i2c_rmi4.c b/drivers/staging/ste_rmi4/synaptics_i2c_rmi4.c
index 277491a..ed304e0 100644
--- a/drivers/staging/ste_rmi4/synaptics_i2c_rmi4.c
+++ b/drivers/staging/ste_rmi4/synaptics_i2c_rmi4.c
@@ -31,6 +31,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "synaptics_i2c_rmi4.h"
 
 /* TODO: for multiple device support will need a per-device mutex */
@@ -67,7 +68,6 @@
 #define PDT_START_SCAN_LOCATION (0x00E9)
 #define PDT_END_SCAN_LOCATION  (0x000A)
 #define PDT_ENTRY_SIZE (0x0006)
-#define RMI4_NUMBER_OF_MAX_FINGERS (8)
 #define SYNAPTICS_RMI4_TOUCHPAD_FUNC_NUM   (0x11)
 #define SYNAPTICS_RMI4_DEVICE_CONTROL_FUNC_NUM (0x01)
 
@@ -164,6 +164,7 @@ struct synaptics_rmi4_device_info {
  * @regulator: pointer to the regulator structure
  * @wait: wait queue structure variable
  * @touch_stopped: flag to stop the thread function
+ * @fingers_supported: maximum supported fingers
  *
  * This structure gives the device data information.
  */
@@ -184,6 +185,7 @@ struct synaptics_rmi4_data {
struct regulator*regulator;
wait_queue_head_t   wait;
booltouch_stopped;
+   unsigned char   fingers_supported;
 };
 
 /**
@@ -303,22 +305,21 @@ static int synpatics_rmi4_touchpad_report(struct synaptics_rmi4_data *pdata,
/* number of touch points - fingers down in this case */
int touch_count = 0;
int finger;
-   int fingers_supported;
int finger_registers;
int reg;
int finger_shift;
int finger_status;
int retval;
+   int x, y;
+   int wx, wy;
unsigned short  data_base_addr;
unsigned short  data_offset;
unsigned char   data_reg_blk_size;
unsigned char   values[2];
unsigned char   data[DATA_LEN];
-   int x[RMI4_NUMBER_OF_MAX_FINGERS];
-   int y[RMI4_NUMBER_OF_MAX_FINGERS];
-   int wx[RMI4_NUMBER_OF_MAX_FINGERS];
-   int wy[RMI4_NUMBER_OF_MAX_FINGERS];
+   unsigned char   fingers_supported = pdata->fingers_supported;
struct  i2c_client *client = pdata->i2c_client;
+   struct  input_dev *input_dev = pdata->input_dev;
 
/* get 2D sensor finger data */
/*
@@ -333,7 +334,6 @@ static int synpatics_rmi4_touchpad_report(struct synaptics_rmi4_data *pdata,
 *  10 = finger present but data may not be accurate,
 *  11 = reserved for product use.
 */
-   fingers_supported   = rfi->num_of_data_points;
finger_registers= (fingers_supported + 3)/4;
data_base_addr  = rfi->fn_desc.data_base_addr;
retval = synaptics_rmi4_i2c_block_read(pdata, data_base_addr, values,
@@ -358,7 +358,11 @@ static int synpatics_rmi4_touchpad_report(struct synaptics_rmi4_data *pdata,
 * if finger status indicates a finger is present then
 * read the finger data and report it
 */
-   if (finger_status == 1 || finger_status == 2) {
+   input_mt_slot(input_dev, finger);
+   input_mt_report_slot_state(input_dev, MT_TOOL_FINGER,
+   finger_status != 0);
+
+   if (finger_status) {
/* Read the finger data */
data_offset = data_base_addr +
((finger * data_reg_blk_size) +
@@ -367,50 +371,33 @@ static int synpatics_rmi4_touchpad_report(struct synaptics_rmi4_data *pdata,
data_offset, data,
data_reg_blk_size);
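
The hunks above rely on the register layout described in the driver comment: two status bits per finger, so four fingers fit in one status register and `finger_registers = (fingers_supported + 3)/4`. A minimal userspace sketch of that decoding follows. It assumes finger N sits in register N / 4 at bit offset (N % 4) * 2; the extraction code itself is not visible in the quoted hunks, and the helper names are hypothetical, not the driver's.

```c
#include <assert.h>
#include <stddef.h>

/* Number of status registers needed when each finger takes 2 bits. */
static size_t finger_registers(size_t fingers_supported)
{
	return (fingers_supported + 3) / 4;
}

/*
 * Return the 2-bit status of one finger. The patch reports the slot as
 * active whenever this value is non-zero.
 * Assumption: finger N lives in register N / 4, at bit offset (N % 4) * 2.
 */
static unsigned int finger_status(const unsigned char *values,
				  size_t fingers_supported, size_t finger)
{
	assert(finger < fingers_supported);
	return (values[finger / 4] >> ((finger % 4) * 2)) & 0x3;
}
```

With this layout, 8 supported fingers need 2 status registers, and a slot index maps directly to the finger index, which is what lets the Type-B conversion call input_mt_slot() once per finger.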

Re: [PATCH v3 6/9] rcu: use __this_cpu_read helper instead of per_cpu_ptr(p, raw_smp_processor_id())

2012-11-16 Thread Shan Wei

Shan Wei said, at 2012/11/13 9:52:
> From: Shan Wei 
> 
> Signed-off-by: Shan Wei 
> Acked-by: Christoph Lameter 

Paul, would you like to pick this up into your tree?

> ---
> no changes vs v3,v2.
> ---
>  kernel/rcutree.c |2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> index 74df86b..441b945 100644
> --- a/kernel/rcutree.c
> +++ b/kernel/rcutree.c
> @@ -1960,7 +1960,7 @@ static void force_quiescent_state(struct rcu_state *rsp)
>   struct rcu_node *rnp_old = NULL;
>  
>   /* Funnel through hierarchy to reduce memory contention. */
> - rnp = per_cpu_ptr(rsp->rda, raw_smp_processor_id())->mynode;
> + rnp = __this_cpu_read(rsp->rda->mynode);
>   for (; rnp != NULL; rnp = rnp->parent) {
>   ret = (ACCESS_ONCE(rsp->gp_flags) & RCU_GP_FLAG_FQS) ||
> !raw_spin_trylock(&rnp->fqslock);
> 


Re: [PATCH v4 7/9] trace: use this_cpu_ptr per-cpu helper

2012-11-16 Thread Shan Wei

Shan Wei said, at 2012/11/13 9:53:
> From: Shan Wei 
> 
> typeof(&buffer) is a pointer to an array of 1024 chars, i.e. char (*)[1024],
> but typeof(&buffer[0]) is a pointer to char, which matches the return type
> of get_trace_buf().
> As is well known, the value of &buffer is equal to that of &buffer[0],
> so returning this_cpu_ptr(&percpu_buffer->buffer[0]) avoids a type cast.
> 
> Signed-off-by: Shan Wei 

Steven Rostedt, would you like to pick this up into your tree?

> ---
> v4: avoid type cast using &percpu_buffer->buffer[0].
> ---
>  kernel/trace/blktrace.c |2 +-
>  kernel/trace/trace.c|5 +
>  2 files changed, 2 insertions(+), 5 deletions(-)
> 
> diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c
> index c0bd030..71259e2 100644
> --- a/kernel/trace/blktrace.c
> +++ b/kernel/trace/blktrace.c
> @@ -147,7 +147,7 @@ void __trace_note_message(struct blk_trace *bt, const char *fmt, ...)
>   return;
>  
>   local_irq_save(flags);
> - buf = per_cpu_ptr(bt->msg_data, smp_processor_id());
> + buf = this_cpu_ptr(bt->msg_data);
>   va_start(args, fmt);
>   n = vscnprintf(buf, BLK_TN_MAX_MSG, fmt, args);
>   va_end(args);
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index 31e4f55..65cb003 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -1495,7 +1495,6 @@ static struct trace_buffer_struct *trace_percpu_nmi_buffer;
>  static char *get_trace_buf(void)
>  {
>   struct trace_buffer_struct *percpu_buffer;
> - struct trace_buffer_struct *buffer;
>  
>   /*
>* If we have allocated per cpu buffers, then we do not
> @@ -1513,9 +1512,7 @@ static char *get_trace_buf(void)
>   if (!percpu_buffer)
>   return NULL;
>  
> - buffer = per_cpu_ptr(percpu_buffer, smp_processor_id());
> -
> - return buffer->buffer;
> + return this_cpu_ptr(&percpu_buffer->buffer[0]);
>  }
>  
>  static int alloc_percpu_trace_buffer(void)
> 
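
The commit message's point about `&buffer` versus `&buffer[0]` can be checked in plain userspace C. The sketch below is only an illustration: the structure is modelled on the description in the commit message (a 1024-char array member), and the helper names are hypothetical. It shows that both expressions hold the same address while having different types, which is why pointer arithmetic on them steps by different amounts.

```c
#include <stddef.h>

#define BUF_SIZE 1024

/* Modelled on the description above: a struct wrapping a char array. */
struct trace_buffer_struct {
	char buffer[BUF_SIZE];
};

/* Returns 1 when &b->buffer and &b->buffer[0] hold the same address. */
static int same_address(struct trace_buffer_struct *b)
{
	char (*whole)[BUF_SIZE] = &b->buffer;	/* pointer to the whole array */
	char *first = &b->buffer[0];		/* pointer to the first char */

	return (void *)whole == (void *)first;
}

/* Bytes advanced by "+ 1" on a pointer to the whole array. */
static size_t whole_step(struct trace_buffer_struct *b)
{
	return (size_t)((char *)(&b->buffer + 1) - (char *)&b->buffer);
}

/* Bytes advanced by "+ 1" on a pointer to the first element. */
static size_t first_step(struct trace_buffer_struct *b)
{
	return (size_t)((char *)(&b->buffer[0] + 1) - (char *)&b->buffer[0]);
}
```

Since get_trace_buf() returns `char *`, handing `&percpu_buffer->buffer[0]` to the per-cpu helper yields a result that already has the right type, whereas `&percpu_buffer->buffer` would need a cast.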


Re: [PATCH v4 4/9] net: openvswitch: use this_cpu_ptr per-cpu helper

2012-11-16 Thread Shan Wei

Shan Wei said, at 2012/11/13 9:52:
> From: Shan Wei 
> 
> Just use the faster this_cpu_ptr() instead of
> per_cpu_ptr(p, smp_processor_id()).
> 
> 
> Signed-off-by: Shan Wei 
> Reviewed-by: Christoph Lameter 

Jesse Gross, would you like to pick this up into your tree?

> ---
> no changes vs v3,v2.
> ---
>  net/openvswitch/datapath.c |4 ++--
>  net/openvswitch/vport.c|5 ++---
>  2 files changed, 4 insertions(+), 5 deletions(-)
> 
> diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
> index 4c4b62c..77d16a5 100644
> --- a/net/openvswitch/datapath.c
> +++ b/net/openvswitch/datapath.c
> @@ -208,7 +208,7 @@ void ovs_dp_process_received_packet(struct vport *p, struct sk_buff *skb)
>   int error;
>   int key_len;
>  
> - stats = per_cpu_ptr(dp->stats_percpu, smp_processor_id());
> + stats = this_cpu_ptr(dp->stats_percpu);
>  
>   /* Extract flow from 'skb' into 'key'. */
>   error = ovs_flow_extract(skb, p->port_no, &key, &key_len);
> @@ -282,7 +282,7 @@ int ovs_dp_upcall(struct datapath *dp, struct sk_buff *skb,
>   return 0;
>  
>  err:
> - stats = per_cpu_ptr(dp->stats_percpu, smp_processor_id());
> + stats = this_cpu_ptr(dp->stats_percpu);
>  
>   u64_stats_update_begin(&stats->sync);
>   stats->n_lost++;
> diff --git a/net/openvswitch/vport.c b/net/openvswitch/vport.c
> index 03779e8..70af0be 100644
> --- a/net/openvswitch/vport.c
> +++ b/net/openvswitch/vport.c
> @@ -333,8 +333,7 @@ void ovs_vport_receive(struct vport *vport, struct sk_buff *skb)
>  {
>   struct vport_percpu_stats *stats;
>  
> - stats = per_cpu_ptr(vport->percpu_stats, smp_processor_id());
> -
> + stats = this_cpu_ptr(vport->percpu_stats);
>   u64_stats_update_begin(&stats->sync);
>   stats->rx_packets++;
>   stats->rx_bytes += skb->len;
> @@ -359,7 +358,7 @@ int ovs_vport_send(struct vport *vport, struct sk_buff *skb)
>   if (likely(sent)) {
>   struct vport_percpu_stats *stats;
>  
> - stats = per_cpu_ptr(vport->percpu_stats, smp_processor_id());
> + stats = this_cpu_ptr(vport->percpu_stats);
>  
>   u64_stats_update_begin(&stats->sync);
>   stats->tx_packets++;
> 


[PATCH] backlight: lcd: return ENXIO when ops functions cannot be called

2012-11-16 Thread Jingoo Han

Previously, when the ops functions could not be called,
lcd_store_power() and lcd_store_contrast() returned 0
instead of -ENXIO, because the initial value of 'rc' was
overwritten by the return value of kstrtoul().

Thus the local variable 'rc' should be set to -ENXIO
only after kstrtoul() has been called.

Signed-off-by: Jingoo Han 
Cc: Richard Purdie 
---
 drivers/video/backlight/lcd.c |8 ++--
 1 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/video/backlight/lcd.c b/drivers/video/backlight/lcd.c
index a5d0d02..34fb6bd 100644
--- a/drivers/video/backlight/lcd.c
+++ b/drivers/video/backlight/lcd.c
@@ -108,7 +108,7 @@ static ssize_t lcd_show_power(struct device *dev, struct device_attribute *attr,
 static ssize_t lcd_store_power(struct device *dev,
struct device_attribute *attr, const char *buf, size_t count)
 {
-   int rc = -ENXIO;
+   int rc;
struct lcd_device *ld = to_lcd_device(dev);
unsigned long power;
 
@@ -116,6 +116,8 @@ static ssize_t lcd_store_power(struct device *dev,
if (rc)
return rc;
 
+   rc = -ENXIO;
+
mutex_lock(&ld->ops_lock);
if (ld->ops && ld->ops->set_power) {
pr_debug("set power to %lu\n", power);
@@ -144,7 +146,7 @@ static ssize_t lcd_show_contrast(struct device *dev,
 static ssize_t lcd_store_contrast(struct device *dev,
struct device_attribute *attr, const char *buf, size_t count)
 {
-   int rc = -ENXIO;
+   int rc;
struct lcd_device *ld = to_lcd_device(dev);
unsigned long contrast;
 
@@ -152,6 +154,8 @@ static ssize_t lcd_store_contrast(struct device *dev,
if (rc)
return rc;
 
+   rc = -ENXIO;
+
mutex_lock(&ld->ops_lock);
if (ld->ops && ld->ops->set_contrast) {
pr_debug("set contrast to %lu\n", contrast);
-- 
1.7.2.5
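
The effect of moving the `rc = -ENXIO` assignment can be modelled in userspace. This is a simplified sketch, not the driver code: `parse_ok()` is a hypothetical stand-in for a successful kstrtoul() call, and the assumption that the store function returns `count` once the op has been called is based on the usual sysfs store convention rather than on the quoted hunks.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a successful kstrtoul(): 0 means success. */
static int parse_ok(void)
{
	return 0;
}

/* Old flow: the -ENXIO initializer is lost as soon as parsing succeeds. */
static long store_old(int have_op, size_t count)
{
	long rc = -ENXIO;

	rc = parse_ok();	/* overwrites -ENXIO with 0 */
	if (rc)
		return rc;
	if (have_op)		/* models "ld->ops && ld->ops->set_..." */
		rc = (long)count;
	return rc;		/* 0 when the op is missing */
}

/* New flow: -ENXIO is assigned after parsing, so it survives. */
static long store_new(int have_op, size_t count)
{
	long rc;

	rc = parse_ok();
	if (rc)
		return rc;
	rc = -ENXIO;
	if (have_op)
		rc = (long)count;
	return rc;		/* -ENXIO when the op is missing */
}
```

Both flows behave the same when the op exists; they differ only in the missing-op case, which is the one the patch targets.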



Re: [PATCH RESEND 3/4] pinctrl: at91: Prevent NULL dereference if of_match_device returns NULL

2012-11-16 Thread Nicolas Ferre

On 11/15/2012 11:00 AM, Jean-Christophe PLAGNIOL-VILLARD wrote:
> On 12:58 Thu 15 Nov , Axel Lin wrote:
>> of_match_device() may return NULL.
> this is not possible on at91
> 
> and I oops here on purpose: if we get a NULL pointer, it means the driver
> is wrong

Well, okay, but that does not prevent us from adding a supplementary check,
to mimic every other pinctrl driver and to use a common return path for the
of_match_device() function.

So I am not completely against this patch (that is why I added my Acked-by).
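
The pattern under discussion (check the lookup result before touching its `data` member) can be sketched in userspace as follows. The types and function names here are hypothetical stand-ins, not the kernel's `struct of_device_id` or of_match_device(); only the control flow is meant to carry over.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Simplified, hypothetical stand-in for a device match table entry. */
struct device_id {
	const char *compatible;
	const void *data;
};

/*
 * Return the first entry whose compatible string matches, or NULL.
 * The table is terminated by an entry with a NULL compatible string.
 */
static const struct device_id *match_device(const struct device_id *table,
					    const char *compatible)
{
	for (; table->compatible; table++)
		if (strcmp(table->compatible, compatible) == 0)
			return table;
	return NULL;
}

/* Probe-style caller: validate the match before dereferencing it. */
static int probe(const struct device_id *table, const char *compatible,
		 const void **ops)
{
	const struct device_id *match = match_device(table, compatible);

	if (!match)
		return -ENODEV;
	*ops = match->data;
	return 0;
}
```

Whether a NULL result can actually occur on a given platform is the point of disagreement in this thread; the check only turns an oops into an error return.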


>> Signed-off-by: Axel Lin 
>> ---
>> This resend CC Jean-Christophe.
>>
>>  drivers/pinctrl/pinctrl-at91.c |   17 +
>>  1 file changed, 13 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/pinctrl/pinctrl-at91.c b/drivers/pinctrl/pinctrl-at91.c
>> index 8490a55..32006c8 100644
>> --- a/drivers/pinctrl/pinctrl-at91.c
>> +++ b/drivers/pinctrl/pinctrl-at91.c
>> @@ -829,13 +829,18 @@ static int __devinit at91_pinctrl_probe_dt(struct platform_device *pdev,
>>  uint32_t *tmp;
>>  struct device_node *np = pdev->dev.of_node;
>>  struct device_node *child;
>> +const struct of_device_id *match;
>>  
>>  if (!np)
>>  return -ENODEV;
>>  
>> +match = of_match_device(at91_pinctrl_of_match, &pdev->dev);
>> +if (!match)
>> +return -ENODEV;
>> +
>>  info->dev = &pdev->dev;
>> -info->ops = (struct at91_pinctrl_mux_ops*)
>> -of_match_device(at91_pinctrl_of_match, &pdev->dev)->data;
>> +info->ops = (struct at91_pinctrl_mux_ops *) match->data;
>> +
>>  at91_pinctrl_child_count(info, np);
>>  
>>  if (info->nbanks < 1) {
>> @@ -1359,6 +1364,7 @@ static struct of_device_id at91_gpio_of_match[] __devinitdata = {
>>  static int __devinit at91_gpio_probe(struct platform_device *pdev)
>>  {
>>  struct device_node *np = pdev->dev.of_node;
>> +const struct of_device_id *match;
>>  struct resource *res;
>>  struct at91_gpio_chip *at91_chip = NULL;
>>  struct gpio_chip *chip;
>> @@ -1399,8 +1405,11 @@ static int __devinit at91_gpio_probe(struct platform_device *pdev)
>>  goto err;
>>  }
>>  
>> -at91_chip->ops = (struct at91_pinctrl_mux_ops*)
>> -of_match_device(at91_gpio_of_match, &pdev->dev)->data;
>> +match = of_match_device(at91_gpio_of_match, &pdev->dev);
>> +if (!match)
>> +return -ENODEV;
>> +
>> +at91_chip->ops = (struct at91_pinctrl_mux_ops *) match->data;
>>  at91_chip->pioc_virq = irq;
>>  at91_chip->pioc_idx = alias_idx;
>>  
>> -- 
>> 1.7.9.5
>>
>>
>>
> 
> ___
> linux-arm-kernel mailing list
> linux-arm-ker...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> 
> 


-- 
Nicolas Ferre

Re: [PATCH v4 1/9] net: core: use this_cpu_ptr per-cpu helper

2012-11-16 Thread Shan Wei

Shan Wei said, at 2012/11/13 9:51:
> From: Shan Wei 
> 
> flush_tasklet is a struct, not a pointer, inside the percpu variable,
> so use this_cpu_ptr to get a pointer to the member.
> 
> Signed-off-by: Shan Wei 
> Reviewed-by: Christoph Lameter 

David Miller, would you like to pick this up into your net-next tree?

> ---
> no changes vs v3.
> ---
>  net/core/flow.c |4 +---
>  1 files changed, 1 insertions(+), 3 deletions(-)
> 
> diff --git a/net/core/flow.c b/net/core/flow.c
> index e318c7e..b0901ee 100644
> --- a/net/core/flow.c
> +++ b/net/core/flow.c
> @@ -327,11 +327,9 @@ static void flow_cache_flush_tasklet(unsigned long data)
>  static void flow_cache_flush_per_cpu(void *data)
>  {
>   struct flow_flush_info *info = data;
> - int cpu;
>   struct tasklet_struct *tasklet;
>  
> - cpu = smp_processor_id();
> - tasklet = &per_cpu_ptr(info->cache->percpu, cpu)->flush_tasklet;
> + tasklet = this_cpu_ptr(&info->cache->percpu->flush_tasklet);
>   tasklet->data = (unsigned long)info;
>   tasklet_schedule(tasklet);
>  }
> 
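
The rewrite from `&per_cpu_ptr(p, cpu)->flush_tasklet` to `this_cpu_ptr(&p->flush_tasklet)` works because translating a per-CPU address only adds a per-CPU offset, so it does not matter whether the member offset is applied before or after. The userspace model below illustrates this under a loud assumption: per-CPU copies are laid out as a plain array and a "per-CPU pointer" is an address inside CPU 0's copy. The real kernel implementation differs, and all names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS 4

struct tasklet {
	unsigned long data;
};

struct flow_cache_percpu {
	int hash_count;
	struct tasklet flush_tasklet;
};

/* One copy of the structure per simulated CPU. */
static struct flow_cache_percpu areas[NR_CPUS];

/*
 * Model of per_cpu_ptr(): shift an address inside CPU 0's copy by the
 * fixed distance between consecutive per-CPU copies.
 */
static void *sim_per_cpu_ptr(void *p, int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	return (char *)p + (size_t)cpu * sizeof(areas[0]);
}

/* Old form: translate the struct pointer, then take the member address. */
static struct tasklet *old_form(int cpu)
{
	struct flow_cache_percpu *fcp = sim_per_cpu_ptr(&areas[0], cpu);

	return &fcp->flush_tasklet;
}

/* New form: take the member address first, then translate it. */
static struct tasklet *new_form(int cpu)
{
	return sim_per_cpu_ptr(&areas[0].flush_tasklet, cpu);
}
```

In this model both forms return the address of the given CPU's `flush_tasklet` member for every CPU index.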


Re: [PATCH v4 2/9] net: rds: use this_cpu_* per-cpu helper

2012-11-16 Thread Shan Wei

Shan Wei said, at 2012/11/13 9:52:
> From: Shan Wei 
> 
> 
> Signed-off-by: Shan Wei 
> Reviewed-by: Christoph Lameter 

David Miller, would you like to pick this up into your net-next tree?


> ---
> v4:
> 1. add missing __percpu annotations.
> 2. [read|write]ing fields of struct rds_ib_cache_head
> using __this_cpu_* operation, drop per_cpu_ptr.
> ---
>  net/rds/ib.h  |2 +-
>  net/rds/ib_recv.c |   24 +---
>  2 files changed, 14 insertions(+), 12 deletions(-)
> 
> diff --git a/net/rds/ib.h b/net/rds/ib.h
> index 8d2b3d5..7280ab8 100644
> --- a/net/rds/ib.h
> +++ b/net/rds/ib.h
> @@ -50,7 +50,7 @@ struct rds_ib_cache_head {
>  };
>  
>  struct rds_ib_refill_cache {
> - struct rds_ib_cache_head *percpu;
> + struct rds_ib_cache_head __percpu *percpu;
>   struct list_head *xfer;
>   struct list_head *ready;
>  };
> diff --git a/net/rds/ib_recv.c b/net/rds/ib_recv.c
> index 8d19491..8c5bc85 100644
> --- a/net/rds/ib_recv.c
> +++ b/net/rds/ib_recv.c
> @@ -418,20 +418,21 @@ static void rds_ib_recv_cache_put(struct list_head *new_item,
>struct rds_ib_refill_cache *cache)
>  {
>   unsigned long flags;
> - struct rds_ib_cache_head *chp;
>   struct list_head *old;
> + struct list_head __percpu *chpfirst;
>  
>   local_irq_save(flags);
>  
> - chp = per_cpu_ptr(cache->percpu, smp_processor_id());
> - if (!chp->first)
> + chpfirst = __this_cpu_read(cache->percpu->first);
> + if (!chpfirst)
>   INIT_LIST_HEAD(new_item);
>   else /* put on front */
> - list_add_tail(new_item, chp->first);
> - chp->first = new_item;
> - chp->count++;
> + list_add_tail(new_item, chpfirst);
>  
> - if (chp->count < RDS_IB_RECYCLE_BATCH_COUNT)
> + __this_cpu_write(chpfirst, new_item);
> + __this_cpu_inc(cache->percpu->count);
> +
> + if (__this_cpu_read(cache->percpu->count) < RDS_IB_RECYCLE_BATCH_COUNT)
>   goto end;
>  
>   /*
> @@ -443,12 +444,13 @@ static void rds_ib_recv_cache_put(struct list_head *new_item,
>   do {
>   old = xchg(&cache->xfer, NULL);
>   if (old)
> - list_splice_entire_tail(old, chp->first);
> - old = cmpxchg(&cache->xfer, NULL, chp->first);
> + list_splice_entire_tail(old, chpfirst);
> + old = cmpxchg(&cache->xfer, NULL, chpfirst);
>   } while (old);
>  
> - chp->first = NULL;
> - chp->count = 0;
> +
> + __this_cpu_write(chpfirst, NULL);
> + __this_cpu_write(cache->percpu->count, 0);
>  end:
>   local_irq_restore(flags);
>  }
> 


Re: [PATCH v8 0/3] Runtime Interpreted Power Sequences

2012-11-16 Thread Thierry Reding

On Fri, Nov 16, 2012 at 03:38:20PM +0900, Alexandre Courbot wrote:
> Hopefully the final series before the feature gets merged. Anton Vorontsov
> kindly accepted to take it into his tree, so this series is mostly a call for
> acks, tests and reviews notices before the merge window for 3.8 opens. If you
> are interested in seeing this feature, please add your name.
> 
> This series also adds an entry for the subsystem into MAINTAINERS, setting me
> as the person in charge.
> 
> Changes from v7:
> - fix bug reported by Tony Prisk
> - add MAINTAINERS entry
> 
> Alexandre Courbot (3):
>   Runtime Interpreted Power Sequences
>   pwm_backlight: use power sequences
>   Take maintainership of power sequences
> 
>  .../devicetree/bindings/power/power_seq.txt| 121 +++
>  .../bindings/video/backlight/pwm-backlight.txt |  63 +++-
>  Documentation/power/power_seq.txt  | 253 ++
>  MAINTAINERS|  10 +
>  drivers/power/Kconfig  |   1 +
>  drivers/power/Makefile |   1 +
>  drivers/power/power_seq/Kconfig|   2 +
>  drivers/power/power_seq/Makefile   |   1 +
>  drivers/power/power_seq/power_seq.c| 376 +
>  drivers/power/power_seq/power_seq_delay.c  |  65 
>  drivers/power/power_seq/power_seq_gpio.c   |  94 ++
>  drivers/power/power_seq/power_seq_pwm.c|  82 +
>  drivers/power/power_seq/power_seq_regulator.c  |  83 +
>  drivers/video/backlight/Kconfig|   1 +
>  drivers/video/backlight/pwm_bl.c   | 160 +++--
>  include/linux/power_seq.h  | 203 +++
>  include/linux/pwm_backlight.h  |  18 +-
>  17 files changed, 1494 insertions(+), 40 deletions(-)
>  create mode 100644 Documentation/devicetree/bindings/power/power_seq.txt
>  create mode 100644 Documentation/power/power_seq.txt
>  create mode 100644 drivers/power/power_seq/Kconfig
>  create mode 100644 drivers/power/power_seq/Makefile
>  create mode 100644 drivers/power/power_seq/power_seq.c
>  create mode 100644 drivers/power/power_seq/power_seq_delay.c
>  create mode 100644 drivers/power/power_seq/power_seq_gpio.c
>  create mode 100644 drivers/power/power_seq/power_seq_pwm.c
>  create mode 100644 drivers/power/power_seq/power_seq_regulator.c
>  create mode 100644 include/linux/power_seq.h

The series,

Tested-by: Thierry Reding 
Acked-by: Thierry Reding 



Re: [PATCH 1/3, v5] AMD64 EDAC: Add multi-domain support

2012-11-16 Thread Daniel J Blueman


On 12/11/2012 21:24, Borislav Petkov wrote:
> On Mon, Nov 05, 2012 at 02:05:24PM +0800, Daniel J Blueman wrote:
>> Fix the handling of memory controller detection to index the array
>> of detected Northbridges, allowing memory controllers over multiple
>> PCI domains in federated systems eg using Numascale's NumaConnect/
>> NumaChip.
>>
>> v4: Generate linear Northbridge ID by indexing detected Northbridges
>> v5: Reorder functions to prevent extra function declaration; merge 4th
>>  patch; simplify Fam15h code; add detail to warning
>>
>> Signed-off-by: Daniel J Blueman 
>
> Acked-by: Borislav Petkov 
>
> Btw, I don't have access to a multi-socket single-board AMD system right
> now so would you please test the patchset on such a system too, if you
> haven't done so yet?
>
> Thanks a lot.


Yep, the expected memory controller indexes, population, column-strobe 
rows, banks and sysfs paths are detected on my hex-northbridge fam10h 
box with 3.7-rc5 with these patches:


EDAC MC: Ver: 3.0.0
AMD64 EDAC driver v3.4.0
EDAC amd64: DRAM ECC enabled.
EDAC amd64: F10h detected (node 0).
EDAC MC: DCT0 chip selects:
EDAC amd64: MC: 0:   0MB 1:   0MB
EDAC amd64: MC: 2: 4096MB 3: 4096MB
EDAC amd64: MC: 4:   0MB 5:   0MB
EDAC amd64: MC: 6:   0MB 7:   0MB
EDAC MC: DCT1 chip selects:
EDAC amd64: MC: 0:   0MB 1:   0MB
EDAC amd64: MC: 2: 4096MB 3: 4096MB
EDAC amd64: MC: 4:   0MB 5:   0MB
EDAC amd64: MC: 6:   0MB 7:   0MB
EDAC amd64: using x8 syndromes.
EDAC amd64: MCT channel count: 2
EDAC amd64: CS2: Registered DDR3 RAM
EDAC amd64: CS3: Registered DDR3 RAM
EDAC MC0: Giving out device to 'amd64_edac' 'F10h': DEV :00:18.2
EDAC amd64: DRAM ECC enabled.
EDAC amd64: F10h detected (node 1).
EDAC MC: DCT0 chip selects:
EDAC amd64: MC: 0:   0MB 1:   0MB
EDAC amd64: MC: 2: 4096MB 3: 4096MB
EDAC amd64: MC: 4:   0MB 5:   0MB
EDAC amd64: MC: 6:   0MB 7:   0MB
EDAC MC: DCT1 chip selects:
EDAC amd64: MC: 0:   0MB 1:   0MB
EDAC amd64: MC: 2: 4096MB 3: 4096MB
EDAC amd64: MC: 4:   0MB 5:   0MB
EDAC amd64: MC: 6:   0MB 7:   0MB
EDAC amd64: using x8 syndromes.
EDAC amd64: MCT channel count: 2
EDAC amd64: CS2: Registered DDR3 RAM
EDAC amd64: CS3: Registered DDR3 RAM
EDAC MC1: Giving out device to 'amd64_edac' 'F10h': DEV :00:19.2
EDAC amd64: DRAM ECC enabled.
EDAC amd64: F10h detected (node 2).
EDAC MC: DCT0 chip selects:
EDAC amd64: MC: 0:   0MB 1:   0MB
EDAC amd64: MC: 2: 4096MB 3: 4096MB
EDAC amd64: MC: 4:   0MB 5:   0MB
EDAC amd64: MC: 6:   0MB 7:   0MB
EDAC MC: DCT1 chip selects:
EDAC amd64: MC: 0:   0MB 1:   0MB
EDAC amd64: MC: 2: 4096MB 3: 4096MB
EDAC amd64: MC: 4:   0MB 5:   0MB
EDAC amd64: MC: 6:   0MB 7:   0MB
EDAC amd64: using x8 syndromes.
EDAC amd64: MCT channel count: 2
EDAC amd64: CS2: Registered DDR3 RAM
EDAC amd64: CS3: Registered DDR3 RAM
EDAC MC2: Giving out device to 'amd64_edac' 'F10h': DEV :00:1a.2
EDAC amd64: DRAM ECC enabled.
EDAC amd64: F10h detected (node 3).
EDAC MC: DCT0 chip selects:
EDAC amd64: MC: 0:   0MB 1:   0MB
EDAC amd64: MC: 2: 4096MB 3: 4096MB
EDAC amd64: MC: 4:   0MB 5:   0MB
EDAC amd64: MC: 6:   0MB 7:   0MB
EDAC MC: DCT1 chip selects:
EDAC amd64: MC: 0:   0MB 1:   0MB
EDAC amd64: MC: 2: 4096MB 3: 4096MB
EDAC amd64: MC: 4:   0MB 5:   0MB
EDAC amd64: MC: 6:   0MB 7:   0MB
EDAC amd64: using x8 syndromes.
EDAC amd64: MCT channel count: 2
EDAC amd64: CS2: Registered DDR3 RAM
EDAC amd64: CS3: Registered DDR3 RAM
EDAC MC3: Giving out device to 'amd64_edac' 'F10h': DEV :00:1b.2
EDAC amd64: DRAM ECC enabled.
EDAC amd64: F10h detected (node 4).
EDAC MC: DCT0 chip selects:
EDAC amd64: MC: 0:   0MB 1:   0MB
EDAC amd64: MC: 2: 4096MB 3: 4096MB
EDAC amd64: MC: 4:   0MB 5:   0MB
EDAC amd64: MC: 6:   0MB 7:   0MB
EDAC MC: DCT1 chip selects:
EDAC amd64: MC: 0:   0MB 1:   0MB
EDAC amd64: MC: 2: 4096MB 3: 4096MB
EDAC amd64: MC: 4:   0MB 5:   0MB
EDAC amd64: MC: 6:   0MB 7:   0MB
EDAC amd64: using x8 syndromes.
EDAC amd64: MCT channel count: 2
EDAC amd64: CS2: Registered DDR3 RAM
EDAC amd64: CS3: Registered DDR3 RAM
EDAC MC4: Giving out device to 'amd64_edac' 'F10h': DEV :00:1c.2
EDAC amd64: DRAM ECC enabled.
EDAC amd64: F10h detected (node 5).
EDAC MC: DCT0 chip selects:
EDAC amd64: MC: 0:   0MB 1:   0MB
EDAC amd64: MC: 2: 4096MB 3: 4096MB
EDAC amd64: MC: 4:   0MB 5:   0MB
EDAC amd64: MC: 6:   0MB 7:   0MB
EDAC MC: DCT1 chip selects:
EDAC amd64: MC: 0:   0MB 1:   0MB
EDAC amd64: MC: 2: 4096MB 3: 4096MB
EDAC amd64: MC: 4:   0MB 5:   0MB
EDAC amd64: MC: 6:   0MB 7:   0MB
EDAC amd64: using x8 syndromes.
EDAC amd64: MCT channel count: 2
EDAC amd64: CS2: Registered DDR3 RAM
EDAC amd64: CS3: Registered DDR3 RAM
EDAC MC5: Giving out device to 'amd64_edac' 'F10h': DEV :00:1d.2
EDAC PCI0: Giving out device to module 'amd64_edac' controller 'EDAC PCI controller': DEV ':00:18.2' (POLLED)


root@ibm-x3755-01:/sys/devices/system/edac# ls -d mc/mc*/{rank*,csrow*}
mc/mc0/csrow2  mc/mc1/csrow2  mc/mc2/csrow2  mc/mc3/csrow2  mc/mc4/csrow2  mc/mc5/csrow2
mc/mc0/csrow3  mc/mc1/csrow3  mc/mc2/csrow3  mc/mc3/csrow3  mc/mc4/csrow3  mc/mc5/csrow3
m

Re: [PATCH v8 2/3] pwm_backlight: use power sequences

2012-11-16 Thread Thierry Reding

On Fri, Nov 16, 2012 at 03:38:22PM +0900, Alexandre Courbot wrote:
> Make use of the power sequences specified in the device tree or platform
> data to control how the backlight is powered on and off.
> 
> Signed-off-by: Alexandre Courbot 
> Reviewed-by: Stephen Warren 

Anton,

I would normally take pwm-backlight patches through the PWM tree, but
since this patch depends on the first in this series, I think it would
be easier if you took it through your tree along with the rest of the
series. If that's okay with you:

Acked-by: Thierry Reding 

Otherwise just let me know and I'm sure we can coordinate to take this
in via the two trees separately.

Thanks,
Thierry


Re: [PATCH 1/8] mtd: diskonchip: don't warn about ARM architecture

2012-11-16 Thread Artem Bityutskiy

On Tue, 2012-11-06 at 22:55 +0100, Arnd Bergmann wrote:
> Enabling the diskonchip drivers on most architectures results
> in a pointless warning "#warning Unknown architecture for
> DiskOnChip. No default probe locations defined". The driver
> can in fact handle the default location already through the
> CONFIG_MTD_DOCPROBE_ADDRESS, which gets set on the platforms
> that need it, and we get a run-time error if this is not
> set correctly.

Pushed patches 1, 2, and 3 to l2-mtd.git, thanks!

-- 
Best Regards,
Artem Bityutskiy



[PATCH 5/9] x86: Merge early_reserve_initrd for 32bit and 64bit

2012-11-16 Thread Yinghai Lu

The 32-bit and 64-bit versions are the same, so move them out of
head32.c/head64.c and into setup.c.

We are using memblock, and it handles overlapping regions properly, so
we don't need to reserve the initrd that early. We just need to make sure
it is reserved before memblock is used to find free memory.

Signed-off-by: Yinghai Lu 
---
 arch/x86/kernel/head32.c |   11 ---
 arch/x86/kernel/head64.c |   11 ---
 arch/x86/kernel/setup.c  |   22 ++
 3 files changed, 18 insertions(+), 26 deletions(-)

diff --git a/arch/x86/kernel/head32.c b/arch/x86/kernel/head32.c
index c18f59d..4c52efc 100644
--- a/arch/x86/kernel/head32.c
+++ b/arch/x86/kernel/head32.c
@@ -33,17 +33,6 @@ void __init i386_start_kernel(void)
memblock_reserve(__pa_symbol(&_text),
 __pa_symbol(&__bss_stop) - __pa_symbol(&_text));
 
-#ifdef CONFIG_BLK_DEV_INITRD
-   /* Reserve INITRD */
-   if (boot_params.hdr.type_of_loader && boot_params.hdr.ramdisk_image) {
-   /* Assume only end is not page aligned */
-   u64 ramdisk_image = boot_params.hdr.ramdisk_image;
-   u64 ramdisk_size  = boot_params.hdr.ramdisk_size;
-   u64 ramdisk_end   = PAGE_ALIGN(ramdisk_image + ramdisk_size);
-   memblock_reserve(ramdisk_image, ramdisk_end - ramdisk_image);
-   }
-#endif
-
/* Call the subarch specific early setup function */
switch (boot_params.hdr.hardware_subarch) {
case X86_SUBARCH_MRST:
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 037df57..00e612a 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -100,17 +100,6 @@ void __init x86_64_start_reservations(char *real_mode_data)
memblock_reserve(__pa_symbol(&_text),
 __pa_symbol(&__bss_stop) - __pa_symbol(&_text));
 
-#ifdef CONFIG_BLK_DEV_INITRD
-   /* Reserve INITRD */
-   if (boot_params.hdr.type_of_loader && boot_params.hdr.ramdisk_image) {
-   /* Assume only end is not page aligned */
-   unsigned long ramdisk_image = boot_params.hdr.ramdisk_image;
-   unsigned long ramdisk_size  = boot_params.hdr.ramdisk_size;
-   unsigned long ramdisk_end   = PAGE_ALIGN(ramdisk_image + ramdisk_size);
-   memblock_reserve(ramdisk_image, ramdisk_end - ramdisk_image);
-   }
-#endif
-
reserve_ebda_region();
 
/*
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 6d29d1f..ee6d267 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -364,6 +364,19 @@ static u64 __init get_mem_size(unsigned long limit_pfn)
 
return mapped_pages << PAGE_SHIFT;
 }
+static void __init early_reserve_initrd(void)
+{
+   /* Assume only end is not page aligned */
+   u64 ramdisk_image = boot_params.hdr.ramdisk_image;
+   u64 ramdisk_size  = boot_params.hdr.ramdisk_size;
+   u64 ramdisk_end   = PAGE_ALIGN(ramdisk_image + ramdisk_size);
+
+   if (!boot_params.hdr.type_of_loader ||
+   !ramdisk_image || !ramdisk_size)
+   return; /* No initrd provided by bootloader */
+
+   memblock_reserve(ramdisk_image, ramdisk_end - ramdisk_image);
+}
 static void __init reserve_initrd(void)
 {
/* Assume only end is not page aligned */
@@ -390,10 +403,6 @@ static void __init reserve_initrd(void)
if (pfn_range_is_mapped(PFN_DOWN(ramdisk_image),
PFN_DOWN(ramdisk_end))) {
/* All are mapped, easy case */
-   /*
-* don't need to reserve again, already reserved early
-* in i386_start_kernel
-*/
initrd_start = ramdisk_image + PAGE_OFFSET;
initrd_end = initrd_start + ramdisk_size;
return;
@@ -404,6 +413,9 @@ static void __init reserve_initrd(void)
memblock_free(ramdisk_image, ramdisk_end - ramdisk_image);
 }
 #else
+static void __init early_reserve_initrd(void)
+{
+}
 static void __init reserve_initrd(void)
 {
 }
@@ -665,6 +677,8 @@ early_param("reservelow", parse_reservelow);
 
 void __init setup_arch(char **cmdline_p)
 {
+   early_reserve_initrd();
+
 #ifdef CONFIG_X86_32
memcpy(&boot_cpu_data, &new_cpu_data, sizeof(new_cpu_data));
visws_early_detect();
-- 
1.7.7
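
The size handed to memblock_reserve() in early_reserve_initrd() runs from the (already page-aligned) start of the ramdisk to its page-aligned end. A small userspace sketch of that arithmetic follows. It assumes 4 KiB pages, and the helper names are hypothetical; the early-return condition mirrors the one in the patch.

```c
#include <stdint.h>

#define SIM_PAGE_SIZE 4096ULL	/* assumption: 4 KiB pages */

/* Round an address up to the next page boundary. */
static uint64_t page_align(uint64_t addr)
{
	return (addr + SIM_PAGE_SIZE - 1) & ~(SIM_PAGE_SIZE - 1);
}

/*
 * Length that would be reserved for the initrd, or 0 when the bootloader
 * provided none. Only the end is assumed to be unaligned.
 */
static uint64_t initrd_reserve_len(uint32_t type_of_loader,
				   uint64_t image, uint64_t size)
{
	if (!type_of_loader || !image || !size)
		return 0;
	return page_align(image + size) - image;
}
```

For a ramdisk at 0x37000000 with size 0x1234, the end rounds up to 0x37002000, so two pages (0x2000 bytes) are covered.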


[PATCH 8/9] x86: ramdisk info print with high bits.

2012-11-16 Thread Yinghai Lu

The ramdisk can now be loaded high on 64-bit.

So more digits are needed when printing its addresses.

Signed-off-by: Yinghai Lu 
Cc: Matt Fleming 
---
 arch/x86/kernel/setup.c |8 
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index c2eb535..0e13c6e 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -341,7 +341,7 @@ static void __init relocate_initrd(void)
memblock_reserve(ramdisk_here, area_size);
initrd_start = ramdisk_here + PAGE_OFFSET;
initrd_end   = initrd_start + ramdisk_size;
-   printk(KERN_INFO "Allocated new RAMDISK: [mem %#010llx-%#010llx]\n",
+   printk(KERN_INFO "Allocated new RAMDISK: [mem %#018llx-%#018llx]\n",
 ramdisk_here, ramdisk_here + ramdisk_size - 1);
 
q = (char *)initrd_start;
@@ -363,8 +363,8 @@ static void __init relocate_initrd(void)
 
ramdisk_image = get_ramdisk_image();
ramdisk_size  = get_ramdisk_size();
-   printk(KERN_INFO "Move RAMDISK from [mem %#010llx-%#010llx] to"
-   " [mem %#010llx-%#010llx]\n",
+   printk(KERN_INFO "Move RAMDISK from [mem %#018llx-%#018llx] to"
+   " [mem %#018llx-%#018llx]\n",
ramdisk_image, ramdisk_image + ramdisk_size - 1,
ramdisk_here, ramdisk_here + ramdisk_size - 1);
 }
@@ -416,7 +416,7 @@ static void __init reserve_initrd(void)
   "disabling initrd (%lld needed, %lld available)\n",
   ramdisk_size, mapped_size>>1);
 
-   printk(KERN_INFO "RAMDISK: [mem %#010llx-%#010llx]\n", ramdisk_image,
+   printk(KERN_INFO "RAMDISK: [mem %#018llx-%#018llx]\n", ramdisk_image,
ramdisk_end - 1);
 
if (pfn_range_is_mapped(PFN_DOWN(ramdisk_image),
-- 
1.7.7


[PATCH 1/9] x86, boot: Move lldt/ltr out of 64bit only path

2012-11-16 Thread Yinghai Lu

commit 08da5a2ca

x86_64: Early segment setup for VT

added lldt/ltr to clean up more segments.

But that code was put in the code64 path, and it uses a gdt that is only
loaded from the code32 path.

That breaks booting with a 64bit bootloader that does not go through the
code32 path, because it has a different gdt.

Move those lines into code32, after the gdt is loaded.

This also pushes startup_64 down to 0x208.

Signed-off-by: Yinghai Lu 
Cc: Zachary Amsden 
Cc: Matt Fleming 
---
 arch/x86/boot/compressed/head_64.S |   11 +++
 1 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index 2c4b171..4c8af67 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -154,6 +154,12 @@ ENTRY(startup_32)
 	btsl	$_EFER_LME, %eax
 	wrmsr
 
+	/* After gdt is loaded */
+	xorl	%eax, %eax
+	lldt	%ax
+	movl	$0x20, %eax
+	ltr	%ax
+
/*
 * Setup for the jump to 64bit mode
 *
@@ -191,7 +197,7 @@ no_longmode:
 * it may change in the future.
 */
.code64
-   .org 0x200
+   .org 0x208
 ENTRY(startup_64)
/*
 * We come here either from startup_32 or directly from a
@@ -247,9 +253,6 @@ preferred_addr:
 	movl	%eax, %ss
 	movl	%eax, %fs
 	movl	%eax, %gs
-	lldt	%ax
-	movl	$0x20, %eax
-	ltr	%ax
 
/*
 * Compute the decompressed kernel start address.  It is where
-- 
1.7.7


[PATCH 9/9] x86: remove 1024g limitation for kexec buffer on 64bit

2012-11-16 Thread Yinghai Lu

We can now find a buffer above 1024g, so remove that obsolete limitation
and use MAXMEM instead.

Tested on a system with more than 1024g of RAM.

Signed-off-by: Yinghai Lu 
Cc: "Eric W. Biederman" 
---
 arch/x86/include/asm/kexec.h |6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
index 317ff17..11bfdc5 100644
--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -48,11 +48,11 @@
 # define vmcore_elf_check_arch_cross(x) ((x)->e_machine == EM_X86_64)
 #else
 /* Maximum physical address we can use pages from */
-# define KEXEC_SOURCE_MEMORY_LIMIT  (0xFFUL)
+# define KEXEC_SOURCE_MEMORY_LIMIT  (MAXMEM-1)
 /* Maximum address we can reach in physical address mode */
-# define KEXEC_DESTINATION_MEMORY_LIMIT (0xFFUL)
+# define KEXEC_DESTINATION_MEMORY_LIMIT (MAXMEM-1)
 /* Maximum address we can use for the control pages */
-# define KEXEC_CONTROL_MEMORY_LIMIT (0xFFUL)
+# define KEXEC_CONTROL_MEMORY_LIMIT (MAXMEM-1)
 
 /* Allocate one page for the pdp and the second for the code */
 # define KEXEC_CONTROL_PAGE_SIZE  (4096UL + 4096UL)
-- 
1.7.7


[PATCH 3/9] x86, 64bit: set extra ident page table for whole kernel range

2012-11-16 Thread Yinghai Lu

Currently, when the kernel is loaded above 1G, only [_text, _text+2M]
is set up with the extra ident page table.
That is not enough: some variables that could be used early are
outside that range (like the gdt...).

Just map [_text, _end] to cover all of text/data/bss/brk...
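
A minimal C sketch of what the new assembly loop does, assuming 2M PMD pages and that [_text, _end] does not cross a 1G boundary (so all entries fit in one level2 page). map_kernel_range() and the flags value used by callers are hypothetical; the real code works on level2_spare_pgt with __PAGE_KERNEL_IDENT_LARGE_EXEC.

```c
#include <stdint.h>

#define PMD_SHIFT     21                    /* 2M pages on x86_64 */
#define PMD_SIZE      (1ULL << PMD_SHIFT)
#define PTRS_PER_PMD  512

/*
 * Identity-map every 2M page that touches [text, end).
 * Like the assembly, the loop body runs at least once and stops after
 * the slot that holds the last byte (end - 1).
 * Returns the number of entries written.
 */
static unsigned int map_kernel_range(uint64_t *pmd, uint64_t text,
                                     uint64_t end, uint64_t flags)
{
    uint64_t phys = text & ~(PMD_SIZE - 1);
    uint64_t idx  = (text >> PMD_SHIFT) & (PTRS_PER_PMD - 1);
    uint64_t last = ((end - 1) >> PMD_SHIFT) & (PTRS_PER_PMD - 1);
    unsigned int n = 0;

    do {
        pmd[idx] = phys | flags;   /* one 2M identity mapping */
        phys += PMD_SIZE;
        idx++;
        n++;
    } while (idx <= last);

    return n;
}
```

For a kernel at 1G+16M that is 20M long, this fills slots 8 through 17, where the old code filled only slot 8.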

Signed-off-by: Yinghai Lu 
Cc: "Eric W. Biederman" 
---
 arch/x86/kernel/head_64.S |   11 ++-
 1 files changed, 10 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 94bf9cc..efc0c08 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -115,7 +115,16 @@ startup_64:
 	andq	$(PTRS_PER_PMD - 1), %rax
 	leaq	__PAGE_KERNEL_IDENT_LARGE_EXEC(%rdi), %rdx
 	leaq	level2_spare_pgt(%rip), %rbx
-	movq	%rdx, 0(%rbx, %rax, 8)
+	leaq	_end(%rip), %r8
+	decq	%r8
+	shrq	$PMD_SHIFT, %r8
+	andq	$(PTRS_PER_PMD - 1), %r8
+1:	movq	%rdx, 0(%rbx, %rax, 8)
+	addq	$PMD_SIZE, %rdx
+	incq	%rax
+	cmp	%r8, %rax
+	jle	1b
+
 ident_complete:
 
/*
-- 
1.7.7


[PATCH 7/9] x86, boot: add field to support load bzImage and ramdisk high

2012-11-16 Thread Yinghai Lu

ext_ramdisk_image/size will record the high 32 bits of the ramdisk
address and size.

code64_start_offset will be set for a bzImage that supports 64bit entry.
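
A small standalone sketch of how the low and high halves are combined, following the get_ramdisk_image() change in this patch. struct hdr_fields is a hypothetical cut-down stand-in for struct setup_header, and ramdisk_image_addr() is an illustrative name.

```c
#include <stdint.h>

/* Hypothetical subset of the setup header fields used by the patch. */
struct hdr_fields {
    uint16_t version;            /* boot protocol version, e.g. 0x020c */
    uint32_t ramdisk_image;      /* low 32 bits of the load address */
    uint32_t ext_ramdisk_image;  /* high 32 bits, protocol 2.12+ only */
};

/*
 * Build the full 64-bit ramdisk address.  The high half is only trusted
 * when the header version says the field exists (>= 0x020c).
 */
static uint64_t ramdisk_image_addr(const struct hdr_fields *h)
{
    uint64_t addr = h->ramdisk_image;

    if (h->version >= 0x020c)
        addr |= (uint64_t)h->ext_ramdisk_image << 32;

    return addr;
}
```

The cast to a 64-bit type before the shift matters: shifting a 32-bit value left by 32 would not produce the intended result.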

Signed-off-by: Yinghai Lu 
Cc: Rob Landley 
Cc: Matt Fleming 
---
 Documentation/x86/boot.txt   |4 
 arch/x86/boot/header.S   |   11 ++-
 arch/x86/include/asm/bootparam.h |3 +++
 arch/x86/kernel/setup.c  |6 ++
 4 files changed, 23 insertions(+), 1 deletions(-)

diff --git a/Documentation/x86/boot.txt b/Documentation/x86/boot.txt
index 9efceff..26c7dcc 100644
--- a/Documentation/x86/boot.txt
+++ b/Documentation/x86/boot.txt
@@ -193,6 +193,10 @@ Offset Proto   NameMeaning
 0258/8 2.10+   pref_addressPreferred loading address
 0260/4 2.10+   init_size   Linear memory required during initialization
 0264/4 2.11+   handover_offset Offset of handover entry point
+0268/4 2.12+   ext_ramdisk_image initrd load address high 32 bits
+   (set by boot loader)
+026C/4 2.12+   ext_ramdisk_size initrd size high 32 bits (set by boot loader)
+0270/4 2.12+   code64_start_offset 64bit start offset for bzImage
 
 (1) For backwards compatibility, if the setup_sects field contains 0, the
 real value is 4.
diff --git a/arch/x86/boot/header.S b/arch/x86/boot/header.S
index 2a01744..85ce1a4 100644
--- a/arch/x86/boot/header.S
+++ b/arch/x86/boot/header.S
@@ -279,7 +279,7 @@ _start:
# Part 2 of the header, from the old setup.S
 
.ascii  "HdrS"  # header signature
-   .word   0x020b  # header version number (>= 0x0105)
+   .word   0x020c  # header version number (>= 0x0105)
# or else old loadlin-1.5 will fail)
.globl realmode_swtch
 realmode_swtch:.word   0, 0# default_switch, SETUPSEG
@@ -400,6 +400,15 @@ init_size: .long INIT_SIZE # kernel initialization size
 handover_offset:   .long 0x30  # offset to the handover
# protocol entry point
 
+ext_ramdisk_image: .long   0   # address of loaded ramdisk image
+   # Here the loader puts the high 32 bits
+   # address where it loaded the image.
+   # This only will be read by the kernel.
+
+ext_ramdisk_size:  .long   0   # its size in bytes high 32 bits.
+code64_start_offset:   .long   BOOT_CODE64_START_OFFSET# startup_64 in
+   # arch/x86/boot/compressed/head_64.S
+
 # End of setup header #
 
.section ".entrytext", "ax"
diff --git a/arch/x86/include/asm/bootparam.h b/arch/x86/include/asm/bootparam.h
index 2ad874c..4e42fd1 100644
--- a/arch/x86/include/asm/bootparam.h
+++ b/arch/x86/include/asm/bootparam.h
@@ -67,6 +67,9 @@ struct setup_header {
__u64   pref_address;
__u32   init_size;
__u32   handover_offset;
+   __u32   ext_ramdisk_image;
+   __u32   ext_ramdisk_size;
+   __u32   code64_start_offset;
 } __attribute__((packed));
 
 struct sys_desc_table {
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 194e151..c2eb535 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -302,12 +302,18 @@ static u64 __init get_ramdisk_image(void)
 {
u64 ramdisk_image = boot_params.hdr.ramdisk_image;
 
+   if (boot_params.hdr.version >= 0x020c)
+   ramdisk_image |= (u64)boot_params.hdr.ext_ramdisk_image << 32;
+
return ramdisk_image;
 }
 static u64 __init get_ramdisk_size(void)
 {
u64 ramdisk_size = boot_params.hdr.ramdisk_size;
 
+   if (boot_params.hdr.version >= 0x020c)
+   ramdisk_size |= (u64)boot_params.hdr.ext_ramdisk_size << 32;
+
return ramdisk_size;
 }
 
-- 
1.7.7


[PATCH 4/9] x86, 64bit: add support for loading kernel above 512G

2012-11-16 Thread Yinghai Lu

Currently the kernel is not allowed to be loaded above 512g; it thinks
that address is too big.

We only need to add one extra spare page for the level3 table needed to
point to another 512g range.

We need to check the _text range, set the level4 pg to point to that spare
level3 page, and set level3 to point to the level2 page to cover
[_text, _end] with the extra mapping.

We need this to put a relocatable bzImage high above 512g.
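
The index arithmetic behind that decision can be sketched in C as follows, assuming 4-level x86_64 paging where each level4 entry covers 512G and each level3 entry covers 1G. The helper names are hypothetical; the patch does the same shifts and masks in assembly.

```c
#include <stdint.h>

#define PUD_SHIFT     30   /* each level3 entry covers 1G */
#define PGDIR_SHIFT   39   /* each level4 entry covers 512G */
#define PTRS_PER_PUD  512
#define PTRS_PER_PGD  512

/* Slot in the level4 table for a physical address. */
static unsigned int pgd_index_of(uint64_t phys)
{
    return (phys >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1);
}

/* Slot in the level3 table for a physical address. */
static unsigned int pud_index_of(uint64_t phys)
{
    return (phys >> PUD_SHIFT) & (PTRS_PER_PUD - 1);
}

/*
 * A non-zero level4 slot means _text lies outside the first 512G, so the
 * existing level3_ident_pgt cannot reach it and the spare level3 page
 * has to be hooked into the level4 table.
 */
static int needs_level3_spare(uint64_t text_phys)
{
    return pgd_index_of(text_phys) != 0;
}
```

For example, a kernel loaded at 600G lands in level4 slot 1 and level3 slot 88, so it takes the spare level3 path; a kernel at 2G stays in level4 slot 0 and only needs the spare level2 page.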

Signed-off-by: Yinghai Lu 
Cc: "Eric W. Biederman" 
---
 arch/x86/kernel/head_64.S |   34 +++---
 1 files changed, 27 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index efc0c08..32fa9d0 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -78,12 +78,6 @@ startup_64:
 	testl	%eax, %eax
 	jnz	bad_address
 
-	/* Is the address too large? */
-	leaq	_text(%rip), %rdx
-	movq	$PGDIR_SIZE, %rax
-	cmpq	%rax, %rdx
-	jae	bad_address
-
/* Fixup the physical addresses in the page table
 */
addq%rbp, init_level4_pgt + 0(%rip)
@@ -102,12 +96,35 @@ startup_64:
 	andq	$PMD_PAGE_MASK, %rdi
 
 	movq	%rdi, %rax
+	shrq	$PGDIR_SHIFT, %rax
+	andq	$(PTRS_PER_PGD - 1), %rax
+	jz	skip_level3_spare
+
+	/* Set level3 at first */
+	leaq	(level3_spare_pgt - __START_KERNEL_map + _KERNPG_TABLE)(%rbp), %rdx
+	leaq	init_level4_pgt(%rip), %rbx
+	movq	%rdx, 0(%rbx, %rax, 8)
+	addq	$L4_PAGE_OFFSET, %rax
+	movq	%rdx, 0(%rbx, %rax, 8)
+
+	/* always need to set level2 */
+	movq	%rdi, %rax
+	shrq	$PUD_SHIFT, %rax
+	andq	$(PTRS_PER_PUD - 1), %rax
+	leaq	level3_spare_pgt(%rip), %rbx
+	jmp	set_level2_spare
+
+skip_level3_spare:
+	movq	%rdi, %rax
 	shrq	$PUD_SHIFT, %rax
 	andq	$(PTRS_PER_PUD - 1), %rax
 	jz	ident_complete
 
-	leaq	(level2_spare_pgt - __START_KERNEL_map + _KERNPG_TABLE)(%rbp), %rdx
+	/* only set level2 with out level3 spare */
 	leaq	level3_ident_pgt(%rip), %rbx
+
+set_level2_spare:
+	leaq	(level2_spare_pgt - __START_KERNEL_map + _KERNPG_TABLE)(%rbp), %rdx
 	movq	%rdx, 0(%rbx, %rax, 8)
 
movq%rdi, %rax
@@ -435,6 +452,9 @@ NEXT_PAGE(level2_kernel_pgt)
PMDS(0, __PAGE_KERNEL_LARGE_EXEC,
KERNEL_IMAGE_SIZE/PMD_SIZE)
 
+NEXT_PAGE(level3_spare_pgt)
+   .fill   512, 8, 0
+
 NEXT_PAGE(level2_spare_pgt)
.fill   512, 8, 0
 
-- 
1.7.7


[PATCH 6/9] x86: add get_ramdisk_image/size

2012-11-16 Thread Yinghai Lu

There are several places that look up the ramdisk early for reserving
and relocating.

Use functions to make the code more readable and consistent.

Later patches will add ext_ramdisk_image/size to those functions to
support loading the ramdisk above 4g.

Signed-off-by: Yinghai Lu 
---
 arch/x86/kernel/setup.c |   29 +
 1 files changed, 21 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index ee6d267..194e151 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -298,12 +298,25 @@ static void __init reserve_brk(void)
 
 #ifdef CONFIG_BLK_DEV_INITRD
 
+static u64 __init get_ramdisk_image(void)
+{
+   u64 ramdisk_image = boot_params.hdr.ramdisk_image;
+
+   return ramdisk_image;
+}
+static u64 __init get_ramdisk_size(void)
+{
+   u64 ramdisk_size = boot_params.hdr.ramdisk_size;
+
+   return ramdisk_size;
+}
+
 #define MAX_MAP_CHUNK  (NR_FIX_BTMAPS << PAGE_SHIFT)
 static void __init relocate_initrd(void)
 {
/* Assume only end is not page aligned */
-   u64 ramdisk_image = boot_params.hdr.ramdisk_image;
-   u64 ramdisk_size  = boot_params.hdr.ramdisk_size;
+   u64 ramdisk_image = get_ramdisk_image();
+   u64 ramdisk_size  = get_ramdisk_size();
u64 area_size = PAGE_ALIGN(ramdisk_size);
u64 ramdisk_here;
unsigned long slop, clen, mapaddr;
@@ -342,8 +355,8 @@ static void __init relocate_initrd(void)
ramdisk_size  -= clen;
}
 
-   ramdisk_image = boot_params.hdr.ramdisk_image;
-   ramdisk_size  = boot_params.hdr.ramdisk_size;
+   ramdisk_image = get_ramdisk_image();
+   ramdisk_size  = get_ramdisk_size();
printk(KERN_INFO "Move RAMDISK from [mem %#010llx-%#010llx] to"
" [mem %#010llx-%#010llx]\n",
ramdisk_image, ramdisk_image + ramdisk_size - 1,
@@ -367,8 +380,8 @@ static u64 __init get_mem_size(unsigned long limit_pfn)
 static void __init early_reserve_initrd(void)
 {
/* Assume only end is not page aligned */
-   u64 ramdisk_image = boot_params.hdr.ramdisk_image;
-   u64 ramdisk_size  = boot_params.hdr.ramdisk_size;
+   u64 ramdisk_image = get_ramdisk_image();
+   u64 ramdisk_size  = get_ramdisk_size();
u64 ramdisk_end   = PAGE_ALIGN(ramdisk_image + ramdisk_size);
 
if (!boot_params.hdr.type_of_loader ||
@@ -380,8 +393,8 @@ static void __init early_reserve_initrd(void)
 static void __init reserve_initrd(void)
 {
/* Assume only end is not page aligned */
-   u64 ramdisk_image = boot_params.hdr.ramdisk_image;
-   u64 ramdisk_size  = boot_params.hdr.ramdisk_size;
+   u64 ramdisk_image = get_ramdisk_image();
+   u64 ramdisk_size  = get_ramdisk_size();
u64 ramdisk_end   = PAGE_ALIGN(ramdisk_image + ramdisk_size);
u64 mapped_size;
 
-- 
1.7.7


[PATCH 0/9] x86, boot, 64bit: Add support for loading ramdisk and bzImage high

2012-11-16 Thread Yinghai Lu

Currently kdump reserved memory is limited to under 896M because of a
kexec limitation, and the bzImage also needs to stay under 4g.

To let kexec/kdump use ranges above 4g, the bzImage and ramdisk must be
loadable above 4g.
During boot the bzImage will be unpacked at the same position and stay high.

The patches add fields to the boot header to
1. get ramdisk position info above 4g from the bootloader/kexec
2. set code64_start_offset in the bzImage header, so that the
   bootloader/kexec load can check it to decide whether to put the
   bzImage high.

These patches were tested with kexec tools with local changes; I will send
the kexec tools changes to the kexec list later.

They can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git for-x86-boot

The series is on top of for-x86-mm.

Thanks

Yinghai

Yinghai Lu (9):
  x86, boot: Move lldt/ltr out of 64bit only path
  x86: Add macro for 64bit entry for bzImage
  x86, 64bit: set extra ident page table for whole kernel range
  x86, 64bit: add support for loading kernel above 512G
  x86: Merge early_reserve_initrd for 32bit and 64bit
  x86: add get_ramdisk_image/size
  x86, boot: add field to support load bzImage and ramdisk high
  x86: ramdisk info print with high bits.
  x86: remove 1024g limitation for kexec buffer on 64bit

 Documentation/x86/boot.txt |4 ++
 arch/x86/boot/compressed/head_64.S |   11 --
 arch/x86/boot/header.S |   11 ++-
 arch/x86/include/asm/boot.h|2 +
 arch/x86/include/asm/bootparam.h   |3 ++
 arch/x86/include/asm/kexec.h   |6 ++--
 arch/x86/kernel/head32.c   |   11 --
 arch/x86/kernel/head64.c   |   11 --
 arch/x86/kernel/head_64.S  |   45 ++-
 arch/x86/kernel/setup.c|   61 +++
 10 files changed, 113 insertions(+), 52 deletions(-)

-- 
1.7.7


[PATCH 2/9] x86: Add macro for 64bit entry for bzImage

2012-11-16 Thread Yinghai Lu

The macro will be used in the bzImage header.

Signed-off-by: Yinghai Lu 
Cc: Matt Fleming 
---
 arch/x86/boot/compressed/head_64.S |2 +-
 arch/x86/include/asm/boot.h|2 ++
 2 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index 4c8af67..0733638 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -197,7 +197,7 @@ no_longmode:
 * it may change in the future.
 */
.code64
-   .org 0x208
+   .org BOOT_CODE64_START_OFFSET
 ENTRY(startup_64)
/*
 * We come here either from startup_32 or directly from a
diff --git a/arch/x86/include/asm/boot.h b/arch/x86/include/asm/boot.h
index b13fe63..bcae405 100644
--- a/arch/x86/include/asm/boot.h
+++ b/arch/x86/include/asm/boot.h
@@ -38,8 +38,10 @@
 
 #ifdef CONFIG_X86_64
 #define BOOT_STACK_SIZE	0x4000
+#define BOOT_CODE64_START_OFFSET 0x208
 #else
 #define BOOT_STACK_SIZE	0x1000
+#define BOOT_CODE64_START_OFFSET 0
 #endif
 
 #endif /* __KERNEL__ */
-- 
1.7.7


[PATCH] powerpc, perf: Change PMU flag values representation from decimal to hex

2012-11-16 Thread Anshuman Khandual

Signed-off-by: Anshuman Khandual 
---
 arch/powerpc/include/asm/perf_event_server.h | 21 -
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/perf_event_server.h b/arch/powerpc/include/asm/perf_event_server.h
index 9710be3..e3f10bb 100644
--- a/arch/powerpc/include/asm/perf_event_server.h
+++ b/arch/powerpc/include/asm/perf_event_server.h
@@ -11,6 +11,7 @@
 
 #include 
 #include 
+#include 
 
 #define MAX_HWEVENTS   8
 #define MAX_EVENT_ALTERNATIVES 8
@@ -45,11 +46,21 @@ struct power_pmu {
 /*
  * Values for power_pmu.flags
  */
-#define PPMU_LIMITED_PMC5_6	1	/* PMC5/6 have limited function */
-#define PPMU_ALT_SIPR		2	/* uses alternate posn for SIPR/HV */
-#define PPMU_NO_SIPR		4	/* no SIPR/HV in MMCRA at all */
-#define PPMU_NO_CONT_SAMPLING	8	/* no continuous sampling */
-#define PPMU_SIAR_VALID		16	/* Processor has SIAR Valid bit */
+
+#define PPMU_LIMITED_PMC5_6 \
+	LONG_ASM_CONST(0x0001) /* PMC5/6 have limited function */
+
+#define PPMU_ALT_SIPR \
+	LONG_ASM_CONST(0x0002) /* uses alternate posn for SIPR/HV */
+
+#define PPMU_NO_SIPR \
+	LONG_ASM_CONST(0x0004) /* no SIPR/HV in MMCRA at all */
+
+#define PPMU_NO_CONT_SAMPLING \
+	LONG_ASM_CONST(0x0008) /* no continuous sampling */
+
+#define PPMU_SIAR_VALID \
+	LONG_ASM_CONST(0x0010) /* Processor has SIAR Valid bit */
 
 /*
  * Values for flags to get_alternatives()
-- 
1.7.11.7


Re: [PATCH] jffs2: Make handling of erase_completion_lock consistent in jffs2_do_reserve_space()

2012-11-16 Thread Artem Bityutskiy

On Mon, 2012-11-05 at 22:40 +0400, Alexey Khoroshilov wrote:
> Users of jffs2_do_reserve_space() expect they still held
> erase_completion_lock after call to it. But there is a path
> where jffs2_do_reserve_space() leaves erase_completion_lock unlocked.
> The patch fixes it.
> 
> Found by Linux Driver Verification project (linuxtesting.org).
> 
> Signed-off-by: Alexey Khoroshilov 

Pushed to l2-mtd.git, thanks!

-- 
Best Regards,
Artem Bityutskiy



RE: [RFC] staging: ste_rmi4: merge into the main kernel tree

2012-11-16 Thread Alexandra Chin

Hi Dmitry,

> > > Please have Greg commit the patch that Henrik reviewed to staging and
> > > then work with Christopher Heiny group on getting the full featured
> > > driver into mainline.

Thanks for the reminder; the final patch has been re-sent to the staging
maintainer.

> In this case you need to enumerate the benefits of this driver over
> unified driver and show why the unified driver can't be fixed.
> 
> Currently the driver is in mainline (even though the directory is
> staging) and nobody will remove it until another driver is fully
> functional and ready for prime time. But once this happens I do not see
> the benefits of maintaining 2 drivers for the same hardware.

I see. Given that the driver will remain in mainline until the generic driver
is ready, I will continue to maintain the driver in staging for now.

Appreciate your response.

Alexandra

Re: [PATCH v8 1/3] Runtime Interpreted Power Sequences

2012-11-16 Thread Srinivas KANDAGATLA

On 16/11/12 08:31, Alex Courbot wrote:
> Hi Srinivas,
>
> On Friday 16 November 2012 15:58:29 Srinivas KANDAGATLA wrote:
>> Hi Alex,
>> I am looking forward for this feature to be mainlined,
> *cough* Ack *cough* :)
:-)
>> but I have a
>> comment on the way the types are tied to the power seq infrastructure.
>> I know your use cases are limited to the types "delay", "pwm", "gpio"
>> and "regulator". However, there are instances where devices can be
>> powered up or reset by writing to special registers or sysconfs or
>> something else.
>> So my suggestion would be to make these types register themselves
>> dynamically with the power_seq infrastructure so that in future this can
>> be extended to other types as well.
>> This trivial change can make a lot of difference for future chips
>> which do things a bit differently.
>> ST Microelectronics chips fit in this category and I guess other
>> vendors have similar chips.
> The current implementation is (purposely) minimal and will certainly be
> extended. There are other aspects of regulators for instance that should also
> be controllable (voltage comes to mind). And I am totally open to supporting
> new kinds of resources as usage broadens. For this first version I just wanted
> to introduce the feature and minimize the impact should anything (DT
> bindings?) need to change.
OK, I agree. I was thinking more of fitting a few things specific to our
chip via power-seqs.
>
> I am a little bit skeptical about the purpose of directly accessing registers
> (or any part of the address space) from power sequences. It should at least be
> possible to involve some kind of abstraction. Not necessarily one of the
> currently supported types - but at least something.
Yes, there is a level of abstraction (aka sysconf) in our case; again,
it is not mainlined yet.
>
> The reason is that I'd like to try and avoid direct references to resources
> within sequences as much as possible to make them reusable. If your system has
> two identical devices, you should not need to duplicate their sequences just
> to change a register range from the few steps that make use of it. If you can
> do the same job with, say, a regulator, you can just give it a name, get it at
> runtime using regulator_get() and define it outside of the sequence, in our
> device node.
>
> Of course there might be scenarios where you really need to access a register
> and there is no way to do otherwise, in this case I am open to discussion. But
> before resorting to this I'd like to make sure that the existing abstraction
> cannot cover the case already.
yep.

thanks,
srini
>
> Alex.
>
>


[PATCH RESEND] power: generic-adc-battery: Fix checking return value of request_any_context_irq

2012-11-16 Thread Axel Lin

On failure, request_any_context_irq() returns a negative value.
On success, it returns either IRQC_IS_HARDIRQ or IRQC_IS_NESTED.
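
To make the difference concrete, here is a small standalone sketch of the two checks. The enum values mirror those in the kernel's <linux/interrupt.h>; old_check_fails() and new_check_fails() are hypothetical helpers that only model the `if` condition in gab_probe().

```c
#include <errno.h>

/* Success codes returned by request_any_context_irq(). */
enum { IRQC_IS_HARDIRQ = 0, IRQC_IS_NESTED = 1 };

/* Old check: "if (ret)" treats any non-zero return as a failure. */
static int old_check_fails(int ret)
{
    return ret != 0;
}

/* Fixed check: "if (ret < 0)" only treats negative errnos as failures. */
static int new_check_fails(int ret)
{
    return ret < 0;
}
```

With the old check, a successful nested-IRQ registration (IRQC_IS_NESTED) would be taken as an error and the probe would bail out.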

Signed-off-by: Axel Lin 
---
This was sent as https://lkml.org/lkml/2012/10/11/77
Nobody has picked up this patch, so I am resending it.
I think this patch is for 3.7.

Regards,
Axel
 drivers/power/generic-adc-battery.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/power/generic-adc-battery.c b/drivers/power/generic-adc-battery.c
index 9bdf444..4a00ef7 100644
--- a/drivers/power/generic-adc-battery.c
+++ b/drivers/power/generic-adc-battery.c
@@ -327,7 +327,7 @@ static int __devinit gab_probe(struct platform_device *pdev)
ret = request_any_context_irq(irq, gab_charged,
IRQF_TRIGGER_RISING | IRQF_TRIGGER_FALLING,
"battery charged", adc_bat);
-   if (ret)
+   if (ret < 0)
goto err_gpio;
}
 
-- 
1.7.9.5




Re: [PATCH RFT 1/2] pinctrl: dove: Fix dove_audio1_ctrl_set when BIT(0|1|2|3) of config is clear

2012-11-16 Thread Axel Lin

2012/11/12 Sebastian Hesselbarth :
> On 11/11/2012 02:44 PM, Axel Lin wrote:
>>
>> Current implementation in dove_audio1_ctrl_set() does not clear the
>> corresponding register bit if BIT(0|1|2|3) of config is clear. Fix it.
>>
>> Signed-off-by: Axel Lin
>
>
> Axel,
>
> thanks for the hint on not clearing the audio1 bits before setting them.
> I will test this tonight and guess you should prepare a real patch for
> this one.

Hi Sebastian,
Does this patch work for you?
May I have your Acked-and-Tested-by if you think this patch is OK?

Thanks,
Axel

Re: [PATCH 08/14] xen: netback: Remove redundant check on unsigned variable

2012-11-16 Thread Ian Campbell

On Fri, 2012-11-16 at 06:50 +, Tushar Behera wrote:
> No need to check whether unsigned variable is less than 0.
> 
> CC: Ian Campbell 
> CC: xen-de...@lists.xensource.com
> CC: net...@vger.kernel.org
> Signed-off-by: Tushar Behera 

Acked-by: Ian Campbell 

Thanks.

> ---
>  drivers/net/xen-netback/netback.c |4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
> index aab8677..515e10c 100644
> --- a/drivers/net/xen-netback/netback.c
> +++ b/drivers/net/xen-netback/netback.c
> @@ -190,14 +190,14 @@ static int get_page_ext(struct page *pg,
>  
>   group = ext.e.group - 1;
>  
> - if (group < 0 || group >= xen_netbk_group_nr)
> + if (group >= xen_netbk_group_nr)
>   return 0;
>  
>   netbk = &xen_netbk[group];
>  
>   idx = ext.e.idx;
>  
> - if ((idx < 0) || (idx >= MAX_PENDING_REQS))
> + if (idx >= MAX_PENDING_REQS)
>   return 0;
>  
>   if (netbk->mmap_pages[idx] != pg)
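
For readers wondering why dropping the "< 0" test is safe, a minimal standalone sketch, assuming `group` is an unsigned int as the patch description states. group_is_valid() is a hypothetical helper that models the check in get_page_ext().

```c
/*
 * Same shape as the patched check: `group` is unsigned, so "group < 0"
 * can never be true.  If the stored value is 0, the subtraction wraps to
 * the largest unsigned value, which the upper-bound test already rejects.
 */
static int group_is_valid(unsigned int stored, unsigned int group_nr)
{
    unsigned int group = stored - 1;   /* stored == 0 wraps around */

    return group < group_nr;
}
```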



[PATCH RESEND] misc: apds9802als: Fix the logic checking timeout in als_wait_for_data_ready()

2012-11-16 Thread Axel Lin

When waiting for data ready times out, the retry variable is -1 on loop exit.
This also fixes a bug: the current code returns -ETIMEDOUT if the last retry
succeeds (which means retry is 0 when exiting the while loop).
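
The loop behaviour can be modelled in plain C as below. poll_ready() is a hypothetical stand-in for the driver loop: `ready_at` simulates the attempt on which the device reports data ready, and the initial retry count is whatever the caller passes in.

```c
/*
 * Model of the do/while loop in als_wait_for_data_ready().
 * ready_at: 1-based attempt on which the simulated device becomes ready,
 *           or 0 if it never does.
 * Returns the final value of `retry` after the loop exits.
 */
static int poll_ready(int ready_at, int retry)
{
    int attempt = 0;
    int ready;

    do {
        attempt++;
        ready = (ready_at != 0 && attempt >= ready_at);
    } while (!ready && retry--);   /* post-decrement, as in the driver */

    return retry;
}
```

Starting from retry = 10, success on the 11th and last attempt leaves retry at 0, while a real timeout leaves it at -1, so only "retry < 0" distinguishes the two.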

Signed-off-by: Axel Lin 
---
Resend to Greg's correct email address.

 drivers/misc/apds9802als.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/misc/apds9802als.c b/drivers/misc/apds9802als.c
index 0314773..94923d2 100644
--- a/drivers/misc/apds9802als.c
+++ b/drivers/misc/apds9802als.c
@@ -68,7 +68,7 @@ static int als_wait_for_data_ready(struct device *dev)
ret = i2c_smbus_read_byte_data(client, 0x86);
} while (!(ret & 0x80) && retry--);
 
-   if (!retry) {
+   if (retry < 0) {
dev_warn(dev, "timeout waiting for data ready\n");
return -ETIMEDOUT;
}
-- 
1.7.9.5




[PATCH]cputime: make bool type for steal ticks

2012-11-16 Thread liguang

Signed-off-by: liguang 
---
 kernel/sched/cputime.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 81b763b..d2c24c1 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -282,7 +282,7 @@ static __always_inline bool steal_account_process_tick(void)
this_rq()->prev_steal_time += st * TICK_NSEC;
 
account_steal_time(st);
-   return st;
+   return !!st;
}
 #endif
return false;
-- 
1.7.1


Re: [PATCH 11/14] HID: hiddev: Remove redundant check on unsigned variable

2012-11-16 Thread Jiri Kosina

On Fri, 16 Nov 2012, Tushar Behera wrote:

> No need to check whether unsigned variable is less than 0.
> 
> CC: Jiri Kosina 
> CC: linux-...@vger.kernel.org
> CC: linux-in...@vger.kernel.org
> Signed-off-by: Tushar Behera 
> ---
>  drivers/hid/usbhid/hiddev.c |2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/hid/usbhid/hiddev.c b/drivers/hid/usbhid/hiddev.c
> index 14599e2..711c965 100644
> --- a/drivers/hid/usbhid/hiddev.c
> +++ b/drivers/hid/usbhid/hiddev.c
> @@ -625,7 +625,7 @@ static long hiddev_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
>   break;
>  
>   case HIDIOCAPPLICATION:
> - if (arg < 0 || arg >= hid->maxapplication)
> + if (arg >= hid->maxapplication)
>   break;
>  
>   for (i = 0; i < hid->maxcollection; i++)

Applied, thanks.

-- 
Jiri Kosina
SUSE Labs

Re: [RFC v3 0/3] vmpressure_fd: Linux VM pressure notifications

2012-11-16 Thread Glauber Costa

On 11/16/2012 01:25 AM, David Rientjes wrote:
> On Thu, 15 Nov 2012, Anton Vorontsov wrote:
> 
>> Hehe, you're saying that we have to have cgroups=y. :) But some folks were
>> deliberately asking us to make the cgroups optional.
>>
> 
> Enabling just CONFIG_CGROUPS (which is enabled by default) and no other 
> current cgroups increases the size of the kernel text by less than 0.3% 
> with x86_64 defconfig:
> 
>     text    data     bss      dec    hex filename
> 10330039 1038912 1118208 12487159 be89f7 vmlinux.disabled
> 10360993 1041624 1122304 12524921 bf1d79 vmlinux.enabled
> 
> I understand that users with minimally-enabled configs for an optimized 
> memory footprint will have a higher percentage because their kernel is 
> already smaller (~1.8% increase for allnoconfig), but I think the cost of 
> enabling the cgroups code to be able to mount a vmpressure cgroup (which 
> I'd rename to be "mempressure" to be consistent with "memcg" but it's only 
> an opinion) is relatively small and allows for a much more maintainable 
> and extendable feature to be included: it already provides the 
> cgroup.event_control interface that supports eventfd that makes 
> implementation much easier.  It also makes writing a library on top of the 
> cgroup to be much easier because of the standardization.
> 
> I'm more concerned about what to do with the memcg memory thresholds and 
> whether they can be replaced with this new cgroup.  If so, then we'll have 
> to figure out how to map those triggers to use the new cgroup's interface 
> in a way that doesn't break current users that open and pass the fd of 
> memory.usage_in_bytes to cgroup.event_control for memcg.
> 
>> OK, here is what I can try to do:
>>
>> - Implement memory pressure cgroup as you described, by doing so we'd make
>>   the thing play well with cpusets and memcg;
>>
>> - This will be eventfd()-based;
>>
> 
> Should be based on cgroup.event_control, see how memcg interfaces its 
> memory thresholds with this in Documentation/cgroups/memory.txt.
> 
>> - Once done, we will have a solution for pretty much every major use-case
>>   (i.e. servers, desktops and Android, they all have cgroups enabled);
>>
> 
> Excellent!  I'd be interested in hearing anybody else's opinions, 
> especially those from the memcg world, so we make sure that everybody is 
> happy with the API that you've described.
> 
Just CC'd them all.

My personal take:

Most people hate memcg due to the cost it imposes. I've already
demonstrated that with some effort, it doesn't necessarily have to be
so. (http://lwn.net/Articles/517634/)

The one thing I missed in that work was precisely notifications. If you
can come up with a good notification scheme that *lives* in memcg, but
does not *depend* on the memcg infrastructure, I personally think it
could be a big win.

Doing this in memcg has the advantage that the "per-group" vs "global"
is automatically solved, since the root memcg is just another name for
"global".

I honestly like your low/high/oom scheme better than memcg's
"threshold-in-bytes". I would also point out that those thresholds are
*far* from exact, due to the stock charging mechanism, and can be wrong
by as much as O(#cpus). So far, nobody complained. So in theory it
should be possible to convert memcg to low/high/oom, while still
accepting writes in bytes, that would be thrown in the closest bucket.
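
A rough sketch of what "thrown in the closest bucket" could mean, assuming each level is represented by a usage value in bytes. The enum, the function name and the idea of a per-level byte table are all assumptions made for this illustration; nothing here is taken from memcg or from the vmpressure patches:

```c
#include <stdint.h>

/* Hypothetical pressure levels, named after the low/high/oom scheme. */
enum pressure_level { PRESSURE_LOW, PRESSURE_HIGH, PRESSURE_OOM, PRESSURE_NR };

/*
 * Map a threshold written in bytes to the level whose representative
 * byte value is nearest. bucket_bytes[i] is an assumed usage figure for
 * level i. Ties go to the lower level because the comparison is strict.
 */
enum pressure_level closest_bucket(uint64_t threshold,
				   const uint64_t bucket_bytes[PRESSURE_NR])
{
	enum pressure_level best = PRESSURE_LOW;
	uint64_t best_dist = UINT64_MAX;

	for (int i = 0; i < PRESSURE_NR; i++) {
		uint64_t b = bucket_bytes[i];
		uint64_t dist = threshold > b ? threshold - b : b - threshold;

		if (dist < best_dist) {
			best_dist = dist;
			best = (enum pressure_level)i;
		}
	}
	return best;
}
```

Since the existing thresholds are already inexact, rounding a byte value to the nearest level in this way would lose little precision in practice.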

Another thing from one of your e-mails, that may shift you in the memcg
direction:

"2. The last time I checked, cgroups memory controller did not (and I
guess still does not) not account kernel-owned slabs. I asked several
times why so, but nobody answered."

It should, now, in the latest -mm, although it won't do per-group
reclaim (yet).

I am also failing to see how cpusets would be involved here. I
understand that you may have free memory in terms of size, but still be
further restricted by cpuset. But I also think that having multiple
entry points for this buys us nothing at all. So the choices I see are:

1) If cpuset + memcg are comounted, take this into account when deciding
low / high / oom. This is yet another advantage over the "threshold in
bytes" interface, in which you can transparently take
other issues into account while keeping the interface.

2) If they are not, just ignore this effect.

The fallback in 2) sounds harsh, but I honestly think this is the price
to pay for the insanity of mounting those things in different
hierarchies, and we do have a plan to have all those things eventually
together anyway. If you have two cgroups dealing with memory, and set
them up in orthogonal ways, I really can't see how we can bring sanity
to that. So just admitting and unleashing the insanity may be better, if
it brings up our urge to fix it. It worked for Batman, why wouldn't it
work for us?


Re: [PATCH] tmpfs: fix shmem_getpage_gfp VM_BUG_ON

2012-11-16 Thread Jaegeuk Hanse


On 11/16/2012 03:56 AM, Hugh Dickins wrote:
> Offtopic...
>
> On Thu, 15 Nov 2012, Jaegeuk Hanse wrote:
>> Another question. Why the function shmem_fallocate which you add to kernel
>> need call shmem_getpage?
>
> Because shmem_getpage(_gfp) is where shmem's
> page lookup and allocation complexities are handled.
>
> I assume the question behind your question is: why does shmem actually
> allocate pages for its fallocate, instead of just reserving the space?
>
> I did play with just reserving the space, with more special entries in
> the radix_tree to note the reservations made.  It should be doable for
> the vm_enough_memory and sbinfo->used_blocks reservations.
>
> What absolutely deterred me from taking that path was the mem_cgroup
> case: shmem and swap and memcg are not easy to get working right together,
> and nobody would thank me for complicating memcg just for shmem_fallocate.
>
> By allocating pages, the pre-existing memcg code just works; if we used
> reservations instead, we would have to track their memcg charges in some
> additional new way.  I see no justification for that complication.


Hi Hugh

Some questions about your "shmem/tmpfs: misc and fallocate" patchset.

- Since shmem_setattr can already truncate tmpfs files, why add similar
code to shmem_fallocate? What's the trick?

- In the changelog of the "tmpfs: support fallocate preallocation" patch:
  "Christoph Hellwig: What for exactly?  Please explain why
preallocating on tmpfs would make any sense.
  Kay Sievers: To be able to safely use mmap(), regarding SIGBUS, on
files on the /dev/shm filesystem.  The glibc fallback loop for -ENOSYS
[or -EOPNOTSUPP] on fallocate is just ugly."
  Can shmem/tmpfs fallocate prevent the case where one process truncates
a file that a second process has mmap()ed, and the second process then
gets SIGBUS when it accesses the mapping beyond the current file size?
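
The use case Kay Sievers describes can be sketched from userspace roughly as follows. This is only an illustration under assumptions: the function name and the `/tmp` path are made up (the quoted use case is about /dev/shm), and `posix_fallocate()` is used for portability rather than the raw `fallocate()` syscall. It does not address the truncation question above, since another process can still shrink the file afterwards.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Preallocate len bytes, then map and touch them.
 * Returns 0 on success, -1 on any failure.
 */
int prealloc_then_map(size_t len)
{
	char path[] = "/tmp/prealloc-XXXXXX";	/* hypothetical location */
	int fd = mkstemp(path);

	if (fd < 0)
		return -1;
	unlink(path);	/* the file lives on only through fd */

	/*
	 * Reserve the space up front: a shortage is reported here as an
	 * error code instead of as SIGBUS on a later store to the mapping.
	 */
	if (posix_fallocate(fd, 0, (off_t)len) != 0) {
		close(fd);
		return -1;
	}

	char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

	if (p == MAP_FAILED) {
		close(fd);
		return -1;
	}

	memset(p, 0xab, len);	/* touches every page of the mapping */
	int ok = (unsigned char)p[len - 1] == 0xab;

	munmap(p, len);
	close(fd);
	return ok ? 0 : -1;
}
```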


Regards,
Jaegeuk


> Hugh



Re: [PATCH] scsi: prevent stack buffer overflow in host_reset

2012-11-16 Thread Rolf Eike Beer

Am Donnerstag 15 November 2012, 15:51:46 schrieb Sasha Levin:
> store_host_reset() has tried to re-invent the wheel to compare sysfs
> strings. Unfortunately it did so poorly and never bothered to check the
> input from userspace before overwriting stack with it, so something simple
> as:
> 
> echo "WoopsieWoopsie" >
> /sys/devices/pseudo_0/adapter0/host0/scsi_host/host0/host_reset
> 
> would result in:
> 
> [  316.310101] Kernel panic - not syncing: stack-protector: Kernel stack is
> corrupted in: 81f5bac7 [  316.310101]
> [  316.320051] Pid: 6655, comm: sh Tainted: GW   
> 3.7.0-rc5-next-20121114-sasha-00016-g5c9d68d-dirty #129 [  316.320051] Call
> Trace:
> [  316.340058] pps pps0: PPS event at 1352918752.620355751
> [  316.340062] pps pps0: capture assert seq #303
> [  316.320051]  [] panic+0xcd/0x1f4
> [  316.320051]  [] ? store_host_reset+0xd7/0x100
> [  316.320051]  [] __stack_chk_fail+0x16/0x20
> [  316.320051]  [] store_host_reset+0xd7/0x100
> [  316.320051]  [] dev_attr_store+0x13/0x30
> [  316.320051]  [] sysfs_write_file+0x101/0x170
> [  316.320051]  [] vfs_write+0xb8/0x180
> [  316.320051]  [] sys_write+0x50/0xa0
> [  316.320051]  [] tracesys+0xe1/0xe6
> 
> Fix this by uninventing whatever was going on there and just use
> sysfs_streq.
> 
> Bug introduced by 29443691 ("[SCSI] scsi: Added support for adapter and
> firmware reset").
> 
> Signed-off-by: Sasha Levin 

That revision is in 3.2 and all following, so I think this needs to go into 
stable, too.

Eike
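
For readers unfamiliar with the fix, here is a userspace model of the comparison that `sysfs_streq` performs: two strings are equal if they match exactly or differ only by one trailing newline. The function below is a stand-in written for this note, not the kernel source, and the `"adapter"` / `"firmware"` keywords and the `reset_type` enum are assumptions about what `store_host_reset()` accepts.

```c
#include <stdbool.h>

/*
 * Model of sysfs_streq() semantics: equal if identical, or if the only
 * difference is a single trailing '\n' on one side. Nothing is copied,
 * so the length of the userspace input cannot overflow a buffer.
 */
bool streq_sysfs_model(const char *s1, const char *s2)
{
	while (*s1 && *s1 == *s2) {
		s1++;
		s2++;
	}
	if (*s1 == *s2)
		return true;
	if (!*s1 && *s2 == '\n' && !s2[1])
		return true;
	if (*s1 == '\n' && !s1[1] && !*s2)
		return true;
	return false;
}

/* Hypothetical result type for the sketch. */
enum reset_type { RESET_INVALID, RESET_ADAPTER, RESET_FIRMWARE };

/* Hypothetical parser: compares in place instead of copying to the stack. */
enum reset_type parse_reset(const char *buf)
{
	if (streq_sysfs_model(buf, "adapter"))
		return RESET_ADAPTER;
	if (streq_sysfs_model(buf, "firmware"))
		return RESET_FIRMWARE;
	return RESET_INVALID;
}
```

With this shape, an over-long write such as the "WoopsieWoopsie" example is simply rejected as an unknown keyword.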


Re: [PATCH RESEND 3/4] pinctrl: at91: Prevent NULL dereference if of_match_device returns NULL

2012-11-16 Thread Jean-Christophe PLAGNIOL-VILLARD

On 09:36 Fri 16 Nov , Nicolas Ferre wrote:
> On 11/15/2012 11:00 AM, Jean-Christophe PLAGNIOL-VILLARD :
> > On 12:58 Thu 15 Nov , Axel Lin wrote:
> >> of_match_device() may return NULL.
> > this is not possible on at91
> > 
> > and I do a oups here as if we have a NULL pointer which means the driver is
> > wrong
> 
> Well, okay, but it does not prevent from adding a supplementary check to
> mimic every other pinctrl driver and use a common return path to the
> of_match_device() function.
> 
> So I am not completely against this patch (that I why I added my Acked-by).
With the check it's only a silent warning; with an oops it's clear we have a bug.

Best Regards,
J.
> 
> 
> >> Signed-off-by: Axel Lin 
> >> ---
> >> This resend CC Jean-Christophe.
> >>
> >>  drivers/pinctrl/pinctrl-at91.c |   17 +
> >>  1 file changed, 13 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/drivers/pinctrl/pinctrl-at91.c 
> >> b/drivers/pinctrl/pinctrl-at91.c
> >> index 8490a55..32006c8 100644
> >> --- a/drivers/pinctrl/pinctrl-at91.c
> >> +++ b/drivers/pinctrl/pinctrl-at91.c
> >> @@ -829,13 +829,18 @@ static int __devinit at91_pinctrl_probe_dt(struct 
> >> platform_device *pdev,
> >>uint32_t *tmp;
> >>struct device_node *np = pdev->dev.of_node;
> >>struct device_node *child;
> >> +  const struct of_device_id *match;
> >>  
> >>if (!np)
> >>return -ENODEV;
> >>  
> >> +  match = of_match_device(at91_pinctrl_of_match, &pdev->dev)->data;
> >> +  if (!match)
> >> +  return -ENODEV;
> >> +
> >>info->dev = &pdev->dev;
> >> -  info->ops = (struct at91_pinctrl_mux_ops*)
> >> -  of_match_device(at91_pinctrl_of_match, &pdev->dev)->data;
> >> +  info->ops = (struct at91_pinctrl_mux_ops *) match->data;
> >> +
> >>at91_pinctrl_child_count(info, np);
> >>  
> >>if (info->nbanks < 1) {
> >> @@ -1359,6 +1364,7 @@ static struct of_device_id at91_gpio_of_match[] 
> >> __devinitdata = {
> >>  static int __devinit at91_gpio_probe(struct platform_device *pdev)
> >>  {
> >>struct device_node *np = pdev->dev.of_node;
> >> +  const struct of_device_id *match;
> >>struct resource *res;
> >>struct at91_gpio_chip *at91_chip = NULL;
> >>struct gpio_chip *chip;
> >> @@ -1399,8 +1405,11 @@ static int __devinit at91_gpio_probe(struct 
> >> platform_device *pdev)
> >>goto err;
> >>}
> >>  
> >> -  at91_chip->ops = (struct at91_pinctrl_mux_ops*)
> >> -  of_match_device(at91_gpio_of_match, &pdev->dev)->data;
> >> +  match = of_match_device(at91_gpio_of_match, &pdev->dev);
> >> +  if (!match)
> >> +  return -ENODEV;
> >> +
> >> +  at91_chip->ops = (struct at91_pinctrl_mux_ops *) match->data;
> >>at91_chip->pioc_virq = irq;
> >>at91_chip->pioc_idx = alias_idx;
> >>  
> >> -- 
> >> 1.7.9.5
> >>
> >>
> >>
> > 
> > ___
> > linux-arm-kernel mailing list
> > linux-arm-ker...@lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> > 
> > 
> 
> 
> -- 
> Nicolas Ferre
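
Whichever way the argument above is settled, the order of operations matters if the check is to do anything: as quoted, the first hunk still applies `->data` to the return value of of_match_device() before the `if (!match)` test, whereas the second hunk tests the pointer first. A userspace sketch of the second form, using stand-in types (every `_model` name is invented here and is not a kernel API):

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for struct of_device_id, reduced to the two fields used here. */
struct of_device_id_model {
	const char *compatible;
	const void *data;
};

/*
 * Stand-in for of_match_device(): returns the matching entry, or NULL.
 * The table is terminated by an entry whose compatible string is NULL.
 */
const struct of_device_id_model *
match_device_model(const struct of_device_id_model *table, const char *compat)
{
	for (; table->compatible; table++)
		if (strcmp(table->compatible, compat) == 0)
			return table;
	return NULL;
}

/* Probe-like helper: test the match pointer first, then read ->data. */
int probe_model(const struct of_device_id_model *table, const char *compat,
		const void **ops)
{
	const struct of_device_id_model *match;

	match = match_device_model(table, compat);
	if (!match)
		return -ENODEV;

	*ops = match->data;
	return 0;
}
```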

Re: [PATCH v8 2/3] pwm_backlight: use power sequences

2012-11-16 Thread Anton Vorontsov

On Fri, Nov 16, 2012 at 09:49:58AM +0100, Thierry Reding wrote:
> On Fri, Nov 16, 2012 at 03:38:22PM +0900, Alexandre Courbot wrote:
> > Make use of the power sequences specified in the device tree or platform
> > data to control how the backlight is powered on and off.
> > 
> > Signed-off-by: Alexandre Courbot 
> > Reviewed-by: Stephen Warren 
> 
> Anton,
> 
> I would normally take pwm-backlight patches through the PWM tree, but
> since this patch depends on the first in this series, I think it would
> be easier if you took it through your tree along with the rest of the
> series. If that's okay with you:
> 
> Acked-by: Thierry Reding 

Sure, I'll take it. Thank you!

Re: [PATCH v8 1/3] Runtime Interpreted Power Sequences

2012-11-16 Thread Alex Courbot


On 11/16/2012 04:26 PM, Anton Vorontsov wrote:
>> +#include "power_seq_delay.c"
>> +#include "power_seq_regulator.c"
>> +#include "power_seq_pwm.c"
>> +#include "power_seq_gpio.c"
>
> This is odd, although I remember you already explained why you have to
> include the .c files, instead of linking them separately. But I forgot the
> reason. :) I think this deserves a comment in the code.


This is because of the table right after these includes:

static const struct power_seq_res_ops power_seq_ops[POWER_SEQ_NUM_TYPES] = {
[POWER_SEQ_DELAY] = POWER_SEQ_DELAY_TYPE,
[POWER_SEQ_REGULATOR] = POWER_SEQ_REGULATOR_TYPE,
[POWER_SEQ_PWM] = POWER_SEQ_PWM_TYPE,
[POWER_SEQ_GPIO] = POWER_SEQ_GPIO_TYPE,
};

The POWER_SEQ_*_TYPE macros are defined in the C files. It's the 
simplest way to initialize this table, and the code inside these C files 
is short and simple enough that I thought I would be forgiven. :)
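
One possible reading of why the table forces the includes: if each POWER_SEQ_*_TYPE macro expands to an initializer that names static functions, the table can only be built in a translation unit that sees those functions. A reduced, self-contained sketch of that pattern (all `seq_*` and `SEQ_*` names are invented for this note; the real macros and ops are in the patch):

```c
/* Hypothetical resource types; the last entry sizes the table. */
enum seq_type { SEQ_DELAY, SEQ_GPIO, SEQ_NUM_TYPES };

struct seq_ops {
	const char *name;
	int (*run)(int arg);
};

/* In the patch, a pair like this would sit in its own included .c file. */
static int delay_run(int arg)
{
	return arg;	/* stand-in for "wait arg microseconds" */
}
#define SEQ_DELAY_TYPE { .name = "delay", .run = delay_run }

static int gpio_run(int arg)
{
	return arg ? 1 : 0;	/* stand-in for "set the line high or low" */
}
#define SEQ_GPIO_TYPE { .name = "gpio", .run = gpio_run }

/* Designated initializers keep each entry tied to its enum value. */
static const struct seq_ops seq_ops_table[SEQ_NUM_TYPES] = {
	[SEQ_DELAY] = SEQ_DELAY_TYPE,
	[SEQ_GPIO]  = SEQ_GPIO_TYPE,
};

int seq_run(enum seq_type type, int arg)
{
	return seq_ops_table[type].run(arg);
}

const char *seq_name(enum seq_type type)
{
	return seq_ops_table[type].name;
}
```

Linking the files separately would require making the per-type functions non-static and declaring them in a header, which is the trade-off being discussed.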


At first everything was in power_seq.c and it was fine, then I thought 
it would be better to move resource support code into their own files, 
and now everybody is asking. :P


But yeah, maybe it would be even better to not stop halfway and use 
dynamic linking.


Comment added for the time being. ;)


>> +static int of_power_seq_parse_step(struct device *dev,
>> +  struct device_node *node,
>> +  struct power_seq *seq,
>> +  unsigned int step_nbr,
>> +  struct list_head *resources)
>> +{
>> +   struct power_seq_step *step = &seq->steps[step_nbr];
>> +   struct power_seq_resource res, *res2;
>> +   const char *type;
>> +   int i, err;
>
> nit: one variable declaration per line.


Fair enough - but is that a convention? checkpatch.pl was happy with these.

Thanks,
Alex.



[PATCH V2 0/3] Add hardware I/O coherency support for Armada 370/XP

2012-11-16 Thread Gregory CLEMENT

The purpose of this patch set is to add hardware I/O coherency support
for Armada 370 and Armada XP. These SoCs come with a unit called the
coherency fabric. Initial support for this unit was introduced with the
SMP patch set. This series extends that support: the coherency fabric
unit allows the Armada XP and the Armada 370 to be used as nearly
coherent architectures.

The first patch exports the DMA operation functions needed to register
our own set of DMA ops.

The second patch introduces a new flag for the address decoding
configuration in order to be able to set the memory windows as
shared memory.

The third patch enables this new feature and registers our own set
of DMA ops, to benefit from this hardware enhancement.

This series depends on the SMP patch set (V3 was posted on Monday).

The git branch called HWIOCC-for-3.8-V2 is also available at
https://github.com/MISL-EBU-System-SW/mainline-public.git.

Changelog:
V1 -> V2:
- Rebased onto v3.7-rc5
- Added a new patch to export the dma ops functions
- Renamed the functions to use the more generic name mvebu_hwcc
- Removed the non-SMP case during init
- Fixed spelling and wording issues
- Updated the binding documentation for the coherency fabric

Gregory CLEMENT (3):
  arm: dma mapping: Export dma ops functions
  arm: plat-orion: Add coherency attribute when setup mbus target
  arm: mvebu: Add hardware I/O Coherency support

 .../devicetree/bindings/arm/coherency-fabric.txt   |9 ++-
 arch/arm/boot/dts/armada-370-xp.dtsi   |3 +-
 arch/arm/include/asm/dma-mapping.h |   62 +
 arch/arm/mach-mvebu/addr-map.c |3 +
 arch/arm/mach-mvebu/coherency.c|   73 
 arch/arm/mm/dma-mapping.c  |   36 +++---
 arch/arm/plat-orion/addr-map.c |4 ++
 arch/arm/plat-orion/include/plat/addr-map.h|1 +
 8 files changed, 160 insertions(+), 31 deletions(-)

-- 
1.7.9.5


[PATCH V2 1/3] arm: dma mapping: Export dma ops functions

2012-11-16 Thread Gregory CLEMENT

Expose the DMA operation functions. Until now only the dma_ops
structs as a whole, or some individual DMA operations, were exposed.
This patch exposes all the coherent and non-coherent DMA operations so
that they can be reused when an architecture or driver needs to create
its own set of DMA operations.

Signed-off-by: Gregory CLEMENT 
---
 arch/arm/include/asm/dma-mapping.h |   62 
 arch/arm/mm/dma-mapping.c  |   36 +
 2 files changed, 70 insertions(+), 28 deletions(-)

diff --git a/arch/arm/include/asm/dma-mapping.h 
b/arch/arm/include/asm/dma-mapping.h
index 2300484..f940a10 100644
--- a/arch/arm/include/asm/dma-mapping.h
+++ b/arch/arm/include/asm/dma-mapping.h
@@ -112,6 +112,60 @@ static inline void dma_free_noncoherent(struct device 
*dev, size_t size,
 extern int dma_supported(struct device *dev, u64 mask);
 
 /**
+ * arm_dma_map_page - map a portion of a page for streaming DMA
+ * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices
+ * @page: page that buffer resides in
+ * @offset: offset into page for start of buffer
+ * @size: size of buffer to map
+ * @dir: DMA transfer direction
+ *
+ * Ensure that any data held in the cache is appropriately discarded
+ * or written back.
+ *
+ * The device owns this memory once this call has completed.  The CPU
+ * can regain ownership by calling dma_unmap_page().
+ */
+extern dma_addr_t arm_dma_map_page(struct device *dev, struct page *page,
+   unsigned long offset, size_t size,
+   enum dma_data_direction dir,
+   struct dma_attrs *attrs);
+
+extern dma_addr_t arm_coherent_dma_map_page(struct device *dev,
+   struct page *page,
+   unsigned long offset, size_t size,
+   enum dma_data_direction dir,
+   struct dma_attrs *attrs);
+
+/**
+ * arm_dma_unmap_page - unmap a buffer previously mapped through dma_map_page()
+ * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices
+ * @handle: DMA address of buffer
+ * @size: size of buffer (same as passed to dma_map_page)
+ * @dir: DMA transfer direction (same as passed to dma_map_page)
+ *
+ * Unmap a page streaming mode DMA translation.  The handle and size
+ * must match what was provided in the previous dma_map_page() call.
+ * All other usages are undefined.
+ *
+ * After this call, reads by the CPU to the buffer are guaranteed to see
+ * whatever the device wrote there.
+ */
+extern void arm_dma_unmap_page(struct device *dev, dma_addr_t handle,
+   size_t size, enum dma_data_direction dir,
+   struct dma_attrs *attrs);
+
+extern void arm_dma_sync_single_for_cpu(struct device *dev,
+   dma_addr_t handle, size_t size,
+   enum dma_data_direction dir);
+
+extern void arm_dma_sync_single_for_device(struct device *dev,
+   dma_addr_t handle, size_t size,
+   enum dma_data_direction dir);
+
+extern int arm_dma_set_mask(struct device *dev, u64 dma_mask);
+
+
+/**
  * arm_dma_alloc - allocate consistent memory for DMA
  * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices
  * @size: required memory size
@@ -125,6 +179,10 @@ extern int dma_supported(struct device *dev, u64 mask);
 extern void *arm_dma_alloc(struct device *dev, size_t size, dma_addr_t *handle,
   gfp_t gfp, struct dma_attrs *attrs);
 
+extern void *arm_coherent_dma_alloc(struct device *dev, size_t size,
+   dma_addr_t *handle, gfp_t gfp,
+   struct dma_attrs *attrs);
+
 #define dma_alloc_coherent(d, s, h, f) dma_alloc_attrs(d, s, h, f, NULL)
 
 static inline void *dma_alloc_attrs(struct device *dev, size_t size,
@@ -157,6 +215,10 @@ static inline void *dma_alloc_attrs(struct device *dev, 
size_t size,
 extern void arm_dma_free(struct device *dev, size_t size, void *cpu_addr,
 dma_addr_t handle, struct dma_attrs *attrs);
 
+extern void arm_coherent_dma_free(struct device *dev, size_t size,
+   void *cpu_addr, dma_addr_t handle,
+   struct dma_attrs *attrs);
+
 #define dma_free_coherent(d, s, c, h) dma_free_attrs(d, s, c, h, NULL)
 
 static inline void dma_free_attrs(struct device *dev, size_t size,
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 58bc3e4..5b60ee6 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -56,20 +56,13 @@ static void __dma_page_dev_to_cpu(struct page *, unsigned 
long,
size_t, enum dma_data_direction);
 
 /**
- * arm_dma_map_page - map a porti

[PATCH V2 3/3] arm: mvebu: Add hardware I/O Coherency support

2012-11-16 Thread Gregory CLEMENT

Armada 370 and XP come with a unit called the coherency fabric. This unit
allows the Armada 370/XP to be used as a nearly coherent architecture. The
coherency mechanism uses snoop filters to ensure coherency between
caches, DRAM and devices. This mechanism needs a synchronization
barrier which guarantees that all the memory writes initiated by the
devices have reached their target and do not reside in intermediate
write buffers. That's why the architecture is not totally coherent and
we need to provide our own functions for some DMA operations.

Besides using the coherency fabric, the device units have to
set the attribute flag of the decoding address window to select the
appropriate coherency process for the memory transaction. This is done
when each device driver programs the DRAM address windows. The value of the
attribute set by the driver is retrieved through the
orion_addr_map_cfg struct filled in during the early initialization of
the platform.

Signed-off-by: Gregory CLEMENT 
Reviewed-by: Yehuda Yitschak 
---
 .../devicetree/bindings/arm/coherency-fabric.txt   |9 ++-
 arch/arm/boot/dts/armada-370-xp.dtsi   |3 +-
 arch/arm/mach-mvebu/addr-map.c |3 +
 arch/arm/mach-mvebu/coherency.c|   73 
 4 files changed, 85 insertions(+), 3 deletions(-)

diff --git a/Documentation/devicetree/bindings/arm/coherency-fabric.txt 
b/Documentation/devicetree/bindings/arm/coherency-fabric.txt
index 2bfbf67..17d8cd1 100644
--- a/Documentation/devicetree/bindings/arm/coherency-fabric.txt
+++ b/Documentation/devicetree/bindings/arm/coherency-fabric.txt
@@ -5,12 +5,17 @@ Available on Marvell SOCs: Armada 370 and Armada XP
 Required properties:
 
 - compatible: "marvell,coherency-fabric"
-- reg: Should contain,coherency fabric registers location and length.
+
+- reg: Should contain coherency fabric registers location and
+  length. First pair for the coherency fabric registers, second pair
+  for the per-CPU fabric registers registers.
 
 Example:
 
 coherency-fabric@d0020200 {
compatible = "marvell,coherency-fabric";
-   reg = <0xd0020200 0xb0>;
+   reg = <0xd0020200 0xb0>,
+   <0xd0021810 0x1c>;
+
 };
 
diff --git a/arch/arm/boot/dts/armada-370-xp.dtsi 
b/arch/arm/boot/dts/armada-370-xp.dtsi
index b0d075b..98a6b26 100644
--- a/arch/arm/boot/dts/armada-370-xp.dtsi
+++ b/arch/arm/boot/dts/armada-370-xp.dtsi
@@ -38,7 +38,8 @@
 
coherency-fabric@d0020200 {
compatible = "marvell,coherency-fabric";
-   reg = <0xd0020200 0xb0>;
+   reg = <0xd0020200 0xb0>,
+ <0xd0021810 0x1c>;
};
 
soc {
diff --git a/arch/arm/mach-mvebu/addr-map.c b/arch/arm/mach-mvebu/addr-map.c
index fe454a4..595f6b7 100644
--- a/arch/arm/mach-mvebu/addr-map.c
+++ b/arch/arm/mach-mvebu/addr-map.c
@@ -108,6 +108,9 @@ static int __init armada_setup_cpu_mbus(void)
 
addr_map_cfg.bridge_virt_base = mbus_unit_addr_decoding_base;
 
+   if (of_find_compatible_node(NULL, NULL, "marvell,coherency-fabric"))
+   addr_map_cfg.hw_io_coherency = 1;
+
/*
 * Disable, clear and configure windows.
 */
diff --git a/arch/arm/mach-mvebu/coherency.c b/arch/arm/mach-mvebu/coherency.c
index 20a0ccc..153fcfa 100644
--- a/arch/arm/mach-mvebu/coherency.c
+++ b/arch/arm/mach-mvebu/coherency.c
@@ -22,6 +22,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 #include 
 #include "armada-370-xp.h"
 
@@ -32,11 +34,14 @@
  * value matching its virtual mapping
  */
 static void __iomem *coherency_base = ARMADA_370_XP_REGS_VIRT_BASE + 0x20200;
+static void __iomem *coherency_cpu_base;
 
 /* Coherency fabric registers */
 #define COHERENCY_FABRIC_CTL_OFFSET   0x0
 #define COHERENCY_FABRIC_CFG_OFFSET   0x4
 
+#define IO_SYNC_BARRIER_CTL_OFFSET0x0
+
 static struct of_device_id of_coherency_table[] = {
{.compatible = "marvell,coherency-fabric"},
{ /* end of list */ },
@@ -75,6 +80,70 @@ int set_cpu_coherent(unsigned int hw_cpu_id, int 
smp_group_id)
return 0;
 }
 
+static inline void mvebu_hwcc_sync_io_barrier(void)
+{
+   writel(0x1, coherency_cpu_base + IO_SYNC_BARRIER_CTL_OFFSET);
+   while (readl(coherency_cpu_base + IO_SYNC_BARRIER_CTL_OFFSET) & 0x1);
+}
+
+static dma_addr_t mvebu_hwcc_dma_map_page(struct device *dev, struct page 
*page,
+ unsigned long offset, size_t size,
+ enum dma_data_direction dir,
+ struct dma_attrs *attrs)
+{
+   if (dir != DMA_TO_DEVICE)
+   mvebu_hwcc_sync_io_barrier();
+   return pfn_to_dma(dev, page_to_pfn(page)) + offset;
+}
+
+
+static void mvebu_hwcc_dma_unmap_page(struct device *dev, dma_addr_t 
dma_handle,
+ size_t size, enum dma_data_direction dir,
+ struct dma_attrs *attrs)
+{
+

[PATCH V2 2/3] arm: plat-orion: Add coherency attribute when setup mbus target

2012-11-16 Thread Gregory CLEMENT

Recent SoCs such as the Armada 370/XP can handle I/O coherency in
hardware. In this case the transaction attribute of the window must be
flagged as "Shared transaction". Once this flag is set, transactions are
forced through the coherency block; otherwise they are driven
directly to DRAM.

Signed-off-by: Gregory CLEMENT 
Reviewed-by: Yehuda Yitschak 
Acked-by: Thomas Petazzoni 
---
 arch/arm/plat-orion/addr-map.c  |4 
 arch/arm/plat-orion/include/plat/addr-map.h |1 +
 2 files changed, 5 insertions(+)

diff --git a/arch/arm/plat-orion/addr-map.c b/arch/arm/plat-orion/addr-map.c
index a7b8060..febe386 100644
--- a/arch/arm/plat-orion/addr-map.c
+++ b/arch/arm/plat-orion/addr-map.c
@@ -42,6 +42,8 @@ EXPORT_SYMBOL_GPL(mv_mbus_dram_info);
 #define WIN_REMAP_LO_OFF   0x0008
 #define WIN_REMAP_HI_OFF   0x000c
 
+#define ATTR_HW_COHERENCY  (0x1 << 4)
+
 /*
  * Default implementation
  */
@@ -163,6 +165,8 @@ void __init orion_setup_cpu_mbus_target(const struct 
orion_addr_map_cfg *cfg,
w = &orion_mbus_dram_info.cs[cs++];
w->cs_index = i;
w->mbus_attr = 0xf & ~(1 << i);
+   if (cfg->hw_io_coherency)
+   w->mbus_attr |= ATTR_HW_COHERENCY;
w->base = base & 0x;
w->size = (size | 0x) + 1;
}
diff --git a/arch/arm/plat-orion/include/plat/addr-map.h 
b/arch/arm/plat-orion/include/plat/addr-map.h
index ec63e4a..b76c065 100644
--- a/arch/arm/plat-orion/include/plat/addr-map.h
+++ b/arch/arm/plat-orion/include/plat/addr-map.h
@@ -17,6 +17,7 @@ struct orion_addr_map_cfg {
const int num_wins; /* Total number of windows */
const int remappable_wins;
void __iomem *bridge_virt_base;
+   int hw_io_coherency;
 
/* If NULL, the default cpu_win_can_remap will be used, using
   the value in remappable_wins */
-- 
1.7.9.5
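
As a worked example of the attribute arithmetic in the hunk above, the following userspace sketch reproduces the two statements that compute `mbus_attr`. The function name `dram_window_attr` and the `uint8_t` return type are assumptions made for the illustration; only `ATTR_HW_COHERENCY` and the expressions come from the patch.

```c
#include <stdint.h>

#define ATTR_HW_COHERENCY	(0x1 << 4)

/*
 * Mirrors the hunk: clear the bit for chip select cs in the low nibble,
 * then set bit 4 when hardware I/O coherency is enabled.
 */
uint8_t dram_window_attr(unsigned int cs, int hw_io_coherency)
{
	uint8_t attr = 0xf & ~(1 << cs);

	if (hw_io_coherency)
		attr |= ATTR_HW_COHERENCY;
	return attr;
}
```

For chip select 0 this gives 0x0e without coherency and 0x1e with it; for chip select 3 the coherent value is 0x17.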


Re: [PATCH v2] LDT - Linux Driver Template

2012-11-16 Thread Bjørn Mork

Greg KH  writes:

>  Normally you just start with a
> driver for a device like the one you need to write and modify it from
> there.

Yes.

Even if the template driver is fixed up to be the most beautiful driver
ever made, it will still always be made for non-existing hardware.  This
causes two major problems:
 - the driver will not be tested, so it will have bugs
 - the driver will not be used by anyone, so it will not be maintained
   (remember that it is initially perfect, so there is no reason to
   change it)

May I suggest another approach?  How about selecting a set of existing
drivers which are suitable as templates, and put all this effort into
making those drivers *the* perfect examples instead? Start submitting
cleanup patches for the selected drivers until everyone is satisfied and
then document them as starting points for anyone wanting to write a
similar driver.

I believe many subsystem maintainers already have such sample drivers
which they point new submitters to when asked.  That does not mean that
these drivers necessarily are perfect, so there is still work to do here
for anyone interested.  And collecting this information and documenting
it would be useful in itself.

It would also be nice if hardware availability was considered when
selecting the sample drivers.  Buying an already supported device to
experiment with its driver can be useful even if you have another device
you want to write a driver for.  Or just for the learning experience.

Just my € .02

Bjørn

Re: [PATCH RESEND] power: generic-adc-battery: Fix checking return value of request_any_context_irq

2012-11-16 Thread Anton Vorontsov

On Fri, Nov 16, 2012 at 05:09:22PM +0800, Axel Lin wrote:
> On failure, request_any_context_irq() returns a negative value.
> On success, it returns either IRQC_IS_HARDIRQ or IRQC_IS_NESTED.
> 
> Signed-off-by: Axel Lin 
> ---
> This was sent on https://lkml.org/lkml/2012/10/11/77
> Nobody picks up this patch, so resend it.
> I think this patch is for 3.7.

Nah, the driver is new, plus I doubt that anyone actually registers it in
v3.7, and it's definitely not a regression.


The fix itself is a valid one though, I applied it to battery tree.

Thank you!

> Regards,
> Axel
>  drivers/power/generic-adc-battery.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/power/generic-adc-battery.c 
> b/drivers/power/generic-adc-battery.c
> index 9bdf444..4a00ef7 100644
> --- a/drivers/power/generic-adc-battery.c
> +++ b/drivers/power/generic-adc-battery.c
> @@ -327,7 +327,7 @@ static int __devinit gab_probe(struct platform_device 
> *pdev)
>   ret = request_any_context_irq(irq, gab_charged,
>   IRQF_TRIGGER_RISING | IRQF_TRIGGER_FALLING,
>   "battery charged", adc_bat);
> - if (ret)
> + if (ret < 0)
>   goto err_gpio;
>   }
>  
> -- 
> 1.7.9.5
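
To see why `if (ret)` was wrong, recall from the commit message that success can be reported as IRQC_IS_NESTED, which is non-zero. The sketch below models both checks in userspace; the enum values are written out here on the assumption that they match the kernel's (0 and 1), and both function names are invented for the comparison.

```c
#include <errno.h>

/* Assumed to mirror the kernel's values: hardirq is 0, nested is 1. */
enum { IRQC_IS_HARDIRQ = 0, IRQC_IS_NESTED };

/* Check used before the patch: any non-zero return counts as failure. */
int irq_request_failed_old(int ret)
{
	return ret != 0;
}

/* Check used after the patch: only negative returns count as failure. */
int irq_request_failed(int ret)
{
	return ret < 0;
}
```

With the old check, a successfully requested nested interrupt would have sent the probe down the error path.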

[PATCH v1 resend hot_track 00/16] vfs: hot data tracking

2012-11-16 Thread zwu . kernel

From: Zhi Yong Wu 

Hi guys,

  Any comments or ideas are appreciated, thanks.

NOTE:

  The patchset can be obtained via my kernel dev git on github:
git://github.com/wuzhy/kernel.git hot_tracking
  If you're interested, you can also review them via
https://github.com/wuzhy/kernel/commits/hot_tracking

  For more info, please check hot_tracking.txt in Documentation

TODO List:

 1.) Need to do scalability or performance tests. - Required
 2.) Need one simpler but efficient temp calculation function
 3.) How to save the file temperature across umount to be able to
 preserve the file temperature after reboot - Optional

Changelog:

 - Solved 64-bit inode number issue. [David Sterba]
 - Embed struct hot_type in struct file_system_type [Darrick J. Wong]
 - Cleaned up some issues [David Sterba]
 - Use a static hot debugfs root [Greg KH]
 - Rewritten debugfs support based on seq_file operation. [Dave Chinner]
 - Refactored workqueue support. [Dave Chinner]
 - Turn some macros into tunables   [Zhiyong, Zheng Liu]
   TIME_TO_KICK, and HEAT_UPDATE_DELAY
 - Introduce hot func registering framework [Zhiyong]
 - Remove global variable for hot tracking [Zhiyong]
 - Add xfs hot tracking support [Dave Chinner]
 - Add ext4 hot tracking support [Zheng Liu]
 - Cleaned up a lot of other issues [Dave Chinner]
 - Added memory shrinker [Dave Chinner]
 - Converted to one workqueue to update map info periodically [Dave Chinner]
 - Cleaned up a lot of other issues [Dave Chinner]
 - Reduce new files and put all in fs/hot_tracking.[ch] [Dave Chinner]
 - Add btrfs hot tracking support [Zhiyong]
 - The first three patches can probably just be flattened into one.
[Marco Stornelli, Dave Chinner]

Zhi Yong Wu (16):
  vfs: introduce some data structures
  vfs: add init and cleanup functions
  vfs: add I/O frequency update function
  vfs: add two map arrays
  vfs: add hooks to enable hot tracking
  vfs: add temp calculation function
  vfs: add map info update function
  vfs: add aging function
  vfs: add one work queue
  vfs: add FS hot type support
  vfs: register one shrinker
  vfs: add one ioctl interface
  vfs: add debugfs support
  proc: add two hot_track proc files
  btrfs: add hot tracking support
  vfs: add documentation

 Documentation/filesystems/00-INDEX         |    2 +
 Documentation/filesystems/hot_tracking.txt |  263 ++
 fs/Makefile                                |    2 +-
 fs/btrfs/ctree.h                           |    1 +
 fs/btrfs/super.c                           |   22 +-
 fs/compat_ioctl.c                          |    5 +
 fs/dcache.c                                |    2 +
 fs/direct-io.c                             |    6 +
 fs/hot_tracking.c                          | 1306
 fs/hot_tracking.h                          |   52 ++
 fs/ioctl.c                                 |   74 ++
 include/linux/fs.h                         |    5 +
 include/linux/hot_tracking.h               |  152
 kernel/sysctl.c                            |   14 +
 mm/filemap.c                               |    6 +
 mm/page-writeback.c                        |   12 +
 mm/readahead.c                             |    7 +
 17 files changed, 1929 insertions(+), 2 deletions(-)
 create mode 100644 Documentation/filesystems/hot_tracking.txt
 create mode 100644 fs/hot_tracking.c
 create mode 100644 fs/hot_tracking.h
 create mode 100644 include/linux/hot_tracking.h

-- 
1.7.6.5


[PATCH v1 hot_track 02/16] vfs: add init and cleanup functions

2012-11-16 Thread zwu . kernel

From: Zhi Yong Wu 

  Add an initialization function that creates the key data structures
when hot tracking is enabled, and clean them up when hot tracking is
disabled.

Signed-off-by: Zhi Yong Wu 
---
 fs/hot_tracking.c|  115 ++
 include/linux/fs.h   |4 ++
 include/linux/hot_tracking.h |3 +
 3 files changed, 122 insertions(+), 0 deletions(-)

diff --git a/fs/hot_tracking.c b/fs/hot_tracking.c
index ef7ff09..1fd4d0e 100644
--- a/fs/hot_tracking.c
+++ b/fs/hot_tracking.c
@@ -76,12 +76,92 @@ static void hot_inode_item_init(struct hot_inode_item *he,
he->hot_inode_tree = hot_inode_tree;
kref_init(&he->hot_inode.refs);
spin_lock_init(&he->hot_inode.lock);
+   INIT_LIST_HEAD(&he->hot_inode.n_list);
he->hot_inode.hot_freq_data.avg_delta_reads = (u64) -1;
he->hot_inode.hot_freq_data.avg_delta_writes = (u64) -1;
he->hot_inode.hot_freq_data.flags = FREQ_DATA_TYPE_INODE;
hot_range_tree_init(he);
 }
 
+static void hot_range_item_free(struct kref *kref)
+{
+   struct hot_comm_item *comm_item = container_of(kref,
+   struct hot_comm_item, refs);
+   struct hot_range_item *hr = container_of(comm_item,
+   struct hot_range_item, hot_range);
+
+   rb_erase(&hr->hot_range.rb_node,
+   &hr->hot_inode->hot_range_tree.map);
+   kmem_cache_free(hot_range_item_cachep, hr);
+}
+
+/*
+ * Drop one reference on hot_range_item
+ * and free the structure if the reference count hits zero
+ */
+static void hot_range_item_put(struct hot_range_item *hr)
+{
+   kref_put(&hr->hot_range.refs, hot_range_item_free);
+}
+
+/* Frees the entire hot_range_tree. */
+static void hot_range_tree_free(struct hot_inode_item *he)
+{
+   struct rb_node *node;
+   struct hot_comm_item *ci;
+   struct hot_range_item *hr;
+
+   /* Free hot inode and range trees on fs root */
+   spin_lock(&he->lock);
+   while ((node = rb_first(&he->hot_range_tree.map))) {
+   ci = rb_entry(node, struct hot_comm_item, rb_node);
+   hr = container_of(ci,
+   struct hot_range_item, hot_range);
+   hot_range_item_put(hr);
+   }
+   spin_unlock(&he->lock);
+}
+
+static void hot_inode_item_free(struct kref *kref)
+{
+   struct hot_comm_item *comm_item = container_of(kref,
+   struct hot_comm_item, refs);
+   struct hot_inode_item *he = container_of(comm_item,
+   struct hot_inode_item, hot_inode);
+
+   hot_range_tree_free(he);
+   rb_erase(&he->hot_inode.rb_node, &he->hot_inode_tree->map);
+   kmem_cache_free(hot_inode_item_cachep, he);
+}
+
+/*
+ * Drop one reference on hot_inode_item
+ * and free the structure if the reference count hits zero
+ */
+void hot_inode_item_put(struct hot_inode_item *he)
+{
+   kref_put(&he->hot_inode.refs, hot_inode_item_free);
+}
+EXPORT_SYMBOL_GPL(hot_inode_item_put);
+
+/* Frees the entire hot_inode_tree. */
+static void hot_inode_tree_exit(struct hot_info *root)
+{
+   struct rb_node *node;
+   struct hot_comm_item *ci;
+   struct hot_inode_item *he;
+
+   /* Free hot inode and range trees on fs root */
+   spin_lock(&root->lock);
+   while ((node = rb_first(&root->hot_inode_tree.map))) {
+   ci = rb_entry(node, struct hot_comm_item, rb_node);
+   he = container_of(ci,
+   struct hot_inode_item, hot_inode);
+   hot_inode_item_put(he);
+   }
+   spin_unlock(&root->lock);
+}
+
 /*
  * Initialize kmem cache for hot_inode_item and hot_range_item.
  */
@@ -107,3 +187,38 @@ err:
kmem_cache_destroy(hot_inode_item_cachep);
 }
 EXPORT_SYMBOL_GPL(hot_cache_init);
+
+/*
+ * Initialize the data structures for hot data tracking.
+ */
+int hot_track_init(struct super_block *sb)
+{
+   struct hot_info *root;
+   int ret = -ENOMEM;
+
+   root = kzalloc(sizeof(struct hot_info), GFP_NOFS);
+   if (!root) {
+   printk(KERN_ERR "%s: Failed to malloc memory for "
+   "hot_info\n", __func__);
+   return ret;
+   }
+
+   hot_inode_tree_init(root);
+
+   sb->s_hot_root = root;
+
+   printk(KERN_INFO "VFS: Turning on hot data tracking\n");
+
+   return 0;
+}
+EXPORT_SYMBOL_GPL(hot_track_init);
+
+void hot_track_exit(struct super_block *sb)
+{
+   struct hot_info *root = sb->s_hot_root;
+
+   hot_inode_tree_exit(root);
+   sb->s_hot_root = NULL;
+   kfree(root);
+}
+EXPORT_SYMBOL_GPL(hot_track_exit);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index b33cfc9..c541ae7 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -1321,6 +1322,9 @@ struct super_block {
 
/* Being remounted read-only */
int s_readonly_remount;
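
The put helpers in this patch follow the usual kref pattern: an item embeds a reference counter, and the put that drops the last reference runs a release callback that recovers the enclosing item with container_of(). A minimal user-space sketch of that pattern follows. The names `ref`, `item`, `ref_put` and `item_release` are hypothetical, and the counter here is a plain integer, whereas the kernel's kref is atomic.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct kref; NOT atomic, illustration only. */
struct ref {
	unsigned int count;
};

/* Hypothetical stand-in for hot_inode_item: it embeds the counter. */
struct item {
	struct ref refs;
	int released;	/* set by the release callback instead of freeing */
};

static void ref_init(struct ref *r)
{
	r->count = 1;	/* the creator holds the first reference */
}

static void ref_get(struct ref *r)
{
	r->count++;
}

/* Returns 1 if this put dropped the last reference and ran release(). */
static int ref_put(struct ref *r, void (*release)(struct ref *))
{
	if (--r->count == 0) {
		release(r);
		return 1;
	}
	return 0;
}

/* Recover the enclosing item from its embedded counter, as container_of() does. */
static void item_release(struct ref *r)
{
	struct item *it = (struct item *)((char *)r - offsetof(struct item, refs));

	it->released = 1;
}
```

In this model every ref_get() must be paired with one ref_put(); only the final put triggers the release callback.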

[PATCH v1 hot_track 03/16] vfs: add I/O frequency update function

2012-11-16 Thread zwu . kernel

From: Zhi Yong Wu 

  Add some util helpers to update access frequencies
for one file or its range.

Signed-off-by: Zhi Yong Wu 
---
 fs/hot_tracking.c|  178 ++
 fs/hot_tracking.h|5 +
 include/linux/hot_tracking.h |4 +
 3 files changed, 187 insertions(+), 0 deletions(-)

diff --git a/fs/hot_tracking.c b/fs/hot_tracking.c
index 1fd4d0e..6d396fe 100644
--- a/fs/hot_tracking.c
+++ b/fs/hot_tracking.c
@@ -162,6 +162,135 @@ static void hot_inode_tree_exit(struct hot_info *root)
spin_unlock(&root->lock);
 }
 
+struct hot_inode_item
+*hot_inode_item_lookup(struct hot_info *root, u64 ino)
+{
+   struct rb_node **p = &root->hot_inode_tree.map.rb_node;
+   struct rb_node *parent = NULL;
+   struct hot_comm_item *ci;
+   struct hot_inode_item *entry;
+
+   /* walk tree to find insertion point */
+   spin_lock(&root->lock);
+   while (*p) {
+   parent = *p;
+   ci = rb_entry(parent, struct hot_comm_item, rb_node);
+   entry = container_of(ci, struct hot_inode_item, hot_inode);
+   if (ino < entry->i_ino)
+   p = &(*p)->rb_left;
+   else if (ino > entry->i_ino)
+   p = &(*p)->rb_right;
+   else {
+   spin_unlock(&root->lock);
+   kref_get(&entry->hot_inode.refs);
+   return entry;
+   }
+   }
+   spin_unlock(&root->lock);
+
+   entry = kmem_cache_zalloc(hot_inode_item_cachep, GFP_NOFS);
+   if (!entry)
+   return ERR_PTR(-ENOMEM);
+
+   spin_lock(&root->lock);
+   hot_inode_item_init(entry, ino, &root->hot_inode_tree);
+   rb_link_node(&entry->hot_inode.rb_node, parent, p);
+   rb_insert_color(&entry->hot_inode.rb_node,
+   &root->hot_inode_tree.map);
+   spin_unlock(&root->lock);
+
+   kref_get(&entry->hot_inode.refs);
+   return entry;
+}
+EXPORT_SYMBOL_GPL(hot_inode_item_lookup);
+
+static loff_t hot_range_end(struct hot_range_item *hr)
+{
+   if (hr->start + hr->len < hr->start)
+   return (loff_t)-1;
+
+   return hr->start + hr->len - 1;
+}
+
+static struct hot_range_item
+*hot_range_item_lookup(struct hot_inode_item *he,
+   loff_t start)
+{
+   struct rb_node **p = &he->hot_range_tree.map.rb_node;
+   struct rb_node *parent = NULL;
+   struct hot_comm_item *ci;
+   struct hot_range_item *entry;
+
+   /* walk tree to find insertion point */
+   spin_lock(&he->lock);
+   while (*p) {
+   parent = *p;
+   ci = rb_entry(parent, struct hot_comm_item, rb_node);
+   entry = container_of(ci, struct hot_range_item, hot_range);
+   if (start < entry->start)
+   p = &(*p)->rb_left;
+   else if (start > hot_range_end(entry))
+   p = &(*p)->rb_right;
+   else {
+   spin_unlock(&he->lock);
+   kref_get(&entry->hot_range.refs);
+   return entry;
+   }
+   }
+   spin_unlock(&he->lock);
+
+   entry = kmem_cache_zalloc(hot_range_item_cachep, GFP_NOFS);
+   if (!entry)
+   return ERR_PTR(-ENOMEM);
+
+   spin_lock(&he->lock);
+   hot_range_item_init(entry, start, he);
+   rb_link_node(&entry->hot_range.rb_node, parent, p);
+   rb_insert_color(&entry->hot_range.rb_node,
+   &he->hot_range_tree.map);
+   spin_unlock(&he->lock);
+
+   kref_get(&entry->hot_range.refs);
+   return entry;
+}
+
+/*
+ * This function does the actual work of updating
+ * the frequency numbers, whatever they turn out to be.
+ */
+static void hot_rw_freq_calc(struct timespec old_atime,
+   struct timespec cur_time, u64 *avg)
+{
+   struct timespec delta_ts;
+   u64 new_delta;
+
+   delta_ts = timespec_sub(cur_time, old_atime);
+   new_delta = timespec_to_ns(&delta_ts) >> FREQ_POWER;
+
+   *avg = (*avg << FREQ_POWER) - *avg + new_delta;
+   *avg = *avg >> FREQ_POWER;
+}
+
+static void hot_freq_data_update(struct hot_freq_data *freq_data, bool write)
+{
+   struct timespec cur_time = current_kernel_time();
+
+   if (write) {
+   freq_data->nr_writes += 1;
+   hot_rw_freq_calc(freq_data->last_write_time,
+   cur_time,
+   &freq_data->avg_delta_writes);
+   freq_data->last_write_time = cur_time;
+   } else {
+   freq_data->nr_reads += 1;
+   hot_rw_freq_calc(freq_data->last_read_time,
+   cur_time,
+   &freq_data->avg_delta_reads);
+   freq_data->last_read_time = cur_time;
+   }
+}
+
 /*
  *
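
The averaging step in hot_rw_freq_calc() can be tried outside the kernel. The sketch below is a user-space model of the same arithmetic. The function name `freq_avg_update` is hypothetical, and it takes the elapsed time directly in nanoseconds instead of two timespec values. FREQ_POWER is 4, as defined in fs/hot_tracking.h in this series.

```c
#include <stdint.h>

#define FREQ_POWER 4	/* value from fs/hot_tracking.h in this series */

/*
 * User-space model of hot_rw_freq_calc(): fold the time since the
 * previous access (delta_ns) into the running average.
 */
static uint64_t freq_avg_update(uint64_t avg, uint64_t delta_ns)
{
	uint64_t new_delta = delta_ns >> FREQ_POWER;

	/* 15 * avg + delta_ns / 16, then divide the sum by 16 */
	avg = (avg << FREQ_POWER) - avg + new_delta;
	return avg >> FREQ_POWER;
}
```

For example, freq_avg_update(100, 1600) returns 100: once the stored average equals delta_ns >> FREQ_POWER it stays there, so the stored value is in units of 16 ns rather than 1 ns.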

[PATCH v1 hot_track 05/16] vfs: add hooks to enable hot tracking

2012-11-16 Thread zwu . kernel

From: Zhi Yong Wu 

  Miscellaneous features that implement hot data tracking
and generally make the hot data functions a bit more friendly.

Signed-off-by: Zhi Yong Wu 
---
 fs/direct-io.c  |6 ++
 mm/filemap.c|6 ++
 mm/page-writeback.c |   12 
 mm/readahead.c  |7 +++
 4 files changed, 31 insertions(+), 0 deletions(-)

diff --git a/fs/direct-io.c b/fs/direct-io.c
index f86c720..51f13f4 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -37,6 +37,7 @@
 #include 
 #include 
 #include 
+#include "hot_tracking.h"
 
 /*
  * How many user pages to map in one call to get_user_pages().  This determines
@@ -1297,6 +1298,11 @@ __blockdev_direct_IO(int rw, struct kiocb *iocb, struct inode *inode,
prefetch(bdev->bd_queue);
prefetch((char *)bdev->bd_queue + SMP_CACHE_BYTES);
 
+   /* Hot data tracking */
+   hot_update_freqs(inode, offset,
+   iov_length(iov, nr_segs),
+   rw & WRITE);
+
return do_blockdev_direct_IO(rw, iocb, inode, bdev, iov, offset,
 nr_segs, get_block, end_io,
 submit_io, flags);
diff --git a/mm/filemap.c b/mm/filemap.c
index 83efee7..6141374 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -33,6 +33,7 @@
 #include  /* for BUG_ON(!in_atomic()) only */
 #include 
 #include 
+#include 
 #include "internal.h"
 
 /*
@@ -1224,6 +1225,11 @@ readpage:
 * PG_error will be set again if readpage fails.
 */
ClearPageError(page);
+
+   /* Hot data tracking */
+   hot_update_freqs(inode, (loff_t)page->index << PAGE_CACHE_SHIFT,
+   PAGE_CACHE_SIZE, 0);
+
/* Start the actual read. The read will unlock the page. */
error = mapping->a_ops->readpage(filp, page);
 
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 830893b..dc8f721 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -35,6 +35,7 @@
 #include  /* __set_page_dirty_buffers */
 #include 
 #include 
+#include 
 #include 
 
 /*
@@ -1903,13 +1904,24 @@ EXPORT_SYMBOL(generic_writepages);
 int do_writepages(struct address_space *mapping, struct writeback_control *wbc)
 {
int ret;
+   loff_t start = 0;
+   size_t count = 0;
 
if (wbc->nr_to_write <= 0)
return 0;
+
+   start = mapping->writeback_index << PAGE_CACHE_SHIFT;
+   count = wbc->nr_to_write;
+
if (mapping->a_ops->writepages)
ret = mapping->a_ops->writepages(mapping, wbc);
else
ret = generic_writepages(mapping, wbc);
+
+   /* Hot data tracking */
+   hot_update_freqs(mapping->host, start,
+   (count - wbc->nr_to_write) * PAGE_CACHE_SIZE, 1);
+
return ret;
 }
 
diff --git a/mm/readahead.c b/mm/readahead.c
index 7963f23..d1ab688 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /*
  * Initialise a struct file's readahead state.  Assumes that the caller has
@@ -138,6 +139,12 @@ static int read_pages(struct address_space *mapping, struct file *filp,
 out:
blk_finish_plug(&plug);
 
+   /* Hot data tracking */
+   hot_update_freqs(mapping->host,
+   (loff_t)(list_entry(pages->prev, struct page, lru)->index)
+   << PAGE_CACHE_SHIFT,
+   (size_t)nr_pages * PAGE_CACHE_SIZE, 0);
+
return ret;
 }
 
-- 
1.7.6.5


[PATCH v1 hot_track 06/16] vfs: add temp calculation function

2012-11-16 Thread zwu . kernel

From: Zhi Yong Wu 

Signed-off-by: Zhi Yong Wu 
---
 fs/hot_tracking.c |   74 +
 fs/hot_tracking.h |   21 +++
 2 files changed, 95 insertions(+), 0 deletions(-)

diff --git a/fs/hot_tracking.c b/fs/hot_tracking.c
index bd2c353..3cb14e2 100644
--- a/fs/hot_tracking.c
+++ b/fs/hot_tracking.c
@@ -25,6 +25,14 @@
 static struct kmem_cache *hot_inode_item_cachep __read_mostly;
 static struct kmem_cache *hot_range_item_cachep __read_mostly;
 
+static u64 hot_raw_shift(u64 counter, u32 bits, bool dir)
+{
+   if (dir)
+   return counter << bits;
+   else
+   return counter >> bits;
+}
+
 /*
  * Initialize the inode tree. Should be called for each new inode
  * access or other user of the hot_inode interface.
@@ -312,6 +320,72 @@ static void hot_freq_data_update(struct hot_freq_data *freq_data, bool write)
 }
 
 /*
+ * hot_temp_calc() is responsible for distilling the six heat
+ * criteria down into a single temperature value for the data,
+ * which is an integer between 0 and HEAT_MAX_VALUE.
+ */
+static u32 hot_temp_calc(struct hot_freq_data *freq_data)
+{
+   u32 result = 0;
+
+   struct timespec ckt = current_kernel_time();
+   u64 cur_time = timespec_to_ns(&ckt);
+
+   u32 nrr_heat = (u32)hot_raw_shift((u64)freq_data->nr_reads,
+   NRR_MULTIPLIER_POWER, true);
+   u32 nrw_heat = (u32)hot_raw_shift((u64)freq_data->nr_writes,
+   NRW_MULTIPLIER_POWER, true);
+
+   u64 ltr_heat =
+   hot_raw_shift((cur_time - timespec_to_ns(&freq_data->last_read_time)),
+   LTR_DIVIDER_POWER, false);
+   u64 ltw_heat =
+   hot_raw_shift((cur_time - timespec_to_ns(&freq_data->last_write_time)),
+   LTW_DIVIDER_POWER, false);
+
+   u64 avr_heat =
+   hot_raw_shift((((u64) -1) - freq_data->avg_delta_reads),
+   AVR_DIVIDER_POWER, false);
+   u64 avw_heat =
+   hot_raw_shift((((u64) -1) - freq_data->avg_delta_writes),
+   AVW_DIVIDER_POWER, false);
+
+   /* ltr_heat is now guaranteed to be u32 safe */
+   if (ltr_heat >= hot_raw_shift((u64) 1, 32, true))
+   ltr_heat = 0;
+   else
+   ltr_heat = hot_raw_shift((u64) 1, 32, true) - ltr_heat;
+
+   /* ltw_heat is now guaranteed to be u32 safe */
+   if (ltw_heat >= hot_raw_shift((u64) 1, 32, true))
+   ltw_heat = 0;
+   else
+   ltw_heat = hot_raw_shift((u64) 1, 32, true) - ltw_heat;
+
+   /* avr_heat is now guaranteed to be u32 safe */
+   if (avr_heat >= hot_raw_shift((u64) 1, 32, true))
+   avr_heat = (u32) -1;
+
+   /* avw_heat is now guaranteed to be u32 safe */
+   if (avw_heat >= hot_raw_shift((u64) 1, 32, true))
+   avw_heat = (u32) -1;
+
+   nrr_heat = (u32)hot_raw_shift((u64)nrr_heat,
+   (3 - NRR_COEFF_POWER), false);
+   nrw_heat = (u32)hot_raw_shift((u64)nrw_heat,
+   (3 - NRW_COEFF_POWER), false);
+   ltr_heat = hot_raw_shift(ltr_heat, (3 - LTR_COEFF_POWER), false);
+   ltw_heat = hot_raw_shift(ltw_heat, (3 - LTW_COEFF_POWER), false);
+   avr_heat = hot_raw_shift(avr_heat, (3 - AVR_COEFF_POWER), false);
+   avw_heat = hot_raw_shift(avw_heat, (3 - AVW_COEFF_POWER), false);
+
+   result = nrr_heat + nrw_heat + (u32) ltr_heat +
+   (u32) ltw_heat + (u32) avr_heat + (u32) avw_heat;
+
+   return result;
+}
+
+/*
  * Initialize inode and range map info.
  */
 static void hot_map_init(struct hot_info *root)
diff --git a/fs/hot_tracking.h b/fs/hot_tracking.h
index 8571186..f33066f 100644
--- a/fs/hot_tracking.h
+++ b/fs/hot_tracking.h
@@ -24,4 +24,25 @@
 #define RANGE_SIZE (1 << RANGE_BITS)
 #define FREQ_POWER 4
 
+/* NRR/NRW heat unit = 2^X accesses */
+#define NRR_MULTIPLIER_POWER 20 /* NRR - number of reads since mount */
+#define NRR_COEFF_POWER 0
+#define NRW_MULTIPLIER_POWER 20 /* NRW - number of writes since mount */
+#define NRW_COEFF_POWER 0
+
+/* LTR/LTW heat unit = 2^X ns of age */
+#define LTR_DIVIDER_POWER 30 /* LTR - time elapsed since last read(ns) */
+#define LTR_COEFF_POWER 1
+#define LTW_DIVIDER_POWER 30 /* LTW - time elapsed since last write(ns) */
+#define LTW_COEFF_POWER 1
+
+/*
+ * AVR/AVW cold unit = 2^X ns of average delta
+ * AVR/AVW heat unit = HEAT_MAX_VALUE - cold unit
+ */
+#define AVR_DIVIDER_POWER 40 /* AVR - average delta between recent reads(ns) */
+#define AVR_COEFF_POWER 0
+#define AVW_DIVIDER_POWER 40 /* AVW - average delta between recent writes(ns) */
+#define AVW_COEFF_POWER 0
+
 #endif /* __HOT_TRACKING__ */
-- 
1.7.6.5
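
To see how the six criteria combine, the calculation in hot_temp_calc() can be modelled in user space. The sketch below uses the powers defined in this patch. The struct `freq_sample`, the helpers `age_heat` and `avg_heat`, and the explicit `now_ns` argument are hypothetical stand-ins for struct hot_freq_data and current_kernel_time(), so treat it as an approximation of the patch rather than the kernel code itself.

```c
#include <stdint.h>

/* Powers copied from fs/hot_tracking.h in this patch. */
#define NRR_MULTIPLIER_POWER 20
#define NRR_COEFF_POWER 0
#define NRW_MULTIPLIER_POWER 20
#define NRW_COEFF_POWER 0
#define LTR_DIVIDER_POWER 30
#define LTR_COEFF_POWER 1
#define LTW_DIVIDER_POWER 30
#define LTW_COEFF_POWER 1
#define AVR_DIVIDER_POWER 40
#define AVR_COEFF_POWER 0
#define AVW_DIVIDER_POWER 40
#define AVW_COEFF_POWER 0

/* Hypothetical user-space stand-in for struct hot_freq_data. */
struct freq_sample {
	uint32_t nr_reads, nr_writes;
	uint64_t last_read_ns, last_write_ns;
	uint64_t avg_delta_reads, avg_delta_writes;
};

/* Age term: a recent access gives a value near 2^32, an old one gives 0. */
static uint64_t age_heat(uint64_t age_ns, unsigned int divider_power)
{
	uint64_t units = age_ns >> divider_power;

	return units >= (1ULL << 32) ? 0 : (1ULL << 32) - units;
}

/* Average-delta term: a small average delta gives a large value. */
static uint64_t avg_heat(uint64_t avg_delta, unsigned int divider_power)
{
	uint64_t heat = (UINT64_MAX - avg_delta) >> divider_power;

	return heat >= (1ULL << 32) ? UINT32_MAX : heat;
}

static uint32_t temp_calc(const struct freq_sample *f, uint64_t now_ns)
{
	uint32_t nrr = (uint32_t)((uint64_t)f->nr_reads << NRR_MULTIPLIER_POWER);
	uint32_t nrw = (uint32_t)((uint64_t)f->nr_writes << NRW_MULTIPLIER_POWER);
	uint64_t ltr = age_heat(now_ns - f->last_read_ns, LTR_DIVIDER_POWER);
	uint64_t ltw = age_heat(now_ns - f->last_write_ns, LTW_DIVIDER_POWER);
	uint64_t avr = avg_heat(f->avg_delta_reads, AVR_DIVIDER_POWER);
	uint64_t avw = avg_heat(f->avg_delta_writes, AVW_DIVIDER_POWER);

	/* Weight each term: coefficient power 0 shifts by 3, power 1 by 2. */
	nrr >>= 3 - NRR_COEFF_POWER;
	nrw >>= 3 - NRW_COEFF_POWER;
	ltr >>= 3 - LTR_COEFF_POWER;
	ltw >>= 3 - LTW_COEFF_POWER;
	avr >>= 3 - AVR_COEFF_POWER;
	avw >>= 3 - AVW_COEFF_POWER;

	return nrr + nrw + (uint32_t)ltr + (uint32_t)ltw +
	       (uint32_t)avr + (uint32_t)avw;
}
```

With these powers the two "time since last access" terms dominate: each contributes up to 2^30, while a single read adds only 2^17 and the average-delta terms add at most about 2^21 each.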


[PATCH v1 hot_track 09/16] vfs: add one work queue

2012-11-16 Thread zwu . kernel

From: Zhi Yong Wu 

  Add a per-superblock workqueue and a delayed_work
to run periodic work to update map info on each superblock.

Signed-off-by: Zhi Yong Wu 
---
 fs/hot_tracking.c|   81 ++
 fs/hot_tracking.h|3 ++
 include/linux/hot_tracking.h |3 ++
 3 files changed, 87 insertions(+), 0 deletions(-)

diff --git a/fs/hot_tracking.c b/fs/hot_tracking.c
index 304028d..873d234 100644
--- a/fs/hot_tracking.c
+++ b/fs/hot_tracking.c
@@ -15,9 +15,12 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 #include 
 #include 
 #include 
+#include 
 #include 
 #include "hot_tracking.h"
 
@@ -539,6 +542,63 @@ static void hot_map_exit(struct hot_info *root)
}
 }
 
+/* Temperature compare function*/
+static int hot_temp_cmp(void *priv, struct list_head *a,
+   struct list_head *b)
+{
+   struct hot_comm_item *ap =
+   container_of(a, struct hot_comm_item, n_list);
+   struct hot_comm_item *bp =
+   container_of(b, struct hot_comm_item, n_list);
+
+   int diff = ap->hot_freq_data.last_temp
+   - bp->hot_freq_data.last_temp;
+   if (diff > 0)
+   return -1;
+   if (diff < 0)
+   return 1;
+   return 0;
+}
+
+/*
+ * Every sync period we update temperatures for
+ * each hot inode item and hot range item for aging
+ * purposes.
+ */
+static void hot_update_worker(struct work_struct *work)
+{
+   struct hot_info *root = container_of(to_delayed_work(work),
+   struct hot_info, update_work);
+   struct rb_node *node;
+   struct hot_comm_item *ci;
+   struct hot_inode_item *he;
+   int i;
+
+   node = rb_first(&root->hot_inode_tree.map);
+   while (node) {
+   ci = rb_entry(node, struct hot_comm_item, rb_node);
+   he = container_of(ci, struct hot_inode_item, hot_inode);
+   kref_get(&he->hot_inode.refs);
+   hot_map_update(
+   &he->hot_inode.hot_freq_data, root);
+   hot_range_update(he, root);
+   node = rb_next(node);
+   hot_inode_item_put(he);
+   }
+
+   /* Sort temperature map info */
+   for (i = 0; i < HEAT_MAP_SIZE; i++) {
+   list_sort(NULL, &root->heat_inode_map[i].node_list,
+   hot_temp_cmp);
+   list_sort(NULL, &root->heat_range_map[i].node_list,
+   hot_temp_cmp);
+   }
+
+   /* Insert next delayed work */
+   queue_delayed_work(root->update_wq, &root->update_work,
+   msecs_to_jiffies(HEAT_UPDATE_DELAY * MSEC_PER_SEC));
+}
+
 /*
  * Initialize kmem cache for hot_inode_item and hot_range_item.
  */
@@ -632,11 +692,30 @@ int hot_track_init(struct super_block *sb)
hot_inode_tree_init(root);
hot_map_init(root);
 
+   root->update_wq = alloc_workqueue(
+   "hot_update_wq", WQ_NON_REENTRANT, 0);
+   if (!root->update_wq) {
+   printk(KERN_ERR "%s: Failed to create "
+   "hot update workqueue\n", __func__);
+   goto failed_wq;
+   }
+
+   /* Initialize hot tracking wq and arm one delayed work */
+   INIT_DELAYED_WORK(&root->update_work, hot_update_worker);
+   queue_delayed_work(root->update_wq, &root->update_work,
+   msecs_to_jiffies(HEAT_UPDATE_DELAY * MSEC_PER_SEC));
+
sb->s_hot_root = root;
 
printk(KERN_INFO "VFS: Turning on hot data tracking\n");
 
return 0;
+
+failed_wq:
+   hot_map_exit(root);
+   hot_inode_tree_exit(root);
+   kfree(root);
+   return ret;
 }
 EXPORT_SYMBOL_GPL(hot_track_init);
 
@@ -644,6 +723,8 @@ void hot_track_exit(struct super_block *sb)
 {
struct hot_info *root = sb->s_hot_root;
 
+   cancel_delayed_work_sync(&root->update_work);
+   destroy_workqueue(root->update_wq);
hot_map_exit(root);
hot_inode_tree_exit(root);
sb->s_hot_root = NULL;
diff --git a/fs/hot_tracking.h b/fs/hot_tracking.h
index 46d068a..96379a6 100644
--- a/fs/hot_tracking.h
+++ b/fs/hot_tracking.h
@@ -30,6 +30,9 @@
  */
 #define TIME_TO_KICK 300
 
+/* set how often to update temperatures (seconds) */
+#define HEAT_UPDATE_DELAY 300
+
 /* NRR/NRW heat unit = 2^X accesses */
 #define NRR_MULTIPLIER_POWER 20 /* NRR - number of reads since mount */
 #define NRR_COEFF_POWER 0
diff --git a/include/linux/hot_tracking.h b/include/linux/hot_tracking.h
index 34a0530..ef12748 100644
--- a/include/linux/hot_tracking.h
+++ b/include/linux/hot_tracking.h
@@ -88,6 +88,9 @@ struct hot_info {
/* map of range temperature */
struct hot_map_head heat_range_map[HEAT_MAP_SIZE];
unsigned int hot_map_nr;
+
+   struct workqueue_struct *update_wq;
+   struct delayed_work update_work;
 };
 
 extern void __init hot_cache_init(void);
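
hot_temp_cmp() orders each map list so that hotter items come first. The sketch below shows the same descending order in user space, assuming temperatures are unsigned 32-bit values. It swaps in qsort() for list_sort() and compares the values directly instead of taking their difference, so the result does not depend on how a large unsigned difference converts to int. The names `temp_cmp_desc` and `sort_temps_desc` are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

/* Descending order: return a negative value when a is hotter than b. */
static int temp_cmp_desc(const void *a, const void *b)
{
	uint32_t ta = *(const uint32_t *)a;
	uint32_t tb = *(const uint32_t *)b;

	return (ta < tb) - (ta > tb);
}

/* Sort an array of temperatures, hottest first. */
static void sort_temps_desc(uint32_t *temps, size_t n)
{
	qsort(temps, n, sizeof(*temps), temp_cmp_desc);
}
```

Sorting the temperatures 1, 3000000000 and 5 this way yields 3000000000, 5, 1.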

[PATCH v1 hot_track 10/16] vfs: add FS hot type support

2012-11-16 Thread zwu . kernel

From: Zhi Yong Wu 

  Introduce a way for a specific FS to supply
its own hot tracking type.

Signed-off-by: Zhi Yong Wu 
---
 fs/hot_tracking.c|   43 +++--
 fs/hot_tracking.h|1 -
 include/linux/fs.h   |1 +
 include/linux/hot_tracking.h |   19 ++
 4 files changed, 52 insertions(+), 12 deletions(-)

diff --git a/fs/hot_tracking.c b/fs/hot_tracking.c
index 873d234..81fb084 100644
--- a/fs/hot_tracking.c
+++ b/fs/hot_tracking.c
@@ -64,8 +64,11 @@ void hot_range_tree_init(struct hot_inode_item *he)
 static void hot_range_item_init(struct hot_range_item *hr, loff_t start,
struct hot_inode_item *he)
 {
+   struct hot_info *root = container_of(he->hot_inode_tree,
+   struct hot_info, hot_inode_tree);
+
hr->start = start;
-   hr->len = RANGE_SIZE;
+   hr->len = hot_raw_shift(1, root->hot_type->range_bits, true);
hr->hot_inode = he;
kref_init(&hr->hot_range.refs);
spin_lock_init(&hr->hot_range.lock);
@@ -302,19 +305,21 @@ static void hot_rw_freq_calc(struct timespec old_atime,
*avg = *avg >> FREQ_POWER;
 }
 
-static void hot_freq_data_update(struct hot_freq_data *freq_data, bool write)
+static void hot_freq_data_update(struct hot_info *root,
+   struct hot_freq_data *freq_data, bool write)
 {
struct timespec cur_time = current_kernel_time();
 
if (write) {
freq_data->nr_writes += 1;
-   hot_rw_freq_calc(freq_data->last_write_time,
+   root->hot_type->ops.hot_rw_freq_calc_fn(
+   freq_data->last_write_time,
cur_time,
&freq_data->avg_delta_writes);
freq_data->last_write_time = cur_time;
} else {
freq_data->nr_reads += 1;
-   hot_rw_freq_calc(freq_data->last_read_time,
+   root->hot_type->ops.hot_rw_freq_calc_fn(
freq_data->last_read_time,
cur_time,
&freq_data->avg_delta_reads);
@@ -418,7 +423,7 @@ static void hot_map_update(struct hot_freq_data *freq_data,
struct hot_comm_item *comm_item;
struct hot_inode_item *he;
struct hot_range_item *hr;
-   u32 temp = hot_temp_calc(freq_data);
+   u32 temp = root->hot_type->ops.hot_temp_calc_fn(freq_data);
u8 a_temp = (u8)hot_raw_shift((u64)temp, (32 - HEAT_MAP_BITS), false);
u8 b_temp = (u8)hot_raw_shift((u64)freq_data->last_temp,
(32 - HEAT_MAP_BITS), false);
@@ -491,7 +496,7 @@ static void hot_range_update(struct hot_inode_item *he,
hot_map_update(&hr->hot_range.hot_freq_data, root);
 
spin_lock(&hr->hot_range.lock);
-   obsolete = hot_is_obsolete(
+   obsolete = root->hot_type->ops.hot_is_obsolete_fn(
&hr->hot_range.hot_freq_data);
spin_unlock(&hr->hot_range.lock);
 
@@ -634,6 +639,7 @@ void hot_update_freqs(struct inode *inode, loff_t start,
struct hot_info *root = inode->i_sb->s_hot_root;
struct hot_inode_item *he;
struct hot_range_item *hr;
+   u64 range_size;
loff_t cur, end;
 
if (!root || (len == 0))
@@ -646,15 +652,19 @@ void hot_update_freqs(struct inode *inode, loff_t start,
}
 
spin_lock(&he->hot_inode.lock);
-   hot_freq_data_update(&he->hot_inode.hot_freq_data, rw);
+   hot_freq_data_update(root, &he->hot_inode.hot_freq_data, rw);
spin_unlock(&he->hot_inode.lock);
 
/*
-* Align ranges on RANGE_SIZE boundary
+* Align ranges on range size boundary
 * to prevent proliferation of range structs
 */
-   end = (start + len + RANGE_SIZE - 1) >> RANGE_BITS;
-   for (cur = (start >> RANGE_BITS); cur < end; cur++) {
+   range_size  = hot_raw_shift(1,
+   root->hot_type->range_bits, true);
+   end = hot_raw_shift((start + len + range_size - 1),
+   root->hot_type->range_bits, false);
+   cur = hot_raw_shift(start, root->hot_type->range_bits, false);
+   for (; cur < end; cur++) {
hr = hot_range_item_lookup(he, cur);
if (IS_ERR(hr)) {
WARN(1, "hot_range_item_lookup returns %ld\n",
@@ -664,7 +674,7 @@ void hot_update_freqs(struct inode *inode, loff_t start,
}
 
spin_lock(&hr->hot_range.lock);
-   hot_freq_data_update(&hr->hot_range.hot_freq_data, rw);
+   hot_freq_data_update(root, &hr->hot_range.hot_freq_data, rw);
spin_unlock(&hr->hot_range.lock);
 
hot_range_item_put(hr);
@@ -692,6 +702,17 @@ int hot_track_init(struct super_block *sb)
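
The range loop in hot_update_freqs() walks range indexes from `start >> range_bits` up to, but not including, the rounded-up end of the access. A small user-space sketch of that index arithmetic follows. The helper names `range_first` and `range_end` are hypothetical, and the value 20 for range_bits used in the example is an assumption, since the value a filesystem registers is not shown in this part of the series.

```c
#include <stdint.h>

/* Index of the first range touched by an access starting at byte `start`. */
static uint64_t range_first(uint64_t start, unsigned int range_bits)
{
	return start >> range_bits;
}

/* One past the index of the last range touched: ceil((start + len) / size). */
static uint64_t range_end(uint64_t start, uint64_t len, unsigned int range_bits)
{
	uint64_t range_size = 1ULL << range_bits;

	return (start + len + range_size - 1) >> range_bits;
}
```

With range_bits = 20 (1 MiB ranges), a 2-byte access at offset 1048575 straddles a boundary and covers indexes 0 and 1, while a 1 MiB access at offset 1048576 covers index 1 only.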

[PATCH v1 hot_track 12/16] vfs: add one ioctl interface

2012-11-16 Thread zwu . kernel

From: Zhi Yong Wu 

  FS_IOC_GET_HEAT_INFO: return a struct containing the various
metrics collected in hot_freq_data structs, and also return a
calculated data temperature based on those metrics. Optionally, retrieve
the temperature from the hot data hash list instead of recalculating it.

Signed-off-by: Zhi Yong Wu 
---
 fs/compat_ioctl.c|5 +++
 fs/ioctl.c   |   74 ++
 include/linux/hot_tracking.h |   19 +++
 3 files changed, 98 insertions(+), 0 deletions(-)

diff --git a/fs/compat_ioctl.c b/fs/compat_ioctl.c
index 4c6285f..ad1d603 100644
--- a/fs/compat_ioctl.c
+++ b/fs/compat_ioctl.c
@@ -57,6 +57,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -1400,6 +1401,9 @@ COMPATIBLE_IOCTL(TIOCSTART)
 COMPATIBLE_IOCTL(TIOCSTOP)
 #endif
 
+/*Hot data tracking*/
+COMPATIBLE_IOCTL(FS_IOC_GET_HEAT_INFO)
+
 /* fat 'r' ioctls. These are handled by fat with ->compat_ioctl,
but we don't want warnings on other file systems. So declare
them as compatible here. */
@@ -1579,6 +1583,7 @@ asmlinkage long compat_sys_ioctl(unsigned int fd, unsigned int cmd,
case FIBMAP:
case FIGETBSZ:
case FIONREAD:
+   case FS_IOC_GET_HEAT_INFO:
if (S_ISREG(f.file->f_path.dentry->d_inode->i_mode))
break;
/*FALL THROUGH*/
diff --git a/fs/ioctl.c b/fs/ioctl.c
index 3bdad6d..79fe81f 100644
--- a/fs/ioctl.c
+++ b/fs/ioctl.c
@@ -15,6 +15,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -537,6 +538,76 @@ static int ioctl_fsthaw(struct file *filp)
 }
 
 /*
+ * Retrieve information about access frequency for the given file. Return it in
+ * a userspace-friendly struct for btrfsctl (or another tool) to parse.
+ *
+ * The temperature that is returned can be "live" -- that is, recalculated when
+ * the ioctl is called -- or it can be returned from the hashtable, reflecting
+ * the (possibly old) value that the system will use when considering files
+ * for migration. This behavior is determined by hot_heat_info->live.
+ */
+static int ioctl_heat_info(struct file *file, void __user *argp)
+{
+   struct inode *inode = file->f_dentry->d_inode;
+   struct hot_heat_info heat_info;
+   struct hot_inode_item *he;
+   int ret = 0;
+
+   if (copy_from_user((void *)&heat_info,
+   argp,
+   sizeof(struct hot_heat_info)) != 0) {
+   ret = -EFAULT;
+   goto err;
+   }
+
+   he = hot_inode_item_lookup(inode->i_sb->s_hot_root, inode->i_ino);
+   if (!he) {
+   /* we don't have any info on this file yet */
+   ret = -ENODATA;
+   goto err;
+   }
+
+   spin_lock(&he->hot_inode.lock);
+   heat_info.avg_delta_reads =
+   (__u64) he->hot_inode.hot_freq_data.avg_delta_reads;
+   heat_info.avg_delta_writes =
+   (__u64) he->hot_inode.hot_freq_data.avg_delta_writes;
+   heat_info.last_read_time =
+   (__u64) timespec_to_ns(&he->hot_inode.hot_freq_data.last_read_time);
+   heat_info.last_write_time =
+   (__u64) timespec_to_ns(&he->hot_inode.hot_freq_data.last_write_time);
+   heat_info.num_reads =
+   (__u32) he->hot_inode.hot_freq_data.nr_reads;
+   heat_info.num_writes =
+   (__u32) he->hot_inode.hot_freq_data.nr_writes;
+
+   if (heat_info.live > 0) {
+   /*
+* got a request for live temperature,
+* recalculate it via the hot_temp_calc_fn hook
+*/
+   heat_info.temp =
+   inode->i_sb->s_hot_root->hot_type->ops.hot_temp_calc_fn(
+   &he->hot_inode.hot_freq_data);
+   } else {
+   /* not live temperature, get it from the hashlist */
+   heat_info.temp = he->hot_inode.hot_freq_data.last_temp;
+   }
+   spin_unlock(&he->hot_inode.lock);
+
+   hot_inode_item_put(he);
+
+   if (copy_to_user(argp, (void *)&heat_info,
+   sizeof(struct hot_heat_info))) {
+   ret = -EFAULT;
+   goto err;
+   }
+
+err:
+   return ret;
+}
+
+/*
  * When you add any new common ioctls to the switches above and below
  * please update compat_sys_ioctl() too.
  *
@@ -591,6 +662,9 @@ int do_vfs_ioctl(struct file *filp, unsigned int fd, unsigned int cmd,
case FIGETBSZ:
return put_user(inode->i_sb->s_blocksize, argp);
 
+   case FS_IOC_GET_HEAT_INFO:
+   return ioctl_heat_info(filp, argp);
+
default:
if (S_ISREG(inode->i_mode))
error = file_ioctl(filp, cmd, arg);
diff --git a/include/linux/hot_tracking.h b/include/linux/hot_tracking.h
index 24e91ff..97283b3 100644
--- a/include/linux/hot_tracking.h
+++ b/include/linux/hot_tracking.h
@@ -44,6 +44,17 @@ struct hot_freq_data {

[PATCH v1 hot_track 15/16] btrfs: add hot tracking support

2012-11-16 Thread zwu . kernel

From: Zhi Yong Wu 

  Introduce one new mount option '-o hot_track',
and add its parsing support.
  Its usage looks like:
   mount -o hot_track
   mount -o nouser,hot_track
   mount -o nouser,hot_track,loop
   mount -o hot_track,nouser

Reviewed-by:   David Sterba 
Signed-off-by: Zhi Yong Wu 
---
 fs/btrfs/ctree.h |1 +
 fs/btrfs/super.c |   22 +-
 2 files changed, 22 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index c72ead8..4703178 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1756,6 +1756,7 @@ struct btrfs_ioctl_defrag_range_args {
 #define BTRFS_MOUNT_CHECK_INTEGRITY(1 << 20)
 #define BTRFS_MOUNT_CHECK_INTEGRITY_INCLUDING_EXTENT_DATA (1 << 21)
 #define BTRFS_MOUNT_PANIC_ON_FATAL_ERROR   (1 << 22)
+#define BTRFS_MOUNT_HOT_TRACK  (1 << 23)
 
 #define btrfs_clear_opt(o, opt)((o) &= ~BTRFS_MOUNT_##opt)
 #define btrfs_set_opt(o, opt)  ((o) |= BTRFS_MOUNT_##opt)
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 915ac14..0bcc62b 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -41,6 +41,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "compat.h"
 #include "delayed-inode.h"
 #include "ctree.h"
@@ -299,6 +300,10 @@ static void btrfs_put_super(struct super_block *sb)
 * last process that kept it busy.  Or segfault in the aforementioned
 * process...  Whom would you report that to?
 */
+
+   /* Hot data tracking */
+   if (btrfs_test_opt(btrfs_sb(sb)->tree_root, HOT_TRACK))
+   hot_track_exit(sb);
 }
 
 enum {
@@ -311,7 +316,7 @@ enum {
Opt_enospc_debug, Opt_subvolrootid, Opt_defrag, Opt_inode_cache,
Opt_no_space_cache, Opt_recovery, Opt_skip_balance,
Opt_check_integrity, Opt_check_integrity_including_extent_data,
-   Opt_check_integrity_print_mask, Opt_fatal_errors,
+   Opt_check_integrity_print_mask, Opt_fatal_errors, Opt_hot_track,
Opt_err,
 };
 
@@ -352,6 +357,7 @@ static match_table_t tokens = {
{Opt_check_integrity_including_extent_data, "check_int_data"},
{Opt_check_integrity_print_mask, "check_int_print_mask=%d"},
{Opt_fatal_errors, "fatal_errors=%s"},
+   {Opt_hot_track, "hot_track"},
{Opt_err, NULL},
 };
 
@@ -614,6 +620,9 @@ int btrfs_parse_options(struct btrfs_root *root, char *options)
goto out;
}
break;
+   case Opt_hot_track:
+   btrfs_set_opt(info->mount_opt, HOT_TRACK);
+   break;
case Opt_err:
printk(KERN_INFO "btrfs: unrecognized mount option "
   "'%s'\n", p);
@@ -841,11 +850,20 @@ static int btrfs_fill_super(struct super_block *sb,
goto fail_close;
}
 
+   if (btrfs_test_opt(fs_info->tree_root, HOT_TRACK)) {
+   err = hot_track_init(sb);
+   if (err)
+   goto fail_hot;
+   }
+
save_mount_options(sb, data);
cleancache_init_fs(sb);
sb->s_flags |= MS_ACTIVE;
return 0;
 
+fail_hot:
+   dput(sb->s_root);
+   sb->s_root = NULL;
 fail_close:
close_ctree(fs_info->tree_root);
return err;
@@ -941,6 +959,8 @@ static int btrfs_show_options(struct seq_file *seq, struct dentry *dentry)
seq_puts(seq, ",skip_balance");
if (btrfs_test_opt(root, PANIC_ON_FATAL_ERROR))
seq_puts(seq, ",fatal_errors=panic");
+   if (btrfs_test_opt(root, HOT_TRACK))
+   seq_puts(seq, ",hot_track");
return 0;
 }
 
-- 
1.7.6.5

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v1 hot_track 14/16] proc: add two hot_track proc files

2012-11-16 Thread zwu . kernel

From: Zhi Yong Wu 

  Add two proc files, hot-kick-time and hot-update-delay,
under /proc/sys/fs/ so that TIME_TO_KICK and
HEAT_UPDATE_DELAY become tunable at runtime.
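
With the limits now held in plain `int` variables, the seconds-to-nanoseconds conversion deserves a second look: if NSEC_PER_SEC is a `long` constant, the product may overflow on 32-bit builds before it is stored in a u64. The following is a userspace sketch of the obsolescence test that widens before multiplying. `hot_kick_ns` and `hot_is_obsolete_ns` are hypothetical names used only for illustration, and clamping negative input to zero is an assumption, not behaviour from this series.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for the kernel constant. */
#define NSEC_PER_SEC 1000000000ULL

/* Convert a tunable given in seconds to nanoseconds, widening first. */
static inline uint64_t hot_kick_ns(int kick_seconds)
{
	if (kick_seconds < 0)
		kick_seconds = 0;	/* assumption: negative input means "no delay" */
	return (uint64_t)kick_seconds * NSEC_PER_SEC;
}

/*
 * An item is obsolete when both its last read and its last write are
 * older than the kick time. All timestamps are in nanoseconds.
 */
static inline int hot_is_obsolete_ns(uint64_t now_ns, uint64_t last_read_ns,
				     uint64_t last_write_ns, int kick_seconds)
{
	uint64_t kick_ns = hot_kick_ns(kick_seconds);

	/* Timestamps in the future would wrap the unsigned subtraction. */
	assert(now_ns >= last_read_ns && now_ns >= last_write_ns);

	return (now_ns - last_read_ns > kick_ns) &&
	       (now_ns - last_write_ns > kick_ns);
}
```

A sysctl handler that also enforces a lower bound (for example via a min/max variant of the handler) would make the clamp unnecessary; that is a design option, not something this patch does.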

Signed-off-by: Zhi Yong Wu 
---
 fs/hot_tracking.c|   12 +---
 fs/hot_tracking.h|9 -
 include/linux/hot_tracking.h |7 +++
 kernel/sysctl.c  |   14 ++
 4 files changed, 30 insertions(+), 12 deletions(-)

diff --git a/fs/hot_tracking.c b/fs/hot_tracking.c
index a98bfe6..69a6d33 100644
--- a/fs/hot_tracking.c
+++ b/fs/hot_tracking.c
@@ -27,6 +27,12 @@
 
 static struct dentry *hot_debugfs_root;
 
+int sysctl_hot_kick_time __read_mostly = 300;
+EXPORT_SYMBOL_GPL(sysctl_hot_kick_time);
+
+int sysctl_hot_update_delay __read_mostly = 300;
+EXPORT_SYMBOL_GPL(sysctl_hot_update_delay);
+
 /* kmem_cache pointers for slab caches */
 static struct kmem_cache *hot_inode_item_cachep __read_mostly;
 static struct kmem_cache *hot_range_item_cachep __read_mostly;
@@ -406,7 +412,7 @@ static bool hot_is_obsolete(struct hot_freq_data *freq_data)
(cur_time - timespec_to_ns(&freq_data->last_read_time));
u64 last_write_ns =
(cur_time - timespec_to_ns(&freq_data->last_write_time));
-   u64 kick_ns =  TIME_TO_KICK * NSEC_PER_SEC;
+   u64 kick_ns =  sysctl_hot_kick_time * NSEC_PER_SEC;
 
if ((last_read_ns > kick_ns) && (last_write_ns > kick_ns))
ret = 1;
@@ -604,7 +610,7 @@ static void hot_update_worker(struct work_struct *work)
 
/* Instert next delayed work */
queue_delayed_work(root->update_wq, &root->update_work,
-   msecs_to_jiffies(HEAT_UPDATE_DELAY * MSEC_PER_SEC));
+   msecs_to_jiffies(sysctl_hot_update_delay * MSEC_PER_SEC));
 }
 
 static void *hot_range_seq_start(struct seq_file *seq, loff_t *pos)
@@ -1251,7 +1257,7 @@ int hot_track_init(struct super_block *sb)
/* Initialize hot tracking wq and arm one delayed work */
INIT_DELAYED_WORK(&root->update_work, hot_update_worker);
queue_delayed_work(root->update_wq, &root->update_work,
-   msecs_to_jiffies(HEAT_UPDATE_DELAY * MSEC_PER_SEC));
+   msecs_to_jiffies(sysctl_hot_update_delay * MSEC_PER_SEC));
 
/* Register a shrinker callback */
root->hot_shrink.shrink = hot_track_prune;
diff --git a/fs/hot_tracking.h b/fs/hot_tracking.h
index a969940..ab6d603 100644
--- a/fs/hot_tracking.h
+++ b/fs/hot_tracking.h
@@ -23,15 +23,6 @@
 #define RANGE_BITS 20
 #define FREQ_POWER 4
 
-/*
- * time to quit keeping track of
- * tracking data (seconds)
- */
-#define TIME_TO_KICK 300
-
-/* set how often to update temperatures (seconds) */
-#define HEAT_UPDATE_DELAY 300
-
 /* NRR/NRW heat unit = 2^X accesses */
 #define NRR_MULTIPLIER_POWER 20 /* NRR - number of reads since mount */
 #define NRR_COEFF_POWER 0
diff --git a/include/linux/hot_tracking.h b/include/linux/hot_tracking.h
index afb2952..f764730 100644
--- a/include/linux/hot_tracking.h
+++ b/include/linux/hot_tracking.h
@@ -126,6 +126,13 @@ struct hot_info {
 };
 
 /*
+ * Two variables have meanings as below:
+ * 1. time to quit keeping track of tracking data (seconds)
+ * 2. set how often to update temperatures (seconds)
+ */
+extern int sysctl_hot_kick_time, sysctl_hot_update_delay;
+
+/*
  * Hot data tracking ioctls:
  *
  * HOT_INFO - retrieve info on frequency of access
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 26f65ea..37624fb 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1545,6 +1545,20 @@ static struct ctl_table fs_table[] = {
.proc_handler   = &pipe_proc_fn,
.extra1 = &pipe_min_size,
},
+   {
+   .procname   = "hot-kick-time",
+   .data   = &sysctl_hot_kick_time,
+   .maxlen = sizeof(int),
+   .mode   = 0644,
+   .proc_handler   = proc_dointvec,
+   },
+   {
+   .procname   = "hot-update-delay",
+   .data   = &sysctl_hot_update_delay,
+   .maxlen = sizeof(int),
+   .mode   = 0644,
+   .proc_handler   = proc_dointvec,
+   },
{ }
 };
 
-- 
1.7.6.5


[PATCH v1 hot_track 16/16] vfs: add documentation

2012-11-16 Thread zwu . kernel

From: Zhi Yong Wu 

  Add a document describing the VFS hot tracking feature.

Signed-off-by: Zhi Yong Wu 
---
 Documentation/filesystems/00-INDEX |2 +
 Documentation/filesystems/hot_tracking.txt |  263 
 2 files changed, 265 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/filesystems/hot_tracking.txt

diff --git a/Documentation/filesystems/00-INDEX b/Documentation/filesystems/00-INDEX
index 8c624a1..b68bdff 100644
--- a/Documentation/filesystems/00-INDEX
+++ b/Documentation/filesystems/00-INDEX
@@ -118,3 +118,5 @@ xfs.txt
- info and mount options for the XFS filesystem.
 xip.txt
- info on execute-in-place for file mappings.
+hot_tracking.txt
+   - info on hot data tracking in VFS layer
diff --git a/Documentation/filesystems/hot_tracking.txt b/Documentation/filesystems/hot_tracking.txt
new file mode 100644
index 000..0adc524
--- /dev/null
+++ b/Documentation/filesystems/hot_tracking.txt
@@ -0,0 +1,263 @@
+Hot Data Tracking
+
+September, 2012		Zhi Yong Wu
+
+CONTENTS
+
+1. Introduction
+2. Motivation
+3. The Design
+4. How to Calc Frequency of Reads/Writes & Temperature
+5. Git Development Tree
+6. Usage Example
+
+
+1. Introduction
+
+  The feature adds experimental support for tracking data temperature
+information in the VFS layer.  Essentially, this means maintaining some key
+stats (like number of reads/writes, last read/write time, frequency of
+reads/writes), then distilling those numbers down to a single
+"temperature" value that reflects what data is "hot," and using that
+temperature to move data to SSDs.
+
+  The long-term goal of the feature is to allow some FSs,
+e.g. Btrfs, to intelligently utilize SSDs in a heterogeneous volume.
+Incidentally, this project has been motivated by
+the Project Ideas page on the Btrfs wiki.
+
+  Of course, users are warned not to run this code outside of development
+environments. These patches are EXPERIMENTAL, and as such they might eat
+your data and/or memory. That said, the code should be relatively safe
+when the hot_track mount option is disabled.
+
+
+2. Motivation
+
+  The overall goal of enabling hot data relocation to SSD has been
+motivated by the Project Ideas page on the Btrfs wiki at
+.
+The work is divided into two steps: the VFS provides hot data tracking,
+while each specific FS provides hot data relocation.
+So as the first step of this goal, it is hoped that the patchset
+for hot data tracking will eventually mature into VFS.
+
+  This is essentially the traditional cache argument: SSD is fast and
+expensive; HDD is cheap but slow. ZFS, for example, can already take
+advantage of SSD caching. Btrfs should also be able to take advantage of
+hybrid storage without many broad, sweeping changes to existing code.
+
+
+3. The Design
+
+These include the following parts:
+
+* Hooks in existing vfs functions to track data access frequency
+
+* New radix-trees for tracking access frequency of inodes and sub-file
+ranges
+The relationship between super_block and radix-tree is as below:
+hot_info.hot_inode_tree
+Each FS instance can find its hot tracking info through s_hot_root.
+This hot_info stores hot tracking state such as hot_inode_tree and
+the inode and range lists.
+
+* A list for indexing data by its temperature
+
+* A debugfs interface for dumping data from the radix-trees
+
+* A background kthread for updating inode heat info
+
+* A mount option for enabling temperature tracking (-o hot_track,
+disabled by default)
+* An ioctl to retrieve the frequency information collected for a certain
+file
+* Ioctls to enable/disable frequency tracking per inode.
+
+Let us see their relationship as below:
+
+* hot_info.hot_inode_tree indexes hot_inode_items, one per inode
+
+* hot_inode_item contains access frequency data for that inode
+
+* hot_inode_item holds a heat list node to index the access
+frequency data for that inode
+
+* hot_inode_item.hot_range_tree indexes hot_range_items for that inode
+
+* hot_range_item contains access frequency data for that range
+
+* hot_range_item holds a heat list node to index the access
+frequency data for that range
+
+* hot_info.heat_inode_map indexes per-inode heat list nodes
+
+* hot_info.heat_range_map indexes per-range heat list nodes
+
+  How about some ascii art? :) Just looking at the hot inode item case
+(the range item case is the same pattern, though), we have:
+
+heat_inode_map   hot_inode_tree
+| |
+| V
+|   +---hot_comm_item+
+|   |   frequency data   |
++---+   |list_head   |
+|   V^ | V
+| ...<--hot_comm_item-->...  | |  ...<--hot_comm_item-->...
+|   frequency data   | |frequency data
++>list_head--

[PATCH v1 hot_track 13/16] vfs: add debugfs support

2012-11-16 Thread zwu . kernel

From: Zhi Yong Wu 

  Add a /sys/kernel/debug/hot_track// directory for each
volume that contains two files. The first, `inode_stats', contains the
heat information for inodes that have been brought into the hot data map
structures. The second, `range_stats', contains similar information for
subfile ranges.

Signed-off-by: Zhi Yong Wu 
---
 fs/hot_tracking.c|  489 +-
 fs/hot_tracking.h|5 +
 include/linux/hot_tracking.h |1 +
 3 files changed, 493 insertions(+), 2 deletions(-)

diff --git a/fs/hot_tracking.c b/fs/hot_tracking.c
index 8144200..a98bfe6 100644
--- a/fs/hot_tracking.c
+++ b/fs/hot_tracking.c
@@ -21,9 +21,12 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include "hot_tracking.h"
 
+static struct dentry *hot_debugfs_root;
+
 /* kmem_cache pointers for slab caches */
 static struct kmem_cache *hot_inode_item_cachep __read_mostly;
 static struct kmem_cache *hot_range_item_cachep __read_mostly;
@@ -215,8 +218,8 @@ struct hot_inode_item
else if (ino > entry->i_ino)
p = &(*p)->rb_right;
else {
-   spin_unlock(&root->lock);
kref_get(&entry->hot_inode.refs);
+   spin_unlock(&root->lock);
return entry;
}
}
@@ -266,8 +269,8 @@ static struct hot_range_item
else if (start > hot_range_end(entry))
p = &(*p)->rb_right;
else {
-   spin_unlock(&he->lock);
kref_get(&entry->hot_range.refs);
+   spin_unlock(&he->lock);
return entry;
}
}
@@ -604,6 +607,475 @@ static void hot_update_worker(struct work_struct *work)
msecs_to_jiffies(HEAT_UPDATE_DELAY * MSEC_PER_SEC));
 }
 
+static void *hot_range_seq_start(struct seq_file *seq, loff_t *pos)
+{
+   struct hot_info *root = seq->private;
+   struct rb_node *node, *node2;
+   struct hot_comm_item *ci;
+   struct hot_inode_item *he;
+   struct hot_range_item *hr;
+   loff_t l = *pos;
+
+   spin_lock(&root->lock);
+   node = rb_first(&root->hot_inode_tree.map);
+   while (node) {
+   ci = rb_entry(node, struct hot_comm_item, rb_node);
+   he = container_of(ci, struct hot_inode_item, hot_inode);
+   spin_lock(&he->lock);
+   node2 = rb_first(&he->hot_range_tree.map);
+   while (node2) {
+   if (!l--) {
+   ci = rb_entry(node2,
+   struct hot_comm_item, rb_node);
+   hr = container_of(ci,
+   struct hot_range_item, hot_range);
+   kref_get(&hr->hot_range.refs);
+   spin_unlock(&he->lock);
+   spin_unlock(&root->lock);
+   return hr;
+   }
+   node2 = rb_next(node2);
+   }
+   node = rb_next(node);
+   spin_unlock(&he->lock);
+   }
+   spin_unlock(&root->lock);
+   return NULL;
+}
+
+static void *hot_range_seq_next(struct seq_file *seq,
+   void *v, loff_t *pos)
+{
+   struct rb_node *node, *node2;
+   struct hot_comm_item *ci;
+   struct hot_inode_item *he;
+   struct hot_range_item *hr_next = NULL, *hr = v;
+
+   spin_lock(&hr->hot_range.lock);
+   (*pos)++;
+   node2 = rb_next(&hr->hot_range.rb_node);
+   if (node2)
+   goto next;
+
+   node = rb_next(&hr->hot_inode->hot_inode.rb_node);
+   if (node) {
+   ci = rb_entry(node, struct hot_comm_item, rb_node);
+   he = container_of(ci, struct hot_inode_item, hot_inode);
+   node2 = rb_first(&he->hot_range_tree.map);
+   if (node2) {
+next:
+   ci = rb_entry(node2,
+   struct hot_comm_item, rb_node);
+   hr_next = container_of(ci,
+   struct hot_range_item, hot_range);
+   kref_get(&hr_next->hot_range.refs);
+   }
+   }
+   spin_unlock(&hr->hot_range.lock);
+
+   hot_range_item_put(hr);
+   return hr_next;
+}
+
+static void hot_range_seq_stop(struct seq_file *seq, void *v)
+{
+   struct hot_range_item *hr = v;
+
+   if (hr)
+   hot_range_item_put(hr);
+}
+
+static int hot_range_seq_show(struct seq_file *seq, void *v)
+{
+   struct hot_range_item *hr = v;
+   struct hot_inode_item *he = hr->hot_inode;
+   struct hot_freq_data *freq_data = &hr->hot_range.hot_freq_data;
+   struct hot_info *root = container_of(he->hot_inode_tree,
+   struct hot_info, hot

[PATCH v1 hot_track 11/16] vfs: register one shrinker

2012-11-16 Thread zwu . kernel

From: Zhi Yong Wu 

  Register a shrinker to control the amount of
memory that is used in tracking hot regions - if we are throwing
inodes out of memory due to memory pressure, we most definitely are
going to need to reduce the amount of memory the tracking code is
using, even if it means losing useful information (i.e. the shrinker
accelerates the aging process).
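
The callback added in this patch follows the two-mode shrinker protocol visible in `hot_track_prune()`: a call with `nr_to_scan == 0` only reports how many objects are tracked, and any other value is a budget of objects to drop. Here is a simplified userspace model of that protocol, with per-bucket counters standing in for the heat map lists. All names (`struct heat_model`, `prune_buckets`, `model_shrink`) are hypothetical, and the model ignores reference counting and locking.

```c
#include <assert.h>

#define HEAT_MAP_SIZE 256	/* 1 << HEAT_MAP_BITS in the series */

/* Hypothetical model: one counter per temperature bucket. */
struct heat_model {
	int bucket[HEAT_MAP_SIZE];	/* items currently on each list */
	int total;			/* plays the role of hot_map_nr */
};

/* Drop up to nr items, coldest bucket first; return the unused budget. */
static inline int prune_buckets(struct heat_model *m, int nr)
{
	int i;

	assert(m);
	for (i = 0; i < HEAT_MAP_SIZE && nr > 0; i++) {
		while (m->bucket[i] > 0 && nr > 0) {
			m->bucket[i]--;
			m->total--;
			nr--;
		}
	}
	return nr;
}

/* nr_to_scan == 0 is a query; otherwise prune and report what is left. */
static inline int model_shrink(struct heat_model *m, int nr_to_scan)
{
	if (nr_to_scan == 0)
		return m->total;

	prune_buckets(m, nr_to_scan);
	return m->total;
}
```

Walking the buckets from index 0 upward means the coldest entries are discarded first, which is what lets the shrinker act as accelerated aging.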

Signed-off-by: Zhi Yong Wu 
---
 fs/hot_tracking.c|   61 ++
 include/linux/hot_tracking.h |1 +
 2 files changed, 62 insertions(+), 0 deletions(-)

diff --git a/fs/hot_tracking.c b/fs/hot_tracking.c
index 81fb084..8144200 100644
--- a/fs/hot_tracking.c
+++ b/fs/hot_tracking.c
@@ -630,6 +630,61 @@ err:
 }
 EXPORT_SYMBOL_GPL(hot_cache_init);
 
+static int hot_track_prune_map(struct hot_map_head *map_head,
+   bool type, int nr)
+{
+   struct hot_comm_item *node;
+   int i;
+
+   for (i = 0; i < HEAT_MAP_SIZE; i++) {
+   while (!list_empty(&(map_head + i)->node_list)) {
+   if (nr-- <= 0)
+   break;
+
+   node = list_first_entry(&(map_head + i)->node_list,
+   struct hot_comm_item, n_list);
+   if (type) {
+   struct hot_inode_item *hot_inode =
+   container_of(node,
+   struct hot_inode_item, hot_inode);
+   hot_inode_item_put(hot_inode);
+   } else {
+   struct hot_range_item *hot_range =
+   container_of(node,
+   struct hot_range_item, hot_range);
+   hot_range_item_put(hot_range);
+   }
+   }
+   }
+
+   return nr;
+}
+
+/* The shrinker callback function */
+static int hot_track_prune(struct shrinker *shrink,
+   struct shrink_control *sc)
+{
+   struct hot_info *root =
+   container_of(shrink, struct hot_info, hot_shrink);
+   int ret;
+
+   if (sc->nr_to_scan == 0)
+   return root->hot_map_nr;
+
+   if (!(sc->gfp_mask & __GFP_FS))
+   return -1;
+
+   ret = hot_track_prune_map(root->heat_range_map,
+   false, sc->nr_to_scan);
+   if (ret > 0)
+   ret = hot_track_prune_map(root->heat_inode_map,
+   true, ret);
+   if (ret > 0)
+   root->hot_map_nr -= (sc->nr_to_scan - ret);
+
+   return root->hot_map_nr;
+}
+
 /*
  * Main function to update access frequency from read/writepage(s) hooks
  */
@@ -726,6 +781,11 @@ int hot_track_init(struct super_block *sb)
queue_delayed_work(root->update_wq, &root->update_work,
msecs_to_jiffies(HEAT_UPDATE_DELAY * MSEC_PER_SEC));
 
+   /* Register a shrinker callback */
+   root->hot_shrink.shrink = hot_track_prune;
+   root->hot_shrink.seeks = DEFAULT_SEEKS;
+   register_shrinker(&root->hot_shrink);
+
sb->s_hot_root = root;
 
printk(KERN_INFO "VFS: Turning on hot data tracking\n");
@@ -744,6 +804,7 @@ void hot_track_exit(struct super_block *sb)
 {
struct hot_info *root = sb->s_hot_root;
 
+   unregister_shrinker(&root->hot_shrink);
cancel_delayed_work_sync(&root->update_work);
destroy_workqueue(root->update_wq);
hot_map_exit(root);
diff --git a/include/linux/hot_tracking.h b/include/linux/hot_tracking.h
index f73111e..24e91ff 100644
--- a/include/linux/hot_tracking.h
+++ b/include/linux/hot_tracking.h
@@ -110,6 +110,7 @@ struct hot_info {
struct workqueue_struct *update_wq;
struct delayed_work update_work;
struct hot_type *hot_type;
+   struct shrinker hot_shrink;
 };
 
 extern void __init hot_cache_init(void);
-- 
1.7.6.5


[PATCH v1 hot_track 01/16] vfs: introduce some data structures

2012-11-16 Thread zwu . kernel

From: Zhi Yong Wu 

  Define one root structure, hot_info, which is hooked
into super_block and holds the radix tree root, the
hash list root and some other information.
  Add the hot_inode_tree struct to keep track of
frequently accessed files, keyed by {inode, offset}.
The trees contain hot_inode_items representing those files
and ranges.
  Having these trees means that the VFS can quickly determine the
temperature of some data by doing some calculations on the
hot_freq_data struct that hangs off of the tree item.
  Define two items, hot_inode_item and hot_range_item.
The former represents one tracked file and keeps track
of its access frequency and the tree of ranges in this
file, while the latter represents a file range of one inode.
  Each of the two structures contains a hot_freq_data
struct with its frequency of access metrics (number of
{reads,writes}, last {read,write} time, frequency of
{reads,writes}).
  Also, each hot_inode_item contains one hot_range_tree
struct which is keyed by {inode, offset, length}
and used to keep track of all the ranges in this file.
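
The nesting described above can be sketched in userspace as follows. The structs are deliberately reduced: the kernel versions also carry an rb_node, a spinlock, a kref and list linkage, and the field types here are assumptions. `container_of` is a local stand-in for the kernel macro and `comm_to_inode` is a hypothetical helper showing how code that only holds the common part gets back to the enclosing item.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Access metrics shared by inode items and range items (reduced). */
struct hot_freq_data {
	uint32_t nr_reads;
	uint32_t nr_writes;
	uint32_t last_temp;
};

/* Common part embedded in both item types (reduced). */
struct hot_comm_item {
	struct hot_freq_data hot_freq_data;
};

/* One tracked file. */
struct hot_inode_item {
	struct hot_comm_item hot_inode;
	uint64_t i_ino;
};

/* One range of a tracked file. */
struct hot_range_item {
	struct hot_comm_item hot_range;
	struct hot_inode_item *hot_inode;	/* owning file */
	int64_t start;
	uint64_t len;
};

/* Recover the enclosing inode item from its embedded common part. */
static inline struct hot_inode_item *comm_to_inode(struct hot_comm_item *ci)
{
	assert(ci);
	return container_of(ci, struct hot_inode_item, hot_inode);
}
```

Embedding one common struct is what allows the tree walkers and list code to be written once for both files and ranges.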

Signed-off-by: Zhi Yong Wu 
---
 fs/Makefile  |2 +-
 fs/dcache.c  |2 +
 fs/hot_tracking.c|  109 ++
 fs/hot_tracking.h|   22 
 include/linux/hot_tracking.h |   79 ++
 5 files changed, 213 insertions(+), 1 deletions(-)
 create mode 100644 fs/hot_tracking.c
 create mode 100644 fs/hot_tracking.h
 create mode 100644 include/linux/hot_tracking.h

diff --git a/fs/Makefile b/fs/Makefile
index 1d7af79..f966dea 100644
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -11,7 +11,7 @@ obj-y :=  open.o read_write.o file_table.o super.o \
attr.o bad_inode.o file.o filesystems.o namespace.o \
seq_file.o xattr.o libfs.o fs-writeback.o \
pnode.o drop_caches.o splice.o sync.o utimes.o \
-   stack.o fs_struct.o statfs.o
+   stack.o fs_struct.o statfs.o hot_tracking.o
 
 ifeq ($(CONFIG_BLOCK),y)
 obj-y +=   buffer.o bio.o block_dev.o direct-io.o mpage.o ioprio.o
diff --git a/fs/dcache.c b/fs/dcache.c
index 3a463d0..7d5be16 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -37,6 +37,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "internal.h"
 #include "mount.h"
 
@@ -3172,4 +3173,5 @@ void __init vfs_caches_init(unsigned long mempages)
mnt_init();
bdev_cache_init();
chrdev_init();
+   hot_cache_init();
 }
diff --git a/fs/hot_tracking.c b/fs/hot_tracking.c
new file mode 100644
index 000..ef7ff09
--- /dev/null
+++ b/fs/hot_tracking.c
@@ -0,0 +1,109 @@
+/*
+ * fs/hot_tracking.c
+ *
+ * Copyright (C) 2012 IBM Corp. All rights reserved.
+ * Written by Zhi Yong Wu 
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public
+ * License v2 as published by the Free Software Foundation.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "hot_tracking.h"
+
+/* kmem_cache pointers for slab caches */
+static struct kmem_cache *hot_inode_item_cachep __read_mostly;
+static struct kmem_cache *hot_range_item_cachep __read_mostly;
+
+/*
+ * Initialize the inode tree. Should be called for each new inode
+ * access or other user of the hot_inode interface.
+ */
+static void hot_inode_tree_init(struct hot_info *root)
+{
+   root->hot_inode_tree.map = RB_ROOT;
+   spin_lock_init(&root->lock);
+}
+
+/*
+ * Initialize the hot range tree. Should be called for each new inode
+ * access or other user of the hot_range interface.
+ */
+void hot_range_tree_init(struct hot_inode_item *he)
+{
+   he->hot_range_tree.map = RB_ROOT;
+   spin_lock_init(&he->lock);
+}
+
+/*
+ * Initialize a new hot_range_item structure. The new structure is
+ * returned with a reference count of one and needs to be
+ * freed using free_range_item()
+ */
+static void hot_range_item_init(struct hot_range_item *hr, loff_t start,
+   struct hot_inode_item *he)
+{
+   hr->start = start;
+   hr->len = RANGE_SIZE;
+   hr->hot_inode = he;
+   kref_init(&hr->hot_range.refs);
+   spin_lock_init(&hr->hot_range.lock);
+   hr->hot_range.hot_freq_data.avg_delta_reads = (u64) -1;
+   hr->hot_range.hot_freq_data.avg_delta_writes = (u64) -1;
+   hr->hot_range.hot_freq_data.flags = FREQ_DATA_TYPE_RANGE;
+}
+
+/*
+ * Initialize a new hot_inode_item structure. The new structure is
+ * returned with a reference count of one and needs to be
+ * freed using hot_free_inode_item()
+ */
+static void hot_inode_item_init(struct hot_inode_item *he,
+   u64 ino,
+   struct hot_rb_tree *hot_inode_tree)
+{
+   he->i_ino = ino;
+   he->hot_inode_tree = hot_inode_tree;
+   kref_init(&he->hot_inode.refs);
+   spi

[PATCH v1 hot_track 04/16] vfs: add two map arrays

2012-11-16 Thread zwu . kernel

From: Zhi Yong Wu 

  Add two map arrays, each of which contains
a set of lists and is used to efficiently
look up the data temperature of a file or its
ranges.
  In each list of the map arrays, the list nodes
keep track of temperature info.
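
Each array has HEAT_MAP_SIZE (256) list heads, one per temperature level. How a temperature is mapped to a slot is not part of this patch; the sketch below assumes the slot is the top HEAT_MAP_BITS bits of a 32-bit temperature, and `heat_bucket` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

#define HEAT_MAP_BITS 8
#define HEAT_MAP_SIZE (1 << HEAT_MAP_BITS)

/*
 * Assumption for illustration: the bucket index is the top
 * HEAT_MAP_BITS bits of a 32-bit temperature value.
 */
static inline unsigned int heat_bucket(uint32_t temp)
{
	unsigned int idx = temp >> (32 - HEAT_MAP_BITS);

	assert(idx < (unsigned int)HEAT_MAP_SIZE);
	return idx;
}
```

With such a mapping, moving an item between lists is only needed when its temperature crosses a bucket boundary, not on every access.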

Signed-off-by: Zhi Yong Wu 
---
 fs/hot_tracking.c|   60 ++
 include/linux/hot_tracking.h |   16 +++
 2 files changed, 76 insertions(+), 0 deletions(-)

diff --git a/fs/hot_tracking.c b/fs/hot_tracking.c
index 6d396fe..bd2c353 100644
--- a/fs/hot_tracking.c
+++ b/fs/hot_tracking.c
@@ -58,6 +58,7 @@ static void hot_range_item_init(struct hot_range_item *hr, loff_t start,
hr->hot_inode = he;
kref_init(&hr->hot_range.refs);
spin_lock_init(&hr->hot_range.lock);
+   INIT_LIST_HEAD(&hr->hot_range.n_list);
hr->hot_range.hot_freq_data.avg_delta_reads = (u64) -1;
hr->hot_range.hot_freq_data.avg_delta_writes = (u64) -1;
hr->hot_range.hot_freq_data.flags = FREQ_DATA_TYPE_RANGE;
@@ -89,6 +90,16 @@ static void hot_range_item_free(struct kref *kref)
struct hot_comm_item, refs);
struct hot_range_item *hr = container_of(comm_item,
struct hot_range_item, hot_range);
+   struct hot_info *root = container_of(
+   hr->hot_inode->hot_inode_tree,
+   struct hot_info, hot_inode_tree);
+
+   spin_lock(&hr->hot_range.lock);
+   if (!list_empty(&hr->hot_range.n_list)) {
+   list_del_init(&hr->hot_range.n_list);
+   root->hot_map_nr--;
+   }
+   spin_unlock(&hr->hot_range.lock);
 
rb_erase(&hr->hot_range.rb_node,
&hr->hot_inode->hot_range_tree.map);
@@ -128,6 +139,15 @@ static void hot_inode_item_free(struct kref *kref)
struct hot_comm_item, refs);
struct hot_inode_item *he = container_of(comm_item,
struct hot_inode_item, hot_inode);
+   struct hot_info *root = container_of(he->hot_inode_tree,
+   struct hot_info, hot_inode_tree);
+
+   spin_lock(&he->hot_inode.lock);
+   if (!list_empty(&he->hot_inode.n_list)) {
+   list_del_init(&he->hot_inode.n_list);
+   root->hot_map_nr--;
+   }
+   spin_unlock(&he->hot_inode.lock);
 
hot_range_tree_free(he);
rb_erase(&he->hot_inode.rb_node, &he->hot_inode_tree->map);
@@ -292,6 +312,44 @@ static void hot_freq_data_update(struct hot_freq_data *freq_data, bool write)
 }
 
 /*
+ * Initialize inode and range map info.
+ */
+static void hot_map_init(struct hot_info *root)
+{
+   int i;
+   for (i = 0; i < HEAT_MAP_SIZE; i++) {
+   INIT_LIST_HEAD(&root->heat_inode_map[i].node_list);
+   INIT_LIST_HEAD(&root->heat_range_map[i].node_list);
+   root->heat_inode_map[i].temp = i;
+   root->heat_range_map[i].temp = i;
+   }
+}
+
+static void hot_map_list_free(struct list_head *node_list,
+   struct hot_info *root)
+{
+   struct list_head *pos, *next;
+   struct hot_comm_item *node;
+
+   list_for_each_safe(pos, next, node_list) {
+   node = list_entry(pos, struct hot_comm_item, n_list);
+   list_del_init(&node->n_list);
+   root->hot_map_nr--;
+   }
+
+}
+
+/* Free inode and range map info */
+static void hot_map_exit(struct hot_info *root)
+{
+   int i;
+   for (i = 0; i < HEAT_MAP_SIZE; i++) {
+   hot_map_list_free(&root->heat_inode_map[i].node_list, root);
+   hot_map_list_free(&root->heat_range_map[i].node_list, root);
+   }
+}
+
+/*
  * Initialize kmem cache for hot_inode_item and hot_range_item.
  */
 void __init hot_cache_init(void)
@@ -382,6 +440,7 @@ int hot_track_init(struct super_block *sb)
}
 
hot_inode_tree_init(root);
+   hot_map_init(root);
 
sb->s_hot_root = root;
 
@@ -395,6 +454,7 @@ void hot_track_exit(struct super_block *sb)
 {
struct hot_info *root = sb->s_hot_root;
 
+   hot_map_exit(root);
hot_inode_tree_exit(root);
sb->s_hot_root = NULL;
kfree(root);
diff --git a/include/linux/hot_tracking.h b/include/linux/hot_tracking.h
index b9992c0..34a0530 100644
--- a/include/linux/hot_tracking.h
+++ b/include/linux/hot_tracking.h
@@ -20,6 +20,9 @@
 #include 
 #include 
 
+#define HEAT_MAP_BITS 8
+#define HEAT_MAP_SIZE (1 << HEAT_MAP_BITS)
+
 struct hot_rb_tree {
struct rb_root map;
spinlock_t lock;
@@ -41,12 +44,19 @@ struct hot_freq_data {
u32 last_temp;
 };
 
+/* List heads in hot map array */
+struct hot_map_head {
+   struct list_head node_list;
+   u8 temp;
+};
+
 /* The common info for both following structures */
 struct hot_comm_item {
struct rb_node rb_node; /* rbtree index */
struct hot_freq_data hot_freq_data;  /* frequency data */
spinlock_t lock; /* protects object data */
str

[PATCH v1 hot_track 08/16] vfs: add aging function

2012-11-16 Thread zwu . kernel

From: Zhi Yong Wu 

Signed-off-by: Zhi Yong Wu 
---
 fs/hot_tracking.c |   49 +
 fs/hot_tracking.h |6 ++
 2 files changed, 55 insertions(+), 0 deletions(-)

diff --git a/fs/hot_tracking.c b/fs/hot_tracking.c
index 446fbd4..304028d 100644
--- a/fs/hot_tracking.c
+++ b/fs/hot_tracking.c
@@ -385,6 +385,24 @@ static u32 hot_temp_calc(struct hot_freq_data *freq_data)
return result;
 }
 
+static bool hot_is_obsolete(struct hot_freq_data *freq_data)
+{
+   int ret = 0;
+   struct timespec ckt = current_kernel_time();
+
+   u64 cur_time = timespec_to_ns(&ckt);
+   u64 last_read_ns =
+   (cur_time - timespec_to_ns(&freq_data->last_read_time));
+   u64 last_write_ns =
+   (cur_time - timespec_to_ns(&freq_data->last_write_time));
+   u64 kick_ns =  TIME_TO_KICK * NSEC_PER_SEC;
+
+   if ((last_read_ns > kick_ns) && (last_write_ns > kick_ns))
+   ret = 1;
+
+   return ret;
+}
+
 /*
  * Calculate a new temperature and, if necessary,
  * move the list_head corresponding to this inode or range
@@ -452,6 +470,37 @@ static void hot_map_update(struct hot_freq_data *freq_data,
}
 }
 
+/* Update temperatures for each range item for aging purposes */
+static void hot_range_update(struct hot_inode_item *he,
+   struct hot_info *root)
+{
+   struct rb_node *node;
+   struct hot_comm_item *ci;
+   struct hot_range_item *hr;
+   bool obsolete;
+
+   spin_lock(&he->lock);
+   node = rb_first(&he->hot_range_tree.map);
+   while (node) {
+   ci = rb_entry(node, struct hot_comm_item, rb_node);
+   hr = container_of(ci, struct hot_range_item, hot_range);
+   kref_get(&hr->hot_range.refs);
+   hot_map_update(&hr->hot_range.hot_freq_data, root);
+
+   spin_lock(&hr->hot_range.lock);
+   obsolete = hot_is_obsolete(
+   &hr->hot_range.hot_freq_data);
+   spin_unlock(&hr->hot_range.lock);
+
+   node = rb_next(node);
+
+   hot_range_item_put(hr);
+   if (obsolete)
+   hot_range_item_put(hr);
+   }
+   spin_unlock(&he->lock);
+}
+
 /*
  * Initialize inode and range map info.
  */
diff --git a/fs/hot_tracking.h b/fs/hot_tracking.h
index f33066f..46d068a 100644
--- a/fs/hot_tracking.h
+++ b/fs/hot_tracking.h
@@ -24,6 +24,12 @@
 #define RANGE_SIZE (1 << RANGE_BITS)
 #define FREQ_POWER 4
 
+/*
+ * time to quit keeping track of
+ * tracking data (seconds)
+ */
+#define TIME_TO_KICK 300
+
 /* NRR/NRW heat unit = 2^X accesses */
 #define NRR_MULTIPLIER_POWER 20 /* NRR - number of reads since mount */
 #define NRR_COEFF_POWER 0
-- 
1.7.6.5


[PATCH 1/2] thp: fix update_mmu_cache_pmd() calls

2012-11-16 Thread Kirill A. Shutemov

From: "Kirill A. Shutemov" 

update_mmu_cache_pmd() takes a pointer to pmd_t as its third argument, not a pmd_t.

mm/huge_memory.c: In function 'do_huge_pmd_numa_page':
mm/huge_memory.c:825:2: error: incompatible type for argument 3 of 'update_mmu_cache_pmd'
In file included from include/linux/mm.h:44:0,
 from mm/huge_memory.c:8:
arch/mips/include/asm/pgtable.h:385:20: note: expected 'struct pmd_t *' but argument is of type 'pmd_t'
mm/huge_memory.c:895:2: error: incompatible type for argument 3 of 'update_mmu_cache_pmd'
In file included from include/linux/mm.h:44:0,
 from mm/huge_memory.c:8:
arch/mips/include/asm/pgtable.h:385:20: note: expected 'struct pmd_t *' but argument is of type 'pmd_t'

Signed-off-by: Kirill A. Shutemov 
---
 mm/huge_memory.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 4489e16..2401a16 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -886,7 +886,7 @@ fixup:
/* change back to regular protection */
entry = pmd_modify(entry, vma->vm_page_prot);
set_pmd_at(mm, haddr, pmd, entry);
-   update_mmu_cache_pmd(vma, address, entry);
+   update_mmu_cache_pmd(vma, address, &entry);
 
 unlock:
spin_unlock(&mm->page_table_lock);
@@ -956,7 +956,7 @@ migrate:
page_add_new_anon_rmap(new_page, vma, haddr);
 
set_pmd_at(mm, haddr, pmd, entry);
-   update_mmu_cache_pmd(vma, address, entry);
+   update_mmu_cache_pmd(vma, address, &entry);
page_remove_rmap(page);
spin_unlock(&mm->page_table_lock);
 
-- 
1.7.11.7


[PATCH 2/2] x86: convert update_mmu_cache() and update_mmu_cache_pmd() to functions

2012-11-16 Thread Kirill A. Shutemov

From: "Kirill A. Shutemov" 

Converting the macros to functions exposes type problems on x86 before
changes are integrated and trigger build failures on other architectures.

Signed-off-by: Kirill A. Shutemov 
---
 arch/x86/include/asm/pgtable.h| 12 
 arch/x86/include/asm/pgtable_32.h |  7 ---
 arch/x86/include/asm/pgtable_64.h |  3 ---
 3 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index a984cf9..ec08b47 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -775,6 +775,18 @@ static inline void clone_pgd_range(pgd_t *dst, pgd_t *src, int count)
memcpy(dst, src, count * sizeof(pgd_t));
 }
 
+/*
+ * The x86 doesn't have any external MMU info: the kernel page
+ * tables contain all the necessary information.
+ */
+static inline void update_mmu_cache(struct vm_area_struct *vma,
+   unsigned long addr, pte_t *ptep)
+{
+}
+static inline void update_mmu_cache_pmd(struct vm_area_struct *vma,
+   unsigned long addr, pmd_t *pmd)
+{
+}
 
 #include 
 #endif /* __ASSEMBLY__ */
diff --git a/arch/x86/include/asm/pgtable_32.h b/arch/x86/include/asm/pgtable_32.h
index 8faa215..9ee3221 100644
--- a/arch/x86/include/asm/pgtable_32.h
+++ b/arch/x86/include/asm/pgtable_32.h
@@ -66,13 +66,6 @@ do { \
__flush_tlb_one((vaddr));   \
 } while (0)
 
-/*
- * The i386 doesn't have any external MMU info: the kernel page
- * tables contain all the necessary information.
- */
-#define update_mmu_cache(vma, address, ptep) do { } while (0)
-#define update_mmu_cache_pmd(vma, address, pmd) do { } while (0)
-
 #endif /* !__ASSEMBLY__ */
 
 /*
diff --git a/arch/x86/include/asm/pgtable_64.h b/arch/x86/include/asm/pgtable_64.h
index 47356f9..615b0c7 100644
--- a/arch/x86/include/asm/pgtable_64.h
+++ b/arch/x86/include/asm/pgtable_64.h
@@ -142,9 +142,6 @@ static inline int pgd_large(pgd_t pgd) { return 0; }
 #define pte_offset_map(dir, address) pte_offset_kernel((dir), (address))
 #define pte_unmap(pte) ((void)(pte))/* NOP */
 
-#define update_mmu_cache(vma, address, ptep) do { } while (0)
-#define update_mmu_cache_pmd(vma, address, pmd) do { } while (0)
-
 /* Encode and de-code a swap entry */
 #if _PAGE_BIT_FILE < _PAGE_BIT_PROTNONE
 #define SWP_TYPE_BITS (_PAGE_BIT_FILE - _PAGE_BIT_PRESENT - 1)
-- 
1.7.11.7
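
A minimal sketch of the point made in the commit message above, assuming reduced stand-in types (the pmd_t and vm_area_struct below are hypothetical simplifications, not the kernel definitions, and the _macro/_func suffixes exist only for this illustration). A no-op macro throws its arguments away before the compiler ever checks their types, while an empty static inline function still enforces its prototype:

```c
/* Hypothetical stand-ins; only the distinction pmd_t vs. pmd_t * matters. */
typedef struct { unsigned long pmd; } pmd_t;
struct vm_area_struct { int unused; };

/* Old style: the arguments vanish during preprocessing, so nothing is checked. */
#define update_mmu_cache_pmd_macro(vma, addr, pmd) do { } while (0)

/* New style: still a no-op, but the prototype is type checked at every call. */
static inline void update_mmu_cache_pmd_func(struct vm_area_struct *vma,
					     unsigned long addr, pmd_t *pmd)
{
	(void)vma;
	(void)addr;
	(void)pmd;
}

/* Exercises both variants and returns the untouched entry value. */
static unsigned long exercise_both_variants(void)
{
	struct vm_area_struct vma = { 0 };
	pmd_t entry = { 42 };

	/* Wrong argument type (pmd_t by value), yet this compiles silently. */
	update_mmu_cache_pmd_macro(&vma, 0UL, entry);

	/* The function form only accepts a pointer. */
	update_mmu_cache_pmd_func(&vma, 0UL, &entry);

	/*
	 * update_mmu_cache_pmd_func(&vma, 0UL, entry);
	 * would be rejected with "incompatible type for argument 3".
	 */
	return entry.pmd;
}
```

With the function form, the mistake fixed in patch 1/2 would have been caught on x86 builds rather than only on architectures that already used a real function.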


[PATCH v1 hot_track 07/16] vfs: add map info update function

2012-11-16 Thread zwu . kernel

From: Zhi Yong Wu 

Signed-off-by: Zhi Yong Wu 
---
 fs/hot_tracking.c |   67 +
 1 files changed, 67 insertions(+), 0 deletions(-)

diff --git a/fs/hot_tracking.c b/fs/hot_tracking.c
index 3cb14e2..446fbd4 100644
--- a/fs/hot_tracking.c
+++ b/fs/hot_tracking.c
@@ -386,6 +386,73 @@ static u32 hot_temp_calc(struct hot_freq_data *freq_data)
 }
 
 /*
+ * Calculate a new temperature and, if necessary,
+ * move the list_head corresponding to this inode or range
+ * to the proper list with the new temperature
+ */
+static void hot_map_update(struct hot_freq_data *freq_data,
+   struct hot_info *root)
+{
+   struct hot_map_head *buckets, *cur_bucket;
+   struct hot_comm_item *comm_item;
+   struct hot_inode_item *he;
+   struct hot_range_item *hr;
+   u32 temp = hot_temp_calc(freq_data);
+   u8 a_temp = (u8)hot_raw_shift((u64)temp, (32 - HEAT_MAP_BITS), false);
+   u8 b_temp = (u8)hot_raw_shift((u64)freq_data->last_temp,
+   (32 - HEAT_MAP_BITS), false);
+
+   comm_item = container_of(freq_data,
+   struct hot_comm_item, hot_freq_data);
+
+   if (freq_data->flags & FREQ_DATA_TYPE_INODE) {
+   he = container_of(comm_item,
+   struct hot_inode_item, hot_inode);
+   buckets = root->heat_inode_map;
+
+   if (he == NULL)
+   return;
+
+   spin_lock(&he->hot_inode.lock);
+   if (list_empty(&he->hot_inode.n_list) || (a_temp != b_temp)) {
+   if (!list_empty(&he->hot_inode.n_list)) {
+   list_del_init(&he->hot_inode.n_list);
+   root->hot_map_nr--;
+   }
+
+   cur_bucket = buckets + a_temp;
+   list_add_tail(&he->hot_inode.n_list,
+   &cur_bucket->node_list);
+   root->hot_map_nr++;
+   freq_data->last_temp = temp;
+   }
+   spin_unlock(&he->hot_inode.lock);
+   } else if (freq_data->flags & FREQ_DATA_TYPE_RANGE) {
+   hr = container_of(comm_item,
+   struct hot_range_item, hot_range);
+   buckets = root->heat_range_map;
+
+   if (hr == NULL)
+   return;
+
+   spin_lock(&hr->hot_range.lock);
+   if (list_empty(&hr->hot_range.n_list) || (a_temp != b_temp)) {
+   if (!list_empty(&hr->hot_range.n_list)) {
+   list_del_init(&hr->hot_range.n_list);
+   root->hot_map_nr--;
+   }
+
+   cur_bucket = buckets + a_temp;
+   list_add_tail(&hr->hot_range.n_list,
+   &cur_bucket->node_list);
+   root->hot_map_nr++;
+   freq_data->last_temp = temp;
+   }
+   spin_unlock(&hr->hot_range.lock);
+   }
+}
+
+/*
  * Initialize inode and range map info.
  */
 static void hot_map_init(struct hot_info *root)
-- 
1.7.6.5
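
A rough sketch of the bucketing rule in hot_map_update(), under two assumptions that this excerpt does not confirm: that hot_raw_shift(x, n, false) is a plain right shift by n, and that HEAT_MAP_BITS is 8. The helper names heat_bucket() and needs_rebucket() are hypothetical. The idea is that the top HEAT_MAP_BITS bits of the 32-bit temperature select the bucket, and an item is relinked only if it is not on a list yet or its bucket changed:

```c
#include <stdbool.h>
#include <stdint.h>

#define HEAT_MAP_BITS 8	/* assumed value, not shown in this patch */

/* Keep only the top HEAT_MAP_BITS bits of the temperature as bucket index. */
static uint8_t heat_bucket(uint32_t temp)
{
	return (uint8_t)(temp >> (32 - HEAT_MAP_BITS));
}

/*
 * Mirrors the condition
 *   list_empty(&item->n_list) || (a_temp != b_temp)
 * where a_temp is the new bucket and b_temp the previous one.
 */
static bool needs_rebucket(bool on_list, uint32_t last_temp, uint32_t new_temp)
{
	return !on_list || heat_bucket(new_temp) != heat_bucket(last_temp);
}
```

Small temperature changes that stay within one bucket therefore cause no list manipulation at all.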


[PATCH] drm/nouveau: fix init with agpgart-uninorth

2012-11-16 Thread Aaro Koskinen

Check that the AGP aperture can be mapped. This follows a similar change
done for Radeon (commit 365048ff, drm/radeon: AGP memory is only I/O if
the aperture can be mapped by the CPU.).

The patch fixes the following error seen on G5 iMac:

nouveau E[ DRM] failed to create kernel channel, -12

Signed-off-by: Aaro Koskinen 
---
 drivers/gpu/drm/nouveau/nouveau_bo.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c
index 35ac57f..5f0e7ef 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -1279,7 +1279,7 @@ nouveau_ttm_io_mem_reserve(struct ttm_bo_device *bdev, struct ttm_mem_reg *mem)
if (drm->agp.stat == ENABLED) {
mem->bus.offset = mem->start << PAGE_SHIFT;
mem->bus.base = drm->agp.base;
-   mem->bus.is_iomem = true;
+   mem->bus.is_iomem = !dev->agp->cant_use_aperture;
}
 #endif
break;
-- 
1.7.10.4


Re: [PATCH v2 1/3] gpio / ACPI: add ACPI support

2012-11-16 Thread Rafael J. Wysocki

On Friday, November 16, 2012 10:12:46 AM Mika Westerberg wrote:
> On Fri, Nov 16, 2012 at 10:05:49AM +0200, Mika Westerberg wrote:
> > 
> > One more thing, sometimes we want to assign the handle like in the case of
> > SPI controller driver we set the master->dev.acpi_handle. In that case we
> > can't use DEVICE_ACPI_HANDLE() as is. Should we just do something like:
> > 
> > master->dev.acpi_handle = pdev->dev.acpi_handle;
> > 
> > or should we introduce some new macro that supports this?
> 
> Or we could just drop the cast from the macro and use the same.
> 
>   #define DEVICE_ACPI_HANDLE(dev) ((dev)->acpi_handle)

Well, I'm not sure.  Perhaps it's better to add a new macro, like
ACPI_HANDLE(dev), defined as above and use it going forward (we can then
phase out the old one gradually).

However, let's leave the patches in this series as they are for now; we can
add that macro in a separate patch later.

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
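
A small sketch of why the cast matters in the macro discussion above. The structures are hypothetical reductions of the kernel ones, and the exact form of the existing DEVICE_ACPI_HANDLE() is assumed here (the thread only says it contains a cast). A cast expression is not an lvalue in C, so only the cast-free variant can appear on the left side of an assignment:

```c
typedef void *acpi_handle;	/* stand-in for the ACPICA handle type */

/* Hypothetical minimal versions of the structures involved. */
struct device { void *acpi_handle; };
struct platform_device { struct device dev; };
struct spi_master { struct device dev; };

/* Assumed current form: the cast makes the result an rvalue. */
#define DEVICE_ACPI_HANDLE(dev) ((acpi_handle)((dev)->acpi_handle))

/* Proposed form: no cast, so it can be read and assigned. */
#define ACPI_HANDLE(dev) ((dev)->acpi_handle)

/* Let the SPI master inherit the handle of its platform device. */
static void spi_master_inherit_handle(struct spi_master *master,
				      struct platform_device *pdev)
{
	ACPI_HANDLE(&master->dev) = DEVICE_ACPI_HANDLE(&pdev->dev);
	/*
	 * DEVICE_ACPI_HANDLE(&master->dev) = ...;
	 * would not compile: a cast does not yield an lvalue.
	 */
}
```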

Re: [PATCH] KVM: MMU: lazily drop large spte

2012-11-16 Thread Marcelo Tosatti

On Fri, Nov 16, 2012 at 12:46:16PM +0800, Xiao Guangrong wrote:
> On 11/16/2012 11:56 AM, Marcelo Tosatti wrote:
> > On Fri, Nov 16, 2012 at 11:39:12AM +0800, Xiao Guangrong wrote:
> >> On 11/16/2012 11:02 AM, Marcelo Tosatti wrote:
> >>> On Thu, Nov 15, 2012 at 07:17:15AM +0800, Xiao Guangrong wrote:
>  On 11/14/2012 10:37 PM, Marcelo Tosatti wrote:
> > On Tue, Nov 13, 2012 at 04:26:16PM +0800, Xiao Guangrong wrote:
> >> Hi Marcelo,
> >>
> >> On 11/13/2012 07:10 AM, Marcelo Tosatti wrote:
> >>> On Mon, Nov 05, 2012 at 05:59:26PM +0800, Xiao Guangrong wrote:
>  Do not drop large spte until it can be insteaded by small pages so 
>  that
>  the guest can happliy read memory through it
> 
>  The idea is from Avi:
>  | As I mentioned before, write-protecting a large spte is a good 
>  idea,
>  | since it moves some work from protect-time to fault-time, so it 
>  reduces
>  | jitter.  This removes the need for the return value.
> 
>  Signed-off-by: Xiao Guangrong 
>  ---
>   arch/x86/kvm/mmu.c |   34 +-
>   1 files changed, 9 insertions(+), 25 deletions(-)
> >>>
> >>> Its likely that other 4k pages are mapped read-write in the 2mb range 
> >>> covered by a read-only 2mb map. Therefore its not entirely useful to
> >>> map read-only. 
> >>>
> >>
> >> It needs a page fault to install a pte even if it is the read access.
> >> After the change, the page fault can be avoided.
> >>
> >>> Can you measure an improvement with this change?
> >>
> >> I have a test case to measure the read time which has been attached.
> >> It maps 4k pages at first (dirt-loggged), then switch to large sptes
> >> (stop dirt-logging), at the last, measure the read access time after 
> >> write
> >> protect sptes.
> >>
> >> Before: 23314111 nsAfter: 11404197 ns
> >
> > Ok, i'm concerned about cases similar to e49146dce8c3dc6f44 (with 
> > shadow),
> > that is:
> >
> > - large page must be destroyed when write protecting due to 
> > shadowed page.
> > - with shadow, it does not make sense to write protect 
> > large sptes as mentioned earlier.
> >
> 
>  This case is removed now, the code when e49146dce8c3dc6f44 was applied 
>  is:
>  |
>  |pt = sp->spt;
>  |for (i = 0; i < PT64_ENT_PER_PAGE; ++i)
>  |/* avoid RMW */
>  |if (is_writable_pte(pt[i]))
>  |update_spte(&pt[i], pt[i] & 
>  ~PT_WRITABLE_MASK);
>  |}
> 
>  The real problem in this code is it would write-protect the spte even if
>  it is not a last spte that caused the middle-level shadow page table was
>  write-protected. So e49146dce8c3dc6f44 added this code:
>  |if (sp->role.level != PT_PAGE_TABLE_LEVEL)
>  |continue;
>  |
>  was good to fix this problem.
> 
>  Now, the current code is:
>  |for (i = 0; i < PT64_ENT_PER_PAGE; ++i) {
>  |if (!is_shadow_present_pte(pt[i]) ||
>  |  !is_last_spte(pt[i], sp->role.level))
>  |continue;
>  |
>  |spte_write_protect(kvm, &pt[i], &flush, false);
>  |}
>  It only write-protect the last spte. So, it allows large spte existent.
>  (the large spte can be broken by drop_large_spte() on the page-fault 
>  path.)
> 
> > So i wonder why is this part from your patch
> >
> > -   if (level > PT_PAGE_TABLE_LEVEL &&
> > -   has_wrprotected_page(vcpu->kvm, gfn, level)) {
> > -   ret = 1;
> > -   drop_spte(vcpu->kvm, sptep);
> > -   goto done;
> > -   }
> >
> > necessary (assuming EPT is in use).
> 
>  This is safe, we change these code to:
> 
>  -if (mmu_need_write_protect(vcpu, gfn, can_unsync)) {
>  +if ((level > PT_PAGE_TABLE_LEVEL &&
>  +   has_wrprotected_page(vcpu->kvm, gfn, level)) ||
>  +  mmu_need_write_protect(vcpu, gfn, can_unsync)) {
>   pgprintk("%s: found shadow page for %llx, 
>  marking ro\n",
>    __func__, gfn);
>   ret = 1;
> 
>  The spte become read-only which can ensure the shadow gfn can not be 
>  changed.
> 
>  Btw, the origin code allows to create readonly spte under this case if 
>  !(pte_access & WRITEABBLE)
> >>>
> >>> Regarding shadow: it should be fine as long as fault path

Re: [PATCH v2 1/3] gpio / ACPI: add ACPI support

2012-11-16 Thread Rafael J. Wysocki

On Thursday, November 15, 2012 01:03:15 PM Mika Westerberg wrote:
> From: Mathias Nyman 
> 
> Add support for translating ACPI GPIO pin numbers to Linux GPIO API pins.
> Needs a gpio controller driver with the acpi handler hook set.
> 
> Drivers can use acpi_get_gpio() to translate ACPI5 GpioIO and GpioInt
> resources to Linux GPIO's.
> 
> Signed-off-by: Mathias Nyman 
> Signed-off-by: Mika Westerberg 

I wonder if anyone has objections or comments?

Linus, Grant, are you OK with this patch?

Rafael


> ---
>  drivers/gpio/Kconfig|4 
>  drivers/gpio/Makefile   |1 +
>  drivers/gpio/gpiolib-acpi.c |   56 
> +++
>  include/linux/acpi_gpio.h   |   19 +++
>  4 files changed, 80 insertions(+)
>  create mode 100644 drivers/gpio/gpiolib-acpi.c
>  create mode 100644 include/linux/acpi_gpio.h
> 
> diff --git a/drivers/gpio/Kconfig b/drivers/gpio/Kconfig
> index f11d8e3..5c9b384 100644
> --- a/drivers/gpio/Kconfig
> +++ b/drivers/gpio/Kconfig
> @@ -49,6 +49,10 @@ config OF_GPIO
>   def_bool y
>   depends on OF
>  
> +config GPIO_ACPI
> + def_bool y
> + depends on ACPI
> +
>  config DEBUG_GPIO
>   bool "Debug GPIO calls"
>   depends on DEBUG_KERNEL
> diff --git a/drivers/gpio/Makefile b/drivers/gpio/Makefile
> index 9aeed67..420dbac 100644
> --- a/drivers/gpio/Makefile
> +++ b/drivers/gpio/Makefile
> @@ -4,6 +4,7 @@ ccflags-$(CONFIG_DEBUG_GPIO)  += -DDEBUG
>  
>  obj-$(CONFIG_GPIOLIB)+= gpiolib.o devres.o
>  obj-$(CONFIG_OF_GPIO)+= gpiolib-of.o
> +obj-$(CONFIG_GPIO_ACPI)  += gpiolib-acpi.o
>  
>  # Device drivers. Generally keep list sorted alphabetically
>  obj-$(CONFIG_GPIO_GENERIC)   += gpio-generic.o
> diff --git a/drivers/gpio/gpiolib-acpi.c b/drivers/gpio/gpiolib-acpi.c
> new file mode 100644
> index 000..8ef9831
> --- /dev/null
> +++ b/drivers/gpio/gpiolib-acpi.c
> @@ -0,0 +1,56 @@
> +/*
> + * ACPI helpers for GPIO API
> + *
> + * Copyright (C) 2012, Intel Corporation
> + * Authors: Mathias Nyman 
> + *  Mika Westerberg 
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +static int acpi_gpiochip_find(struct gpio_chip *gc, void *data)
> +{
> + acpi_handle handle = data;
> +
> + if (!gc->dev)
> + return false;
> +
> + return gc->dev->acpi_handle == handle;
> +}
> +
> +/**
> + * acpi_get_gpio() - Translate ACPI GPIO pin to GPIO number usable with GPIO 
> API
> + * @path:ACPI GPIO controller full path name, (e.g. "\\_SB.GPO1")
> + * @pin: ACPI GPIO pin number (0-based, controller-relative)
> + *
> + * Returns GPIO number to use with Linux generic GPIO API, or errno error 
> value
> + */
> +
> +int acpi_get_gpio(char *path, int pin)
> +{
> + struct gpio_chip *chip;
> + acpi_handle handle;
> + acpi_status status;
> +
> + status = acpi_get_handle(NULL, path, &handle);
> + if (ACPI_FAILURE(status))
> + return -ENODEV;
> +
> + chip = gpiochip_find(handle, acpi_gpiochip_find);
> + if (!chip)
> + return -ENODEV;
> +
> + if (!gpio_is_valid(chip->base + pin))
> + return -EINVAL;
> +
> + return chip->base + pin;
> +}
> +EXPORT_SYMBOL_GPL(acpi_get_gpio);
> diff --git a/include/linux/acpi_gpio.h b/include/linux/acpi_gpio.h
> new file mode 100644
> index 000..91615a3
> --- /dev/null
> +++ b/include/linux/acpi_gpio.h
> @@ -0,0 +1,19 @@
> +#ifndef _LINUX_ACPI_GPIO_H_
> +#define _LINUX_ACPI_GPIO_H_
> +
> +#include 
> +
> +#ifdef CONFIG_GPIO_ACPI
> +
> +int acpi_get_gpio(char *path, int pin);
> +
> +#else /* CONFIG_GPIO_ACPI */
> +
> +static inline int acpi_get_gpio(char *path, int pin)
> +{
> + return -ENODEV;
> +}
> +
> +#endif /* CONFIG_GPIO_ACPI */
> +
> +#endif /* _LINUX_ACPI_GPIO_H_ */
> 
-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
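
A userspace model of the translation done by acpi_get_gpio() in the quoted patch, given as a sketch only. The type gpio_chip_model and the function model_get_gpio() are hypothetical. The real code resolves the path with acpi_get_handle() and searches with gpiochip_find(); here a plain array scan stands in for both, and a range check against an assumed ngpio field replaces gpio_is_valid():

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical reduction of struct gpio_chip to what the lookup needs. */
struct gpio_chip_model {
	const void *handle;	/* ACPI handle the controller was bound to */
	int base;		/* first global GPIO number of this chip */
	int ngpio;		/* number of pins on this chip */
};

/*
 * Translate a controller-relative pin into a global GPIO number.
 * Returns -ENODEV if no chip owns the handle and -EINVAL if the pin
 * is outside the chip's range.
 */
static int model_get_gpio(const struct gpio_chip_model *chips, size_t nchips,
			  const void *handle, int pin)
{
	size_t i;

	for (i = 0; i < nchips; i++) {
		if (chips[i].handle != handle)
			continue;
		if (pin < 0 || pin >= chips[i].ngpio)
			return -EINVAL;
		return chips[i].base + pin;
	}
	return -ENODEV;
}
```

The returned number is what a driver would then pass to the generic GPIO calls.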

Re: [PATCH v2 2/3] spi / ACPI: add ACPI enumeration support

2012-11-16 Thread Rafael J. Wysocki

On Thursday, November 15, 2012 01:03:16 PM Mika Westerberg wrote:
> ACPI 5 introduced SPISerialBus resource that allows us to enumerate and
> configure the SPI slave devices behind the SPI controller. This patch adds
> support for this to the SPI core.
> 
> In addition we bind ACPI nodes to SPI devices. This makes it possible for
> the slave drivers to get the ACPI handle for further configuration.
> 
> Signed-off-by: Mika Westerberg 

Does anyone have objections or comments?  Do we need to change anything
in this patch?

Grant? Linus? Bjorn?

Rafael


> ---
>  drivers/spi/spi.c |  201 
> -
>  1 file changed, 200 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c
> index 84c2861..e39a484 100644
> --- a/drivers/spi/spi.c
> +++ b/drivers/spi/spi.c
> @@ -35,6 +35,8 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
>  
>  static void spidev_release(struct device *dev)
>  {
> @@ -93,6 +95,10 @@ static int spi_match_device(struct device *dev, struct 
> device_driver *drv)
>   if (of_driver_match_device(dev, drv))
>   return 1;
>  
> + /* Then try ACPI */
> + if (acpi_driver_match_device(dev, drv))
> + return 1;
> +
>   if (sdrv->id_table)
>   return !!spi_match_id(sdrv->id_table, spi);
>  
> @@ -888,6 +894,196 @@ static void of_register_spi_devices(struct spi_master 
> *master)
>  static void of_register_spi_devices(struct spi_master *master) { }
>  #endif
>  
> +#ifdef CONFIG_ACPI
> +static int acpi_spi_add_resource(struct acpi_resource *ares, void *data)
> +{
> + struct acpi_resource_spi_serialbus *sb;
> + struct spi_device *spi = data;
> +
> + if (ares->type != ACPI_RESOURCE_TYPE_SERIAL_BUS)
> + return 0;
> +
> + sb = &ares->data.spi_serial_bus;
> + if (sb->type != ACPI_RESOURCE_SERIAL_TYPE_SPI)
> + return 0;
> +
> + spi->chip_select = sb->device_selection;
> + spi->max_speed_hz = sb->connection_speed;
> +
> + if (sb->clock_phase == ACPI_SPI_SECOND_PHASE)
> + spi->mode |= SPI_CPHA;
> + if (sb->clock_polarity == ACPI_SPI_START_HIGH)
> + spi->mode |= SPI_CPOL;
> + if (sb->device_polarity == ACPI_SPI_ACTIVE_HIGH)
> + spi->mode |= SPI_CS_HIGH;
> +
> + return 1;
> +}
> +
> +static acpi_status acpi_spi_add_device(acpi_handle handle, u32 level,
> +void *data, void **return_value)
> +{
> + struct spi_master *master = data;
> + struct resource_list_entry *rentry;
> + struct list_head resource_list;
> + struct acpi_device *adev;
> + struct spi_device *spi;
> + int ret;
> +
> + if (acpi_bus_get_device(handle, &adev))
> + return AE_OK;
> + if (acpi_bus_get_status(adev) || !adev->status.present)
> + return AE_OK;
> +
> + spi = spi_alloc_device(master);
> + if (!spi) {
> + dev_err(&master->dev, "failed to allocate SPI device for %s\n",
> + dev_name(&adev->dev));
> + return AE_NO_MEMORY;
> + }
> +
> + INIT_LIST_HEAD(&resource_list);
> + ret = acpi_dev_get_resources(adev, &resource_list,
> +  acpi_spi_add_resource, spi);
> + if (ret < 0)
> + goto fail_put_dev;
> +
> + list_for_each_entry(rentry, &resource_list, node) {
> + struct resource *r = &rentry->res;
> +
> + if (resource_type(r) == IORESOURCE_IRQ) {
> + spi->irq = r->start;
> + break;
> + }
> + }
> +
> + acpi_dev_free_resource_list(&resource_list);
> +
> + if (!spi->max_speed_hz)
> + goto fail_put_dev;
> +
> + strlcpy(spi->modalias, dev_name(&adev->dev), sizeof(spi->modalias));
> + if (spi_add_device(spi)) {
> + dev_err(&master->dev, "failed to add SPI device %s from ACPI\n",
> + dev_name(&adev->dev));
> + goto fail_put_dev;
> + }
> +
> + return AE_OK;
> +
> +fail_put_dev:
> + spi_dev_put(spi);
> +
> + return AE_OK;
> +}
> +
> +static void acpi_register_spi_devices(struct spi_master *master)
> +{
> + acpi_status status;
> + acpi_handle handle;
> +
> + handle = master->dev.acpi_handle;
> + if (!handle)
> + return;
> +
> + status = acpi_walk_namespace(ACPI_TYPE_DEVICE, handle, 1,
> +  acpi_spi_add_device, NULL,
> +  master, NULL);
> + if (ACPI_FAILURE(status))
> + dev_warn(&master->dev, "failed to enumerate SPI slaves\n");
> +}
> +
> +struct acpi_spi_find {
> + acpi_handle handle;
> + u16 chip_select;
> + bool found;
> +};
> +
> +static int acpi_spi_find_child_address(struct acpi_resource *ares, void 
> *data)
> +{
> + struct acpi_resource_spi_serialbus *sb;
> + struct acpi_spi_find *spi_find = data;
> +
> + if (a
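
A sketch of the mode mapping performed by acpi_spi_add_resource() in the quoted patch. The constants are redefined locally with the values I believe the kernel and ACPICA headers use, and spi_serialbus_model is a hypothetical reduction of struct acpi_resource_spi_serialbus; treat both as assumptions:

```c
#include <stdint.h>

/* SPI mode bits, redefined here for a standalone build. */
#define SPI_CPHA	0x01
#define SPI_CPOL	0x02
#define SPI_CS_HIGH	0x04

/* Assumed encodings of the SPISerialBus descriptor fields. */
enum { ACPI_SPI_FIRST_PHASE = 0, ACPI_SPI_SECOND_PHASE = 1 };
enum { ACPI_SPI_START_LOW = 0, ACPI_SPI_START_HIGH = 1 };
enum { ACPI_SPI_ACTIVE_LOW = 0, ACPI_SPI_ACTIVE_HIGH = 1 };

/* Hypothetical subset of the descriptor. */
struct spi_serialbus_model {
	uint8_t clock_phase;
	uint8_t clock_polarity;
	uint8_t device_polarity;
};

/* Build the spi->mode bitmask the same way the patch does. */
static uint8_t spi_mode_from_acpi(const struct spi_serialbus_model *sb)
{
	uint8_t mode = 0;

	if (sb->clock_phase == ACPI_SPI_SECOND_PHASE)
		mode |= SPI_CPHA;
	if (sb->clock_polarity == ACPI_SPI_START_HIGH)
		mode |= SPI_CPOL;
	if (sb->device_polarity == ACPI_SPI_ACTIVE_HIGH)
		mode |= SPI_CS_HIGH;
	return mode;
}
```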

Re: [PATCH v2 3/3] i2c / ACPI: add ACPI enumeration support

2012-11-16 Thread Rafael J. Wysocki

On Thursday, November 15, 2012 01:03:17 PM Mika Westerberg wrote:
> ACPI 5 introduced I2cSerialBus resource that makes it possible to enumerate
> and configure the I2C slave devices behind the I2C controller. This patch
> adds helper functions to support I2C slave enumeration.
> 
> An ACPI enabled I2C controller driver only needs to call 
> acpi_i2c_register_devices()
> in order to get its slave devices enumerated, created and bound to the
> corresponding ACPI handle.
> 
> Signed-off-by: Mika Westerberg 

Are there any objections to this patch, or comments on it?

Wolfram? Jean? Ben?

Rafael


> ---
>  drivers/acpi/Kconfig |6 ++
>  drivers/acpi/Makefile|1 +
>  drivers/acpi/acpi_i2c.c  |  212 
> ++
>  drivers/i2c/i2c-core.c   |   10 +++
>  include/linux/acpi_i2c.h |   27 ++
>  5 files changed, 256 insertions(+)
>  create mode 100644 drivers/acpi/acpi_i2c.c
>  create mode 100644 include/linux/acpi_i2c.h
> 
> diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig
> index 119d58d..0300bf6 100644
> --- a/drivers/acpi/Kconfig
> +++ b/drivers/acpi/Kconfig
> @@ -181,6 +181,12 @@ config ACPI_DOCK
> This driver supports ACPI-controlled docking stations and removable
> drive bays such as the IBM Ultrabay and the Dell Module Bay.
>  
> +config ACPI_I2C
> + def_tristate I2C
> + depends on I2C
> + help
> +   ACPI I2C enumeration support.
> +
>  config ACPI_PROCESSOR
>   tristate "Processor"
>   select THERMAL
> diff --git a/drivers/acpi/Makefile b/drivers/acpi/Makefile
> index 7289828..2a4502b 100644
> --- a/drivers/acpi/Makefile
> +++ b/drivers/acpi/Makefile
> @@ -70,6 +70,7 @@ obj-$(CONFIG_ACPI_HED)  += hed.o
>  obj-$(CONFIG_ACPI_EC_DEBUGFS)+= ec_sys.o
>  obj-$(CONFIG_ACPI_CUSTOM_METHOD)+= custom_method.o
>  obj-$(CONFIG_ACPI_BGRT)  += bgrt.o
> +obj-$(CONFIG_ACPI_I2C)   += acpi_i2c.o
>  
>  # processor has its own "processor." module_param namespace
>  processor-y  := processor_driver.o processor_throttling.o
> diff --git a/drivers/acpi/acpi_i2c.c b/drivers/acpi/acpi_i2c.c
> new file mode 100644
> index 000..5bf2dad
> --- /dev/null
> +++ b/drivers/acpi/acpi_i2c.c
> @@ -0,0 +1,212 @@
> +/*
> + * ACPI I2C enumeration support
> + *
> + * Copyright (C) 2012, Intel Corporation
> + * Author: Mika Westerberg 
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +ACPI_MODULE_NAME("i2c");
> +
> +static int acpi_i2c_add_resource(struct acpi_resource *ares, void *data)
> +{
> + struct acpi_resource_i2c_serialbus *sb;
> + struct i2c_board_info *info = data;
> +
> + if (ares->type != ACPI_RESOURCE_TYPE_SERIAL_BUS)
> + return 0;
> +
> + sb = &ares->data.i2c_serial_bus;
> + if (sb->type != ACPI_RESOURCE_SERIAL_TYPE_I2C)
> + return 0;
> +
> + info->addr = sb->slave_address;
> + if (sb->access_mode == ACPI_I2C_10BIT_MODE)
> + info->flags |= I2C_CLIENT_TEN;
> +
> + return 1;
> +}
> +
> +static acpi_status acpi_i2c_add_device(acpi_handle handle, u32 level,
> +void *data, void **return_value)
> +{
> + struct i2c_adapter *adapter = data;
> + struct resource_list_entry *rentry;
> + struct list_head resource_list;
> + struct i2c_board_info info;
> + struct acpi_device *adev;
> + int ret;
> +
> + if (acpi_bus_get_device(handle, &adev))
> + return AE_OK;
> + if (acpi_bus_get_status(adev) || !adev->status.present)
> + return AE_OK;
> +
> + memset(&info, 0, sizeof(info));
> +
> + INIT_LIST_HEAD(&resource_list);
> + ret = acpi_dev_get_resources(adev, &resource_list,
> +  acpi_i2c_add_resource, &info);
> + if (ret < 0)
> + return AE_OK;
> +
> + list_for_each_entry(rentry, &resource_list, node) {
> + struct resource *r = &rentry->res;
> +
> + if (resource_type(r) == IORESOURCE_IRQ) {
> + info.irq = r->start;
> + break;
> + }
> + }
> +
> + acpi_dev_free_resource_list(&resource_list);
> +
> + if (!info.addr)
> + return AE_OK;
> +
> + strlcpy(info.type, dev_name(&adev->dev), sizeof(info.type));
> + if (!i2c_new_device(adapter, &info)) {
> + dev_err(&adapter->dev,
> + "failed to add I2C device %s from ACPI\n",
> + dev_name(&adev->dev));
> + }
> +
> + return AE_OK;
> +}
> +
> +/**
> + * acpi_i2c_register_devices - enumerate I2C slave devices behind adapter
> + * @adapter: pointer to adapter
> + *
> + * Enumerate all I2C slave devices behind this adapter by walking
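
A sketch of what acpi_i2c_add_resource() extracts from an I2cSerialBus descriptor in the quoted patch. The two model structures and i2c_info_from_acpi() are hypothetical, and the constant values are redefined locally as I believe the headers define them. The return value folds in the later "if (!info.addr)" check from acpi_i2c_add_device(), which is a simplification of the patch's two-step flow:

```c
#include <stdint.h>

#define I2C_CLIENT_TEN	0x10	/* client uses a 10-bit address */

/* Assumed encodings of the descriptor's access mode. */
enum { ACPI_I2C_7BIT_MODE = 0, ACPI_I2C_10BIT_MODE = 1 };

/* Hypothetical reductions of the ACPI resource and i2c_board_info. */
struct i2c_serialbus_model {
	uint16_t slave_address;
	uint8_t access_mode;
};

struct i2c_board_info_model {
	uint16_t addr;
	uint16_t flags;
};

/*
 * Copy address and addressing mode into the board info.
 * Returns 1 if the result describes a usable device, 0 if no
 * address was found and the device should be skipped.
 */
static int i2c_info_from_acpi(const struct i2c_serialbus_model *sb,
			      struct i2c_board_info_model *info)
{
	info->addr = sb->slave_address;
	if (sb->access_mode == ACPI_I2C_10BIT_MODE)
		info->flags |= I2C_CLIENT_TEN;
	return info->addr != 0;
}
```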

Re: Re: [RFC PATCH 0/2] kvm/vmx: Output TSC offset

2012-11-16 Thread Marcelo Tosatti

On Fri, Nov 16, 2012 at 05:09:40PM +0900, Yoshihiro YUNOMAE wrote:
> Hi Marcelo,
> 
> Thank you for commenting on my patch set.
> 
> (2012/11/16 12:19), Marcelo Tosatti wrote:
> >On Wed, Nov 14, 2012 at 10:36:21AM +0900, Yoshihiro YUNOMAE wrote:
> [...]
> >>In this summary, I suggest the patch which TSC offset for each guest can be
> >>output on the host.
> >
> >The guest TSC can change (for example if TSC scaling is used). Moreover
> >TSC offset can change, and you'd have to monitor that. What
> 
> Yes, that's true. Changing TSC offset is the key point to use TSC for
> merging trace data of guests and the host.
> 
> >about a module option so that tsc_offset is written as zero (to be
> >used as debugging tool). Then the following restrictions apply:
> >
> >- TSC must be synchronized across CPUs/VCPUS.
> >- TSC must be reliable.
> >
> >Would that suffice? (a module option to kvm.ko, say zero_tsc_offset).
> 
> As you say, the guest TSC can change, so guest TSC needs to meet these
> two restrictions to merge the trace data in chronological order.
> 
> However, the zero-TSC offset method is not enough, I think.
> I will use TSC values as the tracing timestamp not only for debugging
> but for failure analysis on actual operations. When we introduce
> the zero-TSC offset, normally it will be no problem. However, if
> the guest executes write_tsc or the guest works live migration, TSC
> offset will be changed. After all, we need to monitor the TSC offset
> value.
> 
> Thank you,

What I wrote was not precise.

With TSC scaling (of AMD's svm.c) or TSC trapping, the guest RDTSC is
not monotonic with reference to the host TSC. That is, TSC scaling
and/or trapping are fundamentally incompatible with the feature you
propose.

What you are saying is that you'd like this feature for production,
not debugging. Unfortunately this requires a synchronized TSC across
CPUs, which is not widespread. Hence the suggestion to introduce it as a
debugging facility (you cannot rely on this for failure analysis on all
systems).

Even then, a more reliable method for reporting TSC offset must be used.

Apart from these issues, it is very useful to correlate guest/host
events in the way you propose.
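
A minimal sketch of the timestamp conversion this thread is about, under the assumptions spelled out above: no TSC scaling or trapping, a TSC that is synchronized across CPUs, and an offset that stays constant for the trace window being merged. With plain TSC offsetting the guest reads host TSC plus the offset, modulo 2^64. The function names are hypothetical:

```c
#include <stdint.h>

/* What the guest's RDTSC returns for a given host TSC value. */
static uint64_t guest_tsc_from_host(uint64_t host_tsc, uint64_t tsc_offset)
{
	return host_tsc + tsc_offset;	/* unsigned arithmetic wraps mod 2^64 */
}

/* Map a guest trace timestamp back onto the host timeline. */
static uint64_t host_tsc_from_guest(uint64_t guest_tsc, uint64_t tsc_offset)
{
	return guest_tsc - tsc_offset;
}
```

The offset is usually "negative" (a large unsigned value) because the guest TSC starts near zero at guest boot. If the guest writes its TSC or is migrated, the offset changes, and each trace segment has to be converted with the offset that was in effect when it was recorded.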


Re: [BUGFIX] PM: Fix active child counting when disabled and forbidden

2012-11-16 Thread Rafael J. Wysocki

On Friday, November 16, 2012 09:27:05 AM Huang Ying wrote:
> On Fri, 2012-11-16 at 02:29 +0100, Rafael J. Wysocki wrote:
> > On Friday, November 16, 2012 08:54:56 AM Huang Ying wrote:
> > > On Fri, 2012-11-16 at 01:55 +0100, Rafael J. Wysocki wrote:
> > > > On Friday, November 16, 2012 01:44:00 AM Rafael J. Wysocki wrote:
> > > > > On Friday, November 16, 2012 08:36:14 AM Huang Ying wrote:
> > > > > > On Thu, 2012-11-15 at 10:51 +0100, Rafael J. Wysocki wrote:
> > > > 
> > > > [...]
> > > > 
> > > > > > 
> > > > > > For this situation, if user "echo auto > .../power/control" for the
> > > > > > device, the runtime PM callbacks of device will be called.  I think 
> > > > > > that
> > > > > > is not intended.  So I think it is better to use some kind of flag 
> > > > > > or
> > > > > > state for that.
> > > > > 
> > > > > I'm not sure what situation exactly you have in mind.  Care to give an
> > > > > exact scenario?
> > > > 
> > > > Ah, I see.  When we've just called drv->remove(), there is a window in
> > > > which user space may cause the driver's runtime PM callbacks to be
> > > > executed by changing its attribute to "auto".
> > > > 
> > > > So perhaps we should check pci_dev->driver rather than 
> > > > pci_dev->dev.driver
> > > > in the runtime PM callbacks?  With a few more changes that should allow 
> > > > us
> > > > to close that race.
> > > 
> > > Yes.  And I think, with pci_dev->driver (after some changes suggested by
> > > Alan), we need not to use pm_runtime_get/put_skip_callbacks().
> > 
> > Good.  Can you please prepare a patch, then? :-)
> 
> Sure.

Cool, thanks!

Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

Re: [PATCH] Correct description of SwapFree in Documentation/filesystems/proc.txt

2012-11-16 Thread Michal Hocko

On Fri 16-11-12 08:01:42, Michael Kerrisk wrote:
> After migrating most of the information in 
> Documentation/filesystems/proc.txt to the proc(5) man page,
> Jim Paris pointed out to me that the description of SwapFree
> in the man page seemed wrong. I think Jim is right,
> but am given pause by fact that that text has been in 
> Documentation/filesystems/proc.txt since at least 2.6.0.
> Anyway, I believe that the patch below fixes things.

Yes, this goes back to 2003, when the /proc/meminfo doc was added.

> 
> Signed-off-by: Michael Kerrisk 

Reviewed-by: Michal Hocko 

> diff --git a/Documentation/filesystems/proc.txt 
> b/Documentation/filesystems/proc.txt
> index a1793d6..cf4260f 100644
> --- a/Documentation/filesystems/proc.txt
> +++ b/Documentation/filesystems/proc.txt
> @@ -778,8 +778,7 @@ AnonHugePages:   49152 kB
>other things, it is where everything from the Slab is
>allocated.  Bad things happen when you're out of lowmem.
> SwapTotal: total amount of swap space available
> -SwapFree: Memory which has been evicted from RAM, and is temporarily
> -  on the disk
> +SwapFree: Amount of swap space that is currently unused.
> Dirty: Memory which is waiting to get written back to the disk
> Writeback: Memory which is actively being written back to the disk
> AnonPages: Non-file backed pages mapped into userspace page tables
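
To illustrate the corrected wording: if SwapFree is the amount of swap
space that is currently unused, then the swap in use is SwapTotal minus
SwapFree. The following is only a minimal userspace sketch under that
reading. The helper names meminfo_field_kb() and swap_used_kb() are made
up for this example, and the text is passed in as a string rather than
read from /proc/meminfo.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Look up one "Name:   value kB" field in /proc/meminfo-style text.
 * Returns the value in kB, or -1 if the field is not present.
 * (Hypothetical helper, not part of any kernel or libc API.)
 */
static long meminfo_field_kb(const char *text, const char *name)
{
	size_t len;
	const char *line = text;

	assert(text != NULL && name != NULL);
	len = strlen(name);

	while (line && *line) {
		long value;

		/* The field name must be followed directly by ':'. */
		if (strncmp(line, name, len) == 0 && line[len] == ':' &&
		    sscanf(line + len + 1, "%ld", &value) == 1)
			return value;

		/* Advance to the start of the next line, if any. */
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return -1;
}

/* Swap in use = SwapTotal - SwapFree; -1 if either field is missing. */
static long swap_used_kb(const char *text)
{
	long total = meminfo_field_kb(text, "SwapTotal");
	long unused = meminfo_field_kb(text, "SwapFree");

	if (total < 0 || unused < 0)
		return -1;
	return total - unused;
}
```

A caller would read /proc/meminfo into a buffer and pass it to
swap_used_kb(); a result of 0 then means no swap space is in use.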
> 

-- 
Michal Hocko
SUSE Labs

Re: linux-next: manual merge of the pm tree with the pci tree

2012-11-16 Thread Rafael J. Wysocki

On Tuesday, November 06, 2012 09:52:40 AM Bjorn Helgaas wrote:
> On Mon, Nov 5, 2012 at 7:48 PM, Stephen Rothwell wrote:
> > Hi Rafael,
> >
> > Today's linux-next merge of the pm tree got a conflict in
> > arch/x86/pci/acpi.c between commit 3f385fa9edce ("x86/PCI: Ignore _SEG on
> > HP xw9300") from the pci tree and commit 97a7108a3c00 ("ACPI / x86: Add
> > quirk for "CheckPoint P-20-00" to not use bridge _CRS_ info") from the pm
> > tree.
> >
> > I fixed it up (see below) and can carry the fix as necessary (no action
> > is required).
> 
> My opinion was that we should just drop the CheckPoint quirk.  See
> https://bugzilla.kernel.org/show_bug.cgi?id=47981#c36 for my
> rationale.

OK, I agree.  I'll revert that commit, then.

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

Re: [PATCH 2/3] fat: fix time updates for create and delete

2012-11-16 Thread Namjae Jeon

2012/11/15, OGAWA Hirofumi :
> Namjae Jeon  writes:
>
>> 1) If we consider the code for FAT and MSDOS to be the same with respect to
>> timing updates, there is a difference in the code, as shown below:
>> a) For:vfat_rmdir()
>>  inode->i_mtime = inode->i_atime = CURRENT_TIME_SEC;
>>For:msdos_rmdir()
>>  inode->i_ctime = CURRENT_TIME_SEC;
>>  
>> b) For: vfat_unlink()
>>  inode->i_mtime = inode->i_atime = CURRENT_TIME_SEC;
>>For: msdos_unlink()
>>  inode->i_ctime = CURRENT_TIME_SEC;
>>
>> So, at least the uniformity in the code is missing for MSDOS and VFAT.
>
> MSDOS doesn't have atime.
>
>> 2) Regarding timings information
>> Original timings on VFAT:
>> #> mkdir parent
>> #> stat parent
>> File type:directory
>> I-node number:30
>> Mode: 40755 (octal)
>> Link count:   2
>> Ownership:UID=0   GID=0
>> Preferred I/O block size: 4096 bytes
>> File size:4096 bytes
>> Blocks allocated: 8
>> Last status change:   Thu Jan  1 00:01:29 2012
>> Last file access: Thu Jan  1 00:01:29 2012
>> Last file modification:   Thu Jan  1 00:01:29 2012
>
> [...]
>
>> As can be seen from the 'stat' information - the timing information
>> appears same for VFAT like EXT4/XFS after changes.
>> Please let me know your opinion.
>
> You have to think about compatibility with other FAT, not unix fs.

Agreed, ctime is creation time, and there are compatibility issues with
the patch.

But there is confusion about 'ctime' usage in the existing code. Reading
the code, I found many places other than 'fat_fill_inode' where 'ctime'
is updated as if it were 'change time' instead of 'creation time', for
example in fat_write_end(), fat_cont_expand(), fat_free() and
vfat_add_entry().

I checked this with a simple test case:
dd if=/dev/zero of=./samplefile bs=4096 count=10
=> check file timings
wait for 2 minutes
Now, append to this file
echo "this is simple string to be appended" >> samplefile
=> check file timings

I can see that this changed both 'ctime' and 'mtime'.
When I then connect this drive to Windows, it shows the time of the
'second write' as the creation time as well as the modification time.
If you agree that this is a problem, I can try to fix the timestamp
handling in Linux FAT to match this compatibility pattern as closely
as possible.
Let me know your opinion.
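
The "check file timings" steps above can also be done from a small
program instead of the stat command. This is only a sketch: the struct
file_times type and both helpers are names invented for this example,
and it merely reports what stat(2) returns. How the FAT driver should
fill st_ctime is exactly the open question in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>
#include <time.h>

/* Snapshot of the three timestamps that stat(2) reports for a file. */
struct file_times {
	time_t status_change;	/* st_ctime */
	time_t modification;	/* st_mtime */
	time_t last_access;	/* st_atime */
};

/* Fill *out from stat(2); returns 0 on success, -1 on failure. */
static int read_file_times(const char *path, struct file_times *out)
{
	struct stat st;

	assert(path != NULL && out != NULL);
	if (stat(path, &st) != 0)
		return -1;

	out->status_change = st.st_ctime;
	out->modification = st.st_mtime;
	out->last_access = st.st_atime;
	return 0;
}

/* Nonzero if st_ctime differs between two snapshots of the same file. */
static int ctime_moved(const struct file_times *before,
		       const struct file_times *after)
{
	return before->status_change != after->status_change;
}
```

Taking one snapshot of samplefile after the dd step and another after the
append, ctime_moved() returning nonzero corresponds to the behaviour
described above.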

Thanks.

> --
> OGAWA Hirofumi 
>

Re: [PATCH v4 0/2] x86: clear vmcss on all cpus when doing kdump if necessary

2012-11-16 Thread zhangyanfei

Hello Marcelo,

Any thoughts?


On 2012-11-14 17:29, zhangyanfei wrote:
> Currently, kdump just makes all the logical processors leave VMX operation by
> executing the VMXOFF instruction, so any VMCSs active on the logical
> processors may be corrupted. But sometimes we need the VMCSs to debug guest
> images contained in the host vmcore. To prevent the corruption, we should
> VMCLEAR the VMCSs before executing the VMXOFF instruction.
> 
> The patch set provides a way to VMCLEAR the VMCSs related to guests on all
> cpus before executing VMXOFF when doing kdump. This ensures that the VMCSs
> in the vmcore are up to date and not corrupted.
> 
> Changelog from v3 to v4:
> 1. add a new percpu variable vmclear_skipped to skip
>vmclear in kdump in some conditions.
> 
> Changelog from v2 to v3:
> 1. remove unnecessary conditions in function
>cpu_emergency_clear_loaded_vmcss as Marcelo suggested.
> 
> Changelog from v1 to v2:
> 1. remove the sysctl and clear VMCSs unconditionally.
> 
> Zhang Yanfei (2):
>   x86/kexec: VMCLEAR vmcss on all cpus if necessary
>   KVM: set/unset crash_clear_loaded_vmcss and vmclear_skipped in
> kvm_intel module
> 
>  arch/x86/include/asm/kexec.h |3 +++ 
>  arch/x86/kernel/crash.c  |   32 
>  arch/x86/kvm/vmx.c   |   32 
>  3 files changed, 67 insertions(+), 0 deletions(-)
> 
> 

Re: [RFC v2 PATCH 2/2] sched: Use Per-Entity-Load-Tracking metric for load balancing

2012-11-16 Thread Preeti U Murthy

Hi Vincent,
Thank you for your review.

On 11/15/2012 11:43 PM, Vincent Guittot wrote:
> Hi Preeti,
> 
> On 15 November 2012 17:54, Preeti U Murthy  wrote:
>> Currently the load balancer weighs a task based upon its priority, and this
>> weight consequently gets added up to the weight of the run queue that it is
>> on. It is this weight of the runqueue that sums up to a sched group's load
>> which is used to decide the busiest or the idlest group and the runqueue
>> thereof.
>>
>> The Per-Entity-Load-Tracking metric however measures how long a task has
>> been runnable over the duration of its lifetime. This gives us a hint of
>> the amount of CPU time that the task can demand. This metric takes care of
>> the task priority as well. Therefore apart from the priority of a task we
>> also have an idea of the live behavior of the task. This seems to be a more
>> realistic metric to use to compute task weight which adds up to the run
>> queue weight and the weight of the sched group. Consequently they can be
>> used for load balancing.
>>
>> The semantics of load balancing are left untouched. The two functions
>> load_balance() and select_task_rq_fair() perform the task of load
>> balancing. These two paths have been browsed through in this patch to make
>> necessary changes.
>>
>> weighted_cpuload() and task_h_load() provide the run queue weight and the
>> weight of the task respectively. They have been modified to provide the
>> Per-Entity-Load-Tracking metric as relevant for each.
>> The rest of the modifications had to be made to suit these two changes.
>>
>> Completely Fair Scheduler class is the only sched_class which contributes to
>> the run queue load. Therefore the rq->load.weight==cfs_rq->load.weight when
>> the cfs_rq is the root cfs_rq (rq->cfs) of the hierarchy. When replacing this
>> with Per-Entity-Load-Tracking metric, cfs_rq->runnable_load_avg needs to be
>> used as this is the right reflection of the run queue load when
>> the cfs_rq is the root cfs_rq (rq->cfs) of the hierarchy. This metric
>> reflects the percentage uptime of the tasks that are queued on it and hence
>> that contribute to the load. Thus cfs_rq->runnable_load_avg replaces the
>> metric earlier used in weighted_cpuload().
>>
>> The task load is aptly captured by se.avg.load_avg_contrib which captures the
>> runnable time vs the alive time of the task against its priority. This metric
>> replaces the earlier metric used in task_h_load().
>>
>> The consequent changes appear as data type changes for the helper variables;
>> they abound in number, because cfs_rq->runnable_load_avg needs to be big
>> enough to capture the tasks' load often and accurately.
> 
> You are now using cfs_rq->runnable_load_avg instead of
> cfs_rq->load.weight for calculation of cpu_load but
> cfs_rq->runnable_load_avg is smaller or equal to cfs_rq->load.weight
> value. This implies that the new value is smaller or equal to the old
> statistic so you should be able to keep the same variable width for
> the computation of cpu_load

Right. But cfs_rq->runnable_load_avg is a 64-bit unsigned integer as per
the Per-entity-load-tracking patchset. I could not figure out why this is
the case although, as you mention, its value will not exceed
cfs_rq->load.weight. In order to retain the data type of
cfs_rq->runnable_load_avg as it is, these changes had to be made to suit
it. It would be good if someone could clarify why it is a 64-bit
integer; it would save a lot of trouble if we could consider it the same
width as cfs_rq->load.weight. Ben, Paul, can you clarify this point?
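
The bound discussed here can be seen in a simplified model of the
per-entity contribution: the weight scaled by the fraction of time the
task was runnable. This is only a sketch, not the kernel code. The
struct load_track type and load_contrib() are names made up for this
example, and the geometric decay of old samples is left out entirely.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified per-entity tracking state; decay of old samples is omitted. */
struct load_track {
	uint32_t runnable_sum;	/* time the entity spent runnable */
	uint32_t period;	/* total time observed, >= runnable_sum */
};

/*
 * Contribution = weight * runnable_sum / (period + 1).
 * Because runnable_sum <= period, the result never exceeds weight,
 * so it fits in the same width as the weight itself.
 */
static unsigned long load_contrib(unsigned long weight,
				  const struct load_track *t)
{
	uint64_t contrib;

	assert(t->runnable_sum <= t->period);
	contrib = (uint64_t)weight * t->runnable_sum;
	contrib /= (uint64_t)t->period + 1;
	return (unsigned long)contrib;
}
```

In this model only the intermediate product needs 64 bits; the final
value is bounded by the weight, which is the point made about keeping
the same variable width for cpu_load.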
> 
>>
>> The following patch does not consider CONFIG_FAIR_GROUP_SCHED and
>> CONFIG_SCHED_NUMA. This is done so as to evaluate this approach starting
>> from the simplest scenario. Earlier discussions can be found in the link
>> below.
>>
>> Link: https://lkml.org/lkml/2012/10/25/162
>> Signed-off-by: Preeti U Murthy
>> ---
>>  include/linux/sched.h |2 +-
>>  kernel/sched/core.c   |   12 +
>>  kernel/sched/fair.c   |   64 
>> +
>>  kernel/sched/sched.h  |2 +-
>>  4 files changed, 40 insertions(+), 40 deletions(-)
>>
>> diff --git a/include/linux/sched.h b/include/linux/sched.h
>> index 087dd20..302756e 100644
>> --- a/include/linux/sched.h
>> +++ b/include/linux/sched.h
>> @@ -924,7 +924,7 @@ struct sched_domain {
>> unsigned int lb_count[CPU_MAX_IDLE_TYPES];
>> unsigned int lb_failed[CPU_MAX_IDLE_TYPES];
>> unsigned int lb_balanced[CPU_MAX_IDLE_TYPES];
>> -   unsigned int lb_imbalance[CPU_MAX_IDLE_TYPES];
>> +   u64 lb_imbalance[CPU_MAX_IDLE_TYPES];
>> unsigned int lb_gained[CPU_MAX_IDLE_TYPES];
>> unsigned int lb_hot_gained[CPU_MAX_IDLE_TYPES];
>> unsigned int lb_nobusyg[CPU_MAX_IDLE_TYPES];
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index 24d8b9b..4dea057 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -2415,8 +2415,8 @@ static const unsigned char
>

Re: [PATCH] Correct description of SwapFree in Documentation/filesystems/proc.txt

2012-11-16 Thread Kamezawa Hiroyuki


(2012/11/16 16:01), Michael Kerrisk wrote:

After migrating most of the information in
Documentation/filesystems/proc.txt to the proc(5) man page,
Jim Paris pointed out to me that the description of SwapFree
in the man page seemed wrong. I think Jim is right,
but am given pause by fact that that text has been in
Documentation/filesystems/proc.txt since at least 2.6.0.
Anyway, I believe that the patch below fixes things.

Signed-off-by: Michael Kerrisk 



Acked-by: KAMEZAWA Hiroyuki 



diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index a1793d6..cf4260f 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -778,8 +778,7 @@ AnonHugePages:   49152 kB
other things, it is where everything from the Slab is
allocated.  Bad things happen when you're out of lowmem.
 SwapTotal: total amount of swap space available
-SwapFree: Memory which has been evicted from RAM, and is temporarily
-  on the disk
+SwapFree: Amount of swap space that is currently unused.
 Dirty: Memory which is waiting to get written back to the disk
 Writeback: Memory which is actively being written back to the disk
 AnonPages: Non-file backed pages mapped into userspace page tables

[PATCH] x86: bump PREALLOC_DMA_DEBUG_ENTRIES

2012-11-16 Thread Maarten Lankhorst

I ran out of free entries when I had CONFIG_DMA_API_DEBUG enabled.
Some other archs seem to default to 65536, so increase this limit for x86 too.

Signed-off-by: Maarten Lankhorst 

diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index de2b7ad..f5597c5 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -56,7 +56,7 @@ struct device x86_dma_fallback_dev = {
 EXPORT_SYMBOL(x86_dma_fallback_dev);
 
 /* Number of entries preallocated for DMA-API debugging */
-#define PREALLOC_DMA_DEBUG_ENTRIES   32768
+#define PREALLOC_DMA_DEBUG_ENTRIES   65536
 
 int dma_set_mask(struct device *dev, u64 mask)
 {


Re: [PATCH v2] LDT - Linux Driver Template

2012-11-16 Thread Constantine Shulyupin

On Fri, Nov 16, 2012 at 11:46 AM, Bjørn Mork  wrote:
> Greg KH  writes:
>
>>  Normally you just start with a
>> driver for a device like the one you need to write and modify it from
>> there.
>
> Yes.
>
> Even if the template driver is fixed up to be the most beautiful driver
> ever made, it will still always be made for non-existing hardware.  This
> causes two major problems:
>  - the driver will not be tested, so it will have bugs
>  - the driver will not be used by anyone, so it will not be maintained
>(remember that it is initially perfect, so there is no reason to
>change it)
Thanks. It seems you have missed one thing.
The main advantage of LDT is that it is a working driver for real HW with
a simple test suite.
It implements a trivial UART driver that just writes to a port, receives a
real HW interrupt and reads data from the port. (You are welcome to suggest
any other HW.)
Without an available UART, LDT emulates loopback in SW for testing.
Memory buffers are used for mmap and ioctl operations.
The LDT test script ldt-test and the test utility dio.c configure the driver
for loopback mode, pass data to the driver, receive it back, compare it
with the input and report the result of the comparison.
To perform validation or regression tests, one simply needs to run the
test script.
A detailed kernel log and ftrace log from the test are saved for analysis.

> May I suggest another approach?  How about selecting a set of existing
> drivers which are suitable as templates, and put all this effort into
> making those drivers *the* perfect examples instead? Start submitting
> cleanup patches for the selected drivers until everyone is satisfied and
> then document them as starting points for anyone wanting to write a
> similar driver.
Thank you. That is possible too. Can you or somebody else recommend such drivers?

>
> I believe many subsystem maintainers already have such sample drivers
> which they point new submitters to when asked.  That does not mean that
> these drivers necessarily are perfect, so there is still work to do here
> for anyone interested.  And collecting this information and documenting
> it would be useful in itself.
Thanks. I am already researching the OMAP Panda platform for improvement
opportunities.

Thank you.

[PATCH v6 1/4] genalloc: add a global pool list, allow to find pools by phys address

2012-11-16 Thread Philipp Zabel

This patch keeps all created pools in a global list and adds two
functions that retrieve the gen_pool pointer from a known physical
address and from a device tree node.

Signed-off-by: Philipp Zabel 
Reviewed-by: Shawn Guo 
---
 include/linux/genalloc.h |   14 ++
 lib/genalloc.c   |   67 ++
 2 files changed, 81 insertions(+)

diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h
index dd7c569..91d606e 100644
--- a/include/linux/genalloc.h
+++ b/include/linux/genalloc.h
@@ -47,6 +47,7 @@ typedef unsigned long (*genpool_algo_t)(unsigned long *map,
  *  General purpose special memory pool descriptor.
  */
 struct gen_pool {
+   struct list_head next_pool; /* pool in global list */
spinlock_t lock;
struct list_head chunks;/* list of chunks in this pool */
int min_alloc_order;/* minimum allocation order */
@@ -105,4 +106,17 @@ extern unsigned long gen_pool_first_fit(unsigned long *map, unsigned long size,
 extern unsigned long gen_pool_best_fit(unsigned long *map, unsigned long size,
unsigned long start, unsigned int nr, void *data);
 
+extern struct gen_pool *gen_pool_find_by_phys(phys_addr_t phys);
+
+struct device_node;
+#ifdef CONFIG_OF
+extern struct gen_pool *of_get_named_gen_pool(struct device_node *np,
+   const char *propname, int index);
+#else
+inline struct gen_pool *of_get_named_gen_pool(struct device_node *np,
+   const char *propname, int index)
+{
+   return NULL;
+}
+#endif
 #endif /* __GENALLOC_H__ */
diff --git a/lib/genalloc.c b/lib/genalloc.c
index 5492043..edf4bf3 100644
--- a/lib/genalloc.c
+++ b/lib/genalloc.c
@@ -34,6 +34,11 @@
 #include 
 #include 
 #include 
+#include 
+#include 
+
+static LIST_HEAD(pools);
+static DEFINE_SPINLOCK(list_lock);
 
 static int set_bits_ll(unsigned long *addr, unsigned long mask_to_set)
 {
@@ -154,6 +159,9 @@ struct gen_pool *gen_pool_create(int min_alloc_order, int nid)
pool->min_alloc_order = min_alloc_order;
pool->algo = gen_pool_first_fit;
pool->data = NULL;
+   spin_lock(&list_lock);
+   list_add_rcu(&pool->next_pool, &pools);
+   spin_unlock(&list_lock);
}
return pool;
 }
@@ -236,6 +244,9 @@ void gen_pool_destroy(struct gen_pool *pool)
int order = pool->min_alloc_order;
int bit, end_bit;
 
+   spin_lock(&list_lock);
+   list_del_rcu(&pool->next_pool);
+   spin_unlock(&list_lock);
list_for_each_safe(_chunk, _next_chunk, &pool->chunks) {
chunk = list_entry(_chunk, struct gen_pool_chunk, next_chunk);
list_del(&chunk->next_chunk);
@@ -480,3 +491,59 @@ unsigned long gen_pool_best_fit(unsigned long *map, unsigned long size,
return start_bit;
 }
 EXPORT_SYMBOL(gen_pool_best_fit);
+
+/*
+ * gen_pool_find_by_phys - find a pool by physical start address
+ * @phys: physical address as added with gen_pool_add_virt
+ *
+ * Returns the pool that contains the chunk starting at phys,
+ * or NULL if not found.
+ */
+struct gen_pool *gen_pool_find_by_phys(phys_addr_t phys)
+{
+   struct gen_pool *pool, *found = NULL;
+   struct gen_pool_chunk *chunk;
+
+   rcu_read_lock();
+   list_for_each_entry_rcu(pool, &pools, next_pool) {
+   list_for_each_entry_rcu(chunk, &pool->chunks, next_chunk) {
+   if (phys == chunk->phys_addr) {
+   found = pool;
+   break;
+   }
+   }
+   }
+   rcu_read_unlock();
+
+   return found;
+}
+EXPORT_SYMBOL_GPL(gen_pool_find_by_phys);
+
+#ifdef CONFIG_OF
+/**
+ * of_get_named_gen_pool - find a pool by phandle property
+ * @np: device node
+ * @propname: property name containing phandle(s)
+ * @index: index into the phandle array
+ *
+ * Returns the pool that contains the chunk starting at the physical
+ * address of the device tree node pointed at by the phandle property,
+ * or NULL if not found.
+ */
+struct gen_pool *of_get_named_gen_pool(struct device_node *np,
+   const char *propname, int index)
+{
+   struct device_node *np_pool;
+   struct resource res;
+   int ret;
+
+   np_pool = of_parse_phandle(np, propname, index);
+   if (!np_pool)
+   return NULL;
+   ret = of_address_to_resource(np_pool, 0, &res);
+   if (ret < 0)
+   return NULL;
+   return gen_pool_find_by_phys((phys_addr_t) res.start);
+}
+EXPORT_SYMBOL_GPL(of_get_named_gen_pool);
+#endif /* CONFIG_OF */
-- 
1.7.10.4
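
For readers who want to try the lookup idea outside the kernel, here is
a minimal userspace sketch of a global pool list searched by chunk
start address. It is not the patch's code: struct pool, struct chunk,
pool_register() and find_pool_by_phys() are names invented for this
example, plain singly linked lists stand in for list_head, and there is
no locking or RCU. Unlike the loop in the patch, the sketch returns as
soon as a match is found instead of continuing over the remaining pools.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One contiguous region handed to a pool, identified by its start address. */
struct chunk {
	uint64_t phys_addr;
	struct chunk *next;
};

/* A pool owns a list of chunks and is linked into a global pool list. */
struct pool {
	struct chunk *chunks;
	struct pool *next;
};

/* Push a pool onto the global list, as done at pool creation time. */
static void pool_register(struct pool **head, struct pool *p)
{
	assert(head != NULL && p != NULL);
	p->next = *head;
	*head = p;
}

/* Return the pool that has a chunk starting at phys, or NULL. */
static struct pool *find_pool_by_phys(struct pool *head, uint64_t phys)
{
	struct pool *p;
	struct chunk *c;

	for (p = head; p; p = p->next)
		for (c = p->chunks; c; c = c->next)
			if (c->phys_addr == phys)
				return p;
	return NULL;
}
```

Note that the match is on the exact start address of a chunk, as in the
patch description, not on any address inside the chunk.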


[PATCH v6 0/4] Add generic driver for on-chip SRAM

2012-11-16 Thread Philipp Zabel

These patches add support to configure on-chip SRAM via device-tree
node or platform data and to obtain the resulting genalloc pool from
the physical address or a phandle pointing at the device tree node.
This allows drivers to allocate SRAM with the genalloc API without
hard-coding the genalloc pool pointer.

The on-chip SRAM on i.MX53 and i.MX6q can be registered via device tree
and changed to use the simple generic SRAM driver:

ocram: ocram@0090 {
compatible = "fsl,imx-ocram", "sram";
reg = <0x0090 0x3f000>;
};

A driver that needs to allocate SRAM buffers, like the video processing
unit on i.MX53, can retrieve the genalloc pool from a phandle in the
device tree using of_get_named_gen_pool(node, "iram", 0) from patch 1:

vpu@63ff4000 {
/* ... */
iram = <&ocram>;
};

Changes since v5:
 - Addressed Paul Gortmaker's comments, merging the clock patch
   into the SRAM driver patch.
 - Hard coded the allocation granularity to 512 bytes and dropped
   Matt Porter's patch for now. Whether or not this should be
   configured in the device tree could use further discussion.
 - Added a coda driver patch to use the genalloc API and, again,
   the i.MX53/i.MX6 device tree patch to show the whole picture.

regards
Philipp
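
As a side note on the hard-coded 512-byte granularity mentioned in the
changes above: with genalloc, a pool created with a minimum allocation
order of 9 hands out memory in whole 512-byte units, so every request
is rounded up to a multiple of 512. The sketch below only illustrates
that arithmetic in userspace. granularity_order(), sram_granules() and
sram_consumed() are hypothetical helper names, not functions from the
series.

```c
#include <assert.h>
#include <stddef.h>

#define SRAM_GRANULARITY 512u

/* log2 of a power-of-two granularity, e.g. 512 -> 9. */
static unsigned int granularity_order(size_t granularity)
{
	unsigned int order = 0;

	/* Only powers of two can be expressed as an allocation order. */
	assert(granularity != 0 && (granularity & (granularity - 1)) == 0);
	while (granularity > 1) {
		granularity >>= 1;
		order++;
	}
	return order;
}

/* Number of 512-byte granules needed to hold size bytes. */
static size_t sram_granules(size_t size)
{
	return (size + SRAM_GRANULARITY - 1) / SRAM_GRANULARITY;
}

/* Bytes actually consumed from the pool by a request of size bytes. */
static size_t sram_consumed(size_t size)
{
	return sram_granules(size) * SRAM_GRANULARITY;
}
```

For example, the 0x14000-byte IRAM buffer used by the coda driver patch
is already a multiple of 512 and takes exactly 160 granules, while a
1-byte request would still consume a full 512 bytes.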

---
 Documentation/devicetree/bindings/misc/sram.txt |   17 
 arch/arm/boot/dts/imx53.dtsi|5 +
 arch/arm/boot/dts/imx6q.dtsi|6 ++
 drivers/media/platform/Kconfig  |3 +-
 drivers/media/platform/coda.c   |   47 ++---
 drivers/misc/Kconfig|9 ++
 drivers/misc/Makefile   |1 +
 drivers/misc/sram.c |  121 +++
 include/linux/genalloc.h|   14 +++
 lib/genalloc.c  |   67 +
 10 files changed, 274 insertions(+), 16 deletions(-)


[PATCH v6 2/4] misc: Generic on-chip SRAM allocation driver

2012-11-16 Thread Philipp Zabel

This driver requests and remaps a memory region as configured in the
device tree. It serves memory from this region via the genalloc API.
It optionally enables the SRAM clock.

Other drivers can retrieve the genalloc pool from a phandle pointing
to this driver's device node in the device tree.

Signed-off-by: Philipp Zabel 
Reviewed-by: Shawn Guo 
---
 Documentation/devicetree/bindings/misc/sram.txt |   17 
 drivers/misc/Kconfig|9 ++
 drivers/misc/Makefile   |1 +
 drivers/misc/sram.c |  121 +++
 4 files changed, 148 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/misc/sram.txt
 create mode 100644 drivers/misc/sram.c

diff --git a/Documentation/devicetree/bindings/misc/sram.txt b/Documentation/devicetree/bindings/misc/sram.txt
new file mode 100644
index 000..b64136c
--- /dev/null
+++ b/Documentation/devicetree/bindings/misc/sram.txt
@@ -0,0 +1,17 @@
+Generic on-chip SRAM
+
+Simple IO memory regions to be managed by the genalloc API.
+
+Required properties:
+
+- compatible : sram
+
+- reg : SRAM iomem address range
+
+Example:
+
+sram: sram@5c00 {
+   compatible = "sram";
+   reg = <0x5c00 0x4>; /* 256 KiB SRAM at address 0x5c00 */
+};
+
diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
index b151b7c..211468c 100644
--- a/drivers/misc/Kconfig
+++ b/drivers/misc/Kconfig
@@ -499,6 +499,15 @@ config USB_SWITCH_FSA9480
  stereo and mono audio, video, microphone and UART data to use
  a common connector port.
 
+config SRAM
+   bool "Generic on-chip SRAM driver"
+   depends on HAS_IOMEM
+   select GENERIC_ALLOCATOR
+   help
+ This driver allows to declare a memory region to be managed
+ by the genalloc API. It is supposed to be used for small
+ on-chip SRAM areas found on many SoCs.
+
 source "drivers/misc/c2port/Kconfig"
 source "drivers/misc/eeprom/Kconfig"
 source "drivers/misc/cb710/Kconfig"
diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile
index 2129377..d845690 100644
--- a/drivers/misc/Makefile
+++ b/drivers/misc/Makefile
@@ -49,3 +49,4 @@ obj-y += carma/
 obj-$(CONFIG_USB_SWITCH_FSA9480) += fsa9480.o
 obj-$(CONFIG_ALTERA_STAPL) +=altera-stapl/
 obj-$(CONFIG_INTEL_MEI)+= mei/
+obj-$(CONFIG_SRAM) += sram.o
diff --git a/drivers/misc/sram.c b/drivers/misc/sram.c
new file mode 100644
index 000..fec8143
--- /dev/null
+++ b/drivers/misc/sram.c
@@ -0,0 +1,121 @@
+/*
+ * Generic on-chip SRAM allocation driver
+ *
+ * Copyright (C) 2012 Philipp Zabel, Pengutronix
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
+ * MA 02110-1301, USA.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define SRAM_GRANULARITY   512
+
+struct sram_dev {
+   struct gen_pool *pool;
+   struct clk *clk;
+};
+
+static int __devinit sram_probe(struct platform_device *pdev)
+{
+   void __iomem *virt_base;
+   struct sram_dev *sram;
+   struct resource *res;
+   unsigned long size;
+   int ret;
+
+   res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+   if (!res)
+   return -EINVAL;
+
+   size = resource_size(res);
+
+   virt_base = devm_request_and_ioremap(&pdev->dev, res);
+   if (!virt_base)
+   return -EADDRNOTAVAIL;
+
+   sram = devm_kzalloc(&pdev->dev, sizeof(*sram), GFP_KERNEL);
+   if (!sram)
+   return -ENOMEM;
+
+   sram->clk = devm_clk_get(&pdev->dev, NULL);
+   if (IS_ERR(sram->clk))
+   sram->clk = NULL;
+   else
+   clk_prepare_enable(sram->clk);
+
+   sram->pool = gen_pool_create(ilog2(SRAM_GRANULARITY), -1);
+   if (!sram->pool)
+   return -ENOMEM;
+
+   ret = gen_pool_add_virt(sram->pool, (unsigned long)virt_base,
+   res->start, size, -1);
+   if (ret < 0) {
+   gen_pool_destroy(sram->pool);
+   return ret;
+   }
+
+   platform_set_drvdata(pdev, sram);
+
+   dev_dbg(&pdev->dev, "SRAM pool: %ld KiB @ 0x%p\n", size / 1024, virt_base);
+
+   return 0;
+}
+
+static int __devexit sram_remove(st

RE: [PATCH RESEND 0/7] MFD: ti_am335x_tscadc: DT support and TSC features addition

2012-11-16 Thread Patil, Rachna

Hi,

This is just a gentle reminder about the patch set I posted earlier, viz.
"[PATCH RESEND 0/7] MFD: ti_am335x_tscadc: DT support and TSC features addition".
Can this patch set be pulled in if there are no review comments?
It does not break any existing functionality; it only adds new features and
DT support for the MFD core and its clients.

Regards,
Rachna

On Wed, Nov 07, 2012 at 12:21:57, Patil, Rachna wrote:
> This patch set is a cumulative set of [1] and [2] sent earlier.
> 
> Note that there are no code changes in either patch set; they are only
> rebased on top of MFD-next to make sure that all the patches apply without
> any conflicts.
> 
> This patch set has been tested on AM335x EVM and is based on top of [3].
> 
> [1] http://www.spinics.net/lists/linux-input/msg23060.html
> [2] http://www.spinics.net/lists/linux-input/msg23090.html
> [3] https://lkml.org/lkml/2012/11/6/67
> 
> Patil, Rachna (7):
>   input: ti_am335x_tsc: Step enable bits made configurable
>   input: ti_am335x_tsc: Order of TSC wires, made configurable
>   input: ti_am335x_tsc: Add variance filter
>   MFD: ti_am335x_tscadc: add device tree binding information
>   MFD: ti_am335x_tscadc: Add DT support
>   input: ti_am335x_tsc: Add DT support
>   IIO: ti_am335x_adc: Add DT support
> 
>  .../devicetree/bindings/mfd/ti_am335x_tscadc.txt   |   35 +++
>  drivers/iio/adc/ti_am335x_adc.c|   24 ++-
>  drivers/input/touchscreen/ti_am335x_tsc.c  |  239 +---
>  drivers/mfd/ti_am335x_tscadc.c |   28 ++-
>  include/linux/input/ti_am335x_tsc.h|   12 +
>  include/linux/mfd/ti_am335x_tscadc.h   |   11 +-
>  6 files changed, 308 insertions(+), 41 deletions(-)
>  create mode 100644 Documentation/devicetree/bindings/mfd/ti_am335x_tscadc.txt
> 
> 


[PATCH v6 3/4] media: coda: use genalloc API

2012-11-16 Thread Philipp Zabel

This patch depends on "genalloc: add a global pool list,
allow to find pools by phys address", which provides the
of_get_named_gen_pool function.

Signed-off-by: Philipp Zabel 
---
 drivers/media/platform/Kconfig |3 +--
 drivers/media/platform/coda.c  |   47 
 2 files changed, 34 insertions(+), 16 deletions(-)

diff --git a/drivers/media/platform/Kconfig b/drivers/media/platform/Kconfig
index 181c768..09d45c6 100644
--- a/drivers/media/platform/Kconfig
+++ b/drivers/media/platform/Kconfig
@@ -130,10 +130,9 @@ if V4L_MEM2MEM_DRIVERS
 
 config VIDEO_CODA
tristate "Chips&Media Coda multi-standard codec IP"
-   depends on VIDEO_DEV && VIDEO_V4L2 && ARCH_MXC
+   depends on VIDEO_DEV && VIDEO_V4L2
select VIDEOBUF2_DMA_CONTIG
select V4L2_MEM2MEM_DEV
-   select IRAM_ALLOC if SOC_IMX53
---help---
   Coda is a range of video codec IPs that supports
   H.264, MPEG-4, and other video formats.
diff --git a/drivers/media/platform/coda.c b/drivers/media/platform/coda.c
index cd04ae2..f17b659 100644
--- a/drivers/media/platform/coda.c
+++ b/drivers/media/platform/coda.c
@@ -14,6 +14,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -24,7 +25,6 @@
 #include 
 #include 
 
-#include 
 #include 
 #include 
 #include 
@@ -43,6 +43,7 @@
 #define CODA7_WORK_BUF_SIZE(512 * 1024 + CODA_FMO_BUF_SIZE * 8 * 1024)
 #define CODA_PARA_BUF_SIZE (10 * 1024)
 #define CODA_ISRAM_SIZE(2048 * 2)
+#define CODADX6_IRAM_SIZE  0xb000
 #define CODA7_IRAM_SIZE0x14000 /* 81920 bytes */
 
 #define CODA_MAX_FRAMEBUFFERS  2
@@ -128,7 +129,10 @@ struct coda_dev {
 
struct coda_aux_buf codebuf;
struct coda_aux_buf workbuf;
+   struct gen_pool *iram_pool;
+   long unsigned int   iram_vaddr;
long unsigned int   iram_paddr;
+   unsigned long   iram_size;
 
spinlock_t  irqlock;
struct mutexdev_mutex;
@@ -1958,6 +1962,22 @@ static int __devinit coda_probe(struct platform_device *pdev)
return -ENOENT;
}
 
+   /* Without device tree, get SRAM paddr from second memory resource */
+   res = platform_get_resource(pdev, IORESOURCE_MEM, 1);
+   if (res != NULL)
+   dev->iram_pool = gen_pool_find_by_phys(res->start);
+#ifdef CONFIG_OF
+   if (!dev->iram_pool) {
+   struct device_node *np = pdev->dev.of_node;
+
+   dev->iram_pool = of_get_named_gen_pool(np, "iram", 0);
+   }
+#endif
+   if (!dev->iram_pool) {
+   dev_err(&pdev->dev, "iram pool not available\n");
+   return -ENOMEM;
+   }
+
ret = v4l2_device_register(&pdev->dev, &dev->v4l2_dev);
if (ret)
return ret;
@@ -1992,18 +2012,17 @@ static int __devinit coda_probe(struct platform_device *pdev)
 		return -ENOMEM;
 	}
 
-	if (dev->devtype->product == CODA_DX6) {
-		dev->iram_paddr = 0x4c00;
-	} else {
-		void __iomem *iram_vaddr;
-
-		iram_vaddr = iram_alloc(CODA7_IRAM_SIZE,
-					&dev->iram_paddr);
-		if (!iram_vaddr) {
-			dev_err(&pdev->dev, "unable to alloc iram\n");
-			return -ENOMEM;
-		}
+	if (dev->devtype->product == CODA_DX6)
+		dev->iram_size = CODADX6_IRAM_SIZE;
+	else
+		dev->iram_size = CODA7_IRAM_SIZE;
+	dev->iram_vaddr = gen_pool_alloc(dev->iram_pool, dev->iram_size);
+	if (!dev->iram_vaddr) {
+		dev_err(&pdev->dev, "unable to alloc iram\n");
+		return -ENOMEM;
 	}
+	dev->iram_paddr = gen_pool_virt_to_phys(dev->iram_pool,
+						dev->iram_vaddr);
 
 	platform_set_drvdata(pdev, dev);
 
@@ -2020,8 +2039,8 @@ static int coda_remove(struct platform_device *pdev)
 	if (dev->alloc_ctx)
 		vb2_dma_contig_cleanup_ctx(dev->alloc_ctx);
 	v4l2_device_unregister(&dev->v4l2_dev);
-	if (dev->iram_paddr)
-		iram_free(dev->iram_paddr, CODA7_IRAM_SIZE);
+	if (dev->iram_vaddr)
+		gen_pool_free(dev->iram_pool, dev->iram_vaddr, dev->iram_size);
 	if (dev->codebuf.vaddr)
 		dma_free_coherent(&pdev->dev, dev->codebuf.size,
 				  &dev->codebuf.vaddr, dev->codebuf.paddr);
-- 
1.7.10.4
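
For readers following the probe and remove hunks above, here is a minimal
userspace sketch of the same flow: pick a size by product, allocate from a
pool, translate the returned virtual address to a physical one, and free
with the recorded size on removal. Everything prefixed mock_ or MOCK_ is
hypothetical and only stands in for the genalloc calls used in the patch;
it is not the kernel API, and the bump allocator is far simpler than the
real pool (it only supports freeing the most recent allocation).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a gen_pool: one region, bump allocation,
 * and a fixed offset between "virtual" and "physical" addresses. */
struct mock_pool {
	uintptr_t virt_base;	/* must be nonzero so 0 can signal failure */
	uintptr_t phys_base;
	size_t size;
	size_t used;
};

/* Returns a virtual address, or 0 on failure, mirroring how the patch
 * treats a zero return from gen_pool_alloc() as an error. */
static uintptr_t mock_pool_alloc(struct mock_pool *p, size_t len)
{
	uintptr_t vaddr;

	if (len == 0 || len > p->size - p->used)
		return 0;
	vaddr = p->virt_base + p->used;
	p->used += len;
	return vaddr;
}

static uintptr_t mock_pool_virt_to_phys(const struct mock_pool *p,
					uintptr_t vaddr)
{
	return vaddr - p->virt_base + p->phys_base;
}

/* Simplification: only the most recent allocation may be freed. */
static void mock_pool_free(struct mock_pool *p, uintptr_t vaddr, size_t len)
{
	assert(vaddr + len == p->virt_base + p->used);
	p->used -= len;
}

enum mock_product { MOCK_DX6, MOCK_7 };

#define MOCK_DX6_IRAM_SIZE	0xb000	/* same values as the patch */
#define MOCK_7_IRAM_SIZE	0x14000

struct mock_dev {
	struct mock_pool *iram_pool;
	uintptr_t iram_vaddr;
	uintptr_t iram_paddr;
	size_t iram_size;
};

/* Same order of operations as the probe hunk: size, alloc, translate. */
static int mock_probe(struct mock_dev *dev, struct mock_pool *pool,
		      enum mock_product product)
{
	if (!pool)
		return -ENOMEM;
	dev->iram_pool = pool;
	dev->iram_size = (product == MOCK_DX6) ? MOCK_DX6_IRAM_SIZE
					       : MOCK_7_IRAM_SIZE;
	dev->iram_vaddr = mock_pool_alloc(pool, dev->iram_size);
	if (!dev->iram_vaddr)
		return -ENOMEM;
	dev->iram_paddr = mock_pool_virt_to_phys(pool, dev->iram_vaddr);
	return 0;
}

/* Frees by virtual address and recorded size, as the remove hunk does. */
static void mock_remove(struct mock_dev *dev)
{
	if (dev->iram_vaddr) {
		mock_pool_free(dev->iram_pool, dev->iram_vaddr,
			       dev->iram_size);
		dev->iram_vaddr = 0;
	}
}
```

Keeping iram_size in the device structure is what lets removal free the
right amount for either product, where the old code always passed
CODA7_IRAM_SIZE.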


Re: [PATCH 1/5] ARM: AT91SAM9G45: fix crypto peripherals irq issue due to sparse irq support.

2012-11-16 Thread Nicolas Ferre

On 11/07/2012 04:27 PM, Jean-Christophe PLAGNIOL-VILLARD :
> On 17:31 Tue 06 Nov , Nicolas Royer wrote:
>> Sparse irq support, introduced by commit "8fe82a5 ARM: at91: sparse irq
>> support", requires adding the NR_IRQS_LEGACY offset to the irq number.
>>
>> Signed-off-by: Nicolas Royer 
>> Acked-by: Nicolas Ferre 
>> Acked-by: Eric Bénard 
>> Tested-by: Eric Bénard 
>> Cc: sta...@vger.kernel.org # 3.6
> 
> ok will see if we can have it for 3.7

Queued in our at91-fixes pull request for 3.7-final.

Thanks a lot, best regards,
-- 
Nicolas Ferre
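
As a small illustration of the fix described in the quoted commit message,
the sketch below adds a legacy offset to a hardware interrupt number to get
the number a device resource would carry once sparse irq support is in use.
The names MOCK_NR_IRQS_LEGACY, MOCK_ID_CRYPTO and mock_irq_for_peripheral()
are hypothetical, and the numeric values are placeholders chosen for the
example, not taken from the AT91 headers.

```c
/* Placeholder for the kernel's NR_IRQS_LEGACY; the value is illustrative. */
#define MOCK_NR_IRQS_LEGACY	16

/* Placeholder peripheral ID for a crypto block; not a real AT91 ID. */
#define MOCK_ID_CRYPTO		5

/* With sparse irq, the first MOCK_NR_IRQS_LEGACY numbers are reserved, so
 * a peripheral's irq number is its hardware ID shifted by that offset. */
static int mock_irq_for_peripheral(int hw_id)
{
	return MOCK_NR_IRQS_LEGACY + hw_id;
}
```

Passing the bare hardware ID instead would land in the reserved legacy
range, which is the kind of mismatch the patch corrects for the crypto
peripherals.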

[PATCH v6 4/4] ARM: dts: add sram for imx53 and imx6q

2012-11-16 Thread Philipp Zabel

Signed-off-by: Philipp Zabel 
Reviewed-by: Shawn Guo 
---
 arch/arm/boot/dts/imx53.dtsi |5 +
 arch/arm/boot/dts/imx6q.dtsi |6 ++
 2 files changed, 11 insertions(+)

diff --git a/arch/arm/boot/dts/imx53.dtsi b/arch/arm/boot/dts/imx53.dtsi
index 76ebb1a..7677218 100644
--- a/arch/arm/boot/dts/imx53.dtsi
+++ b/arch/arm/boot/dts/imx53.dtsi
@@ -541,5 +541,10 @@
 				status = "disabled";
 			};
 		};
+
+		ocram: ocram@f800 {
+			compatible = "fsl,imx-ocram", "sram";
+			reg = <0xf800 0x2>;
+		};
 	};
 };
diff --git a/arch/arm/boot/dts/imx6q.dtsi b/arch/arm/boot/dts/imx6q.dtsi
index f3990b0..855ac25 100644
--- a/arch/arm/boot/dts/imx6q.dtsi
+++ b/arch/arm/boot/dts/imx6q.dtsi
@@ -116,6 +116,12 @@
 			status = "disabled";
 		};
 
+		ocram: ocram@0090 {
+			compatible = "fsl,imx-ocram", "sram";
+			reg = <0x0090 0x3f000>;
+			clocks = <&clks 142>;
+		};
+
 		timer@00a00600 {
 			compatible = "arm,cortex-a9-twd-timer";
 			reg = <0x00a00600 0x20>;
-- 
1.7.10.4

