[RFC v02 1/5] PowerCap: Documentation

2013-08-07 Thread Srinivas Pandruvada
Add power cap framework documentation. It explains the use of the power capping
framework, its sysfs layout and its programming interface.
There are two documents:
Documentation/powercap/PowerCappingFramework.txt: Explains the use case and API in
detail.
Documentation/ABI/testing/sysfs-class-powercap: Explains the ABI.

Reviewed-by: Len Brown 
Signed-off-by: Srinivas Pandruvada 
Signed-off-by: Jacob Pan 
Signed-off-by: Arjan van de Ven 
---
 Documentation/ABI/testing/sysfs-class-powercap   | 165 ++
 Documentation/powercap/PowerCappingFramework.txt | 686 +++
 2 files changed, 851 insertions(+)
 create mode 100644 Documentation/ABI/testing/sysfs-class-powercap
 create mode 100644 Documentation/powercap/PowerCappingFramework.txt

diff --git a/Documentation/ABI/testing/sysfs-class-powercap b/Documentation/ABI/testing/sysfs-class-powercap
new file mode 100644
index 000..0e5d6e4
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-class-powercap
@@ -0,0 +1,165 @@
+What:  /sys/class/power_cap/
+Date:  August 2013
+KernelVersion: 3.12
+Contact:   linux...@vger.kernel.org
+Description:
+   The power_cap/ class sub directory belongs to the power cap
+   subsystem. Refer to
+   Documentation/powercap/PowerCappingFramework.txt for details.
+
+What:  /sys/class/power_cap/controller_name
+Date:  August 2013
+KernelVersion: 3.12
+Contact:   linux...@vger.kernel.org
+Description:
+   The /sys/class/power_cap/controller_name directories correspond
+   to each controller under power_cap control. Here controller_name
+   is a unique name under /sys/class/power_cap. Each
+   controller_name directory contains one or more power zones.
+
+What:  /sys/class/power_cap/controller_name/type
+Date:  August 2013
+KernelVersion: 3.12
+Contact:   linux...@vger.kernel.org
+Description:
+   For a controller, type is "controller". This allows user space
+   to differentiate a controller device from a power zone
+   device.
+
+What:  /sys/class/power_cap/controller_name/power_zone
+Date:  August 2013
+KernelVersion: 3.12
+Contact:   linux...@vger.kernel.org
+Description:
+   A controller can have one or more power zones. A power zone is
+   an abstraction of devices that can be independently monitored
+   and controlled.
+
+What:  /sys/class/power_cap/controller_name/power_zone/power_zone
+Date:  August 2013
+KernelVersion: 3.12
+Contact:   linux...@vger.kernel.org
+Description:
+   A power zone can have one or more power zones as children.
+   A child power zone provides monitoring and control for
+   a subset of the devices under its parent. E.g. if there is a parent
+   power zone for a whole CPU package, each CPU core in it can be
+   a child power zone.
+
+What:  /sys/class/power_cap/controller_name/power_zone/name
+Date:  August 2013
+KernelVersion: 3.12
+Contact:   linux...@vger.kernel.org
+Description:
+   Specifies the name of this power zone.
+
+
+What:  /sys/class/power_cap/controller_name/power_zone/type
+Date:  August 2013
+KernelVersion: 3.12
+Contact:   linux...@vger.kernel.org
+Description:
+   For a power zone, type is "power-zone".
+
+
+What:  /sys/class/power_cap/controller_name/power_zone/energy_uj
+Date:  August 2013
+KernelVersion: 3.12
+Contact:   linux...@vger.kernel.org
+Description:
+   Current energy counter in micro-joules. Write "0" to reset.
+   If the counter cannot be reset, then this attribute is
+   read-only.
+
+What:  /sys/class/power_cap/controller_name/power_zone/
+   max_energy_range_uj
+Date:  August 2013
+KernelVersion: 3.12
+Contact:   linux...@vger.kernel.org
+Description:
+   Range of the above energy counter in micro-joules.
+
+
+What:  /sys/class/power_cap/controller_name/power_zone/power_uw
+Date:  August 2013
+KernelVersion: 3.12
+Contact:   linux...@vger.kernel.org
+Description:
+   Current power in micro-watts. Write "0" to reset.
+   If the value cannot be reset, then this attribute is
+   read-only.
+
+What:  /sys/class/power_cap/controller_name/power_zone/
+   max_power_range_uw
+Date:  August 2013
+KernelVersion: 3.12
+Contact:   linux...@vger.kernel.org
+Description:
+   Range of the above power value in micro-watts.
+
+What:  /sys/class/power_cap/controller_name/power_zone/
+   constraint_X_name
+Date:  August 2013
+KernelVersion: 3.12
+Contact:   
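
The energy_uj and max_energy_range_uj entries above suggest how user space
could turn two counter readings into consumed energy. What follows is only a
minimal sketch under a stated assumption: the ABI text calls
max_energy_range_uj the "range" of the counter, and the sketch assumes the
counter wraps back to zero once it reaches that range. The function name
energy_delta_uj() is hypothetical and not part of the proposed interface.

```c
#include <stdint.h>

/*
 * Energy consumed between two energy_uj readings, in micro-joules.
 * Assumption (not stated by the ABI text): the counter wraps to 0 after
 * reaching max_range, and it wrapped at most once between the readings.
 */
static uint64_t energy_delta_uj(uint64_t prev, uint64_t now, uint64_t max_range)
{
    if (now >= prev)
        return now - prev;              /* no wrap between the readings */
    return (max_range - prev) + now;    /* one wrap: tail of old lap + new lap */
}
```

A caller would read energy_uj twice, read max_energy_range_uj once, and divide
the returned delta by the elapsed time to estimate average power.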

Re: [PATCH v3 3/5] devicetree: create a separate binding description for sata_highbank

2013-08-07 Thread Kumar Gala

On Aug 7, 2013, at 10:52 AM, Mark Langsdorf wrote:

> The Calxeda sata_highbank driver has been adding its descriptions to the
> ahci driver. Separate them properly.
> 
> Signed-off-by: Mark Langsdorf 
> Acked-by: Rob Herring 
> ---
> Changes from v2
>   Fixed some indenting.
> Changes from v1
>   None.
> 
> .../devicetree/bindings/ata/ahci-platform.txt  | 18 +++-
> .../devicetree/bindings/ata/sata_highbank.txt  | 32 ++
> 2 files changed, 36 insertions(+), 14 deletions(-)
> create mode 100644 Documentation/devicetree/bindings/ata/sata_highbank.txt
> 
> diff --git a/Documentation/devicetree/bindings/ata/ahci-platform.txt b/Documentation/devicetree/bindings/ata/ahci-platform.txt
> index 3ec0c5c..89de156 100644
> --- a/Documentation/devicetree/bindings/ata/ahci-platform.txt
> +++ b/Documentation/devicetree/bindings/ata/ahci-platform.txt
> @@ -4,27 +4,17 @@ SATA nodes are defined to describe on-chip Serial ATA controllers.
> Each SATA controller should have its own node.
> 
> Required properties:
> -- compatible: compatible list, contains "calxeda,hb-ahci" or "snps,spear-ahci"
> +- compatible: compatible list, contains "snps,spear-ahci"
> - interrupts: 
> - reg   : 
> 
> Optional properties:
> -- calxeda,port-phys: phandle-combophy and lane assignment, which maps each
> - SATA port to a combophy and a lane within that
> - combophy
> -- calxeda,sgpio-gpio: phandle-gpio bank, bit offset, and default on or off,
> - which indicates that the driver supports SGPIO
> - indicator lights using the indicated GPIOs
> -- calxeda,led-order : a u32 array that map port numbers to offsets within the
> - SGPIO bitstream.
> - dma-coherent  : Present if dma operations are coherent
> 
> Example:
> sata@ffe08000 {
> - compatible = "calxeda,hb-ahci";
> -reg = <0xffe08000 0x1000>;
> -interrupts = <115>;
> - calxeda,port-phys = < 0  0  1
> -  2  3>;
> + compatible = "snps,spear-ahci";
> + reg = <0xffe08000 0x1000>;
> + interrupts = <115>;
> 
> };
> diff --git a/Documentation/devicetree/bindings/ata/sata_highbank.txt b/Documentation/devicetree/bindings/ata/sata_highbank.txt
> new file mode 100644
> index 000..1ac6d3d
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/ata/sata_highbank.txt
> @@ -0,0 +1,32 @@
> +* Calxeda AHCI SATA Controller
> +
> +SATA nodes are defined to describe on-chip Serial ATA controllers.
> +The Calxeda SATA controller mostly conforms to the AHCI interface
> +with some special extensions to add functionality.
> +Each SATA controller should have its own node.
> +
> +Required properties:
> +- compatible: compatible list, contains "calxeda,hb-ahci"
> +- interrupts: 
> +- reg   : 
> +
> +Optional properties:
> +- dma-coherent  : Present if dma operations are coherent
> +- calxeda,port-phys: phandle-combophy and lane assignment, which maps each
> + SATA port to a combophy and a lane within that
> + combophy
> +- calxeda,sgpio-gpio: phandle-gpio bank, bit offset, and default on or off,
> + which indicates that the driver supports SGPIO
> + indicator lights using the indicated GPIOs
> +- calxeda,led-order : a u32 array that map port numbers to offsets within the
> + SGPIO bitstream.

nit: whitespace after :

> +
> +Example:
> +sata@ffe08000 {
> + compatible = "calxeda,hb-ahci";
> + reg = <0xffe08000 0x1000>;
> + interrupts = <115>;
> + calxeda,port-phys = < 0  0  1
> +  2  3>;
> +

It's probably good to show all optional props (dma-coherent,
calxeda,sgpio-gpio, & calxeda,led-order) in the example.

> +};
> -- 
> 1.8.1.2
> 
> --
> To unsubscribe from this list: send the line "unsubscribe devicetree" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by 
The Linux Foundation

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] gcc feature request: Moving blocks into sections

2013-08-07 Thread Steven Rostedt
On Wed, 2013-08-07 at 12:03 -0400, Mathieu Desnoyers wrote:

> You might want to try creating a global array of counters (accessible
> both from C for printout and assembly for update).
> 
> Index the array from assembly using:   (2f - 1f)
> 
> 1:
> jmp ...;
> 2:
> 
> And put an atomic increment of the counter. This increment instruction
> should be located prior to the jmp for obvious reasons.
> 
> You'll end up with the sums you're looking for at indexes 2 and 5 of the
> array.

After I post the patches, feel free to knock yourself out.

-- Steve




Re: [PATCH] ARM: at91/dt: split sam9x5 peripheral definitions

2013-08-07 Thread Thomas Petazzoni
Dear Boris BREZILLON,

On Wed,  7 Aug 2013 12:14:26 +0200, Boris BREZILLON wrote:
> This patch splits the sam9x5 peripheral definitions into:
> - a common base for all sam9x5 SoCs (at91sam9x5.dtsi)
> - several optional peripheral definitions which will be included by specific
>   sam9x5 SoCs (at91sam9x5_'periph name'.dtsi)
> 
> This provides a better representation of the real hardware (drop unneeded
> dt nodes) and avoids future peripheral id conflict (lcdc and isi both use
> peripheral id 25).
> 
> Signed-off-by: Boris BREZILLON 
> ---
>  arch/arm/boot/dts/at91sam9g25.dtsi   |2 +
>  arch/arm/boot/dts/at91sam9g35.dtsi   |1 +
>  arch/arm/boot/dts/at91sam9x25.dtsi   |   24 ++-
>  arch/arm/boot/dts/at91sam9x35.dtsi   |1 +
>  arch/arm/boot/dts/at91sam9x5.dtsi|   67 --
>  arch/arm/boot/dts/at91sam9x5_macb0.dtsi  |   56 +
>  arch/arm/boot/dts/at91sam9x5_macb1.dtsi  |   44 
>  arch/arm/boot/dts/at91sam9x5_usart3.dtsi |   51 +++
>  8 files changed, 158 insertions(+), 88 deletions(-)
>  create mode 100644 arch/arm/boot/dts/at91sam9x5_macb0.dtsi
>  create mode 100644 arch/arm/boot/dts/at91sam9x5_macb1.dtsi
>  create mode 100644 arch/arm/boot/dts/at91sam9x5_usart3.dtsi

Hum, do we really want to have .dtsi files per peripheral? I might have
overlooked this, but I think it's the first time we would have this in
arch/arm/boot/dts.

Thomas
-- 
Thomas Petazzoni, Free Electrons
Kernel, drivers, real-time and embedded Linux
development, consulting, training and support.
http://free-electrons.com


Re: linux-next: Tree for Aug 7

2013-08-07 Thread Phil Sutter
On Wed, Aug 07, 2013 at 10:29:18AM +0200, Sedat Dilek wrote:
> On Wed, Aug 7, 2013 at 7:54 AM, Stephen Rothwell wrote:
> > Hi all,
> >
> > Changes since 20130806:
> >
> > The ext4 tree lost its build failure.
> >
> > The mvebu tree gained a build failure so I used the version from
> > next-20130806.
> >
> > The akpm tree gained conflicts against the ext4 tree.
> >
> > 
> >
> 
> [ CC some netdev and wireless folks ]
> 
> Yesterday, I discovered an issue with net-next.
> The patch in [1] fixed the problems in my network/wifi environment.
> Hannes confirmed that the virtio_net problems are solved, too.
> Today's next-20130807 still needs it for affected people.
> 
> - Sedat -
> 
> [1] http://marc.info/?l=linux-netdev&m=137582524017840&w=2
> [2] http://marc.info/?l=linux-netdev&m=137583048219416&w=2
> [3] http://marc.info/?t=13757971288&r=1&w=2

Could you please try the attached patch? It limits parsing the ethernet
header (by calling eth_type_trans()) to cases where the configured
protocol is ETH_P_ALL, so at least for 802.1X this should fix the
problem.

The idea behind this patch is that users setting the protocol to
something else probably do know better and so should be left alone.
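
The decision described above can be sketched in plain C. This is a user-space
illustration under stated assumptions, not kernel code: pick_protocol() and
the *_SKETCH constants are hypothetical names, and header_proto stands in for
the value eth_type_trans() would derive from the frame's ethernet header.

```c
#include <arpa/inet.h>
#include <stdint.h>

/* Well-known ethertype values, renamed to avoid clashing with kernel headers. */
#define ETH_P_ALL_SKETCH 0x0003 /* socket bound to "every protocol" */
#define ETH_P_IP_SKETCH  0x0800 /* IPv4 */
#define ETH_P_PAE_SKETCH 0x888E /* 802.1X port access entity */

/*
 * Choose the protocol recorded for an outgoing frame.
 * Only when the device is ethernet AND the socket asked for ETH_P_ALL is
 * the value parsed from the frame header used; otherwise the protocol the
 * user configured on the socket is left alone.
 */
static uint16_t pick_protocol(int dev_is_ether, uint16_t sock_proto,
                              uint16_t header_proto)
{
    if (dev_is_ether && sock_proto == htons(ETH_P_ALL_SKETCH))
        return header_proto;
    return sock_proto;
}
```

With a socket bound to the 802.1X ethertype the configured value wins, which
is the case the patch is meant to fix.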

Best wishes, Phil
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index bbe1ece..66bc79c 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -1932,8 +1932,6 @@ static int tpacket_fill_skb(struct packet_sock *po, struct sk_buff *skb,
 
ph.raw = frame;
 
-   skb->protocol = proto;
-   skb->dev = dev;
skb->priority = po->sk.sk_priority;
skb->mark = po->sk.sk_mark;
sock_tx_timestamp(&po->sk, &skb_shinfo(skb)->tx_flags);
@@ -2002,13 +2000,18 @@ static int tpacket_fill_skb(struct packet_sock *po, struct sk_buff *skb,
if (unlikely(err))
return err;
 
-   if (dev->type == ARPHRD_ETHER)
-   skb->protocol = eth_type_trans(skb, dev);
-
data += dev->hard_header_len;
to_write -= dev->hard_header_len;
}
 
+   if (dev->type == ARPHRD_ETHER &&
+   proto == htons(ETH_P_ALL)) {
+   skb->protocol = eth_type_trans(skb, dev);
+   } else {
+   skb->protocol = proto;
+   skb->dev = dev;
+   }
+
max_frame_len = dev->mtu + dev->hard_header_len;
if (skb->protocol == htons(ETH_P_8021Q))
max_frame_len += VLAN_HLEN;
@@ -2331,15 +2334,17 @@ static int packet_snd(struct socket *sock,
 
sock_tx_timestamp(sk, &skb_shinfo(skb)->tx_flags);
 
-   if (dev->type == ARPHRD_ETHER) {
+   if (dev->type == ARPHRD_ETHER &&
+   proto == htons(ETH_P_ALL)) {
skb->protocol = eth_type_trans(skb, dev);
-   if (skb->protocol == htons(ETH_P_8021Q))
-   reserve += VLAN_HLEN;
} else {
skb->protocol = proto;
skb->dev = dev;
}
 
+   if (skb->protocol == htons(ETH_P_8021Q))
+   reserve += VLAN_HLEN;
+
if (!gso_type && (len > dev->mtu + reserve + extra_len)) {
err = -EMSGSIZE;
goto out_free;


Re: WARNING: CPU: 26 PID: 93793 at fs/ext4/inode.c:230 ext4_evict_inode+0x4c9/0x500 [ext4]() still in 3.11-rc3

2013-08-07 Thread Guenter Roeck

On 08/07/2013 08:33 AM, Jan Kara wrote:
> On Wed 07-08-13 08:27:32, Guenter Roeck wrote:
>> On 08/07/2013 08:20 AM, Jan Kara wrote:
>>> On Thu 01-08-13 20:58:46, Davidlohr Bueso wrote:
>>>> On Thu, 2013-08-01 at 22:33 +0200, Jan Kara wrote:
>>>>>    Hi,
>>>>>
>>>>> On Thu 01-08-13 13:14:19, Davidlohr Bueso wrote:
>>>>>> FYI I'm seeing loads of the following messages with Linus' latest
>>>>>> 3.11-rc3 (which includes 822dbba33458cd6ad)
>>>>>
>>>>>    Thanks for notice. I see you are running reaim to trigger this. What
>>>>> workload?
>>>>
>>>> After re-running the workloads one by one, I finally hit the issue again
>>>> with 'dbase'. FWIW I'm using ramdisks + ext4.
>>>
>>>    Hum, I'm not able to reproduce this with current Linus' kernel - commit
>>> e4ef108fcde0b97ed38923ba1ea06c7a152bab9e - I've tried with ramdisk but no
>>> luck. Are you using some special mount options?
>>
>> I don't see this commit in the upstream kernel ?
>
>    It is Linus's merge of Tejun's libata fix from Tuesday...
>
>> I tried reproducing the problem on the same system I had seen 822dbba33458cd6ad on,
>> with the same workload. It has now been running since last Friday, but I have
>> not seen any problems.
>
>    Ah, OK, so it may be fixed after all. If you happen to see it again,
> please let me know. Thanks!

At least the problem I found, yes. The problem Davidlohr found may be a
different one.

Guenter




Re: [PATCH 1/2] hwmon: (lm90) Add power control

2013-08-07 Thread Stephen Warren
On 08/07/2013 03:35 AM, Wei Ni wrote:
> On 08/07/2013 04:45 PM, Alexander Shiyan wrote:
>>> On 08/07/2013 03:50 PM, Guenter Roeck wrote:
>>>> On 08/07/2013 12:32 AM, Wei Ni wrote:
>>>>> On 08/07/2013 03:27 PM, Alexander Shiyan wrote:
>>>>>>> The device lm90 can be controlled by the vdd rail.
>>>>>>> Adding the power control support to power on/off the vdd rail.
>>>>>>> And make sure that power is enabled before accessing the device.
>>>>>>>
>>>>>>> Signed-off-by: Wei Ni
>>>>>>> ---
>>>>>>>   drivers/hwmon/lm90.c |   52 ++
>>>>>> [...]
>>>>>>> +   if (!data->lm90_reg) {
>>>>>>> +   data->lm90_reg = regulator_get(&client->dev, "vdd");
>>>>>>> +   if (IS_ERR_OR_NULL(data->lm90_reg)) {
>>>>>>> +   if (PTR_ERR(data->lm90_reg) == -ENODEV)
>>>>>>> +   dev_info(&client->dev,
>>>>>>> +"No regulator found for vdd. Assuming vdd is always powered.");
>>>>>>> +   else
>>>>>>> +   dev_warn(&client->dev,
>>>>>>> +"Error [%ld] in getting the regulator handle for vdd.\n",
>>>>>>> +PTR_ERR(data->lm90_reg));
>>>>>>> +   data->lm90_reg = NULL;
>>>>>>> +   mutex_unlock(&data->update_lock);
>>>>>>> +   return -ENODEV;
>>>>>>> +   }
>>>>>>> +   }
>>>>>>> +   if (is_enable) {
>>>>>>> +   ret = regulator_enable(data->lm90_reg);
>>>>>>> +   msleep(POWER_ON_DELAY);
>>>>>>
>>>>>> Can this delay be handled directly from regulator?
>>>>>
>>>>> I think it should be handled in the device driver.
>>>>> Because devices need different delay times to become stable.
>>>>>
>>>>
>>>> Then why does no other caller of regulator_enable() need this ?
>>>> I don't think lm90 is so much different to other users of regulator
>>>> functionality.
>>>
>>> Maybe I'm wrong. I noticed that in the lm90 spec, the maximum "SMBus Clock
>>> Low Time" is 25ms, so I supposed that it may need about 20ms to stabilize
>>> after power on.
>>>
>>> Anyway, if I remove this delay, the driver also works fine, so I will
>>> remove it in my next patch.
>>
>> I originally had in mind that the regulator API contains its own delay option.
>> E.g. reg-fixed-voltage and gpio-regulator have a "startup-delay-us"
>> property.
> 
> As far as I know, "startup-delay-us" applies to the regulator device, not
> the consumer devices.

Yes, the regulator should encode its own startup delay. Each individual
device should handle its own requirements for delay after power is stable.

> In this patch, msleep(POWER_ON_DELAY) was used to wait for the lm90 to
> become stable, but it seems it's unnecessary now :)

No, the driver needs to handle this properly. If the datasheet says a
delay is needed, it is.

It's probably working because in your tests the supply just happens to
be on already.
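
The split of responsibilities described above (the regulator accounts for its
own ramp-up, the consumer adds whatever its datasheet requires once power is
stable) can be sketched as a small user-space simulation. Nothing here is the
kernel regulator API: struct fake_regulator, fake_regulator_enable(),
fake_delay_us() and the 20 ms SENSOR_POWER_ON_DELAY_US value are all
hypothetical stand-ins chosen for illustration.

```c
#include <stddef.h>

/* Hypothetical stand-in for a supply with a fixed startup delay. */
struct fake_regulator {
    int enabled;
    unsigned int startup_delay_us;
};

/* Accumulates simulated waiting time instead of actually sleeping. */
static unsigned long total_wait_us;

static void fake_delay_us(unsigned int us)
{
    total_wait_us += us;
}

/* The regulator waits for its own output to ramp up before returning. */
static int fake_regulator_enable(struct fake_regulator *r)
{
    if (r == NULL)
        return -1;
    r->enabled = 1;
    fake_delay_us(r->startup_delay_us);
    return 0;
}

/* Hypothetical settle time; a real driver would take it from the datasheet. */
#define SENSOR_POWER_ON_DELAY_US 20000

/* The consumer checks the enable result, then waits for the device itself. */
static int sensor_power_on(struct fake_regulator *vdd)
{
    int ret = fake_regulator_enable(vdd);

    if (ret)
        return ret;     /* do not touch the device if power failed */
    fake_delay_us(SENSOR_POWER_ON_DELAY_US);
    return 0;
}
```

Unlike the quoted patch, the sketch checks the enable result before waiting,
and the device delay is independent of whichever supply feeds the rail.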


Re: [RFC] gcc feature request: Moving blocks into sections

2013-08-07 Thread Mathieu Desnoyers
* Steven Rostedt (rost...@goodmis.org) wrote:
> On Wed, 2013-08-07 at 07:06 +0200, Ondřej Bílka wrote:
> 
> > Add short_counter and long_counter, and increment the matching counter
> > before each jump. That way we will know how many short/long jumps were taken.
> 
> That's not trivial at all. The jump is a single location (in an asm
> goto() statement) that happens to be inlined throughout the kernel. The
> assembler decides if it will be a short or long jump. How do you add a
> counter to count the difference?

You might want to try creating a global array of counters (accessible
both from C for printout and assembly for update).

Index the array from assembly using:   (2f - 1f)

1:
jmp ...;
2:

And put an atomic increment of the counter. This increment instruction
should be located prior to the jmp for obvious reasons.

You'll end up with the sums you're looking for at indexes 2 and 5 of the
array.
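
The C side of this idea might look roughly like the sketch below. It is only
an illustration: in the real scheme the index would come from the assembler
as the label difference (2f - 1f), whereas here it is passed in as a plain
argument. The names jmp_len_count, count_jump() and jumps_of_len() are
hypothetical. The indexes 2 and 5 correspond to the encoded sizes of an x86
short jmp (rel8) and near jmp (rel32).

```c
#include <stdatomic.h>

/* Longest jump encoding we expect to see: 5 bytes for jmp rel32. */
#define MAX_JMP_LEN 5

/* One counter per possible instruction length; only [2] and [5] get used. */
static atomic_ulong jmp_len_count[MAX_JMP_LEN + 1];

/* Record one executed jump whose encoding is len bytes long. */
static int count_jump(unsigned int len)
{
    if (len > MAX_JMP_LEN)
        return -1;                          /* unexpected encoding size */
    atomic_fetch_add(&jmp_len_count[len], 1);
    return 0;
}

/* Read back the total for one encoding size, e.g. for a printout. */
static unsigned long jumps_of_len(unsigned int len)
{
    return len <= MAX_JMP_LEN ? atomic_load(&jmp_len_count[len]) : 0;
}
```

Printing jumps_of_len(2) and jumps_of_len(5) then gives the short and long
jump totals.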

Thanks,

Mathieu

> 
> The output I gave is from the boot up code that converts the jmp back to
> a nop (or in this case, the default nop to the ideal nop). It knows the
> size by reading the op code. This is a static analysis, not a running
> one. It's no trivial task to have a counter for each jump.
> 
> There is a way though. We can enable all the jumps (all tracepoints, and
> other users of jumplabel), record the trace and then compare the trace
> to the output that shows which ones were short jumps; all others are
> long jumps.
> 
> I'll post the patches soon and you can have fun doing the compare :-)
> 
> Actually, I'm working on the 4 patches of the series that is more about
> clean ups and safety checks than the jmp conversion. That is not
> controversial, and I'll be posting them for 3.12 soon.
> 
> After that, I'll post the updated patches that have the conversion as
> well as the counter, for RFC and for others to play with.
> 
> -- Steve
> 
> 

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: [PATCH 2/2] ARM: dt: t114 dalmore: add dt entry for nct1008

2013-08-07 Thread Stephen Warren
On 08/07/2013 12:52 AM, Wei Ni wrote:
> Enable thermal sensor nct1008 for t114 dalmore.

Wei, I assume this patch doesn't depend on any of the other LM90-related
patches you've sent; I can simply apply it right away?

Is the LM90 DT binding fully documented somewhere, including the
vdd-supply property?


Re: [PATCH] ARM: dts: Fix memory node in skeleton64.dtsi

2013-08-07 Thread Jason Cooper
On Wed, Aug 07, 2013 at 08:23:06AM +0200, Gregory CLEMENT wrote:
> On 07/08/2013 03:33, Stepan Moskovchenko wrote:
> > Update the reg property of the memory node in
> > skeleton64.dtsi to reflect the fact that the root node uses
> > address-cells=2 and size-cells=2.
> 
> Good catch
> 
> Acked-by: Gregory CLEMENT 

Since we introduced the file, and I can't think of any other tree that
should take it, I'll go ahead and take it.

thx,

Jason.

> > Change-Id: Ie9b61166143969e020ceebc51e9a384405d8c0f2
> > Signed-off-by: Stepan Moskovchenko 
> > ---
> >  arch/arm/boot/dts/skeleton64.dtsi |2 +-
> >  1 files changed, 1 insertions(+), 1 deletions(-)
> > 
> > diff --git a/arch/arm/boot/dts/skeleton64.dtsi b/arch/arm/boot/dts/skeleton64.dtsi
> > index 1599415..b5d7f36 100644
> > --- a/arch/arm/boot/dts/skeleton64.dtsi
> > +++ b/arch/arm/boot/dts/skeleton64.dtsi
> > @@ -9,5 +9,5 @@
> > #size-cells = <2>;
> > chosen { };
> > aliases { };
> > -   memory { device_type = "memory"; reg = <0 0>; };
> > +   memory { device_type = "memory"; reg = <0 0 0 0>; };
> >  };
> > 
> 
> 
> -- 
> Gregory Clement, Free Electrons
> Kernel, drivers, real-time and embedded Linux
> development, consulting, training and support.
> http://free-electrons.com


Re: [PATCH] pinctrl: msm: Add support for MSM TLMM pinmux

2013-08-07 Thread Stephen Warren
On 08/06/2013 05:45 PM, Hanumant Singh wrote:
> On 7/31/2013 5:17 PM, Hanumant Singh wrote:
>> On 7/31/2013 2:06 PM, Stephen Warren wrote:
>>> On 07/31/2013 01:46 PM, Hanumant Singh wrote:
>>>> On 7/30/2013 8:59 PM, Stephen Warren wrote:
>>>>> On 07/30/2013 06:13 PM, Hanumant Singh wrote:
>>>>>> On 7/30/2013 5:08 PM, Stephen Warren wrote:
>>>>>>> On 07/30/2013 06:01 PM, Hanumant Singh wrote:
>>>>>>>> On 7/30/2013 2:22 PM, Stephen Warren wrote:
>>>>>>>>> On 07/30/2013 03:10 PM, hanumant wrote:
>>>>>>>>> ...
>>>>>>>>>> We actually have the same TLMM pinmux used by several socs of a family.
>>>>>>>>>> The number of pins on each soc may vary.
>>>>>>>>>> Also a given soc gets used in a number of boards.
>>>>>>>>>> The device tree for a given soc is split into the different boards that
>>>>>>>>>> it's in, i.e. the boards inherit a common soc.dtsi but have separate dts.
>>>>>>>>>> The boards for the same soc may use different pin groups for
>>>>>>>>>> accomplishing a function, since we have multiple i2c, spi, uart etc.
>>>>>>>>>> peripheral instances on a soc. A different instance of each of the above
>>>>>>>>>> peripherals can be used in different boards, utilizing different
>>>>>>>>>> or subset of same pin groups.
>>>>>>>>>> Thus I would need to have multiple C files for one soc, based on the
>>>>>>>>>> boards that it goes into.
>>>>>>>>>
>>>>>>>>> The pinctrl driver should be exposing the raw capabilities of the HW.
>>>>>>>>> All the board-specific configuration should be expressed in DT. So, the
>>>>>>>>> driver shouldn't have to know anything about different boards at
>>>>>>>>> compile-time.
>>>>>>>>>
>>>>>>>> I agree, so I wanted to keep the pin grouping information in DT, we
>>>>>>>> already have a board based differentiation of dts files in DT, for the
>>>>>>>> same soc.
>>>>>>>
>>>>>>> That's the opposite of what I was saying. Pin groups are a feature of
>>>>>>> the SoC design, not the board.
>>>>>>>
>>>>>> Sorry I guess I wasn't clear.
>>>>>> Right now I have a soc-pinctrl.dtsi containing pin groupings.
>>>>>> This will be "inherited" by soc-boardtype.dts.
>>>>>> The pinctrl client device nodes in soc-boardtype.dts will point to pin
>>>>>> groupings in soc-pinctrl.dtsi that are valid for that particular
>>>>>> boardtype.
>>>>>> Is this a valid design?
>>>>>
>>>>> OK, so you have two types of child node inside the pinctrl DT node; some
>>>>> define the pin groups the SoC has (in soc.dtsi) and some define pinctrl
>>>>> states that reference the pin group nodes and are referenced by the
>>>>> client nodes.
>>>>>
>>>>> That's probably fine. However, I'd still question putting the pin group
>>>>> nodes in DT at all; I'm not convinced it's better than just putting
>>>>> those into the driver itself. You end up with the same data tables after
>>>>> parsing the DT anyway.
>>>>>
>>>>
>>>> Any feedback for the rest of the patch?
>>>
>>> I'm certainly waiting for this aspect of the patch to be resolved; I
>>> think it will impact the rest of the patch so much that it's not worth
>>> reviewing until we decide on where to represent the pin groups (some DT
>>> parsing code would be removed if we put the pin group definitions into
>>> the driver, hence wouldn't need to be reviewed, and likewise there'd be
>>> some new tables to review).
>>>
>>
>> I am trying to look at examples of what you are suggesting.
>> I was looking at the exynos implementation, and just from a brief glance
>> it seems like there too the pin grouping is being specified in the
>> device tree, using what looks like labels of the pins.
>> The labels are matched to group structures in soc specific files?
>>
>> By having the pin groupings in DT I am able to reuse the driver without
>> any SOC based code bloat.
>> As I mentioned earlier, we have entire families of SOCs using the same
>> TLMM hardware.
>> It's not a guarantee that for a given TLMM version,
>> the pin groupings on that hardware are the same for every SOC that it's
>> in. It's in fact most likely that I won't be able to use the pin groupings
>> from one SOC to the next even if they both use the same TLMM.
>> It will very quickly lead to a bloat of
>> pinctrl-.c (containing the pin groupings replicated for each
>> soc)
>> which use TLMM version specific register programming implementation
>> pinctrl-tlmm-.c
>> and the DT parsing and interface to framework (which remains unchanged).
>> pinctrl-msm.c.
>>
>> Thanks
>> Hanumant
>>
> 
> Any comments on this?

No. As I said, I personally want to see all the pingroups defined in the
pinctrl driver. But, if someone else acks/... the patches without it, I
probably won't nack it.
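
For illustration, pin groups kept as a table inside the driver might look
like the sketch below. It is a stand-alone approximation, not the pinctrl
subsystem's real data structures: struct pin_group, soc_groups, find_group()
and every group name and pin number in it are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical description of one pin group of the SoC. */
struct pin_group {
    const char *name;
    const unsigned int *pins;
    size_t npins;
};

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Made-up pin numbers, for illustration only. */
static const unsigned int i2c0_pins[] = { 6, 7 };
static const unsigned int uart1_pins[] = { 10, 11, 12, 13 };

/* Per-SoC table compiled into the driver instead of being parsed from DT. */
static const struct pin_group soc_groups[] = {
    { "i2c0",  i2c0_pins,  ARRAY_LEN(i2c0_pins) },
    { "uart1", uart1_pins, ARRAY_LEN(uart1_pins) },
};

/* Look a group up by name; returns NULL when the SoC has no such group. */
static const struct pin_group *find_group(const char *name)
{
    size_t i;

    for (i = 0; i < ARRAY_LEN(soc_groups); i++)
        if (strcmp(soc_groups[i].name, name) == 0)
            return &soc_groups[i];
    return NULL;
}
```

The board DT would then only select groups by name, while a table per SoC
costs one such array for each SoC variant, which is the trade-off debated in
this thread.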

[PATCH v3 3/5] devicetree: create a separate binding description for sata_highbank

2013-08-07 Thread Mark Langsdorf
The Calxeda sata_highbank driver has been adding its descriptions to the
ahci driver. Separate them properly.

Signed-off-by: Mark Langsdorf 
Acked-by: Rob Herring 
---
Changes from v2
Fixed some indenting.
Changes from v1
None.

 .../devicetree/bindings/ata/ahci-platform.txt  | 18 +++-
 .../devicetree/bindings/ata/sata_highbank.txt  | 32 ++
 2 files changed, 36 insertions(+), 14 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/ata/sata_highbank.txt

diff --git a/Documentation/devicetree/bindings/ata/ahci-platform.txt b/Documentation/devicetree/bindings/ata/ahci-platform.txt
index 3ec0c5c..89de156 100644
--- a/Documentation/devicetree/bindings/ata/ahci-platform.txt
+++ b/Documentation/devicetree/bindings/ata/ahci-platform.txt
@@ -4,27 +4,17 @@ SATA nodes are defined to describe on-chip Serial ATA controllers.
 Each SATA controller should have its own node.
 
 Required properties:
-- compatible: compatible list, contains "calxeda,hb-ahci" or "snps,spear-ahci"
+- compatible: compatible list, contains "snps,spear-ahci"
 - interrupts: 
 - reg   : 
 
 Optional properties:
-- calxeda,port-phys: phandle-combophy and lane assignment, which maps each
-   SATA port to a combophy and a lane within that
-   combophy
-- calxeda,sgpio-gpio: phandle-gpio bank, bit offset, and default on or off,
-   which indicates that the driver supports SGPIO
-   indicator lights using the indicated GPIOs
-- calxeda,led-order : a u32 array that map port numbers to offsets within the
-   SGPIO bitstream.
 - dma-coherent  : Present if dma operations are coherent
 
 Example:
 sata@ffe08000 {
-   compatible = "calxeda,hb-ahci";
-reg = <0xffe08000 0x1000>;
-interrupts = <115>;
-   calxeda,port-phys = < 0  0  1
-2  3>;
+   compatible = "snps,spear-ahci";
+   reg = <0xffe08000 0x1000>;
+   interrupts = <115>;
 
 };
diff --git a/Documentation/devicetree/bindings/ata/sata_highbank.txt b/Documentation/devicetree/bindings/ata/sata_highbank.txt
new file mode 100644
index 000..1ac6d3d
--- /dev/null
+++ b/Documentation/devicetree/bindings/ata/sata_highbank.txt
@@ -0,0 +1,32 @@
+* Calxeda AHCI SATA Controller
+
+SATA nodes are defined to describe on-chip Serial ATA controllers.
+The Calxeda SATA controller mostly conforms to the AHCI interface
+with some special extensions to add functionality.
+Each SATA controller should have its own node.
+
+Required properties:
+- compatible: compatible list, contains "calxeda,hb-ahci"
+- interrupts: 
+- reg   : 
+
+Optional properties:
+- dma-coherent  : Present if dma operations are coherent
+- calxeda,port-phys: phandle-combophy and lane assignment, which maps each
+   SATA port to a combophy and a lane within that
+   combophy
+- calxeda,sgpio-gpio: phandle-gpio bank, bit offset, and default on or off,
+   which indicates that the driver supports SGPIO
+   indicator lights using the indicated GPIOs
+- calxeda,led-order : a u32 array that map port numbers to offsets within the
+   SGPIO bitstream.
+
+Example:
+sata@ffe08000 {
+   compatible = "calxeda,hb-ahci";
+   reg = <0xffe08000 0x1000>;
+   interrupts = <115>;
+   calxeda,port-phys = < 0  0  1
+2  3>;
+
+};
-- 
1.8.1.2



[PATCH v3 5/5] sata, highbank: send extra clock cycles in SGPIO patterns

2013-08-07 Thread Mark Langsdorf
Some SGPIO PICs don't follow the standard very well and expect a certain
number of clock cycles or port frames in each SGPIO pattern. Add two
optional parameters in the DTB that can provide the number of extra
clock cycles to be sent before and after the SGPIO pattern. Read those
parameters from the DTB and send the extra clock cycles.

Signed-off-by: Mark Langsdorf 
Acked-by: Rob Herring 
---
Changes from v2
None.
Changes from v1
Added an example to the bindings.
Forced the pre-clocks and post-clocks values to 0 if there is an
error while reading them or the values aren't in the DTB.

 Documentation/devicetree/bindings/ata/sata_highbank.txt |  6 ++
 drivers/ata/sata_highbank.c | 13 +
 2 files changed, 19 insertions(+)

diff --git a/Documentation/devicetree/bindings/ata/sata_highbank.txt b/Documentation/devicetree/bindings/ata/sata_highbank.txt
index fdbd4476..6124a32 100644
--- a/Documentation/devicetree/bindings/ata/sata_highbank.txt
+++ b/Documentation/devicetree/bindings/ata/sata_highbank.txt
@@ -23,6 +23,10 @@ Optional properties:
 - calxeda,tx-atten  : a u32 array that contains TX attenuation override
codes, one per port. The upper 3 bytes are always
0 and thus ignored.
+- calxeda,pre-clocks : a u32 that indicates the number of additional clock
+   cycles to transmit before sending an SGPIO pattern
+- calxeda,post-clocks: a u32 that indicates the number of additional clock
+   cycles to transmit after sending an SGPIO pattern
 
 Example:
 sata@ffe08000 {
@@ -32,4 +36,6 @@ Example:
calxeda,port-phys = < 0  0  1
 2  3>;
calxeda,tx-atten = <0xff 22 0xff 0xff 23>;
+   calxeda,pre-clocks = <10>;
+   calxeda,post-clocks = <0>;
 };
diff --git a/drivers/ata/sata_highbank.c b/drivers/ata/sata_highbank.c
index a7c8038..7f5e5d9 100644
--- a/drivers/ata/sata_highbank.c
+++ b/drivers/ata/sata_highbank.c
@@ -84,6 +84,9 @@ static DEFINE_SPINLOCK(sgpio_lock);
 
 struct ecx_plat_data {
u32 n_ports;
+   /* number of extra clocks that the SGPIO PIC controller expects */
+   u32 pre_clocks;
+   u32 post_clocks;
unsigned sgpio_gpio[SGPIO_PINS];
u32 sgpio_pattern;
u32 port_to_sgpio[SGPIO_PORTS];
@@ -160,6 +163,9 @@ static ssize_t ecx_transmit_led_message(struct ata_port *ap, u32 state,
spin_lock_irqsave(&sgpio_lock, flags);
ecx_parse_sgpio(pdata, ap->port_no, state);
sgpio_out = pdata->sgpio_pattern;
+   for (i = 0; i < pdata->pre_clocks; i++)
+   ecx_led_cycle_clock(pdata);
+
gpio_set_value(pdata->sgpio_gpio[SLOAD], 1);
ecx_led_cycle_clock(pdata);
gpio_set_value(pdata->sgpio_gpio[SLOAD], 0);
@@ -172,6 +178,8 @@ static ssize_t ecx_transmit_led_message(struct ata_port *ap, u32 state,
sgpio_out >>= 1;
ecx_led_cycle_clock(pdata);
}
+   for (i = 0; i < pdata->post_clocks; i++)
+   ecx_led_cycle_clock(pdata);
 
/* save off new led state for port/slot */
emp->led_state = state;
@@ -206,6 +214,11 @@ static void highbank_set_em_messages(struct device *dev,
of_property_read_u32_array(np, "calxeda,led-order",
pdata->port_to_sgpio,
pdata->n_ports);
+   if (of_property_read_u32(np, "calxeda,pre-clocks", &pdata->pre_clocks))
+   pdata->pre_clocks = 0;
+   if (of_property_read_u32(np, "calxeda,post-clocks",
+   &pdata->post_clocks))
+   pdata->post_clocks = 0;
 
/* store em_loc */
hpriv->em_loc = 0;
-- 
1.8.1.2
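
The pre-clocks and post-clocks handling in this patch amounts to padding the
bit-banged SGPIO frame with idle clock pulses. A minimal userspace sketch,
assuming a simulated clock line instead of real GPIOs (struct sgpio_sim,
cycle_clock() and transmit_pattern() are hypothetical names, not part of the
driver):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the GPIO lines: it only records what was sent. */
struct sgpio_sim {
	uint32_t pre_clocks;   /* extra cycles before the pattern */
	uint32_t post_clocks;  /* extra cycles after the pattern */
	uint32_t clock_pulses; /* pulses seen on the clock line */
	uint32_t data_out;     /* bits seen on the data line, LSB first */
};

static void cycle_clock(struct sgpio_sim *s)
{
	s->clock_pulses++;
}

/*
 * Shift out the low nbits of pattern, framed by the configured extra
 * clock cycles. Returns the total number of clock pulses generated.
 */
static uint32_t transmit_pattern(struct sgpio_sim *s, uint32_t pattern,
				 unsigned int nbits)
{
	uint32_t i;

	assert(nbits <= 32);
	s->clock_pulses = 0;
	s->data_out = 0;

	for (i = 0; i < s->pre_clocks; i++)
		cycle_clock(s);

	cycle_clock(s);	/* one cycle with the load signal asserted */

	for (i = 0; i < nbits; i++) {
		s->data_out |= (pattern & 1u) << i;
		pattern >>= 1;
		cycle_clock(s);
	}

	for (i = 0; i < s->post_clocks; i++)
		cycle_clock(s);

	return s->clock_pulses;
}
```

With the values from the binding example (pre-clocks = 10, post-clocks = 0)
and a 24-bit pattern, the sketch generates 10 + 1 + 24 + 0 = 35 clock pulses.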



Re: unused swap offset / bad page map.

2013-08-07 Thread Dave Jones

void __lru_cache_add(struct page *page)
{
struct pagevec *pvec = &get_cpu_var(lru_add_pvec);

page_cache_get(page);
if (!pagevec_space(pvec))
__pagevec_lru_add(pvec);
pagevec_add(pvec, page);
put_cpu_var(lru_add_pvec);
}

I added a printk, and found that pagevec_add frequently returns 0. Is that ok ?

What happens to 'page' in this case ?

Dave
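
For reference, a userspace sketch of the pagevec helpers as I understand
them from include/linux/pagevec.h: pagevec_add() stores the page first and
then returns the space that is left, so a return value of 0 would mean "the
page was added and the vector is now full", not "the page was dropped". The
PAGEVEC_SIZE value of 14 and the opaque struct page are assumptions made for
this sketch.

```c
#include <assert.h>
#include <stddef.h>

#define PAGEVEC_SIZE 14	/* assumed value for kernels of this era */

struct page;		/* opaque in this sketch */

struct pagevec {
	unsigned long nr;
	struct page *pages[PAGEVEC_SIZE];
};

/* Number of free slots left in the vector. */
static unsigned int pagevec_space(const struct pagevec *pvec)
{
	return PAGEVEC_SIZE - pvec->nr;
}

/* Store the page, then report how much room remains (0 == now full). */
static unsigned int pagevec_add(struct pagevec *pvec, struct page *page)
{
	assert(pvec->nr < PAGEVEC_SIZE);
	pvec->pages[pvec->nr++] = page;
	return pagevec_space(pvec);
}
```

Under that reading, the full vector is flushed by the !pagevec_space() check
on the next call to __lru_cache_add().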



[PATCH v3 1/5] sata, highbank: fix ordering of SGPIO signals

2013-08-07 Thread Mark Langsdorf
The ACTIVITY and ERROR signals were reversed in the original commit.
Fix that so that hard drive activity does not show up on the error
light, and attempts to indicate that the hard drive is failing do
not show up as hard drive activity. This fixes a fairly serious
functional bug in the driver, but failing to apply this patch will
not cause any stability issues on the system.

Signed-off-by: Mark Langsdorf 
---
Changes from v2
Further rewords of the commit message.
Changes from v1
Expanded commit message explaining the problems with the
unpatched code.

 drivers/ata/sata_highbank.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/ata/sata_highbank.c b/drivers/ata/sata_highbank.c
index d047d92..e9a4f46 100644
--- a/drivers/ata/sata_highbank.c
+++ b/drivers/ata/sata_highbank.c
@@ -86,11 +86,11 @@ struct ecx_plat_data {
 
 #define SGPIO_SIGNALS  3
 #define ECX_ACTIVITY_BITS  0x30
-#define ECX_ACTIVITY_SHIFT 2
+#define ECX_ACTIVITY_SHIFT 0
 #define ECX_LOCATE_BITS    0x8
 #define ECX_LOCATE_SHIFT   1
 #define ECX_FAULT_BITS 0x40
-#define ECX_FAULT_SHIFT    0
+#define ECX_FAULT_SHIFT    2
 static inline int sgpio_bit_shift(struct ecx_plat_data *pdata, u32 port,
u32 shift)
 {
-- 
1.8.1.2
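
The effect of swapping the shift values can be sketched in userspace. The
body of sgpio_bit_shift() is not shown in this patch, so the layout below
(three signal bits per port, selected by the shift) is an assumption, and
sgpio_bit() is a hypothetical helper:

```c
#include <assert.h>
#include <stdint.h>

#define SGPIO_SIGNALS      3
#define SGPIO_PORTS        8
#define ECX_ACTIVITY_SHIFT 0	/* values after this patch */
#define ECX_LOCATE_SHIFT   1
#define ECX_FAULT_SHIFT    2

/*
 * Return the bit in the SGPIO pattern that carries one signal of one
 * port, assuming each port owns SGPIO_SIGNALS consecutive bits.
 */
static uint32_t sgpio_bit(const uint32_t *port_to_sgpio, uint32_t port,
			  uint32_t shift)
{
	assert(port < SGPIO_PORTS && shift < SGPIO_SIGNALS);
	return 1u << (SGPIO_SIGNALS * port_to_sgpio[port] + shift);
}
```

With an identity port map, port 0 activity lands on bit 0 and port 0 fault on
bit 2; before the patch the two were the other way round.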



[PATCH v3 4/5] sata, highbank: set tx_atten override bits

2013-08-07 Thread Mark Langsdorf
Some board designs do not drive the SATA transmit lines within the
specification. The ECME can provide override settings, on a per board
basis, to bring the transmit lines within spec. Read those settings
from the DTB and program them in.

Signed-off-by: Mark Langsdorf 
---
Changes from v2
None.
Changes from v1
Clarified that the array is a u32 array.
Added an example in the bindings.

 .../devicetree/bindings/ata/sata_highbank.txt  |  5 +-
 drivers/ata/sata_highbank.c| 58 +-
 2 files changed, 49 insertions(+), 14 deletions(-)

diff --git a/Documentation/devicetree/bindings/ata/sata_highbank.txt b/Documentation/devicetree/bindings/ata/sata_highbank.txt
index 1ac6d3d..fdbd4476 100644
--- a/Documentation/devicetree/bindings/ata/sata_highbank.txt
+++ b/Documentation/devicetree/bindings/ata/sata_highbank.txt
@@ -20,6 +20,9 @@ Optional properties:
indicator lights using the indicated GPIOs
 - calxeda,led-order : a u32 array that map port numbers to offsets within the
SGPIO bitstream.
+- calxeda,tx-atten  : a u32 array that contains TX attenuation override
+   codes, one per port. The upper 3 bytes are always
+   0 and thus ignored.
 
 Example:
 sata@ffe08000 {
@@ -28,5 +31,5 @@ Example:
interrupts = <115>;
calxeda,port-phys = < 0  0  1
 2  3>;
-
+   calxeda,tx-atten = <0xff 22 0xff 0xff 23>;
 };
diff --git a/drivers/ata/sata_highbank.c b/drivers/ata/sata_highbank.c
index 8b40025..a7c8038 100644
--- a/drivers/ata/sata_highbank.c
+++ b/drivers/ata/sata_highbank.c
@@ -46,14 +46,19 @@
 #define CR_BUSY    0x0001
 #define CR_START   0x0001
 #define CR_WR_RDN  0x0002
+#define CPHY_TX_INPUT_STS  0x2001
 #define CPHY_RX_INPUT_STS  0x2002
-#define CPHY_SATA_OVERRIDE 0x4000
-#define CPHY_OVERRIDE  0x2005
+#define CPHY_SATA_TX_OVERRIDE  0x8000
+#define CPHY_SATA_RX_OVERRIDE  0x4000
+#define CPHY_TX_OVERRIDE   0x2004
+#define CPHY_RX_OVERRIDE   0x2005
 #define SPHY_LANE  0x100
 #define SPHY_HALF_RATE 0x0001
 #define CPHY_SATA_DPLL_MODE    0x0700
 #define CPHY_SATA_DPLL_SHIFT   8
 #define CPHY_SATA_DPLL_RESET   (1 << 11)
+#define CPHY_SATA_TX_ATTEN 0x1c00
+#define CPHY_SATA_TX_ATTEN_SHIFT   10
 #define CPHY_PHY_COUNT 6
 #define CPHY_LANE_COUNT    4
 #define CPHY_PORT_COUNT    (CPHY_PHY_COUNT * CPHY_LANE_COUNT)
@@ -66,6 +71,7 @@ struct phy_lane_info {
void __iomem *phy_base;
u8 lane_mapping;
u8 phy_devs;
+   u8 tx_atten;
 };
 static struct phy_lane_info port_data[CPHY_PORT_COUNT];
 
@@ -76,7 +82,6 @@ static DEFINE_SPINLOCK(sgpio_lock);
 #define SGPIO_PINS 3
 #define SGPIO_PORTS    8
 
-/* can be cast as an ahci_host_priv for compatibility with most functions */
 struct ecx_plat_data {
u32 n_ports;
unsigned sgpio_gpio[SGPIO_PINS];
@@ -259,8 +264,27 @@ static void highbank_cphy_disable_overrides(u8 sata_port)
if (unlikely(port_data[sata_port].phy_base == NULL))
return;
tmp = combo_phy_read(sata_port, CPHY_RX_INPUT_STS + lane * SPHY_LANE);
-   tmp &= ~CPHY_SATA_OVERRIDE;
-   combo_phy_write(sata_port, CPHY_OVERRIDE + lane * SPHY_LANE, tmp);
+   tmp &= ~CPHY_SATA_RX_OVERRIDE;
+   combo_phy_write(sata_port, CPHY_RX_OVERRIDE + lane * SPHY_LANE, tmp);
+}
+
+static void cphy_override_tx_attenuation(u8 sata_port, u32 val)
+{
+   u8 lane = port_data[sata_port].lane_mapping;
+   u32 tmp;
+
+   if (val & 0x8)
+   return;
+
+   tmp = combo_phy_read(sata_port, CPHY_TX_INPUT_STS + lane * SPHY_LANE);
+   tmp &= ~CPHY_SATA_TX_OVERRIDE;
+   combo_phy_write(sata_port, CPHY_TX_OVERRIDE + lane * SPHY_LANE, tmp);
+
+   tmp |= CPHY_SATA_TX_OVERRIDE;
+   combo_phy_write(sata_port, CPHY_TX_OVERRIDE + lane * SPHY_LANE, tmp);
+
+   tmp |= (val << CPHY_SATA_TX_ATTEN_SHIFT) & CPHY_SATA_TX_ATTEN;
+   combo_phy_write(sata_port, CPHY_TX_OVERRIDE + lane * SPHY_LANE, tmp);
 }
 
 static void cphy_override_rx_mode(u8 sata_port, u32 val)
@@ -268,21 +292,21 @@ static void cphy_override_rx_mode(u8 sata_port, u32 val)
u8 lane = port_data[sata_port].lane_mapping;
u32 tmp;
tmp = combo_phy_read(sata_port, CPHY_RX_INPUT_STS + lane * SPHY_LANE);
-   tmp &= ~CPHY_SATA_OVERRIDE;
-   combo_phy_write(sata_port, CPHY_OVERRIDE + lane * SPHY_LANE, tmp);
+   tmp &= ~CPHY_SATA_RX_OVERRIDE;
+   combo_phy_write(sata_port, CPHY_RX_OVERRIDE + lane * SPHY_LANE, tmp);
 
-   tmp |= CPHY_SATA_OVERRIDE;
- 
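
The register arithmetic in cphy_override_tx_attenuation() above can be
checked in isolation. A minimal sketch, assuming the final value written is
the status word with the override bit set and the attenuation field ORed in;
tx_atten_override() is a hypothetical helper that replaces the three
combo_phy_write() calls with a single computed value:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CPHY_SATA_TX_OVERRIDE    0x8000
#define CPHY_SATA_TX_ATTEN       0x1c00
#define CPHY_SATA_TX_ATTEN_SHIFT 10

/*
 * Compute the last value the patch would write to the TX override
 * register. Returns 0 (and leaves *reg alone) when bit 3 of val is
 * set, which the patch treats as "no override for this port".
 */
static int tx_atten_override(uint32_t status, uint32_t val, uint32_t *reg)
{
	assert(reg != NULL);

	if (val & 0x8)
		return 0;

	status &= ~(uint32_t)CPHY_SATA_TX_OVERRIDE;
	status |= CPHY_SATA_TX_OVERRIDE;
	status |= (val << CPHY_SATA_TX_ATTEN_SHIFT) & CPHY_SATA_TX_ATTEN;
	*reg = status;
	return 1;
}
```

For the binding example <0xff 22 0xff 0xff 23>, the 0xff entries are skipped,
22 yields 0x9800 and 23 yields 0x9c00 when the status word starts at 0. Only
the low three bits of each code reach the attenuation field.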

[PATCH v3 2/5] sata highbank: enable 64-bit DMA mask when using LPAE

2013-08-07 Thread Mark Langsdorf
From: Rob Herring 

Signed-off-by: Rob Herring 
Signed-off-by: Mark Langsdorf 
---
Changes from v1, v2
None.

 drivers/ata/sata_highbank.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/ata/sata_highbank.c b/drivers/ata/sata_highbank.c
index e9a4f46..8b40025 100644
--- a/drivers/ata/sata_highbank.c
+++ b/drivers/ata/sata_highbank.c
@@ -479,6 +479,9 @@ static int ahci_highbank_probe(struct platform_device *pdev)
if (hpriv->cap & HOST_CAP_PMP)
pi.flags |= ATA_FLAG_PMP;
 
+   if (hpriv->cap & HOST_CAP_64)
+   dma_set_coherent_mask(dev, DMA_BIT_MASK(64));
+
/* CAP.NP sometimes indicate the index of the last enabled
 * port, at other times, that of the last possible port, so
 * determining the maximum port number requires looking at
-- 
1.8.1.2

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v2] mm: make lru_add_drain_all() selective

2013-08-07 Thread Chris Metcalf
This change makes lru_add_drain_all() only selectively interrupt
the cpus that have per-cpu free pages that can be drained.

This is important in nohz mode where calling mlockall(), for
example, otherwise will interrupt every core unnecessarily.

Signed-off-by: Chris Metcalf 
---
Oops! In the previous version of this change I had just blindly patched
it forward from a slightly older version of mm/swap.c.  This version is
now properly against a version of mm/swap.c that includes all the latest
changes to lru_add_drain_all().

 include/linux/workqueue.h |  3 +++
 kernel/workqueue.c| 35 ++-
 mm/swap.c | 37 -
 3 files changed, 65 insertions(+), 10 deletions(-)

diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index a0ed78a..71a3fe7 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -13,6 +13,8 @@
 #include 
 #include 
 
+struct cpumask;
+
 struct workqueue_struct;
 
 struct work_struct;
@@ -470,6 +472,7 @@ extern void flush_workqueue(struct workqueue_struct *wq);
 extern void drain_workqueue(struct workqueue_struct *wq);
 extern void flush_scheduled_work(void);
 
+extern int schedule_on_cpu_mask(work_func_t func, const struct cpumask *mask);
 extern int schedule_on_each_cpu(work_func_t func);
 
 int execute_in_process_context(work_func_t fn, struct execute_work *);
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index f02c4a4..a6d1809 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -2962,17 +2962,18 @@ bool cancel_delayed_work_sync(struct delayed_work *dwork)
 EXPORT_SYMBOL(cancel_delayed_work_sync);
 
 /**
- * schedule_on_each_cpu - execute a function synchronously on each online CPU
+ * schedule_on_cpu_mask - execute a function synchronously on each listed CPU
  * @func: the function to call
+ * @mask: the cpumask to invoke the function on
  *
- * schedule_on_each_cpu() executes @func on each online CPU using the
+ * schedule_on_cpu_mask() executes @func on each listed CPU using the
  * system workqueue and blocks until all CPUs have completed.
- * schedule_on_each_cpu() is very slow.
+ * schedule_on_cpu_mask() is very slow.
  *
  * RETURNS:
  * 0 on success, -errno on failure.
  */
-int schedule_on_each_cpu(work_func_t func)
+int schedule_on_cpu_mask(work_func_t func, const struct cpumask *mask)
 {
int cpu;
struct work_struct __percpu *works;
@@ -2981,24 +2982,40 @@ int schedule_on_each_cpu(work_func_t func)
if (!works)
return -ENOMEM;
 
-   get_online_cpus();
-
-   for_each_online_cpu(cpu) {
+   for_each_cpu(cpu, mask) {
struct work_struct *work = per_cpu_ptr(works, cpu);
 
INIT_WORK(work, func);
schedule_work_on(cpu, work);
}
 
-   for_each_online_cpu(cpu)
+   for_each_cpu(cpu, mask)
flush_work(per_cpu_ptr(works, cpu));
 
-   put_online_cpus();
free_percpu(works);
return 0;
 }
 
 /**
+ * schedule_on_each_cpu - execute a function synchronously on each online CPU
+ * @func: the function to call
+ *
+ * schedule_on_each_cpu() executes @func on each online CPU using the
+ * system workqueue and blocks until all CPUs have completed.
+ * schedule_on_each_cpu() is very slow.
+ *
+ * RETURNS:
+ * 0 on success, -errno on failure.
+ */
+int schedule_on_each_cpu(work_func_t func)
+{
+   get_online_cpus();
+   schedule_on_cpu_mask(func, cpu_online_mask);
+   put_online_cpus();
+   return 0;
+}
+
+/**
  * flush_scheduled_work - ensure that any scheduled work has run to completion.
  *
  * Forces execution of the kernel-global workqueue and blocks until its
diff --git a/mm/swap.c b/mm/swap.c
index 4a1d0d2..d4a862b 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -405,6 +405,11 @@ static void activate_page_drain(int cpu)
pagevec_lru_move_fn(pvec, __activate_page, NULL);
 }
 
+static bool need_activate_page_drain(int cpu)
+{
+   return pagevec_count(&per_cpu(activate_page_pvecs, cpu)) != 0;
+}
+
 void activate_page(struct page *page)
 {
if (PageLRU(page) && !PageActive(page) && !PageUnevictable(page)) {
@@ -422,6 +427,11 @@ static inline void activate_page_drain(int cpu)
 {
 }
 
+static bool need_activate_page_drain(int cpu)
+{
+   return false;
+}
+
 void activate_page(struct page *page)
 {
struct zone *zone = page_zone(page);
@@ -683,7 +693,32 @@ static void lru_add_drain_per_cpu(struct work_struct *dummy)
  */
 int lru_add_drain_all(void)
 {
-   return schedule_on_each_cpu(lru_add_drain_per_cpu);
+   cpumask_var_t mask;
+   int cpu, rc;
+
+   if (!alloc_cpumask_var(&mask, GFP_KERNEL))
+   return -ENOMEM;
+   cpumask_clear(mask);
+
+   /*
+* Figure out which cpus need flushing.  It's OK if we race
+* with changes to the per-cpu lru pvecs, since it's no worse
+* than if we flushed all cpus, since a cpu could still end
+
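
The idea of the change, building a mask of cpus that have something to drain
and only disturbing those, can be illustrated without any kernel
infrastructure. A minimal userspace sketch, assuming a plain array in place
of the per-cpu pagevecs and a 32-bit word in place of a cpumask (pending[],
drain_cpu() and drain_selective() are hypothetical names):

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS 8

static unsigned int pending[NR_CPUS];	/* pages waiting on each cpu */
static unsigned int drain_calls;	/* how many cpus were disturbed */

/* Stand-in for the per-cpu work item: empty that cpu's pending pages. */
static void drain_cpu(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	pending[cpu] = 0;
	drain_calls++;
}

/*
 * Drain only the cpus that have pending pages. Returns the mask of
 * cpus that were visited, one bit per cpu.
 */
static uint32_t drain_selective(void)
{
	uint32_t mask = 0;
	int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		if (pending[cpu] != 0)
			mask |= 1u << cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		if (mask & (1u << cpu))
			drain_cpu(cpu);

	return mask;
}
```

If only cpus 2 and 5 have pending pages, the returned mask is 0x24 and the
other six cpus are never interrupted.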

Re: [PATCH v2] [SCSI] sg: Fix user memory corruption when SG_IO is interrupted by a signal

2013-08-07 Thread Roland Dreier
On Wed, Aug 7, 2013 at 7:38 AM, David Milburn  wrote:
> I was able to successfully test this patch overnight. I had been
> experimenting with the sg driver setting the BIO_NULL_MAPPED flag in
> sg_rq_end_io_usercontext for an orphan process, which prevented the
> corruption, but your solution seems much better.

Very cool, thanks for the testing.

I actually looked at using BIO_NULL_MAPPED as well, but it seemed a
bit too fragile to me -- it had the right effect of skipping
__bio_copy_iov(), and skipping the __free_pages() stuff in there is OK
because sg owns its pages rather than the bio layer, but all that
seemed vulnerable to being broken by an unrelated change.

Out of curiosity, were you already working on this bug?  Because if
you had fixed it a few weeks earlier we might not have spent so long
wondering WTF was stomping on the memory of one of our processes :)

 - R.


Re: OPP: rename functions? (was [PATCH] OPP: Export opp_add())

2013-08-07 Thread Nishanth Menon

Rafael,
offline question:

On 08/06/2013 09:15 AM, Rafael J. Wysocki wrote:

On Tuesday, August 06, 2013 08:08:20 AM Nishanth Menon wrote:

change in subject to reflect new discussion.

On 05:53-20130806, Randy Dunlap wrote:

On 08/03/2013 02:25 AM, Viresh Kumar wrote:



+EXPORT_SYMBOL_GPL(opp_add);


Could it be renamed to pm_opp_add() or power_opp_add() ?
The name is a bit too unspecific IMO.

Though this has nothing specifically to do with this patch, it is an
interesting point.

git grep -w opp . showed drivers/tty/n_tty.c,
drivers/sbus/char/openprom.c and arch/powerpc/kvm/mpic.c using
variables named opp to mean whatever fit their context. The rest
(around 40-odd files) seem to use opp as in Documentation/power/opp.txt.

We could go with a pm_ prefix or even dev_pm_opp_ prefix to be more
specific, though I prefer just pm_. If Rafael and others are ok, I can
post a series out.


Yup, that would be useful.  I'm for dev_pm_opp_ if that matters.


Given that there would be quite a few conflicts, do you have a
suggestion about which baseline I should submit this against?



--
Regards,
Nishanth Menon


Re: [PATCH 4/4] fuse: drop dentry on failed revalidate

2013-08-07 Thread Miklos Szeredi
On Tue, Aug 6, 2013 at 10:06 PM, Anand Avati  wrote:
> On 8/6/13 7:30 AM, Miklos Szeredi wrote:
>>
>> From: Anand Avati 
>>
>> Drop a subtree when we find that it has moved or been deleted.  This
>> can be done as long as there are no submounts under this location.
>>
>> If the directory was moved and we come across the same directory in a
>> future lookup it will be reconnected by d_materialise_unique().
>>
>> Signed-off-by: Anand Avati 
>> Signed-off-by: Miklos Szeredi 
>> ---
>>   fs/fuse/dir.c | 7 ++-
>>   1 file changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
>> index 131d14b..4ba5893 100644
>> --- a/fs/fuse/dir.c
>> +++ b/fs/fuse/dir.c
>> @@ -226,8 +226,13 @@ static int fuse_dentry_revalidate(struct dentry
>> *entry, unsigned int flags)
>> if (!err) {
>> struct fuse_inode *fi = get_fuse_inode(inode);
>> if (outarg.nodeid != get_node_id(inode)) {
>> +   int ret = 0;
>> +
>> +   if (check_submounts_and_drop(entry) != 0)
>> +   ret = 1;
>> +
>> fuse_queue_forget(fc, forget,
>> outarg.nodeid, 1);
>> -   return 0;
>> +   return ret;
>
>
> If outarg.nodeid != get_node_id(inode), then we have to return 0 no matter
> what (whether we successfully dropped the entry or not), no?

If we return 0 in that case (we failed to invalidate the dentry), then
the VFS will call d_invalidate() which will fail.  The result is the
same...

> Or are you
> trying to forcefully keep the path to reach the submount alive? If so, we
> still fail in inode_permission() .. -> getattr() of the dir inode, no?

Yes.  But the path to the mountpoint should still be reachable (for
the purpose of unmounting for example).  I'm including an interesting
discussion between Al and Linus about this (mailing lists weren't
CC-d, but I don't think they'd mind).

BTW, the issue that non-directory mountpoints are dropped by NFS and
friends is not addressed by my previous patchset.  Updated patches
coming up.

Thanks,
Miklos


Subject: [heads-up] breakage with revalidate on NFS and elsewhere

---
From: Al Viro 

1) In NFS ->d_revalidate() we blindly evict non-directories from
dcache.  So does d_invalidate().  Which will leave anything bound on
the file in question unreachable.  It's not a complete leak (e.g. umount -l
or death of namespace will still evict those), but it's certainly a bug
and one with potential for rather unhappy admin.

Note that there's no reason whatsoever to do that d_drop() in case of
non-directories; the only possible caller (do_revalidate(); the other
call site is for directories only) will call d_invalidate(), which will
drop them itself.

d_invalidate() is more interesting; the minimal fix is to have it check
d_mounted and if it's non-zero - grab namespace_sem, find all vfsmounts
with this ->mnt_root, umount_tree() for all of those, drop namespace_sem,
then release all collected vfsmounts.  What's more, we probably want to
extend that to directories; the same thing could be done for all children
with non-zero d_mounted, killing the "has submounts" logics in NFS revalidate.

It's not even hard to implement - all we need is a secondary hash chains
going through vfsmounts, keyed by ->mnt_mountpoint alone.  That would be
enough (alternative would be to put them on a cyclic list anchored in dentry,
but that'd lead to much worse memory waste since for almost all dentries
the list would be empty).

_However_, there's a secondary issue with d_invalidate() callers.  What
happens to the "case-insensitive" crap?  Suppose we have something mounted
on /mnt/foo/bar, with /mnt/foo/bar being on VFAT.  Somebody wants to open
/mnt/foo/BaR; what should that do to mountpoint?

Current behaviour is
a) if it's a directory, have lookup return /mnt/foo/bar, case be
damned.
b) if it's a non-directory, leak the vfsmount(s), return dentry
with new name.

IMO we should _NOT_ make any vfsmounts unreachable in that case; too obvious
abuse potential.  The only question is whether to have invalidation simply
fail (i.e. case (a) for everything) or to try and flip ->mnt_mountpoint in
them to the "replacement" dentry.  I think that the former is the right
answer.  In any case, this means splitting d_invalidate() in two variants
(unmounting and non-unmounting).

We also need to review other __d_drop()/d_drop() users - potentially they
might need the same kind of treatment ;-/

2) NFS4 ->d_revalidate() is too bloody eager to bypass everything
bypassable; as a result, if you have something bound on top of a file
and attempt to open it, the damn thing will blindly try to open the _underlying_
file.  You either get that file opened (and nameidata_to_filp() will return it,
nevermind where 

Re: [PATCH 00/26] STA2X11 devicetree support for amba/pci

2013-08-07 Thread H. Peter Anvin
On 08/07/2013 03:16 AM, Alessandro Rubini wrote:
> 
> Some of the problems he found are:
> 
>  * Passing a dtb to the kernel: we use a modified kexec at present
>because x86 boot loaders can't pass the DT blob, to our knowledge.
> 
>  * Passing correct irq numbers to the AMBA drivers, because PCI MSI
>irq numbers are dynamically allocated (we solved this by using
>of_update_property() at runtime). We also had to register a new
>irq domain for msi irqs, otherwise of_irq_map_one() would complain
>about irqs lacking a corresponding domain.
> 
>  * Switching to a new gpio driver with devicetree support (we took the
>Nomadik gpio/pinctrl because our device apparently has more or less
>the same gpio cell as the Nomadik chip). This requires implementation
>of writel_relaxed() and IRQF_VALID on x86: we hacked them internally
>but the patches are not part of this set. We're willing to solve
>these incompatibilities first, if there's interest.
> 
>  * Writing a suitable dts: at present, a dts only exists for one
>of the STA2X11 based boards (Intel Northville). This includes a
>copy of all the physical addresses for the devices, as dts requires
>that, even if such addresses are automatically assigned by PCI.
>Clearly, with this approach we kill PCI autodetect: if you plug
>to a different slot you need a different dts.
> 
> This got us a more or less working kernel on the Northville board
> (where the device is soldered on the motherboard and acts as main chipset).
> The plug-in PCIe board cannot be supported by device tree, as far as
> we know, which in our opinion is a strong downside of device tree in favor
> of the platform data "shit".
> 

OK, so we have a real corner case here... which is a plugin board beyond
which sits a bus normally used by fixed devices.  You are definitely
correct that this is something that stresses current means of
description to the breaking point.

*Note there are some questions below that I would perfectly understand
if you can't talk about publicly, if so, please contact me privately at
my corporate address.*

However, the plugin board is very different from it being the main
chipset, in no small part because you can boot without it.  I think this
is the first time I have ever heard of a chip which can act both as a
system chipset and a plugin card.

The mainboard case is relatively straightforward -- we should use ACPI 5
(preferred for x86) or device tree to describe it.  My understanding
from what you describe so far is that the only existing case is the
Northville which is a mainboard.

For the plugin case, my thinking is that we probably do need a driver of
some kind which at least contains the description of the board, as I
assume one is not present in any kind of firmware on the board itself
(*do any such boards or plans for them actually exist at this point?*)
Ideally that driver should be (primarily?) a data object (an ACPI 5 SSDT
or a DTB file) rather than open coded C.

I believe ACPI 5 unlike device tree should be able to specify the
dynamic properties that you are rightfully concerned with.

Sorry if this feels like a wild goose chase to you.  Some of this
problem domain is not very well handled by the current code, but we
really have to draw a hard line to make sure it doesn't descend into
unmaintainable chaos.

We have similar issues with MinnowBoard and are trying to use that as a
platform to figure out how a lot of these things need to be handled.

-hpa



Re: [PATCH 06/27] drivers/i2c/busses: don't check resource with devm_ioremap_resource

2013-08-07 Thread Wolfram Sang
On Tue, Jul 23, 2013 at 08:01:39PM +0200, Wolfram Sang wrote:
> devm_ioremap_resource does sanity checks on the given resource. No need to
> duplicate this in the driver.
> 
> Signed-off-by: Wolfram Sang 

Applied to for-next, thanks!





Re: [patch v2 3/3] mm: page_alloc: fair zone allocator policy

2013-08-07 Thread Johannes Weiner
On Wed, Aug 07, 2013 at 03:58:28PM +0100, Mel Gorman wrote:
> On Fri, Aug 02, 2013 at 11:37:26AM -0400, Johannes Weiner wrote:
> > Each zone that holds userspace pages of one workload must be aged at a
> > speed proportional to the zone size.  Otherwise, the time an
> > individual page gets to stay in memory depends on the zone it happened
> > to be allocated in.  Asymmetry in the zone aging creates rather
> > unpredictable aging behavior and results in the wrong pages being
> > reclaimed, activated etc.
> > 
> > But exactly this happens right now because of the way the page
> > allocator and kswapd interact.  The page allocator uses per-node lists
> > of all zones in the system, ordered by preference, when allocating a
> > new page.  When the first iteration does not yield any results, kswapd
> > is woken up and the allocator retries.  Due to the way kswapd reclaims
> > zones below the high watermark while a zone can be allocated from when
> > it is above the low watermark, the allocator may keep kswapd running
> > while kswapd reclaim ensures that the page allocator can keep
> > allocating from the first zone in the zonelist for extended periods of
> > time.  Meanwhile the other zones rarely see new allocations and thus
> > get aged much slower in comparison.
> > 
> > The result is that the occasional page placed in lower zones gets
> > relatively more time in memory, even gets promoted to the active list
> > after its peers have long been evicted.  Meanwhile, the bulk of the
> > working set may be thrashing on the preferred zone even though there
> > may be significant amounts of memory available in the lower zones.
> > 
> > Even the most basic test -- repeatedly reading a file slightly bigger
> > than memory -- shows how broken the zone aging is.  In this scenario,
> > no single page should be able stay in memory long enough to get
> > referenced twice and activated, but activation happens in spades:
> > 
> >   $ grep active_file /proc/zoneinfo
> >   nr_inactive_file 0
> >   nr_active_file 0
> >   nr_inactive_file 0
> >   nr_active_file 8
> >   nr_inactive_file 1582
> >   nr_active_file 11994
> >   $ cat data data data data >/dev/null
> >   $ grep active_file /proc/zoneinfo
> >   nr_inactive_file 0
> >   nr_active_file 70
> >   nr_inactive_file 258753
> >   nr_active_file 443214
> >   nr_inactive_file 149793
> >   nr_active_file 12021
> > 
> > Fix this with a very simple round robin allocator.  Each zone is
> > allowed a batch of allocations that is proportional to the zone's
> > size, after which it is treated as full.  The batch counters are reset
> > when all zones have been tried and the allocator enters the slowpath
> > and kicks off kswapd reclaim.  Allocation and reclaim is now fairly
> > spread out to all available/allowable zones:
> > 
> >   $ grep active_file /proc/zoneinfo
> >   nr_inactive_file 0
> >   nr_active_file 0
> >   nr_inactive_file 174
> >   nr_active_file 4865
> >   nr_inactive_file 53
> >   nr_active_file 860
> >   $ cat data data data data >/dev/null
> >   $ grep active_file /proc/zoneinfo
> >   nr_inactive_file 0
> >   nr_active_file 0
> >   nr_inactive_file 22
> >   nr_active_file 4988
> >   nr_inactive_file 190969
> >   nr_active_file 937
> > 
> > When zone_reclaim_mode is enabled, allocations will now spread out to
> > all zones on the local node, not just the first preferred zone (which
> > on a 4G node might be a tiny Normal zone).
> > 
> > Signed-off-by: Johannes Weiner 
> > Tested-by: Zlatko Calusic 
> > ---
> >  include/linux/mmzone.h |  1 +
> >  mm/page_alloc.c| 69 
> > ++
> >  2 files changed, 60 insertions(+), 10 deletions(-)
> > 
> > diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> > index af4a3b7..dcad2ab 100644
> > --- a/include/linux/mmzone.h
> > +++ b/include/linux/mmzone.h
> > @@ -352,6 +352,7 @@ struct zone {
> >  * free areas of different sizes
> >  */
> > spinlock_t  lock;
> > +   int alloc_batch;
> > int all_unreclaimable; /* All pages pinned */
> >  #if defined CONFIG_COMPACTION || defined CONFIG_CMA
> > /* Set to true when the PG_migrate_skip bits should be cleared */
> 
> This adds a dirty cache line that is updated on every allocation even if
> it's from the per-cpu allocator. I am concerned that this will introduce
> noticeable overhead in the allocator paths on large machines running
> allocator intensive workloads.
> 
> Would it be possible to move it into the per-cpu pageset? I understand
> that the round-robin nature will then depend on what CPU is running and
> the performance characteristics will be different. There might even be an
> adverse workload that uses all the batches from all available CPUs until
> it is essentially the same problem but that would be a very worst case.
> I would hope that in general 
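
The round-robin scheme under discussion can be modelled in a few lines. A
minimal userspace sketch, assuming each zone's batch is simply its size
divided by a constant and ignoring watermarks, kswapd and per-cpu lists
(struct zone_sim, reset_batches() and alloc_page_fair() are hypothetical
names, not the patch's code):

```c
#include <assert.h>

#define NR_ZONES 3

struct zone_sim {
	long size;		/* pages in the zone */
	long alloc_batch;	/* allocations left in this round */
	long allocated;		/* pages handed out so far */
};

/* Give every zone a fresh batch proportional to its size. */
static void reset_batches(struct zone_sim *z, int n, long divisor)
{
	int i;

	assert(divisor > 0);
	for (i = 0; i < n; i++)
		z[i].alloc_batch = z[i].size / divisor;
}

/*
 * Allocate one page from the first zone, in preference order, that
 * still has batch left. When every batch is used up, reset them all
 * (the analogue of entering the slowpath) and retry once.
 * Returns the zone index, or -1 if no zone can ever satisfy it.
 */
static int alloc_page_fair(struct zone_sim *z, int n, long divisor)
{
	int pass, i;

	for (pass = 0; pass < 2; pass++) {
		for (i = 0; i < n; i++) {
			if (z[i].alloc_batch > 0) {
				z[i].alloc_batch--;
				z[i].allocated++;
				return i;
			}
		}
		reset_batches(z, n, divisor);
	}
	return -1;
}
```

With zones of 400, 200 and 100 pages and a divisor of 100, every round of
seven allocations is split 4:2:1, so 70 allocations end up as 40, 20 and 10
pages per zone. A single shared counter per zone, as in this model, is the
field Mel's cache-line concern is about.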

Re: [PATCH 02/12] drivers/i2c/busses: don't use devm_pinctrl_get_select_default() in probe

2013-08-07 Thread Wolfram Sang
On Wed, Jul 10, 2013 at 04:57:37PM +0100, Wolfram Sang wrote:
> Since commit ab78029 (drivers/pinctrl: grab default handles from device core),
> we can rely on device core for setting the default pins. Compile tested only.
> 
> Acked-by: Linus Walleij  (personally at LCE13)
> Signed-off-by: Wolfram Sang 

Applied to for-next, thanks!





Re: WARNING: CPU: 26 PID: 93793 at fs/ext4/inode.c:230 ext4_evict_inode+0x4c9/0x500 [ext4]() still in 3.11-rc3

2013-08-07 Thread Jan Kara
On Wed 07-08-13 08:27:32, Guenter Roeck wrote:
> On 08/07/2013 08:20 AM, Jan Kara wrote:
> >On Thu 01-08-13 20:58:46, Davidlohr Bueso wrote:
> >>On Thu, 2013-08-01 at 22:33 +0200, Jan Kara wrote:
> >>>   Hi,
> >>>
> >>>On Thu 01-08-13 13:14:19, Davidlohr Bueso wrote:
> >>>>FYI I'm seeing loads of the following messages with Linus' latest
> >>>>3.11-rc3 (which includes 822dbba33458cd6ad)
> >>>   Thanks for notice. I see you are running reaim to trigger this. What
> >>>workload?
> >>
> >>After re-running the workloads one by one, I finally hit the issue again
> >>with 'dbase'. FWIW I'm using ramdisks + ext4.
> >   Hum, I'm not able to reproduce this with current Linus' kernel - commit
> >e4ef108fcde0b97ed38923ba1ea06c7a152bab9e - I've tried with ramdisk but no
> >luck. Are you using some special mount options?
> >
> I don't see this commit in the upstream kernel ?
  It is Linus's merge of Tejun's libata fix from Tuesday...

> I tried reproducing the problem on the same system I had seen 822dbba33458cd6ad on,
> with the same workload. It has now been running since last Friday, but I have
> not seen any problems.
  Ah, OK, so it may be fixed after all. If you happen to see it again,
please let me know. Thanks!

Honza

-- 
Jan Kara 
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: unused swap offset / bad page map.

2013-08-07 Thread Dave Jones
On Wed, Aug 07, 2013 at 06:04:20PM +0800, Hillf Danton wrote:
 > > There were a slew of these. same trace, different addr/anon_vma/index.
 > > mapping always null.
 > >
 > Would you please run again with the debug info added?
 > ---
 > --- a/mm/swapfile.c  Wed Aug  7 17:27:22 2013
 > +++ b/mm/swapfile.c  Wed Aug  7 17:57:20 2013
 > @@ -509,6 +509,7 @@ static struct swap_info_struct *swap_inf
 >  {
 >  struct swap_info_struct *p;
 >  unsigned long offset, type;
 > +int race = 0;
 > 
 >  if (!entry.val)
 >  goto out;
 > @@ -524,10 +525,17 @@ static struct swap_info_struct *swap_inf
 >  if (!p->swap_map[offset])
 >  goto bad_free;
 >  spin_lock(&p->lock);
 > +if (!p->swap_map[offset]) {
 > +race = 1;
 > +spin_unlock(&p->lock);
 > +goto bad_free;
 > +}
 >  return p;
 > 
 >  bad_free:
 >  printk(KERN_ERR "swap_free: %s%08lx\n", Unused_offset, entry.val);
 > +if (race)
 > +printk(KERN_ERR "but due to race\n");
 >  goto out;
 >  bad_offset:
 >  printk(KERN_ERR "swap_free: %s%08lx\n", Bad_offset, entry.val);
 > --

printk didn't trigger.
This time around the oom killer was going off the same time.
I'm wondering if we have some allocations somewhere in the swap code that
don't handle failure correctly.

Dave



Re: linux-next: build failure after merge of the ext4 tree

2013-08-07 Thread Kevin Hilman
Stephen Rothwell  writes:

> Hi Sedat,
>
> On Wed, 7 Aug 2013 07:16:57 +0200 Sedat Dilek
>  wrote:
>>
>> On Mon, Jul 29, 2013 at 3:08 AM, Stephen Rothwell  
>> wrote:
>> >
>> > After merging the ext4 tree, today's linux-next build (powerpc
>> > ppc64_defconfig) failed like this:
>> >
>> > fs/ext4/ialloc.c: In function '__ext4_new_inode':
>> > fs/ext4/ialloc.c:817:1: warning: label 'next_ino' defined but not used 
>> > [-Wunused-label]
>> >  next_ino:
>> >  ^
>> > fs/ext4/ialloc.c:792:4: error: label 'next_inode' used but not defined
>> > goto next_inode;
>> > ^
>> >
>> > Hmm ...
>> >
>> > Caused by commit 4a8603ef197a ("ext4: avoid reusing recently deleted
>> > inodes in no journal mode").
>> >
>> > I have used the ext4 tree from next-20130726 for today.
>> 
>> Since this message ext4-tree was not updated.
>> The commit "ext4: avoid reusing recently deleted inodes in no journal
>> mode" was refreshed and has a different commit-id.
>> Did you test with this one? You still see the breakage?
>
> Today's linux-next does not have this build failure.

However, this same commit does introduce a new build failure (not
present in next-20130806) when ext4 is built as a module:

ERROR: "dirty_expire_interval" [fs/ext4/ext4.ko] undefined!
make[3]: *** [__modpost] Error 1
make[2]: *** [modules] Error 2

The change below fixes the problem.

Found when building the mv78xx0_defconfig on ARM.

Kevin

8<--
From 8bd2e08124d9b298f42a0e0c3a7584ba285f Mon Sep 17 00:00:00 2001
From: Kevin Hilman 
Date: Wed, 7 Aug 2013 08:17:43 -0700
Subject: [PATCH] mm: page-writeback: export dirty_expire_interval, used by
 ext4

commit 533ec0ed (ext4: avoid reusing recently deleted inodes in no
journal mode) started using dirty_expire_interval, which is not
available to modules.  Make it available to modules.

Cc: "Theodore Ts'o" 
Signed-off-by: Kevin Hilman 
---
 mm/page-writeback.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index d374b29..c8b61ef 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -104,6 +104,8 @@ EXPORT_SYMBOL_GPL(dirty_writeback_interval);
  */
 unsigned int dirty_expire_interval = 30 * 100; /* centiseconds */
 
+EXPORT_SYMBOL_GPL(dirty_expire_interval);
+
 /*
  * Flag that makes the machine dump writes/reads and block dirtyings.
  */
-- 
1.8.3



Re: [3.10.4] NFS locking panic, plus persisting NFS shutdown panic from 3.9.*

2013-08-07 Thread Myklebust, Trond
On Wed, 2013-08-07 at 11:18 +0100, Nix wrote:
> On 6 Aug 2013, Trond Myklebust verbalised:
> > True. How about something like the following instead. Note the change to
> > the original patch...
> 
> Well, with those applied I could reboot without a panic for the first
> time since 3.8.x: looking good. I'll give it a reboot or two with a
> system that's not hot from booting though.
> 

Could you please also try applying only the 1/2 patch, to see if that
suffices to quell the shutdown panic?

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com

Re: WARNING: CPU: 26 PID: 93793 at fs/ext4/inode.c:230 ext4_evict_inode+0x4c9/0x500 [ext4]() still in 3.11-rc3

2013-08-07 Thread Guenter Roeck

On 08/07/2013 08:20 AM, Jan Kara wrote:
>On Thu 01-08-13 20:58:46, Davidlohr Bueso wrote:
>>On Thu, 2013-08-01 at 22:33 +0200, Jan Kara wrote:
>>>   Hi,
>>>
>>>On Thu 01-08-13 13:14:19, Davidlohr Bueso wrote:
>>>>FYI I'm seeing loads of the following messages with Linus' latest
>>>>3.11-rc3 (which includes 822dbba33458cd6ad)
>>>   Thanks for notice. I see you are running reaim to trigger this. What
>>>workload?
>>
>>After re-running the workloads one by one, I finally hit the issue again
>>with 'dbase'. FWIW I'm using ramdisks + ext4.
>   Hum, I'm not able to reproduce this with current Linus' kernel - commit
>e4ef108fcde0b97ed38923ba1ea06c7a152bab9e - I've tried with ramdisk but no
>luck. Are you using some special mount options?
>
I don't see this commit in the upstream kernel ?

I tried reproducing the problem on the same system I had seen 822dbba33458cd6ad on,
with the same workload. It has now been running since last Friday, but I have
not seen any problems.

Guenter

>Honza



[ cut here ]
WARNING: CPU: 26 PID: 93793 at fs/ext4/inode.c:230 ext4_evict_inode+0x4c9/0x500 
[ext4]()
Modules linked in: autofs4 cpufreq_ondemand freq_table sunrpc 8021q garp stp 
llc pcc_cpufreq ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter 
ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack 
ip6table_filter ip6_tables ipv6 dm_mirror dm_region_hash dm_log dm_mod uinput 
iTCO_wdt iTCO_vendor_support coretemp kvm_intel kvm crc32c_intel 
ghash_clmulni_intel microcode pcspkr sg lpc_ich mfd_core hpilo hpwdt 
i7core_edac edac_core netxen_nic mperf ext4 jbd2 mbcache sd_mod crc_t10dif 
aesni_intel ablk_helper cryptd lrw gf128mul glue_helper aes_x86_64 hpsa radeon 
ttm drm_kms_helper drm i2c_algo_bit i2c_core [last unloaded: freq_table]
CPU: 26 PID: 93793 Comm: reaim Tainted: GW3.11.0-rc3+ #1
Hardware name: HP ProLiant DL980 G7, BIOS P66 06/24/2011
  00e6 8985db603d78 8153ce4d 00e6
   8985db603db8 8104cf1c 8985db603dc8
  8b05c485b8b0 8b05c485b9b8 8b05c485b800 ff9c
Call Trace:
  [] dump_stack+0x49/0x5c
  [] warn_slowpath_common+0x8c/0xc0
  [] warn_slowpath_null+0x1a/0x20
  [] ext4_evict_inode+0x4c9/0x500 [ext4]
  [] evict+0xa7/0x1c0
  [] iput_final+0xe3/0x170
  [] iput+0x3e/0x50
  [] do_unlinkat+0x1c6/0x280
  [] ? task_work_run+0x94/0xf0
  [] ? do_notify_resume+0x84/0x90
  [] SyS_unlink+0x16/0x20
  [] system_call_fastpath+0x16/0x1b
---[ end trace 15e812809616488b ]---









Re: [PATCH 5/5] arm: omap: Proper cleanups for omap_device

2013-08-07 Thread Alexander Holler

On 07.08.2013 07:52, Greg Kroah-Hartman wrote:

On Tue, Aug 06, 2013 at 03:37:13PM +0200, Alexander Holler wrote:

On 06.08.2013 12:14, Greg Kroah-Hartman wrote:


What exactly is a platform device anyway?


Originally it was a "something that wasn't connected to a bus, but just
had memory-mapped i/o."  Like the PS2 keyboard controller.

Embedded systems got ahold of this and went to town, and made everything
a platform device because they could, and no one was paying attention.

Then OF came along and used it as well, and you know the rest...

I think we need to get the ACPI and OF people, and me, in a room
together at the kernel summit and not let us out until we have this all
worked out.


MFD uses platform devices too.


Ugh, I've been avoiding looking at mfd for a long time now, and really
don't want to start now...



I've just mentioned it to suggest that platform devices seem to be used 
all over the kernel as the generic (minimal) form of a device driver. At 
least that is the impression I've got.


Regards,

Alexander Holler



Re: perf,arm -- oops in validate_event

2013-08-07 Thread Vince Weaver
On Wed, 7 Aug 2013, Vince Weaver wrote:

> On Wed, 7 Aug 2013, Will Deacon wrote:
> 
> > Ok, so the following quick hack below should solve the issue (can you 
> > confirm
> > it please, since I don't have access to any hardware atm?)
> > 
> > We should revisit this for 3.12 though, because I'm not sure that our
> > validation code even does the right thing when there are multiple PMUs
> > involved.
> > 
> > --->8
> > 
> > diff --git a/arch/arm/kernel/perf_event.c b/arch/arm/kernel/perf_event.c
> > index d9f5cd4..0500f10b 100644
> > --- a/arch/arm/kernel/perf_event.c
> > +++ b/arch/arm/kernel/perf_event.c
> > @@ -253,6 +253,9 @@ validate_event(struct pmu_hw_events *hw_events,
> > struct arm_pmu *armpmu = to_arm_pmu(event->pmu);
> > struct pmu *leader_pmu = event->group_leader->pmu;
> >  
> > +   if (is_software_event(event))
> > +   return 1;
> > +
> > if (event->pmu != leader_pmu || event->state < PERF_EVENT_STATE_OFF)
> > return 1;
> 
> this isn't enough.  You can also trigger the oops by using
> tracepoint or breakpoint events as group leaders in addition to software
> events.

I take that back, it turns out tracepoint and breakpoint both
have task_ctx_nr set to perf_sw_context (although breakpoint has
a comment saying this may change in the future).

Let me compile and verify the fix.  It may take some time for the compile 
to finish as it's not a very fast machine.

Vince
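
A minimal userspace sketch of why the proposed check may already cover
tracepoint and breakpoint leaders. It assumes that is_software_event()
simply compares the PMU's task_ctx_nr against perf_sw_context, which is my
understanding of kernels from this period. All mock_* names are invented
for illustration and are not the kernel's types.

```c
#include <stddef.h>

/* Stand-ins for the perf context ids; only the sw/hw distinction matters here. */
enum mock_ctx_nr {
	mock_hw_context,
	mock_sw_context,
};

/* Hypothetical, stripped-down versions of struct pmu and struct perf_event. */
struct mock_pmu {
	int task_ctx_nr;
};

struct mock_event {
	struct mock_pmu *pmu;
};

/*
 * Assumed shape of the helper: an event counts as "software" whenever its
 * PMU is scheduled in the software context, whatever kind of PMU it is.
 */
static int mock_is_software_event(const struct mock_event *event)
{
	return event->pmu->task_ctx_nr == mock_sw_context;
}
```

Under that assumption, a tracepoint or breakpoint PMU registered with the
software context id is treated the same as a pure software PMU, so a single
early return in validate_event() would skip all three.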


[PATCH] gpio: adnp: Fix segfault if request_threaded_irq fails

2013-08-07 Thread Lars Poeschel
From: Lars Poeschel 

In case request_threaded_irq inside adnp_irq_setup fails, the driver
segfaults. This is because irq_domain_remove is called twice with
the same pointer. First time in adnp_irq_setup and then a second time
after leaving adnp_irq_setup in the error path of adnp_i2c_probe
inside adnp_teardown.
This fixes this by removing the call to irq_domain_remove from
adnp_irq_setup.

Signed-off-by: Lars Poeschel 
---
 drivers/gpio/gpio-adnp.c |6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/drivers/gpio/gpio-adnp.c b/drivers/gpio/gpio-adnp.c
index e60567f..c0f3fc4 100644
--- a/drivers/gpio/gpio-adnp.c
+++ b/drivers/gpio/gpio-adnp.c
@@ -490,15 +490,11 @@ static int adnp_irq_setup(struct adnp *adnp)
if (err != 0) {
dev_err(chip->dev, "can't request IRQ#%d: %d\n",
adnp->client->irq, err);
-   goto error;
+   return err;
}
 
chip->to_irq = adnp_gpio_to_irq;
return 0;
-
-error:
-   irq_domain_remove(adnp->domain);
-   return err;
 }
 
 static void adnp_irq_teardown(struct adnp *adnp)
-- 
1.7.10.4
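
The ownership rule behind this fix can be sketched in plain userspace C:
the setup helper only reports failure, and the caller's single teardown
path releases the domain. This is a simplified model, not the driver code;
every mock_* name and the remove_calls counter are assumptions made for the
example.

```c
#include <stddef.h>

/* Hypothetical model of the device state relevant to the double remove. */
struct mock_dev {
	int domain_live;	/* 1 while the irq domain exists */
	int remove_calls;	/* how many times the domain was removed */
};

static void mock_domain_remove(struct mock_dev *d)
{
	d->remove_calls++;
	d->domain_live = 0;
}

/*
 * Setup creates the domain, then tries to request the IRQ. On failure it
 * returns the error and leaves cleanup to the caller.
 */
static int mock_irq_setup(struct mock_dev *d, int request_err)
{
	d->domain_live = 1;
	if (request_err)
		return request_err;
	return 0;
}

/* The one place that releases the domain. */
static void mock_teardown(struct mock_dev *d)
{
	if (d->domain_live)
		mock_domain_remove(d);
}

static int mock_probe(struct mock_dev *d, int request_err)
{
	int err = mock_irq_setup(d, request_err);

	if (err) {
		mock_teardown(d);
		return err;
	}
	return 0;
}
```

With cleanup confined to mock_teardown(), a failed request leads to exactly
one removal, which is the property the patch restores.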



Re: WARNING: CPU: 26 PID: 93793 at fs/ext4/inode.c:230 ext4_evict_inode+0x4c9/0x500 [ext4]() still in 3.11-rc3

2013-08-07 Thread Jan Kara
On Thu 01-08-13 20:58:46, Davidlohr Bueso wrote:
> On Thu, 2013-08-01 at 22:33 +0200, Jan Kara wrote:
> >   Hi,
> > 
> > On Thu 01-08-13 13:14:19, Davidlohr Bueso wrote:
> > > FYI I'm seeing loads of the following messages with Linus' latest
> > > 3.11-rc3 (which includes 822dbba33458cd6ad)
> >   Thanks for notice. I see you are running reaim to trigger this. What
> > workload?
> 
> After re-running the workloads one by one, I finally hit the issue again
> with 'dbase'. FWIW I'm using ramdisks + ext4.
  Hum, I'm not able to reproduce this with current Linus' kernel - commit
e4ef108fcde0b97ed38923ba1ea06c7a152bab9e - I've tried with ramdisk but no
luck. Are you using some special mount options?

Honza

> > 
> > > [ cut here ]
> > > WARNING: CPU: 26 PID: 93793 at fs/ext4/inode.c:230 
> > > ext4_evict_inode+0x4c9/0x500 [ext4]()
> > > Modules linked in: autofs4 cpufreq_ondemand freq_table sunrpc 8021q garp 
> > > stp llc pcc_cpufreq ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 
> > > iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 
> > > xt_state nf_conntrack ip6table_filter ip6_tables ipv6 dm_mirror 
> > > dm_region_hash dm_log dm_mod uinput iTCO_wdt iTCO_vendor_support coretemp 
> > > kvm_intel kvm crc32c_intel ghash_clmulni_intel microcode pcspkr sg 
> > > lpc_ich mfd_core hpilo hpwdt i7core_edac edac_core netxen_nic mperf ext4 
> > > jbd2 mbcache sd_mod crc_t10dif aesni_intel ablk_helper cryptd lrw 
> > > gf128mul glue_helper aes_x86_64 hpsa radeon ttm drm_kms_helper drm 
> > > i2c_algo_bit i2c_core [last unloaded: freq_table]
> > > CPU: 26 PID: 93793 Comm: reaim Tainted: GW3.11.0-rc3+ #1
> > > Hardware name: HP ProLiant DL980 G7, BIOS P66 06/24/2011
> > >  00e6 8985db603d78 8153ce4d 00e6
> > >   8985db603db8 8104cf1c 8985db603dc8
> > >  8b05c485b8b0 8b05c485b9b8 8b05c485b800 ff9c
> > > Call Trace:
> > >  [] dump_stack+0x49/0x5c
> > >  [] warn_slowpath_common+0x8c/0xc0
> > >  [] warn_slowpath_null+0x1a/0x20
> > >  [] ext4_evict_inode+0x4c9/0x500 [ext4]
> > >  [] evict+0xa7/0x1c0
> > >  [] iput_final+0xe3/0x170
> > >  [] iput+0x3e/0x50
> > >  [] do_unlinkat+0x1c6/0x280
> > >  [] ? task_work_run+0x94/0xf0
> > >  [] ? do_notify_resume+0x84/0x90
> > >  [] SyS_unlink+0x16/0x20
> > >  [] system_call_fastpath+0x16/0x1b
> > > ---[ end trace 15e812809616488b ]---
> > > 
> > > 
> 
> 
-- 
Jan Kara 
SUSE Labs, CR


Re: perf,arm -- oops in validate_event

2013-08-07 Thread Vince Weaver
On Wed, 7 Aug 2013, Will Deacon wrote:

> Ok, so the following quick hack below should solve the issue (can you confirm
> it please, since I don't have access to any hardware atm?)
> 
> We should revisit this for 3.12 though, because I'm not sure that our
> validation code even does the right thing when there are multiple PMUs
> involved.
> 
> --->8
> 
> diff --git a/arch/arm/kernel/perf_event.c b/arch/arm/kernel/perf_event.c
> index d9f5cd4..0500f10b 100644
> --- a/arch/arm/kernel/perf_event.c
> +++ b/arch/arm/kernel/perf_event.c
> @@ -253,6 +253,9 @@ validate_event(struct pmu_hw_events *hw_events,
> struct arm_pmu *armpmu = to_arm_pmu(event->pmu);
> struct pmu *leader_pmu = event->group_leader->pmu;
>  
> +   if (is_software_event(event))
> +   return 1;
> +
> if (event->pmu != leader_pmu || event->state < PERF_EVENT_STATE_OFF)
> return 1;

this isn't enough.  You can also trigger the oops by using
tracepoint or breakpoint events as group leaders in addition to software
events.

Vince


Re: [PATCH v8] dmaengine: Add MOXA ART DMA engine driver

2013-08-07 Thread Mark Rutland
On Tue, Aug 06, 2013 at 01:38:31PM +0100, Jonas Jensen wrote:
> The MOXA ART SoC has a DMA controller capable of offloading expensive
> memory operations, such as large copies. This patch adds support for
> the controller including four channels. Two of these are used to
> handle MMC copy on the UC-7112-LX hardware. The remaining two can be
> used in a future audio driver or client application.
> 
> Signed-off-by: Jonas Jensen 
> ---
> 
> Notes:
> Add test dummy DMA channels to MMC, prove the controller
> has support for interchangeable channel numbers [0].
> 
> Add new filter data struct, store dma_spec passed in xlate,
> similar to proposed patch for omap/edma [1][2].
> 
> [0] 
> https://bitbucket.org/Kasreyn/linux-next/commits/2f17ac38c5d3af49bc0c559c429a351ddd40063d
> [1] https://lkml.org/lkml/2013/8/1/750  "[PATCH] DMA: let filter 
> functions of of_dma_simple_xlate possible check of_node"
> [2] https://lkml.org/lkml/2013/3/11/203 "A proposal to check the device 
> in generic way"
> 
> Changes since v7:
> 
> 1. remove unnecessary loop in moxart_alloc_chan_resources()
> 2. remove unnecessary status check in moxart_tx_status()
> 3. check/handle dma_async_device_register() return value
> 4. check/handle devm_request_irq() return value
> 5. add and use filter data struct
> 6. check if channel device is the same as passed to
>of_dma_controller_register()
> 7. add check if chan->device->dev->of_node is the same as
>dma_spec->np (xlate)
> 8. support interchangeable channels, #dma-cells is now <1>
> 
> device tree bindings document:
> 9. update description and example, change "#dma-cells" to "<1>"
> 
> Applies to next-20130806
> 
>  .../devicetree/bindings/dma/moxa,moxart-dma.txt|  19 +
>  drivers/dma/Kconfig|   7 +
>  drivers/dma/Makefile   |   1 +
>  drivers/dma/moxart-dma.c   | 614 
> +
>  4 files changed, 641 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/dma/moxa,moxart-dma.txt
>  create mode 100644 drivers/dma/moxart-dma.c
> 
> diff --git a/Documentation/devicetree/bindings/dma/moxa,moxart-dma.txt 
> b/Documentation/devicetree/bindings/dma/moxa,moxart-dma.txt
> new file mode 100644
> index 000..69e7001
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/dma/moxa,moxart-dma.txt
> @@ -0,0 +1,19 @@
> +MOXA ART DMA Controller
> +
> +See dma.txt first
> +
> +Required properties:
> +
> +- compatible : Must be "moxa,moxart-dma"
> +- reg :Should contain registers location and length
> +- interrupts : Should contain the interrupt number
> +- #dma-cells : Should be 1, a single cell holding a line request number
> +
> +Example:
> +
> +   dma: dma@9050 {
> +   compatible = "moxa,moxart-dma";
> +   reg = <0x9050 0x1000>;
> +   interrupts = <24 0>;
> +   #dma-cells = <1>;
> +   };

The binding looks sensible to me now, but I have a couple of (hopefully
final) questions on the probe failure path.

[...]

> +
> +   ret = dma_async_device_register(&mdc->dma_slave);
> +   if (ret) {
> +   dev_err(dev, "dma_async_device_register failed\n");
> +   return ret;
> +   }
> +
> +   ret = of_dma_controller_register(node, moxart_of_xlate, mdc);
> +   if (ret) {
> +   dev_err(dev, "of_dma_controller_register failed\n");
> +   dma_async_device_unregister(&mdc->dma_slave);
> +   return ret;
> +   }
> +
> +   platform_set_drvdata(pdev, mdc);
> +
> +   tasklet_init(&mdc->tasklet, moxart_dma_tasklet, (unsigned long)mdc);
> +
> +   ret = devm_request_irq(dev, irq, moxart_dma_interrupt, 0,
> +  "moxart-dma-engine", mdc);
> +   if (ret) {
> +   dev_err(dev, "devm_request_irq failed\n");

Do you not need calls to of_dma_controller_free and
dma_async_device_unregister here? I'm not all that familiar with the DMA
API, so maybe you don't.

> +   return ret;
> +   }
> +
> +   dev_dbg(dev, "%s: IRQ=%u\n", __func__, irq);
> +
> +   return 0;
> +}
> +
> +static int moxart_remove(struct platform_device *pdev)
> +{
> +   struct moxart_dma_container *m = dev_get_drvdata(&pdev->dev);

Similarly, do you not need to call of_dma_controller free here?

> +   dma_async_device_unregister(&m->dma_slave);
> +   return 0;
> +}

Thanks,
Mark.
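
If the answer to both questions turns out to be yes, the usual shape for
such a probe path is goto-based unwinding in reverse order of registration.
The sketch below models that in userspace C. Whether the driver really
needs it is exactly the open question above; mock_probe(), the fail_step
parameter and the state flags are all invented for illustration.

```c
#include <stddef.h>

/* Hypothetical record of what the probe routine has registered so far. */
struct mock_state {
	int dma_registered;	/* models dma_async_device_register() */
	int of_registered;	/* models of_dma_controller_register() */
};

/*
 * fail_step selects which step fails: 1 = DMA device registration,
 * 2 = OF controller registration, 3 = IRQ request, 0 = nothing fails.
 */
static int mock_probe(struct mock_state *s, int fail_step)
{
	int ret;

	ret = (fail_step == 1) ? -1 : 0;
	if (ret)
		return ret;
	s->dma_registered = 1;

	ret = (fail_step == 2) ? -1 : 0;
	if (ret)
		goto err_unregister_dma;
	s->of_registered = 1;

	ret = (fail_step == 3) ? -1 : 0;
	if (ret)
		goto err_free_of;

	return 0;

	/* Undo in reverse order; each label falls through to the next. */
err_free_of:
	s->of_registered = 0;
err_unregister_dma:
	s->dma_registered = 0;
	return ret;
}
```

A remove routine built on the same idea would undo both registrations, in
the same reverse order.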


Re: [PATCH v2 1/1 resend] i2c: rcar: modify I2C driver

2013-08-07 Thread Wolfram Sang
On Mon, Aug 05, 2013 at 04:19:34PM +0900, Nguyen Viet Dung wrote:
> This patch modifies the rcar-H1 I2C driver to make it usable on both rcar-H1 and rcar-H2.
> 
> Signed-off-by: Nguyen Viet Dung 

Isn't it possible to distinguish between H1 and H2 somewhere in
hardware? Then we could skip the 'flags' variable in pdata.

Thanks,

   Wolfram




Re: [PATCH] ARM: dts: am33xx: Correct gpio #interrupt-cells property

2013-08-07 Thread Lars Poeschel
On Wednesday 07 August 2013 at 16:53:09, Mark Rutland wrote:
> On Wed, Aug 07, 2013 at 12:06:32PM +0100, Lars Poeschel wrote:
> > From: Lars Poeschel 
> > 
> > Following commit ff5c9059 and therefore other omap platforms using
> > the gpio-omap driver correct the #interrupt-cells property on am33xx
> > too. The omap gpio binding documentation also states that
> > the #interrupt-cells property should be 2.
> 
> I take it there are no device nodes for which any of these nodes are an
> interrupt parent (which would need to be updated)?

As far as I know: No.

Lars

> If so:
> 
> Acked-by: Mark Rutland 
> 
> Thanks,
> Mark.
> 
> > Signed-off-by: Lars Poeschel 
> > ---
> > 
> >  arch/arm/boot/dts/am33xx.dtsi |8 
> >  1 file changed, 4 insertions(+), 4 deletions(-)
> > 
> > diff --git a/arch/arm/boot/dts/am33xx.dtsi
> > b/arch/arm/boot/dts/am33xx.dtsi index 38b446b..033c5dd 100644
> > --- a/arch/arm/boot/dts/am33xx.dtsi
> > +++ b/arch/arm/boot/dts/am33xx.dtsi
> > @@ -102,7 +102,7 @@
> > 
> > gpio-controller;
> > #gpio-cells = <2>;
> > interrupt-controller;
> > 
> > -   #interrupt-cells = <1>;
> > +   #interrupt-cells = <2>;
> > 
> > reg = <0x44e07000 0x1000>;
> > interrupts = <96>;
> > 
> > };
> > 
> > @@ -113,7 +113,7 @@
> > 
> > gpio-controller;
> > #gpio-cells = <2>;
> > interrupt-controller;
> > 
> > -   #interrupt-cells = <1>;
> > +   #interrupt-cells = <2>;
> > 
> > reg = <0x4804c000 0x1000>;
> > interrupts = <98>;
> > 
> > };
> > 
> > @@ -124,7 +124,7 @@
> > 
> > gpio-controller;
> > #gpio-cells = <2>;
> > interrupt-controller;
> > 
> > -   #interrupt-cells = <1>;
> > +   #interrupt-cells = <2>;
> > 
> > reg = <0x481ac000 0x1000>;
> > interrupts = <32>;
> > 
> > };
> > 
> > @@ -135,7 +135,7 @@
> > 
> > gpio-controller;
> > #gpio-cells = <2>;
> > interrupt-controller;
> > 
> > -   #interrupt-cells = <1>;
> > +   #interrupt-cells = <2>;
> > 
> > reg = <0x481ae000 0x1000>;
> > interrupts = <62>;
> > 
> > };


RE: List corruption in hidraw_release in 3.11-rc4

2013-08-07 Thread Manoj Chourasia
Hi Peter,

The patch I posted fixes a slab memory corruption issue caused by a race
between device disconnect and device release. We found some of the device
data structures being used after they had been freed. Later we figured out
that the patch which was reverted earlier solved our issue, but some slab
memory corruption remained. That was because the list delete for the device
was called after the hidraw had been freed. I protected drop_ref with a mutex
lock and also delete the list entry before calling drop_ref, which solves the
issue. If you are seeing memory corruption, then the patch could solve your
issue.

Regards
-Manoj

-Original Message-
From: Jiri Kosina [mailto:jkos...@suse.cz] 
Sent: Wednesday, August 07, 2013 7:04 PM
To: Peter Wu
Cc: linux-in...@vger.kernel.org; Manoj Chourasia; linux-kernel@vger.kernel.org; 
alno...@suse.cz
Subject: Re: List corruption in hidraw_release in 3.11-rc4

On Wed, 7 Aug 2013, Peter Wu wrote:

> > does the patch below fix the problem you are seeing?
> That one is already in 3.11-rc4 as far as I can see. Also, that code 
> can probably be simplified by moving the mutex_unlock after the out 
> label, removing the need to duplicate the mutex_unlock.
> 
> Remember what I said about "no Oopses"? Well, it turned out that 
> several memory structures were damaged which causes a general 
> protection fault in sock_alloc_inode and other places.
> 
> I managed to create a program that can reproduce this bug 100% in a 
> QEMU virtual machine with a Logitech USB receiver passed to it.
> 
> qemu-system-x86_64 -enable-kvm -m 1G -usb -usbdevice host:046d:c52b 
> (pass -kernel, -initrd, -append as needed)
> 
> Copy hidraw-test to initrd, boot QEMU and run `hidraw-test`. Result: 
> instant (= +/- 2 seconds) crash.
> 
> I have applied Manoj's patch[1] on top of 3.11-rc4, which seems to fix the
> issue.
> One observation is that the new device is named /dev/hidraw1 instead 
> of /dev/hidraw0. Example:
> 
> f(){ hidraw-test /dev/hidraw$1 usb1;}
> # needed for 3.11-rc4
> f 1; f 1 # crash
> # needed for 3.11-rc4 + patch
> f 1; f 2 # ok
> 
> Regards,
> Peter
> 
>  [1]: http://lkml.org/lkml/2013/7/22/248

That one I am still reviewing ... can I add your Tested-by: to it when I'll be 
applying it and pushing to Linus?

Thanks.

> --
> /* cc hidraw-test.c -o hidraw-test
>  * hidraw-test /dev/hidraw0 usb1; hidraw-test /dev/hidraw0 usb1;  */ 
> #include  #include  #include  #include 
>  #include  #include 
> 
> int open_and_write(const char *path, const char *data) {
>   int sfd, r;
> 
>   sfd = open(path, O_WRONLY);
>   if (sfd < 0) {
>   perror(path);
>   return 1;
>   }
> 
>   r = write(sfd, data, strlen(data));
>   if (r < 0) {
>   fprintf(stderr, "write(%s, %s): %s\n",
>   path, data, strerror(errno));
>   return 1;
>   }
>   close(sfd);
>   return 0;
> }
> 
> int dork(const char *hiddev, const char *name) {
>   int fd;
>   char c;
> 
>   fd = open(hiddev, O_RDWR | O_NONBLOCK);
>   if (fd < 0) {
>   perror("open");
>   return 1;
>   }
> 
>   if (open_and_write("/sys/bus/usb/drivers/usb/unbind", name))
>   return 1;
> 
>   // does not make a difference
>   //sleep(1);
> 
>   if (open_and_write("/sys/bus/usb/drivers/usb/bind", name))
>   return 1;
> 
>   // allow devices to get discovered
>   sleep(1);
> 
>   printf("read() = %zi\n", read(fd, &c, 1)); perror("read");
>   close(fd);
>   return 0;
> }
> 
> int main(int argc, char **argv) {
>   if (argc < 3) {
>   fprintf(stderr, "Usage: %s /dev/hidrawN usbN\n", *argv);
>   return 1;
>   }
> 
>   system("modprobe -v usbhid");
>   system("modprobe -v hid-logitech-dj");
> 
>   dork(argv[1], argv[2]);
> 
>   return 0;
> }
> 

--
Jiri Kosina
SUSE Labs


Re: [PATCH] i2c: add sanity check to i2c_put_adapter

2013-08-07 Thread Wolfram Sang
On Thu, Aug 01, 2013 at 02:10:46PM +0200, Sebastian Hesselbarth wrote:
> i2c_put_adapter dereferences i2c_adapter pointer passed without check
> for NULL. This adds a check for non-NULL pointer to allow i2c_put_adapter
> called with NULL and behave the same way i2c_release_client does already.
> 
> Signed-off-by: Sebastian Hesselbarth 

Applied to for-next, thanks! Please describe the use case next time in
the patch description. The current text describes more what is changed
not why. You did that later ("easier probing").
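
A small userspace sketch of the pattern under discussion: a "put" helper
that tolerates NULL, so callers on an error path need not test the pointer
first. The mock_* types and the plain integer refcount stand in for the
kernel's adapter and module reference; they are assumptions for this
example, not the i2c core code.

```c
#include <stddef.h>

/* Hypothetical refcounted owner, standing in for the owning module. */
struct mock_owner {
	int refcount;
};

/* Hypothetical adapter that holds a reference on its owner. */
struct mock_adapter {
	struct mock_owner *owner;
};

/* Drop the owner reference; a NULL adapter is silently ignored. */
static void mock_put_adapter(struct mock_adapter *adap)
{
	if (adap)
		adap->owner->refcount--;
}
```

A probe routine that may or may not have obtained an adapter can then call
mock_put_adapter() unconditionally during cleanup.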





Re: [RFC] gcc feature request: Moving blocks into sections

2013-08-07 Thread Steven Rostedt
On Wed, 2013-08-07 at 07:06 +0200, Ondřej Bílka wrote:

> Add short_counter and long_counter, and increment the appropriate counter
> before each jump. That way we will know how many short/long jumps were taken. 

That's not trivial at all. The jump is a single location (in an asm
goto() statement) that happens to be inlined through out the kernel. The
assembler decides if it will be a short or long jump. How do you add a
counter to count the difference?

The output I gave is from the boot up code that converts the jmp back to
a nop (or in this case, the default nop to the ideal nop). It knows the
size by reading the op code. This is a static analysis, not a running
one. It's no trivial task to have a counter for each jump.

There is a way though. If we enable all the jumps (all tracepoints, and
other users of jumplabel), record the trace and then compare the trace
to the output that shows which ones were short jumps, and all others are
long jumps.

I'll post the patches soon and you can have fun doing the compare :-)

Actually, I'm working on the 4 patches of the series that is more about
clean ups and safety checks than the jmp conversion. That is not
controversial, and I'll be posting them for 3.12 soon.

After that, I'll post the updated patches that have the conversion as
well as the counter, for RFC and for others to play with.

-- Steve
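
To illustrate "knows the size by reading the op code": on x86 a short
relative jmp is opcode 0xEB followed by an 8-bit displacement (2 bytes in
total), while the near form is opcode 0xE9 followed by a 32-bit displacement
(5 bytes in 32-bit and 64-bit code). The helper below is a hypothetical
sketch of that static check, not the kernel's jump label code.

```c
#include <stddef.h>
#include <stdint.h>

/* x86 relative jump opcodes: EB = JMP rel8, E9 = JMP rel32. */
#define X86_JMP_REL8	0xEB
#define X86_JMP_REL32	0xE9

/*
 * Hypothetical helper: return the encoded length in bytes of the jmp at
 * 'insn', or 0 if the first byte is neither of the two opcodes above.
 */
static size_t jmp_insn_len(const uint8_t *insn)
{
	switch (insn[0]) {
	case X86_JMP_REL8:
		return 2;	/* opcode + 8-bit displacement */
	case X86_JMP_REL32:
		return 5;	/* opcode + 32-bit displacement */
	default:
		return 0;	/* not a jmp this sketch understands */
	}
}
```

Counting how often each length is returned across all jump label sites
gives the static short/long split. It says nothing about how many times
each jump is taken at run time, which is the harder measurement described
above.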




[PATCH] tile: remove unnecessary backslashes in asm-offsets.c

2013-08-07 Thread Chris Metcalf
Pointed out by checkpatch.  A few of the DEFINE() lines were
properly written without backslash continuation; fix the rest.

Signed-off-by: Chris Metcalf 
---
 arch/tile/kernel/asm-offsets.c | 28 ++--
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/arch/tile/kernel/asm-offsets.c b/arch/tile/kernel/asm-offsets.c
index 01ddf19..8fff475 100644
--- a/arch/tile/kernel/asm-offsets.c
+++ b/arch/tile/kernel/asm-offsets.c
@@ -37,28 +37,28 @@
 
 void foo(void)
 {
-   DEFINE(SINGLESTEP_STATE_BUFFER_OFFSET, \
+   DEFINE(SINGLESTEP_STATE_BUFFER_OFFSET,
   offsetof(struct single_step_state, buffer));
-   DEFINE(SINGLESTEP_STATE_FLAGS_OFFSET, \
+   DEFINE(SINGLESTEP_STATE_FLAGS_OFFSET,
   offsetof(struct single_step_state, flags));
-   DEFINE(SINGLESTEP_STATE_ORIG_PC_OFFSET, \
+   DEFINE(SINGLESTEP_STATE_ORIG_PC_OFFSET,
   offsetof(struct single_step_state, orig_pc));
-   DEFINE(SINGLESTEP_STATE_NEXT_PC_OFFSET, \
+   DEFINE(SINGLESTEP_STATE_NEXT_PC_OFFSET,
   offsetof(struct single_step_state, next_pc));
-   DEFINE(SINGLESTEP_STATE_BRANCH_NEXT_PC_OFFSET, \
+   DEFINE(SINGLESTEP_STATE_BRANCH_NEXT_PC_OFFSET,
   offsetof(struct single_step_state, branch_next_pc));
-   DEFINE(SINGLESTEP_STATE_UPDATE_VALUE_OFFSET, \
+   DEFINE(SINGLESTEP_STATE_UPDATE_VALUE_OFFSET,
   offsetof(struct single_step_state, update_value));
 
-   DEFINE(THREAD_INFO_TASK_OFFSET, \
+   DEFINE(THREAD_INFO_TASK_OFFSET,
   offsetof(struct thread_info, task));
-   DEFINE(THREAD_INFO_FLAGS_OFFSET, \
+   DEFINE(THREAD_INFO_FLAGS_OFFSET,
   offsetof(struct thread_info, flags));
-   DEFINE(THREAD_INFO_STATUS_OFFSET, \
+   DEFINE(THREAD_INFO_STATUS_OFFSET,
   offsetof(struct thread_info, status));
-   DEFINE(THREAD_INFO_HOMECACHE_CPU_OFFSET, \
+   DEFINE(THREAD_INFO_HOMECACHE_CPU_OFFSET,
   offsetof(struct thread_info, homecache_cpu));
-   DEFINE(THREAD_INFO_STEP_STATE_OFFSET, \
+   DEFINE(THREAD_INFO_STEP_STATE_OFFSET,
   offsetof(struct thread_info, step_state));
 
DEFINE(TASK_STRUCT_THREAD_KSP_OFFSET,
@@ -66,11 +66,11 @@ void foo(void)
DEFINE(TASK_STRUCT_THREAD_PC_OFFSET,
   offsetof(struct task_struct, thread.pc));
 
-   DEFINE(HV_TOPOLOGY_WIDTH_OFFSET, \
+   DEFINE(HV_TOPOLOGY_WIDTH_OFFSET,
   offsetof(HV_Topology, width));
-   DEFINE(HV_TOPOLOGY_HEIGHT_OFFSET, \
+   DEFINE(HV_TOPOLOGY_HEIGHT_OFFSET,
   offsetof(HV_Topology, height));
 
-   DEFINE(IRQ_CPUSTAT_SYSCALL_COUNT_OFFSET, \
+   DEFINE(IRQ_CPUSTAT_SYSCALL_COUNT_OFFSET,
   offsetof(irq_cpustat_t, irq_syscall_count));
 }
-- 
1.8.3.1



[PATCH] tile: various console improvements

2013-08-07 Thread Chris Metcalf
This change improves and cleans up the tile console.

- We enable HVC_IRQ support on tilegx, with the addition of a new
  Tilera hypervisor API for tilegx to allow a console IPI.  If IPI
  support is not available we fall back to the previous polling mode.

- We simplify the earlyprintk code to use CON_BOOT and eliminate some
  of the other supporting earlyprintk code.

- A new tile_console_write() primitive is used to send output to
  the console and is factored out of the hvc_tile driver.
  This lets us support a "sim_console" boot argument to allow using
  simulator hooks to send output to the "console" as a slightly
  faster alternative to emulating the hardware more directly.

Signed-off-by: Chris Metcalf 
---
 arch/tile/Kconfig |   1 +
 arch/tile/include/asm/setup.h |   3 +-
 arch/tile/include/hv/hypervisor.h |  29 +++-
 arch/tile/kernel/early_printk.c   |  47 +++-
 arch/tile/kernel/hvglue.lds   |   3 +-
 arch/tile/kernel/reboot.c |   2 -
 drivers/tty/hvc/hvc_tile.c| 149 --
 7 files changed, 186 insertions(+), 48 deletions(-)

diff --git a/arch/tile/Kconfig b/arch/tile/Kconfig
index e41a381..0576e1d 100644
--- a/arch/tile/Kconfig
+++ b/arch/tile/Kconfig
@@ -112,6 +112,7 @@ config SMP
 config HVC_TILE
depends on TTY
select HVC_DRIVER
+   select HVC_IRQ if TILEGX
def_bool y
 
 config TILEGX
diff --git a/arch/tile/include/asm/setup.h b/arch/tile/include/asm/setup.h
index d04..e989090 100644
--- a/arch/tile/include/asm/setup.h
+++ b/arch/tile/include/asm/setup.h
@@ -24,9 +24,8 @@
  */
 #define MAXMEM_PFN PFN_DOWN(MAXMEM)
 
+int tile_console_write(const char *buf, int count);
 void early_panic(const char *fmt, ...);
-void warn_early_printk(void);
-void __init disable_early_printk(void);
 
 /* Init-time routine to do tile-specific per-cpu setup. */
 void setup_cpu(int boot);
diff --git a/arch/tile/include/hv/hypervisor.h b/arch/tile/include/hv/hypervisor.h
index 837dca5..f882ebc 100644
--- a/arch/tile/include/hv/hypervisor.h
+++ b/arch/tile/include/hv/hypervisor.h
@@ -318,8 +318,11 @@
 /** hv_set_pte_super_shift */
 #define HV_DISPATCH_SET_PTE_SUPER_SHIFT   57
 
+/** hv_console_set_ipi */
+#define HV_DISPATCH_CONSOLE_SET_IPI   63
+
 /** One more than the largest dispatch value */
-#define _HV_DISPATCH_END  58
+#define _HV_DISPATCH_END  64
 
 
 #ifndef __ASSEMBLER__
@@ -585,6 +588,30 @@ typedef struct
  */
 int hv_get_ipi_pte(HV_Coord tile, int pl, HV_PTE* pte);
 
+/** Configure the console interrupt.
+ *
+ * When the console client interrupt is enabled, the hypervisor will
+ * deliver the specified IPI to the client in the following situations:
+ *
+ * - The console has at least one character available for input.
+ *
+ * - The console can accept new characters for output, and the last call
+ *   to hv_console_write() did not write all of the characters requested
+ *   by the client.
+ *
+ * Note that in some system configurations, console interrupt will not
+ * be available; clients should be prepared for this routine to fail and
+ * to fall back to periodic console polling in that case.
+ *
+ * @param ipi Index of the IPI register which will receive the interrupt.
+ * @param event IPI event number for console interrupt. If less than 0,
+ *disable the console IPI interrupt.
+ * @param coord Tile to be targeted for console interrupt.
+ * @return 0 on success, otherwise, HV_EINVAL if illegal parameter,
+ * HV_ENOTSUP if console interrupts are not available.
+ */
+int hv_console_set_ipi(int ipi, int event, HV_Coord coord);
+
 #else /* !CHIP_HAS_IPI() */
 
 /** A set of interrupts. */
diff --git a/arch/tile/kernel/early_printk.c b/arch/tile/kernel/early_printk.c
index 34d72a1..b608e00 100644
--- a/arch/tile/kernel/early_printk.c
+++ b/arch/tile/kernel/early_printk.c
@@ -23,19 +23,24 @@
 
 static void early_hv_write(struct console *con, const char *s, unsigned n)
 {
-   hv_console_write((HV_VirtAddr) s, n);
+   tile_console_write(s, n);
+
+   /*
+* Convert NL to NLCR (close enough to CRNL) during early boot.
+* We assume newlines are at the ends of strings, which turns out
+* to be good enough for early boot console output.
+*/
+   if (n && s[n-1] == '\n')
+   tile_console_write("\r", 1);
 }
 
 static struct console early_hv_console = {
.name = "earlyhv",
.write =early_hv_write,
-   .flags =CON_PRINTBUFFER,
+   .flags =CON_PRINTBUFFER | CON_BOOT,
.index =-1,
 };
 
-/* Direct interface for emergencies */
-static int early_console_complete;
-
 void early_panic(const char *fmt, ...)
 {
va_list ap;
@@ -43,51 +48,21 @@ void early_panic(const char *fmt, ...)
va_start(ap, fmt);
early_printk("Kernel panic - not syncing: ");
early_vprintk(fmt, ap);
- 

[PATCH] tile: support "memmap" boot parameter

2013-08-07 Thread Chris Metcalf
This change adds support for the "memmap" boot parameter similar
to what x86 provides.  The tile version supports "memmap=1G$5G",
for example, as a way to reserve a 1 GB range starting at PA 5GB.
The memory is reserved via bootmem during startup, and we create a
suitable "struct resource" marked as "Reserved" so you can see the
range reported by /proc/iomem.  Up to 64 such regions can currently
be reserved on the boot command line.

We do not support the x86 options "memmap=nn@ss" (force some memory
to be available at the given address) since it's pointless to try to
have Linux use memory the Tilera hypervisor hasn't given it.  We do
not support "memmap=nn#ss" to add an ACPI range for later processing,
since we don't support ACPI.  We do not support "memmap=exactmap"
since we don't support reading the e820 information from the BIOS
like x86 does.  I did add support for "memmap=nn" (and the synonym
"mem=nn"), which caps the highest PA value at "nn"; both are
synonyms for the existing tile boot option "maxmem".
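
The "size$start" form can be modelled in user space roughly as follows.
This is only a sketch: parse_size() is a hypothetical stand-in for the
kernel's memparse() that handles just the K/M/G suffixes, and
parse_reserve() is an illustrative name, not a function from this patch.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for memparse(): parse a number with an optional
 * K/M/G suffix and set *endp to the first character after it.
 */
static uint64_t parse_size(const char *s, const char **endp)
{
    char *e;
    uint64_t v = strtoull(s, &e, 0);
    unsigned int shift = 0;

    switch (*e) {
    case 'G': case 'g': shift = 30; break;
    case 'M': case 'm': shift = 20; break;
    case 'K': case 'k': shift = 10; break;
    default: break;
    }
    if (shift)
        e++;
    *endp = e;
    return v << shift;
}

/* Parse "size$start"; return 0 on success, -1 for any other form. */
static int parse_reserve(const char *arg, uint64_t *start, uint64_t *size)
{
    const char *p;

    *size = parse_size(arg, &p);
    if (p == arg || *p != '$')
        return -1;
    arg = p + 1;
    *start = parse_size(arg, &p);
    if (p == arg || *p != '\0')
        return -1;
    return 0;
}
```

With this model, "1G$5G" yields a size of 1 GB and a start address of
5 GB, while the "@" and "#" forms are rejected.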

Signed-off-by: Chris Metcalf 
---
 arch/tile/kernel/setup.c | 80 +---
 1 file changed, 76 insertions(+), 4 deletions(-)

diff --git a/arch/tile/kernel/setup.c b/arch/tile/kernel/setup.c
index 676e155..b00e156 100644
--- a/arch/tile/kernel/setup.c
+++ b/arch/tile/kernel/setup.c
@@ -154,6 +154,65 @@ static int __init setup_maxnodemem(char *str)
 }
 early_param("maxnodemem", setup_maxnodemem);
 
+struct memmap_entry {
+   u64 addr;   /* start of memory segment */
+   u64 size;   /* size of memory segment */
+};
+static struct memmap_entry memmap_map[64];
+static int memmap_nr;
+
+static void add_memmap_region(u64 addr, u64 size)
+{
+   if (memmap_nr >= ARRAY_SIZE(memmap_map)) {
+   pr_err("Ooops! Too many entries in the memory map!\n");
+   return;
+   }
+   memmap_map[memmap_nr].addr = addr;
+   memmap_map[memmap_nr].size = size;
+   memmap_nr++;
+}
+
+static int __init setup_memmap(char *p)
+{
+   char *oldp;
+   u64 start_at, mem_size;
+
+   if (!p)
+   return -EINVAL;
+
+   if (!strncmp(p, "exactmap", 8)) {
+   pr_err("\"memmap=exactmap\" not valid on tile\n");
+   return 0;
+   }
+
+   oldp = p;
+   mem_size = memparse(p, &p);
+   if (p == oldp)
+   return -EINVAL;
+
+   if (*p == '@') {
+   pr_err("\"memmap=nn@ss\" (force RAM) invalid on tile\n");
+   } else if (*p == '#') {
+   pr_err("\"memmap=nn#ss\" (force ACPI data) invalid on tile\n");
+   } else if (*p == '$') {
+   start_at = memparse(p+1, &p);
+   add_memmap_region(start_at, mem_size);
+   } else {
+   if (mem_size == 0)
+   return -EINVAL;
+   maxmem_pfn = (mem_size >> HPAGE_SHIFT) <<
+   (HPAGE_SHIFT - PAGE_SHIFT);
+   }
+   return *p == '\0' ? 0 : -EINVAL;
+}
+early_param("memmap", setup_memmap);
+
+static int __init setup_mem(char *str)
+{
+   return setup_maxmem(str);
+}
+early_param("mem", setup_mem);  /* compatibility with x86 */
+
 static int __init setup_isolnodes(char *str)
 {
char buf[MAX_NUMNODES * 5];
@@ -629,6 +688,12 @@ static void __init setup_bootmem_allocator(void)
for (i = 0; i < MAX_NUMNODES; ++i)
setup_bootmem_allocator_node(i);
 
+   /* Reserve any memory excluded by "memmap" arguments. */
+   for (i = 0; i < memmap_nr; ++i) {
+   struct memmap_entry *m = &memmap_map[i];
+   reserve_bootmem(m->addr, m->size, 0);
+   }
+
 #ifdef CONFIG_KEXEC
if (crashk_res.start != crashk_res.end)
reserve_bootmem(crashk_res.start, resource_size(&crashk_res), 0);
@@ -1562,11 +1627,11 @@ insert_non_bus_resource(void)
 #endif
 
 static struct resource* __init
-insert_ram_resource(u64 start_pfn, u64 end_pfn)
+insert_ram_resource(u64 start_pfn, u64 end_pfn, bool reserved)
 {
struct resource *res =
kzalloc(sizeof(struct resource), GFP_ATOMIC);
-   res->name = "System RAM";
+   res->name = reserved ? "Reserved" : "System RAM";
res->start = start_pfn << PAGE_SHIFT;
res->end = (end_pfn << PAGE_SHIFT) - 1;
res->flags = IORESOURCE_BUSY | IORESOURCE_MEM;
@@ -1601,11 +1666,11 @@ static int __init request_standard_resources(void)
end_pfn > pci_reserve_start_pfn) {
if (end_pfn > pci_reserve_end_pfn)
insert_ram_resource(pci_reserve_end_pfn,
-end_pfn);
+   end_pfn, 0);
end_pfn = pci_reserve_start_pfn;
}
 #endif
-   insert_ram_resource(start_pfn, end_pfn);
+   insert_ram_resource(start_pfn, end_pfn, 0);
}
 
code_resource.start = __pa(_text - 

Re: [PATCH] i2c: mv64xxx: Document the newly introduced allwinner compatible

2013-08-07 Thread Wolfram Sang
On Wed, Jul 24, 2013 at 09:14:35AM +0200, Maxime Ripard wrote:
> Signed-off-by: Maxime Ripard 

Applied to for-current, thanks!

And please, always send to the I2C list. I work heavily with patchwork
monitoring the I2C list; everything not there will easily be forgotten!




[PATCH] tile: fix comment bug in sys_cmpxchg description

2013-08-07 Thread Chris Metcalf
Signed-off-by: Chris Metcalf 
---
 arch/tile/kernel/intvec_32.S | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/tile/kernel/intvec_32.S b/arch/tile/kernel/intvec_32.S
index cb52d66..25966af 100644
--- a/arch/tile/kernel/intvec_32.S
+++ b/arch/tile/kernel/intvec_32.S
@@ -1609,7 +1609,7 @@ ENTRY(sys_cmpxchg)
  * Because of C pointer arithmetic, we want to compute this:
  *
  * ((char*)atomic_locks +
- *  (((r0 >> 3) & (1 << (ATOMIC_HASH_SIZE - 1))) << 2))
+ *  (((r0 >> 3) & ((1 << ATOMIC_HASH_SHIFT) - 1)) << 2))
  *
  * Instead of two shifts we just ">> 1", and use 'mm'
  * to ignore the low and high bits we don't want.
-- 
1.8.3.1
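
For readers comparing the two comment lines: "1 << n" is a single bit,
while "(1 << n) - 1" is a mask of the low n bits, which is what a hash
index needs. A user-space sketch follows; the shift value of 10 is an
assumption for illustration only and is not taken from the tile headers.

```c
#include <stdint.h>

/* Illustrative value only; the real ATOMIC_HASH_SHIFT is not shown here. */
#define EXAMPLE_HASH_SHIFT 10

/* Mask of the low EXAMPLE_HASH_SHIFT bits (0x3ff for a shift of 10). */
static uint32_t low_bits_mask(void)
{
    return (UINT32_C(1) << EXAMPLE_HASH_SHIFT) - 1;
}

/* Byte offset of the 4-byte lock slot selected by address r0. */
static uint32_t lock_offset(uint32_t r0)
{
    return ((r0 >> 3) & low_bits_mask()) << 2;
}
```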



[PATCH] mm: make lru_add_drain_all() selective

2013-08-07 Thread Chris Metcalf
This change makes lru_add_drain_all() only selectively interrupt
the cpus that have per-cpu free pages that can be drained.

This is important in nohz mode where calling mlockall(), for
example, otherwise will interrupt every core unnecessarily.
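
The idea can be modelled in user space as below. This is a sketch only:
the pending[] array stands in for the per-cpu pagevec counts, a plain
bitmask stands in for a cpumask, and all names are hypothetical.

```c
#include <stdint.h>

#define NR_CPUS_EXAMPLE 8   /* hypothetical CPU count for the sketch */

/* Build a bitmask of the CPUs whose pending count is non-zero. */
static uint32_t cpus_needing_drain(const int pending[NR_CPUS_EXAMPLE])
{
    uint32_t mask = 0;
    int cpu;

    for (cpu = 0; cpu < NR_CPUS_EXAMPLE; cpu++)
        if (pending[cpu])
            mask |= UINT32_C(1) << cpu;
    return mask;
}

/* Drain only the CPUs in mask; return how many CPUs were touched. */
static int drain_selected(int pending[NR_CPUS_EXAMPLE], uint32_t mask)
{
    int cpu, touched = 0;

    for (cpu = 0; cpu < NR_CPUS_EXAMPLE; cpu++) {
        if (mask & (UINT32_C(1) << cpu)) {
            pending[cpu] = 0;
            touched++;
        }
    }
    return touched;
}
```

CPUs with nothing pending are never visited, which is the property that
matters for nohz cores.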

Signed-off-by: Chris Metcalf 
---
 include/linux/workqueue.h |  3 +++
 kernel/workqueue.c| 35 ++-
 mm/swap.c | 38 +-
 3 files changed, 66 insertions(+), 10 deletions(-)

diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index a0ed78a..71a3fe7 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -13,6 +13,8 @@
 #include 
 #include 
 
+struct cpumask;
+
 struct workqueue_struct;
 
 struct work_struct;
@@ -470,6 +472,7 @@ extern void flush_workqueue(struct workqueue_struct *wq);
 extern void drain_workqueue(struct workqueue_struct *wq);
 extern void flush_scheduled_work(void);
 
+extern int schedule_on_cpu_mask(work_func_t func, const struct cpumask *mask);
 extern int schedule_on_each_cpu(work_func_t func);
 
 int execute_in_process_context(work_func_t fn, struct execute_work *);
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index f02c4a4..a6d1809 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -2962,17 +2962,18 @@ bool cancel_delayed_work_sync(struct delayed_work *dwork)
 EXPORT_SYMBOL(cancel_delayed_work_sync);
 
 /**
- * schedule_on_each_cpu - execute a function synchronously on each online CPU
+ * schedule_on_cpu_mask - execute a function synchronously on each listed CPU
  * @func: the function to call
+ * @mask: the cpumask to invoke the function on
  *
- * schedule_on_each_cpu() executes @func on each online CPU using the
+ * schedule_on_cpu_mask() executes @func on each listed CPU using the
  * system workqueue and blocks until all CPUs have completed.
- * schedule_on_each_cpu() is very slow.
+ * schedule_on_cpu_mask() is very slow.
  *
  * RETURNS:
  * 0 on success, -errno on failure.
  */
-int schedule_on_each_cpu(work_func_t func)
+int schedule_on_cpu_mask(work_func_t func, const struct cpumask *mask)
 {
int cpu;
struct work_struct __percpu *works;
@@ -2981,24 +2982,40 @@ int schedule_on_each_cpu(work_func_t func)
if (!works)
return -ENOMEM;
 
-   get_online_cpus();
-
-   for_each_online_cpu(cpu) {
+   for_each_cpu(cpu, mask) {
struct work_struct *work = per_cpu_ptr(works, cpu);
 
INIT_WORK(work, func);
schedule_work_on(cpu, work);
}
 
-   for_each_online_cpu(cpu)
+   for_each_cpu(cpu, mask)
flush_work(per_cpu_ptr(works, cpu));
 
-   put_online_cpus();
free_percpu(works);
return 0;
 }
 
 /**
+ * schedule_on_each_cpu - execute a function synchronously on each online CPU
+ * @func: the function to call
+ *
+ * schedule_on_each_cpu() executes @func on each online CPU using the
+ * system workqueue and blocks until all CPUs have completed.
+ * schedule_on_each_cpu() is very slow.
+ *
+ * RETURNS:
+ * 0 on success, -errno on failure.
+ */
+int schedule_on_each_cpu(work_func_t func)
+{
+   get_online_cpus();
+   schedule_on_cpu_mask(func, cpu_online_mask);
+   put_online_cpus();
+   return 0;
+}
+
+/**
  * flush_scheduled_work - ensure that any scheduled work has run to completion.
  *
  * Forces execution of the kernel-global workqueue and blocks until its
diff --git a/mm/swap.c b/mm/swap.c
index 4a1d0d2..981b1d9 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -683,7 +683,43 @@ static void lru_add_drain_per_cpu(struct work_struct *dummy)
  */
 int lru_add_drain_all(void)
 {
-   return schedule_on_each_cpu(lru_add_drain_per_cpu);
+   cpumask_var_t mask;
+   int cpu, rc;
+
+   if (!alloc_cpumask_var(&mask, GFP_KERNEL))
+   return -ENOMEM;
+   cpumask_clear(mask);
+
+   /*
+* Figure out which cpus need flushing.  It's OK if we race
+* with changes to the per-cpu lru pvecs, since it's no worse
+* than if we flushed all cpus, since a cpu could still end
+* up putting pages back on its pvec before we returned.
+* And this avoids interrupting other cpus unnecessarily.
+*/
+   for_each_online_cpu(cpu) {
+   struct pagevec *pvecs = per_cpu(lru_add_pvecs, cpu);
+   struct pagevec *pvec = &per_cpu(lru_rotate_pvecs, cpu);
+   int count = pagevec_count(pvec);
+   int lru;
+
+   if (!count) {
+   for_each_lru(lru) {
+   pvec = &pvecs[lru - LRU_BASE];
+   count = pagevec_count(pvec);
+   if (count)
+   break;
+   }
+   }
+
+   if (count)
+   cpumask_set_cpu(cpu, mask);
+   }
+
+   rc = schedule_on_cpu_mask(lru_add_drain_per_cpu, 

[PATCH] tile: avoid recursive backtrace faults

2013-08-07 Thread Chris Metcalf
This change adds support for avoiding recursive backtracer crashes;
we haven't seen this in practice other than when things are seriously
corrupt, but it may help avoid losing the root cause of a crash.

Also, don't abort kernel backtraces for invalid userspace PC's.
If we do, we lose the ability to backtrace through a userspace
call to a bad address above PAGE_OFFSET, even though it can
be perfectly reasonable to continue the backtrace in such a case.
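
The recursion guard can be modelled in a single-threaded user-space
program as follows. This is a sketch under simplifying assumptions: one
global flag stands in for the per-task flag, and a fault during the dump
is modelled as a recursive call. The names are illustrative.

```c
#include <stdbool.h>

/* Stand-in for a per-task "currently backtracing" flag. */
static bool in_backtrace;

static bool start_guard(void)
{
    if (in_backtrace)
        return false;   /* already inside: refuse to recurse */
    in_backtrace = true;
    return true;
}

static void end_guard(void)
{
    in_backtrace = false;
}

/*
 * Return the nesting depth actually reached. A fault while dumping is
 * modelled as a nested call, which the guard turns into a no-op.
 */
static int dump(bool fault_while_dumping)
{
    int depth;

    if (!start_guard())
        return 0;
    depth = 1;
    if (fault_while_dumping)
        depth += dump(true);
    end_guard();
    return depth;
}
```

The nested attempt returns immediately, so the outer dump completes and
the flag is cleared for the next request.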

Signed-off-by: Chris Metcalf 
---
 arch/tile/include/asm/processor.h |  2 ++
 arch/tile/kernel/stack.c  | 30 --
 2 files changed, 30 insertions(+), 2 deletions(-)

diff --git a/arch/tile/include/asm/processor.h b/arch/tile/include/asm/processor.h
index cda2724..fed1c04 100644
--- a/arch/tile/include/asm/processor.h
+++ b/arch/tile/include/asm/processor.h
@@ -110,6 +110,8 @@ struct thread_struct {
unsigned long long interrupt_mask;
/* User interrupt-control 0 state */
unsigned long intctrl_0;
+   /* Is this task currently doing a backtrace? */
+   bool in_backtrace;
 #if CHIP_HAS_PROC_STATUS_SPR()
/* Any other miscellaneous processor state bits */
unsigned long proc_status;
diff --git a/arch/tile/kernel/stack.c b/arch/tile/kernel/stack.c
index af8dfc9..c972689 100644
--- a/arch/tile/kernel/stack.c
+++ b/arch/tile/kernel/stack.c
@@ -103,8 +103,7 @@ static struct pt_regs *valid_fault_handler(struct KBacktraceIterator* kbt)
if (kbt->verbose)
pr_err("  <%s while in kernel mode>\n", fault);
} else if (EX1_PL(p->ex1) == USER_PL &&
-   p->pc < PAGE_OFFSET &&
-   p->sp < PAGE_OFFSET) {
+  p->sp < PAGE_OFFSET && p->sp != 0) {
if (kbt->verbose)
pr_err("  <%s while in user mode>\n", fault);
} else if (kbt->verbose) {
@@ -352,6 +351,26 @@ static void describe_addr(struct KBacktraceIterator *kbt,
 }
 
 /*
+ * Avoid possible crash recursion during backtrace.  If it happens, it
+ * makes it easy to lose the actual root cause of the failure, so we
+ * put a simple guard on all the backtrace loops.
+ */
+static bool start_backtrace(void)
+{
+   if (current->thread.in_backtrace) {
+   pr_err("Backtrace requested while in backtrace!\n");
+   return false;
+   }
+   current->thread.in_backtrace = true;
+   return true;
+}
+
+static void end_backtrace(void)
+{
+   current->thread.in_backtrace = false;
+}
+
+/*
  * This method wraps the backtracer's more generic support.
  * It is only invoked from the architecture-specific code; show_stack()
  * and dump_stack() (in entry.S) are architecture-independent entry points.
@@ -361,6 +380,8 @@ void tile_show_stack(struct KBacktraceIterator *kbt, int headers)
int i;
int have_mmap_sem = 0;
 
+   if (!start_backtrace())
+   return;
if (headers) {
/*
 * Add a blank line since if we are called from panic(),
@@ -402,6 +423,7 @@ void tile_show_stack(struct KBacktraceIterator *kbt, int headers)
pr_err("Stack dump complete\n");
if (have_mmap_sem)
up_read(&kbt->task->mm->mmap_sem);
+   end_backtrace();
 }
 EXPORT_SYMBOL(tile_show_stack);
 
@@ -463,6 +485,8 @@ void save_stack_trace_tsk(struct task_struct *task, struct stack_trace *trace)
int skip = trace->skip;
int i = 0;
 
+   if (!start_backtrace())
+   goto done;
if (task == NULL || task == current)
KBacktraceIterator_init_current(&kbt);
else
@@ -476,6 +500,8 @@ void save_stack_trace_tsk(struct task_struct *task, struct stack_trace *trace)
break;
trace->entries[i++] = kbt.it.pc;
}
+   end_backtrace();
+done:
trace->nr_entries = i;
 }
 EXPORT_SYMBOL(save_stack_trace_tsk);
-- 
1.8.3.1



[PATCH] tile: fix tilegx vmalloc_sync_all BUG_ON

2013-08-07 Thread Chris Metcalf
As specified, the test wasn't correct, and in any case it should
be a BUILD_BUG_ON.
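
A user-space sketch of the two points, assuming (as the patch suggests)
that the end of the range is an exclusive bound: the comparison should use
the last page inside the range, and since it involves only constants it
can be checked at compile time. All values and names below are made up for
illustration and are not the tilegx constants.

```c
#include <stdint.h>

/* Made-up values for illustration only. */
#define EX_PAGE_SIZE    UINT64_C(0x10000)
#define EX_PGDIR_SHIFT  30
#define EX_VM_START     UINT64_C(0x40000000)
#define EX_VM_END       UINT64_C(0x80000000)    /* exclusive end */

#define ex_pgd_index(a) ((a) >> EX_PGDIR_SHIFT)

/* The last page of the range shares a top-level slot with the start. */
_Static_assert(ex_pgd_index(EX_VM_END - EX_PAGE_SIZE) ==
               ex_pgd_index(EX_VM_START),
               "range must fit in one top-level entry");

/* The exclusive end itself already indexes the next slot. */
static int end_is_in_next_slot(void)
{
    return ex_pgd_index(EX_VM_END) == ex_pgd_index(EX_VM_START) + 1;
}
```

With these numbers, comparing against the exclusive end directly would
report a mismatch even though the whole range sits in one slot.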

Signed-off-by: Chris Metcalf 
---
 arch/tile/mm/fault.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/tile/mm/fault.c b/arch/tile/mm/fault.c
index f7f99f9..6152819 100644
--- a/arch/tile/mm/fault.c
+++ b/arch/tile/mm/fault.c
@@ -870,7 +870,8 @@ void vmalloc_sync_all(void)
 {
 #ifdef __tilegx__
/* Currently all L1 kernel pmd's are static and shared. */
-   BUG_ON(pgd_index(VMALLOC_END) != pgd_index(VMALLOC_START));
+   BUILD_BUG_ON(pgd_index(VMALLOC_END - PAGE_SIZE) !=
+pgd_index(VMALLOC_START));
 #else
/*
 * Note that races in the updates of insync and start aren't
-- 
1.8.3.1



Re: [patch v2 3/3] mm: page_alloc: fair zone allocator policy

2013-08-07 Thread Mel Gorman
On Fri, Aug 02, 2013 at 11:37:26AM -0400, Johannes Weiner wrote:
> Each zone that holds userspace pages of one workload must be aged at a
> speed proportional to the zone size.  Otherwise, the time an
> individual page gets to stay in memory depends on the zone it happened
> to be allocated in.  Asymmetry in the zone aging creates rather
> unpredictable aging behavior and results in the wrong pages being
> reclaimed, activated etc.
> 
> But exactly this happens right now because of the way the page
> allocator and kswapd interact.  The page allocator uses per-node lists
> of all zones in the system, ordered by preference, when allocating a
> new page.  When the first iteration does not yield any results, kswapd
> is woken up and the allocator retries.  Due to the way kswapd reclaims
> zones below the high watermark while a zone can be allocated from when
> it is above the low watermark, the allocator may keep kswapd running
> while kswapd reclaim ensures that the page allocator can keep
> allocating from the first zone in the zonelist for extended periods of
> time.  Meanwhile the other zones rarely see new allocations and thus
> get aged much slower in comparison.
> 
> The result is that the occasional page placed in lower zones gets
> relatively more time in memory, even gets promoted to the active list
> after its peers have long been evicted.  Meanwhile, the bulk of the
> working set may be thrashing on the preferred zone even though there
> may be significant amounts of memory available in the lower zones.
> 
> Even the most basic test -- repeatedly reading a file slightly bigger
> than memory -- shows how broken the zone aging is.  In this scenario,
> no single page should be able stay in memory long enough to get
> referenced twice and activated, but activation happens in spades:
> 
>   $ grep active_file /proc/zoneinfo
>   nr_inactive_file 0
>   nr_active_file 0
>   nr_inactive_file 0
>   nr_active_file 8
>   nr_inactive_file 1582
>   nr_active_file 11994
>   $ cat data data data data >/dev/null
>   $ grep active_file /proc/zoneinfo
>   nr_inactive_file 0
>   nr_active_file 70
>   nr_inactive_file 258753
>   nr_active_file 443214
>   nr_inactive_file 149793
>   nr_active_file 12021
> 
> Fix this with a very simple round robin allocator.  Each zone is
> allowed a batch of allocations that is proportional to the zone's
> size, after which it is treated as full.  The batch counters are reset
> when all zones have been tried and the allocator enters the slowpath
> and kicks off kswapd reclaim.  Allocation and reclaim is now fairly
> spread out to all available/allowable zones:
> 
>   $ grep active_file /proc/zoneinfo
>   nr_inactive_file 0
>   nr_active_file 0
>   nr_inactive_file 174
>   nr_active_file 4865
>   nr_inactive_file 53
>   nr_active_file 860
>   $ cat data data data data >/dev/null
>   $ grep active_file /proc/zoneinfo
>   nr_inactive_file 0
>   nr_active_file 0
>   nr_inactive_file 22
>   nr_active_file 4988
>   nr_inactive_file 190969
>   nr_active_file 937
> 
> When zone_reclaim_mode is enabled, allocations will now spread out to
> all zones on the local node, not just the first preferred zone (which
> on a 4G node might be a tiny Normal zone).
> 
> Signed-off-by: Johannes Weiner 
> Tested-by: Zlatko Calusic 
> ---
>  include/linux/mmzone.h |  1 +
>  mm/page_alloc.c| 69 
> ++
>  2 files changed, 60 insertions(+), 10 deletions(-)
> 
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index af4a3b7..dcad2ab 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -352,6 +352,7 @@ struct zone {
>* free areas of different sizes
>*/
>   spinlock_t  lock;
> + int alloc_batch;
>   int all_unreclaimable; /* All pages pinned */
>  #if defined CONFIG_COMPACTION || defined CONFIG_CMA
>   /* Set to true when the PG_migrate_skip bits should be cleared */

This adds a dirty cache line that is updated on every allocation even if
it's from the per-cpu allocator. I am concerned that this will introduce
noticeable overhead in the allocator paths on large machines running
allocator-intensive workloads.

Would it be possible to move it into the per-cpu pageset? I understand
that the round-robin nature will then depend on what CPU is running and
the performance characteristics will be different. There might even be an
adverse workload that uses all the batches from all available CPUs until
it is essentially the same problem but that would be a very worst case.
I would hope that in general it would work without adding a big source of
dirty cache line bouncing in the middle of the allocator.

What I do not know offhand is how much space there is in that pageset
thing before it grows by another cache line.
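
For reference, the batching scheme described in the quoted changelog can
be modelled in user space roughly as follows. This is a sketch only: it is
single-threaded, so it shows the round-robin policy but none of the cache
line effects discussed above. The struct, the function names and the rule
"batch equals zone size" are assumptions for illustration; the patch only
says the batch is proportional to the zone size.

```c
#define NZONES 3

struct zone_ex {
    long size;          /* zone size in pages */
    long alloc_batch;   /* allocations left in the current round */
};

/* Refill every zone's batch in proportion to its size. */
static void reset_batches(struct zone_ex *z)
{
    for (int i = 0; i < NZONES; i++)
        z[i].alloc_batch = z[i].size;
}

/*
 * Pick the zone for one allocation: the first zone with batch left.
 * When every batch is used up, reset them all and try once more.
 * Returns -1 only if no zone has any capacity at all.
 */
static int alloc_zone(struct zone_ex *z)
{
    for (int pass = 0; pass < 2; pass++) {
        for (int i = 0; i < NZONES; i++) {
            if (z[i].alloc_batch > 0) {
                z[i].alloc_batch--;
                return i;
            }
        }
        reset_batches(z);
    }
    return -1;
}
```

Over many allocations each zone receives a share proportional to its
size, which is the fairness property the changelog is after.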

I should note that the page allocator 

Re: [PATCH] ARM: dts: am33xx: Correct gpio #interrupt-cells property

2013-08-07 Thread Mark Rutland
On Wed, Aug 07, 2013 at 12:06:32PM +0100, Lars Poeschel wrote:
> From: Lars Poeschel 
> 
> Following commit ff5c9059, and therefore other omap platforms using
> the gpio-omap driver, correct the #interrupt-cells property on am33xx
> too. The omap gpio binding documentation also states that
> the #interrupt-cells property should be 2.

I take it there are no device nodes for which any of these nodes are an
interrupt parent (which would need to be updated)?

If so:

Acked-by: Mark Rutland 

Thanks,
Mark.

> 
> Signed-off-by: Lars Poeschel 
> ---
>  arch/arm/boot/dts/am33xx.dtsi |8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/arm/boot/dts/am33xx.dtsi b/arch/arm/boot/dts/am33xx.dtsi
> index 38b446b..033c5dd 100644
> --- a/arch/arm/boot/dts/am33xx.dtsi
> +++ b/arch/arm/boot/dts/am33xx.dtsi
> @@ -102,7 +102,7 @@
>   gpio-controller;
>   #gpio-cells = <2>;
>   interrupt-controller;
> - #interrupt-cells = <1>;
> + #interrupt-cells = <2>;
>   reg = <0x44e07000 0x1000>;
>   interrupts = <96>;
>   };
> @@ -113,7 +113,7 @@
>   gpio-controller;
>   #gpio-cells = <2>;
>   interrupt-controller;
> - #interrupt-cells = <1>;
> + #interrupt-cells = <2>;
>   reg = <0x4804c000 0x1000>;
>   interrupts = <98>;
>   };
> @@ -124,7 +124,7 @@
>   gpio-controller;
>   #gpio-cells = <2>;
>   interrupt-controller;
> - #interrupt-cells = <1>;
> + #interrupt-cells = <2>;
>   reg = <0x481ac000 0x1000>;
>   interrupts = <32>;
>   };
> @@ -135,7 +135,7 @@
>   gpio-controller;
>   #gpio-cells = <2>;
>   interrupt-controller;
> - #interrupt-cells = <1>;
> + #interrupt-cells = <2>;
>   reg = <0x481ae000 0x1000>;
>   interrupts = <62>;
>   };
> -- 
> 1.7.10.4
> 
> 
> ___
> linux-arm-kernel mailing list
> linux-arm-ker...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> 


Re: [PATCH 2/3] memcg: Limit the number of events registered on oom_control

2013-08-07 Thread Michal Hocko
On Wed 07-08-13 15:57:34, Michal Hocko wrote:
[...]
> Hmm, OK so you think that the fd limit is sufficient already?

Hmm, that would need to touch the code as well (the register callback
would need to make sure only one event is registered per cfile). But yes
this way would be better. I will send a new patch once I have an idle
moment.
-- 
Michal Hocko
SUSE Labs


[PATCH 2/6] ARM: Tegra: Add CPU's OPPs for using cpufreq-cpu0 driver

2013-08-07 Thread Viresh Kumar
The cpufreq-cpu0 driver needs OPPs to be present in DT so that it can probe
them to build the frequency table. This patch adds OPPs and clock-latency to
the tegra cpu0 node for multiple SoCs.

Voltage levels have not been used on tegra so far, so a flat value, which is
ignored anyway, is used to represent voltage.

Signed-off-by: Viresh Kumar 
---
 arch/arm/boot/dts/tegra114.dtsi | 12 
 arch/arm/boot/dts/tegra20.dtsi  | 12 
 arch/arm/boot/dts/tegra30.dtsi  | 12 
 3 files changed, 36 insertions(+)

diff --git a/arch/arm/boot/dts/tegra114.dtsi b/arch/arm/boot/dts/tegra114.dtsi
index abf6c40..730e0d9 100644
--- a/arch/arm/boot/dts/tegra114.dtsi
+++ b/arch/arm/boot/dts/tegra114.dtsi
@@ -438,6 +438,18 @@
device_type = "cpu";
compatible = "arm,cortex-a15";
reg = <0>;
+   operating-points = <
+   /* kHz    ignored */
+216000   100
+312000   100
+456000   100
+608000   100
+76   100
+816000   100
+912000   100
+100  100
+   >;
+   clock-latency = <30>;
};
 
cpu@1 {
diff --git a/arch/arm/boot/dts/tegra20.dtsi b/arch/arm/boot/dts/tegra20.dtsi
index 9653fd8..5696f98 100644
--- a/arch/arm/boot/dts/tegra20.dtsi
+++ b/arch/arm/boot/dts/tegra20.dtsi
@@ -577,6 +577,18 @@
device_type = "cpu";
compatible = "arm,cortex-a9";
reg = <0>;
+   operating-points = <
+   /* kHz    ignored */
+216000   100
+312000   100
+456000   100
+608000   100
+76   100
+816000   100
+912000   100
+100  100
+   >;
+   clock-latency = <30>;
};
 
cpu@1 {
diff --git a/arch/arm/boot/dts/tegra30.dtsi b/arch/arm/boot/dts/tegra30.dtsi
index d8783f0..5930290 100644
--- a/arch/arm/boot/dts/tegra30.dtsi
+++ b/arch/arm/boot/dts/tegra30.dtsi
@@ -569,6 +569,18 @@
device_type = "cpu";
compatible = "arm,cortex-a9";
reg = <0>;
+   operating-points = <
+   /* kHz    ignored */
+216000   100
+312000   100
+456000   100
+608000   100
+76   100
+816000   100
+912000   100
+100  100
+   >;
+   clock-latency = <30>;
};
 
cpu@1 {
-- 
1.7.12.rc2.18.g61b472e
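
As a rough illustration of the operating-points layout used in the patch above, each OPP is a pair of cells, frequency in kHz followed by voltage in uV. The sketch below is a plain userspace model, not kernel code: struct opp_model and parse_opps() are hypothetical names, and the voltage value is a placeholder, just as the patch treats it as ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one operating point: <frequency in kHz, voltage in uV>. */
struct opp_model {
	uint32_t khz;
	uint32_t uv;
};

/*
 * Split a flat list of cells into <kHz uV> pairs.
 * Returns the number of OPPs stored in out, or -1 if the cell count is
 * odd or the output table is too small.
 */
static int parse_opps(const uint32_t *cells, size_t ncells,
		      struct opp_model *out, size_t max_out)
{
	size_t i, n;

	assert(cells != NULL && out != NULL);

	if (ncells % 2 != 0)
		return -1;	/* cells must come in pairs */

	n = ncells / 2;
	if (n > max_out)
		return -1;

	for (i = 0; i < n; i++) {
		out[i].khz = cells[2 * i];
		out[i].uv = cells[2 * i + 1];
	}
	return (int)n;
}
```

A consumer such as a cpufreq driver would then build its frequency table from the khz fields only, which is why a flat voltage value is harmless as long as no regulator is involved.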

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 6/6] cpufreq: Tegra: Remove tegra-cpufreq driver

2013-08-07 Thread Viresh Kumar
We are now using the generic cpufreq-cpu0 driver, so let's get rid of the
platform-specific tegra-cpufreq.c driver.

Signed-off-by: Viresh Kumar 
---
 drivers/cpufreq/Makefile|   1 -
 drivers/cpufreq/tegra-cpufreq.c | 291 
 2 files changed, 292 deletions(-)
 delete mode 100644 drivers/cpufreq/tegra-cpufreq.c

diff --git a/drivers/cpufreq/Makefile b/drivers/cpufreq/Makefile
index ad5866c..e74b3ee 100644
--- a/drivers/cpufreq/Makefile
+++ b/drivers/cpufreq/Makefile
@@ -76,7 +76,6 @@ obj-$(CONFIG_ARM_S5PV210_CPUFREQ) += s5pv210-cpufreq.o
 obj-$(CONFIG_ARM_SA1100_CPUFREQ)   += sa1100-cpufreq.o
 obj-$(CONFIG_ARM_SA1110_CPUFREQ)   += sa1110-cpufreq.o
 obj-$(CONFIG_ARM_SPEAR_CPUFREQ)+= spear-cpufreq.o
-obj-$(CONFIG_ARM_TEGRA_CPUFREQ)+= tegra-cpufreq.o
 
 
##
 # PowerPC platform drivers
diff --git a/drivers/cpufreq/tegra-cpufreq.c b/drivers/cpufreq/tegra-cpufreq.c
deleted file mode 100644
index cd66b85..000
--- a/drivers/cpufreq/tegra-cpufreq.c
+++ /dev/null
@@ -1,291 +0,0 @@
-/*
- * Copyright (C) 2010 Google, Inc.
- *
- * Author:
- * Colin Cross 
- * Based on arch/arm/plat-omap/cpu-omap.c, (C) 2005 Nokia Corporation
- *
- * This software is licensed under the terms of the GNU General Public
- * License version 2, as published by the Free Software Foundation, and
- * may be copied, distributed, and modified under those terms.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU General Public License for more details.
- *
- */
-
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-
-static struct cpufreq_frequency_table freq_table[] = {
-   { .frequency = 216000 },
-   { .frequency = 312000 },
-   { .frequency = 456000 },
-   { .frequency = 608000 },
-   { .frequency = 76 },
-   { .frequency = 816000 },
-   { .frequency = 912000 },
-   { .frequency = 100 },
-   { .frequency = CPUFREQ_TABLE_END },
-};
-
-#define NUM_CPUS   2
-
-static struct clk *cpu_clk;
-static struct clk *pll_x_clk;
-static struct clk *pll_p_clk;
-static struct clk *emc_clk;
-
-static unsigned long target_cpu_speed[NUM_CPUS];
-static DEFINE_MUTEX(tegra_cpu_lock);
-static bool is_suspended;
-
-static int tegra_verify_speed(struct cpufreq_policy *policy)
-{
-   return cpufreq_frequency_table_verify(policy, freq_table);
-}
-
-static unsigned int tegra_getspeed(unsigned int cpu)
-{
-   unsigned long rate;
-
-   if (cpu >= NUM_CPUS)
-   return 0;
-
-   rate = clk_get_rate(cpu_clk) / 1000;
-   return rate;
-}
-
-static int tegra_cpu_clk_set_rate(unsigned long rate)
-{
-   int ret;
-
-   /*
-* Take an extra reference to the main pll so it doesn't turn
-* off when we move the cpu off of it
-*/
-   clk_prepare_enable(pll_x_clk);
-
-   ret = clk_set_parent(cpu_clk, pll_p_clk);
-   if (ret) {
-   pr_err("Failed to switch cpu to clock pll_p\n");
-   goto out;
-   }
-
-   if (rate == clk_get_rate(pll_p_clk))
-   goto out;
-
-   ret = clk_set_rate(pll_x_clk, rate);
-   if (ret) {
-   pr_err("Failed to change pll_x to %lu\n", rate);
-   goto out;
-   }
-
-   ret = clk_set_parent(cpu_clk, pll_x_clk);
-   if (ret) {
-   pr_err("Failed to switch cpu to clock pll_x\n");
-   goto out;
-   }
-
-out:
-   clk_disable_unprepare(pll_x_clk);
-   return ret;
-}
-
-static int tegra_update_cpu_speed(struct cpufreq_policy *policy,
-   unsigned long rate)
-{
-   int ret = 0;
-   struct cpufreq_freqs freqs;
-
-   freqs.old = tegra_getspeed(0);
-   freqs.new = rate;
-
-   if (freqs.old == freqs.new)
-   return ret;
-
-   /*
-* Vote on memory bus frequency based on cpu frequency
-* This sets the minimum frequency, display or avp may request higher
-*/
-   if (rate >= 816000)
-   clk_set_rate(emc_clk, 6); /* cpu 816 MHz, emc max */
-   else if (rate >= 456000)
-   clk_set_rate(emc_clk, 3); /* cpu 456 MHz, emc 150Mhz */
-   else
-   clk_set_rate(emc_clk, 1);  /* emc 50Mhz */
-
-   cpufreq_notify_transition(policy, &freqs, CPUFREQ_PRECHANGE);
-
-#ifdef CONFIG_CPU_FREQ_DEBUG
-   printk(KERN_DEBUG "cpufreq-tegra: transition: %u --> %u\n",
-  freqs.old, freqs.new);
-#endif
-
-   ret = tegra_cpu_clk_set_rate(freqs.new * 1000);
-   if (ret) {
-   pr_err("cpu-tegra: Failed to set cpu frequency to %d kHz\n",
-   freqs.new);
-   freqs.new = 

[PATCH 4/6] ARM: Tegra: defconfig: select cpufreq-cpu0 driver

2013-08-07 Thread Viresh Kumar
Tegra requires cpufreq-cpu0 driver to be compiled in and hence we select it from
the defconfig.

Signed-off-by: Viresh Kumar 
---
 arch/arm/configs/tegra_defconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm/configs/tegra_defconfig b/arch/arm/configs/tegra_defconfig
index 1effb43..3fcec8f 100644
--- a/arch/arm/configs/tegra_defconfig
+++ b/arch/arm/configs/tegra_defconfig
@@ -38,6 +38,7 @@ CONFIG_ZBOOT_ROM_BSS=0x0
 CONFIG_KEXEC=y
 CONFIG_CPU_FREQ=y
 CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND=y
+CONFIG_GENERIC_CPUFREQ_CPU0=y
 CONFIG_CPU_IDLE=y
 CONFIG_VFP=y
 CONFIG_PM_RUNTIME=y
-- 
1.7.12.rc2.18.g61b472e



[PATCH 5/6] ARM: Tegra: start using cpufreq-cpu0 driver

2013-08-07 Thread Viresh Kumar
The cpufreq-cpu0 driver can be probed over DT only if a corresponding device node
is created for the SoC which wants to use it. Let's create a platform device for
the cpufreq-cpu0 driver for Tegra.

This patch also removes the Kconfig entry responsible for compiling the
tegra-cpufreq driver, so there will be no conflict between the two cpufreq drivers.

Signed-off-by: Viresh Kumar 
---
 arch/arm/mach-tegra/tegra.c | 2 ++
 drivers/cpufreq/Kconfig.arm | 8 
 2 files changed, 2 insertions(+), 8 deletions(-)

diff --git a/arch/arm/mach-tegra/tegra.c b/arch/arm/mach-tegra/tegra.c
index 0d1e412..6ab3f69 100644
--- a/arch/arm/mach-tegra/tegra.c
+++ b/arch/arm/mach-tegra/tegra.c
@@ -82,11 +82,13 @@ static struct of_dev_auxdata tegra20_auxdata_lookup[] __initdata = {
 
 static void __init tegra_dt_init(void)
 {
+   struct platform_device_info devinfo = { .name = "cpufreq-cpu0", };
struct soc_device_attribute *soc_dev_attr;
struct soc_device *soc_dev;
struct device *parent = NULL;
 
tegra_clocks_apply_init_table();
+   platform_device_register_full(&devinfo);
 
soc_dev_attr = kzalloc(sizeof(*soc_dev_attr), GFP_KERNEL);
if (!soc_dev_attr)
diff --git a/drivers/cpufreq/Kconfig.arm b/drivers/cpufreq/Kconfig.arm
index de4d5d9..9472160 100644
--- a/drivers/cpufreq/Kconfig.arm
+++ b/drivers/cpufreq/Kconfig.arm
@@ -215,11 +215,3 @@ config ARM_SPEAR_CPUFREQ
default y
help
  This adds the CPUFreq driver support for SPEAr SOCs.
-
-config ARM_TEGRA_CPUFREQ
-   bool "TEGRA CPUFreq support"
-   depends on ARCH_TEGRA
-   select CPU_FREQ_TABLE
-   default y
-   help
- This adds the CPUFreq driver support for TEGRA SOCs.
-- 
1.7.12.rc2.18.g61b472e



[PATCH 3/6] ARM: Tegra: Enable OPP library

2013-08-07 Thread Viresh Kumar
cpufreq-cpu0 driver is dependent on OPP library and hence we need to enable it
for Tegra as we are going to use cpufreq-cpu0.

Signed-off-by: Viresh Kumar 
---
 arch/arm/mach-tegra/Kconfig | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/arm/mach-tegra/Kconfig b/arch/arm/mach-tegra/Kconfig
index ef3a8da..63875c5 100644
--- a/arch/arm/mach-tegra/Kconfig
+++ b/arch/arm/mach-tegra/Kconfig
@@ -1,6 +1,8 @@
 config ARCH_TEGRA
bool "NVIDIA Tegra" if ARCH_MULTI_V7
select ARCH_HAS_CPUFREQ
+   select ARCH_HAS_OPP
+   select PM_OPP if PM
select ARCH_REQUIRE_GPIOLIB
select CLKDEV_LOOKUP
select CLKSRC_MMIO
-- 
1.7.12.rc2.18.g61b472e



[PATCH 1/6] clk: Tegra: Add CPU0 clock driver

2013-08-07 Thread Viresh Kumar
This patch adds CPU0's clk driver for Tegra. It will be used by the generic
cpufreq-cpu0 driver to get/set cpu clk.

Most of the platform specific bits are picked from tegra-cpufreq.c.

Signed-off-by: Viresh Kumar 
---
 drivers/clk/tegra/Makefile  |   1 +
 drivers/clk/tegra/clk-cpu.c | 164 
 drivers/clk/tegra/clk-tegra30.c |   4 +
 include/linux/clk/tegra.h   |   1 +
 4 files changed, 170 insertions(+)
 create mode 100644 drivers/clk/tegra/clk-cpu.c

diff --git a/drivers/clk/tegra/Makefile b/drivers/clk/tegra/Makefile
index f49fac2..0e818c0 100644
--- a/drivers/clk/tegra/Makefile
+++ b/drivers/clk/tegra/Makefile
@@ -10,3 +10,4 @@ obj-y += clk-super.o
 obj-$(CONFIG_ARCH_TEGRA_2x_SOC) += clk-tegra20.o
 obj-$(CONFIG_ARCH_TEGRA_3x_SOC) += clk-tegra30.o
 obj-$(CONFIG_ARCH_TEGRA_114_SOC)   += clk-tegra114.o
+obj-$(CONFIG_GENERIC_CPUFREQ_CPU0) += clk-cpu.o
diff --git a/drivers/clk/tegra/clk-cpu.c b/drivers/clk/tegra/clk-cpu.c
new file mode 100644
index 000..01716d6
--- /dev/null
+++ b/drivers/clk/tegra/clk-cpu.c
@@ -0,0 +1,164 @@
+/*
+ * Copyright (C) 2013 Linaro
+ *
+ * Author: Viresh Kumar 
+ *
+ * This file is licensed under the terms of the GNU General Public
+ * License version 2. This program is licensed "as is" without any
+ * warranty of any kind, whether express or implied.
+ */
+
+/*
+ * Responsible for setting cpu0 clk as requested by cpufreq-cpu0 driver
+ *
+ * All platform specific bits are taken from tegra-cpufreq driver.
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include 
+#include 
+#include 
+
+#define to_clk_cpu0(_hw) container_of(_hw, struct clk_cpu0, hw)
+
+struct clk_cpu0 {
+   struct clk_hw hw;
+   spinlock_t *lock;
+};
+
+static struct clk *cpu_clk;
+static struct clk *pll_x_clk;
+static struct clk *pll_p_clk;
+static struct clk *emc_clk;
+
+static unsigned long cpu0_recalc_rate(struct clk_hw *hw,
+   unsigned long parent_rate)
+{
+   return clk_get_rate(cpu_clk);
+}
+
+static long cpu0_round_rate(struct clk_hw *hw, unsigned long drate,
+   unsigned long *parent_rate)
+{
+   return clk_round_rate(cpu_clk, drate);
+}
+
+static int cpu0_set_rate(struct clk_hw *hw, unsigned long rate,
+   unsigned long parent_rate)
+{
+   int ret;
+
+   /*
+* Vote on memory bus frequency based on cpu frequency
+* This sets the minimum frequency, display or avp may request higher
+*/
+   if (rate >= 81600)
+   clk_set_rate(emc_clk, 6); /* cpu 816 MHz, emc max */
+   else if (rate >= 45600)
+   clk_set_rate(emc_clk, 3); /* cpu 456 MHz, emc 150Mhz */
+   else
+   clk_set_rate(emc_clk, 1); /* emc 50Mhz */
+
+   /*
+* Take an extra reference to the main pll so it doesn't turn
+* off when we move the cpu off of it
+*/
+   clk_prepare_enable(pll_x_clk);
+
+   ret = clk_set_parent(cpu_clk, pll_p_clk);
+   if (ret) {
+   pr_err("%s: Failed to switch cpu to clock pll_p\n", __func__);
+   goto out;
+   }
+
+   if (rate == clk_get_rate(pll_p_clk))
+   goto out;
+
+   ret = clk_set_rate(pll_x_clk, rate);
+   if (ret) {
+   pr_err("Failed to change pll_x to %lu\n", rate);
+   goto out;
+   }
+
+   ret = clk_set_parent(cpu_clk, pll_x_clk);
+   if (ret) {
+   pr_err("Failed to switch cpu to clock pll_x\n");
+   goto out;
+   }
+
+out:
+   clk_disable_unprepare(pll_x_clk);
+   return ret;
+}
+
+static struct clk_ops clk_cpu0_ops = {
+   .recalc_rate = cpu0_recalc_rate,
+   .round_rate = cpu0_round_rate,
+   .set_rate = cpu0_set_rate,
+};
+
+struct clk *tegra_clk_register_cpu0(void)
+{
+   struct clk_init_data init;
+   struct clk_cpu0 *cpu0;
+   struct clk *clk;
+
+   cpu0 = kzalloc(sizeof(*cpu0), GFP_KERNEL);
+   if (!cpu0) {
+   pr_err("%s: could not allocate cpu0 clk\n", __func__);
+   return ERR_PTR(-ENOMEM);
+   }
+
+   cpu_clk = clk_get_sys(NULL, "cpu");
+   if (IS_ERR(cpu_clk)) {
+   clk = cpu_clk;
+   goto free_mem;
+   }
+
+   pll_x_clk = clk_get_sys(NULL, "pll_x");
+   if (IS_ERR(pll_x_clk)) {
+   clk = pll_x_clk;
+   goto put_cpu_clk;
+   }
+
+   pll_p_clk = clk_get_sys(NULL, "pll_p_cclk");
+   if (IS_ERR(pll_p_clk)) {
+   clk = pll_p_clk;
+   goto put_pll_x_clk;
+   }
+
+   emc_clk = clk_get_sys("cpu", "emc");
+   if (IS_ERR(emc_clk)) {
+   clk = emc_clk;
+   goto put_pll_p_clk;
+   }
+
+   cpu0->hw.init = &init;
+
+   init.name = "cpu0";
+   init.ops = &clk_cpu0_ops;
+   init.flags = CLK_IS_ROOT | CLK_GET_RATE_NOCACHE;
+   init.num_parents = 0;
+
+   clk 

[PATCH 0/6] Tegra: Use cpufreq-cpu0 driver

2013-08-07 Thread Viresh Kumar
Hi Stephen,

This is the first attempt to get rid of tegra-cpufreq driver. This patchset
tries to add supporting infrastructure for tegra to use cpufreq-cpu0 driver.

I don't have hardware to test it, so it is compile-tested only. A few bits may
be missing as I couldn't think of all aspects, so I may need your help getting
them fixed.

Once this is tested by you, I would like to take it through my ARM cpufreq tree
if nobody else has a problem with it.

Thanks

--
Viresh.

Viresh Kumar (6):
  clk: Tegra: Add CPU0 clock driver
  ARM: Tegra: Add CPU's OPPs for using cpufreq-cpu0 driver
  ARM: Tegra: Enable OPP library
  ARM: Tegra: defconfig: select cpufreq-cpu0 driver
  ARM: Tegra: start using cpufreq-cpu0 driver
  cpufreq: Tegra: Remove tegra-cpufreq driver

 arch/arm/boot/dts/tegra114.dtsi  |  12 ++
 arch/arm/boot/dts/tegra20.dtsi   |  12 ++
 arch/arm/boot/dts/tegra30.dtsi   |  12 ++
 arch/arm/configs/tegra_defconfig |   1 +
 arch/arm/mach-tegra/Kconfig  |   2 +
 arch/arm/mach-tegra/tegra.c  |   2 +
 drivers/clk/tegra/Makefile   |   1 +
 drivers/clk/tegra/clk-cpu.c  | 164 ++
 drivers/clk/tegra/clk-tegra30.c  |   4 +
 drivers/cpufreq/Kconfig.arm  |   8 --
 drivers/cpufreq/Makefile |   1 -
 drivers/cpufreq/tegra-cpufreq.c  | 291 ---
 include/linux/clk/tegra.h|   1 +
 13 files changed, 211 insertions(+), 300 deletions(-)
 create mode 100644 drivers/clk/tegra/clk-cpu.c
 delete mode 100644 drivers/cpufreq/tegra-cpufreq.c

-- 
1.7.12.rc2.18.g61b472e



Re: [PATCH v2] [SCSI] sg: Fix user memory corruption when SG_IO is interrupted by a signal

2013-08-07 Thread David Milburn

Roland Dreier wrote:

From: Roland Dreier 

There is a nasty bug in the SCSI SG_IO ioctl that in some circumstances
leads to one process writing data into the address space of some other
random unrelated process if the ioctl is interrupted by a signal.
What happens is the following:

 - A process issues an SG_IO ioctl with direction DXFER_FROM_DEV (ie the
   underlying SCSI command will transfer data from the SCSI device to
   the buffer provided in the ioctl)

 - Before the command finishes, a signal is sent to the process waiting
   in the ioctl.  This will end up waking up the sg_ioctl() code:

result = wait_event_interruptible(sfp->read_wait,
(srp_done(sfp, srp) || sdp->detached));

   but neither srp_done() nor sdp->detached is true, so we end up just
   setting srp->orphan and returning to userspace:

srp->orphan = 1;
write_unlock_irq(&sfp->rq_list_lock);
return result;  /* -ERESTARTSYS because signal hit process */

   At this point the original process is done with the ioctl and
   blithely goes ahead handling the signal, reissuing the ioctl, etc.

 - Eventually, the SCSI command issued by the first ioctl finishes and
   ends up in sg_rq_end_io().  At the end of that function, we run through:

write_lock_irqsave(&sfp->rq_list_lock, iflags);
if (unlikely(srp->orphan)) {
        if (sfp->keep_orphan)
                srp->sg_io_owned = 0;
        else
                done = 0;
}
srp->done = done;
write_unlock_irqrestore(&sfp->rq_list_lock, iflags);

if (likely(done)) {
        /* Now wake up any sg_read() that is waiting for this
         * packet.
         */
        wake_up_interruptible(&sfp->read_wait);
        kill_fasync(&sfp->async_qp, SIGPOLL, POLL_IN);
        kref_put(&sfp->f_ref, sg_remove_sfp);
} else {
        INIT_WORK(&srp->ew.work, sg_rq_end_io_usercontext);
        schedule_work(&srp->ew.work);
}

   Since srp->orphan *is* set, we set done to 0 (assuming the
   userspace app has not set keep_orphan via an SG_SET_KEEP_ORPHAN
   ioctl), and therefore we end up scheduling sg_rq_end_io_usercontext()
   to run in a workqueue.

 - In workqueue context we go through sg_rq_end_io_usercontext() ->
   sg_finish_rem_req() -> blk_rq_unmap_user() -> ... ->
   bio_uncopy_user() -> __bio_copy_iov() -> copy_to_user().

   The key point here is that we are doing copy_to_user() on a
   workqueue -- that is, we're on a kernel thread with current->mm
   equal to whatever random previous user process was scheduled before
   this kernel thread.  So we end up copying whatever data the SCSI
   command returned to the virtual address of the buffer passed into
   the original ioctl, but it's quite likely we do this copying into a
   different address space!

As suggested by James Bottomley ,
add a check for current->mm (which is NULL if we're on a kernel thread
without a real userspace address space) in bio_uncopy_user(), and skip
the copy if we're on a kernel thread.

There's no reason that I can think of for any caller of bio_uncopy_user()
to want to do copying on a kernel thread with a random active userspace
address space.

Huge thanks to Costa Sapuntzakis  for the
original pointer to this bug in the sg code.

Signed-off-by: Roland Dreier 
Cc: 
---
 fs/bio.c | 20 +++-
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/fs/bio.c b/fs/bio.c
index 94bbc04..c5eae72 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -1045,12 +1045,22 @@ static int __bio_copy_iov(struct bio *bio, struct bio_vec *iovecs,
 int bio_uncopy_user(struct bio *bio)
 {
struct bio_map_data *bmd = bio->bi_private;
-   int ret = 0;
+   struct bio_vec *bvec;
+   int ret = 0, i;
 
-	if (!bio_flagged(bio, BIO_NULL_MAPPED))

-   ret = __bio_copy_iov(bio, bmd->iovecs, bmd->sgvecs,
-bmd->nr_sgvecs, bio_data_dir(bio) == READ,
-0, bmd->is_our_pages);
+   if (!bio_flagged(bio, BIO_NULL_MAPPED)) {
+   /*
+* if we're in a workqueue, the request is orphaned, so
+* don't copy into a random user address space, just free.
+*/
+   if (current->mm)
+   ret = __bio_copy_iov(bio, bmd->iovecs, bmd->sgvecs,
+bmd->nr_sgvecs, bio_data_dir(bio) 
== READ,
+0, bmd->is_our_pages);
+   else if (bmd->is_our_pages)
+   bio_for_each_segment_all(bvec, bio, i)
+   __free_page(bvec->bv_page);
+   }
bio_free_map_data(bmd);
bio_put(bio);
return ret;
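
The guard added by the patch above can be modelled in plain userspace C. This is only a sketch under stated assumptions: struct mm_model, struct task_model and uncopy_model() are hypothetical stand-ins for current->mm and the copy-back step, not the kernel's definitions. The point it shows is that the copy back to the user buffer happens only when the running context owns a user address space.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a user address space holding the ioctl buffer. */
struct mm_model {
	char user_buf[16];
};

/* Hypothetical stand-in for the running task; mm is NULL on a kernel thread. */
struct task_model {
	struct mm_model *mm;
};

/*
 * Model of the guarded copy-back: copy the bounce buffer into the user
 * buffer only if the current context has a user address space.
 * Returns the number of bytes copied, 0 when the copy is skipped.
 */
static size_t uncopy_model(const struct task_model *cur,
			   const char *bounce, size_t len)
{
	if (cur->mm == NULL)
		return 0;	/* orphaned request on a kernel thread: skip */

	assert(len <= sizeof(cur->mm->user_buf));
	memcpy(cur->mm->user_buf, bounce, len);
	return len;
}
```

In this model a call made on behalf of a kernel thread leaves every user buffer untouched, which is the behaviour the commit message asks for; the bounce pages still have to be freed separately, as the else branch in the diff does.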


Hi Roland,

I was able to successfully test this patch overnight. I had been
experimenting with the sg driver setting the 

Re: [PATCH 1/3] memcg: limit the number of thresholds per-memcg

2013-08-07 Thread Michal Hocko
On Wed 07-08-13 09:58:18, Tejun Heo wrote:
> Hello,
> 
> On Wed, Aug 07, 2013 at 03:46:54PM +0200, Michal Hocko wrote:
> > OK, I have obviously misunderstood your concern mentioned in the other
> > email. Could you be more specific what is the DoS scenario which was
> > your concern, then?
> 
> So, let's say the file is write-accessible to !priv user which is
> under reasonable resource limits.  Normally this shouldn't affect priv
> system tools which are monitoring the same event as it shouldn't be
> able to deplete resources as long as the resource control mechanisms
> are configured and functioning properly; however, the memory usage
> event puts all event listeners into a single contiguous table which a
> !priv user can easily expand to a size where the table can no longer
> be enlarged and if a priv system tool or another user tries to
> register event afterwards, it'll fail.  IOW, it creates a shared
> resource which isn't properly provisioned and can be trivially filled
> up making it an easy DoS target.

OK, got your point. You are right and I haven't considered the size of
the table and the size restrictions of kmalloc. Thanks for pointing this
out!
---
>From cde8a296eddd288780e78803610127401b6a Mon Sep 17 00:00:00 2001
From: Michal Hocko 
Date: Wed, 7 Aug 2013 11:11:22 +0200
Subject: [PATCH] memcg: limit the number of thresholds per-memcg

There is no limit for the maximum number of threshold events registered
per memcg. It is even worse that all the events are stored in a
per-memcg table which is enlarged when a new event is registered. This
can lead to the following issue mentioned by Tejun:
"
So, let's say the file is write-accessible to !priv user which is
under reasonable resource limits.  Normally this shouldn't affect priv
system tools which are monitoring the same event as it shouldn't be
able to deplete resources as long as the resource control mechanisms
are configured and functioning properly; however, the memory usage
event puts all event listeners into a single contiguous table which a
!priv user can easily expand to a size where the table can no longer
be enlarged and if a priv system tool or another user tries to
register event afterwards, it'll fail.  IOW, it creates a shared
resource which isn't properly provisioned and can be trivially filled
up making it an easy DoS target.
"

Let's be stricter and cap the number of events that can be
registered. The MAX_THRESHOLD_EVENTS value is more or less arbitrary. The
expectation is that it is high enough to cover reasonable use cases
while not so high that it allows excessive resource consumption.
1024 events consume something like 16KB, which shouldn't be a big deal
and should be good enough.

Reported-by: Tejun Heo 
Signed-off-by: Michal Hocko 
---
 mm/memcontrol.c |8 
 1 file changed, 8 insertions(+)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index e4330cd..8247db3 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5401,6 +5401,9 @@ static void mem_cgroup_oom_notify(struct mem_cgroup *memcg)
mem_cgroup_oom_notify_cb(iter);
 }
 
+/* Maximum number of threshold events registered per memcg. */
+#define MAX_THRESHOLD_EVENTS   1024
+
 static int mem_cgroup_usage_register_event(struct cgroup *cgrp,
struct cftype *cft, struct eventfd_ctx *eventfd, const char *args)
 {
@@ -5424,6 +5427,11 @@ static int mem_cgroup_usage_register_event(struct cgroup *cgrp,
else
BUG();
 
+   if (thresholds->primary->size == MAX_THRESHOLD_EVENTS) {
+   ret = -ENOSPC;
+   goto unlock;
+   }
+
usage = mem_cgroup_usage(memcg, type == _MEMSWAP);
 
/* Check if a threshold crossed before adding a new one */
-- 
1.7.10.4

-- 
Michal Hocko
SUSE Labs
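
The size estimate and the cap from the patch above can be checked with a small userspace model. This is a sketch only: struct threshold_entry_model assumes one pointer plus a 64-bit threshold per registered event, and register_event_model() is a hypothetical stand-in for the registration path, reduced to the size check.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_THRESHOLD_EVENTS 1024

/* Assumed per-event layout: an eventfd pointer and a 64-bit threshold. */
struct threshold_entry_model {
	void *eventfd;
	uint64_t threshold;
};

/* Hypothetical table state: only the number of registered events. */
struct threshold_table_model {
	size_t size;
};

/* Bytes needed for a contiguous table holding nr_events entries. */
static size_t table_bytes(size_t nr_events)
{
	return nr_events * sizeof(struct threshold_entry_model);
}

/* Register one more event, refusing once the cap has been reached. */
static int register_event_model(struct threshold_table_model *t)
{
	assert(t != NULL);

	if (t->size == MAX_THRESHOLD_EVENTS)
		return -ENOSPC;

	t->size++;
	return 0;
}
```

With 8-byte pointers each entry takes 16 bytes, so table_bytes(MAX_THRESHOLD_EVENTS) comes to 16384 bytes, which matches the "something like 16KB" figure in the commit message.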


Re: [PATCH] i2c-designware: Manually set RESTART bit between messages

2013-08-07 Thread Wolfram Sang
On Fri, Jun 21, 2013 at 03:05:28PM +0800, Chew Chiau Ee wrote:
> From: Chew, Chiau Ee 
> 
> If both IC_EMPTYFIFO_HOLD_MASTER_EN and IC_RESTART_EN are set to 1, the
> Designware I2C controller doesn't generate RESTART unless user specifically
> requests it by setting RESTART bit in IC_DATA_CMD register.
> 
> Since IC_EMPTYFIFO_HOLD_MASTER_EN setting can't be detected from hardware
> register, we must always manually set the restart bit between messages.
> 
> Signed-off-by: Chew, Chiau Ee 

Applied to for-next, thanks!





Re: [patch v2 2/3] mm: page_alloc: rearrange watermark checking in get_page_from_freelist

2013-08-07 Thread Mel Gorman
On Fri, Aug 02, 2013 at 11:37:25AM -0400, Johannes Weiner wrote:
> Allocations that do not have to respect the watermarks are rare
> high-priority events.  Reorder the code such that per-zone dirty
> limits and future checks important only to regular page allocations
> are ignored in these extraordinary situations.
> 
> Signed-off-by: Johannes Weiner 
> Reviewed-by: Rik van Riel 

Acked-by: Mel Gorman 

-- 
Mel Gorman
SUSE Labs


Re: [patch v2 1/3] mm: vmscan: fix numa reclaim balance problem in kswapd

2013-08-07 Thread Mel Gorman
On Fri, Aug 02, 2013 at 11:37:24AM -0400, Johannes Weiner wrote:
> When the page allocator fails to get a page from all zones in its
> given zonelist, it wakes up the per-node kswapds for all zones that
> are at their low watermark.
> 
> However, with a system under load the free pages in a zone can
> fluctuate enough that the allocation fails but the kswapd wakeup is
> also skipped while the zone is still really close to the low
> watermark.
> 
> When one node misses a wakeup like this, it won't be aged before all
> the other node's zones are down to their low watermarks again.  And
> skipping a full aging cycle is an obvious fairness problem.
> 
> Kswapd runs until the high watermarks are restored, so it should also
> be woken when the high watermarks are not met.  This ages nodes more
> equally and creates a safety margin for the page counter fluctuation.
> 
> By using zone_balanced(), it will now check, in addition to the
> watermark, if compaction requires more order-0 pages to create a
> higher order page.
> 
> Signed-off-by: Johannes Weiner 
> Reviewed-by: Rik van Riel 

Acked-by: Mel Gorman 

-- 
Mel Gorman
SUSE Labs
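
The wakeup condition discussed in the quoted changelog can be illustrated with a deliberately simplified model. Everything here is an assumption made for illustration: struct zone_model and both predicates are hypothetical, and the real check also considers allocation order and compaction, which this sketch ignores.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical zone state: free pages and the two watermarks. */
struct zone_model {
	unsigned long free;
	unsigned long low_wmark;
	unsigned long high_wmark;
};

/* Old behaviour (simplified): wake kswapd only at or below the low watermark. */
static bool wake_at_low(const struct zone_model *z)
{
	assert(z->low_wmark <= z->high_wmark);
	return z->free <= z->low_wmark;
}

/* New behaviour (simplified): wake kswapd whenever the high watermark is not met. */
static bool wake_below_high(const struct zone_model *z)
{
	assert(z->low_wmark <= z->high_wmark);
	return z->free < z->high_wmark;
}
```

A zone whose free page count sits just above the low watermark is skipped by the first predicate but woken by the second, which is the missed-wakeup case the changelog describes.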


Re: [PATCH RFC v3 2/5] dma: mpc512x: add support for peripheral transfers

2013-08-07 Thread Alexander Popov
2013/8/3 Gerhard Sittig :
> On Wed, Jul 31, 2013 at 11:21 +0400, Alexander Popov wrote:
>>
> You don't provide a lot of information to those you want to
> receive feedback from.  You should keep a history and list the
> changes between versions.  And you may want to somehow link this
> v3 to its predecessor -- especially when you only send part of
> the series and assume that reviewers may know where to find the
> remainder.
>
> Please help those persons you want to get help from.

Thanks. Now I see how to collaborate via mailing lists properly.

> I think it's unfortunate to attribute the "will access
> peripheral" to the channel instead of the transfer job, and to
> set the flag from within the device control callback, and to
> never clear the flag (what will happen if a channel gets freed
> and reallocated by some other client?).
>
> I think that the peripheral access is an attribute of the
> transfer job, and should be setup in the prep routines (both set
> and cleared, depending on what gets setup).  This would be more
> robust and more readable (read: maintainable) in my eyes.

Yes, I agree. I will implement it and describe the differences from RFC v2
in the initial topic.

Best regards,
Alexander.


[PATCH] ARM: fix wrong address when loading PRM_FRAC_INCREMENTOR_DENUMERATOR_RELOAD

2013-08-07 Thread Chen Baozi
The denominator should be loaded from INCREMENTER_DENUMERATOR_RELOAD_OFFSET
rather than INCREMENTER_NUMERATOR_OFFSET.

This is most likely a typo, since INCREMENTER_DENUMERATOR_RELOAD[23:17] is
reserved. The bug does not seem to cause much trouble without this fix, because
the useful [11:0] bits are masked and then set to the right value. Still, reading
from the right address is the better choice.

Signed-off-by: Chen Baozi 
---
 arch/arm/mach-omap2/timer.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm/mach-omap2/timer.c b/arch/arm/mach-omap2/timer.c
index 1e77f11..ccc5c72 100644
--- a/arch/arm/mach-omap2/timer.c
+++ b/arch/arm/mach-omap2/timer.c
@@ -537,7 +537,7 @@ static void __init realtime_counter_init(void)
reg |= num;
__raw_writel(reg, base + INCREMENTER_NUMERATOR_OFFSET);
 
-   reg = __raw_readl(base + INCREMENTER_NUMERATOR_OFFSET) &
+   reg = __raw_readl(base + INCREMENTER_DENUMERATOR_RELOAD_OFFSET) &
NUMERATOR_DENUMERATOR_MASK;
reg |= den;
__raw_writel(reg, base + INCREMENTER_DENUMERATOR_RELOAD_OFFSET);
-- 
1.8.1.4
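
The read-modify-write pattern fixed by the patch above can be shown with a two-register userspace model. The register indices and the mask value below are assumptions chosen for illustration (the patch does not show them), and update_den_buggy() and update_den_fixed() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout for illustration: index 0 is the numerator register,
 * index 1 is the denominator/reload register. */
#define NUM_REG		0
#define DEN_REG		1
/* Assumed mask: keep the upper bits, clear the [11:0] field. */
#define FIELD_MASK	0xfffff000u

/* Buggy variant: the preserved upper bits come from the wrong register. */
static uint32_t update_den_buggy(uint32_t regs[2], uint32_t den)
{
	uint32_t reg;

	assert((den & FIELD_MASK) == 0);	/* den must fit in [11:0] */
	reg = regs[NUM_REG] & FIELD_MASK;
	reg |= den;
	regs[DEN_REG] = reg;
	return reg;
}

/* Fixed variant: read, mask and write back the same register. */
static uint32_t update_den_fixed(uint32_t regs[2], uint32_t den)
{
	uint32_t reg;

	assert((den & FIELD_MASK) == 0);	/* den must fit in [11:0] */
	reg = regs[DEN_REG] & FIELD_MASK;
	reg |= den;
	regs[DEN_REG] = reg;
	return reg;
}
```

Both variants leave the same value in bits [11:0], which is why the bug is mostly harmless, but the buggy one overwrites the upper bits of the denominator register with those of the numerator register.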



Re: [PATCH RFC v3 1/5] dma: mpc512x: reorder mpc8308 specific instructions

2013-08-07 Thread Alexander Popov
2013/8/3 Gerhard Sittig :
> On Wed, Jul 31, 2013 at 11:20 +0400, Alexander Popov wrote:
>>
> Please make sure to either cite
> properly or to properly mark changes as such.  Don't spread false
> information, please.  You are free to change what I submitted,
> but you should not pretend that I wrote what has become of the
> code after you have modified it.  Please fix the attribution.

Excuse me for the confusion.
I'll be careful with "From:" notes.

> Just to clarify:  The defines here appear to be more appropriate
> than the initial enums, after it turned out that we need not
> handle individual channels in special ways, and really only need
> these three numbers (one of them being the maximum of the
> others).  But regardless of what you have changed, you should
> clearly state the fact.

Ok, I'll do so.

Best regards,
Alexander.


Re: [PATCH] KVM: MMU: fix check the reserved bits on the gpte of L2

2013-08-07 Thread Paolo Bonzini

On 08/05/2013 06:59 AM, Xiao Guangrong wrote:

The current code always uses arch.mmu to check the reserved bits on the guest
gpte, which is valid only for the L1 guest; we should use arch.nested_mmu instead
when we translate gva to gpa for the L2 guest.

Fix it by using @mmu instead, since it adapts to the current mmu mode
automatically.

The bug can be triggered when nested npt is used and the L1 guest and L2 guest
use different mmu modes.

Reported-by: Jan Kiszka 
Signed-off-by: Xiao Guangrong 
---
  arch/x86/kvm/paging_tmpl.h | 3 +--
  1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 7769699..3a75828 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -218,8 +218,7 @@ retry_walk:
if (unlikely(!is_present_gpte(pte)))
goto error;

-   if (unlikely(is_rsvd_bits_set(&vcpu->arch.mmu, pte,
- walker->level))) {
+   if (unlikely(is_rsvd_bits_set(mmu, pte, walker->level))) {
errcode |= PFERR_RSVD_MASK | PFERR_PRESENT_MASK;
goto error;
}



Applied, thanks.

Paolo


Re: [PATCH v3] xfs: introduce object readahead to log recovery

2013-08-07 Thread Zhi Yong Wu
Hi, xfs maintainers,

any comments?

On Wed, Jul 31, 2013 at 4:42 PM,   wrote:
> From: Zhi Yong Wu 
>
>   It can take a long time to run log recovery operation because it is
> single threaded and is bound by read latency. We can find that it took
> most of the time to wait for the read IO to occur, so if one object
> readahead is introduced to log recovery, it will obviously reduce the
> log recovery time.
>
> Log recovery time stat:
>
>            w/o this patch    w/ this patch
>
> real:      0m15.023s         0m7.802s
> user:      0m0.001s          0m0.001s
> sys:       0m0.246s          0m0.107s
>
> Signed-off-by: Zhi Yong Wu 
> ---
>  fs/xfs/xfs_log_recover.c | 159 
> +--
>  1 file changed, 153 insertions(+), 6 deletions(-)
>
> diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
> index 7681b19..ebb00bc 100644
> --- a/fs/xfs/xfs_log_recover.c
> +++ b/fs/xfs/xfs_log_recover.c
> @@ -3116,6 +3116,106 @@ xlog_recover_free_trans(
> kmem_free(trans);
>  }
>
> +STATIC void
> +xlog_recover_buffer_ra_pass2(
> +   struct xlog *log,
> +   struct xlog_recover_item*item)
> +{
> +   struct xfs_buf_log_format   *buf_f = item->ri_buf[0].i_addr;
> +   struct xfs_mount*mp = log->l_mp;
> +
> +   if (xlog_check_buffer_cancelled(log, buf_f->blf_blkno,
> +   buf_f->blf_len, buf_f->blf_flags)) {
> +   return;
> +   }
> +
> +   xfs_buf_readahead(mp->m_ddev_targp, buf_f->blf_blkno,
> +   buf_f->blf_len, NULL);
> +}
> +
> +STATIC void
> +xlog_recover_inode_ra_pass2(
> +   struct xlog *log,
> +   struct xlog_recover_item*item)
> +{
> +   struct xfs_inode_log_format ilf_buf;
> +   struct xfs_inode_log_format *ilfp;
> +   struct xfs_mount*mp = log->l_mp;
> +   int error;
> +
> +   if (item->ri_buf[0].i_len == sizeof(struct xfs_inode_log_format)) {
> +   ilfp = item->ri_buf[0].i_addr;
> +   } else {
> +   ilfp = &ilf_buf;
> +   memset(ilfp, 0, sizeof(*ilfp));
> +   error = xfs_inode_item_format_convert(&item->ri_buf[0], ilfp);
> +   if (error)
> +   return;
> +   }
> +
> +   if (xlog_check_buffer_cancelled(log, ilfp->ilf_blkno, ilfp->ilf_len, 0))
> +   return;
> +
> +   xfs_buf_readahead(mp->m_ddev_targp, ilfp->ilf_blkno,
> +   ilfp->ilf_len, &xfs_inode_buf_ops);
> +}
> +
> +STATIC void
> +xlog_recover_dquot_ra_pass2(
> +   struct xlog *log,
> +   struct xlog_recover_item*item)
> +{
> +   struct xfs_mount*mp = log->l_mp;
> +   struct xfs_disk_dquot   *recddq;
> +   struct xfs_dq_logformat *dq_f;
> +   uint type;
> +
> +
> +   if (mp->m_qflags == 0)
> +   return;
> +
> +   recddq = item->ri_buf[1].i_addr;
> +   if (recddq == NULL)
> +   return;
> +   if (item->ri_buf[1].i_len < sizeof(struct xfs_disk_dquot))
> +   return;
> +
> +   type = recddq->d_flags & (XFS_DQ_USER | XFS_DQ_PROJ | XFS_DQ_GROUP);
> +   ASSERT(type);
> +   if (log->l_quotaoffs_flag & type)
> +   return;
> +
> +   dq_f = item->ri_buf[0].i_addr;
> +   ASSERT(dq_f);
> +   ASSERT(dq_f->qlf_len == 1);
> +
> +   xfs_buf_readahead(mp->m_ddev_targp, dq_f->qlf_blkno,
> +   dq_f->qlf_len, NULL);
> +}
> +
> +STATIC void
> +xlog_recover_ra_pass2(
> +   struct xlog *log,
> +   struct xlog_recover_item*item)
> +{
> +   switch (ITEM_TYPE(item)) {
> +   case XFS_LI_BUF:
> +   xlog_recover_buffer_ra_pass2(log, item);
> +   break;
> +   case XFS_LI_INODE:
> +   xlog_recover_inode_ra_pass2(log, item);
> +   break;
> +   case XFS_LI_DQUOT:
> +   xlog_recover_dquot_ra_pass2(log, item);
> +   break;
> +   case XFS_LI_EFI:
> +   case XFS_LI_EFD:
> +   case XFS_LI_QUOTAOFF:
> +   default:
> +   break;
> +   }
> +}
> +
>  STATIC int
>  xlog_recover_commit_pass1(
> struct xlog *log,
> @@ -3177,6 +3277,26 @@ xlog_recover_commit_pass2(
> }
>  }
>
> +STATIC int
> +xlog_recover_items_pass2(
> +   struct xlog *log,
> +   struct xlog_recover *trans,
> +   struct list_head*buffer_list,
> +   struct list_head*item_list)
> +{
> +   struct xlog_recover_item*item;
> +   int error = 0;
> +
> +   list_for_each_entry(item, item_list, ri_list) {
> +   error = xlog_recover_commit_pass2(log, trans,
> + 

Re: [PATCH 2/3] memcg: Limit the number of events registered on oom_control

2013-08-07 Thread Tejun Heo
On Wed, Aug 07, 2013 at 03:57:34PM +0200, Michal Hocko wrote:
> On Wed 07-08-13 09:47:41, Tejun Heo wrote:
> > Hello,
> > 
> > On Wed, Aug 07, 2013 at 03:37:46PM +0200, Michal Hocko wrote:
> > > > It isn't different from listening from epoll, for example.
> > > 
> > > epoll limits the number of watchers, no?
> > 
> > Not that I know of.  It'll be limited by max open fds but I don't
> > think there are other limits. 
> 
> max_user_watches seems to be a limit (4% of lowmem in maximum).

That's per *user* not per event source.  The problem here is creating
a global (across security domains) resource shared by all users.

-- 
tejun


Re: [PATCH 01/22] ARM: ux500: Remove '0x's from HREF v60+ DTS file

2013-08-07 Thread Lee Jones
On Wed, 07 Aug 2013, Linus Walleij wrote:

> On Mon, Jul 22, 2013 at 12:52 PM, Lee Jones  wrote:
> 
> > Signed-off-by: Lee Jones 
> 
> None of these patches apply since I applied your other patch series
> that rename all the files ... can you respin the ux500 "0x"-strip patches
> on top of the rename set? My ux500-devicetree branch can be used
> as a baseline.

I can do that. Although, would you prefer that I fixed-up my renaming
patches, then applied the 0x patches on top instead?

-- 
Lee Jones
Linaro ST-Ericsson Landing Team Lead
Linaro.org │ Open source software for ARM SoCs
Follow Linaro: Facebook | Twitter | Blog


Re: [PATCH 1/3] memcg: limit the number of thresholds per-memcg

2013-08-07 Thread Tejun Heo
Hello,

On Wed, Aug 07, 2013 at 03:46:54PM +0200, Michal Hocko wrote:
> OK, I have obviously misunderstood your concern mentioned in the other
> email. Could you be more specific what is the DoS scenario which was
> your concern, then?

So, let's say the file is write-accessible to !priv user which is
under reasonable resource limits.  Normally this shouldn't affect priv
system tools which are monitoring the same event as it shouldn't be
able to deplete resources as long as the resource control mechanisms
are configured and functioning properly; however, the memory usage
event puts all event listeners into a single contiguous table which a
!priv user can easily expand to a size where the table can no longer
be enlarged and if a priv system tool or another user tries to
register event afterwards, it'll fail.  IOW, it creates a shared
resource which isn't properly provisioned and can be trivially filled
up making it an easy DoS target.

Putting an extra limit on it isn't an actual solution but could be
better, I think.  It at least makes it clear that this is a limited
global resource.

Thanks.

-- 
tejun
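
The scenario described above can be sketched with a toy shared table. The
names (toy_event_table, toy_register) and the cap of 4 are hypothetical
stand-ins, not the memcg code. The sketch only shows the behaviour under
discussion: once one user has filled the shared table, every later
registration fails, whoever makes it.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical cap, kept tiny for the example. */
#define TOY_MAX_EVENTS 4

/* One table shared by every listener, regardless of who registers. */
struct toy_event_table {
	size_t nr;                     /* entries in use */
	int    owner[TOY_MAX_EVENTS];  /* uid that registered each entry */
};

/* Register one listener; fails with -ENOSPC once the shared table is full. */
static int toy_register(struct toy_event_table *t, int owner)
{
	assert(t != NULL);
	if (t->nr >= TOY_MAX_EVENTS)
		return -ENOSPC;
	t->owner[t->nr++] = owner;
	return 0;
}
```

An explicit cap does not remove the contention. It only makes the limit
visible and the failure well defined, which matches the "not an actual
solution but could be better" assessment above.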


Re: [PATCH 2/3] memcg: Limit the number of events registered on oom_control

2013-08-07 Thread Michal Hocko
On Wed 07-08-13 09:47:41, Tejun Heo wrote:
> Hello,
> 
> On Wed, Aug 07, 2013 at 03:37:46PM +0200, Michal Hocko wrote:
> > > It isn't different from listening from epoll, for example.
> > 
> > epoll limits the number of watchers, no?
> 
> Not that I know of.  It'll be limited by max open fds but I don't
> think there are other limits. 

max_user_watches seems to be a limit (4% of lowmem in maximum).

> Why would there be?

Because userspace shouldn't hog kernel resources without any limit.

> > > If there needs to be kernel memory limit, shouldn't that be handled by
> > > kmemcg?
> > 
> > kmemcg would surely help but turning it on just because of potential
> > abuse of the event registration API sounds like an overkill.
> > 
> > I think having a cap for user-triggerable kernel resources is a good thing
> > in general.
> 
> I don't know.  It's just very arbitrary because listening to events
> itself isn't (and shouldn't) be something which consumes resource
> which isn't attributed to the listener and this artificially creates a
> global resource.  The problem with memory usage event is breaching
> that rule with shared kmalloc() so putting well-defined limit on it is
> fine but the latter two create additional artificial restrictions
> which are both unnecessary and unconventional.  No?

Hmm, OK so you think that the fd limit is sufficient already?
-- 
Michal Hocko
SUSE Labs


Re: perf,arm -- oops in validate_event

2013-08-07 Thread Mark Rutland
On Wed, Aug 07, 2013 at 02:00:27PM +0100, Will Deacon wrote:
> On Tue, Aug 06, 2013 at 02:08:15PM +0100, Mark Rutland wrote:
> > On Tue, Aug 06, 2013 at 12:59:21PM +0100, Will Deacon wrote:
> > > But we already check `event->pmu != leader_pmu' in validate_event, so we
> > > shouldn't get anywhere nearer calling get_event_idx in the case you
> > > describe. It sounds more like we have an inconsistency with one of the
> > > events.
> > 
> > Note in my example that the software event was the group leader (so in
> > fact we'd *only* be checking those events which we can't actually
> > handle...).
> > 
> > I was also under the impression that in the case of mixed hardware and
> > software events, a hardware event must be the group leader. That
> > doesn't seem to be the case. If a hardware event is added to a software
> > group, the group is moved to hardware context but the original software
> > event stays as the group leader.
> 
> Ok, so the following quick hack below should solve the issue (can you confirm
> it please, since I don't have access to any hardware atm?)

It works for me when running Vince's test case.

Tested-by: Mark Rutland 

> 
> We should revisit this for 3.12 though, because I'm not sure that our
> validation code even does the right thing when there are multiple PMUs
> involved.

Certainly. I suspect we're not alone there.

Thanks,
Mark.

> 
> Will
> 
> --->8
> 
> diff --git a/arch/arm/kernel/perf_event.c b/arch/arm/kernel/perf_event.c
> index d9f5cd4..0500f10b 100644
> --- a/arch/arm/kernel/perf_event.c
> +++ b/arch/arm/kernel/perf_event.c
> @@ -253,6 +253,9 @@ validate_event(struct pmu_hw_events *hw_events,
> struct arm_pmu *armpmu = to_arm_pmu(event->pmu);
> struct pmu *leader_pmu = event->group_leader->pmu;
>  
> +   if (is_software_event(event))
> +   return 1;
> +
> if (event->pmu != leader_pmu || event->state < PERF_EVENT_STATE_OFF)
> return 1;
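
The control flow of the quick hack can be modelled in isolation. The types
and helpers below (toy_event, toy_pmu, toy_get_event_idx) are hypothetical
simplifications, not the ARM perf code. The sketch assumes only what the
thread states: a software event may lead a group that also contains hardware
events, and software events must never reach the hardware counter lookup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_STATE_OFF 0

struct toy_pmu { int id; };

struct toy_event {
	const struct toy_pmu *pmu;
	const struct toy_event *group_leader;
	bool is_software;
	int state;   /* values below TOY_STATE_OFF model error/exit states */
};

/* Counts how often the hardware-only path is reached. */
static int toy_idx_calls;

/* Stand-in for the hardware counter lookup; only valid for hardware events. */
static int toy_get_event_idx(const struct toy_event *event)
{
	(void)event;
	toy_idx_calls++;
	return 0;
}

/* Returns 1 if the event is acceptable or does not need a hardware counter. */
static int toy_validate_event(const struct toy_event *event)
{
	assert(event != NULL && event->group_leader != NULL);

	/* Software events never take a hardware counter: bail out first. */
	if (event->is_software)
		return 1;

	if (event->pmu != event->group_leader->pmu || event->state < TOY_STATE_OFF)
		return 1;

	return toy_get_event_idx(event) >= 0;
}
```

With a software group leader, neither the leader nor a hardware sibling
reaches the counter lookup. A purely hardware group still does.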


Re: [PATCH 05/18] ARM: integrator: Switch to sched_clock_register()

2013-08-07 Thread Linus Walleij
On Thu, Aug 1, 2013 at 12:31 AM, Stephen Boyd  wrote:

> The 32 bit sched_clock interface now supports 64 bits. Upgrade to
> the 64 bit function to allow us to remove the 32 bit registration
> interface.
>
> Cc: Linus Walleij 
> Signed-off-by: Stephen Boyd 

For this patch (given the idea is accepted)
Acked-by: Linus Walleij 

Yours,
Linus Walleij


RE: [PATCH] Tools: hv: use full nlmsghdr in netlink_send

2013-08-07 Thread KY Srinivasan


> -Original Message-
> From: Olaf Hering [mailto:o...@aepfle.de]
> Sent: Wednesday, August 07, 2013 9:45 AM
> To: KY Srinivasan; gre...@linuxfoundation.org
> Cc: linux-kernel@vger.kernel.org; Olaf Hering
> Subject: [PATCH] Tools: hv: use full nlmsghdr in netlink_send
> 
> There is no need to have a nlmsghdr pointer to another temporary buffer.
> Instead use a full struct nlmsghdr.
> 
> Signed-off-by: Olaf Hering 
Signed-off-by: K. Y. Srinivasan 
> ---
>  tools/hv/hv_kvp_daemon.c | 15 +--
>  tools/hv/hv_vss_daemon.c | 15 +--
>  2 files changed, 10 insertions(+), 20 deletions(-)
> 
> diff --git a/tools/hv/hv_kvp_daemon.c b/tools/hv/hv_kvp_daemon.c
> index 1bd1ad1..7c05f55 100644
> --- a/tools/hv/hv_kvp_daemon.c
> +++ b/tools/hv/hv_kvp_daemon.c
> @@ -1393,23 +1393,18 @@ kvp_get_domain_name(char *buffer, int length)
>  static int
>  netlink_send(int fd, struct cn_msg *msg)
>  {
> - struct nlmsghdr *nlh;
> + struct nlmsghdr nlh = { .nlmsg_type = NLMSG_DONE };
>   unsigned int size;
>   struct msghdr message;
> - char buffer[64];
>   struct iovec iov[2];
> 
>   size = sizeof(struct cn_msg) + msg->len;
> 
> - nlh = (struct nlmsghdr *)buffer;
> - nlh->nlmsg_seq = 0;
> - nlh->nlmsg_pid = getpid();
> - nlh->nlmsg_type = NLMSG_DONE;
> - nlh->nlmsg_len = NLMSG_LENGTH(size - sizeof(*nlh));
> - nlh->nlmsg_flags = 0;
> + nlh.nlmsg_pid = getpid();
> + nlh.nlmsg_len = NLMSG_LENGTH(size);
> 
> - iov[0].iov_base = nlh;
> - iov[0].iov_len = sizeof(*nlh);
> + iov[0].iov_base = &nlh;
> + iov[0].iov_len = sizeof(nlh);
> 
>   iov[1].iov_base = msg;
>   iov[1].iov_len = size;
> diff --git a/tools/hv/hv_vss_daemon.c b/tools/hv/hv_vss_daemon.c
> index 2f1f53f..8ac0ee7 100644
> --- a/tools/hv/hv_vss_daemon.c
> +++ b/tools/hv/hv_vss_daemon.c
> @@ -105,23 +105,18 @@ static int vss_operate(int operation)
> 
>  static int netlink_send(int fd, struct cn_msg *msg)
>  {
> - struct nlmsghdr *nlh;
> + struct nlmsghdr nlh = { .nlmsg_type = NLMSG_DONE };
>   unsigned int size;
>   struct msghdr message;
> - char buffer[64];
>   struct iovec iov[2];
> 
>   size = sizeof(struct cn_msg) + msg->len;
> 
> - nlh = (struct nlmsghdr *)buffer;
> - nlh->nlmsg_seq = 0;
> - nlh->nlmsg_pid = getpid();
> - nlh->nlmsg_type = NLMSG_DONE;
> - nlh->nlmsg_len = NLMSG_LENGTH(size - sizeof(*nlh));
> - nlh->nlmsg_flags = 0;
> + nlh.nlmsg_pid = getpid();
> + nlh.nlmsg_len = NLMSG_LENGTH(size);
> 
> - iov[0].iov_base = nlh;
> - iov[0].iov_len = sizeof(*nlh);
> + iov[0].iov_base = &nlh;
> + iov[0].iov_len = sizeof(nlh);
> 
>   iov[1].iov_base = msg;
>   iov[1].iov_len = size;





Re: [PATCH 01/22] ARM: ux500: Remove '0x's from HREF v60+ DTS file

2013-08-07 Thread Linus Walleij
On Mon, Jul 22, 2013 at 12:52 PM, Lee Jones  wrote:

> Signed-off-by: Lee Jones 

None of these patches apply since I applied your other patch series
that rename all the files ... can you respin the ux500 "0x"-strip patches
on top of the rename set? My ux500-devicetree branch can be used
as a baseline.

Yours,
Linus Walleij


Re: [PATCH v3 00/11] Add namespace support for syslog

2013-08-07 Thread Serge Hallyn
Quoting Eric W. Biederman (ebied...@xmission.com):
> 
> Since this still has not been addressed, I am going to repeat Andrew's
> objection again.
> 
> Isn't there a better way to get iptables information out than to use
> syslog?  I did not have time to follow up on that but it did appear that

Bruno suggested NFLOG target + ulogd.  That's not ideal, but doable.  At
least each container should be able to do that for itself.  What it
won't do is let a host admin make sure that he doesn't get corrupted
syslog entries when partial-lines get sent from several containers and
the kernel and randomly spliced together.  It also would simply be
better if the information was *always* sent to userspace instead of
syslog.

> someone did have a better way to get the information out.
> 
> Essentially the argument against this goes.  The kernel logging facility
> is really not a particularly good tool to be using for anything other
> than kernel debugging information, and there appear to be no substantial
> uses for a separate syslog that should not be done in other ways.
> 
> That design objection must be addressed before merging this code can be
> given serious consideration.
> 
> Eric
> ___
> Containers mailing list
> contain...@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/containers


Re: [PATCH 13/13] ARM: ux500: Remove u9540.dts as it's been replaced

2013-08-07 Thread Linus Walleij
On Fri, Jul 19, 2013 at 4:13 PM, Lee Jones  wrote:

> This must have been a merge error. There was a patch which renamed the
> u9540.dts to ccu9540.dts, however the u9540.dts was reincarnated by
> the same patches which created it in the first place. Let's kill it
> once and for all.
>
> Signed-off-by: Lee Jones 

I applied all the rename patches but it appears they were never
really tested, so I made this patch fixing all the bugs they introduced.
(Quicker than iterating the patch set.)

Yours,
Linus Walleij


Re: [PATCH 2/3] memcg: Limit the number of events registered on oom_control

2013-08-07 Thread Tejun Heo
Hello,

On Wed, Aug 07, 2013 at 03:37:46PM +0200, Michal Hocko wrote:
> > It isn't different from listening from epoll, for example.
> 
> epoll limits the number of watchers, no?

Not that I know of.  It'll be limited by max open fds but I don't
think there are other limits.  Why would there be?

> > If there needs to be kernel memory limit, shouldn't that be handled by
> > kmemcg?
> 
> kmemcg would surely help but turning it on just because of potential
> abuse of the event registration API sounds like an overkill.
> 
> I think having a cap for user-triggerable kernel resources is a good thing
> in general.

I don't know.  It's just very arbitrary because listening to events
itself isn't (and shouldn't) be something which consumes resource
which isn't attributed to the listener and this artificially creates a
global resource.  The problem with memory usage event is breaching
that rule with shared kmalloc() so putting well-defined limit on it is
fine but the latter two create additional artificial restrictions
which are both unnecessary and unconventional.  No?

Thanks.

-- 
tejun


Re: [PATCH 1/3] memcg: limit the number of thresholds per-memcg

2013-08-07 Thread Michal Hocko
On Wed 07-08-13 09:22:10, Tejun Heo wrote:
> Hello,
> 
> On Wed, Aug 07, 2013 at 01:28:25PM +0200, Michal Hocko wrote:
> > There is no limit for the maximum number of threshold events registered
> > per memcg. This might lead to a user-triggered memory depletion if a
> > regular user is allowed to register on memory.[memsw.]usage_in_bytes
> > eventfd interface.
> > 
> > Let's be more strict and cap the number of events that might be
> > registered. MAX_THRESHOLD_EVENTS value is more or less random. The
> > expectation is that it should be high enough to cover reasonable
> > usecases while not too high to allow excessive resources consumption.
> > 1024 events consume something like 16KB which shouldn't be a big deal
> > and it should be good enough.
> 
> I don't think the memory consumption per-se is the issue to be handled
> here (as kernel memory consumption is a different generic problem) but
> rather that all listeners, regardless of their priv level, cgroup
> membership and so on, end up contributing to this single shared
> contiguous table,

The table is per-memcg but you are right that everybody who has file
write access to the particular group's usage file can register to it.

> which makes it quite easy to do DoS attack on it if
> the event control is actually delegated to untrusted security domain,

OK, I have obviously misunderstood your concern mentioned in the other
email. Could you be more specific what is the DoS scenario which was
your concern, then?

[...]
> Can you please update the patch description to reflect the actual
> problem?

As soon as I understand what is your concern ;)
-- 
Michal Hocko
SUSE Labs


[PATCH] Tools: hv: use full nlmsghdr in netlink_send

2013-08-07 Thread Olaf Hering
There is no need to have a nlmsghdr pointer to another temporary buffer.
Instead use a full struct nlmsghdr.

Signed-off-by: Olaf Hering 
---
 tools/hv/hv_kvp_daemon.c | 15 +--
 tools/hv/hv_vss_daemon.c | 15 +--
 2 files changed, 10 insertions(+), 20 deletions(-)

diff --git a/tools/hv/hv_kvp_daemon.c b/tools/hv/hv_kvp_daemon.c
index 1bd1ad1..7c05f55 100644
--- a/tools/hv/hv_kvp_daemon.c
+++ b/tools/hv/hv_kvp_daemon.c
@@ -1393,23 +1393,18 @@ kvp_get_domain_name(char *buffer, int length)
 static int
 netlink_send(int fd, struct cn_msg *msg)
 {
-   struct nlmsghdr *nlh;
+   struct nlmsghdr nlh = { .nlmsg_type = NLMSG_DONE };
unsigned int size;
struct msghdr message;
-   char buffer[64];
struct iovec iov[2];
 
size = sizeof(struct cn_msg) + msg->len;
 
-   nlh = (struct nlmsghdr *)buffer;
-   nlh->nlmsg_seq = 0;
-   nlh->nlmsg_pid = getpid();
-   nlh->nlmsg_type = NLMSG_DONE;
-   nlh->nlmsg_len = NLMSG_LENGTH(size - sizeof(*nlh));
-   nlh->nlmsg_flags = 0;
+   nlh.nlmsg_pid = getpid();
+   nlh.nlmsg_len = NLMSG_LENGTH(size);
 
-   iov[0].iov_base = nlh;
-   iov[0].iov_len = sizeof(*nlh);
+   iov[0].iov_base = &nlh;
+   iov[0].iov_len = sizeof(nlh);
 
iov[1].iov_base = msg;
iov[1].iov_len = size;
diff --git a/tools/hv/hv_vss_daemon.c b/tools/hv/hv_vss_daemon.c
index 2f1f53f..8ac0ee7 100644
--- a/tools/hv/hv_vss_daemon.c
+++ b/tools/hv/hv_vss_daemon.c
@@ -105,23 +105,18 @@ static int vss_operate(int operation)
 
 static int netlink_send(int fd, struct cn_msg *msg)
 {
-   struct nlmsghdr *nlh;
+   struct nlmsghdr nlh = { .nlmsg_type = NLMSG_DONE };
unsigned int size;
struct msghdr message;
-   char buffer[64];
struct iovec iov[2];
 
size = sizeof(struct cn_msg) + msg->len;
 
-   nlh = (struct nlmsghdr *)buffer;
-   nlh->nlmsg_seq = 0;
-   nlh->nlmsg_pid = getpid();
-   nlh->nlmsg_type = NLMSG_DONE;
-   nlh->nlmsg_len = NLMSG_LENGTH(size - sizeof(*nlh));
-   nlh->nlmsg_flags = 0;
+   nlh.nlmsg_pid = getpid();
+   nlh.nlmsg_len = NLMSG_LENGTH(size);
 
-   iov[0].iov_base = nlh;
-   iov[0].iov_len = sizeof(*nlh);
+   iov[0].iov_base = &nlh;
+   iov[0].iov_len = sizeof(nlh);
 
iov[1].iov_base = msg;
iov[1].iov_len = size;
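
The header setup after this change can be sketched as a standalone helper. It
is Linux-only, and the helper name toy_prepare_iov and its payload arguments
are hypothetical stand-ins for the cn_msg and its computed size. The nlmsghdr
fields and the NLMSG_LENGTH() and NLMSG_DONE macros come from
<linux/netlink.h>.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>
#include <linux/netlink.h>

/*
 * Fill a caller-provided netlink header and a two-element iovec:
 * header first, payload second.
 */
static void toy_prepare_iov(struct nlmsghdr *nlh, struct iovec iov[2],
			    void *payload, unsigned int payload_len)
{
	assert(nlh != NULL && iov != NULL);

	/* Zero the whole header, then set only the non-zero fields. */
	memset(nlh, 0, sizeof(*nlh));
	nlh->nlmsg_type = NLMSG_DONE;
	nlh->nlmsg_pid = getpid();
	/* NLMSG_LENGTH() adds the aligned header size to the payload size. */
	nlh->nlmsg_len = NLMSG_LENGTH(payload_len);

	iov[0].iov_base = nlh;
	iov[0].iov_len = sizeof(*nlh);
	iov[1].iov_base = payload;
	iov[1].iov_len = payload_len;
}
```

With a real header struct there is no scratch buffer to size or cast, and
nlmsg_len equals the sum of the two iovec lengths.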


[PATCH 2/2] cris: fix return type of ffs()

2013-08-07 Thread Akinobu Mita
The return type of ffs() is 'int' on all architectures except cris and
hexagon.  This unifies the return type to 'int'.

The problem I'm seeing is that the following line generates a warning
on cris and hexagon because of the mismatch between format '%u' and
return type of ffs().

printk("bits in OOB size: %u\n",ffs(ns->geom.oobsz) - 1);

But removing this warning by casting to 'int' looks odd, so I suggest
unifying the return type of ffs() on all architectures.

Signed-off-by: Akinobu Mita 
Reported-by: Fengguang Wu 
Cc: Mikael Starvik 
Cc: Jesper Nilsson 
Cc: linux-cris-ker...@axis.com
Cc: Richard Kuo 
Cc: linux-hexa...@vger.kernel.org
Cc: linux-a...@vger.kernel.org
---
 arch/cris/include/arch-v10/arch/bitops.h | 2 +-
 arch/cris/include/arch-v32/arch/bitops.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/cris/include/arch-v10/arch/bitops.h b/arch/cris/include/arch-v10/arch/bitops.h
index 03d9cfd..cc37a22 100644
--- a/arch/cris/include/arch-v10/arch/bitops.h
+++ b/arch/cris/include/arch-v10/arch/bitops.h
@@ -65,7 +65,7 @@ static inline unsigned long __ffs(unsigned long word)
  * differs in spirit from the above ffz (man ffs).
  */
 
-static inline unsigned long kernel_ffs(unsigned long w)
+static inline int kernel_ffs(unsigned long w)
 {
return w ? cris_swapwbrlz (w) + 1 : 0;
 }
diff --git a/arch/cris/include/arch-v32/arch/bitops.h b/arch/cris/include/arch-v32/arch/bitops.h
index 147689d6..a5d0963 100644
--- a/arch/cris/include/arch-v32/arch/bitops.h
+++ b/arch/cris/include/arch-v32/arch/bitops.h
@@ -55,7 +55,7 @@ __ffs(unsigned long w)
 /*
  * Find First Bit that is set.
  */
-static inline unsigned long
+static inline int
 kernel_ffs(unsigned long w)
 {
return w ? cris_swapwbrlz (w) + 1 : 0;
-- 
1.8.3.1
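
For comparison, the userspace ffs() declared in <strings.h> already returns
int, so an int-sized conversion matches its result without a cast. The sketch
below uses that libc function rather than the kernel one. The helper names
are hypothetical, and %d is used here instead of the %u in the quoted printk
because the helper can return -1.

```c
#include <assert.h>
#include <stdio.h>
#include <strings.h>   /* ffs() */

/* Index of the lowest set bit, or -1 when no bit is set. */
static int toy_lowest_bit(int value)
{
	/* ffs() returns an int: the 1-based bit position, or 0 if none is set. */
	return ffs(value) - 1;
}

/* Format the result; returns the number of characters snprintf would write. */
static int toy_format_bits(char *buf, size_t len, int oobsz)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "bits in OOB size: %d\n", toy_lowest_bit(oobsz));
}
```

If ffs() returned long or unsigned long instead, the same format string would
trigger a -Wformat warning, which is the situation reported for cris and
hexagon.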



[PATCH 1/2] hexagon: fix return type of ffs()

2013-08-07 Thread Akinobu Mita
The return type of ffs() is 'int' on all architectures except cris and
hexagon.  This unifies the return type to 'int'.

The problem I'm seeing is that the following line generates a warning
on cris and hexagon because of the mismatch between format '%u' and
return type of ffs().

printk("bits in OOB size: %u\n",ffs(ns->geom.oobsz) - 1);

But removing this warning by casting to 'int' looks odd, so I suggest
unifying the return type of ffs() on all architectures.

Signed-off-by: Akinobu Mita 
Reported-by: Fengguang Wu 
Cc: Mikael Starvik 
Cc: Jesper Nilsson 
Cc: linux-cris-ker...@axis.com
Cc: Richard Kuo 
Cc: linux-hexa...@vger.kernel.org
Cc: linux-a...@vger.kernel.org
---
This patch is not compile tested yet, because I couldn't find cross
compiler for hexagon.

 arch/hexagon/include/asm/bitops.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/hexagon/include/asm/bitops.h b/arch/hexagon/include/asm/bitops.h
index 9b1e4af..80e34a6 100644
--- a/arch/hexagon/include/asm/bitops.h
+++ b/arch/hexagon/include/asm/bitops.h
@@ -234,7 +234,7 @@ static inline long fls(int x)
  * the libc and compiler builtin ffs routines, therefore
  * differs in spirit from the above ffz (man ffs).
  */
-static inline long ffs(int x)
+static inline int ffs(int x)
 {
int r;
 
-- 
1.8.3.1



Re: [PATCH 01/17] perf util: Save pid-cmdline mapping into tracing header

2013-08-07 Thread David Ahern

On 8/5/13 3:17 AM, Namhyung Kim wrote:
>> I don't think this is a problem, it's in line with Ingo's suggestion of a
>> new perf ioctl to ask the kernel to generate PERF_RECORD_MMAP events for
>> existing threads.
>
> Hmm.. could you please give me a link of the thread?

I believe this is the thread being referred to:
https://lkml.org/lkml/2013/6/25/180

David


Re: [RFC 0/3] Add madvise(..., MADV_WILLWRITE)

2013-08-07 Thread Jan Kara
On Mon 05-08-13 12:43:58, Andy Lutomirski wrote:
> My application fallocates and mmaps (shared, writable) a lot (several
> GB) of data at startup.  Those mappings are mlocked, and they live on
> ext4.  The first write to any given page is slow because
> ext4_da_get_block_prep can block.  This means that, to get decent
> performance, I need to write something to all of these pages at
> startup.  This, in turn, causes a giant IO storm as several GB of
> zeros get pointlessly written to disk.
> 
> This series is an attempt to add madvise(..., MADV_WILLWRITE) to
> signal to the kernel that I will eventually write to the referenced
> pages.  It should cause any expensive operations that happen on the
> first write to happen immediately, but it should not result in
> dirtying the pages.
> 
> madvise(addr, len, MADV_WILLWRITE) returns the number of bytes that
> the operation succeeded on or a negative error code if there was an
> actual failure.  A return value of zero signifies that the kernel
> doesn't know how to "willwrite" the range and that userspace should
> implement a fallback.
> 
> For now, it only works on shared writable ext4 mappings.  Eventually
> it should support other filesystems as well as private pages (it
> should COW the pages but not cause swap IO) and anonymous pages (it
> should COW the zero page if applicable).
> 
> The implementation leaves much to be desired.  In particular, it
> generates dirty buffer heads on a clean page, and this scares me.
> 
> Thoughts?
  One question before I look at the patches: Why don't you use fallocate()
in your application? The functionality you require seems to be pretty
similar to it - writing to an already allocated block is usually quick.


Honza

> Andy Lutomirski (3):
>   mm: Add MADV_WILLWRITE to indicate that a range will be written to
>   fs: Add block_willwrite
>   ext4: Implement willwrite for the delalloc case
> 
>  fs/buffer.c| 57 
> ++
>  fs/ext4/ext4.h |  2 ++
>  fs/ext4/file.c |  1 +
>  fs/ext4/inode.c| 22 +
>  include/linux/buffer_head.h|  3 ++
>  include/linux/mm.h | 12 +++
>  include/uapi/asm-generic/mman-common.h |  3 ++
>  mm/madvise.c   | 28 +++--
>  8 files changed, 126 insertions(+), 2 deletions(-)
> 
> -- 
> 1.8.3.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
-- 
Jan Kara 
SUSE Labs, CR
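
The startup workaround described in the quoted cover letter (writing to every
page so that the first-write cost is paid early) can be sketched as below.
The function name toy_prefault is hypothetical, and the sketch works on any
buffer. In the application it would be the mmap()ed region, with the page
size taken from sysconf(_SC_PAGESIZE).

```c
#include <assert.h>
#include <stddef.h>

/*
 * Touch one byte per page so that the cost of the first write is paid up
 * front. Reading the byte and storing it back keeps the contents intact,
 * but on a file mapping it still dirties every page, which is the source
 * of the IO storm mentioned in the cover letter.
 */
static size_t toy_prefault(volatile unsigned char *base, size_t len,
			   size_t page_size)
{
	size_t touched = 0;

	assert(base != NULL && page_size > 0);
	for (size_t off = 0; off < len; off += page_size) {
		base[off] = base[off];
		touched++;
	}
	return touched;
}
```

The proposed MADV_WILLWRITE is meant to get the same up-front work done
without this dirtying side effect.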


Re: [PATCH 2/3] memcg: Limit the number of events registered on oom_control

2013-08-07 Thread Michal Hocko
On Wed 07-08-13 09:08:36, Tejun Heo wrote:
> Hello, Michal.
> 
> On Wed, Aug 07, 2013 at 01:28:26PM +0200, Michal Hocko wrote:
> > There is no limit for the maximum number of oom_control events
> > registered per memcg. This might lead to a user-triggered memory
> > depletion if a regular user is allowed to register events.
> > 
> > Let's be more strict and cap the number of events that might be
> > registered. MAX_OOM_NOTIFY_EVENTS value is more or less random. The
> > expectation is that it should be high enough to cover reasonable
> > usecases while not too high to allow excessive resources consumption.
> > 1024 events consume something like 24KB which shouldn't be a big deal
> > and it should be good enough (even 1024 oom notification events sounds
> > crazy).
> 
> I think putting restriction on usage_event makes sense as that builds
> a shared contiguous table from all events which can't be attributed
> correctly and makes it easy to trigger allocation failures due to
> large order allocation but is this necessary for oom and vmpressure,
> both of which allocate only for the listening task?

Once I was there I made them consistent in that regard.

> It isn't different from listening from epoll, for example.

epoll limits the number of watchers, no?

> If there needs to be kernel memory limit, shouldn't that be handled by
> kmemcg?

kmemcg would surely help but turning it on just because of potential
abuse of the event registration API sounds like an overkill.

I think having a cap for user-triggerable kernel resources is a good thing
in general.
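
To make the idea of such a cap concrete, here is a minimal userspace sketch.
It is not the code from the patch: struct notify_list, notify_register(),
notify_release() and the -ENOSPC return value are all hypothetical, and the
value 1024 is only assumed from the figure quoted above.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Cap name from the thread; the value 1024 is an assumption. */
#define MAX_OOM_NOTIFY_EVENTS 1024

/* Hypothetical listener entry, not the kernel's data structure. */
struct notify_event {
	struct notify_event *next;
	int eventfd;
};

/* Hypothetical per-group list with a running count of registrations. */
struct notify_list {
	struct notify_event *head;
	unsigned int nr_events;
};

/* Register one listener; refuse once the cap has been reached. */
int notify_register(struct notify_list *list, int eventfd)
{
	struct notify_event *ev;

	assert(list != NULL);
	if (list->nr_events >= MAX_OOM_NOTIFY_EVENTS)
		return -ENOSPC;

	ev = malloc(sizeof(*ev));
	if (!ev)
		return -ENOMEM;

	ev->eventfd = eventfd;
	ev->next = list->head;
	list->head = ev;
	list->nr_events++;
	return 0;
}

/* Drop every listener and reset the counter. */
void notify_release(struct notify_list *list)
{
	assert(list != NULL);
	while (list->head) {
		struct notify_event *ev = list->head;

		list->head = ev->next;
		free(ev);
	}
	list->nr_events = 0;
}
```

The check happens before the allocation, so a caller that hits the cap
costs nothing beyond the counter comparison.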
-- 
Michal Hocko
SUSE Labs


Re: [PATCHSET cgroup/for-3.12] cgroup: make cgroup_event specific to memcg

2013-08-07 Thread Tejun Heo
Hello, Michal.

On Wed, Aug 07, 2013 at 03:26:13PM +0200, Michal Hocko wrote:
> I would rather see it not changed unless it really is a big win in the
> cgroup core. So far I do not see anything like that (just look at
> __cgroup_from_dentry which needs to be exported to allow for the move).

The end goal is cleaning up cftype so that it becomes a thin wrapper
around seq_file and I'd really like to keep the interface minimal so
that it's difficult to misunderstand.

> You reduce the amount of code in cgroup.c, alright, but the code
> doesn't go away really. It just moves out of your sight and moves the
> same burden on somebody else without providing a new generic interface.

If the implementation details are all that you're objecting to, I'll be
happy to restructure it.  I just didn't pay too much attention to it
because I considered it to be mostly deprecated.  I don't think it'll
be too much work and strongly think it'll be worth the effort.  Our
code base is extremely nasty as is and I'll try to get any ounce of
cleanup I can get.

> If somebody needs a notification interface (and there is no one available
> right now) then you cannot prevent such pointless work anyway...

I'm gonna add one for freezer state transitions.  It'll be a simple
"this file changed" thing and I'll probably apply that to at least oom
and vmpressure.  I'm relatively confident that it's gonna be pretty
simple and that's gonna be the cgroup event mechanism.

> cgroup_event_* don't sound memcg specific at all. They are playing with
> cgroup dentry reference counting and do a generic functionality which
> memcg doesn't need to know about.

Sure, I'll try to clean it up so that it doesn't meddle with cgroup
internals directly.

> I wouldn't object to a variant that doesn't play with cgroup internals. I just
> do not think it makes sense to invest time in something that should go
> away long term.

I suppose it's a priority thing.  To me, cleaning up cgroup core API is
quite important and I'd be happy to sink time and effort into it and
it's not like we can drop the event thing in a release cycle or two.
We'd have to carry it for years, so I think the effort is justified.

Thanks.

-- 
tejun


Re: [PATCH] ARM: fix wrong address when loading PRM_FRAC_INCREMENTOR_DENUMERATOR_RELOAD

2013-08-07 Thread Chen Baozi

On Aug 7, 2013, at 7:09 PM, Tony Lindgren  wrote:

> * Chen Baozi  [130805 08:33]:
>> ping?
>> 
>> On Aug 1, 2013, at 7:27 PM, Chen Baozi  wrote:
>> 
>>> The denominator should be loaded from INCREMENTER_DENUMERATOR_RELOAD_OFFSET
>>> rather than INCREMENTER_NUMERATOR_OFFSET.
> 
> Maybe describe what exactly happens without this fix?

I think it is more likely a typo, since
INCREMENTER_DENUMERATOR_RELOAD[23:17] is reserved. It seems
that leaving it unfixed won't cause much trouble, because
the useful [11:0] bits are masked and set to the right value later.
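
A small simulation of that read-modify-write may help show why the wrong
source register is mostly harmless. This is only a sketch: the two
"registers" are plain array slots, and KEEP_MASK, the enum names and
update_from() are hypothetical, with the mask value assumed to preserve
everything above bits [11:0].

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: bits above [11:0] are carried over from the value read. */
#define KEEP_MASK 0xfffff000u

/* Hypothetical indices into a simulated register file. */
enum { REG_NUMERATOR, REG_DENUMERATOR_RELOAD, REG_COUNT };

/*
 * Build the value to write to the denominator register: keep the upper
 * bits of whichever register 'src' names and replace bits [11:0] with den.
 */
uint32_t update_from(const uint32_t *regs, int src, uint32_t den)
{
	assert(src >= 0 && src < REG_COUNT);
	assert((den & KEEP_MASK) == 0);
	return (regs[src] & KEEP_MASK) | den;
}
```

Reading from either slot gives the same low 12 bits. Only the preserved
upper bits differ, and only when the two registers hold different upper
bits to begin with.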

Cheers,

Baozi

> 
> Also we should get a few acks for this for the -rc series.
> 
> Regards,
> 
> Tony
> 
>>> Signed-off-by: Chen Baozi 
>>> ---
>>> arch/arm/mach-omap2/timer.c | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>> 
>>> diff --git a/arch/arm/mach-omap2/timer.c b/arch/arm/mach-omap2/timer.c
>>> index b37e1fc..9265e03 100644
>>> --- a/arch/arm/mach-omap2/timer.c
>>> +++ b/arch/arm/mach-omap2/timer.c
>>> @@ -537,7 +537,7 @@ static void __init realtime_counter_init(void)
>>> reg |= num;
>>> __raw_writel(reg, base + INCREMENTER_NUMERATOR_OFFSET);
>>> 
>>> -   reg = __raw_readl(base + INCREMENTER_NUMERATOR_OFFSET) &
>>> +   reg = __raw_readl(base + INCREMENTER_DENUMERATOR_RELOAD_OFFSET) &
>>> NUMERATOR_DENUMERATOR_MASK;
>>> reg |= den;
>>> __raw_writel(reg, base + INCREMENTER_DENUMERATOR_RELOAD_OFFSET);
>>> -- 
>>> 1.8.1.4
>>> 
>> 



Re: List corruption in hidraw_release in 3.11-rc4

2013-08-07 Thread Jiri Kosina
On Wed, 7 Aug 2013, Peter Wu wrote:

> > does the patch below fix the problem you are seeing?
> > That one is already in 3.11-rc4 as far as I can see. Also, that code can
> probably be simplified by moving the mutex_unlock after the out label, removing
> the need to duplicate the mutex_unlock.
> 
> Remember what I said about "no Oopses"? Well, it turned out that several 
> memory structures were damaged which causes a general protection fault in 
> sock_alloc_inode and other places.
> 
> I managed to create a program that can reproduce this bug 100% in a QEMU 
> virtual machine with a Logitech USB receiver passed to it.
> 
> qemu-system-x86_64 -enable-kvm -m 1G -usb -usbdevice host:046d:c52b
> (pass -kernel, -initrd, -append as needed)
> 
> Copy hidraw-test to initrd, boot QEMU and run `hidraw-test`. Result: instant
> (= +/- 2 seconds) crash.
> 
> I have applied Manoj's patch[1] on top of 3.11-rc4 which seems to fix the
> issue. 
> One observation is that the new device is named /dev/hidraw1 instead of 
> /dev/hidraw0. Example:
> 
> f(){ hidraw-test /dev/hidraw$1 usb1;}
> # needed for 3.11-rc4
> f 1; f 1 # crash
> # needed for 3.11-rc4 + patch
> f 1; f 2 # ok
> 
> Regards,
> Peter
> 
>  [1]: http://lkml.org/lkml/2013/7/22/248

That one I am still reviewing ... can I add your Tested-by: to it when 
I'll be applying it and pushing to Linus?

Thanks.

> --
> /* cc hidraw-test.c -o hidraw-test
>  * hidraw-test /dev/hidraw0 usb1; hidraw-test /dev/hidraw0 usb1;
>  */
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <errno.h>
> #include <fcntl.h>
> #include <unistd.h>
> 
> int open_and_write(const char *path, const char *data) {
>   int sfd, r;
> 
>   sfd = open(path, O_WRONLY);
>   if (sfd < 0) {
>   perror(path);
>   return 1;
>   }
> 
>   r = write(sfd, data, strlen(data));
>   if (r < 0) {
>   fprintf(stderr, "write(%s, %s): %s\n",
>   path, data, strerror(errno));
>   return 1;
>   }
>   close(sfd);
>   return 0;
> }
> 
> int dork(const char *hiddev, const char *name) {
>   int fd;
>   char c;
> 
>   fd = open(hiddev, O_RDWR | O_NONBLOCK);
>   if (fd < 0) {
>   perror("open");
>   return 1;
>   }
> 
>   if (open_and_write("/sys/bus/usb/drivers/usb/unbind", name))
>   return 1;
> 
>   // does not make a difference
>   //sleep(1);
> 
>   if (open_and_write("/sys/bus/usb/drivers/usb/bind", name))
>   return 1;
> 
>   // allow devices to get discovered
>   sleep(1);
> 
>   printf("read() = %zi\n", read(fd, &c, 1)); perror("read");
>   close(fd);
>   return 0;
> }
> 
> int main(int argc, char **argv) {
>   if (argc < 3) {
>   fprintf(stderr, "Usage: %s /dev/hidrawN usbN\n", *argv);
>   return 1;
>   }
> 
>   system("modprobe -v usbhid");
>   system("modprobe -v hid-logitech-dj");
> 
>   dork(argv[1], argv[2]);
> 
>   return 0;
> }
> 

-- 
Jiri Kosina
SUSE Labs


Re: List corruption in hidraw_release in 3.11-rc4

2013-08-07 Thread Peter Wu
On Wednesday 07 August 2013 03:01:26 Jiri Kosina wrote:
> On Tue, 6 Aug 2013, Peter Wu wrote:
> > While debugging upowerd (with Logitech Unifying receiver via hidraw),
> > I came across this list corruption warning.
> 
> Peter,
> 
> does the patch below fix the problem you are seeing?
That one is already in 3.11-rc4 as far as I can see. Also, that code can
probably be simplified by moving the mutex_unlock after the out label, removing
the need to duplicate the mutex_unlock.
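
The single-unlock shape I have in mind looks roughly like this. It is a
userspace sketch with a pthread mutex, not the hidraw code: table_lock,
guarded_op() and its arguments are made up for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical lock standing in for the driver's table mutex. */
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical operation: every path leaves through the 'out' label. */
int guarded_op(int minor, int max_minor)
{
	int err = 0;

	assert(max_minor > 0);
	pthread_mutex_lock(&table_lock);

	if (minor < 0 || minor >= max_minor) {
		err = -ENODEV;
		goto out;	/* no unlock here, the label handles it */
	}

	/* ... work done under the lock would go here ... */

out:
	pthread_mutex_unlock(&table_lock);
	return err;
}
```

With the unlock placed after the label, error paths only set err and
jump, so a newly added early exit cannot forget to drop the lock.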

Remember what I said about "no Oopses"? Well, it turned out that several 
memory structures were damaged which causes a general protection fault in 
sock_alloc_inode and other places.

I managed to create a program that can reproduce this bug 100% in a QEMU 
virtual machine with a Logitech USB receiver passed to it.

qemu-system-x86_64 -enable-kvm -m 1G -usb -usbdevice host:046d:c52b
(pass -kernel, -initrd, -append as needed)

Copy hidraw-test to initrd, boot QEMU and run `hidraw-test`. Result: instant
(= +/- 2 seconds) crash.

I have applied Manoj's patch[1] on top of 3.11-rc4 which seems to fix the issue.
One observation is that the new device is named /dev/hidraw1 instead of 
/dev/hidraw0. Example:

f(){ hidraw-test /dev/hidraw$1 usb1;}
# needed for 3.11-rc4
f 1; f 1 # crash
# needed for 3.11-rc4 + patch
f 1; f 2 # ok

Regards,
Peter

 [1]: http://lkml.org/lkml/2013/7/22/248
--
/* cc hidraw-test.c -o hidraw-test
 * hidraw-test /dev/hidraw0 usb1; hidraw-test /dev/hidraw0 usb1;
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Write a string to a sysfs file; returns 0 on success, 1 on failure. */
int open_and_write(const char *path, const char *data) {
	int sfd, r;

	sfd = open(path, O_WRONLY);
	if (sfd < 0) {
		perror(path);
		return 1;
	}

	r = write(sfd, data, strlen(data));
	if (r < 0) {
		fprintf(stderr, "write(%s, %s): %s\n",
			path, data, strerror(errno));
		return 1;
	}
	close(sfd);
	return 0;
}

/* Keep hiddev open while the USB device is unbound and bound again. */
int dork(const char *hiddev, const char *name) {
	int fd;
	char c;

	fd = open(hiddev, O_RDWR | O_NONBLOCK);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	if (open_and_write("/sys/bus/usb/drivers/usb/unbind", name))
		return 1;

	// does not make a difference
	//sleep(1);

	if (open_and_write("/sys/bus/usb/drivers/usb/bind", name))
		return 1;

	// allow devices to get discovered
	sleep(1);

	printf("read() = %zi\n", read(fd, &c, 1)); perror("read");
	close(fd);
	return 0;
}

int main(int argc, char **argv) {
	if (argc < 3) {
		fprintf(stderr, "Usage: %s /dev/hidrawN usbN\n", *argv);
		return 1;
	}

	system("modprobe -v usbhid");
	system("modprobe -v hid-logitech-dj");

	dork(argv[1], argv[2]);

	return 0;
}


Re: [PATCH 04/13] ARM: ux500: Remove Snowball DTS entry for ROHM BH1780GLI ambient light sensor

2013-08-07 Thread Linus Walleij
On Fri, Jul 19, 2013 at 4:13 PM, Lee Jones  wrote:

> It doesn't exist on the Snowball development board.
>
> Signed-off-by: Lee Jones 

Patch applied.

Yours,
Linus Walleij


Re: [PATCH 03/13] ARM: ux500: Remove Snowball DTS entry for TPS61052 chip

2013-08-07 Thread Linus Walleij
On Fri, Jul 19, 2013 at 4:13 PM, Lee Jones  wrote:

> TPS61052 is a boost converter, LED driver, LED flash driver and
> simple GPIO pin chip. It has no use here however, as it is not
> found on the Snowball development board.
>
> Signed-off-by: Lee Jones 

Patch applied.

Yours,
Linus Walleij


Re: [PATCH 02/13] ARM: ux500: Remove Snowball DTS entry for National Semiconductor LP5521 LED chip

2013-08-07 Thread Linus Walleij
On Fri, Jul 19, 2013 at 4:13 PM, Lee Jones  wrote:

> It doesn't exist on the Snowball development board.
>
> Signed-off-by: Lee Jones 

Patch applied.

Yours,
Linus Walleij


Re: [PATCH 01/13] ARM: ux500: Remove Toshiba TC35892 I/O Expander's DT entry from Snowball's DTS

2013-08-07 Thread Linus Walleij
On Fri, Jul 19, 2013 at 4:13 PM, Lee Jones  wrote:

> It doesn't exist on this development board.
>
> Signed-off-by: Lee Jones 

Patch applied to my ux500-dt branch.

Yours,
Linus Walleij


Re: [RFC 0/1] drm/pl111: Initial drm/kms driver for pl111

2013-08-07 Thread Rob Clark
On Wed, Aug 7, 2013 at 12:23 AM, John Stultz  wrote:
> On Tue, Aug 6, 2013 at 5:15 AM, Rob Clark  wrote:
>> well, let's divide things up into two categories:
>>
>> 1) the arrangement and format of pixels.. ie. what userspace would
>> need to know if it mmap's a buffer.  This includes pixel format,
>> stride, etc.  This should be negotiated in userspace, it would be
>> crazy to try to do this in the kernel.
>>
>> 2) the physical placement of the pages.  Ie. whether it is contiguous
>> or not.  Which bank the pages in the buffer are placed in, etc.  This
>> is not visible to userspace.  This is the purpose of the attach step,
>> so you know all the devices involved in sharing up front before
>> allocating the backing pages.  (Or in the worst case, if you have a
>> "late attacher" you at least know when no device is doing dma access
>> to a buffer and can reallocate and move the buffer.)  A long time
>
> One concern I know the Android folks have expressed previously (and
> correct me if it's no longer an objection), is that this attach time
> in-kernel constraint solving / moving or reallocating buffers is
> likely to hurt determinism.  If I understood, their perspective was
> that userland knows the device path the buffers will travel through,
> so why not leverage that knowledge, rather than having the kernel have
> to sort it out for itself after the fact.

If you know the device path, then attach the buffer at all the devices
before you start using it.  Problem solved.. kernel knows all devices
before pages need be allocated ;-)

BR,
-R


RE: [PATCH] Tools: hv: correct payload size in netlink_send

2013-08-07 Thread KY Srinivasan


> -Original Message-
> From: Olaf Hering [mailto:o...@aepfle.de]
> Sent: Wednesday, August 07, 2013 9:07 AM
> To: KY Srinivasan; gre...@linuxfoundation.org
> Cc: linux-kernel@vger.kernel.org; Olaf Hering
> Subject: [PATCH] Tools: hv: correct payload size in netlink_send
> 
> netlink_send is supposed to send just the cn_msg+hv_kvp_msg via netlink.
> Currently it sets an incorrect iovec size, as reported by valgrind.
> 
> In the case of registering with the kernel the allocated buffer is large
> enough to hold nlmsghdr+cn_msg+hv_kvp_msg, no overrun happens. In the
> case of responding to the kernel the cn_msg is located in the middle of
> recv_buffer, after the nlmsghdr. Currently the code in netlink_send adds
> also the size of nlmsghdr to the payload. But nlmsghdr is a separate
> iovec. This leads to a (harmless) out-of-bounds access when the kernel
> processes the iovec. Correct the iovec size of the cn_msg to be just
> cn_msg + its payload.

Thanks Olaf.

> 
> Signed-off-by: Olaf Hering 
Signed-off-by: K. Y. Srinivasan 

> ---
>  tools/hv/hv_kvp_daemon.c | 2 +-
>  tools/hv/hv_vss_daemon.c | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/hv/hv_kvp_daemon.c b/tools/hv/hv_kvp_daemon.c
> index d3bcb84..1bd1ad1 100644
> --- a/tools/hv/hv_kvp_daemon.c
> +++ b/tools/hv/hv_kvp_daemon.c
> @@ -1399,7 +1399,7 @@ netlink_send(int fd, struct cn_msg *msg)
>   char buffer[64];
>   struct iovec iov[2];
> 
> - size = NLMSG_SPACE(sizeof(struct cn_msg) + msg->len);
> + size = sizeof(struct cn_msg) + msg->len;
> 
>   nlh = (struct nlmsghdr *)buffer;
>   nlh->nlmsg_seq = 0;
> diff --git a/tools/hv/hv_vss_daemon.c b/tools/hv/hv_vss_daemon.c
> index 6b4f2fa..2f1f53f 100644
> --- a/tools/hv/hv_vss_daemon.c
> +++ b/tools/hv/hv_vss_daemon.c
> @@ -111,7 +111,7 @@ static int netlink_send(int fd, struct cn_msg *msg)
>   char buffer[64];
>   struct iovec iov[2];
> 
> - size = NLMSG_SPACE(sizeof(struct cn_msg) + msg->len);
> + size = sizeof(struct cn_msg) + msg->len;
> 
>   nlh = (struct nlmsghdr *)buffer;
>   nlh->nlmsg_seq = 0;
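
For readers following along, here is a standalone sketch of the size
arithmetic the patch corrects. It is not the daemon code: struct demo_msg
is a hypothetical stand-in for struct cn_msg plus its payload, and
fill_iov() is a made-up helper. Only NLMSG_LENGTH, NLMSG_SPACE and
struct nlmsghdr come from <linux/netlink.h>.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <linux/netlink.h>

/* Hypothetical message: a small fixed header followed by 'len' payload bytes. */
struct demo_msg {
	unsigned short len;
	unsigned char data[30];
};

/*
 * Describe header and message as two separate iovec entries.
 * Returns the total number of bytes the iovec covers.
 */
size_t fill_iov(struct iovec iov[2], struct nlmsghdr *nlh, struct demo_msg *msg)
{
	size_t size;

	assert(nlh != NULL && msg != NULL);
	assert(msg->len <= sizeof(msg->data));

	/* Message header plus payload only; the netlink header is not counted. */
	size = offsetof(struct demo_msg, data) + msg->len;

	/* The length field in the netlink header covers header plus message. */
	nlh->nlmsg_len = NLMSG_LENGTH(size);

	iov[0].iov_base = nlh;
	iov[0].iov_len = sizeof(*nlh);
	iov[1].iov_base = msg;
	/* NLMSG_SPACE(size) here would count the netlink header a second time. */
	iov[1].iov_len = size;

	return iov[0].iov_len + iov[1].iov_len;
}
```

Because the netlink header already has its own iovec entry, the second
entry must cover the message alone. Anything larger makes the kernel read
past the end of the message buffer.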





Re: [PATCHSET cgroup/for-3.12] cgroup: make cgroup_event specific to memcg

2013-08-07 Thread Michal Hocko
On Wed 07-08-13 08:43:21, Tejun Heo wrote:
> Hello, Michal.
> 
> On Wed, Aug 07, 2013 at 02:18:36PM +0200, Michal Hocko wrote:
> > How is it specific to memcg? The fact only memcg uses the interface
> > doesn't imply it is memcg specific.
> 
> I don't follow.  It's only for memcg.  That is *by definition* memcg
> specific.  It's the verbatim meaning of the word.

My understanding of "memcg specific" is that it uses memcg specific
code/data structures. But let's not play with words.

> Now, I do
> understand that it can be a concern that the implementation details as-is
> could be a bit too invasive into cgroup core to be moved to memcg, but
> that's something we can work on, right?

Does it really make sense to work on this interface if it is planned to
be replaced by something different? Isn't that just a waste of time?

> Can you at least agree that the feature is memcg specific and it'd be
> better to be located in memcg if possible?  That really isn't much
> to ask and is a logical thing to do.

I would rather see it not changed unless it really is a big win in the
cgroup core. So far I do not see anything like that (just look at
__cgroup_from_dentry which needs to be exported to allow for the move).
You reduce the amount of code in cgroup.c, alright, but the code
doesn't go away really. It just moves out of your sight and moves the
same burden on somebody else without providing a new generic interface.

> > There are other ways to achieve the same. E.g. not ack new usage of
> > register callback users. We have done similar with other things like
> > use_hierarchy...
> 
> Yes, but those are all inferior to actually moving the code where it
> belongs.  Those make the code harder to follow and people
> misunderstand and waste time working on stuff (either in the core or
> controllers) which eventually end up getting nacked.  Why do that when
> we can easily do better?  What's the rationale behind that?

If somebody needs a notification interface (and there is no one available
right now) then you cannot prevent such pointless work anyway...

> > The cleanup is removing 2 callbacks with a cost of moving non-memcg
> > specific code inside memcg. That is what I am objecting to.
> 
> I don't really get your "non-memcg" specific code assertion when it is
> by definition memcg-specific.  What are you talking about?

cgroup_event_* don't sound memcg specific at all. They are playing with
cgroup dentry reference counting and do a generic functionality which
memcg doesn't need to know about.

> > I will not repeat myself. We seem to disagree on where the code belongs.
> > As I've said I will not ack this code, try to find somebody else who
> > thinks it is a good idea. I do not see any added value.
> 
> Nacking is part of your authority as maintainer but you should still
> provide plausible rationale for that.

I didn't say I Nack it. I said I won't Ack it. If Johannes or Kamezawa
think this is OK and another bloat in memcg is not a big deal I will not
block it. I won't be happy but such is life.

> Are you saying that even if the
> code is restructured so that it's not invasive into cgroup core, you
> are still gonna disagree with it because it's still somehow not
> memcg-specific?

I wouldn't object to a variant that doesn't play with cgroup internals. I just
do not think it makes sense to invest time in something that should go
away long term.

> Please don't repeat yourself but do explain your rationale.  That's
> part of your duty as a maintainer too.

I think I have been clear about what I do not like about this move.
-- 
Michal Hocko
SUSE Labs


[PATCH 3/3] perf tools: add 'keep tracking' test

2013-08-07 Thread Adrian Hunter
Add a test for the newly added PERF_COUNT_SW_DUMMY event.
The test checks that tracking events continue when an
event is disabled but a dummy software event is not
disabled.

Signed-off-by: Adrian Hunter 
---
 tools/perf/Makefile  |   1 +
 tools/perf/tests/builtin-test.c  |   4 ++
 tools/perf/tests/keep-tracking.c | 150 +++
 tools/perf/tests/tests.h |   1 +
 tools/perf/util/evlist.c |  42 ++-
 tools/perf/util/evlist.h |   5 ++
 6 files changed, 201 insertions(+), 2 deletions(-)
 create mode 100644 tools/perf/tests/keep-tracking.c

diff --git a/tools/perf/Makefile b/tools/perf/Makefile
index bfd12d0..0193e7c 100644
--- a/tools/perf/Makefile
+++ b/tools/perf/Makefile
@@ -392,6 +392,7 @@ LIB_OBJS += $(OUTPUT)tests/sw-clock.o
 ifeq ($(ARCH),x86)
 LIB_OBJS += $(OUTPUT)tests/perf-time-to-tsc.o
 endif
+LIB_OBJS += $(OUTPUT)tests/keep-tracking.o
 
 BUILTIN_OBJS += $(OUTPUT)builtin-annotate.o
 BUILTIN_OBJS += $(OUTPUT)builtin-bench.o
diff --git a/tools/perf/tests/builtin-test.c b/tools/perf/tests/builtin-test.c
index b7b4049..2a468a1 100644
--- a/tools/perf/tests/builtin-test.c
+++ b/tools/perf/tests/builtin-test.c
@@ -100,6 +100,10 @@ static struct test {
},
 #endif
{
+   .desc = "Test using a dummy software event to keep tracking",
+   .func = test__keep_tracking,
+   },
+   {
.func = NULL,
},
 };
diff --git a/tools/perf/tests/keep-tracking.c b/tools/perf/tests/keep-tracking.c
new file mode 100644
index 000..74abe00
--- /dev/null
+++ b/tools/perf/tests/keep-tracking.c
@@ -0,0 +1,150 @@
+#include <sys/types.h>
+#include <unistd.h>
+#include <sys/prctl.h>
+
+#include "parse-events.h"
+#include "evlist.h"
+#include "evsel.h"
+#include "thread_map.h"
+#include "cpumap.h"
+#include "tests.h"
+
+#define CHECK__(x) {   \
+   while ((x) < 0) {   \
+   pr_debug(#x " failed!\n");  \
+   goto out_err;   \
+   }   \
+}
+
+#define CHECK_NOT_NULL__(x) {  \
+   while ((x) == NULL) {   \
+   pr_debug(#x " failed!\n");  \
+   goto out_err;   \
+   }   \
+}
+
+static int find_comm(struct perf_evlist *evlist, const char *comm)
+{
+   union perf_event *event;
+   int i, found;
+
+   found = 0;
+   for (i = 0; i < evlist->nr_mmaps; i++) {
+   while ((event = perf_evlist__mmap_read(evlist, i)) != NULL) {
+   if (event->header.type == PERF_RECORD_COMM &&
+   (pid_t)event->comm.pid == getpid() &&
+   (pid_t)event->comm.tid == getpid() &&
+   strcmp(event->comm.comm, comm) == 0)
+   found += 1;
+   }
+   }
+   return found;
+}
+
+/**
+ * test__keep_tracking - test using a dummy software event to keep tracking.
+ *
+ * This function implements a test that checks that tracking events continue
+ * when an event is disabled but a dummy software event is not disabled.  If the
+ * test passes %0 is returned, otherwise %-1 is returned.
+ */
+int test__keep_tracking(void)
+{
+   struct perf_record_opts opts = {
+   .mmap_pages  = UINT_MAX,
+   .user_freq   = UINT_MAX,
+   .user_interval   = ULLONG_MAX,
+   .freq= 4000,
+   .target  = {
+   .uses_mmap   = true,
+   },
+   };
+   struct thread_map *threads = NULL;
+   struct cpu_map *cpus = NULL;
+   struct perf_evlist *evlist = NULL;
+   struct perf_evsel *evsel = NULL;
+   int found, err = -1;
+   const char *comm;
+
+   threads = thread_map__new(-1, getpid(), UINT_MAX);
+   CHECK_NOT_NULL__(threads);
+
+   cpus = cpu_map__new(NULL);
+   CHECK_NOT_NULL__(cpus);
+
+   evlist = perf_evlist__new();
+   CHECK_NOT_NULL__(evlist);
+
+   perf_evlist__set_maps(evlist, cpus, threads);
+
+   CHECK__(parse_events(evlist, "dummy:u"));
+   CHECK__(parse_events(evlist, "cycles:u"));
+
+   perf_evlist__config(evlist, &opts);
+
+   evsel = perf_evlist__first(evlist);
+
+   evsel->attr.comm = 1;
+   evsel->attr.disabled = 1;
+   evsel->attr.enable_on_exec = 0;
+
+   CHECK__(perf_evlist__open(evlist));
+
+   CHECK__(perf_evlist__mmap(evlist, UINT_MAX, false));
+
+   /*
+* First, test that a 'comm' event can be found when the event is
+* enabled.
+*/
+
+   perf_evlist__enable(evlist);
+
+   comm = "Test COMM 1";
+   CHECK__(prctl(PR_SET_NAME, (unsigned long)comm, 0, 0, 0));
+
+   perf_evlist__disable(evlist);
+
+   found = find_comm(evlist, comm);
+   if (found != 1) {
+   pr_debug("First 

[PATCH 2/3] perf tools: add support for PERF_COUNT_SW_DUMMY

2013-08-07 Thread Adrian Hunter
Add support for the new dummy software event
PERF_COUNT_SW_DUMMY.

Signed-off-by: Adrian Hunter 
---
 tools/perf/util/parse-events.c | 4 
 tools/perf/util/parse-events.l | 1 +
 tools/perf/util/python.c   | 1 +
 3 files changed, 6 insertions(+)

diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index dba877d..1ef81ea 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -108,6 +108,10 @@ static struct event_symbol event_symbols_sw[PERF_COUNT_SW_MAX] = {
.symbol = "emulation-faults",
.alias  = "",
},
+   [PERF_COUNT_SW_DUMMY] = {
+   .symbol = "dummy",
+   .alias  = "",
+   },
 };
 
 #define __PERF_EVENT_FIELD(config, name) \
diff --git a/tools/perf/util/parse-events.l b/tools/perf/util/parse-events.l
index b36115f..29c5d24 100644
--- a/tools/perf/util/parse-events.l
+++ b/tools/perf/util/parse-events.l
@@ -144,6 +144,7 @@ context-switches|cs { return sym(yyscanner, PERF_TYPE_SOFTWARE, PERF_COUNT_SW
 cpu-migrations|migrations  { return sym(yyscanner, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_CPU_MIGRATIONS); }
 alignment-faults   { return sym(yyscanner, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_ALIGNMENT_FAULTS); }
 emulation-faults   { return sym(yyscanner, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_EMULATION_FAULTS); }
+dummy  { return sym(yyscanner, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_DUMMY); }
 
 L1-dcache|l1-d|l1d|L1-data |
 L1-icache|l1-i|l1i|L1-instruction  |
diff --git a/tools/perf/util/python.c b/tools/perf/util/python.c
index 925e0c3..2fa83c0 100644
--- a/tools/perf/util/python.c
+++ b/tools/perf/util/python.c
@@ -967,6 +967,7 @@ static struct {
{ "COUNT_SW_PAGE_FAULTS_MAJ",  PERF_COUNT_SW_PAGE_FAULTS_MAJ },
{ "COUNT_SW_ALIGNMENT_FAULTS", PERF_COUNT_SW_ALIGNMENT_FAULTS },
{ "COUNT_SW_EMULATION_FAULTS", PERF_COUNT_SW_EMULATION_FAULTS },
+   { "COUNT_SW_DUMMY",PERF_COUNT_SW_DUMMY },
 
{ "SAMPLE_IP",PERF_SAMPLE_IP },
{ "SAMPLE_TID",   PERF_SAMPLE_TID },
-- 
1.7.11.7
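
For anyone who wants to try the dummy event outside the perf tool, a
minimal sketch of the attribute setup follows. init_dummy_attr() is a
hypothetical helper, not part of this series, and it assumes kernel
headers that already define PERF_COUNT_SW_DUMMY. The field choices mirror
what the 'keep tracking' test sets on its first event.

```c
#include <assert.h>
#include <string.h>
#include <linux/perf_event.h>

/* Fill 'attr' for a user-space-only dummy software event (hypothetical helper). */
void init_dummy_attr(struct perf_event_attr *attr)
{
	assert(attr != NULL);

	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->type = PERF_TYPE_SOFTWARE;
	attr->config = PERF_COUNT_SW_DUMMY;

	/* Equivalent of the ":u" modifier: count user space only. */
	attr->exclude_kernel = 1;
	attr->exclude_hv = 1;

	/* Ask for comm records and start disabled, as the test does. */
	attr->comm = 1;
	attr->disabled = 1;
	attr->enable_on_exec = 0;
}
```

The filled structure would then be handed to the perf_event_open system
call. Whether that call succeeds depends on the running kernel and on the
perf_event_paranoid setting, so the sketch stops at the attribute setup.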


