date:20121102

[GIT PULL] FRV fixes

2012-11-02 Thread David Howells


Hi Linus,

Can you pull these patches for FRV please?  They're a collection of small
fixes for the FRV architecture.

David
---
The following changes since commit 8c23f406c6d86808726ace580657186bc3b44587:

  Merge git://git.kernel.org/pub/scm/virt/kvm/kvm (2012-11-01 08:27:02 -0700)

are available in the git repository at:


  git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-frv.git tags/frv-fixes-20121102

for you to fetch changes up to 1d72d9f83df057e71c7951def41138a0230bf737:

  frv: fix the broken preempt (2012-11-02 12:08:25 -0400)


FRV fixes for 3.7


Al Viro (2):
  frv: switch to saner kernel_execve() semantics
  frv: fix the broken preempt

David Howells (5):
  FRV: Add missing linux/export.h #inclusions
  FRV: Don't objcopy the GNU build_id note
  FRV: gcc-4.1.2 also inlines weak functions
  FRV: Fix the preemption handling
  FRV: Fix the new-style kernel_thread() stuff

 arch/frv/Kconfig                      |  1 +
 arch/frv/boot/Makefile                | 10 ++
 arch/frv/include/asm/unistd.h         |  1 -
 arch/frv/kernel/entry.S               | 28 +++-
 arch/frv/kernel/process.c             |  5 +++--
 arch/frv/mb93090-mb00/pci-dma-nommu.c |  1 +
 init/main.c                           |  2 ++
 7 files changed, 16 insertions(+), 32 deletions(-)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC] Second attempt at kernel secure boot support

2012-11-02 Thread Vivek Goyal

On Thu, Nov 01, 2012 at 01:50:08PM -0400, Eric Paris wrote:

[..]
> I've talked with and
> worked with a public cloud operator who wants to prevent even a
> malicious root user from being able to run code in ring 0 inside their
> VM.  The hope in that case was that in doing so they can indirectly
> shrink the attack surface between virtual machine and hypervisor.
> They hoped to limit the ways the guest could interact to only those
> methods the linux kernel implemented.
> 
> They want to launch a vm running a kernel they chose and make sure
> root inside the vm could not run some other kernel or run arbitrary
> code in kernel space.  It wasn't something they solved completely.
> If it was, all of this secure boot work would be finished.  Which is
> why we are having these discussions to understand all of the ways that
> you and Alan seem to have to get around the secure boot restrictions.
> And look for solutions to retain functionality while meeting the
> security goal of 'prevent uid=0 to ring 0 privilege escalation'.

So will secure boot help with the use case you mentioned above? I think
that unless you lock down user space on the host too, it will not be
possible.

On the flip side, one might be able to launch Windows in qemu (a compromised
one) and fool the user into thinking it is booted natively, then steal
login credentials and other stuff.

Thanks
Vivek

Re: setting up CDB filters in udev (was Re: [PATCH v2 0/3] block: add queue-private command filter, editable via sysfs)

2012-11-02 Thread Tejun Heo

Hey, Alan.

On Fri, Nov 02, 2012 at 05:21:45PM +, Alan Cox wrote:
> That also means that a normal app running as superuser for some reason
> would set its user filter and any accidentally inherited descriptors will
> be less dangerous than they are today. It also means a CAP_SYS_RAWIO capable
> app can still use filters itself as good programming practise.
> 
> It effectively means you have to deliberately and intentionally set up an
> 'inherited' extra rights case.

The last part, I agree, but in general I think what you're describing
is way too elaborate for the problem at hand.  It's like adding
arbitrary range-filter for /dev/sdX which can be overridden by
userland.  You sure can find use case for such thing if you try hard
enough, but it's way over-engineered nonetheless.  I don't think we're
addressing huge range and number of use cases here and would much
prefer to keep it as simple as possible.

 * Devices are given standard filter matching the device class.  Any
   !CAP_SYS_RAWIO user can only issue commands allowed by the filter.

 * CAP_SYS_RAWIO can issue an ioctl to disable the filter for all
   accessors of the fd and transfer it.

That should be enough, no?

Thanks.

-- 
tejun
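
The two-rule policy proposed above can be modelled in a few lines of
user-space C. This is only a sketch under stated assumptions: struct
cmd_filter, filter_allow(), filter_disable() and cmd_permitted() are
hypothetical names, not the interface from the patch set, and the
per-opcode bitmap is just one possible filter representation. It also
assumes that a CAP_SYS_RAWIO caller is never restricted by the filter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical filter: one bit per command opcode (256 opcodes). */
struct cmd_filter {
	uint32_t allowed[8];	/* bitmap indexed by opcode */
	bool disabled;		/* set through a privileged ioctl-like call */
};

/* Add one opcode to the set an unprivileged user may issue. */
static void filter_allow(struct cmd_filter *f, uint8_t opcode)
{
	f->allowed[opcode / 32] |= UINT32_C(1) << (opcode % 32);
}

/*
 * Switch the filter off for everyone using this handle.
 * Only a CAP_SYS_RAWIO holder may do so; returns 0 on success, -1 otherwise.
 */
static int filter_disable(struct cmd_filter *f, bool has_rawio)
{
	if (!has_rawio)
		return -1;
	f->disabled = true;
	return 0;
}

/* Decide whether a command may be issued. */
static bool cmd_permitted(const struct cmd_filter *f, bool has_rawio,
			  uint8_t opcode)
{
	if (has_rawio || f->disabled)
		return true;
	return (f->allowed[opcode / 32] >> (opcode % 32)) & 1u;
}
```

In this model the "disabled" state lives in the filter object, so handing
the object to an unprivileged user (the fd transfer in the proposal) also
hands over the unfiltered access.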

Re: [PATCH 3/3, v2] x86/xor: make virtualization friendly

2012-11-02 Thread H. Peter Anvin

Aren't we actually talking just about PV here?

If so the test is wrong.

Jan Beulich  wrote:

>In virtualized environments, the CR0.TS management needed here can be a
>lot slower than anticipated by the original authors of this code, which
>particularly means that in such cases forcing the use of SSE- (or MMX-)
>based implementations is not desirable - actual measurements should
>always be done in that case.
>
>For consistency, pull into the shared (32- and 64-bit) header not only
>the inclusion of the generic code, but also that of the AVX variants.
>
>Signed-off-by: Jan Beulich 
>Cc: Konrad Rzeszutek Wilk 
>
>---
> arch/x86/include/asm/xor.h|8 +++-
> arch/x86/include/asm/xor_32.h |   22 ++
> arch/x86/include/asm/xor_64.h |   10 ++
> 3 files changed, 23 insertions(+), 17 deletions(-)
>
>--- 3.7-rc3-x86-xor.orig/arch/x86/include/asm/xor.h
>+++ 3.7-rc3-x86-xor/arch/x86/include/asm/xor.h
>@@ -487,6 +487,12 @@ static struct xor_block_template xor_blo
> 
> #undef XOR_CONSTANT_CONSTRAINT
> 
>+/* Also try the AVX routines */
>+#include 
>+
>+/* Also try the generic routines. */
>+#include 
>+
> #ifdef CONFIG_X86_32
> # include 
> #else
>@@ -494,6 +500,6 @@ static struct xor_block_template xor_blo
> #endif
> 
> #define XOR_SELECT_TEMPLATE(FASTEST) \
>-  AVX_SELECT(FASTEST)
>+  (cpu_has_hypervisor ? (FASTEST) : AVX_SELECT(FASTEST))
> 
> #endif /* _ASM_X86_XOR_H */
>--- 3.7-rc3-x86-xor.orig/arch/x86/include/asm/xor_32.h
>+++ 3.7-rc3-x86-xor/arch/x86/include/asm/xor_32.h
>@@ -537,12 +537,6 @@ static struct xor_block_template xor_blo
>   .do_5 = xor_sse_5,
> };
> 
>-/* Also try the AVX routines */
>-#include 
>-
>-/* Also try the generic routines.  */
>-#include 
>-
> /* We force the use of the SSE xor block because it can write around L2.
>    We may also be able to load into the L1 only depending on how the cpu
>    deals with a load to a line that is being prefetched.  */
>@@ -553,15 +547,19 @@ do {                                            \
>        if (cpu_has_xmm) {                              \
>                xor_speed(&xor_block_pIII_sse);         \
>                xor_speed(&xor_block_sse_pf64);         \
>-       } else if (cpu_has_mmx) {                       \
>+               if (!cpu_has_hypervisor)                \
>+                       break;                          \
>+       }                                               \
>+       if (cpu_has_mmx) {                              \
>                xor_speed(&xor_block_pII_mmx);          \
>                xor_speed(&xor_block_p5_mmx);           \
>-       } else {                                        \
>-               xor_speed(&xor_block_8regs);            \
>-               xor_speed(&xor_block_8regs_p);          \
>-               xor_speed(&xor_block_32regs);           \
>-               xor_speed(&xor_block_32regs_p);         \
>+               if (!cpu_has_hypervisor)                \
>+                       break;                          \
>        }                                               \
>+       xor_speed(&xor_block_8regs);                    \
>+       xor_speed(&xor_block_8regs_p);                  \
>+       xor_speed(&xor_block_32regs);                   \
>+       xor_speed(&xor_block_32regs_p);                 \
> } while (0)
> 
> #endif /* _ASM_X86_XOR_32_H */
>--- 3.7-rc3-x86-xor.orig/arch/x86/include/asm/xor_64.h
>+++ 3.7-rc3-x86-xor/arch/x86/include/asm/xor_64.h
>@@ -9,10 +9,6 @@ static struct xor_block_template xor_blo
>   .do_5 = xor_sse_5,
> };
> 
>-
>-/* Also try the AVX routines */
>-#include 
>-
> /* We force the use of the SSE xor block because it can write around L2.
>    We may also be able to load into the L1 only depending on how the cpu
>    deals with a load to a line that is being prefetched.  */
>@@ -22,6 +18,12 @@ do {                                               \
>        AVX_XOR_SPEED;                                  \
>        xor_speed(&xor_block_sse_pf64);                 \
>        xor_speed(&xor_block_sse);                      \
>+       if (cpu_has_hypervisor) {                       \
>+               xor_speed(&xor_block_8regs);            \
>+               xor_speed(&xor_block_8regs_p);          \
>+               xor_speed(&xor_block_32regs);           \
>+               xor_speed(&xor_block_32regs_p);         \
>+       }                                               \
> } while (0)
> 
> #endif /* _ASM_X86_XOR_64_H */

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.

[PATCH V3 2/4] mfd: tps65910: use regmap irq framework for interrupt support

2012-11-02 Thread Laxman Dewangan

Implement the irq support of tps65910 with the regmap irq framework
instead of implementing it locally.
This reduces the code size significantly and makes it easier to maintain.

Signed-off-by: Laxman Dewangan 
Reviewed-by: Mark Brown 
---
No change from V1, just rearranged this patch.
It was already reviewed by Mark, hence adding his Reviewed-by.
  
No change from V2.

 drivers/mfd/tps65910-irq.c   |  375 --
 include/linux/mfd/tps65910.h |  139 +++-
 2 files changed, 278 insertions(+), 236 deletions(-)

diff --git a/drivers/mfd/tps65910-irq.c b/drivers/mfd/tps65910-irq.c
index 09aab3e..554543a 100644
--- a/drivers/mfd/tps65910-irq.c
+++ b/drivers/mfd/tps65910-irq.c
@@ -24,171 +24,184 @@
 #include 
 #include 
 
-/*
- * This is a threaded IRQ handler so can access I2C/SPI.  Since all
- * interrupts are clear on read the IRQ line will be reasserted and
- * the physical IRQ will be handled again if another interrupt is
- * asserted while we run - in the normal course of events this is a
- * rare occurrence so we save I2C/SPI reads.  We're also assuming that
- * it's rare to get lots of interrupts firing simultaneously so try to
- * minimise I/O.
- */
-static irqreturn_t tps65910_irq(int irq, void *irq_data)
-{
-   struct tps65910 *tps65910 = irq_data;
-   unsigned int reg;
-   u32 irq_sts;
-   u32 irq_mask;
-   int i;
-
-   tps65910_reg_read(tps65910, TPS65910_INT_STS, &reg);
-   irq_sts = reg;
-   tps65910_reg_read(tps65910, TPS65910_INT_STS2, &reg);
-   irq_sts |= reg << 8;
-   switch (tps65910_chip_id(tps65910)) {
-   case TPS65911:
-   tps65910_reg_read(tps65910, TPS65910_INT_STS3, &reg);
-   irq_sts |= reg << 16;
-   }
-
-   tps65910_reg_read(tps65910, TPS65910_INT_MSK, &reg);
-   irq_mask = reg;
-   tps65910_reg_read(tps65910, TPS65910_INT_MSK2, &reg);
-   irq_mask |= reg << 8;
-   switch (tps65910_chip_id(tps65910)) {
-   case TPS65911:
-   tps65910_reg_read(tps65910, TPS65910_INT_MSK3, &reg);
-   irq_mask |= reg << 16;
-   }
-
-   irq_sts &= ~irq_mask;
-
-   if (!irq_sts)
-   return IRQ_NONE;
-
-   for (i = 0; i < tps65910->irq_num; i++) {
-
-   if (!(irq_sts & (1 << i)))
-   continue;
-
-   handle_nested_irq(irq_find_mapping(tps65910->domain, i));
-   }
-
-   /* Write the STS register back to clear IRQs we handled */
-   reg = irq_sts & 0xFF;
-   irq_sts >>= 8;
-   tps65910_reg_write(tps65910, TPS65910_INT_STS, reg);
-   reg = irq_sts & 0xFF;
-   tps65910_reg_write(tps65910, TPS65910_INT_STS2, reg);
-   switch (tps65910_chip_id(tps65910)) {
-   case TPS65911:
-   reg = irq_sts >> 8;
-   tps65910_reg_write(tps65910, TPS65910_INT_STS3, reg);
-   }
-
-   return IRQ_HANDLED;
-}
-
-static void tps65910_irq_lock(struct irq_data *data)
-{
-   struct tps65910 *tps65910 = irq_data_get_irq_chip_data(data);
-
-   mutex_lock(&tps65910->irq_lock);
-}
-
-static void tps65910_irq_sync_unlock(struct irq_data *data)
-{
-   struct tps65910 *tps65910 = irq_data_get_irq_chip_data(data);
-   u32 reg_mask;
-   unsigned int reg;
-
-   tps65910_reg_read(tps65910, TPS65910_INT_MSK, &reg);
-   reg_mask = reg;
-   tps65910_reg_read(tps65910, TPS65910_INT_MSK2, &reg);
-   reg_mask |= reg << 8;
-   switch (tps65910_chip_id(tps65910)) {
-   case TPS65911:
-   tps65910_reg_read(tps65910, TPS65910_INT_MSK3, &reg);
-   reg_mask |= reg << 16;
-   }
 
-   if (tps65910->irq_mask != reg_mask) {
-   reg = tps65910->irq_mask & 0xFF;
-   tps65910_reg_write(tps65910, TPS65910_INT_MSK, reg);
-   reg = tps65910->irq_mask >> 8 & 0xFF;
-   tps65910_reg_write(tps65910, TPS65910_INT_MSK2, reg);
-   switch (tps65910_chip_id(tps65910)) {
-   case TPS65911:
-   reg = tps65910->irq_mask >> 16;
-   tps65910_reg_write(tps65910, TPS65910_INT_MSK3, reg);
-   }
-   }
-   mutex_unlock(&tps65910->irq_lock);
-}
-
-static void tps65910_irq_enable(struct irq_data *data)
-{
-   struct tps65910 *tps65910 = irq_data_get_irq_chip_data(data);
-
-   tps65910->irq_mask &= ~(1 << data->hwirq);
-}
-
-static void tps65910_irq_disable(struct irq_data *data)
-{
-   struct tps65910 *tps65910 = irq_data_get_irq_chip_data(data);
-
-   tps65910->irq_mask |= (1 << data->hwirq);
-}
+static const struct regmap_irq tps65911_irqs[] = {
+   /* INT_STS */
+   [TPS65911_IRQ_PWRHOLD_F] = {
+   .mask = INT_MSK_PWRHOLD_F_IT_MSK_MASK,
+   .reg_offset = 0,
+   },
+   [TPS65911_IRQ_VBAT_VMHI] = {
+   .mask = INT_MSK_VMBHI_IT_MSK_MASK,
+   .reg_offset = 0,
+   },
+   [TPS65911_IRQ_PWRON] = {
+   .mask = INT_MSK_PWRON_IT_MSK_MASK,
+

[PATCH V3 3/4] mfd: tps65910: move interrupt implementation code to mfd file

2012-11-02 Thread Laxman Dewangan

Instead of implementing the irq support in a separate file,
move the implementation to the main mfd file.
The irq file now contains only the table and the init steps,
which does not justify keeping an extra file just for this
purpose.

Signed-off-by: Laxman Dewangan 
Reviewed-by: Mark Brown 
---
No change from V1, just rearranged this patch.
It was already reviewed by Mark, hence adding his Reviewed-by.
 
No change from V2.

 drivers/mfd/Makefile |2 +-
 drivers/mfd/tps65910-irq.c   |  243 --
 drivers/mfd/tps65910.c   |  216 +
 include/linux/mfd/tps65910.h |4 -
 4 files changed, 217 insertions(+), 248 deletions(-)
 delete mode 100644 drivers/mfd/tps65910-irq.c

diff --git a/drivers/mfd/Makefile b/drivers/mfd/Makefile
index 296817c..b64d8f5 100644
--- a/drivers/mfd/Makefile
+++ b/drivers/mfd/Makefile
@@ -52,7 +52,7 @@ obj-$(CONFIG_TPS6105X)+= tps6105x.o
 obj-$(CONFIG_TPS65010) += tps65010.o
 obj-$(CONFIG_TPS6507X) += tps6507x.o
 obj-$(CONFIG_MFD_TPS65217) += tps65217.o
-obj-$(CONFIG_MFD_TPS65910) += tps65910.o tps65910-irq.o
+obj-$(CONFIG_MFD_TPS65910) += tps65910.o
 tps65912-objs   := tps65912-core.o tps65912-irq.o
 obj-$(CONFIG_MFD_TPS65912) += tps65912.o
 obj-$(CONFIG_MFD_TPS65912_I2C) += tps65912-i2c.o
diff --git a/drivers/mfd/tps65910-irq.c b/drivers/mfd/tps65910-irq.c
deleted file mode 100644
index 554543a..000
--- a/drivers/mfd/tps65910-irq.c
+++ /dev/null
@@ -1,243 +0,0 @@
-/*
- * tps65910-irq.c  --  TI TPS6591x
- *
- * Copyright 2010 Texas Instruments Inc.
- *
- * Author: Graeme Gregory 
- * Author: Jorge Eduardo Candelaria 
- *
- *  This program is free software; you can redistribute it and/or modify it
- *  under  the terms of the GNU General  Public License as published by the
- *  Free Software Foundation;  either version 2 of the License, or (at your
- *  option) any later version.
- *
- */
-
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-
-
-static const struct regmap_irq tps65911_irqs[] = {
-   /* INT_STS */
-   [TPS65911_IRQ_PWRHOLD_F] = {
-   .mask = INT_MSK_PWRHOLD_F_IT_MSK_MASK,
-   .reg_offset = 0,
-   },
-   [TPS65911_IRQ_VBAT_VMHI] = {
-   .mask = INT_MSK_VMBHI_IT_MSK_MASK,
-   .reg_offset = 0,
-   },
-   [TPS65911_IRQ_PWRON] = {
-   .mask = INT_MSK_PWRON_IT_MSK_MASK,
-   .reg_offset = 0,
-   },
-   [TPS65911_IRQ_PWRON_LP] = {
-   .mask = INT_MSK_PWRON_LP_IT_MSK_MASK,
-   .reg_offset = 0,
-   },
-   [TPS65911_IRQ_PWRHOLD_R] = {
-   .mask = INT_MSK_PWRHOLD_R_IT_MSK_MASK,
-   .reg_offset = 0,
-   },
-   [TPS65911_IRQ_HOTDIE] = {
-   .mask = INT_MSK_HOTDIE_IT_MSK_MASK,
-   .reg_offset = 0,
-   },
-   [TPS65911_IRQ_RTC_ALARM] = {
-   .mask = INT_MSK_RTC_ALARM_IT_MSK_MASK,
-   .reg_offset = 0,
-   },
-   [TPS65911_IRQ_RTC_PERIOD] = {
-   .mask = INT_MSK_RTC_PERIOD_IT_MSK_MASK,
-   .reg_offset = 0,
-   },
-
-   /* INT_STS2 */
-   [TPS65911_IRQ_GPIO0_R] = {
-   .mask = INT_MSK2_GPIO0_R_IT_MSK_MASK,
-   .reg_offset = 1,
-   },
-   [TPS65911_IRQ_GPIO0_F] = {
-   .mask = INT_MSK2_GPIO0_F_IT_MSK_MASK,
-   .reg_offset = 1,
-   },
-   [TPS65911_IRQ_GPIO1_R] = {
-   .mask = INT_MSK2_GPIO1_R_IT_MSK_MASK,
-   .reg_offset = 1,
-   },
-   [TPS65911_IRQ_GPIO1_F] = {
-   .mask = INT_MSK2_GPIO1_F_IT_MSK_MASK,
-   .reg_offset = 1,
-   },
-   [TPS65911_IRQ_GPIO2_R] = {
-   .mask = INT_MSK2_GPIO2_R_IT_MSK_MASK,
-   .reg_offset = 1,
-   },
-   [TPS65911_IRQ_GPIO2_F] = {
-   .mask = INT_MSK2_GPIO2_F_IT_MSK_MASK,
-   .reg_offset = 1,
-   },
-   [TPS65911_IRQ_GPIO3_R] = {
-   .mask = INT_MSK2_GPIO3_R_IT_MSK_MASK,
-   .reg_offset = 1,
-   },
-   [TPS65911_IRQ_GPIO3_F] = {
-   .mask = INT_MSK2_GPIO3_F_IT_MSK_MASK,
-   .reg_offset = 1,
-   },
-
-   /* INT_STS3 */
-   [TPS65911_IRQ_GPIO4_R] = {
-   .mask = INT_MSK3_GPIO4_R_IT_MSK_MASK,
-   .reg_offset = 2,
-   },
-   [TPS65911_IRQ_GPIO4_F] = {
-   .mask = INT_MSK3_GPIO4_F_IT_MSK_MASK,
-   .reg_offset = 2,
-   },
-   [TPS65911_IRQ_GPIO5_R] = {
-   .mask = INT_MSK3_GPIO5_R_IT_MSK_MASK,
-   .reg_offset = 2,
-   },
-   [TPS65911_IRQ_GPIO5_F] = {
-   .mask = INT_MSK3_GPIO5_F_IT_MSK_MASK,
-   .reg_offset = 2,
-   },
-   [TPS65911_IRQ_WTCHDG] = {
-   .mask = INT_MSK3_WTCHDG_IT_MSK_MASK,
-

[PATCH V3 0/4] mfd: tps65910: use regmap irq framework for interrupt

2012-11-02 Thread Laxman Dewangan

This patch series makes the following changes:
- Use regmap irq framework for interrupt registration. Corrected the
  register bit definition for interrupts.
- Move the irq table to tps65910.c and get rid of tps65910-irq.c.
- Rearrange the init sequence of the different sub modules of tps65910 like
  irq, clock and then mfd devices.
- Export the irq domain handle from regmap to use in mfd driver.
- Pass the irq domain in mfd_add_devices to have proper interrupt mapping
  for sub devices like RTC.


Changes from V1: 
- Rearrange the patches so that the older patch 3 becomes the new patch 1.
- Add a stub for the new API in the regmap.
- Add an empty line between paragraphs in the descriptions.

Changes from V2:
- Remove the older 4th patch where API for getting irq_domain was added
  in regmap. It is already taken care by Mark B.

Laxman Dewangan (4):
  mfd: tps65910: Initialize mfd devices after all initialization done
  mfd: tps65910: use regmap irq framework for interrupt support
  mfd: tps65910: move interrupt implementation code to mfd file
  mfd: tps65910: pass irq_domain when adding mfd sub devices

 drivers/mfd/Makefile |2 +-
 drivers/mfd/tps65910-irq.c   |  260 --
 drivers/mfd/tps65910.c   |  233 --
 include/linux/mfd/tps65910.h |  143 ---
 4 files changed, 325 insertions(+), 313 deletions(-)
 delete mode 100644 drivers/mfd/tps65910-irq.c
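
As an illustration of the table-driven approach this series moves to, here
is a small user-space model. It is a sketch, not kernel code and not the
regmap API: struct irq_desc_entry and pending_irqs() are hypothetical names,
and the only idea taken from the patches is that each interrupt is described
by a status register offset plus a bit mask, so one generic loop can replace
the hand-written register reads.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for one table entry: which status register, which bit. */
struct irq_desc_entry {
	unsigned int reg_offset;	/* 0 = INT_STS, 1 = INT_STS2, 2 = INT_STS3 */
	uint8_t mask;			/* bit of this interrupt in that register */
};

/*
 * Walk the table and collect every interrupt that is asserted in its
 * status register and not masked. Writes the table indices to out[] and
 * returns how many were found (at most out_len).
 */
static size_t pending_irqs(const struct irq_desc_entry *table, size_t n,
			   const uint8_t *status, const uint8_t *masked,
			   size_t *out, size_t out_len)
{
	size_t count = 0;

	for (size_t i = 0; i < n && count < out_len; i++) {
		unsigned int r = table[i].reg_offset;
		uint8_t active = status[r] & (uint8_t)~masked[r];

		if (active & table[i].mask)
			out[count++] = i;
	}
	return count;
}
```

Adding an interrupt source then means adding one table row; the lookup
loop itself does not change.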


[PATCH V3 1/4] mfd: tps65910: Initialize mfd devices after all initialization done

2012-11-02 Thread Laxman Dewangan

Add the sub devices of tps65910 after all initialization like interrupt,
clock etc. is done. This makes sure that the required data gets
initialized properly before the sub devices' probes get called.

Signed-off-by: Laxman Dewangan 
Reviewed-by: Mark Brown 
---
Changes from V1:
- Rearrange patches so that this patch becomes the first in place of the third
  in the series.
- Added Reviewed-by: Mark as it was already reviewed in patch V1.

Changes from V2:
- None.

 drivers/mfd/tps65910.c |   16 
 1 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/mfd/tps65910.c b/drivers/mfd/tps65910.c
index 0d79ce2..27fbbe5 100644
--- a/drivers/mfd/tps65910.c
+++ b/drivers/mfd/tps65910.c
@@ -279,14 +279,6 @@ static __devinit int tps65910_i2c_probe(struct i2c_client *i2c,
return ret;
}
 
-   ret = mfd_add_devices(tps65910->dev, -1,
- tps65910s, ARRAY_SIZE(tps65910s),
- NULL, 0, NULL);
-   if (ret < 0) {
-   dev_err(&i2c->dev, "mfd_add_devices failed: %d\n", ret);
-   return ret;
-   }
-
init_data->irq = pmic_plat_data->irq;
init_data->irq_base = pmic_plat_data->irq_base;
 
@@ -299,6 +291,14 @@ static __devinit int tps65910_i2c_probe(struct i2c_client *i2c,
pm_power_off = tps65910_power_off;
}
 
+   ret = mfd_add_devices(tps65910->dev, -1,
+ tps65910s, ARRAY_SIZE(tps65910s),
+ NULL, 0, NULL);
+   if (ret < 0) {
+   dev_err(&i2c->dev, "mfd_add_devices failed: %d\n", ret);
+   return ret;
+   }
+
return ret;
 }
 
-- 
1.7.1.1


[PATCH V3 4/4] mfd: tps65910: pass irq_domain when adding mfd sub devices

2012-11-02 Thread Laxman Dewangan

When adding the sub device "tps65910-rtc", it is passed the
IO resource IRQ for the interrupt number. This interrupt needs
to be mapped in the device irq domain. Pass the irq domain of the
device in mfd_add_devices() so that proper irq mapping can be done
when adding the sub device RTC.

Signed-off-by: Laxman Dewangan 
---
- Remove older patch 4 and make this as 4th patch.

 drivers/mfd/tps65910.c |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/drivers/mfd/tps65910.c b/drivers/mfd/tps65910.c
index d4d4eb5..ca37833 100644
--- a/drivers/mfd/tps65910.c
+++ b/drivers/mfd/tps65910.c
@@ -509,7 +509,8 @@ static __devinit int tps65910_i2c_probe(struct i2c_client *i2c,
 
ret = mfd_add_devices(tps65910->dev, -1,
  tps65910s, ARRAY_SIZE(tps65910s),
- NULL, 0, NULL);
+ NULL, 0,
+ regmap_irq_get_domain(tps65910->irq_data));
if (ret < 0) {
dev_err(&i2c->dev, "mfd_add_devices failed: %d\n", ret);
return ret;
-- 
1.7.1.1


[PATCH 0/2] irq_work: A couple fixes v2

2012-11-02 Thread Frederic Weisbecker

Hey,

After some discussion with Steve, this is a respin with changelogs and
comments sanitized. The code itself hasn't changed.

Thanks.

Frederic Weisbecker (2):
  irq_work: Fix racy IRQ_WORK_BUSY flag setting
  irq_work: Fix racy check on work pending flag

 kernel/irq_work.c |   21 +++--
 1 files changed, 15 insertions(+), 6 deletions(-)

-- 
1.7.5.4


[PATCH 1/2] irq_work: Fix racy IRQ_WORK_BUSY flag setting

2012-11-02 Thread Frederic Weisbecker

The IRQ_WORK_BUSY flag is set right before we execute the
work. Once this flag value is set, the work enters a
claimable state again.

So if we have specific data to compute in our work, we ensure it's
either handled by another CPU or locally by enqueuing the work again.
This state machine is guaranteed by atomic operations on the flags.

So when we set IRQ_WORK_BUSY without using an xchg-like operation,
we break this guarantee as in the following summarized scenario:

CPU 1                                   CPU 2
-----                                   -----
(flags = 0)
old_flags = flags;
(flags = 0)
cmpxchg(flags, old_flags,
        old_flags | IRQ_WORK_FLAGS)
(flags = 3)
[...]
flags = IRQ_WORK_BUSY
(flags = 2)
func()
                                        (sees flags = 3)
                                        cmpxchg(flags, old_flags,
                                                old_flags | IRQ_WORK_FLAGS)
                                        (give up)

cmpxchg(flags, 2, 0);
(flags = 0)

CPU 1 claims a work and executes it, so it sets IRQ_WORK_BUSY and
the work is again in a claimable state. Now CPU 2 has new data to process
and tries to claim that work, but it may see a stale value of the flags
and think the work is still pending somewhere that will handle our data.
This is because CPU 1 doesn't set IRQ_WORK_BUSY atomically.

As a result, the data expected to be handled by CPU 2 won't get handled.

To fix this, use xchg() to set IRQ_WORK_BUSY. This way we ensure that CPU 2
will see the correct value with cmpxchg() using the expected ordering.

Changelog-heavily-inspired-by: Steven Rostedt 
Signed-off-by: Frederic Weisbecker 
Cc: Peter Zijlstra 
Cc: Ingo Molnar 
Cc: Thomas Gleixner 
Cc: Andrew Morton 
Cc: Steven Rostedt 
Cc: Paul Gortmaker 
Cc: Anish Kumar 
---
 kernel/irq_work.c |5 -
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/kernel/irq_work.c b/kernel/irq_work.c
index 1588e3b..57be1a6 100644
--- a/kernel/irq_work.c
+++ b/kernel/irq_work.c
@@ -119,8 +119,11 @@ void irq_work_run(void)
/*
 * Clear the PENDING bit, after this point the @work
 * can be re-used.
+* Make it immediately visible so that other CPUs trying
+* to claim that work don't rely on us to handle their data
+* while we are in the middle of the func.
 */
-   work->flags = IRQ_WORK_BUSY;
+   xchg(&work->flags, IRQ_WORK_BUSY);
work->func(work);
/*
 * Clear the BUSY bit and return to the free state if
-- 
1.7.5.4
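
The run side described in this patch can be approximated in user space
with C11 atomics. This is a sketch under assumptions, not the kernel code:
atomic_exchange() and atomic_compare_exchange_strong() stand in for the
kernel's xchg() and cmpxchg(), and struct work, work_run(), noop() and
reclaim() are hypothetical names. reclaim() simulates, on a single thread,
another CPU claiming the work again while the callback runs.

```c
#include <assert.h>
#include <stdatomic.h>

#define WORK_PENDING	1UL
#define WORK_BUSY	2UL
#define WORK_FLAGS	3UL	/* PENDING | BUSY */

struct work {
	_Atomic unsigned long flags;
	void (*func)(struct work *);
};

/* Callback that does nothing. */
static void noop(struct work *w)
{
	(void)w;
}

/* Callback that simulates a concurrent re-claim of the same work. */
static void reclaim(struct work *w)
{
	atomic_fetch_or(&w->flags, WORK_FLAGS);
}

/* Run one claimed work item. */
static void work_run(struct work *w)
{
	unsigned long expected = WORK_BUSY;

	/* Atomic exchange instead of a plain store: PENDING is cleared
	 * with full ordering before the callback starts. */
	atomic_exchange(&w->flags, WORK_BUSY);
	w->func(w);
	/* Return to the free state only if nobody re-claimed the work
	 * while the callback was running. */
	atomic_compare_exchange_strong(&w->flags, &expected, 0UL);
}
```

With noop() the flags end at 0; with reclaim() the final compare-and-swap
fails and the work stays claimed for the next run.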


[PATCH 2/2] irq_work: Fix racy check on work pending flag

2012-11-02 Thread Frederic Weisbecker

Work claiming wants to be SMP-safe.

And by the time we try to claim a work, if it is already executing
concurrently on another CPU, we want the claim to succeed and the work
to be queued again, because the other CPU may have missed the data we
wanted to handle in our work if it's about to complete there.

This scenario is summarized below:

CPU 1                                   CPU 2
-----                                   -----
(flags = 0)
cmpxchg(flags, 0, IRQ_WORK_FLAGS)
(flags = 3)
[...]
xchg(flags, IRQ_WORK_BUSY)
(flags = 2)
func()
                                        if (flags & IRQ_WORK_PENDING)
                                                (not true)
                                        cmpxchg(flags, flags, IRQ_WORK_FLAGS)
                                        (flags = 3)
                                        [...]
cmpxchg(flags, IRQ_WORK_BUSY, 0);
(fail, pending on CPU 2)

This state machine is synchronized using [cmp]xchg() on the flags.
As such, the early IRQ_WORK_PENDING check in CPU 2 above is racy.
By the time we check it, we may be dealing with a stale value because
we aren't using an atomic accessor. As a result, CPU 2 may "see"
that the work is still pending on another CPU while it may be
actually completing the work function execution already, leaving
our data unprocessed.

To fix this, we start by speculating about the value we wish to be
in the work->flags but we only draw a conclusion from the value
returned by the cmpxchg() call, which either claims the work or lets
the current owner handle the pending work for us.

Changelog-heavily-inspired-by: Steven Rostedt 
Signed-off-by: Frederic Weisbecker 
Cc: Peter Zijlstra 
Cc: Ingo Molnar 
Cc: Thomas Gleixner 
Cc: Andrew Morton 
Cc: Steven Rostedt 
Cc: Paul Gortmaker 
Cc: Anish Kumar 
---
 kernel/irq_work.c |   16 +++-
 1 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/kernel/irq_work.c b/kernel/irq_work.c
index 57be1a6..64eddd5 100644
--- a/kernel/irq_work.c
+++ b/kernel/irq_work.c
@@ -34,15 +34,21 @@ static DEFINE_PER_CPU(struct llist_head, irq_work_list);
  */
 static bool irq_work_claim(struct irq_work *work)
 {
-   unsigned long flags, nflags;
+   unsigned long flags, oflags, nflags;
 
+   /*
+* Start with our best wish as a premise but only trust any
+* flag value after cmpxchg() result.
+*/
+   flags = work->flags & ~IRQ_WORK_PENDING;
for (;;) {
-   flags = work->flags;
-   if (flags & IRQ_WORK_PENDING)
-   return false;
nflags = flags | IRQ_WORK_FLAGS;
-   if (cmpxchg(&work->flags, flags, nflags) == flags)
+   oflags = cmpxchg(&work->flags, flags, nflags);
+   if (oflags == flags)
break;
+   if (oflags & IRQ_WORK_PENDING)
+   return false;
+   flags = oflags;
cpu_relax();
}
 
-- 
1.7.5.4
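
The claim loop from this patch can be modelled in user space with C11
atomics. This is a sketch under assumptions, not the kernel code:
work_claim() is a hypothetical name, and atomic_compare_exchange_strong()
replaces the kernel's cmpxchg(). On failure the C11 call writes the value
it actually observed back into 'flags', which plays the role of 'oflags'
in the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define WORK_PENDING	1UL
#define WORK_BUSY	2UL
#define WORK_FLAGS	(WORK_PENDING | WORK_BUSY)

/* Returns true if the caller claimed the work and must enqueue it. */
static bool work_claim(_Atomic unsigned long *work_flags)
{
	/* Optimistic premise: assume PENDING is clear. Only the value
	 * reported by the compare-and-swap is trusted. */
	unsigned long flags =
		atomic_load_explicit(work_flags, memory_order_relaxed) &
		~WORK_PENDING;

	for (;;) {
		unsigned long nflags = flags | WORK_FLAGS;

		/* On failure, 'flags' now holds the observed value. */
		if (atomic_compare_exchange_strong(work_flags, &flags, nflags))
			return true;
		if (flags & WORK_PENDING)
			return false;	/* the current owner handles it */
	}
}
```

A claim on a free (0) or merely busy (2) work succeeds and leaves the
flags at 3; a claim on a work that is already pending gives up.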


Re: [PATCH v2 1/9] net: core: use this_cpu_ptr per-cpu helper

2012-11-02 Thread Christoph Lameter

On Sat, 3 Nov 2012, Shan Wei wrote:
> +++ b/net/core/flow.c
> @@ -327,11 +327,9 @@ static void flow_cache_flush_tasklet(unsigned long data)
>  static void flow_cache_flush_per_cpu(void *data)
>  {
>   struct flow_flush_info *info = data;
> - int cpu;
>   struct tasklet_struct *tasklet;
>
> - cpu = smp_processor_id();
> - tasklet = _cpu_ptr(info->cache->percpu, cpu)->flush_tasklet;
> + tasklet = _cpu_ptr(info->cache->percpu)->flush_tasklet

Another case for the use of this_cpu_read

[PATCH REPOST 1/3] mfd: Convert tps6586x to irq_domain

2012-11-02 Thread Laxman Dewangan

Allocate the irq base if it is not provided, i.e.
in case of device tree invocation of this driver.
Convert the tps6586x driver to irq domain, using a
legacy IRQ mapping if an irq_base is specified in
platform data or dynamically allocated and otherwise
using a linear mapping.

Signed-off-by: Laxman Dewangan 
Reviewed-by: Mark Brown 
---
Reposting in case the patch was missed.
Also added Reviewed-by from Mark as he already reviewed the patches.

 drivers/mfd/tps6586x.c   |   91 ++---
 include/linux/mfd/tps6586x.h |1 +
 2 files changed, 67 insertions(+), 25 deletions(-)

diff --git a/drivers/mfd/tps6586x.c b/drivers/mfd/tps6586x.c
index 4674643..2cdf1e6 100644
--- a/drivers/mfd/tps6586x.c
+++ b/drivers/mfd/tps6586x.c
@@ -17,12 +17,14 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -116,6 +118,7 @@ struct tps6586x {
int irq_base;
u32 irq_en;
u8  mask_reg[5];
+   struct irq_domain   *irq_domain;
 };
 
 static inline struct tps6586x *dev_to_tps6586x(struct device *dev)
@@ -184,6 +187,14 @@ int tps6586x_update(struct device *dev, int reg, uint8_t val, uint8_t mask)
 }
 EXPORT_SYMBOL_GPL(tps6586x_update);
 
+int tps6586x_irq_get_virq(struct device *dev, int irq)
+{
+   struct tps6586x *tps6586x = dev_to_tps6586x(dev);
+
+   return irq_create_mapping(tps6586x->irq_domain, irq);
+}
+EXPORT_SYMBOL_GPL(tps6586x_irq_get_virq);
+
 static int __remove_subdev(struct device *dev, void *unused)
 {
platform_device_unregister(to_platform_device(dev));
@@ -205,7 +216,7 @@ static void tps6586x_irq_lock(struct irq_data *data)
 static void tps6586x_irq_enable(struct irq_data *irq_data)
 {
struct tps6586x *tps6586x = irq_data_get_irq_chip_data(irq_data);
-   unsigned int __irq = irq_data->irq - tps6586x->irq_base;
+   unsigned int __irq = irq_data->hwirq;
const struct tps6586x_irq_data *data = &tps6586x_irqs[__irq];
 
tps6586x->mask_reg[data->mask_reg] &= ~data->mask_mask;
@@ -216,7 +227,7 @@ static void tps6586x_irq_disable(struct irq_data *irq_data)
 {
struct tps6586x *tps6586x = irq_data_get_irq_chip_data(irq_data);
 
-   unsigned int __irq = irq_data->irq - tps6586x->irq_base;
+   unsigned int __irq = irq_data->hwirq;
const struct tps6586x_irq_data *data = &tps6586x_irqs[__irq];
 
tps6586x->mask_reg[data->mask_reg] |= data->mask_mask;
@@ -239,6 +250,39 @@ static void tps6586x_irq_sync_unlock(struct irq_data *data)
mutex_unlock(&tps6586x->irq_lock);
 }
 
+static struct irq_chip tps6586x_irq_chip = {
+   .name = "tps6586x",
+   .irq_bus_lock = tps6586x_irq_lock,
+   .irq_bus_sync_unlock = tps6586x_irq_sync_unlock,
+   .irq_disable = tps6586x_irq_disable,
+   .irq_enable = tps6586x_irq_enable,
+};
+
+static int tps6586x_irq_map(struct irq_domain *h, unsigned int virq,
+   irq_hw_number_t hw)
+{
+   struct tps6586x *tps6586x = h->host_data;
+
+   irq_set_chip_data(virq, tps6586x);
+   irq_set_chip_and_handler(virq, &tps6586x_irq_chip, handle_simple_irq);
+   irq_set_nested_thread(virq, 1);
+
+   /* ARM needs us to explicitly flag the IRQ as valid
+* and will set them noprobe when we do so. */
+#ifdef CONFIG_ARM
+   set_irq_flags(virq, IRQF_VALID);
+#else
+   irq_set_noprobe(virq);
+#endif
+
+   return 0;
+}
+
+static struct irq_domain_ops tps6586x_domain_ops = {
+   .map= tps6586x_irq_map,
+   .xlate  = irq_domain_xlate_twocell,
+};
+
 static irqreturn_t tps6586x_irq(int irq, void *data)
 {
struct tps6586x *tps6586x = data;
@@ -259,7 +303,8 @@ static irqreturn_t tps6586x_irq(int irq, void *data)
int i = __ffs(acks);
 
if (tps6586x->irq_en & (1 << i))
-   handle_nested_irq(tps6586x->irq_base + i);
+   handle_nested_irq(
+   irq_find_mapping(tps6586x->irq_domain, i));
 
acks &= ~(1 << i);
}
@@ -272,11 +317,8 @@ static int __devinit tps6586x_irq_init(struct tps6586x *tps6586x, int irq,
 {
int i, ret;
u8 tmp[4];
-
-   if (!irq_base) {
-   dev_warn(tps6586x->dev, "No interrupt support on IRQ base\n");
-   return -EINVAL;
-   }
+   int new_irq_base;
+   int irq_num = ARRAY_SIZE(tps6586x_irqs);
 
mutex_init(&tps6586x->irq_lock);
for (i = 0; i < 5; i++) {
@@ -286,25 +328,24 @@ static int __devinit tps6586x_irq_init(struct tps6586x *tps6586x, int irq,
 
tps6586x_reads(tps6586x->dev, TPS6586X_INT_ACK1, sizeof(tmp), tmp);
 
-   tps6586x->irq_base = irq_base;
-
-   tps6586x->irq_chip.name = "tps6586x";
-   tps6586x->irq_chip.irq_enable = tps6586x_irq_enable;
-   tps6586x->irq_chip.irq_disable = tps6586x_irq_disable;
-

[PATCH v3] staging: ste_rmi4: Convert to Type-B support

2012-11-02 Thread Alexandra Chin

Hi Henrik and all,

This patch converts the driver to MT-B (Type-B) because Synaptics touch
devices are capable of tracking identifiable fingers.

This patch was tested on a PandaBoard, except for input_mt_sync_frame(),
which is quite a new function.
I have switched to Sylpheed as my mail client. Please let me know
if there is any problem.
I greatly appreciate your time.

Alexandra Chin

Signed-off-by: Alexandra Chin 
---
Changes from v3:
- Incorporated Henrik's review comments
  *remove 'else' after an error path return
  *add input_mt_sync_frame() for pointer emulation effects
  *correct names of touchscreen
- Replace printk with dev_err

Changes from v2:
- Incorporated Henrik's review comments
  *directly report finger state with Type-B
- Against 3.7-rcX
  *call input_mt_init_slots with INPUT_MT_DIRECT flag

 drivers/staging/ste_rmi4/synaptics_i2c_rmi4.c |  122 -
 1 files changed, 57 insertions(+), 65 deletions(-)

diff --git a/drivers/staging/ste_rmi4/synaptics_i2c_rmi4.c b/drivers/staging/ste_rmi4/synaptics_i2c_rmi4.c
index 277491a..7876f6b 100644
--- a/drivers/staging/ste_rmi4/synaptics_i2c_rmi4.c
+++ b/drivers/staging/ste_rmi4/synaptics_i2c_rmi4.c
@@ -1,7 +1,7 @@
 /**
  *
- * Synaptics Register Mapped Interface (RMI4) I2C Physical Layer Driver.
- * Copyright (c) 2007-2010, Synaptics Incorporated
+ * Synaptics Register Mapped Interface (RMI4) I2C Touchscreen Driver.
+ * Copyright (c) 2007-2012, Synaptics Incorporated
  *
  * Author: Js HA  for ST-Ericsson
  * Author: Naveen Kumar G  for ST-Ericsson
@@ -31,6 +31,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "synaptics_i2c_rmi4.h"
 
 /* TODO: for multiple device support will need a per-device mutex */
@@ -63,12 +64,11 @@
 #define MASK_4BIT  0x0F
 #define MASK_3BIT  0x07
 #define MASK_2BIT  0x03
-#define TOUCHPAD_CTRL_INTR 0x8
+#define TOUCHSCREEN_CTRL_INTR  0x8
 #define PDT_START_SCAN_LOCATION (0x00E9)
 #define PDT_END_SCAN_LOCATION  (0x000A)
 #define PDT_ENTRY_SIZE (0x0006)
-#define RMI4_NUMBER_OF_MAX_FINGERS (8)
-#define SYNAPTICS_RMI4_TOUCHPAD_FUNC_NUM   (0x11)
+#define SYNAPTICS_RMI4_TOUCHSCREEN_FUNC_NUM    (0x11)
 #define SYNAPTICS_RMI4_DEVICE_CONTROL_FUNC_NUM (0x01)
 
 /**
@@ -164,6 +164,7 @@ struct synaptics_rmi4_device_info {
  * @regulator: pointer to the regulator structure
  * @wait: wait queue structure variable
  * @touch_stopped: flag to stop the thread function
+ * @fingers_supported: maximum supported fingers
  *
  * This structure gives the device data information.
  */
@@ -184,6 +185,7 @@ struct synaptics_rmi4_data {
struct regulator        *regulator;
wait_queue_head_t       wait;
bool                    touch_stopped;
+   unsigned char   fingers_supported;
 };
 
 /**
@@ -291,34 +293,33 @@ exit:
 }
 
 /**
- * synpatics_rmi4_touchpad_report() - reports for the rmi4 touchpad device
+ * synpatics_rmi4_touchscreen_report() - reports for the rmi4 touchscreen device
  * @pdata: pointer to synaptics_rmi4_data structure
  * @rfi: pointer to synaptics_rmi4_fn structure
  *
- * This function calls to reports for the rmi4 touchpad device
+ * This function calls to reports for the rmi4 touchscreen device
  */
-static int synpatics_rmi4_touchpad_report(struct synaptics_rmi4_data *pdata,
+static int synpatics_rmi4_touchscreen_report(struct synaptics_rmi4_data *pdata,
struct synaptics_rmi4_fn *rfi)
 {
/* number of touch points - fingers down in this case */
int touch_count = 0;
int finger;
-   int fingers_supported;
int finger_registers;
int reg;
int finger_shift;
int finger_status;
int retval;
+   int x, y;
+   int wx, wy;
unsigned short  data_base_addr;
unsigned short  data_offset;
unsigned char   data_reg_blk_size;
unsigned char   values[2];
unsigned char   data[DATA_LEN];
-   int x[RMI4_NUMBER_OF_MAX_FINGERS];
-   int y[RMI4_NUMBER_OF_MAX_FINGERS];
-   int wx[RMI4_NUMBER_OF_MAX_FINGERS];
-   int wy[RMI4_NUMBER_OF_MAX_FINGERS];
+   unsigned char   fingers_supported = pdata->fingers_supported;
struct  i2c_client *client = pdata->i2c_client;
+   struct  input_dev *input_dev = pdata->input_dev;
 
/* get 2D sensor finger data */
/*
@@ -333,7 +334,6 @@ static int synpatics_rmi4_touchpad_report(struct synaptics_rmi4_data *pdata,
 *  10 = finger present but data may not be accurate,
 *  11 = reserved for product use.
 */
-   fingers_supported   = rfi->num_of_data_points;
finger_registers= (fingers_supported + 3)/4;
data_base_addr  = rfi->fn_desc.data_base_addr;
retval = synaptics_rmi4_i2c_block_read(pdata, data_base_addr, values,
@@

Re: [PATCH v2 5/9] kernel: padata : use this_cpu_read per-cpu helper

2012-11-02 Thread Christoph Lameter

On Sat, 3 Nov 2012, Shan Wei wrote:

> - queue = per_cpu_ptr(pd->pqueue, smp_processor_id());
> - if (queue->cpu_index == next_queue->cpu_index) {
> + if (this_cpu_read(pd->pqueue->cpu_index) == next_queue->cpu_index) {
>   padata = ERR_PTR(-ENODATA);

Reviewed-by: Christoph Lameter 


[PATCH REPOST 2/3] mfd: tps6586x: add irq io-resource for rtc sub driver

2012-11-02 Thread Laxman Dewangan

Add IRQ IORESOURCE for rtc sub driver of this device.
The rtc driver can get the irq by calling platform_get_irq().

Signed-off-by: Laxman Dewangan 
Reviewed-by: Mark Brown 
---
Reposting in case the patch was missed.
Also added Mark's Reviewed-by, as he has already reviewed the patches.

 drivers/mfd/tps6586x.c |   12 +++-
 1 files changed, 11 insertions(+), 1 deletions(-)

diff --git a/drivers/mfd/tps6586x.c b/drivers/mfd/tps6586x.c
index 2cdf1e6..c11539a 100644
--- a/drivers/mfd/tps6586x.c
+++ b/drivers/mfd/tps6586x.c
@@ -96,12 +96,22 @@ static const struct tps6586x_irq_data tps6586x_irqs[] = {
[TPS6586X_INT_RTC_ALM2] = TPS6586X_IRQ(TPS6586X_INT_MASK4, 1 << 1),
 };
 
+static struct resource tps6586x_rtc_resources[] = {
+   {
+   .start  = TPS6586X_INT_RTC_ALM1,
+   .end= TPS6586X_INT_RTC_ALM1,
+   .flags  = IORESOURCE_IRQ,
+   },
+};
+
 static struct mfd_cell tps6586x_cell[] = {
{
.name = "tps6586x-gpio",
},
{
.name = "tps6586x-rtc",
+   .num_resources = ARRAY_SIZE(tps6586x_rtc_resources),
+   .resources = &tps6586x_rtc_resources[0],
},
{
.name = "tps6586x-onkey",
@@ -562,7 +572,7 @@ static int __devinit tps6586x_i2c_probe(struct i2c_client *client,
 
ret = mfd_add_devices(tps6586x->dev, -1,
  tps6586x_cell, ARRAY_SIZE(tps6586x_cell),
- NULL, 0, NULL);
+ NULL, 0, tps6586x->irq_domain);
if (ret < 0) {
dev_err(&client->dev, "mfd_add_devices failed: %d\n", ret);
goto err_mfd_add;
-- 
1.7.1.1


[PATCH 3/3] mfd: tps6586x: implement gpio_to_irq

2012-11-02 Thread Laxman Dewangan

The TPS6586x driver registers this device's interrupts using a
linear mapping on an irq domain.
Hence, implement gpio_to_irq to return the irq number
corresponding to each TPS6586x GPIO; these irq numbers are
created dynamically.

Signed-off-by: Laxman Dewangan 
Reviewed-by: Mark Brown 
Acked-by: Linus Walleij 
---
Reposting in case the patch was missed.
Also added Mark's Reviewed-by, as he has already reviewed the patches.
Linus W has already acked it, so his ack is added as well.

 drivers/gpio/gpio-tps6586x.c |9 +
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/drivers/gpio/gpio-tps6586x.c b/drivers/gpio/gpio-tps6586x.c
index 2526b3b..62e9e1c 100644
--- a/drivers/gpio/gpio-tps6586x.c
+++ b/drivers/gpio/gpio-tps6586x.c
@@ -80,6 +80,14 @@ static int tps6586x_gpio_output(struct gpio_chip *gc, unsigned offset,
val, mask);
 }
 
+static int tps6586x_gpio_to_irq(struct gpio_chip *gc, unsigned offset)
+{
+   struct tps6586x_gpio *tps6586x_gpio = to_tps6586x_gpio(gc);
+
+   return tps6586x_irq_get_virq(tps6586x_gpio->parent,
+   TPS6586X_INT_PLDO_0 + offset);
+}
+
 static int __devinit tps6586x_gpio_probe(struct platform_device *pdev)
 {
struct tps6586x_platform_data *pdata;
@@ -106,6 +114,7 @@ static int __devinit tps6586x_gpio_probe(struct platform_device *pdev)
tps6586x_gpio->gpio_chip.direction_output = tps6586x_gpio_output;
tps6586x_gpio->gpio_chip.set= tps6586x_gpio_set;
tps6586x_gpio->gpio_chip.get= tps6586x_gpio_get;
+   tps6586x_gpio->gpio_chip.to_irq = tps6586x_gpio_to_irq;
 
 #ifdef CONFIG_OF_GPIO
tps6586x_gpio->gpio_chip.of_node = pdev->dev.parent->of_node;
-- 
1.7.1.1
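
The hwirq-to-virq lookup that tps6586x_gpio_to_irq() relies on can be
modelled in plain userspace C. This is only a minimal sketch under stated
assumptions: every model_* name, the table size and the virq base are
hypothetical and are not kernel APIs. The only behaviour carried over from
the patches is that creating a mapping reuses an existing one, and that a
pure lookup of an unmapped hwirq yields 0.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model only; this is NOT the kernel's struct irq_domain. */
#define MODEL_NR_HWIRQS 32      /* hypothetical number of hardware irqs */
#define MODEL_VIRQ_BASE 100     /* hypothetical first free virtual irq */

struct model_irq_domain {
    unsigned int virq_of[MODEL_NR_HWIRQS];  /* 0 means "not mapped yet" */
    unsigned int next_virq;                 /* next virq to hand out */
};

void model_domain_init(struct model_irq_domain *d)
{
    for (size_t i = 0; i < MODEL_NR_HWIRQS; i++)
        d->virq_of[i] = 0;
    d->next_virq = MODEL_VIRQ_BASE;
}

/* Plays the role of irq_create_mapping(): reuse a mapping or allocate one. */
unsigned int model_create_mapping(struct model_irq_domain *d,
                                  unsigned int hwirq)
{
    assert(hwirq < MODEL_NR_HWIRQS);
    if (d->virq_of[hwirq] == 0)
        d->virq_of[hwirq] = d->next_virq++;
    return d->virq_of[hwirq];
}

/* Plays the role of irq_find_mapping(): lookup only, 0 if unmapped. */
unsigned int model_find_mapping(const struct model_irq_domain *d,
                                unsigned int hwirq)
{
    assert(hwirq < MODEL_NR_HWIRQS);
    return d->virq_of[hwirq];
}

/* gpio_to_irq analogue: GPIO 'offset' maps to hwirq first_gpio_hwirq + offset. */
unsigned int model_gpio_to_irq(struct model_irq_domain *d,
                               unsigned int first_gpio_hwirq,
                               unsigned int offset)
{
    return model_create_mapping(d, first_gpio_hwirq + offset);
}
```

Calling model_gpio_to_irq() twice for the same GPIO returns the same
number, which is why a consumer does not need to know any fixed irq base.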


Re: [PATCH 4/9] net: openvswitch: use this_cpu_ptr per-cpu helper

2012-11-02 Thread Christoph Lameter

On Sat, 3 Nov 2012, Shan Wei wrote:

> +++ b/net/openvswitch/datapath.c
> @@ -208,7 +208,7 @@ void ovs_dp_process_received_packet(struct vport *p, 
> struct sk_buff *skb)
>   int error;
>   int key_len;
>
> - stats = per_cpu_ptr(dp->stats_percpu, smp_processor_id());
> + stats = this_cpu_ptr(dp->stats_percpu);
>
>   /* Extract flow from 'skb' into 'key'. */
>   error = ovs_flow_extract(skb, p->port_no, &key, &key_len);
> @@ -282,7 +282,7 @@ int ovs_dp_upcall(struct datapath *dp, struct sk_buff 
> *skb,
>   return 0;
>
>  err:
> - stats = per_cpu_ptr(dp->stats_percpu, smp_processor_id());
> + stats = this_cpu_ptr(dp->stats_percpu);
>
>   u64_stats_update_begin(&stats->sync);
>   stats->n_lost++;
> diff --git a/net/openvswitch/vport.c b/net/openvswitch/vport.c
> index 03779e8..70af0be 100644
> --- a/net/openvswitch/vport.c
> +++ b/net/openvswitch/vport.c
> @@ -333,8 +333,7 @@ void ovs_vport_receive(struct vport *vport, struct 
> sk_buff *skb)
>  {
>   struct vport_percpu_stats *stats;
>
> - stats = per_cpu_ptr(vport->percpu_stats, smp_processor_id());
> -
> + stats = this_cpu_ptr(vport->percpu_stats);
>   u64_stats_update_begin(&stats->sync);
>   stats->rx_packets++;
>   stats->rx_bytes += skb->len;
> @@ -359,7 +358,7 @@ int ovs_vport_send(struct vport *vport, struct sk_buff 
> *skb)
>   if (likely(sent)) {
>   struct vport_percpu_stats *stats;
>
> - stats = per_cpu_ptr(vport->percpu_stats, smp_processor_id());
> + stats = this_cpu_ptr(vport->percpu_stats);
>
>   u64_stats_update_begin(&stats->sync);
>   stats->tx_packets++;

Use this_cpu_inc(vport->percpu_stats->packets) here?



Re: [PATCH v2 6/9] rcu: use __this_cpu_read helper instead of per_cpu_ptr(p, raw_smp_processor_id())

2012-11-02 Thread Christoph Lameter

On Sat, 3 Nov 2012, Shan Wei wrote:

>
>   /* Funnel through hierarchy to reduce memory contention. */
> - rnp = per_cpu_ptr(rsp->rda, raw_smp_processor_id())->mynode;
> + rnp = __this_cpu_read(rsp->rda->mynode);
>   for (; rnp != NULL; rnp = rnp->parent) {

Reviewed-by: Christoph Lameter 


Re: [PATCH v2 7/9] trace: use this_cpu_ptr per-cpu helper

2012-11-02 Thread Christoph Lameter

On Sat, 3 Nov 2012, Shan Wei wrote:

> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index 31e4f55..81ae35b 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -1513,7 +1513,7 @@ static char *get_trace_buf(void)
>   if (!percpu_buffer)
>   return NULL;
>
> - buffer = per_cpu_ptr(percpu_buffer, smp_processor_id());
> + buffer = this_cpu_ptr(percpu_buffer);
>
>   return buffer->buffer;

return this_cpu_read(percpu_buffer->buffer)

and drop the this_cpu_ptr operation please.


[PATCH REPOST 0/3] mfd: tps6586x: Convert to irq domain

2012-11-02 Thread Laxman Dewangan

This patch series converts the irq implementation to use an
irq domain.
The gpio driver and the rtc registration are updated accordingly.

Reposting the series, adding the Reviewed-by from Mark and the Acked-by
from Linus W to the respective patches.

Laxman Dewangan (3):
  mfd: Convert tps6586x to irq_domain
  mfd: tps6586x: add irq io-resource for rtc sub driver
  mfd: tps6586x: implement gpio_to_irq

 drivers/gpio/gpio-tps6586x.c |9 
 drivers/mfd/tps6586x.c   |  103 +++---
 include/linux/mfd/tps6586x.h |1 +
 3 files changed, 87 insertions(+), 26 deletions(-)


Re: [PATCH v2 9/9] net: batman-adv: use per_cpu_add helper

2012-11-02 Thread Christoph Lameter

On Sat, 3 Nov 2012, Shan Wei wrote:

> diff --git a/net/batman-adv/main.h b/net/batman-adv/main.h
> index 897ba6a..3aef5b2 100644
> --- a/net/batman-adv/main.h
> +++ b/net/batman-adv/main.h
> @@ -263,9 +263,7 @@ static inline bool batadv_has_timed_out(unsigned long 
> timestamp,
>  static inline void batadv_add_counter(struct batadv_priv *bat_priv, size_t 
> idx,
> size_t count)
>  {
> - int cpu = get_cpu();
> - per_cpu_ptr(bat_priv->bat_counters, cpu)[idx] += count;
> - put_cpu();
> + this_cpu_add(bat_priv->bat_counters[idx], count);
>  }

Reviewed-by: Christoph Lameter 


Re: [RFC] Second attempt at kernel secure boot support

2012-11-02 Thread James Bottomley

On Fri, 2012-11-02 at 16:54 +, Matthew Garrett wrote:
> On Fri, Nov 02, 2012 at 04:52:44PM +, James Bottomley wrote:
> 
> > The first question is how many compromises do you need.  Without
> > co-operation from windows, you don't get to install something in the
> > boot system, so if you're looking for a single compromise vector, the
> > only realistic attack is to trick the user into booting a hacked linux
> > system from USB or DVD.
> 
> You run a binary. It pops up a box saying "Windows needs your permission 
> to continue", just like almost every other Windows binary that's any 
> use. Done.

And if all the loaders do some type of present user test on a virgin
system, how do you propose to get that message up there?

James



Re: [PATCH 0/9 v2] use efficient this_cpu_* helper

2012-11-02 Thread Christoph Lameter

On Sat, 3 Nov 2012, Shan Wei wrote:

> this_cpu_ptr is faster than per_cpu_ptr(p, smp_processor_id())
> and can reduce  memory accesses.
> The latter helper needs to find the offset for current cpu,
> and needs more assembler instructions which objdump shows in following.
>
> per_cpu_ptr(p, smp_processor_id()):
>   1e:   65 8b 04 25 00 00 00 00     mov    %gs:0x0,%eax
>   26:   48 98                       cltq
>   28:   31 f6                       xor    %esi,%esi
>   2a:   48 c7 c7 00 00 00 00        mov    $0x0,%rdi
>   31:   48 8b 04 c5 00 00 00 00     mov    0x0(,%rax,8),%rax
>   39:   c7 44 10 04 14 00 00 00     movl   $0x14,0x4(%rax,%rdx,1)
>
> this_cpu_ptr(p):
>   1e:   65 48 03 14 25 00 00 00 00  add    %gs:0x0,%rdx
>   27:   31 f6                       xor    %esi,%esi
>   29:   c7 42 04 14 00 00 00        movl   $0x14,0x4(%rdx)
>   30:   48 c7 c7 00 00 00 00        mov    $0x0,%rdi

this_cpu_read() etc even avoids the use of this_cpu_ptr()
reducing the 6 instructions earlier to 1.
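
The relationship between the three access styles discussed in this thread
can be sketched in userspace C. This is a model under stated assumptions,
not kernel code: all model_* names are hypothetical, a plain array indexed
by a "current CPU" variable stands in for the kernel's per-CPU offsets, and
the sketch shows only that the three forms reach the same slot, not the
instruction counts from the objdump output above.

```c
#include <assert.h>

#define MODEL_NR_CPUS 4            /* hypothetical CPU count */

struct model_stats {
    unsigned long rx_packets;
};

/* One slot per CPU; the kernel uses per-CPU offsets instead of an array. */
static struct model_stats model_percpu_stats[MODEL_NR_CPUS];
static int model_current_cpu;      /* stands in for "the CPU we run on" */

void model_set_current_cpu(int cpu)
{
    assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
    model_current_cpu = cpu;
}

int model_smp_processor_id(void)
{
    return model_current_cpu;
}

/* per_cpu_ptr(p, cpu) analogue: the caller supplies the CPU number. */
struct model_stats *model_per_cpu_ptr(int cpu)
{
    assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
    return &model_percpu_stats[cpu];
}

/* this_cpu_ptr(p) analogue: the current CPU is implied. */
struct model_stats *model_this_cpu_ptr(void)
{
    return &model_percpu_stats[model_current_cpu];
}

/* this_cpu_inc(p->field) analogue: the caller never sees a pointer. */
void model_this_cpu_inc_rx_packets(void)
{
    model_percpu_stats[model_current_cpu].rx_packets++;
}
```

In this model, model_this_cpu_ptr() and
model_per_cpu_ptr(model_smp_processor_id()) always return the same address;
the third form folds the address calculation and the update into one
operation, which is the point made about this_cpu_read() and friends.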

Re: setting up CDB filters in udev (was Re: [PATCH v2 0/3] block: add queue-private command filter, editable via sysfs)

2012-11-02 Thread Paolo Bonzini

On 02/11/2012 17:51, Tejun Heo wrote:
>>> > > What disturbs me is that it's a completely new interface to userland
>>> > > and at the same a very limited one at that.  So, yeah, it's
>>> > > bothersome.  I personally would prefer SCM_RIGHTS behavior change +
>>> > > hard coded filters per device class.
>> > 
>> > I think hard-coded filters are bad (I prefer to move policy to
>> > userspace), and SCM_RIGHTS without a ioctl is out of question, really.
> No rule is really absolute.  To me, it seems the suggested in-kernel
> per-device command code filter is both too big for the given problem

Is it?  150 lines of code?  The per-class filters would share the first
two patches with this series, add a long list of commands to filter, and
the ioctl would be on top of that.

Long lists are better kept in configuration files than in kernel
sources; not to mention the higher cost of getting the API wrong for a
ioctl vs. sysfs.

> while being too limited for much beyond that.

What are the use cases beyond these?  AFAIK these were the first two in
ten years or so...

> So, if we can get away
> with adding an ioctl, I personally think that would be a better
> approach.

I would really prefer to get a green light from Jens/James for per-class
filters in the kernel (which are worth a few hundred lines of data)
before implementing that.

Paolo

Re: [PATCH] xfs: silence GCC warning

2012-11-02 Thread Paul Bolle

On Fri, 2012-11-02 at 09:07 -0400, Christoph Hellwig wrote:
> Looks good, Dave has actually sent it a tidbit earlier as part
> of his series with fixes for 3.7-rc

I see, thanks.


Paul Bolle


Re: [PATCH] of/address: sparc: Declare of_address_to_resource() as an extern function for sparc again

2012-11-02 Thread Sam Ravnborg

Hi Andreas.
On Fri, Nov 02, 2012 at 12:03:56PM +0100, Andreas Larsson wrote:
> This bug-fix makes sure that of_address_to_resource is defined extern for 
> sparc
> so that the sparc-specific implementation of of_address_to_resource() is once
> again used when including include/linux/of_address.h in a sparc context. A
> number of drivers in mainline relies on this function working for sparc.

How about the following (untested) approach?
diff --git a/arch/sparc/include/asm/prom.h b/arch/sparc/include/asm/prom.h
index c287651..8194801 100644
--- a/arch/sparc/include/asm/prom.h
+++ b/arch/sparc/include/asm/prom.h
@@ -63,5 +63,8 @@ extern char *of_console_options;
 extern void irq_trans_init(struct device_node *dp);
 extern char *build_path_component(struct device_node *dp);
 
+/* SPARC has a local implementation */
+#define of_address_to_resource of_address_to_resource
+
 #endif /* __KERNEL__ */
 #endif /* _SPARC_PROM_H */
diff --git a/include/linux/of_address.h b/include/linux/of_address.h
index a1984dd..e20e3af 100644
--- a/include/linux/of_address.h
+++ b/include/linux/of_address.h
@@ -28,11 +28,13 @@ static inline unsigned long pci_address_to_pio(phys_addr_t addr) { return -1; }
 #endif
 
 #else /* CONFIG_OF_ADDRESS */
+#ifndef of_address_to_resource
 static inline int of_address_to_resource(struct device_node *dev, int index,
 struct resource *r)
 {
return -EINVAL;
 }
+#endif
 static inline struct device_node *of_find_matching_node_by_address(
struct device_node *from,
const struct of_device_id *matches,

We use prom.h to teach the generic OF layer what SPARC provides.
In prom.h we define the symbol of_address_to_resource, which tells
of_address.h that we have a local definition of this function, so
the static version is skipped.

This looks more elegant, as we do not have to hardcode SPARC in of_address.h,
and it is easy to reuse the same pattern in other places.

Also pci_address_to_pio already uses the same approach in the same file.
pci_address_to_pio is defined if it was not defined before - I see no
reason to do so which is why I omitted it in the above.

Sam
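
The "define the symbol to itself" pattern proposed above can be tried out
in a single userspace C file. This is a minimal sketch, assuming a
hypothetical function model_address_to_resource(); the three sections stand
in for the arch header, the generic header and the arch implementation,
merged into one file only so that the example is self-contained.

```c
#include <assert.h>
#include <errno.h>

/* --- stands in for the arch header (the role asm/prom.h plays above) --- */
int model_address_to_resource(int index);
/* Defining the name to itself marks "the arch provides this function". */
#define model_address_to_resource model_address_to_resource

/* --- stands in for the generic header --- */
#ifndef model_address_to_resource
/* Fallback stub, compiled only when no arch header claimed the name. */
static inline int model_address_to_resource(int index)
{
    (void)index;
    return -EINVAL;
}
#endif

/* --- stands in for the arch implementation --- */
int model_address_to_resource(int index)
{
    assert(index >= 0);
    return index + 100;     /* arbitrary value, just to be recognisable */
}

/* Reports whether the generic fallback stub was compiled in. */
int model_generic_fallback_used(void)
{
#ifdef model_address_to_resource
    return 0;
#else
    return 1;
#endif
}
```

Because a macro is not expanded again inside its own expansion, the
self-referential #define leaves every call site unchanged; it only makes
the name visible to #ifndef, so the generic header needs no
architecture-specific test.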

Re: [PATCH 8/9] trace: use this_cpu_ptr per-cpu helper

2012-11-02 Thread Christoph Lameter

On Fri, 2 Nov 2012, Shan Wei wrote:

> Christoph Lameter said, at 2012/11/1 1:50:
> >>
> >> -  buffer = per_cpu_ptr(percpu_buffer, smp_processor_id());
> >> +  buffer = this_cpu_ptr(percpu_buffer);
> >>
> >>return buffer->buffer;
> >
> >
> > Just do a
> >
> > return this_cpu_read(percpu_buffer->buffer);
> >
> > and get rid of the this_cpu_ptr op
>
> can not do that.
> kernel/trace/trace.c:1515: error: incompatible types when assigning to type 
> 'char[1024]' from type 'char *'

hmm what is actually returned is a pointer to char right? And buffer
is char[1024] so I guess then you need to pass a pointer to char to
this_cpu_read.

return this_cpu_read(&(percpu_buffer->buffer))
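
The compiler error quoted above comes down to a general C rule: an array
cannot be assigned or copied by value, so any helper that copies a member
into a temporary of the same type fails for a char[1024] member (that
this_cpu_read() works this way is an assumption here, not something
verified in this thread). The sketch below uses hypothetical model_* names
and an ordinary static object instead of per-CPU data to show the two
pointer types that are available instead.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_BUF_SIZE 1024

struct model_trace_buffer {
    char buffer[MODEL_BUF_SIZE];
};

static struct model_trace_buffer model_buf;

/* The array member decays to char *, pointing at its first element. */
char *model_get_trace_buf(void)
{
    return model_buf.buffer;
}

/*
 * Taking the address of the array gives char (*)[MODEL_BUF_SIZE]:
 * the same address, but a different type that keeps the array size.
 */
char (*model_get_trace_buf_array(void))[MODEL_BUF_SIZE]
{
    return &model_buf.buffer;
}
```

Both functions return the same address; only the pointer type differs,
which is what decides whether a result can be returned from a function
declared as returning char *.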

Re: setting up CDB filters in udev (was Re: [PATCH v2 0/3] block: add queue-private command filter, editable via sysfs)

2012-11-02 Thread Tejun Heo

Hello, Paolo.

On Fri, Nov 02, 2012 at 06:49:43PM +0100, Paolo Bonzini wrote:
> > No rule is really absolute.  To me, it seems the suggested in-kernel
> > per-device command code filter is both too big for the given problem
> 
> Is it?  150 lines of code?  The per-class filters would share the first
> two patches with this series, add a long list of commands to filter, and
> the ioctl would be on top of that.

It's not really about the lines of code.  It adds a new userland
visible interface.  As for the "long" list of commands, it depends on
how you write it but even if it's textually long it's still very
simple in terms of actual complexity.

> > while being too limited for much beyond that.
> 
> What are the use cases beyond these?  AFAIK these were the first two in
> ten years or so...

If this is such a cold area, why do we want to do anything other than the
simplest thing possible?

> > So, if we can get away
> > with adding an ioctl, I personally think that would be a better
> > approach.
> 
> I would really prefer to get a green light from Jens/James for per-class
> filters in the kernel (which are worth a few hundred lines of data)
> before implementing that.

Sure, Jens, James?  Guys, come on.

Thanks.

-- 
tejun

Re: [RFC] Second attempt at kernel secure boot support

2012-11-02 Thread Matthew Garrett

On Fri, Nov 02, 2012 at 05:48:31PM +, James Bottomley wrote:
> On Fri, 2012-11-02 at 16:54 +, Matthew Garrett wrote:
> > On Fri, Nov 02, 2012 at 04:52:44PM +, James Bottomley wrote:
> > 
> > > The first question is how many compromises do you need.  Without
> > > co-operation from windows, you don't get to install something in the
> > > boot system, so if you're looking for a single compromise vector, the
> > > only realistic attack is to trick the user into booting a hacked linux
> > > system from USB or DVD.
> > 
> > You run a binary. It pops up a box saying "Windows needs your permission 
> > to continue", just like almost every other Windows binary that's any 
> > use. Done.
> 
> And if all the loaders do some type of present user test on a virgin
> system, how do you propose to get that message up there?

? That's the message generated by the Windows access control mechanism 
when you run a binary that requests elevated privileges.

-- 
Matthew Garrett | mj...@srcf.ucam.org

Re: [RFC] Second attempt at kernel secure boot support

2012-11-02 Thread James Bottomley

On Fri, 2012-11-02 at 17:54 +, Matthew Garrett wrote:
> On Fri, Nov 02, 2012 at 05:48:31PM +, James Bottomley wrote:
> > On Fri, 2012-11-02 at 16:54 +, Matthew Garrett wrote:
> > > On Fri, Nov 02, 2012 at 04:52:44PM +, James Bottomley wrote:
> > > 
> > > > The first question is how many compromises do you need.  Without
> > > > co-operation from windows, you don't get to install something in the
> > > > boot system, so if you're looking for a single compromise vector, the
> > > > only realistic attack is to trick the user into booting a hacked linux
> > > > system from USB or DVD.
> > > 
> > > You run a binary. It pops up a box saying "Windows needs your permission 
> > > to continue", just like almost every other Windows binary that's any 
> > > use. Done.
> > 
> > And if all the loaders do some type of present user test on a virgin
> > system, how do you propose to get that message up there?
> 
> ? That's the message generated by the Windows access control mechanism 
> when you run a binary that requests elevated privileges.

So that's a windows attack vector using a windows binary? I can't really
see how it's relevant to the secure boot discussion then.

James




Re: [B.A.T.M.A.N.] [PATCH v2 9/9] net: batman-adv: use per_cpu_add helper

2012-11-02 Thread Sven Eckelmann

On Saturday 03 November 2012 00:02:06 Shan Wei wrote:
> From: Shan Wei 
> 
> As Christoph Lameter said:
> > In addition, following usage of per_cpu_ptr can be replaced by
> > this_cpu_read.
> > 
> > cpu=get_cpu()
> > 
> > *per_cpu_ptr(p,cpu)
> > 
> > 
> > put_cpu()
> 
> Right.
> 
> Signed-off-by: Shan Wei 
> ---

Is this really supposed to be the commit message?

Kind regards,
Sven


Re: [RFC] Second attempt at kernel secure boot support

2012-11-02 Thread Matthew Garrett

On Fri, Nov 02, 2012 at 05:57:38PM +, James Bottomley wrote:
> On Fri, 2012-11-02 at 17:54 +, Matthew Garrett wrote:
> > ? That's the message generated by the Windows access control mechanism 
> > when you run a binary that requests elevated privileges.
> 
> So that's a windows attack vector using a windows binary? I can't really
> see how it's relevant to the secure boot discussion then.

A user runs a binary that elevates itself to admin. Absent any flaws in 
Windows (cough), that should be all it can do in a Secure Boot world. 
But if you can drop a small trusted Linux system in there and use that 
to boot a compromised Windows kernel, it can make itself persistent.

-- 
Matthew Garrett | mj...@srcf.ucam.org

[PATCH v2 1/1] percpu_rw_semaphore: reimplement to not block the readers unnecessarily

2012-11-02 Thread Oleg Nesterov

Currently the writer does msleep() plus synchronize_sched() 3 times
to acquire/release the semaphore, and during this time the readers
are blocked completely. Even if the "write" section was not actually
started or if it was already finished.

With this patch, down_write/up_write does synchronize_sched() twice,
and down_read/up_read are still possible during this time; they just
use the slow path.

percpu_down_write() first forces the readers to use rw_semaphore and
increment the "slow" counter to take the lock for reading, then it
takes that rw_semaphore for writing and blocks the readers.

Also, with this patch the code relies on the documented behaviour of
synchronize_sched(); it doesn't try to pair synchronize_sched() with
barrier.

Signed-off-by: Oleg Nesterov 
---
 include/linux/percpu-rwsem.h |   83 +
 lib/Makefile |2 +-
 lib/percpu-rwsem.c   |  123 ++
 3 files changed, 137 insertions(+), 71 deletions(-)
 create mode 100644 lib/percpu-rwsem.c

diff --git a/include/linux/percpu-rwsem.h b/include/linux/percpu-rwsem.h
index 250a4ac..592f0d6 100644
--- a/include/linux/percpu-rwsem.h
+++ b/include/linux/percpu-rwsem.h
@@ -2,82 +2,25 @@
 #define _LINUX_PERCPU_RWSEM_H
 
 #include 
+#include 
 #include 
-#include 
-#include 
+#include 
 
 struct percpu_rw_semaphore {
-   unsigned __percpu *counters;
-   bool locked;
-   struct mutex mtx;
+   unsigned int __percpu   *fast_read_ctr;
+   struct mutex            writer_mutex;
+   struct rw_semaphore     rw_sem;
+   atomic_t                slow_read_ctr;
+   wait_queue_head_t       write_waitq;
 };
 
-#define light_mb() barrier()
-#define heavy_mb() synchronize_sched()
+extern void percpu_down_read(struct percpu_rw_semaphore *);
+extern void percpu_up_read(struct percpu_rw_semaphore *);
 
-static inline void percpu_down_read(struct percpu_rw_semaphore *p)
-{
-   rcu_read_lock_sched();
-   if (unlikely(p->locked)) {
-   rcu_read_unlock_sched();
-   mutex_lock(&p->mtx);
-   this_cpu_inc(*p->counters);
-   mutex_unlock(&p->mtx);
-   return;
-   }
-   this_cpu_inc(*p->counters);
-   rcu_read_unlock_sched();
-   light_mb(); /* A, between read of p->locked and read of data, paired 
with D */
-}
+extern void percpu_down_write(struct percpu_rw_semaphore *);
+extern void percpu_up_write(struct percpu_rw_semaphore *);
 
-static inline void percpu_up_read(struct percpu_rw_semaphore *p)
-{
-   light_mb(); /* B, between read of the data and write to p->counter, 
paired with C */
-   this_cpu_dec(*p->counters);
-}
-
-static inline unsigned __percpu_count(unsigned __percpu *counters)
-{
-   unsigned total = 0;
-   int cpu;
-
-   for_each_possible_cpu(cpu)
-   total += ACCESS_ONCE(*per_cpu_ptr(counters, cpu));
-
-   return total;
-}
-
-static inline void percpu_down_write(struct percpu_rw_semaphore *p)
-{
-   mutex_lock(&p->mtx);
-   p->locked = true;
-   synchronize_sched(); /* make sure that all readers exit the rcu_read_lock_sched region */
-   while (__percpu_count(p->counters))
-   msleep(1);
-   heavy_mb(); /* C, between read of p->counter and write to data, paired with B */
-}
-
-static inline void percpu_up_write(struct percpu_rw_semaphore *p)
-{
-   heavy_mb(); /* D, between write to data and write to p->locked, paired with A */
-   p->locked = false;
-   mutex_unlock(&p->mtx);
-}
-
-static inline int percpu_init_rwsem(struct percpu_rw_semaphore *p)
-{
-   p->counters = alloc_percpu(unsigned);
-   if (unlikely(!p->counters))
-   return -ENOMEM;
-   p->locked = false;
-   mutex_init(&p->mtx);
-   return 0;
-}
-
-static inline void percpu_free_rwsem(struct percpu_rw_semaphore *p)
-{
-   free_percpu(p->counters);
-   p->counters = NULL; /* catch use after free bugs */
-}
+extern int percpu_init_rwsem(struct percpu_rw_semaphore *);
+extern void percpu_free_rwsem(struct percpu_rw_semaphore *);
 
 #endif
diff --git a/lib/Makefile b/lib/Makefile
index 821a162..4dad4a7 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -12,7 +12,7 @@ lib-y := ctype.o string.o vsprintf.o cmdline.o \
 idr.o int_sqrt.o extable.o \
 sha1.o md5.o irq_regs.o reciprocal_div.o argv_split.o \
 proportions.o flex_proportions.o prio_heap.o ratelimit.o show_mem.o \
-is_single_threaded.o plist.o decompress.o
+is_single_threaded.o plist.o decompress.o percpu-rwsem.o
 
 lib-$(CONFIG_MMU) += ioremap.o
 lib-$(CONFIG_SMP) += cpumask.o
diff --git a/lib/percpu-rwsem.c b/lib/percpu-rwsem.c
new file mode 100644
index 000..0e3bc0f
--- /dev/null
+++ b/lib/percpu-rwsem.c
@@ -0,0 +1,123 @@
+#include 
+#include 
+#include 
+
+int percpu_init_rwsem(struct percpu_rw_semaphore *brw)
+{
+   brw->fast_read_ctr = alloc_percpu(int);
+   if

[PATCH v2 0/1] percpu_rw_semaphore: reimplement to not block the readers unnecessarily

2012-11-02 Thread Oleg Nesterov

On 11/01, Linus Torvalds wrote:
>
> On Wed, Oct 31, 2012 at 12:41 PM, Oleg Nesterov  wrote:
> >
> > With this patch down_read/up_read does synchronize_sched() twice and
> > down_read/up_read are still possible during this time, just they use
> > the slow path.
>
> The changelog is wrong (it's the write path, not read path, that does
> the synchronize_sched).

Fixed, thanks,

> >  struct percpu_rw_semaphore {
> > -   unsigned __percpu *counters;
> > -   bool locked;
> > -   struct mutex mtx;
> > +   int __percpu    *fast_read_ctr;
>
> This change is wrong.
>
> You must not make the 'fast_read_ctr' thing be an int. Or at least you
> need to be a hell of a lot more careful about it.
>
> Why?
>
> Because the readers update the counters while possibly moving around
> cpu's, the increment and decrement of the counters may be on different
> CPU's. But that means that when you add all the counters together,
> things can overflow (only the final sum is meaningful). And THAT in
> turn means that you should not use a signed count, for the simple
> reason that signed integers don't have well-behaved overflow behavior
> in C.

Yes, Mikulas has pointed this out too, but I forgot to make it "unsigned".

> Now, I doubt you'll find an architecture or C compiler where this will
> actually ever make a difference,

Yes. And we have other examples, say, mnt->mnt_pcp->mnt_writers is "int".

> but the fact remains that you
> shouldn't use signed integers for counters like this. You should use
> unsigned, and you should rely on the well-defined modulo-2**n
> semantics.

OK, I changed this.

But please note that clear_fast_ctr() still returns "int", even if it
uses "unsigned" to calculate the result. Because we use this value for
atomic_add(int i) and it can be actually negative, so to me it looks
a bit better this way even if the generated code is the same.
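
The wraparound argument above can be illustrated with a small user-space
sketch. This is not the code from the patch: the array merely stands in for
per-CPU storage, and every name (demo_ctr, demo_inc, demo_dec,
demo_clear_and_sum) is hypothetical. It only shows why a single counter may
wrap while the sum stays meaningful, and why handing the sum back as int is
convenient for a signed atomic_add().

```c
#include <assert.h>

#define DEMO_NR_CPUS 4  /* hypothetical CPU count, for illustration only */

/* One counter per "CPU"; unsigned so that wraparound is well defined. */
static unsigned int demo_ctr[DEMO_NR_CPUS];

/* A reader may take the lock on one CPU ... */
static void demo_inc(int cpu)
{
    assert(cpu >= 0 && cpu < DEMO_NR_CPUS);
    demo_ctr[cpu]++;
}

/* ... and release it on another, so a single counter can wrap below zero. */
static void demo_dec(int cpu)
{
    assert(cpu >= 0 && cpu < DEMO_NR_CPUS);
    demo_ctr[cpu]--;
}

/*
 * Sum and clear all counters using modulo-2**n arithmetic, then return the
 * result as int. Converting an out-of-range unsigned value to int is
 * implementation-defined in C; gcc and clang wrap, which this sketch assumes.
 */
static int demo_clear_and_sum(void)
{
    unsigned int sum = 0;
    int cpu;

    for (cpu = 0; cpu < DEMO_NR_CPUS; cpu++) {
        sum += demo_ctr[cpu];
        demo_ctr[cpu] = 0;
    }
    return (int)sum;
}
```

Two increments on CPU 0 and one decrement on CPU 1 leave one counter at 2
and the other wrapped to UINT_MAX, yet the sum comes out as 1. A lone
decrement sums to -1, which is the "can be actually negative" case.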

> I'd also like to see a comment somewhere in the source code about the
> whole algorithm and the rules.

Added the comments before down_read and down_write.

> Other than that, I guess it looks ok.

Great, please see v2.

I am not sure I addressed Paul's concerns, so I guess I need his ack.

Oleg.


Re: [PATCH v2 6/9] rcu: use __this_cpu_read helper instead of per_cpu_ptr(p, raw_smp_processor_id())

2012-11-02 Thread Paul E. McKenney

On Sat, Nov 03, 2012 at 12:01:47AM +0800, Shan Wei wrote:
> From: Shan Wei 
> 
> Signed-off-by: Shan Wei 
> ---
>  kernel/rcutree.c |2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> index 74df86b..441b945 100644
> --- a/kernel/rcutree.c
> +++ b/kernel/rcutree.c
> @@ -1960,7 +1960,7 @@ static void force_quiescent_state(struct rcu_state *rsp)
>   struct rcu_node *rnp_old = NULL;
> 
>   /* Funnel through hierarchy to reduce memory contention. */
> - rnp = per_cpu_ptr(rsp->rda, raw_smp_processor_id())->mynode;
> + rnp = __this_cpu_read(rsp->rda->mynode);

OK, I'll bite...  Why this instead of:

rnp = __this_cpu_read(rsp->rda)->mynode;

Thanx, Paul

>   for (; rnp != NULL; rnp = rnp->parent) {
>   ret = (ACCESS_ONCE(rsp->gp_flags) & RCU_GP_FLAG_FQS) ||
> !raw_spin_trylock(>fqslock);
> -- 
> 1.7.1
> 
> 

[PATCH 16/19] tracing: Make tracing_enabled be equal to tracing_on

2012-11-02 Thread Steven Rostedt

From: Steven Rostedt 

The tracing_enabled file has been deprecated as it never was able
to serve its purpose well. The tracing_on file has taken over.
Instead of having code to keep tracing_enabled, have the tracing_enabled
file just set tracing_on, and remove the tracing_enabled variable.

This allows us to remove the tracing_enabled file. The removal is done
in a separate change set rather than here in case we find some lonely
userspace tool that requires the file to exist. In that case the
removal patch will get reverted, but this one will not.

Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Signed-off-by: Steven Rostedt 
---
 kernel/trace/trace.c  |   79 +++--
 kernel/trace/trace_selftest.c |   12 ---
 2 files changed, 5 insertions(+), 86 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index d1d8039..3c9b96a 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -205,20 +205,9 @@ static struct trace_array  max_tr;
 
 static DEFINE_PER_CPU(struct trace_array_cpu, max_tr_data);
 
-/* tracer_enabled is used to toggle activation of a tracer */
-static int tracer_enabled = 1;
-
-/**
- * tracing_is_enabled - return tracer_enabled status
- *
- * This function is used by other tracers to know the status
- * of the tracer_enabled flag.  Tracers may use this function
- * to know if it should enable their features when starting
- * up. See irqsoff tracer for an example (start_irqsoff_tracer).
- */
 int tracing_is_enabled(void)
 {
-   return tracer_enabled;
+   return tracing_is_on();
 }
 
 /*
@@ -1112,8 +1101,7 @@ void trace_find_cmdline(int pid, char comm[])
 
 void tracing_record_cmdline(struct task_struct *tsk)
 {
-   if (atomic_read(_record_cmdline_disabled) || !tracer_enabled ||
-   !tracing_is_on())
+   if (atomic_read(_record_cmdline_disabled) || !tracing_is_on())
return;
 
if (!__this_cpu_read(trace_cmdline_save))
@@ -2967,56 +2955,6 @@ static const struct file_operations tracing_saved_cmdlines_fops = {
 };
 
 static ssize_t
-tracing_ctrl_read(struct file *filp, char __user *ubuf,
- size_t cnt, loff_t *ppos)
-{
-   char buf[64];
-   int r;
-
-   r = sprintf(buf, "%u\n", tracer_enabled);
-   return simple_read_from_buffer(ubuf, cnt, ppos, buf, r);
-}
-
-static ssize_t
-tracing_ctrl_write(struct file *filp, const char __user *ubuf,
-  size_t cnt, loff_t *ppos)
-{
-   struct trace_array *tr = filp->private_data;
-   unsigned long val;
-   int ret;
-
-   ret = kstrtoul_from_user(ubuf, cnt, 10, &val);
-   if (ret)
-   return ret;
-
-   val = !!val;
-
-   mutex_lock(_types_lock);
-   if (tracer_enabled ^ val) {
-
-   /* Only need to warn if this is used to change the state */
-   WARN_ONCE(1, "tracing_enabled is deprecated. Use tracing_on");
-
-   if (val) {
-   tracer_enabled = 1;
-   if (current_trace->start)
-   current_trace->start(tr);
-   tracing_start();
-   } else {
-   tracer_enabled = 0;
-   tracing_stop();
-   if (current_trace->stop)
-   current_trace->stop(tr);
-   }
-   }
-   mutex_unlock(_types_lock);
-
-   *ppos += cnt;
-
-   return cnt;
-}
-
-static ssize_t
 tracing_set_trace_read(struct file *filp, char __user *ubuf,
   size_t cnt, loff_t *ppos)
 {
@@ -3469,7 +3407,7 @@ static int tracing_wait_pipe(struct file *filp)
return -EINTR;
 
/*
-* We block until we read something and tracing is disabled.
+* We block until we read something and tracing is enabled.
 * We still block if tracing is disabled, but we have never
 * read anything. This allows a user to cat this file, and
 * then enable tracing. But after we have read something,
@@ -3477,7 +3415,7 @@ static int tracing_wait_pipe(struct file *filp)
 *
 * iter->pos will be 0 if we haven't read anything.
 */
-   if (!tracer_enabled && iter->pos)
+   if (tracing_is_enabled() && iter->pos)
break;
}
 
@@ -4076,13 +4014,6 @@ static const struct file_operations tracing_max_lat_fops = {
.llseek = generic_file_llseek,
 };
 
-static const struct file_operations tracing_ctrl_fops = {
-   .open   = tracing_open_generic,
-   .read   = tracing_ctrl_read,
-   .write  = tracing_ctrl_write,
-   .llseek = generic_file_llseek,
-};
-
 static const struct file_operations set_tracer_fops = {
.open   = tracing_open_generic,
.read   =

[PATCH 15/19] tracing: Remove unused function unregister_tracer()

2012-11-02 Thread Steven Rostedt

From: Steven Rostedt 

The function register_tracer() is only used by kernel core code,
that never needs to remove the tracer. As trace_events have become
the main way to add new tracing to the kernel, the need to
unregister a tracer has diminished. Remove the unused function
unregister_tracer(). If a need arises where we need it, then we
can always add it back.

Signed-off-by: Steven Rostedt 
---
 kernel/trace/trace.c |   26 --
 kernel/trace/trace.h |1 -
 2 files changed, 27 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 6ed6013..d1d8039 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -882,32 +882,6 @@ int register_tracer(struct tracer *type)
return ret;
 }
 
-void unregister_tracer(struct tracer *type)
-{
-   struct tracer **t;
-
-   mutex_lock(_types_lock);
-   for (t = _types; *t; t = &(*t)->next) {
-   if (*t == type)
-   goto found;
-   }
-   pr_info("Tracer %s not registered\n", type->name);
-   goto out;
-
- found:
-   *t = (*t)->next;
-
-   if (type == current_trace && tracer_enabled) {
-   tracer_enabled = 0;
-   tracing_stop();
-   if (current_trace->stop)
-   current_trace->stop(_trace);
-   current_trace = _trace;
-   }
-out:
-   mutex_unlock(_types_lock);
-}
-
 void tracing_reset(struct trace_array *tr, int cpu)
 {
struct ring_buffer *buffer = tr->buffer;
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 839ae00..3e8a176 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -410,7 +410,6 @@ void tracing_sched_switch_assign_trace(struct trace_array *tr);
 void tracing_stop_sched_switch_record(void);
 void tracing_start_sched_switch_record(void);
 int register_tracer(struct tracer *type);
-void unregister_tracer(struct tracer *type);
 int is_tracing_stopped(void);
 enum trace_file_type {
TRACE_FILE_LAT_FMT  = 1,
-- 
1.7.10.4





[PATCH 01/19] tracing: Replace strict_strto* with kstrto*

2012-11-02 Thread Steven Rostedt

From: Daniel Walter 

 * replace the old strict_strto* string conversions with kstrto*

Link: http://lkml.kernel.org/r/20120926200838.gc1...@0x90.at

Signed-off-by: Daniel Walter 
Signed-off-by: Steven Rostedt 
---
 kernel/trace/ftrace.c  |2 +-
 kernel/trace/trace.c   |2 +-
 kernel/trace/trace_events_filter.c |4 ++--
 kernel/trace/trace_functions.c |2 +-
 kernel/trace/trace_kprobe.c|2 +-
 kernel/trace/trace_probe.c |   14 +++---
 kernel/trace/trace_uprobe.c|2 +-
 7 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 9dcf15d..60ad606 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -4381,7 +4381,7 @@ ftrace_pid_write(struct file *filp, const char __user *ubuf,
if (strlen(tmp) == 0)
return 1;
 
-   ret = strict_strtol(tmp, 10, );
+   ret = kstrtol(tmp, 10, );
if (ret < 0)
return ret;
 
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 31e4f55..f6928ed 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -431,7 +431,7 @@ static int __init set_tracing_thresh(char *str)
 
if (!str)
return 0;
-   ret = strict_strtoul(str, 0, &threshold);
+   ret = kstrtoul(str, 0, &threshold);
if (ret < 0)
return 0;
tracing_thresh = threshold * 1000;
diff --git a/kernel/trace/trace_events_filter.c b/kernel/trace/trace_events_filter.c
index c154797..e5b0ca8 100644
--- a/kernel/trace/trace_events_filter.c
+++ b/kernel/trace/trace_events_filter.c
@@ -1000,9 +1000,9 @@ static int init_pred(struct filter_parse_state *ps,
}
} else {
if (field->is_signed)
-   ret = strict_strtoll(pred->regex.pattern, 0, );
+   ret = kstrtoll(pred->regex.pattern, 0, );
else
-   ret = strict_strtoull(pred->regex.pattern, 0, );
+   ret = kstrtoull(pred->regex.pattern, 0, );
if (ret) {
parse_error(ps, FILT_ERR_ILLEGAL_INTVAL, 0);
return -EINVAL;
diff --git a/kernel/trace/trace_functions.c b/kernel/trace/trace_functions.c
index 507a7a9..618dcf8 100644
--- a/kernel/trace/trace_functions.c
+++ b/kernel/trace/trace_functions.c
@@ -366,7 +366,7 @@ ftrace_trace_onoff_callback(struct ftrace_hash *hash,
 * We use the callback data field (which is a pointer)
 * as our counter.
 */
-   ret = strict_strtoul(number, 0, (unsigned long *));
+   ret = kstrtoul(number, 0, (unsigned long *));
if (ret)
return ret;
 
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 1a21170..5a3c533 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -444,7 +444,7 @@ static int create_trace_probe(int argc, char **argv)
return -EINVAL;
}
/* an address specified */
-   ret = strict_strtoul([1][0], 0, (unsigned long *));
+   ret = kstrtoul([1][0], 0, (unsigned long *));
if (ret) {
pr_info("Failed to parse address.\n");
return ret;
diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index daa9980..412e959 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -441,7 +441,7 @@ static const struct fetch_type *find_fetch_type(const char *type)
goto fail;
 
type++;
-   if (strict_strtoul(type, 0, ))
+   if (kstrtoul(type, 0, ))
goto fail;
 
switch (bs) {
@@ -501,8 +501,8 @@ int traceprobe_split_symbol_offset(char *symbol, unsigned long *offset)
 
tmp = strchr(symbol, '+');
if (tmp) {
-   /* skip sign because strict_strtol doesn't accept '+' */
-   ret = strict_strtoul(tmp + 1, 0, offset);
+   /* skip sign because kstrtoul doesn't accept '+' */
+   ret = kstrtoul(tmp + 1, 0, offset);
if (ret)
return ret;
 
@@ -533,7 +533,7 @@ static int parse_probe_vars(char *arg, const struct fetch_type *t,
else
ret = -EINVAL;
} else if (isdigit(arg[5])) {
-   ret = strict_strtoul(arg + 5, 10, );
+   ret = kstrtoul(arg + 5, 10, );
if (ret || param > PARAM_MAX_STACK)
ret = -EINVAL;
else {
@@ -579,7 +579,7 @@ static int parse_probe_arg(char *arg, const struct fetch_type *t,
 
case '@':   /* memory or symbol */
if (isdigit(arg[1])) {
-   ret = strict_strtoul(arg + 1, 0, );
+   ret = kstrtoul(arg + 1, 0, );

[PATCH 18/19] tracing: Use irq_work for wake ups and remove *_nowake_*() functions

2012-11-02 Thread Steven Rostedt

From: Steven Rostedt 

Have the ring buffer commit function use the irq_work infrastructure to
wake up any waiters waiting on the ring buffer for new data. The irq_work
was created for such a purpose, where doing the actual wake up at the
time of adding data is too dangerous, as an event or function trace may
be in the midst of the work queue locks and cause deadlocks. The irq_work
will either delay the action to the next timer interrupt, or trigger an IPI
to itself forcing an interrupt to do the work (in a safe location).

With irq_work, all ring buffer commits can safely do wakeups, removing
the need for the ring buffer commit "nowake" variants, which were used
by events and function tracing. All commits can now safely use the
normal commit, and the "nowake" variants can be removed.
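
The deferral described above can be sketched in plain user-space C. This is
only a model under stated assumptions: struct demo_work is a hypothetical
stand-in and not the kernel's struct irq_work, and demo_run_pending() plays
the role of the interrupt that later runs the queued work. None of these
names come from the patch.

```c
#include <stdbool.h>

/* Hypothetical deferred-work item; not the kernel's struct irq_work. */
struct demo_work {
    bool pending;
    void (*func)(struct demo_work *work);
};

static bool demo_wakeup_needed;  /* set while a reader waits for data */
static int demo_wakeups_done;    /* counts wakeups actually performed */

/* Runs later, in a context where waking a reader is safe. */
static void demo_wakeup_handler(struct demo_work *work)
{
    (void)work;
    demo_wakeups_done++;
}

static struct demo_work demo_wakeup_work = { false, demo_wakeup_handler };

/* Hot path: the commit only marks the work pending, it never wakes directly. */
static void demo_commit(void)
{
    if (demo_wakeup_needed) {
        demo_wakeup_needed = false;
        demo_wakeup_work.pending = true;
    }
}

/* Stand-in for the interrupt (timer tick or self-IPI) that runs queued work. */
static void demo_run_pending(void)
{
    if (demo_wakeup_work.pending) {
        demo_wakeup_work.pending = false;
        demo_wakeup_work.func(&demo_wakeup_work);
    }
}
```

The point of the split is that demo_commit() takes no locks and calls no
scheduler code, so it is safe from any tracing context; the wakeup itself
happens only in demo_run_pending().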

Cc: Peter Zijlstra 
Signed-off-by: Steven Rostedt 
---
 include/linux/ftrace_event.h  |   14 ++---
 include/trace/ftrace.h|3 +-
 kernel/trace/Kconfig  |1 +
 kernel/trace/trace.c  |  121 +
 kernel/trace/trace.h  |5 --
 kernel/trace/trace_events.c   |2 +-
 kernel/trace/trace_kprobe.c   |8 +--
 kernel/trace/trace_sched_switch.c |2 +-
 kernel/trace/trace_selftest.c |1 +
 9 files changed, 84 insertions(+), 73 deletions(-)

diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index 642928c..b80c8dd 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -127,13 +127,13 @@ trace_current_buffer_lock_reserve(struct ring_buffer **current_buffer,
 void trace_current_buffer_unlock_commit(struct ring_buffer *buffer,
struct ring_buffer_event *event,
unsigned long flags, int pc);
-void trace_nowake_buffer_unlock_commit(struct ring_buffer *buffer,
-  struct ring_buffer_event *event,
-   unsigned long flags, int pc);
-void trace_nowake_buffer_unlock_commit_regs(struct ring_buffer *buffer,
-   struct ring_buffer_event *event,
-   unsigned long flags, int pc,
-   struct pt_regs *regs);
+void trace_buffer_unlock_commit(struct ring_buffer *buffer,
+   struct ring_buffer_event *event,
+   unsigned long flags, int pc);
+void trace_buffer_unlock_commit_regs(struct ring_buffer *buffer,
+struct ring_buffer_event *event,
+unsigned long flags, int pc,
+struct pt_regs *regs);
 void trace_current_buffer_discard_commit(struct ring_buffer *buffer,
 struct ring_buffer_event *event);
 
diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
index a763888..698f2a8 100644
--- a/include/trace/ftrace.h
+++ b/include/trace/ftrace.h
@@ -545,8 +545,7 @@ ftrace_raw_event_##call(void *__data, proto)
\
{ assign; } \
\
if (!filter_current_check_discard(buffer, event_call, entry, event)) \
-   trace_nowake_buffer_unlock_commit(buffer,   \
- event, irq_flags, pc); \
+   trace_buffer_unlock_commit(buffer, event, irq_flags, pc); \
 }
 /*
  * The ftrace_test_probe is compiled out, it is only here as a build time check
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index 4cea4f4..5d89335 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -119,6 +119,7 @@ config TRACING
select BINARY_PRINTF
select EVENT_TRACING
select TRACE_CLOCK
+   select IRQ_WORK
 
 config GENERIC_TRACER
bool
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index d5cbc0d..37d1c70 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -85,6 +86,14 @@ static int dummy_set_flag(u32 old_flags, u32 bit, int set)
 static DEFINE_PER_CPU(bool, trace_cmdline_save);
 
 /*
+ * When a reader is waiting for data, then this variable is
+ * set to true.
+ */
+static bool trace_wakeup_needed;
+
+static struct irq_work trace_work_wakeup;
+
+/*
  * Kill all tracing for good (never come back).
  * It is initialized to 1 but will turn to zero if the initialization
  * of the tracer is successful. But that is the only place that sets
@@ -329,12 +338,18 @@ unsigned long trace_flags = TRACE_ITER_PRINT_PARENT | TRACE_ITER_PRINTK |
 static int trace_stop_count;
 static DEFINE_RAW_SPINLOCK(tracing_start_lock);
 
-static void wakeup_work_handler(struct work_struct *work)

[PATCH 17/19] tracing: Remove deprecated tracing_enabled file

2012-11-02 Thread Steven Rostedt

From: Steven Rostedt 

The tracing_enabled file was used as a quick way to stop
tracers, and try to bring down overhead for things like
the latency tracers (irqsoff, wakeup, etc). But it didn't
work that well.

The tracing_on file was created as a really fast way to
stop recording into the ftrace ring buffer, and it can interact
with the kernel. That is, a tracing_off() call in the kernel
can disable recording of events, and then from userspace one
could echo 1 into the tracing_on file to continue it. The
tracing_enabled file did too much to allow for this.
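
As a rough model of that interaction (not the kernel implementation; all
demo_* names are hypothetical), the recording state can be seen as one flag
that kernel code clears and a userspace write sets again:

```c
#include <stdbool.h>

static bool demo_recording = true;  /* models the tracing_on state */
static int demo_events_recorded;    /* events that reached the buffer */

/* Models an in-kernel tracing_off() call. */
static void demo_tracing_off(void)
{
    demo_recording = false;
}

/* Models a userspace write of "0" or "1" to the tracing_on file. */
static void demo_tracing_on_write(const char *buf)
{
    if (buf[0] == '1')
        demo_recording = true;
    else if (buf[0] == '0')
        demo_recording = false;
}

/* Events are dropped while recording is off. */
static void demo_record_event(void)
{
    if (demo_recording)
        demo_events_recorded++;
}
```

Because the only state is the flag itself, either side can flip it cheaply
without starting or stopping the tracers.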

The tracing_on file has taken over as the way to start and stop tracing,
and the tracing_enabled file should not be used. But because of
its existence, it still confuses people. Over a year ago the
following commit was added:

 commit 6752ab4a9c30d5411b2dfdb251a3f1cb18aae487
 Author: Steven Rostedt 
 Date:   Tue Feb 8 13:54:06 2011 -0500

tracing: Deprecate tracing_enabled for tracing_on

This commit added a WARN_ON() if the tracing_enabled file's variable
was changed. After this was added, only LatencyTop complained, and
they soon fixed their tool as there was no reason that LatencyTop
should touch this file as it was using the perf ring buffers which
this file does not interact with. But since that time no one else
has complained about this WARN_ON(). Thus it is safe to assume that
this file is no longer needed. Time to get rid of it.

Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Signed-off-by: Steven Rostedt 
---
 kernel/trace/trace.c |3 ---
 1 file changed, 3 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 3c9b96a..d5cbc0d 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -4788,9 +4788,6 @@ static __init int tracer_init_debugfs(void)
 
d_tracer = tracing_init_dentry();
 
-   trace_create_file("tracing_enabled", 0644, d_tracer,
- _trace, _simple_fops);
-
trace_create_file("trace_options", 0644, d_tracer,
NULL, _iter_fops);
 
-- 
1.7.10.4





[PATCH 11/19] linux/kernel.h: Remove duplicate trace_printk declaration

2012-11-02 Thread Steven Rostedt

From: Michal Hocko 

With !CONFIG_TRACING, trace_printk is both declared and defined (as an
empty function). The declaration is redundant, so it can be removed.

Link: http://lkml.kernel.org/r/1351172511-18125-1-git-send-email-mho...@suse.cz

Signed-off-by: Michal Hocko 
Signed-off-by: Steven Rostedt 
---
 include/linux/kernel.h |7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index a123b13..7785d5d 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -527,9 +527,6 @@ __ftrace_vprintk(unsigned long ip, const char *fmt, va_list ap);
 
 extern void ftrace_dump(enum ftrace_dump_mode oops_dump_mode);
 #else
-static inline __printf(1, 2)
-int trace_printk(const char *fmt, ...);
-
 static inline void tracing_start(void) { }
 static inline void tracing_stop(void) { }
 static inline void ftrace_off_permanent(void) { }
@@ -539,8 +536,8 @@ static inline void tracing_on(void) { }
 static inline void tracing_off(void) { }
 static inline int tracing_is_on(void) { return 0; }
 
-static inline int
-trace_printk(const char *fmt, ...)
+static inline __printf(1, 2)
+int trace_printk(const char *fmt, ...)
 {
return 0;
 }
-- 
1.7.10.4





[PATCH 14/19] tracing: Separate open function from set_event and available_events

2012-11-02 Thread Steven Rostedt

From: Steven Rostedt 

The open function used by available_events is the same as set_event even
though it uses different seq functions. This causes a side effect of
writing into available_events clearing all events, even though
available_events is supposed to be read-only.

There's no reason to keep a single function for just the open and have
both use different functions for everything else. It is a little
confusing and causes strange behavior. Just have each have their own
function.
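
The side effect can be modelled outside the kernel. In this sketch the
DEMO_* flags are hypothetical stand-ins for FMODE_WRITE and O_TRUNC, and
the functions only mimic the shape of the open handlers; they are not the
code from the patch.

```c
/* Hypothetical stand-ins for FMODE_WRITE and O_TRUNC. */
#define DEMO_MODE_WRITE 0x1u
#define DEMO_FLAG_TRUNC 0x2u

static int demo_enabled_events = 3;  /* pretend three events are enabled */

static void demo_clear_events(void)
{
    demo_enabled_events = 0;
}

/* Old behaviour: one open handler shared by both files. */
static void demo_shared_open(unsigned int mode, unsigned int flags)
{
    if ((mode & DEMO_MODE_WRITE) && (flags & DEMO_FLAG_TRUNC))
        demo_clear_events();
}

/* New behaviour: opening available_events never clears anything. */
static void demo_avail_open(unsigned int mode, unsigned int flags)
{
    (void)mode;
    (void)flags;
}

/* New behaviour: only set_event clears events on a truncating open. */
static void demo_set_open(unsigned int mode, unsigned int flags)
{
    if ((mode & DEMO_MODE_WRITE) && (flags & DEMO_FLAG_TRUNC))
        demo_clear_events();
}
```

With the shared handler, a truncating write-open of either file wipes the
enabled events; with separate handlers only the set_event path can do so.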

Signed-off-by: Steven Rostedt 
---
 kernel/trace/trace_events.c |   46 +--
 1 file changed, 27 insertions(+), 19 deletions(-)

diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index dec47e7..cb2df3b 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -491,19 +491,6 @@ static void t_stop(struct seq_file *m, void *p)
mutex_unlock(_mutex);
 }
 
-static int
-ftrace_event_seq_open(struct inode *inode, struct file *file)
-{
-   const struct seq_operations *seq_ops;
-
-   if ((file->f_mode & FMODE_WRITE) &&
-   (file->f_flags & O_TRUNC))
-   ftrace_clear_events();
-
-   seq_ops = inode->i_private;
-   return seq_open(file, seq_ops);
-}
-
 static ssize_t
 event_enable_read(struct file *filp, char __user *ubuf, size_t cnt,
  loff_t *ppos)
@@ -980,6 +967,9 @@ show_header(struct file *filp, char __user *ubuf, size_t cnt, loff_t *ppos)
return r;
 }
 
+static int ftrace_event_avail_open(struct inode *inode, struct file *file);
+static int ftrace_event_set_open(struct inode *inode, struct file *file);
+
 static const struct seq_operations show_event_seq_ops = {
.start = t_start,
.next = t_next,
@@ -995,14 +985,14 @@ static const struct seq_operations show_set_event_seq_ops = {
 };
 
 static const struct file_operations ftrace_avail_fops = {
-   .open = ftrace_event_seq_open,
+   .open = ftrace_event_avail_open,
.read = seq_read,
.llseek = seq_lseek,
.release = seq_release,
 };
 
 static const struct file_operations ftrace_set_event_fops = {
-   .open = ftrace_event_seq_open,
+   .open = ftrace_event_set_open,
.read = seq_read,
.write = ftrace_event_write,
.llseek = seq_lseek,
@@ -1078,6 +1068,26 @@ static struct dentry *event_trace_events_dir(void)
return d_events;
 }
 
+static int
+ftrace_event_avail_open(struct inode *inode, struct file *file)
+{
+   const struct seq_operations *seq_ops = _event_seq_ops;
+
+   return seq_open(file, seq_ops);
+}
+
+static int
+ftrace_event_set_open(struct inode *inode, struct file *file)
+{
+   const struct seq_operations *seq_ops = _set_event_seq_ops;
+
+   if ((file->f_mode & FMODE_WRITE) &&
+   (file->f_flags & O_TRUNC))
+   ftrace_clear_events();
+
+   return seq_open(file, seq_ops);
+}
+
 static struct dentry *
 event_subsystem_dir(const char *name, struct dentry *d_events)
 {
@@ -1508,15 +1518,13 @@ static __init int event_trace_init(void)
return 0;
 
entry = debugfs_create_file("available_events", 0444, d_tracer,
-   (void *)_event_seq_ops,
-   _avail_fops);
+   NULL, _avail_fops);
if (!entry)
pr_warning("Could not create debugfs "
   "'available_events' entry\n");
 
entry = debugfs_create_file("set_event", 0644, d_tracer,
-   (void *)_set_event_seq_ops,
-   _set_event_fops);
+   NULL, _set_event_fops);
if (!entry)
pr_warning("Could not create debugfs "
   "'set_event' entry\n");
-- 
1.7.10.4





[PATCH 00/19] [GIT PULL][3.8] tracing: updates (v2)

2012-11-02 Thread Steven Rostedt


Ingo,

I removed the few problem patches (and their dependencies) and
retested the result.

Please pull the latest tip/perf/core-2 tree, which can be found at:

  git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace.git
tip/perf/core-2

Head SHA1: 7bcfaf54f591a0775254c4ea679faf615152ee3a


Daniel Walter (1):
  tracing: Replace strict_strto* with kstrto*

David Sharp (2):
  tracing: Trivial cleanup
  tracing: Reset ring buffer when changing trace_clocks

Hiraku Toyooka (1):
  tracing: Change tracer's integer flags to bool

Michal Hocko (1):
  linux/kernel.h: Remove duplicate trace_printk declaration

Slava Pestov (1):
  ring-buffer: Add a 'dropped events' counter

Steven Rostedt (11):
  tracing: Allow tracers to start at core initcall
  tracing: Expand ring buffer when trace_printk() is used
  tracing: Enable comm recording if trace_printk() is used
  tracing: Have tracing_sched_wakeup_trace() use standard unlock_commit
  tracing: Cache comms only after an event occurred
  tracing: Separate open function from set_event and available_events
  tracing: Remove unused function unregister_tracer()
  tracing: Make tracing_enabled be equal to tracing_on
  tracing: Remove deprecated tracing_enabled file
  tracing: Use irq_work for wake ups and remove *_nowake_*() functions
  tracing: Add trace_options kernel command line parameter

Vaibhav Nagarnaik (1):
  tracing: Cleanup unnecessary function declarations

Yoshihiro YUNOMAE (1):
  ring-buffer: Change unsigned long type of ring_buffer_oldest_event_ts() to u64


 Documentation/kernel-parameters.txt  |   16 ++
 include/linux/ftrace_event.h |   14 +-
 include/linux/kernel.h   |7 +-
 include/linux/ring_buffer.h  |3 +-
 include/trace/ftrace.h   |3 +-
 include/trace/syscall.h  |   23 ---
 kernel/trace/Kconfig |1 +
 kernel/trace/ftrace.c|6 +-
 kernel/trace/ring_buffer.c   |   51 -
 kernel/trace/trace.c |  372 +-
 kernel/trace/trace.h |   14 +-
 kernel/trace/trace_branch.c  |4 +-
 kernel/trace/trace_events.c  |   51 +++--
 kernel/trace/trace_events_filter.c   |4 +-
 kernel/trace/trace_functions.c   |5 +-
 kernel/trace/trace_functions_graph.c |6 +-
 kernel/trace/trace_irqsoff.c |   14 +-
 kernel/trace/trace_kprobe.c  |   10 +-
 kernel/trace/trace_probe.c   |   14 +-
 kernel/trace/trace_sched_switch.c|4 +-
 kernel/trace/trace_sched_wakeup.c|   10 +-
 kernel/trace/trace_selftest.c|   13 +-
 kernel/trace/trace_syscalls.c|   61 +++---
 kernel/trace/trace_uprobe.c  |2 +-
 24 files changed, 365 insertions(+), 343 deletions(-)



[PATCH 12/19] tracing: Reset ring buffer when changing trace_clocks

2012-11-02 Thread Steven Rostedt

From: David Sharp 

Because the "tsc" clock isn't in nanoseconds, the ring buffer must be
reset when changing clocks so that incomparable timestamps don't end up
in the same trace.
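
A minimal sketch of the idea, assuming a toy buffer rather than the real
ring buffer: switching the clock also empties the buffer, so values from
two different time bases are never stored side by side. All demo_* names
and the two fake clocks are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_BUF_LEN 8

typedef uint64_t (*demo_clock_fn)(void);

struct demo_buffer {
    demo_clock_fn clock;        /* current time source */
    uint64_t ts[DEMO_BUF_LEN];  /* recorded timestamps */
    size_t count;               /* number of valid entries */
};

/* Two fake clocks with incomparable units. */
static uint64_t demo_clock_ns(void)     { return 1000000; }  /* nanoseconds */
static uint64_t demo_clock_cycles(void) { return 42; }       /* raw cycles */

/* Record one timestamp from the current clock, if there is room. */
static void demo_record(struct demo_buffer *b)
{
    if (b->count < DEMO_BUF_LEN)
        b->ts[b->count++] = b->clock();
}

/* Changing the clock discards everything recorded with the old one. */
static void demo_set_clock(struct demo_buffer *b, demo_clock_fn fn)
{
    b->clock = fn;
    b->count = 0;
}
```

After demo_set_clock() the buffer only ever holds values from the new
clock, which is the property the patch restores.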

Tested: Confirmed switching clocks resets the trace buffer.

Google-Bug-Id: 6980623
Link: http://lkml.kernel.org/r/1349998076-15495-3-git-send-email-dhsh...@google.com

Cc: Masami Hiramatsu 
Signed-off-by: David Sharp 
Signed-off-by: Steven Rostedt 
---
 kernel/trace/trace.c |8 
 1 file changed, 8 insertions(+)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 88111b0..6ed6013 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -4073,6 +4073,14 @@ static ssize_t tracing_clock_write(struct file *filp, const char __user *ubuf,
if (max_tr.buffer)
ring_buffer_set_clock(max_tr.buffer, trace_clocks[i].func);
 
+   /*
+* New clock may not be consistent with the previous clock.
+* Reset the buffer so that it doesn't have incomparable timestamps.
+*/
+   tracing_reset_online_cpus(_trace);
+   if (max_tr.buffer)
+   tracing_reset_online_cpus(_tr);
+
mutex_unlock(_types_lock);
 
*fpos += cnt;
-- 
1.7.10.4





[PATCH 13/19] ring-buffer: Change unsigned long type of ring_buffer_oldest_event_ts() to u64

2012-11-02 Thread Steven Rostedt

From: Yoshihiro YUNOMAE 

ring_buffer_oldest_event_ts() should return a value of u64 type, because
ring_buffer_per_cpu->buffer_page->buffer_data_page->time_stamp is u64 type.
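
Why the return type matters can be shown with a small sketch. It assumes a
32-bit build, where unsigned long is 32 bits wide; demo_ulong32 is a
hypothetical typedef standing in for that type, and neither function is
taken from the patch.

```c
#include <stdint.h>

/* Stand-in for "unsigned long" on a 32-bit build (hypothetical typedef). */
typedef uint32_t demo_ulong32;

/* Old prototype shape: the 64-bit time stamp is narrowed on return. */
static demo_ulong32 demo_oldest_ts_narrow(uint64_t time_stamp)
{
    return (demo_ulong32)time_stamp;  /* keeps only the low 32 bits */
}

/* New prototype shape: the full 64-bit value survives. */
static uint64_t demo_oldest_ts_u64(uint64_t time_stamp)
{
    return time_stamp;
}
```

A time stamp of 5000000000 (five seconds in nanoseconds) comes back as
705032704 from the narrowing version, while the u64 version returns it
unchanged.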

Link: http://lkml.kernel.org/r/1349998076-15495-5-git-send-email-dhsh...@google.com

Cc: Frederic Weisbecker 
Cc: Vaibhav Nagarnaik 
Signed-off-by: Yoshihiro YUNOMAE 
Signed-off-by: David Sharp 
Signed-off-by: Steven Rostedt 
---
 include/linux/ring_buffer.h |2 +-
 kernel/trace/ring_buffer.c  |4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
index 2007375..519777e 100644
--- a/include/linux/ring_buffer.h
+++ b/include/linux/ring_buffer.h
@@ -159,7 +159,7 @@ int ring_buffer_record_is_on(struct ring_buffer *buffer);
 void ring_buffer_record_disable_cpu(struct ring_buffer *buffer, int cpu);
 void ring_buffer_record_enable_cpu(struct ring_buffer *buffer, int cpu);
 
-unsigned long ring_buffer_oldest_event_ts(struct ring_buffer *buffer, int cpu);
+u64 ring_buffer_oldest_event_ts(struct ring_buffer *buffer, int cpu);
 unsigned long ring_buffer_bytes_cpu(struct ring_buffer *buffer, int cpu);
 unsigned long ring_buffer_entries(struct ring_buffer *buffer);
 unsigned long ring_buffer_overruns(struct ring_buffer *buffer);
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 23a384b..3c7834c 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -2932,12 +2932,12 @@ rb_num_of_entries(struct ring_buffer_per_cpu 
*cpu_buffer)
  * @buffer: The ring buffer
  * @cpu: The per CPU buffer to read from.
  */
-unsigned long ring_buffer_oldest_event_ts(struct ring_buffer *buffer, int cpu)
+u64 ring_buffer_oldest_event_ts(struct ring_buffer *buffer, int cpu)
 {
unsigned long flags;
struct ring_buffer_per_cpu *cpu_buffer;
struct buffer_page *bpage;
-   unsigned long ret;
+   u64 ret;
 
if (!cpumask_test_cpu(cpu, buffer->cpumask))
return 0;
-- 
1.7.10.4





[PATCH 19/19] tracing: Add trace_options kernel command line parameter

2012-11-02 Thread Steven Rostedt

From: Steven Rostedt 

Add a trace_options kernel command line parameter to be able to
set options at early boot. For example, to enable stack dumps of
events, add the following:

  trace_options=stacktrace

Along with the trace_event option, this lets you get not only
traces of the events but also the stack dumps with them.
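
As a rough userspace sketch of the parsing involved: split a comma-delimited list and treat a leading "no" as "clear this option". The option table and bit values are made up for the example, and the splitting uses strchr() rather than the kernel's strsep() to stay portable.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical option table; bit values are invented for this sketch. */
struct opt {
    const char *name;
    unsigned int bit;
};

static const struct opt known_opts[] = {
    { "stacktrace", 1u << 0 },
    { "overwrite",  1u << 1 },
    { "printk",     1u << 2 },
};

/* Apply one option; a leading "no" clears the bit. Returns 0 or -1. */
static int set_one_option(const char *name, unsigned int *flags)
{
    int neg = 0;
    size_t i;

    if (strncmp(name, "no", 2) == 0) {
        neg = 1;
        name += 2;
    }
    for (i = 0; i < sizeof(known_opts) / sizeof(known_opts[0]); i++) {
        if (strcmp(name, known_opts[i].name) == 0) {
            if (neg)
                *flags &= ~known_opts[i].bit;
            else
                *flags |= known_opts[i].bit;
            return 0;
        }
    }
    return -1;
}

/*
 * Split a writable, comma-delimited list in place and apply each entry.
 * Returns the number of entries that were not recognized.
 */
static int apply_option_list(char *list, unsigned int *flags)
{
    int unknown = 0;

    while (list) {
        char *comma = strchr(list, ',');

        if (comma)
            *comma++ = '\0';
        if (set_one_option(list, flags) != 0)
            unknown++;
        list = comma;
    }
    return unknown;
}
```

The list is modified in place, so the caller has to pass a writable copy, much as the patch copies the boot string into its own buffer first.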

Requested-by: Frederic Weisbecker 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Signed-off-by: Steven Rostedt 
---
 Documentation/kernel-parameters.txt |   16 +++
 kernel/trace/trace.c|   54 +--
 2 files changed, 55 insertions(+), 15 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 9776f06..2b48c52 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2859,6 +2859,22 @@ bytes respectively. Such letter suffixes can also be 
entirely omitted.
to facilitate early boot debugging.
See also Documentation/trace/events.txt
 
+   trace_options=[option-list]
+   [FTRACE] Enable or disable tracer options at boot.
+   The option-list is a comma delimited list of options
+   that can be enabled or disabled just as if you were
+   to echo the option name into
+
+   /sys/kernel/debug/tracing/trace_options
+
+   For example, to enable stacktrace option (to dump the
+   stack trace of each event), add to the command line:
+
+ trace_options=stacktrace
+
+   See also Documentation/trace/ftrace.txt "trace options"
+   section.
+
transparent_hugepage=
[KNL]
Format: [always|madvise|never]
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 37d1c70..c1434b5 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -155,6 +155,18 @@ static int __init set_ftrace_dump_on_oops(char *str)
 }
 __setup("ftrace_dump_on_oops", set_ftrace_dump_on_oops);
 
+
+static char trace_boot_options_buf[MAX_TRACER_SIZE] __initdata;
+static char *trace_boot_options __initdata;
+
+static int __init set_trace_boot_options(char *str)
+{
+   strncpy(trace_boot_options_buf, str, MAX_TRACER_SIZE);
+   trace_boot_options = trace_boot_options_buf;
+   return 0;
+}
+__setup("trace_options=", set_trace_boot_options);
+
 unsigned long long ns2usecs(cycle_t nsec)
 {
nsec += 500;
@@ -2838,24 +2850,14 @@ static void set_tracer_flags(unsigned int mask, int 
enabled)
trace_printk_start_stop_comm(enabled);
 }
 
-static ssize_t
-tracing_trace_options_write(struct file *filp, const char __user *ubuf,
-   size_t cnt, loff_t *ppos)
+static int trace_set_options(char *option)
 {
-   char buf[64];
char *cmp;
int neg = 0;
-   int ret;
+   int ret = 0;
int i;
 
-   if (cnt >= sizeof(buf))
-   return -EINVAL;
-
-   if (copy_from_user(&buf, ubuf, cnt))
-   return -EFAULT;
-
-   buf[cnt] = 0;
-   cmp = strstrip(buf);
+   cmp = strstrip(option);
 
if (strncmp(cmp, "no", 2) == 0) {
neg = 1;
@@ -2874,10 +2876,25 @@ tracing_trace_options_write(struct file *filp, const 
char __user *ubuf,
mutex_lock(&trace_types_lock);
ret = set_tracer_option(current_trace, cmp, neg);
mutex_unlock(&trace_types_lock);
-   if (ret)
-   return ret;
}
 
+   return ret;
+}
+
+static ssize_t
+tracing_trace_options_write(struct file *filp, const char __user *ubuf,
+   size_t cnt, loff_t *ppos)
+{
+   char buf[64];
+
+   if (cnt >= sizeof(buf))
+   return -EINVAL;
+
+   if (copy_from_user(&buf, ubuf, cnt))
+   return -EFAULT;
+
+   trace_set_options(buf);
+
*ppos += cnt;
 
return cnt;
@@ -5133,6 +5150,13 @@ __init static int tracer_alloc_buffers(void)
 
register_die_notifier(&trace_die_notifier);
 
+   while (trace_boot_options) {
+   char *option;
+
+   option = strsep(&trace_boot_options, ",");
+   trace_set_options(option);
+   }
+
return 0;
 
 out_free_cpumask:
-- 
1.7.10.4





[PATCH 07/19] tracing: Have tracing_sched_wakeup_trace() use standard unlock_commit

2012-11-02 Thread Steven Rostedt

From: Steven Rostedt 

The function tracing_sched_wakeup_trace() does an open-coded unlock
commit and stack save. This is what trace_nowake_buffer_unlock_commit()
is for.

Signed-off-by: Steven Rostedt 
---
 kernel/trace/trace_sched_switch.c |4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/kernel/trace/trace_sched_switch.c b/kernel/trace/trace_sched_switch.c
index 7e62c0a..b0a136a 100644
--- a/kernel/trace/trace_sched_switch.c
+++ b/kernel/trace/trace_sched_switch.c
@@ -102,9 +102,7 @@ tracing_sched_wakeup_trace(struct trace_array *tr,
entry->next_cpu = task_cpu(wakee);
 
if (!filter_check_discard(call, entry, buffer, event))
-   ring_buffer_unlock_commit(buffer, event);
-   ftrace_trace_stack(tr->buffer, flags, 6, pc);
-   ftrace_trace_userstack(tr->buffer, flags, pc);
+   trace_nowake_buffer_unlock_commit(buffer, event, flags, pc);
 }
 
 static void
-- 
1.7.10.4





[PATCH 08/19] tracing: Cache comms only after an event occurred

2012-11-02 Thread Steven Rostedt

From: Steven Rostedt 

Whenever an event is registered, the comms of tasks are saved at
every task switch instead of at every event. But if an event isn't
executed much, the comm cache fills up with tasks that did not record
the event, and you lose the comms of the tasks that did.

Here's an example. If you enable the following events:

echo 1 > /debug/tracing/events/kvm/kvm_cr/enable
echo 1 > /debug/tracing/events/net/net_dev_xmit/enable

Note, there's no kvm running on this machine so the first event will
never be triggered, but because it is enabled, the storing of comms
will continue. If we now disable the network event:

echo 0 > /debug/tracing/events/net/net_dev_xmit/enable

and look at the trace:

cat /debug/tracing/trace
sshd-2672  [001] ..s2   375.731616: net_dev_xmit: dev=eth0 
skbaddr=88005cbb6de0 len=242 rc=0
sshd-2672  [001] ..s1   375.731617: net_dev_xmit: dev=br0 
skbaddr=88005cbb6de0 len=242 rc=0
sshd-2672  [001] ..s2   375.859356: net_dev_xmit: dev=eth0 
skbaddr=88005cbb6de0 len=242 rc=0
sshd-2672  [001] ..s1   375.859357: net_dev_xmit: dev=br0 
skbaddr=88005cbb6de0 len=242 rc=0
sshd-2672  [001] ..s2   375.947351: net_dev_xmit: dev=eth0 
skbaddr=88005cbb6de0 len=242 rc=0
sshd-2672  [001] ..s1   375.947352: net_dev_xmit: dev=br0 
skbaddr=88005cbb6de0 len=242 rc=0
sshd-2672  [001] ..s2   376.035383: net_dev_xmit: dev=eth0 
skbaddr=88005cbb6de0 len=242 rc=0
sshd-2672  [001] ..s1   376.035383: net_dev_xmit: dev=br0 
skbaddr=88005cbb6de0 len=242 rc=0
sshd-2672  [001] ..s2   377.563806: net_dev_xmit: dev=eth0 
skbaddr=88005cbb6de0 len=226 rc=0
sshd-2672  [001] ..s1   377.563807: net_dev_xmit: dev=br0 
skbaddr=88005cbb6de0 len=226 rc=0
sshd-2672  [001] ..s2   377.563834: net_dev_xmit: dev=eth0 
skbaddr=88005cbb6be0 len=114 rc=0
sshd-2672  [001] ..s1   377.563842: net_dev_xmit: dev=br0 
skbaddr=88005cbb6be0 len=114 rc=0

We see that process 2672 which triggered the events has the comm "sshd".
But if we run hackbench for a bit and look again:

cat /debug/tracing/trace
   <...>-2672  [001] ..s2   375.731616: net_dev_xmit: dev=eth0 
skbaddr=88005cbb6de0 len=242 rc=0
   <...>-2672  [001] ..s1   375.731617: net_dev_xmit: dev=br0 
skbaddr=88005cbb6de0 len=242 rc=0
   <...>-2672  [001] ..s2   375.859356: net_dev_xmit: dev=eth0 
skbaddr=88005cbb6de0 len=242 rc=0
   <...>-2672  [001] ..s1   375.859357: net_dev_xmit: dev=br0 
skbaddr=88005cbb6de0 len=242 rc=0
   <...>-2672  [001] ..s2   375.947351: net_dev_xmit: dev=eth0 
skbaddr=88005cbb6de0 len=242 rc=0
   <...>-2672  [001] ..s1   375.947352: net_dev_xmit: dev=br0 
skbaddr=88005cbb6de0 len=242 rc=0
   <...>-2672  [001] ..s2   376.035383: net_dev_xmit: dev=eth0 
skbaddr=88005cbb6de0 len=242 rc=0
   <...>-2672  [001] ..s1   376.035383: net_dev_xmit: dev=br0 
skbaddr=88005cbb6de0 len=242 rc=0
   <...>-2672  [001] ..s2   377.563806: net_dev_xmit: dev=eth0 
skbaddr=88005cbb6de0 len=226 rc=0
   <...>-2672  [001] ..s1   377.563807: net_dev_xmit: dev=br0 
skbaddr=88005cbb6de0 len=226 rc=0
   <...>-2672  [001] ..s2   377.563834: net_dev_xmit: dev=eth0 
skbaddr=88005cbb6be0 len=114 rc=0
   <...>-2672  [001] ..s1   377.563842: net_dev_xmit: dev=br0 
skbaddr=88005cbb6be0 len=114 rc=0

The stored "sshd" comm has been flushed out and we get a useless "<...>".

But by only storing comms after a trace event occurred, we can run
hackbench all day and still get the same output.
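
The effect can be sketched in userspace with a tiny comm cache. The slot count, the structure and the function names below are all invented for illustration, and a single flag stands in for the per-CPU one. A comm is saved at a task switch only if an event was recorded since the last save.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define COMM_SLOTS 4
#define COMM_LEN   16

struct comm_cache {
    int pid[COMM_SLOTS];
    char comm[COMM_SLOTS][COMM_LEN];
    unsigned int next;   /* slot to overwrite next */
    bool save_pending;   /* an event was recorded since the last save */
};

static void comm_cache_init(struct comm_cache *c)
{
    memset(c, 0, sizeof(*c));
    for (int i = 0; i < COMM_SLOTS; i++)
        c->pid[i] = -1;
}

/* Called whenever a trace event is written. */
static void comm_cache_note_event(struct comm_cache *c)
{
    c->save_pending = true;
}

/* Called at every task switch; saves only if an event happened. */
static void comm_cache_task_switch(struct comm_cache *c, int pid,
                                   const char *comm)
{
    unsigned int slot;

    if (!c->save_pending)
        return;
    c->save_pending = false;

    /* Reuse the slot of a known pid, otherwise take the next one. */
    slot = c->next;
    for (unsigned int i = 0; i < COMM_SLOTS; i++) {
        if (c->pid[i] == pid) {
            slot = i;
            break;
        }
    }
    if (slot == c->next)
        c->next = (c->next + 1) % COMM_SLOTS;

    c->pid[slot] = pid;
    snprintf(c->comm[slot], COMM_LEN, "%s", comm);
}

static const char *comm_cache_lookup(const struct comm_cache *c, int pid)
{
    for (int i = 0; i < COMM_SLOTS; i++)
        if (c->pid[i] == pid)
            return c->comm[i];
    return "<...>";
}
```

In this sketch, task switches that are not preceded by an event leave the cache alone, so an entry such as "sshd" survives any number of unrelated switches.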

Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Signed-off-by: Steven Rostedt 
---
 kernel/trace/trace.c |   35 ++
 kernel/trace/trace.h |3 +++
 kernel/trace/trace_branch.c  |2 +-
 kernel/trace/trace_functions_graph.c |4 ++--
 4 files changed, 33 insertions(+), 11 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index b90a827..88111b0 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -78,6 +78,13 @@ static int dummy_set_flag(u32 old_flags, u32 bit, int set)
 }
 
 /*
+ * To prevent the comm cache from being overwritten when no
+ * tracing is active, only save the comm when a trace event
+ * occurred.
+ */
+static DEFINE_PER_CPU(bool, trace_cmdline_save);
+
+/*
  * Kill all tracing for good (never come back).
  * It is initialized to 1 but will turn to zero if the initialization
  * of the tracer is successful. But that is the only place that sets
@@ -1135,6 +1142,11 @@ void tracing_record_cmdline(struct task_struct *tsk)
!tracing_is_on())
return;
 
+   if (!__this_cpu_read(trace_cmdline_save))
+   return;
+
+   __this_cpu_write(trace_cmdline_save, false);
+
trace_save_cmdline(tsk);
 }
 
@@

[PATCH 03/19] tracing: Change tracers integer flags to bool

2012-11-02 Thread Steven Rostedt

From: Hiraku Toyooka 

print_max and use_max_tr in struct tracer are "int" variables and
used like flags. This is wasteful, so change the type to "bool".

Link: http://lkml.kernel.org/r/20121002082710.9807.86393.stgit@falsita

Signed-off-by: Hiraku Toyooka 
Signed-off-by: Steven Rostedt 
---
 kernel/trace/trace.h  |4 ++--
 kernel/trace/trace_irqsoff.c  |   12 ++--
 kernel/trace/trace_sched_wakeup.c |8 
 3 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index c15f528..c56a233 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -285,8 +285,8 @@ struct tracer {
int (*set_flag)(u32 old_flags, u32 bit, int set);
struct tracer   *next;
struct tracer_flags *flags;
-   int print_max;
-   int use_max_tr;
+   boolprint_max;
+   booluse_max_tr;
 };
 
 
diff --git a/kernel/trace/trace_irqsoff.c b/kernel/trace/trace_irqsoff.c
index 11edebd..5ffce7b 100644
--- a/kernel/trace/trace_irqsoff.c
+++ b/kernel/trace/trace_irqsoff.c
@@ -604,7 +604,7 @@ static struct tracer irqsoff_tracer __read_mostly =
.reset  = irqsoff_tracer_reset,
.start  = irqsoff_tracer_start,
.stop   = irqsoff_tracer_stop,
-   .print_max  = 1,
+   .print_max  = true,
.print_header   = irqsoff_print_header,
.print_line = irqsoff_print_line,
.flags  = &tracer_flags,
@@ -614,7 +614,7 @@ static struct tracer irqsoff_tracer __read_mostly =
 #endif
.open   = irqsoff_trace_open,
.close  = irqsoff_trace_close,
-   .use_max_tr = 1,
+   .use_max_tr = true,
 };
 # define register_irqsoff(trace) register_tracer(&trace)
 #else
@@ -637,7 +637,7 @@ static struct tracer preemptoff_tracer __read_mostly =
.reset  = irqsoff_tracer_reset,
.start  = irqsoff_tracer_start,
.stop   = irqsoff_tracer_stop,
-   .print_max  = 1,
+   .print_max  = true,
.print_header   = irqsoff_print_header,
.print_line = irqsoff_print_line,
.flags  = &tracer_flags,
@@ -647,7 +647,7 @@ static struct tracer preemptoff_tracer __read_mostly =
 #endif
.open   = irqsoff_trace_open,
.close  = irqsoff_trace_close,
-   .use_max_tr = 1,
+   .use_max_tr = true,
 };
 # define register_preemptoff(trace) register_tracer(&trace)
 #else
@@ -672,7 +672,7 @@ static struct tracer preemptirqsoff_tracer __read_mostly =
.reset  = irqsoff_tracer_reset,
.start  = irqsoff_tracer_start,
.stop   = irqsoff_tracer_stop,
-   .print_max  = 1,
+   .print_max  = true,
.print_header   = irqsoff_print_header,
.print_line = irqsoff_print_line,
.flags  = &tracer_flags,
@@ -682,7 +682,7 @@ static struct tracer preemptirqsoff_tracer __read_mostly =
 #endif
.open   = irqsoff_trace_open,
.close  = irqsoff_trace_close,
-   .use_max_tr = 1,
+   .use_max_tr = true,
 };
 
 # define register_preemptirqsoff(trace) register_tracer(&trace)
diff --git a/kernel/trace/trace_sched_wakeup.c b/kernel/trace/trace_sched_wakeup.c
index 2f6af78..bc64fc1 100644
--- a/kernel/trace/trace_sched_wakeup.c
+++ b/kernel/trace/trace_sched_wakeup.c
@@ -589,7 +589,7 @@ static struct tracer wakeup_tracer __read_mostly =
.reset  = wakeup_tracer_reset,
.start  = wakeup_tracer_start,
.stop   = wakeup_tracer_stop,
-   .print_max  = 1,
+   .print_max  = true,
.print_header   = wakeup_print_header,
.print_line = wakeup_print_line,
.flags  = &tracer_flags,
@@ -599,7 +599,7 @@ static struct tracer wakeup_tracer __read_mostly =
 #endif
.open   = wakeup_trace_open,
.close  = wakeup_trace_close,
-   .use_max_tr = 1,
+   .use_max_tr = true,
 };
 
 static struct tracer wakeup_rt_tracer __read_mostly =
@@ -610,7 +610,7 @@ static struct tracer wakeup_rt_tracer __read_mostly =
.start  = wakeup_tracer_start,
.stop   = wakeup_tracer_stop,
.wait_pipe  = poll_wait_pipe,
-   .print_max  = 1,
+   .print_max  = true,
.print_header   = wakeup_print_header,
.print_line = wakeup_print_line,
.flags  = &tracer_flags,
@@ -620,7 +620,7 @@ static struct tracer wakeup_rt_tracer __read_mostly =
 #endif
.open   = wakeup_trace_open,
.close  = wakeup_trace_close,
-   .use_max_tr = 1,
+   .use_max_tr = true,
 };
 
 __init static int init_wakeup_tracer(void)
-- 
1.7.10.4





[PATCH 09/19] tracing: Trivial cleanup

2012-11-02 Thread Steven Rostedt

From: David Sharp 

Remove ftrace_format_syscall() declaration; it is neither defined nor
used. Also update a comment and formatting.

Link: http://lkml.kernel.org/r/1339112785-21806-1-git-send-email-vnagarn...@google.com

Signed-off-by: David Sharp 
Signed-off-by: Vaibhav Nagarnaik 
Signed-off-by: Steven Rostedt 
---
 include/trace/syscall.h|2 --
 kernel/trace/ring_buffer.c |6 +++---
 2 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/include/trace/syscall.h b/include/trace/syscall.h
index 31966a4..0c95796 100644
--- a/include/trace/syscall.h
+++ b/include/trace/syscall.h
@@ -39,8 +39,6 @@ extern int reg_event_syscall_enter(struct ftrace_event_call 
*call);
 extern void unreg_event_syscall_enter(struct ftrace_event_call *call);
 extern int reg_event_syscall_exit(struct ftrace_event_call *call);
 extern void unreg_event_syscall_exit(struct ftrace_event_call *call);
-extern int
-ftrace_format_syscall(struct ftrace_event_call *call, struct trace_seq *s);
 enum print_line_t print_syscall_enter(struct trace_iterator *iter, int flags,
  struct trace_event *event);
 enum print_line_t print_syscall_exit(struct trace_iterator *iter, int flags,
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 0ebeb1d..23a384b 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -1821,7 +1821,7 @@ rb_add_time_stamp(struct ring_buffer_event *event, u64 
delta)
 }
 
 /**
- * ring_buffer_update_event - update event type and data
+ * rb_update_event - update event type and data
  * @event: the even to update
  * @type: the type of event
  * @length: the size of the event field in the ring buffer
@@ -2723,8 +2723,8 @@ EXPORT_SYMBOL_GPL(ring_buffer_discard_commit);
  * and not the length of the event which would hold the header.
  */
 int ring_buffer_write(struct ring_buffer *buffer,
-   unsigned long length,
-   void *data)
+ unsigned long length,
+ void *data)
 {
struct ring_buffer_per_cpu *cpu_buffer;
struct ring_buffer_event *event;
-- 
1.7.10.4





[PATCH 06/19] tracing: Enable comm recording if trace_printk() is used

2012-11-02 Thread Steven Rostedt

From: Steven Rostedt 

If comm recording is not enabled when trace_printk() is used then
you just get this type of output:

[ adding trace_printk("hello! %d", irq); in do_IRQ ]

   <...>-2843  [001] d.h.80.812300: do_IRQ: hello! 14
   <...>-2734  [002] d.h280.824664: do_IRQ: hello! 14
   <...>-2713  [003] d.h.80.829971: do_IRQ: hello! 14
   <...>-2814  [000] d.h.80.833026: do_IRQ: hello! 14

By enabling the comm recorder when trace_printk is enabled:

   hackbench-6715  [001] d.h.   193.233776: do_IRQ: hello! 21
sshd-2659  [001] d.h.   193.665862: do_IRQ: hello! 21
  <idle>-0 [001] d.h1   193.665996: do_IRQ: hello! 21

Suggested-by: Peter Zijlstra 
Cc: Thomas Gleixner 
Signed-off-by: Steven Rostedt 
---
 kernel/trace/trace.c|   36 ++--
 kernel/trace/trace.h|1 +
 kernel/trace/trace_events.c |3 +++
 3 files changed, 38 insertions(+), 2 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index a5411b7..b90a827 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -1559,10 +1559,10 @@ static int alloc_percpu_trace_buffer(void)
return -ENOMEM;
 }
 
+static int buffers_allocated;
+
 void trace_printk_init_buffers(void)
 {
-   static int buffers_allocated;
-
if (buffers_allocated)
return;
 
@@ -1575,6 +1575,34 @@ void trace_printk_init_buffers(void)
tracing_update_buffers();
 
buffers_allocated = 1;
+
+   /*
+* trace_printk_init_buffers() can be called by modules.
+* If that happens, then we need to start cmdline recording
+* directly here. If the global_trace.buffer is already
+* allocated here, then this was called by module code.
+*/
+   if (global_trace.buffer)
+   tracing_start_cmdline_record();
+}
+
+void trace_printk_start_comm(void)
+{
+   /* Start tracing comms if trace printk is set */
+   if (!buffers_allocated)
+   return;
+   tracing_start_cmdline_record();
+}
+
+static void trace_printk_start_stop_comm(int enabled)
+{
+   if (!buffers_allocated)
+   return;
+
+   if (enabled)
+   tracing_start_cmdline_record();
+   else
+   tracing_stop_cmdline_record();
 }
 
 /**
@@ -2797,6 +2825,9 @@ static void set_tracer_flags(unsigned int mask, int 
enabled)
 
if (mask == TRACE_ITER_OVERWRITE)
ring_buffer_change_overwrite(global_trace.buffer, enabled);
+
+   if (mask == TRACE_ITER_PRINTK)
+   trace_printk_start_stop_comm(enabled);
 }
 
 static ssize_t
@@ -5099,6 +5130,7 @@ __init static int tracer_alloc_buffers(void)
 
/* Only allocate trace_printk buffers if a trace_printk exists */
if (__stop___trace_bprintk_fmt != __start___trace_bprintk_fmt)
+   /* Must be called before global_trace.buffer is allocated */
trace_printk_init_buffers();
 
/* To save memory, keep the ring buffer size to its minimum */
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index c56a233..7824a55 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -841,6 +841,7 @@ extern const char *__start___trace_bprintk_fmt[];
 extern const char *__stop___trace_bprintk_fmt[];
 
 void trace_printk_init_buffers(void);
+void trace_printk_start_comm(void);
 
 #undef FTRACE_ENTRY
 #define FTRACE_ENTRY(call, struct_name, id, tstruct, print, filter)\
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index d608d09..dec47e7 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -1489,6 +1489,9 @@ static __init int event_trace_enable(void)
if (ret)
pr_warn("Failed to enable trace event: %s\n", token);
}
+
+   trace_printk_start_comm();
+
return 0;
 }
 
-- 
1.7.10.4





[PATCH 10/19] tracing: Cleanup unnecessary function declarations

2012-11-02 Thread Steven Rostedt

From: Vaibhav Nagarnaik 

The functions defined in include/trace/syscalls.h are not used directly
since struct ftrace_event_class was introduced. Remove them from the
header file and rearrange the ftrace_event_class declarations in
trace_syscalls.c.

Link: http://lkml.kernel.org/r/1339112785-21806-2-git-send-email-vnagarn...@google.com

Signed-off-by: Vaibhav Nagarnaik 
Signed-off-by: Steven Rostedt 
---
 include/trace/syscall.h   |   21 --
 kernel/trace/trace_syscalls.c |   61 -
 2 files changed, 29 insertions(+), 53 deletions(-)

diff --git a/include/trace/syscall.h b/include/trace/syscall.h
index 0c95796..84bc419 100644
--- a/include/trace/syscall.h
+++ b/include/trace/syscall.h
@@ -31,25 +31,4 @@ struct syscall_metadata {
struct ftrace_event_call *exit_event;
 };
 
-#ifdef CONFIG_FTRACE_SYSCALLS
-extern unsigned long arch_syscall_addr(int nr);
-extern int init_syscall_trace(struct ftrace_event_call *call);
-
-extern int reg_event_syscall_enter(struct ftrace_event_call *call);
-extern void unreg_event_syscall_enter(struct ftrace_event_call *call);
-extern int reg_event_syscall_exit(struct ftrace_event_call *call);
-extern void unreg_event_syscall_exit(struct ftrace_event_call *call);
-enum print_line_t print_syscall_enter(struct trace_iterator *iter, int flags,
- struct trace_event *event);
-enum print_line_t print_syscall_exit(struct trace_iterator *iter, int flags,
-struct trace_event *event);
-#endif
-
-#ifdef CONFIG_PERF_EVENTS
-int perf_sysenter_enable(struct ftrace_event_call *call);
-void perf_sysenter_disable(struct ftrace_event_call *call);
-int perf_sysexit_enable(struct ftrace_event_call *call);
-void perf_sysexit_disable(struct ftrace_event_call *call);
-#endif
-
 #endif /* _TRACE_SYSCALL_H */
diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
index 2485a7d..7609dd6 100644
--- a/kernel/trace/trace_syscalls.c
+++ b/kernel/trace/trace_syscalls.c
@@ -21,9 +21,6 @@ static int syscall_enter_register(struct ftrace_event_call 
*event,
 static int syscall_exit_register(struct ftrace_event_call *event,
 enum trace_reg type, void *data);
 
-static int syscall_enter_define_fields(struct ftrace_event_call *call);
-static int syscall_exit_define_fields(struct ftrace_event_call *call);
-
 static struct list_head *
 syscall_get_enter_fields(struct ftrace_event_call *call)
 {
@@ -32,30 +29,6 @@ syscall_get_enter_fields(struct ftrace_event_call *call)
return &entry->enter_fields;
 }
 
-struct trace_event_functions enter_syscall_print_funcs = {
-   .trace  = print_syscall_enter,
-};
-
-struct trace_event_functions exit_syscall_print_funcs = {
-   .trace  = print_syscall_exit,
-};
-
-struct ftrace_event_class event_class_syscall_enter = {
-   .system = "syscalls",
-   .reg= syscall_enter_register,
-   .define_fields  = syscall_enter_define_fields,
-   .get_fields = syscall_get_enter_fields,
-   .raw_init   = init_syscall_trace,
-};
-
-struct ftrace_event_class event_class_syscall_exit = {
-   .system = "syscalls",
-   .reg= syscall_exit_register,
-   .define_fields  = syscall_exit_define_fields,
-   .fields = LIST_HEAD_INIT(event_class_syscall_exit.fields),
-   .raw_init   = init_syscall_trace,
-};
-
 extern struct syscall_metadata *__start_syscalls_metadata[];
 extern struct syscall_metadata *__stop_syscalls_metadata[];
 
@@ -432,7 +405,7 @@ void unreg_event_syscall_exit(struct ftrace_event_call 
*call)
mutex_unlock(&syscall_trace_lock);
 }
 
-int init_syscall_trace(struct ftrace_event_call *call)
+static int init_syscall_trace(struct ftrace_event_call *call)
 {
int id;
int num;
@@ -457,6 +430,30 @@ int init_syscall_trace(struct ftrace_event_call *call)
return id;
 }
 
+struct trace_event_functions enter_syscall_print_funcs = {
+   .trace  = print_syscall_enter,
+};
+
+struct trace_event_functions exit_syscall_print_funcs = {
+   .trace  = print_syscall_exit,
+};
+
+struct ftrace_event_class event_class_syscall_enter = {
+   .system = "syscalls",
+   .reg= syscall_enter_register,
+   .define_fields  = syscall_enter_define_fields,
+   .get_fields = syscall_get_enter_fields,
+   .raw_init   = init_syscall_trace,
+};
+
+struct ftrace_event_class event_class_syscall_exit = {
+   .system = "syscalls",
+   .reg= syscall_exit_register,
+   .define_fields  = syscall_exit_define_fields,
+   .fields = LIST_HEAD_INIT(event_class_syscall_exit.fields),
+   .raw_init   = init_syscall_trace,
+};
+
 unsigned long __init __weak arch_syscall_addr(int nr)
 {
return (unsigned long)sys_call_table[nr];
@@ -537,7 +534,7 @@ static void perf_syscall_enter(void

[PATCH 04/19] ring-buffer: Add a dropped events counter

2012-11-02 Thread Steven Rostedt

From: Slava Pestov 

The existing 'overrun' counter is incremented when the ring
buffer wraps around, with overflow on (the default). We wanted
a way to count requests lost from the buffer filling up with
overflow off, too. I decided to add a new counter instead
of retro-fitting the existing one because it seems like a
different statistic to count conceptually, and also because
of how the code was structured.
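
To make the difference between the two statistics concrete, here is a toy ring in userspace C. It counts per entry rather than per page, and the structure and names are assumptions made up for the sketch. With overwrite on, a full ring loses its oldest entry and bumps 'overrun'; with overwrite off, the new entry is rejected and 'dropped_events' is bumped instead.

```c
#include <stdbool.h>
#include <stddef.h>

#define RING_SLOTS 4

struct ring {
    int data[RING_SLOTS];
    size_t head;                   /* index of the oldest entry */
    size_t count;
    bool overwrite;
    unsigned long overrun;         /* old entries overwritten */
    unsigned long dropped_events;  /* new entries rejected */
};

static void ring_init(struct ring *r, bool overwrite)
{
    r->head = 0;
    r->count = 0;
    r->overwrite = overwrite;
    r->overrun = 0;
    r->dropped_events = 0;
}

/* Returns true if the value was stored, false if it was dropped. */
static bool ring_write(struct ring *r, int value)
{
    if (r->count == RING_SLOTS) {
        if (!r->overwrite) {
            r->dropped_events++;
            return false;
        }
        /* Discard the oldest entry to make room. */
        r->head = (r->head + 1) % RING_SLOTS;
        r->count--;
        r->overrun++;
    }
    r->data[(r->head + r->count) % RING_SLOTS] = value;
    r->count++;
    return true;
}
```

Writing six values into the four-slot ring gives two dropped events with overwrite off, and two overruns with overwrite on, which is why one counter cannot stand in for the other.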

Link: http://lkml.kernel.org/r/1310765038-26399-1-git-send-email-slavapes...@google.com

Signed-off-by: Slava Pestov 
Signed-off-by: Steven Rostedt 
---
 include/linux/ring_buffer.h |1 +
 kernel/trace/ring_buffer.c  |   41 +++--
 kernel/trace/trace.c|3 +++
 3 files changed, 39 insertions(+), 6 deletions(-)

diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
index 6c8835f..2007375 100644
--- a/include/linux/ring_buffer.h
+++ b/include/linux/ring_buffer.h
@@ -166,6 +166,7 @@ unsigned long ring_buffer_overruns(struct ring_buffer 
*buffer);
 unsigned long ring_buffer_entries_cpu(struct ring_buffer *buffer, int cpu);
 unsigned long ring_buffer_overrun_cpu(struct ring_buffer *buffer, int cpu);
 unsigned long ring_buffer_commit_overrun_cpu(struct ring_buffer *buffer, int 
cpu);
+unsigned long ring_buffer_dropped_events_cpu(struct ring_buffer *buffer, int 
cpu);
 
 u64 ring_buffer_time_stamp(struct ring_buffer *buffer, int cpu);
 void ring_buffer_normalize_time_stamp(struct ring_buffer *buffer,
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index b979426..0ebeb1d 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -460,9 +460,10 @@ struct ring_buffer_per_cpu {
unsigned long   lost_events;
unsigned long   last_overrun;
local_t entries_bytes;
-   local_t commit_overrun;
-   local_t overrun;
local_t entries;
+   local_t overrun;
+   local_t commit_overrun;
+   local_t dropped_events;
local_t committing;
local_t commits;
unsigned long   read;
@@ -2155,8 +2156,10 @@ rb_move_tail(struct ring_buffer_per_cpu *cpu_buffer,
 * If we are not in overwrite mode,
 * this is easy, just stop here.
 */
-   if (!(buffer->flags & RB_FL_OVERWRITE))
+   if (!(buffer->flags & RB_FL_OVERWRITE)) {
+   local_inc(&cpu_buffer->dropped_events);
goto out_reset;
+   }
 
ret = rb_handle_head_page(cpu_buffer,
  tail_page,
@@ -2995,7 +2998,8 @@ unsigned long ring_buffer_entries_cpu(struct ring_buffer 
*buffer, int cpu)
 EXPORT_SYMBOL_GPL(ring_buffer_entries_cpu);
 
 /**
- * ring_buffer_overrun_cpu - get the number of overruns in a cpu_buffer
+ * ring_buffer_overrun_cpu - get the number of overruns caused by the ring
+ * buffer wrapping around (only if RB_FL_OVERWRITE is on).
  * @buffer: The ring buffer
  * @cpu: The per CPU buffer to get the number of overruns from
  */
@@ -3015,7 +3019,9 @@ unsigned long ring_buffer_overrun_cpu(struct ring_buffer 
*buffer, int cpu)
 EXPORT_SYMBOL_GPL(ring_buffer_overrun_cpu);
 
 /**
- * ring_buffer_commit_overrun_cpu - get the number of overruns caused by 
commits
+ * ring_buffer_commit_overrun_cpu - get the number of overruns caused by
+ * commits failing due to the buffer wrapping around while there are 
uncommitted
+ * events, such as during an interrupt storm.
  * @buffer: The ring buffer
  * @cpu: The per CPU buffer to get the number of overruns from
  */
@@ -3036,6 +3042,28 @@ ring_buffer_commit_overrun_cpu(struct ring_buffer 
*buffer, int cpu)
 EXPORT_SYMBOL_GPL(ring_buffer_commit_overrun_cpu);
 
 /**
+ * ring_buffer_dropped_events_cpu - get the number of dropped events caused by
+ * the ring buffer filling up (only if RB_FL_OVERWRITE is off).
+ * @buffer: The ring buffer
+ * @cpu: The per CPU buffer to get the number of overruns from
+ */
+unsigned long
+ring_buffer_dropped_events_cpu(struct ring_buffer *buffer, int cpu)
+{
+   struct ring_buffer_per_cpu *cpu_buffer;
+   unsigned long ret;
+
+   if (!cpumask_test_cpu(cpu, buffer->cpumask))
+   return 0;
+
+   cpu_buffer = buffer->buffers[cpu];
+   ret = local_read(&cpu_buffer->dropped_events);
+
+   return ret;
+}
+EXPORT_SYMBOL_GPL(ring_buffer_dropped_events_cpu);
+
+/**
  * ring_buffer_entries - get the number of entries in a buffer
  * @buffer: The ring buffer
  *
@@ -3864,9 +3892,10 @@ rb_reset_cpu(struct ring_buffer_per_cpu *cpu_buffer)
local_set(&cpu_buffer->reader_page->page->commit, 0);

[PATCH 05/19] tracing: Expand ring buffer when trace_printk() is used

2012-11-02 Thread Steven Rostedt

From: Steven Rostedt 

Since tracing is not used by 99% of Linux users, even though tracing
may be configured in, it does not make sense to allocate 1.4 Megs
per CPU for the ring buffers if they are not used. Thus, on boot up
the ring buffers are set to a minimal size until something needs them,
and then they are expanded.

This works well for events and tracers (function, etc.), but the
asynchronous use of trace_printk(), which can write to the ring buffer
at any time, does not expand the buffers.

On boot up a check is made to see if any trace_printk() is used to
see if the trace_printk() temp buffer pages should be allocated. This
same code can be used to expand the buffers as well.
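
A userspace sketch of the ordering problem this has to handle: the request to expand may arrive before the buffer exists, so it is only remembered and then honored at allocation time. The sizes and names below are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define MIN_BUF_SIZE   16u     /* hypothetical boot-time size */
#define FULL_BUF_SIZE  4096u   /* hypothetical configured size */

struct lazy_buf {
    char *data;      /* NULL until lazy_buf_alloc() runs */
    size_t size;
    bool expanded;   /* someone asked for the full size */
};

/*
 * Request the full size. May run before the buffer exists; in that
 * case only the request is remembered. Returns 0 on success, -1 on
 * allocation failure.
 */
static int lazy_buf_expand(struct lazy_buf *b)
{
    char *p;

    b->expanded = true;
    if (!b->data)
        return 0;
    if (b->size == FULL_BUF_SIZE)
        return 0;

    p = realloc(b->data, FULL_BUF_SIZE);
    if (!p)
        return -1;
    b->data = p;
    b->size = FULL_BUF_SIZE;
    return 0;
}

/* Allocate the buffer, honoring an earlier expand request. */
static int lazy_buf_alloc(struct lazy_buf *b)
{
    size_t size = b->expanded ? FULL_BUF_SIZE : MIN_BUF_SIZE;

    b->data = malloc(size);
    if (!b->data)
        return -1;
    b->size = size;
    return 0;
}
```

Either order ends with a full-size buffer once an expand has been requested, and a buffer that nobody asked to expand stays at the minimal size.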

Suggested-by: Peter Zijlstra 
Cc: Thomas Gleixner 
Signed-off-by: Steven Rostedt 
---
 kernel/trace/trace.c |7 +++
 1 file changed, 7 insertions(+)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 36c213f..a5411b7 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -1571,6 +1571,9 @@ void trace_printk_init_buffers(void)
 
pr_info("ftrace: Allocated trace_printk buffers\n");
 
+   /* Expand the buffers to set size */
+   tracing_update_buffers();
+
buffers_allocated = 1;
 }
 
@@ -3030,6 +3033,10 @@ static int __tracing_resize_ring_buffer(unsigned long 
size, int cpu)
 */
ring_buffer_expanded = 1;
 
+   /* May be called before buffers are initialized */
+   if (!global_trace.buffer)
+   return 0;
+
ret = ring_buffer_resize(global_trace.buffer, size, cpu);
if (ret < 0)
return ret;
-- 
1.7.10.4





[PATCH 02/19] tracing: Allow tracers to start at core initcall

2012-11-02 Thread Steven Rostedt

From: Steven Rostedt 

There are times during debugging when it is helpful to see traces of early
boot functions. But the tracers are initialized at device_initcall()
which is quite late during the boot process. Setting the kernel command
line parameter ftrace=function will not show anything until the function
tracer is initialized. This prevents being able to trace functions before
device_initcall().

There's no reason that the tracers need to be initialized so late in the
boot process. Move them up to core_initcall() as they still need to come
after early_initcall() which initializes the tracing buffers.

Cc: Thomas Gleixner 
Signed-off-by: Steven Rostedt 
---
 kernel/trace/ftrace.c|4 ++--
 kernel/trace/trace_branch.c  |2 +-
 kernel/trace/trace_functions.c   |3 +--
 kernel/trace/trace_functions_graph.c |2 +-
 kernel/trace/trace_irqsoff.c |2 +-
 kernel/trace/trace_sched_wakeup.c|2 +-
 6 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 60ad606..4451aa3 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -2868,7 +2868,7 @@ static int __init ftrace_mod_cmd_init(void)
 {
return register_ftrace_command(&ftrace_mod_cmd);
 }
-device_initcall(ftrace_mod_cmd_init);
+core_initcall(ftrace_mod_cmd_init);
 
 static void function_trace_probe_call(unsigned long ip, unsigned long parent_ip,
                                       struct ftrace_ops *op, struct pt_regs *pt_regs)
@@ -4055,7 +4055,7 @@ static int __init ftrace_nodyn_init(void)
ftrace_enabled = 1;
return 0;
 }
-device_initcall(ftrace_nodyn_init);
+core_initcall(ftrace_nodyn_init);
 
 static inline int ftrace_init_dyn_debugfs(struct dentry *d_tracer) { return 0; }
 static inline void ftrace_startup_enable(int command) { }
diff --git a/kernel/trace/trace_branch.c b/kernel/trace/trace_branch.c
index 8d3538b..bd3e0ee 100644
--- a/kernel/trace/trace_branch.c
+++ b/kernel/trace/trace_branch.c
@@ -199,7 +199,7 @@ __init static int init_branch_tracer(void)
}
return register_tracer(&branch_trace);
 }
-device_initcall(init_branch_tracer);
+core_initcall(init_branch_tracer);
 
 #else
 static inline
diff --git a/kernel/trace/trace_functions.c b/kernel/trace/trace_functions.c
index 618dcf8..bb227e3 100644
--- a/kernel/trace/trace_functions.c
+++ b/kernel/trace/trace_functions.c
@@ -411,5 +411,4 @@ static __init int init_function_trace(void)
init_func_cmd_traceon();
return register_tracer(&function_trace);
 }
-device_initcall(init_function_trace);
-
+core_initcall(init_function_trace);
diff --git a/kernel/trace/trace_functions_graph.c b/kernel/trace/trace_functions_graph.c
index 99b4378..a84b558 100644
--- a/kernel/trace/trace_functions_graph.c
+++ b/kernel/trace/trace_functions_graph.c
@@ -1474,4 +1474,4 @@ static __init int init_graph_trace(void)
return register_tracer(&graph_trace);
 }
 
-device_initcall(init_graph_trace);
+core_initcall(init_graph_trace);
diff --git a/kernel/trace/trace_irqsoff.c b/kernel/trace/trace_irqsoff.c
index d98ee82..11edebd 100644
--- a/kernel/trace/trace_irqsoff.c
+++ b/kernel/trace/trace_irqsoff.c
@@ -698,4 +698,4 @@ __init static int init_irqsoff_tracer(void)
 
return 0;
 }
-device_initcall(init_irqsoff_tracer);
+core_initcall(init_irqsoff_tracer);
diff --git a/kernel/trace/trace_sched_wakeup.c b/kernel/trace/trace_sched_wakeup.c
index 02170c0..2f6af78 100644
--- a/kernel/trace/trace_sched_wakeup.c
+++ b/kernel/trace/trace_sched_wakeup.c
@@ -637,4 +637,4 @@ __init static int init_wakeup_tracer(void)
 
return 0;
 }
-device_initcall(init_wakeup_tracer);
+core_initcall(init_wakeup_tracer);
-- 
1.7.10.4
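
The ordering argument in the commit message (buffers at early_initcall(), tracers at core_initcall(), everything else at device_initcall() or later) can be illustrated with a small userspace model. This is a sketch under assumptions, not the kernel's initcall machinery: the three-level enum, the table, and run_initcalls() are hypothetical stand-ins, and the real kernel has more levels than shown here.

```c
#include <stddef.h>

/* Simplified subset of the boot stages named in the commit message. */
enum initcall_level { LEVEL_EARLY, LEVEL_CORE, LEVEL_DEVICE, LEVEL_COUNT };

typedef int (*initcall_t)(void);

struct initcall {
    enum initcall_level level;
    initcall_t fn;
};

char boot_log[16];      /* records the order in which the calls ran */
size_t boot_log_len;

int init_buffers(void) { boot_log[boot_log_len++] = 'B'; return 0; }
int init_tracer(void)  { boot_log[boot_log_len++] = 'T'; return 0; }
int init_driver(void)  { boot_log[boot_log_len++] = 'D'; return 0; }

/* Run every call level by level, regardless of its position in the table. */
int run_initcalls(const struct initcall *calls, size_t n)
{
    int ran = 0;

    for (int level = 0; level < LEVEL_COUNT; level++) {
        for (size_t i = 0; i < n; i++) {
            if ((int)calls[i].level == level) {
                calls[i].fn();
                ran++;
            }
        }
    }
    return ran;
}
```

With a table that lists init_driver at LEVEL_DEVICE, init_tracer at LEVEL_CORE and init_buffers at LEVEL_EARLY, the log reads B, T, D: the tracer comes up after its buffers but before any device-level code, which is the window the patch opens for tracing.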





Re: [PATCH 2/5] mm: frontswap: lazy initialization to allow tmem backends to build/run as modules

2012-11-02 Thread Konrad Rzeszutek Wilk

> > > + frontswap_enabled = 1;
> > 
> > If frontswap_enabled is going to be on all the time, then what point
> > does it serve?  By extension, can all of the static inline wrappers in
> > frontswap.h be done away with?

Hm, or the frontswap_enabled can be converted to a "frontswap_flag"
which has:

#define FRONTSWAP_ON (1<<1)
#define FRONTSWAP_BACKEND_ON (1<<2)

or so? And then we can see if we can squash the 'backend_registered'
and 'frontswap_enabled' together.
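
A rough userspace sketch of what such a merged flag word might look like, purely as an illustration: the two #defines are the ones proposed here, while frontswap_flag as a variable, the helper names, and the meaning given to each bit in the comments are assumptions.

```c
#include <stdbool.h>

#define FRONTSWAP_ON          (1 << 1)   /* assumed: frontswap itself is enabled */
#define FRONTSWAP_BACKEND_ON  (1 << 2)   /* assumed: a backend has registered */

unsigned int frontswap_flag;             /* hypothetical merged state word */

void frontswap_flag_set(unsigned int bits)   { frontswap_flag |= bits; }
void frontswap_flag_clear(unsigned int bits) { frontswap_flag &= ~bits; }

/* True only when both conditions hold, replacing two separate variables. */
bool frontswap_ready(void)
{
    const unsigned int need = FRONTSWAP_ON | FRONTSWAP_BACKEND_ON;

    return (frontswap_flag & need) == need;
}
```

A single word keeps the hot-path check to one load and one mask, which matters if the point of the variable is to avoid a function call at each hook.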
> 
> The intent of frontswap_enabled and cleancache_enabled was
> to avoid the overhead of a function call at the point where
> each frontswap/cleancache "hooks" is placed, using a global
> variable check instead.  I'm not sure if this minor
> performance tuning effort is worth preserving:  If not,
> I agree frontswap_enabled and the static inline wrappers (as
> well as their cleancache brethren) could be done away with **;
> if worth preserving, then I think frontswap_enabled could
> be set in the init method instead but the check for enabled
> in the frontswap init method and the cleancache init_fs
> method would need to be removed else lazy initialization
> wouldn't work.

Either way, that should be a separate patch.
> 
> Dan
> 
> ** Note to anyone that tries this:  There is a subtle but
> clever hack in the wrappers suggested by Jeremy Fitzhardinge
> that disables the wrappers at compile-time as well as
> runtime.  IOW, make sure you test-compile both with
> CONFIG_{CLEANCACHE|FRONTSWAP} _and_ with them unconfig'd.
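
The compile-time plus runtime disabling mentioned in the note can be sketched roughly as follows. This is an illustration only: CONFIG_DEMO_CACHE, demo_cache_enabled, demo_store() and demo_store_slow() are hypothetical names, not the actual frontswap.h or cleancache.h contents.

```c
/* Build with -DCONFIG_DEMO_CACHE to compile the feature in. */
#ifdef CONFIG_DEMO_CACHE
int demo_cache_enabled;            /* runtime switch, set when a backend loads */
#else
#define demo_cache_enabled (0)     /* constant: the branch below is dead code */
#endif

/* Out-of-line path; stands in for the real (expensive) hook. */
int demo_store_slow(int key)
{
    return key >= 0 ? 0 : -1;
}

/* Wrapper placed at each hook: a cheap check first, the call only if needed. */
static inline int demo_store(int key)
{
    if (demo_cache_enabled)
        return demo_store_slow(key);
    return -1;
}
```

When the option is unconfigured the condition is the constant 0, so an optimizing compiler can drop the call entirely; when configured, the same source performs a global variable check at runtime. That is why both configurations need a test compile.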
> 

Re: [PATCHv2] Input: omap4-keypad: Add pinctrl support

2012-11-02 Thread Mark Brown

On Tue, Oct 30, 2012 at 10:51:11PM +0100, Linus Walleij wrote:
> On Tue, Oct 30, 2012 at 7:37 PM, Mark Brown

> > More seriously the amount of time we seem to have been spending recently
> > on changes which end up requiring us to go through essentially every
> > driver and add code to them (often several times) doesn't seem like
> > we're doing a good job here.

> If this is your main concern you should be made aware that there
> are people out there planning to supplant the existing DT probe paths
> that are now being added to each and every ARM-related driver
> with an ACPI probe path as ARM servers come into the picture.

That's different as we're adding support for a new external interface
which will need per device configuration parsing rather than
reorganising things within the kernel; I'd expect it won't supplant DT
but rather sit alongside it as it's more a requirement for the server
market than for the embedded market.


Re: [PATCH 2/5] mm: frontswap: lazy initialization to allow tmem backends to build/run as modules

2012-11-02 Thread Konrad Rzeszutek Wilk

On Wed, Oct 31, 2012 at 12:05:32PM -0500, Seth Jennings wrote:
> On 10/31/2012 10:07 AM, Dan Magenheimer wrote:
> > With the goal of allowing tmem backends (zcache, ramster, Xen tmem) to be
> > built/loaded as modules rather than built-in and enabled by a boot parameter,
> > this patch provides "lazy initialization", allowing backends to register to
> > frontswap even after swapon was run. Before a backend registers all calls
> > to init are recorded and the creation of tmem_pools delayed until a backend
> > registers or until a frontswap put is attempted.
> > 
> > Signed-off-by: Stefan Hengelein 
> > Signed-off-by: Florian Schmaus 
> > Signed-off-by: Andor Daam 
> > Signed-off-by: Dan Magenheimer 
> > ---
> >  include/linux/frontswap.h |1 +
> >  mm/frontswap.c|   70 
> > +++-
> >  2 files changed, 63 insertions(+), 8 deletions(-)
> > 
> > diff --git a/include/linux/frontswap.h b/include/linux/frontswap.h
> > index 3044254..ef6ada6 100644
> > --- a/include/linux/frontswap.h
> > +++ b/include/linux/frontswap.h
> > @@ -23,6 +23,7 @@ extern void frontswap_writethrough(bool);
> >  extern void frontswap_tmem_exclusive_gets(bool);
> > 
> >  extern void __frontswap_init(unsigned type);
> > +#define FRONTSWAP_HAS_LAZY_INIT
> >  extern int __frontswap_store(struct page *page);
> >  extern int __frontswap_load(struct page *page);
> >  extern void __frontswap_invalidate_page(unsigned, pgoff_t);
> > diff --git a/mm/frontswap.c b/mm/frontswap.c
> > index 2890e67..523a19b 100644
> > --- a/mm/frontswap.c
> > +++ b/mm/frontswap.c
> > @@ -80,6 +80,19 @@ static inline void inc_frontswap_succ_stores(void) { }
> >  static inline void inc_frontswap_failed_stores(void) { }
> >  static inline void inc_frontswap_invalidates(void) { }
> >  #endif
> > +
> > +/*
> > + * When no backend is registered all calls to init are registered and
> > + * remembered but fail to create tmem_pools. When a backend registers with
> > + * frontswap the previous calls to init are executed to create tmem_pools
> > + * and set the respective poolids.
> > + * While no backend is registered all "puts", "gets" and "flushes" are
> > + * ignored or fail.
> > + */
> > +#define MAX_INITIALIZABLE_SD 32
> 
> MAX_INITIALIZABLE_SD should just be MAX_SWAPFILES
> 
> > +static int sds[MAX_INITIALIZABLE_SD];
> 
> Rather than store an array of enabled types indexed by type, why not
> an array of booleans indexed by type?  Or a bitfield if you really
> want to save space.
> 
> > +static int backend_registered;
> 
> (backend_registered) is equivalent to checking (frontswap_ops != NULL)
> right?
> 
> > +
> >  /*
> >   * Register operations for frontswap, returning previous thus allowing
> >   * detection of multiple backends and possible nesting.
> > @@ -87,9 +100,16 @@ static inline void inc_frontswap_invalidates(void) { }
> >  struct frontswap_ops frontswap_register_ops(struct frontswap_ops *ops)
> >  {
> > struct frontswap_ops old = frontswap_ops;
> > +   int i;
> > 
> > frontswap_ops = *ops;
> > frontswap_enabled = true;
> > +
> > +   backend_registered = 1;
> > +   for (i = 0; i < MAX_INITIALIZABLE_SD; i++) {
> > +   if (sds[i] != -1)
> > +   (*frontswap_ops.init)(sds[i]);
> > +   }
> > return old;
> >  }
> >  EXPORT_SYMBOL(frontswap_register_ops);
> > @@ -122,7 +142,10 @@ void __frontswap_init(unsigned type)
> > BUG_ON(sis == NULL);
> > if (sis->frontswap_map == NULL)
> > return;
> > -   frontswap_ops.init(type);
> > +   if (backend_registered) {
> > +   (*frontswap_ops.init)(type);
> > +   sds[type] = type;
> 
> This is weird, storing the type in an array indexed by type.  Hence my
> suggestion above about an array of booleans or a bitfield.
> 
> > +   }
> >  }
> >  EXPORT_SYMBOL(__frontswap_init);
> > 
> > @@ -147,10 +170,20 @@ int __frontswap_store(struct page *page)
> > struct swap_info_struct *sis = swap_info[type];
> > pgoff_t offset = swp_offset(entry);
> > 
> > +   if (!backend_registered) {
> > +   inc_frontswap_failed_stores();
> > +   return ret;
> > +   }
> > +
> > BUG_ON(!PageLocked(page));
> > BUG_ON(sis == NULL);
> > if (frontswap_test(sis, offset))
> > dup = 1;
> > +   if (type < MAX_INITIALIZABLE_SD && sds[type] == -1) {
> > +   /* lazy init call to handle post-boot insmod backends*/
> > +   (*frontswap_ops.init)(type);
> > +   sds[type] = type;
> > +   }
> > ret = frontswap_ops.store(type, offset, page);
> > if (ret == 0) {
> > frontswap_set(sis, offset);
> > @@ -186,6 +219,9 @@ int __frontswap_load(struct page *page)
> > struct swap_info_struct *sis = swap_info[type];
> > pgoff_t offset = swp_offset(entry);
> > 
> > +   if (!backend_registered)
> > +   return ret;
> > +
> > BUG_ON(!PageLocked(page));
> > BUG_ON(sis == NULL);
> > if (frontswap_test(sis, offset))
> > @@ -209,6 +245,9 @@ void
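
To make the review suggestion concrete, here is a userspace sketch of lazy initialization using a bitfield indexed by type and a NULL check on the ops pointer in place of a separate backend_registered variable. It follows the behaviour described in the commit message (init calls are recorded until a backend registers), not the posted diff, and every name in it (backend_ops, swap_area_init, backend_register, MAX_TYPES) is an assumption made for the illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_TYPES 32                   /* stand-in for MAX_SWAPFILES */

struct backend_ops {
    void (*init)(unsigned type);
};

static const struct backend_ops *ops;  /* NULL until a backend registers */
static uint32_t pending_types;         /* bit n set: type n still needs init */

uint32_t initialized_mask;             /* demo bookkeeping for demo_init() */

/* Demo backend callback: remembers which types it was asked to set up. */
void demo_init(unsigned type)
{
    initialized_mask |= UINT32_C(1) << type;
}

bool backend_registered(void)
{
    return ops != NULL;
}

/* Called at swapon time: init now if possible, otherwise remember it. */
void swap_area_init(unsigned type)
{
    if (type >= MAX_TYPES)
        return;
    if (ops)
        ops->init(type);
    else
        pending_types |= UINT32_C(1) << type;
}

/* Called when a backend module loads: replay the recorded init calls. */
void backend_register(const struct backend_ops *new_ops)
{
    ops = new_ops;
    for (unsigned type = 0; type < MAX_TYPES; type++) {
        if (pending_types & (UINT32_C(1) << type)) {
            ops->init(type);
            pending_types &= ~(UINT32_C(1) << type);
        }
    }
}
```

The sketch ignores locking and swapoff, both of which a real implementation would have to handle.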

Re: [PATCH] Staging: Android: logger: module_exit implementationg

2012-11-02 Thread Greg Kroah-Hartman

On Thu, Nov 01, 2012 at 11:15:52PM -0700, Luca Clementi wrote:
> Created the module_exit for the android logger so that
> it can be loaded and unloaded as a module. Fixed
> module_init and some other minor issues.

That's doing more than one thing here at once, care to break it up?
Yeah, I know it seems funny for such a small patch, but it helps.

Also, now that you've added this, the logger driver still can't be built
as a module, as the build system isn't changed to let that happen,
right?

Also, why do you want to build this as a module?

> Signed-off-by: Luca Clementi 
> Cc: Greg Kroah-Hartman 
> Cc: Brian Swetland 
> ---
>  drivers/staging/android/logger.c |   30 +-
>  1 file changed, 29 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/staging/android/logger.c b/drivers/staging/android/logger.c
> index 1d5ed47..050be01 100644
> --- a/drivers/staging/android/logger.c
> +++ b/drivers/staging/android/logger.c
> @@ -676,4 +676,32 @@ static int __init logger_init(void)
>  out:
>   return ret;
>  }
> -device_initcall(logger_init);
> +
> +static void __exit logger_exit(void)
> +{
> + struct logger_log *current_log, *next_log;
> +
> + list_for_each_entry_safe(current_log, next_log, &log_list, logs) {
> + /* we have to delete all the entry inside log_list */
> + ret = misc_deregister(&current_log->misc);
> + if (unlikely(ret)) {
> + pr_err("failed to deregister misc device for log '%s'!\n",
> + current_log->misc.name);
> + }
> + }
> + pr_info("removed loggger '%s'\n", current_log->misc.name);

Is that message really needed?

> + vfree(current_log->buffer);
> + kfree(current_log->misc.name);
> + kfree(current_log);
> + }
> +
> + return;
> +}
> +
> +
> +module_init(logger_init);

Is module_init() the same "level" as device_initcall()?  Did you test
this out in an Android system?

> +module_exit(logger_exit);
> +MODULE_LICENSE("GPL");
> +MODULE_AUTHOR("Brian Swetland, ");
> +MODULE_DESCRIPTION("Android Logger");
> +
> +

What's with the unneeded trailing empty lines?

thanks,

greg k-h
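
As background for the teardown loop in the quoted patch: the "_safe" list iterator exists because each entry is freed while the list is being walked, so the next pointer has to be read before the current entry goes away. A minimal userspace sketch of that pattern, using a hand-rolled singly linked list and hypothetical names rather than the kernel list API:

```c
#include <stdlib.h>

struct log_entry {
    struct log_entry *next;
    char *buffer;
};

/* Push a new entry with a buffer of `size` bytes; returns 0 on success. */
int log_add(struct log_entry **head, size_t size)
{
    struct log_entry *e = malloc(sizeof(*e));

    if (!e)
        return -1;
    e->buffer = malloc(size);
    if (!e->buffer) {
        free(e);
        return -1;
    }
    e->next = *head;
    *head = e;
    return 0;
}

/* Free every entry; returns the number of entries released. */
int log_destroy_all(struct log_entry **head)
{
    struct log_entry *cur = *head;
    struct log_entry *next;
    int freed = 0;

    while (cur) {
        next = cur->next;   /* must be read before cur is freed */
        free(cur->buffer);
        free(cur);
        freed++;
        cur = next;
    }
    *head = NULL;
    return freed;
}
```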

Re: [RFC] Second attempt at kernel secure boot support

2012-11-02 Thread Vivek Goyal

On Fri, Nov 02, 2012 at 05:22:41PM +0100, Jiri Kosina wrote:
> On Fri, 2 Nov 2012, Vivek Goyal wrote:
> 
> > > > "crash" utility has module which allows reading kernel memory. So
> > > > leaking this private key will be easier than you are thinking it to be.
> > > 
> > > That's not upstream, right?
> > 
> > Yes, checked with Dave, it is not upstream. Well, still it is a concern
> > for distro kernel.
> 
> Well, that's about /dev/crash, right?

Yes, I was talking about /dev/crash.

> 
> How about /proc/kcore?

Yes, we will have to lock down /proc/kcore too if we go the private
key solution way.

Thanks
Vivek

Re: [PATCH 1/5] mm: cleancache: lazy initialization to allow tmem backends to build/run as modules

2012-11-02 Thread Konrad Rzeszutek Wilk

On Wed, Oct 31, 2012 at 08:07:50AM -0700, Dan Magenheimer wrote:
> With the goal of allowing tmem backends (zcache, ramster, Xen tmem) to be
> built/loaded as modules rather than built-in and enabled by a boot parameter,
> this patch provides "lazy initialization", allowing backends to register to
> cleancache even after filesystems were mounted. Calls to init_fs and
> init_shared_fs are remembered as fake poolids but no real tmem_pools created.
> On backend registration the fake poolids are mapped to real poolids and
> respective tmem_pools.
> 
> Signed-off-by: Stefan Hengelein 
> Signed-off-by: Florian Schmaus 
> Signed-off-by: Andor Daam 
> Signed-off-by: Dan Magenheimer 
> ---
>  include/linux/cleancache.h |1 +
>  mm/cleancache.c|  157 
> +++-
>  2 files changed, 141 insertions(+), 17 deletions(-)
> 
> diff --git a/include/linux/cleancache.h b/include/linux/cleancache.h
> index 42e55de..f7e32f0 100644
> --- a/include/linux/cleancache.h
> +++ b/include/linux/cleancache.h
> @@ -37,6 +37,7 @@ extern struct cleancache_ops
>   cleancache_register_ops(struct cleancache_ops *ops);
>  extern void __cleancache_init_fs(struct super_block *);
>  extern void __cleancache_init_shared_fs(char *, struct super_block *);
> +#define CLEANCACHE_HAS_LAZY_INIT
>  extern int  __cleancache_get_page(struct page *);
>  extern void __cleancache_put_page(struct page *);
>  extern void __cleancache_invalidate_page(struct address_space *, struct page *);
> diff --git a/mm/cleancache.c b/mm/cleancache.c
> index 32e6f41..29430b7 100644
> --- a/mm/cleancache.c
> +++ b/mm/cleancache.c
> @@ -45,15 +45,42 @@ static u64 cleancache_puts;
>  static u64 cleancache_invalidates;
>  
>  /*
> + * When no backend is registered all calls to init_fs and init_shard_fs
> + * are registered and fake poolids are given to the respective
> + * super block but no tmem_pools are created. When a backend
> + * registers with cleancache the previous calls to init_fs and
> + * init_shared_fs are executed to create tmem_pools and set the
> + * respective poolids. While no backend is registered all "puts",
> + * "gets" and "flushes" are ignored or fail.
> + */
> +#define MAX_INITIALIZABLE_FS 32
> +#define FAKE_FS_POOLID_OFFSET 1000
> +#define FAKE_SHARED_FS_POOLID_OFFSET 2000
> +static int fs_poolid_map[MAX_INITIALIZABLE_FS];
> +static int shared_fs_poolid_map[MAX_INITIALIZABLE_FS];
> +static char *uuids[MAX_INITIALIZABLE_FS];
> +static int backend_registered;

Those could use some #define's and bool, so please see attached
patch which does this.

From a89c1224ec1957f1afaf4fbc1de349124bed6c67 Mon Sep 17 00:00:00 2001
From: Dan Magenheimer 
Date: Wed, 31 Oct 2012 08:07:50 -0700
Subject: [PATCH 1/2] mm: cleancache: lazy initialization to allow tmem
 backends to build/run as modules

With the goal of allowing tmem backends (zcache, ramster, Xen tmem) to be
built/loaded as modules rather than built-in and enabled by a boot parameter,
this patch provides "lazy initialization", allowing backends to register to
cleancache even after filesystems were mounted. Calls to init_fs and
init_shared_fs are remembered as fake poolids but no real tmem_pools created.
On backend registration the fake poolids are mapped to real poolids and
respective tmem_pools.

Signed-off-by: Stefan Hengelein 
Signed-off-by: Florian Schmaus 
Signed-off-by: Andor Daam 
Signed-off-by: Dan Magenheimer 
[v1: Minor fixes: used #define for some values and bools]
Signed-off-by: Konrad Rzeszutek Wilk 
---
 include/linux/cleancache.h |   1 +
 mm/cleancache.c| 156 -
 2 files changed, 140 insertions(+), 17 deletions(-)

diff --git a/include/linux/cleancache.h b/include/linux/cleancache.h
index 42e55de..f7e32f0 100644
--- a/include/linux/cleancache.h
+++ b/include/linux/cleancache.h
@@ -37,6 +37,7 @@ extern struct cleancache_ops
cleancache_register_ops(struct cleancache_ops *ops);
 extern void __cleancache_init_fs(struct super_block *);
 extern void __cleancache_init_shared_fs(char *, struct super_block *);
+#define CLEANCACHE_HAS_LAZY_INIT
 extern int  __cleancache_get_page(struct page *);
 extern void __cleancache_put_page(struct page *);
 extern void __cleancache_invalidate_page(struct address_space *, struct page *);
diff --git a/mm/cleancache.c b/mm/cleancache.c
index 32e6f41..318a0ad 100644
--- a/mm/cleancache.c
+++ b/mm/cleancache.c
@@ -45,15 +45,45 @@ static u64 cleancache_puts;
 static u64 cleancache_invalidates;
 
 /*
+ * When no backend is registered all calls to init_fs and init_shard_fs
+ * are registered and fake poolids are given to the respective
+ * super block but no tmem_pools are created. When a backend
+ * registers with cleancache the previous calls to init_fs and
+ * init_shared_fs are executed to create tmem_pools and set the
+ * respective poolids. While no backend is registered all "puts",
+ * "gets" and "flushes" are ignored or fail.
+ */
+#define

Re: [PATCH 00/32] [RFC] nohz/cpuset: Start discussions on nohz CPUs

2012-11-02 Thread Paul E. McKenney

On Fri, Nov 02, 2012 at 03:03:01PM +, Christoph Lameter wrote:
> On Fri, 2 Nov 2012, Steven Rostedt wrote:
> 
> > > also it would be best to sync this conceptually with the processors
> > > enabled for rcu processing.
> >
> > Processors can be disabled for rcu processing? Or are you talking about
> > Paul's new work of offloading rcu callbacks?
> 
> Yes. Paul's new work to remove rcu processing from processors. That needs
> to be synced configuration wise somehow. It does not make sense to process
> rcu callbacks on processors where the timer tick does not work anymore.

In kernels built with CONFIG_FAST_NO_HZ=n, if there are callbacks,
then there will be a tick, with or without Frederic's adaptive ticks.
If CONFIG_FAST_NO_HZ=y, if there are callbacks but no tick, RCU will
arrange for a timer to allow RCU processing to proceed as needed, but
much longer than one tick in duration, and only until such time as the
RCU callbacks drain.

So, yes, people who need absolutely all jitter to be banished at whatever
cost would want both adaptive ticks and no-CBs CPUs, but not everyone
who wants adaptive ticks would necessarily want the burden of choosing
which CPUs get callbacks offloaded from and where they should be executed.

So I believe that these need to be controlled separately for the immediate
future.

Thanx, Paul


Re: [PATCH 3/5] staging: zcache2+ramster: enable zcache2 to be built/loaded as a module

2012-11-02 Thread Konrad Rzeszutek Wilk

On Wed, Oct 31, 2012 at 08:07:52AM -0700, Dan Magenheimer wrote:
> Allow zcache2 to be built/loaded as a module.  Note runtime dependency
> disallows loading if cleancache/frontswap lazy initialization patches
> are not present.  Zsmalloc support has not yet been merged into zcache2
> but, once merged, could now easily be selected via a module_param.
> 
> If built-in (not built as a module), the original mechanism of enabling via
> a kernel boot parameter is retained, but this should be considered deprecated.
> 
> Note that module unload is explicitly not yet supported.

I had an issue putting it on v3.7-rc3 with the Kconfig. Not sure why
as it looks exactly the same.

The patch looks good, however..

> @@ -1812,9 +1846,28 @@ static int __init zcache_init(void)
>   }
>   if (ramster_enabled)
>   ramster_init(!disable_cleancache, !disable_frontswap,
> - frontswap_has_exclusive_gets);
> + frontswap_has_exclusive_gets,
> + !disable_frontswap_selfshrink);
>  out:
>   return ret;
>  }

.. ramster_init change is in the next patch. So it looks like the
patch order is a bit mismatched.


Re: [PATCH tip/core/rcu 1/2] rcu: Add callback-free CPUs

2012-11-02 Thread Paul E. McKenney

On Wed, Oct 31, 2012 at 03:10:04PM +0100, Frederic Weisbecker wrote:
> 2012/10/31 Paul E. McKenney :
> > +/*
> > + * Per-rcu_data kthread, but only for no-CBs CPUs.  Each kthread invokes
> > + * callbacks queued by the corresponding no-CBs CPU.
> > + */
> > +static int rcu_nocb_kthread(void *arg)
> > +{
> > +   int c, cl;
> > +   struct rcu_head *list;
> > +   struct rcu_head *next;
> > +   struct rcu_head **tail;
> > +   struct rcu_data *rdp = arg;
> > +
> > +   /* Each pass through this loop invokes one batch of callbacks */
> > +   for (;;) {
> > +   /* If not polling, wait for next batch of callbacks. */
> > +   if (!rcu_nocb_poll)
> > +   wait_event(rdp->nocb_wq, rdp->nocb_head);
> > +   list = ACCESS_ONCE(rdp->nocb_head);
> > +   if (!list) {
> > +   schedule_timeout_interruptible(1);
> > +   continue;
> > +   }
> > +
> > +   /*
> > +* Extract queued callbacks, update counts, and wait
> > +* for a grace period to elapse.
> > +*/
> > +   ACCESS_ONCE(rdp->nocb_head) = NULL;
> > +   tail = xchg(&rdp->nocb_tail, &rdp->nocb_head);
> > +   c = atomic_long_xchg(&rdp->nocb_q_count, 0);
> > +   cl = atomic_long_xchg(&rdp->nocb_q_count_lazy, 0);
> > +   ACCESS_ONCE(rdp->nocb_p_count) += c;
> > +   ACCESS_ONCE(rdp->nocb_p_count_lazy) += cl;
> > +   wait_rcu_gp(rdp->rsp->call_remote);
> > +
> > +   /* Each pass through the following loop invokes a callback. */
> > +   trace_rcu_batch_start(rdp->rsp->name, cl, c, -1);
> > +   c = cl = 0;
> > +   while (list) {
> > +   next = list->next;
> > +   /* Wait for enqueuing to complete, if needed. */
> > +   while (next == NULL && &list->next != tail) {
> > +   schedule_timeout_interruptible(1);
> > +   next = list->next;
> > +   }
> > +   debug_rcu_head_unqueue(list);
> > +   local_bh_disable();
> > +   if (__rcu_reclaim(rdp->rsp->name, list))
> > +   cl++;
> > +   c++;
> > +   local_bh_enable();
> > +   list = next;
> > +   }
> > +   trace_rcu_batch_end(rdp->rsp->name, c, !!list, 0, 0, 1);
> > +   ACCESS_ONCE(rdp->nocb_p_count) -= c;
> > +   ACCESS_ONCE(rdp->nocb_p_count_lazy) -= cl;
> > +   rdp->n_cbs_invoked += c;
> > +   }
> > +   return 0;
> > +}
> > +
> > +/* Initialize per-rcu_data variables for no-CBs CPUs. */
> > +static void __init rcu_boot_init_nocb_percpu_data(struct rcu_data *rdp)
> > +{
> > +   rdp->nocb_tail = &rdp->nocb_head;
> > +   init_waitqueue_head(&rdp->nocb_wq);
> > +}
> > +
> > +/* Create a kthread for each RCU flavor for each no-CBs CPU. */
> > +static void __init rcu_spawn_nocb_kthreads(struct rcu_state *rsp)
> > +{
> > +   int cpu;
> > +   struct rcu_data *rdp;
> > +   struct task_struct *t;
> > +
> > +   if (rcu_nocb_mask == NULL)
> > +   return;
> > +   for_each_cpu(cpu, rcu_nocb_mask) {
> > +   rdp = per_cpu_ptr(rsp->rda, cpu);
> > +   t = kthread_run(rcu_nocb_kthread, rdp, "rcuo%d", cpu);
> 
> Sorry, I think I left my brain in the middle of the diff. But there is
> something I'm misunderstanding I think. Here you're creating an
> rcu_nocb_kthread per nocb cpu. Looking at the code of
> rcu_nocb_kthread(), it seems to execute the callbacks with
> __rcu_reclaim().

True, executing within the context of the kthread, which might be
executing on any CPU.

> So, in the end, no callbacks CPU execute their callbacks. Isn't it the
> opposite than what is expected? (again, just referring to my
> misunderstanding).

The no-callbacks CPU would execute its own callbacks only if the
corresponding kthread happened to be executing on that CPU.
If you wanted a given CPU to be completely free of callbacks,
you would instead constrain the corresponding kthread to run
elsewhere.

Thanx, Paul

> Thanks.
> 
> > +   BUG_ON(IS_ERR(t));
> > +   ACCESS_ONCE(rdp->nocb_kthread) = t;
> > +   }
> > +}
> 
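
For anyone else trying to follow the queue handling in rcu_nocb_kthread(), the detach-and-drain pattern can be modelled in userspace with C11 atomics. This is a sketch under assumptions, not the RCU code: the struct and function names are made up, sequentially consistent atomics are used throughout for simplicity, and the grace-period wait that the kernel performs between detaching the list and invoking the callbacks is left out.

```c
#include <stdatomic.h>
#include <stddef.h>

struct cb;
typedef _Atomic(struct cb *) cb_link;

struct cb {
    cb_link next;
    void (*func)(struct cb *);
};

struct cb_queue {
    cb_link head;
    _Atomic(cb_link *) tail;  /* address of the last ->next, or of head if empty */
};

int invoked;                  /* demo counter bumped by count_cb() */

void count_cb(struct cb *cb)
{
    (void)cb;
    invoked++;
}

void cb_queue_init(struct cb_queue *q)
{
    atomic_init(&q->head, NULL);
    atomic_init(&q->tail, &q->head);
}

/* Producer: claim the tail slot with one exchange, then link the node in. */
void cb_enqueue(struct cb_queue *q, struct cb *cb, void (*func)(struct cb *))
{
    cb_link *prev;

    cb->func = func;
    atomic_store(&cb->next, NULL);
    prev = atomic_exchange(&q->tail, &cb->next);
    atomic_store(prev, cb);
}

/* Consumer: detach everything queued so far and invoke it; returns the count. */
int cb_invoke_all(struct cb_queue *q)
{
    struct cb *list = atomic_load(&q->head);
    struct cb *next;
    cb_link *tail;
    int count = 0;

    if (!list)
        return 0;
    atomic_store(&q->head, NULL);
    tail = atomic_exchange(&q->tail, &q->head);
    while (list) {
        next = atomic_load(&list->next);
        /* A producer may have claimed the slot but not linked its node yet. */
        while (!next && &list->next != tail)
            next = atomic_load(&list->next);
        list->func(list);
        count++;
        list = next;
    }
    return count;
}
```

Whichever thread calls cb_invoke_all() runs the callbacks, independent of which thread queued them. That is the point made in the reply: the callbacks execute in the context of the kthread, on whatever CPU it happens to be running.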


Re: [PATCH v3 0/7] Improve swiotlb performance by using physical addresses

2012-11-02 Thread Alexander Duyck

On 11/02/2012 09:21 AM, Konrad Rzeszutek Wilk wrote:
> On Mon, Oct 29, 2012 at 03:05:56PM -0400, Konrad Rzeszutek Wilk wrote:
>> On Mon, Oct 29, 2012 at 11:18:09AM -0700, Alexander Duyck wrote:
>>> On Mon, Oct 15, 2012 at 10:19 AM, Alexander Duyck
>>>  wrote:
>>>> While working on 10Gb/s routing performance I found a significant amount of
>>>> time was being spent in the swiotlb DMA handler. Further digging found that a
>>>> significant amount of this was due to virtual to physical address translation
>>>> and calling the function that did it. It accounted for nearly 60% of the
>>>> total swiotlb overhead.
>>>>
>>>> This patch set works to resolve that by replacing the io_tlb_start and
>>>> io_tlb_end virtual addresses with physical addresses. In addition it changes
>>>> the io_tlb_overflow_buffer from a virtual to a physical address. I followed
>>>> through with the cleanup to the point that the only functions that really
>>>> require the virtual address for the DMA buffer are the init, free, and
>>>> bounce functions.
>>>>
>>>> In the case of devices that are using the bounce buffers these patches should
>>>> result in only a slight performance gain if any. This is due to the locking
>>>> overhead required to map and unmap the buffers.
>>>>
>>>> In the case of devices that are not making use of bounce buffers these patches
>>>> can significantly reduce their overhead. In the case of an ixgbe routing test
>>>> for example, these changes result in 7 fewer calls to __phys_addr and
>>>> allow is_swiotlb_buffer to become inlined due to a reduction in the number of
>>>> instructions. When running a routing throughput test using small packets I
>>>> saw roughly a 6% increase in packet rates after applying these patches. This
>>>> appears to match up with the CPU overhead reduction I was tracking via perf.
>>>>
>>>> Before:
>>>> Results 10.0Mpps
>>>>
>>>> After:
>>>> Results 10.6Mpps
>>>>
>>>> Finally, I updated the parameter names for several of the core function calls
>>>> as there was some ambiguity in naming. Specifically virtual address pointers
>>>> were named dma_addr. When I changed these pointers to physical I instead used
>>>> the name tlb_addr as this value represented a physical address in the
>>>> io_tlb_start region and is less likely to be confused with a bus address.
>>>>
>>>> v2:
>>>> I reviewed the changes and realized that the first patch that was dropping
>>>> io_tlb_end and calculating the value didn't actually gain me much once I had
>>>> gone through and translated the rest of the addresses to physical addresses.
>>>> As such I have updated the patch so that it instead is converting io_tlb_end
>>>> from a virtual address to a physical address.  This actually helps to reduce
>>>> the overhead for is_swiotlb_buffer and swiotlb_dma_supported by several
>>>> instructions.
>>>>
>>>> v3:
>>>> After reviewing the patches I realized I was causing some namespace pollution
>>>> since a "static char *" was being replaced with "phys_addr_t" when it should
>>>> have been "static phys_addr_t".  As such I have updated the first 3 patches to
>>>> correctly replace static pointers with static physical addresses.
>>>>
>>>> ---
>>>>
>>>> Alexander Duyck (7):
>>>>   swiotlb:  Do not export swiotlb_bounce since there are no external consumers
>>>>   swiotlb: Use physical addresses instead of virtual in swiotlb_tbl_sync_single
>>>>   swiotlb: Use physical addresses for swiotlb_tbl_unmap_single
>>>>   swiotlb: Return physical addresses when calling swiotlb_tbl_map_single
>>>>   swiotlb: Make io_tlb_overflow_buffer a physical address
>>>>   swiotlb: Make io_tlb_start a physical address instead of a virtual one
>>>>   swiotlb: Make io_tlb_end a physical address instead of a virtual one
>>>>
>>>>  drivers/xen/swiotlb-xen.c |   25 ++--
>>>>  include/linux/swiotlb.h   |   20 ++-
>>>>  lib/swiotlb.c             |  269 +++--
>>>>  3 files changed, 163 insertions(+), 151 deletions(-)
>>>>

>>> Is there any ETA on when this patch series might be pulled into a
>>> tree?  I'm just wondering if I need to rebase this patch series and
>>> resubmit it, and if so what tree I need to rebase it off of?
>> No need to rebase it. I did a test on V2 version with Xen, but I still
>> need to do a IA64/Calgary/AMD Vi/Intel VT-d/GART test before
>> pushing it out.
> So you should see your patches in linux-next.

I see they are in there.  Thanks.
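
For the archive, the core idea of the series can be sketched in userspace C: once the bounds of the bounce region are kept as physical addresses, checking whether a physical address falls inside it is a pair of comparisons with no virtual-to-physical translation. The names bounce_region_set() and in_bounce_region() and the uint64_t typedef are illustrative assumptions, not the lib/swiotlb.c code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t phys_addr_t;      /* userspace stand-in for the kernel type */

static phys_addr_t io_tlb_start;   /* first byte of the bounce region */
static phys_addr_t io_tlb_end;     /* one past the last byte */

/* Record the region bounds once, at init time, as physical addresses. */
void bounce_region_set(phys_addr_t start, size_t bytes)
{
    io_tlb_start = start;
    io_tlb_end = start + bytes;
}

/* Two comparisons, no address translation: cheap enough to inline. */
static inline bool in_bounce_region(phys_addr_t paddr)
{
    return paddr >= io_tlb_start && paddr < io_tlb_end;
}
```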

Re: [PATCH V3 4/4] mfd: tps65910: pass irq_domain when adding mfd sub devices

2012-11-02 Thread Mark Brown

On Fri, Nov 02, 2012 at 10:59:58PM +0530, Laxman Dewangan wrote:
> When adding the sub device "tps65910-rtc", it is passed the
> IO resource IRQ for the interrupt number. This interrupt needs
> to be mapped in the device irq domain. Pass the irq domain of the
> device in mfd_add_devices() so that proper irq mapping can be done
> when adding the sub device RTC.

Reviewed-by: Mark Brown 



[PATCH 4/6 v2] arm highbank: add support for pl320 IPC

2012-11-02 Thread Mark Langsdorf

From: Rob Herring 

The pl320 IPC allows for interprocessor communication between the highbank A9
and the EnergyCore Management Engine. The pl320 implements a straightforward
mailbox protocol.

Signed-off-by: Mark Langsdorf 
Signed-off-by: Rob Herring 

Changes from v1:
Removed erroneous changes for cpufreq Kconfig
---
 arch/arm/mach-highbank/Makefile |   2 +
 arch/arm/mach-highbank/include/mach/pl320-ipc.h |  20 ++
 arch/arm/mach-highbank/pl320-ipc.c  | 232 
 3 files changed, 254 insertions(+)
 create mode 100644 arch/arm/mach-highbank/include/mach/pl320-ipc.h
 create mode 100644 arch/arm/mach-highbank/pl320-ipc.c

diff --git a/arch/arm/mach-highbank/Makefile b/arch/arm/mach-highbank/Makefile
index 3ec8bdd..b894708 100644
--- a/arch/arm/mach-highbank/Makefile
+++ b/arch/arm/mach-highbank/Makefile
@@ -7,3 +7,5 @@ obj-$(CONFIG_DEBUG_HIGHBANK_UART)   += lluart.o
 obj-$(CONFIG_SMP)  += platsmp.o
 obj-$(CONFIG_HOTPLUG_CPU)  += hotplug.o
 obj-$(CONFIG_PM_SLEEP) += pm.o
+
+obj-y  += pl320-ipc.o
diff --git a/arch/arm/mach-highbank/include/mach/pl320-ipc.h b/arch/arm/mach-highbank/include/mach/pl320-ipc.h
new file mode 100644
index 000..a0e58ee
--- /dev/null
+++ b/arch/arm/mach-highbank/include/mach/pl320-ipc.h
@@ -0,0 +1,20 @@
+/*
+ * Copyright 2010 Calxeda, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+int ipc_call_fast(u32 *data);
+int ipc_call_slow(u32 *data);
+
+extern int pl320_ipc_register_notifier(struct notifier_block *nb);
+extern int pl320_ipc_unregister_notifier(struct notifier_block *nb);
diff --git a/arch/arm/mach-highbank/pl320-ipc.c b/arch/arm/mach-highbank/pl320-ipc.c
new file mode 100644
index 000..0eb92e4
--- /dev/null
+++ b/arch/arm/mach-highbank/pl320-ipc.c
@@ -0,0 +1,232 @@
+/*
+ * Copyright 2012 Calxeda, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+#define IPCMxSOURCE(m)     ((m) * 0x40)
+#define IPCMxDSET(m)       (((m) * 0x40) + 0x004)
+#define IPCMxDCLEAR(m)     (((m) * 0x40) + 0x008)
+#define IPCMxDSTATUS(m)    (((m) * 0x40) + 0x00C)
+#define IPCMxMODE(m)       (((m) * 0x40) + 0x010)
+#define IPCMxMSET(m)       (((m) * 0x40) + 0x014)
+#define IPCMxMCLEAR(m)     (((m) * 0x40) + 0x018)
+#define IPCMxMSTATUS(m)    (((m) * 0x40) + 0x01C)
+#define IPCMxSEND(m)       (((m) * 0x40) + 0x020)
+#define IPCMxDR(m, dr)     (((m) * 0x40) + ((dr) * 4) + 0x024)
+
+#define IPCMMIS(irq)       (((irq) * 8) + 0x800)
+#define IPCMRIS(irq)       (((irq) * 8) + 0x804)
+
+#define MBOX_MASK(n)       (1 << (n))
+#define IPC_FAST_MBOX      0
+#define IPC_SLOW_MBOX      1
+#define IPC_RX_MBOX        2
+
+#define CHAN_MASK(n)       (1 << (n))
+#define A9_SOURCE          1
+#define M3_SOURCE          0
+
+static void __iomem *ipc_base;
+static int ipc_irq;
+static DEFINE_SPINLOCK(ipc_m0_lock);
+static DEFINE_MUTEX(ipc_m1_lock);
+static DECLARE_COMPLETION(ipc_completion);
+static ATOMIC_NOTIFIER_HEAD(ipc_notifier);
+
+static inline void set_destination(int source, int mbox)
+{
+   __raw_writel(CHAN_MASK(source), ipc_base + IPCMxDSET(mbox));
+   __raw_writel(CHAN_MASK(source), ipc_base + IPCMxMSET(mbox));
+}
+
+static inline void clear_destination(int source, int mbox)
+{
+   __raw_writel(CHAN_MASK(source), ipc_base + IPCMxDCLEAR(mbox));
+   __raw_writel(CHAN_MASK(source), ipc_base + IPCMxMCLEAR(mbox));
+}
+
+static void __ipc_send(int mbox, u32 *data)
+{
+   int i;
+   for (i = 0; i < 7; i++)
+       __raw_writel(data[i], ipc_base + IPCMxDR(mbox, i));
+
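The register layout used by the (truncated) patch above gives each mailbox its own 0x40-byte window, with seven data registers starting at offset 0x024. A minimal userspace sketch of that offset arithmetic follows. The macros are copied from the patch; `fake_regs`, `fake_writel()` and `sketch_ipc_send()` are hypothetical stand-ins for the ioremap()ed block and `__raw_writel()`, not part of the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Offsets copied from the patch: one 0x40-byte window per mailbox. */
#define IPCMxSEND(m)     (((m) * 0x40) + 0x020)
#define IPCMxDR(m, dr)   (((m) * 0x40) + ((dr) * 4) + 0x024)

#define IPC_FAST_MBOX    0
#define IPC_SLOW_MBOX    1

/* Hypothetical stand-in for the memory-mapped register block. */
static uint32_t fake_regs[0x1000 / 4];

/* Hypothetical replacement for __raw_writel(): store into the fake block. */
static void fake_writel(uint32_t val, uint32_t offset)
{
    assert(offset < sizeof(fake_regs));
    fake_regs[offset / 4] = val;
}

/* Copy the seven payload words into the data registers of one mailbox. */
static void sketch_ipc_send(int mbox, const uint32_t *data)
{
    int i;

    for (i = 0; i < 7; i++)
        fake_writel(data[i], IPCMxDR(mbox, i));
}
```

For example, data register 3 of the slow mailbox sits at 0x40 + 3 * 4 + 0x024 = 0x070.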

[PATCH 0/6 v2] cpufreq: add support for Calxeda ECX-1000 (highbank)

2012-11-02 Thread Mark Langsdorf

This patch series adds cpufreq support for the Calxeda ECX-1000 (highbank)
SoCs. The driver is based on the cpufreq-cpu0 driver. Because of the 
unique way that highbank uses the EnergyCore Management Engine to manage
voltages, it was not possible to use the cpufreq-cpu0 driver.

--Mark Langsdorf


[PATCH 5/6 v2] power: export opp cpufreq functions

2012-11-02 Thread Mark Langsdorf

These functions are needed to make the cpufreq-cpu0 and highbank-cpufreq
drivers loadable as modules.

Signed-off-by: Mark Langsdorf 
Acked-by: Nishanth Menon 
Cc: linux...@vger.kernel.org

Changes from v1:
Added Nishanth Menon's ack.
Clarified the purpose of the change in the commit message.
---
 drivers/base/power/opp.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/base/power/opp.c b/drivers/base/power/opp.c
index d946864..37dc5f4 100644
--- a/drivers/base/power/opp.c
+++ b/drivers/base/power/opp.c
@@ -23,6 +23,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /*
  * Internal data structure organization with the OPP layer library is as
@@ -643,6 +644,7 @@ int opp_init_cpufreq_table(struct device *dev,
 
return 0;
 }
+EXPORT_SYMBOL(opp_init_cpufreq_table);
 
 /**
  * opp_free_cpufreq_table() - free the cpufreq table
@@ -660,6 +662,7 @@ void opp_free_cpufreq_table(struct device *dev,
kfree(*table);
*table = NULL;
 }
+EXPORT_SYMBOL(opp_free_cpufreq_table);
 #endif /* CONFIG_CPU_FREQ */
 
 /**
@@ -720,4 +723,5 @@ int of_init_opp_table(struct device *dev)
 
return 0;
 }
+EXPORT_SYMBOL(of_init_opp_table);
 #endif
-- 
1.7.11.7


[PATCH 6/6 v2] cpufreq, highbank: add support for highbank cpufreq

2012-11-02 Thread Mark Langsdorf

Highbank processors depend on the external ECME to perform voltage
management based on a requested frequency. Communication between the
highbank and ECME cores happens over the pl320 IPC channel.

Signed-off-by: Mark Langsdorf 
Cc: devicetree-disc...@lists.ozlabs.org
Cc: Rafael J. Wysocki 

Changes from v1:
Added highbank specific Kconfig changes
---
 .../bindings/cpufreq/highbank-cpufreq.txt  |  53 +
 arch/arm/Kconfig   |   2 +
 arch/arm/boot/dts/highbank.dts |  10 +
 arch/arm/mach-highbank/Kconfig |   2 +
 drivers/cpufreq/Kconfig.arm|  15 ++
 drivers/cpufreq/Makefile   |   1 +
 drivers/cpufreq/highbank-cpufreq.c | 229 +
 7 files changed, 312 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/cpufreq/highbank-cpufreq.txt
 create mode 100644 drivers/cpufreq/highbank-cpufreq.c

diff --git a/Documentation/devicetree/bindings/cpufreq/highbank-cpufreq.txt b/Documentation/devicetree/bindings/cpufreq/highbank-cpufreq.txt
new file mode 100644
index 000..3ec2cec
--- /dev/null
+++ b/Documentation/devicetree/bindings/cpufreq/highbank-cpufreq.txt
@@ -0,0 +1,53 @@
+Highbank cpufreq driver
+
+This is the cpufreq driver for the Calxeda ECX-1000 (highbank) processor. It is
+based on the generic cpu0 driver and uses a similar format for bindings. Since
+the EnergyCore Management Engine maintains the voltage based on the
+frequency, the voltage component of the operating points can be set to
+arbitrary values.
+
+Both required properties listed below must be defined under node /cpus/cpu@0.
+
+Required properties:
+- operating-points: Refer to Documentation/devicetree/bindings/power/opp.txt
+  for details
+- clock-latency: Specifies the maximum possible transition latency for the
+  clock, in nanoseconds.
+
+Examples:
+
+cpus {
+   #address-cells = <1>;
+   #size-cells = <0>;
+
+   cpu@0 {
+   compatible = "arm,cortex-a9";
+   reg = <0>;
+   next-level-cache = <>;
+   operating-points = <
+   /* kHz  ignored */
+   79  100
+   396000  100
+   198000  100
+   >;
+   clock-latency = <20>;
+   };
+
+   cpu@1 {
+   compatible = "arm,cortex-a9";
+   reg = <1>;
+   next-level-cache = <>;
+   };
+
+   cpu@2 {
+   compatible = "arm,cortex-a9";
+   reg = <2>;
+   next-level-cache = <>;
+   };
+
+   cpu@3 {
+   compatible = "arm,cortex-a9";
+   reg = <3>;
+   next-level-cache = <>;
+   };
+};
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index ade7e92..4ed0b7b 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -391,6 +391,8 @@ config ARCH_SIRF
select PINCTRL
select PINCTRL_SIRF
select USE_OF
+   select ARCH_HAS_CPUFREQ
+   select ARCH_HAS_OPP
help
  Support for CSR SiRFprimaII/Marco/Polo platforms
 
diff --git a/arch/arm/boot/dts/highbank.dts b/arch/arm/boot/dts/highbank.dts
index 0c6fc34..7c4c27d 100644
--- a/arch/arm/boot/dts/highbank.dts
+++ b/arch/arm/boot/dts/highbank.dts
@@ -36,6 +36,16 @@
next-level-cache = <>;
clocks = <>;
clock-names = "cpu";
+   operating-points = <
+   /* kHzignored */
+130  100
+120  100
+110  100
+ 80  100
+ 40  100
+ 20  100
+   >;
+   clock-latency = <10>;
};
 
cpu@1 {
diff --git a/arch/arm/mach-highbank/Kconfig b/arch/arm/mach-highbank/Kconfig
index 0e1d0a4..ee83af6 100644
--- a/arch/arm/mach-highbank/Kconfig
+++ b/arch/arm/mach-highbank/Kconfig
@@ -13,3 +13,5 @@ config ARCH_HIGHBANK
select HAVE_SMP
select SPARSE_IRQ
select USE_OF
+   select ARCH_HAS_CPUFREQ
+   select ARCH_HAS_OPP
diff --git a/drivers/cpufreq/Kconfig.arm b/drivers/cpufreq/Kconfig.arm
index 5961e64..bc3ef55 100644
--- a/drivers/cpufreq/Kconfig.arm
+++ b/drivers/cpufreq/Kconfig.arm
@@ -76,3 +76,18 @@ config ARM_EXYNOS5250_CPUFREQ
help
  This adds the CPUFreq driver for Samsung EXYNOS5250
  SoC.
+
+config ARM_HIGHBANK_CPUFREQ
+   tristate "Calxeda Highbank-based"
+   depends on ARCH_HIGHBANK
+   select CPU_FREQ_TABLE
+   select HAVE_CLK
+   select PM_OPP
+   select OF
+   default m
+   help
+ This adds the CPUFreq driver for Calxeda Highbank

[PATCH 2/6 v2] clk, highbank: remove non-bypass reset mode

2012-11-02 Thread Mark Langsdorf

The highbank clock will glitch if the clock rate is reset without
relocking the PLL. Remove the option to attempt resetting without
relocking.

Signed-off-by: Mark Langsdorf 
Signed-off-by: Rob Herring 
Cc: mturque...@linaro.org

Changes from v1:
Removed erroneous reformatting.

---
 drivers/clk/clk-highbank.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/clk/clk-highbank.c b/drivers/clk/clk-highbank.c
index 52fecad..3a0b723 100644
--- a/drivers/clk/clk-highbank.c
+++ b/drivers/clk/clk-highbank.c
@@ -182,8 +182,10 @@ static int clk_pll_set_rate(struct clk_hw *hwclk, unsigned long rate,
        reg |= HB_PLL_EXT_ENA;
        reg &= ~HB_PLL_EXT_BYPASS;
    } else {
+       writel(reg | HB_PLL_EXT_BYPASS, hbclk->reg);
        reg &= ~HB_PLL_DIVQ_MASK;
        reg |= divq << HB_PLL_DIVQ_SHIFT;
+       writel(reg | HB_PLL_EXT_BYPASS, hbclk->reg);
    }
    writel(reg, hbclk->reg);
 
-- 
1.7.11.7
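
The write ordering in the hunk above (enter bypass, change the divider while bypassed, then write the final value) can be modelled in userspace. This is only a sketch under stated assumptions: the bit positions and the `sk_` names are made up for illustration and are not taken from clk-highbank.c, and the register is a plain struct field rather than MMIO.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit layout for illustration; not the real highbank values. */
#define SK_PLL_EXT_BYPASS  (1u << 0)
#define SK_PLL_DIVQ_SHIFT  16
#define SK_PLL_DIVQ_MASK   (0x7u << SK_PLL_DIVQ_SHIFT)

#define SK_MAX_WRITES      8

struct sk_pll {
    uint32_t reg;                   /* current register value */
    uint32_t log[SK_MAX_WRITES];    /* every value written, in order */
    int nwrites;
};

/* Stand-in for writel(): record the write and update the register. */
static void sk_writel(struct sk_pll *pll, uint32_t val)
{
    assert(pll->nwrites < SK_MAX_WRITES);
    pll->log[pll->nwrites++] = val;
    pll->reg = val;
}

/* Change only the output divider, keeping the PLL bypassed meanwhile. */
static void sk_set_divq(struct sk_pll *pll, uint32_t divq)
{
    uint32_t reg = pll->reg;

    sk_writel(pll, reg | SK_PLL_EXT_BYPASS);   /* enter bypass */
    reg &= ~SK_PLL_DIVQ_MASK;
    reg |= divq << SK_PLL_DIVQ_SHIFT;
    sk_writel(pll, reg | SK_PLL_EXT_BYPASS);   /* new divider, still bypassed */
    sk_writel(pll, reg);                       /* restore original bypass state */
}
```

In this model the divider field never changes in a write that has the bypass bit clear, which is the property the patch is after.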


[PATCH 3/6 v2] cpufreq: tolerate inexact values when collecting stats

2012-11-02 Thread Mark Langsdorf

When collecting stats, if a frequency doesn't match the table, go through
the table again with both the search frequency and table values shifted
right by 10 bits.

Signed-off-by: Mark Langsdorf 
Cc: MyungJoo Ham 

Changes from v1:
Implemented a simple round-up algorithm instead of the over/under
method that could cause errors on Intel processors with boost mode.
---
 drivers/cpufreq/cpufreq_stats.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/cpufreq/cpufreq_stats.c b/drivers/cpufreq/cpufreq_stats.c
index 3998316..ab583e7 100644
--- a/drivers/cpufreq/cpufreq_stats.c
+++ b/drivers/cpufreq/cpufreq_stats.c
@@ -158,8 +158,12 @@ static struct attribute_group stats_attr_group = {
 static int freq_table_get_index(struct cpufreq_stats *stat, unsigned int freq)
 {
    int index;
+   for (index = 0; index < stat->max_state; index++)
+       if (stat->freq_table[index] == freq)
+           return index;
+   /* no exact match, round up */
    for (index = 0; index < stat->max_state; index++)
-       if (stat->freq_table[index] == freq)
+       if ((stat->freq_table[index] >> 10) == (freq >> 10))
            return index;
    return -1;
 }
@@ -251,6 +255,8 @@ static int cpufreq_stats_create_table(struct cpufreq_policy *policy,
    spin_lock(&cpufreq_stats_lock);
    stat->last_time = get_jiffies_64();
    stat->last_index = freq_table_get_index(stat, policy->cur);
+   if (stat->last_index > stat->max_state)
+       stat->last_index = stat->max_state - 1;
    spin_unlock(&cpufreq_stats_lock);
    cpufreq_cpu_put(data);
    return 0;
-- 
1.7.11.7
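
The two-pass lookup in freq_table_get_index() above can be tried out in userspace. The following is a minimal sketch, assuming the table is passed in as a plain array; `sketch_freq_table_get_index()` is a hypothetical name, and the kernel version reads the table from struct cpufreq_stats instead.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Userspace sketch of the two-pass lookup: exact match first, then a
 * match with the low 10 bits of both values dropped.
 */
static int sketch_freq_table_get_index(const unsigned int *table,
                                       size_t len, unsigned int freq)
{
    size_t i;

    assert(table != NULL);

    /* First pass: exact match. */
    for (i = 0; i < len; i++)
        if (table[i] == freq)
            return (int)i;

    /* Second pass: same value after discarding the low 10 bits. */
    for (i = 0; i < len; i++)
        if ((table[i] >> 10) == (freq >> 10))
            return (int)i;

    return -1;
}
```

Note that the shift truncates rather than rounds, so the second pass groups frequencies into 1024 kHz buckets. Two values that are only 1 kHz apart still fail to match if they fall on opposite sides of a bucket boundary.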


[PATCH 1/6 v2] arm: use devicetree to get smp_twd clock

2012-11-02 Thread Mark Langsdorf

From: Rob Herring 

Signed-off-by: Rob Herring 
Cc: Russell King 
Cc: linux-arm-ker...@lists.infradead.org

Changes from v1:
None.
---
 arch/arm/kernel/smp_twd.c | 23 ++-
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/arch/arm/kernel/smp_twd.c b/arch/arm/kernel/smp_twd.c
index b22d700..600fbcc 100644
--- a/arch/arm/kernel/smp_twd.c
+++ b/arch/arm/kernel/smp_twd.c
@@ -237,12 +237,15 @@ static irqreturn_t twd_handler(int irq, void *dev_id)
return IRQ_NONE;
 }
 
-static struct clk *twd_get_clock(void)
+static struct clk *twd_get_clock(struct device_node *np)
 {
-   struct clk *clk;
+   struct clk *clk = NULL;
    int err;
 
-   clk = clk_get_sys("smp_twd", NULL);
+   if (np)
+       clk = of_clk_get(np, 0);
+   if (!clk)
+       clk = clk_get_sys("smp_twd", NULL);
    if (IS_ERR(clk)) {
        pr_err("smp_twd: clock not found: %d\n", (int)PTR_ERR(clk));
        return clk;
@@ -263,6 +266,7 @@ static struct clk *twd_get_clock(void)
return ERR_PTR(err);
}
 
+   twd_timer_rate = clk_get_rate(clk);
return clk;
 }
 
@@ -273,12 +277,7 @@ static int __cpuinit twd_timer_setup(struct clock_event_device *clk)
 {
    struct clock_event_device **this_cpu_clk;
 
-   if (!twd_clk)
-       twd_clk = twd_get_clock();
-
-   if (!IS_ERR_OR_NULL(twd_clk))
-       twd_timer_rate = clk_get_rate(twd_clk);
-   else
+   if (IS_ERR_OR_NULL(twd_clk))
        twd_calibrate_rate();
 
    __raw_writel(0, twd_base + TWD_TIMER_CONTROL);
@@ -349,6 +348,10 @@ int __init twd_local_timer_register(struct twd_local_timer *tlt)
if (!twd_base)
return -ENOMEM;
 
+   twd_clk = twd_get_clock(NULL);
+
+   twd_clk = twd_get_clock(NULL);
+
return twd_local_timer_common_register();
 }
 
@@ -383,6 +386,8 @@ void __init twd_local_timer_of_register(void)
goto out;
}
 
+   twd_clk = twd_get_clock(np);
+
err = twd_local_timer_common_register();
 
 out:
-- 
1.7.11.7


Re: [RFC PATCH 1/2] memory: davinci - add aemif controller platform driver

2012-11-02 Thread Stephen Warren

On 11/02/2012 10:21 AM, Murali Karicheri wrote:
> This is a platform driver for asynchronous external memory interface
> available on TI SoCs. This driver was previously located inside the
> mach-davinci folder. As this DaVinci IP is re-used across multiple
> family of devices such as c6x, keystone etc, the driver is moved to drivers.
> The driver configures async bus parameters associated with a particular
> chip select. For DaVinci controller driver and driver for other async
> devices such as NOR flash, ASRAM etc, the bus configuration is
> done by this driver at init time. A set of APIs (set/get) provided to
> update the values based on specific driver usage.

> +++ b/Documentation/devicetree/bindings/arm/davinci/aemif.txt

If the HW/binding is generic, I'd assume the documentation would be
placed somewhere more like
Documentation/devicetree/bindings/memory/davinci-aemif.txt?

> @@ -0,0 +1,62 @@
> +* Texas Instruments Davinci AEMIF bus interface
> +
> +This file provides information for the davinci-emif chip select
> +bindings.

Shouldn't the binding be for an IP block (the AEMIF bus controller I
assume), rather than for a particular chip-select generated by the
controller?

> +This is a sub device node inside the davinci-emif device node
> +to describe a async bus for a specific chip select. For NAND,
> +CFI flash device bindings described inside an aemif node,
> +etc, a cs sub node is defined to associate the bus parameter
> +bindings used by the device.

Oh, this file only documents part of the controller's node? It seems
unusual to do that; the documentation for an entire node would usually
be in a single file, which seems to be
Documentation/devicetree/bindings/arm/davinci/nand.txt right now. Is
this "cs" sub-node really something that gets re-used across multiple
different memory controller IPs?

If so, I guess this file should be called something more like
davinci-aemif-cs.txt than davinci-aemif.txt. I'd suggest moving
arm/davinci/nand.txt into a common location too (and renaming it to
davinci-nand.txt to better represent the compatible value it documents).

> +Required properties:=
> +- compatible: "ti,davinci-cs";
> +- #address-cells = <1>;
> +- #size-cells = <1>;
> +- cs - cs used by the device (NAND, CFI flash etc. values in the range: 2-5
> +
> +Optional properties:-
> +- asize - asynchronous data bus width (0 - 8bit, 1 - 16 bit)
> +  All of the params below in nanoseconds
> +
> +- ta - Minimum turn around time
> +- rhold - read hold width
> +- rstobe - read strobe width
> +- rsetup - read setup width
> +- whold - write hold width
> +- wstrobe - write strobe width
> +- wsetup - write setup width
> +- ss - enable/disable select strobe (0 - disable, 1 - enable)
> +- ew - enable/disable extended wait cycles (0 - disable, 1 - enable)

I assume all of those are pretty custom to this binding, and not
something you'd expect to re-use across multiple vendors' bindings? If
so, shouldn't they have a "ti," vendor prefix?

> +Example for davinci nand chip select
> +
> +aemif@6000 {
> +
> + compatible = "ti,davinci-aemif";

You need a reg property here.

> + #address-cells = <2>;
> + #size-cells = <1>;
> +
> + nand_cs:cs2@7000 {
> + compatible = "ti,davinci-cs";

You need a reg property here. The unit address "@7000" in the node
name only has one address cell, whereas the parent node sets
#address-cells = <2>.

Re: linux-next: build failures after merge of the final tree (usb tree related)

2012-11-02 Thread Greg KH

On Fri, Nov 02, 2012 at 05:04:59PM +1100, Stephen Rothwell wrote:
> Hi ,
> 
> After merging the final tree, today's linux-next build (powerpc
> allyesconfig) failed like this:
> 
> In file included from drivers/usb/host/ehci-platform.c:28:0:
> drivers/usb/host/ehci.h: In function 'ehci_readl':
> drivers/usb/host/ehci.h:662:3: error: implicit declaration of function 'readl_be' [-Werror=implicit-function-declaration]
> drivers/usb/host/ehci.h:663:3: error: implicit declaration of function 'readl' [-Werror=implicit-function-declaration]
> drivers/usb/host/ehci.h: In function 'ehci_writel':
> drivers/usb/host/ehci.h:674:3: error: implicit declaration of function 'writel_be' [-Werror=implicit-function-declaration]
> drivers/usb/host/ehci.h:675:3: error: implicit declaration of function 'writel' [-Werror=implicit-function-declaration]
> 
> Caused by commit 99f91934a907 ("USB: EHCI: make ehci-platform a separate
> driver") from the usb tree.
> 
> I have reverted that commit for today.
> 
> I then got:
> 
> drivers/usb/chipidea/built-in.o: In function `ehci_init_driver':
> (.opd+0x5a0): multiple definition of `ehci_init_driver'
> drivers/usb/host/built-in.o:(.opd+0xb58): first defined here
> drivers/usb/chipidea/built-in.o: In function `.ehci_init_driver':
> (.text+0x6528): multiple definition of `.ehci_init_driver'
> drivers/usb/host/built-in.o:(.text+0x9eb4): first defined here
> drivers/usb/chipidea/built-in.o: In function `ehci_suspend':
> (.opd+0xc18): multiple definition of `ehci_suspend'
> drivers/usb/host/built-in.o:(.opd+0x11d0): first defined here
> drivers/usb/chipidea/built-in.o: In function `ehci_resume':
> (.opd+0x1020): multiple definition of `ehci_resume'
> drivers/usb/host/built-in.o:(.opd+0x1680): first defined here
> drivers/usb/chipidea/built-in.o: In function `.ehci_suspend':
> (.text+0x100e4): multiple definition of `.ehci_suspend'
> drivers/usb/host/built-in.o:(.text+0x13b00): first defined here
> drivers/usb/chipidea/built-in.o: In function `.ehci_resume':
> (.text+0x17760): multiple definition of `.ehci_resume'
> drivers/usb/host/built-in.o:(.text+0x1b498): first defined here
> drivers/usb/chipidea/built-in.o: In function `ehci_setup':
> (.opd+0x1128): multiple definition of `ehci_setup'
> drivers/usb/host/built-in.o:(.opd+0x17a0): first defined here
> drivers/usb/chipidea/built-in.o: In function `.ehci_setup':
> (.text+0x228f0): multiple definition of `.ehci_setup'
> drivers/usb/host/built-in.o:(.text+0x266fc): first defined here
> 
> Which is caused by commit 3e0232039967 ("USB: EHCI: prepare to make
> ehci-hcd a library module").
> 
> I have added the following patch to disable the chipidea driver for now:


Both of these should now be fixed in my tree.

thanks,

greg k-h

Re: [PATCH v2 4/5] mm, highmem: makes flush_all_zero_pkmaps() return index of first flushed entry

2012-11-02 Thread JoonSoo Kim

Hello, Minchan.

2012/11/1 Minchan Kim :
> On Thu, Nov 01, 2012 at 01:56:36AM +0900, Joonsoo Kim wrote:
>> In current code, after flush_all_zero_pkmaps() is invoked,
>> then re-iterate all pkmaps. It can be optimized if flush_all_zero_pkmaps()
>> return index of first flushed entry. With this index,
>> we can immediately map highmem page to virtual address represented by index.
>> So change return type of flush_all_zero_pkmaps()
>> and return index of first flushed entry.
>>
>> Additionally, update last_pkmap_nr to this index.
>> It is certain that entry which is below this index is occupied by other mapping,
>> therefore updating last_pkmap_nr to this index is reasonable optimization.
>>
>> Cc: Mel Gorman 
>> Cc: Peter Zijlstra 
>> Cc: Minchan Kim 
>> Signed-off-by: Joonsoo Kim 
>>
>> diff --git a/include/linux/highmem.h b/include/linux/highmem.h
>> index ef788b5..97ad208 100644
>> --- a/include/linux/highmem.h
>> +++ b/include/linux/highmem.h
>> @@ -32,6 +32,7 @@ static inline void invalidate_kernel_vmap_range(void *vaddr, int size)
>>
>>  #ifdef CONFIG_HIGHMEM
>>  #include 
>> +#define PKMAP_INVALID_INDEX (LAST_PKMAP)
>>
>>  /* declarations for linux/mm/highmem.c */
>>  unsigned int nr_free_highpages(void);
>> diff --git a/mm/highmem.c b/mm/highmem.c
>> index d98b0a9..b365f7b 100644
>> --- a/mm/highmem.c
>> +++ b/mm/highmem.c
>> @@ -106,10 +106,10 @@ struct page *kmap_to_page(void *vaddr)
>>   return virt_to_page(addr);
>>  }
>>
>> -static void flush_all_zero_pkmaps(void)
>> +static unsigned int flush_all_zero_pkmaps(void)
>>  {
>>   int i;
>> - int need_flush = 0;
>> + unsigned int index = PKMAP_INVALID_INDEX;
>>
>>   flush_cache_kmaps();
>>
>> @@ -141,10 +141,13 @@ static void flush_all_zero_pkmaps(void)
>>                         &pkmap_page_table[i]);
>>
>>   set_page_address(page, NULL);
>> - need_flush = 1;
>> + if (index == PKMAP_INVALID_INDEX)
>> + index = i;
>>   }
>> - if (need_flush)
>> + if (index != PKMAP_INVALID_INDEX)
>>   flush_tlb_kernel_range(PKMAP_ADDR(0), PKMAP_ADDR(LAST_PKMAP));
>> +
>> + return index;
>>  }
>>
>>  /**
>> @@ -152,14 +155,19 @@ static void flush_all_zero_pkmaps(void)
>>   */
>>  void kmap_flush_unused(void)
>>  {
>> + unsigned int index;
>> +
>>   lock_kmap();
>> - flush_all_zero_pkmaps();
>> + index = flush_all_zero_pkmaps();
>> + if (index != PKMAP_INVALID_INDEX && (index < last_pkmap_nr))
>> + last_pkmap_nr = index;
>
> I don't know whether kmap_flush_unused is really a fast path, so I'm not
> sure how much my nitpick matters. Anyway,
> what problem happens if we do the following?
>
> lock()
> index = flush_all_zero_pkmaps();
> if (index != PKMAP_INVALID_INDEX)
> last_pkmap_nr = index;
> unlock();
>
> Normally, last_pkmap_nr is increased with searching empty slot in
> map_new_virtual. So I expect return value of flush_all_zero_pkmaps
> in kmap_flush_unused normally become either less than last_pkmap_nr
> or last_pkmap_nr + 1.

There is a case where the return value of flush_all_zero_pkmaps() in
kmap_flush_unused() is larger than last_pkmap_nr.
Look at the following example.

Assume last_pkmap_nr = 20 and indexes 1-9 and 11-19 are kmapped. 10 is kunmapped.

do kmap_flush_unused() => flush index 10 => last_pkmap_nr = 10;
do kunmap() with index 17
do kmap_flush_unused() => flush index 17

So, a slightly dirty implementation is needed.
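
The sequence above can be replayed with a small userspace model. This is only a sketch: the `sk_` names, the 32-entry table and the `guarded` flag are made up for illustration, and the count values (0 free, 1 unmapped but not yet flushed, 2 in use) are a simplification of pkmap_count.

```c
#include <assert.h>

#define SK_LAST_PKMAP     32
#define SK_INVALID_INDEX  SK_LAST_PKMAP

struct sk_kmap {
    int count[SK_LAST_PKMAP];   /* 0: free, 1: unmapped, not flushed, 2: in use */
    unsigned int last_pkmap_nr;
};

/* Flush every entry with count 1; return the first flushed index. */
static unsigned int sk_flush_all_zero_pkmaps(struct sk_kmap *k)
{
    unsigned int index = SK_INVALID_INDEX;
    unsigned int i;

    for (i = 0; i < SK_LAST_PKMAP; i++) {
        if (k->count[i] != 1)
            continue;
        k->count[i] = 0;
        if (index == SK_INVALID_INDEX)
            index = i;
    }
    return index;
}

/* guarded != 0: only move last_pkmap_nr backwards, as in the patch. */
static void sk_flush_unused(struct sk_kmap *k, int guarded)
{
    unsigned int index = sk_flush_all_zero_pkmaps(k);

    if (index == SK_INVALID_INDEX)
        return;
    if (!guarded || index < k->last_pkmap_nr)
        k->last_pkmap_nr = index;
}

/* Replay the example and return the final last_pkmap_nr. */
static unsigned int sk_replay_example(int guarded)
{
    struct sk_kmap k = { .last_pkmap_nr = 20 };
    unsigned int i;

    for (i = 1; i <= 19; i++)
        k.count[i] = 2;             /* 1-9 and 11-19 are kmapped */
    k.count[10] = 1;                /* 10 is kunmapped */

    sk_flush_unused(&k, guarded);   /* flushes index 10 */
    assert(k.last_pkmap_nr == 10);

    k.count[17] = 1;                /* kunmap() of index 17 */
    sk_flush_unused(&k, guarded);   /* flushes index 17 */
    return k.last_pkmap_nr;
}
```

In this model the guarded variant ends with last_pkmap_nr at 10, while the unconditional assignment moves it forward to 17, past the slot at index 10 that is already free.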

Thanks.

Re: [PATCH RFT RESEND linux-next] c6x: dma-mapping: support debug_dma_mapping_error

2012-11-02 Thread Mark Salter

On Fri, 2012-11-02 at 10:44 -0600, Shuah Khan wrote:
> On Fri, 2012-10-26 at 09:40 -0600, Shuah Khan wrote:
> > Add support for debug_dma_mapping_error() call to avoid warning from
> > debug_dma_unmap() interface when it checks for mapping error checked
> > status. Without this patch, a "device driver failed to check map error"
> > warning is generated.
> > 
> > Signed-off-by: Shuah Khan 
> > ---
> >  arch/c6x/include/asm/dma-mapping.h |1 +
> >  1 file changed, 1 insertion(+)

> Would you like this patch to go through the c6x arch tree or linux-next?
> Please let me know your preference.

I tried to test this but I get a build error with CONFIG_DMA_API_DEBUG:

/linux-next/lib/dma-debug.c: In function 'has_mapping_error':
/linux-next/lib/dma-debug.c:863:15: error: implicit declaration of function 'get_dma_ops' [-Werror=implicit-function-declaration]
/linux-next/lib/dma-debug.c:863:34: warning: initialization makes pointer from integer without a cast [enabled by default]

C6X (along with some other architectures) doesn't have a get_dma_ops()
function defined.



Re: [RFC] Second attempt at kernel secure boot support

2012-11-02 Thread Eric Paris

I know I started it, but Windows really isn't necessary to see value,
even if it is what pushed the timing.

A user installs a package as root.  Absent any flaws in the Linux
kernel (cough) that should be all it can do in a Secure Boot world.
But if you can drop a small trusted Linux system in there and use that
to boot a compromised Linux kernel, it can make itself persistent.

And like I said, I know there are cloud providers out there who want
EXACTLY this type of system.  One in which root in the guest is
untrusted and they want to keep them out of ring 0.

[PATCH] bq2415x charger driver

2012-11-02 Thread Pali Rohár

Hello,

I'm sending a new version of the bq2415x charger driver, which is needed,
for example, on the Nokia N900 for charging the battery. The driver is part
of an open source project to replace the proprietary battery management.
It is based on the old RFC version which I sent months ago.

power_supply: Add bq2415x charger driver

This patch implements a driver for the bq2415x charging chips.

Signed-off-by: Pali Rohár 

--- /dev/null
+++ linux/drivers/power/bq2415x_charger.c
@@ -0,0 +1,1644 @@
+/*
+bq2415x_charger.c - bq2415x charger driver
+Copyright (C) 2011-2012  Pali Rohár 
+
+This program is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 2 of the License, or
+(at your option) any later version.
+
+This program is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License along
+with this program; if not, write to the Free Software Foundation, Inc.,
+51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+*/
+
+/*
+  Datasheets:
+  http://www.ti.com/product/bq24150
+  http://www.ti.com/product/bq24150a
+  http://www.ti.com/product/bq24152
+  http://www.ti.com/product/bq24153
+  http://www.ti.com/product/bq24153a
+  http://www.ti.com/product/bq24155
+*/
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+/* timeout for resetting chip timer */
+#define BQ2415X_TIMER_TIMEOUT  10
+
+#define BQ2415X_REG_STATUS 0x00
+#define BQ2415X_REG_CONTROL0x01
+#define BQ2415X_REG_VOLTAGE0x02
+#define BQ2415X_REG_VENDER 0x03
+#define BQ2415X_REG_CURRENT0x04
+
+/* reset state for all registers */
+#define BQ2415X_RESET_STATUS   BIT(6)
+#define BQ2415X_RESET_CONTROL  (BIT(4)|BIT(5))
+#define BQ2415X_RESET_VOLTAGE  (BIT(1)|BIT(3))
+#define BQ2415X_RESET_CURRENT  (BIT(0)|BIT(3)|BIT(7))
+
+/* status register */
+#define BQ2415X_BIT_TMR_RST7
+#define BQ2415X_BIT_OTG7
+#define BQ2415X_BIT_EN_STAT6
+#define BQ2415X_MASK_STAT  (BIT(4)|BIT(5))
+#define BQ2415X_SHIFT_STAT 4
+#define BQ2415X_BIT_BOOST  3
+#define BQ2415X_MASK_FAULT (BIT(0)|BIT(1)|BIT(2))
+#define BQ2415X_SHIFT_FAULT0
+
+/* control register */
+#define BQ2415X_MASK_LIMIT (BIT(6)|BIT(7))
+#define BQ2415X_SHIFT_LIMIT6
+#define BQ2415X_MASK_VLOWV (BIT(4)|BIT(5))
+#define BQ2415X_SHIFT_VLOWV4
+#define BQ2415X_BIT_TE 3
+#define BQ2415X_BIT_CE 2
+#define BQ2415X_BIT_HZ_MODE1
+#define BQ2415X_BIT_OPA_MODE   0
+
+/* voltage register */
+#define BQ2415X_MASK_VO
(BIT(2)|BIT(3)|BIT(4)|BIT(5)|BIT(6)|BIT(7))
+#define BQ2415X_SHIFT_VO   2
+#define BQ2415X_BIT_OTG_PL 1
+#define BQ2415X_BIT_OTG_EN 0
+
+/* vender register */
+#define BQ2415X_MASK_VENDER    (BIT(5)|BIT(6)|BIT(7))
+#define BQ2415X_SHIFT_VENDER   5
+#define BQ2415X_MASK_PN    (BIT(3)|BIT(4))
+#define BQ2415X_SHIFT_PN   3
+#define BQ2415X_MASK_REVISION  (BIT(0)|BIT(1)|BIT(2))
+#define BQ2415X_SHIFT_REVISION 0
+
+/* current register */
+#define BQ2415X_MASK_RESET BIT(7)
+#define BQ2415X_MASK_VI_CHRG   (BIT(4)|BIT(5)|BIT(6))
+#define BQ2415X_SHIFT_VI_CHRG  4
+/* N/A BIT(3) */
+#define BQ2415X_MASK_VI_TERM   (BIT(0)|BIT(1)|BIT(2))
+#define BQ2415X_SHIFT_VI_TERM  0
+
+
+enum bq2415x_command {
+   BQ2415X_TIMER_RESET,
+   BQ2415X_OTG_STATUS,
+   BQ2415X_STAT_PIN_STATUS,
+   BQ2415X_STAT_PIN_ENABLE,
+   BQ2415X_STAT_PIN_DISABLE,
+   BQ2415X_CHARGE_STATUS,
+   BQ2415X_BOOST_STATUS,
+   BQ2415X_FAULT_STATUS,
+
+   BQ2415X_CHARGE_TERMINATION_STATUS,
+   BQ2415X_CHARGE_TERMINATION_ENABLE,
+   BQ2415X_CHARGE_TERMINATION_DISABLE,
+   BQ2415X_CHARGER_STATUS,
+   BQ2415X_CHARGER_ENABLE,
+   BQ2415X_CHARGER_DISABLE,
+   BQ2415X_HIGH_IMPEDANCE_STATUS,
+   BQ2415X_HIGH_IMPEDANCE_ENABLE,
+   BQ2415X_HIGH_IMPEDANCE_DISABLE,
+   BQ2415X_BOOST_MODE_STATUS,
+   BQ2415X_BOOST_MODE_ENABLE,
+   BQ2415X_BOOST_MODE_DISABLE,
+
+   BQ2415X_OTG_LEVEL,
+   BQ2415X_OTG_ACTIVATE_HIGH,
+   BQ2415X_OTG_ACTIVATE_LOW,
+   BQ2415X_OTG_PIN_STATUS,
+   BQ2415X_OTG_PIN_ENABLE,
+   BQ2415X_OTG_PIN_DISABLE,
+
+   BQ2415X_VENDER_CODE,
+   BQ2415X_PART_NUMBER,
+   BQ2415X_REVISION,
+};
+
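
[Editor's note] The mask/shift pairs defined in this patch are meant to be used together: a multi-bit field is read by masking the raw register byte and shifting the result down by the matching SHIFT constant. The following is a minimal sketch in plain C, not code from the patch. BIT() is a local stand-in for the kernel macro, and bq2415x_field() and bq2415x_bit() are hypothetical helper names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define BIT(n) (1u << (n))	/* local stand-in for the kernel's BIT() */

/* Status register fields, copied from the patch */
#define BQ2415X_MASK_STAT	(BIT(4)|BIT(5))
#define BQ2415X_SHIFT_STAT	4
#define BQ2415X_BIT_BOOST	3
#define BQ2415X_MASK_FAULT	(BIT(0)|BIT(1)|BIT(2))
#define BQ2415X_SHIFT_FAULT	0

/* Hypothetical helper: isolate a multi-bit field from a raw register byte. */
static unsigned int bq2415x_field(uint8_t reg, unsigned int mask,
				  unsigned int shift)
{
	/* The mask and shift must describe the same field. */
	assert((mask >> shift) != 0);
	return (reg & mask) >> shift;
}

/* Hypothetical helper: test a single-bit flag such as BQ2415X_BIT_BOOST. */
static int bq2415x_bit(uint8_t reg, unsigned int bit)
{
	return (reg & BIT(bit)) != 0;
}
```

For a status byte of 0x5a (binary 0101 1010), the STAT field decodes to 1, the FAULT field to 2, and the BOOST bit is set.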

[PATCH][GIT PULL][3.8] x86: Don't clobber top of pt_regs in nested NMI

2012-11-02 Thread Steven Rostedt


Ingo,

Please pull the latest tip/x86/asm tree, which can be found at:

  git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace.git
tip/x86/asm

Head SHA1: 28696f434fef0efa97534b59986ad33b9c4df7f8


Salman Qazi (1):
  x86: Don't clobber top of pt_regs in nested NMI


 arch/x86/kernel/entry_64.S |   41 +++++++++++++++++++++++++++--------------
 1 file changed, 27 insertions(+), 14 deletions(-)
---
commit 28696f434fef0efa97534b59986ad33b9c4df7f8
Author: Salman Qazi 
Date:   Mon Oct 1 17:29:25 2012 -0700

x86: Don't clobber top of pt_regs in nested NMI

The nested NMI modifies the place (instruction, flags and stack)
that the first NMI will iret to.  However, the copy of registers
modified is exactly the one that is the part of pt_regs in
the first NMI.  This can change the behaviour of the first NMI.

In particular, Google's arch_trigger_all_cpu_backtrace handler
also prints regions of memory surrounding addresses appearing in
registers.  This results in handled exceptions, after which nested NMIs
start coming in.  These nested NMIs change the value of registers
in pt_regs.  This can cause the original NMI handler to produce
incorrect output.

We solve this problem by interchanging the position of the preserved
copy of the iret registers ("saved") and the copy subject to being
trampled by nested NMI ("copied").

Link: http://lkml.kernel.org/r/20121002002919.27236.14388.st...@dungbeetle.mtv.corp.google.com

Signed-off-by: Salman Qazi 
[ Added a needed CFI_ADJUST_CFA_OFFSET ]
Signed-off-by: Steven Rostedt 

diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index b51b2c7..811795d 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -1699,9 +1699,10 @@ nested_nmi:
 
 1:
/* Set up the interrupted NMIs stack to jump to repeat_nmi */
-   leaq -6*8(%rsp), %rdx
+   leaq -1*8(%rsp), %rdx
movq %rdx, %rsp
-   CFI_ADJUST_CFA_OFFSET 6*8
+   CFI_ADJUST_CFA_OFFSET 1*8
+   leaq -10*8(%rsp), %rdx
pushq_cfi $__KERNEL_DS
pushq_cfi %rdx
pushfq_cfi
@@ -1709,8 +1710,8 @@ nested_nmi:
pushq_cfi $repeat_nmi
 
/* Put stack back */
-   addq $(11*8), %rsp
-   CFI_ADJUST_CFA_OFFSET -11*8
+   addq $(6*8), %rsp
+   CFI_ADJUST_CFA_OFFSET -6*8
 
 nested_nmi_out:
popq_cfi %rdx
@@ -1736,18 +1737,18 @@ first_nmi:
 * +-------------------------+
 * | NMI executing variable  |
 * +-------------------------+
-* | Saved SS                |
-* | Saved Return RSP        |
-* | Saved RFLAGS            |
-* | Saved CS                |
-* | Saved RIP               |
-* +-------------------------+
 * | copied SS               |
 * | copied Return RSP       |
 * | copied RFLAGS           |
 * | copied CS               |
 * | copied RIP              |
 * +-------------------------+
+* | Saved SS                |
+* | Saved Return RSP        |
+* | Saved RFLAGS            |
+* | Saved CS                |
+* | Saved RIP               |
+* +-------------------------+
 * | pt_regs                 |
 * +-------------------------+
 *
@@ -1763,9 +1764,14 @@ first_nmi:
/* Set the NMI executing variable on the stack. */
pushq_cfi $1
 
+   /*
+* Leave room for the "copied" frame
+*/
+   subq $(5*8), %rsp
+
/* Copy the stack frame to the Saved frame */
.rept 5
-   pushq_cfi 6*8(%rsp)
+   pushq_cfi 11*8(%rsp)
.endr
CFI_DEF_CFA_OFFSET SS+8-RIP
 
@@ -1786,12 +1792,15 @@ repeat_nmi:
 * is benign for the non-repeat case, where 1 was pushed just above
 * to this very stack slot).
 */
-   movq $1, 5*8(%rsp)
+   movq $1, 10*8(%rsp)
 
/* Make another copy, this one may be modified by nested NMIs */
+   addq $(10*8), %rsp
+   CFI_ADJUST_CFA_OFFSET -10*8
.rept 5
-   pushq_cfi 4*8(%rsp)
+   pushq_cfi -6*8(%rsp)
.endr
+   subq $(5*8), %rsp
CFI_DEF_CFA_OFFSET SS+8-RIP
 end_repeat_nmi:
 
@@ -1842,8 +1851,12 @@ nmi_swapgs:
SWAPGS_UNSAFE_STACK
 nmi_restore:
RESTORE_ALL 8
+
+   /* Pop the extra iret frame */
+   addq $(5*8), %rsp
+
/* Clear the NMI executing stack variable */
-   movq $0, 10*8(%rsp)
+   movq $0, 5*8(%rsp)
jmp irq_return
CFI_ENDPROC
 END(nmi)
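
[Editor's note] The new offsets in this patch (11*8, 10*8, -6*8) can be checked by replaying the stack operations on a small model. The sketch below is an illustration in plain C, not kernel code: the stack is an array of 64-bit words, %rsp is a word index, and struct nmi_sim and the sim_* functions are hypothetical names. It assumes the entry code has pushed one scratch word (%rdx) between the hardware iret frame and the "NMI executing" variable, as the popq_cfi %rdx in nested_nmi_out suggests, and it uses placeholder values for the segment selectors and flags.

```c
#include <assert.h>
#include <stdint.h>

enum { STACK_WORDS = 64, FRAME_WORDS = 5 };

struct nmi_sim {
	uint64_t mem[STACK_WORDS];	/* simulated stack, one slot per 64-bit word */
	int rsp;			/* simulated %rsp as a word index; grows downward */
};

/* pushq $imm or pushq %reg */
static void sim_push(struct nmi_sim *s, uint64_t v)
{
	assert(s->rsp > 0);
	s->mem[--s->rsp] = v;
}

/* pushq off*8(%rsp): the source address is computed before %rsp moves */
static void sim_push_from(struct nmi_sim *s, int off)
{
	sim_push(s, s->mem[s->rsp + off]);
}

/*
 * Replay the patched first_nmi/repeat_nmi stack setup.
 * hw[] is the hardware iret frame in push order: SS, RSP, RFLAGS, CS, RIP.
 */
static void sim_first_nmi(struct nmi_sim *s, const uint64_t hw[FRAME_WORDS],
			  uint64_t rdx)
{
	int i;

	s->rsp = STACK_WORDS;
	for (i = 0; i < FRAME_WORDS; i++)
		sim_push(s, hw[i]);	/* CPU pushes the iret frame */
	sim_push(s, rdx);		/* assumed: entry code saves %rdx */
	sim_push(s, 1);			/* "NMI executing" variable */
	s->rsp -= FRAME_WORDS;		/* subq $(5*8), %rsp: room for "copied" */
	for (i = 0; i < FRAME_WORDS; i++)
		sim_push_from(s, 11);	/* build the "saved" frame */

	/* repeat_nmi */
	s->mem[s->rsp + 10] = 1;	/* movq $1, 10*8(%rsp) */
	s->rsp += 10;			/* addq $(10*8), %rsp */
	for (i = 0; i < FRAME_WORDS; i++)
		sim_push_from(s, -6);	/* build the "copied" frame */
	s->rsp -= FRAME_WORDS;		/* subq $(5*8), %rsp */
}

/* Model the nested_nmi path rewriting the frame the first NMI will iret to. */
static void sim_nested_nmi(struct nmi_sim *s, uint64_t repeat_nmi)
{
	int first_rsp = s->rsp;
	int var = s->rsp + 10;		/* slot of the "NMI executing" variable */

	s->rsp = var;			/* leaq -1*8(%rsp), %rdx; movq %rdx, %rsp */
	sim_push(s, 0x18);		/* pushq $__KERNEL_DS (placeholder value) */
	sim_push(s, (uint64_t)(var - 10)); /* pushq %rdx, %rdx = -10*8(%rsp) */
	sim_push(s, 0x2);		/* pushfq (placeholder flags) */
	sim_push(s, 0x10);		/* pushq $__KERNEL_CS (placeholder value) */
	sim_push(s, repeat_nmi);	/* pushq $repeat_nmi */
	s->rsp = first_rsp;		/* back in the interrupted first NMI */
}
```

After sim_first_nmi(), the five words at the simulated %rsp hold the "saved" frame, the next five hold the "copied" frame, and the word at offset 10 is the "NMI executing" variable. A subsequent sim_nested_nmi() changes only the "copied" words, so the frame directly above pt_regs stays intact, which is the property the commit message describes.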


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v5 0/6] Move rest of omap-iommu to live in drivers/iommu

2012-11-02 Thread Tony Lindgren

Hi all,

Resending again, now using the updated email address
for Joerg.

We need to move the iommu code to live under drivers
for arm common zImage support.

Regards,

Tony

---

Ido Yariv (3):
  ARM: OMAP: Merge iommu2.h into iommu.h
  ARM: OMAP2+: Move iopgtable header to drivers/iommu/
  ARM: OMAP2+: Make some definitions local

Tony Lindgren (3):
  ARM: OMAP2+: Move plat/iovmm.h to include/linux/omap-iommu.h
  ARM: OMAP2+: Move iommu2 to drivers/iommu/omap-iommu2.c
  ARM: OMAP2+: Move iommu/iovmm headers to platform_data


 arch/arm/mach-omap2/Makefile|2 
 arch/arm/mach-omap2/devices.c   |2 
 arch/arm/mach-omap2/omap-iommu.c|2 
 arch/arm/mach-omap2/omap_hwmod_3xxx_data.c  |2 
 arch/arm/mach-omap2/omap_hwmod_44xx_data.c  |2 
 arch/arm/plat-omap/include/plat/iommu2.h|   96 ---
 arch/arm/plat-omap/include/plat/iovmm.h |   89 --
 drivers/iommu/Makefile  |1 
 drivers/iommu/omap-iommu-debug.c|8 +-
 drivers/iommu/omap-iommu.c  |   39 
 drivers/iommu/omap-iommu.h  |  133 ++-
 drivers/iommu/omap-iommu2.c |   11 ++
 drivers/iommu/omap-iopgtable.h  |   22 
 drivers/iommu/omap-iovmm.c  |   50 ++
 drivers/media/platform/omap3isp/isp.c   |1 
 drivers/media/platform/omap3isp/isp.h   |4 -
 drivers/media/platform/omap3isp/ispccdc.c   |1 
 drivers/media/platform/omap3isp/ispstat.c   |1 
 drivers/media/platform/omap3isp/ispvideo.c  |3 -
 include/linux/omap-iommu.h  |   52 +++
 include/linux/platform_data/iommu-omap.h|   49 ++
 21 files changed, 279 insertions(+), 291 deletions(-)
 delete mode 100644 arch/arm/plat-omap/include/plat/iommu2.h
 delete mode 100644 arch/arm/plat-omap/include/plat/iovmm.h
 rename arch/arm/plat-omap/include/plat/iommu.h => drivers/iommu/omap-iommu.h (69%)
 rename arch/arm/mach-omap2/iommu2.c => drivers/iommu/omap-iommu2.c (96%)
 rename arch/arm/plat-omap/include/plat/iopgtable.h => drivers/iommu/omap-iopgtable.h (85%)
 create mode 100644 include/linux/omap-iommu.h
 create mode 100644 include/linux/platform_data/iommu-omap.h

-- 
Signature

[PATCH 1/6] ARM: OMAP: Merge iommu2.h into iommu.h

2012-11-02 Thread Tony Lindgren

From: Ido Yariv 

Since iommu is not supported on OMAP1 and is not likely to ever be
supported, merge plat/iommu2.h into iommu.h so that only one file has
to move to platform_data/ as part of the single zImage effort.

Cc: Joerg Roedel 
Cc: Ohad Ben-Cohen 
Cc: Laurent Pinchart 
Cc: Mauro Carvalho Chehab 
Cc: Omar Ramirez Luna 
Signed-off-by: Ido Yariv 
Signed-off-by: Tony Lindgren 
---
 arch/arm/plat-omap/include/plat/iommu.h  |   88 ++--
 arch/arm/plat-omap/include/plat/iommu2.h |   96 --
 2 files changed, 83 insertions(+), 101 deletions(-)
 delete mode 100644 arch/arm/plat-omap/include/plat/iommu2.h

diff --git a/arch/arm/plat-omap/include/plat/iommu.h b/arch/arm/plat-omap/include/plat/iommu.h
index 68b5f03..7e8c7b6 100644
--- a/arch/arm/plat-omap/include/plat/iommu.h
+++ b/arch/arm/plat-omap/include/plat/iommu.h
@@ -13,6 +13,12 @@
 #ifndef __MACH_IOMMU_H
 #define __MACH_IOMMU_H
 
+#include 
+
+#if defined(CONFIG_ARCH_OMAP1)
+#error "iommu for this processor not implemented yet"
+#endif
+
 struct iotlb_entry {
u32 da;
u32 pa;
@@ -159,11 +165,70 @@ static inline struct omap_iommu *dev_to_omap_iommu(struct device *dev)
 #define OMAP_IOMMU_ERR_TBLWALK_FAULT   (1 << 3)
 #define OMAP_IOMMU_ERR_MULTIHIT_FAULT  (1 << 4)
 
-#if defined(CONFIG_ARCH_OMAP1)
-#error "iommu for this processor not implemented yet"
-#else
-#include 
-#endif
+/*
+ * MMU Register offsets
+ */
+#define MMU_REVISION   0x00
+#define MMU_SYSCONFIG  0x10
+#define MMU_SYSSTATUS  0x14
+#define MMU_IRQSTATUS  0x18
+#define MMU_IRQENABLE  0x1c
+#define MMU_WALKING_ST 0x40
+#define MMU_CNTL   0x44
+#define MMU_FAULT_AD   0x48
+#define MMU_TTB    0x4c
+#define MMU_LOCK   0x50
+#define MMU_LD_TLB 0x54
+#define MMU_CAM    0x58
+#define MMU_RAM    0x5c
+#define MMU_GFLUSH 0x60
+#define MMU_FLUSH_ENTRY 0x64
+#define MMU_READ_CAM   0x68
+#define MMU_READ_RAM   0x6c
+#define MMU_EMU_FAULT_AD   0x70
+
+#define MMU_REG_SIZE   256
+
+/*
+ * MMU Register bit definitions
+ */
+#define MMU_LOCK_BASE_SHIFT 10
+#define MMU_LOCK_BASE_MASK (0x1f << MMU_LOCK_BASE_SHIFT)
+#define MMU_LOCK_BASE(x)   \
+   ((x & MMU_LOCK_BASE_MASK) >> MMU_LOCK_BASE_SHIFT)
+
+#define MMU_LOCK_VICT_SHIFT 4
+#define MMU_LOCK_VICT_MASK (0x1f << MMU_LOCK_VICT_SHIFT)
+#define MMU_LOCK_VICT(x)   \
+   ((x & MMU_LOCK_VICT_MASK) >> MMU_LOCK_VICT_SHIFT)
+
+#define MMU_CAM_VATAG_SHIFT 12
+#define MMU_CAM_VATAG_MASK \
+   ((~0UL >> MMU_CAM_VATAG_SHIFT) << MMU_CAM_VATAG_SHIFT)
+#define MMU_CAM_P  (1 << 3)
+#define MMU_CAM_V  (1 << 2)
+#define MMU_CAM_PGSZ_MASK  3
+#define MMU_CAM_PGSZ_1M    (0 << 0)
+#define MMU_CAM_PGSZ_64K   (1 << 0)
+#define MMU_CAM_PGSZ_4K    (2 << 0)
+#define MMU_CAM_PGSZ_16M   (3 << 0)
+
+#define MMU_RAM_PADDR_SHIFT 12
+#define MMU_RAM_PADDR_MASK \
+   ((~0UL >> MMU_RAM_PADDR_SHIFT) << MMU_RAM_PADDR_SHIFT)
+#define MMU_RAM_ENDIAN_SHIFT   9
+#define MMU_RAM_ENDIAN_MASK    (1 << MMU_RAM_ENDIAN_SHIFT)
+#define MMU_RAM_ENDIAN_BIG (1 << MMU_RAM_ENDIAN_SHIFT)
+#define MMU_RAM_ENDIAN_LITTLE  (0 << MMU_RAM_ENDIAN_SHIFT)
+#define MMU_RAM_ELSZ_SHIFT 7
+#define MMU_RAM_ELSZ_MASK  (3 << MMU_RAM_ELSZ_SHIFT)
+#define MMU_RAM_ELSZ_8 (0 << MMU_RAM_ELSZ_SHIFT)
+#define MMU_RAM_ELSZ_16    (1 << MMU_RAM_ELSZ_SHIFT)
+#define MMU_RAM_ELSZ_32    (2 << MMU_RAM_ELSZ_SHIFT)
+#define MMU_RAM_ELSZ_NONE  (3 << MMU_RAM_ELSZ_SHIFT)
+#define MMU_RAM_MIXED_SHIFT 6
+#define MMU_RAM_MIXED_MASK (1 << MMU_RAM_MIXED_SHIFT)
+#define MMU_RAM_MIXED  MMU_RAM_MIXED_MASK
 
 /*
  * utilities for super page(16MB, 1MB, 64KB and 4KB)
@@ -218,4 +283,17 @@ omap_iommu_dump_ctx(struct omap_iommu *obj, char *buf, ssize_t len);
 extern size_t
 omap_dump_tlb_entries(struct omap_iommu *obj, char *buf, ssize_t len);
 
+/*
+ * register accessors
+ */
+static inline u32 iommu_read_reg(struct omap_iommu *obj, size_t offs)
+{
+   return __raw_readl(obj->regbase + offs);
+}
+
+static inline void iommu_write_reg(struct omap_iommu *obj, u32 val, size_t offs)
+{
+   __raw_writel(val, obj->regbase + offs);
+}
+
 #endif /* __MACH_IOMMU_H */
diff --git a/arch/arm/plat-omap/include/plat/iommu2.h b/arch/arm/plat-omap/include/plat/iommu2.h
deleted file mode 100644
index d4116b5..0000000
--- a/arch/arm/plat-omap/include/plat/iommu2.h
+++ /dev/null
@@ -1,96 +0,0 @@
-/*
- * omap iommu: omap2 architecture specific definitions
- *
- * Copyright (C) 2008-2009 Nokia Corporation
- *
- * Written by Hiroshi DOYU 
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 as
- * published by the Free Software

[PATCH 3/6] ARM: OMAP2+: Move plat/iovmm.h to include/linux/omap-iommu.h

2012-11-02 Thread Tony Lindgren

Looks like the iommu framework does not have generic functions
exported for all the needs yet. The hardware specific functions
are defined in files like intel-iommu.h and amd-iommu.h. Follow
the same standard for omap-iommu.h.

This is needed because we are removing plat and mach includes
for ARM common zImage support. Further work should continue
in the iommu framework context as only pure platform data will
be communicated from arch/arm/*omap*/* code to the iommu
framework.

Cc: Joerg Roedel 
Cc: Ohad Ben-Cohen 
Cc: Ido Yariv 
Cc: Laurent Pinchart 
Cc: Omar Ramirez Luna 
Cc: linux-me...@vger.kernel.org
Acked-by: Mauro Carvalho Chehab 
Signed-off-by: Tony Lindgren 
---
 arch/arm/mach-omap2/iommu2.c   |1 
 arch/arm/plat-omap/include/plat/iommu.h|   10 +--
 arch/arm/plat-omap/include/plat/iovmm.h|   89 
 drivers/iommu/omap-iommu-debug.c   |2 -
 drivers/iommu/omap-iommu.c |1 
 drivers/iommu/omap-iovmm.c |   46 ++
 drivers/media/platform/omap3isp/isp.c  |1 
 drivers/media/platform/omap3isp/isp.h  |4 -
 drivers/media/platform/omap3isp/ispccdc.c  |1 
 drivers/media/platform/omap3isp/ispstat.c  |1 
 drivers/media/platform/omap3isp/ispvideo.c |2 -
 include/linux/omap-iommu.h |   52 
 12 files changed, 107 insertions(+), 103 deletions(-)
 delete mode 100644 arch/arm/plat-omap/include/plat/iovmm.h
 create mode 100644 include/linux/omap-iommu.h

diff --git a/arch/arm/mach-omap2/iommu2.c b/arch/arm/mach-omap2/iommu2.c
index eefc379..e8116cf 100644
--- a/arch/arm/mach-omap2/iommu2.c
+++ b/arch/arm/mach-omap2/iommu2.c
@@ -15,6 +15,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
diff --git a/arch/arm/plat-omap/include/plat/iommu.h b/arch/arm/plat-omap/include/plat/iommu.h
index 7e8c7b6..a4b71b1 100644
--- a/arch/arm/plat-omap/include/plat/iommu.h
+++ b/arch/arm/plat-omap/include/plat/iommu.h
@@ -216,13 +216,10 @@ static inline struct omap_iommu *dev_to_omap_iommu(struct device *dev)
 #define MMU_RAM_PADDR_SHIFT 12
 #define MMU_RAM_PADDR_MASK \
    ((~0UL >> MMU_RAM_PADDR_SHIFT) << MMU_RAM_PADDR_SHIFT)
-#define MMU_RAM_ENDIAN_SHIFT   9
+
 #define MMU_RAM_ENDIAN_MASK    (1 << MMU_RAM_ENDIAN_SHIFT)
-#define MMU_RAM_ENDIAN_BIG (1 << MMU_RAM_ENDIAN_SHIFT)
-#define MMU_RAM_ENDIAN_LITTLE  (0 << MMU_RAM_ENDIAN_SHIFT)
-#define MMU_RAM_ELSZ_SHIFT 7
 #define MMU_RAM_ELSZ_MASK  (3 << MMU_RAM_ELSZ_SHIFT)
-#define MMU_RAM_ELSZ_8 (0 << MMU_RAM_ELSZ_SHIFT)
+
 #define MMU_RAM_ELSZ_16    (1 << MMU_RAM_ELSZ_SHIFT)
 #define MMU_RAM_ELSZ_32    (2 << MMU_RAM_ELSZ_SHIFT)
 #define MMU_RAM_ELSZ_NONE  (3 << MMU_RAM_ELSZ_SHIFT)
@@ -269,9 +266,6 @@ extern int omap_iommu_set_isr(const char *name,
void *priv),
 void *isr_priv);
 
-extern void omap_iommu_save_ctx(struct device *dev);
-extern void omap_iommu_restore_ctx(struct device *dev);
-
 extern int omap_install_iommu_arch(const struct iommu_functions *ops);
 extern void omap_uninstall_iommu_arch(const struct iommu_functions *ops);
 
diff --git a/arch/arm/plat-omap/include/plat/iovmm.h b/arch/arm/plat-omap/include/plat/iovmm.h
deleted file mode 100644
index 498e57c..0000000
--- a/arch/arm/plat-omap/include/plat/iovmm.h
+++ /dev/null
@@ -1,89 +0,0 @@
-/*
- * omap iommu: simple virtual address space management
- *
- * Copyright (C) 2008-2009 Nokia Corporation
- *
- * Written by Hiroshi DOYU 
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 as
- * published by the Free Software Foundation.
- */
-
-#ifndef __IOMMU_MMAP_H
-#define __IOMMU_MMAP_H
-
-#include 
-
-struct iovm_struct {
-   struct omap_iommu   *iommu; /* iommu object which this belongs to */
-   u32 da_start; /* area definition */
-   u32 da_end;
-   u32 flags; /* IOVMF_: see below */
-   struct list_head    list; /* linked in ascending order */
-   const struct sg_table   *sgt; /* keep 'page' <-> 'da' mapping */
-   void    *va; /* mpu side mapped address */
-};
-
-/*
- * IOVMF_FLAGS: attribute for iommu virtual memory area(iovma)
- *
- * lower 16 bit is used for h/w and upper 16 bit is for s/w.
- */
-#define IOVMF_SW_SHIFT 16
-
-/*
- * iovma: h/w flags derived from cam and ram attribute
- */
-#define IOVMF_CAM_MASK (~((1 << 10) - 1))
-#define IOVMF_RAM_MASK (~IOVMF_CAM_MASK)
-
-#define IOVMF_PGSZ_MASK    (3 << 0)
-#define IOVMF_PGSZ_1M  MMU_CAM_PGSZ_1M
-#define IOVMF_PGSZ_64K MMU_CAM_PGSZ_64K
-#define IOVMF_PGSZ_4K  MMU_CAM_PGSZ_4K
-#define IOVMF_PGSZ_16M MMU_CAM_PGSZ_16M
-
-#define IOVMF_ENDIAN_MASK  (1 << 9)
-#define IOVMF_ENDIAN_BIG

[PATCH 4/6] ARM: OMAP2+: Move iommu2 to drivers/iommu/omap-iommu2.c

2012-11-02 Thread Tony Lindgren

This file should not be in arch/arm. Move it to drivers/iommu
to allow making most of the header local to drivers/iommu.

This is needed as we are removing plat and mach includes
from drivers for ARM common zImage support.

Cc: Joerg Roedel 
Cc: Ohad Ben-Cohen 
Cc: Ido Yariv 
Cc: Laurent Pinchart 
Cc: Mauro Carvalho Chehab 
Cc: Omar Ramirez Luna 
Cc: linux-me...@vger.kernel.org
Signed-off-by: Tony Lindgren 
---
 arch/arm/mach-omap2/Makefile|2 
 arch/arm/plat-omap/include/plat/iommu.h |  272 ++-
 drivers/iommu/Makefile  |1 
 drivers/iommu/omap-iommu-debug.c|1 
 drivers/iommu/omap-iommu.c  |   19 ++
 drivers/iommu/omap-iommu.h  |  255 +
 drivers/iommu/omap-iommu2.c |2 
 drivers/iommu/omap-iopgtable.h  |   22 ---
 drivers/iommu/omap-iovmm.c  |1 
 9 files changed, 293 insertions(+), 282 deletions(-)
 create mode 100644 drivers/iommu/omap-iommu.h
 rename arch/arm/mach-omap2/iommu2.c => drivers/iommu/omap-iommu2.c (99%)

diff --git a/arch/arm/mach-omap2/Makefile b/arch/arm/mach-omap2/Makefile
index fe40d9e..d6721a7 100644
--- a/arch/arm/mach-omap2/Makefile
+++ b/arch/arm/mach-omap2/Makefile
@@ -184,8 +184,6 @@ obj-$(CONFIG_HW_PERF_EVENTS)    += pmu.o
 obj-$(CONFIG_OMAP_MBOX_FWK)    += mailbox_mach.o
 mailbox_mach-objs  := mailbox.o
 
-obj-$(CONFIG_OMAP_IOMMU)   += iommu2.o
-
 iommu-$(CONFIG_OMAP_IOMMU) := omap-iommu.o
 obj-y  += $(iommu-m) $(iommu-y)
 
diff --git a/arch/arm/plat-omap/include/plat/iommu.h b/arch/arm/plat-omap/include/plat/iommu.h
index a4b71b1..c677b9f 100644
--- a/arch/arm/plat-omap/include/plat/iommu.h
+++ b/arch/arm/plat-omap/include/plat/iommu.h
@@ -10,103 +10,21 @@
  * published by the Free Software Foundation.
  */
 
-#ifndef __MACH_IOMMU_H
-#define __MACH_IOMMU_H
-
-#include 
-
-#if defined(CONFIG_ARCH_OMAP1)
-#error "iommu for this processor not implemented yet"
-#endif
-
-struct iotlb_entry {
-   u32 da;
-   u32 pa;
-   u32 pgsz, prsvd, valid;
-   union {
-   u16 ap;
-   struct {
-   u32 endian, elsz, mixed;
-   };
-   };
-};
-
-struct omap_iommu {
-   const char  *name;
-   struct module   *owner;
-   struct clk  *clk;
-   void __iomem    *regbase;
-   struct device   *dev;
-   void    *isr_priv;
-   struct iommu_domain *domain;
-
-   unsigned int    refcount;
-   spinlock_t  iommu_lock; /* global for this whole object */
-
-   /*
-* We don't change iopgd for a situation like pgd for a task,
-* but share it globally for each iommu.
-*/
-   u32 *iopgd;
-   spinlock_t  page_table_lock; /* protect iopgd */
-
-   int nr_tlb_entries;
-
-   struct list_head    mmap;
-   struct mutex    mmap_lock; /* protect mmap */
-
-   void *ctx; /* iommu context: registres saved area */
-   u32 da_start;
-   u32 da_end;
-};
-
-struct cr_regs {
-   union {
-   struct {
-   u16 cam_l;
-   u16 cam_h;
-   };
-   u32 cam;
-   };
-   union {
-   struct {
-   u16 ram_l;
-   u16 ram_h;
-   };
-   u32 ram;
-   };
-};
-
-struct iotlb_lock {
-   short base;
-   short vict;
-};
-
-/* architecture specific functions */
-struct iommu_functions {
-   unsigned long   version;
-
-   int (*enable)(struct omap_iommu *obj);
-   void (*disable)(struct omap_iommu *obj);
-   void (*set_twl)(struct omap_iommu *obj, bool on);
-   u32 (*fault_isr)(struct omap_iommu *obj, u32 *ra);
-
-   void (*tlb_read_cr)(struct omap_iommu *obj, struct cr_regs *cr);
-   void (*tlb_load_cr)(struct omap_iommu *obj, struct cr_regs *cr);
-
-   struct cr_regs *(*alloc_cr)(struct omap_iommu *obj,
-   struct iotlb_entry *e);
-   int (*cr_valid)(struct cr_regs *cr);
-   u32 (*cr_to_virt)(struct cr_regs *cr);
-   void (*cr_to_e)(struct cr_regs *cr, struct iotlb_entry *e);
-   ssize_t (*dump_cr)(struct omap_iommu *obj, struct cr_regs *cr,
-   char *buf);
-
-   u32 (*get_pte_attr)(struct iotlb_entry *e);
+#define MMU_REG_SIZE   256
 
-   void (*save_ctx)(struct omap_iommu *obj);
-   void (*restore_ctx)(struct omap_iommu *obj);
-   ssize_t (*dump_ctx)(struct omap_iommu *obj, char *buf, ssize_t len);
+/**
+ * struct iommu_arch_data - omap iommu private data
+ * @name: name of the iommu device
+ * @iommu_dev: handle of the iommu device
+ *
+ * This is an omap iommu private data object, which binds an iommu user
+ * to its

[PATCH 6/6] ARM: OMAP2+: Move iommu/iovmm headers to platform_data

2012-11-02 Thread Tony Lindgren

Move iommu/iovmm headers from plat/ to platform_data/ as part of the
single zImage work.

Partially based on an earlier version by Ido Yariv.

Cc: Joerg Roedel 
Cc: Ohad Ben-Cohen 
Cc: Ido Yariv 
Cc: Laurent Pinchart 
Cc: Omar Ramirez Luna 
Acked-by: Mauro Carvalho Chehab 
Signed-off-by: Tony Lindgren 
---
 arch/arm/mach-omap2/devices.c  |2 +-
 arch/arm/mach-omap2/omap-iommu.c   |2 +-
 arch/arm/mach-omap2/omap_hwmod_3xxx_data.c |2 +-
 arch/arm/mach-omap2/omap_hwmod_44xx_data.c |2 +-
 drivers/iommu/omap-iommu-debug.c   |3 +--
 drivers/iommu/omap-iommu.c |2 +-
 drivers/iommu/omap-iommu2.c|2 +-
 drivers/iommu/omap-iovmm.c |3 +--
 drivers/media/platform/omap3isp/ispvideo.c |1 -
 include/linux/platform_data/iommu-omap.h   |0 
 10 files changed, 8 insertions(+), 11 deletions(-)
 rename arch/arm/plat-omap/include/plat/iommu.h => include/linux/platform_data/iommu-omap.h (100%)

diff --git a/arch/arm/mach-omap2/devices.c b/arch/arm/mach-omap2/devices.c
index cba60e0..1002ff8 100644
--- a/arch/arm/mach-omap2/devices.c
+++ b/arch/arm/mach-omap2/devices.c
@@ -126,7 +126,7 @@ static struct platform_device omap2cam_device = {
 
 #if defined(CONFIG_IOMMU_API)
 
-#include 
+#include 
 
 static struct resource omap3isp_resources[] = {
{
diff --git a/arch/arm/mach-omap2/omap-iommu.c b/arch/arm/mach-omap2/omap-iommu.c
index df298d4..a6a4ff8 100644
--- a/arch/arm/mach-omap2/omap-iommu.c
+++ b/arch/arm/mach-omap2/omap-iommu.c
@@ -13,7 +13,7 @@
 #include 
 #include 
 
-#include 
+#include 
 
 #include "soc.h"
 #include "common.h"
diff --git a/arch/arm/mach-omap2/omap_hwmod_3xxx_data.c b/arch/arm/mach-omap2/omap_hwmod_3xxx_data.c
index f67b7ee..621bc71 100644
--- a/arch/arm/mach-omap2/omap_hwmod_3xxx_data.c
+++ b/arch/arm/mach-omap2/omap_hwmod_3xxx_data.c
@@ -26,8 +26,8 @@
 #include 
 #include 
 #include 
+#include 
 #include 
-#include 
 
 #include "am35xx.h"
 
diff --git a/arch/arm/mach-omap2/omap_hwmod_44xx_data.c b/arch/arm/mach-omap2/omap_hwmod_44xx_data.c
index 652d028..5850b3e 100644
--- a/arch/arm/mach-omap2/omap_hwmod_44xx_data.c
+++ b/arch/arm/mach-omap2/omap_hwmod_44xx_data.c
@@ -27,10 +27,10 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
-#include 
 
 #include "omap_hwmod_common_data.h"
 #include "cm1_44xx.h"
diff --git a/drivers/iommu/omap-iommu-debug.c b/drivers/iommu/omap-iommu-debug.c
index d0427bd..d97fbe4 100644
--- a/drivers/iommu/omap-iommu-debug.c
+++ b/drivers/iommu/omap-iommu-debug.c
@@ -19,8 +19,7 @@
 #include 
 #include 
 #include 
-
-#include 
+#include 
 
 #include "omap-iopgtable.h"
 #include "omap-iommu.h"
diff --git a/drivers/iommu/omap-iommu.c b/drivers/iommu/omap-iommu.c
index df84087..badc17c 100644
--- a/drivers/iommu/omap-iommu.c
+++ b/drivers/iommu/omap-iommu.c
@@ -26,7 +26,7 @@
 
 #include 
 
-#include 
+#include 
 
 #include "omap-iopgtable.h"
 #include "omap-iommu.h"
diff --git a/drivers/iommu/omap-iommu2.c b/drivers/iommu/omap-iommu2.c
index 29e98a2..c020202 100644
--- a/drivers/iommu/omap-iommu2.c
+++ b/drivers/iommu/omap-iommu2.c
@@ -19,8 +19,8 @@
 #include 
 #include 
 #include 
+#include 
 
-#include 
 #include "omap-iommu.h"
 
 /*
diff --git a/drivers/iommu/omap-iovmm.c b/drivers/iommu/omap-iovmm.c
index 3e3b242..46d87569 100644
--- a/drivers/iommu/omap-iovmm.c
+++ b/drivers/iommu/omap-iovmm.c
@@ -18,12 +18,11 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
 
-#include 
-
 #include "omap-iopgtable.h"
 #include "omap-iommu.h"
 
diff --git a/drivers/media/platform/omap3isp/ispvideo.c b/drivers/media/platform/omap3isp/ispvideo.c
index a4b8290..21f7313 100644
--- a/drivers/media/platform/omap3isp/ispvideo.c
+++ b/drivers/media/platform/omap3isp/ispvideo.c
@@ -35,7 +35,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 
 #include "ispvideo.h"
diff --git a/arch/arm/plat-omap/include/plat/iommu.h b/include/linux/platform_data/iommu-omap.h
similarity index 100%
rename from arch/arm/plat-omap/include/plat/iommu.h
rename to include/linux/platform_data/iommu-omap.h


[PATCH 5/6] ARM: OMAP2+: Make some definitions local

2012-11-02 Thread Tony Lindgren

From: Ido Yariv 

Move the definitions in omap-iommu.h that can be made local into the
drivers/iommu source files that use them (omap-iommu.c and omap-iommu2.c).

Cc: Joerg Roedel 
Cc: Ohad Ben-Cohen 
Cc: Laurent Pinchart 
Cc: Mauro Carvalho Chehab 
Cc: Omar Ramirez Luna 
Signed-off-by: Ido Yariv 
[t...@atomide.com: updated for header changes in the series]
Signed-off-by: Tony Lindgren 
---
 drivers/iommu/omap-iommu.c  |   15 +++
 drivers/iommu/omap-iommu.h  |   33 +++--
 drivers/iommu/omap-iommu2.c |6 ++
 3 files changed, 24 insertions(+), 30 deletions(-)

diff --git a/drivers/iommu/omap-iommu.c b/drivers/iommu/omap-iommu.c
index 4db86e1..df84087 100644
--- a/drivers/iommu/omap-iommu.c
+++ b/drivers/iommu/omap-iommu.c
@@ -54,6 +54,21 @@ struct omap_iommu_domain {
spinlock_t lock;
 };
 
+#define MMU_LOCK_BASE_SHIFT 10
+#define MMU_LOCK_BASE_MASK (0x1f << MMU_LOCK_BASE_SHIFT)
+#define MMU_LOCK_BASE(x)   \
+   ((x & MMU_LOCK_BASE_MASK) >> MMU_LOCK_BASE_SHIFT)
+
+#define MMU_LOCK_VICT_SHIFT 4
+#define MMU_LOCK_VICT_MASK (0x1f << MMU_LOCK_VICT_SHIFT)
+#define MMU_LOCK_VICT(x)   \
+   ((x & MMU_LOCK_VICT_MASK) >> MMU_LOCK_VICT_SHIFT)
+
+struct iotlb_lock {
+   short base;
+   short vict;
+};
+
 /* accommodate the difference between omap1 and omap2/3 */
 static const struct iommu_functions *arch_iommu;
 
diff --git a/drivers/iommu/omap-iommu.h b/drivers/iommu/omap-iommu.h
index 8c3378d..2b5f3c0 100644
--- a/drivers/iommu/omap-iommu.h
+++ b/drivers/iommu/omap-iommu.h
@@ -72,11 +72,6 @@ struct cr_regs {
};
 };
 
-struct iotlb_lock {
-   short base;
-   short vict;
-};
-
 /* architecture specific functions */
 struct iommu_functions {
unsigned long   version;
@@ -117,13 +112,6 @@ static inline struct omap_iommu *dev_to_omap_iommu(struct device *dev)
 }
 #endif
 
-/* IOMMU errors */
-#define OMAP_IOMMU_ERR_TLB_MISS    (1 << 0)
-#define OMAP_IOMMU_ERR_TRANS_FAULT (1 << 1)
-#define OMAP_IOMMU_ERR_EMU_MISS    (1 << 2)
-#define OMAP_IOMMU_ERR_TBLWALK_FAULT   (1 << 3)
-#define OMAP_IOMMU_ERR_MULTIHIT_FAULT  (1 << 4)
-
 /*
  * MMU Register offsets
  */
@@ -151,16 +139,6 @@ static inline struct omap_iommu *dev_to_omap_iommu(struct device *dev)
 /*
  * MMU Register bit definitions
  */
-#define MMU_LOCK_BASE_SHIFT 10
-#define MMU_LOCK_BASE_MASK (0x1f << MMU_LOCK_BASE_SHIFT)
-#define MMU_LOCK_BASE(x)   \
-   ((x & MMU_LOCK_BASE_MASK) >> MMU_LOCK_BASE_SHIFT)
-
-#define MMU_LOCK_VICT_SHIFT 4
-#define MMU_LOCK_VICT_MASK (0x1f << MMU_LOCK_VICT_SHIFT)
-#define MMU_LOCK_VICT(x)   \
-   ((x & MMU_LOCK_VICT_MASK) >> MMU_LOCK_VICT_SHIFT)
-
 #define MMU_CAM_VATAG_SHIFT 12
 #define MMU_CAM_VATAG_MASK \
((~0UL >> MMU_CAM_VATAG_SHIFT) << MMU_CAM_VATAG_SHIFT)
@@ -222,20 +200,15 @@ extern void omap_iotlb_cr_to_e(struct cr_regs *cr, struct iotlb_entry *e);
 extern int
 omap_iopgtable_store_entry(struct omap_iommu *obj, struct iotlb_entry *e);
 
-extern int omap_iommu_set_isr(const char *name,
-int (*isr)(struct omap_iommu *obj, u32 da, u32 iommu_errs,
-   void *priv),
-void *isr_priv);
-
 extern void omap_iommu_save_ctx(struct device *dev);
 extern void omap_iommu_restore_ctx(struct device *dev);
 
-extern int omap_install_iommu_arch(const struct iommu_functions *ops);
-extern void omap_uninstall_iommu_arch(const struct iommu_functions *ops);
-
 extern int omap_foreach_iommu_device(void *data,
int (*fn)(struct device *, void *));
 
+extern int omap_install_iommu_arch(const struct iommu_functions *ops);
+extern void omap_uninstall_iommu_arch(const struct iommu_functions *ops);
+
 extern ssize_t
 omap_iommu_dump_ctx(struct omap_iommu *obj, char *buf, ssize_t len);
 extern size_t
diff --git a/drivers/iommu/omap-iommu2.c b/drivers/iommu/omap-iommu2.c
index f97c386..29e98a2 100644
--- a/drivers/iommu/omap-iommu2.c
+++ b/drivers/iommu/omap-iommu2.c
@@ -68,6 +68,12 @@
 ((pgsz) == MMU_CAM_PGSZ_64K) ? 0xffff0000 :\
 ((pgsz) == MMU_CAM_PGSZ_4K)  ? 0xfffff000 : 0)
 
+/* IOMMU errors */
+#define OMAP_IOMMU_ERR_TLB_MISS(1 << 0)
+#define OMAP_IOMMU_ERR_TRANS_FAULT (1 << 1)
+#define OMAP_IOMMU_ERR_EMU_MISS(1 << 2)
+#define OMAP_IOMMU_ERR_TBLWALK_FAULT   (1 << 3)
+#define OMAP_IOMMU_ERR_MULTIHIT_FAULT  (1 << 4)
 
 static void __iommu_set_twl(struct omap_iommu *obj, bool on)
 {
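
[Editor's note] The MMU_LOCK_BASE/MMU_LOCK_VICT macros and struct iotlb_lock that this patch makes local to omap-iommu.c describe two 5-bit fields of one lock register value. The sketch below shows one way the two could fit together in plain C. It is an illustration under stated assumptions, not the driver's code: lock_decode() and lock_encode() are hypothetical helpers that operate on a plain integer and do no register I/O.

```c
#include <assert.h>
#include <stdint.h>

/* Copied from the patch */
#define MMU_LOCK_BASE_SHIFT	10
#define MMU_LOCK_BASE_MASK	(0x1f << MMU_LOCK_BASE_SHIFT)
#define MMU_LOCK_BASE(x)	\
	((x & MMU_LOCK_BASE_MASK) >> MMU_LOCK_BASE_SHIFT)

#define MMU_LOCK_VICT_SHIFT	4
#define MMU_LOCK_VICT_MASK	(0x1f << MMU_LOCK_VICT_SHIFT)
#define MMU_LOCK_VICT(x)	\
	((x & MMU_LOCK_VICT_MASK) >> MMU_LOCK_VICT_SHIFT)

struct iotlb_lock {
	short base;
	short vict;
};

/* Hypothetical helper: split a raw lock value into base and victim. */
static struct iotlb_lock lock_decode(uint32_t val)
{
	struct iotlb_lock l;

	/* Pass a plain variable: the macros do not parenthesize x. */
	l.base = (short)MMU_LOCK_BASE(val);
	l.vict = (short)MMU_LOCK_VICT(val);
	return l;
}

/* Hypothetical helper: rebuild the raw value from base and victim. */
static uint32_t lock_encode(const struct iotlb_lock *l)
{
	/* Both fields are five bits wide. */
	assert(l->base >= 0 && l->base <= 0x1f);
	assert(l->vict >= 0 && l->vict <= 0x1f);
	return ((uint32_t)l->base << MMU_LOCK_BASE_SHIFT) |
	       ((uint32_t)l->vict << MMU_LOCK_VICT_SHIFT);
}
```

A raw value of 0xc70 decodes to base 3 and victim 7, and encoding that pair gives 0xc70 back. Bits outside the two masks are ignored on decode.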


[PATCH 2/6] ARM: OMAP2+: Move iopgtable header to drivers/iommu/

2012-11-02 Thread Tony Lindgren

From: Ido Yariv 

The iopgtable header file is only used by the iommu & iovmm drivers, so
move it to drivers/iommu/, as part of the single zImage effort.

Cc: Joerg Roedel 
Cc: Ohad Ben-Cohen 
Cc: Laurent Pinchart 
Cc: Mauro Carvalho Chehab 
Cc: Omar Ramirez Luna 
Signed-off-by: Ido Yariv 
[t...@atomide.com: updated to be earlier in the series]
Signed-off-by: Tony Lindgren 
---
 drivers/iommu/omap-iommu-debug.c|2 +-
 drivers/iommu/omap-iommu.c  |2 +-
 drivers/iommu/omap-iopgtable.h  |0 
 drivers/iommu/omap-iovmm.c  |2 +-
 4 files changed, 3 insertions(+), 3 deletions(-)
 rename arch/arm/plat-omap/include/plat/iopgtable.h => drivers/iommu/omap-iopgtable.h (100%)

diff --git a/drivers/iommu/omap-iommu-debug.c b/drivers/iommu/omap-iommu-debug.c
index f55fc5d..0cac372 100644
--- a/drivers/iommu/omap-iommu-debug.c
+++ b/drivers/iommu/omap-iommu-debug.c
@@ -22,7 +22,7 @@
 #include 
 #include 
 
-#include 
+#include "omap-iopgtable.h"
 
 #define MAXCOLUMN 100 /* for short messages */
 
diff --git a/drivers/iommu/omap-iommu.c b/drivers/iommu/omap-iommu.c
index d0b1234..f2bbfb0 100644
--- a/drivers/iommu/omap-iommu.c
+++ b/drivers/iommu/omap-iommu.c
@@ -26,7 +26,7 @@
 
 #include 
 
-#include 
+#include "omap-iopgtable.h"
 
 #define for_each_iotlb_cr(obj, n, __i, cr) \
for (__i = 0;   \
diff --git a/arch/arm/plat-omap/include/plat/iopgtable.h b/drivers/iommu/omap-iopgtable.h
similarity index 100%
rename from arch/arm/plat-omap/include/plat/iopgtable.h
rename to drivers/iommu/omap-iopgtable.h
diff --git a/drivers/iommu/omap-iovmm.c b/drivers/iommu/omap-iovmm.c
index 2e10c3e..b332392 100644
--- a/drivers/iommu/omap-iovmm.c
+++ b/drivers/iommu/omap-iovmm.c
@@ -24,7 +24,7 @@
 #include 
 #include 
 
-#include 
+#include "omap-iopgtable.h"
 
 static struct kmem_cache *iovm_area_cachep;
 


Re: [PATCH v6 00/29] kmem controller for memcg.

2012-11-02 Thread JoonSoo Kim

Hello, Glauber.

2012/11/2 Glauber Costa :
> On 11/02/2012 04:04 AM, Andrew Morton wrote:
>> On Thu,  1 Nov 2012 16:07:16 +0400
>> Glauber Costa  wrote:
>>
>>> Hi,
>>>
>>> This work introduces the kernel memory controller for memcg. Unlike previous
>>> submissions, this includes the whole controller, comprised of slab and stack
>>> memory.
>>
>> I'm in the middle of (re)reading all this.  Meanwhile I'll push it all
>> out to http://ozlabs.org/~akpm/mmots/ for the crazier testers.
>>
>> One thing:
>>
>>> Numbers can be found at https://lkml.org/lkml/2012/9/13/239
>>
>> You claim in the above that the fork workload is "slab intensive".  Or
>> at least, you seem to - it's a bit fuzzy.
>>
>> But how slab intensive is it, really?
>>
>> What is extremely slab intensive is networking.  The networking guys
>> are very sensitive to slab performance.  If this hasn't already been
>> done, could you please determine what impact this has upon networking?
>> I expect Eric Dumazet, Dave Miller and Tom Herbert could suggest
>> testing approaches.
>>
>
> I can test it, but unfortunately I am unlikely to get to prepare a good
> environment before Barcelona.
>
> I know, however, that Greg Thelen was testing netperf in his setup.
> Greg, do you have any publishable numbers you could share?

Below is my humble opinion.
I am worried about the data cache footprint this patchset may cause,
especially in the slab implementation.
If there are several memcg cgroups, each cgroup has its own kmem_caches.
When every cgroup runs a slab-intensive job, the data cache may overflow
easily and the cache miss rate will rise, which could significantly
decrease system performance.
Are there any results on this?

Thanks.

[tip:x86/asm] x86: Don't clobber top of pt_regs in nested NMI

2012-11-02 Thread tip-bot for Salman Qazi

Commit-ID:  28696f434fef0efa97534b59986ad33b9c4df7f8
Gitweb: http://git.kernel.org/tip/28696f434fef0efa97534b59986ad33b9c4df7f8
Author: Salman Qazi 
AuthorDate: Mon, 1 Oct 2012 17:29:25 -0700
Committer:  Steven Rostedt 
CommitDate: Fri, 2 Nov 2012 11:29:36 -0400

x86: Don't clobber top of pt_regs in nested NMI

The nested NMI modifies the place (instruction, flags and stack)
that the first NMI will iret to.  However, the copy of registers
modified is exactly the one that is the part of pt_regs in
the first NMI.  This can change the behaviour of the first NMI.

In particular, Google's arch_trigger_all_cpu_backtrace handler
also prints regions of memory surrounding addresses appearing in
registers.  This results in handled exceptions, after which nested NMIs
start coming in.  These nested NMIs change the value of registers
in pt_regs.  This can cause the original NMI handler to produce
incorrect output.

We solve this problem by interchanging the position of the preserved
copy of the iret registers ("saved") and the copy subject to being
trampled by nested NMI ("copied").
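
The offset changes in the diff can be cross-checked with a small user-space
model. This is only a sketch: the struct and function names below are
hypothetical and not taken from entry_64.S, and it assumes each iret frame is
five 8-byte slots (RIP lowest, SS highest) with the "NMI executing" variable
directly above the two frames.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One iret frame as it sits in memory, lowest address first. */
struct iret_frame {
	uint64_t rip, cs, rflags, rsp, ss;
};

static_assert(sizeof(struct iret_frame) == 5 * 8, "iret frame is 5 slots");

/* Old layout: "copied" is lowest, "saved" sits above it. */
struct nmi_stack_old {
	struct iret_frame copied;
	struct iret_frame saved;
	uint64_t nmi_executing;
};

/* New layout: "saved" and "copied" have traded places. */
struct nmi_stack_new {
	struct iret_frame saved;
	struct iret_frame copied;
	uint64_t nmi_executing;
};

/* Distance in 8-byte slots from member `from` up to member `to`. */
#define SLOTS_BETWEEN(type, from, to) \
	((offsetof(type, to) - offsetof(type, from)) / sizeof(uint64_t))

/* repeat_nmi: %rsp points at the saved RIP in both layouts. */
static inline size_t old_flag_slot_at_repeat(void)
{
	return SLOTS_BETWEEN(struct nmi_stack_old, saved, nmi_executing);
}

static inline size_t new_flag_slot_at_repeat(void)
{
	return SLOTS_BETWEEN(struct nmi_stack_new, saved, nmi_executing);
}

/* nmi_restore: the flag is cleared relative to the copied RIP. */
static inline size_t old_flag_slot_at_restore(void)
{
	return SLOTS_BETWEEN(struct nmi_stack_old, copied, nmi_executing);
}

static inline size_t new_flag_slot_at_restore(void)
{
	return SLOTS_BETWEEN(struct nmi_stack_new, copied, nmi_executing);
}
```

Under those assumptions the model gives 5 and 10 slots for the old layout and
10 and 5 for the new one, which lines up with "movq $1, 5*8(%rsp)" becoming
"movq $1, 10*8(%rsp)" and "movq $0, 10*8(%rsp)" becoming "movq $0, 5*8(%rsp)"
after the extra "addq $(5*8), %rsp".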

Link: 
http://lkml.kernel.org/r/20121002002919.27236.14388.st...@dungbeetle.mtv.corp.google.com

Signed-off-by: Salman Qazi 
[ Added a needed CFI_ADJUST_CFA_OFFSET ]
Signed-off-by: Steven Rostedt 
---
 arch/x86/kernel/entry_64.S |   41 +++--
 1 files changed, 27 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index b51b2c7..811795d 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -1699,9 +1699,10 @@ nested_nmi:
 
 1:
/* Set up the interrupted NMIs stack to jump to repeat_nmi */
-   leaq -6*8(%rsp), %rdx
+   leaq -1*8(%rsp), %rdx
movq %rdx, %rsp
-   CFI_ADJUST_CFA_OFFSET 6*8
+   CFI_ADJUST_CFA_OFFSET 1*8
+   leaq -10*8(%rsp), %rdx
pushq_cfi $__KERNEL_DS
pushq_cfi %rdx
pushfq_cfi
@@ -1709,8 +1710,8 @@ nested_nmi:
pushq_cfi $repeat_nmi
 
/* Put stack back */
-   addq $(11*8), %rsp
-   CFI_ADJUST_CFA_OFFSET -11*8
+   addq $(6*8), %rsp
+   CFI_ADJUST_CFA_OFFSET -6*8
 
 nested_nmi_out:
popq_cfi %rdx
@@ -1736,18 +1737,18 @@ first_nmi:
 * +-------------------------+
 * | NMI executing variable  |
 * +-------------------------+
-* | Saved SS                |
-* | Saved Return RSP        |
-* | Saved RFLAGS            |
-* | Saved CS                |
-* | Saved RIP               |
-* +-------------------------+
 * | copied SS               |
 * | copied Return RSP       |
 * | copied RFLAGS           |
 * | copied CS               |
 * | copied RIP              |
 * +-------------------------+
+* | Saved SS                |
+* | Saved Return RSP        |
+* | Saved RFLAGS            |
+* | Saved CS                |
+* | Saved RIP               |
+* +-------------------------+
 * | pt_regs                 |
 * +-------------------------+
 *
@@ -1763,9 +1764,14 @@ first_nmi:
/* Set the NMI executing variable on the stack. */
pushq_cfi $1
 
+   /*
+* Leave room for the "copied" frame
+*/
+   subq $(5*8), %rsp
+
/* Copy the stack frame to the Saved frame */
.rept 5
-   pushq_cfi 6*8(%rsp)
+   pushq_cfi 11*8(%rsp)
.endr
CFI_DEF_CFA_OFFSET SS+8-RIP
 
@@ -1786,12 +1792,15 @@ repeat_nmi:
 * is benign for the non-repeat case, where 1 was pushed just above
 * to this very stack slot).
 */
-   movq $1, 5*8(%rsp)
+   movq $1, 10*8(%rsp)
 
/* Make another copy, this one may be modified by nested NMIs */
+   addq $(10*8), %rsp
+   CFI_ADJUST_CFA_OFFSET -10*8
.rept 5
-   pushq_cfi 4*8(%rsp)
+   pushq_cfi -6*8(%rsp)
.endr
+   subq $(5*8), %rsp
CFI_DEF_CFA_OFFSET SS+8-RIP
 end_repeat_nmi:
 
@@ -1842,8 +1851,12 @@ nmi_swapgs:
SWAPGS_UNSAFE_STACK
 nmi_restore:
RESTORE_ALL 8
+
+   /* Pop the extra iret frame */
+   addq $(5*8), %rsp
+
/* Clear the NMI executing stack variable */
-   movq $0, 10*8(%rsp)
+   movq $0, 5*8(%rsp)
jmp irq_return
CFI_ENDPROC
 END(nmi)

Re: kswapd0: excessive CPU usage

2012-11-02 Thread Jiri Slaby

On 11/02/2012 11:53 AM, Jiri Slaby wrote:
> On 11/02/2012 11:44 AM, Zdenek Kabelac wrote:
 Yes, applying this instead of the revert fixes the issue as well.
>>
>> I've applied this patch on 3.7.0-rc3 kernel - and I still see excessive
>> CPU usage - mainly  after  suspend/resume
>>
>> Here is just simple  kswapd backtrace from running kernel:
> 
> Yup, this is what we were seeing with the former patch only too. Try to
> apply the other one too:
> https://patchwork.kernel.org/patch/1673231/
> 
> For me I would say, it is fixed by the two patches now. I won't be able
> to report later, since I'm leaving to a conference tomorrow.

Damn it. It recurred right now, with both patches applied. After I
started a java program which consumed some more memory. Though there are
still 2 gigs free, kswapd is spinning:
[] __cond_resched+0x2a/0x40
[] shrink_slab+0x1c0/0x2d0
[] kswapd+0x66d/0xb60
[] kthread+0xc0/0xd0
[] ret_from_fork+0x7c/0xb0
[] 0x

-- 
js
suse labs

Re: [PATCH RFT RESEND linux-next] c6x: dma-mapping: support debug_dma_mapping_error

2012-11-02 Thread Shuah Khan

On Fri, 2012-11-02 at 15:10 -0400, Mark Salter wrote:
> On Fri, 2012-11-02 at 10:44 -0600, Shuah Khan wrote:
> > On Fri, 2012-10-26 at 09:40 -0600, Shuah Khan wrote:
> > > Add support for the debug_dma_mapping_error() call to avoid a warning
> > > from the debug_dma_unmap() interface when it checks the mapping error
> > > checked status. Without this patch, a "device driver failed to check
> > > map error" warning is generated.
> > > 
> > > Signed-off-by: Shuah Khan 
> > > ---
> > >  arch/c6x/include/asm/dma-mapping.h |1 +
> > >  1 file changed, 1 insertion(+)
> 
> > Would you like this patch to go through the c6x arch tree or linux-next?
> > Please let me know your preference.
> 
> I tried to test this but I get a build error with CONFIG_DMA_API_DEBUG:
> 
> /linux-next/lib/dma-debug.c: In function 'has_mapping_error':
> /linux-next/lib/dma-debug.c:863:15: error: implicit declaration of function 
> 'get_dma_ops' [-Werror=implicit-function-declaration]
> /linux-next/lib/dma-debug.c:863:34: warning: initialization makes pointer 
> from integer without a cast [enabled by default]
> 
> C6X (along with some other architectures) doesn't have a get_dma_ops()
> function defined.

That is a problem I didn't think about. I did a check and it looks like c6x
and frv are the only ones that don't have get_dma_ops() defined. frv is
in a different category as it doesn't use the dma_debug interfaces. In the
case of c6x, now with my change to add debug_dma_mapping_error(), we will
start seeing warnings since dma_map_page() and dma_map_single() are
debugged with a call to debug_dma_map_page() and the corresponding
dma_mapping_error() interface doesn't call the debug_dma_mapping_error()
interface.
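
For readers following along, the bookkeeping behind that warning can be
pictured with a toy user-space model. Everything here is hypothetical (the
toy_* names, the fixed-size table, the warning counter); it is not the code in
lib/dma-debug.c, only a sketch of the idea that a mapping is recorded as
"not yet error-checked" and unmap complains if it still is.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_ENTRIES 16

/* One tracked mapping; all names are illustrative only. */
struct toy_entry {
	uint64_t addr;
	bool in_use;
	bool error_checked;
};

static struct toy_entry toy_entries[TOY_MAX_ENTRIES];
static unsigned int toy_warnings;

static struct toy_entry *toy_find(uint64_t addr)
{
	for (size_t i = 0; i < TOY_MAX_ENTRIES; i++)
		if (toy_entries[i].in_use && toy_entries[i].addr == addr)
			return &toy_entries[i];
	return NULL;
}

/* Map path: remember the mapping as not yet error-checked. */
static void toy_debug_map(uint64_t addr)
{
	for (size_t i = 0; i < TOY_MAX_ENTRIES; i++) {
		if (!toy_entries[i].in_use) {
			toy_entries[i].addr = addr;
			toy_entries[i].in_use = true;
			toy_entries[i].error_checked = false;
			return;
		}
	}
	assert(!"toy table full");
}

/* Mapping-error path: the driver looked at the result, so mark it. */
static void toy_debug_mapping_error(uint64_t addr)
{
	struct toy_entry *e = toy_find(addr);

	if (e)
		e->error_checked = true;
}

/* Unmap path: count a warning if nobody ever checked the mapping. */
static void toy_debug_unmap(uint64_t addr)
{
	struct toy_entry *e = toy_find(addr);

	if (!e)
		return;
	if (!e->error_checked)
		toy_warnings++;
	e->in_use = false;
}
```

In this model an architecture whose mapping-error helper never reaches
toy_debug_mapping_error() would trip the warning on every unmap, which is the
situation described above for c6x.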

- Does adding get_dma_ops() make sense? Doesn't look like c6x exports
dma_ops?

Any other ideas?

-- Shuah



Re: urandom is too slow

2012-11-02 Thread Theodore Ts'o

On Fri, Nov 02, 2012 at 03:10:05AM +0200, Lasse Kärkkäinen wrote:
> Thank you for your answers, they should be very helpful for someone
> who is actually blanking or shredding their disks. However, I am
> just genuinely interested in why no better CSPRNG algorithm is used
> in the kernel (is it simply because no-one sent a patch or am I
> missing something?).

The answer is that the goal of /dev/urandom is not to be a
cryptographic random number generator (CRNG); a CRNG relies on the
security of the cryptographic primitive for its strength.  For
example, a CRNG which is based on DES or AES encrypting an
incrementing counter using a secret key, is fundamentally reliant on
the strength of DES or AES.  If DES were to be broken, for example, an
attacker would be able to determine the secret key and thus predict all
future outputs of a DES-based CRNG.

The design of /dev/random and /dev/urandom is to take advantage of
the kernel's access to unpredictability from the hardware, and to
avoid being "brittle" even in the face of a discovery of a weakness of
its cryptographic primitives.
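
To make the "encrypt an incrementing counter" construction concrete, here is
a minimal sketch. It is not the kernel's code, and it uses XTEA as a stand-in
for DES or AES only because XTEA fits in a few lines; the struct and function
names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical generator state: a secret key plus a public counter. */
struct ctr_rng {
	uint32_t key[4];
	uint64_t counter;
};

/* XTEA block encryption, 32 cycles, operating on one 64-bit block. */
static void xtea_encrypt_block(uint32_t v[2], const uint32_t key[4])
{
	uint32_t v0 = v[0], v1 = v[1], sum = 0;
	const uint32_t delta = 0x9E3779B9u;

	for (int i = 0; i < 32; i++) {
		v0 += (((v1 << 4) ^ (v1 >> 5)) + v1) ^ (sum + key[sum & 3]);
		sum += delta;
		v1 += (((v0 << 4) ^ (v0 >> 5)) + v0) ^ (sum + key[(sum >> 11) & 3]);
	}
	v[0] = v0;
	v[1] = v1;
}

static void ctr_rng_init(struct ctr_rng *g, const uint32_t key[4])
{
	assert(g && key);
	memcpy(g->key, key, sizeof(g->key));
	g->counter = 0;
}

/* Output block n is simply E_key(n); all secrecy lives in the key. */
static uint64_t ctr_rng_next(struct ctr_rng *g)
{
	uint32_t block[2] = {
		(uint32_t)g->counter,
		(uint32_t)(g->counter >> 32),
	};

	xtea_encrypt_block(block, g->key);
	g->counter++;
	return ((uint64_t)block[1] << 32) | block[0];
}
```

Anyone who recovers the key can run the same function and reproduce every
past and future output, which is the brittleness described above.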

Regards,

- Ted

Re: [PATCH] acpi: add missing newline to printk

2012-11-02 Thread Rafael J. Wysocki

On Friday, November 02, 2012 02:51:18 PM Cesar Eduardo Barros wrote:
> The missing newline causes messages like this on dmesg:
> 
> [2.578212] ACPI: Invalid Power Resource to register!<5>[2.578456] ...
> 
> Cc: Lin Ming 
> Cc: Len Brown 
> Signed-off-by: Cesar Eduardo Barros 

An equivalent patch has been applied already.

Thanks,
Rafael


> ---
>  drivers/acpi/power.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/acpi/power.c b/drivers/acpi/power.c
> index 40e38a0..7db61b8 100644
> --- a/drivers/acpi/power.c
> +++ b/drivers/acpi/power.c
> @@ -473,7 +473,7 @@ int acpi_power_resource_register_device(struct device *dev, acpi_handle handle)
>   return ret;
>  
>  no_power_resource:
> - printk(KERN_DEBUG PREFIX "Invalid Power Resource to register!");
> + printk(KERN_DEBUG PREFIX "Invalid Power Resource to register!\n");
>   return -ENODEV;
>  }
>  EXPORT_SYMBOL_GPL(acpi_power_resource_register_device);
> 
-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

Re: [PATCH 2/2 v5] block/throttle: Add IO submitted information in blkio.throttle

2012-11-02 Thread Vivek Goyal

On Fri, Nov 02, 2012 at 05:31:37PM +0800, Robin Dong wrote:
> From: Robin Dong 
> 
> Currently, if the IO is throttled by io-throttle, the system admin has no idea
> of the situation and can't tell the real application user that he/she has
> to do something.
> 
> So this patch adds a new interface named blkio.throttle.io_submitted which
> exposes the number of bios that have been sent into blk-throttle, so the
> user can calculate the difference from throttle.io_serviced to see how many
> IOs are currently throttled.
> 
> Cc: Tejun Heo 
> Cc: Vivek Goyal 
> Cc: Jens Axboe 
> Signed-off-by: Tao Ma 
> Signed-off-by: Robin Dong 
> ---

Looks good to me.

Acked-by: Vivek Goyal 

Vivek
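
The calculation described in the changelog could look roughly like this from
user space. This is a sketch under assumptions: it presumes the new file uses
the same "major:minor Op count" line format as the existing
blkio.throttle.io_serviced, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Find the counter for (dev, op) in rwstat-style text such as
 * "8:0 Read 120\n8:0 Write 40\n".  Returns -1 if there is no such line.
 */
static long long rwstat_lookup(const char *text, const char *dev, const char *op)
{
	const char *line = text;

	assert(text && dev && op);
	while (*line) {
		size_t len = strcspn(line, "\n");
		char buf[128], d[32], o[32];
		long long val;

		/* Parse one line at a time so fields never span lines. */
		if (len < sizeof(buf)) {
			memcpy(buf, line, len);
			buf[len] = '\0';
			if (sscanf(buf, "%31s %31s %lld", d, o, &val) == 3 &&
			    strcmp(d, dev) == 0 && strcmp(o, op) == 0)
				return val;
		}
		line += len;
		if (*line == '\n')
			line++;
	}
	return -1;
}

/* Submitted minus serviced: bios still held back by the throttle. */
static long long throttled_now(const char *submitted, const char *serviced,
			       const char *dev, const char *op)
{
	long long sub = rwstat_lookup(submitted, dev, op);
	long long srv = rwstat_lookup(serviced, dev, op);

	if (sub < 0 || srv < 0)
		return -1;
	return sub - srv;
}
```

The two strings would be the contents of throttle.io_submitted and
throttle.io_serviced read from the same cgroup; since the files are read at
slightly different times, the result is only a snapshot.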

> v3 <-- v2:
>  - Use nr-queued[] of struct throtl_grp for stats instead of adding new blkg_rwstat.
> 
> v4 <-- v3:
>  - Add two new blkg_rwstat arguments to count the total bios sent into blk_throttle.
> 
> v5 <-- v4:
>  - Change name "io_submit_bytes" to "io_submitted_bytes".
> 
>  block/blk-throttle.c |   43 +++
>  1 files changed, 43 insertions(+), 0 deletions(-)
> 
> diff --git a/block/blk-throttle.c b/block/blk-throttle.c
> index 46ddeff..c6391b5 100644
> --- a/block/blk-throttle.c
> +++ b/block/blk-throttle.c
> @@ -46,6 +46,10 @@ struct tg_stats_cpu {
>   struct blkg_rwstat  service_bytes;
>   /* total IOs serviced, post merge */
>   struct blkg_rwstat  serviced;
> + /* total bytes submitted into blk-throttle */
> + struct blkg_rwstat  submit_bytes;
> + /* total IOs submitted into blk-throttle */
> + struct blkg_rwstat  submitted;
>  };
>  
>  struct throtl_grp {
> @@ -266,6 +270,8 @@ static void throtl_pd_reset_stats(struct blkcg_gq *blkg)
>  
>   blkg_rwstat_reset(&sc->service_bytes);
>   blkg_rwstat_reset(&sc->serviced);
> + blkg_rwstat_reset(&sc->submit_bytes);
> + blkg_rwstat_reset(&sc->submitted);
>   }
>  }
>  
> @@ -699,6 +705,30 @@ static void throtl_update_dispatch_stats(struct throtl_grp *tg, u64 bytes,
>   local_irq_restore(flags);
>  }
>  
> +static void throtl_update_submit_stats(struct throtl_grp *tg, u64 bytes, int rw)
> +{
> + struct tg_stats_cpu *stats_cpu;
> + unsigned long flags;
> +
> + /* If per cpu stats are not allocated yet, don't do any accounting. */
> + if (tg->stats_cpu == NULL)
> + return;
> +
> + /*
> +  * Disabling interrupts to provide mutual exclusion between two
> +  * writes on same cpu. It probably is not needed for 64bit. Not
> +  * optimizing that case yet.
> +  */
> + local_irq_save(flags);
> +
> + stats_cpu = this_cpu_ptr(tg->stats_cpu);
> +
> + blkg_rwstat_add(&stats_cpu->submitted, rw, 1);
> + blkg_rwstat_add(&stats_cpu->submit_bytes, rw, bytes);
> +
> + local_irq_restore(flags);
> +}
> +
>  static void throtl_charge_bio(struct throtl_grp *tg, struct bio *bio)
>  {
>   bool rw = bio_data_dir(bio);
> @@ -1084,6 +1114,16 @@ static struct cftype throtl_files[] = {
>   .private = offsetof(struct tg_stats_cpu, serviced),
>   .read_seq_string = tg_print_cpu_rwstat,
>   },
> + {
> + .name = "throttle.io_submitted_bytes",
> + .private = offsetof(struct tg_stats_cpu, submit_bytes),
> + .read_seq_string = tg_print_cpu_rwstat,
> + },
> + {
> + .name = "throttle.io_submitted",
> + .private = offsetof(struct tg_stats_cpu, submitted),
> + .read_seq_string = tg_print_cpu_rwstat,
> + },
>   { } /* terminate */
>  };
>  
> @@ -1128,6 +1168,8 @@ bool blk_throtl_bio(struct request_queue *q, struct bio *bio)
>   if (tg_no_rule_group(tg, rw)) {
>   throtl_update_dispatch_stats(tg,
>bio->bi_size, bio->bi_rw);
> + throtl_update_submit_stats(tg,
> + bio->bi_size, bio->bi_rw);
>   goto out_unlock_rcu;
>   }
>   }
> @@ -1141,6 +1183,7 @@ bool blk_throtl_bio(struct request_queue *q, struct bio *bio)
>   if (unlikely(!tg))
>   goto out_unlock;
>  
> + throtl_update_submit_stats(tg, bio->bi_size, bio->bi_rw);
>   if (tg->nr_queued[rw]) {
>   /*
>* There is already another bio queued in same dir. No
> -- 
> 1.7.1

[GIT PULL] hwmon patches for 3.7-rc4

2012-11-02 Thread Guenter Roeck

Hi Linus,

Please pull hwmon patches for Linux 3.7-rc4 from signed tag:

git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging.git hwmon-for-linus

Thanks,
Guenter
--

The following changes since commit 8f0d8163b50e01f398b14bcd4dc039ac5ab18d64:

  Linux 3.7-rc3 (2012-10-28 12:24:48 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging.git tags/hwmon-for-linus

for you to fetch changes up to eaa7cc60f7dff5e74ef387ace8228235fab8241b:

  hwmon: Only include of_match_table with CONFIG_OF_GPIO (2012-11-01 17:31:20 -0700)


An e-mail address update, and fix a compile error on SPARC


Andreas Herrmann (1):
  hwmon, fam15h_power: Change email address, MAINTAINERS entry

Jamie Lentin (1):
  hwmon: Only include of_match_table with CONFIG_OF_GPIO

 Documentation/hwmon/fam15h_power |2 +-
 MAINTAINERS  |2 +-
 drivers/hwmon/fam15h_power.c |4 ++--
 drivers/hwmon/gpio-fan.c |2 ++
 4 files changed, 6 insertions(+), 4 deletions(-)



Re: [PATCH 0/2] xen-pciback: parsing improvements

2012-11-02 Thread Konrad Rzeszutek Wilk

On Fri, Nov 02, 2012 at 02:35:40PM +, Jan Beulich wrote:
> 1: simplify and tighten parsing of device IDs
> 2: reject out of range inputs

applied for v3.8.

Thanks!
> 
> Signed-off-by: Jan Beulich 
> 

Re: [GIT PULL REQUEST] UniCore32 update for v3.7-rc3

2012-11-02 Thread Linus Torvalds

On Fri, Nov 2, 2012 at 2:32 AM, guanxuetao  wrote:
>
>   git://github.com/gxt/linux.git unicore32

Can you please use your gpg signature to make a signed *tag* and ask
me to pull that, instead of signing your email?

Email signing is largely useless, because no email client that I have
ever wanted to use seems to make it useful. In contrast, git signed
tags are checked automatically, and also allow you to write a message
that ends up in the merge automatically.

   Linus

Re: setting up CDB filters in udev (was Re: [PATCH v2 0/3] block: add queue-private command filter, editable via sysfs)

2012-11-02 Thread Alan Cox

>  * Devices are given standard filter matching the device class.  Any
>!CAP_SYS_RAWIO user can only issue commands allowed by the filter.
> 
>  * CAP_SYS_RAWIO can issue an ioctl to disable the filter all
>accessors of the fd and transfer it.
> 
> That should be enough, no?

No 

a - there are lots of cases you want to allow only a subset of commands.

b - if you are using a BPF filter which is the obvious way to do it then
the flexibility comes for free without any extra complexity as the kernel
provides a generic implementation, and even a JIT for complex cases.
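
Point (a) can be pictured with something much simpler than BPF: a per-opcode
allow bitmap. To be clear, this is a swapped-in technique for illustration and
not the BPF approach argued for in (b); the cdb_filter names are hypothetical
and only the SCSI opcode values are standard.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* A few standard SCSI opcodes (first byte of the CDB). */
enum {
	CDB_TEST_UNIT_READY = 0x00,
	CDB_FORMAT_UNIT     = 0x04,
	CDB_INQUIRY         = 0x12,
	CDB_READ_10         = 0x28,
	CDB_WRITE_10        = 0x2a,
};

/* Hypothetical filter: one bit per possible opcode. */
struct cdb_filter {
	uint8_t allowed[256 / 8];
};

static void cdb_filter_clear(struct cdb_filter *f)
{
	assert(f);
	memset(f, 0, sizeof(*f));
}

static void cdb_filter_allow(struct cdb_filter *f, uint8_t opcode)
{
	f->allowed[opcode >> 3] |= (uint8_t)(1u << (opcode & 7));
}

/* A command passes only if its opcode bit is set; empty CDBs never pass. */
static bool cdb_filter_permits(const struct cdb_filter *f,
			       const uint8_t *cdb, size_t len)
{
	if (len == 0)
		return false;
	return (f->allowed[cdb[0] >> 3] >> (cdb[0] & 7)) & 1u;
}
```

A bitmap like this can only say yes or no per opcode. Rules that depend on
other bytes of the command are where a BPF program would be more flexible.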

Alan
