date:20150401

Re: [PATCH] cpusets: Make cpus_allowed and mems_allowed masks hotplug invariant

2015-04-01 Thread Preeti U Murthy

Hi Tejun, Peter,

On 10/09/2014 06:36 PM, Tejun Heo wrote:
> On Thu, Oct 09, 2014 at 01:50:52PM +0530, Preeti U Murthy wrote:
>> However what remains to be answered is that the V2 of cgroup design -
>> the default hierarchy, tracks hotplug operations for children cgroups as
>> well. Tejun, Li, will not the concerns that Peter raised above hold for
>> the default hierarchy as well?
> 
> I don't think the legacy one is a good design.  Kernel shouldn't lose
> configurations in an irreversible way and the legacy one is also
> making random cpuset flips by migrating tasks upwards anyway.  In
> terms of hotunplug behavior, the legacy and unified ones behave the
> same.  The only difference is that the configuration is independent of
> the current state and the configured behavior is restored when the
> cpus come back.  The other side is that the legacy hierarchy behavior
> simply can't be allowed when the hierarchy is shared among multiple
> controllers as in the unified hierarchy.  It affects all other
> controllers attached to the hierarchy.

We currently have a use case that needs this to be fixed one way or
the other. In a virtualized setup, resources such as CPUs and memory may
need to be hotplugged into VMs at runtime. Due to the behavior of the
legacy hierarchy, the new CPUs never get used. This is not even a
scenario where we hot-unplugged CPUs and asked for them to be plugged
back in; the workloads running within the VM simply need more resources
than they began with.
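
A toy model may make the difference Tejun describes concrete. This is a
userspace sketch, not kernel code: struct cpuset_model and both update
functions are hypothetical, a 64-bit integer stands in for a cpumask, and
corner cases such as an empty effective mask are ignored.

```c
#include <stdint.h>

/* Toy model: bit n set means CPU n. Not the kernel's struct cpuset. */
struct cpuset_model {
	uint64_t configured;	/* mask the user wrote to cpuset.cpus */
	uint64_t effective;	/* mask tasks are actually allowed to use */
};

/* Legacy-style: a hotplug event overwrites the configured mask itself,
 * so CPUs that come back online are never handed back. */
static void legacy_hotplug(struct cpuset_model *cs, uint64_t online)
{
	cs->configured &= online;
	cs->effective = cs->configured;
}

/* Hotplug-invariant: keep the user's configuration and recompute the
 * effective mask from it and the current online mask. */
static void invariant_hotplug(struct cpuset_model *cs, uint64_t online)
{
	cs->effective = cs->configured & online;
}
```

Unplugging CPU 3 and plugging it back (online mask 0xf, then 0x7, then 0xf)
leaves the legacy model stuck at 0x7, while the invariant model returns to 0xf.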

> 
> That said, we can't change the behavior on the legacy one.  It's a
> very userland visible behavior.  We simply can't change it, so

If we leave the user-configured cpusets untouched, I don't see how
userspace is adversely affected. Userspace usually expects the kernel
to keep track of its configuration. If anything, we would be fixing
undesired behavior, wouldn't we?

> unfortunately you're stuck with it at least on the legacy hierarchy.

Given that we badly need this fixed and that we cannot easily move
to the default hierarchy, can you please take another look at this
patch?

I understand that there are good reasons why the legacy hierarchy
currently behaves this way and why we cannot drastically change its
behavior, but userspace has no sane way to work around this for
genuine use cases such as the one above.

Regards
Preeti U Murthy
> 
> Thanks.
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] selftests/mount: output error messages when mount test fail

2015-04-01 Thread Zhang Zhen

Without this patch, if /proc/self/uid_map does not exist,
the mount test case does not run and prints no explanation.

With this patch applied, the test reports why it did not run,
as follows:
root@kernel-host:/opt/kernel> make -C tools/testing/selftests TARGETS=mount run_tests
make: Entering directory `/opt/kernel/tools/testing/selftests'
for TARGET in mount; do \
make -C $TARGET; \
done;
make[1]: Entering directory `/opt/kernel/tools/testing/selftests/mount'
make[1]: Nothing to be done for `all'.
make[1]: Leaving directory `/opt/kernel/tools/testing/selftests/mount'
for TARGET in mount; do \
make -C $TARGET run_tests; \
done;
make[1]: Entering directory `/opt/kernel/tools/testing/selftests/mount'
ERROR: No /proc/self/uid_map exist
make[1]: Leaving directory `/opt/kernel/tools/testing/selftests/mount'
make: Leaving directory `/opt/kernel/tools/testing/selftests'

Signed-off-by: Zhang Zhen 
---
 tools/testing/selftests/mount/Makefile | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/mount/Makefile b/tools/testing/selftests/mount/Makefile
index a5b367f..b3266db 100644
--- a/tools/testing/selftests/mount/Makefile
+++ b/tools/testing/selftests/mount/Makefile
@@ -8,7 +8,12 @@ unprivileged-remount-test: unprivileged-remount-test.c
 include ../lib.mk

 TEST_PROGS := unprivileged-remount-test
-override RUN_TESTS := if [ -f /proc/self/uid_map ] ; then ./unprivileged-remount-test ; fi
+override RUN_TESTS :=  @if [ -f /proc/self/uid_map ] ; \
+   then\
+   ./unprivileged-remount-test ; \
+   else\
+   echo "ERROR: No /proc/self/uid_map exist" ; \
+   fi
 override EMIT_TESTS := echo "$(RUN_TESTS)"

 clean:
-- 
1.8.5.5
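
The same check-and-explain pattern can live in a C test program that
reports a missing prerequisite itself instead of relying on the Makefile.
require_path() is a hypothetical helper; access(2) with F_OK only tests
whether the path exists.

```c
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: return true if path exists; otherwise explain
 * why the test cannot run, like the else branch added to the Makefile. */
static bool require_path(const char *path)
{
	if (access(path, F_OK) == 0)
		return true;
	fprintf(stderr, "ERROR: %s does not exist\n", path);
	return false;
}
```

A test's main() could call require_path("/proc/self/uid_map") and exit
early when it returns false.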

[PATCH RFC] allow constant folding in msecs_to_jiffies where possible for gcc

2015-04-01 Thread Nicholas Mc Guire

A number of cleanup patches were switching var * HZ / 1000
constructs to msecs_to_jiffies(var) to ensure that all corner
cases are handled properly. The downside is that this introduces
a function call and loses the constant folding that was
originally possible.

msecs_to_jiffies() calculates jiffies at run time even when the
argument is a constant that could be folded at compile time. By
checking the argument with __builtin_constant_p(), gcc can
optimize the constant case again.

Signed-off-by: Nicholas Mc Guire 
---

reported by Aaron Sierra 
solution suggested by Joe Perches 

Patch is against 4.0-rc6 (localversion-next is -next-20150401)

Boot tested on x86_64 only

 kernel/time/time.c |   39 ---
 1 file changed, 32 insertions(+), 7 deletions(-)

diff --git a/kernel/time/time.c b/kernel/time/time.c
index 2c85b77..8cb550a 100644
--- a/kernel/time/time.c
+++ b/kernel/time/time.c
@@ -496,15 +496,14 @@ EXPORT_SYMBOL(ns_to_timespec64);
  *   the input value by a factor or dividing it with a factor
  *
  * We must also be careful about 32-bit overflows.
+ *
+ * msecs_to_jiffies() checks whether the passed-in value is constant
+ * via __builtin_constant_p(), allowing gcc to eliminate most of the code;
+ * __msecs_to_jiffies() is called if the value passed does not allow
+ * gcc to do constant folding.
  */
-unsigned long msecs_to_jiffies(const unsigned int m)
+static unsigned long __msecs_to_jiffies(const unsigned int m)
 {
-   /*
-* Negative value, means infinite timeout:
-*/
-   if ((int)m < 0)
-   return MAX_JIFFY_OFFSET;
-
 #if HZ <= MSEC_PER_SEC && !(MSEC_PER_SEC % HZ)
/*
 * HZ is equal to or smaller than 1000, and 1000 is a nice
@@ -537,6 +536,32 @@ unsigned long msecs_to_jiffies(const unsigned int m)
>> MSEC_TO_HZ_SHR32;
 #endif
 }
+
+unsigned long msecs_to_jiffies(const unsigned int m)
+{
+   /*
+* Negative value, means infinite timeout:
+*/
+   if ((int)m < 0)
+   return MAX_JIFFY_OFFSET;
+
+   if (__builtin_constant_p(m)) {
+#if HZ <= MSEC_PER_SEC && !(MSEC_PER_SEC % HZ)
+   return (m + (MSEC_PER_SEC / HZ) - 1) / (MSEC_PER_SEC / HZ);
+#elif HZ > MSEC_PER_SEC && !(HZ % MSEC_PER_SEC)
+   if (m > jiffies_to_msecs(MAX_JIFFY_OFFSET))
+   return MAX_JIFFY_OFFSET;
+   return m * (HZ / MSEC_PER_SEC);
+#else
+   if (HZ > MSEC_PER_SEC && m > jiffies_to_msecs(MAX_JIFFY_OFFSET))
+   return MAX_JIFFY_OFFSET;
+
+   return (MSEC_TO_HZ_MUL32 * m + MSEC_TO_HZ_ADJ32)
+   >> MSEC_TO_HZ_SHR32;
+#endif
+   } else
+   return __msecs_to_jiffies(m);
+}
 EXPORT_SYMBOL(msecs_to_jiffies);
 
 unsigned long usecs_to_jiffies(const unsigned int u)
-- 
1.7.10.4
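
The dispatch can be sketched in userspace C. This is a sketch under stated
assumptions, not the patch itself: HZ is fixed at 250, only the branch where
HZ divides MSEC_PER_SEC is shown, the model_ names are hypothetical, and
MAX_JIFFY_OFFSET is written the way include/linux/jiffies.h defines it. One
point the sketch makes explicit: __builtin_constant_p(m) is evaluated where
the code is compiled, so it can only see a caller's constant if the
dispatcher is inlined into that caller, for example as a static inline in a
header. As far as I can tell, an out-of-line msecs_to_jiffies() in
kernel/time/time.c would always take the non-constant branch for callers in
other files.

```c
#include <limits.h>

#define HZ               250	/* assumption for this sketch */
#define MSEC_PER_SEC     1000L
#define MAX_JIFFY_OFFSET ((LONG_MAX >> 1) - 1)

_Static_assert(MSEC_PER_SEC % HZ == 0, "sketch only covers HZ dividing 1000");

/* Out-of-line path for values unknown at compile time. */
static unsigned long model_msecs_to_jiffies_slow(const unsigned int m)
{
	/* Round up to whole jiffies (4 ms per jiffy at HZ=250). */
	return (m + (MSEC_PER_SEC / HZ) - 1) / (MSEC_PER_SEC / HZ);
}

/* Dispatcher: needs to be inlined into the caller for
 * __builtin_constant_p(m) to see a literal argument. */
static inline unsigned long model_msecs_to_jiffies(const unsigned int m)
{
	/* Negative value (as int) means infinite timeout. */
	if ((int)m < 0)
		return MAX_JIFFY_OFFSET;

	if (__builtin_constant_p(m))
		/* Same expression, but gcc can fold it to a constant here. */
		return (m + (MSEC_PER_SEC / HZ) - 1) / (MSEC_PER_SEC / HZ);

	return model_msecs_to_jiffies_slow(m);
}
```

With optimization enabled, a call like model_msecs_to_jiffies(10) should
reduce to the constant 3; at -O0 the builtin yields 0 and the slow path
computes the same value.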


[PATCH] clk: Show clock rate instead of return value

2015-04-01 Thread Chanwoo Choi

This patch prints the current clock rate instead of the return value
when clk_set_rate() fails, because the log message refers to the clock rate.

Cc: Mike Turquette 
Cc: Stephen Boyd 
Signed-off-by: Chanwoo Choi 
---
 drivers/clk/clk-conf.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/clk/clk-conf.c b/drivers/clk/clk-conf.c
index aad4796..fb1ee65 100644
--- a/drivers/clk/clk-conf.c
+++ b/drivers/clk/clk-conf.c
@@ -107,8 +107,8 @@ static int __set_clk_rates(struct device_node *node, bool clk_supplier)
 
rc = clk_set_rate(clk, rate);
if (rc < 0)
-   pr_err("clk: couldn't set %s clock rate: %d\n",
-  __clk_get_name(clk), rc);
+   pr_err("clk: couldn't set %s clock rate: %lu\n",
+  __clk_get_name(clk), clk_get_rate(clk));
clk_put(clk);
}
index++;
-- 
1.8.5.5


Re: [PATCH net-next V3 13/23] ptp: igb: convert to the 64 bit get/set time methods.

2015-04-01 Thread Richard Cochran

On Thu, Apr 02, 2015 at 12:06:56AM +, Keller, Jacob E wrote:
> I don't know how kernel would fix this. Usually macros like PRId64 are used
> but I am not sure those are defined for the kernel builds

Davem fixed it by casting to (long long).
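
For reference, a userspace sketch of the two idioms: the cast to long long
with %lld, which works wherever long long is 64 bits or wider (C99 guarantees
at least 64), and PRId64 from <inttypes.h>. As far as I know the kernel does
not ship <inttypes.h>, which is why the cast is the usual fix there. The
function names are hypothetical.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Portable in kernel and userspace alike: widen, then use %lld. */
static int format_s64_cast(char *buf, size_t len, int64_t ns)
{
	return snprintf(buf, len, "%lld", (long long)ns);
}

/* Userspace alternative: the <inttypes.h> conversion macro. */
static int format_s64_pri(char *buf, size_t len, int64_t ns)
{
	return snprintf(buf, len, "%" PRId64, ns);
}
```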

Thanks,
Richard

Re: [PATCH urgent v2] x86, asm: Disable opportunistic SYSRET if regs->flags has TF set

2015-04-01 Thread Borislav Petkov

On Wed, Apr 01, 2015 at 02:26:34PM -0700, Andy Lutomirski wrote:
> When I wrote the opportunistic SYSRET code, I missed an important
> difference between SYSRET and IRET.  Both instructions are capable
> of setting EFLAGS.TF, but they behave differently when doing so.
> IRET will not issue a #DB trap after execution when it sets TF. This
> is critical -- otherwise you'd never be able to make forward
> progress when returning to userspace.  SYSRET, on the other hand,
> will trap with #DB immediately after returning to CPL3, and the next
> instruction will never execute.
> 
> This breaks anything that opportunistically SYSRETs to a user
> context with TF set.  For example, running this code with TF set and
> a SIGTRAP handler loaded never gets past post_nop.
> 
>   extern unsigned char post_nop[];
>   asm volatile ("pushfq\n\t"
> "popq %%r11\n\t"
> "nop\n\t"
> "post_nop:"
> : : "c" (post_nop) : "r11");
> 
> In my defense, I can't find this documented in the AMD or Intel
> manual.
> 
> Fix it by using IRET to restore TF.
> 
> Fixes: 2a23c6b8a9c4 ("x86_64, entry: Use sysret to return to userspace when possible")
> Signed-off-by: Andy Lutomirski 
> ---
> 
> This affects 4.0-rc as well as -tip.  A full test case lives here:
> 
> https://git.kernel.org/cgit/linux/kernel/git/luto/misc-tests.git/
> 
> It's called single_step_syscall_64.
> 
> On Intel systems, the 32-bit version of that test fails for unrelated
> reasons, but that's not a regression, and fixing it will be much more
> intrusive.
> 
> Changes from v1:
>  - Remove mention of testl from changelog.
>  - Improve comment per Denys' suggestion.
> 
> arch/x86/kernel/entry_64.S | 16 +++-
>  1 file changed, 15 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
> index 750c6efcb718..537716380959 100644
> --- a/arch/x86/kernel/entry_64.S
> +++ b/arch/x86/kernel/entry_64.S
> @@ -715,7 +715,21 @@ retint_swapgs:   /* return to user-space */
>   cmpq %r11,EFLAGS(%rsp)  /* R11 == RFLAGS */
>   jne opportunistic_sysret_failed
>  
> - testq $X86_EFLAGS_RF,%r11   /* sysret can't restore RF */
> + /*
> +  * SYSRET can't restore RF.  SYSRET can restore TF, but unlike IRET,
> +  * restoring TF results in a trap from userspace immediately after
> +  * SYSRET.  This would cause an infinite loop whenever #DB happens
> +  * with register state that satisfies the opportunistic SYSRET
> +  * conditions.  For example, single-stepping this user code:
> +  *
> +  *   movq $stuck_here,%rcx
> +  *   pushfq
> +  *   popq %r11
> +  *   stuck_here:
> +  *
> +  * would never get past stuck_here.
> +  */
> + testq $(X86_EFLAGS_RF|X86_EFLAGS_TF),%r11
>   jnz opportunistic_sysret_failed
>  
>   /* nothing to check for RSP */

Acked-by: Borislav Petkov 
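
The flag test the patch adds can be modelled in C. This covers only that one
check: the other opportunistic SYSRET conditions in the surrounding code
(such as R11 == RFLAGS, visible in the diff context) are not modelled, and
flags_allow_sysret() is a hypothetical name. The bit values are the
architectural RFLAGS bits; together they give the 0x10100 immediate seen in
the disassembly of the patched instruction.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural RFLAGS bits (same values as the kernel's X86_EFLAGS_*). */
#define X86_EFLAGS_TF 0x00000100ULL	/* trap flag, bit 8 */
#define X86_EFLAGS_RF 0x00010000ULL	/* resume flag, bit 16 */

/*
 * SYSRET cannot restore RF, and restoring TF via SYSRET traps before the
 * next user instruction executes, so either bit forces the IRET path.
 */
static bool flags_allow_sysret(uint64_t rflags)
{
	return !(rflags & (X86_EFLAGS_RF | X86_EFLAGS_TF));
}
```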

-- 
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.

Re: [PATCH urgent] x86, asm: Disable opportunistic SYSRET if regs->flags has TF set

2015-04-01 Thread Borislav Petkov

On Wed, Apr 01, 2015 at 11:18:16PM +0200, Denys Vlasenko wrote:
> On 04/01/2015 09:25 PM, Andy Lutomirski wrote:
> > Fix it by using IRET to restore TF.  Since it's late, I'm keeping
> > this minimal and keeping "testq" instead of switching to "testl".
> 
> Changing to "testl" here wins nothing.

Except that less data (a dword instead of a qword) gets shuffled around
and tracked for dependencies in the machine.

> Since r11 is used, REX prefix will be encoded anyway.

As a future cleanup, one could use one of the "old", i.e. not-extended
registers to save 2 bytes per insn (REX pfx and ModRM) but one has to
remember to do

mov %rax, %r11

in the end.

And yep, it should preferably be %rax, as we have opcode 0xa9, which
tests an immediate against RAX and saves us the ModRM byte, as we don't
need to specify a register.

orig:
 a42:   49 f7 c3 00 00 01 00    test   $0x10000,%r11
 a49:   75 41                   jne    a8c

Andy:
 a42:   49 f7 c3 00 01 01 00    test   $0x10100,%r11
 a49:   75 41                   jne    a8c

testl:
 a42:   41 f7 c3 00 01 01 00    test   $0x10100,%r11d
 a49:   75 41                   jne    a8c

%rax:
 a42:   a9 00 01 01 00          test   $0x10100,%eax
 a47:   75 41                   jne    a8a
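
The byte counts above follow from the x86-64 encoding rules. They can be
sketched as a tiny encoder for "test $imm32, reg" that handles only the
register-direct 32/64-bit form; encode_test_imm32() is hypothetical and
ignores memory operands and 8/16-bit operand sizes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Encode "test $imm32, reg" into out[] (at least 7 bytes) and return the
 * length. reg is the x86-64 register number: 0 = rax/eax ... 11 = r11/r11d.
 * wide selects the 64-bit form (testq) via REX.W.
 */
static size_t encode_test_imm32(uint8_t *out, unsigned reg, bool wide,
				uint32_t imm)
{
	size_t n = 0;
	uint8_t rex = 0x40;

	assert(reg < 16);
	if (wide)
		rex |= 0x08;		/* REX.W: 64-bit operand size */
	if (reg >= 8)
		rex |= 0x01;		/* REX.B: selects r8-r15 in ModRM.rm */
	if (rex != 0x40)
		out[n++] = rex;		/* prefix only needed if W or B is set */

	if (reg == 0) {
		out[n++] = 0xa9;	/* TEST rAX, imm32: no ModRM byte */
	} else {
		out[n++] = 0xf7;	/* group 3, /0 = TEST r/m, imm32 */
		out[n++] = 0xc0 | (reg & 7);	/* mod=11 (register), reg=/0 */
	}

	for (int i = 0; i < 4; i++)	/* imm32, little endian */
		out[n++] = (uint8_t)(imm >> (8 * i));
	return n;
}
```

For %r11 in the 64-bit form this produces 49 f7 c3 followed by the immediate
(7 bytes), matching the "Andy" line; for %eax it produces the 5-byte a9 form.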

-- 
Regards/Gruss,
Boris.


Re: [PATCH v3 00/13] Krait clocks + Krait CPUfreq

2015-04-01 Thread Pavel Machek

On Fri 2015-03-20 23:45:19, Stephen Boyd wrote:
> These patches provide cpufreq scaling on devices with Krait CPUs.
> In Krait CPU designs there's one PLL and two muxes per CPU, allowing
> us to switch CPU frequencies independently.
> 
>secondary
>+-++
>| QSB |---+|\
>+-+   || |-+
>  |+---|/  |
>  ||   +   |
>+-+   ||   |
>| PLL |+---+   |   primary
>+-+|  || +
>   |  |+-|\   +--+
>+---+  |  |  | \  |  |
>| HFPLL |--+-|  |-| CPU0 |
>+---+  |  || |  | |  |
>   |  || +-+ | /  +--+
>   |  |+-| / 2 |-|/
>   |  |  +-+ +
>   |  | secondary
>   |  |+
>   |  +|\
>   |   | |-+
>   +---|/  |   primary
>   +   | +
>   +-|\   +--+
>+---+| \  |  |
>| HFPLL ||  |-| CPU1 |
>+---+  | |  | |  |
>   | +-+ | /  +--+
>   +-| / 2 |-|/
> +-+ +

And actually this picture should go to Documentation/ somewhere...

Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

Re: [PATCH v3 00/13] Krait clocks + Krait CPUfreq

2015-04-01 Thread Pavel Machek

Hi!

On Fri 2015-03-20 23:45:19, Stephen Boyd wrote:
> These patches provide cpufreq scaling on devices with Krait CPUs.
> In Krait CPU designs there's one PLL and two muxes per CPU, allowing
> us to switch CPU frequencies independently.
> 
>secondary
>+-++
>| QSB |---+|\
>+-+   || |-+
>  |+---|/  |
>  ||   +   |
>+-+   ||   |
>| PLL |+---+   |   primary
>+-+|  || +
>   |  |+-|\   +--+
>+---+  |  |  | \  |  |
>| HFPLL |--+-|  |-| CPU0 |
>+---+  |  || |  | |  |
>   |  || +-+ | /  +--+
>   |  |+-| / 2 |-|/
>   |  |  +-+ +
>   |  | secondary
>   |  |+
>   |  +|\
>   |   | |-+
>   +---|/  |   primary
>   +   | +
>   +-|\   +--+
>+---+  v | \  |  |
>| HFPLL ||  |-| CPU1 |
>+---+  | |  | |  |
>   | +-+ | /  +--+
>   +-| / 2 |-|/
> +-+ +

Nice ASCII art :-). There should be a "+" at the place marked by "v" :-).

Pavel

[PATCH v1 1/2] mfd/axp20x: add support for extcon cell

2015-04-01 Thread Ramakrishna Pallala

This patch adds the MFD cell info for the axp288 extcon device.

Signed-off-by: Ramakrishna Pallala 
---
 drivers/mfd/axp20x.c   |   28 
 include/linux/mfd/axp20x.h |5 +
 2 files changed, 33 insertions(+)

diff --git a/drivers/mfd/axp20x.c b/drivers/mfd/axp20x.c
index 0acbe52..a569721 100644
--- a/drivers/mfd/axp20x.c
+++ b/drivers/mfd/axp20x.c
@@ -290,6 +290,29 @@ static struct resource axp288_adc_resources[] = {
},
 };
 
+static struct resource axp288_extcon_resources[] = {
+   {
+   .start = AXP288_IRQ_VBUS_FALL,
+   .end   = AXP288_IRQ_VBUS_FALL,
+   .flags = IORESOURCE_IRQ,
+   },
+   {
+   .start = AXP288_IRQ_VBUS_RISE,
+   .end   = AXP288_IRQ_VBUS_RISE,
+   .flags = IORESOURCE_IRQ,
+   },
+   {
+   .start = AXP288_IRQ_MV_CHNG,
+   .end   = AXP288_IRQ_MV_CHNG,
+   .flags = IORESOURCE_IRQ,
+   },
+   {
+   .start = AXP288_IRQ_BC_USB_CHNG,
+   .end   = AXP288_IRQ_BC_USB_CHNG,
+   .flags = IORESOURCE_IRQ,
+   },
+};
+
 static struct resource axp288_charger_resources[] = {
{
.start = AXP288_IRQ_OV,
@@ -345,6 +368,11 @@ static struct mfd_cell axp288_cells[] = {
.resources = axp288_adc_resources,
},
{
+   .name = "axp288_extcon",
+   .num_resources = ARRAY_SIZE(axp288_extcon_resources),
+   .resources = axp288_extcon_resources,
+   },
+   {
.name = "axp288_charger",
.num_resources = ARRAY_SIZE(axp288_charger_resources),
.resources = axp288_charger_resources,
diff --git a/include/linux/mfd/axp20x.h b/include/linux/mfd/axp20x.h
index dfabd6d..4ed8071 100644
--- a/include/linux/mfd/axp20x.h
+++ b/include/linux/mfd/axp20x.h
@@ -275,4 +275,9 @@ struct axp20x_fg_pdata {
int thermistor_curve[MAX_THERM_CURVE_SIZE][2];
 };
 
+struct axp288_extcon_pdata {
+   /* GPIO pin control to switch D+/D- lines b/w PMIC and SOC */
+   struct gpio_desc *gpio_mux_cntl;
+};
+
 #endif /* __LINUX_MFD_AXP20X_H */
-- 
1.7.9.5


[PATCH v1 2/2] extcon-axp288: Add axp288 extcon driver support

2015-04-01 Thread Ramakrishna Pallala

This patch adds extcon support for the AXP288 PMIC, which
has BC1.2 charger detection capability. It also adds USB mux
switching between the SoC and the PMIC, based on GPIO control.

Signed-off-by: Ramakrishna Pallala 
---
 drivers/extcon/Kconfig |7 +
 drivers/extcon/Makefile|1 +
 drivers/extcon/extcon-axp288.c |  479 
 3 files changed, 487 insertions(+)
 create mode 100644 drivers/extcon/extcon-axp288.c

diff --git a/drivers/extcon/Kconfig b/drivers/extcon/Kconfig
index 6a1f7de..b8627f7 100644
--- a/drivers/extcon/Kconfig
+++ b/drivers/extcon/Kconfig
@@ -93,4 +93,11 @@ config EXTCON_SM5502
  Silicon Mitus SM5502. The SM5502 is a USB port accessory
  detector and switch.
 
+config EXTCON_AXP288
+   tristate "AXP288 EXTCON support"
+   depends on MFD_AXP20X && USB_PHY
+   help
+ Say Y here to enable support for USB peripheral detection
+ and USB MUX switching by AXP288 PMIC.
+
 endif # MULTISTATE_SWITCH
diff --git a/drivers/extcon/Makefile b/drivers/extcon/Makefile
index 0370b42..832ad79 100644
--- a/drivers/extcon/Makefile
+++ b/drivers/extcon/Makefile
@@ -12,3 +12,4 @@ obj-$(CONFIG_EXTCON_MAX8997)  += extcon-max8997.o
 obj-$(CONFIG_EXTCON_PALMAS)+= extcon-palmas.o
 obj-$(CONFIG_EXTCON_RT8973A)   += extcon-rt8973a.o
 obj-$(CONFIG_EXTCON_SM5502)+= extcon-sm5502.o
+obj-$(CONFIG_EXTCON_AXP288)+= extcon-axp288.o
diff --git a/drivers/extcon/extcon-axp288.c b/drivers/extcon/extcon-axp288.c
new file mode 100644
index 000..2e75d45
--- /dev/null
+++ b/drivers/extcon/extcon-axp288.c
@@ -0,0 +1,479 @@
+/*
+ * extcon-axp288.c - X-Power AXP288 PMIC extcon cable detection driver
+ *
+ * Copyright (C) 2015 Intel Corporation
+ * Ramakrishna Pallala 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define AXP288_PS_STAT_REG 0x00
+#define PS_STAT_VBUS_TRIGGER   (1 << 0)
+#define PS_STAT_BAT_CHRG_DIR   (1 << 2)
+#define PS_STAT_VBUS_ABOVE_VHOLD   (1 << 3)
+#define PS_STAT_VBUS_VALID (1 << 4)
+#define PS_STAT_VBUS_PRESENT   (1 << 5)
+
+#define AXP288_BC_GLOBAL_REG   0x2c
+#define BC_GLOBAL_RUN  (1 << 0)
+#define BC_GLOBAL_DET_STAT (1 << 2)
+#define BC_GLOBAL_DBP_TOUT (1 << 3)
+#define BC_GLOBAL_VLGC_COM_SEL (1 << 4)
+#define BC_GLOBAL_DCD_TOUT_MASK    0x60
+#define BC_GLOBAL_DCD_TOUT_300MS   0x0
+#define BC_GLOBAL_DCD_TOUT_100MS   0x1
+#define BC_GLOBAL_DCD_TOUT_500MS   0x2
+#define BC_GLOBAL_DCD_TOUT_900MS   0x3
+#define BC_GLOBAL_DCD_DET_SEL  (1 << 7)
+
+#define AXP288_BC_VBUS_CNTL_REG    0x2d
+#define VBUS_CNTL_DPDM_PD_EN   (1 << 4)
+#define VBUS_CNTL_DPDM_FD_EN   (1 << 5)
+#define VBUS_CNTL_FIRST_PO_STAT    (1 << 6)
+
+#define AXP288_BC_USB_STAT_REG 0x2e
+#define USB_STAT_BUS_STAT_MASK 0x0f
+#define USB_STAT_BUS_STAT_OFFSET   0
+#define USB_STAT_BUS_STAT_ATHD 0x0
+#define USB_STAT_BUS_STAT_CONN 0x1
+#define USB_STAT_BUS_STAT_SUSP 0x2
+#define USB_STAT_BUS_STAT_CONF 0x3
+#define USB_STAT_USB_SS_MODE   (1 << 4)
+#define USB_STAT_DEAD_BAT_DET  (1 << 6)
+#define USB_STAT_DBP_UNCFG (1 << 7)
+
+#define AXP288_BC_DET_STAT_REG 0x2f
+#define DET_STAT_MASK  0xe0
+#define DET_STAT_OFFSET    5
+#define DET_STAT_SDP   0x1
+#define DET_STAT_CDP   0x2
+#define DET_STAT_DCP   0x3
+
+#define AXP288_PS_BOOT_REASON_REG  0x2
+
+#define AXP288_PWRSRC_IRQ_CFG_REG  0x40
+#define PWRSRC_IRQ_CFG_MASK    0x1c
+
+#define AXP288_BC12_IRQ_CFG_REG    0x45
+#define BC12_IRQ_CFG_MASK  0x2
+
+#define AXP288_PWRSRC_INTR_NUM 4
+
+#define AXP288_DRV_NAME    "axp288_extcon"
+
+#define AXP288_EXTCON_CABLE_SDP    "Slow-charger"
+#define AXP288_EXTCON_CABLE_CDP    "Charge-downstream"
+#define AXP288_EXTCON_CABLE_DCP    "Fast-charger"
+
+#define EXTCON_GPIO_MUX_SEL_PMIC   0
+#define EXTCON_GPIO_MUX_SEL_SOC    1
+
+enum {
+   VBUS_FALLING_IRQ = 0,
+   VBUS_RISING_IRQ,
+   MV_CHNG_IRQ,
+   BC_USB_CHNG_IRQ,
+};
+
+static const c
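
The driver body is cut off above, so the following is not the driver's code
but a sketch of how the BC1.2 detection result in AXP288_BC_DET_STAT_REG
could be decoded with the DET_STAT_* defines from the patch and mapped to
the three cable names. The enum and both functions are hypothetical, and
the sketch assumes the caller has already read the register value.

```c
#include <stddef.h>

/* Values copied from the defines in extcon-axp288.c above. */
#define DET_STAT_MASK   0xe0
#define DET_STAT_OFFSET 5
#define DET_STAT_SDP    0x1
#define DET_STAT_CDP    0x2
#define DET_STAT_DCP    0x3

enum axp288_cable_model { CABLE_NONE, CABLE_SDP, CABLE_CDP, CABLE_DCP };

/* Extract bits 7:5 of the detection status register. */
static enum axp288_cable_model decode_bc_det_stat(unsigned int reg)
{
	switch ((reg & DET_STAT_MASK) >> DET_STAT_OFFSET) {
	case DET_STAT_SDP: return CABLE_SDP;
	case DET_STAT_CDP: return CABLE_CDP;
	case DET_STAT_DCP: return CABLE_DCP;
	default:           return CABLE_NONE;	/* nothing detected/reserved */
	}
}

/* Map to the AXP288_EXTCON_CABLE_* strings used by the patch. */
static const char *cable_name(enum axp288_cable_model c)
{
	switch (c) {
	case CABLE_SDP: return "Slow-charger";
	case CABLE_CDP: return "Charge-downstream";
	case CABLE_DCP: return "Fast-charger";
	default:        return NULL;
	}
}
```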

[PATCH v1 0/2] Add extcon support for AXP288 PMIC

2015-04-01 Thread Ramakrishna Pallala

This patch series adds the axp288 extcon driver and adds the cell
info for the extcon device to the axp20x MFD driver.

Ramakrishna Pallala (2):
  mfd/axp20x: add support for extcon cell
  extcon-axp288: Add axp288 extcon driver support

 drivers/extcon/Kconfig |7 +
 drivers/extcon/Makefile|1 +
 drivers/extcon/extcon-axp288.c |  479 
 drivers/mfd/axp20x.c   |   28 +++
 include/linux/mfd/axp20x.h |5 +
 5 files changed, 520 insertions(+)
 create mode 100644 drivers/extcon/extcon-axp288.c

-- 
1.7.9.5


[PATCH] perf: annotate: make it respect -i option.

2015-04-01 Thread Wang Nan

There is a bug in perf annotate: it doesn't respect the user-provided
'-i'/'--input' option:

 # perf record ls
   [ perf record: Woken up 1 times to write data ]
   [ perf record: Captured and wrote 0.001 MB perf.data (8 samples) ]
 # mv ./perf.data ./perf.data.new
 # perf annotate -i ./perf.data.new  --stdio
   failed to open perf.data: No such file or directory  (try 'perf record' first)

This patch fixes it by setting the file path after option parsing,
as 'perf report' does.

Signed-off-by: Wang Nan 
---
 tools/perf/builtin-annotate.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c
index 71bf745..929f83c 100644
--- a/tools/perf/builtin-annotate.c
+++ b/tools/perf/builtin-annotate.c
@@ -283,7 +283,6 @@ int cmd_annotate(int argc, const char **argv, const char *prefix __maybe_unused)
},
};
struct perf_data_file file = {
-   .path  = input_name,
.mode  = PERF_DATA_MODE_READ,
};
const struct option options[] = {
@@ -342,6 +341,7 @@ int cmd_annotate(int argc, const char **argv, const char *prefix __maybe_unused)
 
setup_browser(true);
 
+   file.path = input_name;
annotate.session = perf_session__new(&file, false, &annotate.tool);
if (annotate.session == NULL)
return -1;
-- 
1.8.3.4
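
The ordering problem can be reproduced in a few lines. In this sketch,
input_name, struct data_file_model and the open_path_*() functions are
hypothetical stand-ins for perf's global, struct perf_data_file and
cmd_annotate(); the point is only that a struct initializer copies the
global's value at that moment, so an option parsed afterwards is never seen.

```c
#include <stddef.h>

/* Hypothetical stand-in for perf's global, set by -i/--input. */
static const char *input_name;

/* Hypothetical stand-in for struct perf_data_file. */
struct data_file_model {
	const char *path;
};

/* Stand-in for option parsing: -i overrides the global. */
static void parse_input_option(const char *arg)
{
	if (arg)
		input_name = arg;
}

/* Fall back to the default file name when no path was given. */
static const char *resolve(const struct data_file_model *file)
{
	return file->path ? file->path : "perf.data";
}

/* Buggy order: the path is copied before options are parsed. */
static const char *open_path_buggy(const char *arg)
{
	input_name = NULL;			/* fresh run */
	struct data_file_model file = { .path = input_name };

	parse_input_option(arg);
	return resolve(&file);			/* -i is ignored */
}

/* Fixed order, as in the patch: assign the path after parsing. */
static const char *open_path_fixed(const char *arg)
{
	input_name = NULL;			/* fresh run */
	struct data_file_model file = { .path = NULL };

	parse_input_option(arg);
	file.path = input_name;
	return resolve(&file);
}
```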


Re: [PATCH 0/5] arm64: add kdump support

2015-04-01 Thread Pratyush Anand




On Thursday 02 April 2015 11:07 AM, AKASHI Takahiro wrote:

Pratyush,

On 04/02/2015 01:58 PM, Pratyush Anand wrote:



On Thursday 02 April 2015 04:57 AM, AKASHI Takahiro wrote:

Please try my latest kexec-tools in my linaro repo (branch name is
kdump/v0.11)
and let me know the result.


Thanks a lot. I just fetched your repo and found v0.11.

With this, the crash kernel loads successfully if I do not use an initrd.

With the following, I still see "Overlapping memory segments":

kexec -p /home/panand/work/kernel/bsa2_kdump/vmlinux --initrd=/boot/initramfs-3.19.0.bz1198945+.img --append="$( cat /proc/cmdline ) maxcpus=1 mem=64M reset_devices"


How big is your initrd?
If it is reasonably small, please send me the segment info, or the messages
from add_segment_phys_virt() for all the segments.
for all the segments.



add_segment_phys_virt: 0dcd0b90 - 0dcd0f90 (0400) -> 0040c3ff - 0040c400 (0001)
add_segment_phys_virt: 03ff88c10010 - 03ff8984a010 (00c3a000) -> 0040c008 - 0040c131 (0129)
add_segment_phys_virt: 0dcd53c0 - 0dcd96b8 (42f8) -> 0040c000 - 0040c001 (0001)
add_segment_phys_virt: 03ff87360010 - 03ff88bfcc2f (0189cc1f) -> 0040c001 - 0040c18b (018a)

Overlapping memory segments at 0x40c18b
sort_segments failed

Why do we try to fit the dtb just after crash_reserved_mem.start?
Shouldn't it start after crash_reserved_mem.start + arm64_mem.text_offset +
arm64_mem.image_size?



I tried the following and it works perfectly:

diff --git a/kexec/arch/arm64/crashdump-arm64.c b/kexec/arch/arm64/crashdump-arm64.c
index 41266f294589..75f4e4d269ca 100644
--- a/kexec/arch/arm64/crashdump-arm64.c
+++ b/kexec/arch/arm64/crashdump-arm64.c
@@ -312,5 +312,6 @@ void set_crash_entry(struct mem_ehdr *ehdr, struct kexec_info *info)

 off_t locate_dtb_in_crashmem(struct kexec_info *info, off_t dtb_size)
 {
return locate_hole(info, dtb_size, 128UL * 1024,
-   crash_reserved_mem.start, crash_reserved_mem.end, 1);
+   crash_reserved_mem.start + arm64_mem.text_offset +
+   arm64_mem.image_size, crash_reserved_mem.end, 1);
 }

With these changes, the new allocations are:
add_segment_phys_virt: 10350b90 - 10350f90 (0400) -> 0040c3ff - 0040c400 (0001)
add_segment_phys_virt: 03ff7ad70010 - 03ff7b9aa010 (00c3a000) -> 0040c008 - 0040c131 (0129)
add_segment_phys_virt: 103553c0 - 103596b8 (42f8) -> 0040c136 - 0040c137 (0001)
add_segment_phys_virt: 03ff794c0010 - 03ff7ad5cc2f (0189cc1f) -> 0040c137 - 0040c2c1 (018a)
add_segment_phys_virt: 103596c0 - 10360190 (6ad0) -> 0040c2c1 - 0040c2c2 (0001)



Crash kernel loaded upon panic.
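
The proposed change amounts to starting the dtb search above the end of the
kernel image instead of at the start of the reserved region. A simplified
sketch of that computation, assuming a first-fit search with power-of-two
alignment: place_dtb() and align_up() are hypothetical, this is not
kexec-tools' locate_hole() (which also accounts for segments already added),
and the numbers in the usage note are made up for illustration; only the
128 KiB alignment comes from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Round x up to a power-of-two alignment. */
static uint64_t align_up(uint64_t x, uint64_t align)
{
	assert(align && !(align & (align - 1)));
	return (x + align - 1) & ~(align - 1);
}

/*
 * Lowest aligned dtb address after the kernel image, which is loaded at
 * crash_start + text_offset. Returns 0 if the dtb does not fit.
 * crash_end is treated as exclusive in this sketch.
 */
static uint64_t place_dtb(uint64_t crash_start, uint64_t crash_end,
			  uint64_t text_offset, uint64_t image_size,
			  uint64_t dtb_size, uint64_t align)
{
	uint64_t base = align_up(crash_start + text_offset + image_size, align);

	if (base + dtb_size > crash_end)
		return 0;
	return base;
}
```

With a reserved region at 0x40000000, a text_offset of 0x80000 and a
0xc3a000-byte image, the kernel ends at 0x40cba000 and the dtb lands at the
next 128 KiB boundary, 0x40cc0000.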

~Pratyush

Re: sched: Improve load balancing in the presence of idle CPUs

2015-04-01 Thread Jason Low

On Wed, 2015-04-01 at 18:04 +0100, Morten Rasmussen wrote:
> On Wed, Apr 01, 2015 at 07:49:56AM +0100, Preeti U Murthy wrote:

> > I am sorry I don't quite get this. Can you please elaborate?
> 
> I think the scenario is that we are in nohz_idle_balance() and decide to
> bail out because we have pulled some tasks, but before leaving
> nohz_idle_balance() we want to check if more balancing is necessary
> using nohz_kick_needed() and potentially kick somebody to continue.

Also, below is an example patch.

(Without the conversion to idle_cpu(), the check for rq->idle_balance
would no longer be accurate.)

---
 kernel/sched/fair.c |   17 ++---
 1 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index fdae26e..7749a14 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7620,6 +7620,8 @@ out:
 }
 
 #ifdef CONFIG_NO_HZ_COMMON
+static inline bool nohz_kick_needed(struct rq *rq);
+
 /*
  * In CONFIG_NO_HZ_COMMON case, the idle balance kickee will do the
  * rebalancing for all the cpus for whom scheduler ticks are stopped.
@@ -7629,6 +7631,7 @@ static void nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
int this_cpu = this_rq->cpu;
struct rq *rq;
int balance_cpu;
+   bool done_balancing = false;
 
if (idle != CPU_IDLE ||
!test_bit(NOHZ_BALANCE_KICK, nohz_flags(this_cpu)))
@@ -7644,7 +7647,7 @@ static void nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
 * balancing owner will pick it up.
 */
if (need_resched())
-   break;
+   goto end;
 
rq = cpu_rq(balance_cpu);
 
@@ -7663,9 +7666,12 @@ static void nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
if (time_after(this_rq->next_balance, rq->next_balance))
this_rq->next_balance = rq->next_balance;
}
+   done_balancing = true;
nohz.next_balance = this_rq->next_balance;
 end:
clear_bit(NOHZ_BALANCE_KICK, nohz_flags(this_cpu));
+   if (!done_balancing && nohz_kick_needed(this_rq))
+   nohz_balancer_kick();
 }
 
 /*
@@ -7687,7 +7693,7 @@ static inline bool nohz_kick_needed(struct rq *rq)
int nr_busy, cpu = rq->cpu;
bool kick = false;
 
-   if (unlikely(rq->idle_balance))
+   if (unlikely(idle_cpu(cpu)))
return false;
 
/*
@@ -7757,16 +7763,13 @@ static void run_rebalance_domains(struct softirq_action *h)
enum cpu_idle_type idle = this_rq->idle_balance ?
CPU_IDLE : CPU_NOT_IDLE;
 
+   rebalance_domains(this_rq, idle);
/*
 * If this cpu has a pending nohz_balance_kick, then do the
 * balancing on behalf of the other idle cpus whose ticks are
-* stopped. Do nohz_idle_balance *before* rebalance_domains to
-* give the idle cpus a chance to load balance. Else we may
-* load balance only within the local sched_domain hierarchy
-* and abort nohz_idle_balance altogether if we pull some load.
+* stopped.
 */
nohz_idle_balance(this_rq, idle);
-   rebalance_domains(this_rq, idle);
 }
 
 /*
-- 
1.7.2.5
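
The control flow the patch adds can be modelled in userspace: the re-kick
only happens when the loop was cut short and nohz_kick_needed() still
reports work. This is a hypothetical model, not scheduler code;
struct nohz_model and its fields stand in for the real run-queue state, and
only the need_resched() exit from the loop is modelled.

```c
#include <stdbool.h>

struct nohz_model {
	int nr_idle_cpus;	/* CPUs to balance on behalf of */
	int resched_at;		/* loop index where need_resched() fires, -1 = never */
	bool kick_needed;	/* what nohz_kick_needed() would return */
};

/* Returns true if another CPU should be kicked; *balanced counts the
 * CPUs balanced before leaving the loop. */
static bool model_nohz_idle_balance(const struct nohz_model *m, int *balanced)
{
	bool done_balancing = false;

	*balanced = 0;
	for (int cpu = 0; cpu < m->nr_idle_cpus; cpu++) {
		if (cpu == m->resched_at)
			goto end;	/* bail out early, as on need_resched() */
		(*balanced)++;
	}
	done_balancing = true;
end:
	/* Only re-kick when we bailed out and work remains. */
	return !done_balancing && m->kick_needed;
}
```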

Re: Concerns about "mpt2sas: Added Reply Descriptor Post Queue (RDPQ) Array support"

2015-04-01 Thread James Bottomley

On Thu, 2015-04-02 at 16:39 +1100, Benjamin Herrenschmidt wrote:
> On Thu, 2015-02-19 at 21:45 -0800, James Bottomley wrote:
> 
> > Ben, this is legal by design.  It was specifically designed for the
> > aic79xx SCSI card, but can be used for a variety of other reasons.  The
> > aic79xx hardware problem was that the DMA engine could address the whole
> > of memory (it had two address modes, a 39 bit one and a 64 bit one) but
> > the script engine that runs the mailboxes only had a 32 bit activation
> > register (the activating write points at the physical address of the
> > script to begin executing).  This meant that the scripts that run in
> > memory had to be in the first 4GB of physical memory, hence the split
> > mask.  The DMA mask specifies that the card can transfer from anywhere
> > in physical memory, but the consistent_dma_mask says that the consistent
> > allocation used to get scripts memory must come from the lower 4GB.
> 
> So looking at that again...
> 
> This is interesting ... basically any driver using a different mask has
> been broken on powerpc for basically ever. The whole concept was poorly
> designed, for example,  the set_consistent_mask isn't a dma_map_ops
> unlike everything else.
> 
> In some cases, what we want is convey a base+offset information to
> drivers but we can't do that.
> 
> This stuff cannot work with setups like a lot of our iommus where we
> have a remapped region at the bottom of the DMA address space and a
> bypass (direct map) region high up.
> 
> Basically, we can fix it, at least for most platforms, but it will be
> hard, invasive, *and* will need to go to stable. Grmbl.

Well, it was originally a hack for altix, because they had no regions
below 4GB and had to specifically manufacture them.  As you know, in
Linux, if Intel doesn't need it, no-one cares and the implementation
bitrots.
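The split-mask arrangement described for aic79xx can be illustrated with a small userspace sketch. DMA_BIT_MASK() uses the same definition as the kernel macro; range_fits_mask() and the two mask constants are hypothetical illustrations, not the kernel DMA API. A streaming buffer only has to fit dma_mask, while script memory from consistent allocations must also fit the 32-bit consistent mask.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same definition as the kernel's DMA_BIT_MASK(). */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/*
 * True if every byte of [addr, addr + size) is addressable under mask.
 * Assumes mask has the DMA_BIT_MASK() shape (contiguous low bits).
 */
static bool range_fits_mask(uint64_t addr, uint64_t size, uint64_t mask)
{
	uint64_t last = size ? addr + size - 1 : addr;

	if (last < addr)		/* range wraps past 2^64 */
		return false;
	return (last & ~mask) == 0;	/* no bit above the mask is set */
}

/* aic79xx-style split (hypothetical values for illustration). */
static const uint64_t streaming_mask  = DMA_BIT_MASK(64); /* DMA engine: anywhere */
static const uint64_t consistent_mask = DMA_BIT_MASK(32); /* script engine: low 4GB */
```

A buffer at 5GB passes the streaming check but fails the consistent one, which is exactly why the two masks have to be tracked separately.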

> We'll have to replace our "direct" DMA ops (which just apply an offset)
> which we use for devices that set a 64-bit mask on platform that support
> a bypass window, with some smart-ass hybrid variant that selectively
> shoot stuff up to the bypass window or down via the iommu remapped based
> on the applicable mask for a given operation.
> 
> It would be nice if we could come up with a way to inform the driver
> that we support that sort of "bypass" region with an offset. That would
> allow drivers that have that 64-bit base + 32-bit offset scheme to work
> much more efficiently for us. The driver could configure the base to be
> our "bypass window offset", and we could use ZONE_DMA32 for consistent
> DMAs.
> 
> It would also help us with things like some GPUs that can only do 40-bit
> DMA (which won't allow them to reach our bypass region normally) but do
> have a way to configure the generated top bits of all DMA addresses in
> some fixed register.
> 
> Any idea what such an interface might look like ?

Why doesn't the API we have today work (modulo a better implementation)?
A consistent mask specifies a wide range of possible locations for your
bypass region.  You just have to select one and attach it somehow and
remember to use it for consistent allocations.  As long as
set_consistent_mask becomes a dma op, it should all work, right?
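A minimal sketch of the "hybrid" ops idea: for each operation, pick the direct bypass window or the remapped IOMMU window depending on the mask that applies (dma_mask for streaming mappings, the consistent mask for coherent allocations). All names here (struct hybrid_window, choose_dma_path(), enum dma_path) are hypothetical. A real implementation would sit behind dma_map_ops and allocate IOMMU entries; this sketch only reports which path would be taken.

```c
#include <stdint.h>

enum dma_path { DMA_PATH_BYPASS, DMA_PATH_IOMMU, DMA_PATH_FAIL };

/* Hypothetical platform layout: a remapped IOMMU window low in the DMA
 * address space and a direct-mapped bypass window high up. */
struct hybrid_window {
	uint64_t iommu_base;	/* DMA address where the remapped window starts */
	uint64_t iommu_size;
	uint64_t bypass_offset;	/* bypass DMA address = phys + bypass_offset */
};

/*
 * Choose a path for one mapping under the applicable mask.
 * Assumes mask has the DMA_BIT_MASK() shape, so "fits" means "<= mask".
 */
static enum dma_path choose_dma_path(const struct hybrid_window *w,
				     uint64_t phys, uint64_t size,
				     uint64_t mask, uint64_t *dma_addr)
{
	uint64_t bypass_addr = phys + w->bypass_offset;

	if (size == 0)
		return DMA_PATH_FAIL;

	/* Direct mapping only if the whole bypass range is reachable. */
	if (bypass_addr + size - 1 <= mask) {
		*dma_addr = bypass_addr;
		return DMA_PATH_BYPASS;
	}

	/* Fall back to the remapped window. A real IOMMU would allocate an
	 * IOVA here; the sketch simply hands out the window base. */
	if (size <= w->iommu_size && w->iommu_base + size - 1 <= mask) {
		*dma_addr = w->iommu_base;
		return DMA_PATH_IOMMU;
	}
	return DMA_PATH_FAIL;
}
```

With a bypass window at 2^59, a device with a 64-bit mask goes direct, while a 32-bit consistent mask or a 40-bit GPU falls back to the remapped window.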

James


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v2.1] ftracetest: Do not use usleep directly

2015-04-01 Thread Masami Hiramatsu

(2015/04/02 13:34), Namhyung Kim wrote:
> usleep is only provided on Red Hat distros, so running ftracetest
> on other distros resulted in failures due to the missing usleep.
> 
> The reason for using [u]sleep in the test was to generate (scheduler)
> events.  That can be done in various ways, like this:
> 
> yield() {  ping localhost -c 1 || sleep .001 || usleep 1 || sleep 1; }

Nice hack! :)

Acked-by: Masami Hiramatsu 
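The yield() helper relies on shell "||" short-circuiting: each command runs only if the previous one failed, so the first command that exists and succeeds ends the chain (a missing usleep just makes the shell return non-zero, and the chain moves on). The same first-success pattern can be sketched in C with system(3); run_first_success() is a hypothetical helper and assumes a POSIX /bin/sh.

```c
#include <stdlib.h>
#include <sys/wait.h>

/*
 * Run each command through /bin/sh in order and stop at the first one
 * that exits with status 0, like `a || b || c` in the shell.
 * Returns the index of the command that succeeded, or -1 if none did.
 */
static int run_first_success(const char *const cmds[], int n)
{
	for (int i = 0; i < n; i++) {
		int status = system(cmds[i]);

		if (status != -1 && WIFEXITED(status) && WEXITSTATUS(status) == 0)
			return i;
	}
	return -1;
}
```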

> 
> Reported-by: Michael Ellerman 
> Reported-by: Dave Jones 
> Reported-by: Luis Henriques 
> Based-on-patch-by: Pádraig Brady 
> CC: Masami Hiramatsu 
> Signed-off-by: Namhyung Kim 
> ---
> fix a typo of pinc
> 
>  tools/testing/selftests/ftrace/test.d/event/event-enable.tc | 13 ++---
>  .../selftests/ftrace/test.d/event/subsystem-enable.tc       | 13 ++---
>  .../selftests/ftrace/test.d/event/toplevel-enable.tc        | 13 +
>  3 files changed, 33 insertions(+), 6 deletions(-)
> 
> diff --git a/tools/testing/selftests/ftrace/test.d/event/event-enable.tc b/tools/testing/selftests/ftrace/test.d/event/event-enable.tc
> index 668616d9bb03..c40c139aaf2b 100644
> --- a/tools/testing/selftests/ftrace/test.d/event/event-enable.tc
> +++ b/tools/testing/selftests/ftrace/test.d/event/event-enable.tc
> @@ -12,6 +12,10 @@ fail() { #msg
>  exit -1
>  }
>  
> +yield() {
> +ping localhost -c 1 || sleep .001 || usleep 1 || sleep 1
> +}
> +
>  if [ ! -f set_event -o ! -d events/sched ]; then
>  echo "event tracing is not supported"
>  exit_unsupported
> @@ -21,7 +25,8 @@ reset_tracer
>  do_reset
>  
>  echo 'sched:sched_switch' > set_event
> -usleep 1
> +
> +yield
>  
>  count=`cat trace | grep sched_switch | wc -l`
>  if [ $count -eq 0 ]; then
> @@ -31,7 +36,8 @@ fi
>  do_reset
>  
>  echo 1 > events/sched/sched_switch/enable
> -usleep 1
> +
> +yield
>  
>  count=`cat trace | grep sched_switch | wc -l`
>  if [ $count -eq 0 ]; then
> @@ -41,7 +47,8 @@ fi
>  do_reset
>  
>  echo 0 > events/sched/sched_switch/enable
> -usleep 1
> +
> +yield
>  
>  count=`cat trace | grep sched_switch | wc -l`
>  if [ $count -ne 0 ]; then
> diff --git a/tools/testing/selftests/ftrace/test.d/event/subsystem-enable.tc b/tools/testing/selftests/ftrace/test.d/event/subsystem-enable.tc
> index 655c415b6e7f..cbd98b71ee8a 100644
> --- a/tools/testing/selftests/ftrace/test.d/event/subsystem-enable.tc
> +++ b/tools/testing/selftests/ftrace/test.d/event/subsystem-enable.tc
> @@ -12,6 +12,10 @@ fail() { #msg
>  exit -1
>  }
>  
> +yield() {
> +ping localhost -c 1 || sleep .001 || usleep 1 || sleep 1
> +}
> +
>  if [ ! -f set_event -o ! -d events/sched ]; then
>  echo "event tracing is not supported"
>  exit_unsupported
> @@ -21,7 +25,8 @@ reset_tracer
>  do_reset
>  
>  echo 'sched:*' > set_event
> -usleep 1
> +
> +yield
>  
>  count=`cat trace | grep -v ^# | awk '{ print $5 }' | sort -u | wc -l`
>  if [ $count -lt 3 ]; then
> @@ -31,7 +36,8 @@ fi
>  do_reset
>  
>  echo 1 > events/sched/enable
> -usleep 1
> +
> +yield
>  
>  count=`cat trace | grep -v ^# | awk '{ print $5 }' | sort -u | wc -l`
>  if [ $count -lt 3 ]; then
> @@ -41,7 +47,8 @@ fi
>  do_reset
>  
>  echo 0 > events/sched/enable
> -usleep 1
> +
> +yield
>  
>  count=`cat trace | grep -v ^# | awk '{ print $5 }' | sort -u | wc -l`
>  if [ $count -ne 0 ]; then
> diff --git a/tools/testing/selftests/ftrace/test.d/event/toplevel-enable.tc b/tools/testing/selftests/ftrace/test.d/event/toplevel-enable.tc
> index 480845774007..65e2ab11 100644
> --- a/tools/testing/selftests/ftrace/test.d/event/toplevel-enable.tc
> +++ b/tools/testing/selftests/ftrace/test.d/event/toplevel-enable.tc
> @@ -12,6 +12,10 @@ fail() { #msg
>  exit -1
>  }
>  
> +yield() {
> +ping localhost -c 1 || sleep .001 || usleep 1 || sleep 1
> +}
> +
>  if [ ! -f available_events -o ! -f set_event -o ! -d events ]; then
>  echo "event tracing is not supported"
>  exit_unsupported
> @@ -21,6 +25,9 @@ reset_tracer
>  do_reset
>  
>  echo '*:*' > set_event
> +
> +yield
> +
>  count=`cat trace | grep -v ^# | wc -l`
>  if [ $count -eq 0 ]; then
>  fail "none of events are recorded"
> @@ -29,6 +36,9 @@ fi
>  do_reset
>  
>  echo 1 > events/enable
> +
> +yield
> +
>  count=`cat trace | grep -v ^# | wc -l`
>  if [ $count -eq 0 ]; then
>  fail "none of events are recorded"
> @@ -37,6 +47,9 @@ fi
>  do_reset
>  
>  echo 0 > events/enable
> +
> +yield
> +
>  count=`cat trace | grep -v ^# | wc -l`
>  if [ $count -ne 0 ]; then
>  fail "any of events should not be recorded"
> 


-- 
Masami HIRAMATSU
Linux Technology Research Center, System Productivity Research Dept.
Center for Technology Innovation - Systems Engineering
Hitachi, Ltd., Research & Development Group
E-mail: masami.hiramatsu...@hitachi.com



Re: [PATCH 6/7] block/partitions: use block_device name vsprintf helper

2015-04-01 Thread Sergey Senozhatsky

On (03/31/15 20:01), Dmitry Monakhov wrote:
[..]
>   if(be32_to_cpu(magic) != SGI_LABEL_MAGIC) {
> - /*printk("Dev %s SGI disklabel: bad magic %08x\n",
> -bdevname(bdev, b), be32_to_cpu(magic));*/
> + /*printk("Dev %pg SGI disklabel: bad magic %08x\n",
> +bdev, be32_to_cpu(magic));*/
>  
[..]
>   p = label->partitions;
>   if (be16_to_cpu(label->magic) != SUN_LABEL_MAGIC) {
> -/*   printk(KERN_INFO "Dev %s Sun disklabel: bad magic %04x\n",
> -bdevname(bdev, b), be16_to_cpu(label->magic)); */
> +/*   printk(KERN_INFO "Dev %pg Sun disklabel: bad magic %04x\n",
> +bdev, be16_to_cpu(label->magic)); */
>   put_dev_sector(sect);

Maybe remove these commented-out printk()s entirely?

-ss

[PATCH v2] clk: qcom: Add EBI2 clocks for IPQ806x

2015-04-01 Thread Archit Taneja

The NAND controller within EBI2 requires EBI2_CLK and EBI2_ALWAYS_ON_CLK clocks.
Create structs for these clocks so that they can be used by the NAND controller
driver. Add an entry for EBI2_AON_CLK in the gcc-ipq806x DT binding document.

Signed-off-by: Archit Taneja 
---
v2:
- removed hwcg_reg/bit entries in ebi2_aon_clk as they only apply to ebi2_clk.
- This was originally a part of the qcom nand controller series. Separating
  it out so that it's already there before qcom nand is merged.
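The two new branch clocks share enable register 0x3b00 but use different enable bits (BIT(4) for ebi2_clk, BIT(8) for ebi2_aon_clk), so enabling one must not clobber the other. The sketch below models that layout on a fake register array using read-modify-write updates, the same idea as regmap_update_bits(). fake_regs, struct branch_desc and branch_set() are hypothetical and are not the qcom clk_branch implementation; halt-bit polling is deliberately left out.

```c
#include <stdbool.h>
#include <stdint.h>

#define BIT(n) (1U << (n))

/* Hypothetical 32-bit register file, indexed by byte offset / 4. */
static uint32_t fake_regs[0x4000 / 4];

struct branch_desc {
	const char *name;
	uint32_t enable_reg;
	uint32_t enable_mask;
};

/* Enable register and bits taken from the patch. */
static const struct branch_desc ebi2 = { "ebi2_clk", 0x3b00, BIT(4) };
static const struct branch_desc ebi2_aon = { "ebi2_always_on_clk", 0x3b00, BIT(8) };

/* Read-modify-write: only this branch's enable bit changes. */
static void branch_set(const struct branch_desc *b, bool on)
{
	uint32_t *reg = &fake_regs[b->enable_reg / 4];

	*reg = on ? (*reg | b->enable_mask) : (*reg & ~b->enable_mask);
}

static bool branch_enabled(const struct branch_desc *b)
{
	return fake_regs[b->enable_reg / 4] & b->enable_mask;
}
```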

 drivers/clk/qcom/gcc-ipq806x.c   | 32 
 include/dt-bindings/clock/qcom,gcc-ipq806x.h |  1 +
 2 files changed, 33 insertions(+)

diff --git a/drivers/clk/qcom/gcc-ipq806x.c b/drivers/clk/qcom/gcc-ipq806x.c
index cbdc31d..f7e2d07 100644
--- a/drivers/clk/qcom/gcc-ipq806x.c
+++ b/drivers/clk/qcom/gcc-ipq806x.c
@@ -2170,6 +2170,36 @@ static struct clk_branch usb_fs1_h_clk = {
},
 };
 
+static struct clk_branch ebi2_clk = {
+   .hwcg_reg = 0x3b00,
+   .hwcg_bit = 6,
+   .halt_reg = 0x2fcc,
+   .halt_bit = 1,
+   .clkr = {
+   .enable_reg = 0x3b00,
+   .enable_mask = BIT(4),
+   .hw.init = &(struct clk_init_data){
+   .name = "ebi2_clk",
+   .ops = &clk_branch_ops,
+   .flags = CLK_IS_ROOT,
+   },
+   },
+};
+
+static struct clk_branch ebi2_aon_clk = {
+   .halt_reg = 0x2fcc,
+   .halt_bit = 0,
+   .clkr = {
+   .enable_reg = 0x3b00,
+   .enable_mask = BIT(8),
+   .hw.init = &(struct clk_init_data){
+   .name = "ebi2_always_on_clk",
+   .ops = &clk_branch_ops,
+   .flags = CLK_IS_ROOT,
+   },
+   },
+};
+
 static struct clk_regmap *gcc_ipq806x_clks[] = {
[PLL0] = &pll0.clkr,
[PLL0_VOTE] = &pll0_vote,
@@ -2273,6 +2303,8 @@ static struct clk_regmap *gcc_ipq806x_clks[] = {
[USB_FS1_XCVR_SRC] = &usb_fs1_xcvr_clk_src.clkr,
[USB_FS1_XCVR_CLK] = &usb_fs1_xcvr_clk.clkr,
[USB_FS1_SYSTEM_CLK] = &usb_fs1_sys_clk.clkr,
+   [EBI2_CLK] = &ebi2_clk.clkr,
+   [EBI2_AON_CLK] = &ebi2_aon_clk.clkr,
 };
 
 static const struct qcom_reset_map gcc_ipq806x_resets[] = {
diff --git a/include/dt-bindings/clock/qcom,gcc-ipq806x.h b/include/dt-bindings/clock/qcom,gcc-ipq806x.h
index 04fb29a..ebd63fd 100644
--- a/include/dt-bindings/clock/qcom,gcc-ipq806x.h
+++ b/include/dt-bindings/clock/qcom,gcc-ipq806x.h
@@ -288,5 +288,6 @@
 #define UBI32_CORE2_CLK_SRC	278
 #define UBI32_CORE1_CLK	279
 #define UBI32_CORE2_CLK	280
+#define EBI2_AON_CLK   281
 
 #endif
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
hosted by The Linux Foundation


Re: [PATCH v5 02/10] pci: move pci_msi_init_pci_dev to probe.c

2015-04-01 Thread Eric W. Biederman

Fam Zheng  writes:

> On Sun, 03/29 17:04, Michael S. Tsirkin wrote:
>> commit d5dea7d95c48d7bc951cee4910a7fd9c0cd26fb0
>> "PCI: msi: Disable msi interrupts when we initialize a pci device"
>> fixes kexec when the booting kernel does not enable msi interrupts.
>> 
>> Unfortunately the relevant functionality is in msi.c so it isn't
>> compiled in when CONFIG_PCI_MSI is off, which means such configurations
>> would still get interrupt storms.
>> 
>> Fix by moving part of the functionality to probe.c, and compiling it
>> unconditionally.
>
> Reviewed-by: Fam Zheng 

Acked-by: "Eric W. Biederman" 

I am a little surprised that there are systems that compile out MSI
support, but I do remember the problem being a limitation of the code
when I wrote it, and given how nasty screaming irqs are during boot up
it seems well worth it to have a little extra code to turn them off.
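What pci_msi_setup_pci_dev() does to each capability can be modelled as a pure bit operation on the two message-control words. The enable-bit values below match include/uapi/linux/pci_regs.h, but struct fake_msi_dev and the helpers are hypothetical and no config space is touched. Only the enable bits are cleared, and a device without the capability is left alone, so on a device still in its power-on default state this is a no-op.

```c
#include <stdint.h>

/* Values as defined in include/uapi/linux/pci_regs.h. */
#define PCI_MSI_FLAGS_ENABLE	0x0001	/* MSI enable */
#define PCI_MSIX_FLAGS_ENABLE	0x8000	/* MSI-X enable */

/* Hypothetical device model: a capability offset of 0 means "absent". */
struct fake_msi_dev {
	uint8_t msi_cap, msix_cap;
	uint16_t msi_ctrl, msix_ctrl;
};

/* Same semantics as pci_msix_clear_and_set_ctrl(): clear, then set. */
static uint16_t clear_and_set(uint16_t ctrl, uint16_t clear, uint16_t set)
{
	return (uint16_t)((ctrl & ~clear) | set);
}

/* Disable MSI and MSI-X if present, leaving every other bit untouched. */
static void sketch_msi_setup(struct fake_msi_dev *dev)
{
	if (dev->msi_cap)
		dev->msi_ctrl = clear_and_set(dev->msi_ctrl, PCI_MSI_FLAGS_ENABLE, 0);
	if (dev->msix_cap)
		dev->msix_ctrl = clear_and_set(dev->msix_ctrl, PCI_MSIX_FLAGS_ENABLE, 0);
}
```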

Eric

>> Cc: Eric W. Biederman 
>> Signed-off-by: Michael S. Tsirkin 
>> ---
>>  drivers/pci/msi.c   | 12 
>>  drivers/pci/probe.c | 16 
>>  2 files changed, 16 insertions(+), 12 deletions(-)
>> 
>> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
>> index 9942f68..f66be86 100644
>> --- a/drivers/pci/msi.c
>> +++ b/drivers/pci/msi.c
>> @@ -1041,18 +1041,6 @@ EXPORT_SYMBOL(pci_msi_enabled);
>>  void pci_msi_init_pci_dev(struct pci_dev *dev)
>>  {
>>  INIT_LIST_HEAD(&dev->msi_list);
>> -
>> -/* Disable the msi hardware to avoid screaming interrupts
>> - * during boot.  This is the power on reset default so
>> - * usually this should be a noop.
>> - */
>> -dev->msi_cap = pci_find_capability(dev, PCI_CAP_ID_MSI);
>> -if (dev->msi_cap)
>> -pci_msi_set_enable(dev, 0);
>> -
>> -dev->msix_cap = pci_find_capability(dev, PCI_CAP_ID_MSIX);
>> -if (dev->msix_cap)
>> -pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_ENABLE, 0);
>>  }
>>  
>>  /**
>> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
>> index 8d2f400..50dd934 100644
>> --- a/drivers/pci/probe.c
>> +++ b/drivers/pci/probe.c
>> @@ -1483,10 +1483,26 @@ static struct pci_dev *pci_scan_device(struct pci_bus *bus, int devfn)
>>  return dev;
>>  }
>>  
>> +static void pci_msi_setup_pci_dev(struct pci_dev *dev)
>> +{
>> +/* Disable the msi hardware to avoid screaming interrupts
>> + * during boot.  This is the power on reset default so
>> + * usually this should be a noop.
>> + */
>> +dev->msi_cap = pci_find_capability(dev, PCI_CAP_ID_MSI);
>> +if (dev->msi_cap)
>> +pci_msi_set_enable(dev, 0);
>> +
>> +dev->msix_cap = pci_find_capability(dev, PCI_CAP_ID_MSIX);
>> +if (dev->msix_cap)
>> +pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_ENABLE, 0);
>> +}
>> +
>>  static void pci_init_capabilities(struct pci_dev *dev)
>>  {
>>  /* MSI/MSI-X list */
>>  pci_msi_init_pci_dev(dev);
>> +pci_msi_setup_pci_dev(dev);
>>  
>>  /* Buffers for saving PCIe and PCI-X capabilities */
>>  pci_allocate_cap_save_buffers(dev);
>> -- 
>> MST
>> 

[v3 5/5] arm: kvm: add stub implementation for kvm_cpu_reset()

2015-04-01 Thread AKASHI Takahiro

Signed-off-by: AKASHI Takahiro 
---
 arch/arm/include/asm/kvm_asm.h  |    1 +
 arch/arm/include/asm/kvm_host.h |   12 
 arch/arm/include/asm/kvm_mmu.h  |    5 +
 arch/arm/kvm/init.S             |    6 ++
 4 files changed, 24 insertions(+)

diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
index 25410b2..462babf 100644
--- a/arch/arm/include/asm/kvm_asm.h
+++ b/arch/arm/include/asm/kvm_asm.h
@@ -85,6 +85,7 @@ struct kvm_vcpu;
 
 extern char __kvm_hyp_init[];
 extern char __kvm_hyp_init_end[];
+extern char __kvm_hyp_reset[];
 
 extern char __kvm_hyp_exit[];
 extern char __kvm_hyp_exit_end[];
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index ca97764..6d38134 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -220,6 +220,18 @@ static inline void __cpu_init_hyp_mode(phys_addr_t boot_pgd_ptr,
kvm_call_hyp((void*)hyp_stack_ptr, vector_ptr, pgd_ptr);
 }
 
+static inline void __cpu_reset_hyp_mode(phys_addr_t boot_pgd_ptr,
+   phys_addr_t phys_idmap_start,
+   unsigned long stub_vector_ptr,
+   unsigned long reset_func)
+{
+   /*
+* TODO
+* kvm_call_reset(boot_pgd_ptr, phys_idmap_start, stub_vector_ptr,
+*reset_func);
+*/
+}
+
 static inline int kvm_arch_dev_ioctl_check_extension(long ext)
 {
return 0;
diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index bf0fe99..dc9543b 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -66,6 +66,7 @@ void kvm_mmu_free_memory_caches(struct kvm_vcpu *vcpu);
 phys_addr_t kvm_mmu_get_httbr(void);
 phys_addr_t kvm_mmu_get_boot_httbr(void);
 phys_addr_t kvm_get_idmap_vector(void);
+phys_addr_t kvm_get_idmap_start(void);
 int kvm_mmu_init(void);
 void kvm_clear_hyp_idmap(void);
 
@@ -270,6 +271,10 @@ static inline void __kvm_flush_dcache_pud(pud_t pud)
 void kvm_set_way_flush(struct kvm_vcpu *vcpu);
 void kvm_toggle_cache(struct kvm_vcpu *vcpu, bool was_enabled);
 
+extern char __hyp_idmap_text_start[];
+#define kvm_virt_to_trampoline(x)   \
+   (TRAMPOLINE_VA + ((x) - __hyp_idmap_text_start))
+
 #endif /* !__ASSEMBLY__ */
 
 #endif /* __ARM_KVM_MMU_H__ */
diff --git a/arch/arm/kvm/init.S b/arch/arm/kvm/init.S
index 3988e72..9178c9a 100644
--- a/arch/arm/kvm/init.S
+++ b/arch/arm/kvm/init.S
@@ -156,4 +156,10 @@ target:@ We're now in the trampoline code, switch page tables
.globl __kvm_hyp_init_end
 __kvm_hyp_init_end:
 
+   .globl __kvm_hyp_reset
+__kvm_hyp_reset:
+   @ TODO
+
+   eret
+
.popsection
-- 
1.7.9.5


[v3 3/5] arm64: kvm: add cpu reset hook for cpu hotplug

2015-04-01 Thread AKASHI Takahiro

This patch doesn't enable cpu hotplug under kvm, but is a prerequisite
for when the feature is implemented.
Once kvm_arch_hardware_enable/disable() is properly implemented, the
arm64-specific cpu notifier hook, hyp_init_cpu_notify(), can be
removed and replaced by the generic kvm_cpu_hotplug().

Signed-off-by: AKASHI Takahiro 
---
 arch/arm/include/asm/kvm_host.h   |    1 -
 arch/arm/kvm/arm.c                |    9 +
 arch/arm64/include/asm/kvm_host.h |    1 -
 3 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 41008cd..ca97764 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -237,7 +237,6 @@ void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
 
 struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
 
-static inline void kvm_arch_hardware_disable(void) {}
 static inline void kvm_arch_hardware_unsetup(void) {}
 static inline void kvm_arch_sync_events(struct kvm *kvm) {}
 static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {}
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index f64713e..4892974 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -922,6 +922,10 @@ static int hyp_init_cpu_notify(struct notifier_block *self,
if (__hyp_get_vectors() == hyp_default_vectors)
cpu_init_hyp_mode(NULL);
break;
+   case CPU_DYING:
+   case CPU_DYING_FROZEN:
+   kvm_cpu_reset(NULL);
+   break;
}
 
return NOTIFY_OK;
@@ -1165,6 +1169,11 @@ out_err:
return err;
 }
 
+void kvm_arch_hardware_disable(void)
+{
+   kvm_cpu_reset(NULL);
+}
+
 /* NOP: Compiling as a module not supported */
 void kvm_arch_exit(void)
 {
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 6a8da9c..831e6a4 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -262,7 +262,6 @@ static inline void vgic_arch_setup(const struct vgic_params *vgic)
}
 }
 
-static inline void kvm_arch_hardware_disable(void) {}
 static inline void kvm_arch_hardware_unsetup(void) {}
 static inline void kvm_arch_sync_events(struct kvm *kvm) {}
 static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {}
-- 
1.7.9.5


[v3 2/5] arm64: kvm: allow EL2 context to be reset on shutdown

2015-04-01 Thread AKASHI Takahiro

The current kvm implementation keeps the EL2 vector table installed even
when the system is shut down. This prevents kexec from putting a system
running kvm back into EL2 when starting a new kernel.

This patch resolves the issue by calling a cpu tear-down function via a
reboot notifier, kvm_reboot_notify(), which is invoked by
kernel_restart_prepare() in kernel_kexec().
While kvm has a generic hook, kvm_reboot(), we can't use it here because,
in the current implementation, a cpu tear-down function will not be invoked
if no guest vm has been created by kvm_create_vm().
Please note that kvm_usage_count is zero in this case.

In the future, we should implement cpu hotplug support and move the
arch-specific initialization into kvm_arch_hardware_enable/disable().
This way, we would be able to revert this patch.
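The patch depends on the reboot notifier chain calling kvm_reboot_notify() whether or not any VM exists. A userspace sketch of how such a chain behaves: callbacks are kept sorted by descending priority (equal priorities keep registration order) and are called in turn until one returns a value with NOTIFY_STOP_MASK set. The NOTIFY_* values match include/linux/notifier.h; struct sketch_nb, sketch_register() and sketch_call_chain() are hypothetical, lock-free simplifications of notifier_chain_register() and notifier_call_chain().

```c
#include <stddef.h>

/* Values as defined in include/linux/notifier.h. */
#define NOTIFY_DONE		0x0000
#define NOTIFY_OK		0x0001
#define NOTIFY_STOP_MASK	0x8000

struct sketch_nb {
	int (*notifier_call)(struct sketch_nb *nb, unsigned long val, void *v);
	struct sketch_nb *next;
	int priority;
};

/* Insert n before the first entry with a lower priority. */
static void sketch_register(struct sketch_nb **head, struct sketch_nb *n)
{
	while (*head && n->priority <= (*head)->priority)
		head = &(*head)->next;
	n->next = *head;
	*head = n;
}

/* Call each callback in order; stop early if one sets NOTIFY_STOP_MASK. */
static int sketch_call_chain(struct sketch_nb *head, unsigned long val, void *v)
{
	int ret = NOTIFY_DONE;

	for (; head; head = head->next) {
		ret = head->notifier_call(head, val, v);
		if (ret & NOTIFY_STOP_MASK)
			break;
	}
	return ret;
}
```

With kvm_reboot_nb at priority 0 (marked FIXME in the patch), any notifier registered with a higher priority runs before the EL2 reset, and one that stops the chain would prevent the reset entirely.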

Signed-off-by: AKASHI Takahiro 
---
 arch/arm/kvm/arm.c     |   21 +
 arch/arm64/kvm/Kconfig |    1 -
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 39df694..f64713e 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -25,6 +25,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -1100,6 +1101,23 @@ struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr)
return NULL;
 }
 
+static int kvm_reboot_notify(struct notifier_block *nb,
+unsigned long val, void *v)
+{
+   /*
+* Reset each CPU in EL2 to initial state.
+*/
+   on_each_cpu(kvm_cpu_reset, NULL, 1);
+
+   return NOTIFY_DONE;
+}
+
+static struct notifier_block kvm_reboot_nb = {
+   .notifier_call  = kvm_reboot_notify,
+   .next   = NULL,
+   .priority   = 0, /* FIXME */
+};
+
 /**
  * Initialize Hyp-mode and memory mappings on all CPUs.
  */
@@ -1138,6 +1156,9 @@ int kvm_arch_init(void *opaque)
hyp_cpu_pm_init();
 
kvm_coproc_table_init();
+
+   register_reboot_notifier(&kvm_reboot_nb);
+
return 0;
 out_err:
cpu_notifier_register_done();
diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index 30ae7a7..f5590c8 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -18,7 +18,6 @@ if VIRTUALIZATION
 
 config KVM
bool "Kernel-based Virtual Machine (KVM) support"
-   depends on !KEXEC
select MMU_NOTIFIER
select PREEMPT_NOTIFIERS
select ANON_INODES
-- 
1.7.9.5


[v3 4/5] arm64: kvm: add cpu reset at module exit

2015-04-01 Thread AKASHI Takahiro

This patch doesn't enable kvm to be built as a module, but is
a prerequisite for making kvm module-capable.

Signed-off-by: AKASHI Takahiro 
---
 arch/arm/kvm/arm.c |    6 ++
 1 file changed, 6 insertions(+)

diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 4892974..85c142b 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -1178,6 +1178,12 @@ void kvm_arch_hardware_disable(void)
 void kvm_arch_exit(void)
 {
kvm_perf_teardown();
+
+   unregister_reboot_notifier(&kvm_reboot_nb);
+   /*
+* Reset each CPU in EL2 to initial state.
+*/
+   on_each_cpu(kvm_cpu_reset, NULL, 1);
 }
 
 static int arm_init(void)
-- 
1.7.9.5


[v3 1/5] arm64: kvm: add a cpu tear-down function

2015-04-01 Thread AKASHI Takahiro

A CPU must be put back into its initial state, at least in the
following cases, in order to shut down the system and/or re-initialize CPUs
later on:
1) kexec/kdump
2) cpu hotplug (offline)
3) removing kvm as a module

To address those issues in later patches, this patch adds a tear-down
function, kvm_cpu_reset(), that disables D-cache & MMU and restores a vector
table to the initial stub at EL2.
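kvm_cpu_reset() returns early when __hyp_get_vectors() already equals hyp_default_vectors, which makes it safe to call from several paths (reboot notifier, CPU_DYING, module exit) and more than once. A userspace sketch of that guard: the per-CPU fake_vbar_el2[] array standing in for the installed EL2 vectors, the explicit cpu argument, the vector values and the resets_done counter standing in for __cpu_reset_hyp_mode() are all hypothetical.

```c
#define NR_FAKE_CPUS 4

/* Hypothetical stand-ins: stub vectors installed at boot, and the
 * vectors KVM installs when it initialises hyp mode. */
static const unsigned long hyp_default_vectors = 0x1000;
static const unsigned long kvm_vectors = 0x2000;

static unsigned long fake_vbar_el2[NR_FAKE_CPUS];
static int resets_done;

static void sketch_kvm_cpu_reset(int cpu)
{
	/* Nothing to tear down if hyp mode was never initialised here. */
	if (fake_vbar_el2[cpu] == hyp_default_vectors)
		return;

	/* Stand-in for __cpu_reset_hyp_mode(): restore the stub vectors. */
	fake_vbar_el2[cpu] = hyp_default_vectors;
	resets_done++;
}
```

Running the reset twice over all CPUs, with one CPU never initialised, performs exactly one real reset per initialised CPU.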

Signed-off-by: AKASHI Takahiro 
---
 arch/arm/kvm/arm.c                |   15 +++
 arch/arm/kvm/mmu.c                |    5 +
 arch/arm64/include/asm/kvm_asm.h  |    1 +
 arch/arm64/include/asm/kvm_host.h |   11 +++
 arch/arm64/include/asm/kvm_mmu.h  |    7 +++
 arch/arm64/include/asm/virt.h     |   11 +++
 arch/arm64/kvm/hyp-init.S         |   35 +++
 arch/arm64/kvm/hyp.S              |   16 +---
 8 files changed, 98 insertions(+), 3 deletions(-)

diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 5560f74..39df694 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -897,6 +897,21 @@ static void cpu_init_hyp_mode(void *dummy)
__cpu_init_hyp_mode(boot_pgd_ptr, pgd_ptr, hyp_stack_ptr, vector_ptr);
 }
 
+static void kvm_cpu_reset(void *dummy)
+{
+   phys_addr_t boot_pgd_ptr;
+   phys_addr_t phys_idmap_start;
+
+   if (__hyp_get_vectors() == hyp_default_vectors)
+   return;
+
+   boot_pgd_ptr = kvm_mmu_get_boot_httbr();
+   phys_idmap_start = kvm_get_idmap_start();
+   __cpu_reset_hyp_mode(boot_pgd_ptr, phys_idmap_start,
+hyp_default_vectors,
+kvm_virt_to_trampoline(__kvm_hyp_reset));
+}
+
 static int hyp_init_cpu_notify(struct notifier_block *self,
   unsigned long action, void *cpu)
 {
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 3e6859b..3631a37 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -1490,6 +1490,11 @@ phys_addr_t kvm_get_idmap_vector(void)
return hyp_idmap_vector;
 }
 
+phys_addr_t kvm_get_idmap_start(void)
+{
+   return hyp_idmap_start;
+}
+
 int kvm_mmu_init(void)
 {
int err;
diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 4f7310f..f1c16e2 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -118,6 +118,7 @@ struct kvm_vcpu;
 
 extern char __kvm_hyp_init[];
 extern char __kvm_hyp_init_end[];
+extern char __kvm_hyp_reset[];
 
 extern char __kvm_hyp_vector[];
 
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 8ac3c70..6a8da9c 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -199,6 +199,8 @@ struct kvm_vcpu *kvm_arm_get_running_vcpu(void);
 struct kvm_vcpu * __percpu *kvm_get_running_vcpus(void);
 
 u64 kvm_call_hyp(void *hypfn, ...);
+void kvm_call_reset(phys_addr_t boot_pgd_ptr, phys_addr_t phys_idmap_start,
+   unsigned long stub_vector_ptr, unsigned long reset_func);
 void force_vm_exit(const cpumask_t *mask);
 void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
 
@@ -223,6 +225,15 @@ static inline void __cpu_init_hyp_mode(phys_addr_t boot_pgd_ptr,
 hyp_stack_ptr, vector_ptr);
 }
 
+static inline void __cpu_reset_hyp_mode(phys_addr_t boot_pgd_ptr,
+   phys_addr_t phys_idmap_start,
+   unsigned long stub_vector_ptr,
+   unsigned long reset_func)
+{
+   kvm_call_reset(boot_pgd_ptr, phys_idmap_start, stub_vector_ptr,
+  reset_func);
+}
+
 struct vgic_sr_vectors {
void*save_vgic;
void*restore_vgic;
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 6458b53..facfd6d 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -96,6 +96,7 @@ void kvm_mmu_free_memory_caches(struct kvm_vcpu *vcpu);
 phys_addr_t kvm_mmu_get_httbr(void);
 phys_addr_t kvm_mmu_get_boot_httbr(void);
 phys_addr_t kvm_get_idmap_vector(void);
+phys_addr_t kvm_get_idmap_start(void);
 int kvm_mmu_init(void);
 void kvm_clear_hyp_idmap(void);
 
@@ -305,5 +306,11 @@ static inline void __kvm_flush_dcache_pud(pud_t pud)
 void kvm_set_way_flush(struct kvm_vcpu *vcpu);
 void kvm_toggle_cache(struct kvm_vcpu *vcpu, bool was_enabled);
 
+extern char __hyp_idmap_text_start[];
+#define kvm_virt_to_trampoline(x)  \
+   (TRAMPOLINE_VA  \
+   + ((unsigned long)(x)   \
+ - ((unsigned long)__hyp_idmap_text_start & PAGE_MASK)))
+
 #endif /* __ASSEMBLY__ */
 #endif /* __ARM64_KVM_MMU_H__ */
diff --git a/arch/arm64/include/asm/virt.h b/arch/arm64/include/asm/virt.h
index 3070096..7fcd087 100644
--- a/arch/arm64/include/asm/virt.h
+++ b/arch/arm64/include/asm/virt.h
@@ -61,6 +61,17 @@
 #define BOOT_CPU_MODE_EL1  (0xe11)
 #define

[v3 0/5] arm64: kvm: reset hyp context for kexec

2015-04-01 Thread AKASHI Takahiro

This patch set addresses the KVM issue described in Geoff's kexec patch set[1].
(The subject was changed from "arm64: kexec: fix kvm issue in kexec.")
See "Changes" below.

The basic approach here is to define a kvm tear-down function and add
a reboot hook to gracefully shut down the 1st kernel. This way, kvm is
freed from kexec-specific cleanup, and yet we allow future enhancements,
like cpu hotplug & building kvm as a module, based on the tear-down function.
In this sense, patches #1 & #2 (and #5) actually fix the problem, and #3 & #4
are rather informative.

I confirmed that the 1st kernel successfully shut down and the 2nd kernel started
with the following messages:

kvm [1]: Using HYP init bounce page @8fa52f000
kvm [1]: interrupt-controller@2c02f000 IRQ6
kvm [1]: timer IRQ3
kvm [1]: Hyp mode initialized successfully

test target: Base fast model
version: kernel v4.0-rc4 + Geoff's kexec v8 + Arn's patch[2]


I still have some concerns about the following points. Please let me know
if you have any comments:

1) Call kvm_cpu_reset() on non-boot cpus in reboot notifier
   We don't have to do so in the kexec-specific case, but the current code runs
   the function on each cpu for safety since we use a general reboot hook.
2) Flush D$ in kvm_cpu_reset()
   This is currently not done because all the cpus are just going to shut down,
   and we actually flush D$ on the boot cpu in Geoff's cpu_reset().
3) Compatibility with arm implementation
   Frediano[2] is no longer working on this issue on arm as he left his
   company. But my approach here is based on a generic interface and can be
   applied to arm in a similar way.

Changes from v2:
* modified kvm_virt_to_trampoline() macro to fix a page-alignment issue[4]
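The page-alignment fix comes down to which base the offset is measured from. Assuming the trampoline mapping at TRAMPOLINE_VA starts at the page containing __hyp_idmap_text_start, an address inside the idmap text has to be translated relative to that page, not to the symbol itself. The arithmetic of the arm64 macro, with hypothetical SKETCH_* values standing in for the page size and TRAMPOLINE_VA:

```c
/* Hypothetical values for illustration only. */
#define SKETCH_PAGE_SIZE	4096UL
#define SKETCH_PAGE_MASK	(~(SKETCH_PAGE_SIZE - 1))
#define SKETCH_TRAMPOLINE_VA	0x10000000UL

/* arm64 form: offset from the page that holds the idmap text. */
static unsigned long to_trampoline(unsigned long x, unsigned long idmap_text_start)
{
	return SKETCH_TRAMPOLINE_VA + (x - (idmap_text_start & SKETCH_PAGE_MASK));
}

/* Unmasked form (as in the arm stub in patch 5/5): only matches the
 * masked form when idmap_text_start happens to be page aligned. */
static unsigned long to_trampoline_unmasked(unsigned long x, unsigned long idmap_text_start)
{
	return SKETCH_TRAMPOLINE_VA + (x - idmap_text_start);
}
```

For an idmap start of 0x80081234 and a target of 0x80081300, the masked form yields TRAMPOLINE_VA + 0x300 while the unmasked form yields TRAMPOLINE_VA + 0xcc, which points at the wrong instruction inside the trampoline page.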

Changes from v1:
* modified kvm_cpu_reset() implementation:
- define a macro to translate va to addr in trampoline
- use __hyp_default_vectors instead of kvm_get_hyp_stub_vectors()
- shuffle the arguments in __cpu_reset_hyp_mode()
- optimize TLB flush operations
* changed a patch#2's name
* added a patch#5 to add stub code for arm

[1] http://lists.infradead.org/pipermail/kexec/2015-March/013432.html
[2] http://lists.infradead.org/pipermail/linux-arm-kernel/2015-March/334002.html
[3] http://lists.infradead.org/pipermail/linux-arm-kernel/2015-February/322231.html
[4] http://lists.infradead.org/pipermail/linux-arm-kernel/2015-March/334910.html


AKASHI Takahiro (5):
  arm64: kvm: add a cpu tear-down function
  arm64: kvm: allow EL2 context to be reset on shutdown
  arm64: kvm: add cpu reset hook for cpu hotplug
  arm64: kvm: add cpu reset at module exit
  arm: kvm: add stub implementation for kvm_cpu_reset()

 arch/arm/include/asm/kvm_asm.h    |    1 +
 arch/arm/include/asm/kvm_host.h   |   13 +-
 arch/arm/include/asm/kvm_mmu.h    |    5 
 arch/arm/kvm/arm.c                |   51 +
 arch/arm/kvm/init.S               |    6 +
 arch/arm/kvm/mmu.c                |    5 
 arch/arm64/include/asm/kvm_asm.h  |    1 +
 arch/arm64/include/asm/kvm_host.h |   12 -
 arch/arm64/include/asm/kvm_mmu.h  |    7 +
 arch/arm64/include/asm/virt.h     |   11 
 arch/arm64/kvm/Kconfig            |    1 -
 arch/arm64/kvm/hyp-init.S         |   35 +
 arch/arm64/kvm/hyp.S              |   16 +---
 13 files changed, 158 insertions(+), 6 deletions(-)

-- 
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Concerns about "mpt2sas: Added Reply Descriptor Post Queue (RDPQ) Array support"

2015-04-01 Thread Benjamin Herrenschmidt

On Thu, 2015-02-19 at 21:45 -0800, James Bottomley wrote:

> Ben, this is legal by design.  It was specifically designed for the
> aic79xx SCSI card, but can be used for a variety of other reasons.  The
> aic79xx hardware problem was that the DMA engine could address the whole
> of memory (it had two address modes, a 39 bit one and a 64 bit one) but
> the script engine that runs the mailboxes only had a 32 bit activation
> register (the activating write points at the physical address of the
> script to begin executing).  This meant that the scripts that run in
> memory had to be in the first 4GB of physical memory, hence the split
> mask.  The DMA mask specifies that the card can transfer from anywhere
> in physical memory, but the consistent_dma_mask says that the consistent
> allocation used to get scripts memory must come from the lower 4GB.
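To make the split-mask idea concrete, here is a minimal C sketch, assuming a simplified reachability test of the form "the last byte of the buffer must be <= mask". The DMA_BIT_MASK() macro mirrors the kernel's definition; struct split_mask_dev and dma_range_ok() are hypothetical names used only for illustration, not kernel API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same definition as the kernel's DMA_BIT_MASK(). */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Hypothetical model of a device with split masks, as on the aic79xx:
 * the DMA engine reaches all of memory, script memory must sit below 4GB. */
struct split_mask_dev {
	uint64_t dma_mask;          /* streaming mappings */
	uint64_t coherent_dma_mask; /* consistent (coherent) allocations */
};

/* Hypothetical helper: true if every byte of [addr, addr + size) is
 * addressable under 'mask'. */
static bool dma_range_ok(uint64_t mask, uint64_t addr, uint64_t size)
{
	uint64_t last;

	if (size == 0)
		return true;
	last = addr + size - 1;
	if (last < addr)  /* range wraps around the address space */
		return false;
	return last <= mask;
}
```

With a 64-bit streaming mask and a 32-bit coherent mask, a buffer at 5 GiB passes the first check but fails the second, which is the situation the aic79xx split mask describes.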

So looking at that again...

This is interesting... basically any driver using a different mask has
been broken on powerpc for basically ever. The whole concept was poorly
designed; for example, set_consistent_mask isn't a dma_map_ops operation,
unlike everything else.

In some cases, what we want is to convey base+offset information to
drivers, but we can't do that.

This stuff cannot work with setups like a lot of our iommus where we
have a remapped region at the bottom of the DMA address space and a
bypass (direct map) region high up.

Basically, we can fix it, at least for most platforms, but it will be
hard, invasive, *and* will need to go to stable. Grmbl.

We'll have to replace our "direct" DMA ops (which just apply an offset),
which we use for devices that set a 64-bit mask on platforms that support
a bypass window, with some smart-ass hybrid variant that selectively
shoots stuff up to the bypass window or down via the iommu remapped region, based
on the applicable mask for a given operation.
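A rough C sketch of that per-operation decision, assuming a platform with a single bypass window that maps all RAM at a fixed DMA offset. struct hybrid_dma_plat, pick_dma_path() and bypass_dma_addr() are hypothetical illustrations, not existing powerpc symbols, and the actual IOMMU remapping path is left out entirely.

```c
#include <stdint.h>

/* Hypothetical platform description: RAM is mapped 1:1 into a bypass
 * window starting at DMA address 'bypass_offset'; anything that cannot
 * reach that window must go through the IOMMU-remapped region. */
struct hybrid_dma_plat {
	uint64_t bypass_offset; /* DMA address of physical address 0 */
	uint64_t ram_end;       /* one past the highest physical address */
};

enum dma_path { DMA_PATH_BYPASS, DMA_PATH_IOMMU };

/* Decide per operation, using whichever mask applies to it
 * (dma_mask for streaming maps, coherent_dma_mask for consistent allocs). */
static enum dma_path pick_dma_path(const struct hybrid_dma_plat *p,
				   uint64_t mask)
{
	uint64_t bypass_top = p->bypass_offset + p->ram_end - 1;

	return bypass_top <= mask ? DMA_PATH_BYPASS : DMA_PATH_IOMMU;
}

/* In the bypass window the DMA address is just the physical address
 * plus the window offset. */
static uint64_t bypass_dma_addr(const struct hybrid_dma_plat *p,
				uint64_t phys)
{
	return p->bypass_offset + phys;
}
```

With a hypothetical window at 1 << 59, the same device could take the bypass path for streaming maps under a 64-bit dma_mask while its consistent allocations under a 32-bit coherent_dma_mask go through the IOMMU; a 40-bit device would also fall back to the IOMMU.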

It would be nice if we could come up with a way to inform the driver
that we support that sort of "bypass" region with an offset. That would
allow drivers that have that 64-bit base + 32-bit offset scheme to work
much more efficiently for us. The driver could configure the base to be
our "bypass window offset", and we could use ZONE_DMA32 for consistent
DMAs.

It would also help us with things like some GPUs that can only do 40-bit
DMA (which won't allow them to reach our bypass region normally) but do
have a way to configure the generated top bits of all DMA addresses in
some fixed register.

Any idea what such an interface might look like ?

Cheers,
Ben.


Re: [PATCH 0/5] arm64: add kdump support

2015-04-01 Thread AKASHI Takahiro


Pratyush,

On 04/02/2015 01:58 PM, Pratyush Anand wrote:
>
> On Thursday 02 April 2015 04:57 AM, AKASHI Takahiro wrote:
>>
>> Please try my latest kexec-tools in my linaro repo (branch name is
>> kdump/v0.11) and let me know the result.
>
> Thanks a lot. Just fetched your repo and found v0.11.
>
> With this, the crash kernel loads successfully if I do not use an initrd.
>
> With the following, I still see "Overlapping memory segments":
>
> kexec -p  /home/panand/work/kernel/bsa2_kdump/vmlinux
> --initrd=/boot/initramfs-3.19.0.bz1198945+.img --append="$( cat
> /proc/cmdline ) maxcpus=1 mem=64M reset_devices"


How big is your initrd?
If it is reasonably small, please send me the segment info, or the messages from
add_segment_phys_virt()
for all the segments.
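For reference when comparing the segment info, a minimal sketch of the kind of overlap test involved, assuming each segment is described by a physical base and size. struct seg and find_overlap() are hypothetical and are not kexec-tools' own code.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical description of one segment's physical range. */
struct seg {
	uint64_t base;
	uint64_t size;
};

static int seg_cmp(const void *a, const void *b)
{
	const struct seg *x = a, *y = b;

	return (x->base > y->base) - (x->base < y->base);
}

/* Sort by base, then report the first segment that starts before its
 * predecessor ends. Returns that index (in sorted order), or -1. */
static int find_overlap(struct seg *segs, size_t n)
{
	qsort(segs, n, sizeof(*segs), seg_cmp);
	for (size_t i = 1; i < n; i++)
		if (segs[i].base < segs[i - 1].base + segs[i - 1].size)
			return (int)i;
	return -1;
}
```

Checking only neighbours after sorting is enough: if any two segments overlap, some pair of neighbours in sorted order overlaps as well.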

FYI,
my latest kexec-tools, i.e. v0.11, automatically appends "mem=".

Thanks,
-Takahiro AKASHI


~Pratyush


Re: [PATCH v2 1/3] toshiba_acpi: Update and fix USB Sleep and Charge modes

2015-04-01 Thread Darren Hart

On Sun, Mar 29, 2015 at 07:25:39PM -0600, Azael Avalos wrote:
> This patch fixes the USB Sleep and Charge mode on certain models
> where the value returned by the BIOS is different, and thus this
> feature does not work on those models.
> 
> Also, the "Typical" charging mode was added as a supported mode.
> 
> Signed-off-by: Azael Avalos 
> ---
>  drivers/platform/x86/toshiba_acpi.c | 69 
> -
>  1 file changed, 60 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/platform/x86/toshiba_acpi.c b/drivers/platform/x86/toshiba_acpi.c
> index 17a259e..c8ad61c 100644
> --- a/drivers/platform/x86/toshiba_acpi.c
> +++ b/drivers/platform/x86/toshiba_acpi.c
> @@ -150,9 +150,10 @@ MODULE_LICENSE("GPL");
>  #define SCI_KBD_MODE_OFF 0x10
>  #define SCI_KBD_TIME_MAX 0x3c001a
>  #define SCI_USB_CHARGE_MODE_MASK 0xff
> -#define SCI_USB_CHARGE_DISABLED  0x3
> -#define SCI_USB_CHARGE_ALTERNATE 0x30009
> -#define SCI_USB_CHARGE_AUTO  0x30021
> +#define SCI_USB_CHARGE_DISABLED  0x00
> +#define SCI_USB_CHARGE_ALTERNATE 0x09
> +#define SCI_USB_CHARGE_TYPICAL   0x11
> +#define SCI_USB_CHARGE_AUTO  0x21
>  #define SCI_USB_CHARGE_BAT_MASK  0x7
>  #define SCI_USB_CHARGE_BAT_LVL_OFF   0x1
>  #define SCI_USB_CHARGE_BAT_LVL_ON0x4
> @@ -177,6 +178,7 @@ struct toshiba_acpi_dev {
>   int kbd_mode;
>   int kbd_time;
>   int usbsc_bat_level;
> + int usbsc_mode_base;
>   int hotkey_event_type;
>  
>   unsigned int illumination_supported:1;
> @@ -800,6 +802,52 @@ static int toshiba_accelerometer_get(struct toshiba_acpi_dev *dev,
>  }
>  
>  /* Sleep (Charge and Music) utilities support */
> +static void toshiba_usb_sleep_charge_available(struct toshiba_acpi_dev *dev)
> +{
> + u32 in[TCI_WORDS] = { SCI_GET, SCI_USB_SLEEP_CHARGE, 0, 0, 0, 0 };
> + u32 out[TCI_WORDS];
> + acpi_status status;
> +
> + /* Set the feature to "not supported" in case of error */
> + dev->usb_sleep_charge_supported = 0;
> +
> + if (!sci_open(dev))
> + return;
> +
> + status = tci_raw(dev, in, out);
> + if (ACPI_FAILURE(status) || out[0] == TOS_FAILURE) {
> + pr_err("ACPI call to get USB Sleep and Charge mode failed\n");
> + sci_close(dev);
> + return;
> + } else if (out[0] == TOS_NOT_SUPPORTED) {
> + pr_info("USB Sleep and Charge not supported\n");
> + sci_close(dev);
> + return;
> + }

Sorry, Azael, for not asking this the first time, and maybe this is just how it
is, but it occurs to me that after the above tci_raw call we check for three
error cases but never test for success. Can we not check for out[0] ==
TOS_SUCCESS or similar? The above logic seems like the kind that leads to failure going
unnoticed as success is assumed and not confirmed.
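A minimal sketch of the success-first check being suggested, assuming the reply code lives in out[0]. The TOS_* values, enum usbsc_probe and classify_sci_reply() below are stand-ins chosen for illustration; the driver's own definitions should be used instead, and if the firmware has more than one success code each would need its own case.

```c
#include <stdint.h>

/* Stand-in reply codes for illustration only; the real values are the
 * driver's TOS_* definitions. */
#define TOS_SUCCESS       0x0000
#define TOS_FAILURE       0x1000
#define TOS_NOT_SUPPORTED 0x8000

enum usbsc_probe { USBSC_OK, USBSC_UNSUPPORTED, USBSC_ERROR };

/* Hypothetical helper: classify one SCI reply. 'acpi_failed' stands in
 * for ACPI_FAILURE(status) and 'out0' for out[0]. Only an explicit
 * TOS_SUCCESS counts as success; any unexpected code is an error. */
static enum usbsc_probe classify_sci_reply(int acpi_failed, uint32_t out0)
{
	if (acpi_failed)
		return USBSC_ERROR;

	switch (out0) {
	case TOS_SUCCESS:
		return USBSC_OK;
	case TOS_NOT_SUPPORTED:
		return USBSC_UNSUPPORTED;
	case TOS_FAILURE:
	default:
		return USBSC_ERROR;
	}
}
```

With this shape, the caller would only store out[4] into usbsc_mode_base on USBSC_OK, so a reply code nobody anticipated can no longer fall through as success.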

> + dev->usbsc_mode_base = out[4];
> +
> + in[5] = SCI_USB_CHARGE_BAT_LVL;
> + status = tci_raw(dev, in, out);
> + if (ACPI_FAILURE(status) || out[0] == TOS_FAILURE) {
> + pr_err("ACPI call to get USB Sleep and Charge mode failed\n");
> + sci_close(dev);
> + return;
> + } else if (out[0] == TOS_NOT_SUPPORTED) {
> + pr_info("USB Sleep and Charge not supported\n");
> + sci_close(dev);
> + return;
> + }

Here too.

-- 
Darren Hart
Intel Open Source Technology Center

Re: [PATCH] perf: unwind: fix segbase for libunwind.

2015-04-01 Thread Wang Nan

On 2015/4/2 13:07, Wang Nan wrote:
> On 2015/4/1 22:48, Ingo Molnar wrote:
>>
>> * Wang Nan  wrote:
>>
>>> Perf passes incorrect segbase and table_data to libunwind when
>>> map->pgoff != 0, causing unwind failures. This patch fixes this
>>> problem.
>>>
>>> segbase is an absolute offset from the head of the object file, read
>>> directly from the ELF file. The original code computes the corresponding
>>> virtual address as map->start + segbase without considering map->pgoff,
>>> which causes libunwind to read from an incorrect offset.
>>
>> What's the effect of this bug in practice?
>>
>> Is there any before/after output you can show that demonstrates the 
>> fix?
>>
> 
> I found the problem when testing my '--map-adjustment' argument. I tried to
> construct a test case for the normal case, but it seems to trigger further bugs.
> 
> The following are the steps to reproduce. The first two
> 
> Step 1: create a C file like this (test-load.c):
> 
> test-load.c---
> #define DATA_SZ 65535
> int data[DATA_SZ] = {1,1,2,3,5,8,13,};
> 
> int fib(int x)
> {
> if (x >= DATA_SZ)
> return -1;
> if ((x == 0) || (x == 1))
> return 1;
> data[x] = fib(x - 1) + fib(x - 2);
> return data[x];
> }
> --
> 
> Step 2: create a shared object with the following ld script (test-load.lds), using:
> 
> $ gcc -shared -fPIC ./test-load.c -Wl,-T./test-load.lds -O0 -o libtest-load.so
> 
> --test-load.lds---
> SECTIONS
> {
> .note.gnu.build-id : { *(.node.*) } :text
> .gnu.hash : { *(.gnu.hash) } :text
> .dynsym : { *(.dynsym) } :text
> .dynstr : { *(.dynstr) } :text
> .gnu.version : { *(.gnu.version) } :text
> .gnu.version_r : { *(.gnu.version_r) } :text
> .rela.dyn : { *(.rela.dyn) } :text
> .rela.plt : { *(.rela.plt) } :text
> .init : { *(.init) } :text
> .plt : { *(.plt) } :text
> .text : { *(.text) } :text
> .fini : { *(.fini) } :text
> .eh_frame_hdr : { *(.eh_frame_hdr) } :text
> .eh_frame : { *(.eh_frame) } :text
> 
> .ctors : { *(.ctors) } :data
> .dtors : { *(.dtors) } :data
> .jcr : { *(.jcr) } :data
> .got : { *(.got) } :data
> .got.plt : { *(.got.plt) } :data
> .data : { *(.data) } :data
> .bss : { *(.bss) } :data
> .dynamic : { *(.dynamic) } :dynamic
> 
> }
> 
> PHDRS
> {
> text PT_LOAD FLAGS(7);
> data PT_LOAD FLAGS(7);
> dynamic PT_DYNAMIC FLAGS(7);
> note PT_NOTE FLAGS(4);
> }
> -
> 
> In my environment, I get a shared object with:
> 
> $ readelf -l ./libtest-load.so
> 
> Elf file type is DYN (Shared object file)
> Entry point 0x390
> There are 4 program headers, starting at offset 64
> 
> Program Headers:
>   Type   Offset VirtAddr   PhysAddr
>  FileSizMemSiz  Flags  Align
>   LOAD   0x0020 0x 0x
>  0x05c4 0x05c4  RWE20
>   LOAD   0x002005c8 0x05c8 0x05c8
>  0x000400a0 0x000400b0  RWE20
>   DYNAMIC0x00240678 0x00040678 0x00040678
>  0x0180 0x0180  RWE8
>   NOTE   0x 0x 0x
>  0x 0x  R  8
> 
>  Section to Segment mapping:
>   Segment Sections...
>00 .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.plt 
> .init .plt .text .fini .eh_frame_hdr .eh_frame .rela.dyn .note.gnu.build-id
>01 .ctors .dtors .jcr .got .got.plt .data .data.rel .bss
>02 .dynamic
>03
> 
> The goal of the first two steps is to create a shared object that will be
> mmap-ed with pgoff != 0 (offset of a PT_LOAD segment != 0). I'm not sure
> which part of the ld script leads to this, so I paste all of them.
> 
> Step 3: use that shared object; compile the following C file using:
> 
> $ gcc ./test.c -O0 -ltest-load -L.
> 
> - test.c -
> #include <stdio.h>
> extern int fib(int x);
> int main()
> {
>   printf("%d\n", fib(30));
>   return 0;
> }
> -
> 
> Step 4: perf record:
> 
> $ perf record -g --call-graph=dwarf   ./a.out
> 
> Step 5: perf report:
> 
> $ perf report --stdio --no-children
> 
> # To display the perf.data header info, please use --header/--header-only options.
> #
> # Samples: 40  of event 'cycles'
> # Event count (approx.): 28432005
> #
> # Overhead  Command  Shared Object Symbol
> #   ...    ..
> #
> 18.40%  a.outlibtest-load.so   [.] 0x002004ee
>   |
>   ---0x7f17127964ee
> 
> 16.85%  a.outlibtest-load.so   [.] 0x002004bf

Re: [PATCH 4/7] wmi: Use bool function return values of true/false not 1/0

2015-04-01 Thread Darren Hart

On Mon, Mar 30, 2015 at 10:43:20AM -0700, Joe Perches wrote:
> Use the normal return values for bool functions
> 
> Signed-off-by: Joe Perches 

Queued, thank you Joe.

-- 
Darren Hart
Intel Open Source Technology Center

Re: [PATCH 59/86] x86/thinkpad_acpi: use uapi/linux/pci_ids.h directly

2015-04-01 Thread Darren Hart

On Sun, Mar 29, 2015 at 03:41:54PM +0200, Michael S. Tsirkin wrote:
> Header moved from linux/pci_ids.h to uapi/linux/pci_ids.h,
> use the new header directly so we can drop
> the wrapper in include/linux/pci_ids.h.
> 
> Signed-off-by: Michael S. Tsirkin 

This isn't in mainline yet, so I presume whoever is pushing the move to next
will pick this up as well and keep it all together.

Acked-by: Darren Hart 

> ---
>  drivers/platform/x86/thinkpad_acpi.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/platform/x86/thinkpad_acpi.c b/drivers/platform/x86/thinkpad_acpi.c
> index 3b8ceee..f1ac982 100644
> --- a/drivers/platform/x86/thinkpad_acpi.c
> +++ b/drivers/platform/x86/thinkpad_acpi.c
> @@ -77,7 +77,7 @@
>  #include 
>  #include 
>  #include 
> -#include <linux/pci_ids.h>
> +#include <uapi/linux/pci_ids.h>
>  #include 
>  #include 
>  #include 
> -- 
> MST
> 
> 

-- 
Darren Hart
Intel Open Source Technology Center

Re: [PATCH] MAINTAINERS: Add me on list of Dell laptop drivers

2015-04-01 Thread Darren Hart

On Sun, Mar 29, 2015 at 03:38:50PM +0200, Pali Rohár wrote:

Please include why. No empty commit messages please.

For example, you've written nearly a third of the dell-wmi.c driver, and all of the
dell-smo8800.c driver.

Dell Laptop is still pending the keyboard backlight patches though I believe
(have I missed them?), so you don't currently have code ownership there. I agree
you should be on the MAINTAINERS list, but it would be good to get your keyboard
backlight changes in first.

Gosh... sure seems like we should have merged that already... am I missing
something? The last thing I see in master, testing, and next is the revert from
Jan 21

Also, Matthew, you're the listed maintainer for these, do you agree to adding
Pali?

> Signed-off-by: Pali Rohár 
> ---
>  MAINTAINERS |7 +++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 88c09ca..72a08ef 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -3058,10 +3058,16 @@ F:drivers/net/fddi/defxx.*
>  
>  DELL LAPTOP DRIVER
>  M:   Matthew Garrett 
> +M:   Pali Rohár 
>  L:   platform-driver-...@vger.kernel.org
>  S:   Maintained
>  F:   drivers/platform/x86/dell-laptop.c
>  
> +DELL LAPTOP FREEFALL DRIVER
> +M:   Pali Rohár 
> +S:   Maintained
> +F:   drivers/platform/x86/dell-smo8800.c
> +
>  DELL LAPTOP SMM DRIVER
>  M:   Guenter Roeck 
>  S:   Maintained
> @@ -3076,6 +3082,7 @@ F:  drivers/firmware/dcdbas.*
>  
>  DELL WMI EXTRAS DRIVER
>  M:   Matthew Garrett 
> +M:   Pali Rohár 
>  S:   Maintained
>  F:   drivers/platform/x86/dell-wmi.c
>  
> -- 
> 1.7.9.5
> 
> 

-- 
Darren Hart
Intel Open Source Technology Center

Re: [PATCH] perf: unwind: fix segbase for libunwind.

2015-04-01 Thread Wang Nan

On 2015/4/1 22:48, Ingo Molnar wrote:
> 
> * Wang Nan  wrote:
> 
>> Perf passes incorrect segbase and table_data to libunwind when
>> map->pgoff != 0, causing unwind failures. This patch fixes this
>> problem.
>>
>> segbase is an absolute offset from the head of the object file, read
>> directly from the ELF file. The original code computes the corresponding
>> virtual address as map->start + segbase without considering map->pgoff,
>> which causes libunwind to read from an incorrect offset.
> 
> What's the effect of this bug in practice?
> 
> Is there any before/after output you can show that demonstrates the 
> fix?
> 

I found the problem when testing my '--map-adjustment' argument. I tried to
construct a test case for the normal case, but it seems to trigger further bugs.

The following are the steps to reproduce. The first two

Step 1: create a C file like this (test-load.c):

test-load.c---
#define DATA_SZ 65535
int data[DATA_SZ] = {1,1,2,3,5,8,13,};

int fib(int x)
{
if (x >= DATA_SZ)
return -1;
if ((x == 0) || (x == 1))
return 1;
data[x] = fib(x - 1) + fib(x - 2);
return data[x];
}
--

Step 2: create a shared object with the following ld script (test-load.lds), using:

$ gcc -shared -fPIC ./test-load.c -Wl,-T./test-load.lds -O0 -o libtest-load.so

--test-load.lds---
SECTIONS
{
.note.gnu.build-id : { *(.node.*) } :text
.gnu.hash : { *(.gnu.hash) } :text
.dynsym : { *(.dynsym) } :text
.dynstr : { *(.dynstr) } :text
.gnu.version : { *(.gnu.version) } :text
.gnu.version_r : { *(.gnu.version_r) } :text
.rela.dyn : { *(.rela.dyn) } :text
.rela.plt : { *(.rela.plt) } :text
.init : { *(.init) } :text
.plt : { *(.plt) } :text
.text : { *(.text) } :text
.fini : { *(.fini) } :text
.eh_frame_hdr : { *(.eh_frame_hdr) } :text
.eh_frame : { *(.eh_frame) } :text

.ctors : { *(.ctors) } :data
.dtors : { *(.dtors) } :data
.jcr : { *(.jcr) } :data
.got : { *(.got) } :data
.got.plt : { *(.got.plt) } :data
.data : { *(.data) } :data
.bss : { *(.bss) } :data
.dynamic : { *(.dynamic) } :dynamic

}

PHDRS
{
text PT_LOAD FLAGS(7);
data PT_LOAD FLAGS(7);
dynamic PT_DYNAMIC FLAGS(7);
note PT_NOTE FLAGS(4);
}
-

In my environment, I get a shared object with:

$ readelf -l ./libtest-load.so

Elf file type is DYN (Shared object file)
Entry point 0x390
There are 4 program headers, starting at offset 64

Program Headers:
  Type   Offset VirtAddr   PhysAddr
 FileSizMemSiz  Flags  Align
  LOAD   0x0020 0x 0x
 0x05c4 0x05c4  RWE20
  LOAD   0x002005c8 0x05c8 0x05c8
 0x000400a0 0x000400b0  RWE20
  DYNAMIC0x00240678 0x00040678 0x00040678
 0x0180 0x0180  RWE8
  NOTE   0x 0x 0x
 0x 0x  R  8

 Section to Segment mapping:
  Segment Sections...
   00 .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.plt .init 
.plt .text .fini .eh_frame_hdr .eh_frame .rela.dyn .note.gnu.build-id
   01 .ctors .dtors .jcr .got .got.plt .data .data.rel .bss
   02 .dynamic
   03

The goal of the first two steps is to create a shared object that will be
mmap-ed with pgoff != 0 (offset of a PT_LOAD segment != 0). I'm not sure
which part of the ld script
leads to this, so I paste all of them.
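The address calculation at the heart of the fix can be sketched as follows. This is a simplified model, not perf's code: struct map_sketch keeps only the three fields that matter, and the offsets in the usage below are hypothetical. A mapping that starts at file offset pgoff places the byte at file offset off at start + (off - pgoff); using start + off is only correct when pgoff is 0.

```c
#include <stdint.h>

/* Simplified stand-in for perf's struct map (only the fields used here). */
struct map_sketch {
	uint64_t start; /* first virtual address of the mapping */
	uint64_t end;   /* one past the last virtual address */
	uint64_t pgoff; /* file offset that is mapped at 'start' */
};

/* What the report says the old code did: ignore pgoff. */
static uint64_t file_off_to_vaddr_buggy(const struct map_sketch *m,
					uint64_t off)
{
	return m->start + off;
}

/* Account for pgoff. Returns 0 if 'off' is not covered by this mapping. */
static uint64_t file_off_to_vaddr(const struct map_sketch *m, uint64_t off)
{
	if (off < m->pgoff || off - m->pgoff >= m->end - m->start)
		return 0;
	return m->start + (off - m->pgoff);
}
```

For a mapping with a nonzero pgoff, the two functions disagree by exactly pgoff, which is the offset libunwind ends up reading from in the buggy case.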

Step 3: use that shared object; compile the following C file using:

$ gcc ./test.c -O0 -ltest-load -L.

- test.c -
#include <stdio.h>
extern int fib(int x);
int main()
{
printf("%d\n", fib(30));
return 0;
}
-

Step 4: perf record:

$ perf record -g --call-graph=dwarf   ./a.out

Step 5: perf report:

$ perf report --stdio --no-children

# To display the perf.data header info, please use --header/--header-only options.
#
# Samples: 40  of event 'cycles'
# Event count (approx.): 28432005
#
# Overhead  Command  Shared Object Symbol
#   ...    ..
#
18.40%  a.outlibtest-load.so   [.] 0x002004ee
  |
  ---0x7f17127964ee

16.85%  a.outlibtest-load.so   [.] 0x002004bf
  |
  ---0x7f17127964bf

15.20%  a.outlibtest-load.so   [.] 0x002004c2
  |
  ---0x7f17127964c2

 9.33%  a.outlibtest-load.so   [.] 0x0020049c
  |
  ---0x7f171279649c


Perf failed to extract symbol name and als

Re: [PATCH 0/5] arm64: add kdump support

2015-04-01 Thread Pratyush Anand




On Thursday 02 April 2015 04:57 AM, AKASHI Takahiro wrote:
>
> Please try my latest kexec-tools in my linaro repo (branch name is
> kdump/v0.11) and let me know the result.

Thanks a lot. Just fetched your repo and found v0.11.

With this, the crash kernel loads successfully if I do not use an initrd.

With the following, I still see "Overlapping memory segments":

kexec -p  /home/panand/work/kernel/bsa2_kdump/vmlinux 
--initrd=/boot/initramfs-3.19.0.bz1198945+.img --append="$( cat 
/proc/cmdline ) maxcpus=1 mem=64M reset_devices"


~Pratyush

[PATCH v2.1] ftracetest: Do not use usleep directly

2015-04-01 Thread Namhyung Kim

The usleep command is only provided on Red Hat-based distros, so running
ftracetest on other distros resulted in failures due to the missing usleep.

The reason for using [u]sleep in the test was to generate (scheduler)
events.  This can be done in various ways, like this:

yield() {  ping localhost -c 1 || sleep .001 || usleep 1 || sleep 1; }

Reported-by: Michael Ellerman 
Reported-by: Dave Jones 
Reported-by: Luis Henriques 
Based-on-patch-by: Pádraig Brady 
CC: Masami Hiramatsu 
Signed-off-by: Namhyung Kim 
---
fix a typo: pinc -> ping

 tools/testing/selftests/ftrace/test.d/event/event-enable.tc | 13 ++---
 .../selftests/ftrace/test.d/event/subsystem-enable.tc   | 13 ++---
 .../selftests/ftrace/test.d/event/toplevel-enable.tc| 13 +
 3 files changed, 33 insertions(+), 6 deletions(-)

diff --git a/tools/testing/selftests/ftrace/test.d/event/event-enable.tc b/tools/testing/selftests/ftrace/test.d/event/event-enable.tc
index 668616d9bb03..c40c139aaf2b 100644
--- a/tools/testing/selftests/ftrace/test.d/event/event-enable.tc
+++ b/tools/testing/selftests/ftrace/test.d/event/event-enable.tc
@@ -12,6 +12,10 @@ fail() { #msg
 exit -1
 }
 
+yield() {
+ping localhost -c 1 || sleep .001 || usleep 1 || sleep 1
+}
+
 if [ ! -f set_event -o ! -d events/sched ]; then
 echo "event tracing is not supported"
 exit_unsupported
@@ -21,7 +25,8 @@ reset_tracer
 do_reset
 
 echo 'sched:sched_switch' > set_event
-usleep 1
+
+yield
 
 count=`cat trace | grep sched_switch | wc -l`
 if [ $count -eq 0 ]; then
@@ -31,7 +36,8 @@ fi
 do_reset
 
 echo 1 > events/sched/sched_switch/enable
-usleep 1
+
+yield
 
 count=`cat trace | grep sched_switch | wc -l`
 if [ $count -eq 0 ]; then
@@ -41,7 +47,8 @@ fi
 do_reset
 
 echo 0 > events/sched/sched_switch/enable
-usleep 1
+
+yield
 
 count=`cat trace | grep sched_switch | wc -l`
 if [ $count -ne 0 ]; then
diff --git a/tools/testing/selftests/ftrace/test.d/event/subsystem-enable.tc b/tools/testing/selftests/ftrace/test.d/event/subsystem-enable.tc
index 655c415b6e7f..cbd98b71ee8a 100644
--- a/tools/testing/selftests/ftrace/test.d/event/subsystem-enable.tc
+++ b/tools/testing/selftests/ftrace/test.d/event/subsystem-enable.tc
@@ -12,6 +12,10 @@ fail() { #msg
 exit -1
 }
 
+yield() {
+ping localhost -c 1 || sleep .001 || usleep 1 || sleep 1
+}
+
 if [ ! -f set_event -o ! -d events/sched ]; then
 echo "event tracing is not supported"
 exit_unsupported
@@ -21,7 +25,8 @@ reset_tracer
 do_reset
 
 echo 'sched:*' > set_event
-usleep 1
+
+yield
 
 count=`cat trace | grep -v ^# | awk '{ print $5 }' | sort -u | wc -l`
 if [ $count -lt 3 ]; then
@@ -31,7 +36,8 @@ fi
 do_reset
 
 echo 1 > events/sched/enable
-usleep 1
+
+yield
 
 count=`cat trace | grep -v ^# | awk '{ print $5 }' | sort -u | wc -l`
 if [ $count -lt 3 ]; then
@@ -41,7 +47,8 @@ fi
 do_reset
 
 echo 0 > events/sched/enable
-usleep 1
+
+yield
 
 count=`cat trace | grep -v ^# | awk '{ print $5 }' | sort -u | wc -l`
 if [ $count -ne 0 ]; then
diff --git a/tools/testing/selftests/ftrace/test.d/event/toplevel-enable.tc b/tools/testing/selftests/ftrace/test.d/event/toplevel-enable.tc
index 480845774007..65e2ab11 100644
--- a/tools/testing/selftests/ftrace/test.d/event/toplevel-enable.tc
+++ b/tools/testing/selftests/ftrace/test.d/event/toplevel-enable.tc
@@ -12,6 +12,10 @@ fail() { #msg
 exit -1
 }
 
+yield() {
+ping localhost -c 1 || sleep .001 || usleep 1 || sleep 1
+}
+
 if [ ! -f available_events -o ! -f set_event -o ! -d events ]; then
 echo "event tracing is not supported"
 exit_unsupported
@@ -21,6 +25,9 @@ reset_tracer
 do_reset
 
 echo '*:*' > set_event
+
+yield
+
 count=`cat trace | grep -v ^# | wc -l`
 if [ $count -eq 0 ]; then
 fail "none of events are recorded"
@@ -29,6 +36,9 @@ fi
 do_reset
 
 echo 1 > events/enable
+
+yield
+
 count=`cat trace | grep -v ^# | wc -l`
 if [ $count -eq 0 ]; then
 fail "none of events are recorded"
@@ -37,6 +47,9 @@ fi
 do_reset
 
 echo 0 > events/enable
+
+yield
+
 count=`cat trace | grep -v ^# | wc -l`
 if [ $count -ne 0 ]; then
 fail "any of events should not be recorded"
-- 
2.3.4


Re: [PATCH v6 1/2] DT: hwspinlock: Add binding documentation for Qualcomm hwmutex

2015-04-01 Thread Ohad Ben-Cohen

On Thu, Apr 2, 2015 at 12:32 AM, Tim Bird  wrote:
> I didn't see an Ack from Mark or Rob.  But I did see a question from
> Mark and response from Bjorn.
>
> Ohad - did you take this or are you still waiting for something?
>
> Who should I pester about this? :-)

Sorry, I can't take this without a DT ack.

Ohad.

Re: [PATCH 3/4] clk: Provide always-on clock support

2015-04-01 Thread Jassi Brar

On Wed, Apr 1, 2015 at 7:12 AM, Michael Turquette  wrote:
> Quoting Jassi Brar (2015-03-02 02:28:44)
>> On 2 March 2015 at 15:48, Lee Jones  wrote:
>> > On Mon, 02 Mar 2015, Jassi Brar wrote:
>> >
>> >> On Mon, Mar 2, 2015 at 2:06 PM, Lee Jones  wrote:
>> >> > On Sat, 28 Feb 2015, Jassi Brar wrote:
>> >> >
>> >> >> On 28 February 2015 at 02:44, Lee Jones  wrote:
>> >> >> > Lots of platforms contain clocks which if turned off would prove fatal.
>> >> >> > The only way to recover from these catastrophic failures is to restart
>> >> >> > the board(s).  Now, when a clock is registered with the framework it is
>> >> >> > compared against a list of provided always-on clock names which must be
>> >> >> > kept ungated.  If it matches, we enable the existing CLK_IGNORE_UNUSED
>> >> >> > flag, which will prevent the common clk framework from attempting to
>> >> >> > gate it during the clk_disable_unused() procedure.
>> >> >> >
>> >> >> If a clock is critical on a certain board, it could be got+enabled
>> >> >> during early boot so there is always a user.
>> >> >
>> >> > I tried this.  There was push-back from the DT maintainers.
>> >> >
>> >> >   http://lists.infradead.org/pipermail/linux-arm-kernel/2015-February/324417.html
>> >> >
>> >> Thanks, I wasn't aware of the history.
>> >>
>> >> >> To be able to do that from DT, maybe add a new, say, CLK_ALWAYS_ON
>> >> >> flag could be made to initialize the clock with one phantom user
>> >> >> already. Or just reuse the CLK_IGNORE_UNUSED?
>> >> >
>> >> > How is that different to what this set is doing?
>> >> >
>> >> The phantom user - that's there but none can see it.
>> >>
>> >> How about?
>> >>
>> >> +   of_property_for_each_string(np, "clock-always-on", prop, clkname) {
>> >> +   clk = __clk_lookup(clkname);
>> >> +   if (!clk)
>> >> +   continue;
>> >> +
>> >> +   clk->core->enable_count = 1;
>> >> +   clk->core->prepare_count = 1;
>> >> +   }
>> >
>> > This is only fractionally different from the current implementation.
>> >
>> > I believe the current way is slightly nicer, as we don't have to fake
>> > the user count.
>> >
>> Well... the user is indeed there, isn't it? It's just not known to
>> Linux. So 'fake' isn't most applicable here.
>> Otherwise you might have to stub out some existing and future
>> functions for CLK_IGNORE_UNUSED. And how do we explain to userspace
>> which would see power drawn but no user of the clock?
>
> Jassi,
>
> This is broken. What if the parent of this clock has
> {enable,prepare}_count of zero? The way we propagate these refcounts up
> the tree would fall over.
>
Yeah, it needs to be done at a higher level:
- clk->core->enable_count = 1;
- clk->core->prepare_count = 1;
+ clk_prepare_enable(clk);
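Mike's objection can be illustrated with a toy model of refcount propagation. struct toy_clk and its helpers are hypothetical and much simpler than the common clock framework (no locking, no ops, prepare and enable modelled the same way), but they follow the same rule: only a clock's first reference takes a reference on its parent, and only its last release drops it. Setting a child's counts directly skips that walk, so the parent never gets its reference.

```c
#include <stddef.h>

/* Toy clock node: only the refcounts relevant to this discussion. */
struct toy_clk {
	struct toy_clk *parent;
	unsigned int prepare_count;
	unsigned int enable_count;
};

/* The 0 -> 1 transition of a clock takes a reference on its parent. */
static void toy_clk_prepare(struct toy_clk *c)
{
	while (c && c->prepare_count++ == 0)
		c = c->parent;
}

static void toy_clk_enable(struct toy_clk *c)
{
	while (c && c->enable_count++ == 0)
		c = c->parent;
}

/* Roughly what clk_prepare_enable() does, for the toy model. */
static void toy_clk_prepare_enable(struct toy_clk *c)
{
	toy_clk_prepare(c);
	toy_clk_enable(c);
}

/* The 1 -> 0 transition drops the reference on the parent. Returns -1
 * if a count in the chain was already zero, i.e. a reference that was
 * never taken is being released. */
static int toy_clk_disable(struct toy_clk *c)
{
	while (c) {
		if (c->enable_count == 0)
			return -1;
		if (--c->enable_count > 0)
			break;
		c = c->parent;
	}
	return 0;
}
```

Going through toy_clk_prepare_enable() leaves the parent with one reference that a later disable releases cleanly; poking enable_count = 1 on the child leaves the parent at zero, and the first disable then walks into that zero count.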

cheers!

Re: [PATCH v3] x86, earlyprintk: Fix two 'defined but not used' compile warnings

2015-04-01 Thread Borislav Petkov

On Wed, Apr 01, 2015 at 10:32:04PM +0100, Mark Einon wrote:
> Two static functions are only used if CONFIG_PCI is defined, so only build them
> if this is the case. Fixes the build warnings:
> 
> arch/x86/kernel/early_printk.c:98:13: warning: ‘mem32_serial_out’ defined but not used [-Wunused-function]
>  static void mem32_serial_out(unsigned long addr, int offset, int value)
>  ^
> arch/x86/kernel/early_printk.c:105:21: warning: ‘mem32_serial_in’ defined but not used [-Wunused-function]
>  static unsigned int mem32_serial_in(unsigned long addr, int offset)
>  ^
> 
> Also convert a few related instances of uintXX_t to kernel specific uXX 
> defines.
> 
> Signed-off-by: Mark Einon 
> ---
> v2 - Move code to another #ifdef instead of creating a new ifdef pair after
> comment by Borislav Petkov .
> 
> v3 - Fixed commit errors from v2, changed some uintXX_t data types to 
> equivalent uXX.
> 
>  arch/x86/kernel/early_printk.c | 32 
>  1 file changed, 16 insertions(+), 16 deletions(-)

Applied, thanks.

-- 
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.

[PATCH v2] ftracetest: Do not use usleep directly

2015-04-01 Thread Namhyung Kim

The usleep command is only provided on Red Hat-based distros, so running
ftracetest on other distros resulted in failures due to the missing usleep.

The reason for using [u]sleep in the test was to generate (scheduler)
events.  This can be done in various ways, like this:

yield() {  pinc localhost -c 1 || sleep .001 || usleep 1 || sleep 1; }

Reported-by: Michael Ellerman 
Reported-by: Dave Jones 
Reported-by: Luis Henriques 
Based-on-patch-by: Pádraig Brady 
Cc: Masami Hiramatsu 
Signed-off-by: Namhyung Kim 
---
 tools/testing/selftests/ftrace/test.d/event/event-enable.tc | 13 ++---
 .../selftests/ftrace/test.d/event/subsystem-enable.tc   | 13 ++---
 .../selftests/ftrace/test.d/event/toplevel-enable.tc| 13 +
 3 files changed, 33 insertions(+), 6 deletions(-)

diff --git a/tools/testing/selftests/ftrace/test.d/event/event-enable.tc b/tools/testing/selftests/ftrace/test.d/event/event-enable.tc
index 668616d9bb03..c40c139aaf2b 100644
--- a/tools/testing/selftests/ftrace/test.d/event/event-enable.tc
+++ b/tools/testing/selftests/ftrace/test.d/event/event-enable.tc
@@ -12,6 +12,10 @@ fail() { #msg
 exit -1
 }
 
+yield() {
+ping localhost -c 1 || sleep .001 || usleep 1 || sleep 1
+}
+
 if [ ! -f set_event -o ! -d events/sched ]; then
 echo "event tracing is not supported"
 exit_unsupported
@@ -21,7 +25,8 @@ reset_tracer
 do_reset
 
 echo 'sched:sched_switch' > set_event
-usleep 1
+
+yield
 
 count=`cat trace | grep sched_switch | wc -l`
 if [ $count -eq 0 ]; then
@@ -31,7 +36,8 @@ fi
 do_reset
 
 echo 1 > events/sched/sched_switch/enable
-usleep 1
+
+yield
 
 count=`cat trace | grep sched_switch | wc -l`
 if [ $count -eq 0 ]; then
@@ -41,7 +47,8 @@ fi
 do_reset
 
 echo 0 > events/sched/sched_switch/enable
-usleep 1
+
+yield
 
 count=`cat trace | grep sched_switch | wc -l`
 if [ $count -ne 0 ]; then
diff --git a/tools/testing/selftests/ftrace/test.d/event/subsystem-enable.tc b/tools/testing/selftests/ftrace/test.d/event/subsystem-enable.tc
index 655c415b6e7f..cbd98b71ee8a 100644
--- a/tools/testing/selftests/ftrace/test.d/event/subsystem-enable.tc
+++ b/tools/testing/selftests/ftrace/test.d/event/subsystem-enable.tc
@@ -12,6 +12,10 @@ fail() { #msg
 exit -1
 }
 
+yield() {
+ping localhost -c 1 || sleep .001 || usleep 1 || sleep 1
+}
+
 if [ ! -f set_event -o ! -d events/sched ]; then
 echo "event tracing is not supported"
 exit_unsupported
@@ -21,7 +25,8 @@ reset_tracer
 do_reset
 
 echo 'sched:*' > set_event
-usleep 1
+
+yield
 
 count=`cat trace | grep -v ^# | awk '{ print $5 }' | sort -u | wc -l`
 if [ $count -lt 3 ]; then
@@ -31,7 +36,8 @@ fi
 do_reset
 
 echo 1 > events/sched/enable
-usleep 1
+
+yield
 
 count=`cat trace | grep -v ^# | awk '{ print $5 }' | sort -u | wc -l`
 if [ $count -lt 3 ]; then
@@ -41,7 +47,8 @@ fi
 do_reset
 
 echo 0 > events/sched/enable
-usleep 1
+
+yield
 
 count=`cat trace | grep -v ^# | awk '{ print $5 }' | sort -u | wc -l`
 if [ $count -ne 0 ]; then
diff --git a/tools/testing/selftests/ftrace/test.d/event/toplevel-enable.tc b/tools/testing/selftests/ftrace/test.d/event/toplevel-enable.tc
index 480845774007..65e2ab11 100644
--- a/tools/testing/selftests/ftrace/test.d/event/toplevel-enable.tc
+++ b/tools/testing/selftests/ftrace/test.d/event/toplevel-enable.tc
@@ -12,6 +12,10 @@ fail() { #msg
 exit -1
 }
 
+yield() {
+ping localhost -c 1 || sleep .001 || usleep 1 || sleep 1
+}
+
 if [ ! -f available_events -o ! -f set_event -o ! -d events ]; then
 echo "event tracing is not supported"
 exit_unsupported
@@ -21,6 +25,9 @@ reset_tracer
 do_reset
 
 echo '*:*' > set_event
+
+yield
+
 count=`cat trace | grep -v ^# | wc -l`
 if [ $count -eq 0 ]; then
 fail "none of events are recorded"
@@ -29,6 +36,9 @@ fi
 do_reset
 
 echo 1 > events/enable
+
+yield
+
 count=`cat trace | grep -v ^# | wc -l`
 if [ $count -eq 0 ]; then
 fail "none of events are recorded"
@@ -37,6 +47,9 @@ fi
 do_reset
 
 echo 0 > events/enable
+
+yield
+
 count=`cat trace | grep -v ^# | wc -l`
 if [ $count -ne 0 ]; then
 fail "any of events should not be recorded"
-- 
2.3.4


Re: [rhashtable] [ INFO: suspicious RCU usage. ]

2015-04-01 Thread Herbert Xu

On Thu, Apr 02, 2015 at 12:11:35PM +0800, Fengguang Wu wrote:
> 
> Yes it is contained in next-20150401 which is bad:
> 
> # extra tests on tree/branch next/master
> git bisect  bad e954104e2b634b42811dad8d502cbf240f206df2  # 21:22  0- 60  Add linux-next specific files for 20150401
> 
> The dmesg there is
> 
> [1.149409] test_firmware: interface ready
> [1.150293] Running resizable hashtable tests...
> [1.151209]   Adding 2048 keys
> [1.152069] [ cut here ]
> [1.152978] WARNING: CPU: 0 PID: 1 at lib/rhashtable.c:409 rhashtable_insert_rehash+0x9d/0x1d0()

I see.  This is actually a completely different problem.

---8<---
test_rhashtable: Remove bogus max_size setting

Now that resizing is completely automatic, we need to remove
the max_size setting or the test will fail.

Reported-by: Fengguang Wu 
Signed-off-by: Herbert Xu 

diff --git a/lib/test_rhashtable.c b/lib/test_rhashtable.c
index a42a0d4..b295754 100644
--- a/lib/test_rhashtable.c
+++ b/lib/test_rhashtable.c
@@ -44,7 +44,6 @@ static const struct rhashtable_params test_rht_params = {
.key_offset = offsetof(struct test_obj, value),
.key_len = sizeof(int),
.hashfn = jhash,
-   .max_size = 2, /* we expand/shrink manually here */
.nulls_base = (3U << RHT_BASE_SHIFT),
 };
 
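A simplified userspace model of the decision described in the "immediate rehash during insertion" commit, loosely based on the 75%-utilisation growth check (as in rht_grow_above_75()) with a max_size cap. All names below are hypothetical. It illustrates why a max_size of 2 breaks the test: once the cap is reached, neither check allows growth, so the long chains produced by 2048 keys can only trigger same-size rehashes, which would be consistent with the WARNING from rhashtable_insert_rehash() in the dmesg above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a hash table's sizing state. */
struct model_table {
	size_t size;     /* number of buckets */
	size_t nelems;   /* number of stored entries */
	size_t max_size; /* 0 means "no cap" */
};

static bool below_cap(const struct model_table *t)
{
	return t->max_size == 0 || t->size < t->max_size;
}

/* Utilisation above 75%, and growing is still allowed. */
static bool grow_above_75(const struct model_table *t)
{
	return t->nelems > t->size / 4 * 3 && below_cap(t);
}

/* Table "full": more entries than buckets, and growing is still allowed. */
static bool grow_above_100(const struct model_table *t)
{
	return t->nelems > t->size && below_cap(t);
}

/*
 * Insertion forces an immediate rehash when the table is full or the
 * target chain already holds `elasticity` entries (0 disables the chain
 * check, standing in for insecure_elasticity).
 */
static bool needs_immediate_rehash(const struct model_table *t,
				   unsigned int chain_len,
				   unsigned int elasticity)
{
	return grow_above_100(t) || (elasticity && chain_len >= elasticity);
}

/* The rehash doubles the table only if utilisation exceeds 75%. */
static size_t rehash_target_size(const struct model_table *t)
{
	return grow_above_75(t) ? t->size * 2 : t->size;
}
```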
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

Re: [rhashtable] [ INFO: suspicious RCU usage. ]

2015-04-01 Thread Fengguang Wu

On Thu, Apr 02, 2015 at 11:58:10AM +0800, Herbert Xu wrote:
> On Thu, Apr 02, 2015 at 08:52:11AM +0800, Fengguang Wu wrote:
> > Hi Herbert,
> > 
> > 0day kernel testing robot got the below dmesg and the first bad commit is
> > 
> > git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
> > 
> > commit ccd57b1bd32460d27bbb9c599e795628a3c66983
> > Author: Herbert Xu 
> > AuthorDate: Tue Mar 24 00:50:28 2015 +1100
> > Commit: David S. Miller 
> > CommitDate: Mon Mar 23 22:07:52 2015 -0400
> > 
> > rhashtable: Add immediate rehash during insertion
> > 
> > This patch reintroduces immediate rehash during insertion.  If
> > we find during insertion that the table is full or the chain
> > length exceeds a set limit (currently 16 but may be disabled
> > with insecure_elasticity) then we will force an immediate rehash.
> > The rehash will contain an expansion if the table utilisation
> > exceeds 75%.
> > 
> > If this rehash fails then the insertion will fail.  Otherwise the
> > insertion will be reattempted in the new hash table.
> > 
> > Signed-off-by: Herbert Xu 
> > Acked-by: Thomas Graf 
> > Signed-off-by: David S. Miller 
> > 
> > [0.552992]   Adding 2048 keys
> > [0.553792] 
> > [0.554400] ===
> > [0.555285] [ INFO: suspicious RCU usage. ]
> > [0.556176] 4.0.0-rc4-01225-gccd57b1 #171 Not tainted
> > [0.557156] ---
> > [0.558044] lib/rhashtable.c:400 suspicious rcu_dereference_check() usage!
> 
> This should have been fixed by
> 
>   58be8a583d8d316448bafa5926414cfb83c02dec.
> 
> Can you check whether this commit was in your tested tree?

Yes it is contained in next-20150401 which is bad:

# extra tests on tree/branch next/master
git bisect  bad e954104e2b634b42811dad8d502cbf240f206df2  # 21:22  0- 60  Add linux-next specific files for 20150401

The dmesg there is

[1.149409] test_firmware: interface ready
[1.150293] Running resizable hashtable tests...
[1.151209]   Adding 2048 keys
[1.152069] [ cut here ]
[1.152978] WARNING: CPU: 0 PID: 1 at lib/rhashtable.c:409 rhashtable_insert_rehash+0x9d/0x1d0()
[1.154802] Modules linked in:
[1.155628] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.0.0-rc6-next-20150401 #1
[1.157185] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
[1.159019]  81d1ce18 880010dc3d18 8189b1c3
[1.160865]   880010dc3d58 810b12aa 880010dc3d38
[1.162716]  8323a500 0010 88000beda780 0001
[1.164558] Call Trace:
[1.165249]  [] dump_stack+0x4c/0x65
[1.166220]  [] warn_slowpath_common+0x8a/0xc0
[1.167267]  [] warn_slowpath_null+0x1a/0x20
[1.168317]  [] rhashtable_insert_rehash+0x9d/0x1d0
[1.169415]  [] test_rht_init+0x310/0x12ba
[1.170425]  [] ? do_one_initcall+0x80/0x200
[1.171466]  [] ? test_firmware_init+0x79/0x79
[1.172524]  [] do_one_initcall+0x90/0x200
[1.173552]  [] kernel_init_freeable+0x22d/0x2b5
[1.174625]  [] ? rest_init+0x140/0x140
[1.175617]  [] kernel_init+0xe/0xf0
[1.176572]  [] ret_from_fork+0x53/0x90
[1.177576]  [] ? rest_init+0x140/0x140
[1.178570] ---[ end trace 0ba1594e5c63400d ]---
[1.179477] 
[1.180071] ===
[1.180911] [ INFO: suspicious RCU usage. ]
[1.181768] 4.0.0-rc6-next-20150401 #1 Tainted: GW  
[1.182818] ---
[1.183688] lib/test_rhashtable.c:176 suspicious rcu_dereference_check() usage!
[1.185387] 
[1.185387] other info that might help us debug this:
[1.185387] 
[1.187369] 
[1.187369] rcu_scheduler_active = 1, debug_locks = 0
[1.188835] no locks held by swapper/0/1.
[1.189676] 
[1.189676] stack backtrace:
[1.190917] CPU: 0 PID: 1 Comm: swapper/0 Tainted: GW   4.0.0-rc6-next-20150401 #1
[1.192595] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
[1.194428]  0001 880010dc3d78 8189b1c3 0011
[1.196272]  880010db8000 880010dc3da8 81102d17 880010f93d20
[1.207362]  88000c6530f0 0004 88000beda6c0 880010dc3e18
[1.209218] Call Trace:
[1.209897]  [] dump_stack+0x4c/0x65
[1.210867]  [] lockdep_rcu_suspicious+0xe7/0x120
[1.211948]  [] test_rht_init+0x1168/0x12ba
[1.212968]  [] ? do_one_initcall+0x80/0x200
[1.214007]  [] ? test_firmware_init+0x79/0x79
[1.215060]  [

[PATCH v1.1] x86/mm/ASLR: Propagate ASLR status to kernel proper

2015-04-01 Thread Borislav Petkov

From: Borislav Petkov 
Date: Wed, 1 Apr 2015 12:49:52 +0200
Subject: [PATCH v1.1] x86/mm/ASLR: Propagate ASLR status to kernel proper

Commit

  e2b32e678513 ("x86, kaslr: randomize module base load address")

made module base address randomization unconditional and didn't take into
account that KASLR may be disabled due to CONFIG_HIBERNATION or the
"nokaslr" command line option. For more info see (now reverted) commit:

  f47233c2d34f ("x86/mm/ASLR: Propagate base load address calculation")

In order to propagate ASLR status to kernel proper, we need a single bit
in boot_params.hdr.loadflags, and we've chosen bit 1, thus leaving the
top-down allocated bits for those meant to be used by the bootloader.
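The handshake can be modelled as plain bit manipulation on loadflags: the decompressor first clears the bit (so a value left by the bootloader is never trusted), sets it only once it has actually randomized the load address, and kernel proper then just tests it. A minimal userspace sketch, assuming a hypothetical model_setup_header in place of the real struct setup_header; the bit values follow Documentation/x86/boot.txt, with bit 1 being the one chosen by this patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* loadflags bits; ASLR_FLAG is bit 1 as proposed in this patch */
#define LOADED_HIGH (1 << 0)
#define ASLR_FLAG   (1 << 1)
#define QUIET_FLAG  (1 << 5)

/* Hypothetical stand-in for boot_params.hdr */
struct model_setup_header {
	uint8_t loadflags;
};

/* Decompressor entry: drop whatever value was left in bit 1. */
static void decompressor_clear_aslr(struct model_setup_header *hdr)
{
	hdr->loadflags &= (uint8_t)~ASLR_FLAG;
}

/* Called only on the path where the load address was randomized. */
static void decompressor_mark_aslr(struct model_setup_header *hdr)
{
	hdr->loadflags |= ASLR_FLAG;
}

/* Kernel proper: e.g. module base randomization would key off this. */
static bool kernel_aslr_enabled(const struct model_setup_header *hdr)
{
	return (hdr->loadflags & ASLR_FLAG) != 0;
}
```

Clearing first means a "nokaslr" or CONFIG_HIBERNATION boot, which never reaches the marking path, reliably reports ASLR as disabled.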

Originally-by: Jiri Kosina 
Suggested-by: "H. Peter Anvin" 
Cc: Kees Cook 
Cc: Ingo Molnar 
Signed-off-by: Borislav Petkov 
---

v1.1: Correct ASLR_FLAG bit type in boot.txt

 Documentation/x86/boot.txt|  6 ++
 arch/x86/boot/compressed/aslr.c   |  5 -
 arch/x86/boot/compressed/misc.c   |  5 -
 arch/x86/boot/compressed/misc.h   |  6 --
 arch/x86/include/asm/setup.h  |  5 +
 arch/x86/include/uapi/asm/bootparam.h |  1 +
 arch/x86/kernel/module.c  | 11 ++-
 arch/x86/kernel/setup.c   | 12 
 8 files changed, 34 insertions(+), 17 deletions(-)

diff --git a/Documentation/x86/boot.txt b/Documentation/x86/boot.txt
index a75e3adaa39d..f84a03eea773 100644
--- a/Documentation/x86/boot.txt
+++ b/Documentation/x86/boot.txt
@@ -406,6 +406,12 @@ Protocol:  2.00+
- If 0, the protected-mode code is loaded at 0x10000.
- If 1, the protected-mode code is loaded at 0x100000.
 
+  Bit 1 (kernel internal): ASLR_FLAG
+   - Used internally by the compressed kernel to communicate
+ ASLR status to kernel proper.
+ If 1, ASLR enabled.
+ If 0, ASLR disabled.
+
   Bit 5 (write): QUIET_FLAG
- If 0, print early messages.
- If 1, suppress early messages.
diff --git a/arch/x86/boot/compressed/aslr.c b/arch/x86/boot/compressed/aslr.c
index bb1376381985..370e47d763b0 100644
--- a/arch/x86/boot/compressed/aslr.c
+++ b/arch/x86/boot/compressed/aslr.c
@@ -295,7 +295,8 @@ static unsigned long find_random_addr(unsigned long minimum,
return slots_fetch_random();
 }
 
-unsigned char *choose_kernel_location(unsigned char *input,
+unsigned char *choose_kernel_location(struct boot_params *boot_params,
+ unsigned char *input,
  unsigned long input_size,
  unsigned char *output,
  unsigned long output_size)
@@ -315,6 +316,8 @@ unsigned char *choose_kernel_location(unsigned char *input,
}
 #endif
 
+   boot_params->hdr.loadflags |= ASLR_FLAG;
+
/* Record the various known unsafe memory ranges. */
mem_avoid_init((unsigned long)input, input_size,
   (unsigned long)output, output_size);
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index a950864a64da..ca83518e405e 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -377,6 +377,9 @@ asmlinkage __visible void *decompress_kernel(void *rmode, memptr heap,
 
real_mode = rmode;
 
+   /* Clear it for solely in-kernel use */
+   real_mode->hdr.loadflags &= ~ASLR_FLAG;
+
sanitize_boot_params(real_mode);
 
if (real_mode->screen_info.orig_video_mode == 7) {
@@ -401,7 +404,7 @@ asmlinkage __visible void *decompress_kernel(void *rmode, memptr heap,
 * the entire decompressed kernel plus relocation table, or the
 * entire decompressed kernel plus .bss and .brk sections.
 */
-   output = choose_kernel_location(input_data, input_len, output,
+   output = choose_kernel_location(real_mode, input_data, input_len, output,
output_len > run_size ? output_len
  : run_size);
 
diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
index 04477d68403f..89dd0d78013a 100644
--- a/arch/x86/boot/compressed/misc.h
+++ b/arch/x86/boot/compressed/misc.h
@@ -57,7 +57,8 @@ int cmdline_find_option_bool(const char *option);
 
 #if CONFIG_RANDOMIZE_BASE
 /* aslr.c */
-unsigned char *choose_kernel_location(unsigned char *input,
+unsigned char *choose_kernel_location(struct boot_params *boot_params,
+ unsigned char *input,
  unsigned long input_size,
  unsigned char *output,
  unsigned long output_size);
@@ -65,7 +66,8 @@ unsigned char *choose_kernel_location(unsigned char *input,
 bool has_cpuflag(int flag);
 #else
 static inline
-unsigned char *choose_kernel_location(unsigned char *input,
+unsigned char *choose_kernel_location(st

Re: [rhashtable] [ INFO: suspicious RCU usage. ]

2015-04-01 Thread Herbert Xu

On Thu, Apr 02, 2015 at 08:52:11AM +0800, Fengguang Wu wrote:
> Hi Herbert,
> 
> 0day kernel testing robot got the below dmesg and the first bad commit is
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
> 
> commit ccd57b1bd32460d27bbb9c599e795628a3c66983
> Author: Herbert Xu 
> AuthorDate: Tue Mar 24 00:50:28 2015 +1100
> Commit: David S. Miller 
> CommitDate: Mon Mar 23 22:07:52 2015 -0400
> 
> rhashtable: Add immediate rehash during insertion
> 
> This patch reintroduces immediate rehash during insertion.  If
> we find during insertion that the table is full or the chain
> length exceeds a set limit (currently 16 but may be disabled
> with insecure_elasticity) then we will force an immediate rehash.
> The rehash will contain an expansion if the table utilisation
> exceeds 75%.
> 
> If this rehash fails then the insertion will fail.  Otherwise the
> insertion will be reattempted in the new hash table.
> 
> Signed-off-by: Herbert Xu 
> Acked-by: Thomas Graf 
> Signed-off-by: David S. Miller 
> 
> [0.552992]   Adding 2048 keys
> [0.553792] 
> [0.554400] ===
> [0.555285] [ INFO: suspicious RCU usage. ]
> [0.556176] 4.0.0-rc4-01225-gccd57b1 #171 Not tainted
> [0.557156] ---
> [0.558044] lib/rhashtable.c:400 suspicious rcu_dereference_check() usage!

This should have been fixed by

58be8a583d8d316448bafa5926414cfb83c02dec.

Can you check whether this commit was in your tested tree?

Thanks,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

Re: [PATCH v5 06/10] virtio_pci: drop msi_off on probe

2015-04-01 Thread Fam Zheng

On Sun, 03/29 17:04, Michael S. Tsirkin wrote:
> pci core now disables msi on probe automatically,
> drop this from device-specific code.
> 
> Cc: Bjorn Helgaas 
> Cc: linux-...@vger.kernel.org
> Signed-off-by: Michael S. Tsirkin 
> ---
>  drivers/virtio/virtio_pci_common.c | 3 ---
>  1 file changed, 3 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
> index e894eb2..806bb2c 100644
> --- a/drivers/virtio/virtio_pci_common.c
> +++ b/drivers/virtio/virtio_pci_common.c
> @@ -501,9 +501,6 @@ static int virtio_pci_probe(struct pci_dev *pci_dev,
>   INIT_LIST_HEAD(&vp_dev->virtqueues);
>   spin_lock_init(&vp_dev->lock);
>  
> - /* Disable MSI/MSIX to bring device to a known good state. */
> - pci_msi_off(pci_dev);
> -
>   /* enable the device */
>   rc = pci_enable_device(pci_dev);
>   if (rc)
> -- 
> MST
> 

Reviewed-by: Fam Zheng 

Re: [PATCH v5 05/10] pci: make msi/msix shutdown functions static

2015-04-01 Thread Fam Zheng

On Sun, 03/29 17:04, Michael S. Tsirkin wrote:
> pci_msi_shutdown and pci_msix_shutdown are now internal to msi.c, drop
> them from header and make them static.

Reviewed-by: Fam Zheng 

> 
> Signed-off-by: Michael S. Tsirkin 
> ---
>  include/linux/pci.h | 4 
>  drivers/pci/msi.c   | 4 ++--
>  2 files changed, 2 insertions(+), 6 deletions(-)
> 
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 211e9da..a34df45 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -1209,11 +1209,9 @@ struct msix_entry {
>  
>  #ifdef CONFIG_PCI_MSI
>  int pci_msi_vec_count(struct pci_dev *dev);
> -void pci_msi_shutdown(struct pci_dev *dev);
>  void pci_disable_msi(struct pci_dev *dev);
>  int pci_msix_vec_count(struct pci_dev *dev);
>  int pci_enable_msix(struct pci_dev *dev, struct msix_entry *entries, int nvec);
> -void pci_msix_shutdown(struct pci_dev *dev);
>  void pci_disable_msix(struct pci_dev *dev);
>  void pci_restore_msi_state(struct pci_dev *dev);
>  int pci_msi_enabled(void);
> @@ -1237,13 +1235,11 @@ static inline int pci_enable_msix_exact(struct pci_dev *dev,
>  }
>  #else
>  static inline int pci_msi_vec_count(struct pci_dev *dev) { return -ENOSYS; }
> -static inline void pci_msi_shutdown(struct pci_dev *dev) { }
>  static inline void pci_disable_msi(struct pci_dev *dev) { }
>  static inline int pci_msix_vec_count(struct pci_dev *dev) { return -ENOSYS; }
>  static inline int pci_enable_msix(struct pci_dev *dev,
> struct msix_entry *entries, int nvec)
>  { return -ENOSYS; }
> -static inline void pci_msix_shutdown(struct pci_dev *dev) { }
>  static inline void pci_disable_msix(struct pci_dev *dev) { }
>  static inline void pci_restore_msi_state(struct pci_dev *dev) { }
>  static inline int pci_msi_enabled(void) { return 0; }
> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> index f66be86..ea78a07 100644
> --- a/drivers/pci/msi.c
> +++ b/drivers/pci/msi.c
> @@ -887,7 +887,7 @@ int pci_msi_vec_count(struct pci_dev *dev)
>  }
>  EXPORT_SYMBOL(pci_msi_vec_count);
>  
> -void pci_msi_shutdown(struct pci_dev *dev)
> +static void pci_msi_shutdown(struct pci_dev *dev)
>  {
>   struct msi_desc *desc;
>   u32 mask;
> @@ -993,7 +993,7 @@ int pci_enable_msix(struct pci_dev *dev, struct msix_entry *entries, int nvec)
>  }
>  EXPORT_SYMBOL(pci_enable_msix);
>  
> -void pci_msix_shutdown(struct pci_dev *dev)
> +static void pci_msix_shutdown(struct pci_dev *dev)
>  {
>   struct msi_desc *entry;
>  
> -- 
> MST
> 

Re: [PATCH v5 04/10] pci: don't disable msi/msix at shutdown

2015-04-01 Thread Fam Zheng

On Sun, 03/29 17:04, Michael S. Tsirkin wrote:
> This partially reverts commit d52877c7b1afb8c37ebe17e2005040b79cb618b0:
>   "pci/irq: let pci_device_shutdown to call pci_msi_shutdown v2"
> 
> It's unnecessary now that we disable msi at start, and it actually
> turns out to cause problems: some device drivers don't register a level
> interrupt handler when they detect msi/msix capability, so switching off
> msi while the device is still going causes it to assert a level interrupt
> which is never de-asserted, causing a kernel hang.
> 
> In particular, this was observed with virtio.
> 
> Cc: Yinghai Lu 
> Cc: Ulrich Obergfell 
> Cc: Rusty Russell 
> Reported-by: Fam Zheng 
> Signed-off-by: Michael S. Tsirkin 

Reviewed-by: Fam Zheng 
Tested-by: Fam Zheng 

> ---
>  drivers/pci/pci-driver.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index 3cb2210..38a602c 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -450,8 +450,6 @@ static void pci_device_shutdown(struct device *dev)
>  
>   if (drv && drv->shutdown)
>   drv->shutdown(pci_dev);
> - pci_msi_shutdown(pci_dev);
> - pci_msix_shutdown(pci_dev);
>  
>  #ifdef CONFIG_KEXEC
>   /*
> -- 
> MST
> 

Re: [PATCH v5 03/10] pci: drop some duplicate code

2015-04-01 Thread Fam Zheng

On Sun, 03/29 17:04, Michael S. Tsirkin wrote:
> pci_msi_setup_pci_dev and pci_msi_off share a lot of code.
> This used to be justified since pci_msi_setup_pci_dev
> wasn't compiled in when CONFIG_PCI_MSI is off.
> Now that it is, let's reuse code.
> 
> Since pci_msi_off is used by early quirks, doing this requires calling
> pci_msi_setup_pci_dev early after device allocation, so move the call to
> pci_setup_device.

The dropping of duplicated code looks good, but doesn't moving the
pci_msi_setup_pci_dev call belong in a separate patch?

Fam
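For reference, the refactor relies on the capability offsets being cached in dev->msi_cap/dev->msix_cap before anything calls pci_msi_off(), which is why the call moves to pci_setup_device() ahead of the early quirks. A self-contained sketch of the read-modify-write those helpers perform, against a fake config space; the model_* names are hypothetical, while the Message Control offset (2) and enable bits (0x0001 for MSI, 0x8000 for MSI-X) are the values defined in pci_regs.h.

```c
#include <stdint.h>

#define MODEL_PCI_MSI_FLAGS          2      /* Message Control offset in MSI cap */
#define MODEL_PCI_MSI_FLAGS_ENABLE   0x0001
#define MODEL_PCI_MSIX_FLAGS         2      /* Message Control offset in MSI-X cap */
#define MODEL_PCI_MSIX_FLAGS_ENABLE  0x8000

/* Hypothetical device: 256-byte config space plus cached capability offsets. */
struct model_pci_dev {
	uint8_t cfg[256];
	uint8_t msi_cap;   /* 0 if the device has no MSI capability */
	uint8_t msix_cap;  /* 0 if the device has no MSI-X capability */
};

static uint16_t cfg_read16(const struct model_pci_dev *d, unsigned int off)
{
	return (uint16_t)(d->cfg[off] | (d->cfg[off + 1] << 8)); /* little endian */
}

static void cfg_write16(struct model_pci_dev *d, unsigned int off, uint16_t v)
{
	d->cfg[off] = (uint8_t)(v & 0xff);
	d->cfg[off + 1] = (uint8_t)(v >> 8);
}

/* Read-modify-write of a 16-bit control register: clear, then set bits. */
static void clear_and_set16(struct model_pci_dev *d, unsigned int off,
			    uint16_t clear, uint16_t set)
{
	uint16_t ctrl = cfg_read16(d, off);

	ctrl &= (uint16_t)~clear;
	ctrl |= set;
	cfg_write16(d, off, ctrl);
}

/* Mirrors the new pci_msi_off(): only touch capabilities that exist. */
static void model_msi_off(struct model_pci_dev *d)
{
	if (d->msi_cap)
		clear_and_set16(d, d->msi_cap + MODEL_PCI_MSI_FLAGS,
				MODEL_PCI_MSI_FLAGS_ENABLE, 0);
	if (d->msix_cap)
		clear_and_set16(d, d->msix_cap + MODEL_PCI_MSIX_FLAGS,
				MODEL_PCI_MSIX_FLAGS_ENABLE, 0);
}
```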

> 
> Signed-off-by: Michael S. Tsirkin 
> ---
>  drivers/pci/pci.c   | 23 ---
>  drivers/pci/probe.c | 31 +++
>  2 files changed, 19 insertions(+), 35 deletions(-)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 81f06e8..fcee8ea 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -3106,26 +3106,11 @@ EXPORT_SYMBOL_GPL(pci_check_and_unmask_intx);
>   */
>  void pci_msi_off(struct pci_dev *dev)
>  {
> - int pos;
> - u16 control;
> + if (dev->msi_cap)
> + pci_msi_set_enable(dev, 0);
>  
> - /*
> -  * This looks like it could go in msi.c, but we need it even when
> -  * CONFIG_PCI_MSI=n.  For the same reason, we can't use
> -  * dev->msi_cap or dev->msix_cap here.
> -  */
> - pos = pci_find_capability(dev, PCI_CAP_ID_MSI);
> - if (pos) {
> - pci_read_config_word(dev, pos + PCI_MSI_FLAGS, &control);
> - control &= ~PCI_MSI_FLAGS_ENABLE;
> - pci_write_config_word(dev, pos + PCI_MSI_FLAGS, control);
> - }
> - pos = pci_find_capability(dev, PCI_CAP_ID_MSIX);
> - if (pos) {
> - pci_read_config_word(dev, pos + PCI_MSIX_FLAGS, &control);
> - control &= ~PCI_MSIX_FLAGS_ENABLE;
> - pci_write_config_word(dev, pos + PCI_MSIX_FLAGS, control);
> - }
> + if (dev->msix_cap)
> + pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_ENABLE, 0);
>  }
>  EXPORT_SYMBOL_GPL(pci_msi_off);
>  
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index 50dd934..120772c 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -1084,6 +1084,18 @@ int pci_cfg_space_size(struct pci_dev *dev)
>  
>  #define LEGACY_IO_RESOURCE   (IORESOURCE_IO | IORESOURCE_PCI_FIXED)
>  
> +static void pci_msi_setup_pci_dev(struct pci_dev *dev)
> +{
> + dev->msi_cap = pci_find_capability(dev, PCI_CAP_ID_MSI);
> + dev->msix_cap = pci_find_capability(dev, PCI_CAP_ID_MSIX);
> +
> + /* Disable the msi hardware to avoid screaming interrupts
> +  * during boot.  This is the power on reset default so
> +  * usually this should be a noop.
> +  */
> + pci_msi_off(dev);
> +}
> +
>  /**
>   * pci_setup_device - fill in class and map information of a device
>   * @dev: the device structure to fill
> @@ -1139,6 +1151,9 @@ int pci_setup_device(struct pci_dev *dev)
>   /* "Unknown power state" */
>   dev->current_state = PCI_UNKNOWN;
>  
> + /* MSI/MSI-X setup has to be done early since it's used by quirks. */
> + pci_msi_setup_pci_dev(dev);
> +
>   /* Early fixups, before probing the BARs */
>   pci_fixup_device(pci_fixup_early, dev);
>   /* device class may be changed after fixup */
> @@ -1483,26 +1498,10 @@ static struct pci_dev *pci_scan_device(struct pci_bus *bus, int devfn)
>   return dev;
>  }
>  
> -static void pci_msi_setup_pci_dev(struct pci_dev *dev)
> -{
> - /* Disable the msi hardware to avoid screaming interrupts
> -  * during boot.  This is the power on reset default so
> -  * usually this should be a noop.
> -  */
> - dev->msi_cap = pci_find_capability(dev, PCI_CAP_ID_MSI);
> - if (dev->msi_cap)
> - pci_msi_set_enable(dev, 0);
> -
> - dev->msix_cap = pci_find_capability(dev, PCI_CAP_ID_MSIX);
> - if (dev->msix_cap)
> - pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_ENABLE, 0);
> -}
> -
>  static void pci_init_capabilities(struct pci_dev *dev)
>  {
>   /* MSI/MSI-X list */
>   pci_msi_init_pci_dev(dev);
> - pci_msi_setup_pci_dev(dev);
>  
>   /* Buffers for saving PCIe and PCI-X capabilities */
>   pci_allocate_cap_save_buffers(dev);
> -- 
> MST
> 

Re: [PATCH] Btrfs: prevent deletion of mounted subvolumes

2015-04-01 Thread Omar Sandoval

On Wed, Apr 01, 2015 at 01:22:42PM +0200, David Sterba wrote:
> On Wed, Apr 01, 2015 at 12:03:28AM -0700, Omar Sandoval wrote:
> > --- a/fs/btrfs/super.c
> > +++ b/fs/btrfs/super.c
> > @@ -1024,6 +1024,10 @@ static int btrfs_show_options(struct seq_file *seq, struct dentry *dentry)
> > struct btrfs_root *root = info->tree_root;
> > char *compress_type;
> >  
> > +   if (dentry != dentry->d_sb->s_root) {
> > +   seq_puts(seq, ",subvol=");
> > +   seq_dentry(seq, dentry, " \t\n\\");
> 
> Unfortunately this does not work if the default subvolume is not the
> toplevel one and the implicit mount (i.e. without subvol=) is used. Then
> this leads to subvol=/ although it should be subvol=/the/default .
> 
> There was a patch to build the path in the show_options callback, but it
> looked too heavy (taking locks, doing lookups). This is unrelated to the
> problem reported by Timo, though the fix might also fix this one.

Hm, yeah, that's unfortunate, thanks for pointing that out. It looks
like we can get the subvolume ID reliably:


diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 05fef19..a74ddb3 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1024,6 +1024,8 @@ static int btrfs_show_options(struct seq_file *seq, struct dentry *dentry)
struct btrfs_root *root = info->tree_root;
char *compress_type;
 
+   seq_printf(seq, ",subvolid=%llu",
+ BTRFS_I(d_inode(dentry))->root->root_key.objectid);
if (btrfs_test_opt(root, DEGRADED))
seq_puts(seq, ",degraded");
if (btrfs_test_opt(root, NODATASUM))


With that, userspace has enough information to determine whether a
subvolume is mounted. That would be racy with concurrent mounts,
though...
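As a rough illustration of the userspace side, a tool could look for the proposed subvolid= option in a mount's option string (for example, the options field of /proc/self/mounts). A minimal sketch; find_subvolid is a hypothetical helper, and such a check remains racy against concurrent mounts.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: scan a comma-separated mount option string for
 * "subvolid=<number>". Returns true and stores the ID if found.
 */
static bool find_subvolid(const char *opts, unsigned long long *id)
{
	static const char key[] = "subvolid=";
	const size_t keylen = sizeof(key) - 1;
	const char *p = opts;

	while (p != NULL && *p != '\0') {
		const char *end = strchr(p, ',');
		size_t len = end ? (size_t)(end - p) : strlen(p);

		/* "subvol=..." does not match, since it differs after "subvol" */
		if (len > keylen && strncmp(p, key, keylen) == 0 &&
		    isdigit((unsigned char)p[keylen])) {
			char *stop;
			unsigned long long v = strtoull(p + keylen, &stop, 10);

			/* accept only if the number spans the rest of the token */
			if (stop == p + len) {
				*id = v;
				return true;
			}
		}
		p = end ? end + 1 : NULL;
	}
	return false;
}
```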

Just to throw another idea out there, what about doing something like my
VFS patch, but then making it optional whether the kernel should error
out on a mounted subvolume, e.g., with a flag to the ioctl? btrfs-progs
could default to the original EBUSY behavior for users who depend on it,
but we could add a "force" flag to `btrfs subvolume delete` in order to
avert the DoS situation Eric wants to avoid. Thoughts on that?

-- 
Omar

Re: Re: [PATCH perf/core ] perf-probe: Fix to track down unnamed union/structure members

2015-04-01 Thread Masami Hiramatsu

(2015/04/01 23:41), Arnaldo Carvalho de Melo wrote:
> Em Wed, Apr 01, 2015 at 06:08:17PM +0900, Masami Hiramatsu escreveu:
>> Ping?
> 
> 
> 
>>> With this patch, perf probe can access unnamed fields.
>>>   -
>>>   #./perf probe -nfx ./perf lock__delete ops 'locked_ops=ops->locked.ops'
>>>   Added new event:
>>> probe_perf:lock__delete (on lock__delete in /home/mhiramat/ksrc/linux-3/tools/perf/perf with ops locked_ops=ops->locked.ops)
> 
>>>   You can now use it in all perf tools, such as:
> 
>>>   perf record -e probe_perf:lock__delete -aR sleep 1
>>>   -
> 
>>> The original report of this issue is: https://lkml.org/lkml/2015/3/5/431
> 
> what am I doing wrong?
> 
> [root@ssdandy ~]# perf probe ~acme/bin/perf lock__delete 'locked_ops=ops'
> Added new event:
>   probe_perf:lock__delete (on lock__delete in /home/acme/bin/perf with locked_ops=ops)
> 
> You can now use it in all perf tools, such as:
> 
>   perf record -e probe_perf:lock__delete -aR sleep 1
> 
> [root@ssdandy ~]# perf probe -d probe_perf:*
> Removed event: probe_perf:lock__delete
> [root@ssdandy ~]# perf probe ~acme/bin/perf lock__delete 'locked_ops=ops->locked'
> Semantic error: locked must be referred by '.'
>   Error: Failed to add events.
> [root@ssdandy ~]# perf probe ~acme/bin/perf lock__delete 'locked_ops=ops.locked'
> Semantic error: locked must be referred by '->'
>   Error: Failed to add events.
> [root@ssdandy ~]# perf probe ~acme/bin/perf lock__delete 'locked_ops=ops->locked'
> Semantic error: locked must be referred by '.'
>   Error: Failed to add events.
> [root@ssdandy ~]# perf probe ~acme/bin/perf lock__delete 'locked_ops=ops->locked.ops'
> Semantic error: locked must be referred by '.'
>   Error: Failed to add events.
> [root@ssdandy ~]#

Oops, thank you for reporting!
I must have missed something...
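For context, the kind of type involved: locked is not a direct member of the pointed-to struct but of an unnamed union inside it, so a debuginfo walker has to descend into unnamed members to resolve ops->locked.ops. A simplified, hypothetical C11 model loosely based on perf's struct ins_operands (not its exact definition):

```c
#include <stddef.h>

struct ins; /* opaque in this model */

struct model_ins_operands {
	char *raw;
	union {              /* unnamed union: it has no member name of its own */
		struct {
			char *raw;
			char *name;
		} source;
		struct {
			struct ins *ins;
			struct model_ins_operands *ops;
		} locked;
	};
};

/*
 * In C11, members of an unnamed union are accessed as if they were direct
 * members of the enclosing struct, hence "ops->locked.ops".
 */
static struct model_ins_operands *get_locked_ops(const struct model_ins_operands *ops)
{
	return ops->locked.ops;
}
```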

Thank you,



-- 
Masami HIRAMATSU
Linux Technology Research Center, System Productivity Research Dept.
Center for Technology Innovation - Systems Engineering
Hitachi, Ltd., Research & Development Group
E-mail: masami.hiramatsu...@hitachi.com



Re: msgrcv: use freezable blocking call

2015-04-01 Thread Maninder Singh

Hi Andrew,
Both patches look fine to us.

Thank You

> On Wed, Apr 01, 2015 at 05:18:46AM +, Maninder Singh wrote:
> > Hi Andrew,
> > Thanks for making the new patch. Actually there is some problem with our mail editor.
> > It replaces tabs with spaces and corrupts the patch; we are solving this at our end.
> > That's why I am sending you only the Signed-off-by lines for both patches.
> 
> Sort it and resend, no real great hurry with these patches, right?

I tend not to bother too much about occasional messy patches.  These
ones appear to be the first patches from these contributors and
fixing them up only takes a couple of minutes.  If Maninder's team
expects to send more patches in the future then yes, please fix this
stuff.  But for now, the important thing is to get these kernel
problems sorted out.

> > 1. For msgrcv: use freezable blocking call
> > Signed-off-by: Yogesh Gaur 
> > Signed-off-by: Maninder Singh 
> > Signed-off-by: Manjeet Pawar 
> 
> Did you really pass around that patch through 3 people or did it take
> all three of you to modify those two lines?
> 
> Should some of those SoBs be a reviewed-by perhaps?
> 
> 
> > > For Peter's Review comment:- This is what, no why mentioned
> > 
> > This call was selected to be converted to a freezable call because
> > it doesn't hold any locks or release any resources when interrupted
> > that might be needed by another freezing task or a kernel driver
> > during suspend, and is a common site where idle userspace tasks are
> > blocked.
> 
> Please put such things in the Changelog so that we can see you've
> thought about things.

I have made that change.

Maninder, we currently have yourself as the primary author of
"restart_syscall: use freezable blocking call".  Is that correct, or
should that be Yogesh Gaur?
--> It is correct

Below are my latest copies of these two patches.  How do they look?
--> Looks fine. Thanks for making the patches.

From: Yogesh Gaur 
Subject: ipc/msg.c: use freezable blocking call

Avoid waking up every thread sleeping in a msgrcv call during suspend and
resume by calling a freezable blocking call.  Previous patches modified
the freezer to avoid sending wakeups to threads that are blocked in
freezable blocking calls.

Ref: https://lkml.org/lkml/2013/5/1/424

Backtrace: 
[] (__schedule+0x0/0x5d8) from [] (schedule+0x8c/0x90)
[] (schedule+0x0/0x90) from [] (do_msgrcv+0x2e0/0x368)
[] (do_msgrcv+0x0/0x368) from [] (SyS_msgrcv+0x2c/0x38)
[] (SyS_msgrcv+0x0/0x38) from [] (ret_fast_syscall+0x0/0x48)
tPlay0Cb2   R running  0   297204 0x0001

This call was selected to be converted to a freezable call because it
doesn't hold any locks or release any resources when interrupted that
might be needed by another freezing task or a kernel driver during
suspend, and is a common site where idle userspace tasks are blocked.

Signed-off-by: Yogesh Gaur 
Signed-off-by: Manjeet Pawar 
Signed-off-by: Maninder Singh 
Reviewed-by: Ajeet Yadav 
Cc: Peter Zijlstra 
Cc: Tejun Heo 
Signed-off-by: Andrew Morton 
---

 ipc/msg.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff -puN ipc/msg.c~msgrcv-use-freezable-blocking-call ipc/msg.c
--- a/ipc/msg.c~msgrcv-use-freezable-blocking-call
+++ a/ipc/msg.c
@@ -37,6 +37,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -915,7 +916,7 @@ long do_msgrcv(int msqid, void __user *b
 
ipc_unlock_object(&msq->q_perm);
rcu_read_unlock();
-   schedule();
+   freezable_schedule();
 
/* Lockless receive, part 1:
 * Disable preemption.  We don't hold a reference to the queue
_





From: Maninder Singh 
Subject: kernel/time/hrtimer.c: restart_syscall: use freezable blocking call

Avoid waking up every thread sleeping in a restart_syscall call during
suspend and resume by calling a freezable blocking call.  Previous patches
modified the freezer to avoid sending wakeups to threads that are blocked
in freezable blocking calls.

Ref: https://lkml.org/lkml/2013/5/1/424

Backtrace: 
[] (__schedule+0x0/0x5d8) from [] (schedule+0x8c/0x90)
[] (schedule+0x0/0x90) from [] (schedule_hrtimeout_range_clock+0xdc/0x110)
[] (schedule_hrtimeout_range_clock+0x0/0x110) from [] (schedule_hrtimeout_range+0x1c/0x20)
 r9:d16c9be0 r8:8b7d9c2c r7: r6: r5:d16c8028
[] (schedule_hrtimeout_range+0x0/0x20) from [] (poll_schedule_timeout+0x48/0x6c)
[] (poll_schedule_timeout+0x0/0x6c) from [] (do_sys_poll+0x2c8/0x378)
 r5:d16c9f78 r4:
[] (do_sys_poll+0x0/0x378) from [] (do_restart_poll+0x40/0x5c)
[] (do_restart_poll+0x0/0x5c) from [] (sys_restart_syscall+0x2c/0x30)
 r4:fe7a
[] (sys_restart_syscall+0x0/0x30) from [] (ret_fast_syscall+0x0/0x48)

This call was selected to be converted to a freezable call because it
doesn't hold any locks or release any resources when interrupted that
might be needed by another freezing task or a kernel driver during
suspend, and is a commo

Re: [PATCH kernel v7 04/31] vfio: powerpc/spapr: Use it_page_size

2015-04-01 Thread Alex Williamson

On Thu, 2015-04-02 at 14:40 +1100, Alexey Kardashevskiy wrote:
> On 04/02/2015 01:50 PM, Alex Williamson wrote:
> > On Thu, 2015-04-02 at 13:30 +1100, Alexey Kardashevskiy wrote:
> >> On 04/02/2015 08:48 AM, Alex Williamson wrote:
> >>> On Sat, 2015-03-28 at 01:54 +1100, Alexey Kardashevskiy wrote:
>  This makes use of the it_page_size from the iommu_table struct
>  as page size can differ.
> 
>  This replaces missing IOMMU_PAGE_SHIFT macro in commented debug code
>  as recently introduced IOMMU_PAGE_XXX macros do not include
>  IOMMU_PAGE_SHIFT.
> 
>  Signed-off-by: Alexey Kardashevskiy 
>  Reviewed-by: David Gibson 
>  ---
> drivers/vfio/vfio_iommu_spapr_tce.c | 26 +-
> 1 file changed, 13 insertions(+), 13 deletions(-)
> 
>  diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
>  index f835e63..8bbee22 100644
>  --- a/drivers/vfio/vfio_iommu_spapr_tce.c
>  +++ b/drivers/vfio/vfio_iommu_spapr_tce.c
>  @@ -91,7 +91,7 @@ static int tce_iommu_enable(struct tce_container *container)
>    * enforcing the limit based on the max that the guest can map.
>    */
>   down_write(&current->mm->mmap_sem);
>  -npages = (tbl->it_size << IOMMU_PAGE_SHIFT_4K) >> PAGE_SHIFT;
>  +npages = (tbl->it_size << tbl->it_page_shift) >> PAGE_SHIFT;
>   locked = current->mm->locked_vm + npages;
>   lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
>   if (locked > lock_limit && !capable(CAP_IPC_LOCK)) {
>  @@ -120,7 +120,7 @@ static void tce_iommu_disable(struct tce_container *container)
> 
>   down_write(&current->mm->mmap_sem);
>   current->mm->locked_vm -= (container->tbl->it_size <<
>  -IOMMU_PAGE_SHIFT_4K) >> PAGE_SHIFT;
>  +container->tbl->it_page_shift) >> PAGE_SHIFT;
>   up_write(¤t->mm->mmap_sem);
> }
> 
>  @@ -222,7 +222,7 @@ static long tce_iommu_build(struct tce_container *container,
>   tce, ret);
>   break;
>   }
>  -tce += IOMMU_PAGE_SIZE_4K;
>  +tce += IOMMU_PAGE_SIZE(tbl);
> >>>
> >>>
> >>> Is PAGE_SIZE ever smaller than IOMMU_PAGE_SIZE(tbl)?  IOW, can the page
> >>> we got from get_user_pages_fast() ever not completely fill the tce
> >>> entry?
> >>
> >>
> >> Yes. IOMMU_PAGE_SIZE is 4K/64K/16M (16M is with huge pages enabled in QEMU
> >> with -mem-path), PAGE_SIZE is 4K/64K (normally 64K).
> >
> > Isn't that a problem then that you're filling the tce with processor
> > page sizes via get_user_pages_fast(), but incrementing the tce by the
> > IOMMU page size?  For example, if PAGE_SIZE = 4K and IOMMU_PAGE_SIZE !=
> > 4K, have we really pinned all of the memory backed by the tce?  Where do
> > you make sure the 4K page is really contiguous for the IOMMU page?
> 
> 
> Aaaah. This is just not supported. Instead, after the previous patch 
> ("vfio: powerpc/spapr: Check that TCE page size is equal to it_page_size", 
> which needs a fixed subject), tce_page_is_contained(page4K, 64K) will return
> false and the caller - tce_iommu_build() - will return -EPERM.

Ok, makes sense.  Thanks
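The containment rule discussed above comes down to comparing sizes: a pinned CPU page (including a compound/huge page) can back a TCE only if it is at least as large as the IOMMU page. Below is a minimal sketch of that comparison in plain C, not the driver's code: `tce_page_fits()` is a hypothetical helper, and it takes the CPU page shift and compound order as plain integers instead of inspecting a `struct page`.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of the TCE containment check: a CPU page of
 * 2^(cpu_page_shift + compound_order) bytes can back one IOMMU page of
 * 2^iommu_page_shift bytes only if it is at least that large; otherwise
 * the device could reach memory beyond what was actually pinned.
 */
bool tce_page_fits(unsigned int cpu_page_shift, unsigned int compound_order,
		   unsigned int iommu_page_shift)
{
	return cpu_page_shift + compound_order >= iommu_page_shift;
}
```

For example, `tce_page_fits(12, 0, 16)` (a lone 4K page behind a 64K TCE) is false, which is the case where tce_iommu_build() fails with -EPERM, while a 16M TCE (shift 24) on a 64K-page kernel needs a compound page of order 8 or more.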


[PATCH v2] kselftests: timers: Make set-timer-lat fail more gracefully for !CAP_WAKE_ALARM

2015-04-01 Thread John Stultz

The set-timer-lat test fails when testing CLOCK_BOOTTIME_ALARM
or CLOCK_REALTIME_ALARM when the user isn't running as root or
with CAP_WAKE_ALARM.

So this patch improves the error checking so we report the
issue more clearly and continue rather than reporting a failure.

Cc: Shuah Khan 
Cc: Prarit Bhargava 
Cc: Thomas Gleixner 
Cc: Richard Cochran 
Signed-off-by: John Stultz 
---
v2: Fix a few checkpatch warnings.

 tools/testing/selftests/timers/set-timer-lat.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/tools/testing/selftests/timers/set-timer-lat.c b/tools/testing/selftests/timers/set-timer-lat.c
index 2ed3267..4fc98c5 100644
--- a/tools/testing/selftests/timers/set-timer-lat.c
+++ b/tools/testing/selftests/timers/set-timer-lat.c
@@ -139,6 +139,13 @@ int do_timer(int clock_id, int flags)
 
err = timer_create(clock_id, &se, &tm1);
if (err) {
+   if ((clock_id == CLOCK_REALTIME_ALARM) ||
+   (clock_id == CLOCK_BOOTTIME_ALARM)) {
+   printf("%-22s %s missing CAP_WAKE_ALARM?: [UNSUPPORTED]\n",
+   clockstring(clock_id),
+   flags ? "ABSTIME":"RELTIME");
+   return 0;
+   }
printf("%s - timer_create() failed\n", clockstring(clock_id));
return -1;
}
-- 
1.9.1
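The rule this patch adds can be pulled out into a small helper to make the decision explicit. This is an illustrative sketch rather than the selftest's code: `timer_create_failure_status()` and `report_timer_create_failure()` are hypothetical names, and the fallback values 8 and 9 for the alarm clock IDs are assumed to match the Linux UAPI for C libraries that do not define them.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Fallbacks for C libraries without the alarm clock IDs
 * (assumed to match the Linux UAPI values). */
#ifndef CLOCK_REALTIME_ALARM
#define CLOCK_REALTIME_ALARM 8
#endif
#ifndef CLOCK_BOOTTIME_ALARM
#define CLOCK_BOOTTIME_ALARM 9
#endif

/*
 * Hypothetical helper: given a clock whose timer_create() just failed,
 * return 0 to mean "report UNSUPPORTED and continue" (the alarm clocks,
 * which need CAP_WAKE_ALARM) or -1 to mean a genuine test failure.
 */
int timer_create_failure_status(clockid_t clock_id)
{
	if ((clock_id == CLOCK_REALTIME_ALARM) ||
	    (clock_id == CLOCK_BOOTTIME_ALARM))
		return 0;
	return -1;
}

/* Print a message in the selftest's style and return the status. */
int report_timer_create_failure(const char *name, clockid_t clock_id,
				int flags)
{
	int ret = timer_create_failure_status(clock_id);

	if (ret == 0)
		printf("%-22s %s missing CAP_WAKE_ALARM?: [UNSUPPORTED]\n",
		       name, flags ? "ABSTIME" : "RELTIME");
	else
		printf("%s - timer_create() failed\n", name);
	return ret;
}
```

Like the patch, the sketch treats any timer_create() failure on an alarm clock as a missing capability; additionally checking errno for EPERM would be a stricter variant, but that is an assumption and not something the patch does.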


Re: [PATCH 1/2] kselftests: timers: Make set-timer-lat fail more gracefully for !CAP_WAKE_ALARM

2015-04-01 Thread John Stultz

On Tue, Mar 31, 2015 at 8:55 AM, Shuah Khan  wrote:
> Hi John,
>
> I am seeing checkpatch warnings on this patch. See below.

Sorry!

> On 03/26/2015 05:31 AM, Prarit Bhargava wrote:
>>
>>
>> On 03/25/2015 07:44 PM, John Stultz wrote:
>>> The set-timer-lat test fails when testing CLOCK_BOOTTIME_ALARM
>>> or CLOCK_REALTIME_ALARM when the user isn't running as root or
>>> with CAP_WAKE_ALARM.
>>>
>>> So this patch improves the error checking so we report the
>>> issue more clearly and continue rather than reporting a failure.
>>>
>>> Cc: Shuah Khan 
>>> Cc: Prarit Bhargava 
>>> Cc: Thomas Gleixner 
>>> Cc: Richard Cochran 
>>> Signed-off-by: John Stultz 
>>> Signed-off-by: John Stultz 
>
> WARNING: Duplicate signature
> #115:
> Signed-off-by: John Stultz 

Fixed.

>>> ---
>>>  tools/testing/selftests/timers/set-timer-lat.c | 7 +++
>>>  1 file changed, 7 insertions(+)
>>>
>>> diff --git a/tools/testing/selftests/timers/set-timer-lat.c b/tools/testing/selftests/timers/set-timer-lat.c
>>> index 3ea2eff..dbc9537c 100644
>>> --- a/tools/testing/selftests/timers/set-timer-lat.c
>>> +++ b/tools/testing/selftests/timers/set-timer-lat.c
>>> @@ -139,6 +139,13 @@ int do_timer(int clock_id, int flags)
>>>
>>>  err = timer_create(clock_id, &se, &tm1);
>>>  if (err) {
>>> +if ((clock_id == CLOCK_REALTIME_ALARM)
>>> +|| (clock_id == CLOCK_BOOTTIME_ALARM)) {
>>
>> I dunno if there is actually a CodingStyle rule for this, but I've always seen
>> this written with the operator on the first line:
>
> Yes it would be good to fix this one as well when you re-do the patch.

Fixed.

>>
>>   if ((clock_id == CLOCK_REALTIME_ALARM) ||
>> (clock_id == CLOCK_BOOTTIME_ALARM)) {
>>
>>> +printf("%-22s %s missing CAP_WAKE_ALARM?: [UNSUPPORTED]\n",
>>> +clockstring(clock_id),
>>> +flags ? "ABSTIME":"RELTIME");
>>
>
> WARNING: line over 80 characters
> #130: FILE: tools/testing/selftests/timers/set-timer-lat.c:144:
> +   printf("%-22s %s missing CAP_WAKE_ALARM?: [UNSUPPORTED]\n",

This I probably will leave, as the alternative is breaking the text
string across lines, which checkpatch will also gripe about.

thanks
-john

[PATCH 00/21] 4.1 time and rtc changes for tip/timers/core

2015-04-01 Thread John Stultz

Hey Ingo, Thomas, Peter,

I wanted to send along my remaining 4.1 queue, which contains:

* y2038 fixes for the timekeeping persistent- and boot-clock interfaces.
  (Xunlei)
* y2038 fixes for RTC drivers (Xunlei)
* Small suspend/resume timing fixes (Xunlei)
* Minor cleanups requested by Ingo (Me)

Let me know if you have any objections.

thanks
-john

Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Arnd Bergmann 
Cc: Xunlei Pang 


John Stultz (3):
  clocksource: Improve comment explaining clocks_calc_max_nsecs()'s 50%
    safety margin
  timekeeping: Change timekeeping_check_update() to take a tk_read_base
  time: Rework debugging variables so they aren't global

Xunlei Pang (18):
  time: Add y2038 safe read_boot_clock64()
  time: Add y2038 safe read_persistent_clock64()
  time: Add y2038 safe update_persistent_clock64()
  ARM: OMAP: 32k counter: Provide y2038-safe
    omap_read_persistent_clock() replacement
  ARM: tegra: clock: Provide y2038-safe tegra_read_persistent_clock()
    replacement
  ARM: time: Provide read_boot_clock64() and read_persistent_clock64()
  rtc: Provide y2038 safe rtc_class_ops.set_mmss() replacement
  rtc/test: Update driver to address y2038/y2106 issues
  rtc/ab3100: Update driver to address y2038/y2106 issues
  rtc/mc13xxx: Update driver to address y2038/y2106 issues
  rtc/mxc: Modify rtc_update_alarm() not to touch the alarm time
  rtc/mxc: Convert get_alarm_or_time()/set_alarm_or_time() to use
    time64_t
  rtc/mxc: Update driver to address y2038/y2106 issues
  alpha: rtc: Change to use rtc_class_ops's set_mmss64()
  time: Don't build timekeeping_inject_sleeptime64() if no one uses it
  rtc: Remove redundant rtc_valid_tm() from rtc_resume()
  time: Fix a bug in timekeeping_suspend() with no persistent clock
  time: rtc: Don't bother into rtc_resume() for the nonstop clocksource

 arch/alpha/kernel/rtc.c |   8 +-
 arch/arm/include/asm/mach/time.h|   3 +-
 arch/arm/kernel/time.c  |   6 +-
 arch/arm/plat-omap/counter_32k.c|  18 ++--
 arch/mips/lasat/sysctl.c|   4 +-
 drivers/clocksource/tegra20_timer.c |  15 ++-
 drivers/rtc/class.c |   8 +-
 drivers/rtc/interface.c |   8 +-
 drivers/rtc/rtc-ab3100.c|  55 ++-
 drivers/rtc/rtc-mc13xxx.c   |  32 +++
 drivers/rtc/rtc-mxc.c   |  55 ---
 drivers/rtc/rtc-test.c  |  19 +++-
 drivers/rtc/systohc.c   |   7 +-
 include/linux/rtc.h |   1 +
 include/linux/timekeeper_internal.h |  18 +++-
 include/linux/timekeeping.h |  12 +--
 kernel/time/clocksource.c   |   7 +-
 kernel/time/ntp.c   |  13 ++-
 kernel/time/timekeeping.c   | 178 +---
 19 files changed, 261 insertions(+), 206 deletions(-)

-- 
1.9.1


Re: [PATCH kernel v7 04/31] vfio: powerpc/spapr: Use it_page_size

2015-04-01 Thread Alexey Kardashevskiy


On 04/02/2015 01:50 PM, Alex Williamson wrote:

On Thu, 2015-04-02 at 13:30 +1100, Alexey Kardashevskiy wrote:

On 04/02/2015 08:48 AM, Alex Williamson wrote:

On Sat, 2015-03-28 at 01:54 +1100, Alexey Kardashevskiy wrote:

This makes use of the it_page_size from the iommu_table struct
as page size can differ.

This replaces missing IOMMU_PAGE_SHIFT macro in commented debug code
as recently introduced IOMMU_PAGE_XXX macros do not include
IOMMU_PAGE_SHIFT.

Signed-off-by: Alexey Kardashevskiy 
Reviewed-by: David Gibson 
---
   drivers/vfio/vfio_iommu_spapr_tce.c | 26 +-
   1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
index f835e63..8bbee22 100644
--- a/drivers/vfio/vfio_iommu_spapr_tce.c
+++ b/drivers/vfio/vfio_iommu_spapr_tce.c
@@ -91,7 +91,7 @@ static int tce_iommu_enable(struct tce_container *container)
 * enforcing the limit based on the max that the guest can map.
 */
down_write(&current->mm->mmap_sem);
-   npages = (tbl->it_size << IOMMU_PAGE_SHIFT_4K) >> PAGE_SHIFT;
+   npages = (tbl->it_size << tbl->it_page_shift) >> PAGE_SHIFT;
locked = current->mm->locked_vm + npages;
lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
if (locked > lock_limit && !capable(CAP_IPC_LOCK)) {
@@ -120,7 +120,7 @@ static void tce_iommu_disable(struct tce_container *container)

down_write(&current->mm->mmap_sem);
current->mm->locked_vm -= (container->tbl->it_size <<
-   IOMMU_PAGE_SHIFT_4K) >> PAGE_SHIFT;
+   container->tbl->it_page_shift) >> PAGE_SHIFT;
up_write(¤t->mm->mmap_sem);
   }

@@ -222,7 +222,7 @@ static long tce_iommu_build(struct tce_container *container,
tce, ret);
break;
}
-   tce += IOMMU_PAGE_SIZE_4K;
+   tce += IOMMU_PAGE_SIZE(tbl);



Is PAGE_SIZE ever smaller than IOMMU_PAGE_SIZE(tbl)?  IOW, can the page
we got from get_user_pages_fast() ever not completely fill the tce
entry?



Yes. IOMMU_PAGE_SIZE is 4K/64K/16M (16M is with huge pages enabled in QEMU
with -mem-path), PAGE_SIZE is 4K/64K (normally 64K).


Isn't that a problem then that you're filling the tce with processor
page sizes via get_user_pages_fast(), but incrementing the tce by the
IOMMU page size?  For example, if PAGE_SIZE = 4K and IOMMU_PAGE_SIZE !=
4K, have we really pinned all of the memory backed by the tce?  Where do
you make sure the 4K page is really contiguous for the IOMMU page?



Aaaah. This is just not supported. Instead, after the previous patch 
("vfio: powerpc/spapr: Check that TCE page size is equal to it_page_size", 
which needs a fixed subject), tce_page_is_contained(page4K, 64K) will return
false and the caller - tce_iommu_build() - will return -EPERM.
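The accounting in the tce_iommu_enable() hunk converts the DMA window into CPU pages: it_size TCEs of 2^it_page_shift bytes each, divided by the CPU page size. A worked sketch with hypothetical numbers (a 32768-entry table); `tce_window_cpu_pages()` is an illustrative name, not a driver function.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Model of the npages computation in tce_iommu_enable(): the window spans
 * it_size TCEs of 2^it_page_shift bytes; shifting down by the CPU page
 * shift gives the number of pages charged against RLIMIT_MEMLOCK.
 * A 64-bit type keeps large windows from overflowing.
 */
uint64_t tce_window_cpu_pages(uint64_t it_size, unsigned int it_page_shift,
			      unsigned int cpu_page_shift)
{
	return (it_size << it_page_shift) >> cpu_page_shift;
}
```

With 64K IOMMU pages on a 64K-page kernel, `tce_window_cpu_pages(32768, 16, 16)` is 32768; using the old hard-coded 4K shift instead, `tce_window_cpu_pages(32768, 12, 16)` is only 2048, an undercount by a factor of 16.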



--
Alexey

[PATCH 03/21] time: Add y2038 safe update_persistent_clock64()

2015-04-01 Thread John Stultz

From: Xunlei Pang 

As part of addressing in-kernel y2038 issues, this patch adds
update_persistent_clock64() and replaces all the call sites of
update_persistent_clock() with this function. This is a __weak
implementation, which simply calls the existing y2038-unsafe
update_persistent_clock().

This allows architecture specific implementations to be converted
independently, and eventually the y2038-unsafe update_persistent_clock()
can be removed after all its architecture specific implementations
have been converted to update_persistent_clock64().

Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Suggested-by: Arnd Bergmann 
Signed-off-by: Xunlei Pang 
Signed-off-by: John Stultz 
---
 drivers/rtc/systohc.c   |  2 +-
 include/linux/timekeeping.h |  1 +
 kernel/time/ntp.c   | 13 -
 3 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/rtc/systohc.c b/drivers/rtc/systohc.c
index eb71872..ef3c07a 100644
--- a/drivers/rtc/systohc.c
+++ b/drivers/rtc/systohc.c
@@ -11,7 +11,7 @@
  * rtc_set_ntp_time - Save NTP synchronized time to the RTC
  * @now: Current time of day
  *
- * Replacement for the NTP platform function update_persistent_clock
+ * Replacement for the NTP platform function update_persistent_clock64
  * that stores time for later retrieval by rtc_hctosys.
  *
  * Returns 0 on successful RTC update, -ENODEV if a RTC update is not
diff --git a/include/linux/timekeeping.h b/include/linux/timekeeping.h
index 4c0f76f..7a2369d 100644
--- a/include/linux/timekeeping.h
+++ b/include/linux/timekeeping.h
@@ -272,6 +272,7 @@ extern void read_persistent_clock64(struct timespec64 *ts);
 extern void read_boot_clock(struct timespec *ts);
 extern void read_boot_clock64(struct timespec64 *ts);
 extern int update_persistent_clock(struct timespec now);
+extern int update_persistent_clock64(struct timespec64 now);
 
 
 #endif
diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c
index 0f60b08..42d1bc7 100644
--- a/kernel/time/ntp.c
+++ b/kernel/time/ntp.c
@@ -459,6 +459,16 @@ out:
return leap;
 }
 
+#ifdef CONFIG_GENERIC_CMOS_UPDATE
+int __weak update_persistent_clock64(struct timespec64 now64)
+{
+   struct timespec now;
+
+   now = timespec64_to_timespec(now64);
+   return update_persistent_clock(now);
+}
+#endif
+
 #if defined(CONFIG_GENERIC_CMOS_UPDATE) || defined(CONFIG_RTC_SYSTOHC)
 static void sync_cmos_clock(struct work_struct *work);
 
@@ -494,8 +504,9 @@ static void sync_cmos_clock(struct work_struct *work)
if (persistent_clock_is_local)
adjust.tv_sec -= (sys_tz.tz_minuteswest * 60);
 #ifdef CONFIG_GENERIC_CMOS_UPDATE
-   fail = update_persistent_clock(timespec64_to_timespec(adjust));
+   fail = update_persistent_clock64(adjust);
 #endif
+
 #ifdef CONFIG_RTC_SYSTOHC
if (fail == -ENODEV)
fail = rtc_set_ntp_time(adjust);
-- 
1.9.1
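The __weak default is what lets sync_cmos_clock() call update_persistent_clock64() unconditionally while architectures that still implement only update_persistent_clock() keep working. The same linkage pattern can be sketched in userspace, assuming GCC or Clang (`__attribute__((weak))`); the struct and function names below are hypothetical stand-ins, not kernel definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins: 32-bit-seconds "legacy" and 64-bit timespecs. */
struct legacy_ts { int32_t tv_sec; int32_t tv_nsec; };
struct ts64 { int64_t tv_sec; long tv_nsec; };

/* Records what the legacy hook last received (for illustration only). */
int32_t last_legacy_sec;

/* Existing y2038-unsafe hook, as an unconverted architecture provides. */
int update_clock_legacy(struct legacy_ts now)
{
	last_legacy_sec = now.tv_sec;
	return 0;
}

/*
 * Weak default for the new 64-bit hook: narrow the value and forward it
 * to the legacy hook. A strong update_clock64() defined in another
 * object file replaces this definition at link time.
 */
__attribute__((weak)) int update_clock64(struct ts64 now64)
{
	struct legacy_ts now = {
		.tv_sec = (int32_t)now64.tv_sec,	/* truncated after 2038 */
		.tv_nsec = (int32_t)now64.tv_nsec,
	};

	return update_clock_legacy(now);
}
```

An architecture-style override would simply define a non-weak `update_clock64()` elsewhere; the linker then picks that one and the narrowing fallback is never used.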


[PATCH 04/21] ARM: OMAP: 32k counter: Provide y2038-safe omap_read_persistent_clock() replacement

2015-04-01 Thread John Stultz

From: Xunlei Pang 

As part of addressing the "y2038 problem" for in-kernel uses, this
patch adds the y2038-safe omap_read_persistent_clock64() using
timespec64.

Because converting ARM multiarch support relies on some subsequent
changes, omap_read_persistent_clock() will be removed once those
are in place.

Also remove the needless spinlock, because read_persistent_clock()
never runs concurrently.

Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Acked-by: Tony Lindgren 
Signed-off-by: Xunlei Pang 
Signed-off-by: John Stultz 
---
 arch/arm/plat-omap/counter_32k.c | 22 --
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/arch/arm/plat-omap/counter_32k.c b/arch/arm/plat-omap/counter_32k.c
index 43cf745..b7b7b07 100644
--- a/arch/arm/plat-omap/counter_32k.c
+++ b/arch/arm/plat-omap/counter_32k.c
@@ -44,24 +44,20 @@ static u64 notrace omap_32k_read_sched_clock(void)
 }
 
 /**
- * omap_read_persistent_clock -  Return time from a persistent clock.
+ * omap_read_persistent_clock64 -  Return time from a persistent clock.
  *
  * Reads the time from a source which isn't disabled during PM, the
  * 32k sync timer.  Convert the cycles elapsed since last read into
- * nsecs and adds to a monotonically increasing timespec.
+ * nsecs and adds to a monotonically increasing timespec64.
  */
-static struct timespec persistent_ts;
+static struct timespec64 persistent_ts;
 static cycles_t cycles;
 static unsigned int persistent_mult, persistent_shift;
-static DEFINE_SPINLOCK(read_persistent_clock_lock);
 
-static void omap_read_persistent_clock(struct timespec *ts)
+static void omap_read_persistent_clock64(struct timespec64 *ts)
 {
unsigned long long nsecs;
cycles_t last_cycles;
-   unsigned long flags;
-
-   spin_lock_irqsave(&read_persistent_clock_lock, flags);
 
last_cycles = cycles;
cycles = sync32k_cnt_reg ? readl_relaxed(sync32k_cnt_reg) : 0;
@@ -69,11 +65,17 @@ static void omap_read_persistent_clock(struct timespec *ts)
nsecs = clocksource_cyc2ns(cycles - last_cycles,
persistent_mult, persistent_shift);
 
-   timespec_add_ns(&persistent_ts, nsecs);
+   timespec64_add_ns(&persistent_ts, nsecs);
 
*ts = persistent_ts;
+}
+
+static void omap_read_persistent_clock(struct timespec *ts)
+{
+   struct timespec64 ts64;
 
-   spin_unlock_irqrestore(&read_persistent_clock_lock, flags);
+   omap_read_persistent_clock64(&ts64);
+   *ts = timespec64_to_timespec(ts64);
 }
 
 /**
-- 
1.9.1
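omap_read_persistent_clock64() turns the 32k counter's elapsed cycles into nanoseconds with a mult/shift pair, as clocksource_cyc2ns() does ((cycles * mult) >> shift), and accumulates them into persistent_ts. A userspace sketch, assuming a 32768 Hz counter and the illustrative pair mult = 1000000000, shift = 15 (the driver computes its own persistent_mult and persistent_shift); all names here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

struct ts64 {
	int64_t tv_sec;
	int64_t tv_nsec;
};

/* Same formula as clocksource_cyc2ns(): (cycles * mult) >> shift. */
static uint64_t cyc2ns(uint64_t cycles, uint32_t mult, uint32_t shift)
{
	return (cycles * mult) >> shift;
}

/* Add nanoseconds to a timespec-like value, carrying whole seconds. */
static void ts64_add_ns(struct ts64 *ts, uint64_t ns)
{
	ts->tv_sec += (int64_t)(ns / NSEC_PER_SEC);
	ts->tv_nsec += (int64_t)(ns % NSEC_PER_SEC);
	if (ts->tv_nsec >= (int64_t)NSEC_PER_SEC) {
		ts->tv_nsec -= (int64_t)NSEC_PER_SEC;
		ts->tv_sec++;
	}
}

struct persistent_clock {
	struct ts64 ts;		/* running timestamp */
	uint32_t last;		/* previous raw counter value */
	uint32_t mult, shift;	/* cycles -> ns conversion factors */
};

/* No lock, like the patched driver: callers are assumed not to race. */
void persistent_read(struct persistent_clock *pc, uint32_t now,
		     struct ts64 *out)
{
	/* Unsigned 32-bit subtraction stays correct across one counter wrap. */
	uint32_t delta = now - pc->last;

	pc->last = now;
	ts64_add_ns(&pc->ts, cyc2ns(delta, pc->mult, pc->shift));
	*out = pc->ts;
}
```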


[PATCH 05/21] ARM: tegra: clock: Provide y2038-safe tegra_read_persistent_clock() replacement

2015-04-01 Thread John Stultz

From: Xunlei Pang 

As part of addressing the "y2038 problem" for in-kernel uses, this
patch adds the y2038-safe tegra_read_persistent_clock64() using
timespec64.

Because converting ARM multiarch support relies on some subsequent
changes, tegra_read_persistent_clock() will be removed once those
are in place.

Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Acked-by: Thierry Reding 
Signed-off-by: Xunlei Pang 
Signed-off-by: John Stultz 
---
 drivers/clocksource/tegra20_timer.c | 21 ++---
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/drivers/clocksource/tegra20_timer.c b/drivers/clocksource/tegra20_timer.c
index d8a3a4e..4a0a603 100644
--- a/drivers/clocksource/tegra20_timer.c
+++ b/drivers/clocksource/tegra20_timer.c
@@ -51,7 +51,7 @@
 static void __iomem *timer_reg_base;
 static void __iomem *rtc_base;
 
-static struct timespec persistent_ts;
+static struct timespec64 persistent_ts;
 static u64 persistent_ms, last_persistent_ms;
 
 static struct delay_timer tegra_delay_timer;
@@ -120,26 +120,33 @@ static u64 tegra_rtc_read_ms(void)
 }
 
 /*
- * tegra_read_persistent_clock -  Return time from a persistent clock.
+ * tegra_read_persistent_clock64 -  Return time from a persistent clock.
  *
  * Reads the time from a source which isn't disabled during PM, the
  * 32k sync timer.  Convert the cycles elapsed since last read into
- * nsecs and adds to a monotonically increasing timespec.
+ * nsecs and adds to a monotonically increasing timespec64.
  * Care must be taken that this funciton is not called while the
  * tegra_rtc driver could be executing to avoid race conditions
  * on the RTC shadow register
  */
-static void tegra_read_persistent_clock(struct timespec *ts)
+static void tegra_read_persistent_clock64(struct timespec64 *ts)
 {
u64 delta;
-   struct timespec *tsp = &persistent_ts;
 
last_persistent_ms = persistent_ms;
persistent_ms = tegra_rtc_read_ms();
delta = persistent_ms - last_persistent_ms;
 
-   timespec_add_ns(tsp, delta * NSEC_PER_MSEC);
-   *ts = *tsp;
+   timespec64_add_ns(&persistent_ts, delta * NSEC_PER_MSEC);
+   *ts = persistent_ts;
+}
+
+static void tegra_read_persistent_clock(struct timespec *ts)
+{
+   struct timespec64 ts64;
+
+   tegra_read_persistent_clock64(&ts64);
+   *ts = timespec64_to_timespec(ts64);
 }
 
 static unsigned long tegra_delay_timer_read_counter_long(void)
-- 
1.9.1
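The split into a 64-bit core plus a thin legacy wrapper can be modelled in userspace. In this sketch (hypothetical names throughout; a settable millisecond counter stands in for tegra_rtc_read_ms()), the wrapper's temporary has the 64-bit type because that is what the core fills in, and only the final copy narrows.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC  1000000000LL
#define NSEC_PER_MSEC 1000000LL

/* Hypothetical stand-ins for struct timespec64 and a 32-bit timespec. */
struct ts64 { int64_t tv_sec; int64_t tv_nsec; };
struct ts32 { int32_t tv_sec; int32_t tv_nsec; };

/* Assumed RTC source: a millisecond counter the caller can set. */
uint64_t fake_rtc_ms;

static struct ts64 persistent_ts;
static uint64_t persistent_ms, last_persistent_ms;

/* 64-bit core: fold the elapsed milliseconds into the running timestamp. */
void read_clock64(struct ts64 *ts)
{
	int64_t delta_ns;

	last_persistent_ms = persistent_ms;
	persistent_ms = fake_rtc_ms;
	delta_ns = (int64_t)(persistent_ms - last_persistent_ms) * NSEC_PER_MSEC;

	persistent_ts.tv_sec += delta_ns / NSEC_PER_SEC;
	persistent_ts.tv_nsec += delta_ns % NSEC_PER_SEC;
	if (persistent_ts.tv_nsec >= NSEC_PER_SEC) {
		persistent_ts.tv_nsec -= NSEC_PER_SEC;
		persistent_ts.tv_sec++;
	}
	*ts = persistent_ts;
}

/* Legacy wrapper: 64-bit temporary, narrowed only on the final copy. */
void read_clock(struct ts32 *ts)
{
	struct ts64 ts64;

	read_clock64(&ts64);
	ts->tv_sec = (int32_t)ts64.tv_sec;	/* truncated after 2038 */
	ts->tv_nsec = (int32_t)ts64.tv_nsec;
}
```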


[PATCH 02/21] time: Add y2038 safe read_persistent_clock64()

2015-04-01 Thread John Stultz

From: Xunlei Pang 

As part of addressing in-kernel y2038 issues, this patch adds
read_persistent_clock64() and replaces all the call sites of
read_persistent_clock() with this function. This is a __weak
implementation, which simply calls the existing y2038-unsafe
read_persistent_clock().

This allows architecture specific implementations to be converted
independently, and eventually the y2038-unsafe read_persistent_clock()
can be removed after all its architecture specific implementations
have been converted to read_persistent_clock64().

Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Suggested-by: Arnd Bergmann 
Signed-off-by: Xunlei Pang 
Signed-off-by: John Stultz 
---
 arch/mips/lasat/sysctl.c|  4 ++--
 include/linux/timekeeping.h |  1 +
 kernel/time/timekeeping.c   | 22 --
 3 files changed, 15 insertions(+), 12 deletions(-)

diff --git a/arch/mips/lasat/sysctl.c b/arch/mips/lasat/sysctl.c
index 3b7f65c..cf9b4633 100644
--- a/arch/mips/lasat/sysctl.c
+++ b/arch/mips/lasat/sysctl.c
@@ -75,11 +75,11 @@ static int rtctmp;
 int proc_dolasatrtc(struct ctl_table *table, int write,
   void *buffer, size_t *lenp, loff_t *ppos)
 {
-   struct timespec ts;
+   struct timespec64 ts;
int r;
 
if (!write) {
-   read_persistent_clock(&ts);
+   read_persistent_clock64(&ts);
rtctmp = ts.tv_sec;
/* check for time < 0 and set to 0 */
if (rtctmp < 0)
diff --git a/include/linux/timekeeping.h b/include/linux/timekeeping.h
index 18d27a3..4c0f76f 100644
--- a/include/linux/timekeeping.h
+++ b/include/linux/timekeeping.h
@@ -268,6 +268,7 @@ static inline bool has_persistent_clock(void)
 }
 
 extern void read_persistent_clock(struct timespec *ts);
+extern void read_persistent_clock64(struct timespec64 *ts);
 extern void read_boot_clock(struct timespec *ts);
 extern void read_boot_clock64(struct timespec64 *ts);
 extern int update_persistent_clock(struct timespec now);
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 50c4bec..39df498 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -1173,6 +1173,14 @@ void __weak read_persistent_clock(struct timespec *ts)
ts->tv_nsec = 0;
 }
 
+void __weak read_persistent_clock64(struct timespec64 *ts64)
+{
+   struct timespec ts;
+
+   read_persistent_clock(&ts);
+   *ts64 = timespec_to_timespec64(ts);
+}
+
 /**
  * read_boot_clock -  Return time of the system start.
  *
@@ -1205,10 +1213,8 @@ void __init timekeeping_init(void)
struct clocksource *clock;
unsigned long flags;
struct timespec64 now, boot, tmp;
-   struct timespec ts;
 
-   read_persistent_clock(&ts);
-   now = timespec_to_timespec64(ts);
+   read_persistent_clock64(&now);
if (!timespec64_valid_strict(&now)) {
pr_warn("WARNING: Persistent clock returned invalid value!\n"
" Check your CMOS/BIOS settings.\n");
@@ -1278,7 +1284,7 @@ static void __timekeeping_inject_sleeptime(struct timekeeper *tk,
  * timekeeping_inject_sleeptime64 - Adds suspend interval to timeekeeping values
  * @delta: pointer to a timespec64 delta value
  *
- * This hook is for architectures that cannot support read_persistent_clock
+ * This hook is for architectures that cannot support read_persistent_clock64
  * because their RTC/persistent clock is only accessible when irqs are enabled.
  *
  * This function should only be called by rtc_resume(), and allows
@@ -1325,12 +1331,10 @@ void timekeeping_resume(void)
struct clocksource *clock = tk->tkr_mono.clock;
unsigned long flags;
struct timespec64 ts_new, ts_delta;
-   struct timespec tmp;
cycle_t cycle_now, cycle_delta;
bool suspendtime_found = false;
 
-   read_persistent_clock(&tmp);
-   ts_new = timespec_to_timespec64(tmp);
+   read_persistent_clock64(&ts_new);
 
clockevents_resume();
clocksource_resume();
@@ -1408,10 +1412,8 @@ int timekeeping_suspend(void)
unsigned long flags;
struct timespec64   delta, delta_delta;
static struct timespec64   old_delta;
-   struct timespec tmp;
 
-   read_persistent_clock(&tmp);
-   timekeeping_suspend_time = timespec_to_timespec64(tmp);
+   read_persistent_clock64(&timekeeping_suspend_time);
 
/*
 * On some systems the persistent_clock can not be detected at
-- 
1.9.1
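The reason for read_persistent_clock64() is the range of a signed 32-bit seconds value, which is what tv_sec in struct timespec is on 32-bit kernels: it ends at 2147483647, i.e. 2038-01-19 03:14:07 UTC. A small userspace illustration, assuming a host with a 64-bit time_t; the helper names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/*
 * Convert seconds since the epoch to UTC calendar fields using the host's
 * time_t (assumed 64-bit). Returns 0 on success, -1 on failure.
 */
int utc_fields(int64_t secs, struct tm *out)
{
	time_t t = (time_t)secs;

	return gmtime_r(&t, out) ? 0 : -1;
}

/*
 * Value a signed 32-bit seconds field would hold for the same instant.
 * Going through uint32_t avoids signed overflow; the final conversion is
 * implementation-defined, and wraps modulo 2^32 on GCC and Clang.
 */
int32_t as_32bit_seconds(int64_t secs)
{
	return (int32_t)(uint32_t)secs;
}
```

One second past the limit, `as_32bit_seconds((int64_t)INT32_MAX + 1)` yields INT32_MIN, a date in December 1901, which is the wrap the timespec64 conversion avoids.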


[PATCH 09/21] rtc/ab3100: Update driver to address y2038/y2106 issues

2015-04-01 Thread John Stultz

From: Xunlei Pang 

This driver has a number of y2038/y2106 issues.

This patch resolves them by:
- Replace rtc_tm_to_time() with rtc_tm_to_time64()
- Replace rtc_time_to_tm() with rtc_time64_to_tm()
- Change ab3100_rtc_set_mmss() to use rtc_class_ops's set_mmss64()

After this patch, the driver should not have any remaining
y2038/y2106 issues.
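The driver stores time as a 48-bit counter split across six byte registers, written as secs * AB3100_RTC_CLOCK_RATE * 2. A round-trip sketch of that packing using a 64-bit seconds value, as the patch does with time64_t; the value 32768 for AB3100_RTC_CLOCK_RATE is an assumption (it is not shown here), and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value; the driver's real definition is not shown in this patch. */
#define AB3100_RTC_CLOCK_RATE 32768

/* Split a seconds count into six little-endian counter bytes (TI0..TI5),
 * mirroring ab3100_rtc_set_mmss(). */
void ab3100_pack(int64_t secs, uint8_t buf[6])
{
	uint64_t hw_counter = (uint64_t)secs * AB3100_RTC_CLOCK_RATE * 2;
	int i;

	for (i = 0; i < 6; i++)
		buf[i] = (hw_counter >> (8 * i)) & 0xFF;
}

/* Reassemble the 48-bit counter and convert back to seconds,
 * mirroring ab3100_rtc_read_time(). */
int64_t ab3100_unpack(const uint8_t buf[6])
{
	uint64_t hw_counter = 0;
	int i;

	for (i = 0; i < 6; i++)
		hw_counter |= (uint64_t)buf[i] << (8 * i);
	return hw_counter / (uint64_t)(AB3100_RTC_CLOCK_RATE * 2);
}
```

Under that assumption one second is 65536 counter units, so the 48-bit counter holds 2^32 seconds; the alarm registers (AL0..AL3) hold bits 16..47 of the same counter, which is why the alarm code shifts by 16 through 40.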

Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Acked-by: Alessandro Zummo 
Acked-by: Linus Walleij 
Signed-off-by: Xunlei Pang 
Signed-off-by: John Stultz 
---
 drivers/rtc/rtc-ab3100.c | 55 
 1 file changed, 27 insertions(+), 28 deletions(-)

diff --git a/drivers/rtc/rtc-ab3100.c b/drivers/rtc/rtc-ab3100.c
index 1d0340f..9b725c5 100644
--- a/drivers/rtc/rtc-ab3100.c
+++ b/drivers/rtc/rtc-ab3100.c
@@ -43,21 +43,21 @@
 /*
  * RTC clock functions and device struct declaration
  */
-static int ab3100_rtc_set_mmss(struct device *dev, unsigned long secs)
+static int ab3100_rtc_set_mmss(struct device *dev, time64_t secs)
 {
u8 regs[] = {AB3100_TI0, AB3100_TI1, AB3100_TI2,
 AB3100_TI3, AB3100_TI4, AB3100_TI5};
unsigned char buf[6];
-   u64 fat_time = (u64) secs * AB3100_RTC_CLOCK_RATE * 2;
+   u64 hw_counter = secs * AB3100_RTC_CLOCK_RATE * 2;
int err = 0;
int i;
 
-   buf[0] = (fat_time) & 0xFF;
-   buf[1] = (fat_time >> 8) & 0xFF;
-   buf[2] = (fat_time >> 16) & 0xFF;
-   buf[3] = (fat_time >> 24) & 0xFF;
-   buf[4] = (fat_time >> 32) & 0xFF;
-   buf[5] = (fat_time >> 40) & 0xFF;
+   buf[0] = (hw_counter) & 0xFF;
+   buf[1] = (hw_counter >> 8) & 0xFF;
+   buf[2] = (hw_counter >> 16) & 0xFF;
+   buf[3] = (hw_counter >> 24) & 0xFF;
+   buf[4] = (hw_counter >> 32) & 0xFF;
+   buf[5] = (hw_counter >> 40) & 0xFF;
 
for (i = 0; i < 6; i++) {
err = abx500_set_register_interruptible(dev, 0,
@@ -75,7 +75,7 @@ static int ab3100_rtc_set_mmss(struct device *dev, unsigned long secs)
 
 static int ab3100_rtc_read_time(struct device *dev, struct rtc_time *tm)
 {
-   unsigned long time;
+   time64_t time;
u8 rtcval;
int err;
 
@@ -88,7 +88,7 @@ static int ab3100_rtc_read_time(struct device *dev, struct rtc_time *tm)
dev_info(dev, "clock not set (lost power)");
return -EINVAL;
} else {
-   u64 fat_time;
+   u64 hw_counter;
u8 buf[6];
 
/* Read out time registers */
@@ -98,22 +98,21 @@ static int ab3100_rtc_read_time(struct device *dev, struct rtc_time *tm)
if (err != 0)
return err;
 
-   fat_time = ((u64) buf[5] << 40) | ((u64) buf[4] << 32) |
+   hw_counter = ((u64) buf[5] << 40) | ((u64) buf[4] << 32) |
((u64) buf[3] << 24) | ((u64) buf[2] << 16) |
((u64) buf[1] << 8) | (u64) buf[0];
-   time = (unsigned long) (fat_time /
-   (u64) (AB3100_RTC_CLOCK_RATE * 2));
+   time = hw_counter / (u64) (AB3100_RTC_CLOCK_RATE * 2);
}
 
-   rtc_time_to_tm(time, tm);
+   rtc_time64_to_tm(time, tm);
 
return rtc_valid_tm(tm);
 }
 
 static int ab3100_rtc_read_alarm(struct device *dev, struct rtc_wkalrm *alarm)
 {
-   unsigned long time;
-   u64 fat_time;
+   time64_t time;
+   u64 hw_counter;
u8 buf[6];
u8 rtcval;
int err;
@@ -134,11 +133,11 @@ static int ab3100_rtc_read_alarm(struct device *dev, struct rtc_wkalrm *alarm)
 AB3100_AL0, buf, 4);
if (err)
return err;
-   fat_time = ((u64) buf[3] << 40) | ((u64) buf[2] << 32) |
+   hw_counter = ((u64) buf[3] << 40) | ((u64) buf[2] << 32) |
((u64) buf[1] << 24) | ((u64) buf[0] << 16);
-   time = (unsigned long) (fat_time / (u64) (AB3100_RTC_CLOCK_RATE * 2));
+   time = hw_counter / (u64) (AB3100_RTC_CLOCK_RATE * 2);
 
-   rtc_time_to_tm(time, &alarm->time);
+   rtc_time64_to_tm(time, &alarm->time);
 
return rtc_valid_tm(&alarm->time);
 }
@@ -147,17 +146,17 @@ static int ab3100_rtc_set_alarm(struct device *dev, struct rtc_wkalrm *alarm)
 {
u8 regs[] = {AB3100_AL0, AB3100_AL1, AB3100_AL2, AB3100_AL3};
unsigned char buf[4];
-   unsigned long secs;
-   u64 fat_time;
+   time64_t secs;
+   u64 hw_counter;
int err;
int i;
 
-   rtc_tm_to_time(&alarm->time, &secs);
-   fat_time = (u64) secs * AB3100_RTC_CLOCK_RATE * 2;
-   buf[0] = (fat_time >> 16) & 0xFF;
-   buf[1] = (fat_time >> 24) & 0xFF;
-   buf[2] = (fat_time >> 32) & 0xFF;
-   buf[3] = (fat_time >> 40) & 0xFF;
+   secs = rtc_tm_to_time64(&alarm->time);
+   hw_counter = secs * AB3100_RTC_CLOCK_RATE * 2;
+   buf[0] = (hw_counter >> 16) & 0xFF;
+   buf[1] = (

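For readers following the ab3100 conversion: the driver keeps time as a 48-bit hardware counter spread over six byte-wide registers, computed as secs * AB3100_RTC_CLOCK_RATE * 2. The sketch below is a userspace model of that packing and unpacking using 64-bit seconds. The helper names are hypothetical, and the 32768 Hz rate is an assumption standing in for the driver's constant, which is not shown in this excerpt.

```c
#include <stdint.h>

typedef int64_t time64_t; /* userspace stand-in for the kernel type */

/* Hypothetical rate, assumed to match the driver's AB3100_RTC_CLOCK_RATE. */
#define RTC_CLOCK_RATE 32768

/* Pack seconds into six little-endian bytes, mirroring set_mmss(). */
static void secs_to_counter_bytes(time64_t secs, uint8_t buf[6])
{
	uint64_t hw_counter = (uint64_t)secs * RTC_CLOCK_RATE * 2;
	int i;

	for (i = 0; i < 6; i++)
		buf[i] = (uint8_t)((hw_counter >> (8 * i)) & 0xFF);
}

/* Rebuild the 48-bit counter and convert back to seconds, mirroring read_time(). */
static time64_t counter_bytes_to_secs(const uint8_t buf[6])
{
	uint64_t hw_counter = 0;
	int i;

	for (i = 5; i >= 0; i--)
		hw_counter = (hw_counter << 8) | buf[i];
	return (time64_t)(hw_counter / (uint64_t)(RTC_CLOCK_RATE * 2));
}
```

Round-tripping a value just past 2^31 seconds (the signed 32-bit limit) keeps the full 64-bit seconds value, since under the assumed rate 2^31 * 65536 = 2^47 still fits in the 48-bit counter.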
Re: [PATCH v5 02/10] pci: move pci_msi_init_pci_dev to probe.c

2015-04-01 Thread Fam Zheng

On Sun, 03/29 17:04, Michael S. Tsirkin wrote:
> commit d5dea7d95c48d7bc951cee4910a7fd9c0cd26fb0
> "PCI: msi: Disable msi interrupts when we initialize a pci device"
> fixes kexec when the booting kernel does not enable msi interrupts.
> 
> Unfortunately the relevant functionality is in msi.c so it isn't
> compiled in when CONFIG_PCI_MSI is off, which means such configurations
> would still get interrupt storms.
> 
> Fix by moving part of the functionality to probe.c, and compiling it
> unconditionally.

Reviewed-by: Fam Zheng 

> 
> Cc: Eric W. Biederman 
> Signed-off-by: Michael S. Tsirkin 
> ---
>  drivers/pci/msi.c   | 12 
>  drivers/pci/probe.c | 16 
>  2 files changed, 16 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> index 9942f68..f66be86 100644
> --- a/drivers/pci/msi.c
> +++ b/drivers/pci/msi.c
> @@ -1041,18 +1041,6 @@ EXPORT_SYMBOL(pci_msi_enabled);
>  void pci_msi_init_pci_dev(struct pci_dev *dev)
>  {
>   INIT_LIST_HEAD(&dev->msi_list);
> -
> - /* Disable the msi hardware to avoid screaming interrupts
> -  * during boot.  This is the power on reset default so
> -  * usually this should be a noop.
> -  */
> - dev->msi_cap = pci_find_capability(dev, PCI_CAP_ID_MSI);
> - if (dev->msi_cap)
> - pci_msi_set_enable(dev, 0);
> -
> - dev->msix_cap = pci_find_capability(dev, PCI_CAP_ID_MSIX);
> - if (dev->msix_cap)
> - pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_ENABLE, 0);
>  }
>  
>  /**
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index 8d2f400..50dd934 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -1483,10 +1483,26 @@ static struct pci_dev *pci_scan_device(struct pci_bus *bus, int devfn)
>   return dev;
>  }
>  
> +static void pci_msi_setup_pci_dev(struct pci_dev *dev)
> +{
> + /* Disable the msi hardware to avoid screaming interrupts
> +  * during boot.  This is the power on reset default so
> +  * usually this should be a noop.
> +  */
> + dev->msi_cap = pci_find_capability(dev, PCI_CAP_ID_MSI);
> + if (dev->msi_cap)
> + pci_msi_set_enable(dev, 0);
> +
> + dev->msix_cap = pci_find_capability(dev, PCI_CAP_ID_MSIX);
> + if (dev->msix_cap)
> + pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_ENABLE, 0);
> +}
> +
>  static void pci_init_capabilities(struct pci_dev *dev)
>  {
>   /* MSI/MSI-X list */
>   pci_msi_init_pci_dev(dev);
> + pci_msi_setup_pci_dev(dev);
>  
>   /* Buffers for saving PCIe and PCI-X capabilities */
>   pci_allocate_cap_save_buffers(dev);
> -- 
> MST
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

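The MSI/MSI-X disable step that this patch moves into probe.c comes down to walking the capability list and clearing an enable bit. Below is a userspace sketch of that logic over a fake 256-byte config space. struct fake_pci_dev and its helpers are hypothetical stand-ins for pci_find_capability(), pci_msi_set_enable() and pci_msix_clear_and_set_ctrl(). The offsets and bit values follow the PCI specification as mirrored in pci_regs.h; the iteration cap is an illustrative guard against looping lists.

```c
#include <stdint.h>

/* Config-space offsets and IDs as defined by the PCI specification. */
#define PCI_STATUS            0x06
#define PCI_STATUS_CAP_LIST   0x10
#define PCI_CAPABILITY_LIST   0x34
#define PCI_CAP_ID_MSI        0x05
#define PCI_CAP_ID_MSIX       0x11
#define PCI_MSI_FLAGS         2      /* control word offset inside the capability */
#define PCI_MSI_FLAGS_ENABLE  0x0001
#define PCI_MSIX_FLAGS        2
#define PCI_MSIX_FLAGS_ENABLE 0x8000

/* Hypothetical in-memory model of one function's config space. */
struct fake_pci_dev {
	uint8_t cfg[256];
	uint8_t msi_cap;
	uint8_t msix_cap;
};

static uint16_t cfg_read16(const struct fake_pci_dev *d, unsigned off)
{
	return (uint16_t)(d->cfg[off] | (d->cfg[off + 1] << 8));
}

static void cfg_write16(struct fake_pci_dev *d, unsigned off, uint16_t v)
{
	d->cfg[off] = (uint8_t)(v & 0xFF);
	d->cfg[off + 1] = (uint8_t)(v >> 8);
}

/* Walk the capability linked list, in the spirit of pci_find_capability(). */
static uint8_t find_capability(const struct fake_pci_dev *d, uint8_t id)
{
	uint8_t pos;
	int ttl = 48; /* illustrative cap on list length */

	if (!(cfg_read16(d, PCI_STATUS) & PCI_STATUS_CAP_LIST))
		return 0;
	pos = d->cfg[PCI_CAPABILITY_LIST] & ~3;
	while (pos >= 0x40 && ttl--) {
		if (d->cfg[pos] == id)
			return pos;
		pos = d->cfg[pos + 1] & ~3;
	}
	return 0;
}

/* Model of pci_msi_setup_pci_dev(): clear the MSI and MSI-X enable bits. */
static void msi_setup_dev(struct fake_pci_dev *d)
{
	d->msi_cap = find_capability(d, PCI_CAP_ID_MSI);
	if (d->msi_cap)
		cfg_write16(d, d->msi_cap + PCI_MSI_FLAGS,
			    (uint16_t)(cfg_read16(d, d->msi_cap + PCI_MSI_FLAGS) &
				       ~PCI_MSI_FLAGS_ENABLE));

	d->msix_cap = find_capability(d, PCI_CAP_ID_MSIX);
	if (d->msix_cap)
		cfg_write16(d, d->msix_cap + PCI_MSIX_FLAGS,
			    (uint16_t)(cfg_read16(d, d->msix_cap + PCI_MSIX_FLAGS) &
				       ~PCI_MSIX_FLAGS_ENABLE));
}
```

Because nothing here depends on the MSI allocation machinery, it can run whether or not MSI support is configured, which is the point of compiling the kernel version unconditionally.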
[PATCH 06/21] ARM: time: Provide read_boot_clock64() and read_persistent_clock64()

2015-04-01 Thread John Stultz

From: Xunlei Pang 

As part of addressing the "y2038 problem" for in-kernel uses, this
patch converts read_boot_clock() to read_boot_clock64() and
read_persistent_clock() to read_persistent_clock64(), using
timespec64, by converting clock_access_fn to take a timespec64.

Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Acked-by: Thierry Reding  (for tegra part)
Signed-off-by: Xunlei Pang 
Signed-off-by: John Stultz 
---
 arch/arm/include/asm/mach/time.h|  3 +--
 arch/arm/kernel/time.c  |  6 +++---
 arch/arm/plat-omap/counter_32k.c| 10 +-
 drivers/clocksource/tegra20_timer.c | 10 +-
 4 files changed, 6 insertions(+), 23 deletions(-)

diff --git a/arch/arm/include/asm/mach/time.h b/arch/arm/include/asm/mach/time.h
index 90c12e1..0f79e4d 100644
--- a/arch/arm/include/asm/mach/time.h
+++ b/arch/arm/include/asm/mach/time.h
@@ -12,8 +12,7 @@
 
 extern void timer_tick(void);
 
-struct timespec;
-typedef void (*clock_access_fn)(struct timespec *);
+typedef void (*clock_access_fn)(struct timespec64 *);
 extern int register_persistent_clock(clock_access_fn read_boot,
 clock_access_fn read_persistent);
 
diff --git a/arch/arm/kernel/time.c b/arch/arm/kernel/time.c
index 0cc7e58..a66e37e 100644
--- a/arch/arm/kernel/time.c
+++ b/arch/arm/kernel/time.c
@@ -76,7 +76,7 @@ void timer_tick(void)
 }
 #endif
 
-static void dummy_clock_access(struct timespec *ts)
+static void dummy_clock_access(struct timespec64 *ts)
 {
ts->tv_sec = 0;
ts->tv_nsec = 0;
@@ -85,12 +85,12 @@ static void dummy_clock_access(struct timespec *ts)
 static clock_access_fn __read_persistent_clock = dummy_clock_access;
 static clock_access_fn __read_boot_clock = dummy_clock_access;;
 
-void read_persistent_clock(struct timespec *ts)
+void read_persistent_clock64(struct timespec64 *ts)
 {
__read_persistent_clock(ts);
 }
 
-void read_boot_clock(struct timespec *ts)
+void read_boot_clock64(struct timespec64 *ts)
 {
__read_boot_clock(ts);
 }
diff --git a/arch/arm/plat-omap/counter_32k.c b/arch/arm/plat-omap/counter_32k.c
index b7b7b07..2438b96 100644
--- a/arch/arm/plat-omap/counter_32k.c
+++ b/arch/arm/plat-omap/counter_32k.c
@@ -70,14 +70,6 @@ static void omap_read_persistent_clock64(struct timespec64 *ts)
*ts = persistent_ts;
 }
 
-static void omap_read_persistent_clock(struct timespec *ts)
-{
-   struct timespec64 ts64;
-
-   omap_read_persistent_clock64(&ts64);
-   *ts = timespec64_to_timespec(ts64);
-}
-
 /**
  * omap_init_clocksource_32k - setup and register counter 32k as a
  * kernel clocksource
@@ -118,7 +110,7 @@ int __init omap_init_clocksource_32k(void __iomem *vbase)
}
 
sched_clock_register(omap_32k_read_sched_clock, 32, 32768);
-   register_persistent_clock(NULL, omap_read_persistent_clock);
+   register_persistent_clock(NULL, omap_read_persistent_clock64);
pr_info("OMAP clocksource: 32k_counter at 32768 Hz\n");
 
return 0;
diff --git a/drivers/clocksource/tegra20_timer.c b/drivers/clocksource/tegra20_timer.c
index 4a0a603..5a112d7 100644
--- a/drivers/clocksource/tegra20_timer.c
+++ b/drivers/clocksource/tegra20_timer.c
@@ -141,14 +141,6 @@ static void tegra_read_persistent_clock64(struct timespec64 *ts)
*ts = persistent_ts;
 }
 
-static void tegra_read_persistent_clock(struct timespec *ts)
-{
-   struct timespec ts64;
-
-   tegra_read_persistent_clock64(&ts64);
-   *ts = timespec64_to_timespec(ts64);
-}
-
 static unsigned long tegra_delay_timer_read_counter_long(void)
 {
return readl(timer_reg_base + TIMERUS_CNTR_1US);
@@ -259,7 +251,7 @@ static void __init tegra20_init_rtc(struct device_node *np)
else
clk_prepare_enable(clk);
 
-   register_persistent_clock(NULL, tegra_read_persistent_clock);
+   register_persistent_clock(NULL, tegra_read_persistent_clock64);
 }
 CLOCKSOURCE_OF_DECLARE(tegra20_rtc, "nvidia,tegra20-rtc", tegra20_init_rtc);
 
-- 
1.9.1

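The ARM hooks in this patch route read_persistent_clock64() and read_boot_clock64() through clock_access_fn pointers that default to a dummy returning zero. A minimal userspace model of that indirection follows. The "register only once" check is an assumption about register_persistent_clock(), whose body is not part of this diff, and example_rtc_read() is a hypothetical platform callback.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's struct timespec64. */
struct timespec64 {
	int64_t tv_sec;
	long tv_nsec;
};

typedef void (*clock_access_fn)(struct timespec64 *);

static void dummy_clock_access(struct timespec64 *ts)
{
	ts->tv_sec = 0;
	ts->tv_nsec = 0;
}

static clock_access_fn read_persistent_fn = dummy_clock_access;
static clock_access_fn read_boot_fn = dummy_clock_access;

/* Assumed behaviour: only one registration; a NULL hook keeps the dummy. */
static int register_persistent_clock_model(clock_access_fn read_boot,
					   clock_access_fn read_persistent)
{
	if (read_persistent_fn != dummy_clock_access ||
	    read_boot_fn != dummy_clock_access)
		return -EINVAL;
	if (read_boot)
		read_boot_fn = read_boot;
	if (read_persistent)
		read_persistent_fn = read_persistent;
	return 0;
}

static void read_persistent_clock64_model(struct timespec64 *ts)
{
	read_persistent_fn(ts);
}

static void read_boot_clock64_model(struct timespec64 *ts)
{
	read_boot_fn(ts);
}

/* Hypothetical platform callback returning 2^32 seconds (beyond 32 bits). */
static void example_rtc_read(struct timespec64 *ts)
{
	ts->tv_sec = 4294967296LL;
	ts->tv_nsec = 0;
}
```

With the callbacks now taking timespec64 directly, the omap and tegra drivers can register their existing *_clock64() readers and drop the timespec conversion wrappers, as the diff does.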
[PATCH 11/21] rtc/mxc: Modify rtc_update_alarm() not to touch the alarm time

2015-04-01 Thread John Stultz

From: Xunlei Pang 

rtc_class_ops's set_alarm() shouldn't deal with the alarm date,
as this is handled in the rtc core.

See rtc_dev_ioctl()'s RTC_ALM_SET and RTC_WKALM_SET cases.

Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Acked-by: Alessandro Zummo 
Signed-off-by: Xunlei Pang 
Signed-off-by: John Stultz 
---
 drivers/rtc/rtc-mxc.c | 22 --
 1 file changed, 4 insertions(+), 18 deletions(-)

diff --git a/drivers/rtc/rtc-mxc.c b/drivers/rtc/rtc-mxc.c
index 3c3f8d1..a7b218f 100644
--- a/drivers/rtc/rtc-mxc.c
+++ b/drivers/rtc/rtc-mxc.c
@@ -173,29 +173,18 @@ static void set_alarm_or_time(struct device *dev, int time_alarm, u32 time)
  * This function updates the RTC alarm registers and then clears all the
  * interrupt status bits.
  */
-static int rtc_update_alarm(struct device *dev, struct rtc_time *alrm)
+static void rtc_update_alarm(struct device *dev, struct rtc_time *alrm)
 {
-   struct rtc_time alarm_tm, now_tm;
-   unsigned long now, time;
+   unsigned long time;
struct platform_device *pdev = to_platform_device(dev);
struct rtc_plat_data *pdata = platform_get_drvdata(pdev);
void __iomem *ioaddr = pdata->ioaddr;
 
-   now = get_alarm_or_time(dev, MXC_RTC_TIME);
-   rtc_time_to_tm(now, &now_tm);
-   alarm_tm.tm_year = now_tm.tm_year;
-   alarm_tm.tm_mon = now_tm.tm_mon;
-   alarm_tm.tm_mday = now_tm.tm_mday;
-   alarm_tm.tm_hour = alrm->tm_hour;
-   alarm_tm.tm_min = alrm->tm_min;
-   alarm_tm.tm_sec = alrm->tm_sec;
-   rtc_tm_to_time(&alarm_tm, &time);
+   rtc_tm_to_time(alrm, &time);
 
/* clear all the interrupt status bits */
writew(readw(ioaddr + RTC_RTCISR), ioaddr + RTC_RTCISR);
set_alarm_or_time(dev, MXC_RTC_ALARM, time);
-
-   return 0;
 }
 
 static void mxc_rtc_irq_enable(struct device *dev, unsigned int bit,
@@ -346,11 +335,8 @@ static int mxc_rtc_set_alarm(struct device *dev, struct rtc_wkalrm *alrm)
 {
struct platform_device *pdev = to_platform_device(dev);
struct rtc_plat_data *pdata = platform_get_drvdata(pdev);
-   int ret;
 
-   ret = rtc_update_alarm(dev, &alrm->time);
-   if (ret)
-   return ret;
+   rtc_update_alarm(dev, &alrm->time);
 
memcpy(&pdata->g_rtc_alarm, &alrm->time, sizeof(struct rtc_time));
mxc_rtc_irq_enable(dev, RTC_ALM_BIT, alrm->enabled);
-- 
1.9.1

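To see why the removed code in rtc_update_alarm() was wrong, it helps to express it in seconds since the epoch. Assuming UTC, taking year/month/day from "now" and hour/min/sec from the requested alarm is the same as "start of today plus the alarm's time of day". The sketch below models the old and new behaviour that way; the function names are hypothetical and this is not driver code.

```c
#include <stdint.h>

typedef int64_t time64_t;

#define SECS_PER_DAY 86400

/* Start of the UTC day containing t (floor division for negative t). */
static time64_t day_start(time64_t t)
{
	time64_t days = t / SECS_PER_DAY;

	if (t % SECS_PER_DAY < 0)
		days--;
	return days * SECS_PER_DAY;
}

/* Old behaviour: today's date combined with the alarm's time of day. */
static time64_t old_alarm_secs(time64_t now, time64_t alarm)
{
	return day_start(now) + (alarm - day_start(alarm));
}

/* New behaviour: the full requested alarm time is programmed as-is. */
static time64_t new_alarm_secs(time64_t alarm)
{
	return alarm;
}
```

For example, at 2015-04-01 23:00:00 UTC an alarm requested for 01:00:00 the next day would, under the old logic, be programmed for 01:00:00 the same day, which is already in the past.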
[PATCH 14/21] alpha: rtc: Change to use rtc_class_ops's set_mmss64()

2015-04-01 Thread John Stultz

From: Xunlei Pang 

Change alpha_rtc_set_mmss() and remote_set_mmss() to use
rtc_class_ops's set_mmss64().

Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Acked-by: Alessandro Zummo 
Signed-off-by: Xunlei Pang 
Signed-off-by: John Stultz 
---
 arch/alpha/kernel/rtc.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/alpha/kernel/rtc.c b/arch/alpha/kernel/rtc.c
index c8d284d..f535a3f 100644
--- a/arch/alpha/kernel/rtc.c
+++ b/arch/alpha/kernel/rtc.c
@@ -116,7 +116,7 @@ alpha_rtc_set_time(struct device *dev, struct rtc_time *tm)
 }
 
 static int
-alpha_rtc_set_mmss(struct device *dev, unsigned long nowtime)
+alpha_rtc_set_mmss(struct device *dev, time64_t nowtime)
 {
int retval = 0;
int real_seconds, real_minutes, cmos_minutes;
@@ -211,7 +211,7 @@ alpha_rtc_ioctl(struct device *dev, unsigned int cmd, unsigned long arg)
 static const struct rtc_class_ops alpha_rtc_ops = {
.read_time = alpha_rtc_read_time,
.set_time = alpha_rtc_set_time,
-   .set_mmss = alpha_rtc_set_mmss,
+   .set_mmss64 = alpha_rtc_set_mmss,
.ioctl = alpha_rtc_ioctl,
 };
 
@@ -276,7 +276,7 @@ do_remote_mmss(void *data)
 }
 
 static int
-remote_set_mmss(struct device *dev, unsigned long now)
+remote_set_mmss(struct device *dev, time64_t now)
 {
union remote_data x;
if (smp_processor_id() != boot_cpuid) {
@@ -290,7 +290,7 @@ remote_set_mmss(struct device *dev, unsigned long now)
 static const struct rtc_class_ops remote_rtc_ops = {
.read_time = remote_read_time,
.set_time = remote_set_time,
-   .set_mmss = remote_set_mmss,
+   .set_mmss64 = remote_set_mmss,
.ioctl = alpha_rtc_ioctl,
 };
 #endif
-- 
1.9.1

[PATCH 17/21] time: Fix a bug in timekeeping_suspend() with no persistent clock

2015-04-01 Thread John Stultz

From: Xunlei Pang 

When there's no persistent clock, normally timekeeping_suspend_time
should always be zero, but this can break in timekeeping_suspend().

At T1, there was a system suspend, so old_delta was assigned T1.
After some time, one time adjustment happened, and xtime got the
value of T1-dt(0s
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Signed-off-by: Xunlei Pang 
Signed-off-by: John Stultz 
---
 kernel/time/timekeeping.c | 36 +++-
 1 file changed, 19 insertions(+), 17 deletions(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 7d799d3..6dafcea 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -1255,7 +1255,7 @@ void __init timekeeping_init(void)
raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
 }
 
-/* time in seconds when suspend began */
+/* time in seconds when suspend began for persistent clock */
 static struct timespec64 timekeeping_suspend_time;
 
 /**
@@ -1430,24 +1430,26 @@ int timekeeping_suspend(void)
timekeeping_forward_now(tk);
timekeeping_suspended = 1;
 
-   /*
-* To avoid drift caused by repeated suspend/resumes,
-* which each can add ~1 second drift error,
-* try to compensate so the difference in system time
-* and persistent_clock time stays close to constant.
-*/
-   delta = timespec64_sub(tk_xtime(tk), timekeeping_suspend_time);
-   delta_delta = timespec64_sub(delta, old_delta);
-   if (abs(delta_delta.tv_sec)  >= 2) {
+   if (has_persistent_clock()) {
/*
-* if delta_delta is too large, assume time correction
-* has occured and set old_delta to the current delta.
+* To avoid drift caused by repeated suspend/resumes,
+* which each can add ~1 second drift error,
+* try to compensate so the difference in system time
+* and persistent_clock time stays close to constant.
 */
-   old_delta = delta;
-   } else {
-   /* Otherwise try to adjust old_system to compensate */
-   timekeeping_suspend_time =
-   timespec64_add(timekeeping_suspend_time, delta_delta);
+   delta = timespec64_sub(tk_xtime(tk), timekeeping_suspend_time);
+   delta_delta = timespec64_sub(delta, old_delta);
+   if (abs(delta_delta.tv_sec) >= 2) {
+   /*
+* if delta_delta is too large, assume time correction
+* has occurred and set old_delta to the current delta.
+*/
+   old_delta = delta;
+   } else {
+   /* Otherwise try to adjust old_system to compensate */
+   timekeeping_suspend_time =
+   timespec64_add(timekeeping_suspend_time, delta_delta);
+   }
}
 
timekeeping_update(tk, TK_MIRROR);
-- 
1.9.1

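The fix gates the drift compensation in timekeeping_suspend() on has_persistent_clock(). The sketch below is a simplified model in whole seconds rather than timespec64. It assumes, based on the commit message rather than on this hunk, that timekeeping_suspend_time is first loaded from the persistent clock, which reads as zero when there is none. struct suspend_model and model_suspend() are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified model state: whole seconds instead of struct timespec64. */
struct suspend_model {
	int64_t suspend_time; /* models timekeeping_suspend_time */
	int64_t old_delta;    /* models the static old_delta */
};

/*
 * persistent_now is the persistent clock reading (0 if there is none);
 * compensation only runs when a persistent clock exists.
 */
static void model_suspend(struct suspend_model *s, int64_t xtime,
			  int64_t persistent_now, bool has_persistent)
{
	int64_t delta, delta_delta;

	s->suspend_time = persistent_now;
	if (!has_persistent)
		return;

	delta = xtime - s->suspend_time;
	delta_delta = delta - s->old_delta;
	if ((delta_delta < 0 ? -delta_delta : delta_delta) >= 2)
		s->old_delta = delta;           /* assume a time correction happened */
	else
		s->suspend_time += delta_delta; /* compensate small drift */
}
```

Calling model_suspend() with has_persistent set to true but a zero reading reproduces the pre-patch path described in the commit message: after a small backwards adjustment of xtime, suspend_time drifts away from zero.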
[PATCH 07/21] rtc: Provide y2038 safe rtc_class_ops.set_mmss() replacement

2015-04-01 Thread John Stultz

From: Xunlei Pang 

Currently the rtc_class_ops's set_mmss() function takes a 32-bit seconds
value (on 32-bit systems), which is problematic for dates past y2038.

This patch provides a y2038-safe version named set_mmss64(), using
time64_t.

After this patch, set_mmss() is deprecated: all of its users will be
converted to set_mmss64(), and it can be removed once it has no users.

Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Acked-by: Alessandro Zummo 
Signed-off-by: Xunlei Pang 
[jstultz: Add whitespace fix for checkpatch]
Signed-off-by: John Stultz 
---
 drivers/rtc/interface.c | 8 +++-
 drivers/rtc/systohc.c   | 5 -
 include/linux/rtc.h | 1 +
 3 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/drivers/rtc/interface.c b/drivers/rtc/interface.c
index 37215cf..d43ee40 100644
--- a/drivers/rtc/interface.c
+++ b/drivers/rtc/interface.c
@@ -72,7 +72,11 @@ int rtc_set_time(struct rtc_device *rtc, struct rtc_time *tm)
err = -ENODEV;
else if (rtc->ops->set_time)
err = rtc->ops->set_time(rtc->dev.parent, tm);
-   else if (rtc->ops->set_mmss) {
+   else if (rtc->ops->set_mmss64) {
+   time64_t secs64 = rtc_tm_to_time64(tm);
+
+   err = rtc->ops->set_mmss64(rtc->dev.parent, secs64);
+   } else if (rtc->ops->set_mmss) {
time64_t secs64 = rtc_tm_to_time64(tm);
err = rtc->ops->set_mmss(rtc->dev.parent, secs64);
} else
@@ -96,6 +100,8 @@ int rtc_set_mmss(struct rtc_device *rtc, unsigned long secs)
 
if (!rtc->ops)
err = -ENODEV;
+   else if (rtc->ops->set_mmss64)
+   err = rtc->ops->set_mmss64(rtc->dev.parent, secs);
else if (rtc->ops->set_mmss)
err = rtc->ops->set_mmss(rtc->dev.parent, secs);
else if (rtc->ops->read_time && rtc->ops->set_time) {
diff --git a/drivers/rtc/systohc.c b/drivers/rtc/systohc.c
index ef3c07a..7728d5e 100644
--- a/drivers/rtc/systohc.c
+++ b/drivers/rtc/systohc.c
@@ -35,7 +35,10 @@ int rtc_set_ntp_time(struct timespec64 now)
if (rtc) {
/* rtc_hctosys exclusively uses UTC, so we call set_time here,
 * not set_mmss. */
-   if (rtc->ops && (rtc->ops->set_time || rtc->ops->set_mmss))
+   if (rtc->ops &&
+   (rtc->ops->set_time ||
+rtc->ops->set_mmss64 ||
+rtc->ops->set_mmss))
err = rtc_set_time(rtc, &tm);
rtc_class_close(rtc);
}
diff --git a/include/linux/rtc.h b/include/linux/rtc.h
index dcad7ee..8dcf682 100644
--- a/include/linux/rtc.h
+++ b/include/linux/rtc.h
@@ -77,6 +77,7 @@ struct rtc_class_ops {
int (*read_alarm)(struct device *, struct rtc_wkalrm *);
int (*set_alarm)(struct device *, struct rtc_wkalrm *);
int (*proc)(struct device *, struct seq_file *);
+   int (*set_mmss64)(struct device *, time64_t secs);
int (*set_mmss)(struct device *, unsigned long secs);
int (*read_callback)(struct device *, int data);
int (*alarm_irq_enable)(struct device *, unsigned int enabled);
-- 
1.9.1

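The dispatch order this patch adds to rtc_set_time() tries set_mmss64() before the legacy set_mmss(). The sketch below models only that fallback (set_time and the read_time/set_time path are omitted). The legacy hook takes a uint32_t to model "unsigned long" on a 32-bit system. struct rtc_ops_model and the recording callbacks are hypothetical.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

typedef int64_t time64_t;

/* Trimmed-down, hypothetical ops table. */
struct rtc_ops_model {
	int (*set_mmss64)(time64_t secs);
	int (*set_mmss)(uint32_t secs); /* models unsigned long on 32-bit */
};

/* Prefer the y2038-safe hook, then fall back to the legacy one. */
static int model_set_mmss(const struct rtc_ops_model *ops, time64_t secs)
{
	if (!ops)
		return -ENODEV;
	if (ops->set_mmss64)
		return ops->set_mmss64(secs);
	if (ops->set_mmss)
		return ops->set_mmss((uint32_t)secs); /* values >= 2^32 wrap */
	return -EINVAL;
}

/* Example callbacks that only record the value they received. */
static time64_t last_secs;

static int record64(time64_t secs)
{
	last_secs = secs;
	return 0;
}

static int record32(uint32_t secs)
{
	last_secs = secs;
	return 0;
}
```

A driver that only fills in set_mmss still works, but a seconds value of 2^32 + 10 reaches it as 10, while a set_mmss64 driver sees the full value.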
[PATCH 21/21] time: Rework debugging variables so they aren't global

2015-04-01 Thread John Stultz

Ingo suggested that the timekeeping debugging variables
recently added should not be global, and should be tied
to the timekeeper's read_base.

Thus this patch implements that suggestion.

I'm a little hesitant here, since the tkr structure
has been carefully designed to fit in a cacheline.
However, these additions are all at the end of the
structure and are conditionally compiled.

Cc: Ingo Molnar 
Cc: Thomas Gleixner 
Cc: Peter Zijlstra 
Cc: Prarit Bhargava 
Cc: Richard Cochran 
Signed-off-by: John Stultz 
---
 include/linux/timekeeper_internal.h | 18 +-
 kernel/time/timekeeping.c   | 33 ++---
 2 files changed, 27 insertions(+), 24 deletions(-)

diff --git a/include/linux/timekeeper_internal.h b/include/linux/timekeeper_internal.h
index fb86963..9b33027 100644
--- a/include/linux/timekeeper_internal.h
+++ b/include/linux/timekeeper_internal.h
@@ -20,9 +20,13 @@
  * @shift: Shift value for scaled math conversion
  * @xtime_nsec: Shifted (fractional) nano seconds offset for readout
  * @base:  ktime_t (nanoseconds) base time for readout
+ * @last_warning:   Warning ratelimiter (DEBUG_TIMEKEEPING)
+ * @underflow_seen: Underflow warning flag (DEBUG_TIMEKEEPING)
+ * @overflow_seen:  Overflow warning flag (DEBUG_TIMEKEEPING)
  *
  * This struct has size 56 byte on 64 bit. Together with a seqcount it
- * occupies a single 64byte cache line.
+ * occupies a single 64byte cache line (when DEBUG_TIMEKEEPING is not
+ * enabled).
  *
  * The struct is separate from struct timekeeper as it is also used
  * for a fast NMI safe accessors.
@@ -36,6 +40,18 @@ struct tk_read_base {
u32 shift;
u64 xtime_nsec;
ktime_t base;
+#ifdef CONFIG_DEBUG_TIMEKEEPING
+   longlast_warning;
+   /*
+* These simple flag variables are managed
+* without locks, which is racy, but ok since
+* we don't really care about being super
+* precise about how many events were seen,
+* just that a problem was observed.
+*/
+   int underflow_seen;
+   int overflow_seen;
+#endif
 };
 
 /**
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 676896e..ad7a80f 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -118,19 +118,6 @@ static inline void tk_update_sleep_time(struct timekeeper *tk, ktime_t delta)
 
 #ifdef CONFIG_DEBUG_TIMEKEEPING
 #define WARNING_FREQ (HZ*300) /* 5 minute rate-limiting */
-/*
- * These simple flag variables are managed
- * without locks, which is racy, but ok since
- * we don't really care about being super
- * precise about how many events were seen,
- * just that a problem was observed.
- */
-static int timekeeping_underflow_seen;
-static int timekeeping_overflow_seen;
-
-/* last_warning is only modified under the timekeeping lock */
-static long timekeeping_last_warning;
-
 static void timekeeping_check_update(struct tk_read_base *tkr, cycle_t offset)
 {
 
@@ -149,24 +136,24 @@ static void timekeeping_check_update(struct tk_read_base *tkr, cycle_t offset)
}
}
 
-   if (timekeeping_underflow_seen) {
-   if (jiffies - timekeeping_last_warning > WARNING_FREQ) {
+   if (tkr->underflow_seen) {
+   if (jiffies - tkr->last_warning > WARNING_FREQ) {
printk_deferred("WARNING: Underflow in clocksource '%s' observed, time update ignored.\n", name);
printk_deferred(" Please report this, consider using a different clocksource, if possible.\n");
printk_deferred(" Your kernel is probably still fine.\n");
-   timekeeping_last_warning = jiffies;
+   tkr->last_warning = jiffies;
}
-   timekeeping_underflow_seen = 0;
+   tkr->underflow_seen = 0;
}
 
-   if (timekeeping_overflow_seen) {
-   if (jiffies - timekeeping_last_warning > WARNING_FREQ) {
+   if (tkr->overflow_seen) {
+   if (jiffies - tkr->last_warning > WARNING_FREQ) {
printk_deferred("WARNING: Overflow in clocksource '%s' observed, time update capped.\n", name);
printk_deferred(" Please report this, consider using a different clocksource, if possible.\n");
printk_deferred(" Your kernel is probably still fine.\n");
-   timekeeping_last_warning = jiffies;
+   tkr->last_warning = jiffies;
}
-   timekeeping_overflow_seen = 0;
+   tkr->overflow_seen = 0;
}
 }
 
@@ -197,13 +184,13 @@ static inline cycle_t timekeeping_get_delta(struct tk_read_base *tkr)
 * mask-relative negative values.
 */
if (unlikely((~delta & mask) < (mask

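With the debug state moved into tk_read_base, each structure carries its own rate limiter and flags. The sketch below models the rate-limited reporting in timekeeping_check_update() as a plain function that counts the warnings it would print. WARNING_FREQ is HZ*300 in the patch; HZ=100 is an assumption here, jiffies is modeled as a long, and the struct and function names are hypothetical.

```c
/* Hypothetical model of the per-tk_read_base debug fields. */
struct tkr_debug_model {
	long last_warning;
	int underflow_seen;
	int overflow_seen;
};

#define MODEL_HZ           100               /* assumed HZ */
#define MODEL_WARNING_FREQ (MODEL_HZ * 300)  /* 5 minutes, as in the patch */

/*
 * Returns how many warnings would have been printed at time 'now'.
 * Flags are always cleared; printing is rate-limited per structure.
 */
static int model_check_update(struct tkr_debug_model *tkr, long now)
{
	int printed = 0;

	if (tkr->underflow_seen) {
		if (now - tkr->last_warning > MODEL_WARNING_FREQ) {
			printed++;
			tkr->last_warning = now;
		}
		tkr->underflow_seen = 0;
	}
	if (tkr->overflow_seen) {
		if (now - tkr->last_warning > MODEL_WARNING_FREQ) {
			printed++;
			tkr->last_warning = now;
		}
		tkr->overflow_seen = 0;
	}
	return printed;
}
```

Because the state is per structure, a warning recently printed for one read base does not suppress a warning for another, which the shared globals used to do.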
[PATCH 19/21] clocksource: Improve comment explaining clocks_calc_max_nsecs()'s 50% safety margin

2015-04-01 Thread John Stultz

Ingo noted that the description of clocks_calc_max_nsecs()'s
50% safety margin was somewhat circular. So this patch tries
to improve the comment to better explain what we mean by the
50% safety margin and why we need it.

Cc: Ingo Molnar 
Cc: Thomas Gleixner 
Cc: Peter Zijlstra 
Cc: Prarit Bhargava 
Cc: Richard Cochran 
Signed-off-by: John Stultz 
---
 kernel/time/clocksource.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index c3be3c7..15facb1 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -472,8 +472,11 @@ static u32 clocksource_max_adjustment(struct clocksource *cs)
  * @max_cyc:   maximum cycle value before potential overflow (does not include
  * any safety margin)
  *
- * NOTE: This function includes a safety margin of 50%, so that bad clock values
- * can be detected.
+ * NOTE: This function includes a safety margin of 50%, in other words, we
+ * return half the number of nanoseconds the hardware counter can technically
+ * cover. This is done so that we can potentially detect problems caused by
+ * delayed timers or bad hardware, which might result in time intervals that
+ * are larger than what the math used can handle without overflows.
  */
 u64 clocks_calc_max_nsecs(u32 mult, u32 shift, u32 maxadj, u64 mask, u64 *max_cyc)
 {
-- 
1.9.1

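The improved comment says the function returns half of what the counter can technically cover. Below is a userspace sketch of that computation, written from the comment's description; the kernel function body is not in this hunk, so the exact steps are an assumption. It finds the largest cycle count whose multiply cannot overflow 64 bits even with the maximum positive adjustment, clamps it to the counter mask, converts it using the maximum negative adjustment, and halves the result. The function names are hypothetical.

```c
#include <stdint.h>

/* cycles -> ns via the usual (cycles * mult) >> shift conversion. */
static uint64_t cyc2ns(uint64_t cycles, uint32_t mult, uint32_t shift)
{
	return (cycles * mult) >> shift;
}

static uint64_t calc_max_nsecs(uint32_t mult, uint32_t shift, uint32_t maxadj,
			       uint64_t mask, uint64_t *max_cyc)
{
	/* Largest cycle count that keeps cycles * (mult + maxadj) in 64 bits. */
	uint64_t max_cycles = UINT64_MAX / ((uint64_t)mult + maxadj);
	uint64_t max_nsecs;

	if (max_cycles > mask)   /* the hardware counter wraps at mask */
		max_cycles = mask;
	max_nsecs = cyc2ns(max_cycles, mult - maxadj, shift);
	if (max_cyc)
		*max_cyc = max_cycles;
	return max_nsecs >> 1;   /* 50% safety margin */
}
```

For a 32-bit counter where one cycle is one nanosecond (mult = 1 << 24, shift = 24, no adjustment), the counter covers 0xFFFFFFFF ns, and the sketch reports half of that, 0x7FFFFFFF ns.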
[PATCH 13/21] rtc/mxc: Update driver to address y2038/y2106 issues

2015-04-01 Thread John Stultz

From: Xunlei Pang 

This driver has a number of y2038/y2106 issues.

This patch resolves them by:
- Replace rtc_time_to_tm() with rtc_time64_to_tm()
- Replace rtc_tm_to_time() with rtc_tm_to_time64()
- Change mxc_rtc_set_mmss() to use rtc_class_ops's set_mmss64()

After this patch, the driver should not have any remaining
y2038/y2106 issues.

Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Acked-by: Alessandro Zummo 
Signed-off-by: Xunlei Pang 
Signed-off-by: John Stultz 
---
 drivers/rtc/rtc-mxc.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/rtc/rtc-mxc.c b/drivers/rtc/rtc-mxc.c
index 83cba23..09d422b 100644
--- a/drivers/rtc/rtc-mxc.c
+++ b/drivers/rtc/rtc-mxc.c
@@ -286,7 +286,7 @@ static int mxc_rtc_read_time(struct device *dev, struct rtc_time *tm)
 /*
  * This function sets the internal RTC time based on tm in Gregorian date.
  */
-static int mxc_rtc_set_mmss(struct device *dev, unsigned long time)
+static int mxc_rtc_set_mmss(struct device *dev, time64_t time)
 {
struct platform_device *pdev = to_platform_device(dev);
struct rtc_plat_data *pdata = platform_get_drvdata(pdev);
@@ -297,9 +297,9 @@ static int mxc_rtc_set_mmss(struct device *dev, unsigned long time)
if (is_imx1_rtc(pdata)) {
struct rtc_time tm;
 
-   rtc_time_to_tm(time, &tm);
+   rtc_time64_to_tm(time, &tm);
tm.tm_year = 70;
-   rtc_tm_to_time(&tm, &time);
+   time = rtc_tm_to_time64(&tm);
}
 
/* Avoid roll-over from reading the different registers */
@@ -347,7 +347,7 @@ static int mxc_rtc_set_alarm(struct device *dev, struct rtc_wkalrm *alrm)
 static struct rtc_class_ops mxc_rtc_ops = {
.release= mxc_rtc_release,
.read_time  = mxc_rtc_read_time,
-   .set_mmss   = mxc_rtc_set_mmss,
+   .set_mmss64 = mxc_rtc_set_mmss,
.read_alarm = mxc_rtc_read_alarm,
.set_alarm  = mxc_rtc_set_alarm,
.alarm_irq_enable   = mxc_rtc_alarm_irq_enable,
-- 
1.9.1

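On i.MX1, mxc_rtc_set_mmss() converts the time to a broken-down date, forces tm_year to 70 and converts back, so only month, day and time of day survive. The sketch below models that round trip in userspace. It substitutes Howard Hinnant's days-from-civil algorithm and its inverse for the kernel's rtc_time64_to_tm()/rtc_tm_to_time64(); force_year_1970() and the helpers are hypothetical names.

```c
#include <stdint.h>

typedef int64_t time64_t;

#define SECS_PER_DAY 86400

/* Days since 1970-01-01 for a proleptic Gregorian date (Hinnant). */
static int64_t days_from_civil(int64_t y, unsigned m, unsigned d)
{
	int64_t era;
	unsigned yoe, doy, doe;

	y -= m <= 2;
	era = (y >= 0 ? y : y - 399) / 400;
	yoe = (unsigned)(y - era * 400);
	doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;
	doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
	return era * 146097 + (int64_t)doe - 719468;
}

/* Inverse of days_from_civil(). */
static void civil_from_days(int64_t z, int64_t *y, unsigned *m, unsigned *d)
{
	int64_t era;
	unsigned doe, yoe, doy, mp;

	z += 719468;
	era = (z >= 0 ? z : z - 146096) / 146097;
	doe = (unsigned)(z - era * 146097);
	yoe = (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365;
	doy = doe - (365 * yoe + yoe / 4 - yoe / 100);
	mp = (5 * doy + 2) / 153;
	*d = doy - (153 * mp + 2) / 5 + 1;
	*m = mp < 10 ? mp + 3 : mp - 9;
	*y = (int64_t)yoe + era * 400 + (*m <= 2);
}

/* Model of the i.MX1 path: keep month/day/time of day, force year 1970. */
static time64_t force_year_1970(time64_t t)
{
	int64_t days = t / SECS_PER_DAY, sod = t % SECS_PER_DAY, y;
	unsigned m, d;

	if (sod < 0) {          /* floor division for pre-1970 inputs */
		sod += SECS_PER_DAY;
		days--;
	}
	civil_from_days(days, &y, &m, &d); /* the year y is deliberately unused */
	return days_from_civil(1970, m, d) * SECS_PER_DAY + sod;
}
```

For example, 2015-04-01 12:00:00 UTC (1427889600) maps to 1970-04-01 12:00:00 UTC (7819200). The conversion itself no longer passes through an unsigned long, which is what the time64_t helpers in this patch address.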
[PATCH 15/21] time: Don't build timekeeping_inject_sleeptime64() if no one uses it

2015-04-01 Thread John Stultz

From: Xunlei Pang 

timekeeping_inject_sleeptime64() is only used by RTC suspend/resume,
so add build dependencies on the necessary RTC related macros.

Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Signed-off-by: Xunlei Pang 
[jstultz: Improve commit message clarity]
Signed-off-by: John Stultz 
---
 kernel/time/timekeeping.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 39df498..7d799d3 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -1280,6 +1280,7 @@ static void __timekeeping_inject_sleeptime(struct timekeeper *tk,
tk_debug_account_sleep_time(delta);
 }
 
+#if defined(CONFIG_PM_SLEEP) && defined(CONFIG_RTC_HCTOSYS_DEVICE)
 /**
  * timekeeping_inject_sleeptime64 - Adds suspend interval to timeekeeping values
  * @delta: pointer to a timespec64 delta value
@@ -1317,6 +1318,7 @@ void timekeeping_inject_sleeptime64(struct timespec64 *delta)
/* signal hrtimers about time change */
clock_was_set();
 }
+#endif
 
 /**
  * timekeeping_resume - Resumes the generic timekeeping subsystem.
-- 
1.9.1

[PATCH 10/21] rtc/mc13xxx: Update driver to address y2038/y2106 issues

2015-04-01 Thread John Stultz

From: Xunlei Pang 

This driver has a number of y2038/y2106 issues.

This patch resolves them by:
- Replace rtc_time_to_tm() with rtc_time64_to_tm()
- Change mc13xxx_rtc_set_mmss() to use rtc_class_ops's set_mmss64()

After this patch, the driver should not have any remaining
y2038/y2106 issues.

Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Acked-by: Alessandro Zummo 
Signed-off-by: Xunlei Pang 
Signed-off-by: John Stultz 
---
 drivers/rtc/rtc-mc13xxx.c | 32 ++--
 1 file changed, 14 insertions(+), 18 deletions(-)

diff --git a/drivers/rtc/rtc-mc13xxx.c b/drivers/rtc/rtc-mc13xxx.c
index 5bce904b..32df1d8 100644
--- a/drivers/rtc/rtc-mc13xxx.c
+++ b/drivers/rtc/rtc-mc13xxx.c
@@ -83,20 +83,19 @@ static int mc13xxx_rtc_read_time(struct device *dev, struct rtc_time *tm)
return ret;
} while (days1 != days2);
 
-   rtc_time_to_tm(days1 * SEC_PER_DAY + seconds, tm);
+   rtc_time64_to_tm((time64_t)days1 * SEC_PER_DAY + seconds, tm);
 
return rtc_valid_tm(tm);
 }
 
-static int mc13xxx_rtc_set_mmss(struct device *dev, unsigned long secs)
+static int mc13xxx_rtc_set_mmss(struct device *dev, time64_t secs)
 {
struct mc13xxx_rtc *priv = dev_get_drvdata(dev);
unsigned int seconds, days;
unsigned int alarmseconds;
int ret;
 
-   seconds = secs % SEC_PER_DAY;
-   days = secs / SEC_PER_DAY;
+   days = div_s64_rem(secs, SEC_PER_DAY, &seconds);
 
mc13xxx_lock(priv->mc13xxx);
 
@@ -159,7 +158,7 @@ static int mc13xxx_rtc_read_alarm(struct device *dev, struct rtc_wkalrm *alarm)
 {
struct mc13xxx_rtc *priv = dev_get_drvdata(dev);
unsigned seconds, days;
-   unsigned long s1970;
+   time64_t s1970;
int enabled, pending;
int ret;
 
@@ -189,10 +188,10 @@ out:
alarm->enabled = enabled;
alarm->pending = pending;
 
-   s1970 = days * SEC_PER_DAY + seconds;
+   s1970 = (time64_t)days * SEC_PER_DAY + seconds;
 
-   rtc_time_to_tm(s1970, &alarm->time);
-   dev_dbg(dev, "%s: %lu\n", __func__, s1970);
+   rtc_time64_to_tm(s1970, &alarm->time);
+   dev_dbg(dev, "%s: %lld\n", __func__, (long long)s1970);
 
return 0;
 }
@@ -200,8 +199,8 @@ out:
 static int mc13xxx_rtc_set_alarm(struct device *dev, struct rtc_wkalrm *alarm)
 {
struct mc13xxx_rtc *priv = dev_get_drvdata(dev);
-   unsigned long s1970;
-   unsigned seconds, days;
+   time64_t s1970;
+   u32 seconds, days;
int ret;
 
mc13xxx_lock(priv->mc13xxx);
@@ -215,20 +214,17 @@ static int mc13xxx_rtc_set_alarm(struct device *dev, struct rtc_wkalrm *alarm)
if (unlikely(ret))
goto out;
 
-   ret = rtc_tm_to_time(&alarm->time, &s1970);
-   if (unlikely(ret))
-   goto out;
+   s1970 = rtc_tm_to_time64(&alarm->time);
 
-   dev_dbg(dev, "%s: o%2.s %lu\n", __func__, alarm->enabled ? "n" : "ff",
-   s1970);
+   dev_dbg(dev, "%s: o%2.s %lld\n", __func__, alarm->enabled ? "n" : "ff",
+   (long long)s1970);
 
ret = mc13xxx_rtc_irq_enable_unlocked(dev, alarm->enabled,
MC13XXX_IRQ_TODA);
if (unlikely(ret))
goto out;
 
-   seconds = s1970 % SEC_PER_DAY;
-   days = s1970 / SEC_PER_DAY;
+   days = div_s64_rem(s1970, SEC_PER_DAY, &seconds);
 
ret = mc13xxx_reg_write(priv->mc13xxx, MC13XXX_RTCDAYA, days);
if (unlikely(ret))
@@ -268,7 +264,7 @@ static irqreturn_t mc13xxx_rtc_update_handler(int irq, void *dev)
 
 static const struct rtc_class_ops mc13xxx_rtc_ops = {
.read_time = mc13xxx_rtc_read_time,
-   .set_mmss = mc13xxx_rtc_set_mmss,
+   .set_mmss64 = mc13xxx_rtc_set_mmss,
.read_alarm = mc13xxx_rtc_read_alarm,
.set_alarm = mc13xxx_rtc_set_alarm,
.alarm_irq_enable = mc13xxx_rtc_alarm_irq_enable,
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 20/21] timekeeping: Change timekeeping_check_update() to take a tk_read_base

2015-04-01 Thread John Stultz

Ingo noted there was no reason to pass the timekeeper structure
to timekeeping_check_update(); the tk_read_base is sufficient,
and passing it reduces the dereferencing needed to reach the
values we care about.

So this patch simply changes the function as suggested.
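For illustration, a minimal userspace sketch of the narrowed interface. The structures are hypothetical, trimmed stand-ins for the kernel's clocksource, tk_read_base and timekeeper (only the fields the check reads), and the warning text is paraphrased; it is not the kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint64_t cycle_t;

/* Hypothetical, trimmed stand-ins for the kernel structures. */
struct clocksource {
	const char *name;
	cycle_t max_cycles;
};

struct tk_read_base {
	struct clocksource *clock;
};

struct timekeeper {
	struct tk_read_base tkr_mono;
};

/*
 * Returns true (and warns) if offset exceeds the clock's max_cycles.
 * Only the read base is needed, so the caller passes &tk->tkr_mono.
 */
static bool check_update(const struct tk_read_base *tkr, cycle_t offset)
{
	cycle_t max_cycles;
	const char *name;

	assert(tkr != NULL && tkr->clock != NULL);
	max_cycles = tkr->clock->max_cycles;
	name = tkr->clock->name;

	if (offset > max_cycles) {
		fprintf(stderr,
			"WARNING: cycle offset (%llu) exceeds '%s' max_cycles (%llu)\n",
			(unsigned long long)offset, name,
			(unsigned long long)max_cycles);
		return true;
	}
	return false;
}
```

A caller that holds a whole timekeeper passes &tk->tkr_mono, matching the update_wall_time() change in this patch.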

Cc: Ingo Molnar 
Cc: Thomas Gleixner 
Cc: Peter Zijlstra 
Cc: Prarit Bhargava 
Cc: Richard Cochran 
Signed-off-by: John Stultz 
---
 kernel/time/timekeeping.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index fe91e8e..676896e 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -131,11 +131,11 @@ static int timekeeping_overflow_seen;
 /* last_warning is only modified under the timekeeping lock */
 static long timekeeping_last_warning;
 
-static void timekeeping_check_update(struct timekeeper *tk, cycle_t offset)
+static void timekeeping_check_update(struct tk_read_base *tkr, cycle_t offset)
 {
 
-   cycle_t max_cycles = tk->tkr_mono.clock->max_cycles;
-   const char *name = tk->tkr_mono.clock->name;
+   cycle_t max_cycles = tkr->clock->max_cycles;
+   const char *name = tkr->clock->name;
 
if (offset > max_cycles) {
printk_deferred("WARNING: timekeeping: Cycle offset (%lld) is larger than allowed by the '%s' clock's max_cycles value (%lld): time overflow danger\n",
@@ -210,7 +210,7 @@ static inline cycle_t timekeeping_get_delta(struct tk_read_base *tkr)
return delta;
 }
 #else
-static inline void timekeeping_check_update(struct timekeeper *tk, cycle_t offset)
+static inline void timekeeping_check_update(struct tk_read_base *tkr, cycle_t offset)
 {
 }
 static inline cycle_t timekeeping_get_delta(struct tk_read_base *tkr)
@@ -1794,7 +1794,7 @@ void update_wall_time(void)
goto out;
 
/* Do some additional sanity checking */
-   timekeeping_check_update(real_tk, offset);
+   timekeeping_check_update(&real_tk->tkr_mono, offset);
 
/*
 * With NO_HZ we may have to accumulate many cycle_intervals
-- 
1.9.1

[PATCH 18/21] time: rtc: Don't bother into rtc_resume() for the nonstop clocksource

2015-04-01 Thread John Stultz

From: Xunlei Pang 

If a system does not provide a persistent_clock(), the time
will be updated on resume by rtc_resume(). With the addition
of the non-stop clocksources for suspend timing, those systems
set the time on resume in timekeeping_resume(), but may not
provide a valid persistent_clock().

This results in the rtc_resume() logic thinking no one has set
the time, so it overwrites the suspend time again, which is
unnecessary and only increases clock error.

So, fix this for rtc_resume().

This patch also improves the name of persistent_clock_exist to
make it more grammatical.
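The resume-time decision can be modelled roughly as follows. This is a hypothetical simplification: the two flags mirror the static variables the patch adds to timekeeping.c, but model_timekeeping_resume() collapses the real resume path (which, for example, also checks that the measured sleep delta is positive) into a single expression. Only the skip-resume query, which the patch shows returning sleeptime_injected, is modelled.

```c
#include <assert.h>
#include <stdbool.h>

static bool persistent_clock_exists; /* set at boot if a persistent clock was found */
static bool sleeptime_injected;      /* set when timekeeping injected sleep time */

/*
 * Hypothetical resume step: source 1) a non-stop clocksource, else
 * source 2) the persistent clock. If either was usable, timekeeping
 * has already injected the sleep time itself.
 */
static void model_timekeeping_resume(bool nonstop_clocksource_usable)
{
	sleeptime_injected = nonstop_clocksource_usable || persistent_clock_exists;
}

/* Same logic as timekeeping_rtc_skipresume() in the patch. */
static bool model_rtc_skipresume(void)
{
	return sleeptime_injected;
}

/* rtc_resume() only falls back to source 3), the RTC, when nothing else ran. */
static bool model_rtc_resume_sets_time(void)
{
	return !model_rtc_skipresume();
}
```

Without the skip, the RTC path would run even after a non-stop clocksource had already accounted for the sleep, which is the double update the patch removes.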

Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Signed-off-by: Xunlei Pang 
Signed-off-by: John Stultz 
---
 drivers/rtc/class.c |  4 +--
 include/linux/timekeeping.h |  9 +++
 kernel/time/timekeeping.c   | 66 +
 3 files changed, 54 insertions(+), 25 deletions(-)

diff --git a/drivers/rtc/class.c b/drivers/rtc/class.c
index d40760a..c29ba7e 100644
--- a/drivers/rtc/class.c
+++ b/drivers/rtc/class.c
@@ -55,7 +55,7 @@ static int rtc_suspend(struct device *dev)
struct timespec64   delta, delta_delta;
int err;
 
-   if (has_persistent_clock())
+   if (timekeeping_rtc_skipsuspend())
return 0;
 
if (strcmp(dev_name(&rtc->dev), CONFIG_RTC_HCTOSYS_DEVICE) != 0)
@@ -102,7 +102,7 @@ static int rtc_resume(struct device *dev)
struct timespec64   sleep_time;
int err;
 
-   if (has_persistent_clock())
+   if (timekeeping_rtc_skipresume())
return 0;
 
rtc_hctosys_ret = -ENODEV;
diff --git a/include/linux/timekeeping.h b/include/linux/timekeeping.h
index 7a2369d..99176af 100644
--- a/include/linux/timekeeping.h
+++ b/include/linux/timekeeping.h
@@ -248,6 +248,9 @@ static inline void timekeeping_clocktai(struct timespec *ts)
 /*
  * RTC specific
  */
+extern bool timekeeping_rtc_skipsuspend(void);
+extern bool timekeeping_rtc_skipresume(void);
+
 extern void timekeeping_inject_sleeptime64(struct timespec64 *delta);
 
 /*
@@ -259,14 +262,8 @@ extern void getnstime_raw_and_real(struct timespec *ts_raw,
 /*
  * Persistent clock related interfaces
  */
-extern bool persistent_clock_exist;
 extern int persistent_clock_is_local;
 
-static inline bool has_persistent_clock(void)
-{
-   return persistent_clock_exist;
-}
-
 extern void read_persistent_clock(struct timespec *ts);
 extern void read_persistent_clock64(struct timespec64 *ts);
 extern void read_boot_clock(struct timespec *ts);
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 6dafcea..fe91e8e 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -64,9 +64,6 @@ static struct tk_fast tk_fast_raw  cacheline_aligned;
 /* flag for if timekeeping is suspended */
 int __read_mostly timekeeping_suspended;
 
-/* Flag for if there is a persistent clock on this platform */
-bool __read_mostly persistent_clock_exist = false;
-
 static inline void tk_normalize_xtime(struct timekeeper *tk)
 {
while (tk->tkr_mono.xtime_nsec >= ((u64)NSEC_PER_SEC << tk->tkr_mono.shift)) {
@@ -1204,6 +1201,12 @@ void __weak read_boot_clock64(struct timespec64 *ts64)
*ts64 = timespec_to_timespec64(ts);
 }
 
+/* Flag for if timekeeping_resume() has injected sleeptime */
+static bool sleeptime_injected;
+
+/* Flag for if there is a persistent clock on this platform */
+static bool persistent_clock_exists;
+
 /*
  * timekeeping_init - Initializes the clocksource and common timekeeping values
  */
@@ -1221,7 +1224,7 @@ void __init timekeeping_init(void)
now.tv_sec = 0;
now.tv_nsec = 0;
} else if (now.tv_sec || now.tv_nsec)
-   persistent_clock_exist = true;
+   persistent_clock_exists = true;
 
read_boot_clock64(&boot);
if (!timespec64_valid_strict(&boot)) {
@@ -1282,11 +1285,47 @@ static void __timekeeping_inject_sleeptime(struct timekeeper *tk,
 
 #if defined(CONFIG_PM_SLEEP) && defined(CONFIG_RTC_HCTOSYS_DEVICE)
 /**
+ * We have three kinds of time sources to use for sleep time
+ * injection, the preference order is:
+ * 1) non-stop clocksource
+ * 2) persistent clock (ie: RTC accessible when irqs are off)
+ * 3) RTC
+ *
+ * 1) and 2) are used by timekeeping, 3) by RTC subsystem.
+ * If system has neither 1) nor 2), 3) will be used finally.
+ *
+ *
+ * If timekeeping has injected sleeptime via either 1) or 2),
+ * 3) becomes needless, so in this case we don't need to call
+ * rtc_resume(), and this is what timekeeping_rtc_skipresume()
+ * means.
+ */
+bool timekeeping_rtc_skipresume(void)
+{
+   return sleeptime_injected;
+}
+
+/**
+ * 1) can be determined whether to use or not only when doing
+ * timekeeping_resume() which is invoked after rtc_suspend(),
+ * so we can't skip rtc_suspend() surely if system has 1).
+ *
+ * But if system has 2), 2) will definitely be used, so in this
+ * case we don'

[PATCH 12/21] rtc/mxc: Convert get_alarm_or_time()/set_alarm_or_time() to use time64_t

2015-04-01 Thread John Stultz

From: Xunlei Pang 

We want to convert mxc_rtc_set_mmss() to use rtc_class_ops's set_mmss64(),
but it relies on the internal interfaces get_alarm_or_time() and
set_alarm_or_time(), which are y2038 unsafe.

So, as a separate preparatory patch, this converts these two internal
interfaces of "mxc" to use the y2038-safe time64_t.
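A small userspace sketch of the two conversions, assuming a plain 64-bit signed time64_t and a hypothetical rtc_fields struct standing in for the day/hour/minute/second registers. The kernel uses div_s64_rem() because open-coded 64-bit division is not available on every 32-bit architecture; ordinary / and % are used here instead. Note the cast to time64_t before the first multiplication when joining: with u32 operands, as in the old code, the result wraps once the time passes 2^32 seconds.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t time64_t;

/* Hypothetical container for the broken-down register values. */
struct rtc_fields {
	uint32_t day, hr, min, sec;
};

/* Split like set_alarm_or_time(): whole days first, then time of day. */
static struct rtc_fields split_time(time64_t t)
{
	struct rtc_fields f;
	uint32_t tod;

	assert(t >= 0);
	f.day = (uint32_t)(t / 86400);
	tod = (uint32_t)(t % 86400);   /* time within the day */
	f.hr = tod / 3600;
	tod -= f.hr * 3600;            /* time within the hour */
	f.min = tod / 60;
	f.sec = tod - f.min * 60;
	return f;
}

/* Join like get_alarm_or_time(): widen before multiplying. */
static time64_t join_time(struct rtc_fields f)
{
	return ((((time64_t)f.day * 24 + f.hr) * 60) + f.min) * 60 + f.sec;
}
```

For example, 90061 seconds splits into 1 day, 1 hour, 1 minute and 1 second, and 50000 days joins to 4320000000 seconds, a value that does not fit in a u32.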

Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Acked-by: Alessandro Zummo 
Signed-off-by: Xunlei Pang 
Signed-off-by: John Stultz 
---
 drivers/rtc/rtc-mxc.c | 29 ++---
 1 file changed, 14 insertions(+), 15 deletions(-)

diff --git a/drivers/rtc/rtc-mxc.c b/drivers/rtc/rtc-mxc.c
index a7b218f..83cba23 100644
--- a/drivers/rtc/rtc-mxc.c
+++ b/drivers/rtc/rtc-mxc.c
@@ -106,7 +106,7 @@ static inline int is_imx1_rtc(struct rtc_plat_data *data)
  * This function is used to obtain the RTC time or the alarm value in
  * second.
  */
-static u32 get_alarm_or_time(struct device *dev, int time_alarm)
+static time64_t get_alarm_or_time(struct device *dev, int time_alarm)
 {
struct platform_device *pdev = to_platform_device(dev);
struct rtc_plat_data *pdata = platform_get_drvdata(pdev);
@@ -129,29 +129,28 @@ static u32 get_alarm_or_time(struct device *dev, int time_alarm)
hr = hr_min >> 8;
min = hr_min & 0xff;
 
-   return (((day * 24 + hr) * 60) + min) * 60 + sec;
+   return ((((time64_t)day * 24 + hr) * 60) + min) * 60 + sec;
 }
 
 /*
  * This function sets the RTC alarm value or the time value.
  */
-static void set_alarm_or_time(struct device *dev, int time_alarm, u32 time)
+static void set_alarm_or_time(struct device *dev, int time_alarm, time64_t time)
 {
-   u32 day, hr, min, sec, temp;
+   u32 tod, day, hr, min, sec, temp;
struct platform_device *pdev = to_platform_device(dev);
struct rtc_plat_data *pdata = platform_get_drvdata(pdev);
void __iomem *ioaddr = pdata->ioaddr;
 
-   day = time / 86400;
-   time -= day * 86400;
+   day = div_s64_rem(time, 86400, &tod);
 
/* time is within a day now */
-   hr = time / 3600;
-   time -= hr * 3600;
+   hr = tod / 3600;
+   tod -= hr * 3600;
 
/* time is within an hour now */
-   min = time / 60;
-   sec = time - min * 60;
+   min = tod / 60;
+   sec = tod - min * 60;
 
temp = (hr << 8) + min;
 
@@ -175,12 +174,12 @@ static void set_alarm_or_time(struct device *dev, int time_alarm, u32 time)
  */
 static void rtc_update_alarm(struct device *dev, struct rtc_time *alrm)
 {
-   unsigned long time;
+   time64_t time;
struct platform_device *pdev = to_platform_device(dev);
struct rtc_plat_data *pdata = platform_get_drvdata(pdev);
void __iomem *ioaddr = pdata->ioaddr;
 
-   rtc_tm_to_time(alrm, &time);
+   time = rtc_tm_to_time64(alrm);
 
/* clear all the interrupt status bits */
writew(readw(ioaddr + RTC_RTCISR), ioaddr + RTC_RTCISR);
@@ -272,14 +271,14 @@ static int mxc_rtc_alarm_irq_enable(struct device *dev, unsigned int enabled)
  */
 static int mxc_rtc_read_time(struct device *dev, struct rtc_time *tm)
 {
-   u32 val;
+   time64_t val;
 
/* Avoid roll-over from reading the different registers */
do {
val = get_alarm_or_time(dev, MXC_RTC_TIME);
} while (val != get_alarm_or_time(dev, MXC_RTC_TIME));
 
-   rtc_time_to_tm(val, tm);
+   rtc_time64_to_tm(val, tm);
 
return 0;
 }
@@ -322,7 +321,7 @@ static int mxc_rtc_read_alarm(struct device *dev, struct rtc_wkalrm *alrm)
struct rtc_plat_data *pdata = platform_get_drvdata(pdev);
void __iomem *ioaddr = pdata->ioaddr;
 
-   rtc_time_to_tm(get_alarm_or_time(dev, MXC_RTC_ALARM), &alrm->time);
+   rtc_time64_to_tm(get_alarm_or_time(dev, MXC_RTC_ALARM), &alrm->time);
alrm->pending = ((readw(ioaddr + RTC_RTCISR) & RTC_ALM_BIT)) ? 1 : 0;
 
return 0;
-- 
1.9.1

[PATCH 16/21] rtc: Remove redundant rtc_valid_tm() from rtc_resume()

2015-04-01 Thread John Stultz

From: Xunlei Pang 

rtc_read_time() has already validated tm with rtc_valid_tm(),
so just remove the redundant check.
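For context, a simplified model of the range checks an RTC validity test applies to a broken-down time, using the C library's struct tm conventions (tm_year counts from 1900, tm_mon is 0-based). It approximates, but is not copied from, the kernel's rtc_valid_tm(); since rtc_read_time() already applies such a check, repeating it in rtc_resume() adds nothing.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <time.h>

static bool is_leap(int year)
{
	return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
}

/* Days in a 0-based month of a full (e.g. 2015) year. */
static int month_days(int mon, int year)
{
	static const int days[12] = { 31, 28, 31, 30, 31, 30,
				      31, 31, 30, 31, 30, 31 };

	assert(mon >= 0 && mon < 12);
	return days[mon] + (mon == 1 && is_leap(year));
}

/* Model of an RTC time validity check: 0 if valid, -EINVAL otherwise. */
static int model_valid_tm(const struct tm *tm)
{
	if (tm->tm_year < 70 ||
	    (unsigned)tm->tm_mon >= 12 ||     /* checked before indexing */
	    tm->tm_mday < 1 ||
	    tm->tm_mday > month_days(tm->tm_mon, tm->tm_year + 1900) ||
	    (unsigned)tm->tm_hour >= 24 ||
	    (unsigned)tm->tm_min >= 60 ||
	    (unsigned)tm->tm_sec >= 60)
		return -EINVAL;
	return 0;
}
```

February 29 is rejected for 2015 and accepted for 2016 by this model.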

Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Signed-off-by: Xunlei Pang 
Signed-off-by: John Stultz 
---
 drivers/rtc/class.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/drivers/rtc/class.c b/drivers/rtc/class.c
index 472a5ad..d40760a 100644
--- a/drivers/rtc/class.c
+++ b/drivers/rtc/class.c
@@ -117,10 +117,6 @@ static int rtc_resume(struct device *dev)
return 0;
}
 
-   if (rtc_valid_tm(&tm) != 0) {
-   pr_debug("%s:  bogus resume time\n", dev_name(&rtc->dev));
-   return 0;
-   }
new_rtc.tv_sec = rtc_tm_to_time64(&tm);
new_rtc.tv_nsec = 0;
 
-- 
1.9.1

[PATCH 08/21] rtc/test: Update driver to address y2038/y2106 issues

2015-04-01 Thread John Stultz

From: Xunlei Pang 

This driver has a number of y2038/y2106 issues.

This patch resolves them by:
- Replace get_seconds() with ktime_get_real_seconds()
- Replace rtc_time_to_tm() with rtc_time64_to_tm()

Also add test_rtc_set_mmss64() for testing rtc_class_ops's
set_mmss64(), which can be activated by the "test_mmss64" module
parameter.
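A rough model of why test_rtc_ops loses its const qualifier: the probe routine patches the ops table according to the module parameter before registering it. The struct and function names are hypothetical, set_mmss takes an unsigned long as the legacy hook did, and the dispatch order (64-bit hook first) is an assumption for illustration rather than a quote of the RTC core.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int64_t time64_t;

/* Hypothetical, trimmed-down ops table with only the two setter hooks. */
struct model_rtc_ops {
	int (*set_mmss)(unsigned long secs);  /* legacy hook */
	int (*set_mmss64)(time64_t secs);     /* y2038-safe hook */
};

static time64_t last_secs;
static int last_hook; /* 32 or 64: records which hook ran */

static int legacy_set_mmss(unsigned long secs)
{
	last_secs = (time64_t)secs;
	last_hook = 32;
	return 0;
}

static int model_set_mmss64(time64_t secs)
{
	last_secs = secs;
	last_hook = 64;
	return 0;
}

/* Not const: probe may rewrite it, as test_probe() does with test_rtc_ops. */
static struct model_rtc_ops ops = { .set_mmss = legacy_set_mmss };

static void model_probe(int test_mmss64)
{
	if (test_mmss64) {
		ops.set_mmss64 = model_set_mmss64;
		ops.set_mmss = NULL;
	}
}

/* Assumed dispatch order: prefer the 64-bit hook when present. */
static int model_set_time(time64_t secs)
{
	assert(secs >= 0);
	if (ops.set_mmss64)
		return ops.set_mmss64(secs);
	if (ops.set_mmss)
		return ops.set_mmss((unsigned long)secs);
	return -1;
}
```

Before model_probe(1) the legacy hook handles the call; afterwards a value beyond 2^32 reaches the 64-bit hook intact.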

Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Acked-by: Alessandro Zummo 
Signed-off-by: Xunlei Pang 
Signed-off-by: John Stultz 
---
 drivers/rtc/rtc-test.c | 19 +--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/drivers/rtc/rtc-test.c b/drivers/rtc/rtc-test.c
index 8f86fa9..3a2da4c 100644
--- a/drivers/rtc/rtc-test.c
+++ b/drivers/rtc/rtc-test.c
@@ -13,6 +13,10 @@
 #include 
 #include 
 
+static int test_mmss64;
+module_param(test_mmss64, int, 0644);
+MODULE_PARM_DESC(test_mmss64, "Test struct rtc_class_ops.set_mmss64().");
+
 static struct platform_device *test0 = NULL, *test1 = NULL;
 
 static int test_rtc_read_alarm(struct device *dev,
@@ -30,7 +34,13 @@ static int test_rtc_set_alarm(struct device *dev,
 static int test_rtc_read_time(struct device *dev,
struct rtc_time *tm)
 {
-   rtc_time_to_tm(get_seconds(), tm);
+   rtc_time64_to_tm(ktime_get_real_seconds(), tm);
+   return 0;
+}
+
+static int test_rtc_set_mmss64(struct device *dev, time64_t secs)
+{
+   dev_info(dev, "%s, secs = %lld\n", __func__, (long long)secs);
return 0;
 }
 
@@ -55,7 +65,7 @@ static int test_rtc_alarm_irq_enable(struct device *dev, unsigned int enable)
return 0;
 }
 
-static const struct rtc_class_ops test_rtc_ops = {
+static struct rtc_class_ops test_rtc_ops = {
.proc = test_rtc_proc,
.read_time = test_rtc_read_time,
.read_alarm = test_rtc_read_alarm,
@@ -101,6 +111,11 @@ static int test_probe(struct platform_device *plat_dev)
int err;
struct rtc_device *rtc;
 
+   if (test_mmss64) {
+   test_rtc_ops.set_mmss64 = test_rtc_set_mmss64;
+   test_rtc_ops.set_mmss = NULL;
+   }
+
rtc = devm_rtc_device_register(&plat_dev->dev, "test",
&test_rtc_ops, THIS_MODULE);
if (IS_ERR(rtc)) {
-- 
1.9.1

[PATCH 01/21] time: Add y2038 safe read_boot_clock64()

2015-04-01 Thread John Stultz

From: Xunlei Pang 

As part of addressing in-kernel y2038 issues, this patch adds
read_boot_clock64() and replaces all the call sites of read_boot_clock()
with this function. This is a __weak implementation, which simply
calls the existing y2038 unsafe read_boot_clock().

This allows architecture specific implementations to be converted
independently, and eventually the y2038 unsafe read_boot_clock()
can be removed after all its architecture specific implementations
have been converted to read_boot_clock64().
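The __weak fallback pattern can be sketched in userspace as below. The types and read_boot_clock_legacy() are hypothetical stand-ins; the point is that the default 64-bit reader only widens the legacy 32-bit result, and an architecture can supply a strong definition of the same symbol to replace it. This assumes a GCC- or Clang-compatible compiler for __attribute__((weak)).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical types: a legacy 32-bit-seconds timespec and a 64-bit one. */
struct old_timespec { int32_t tv_sec; long tv_nsec; };
struct new_timespec64 { int64_t tv_sec; long tv_nsec; };

/* Hypothetical legacy reader; pretends firmware reported this boot time. */
static void read_boot_clock_legacy(struct old_timespec *ts)
{
	ts->tv_sec = 1427846400; /* 2015-04-01 00:00:00 UTC */
	ts->tv_nsec = 0;
}

void read_boot_clock64_model(struct new_timespec64 *ts64);

/*
 * Weak default: wrap the legacy reader and widen its result. A strong
 * definition of read_boot_clock64_model() elsewhere would take precedence.
 */
__attribute__((weak)) void read_boot_clock64_model(struct new_timespec64 *ts64)
{
	struct old_timespec ts;

	read_boot_clock_legacy(&ts);
	ts64->tv_sec = ts.tv_sec;
	ts64->tv_nsec = ts.tv_nsec;
}
```

Callers switch to the 64-bit interface immediately, while each architecture converts its own implementation later.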

Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Suggested-by: Arnd Bergmann 
Signed-off-by: Xunlei Pang 
Signed-off-by: John Stultz 
---
 include/linux/timekeeping.h |  1 +
 kernel/time/timekeeping.c   | 11 +--
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/include/linux/timekeeping.h b/include/linux/timekeeping.h
index 5047b83..18d27a3 100644
--- a/include/linux/timekeeping.h
+++ b/include/linux/timekeeping.h
@@ -269,6 +269,7 @@ static inline bool has_persistent_clock(void)
 
 extern void read_persistent_clock(struct timespec *ts);
 extern void read_boot_clock(struct timespec *ts);
+extern void read_boot_clock64(struct timespec64 *ts);
 extern int update_persistent_clock(struct timespec now);
 
 
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index c3fcff0..50c4bec 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -1188,6 +1188,14 @@ void __weak read_boot_clock(struct timespec *ts)
ts->tv_nsec = 0;
 }
 
+void __weak read_boot_clock64(struct timespec64 *ts64)
+{
+   struct timespec ts;
+
+   read_boot_clock(&ts);
+   *ts64 = timespec_to_timespec64(ts);
+}
+
 /*
  * timekeeping_init - Initializes the clocksource and common timekeeping values
  */
@@ -1209,8 +1217,7 @@ void __init timekeeping_init(void)
} else if (now.tv_sec || now.tv_nsec)
persistent_clock_exist = true;
 
-   read_boot_clock(&ts);
-   boot = timespec_to_timespec64(ts);
+   read_boot_clock64(&boot);
if (!timespec64_valid_strict(&boot)) {
pr_warn("WARNING: Boot clock returned invalid value!\n"
" Check your CMOS/BIOS settings.\n");
-- 
1.9.1

Re: kernel/timer: avoid spurious ksoftirqd wakeups

2015-04-01 Thread Hillf Danton

> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -568,6 +568,7 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
>   unsigned long rcu_delta_jiffies;
>   struct clock_event_device *dev = __this_cpu_read(tick_cpu_device.evtdev);
>   u64 time_delta;
> + bool raise_softirq;
> 
s/raise_softirq/raise_softirq = false/ ?
>   time_delta = timekeeping_max_deferment();
> 

Re: sched: Improve load balancing in the presence of idle CPUs

2015-04-01 Thread Jason Low

On Wed, 2015-04-01 at 18:04 +0100, Morten Rasmussen wrote:
> On Wed, Apr 01, 2015 at 07:49:56AM +0100, Preeti U Murthy wrote:
> > 
> > On 04/01/2015 12:24 AM, Jason Low wrote:
> > > On Tue, 2015-03-31 at 14:07 +0530, Preeti U Murthy wrote:
> > >> Hi Jason,
> > >>
> > >> On 03/31/2015 12:25 AM, Jason Low wrote:
> > >>> Hi Preeti,
> > >>>
> > >>> I noticed that another commit 4a725627f21d converted the check in
> > >>> nohz_kick_needed() from idle_cpu() to rq->idle_balance, causing a
> > >>> potentially outdated value to be used if this cpu is able to pull tasks
> > >>> using rebalance_domains(), and nohz_kick_needed() directly returning
> > >>> false.
> > >>
> > >> I see that rebalance_domains() will be run at the end of the scheduler
> > >> tick interrupt handling. trigger_load_balance() only sets the softirq,
> > >> it does not call rebalance_domains() immediately. So the call graph
> > >> would be:
> > > 
> > > Oh right, since that only sets the softirq, this wouldn't be the issue,
> > > though we would need these changes if we were to incorporate any sort of
> > > nohz_kick_needed() logic into the nohz_idle_balance() code path correct?
> > 
> > I am sorry I don't quite get this. Can you please elaborate?
> 
> I think the scenario is that we are in nohz_idle_balance() and decide to
> bail out because we have pulled some tasks, but before leaving
> nohz_idle_balance() we want to check if more balancing is necessary
> using nohz_kick_needed() and potentially kick somebody to continue.

> Note that the balance cpu is currently skipped in nohz_idle_balance(),
> but if it wasn't the scenario would be possible.

This scenario would also be possible if we call rebalance_domains()
first again.

I'm wondering if adding the nohz_kick_needed() check, etc., to
nohz_idle_balance() can address the 10 second latency issue while still
calling rebalance_domains() first, since it seems better to try
balancing on the current awake CPU first, as you also mentioned.

> In that case, we can't rely on rq->idle_balance as it would not be
> up-to-date. Also, we may even want to use nohz_kick_needed(rq) where rq
> != this_rq, in which case we probably also want an updated status. It
> seems that rq->idle_balance is only updated at each tick.

Yup, that's about what I was describing.

Re: [PATCH v5 01/10] pci: export functions for msi/msix ctrl

2015-04-01 Thread Fam Zheng

On Sun, 03/29 17:04, Michael S. Tsirkin wrote:
> move pci_msi_set_enable and pci_msix_clear_and_set_ctrl out of msi.c, so
> we can use them will be used which MSI isn't configured in kernel.

if

s/will be used which/when/

then

Reviewed-by: Fam Zheng 
> 
> Signed-off-by: Michael S. Tsirkin 
> ---
>  drivers/pci/pci.h | 21 +
>  drivers/pci/msi.c | 45 -
>  2 files changed, 33 insertions(+), 33 deletions(-)
> 
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index 4091f82..17f213d 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -146,6 +146,27 @@ static inline void pci_no_msi(void) { }
>  static inline void pci_msi_init_pci_dev(struct pci_dev *dev) { }
>  #endif
>  
> +static inline void pci_msi_set_enable(struct pci_dev *dev, int enable)
> +{
> + u16 control;
> +
> + pci_read_config_word(dev, dev->msi_cap + PCI_MSI_FLAGS, &control);
> + control &= ~PCI_MSI_FLAGS_ENABLE;
> + if (enable)
> + control |= PCI_MSI_FLAGS_ENABLE;
> + pci_write_config_word(dev, dev->msi_cap + PCI_MSI_FLAGS, control);
> +}
> +
> +static inline void pci_msix_clear_and_set_ctrl(struct pci_dev *dev, u16 clear, u16 set)
> +{
> + u16 ctrl;
> +
> + pci_read_config_word(dev, dev->msix_cap + PCI_MSIX_FLAGS, &ctrl);
> + ctrl &= ~clear;
> + ctrl |= set;
> + pci_write_config_word(dev, dev->msix_cap + PCI_MSIX_FLAGS, ctrl);
> +}
> +
>  void pci_realloc_get_opt(char *);
>  
>  static inline int pci_no_d1d2(struct pci_dev *dev)
> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> index c3e7dfc..9942f68 100644
> --- a/drivers/pci/msi.c
> +++ b/drivers/pci/msi.c
> @@ -185,27 +185,6 @@ void __weak arch_restore_msi_irqs(struct pci_dev *dev)
>   return default_restore_msi_irqs(dev);
>  }
>  
> -static void msi_set_enable(struct pci_dev *dev, int enable)
> -{
> - u16 control;
> -
> - pci_read_config_word(dev, dev->msi_cap + PCI_MSI_FLAGS, &control);
> - control &= ~PCI_MSI_FLAGS_ENABLE;
> - if (enable)
> - control |= PCI_MSI_FLAGS_ENABLE;
> - pci_write_config_word(dev, dev->msi_cap + PCI_MSI_FLAGS, control);
> -}
> -
> -static void msix_clear_and_set_ctrl(struct pci_dev *dev, u16 clear, u16 set)
> -{
> - u16 ctrl;
> -
> - pci_read_config_word(dev, dev->msix_cap + PCI_MSIX_FLAGS, &ctrl);
> - ctrl &= ~clear;
> - ctrl |= set;
> - pci_write_config_word(dev, dev->msix_cap + PCI_MSIX_FLAGS, ctrl);
> -}
> -
>  static inline __attribute_const__ u32 msi_mask(unsigned x)
>  {
>   /* Don't shift by >= width of type */
> @@ -452,7 +431,7 @@ static void __pci_restore_msi_state(struct pci_dev *dev)
>   entry = irq_get_msi_desc(dev->irq);
>  
>   pci_intx_for_msi(dev, 0);
> - msi_set_enable(dev, 0);
> + pci_msi_set_enable(dev, 0);
>   arch_restore_msi_irqs(dev);
>  
>   pci_read_config_word(dev, dev->msi_cap + PCI_MSI_FLAGS, &control);
> @@ -473,14 +452,14 @@ static void __pci_restore_msix_state(struct pci_dev *dev)
>  
>   /* route the table */
>   pci_intx_for_msi(dev, 0);
> - msix_clear_and_set_ctrl(dev, 0,
> + pci_msix_clear_and_set_ctrl(dev, 0,
>   PCI_MSIX_FLAGS_ENABLE | PCI_MSIX_FLAGS_MASKALL);
>  
>   arch_restore_msi_irqs(dev);
>   list_for_each_entry(entry, &dev->msi_list, list)
>   msix_mask_irq(entry, entry->masked);
>  
> - msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_MASKALL, 0);
> + pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_MASKALL, 0);
>  }
>  
>  void pci_restore_msi_state(struct pci_dev *dev)
> @@ -647,7 +626,7 @@ static int msi_capability_init(struct pci_dev *dev, int nvec)
>   int ret;
>   unsigned mask;
>  
> - msi_set_enable(dev, 0); /* Disable MSI during set up */
> + pci_msi_set_enable(dev, 0); /* Disable MSI during set up */
>  
>   entry = msi_setup_entry(dev, nvec);
>   if (!entry)
> @@ -683,7 +662,7 @@ static int msi_capability_init(struct pci_dev *dev, int nvec)
>  
>   /* Set MSI enabled bits  */
>   pci_intx_for_msi(dev, 0);
> - msi_set_enable(dev, 1);
> + pci_msi_set_enable(dev, 1);
>   dev->msi_enabled = 1;
>  
>   dev->irq = entry->irq;
> @@ -775,7 +754,7 @@ static int msix_capability_init(struct pci_dev *dev,
>   void __iomem *base;
>  
>   /* Ensure MSI-X is disabled while it is set up */
> - msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_ENABLE, 0);
> + pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_ENABLE, 0);
>  
>   pci_read_config_word(dev, dev->msix_cap + PCI_MSIX_FLAGS, &control);
>   /* Request & Map MSI-X table region */
> @@ -801,7 +780,7 @@ static int msix_capability_init(struct pci_dev *dev,
>* MSI-X registers.  We need to mask all the vectors to prevent
>* interrupts coming in before they're fully set up.
>*/
> - msix_clear_and_set_ctrl(dev, 0,
> + pci_msix_clear_and_set_ctrl(dev, 0,
>
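The clear-then-set read-modify-write performed by pci_msix_clear_and_set_ctrl() can be modelled as a pure function on the 16-bit control word. The MODEL_ constants assume the MSI-X Message Control layout (enable in bit 15, function mask in bit 14); config-space access is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed MSI-X Message Control bits (same values as PCI_MSIX_FLAGS_*). */
#define MODEL_MSIX_FLAGS_ENABLE  0x8000u
#define MODEL_MSIX_FLAGS_MASKALL 0x4000u

/* Clear the requested bits first, then set the requested bits. */
static uint16_t clear_and_set(uint16_t ctrl, uint16_t clear, uint16_t set)
{
	ctrl &= (uint16_t)~clear;
	ctrl |= set;
	return ctrl;
}
```

Calling it with (0, ENABLE | MASKALL) and then (MASKALL, 0) mirrors the two steps in __pci_restore_msix_state(): enable with all vectors masked, restore the per-vector masks, then drop the global mask. Bits not named in either argument, such as the table-size field, are preserved.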

Re: [PATCH] x86/numa: kernel stack corruption fix

2015-04-01 Thread Dave Young

Hi, Xishi

[snip]
> >>  arch/x86/mm/numa.c |3 ++-
> >>  1 file changed, 2 insertions(+), 1 deletion(-)
> >>
> >> --- linux.orig/arch/x86/mm/numa.c
> >> +++ linux/arch/x86/mm/numa.c
> >> @@ -484,7 +484,8 @@ static void __init numa_clear_kernel_nod
> >>  
> >>/* Mark all kernel nodes. */
> >>for_each_memblock(reserved, r)
> >> -  node_set(r->nid, numa_kernel_nodes);
> 
> Hi Dave,
> 
> How about add some comment here? if set numa=off, numa_meminfo may not include
> all the memblock.reserved memory. e.g. trim_snb_memory()

Sure, I can do that. I will send an update if there are no other comments.

> >> +  if (r->nid != MAX_NUMNODES)
> >> +  node_set(r->nid, numa_kernel_nodes);
> >>  
> >>/* Clear MEMBLOCK_HOTPLUG flag for memory in kernel nodes. */
> >>for (i = 0; i < numa_meminfo.nr_blks; i++) {
> >>
> > 
> 

Thanks
Dave
> 
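To see why the unchecked node_set() can corrupt the stack, consider a hypothetical model of a node bitmap sized for MODEL_MAX_NUMNODES bits. A nid equal to MAX_NUMNODES (which, judging by the patch's check, is what reserved regions outside numa_meminfo carry) indexes one bit past the end of the bitmap; the guard drops it. This model checks the full range, whereas the patch compares only against MAX_NUMNODES.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical size; the kernel's MAX_NUMNODES depends on the config. */
#define MODEL_MAX_NUMNODES 64
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define NODE_WORDS ((MODEL_MAX_NUMNODES + BITS_PER_WORD - 1) / BITS_PER_WORD)

struct model_nodemask {
	unsigned long bits[NODE_WORDS];
};

/* Set the bit for nid only if it is a real node number. */
static void mark_kernel_node(struct model_nodemask *m, int nid)
{
	if (nid < 0 || nid >= MODEL_MAX_NUMNODES)
		return; /* "no node": writing it would overrun bits[] */
	m->bits[nid / BITS_PER_WORD] |= 1UL << (nid % BITS_PER_WORD);
}
```

With 64-bit longs the bitmap is a single word, so setting bit 64 would write into whatever follows the mask on the stack.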

Re: [PATCH V2] sched: Improve load balancing in the presence of idle CPUs

2015-04-01 Thread Preeti U Murthy

Hi Morten,

On 04/01/2015 06:33 PM, Morten Rasmussen wrote:
>> Alright I see. But it is one additional wake up. And the wake up will be
>> within the cluster. We will not wake up any CPU in the neighboring
>> cluster unless there are tasks to be pulled. So, we can wake up a core
>> out of a deep idle state and never a cluster in the problem described.
>> In terms of energy efficiency, this is not so bad a scenario, is it?
> 
> After Peter pointed out that it shouldn't happen across clusters due to
> group_classify()/sg_capacity_factor() it isn't as bad as I initially
> thought. It is still not an ideal solution I think. Wake-ups aren't nice
> for battery-powered devices. Waking up a cpu in an already active
> cluster may still imply powering up the core and bringing the L1 cache
> into a usable state, but it isn't as bad as waking up a cluster. I would
> prefer to avoid it if we can.
> 
> Thinking more about it, don't we also risk doing a lot of iterations in
> nohz_idle_balance() leading to nothing (pure overhead) in certain corner
> cases? If find_new_ild() is the last cpu in the cluster and we have one
> task for each cpu in the cluster but one cpu is currently having two.
> Don't we end up trying all nohz-idle cpus before giving up and balancing
> the balancer cpu itself. On big machines, going through everyone could
> take a while I think. No?

The balancer CPU will iterate as long as need_resched() is not set on
it. It may take a while, but if the balancer CPU has nothing else to do,
the iteration does not come at a cost to performance.

Besides this, load balancing itself is optimized in terms of who does it
and how often. The candidate CPUs for load balancing are the first idle
CPUs in a given group. So nohz idle load balancing may abort on some of
the idle CPUs. If the CPUs on our left have not managed to pull tasks,
we abort load balancing too. We will save time here.

Load balancing on bigger sched domains is spaced out in time too. The
min_interval is equal to sd_weight and the balance_interval can be as
large as 2*sd_weight. This should ensure that load balancing across
large scheduling domains is not carried out too often. Nohz idle load
balancing may therefore not go through the entire scheduling domain
hierarchy for each CPU. This will cut down on the time too.
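A hypothetical model of this spacing, in milliseconds, using only the bounds stated above (min_interval = sd_weight, balance_interval capped at 2*sd_weight) and an assumed doubling back-off. It ignores busy factors and jiffy conversion, so it illustrates the scaling rather than reproducing the scheduler.

```c
#include <assert.h>

/* Hypothetical per-domain balancing state; times in milliseconds. */
struct model_sd {
	unsigned int weight;          /* number of CPUs spanned */
	unsigned int min_interval;
	unsigned int max_interval;
	unsigned int balance_interval;
	unsigned long last_balance;
};

static void model_sd_init(struct model_sd *sd, unsigned int weight)
{
	assert(weight > 0);
	sd->weight = weight;
	sd->min_interval = weight;
	sd->max_interval = 2 * weight;
	sd->balance_interval = sd->min_interval;
	sd->last_balance = 0;
}

/* Assumed back-off after a pass that found nothing to move. */
static void model_backoff(struct model_sd *sd)
{
	if (sd->balance_interval < sd->max_interval)
		sd->balance_interval *= 2;
	if (sd->balance_interval > sd->max_interval)
		sd->balance_interval = sd->max_interval;
}

static int model_balance_due(const struct model_sd *sd, unsigned long now)
{
	return now - sd->last_balance >= sd->balance_interval;
}
```

With sd_weight = 64, the model allows at most one balance pass every 64 to 128 ms for that domain, compared with 2 to 4 ms for a two-CPU domain.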

Regards
Preeti U Murthy
> 
> Morten
> 

Re: [PATCH v3 -next 11/11] serial: 8250_early: Remove setup_early_serial8250_console()

2015-04-01 Thread Peter Hurley

Hi Yinghai,

On 04/01/2015 10:04 PM, Yinghai Lu wrote:
> On Mon, Mar 9, 2015 at 1:27 PM, Peter Hurley  wrote:
>> setup_earlycon() will now match and register the desired earlycon
>> from the param string (as if 'earlycon=...' had been set on the
>> command line). Use setup_earlycon() from existing arch call sites
>> which start an earlycon directly.
>>
> 
> Hi,
> 
> Looks like this patcheset cause regression:
> when set grub console to 115200, and later kernel only have
> 
> console=uart8250,io,0x3f8
> 
> the kernel will revert baud rate to 9600 instead of keeping 115200.
> 
> in setup_earlycon: you say:
> 
>  *  Registers the earlycon console matching the earlycon specified
>  *  in the param string @buf. Acceptable param strings are of the form
>  * ,io|mmio|mmio32,,
>  * ,0x,
>  * ,
>  * 
>  *
>  *  Only for the third form does the earlycon setup() method receive the
>  *   string in the 'options' parameter; all other forms set
>  *  the parameter to NULL.
> 
> 
> so that change the old behavior that we defined in
> Documentation/kernel-parameters.txt
> 
> uart[8250],io,[,options]
> uart[8250],mmio,[,options]
> uart[8250],mmio32,[,options]
> Start an early, polled-mode console on the 8250/16550
> UART at the specified I/O port or MMIO address.
> MMIO inter-register address stride is either 8-bit
> (mmio) or 32-bit (mmio32).

> The options are the same as for ttyS, above.

The documented behavior of console=ttyS options, to which your
quote refers, clearly states:

Default is "9600n8".

> The old behavior: options is optional , and will use baud rate that is
> set by bootloader.

so the previous behavior was actually at odds with the documentation.

> Please fix the problem and restore to old behavior.

Is this really necessary (or even desirable)?
I think it's a bad idea to have one console type (ttyS) initialize its
options to default settings, but yet allow another console type (uart) to
probe the existing state.

Also, this expectation is an impediment when adding support for other
8250-like designs that don't have the same 8250 divisor registers
(i.e., _every_ new design). To properly support this requirement for
just the existing 8250 hardware will require special probe_baud()
functions for: dw_8250, intel byt, intel mid, omap_8250, exar 17v35 series,
omap 1510.

Is specifying the line speed on the command line really a burden?

Regards,
Peter Hurley


Re: [PATCH drm] drm/vgem: vgem_gem_dumb_map() can be static

2015-04-01 Thread Joe Perches

On Thu, 2015-04-02 at 10:59 +0800, kbuild test robot wrote:
> Signed-off-by: Fengguang Wu 
[]
> diff --git a/drivers/gpu/drm/vgem/vgem_drv.c b/drivers/gpu/drm/vgem/vgem_drv.c
[]
> @@ -195,7 +195,7 @@ static int vgem_gem_dumb_create(struct drm_file *file, struct drm_device *dev,
>   return 0;
>  }
>  
> -int vgem_gem_dumb_map(struct drm_file *file, struct drm_device *dev,
> +static int vgem_gem_dumb_map(struct drm_file *file, struct drm_device *dev,
> uint32_t handle, uint64_t *offset)

Hello Fengguang.

When you or your robot does these, can you please
make sure the entire statement is indented properly?

thanks,



[PATCH drm] drm/vgem: vgem_gem_dumb_map() can be static

2015-04-01 Thread kbuild test robot


Signed-off-by: Fengguang Wu 
---
 vgem_drv.c |6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/vgem/vgem_drv.c b/drivers/gpu/drm/vgem/vgem_drv.c
index cb3b435..f7440e8c8 100644
--- a/drivers/gpu/drm/vgem/vgem_drv.c
+++ b/drivers/gpu/drm/vgem/vgem_drv.c
@@ -195,7 +195,7 @@ static int vgem_gem_dumb_create(struct drm_file *file, struct drm_device *dev,
return 0;
 }
 
-int vgem_gem_dumb_map(struct drm_file *file, struct drm_device *dev,
+static int vgem_gem_dumb_map(struct drm_file *file, struct drm_device *dev,
  uint32_t handle, uint64_t *offset)
 {
int ret = 0;
@@ -235,7 +235,7 @@ unlock:
return ret;
 }
 
-int vgem_drm_gem_mmap(struct file *filp, struct vm_area_struct *vma)
+static int vgem_drm_gem_mmap(struct file *filp, struct vm_area_struct *vma)
 {
struct drm_file *priv = filp->private_data;
struct drm_device *dev = priv->minor->dev;
@@ -325,7 +325,7 @@ static struct drm_driver vgem_driver = {
.minor  = DRIVER_MINOR,
 };
 
-struct drm_device *vgem_device;
+static struct drm_device *vgem_device;
 
 static int __init vgem_init(void)
 {

[drm:drm-next 23/23] drivers/gpu/drm/vgem/vgem_drv.c:198:5: sparse: symbol 'vgem_gem_dumb_map' was not declared. Should it be static?

2015-04-01 Thread kbuild test robot

tree:   git://people.freedesktop.org/~airlied/linux.git drm-next
head:   502e95c6678505474f1056480310cd9382bacbac
commit: 502e95c6678505474f1056480310cd9382bacbac [23/23] drm/vgem: implement virtual GEM
reproduce:
  # apt-get install sparse
  git checkout 502e95c6678505474f1056480310cd9382bacbac
  make ARCH=x86_64 allmodconfig
  make C=1 CF=-D__CHECK_ENDIAN__


sparse warnings: (new ones prefixed by >>)

>> drivers/gpu/drm/vgem/vgem_drv.c:198:5: sparse: symbol 'vgem_gem_dumb_map' was not declared. Should it be static?
>> drivers/gpu/drm/vgem/vgem_drv.c:238:5: sparse: symbol 'vgem_drm_gem_mmap' was not declared. Should it be static?
>> drivers/gpu/drm/vgem/vgem_drv.c:328:19: sparse: symbol 'vgem_device' was not declared. Should it be static?

Please review and possibly fold the followup patch.

---
0-DAY kernel test infrastructure                Open Source Technology Center
http://lists.01.org/mailman/listinfo/kbuild Intel Corporation

Re: [RFC][PATCH 09/17 v2] tracing: Show the mapped enums in enum_map file

2015-04-01 Thread Steven Rostedt

On Wed, 01 Apr 2015 21:56:57 -0400
Steven Rostedt  wrote:

> From: "Steven Rostedt (Red Hat)" 
> 
> Add an enum_map file in the tracing directory to see what enums have been
> saved to convert in the print fmt files.

Hmm, I'm thinking of dropping this patch, and adding it later with a
config option. This way, when it is not enabled, the data used for
conversion can be freed. I'd rather have that than have this stuck as an
API.

Adding it with a config option that states that it will waste memory
lets the distros decide if they want the extra footprint or not.

-- Steve

Re: [PATCH 0/2] workqueue: fix a bug when numa mapping is changed

2015-04-01 Thread Kamezawa Hiroyuki


On 2015/04/02 10:36, Gu Zheng wrote:

Hi Kame, TJ,

On 04/01/2015 04:30 PM, Kamezawa Hiroyuki wrote:


On 2015/04/01 12:02, Tejun Heo wrote:

On Wed, Apr 01, 2015 at 11:55:11AM +0900, Kamezawa Hiroyuki wrote:

Now, hot-added cpus will have the lowest free cpu id.

Because of this, on most systems that only do cpu hot-add, cpu-ids are always
contiguous even after cpu hot-add.
In enterprise environments, changing this would be considered an incompatibility.

Determining cpuid <-> lapicid at boot will make cpuids sparse. That may break
existing scripts or configuration/resource management software.


Ugh... so, cpu number allocation on hot-add is part of userland
interface that we're locked into?


We checked most of RHEL7 packages and didn't find a problem yet.
But, for example, we know some performance test team's test program assumed contiguous
cpuids and it failed. It was an easy case because we can ask them to fix the application,
but I guess there will be some number of customers who assume cpuids are contiguous.


Tying hotplug and id allocation
order together usually isn't a good idea.  What if the cpu up fails
while running the notifiers?  The ID is already allocated and the next
cpu being brought up will be after a hole anyway.  Is this even
actually gonna affect userland?



Maybe. It's not fail-safe but

In general, all kernel engineers (and skilled userland engineers) know that
cpuids cannot always be contiguous and cpuids/nodeids should be checked before
running programs. I think most engineers should be aware of that, but many
users have their own assumptions :(

Basically, I don't have strong objections, you're right technically.

In summary...
  - users should not assume cpuids are contiguous.
  - all possible ids should be fixed at boot time.
  - For users, some clarifying documentation should be somewhere in Documentation.


Fine to me.



So, Gu-san
  1) determine all possible ids at boot.
  2) clarify in Documentation that cpuid/nodeid can have holes because of 1).
  3) It would be good if other guys give us ack.


Also fine.
But before going this way, could you please reconsider determining the ids when a cpu is first
present (the implementation in this patchset)?
Though it is not perfect in some ways, we can ignore the doubts
mentioned above, as cpu/node hotplug is not a frequent operation, and there seems
to be nothing harmful to us if we go this way.



Is it such heavy work?  Hmm. My requests are:

Implement your patches as
- Please don't change current behavior at boot.
- Remember all possible apicids and give them future cpuids if not assigned.
as step 1.

Please fix dynamic pxm<->node detection in step2.

In future, memory-less node handling in x86 should be revisited.

Thanks,
-Kame








Re: [PATCH kernel v7 28/31] powerpc/mmu: Add userspace-to-physical addresses translation cache

2015-04-01 Thread Alexey Kardashevskiy


On 04/02/2015 08:48 AM, Alex Williamson wrote:

On Sat, 2015-03-28 at 01:55 +1100, Alexey Kardashevskiy wrote:

We are adding support for DMA memory pre-registration to be used in
conjunction with VFIO. The idea is that the userspace which is going to
run a guest may want to pre-register a user space memory region so
it all gets pinned once and never goes away. Having this done,
a hypervisor will not have to pin/unpin pages on every DMA map/unmap
request. This is going to help with multiple pinning of the same memory
and in-kernel acceleration of DMA requests.

This adds a list of memory regions to mm_context_t. Each region consists
of a header and a list of physical addresses. This adds an API to:
1. register/unregister memory regions;
2. do final cleanup (which puts all pre-registered pages);
3. do userspace to physical address translation;
4. manage a mapped pages counter; when it is zero, it is safe to
unregister the region.

Multiple registration of the same region is allowed, kref is used to
track the number of registrations.

Signed-off-by: Alexey Kardashevskiy 
---
  arch/powerpc/include/asm/mmu-hash64.h  |   3 +
  arch/powerpc/include/asm/mmu_context.h |  16 +++
  arch/powerpc/mm/Makefile   |   1 +
  arch/powerpc/mm/mmu_context_hash64.c   |   6 +
  arch/powerpc/mm/mmu_context_hash64_iommu.c | 215 +
  5 files changed, 241 insertions(+)
  create mode 100644 arch/powerpc/mm/mmu_context_hash64_iommu.c

diff --git a/arch/powerpc/include/asm/mmu-hash64.h b/arch/powerpc/include/asm/mmu-hash64.h
index 4f13c3e..83214c4 100644
--- a/arch/powerpc/include/asm/mmu-hash64.h
+++ b/arch/powerpc/include/asm/mmu-hash64.h
@@ -535,6 +535,9 @@ typedef struct {
/* for 4K PTE fragment support */
void *pte_frag;
  #endif
+#ifdef CONFIG_SPAPR_TCE_IOMMU
+   struct list_head iommu_group_mem_list;
+#endif
  } mm_context_t;


diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
index 73382eb..3461c91 100644
--- a/arch/powerpc/include/asm/mmu_context.h
+++ b/arch/powerpc/include/asm/mmu_context.h
@@ -16,6 +16,22 @@
   */
  extern int init_new_context(struct task_struct *tsk, struct mm_struct *mm);
  extern void destroy_context(struct mm_struct *mm);
+#ifdef CONFIG_SPAPR_TCE_IOMMU
+typedef struct mm_iommu_table_group_mem_t mm_iommu_table_group_mem_t;
+
+extern bool mm_iommu_preregistered(void);
+extern long mm_iommu_alloc(unsigned long ua, unsigned long entries,
+   mm_iommu_table_group_mem_t **pmem);
+extern mm_iommu_table_group_mem_t *mm_iommu_get(unsigned long ua,
+   unsigned long entries);
+extern long mm_iommu_put(mm_iommu_table_group_mem_t *mem);
+extern void mm_iommu_cleanup(mm_context_t *ctx);
+extern mm_iommu_table_group_mem_t *mm_iommu_lookup(unsigned long ua,
+   unsigned long size);
+extern long mm_iommu_ua_to_hpa(mm_iommu_table_group_mem_t *mem,
+   unsigned long ua, unsigned long *hpa);
+extern long mm_iommu_mapped_update(mm_iommu_table_group_mem_t *mem, bool inc);
+#endif

  extern void switch_mmu_context(struct mm_struct *prev, struct mm_struct *next);
  extern void switch_slb(struct task_struct *tsk, struct mm_struct *mm);
diff --git a/arch/powerpc/mm/Makefile b/arch/powerpc/mm/Makefile
index 438dcd3..49fbfc7 100644
--- a/arch/powerpc/mm/Makefile
+++ b/arch/powerpc/mm/Makefile
@@ -35,3 +35,4 @@ obj-$(CONFIG_PPC_SUBPAGE_PROT)+= subpage-prot.o
  obj-$(CONFIG_NOT_COHERENT_CACHE) += dma-noncoherent.o
  obj-$(CONFIG_HIGHMEM) += highmem.o
  obj-$(CONFIG_PPC_COPRO_BASE)  += copro_fault.o
+obj-$(CONFIG_SPAPR_TCE_IOMMU)  += mmu_context_hash64_iommu.o
diff --git a/arch/powerpc/mm/mmu_context_hash64.c b/arch/powerpc/mm/mmu_context_hash64.c
index 178876ae..eb3080c 100644
--- a/arch/powerpc/mm/mmu_context_hash64.c
+++ b/arch/powerpc/mm/mmu_context_hash64.c
@@ -89,6 +89,9 @@ int init_new_context(struct task_struct *tsk, struct mm_struct *mm)
  #ifdef CONFIG_PPC_64K_PAGES
mm->context.pte_frag = NULL;
  #endif
+#ifdef CONFIG_SPAPR_TCE_IOMMU
+   INIT_LIST_HEAD_RCU(&mm->context.iommu_group_mem_list);
+#endif
return 0;
  }

@@ -132,6 +135,9 @@ static inline void destroy_pagetable_page(struct mm_struct *mm)

  void destroy_context(struct mm_struct *mm)
  {
+#ifdef CONFIG_SPAPR_TCE_IOMMU
+   mm_iommu_cleanup(&mm->context);
+#endif

  #ifdef CONFIG_PPC_ICSWX
drop_cop(mm->context.acop, mm);
diff --git a/arch/powerpc/mm/mmu_context_hash64_iommu.c b/arch/powerpc/mm/mmu_context_hash64_iommu.c
new file mode 100644
index 000..c268c4d
--- /dev/null
+++ b/arch/powerpc/mm/mmu_context_hash64_iommu.c
@@ -0,0 +1,215 @@
+/*
+ *  IOMMU helpers in MMU context.
+ *
+ *  Copyright (C) 2015 IBM Corp. 
+ *
+ *  This program is free software; you can redistribute it and/or
+ *  modify it under the terms of the GNU General Public License
+ *  as published by the Free Software Foundation; either version
+ *  2 of the License, or (at you

Re: [PATCH kernel v7 12/31] powerpc/spapr: vfio: Switch from iommu_table to new iommu_table_group

2015-04-01 Thread Alex Williamson

On Thu, 2015-04-02 at 13:33 +1100, Alexey Kardashevskiy wrote:
> On 04/02/2015 08:48 AM, Alex Williamson wrote:
> > On Sat, 2015-03-28 at 01:54 +1100, Alexey Kardashevskiy wrote:
> >> Modern IBM POWERPC systems support multiple (currently two) TCE tables
> >> per IOMMU group (a.k.a. PE). This adds an iommu_table_group container
> >> for TCE tables. Right now just one table is supported.
> >>
> >> Signed-off-by: Alexey Kardashevskiy 
> >> ---
> >>   Documentation/vfio.txt  |  23 ++
> >>   arch/powerpc/include/asm/iommu.h|  18 +++--
> >>   arch/powerpc/kernel/iommu.c |  34 
> >>   arch/powerpc/platforms/powernv/pci-ioda.c   |  38 +
> >>   arch/powerpc/platforms/powernv/pci-p5ioc2.c |  17 ++--
> >>   arch/powerpc/platforms/powernv/pci.c|   2 +-
> >>   arch/powerpc/platforms/powernv/pci.h|   4 +-
> >>   arch/powerpc/platforms/pseries/iommu.c  |   9 ++-
> >>   drivers/vfio/vfio_iommu_spapr_tce.c | 120 
> >> 
> >>   9 files changed, 183 insertions(+), 82 deletions(-)
> >>
> >> diff --git a/Documentation/vfio.txt b/Documentation/vfio.txt
> >> index 96978ec..94328c8 100644
> >> --- a/Documentation/vfio.txt
> >> +++ b/Documentation/vfio.txt
> >> @@ -427,6 +427,29 @@ The code flow from the example above should be slightly changed:
> >>
> >>
> >>
> >> +5) There is v2 of SPAPR TCE IOMMU. It deprecates VFIO_IOMMU_ENABLE/
> >> +VFIO_IOMMU_DISABLE and implements 2 new ioctls:
> >> +VFIO_IOMMU_SPAPR_REGISTER_MEMORY and VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY
> >> +(which are unsupported in v1 IOMMU).
> >> +
> >> +PPC64 paravirtualized guests generate a lot of map/unmap requests,
> >> +and the handling of those includes pinning/unpinning pages and updating
> >> +mm::locked_vm counter to make sure we do not exceed the rlimit.
> >> +The v2 IOMMU splits accounting and pinning into separate operations:
> >> +
> >> +- VFIO_IOMMU_SPAPR_REGISTER_MEMORY/VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY ioctls
> >> +receive a user space address and size of the block to be pinned.
> >> +Bisecting is not supported and VFIO_IOMMU_UNREGISTER_MEMORY is expected to
> >> +be called with the exact address and size used for registering
> >> +the memory block. The userspace is not expected to call these often.
> >> +The ranges are stored in a linked list in a VFIO container.
> >> +
> >> +- VFIO_IOMMU_MAP_DMA/VFIO_IOMMU_UNMAP_DMA ioctls only update the actual
> >> +IOMMU table and do not do pinning; instead these check that the userspace
> >> +address is from pre-registered range.
> >> +
> >> +This separation helps in optimizing DMA for guests.
> >> +
> >>   
> >> ---
> >>
> >>   [1] VFIO was originally an acronym for "Virtual Function I/O" in its
> >> diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
> >> index eb75726..667aa1a 100644
> >> --- a/arch/powerpc/include/asm/iommu.h
> >> +++ b/arch/powerpc/include/asm/iommu.h
> >> @@ -90,9 +90,7 @@ struct iommu_table {
> >>struct iommu_pool pools[IOMMU_NR_POOLS];
> >>unsigned long *it_map;   /* A simple allocation bitmap for now */
> >>unsigned long  it_page_shift;/* table iommu page size */
> >> -#ifdef CONFIG_IOMMU_API
> >> -  struct iommu_group *it_group;
> >> -#endif
> >> +  struct iommu_table_group *it_group;
> >>struct iommu_table_ops *it_ops;
> >>void (*set_bypass)(struct iommu_table *tbl, bool enable);
> >>   };
> >> @@ -126,14 +124,24 @@ extern void iommu_free_table(struct iommu_table *tbl, const char *node_name);
> >>*/
> >>   extern struct iommu_table *iommu_init_table(struct iommu_table * tbl,
> >>int nid);
> >> +
> >> +#define IOMMU_TABLE_GROUP_MAX_TABLES  1
> >> +
> >> +struct iommu_table_group {
> >>   #ifdef CONFIG_IOMMU_API
> >> -extern void iommu_register_group(struct iommu_table *tbl,
> >> +  struct iommu_group *group;
> >> +#endif
> >> +  struct iommu_table tables[IOMMU_TABLE_GROUP_MAX_TABLES];
> >> +};
> >> +
> >> +#ifdef CONFIG_IOMMU_API
> >> +extern void iommu_register_group(struct iommu_table_group *table_group,
> >> int pci_domain_number, unsigned long pe_num);
> >>   extern int iommu_add_device(struct device *dev);
> >>   extern void iommu_del_device(struct device *dev);
> >>   extern int __init tce_iommu_bus_notifier_init(void);
> >>   #else
> >> -static inline void iommu_register_group(struct iommu_table *tbl,
> >> +static inline void iommu_register_group(struct iommu_table_group *table_group,
> >>int pci_domain_number,
> >>unsigned long pe_num)
> >
> >
> > Not a new problem, but there's some awfully liberal use of the namespace
> > with function names here.  IOMMU API uses iommu_foo() functions.  IOMMU
> > group related interfaces within the IOMMU API incl

Re: [PATCH kernel v7 04/31] vfio: powerpc/spapr: Use it_page_size

2015-04-01 Thread Alex Williamson

On Thu, 2015-04-02 at 13:30 +1100, Alexey Kardashevskiy wrote:
> On 04/02/2015 08:48 AM, Alex Williamson wrote:
> > On Sat, 2015-03-28 at 01:54 +1100, Alexey Kardashevskiy wrote:
> >> This makes use of the it_page_size from the iommu_table struct
> >> as page size can differ.
> >>
> >> This replaces missing IOMMU_PAGE_SHIFT macro in commented debug code
> >> as recently introduced IOMMU_PAGE_XXX macros do not include
> >> IOMMU_PAGE_SHIFT.
> >>
> >> Signed-off-by: Alexey Kardashevskiy 
> >> Reviewed-by: David Gibson 
> >> ---
> >>   drivers/vfio/vfio_iommu_spapr_tce.c | 26 +-
> >>   1 file changed, 13 insertions(+), 13 deletions(-)
> >>
> >> diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
> >> index f835e63..8bbee22 100644
> >> --- a/drivers/vfio/vfio_iommu_spapr_tce.c
> >> +++ b/drivers/vfio/vfio_iommu_spapr_tce.c
> >> @@ -91,7 +91,7 @@ static int tce_iommu_enable(struct tce_container *container)
> >> * enforcing the limit based on the max that the guest can map.
> >> */
> >>down_write(&current->mm->mmap_sem);
> >> -  npages = (tbl->it_size << IOMMU_PAGE_SHIFT_4K) >> PAGE_SHIFT;
> >> +  npages = (tbl->it_size << tbl->it_page_shift) >> PAGE_SHIFT;
> >>locked = current->mm->locked_vm + npages;
> >>lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
> >>if (locked > lock_limit && !capable(CAP_IPC_LOCK)) {
> >> @@ -120,7 +120,7 @@ static void tce_iommu_disable(struct tce_container *container)
> >>
> >>down_write(&current->mm->mmap_sem);
> >>current->mm->locked_vm -= (container->tbl->it_size <<
> >> -  IOMMU_PAGE_SHIFT_4K) >> PAGE_SHIFT;
> >> +  container->tbl->it_page_shift) >> PAGE_SHIFT;
> >>up_write(&current->mm->mmap_sem);
> >>   }
> >>
> >> @@ -222,7 +222,7 @@ static long tce_iommu_build(struct tce_container *container,
> >>tce, ret);
> >>break;
> >>}
> >> -  tce += IOMMU_PAGE_SIZE_4K;
> >> +  tce += IOMMU_PAGE_SIZE(tbl);
> >
> >
> > Is PAGE_SIZE ever smaller than IOMMU_PAGE_SIZE(tbl)?  IOW, can the page
> > we got from get_user_pages_fast() ever not completely fill the tce
> > entry?
> 
> 
> Yes. IOMMU_PAGE_SIZE is 4K/64K/16M (16M is with huge pages enabled in QEMU with -mem-path), PAGE_SIZE is 4K/64K (normally 64K).

Isn't that a problem then that you're filling the tce with processor
page sizes via get_user_pages_fast(), but incrementing the tce by
IOMMU page size?  For example, if PAGE_SIZE = 4K and IOMMU_PAGE_SIZE !=
4K have we really pinned all of the memory backed by the tce?  Where do
you make sure the 4K page is really contiguous for the IOMMU page?


Re: [PATCH kernel v7 12/31] powerpc/spapr: vfio: Switch from iommu_table to new iommu_table_group

2015-04-01 Thread Alexey Kardashevskiy


On 04/02/2015 08:48 AM, Alex Williamson wrote:

On Sat, 2015-03-28 at 01:54 +1100, Alexey Kardashevskiy wrote:

Modern IBM POWERPC systems support multiple (currently two) TCE tables
per IOMMU group (a.k.a. PE). This adds an iommu_table_group container
for TCE tables. Right now just one table is supported.

Signed-off-by: Alexey Kardashevskiy 
---
  Documentation/vfio.txt  |  23 ++
  arch/powerpc/include/asm/iommu.h|  18 +++--
  arch/powerpc/kernel/iommu.c |  34 
  arch/powerpc/platforms/powernv/pci-ioda.c   |  38 +
  arch/powerpc/platforms/powernv/pci-p5ioc2.c |  17 ++--
  arch/powerpc/platforms/powernv/pci.c|   2 +-
  arch/powerpc/platforms/powernv/pci.h|   4 +-
  arch/powerpc/platforms/pseries/iommu.c  |   9 ++-
  drivers/vfio/vfio_iommu_spapr_tce.c | 120 
  9 files changed, 183 insertions(+), 82 deletions(-)

diff --git a/Documentation/vfio.txt b/Documentation/vfio.txt
index 96978ec..94328c8 100644
--- a/Documentation/vfio.txt
+++ b/Documentation/vfio.txt
@@ -427,6 +427,29 @@ The code flow from the example above should be slightly changed:



+5) There is v2 of SPAPR TCE IOMMU. It deprecates VFIO_IOMMU_ENABLE/
+VFIO_IOMMU_DISABLE and implements 2 new ioctls:
+VFIO_IOMMU_SPAPR_REGISTER_MEMORY and VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY
+(which are unsupported in v1 IOMMU).
+
+PPC64 paravirtualized guests generate a lot of map/unmap requests,
+and the handling of those includes pinning/unpinning pages and updating
+mm::locked_vm counter to make sure we do not exceed the rlimit.
+The v2 IOMMU splits accounting and pinning into separate operations:
+
+- VFIO_IOMMU_SPAPR_REGISTER_MEMORY/VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY ioctls
+receive a user space address and size of the block to be pinned.
+Bisecting is not supported and VFIO_IOMMU_UNREGISTER_MEMORY is expected to
+be called with the exact address and size used for registering
+the memory block. The userspace is not expected to call these often.
+The ranges are stored in a linked list in a VFIO container.
+
+- VFIO_IOMMU_MAP_DMA/VFIO_IOMMU_UNMAP_DMA ioctls only update the actual
+IOMMU table and do not do pinning; instead these check that the userspace
+address is from pre-registered range.
+
+This separation helps in optimizing DMA for guests.
+
  
---

  [1] VFIO was originally an acronym for "Virtual Function I/O" in its
diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index eb75726..667aa1a 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -90,9 +90,7 @@ struct iommu_table {
struct iommu_pool pools[IOMMU_NR_POOLS];
unsigned long *it_map;   /* A simple allocation bitmap for now */
unsigned long  it_page_shift;/* table iommu page size */
-#ifdef CONFIG_IOMMU_API
-   struct iommu_group *it_group;
-#endif
+   struct iommu_table_group *it_group;
struct iommu_table_ops *it_ops;
void (*set_bypass)(struct iommu_table *tbl, bool enable);
  };
@@ -126,14 +124,24 @@ extern void iommu_free_table(struct iommu_table *tbl, const char *node_name);
   */
  extern struct iommu_table *iommu_init_table(struct iommu_table * tbl,
int nid);
+
+#define IOMMU_TABLE_GROUP_MAX_TABLES   1
+
+struct iommu_table_group {
  #ifdef CONFIG_IOMMU_API
-extern void iommu_register_group(struct iommu_table *tbl,
+   struct iommu_group *group;
+#endif
+   struct iommu_table tables[IOMMU_TABLE_GROUP_MAX_TABLES];
+};
+
+#ifdef CONFIG_IOMMU_API
+extern void iommu_register_group(struct iommu_table_group *table_group,
 int pci_domain_number, unsigned long pe_num);
  extern int iommu_add_device(struct device *dev);
  extern void iommu_del_device(struct device *dev);
  extern int __init tce_iommu_bus_notifier_init(void);
  #else
-static inline void iommu_register_group(struct iommu_table *tbl,
+static inline void iommu_register_group(struct iommu_table_group *table_group,
int pci_domain_number,
unsigned long pe_num)



Not a new problem, but there's some awfully liberal use of the namespace
with function names here.  IOMMU API uses iommu_foo() functions.  IOMMU
group related interfaces within the IOMMU API include "group" somewhere
in that name.  powerpc specific functions should include a tag to avoid
causing conflicts there.



Cannot argue with that but it is kind of late or not for this patchset, no? 
And iommu_table is way too generic for powerpc/spapr-specific thing.


I can replace with something better, should I do this now?



(sorry for commenting twice on the same patch)




--
Alexey

Re: [PATCH kernel v7 04/31] vfio: powerpc/spapr: Use it_page_size

2015-04-01 Thread Alexey Kardashevskiy


On 04/02/2015 08:48 AM, Alex Williamson wrote:

On Sat, 2015-03-28 at 01:54 +1100, Alexey Kardashevskiy wrote:

This makes use of the it_page_size from the iommu_table struct
as page size can differ.

This replaces missing IOMMU_PAGE_SHIFT macro in commented debug code
as recently introduced IOMMU_PAGE_XXX macros do not include
IOMMU_PAGE_SHIFT.

Signed-off-by: Alexey Kardashevskiy 
Reviewed-by: David Gibson 
---
  drivers/vfio/vfio_iommu_spapr_tce.c | 26 +-
  1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
index f835e63..8bbee22 100644
--- a/drivers/vfio/vfio_iommu_spapr_tce.c
+++ b/drivers/vfio/vfio_iommu_spapr_tce.c
@@ -91,7 +91,7 @@ static int tce_iommu_enable(struct tce_container *container)
 * enforcing the limit based on the max that the guest can map.
 */
down_write(&current->mm->mmap_sem);
-   npages = (tbl->it_size << IOMMU_PAGE_SHIFT_4K) >> PAGE_SHIFT;
+   npages = (tbl->it_size << tbl->it_page_shift) >> PAGE_SHIFT;
locked = current->mm->locked_vm + npages;
lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
if (locked > lock_limit && !capable(CAP_IPC_LOCK)) {
@@ -120,7 +120,7 @@ static void tce_iommu_disable(struct tce_container *container)

down_write(&current->mm->mmap_sem);
current->mm->locked_vm -= (container->tbl->it_size <<
-   IOMMU_PAGE_SHIFT_4K) >> PAGE_SHIFT;
+   container->tbl->it_page_shift) >> PAGE_SHIFT;
up_write(&current->mm->mmap_sem);
  }

@@ -222,7 +222,7 @@ static long tce_iommu_build(struct tce_container *container,
tce, ret);
break;
}
-   tce += IOMMU_PAGE_SIZE_4K;
+   tce += IOMMU_PAGE_SIZE(tbl);



Is PAGE_SIZE ever smaller than IOMMU_PAGE_SIZE(tbl)?  IOW, can the page
we got from get_user_pages_fast() ever not completely fill the tce
entry?



Yes. IOMMU_PAGE_SIZE is 4K/64K/16M (16M is with huge pages enabled in QEMU with -mem-path), PAGE_SIZE is 4K/64K (normally 64K).




(Have I asked this before?  Sorry if so)



:)



--
Alexey

Re: [PATCH 4/5] phy: add Broadcom SATA3 PHY driver for Broadcom STB SoCs

2015-04-01 Thread Brian Norris

On Tue, Mar 31, 2015 at 11:31:40AM +0530, Kishon Vijay Abraham I wrote:
> On Saturday 28 March 2015 05:58 AM, Brian Norris wrote:
> >On Thu, Mar 26, 2015 at 03:29:44AM +0530, Kishon Vijay Abraham I wrote:
> >>On Thursday 19 March 2015 06:53 AM, Brian Norris wrote:
> >>>+static struct phy *brcm_sata_phy_xlate(struct device *dev,
> >>>+ struct of_phandle_args *args)
> >>>+{
> >>>+  struct brcm_sata_phy *priv = dev_get_drvdata(dev);
> >>>+  int i = args->args[0];
> >>>+
> >>>+  if (i >= MAX_PORTS || !priv->phys[i].phy) {
> >>>+  dev_err(dev, "invalid phy: %d\n", i);
> >>>+  return ERR_PTR(-ENODEV);
> >>>+  }
> >>>+
> >>>+  return priv->phys[i].phy;
> >>>+}
> >>
> >>this xlate is not required at all if the controller device tree node has
> >>phandle to the phy node (sub node) instead of the phy provider device tree
> >>node.
> >
> >That doesn't match any convention I see in existing SATA phy bindings,
> >nor do I see how the existing of_phy_simple_xlate() would support this,
> >unless I instantiate a device for each port's PHY. If I adjust the
> >device tree as you suggest, and use of_phy_simple_xlate() instead of
> >this, of_phy_get() can't find the PHY provider, because the provider is
> >registered to the parent, not the subnode.
> 
> The phy core should still be able to get the PHY provider.
> See this in of_phy_provider_lookup
> for_each_child_of_node(phy_provider->dev->of_node, child)
> if (child == node)
> return phy_provider;

That just searches for children of the node. It doesn't walk parent
nodes.

> Can you post your device tree node here?

You mean patch 5?

https://lkml.org/lkml/2015/3/18/937

> >
> >Can you elaborate on your suggestion?
> >
> >>>+
> >>>+static const struct of_device_id brcmstb_sata_phy_of_match[] = {
> >>>+  { .compatible   = "brcm,bcm7445-sata-phy" },
> >>>+  {},
> >>>+};
> >>>+MODULE_DEVICE_TABLE(of, brcmstb_sata_phy_of_match);
> >>>+
> >>>+static int brcmstb_sata_phy_probe(struct platform_device *pdev)
> >>>+{
> >>>+  struct device *dev = &pdev->dev;
> >>>+  struct device_node *dn = dev->of_node, *child;
> >>>+  struct brcm_sata_phy *priv;
> >>>+  struct resource *res;
> >>>+  struct phy_provider *provider;
> >>>+  int count = 0;
> >>>+
> >>>+  if (of_get_child_count(dn) == 0)
> >>>+  return -ENODEV;
> >>>+
> >>>+  priv = devm_kzalloc(dev, sizeof(*priv), GFP_KERNEL);
> >>>+  if (!priv)
> >>>+  return -ENOMEM;
> >>>+  dev_set_drvdata(dev, priv);
> >>>+  priv->dev = dev;
> >>>+
> >>>+  res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "port-ctrl");
> >>>+  if (!res) {
> >>>+  dev_err(dev, "couldn't get port-ctrl resource\n");
> >>>+  return -EINVAL;
> >>>+  }
> >>>+  /*
> >>>+   * Don't request region, since it may be within a region owned by the
> >>>+   * SATA driver
> >>
> >>It should be in the SATA driver then. Why is it here?
> >
> >Did you read the discussion branching here?
> >
> >http://article.gmane.org/gmane.linux.drivers.devicetree/114637
> >
> >I've seen the exact opposite suggestion already (move it to the PHY
> >driver), and I'm not sure either suggestion is correct. The same
> >register block has registers for both the PHY and the SATA controller.
> 
> IMHO the register space shouldn't be defined based on the programming sequence
> but by where the register is actually present in the IP. From that thread it
> doesn't look like it is present in the PHY IP.

If you say so. But it has plenty of PHY controls packed into those
registers, so are you just recommending handling those bits from the
SATA driver?

...

> >>Let's not make the boot noisy. Make it dev_vdbg if it is required.
> >
> >I think it's important to have at least some of the three informational
> >prints that you're suggesting turn into dbg. It's pretty important to
> >see that we're powering on one or more PHY ports, for both
> >power/correctness concerns (trying to power on a port that is
> >OTP-disabled, for instance) and debugging concerns (the suggestions you
> >made about the device tree yielded a dead SATA, and it was obvious,
> >because the "powering on" prints were missing).
> 
> All these are debugging info. Hence it's better to keep in dev_vdbg or 
> dev_dbg.
> >
> >I'd kinda like to see the previous power on/off prints above stay as
> >dev_info(), though the "registered" print might be superfluous, as the
> >registration info should show up in sysfs.
> >
> >Related: I don't see any API for monitoring the PHY status from user
> >space. e.g., there's nothing useful in /sys/class/phy/*/.
> 
> Do you really need to monitor the PHY status? We should be careful about
> exposing anything to the user space since it will become an ABI and we can
> never modify it.

Not really, but the debugging info (which you want me to remove by
default, and which is unretrievable after system boot) is the next
easiest solution. It doesn't provide an ABI, exa

[RFC][PATCH 04/17 v2] tracing: Give system name a pointer

2015-04-01 Thread Steven Rostedt

From: "Steven Rostedt (Red Hat)" 

Normally the compiler will use the same pointer for a string throughout
the file. But there's no guarantee of that happening. Later changes will
require that all events have the same pointer to the system string.

Name the system string and have all events point to it.

Testing this, it did not increase the size of the text, except for the
notes section, which should not harm the real size any.

Signed-off-by: Steven Rostedt 
---
 include/linux/ftrace_event.h |  2 +-
 include/trace/ftrace.h   | 19 +--
 2 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index c674ee8f7fca..62b8fac7ded5 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -202,7 +202,7 @@ enum trace_reg {
 struct ftrace_event_call;
 
 struct ftrace_event_class {
-	char			*system;
+	const char		*system;
 	void			*probe;
 #ifdef CONFIG_PERF_EVENTS
 	void			*perf_probe;
diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
index 41bf65f04dd9..2f9b95b6d3fb 100644
--- a/include/trace/ftrace.h
+++ b/include/trace/ftrace.h
@@ -18,6 +18,21 @@
 
 #include 
 
+#ifndef TRACE_SYSTEM_VAR
+#define TRACE_SYSTEM_VAR TRACE_SYSTEM
+#endif
+
+#define __app__(x, y) str__##x##y
+#define __app(x, y) __app__(x, y)
+
+#define TRACE_SYSTEM_STRING __app(TRACE_SYSTEM_VAR,__trace_system_name)
+
+#define TRACE_MAKE_SYSTEM_STR()\
+   static const char TRACE_SYSTEM_STRING[] =   \
+   __stringify(TRACE_SYSTEM)
+
+TRACE_MAKE_SYSTEM_STR();
+
 /*
  * DECLARE_EVENT_CLASS can be used to add a generic function
  * handlers for events. That is, if all events have the same
@@ -105,7 +120,6 @@
 
 #include TRACE_INCLUDE(TRACE_INCLUDE_FILE)
 
-
 /*
  * Stage 2 of the trace events.
  *
@@ -692,7 +706,7 @@ static inline void ftrace_test_probe_##call(void)	\
 _TRACE_PERF_PROTO(call, PARAMS(proto));					\
 static const char print_fmt_##call[] = print;  \
 static struct ftrace_event_class __used __refdata event_class_##call = { \
-   .system = __stringify(TRACE_SYSTEM),\
+   .system = TRACE_SYSTEM_STRING,  \
.define_fields  = ftrace_define_fields_##call,  \
.fields = LIST_HEAD_INIT(event_class_##call.fields),\
.raw_init   = trace_event_raw_init, \
@@ -735,6 +749,7 @@ __attribute__((section("_ftrace_events"))) *__event_##call = &event_##call
 
 #include TRACE_INCLUDE(TRACE_INCLUDE_FILE)
 
+#undef TRACE_SYSTEM_VAR
 
 #ifdef CONFIG_PERF_EVENTS
 
-- 
2.1.4


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
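
The point of the patch above is pointer identity: C leaves it unspecified whether two identical string literals share storage, while a single named array has exactly one address. The sketch below replays the patch's macros in userspace to show what TRACE_SYSTEM_STRING expands to. It assumes a hypothetical system name `demo`, defines local copies of the two-level `__stringify()` helpers so it builds without kernel headers, and reduces `struct ftrace_event_class` to a hypothetical `struct event_class` holding only the `system` pointer; `same_system()` is likewise a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Local copies of the kernel's two-level stringify helpers. */
#define __stringify_1(x) #x
#define __stringify(x) __stringify_1(x)

/* Hypothetical trace system name, as a trace header would define it. */
#define TRACE_SYSTEM demo

#ifndef TRACE_SYSTEM_VAR
#define TRACE_SYSTEM_VAR TRACE_SYSTEM
#endif

/* The outer macro expands TRACE_SYSTEM_VAR before the inner one pastes. */
#define __app__(x, y) str__##x##y
#define __app(x, y) __app__(x, y)

#define TRACE_SYSTEM_STRING __app(TRACE_SYSTEM_VAR, __trace_system_name)

/* Expands to: static const char str__demo__trace_system_name[] = "demo"; */
static const char TRACE_SYSTEM_STRING[] = __stringify(TRACE_SYSTEM);

/* Hypothetical, cut-down stand-in for struct ftrace_event_class. */
struct event_class {
	const char *system;
};

/* Every class refers to the one named array instead of its own literal. */
static const struct event_class class_a = { .system = TRACE_SYSTEM_STRING };
static const struct event_class class_b = { .system = TRACE_SYSTEM_STRING };

/* With a single array, "same system" becomes a pointer comparison. */
static bool same_system(const struct event_class *a,
			const struct event_class *b)
{
	return a->system == b->system;
}
```

With `TRACE_SYSTEM` set to `demo`, the array is named `str__demo__trace_system_name`. The extra `__app()` level matters because operands of `##` are not macro-expanded first; without it the paste would produce `str__TRACE_SYSTEM_VAR__trace_system_name`.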

[RFC][PATCH 09/17 v2] tracing: Show the mapped enums in enum_map file

2015-04-01 Thread Steven Rostedt

From: "Steven Rostedt (Red Hat)" 

Add an enum_map file in the tracing directory that shows which enums have
been saved for conversion in the print fmt files.

Signed-off-by: Steven Rostedt 
---
 kernel/trace/trace.c | 110 +--
 1 file changed, 107 insertions(+), 3 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index c9a714a42b7b..f2e63d0bf45e 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -123,7 +123,7 @@ enum ftrace_dump_mode ftrace_dump_on_oops;
 /* When set, tracing will stop when a WARN*() is hit */
 int __disable_trace_on_warning;
 
-/* Map of enums to their values */
+/* Map of enums to their values, for "enum_map" file */
 struct trace_enum_map_head {
struct module   *mod;
unsigned long   length;
@@ -136,10 +136,12 @@ struct trace_enum_map_tail {
 * "end" is first and points to NULL as it must be different
 * than "mod" or "enum_string"
 */
-   const char  *end;   /* points to NULL */
union trace_enum_map_item   *next;
+   const char  *end;   /* points to NULL */
 };
 
+static DEFINE_MUTEX(trace_enum_mutex);
+
 /*
  * The trace_enum_maps are saved in an array with two extra elements,
  * one at the beginning, and one at the end. The beginning item contains
@@ -3940,6 +3942,97 @@ static const struct file_operations tracing_saved_cmdlines_size_fops = {
.write  = tracing_saved_cmdlines_size_write,
 };
 
+static union trace_enum_map_item *
+update_enum_map(union trace_enum_map_item *ptr)
+{
+   if (!ptr->map.enum_string) {
+   if (ptr->tail.next) {
+   ptr = ptr->tail.next;
+   /* Set ptr to the next real item (skip head) */
+   ptr++;
+   } else
+   return NULL;
+   }
+   return ptr;
+}
+
+static void *enum_map_next(struct seq_file *m, void *v, loff_t *pos)
+{
+   union trace_enum_map_item *ptr = v;
+
+   /*
+    * Paranoid! If ptr points to end, we don't want to increment past it.
+    * This really should never happen.
+    */
+   ptr = update_enum_map(ptr);
+   if (WARN_ON_ONCE(!ptr))
+   return NULL;
+
+   ptr++;
+
+   (*pos)++;
+
+   ptr = update_enum_map(ptr);
+
+   return ptr;
+}
+
+static void *enum_map_start(struct seq_file *m, loff_t *pos)
+{
+   union trace_enum_map_item *v;
+   loff_t l = 0;
+
+   mutex_lock(&trace_enum_mutex);
+
+   v = trace_enum_maps;
+   if (v)
+   v++;
+
+   while (v && l < *pos) {
+   v = enum_map_next(m, v, &l);
+   }
+
+   return v;
+}
+
+static void enum_map_stop(struct seq_file *m, void *v)
+{
+   mutex_unlock(&trace_enum_mutex);
+}
+
+static int enum_map_show(struct seq_file *m, void *v)
+{
+   union trace_enum_map_item *ptr = v;
+
+   seq_printf(m, "%s %ld (%s)\n",
+  ptr->map.enum_string, ptr->map.enum_value,
+  ptr->map.system);
+
+   return 0;
+}
+
+static const struct seq_operations tracing_enum_map_seq_ops = {
+   .start  = enum_map_start,
+   .next   = enum_map_next,
+   .stop   = enum_map_stop,
+   .show   = enum_map_show,
+};
+
+static int tracing_enum_map_open(struct inode *inode, struct file *filp)
+{
+   if (tracing_disabled)
+   return -ENODEV;
+
+   return seq_open(filp, &tracing_enum_map_seq_ops);
+}
+
+static const struct file_operations tracing_enum_map_fops = {
+   .open   = tracing_enum_map_open,
+   .read   = seq_read,
+   .llseek = seq_lseek,
+   .release = seq_release,
+};
+
 static inline union trace_enum_map_item *
 trace_enum_jmp_to_tail(union trace_enum_map_item *ptr)
 {
@@ -3971,6 +4064,8 @@ static void trace_insert_enum_map(struct module *mod,
return;
}
 
+   mutex_lock(&trace_enum_mutex);
+
if (!trace_enum_maps)
trace_enum_maps = map_array;
else {
@@ -3996,6 +4091,8 @@ static void trace_insert_enum_map(struct module *mod,
 
/* Pass in the start of the array (-len) */
trace_event_enum_update(&map_array[-len].map, len);
+
+   mutex_unlock(&trace_enum_mutex);
 }
 
 static ssize_t
@@ -6667,6 +6764,8 @@ static void trace_module_remove_enums(struct module *mod)
if (!mod->num_trace_enums)
return;
 
+   mutex_lock(&trace_enum_mutex);
+
map = trace_enum_maps;
 
while (map) {
@@ -6677,10 +6776,12 @@ static void trace_module_remove_enums(struct module *mod)
map = map->tail.next;
}
if (!map)
-   return;
+   goto out;
 
*last = trace_enum_jmp_to_tail(map)->tail.next;
kfree(map);
+ out:
+   mutex_unlock(&trace_enum_mutex);
 }
 
 static int trac
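
The `update_enum_map()` / `enum_map_next()` pair above walks a chain of arrays rather than one flat list: per the comments in the patch, each saved array carries one extra element before its maps and one after them, and the trailing element's `next` pointer leads to the following array. The sketch below is a userspace model of that walk, not the kernel code. The types `struct enum_map`, `union map_item` and the helpers `init_chunk()`, `link_chunk()`, `skip_tail()` and `count_enum_maps()` are hypothetical simplifications. It assumes, as the patch's reordering of `end` and the `!ptr->map.enum_string` test suggest, that a tail is recognised by a NULL in the slot that overlays `enum_string`; unlike `update_enum_map()`, `skip_tail()` loops so that an empty array would also be skipped.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct trace_enum_map. */
struct enum_map {
	const char *system;
	const char *enum_string;
	unsigned long enum_value;
};

union map_item;

struct map_head {
	void *mod;		/* owning module (unused in this model) */
	unsigned long length;	/* number of real entries in this chunk */
};

struct map_tail {
	union map_item *next;	/* head of the next chunk, or NULL */
	const char *end;	/* always NULL; overlays enum_string */
};

union map_item {
	struct enum_map map;
	struct map_head head;
	struct map_tail tail;
};

/* A tail is detected through map.enum_string, so "end" must share its slot. */
static_assert(offsetof(struct enum_map, enum_string) ==
	      offsetof(struct map_tail, end),
	      "tail.end must overlay map.enum_string");

/* Lay out [head][maps...][tail]; chunk must hold len + 2 items. */
static void init_chunk(union map_item *chunk, const struct enum_map *maps,
		       unsigned long len)
{
	chunk[0].head = (struct map_head){ NULL, len };
	for (unsigned long i = 0; i < len; i++)
		chunk[i + 1].map = maps[i];
	chunk[len + 1].tail = (struct map_tail){ NULL, NULL };
}

/* Chain a new chunk onto the tail of an existing one. */
static void link_chunk(union map_item *prev, union map_item *next)
{
	prev[prev[0].head.length + 1].tail.next = next;
}

/* Step over tail sentinels and the following head, or return NULL at the end. */
static union map_item *skip_tail(union map_item *ptr)
{
	while (!ptr->map.enum_string) {
		if (!ptr->tail.next)
			return NULL;
		ptr = ptr->tail.next + 1;
	}
	return ptr;
}

/* Visit every real entry, as the seq_file start/next callbacks do together. */
static size_t count_enum_maps(union map_item *first)
{
	size_t n = 0;
	union map_item *ptr = first ? skip_tail(first + 1) : NULL;

	while (ptr) {
		n++;
		ptr = skip_tail(ptr + 1);
	}
	return n;
}
```

The model leaves out locking. In the patch, `enum_map_start()` takes `trace_enum_mutex` and `enum_map_stop()` releases it, and module insert/remove take the same mutex, so the chain cannot be rewritten while a reader is part-way through it.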

RE: [PATCH] PM / watchdog: iTCO: stop watchdog during system suspend

2015-04-01 Thread Fu, Borun

Sorry, my fault. It should work as well. 

-----Original Message-----
From: Guenter Roeck [mailto:li...@roeck-us.net] 
Sent: Thursday, April 02, 2015 9:11
To: Fu, Borun; Rafael J. Wysocki; linux-watch...@vger.kernel.org
Cc: Linux Kernel Mailing List; Linux PM list; Wim Van Sebroeck; Li, Aubrey; Wang, Frank
Subject: Re: [PATCH] PM / watchdog: iTCO: stop watchdog during system suspend

On 04/01/2015 05:52 PM, Fu, Borun wrote:
> Hi Rafael,
> watchdog_active() function is not implemented in your patch. Pls. add it.
>

Please explain why using the standard watchdog_active() would not work here.

Guenter

> static int watchdog_active(struct device *dev)
> {
> 	unsigned int val;
>
> 	spin_lock(&iTCO_wdt_private.io_lock);
> 	val = inw(TCO1_CNT);
> 	spin_unlock(&iTCO_wdt_private.io_lock);
>
> 	/* Bit 11: TCO Timer Halt */
> 	if (val & 0x0800)
> 		return 0;
> 	else
> 		return 1;
> }
>
> Regards,
> Borun
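
Guenter's question hinges on two different notions of "active". As far as I can tell, the core helper `watchdog_active()` in `<linux/watchdog.h>` takes a `struct watchdog_device *` and tests the `WDOG_ACTIVE` bit in its `status` word, i.e. the watchdog core's record that the device was started; that signature also matches the patch's call `watchdog_active(&iTCO_wdt_watchdog_dev)`. Borun's helper instead reads the hardware halt bit (bit 11 of `TCO1_CNT`, per his comment). The sketch models both views without touching hardware: `struct mock_wdd`, `MOCK_WDOG_ACTIVE`, `TCO_TMR_HLT`, `sw_watchdog_active()` and `hw_tco_running()` are hypothetical names, and the register value is passed in rather than read with `inw()` under the I/O lock.

```c
#include <assert.h>
#include <stdbool.h>

/* TCO Timer Halt: bit 11 of TCO1_CNT, per the comment in Borun's helper. */
#define TCO_TMR_HLT	(1u << 11)

/* Hypothetical stand-in for struct watchdog_device, reduced to its status word. */
struct mock_wdd {
	unsigned long status;
};

/* Hypothetical bit number playing the role of the core's WDOG_ACTIVE flag. */
#define MOCK_WDOG_ACTIVE	0

/* Software view: has the watchdog core recorded the device as started? */
static bool sw_watchdog_active(const struct mock_wdd *wdd)
{
	return wdd->status & (1UL << MOCK_WDOG_ACTIVE);
}

/* Hardware view: the TCO timer counts down unless the halt bit is set. */
static bool hw_tco_running(unsigned int tco1_cnt)
{
	return !(tco1_cnt & TCO_TMR_HLT);
}
```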
> -----Original Message-----
> From: Rafael J. Wysocki [mailto:r...@rjwysocki.net]
> Sent: Thursday, April 02, 2015 8:31
> To: linux-watch...@vger.kernel.org
> Cc: Linux Kernel Mailing List; Linux PM list; Wim Van Sebroeck; Li, Aubrey; Fu, Borun
> Subject: [PATCH] PM / watchdog: iTCO: stop watchdog during system suspend
>
> From: Rafael J. Wysocki 
>
> If the target sleep state of the system is not an ACPI sleep state (S1, S2 or 
> S3), the TCO watchdog needs to be stopped during system suspend, because it 
> may not be possible to ping it any more after timekeeping has been suspended 
> (suspend-to-idle does that for one example).
>
> For this reason, provide ->suspend_noirq and ->resume_noirq callbacks for the 
> iTCO watchdog driver and use them to stop and restart the watchdog during 
> system suspend and resume, respectively, if the system is not going to enter 
> an ACPI sleep state (in which case the watchdog will be stopped by the 
> platform firmware before the state is entered).
>
> Reported-by: Borun Fu 
> Signed-off-by: Rafael J. Wysocki 
> ---
>   drivers/watchdog/iTCO_wdt.c |   51
>   1 file changed, 51 insertions(+)
>
> Index: linux-pm/drivers/watchdog/iTCO_wdt.c
> ===
> --- linux-pm.orig/drivers/watchdog/iTCO_wdt.c
> +++ linux-pm/drivers/watchdog/iTCO_wdt.c
> @@ -51,6 +51,7 @@
>   #define DRV_VERSION "1.11"
>
>   /* Includes */
> +#include   /* For ACPI support */
>   #include/* For module specific items */
>   #include   /* For new moduleparam's */
>   #include /* For standard types (like size_t) */
> @@ -103,6 +104,8 @@ static struct {   /* this is private data
>   struct platform_device *dev;
>   /* the PCI-device */
>   struct pci_dev *pdev;
> + /* whether or not the watchdog has been suspended */
> + bool suspended;
>   } iTCO_wdt_private;
>
>   /* module parameters */
> @@ -571,12 +574,60 @@ static void iTCO_wdt_shutdown(struct pla
>   iTCO_wdt_stop(NULL);
>   }
>
> +#ifdef CONFIG_PM_SLEEP
> +/*
> + * Suspend-to-idle requires this, because it stops the ticks and timekeeping, so
> + * the watchdog cannot be pinged while in that state.  In ACPI sleep states the
> + * watchdog is stopped by the platform firmware.
> + */
> +
> +#ifdef CONFIG_ACPI
> +static inline bool need_suspend(void)
> +{
> + return acpi_target_system_state() == ACPI_STATE_S0;
> +}
> +#else
> +static inline bool need_suspend(void) { return true; }
> +#endif
> +
> +static int iTCO_wdt_suspend_noirq(struct device *dev)
> +{
> + int ret = 0;
> +
> + iTCO_wdt_private.suspended = false;
> + if (watchdog_active(&iTCO_wdt_watchdog_dev) && need_suspend()) {
> + ret = iTCO_wdt_stop(&iTCO_wdt_watchdog_dev);
> + if (!ret)
> + iTCO_wdt_private.suspended = true;
> + }
> + return ret;
> +}
> +
> +static int iTCO_wdt_resume_noirq(struct device *dev)
> +{
> + if (iTCO_wdt_private.suspended)
> + iTCO_wdt_start(&iTCO_wdt_watchdog_dev);
> +
> + return 0;
> +}
> +
> +struct dev_pm_ops iTCO_wdt_pm = {
> + .suspend_noirq = iTCO_wdt_suspend_noirq,
> + .resume_noirq = iTCO_wdt_resume_noirq,
> +};
> +
> +#define ITCO_WDT_PM_OPS  &iTCO_wdt_pm
> +#else
> +#define ITCO_WDT_PM_OPS  NULL
> +#endif /* CONFIG_PM_SLEEP */
> +
>   static struct platform_driver iTCO_wdt_driver = {
>   .probe  = iTCO_wdt_probe,
>   .remove = iTCO_wdt_remove,
>   .shutdown   = iTCO_wdt_shutdown,
>   .driver = {
>   .name   = DRV_NAME,
> + .pm = ITCO_WDT_PM_OPS,
>   },
>   };
>
>
>
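
The decision the patch adds can be modelled in isolation: stop the timer at `suspend_noirq` only if it is active and the target state is S0 (suspend-to-idle), remember that in `suspended`, and restart it at `resume_noirq` only when that flag is set. In the sketch, `struct wdt_model` and the `model_*` functions are hypothetical; `target_is_s0` stands in for `need_suspend()` (which, per the patch, is always true without CONFIG_ACPI), the stop/start calls only flip a flag, and stopping is assumed never to fail, whereas the patch propagates the error from `iTCO_wdt_stop()`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the driver state the patch touches. */
struct wdt_model {
	bool active;	/* what watchdog_active() reports */
	bool running;	/* whether the hardware timer is counting */
	bool suspended;	/* set when suspend_noirq stopped the timer */
};

/* Mirrors iTCO_wdt_suspend_noirq(); target_is_s0 stands in for need_suspend(). */
static int model_suspend_noirq(struct wdt_model *w, bool target_is_s0)
{
	w->suspended = false;
	if (w->active && target_is_s0) {
		w->running = false;	/* iTCO_wdt_stop(), assumed to succeed */
		w->suspended = true;
	}
	return 0;
}

/* Mirrors iTCO_wdt_resume_noirq(): restart only what suspend stopped. */
static int model_resume_noirq(struct wdt_model *w)
{
	if (w->suspended)
		w->running = true;	/* iTCO_wdt_start() */
	return 0;
}
```

In the S3 case the model leaves `running` untouched because, per the changelog, the platform firmware stops the watchdog before an ACPI sleep state is entered, so the driver does not need to.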

[RFC][PATCH 10/17 v2] x86/tlb/trace: Export enums used by tlb_flush tracepoint

2015-04-01 Thread Steven Rostedt

From: "Steven Rostedt (Red Hat)" 

Have the enums used in __print_symbolic() by the trace_tlb_flush()
tracepoint exported to userspace so that they can be parsed by
userspace tools.

Cc: Dave Hansen 
Signed-off-by: Steven Rostedt 
---
 include/trace/events/tlb.h | 30 +-
 1 file changed, 25 insertions(+), 5 deletions(-)

diff --git a/include/trace/events/tlb.h b/include/trace/events/tlb.h
index 0e7635765153..4250f364a6ca 100644
--- a/include/trace/events/tlb.h
+++ b/include/trace/events/tlb.h
@@ -7,11 +7,31 @@
 #include 
 #include 
 
-#define TLB_FLUSH_REASON   \
-   { TLB_FLUSH_ON_TASK_SWITCH, "flush on task switch" },   \
-   { TLB_REMOTE_SHOOTDOWN, "remote shootdown" },   \
-   { TLB_LOCAL_SHOOTDOWN,  "local shootdown" },\
-   { TLB_LOCAL_MM_SHOOTDOWN,   "local mm shootdown" }
+#define TLB_FLUSH_REASON   \
+   EM(  TLB_FLUSH_ON_TASK_SWITCH,  "flush on task switch" )\
+   EM(  TLB_REMOTE_SHOOTDOWN,  "remote shootdown" )\
+   EM(  TLB_LOCAL_SHOOTDOWN,   "local shootdown" ) \
+   EMe( TLB_LOCAL_MM_SHOOTDOWN,"local mm shootdown" )
+
+/*
+ * First define the enums in TLB_FLUSH_REASON to be exported to userspace
+ * via TRACE_DEFINE_ENUM().
+ */
+#undef EM
+#undef EMe
+#define EM(a,b)	TRACE_DEFINE_ENUM(a);
+#define EMe(a,b)	TRACE_DEFINE_ENUM(a);
+
+TLB_FLUSH_REASON
+
+/*
+ * Now redefine the EM() and EMe() macros to map the enums to the strings
+ * that will be printed in the output.
+ */
+#undef EM
+#undef EMe
+#define EM(a,b)	{ a, b },
+#define EMe(a,b)	{ a, b }
 
 TRACE_EVENT_CONDITION(tlb_flush,
 
-- 
2.1.4
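
The patch above is an instance of the X-macro pattern: `TLB_FLUSH_REASON` is written once in terms of `EM()`/`EMe()`, and those two macros are redefined between expansions so the same list yields different code each time. The userspace sketch below expands the list twice. The enum and its values are assumed for the example, `lookup_enum()` and `print_symbolic()` are hypothetical helpers, and the first pass builds a name/value table instead of calling `TRACE_DEFINE_ENUM()`, which only exists inside the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in enum; the order and values are assumed for this sketch. */
enum tlb_flush_reason {
	TLB_FLUSH_ON_TASK_SWITCH,
	TLB_REMOTE_SHOOTDOWN,
	TLB_LOCAL_SHOOTDOWN,
	TLB_LOCAL_MM_SHOOTDOWN,
};

/* The list is written once; EM()/EMe() decide what each entry becomes. */
#define TLB_FLUSH_REASON						\
	EM(  TLB_FLUSH_ON_TASK_SWITCH,	"flush on task switch" )	\
	EM(  TLB_REMOTE_SHOOTDOWN,	"remote shootdown" )		\
	EM(  TLB_LOCAL_SHOOTDOWN,	"local shootdown" )		\
	EMe( TLB_LOCAL_MM_SHOOTDOWN,	"local mm shootdown" )

struct enum_export { const char *name; long value; };
struct symbol { long value; const char *str; };

/* Pass 1: enum name -> value, the information TRACE_DEFINE_ENUM() exports. */
#undef EM
#undef EMe
#define EM(a, b)	{ #a, a },
#define EMe(a, b)	{ #a, a }
static const struct enum_export exported[] = { TLB_FLUSH_REASON };

/* Pass 2: value -> string, the table __print_symbolic() consumes. */
#undef EM
#undef EMe
#define EM(a, b)	{ a, b },
#define EMe(a, b)	{ a, b }
static const struct symbol symbols[] = { TLB_FLUSH_REASON };

#define ARRAY_LEN(x) (sizeof(x) / sizeof((x)[0]))

/* Both passes came from one list, so they cannot drift apart. */
static_assert(ARRAY_LEN(exported) == ARRAY_LEN(symbols), "passes differ");

/* What a userspace parser does with the export: resolve a name to a value. */
static long lookup_enum(const char *name)
{
	for (size_t i = 0; i < ARRAY_LEN(exported); i++)
		if (strcmp(exported[i].name, name) == 0)
			return exported[i].value;
	return -1;
}

/* Map a raw value to its string, as __print_symbolic() would. */
static const char *print_symbolic(long value)
{
	for (size_t i = 0; i < ARRAY_LEN(symbols); i++)
		if (symbols[i].value == value)
			return symbols[i].str;
	return NULL;
}
```

The separate `EMe()` for the last entry exists so that the final expansion can omit the trailing comma, which matters when the list ends up inside a macro argument list.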


[RFC][PATCH 13/17 v2] irq/tracing: Export enums in tracepoints to user space

2015-04-01 Thread Steven Rostedt

From: "Steven Rostedt (Red Hat)" 

For the softirq mapping, the print fmt output of __print_symbolic()
shows the enum names rather than their values, and userspace needs
those values to map them to their strings. Export them to userspace
with the TRACE_DEFINE_ENUM() macro so that user space tools can map
the enums to their values.

Cc: Peter Zijlstra 
Cc: Ingo Molnar 
Signed-off-by: Steven Rostedt 
---
 include/trace/events/irq.h | 39 +++
 1 file changed, 27 insertions(+), 12 deletions(-)

diff --git a/include/trace/events/irq.h b/include/trace/events/irq.h
index 3608bebd3d9c..ff8f6c091a15 100644
--- a/include/trace/events/irq.h
+++ b/include/trace/events/irq.h
@@ -9,19 +9,34 @@
 struct irqaction;
 struct softirq_action;
 
-#define softirq_name(sirq) { sirq##_SOFTIRQ, #sirq }
+#define SOFTIRQ_NAME_LIST  \
+softirq_name(HI)   \
+softirq_name(TIMER)\
+softirq_name(NET_TX)   \
+softirq_name(NET_RX)   \
+softirq_name(BLOCK)\
+softirq_name(BLOCK_IOPOLL) \
+softirq_name(TASKLET)  \
+softirq_name(SCHED)\
+softirq_name(HRTIMER)  \
+softirq_name_end(RCU)
+
+#undef softirq_name
+#undef softirq_name_end
+
+#define softirq_name(sirq) TRACE_DEFINE_ENUM(sirq##_SOFTIRQ);
+#define softirq_name_end(sirq)  TRACE_DEFINE_ENUM(sirq##_SOFTIRQ);
+
+SOFTIRQ_NAME_LIST
+
+#undef softirq_name
+#undef softirq_name_end
+
+#define softirq_name(sirq) { sirq##_SOFTIRQ, #sirq },
+#define softirq_name_end(sirq) { sirq##_SOFTIRQ, #sirq }
+
 #define show_softirq_name(val) \
-   __print_symbolic(val,   \
-softirq_name(HI),  \
-softirq_name(TIMER),   \
-softirq_name(NET_TX),  \
-softirq_name(NET_RX),  \
-softirq_name(BLOCK),   \
-softirq_name(BLOCK_IOPOLL),\
-softirq_name(TASKLET), \
-softirq_name(SCHED),   \
-softirq_name(HRTIMER), \
-softirq_name(RCU))
+   __print_symbolic(val, SOFTIRQ_NAME_LIST)
 
 /**
  * irq_handler_entry - called immediately before the irq action handler
-- 
2.1.4
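
One detail of the patch above is why the last entry uses `softirq_name_end()`: the list is handed to `__print_symbolic()` as variadic arguments, and as I understand the kernel macro it appends a `{ -1, NULL }` terminator after them, so a trailing comma on the final entry would leave an empty initializer and fail to compile. The sketch below reproduces only the second expansion, in userspace. The softirq enum is a stand-in that follows the list's order, and `struct sym`, `lookup_sym()` and `print_symbolic()` are hypothetical replacements for the kernel's print helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the kernel's softirq numbers, in the order of the list. */
enum {
	HI_SOFTIRQ, TIMER_SOFTIRQ, NET_TX_SOFTIRQ, NET_RX_SOFTIRQ,
	BLOCK_SOFTIRQ, BLOCK_IOPOLL_SOFTIRQ, TASKLET_SOFTIRQ,
	SCHED_SOFTIRQ, HRTIMER_SOFTIRQ, RCU_SOFTIRQ
};

struct sym { long value; const char *name; };

/* Linear search up to the NULL-named terminator. */
static const char *lookup_sym(long value, const struct sym *tab)
{
	for (; tab->name; tab++)
		if (tab->value == value)
			return tab->name;
	return NULL;
}

#define SOFTIRQ_NAME_LIST	\
	softirq_name(HI)	\
	softirq_name(TIMER)	\
	softirq_name(NET_TX)	\
	softirq_name(NET_RX)	\
	softirq_name(BLOCK)	\
	softirq_name(BLOCK_IOPOLL) \
	softirq_name(TASKLET)	\
	softirq_name(SCHED)	\
	softirq_name(HRTIMER)	\
	softirq_name_end(RCU)

/* sirq##_SOFTIRQ rebuilds the enum name; #sirq gives the printed string. */
#define softirq_name(sirq)	{ sirq##_SOFTIRQ, #sirq },
#define softirq_name_end(sirq)	{ sirq##_SOFTIRQ, #sirq }

/* Caller's entries, then a terminator; a trailing comma would break this. */
#define print_symbolic(value, ...) \
	lookup_sym((value), (const struct sym[]){ __VA_ARGS__, { -1, NULL } })

#define show_softirq_name(val) print_symbolic(val, SOFTIRQ_NAME_LIST)
```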

