Re: [PATCH] time: Avoid signed overflow in timekeeping_delta_to_ns()

2016-11-14 Thread Laurent Vivier


On 14/11/2016 20:42, Chris Metcalf wrote:
> This bugfix was originally made in commit 35a4933a8959 ("time:
> Avoid signed overflow in timekeeping_get_ns()").  When the code was
> refactored in commit 6bd58f09e1d8 ("time: Add cycles to nanoseconds
> translation") the signed overflow fix was lost.  Re-introduce it.
> 
> Signed-off-by: Chris Metcalf 
> ---
> I happened to be looking for an unrelated fix, found this code,
> realized the tip code didn't match the fixed code, and
> backtracked to where it had gone away.
> 
>  kernel/time/timekeeping.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> index 37dec7e3db43..57926bc7b7f3 100644
> --- a/kernel/time/timekeeping.c
> +++ b/kernel/time/timekeeping.c
> @@ -304,8 +304,7 @@ static inline s64 timekeeping_delta_to_ns(struct tk_read_base *tkr,
>  {
>   s64 nsec;
>  
> - nsec = delta * tkr->mult + tkr->xtime_nsec;
> - nsec >>= tkr->shift;
> + nsec = (delta * tkr->mult + tkr->xtime_nsec) >> tkr->shift;
>  
>   /* If arch requires, add in get_arch_timeoffset() */
>   return nsec + arch_gettimeoffset();
> 
Reviewed-by: Laurent Vivier 
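
The two forms in the hunk above look equivalent but differ once the product sets bit 63. A minimal userspace sketch of the difference follows, assuming delta and xtime_nsec are unsigned 64-bit values and mult and shift are unsigned 32-bit values; the function names are hypothetical and this is not the kernel code itself. It also relies on the usual gcc/clang behaviour for out-of-range conversion to a signed type and for right-shifting a negative value, both of which are implementation-defined in C.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Form that loses the fix: the unsigned result is stored in a signed
 * 64-bit variable before shifting, so a value with bit 63 set turns
 * negative and the arithmetic right shift keeps it negative.
 */
static int64_t delta_to_ns_shift_signed(uint64_t delta, uint32_t mult,
					uint32_t shift, uint64_t xtime_nsec)
{
	int64_t nsec;

	assert(shift < 64);
	nsec = (int64_t)(delta * mult + xtime_nsec);
	nsec >>= shift;
	return nsec;
}

/*
 * Form the patch restores: multiply, add and shift all happen in
 * unsigned 64-bit arithmetic; only the shifted result becomes signed.
 */
static int64_t delta_to_ns_shift_unsigned(uint64_t delta, uint32_t mult,
					  uint32_t shift, uint64_t xtime_nsec)
{
	assert(shift < 64);
	return (int64_t)((delta * mult + xtime_nsec) >> shift);
}
```

For small deltas both helpers agree; they diverge only when delta * mult reaches 2^63 or more, which is the case the original fix was about.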


Re: [PATCH 1/2] Staging: fsl-mc: include: mc: Kernel type 's16' preferred over 'int16_t'

2016-11-14 Thread Shiva Kerdel



-Original Message-
From: Dan Carpenter [mailto:dan.carpen...@oracle.com]
Sent: Monday, November 14, 2016 4:06 AM
To: Stuart Yoder 
Cc: Shiva Kerdel ; de...@driverdev.osuosl.org; gre...@linuxfoundation.org;
linux-ker...@vger.kernel.org; Nipun Gupta ; tred...@nvidia.com; Laurentiu Tudor

Subject: Re: [PATCH 1/2] Staging: fsl-mc: include: mc: Kernel type 's16' preferred over 'int16_t'

On Fri, Nov 11, 2016 at 02:52:31PM +, Stuart Yoder wrote:

diff --git a/drivers/staging/fsl-mc/include/mc-bus.h b/drivers/staging/fsl-mc/include/mc-bus.h
index e915574..c7cad87 100644
--- a/drivers/staging/fsl-mc/include/mc-bus.h
+++ b/drivers/staging/fsl-mc/include/mc-bus.h
@@ -42,8 +42,8 @@ struct msi_domain_info;
   */
  struct fsl_mc_resource_pool {
enum fsl_mc_pool_type type;
-   int16_t max_count;
-   int16_t free_count;
+   s16 max_count;

My understanding is that this has to be signed because the design of
this driver is that we keep adding devices until the counter
overflows.  After that there are a couple tests for
"if (WARN_ON(res_pool->max_count < 0)) " which prevent the driver from
working again.

This all seems pretty horrible.

Can you elaborate?

The resource pools managed by this driver are populated by hardware objects
discovered when the fsl-mc bus probes a DPRC/container.

The number of potential objects discovered of a given type is in the hundreds,
so a signed 16-bit number is orders of magnitude larger than anything we will
ever encounter.

Would you feel better about this if max_count was an int?

Yeah.


The max_count reflects the total number of objects discovered.  If that is
exceeded we display a warning, because something is horribly wrong.  Nothing
stops working, the allocator simply refuses to add anything else to the
free list.

I didn't look at this carefully...  Anyway we can't remove devices
either.  If we just had an upper bound instead of overflowing the s16
then we could still remove devices.


The only reason max_count is there at all is as an internal check against
bugs and resource leaks.  If the driver is being removed and a resource
pool is being freed, max_count must be zero...i.e. all objects should have
been removed.  If not, there is a leak somewhere.  So, it's a sanity check.


Just use a normal upper bound with a #define instead of a magic number
hidden and then disguised as an integer overflow.

Ok, agree that it would be clearer like that.

Shiva, can you respin this patch and just make both max_count and free_count
of type "int"?

I will get Dan's suggestion sent as a separate patch...to #define the upper
bound instead of relying on integer overflow.

Thanks,
Stuart

I will do that, thank you for the clarification of what I should do.

Thanks,
Shiva Kerdel
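
The outcome of the thread (plain int counters plus a named upper bound) could look roughly like the sketch below. It is an illustration only: the struct, the helper names and the limit value are hypothetical, since the thread does not name a concrete bound, and the real driver keeps more state than this.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical limit; the thread only says real counts are "in the hundreds". */
#define SKETCH_POOL_MAX_COUNT 1024

struct sketch_res_pool {
	int max_count;   /* objects currently known to the pool */
	int free_count;  /* objects currently on the free list */
};

/* Refuse to grow past the explicit bound instead of waiting for overflow. */
static int sketch_pool_add(struct sketch_res_pool *pool)
{
	assert(pool != NULL);
	if (pool->max_count >= SKETCH_POOL_MAX_COUNT)
		return -ENOSPC;
	pool->max_count++;
	pool->free_count++;
	return 0;
}

/* Removal keeps working even after the bound has been hit. */
static int sketch_pool_remove(struct sketch_res_pool *pool)
{
	assert(pool != NULL);
	if (pool->free_count <= 0)
		return -ENOENT;
	pool->free_count--;
	pool->max_count--;
	return 0;
}
```

With an explicit bound the counters never go negative, so hitting the limit does not block later removals, which is the point Dan raises above.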


Re: [PATCH RFC tip/core/rcu] SRCU rewrite

2016-11-14 Thread Peter Zijlstra
On Mon, Nov 14, 2016 at 10:36:36AM -0800, Paul E. McKenney wrote:
> SRCU uses two per-cpu counters: a nesting counter to count the number of
> active critical sections, and a sequence counter to ensure that the nesting
> counters don't change while they are being added together in
> srcu_readers_active_idx_check().
> 
> This patch instead uses per-cpu lock and unlock counters. Because both
> counters only increase and srcu_readers_active_idx_check() reads the unlock
> counter before the lock counter, this achieves the same end without having
> to increment two different counters in srcu_read_lock(). This also saves a
> smp_mb() in srcu_readers_active_idx_check().

A very small improvement... I feel SRCU has much bigger issues :/
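
The counting scheme in the quoted description can be sketched in a few lines. This is a single-threaded illustration with hypothetical names, not the SRCU implementation: real SRCU uses per-CPU variables and memory barriers, both omitted here. A reader may lock on one CPU and unlock on another, so only the sums are meaningful. Reading the unlock sum first matters because every unlock already counted was preceded by its lock, so the lock sum read afterwards must include it.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_NR_CPUS 4

struct sketch_srcu {
	unsigned long lock_count[SKETCH_NR_CPUS][2];
	unsigned long unlock_count[SKETCH_NR_CPUS][2];
};

/* Entering a read-side section bumps only the lock counter. */
static void sketch_read_lock(struct sketch_srcu *sp, int cpu, int idx)
{
	assert(cpu >= 0 && cpu < SKETCH_NR_CPUS && (idx == 0 || idx == 1));
	sp->lock_count[cpu][idx]++;
}

/* Leaving bumps only the unlock counter, possibly on another CPU. */
static void sketch_read_unlock(struct sketch_srcu *sp, int cpu, int idx)
{
	assert(cpu >= 0 && cpu < SKETCH_NR_CPUS && (idx == 0 || idx == 1));
	sp->unlock_count[cpu][idx]++;
}

/* True when no reader is inside a section for this index. */
static bool sketch_readers_idle(const struct sketch_srcu *sp, int idx)
{
	unsigned long unlocks = 0, locks = 0;
	int cpu;

	/* Sum the unlock counters first ... */
	for (cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
		unlocks += sp->unlock_count[cpu][idx];

	/* (a real implementation needs a memory barrier here) */

	/* ... then the lock counters, and compare the totals. */
	for (cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
		locks += sp->lock_count[cpu][idx];

	return locks == unlocks;
}
```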


[PATCH] [media] ir-hix5hd2: make hisilicon,power-syscon property deprecated

2016-11-14 Thread Jiancheng Xue
From: Ruqiang Ju 

The clock of IR can be provided by the clock provider and controlled
by common clock framework APIs.

Signed-off-by: Ruqiang Ju 
Signed-off-by: Jiancheng Xue 
---
 .../devicetree/bindings/media/hix5hd2-ir.txt   |  6 +++---
 drivers/media/rc/ir-hix5hd2.c  | 25 ++
 2 files changed, 19 insertions(+), 12 deletions(-)

diff --git a/Documentation/devicetree/bindings/media/hix5hd2-ir.txt b/Documentation/devicetree/bindings/media/hix5hd2-ir.txt
index fb5e760..54e1bed 100644
--- a/Documentation/devicetree/bindings/media/hix5hd2-ir.txt
+++ b/Documentation/devicetree/bindings/media/hix5hd2-ir.txt
@@ -8,10 +8,11 @@ Required properties:
  the device. The interrupt specifier format depends on the interrupt
  controller parent.
- clocks: clock phandle and specifier pair.
-   - hisilicon,power-syscon: phandle of syscon used to control power.

 Optional properties:
- linux,rc-map-name : Remote control map name.
+   - hisilicon,power-syscon: DEPRECATED. Don't use this in new dts files.
+   Provide correct clocks instead.

 Example node:

@@ -19,7 +20,6 @@ Example node:
compatible = "hisilicon,hix5hd2-ir";
reg = <0xf8001000 0x1000>;
interrupts = <0 47 4>;
-   clocks = < HIX5HD2_FIXED_24M>;
-   hisilicon,power-syscon = <>;
+   clocks = < HIX5HD2_IR_CLOCK>;
linux,rc-map-name = "rc-tivo";
};
diff --git a/drivers/media/rc/ir-hix5hd2.c b/drivers/media/rc/ir-hix5hd2.c
index d0549fb..d26907e 100644
--- a/drivers/media/rc/ir-hix5hd2.c
+++ b/drivers/media/rc/ir-hix5hd2.c
@@ -75,15 +75,22 @@ static void hix5hd2_ir_enable(struct hix5hd2_ir_priv *dev, bool on)
 {
u32 val;

-   regmap_read(dev->regmap, IR_CLK, &val);
-   if (on) {
-   val &= ~IR_CLK_RESET;
-   val |= IR_CLK_ENABLE;
+   if (dev->regmap) {
+   regmap_read(dev->regmap, IR_CLK, &val);
+   if (on) {
+   val &= ~IR_CLK_RESET;
+   val |= IR_CLK_ENABLE;
+   } else {
+   val &= ~IR_CLK_ENABLE;
+   val |= IR_CLK_RESET;
+   }
+   regmap_write(dev->regmap, IR_CLK, val);
} else {
-   val &= ~IR_CLK_ENABLE;
-   val |= IR_CLK_RESET;
+   if (on)
+   clk_prepare_enable(dev->clock);
+   else
+   clk_disable_unprepare(dev->clock);
}
-   regmap_write(dev->regmap, IR_CLK, val);
 }

 static int hix5hd2_ir_config(struct hix5hd2_ir_priv *priv)
@@ -207,8 +214,8 @@ static int hix5hd2_ir_probe(struct platform_device *pdev)
priv->regmap = syscon_regmap_lookup_by_phandle(node,
   "hisilicon,power-syscon");
if (IS_ERR(priv->regmap)) {
-   dev_err(dev, "no power-reg\n");
-   return -EINVAL;
+   dev_info(dev, "no power-reg\n");
+   priv->regmap = NULL;
}

res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
--
1.9.1



Re: [RFC][PATCH 5/7] kref: Implement kref_put_lock()

2016-11-14 Thread Peter Zijlstra
On Mon, Nov 14, 2016 at 12:35:48PM -0800, Kees Cook wrote:
> On Mon, Nov 14, 2016 at 9:39 AM, Peter Zijlstra  wrote:
> > Because home-rolling your own is _awesome_, stop doing it. Provide
> > kref_put_lock(), just like kref_put_mutex() but for a spinlock.
> >
> > Signed-off-by: Peter Zijlstra (Intel) 
> > ---
> >  include/linux/kref.h |   21 +++--
> >  net/sunrpc/svcauth.c |   15 ++-
> >  2 files changed, 25 insertions(+), 11 deletions(-)
> >
> > --- a/include/linux/kref.h
> > +++ b/include/linux/kref.h
> > @@ -86,12 +86,21 @@ static inline int kref_put_mutex(struct
> >  struct mutex *lock)
> >  {
> > WARN_ON(release == NULL);
> 
> This WARN_ON makes sense, yes, though it seems like it should be dealt
> with differently. If it's NULL, we'll just Oops when we call release()
> later... Seems like this should saturate the kref or something else
> similar.

So I simply took the pattern from the existing kref_put().

But I like it more in these kref_put_{lock,mutex}() variants, because
someone will need to unlock. If we simply crash/bug without unlock we'll
have broken state the rest of the kernel cannot fix up.
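
For readers unfamiliar with the pattern, here is a userspace sketch of a "put with lock" helper in the spirit of kref_put_mutex(). It is an illustration under stated assumptions, not the patch: all names are hypothetical, C11 atomics and a toy spinlock stand in for the kernel primitives, and the NULL check is a plain assert rather than WARN_ON(). The release callback runs with the lock held and is responsible for dropping it, which is why a crash before that point would leave the lock stuck.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy spinlock built on atomic_flag; stands in for a kernel spinlock. */
struct sketch_lock { atomic_flag held; };

static void sketch_lock_acquire(struct sketch_lock *l)
{
	while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
		; /* spin until the flag was clear */
}

static void sketch_lock_release(struct sketch_lock *l)
{
	atomic_flag_clear_explicit(&l->held, memory_order_release);
}

struct sketch_ref { atomic_int count; };

/*
 * Drop one reference. Returns 1 if this was the last one, in which case
 * release() was called with the lock held; returns 0 otherwise.
 */
static int sketch_ref_put_lock(struct sketch_ref *ref,
			       void (*release)(struct sketch_ref *),
			       struct sketch_lock *lock)
{
	int old = atomic_load(&ref->count);

	assert(release != NULL);

	/* Fast path: while other references remain, no lock is needed. */
	while (old > 1) {
		if (atomic_compare_exchange_weak(&ref->count, &old, old - 1))
			return 0;
	}

	/* Slow path: take the lock before the count may reach zero. */
	sketch_lock_acquire(lock);
	if (atomic_fetch_sub(&ref->count, 1) == 1) {
		release(ref);	/* must drop the lock itself */
		return 1;
	}
	sketch_lock_release(lock);
	return 0;
}

/* Example object whose release callback unlocks, as the pattern requires. */
struct sketch_obj {
	struct sketch_ref ref;		/* first member, so the cast below is valid */
	struct sketch_lock *lock;
	bool released;
};

static void sketch_obj_release(struct sketch_ref *ref)
{
	struct sketch_obj *obj = (struct sketch_obj *)ref;

	obj->released = true;
	sketch_lock_release(obj->lock);
}
```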



Re: [RFC][PATCH 0/7] kref improvements

2016-11-14 Thread Peter Zijlstra
On Tue, Nov 15, 2016 at 08:27:42AM +0100, Greg KH wrote:
> On Mon, Nov 14, 2016 at 06:39:46PM +0100, Peter Zijlstra wrote:
> > This series unfscks kref and then implements it in terms of refcount_t.
> > 
> > x86_64-allyesconfig compile tested and boot tested with my regular config.
> > 
> > refcount_t is as per the previous thread, it BUGs on over-/underflow and
> > saturates at UINT_MAX, such that if we ever overflow, we'll never free again.
> > 
> > 
> 
> Thanks so much for doing these, at the very least, I want to take the
> kref-abuse-fixes now as those users shouldn't be doing those foolish
> things.  Any objection for me taking some of them through my tree now?

None at all, but please double check at least the 'kill kref_sub()' one,
I might have messed up drbd or something, that code isn't entirely
transparent.
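
The saturation behaviour described in the quoted cover letter can be sketched as follows. This is a non-atomic, single-threaded illustration with hypothetical names, not the proposed refcount_t: the real type is atomic, and where this sketch quietly refuses an underflow the cover letter says the kernel version BUGs.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_refcount { unsigned int refs; };

/* Take a reference; once the counter reaches UINT_MAX it stays there. */
static void sketch_refcount_inc(struct sketch_refcount *r)
{
	assert(r != NULL);
	if (r->refs == UINT_MAX)
		return;		/* saturated: never wrap around to 0 */
	r->refs++;
}

/* Drop a reference; returns true only when the last one went away. */
static bool sketch_refcount_dec_and_test(struct sketch_refcount *r)
{
	assert(r != NULL);
	if (r->refs == UINT_MAX)
		return false;	/* saturated: leak the object rather than free it */
	if (r->refs == 0)
		return false;	/* underflow: the real code would BUG here */
	return --r->refs == 0;
}
```

A saturated counter trades a possible use-after-free for a guaranteed leak: the object is never freed again, as the cover letter puts it.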


Re: [RFC][PATCH 2/7] kref: Add kref_read()

2016-11-14 Thread Peter Zijlstra
On Tue, Nov 15, 2016 at 08:28:55AM +0100, Greg KH wrote:
> On Mon, Nov 14, 2016 at 10:16:55AM -0800, Christoph Hellwig wrote:
> > On Mon, Nov 14, 2016 at 06:39:48PM +0100, Peter Zijlstra wrote:
> > > Since we need to change the implementation, stop exposing internals.
> > > 
> > > Provide kref_read() to read the current reference count; typically
> > > used for debug messages.
> > 
> > Can we just provide a printk specifier for a kref value instead as
> > that is the only valid use case for reading the value?
> 
> Yeah, that would be great as no one should be doing anything
> logic-related based on the kref value.

IIRC there are a few users that WARN_ON() the value with a minimum bound
or somesuch. Those would be left in the cold, but yes I too like the
idea of a printk() format specifier.


Re: [RFC][PATCH 0/7] kref improvements

2016-11-14 Thread Ingo Molnar

* Greg KH  wrote:

> On Mon, Nov 14, 2016 at 06:39:46PM +0100, Peter Zijlstra wrote:
> > This series unfscks kref and then implements it in terms of refcount_t.
> > 
> > x86_64-allyesconfig compile tested and boot tested with my regular config.
> > 
> > refcount_t is as per the previous thread, it BUGs on over-/underflow and
> > saturates at UINT_MAX, such that if we ever overflow, we'll never free again.
> > 
> > 
> 
> Thanks so much for doing these, at the very least, I want to take the
> kref-abuse-fixes now as those users shouldn't be doing those foolish
> things.  Any objection for me taking some of them through my tree now?

Very nice series indeed!

We normally route atomics-related patches through tip:locking/core (there's also
tip:atomic/core), but this is a special case I think, given how broadly it
interacts with driver code.

So both would work I think: we could concentrate these and only these patches
into tip:atomic/core as an append-only tree, or you can carry them in the
driver tree - whichever variant you prefer!

Thanks,

Ingo


Re: [PATCH v2] tile: handle __ro_after_init like parisc does

2016-11-14 Thread Heiko Carstens
On Mon, Nov 14, 2016 at 01:12:05PM -0800, Kees Cook wrote:
> At some point here, I want to collect all the arch maintainers and
> discuss the options for correctly reflecting the three data
> memory-protection needs we have:
> 
> - always read-only
> - read-only after init
> - read-only except during rare updates
> 
> (The last one doesn't exist at all yet...)
> 
> x86, arm, and arm64 use mark_rodata_ro() after init finishes, so they
> don't technically implement "always read-only". parisc, tile, powerpc,
> others have "always read-only", but disable read-only-after-init since
> they don't use mark_rodata_ro(). I think s390 has recently implemented
> both, but I have to double-check...

Yes, s390 has both: an early always read-only support, which is effective
as soon as paging_init() has set up and enabled page tables.
Our mark_rodata_ro() implementation only makes the ro_after_init section
read-only.



Re: [RFC PATCH] xen/x86: Increase xen_e820_map to E820_X_MAX possible entries

2016-11-14 Thread Juergen Gross
On 15/11/16 08:15, Jan Beulich wrote:
 On 15.11.16 at 07:33,  wrote:
>> On 15/11/16 01:11, Alex Thorlton wrote:
>>> Hey everyone,
>>>
>>> We're having problems with large systems hitting a BUG in
>>> xen_memory_setup, due to extra e820 entries created in the
>>> XENMEM_machine_memory_map callback.  The change in the patch gets things
>>> working, but Boris and I wanted to get opinions on whether or not this
>>> is the appropriate/entire solution, which is why I've sent it as an RFC
>>> for now.
>>>
>>> Boris pointed out to me that E820_X_MAX is only large when CONFIG_EFI=y,
>>> which is a detail worth discussing.  He proposed possibly adding
>>> CONFIG_XEN to the conditions under which we set E820_X_MAX to a larger
>>> value than E820MAX, since the Xen e820 table isn't bound by the
>>> zero-page memory limitations.
>>>
>>> I do *slightly* question the use of E820_X_MAX here, only from a
>>> cosmetic perspective, as I believe this macro is intended to describe
>>> the maximum size of the extended e820 table, which, AFAIK, is not used
>>> by the Xen HV.  That being said, there isn't exactly a "more
>>> appropriate" macro/variable to use, so this may not really be an issue.
>>>
>>> Any input on the patch, or the questions I've raised above is greatly
>>> appreciated!
>>
>> While I think extending the e820 table is the right thing to do I'm
>> questioning the assumptions here.
>>
>> Looking briefly through the Xen hypervisor sources I think it isn't
>> yet ready for such large machines: the hypervisor's e820 map seems to
>> be still limited to 128 e820 entries. Jan, did I overlook an EFI
>> specific path extending this limitation?
> 
> No, you didn't. I do question the correlation with "large machines"
> here though: The issue isn't with large machines afaict, but with
> ones having very many entries (i.e. heavily fragmented).

Fact is: non-EFI machines are limited to 128 entries.

One reason for fragmentation is NUMA, which tends to be present on large
machines and adds many entries especially when there are lots of nodes.

So while you are of course right that the problem isn't the size of
the machine, but the memory fragmentation, the chances of it showing up
are much higher on large machines.

So I'd rephrase:

Looking briefly through the Xen hypervisor sources I think it isn't yet
ready for machines with many e820 entries due to memory fragmentation
to be found e.g. on very large NUMA machines with lots of nodes: ...

>> In case I'm right the Xen hypervisor should be prepared for a larger
>> e820 map, but this won't help alone as there would still be additional
>> entries for the IOAPICs created.
>>
>> So I think we need something like:
>>
>> #define E820_XEN_MAX (E820_X_MAX + MAX_IO_APICS)
>>
>> and use this for sizing xen_e820_map[].
> 
> I would say that if any change gets done here, there shouldn't be
> any static upper limit at all. That could even be viewed as in line
> with recent e820.c changes moving to dynamic allocations. In
> particular I don't see why MAX_IO_APICS would need adding in
> here, but not other (current and future) factors determining the
> (pseudo) E820 map Xen presents to Dom0.

The hypervisor interface of XENMEM_machine_memory_map requires specifying
the size of the e820 map into which the hypervisor can supply
entries. The alternative would be trial and error: start with a small
e820 map and, on error, increase the size until success. I
don't like this approach, especially as dynamic memory allocations
are not possible at this stage (using RESERVE_BRK() isn't any better
than a static __initdata array IMO).

Which other current factors do you see determining the number of
e820 entries presented to Dom0?


Juergen
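
The sizing argument above can be sketched as follows. Everything here is hypothetical: the constants are placeholder values rather than the kernel's, the function is a stand-in for the hypercall and not the Xen interface, and the error code is an arbitrary choice. The sketch only shows why a caller that cannot allocate dynamically has to pick a static capacity large enough up front.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Placeholder values; the real constants depend on the kernel config. */
#define SKETCH_E820_X_MAX    128
#define SKETCH_MAX_IO_APICS  128
#define SKETCH_E820_XEN_MAX  (SKETCH_E820_X_MAX + SKETCH_MAX_IO_APICS)

struct sketch_e820_entry {
	unsigned long long addr;
	unsigned long long size;
	unsigned int type;
};

/* Sized once at build time, like a static __initdata array. */
static struct sketch_e820_entry sketch_xen_e820_map[SKETCH_E820_XEN_MAX];

/*
 * Stand-in for the hypercall: the caller states how many entries its
 * buffer can hold, and the call fails instead of truncating when the
 * machine reports more. Returns the number of entries written.
 */
static int sketch_get_memory_map(struct sketch_e820_entry *buf,
				 size_t capacity, size_t machine_entries)
{
	size_t i;

	assert(buf != NULL);
	if (machine_entries > capacity)
		return -ENOBUFS;

	for (i = 0; i < machine_entries; i++) {
		buf[i].addr = (unsigned long long)i << 20;	/* dummy 1 MiB regions */
		buf[i].size = 1ULL << 20;
		buf[i].type = 1;
	}
	return (int)machine_entries;
}
```

In this sketch a 128-entry capacity fails for a machine reporting 200 entries, while the combined bound succeeds; a retry loop with growing buffers would need an allocator that is not available this early in boot.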


Re: [RFC PATCH] xen/x86: Increase xen_e820_map to E820_X_MAX possible entries

2016-11-14 Thread Juergen Gross
On 15/11/16 08:15, Jan Beulich wrote:
 On 15.11.16 at 07:33,  wrote:
>> On 15/11/16 01:11, Alex Thorlton wrote:
>>> Hey everyone,
>>>
>>> We're having problems with large systems hitting a BUG in
>>> xen_memory_setup, due to extra e820 entries created in the
>>> XENMEM_machine_memory_map callback.  The change in the patch gets things
>>> working, but Boris and I wanted to get opinions on whether or not this
>>> is the appropriate/entire solution, which is why I've sent it as an RFC
>>> for now.
>>>
>>> Boris pointed out to me that E820_X_MAX is only large when CONFIG_EFI=y,
>>> which is a detail worth discussig.  He proposed possibly adding
>>> CONFIG_XEN to the conditions under which we set E820_X_MAX to a larger
>>> value than E820MAX, since the Xen e820 table isn't bound by the
>>> zero-page memory limitations.
>>>
>>> I do *slightly* question the use of E820_X_MAX here, only from a
>>> cosmetic prospective, as I believe this macro is intended to describe
>>> the maximum size of the extended e820 table, which, AFAIK, is not used
>>> by the Xen HV.  That being said, there isn't exactly a "more
>>> appropriate" macro/variable to use, so this may not really be an issue.
>>>
>>> Any input on the patch, or the questions I've raised above is greatly
>>> appreciated!
>>
>> While I think extending the e820 table is the right thing to do I'm
>> questioning the assumptions here.
>>
>> Looking briefly through the Xen hypervisor sources I think it isn't
>> yet ready for such large machines: the hypervisor's e820 map seems to
>> be still limited to 128 e820 entries. Jan, did I overlook an EFI
>> specific path extending this limitation?
> 
> No, you didn't. I do question the correlation with "large machines"
> here though: The issue isn't with large machines afaict, but with
> ones having very many entries (i.e. heavily fragmented).

Fact is: non-EFI machines are limited to 128 entries.

One reason for fragmentation is NUMA - which tends to be present and
especially adding many entries only with lots of nodes - on large
machines.

So while you are of course right that the problem isn't the size of
the machine, but the memory fragmentation, the chances to show up
are much higher on large machines.

So I'd rephrase:

Looking briefly through the Xen hypervisor sources I think it isn't yet
ready for machines with many e820 entries due to memory fragmentation
to be found e.g. on very large NUMA machines with lots of nodes: ...

>> In case I'm right the Xen hypervisor should be prepared for a larger
>> e820 map, but this won't help alone as there would still be additional
>> entries for the IOAPICs created.
>>
>> So I think we need something like:
>>
>> #define E820_XEN_MAX (E820_X_MAX + MAX_IO_APICS)
>>
>> and use this for sizing xen_e820_map[].
> 
> I would say that if any change gets done here, there shouldn't be
> any static upper limit at all. That could even be viewed as in line
> with recent e820.c changes moving to dynamic allocations. In
> particular I don't see why MAX_IO_APICS would need adding in
> here, but not other (current and future) factors determining the
> (pseudo) E820 map Xen presents to Dom0.

The hypervisor interface of XENMEM_machine_memory_map requires the
caller to specify the size of the e820 map where the hypervisor can
supply entries. The alternative would be trial and error: start with a
small e820 map and in case of error increase the size until success. I
don't like this approach. Especially as dynamic memory allocations
are not possible at this stage (using RESERVE_BRK() isn't any better
than a static __initdata array IMO).
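
To make the trade-off above concrete, here is a minimal user-space
sketch of the two options. It is not the Xen interface:
mock_memory_map(), the MOCK_* constants and their values, and the
entry layout are all hypothetical stand-ins chosen for illustration.
The only property borrowed from the discussion is that the caller must
state the buffer size up front and the call fails if the map does not
fit.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; the real limits come from kernel headers. */
#define MOCK_E820_X_MAX    320
#define MOCK_MAX_IO_APICS  128
#define MOCK_E820_XEN_MAX  (MOCK_E820_X_MAX + MOCK_MAX_IO_APICS)

struct mock_e820_entry {
	unsigned long long addr;
	unsigned long long size;
	unsigned int type;
};

/*
 * Mock of a "caller supplies the buffer size" interface: returns -1
 * when the map holds more entries than the caller made room for,
 * otherwise fills the buffer and returns the entry count.
 */
static int mock_memory_map(struct mock_e820_entry *buf, size_t capacity,
			   size_t entries_in_map)
{
	size_t i;

	if (entries_in_map > capacity)
		return -1;
	for (i = 0; i < entries_in_map; i++) {
		buf[i].addr = (unsigned long long)i << 20;
		buf[i].size = 1ULL << 20;
		buf[i].type = 1;
	}
	return (int)entries_in_map;
}

/* Option A: a single call into a statically sized array. */
static struct mock_e820_entry static_map[MOCK_E820_XEN_MAX];

static int fetch_static(size_t entries_in_map)
{
	return mock_memory_map(static_map, MOCK_E820_XEN_MAX, entries_in_map);
}

/*
 * Option B: trial and error. Start small and double the buffer until
 * the call succeeds. This needs a working allocator, which is the
 * objection raised above for early boot. *attempts counts the calls.
 */
static int fetch_retry(size_t entries_in_map, int *attempts)
{
	size_t capacity = 32;
	int ret;

	*attempts = 0;
	for (;;) {
		struct mock_e820_entry *buf = malloc(capacity * sizeof(*buf));

		if (!buf)
			return -1;
		(*attempts)++;
		ret = mock_memory_map(buf, capacity, entries_in_map);
		free(buf);
		if (ret >= 0)
			return ret;
		capacity *= 2;
	}
}
```

With a map of 200 entries, fetch_static() succeeds in one call, while
fetch_retry() needs four (capacities 32, 64, 128, 256). The static
variant trades some unused array space for a single call and no
allocator dependency.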

Which other current factors do you see determining the number of
e820 entries presented to Dom0?


Juergen


Re: Linux 4.4.32

2016-11-14 Thread Greg KH
diff --git a/Makefile b/Makefile
index 7c6f28e7a2f6..fba9b09a1330 100644
--- a/Makefile
+++ b/Makefile
@@ -1,6 +1,6 @@
 VERSION = 4
 PATCHLEVEL = 4
-SUBLEVEL = 31
+SUBLEVEL = 32
 EXTRAVERSION =
 NAME = Blurry Fish Butt
 
diff --git a/arch/mips/kvm/emulate.c b/arch/mips/kvm/emulate.c
index bbe56871245c..4298aeb1e20f 100644
--- a/arch/mips/kvm/emulate.c
+++ b/arch/mips/kvm/emulate.c
@@ -822,7 +822,7 @@ static void kvm_mips_invalidate_guest_tlb(struct kvm_vcpu *vcpu,
bool user;
 
/* No need to flush for entries which are already invalid */
-   if (!((tlb->tlb_lo[0] | tlb->tlb_lo[1]) & ENTRYLO_V))
+   if (!((tlb->tlb_lo0 | tlb->tlb_lo1) & MIPS3_PG_V))
return;
/* User address space doesn't need flushing for KSeg2/3 changes */
user = tlb->tlb_hi < KVM_GUEST_KSEG0;
diff --git a/drivers/gpu/drm/amd/amdgpu/atombios_dp.c b/drivers/gpu/drm/amd/amdgpu/atombios_dp.c
index 21aacc1f45c1..7f85c2c1d681 100644
--- a/drivers/gpu/drm/amd/amdgpu/atombios_dp.c
+++ b/drivers/gpu/drm/amd/amdgpu/atombios_dp.c
@@ -265,15 +265,27 @@ static int amdgpu_atombios_dp_get_dp_link_config(struct drm_connector *connector
unsigned max_lane_num = drm_dp_max_lane_count(dpcd);
unsigned lane_num, i, max_pix_clock;
 
-   for (lane_num = 1; lane_num <= max_lane_num; lane_num <<= 1) {
-   for (i = 0; i < ARRAY_SIZE(link_rates) && link_rates[i] <= max_link_rate; i++) {
-   max_pix_clock = (lane_num * link_rates[i] * 8) / bpp;
+   if (amdgpu_connector_encoder_get_dp_bridge_encoder_id(connector) ==
+   ENCODER_OBJECT_ID_NUTMEG) {
+   for (lane_num = 1; lane_num <= max_lane_num; lane_num <<= 1) {
+   max_pix_clock = (lane_num * 27 * 8) / bpp;
if (max_pix_clock >= pix_clock) {
*dp_lanes = lane_num;
-   *dp_rate = link_rates[i];
+   *dp_rate = 27;
return 0;
}
}
+   } else {
+   for (i = 0; i < ARRAY_SIZE(link_rates) && link_rates[i] <= max_link_rate; i++) {
+   for (lane_num = 1; lane_num <= max_lane_num; lane_num <<= 1) {
+   max_pix_clock = (lane_num * link_rates[i] * 8) / bpp;
+   if (max_pix_clock >= pix_clock) {
+   *dp_lanes = lane_num;
+   *dp_rate = link_rates[i];
+   return 0;
+   }
+   }
+   }
}
 
return -EINVAL;
diff --git a/drivers/gpu/drm/radeon/atombios_dp.c b/drivers/gpu/drm/radeon/atombios_dp.c
index 44ee72e04df9..b5760851195c 100644
--- a/drivers/gpu/drm/radeon/atombios_dp.c
+++ b/drivers/gpu/drm/radeon/atombios_dp.c
@@ -315,15 +315,27 @@ int radeon_dp_get_dp_link_config(struct drm_connector *connector,
unsigned max_lane_num = drm_dp_max_lane_count(dpcd);
unsigned lane_num, i, max_pix_clock;
 
-   for (lane_num = 1; lane_num <= max_lane_num; lane_num <<= 1) {
-   for (i = 0; i < ARRAY_SIZE(link_rates) && link_rates[i] <= max_link_rate; i++) {
-   max_pix_clock = (lane_num * link_rates[i] * 8) / bpp;
+   if (radeon_connector_encoder_get_dp_bridge_encoder_id(connector) ==
+   ENCODER_OBJECT_ID_NUTMEG) {
+   for (lane_num = 1; lane_num <= max_lane_num; lane_num <<= 1) {
+   max_pix_clock = (lane_num * 27 * 8) / bpp;
if (max_pix_clock >= pix_clock) {
*dp_lanes = lane_num;
-   *dp_rate = link_rates[i];
+   *dp_rate = 27;
return 0;
}
}
+   } else {
+   for (i = 0; i < ARRAY_SIZE(link_rates) && link_rates[i] <= max_link_rate; i++) {
+   for (lane_num = 1; lane_num <= max_lane_num; lane_num <<= 1) {
+   max_pix_clock = (lane_num * link_rates[i] * 8) / bpp;
+   if (max_pix_clock >= pix_clock) {
+   *dp_lanes = lane_num;
+   *dp_rate = link_rates[i];
+   return 0;
+   }
+   }
+   }
}
 
return -EINVAL;
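
Besides the NUTMEG special case, the two atombios_dp.c hunks above
swap the nesting of the search loops: the removed code iterates lane
counts in the outer loop, the added code iterates link rates in the
outer loop. The following user-space sketch shows how that changes
which (lanes, rate) pair is found first. The rate table, its units and
all demo_* names are assumptions made for illustration; only the
max_pix_clock formula and the loop shapes are taken from the diff.

```c
#include <stddef.h>

/* Illustrative rate table; the values and their units are assumptions. */
static const unsigned demo_link_rates[] = { 162000, 270000, 540000 };
#define DEMO_NUM_RATES (sizeof(demo_link_rates) / sizeof(demo_link_rates[0]))

/* Pixel clock a given lane count and link rate can carry at this bpp. */
static unsigned demo_max_pix_clock(unsigned lanes, unsigned rate, unsigned bpp)
{
	return (lanes * rate * 8) / bpp;
}

/* Removed order: lane counts in the outer loop, rates in the inner loop. */
static int demo_pick_lanes_first(unsigned pix_clock, unsigned bpp,
				 unsigned max_lanes, unsigned max_rate,
				 unsigned *lanes, unsigned *rate)
{
	unsigned lane_num;
	size_t i;

	for (lane_num = 1; lane_num <= max_lanes; lane_num <<= 1) {
		for (i = 0; i < DEMO_NUM_RATES &&
			    demo_link_rates[i] <= max_rate; i++) {
			if (demo_max_pix_clock(lane_num, demo_link_rates[i],
					       bpp) >= pix_clock) {
				*lanes = lane_num;
				*rate = demo_link_rates[i];
				return 0;
			}
		}
	}
	return -1;
}

/* Added order: rates in the outer loop, lane counts in the inner loop. */
static int demo_pick_rate_first(unsigned pix_clock, unsigned bpp,
				unsigned max_lanes, unsigned max_rate,
				unsigned *lanes, unsigned *rate)
{
	unsigned lane_num;
	size_t i;

	for (i = 0; i < DEMO_NUM_RATES && demo_link_rates[i] <= max_rate; i++) {
		for (lane_num = 1; lane_num <= max_lanes; lane_num <<= 1) {
			if (demo_max_pix_clock(lane_num, demo_link_rates[i],
					       bpp) >= pix_clock) {
				*lanes = lane_num;
				*rate = demo_link_rates[i];
				return 0;
			}
		}
	}
	return -1;
}
```

For a pixel clock of 148500 at 24 bpp with up to 4 lanes, the
lanes-first search settles on 1 lane at the highest rate in the table,
while the rate-first search settles on 4 lanes at the lowest rate.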
diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
index ca5ac5d6f4e6..49056c33be74 100644
--- a/drivers/net/ethernet/broadcom/tg3.c
+++ b/drivers/net/ethernet/broadcom/tg3.c
@@ -18142,14 +18142,14 @@ static pci_ers_result_t tg3_io_error_detected(struct pci_dev *pdev,
 
rtnl_lock();
 
-   /* We needn't recover from permanent 

Linux 4.8.8

2016-11-14 Thread Greg KH
I'm announcing the release of the 4.8.8 kernel.

All users of the 4.8 kernel series must upgrade.

The updated 4.8.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-4.8.y
and can be browsed at the normal kernel.org git web browser:

http://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary

thanks,

greg k-h



 Makefile   |2 
 arch/powerpc/include/asm/checksum.h|   12 +--
 drivers/infiniband/ulp/ipoib/ipoib.h   |   20 --
 drivers/infiniband/ulp/ipoib/ipoib_cm.c|   15 ++--
 drivers/infiniband/ulp/ipoib/ipoib_ib.c|   12 +--
 drivers/infiniband/ulp/ipoib/ipoib_main.c  |   54 ++--
 drivers/infiniband/ulp/ipoib/ipoib_multicast.c |6 +
 drivers/net/ethernet/freescale/fec_main.c  |   18 ++---
 drivers/net/ethernet/mellanox/mlx4/en_cq.c |   10 ++-
 drivers/net/geneve.c   |2 
 drivers/net/hyperv/netvsc_drv.c|   19 +++--
 drivers/net/macsec.c   |   26 +---
 drivers/net/phy/phy.c  |   22 ++
 drivers/net/vxlan.c|2 
 drivers/ptp/ptp_chardev.c  |1 
 drivers/scsi/megaraid/megaraid_sas.h   |2 
 drivers/scsi/megaraid/megaraid_sas_base.c  |   13 +---
 drivers/usb/dwc3/gadget.c  |7 +-
 include/linux/netdevice.h  |   41 
 include/net/ip.h   |4 -
 include/net/ip6_route.h|1 
 include/uapi/linux/rtnetlink.h |2 
 net/8021q/vlan.c   |2 
 net/bridge/br_multicast.c  |   23 ---
 net/core/dev.c |   80 ++---
 net/core/pktgen.c  |   38 ++-
 net/ethernet/eth.c |2 
 net/ipv4/af_inet.c |2 
 net/ipv4/fou.c |4 -
 net/ipv4/gre_offload.c |2 
 net/ipv4/ip_sockglue.c |   11 +--
 net/ipv4/sysctl_net_ipv4.c |8 +-
 net/ipv4/udp.c |2 
 net/ipv4/udp_offload.c |2 
 net/ipv6/addrconf.c|2 
 net/ipv6/ip6_offload.c |2 
 net/ipv6/ip6_tunnel.c  |3 
 net/ipv6/route.c   |6 +
 net/ipv6/tcp_ipv6.c|   20 +++---
 net/ipv6/udp.c |3 
 net/netlink/af_netlink.c   |7 +-
 net/packet/af_packet.c |   10 +--
 net/sched/act_api.c|   19 +++--
 net/sched/act_vlan.c   |9 ++
 net/sched/cls_api.c|3 
 net/sctp/output.c  |8 ++
 net/sctp/sm_statefuns.c|   12 +--
 net/sctp/socket.c  |5 +
 net/switchdev/switchdev.c  |9 ++
 49 files changed, 368 insertions(+), 217 deletions(-)

Andrew Collins (1):
  net: Add netdev all_adj_list refcnt propagation to fix panic

Andrew Lunn (1):
  net: phy: Trigger state machine on state change and not polling.

Anoob Soman (1):
  packet: call fanout_release, while UNREGISTERING a netdev

Brenden Blanco (1):
  net/mlx4_en: fixup xdp tx irq to match rx

David Ahern (1):
  net: ipv6: Do not consider link state for nexthop validation

Eli Cooper (1):
  ip6_tunnel: Update skb->protocol to ETH_P_IPV6 in ip6_tnl_xmit()

Eric Dumazet (5):
  netlink: do not enter direct reclaim from netlink_dump()
  ipv6: tcp: restore IP6CB for pktoptions skbs
  net: pktgen: remove rcu locking in pktgen_change_name()
  ipv4: disable BH in set_ping_group_range()
  udp: fix IP_CHECKSUM handling

Fabio Estevam (1):
  net: fec: Call swap_buffer() prior to IP header alignment

Felipe Balbi (1):
  usb: dwc3: gadget: properly account queued requests

Gavin Schenk (1):
  net: fec: set mac address unconditionally

Greg Kroah-Hartman (1):
  Linux 4.8.8

Ido Schimmel (2):
  switchdev: Execute bridge ndos only for bridge ports
  net: core: Correctly iterate over lower adjacency list

Ivan Vecera (1):
  arch/powerpc: Update parameters for csum_tcpudp_magic & csum_tcpudp_nofold

Jamal Hadi Salim (1):
  net sched filters: fix notification of filter delete with proper handle

Jiri Pirko (1):
  rtnetlink: Add rtnexthop offload flag to compare mask

Jiri Slaby (1):
  net: sctp, forbid negative length

Kashyap Desai (1):
  scsi: megaraid_sas: Fix data integrity failure for JBOD (passthrough) devices


Linux 4.4.32

2016-11-14 Thread Greg KH
I'm announcing the release of the 4.4.32 kernel.

All users of the 4.4 kernel series must upgrade.

The updated 4.4.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-4.4.y
and can be browsed at the normal kernel.org git web browser:

http://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary

thanks,

greg k-h



 Makefile  |2 
 arch/mips/kvm/emulate.c   |2 
 drivers/gpu/drm/amd/amdgpu/atombios_dp.c  |   20 ++--
 drivers/gpu/drm/radeon/atombios_dp.c  |   20 ++--
 drivers/net/ethernet/broadcom/tg3.c   |   10 ++--
 drivers/net/ethernet/freescale/fec_main.c |   10 ++--
 drivers/net/geneve.c  |2 
 drivers/net/vxlan.c   |2 
 drivers/of/of_reserved_mem.c  |8 ++-
 drivers/scsi/megaraid/megaraid_sas.h  |2 
 include/linux/mroute.h|2 
 include/linux/mroute6.h   |2 
 include/linux/netdevice.h |   40 -
 include/net/ip.h  |4 -
 include/net/sch_generic.h |9 +++
 include/net/sock.h|   10 
 include/uapi/linux/rtnetlink.h|2 
 net/8021q/vlan.c  |2 
 net/bridge/br_multicast.c |   23 ++---
 net/core/dev.c|   70 --
 net/core/pktgen.c |   38 
 net/ethernet/eth.c|2 
 net/ipv4/af_inet.c|2 
 net/ipv4/fou.c|4 -
 net/ipv4/gre_offload.c|2 
 net/ipv4/ip_sockglue.c|   10 ++--
 net/ipv4/ipmr.c   |3 -
 net/ipv4/route.c  |3 -
 net/ipv4/sysctl_net_ipv4.c|8 +--
 net/ipv4/tcp_input.c  |3 -
 net/ipv4/tcp_output.c |   15 +++---
 net/ipv4/udp.c|2 
 net/ipv4/udp_offload.c|4 -
 net/ipv6/addrconf.c   |2 
 net/ipv6/ip6_gre.c|1 
 net/ipv6/ip6_offload.c|2 
 net/ipv6/ip6_tunnel.c |2 
 net/ipv6/ip6mr.c  |5 +-
 net/ipv6/route.c  |4 +
 net/ipv6/tcp_ipv6.c   |   20 
 net/ipv6/udp.c|3 -
 net/netlink/af_netlink.c  |9 +--
 net/packet/af_packet.c|   10 ++--
 net/sched/act_vlan.c  |9 +++
 net/sched/cls_api.c   |3 -
 net/sctp/sm_statefuns.c   |   12 ++---
 net/sctp/socket.c |5 +-
 47 files changed, 275 insertions(+), 150 deletions(-)

Alex Deucher (4):
  drm/amdgpu/dp: add back special handling for NUTMEG
  drm/amdgpu: fix DP mode validation
  drm/radeon/dp: add back special handling for NUTMEG
  drm/radeon: fix DP mode validation

Andrew Collins (1):
  net: Add netdev all_adj_list refcnt propagation to fix panic

Anoob Soman (1):
  packet: call fanout_release, while UNREGISTERING a netdev

Douglas Caetano dos Santos (1):
  tcp: fix wrong checksum calculation on MTU probing

Eric Dumazet (8):
  tcp: fix overflow in __tcp_retransmit_skb()
  net: avoid sk_forward_alloc overflows
  tcp: fix a compile error in DBGUNDO()
  netlink: do not enter direct reclaim from netlink_dump()
  ipv6: tcp: restore IP6CB for pktoptions skbs
  net: pktgen: remove rcu locking in pktgen_change_name()
  ipv4: disable BH in set_ping_group_range()
  udp: fix IP_CHECKSUM handling

Gavin Schenk (1):
  net: fec: set mac address unconditionally

Greg Kroah-Hartman (2):
  Revert KVM: MIPS: Drop other CPU ASIDs on guest MMU changes
  Linux 4.4.32

Jamal Hadi Salim (1):
  net sched filters: fix notification of filter delete with proper handle

James Hogan (1):
  KVM: MIPS: Drop other CPU ASIDs on guest MMU changes

Jiri Pirko (1):
  rtnetlink: Add rtnexthop offload flag to compare mask

Jiri Slaby (1):
  net: sctp, forbid negative length

Lance Richardson (1):
  ip6_gre: fix flowi6_proto value in ip6gre_xmit_other()

Marcelo Ricardo Leitner (1):
  sctp: validate chunk len before actually using it

Milton Miller (1):
  tg3: Avoid NULL pointer dereference in tg3_io_error_detected()

Nicolas Dichtel (1):
  ipv6: correctly add local routes when lo goes up

Nikolay Aleksandrov (2):
  ipmr, ip6mr: fix scheduling while atomic and a deadlock with ipmr_get_route
  bridge: multicast: restore perm router ports on multicast enable

Paolo Abeni (1):
  net: pktgen: fix pkt_size

Sabrina Dubroca (1):
 

Re: Linux 4.8.8

2016-11-14 Thread Greg KH
diff --git a/Makefile b/Makefile
index 4d0f28cb481d..8f18daa2c76a 100644
--- a/Makefile
+++ b/Makefile
@@ -1,6 +1,6 @@
 VERSION = 4
 PATCHLEVEL = 8
-SUBLEVEL = 7
+SUBLEVEL = 8
 EXTRAVERSION =
 NAME = Psychotic Stoned Sheep
 
diff --git a/arch/powerpc/include/asm/checksum.h b/arch/powerpc/include/asm/checksum.h
index ee655ed1ff1b..1e8fceb308a5 100644
--- a/arch/powerpc/include/asm/checksum.h
+++ b/arch/powerpc/include/asm/checksum.h
@@ -53,10 +53,8 @@ static inline __sum16 csum_fold(__wsum sum)
return (__force __sum16)(~((__force u32)sum + tmp) >> 16);
 }
 
-static inline __wsum csum_tcpudp_nofold(__be32 saddr, __be32 daddr,
- unsigned short len,
- unsigned short proto,
- __wsum sum)
+static inline __wsum csum_tcpudp_nofold(__be32 saddr, __be32 daddr, __u32 len,
+   __u8 proto, __wsum sum)
 {
 #ifdef __powerpc64__
unsigned long s = (__force u32)sum;
@@ -83,10 +81,8 @@ static inline __wsum csum_tcpudp_nofold(__be32 saddr, __be32 daddr,
  * computes the checksum of the TCP/UDP pseudo-header
  * returns a 16-bit checksum, already complemented
  */
-static inline __sum16 csum_tcpudp_magic(__be32 saddr, __be32 daddr,
-   unsigned short len,
-   unsigned short proto,
-   __wsum sum)
+static inline __sum16 csum_tcpudp_magic(__be32 saddr, __be32 daddr, __u32 len,
+   __u8 proto, __wsum sum)
 {
return csum_fold(csum_tcpudp_nofold(saddr, daddr, len, proto, sum));
 }
diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h
index 9dbfcc0ab577..5ff64afd69f9 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -63,6 +63,8 @@ enum ipoib_flush_level {
 
 enum {
IPOIB_ENCAP_LEN   = 4,
+   IPOIB_PSEUDO_LEN  = 20,
+   IPOIB_HARD_LEN= IPOIB_ENCAP_LEN + IPOIB_PSEUDO_LEN,
 
IPOIB_UD_HEAD_SIZE= IB_GRH_BYTES + IPOIB_ENCAP_LEN,
IPOIB_UD_RX_SG= 2, /* max buffer needed for 4K mtu */
@@ -134,15 +136,21 @@ struct ipoib_header {
u16 reserved;
 };
 
-struct ipoib_cb {
-   struct qdisc_skb_cb qdisc_cb;
-   u8  hwaddr[INFINIBAND_ALEN];
+struct ipoib_pseudo_header {
+   u8  hwaddr[INFINIBAND_ALEN];
 };
 
-static inline struct ipoib_cb *ipoib_skb_cb(const struct sk_buff *skb)
+static inline void skb_add_pseudo_hdr(struct sk_buff *skb)
 {
-   BUILD_BUG_ON(sizeof(skb->cb) < sizeof(struct ipoib_cb));
-   return (struct ipoib_cb *)skb->cb;
+   char *data = skb_push(skb, IPOIB_PSEUDO_LEN);
+
+   /*
+* only the ipoib header is present now, make room for a dummy
+* pseudo header and set skb field accordingly
+*/
+   memset(data, 0, IPOIB_PSEUDO_LEN);
+   skb_reset_mac_header(skb);
+   skb_pull(skb, IPOIB_HARD_LEN);
 }
 
 /* Used for all multicast joins (broadcast, IPv4 mcast and IPv6 mcast) */
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
index 4ad297d3de89..339a1eecdfe3 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
@@ -63,6 +63,8 @@ MODULE_PARM_DESC(cm_data_debug_level,
 #define IPOIB_CM_RX_DELAY   (3 * 256 * HZ)
 #define IPOIB_CM_RX_UPDATE_MASK (0x3)
 
+#define IPOIB_CM_RX_RESERVE (ALIGN(IPOIB_HARD_LEN, 16) - IPOIB_ENCAP_LEN)
+
 static struct ib_qp_attr ipoib_cm_err_attr = {
.qp_state = IB_QPS_ERR
 };
@@ -146,15 +148,15 @@ static struct sk_buff *ipoib_cm_alloc_rx_skb(struct net_device *dev,
struct sk_buff *skb;
int i;
 
-   skb = dev_alloc_skb(IPOIB_CM_HEAD_SIZE + 12);
+   skb = dev_alloc_skb(ALIGN(IPOIB_CM_HEAD_SIZE + IPOIB_PSEUDO_LEN, 16));
if (unlikely(!skb))
return NULL;
 
/*
-* IPoIB adds a 4 byte header. So we need 12 more bytes to align the
+* IPoIB adds a IPOIB_ENCAP_LEN byte header, this will align the
 * IP header to a multiple of 16.
 */
-   skb_reserve(skb, 12);
+   skb_reserve(skb, IPOIB_CM_RX_RESERVE);
 
mapping[0] = ib_dma_map_single(priv->ca, skb->data, IPOIB_CM_HEAD_SIZE,
   DMA_FROM_DEVICE);
@@ -624,9 +626,9 @@ void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc)
if (wc->byte_len < IPOIB_CM_COPYBREAK) {
int dlen = wc->byte_len;
 
-   small_skb = dev_alloc_skb(dlen + 12);
+   small_skb = dev_alloc_skb(dlen + IPOIB_CM_RX_RESERVE);
if (small_skb) {
-   skb_reserve(small_skb, 12);
+   skb_reserve(small_skb, IPOIB_CM_RX_RESERVE);
ib_dma_sync_single_for_cpu(priv->ca, 
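
The reserve arithmetic in the ipoib_cm.c hunks above can be checked in
isolation. This is a minimal user-space sketch: DEMO_ALIGN is a local
power-of-two round-up assumed to match the kernel's ALIGN() for these
inputs, the DEMO_* names are hypothetical mirrors of the constants in
the diff, and offsets are taken relative to a buffer start that is
assumed to be 16-byte aligned.

```c
/* Local round-up to a power-of-two boundary (assumed ALIGN() equivalent). */
#define DEMO_ALIGN(x, a) (((x) + (a) - 1) & ~((a) - 1))

enum {
	DEMO_ENCAP_LEN  = 4,	/* mirrors IPOIB_ENCAP_LEN */
	DEMO_PSEUDO_LEN = 20,	/* mirrors IPOIB_PSEUDO_LEN */
	DEMO_HARD_LEN   = DEMO_ENCAP_LEN + DEMO_PSEUDO_LEN,
	/* mirrors IPOIB_CM_RX_RESERVE */
	DEMO_RX_RESERVE = DEMO_ALIGN(DEMO_HARD_LEN, 16) - DEMO_ENCAP_LEN,
};

/*
 * Offset of the IP header from the start of the buffer: the reserved
 * headroom, followed by the IPoIB encapsulation header.
 */
static unsigned demo_ip_header_offset(unsigned reserve)
{
	return reserve + DEMO_ENCAP_LEN;
}
```

With these values the reserve works out to 28 bytes, so the IP header
lands at offset 32. The old fixed reserve of 12 put it at offset 16.
Both are multiples of 16, but only the new reserve leaves enough
headroom to push the 20-byte pseudo header in front of the IPoIB
header.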

Re: Linux 4.8.8

2016-11-14 Thread Greg KH
diff --git a/Makefile b/Makefile
index 4d0f28cb481d..8f18daa2c76a 100644
--- a/Makefile
+++ b/Makefile
@@ -1,6 +1,6 @@
 VERSION = 4
 PATCHLEVEL = 8
-SUBLEVEL = 7
+SUBLEVEL = 8
 EXTRAVERSION =
 NAME = Psychotic Stoned Sheep
 
diff --git a/arch/powerpc/include/asm/checksum.h 
b/arch/powerpc/include/asm/checksum.h
index ee655ed1ff1b..1e8fceb308a5 100644
--- a/arch/powerpc/include/asm/checksum.h
+++ b/arch/powerpc/include/asm/checksum.h
@@ -53,10 +53,8 @@ static inline __sum16 csum_fold(__wsum sum)
return (__force __sum16)(~((__force u32)sum + tmp) >> 16);
 }
 
-static inline __wsum csum_tcpudp_nofold(__be32 saddr, __be32 daddr,
- unsigned short len,
- unsigned short proto,
- __wsum sum)
+static inline __wsum csum_tcpudp_nofold(__be32 saddr, __be32 daddr, __u32 len,
+   __u8 proto, __wsum sum)
 {
 #ifdef __powerpc64__
unsigned long s = (__force u32)sum;
@@ -83,10 +81,8 @@ static inline __wsum csum_tcpudp_nofold(__be32 saddr, __be32 
daddr,
  * computes the checksum of the TCP/UDP pseudo-header
  * returns a 16-bit checksum, already complemented
  */
-static inline __sum16 csum_tcpudp_magic(__be32 saddr, __be32 daddr,
-   unsigned short len,
-   unsigned short proto,
-   __wsum sum)
+static inline __sum16 csum_tcpudp_magic(__be32 saddr, __be32 daddr, __u32 len,
+   __u8 proto, __wsum sum)
 {
return csum_fold(csum_tcpudp_nofold(saddr, daddr, len, proto, sum));
 }
diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h 
b/drivers/infiniband/ulp/ipoib/ipoib.h
index 9dbfcc0ab577..5ff64afd69f9 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -63,6 +63,8 @@ enum ipoib_flush_level {
 
 enum {
IPOIB_ENCAP_LEN   = 4,
+   IPOIB_PSEUDO_LEN  = 20,
+   IPOIB_HARD_LEN= IPOIB_ENCAP_LEN + IPOIB_PSEUDO_LEN,
 
IPOIB_UD_HEAD_SIZE= IB_GRH_BYTES + IPOIB_ENCAP_LEN,
IPOIB_UD_RX_SG= 2, /* max buffer needed for 4K mtu */
@@ -134,15 +136,21 @@ struct ipoib_header {
u16 reserved;
 };
 
-struct ipoib_cb {
-   struct qdisc_skb_cb qdisc_cb;
-   u8  hwaddr[INFINIBAND_ALEN];
+struct ipoib_pseudo_header {
+   u8  hwaddr[INFINIBAND_ALEN];
 };
 
-static inline struct ipoib_cb *ipoib_skb_cb(const struct sk_buff *skb)
+static inline void skb_add_pseudo_hdr(struct sk_buff *skb)
 {
-   BUILD_BUG_ON(sizeof(skb->cb) < sizeof(struct ipoib_cb));
-   return (struct ipoib_cb *)skb->cb;
+   char *data = skb_push(skb, IPOIB_PSEUDO_LEN);
+
+   /*
+* only the ipoib header is present now, make room for a dummy
+* pseudo header and set skb field accordingly
+*/
+   memset(data, 0, IPOIB_PSEUDO_LEN);
+   skb_reset_mac_header(skb);
+   skb_pull(skb, IPOIB_HARD_LEN);
 }
 
 /* Used for all multicast joins (broadcast, IPv4 mcast and IPv6 mcast) */
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c 
b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
index 4ad297d3de89..339a1eecdfe3 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
@@ -63,6 +63,8 @@ MODULE_PARM_DESC(cm_data_debug_level,
 #define IPOIB_CM_RX_DELAY   (3 * 256 * HZ)
 #define IPOIB_CM_RX_UPDATE_MASK (0x3)
 
+#define IPOIB_CM_RX_RESERVE (ALIGN(IPOIB_HARD_LEN, 16) - IPOIB_ENCAP_LEN)
+
 static struct ib_qp_attr ipoib_cm_err_attr = {
.qp_state = IB_QPS_ERR
 };
@@ -146,15 +148,15 @@ static struct sk_buff *ipoib_cm_alloc_rx_skb(struct 
net_device *dev,
struct sk_buff *skb;
int i;
 
-   skb = dev_alloc_skb(IPOIB_CM_HEAD_SIZE + 12);
+   skb = dev_alloc_skb(ALIGN(IPOIB_CM_HEAD_SIZE + IPOIB_PSEUDO_LEN, 16));
if (unlikely(!skb))
return NULL;
 
/*
-* IPoIB adds a 4 byte header. So we need 12 more bytes to align the
+* IPoIB adds a IPOIB_ENCAP_LEN byte header, this will align the
 * IP header to a multiple of 16.
 */
-   skb_reserve(skb, 12);
+   skb_reserve(skb, IPOIB_CM_RX_RESERVE);
 
mapping[0] = ib_dma_map_single(priv->ca, skb->data, IPOIB_CM_HEAD_SIZE,
   DMA_FROM_DEVICE);
@@ -624,9 +626,9 @@ void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc)
if (wc->byte_len < IPOIB_CM_COPYBREAK) {
int dlen = wc->byte_len;
 
-   small_skb = dev_alloc_skb(dlen + 12);
+   small_skb = dev_alloc_skb(dlen + IPOIB_CM_RX_RESERVE);
if (small_skb) {
-   skb_reserve(small_skb, 12);
+   skb_reserve(small_skb, IPOIB_CM_RX_RESERVE);
ib_dma_sync_single_for_cpu(priv->ca, 

Re: [PATCHSET 0/7] perf sched: Introduce timehist command, again (v1)

2016-11-14 Thread Ingo Molnar

* Namhyung Kim  wrote:

> > > By default it shows the individual schedule events, including the time
> > > between sched-in events for the task, the task scheduling delay (time
> > > between wakeup and actually running) and run time for the task:
> > >
> > >            time  cpu   task name[tid/pid]   b/n time  sch delay  run time
> > >   -------------  ----  ------------------  ---------  ---------  --------
> > >    79371.874569  [11]  gcc[31949]              0.014      0.000     1.148
> > >    79371.874591  [10]  gcc[31951]              0.000      0.000     0.024
> > >    79371.874603  [10]  migration/10[59]        3.350      0.004     0.011
> > >    79371.874604  [11]                          1.148      0.000     0.035
> > >    79371.874723  [05]                          0.016      0.000     1.383
> > >    79371.874746  [05]  gcc[31949]              0.153      0.078     0.022
> > > ...
> >
> > What does the 'b/n' abbreviation stand for? 'Between'? Could we call the
> > column 'sch wait' instead, or so?
> 
> Looks better, or what about 'wait time'?

Works for me!

> I'd go with the first option - simply adding arrows.  It's good enough to 
> identify each function IMHO.

Ok!

Thanks,

Ingo


Re: [RFC][PATCH 2/7] kref: Add kref_read()

2016-11-14 Thread Greg KH
On Mon, Nov 14, 2016 at 06:39:48PM +0100, Peter Zijlstra wrote:
> Since we need to change the implementation, stop exposing internals.
> 
> Provide kref_read() to read the current reference count; typically
> used for debug messages.
> 
> Kills two anti-patterns:
> 
>   atomic_read(&kref->refcount)
>   kref->refcount.counter
> 
> Signed-off-by: Peter Zijlstra (Intel) 
> ---
>  drivers/block/drbd/drbd_req.c|2 -
>  drivers/block/rbd.c  |8 ++---
>  drivers/block/virtio_blk.c   |2 -
>  drivers/gpu/drm/drm_gem_cma_helper.c |2 -
>  drivers/gpu/drm/drm_info.c   |2 -
>  drivers/gpu/drm/drm_mode_object.c|4 +-
>  drivers/gpu/drm/etnaviv/etnaviv_gem.c|2 -
>  drivers/gpu/drm/msm/msm_gem.c|2 -
>  drivers/gpu/drm/nouveau/nouveau_fence.c  |2 -
>  drivers/gpu/drm/omapdrm/omap_gem.c   |2 -
>  drivers/gpu/drm/ttm/ttm_bo.c |4 +-
>  drivers/gpu/drm/ttm/ttm_object.c |2 -
>  drivers/infiniband/hw/cxgb3/iwch_cm.h|6 ++--
>  drivers/infiniband/hw/cxgb3/iwch_qp.c|2 -
>  drivers/infiniband/hw/cxgb4/iw_cxgb4.h   |6 ++--
>  drivers/infiniband/hw/cxgb4/qp.c |2 -
>  drivers/infiniband/hw/usnic/usnic_ib_sysfs.c |6 ++--
>  drivers/infiniband/hw/usnic/usnic_ib_verbs.c |4 +-
>  drivers/misc/genwqe/card_dev.c   |2 -
>  drivers/misc/mei/debugfs.c   |2 -
>  drivers/pci/hotplug/pnv_php.c|2 -
>  drivers/pci/slot.c   |2 -
>  drivers/scsi/bnx2fc/bnx2fc_io.c  |8 ++---
>  drivers/scsi/cxgbi/libcxgbi.h|4 +-
>  drivers/scsi/lpfc/lpfc_debugfs.c |2 -
>  drivers/scsi/lpfc/lpfc_els.c |2 -
>  drivers/scsi/lpfc/lpfc_hbadisc.c |   40 +--
>  drivers/scsi/lpfc/lpfc_init.c|3 --
>  drivers/scsi/qla2xxx/tcm_qla2xxx.c   |4 +-
>  drivers/staging/android/ion/ion.c|2 -
>  drivers/staging/comedi/comedi_buf.c  |2 -
>  drivers/target/target_core_pr.c  |   10 +++---
>  drivers/target/tcm_fc/tfc_sess.c |2 -
>  drivers/usb/gadget/function/f_fs.c   |2 -
>  fs/exofs/sys.c   |2 -
>  fs/ocfs2/cluster/netdebug.c  |2 -
>  fs/ocfs2/cluster/tcp.c   |2 -
>  fs/ocfs2/dlm/dlmdebug.c  |   12 
>  fs/ocfs2/dlm/dlmdomain.c |2 -
>  fs/ocfs2/dlm/dlmmaster.c |8 ++---
>  fs/ocfs2/dlm/dlmunlock.c |2 -
>  include/drm/drm_framebuffer.h|2 -
>  include/drm/ttm/ttm_bo_driver.h  |4 +-
>  include/linux/kref.h |5 +++
>  include/linux/sunrpc/cache.h |2 -
>  include/net/bluetooth/hci_core.h |4 +-
>  net/bluetooth/6lowpan.c  |2 -
>  net/bluetooth/a2mp.c |4 +-
>  net/bluetooth/amp.c  |4 +-
>  net/bluetooth/l2cap_core.c   |4 +-
>  net/ceph/messenger.c |4 +-
>  net/ceph/osd_client.c|   10 +++---
>  net/sunrpc/cache.c   |2 -
>  net/sunrpc/svc_xprt.c|6 ++--
>  net/sunrpc/xprtrdma/svc_rdma_transport.c |4 +-
>  55 files changed, 120 insertions(+), 116 deletions(-)
> 
> --- a/drivers/block/drbd/drbd_req.c
> +++ b/drivers/block/drbd/drbd_req.c
> @@ -520,7 +520,7 @@ static void mod_rq_state(struct drbd_req
>   /* Completion does it's own kref_put.  If we are going to
>* kref_sub below, we need req to be still around then. */
>   int at_least = k_put + !!c_put;
> - int refcount = atomic_read(&req->kref.refcount);
> + int refcount = kref_read(&req->kref);
>   if (refcount < at_least)
>   drbd_err(device,
>   "mod_rq_state: Logic BUG: %x -> %x: refcount = %d, should be >= %d\n",

As proof of "things you should never do", here is one such example.

ugh.


> --- a/drivers/block/virtio_blk.c
> +++ b/drivers/block/virtio_blk.c
> @@ -767,7 +767,7 @@ static void virtblk_remove(struct virtio
>   /* Stop all the virtqueues. */
>   vdev->config->reset(vdev);
>  
> - refc = atomic_read(&disk_to_dev(vblk->disk)->kobj.kref.refcount);
> + refc = kref_read(&disk_to_dev(vblk->disk)->kobj.kref);
>   put_disk(vblk->disk);
>   vdev->config->del_vqs(vdev);
>   kfree(vblk->vqs);

And this too, ugh, that's a huge abuse and is probably totally wrong...

thanks again for digging through this crap.  I wonder if we need to name
the kref reference variable 

[PATCH] reset: hisilicon: add a polarity cell for reset line specifier

2016-11-14 Thread Jiancheng Xue
Add a polarity cell for reset line specifier. If the reset line
is asserted when the register bit is 1, the polarity is
normal. Otherwise, it is inverted.

Signed-off-by: Jiancheng Xue 
---
 .../devicetree/bindings/clock/hisi-crg.txt | 11 ---
 arch/arm/boot/dts/hi3519.dtsi  |  2 +-
 drivers/clk/hisilicon/reset.c  | 36 --
 3 files changed, 33 insertions(+), 16 deletions(-)

diff --git a/Documentation/devicetree/bindings/clock/hisi-crg.txt b/Documentation/devicetree/bindings/clock/hisi-crg.txt
index e3919b6..fcbb4f3 100644
--- a/Documentation/devicetree/bindings/clock/hisi-crg.txt
+++ b/Documentation/devicetree/bindings/clock/hisi-crg.txt
@@ -25,19 +25,20 @@ to specify the clock which they consume.
 
 All these identifier could be found in .
 
-- #reset-cells: should be 2.
+- #reset-cells: should be 3.
 
 A reset signal can be controlled by writing a bit register in the CRG module.
-The reset specifier consists of two cells. The first cell represents the
+The reset specifier consists of three cells. The first cell represents the
 register offset relative to the base address. The second cell represents the
-bit index in the register.
+bit index in the register. The third cell represents the polarity of the reset
+line (0 for normal, 1 for inverted).
 
 Example: CRG nodes
 CRG: clock-reset-controller@1201 {
compatible = "hisilicon,hi3519-crg";
reg = <0x1201 0x1>;
#clock-cells = <1>;
-   #reset-cells = <2>;
+   #reset-cells = <3>;
 };
 
 Example: consumer nodes
@@ -45,5 +46,5 @@ i2c0: i2c@1211 {
compatible = "hisilicon,hi3519-i2c";
reg = <0x1211 0x1000>;
clocks = < HI3519_I2C0_RST>;
-   resets = < 0xe4 0>;
+   resets = < 0xe4 0 0>;
 };
diff --git a/arch/arm/boot/dts/hi3519.dtsi b/arch/arm/boot/dts/hi3519.dtsi
index 5729ecf..b7cb182 100644
--- a/arch/arm/boot/dts/hi3519.dtsi
+++ b/arch/arm/boot/dts/hi3519.dtsi
@@ -50,7 +50,7 @@
crg: clock-reset-controller@1201 {
compatible = "hisilicon,hi3519-crg";
#clock-cells = <1>;
-   #reset-cells = <2>;
+   #reset-cells = <3>;
reg = <0x1201 0x1>;
};
 
diff --git a/drivers/clk/hisilicon/reset.c b/drivers/clk/hisilicon/reset.c
index 2a5015c..c0ab0b6 100644
--- a/drivers/clk/hisilicon/reset.c
+++ b/drivers/clk/hisilicon/reset.c
@@ -17,6 +17,7 @@
  * along with this program. If not, see .
  */
 
+#include 
 #include 
 #include 
 #include 
@@ -25,9 +26,11 @@
 #include 
 #include "reset.h"
 
-#defineHISI_RESET_BIT_MASK 0x1f
-#defineHISI_RESET_OFFSET_SHIFT 8
-#defineHISI_RESET_OFFSET_MASK  0x00
+#define HISI_RESET_POLARITY_MASK   BIT(0)
+#define HISI_RESET_BIT_SHIFT   1
+#define HISI_RESET_BIT_MASKGENMASK(6, 1)
+#define HISI_RESET_OFFSET_SHIFT8
+#define HISI_RESET_OFFSET_MASK GENMASK(23, 8)
 
 struct hisi_reset_controller {
spinlock_t  lock;
@@ -44,12 +47,15 @@ static int hisi_reset_of_xlate(struct reset_controller_dev *rcdev,
 {
u32 offset;
u8 bit;
+   bool polarity;
 
offset = (reset_spec->args[0] << HISI_RESET_OFFSET_SHIFT)
& HISI_RESET_OFFSET_MASK;
-   bit = reset_spec->args[1] & HISI_RESET_BIT_MASK;
+   bit = (reset_spec->args[1] << HISI_RESET_BIT_SHIFT)
+   & HISI_RESET_BIT_MASK;
+   polarity = reset_spec->args[2] & HISI_RESET_POLARITY_MASK;
 
-   return (offset | bit);
+   return (offset | bit | polarity);
 }
 
 static int hisi_reset_assert(struct reset_controller_dev *rcdev,
@@ -59,14 +65,19 @@ static int hisi_reset_assert(struct reset_controller_dev *rcdev,
unsigned long flags;
u32 offset, reg;
u8 bit;
+   bool polarity;
 
offset = (id & HISI_RESET_OFFSET_MASK) >> HISI_RESET_OFFSET_SHIFT;
-   bit = id & HISI_RESET_BIT_MASK;
+   bit = (id & HISI_RESET_BIT_MASK) >> HISI_RESET_BIT_SHIFT;
+   polarity = id & HISI_RESET_POLARITY_MASK;
 
spin_lock_irqsave(&rstc->lock, flags);
 
reg = readl(rstc->membase + offset);
-   writel(reg | BIT(bit), rstc->membase + offset);
+   if (polarity)
+   writel(reg & ~BIT(bit), rstc->membase + offset);
+   else
+   writel(reg | BIT(bit), rstc->membase + offset);
 
spin_unlock_irqrestore(&rstc->lock, flags);
 
@@ -80,14 +91,19 @@ static int hisi_reset_deassert(struct reset_controller_dev *rcdev,
unsigned long flags;
u32 offset, reg;
u8 bit;
+   bool polarity;
 
offset = (id & HISI_RESET_OFFSET_MASK) >> HISI_RESET_OFFSET_SHIFT;
-   bit = id & HISI_RESET_BIT_MASK;
+   bit = (id & HISI_RESET_BIT_MASK) >> HISI_RESET_BIT_SHIFT;
+   polarity = id & HISI_RESET_POLARITY_MASK;
 
spin_lock_irqsave(&rstc->lock, flags);
 
reg = readl(rstc->membase + offset);
-  
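
The specifier encoding used by the patch above can be sketched in plain C as follows. This is only an illustration under stated assumptions: the field layout (bit 0 polarity, bits 6:1 bit index, bits 23:8 register offset) is taken from the new #defines in the diff, while the helper names reset_id_pack(), reset_id_offset(), reset_id_bit(), reset_id_inverted() and reset_apply() are hypothetical and do not exist in the driver. The deassert path is cut off in the archived copy, so the sketch assumes it mirrors the assert path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Field layout copied from the patch's #defines. */
#define RESET_POLARITY_MASK  0x00000001u   /* BIT(0) */
#define RESET_BIT_SHIFT      1
#define RESET_BIT_MASK       0x0000007eu   /* GENMASK(6, 1) */
#define RESET_OFFSET_SHIFT   8
#define RESET_OFFSET_MASK    0x00ffff00u   /* GENMASK(23, 8) */

/* Hypothetical helper: pack the three specifier cells into one id. */
static uint32_t reset_id_pack(uint32_t offset, uint32_t bit, uint32_t inverted)
{
	return ((offset << RESET_OFFSET_SHIFT) & RESET_OFFSET_MASK) |
	       ((bit << RESET_BIT_SHIFT) & RESET_BIT_MASK) |
	       (inverted & RESET_POLARITY_MASK);
}

static uint32_t reset_id_offset(uint32_t id)
{
	return (id & RESET_OFFSET_MASK) >> RESET_OFFSET_SHIFT;
}

static uint32_t reset_id_bit(uint32_t id)
{
	return (id & RESET_BIT_MASK) >> RESET_BIT_SHIFT;
}

static bool reset_id_inverted(uint32_t id)
{
	return (id & RESET_POLARITY_MASK) != 0;
}

/*
 * Hypothetical helper: compute the new register value.
 * Normal polarity: asserting sets the bit, deasserting clears it.
 * Inverted polarity: asserting clears the bit, deasserting sets it.
 */
static uint32_t reset_apply(uint32_t reg, uint32_t id, bool assert_line)
{
	uint32_t bit = reset_id_bit(id);
	uint32_t mask;

	assert(bit < 32);	/* the register is 32 bits wide */
	mask = UINT32_C(1) << bit;

	return (assert_line != reset_id_inverted(id)) ? (reg | mask)
						      : (reg & ~mask);
}
```

With the consumer example from the binding, the cells 0xe4 0 0 pack to 0xe400: asserting sets bit 0 of the register at offset 0xe4, and deasserting clears it.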

Re: [RFC][PATCH 0/7] kref improvements

2016-11-14 Thread Greg KH
On Mon, Nov 14, 2016 at 06:39:46PM +0100, Peter Zijlstra wrote:
> This series unfscks kref and then implements it in terms of refcount_t.
> 
> x86_64-allyesconfig compile tested and boot tested with my regular config.
> 
> refcount_t is as per the previous thread, it BUGs on over-/underflow and
> saturates at UINT_MAX, such that if we ever overflow, we'll never free again.
> 
> 

Thanks so much for doing these, at the very least, I want to take the
kref-abuse-fixes now as those users shouldn't be doing those foolish
things.  Any objection for me taking some of them through my tree now?

thanks,

greg k-h
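
For readers who have not followed the earlier thread, the saturating behaviour described in the quoted cover letter can be sketched in userspace C11. This is a minimal sketch and not the kernel's refcount_t: the type sat_refcount and the functions below are hypothetical names, an assert() stands in for the kernel's BUG on underflow, and the increment simply pins the counter at UINT_MAX instead of reporting the overflow.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a saturating reference count. */
struct sat_refcount {
	atomic_uint refs;
};

static void sat_refcount_set(struct sat_refcount *r, unsigned int n)
{
	atomic_store(&r->refs, n);
}

/* Increment, but never move past UINT_MAX. */
static void sat_refcount_inc(struct sat_refcount *r)
{
	unsigned int old = atomic_load(&r->refs);

	do {
		if (old == UINT_MAX)
			return;		/* saturated: stay pinned */
	} while (!atomic_compare_exchange_weak(&r->refs, &old, old + 1));
}

/*
 * Decrement; return true when the count reaches zero and the object
 * may be freed. A saturated counter is never decremented, so the
 * object is leaked rather than freed too early.
 */
static bool sat_refcount_dec_and_test(struct sat_refcount *r)
{
	unsigned int old = atomic_load(&r->refs);

	do {
		if (old == UINT_MAX)
			return false;
		assert(old != 0);	/* underflow: the kernel would BUG here */
	} while (!atomic_compare_exchange_weak(&r->refs, &old, old - 1));

	return old == 1;
}
```

The point of saturating is that an overflowed counter turns a potential use-after-free into a memory leak, which is the trade-off the cover letter describes as "we'll never free again".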


Re: [RFC][PATCH 2/7] kref: Add kref_read()

2016-11-14 Thread Greg KH
On Mon, Nov 14, 2016 at 10:16:55AM -0800, Christoph Hellwig wrote:
> On Mon, Nov 14, 2016 at 06:39:48PM +0100, Peter Zijlstra wrote:
> > Since we need to change the implementation, stop exposing internals.
> > 
> > Provide kref_read() to read the current reference count; typically
> > used for debug messages.
> 
> Can we just provide a printk specifier for a kref value instead as
> that is the only valid use case for reading the value?

Yeah, that would be great as no one should be doing anything
logic-related based on the kref value.

thanks,

greg k-h
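
The accessor idea under discussion can be illustrated with a small userspace sketch. It assumes nothing about the final kernel interface: struct kref_sketch and the kref_sketch_*() functions are hypothetical names, and C11 atomics stand in for the kernel's atomic_t. The only point is that callers read the count through a function and use it for diagnostics, never for logic.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>

/* Hypothetical stand-in for struct kref; the counter is an internal detail. */
struct kref_sketch {
	atomic_int refcount;
};

static void kref_sketch_init(struct kref_sketch *k)
{
	atomic_init(&k->refcount, 1);
}

static void kref_sketch_get(struct kref_sketch *k)
{
	atomic_fetch_add(&k->refcount, 1);
}

/* Read-only accessor: callers no longer touch k->refcount directly. */
static int kref_sketch_read(struct kref_sketch *k)
{
	return atomic_load(&k->refcount);
}

/* The use the thread considers legitimate: a debug message. */
static void kref_sketch_debug(struct kref_sketch *k, const char *what)
{
	printf("%s: refcount = %d\n", what, kref_sketch_read(k));
}
```

A printk format specifier, as suggested in the quoted reply, would go one step further and remove even the integer return value from the interface.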


Re: [PATCH v3] mmc: sdhci-of-esdhc: fixup PRESENT_STATE read

2016-11-14 Thread Alexander Stein
On Monday 14 November 2016 16:12:27, Michael Walle wrote:
> Since commit 87a18a6a5652 ("mmc: mmc: Use ->card_busy() to detect busy
> cards in __mmc_switch()") the ESDHC driver is broken:
>   mmc0: Card stuck in programming state! __mmc_switch
>   mmc0: error -110 whilst initialising MMC card
> 
> Since this commit __mmc_switch() uses ->card_busy(), which is
> sdhci_card_busy() for the esdhc driver. sdhci_card_busy() uses the
> PRESENT_STATE register, specifically the DAT0 signal level bit. But the
> ESDHC uses a non-conformant PRESENT_STATE register, thus a read fixup is
> required to make the driver work again.
> 
> Signed-off-by: Michael Walle 
> Fixes: 87a18a6a5652 ("mmc: mmc: Use ->card_busy() to detect busy cards in
> __mmc_switch()")
> ---
> v3:
>  - explain the bits in the comments
>  - use bits[19:0] from the original value, all other will be taken from the
>fixup value.
> 
> v2:
>  - use lower bits of the original value (that was actually a typo)
>  - add fixes tag
>  - fix typo
> 
>  drivers/mmc/host/sdhci-of-esdhc.c | 13 +
>  1 file changed, 13 insertions(+)
> 
> diff --git a/drivers/mmc/host/sdhci-of-esdhc.c b/drivers/mmc/host/sdhci-of-esdhc.c
> index fb71c86..74cf3b1 100644
> --- a/drivers/mmc/host/sdhci-of-esdhc.c
> +++ b/drivers/mmc/host/sdhci-of-esdhc.c
> @@ -66,6 +66,19 @@ static u32 esdhc_readl_fixup(struct sdhci_host *host,
>   return ret;
>   }
>   }
> + /*
> +  * The DAT[3:0] line signal levels and the CMD line signal level are
> +  * not compatible with standard SDHC register. The line signal levels
> +  * DAT[7:0] are at bits 31:24 and the line signal level is at bit 23.
  ^
I guess there is a "command" missing, no?

Best regards,
Alexander
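
The bit shuffling described in the quoted comment and in the v3 changelog can be sketched as below. The rest of the hunk is cut off in the archived copy, so this is a reconstruction under assumptions, not necessarily the code that was posted: the eSDHC layout (DAT[7:0] at bits 31:24, CMD at bit 23, bits 19:0 kept as read) comes from the quoted text, while the standard layout (DAT[3:0] at bits 23:20, CMD at bit 24) is assumed from the SDHCI specification. The function name present_state_fixup() and the STD_* macros are hypothetical, and DAT[7:4] are simply dropped here.

```c
#include <assert.h>
#include <stdint.h>

/* Standard SDHCI Present State layout (assumed from the SDHCI spec). */
#define STD_DATA_LVL_MASK  0x00f00000u	/* DAT[3:0] at bits 23:20 */
#define STD_CMD_LVL        0x01000000u	/* CMD at bit 24 */

static_assert((STD_DATA_LVL_MASK & STD_CMD_LVL) == 0,
	      "DAT and CMD level fields must not overlap");

/* Translate an eSDHC PRESENT_STATE value to the standard layout. */
static uint32_t present_state_fixup(uint32_t value)
{
	uint32_t ret = value & 0x000fffffu;	 /* bits 19:0 pass through */

	ret |= (value >> 4) & STD_DATA_LVL_MASK; /* DAT[3:0]: 27:24 -> 23:20 */
	ret |= (value << 1) & STD_CMD_LVL;	 /* CMD: bit 23 -> bit 24 */
	return ret;
}
```

With this mapping a raw value of 0x01000000 (DAT0 high in the eSDHC layout) becomes 0x00100000, which is the DAT0 level bit that sdhci_card_busy() is described as checking.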



[PATCH] perf/ring_buffer: Fix invalid page order

2016-11-14 Thread Takao Indoh
In rb_alloc_aux_page(), a page order is set to MAX_ORDER when order is
greater than MAX_ORDER, but page order should be less than MAX_ORDER,
therefore alloc_pages_node fails at least once. This patch fixes page
order so that it can be always less than MAX_ORDER.

Signed-off-by: Takao Indoh 
---
 kernel/events/ring_buffer.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index 257fa46..3f76fdd 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -502,8 +502,8 @@ static struct page *rb_alloc_aux_page(int node, int order)
 {
struct page *page;
 
-   if (order > MAX_ORDER)
-   order = MAX_ORDER;
+   if (order >= MAX_ORDER)
+   order = MAX_ORDER - 1;
 
do {
page = alloc_pages_node(node, PERF_AUX_GFP, order);
-- 
1.8.3.1
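
The off-by-one is easy to see in isolation. The sketch below assumes, as the commit message states, that valid allocation orders run from 0 to MAX_ORDER - 1; SKETCH_MAX_ORDER is a hypothetical stand-in set to 11 (the real value depends on the kernel configuration), and both helper functions are illustrative names only.

```c
#include <assert.h>

/* Assumed value for illustration; the kernel's MAX_ORDER is configurable. */
#define SKETCH_MAX_ORDER 11

/* Old behaviour: an oversized request is clamped to an invalid order. */
static int clamp_order_old(int order)
{
	if (order > SKETCH_MAX_ORDER)
		order = SKETCH_MAX_ORDER;
	return order;
}

/* Patched behaviour: the result is always a valid order. */
static int clamp_order_new(int order)
{
	if (order >= SKETCH_MAX_ORDER)
		order = SKETCH_MAX_ORDER - 1;
	assert(order < SKETCH_MAX_ORDER);
	return order;
}
```

With the old clamp a request for order 11 or more is first tried at order 11, which cannot succeed, so the allocation loop in rb_alloc_aux_page() fails at least once before it tries anything else.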



Re: [PATCH v11 10/22] vfio iommu type1: Add support for mediated devices

2016-11-14 Thread Alexey Kardashevskiy
On 15/11/16 17:33, Kirti Wankhede wrote:
> 
> 
> On 11/15/2016 10:47 AM, Alexey Kardashevskiy wrote:
>> On 08/11/16 17:52, Alexey Kardashevskiy wrote:
>>> On 05/11/16 08:10, Kirti Wankhede wrote:
>>>> VFIO IOMMU drivers are designed for the devices which are IOMMU capable.
>>>> Mediated device only uses IOMMU APIs, the underlying hardware can be
>>>> managed by an IOMMU domain.
>>>>
>>>> Aim of this change is:
>>>> - To use most of the code of TYPE1 IOMMU driver for mediated devices
>>>> - To support direct assigned device and mediated device in single module
>>>>
>>>> This change adds pin and unpin support for mediated device to TYPE1 IOMMU
>>>> backend module. More details:
>>>> - vfio_pin_pages() callback here uses task and address space of vfio_dma,
>>>>   that is, of the process who mapped that iova range.
>>>> - Added pfn_list tracking logic to address space structure. All pages
>>>>   pinned through this interface are tracked in its address space.
>>>> - Pinned pages list is used to verify unpinning request and to unpin
>>>>   remaining pages while detaching the group for that device.
>>>> - Page accounting is updated to account in its address space where the
>>>>   pages are pinned/unpinned.
>>>> - Accounting for mdev device is only done if there is no iommu capable
>>>>   domain in the container. When there is a direct device assigned to the
>>>>   container and that domain is iommu capable, all pages are already pinned
>>>>   during DMA_MAP.
>>>> - Page accounting is updated on hot plug and unplug mdev device and pass
>>>>   through device.
>>>>
>>>> Tested by assigning below combinations of devices to a single VM:
>>>> - GPU pass through only
>>>
>>> This does not require this patchset, right?
>>>
> 
> Sorry I missed this earlier.
> This testing is required for this patch, because this patch touches code
> that is used for direct device assignment. Also for page accounting, all
> cases are considered i.e. when there is only pass through device in a
> container, when there is pass through device + vGPU device in a
> container. Also have to test that pages are pinned properly when device
> is hotplugged. In that case vfio_iommu_replay() is called to take
> necessary action.

So in this particular test you are only testing that the patchset did not
break the already existing functionality, is that correct?


> 
>>>> - vGPU device only
>>>
>>> Out of curiosity - how exactly did you test this? The exact GPU, how to
>>> create vGPU, what was the QEMU command line and the guest does with this
>>> passed device? Thanks.
>>
>> ping?
>>
> 
> I'm testing this code with M60, with custom changes in our driver.


Is this shared anywhere? What does the mediated driver do? Can Tesla K80 do
the same thing, or [10de:15fe] (whatever its name is)?


> Steps how to create mediated device are listed in
> Documentation/vfio-mediated-device.txt for sample mtty driver. Same
> steps I'm following for GPU. Quoting those steps here for you:


Nah, I saw this, I was wondering about actual hardware :) Like when you say
"tested with vGPU" - I am wondering what is passed to the guest and how the
guest is actually using it.


> 
> 2. Create a mediated device by using the dummy device that you created
>    in the previous step.
>
>    # echo "83b8f4f2-509f-382f-3c1e-e6bfe0fa1001" > \
>        /sys/devices/virtual/mtty/mtty/mdev_supported_types/mtty-2/create
>
> 3. Add parameters to qemu-kvm.
>
>    -device vfio-pci,\
>     sysfsdev=/sys/bus/mdev/devices/83b8f4f2-509f-382f-3c1e-e6bfe0fa1001




-- 
Alexey


Re: Patch procedure

2016-11-14 Thread Greg KH
On Mon, Nov 14, 2016 at 12:16:08PM -0500, feas wrote:
> Here is how I am going about making the patches. It is basically
> what I have picked up from kernel newbies among other sites
> and videos on making patches. I would be grateful for any
> pointers on what seems to be the problem(s) with why it does
> not produce a proper patch.



Honestly, it's not our job to review someone's patch creation
procedures and notes as everyone does it differently.  We will be glad
to review your patches that you create.

I think the review that I previously provided should be sufficient to
start with.  Again, send a short patch series to verify this, do not
send 100+ patches without getting feedback on them.

thanks,

greg k-h


Re: [PATCH v3] mmc: sdhci-of-esdhc: fixup PRESENT_STATE read

2016-11-14 Thread Adrian Hunter
On 14/11/16 17:12, Michael Walle wrote:
> Since commit 87a18a6a5652 ("mmc: mmc: Use ->card_busy() to detect busy
> cards in __mmc_switch()") the ESDHC driver is broken:
>   mmc0: Card stuck in programming state! __mmc_switch
>   mmc0: error -110 whilst initialising MMC card
> 
> Since this commit __mmc_switch() uses ->card_busy(), which is
> sdhci_card_busy() for the esdhc driver. sdhci_card_busy() uses the
> PRESENT_STATE register, specifically the DAT0 signal level bit. But the
> ESDHC uses a non-conformant PRESENT_STATE register, thus a read fixup is
> required to make the driver work again.
> 
> Signed-off-by: Michael Walle 
> Fixes: 87a18a6a5652 ("mmc: mmc: Use ->card_busy() to detect busy cards in 
> __mmc_switch()")
> ---
> v3:
>  - explain the bits in the comments
>  - use bits[19:0] from the original value, all other will be taken from the
>fixup value.
> 
> v2:
>  - use lower bits of the original value (that was actually a typo)
>  - add fixes tag
>  - fix typo
> 
>  drivers/mmc/host/sdhci-of-esdhc.c | 13 +
>  1 file changed, 13 insertions(+)
> 
> diff --git a/drivers/mmc/host/sdhci-of-esdhc.c 
> b/drivers/mmc/host/sdhci-of-esdhc.c
> index fb71c86..74cf3b1 100644
> --- a/drivers/mmc/host/sdhci-of-esdhc.c
> +++ b/drivers/mmc/host/sdhci-of-esdhc.c
> @@ -66,6 +66,19 @@ static u32 esdhc_readl_fixup(struct sdhci_host *host,
>   return ret;
>   }
>   }
> + /*
> +  * The DAT[3:0] line signal levels and the CMD line signal level are
> +  * not compatible with standard SDHC register. The line signal levels
> +  * DAT[7:0] are at bits 31:24 and the line signal level is at bit 23.
> +  * All other bits are the same as in the standard SDHC register.
> +  */
> + if (spec_reg == SDHCI_PRESENT_STATE) {
> + ret = value & 0x000fffff;
> + ret |= (value >> 4) & SDHCI_DATA_LVL_MASK;
> + ret |= (value << 1) & 0x01000000;

Please define the command line level bit in sdhci.h and use that here.
e.g.

diff --git a/drivers/mmc/host/sdhci.h b/drivers/mmc/host/sdhci.h
index 766df17fb7eb..2570455b219a 100644
--- a/drivers/mmc/host/sdhci.h
+++ b/drivers/mmc/host/sdhci.h
@@ -73,6 +73,7 @@
 #define  SDHCI_DATA_LVL_MASK   0x00F00000
 #define   SDHCI_DATA_LVL_SHIFT 20
 #define   SDHCI_DATA_0_LVL_MASK 0x00100000
+#define  SDHCI_CMD_LVL 0x01000000
 
 #define SDHCI_HOST_CONTROL 0x28
 #define  SDHCI_CTRL_LED0x01


> + return ret;
> + }
> +
>   ret = value;
>   return ret;
>  }
> 
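
For reference, the remapping described in the patch comment can be
sketched in plain C outside the kernel. This is only an illustration: the
helper name and the STD_* macros are made up here, and the mask values are
assumptions derived from the bit positions named in the comment (DAT[3:0]
at bits 23:20 and CMD at bit 24 in the standard layout).

```c
#include <stdint.h>

/* Standard SDHC PRESENT_STATE layout (assumed; names are illustrative). */
#define STD_DATA_LVL_MASK 0x00F00000u /* DAT[3:0] at bits 23:20 */
#define STD_CMD_LVL       0x01000000u /* CMD at bit 24 */

/*
 * Hypothetical helper: convert an eSDHC PRESENT_STATE value, where
 * DAT[7:0] sit at bits 31:24 and CMD at bit 23, to the standard layout.
 */
static uint32_t esdhc_present_state_to_std(uint32_t value)
{
	uint32_t ret;

	ret = value & 0x000fffffu;               /* bits 19:0 are unchanged */
	ret |= (value >> 4) & STD_DATA_LVL_MASK; /* DAT[3:0]: 27:24 -> 23:20 */
	ret |= (value << 1) & STD_CMD_LVL;       /* CMD: 23 -> 24 */
	return ret;
}
```

With this mapping an eSDHC value that has only DAT0 high (bit 24) comes
out with bit 20 set, which is the bit a DAT0 busy check in the standard
layout would look at.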



Re: [Intel-gfx] [PATCH v11 3/4] drm/i915: Use new CRC debugfs API

2016-11-14 Thread David Weinehall
On Mon, Nov 14, 2016 at 12:44:25PM +0200, Jani Nikula wrote:
> On Thu, 06 Oct 2016, Tomeu Vizoso  wrote:
> > diff --git a/drivers/gpu/drm/i915/intel_display.c 
> > b/drivers/gpu/drm/i915/intel_display.c
> > index 23a6c7213eca..7412a05fa5d9 100644
> > --- a/drivers/gpu/drm/i915/intel_display.c
> > +++ b/drivers/gpu/drm/i915/intel_display.c
> > @@ -14636,6 +14636,7 @@ static const struct drm_crtc_funcs intel_crtc_funcs 
> > = {
> > .page_flip = intel_crtc_page_flip,
> > .atomic_duplicate_state = intel_crtc_duplicate_state,
> > .atomic_destroy_state = intel_crtc_destroy_state,
> > +   .set_crc_source = intel_crtc_set_crc_source,
> >  };
> >  
> >  /**
> > diff --git a/drivers/gpu/drm/i915/intel_drv.h 
> > b/drivers/gpu/drm/i915/intel_drv.h
> > index 737261b09110..31894b7c6517 100644
> > --- a/drivers/gpu/drm/i915/intel_drv.h
> > +++ b/drivers/gpu/drm/i915/intel_drv.h
> > @@ -1844,6 +1844,14 @@ void intel_color_load_luts(struct drm_crtc_state 
> > *crtc_state);
> >  /* intel_pipe_crc.c */
> >  int intel_pipe_crc_create(struct drm_minor *minor);
> >  void intel_pipe_crc_cleanup(struct drm_minor *minor);
> > +#ifdef CONFIG_DEBUG_FS
> > +int intel_crtc_set_crc_source(struct drm_crtc *crtc, const char 
> > *source_name,
> > + size_t *values_cnt);
> > +#else
> > +static inline int intel_crtc_set_crc_source(struct drm_crtc *crtc,
> > +   const char *source_name,
> > +   size_t *values_cnt) { return 0; }
> > +#endif
> 
> "inline" here doesn't work because it's used as a function pointer.
> 
> Is it better to have a function that returns 0 for .set_crc_source, or
> to set .set_crc_source to NULL when CONFIG_DEBUG_FS=n?

I'd say that whenever we have a function pointer we should have a dummy
function without side-effects for this kind of thing.
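
A minimal sketch of the dummy-function approach, in plain C outside the
kernel. The struct, the HAVE_CRC_DEBUG switch and all function names are
made up for illustration and only stand in for the real ops table and
config option:

```c
#include <stddef.h>

/* Hypothetical ops table, loosely modelled on a struct of callbacks. */
struct crtc_ops {
	int (*set_crc_source)(const char *source_name, size_t *values_cnt);
};

#ifdef HAVE_CRC_DEBUG /* stand-in for a config option */
int real_set_crc_source(const char *source_name, size_t *values_cnt);
#define SET_CRC_SOURCE real_set_crc_source
#else
/* Dummy without side effects; it always reports success. */
static int dummy_set_crc_source(const char *source_name, size_t *values_cnt)
{
	(void)source_name;
	(void)values_cnt;
	return 0;
}
#define SET_CRC_SOURCE dummy_set_crc_source
#endif

static const struct crtc_ops ops = {
	.set_crc_source = SET_CRC_SOURCE,
};

/* Callers need no NULL check because the pointer is always populated. */
static int enable_crc(const char *source)
{
	size_t cnt = 0;

	return ops.set_crc_source(source, &cnt);
}
```

The trade-off is that callers cannot tell "not supported" from "succeeded"
unless the dummy returns an error code instead of 0.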


Kind regards, David


Re: [RFC PATCH] xen/x86: Increase xen_e820_map to E820_X_MAX possible entries

2016-11-14 Thread Jan Beulich
>>> On 15.11.16 at 07:33,  wrote:
> On 15/11/16 01:11, Alex Thorlton wrote:
>> Hey everyone,
>> 
>> We're having problems with large systems hitting a BUG in
>> xen_memory_setup, due to extra e820 entries created in the
>> XENMEM_machine_memory_map callback.  The change in the patch gets things
>> working, but Boris and I wanted to get opinions on whether or not this
>> is the appropriate/entire solution, which is why I've sent it as an RFC
>> for now.
>> 
>> Boris pointed out to me that E820_X_MAX is only large when CONFIG_EFI=y,
>> which is a detail worth discussing.  He proposed possibly adding
>> CONFIG_XEN to the conditions under which we set E820_X_MAX to a larger
>> value than E820MAX, since the Xen e820 table isn't bound by the
>> zero-page memory limitations.
>> 
>> I do *slightly* question the use of E820_X_MAX here, only from a
>> cosmetic perspective, as I believe this macro is intended to describe
>> the maximum size of the extended e820 table, which, AFAIK, is not used
>> by the Xen HV.  That being said, there isn't exactly a "more
>> appropriate" macro/variable to use, so this may not really be an issue.
>> 
>> Any input on the patch, or the questions I've raised above is greatly
>> appreciated!
> 
> While I think extending the e820 table is the right thing to do I'm
> questioning the assumptions here.
> 
> Looking briefly through the Xen hypervisor sources I think it isn't
> yet ready for such large machines: the hypervisor's e820 map seems to
> be still limited to 128 e820 entries. Jan, did I overlook an EFI
> specific path extending this limitation?

No, you didn't. I do question the correlation with "large machines"
here though: The issue isn't with large machines afaict, but with
ones having very many entries (i.e. heavily fragmented).

> In case I'm right the Xen hypervisor should be prepared for a larger
> e820 map, but this won't help alone as there would still be additional
> entries for the IOAPICs created.
> 
> So I think we need something like:
> 
> #define E820_XEN_MAX (E820_X_MAX + MAX_IO_APICS)
> 
> and use this for sizing xen_e820_map[].

I would say that if any change gets done here, there shouldn't be
any static upper limit at all. That could even be viewed as in line
with recent e820.c changes moving to dynamic allocations. In
particular I don't see why MAX_IO_APICS would need adding in
here, but not other (current and future) factors determining the
(pseudo) E820 map Xen presents to Dom0.
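
To illustrate what "no static upper limit" could look like, here is a
user-space sketch of a map that grows on demand. It is an assumption-laden
illustration only: the entry struct is a simplified stand-in, the names are
invented, and early boot code would need a boot-time allocator rather than
realloc().

```c
#include <stdint.h>
#include <stdlib.h>

/* Simplified stand-in for an e820 entry; not the kernel's definition. */
struct map_entry {
	uint64_t addr;
	uint64_t size;
	uint32_t type;
};

struct dyn_map {
	struct map_entry *entries;
	size_t nr;       /* entries in use */
	size_t capacity; /* entries allocated */
};

/* Append one entry, doubling the backing array when it is full. */
static int dyn_map_add(struct dyn_map *map, uint64_t addr, uint64_t size,
		       uint32_t type)
{
	if (map->nr == map->capacity) {
		size_t new_cap = map->capacity ? map->capacity * 2 : 128;
		struct map_entry *tmp;

		tmp = realloc(map->entries, new_cap * sizeof(*tmp));
		if (!tmp)
			return -1; /* existing entries stay valid */
		map->entries = tmp;
		map->capacity = new_cap;
	}

	map->entries[map->nr].addr = addr;
	map->entries[map->nr].size = size;
	map->entries[map->nr].type = type;
	map->nr++;
	return 0;
}

/* Release the backing array and reset the map to its empty state. */
static void dyn_map_free(struct dyn_map *map)
{
	free(map->entries);
	map->entries = NULL;
	map->nr = 0;
	map->capacity = 0;
}
```

A zero-initialised struct dyn_map is a valid empty map, so the number of
entries is bounded only by available memory rather than by a macro.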

Jan



Re: [PATHCv10 1/2] usb: USB Type-C connector class

2016-11-14 Thread Greg KH
On Mon, Nov 14, 2016 at 12:46:50PM -0800, Guenter Roeck wrote:
> On Mon, Nov 14, 2016 at 02:32:35PM +0200, Heikki Krogerus wrote:
> > Hi Greg,
> > 
> > On Mon, Nov 14, 2016 at 10:51:48AM +0100, Greg KH wrote:
> > > On Mon, Sep 19, 2016 at 02:16:56PM +0300, Heikki Krogerus wrote:
> > > > The purpose of USB Type-C connector class is to provide
> > > > unified interface for the user space to get the status and
> > > > basic information about USB Type-C connectors on a system,
> > > > control over data role swapping, and when the port supports
> > > > USB Power Delivery, also control over power role swapping
> > > > and Alternate Modes.
> > > > 
> > > > Reviewed-by: Guenter Roeck 
> > > > Tested-by: Guenter Roeck 
> > > > Signed-off-by: Heikki Krogerus 
> > > > ---
> > > >  Documentation/ABI/testing/sysfs-class-typec |  218 ++
> > > >  Documentation/usb/typec.txt |  103 +++
> > > >  MAINTAINERS |9 +
> > > >  drivers/usb/Kconfig |2 +
> > > >  drivers/usb/Makefile|2 +
> > > >  drivers/usb/typec/Kconfig   |7 +
> > > >  drivers/usb/typec/Makefile  |1 +
> > > >  drivers/usb/typec/typec.c   | 1075 
> > > > +++
> > > >  include/linux/usb/typec.h   |  252 +++
> > > >  9 files changed, 1669 insertions(+)
> > > >  create mode 100644 Documentation/ABI/testing/sysfs-class-typec
> > > >  create mode 100644 Documentation/usb/typec.txt
> > > >  create mode 100644 drivers/usb/typec/Kconfig
> > > >  create mode 100644 drivers/usb/typec/Makefile
> > > >  create mode 100644 drivers/usb/typec/typec.c
> > > >  create mode 100644 include/linux/usb/typec.h
> > > 
> [ ... ]
> 
> > > > +
> > > > +int typec_connect(struct typec_port *port, struct typec_connection 
> > > > *con)
> > > > +{
> > > > +   int ret;
> > > > +
> > > > +   if (!con->partner && !con->cable)
> > > > +   return -EINVAL;
> > > > +
> > > > +   port->connected = 1;
> > > > +   port->data_role = con->data_role;
> > > > +   port->pwr_role = con->pwr_role;
> > > > +   port->vconn_role = con->vconn_role;
> > > > +   port->pwr_opmode = con->pwr_opmode;
> > > > +
> > > > +   kobject_uevent(&port->dev.kobj, KOBJ_CHANGE);
> > > 
> > > This worries me.  Who is listening for it?  What will you do with it?
> > > Shouldn't you just poll on an attribute file instead?
> > 
> > Oliver! Did you need this or can we remove it?
> > 
> > I remember I removed the "connected" attribute because you did not see
> > any use for it at one point. I don't remember the reason exactly why?
> > 
> 
> The Android team tells me that they are currently using the udev events
> to track port role changes, and to detect presence of port partner.
> 
> Also, there are plans to track changes on usbc*cable to differentiate
> between cable attach vs. device being attached on the remote end. 
> 
> What is the problem with using kobject_uevent() and thus presumably
> udev events ?

It's not a "normal" thing to do and is pretty "heavy" to do.  What does
userspace do with that change event?  Does it read specific attributes?
What causes the event to happen in the kernel, is it really just a
change in the specific object, or do new ones get added/removed?

In short, document the heck out of this please so people know how to use
it, and what is happening when the event happens.
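
As a rough illustration of the "poll on an attribute file" alternative,
user space could wait on a sysfs attribute as sketched below. This assumes
the kernel side calls sysfs_notify() for that attribute, which the posted
patch is not shown to do; the helper name is made up and any path passed
to it is the caller's assumption.

```c
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/*
 * Wait until the attribute at 'path' signals a change or the timeout
 * expires. Returns 1 on change, 0 on timeout, -1 on error.
 */
static int wait_for_attr_change(const char *path, int timeout_ms)
{
	struct pollfd pfd;
	char buf[64];
	int fd, ret;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	/* Read once first so that only later changes wake us up. */
	if (read(fd, buf, sizeof(buf)) < 0) {
		close(fd);
		return -1;
	}

	pfd.fd = fd;
	pfd.events = POLLPRI | POLLERR;
	pfd.revents = 0;

	ret = poll(&pfd, 1, timeout_ms);
	close(fd);

	if (ret < 0)
		return -1;
	return ret > 0 ? 1 : 0;
}
```

After a wakeup the caller would reopen (or seek and reread) the attribute
to fetch the new value, which also answers the "does it read specific
attributes?" question for that particular consumer.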

thanks,

greg k-h


RE: [PATCH net-next v5] cadence: Add LSO support.

2016-11-14 Thread Rafal Ozieblo
> > > If UFO is in use it should not silently disable UDP checksums.
> > > 
> > > If you cannot support UFO with proper checksumming, then you cannot 
> > > enable support for that feature.
> > 
> > According to the Cadence Gigabit Ethernet MAC documentation:
> > 
> > "Hardware will not calculate the UDP checksum or modify the UDP 
> > checksum field. Therefore software must set a value of zero in the 
> > checksum field in the UDP header (in the first payload buffer) to indicate 
> > to the receiver that the UDP datagram does not include a checksum."
> > 
> > It is a hardware requirement.
>
> I do not doubt that it is a hardware restriction.
>
> But I am saying that you cannot enable this feature under Linux if this is 
> how it operates on your hardware.

Would it be acceptable to enable UFO conditionally with an internal define?
For example:

+#ifdef MACB_ENABLE_UFO
+#define MACB_NETIF_LSO (NETIF_F_TSO | NETIF_F_UFO)
+#else
+#define MACB_NETIF_LSO (NETIF_F_TSO)
+#endif

I could add a precise comment here that UFO is possible only without checksum.

Or maybe I could enable it via a module parameter or the device tree (as in
drivers/net/ethernet/neterion/s2io.c).
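
A sketch of the runtime variant, assuming an explicit opt-in flag decides
whether UFO is advertised. The FEATURE_* values and the helper name are
stand-ins for illustration and are not the kernel's NETIF_F_* definitions:

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in bit values; these are NOT the kernel's NETIF_F_* definitions. */
#define FEATURE_TSO (1u << 0)
#define FEATURE_UFO (1u << 1)

/*
 * Hypothetical helper: TSO is always offered, UFO only when the
 * administrator has explicitly opted in to unchecksummed UDP.
 */
static uint32_t lso_features(bool allow_ufo_without_csum)
{
	uint32_t features = FEATURE_TSO;

	if (allow_ufo_without_csum)
		features |= FEATURE_UFO;
	return features;
}
```

Whether such an opt-in is acceptable at all is exactly the open question
in this thread; the sketch only shows the mechanics.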


Re: [PATCH v7 2/5] mm: remove x86-only restriction of movable_node

2016-11-14 Thread Aneesh Kumar K.V
Reza Arbab  writes:

> In commit c5320926e370 ("mem-hotplug: introduce movable_node boot
> option"), the memblock allocation direction is changed to bottom-up and
> then back to top-down like this:
>
> 1. memblock_set_bottom_up(true), called by cmdline_parse_movable_node().
> 2. memblock_set_bottom_up(false), called by x86's numa_init().
>
> Even though (1) occurs in generic mm code, it is wrapped by #ifdef
> CONFIG_MOVABLE_NODE, which depends on X86_64.
>
> This means that when we extend CONFIG_MOVABLE_NODE to non-x86 arches,
> things will be unbalanced. (1) will happen for them, but (2) will not.
>
> This toggle was added in the first place because x86 has a delay between
> adding memblocks and marking them as hotpluggable. Since other arches do
> this marking either immediately or not at all, they do not require the
> bottom-up toggle.
>
> So, resolve things by moving (1) from cmdline_parse_movable_node() to
> x86's setup_arch(), immediately after the movable_node parameter has
> been parsed.


Considering that we can now mark memblocks hotpluggable, do we need to
enable bottom-up allocation for ppc64 as well?


>
> Signed-off-by: Reza Arbab 
> ---
>  Documentation/kernel-parameters.txt |  2 +-
>  arch/x86/kernel/setup.c | 24 

-aneesh



[PATCH] lkdtm: Prevent the compiler from optimising lkdtm_CORRUPT_STACK()

2016-11-14 Thread Michael Ellerman
At least on powerpc with GCC 6, the compiler is smart enough to optimise
lkdtm_CORRUPT_STACK() into an empty function that just returns.

If we print the buffer after we've written to it that prevents the
compiler from optimising away data and the memset().

Signed-off-by: Michael Ellerman 
---
 drivers/misc/lkdtm_bugs.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/misc/lkdtm_bugs.c b/drivers/misc/lkdtm_bugs.c
index 182ae1894b32..30e62dd7e7ca 100644
--- a/drivers/misc/lkdtm_bugs.c
+++ b/drivers/misc/lkdtm_bugs.c
@@ -80,7 +80,8 @@ noinline void lkdtm_CORRUPT_STACK(void)
/* Use default char array length that triggers stack protection. */
char data[8];
 
-   memset((void *)data, 0, 64);
+   memset((void *)data, 'a', 64);
+   pr_info("Corrupted stack with '%16s'...\n", data);
 }
 
 void lkdtm_UNALIGNED_LOAD_STORE_WRITE(void)
-- 
2.7.4
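
The effect described above can be sketched in user space. The function
names are made up, and unlike the lkdtm test both functions stay within
the bounds of the array; the only point is the difference between a buffer
that is never read and one that is handed to a formatting function.

```c
#include <stdio.h>
#include <string.h>

/*
 * The buffer is written but never read, so a compiler is allowed to
 * drop both the memset() and the array itself.
 */
static void fill_unused(void)
{
	char data[8];

	memset(data, 'a', sizeof(data));
}

/*
 * Passing the buffer to a formatting function after the write means
 * its contents are used, so the write can no longer be discarded.
 */
static int fill_and_print(char *out, size_t out_len)
{
	char data[8];

	memset(data, 'a', sizeof(data) - 1);
	data[sizeof(data) - 1] = '\0';
	return snprintf(out, out_len, "filled with '%s'", data);
}
```

Whether fill_unused() really compiles to an empty function depends on the
compiler and optimisation level; the patch reports it for GCC 6 on
powerpc.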



Re: [PATCH] thermal/powerclamp: add back module device table

2016-11-14 Thread Greg Kroah-Hartman
On Mon, Nov 14, 2016 at 11:08:45AM -0800, Jacob Pan wrote:
> Commit 3105f234e0aba43e44e277c20f9b32ee8add43d4 replaced module
> cpu id table with a cpu feature check, which is logically correct.
> But we need the module device table to allow module auto loading.
> 
> Fixes:3105f234 thermal/powerclamp: correct cpu support check
> Signed-off-by: Jacob Pan 
> ---
>  drivers/thermal/intel_powerclamp.c | 9 -
>  1 file changed, 8 insertions(+), 1 deletion(-)



This is not the correct way to submit patches for inclusion in the
stable kernel tree.  Please read Documentation/stable_kernel_rules.txt
for how to do this properly.




Re: [PATCHSET 0/7] perf sched: Introduce timehist command, again (v1)

2016-11-14 Thread Namhyung Kim
Hi Ingo,

On Tue, Nov 15, 2016 at 07:42:14AM +0100, Ingo Molnar wrote:
> 
> * Namhyung Kim  wrote:
> 
> > Hello,
> > 
> > This patchset is a rebased version of David's sched timehist work [1].
> > I plan to improve perf sched command more and think that having
> > timehist command before the work looks good.  It seems David is busy
> > these days, so I'm retrying it by myself.
> > 
> > This implements only basic feature and a few options.  I just split
> > the patch to make it easier to review and did some cosmetic changes.
> > More patches will come later.
> > 
> > The below is from the David's original description:
> > 
> > 8<-
> > 'perf sched timehist' provides an analysis of scheduling events.
> > 
> > Example usage:
> > perf sched record -- sleep 1
> > perf sched timehist
> 
> 
> Cool, very nice!

:)

> 
> > By default it shows the individual schedule events, including the time
> > between sched-in events for the task, the task scheduling delay (time
> > between wakeup and actually running) and run time for the task:
> > 
> >            time cpu  task name[tid/pid]    b/n time sch delay  run time
> >   ------------- ---- -------------------- --------- --------- ---------
> >    79371.874569 [11] gcc[31949]               0.014     0.000     1.148
> >    79371.874591 [10] gcc[31951]               0.000     0.000     0.024
> >    79371.874603 [10] migration/10[59]         3.350     0.004     0.011
> >    79371.874604 [11] <idle>                   1.148     0.000     0.035
> >    79371.874723 [05] <idle>                   0.016     0.000     1.383
> >    79371.874746 [05] gcc[31949]               0.153     0.078     0.022
> > ...
> 
> What does the 'b/n' abbreviation stand for? 'Between'? Could we call the
> column 'sch wait' instead, or so?

Looks better, or what about 'wait time'?
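
The three time columns in the table quoted above are printed as milliseconds with a three-digit microsecond fraction ("msec.usec"). A minimal sketch of that conversion follows, assuming the underlying durations are nanosecond counts; format_msec_usec() is a hypothetical helper and not the function perf itself uses.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Format a duration given in nanoseconds as "msec.usec", so that
 * 1148000 ns becomes "1.148". Sub-microsecond remainders are truncated.
 * Returns the value snprintf() returns. */
int format_msec_usec(uint64_t nsec, char *buf, size_t len)
{
	uint64_t usec = nsec / 1000;

	return snprintf(buf, len, "%llu.%03llu",
			(unsigned long long)(usec / 1000),
			(unsigned long long)(usec % 1000));
}
```

With this, the 14 microsecond value in the first row of the table prints as 0.014.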

> 
> 
> > Times are in msec.usec.
> > 
> > If callchains were recorded they are appended to the line with a default
> > stack depth of 5:
> > 
> >    79371.874569 [11] gcc[31949]              0.000014  0.000000  0.001148  wait_for_completion_killable do_fork sys_vfork stub_vfork __vfork
> >    79371.874591 [10] gcc[31951]              0.000000  0.000000  0.000024  __cond_resched _cond_resched wait_for_completion stop_one_cpu sched_exec
> >    79371.874603 [10] migration/10[59]        0.003350  0.000004  0.000011  smpboot_thread_fn kthread ret_from_fork
> >    79371.874604 [11] <idle>                  0.001148  0.000000  0.000035  cpu_startup_entry start_secondary
> >    79371.874723 [05] <idle>                  0.000016  0.000000  0.001383  cpu_startup_entry start_secondary
> >    79371.874746 [05] gcc[31949]              0.000153  0.000078  0.000022  do_wait sys_wait4 system_call_fastpath __GI___waitpid
> 
> So when I first saw this it was hard for me to disambiguate individual
> function names. Wouldn't this be a bit more readable:
> 
> >    79371.874569 [11] gcc[31949]              0.000014  0.000000  0.001148  wait_for_completion_killable() <- do_fork sys_vfork stub_vfork() <- __vfork()
> >    79371.874591 [10] gcc[31951]              0.000000  0.000000  0.000024  __cond_resched() <- _cond_resched() <- wait_for_completion() <- stop_one_cpu() <- sched_exec()
> >    79371.874603 [10] migration/10[59]        0.003350  0.000004  0.000011  smpboot_thread_fn() <- kthread() <- ret_from_fork()
> >    79371.874604 [11] <idle>                  0.001148  0.000000  0.000035  cpu_startup_entry() <- start_secondary()
> >    79371.874723 [05] <idle>                  0.000016  0.000000  0.001383  cpu_startup_entry() <- start_secondary()
> >    79371.874746 [05] gcc[31949]              0.000153  0.000078  0.000022  do_wait() <- sys_wait4() <- system_call_fastpath() <- __GI___waitpid()
> 
> Or:
> 
> >    79371.874569 [11] gcc[31949]              0.000014  0.000000  0.001148  wait_for_completion_killable() <- do_fork sys_vfork stub_vfork() <- __vfork()
> >    79371.874591 [10] gcc[31951]              0.000000  0.000000  0.000024  __cond_resched()               <- _cond_resched() <- wait_for_completion() <- stop_one_cpu() <- sched_exec()
> >    79371.874603 [10] migration/10[59]        0.003350  0.000004  0.000011  smpboot_thread_fn()            <- kthread() <- ret_from_fork()
> >    79371.874604 [11] <idle>                  0.001148  0.000000  0.000035  cpu_startup_entry()            <- start_secondary()
> >    79371.874723 [05] <idle>                  0.000016  0.000000  0.001383  cpu_startup_entry()            <- start_secondary()
> >    79371.874746 [05] gcc[31949]              0.000153  0.000078  0.000022  do_wait()                      <- sys_wait4() <- system_call_fastpath() <- __GI___waitpid()
> 
> (i.e. visually separate the first entry - and list the rest.)
> 
> Or 

[PATCH] clk: qcom: smd-rpm: Add msm8974 clocks

2016-11-14 Thread Bjorn Andersson
This adds all RPM based clocks for msm8974 except cxo and gfx3d_clk_src.

Signed-off-by: Bjorn Andersson 
---
 .../devicetree/bindings/clock/qcom,rpmcc.txt   |  1 +
 drivers/clk/qcom/clk-smd-rpm.c | 71 ++
 include/dt-bindings/clock/qcom,rpmcc.h | 40 +++-
 3 files changed, 110 insertions(+), 2 deletions(-)

diff --git a/Documentation/devicetree/bindings/clock/qcom,rpmcc.txt b/Documentation/devicetree/bindings/clock/qcom,rpmcc.txt
index 87d3714b956a..a7235e9e1c97 100644
--- a/Documentation/devicetree/bindings/clock/qcom,rpmcc.txt
+++ b/Documentation/devicetree/bindings/clock/qcom,rpmcc.txt
@@ -11,6 +11,7 @@ Required properties :
compatible "qcom,rpmcc" should be also included.
 
"qcom,rpmcc-msm8916", "qcom,rpmcc"
+   "qcom,rpmcc-msm8974", "qcom,rpmcc"
"qcom,rpmcc-apq8064", "qcom,rpmcc"
 
 - #clock-cells : shall contain 1
diff --git a/drivers/clk/qcom/clk-smd-rpm.c b/drivers/clk/qcom/clk-smd-rpm.c
index a27013dbc0aa..b8fcac6f2f87 100644
--- a/drivers/clk/qcom/clk-smd-rpm.c
+++ b/drivers/clk/qcom/clk-smd-rpm.c
@@ -462,8 +462,79 @@ static const struct rpm_smd_clk_desc rpm_clk_msm8916 = {
.num_clks = ARRAY_SIZE(msm8916_clks),
 };
 
+/* msm8974 */
+DEFINE_CLK_SMD_RPM(msm8974, pnoc_clk, pnoc_a_clk, QCOM_SMD_RPM_BUS_CLK, 0);
+DEFINE_CLK_SMD_RPM(msm8974, snoc_clk, snoc_a_clk, QCOM_SMD_RPM_BUS_CLK, 1);
+DEFINE_CLK_SMD_RPM(msm8974, cnoc_clk, cnoc_a_clk, QCOM_SMD_RPM_BUS_CLK, 2);
+DEFINE_CLK_SMD_RPM(msm8974, mmssnoc_ahb_clk, mmssnoc_ahb_a_clk, QCOM_SMD_RPM_BUS_CLK, 3);
+DEFINE_CLK_SMD_RPM(msm8974, bimc_clk, bimc_a_clk, QCOM_SMD_RPM_MEM_CLK, 0);
+DEFINE_CLK_SMD_RPM(msm8974, gfx3d_clk_src, gfx3d_a_clk_src, QCOM_SMD_RPM_MEM_CLK, 1);
+DEFINE_CLK_SMD_RPM(msm8974, ocmemgx_clk, ocmemgx_a_clk, QCOM_SMD_RPM_MEM_CLK, 2);
+DEFINE_CLK_SMD_RPM_QDSS(msm8974, qdss_clk, qdss_a_clk, QCOM_SMD_RPM_MISC_CLK, 1);
+DEFINE_CLK_SMD_RPM_XO_BUFFER(msm8974, cxo_d0, cxo_d0_a, 1);
+DEFINE_CLK_SMD_RPM_XO_BUFFER(msm8974, cxo_d1, cxo_d1_a, 2);
+DEFINE_CLK_SMD_RPM_XO_BUFFER(msm8974, cxo_a0, cxo_a0_a, 4);
+DEFINE_CLK_SMD_RPM_XO_BUFFER(msm8974, cxo_a1, cxo_a1_a, 5);
+DEFINE_CLK_SMD_RPM_XO_BUFFER(msm8974, cxo_a2, cxo_a2_a, 6);
+DEFINE_CLK_SMD_RPM_XO_BUFFER(msm8974, diff_clk, diff_a_clk, 7);
+DEFINE_CLK_SMD_RPM_XO_BUFFER(msm8974, div_clk1, div_a_clk1, 11);
+DEFINE_CLK_SMD_RPM_XO_BUFFER(msm8974, div_clk2, div_a_clk2, 12);
+DEFINE_CLK_SMD_RPM_XO_BUFFER_PINCTRL(msm8974, cxo_d0_pin, cxo_d0_a_pin, 1);
+DEFINE_CLK_SMD_RPM_XO_BUFFER_PINCTRL(msm8974, cxo_d1_pin, cxo_d1_a_pin, 2);
+DEFINE_CLK_SMD_RPM_XO_BUFFER_PINCTRL(msm8974, cxo_a0_pin, cxo_a0_a_pin, 4);
+DEFINE_CLK_SMD_RPM_XO_BUFFER_PINCTRL(msm8974, cxo_a1_pin, cxo_a1_a_pin, 5);
+DEFINE_CLK_SMD_RPM_XO_BUFFER_PINCTRL(msm8974, cxo_a2_pin, cxo_a2_a_pin, 6);
+
+static struct clk_smd_rpm *msm8974_clks[] = {
+   [RPM_SMD_PNOC_CLK] = &msm8974_pnoc_clk,
+   [RPM_SMD_PNOC_A_CLK] = &msm8974_pnoc_a_clk,
+   [RPM_SMD_SNOC_CLK] = &msm8974_snoc_clk,
+   [RPM_SMD_SNOC_A_CLK] = &msm8974_snoc_a_clk,
+   [RPM_SMD_CNOC_CLK] = &msm8974_cnoc_clk,
+   [RPM_SMD_CNOC_A_CLK] = &msm8974_cnoc_a_clk,
+   [RPM_SMD_MMSSNOC_AHB_CLK] = &msm8974_mmssnoc_ahb_clk,
+   [RPM_SMD_MMSSNOC_AHB_A_CLK] = &msm8974_mmssnoc_ahb_a_clk,
+   [RPM_SMD_BIMC_CLK] = &msm8974_bimc_clk,
+   [RPM_SMD_BIMC_A_CLK] = &msm8974_bimc_a_clk,
+   [RPM_SMD_OCMEMGX_CLK] = &msm8974_ocmemgx_clk,
+   [RPM_SMD_OCMEMGX_A_CLK] = &msm8974_ocmemgx_a_clk,
+   [RPM_SMD_QDSS_CLK] = &msm8974_qdss_clk,
+   [RPM_SMD_QDSS_A_CLK] = &msm8974_qdss_a_clk,
+   [RPM_SMD_CXO_D0] = &msm8974_cxo_d0,
+   [RPM_SMD_CXO_D0_A] = &msm8974_cxo_d0_a,
+   [RPM_SMD_CXO_D1] = &msm8974_cxo_d1,
+   [RPM_SMD_CXO_D1_A] = &msm8974_cxo_d1_a,
+   [RPM_SMD_CXO_A0] = &msm8974_cxo_a0,
+   [RPM_SMD_CXO_A0_A] = &msm8974_cxo_a0_a,
+   [RPM_SMD_CXO_A1] = &msm8974_cxo_a1,
+   [RPM_SMD_CXO_A1_A] = &msm8974_cxo_a1_a,
+   [RPM_SMD_CXO_A2] = &msm8974_cxo_a2,
+   [RPM_SMD_CXO_A2_A] = &msm8974_cxo_a2_a,
+   [RPM_SMD_DIFF_CLK] = &msm8974_diff_clk,
+   [RPM_SMD_DIFF_A_CLK] = &msm8974_diff_a_clk,
+   [RPM_SMD_DIV_CLK1] = &msm8974_div_clk1,
+   [RPM_SMD_DIV_A_CLK1] = &msm8974_div_a_clk1,
+   [RPM_SMD_DIV_CLK2] = &msm8974_div_clk2,
+   [RPM_SMD_DIV_A_CLK2] = &msm8974_div_a_clk2,
+   [RPM_SMD_CXO_D0_PIN] = &msm8974_cxo_d0_pin,
+   [RPM_SMD_CXO_D0_A_PIN] = &msm8974_cxo_d0_a_pin,
+   [RPM_SMD_CXO_D1_PIN] = &msm8974_cxo_d1_pin,
+   [RPM_SMD_CXO_D1_A_PIN] = &msm8974_cxo_d1_a_pin,
+   [RPM_SMD_CXO_A0_PIN] = &msm8974_cxo_a0_pin,
+   [RPM_SMD_CXO_A0_A_PIN] = &msm8974_cxo_a0_a_pin,
+   [RPM_SMD_CXO_A1_PIN] = &msm8974_cxo_a1_pin,
+   [RPM_SMD_CXO_A1_A_PIN]  
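
The clock table in this patch relies on two C features: token pasting, so that each DEFINE_... macro creates an object named after the platform and the clock, and designated initialisers, so that each object lands at the index given by its binding ID. A reduced sketch follows. The DEFINE_CLK macro, struct clk_entry and the CLK_ID_* values are simplified stand-ins chosen for illustration, not the driver's real definitions.

```c
#include <stddef.h>

struct clk_entry {
	const char *name;
	int rpm_clk_id;
};

/* Paste platform and clock name together: DEFINE_CLK(msm8974, pnoc_clk, 0)
 * defines an object called msm8974_pnoc_clk. */
#define DEFINE_CLK(_platform, _name, _id)			\
	static struct clk_entry _platform##_##_name = {	\
		.name = #_name,					\
		.rpm_clk_id = (_id),				\
	}

/* Stand-ins for the dt-bindings indices. */
enum { CLK_ID_PNOC, CLK_ID_SNOC, CLK_ID_CNOC, CLK_ID_MAX };

DEFINE_CLK(msm8974, pnoc_clk, 0);
DEFINE_CLK(msm8974, snoc_clk, 1);

/* Designated initialisers place each clock at its binding index; slots
 * that are not listed (CLK_ID_CNOC here) are left NULL. */
static struct clk_entry *msm8974_clks[CLK_ID_MAX] = {
	[CLK_ID_PNOC] = &msm8974_pnoc_clk,
	[CLK_ID_SNOC] = &msm8974_snoc_clk,
};

/* Look a clock up by binding index; NULL if out of range or unpopulated. */
struct clk_entry *clk_lookup(size_t idx)
{
	return idx < (size_t)CLK_ID_MAX ? msm8974_clks[idx] : NULL;
}
```

Because the array is indexed by ID rather than by position, the order of the entries in the initialiser does not matter, and a consumer has to cope with NULL slots.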

Re: [PATCH 1/5] pinctrl: core: Use delayed work for hogs

2016-11-14 Thread Linus Walleij
On Tue, Nov 15, 2016 at 1:47 AM, Tony Lindgren  wrote:

> 8< 
> From tony Mon Sep 17 00:00:00 2001
> From: Tony Lindgren 
> Date: Tue, 25 Oct 2016 08:33:35 -0700
> Subject: [PATCH] pinctrl: core: Use delayed work for hogs
>
> Having the pin control framework call pin controller functions
> before its probe has finished is not nice as the pin controller
> device driver does not yet have struct pinctrl_dev handle.
>
> Let's fix this issue by adding deferred work for late init. This is
> needed to be able to add pinctrl generic helper functions that expect
> to know struct pinctrl_dev handle. Note that we now need to call
> create_pinctrl() directly as we don't want to add the pin controller
> to the list of controllers until the hogs are claimed. We also need
> to pass the pinctrl_dev to the device tree parser functions as they
> otherwise won't find the right controller at this point.
>
> Signed-off-by: Tony Lindgren 

This looks a lot better!

So if I understand correctly, we can guarantee that the delayed
work will not execute until the device driver probe() has finished,
and it *will* execute immediately after that?

So:
- Device driver probes
- Delayed work is called
- Next initcall

I'm not 100% familiar with how delayed work works... :/

Yours,
Linus Walleij
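
The ordering asked about in this reply (probe finishes first, then the deferred hog claiming runs) can be modelled in plain C. This is a userspace toy, not the kernel workqueue API: struct deferred_work, toy_probe() and run_pending() are hypothetical names, and the model says nothing about what the real delayed work guarantees relative to the next initcall, which is exactly the open question above.

```c
#include <stddef.h>

struct toy_dev;

struct deferred_work {
	void (*fn)(struct toy_dev *dev);
	int pending;
};

struct toy_dev {
	int probed;		/* set once probe has returned */
	int hogs_claimed;	/* set by the late-init work */
	int claimed_early;	/* set if the work ran before probe finished */
	struct deferred_work late_init;
};

/* Late init: claim hogs, and record whether probe had already finished. */
static void claim_hogs(struct toy_dev *dev)
{
	if (!dev->probed)
		dev->claimed_early = 1;
	dev->hogs_claimed = 1;
}

/* Probe only queues the work; it does not call claim_hogs() itself. */
void toy_probe(struct toy_dev *dev)
{
	dev->late_init.fn = claim_hogs;
	dev->late_init.pending = 1;
	/* ... the rest of probe would run here ... */
	dev->probed = 1;
}

/* Stand-in for the worker: runs queued work after probe has returned. */
void run_pending(struct toy_dev *dev)
{
	if (dev->late_init.pending) {
		dev->late_init.pending = 0;
		dev->late_init.fn(dev);
	}
}
```

In this model the hogs are only claimed once run_pending() is called, so the caller decides how soon after probe that happens.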


Re: [PATCH v12 12/22] vfio: Add notifier callback to parent's ops structure of mdev

2016-11-14 Thread Jike Song
On 11/14/2016 11:42 PM, Kirti Wankhede wrote:
> Add a notifier callback to parent's ops structure of mdev device so that per
> device notifier for vfio module is registered through vfio_mdev module.
> 
> Signed-off-by: Kirti Wankhede 
> Signed-off-by: Neo Jia 
> Change-Id: Iafa6f1721aecdd6e50eb93b153b5621e6d29b637
> ---
>  drivers/vfio/mdev/vfio_mdev.c | 19 +++++++++++++++++++
>  include/linux/mdev.h          |  9 +++++++++
>  2 files changed, 28 insertions(+)
> 
> diff --git a/drivers/vfio/mdev/vfio_mdev.c b/drivers/vfio/mdev/vfio_mdev.c
> index ffc36758cb84..1694b1635607 100644
> --- a/drivers/vfio/mdev/vfio_mdev.c
> +++ b/drivers/vfio/mdev/vfio_mdev.c
> @@ -24,6 +24,15 @@
>  #define DRIVER_AUTHOR   "NVIDIA Corporation"
>  #define DRIVER_DESC "VFIO based driver for Mediated device"
>  
> +static int vfio_mdev_notifier(struct notifier_block *nb, unsigned long action,
> +                              void *data)
> +{
> + struct mdev_device *mdev = container_of(nb, struct mdev_device, nb);
> + struct parent_device *parent = mdev->parent;
> +
> + return parent->ops->notifier(mdev, action, data);
> +}
> +
>  static int vfio_mdev_open(void *device_data)
>  {
>   struct mdev_device *mdev = device_data;
> @@ -40,6 +49,11 @@ static int vfio_mdev_open(void *device_data)
>   if (ret)
>   module_put(THIS_MODULE);
>  
> + if (likely(parent->ops->notifier)) {
> + mdev->nb.notifier_call = vfio_mdev_notifier;
> + if (vfio_register_notifier(&mdev->dev, &mdev->nb))
> + pr_err("Failed to register notifier for mdev\n");
> + }

Hi Kirti,

Could you please move the notifier registration before parent->ops->open()?
As you might know, I'm extending your vfio_register_notifier() to also cover
the attach/detach events between a vfio_group and kvm.  Basically, if the
vfio_group is not attached to any kvm instance, parent->ops->open() should
return -ENODEV to indicate the failure, but to know whether kvm is available
in open(), the notifier has to be registered earlier.

Of course I can call vfio_register_notifier() from an earlier place to
work around it, but that doesn't seem like the canonical way.

--
Thanks,
Jike

>   return ret;
>  }
>  
> @@ -48,6 +62,11 @@ static void vfio_mdev_release(void *device_data)
>   struct mdev_device *mdev = device_data;
>   struct parent_device *parent = mdev->parent;
>  
> + if (likely(parent->ops->notifier)) {
> + if (vfio_unregister_notifier(&mdev->dev, &mdev->nb))
> + pr_err("Failed to unregister notifier for mdev\n");
> + }
> +
>   if (likely(parent->ops->release))
>   parent->ops->release(mdev);
>  
> diff --git a/include/linux/mdev.h b/include/linux/mdev.h
> index 4900cc472364..665afe0a4c31 100644
> --- a/include/linux/mdev.h
> +++ b/include/linux/mdev.h
> @@ -37,6 +37,7 @@ struct mdev_device {
>   struct kref ref;
>   struct list_head        next;
>   struct kobject  *type_kobj;
> + struct notifier_block   nb;
>  };
>  
>  /**
> @@ -85,6 +86,12 @@ struct mdev_device {
>   * @mmap:mmap callback
>   *   @mdev: mediated device structure
>   *   @vma: vma structure
> + * @notifer: Notifier callback, currently only for
> + *   VFIO_IOMMU_NOTIFY_DMA_UNMAP action notified duing
> + *   DMA_UNMAP call on mapped iova range.
> + *   @mdev: mediated device structure
> + *   @action: Action for which notifier is called
> + *   @data: Data associated with the notifier
>   * Parent device that support mediated device should be registered with mdev
>   * module with parent_ops structure.
>   **/
> @@ -106,6 +113,8 @@ struct parent_ops {
>   ssize_t (*ioctl)(struct mdev_device *mdev, unsigned int cmd,
>unsigned long arg);
>   int (*mmap)(struct mdev_device *mdev, struct vm_area_struct *vma);
> + int (*notifier)(struct mdev_device *mdev, unsigned long action,
> + void *data);
>  };
>  
>  /* interface for exporting mdev supported type attributes */
> 
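
The ordering requested in this reply (register the notifier before calling the parent's open(), and undo the registration if open() fails) can be sketched in userspace C. Everything here is a stand-in: toy_mdev, toy_register(), toy_parent_open() and the kvm_attached flag are hypothetical, and the idea that registering immediately reports the current state to the callback is an assumption of the toy. Only container_of mirrors the kernel idiom used in the quoted patch.

```c
#include <errno.h>
#include <stddef.h>

/* Userspace equivalent of the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct toy_notifier {
	int (*call)(struct toy_notifier *nb, unsigned long action);
	int registered;
};

struct toy_mdev {
	int kvm_attached;	/* learned through the notifier */
	struct toy_notifier nb;	/* embedded, like mdev->nb in the patch */
};

/* Callback: recover the enclosing device from the embedded notifier. */
static int toy_notify(struct toy_notifier *nb, unsigned long action)
{
	struct toy_mdev *mdev = container_of(nb, struct toy_mdev, nb);

	mdev->kvm_attached = (action != 0);
	return 0;
}

/* Toy assumption: registering reports the current state right away. */
static int toy_register(struct toy_notifier *nb, unsigned long state)
{
	nb->registered = 1;
	return nb->call(nb, state);
}

/* Parent open() can only succeed if it already knows KVM is attached. */
static int toy_parent_open(struct toy_mdev *mdev)
{
	return mdev->kvm_attached ? 0 : -ENODEV;
}

/* Register first, then open; unwind the registration on failure. */
int toy_open(struct toy_mdev *mdev, unsigned long group_state)
{
	int ret;

	mdev->nb.call = toy_notify;
	ret = toy_register(&mdev->nb, group_state);
	if (ret)
		return ret;

	ret = toy_parent_open(mdev);
	if (ret)
		mdev->nb.registered = 0;
	return ret;
}
```

If the registration came after toy_parent_open(), as in the quoted hunk, the open callback would always see kvm_attached == 0 in this model.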


Re: [kbuild-all] [Patch v6.1] x86/kvm: Add AVX512_4VNNIW and AVX512_4FMAPS support

2016-11-14 Thread Fengguang Wu

Hi He Chen,

On Tue, Nov 15, 2016 at 02:02:23PM +0800, He Chen wrote:

> On Tue, Nov 15, 2016 at 04:24:39AM +0800, kbuild test robot wrote:
> > Hi He,
> >
> > [auto build test ERROR on kvm/linux-next]
> > [also build test ERROR on v4.9-rc5]
> > [cannot apply to next-20161114]
> > [if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
> >
> > url:    https://github.com/0day-ci/linux/commits/He-Chen/x86-kvm-Add-AVX512_4VNNIW-and-AVX512_4FMAPS-support/20161114-170941
> > base:   https://git.kernel.org/pub/scm/virt/kvm/kvm.git linux-next
>
> I have downloaded .config.gz in attachment and use the .config in it
> to build kernel in my local branch again, and I don't see any warn or
> error message.
>
> I wonder whether the previous 0001 and 0002 patches have applied to run
> this test? Or is there something wrong with my compiler or patches?


Sorry the robot is not smart enough to see the 0001/0002 patches.
As you may see from the above url, only this patch is applied on top
of the KVM linux-next branch.

Thanks,
Fengguang


Re: [PATCHSET 0/7] perf sched: Introduce timehist command, again (v1)

2016-11-14 Thread Ingo Molnar

* Namhyung Kim  wrote:

> Hello,
> 
> This patchset is a rebased version of David's sched timehist work [1].
> I plan to improve perf sched command more and think that having
> timehist command before the work looks good.  It seems David is busy
> these days, so I'm retrying it by myself.
> 
> This implements only basic feature and a few options.  I just split
> the patch to make it easier to review and did some cosmetic changes.
> More patches will come later.
> 
> The below is from the David's original description:
> 
> 8<-
> 'perf sched timehist' provides an analysis of scheduling events.
> 
> Example usage:
> perf sched record -- sleep 1
> perf sched timehist


Cool, very nice!

> By default it shows the individual schedule events, including the time between
> sched-in events for the task, the task scheduling delay (time between wakeup
> and actually running) and run time for the task:
> 
>            time cpu  task name[tid/pid]    b/n time sch delay  run time
>   ------------- ---- -------------------- --------- --------- ---------
>    79371.874569 [11] gcc[31949]               0.014     0.000     1.148
>    79371.874591 [10] gcc[31951]               0.000     0.000     0.024
>    79371.874603 [10] migration/10[59]         3.350     0.004     0.011
>    79371.874604 [11] <idle>                   1.148     0.000     0.035
>    79371.874723 [05] <idle>                   0.016     0.000     1.383
>    79371.874746 [05] gcc[31949]               0.153     0.078     0.022
> ...

What does the 'b/n' abbreviation stand for? 'Between'? Could we call the column 
'sch wait' instead, or so?


> Times are in msec.usec.
> 
> If callchains were recorded they are appended to the line with a default 
> stack depth of 5:
> 
>    79371.874569 [11] gcc[31949]          0.000014  0.000000  0.001148
>      wait_for_completion_killable do_fork sys_vfork stub_vfork __vfork
>    79371.874591 [10] gcc[31951]          0.000000  0.000000  0.000024
>      __cond_resched _cond_resched wait_for_completion stop_one_cpu sched_exec
>    79371.874603 [10] migration/10[59]    0.003350  0.000004  0.000011
>      smpboot_thread_fn kthread ret_from_fork
>    79371.874604 [11] <idle>              0.001148  0.000000  0.000035
>      cpu_startup_entry start_secondary
>    79371.874723 [05] <idle>              0.000016  0.000000  0.001383
>      cpu_startup_entry start_secondary
>    79371.874746 [05] gcc[31949]          0.000153  0.000078  0.000022
>      do_wait sys_wait4 system_call_fastpath __GI___waitpid

So when I first saw this it was hard for me to disambiguate individual function 
names. Wouldn't this be a bit more readable:

>    79371.874569 [11] gcc[31949]          0.000014  0.000000  0.001148
>      wait_for_completion_killable() <- do_fork sys_vfork stub_vfork() <- __vfork()
>    79371.874591 [10] gcc[31951]          0.000000  0.000000  0.000024
>      __cond_resched() <- _cond_resched() <- wait_for_completion() <- stop_one_cpu() <- sched_exec()
>    79371.874603 [10] migration/10[59]    0.003350  0.000004  0.000011
>      smpboot_thread_fn() <- kthread() <- ret_from_fork()
>    79371.874604 [11] <idle>              0.001148  0.000000  0.000035
>      cpu_startup_entry() <- start_secondary()
>    79371.874723 [05] <idle>              0.000016  0.000000  0.001383
>      cpu_startup_entry() <- start_secondary()
>    79371.874746 [05] gcc[31949]          0.000153  0.000078  0.000022
>      do_wait() <- sys_wait4() <- system_call_fastpath() <- __GI___waitpid()

Or:

>    79371.874569 [11] gcc[31949]          0.000014  0.000000  0.001148
>      wait_for_completion_killable() <- do_fork sys_vfork stub_vfork() <- __vfork()
>    79371.874591 [10] gcc[31951]          0.000000  0.000000  0.000024
>      __cond_resched()               <- _cond_resched() <- wait_for_completion() <- stop_one_cpu() <- sched_exec()
>    79371.874603 [10] migration/10[59]    0.003350  0.000004  0.000011
>      smpboot_thread_fn()            <- kthread() <- ret_from_fork()
>    79371.874604 [11] <idle>              0.001148  0.000000  0.000035
>      cpu_startup_entry()            <- start_secondary()
>    79371.874723 [05] <idle>              0.000016  0.000000  0.001383
>      cpu_startup_entry()            <- start_secondary()
>    79371.874746 [05] gcc[31949]          0.000153  0.000078  0.000022
>      do_wait()                      <- sys_wait4() <- system_call_fastpath() <- __GI___waitpid()

(i.e. visually separate the first entry - and list the rest.)

Or maybe it could be ASCII color coded so that the different entries are
easier to separate: for example, the functions could be printed in
alternating white/grey colors?
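
The "() <-" separator idea above could be sketched roughly as follows. This
is only an illustration in plain user-space C: format_callchain() is a
hypothetical helper name, not an existing perf function, and the real
timehist code would presumably print symbols straight from its callchain
cursor rather than from a string array.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of perf): join symbol names as
 * "a() <- b() <- c()". Returns the length of the resulting string,
 * or -1 if the buffer is too small.
 */
static int format_callchain(char *buf, size_t size,
                            const char *const *syms, size_t nr)
{
    size_t used = 0;

    assert(buf != NULL);
    if (size == 0)
        return -1;
    buf[0] = '\0';

    for (size_t i = 0; i < nr; i++) {
        /* The separator goes before every entry except the first. */
        int len = snprintf(buf + used, size - used, "%s%s()",
                           i ? " <- " : "", syms[i]);

        if (len < 0 || (size_t)len >= size - used)
            return -1;
        used += (size_t)len;
    }
    return (int)used;
}
```

Calling it with the last callchain from the example (do_wait, sys_wait4,
system_call_fastpath, __GI___waitpid) would give the same line as in the
first proposed layout.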

Thanks,

Ingo


Re: [PATCH] xen-platform: use builtin_pci_driver

2016-11-14 Thread Juergen Gross
On 14/11/16 13:52, Geliang Tang wrote:
> Use builtin_pci_driver() helper to simplify the code.
> 
> Signed-off-by: Geliang Tang 

Reviewed-by: Juergen Gross 

> ---
>  drivers/xen/platform-pci.c | 6 +-
>  1 file changed, 1 insertion(+), 5 deletions(-)
> 
> diff --git a/drivers/xen/platform-pci.c b/drivers/xen/platform-pci.c
> index b59c9455..112ce42 100644
> --- a/drivers/xen/platform-pci.c
> +++ b/drivers/xen/platform-pci.c
> @@ -125,8 +125,4 @@ static struct pci_driver platform_driver = {
>   .id_table =   platform_pci_tbl,
>  };
>  
> -static int __init platform_pci_init(void)
> -{
> - return pci_register_driver(&platform_driver);
> -}
> -device_initcall(platform_pci_init);
> +builtin_pci_driver(platform_driver);
> 
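
For readers unfamiliar with the helper, the effect of the change can be
sketched in user space as below. This is a simplified stand-in, not the
kernel macro: the names with a _sketch suffix and the driver name string
are made up for illustration, and the real builtin_pci_driver() also hooks
the generated function up via device_initcall(), which has no equivalent
here, so the caller invokes the init function by hand.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for struct pci_driver; only a name is kept (illustrative). */
struct pci_driver_sketch {
    const char *name;
};

static const struct pci_driver_sketch *registered[4];
static size_t nr_registered;

/* Stand-in for pci_register_driver(): record the driver, 0 on success. */
static int pci_register_driver_sketch(const struct pci_driver_sketch *drv)
{
    assert(drv != NULL);
    if (nr_registered == sizeof(registered) / sizeof(registered[0]))
        return -1;
    registered[nr_registered++] = drv;
    return 0;
}

/*
 * Generate the init boilerplate that the patch removes by hand:
 * one function named <driver>_init that registers the driver.
 */
#define builtin_driver_sketch(drv, register_fn) \
    static int drv##_init(void) { return register_fn(&(drv)); }

static struct pci_driver_sketch platform_driver = {
    .name = "example-driver",   /* placeholder name */
};

builtin_driver_sketch(platform_driver, pci_register_driver_sketch)
```

The macro line expands to a platform_driver_init() function, so the five
removed lines and the single added line describe the same registration.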



Re: [PATCH v2] f2fs: don't wait writeback for datas during checkpoint

2016-11-14 Thread Chao Yu
Hi Jaegeuk,

On 2016/11/15 7:32, Jaegeuk Kim wrote:
> Hi Chao,
> 
> On Mon, Nov 14, 2016 at 07:04:12PM +0800, Chao Yu wrote:
>> Normally, while committing a checkpoint, we wait for all pages to be
>> written back, whether the page is data or metadata, so in a scenario where
>> lots of data IO is submitted along with metadata, we may suffer long
>> latency waiting for writeback during checkpoint.
>>
>> In fact, we only care about persistence of metadata pages, not data
>> pages, as file system consistency depends only on metadata. To avoid
>> the long latency in the above scenario, let's recognize and count
>> metadata in submitted IOs, and wait for writeback only of metadata.
> 
> Hmm, this raises another concern, related to GCed data, as in the scenario below.
> 
> 1. Write data X
> 2. Sync
> 3. Move data X by GC
> 4. Checkpoint
> 5. Power-cut
> 
> In this case, we must guarantee the persistence of data X, which was
> migrated by GC in #3. If we don't care about end_io in #4 Checkpoint, we
> can lose the data after #5 Power-cut.
> 
> Any idea?

Yes, good catch. :)

What about tagging these GCed pages as cold data through set_cold_data, and
clearing the tag in end_io? Then we can keep a reference count and wait on
writeback for them.
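
A rough user-space sketch of that idea, only to make the counting explicit.
It is not the f2fs implementation: the wb_* names, the two classes and the
gc_moved flag are all hypothetical, and the flag merely stands in for the
cold-data tag mentioned above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical classes: pages a checkpoint must wait for, and the rest. */
enum wb_class { WB_CP_WAIT, WB_NO_WAIT, WB_NR_CLASSES };

struct wb_counters {
    atomic_int nr[WB_NR_CLASSES];
};

/* Hypothetical page state: metadata flag plus a "moved by GC" tag. */
struct wb_page {
    bool is_meta;
    bool gc_moved;
};

/* Metadata and GC-migrated data both count toward the checkpoint wait. */
static enum wb_class wb_classify(const struct wb_page *page)
{
    return (page->is_meta || page->gc_moved) ? WB_CP_WAIT : WB_NO_WAIT;
}

static void wb_submit(struct wb_counters *c, const struct wb_page *page)
{
    atomic_fetch_add(&c->nr[wb_classify(page)], 1);
}

static void wb_end_io(struct wb_counters *c, struct wb_page *page)
{
    /* Decrement under the same class used at submit, then clear the tag. */
    int old = atomic_fetch_sub(&c->nr[wb_classify(page)], 1);

    assert(old > 0);
    page->gc_moved = false;
}

/* The checkpoint only sleeps while tracked pages are still in flight. */
static bool checkpoint_must_wait(struct wb_counters *c)
{
    return atomic_load(&c->nr[WB_CP_WAIT]) != 0;
}
```

The ordering in wb_end_io() matters in this sketch: clearing the tag before
the decrement would charge the completion to the wrong counter.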

Thanks,

> 
> Thanks,
> 
>>
>> Signed-off-by: Chao Yu 
>> ---
>>  fs/f2fs/checkpoint.c |  2 +-
>>  fs/f2fs/data.c   | 36 
>>  fs/f2fs/debug.c  |  7 ---
>>  fs/f2fs/f2fs.h   |  8 +---
>>  4 files changed, 42 insertions(+), 11 deletions(-)
>>
>> diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
>> index 7bece59..bdf8a50 100644
>> --- a/fs/f2fs/checkpoint.c
>> +++ b/fs/f2fs/checkpoint.c
>> @@ -1003,7 +1003,7 @@ static void wait_on_all_pages_writeback(struct f2fs_sb_info *sbi)
>>  for (;;) {
>>  prepare_to_wait(&sbi->cp_wait, &wait, TASK_UNINTERRUPTIBLE);
>>  
>> -if (!atomic_read(&sbi->nr_wb_bios))
>> +if (!get_pages(sbi, F2FS_WB_META))
>>  break;
>>  
>>  io_schedule_timeout(5*HZ);
>> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
>> index 66d2aee..f52cec3 100644
>> --- a/fs/f2fs/data.c
>> +++ b/fs/f2fs/data.c
>> @@ -29,6 +29,26 @@
>>  #include "trace.h"
>>  #include <trace/events/f2fs.h>
>>  
>> +static bool f2fs_is_meta_data(struct page *page)
>> +{
>> +struct address_space *mapping = page->mapping;
>> +struct f2fs_sb_info *sbi;
>> +struct inode *inode;
>> +
>> +/* it is bounce page of encrypted regular inode */
>> +if (!mapping)
>> +return false;
>> +
>> +inode = mapping->host;
>> +sbi = F2FS_I_SB(inode);
>> +
>> +if ((inode->i_ino == F2FS_META_INO(sbi) &&
>> +page->index < MAIN_BLKADDR(sbi)) ||
>> +inode->i_ino ==  F2FS_NODE_INO(sbi) ||
>> +S_ISDIR(inode->i_mode))
>> +return true;
>> +return false;
>> +}
>>  static void f2fs_read_end_io(struct bio *bio)
>>  {
>>  struct bio_vec *bvec;
>> @@ -73,6 +93,7 @@ static void f2fs_write_end_io(struct bio *bio)
>>  
>>  bio_for_each_segment_all(bvec, bio, i) {
>>  struct page *page = bvec->bv_page;
>> +bool is_meta = f2fs_is_meta_data(page);
>>  
>>  fscrypt_pullback_bio_page(&page, true);
>>  
>> @@ -80,9 +101,10 @@ static void f2fs_write_end_io(struct bio *bio)
>>  mapping_set_error(page->mapping, -EIO);
>>  f2fs_stop_checkpoint(sbi, true);
>>  }
>> +dec_page_count(sbi, is_meta ? F2FS_WB_META : F2FS_WB_DATA);
>>  end_page_writeback(page);
>>  }
>> -if (atomic_dec_and_test(&sbi->nr_wb_bios) &&
>> +if (!get_pages(sbi, F2FS_WB_META) &&
>>  wq_has_sleeper(&sbi->cp_wait))
>>  wake_up(&sbi->cp_wait);
>>  
>> @@ -111,7 +133,6 @@ static inline void __submit_bio(struct f2fs_sb_info *sbi,
>>  struct bio *bio, enum page_type type)
>>  {
>>  if (!is_read_io(bio_op(bio))) {
>> -atomic_inc(&sbi->nr_wb_bios);
>>  if (f2fs_sb_mounted_blkzoned(sbi->sb) &&
>>  current->plug && (type == DATA || type == NODE))
>>  blk_finish_plug(current->plug);
>> @@ -272,6 +293,15 @@ void f2fs_submit_page_mbio(struct f2fs_io_info *fio)
>>  verify_block_addr(sbi, fio->old_blkaddr);
>>  verify_block_addr(sbi, fio->new_blkaddr);
>>  
>> +bio_page = fio->encrypted_page ? fio->encrypted_page : fio->page;
>> +
>> +if (!is_read) {
>> +bool is_meta;
>> +
>> +is_meta = f2fs_is_meta_data(bio_page);
>> +inc_page_count(sbi, is_meta ? F2FS_WB_META : F2FS_WB_DATA);
>> +}
>> +
>>  down_write(&io->io_rwsem);
>>  
>>  if (io->bio && (io->last_block_in_bio != fio->new_blkaddr - 1 ||
>> @@ -284,8 +314,6 @@ void f2fs_submit_page_mbio(struct f2fs_io_info *fio)
>>  io->fio = *fio;
>>  }
>>  
>> -bio_page = 

[PATCH v2] kvm: x86: don't print warning messages for unimplemented msrs

2016-11-14 Thread Bandan Das

Change unimplemented msrs messages to use pr_debug.
If CONFIG_DYNAMIC_DEBUG is set, then these messages can be
enabled at run time or else -DDEBUG can be used at compile
time to enable them. These messages will still be printed if
ignore_msrs=1.

Signed-off-by: Bandan Das 
---
v2:
use kvm_debug_ratelimited for vcpu_debug_ratelimited

This is a follow up to RFC posted by Dave at
https://patchwork.kernel.org/patch/9238227/ which uses pr_debug_ratelimited
when ignore_msrs is not set.

 arch/x86/kvm/mmu.c   | 2 +-
 arch/x86/kvm/x86.c   | 5 +++--
 include/linux/kvm_host.h | 6 ++
 3 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index d9c7e98..1b3f241 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -4958,7 +4958,7 @@ void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, struct kvm_memslots *slots)
 * zap all shadow pages.
 */
if (unlikely((slots->generation & MMIO_GEN_MASK) == 0)) {
-   printk_ratelimited(KERN_DEBUG "kvm: zapping shadow pages for mmio generation wraparound\n");
+   kvm_debug_ratelimited("kvm: zapping shadow pages for mmio generation wraparound\n");
kvm_mmu_invalidate_zap_all_pages(kvm);
}
 }
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3017de0..5d50403 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2280,7 +2280,7 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
if (kvm_pmu_is_valid_msr(vcpu, msr))
return kvm_pmu_set_msr(vcpu, msr_info);
if (!ignore_msrs) {
-   vcpu_unimpl(vcpu, "unhandled wrmsr: 0x%x data 0x%llx\n",
+   vcpu_debug_ratelimited(vcpu, "unhandled wrmsr: 0x%x data 0x%llx\n",
msr, data);
return 1;
} else {
@@ -2492,7 +2492,8 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
if (kvm_pmu_is_valid_msr(vcpu, msr_info->index))
return kvm_pmu_get_msr(vcpu, msr_info->index, &msr_info->data);
if (!ignore_msrs) {
-   vcpu_unimpl(vcpu, "unhandled rdmsr: 0x%x\n", msr_info->index);
+   vcpu_debug_ratelimited(vcpu, "unhandled rdmsr: 0x%x\n",
+  msr_info->index);
return 1;
} else {
vcpu_unimpl(vcpu, "ignored rdmsr: 0x%x\n", msr_info->index);
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 01c0b9c..274bf34 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -439,6 +439,9 @@ struct kvm {
pr_info("kvm [%i]: " fmt, task_pid_nr(current), ## __VA_ARGS__)
 #define kvm_debug(fmt, ...) \
pr_debug("kvm [%i]: " fmt, task_pid_nr(current), ## __VA_ARGS__)
+#define kvm_debug_ratelimited(fmt, ...) \
+   pr_debug_ratelimited("kvm [%i]: " fmt, task_pid_nr(current), \
+## __VA_ARGS__)
 #define kvm_pr_unimpl(fmt, ...) \
pr_err_ratelimited("kvm [%i]: " fmt, \
   task_tgid_nr(current), ## __VA_ARGS__)
@@ -450,6 +453,9 @@ struct kvm {
 
 #define vcpu_debug(vcpu, fmt, ...) \
kvm_debug("vcpu%i " fmt, (vcpu)->vcpu_id, ## __VA_ARGS__)
+#define vcpu_debug_ratelimited(vcpu, fmt, ...) \
+   kvm_debug_ratelimited("vcpu%i " fmt, (vcpu)->vcpu_id,   \
+ ## __VA_ARGS__)
 #define vcpu_err(vcpu, fmt, ...)   \
kvm_err("vcpu%i " fmt, (vcpu)->vcpu_id, ## __VA_ARGS__)
 
-- 
2.9.3
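
The macro layering added in kvm_host.h above (a per-vcpu wrapper that
prepends "vcpu%i " and forwards to a rate-limited printer) can be tried out
in user space with a sketch like the one below. Everything here is an
assumption made for illustration: the ratelimit struct, the explicit "now"
argument and the dbg_* macro names are made up, and the kernel's
pr_debug_ratelimited() uses its own ratelimit state and dynamic debug
machinery instead. The ", ##__VA_ARGS__" form is the same GNU C extension
the kernel macros rely on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Illustrative limiter: at most `burst` messages per `interval` ticks. */
struct ratelimit {
    long interval;
    int burst;
    long begin;     /* start of the current window */
    int printed;    /* messages emitted in this window */
    int missed;     /* messages suppressed so far */
};

static bool ratelimit_ok(struct ratelimit *rl, long now)
{
    assert(rl != NULL);
    if (now - rl->begin >= rl->interval) {
        /* New window: reset the budget. */
        rl->begin = now;
        rl->printed = 0;
    }
    if (rl->printed < rl->burst) {
        rl->printed++;
        return true;
    }
    rl->missed++;
    return false;
}

/* Evaluates to printf()'s return value, or 0 when suppressed. */
#define dbg_ratelimited(rl, now, fmt, ...) \
    (ratelimit_ok((rl), (now)) ? printf("dbg: " fmt, ##__VA_ARGS__) : 0)

/* Per-vcpu wrapper: prepend the vcpu id, then forward everything. */
#define vcpu_dbg_ratelimited(rl, now, id, fmt, ...) \
    dbg_ratelimited(rl, now, "vcpu%d " fmt, (id), ##__VA_ARGS__)
```

With interval = 5 and burst = 2, the third call inside one window is
dropped and counted in `missed`, which is roughly the behaviour a guest
hammering an unimplemented MSR would see.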



Re: [PATCH v11 10/22] vfio iommu type1: Add support for mediated devices

2016-11-14 Thread Kirti Wankhede


On 11/15/2016 10:47 AM, Alexey Kardashevskiy wrote:
> On 08/11/16 17:52, Alexey Kardashevskiy wrote:
>> On 05/11/16 08:10, Kirti Wankhede wrote:
>>> VFIO IOMMU drivers are designed for the devices which are IOMMU capable.
>>> Mediated device only uses IOMMU APIs, the underlying hardware can be
>>> managed by an IOMMU domain.
>>>
>>> Aim of this change is:
>>> - To use most of the code of TYPE1 IOMMU driver for mediated devices
>>> - To support direct assigned device and mediated device in single module
>>>
>>> This change adds pin and unpin support for mediated device to TYPE1 IOMMU
>>> backend module. More details:
>>> - vfio_pin_pages() callback here uses task and address space of vfio_dma,
>>>   that is, of the process who mapped that iova range.
>>> - Added pfn_list tracking logic to address space structure. All pages
>>>   pinned through this interface are tracked in its address space.
>>> - Pinned pages list is used to verify unpinning request and to unpin
>>>   remaining pages while detaching the group for that device.
>>> - Page accounting is updated to account in its address space where the
>>>   pages are pinned/unpinned.
>>> - Accounting for mdev device is only done if there is no iommu capable
>>>   domain in the container. When there is a direct device assigned to the
>>>   container and that domain is iommu capable, all pages are already pinned
>>>   during DMA_MAP.
>>> - Page accounting is updated on hot plug and unplug mdev device and pass
>>>   through device.
>>>
>>> Tested by assigning below combinations of devices to a single VM:
>>> - GPU pass through only
>>
>> This does not require this patchset, right?
>>

Sorry I missed this earlier.
This testing is required for this patch because the patch touches code
that is used for direct device assignment. For page accounting, all cases
are considered: a container with only a pass-through device, and a
container with a pass-through device plus a vGPU device. We also have to
test that pages are pinned properly when a device is hotplugged. In that
case vfio_iommu_replay() is called to take the necessary action.

>>> - vGPU device only
>>
>> Out of curiosity - how exactly did you test this? The exact GPU, how to
>> create vGPU, what was the QEMU command line and the guest does with this
>> passed device? Thanks.
> 
> ping?
> 

I'm testing this code with an M60, with custom changes in our driver.
The steps to create a mediated device are listed in
Documentation/vfio-mediated-device.txt for the sample mtty driver. I'm
following the same steps for the GPU. Quoting those steps here for you:

2. Create a mediated device by using the dummy device that you created in the
   previous step.

   # echo "83b8f4f2-509f-382f-3c1e-e6bfe0fa1001" > \
     /sys/devices/virtual/mtty/mtty/mdev_supported_types/mtty-2/create

3. Add parameters to qemu-kvm.

   -device vfio-pci,\
    sysfsdev=/sys/bus/mdev/devices/83b8f4f2-509f-382f-3c1e-e6bfe0fa1001


Thanks,
Kirti



Re: [RFC PATCH] xen/x86: Increase xen_e820_map to E820_X_MAX possible entries

2016-11-14 Thread Juergen Gross
On 15/11/16 01:11, Alex Thorlton wrote:
> Hey everyone,
> 
> We're having problems with large systems hitting a BUG in
> xen_memory_setup, due to extra e820 entries created in the
> XENMEM_machine_memory_map callback.  The change in the patch gets things
> working, but Boris and I wanted to get opinions on whether or not this
> is the appropriate/entire solution, which is why I've sent it as an RFC
> for now.
> 
> Boris pointed out to me that E820_X_MAX is only large when CONFIG_EFI=y,
> which is a detail worth discussing.  He proposed possibly adding
> CONFIG_XEN to the conditions under which we set E820_X_MAX to a larger
> value than E820MAX, since the Xen e820 table isn't bound by the
> zero-page memory limitations.
> 
> I do *slightly* question the use of E820_X_MAX here, only from a
> cosmetic perspective, as I believe this macro is intended to describe
> the maximum size of the extended e820 table, which, AFAIK, is not used
> by the Xen HV.  That being said, there isn't exactly a "more
> appropriate" macro/variable to use, so this may not really be an issue.
> 
> Any input on the patch, or the questions I've raised above is greatly
> appreciated!

While I think extending the e820 table is the right thing to do I'm
questioning the assumptions here.

Looking briefly through the Xen hypervisor sources I think it isn't
yet ready for such large machines: the hypervisor's e820 map seems to
be still limited to 128 e820 entries. Jan, did I overlook an EFI
specific path extending this limitation?

In case I'm right the Xen hypervisor should be prepared for a larger
e820 map, but this won't help alone as there would still be additional
entries for the IOAPICs created.

So I think we need something like:

#define E820_XEN_MAX (E820_X_MAX + MAX_IO_APICS)

and use this for sizing xen_e820_map[].
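
To make the sizing concrete, here is a small stand-alone sketch. The numeric
values are placeholders chosen for illustration (the real E820_X_MAX and
MAX_IO_APICS depend on the kernel configuration), the struct is a simplified
stand-in for the kernel's e820 entry, and xen_e820_add() is a hypothetical
helper that returns an error instead of hitting a BUG when the map is full.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder values; the real ones come from kernel headers and config. */
#define E820MAX       128
#define E820_X_MAX    (E820MAX + 3 * 256)
#define MAX_IO_APICS  128

/* Room for the full extended map plus one extra entry per IOAPIC. */
#define E820_XEN_MAX  (E820_X_MAX + MAX_IO_APICS)

/* Simplified stand-in for the kernel's e820 entry. */
struct e820entry_sketch {
    uint64_t addr;
    uint64_t size;
    uint32_t type;
};

static struct e820entry_sketch xen_e820_map[E820_XEN_MAX];
static size_t xen_e820_entries;

/* Hypothetical helper: append one entry, -1 instead of a BUG when full. */
static int xen_e820_add(uint64_t addr, uint64_t size, uint32_t type)
{
    assert(xen_e820_entries <= E820_XEN_MAX);
    if (xen_e820_entries == E820_XEN_MAX)
        return -1;
    xen_e820_map[xen_e820_entries].addr = addr;
    xen_e820_map[xen_e820_entries].size = size;
    xen_e820_map[xen_e820_entries].type = type;
    xen_e820_entries++;
    return 0;
}
```

With these placeholder numbers the array holds E820_X_MAX regular entries
and still has MAX_IO_APICS slots left for the IOAPIC entries.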


Juergen


Re: [PATCH v11 10/22] vfio iommu type1: Add support for mediated devices

2016-11-14 Thread Kirti Wankhede


On 11/15/2016 10:47 AM, Alexey Kardashevskiy wrote:
> On 08/11/16 17:52, Alexey Kardashevskiy wrote:
>> On 05/11/16 08:10, Kirti Wankhede wrote:
>>> VFIO IOMMU drivers are designed for the devices which are IOMMU capable.
>>> Mediated device only uses IOMMU APIs, the underlying hardware can be
>>> managed by an IOMMU domain.
>>>
>>> Aim of this change is:
>>> - To use most of the code of TYPE1 IOMMU driver for mediated devices
>>> - To support direct assigned device and mediated device in single module
>>>
>>> This change adds pin and unpin support for mediated device to TYPE1 IOMMU
>>> backend module. More details:
>>> - vfio_pin_pages() callback here uses task and address space of vfio_dma,
>>>   that is, of the process who mapped that iova range.
>>> - Added pfn_list tracking logic to address space structure. All pages
>>>   pinned through this interface are trached in its address space.
>>> - Pinned pages list is used to verify unpinning request and to unpin
>>>   remaining pages while detaching the group for that device.
>>> - Page accounting is updated to account in its address space where the
>>>   pages are pinned/unpinned.
>>> -  Accouting for mdev device is only done if there is no iommu capable
>>>   domain in the container. When there is a direct device assigned to the
>>>   container and that domain is iommu capable, all pages are already pinned
>>>   during DMA_MAP.
>>> - Page accouting is updated on hot plug and unplug mdev device and pass
>>>   through device.
>>>
>>> Tested by assigning below combinations of devices to a single VM:
>>> - GPU pass through only
>>
>> This does not require this patchset, right?
>>

Sorry I missed this earlier.
This testing is required for this patch, because this patch touches code
that is used for direct device assignment. Also for page accounting, all
cases are considered i.e. when there is only pass through device in a
container, when there is pass through device + vGPU device in a
container. Also have to test that pages are pinned properly when device
is hotplugged. In that case vfio_iommu_replay() is called to take
necessary action.

>>> - vGPU device only
>>
>> Out of curiosity - how exactly did you test this? The exact GPU, how to
>> create vGPU, what was the QEMU command line and the guest does with this
>> passed device? Thanks.
> 
> ping?
> 

I'm testing this code with M60, with custom changes in our driver.
Steps how to create mediated device are listed in
Documentation/vfio-mediated-device.txt for sample mtty driver. Same
steps I'm following for GPU. Quoting those steps here for you:

2. Create a mediated device by using the dummy device that you created
in the
   previous step.

   # echo "83b8f4f2-509f-382f-3c1e-e6bfe0fa1001" >  \

/sys/devices/virtual/mtty/mtty/mdev_supported_types/mtty-2/create

3. Add parameters to qemu-kvm.

   -device vfio-pci,\
sysfsdev=/sys/bus/mdev/devices/83b8f4f2-509f-382f-3c1e-e6bfe0fa1001


Thanks,
Kirti



Re: [RFC PATCH] xen/x86: Increase xen_e820_map to E820_X_MAX possible entries

2016-11-14 Thread Juergen Gross
On 15/11/16 01:11, Alex Thorlton wrote:
> Hey everyone,
> 
> We're having problems with large systems hitting a BUG in
> xen_memory_setup, due to extra e820 entries created in the
> XENMEM_machine_memory_map callback.  The change in the patch gets things
> working, but Boris and I wanted to get opinions on whether or not this
> is the appropriate/entire solution, which is why I've sent it as an RFC
> for now.
> 
> Boris pointed out to me that E820_X_MAX is only large when CONFIG_EFI=y,
> which is a detail worth discussing.  He proposed possibly adding
> CONFIG_XEN to the conditions under which we set E820_X_MAX to a larger
> value than E820MAX, since the Xen e820 table isn't bound by the
> zero-page memory limitations.
> 
> I do *slightly* question the use of E820_X_MAX here, only from a
> cosmetic perspective, as I believe this macro is intended to describe
> the maximum size of the extended e820 table, which, AFAIK, is not used
> by the Xen HV.  That being said, there isn't exactly a "more
> appropriate" macro/variable to use, so this may not really be an issue.
> 
> Any input on the patch, or the questions I've raised above is greatly
> appreciated!

While I think extending the e820 table is the right thing to do I'm
questioning the assumptions here.

Looking briefly through the Xen hypervisor sources I think it isn't
yet ready for such large machines: the hypervisor's e820 map seems to
be still limited to 128 e820 entries. Jan, did I overlook an EFI
specific path extending this limitation?

In case I'm right the Xen hypervisor should be prepared for a larger
e820 map, but this won't help alone as there would still be additional
entries for the IOAPICs created.

So I think we need something like:

#define E820_XEN_MAX (E820_X_MAX + MAX_IO_APICS)

and use this for sizing xen_e820_map[].
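
A minimal sketch of that sizing, as a stand-alone C fragment rather than
kernel code. The numeric values of E820MAX, E820_X_MAX and MAX_IO_APICS
below are placeholders chosen for illustration (the real ones depend on
the kernel configuration), and the struct and helper names ending in
_sketch, as well as xen_e820_capacity() and xen_e820_fits(), are
hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder values, for illustration only. */
#define E820MAX       128
#define E820_X_MAX    (E820MAX + 3 * 64)	/* assumed extended table size */
#define MAX_IO_APICS  128			/* assumed IOAPIC limit */

/* Sizing suggested above: extended table plus one entry per IOAPIC. */
#define E820_XEN_MAX  (E820_X_MAX + MAX_IO_APICS)

/* Simplified stand-in for the kernel's e820 entry type. */
struct e820_entry_sketch {
	unsigned long long addr;
	unsigned long long size;
	unsigned int type;
};

static struct e820_entry_sketch xen_e820_map_sketch[E820_XEN_MAX];

/* Number of entries the table can hold. */
static size_t xen_e820_capacity(void)
{
	return sizeof(xen_e820_map_sketch) / sizeof(xen_e820_map_sketch[0]);
}

/*
 * Returns 1 if nr_entries memory entries plus nr_ioapics additional
 * entries fit into the table, 0 otherwise.
 */
static int xen_e820_fits(size_t nr_entries, size_t nr_ioapics)
{
	/* The IOAPIC count is bounded by MAX_IO_APICS by assumption. */
	assert(nr_ioapics <= MAX_IO_APICS);
	return nr_entries + nr_ioapics <= xen_e820_capacity();
}
```

With these placeholder values the table holds 448 entries, so a full
extended map still leaves room for the IOAPIC entries.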


Juergen


Re: [PATCH -tip v2 2/6] selftests: ftrace: Initialize ftrace before each test

2016-11-14 Thread Masami Hiramatsu
On Mon, 14 Nov 2016 13:12:00 -0500
Steven Rostedt  wrote:

> On Sun, 30 Oct 2016 15:54:10 +0900
> Masami Hiramatsu  wrote:
> 
> > Reset ftrace to initial state before running each test.
> > This fixes some test cases to enable tracing before starting
> > trace test. This can avoid false-positive failure when
> > previous testcase fails while disabling tracing.
> > 
> > Signed-off-by: Masami Hiramatsu 
> > Suggested-by: Steven Rostedt 
> > ---
> >  tools/testing/selftests/ftrace/ftracetest   |2 +-
> >  tools/testing/selftests/ftrace/test.d/functions |   25 
> > +++
> >  2 files changed, 26 insertions(+), 1 deletion(-)
> > 
> > diff --git a/tools/testing/selftests/ftrace/ftracetest 
> > b/tools/testing/selftests/ftrace/ftracetest
> > index 4c6a0bf..a03d366 100755
> > --- a/tools/testing/selftests/ftrace/ftracetest
> > +++ b/tools/testing/selftests/ftrace/ftracetest
> > @@ -228,7 +228,7 @@ trap 'SIG_RESULT=$XFAIL' $SIG_XFAIL
> >  
> >  __run_test() { # testfile
> ># setup PID and PPID, $$ is not updated.
> > -  (cd $TRACING_DIR; read PID _ < /proc/self/stat ; set -e; set -x; . $1)
> > +  (cd $TRACING_DIR; read PID _ < /proc/self/stat; set -e; set -x; initialize_ftrace; . $1)
> >[ $? -ne 0 ] && kill -s $SIG_FAIL $SIG_PID
> >  }
> >  
> > diff --git a/tools/testing/selftests/ftrace/test.d/functions 
> > b/tools/testing/selftests/ftrace/test.d/functions
> > index c37262f..fbaf565 100644
> > --- a/tools/testing/selftests/ftrace/test.d/functions
> > +++ b/tools/testing/selftests/ftrace/test.d/functions
> > @@ -23,3 +23,28 @@ reset_trigger() { # reset all current setting triggers
> >  done
> >  }
> >  
> > +reset_events_filter() { # reset all current setting filters
> > +grep -v ^none events/*/*/filter |
> > +while read line; do
> > +   echo 0 > `echo $line | cut -f1 -d:`
> > +done
> > +}
> > +
> > +disable_events() {
> > +echo 0 > events/enable
> > +}
> > +
> > +initialize_ftrace() { # Reset ftrace to initial-state
> > +# As the initial state, ftrace will be set to nop tracer,
> > +# no events, no triggers, no filters, no function filters,
> > +# no probes, and tracing on.
> > +disable_tracing
> > +reset_tracer
> > +reset_trigger
> > +reset_events_filter
> > +disable_events
> > +echo | tee set_ftrace_* set_graph_* stack_trace_filter set_event_pid
> 
> I just disabled function graph tracing, and this causes every test to
> fail.
> 
>tee: set_graph_*: Permission denied

Oops, right. OK, I'll fix that.

Thanks!
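
One way to avoid this class of failure is to reset only the files that
a pattern actually matches, so that an unmatched pattern such as
set_graph_* is skipped instead of being used as a literal file name.
The sketch below shows the idea in C with glob(3), purely as an
illustration; it is not the fix that was applied to the shell test, and
reset_matching_files() is a hypothetical helper:

```c
#include <assert.h>
#include <glob.h>
#include <stdio.h>

/*
 * Truncate every file matching pattern by opening it for writing.
 * A pattern with no match is skipped rather than treated as a file
 * name. Returns the number of files reset, or -1 on error.
 */
static int reset_matching_files(const char *pattern)
{
	glob_t g;
	int count = 0;
	int ret;

	assert(pattern != NULL);

	ret = glob(pattern, 0, NULL, &g);
	if (ret == GLOB_NOMATCH)
		return 0;		/* nothing to reset, not an error */
	if (ret != 0) {
		globfree(&g);
		return -1;
	}

	for (size_t i = 0; i < g.gl_pathc; i++) {
		FILE *f = fopen(g.gl_pathv[i], "w");	/* "w" truncates */

		if (!f) {
			count = -1;
			break;
		}
		fclose(f);
		count++;
	}
	globfree(&g);
	return count;
}
```

A caller would invoke it once per pattern, for example with
"set_ftrace_*" and "set_graph_*", and treat a return value of 0 as
"feature not available" rather than as a failure.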

> 
> -- Steve
> 
> > +echo > kprobe_events
> > +echo > uprobe_events
> > +enable_tracing
> > +}
> 


-- 
Masami Hiramatsu 


Re: kvm: deadlock between kvm_vm_ioctl_get_dirty_log/kvm_hv_set_msr_common/kvm_create_pit

2016-11-14 Thread Dmitry Vyukov
On Tue, Nov 15, 2016 at 7:27 AM, Dmitry Vyukov  wrote:
> Hello,
>
> The following program produces a deadlocked, unkillable process:
> https://gist.githubusercontent.com/dvyukov/fb7e93f6618f4eccb84d419ea6cec491/raw/a14b60250e593eb1b61f50cead41059dc49ceff2/gistfile1.txt
>
>
> # cat /proc/9362/task/*/stack
> [] __synchronize_srcu+0x2f8/0x4a0 kernel/rcu/srcu.c:448
> [] synchronize_srcu_expedited+0x13/0x20 
> kernel/rcu/srcu.c:510
> [] kvm_io_bus_register_dev+0x2ab/0x3e0
> arch/x86/kvm/../../../virt/kvm/kvm_main.c:3559
> [] kvm_create_pit+0x5c6/0x8c0 arch/x86/kvm/i8254.c:694
> [] kvm_arch_vm_ioctl+0x1406/0x23c0 arch/x86/kvm/x86.c:3956
> [] kvm_vm_ioctl+0x1fa/0x1a70
> arch/x86/kvm/../../../virt/kvm/kvm_main.c:3099
> [< inline >] vfs_ioctl fs/ioctl.c:43
> [] do_vfs_ioctl+0x1c4/0x1630 fs/ioctl.c:679
> [< inline >] SYSC_ioctl fs/ioctl.c:694
> [] SyS_ioctl+0x94/0xc0 fs/ioctl.c:685
> [] entry_SYSCALL_64_fastpath+0x23/0xc6
> arch/x86/entry/entry_64.S:209
> [] 0x
>
> [] kvm_hv_set_msr_common+0x163/0x2a30
> arch/x86/kvm/hyperv.c:1145
> [] kvm_set_msr_common+0xb0b/0x23a0 arch/x86/kvm/x86.c:2261
> [] vmx_set_msr+0x27d/0xcb0 arch/x86/kvm/vmx.c:3149
> [] kvm_set_msr+0xd9/0x170 arch/x86/kvm/x86.c:1084
> [] do_set_msr+0x123/0x1a0 arch/x86/kvm/x86.c:1113
> [< inline >] __msr_io arch/x86/kvm/x86.c:2523
> [] msr_io+0x250/0x460 arch/x86/kvm/x86.c:2560
> [] kvm_arch_vcpu_ioctl+0x360/0x44a0 arch/x86/kvm/x86.c:3401
> [] kvm_vcpu_ioctl+0x237/0x11c0
> arch/x86/kvm/../../../virt/kvm/kvm_main.c:2710
> [< inline >] vfs_ioctl fs/ioctl.c:43
> [] do_vfs_ioctl+0x1c4/0x1630 fs/ioctl.c:679
> [< inline >] SYSC_ioctl fs/ioctl.c:694
> [] SyS_ioctl+0x94/0xc0 fs/ioctl.c:685
> [] entry_SYSCALL_64_fastpath+0x23/0xc6
> arch/x86/entry/entry_64.S:209
>
> [] 0x
> [] kvm_vm_ioctl_get_dirty_log+0x8f/0x210
> arch/x86/kvm/x86.c:3779
> [] kvm_vm_ioctl+0x11e4/0x1a70
> arch/x86/kvm/../../../virt/kvm/kvm_main.c:2969
> [< inline >] vfs_ioctl fs/ioctl.c:43
> [] do_vfs_ioctl+0x1c4/0x1630 fs/ioctl.c:679
> [< inline >] SYSC_ioctl fs/ioctl.c:694
> [] SyS_ioctl+0x94/0xc0 fs/ioctl.c:685
> [] entry_SYSCALL_64_fastpath+0x23/0xc6
> arch/x86/entry/entry_64.S:209
> [] 0x
>
>
> INFO: task syz-executor:5833 blocked for more than 120 seconds.
>   Not tainted 4.9.0-rc5+ #28
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> syz-executorD17872  5833   4082 0x0004
>  880033944780 8800602f5100 8800652b0c80 8800391a2380
>  88006d122cd8 8800368763a8 8812c15c 41b58ab3
>  88006d123668 88006d123640 110006d0ec5c 88006d122cd8
> Call Trace:
>  [] schedule+0x10d/0x460 kernel/sched/core.c:3457
>  [] schedule_preempt_disabled+0x15/0x20
> kernel/sched/core.c:3490
>  [< inline >] __mutex_lock_common kernel/locking/mutex.c:582
>  [] mutex_lock_nested+0x686/0xf20 kernel/locking/mutex.c:621
>  [] kvm_hv_set_msr_common+0x163/0x2a30
> arch/x86/kvm/hyperv.c:1145
>  [] kvm_set_msr_common+0xb0b/0x23a0 arch/x86/kvm/x86.c:2261
>  [] vmx_set_msr+0x27d/0xcb0 arch/x86/kvm/vmx.c:3149
>  [] kvm_set_msr+0xd9/0x170 arch/x86/kvm/x86.c:1084
>  [] do_set_msr+0x123/0x1a0 arch/x86/kvm/x86.c:1113
>  [< inline >] __msr_io arch/x86/kvm/x86.c:2523
>  [] msr_io+0x250/0x460 arch/x86/kvm/x86.c:2560
>  [] kvm_arch_vcpu_ioctl+0x360/0x44a0 arch/x86/kvm/x86.c:3401
>  [] kvm_vcpu_ioctl+0x237/0x11c0
> arch/x86/kvm/../../../virt/kvm/kvm_main.c:2708
>  [< inline >] vfs_ioctl fs/ioctl.c:43
>  [] do_vfs_ioctl+0x1c4/0x1630 fs/ioctl.c:679
>  [< inline >] SYSC_ioctl fs/ioctl.c:694
>  [] SyS_ioctl+0x94/0xc0 fs/ioctl.c:685
>  [] entry_SYSCALL_64_fastpath+0x23/0xc6
>
> [ 3319.345108] Showing all locks held in the system:
> [ 3319.349897] 2 locks held by khungtaskd/1328:
> [ 3319.352888]  #0: [ 3319.354562]  (
> rcu_read_lock[ 3319.358168] ){..}
> , at: [ 3319.360511] [] watchdog+0x1cc/0xd70
> [ 3319.363841]  #1: [ 3319.364761]  (
> tasklist_lock[ 3319.367215] ){.+.+..}
> , at: [ 3319.369197] [] debug_show_all_locks+0xd2/0x420
> [ 3319.374809] 3 locks held by syz-executor/5833:
> [ 3319.388745]  #0: [ 3319.390145]  (
> >mutex[ 3319.391749] ){+.+.+.}
> , at: [ 3319.392313] [] vcpu_load+0x21/0x70
> [ 3319.396281]  #1: [ 3319.398802]  (
> >srcu[ 3319.399431] ){..}
> , at: [ 3319.399883] [] msr_io+0x148/0x460
> [ 3319.403905]  #2: [ 3319.404639]  (
> >lock[ 3319.406582] ){+.+.+.}
> , at: [ 3319.409670] [] kvm_hv_set_msr_common+0x163/0x2a30
> [ 3319.422421] 2 locks held by syz-executor/5849:
> [ 3319.425646]  #0: [ 3319.426948]  (
> >lock[ 3319.427747] ){+.+.+.}
> , at: [ 3319.428368] [] kvm_arch_vm_ioctl+0xb4e/0x23c0
> [ 3319.429594]  #1: [ 3319.429942]  (
> >slots_lock[ 3319.430881] ){+.+.+.}
> , at: [ 3319.431631] [] kvm_create_pit+0x589/0x8c0
>
>
> On commit a25f0944ba9b1d8a6813fd6f1a86f1bd59ac25a6 (Nov 13)


kvm_vm_ioctl_get_dirty_log is probably unrelated because I also see
following deadlocks:

# cat 

kvm: deadlock between kvm_vm_ioctl_get_dirty_log/kvm_hv_set_msr_common/kvm_create_pit

2016-11-14 Thread Dmitry Vyukov
Hello,

The following program produces a deadlocked, unkillable process:
https://gist.githubusercontent.com/dvyukov/fb7e93f6618f4eccb84d419ea6cec491/raw/a14b60250e593eb1b61f50cead41059dc49ceff2/gistfile1.txt


# cat /proc/9362/task/*/stack
[] __synchronize_srcu+0x2f8/0x4a0 kernel/rcu/srcu.c:448
[] synchronize_srcu_expedited+0x13/0x20 kernel/rcu/srcu.c:510
[] kvm_io_bus_register_dev+0x2ab/0x3e0
arch/x86/kvm/../../../virt/kvm/kvm_main.c:3559
[] kvm_create_pit+0x5c6/0x8c0 arch/x86/kvm/i8254.c:694
[] kvm_arch_vm_ioctl+0x1406/0x23c0 arch/x86/kvm/x86.c:3956
[] kvm_vm_ioctl+0x1fa/0x1a70
arch/x86/kvm/../../../virt/kvm/kvm_main.c:3099
[< inline >] vfs_ioctl fs/ioctl.c:43
[] do_vfs_ioctl+0x1c4/0x1630 fs/ioctl.c:679
[< inline >] SYSC_ioctl fs/ioctl.c:694
[] SyS_ioctl+0x94/0xc0 fs/ioctl.c:685
[] entry_SYSCALL_64_fastpath+0x23/0xc6
arch/x86/entry/entry_64.S:209
[] 0x

[] kvm_hv_set_msr_common+0x163/0x2a30
arch/x86/kvm/hyperv.c:1145
[] kvm_set_msr_common+0xb0b/0x23a0 arch/x86/kvm/x86.c:2261
[] vmx_set_msr+0x27d/0xcb0 arch/x86/kvm/vmx.c:3149
[] kvm_set_msr+0xd9/0x170 arch/x86/kvm/x86.c:1084
[] do_set_msr+0x123/0x1a0 arch/x86/kvm/x86.c:1113
[< inline >] __msr_io arch/x86/kvm/x86.c:2523
[] msr_io+0x250/0x460 arch/x86/kvm/x86.c:2560
[] kvm_arch_vcpu_ioctl+0x360/0x44a0 arch/x86/kvm/x86.c:3401
[] kvm_vcpu_ioctl+0x237/0x11c0
arch/x86/kvm/../../../virt/kvm/kvm_main.c:2710
[< inline >] vfs_ioctl fs/ioctl.c:43
[] do_vfs_ioctl+0x1c4/0x1630 fs/ioctl.c:679
[< inline >] SYSC_ioctl fs/ioctl.c:694
[] SyS_ioctl+0x94/0xc0 fs/ioctl.c:685
[] entry_SYSCALL_64_fastpath+0x23/0xc6
arch/x86/entry/entry_64.S:209

[] 0x
[] kvm_vm_ioctl_get_dirty_log+0x8f/0x210
arch/x86/kvm/x86.c:3779
[] kvm_vm_ioctl+0x11e4/0x1a70
arch/x86/kvm/../../../virt/kvm/kvm_main.c:2969
[< inline >] vfs_ioctl fs/ioctl.c:43
[] do_vfs_ioctl+0x1c4/0x1630 fs/ioctl.c:679
[< inline >] SYSC_ioctl fs/ioctl.c:694
[] SyS_ioctl+0x94/0xc0 fs/ioctl.c:685
[] entry_SYSCALL_64_fastpath+0x23/0xc6
arch/x86/entry/entry_64.S:209
[] 0x


INFO: task syz-executor:5833 blocked for more than 120 seconds.
  Not tainted 4.9.0-rc5+ #28
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syz-executorD17872  5833   4082 0x0004
 880033944780 8800602f5100 8800652b0c80 8800391a2380
 88006d122cd8 8800368763a8 8812c15c 41b58ab3
 88006d123668 88006d123640 110006d0ec5c 88006d122cd8
Call Trace:
 [] schedule+0x10d/0x460 kernel/sched/core.c:3457
 [] schedule_preempt_disabled+0x15/0x20
kernel/sched/core.c:3490
 [< inline >] __mutex_lock_common kernel/locking/mutex.c:582
 [] mutex_lock_nested+0x686/0xf20 kernel/locking/mutex.c:621
 [] kvm_hv_set_msr_common+0x163/0x2a30
arch/x86/kvm/hyperv.c:1145
 [] kvm_set_msr_common+0xb0b/0x23a0 arch/x86/kvm/x86.c:2261
 [] vmx_set_msr+0x27d/0xcb0 arch/x86/kvm/vmx.c:3149
 [] kvm_set_msr+0xd9/0x170 arch/x86/kvm/x86.c:1084
 [] do_set_msr+0x123/0x1a0 arch/x86/kvm/x86.c:1113
 [< inline >] __msr_io arch/x86/kvm/x86.c:2523
 [] msr_io+0x250/0x460 arch/x86/kvm/x86.c:2560
 [] kvm_arch_vcpu_ioctl+0x360/0x44a0 arch/x86/kvm/x86.c:3401
 [] kvm_vcpu_ioctl+0x237/0x11c0
arch/x86/kvm/../../../virt/kvm/kvm_main.c:2708
 [< inline >] vfs_ioctl fs/ioctl.c:43
 [] do_vfs_ioctl+0x1c4/0x1630 fs/ioctl.c:679
 [< inline >] SYSC_ioctl fs/ioctl.c:694
 [] SyS_ioctl+0x94/0xc0 fs/ioctl.c:685
 [] entry_SYSCALL_64_fastpath+0x23/0xc6

[ 3319.345108] Showing all locks held in the system:
[ 3319.349897] 2 locks held by khungtaskd/1328:
[ 3319.352888]  #0: [ 3319.354562]  (
rcu_read_lock[ 3319.358168] ){..}
, at: [ 3319.360511] [] watchdog+0x1cc/0xd70
[ 3319.363841]  #1: [ 3319.364761]  (
tasklist_lock[ 3319.367215] ){.+.+..}
, at: [ 3319.369197] [] debug_show_all_locks+0xd2/0x420
[ 3319.374809] 3 locks held by syz-executor/5833:
[ 3319.388745]  #0: [ 3319.390145]  (
>mutex[ 3319.391749] ){+.+.+.}
, at: [ 3319.392313] [] vcpu_load+0x21/0x70
[ 3319.396281]  #1: [ 3319.398802]  (
>srcu[ 3319.399431] ){..}
, at: [ 3319.399883] [] msr_io+0x148/0x460
[ 3319.403905]  #2: [ 3319.404639]  (
>lock[ 3319.406582] ){+.+.+.}
, at: [ 3319.409670] [] kvm_hv_set_msr_common+0x163/0x2a30
[ 3319.422421] 2 locks held by syz-executor/5849:
[ 3319.425646]  #0: [ 3319.426948]  (
>lock[ 3319.427747] ){+.+.+.}
, at: [ 3319.428368] [] kvm_arch_vm_ioctl+0xb4e/0x23c0
[ 3319.429594]  #1: [ 3319.429942]  (
>slots_lock[ 3319.430881] ){+.+.+.}
, at: [ 3319.431631] [] kvm_create_pit+0x589/0x8c0


On commit a25f0944ba9b1d8a6813fd6f1a86f1bd59ac25a6 (Nov 13)


[PATCH] kvm: x86: don't print warning messages for unimplemented msrs

2016-11-14 Thread Bandan Das

Change unimplemented msrs messages to use pr_debug.
If CONFIG_DYNAMIC_DEBUG is set, then these messages can be
enabled at run time or else -DDEBUG can be used at compile
time to enable them. These messages will still be printed if
ignore_msrs=1.

Signed-off-by: Bandan Das 
---
This is a follow up to RFC posted by Dave at
https://patchwork.kernel.org/patch/9238227/ which uses pr_debug_ratelimited
when ignore_msrs is not set.

 arch/x86/kvm/mmu.c   | 2 +-
 arch/x86/kvm/x86.c   | 5 +++--
 include/linux/kvm_host.h | 5 +
 3 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index d9c7e98..1b3f241 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -4958,7 +4958,7 @@ void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, 
struct kvm_memslots *slots)
 * zap all shadow pages.
 */
if (unlikely((slots->generation & MMIO_GEN_MASK) == 0)) {
-   printk_ratelimited(KERN_DEBUG "kvm: zapping shadow pages for mmio generation wraparound\n");
+   kvm_debug_ratelimited("kvm: zapping shadow pages for mmio generation wraparound\n");
kvm_mmu_invalidate_zap_all_pages(kvm);
}
 }
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3017de0..5d50403 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2280,7 +2280,7 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct 
msr_data *msr_info)
if (kvm_pmu_is_valid_msr(vcpu, msr))
return kvm_pmu_set_msr(vcpu, msr_info);
if (!ignore_msrs) {
-   vcpu_unimpl(vcpu, "unhandled wrmsr: 0x%x data 0x%llx\n",
+   vcpu_debug_ratelimited(vcpu, "unhandled wrmsr: 0x%x data 0x%llx\n",
msr, data);
return 1;
} else {
@@ -2492,7 +2492,8 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct 
msr_data *msr_info)
if (kvm_pmu_is_valid_msr(vcpu, msr_info->index))
return kvm_pmu_get_msr(vcpu, msr_info->index, &msr_info->data);
if (!ignore_msrs) {
-   vcpu_unimpl(vcpu, "unhandled rdmsr: 0x%x\n", msr_info->index);
+   vcpu_debug_ratelimited(vcpu, "unhandled rdmsr: 0x%x\n",
+  msr_info->index);
return 1;
} else {
vcpu_unimpl(vcpu, "ignored rdmsr: 0x%x\n", msr_info->index);
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 01c0b9c..e4c0980 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -439,6 +439,9 @@ struct kvm {
pr_info("kvm [%i]: " fmt, task_pid_nr(current), ## __VA_ARGS__)
 #define kvm_debug(fmt, ...) \
pr_debug("kvm [%i]: " fmt, task_pid_nr(current), ## __VA_ARGS__)
+#define kvm_debug_ratelimited(fmt, ...) \
+   pr_debug_ratelimited("kvm [%i]: " fmt, task_pid_nr(current), \
+## __VA_ARGS__)
 #define kvm_pr_unimpl(fmt, ...) \
pr_err_ratelimited("kvm [%i]: " fmt, \
   task_tgid_nr(current), ## __VA_ARGS__)
@@ -450,6 +453,8 @@ struct kvm {
 
 #define vcpu_debug(vcpu, fmt, ...) \
kvm_debug("vcpu%i " fmt, (vcpu)->vcpu_id, ## __VA_ARGS__)
+#define vcpu_debug_ratelimited(vcpu, fmt, ...) \
+   kvm_debug_ratelimited("vcpu%i " fmt, (vcpu)->vcpu_id, ## __VA_ARGS__)
 #define vcpu_err(vcpu, fmt, ...)   \
kvm_err("vcpu%i " fmt, (vcpu)->vcpu_id, ## __VA_ARGS__)
 
-- 
2.5.5



[GIT PULL] arch/tile bugfix for 4.9-rc6

2016-11-14 Thread Chris Metcalf

Linus,

Please pull the following change for 4.9-rc6 from:

git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile.git stable

This just fixes an incompatibility with tile __ro_after_init.

Chris Metcalf (1):
  tile: handle __ro_after_init like parisc does

 arch/tile/include/asm/cache.h | 3 +++
 1 file changed, 3 insertions(+)

--
Chris Metcalf, Mellanox Technologies
http://www.mellanox.com




Re: [PATCH 2/3] qemu: Implement virtio-pstore device

2016-11-14 Thread Namhyung Kim
On Fri, Nov 11, 2016 at 12:50:03AM +0200, Michael S. Tsirkin wrote:
> On Fri, Sep 16, 2016 at 07:05:47PM +0900, Namhyung Kim wrote:
> > On Tue, Sep 13, 2016 at 06:57:10PM +0300, Michael S. Tsirkin wrote:
> > > On Sat, Aug 20, 2016 at 05:07:43PM +0900, Namhyung Kim wrote:
> > > > +
> > > > +/* the index should match to the type value */
> > > > +static const char *virtio_pstore_file_prefix[] = {
> > > > +"unknown-",/* VIRTIO_PSTORE_TYPE_UNKNOWN */
> > > 
> > > Is there value in treating everything unexpected as "unknown"
> > > and rotating them as if they were logs?
> > > It might be better to treat everything that's not known
> > > as guest error.
> > 
> > I was thinking about the version mismatch between the kernel and qemu.
> > I'd like to make the device can deal with a new kernel version which
> > might implement a new pstore message type.  It will be saved as
> > unknown but the kernel can read it properly later.
> 
> Well it'll have a different prefix. E.g. if kernel has
> two different types they will end up in the same
> file, hardly what was wanted.

Right, I think the 'type' info needs to be added to the filename for
unknown types.
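
A rough sketch of what I mean, in plain user-space C rather than the
actual qemu code. The type constants, the prefix table and
pstore_prefix() are made up for illustration; the real
VIRTIO_PSTORE_TYPE_* values and file naming may differ.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical type values, not the real VIRTIO_PSTORE_TYPE_* constants. */
enum {
    PSTORE_TYPE_UNKNOWN = 0,
    PSTORE_TYPE_DMESG   = 1,
    PSTORE_TYPE_CONSOLE = 2,
};

/* Prefixes for the types this (hypothetical) device version knows about. */
static const char *const known_prefix[] = {
    [PSTORE_TYPE_DMESG]   = "dmesg-",
    [PSTORE_TYPE_CONSOLE] = "console-",
};

/*
 * Write the file name prefix for `type` into buf.
 * Returns the value returned by snprintf().
 */
static int pstore_prefix(char *buf, size_t len, unsigned int type)
{
    size_t n = sizeof(known_prefix) / sizeof(known_prefix[0]);

    assert(buf != NULL && len > 0);

    if (type < n && known_prefix[type] != NULL)
        return snprintf(buf, len, "%s", known_prefix[type]);

    /*
     * Unknown type: keep the numeric type in the name so that two
     * different unknown types never end up in the same file.
     */
    return snprintf(buf, len, "unknown-%u-", type);
}
```

With this, two new message types from a newer kernel would get
distinct prefixes instead of both landing under a bare "unknown-".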

Thanks,
Namhyung


Re: [RFC PATCH] x86/debug: Dump more detailed segfault info

2016-11-14 Thread Ingo Molnar

* Borislav Petkov  wrote:

> On Sun, Nov 13, 2016 at 12:25:52PM +0100, Borislav Petkov wrote:
> > Hmm, enabling all *PRINTK* options from your .config doesn't change
> > anything for my qemu guest here. Lemme try with your full config.
> 
> Same with your .config:
> 
> [  115.694717] strsep[3027]: segfault at 40066b ip 77abe22b sp 
> 7fffe990 error 7 in libc-2.19.so[77a33000+19f000]
> [  115.700181] RIP: 0033:[<77abe22b>]  [<77abe22b>] 
> 0x77abe22b
> [  115.704843] RSP: 002b:7fffe990  EFLAGS: 00010202
> [  115.707183] RAX: 0040066b RBX: 00400664 RCX: 
> 
> [  115.709189] RDX:  RSI: 003d RDI: 
> 00400665
> [  115.711207] RBP: 7fffe9b0 R08: 77dd7c60 R09: 
> 77deae20
> [  115.713630] R10: 7fffe770 R11: 77abe200 R12: 
> 00400460
> [  115.715653] R13: 7fffeaa0 R14:  R15: 
> 
> [  115.717651] FS:  77fdc700() GS:88007ed0() 
> knlGS:
> [  115.719554] CS:  0010 DS:  ES:  CR0: 80050033
> [  115.720393] CR2: 0040066b CR3: 79f4f000 CR4: 
> 000406e0
> [  115.721409] Code: [  115.721692] 74 33 80 7e 01 00 74 22 48 89 df e8 5a 8a 
> ff ff 48 85 c0 74 20  00 00 48 83 c0 01 48 89 45 00 48 89 d8 48 83 c4 08 
> 5b 5d c3 0f b6 13 38 d0 74 29 84 d2 75 15 48 c7 45 00 00 00 00 00 48 83 c4
> 
> Is this a real hw issue? I.e., maybe I should not be doing this in a
> guest?

So I think the line breaking artifact might be due to the following commit:

  bfd8d3f23b51 ("printk: make reading the kernel log flush pending lines")

... which Linus reverted upstream a few hours ago:

 commit f5c9f9c72395c3291c2e35c905dedae2b98475a4
 Author: Linus Torvalds 
 Date:   Mon Nov 14 09:31:52 2016 -0800

Revert "printk: make reading the kernel log flush pending lines"

This reverts commit bfd8d3f23b51018388be0411ccbc2d56277fe294.

It turns out that this flushes things much too aggressiverly, and causes
lines to break up when the system logger races with new continuation
lines being printed.
...

Thanks,

Ingo


Re: perf: fuzzer KASAN slab-out-of-bounds in snb_uncore_imc_event_del

2016-11-14 Thread Dmitry Vyukov
On Tue, Nov 15, 2016 at 6:57 AM, Vince Weaver  wrote:
> On Mon, 14 Nov 2016, Vince Weaver wrote:
>
>> Anyway as per the suggestion at Linux Plumbers I enabled KASAN and on my
>> haswell machine it falls over in a few minutes of running the perf_fuzzer.
>>
>> [  205.740194] 
>> ==
>> [  205.748005] BUG: KASAN: slab-out-of-bounds in 
>> snb_uncore_imc_event_del+0x6c/0xa0 at addr 8800caa43768
>> [  205.758324] Read of size 8 by task perf_fuzzer/6618
>> [  205.763589] CPU: 0 PID: 6618 Comm: perf_fuzzer Not tainted 4.9.0-rc5 #4
>> [  205.770721] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 
>> 01/26/2014
>> [  205.778689]  8800c3c479b8 816bb796 88011ec00600 
>> 8800caa43580
>> [  205.786759]  8800c3c479e0 812fb961 8800c3c47a78 
>> 8800caa43580
>> [  205.794850]  8800caa43580 8800c3c47a68 812fbbd8 
>> 8800c3c47a28
>> [  205.802911] Call Trace:
>> [  205.805559]  [] dump_stack+0x63/0x8d
>> [  205.811135]  [] kasan_object_err+0x21/0x70
>> [  205.817267]  [] kasan_report_error+0x1d8/0x4c0
>> [  205.823752]  [] ? __lock_is_held+0x75/0xc0
>> [  205.829868]  [] ? snb_uncore_imc_read_counter+0x42/0x50
>> [  205.837198]  [] ? uncore_perf_event_update+0xe2/0x160
>> [  205.844337]  [] kasan_report+0x39/0x40
>> [  205.850085]  [] ? snb_uncore_imc_event_del+0x6c/0xa0


If you pipe the report through
https://github.com/google/sanitizers/blob/master/address-sanitizer/tools/kasan_symbolize.py
it will give you line numbers and inlined frames.

> The best I can tell this maps to:
>
> static void snb_uncore_imc_event_del(struct perf_event *event, int flags)
> {
> struct intel_uncore_box *box = uncore_event_to_box(event);
> int i;
>
> snb_uncore_imc_event_stop(event, PERF_EF_UPDATE);
>
> for (i = 0; i < box->n_events; i++) {
> if (event == box->event_list[i]) {
> --box->n_events;
> break;
> }
> }
> }
>
> Can this code be right?  Does it actually remove the event?
> The similar code in
>
> static void uncore_pmu_event_del(struct perf_event *event, int flags)
>
> 
>
> for (i = 0; i < box->n_events; i++) {
> if (event == box->event_list[i]) {
> uncore_put_event_constraint(box, event);
>
> for (++i; i < box->n_events; i++)
> box->event_list[i - 1] = box->event_list[i];
>
> --box->n_events;
> break;
> }
> }
>
>
> seems like it is more likely to be correct.
>
> Vince
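
For illustration, the shifting variant can be tried in user space with
a simplified stand-in for the box structure. struct box, MAX_EVENTS and
box_del_event() are hypothetical names, not the kernel's; only the loop
shape follows uncore_pmu_event_del() as quoted above.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_EVENTS 8

/* Simplified stand-in for struct intel_uncore_box; not the kernel type. */
struct box {
    int n_events;
    const void *event_list[MAX_EVENTS];
};

/*
 * Remove `event` from the list, shifting the later entries down by one.
 * Returns 1 if the event was found and removed, 0 otherwise.
 */
static int box_del_event(struct box *box, const void *event)
{
    int i;

    assert(box != NULL);

    for (i = 0; i < box->n_events; i++) {
        if (event == box->event_list[i]) {
            /* Close the gap left by the removed entry. */
            for (++i; i < box->n_events; i++)
                box->event_list[i - 1] = box->event_list[i];

            --box->n_events;
            return 1;
        }
    }
    return 0;
}
```

With only the --box->n_events step and no shifting, as in the
snb_uncore_imc_event_del() quoted above, the matching entry stays in
event_list and the last entry is the one that falls off the end, unless
the matching entry happens to be the last one.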

