Re: [PATCH v7 16/24] i2c: allow adapter drivers to override the adapter locking

2016-05-03 Thread Wolfram Sang
> Yes, they look like reasonable complaints.

Thanks for fixing them. I just sent out my latest comments and when you
fix those and send V8, I'll apply that right away. I think we are safe
to fix the rest incrementally if needed. Note that I didn't review the
IIO and media patches, I trust the reviewers on those.

Thanks for your work on this! I need a break now, this is
mind-boggling...



Re: [PATCH] [RFC] x86: work around MPX Erratum

2016-05-03 Thread Linus Torvalds
On Tue, May 3, 2016 at 2:31 PM, Andy Lutomirski  wrote:
>
> Having actually read the erratum: how can this affect Linux at all
> under any scenario where user code hasn't already completely
> compromised the kernel?

If it matches purely on linear address, you will potentially have
interesting situations with people running in virtualized environments
and crashing programs in other virtual containers or the host.

I do agree that we likely don't care all that much (especially with
the lack of users in the first place).  But considering that the
trivial workaround is to make sure SMEP is enabled - which we want to
heavily encourage anyway - I also don't think there's any real
downside to just enforcing that workaround.

  Linus


Re: [PATCH v7 16/24] i2c: allow adapter drivers to override the adapter locking

2016-05-03 Thread Wolfram Sang
On Wed, Apr 20, 2016 at 05:17:56PM +0200, Peter Rosin wrote:
> Add i2c_lock_bus() and i2c_unlock_bus(), which call the new lock_bus and
> unlock_bus ops in the adapter. These funcs/ops take an additional flags
> argument that indicates for what purpose the adapter is locked.
> 
> There are two flags, I2C_LOCK_ADAPTER and I2C_LOCK_SEGMENT, but they are
> both implemented the same. For now. Locking the adapter means that the
> whole bus is locked, locking the segment means that only the current bus
> segment is locked (i.e. i2c traffic on the parent side of the mux is still
> allowed even if the child side of the mux is locked).
> 
> Also support a trylock_bus op (but no function to call it, as it is not
> expected to be needed outside of the i2c core).
> 
> Implement i2c_lock_adapter/i2c_unlock_adapter in terms of the new locking
> scheme (i.e. lock with the I2C_LOCK_ADAPTER flag).
> 
> Annotate some of the locking with explicit I2C_LOCK_SEGMENT flags.

Can you explain a little why it is SEGMENT and not ADAPTER here? That
probably makes it easier to get into this patch.

And to double check my understanding: I was surprised to not see any
i2c_lock_adapter() or I2C_LOCK_ADAPTER in action. This is because muxes
call I2C_LOCK_SEGMENT on their parent which in case of the parent being
the root adapter is essentially the same as I2C_LOCK_ADAPTER. Correct?
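
For reference, a minimal sketch of the locking scheme the commit message
describes; the flag values, struct layout and the pthread mutex standing in
for the kernel's rt_mutex are assumptions made for illustration, not the
actual i2c-core code:

#include <pthread.h>

#define I2C_LOCK_ADAPTER 0x01   /* lock the whole bus, i.e. the root adapter */
#define I2C_LOCK_SEGMENT 0x02   /* lock only the current bus segment */

struct i2c_adapter {
        pthread_mutex_t bus_lock;       /* stand-in for the kernel's rt_mutex */
        void (*lock_bus)(struct i2c_adapter *adap, unsigned int flags);
        void (*unlock_bus)(struct i2c_adapter *adap, unsigned int flags);
        int (*trylock_bus)(struct i2c_adapter *adap, unsigned int flags);
};

/* Default op for a root adapter: both flags behave the same for now,
 * exactly as the commit message says. */
static void i2c_adapter_lock_bus(struct i2c_adapter *adap, unsigned int flags)
{
        (void)flags;
        pthread_mutex_lock(&adap->bus_lock);
}

/* i2c_lock_adapter() expressed in terms of the new scheme. */
static void i2c_lock_adapter(struct i2c_adapter *adap)
{
        adap->lock_bus(adap, I2C_LOCK_ADAPTER);
}

/* A mux locks only its parent segment; when that parent is the root
 * adapter this is effectively the same as I2C_LOCK_ADAPTER, which is the
 * point of the question above. */
static void i2c_mux_lock_parent(struct i2c_adapter *parent)
{
        parent->lock_bus(parent, I2C_LOCK_SEGMENT);
}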

> 
> Signed-off-by: Peter Rosin 
> ---
>  drivers/i2c/i2c-core.c | 46 --
>  include/linux/i2c.h| 28 ++--
>  2 files changed, 54 insertions(+), 20 deletions(-)
> 
> diff --git a/drivers/i2c/i2c-core.c b/drivers/i2c/i2c-core.c
> index 0f2f8484e8ec..21f46d011c33 100644
> --- a/drivers/i2c/i2c-core.c
> +++ b/drivers/i2c/i2c-core.c
> @@ -960,10 +960,12 @@ static int i2c_check_addr_busy(struct i2c_adapter 
> *adapter, int addr)
>  }
>  
>  /**
> - * i2c_lock_adapter - Get exclusive access to an I2C bus segment
> + * i2c_adapter_lock_bus - Get exclusive access to an I2C bus segment
>   * @adapter: Target I2C bus segment
> + * @flags: I2C_LOCK_ADAPTER locks the root i2c adapter, I2C_LOCK_SEGMENT
> + *   locks only this branch in the adapter tree
>   */

I think this kerneldoc should be moved to i2c_lock_adapter and/or
i2c_lock_bus() which are now in i2c.h. This is what users will use, not
this static, adapter-specific implementation. I think it is enough to
have a comment here explaining what is special in handling adapters.

Thanks,

   Wolfram


Re: [PATCH 6/8] pci: provide sensible irq vector alloc/free routines

2016-05-03 Thread Bjorn Helgaas
On Tue, May 03, 2016 at 11:19:46PM +0200, Christoph Hellwig wrote:
> Hi Bjorn,
> 
> I've implemented your suggestion and I'm getting ready to send out
> a new version.  One thing that came to mind is:  do you prefer this
> code in irq.c or would you rather have it in msi.c?  While it
> also has a legacy irq fallback most of it tied pretty closely to
> the msi.c code, so I wonder if we should group them together.

Good question.  There isn't much in irq.c, and the interesting bits
are the MSI-related things, so maybe msi.c would make more sense.

Bjorn
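
For context, a hedged usage sketch of the kind of helper being discussed;
the names pci_alloc_irq_vectors(), pci_irq_vector(), pci_free_irq_vectors()
and PCI_IRQ_ALL_TYPES follow what eventually landed upstream, but the
interface was still in flux in this thread, so treat them as illustrative:

#include <linux/interrupt.h>
#include <linux/pci.h>

static irqreturn_t foo_handler(int irq, void *data)
{
        return IRQ_HANDLED;
}

static int foo_init_irqs(struct pci_dev *pdev)
{
        int nvec, i, ret;

        /* Ask for 1..8 vectors; the helper falls back from MSI-X to MSI to
         * a single legacy IRQ as needed. */
        nvec = pci_alloc_irq_vectors(pdev, 1, 8, PCI_IRQ_ALL_TYPES);
        if (nvec < 0)
                return nvec;

        for (i = 0; i < nvec; i++) {
                ret = request_irq(pci_irq_vector(pdev, i), foo_handler, 0,
                                  "foo", pdev);
                if (ret)
                        goto err;
        }
        return 0;

err:
        while (--i >= 0)
                free_irq(pci_irq_vector(pdev, i), pdev);
        pci_free_irq_vectors(pdev);
        return ret;
}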


Re: VDSO unmap and remap support for additional architectures

2016-05-03 Thread Christopher Covington
On 04/29/2016 09:55 AM, Dmitry Safonov wrote:
> On 04/29/2016 04:22 PM, Christopher Covington wrote:
>> On 04/28/2016 02:53 PM, Andy Lutomirski wrote:
>>> Also, at some point, possibly quite soon, x86 will want a way for
>>> user code to ask the kernel to map a specific vdso variant at a specific
>>> address. Could we perhaps add a new pair of syscalls:
>>>
>>> struct vdso_info {
>>>  unsigned long space_needed_before;
>>>  unsigned long space_needed_after;
>>>  unsigned long alignment;
>>> };
>>>
>>> long vdso_get_info(unsigned int vdso_type, struct vdso_info *info);
>>>
>>> long vdso_remap(unsigned int vdso_type, unsigned long addr, unsigned
>>> int flags);
>>>
>>> #define VDSO_X86_I386 0
>>> #define VDSO_X86_64 1
>>> #define VDSO_X86_X32 2
>>> // etc.
>>>
>>> vdso_remap will map the vdso of the chosen type such that
>>> AT_SYSINFO_EHDR lines up with addr. It will use up to
>>> space_needed_before bytes before that address and space_needed_after
>>> after that address. It will also unmap the old vdso (or maybe only do
>>> that if some flag is set).
>>>
>>> On x86, mremap is *not* sufficient for everything that's needed,
>>> because some programs will need to change the vdso type.
>> I don't understand. Why can't people just exec() the ELF type that
>> corresponds to the VDSO they want?
> 
> I may say about my needs in it: to not lose all the existing
> information in application.
> Imagine you're restoring a container with 64-bit and 32-bit
> applications (in compatible mode). So you need somehow
> switch vdso type in restorer for a 32-bit application.
> Yes, you may exec() and then - all already restored application
> properties will be lost. You will need to transfer information
> about mappings, make protocol between restorer binary
> and main criu application, finally you'll end up with some
> really much more difficult architecture than it is now.
> And it'll be slower.

Perhaps a more modest exec based strategy would be for x86_64 criu to
handle all of the x86_64 restores as usual but exec i386 and/or x32 criu
service daemons and use them for restoring applications needing those ABIs.

Regards,
Christopher Covington

-- 
Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
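
As background to the remap part of the discussion, a minimal user-space
sketch (assuming x86_64, where vDSO mremap already works) of locating the
vDSO via /proc/self/maps and moving it with mremap(); the fixed target
address is arbitrary for illustration and may collide with existing mappings:

#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
        FILE *f = fopen("/proc/self/maps", "r");
        char line[256];
        unsigned long start = 0, end = 0;

        while (f && fgets(line, sizeof(line), f)) {
                if (strstr(line, "[vdso]")) {
                        sscanf(line, "%lx-%lx", &start, &end);
                        break;
                }
        }
        if (f)
                fclose(f);
        if (!start)
                return 1;

        /* MREMAP_FIXED requires MREMAP_MAYMOVE; the target address is only
         * an example and is not guaranteed to be free. */
        void *new = mremap((void *)start, end - start, end - start,
                           MREMAP_MAYMOVE | MREMAP_FIXED,
                           (void *)0x7f0000000000UL);
        printf("vdso moved from %lx to %p\n", start, new);
        return new == MAP_FAILED;
}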


Re: [RFC PATCH] livepatch: allow removal of a disabled patch

2016-05-03 Thread Josh Poimboeuf
On Mon, May 02, 2016 at 01:57:22PM +0200, Miroslav Benes wrote:
> 1. Do we really need a completion? If I am not missing something
> kobject_del() always waits for sysfs callers to leave thanks to kernfs
> active protection.

What do you mean by "kernfs active protection"?  I see that
kernfs_remove() gets the kernfs_mutex lock, but I can't find anywhere
that a write to a sysfs file uses that lock.

I'm probably missing something...

-- 
Josh


Re: [PATCH] fix infoleak in wireless

2016-05-03 Thread Greg Kroah-Hartman
On Tue, May 03, 2016 at 05:11:07PM -0400, Kangjie Lu wrote:
> Oops, I did not notice the patch was not attached.
> 
> From 34a82a734388d07eb10f91770f86938e38f7575a Mon Sep 17 00:00:00 2001
> From: Kangjie Lu 
> Date: Tue, 3 May 2016 14:15:18 -0400
> Subject: [PATCH] fix infoleak in wireless
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
> 
> The 6-byte array “mac_addr” is not initialized in the dump_station
> implementations of “drivers/staging/wilc1000/wilc_wfi_cfgoperations.c”
> and “drivers/staging/rtl8723au/os_dep/ioctl_cfg80211.c”, so all 6
> bytes may be leaked.
> 
> Signed-off-by: Kangjie Lu 
> ---
>  net/wireless/nl80211.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c
> index 056a730..2e92d14 100644
> --- a/net/wireless/nl80211.c
> +++ b/net/wireless/nl80211.c
> @@ -3905,6 +3905,7 @@ static int nl80211_dump_station(struct sk_buff *skb,
>  
>   while (1) {
>   memset(&sinfo, 0, sizeof(sinfo));
> + eth_zero_addr(mac_addr);
>   err = rdev_dump_station(rdev, wdev->netdev, sta_idx,
>   mac_addr, &sinfo);
>   if (err == -ENOENT)

Patch is corrupted :(

Why not fix up the staging drivers, they are the real problem here,
which is what I think the networking maintainers were telling you to do.

thanks,

greg k-h
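
For illustration, the driver-side fix Greg is pointing at, i.e. having the
staging driver's ->dump_station() fill in mac_addr itself, might look
roughly like the sketch below; struct example_priv and its fields are
hypothetical, not the actual wilc1000 or rtl8723au code:

#include <linux/etherdevice.h>
#include <net/cfg80211.h>

struct example_priv {
        u8 associated_sta[ETH_ALEN];    /* hypothetical: last associated peer */
        s8 last_rssi;
};

static int example_dump_station(struct wiphy *wiphy, struct net_device *dev,
                                int idx, u8 *mac, struct station_info *sinfo)
{
        struct example_priv *priv = wiphy_priv(wiphy);

        if (idx != 0)
                return -ENOENT; /* only one associated station is tracked */

        /* Fill the caller-provided buffer so no uninitialized bytes are
         * copied back to userspace. */
        ether_addr_copy(mac, priv->associated_sta);
        sinfo->filled = BIT(NL80211_STA_INFO_SIGNAL);
        sinfo->signal = priv->last_rssi;
        return 0;
}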


Re: [PATCH] [RFC] x86: work around MPX Erratum

2016-05-03 Thread Linus Torvalds
On Tue, May 3, 2016 at 2:28 PM, Dave Hansen  wrote:
>>
>> So we won't init MPX on those...
>
> Yes, and as long as such a processor doesn't exist today and never
> exists in the future or the folks that buy such a processor truly don't
> care about MPX, that's fine to do.  I'm just a bit nervous about the
> whole "never exists in the future" part.

I don't think we need to care.

If a CPU doesn't support SMEP, there really is no reason for us to
ever support MPX either.

It's not like there is a clamoring for MPX support in the first place
(have you ever heard of anybody actually asking for or using it?), and
quite frankly, it's a _lot_ more complicated from a CPU core side than
SMEP (which is trivial).

So if Intel ever releases a CPU with MPX but without SMEP, nobody will
ever care about the MPX part being useless.

 Linus


Re: [PATCH] [RFC] x86: work around MPX Erratum

2016-05-03 Thread Andy Lutomirski
On May 3, 2016 2:05 PM, "Dave Hansen"  wrote:
>
> On 05/02/2016 11:43 PM, Ingo Molnar wrote:
> >> > +static int is_mpx_affected_microarch(struct cpuinfo_x86 *c)
> >> > +{
> >> > +  /* Only family 6 is affected */
> >> > +  if (c->x86 != 0x6)
> >> > +  return 0;
> >> > +
> >> > +  /* We know these Atom models are unaffected, for sure */
> >> > +  switch (c->x86_model) {
> >> > +  case 0x5F: /* "Future Intel Atom ... Goldmont */
> >> > +  case 0x5C: /* "Future Intel Atom ... Goldmont */
> >> > +   return 0;
> >> > +  }
> >> > +  /*
> >> > +   * We will get here on future unknown processors and all
> >> > +   * Core/Xeons.  They might be unaffected Atoms or
> >> > +   * affected Core/Xeons. Be conservative and assume
> >> > +   * processor is affected.
> >> > +   *
> >> > +   * Once the complete list of Core/Xeon models is known
> >> > +   * it can be added here, and the Atom list removed.
> >> > +   */
> >> > +  return 1;
> > So instead of trying to sort out the erratum, could we not just generally 
> > make MPX
> > dependent on SMEP and be done with it? MPX is a sophisticated security 
> > feature,
> > and it makes little sense to not do SMEP if you have it available.
> >
> > Anyone who is absolutely desperate to disable SMEP while enabling MPX is 
> > free to
> > step in and make his case.
>
> My concern was not necessarily with folks booting with 'nosmep', but
> with processors that have MPX present and SMEP fused off (or made
> unavailable by a hypervisor) and which are unaffected by this issue.
>
> People would have to be very careful to never create a processor which
> did not have SMEP but did have MPX, since MPX would effectively be
> unusable on such a processor.
>
>

Having actually read the erratum: how can this affect Linux at all
under any scenario where user code hasn't already completely
compromised the kernel?

I.e. why do we care about this erratum?


Re: [PATCH] [RFC] x86: work around MPX Erratum

2016-05-03 Thread Dave Hansen
On 05/03/2016 02:12 PM, Borislav Petkov wrote:
> On Tue, May 03, 2016 at 02:04:40PM -0700, Dave Hansen wrote:
>> My concern was not necessarily with folks booting with 'nosmep', but
> 
> Btw, does anything speak for even keeping that 'nosmep' thing?

Generally, I'm not sure we need the no$foo options at all.  There's
always "clearcpuid=" which does the same thing.  It just requires you to
go look up the X86_FEATURE_* bit first.

>> with processors that have MPX present and SMEP fused off (or made
>> unavailable by a hypervisor) and which are unaffected by this issue.
> 
> So we won't init MPX on those...

Yes, and as long as such a processor doesn't exist today and never
exists in the future or the folks that buy such a processor truly don't
care about MPX, that's fine to do.  I'm just a bit nervous about the
whole "never exists in the future" part.

>> People would have to be very careful to never create a processor which
>> did not have SMEP but did have MPX, since MPX would effectively be
>> unusable on such a processor.
> 
> We can disable that combination in qemu too, right?

What do you mean by disable?  Have qemu error out if MPX and SMEP aren't
disabled in concert with each other?
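
To make the SMEP/MPX dependency suggested above concrete, a small user-space
sketch (not kernel code) that reads the two relevant CPUID bits,
CPUID.(EAX=7,ECX=0):EBX bit 7 for SMEP and bit 14 for MPX; the in-kernel
version of the policy would instead clear X86_FEATURE_MPX when SMEP is absent:

#include <cpuid.h>
#include <stdio.h>

int main(void)
{
        unsigned int eax, ebx, ecx, edx;

        if (__get_cpuid_max(0, NULL) < 7)
                return 1;
        __cpuid_count(7, 0, eax, ebx, ecx, edx);

        int smep = !!(ebx & (1u << 7));         /* CPUID.7.0:EBX.SMEP */
        int mpx  = !!(ebx & (1u << 14));        /* CPUID.7.0:EBX.MPX  */

        printf("SMEP: %d  MPX: %d\n", smep, mpx);
        if (mpx && !smep)
                printf("MPX without SMEP: the proposed policy would disable MPX here\n");
        return 0;
}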


Re: [PATCH v7 1/6] Shared library support

2016-05-03 Thread Emese Revfy
On Tue, 3 May 2016 11:00:56 +0900
Masahiro Yamada  wrote:

Hi,

> # Compile .c file, create position independent .o file
> # host-cxxshobjs -> .o
> quiet_cmd_host-cxxshobjs = HOSTCXX -fPIC $@
>   cmd_host-cxxshobjs = $(HOSTCXX) $(hostcxx_flags) -fPIC -c -o $@ $<
>$(host-cxxshobjs): $(obj)/%.o: $(src)/%.c FORCE
>$(call if_changed_dep,host-cxxshobjs)
> 
> 
> 
> 
> We generally use HOSTCC to compile *.c files,
> and HOSTCXX to compile *.cc files.
> 
> 
> But, here, you mention to use HOSTCXX to compile .c files
> such as cyc_complexity_plugin.c, sancov_plugin.c, etc.
> 
> This is not straight-forward.  It is worthwhile to comment the reason.

I wrote a comment about it here:
https://github.com/ephox-gcc-plugins/gcc-plugins_linux-next/commit/74f6343a7f13c071e00c417332051e25f15009ea

-- 
Emese


Re: [PATCH 6/8] pci: provide sensible irq vector alloc/free routines

2016-05-03 Thread Christoph Hellwig
Hi Bjorn,

I've implemented your suggestion and I'm getting ready to send out
a new version.  One thing that came to mind is:  do you prefer this
code in irq.c or would you rather have it in msi.c?  While it
also has a legacy irq fallback most of it tied pretty closely to
the msi.c code, so I wonder if we should group them together.


Re: [patch] usb: dwc3: gadget: fix mask and shift order in DWC3_DCFG_NUMP()

2016-05-03 Thread Greg Kroah-Hartman
On Tue, May 03, 2016 at 10:53:05AM +0300, Felipe Balbi wrote:
> 
> Hi,
> 
> Dan Carpenter  writes:
> > In the original code, DWC3_DCFG_NUMP() was always zero.  It looks like the
> > intent was to shift first and then do the mask.
> >
> > Fixes: 2a58f9c12bb3 ('usb: dwc3: gadget: disable automatic calculation of 
> > ACK TP NUMP')
> > Signed-off-by: Dan Carpenter 
> 
> Thanks for this fix, I completely missed it :-) Greg, if you want to
> take this as a patch, that's fine by me. Otherwise I can queue it for
> -rc1. In any case, S-o-B below:
> 
> Signed-off-by: Felipe Balbi 

I'll take it, thanks.

greg k-h
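
For readers skimming past the diff, the bug is a plain mask-before-shift;
the macros below paraphrase the idea rather than copying the dwc3 header,
and the shift/mask values are illustrative:

#define NUMP_SHIFT      17
#define NUMP_MASK       0x1f

/* Buggy form: masking down to the low 5 bits and then shifting right by 17
 * always yields zero, which is what the commit message describes. */
#define DCFG_NUMP_BUGGY(n)      (((n) & NUMP_MASK) >> NUMP_SHIFT)

/* Fixed form: shift the field down first, then mask it. */
#define DCFG_NUMP_FIXED(n)      (((n) >> NUMP_SHIFT) & NUMP_MASK)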


Re: [PATCH] usb: hub: fix panic caused by NULL bos pointer during reset device

2016-05-03 Thread Greg KH
On Wed, Apr 27, 2016 at 09:35:57AM -0400, Tony Battersby wrote:
> On 04/26/2016 10:53 PM, Du, Changbin wrote:
> >> On Tue, Mar 08, 2016 at 05:15:17PM +0800, changbin...@intel.com wrote:
> >>> From: "Du, Changbin" 
> >>>
> >>> This is a reworked patch based on reverted commit d8f00cd685f5 ("usb:
> >>> hub: do not clear BOS field during reset device").
> >>>
> >>> The previous one caused a double mem-free if run to the re_enumerate label.
> >>> New patch title changed to distinguish from old one. And I have tested
> >>> it with memory debugging options.
> >>>
> >>> In function usb_reset_and_verify_device, the old BOS descriptor may
> >>> still be used before allocating a new one. (usb_disable_lpm function
> >>> uses it under the situation that it fails at usb_disable_link_state.)
> >>> So we cannot set the udev->bos to NULL before that, just keep what it
> >>> was. It will be overwritten when allocating a new one.
> >>>
> >>> How to reproduce:
> >>> 1. connect one usb3 hub to xhci port.
> >>> 2. connect several lpm-capable super-speed usb disk to the hub.
> >>> 3. copy big files to the usb disks.
> >>> 4. disconnect the hub and repeat step 1-4.
> >>>
> >>> Crash log:
> >>> BUG: unable to handle kernel NULL pointer dereference at
> >>> 0010
> >>> IP: [] usb_enable_link_state+0x2d/0x2f0
> >>> Call Trace:
> >>> [] ? usb_set_lpm_timeout+0x12b/0x140
> >>> [] usb_enable_lpm+0x81/0xa0
> >>> [] usb_disable_lpm+0xa8/0xc0
> >>> [] usb_unlocked_disable_lpm+0x2c/0x50
> >>> [] usb_reset_and_verify_device+0xc3/0x710
> >>> [] ? usb_sg_wait+0x13d/0x190
> >>> [] usb_reset_device+0x133/0x280
> >>> [] usb_stor_port_reset+0x61/0x70
> >>> [] usb_stor_invoke_transport+0x88/0x520
> >>>
> >>> Signed-off-by: Du, Changbin 
> >>> ---
> >>>  drivers/usb/core/hub.c | 14 +-
> >>>  1 file changed, 9 insertions(+), 5 deletions(-)
> >> Is this patch still needed?  I thought we had some other fix in this
> >> area...
> >>
> >> confused,
> >>
> >> greg k-h
> >>
> > Hi, Greg k-h,
> > Sorry for the confusion. This patch is still needed. It is the same fix as
> > the previous commit d8f00cd685f5 ("usb: hub: do not clear BOS field
> > during reset device"), but d8f00cd685f5 was buggy and got reverted. This
> > new patch should be the final fix.
> >
> > Best Regards,
> > Du, Changbin
> >
> 
> I think Greg is referring to commit 464ad8c43a9e ("usb: core : hub: Fix
> BOS 'NULL pointer' kernel panic"), which has already been applied
> upstream.  It looks to me like that patch might have fixed the same
> problem in a different way, in which case Changbin's patch is not
> needed.  But I haven't been involved in developing or testing that
> patch, so I can't say for sure.  At the very least, 464ad8c43a9e
> conflicts with Changbin's patch.
> 
> Changbin, can you take a look at 464ad8c43a9e and see if that fixes the
> same problem that your patch did?

Given the lack of response here, I've dropped this from my queue.  If
it's still needed, someone must resend it.

thanks,

greg k-h


Re: [PATCH v7 1/6] Shared library support

2016-05-03 Thread Emese Revfy
On Mon, 2 May 2016 14:03:00 +0900
Masahiro Yamada  wrote:

Hi,

> > diff --git a/scripts/Makefile.clean b/scripts/Makefile.clean
> > index 55c96cb..e4e88ab 100644
> > --- a/scripts/Makefile.clean
> > +++ b/scripts/Makefile.clean
> > @@ -38,7 +38,8 @@ subdir-ymn:= $(addprefix $(obj)/,$(subdir-ymn))
> >  __clean-files  := $(extra-y) $(extra-m) $(extra-)   \
> >$(always) $(targets) $(clean-files)   \
> >$(host-progs) \
> > -  $(hostprogs-y) $(hostprogs-m) $(hostprogs-)
> > +  $(hostprogs-y) $(hostprogs-m) $(hostprogs-) \
> > +  $(hostlibs-y) $(hostlibs-m) $(hostlibs-)
> 
> $(hostcxxlibs-y)$(hostcxxlibs-m) is missing here.
> 
> 
> BTW, are you planning to support hostlibs without .so extension in the future?
> 
> 
> You are changing the top Makefile to clean with "*.so" pattern rule.
> 
> Do you need to change both the top Makefile and Makefile.clean
> for belt-and-braces cleaning?

It doesn't delete the *.so files without the hunk from the top Makefile. 
The mrproper and distclean targets remove them with this patch:
https://github.com/ephox-gcc-plugins/gcc-plugins_linux-next/commit/f23b7eb247ccc9f1007e53313af343980dd52591
But I think this isn't the best solution, much like doing it from the top 
Makefile.

-- 
Emese


Re: [PATCH] perf/sdt: Directly record cached SDT events

2016-05-03 Thread Hemant Kumar



On 05/03/2016 06:05 AM, Masami Hiramatsu wrote:

On Tue, 03 May 2016 05:06:24 +0530
Hemant Kumar  wrote:


Hi Masami,

On 04/30/2016 06:06 PM, Masami Hiramatsu wrote:

Hi Hemant,

On Fri, 29 Apr 2016 19:10:41 +0530
Hemant Kumar  wrote:


This patch adds support for directly recording SDT events which are
present in the probe cache. This patch is based on current SDT
enablement patchset (v5) by Masami :
https://lkml.org/lkml/2016/4/27/828
and it implements two points in the TODO list mentioned in the
cover note :
"- (perf record) Support SDT event recording directly"
"- (perf record) Try to unregister SDT events after record."

Without this patch, we could probe into SDT events using
"perf probe" and "perf record". With this patch, we can probe
the SDT events directly using "perf record".

Thanks! However, before looking over each part of this patch,
I think this is not enough for supporting SDT for perf record.

Hmm.


If there are several SDTs which have the same eventname but different
addresses (e.g. libc:memory_memalign_retry), how are those handled?
Currently, to support this, we'll need to enable those events
in different names, or just pick one of them. It could confuse
users in each case.

Right. But now, it's the same case with a binary having multiple
symbols with same names, isn't it?

Yes, but for the symbols or lines etc., the user cannot directly specify
it via perf record. And as you showed below, perf-probe expresses that
there are 2 events on the probe point. So the user is forced to be aware of it.


Right.


# nm ./multi | grep foo
00400530 t foo
00400560 t foo

# perf probe -x ./multi foo
Added new events:
probe_multi:foo  (on foo in /home/hemant/work/linux/tools/perf/multi)
probe_multi:foo_1(on foo in /home/hemant/work/linux/tools/perf/multi)

You can now use it in all perf tools, such as:

  perf record -e probe_multi:foo_1 -aR sleep 1


My point being, the user can still know, if its shown that there are two or
more probes being placed and the o/p of perf report/script shows that
the probes are placed at two or more different addresses.

Not only the different addresses, they will also see different
event names. That may be no good for building a script on top of them.

My point is, if the user only uses "perf record -e sdt_something:sdtevent",
they will think that there is only one event recorded. It can easily mislead
them.


Ok. Makes sense. With a warning message then, we can make the user
aware in this case.


To solve this issue, we need to introduce multiple SDTs on single
ftrace event. Please read my comment on v3 patch 
(https://lkml.org/lkml/2015/8/15/52)

Ok. But, I think, for initial direct recording support, we can go with
this IMHO.

So, at least this should be noticed to users carefully. (e.g. warn if
there are more than two SDTs defined)


Ok. I have made the changes and also added a warning message if the
user tries to record on an sdt event which has multiple occurrences with
the same event and group name. I have sent a v2 for this patch.

--
Thanks,
Hemant Kumar
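
As a reference for the mechanism under discussion (perf writing SDT markers
into uprobe_events and removing them again after the record session), a
minimal stand-alone sketch of that tracefs interface; the group/event names,
binary path and offset below are examples, not values taken from a real system:

#include <stdio.h>

static const char *uprobe_events =
        "/sys/kernel/debug/tracing/uprobe_events";

static int add_probe(const char *group, const char *event,
                     const char *binary, unsigned long offset)
{
        FILE *f = fopen(uprobe_events, "a");

        if (!f)
                return -1;
        /* "p:GROUP/EVENT PATH:OFFSET" registers a uprobe at that offset. */
        fprintf(f, "p:%s/%s %s:0x%lx\n", group, event, binary, offset);
        return fclose(f);
}

static int del_probe(const char *group, const char *event)
{
        FILE *f = fopen(uprobe_events, "a");

        if (!f)
                return -1;
        /* "-:GROUP/EVENT" unregisters the probe after recording is done. */
        fprintf(f, "-:%s/%s\n", group, event);
        return fclose(f);
}

int main(void)
{
        add_probe("sdt_test", "marker1", "/home/user/test_sdt", 0x53e);
        /* ... perf record would run with the new event here ... */
        return del_probe("sdt_test", "marker1");
}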



Re: [REGRESSION] asix: Lots of asix_rx_fixup() errors and slow transmissions

2016-05-03 Thread Dean Jenkins

On 03/05/16 15:42, David B. Robins wrote:


I don't think the first one is giving you problems (except as 
triggered by the second) but I had concerns about the second myself 
(and emailed the author off-list, but received no reply), and we did 
not take that commit for our own product.



Sorry, I might have missed your original E-mail.

Specifically, the second change, 3f30... (original patch: 
https://www.mail-archive.com/netdev@vger.kernel.org/msg80720.html) (1) 
appears to do the exact opposite of what it claims, i.e., instead of 
"resync if this looks like a header", it does "resync if this does NOT 
look like a (packet) header", where "looks like a header" means "bits 
0-10 (size) are equal to the bitwise-NOT of bits 16-26", and (2) can 
happen by coincidence for 1/2048 32-bit values starting a continuation 
URB (easy to hit dealing with large volumes of video data as we were). 
It appears to expect the header for every URB whereas the rest of the 
code at least expects it only once per network packet (look at 
following code that only reads it for remaining == 0).


David, I think that your interpretation is incorrect. Please see below.

Here is the code snippet from the patch with my annotations between # #, 
I will try to explain my intentions. Feel free to point out any flaws:


if (rx->remaining && (rx->remaining + sizeof(u32) <= skb->len)) {
# Only runs when rx->remaining !=0 and the end of the Ethernet 
frame + next 32-bit header word is within the URB buffer. #
# Therefore, this code does not run when the end of an Ethernet 
frame has been reached in the previous URB #
# or when the end of the Ethernet frame + next 32-bit header 
word will be in a later URB buffer #


# offset is an index to the expected next 32-bit header word 
after the end of the Ethernet frame #

offset = ((rx->remaining + 1) & 0xfffe) + sizeof(u32);

# rx->header contains the expected 32-bit header value 
corrected for Endianness and alignment #

rx->header = get_unaligned_le32(skb->data + offset);
offset = 0;

# check the data integrity of the size value from the header word #
size = (u16)(rx->header & 0x7ff);
# if the size value fails the integrity check then we are not 
looking at a valid header word so #

# synchronisation has been lost #
if (size != ((~rx->header >> 16) & 0x7ff)) {
netdev_err(dev->net, "asix_rx_fixup() Data Header synchronisation was lost, remaining %d\n",
   rx->remaining);
if (rx->ax_skb) {
kfree_skb(rx->ax_skb);
rx->ax_skb = NULL;
/* Discard the incomplete netdev Ethernet frame
 * and assume the Data header is at the start of
 * the current URB socket buffer.
 */
}
rx->remaining = 0;
}
}




So that change made no sense to me, but I don't have significant 
kernel dev experience. Effectively it will drop/truncate every 
(2047/2048) split (longer than an URB) packet, and report an error for 
the second URB and then again for treating said second URB as a first 
URB for a packet. I would expect your problems will go away just 
removing the second change. You could also change the != to == in "if 
(size != ...)" but then you'd still have 1/2048 (depending on data 
patterns) false positives.
The code only runs when the Ethernet frame spans across URBs and is 
checking that the next 32-bit header word is present and valid.


Upon loss of synchronisation, the strategy is to assume that the 32-bit 
header is at the start of the URB buffer. Obviously, that might not be 
true every time but it is the most likely location especially when 
Ethernet frames are not spanning URBs at that point in time.


Looking at the error messages:


[  239.037310] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length
0x54ebb5ec, offset 4 


The offset 4 means that the 32-bit header word was invalid at the start 
of the URB buffer. This could be a consequence of data synchronisation 
being lost; however, we would expect the timestamps between the error 
messages of "synchronisation was lost" and "Bad Header Length" to be very 
close as they would be consecutive URBs. The evidence is showing 10ms 
gaps which does not suggest consecutive URBs. In other words, an 
Ethernet frame should not be spanned over a time gap of 10ms as that 
would be very inefficient. If that were true then there would be a USB 
communications problem with the USB-to-Ethernet adaptor, which I hope is 
not the case.


[  239.027993] asix 1-1.1:1.0 eth0: asix_rx_fixup() Data Header
synchronisation was lost, remaining 988

This error message consistently shows the remaining value to be 988, at 
least for the 3 examples provided by John. This does not suggest a 
random failure unless there are other examples of a non-988 remaining 
value error message. 988 is well within an Ethernet frame 
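
A small stand-alone illustration of the header check being discussed (low
11 bits carry the size, bits 16-26 carry its bitwise complement), fed with
the 0x54ebb5ec value from John's log and a synthetic header for a 988-byte
remainder:

#include <stdint.h>
#include <stdio.h>

static int asix_header_ok(uint32_t header)
{
        uint16_t size = header & 0x7ff;

        return size == ((~header >> 16) & 0x7ff);
}

int main(void)
{
        uint32_t good = (((~0x3dcu) & 0x7ffu) << 16) | 0x3dcu;  /* size 988 */
        uint32_t bad  = 0x54ebb5ec;     /* value from the log above */

        printf("good: %d  bad: %d\n", asix_header_ok(good), asix_header_ok(bad));
        return 0;
}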


[PATCH v2] perf/sdt: Directly record SDT events

2016-05-03 Thread Hemant Kumar
This patch adds support for directly recording SDT events which are
present in the probe cache. This patch is based on current SDT
enablement patchset (v5) by Masami :
https://lkml.org/lkml/2016/4/27/828
and it implements two points in the TODO list mentioned in the
cover note :
"- (perf record) Support SDT event recording directly"
"- (perf record) Try to unregister SDT events after record."

Without this patch, we could probe into SDT events using
"perf probe" and "perf record". With this patch, we can probe
the SDT events directly using "perf record".

For example :

 # perf list sdt   // List the SDT events
  ...
sdt_mysql:update__row__done[SDT event]
sdt_mysql:update__row__start   [SDT event]
sdt_mysql:update__start[SDT event]
sdt_python:function__entry [SDT event]
sdt_python:function__return[SDT event]
sdt_test:marker1   [SDT event]
sdt_test:marker2   [SDT event]
  ...

 # perf record -e %sdt_test:marker1 -e %sdt_test:marker2 -a
  ^C[ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 2.087 MB perf.data (22 samples) ]

 # perf script 
test_sdt 29230 [002] 405550.548017: sdt_test:marker1: (400534)
test_sdt 29230 [002] 405550.548064: sdt_test:marker2: (40053f)
test_sdt 29231 [002] 405550.962806: sdt_test:marker1: (400534)
test_sdt 29231 [002] 405550.962841: sdt_test:marker2: (40053f)
test_sdt 29232 [001] 405551.379327: sdt_test:marker1: (400534)
   ...

Recording on SDT events with same provider and marker names is also
supported :
 # readelf -n ./test_sdt | grep test
  Provider: test
  Provider: test
  Provider: test
 # readelf -n ./test_sdt | grep -A2 test
  Provider: test
  Name: marker1
  Location: 0x0040053e, Base: 0x00400607, Semaphore: 
0x
  --
  Provider: test
  Name: marker1
  Location: 0x00400545, Base: 0x00400607, Semaphore: 
0x
  --
  Provider: test
  Name: marker1
  Location: 0x00400550, Base: 0x00400607, Semaphore: 
0x

 # perf record -e %sdt_test:marker1 -a
  Warning : Recording on 3 occurences of sdt_test:marker1
  ^C[ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 2.208 MB perf.data (12 samples) ]

 # perf script
  ...
test_sdt  1445 [001] 578689.792146: sdt_test:marker1_1: (400545)
test_sdt  1445 [001] 578689.792168: sdt_test:marker1_2: (400550)
test_sdt  1445 [001] 578689.792170:   sdt_test:marker1: (40053e)
test_sdt  1454 [000] 578690.436117: sdt_test:marker1_1: (400545)
  ...


After invoking "perf record", behind the scenes, it checks whether the
specified event is an SDT event using the '%' flag. After that, it
does a lookup of the probe cache to find the SDT event. If it's not
present, it throws an error. Otherwise, it goes on and writes the event
into the uprobe_events file, sets up the probe event, trace events,
etc., and starts recording. It also maintains a list of the event names
that were written to the uprobe_events file. After finishing the record
session, it removes the events from the uprobe_events file using the
maintained name list.
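
Purely as an outline of that flow (the helper names below are invented
for illustration and do not exist in the perf sources):

/* Illustrative outline only -- these helpers are hypothetical. */
static int lookup_probe_cache(const char *event);
static int write_uprobe_event(const char *event);
static void remember_event_name(const char *event);

static int record_sdt_event(const char *spec)
{
        int err;

        if (*spec != '%')                       /* not an SDT spec */
                return 0;

        err = lookup_probe_cache(spec + 1);     /* find provider:marker */
        if (err)
                return err;                     /* not cached: error out */

        err = write_uprobe_event(spec + 1);     /* set up the probe event */
        if (err)
                return err;

        remember_event_name(spec + 1);          /* for cleanup after record */
        return 0;
}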

Signed-off-by: Hemant Kumar 
---
Changes since v1:
- Added support for recording on multiple SDT events with same names.
- Added support to show warning if multiple SDT events with same names are 
present.
- Used del_perf_probe_events() to delete the SDT events after recording is done.
- Moved function remove_sdt_event_list() to util/probe-event.c
- Added a function parse_sdt_event() to parse an SDT event from a string.
- Used find_cached_events_all() to find SDT events.
- Used group:event as a strfilter to delete the SDT events from the 
uprobe_events file.
 (Thanks Masami for the above suggestions!)

 tools/perf/builtin-record.c| 21 +
 tools/perf/perf.h  |  2 +
 tools/perf/util/parse-events.c | 53 +-
 tools/perf/util/parse-events.h |  1 +
 tools/perf/util/probe-event.c  | 40 -
 tools/perf/util/probe-event.h  |  4 ++
 tools/perf/util/probe-file.c   | 99 ++
 tools/perf/util/probe-file.h   |  8 
 8 files changed, 226 insertions(+), 2 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 515510e..104eafe 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -34,6 +34,7 @@
 #include "util/llvm-utils.h"
 #include "util/bpf-loader.h"
 #include "asm/bug.h"
+#include "util/probe-file.h"
 
 #include 
 #include 
@@ -56,6 +57,7 @@ struct record {
boolno_buildid_cache_set;
boolbuildid_all;
unsigned long long 

Re: [PATCH] [RFC] x86: work around MPX Erratum

2016-05-03 Thread Borislav Petkov
On Tue, May 03, 2016 at 02:04:40PM -0700, Dave Hansen wrote:
> My concern was not necessarily with folks booting with 'nosmep', but

Btw, does anything speak for even keeping that 'nosmep' thing?

> with processors that have MPX present and SMEP fused off (or made
> unavailable by a hypervisor) and which are unaffected by this issue.

So we won't init MPX on those...

> People would have to be very careful to never create a processor which
> did not have SMEP but did have MPX, since MPX would effectively be
> unusable on such a processor.

We can disable that combination in qemu too, right?

-- 
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.


[PATCH 7/7] mm: Batch unmapping of pages that are in swap cache

2016-05-03 Thread Tim Chen
We created a new function __remove_swap_mapping_batch that
allows all pages under the same swap partition to be removed
from the swap cache's mapping in a single acquisition
of the mapping's tree lock.  This reduces the contention
on the lock when multiple threads are reclaiming
memory by swapping to the same swap partition.

The handle_pgout_batch function is updated so all the
pages under the same swap partition are unmapped together
when they have been paged out.
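
The essence is one tree_lock acquisition per batch instead of one per
page; schematically (remove_one_page() is a placeholder, the real code
is in the diff below):

/* Schematic of the per-batch locking; not the patch itself. */
static void remove_pages_batch(struct address_space *mapping,
                               struct page *pages[], int nr)
{
        unsigned long flags;
        int i;

        spin_lock_irqsave(&mapping->tree_lock, flags);
        for (i = 0; i < nr; i++)
                remove_one_page(mapping, pages[i]); /* hypothetical helper */
        spin_unlock_irqrestore(&mapping->tree_lock, flags);
}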

Signed-off-by: Tim Chen 
---
 mm/vmscan.c | 426 
 1 file changed, 286 insertions(+), 140 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 9fc04e1..5e4b8ce 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -690,6 +690,103 @@ cannot_free:
    return 0;
 }
 
+/* use this only for swap mapped pages */
+static void __remove_swap_mapping_batch(struct page *pages[],
+   bool reclaimed, short ret[], int nr)
+{
+   unsigned long flags;
+   struct page *page;
+   swp_entry_t swap[SWAP_BATCH];
+   struct address_space *mapping;
+
+   int i, batch_size;
+
+   if (nr <= 0)
+   return;
+
+   while (nr) {
+   mapping = page_mapping(pages[0]);
+   BUG_ON(!mapping);
+
+   batch_size = min(nr, SWAP_BATCH);
+
+   spin_lock_irqsave(&mapping->tree_lock, flags);
+   for (i = 0; i < batch_size; ++i) {
+   page = pages[i];
+
+   BUG_ON(!PageLocked(page));
+   BUG_ON(!PageSwapCache(page));
+   BUG_ON(mapping != page_mapping(page));
+
+   /* stop batching if mapping changes */
+   if (mapping != page_mapping(page)) {
+   batch_size = i;
+   break;
+   }
+   /*
+    * The non racy check for a busy page.
+    *
+    * Must be careful with the order of the tests. When 
someone has
+    * a ref to the page, it may be possible that they 
dirty it then
+    * drop the reference. So if PageDirty is tested before 
page_count
+    * here, then the following race may occur:
+    *
+    * get_user_pages();
+    * [user mapping goes away]
+    * write_to(page);
+    *  !PageDirty(page)
[good]
+    * SetPageDirty(page);
+    * put_page(page);
+    *  !page_count(page)   
[good, discard it]
+    *
+    * [oops, our write_to data is lost]
+    *
+    * Reversing the order of the tests ensures such a 
situation cannot
+    * escape unnoticed. The smp_rmb is needed to ensure 
the page->flags
+    * load is not satisfied before that of page->_count.
+    *
+    * Note that if SetPageDirty is always performed via 
set_page_dirty,
+    * and thus under tree_lock, then this ordering is not 
required.
+    */
+   if (!page_ref_freeze(page, 2))
+   goto cannot_free;
+   /* note: atomic_cmpxchg in page_freeze_refs provides 
the smp_rmb */
+   if (unlikely(PageDirty(page))) {
+   page_ref_unfreeze(page, 2);
+   goto cannot_free;
+   }
+
+   swap[i].val = page_private(page);
+   __delete_from_swap_cache(page);
+
+   ret[i] = 1;
+   continue;
+
+cannot_free:
+   ret[i] = 0;
+   }
+   spin_unlock_irqrestore(&mapping->tree_lock, flags);
+
+   /* need to keep irq off for mem_cgroup accounting, don't 
restore flags yet  */
+   local_irq_disable();
+   for (i = 0; i < batch_size; ++i) {
+   if (ret[i]) {
+   page = pages[i];
+   mem_cgroup_swapout(page, swap[i]);
+   }
+   }
+   local_irq_enable();
+
+   for (i = 0; i < batch_size; ++i) {
+   if (ret[i])
+   swapcache_free(swap[i]);
+   }
+   /* advance to next batch */
+   pages += batch_size;
+   ret += batch_size;
+   nr -= batch_size;
+   }
+}
 /*
  * Attempt to detach a locked page from its ->mapping.  If it is dirty or if
  * someone else has a 

Re: [PATCH] Staging: dgnc: fix coding style in dgnc_tty.c

2016-05-03 Thread Greg KH
On Tue, May 03, 2016 at 08:37:27PM +0200, Patryk Mezydlo wrote:
> Fix checkpatch.pl warning about 'line over 80 characters'.
> I just split line with function.
> 
> Signed-off-by: Patryk Mezydlo 
> ---
>  drivers/staging/dgnc/dgnc_tty.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/staging/dgnc/dgnc_tty.c b/drivers/staging/dgnc/dgnc_tty.c
> index 9aa0ba0..b818b0a 100644
> --- a/drivers/staging/dgnc/dgnc_tty.c
> +++ b/drivers/staging/dgnc/dgnc_tty.c
> @@ -186,7 +186,8 @@ int dgnc_tty_register(struct dgnc_board *brd)
>   if (IS_ERR(brd->serial_driver))
>   return PTR_ERR(brd->serial_driver);
>  
> - snprintf(brd->serial_name, MAXTTYNAMELEN, "tty_dgnc_%d_", 
> brd->boardnum);
> + snprintf(brd->serial_name, MAXTTYNAMELEN,
> + "tty_dgnc_%d_", brd->boardnum);

Why move the string to a new line?  It didn't need to be there, did it?



Re: [PATCH] [RFC] x86: work around MPX Erratum

2016-05-03 Thread Dave Hansen
On 05/02/2016 11:43 PM, Ingo Molnar wrote:
>> > +static int is_mpx_affected_microarch(struct cpuinfo_x86 *c)
>> > +{
>> > +  /* Only family 6 is affected */
>> > +  if (c->x86 != 0x6)
>> > +  return 0;
>> > +
>> > +  /* We know these Atom models are unaffected, for sure */
>> > +  switch (c->x86_model) {
>> > +  case 0x5F: /* "Future Intel Atom ... Goldmont */
>> > +  case 0x5C: /* "Future Intel Atom ... Goldmont */
>> > +   return 0;
>> > +  }
>> > +  /*
>> > +   * We will get here on future unknown processors and all
>> > +   * Core/Xeons.  They might be unaffected Atoms or
>> > +   * affected Core/Xeons. Be conservative and assume
>> > +   * processor is affected.
>> > +   *
>> > +   * Once the complete list of Core/Xeon models is known
>> > +   * it can be added here, and the Atom list removed.
>> > +   */
>> > +  return 1;
> So instead of trying to sort out the erratum, could we not just generally 
> make MPX 
> dependent on SMEP and be done with it? MPX is a sophisticated security 
> feature, 
> and it makes little sense to not do SMEP if you have it available.
> 
> Anyone who is absolutely desperate to disable SMEP while enabling MPX is free 
> to 
> step in and make his case.

My concern was not necessarily with folks booting with 'nosmep', but
with processors that have MPX present and SMEP fused off (or made
unavailable by a hypervisor) and which are unaffected by this issue.

People would have to be very careful to never create a processor which
did not have SMEP but did have MPX, since MPX would effectively be
unusable on such a processor.




[PATCH 6/7] mm: Cleanup - Reorganize code to group handling of page

2016-05-03 Thread Tim Chen
In this patch, we reorganize the paging operations so the paging
operations of pages to the same swap device can be grouped together.
This prepares for the next patch that removes multiple pages from
the same swap cache together once they have been paged out.

The patch creates a new function handle_pgout_batch that takes
the code of handle_pgout and puts a loop around it to handle
multiple pages.
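
In other words, the per-page handler turns into a loop that records one
outcome per page; schematically (handle_one() is a placeholder for the
body of the old handle_pgout, array names as in the diff below):

/* Schematic only: the batch variant fills ret[]/swap_ret[] per page
 * instead of returning a single enum pg_result. */
static void handle_pgout_batch_sketch(struct page *pages[], int swap_ret[],
                                      int ret[], int nr)
{
        int i;

        for (i = 0; i < nr; i++)
                ret[i] = handle_one(pages[i], &swap_ret[i]);
}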

Signed-off-by: Tim Chen 
---
 mm/vmscan.c | 338 +++-
 1 file changed, 196 insertions(+), 142 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index fab61f1..9fc04e1 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -884,154 +884,218 @@ enum pg_result {
    PG_UNKNOWN,
 };
 
-static enum pg_result handle_pgout(struct list_head *page_list,
+static void handle_pgout_batch(struct list_head *page_list,
    struct zone *zone,
    struct scan_control *sc,
    enum ttu_flags ttu_flags,
    enum page_references references,
    bool may_enter_fs,
    bool lazyfree,
-   int  *swap_ret,
-   struct page *page)
+   struct page *pages[],
+   int  swap_ret[],
+   int ret[],
+   int nr)
 {
    struct address_space *mapping;
+   struct page *page;
+   int i;
 
-   mapping =  page_mapping(page);
+   for (i = 0; i < nr; ++i) {
+   page = pages[i];
+   mapping =  page_mapping(page);
 
-   /*
-    * The page is mapped into the page tables of one or more
-    * processes. Try to unmap it here.
-    */
-   if (page_mapped(page) && mapping) {
-   switch (*swap_ret = try_to_unmap(page, lazyfree ?
-   (ttu_flags | TTU_BATCH_FLUSH | TTU_LZFREE) :
-   (ttu_flags | TTU_BATCH_FLUSH))) {
-   case SWAP_FAIL:
-   return PG_ACTIVATE_LOCKED;
-   case SWAP_AGAIN:
-   return PG_KEEP_LOCKED;
-   case SWAP_MLOCK:
-   return PG_MLOCKED;
-   case SWAP_LZFREE:
-   goto lazyfree;
-   case SWAP_SUCCESS:
-   ; /* try to free the page below */
+   /* check outcome of cache addition */
+   if (!ret[i]) {
+   ret[i] = PG_ACTIVATE_LOCKED;
+   continue;
    }
-   }
-
-   if (PageDirty(page)) {
    /*
-    * Only kswapd can writeback filesystem pages to
-    * avoid risk of stack overflow but only writeback
-    * if many dirty pages have been encountered.
+    * The page is mapped into the page tables of one or more
+    * processes. Try to unmap it here.
     */
-   if (page_is_file_cache(page) &&
-   (!current_is_kswapd() ||
-    !test_bit(ZONE_DIRTY, &zone->flags))) {
+   if (page_mapped(page) && mapping) {
+   switch (swap_ret[i] = try_to_unmap(page, lazyfree ?
+   (ttu_flags | TTU_BATCH_FLUSH | TTU_LZFREE) :
+   (ttu_flags | TTU_BATCH_FLUSH))) {
+   case SWAP_FAIL:
+   ret[i] = PG_ACTIVATE_LOCKED;
+   continue;
+   case SWAP_AGAIN:
+   ret[i] = PG_KEEP_LOCKED;
+   continue;
+   case SWAP_MLOCK:
+   ret[i] = PG_MLOCKED;
+   continue;
+   case SWAP_LZFREE:
+   goto lazyfree;
+   case SWAP_SUCCESS:
+   ; /* try to free the page below */
+   }
+   }
+
+   if (PageDirty(page)) {
    /*
-    * Immediately reclaim when written back.
-    * Similar in principal to deactivate_page()
-    * except we already have the page isolated
-    * and know it's dirty
+    * Only kswapd can writeback filesystem pages to
+    * avoid risk of stack overflow but only writeback
+    * if many dirty pages have been encountered.
     */
-   inc_zone_page_state(page, NR_VMSCAN_IMMEDIATE);
-   SetPageReclaim(page);
-
-   return PG_KEEP_LOCKED;
-   }
+   if (page_is_file_cache(page) &&
+   (!current_is_kswapd() ||
+    !test_bit(ZONE_DIRTY, &zone->flags))) {
+   /*
+    * Immediately reclaim when written back.
+ 

[PATCH 3/7] mm: Add new functions to allocate swap slots in batches

2016-05-03 Thread Tim Chen
Currently, the swap slots have to be allocated one page at a time,
causing contention on the swap_info lock protecting the swap partition
on every page being swapped.

This patch adds new functions get_swap_pages and scan_swap_map_slots to
request multiple swap slots at once. This will reduce the lock contention
on the swap_info lock as we only need to acquire the lock once to get
multiple slots.  Also, scan_swap_map_slots can operate more efficiently,
as swap slots often occur in clusters close to each other on a swap
device and it is quicker to allocate them together.

Multiple swap slots can also be freed in one shot with the new function
swapcache_free_entries, which further reduces contention on the
swap_info lock.
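
A caller of the batched interface would look roughly like this (sketch
only; the prototypes and SWAP_BATCH come from the hunks below, the
"used" accounting is invented for illustration):

/* Sketch: allocate a batch of slots, attach some to pages, and hand
 * any leftovers back with a single call. */
static void batch_alloc_sketch(int want)
{
        swp_entry_t entries[SWAP_BATCH];
        int got, used = 0;

        if (want > SWAP_BATCH)
                want = SWAP_BATCH;
        got = get_swap_pages(want, entries);

        /* ... pair entries[0..used-1] with pages being reclaimed ... */

        if (got > used)
                swapcache_free_entries(&entries[used], got - used);
}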

Signed-off-by: Tim Chen 
---
 include/linux/swap.h |  27 +--
 mm/swap_state.c  |  23 +++---
 mm/swapfile.c| 215 +--
 mm/vmscan.c  |   2 +-
 4 files changed, 228 insertions(+), 39 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 2b83359..da6d994 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -23,6 +23,7 @@ struct bio;
 #define SWAP_FLAG_DISCARD  0x1 /* enable discard for swap */
 #define SWAP_FLAG_DISCARD_ONCE 0x2 /* discard swap area at swapon-time */
 #define SWAP_FLAG_DISCARD_PAGES 0x4 /* discard page-clusters after use */
+#define SWAP_BATCH 64
 
 #define SWAP_FLAGS_VALID   (SWAP_FLAG_PRIO_MASK | SWAP_FLAG_PREFER | \
     SWAP_FLAG_DISCARD | SWAP_FLAG_DISCARD_ONCE | \
@@ -370,7 +371,8 @@ extern struct address_space swapper_spaces[];
 #define swap_address_space(entry) (&swapper_spaces[swp_type(entry)])
 extern unsigned long total_swapcache_pages(void);
 extern void show_swap_cache_info(void);
-extern int add_to_swap(struct page *, struct list_head *list);
+extern int add_to_swap(struct page *, struct list_head *list,
+   swp_entry_t *entry);
 extern int add_to_swap_cache(struct page *, swp_entry_t, gfp_t);
 extern int __add_to_swap_cache(struct page *page, swp_entry_t entry);
 extern void __delete_from_swap_cache(struct page *);
@@ -403,6 +405,7 @@ static inline long get_nr_swap_pages(void)
 
 extern void si_swapinfo(struct sysinfo *);
 extern swp_entry_t get_swap_page(void);
+extern int get_swap_pages(int n, swp_entry_t swp_entries[]);
 extern swp_entry_t get_swap_page_of_type(int);
 extern int add_swap_count_continuation(swp_entry_t, gfp_t);
 extern void swap_shmem_alloc(swp_entry_t);
@@ -410,6 +413,7 @@ extern int swap_duplicate(swp_entry_t);
 extern int swapcache_prepare(swp_entry_t);
 extern void swap_free(swp_entry_t);
 extern void swapcache_free(swp_entry_t);
+extern void swapcache_free_entries(swp_entry_t *entries, int n);
 extern int free_swap_and_cache(swp_entry_t);
 extern int swap_type_of(dev_t, sector_t, struct block_device **);
 extern unsigned int count_swap_pages(int, int);
@@ -429,7 +433,6 @@ struct backing_dev_info;
 #define total_swap_pages   0L
 #define total_swapcache_pages()0UL
 #define vm_swap_full() 0
-
 #define si_swapinfo(val) \
    do { (val)->freeswap = (val)->totalswap = 0; } while (0)
 /* only sparc can not include linux/pagemap.h in this file
@@ -451,6 +454,21 @@ static inline int add_swap_count_continuation(swp_entry_t 
swp, gfp_t gfp_mask)
    return 0;
 }
 
+static inline int add_to_swap(struct page *page, struct list_head *list,
+   swp_entry_t *entry)
+{
+   return 0;
+}
+
+static inline int get_swap_pages(int n, swp_entry_t swp_entries[])
+{
+   return 0;
+}
+
+static inline void swapcache_free_entries(swp_entry_t *entries, int n)
+{
+}
+
 static inline void swap_shmem_alloc(swp_entry_t swp)
 {
 }
@@ -484,11 +502,6 @@ static inline struct page *lookup_swap_cache(swp_entry_t 
swp)
    return NULL;
 }
 
-static inline int add_to_swap(struct page *page, struct list_head *list)
-{
-   return 0;
-}
-
 static inline int add_to_swap_cache(struct page *page, swp_entry_t entry,
    gfp_t gfp_mask)
 {
diff --git a/mm/swap_state.c b/mm/swap_state.c
index 366ce35..bad02c1 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -154,30 +154,35 @@ void __delete_from_swap_cache(struct page *page)
 /**
  * add_to_swap - allocate swap space for a page
  * @page: page we want to move to swap
+ * @entry: swap entry that we have pre-allocated
  *
  * Allocate swap space for the page and add the page to the
  * swap cache.  Caller needs to hold the page lock. 
  */
-int add_to_swap(struct page *page, struct list_head *list)
+int add_to_swap(struct page *page, struct list_head *list, swp_entry_t *entry)
 {
-   swp_entry_t entry;
    int err;
+   swp_entry_t ent;
 
    VM_BUG_ON_PAGE(!PageLocked(page), page);
    VM_BUG_ON_PAGE(!PageUptodate(page), page);
 
-   entry = get_swap_page();
-  

[PATCH 5/7] mm: Batch addition of pages to swap cache

2016-05-03 Thread Tim Chen
When a page is to be swapped, it needs to be added to the swap cache
and then removed after the paging has been completed.  A swap partition's
mapping tree lock is acquired for each anonymous page's addition to the
swap cache.

This patch creates new functions add_to_swap_batch and
__add_to_swap_cache_batch that allow multiple pages destined for the
same swap partition to be added to that swap partition's swap cache in
one acquisition of the mapping tree lock.  These functions extend the
original add_to_swap and __add_to_swap_cache. This reduces the contention
on the swap partition's mapping tree lock when we are actively reclaiming
memory and swapping pages.
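
Usage-wise, a caller hands in arrays of pages and pre-allocated entries
for one partition and gets a per-page result back; roughly (sketch based
on the prototype below; activate_page_again() is a placeholder for the
caller's existing single-page fallback):

/* Sketch only: assumes nr <= SWAP_BATCH; ret_codes[i] == 0 means page i
 * was not added to the swap cache and needs the fallback path. */
static void add_batch_sketch(struct page *pages[], struct list_head *page_list,
                             swp_entry_t entries[], int nr)
{
        int ret_codes[SWAP_BATCH];
        int i;

        add_to_swap_batch(pages, page_list, entries, ret_codes, nr);
        for (i = 0; i < nr; i++)
                if (!ret_codes[i])
                        activate_page_again(pages[i]);
}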

Signed-off-by: Tim Chen 
---
 include/linux/swap.h |   2 +
 mm/swap_state.c  | 248 +--
 mm/vmscan.c  |  19 ++--
 3 files changed, 196 insertions(+), 73 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index da6d994..cd06f2a 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -373,6 +373,8 @@ extern unsigned long total_swapcache_pages(void);
 extern void show_swap_cache_info(void);
 extern int add_to_swap(struct page *, struct list_head *list,
    swp_entry_t *entry);
+extern void add_to_swap_batch(struct page *pages[], struct list_head *list,
+   swp_entry_t entries[], int ret_codes[], int nr);
 extern int add_to_swap_cache(struct page *, swp_entry_t, gfp_t);
 extern int __add_to_swap_cache(struct page *page, swp_entry_t entry);
 extern void __delete_from_swap_cache(struct page *);
diff --git a/mm/swap_state.c b/mm/swap_state.c
index bad02c1..ce02024 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -72,49 +72,94 @@ void show_swap_cache_info(void)
    printk("Total swap = %lukB\n", total_swap_pages << (PAGE_SHIFT - 10));
 }
 
-/*
- * __add_to_swap_cache resembles add_to_page_cache_locked on swapper_space,
- * but sets SwapCache flag and private instead of mapping and index.
- */
-int __add_to_swap_cache(struct page *page, swp_entry_t entry)
+void __add_to_swap_cache_batch(struct page *pages[], swp_entry_t entries[],
+   int ret[], int nr)
 {
-   int error;
+   int error, i;
    struct address_space *address_space;
+   struct address_space *prev;
+   struct page *page;
+   swp_entry_t entry;
 
-   VM_BUG_ON_PAGE(!PageLocked(page), page);
-   VM_BUG_ON_PAGE(PageSwapCache(page), page);
-   VM_BUG_ON_PAGE(!PageSwapBacked(page), page);
+   prev = NULL;
+   address_space = NULL;
+   for (i = 0; i < nr; ++i) {
+   /* error at pre-processing stage, swap entry already released */
+   if (ret[i] == -ENOENT)
+   continue;
 
-   get_page(page);
-   SetPageSwapCache(page);
-   set_page_private(page, entry.val);
+   page = pages[i];
+   entry = entries[i];
 
-   address_space = swap_address_space(entry);
-   spin_lock_irq(&address_space->tree_lock);
-   error = radix_tree_insert(&address_space->page_tree,
-   entry.val, page);
-   if (likely(!error)) {
-   address_space->nrpages++;
-   __inc_zone_page_state(page, NR_FILE_PAGES);
-   INC_CACHE_INFO(add_total);
-   }
-   spin_unlock_irq(&address_space->tree_lock);
+   VM_BUG_ON_PAGE(!PageLocked(page), page);
+   VM_BUG_ON_PAGE(PageSwapCache(page), page);
+   VM_BUG_ON_PAGE(!PageSwapBacked(page), page);
 
-   if (unlikely(error)) {
-   /*
-    * Only the context which have set SWAP_HAS_CACHE flag
-    * would call add_to_swap_cache().
-    * So add_to_swap_cache() doesn't returns -EEXIST.
-    */
-   VM_BUG_ON(error == -EEXIST);
-   set_page_private(page, 0UL);
-   ClearPageSwapCache(page);
-   put_page(page);
+   get_page(page);
+   SetPageSwapCache(page);
+   set_page_private(page, entry.val);
+
+   address_space = swap_address_space(entry);
+   if (prev != address_space) {
+   if (prev)
+   spin_unlock_irq(&prev->tree_lock);
+   spin_lock_irq(&address_space->tree_lock);
+   }
+   error = radix_tree_insert(&address_space->page_tree,
+   entry.val, page);
+   if (likely(!error)) {
+   address_space->nrpages++;
+   __inc_zone_page_state(page, NR_FILE_PAGES);
+   INC_CACHE_INFO(add_total);
+   }
+
+   if (unlikely(error)) {
+   spin_unlock_irq(&address_space->tree_lock);
+   address_space = NULL;
+   /*
+    * Only the context which have set SWAP_HAS_CACHE 

[PATCH 4/7] mm: Shrink page list batch allocates swap slots for page swapping

2016-05-03 Thread Tim Chen
In shrink_page_list, we take advantage of bulk allocation of swap
entries with the new get_swap_pages function. This reduces contention
on a swap device's swap_info lock. When memory is low and the system is
actively trying to reclaim memory, both the direct reclaim path and
kswapd contend on this lock when they access the same swap partition.
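
The resulting structure of shrink_anon_page_list is a simple "allocate
a batch, consume it, repeat" loop; in outline (simplified from the diff
below, error paths omitted):

/* Outline only: keep requesting up to n slots until the request is
 * satisfied or the swap device runs dry. */
while (n > 0) {
        int m = get_swap_pages(n, swp_entries);

        if (!m)
                break;  /* no slots left: activate the remaining pages */
        n -= m;
        /* ... pair the m slots with the next m pages on the list ... */
}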

Signed-off-by: Tim Chen 
---
 mm/vmscan.c | 63 -
 1 file changed, 42 insertions(+), 21 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index e36d8a7..310e2b2 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1096,38 +1096,59 @@ static unsigned long shrink_anon_page_list(struct 
list_head *page_list,
 {
    unsigned long nr_reclaimed = 0;
    enum pg_result pg_dispose;
+   swp_entry_t swp_entries[SWAP_BATCH];
+   struct page *page;
+   int m, i, k;
 
    while (n > 0) {
-   struct page *page;
    int swap_ret = SWAP_SUCCESS;
 
-   --n;
-   if (list_empty(swap_pages))
-      return nr_reclaimed;
+   m = get_swap_pages(n, swp_entries);
+   if (!m)
+   goto no_swap_slots;
+   n -= m;
+   for (i = 0; i < m; ++i) {
+   if (list_empty(swap_pages)) {
+   /* free any leftover swap slots */
+   for (k = i; k < m; ++k)
+   swapcache_free(swp_entries[k]);
+   return nr_reclaimed;
+   }
+   page = lru_to_page(swap_pages);
 
-   page = lru_to_page(swap_pages);
+   list_del(&page->lru);
 
-   list_del(&page->lru);
+   /*
+   * Anonymous process memory has backing store?
+   * Try to allocate it some swap space here.
+   */
+
+   if (!add_to_swap(page, page_list, NULL)) {
+   pg_finish(page, PG_ACTIVATE_LOCKED, swap_ret,
+   &nr_reclaimed, pgactivate,
+   ret_pages, free_pages);
+   continue;
+   }
 
-   /*
-   * Anonymous process memory has backing store?
-   * Try to allocate it some swap space here.
-   */
+   if (clean)
+   pg_dispose = handle_pgout(page_list, zone, sc,
+   ttu_flags, 
PAGEREF_RECLAIM_CLEAN,
+   true, true, &swap_ret, page);
+   else
+   pg_dispose = handle_pgout(page_list, zone, sc,
+   ttu_flags, PAGEREF_RECLAIM,
+   true, true, &swap_ret, page);
 
-   if (!add_to_swap(page, page_list, NULL)) {
-   pg_finish(page, PG_ACTIVATE_LOCKED, swap_ret, 
_reclaimed,
+   pg_finish(page, pg_dispose, swap_ret, &nr_reclaimed,
    pgactivate, ret_pages, free_pages);
-   continue;
    }
+   }
+   return nr_reclaimed;
 
-   if (clean)
-   pg_dispose = handle_pgout(page_list, zone, sc, 
ttu_flags,
-   PAGEREF_RECLAIM_CLEAN, true, true, _ret, 
page);
-   else
-   pg_dispose = handle_pgout(page_list, zone, sc, 
ttu_flags,
-   PAGEREF_RECLAIM, true, true, _ret, page);
-
-   pg_finish(page, pg_dispose, swap_ret, _reclaimed,
+no_swap_slots:
+   while (!list_empty(swap_pages)) {
+   page = lru_to_page(swap_pages);
+   pg_finish(page, PG_ACTIVATE_LOCKED, 0, &nr_reclaimed,
    pgactivate, ret_pages, free_pages);
    }
    return nr_reclaimed;
-- 
2.5.5



[PATCH 2/7] mm: Group the processing of anonymous pages to be swapped in shrink_page_list

2016-05-03 Thread Tim Chen
This is a clean up patch to reorganize the processing of anonymous
pages in shrink_page_list.

We delay the processing of swapping anonymous pages in shrink_page_list
and put them together on a separate list.  This prepares for batching
of pages to be swapped.  The processing of the list of anonymous pages
to be swapped is consolidated in the function shrink_anon_page_list.

Functionally, there is no change in the logic of how pages are processed,
just the order of processing of the anonymous pages and file mapped
pages in shrink_page_list.
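
Schematically, the scan loop now defers anonymous pages onto one of two
lists instead of swapping them on the spot (list names as in the diff
below; both lists are later drained via shrink_anon_page_list):

/* Sketch of the deferral step; the real hunk touching
 * shrink_page_list() is below. */
if (references == PAGEREF_RECLAIM_CLEAN)
        list_add(&page->lru, &swap_pages_clean);
else
        list_add(&page->lru, &swap_pages);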

Signed-off-by: Tim Chen 
---
 mm/vmscan.c | 82 +
 1 file changed, 77 insertions(+), 5 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 5542005..132ba02 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1083,6 +1083,58 @@ static void pg_finish(struct page *page,
    }
 }
 
+static unsigned long shrink_anon_page_list(struct list_head *page_list,
+   struct zone *zone,
+   struct scan_control *sc,
+   struct list_head *swap_pages,
+   struct list_head *ret_pages,
+   struct list_head *free_pages,
+   enum ttu_flags ttu_flags,
+   int *pgactivate,
+   int n,
+   bool clean)
+{
+   unsigned long nr_reclaimed = 0;
+   enum pg_result pg_dispose;
+
+   while (n > 0) {
+   struct page *page;
+   int swap_ret = SWAP_SUCCESS;
+
+   --n;
+   if (list_empty(swap_pages))
+      return nr_reclaimed;
+
+   page = lru_to_page(swap_pages);
+
+   list_del(&page->lru);
+
+   /*
+   * Anonymous process memory has backing store?
+   * Try to allocate it some swap space here.
+   */
+
+   if (!add_to_swap(page, page_list)) {
+   pg_finish(page, PG_ACTIVATE_LOCKED, swap_ret, 
&nr_reclaimed,
+   pgactivate, ret_pages, free_pages);
+   continue;
+   }
+
+   if (clean)
+   pg_dispose = handle_pgout(page_list, zone, sc, 
ttu_flags,
+   PAGEREF_RECLAIM_CLEAN, true, true, &swap_ret, 
page);
+   else
+   pg_dispose = handle_pgout(page_list, zone, sc, 
ttu_flags,
+   PAGEREF_RECLAIM, true, true, &swap_ret, page);
+
+   pg_finish(page, pg_dispose, swap_ret, &nr_reclaimed,
+   pgactivate, ret_pages, free_pages);
+   }
+   return nr_reclaimed;
+}
+
+
+
 /*
  * shrink_page_list() returns the number of reclaimed pages
  */
@@ -1099,6 +1151,8 @@ static unsigned long shrink_page_list(struct list_head 
*page_list,
 {
    LIST_HEAD(ret_pages);
    LIST_HEAD(free_pages);
+   LIST_HEAD(swap_pages);
+   LIST_HEAD(swap_pages_clean);
    int pgactivate = 0;
    unsigned long nr_unqueued_dirty = 0;
    unsigned long nr_dirty = 0;
@@ -1106,6 +1160,8 @@ static unsigned long shrink_page_list(struct list_head 
*page_list,
    unsigned long nr_reclaimed = 0;
    unsigned long nr_writeback = 0;
    unsigned long nr_immediate = 0;
+   unsigned long nr_swap = 0;
+   unsigned long nr_swap_clean = 0;
 
    cond_resched();
 
@@ -1271,12 +1327,17 @@ static unsigned long shrink_page_list(struct list_head 
*page_list,
    pg_dispose = PG_KEEP_LOCKED;
    goto finish;
    }
-   if (!add_to_swap(page, page_list)) {
-   pg_dispose = PG_ACTIVATE_LOCKED;
-   goto finish;
+   if (references == PAGEREF_RECLAIM_CLEAN) {
+   list_add(&page->lru, &swap_pages_clean);
+   ++nr_swap_clean;
+   } else {
+   list_add(&page->lru, &swap_pages);
+   ++nr_swap;
    }
-   lazyfree = true;
-   may_enter_fs = 1;
+
+   pg_dispose = PG_NEXT;
+   goto finish;
+
    }
 
    pg_dispose = handle_pgout(page_list, zone, sc, ttu_flags,
@@ -1288,6 +1349,17 @@ finish:
 
    }
 
+   nr_reclaimed += shrink_anon_page_list(page_list, zone, sc,
+   _pages_clean, _pages,
+   _pages, ttu_flags,
+   , nr_swap_clean,
+   true);
+   nr_reclaimed += shrink_anon_page_list(page_list, zone, sc,
+   _pages, _pages,
+   _pages, ttu_flags,
+   , nr_swap,
+ 

[PATCH 1/7] mm: Cleanup - Reorganize the shrink_page_list code into smaller functions

2016-05-03 Thread Tim Chen
This patch prepares the code for being able to batch the anonymous pages
to be swapped out.  It reorganizes the shrink_page_list function with
two new helper functions: handle_pgout and pg_finish.

The paging operation in shrink_page_list is consolidated into the
handle_pgout function.

After we have scanned a page in shrink_page_list and completed any
paging, the final disposition and clean-up of the page is consolidated
into pg_finish.  The designated disposition of the page from page
scanning in shrink_page_list is marked with one of the designations in
pg_result.

This is a clean up patch and there is no change in functionality or
logic of the code.
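
Conceptually, each scanned page now ends in a pair of calls: handle_pgout
picks a disposition and pg_finish acts on it. A schematic of the call
shapes, following how they are used later in the series:

/* Sketch of the scan/finish split; argument lists follow the calls
 * visible in the later patches of this series. */
pg_dispose = handle_pgout(page_list, zone, sc, ttu_flags, references,
                          may_enter_fs, lazyfree, &swap_ret, page);
pg_finish(page, pg_dispose, swap_ret, &nr_reclaimed,
          &pgactivate, &ret_pages, &free_pages);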

Signed-off-by: Tim Chen 
---
 mm/vmscan.c | 429 ++--
 1 file changed, 246 insertions(+), 183 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index b934223e..5542005 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -873,6 +873,216 @@ static void page_check_dirty_writeback(struct page *page,
    mapping->a_ops->is_dirty_writeback(page, dirty, writeback);
 }
 
+enum pg_result {
+   PG_SPECULATIVE_REF,
+   PG_FREE,
+   PG_MLOCKED,
+   PG_ACTIVATE_LOCKED,
+   PG_KEEP_LOCKED,
+   PG_KEEP,
+   PG_NEXT,
+   PG_UNKNOWN,
+};
+
+static enum pg_result handle_pgout(struct list_head *page_list,
+   struct zone *zone,
+   struct scan_control *sc,
+   enum ttu_flags ttu_flags,
+   enum page_references references,
+   bool may_enter_fs,
+   bool lazyfree,
+   int  *swap_ret,
+   struct page *page)
+{
+   struct address_space *mapping;
+
+   mapping =  page_mapping(page);
+
+   /*
+    * The page is mapped into the page tables of one or more
+    * processes. Try to unmap it here.
+    */
+   if (page_mapped(page) && mapping) {
+   switch (*swap_ret = try_to_unmap(page, lazyfree ?
+   (ttu_flags | TTU_BATCH_FLUSH | TTU_LZFREE) :
+   (ttu_flags | TTU_BATCH_FLUSH))) {
+   case SWAP_FAIL:
+   return PG_ACTIVATE_LOCKED;
+   case SWAP_AGAIN:
+   return PG_KEEP_LOCKED;
+   case SWAP_MLOCK:
+   return PG_MLOCKED;
+   case SWAP_LZFREE:
+   goto lazyfree;
+   case SWAP_SUCCESS:
+   ; /* try to free the page below */
+   }
+   }
+
+   if (PageDirty(page)) {
+   /*
+    * Only kswapd can writeback filesystem pages to
+    * avoid risk of stack overflow but only writeback
+    * if many dirty pages have been encountered.
+    */
+   if (page_is_file_cache(page) &&
+   (!current_is_kswapd() ||
+    !test_bit(ZONE_DIRTY, &zone->flags))) {
+   /*
+    * Immediately reclaim when written back.
+    * Similar in principal to deactivate_page()
+    * except we already have the page isolated
+    * and know it's dirty
+    */
+   inc_zone_page_state(page, NR_VMSCAN_IMMEDIATE);
+   SetPageReclaim(page);
+
+   return PG_KEEP_LOCKED;
+   }
+
+   if (references == PAGEREF_RECLAIM_CLEAN)
+   return PG_KEEP_LOCKED;
+   if (!may_enter_fs)
+   return PG_KEEP_LOCKED;
+   if (!sc->may_writepage)
+   return PG_KEEP_LOCKED;
+
+   /*
+    * Page is dirty. Flush the TLB if a writable entry
+    * potentially exists to avoid CPU writes after IO
+    * starts and then write it out here.
+    */
+   try_to_unmap_flush_dirty();
+   switch (pageout(page, mapping, sc)) {
+   case PAGE_KEEP:
+   return PG_KEEP_LOCKED;
+   case PAGE_ACTIVATE:
+   return PG_ACTIVATE_LOCKED;
+   case PAGE_SUCCESS:
+   if (PageWriteback(page))
+   return PG_KEEP;
+   if (PageDirty(page))
+   return PG_KEEP;
+
+   /*
+    * A synchronous write - probably a ramdisk.  Go
+    * ahead and try to reclaim the page.
+    */
+   if (!trylock_page(page))
+   return PG_KEEP;
+   if (PageDirty(page) || PageWriteback(page))
+   return PG_KEEP_LOCKED;
+   mapping = page_mapping(page);
+   case PAGE_CLEAN:
+   ; /* try to free the page below */
+   }
+   }
+
+


[PATCH 0/7] mm: Improve swap path scalability with batched operations

2016-05-03 Thread Tim Chen
The page swap out path is not scalable due to the numerous locks
acquired and released along the way, which are all executed on a page
by page basis, e.g.:

1. The acquisition of the mapping tree lock in swap cache when adding
a page to swap cache, and then again when deleting a page from swap cache after
it has been swapped out. 
2. The acquisition of the lock on swap device to allocate a swap slot for
a page to be swapped out. 

With the advent of high speed block devices that are several orders
of magnitude faster than the old spinning disks, these bottlenecks
become fairly significant, especially on server class machines
with many threads running.  To reduce these locking costs, this patch
series attempts to batch the pages for the following operations needed
for swap (a rough sketch follows the list):
1. Allocate swap slots in large batches, so locks on the swap device
don't need to be acquired as often.
2. Add anonymous pages to the swap cache for the same swap device in
batches, so the mapping tree lock can be acquired less often.
3. Delete pages from swap cache also in batches.
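
As a rough sketch (illustration only; the names below are placeholders,
not the actual kernel interfaces), the difference is between

	/* today: the swap device lock is taken once per page */
	list_for_each_entry(page, page_list, lru) {
		spin_lock(&swap_device->lock);
		slot = swap_slot_alloc_one(swap_device);
		spin_unlock(&swap_device->lock);
		swap_cache_add_one(page, slot);		/* mapping tree lock, per page */
	}

and

	/* batched: the swap device lock is taken once per N pages */
	spin_lock(&swap_device->lock);
	nr = swap_slot_alloc_batch(swap_device, slots, N);
	spin_unlock(&swap_device->lock);
	for (i = 0; i < nr; i++)
		swap_cache_add_one(pages[i], slots[i]);	/* can be batched as well */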

We experimented with the effect of these patches. We set up N threads to access
memory in excess of memory capacity, causing swap.  In experiments using
a single pmem based fast block device on a 2 socket machine, we saw
that for 1 thread, there is a ~25% increase in swap throughput, and for
16 threads, the swap throughput increases by ~85%, when compared with the
vanilla kernel. Batching helps even for 1 thread because of contention
with kswapd when doing direct memory reclaim.

Feedbacks and reviews to this patch series are much appreciated.

Thanks.

Tim


Tim Chen (7):
  mm: Cleanup - Reorganize the shrink_page_list code into smaller
functions
  mm: Group the processing of anonymous pages to be swapped in
shrink_page_list
  mm: Add new functions to allocate swap slots in batches
  mm: Shrink page list batch allocates swap slots for page swapping
  mm: Batch addtion of pages to swap cache
  mm: Cleanup - Reorganize code to group handling of page
  mm: Batch unmapping of pages that are in swap cache

 include/linux/swap.h |  29 ++-
 mm/swap_state.c  | 253 +-
 mm/swapfile.c| 215 +--
 mm/vmscan.c  | 725 ++-
 4 files changed, 945 insertions(+), 277 deletions(-)

-- 
2.5.5


Re: [PATCH] fix infoleak in wireless

2016-05-03 Thread Greg Kroah-Hartman

A: No.
Q: Should I include quotations after my reply?

http://daringfireball.net/2007/07/on_top

On Tue, May 03, 2016 at 04:47:16PM -0400, Kangjie Lu wrote:
> Hi Greg,
> 
> Could you please take a look at this issue.
> mac_addr is not initialized is some implementations of dump_station(), which
> can be exploited by attackers for leaking information.

You are going to have to give me more context here...

Like the patch itself?

thanks,

greg k-h


Re: v4.6-rc1 regression bisected, Problem loading in-kernel X.509 certificate (-2)

2016-05-03 Thread Tadeusz Struk
Hi Jamie,
On 05/03/2016 01:35 PM, David Howells wrote:
> (cc'ing Tadeusz as he did the pkcs1 padding function)
> 
> Jamie Heilman  wrote:
> 
 Problem loading in-kernel X.509 certificate (-2)
>>>
>>> ENOENT?  Hmmm...  The only place that is generated is in the crypto layer.
>>> That suggests missing crypto of some sort.
>>>
>>> The attached patch enables some debugging in some relevant files if you can
>>> try applying it to your kernel.
>>
>> Alrighty, presumably relevant bits:
>>
>> X.509: Cert Issuer: Build time autogenerated kernel key
>> X.509: Cert Subject: Build time autogenerated kernel key
>> X.509: Cert Key Algo: rsa
>> X.509: Cert Valid period: 1461826791-4615426791
>> X.509: Cert Signature: rsa + sha512
>> X.509: ==>x509_check_signature()
>> X.509: ==>x509_get_sig_params()
>> X.509: <==x509_get_sig_params() = 0
>> PKEY: ==>public_key_verify_signature()
>> X.509: Cert Verification: -2
> 
> Hmmm...  Okay, the only ways out of public_key_verify_signature() without
> printing a leaving message are for snprintf() to overrun (which would return
> error -22) or for crypto_alloc_akcipher() to have failed; everything else must
> go through the kleave() at the pr_devel() at the bottom of the function.
> 
> Can you stick:
> 
>   pr_devel("ALGO: %s\n", alg_name);
> 
> immediately before this line:
> 
>   tfm = crypto_alloc_akcipher(alg_name, 0, 0);
> 
> and try it again?
> 

Could you please check if this one fixes the problem for you:
https://patchwork.kernel.org/patch/8766361/
Thanks,
-- 
TS


Re: [PATCH 00/42] v7: separate operations from flags in the bio/request structs

2016-05-03 Thread Jeff Moyer
mchri...@redhat.com writes:

> The following patches begin to cleanup the request->cmd_flags and
> bio->bi_rw mess. We currently use cmd_flags to specify the operation,
> attributes and state of the request. For bi_rw we use it for similar
> info and also the priority but then also have another bi_flags field
> for state. At some point, we abused them so much we just made cmd_flags
> 64 bits, so we could add more.
>
> The following patches seperate the operation (read, write discard,
> flush, etc) from cmd_flags/bi_rw.
>
> This patchset was made against linux-next from today April 15
> (git tag next-20160415).
>
> I put a git tree here:
> https://github.com/mikechristie/linux-kernel.git
> The patches are in the op branch.

Hi, Mike,

That git tree doesn't seem to exist.  I did manage to apply your patch
set on top of next-20160415, though.

So... what testing did you do? ;-) I ran into the following problems:
- git clone fails
- yum segfaults
- many blktrace/blkparse issues, including incorrect cpu recorded in
  traces, null task names, and blkparse outputting nothing for a trace
  file several gigabytes in size.

After that, I decided to back out your patches and test the base
linux-next kernel.  That kernel has none of those issues.

So, either I'm missing some dependencies, or I think we've got some
issues to iron out before this thing goes in.  Before I dig any further,
am I missing something?

Cheers,
Jeff


[PATCH] fix infoleak in rtnetlink

2016-05-03 Thread Kangjie Lu
The stack object “map” has a total size of 32 bytes. Its last 4
bytes are padding generated by the compiler. These padding bytes are
not initialized and are sent out to userland via “nla_put”.
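
For reference, the structure is roughly the following (my reading of
include/uapi/linux/if_link.h):

	struct rtnl_link_ifmap {
		__u64	mem_start;	/*  8 bytes */
		__u64	mem_end;	/*  8 bytes */
		__u64	base_addr;	/*  8 bytes */
		__u16	irq;		/*  2 bytes */
		__u8	dma;		/*  1 byte  */
		__u8	port;		/*  1 byte  */
	};	/* 28 bytes of fields, 8-byte alignment -> sizeof() == 32 */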

Signed-off-by: Kangjie Lu 
---
 net/core/rtnetlink.c | 18 ++
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index a75f7e9..65763c2 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1180,14 +1180,16 @@ static noinline_for_stack int rtnl_fill_vfinfo(struct 
sk_buff *skb,
 
 static int rtnl_fill_link_ifmap(struct sk_buff *skb, struct net_device *dev)
 {
-   struct rtnl_link_ifmap map = {
-   .mem_start   = dev->mem_start,
-   .mem_end = dev->mem_end,
-   .base_addr   = dev->base_addr,
-   .irq = dev->irq,
-   .dma = dev->dma,
-   .port= dev->if_port,
-   };
+   struct rtnl_link_ifmap map;
+
+   memset(&map, 0, sizeof(map));
+   map.mem_start   = dev->mem_start;
+   map.mem_end = dev->mem_end;
+   map.base_addr   = dev->base_addr;
+   map.irq = dev->irq;
+   map.dma = dev->dma;
+   map.port= dev->if_port;
+
if (nla_put(skb, IFLA_MAP, sizeof(map), &map))
return -EMSGSIZE;
 
-- 
1.9.1



[PATCH] infoleak fix2 in timer

2016-05-03 Thread Kangjie Lu
The stack object “r1” has a total size of 32 bytes. Its fields
“event” and “val” are each followed by 4 bytes of padding. These 8
padding bytes are sent to user space without being initialized.
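
For reference, on a 64-bit kernel the structure lays out roughly as
follows (my reading of include/uapi/sound/asound.h):

	struct snd_timer_tread {
		int event;		/* 4 bytes, then 4 bytes padding (tstamp is 8-byte aligned) */
		struct timespec tstamp;	/* 16 bytes */
		unsigned int val;	/* 4 bytes, then 4 bytes tail padding -> sizeof() == 32 */
	};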

Signed-off-by: Kangjie Lu 
---
 sound/core/timer.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/sound/core/timer.c b/sound/core/timer.c
index 964f5eb..e98fa5f 100644
--- a/sound/core/timer.c
+++ b/sound/core/timer.c
@@ -1225,6 +1225,7 @@ static void snd_timer_user_ccallback(struct 
snd_timer_instance *timeri,
tu->tstamp = *tstamp;
if ((tu->filter & (1 << event)) == 0 || !tu->tread)
return;
+   memset(&r1, 0, sizeof(r1));
r1.event = event;
r1.tstamp = *tstamp;
r1.val = resolution;
-- 
1.9.1



[PATCH] infoleak fix3 in timer

2016-05-03 Thread Kangjie Lu
The stack object “r1” has a total size of 32 bytes. Its fields
“event” and “val” are each followed by 4 bytes of padding. These 8
padding bytes are sent to user space without being initialized.

Signed-off-by: Kangjie Lu 
---
 sound/core/timer.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/sound/core/timer.c b/sound/core/timer.c
index e98fa5f..c69a271 100644
--- a/sound/core/timer.c
+++ b/sound/core/timer.c
@@ -1268,6 +1268,7 @@ static void snd_timer_user_tinterrupt(struct 
snd_timer_instance *timeri,
}
if ((tu->filter & (1 << SNDRV_TIMER_EVENT_RESOLUTION)) &&
tu->last_resolution != resolution) {
+   memset(&r1, 0, sizeof(r1));
r1.event = SNDRV_TIMER_EVENT_RESOLUTION;
r1.tstamp = tstamp;
r1.val = resolution;
-- 
1.9.1



[PATCH] infoleak fix1 in timer

2016-05-03 Thread Kangjie Lu
The stack object “tread” has a total size of 32 bytes. Its fields
“event” and “val” are each followed by 4 bytes of padding. These 8
padding bytes are sent to user space without being initialized.

Signed-off-by: Kangjie Lu 
---
 sound/core/timer.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/sound/core/timer.c b/sound/core/timer.c
index 6469bed..964f5eb 100644
--- a/sound/core/timer.c
+++ b/sound/core/timer.c
@@ -1739,6 +1739,7 @@ static int snd_timer_user_params(struct file *file,
if (tu->timeri->flags & SNDRV_TIMER_IFLG_EARLY_EVENT) {
if (tu->tread) {
struct snd_timer_tread tread;
+   memset(&tread, 0, sizeof(tread));
tread.event = SNDRV_TIMER_EVENT_EARLY;
tread.tstamp.tv_sec = 0;
tread.tstamp.tv_nsec = 0;
-- 
1.9.1



Re: [PATCH] fix infoleak in wireless

2016-05-03 Thread Johannes Berg
On Tue, 2016-05-03 at 16:40 -0400, Kangjie Lu wrote:
> The 6-bytes array “mac_addr” is not initialized in the dump_station
> implementations of
> “drivers/staging/wilc1000/wilc_wfi_cfgoperations.c”
> and “drivers/staging/rtl8723au/os_dep/ioctl_cfg80211.c”, so all 6
> bytes may be leaked.

Like I said to you before, this makes those implementations completely
broken. I'm not going to apply this patch. If you want, feel free to
send patches to Greg to remove those dump_station implementations that
are completely broken and that can never do anything useful.

johannes


[PATCH] infoleak fix1 in signal

2016-05-03 Thread Kangjie Lu
The stack object “info” has a total size of 128 bytes; however,
only 28 bytes are initialized. The remaining uninitialized bytes
are sent to userland via send_signal.

Signed-off-by: Kangjie Lu 
---
 kernel/signal.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/signal.c b/kernel/signal.c
index aa9bf00..f40f0b3 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1594,6 +1594,7 @@ bool do_notify_parent(struct task_struct *tsk, int sig)
sig = SIGCHLD;
}
 
+   memset(&info, 0, sizeof(info));
info.si_signo = sig;
info.si_errno = 0;
/*
-- 
1.9.1



[PATCH] infoleak fix2 in signal

2016-05-03 Thread Kangjie Lu
The stack object “info” has a total size of 128 bytes; however,
only 32 bytes are initialized. The remaining uninitialized bytes
are sent to userland via send_signal.

Signed-off-by: Kangjie Lu 
---
 kernel/signal.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/signal.c b/kernel/signal.c
index f40f0b3..26fdb74 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1689,6 +1689,7 @@ static void do_notify_parent_cldstop(struct task_struct 
*tsk,
parent = tsk->real_parent;
}
 
+   memset(&info, 0, sizeof(info));
info.si_signo = SIGCHLD;
info.si_errno = 0;
/*
-- 
1.9.1



[GIT] Networking

2016-05-03 Thread David Miller

Some straggler bug fixes:

1) Batman-adv DAT must consider VLAN IDs when choosing candidate nodes, from
   Antonio Quartulli.

2) Fix botched reference counting of vlan objects and neigh nodes in batman-adv,
   from Sven Eckelmann.

3) netem can crash when it sees GSO packets, the fix is to segment them upon
   ->enqueue.  Fix from Neil Horman with help from Eric Dumazet.

4) Fix VXLAN dependencies in mlx5 driver Kconfig, from Matthew Finlay.

5) Handle VXLAN ops outside of rcu lock, via a workqueue, in mlx5, since it
   can sleep.  Fix also from Matthew Finlay.

6) Check mdiobus_scan() return values properly in pxa168_eth and macb
   drivers.  From Sergei Shtylyov.

7) If the netdevice doesn't support checksumming, disable segmentation.
   From Alexander Duyck.

8) Fix races between RDS tcp accept and sending, from Sowmini Varadhan.

9) In macb driver, probe MDIO bus before we register the netdev, otherwise
   we can try to open the device before it is really ready for that.  Fix
  from Florian Fainelli.

10) Netlink attribute size for ILA "tunnels" not calculated properly, fix
from Nicolas Dichtel.

Please pull, thanks a lot!

The following changes since commit 33656a1f2ee5346c742d63ddd0e0970c95a56b70:

  Merge branch 'for_linus' of 
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs (2016-05-02 
09:59:57 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 

for you to fetch changes up to 79e8dc8b80bff0bc5bbb90ca5e73044bf207c8ac:

  ipv6/ila: fix nlsize calculation for lwtunnel (2016-05-03 16:21:33 -0400)


Alexander Duyck (2):
  net: Disable segmentation if checksumming is not supported
  vxlan: Add checksum check to the features check function

Anna-Maria Gleixner (1):
  net: mvneta: Remove superfluous SMP function call

Antonio Quartulli (2):
  batman-adv: fix DAT candidate selection (must use vid)
  batman-adv: B.A.T.M.A.N V - make sure iface is reactivated upon NETDEV_UP 
event

David S. Miller (4):
  Merge tag 'batman-adv-fix-for-davem' of 
git://git.open-mesh.org/linux-merge
  Merge branch 'mlx5-fixes'
  Merge branch 'tunnel-csum-and-sg-offloads'
  Merge branch 'rds-fixes'

Florian Fainelli (1):
  net: macb: Probe MDIO bus before registering netdev

Gal Pressman (1):
  net/mlx5: Unmap only the relevant IO memory mapping

Matthew Finlay (3):
  net/mlx5: Kconfig: Fix MLX5_EN/VXLAN build issue
  net/mlx5e: Implement a mlx5e workqueue
  net/mlx5e: Use workqueue for vxlan ops

Neil Horman (1):
  netem: Segment GSO packets on enqueue

Nicolas Dichtel (1):
  ipv6/ila: fix nlsize calculation for lwtunnel

Sergei Shtylyov (2):
  pxa168_eth: fix mdiobus_scan() error check
  macb: fix mdiobus_scan() error check

Sowmini Varadhan (2):
  RDS:TCP: Synchronize rds_tcp_accept_one with rds_send_xmit when resetting 
t_sock
  RDS: TCP: Synchronize accept() and connect() paths on t_conn_lock.

Sven Eckelmann (2):
  batman-adv: Fix reference counting of vlan object for tt_local_entry
  batman-adv: Fix reference counting of hardif_neigh_node object for 
neigh_node

 drivers/net/ethernet/cadence/macb.c   | 34 
+-
 drivers/net/ethernet/marvell/mvneta.c |  6 ++
 drivers/net/ethernet/marvell/pxa168_eth.c |  2 ++
 drivers/net/ethernet/mellanox/mlx5/core/Kconfig   |  1 +
 drivers/net/ethernet/mellanox/mlx5/core/en.h  |  1 +
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 34 
+-
 drivers/net/ethernet/mellanox/mlx5/core/uar.c |  6 --
 drivers/net/ethernet/mellanox/mlx5/core/vxlan.c   | 50 
++
 drivers/net/ethernet/mellanox/mlx5/core/vxlan.h   | 11 +--
 include/linux/if_ether.h  |  5 +
 include/net/vxlan.h   |  4 +++-
 net/batman-adv/bat_v.c| 12 
 net/batman-adv/distributed-arp-table.c| 17 ++---
 net/batman-adv/hard-interface.c   |  3 +++
 net/batman-adv/originator.c   | 16 +---
 net/batman-adv/translation-table.c| 42 
--
 net/batman-adv/types.h|  7 +++
 net/core/dev.c|  2 +-
 net/ipv6/ila/ila_lwt.c|  3 +--
 net/rds/tcp.c |  3 ++-
 net/rds/tcp.h |  4 
 net/rds/tcp_connect.c |  8 
 net/rds/tcp_listen.c  | 54 
--
 net/sched/sch_netem.c | 61 
+++--
 24 

[PATCH] fix infoleak in wireless

2016-05-03 Thread Kangjie Lu
The 6-byte array “mac_addr” is not initialized in the dump_station
implementations of “drivers/staging/wilc1000/wilc_wfi_cfgoperations.c”
and “drivers/staging/rtl8723au/os_dep/ioctl_cfg80211.c”, so all 6
bytes may be leaked.

Signed-off-by: Kangjie Lu 
---
 net/wireless/nl80211.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c
index 056a730..2e92d14 100644
--- a/net/wireless/nl80211.c
+++ b/net/wireless/nl80211.c
@@ -3905,6 +3905,7 @@ static int nl80211_dump_station(struct sk_buff *skb,
 
while (1) {
memset(&sinfo, 0, sizeof(sinfo));
+   eth_zero_addr(mac_addr);
err = rdev_dump_station(rdev, wdev->netdev, sta_idx,
mac_addr, &sinfo);
if (err == -ENOENT)
-- 
1.9.1



Re: v4.6-rc1 regression bisected, Problem loading in-kernel X.509 certificate (-2)

2016-05-03 Thread David Howells
(cc'ing Tadeusz as he did the pkcs1 padding function)

Jamie Heilman  wrote:

> > > Problem loading in-kernel X.509 certificate (-2)
> > 
> > ENOENT?  Hmmm...  The only place that is generated is in the crypto layer.
> > That suggests missing crypto of some sort.
> > 
> > The attached patch enables some debugging in some relevant files if you can
> > try applying it to your kernel.
> 
> Alrighty, presumably relevant bits:
>
> X.509: Cert Issuer: Build time autogenerated kernel key
> X.509: Cert Subject: Build time autogenerated kernel key
> X.509: Cert Key Algo: rsa
> X.509: Cert Valid period: 1461826791-4615426791
> X.509: Cert Signature: rsa + sha512
> X.509: ==>x509_check_signature()
> X.509: ==>x509_get_sig_params()
> X.509: <==x509_get_sig_params() = 0
> PKEY: ==>public_key_verify_signature()
> X.509: Cert Verification: -2

Hmmm...  Okay, the only ways out of public_key_verify_signature() without
printing a leaving message are for snprintf() to overrun (which would return
error -22) or for crypto_alloc_akcipher() to have failed; everything else must
go through the kleave() at the pr_devel() at the bottom of the function.

Can you stick:

pr_devel("ALGO: %s\n", alg_name);

immediately before this line:

tfm = crypto_alloc_akcipher(alg_name, 0, 0);

and try it again?

Thanks,
David


[PATCH] fix infoleak in mm

2016-05-03 Thread Kangjie Lu
The stack object “si” has a total size of 128 bytes; however, only 20
bytes are initialized. The remaining uninitialized bytes are sent
to userland via send_signal

Signed-off-by: Kangjie Lu 
---
 arch/arm64/mm/fault.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 95df28b..f790eda 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -117,6 +117,7 @@ static void __do_user_fault(struct task_struct *tsk, 
unsigned long addr,
 {
struct siginfo si;
 
+   memset(&si, 0, sizeof(si));
if (unhandled_signal(tsk, sig) && show_unhandled_signals_ratelimited()) 
{
pr_info("%s[%d]: unhandled %s (%d) at 0x%08lx, esr 0x%03x\n",
tsk->comm, task_pid_nr(tsk), fault_name(esr), sig,
-- 
1.9.1



[PATCH] fix infoleak in llc

2016-05-03 Thread Kangjie Lu
The stack object “info” has a total size of 12 bytes. Its last byte
is padding, which is not initialized and is leaked via “put_cmsg”.
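
For reference, the structure is roughly (include/uapi/linux/llc.h, as I
recall it):

	struct llc_pktinfo {
		int lpi_ifindex;			/* 4 bytes */
		unsigned char lpi_sap;			/* 1 byte  */
		unsigned char lpi_mac[IFHWADDRLEN];	/* 6 bytes */
	};	/* 11 bytes of fields -> sizeof() == 12, 1 byte of tail padding */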

Signed-off-by: Kangjie Lu 
---
 net/llc/af_llc.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/llc/af_llc.c b/net/llc/af_llc.c
index b3c52e3..8ae3ed9 100644
--- a/net/llc/af_llc.c
+++ b/net/llc/af_llc.c
@@ -626,6 +626,7 @@ static void llc_cmsg_rcv(struct msghdr *msg, struct sk_buff 
*skb)
if (llc->cmsg_flags & LLC_CMSG_PKTINFO) {
struct llc_pktinfo info;
 
+   memset(&info, 0, sizeof(info));
info.lpi_ifindex = llc_sk(skb->sk)->dev->ifindex;
llc_pdu_decode_dsap(skb, &info.lpi_sap);
llc_pdu_decode_da(skb, info.lpi_mac);
-- 
1.9.1



[PATCH] fix infoleak in fcntl

2016-05-03 Thread Kangjie Lu
The stack object “si” has a total size of 128 bytes; however, only
16 bytes are initialized. The remaining uninitialized bytes are
sent to userland via send_signal.

Signed-off-by: Kangjie Lu 
---
 fs/fcntl.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/fcntl.c b/fs/fcntl.c
index 350a2c8..d06f943 100644
--- a/fs/fcntl.c
+++ b/fs/fcntl.c
@@ -468,6 +468,7 @@ static void send_sigio_to_task(struct task_struct *p,
   delivered even if we can't queue.  Failure to
   queue in this case _should_ be reported; we fall
   back to SIGIO in that case. --sct */
+   memset(&si, 0, sizeof(si));
si.si_signo = signum;
si.si_errno = 0;
si.si_code  = reason;
-- 
1.9.1



[PATCH 0/3] [media] s5p-mfc: Fixes for issues when module is removed

2016-05-03 Thread Javier Martinez Canillas
Hello,

This patch series fixes some issues that I noticed when trying to remove
the s5p-mfc driver when built as a module.

Some of these issues will be fixed once Marek's patches land and the
custom memory region reservation code is replaced by a generic one that
supports named memory region reservation [0]. But the fixes are trivial,
so we can fix the current code until his rework patch lands.

[0]: https://patchwork.linuxtv.org/patch/32287/

Best regards,
Javier


Javier Martinez Canillas (3):
  [media] s5p-mfc: Set device name for reserved memory region devs
  [media] s5p-mfc: Add release callback for memory region devs
  [media] s5p-mfc: Fix race between s5p_mfc_probe() and s5p_mfc_open()

 drivers/media/platform/s5p-mfc/s5p_mfc.c | 50 
 1 file changed, 32 insertions(+), 18 deletions(-)

-- 
2.5.5



[PATCH] fix infoleak in devio

2016-05-03 Thread Kangjie Lu
The stack object “ci” has a total size of 8 bytes. Its last 3 bytes
are padding bytes which are not initialized and leaked to userland
via “copy_to_user”.
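
For reference, the structure is roughly (my reading of
include/uapi/linux/usbdevice_fs.h):

	struct usbdevfs_connectinfo {
		unsigned int devnum;	/* 4 bytes */
		unsigned char slow;	/* 1 byte, followed by 3 bytes of tail padding */
	};				/* sizeof() == 8 */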

Signed-off-by: Kangjie Lu 
---
 drivers/usb/core/devio.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/usb/core/devio.c b/drivers/usb/core/devio.c
index 52c4461..9b7f1f7 100644
--- a/drivers/usb/core/devio.c
+++ b/drivers/usb/core/devio.c
@@ -1316,10 +1316,11 @@ static int proc_getdriver(struct usb_dev_state *ps, 
void __user *arg)
 
 static int proc_connectinfo(struct usb_dev_state *ps, void __user *arg)
 {
-   struct usbdevfs_connectinfo ci = {
-   .devnum = ps->dev->devnum,
-   .slow = ps->dev->speed == USB_SPEED_LOW
-   };
+   struct usbdevfs_connectinfo ci;
+
+   memset(&ci, 0, sizeof(ci));
+   ci.devnum = ps->dev->devnum;
+   ci.slow = ps->dev->speed == USB_SPEED_LOW;
 
if (copy_to_user(arg, &ci, sizeof(ci)))
return -EFAULT;
-- 
1.9.1



[PATCH 3/3] [media] s5p-mfc: Fix race between s5p_mfc_probe() and s5p_mfc_open()

2016-05-03 Thread Javier Martinez Canillas
The s5p_mfc_probe() function registers the video devices before all the
resources needed by s5p_mfc_open() are correctly initalized.

So if s5p_mfc_open() function is called before s5p_mfc_probe() finishes
(since the video dev is already registered), a NULL pointer dereference
will happen due s5p_mfc_open() accessing uninitialized vars such as the
struct s5p_mfc_dev .watchdog_timer and .mfc_ops fields.

An example is following BUG caused by add_timer() getting a NULL pointer:

[   45.765374] kernel BUG at kernel/time/timer.c:790!
[   45.765381] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
...
[   45.766149] [] (mod_timer) from [] 
(s5p_mfc_open+0x274/0x4d4 [s5p_mfc])
[   45.766416] [] (s5p_mfc_open [s5p_mfc]) from [] 
(v4l2_open+0x9c/0x100 [videodev])
[   45.766547] [] (v4l2_open [videodev]) from [] 
(chrdev_open+0x9c/0x178)
[   45.766575] [] (chrdev_open) from [] 
(do_dentry_open+0x1e0/0x300)
[   45.766595] [] (do_dentry_open) from [] 
(path_openat+0x800/0x10d4)
[   45.766610] [] (path_openat) from [] 
(do_filp_open+0x5c/0xc0)
[   45.766624] [] (do_filp_open) from [] 
(do_sys_open+0x10c/0x1bc)
[   45.766642] [] (do_sys_open) from [] 
(ret_fast_syscall+0x0/0x3c)
[   45.766655] Code: eae3 e3a1 e28dd008 e8bd81f0 (e7f001f2)

Fix it by registering the video devs as the last step in s5p_mfc_probe().

Signed-off-by: Javier Martinez Canillas 

---

 drivers/media/platform/s5p-mfc/s5p_mfc.c | 39 +---
 1 file changed, 21 insertions(+), 18 deletions(-)

diff --git a/drivers/media/platform/s5p-mfc/s5p_mfc.c 
b/drivers/media/platform/s5p-mfc/s5p_mfc.c
index beb4fd5bd326..501d822cad6b 100644
--- a/drivers/media/platform/s5p-mfc/s5p_mfc.c
+++ b/drivers/media/platform/s5p-mfc/s5p_mfc.c
@@ -1212,14 +1212,6 @@ static int s5p_mfc_probe(struct platform_device *pdev)
vfd->vfl_dir= VFL_DIR_M2M;
snprintf(vfd->name, sizeof(vfd->name), "%s", S5P_MFC_DEC_NAME);
dev->vfd_dec= vfd;
-   ret = video_register_device(vfd, VFL_TYPE_GRABBER, 0);
-   if (ret) {
-   v4l2_err(>v4l2_dev, "Failed to register video device\n");
-   video_device_release(vfd);
-   goto err_dec_reg;
-   }
-   v4l2_info(>v4l2_dev,
- "decoder registered as /dev/video%d\n", vfd->num);
video_set_drvdata(vfd, dev);
 
/* encoder */
@@ -1237,14 +1229,6 @@ static int s5p_mfc_probe(struct platform_device *pdev)
vfd->vfl_dir= VFL_DIR_M2M;
snprintf(vfd->name, sizeof(vfd->name), "%s", S5P_MFC_ENC_NAME);
dev->vfd_enc= vfd;
-   ret = video_register_device(vfd, VFL_TYPE_GRABBER, 0);
-   if (ret) {
-   v4l2_err(>v4l2_dev, "Failed to register video device\n");
-   video_device_release(vfd);
-   goto err_enc_reg;
-   }
-   v4l2_info(>v4l2_dev,
- "encoder registered as /dev/video%d\n", vfd->num);
video_set_drvdata(vfd, dev);
platform_set_drvdata(pdev, dev);
 
@@ -1261,15 +1245,34 @@ static int s5p_mfc_probe(struct platform_device *pdev)
s5p_mfc_init_hw_cmds(dev);
s5p_mfc_init_regs(dev);
 
+   /* Register decoder and encoder */
+   ret = video_register_device(dev->vfd_dec, VFL_TYPE_GRABBER, 0);
+   if (ret) {
+   v4l2_err(&dev->v4l2_dev, "Failed to register video device\n");
+   video_device_release(dev->vfd_dec);
+   goto err_dec_reg;
+   }
+   v4l2_info(&dev->v4l2_dev,
+ "decoder registered as /dev/video%d\n", dev->vfd_dec->num);
+
+   ret = video_register_device(dev->vfd_enc, VFL_TYPE_GRABBER, 0);
+   if (ret) {
+   v4l2_err(&dev->v4l2_dev, "Failed to register video device\n");
+   video_device_release(dev->vfd_enc);
+   goto err_enc_reg;
+   }
+   v4l2_info(&dev->v4l2_dev,
+ "encoder registered as /dev/video%d\n", dev->vfd_enc->num);
+
pr_debug("%s--\n", __func__);
return 0;
 
 /* Deinit MFC if probe had failed */
 err_enc_reg:
-   video_device_release(dev->vfd_enc);
-err_enc_alloc:
video_unregister_device(dev->vfd_dec);
 err_dec_reg:
+   video_device_release(dev->vfd_enc);
+err_enc_alloc:
video_device_release(dev->vfd_dec);
 err_dec_alloc:
	v4l2_device_unregister(&dev->v4l2_dev);
-- 
2.5.5



[PATCH 2/3] [media] s5p-mfc: Add release callback for memory region devs

2016-05-03 Thread Javier Martinez Canillas
When s5p_mfc_remove() calls put_device() for the reserved memory region
devs, the driver core warns that the dev doesn't have a release callback:

WARNING: CPU: 0 PID: 591 at drivers/base/core.c:251 device_release+0x8c/0x90
Device 's5p-mfc-l' does not have a release() function, it is broken and must be 
fixed.

Also, the DMA memory declared using dma_declare_coherent_memory() isn't
released, so add a dev .release callback that calls dma_release_declared_memory().
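
For context on the mechanism: put_device() only drops a reference; when the
last one goes away the driver core's device_release() invokes dev->release(),
so that callback is the one safe place to undo dma_declare_coherent_memory().
A minimal sketch of the pairing with a hypothetical mem_dev (illustrative,
not the driver's exact code):

#include <linux/device.h>
#include <linux/dma-mapping.h>

static void my_memdev_release(struct device *dev)
{
	/* Runs once, when the last reference is dropped via put_device(). */
	dma_release_declared_memory(dev);
}

static void my_memdev_init(struct device *mem_dev)
{
	/* Install the callback before the device can be referenced. */
	mem_dev->release = my_memdev_release;
	device_initialize(mem_dev);
}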

Cc: 
Fixes: 6e83e6e25eb4 ("[media] s5p-mfc: Fix kernel warning on memory init")
Signed-off-by: Javier Martinez Canillas 

---

 drivers/media/platform/s5p-mfc/s5p_mfc.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/media/platform/s5p-mfc/s5p_mfc.c 
b/drivers/media/platform/s5p-mfc/s5p_mfc.c
index 8fc1fd4ee2ed..beb4fd5bd326 100644
--- a/drivers/media/platform/s5p-mfc/s5p_mfc.c
+++ b/drivers/media/platform/s5p-mfc/s5p_mfc.c
@@ -1050,6 +1050,11 @@ static int match_child(struct device *dev, void *data)
return !strcmp(dev_name(dev), (char *)data);
 }
 
+static void s5p_mfc_memdev_release(struct device *dev)
+{
+   dma_release_declared_memory(dev);
+}
+
 static void *mfc_get_drv_data(struct platform_device *pdev);
 
 static int s5p_mfc_alloc_memdevs(struct s5p_mfc_dev *dev)
@@ -1064,6 +1069,7 @@ static int s5p_mfc_alloc_memdevs(struct s5p_mfc_dev *dev)
}
 
dev_set_name(dev->mem_dev_l, "%s", "s5p-mfc-l");
+   dev->mem_dev_l->release = s5p_mfc_memdev_release;
device_initialize(dev->mem_dev_l);
of_property_read_u32_array(dev->plat_dev->dev.of_node,
"samsung,mfc-l", mem_info, 2);
@@ -1083,6 +1089,7 @@ static int s5p_mfc_alloc_memdevs(struct s5p_mfc_dev *dev)
}
 
dev_set_name(dev->mem_dev_r, "%s", "s5p-mfc-r");
+   dev->mem_dev_r->release = s5p_mfc_memdev_release;
device_initialize(dev->mem_dev_r);
of_property_read_u32_array(dev->plat_dev->dev.of_node,
"samsung,mfc-r", mem_info, 2);
-- 
2.5.5



[PATCH 1/3] [media] s5p-mfc: Set device name for reserved memory region devs

2016-05-03 Thread Javier Martinez Canillas
The devices don't have a name set, so dev_name() returns NULL, which makes
it harder to identify the devices that are causing issues, for example:

WARNING: CPU: 2 PID: 616 at drivers/base/core.c:251 device_release+0x8c/0x90
Device '(null)' does not have a release() function, it is broken and must be 
fixed.

And after setting the device name:

WARNING: CPU: 0 PID: 591 at drivers/base/core.c:251 device_release+0x8c/0x90
Device 's5p-mfc-l' does not have a release() function, it is broken and must be 
fixed.
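
As a small illustration, naming the device before device_initialize() means
dev_name() has something useful to print whenever the driver core later
warns about it; a hedged sketch with a hypothetical mem_dev:

#include <linux/device.h>

static void my_memdev_init(struct device *mem_dev)
{
	/* Name the device first so later warnings don't show '(null)'. */
	dev_set_name(mem_dev, "%s", "s5p-mfc-l");
	device_initialize(mem_dev);
}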

Cc: 
Fixes: 6e83e6e25eb4 ("[media] s5p-mfc: Fix kernel warning on memory init")
Signed-off-by: Javier Martinez Canillas 

---

 drivers/media/platform/s5p-mfc/s5p_mfc.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/media/platform/s5p-mfc/s5p_mfc.c 
b/drivers/media/platform/s5p-mfc/s5p_mfc.c
index b16466fe35ee..8fc1fd4ee2ed 100644
--- a/drivers/media/platform/s5p-mfc/s5p_mfc.c
+++ b/drivers/media/platform/s5p-mfc/s5p_mfc.c
@@ -1062,6 +1062,8 @@ static int s5p_mfc_alloc_memdevs(struct s5p_mfc_dev *dev)
mfc_err("Not enough memory\n");
return -ENOMEM;
}
+
+   dev_set_name(dev->mem_dev_l, "%s", "s5p-mfc-l");
device_initialize(dev->mem_dev_l);
of_property_read_u32_array(dev->plat_dev->dev.of_node,
"samsung,mfc-l", mem_info, 2);
@@ -1079,6 +1081,8 @@ static int s5p_mfc_alloc_memdevs(struct s5p_mfc_dev *dev)
mfc_err("Not enough memory\n");
return -ENOMEM;
}
+
+   dev_set_name(dev->mem_dev_r, "%s", "s5p-mfc-r");
device_initialize(dev->mem_dev_r);
of_property_read_u32_array(dev->plat_dev->dev.of_node,
"samsung,mfc-r", mem_info, 2);
-- 
2.5.5


