Re: [PATCH v4 5/5] staging/android: add flags member to sync ioctl structs

2016-02-28 Thread Emil Velikov
On 27 February 2016 at 15:27, Gustavo Padovan wrote:
> Hi Emil,
>
> 2016-02-27 Emil Velikov :
>
>> Hi Gustavo,
>>
>> On 26 February 2016 at 18:31, Gustavo Padovan  wrote:
>> > From: Gustavo Padovan 
>> >
>> > Play safe and add a flags member to all structs, so we don't need to
>> > break the API or create a new IOCTL in the future if new features that
>> > require flags arise.
>> >
>> > v2: check if flags are valid (zero, in this case)
>> >
>> > Signed-off-by: Gustavo Padovan 
>> > ---
>> >  drivers/staging/android/sync.c  | 7 ++-
>> >  drivers/staging/android/uapi/sync.h | 6 ++
>> >  2 files changed, 12 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/drivers/staging/android/sync.c b/drivers/staging/android/sync.c
>> > index 837cff5..54fd5ab 100644
>> > --- a/drivers/staging/android/sync.c
>> > +++ b/drivers/staging/android/sync.c
>> > @@ -445,6 +445,11 @@ static long sync_file_ioctl_merge(struct sync_file *sync_file,
>> > goto err_put_fd;
>> > }
>> >
>> > +   if (data.flags) {
>> > +   err = -EFAULT;
>> -EINVAL ?
>>
>> > +   goto err_put_fd;
>> > +   }
>> > +
>> > fence2 = sync_file_fdget(data.fd2);
>> > if (!fence2) {
>> > err = -ENOENT;
>> > @@ -511,7 +516,7 @@ static long sync_file_ioctl_fence_info(struct sync_file *sync_file,
>> > if (copy_from_user(&in, (void __user *)arg, sizeof(*info)))
>> > return -EFAULT;
>> >
>> > -   if (in.status || strcmp(in.name, "\0"))
>> > +   if (in.status || in.flags || strcmp(in.name, "\0"))
>> > return -EFAULT;
>> -EINVAL ?
>>
>> >
>> > if (in.num_fences && !in.sync_fence_info)
>> > diff --git a/drivers/staging/android/uapi/sync.h b/drivers/staging/android/uapi/sync.h
>> > index 9aad623..f56a6c2 100644
>> > --- a/drivers/staging/android/uapi/sync.h
>> > +++ b/drivers/staging/android/uapi/sync.h
>> > @@ -19,11 +19,13 @@
>> >   * @fd2:   file descriptor of second fence
>> >   * @name:  name of new fence
>> >   * @fence: returns the fd of the new fence to userspace
>> > + * @flags: merge_data flags
>> >   */
>> >  struct sync_merge_data {
>> > __s32   fd2;
>> > char    name[32];
>> > __s32   fence;
>> > +   __u32   flags;
>> The overall size of the struct is not multiple of 64bit, so things
>> will end up badly if we decide to extend it in the future. Even if
>> there's a small chance that update will be needed, we might as well
>> pad it now (and check the padding for zero, returning -EINVAL).
>
> I think name could be the first field here.
>
Up to you, really. I'm afraid that doesn't resolve the issue :-(
As a test, add a u64 value at the end of the struct and check the
pahole output for 32- and 64-bit builds.
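Emil's pahole experiment can also be approximated in plain C11 with offsetof() and sizeof(). In the sketch below the struct copies are hypothetical (the _v4/_padded/_ext names and the pad member are not from the patch), and <stdint.h> types stand in for the uapi __s32/__u32/__u64. The v4 layout is 44 bytes, so a 64-bit member appended later would start at offset 48 on x86_64 but at 44 on i386, which aligns 64-bit struct members to 4 bytes. With an explicit 32-bit pad the struct is 48 bytes and an appended 64-bit member lands at offset 48 on both. check_merge_data() is a hypothetical helper showing the suggested -EINVAL for non-zero flags or padding.

```c
#include <assert.h>   /* static_assert (C11) */
#include <errno.h>    /* EINVAL */
#include <stddef.h>   /* offsetof */
#include <stdint.h>   /* stand-ins for __s32/__u32/__u64 */

/* Layout proposed in v4: 4 + 32 + 4 + 4 = 44 bytes. */
struct sync_merge_data_v4 {
	int32_t  fd2;
	char     name[32];
	int32_t  fence;
	uint32_t flags;
};

/* Appending a 64-bit field to v4: offset 48 on x86_64, 44 on i386. */
struct sync_merge_data_v4_ext {
	struct sync_merge_data_v4 base;
	uint64_t new_field;
};

/* Hypothetical padded variant: 48 bytes, a multiple of 8. */
struct sync_merge_data_padded {
	int32_t  fd2;
	char     name[32];
	int32_t  fence;
	uint32_t flags;
	uint32_t pad;		/* reserved, must be zero */
};

/* Appending a 64-bit field to the padded variant: offset 48 everywhere. */
struct sync_merge_data_ext {
	struct sync_merge_data_padded base;
	uint64_t new_field;
};

static_assert(sizeof(struct sync_merge_data_v4) == 44, "v4 layout is 44 bytes");
static_assert(sizeof(struct sync_merge_data_padded) == 48, "padded layout is 48 bytes");
static_assert(offsetof(struct sync_merge_data_ext, new_field) == 48,
	      "appended u64 has the same offset on 32- and 64-bit ABIs");

/* Unknown flags or non-zero padding are invalid arguments, not bad
 * user pointers, hence -EINVAL rather than -EFAULT. */
int check_merge_data(const struct sync_merge_data_padded *d)
{
	if (d->flags || d->pad)
		return -EINVAL;
	return 0;
}
```

In sync.c the equivalent test would sit where the patch already checks data.flags; -EFAULT is conventionally reserved for failed user copies such as copy_from_user().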

>>
>> >  };
>> >
>> >  /**
>> > @@ -31,12 +33,14 @@ struct sync_merge_data {
>> >   * @obj_name:  name of parent sync_timeline
>> >   * @driver_name:   name of driver implementing the parent
>> >   * @status:      status of the fence 0:active 1:signaled <0:error
>> > + * @flags: fence_info flags
>> >   * @timestamp_ns:  timestamp of status change in nanoseconds
>> >   */
>> >  struct sync_fence_info {
>> > char    obj_name[32];
>> > char    driver_name[32];
>> > __s32   status;
>> > +   __u32   flags;
>> > __u64   timestamp_ns;
>> Should we be doing some form of validation in sync_fill_fence_info()
>> of 'flags' ?
>
> Do you think it is necessary? The kernel allocates a zero'ed buffer to
> fill sync_fence_info array.
>
Good point. I missed the z in kzalloc :-)

-Emil
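Two points from this exchange can be checked in user space. First, in struct sync_fence_info the new 32-bit flags member fills the 4-byte gap that x86_64 would otherwise insert between status (offset 64) and the 64-bit timestamp_ns; i386 has no such gap, so before the patch timestamp_ns sat at 72 on x86_64 but 68 on i386, and after it the offset is 72 on both. Second, Gustavo's kzalloc() point: a zero-filled buffer leaves flags at 0 unless the filler sets it. The sketch uses calloc() as a user-space stand-in for kzalloc(); fill_fence_info() and alloc_fence_infos() are hypothetical, and the fixed-width types stand in for __s32/__u32/__u64.

```c
#include <assert.h>   /* static_assert */
#include <stddef.h>   /* offsetof, size_t */
#include <stdint.h>
#include <stdlib.h>   /* calloc */
#include <string.h>   /* strncpy */

/* Layout after the patch: 32 + 32 + 4 + 4 + 8 = 80 bytes. */
struct sync_fence_info {
	char     obj_name[32];
	char     driver_name[32];
	int32_t  status;
	uint32_t flags;		/* fills the hole before timestamp_ns */
	uint64_t timestamp_ns;
};

static_assert(offsetof(struct sync_fence_info, timestamp_ns) == 72,
	      "timestamp_ns offset is identical on 32- and 64-bit ABIs");
static_assert(sizeof(struct sync_fence_info) == 80, "no tail padding");

/* Hypothetical filler: sets names, status and timestamp, never flags. */
static void fill_fence_info(struct sync_fence_info *info, int32_t status,
			    uint64_t ts)
{
	/* The buffer is zeroed, so copying at most size - 1 bytes keeps
	 * the strings NUL-terminated. */
	strncpy(info->obj_name, "timeline", sizeof(info->obj_name) - 1);
	strncpy(info->driver_name, "driver", sizeof(info->driver_name) - 1);
	info->status = status;
	info->timestamp_ns = ts;
}

/* calloc() zero-fills like kzalloc(), so every flags member starts at 0. */
struct sync_fence_info *alloc_fence_infos(size_t n)
{
	struct sync_fence_info *infos = calloc(n, sizeof(*infos));

	if (!infos)
		return NULL;
	for (size_t i = 0; i < n; i++)
		fill_fence_info(&infos[i], 1, 1000 + i);
	return infos;
}
```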


Re: [PATCH v2] signals, pkeys: make si_pkey 32 bits

2016-02-28 Thread Ingo Molnar

* Stephen Rothwell  wrote:

> In order to prevent a change of alignment of the _sifields union in the
> siginfo structure on (some) 32 bit platforms and an ABI breakage, we
> change the type of _pkey to unsigned int.  If more bits are needed in
> the future, a second unsigned int could be added.
> 
> Fixes: cd0ea35ff551 ("signals, pkeys: Notify userspace about protection key faults")
> Acked-by: Dave Hansen 
> Signed-off-by: Stephen Rothwell 
> ---
>  arch/ia64/include/uapi/asm/siginfo.h | 2 +-
>  arch/mips/include/uapi/asm/siginfo.h | 2 +-
>  include/uapi/asm-generic/siginfo.h   | 2 +-
>  3 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/ia64/include/uapi/asm/siginfo.h b/arch/ia64/include/uapi/asm/siginfo.h
> index 0151cfab929d..19e7db0c9453 100644
> --- a/arch/ia64/include/uapi/asm/siginfo.h
> +++ b/arch/ia64/include/uapi/asm/siginfo.h
> @@ -70,7 +70,7 @@ typedef struct siginfo {
>   void __user *_upper;
>   } _addr_bnd;
>   /* used when si_code=SEGV_PKUERR */
> - u64 _pkey;
> + unsigned int _pkey;
>   };
>   } _sigfault;
>  
> diff --git a/arch/mips/include/uapi/asm/siginfo.h b/arch/mips/include/uapi/asm/siginfo.h
> index 6f4edf0d794c..3cc14f4a5936 100644
> --- a/arch/mips/include/uapi/asm/siginfo.h
> +++ b/arch/mips/include/uapi/asm/siginfo.h
> @@ -93,7 +93,7 @@ typedef struct siginfo {
>   void __user *_upper;
>   } _addr_bnd;
>   /* used when si_code=SEGV_PKUERR */
> - u64 _pkey;
> + unsigned int _pkey;
>   };
>   } _sigfault;
>  
> diff --git a/include/uapi/asm-generic/siginfo.h b/include/uapi/asm-generic/siginfo.h
> index 90384d55225b..f4459dc3d31b 100644
> --- a/include/uapi/asm-generic/siginfo.h
> +++ b/include/uapi/asm-generic/siginfo.h
> @@ -98,7 +98,7 @@ typedef struct siginfo {
>   void __user *_upper;
>   } _addr_bnd;
>   /* used when si_code=SEGV_PKUERR */
> - u64 _pkey;
> + unsigned int _pkey;
>   };
>   } _sigfault;
>  

Please use the standard ABI integer type pattern: __u32.

The advantage of only using __[su][8|16|32|64] integer types is that it's 
"obvious" at a glance that an ABI is bitness-invariant.

For example include/uapi/linux/perf_event.h only uses such ABI-safe types, and 
arch/x86/include/uapi is using these types 95%+ of the time.

( The various struct siginfo definitions should probably be harmonized as well, 
  but in a separate patch. )

Thanks,

Ingo
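The alignment problem in Stephen's changelog can be reproduced with a simplified model. The structs below are hypothetical stand-ins for siginfo (three leading int fields followed by a union; the real _sifields union has many more members), with pointers modelled as uint32_t as on a 32-bit platform and uint32_t standing in for the __u32 Ingo asks for. A 64-bit union member raises the union's alignment to that of uint64_t, so on 32-bit ABIs that align 64-bit integers to 8 bytes (for example ARM EABI) the union moves from offset 12 to 16; a 32-bit member leaves it at 12. align_up() is a hypothetical helper.

```c
#include <assert.h>   /* static_assert */
#include <stddef.h>   /* offsetof, size_t */
#include <stdint.h>

/* Hypothetical model with the original u64 _pkey. */
struct siginfo_model_u64 {
	int si_signo;
	int si_errno;
	int si_code;
	union {
		uint32_t addr;	/* models a 32-bit void __user *_addr */
		uint64_t pkey;	/* u64 _pkey */
	} sifields;
};

/* Hypothetical model with a 32-bit _pkey (__u32 in the uapi header). */
struct siginfo_model_u32 {
	int si_signo;
	int si_errno;
	int si_code;
	union {
		uint32_t addr;
		uint32_t pkey;
	} sifields;
};

/* Round off up to a multiple of align (align must be a power of two). */
size_t align_up(size_t off, size_t align)
{
	return (off + align - 1) & ~(align - 1);
}

/* With only 32-bit members the union directly follows the three ints. */
static_assert(offsetof(struct siginfo_model_u32, sifields) == 12,
	      "32-bit pkey keeps the union at offset 12");
```

On x86_64 the u64 model puts the union at 16; the real 64-bit siginfo already contains 8-byte pointers, which is why the change only matters on (some) 32-bit platforms.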


Re: linux-next: manual merge of the iommu tree with the samsung-krzk tree

2016-02-28 Thread Joerg Roedel
Hi Stephen,

On Mon, Feb 29, 2016 at 03:20:55PM +1100, Stephen Rothwell wrote:
> Hi Joerg,
> 
> Today's linux-next merge of the iommu tree got a conflict in:
> 
>   drivers/memory/Kconfig
> 
> between commit:
> 
>   78fbb9361ca3 ("memory: Add support for Exynos SROM driver")
> 
> from the samsung-krzk tree and commit:
> 
>   cc8bbe1a8312 ("memory: mediatek: Add SMI driver")
> 
> from the iommu tree.
> 
> I fixed it up (see below) and can carry the fix as necessary (no action
> is required).

Thanks for fixing this (and the other conflict before) up.



Joerg



Re: [PATCH] mm: __delete_from_page_cache WARN_ON(page_mapped)

2016-02-28 Thread Joonsoo Kim
2016-02-29 13:49 GMT+09:00 Hugh Dickins :
> Commit e1534ae95004 ("mm: differentiate page_mapped() from page_mapcount()
> for compound pages") changed the famous BUG_ON(page_mapped(page)) in
> __delete_from_page_cache() to VM_BUG_ON_PAGE(page_mapped(page)): which
> gives us more info when CONFIG_DEBUG_VM=y, but nothing at all when not.
>
> Although it has not usually been very helpful, being hit long after the
> error in question, we do need to know if it actually happens on users'
> systems; but reinstating a crash there is likely to be opposed :)
>
> In the non-debug case, use WARN_ON() plus dump_page() and add_taint() -
> I don't really believe LOCKDEP_NOW_UNRELIABLE, but that seems to be the
> standard procedure now.  Move that, or the VM_BUG_ON_PAGE(), up before
> the deletion from tree: so that the unNULLified page->mapping gives a
> little more information.
>
> If the inode is being evicted (rather than truncated), it won't have
> any vmas left, so it's safe(ish) to assume that the raised mapcount is
> erroneous, and we can discount it from page_count to avoid leaking the
> page (I'm less worried by leaking the occasional 4kB, than losing a
> potential 2MB page with each 4kB page leaked).
>
> Signed-off-by: Hugh Dickins 
> ---
> I think this should go into v4.5, so I've written it with an atomic_sub
> on page->_count; but Joonsoo will probably want some page_ref thingy.

Okay. I will do it after this patch is merged.

Thanks for the notification.

Thanks.
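Hugh's changelog describes the recovery step in prose: when the inode is being evicted, an erroneously raised mapcount is discounted from the page's reference count so the page is not leaked. The following is a user-space model only; struct page_model, its fields and discount_stale_mapcount() are hypothetical, C11 atomics stand in for the kernel's atomic_sub() on page->_count, and a fprintf() marks where the patch uses WARN_ON(), dump_page() and add_taint().

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model of the two counters involved. */
struct page_model {
	atomic_int refcount;	/* models page_count() */
	atomic_int mapcount;	/* models page_mapcount() */
};

/* Called when a page leaves the page cache while still (apparently)
 * mapped. Returns the reference count after any adjustment. */
int discount_stale_mapcount(struct page_model *page, bool evicting)
{
	int mapcount = atomic_load(&page->mapcount);

	if (mapcount > 0) {
		/* The non-debug path warns and taints here. */
		fprintf(stderr, "page still mapped: mapcount=%d\n", mapcount);
		if (evicting) {
			/* An evicted inode has no vmas left, so assume the
			 * mapcount is bogus and drop those references. */
			atomic_store(&page->mapcount, 0);
			atomic_fetch_sub(&page->refcount, mapcount);
		}
	}
	return atomic_load(&page->refcount);
}
```

On truncation the counts are left alone, because live mappings may still exist.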


Re: log spammed with "loading xx failed with error -2" since commit e40ba6d56b [replace call to fw_read_file_contents() with kernel version]

2016-02-28 Thread James Morris
On Sun, 28 Feb 2016, Luis R. Rodriguez wrote:

> >From e63d19975787c0e237a47c17efd01e41b2a8e2fa Mon Sep 17 00:00:00 2001
> From: "Luis R. Rodriguez" 
> Date: Sat, 27 Feb 2016 14:58:08 -0800
> Subject: [PATCH] firmware: change kernel read fail to dev_dbg()
> 

Applied to
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security.git next



-- 
James Morris




Re: [PATCH] [RFC] mm/page_ref, crypto/async_pq: don't put_page from __exit

2016-02-28 Thread Joonsoo Kim
2016-02-29 6:57 GMT+09:00 Arnd Bergmann :
> The addition of tracepoints to the page reference tracking had an
> unfortunate side-effect in at least one driver that calls put_page
> from its exit function, resulting in a link error:
>
> `.exit.text' referenced in section `__jump_table' of crypto/built-in.o: defined in discarded section `.exit.text' of crypto/built-in.o
>
> I could not come up with a nice solution that ignores __jump_table
> entries in discarded code, so we probably now have to treat this
> as something a driver is not allowed to do. Removing the __exit
> annotation avoids the problem in this particular driver, but the
> same problem could come back any time in other code.
>
> For a related problem regarding the runtime patching for SMP
> operations on ARM uniprocessor systems, we resorted to not
> dropping the .exit section at link time, but that doesn't seem
> appropriate here.
>
> Signed-off-by: Arnd Bergmann 
> Fixes: 0f80830dd044 ("mm/page_ref: add tracepoint to track down page reference manipulation")
> ---
>  crypto/async_tx/async_pq.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/crypto/async_tx/async_pq.c b/crypto/async_tx/async_pq.c
> index c0748bbd4c08..be167145aa55 100644
> --- a/crypto/async_tx/async_pq.c
> +++ b/crypto/async_tx/async_pq.c
> @@ -442,7 +442,7 @@ static int __init async_pq_init(void)
> return -ENOMEM;
>  }
>
> -static void __exit async_pq_exit(void)
> +static void async_pq_exit(void)
>  {
> put_page(pq_scribble_page);
>  }

Hello, Arnd.

I think that we can avoid this error by using __free_page().
It is not inlined, so calling it from __exit code should not cause this problem.

Could you test it, please?

Thanks.


Re: BUG: unable to handle kernel paging request from pty_write [was: Linux 4.4.2]

2016-02-28 Thread Jiri Slaby
On 02/26/2016, 08:59 PM, Robert Święcki wrote:
> It happens only with 0x6000832 ucode, and Piledriver-based CPUs: i.e.
> newer AMD FX, and Opteron 300 series (4300, 6300 etc.).

Ok, I can confirm this is:
AMD Opteron(tm) Processor 6348

And:
microcode: CPU0: patch_level=0x06000836

Thanks to all the interested parties!

-- 
js
suse labs


Re: [PATCH v5] perf/x86/amd/power: Add AMD accumulated power reporting mechanism

2016-02-28 Thread Huang Rui
On Fri, Feb 26, 2016 at 11:29:52AM +0100, Borislav Petkov wrote:
> On Fri, Feb 26, 2016 at 11:18:28AM +0100, Thomas Gleixner wrote:
> > On Fri, 26 Feb 2016, Huang Rui wrote:
> > > +/* Event code: LSB 8 bits, passed in attr->config any other bit is reserved. */
> > > +#define AMD_POWER_EVENT_MASK 0xFFULL
> > > +
> > > +#define MAX_CUS  8
> > 
> > What's that define for? Max compute units? So is that stuff eternally limited
> > to 8?
> 
> I already sent him a cleaned up version with that dumbness removed:
> 
> https://lkml.kernel.org/r/20160128145436.ge14...@pd.tnic
> 
> Rui, what's up?
> 

Sorry, I will remove the superfluous MAX_CUS check in the next version.

Thanks,
Rui
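The quoted comment says only the low 8 bits of attr->config carry the event code and every other bit is reserved. A hypothetical validation helper could reject configs with reserved bits set; only AMD_POWER_EVENT_MASK comes from the patch, amd_power_check_config() does not.

```c
#include <errno.h>
#include <stdint.h>

/* Event code: low 8 bits of attr->config; all other bits are reserved. */
#define AMD_POWER_EVENT_MASK 0xFFULL

/* Hypothetical helper: return the event code, or -EINVAL if any
 * reserved bit (bit 8 and above) is set. */
int amd_power_check_config(uint64_t config)
{
	if (config & ~AMD_POWER_EVENT_MASK)
		return -EINVAL;
	return (int)(config & AMD_POWER_EVENT_MASK);
}
```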


[PATCH] PCI: PTM preliminary implementation

2016-02-28 Thread Yong, Jonathan
Simplified Precision Time Measurement (PTM) driver: it activates the PTM
feature if a PCIe PTM requester (as per PCI Express 3.1 Base Specification
section 7.32) is found, but only after checking that the rest of the
PCI hierarchy can support it.

The driver does not take part in PTM conversations, nor does it
provide any other services; it is only responsible for setting up
the required configuration space bits.

As of this writing, there are no PTM-capable devices on the market
yet, but PTM is supported by the Intel Apollo Lake platform.

Signed-off-by: Yong, Jonathan 
---
 drivers/pci/pci-sysfs.c |   7 +
 drivers/pci/pci.h   |  21 +++
 drivers/pci/pcie/Kconfig|   8 +
 drivers/pci/pcie/Makefile   |   2 +-
 drivers/pci/pcie/pcie_ptm.c | 353 
 drivers/pci/probe.c |   3 +
 6 files changed, 393 insertions(+), 1 deletion(-)
 create mode 100644 drivers/pci/pcie/pcie_ptm.c

diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index 95d9e7b..c634fd11 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -1335,6 +1335,9 @@ static int pci_create_capabilities_sysfs(struct pci_dev *dev)
/* Active State Power Management */
pcie_aspm_create_sysfs_dev_files(dev);
 
+   /* PTM */
+   pci_create_ptm_sysfs(dev);
+
if (!pci_probe_reset_function(dev)) {
retval = device_create_file(&dev->dev, &reset_attr);
if (retval)
@@ -1433,6 +1436,10 @@ static void pci_remove_capabilities_sysfs(struct pci_dev *dev)
}
 
pcie_aspm_remove_sysfs_dev_files(dev);
+
+   /* PTM */
+   pci_release_ptm_sysfs(dev);
+
if (dev->reset_fn) {
device_remove_file(&dev->dev, &reset_attr);
dev->reset_fn = 0;
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 9a1660f..fb90420 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -320,6 +320,27 @@ static inline resource_size_t pci_resource_alignment(struct pci_dev *dev,
 
 void pci_enable_acs(struct pci_dev *dev);
 
+#ifdef CONFIG_PCIEPORTBUS
+int pci_enable_ptm(struct pci_dev *dev);
+void pci_create_ptm_sysfs(struct pci_dev *dev);
+void pci_release_ptm_sysfs(struct pci_dev *dev);
+void pci_disable_ptm(struct pci_dev *dev);
+#else
+static inline int pci_enable_ptm(struct pci_dev *dev)
+{
+   return -ENXIO;
+}
+static inline void pci_create_ptm_sysfs(struct pci_dev *dev)
+{
+}
+static inline void pci_release_ptm_sysfs(struct pci_dev *dev)
+{
+}
+static inline void pci_disable_ptm(struct pci_dev *dev)
+{
+}
+#endif
+
 struct pci_dev_reset_methods {
u16 vendor;
u16 device;
diff --git a/drivers/pci/pcie/Kconfig b/drivers/pci/pcie/Kconfig
index e294713..f65ff4d 100644
--- a/drivers/pci/pcie/Kconfig
+++ b/drivers/pci/pcie/Kconfig
@@ -80,3 +80,11 @@ endchoice
 config PCIE_PME
def_bool y
depends on PCIEPORTBUS && PM
+
+config PCIE_PTM
+   bool "Turn on Precision Time Measurement by default"
+   depends on PCIEPORTBUS
+   help
+ Say Y here to enable the PTM feature on PCI Express devices that
+ support it as they are found during device enumeration. Otherwise
+ the feature can be enabled manually through sysfs entries.
diff --git a/drivers/pci/pcie/Makefile b/drivers/pci/pcie/Makefile
index 00c62df..d18b4c7 100644
--- a/drivers/pci/pcie/Makefile
+++ b/drivers/pci/pcie/Makefile
@@ -5,7 +5,7 @@
 # Build PCI Express ASPM if needed
 obj-$(CONFIG_PCIEASPM) += aspm.o
 
-pcieportdrv-y  := portdrv_core.o portdrv_pci.o portdrv_bus.o
+pcieportdrv-y  := portdrv_core.o portdrv_pci.o portdrv_bus.o pcie_ptm.o
 pcieportdrv-$(CONFIG_ACPI) += portdrv_acpi.o
 
 obj-$(CONFIG_PCIEPORTBUS)  += pcieportdrv.o
diff --git a/drivers/pci/pcie/pcie_ptm.c b/drivers/pci/pcie/pcie_ptm.c
new file mode 100644
index 000..a128c79
--- /dev/null
+++ b/drivers/pci/pcie/pcie_ptm.c
@@ -0,0 +1,353 @@
+/*
+ * PCI Express Precision Time Measurement
+ * Copyright (c) 2016, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ */
+#include 
+#include 
+#include 
+#include "../pci.h"
+
+#define PCI_PTM_REQ            0x0001  /* Requester capable */
+#define  PCI_PTM_RSP           0x0002  /* Responder capable */
+#define  PCI_PTM_ROOT          0x0004  /* Root capable */
+#define  PCI_PTM_GRANULITY     0xFF00  /* Local clock granularity */
+#define PCI_PTM_ENABLE         0x0001  /* PTM enable */
+#define  PCI_PTM_ROOT_SEL      0x0002  /* Root select */
+
+#define PCI_PTM_HEADER_REG_OFFSET   

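The bit definitions at the top of pcie_ptm.c can be exercised in isolation. The sketch decodes a PTM Capability register value and builds a PTM Control value; the PCI_PTM_* constants are copied from the patch, while struct ptm_caps, ptm_decode() and ptm_control() are hypothetical helpers, and the register values in the test are arbitrary.

```c
#include <stdbool.h>
#include <stdint.h>

/* PTM Capability register bits (from the patch) */
#define PCI_PTM_REQ            0x0001  /* Requester capable */
#define  PCI_PTM_RSP           0x0002  /* Responder capable */
#define  PCI_PTM_ROOT          0x0004  /* Root capable */
#define  PCI_PTM_GRANULITY     0xFF00  /* Local clock granularity */
/* PTM Control register bits (from the patch) */
#define PCI_PTM_ENABLE         0x0001  /* PTM enable */
#define  PCI_PTM_ROOT_SEL      0x0002  /* Root select */

/* Hypothetical decoded view of the capability register. */
struct ptm_caps {
	bool requester, responder, root;
	uint8_t granularity_ns;	/* 0: no local clock, propagates upstream time */
};

struct ptm_caps ptm_decode(uint32_t cap)
{
	struct ptm_caps c = {
		.requester = cap & PCI_PTM_REQ,
		.responder = cap & PCI_PTM_RSP,
		.root = cap & PCI_PTM_ROOT,
		.granularity_ns = (cap & PCI_PTM_GRANULITY) >> 8,
	};
	return c;
}

/* Enable PTM, and select the device as PTM root only if it is root
 * capable and the caller wants it to act as root. */
uint32_t ptm_control(const struct ptm_caps *c, bool want_root)
{
	uint32_t ctrl = PCI_PTM_ENABLE;

	if (want_root && c->root)
		ctrl |= PCI_PTM_ROOT_SEL;
	return ctrl;
}
```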
[RFC] PCI: PTM Driver

2016-02-28 Thread Yong, Jonathan
Hello LKML,

This is a preliminary implementation of a PTM[1] support driver; the code
is admittedly hacked together and in need of refactoring. This driver has
only been tested against a virtual PCI bus.

The driver's job is to reach every PTM-capable device, set some PCI config
space bits, then go back to sleep [2].

PTM capable PCIe devices will get a new sysfs entry to allow PTM to be
enabled if automatic PTM activation is disabled, or disabled if so desired.

Comments? Should I explain the PTM registers in more details?
Please CC me, thanks.

[1] Precision Time Measurement: a protocol for synchronizing PCIe endpoint
clocks with the host clock, as specified in the PCI Express Base
Specification 3.1. It is identified by extended capability ID 0x001f.
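Finding a device's PTM capability means walking the PCIe extended capability list, which starts at config-space offset 0x100; each 32-bit header holds the capability ID in bits 15:0, the version in bits 19:16 and the next offset in bits 31:20. In the kernel, pci_find_ext_capability() does this walk. The sketch below walks a hypothetical in-memory copy of config space instead of real hardware; find_ext_cap(), cfg_read32(), cfg_write32() and the locally defined constants are assumptions for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define PCI_CFG_SPACE_SIZE      256
#define PCI_CFG_SPACE_EXP_SIZE  4096
#define PCI_EXT_CAP_ID_PTM      0x001f

/* Little-endian 32-bit accessors on a config-space snapshot. */
uint32_t cfg_read32(const uint8_t *cfg, size_t off)
{
	return (uint32_t)cfg[off] | (uint32_t)cfg[off + 1] << 8 |
	       (uint32_t)cfg[off + 2] << 16 | (uint32_t)cfg[off + 3] << 24;
}

void cfg_write32(uint8_t *cfg, size_t off, uint32_t val)
{
	cfg[off] = val & 0xff;
	cfg[off + 1] = (val >> 8) & 0xff;
	cfg[off + 2] = (val >> 16) & 0xff;
	cfg[off + 3] = val >> 24;
}

/* Return the offset of extended capability 'id', or 0 if absent. */
int find_ext_cap(const uint8_t *cfg, uint16_t id)
{
	size_t pos = PCI_CFG_SPACE_SIZE;
	/* Bound the walk so a corrupt, looping list cannot hang us. */
	int ttl = (PCI_CFG_SPACE_EXP_SIZE - PCI_CFG_SPACE_SIZE) / 8;

	while (ttl-- > 0 && pos >= PCI_CFG_SPACE_SIZE &&
	       pos + 4 <= PCI_CFG_SPACE_EXP_SIZE) {
		uint32_t header = cfg_read32(cfg, pos);

		if (header == 0)	/* no extended capabilities */
			return 0;
		if ((header & 0xffff) == id)
			return (int)pos;
		pos = (header >> 20) & 0xffc;	/* next pointer; 0 ends the list */
	}
	return 0;
}
```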

PTM-capable devices take one of three roles: master, responder, or requester.
In summary:

A master holds the master clock that will be used for all devices under its
domain (not to be confused with PCI domains). There may be multiple masters
in a PTM hierarchy, in which case the master closest to the root complex
is selected for the PTM domain. A master is also always responder capable.
Clock precision is signified by a Local Clock Granularity field, in
nanoseconds.

A responder responds to any PTM synchronization requests from a downstream
device. A responder is typically a switch device. It may also hold a local
clock signified by a non-zero Local Clock Granularity field. A value of 0
signifies that the device simply propagates timing information from
upstream devices.

A requester is typically an endpoint that will request synchronization
updates from an upstream PTM-capable time source. The driver will update
the Effective Clock Granularity field based on the same field from the
PTM domain master. The field should be programmed with a value of 0 if any
intervening responder has a Local Clock Granularity field value of 0.
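The Effective Clock Granularity rule can be read as a fold over the path from the PTM master down to the requester. The sketch assumes, as we read the PCI Express Base Specification, that when every time source on the path has a local clock the effective value is the coarsest (largest) Local Clock Granularity among them, and that a 0 anywhere on the path makes the result 0 (unknown). effective_granularity() is a hypothetical helper, not code from the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* path[0] is the PTM master's Local Clock Granularity in ns and
 * path[1..n-1] are the intervening responders; 0 means "no local
 * clock". Returns the value to program into the requester. */
uint8_t effective_granularity(const uint8_t *path, size_t n)
{
	uint8_t eff = 0;

	for (size_t i = 0; i < n; i++) {
		if (path[i] == 0)
			return 0;	/* unknown: some hop has no clock */
		if (path[i] > eff)
			eff = path[i];	/* keep the coarsest granularity */
	}
	return eff;
}
```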

[2] Software drivers never see the PTM packets; the PCI Express Base
Specification 3.1 reads:
PTM capable components can make their PTM context available for
inspection by software, enabling software to translate timing
information between local times and PTM Master Time.

This isn't very informative.

Yong, Jonathan (1):
  PCI: PTM preliminary implementation

 drivers/pci/pci-sysfs.c |   7 +
 drivers/pci/pci.h   |  21 +++
 drivers/pci/pcie/Kconfig|   8 +
 drivers/pci/pcie/Makefile   |   2 +-
 drivers/pci/pcie/pcie_ptm.c | 353 
 drivers/pci/probe.c |   3 +
 6 files changed, 393 insertions(+), 1 deletion(-)
 create mode 100644 drivers/pci/pcie/pcie_ptm.c

-- 
2.4.10



[PATCH] PCI: PTM preliminary implementation

2016-02-28 Thread Yong, Jonathan
Simplified Precision Time Measurement driver, activates PTM feature
if a PCIe PTM requester (as per PCI Express 3.1 Base Specification
section 7.32)is found, but not before checking if the rest of the
PCI hierarchy can support it.

The driver does not take part in facilitating PTM conversations,
neither does it provide any useful services, it is only responsible
for setting up the required configuration space bits.

As of writing, there aren't any PTM capable devices on the market
yet, but it is supported by the Intel Apollo Lake platform.

Signed-off-by: Yong, Jonathan 
---
 drivers/pci/pci-sysfs.c |   7 +
 drivers/pci/pci.h   |  21 +++
 drivers/pci/pcie/Kconfig|   8 +
 drivers/pci/pcie/Makefile   |   2 +-
 drivers/pci/pcie/pcie_ptm.c | 353 
 drivers/pci/probe.c |   3 +
 6 files changed, 393 insertions(+), 1 deletion(-)
 create mode 100644 drivers/pci/pcie/pcie_ptm.c

diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index 95d9e7b..c634fd11 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -1335,6 +1335,9 @@ static int pci_create_capabilities_sysfs(struct pci_dev *dev)
/* Active State Power Management */
pcie_aspm_create_sysfs_dev_files(dev);
 
+   /* PTM */
+   pci_create_ptm_sysfs(dev);
+
if (!pci_probe_reset_function(dev)) {
retval = device_create_file(&dev->dev, &reset_attr);
if (retval)
@@ -1433,6 +1436,10 @@ static void pci_remove_capabilities_sysfs(struct pci_dev *dev)
}
 
pcie_aspm_remove_sysfs_dev_files(dev);
+
+   /* PTM */
+   pci_release_ptm_sysfs(dev);
+
if (dev->reset_fn) {
device_remove_file(&dev->dev, &reset_attr);
dev->reset_fn = 0;
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 9a1660f..fb90420 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -320,6 +320,27 @@ static inline resource_size_t pci_resource_alignment(struct pci_dev *dev,
 
 void pci_enable_acs(struct pci_dev *dev);
 
+#ifdef CONFIG_PCIEPORTBUS
+int pci_enable_ptm(struct pci_dev *dev);
+void pci_create_ptm_sysfs(struct pci_dev *dev);
+void pci_release_ptm_sysfs(struct pci_dev *dev);
+void pci_disable_ptm(struct pci_dev *dev);
+#else
+static inline int pci_enable_ptm(struct pci_dev *dev)
+{
+   return -ENXIO;
+}
+static inline void pci_create_ptm_sysfs(struct pci_dev *dev)
+{
+}
+static inline void pci_release_ptm_sysfs(struct pci_dev *dev)
+{
+}
+static inline void pci_disable_ptm(struct pci_dev *dev)
+{
+}
+#endif
+
 struct pci_dev_reset_methods {
u16 vendor;
u16 device;
diff --git a/drivers/pci/pcie/Kconfig b/drivers/pci/pcie/Kconfig
index e294713..f65ff4d 100644
--- a/drivers/pci/pcie/Kconfig
+++ b/drivers/pci/pcie/Kconfig
@@ -80,3 +80,11 @@ endchoice
 config PCIE_PME
def_bool y
depends on PCIEPORTBUS && PM
+
+config PCIE_PTM
+   bool "Turn on Precision Time Measurement by default"
+   depends on PCIEPORTBUS
+   help
+ Say Y here to enable the PTM feature on PCI Express devices that
+ support it as they are found during device enumeration. Otherwise
+ the feature can be enabled manually through sysfs entries.
diff --git a/drivers/pci/pcie/Makefile b/drivers/pci/pcie/Makefile
index 00c62df..d18b4c7 100644
--- a/drivers/pci/pcie/Makefile
+++ b/drivers/pci/pcie/Makefile
@@ -5,7 +5,7 @@
 # Build PCI Express ASPM if needed
 obj-$(CONFIG_PCIEASPM) += aspm.o
 
-pcieportdrv-y  := portdrv_core.o portdrv_pci.o portdrv_bus.o
+pcieportdrv-y  := portdrv_core.o portdrv_pci.o portdrv_bus.o pcie_ptm.o
 pcieportdrv-$(CONFIG_ACPI) += portdrv_acpi.o
 
 obj-$(CONFIG_PCIEPORTBUS)  += pcieportdrv.o
diff --git a/drivers/pci/pcie/pcie_ptm.c b/drivers/pci/pcie/pcie_ptm.c
new file mode 100644
index 000..a128c79
--- /dev/null
+++ b/drivers/pci/pcie/pcie_ptm.c
@@ -0,0 +1,353 @@
+/*
+ * PCI Express Precision Time Measurement
+ * Copyright (c) 2016, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ */
+#include 
+#include 
+#include 
+#include "../pci.h"
+
+#define PCI_PTM_REQ            0x0001  /* Requester capable */
+#define  PCI_PTM_RSP           0x0002  /* Responder capable */
+#define  PCI_PTM_ROOT          0x0004  /* Root capable */
+#define  PCI_PTM_GRANULITY     0xFF00  /* Local clock granularity */
+#define PCI_PTM_ENABLE         0x0001  /* PTM enable */
+#define  PCI_PTM_ROOT_SEL      0x0002  /* Root select */
+
+#define PCI_PTM_HEADER_REG_OFFSET  0x00
+#define 

[RFC] PCI: PTM Driver

2016-02-28 Thread Yong, Jonathan
Hello LKML,

This is a preliminary implementation of the PTM[1] support driver; the code
is obviously hacked together and in need of refactoring. This driver has
only been tested against a virtual PCI bus.

The driver's job is to visit every PTM-capable device, set some PCI config
space bits, and then go back to sleep [2].

PTM capable PCIe devices will get a new sysfs entry to allow PTM to be
enabled if automatic PTM activation is disabled, or disabled if so desired.

Comments? Should I explain the PTM registers in more detail?
Please CC me, thanks.

[1] Precision Time Measurement: A protocol for synchronizing PCIe endpoint
clocks against the host clock as specified in the PCI Express Base
Specification 3.1. It is identified by the 0x001f extended capability ID.
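Finding the capability means walking the extended capability list, which starts at config offset 0x100. Each header dword carries the capability ID in bits 15:0 and the next offset in bits 31:20. The sketch below searches a plain byte snapshot of config space, an assumption made so that it runs in user space. Inside the kernel, `pci_find_ext_capability()` performs this walk.

```c
#include <assert.h>
#include <stdint.h>

#define EXT_CAP_START   0x100   /* extended capabilities begin here */
#define CFG_SPACE_SIZE  4096    /* PCIe extended config space size */
#define EXT_CAP_ID_PTM  0x001f  /* PTM extended capability ID */

/* Read a little-endian 32-bit value from a config space snapshot. */
static uint32_t cfg_read32(const uint8_t *cfg, unsigned int off)
{
	return (uint32_t)cfg[off] | (uint32_t)cfg[off + 1] << 8 |
	       (uint32_t)cfg[off + 2] << 16 | (uint32_t)cfg[off + 3] << 24;
}

/* Return the offset of extended capability @id, or 0 if it is absent. */
static unsigned int find_ext_cap(const uint8_t *cfg, uint16_t id)
{
	unsigned int off = EXT_CAP_START;
	int ttl = (CFG_SPACE_SIZE - EXT_CAP_START) / 8;  /* guard against loops */

	while (off >= EXT_CAP_START && off <= CFG_SPACE_SIZE - 4 && ttl-- > 0) {
		uint32_t hdr = cfg_read32(cfg, off);

		if (hdr == 0 || hdr == 0xffffffff)
			return 0;                   /* empty list or no device */
		if ((hdr & 0xffff) == id)
			return off;
		off = (hdr >> 20) & 0xffc;          /* next pointer, dword aligned */
	}
	return 0;
}
```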

PTM-capable devices are split into three roles: master, responder and requester.
Summary as follows:

A master holds the master clock that will be used for all devices under its
domain (not to be confused with PCI domains). There may be multiple masters
in a PTM hierarchy, in which case, the highest master closest to the root
complex will be selected for the PTM domain. A master is also always
responder capable. Clock precision is signified by a Local Clock
Granularity field, in nanoseconds.
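The role bits and the granularity field can be decoded from the PTM Capability register with the masks that the patch defines (`PCI_PTM_REQ`, `PCI_PTM_RSP`, `PCI_PTM_ROOT`, `PCI_PTM_GRANULITY`). The sketch copies those values under local names. `struct ptm_caps` and both helpers are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit layout copied from the PCI_PTM_* defines in the patch. */
#define PTM_CAP_REQ          0x0001u  /* Requester capable */
#define PTM_CAP_RSP          0x0002u  /* Responder capable */
#define PTM_CAP_ROOT         0x0004u  /* Root (master) capable */
#define PTM_CAP_GRANULARITY  0xff00u  /* Local Clock Granularity, in ns */

struct ptm_caps {
	bool requester, responder, root;
	unsigned int granularity_ns;  /* 0: no local clock, just propagates */
};

static struct ptm_caps ptm_decode(uint32_t cap)
{
	struct ptm_caps c = {
		.requester = cap & PTM_CAP_REQ,
		.responder = cap & PTM_CAP_RSP,
		.root = cap & PTM_CAP_ROOT,
		.granularity_ns = (cap & PTM_CAP_GRANULARITY) >> 8,
	};
	return c;
}

/* A master must also be responder capable, per the role summary. */
static bool ptm_caps_consistent(struct ptm_caps c)
{
	return !c.root || c.responder;
}
```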

A responder responds to any PTM synchronization requests from a downstream
device. A responder is typically a switch device. It may also hold a local
clock signified by a non-zero Local Clock Granularity field. A value of 0
signifies that the device simply propagates timing information from
upstream devices.

A requester is typically an endpoint that will request synchronization
updates from an upstream PTM capable time source. The driver will update
the Effective Clock Granularity field based on the same field from the
PTM domain master. The field should be programmed with a value of 0 if any
intervening responder has a Local Clock Granularity field value of 0.
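The granularity rule in the previous paragraph fits in a few lines. This sketch encodes the rule exactly as the paragraph states it: copy the master's value, or use 0 if any intervening responder reports 0. The function name and the array-of-responders representation are illustrative assumptions. The spec's own wording may cover cases that this sketch ignores.

```c
#include <assert.h>
#include <stddef.h>

/* Effective Clock Granularity for a requester: the PTM domain master's
 * Local Clock Granularity, unless any intervening responder reports 0. */
static unsigned int effective_granularity(unsigned int master_ns,
					  const unsigned int *responder_ns,
					  size_t n_responders)
{
	for (size_t i = 0; i < n_responders; i++)
		if (responder_ns[i] == 0)
			return 0;   /* one responder without a local clock */
	return master_ns;
}
```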

[2] Software drivers never see the PTM packets; the PCI Express Base
Specification 3.1 reads:
PTM capable components can make their PTM context available for
inspection by software, enabling software to translate timing
information between local times and PTM Master Time.

This isn't very informative.

Yong, Jonathan (1):
  PCI: PTM preliminary implementation

 drivers/pci/pci-sysfs.c     |   7 +
 drivers/pci/pci.h           |  21 +++
 drivers/pci/pcie/Kconfig    |   8 +
 drivers/pci/pcie/Makefile   |   2 +-
 drivers/pci/pcie/pcie_ptm.c | 353
 drivers/pci/probe.c         |   3 +
 6 files changed, 393 insertions(+), 1 deletion(-)
 create mode 100644 drivers/pci/pcie/pcie_ptm.c

-- 
2.4.10



Re: [GIT PULL] tpmdd fix

2016-02-28 Thread James Morris
On Fri, 26 Feb 2016, Jarkko Sakkinen wrote:

> Hi James,
> 
> this is the fix for the build warning.
> 
> /Jarkko
> 
> The following changes since commit 481873d06f2bf2ad732450a3a5fa5b8c2a07ef88:
> 
>   Merge branch 'next' of 
> git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity into next 
> (2016-02-26 15:06:41 +1100)
> 
> are available in the git repository at:
> 
>   https://github.com/jsakkine/linux-tpmdd.git tags/tpmdd-next-20160226
> 
> for you to fetch changes up to 2cb6d6460f1a171c71c134e0efe3a94c2206d080:
> 
>   tpm_tis: fix build warning with tpm_tis_resume (2016-02-26 11:32:07 +0200)
> 
> 
> tpmdd fix
> 
> 
> Jarkko Sakkinen (1):
>   tpm_tis: fix build warning with tpm_tis_resume
> 

Pulled to -next.

-- 
James Morris




Re: [PATCH 8/9] powerpc: simplify csum_add(a, b) in case a or b is constant 0

2016-02-28 Thread Christophe Leroy



On 23/10/2015 05:33, Scott Wood wrote:

On Tue, 2015-09-22 at 16:34 +0200, Christophe Leroy wrote:

Simplify csum_add(a, b) in case a or b is constant 0

Signed-off-by: Christophe Leroy 
---
  arch/powerpc/include/asm/checksum.h | 6 ++
  1 file changed, 6 insertions(+)

diff --git a/arch/powerpc/include/asm/checksum.h b/arch/powerpc/include/asm/checksum.h
index 56deea8..f8a9704 100644
--- a/arch/powerpc/include/asm/checksum.h
+++ b/arch/powerpc/include/asm/checksum.h
@@ -119,7 +119,13 @@ static inline __wsum csum_add(__wsum csum, __wsum addend)
  {
  #ifdef __powerpc64__
   u64 res = (__force u64)csum;
+#endif
+ if (__builtin_constant_p(csum) && csum == 0)
+ return addend;
+ if (__builtin_constant_p(addend) && addend == 0)
+ return csum;

+#ifdef __powerpc64__
   res += (__force u64)addend;
   return (__force __wsum)((u32)res + (res >> 32));
  #else

How often does this happen?


In the following patch (9/9), csum_add() is used to implement
csum_partial() for small blocks.
In several places in the networking code, csum_partial() is called with
0 as the initial sum.
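The shortcut pays off when csum_add() is inlined into a call site that passes a literal 0. In that case `__builtin_constant_p()` lets GCC or Clang fold the addition away. The sketch below is a portable user-space approximation. It assumes a plain `uint32_t` in place of the kernel's `__wsum` and uses the same end-around-carry arithmetic as the ppc64 branch of the patch. `csum_start()` is a hypothetical caller.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the kernel's __wsum: a 32-bit partial checksum. */
typedef uint32_t wsum;

/* 32-bit one's-complement add with end-around carry, as in the ppc64
 * branch of the patch. With a literal 0 argument at an inlined call site,
 * __builtin_constant_p() (GCC/Clang) lets the compiler skip the add. */
static inline wsum csum_add(wsum csum, wsum addend)
{
	uint64_t res;

	if (__builtin_constant_p(csum) && csum == 0)
		return addend;
	if (__builtin_constant_p(addend) && addend == 0)
		return csum;

	res = (uint64_t)csum + addend;
	return (wsum)((uint32_t)res + (res >> 32));  /* fold the carry back in */
}

/* Hypothetical caller starting from a 0 initial sum, as csum_partial()
 * callers often do; after inlining this can reduce to returning @partial. */
static inline wsum csum_start(wsum partial)
{
	return csum_add(0, partial);
}
```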


Christophe


Re: [PATCH 4/9] powerpc: inline ip_fast_csum()

2016-02-28 Thread Christophe Leroy



On 23/09/2015 07:43, Denis Kirjanov wrote:

On 9/22/15, Christophe Leroy  wrote:

In several architectures, ip_fast_csum() is inlined.
There are functions like ip_send_check() which do little more
than call ip_fast_csum().
Inlining ip_fast_csum() allows the compiler to optimise better.

Hi Christophe,
I did try it and see no difference on ppc64. Did you test with socklib
with modified loopback and if so do you have any numbers?


Hi Denis,

I put an mftbl at the start and end of ip_send_check() and tested on an MPC885:
* Without ip_fast_csum() inlined, approximately 7 TB ticks are spent in
ip_send_check()
* With ip_fast_csum() inlined, approximately 5.4 TB ticks are spent in
ip_send_check()

So it is about a 23% time reduction.
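The measured figures give (7 - 5.4) / 7 ≈ 0.229, which matches the quoted 23%. The gain follows from the shape of the caller: ip_send_check() does little more than call ip_fast_csum(), so inlining removes most of its overhead. The sketch below mimics that pair in portable C; the real powerpc version is assembly. `ipv4_header_csum()` and `ipv4_send_check()` are hypothetical names, and the checksum is returned in host byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One's-complement sum of an IPv4 header of @ihl 32-bit words. */
static inline uint16_t ipv4_header_csum(const uint8_t *iph, unsigned int ihl)
{
	uint32_t sum = 0;
	size_t len = (size_t)ihl * 4;

	/* Add the header as big-endian 16-bit words. */
	for (size_t i = 0; i < len; i += 2)
		sum += (uint32_t)((iph[i] << 8) | iph[i + 1]);

	/* Fold carries back into the low 16 bits (end-around carry). */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);

	return (uint16_t)~sum;
}

/* Like ip_send_check(): clear the checksum field, then recompute it. */
static inline void ipv4_send_check(uint8_t *iph)
{
	unsigned int ihl = iph[0] & 0x0f;   /* header length in 32-bit words */
	uint16_t csum;

	iph[10] = 0;
	iph[11] = 0;
	csum = ipv4_header_csum(iph, ihl);
	iph[10] = (uint8_t)(csum >> 8);     /* store in network byte order */
	iph[11] = (uint8_t)(csum & 0xff);
}
```

Running the check over a header that already holds a valid checksum yields 0, which is how receivers verify it.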

Christophe


Re: [RFC PATCH] proc: do not include shmem and driver pages in /proc/meminfo::Cached

2016-02-28 Thread Konstantin Khlebnikov
On Mon, Feb 29, 2016 at 3:03 AM, Hugh Dickins  wrote:
> On Fri, 19 Feb 2016, Andrew Morton wrote:
>> On Fri, 19 Feb 2016 09:40:45 +0300 Konstantin Khlebnikov wrote:
>>
>> > >> What are your thoughts on this?
>> > >
>> > > My thoughts are NAK.  A misleading stat is not so bad as a
>> > > misleading stat whose meaning we change in some random kernel.
>> > >
>> > > By all means improve Documentation/filesystems/proc.txt on Cached.
>> > > By all means promote Active(file)+Inactive(file)-Buffers as often a
>> > > better measure (though Buffers itself is obscure to me - is it intended
>> > > usually to approximate resident FS metadata?).  By all means work on
>> > > /proc/meminfo-v2 (though that may entail dispiritingly long discussions).
>> > >
>> > > We have to assume that Cached has been useful to some people, and that
>> > > they've learnt to subtract Shmem from it, if slow or no swap concerns them.
>> > >
>> > > Added Konstantin to Cc: he's had valuable experience of people learning
>> > > to adapt to the numbers that we put out.
>> > >
>> >
>> > I think everything will be ok. Subtraction of shmem isn't widespread practice,
>> > more like secret knowledge. This wasn't documented and people who use
>> > this should be aware that this might stop working at any time. So, ACK.
>>
>> It worries me as well - we're deliberately altering the behaviour of
>> existing userspace code.  Not all of those alterations will be welcome!
>>
>> We could add a shiny new field into meminfo and train people to migrate
>> to that.  But that would just be a sum of already-available fields.  In
>> an ideal world we could solve all of this with documentation and
>> cluebatting (and some apologizing!).
>
> Ah, I missed this, and just sent a redundant addition to the thread;
> followed by this doubly redundant addition.

"Cached" has been used for ages as amount of "potentially free memory".
This patch restores its original meaning and, at the same time, brings it
closer to that "potential" meaning.

MemAvailable means exactly that and nothing else, so the logic behind it can
be tuned and changed in the future. Thus, adding new fields makes no sense.


BTW
Glibc recently switched sysconf(_SC_PHYS_PAGES) / sysconf(_SC_AVPHYS_PAGES)
from /proc/meminfo MemTotal / MemFree to sysinfo(2) totalram / freeram for
performance reasons. It seems possible to expose MemAvailable via sysinfo:
there is space for one more field. It is probably also possible to switch
_SC_AVPHYS_PAGES to truly available memory and add memcg awareness too.
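Both kinds of measure discussed in this thread can be read from user space. The sketch parses text in `/proc/meminfo` format for a single field, so `Cached - Shmem` or `MemAvailable` each take one call. It also reads free RAM through sysinfo(2), the interface this message says glibc now uses. The helper names are hypothetical. The parser takes a text buffer instead of opening the file, so that it can be tested.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/sysinfo.h>

/* Value in kB of one "Field:   123 kB" line in /proc/meminfo-style text,
 * or -1 if the field is missing. */
static long meminfo_kb(const char *text, const char *field)
{
	size_t n = strlen(field);
	const char *p = text;

	while (p && *p) {
		/* Match "Field:" only at the start of a line, so "Cached"
		 * does not match "SwapCached". */
		if (strncmp(p, field, n) == 0 && p[n] == ':') {
			long kb;

			return sscanf(p + n + 1, "%ld", &kb) == 1 ? kb : -1;
		}
		p = strchr(p, '\n');
		if (p)
			p++;
	}
	return -1;
}

/* The "subtract Shmem from Cached" estimate discussed in the thread. */
static long cached_minus_shmem_kb(const char *text)
{
	long cached = meminfo_kb(text, "Cached");
	long shmem = meminfo_kb(text, "Shmem");

	return (cached < 0 || shmem < 0) ? -1 : cached - shmem;
}

/* Free RAM as reported by sysinfo(2); freeram is in units of mem_unit. */
static unsigned long long sysinfo_freeram_bytes(void)
{
	struct sysinfo si;

	if (sysinfo(&si) != 0)
		return 0;
	return (unsigned long long)si.freeram * si.mem_unit;
}
```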


Re: [PATCH 1/2] sigaltstack: implement SS_AUTODISARM flag

2016-02-28 Thread Stas Sergeev

On 29.02.2016 00:13, Stas Sergeev wrote:

This patch implements the SS_AUTODISARM flag that can be ORed with
SS_ONSTACK when forming ss_flags.
When this flag is set, sigaltstack will be disabled when entering
the signal handler; more precisely, after saving sas to uc_stack.
When leaving the signal handler, the sigaltstack is restored from
uc_stack.
When this flag is used, it is safe to switch away from a signal handler
with swapcontext(). Without this flag, a subsequent signal will corrupt
the state of the switched-away signal handler.
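The save-then-disarm sequence described above can be modelled in a few lines. This is a user-space model only. The `MODEL_*` flag values and `struct model_sas` are placeholders, because this message does not show the patch's actual SS_AUTODISARM value. While the handler runs, the stack stays disarmed, so a nested signal cannot land on the stack of a handler that swapcontext() has switched away from. Sigreturn then re-arms the stack from uc_stack.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder flag values for this model only. */
#define MODEL_SS_ONSTACK     0x1u
#define MODEL_SS_DISABLE     0x2u
#define MODEL_SS_AUTODISARM  0x80000000u

/* Hypothetical per-task alternate-stack state (sas_ss_* in the patch). */
struct model_sas {
	unsigned long sp;
	size_t size;
	unsigned int flags;
};

/* Signal delivery: save the settings into uc_stack, then disarm if asked. */
static void model_deliver(struct model_sas *task, struct model_sas *uc_stack)
{
	*uc_stack = *task;
	if (task->flags & MODEL_SS_AUTODISARM) {
		task->sp = 0;
		task->size = 0;
		task->flags = MODEL_SS_DISABLE;  /* flags reset too, per the follow-up */
	}
}

/* sigreturn: restore the alternate stack from the saved uc_stack. */
static void model_sigreturn(struct model_sas *task, const struct model_sas *uc_stack)
{
	*task = *uc_stack;
}
```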

CC: Ingo Molnar 
CC: Peter Zijlstra 
CC: Richard Weinberger 
CC: Andrew Morton 
CC: Oleg Nesterov 
CC: Tejun Heo 
CC: Heinrich Schuchardt 
CC: Jason Low 
CC: Andrea Arcangeli 
CC: Frederic Weisbecker 
CC: Konstantin Khlebnikov 
CC: Josh Triplett 
CC: "Eric W. Biederman" 
CC: Aleksa Sarai 
CC: "Amanieu d'Antras" 
CC: Paul Moore 
CC: Sasha Levin 
CC: Palmer Dabbelt 
CC: Vladimir Davydov 
CC: linux-kernel@vger.kernel.org
CC: linux-...@vger.kernel.org
CC: Andy Lutomirski 

Signed-off-by: Stas Sergeev 
---
  include/linux/sched.h       |  1 +
  include/linux/signal.h      |  4 +++-
  include/uapi/linux/signal.h |  3 +++
  kernel/fork.c               |  4 +++-
  kernel/signal.c             | 23 ---
  5 files changed, 22 insertions(+), 13 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index a10494a..f561d34 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1587,6 +1587,7 @@ struct task_struct {
  
  	unsigned long sas_ss_sp;
  	size_t sas_ss_size;
+   unsigned sas_ss_flags;
  
  	struct callback_head *task_works;
  
diff --git a/include/linux/signal.h b/include/linux/signal.h
index 92557bb..be3ebe0 100644
--- a/include/linux/signal.h
+++ b/include/linux/signal.h
@@ -432,8 +432,10 @@ int __save_altstack(stack_t __user *, unsigned long);
stack_t __user *__uss = uss; \
struct task_struct *t = current; \
put_user_ex((void __user *)t->sas_ss_sp, &__uss->ss_sp); \
-   put_user_ex(sas_ss_flags(sp), &__uss->ss_flags); \
+   put_user_ex(t->sas_ss_flags, &__uss->ss_flags); \
put_user_ex(t->sas_ss_size, &__uss->ss_size); \
+   if (t->sas_ss_flags & SS_AUTODISARM) \
+   t->sas_ss_size = 0; \

Should also reset flags here...
Will send v4.


Re: [PATCH v10 2/2] cpufreq: powernv: Add sysfs attributes to show throttle stats

2016-02-28 Thread Viresh Kumar
On 26-02-16, 16:06, Shilpasri G Bhat wrote:
> +static int powernv_cpufreq_policy_notifier(struct notifier_block *nb,
> +unsigned long action, void *data)
> +{
> + struct cpufreq_policy *policy = data;
> + int ret;
> +
> + if (action == CPUFREQ_CREATE_POLICY) {
> + ret = sysfs_create_group(&policy->kobj, &throttle_attr_grp);
> + if (ret)
> + pr_info("Failed to create throttle stats directory for cpu %d\n",
> + policy->cpu);
> + policy->cpu);
> + } else if (action == CPUFREQ_REMOVE_POLICY) {
> + sysfs_remove_group(&policy->kobj, &throttle_attr_grp);
> + }
> +
> + return NOTIFY_DONE;
> +}
> +
> +static struct notifier_block powernv_cpufreq_policy_nb = {
> + .notifier_call  = powernv_cpufreq_policy_notifier,
> + .next   = NULL,
> +};
> +
>  static void powernv_cpufreq_stop_cpu(struct cpufreq_policy *policy)
>  {
>   struct powernv_smp_call_data freq_data;
> @@ -603,6 +708,8 @@ static inline void clean_chip_info(void)
>  
>  static inline void unregister_all_notifiers(void)
>  {
> + cpufreq_unregister_notifier(&powernv_cpufreq_policy_nb,
> + CPUFREQ_POLICY_NOTIFIER);
>   opal_message_notifier_unregister(OPAL_MSG_OCC,
>  &powernv_cpufreq_opal_nb);
>   unregister_reboot_notifier(&powernv_cpufreq_reboot_nb);
> @@ -628,6 +735,8 @@ static int __init powernv_cpufreq_init(void)
>  
>   register_reboot_notifier(&powernv_cpufreq_reboot_nb);
>   opal_message_notifier_register(OPAL_MSG_OCC, &powernv_cpufreq_opal_nb);
> + cpufreq_register_notifier(&powernv_cpufreq_policy_nb,
> +   CPUFREQ_POLICY_NOTIFIER);
>  
>   rc = cpufreq_register_driver(&powernv_cpufreq_driver);
>   if (!rc)

@Rafael: This driver needs to do this *ugly* notifier hack just because we
aren't doing kobject_add() for policy->kobj before ->init(). And we did that
because we wanted to create the policyX structure with the first CPU in the
policy->related_cpus mask, and the related_cpus mask isn't available until we
call ->init().

Should we do something in core to make this easier for this driver?
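For readers unfamiliar with the pattern, the indirection described above works roughly like this toy model. The driver registers a callback on a chain. Once policy->kobj exists, the core invokes the callback with a create or remove action, so the driver does not add its sysfs group from ->init(). Every name here is hypothetical; this is not the cpufreq notifier API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical actions mirroring CPUFREQ_CREATE_POLICY/REMOVE_POLICY. */
enum model_action { MODEL_CREATE_POLICY, MODEL_REMOVE_POLICY };

struct model_policy { int cpu; int has_throttle_group; };

struct model_nb {
	int (*call)(struct model_nb *nb, enum model_action action, void *data);
	struct model_nb *next;
};

static struct model_nb *model_chain;  /* head of the notifier list */

static void model_register(struct model_nb *nb)
{
	nb->next = model_chain;
	model_chain = nb;
}

/* The core calls every registered block once the policy object exists. */
static void model_notify(enum model_action action, void *data)
{
	for (struct model_nb *nb = model_chain; nb; nb = nb->next)
		nb->call(nb, action, data);
}

/* Driver callback: attach or drop its per-policy sysfs group. */
static int model_driver_cb(struct model_nb *nb, enum model_action action, void *data)
{
	struct model_policy *policy = data;

	(void)nb;
	if (action == MODEL_CREATE_POLICY)
		policy->has_throttle_group = 1;
	else if (action == MODEL_REMOVE_POLICY)
		policy->has_throttle_group = 0;
	return 0;
}
```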

-- 
viresh


linux-next: manual merge of the target-merge tree with the net-next tree

2016-02-28 Thread Stephen Rothwell
Hi Nicholas,

Today's linux-next merge of the target-merge tree got a conflict in:

  drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h

between commit:

  ba9cee6aa67d ("cxgb4/iw_cxgb4: TOS support")

from the net-next tree and commit:

  c973e2a3ff1b ("cxgb4: add definitions for iSCSI target ULD")

from the target-merge tree.

I fixed it up (the latter was a superset of the former) and can carry
the fix as necessary (no action is required).

-- 
Cheers,
Stephen Rothwell


Re: [PATCH] mm/zsmalloc: add compact column to pool stat

2016-02-28 Thread Sergey Senozhatsky
Hello,

On (02/29/16 15:02), Minchan Kim wrote:
> On Sat, Feb 27, 2016 at 03:23:53PM +0900, Sergey Senozhatsky wrote:
> > Add a new column to pool stats, which will tell us class' zs_can_compact()
> > number, so it will be easier to analyze zsmalloc fragmentation.
> 
> Just nitpick:
> 
> Strictly speaking, the zs_can_compact() number is the number of "ideally
> freeable pages by compaction". How about using a high-level term in the
> description rather than the function name?

OK, makes sense.


> > At the moment, we have only numbers of FULL and ALMOST_EMPTY classes, but
> > they don't tell us how badly the class is fragmented internally.
> > 
> > The new /sys/kernel/debug/zsmalloc/zramX/classes output looks as follows:
> > 
> >  class  size almost_full almost_empty obj_allocated   obj_used pages_used pages_per_zspage compact
> > [..]
> >     12   224           0            2           146          5          8                4       4
> >     13   240           0            0             0          0          0                1       0
> >     14   256           1           13          1840       1672        115                1      10
> >     15   272           0            0             0          0          0                1       0
> > [..]
> >     49   816           0            3           745        735        149                1       2
> >     51   848           3            4           361        306         76                4       8
> >     52   864          12           14           378        268         81                3      21
> >     54   896           1           12           117         57         26                2      12
> >     57   944           0            0             0          0          0                3       0
> > [..]
> >  Total                26          131         12709      10994       1071                      134
> > 
> > For example, from this particular output we can easily conclude that class-896
> > is heavily fragmented -- it occupies 26 pages, 12 can be freed by compaction.
> 
> How about using "freeable" or some other name that conveys "freeable"?
> IMO, it's more straightforward for users.

OK. I didn't want to use a long column name there, since it would bloat the
output. Will take a look.

> Other than that,
> 
> Acked-by: Minchan Kim 
> 
> 
> Thanks for the nice job!

thanks.

-ss


Re: [PATCH] mm/zsmalloc: add compact column to pool stat

2016-02-28 Thread Minchan Kim
On Sat, Feb 27, 2016 at 03:23:53PM +0900, Sergey Senozhatsky wrote:
> Add a new column to pool stats, which will tell us class' zs_can_compact()
> number, so it will be easier to analyze zsmalloc fragmentation.

Just nitpick:

Strictly speaking, the zs_can_compact() number is the number of "ideally
freeable pages by compaction". How about using a high-level term in the
description rather than the function name?


> 
> At the moment, we have only numbers of FULL and ALMOST_EMPTY classes, but
> they don't tell us how badly the class is fragmented internally.
> 
> The new /sys/kernel/debug/zsmalloc/zramX/classes output looks as follows:
> 
>  class  size almost_full almost_empty obj_allocated   obj_used pages_used pages_per_zspage compact
> [..]
>     12   224           0            2           146          5          8                4       4
>     13   240           0            0             0          0          0                1       0
>     14   256           1           13          1840       1672        115                1      10
>     15   272           0            0             0          0          0                1       0
> [..]
>     49   816           0            3           745        735        149                1       2
>     51   848           3            4           361        306         76                4       8
>     52   864          12           14           378        268         81                3      21
>     54   896           1           12           117         57         26                2      12
>     57   944           0            0             0          0          0                3       0
> [..]
>  Total                26          131         12709      10994       1071                      134
> 
> For example, from this particular output we can easily conclude that class-896
> is heavily fragmented -- it occupies 26 pages, 12 can be freed by compaction.

How about using "freeable" or some other name that conveys "freeable"?
IMO, it's more straightforward for users.

Other than that,

Acked-by: Minchan Kim 


Thanks for the nice job!


Re: [PATCH] asm-generic: remove old nonatomic-io wrapper files

2016-02-28 Thread Vinod Koul
On Fri, Feb 26, 2016 at 03:29:05PM +0100, Arnd Bergmann wrote:
> The two header files got moved to include/linux, and most
> users were already converted; this changes the remaining drivers
> and removes the files.
> 
> Signed-off-by: Arnd Bergmann 
> ---
>  drivers/dma/idma64.h| 2 +-
For this:

Acked-by: Vinod Koul 

Thanks
-- 
~Vinod


Re: [PATCH v3 22/22] sound/usb: Use Media Controller API to share media resources

2016-02-28 Thread Shuah Khan
On 02/27/2016 12:48 AM, Takashi Iwai wrote:
> On Sat, 27 Feb 2016 03:55:39 +0100,
> Shuah Khan wrote:
>>
>> On 02/26/2016 01:50 PM, Takashi Iwai wrote:
>>> On Fri, 26 Feb 2016 21:08:43 +0100,
>>> Shuah Khan wrote:
>>>>
>>>> On 02/26/2016 12:55 PM, Takashi Iwai wrote:
>>>>> On Fri, 12 Feb 2016 00:41:38 +0100,
>>>>> Shuah Khan wrote:
>>>>>>
>>>>>> Change ALSA driver to use Media Controller API to
>>>>>> share media resources with DVB and V4L2 drivers
>>>>>> on a AU0828 media device. Media Controller specific
>>>>>> initialization is done after sound card is registered.
>>>>>> ALSA creates Media interface and entity function graph
>>>>>> nodes for Control, Mixer, PCM Playback, and PCM Capture
>>>>>> devices.
>>>>>>
>>>>>> snd_usb_hw_params() will call Media Controller enable
>>>>>> source handler interface to request the media resource.
>>>>>> If resource request is granted, it will release it from
>>>>>> snd_usb_hw_free(). If resource is busy, -EBUSY is returned.
>>>>>>
>>>>>> Media specific cleanup is done in usb_audio_disconnect().
>>>>>>
>>>>>> Signed-off-by: Shuah Khan 
>>>>>> ---
>>>>>>  sound/usb/Kconfig|   4 +
>>>>>>  sound/usb/Makefile   |   2 +
>>>>>>  sound/usb/card.c |  14 +++
>>>>>>  sound/usb/card.h |   3 +
>>>>>>  sound/usb/media.c| 318 +++
>>>>>>  sound/usb/media.h|  72 +++
>>>>>>  sound/usb/mixer.h|   3 +
>>>>>>  sound/usb/pcm.c  |  28 -
>>>>>>  sound/usb/quirks-table.h |   1 +
>>>>>>  sound/usb/stream.c   |   2 +
>>>>>>  sound/usb/usbaudio.h |   6 +
>>>>>>  11 files changed, 448 insertions(+), 5 deletions(-)
>>>>>>  create mode 100644 sound/usb/media.c
>>>>>>  create mode 100644 sound/usb/media.h
>>>>>>
>>>>>> diff --git a/sound/usb/Kconfig b/sound/usb/Kconfig
>>>>>> index a452ad7..ba117f5 100644
>>>>>> --- a/sound/usb/Kconfig
>>>>>> +++ b/sound/usb/Kconfig
>>>>>> @@ -15,6 +15,7 @@ config SND_USB_AUDIO
>>>>>>  select SND_RAWMIDI
>>>>>>  select SND_PCM
>>>>>>  select BITREVERSE
>>>>>> +select SND_USB_AUDIO_USE_MEDIA_CONTROLLER if MEDIA_CONTROLLER && MEDIA_SUPPORT
>>>>>
>>>>> Looking at the media Kconfig again, this would be broken if
>>>>> MEDIA_SUPPORT=m and SND_USB_AUDIO=y.  The ugly workaround is something
>>>>> like:
>>>>>   select SND_USB_AUDIO_USE_MEDIA_CONTROLLER \
>>>>>   if MEDIA_CONTROLLER && (MEDIA_SUPPORT=y || MEDIA_SUPPORT=SND)
>>>>
>>>> My current config is MEDIA_SUPPORT=m and SND_USB_AUDIO=y
>>>> It is working and I didn't see any issues so far.
>>>
>>> Hmm, how can that be?  In drivers/media/Makefile:
>>>
>>> ifeq ($(CONFIG_MEDIA_CONTROLLER),y)
>>>   obj-$(CONFIG_MEDIA_SUPPORT) += media.o
>>> endif
>>>
>>> So it's a module.  Meanwhile you have a reference from the usb-audio
>>> driver, which is built into the kernel.  How is the symbol resolved?
>>
>> Sorry, my mistake. I misspoke. My config had:
>> CONFIG_MEDIA_SUPPORT=m
>> CONFIG_MEDIA_CONTROLLER=y
>> CONFIG_SND_USB_AUDIO=m
>>
>> The following doesn't work as you pointed out.
>>
>> CONFIG_MEDIA_SUPPORT=m
>> CONFIG_MEDIA_CONTROLLER=y
>> CONFIG_SND_USB_AUDIO=y
>>
>> Okay, here is what will work for all of the possible
>> combinations of CONFIG_MEDIA_SUPPORT and CONFIG_SND_USB_AUDIO
>>
>> select SND_USB_AUDIO_USE_MEDIA_CONTROLLER \
>>if MEDIA_CONTROLLER && ((MEDIA_SUPPORT=y) || (MEDIA_SUPPORT=m && SND_USB_AUDIO=m))
>>
>> The above will cover the cases when
>>
>> 1. CONFIG_MEDIA_SUPPORT and CONFIG_SND_USB_AUDIO are
>>both modules
>>CONFIG_SND_USB_AUDIO_USE_MEDIA_CONTROLLER is selected
>>
>> 2. CONFIG_MEDIA_SUPPORT=y and CONFIG_SND_USB_AUDIO=m
>>CONFIG_SND_USB_AUDIO_USE_MEDIA_CONTROLLER is selected
>>
>> 3. CONFIG_MEDIA_SUPPORT=y and CONFIG_SND_USB_AUDIO=y
>>CONFIG_SND_USB_AUDIO_USE_MEDIA_CONTROLLER is selected
>>
>> 4. CONFIG_MEDIA_SUPPORT=m and CONFIG_SND_USB_AUDIO=y
>>This is when we don't want
>>CONFIG_SND_USB_AUDIO_USE_MEDIA_CONTROLLER selected
>>
>> I verified all of the above combinations to make sure
>> the logic works.
>>
>> If you think of a better way to do this please let me
>> know. I will go ahead and send patch v4 with the above
>> change and you can decide if that is acceptable.
> 
> I'm not 100% sure whether CONFIG_SND_USB_AUDIO=m can be put there as a
> conditional inside the CONFIG_SND_USB_AUDIO definition.  Maybe a safer
> form would be:
> 
> config SND_USB_AUDIO_USE_MEDIA_CONTROLLER
>   bool
>   default y
>   depends on SND_USB_AUDIO
>   depends on MEDIA_CONTROLLER
>   depends on (MEDIA_SUPPORT=y || MEDIA_SUPPORT=SND_USB_AUDIO)
> 
> and drop the select from SND_USB_AUDIO.
> 
> 
> Other than that, it looks more or less OK to me.
> The way media_stream_init() gets called is a bit worrisome, but it
> should work practically.  Another concern is about the disconnection.
> Can all function calls in media_device_delete() be safe even if it's
> called while the application still 

[PATCH v4 22/22] sound/usb: Use Media Controller API to share media resources

2016-02-28 Thread Shuah Khan
Change ALSA driver to use Media Controller API to
share media resources with DVB and V4L2 drivers
on an AU0828 media device. Media Controller specific
initialization is done after sound card is registered.
ALSA creates Media interface and entity function graph
nodes for Control, Mixer, PCM Playback, and PCM Capture
devices.

snd_usb_hw_params() will call Media Controller enable
source handler interface to request the media resource.
If resource request is granted, it will release it from
snd_usb_hw_free(). If resource is busy, -EBUSY is returned.

Media specific cleanup is done in usb_audio_disconnect().

Signed-off-by: Shuah Khan 
---

Changes since v3:
- Fixed Kconfig to handle the following
1. CONFIG_MEDIA_SUPPORT and CONFIG_SND_USB_AUDIO are
   both modules
   CONFIG_SND_USB_AUDIO_USE_MEDIA_CONTROLLER is selected

2. CONFIG_MEDIA_SUPPORT=y and CONFIG_SND_USB_AUDIO=m
   CONFIG_SND_USB_AUDIO_USE_MEDIA_CONTROLLER is selected

3. CONFIG_MEDIA_SUPPORT=y and CONFIG_SND_USB_AUDIO=y
   CONFIG_SND_USB_AUDIO_USE_MEDIA_CONTROLLER is selected

4. CONFIG_MEDIA_SUPPORT=m and CONFIG_SND_USB_AUDIO=y
   This is when we don't want
   CONFIG_SND_USB_AUDIO_USE_MEDIA_CONTROLLER selected

 sound/usb/Kconfig|   4 +
 sound/usb/Makefile   |   2 +
 sound/usb/card.c |  14 +++
 sound/usb/card.h |   3 +
 sound/usb/media.c| 318 +++
 sound/usb/media.h|  72 +++
 sound/usb/mixer.h|   3 +
 sound/usb/pcm.c  |  28 -
 sound/usb/quirks-table.h |   1 +
 sound/usb/stream.c   |   2 +
 sound/usb/usbaudio.h |   6 +
 11 files changed, 448 insertions(+), 5 deletions(-)
 create mode 100644 sound/usb/media.c
 create mode 100644 sound/usb/media.h

diff --git a/sound/usb/Kconfig b/sound/usb/Kconfig
index a452ad7..d14bf41 100644
--- a/sound/usb/Kconfig
+++ b/sound/usb/Kconfig
@@ -15,6 +15,7 @@ config SND_USB_AUDIO
select SND_RAWMIDI
select SND_PCM
select BITREVERSE
+   select SND_USB_AUDIO_USE_MEDIA_CONTROLLER if MEDIA_CONTROLLER && (MEDIA_SUPPORT=y || MEDIA_SUPPORT=SND_USB_AUDIO)
help
  Say Y here to include support for USB audio and USB MIDI
  devices.
@@ -22,6 +23,9 @@ config SND_USB_AUDIO
  To compile this driver as a module, choose M here: the module
  will be called snd-usb-audio.
 
+config SND_USB_AUDIO_USE_MEDIA_CONTROLLER
+   bool
+
 config SND_USB_UA101
tristate "Edirol UA-101/UA-1000 driver"
select SND_PCM
diff --git a/sound/usb/Makefile b/sound/usb/Makefile
index 2d2d122..8dca3c4 100644
--- a/sound/usb/Makefile
+++ b/sound/usb/Makefile
@@ -15,6 +15,8 @@ snd-usb-audio-objs := card.o \
quirks.o \
stream.o
 
+snd-usb-audio-$(CONFIG_SND_USB_AUDIO_USE_MEDIA_CONTROLLER) += media.o
+
 snd-usbmidi-lib-objs := midi.o
 
 # Toplevel Module Dependency
diff --git a/sound/usb/card.c b/sound/usb/card.c
index 1f09d95..35fe256 100644
--- a/sound/usb/card.c
+++ b/sound/usb/card.c
@@ -66,6 +66,7 @@
 #include "format.h"
 #include "power.h"
 #include "stream.h"
+#include "media.h"
 
 MODULE_AUTHOR("Takashi Iwai ");
 MODULE_DESCRIPTION("USB Audio");
@@ -561,6 +562,11 @@ static int usb_audio_probe(struct usb_interface *intf,
if (err < 0)
goto __error;
 
+   if (quirk->media_device) {
+   /* don't want to fail when media_device_create() fails */
+   media_device_create(chip, intf);
+   }
+
usb_chip[chip->index] = chip;
chip->num_interfaces++;
usb_set_intfdata(intf, chip);
@@ -617,6 +623,14 @@ static void usb_audio_disconnect(struct usb_interface *intf)
list_for_each(p, >midi_list) {
snd_usbmidi_disconnect(p);
}
+   /*
+    * Nice to check quirk && quirk->media_device;
+    * needs some special handling. Doesn't look like
+    * we have access to quirk here.
+    * Accesses mixer_list.
+    */
+   media_device_delete(chip);
+
/* release mixer resources */
list_for_each_entry(mixer, >mixer_list, list) {
snd_usb_mixer_disconnect(mixer);
diff --git a/sound/usb/card.h b/sound/usb/card.h
index 71778ca..34a0898 100644
--- a/sound/usb/card.h
+++ b/sound/usb/card.h
@@ -105,6 +105,8 @@ struct snd_usb_endpoint {
struct list_head list;
 };
 
+struct media_ctl;
+
 struct snd_usb_substream {
struct snd_usb_stream *stream;
struct usb_device *dev;
@@ -156,6 +158,7 @@ struct snd_usb_substream {
} dsd_dop;
 
bool trigger_tstamp_pending_update; /* trigger timestamp being updated from initial estimate */
+   struct media_ctl *media_ctl;
 };
 
 struct snd_usb_stream {
diff --git a/sound/usb/media.c b/sound/usb/media.c
new file mode 100644
index 000..cff1459
--- /dev/null
+++ b/sound/usb/media.c
@@ -0,0 

[PATCH] phy: Fix armada375 compile test build on UM

2016-02-28 Thread Krzysztof Kozlowski
The phy-armada375-usb2 driver uses IOMEM functions, so the COMPILE_TEST && OF
build failed with:

drivers/built-in.o: In function `armada375_usb_phy_probe':
phy-armada375-usb2.c:(.text+0x121d): undefined reference to `devm_ioremap_resource'

Signed-off-by: Krzysztof Kozlowski 
---
 drivers/phy/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/phy/Kconfig b/drivers/phy/Kconfig
index 0124d17bd9fe..786a9d6356b8 100644
--- a/drivers/phy/Kconfig
+++ b/drivers/phy/Kconfig
@@ -32,7 +32,7 @@ config PHY_BERLIN_SATA
 config ARMADA375_USBCLUSTER_PHY
def_bool y
depends on MACH_ARMADA_375 || COMPILE_TEST
-   depends on OF
+   depends on OF && HAS_IOMEM
select GENERIC_PHY
 
 config PHY_DM816X_USB
-- 
2.5.0



[GIT PULL] extcon next for 4.6

2016-02-28 Thread Chanwoo Choi
Dear Greg,

This is the extcon-next pull request for v4.6. I've added a detailed description
of this pull request below. Please pull extcon with the following updates.

Best Regards,
Chanwoo Choi

The following changes since commit 92e963f50fc74041b5e9e744c330dca48e04f08d:

  Linux 4.5-rc1 (2016-01-24 13:06:47 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/extcon.git tags/extcon-next-for-4.6

for you to fetch changes up to ae64e42cc2b3a17ac0c11815f53211093a54cf55:

  extcon: palmas: Drop IRQF_EARLY_RESUME flag (2016-02-29 11:07:34 +0900)


Update extcon for 4.6

Detailed description for patchset:
1. Add new EXTCON_CHG_USB_SDP type
- SDP (Standard Downstream Port) USB Charging Port
  means the charging connector.

2. Add the VBUS detection by using GPIO on extcon-palmas
- The beaglex15 board uses the extcon-palmas driver,
  but it needs GPIO support for VBUS detection.

3. Fix minor issues in extcon drivers


Chanwoo Choi (1):
  extcon: Add the EXTCON_CHG_USB_SDP to support SDP charing port

Charles Keepax (1):
  extcon: arizona: Use DAPM mutex helper functions

Dan Carpenter (1):
  extcon: max77843: Use correct size for reading the interrupt register

Felipe Balbi (3):
  extcon: palmas: Add the support for VBUS detection by using GPIO
  arm: boot: dts: beaglex15: Remove ID GPIO
  arm: boot: beaglex15: pass correct interrupt

Geliang Tang (1):
  extcon: Use to_i2c_client for both rt8973a and sm5502

Grygorii Strashko (1):
  extcon: palmas: Drop IRQF_EARLY_RESUME flag

Moritz Fischer (1):
  extcon: gpio: Fix typo in comment

 arch/arm/boot/dts/am57xx-beagle-x15.dts |  3 +-
 drivers/extcon/extcon-arizona.c |  4 +--
 drivers/extcon/extcon-gpio.c|  2 +-
 drivers/extcon/extcon-max14577.c|  3 ++
 drivers/extcon/extcon-max77693.c| 12 +++-
 drivers/extcon/extcon-max77843.c|  5 ++-
 drivers/extcon/extcon-max8997.c |  3 ++
 drivers/extcon/extcon-palmas.c  | 54 +++--
 drivers/extcon/extcon-rt8973a.c |  8 +++--
 drivers/extcon/extcon-sm5502.c  |  8 +++--
 include/linux/mfd/palmas.h  |  3 ++
 11 files changed, 92 insertions(+), 13 deletions(-)


Re: [PATCH v6 00/12] Add T210 support in Tegra soctherm

2016-02-28 Thread Wei Ni
Hi,
Does anyone have comments on this series?

Thanks.
Wei.

On 2016-02-22 16:05, Wei Ni wrote:
> This patchset adds the following functions to the tegra_soctherm driver:
> 1. add T210 support.
> 2. export debugfs to show some registers.
> 3. add thermtrip function.
> 4. add suspend/resume function.
> 
> The v5 series is at:
> http://www.spinics.net/lists/linux-tegra/msg25079.html
> The v4 series is at:
> http://www.spinics.net/lists/linux-tegra/msg24972.html
> The v3 series is at:
> http://www.spinics.net/lists/linux-tegra/msg24911.html
> The v2 series is at:
> http://www.spinics.net/lists/linux-tegra/msg24901.html
> The v1 series is at:
> http://www.spinics.net/lists/linux-tegra/msg24808.html
> 
> Main changes from V5:
> 1. Use the Linux thermal framework to implement the
> thermtrip function, per Rob's comment.
> 2. Add .set_trip_temp() to the of-thermal driver so that
> trips can be set in hardware.
> 
> Main changes from V4:
> 1. Change the description of the devicetree binding per Rob's comment.
> 2. Call of_node_put() to decrement the node's refcount.
> 
> Main changes from V3:
> 1. Make structures "const" in the chip-specific files.
> 2. Minor changes per Thierry's comments.
> 
> Main changes from V2:
> 1. Fix build error in patch [1/11].
> 2. Use of_get_child_by_name instead of of_find_node_by_name in patch [8/11].
> 3. Use debugfs_remove_recursive to remove debugfs in patch [6/11].
> 
> Main changes from V1:
> 1. Use a new type to handle different Tegra chips in one driver, as
> suggested by Thierry.
> 2. Changes per Thierry's other comments.
> 
> Wei Ni (12):
>   thermal: tegra: move tegra thermal files into tegra directory
>   thermal: tegra: combine sensor group-related data
>   thermal: tegra: get rid of PDIV/HOTSPOT hack
>   thermal: tegra: split tegra_soctherm driver
>   thermal: tegra: add Tegra210 specific SOC_THERM driver
>   thermal: tegra: add a debugfs to show registers
>   thermal: of-thermal: allow setting trip_temp on hardware
>   of: add notes of critical trips for soctherm
>   thermal: tegra: add thermtrip function
>   thermal: tegra: add PM support
>   arm64: tegra: add soctherm node for Tegra210
>   arm: tegra: set critical trips for Tegra124
> 
>  .../devicetree/bindings/thermal/tegra-soctherm.txt |  12 +
>  arch/arm/boot/dts/tegra124.dtsi                    |  16 +
>  arch/arm64/boot/dts/nvidia/tegra210.dtsi           |  60 ++
>  drivers/thermal/Kconfig                            |  12 +-
>  drivers/thermal/Makefile                           |   2 +-
>  drivers/thermal/of-thermal.c                       |   8 +
>  drivers/thermal/tegra/Kconfig                      |  13 +
>  drivers/thermal/tegra/Makefile                     |   5 +
>  drivers/thermal/tegra/soctherm-fuse.c              | 169 +
>  drivers/thermal/tegra/soctherm.c                   | 685 +
>  drivers/thermal/tegra/soctherm.h                   | 123
>  drivers/thermal/tegra/tegra124-soctherm.c          | 196 ++
>  drivers/thermal/tegra/tegra210-soctherm.c          | 197 ++
>  drivers/thermal/tegra_soctherm.c                   | 476 --
>  include/dt-bindings/thermal/tegra124-soctherm.h    |   1 +
>  include/linux/thermal.h                            |   1 +
>  16 files changed, 1489 insertions(+), 487 deletions(-)
>  create mode 100644 drivers/thermal/tegra/Kconfig
>  create mode 100644 drivers/thermal/tegra/Makefile
>  create mode 100644 drivers/thermal/tegra/soctherm-fuse.c
>  create mode 100644 drivers/thermal/tegra/soctherm.c
>  create mode 100644 drivers/thermal/tegra/soctherm.h
>  create mode 100644 drivers/thermal/tegra/tegra124-soctherm.c
>  create mode 100644 drivers/thermal/tegra/tegra210-soctherm.c
>  delete mode 100644 drivers/thermal/tegra_soctherm.c
> 
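The v5 change list says of-thermal gains a .set_trip_temp() hook so that trip points can be programmed into hardware. The sketch below is only a user-space model of that idea, under stated assumptions: struct sensor_ops, struct zone, zone_set_trip_temp() and the fake threshold register are hypothetical names, not the of-thermal or soctherm API, and temperatures are assumed to be in millidegrees Celsius.

```c
#include <errno.h>

#define NUM_TRIPS 1 /* hypothetical: a single critical trip */

/* Hypothetical sensor callbacks; not the real of-thermal ops structure. */
struct sensor_ops {
	int (*set_trip_temp)(void *data, int trip, int temp_mc);
};

/* Simulated sensor whose "hardware" thermtrip threshold is a plain field. */
struct fake_sensor {
	int hw_thermtrip_mc;
};

static int fake_set_trip_temp(void *data, int trip, int temp_mc)
{
	struct fake_sensor *s = data;

	if (trip != 0)
		return -EINVAL;
	s->hw_thermtrip_mc = temp_mc; /* "program" the hardware threshold */
	return 0;
}

struct zone {
	int trip_temp_mc[NUM_TRIPS];
	const struct sensor_ops *ops;
	void *sensor_data;
};

/* Update the software trip and, if the sensor supports it, the hardware one. */
static int zone_set_trip_temp(struct zone *z, int trip, int temp_mc)
{
	if (trip < 0 || trip >= NUM_TRIPS)
		return -EINVAL;

	if (z->ops && z->ops->set_trip_temp) {
		int ret = z->ops->set_trip_temp(z->sensor_data, trip, temp_mc);

		if (ret)
			return ret; /* leave software state unchanged on failure */
	}
	z->trip_temp_mc[trip] = temp_mc;
	return 0;
}

static const struct sensor_ops fake_ops = {
	.set_trip_temp = fake_set_trip_temp,
};
```

Forwarding to the sensor first and only then updating the cached trip keeps the two views consistent if the hardware write fails.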


Re: [PATCH 01/10] fs crypto: add basic definitions for per-file encryption

2016-02-28 Thread Randy Dunlap
On 02/25/16 11:25, Jaegeuk Kim wrote:
> This patch adds definitions for per-file encryption used by ext4 and f2fs.
> 
> Signed-off-by: Jaegeuk Kim 
> ---
>  include/linux/fs.h       |   8 ++
>  include/linux/fscrypto.h | 239 +++
>  include/uapi/linux/fs.h  |  18
>  3 files changed, 265 insertions(+)
>  create mode 100644 include/linux/fscrypto.h
> 
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index ae68100..d8f57cf 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -53,6 +53,8 @@ struct swap_info_struct;
>  struct seq_file;
>  struct workqueue_struct;
>  struct iov_iter;
> +struct fscrypt_info;
> +struct fscrypt_operations;
>  
>  extern void __init inode_init(void);
>  extern void __init inode_init_early(void);
> @@ -678,6 +680,10 @@ struct inode {
>   struct hlist_head   i_fsnotify_marks;
>  #endif
>  
> +#ifdef CONFIG_FS_ENCRYPTION
> + struct fscrypt_info *i_crypt_info;
> +#endif
> +
>   void *i_private; /* fs or device private pointer */
>  };
>  
> @@ -1323,6 +1329,8 @@ struct super_block {
>  #endif
>   const struct xattr_handler **s_xattr;
>  
> + const struct fscrypt_operations *s_cop;
> +
>   struct hlist_bl_head s_anon; /* anonymous dentries for (nfs) exporting */
>   struct list_head s_mounts;   /* list of mounts; _not_ for fs use */
>   struct block_device *s_bdev;
> diff --git a/include/linux/fscrypto.h b/include/linux/fscrypto.h
> new file mode 100644
> index 000..b0aed92
> --- /dev/null
> +++ b/include/linux/fscrypto.h
> @@ -0,0 +1,239 @@
> +/*
> + * General per-file encryption definition
> + *
> + * Copyright (C) 2015, Google, Inc.
> + *
> + * Written by Michael Halcrow, 2015.
> + * Modified by Jaegeuk Kim, 2015.
> + */
> +
> +#ifndef _LINUX_FSCRYPTO_H
> +#define _LINUX_FSCRYPTO_H
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#define FS_KEY_DERIVATION_NONCE_SIZE 16
> +#define FS_ENCRYPTION_CONTEXT_FORMAT_V1  1
> +
> +#define FS_POLICY_FLAGS_PAD_4    0x00
> +#define FS_POLICY_FLAGS_PAD_8    0x01
> +#define FS_POLICY_FLAGS_PAD_16   0x02
> +#define FS_POLICY_FLAGS_PAD_32   0x03
> +#define FS_POLICY_FLAGS_PAD_MASK 0x03
> +#define FS_POLICY_FLAGS_VALID    0x03
> +
> +/* Encryption algorithms */
> +#define FS_ENCRYPTION_MODE_INVALID   0
> +#define FS_ENCRYPTION_MODE_AES_256_XTS   1
> +#define FS_ENCRYPTION_MODE_AES_256_GCM   2
> +#define FS_ENCRYPTION_MODE_AES_256_CBC   3
> +#define FS_ENCRYPTION_MODE_AES_256_CTS   4
> +
> +/**
> + * Encryption context for inode
> + *
> + * Protector format:
> + *  1 byte: Protector format (1 = this version)
> + *  1 byte: File contents encryption mode
> + *  1 byte: File names encryption mode
> + *  1 byte: Flags
> + *  8 bytes: Master Key descriptor
> + *  16 bytes: Encryption Key derivation nonce
> + */
> +struct fscrypt_context {
> + char format;
> + char contents_encryption_mode;
> + char filenames_encryption_mode;
> + char flags;
> + char master_key_descriptor[FS_KEY_DESCRIPTOR_SIZE];
> + char nonce[FS_KEY_DERIVATION_NONCE_SIZE];

how about u8 instead of char?
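Randy's suggestion can be sketched in user space as below. Assumptions: uint8_t stands in for the kernel's u8, __attribute__((packed)) spells out __packed, FS_KEY_DESCRIPTOR_SIZE is taken as 8 from the layout comment above, and fscrypt_context_valid() is a hypothetical helper. Unsigned bytes keep the on-disk values independent of whether plain char is signed on a given architecture.

```c
#include <assert.h>
#include <stdint.h>

#define FS_KEY_DESCRIPTOR_SIZE          8  /* per the "8 bytes" comment above */
#define FS_KEY_DERIVATION_NONCE_SIZE    16
#define FS_ENCRYPTION_CONTEXT_FORMAT_V1 1
#define FS_POLICY_FLAGS_VALID           0x03

/* On-disk context using unsigned bytes instead of plain char. */
struct fscrypt_context {
	uint8_t format;
	uint8_t contents_encryption_mode;
	uint8_t filenames_encryption_mode;
	uint8_t flags;
	uint8_t master_key_descriptor[FS_KEY_DESCRIPTOR_SIZE];
	uint8_t nonce[FS_KEY_DERIVATION_NONCE_SIZE];
} __attribute__((packed));

/* 1 + 1 + 1 + 1 + 8 + 16 bytes, matching the layout comment. */
static_assert(sizeof(struct fscrypt_context) == 28, "unexpected context size");

/* Hypothetical check: known format and no undefined flag bits set. */
static int fscrypt_context_valid(const struct fscrypt_context *ctx)
{
	return ctx->format == FS_ENCRYPTION_CONTEXT_FORMAT_V1 &&
	       (ctx->flags & ~FS_POLICY_FLAGS_VALID) == 0;
}
```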

> +} __packed;
> +
> +/* Encryption parameters */
> +#define FS_XTS_TWEAK_SIZE        16
> +#define FS_AES_128_ECB_KEY_SIZE  16
> +#define FS_AES_256_GCM_KEY_SIZE  32
> +#define FS_AES_256_CBC_KEY_SIZE  32
> +#define FS_AES_256_CTS_KEY_SIZE  32
> +#define FS_AES_256_XTS_KEY_SIZE  64
> +#define FS_MAX_KEY_SIZE  64
> +
> +#define FS_KEY_DESC_PREFIX   "fscrypt:"
> +#define FS_KEY_DESC_PREFIX_SIZE  8
> +
> +/* This is passed in from userspace into the kernel keyring */
> +struct fscrypt_key {
> + __u32 mode;
> + char raw[FS_MAX_KEY_SIZE];
> + __u32 size;
> +} __packed;
> +
> +struct fscrypt_info {
> + char ci_data_mode;
> + char ci_filename_mode;
> + char ci_flags;

ditto
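The same point applies to ci_flags, whose low bits appear to select filename padding. The sketch below shows one plausible reading of the FS_POLICY_FLAGS_PAD_* constants, padding = 4 << (flags & FS_POLICY_FLAGS_PAD_MASK), i.e. 4, 8, 16 or 32 bytes; that interpretation and the helper names fname_padding() and fname_padded_len() are assumptions for illustration, not code from this patch.

```c
#include <stdint.h>

#define FS_POLICY_FLAGS_PAD_4    0x00
#define FS_POLICY_FLAGS_PAD_8    0x01
#define FS_POLICY_FLAGS_PAD_16   0x02
#define FS_POLICY_FLAGS_PAD_32   0x03
#define FS_POLICY_FLAGS_PAD_MASK 0x03

/* Hypothetical: map the PAD_* value held in ci_flags to a byte count. */
static uint32_t fname_padding(uint8_t ci_flags)
{
	return 4u << (ci_flags & FS_POLICY_FLAGS_PAD_MASK);
}

/* Hypothetical: round a filename length up to that padding. */
static uint32_t fname_padded_len(uint32_t len, uint8_t ci_flags)
{
	uint32_t pad = fname_padding(ci_flags);

	return (len + pad - 1) / pad * pad;
}
```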

> + struct crypto_ablkcipher *ci_ctfm;
> + struct key *ci_keyring_key;
> + char ci_master_key[FS_KEY_DESCRIPTOR_SIZE];
> +};
> +
> +#define FS_CTX_REQUIRES_FREE_ENCRYPT_FL  0x0001
> +#define FS_WRITE_PATH_FL 0x0002
> +
> +struct fscrypt_ctx {
> + union {
> + struct {
> + struct page *bounce_page;   /* Ciphertext page */
> + struct page *control_page;  /* Original page  */
> + } w;
> + struct {
> + struct bio *bio;
> + struct work_struct work;
> + } r;
> + struct list_head free_list; /* Free list */
> + };
> + char flags; /* Flags */
> + char mode;   


Re: [PATCH 06/10] fs crypto: add Makefile and Kconfig

2016-02-28 Thread Randy Dunlap
On 02/25/16 11:26, Jaegeuk Kim wrote:
> This patch adds a facility to enable per-file encryption.
> 
> Arnd fixes a missing CONFIG_BLOCK check in the original patch.
> "The newly added generic crypto abstraction for file systems operates
> on 'struct bio' objects, which do not exist when CONFIG_BLOCK is
> disabled:
> 
> fs/crypto/crypto.c: In function 'fscrypt_zeroout_range':
> fs/crypto/crypto.c:308:9: error: implicit declaration of function 'bio_alloc' [-Werror=implicit-function-declaration]
> 
> This adds a Kconfig dependency that prevents FS_ENCRYPTION from being
> enabled without BLOCK."
> 
> Signed-off-by: Arnd Bergmann 
> Signed-off-by: Jaegeuk Kim 
> ---
>  fs/Kconfig |  2 ++
>  fs/Makefile|  1 +
>  fs/crypto/Kconfig  | 17 +
>  fs/crypto/Makefile |  2 ++
>  4 files changed, 22 insertions(+)
>  create mode 100644 fs/crypto/Kconfig
>  create mode 100644 fs/crypto/Makefile
> 
> diff --git a/fs/Kconfig b/fs/Kconfig
> index 9adee0d..9d75767 100644
> --- a/fs/Kconfig
> +++ b/fs/Kconfig
> @@ -84,6 +84,8 @@ config MANDATORY_FILE_LOCKING
>  
> To the best of my knowledge this is dead code that no one cares about.
>  
> +source "fs/crypto/Kconfig"
> +
>  source "fs/notify/Kconfig"
>  
>  source "fs/quota/Kconfig"
> diff --git a/fs/Makefile b/fs/Makefile
> index 79f5225..47571e2 100644
> --- a/fs/Makefile
> +++ b/fs/Makefile
> @@ -30,6 +30,7 @@ obj-$(CONFIG_EVENTFD)   += eventfd.o
>  obj-$(CONFIG_USERFAULTFD)   += userfaultfd.o
>  obj-$(CONFIG_AIO)           += aio.o
>  obj-$(CONFIG_FS_DAX)        += dax.o
> +obj-y                       += crypto/
>  obj-$(CONFIG_FILE_LOCKING)  += locks.o
>  obj-$(CONFIG_COMPAT)        += compat.o compat_ioctl.o
>  obj-$(CONFIG_BINFMT_AOUT)   += binfmt_aout.o
> diff --git a/fs/crypto/Kconfig b/fs/crypto/Kconfig
> new file mode 100644
> index 000..9bea124e
> --- /dev/null
> +++ b/fs/crypto/Kconfig
> @@ -0,0 +1,17 @@
> +config FS_ENCRYPTION
> + bool "FS Encryption (Per-file encryption)"
> + depends on BLOCK

depends on CRYPTO
since all of the CRYPTO_xxx below also depend on CRYPTO.

> + select CRYPTO_AES
> + select CRYPTO_CBC
> + select CRYPTO_ECB
> + select CRYPTO_XTS
> + select CRYPTO_CTS
> + select CRYPTO_CTR
> + select CRYPTO_SHA256
> + select KEYS
> + select ENCRYPTED_KEYS
> + help
> +   Enable encryption of files and directories.  This
> +   feature is similar to ecryptfs, but it is more memory
> +   efficient since it avoids caching the encrypted and
> +   decrypted pages in the page cache.
> diff --git a/fs/crypto/Makefile b/fs/crypto/Makefile
> new file mode 100644
> index 000..f9f68cd
> --- /dev/null
> +++ b/fs/crypto/Makefile
> @@ -0,0 +1,2 @@
> +obj-y += fname.o
> +obj-$(CONFIG_FS_ENCRYPTION)  += crypto.o policy.o keyinfo.o
> 


-- 
~Randy


[PATCH 01/10] selftests/x86: In syscall_nt, test NT|TF as well

2016-02-28 Thread Andy Lutomirski
Setting TF prevents fastpath returns in most cases, which causes the
test to fail on 32-bit kernels because 32-bit kernels do not, in
fact, handle NT correctly on SYSENTER entries.

The next patch will fix 32-bit kernels.

Signed-off-by: Andy Lutomirski 
---
 tools/testing/selftests/x86/syscall_nt.c | 57 +++-
 1 file changed, 49 insertions(+), 8 deletions(-)

diff --git a/tools/testing/selftests/x86/syscall_nt.c b/tools/testing/selftests/x86/syscall_nt.c
index 60c06af4646a..a6ceff86c199 100644
--- a/tools/testing/selftests/x86/syscall_nt.c
+++ b/tools/testing/selftests/x86/syscall_nt.c
@@ -17,6 +17,9 @@
 
 #include 
 #include 
+#include 
+#include 
+#include 
 #include 
 #include 
 
@@ -26,6 +29,8 @@
 # define WIDTH "l"
 #endif
 
+static unsigned int nerrs;
+
 static unsigned long get_eflags(void)
 {
unsigned long eflags;
@@ -39,16 +44,52 @@ static void set_eflags(unsigned long eflags)
  : : "rm" (eflags) : "flags");
 }
 
-int main()
+static void sethandler(int sig, void (*handler)(int, siginfo_t *, void *),
+  int flags)
 {
-   printf("[RUN]\tSet NT and issue a syscall\n");
-   set_eflags(get_eflags() | X86_EFLAGS_NT);
+   struct sigaction sa;
+   memset(&sa, 0, sizeof(sa));
+   sa.sa_sigaction = handler;
+   sa.sa_flags = SA_SIGINFO | flags;
+   sigemptyset(&sa.sa_mask);
+   if (sigaction(sig, &sa, 0))
+   err(1, "sigaction");
+}
+
+static void sigtrap(int sig, siginfo_t *si, void *ctx_void)
+{
+}
+
+static void do_it(unsigned long extraflags)
+{
+   unsigned long flags;
+
+   set_eflags(get_eflags() | extraflags);
syscall(SYS_getpid);
-   if (get_eflags() & X86_EFLAGS_NT) {
-   printf("[OK]\tThe syscall worked and NT is still set\n");
-   return 0;
+   flags = get_eflags();
+   if ((flags & extraflags) == extraflags) {
+   printf("[OK]\tThe syscall worked and flags are still set\n");
} else {
-   printf("[FAIL]\tThe syscall worked but NT was cleared\n");
-   return 1;
+   printf("[FAIL]\tThe syscall worked but flags were cleared (flags = 0x%lx but expected 0x%lx set)\n",
+  flags, extraflags);
+   nerrs++;
}
 }
+
+int main()
+{
+   printf("[RUN]\tSet NT and issue a syscall\n");
+   do_it(X86_EFLAGS_NT);
+
+   /*
+* Now try it again with TF set -- TF forces returns via IRET in all
+* cases except non-ptregs-using 64-bit full fast path syscalls.
+*/
+
+   sethandler(SIGTRAP, sigtrap, 0);
+
+   printf("[RUN]\tSet NT|TF and issue a syscall\n");
+   do_it(X86_EFLAGS_NT | X86_EFLAGS_TF);
+
+   return nerrs == 0 ? 0 : 1;
+}
-- 
2.5.0
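The pass/fail rule in do_it() is that every requested flag must survive the syscall. A pure version of that check can be exercised without raising TF or NT at all. The bit values below are the architectural EFLAGS positions (TF is bit 8, NT is bit 14), and cleared_flags() and flags_preserved() are hypothetical helpers, not part of the selftest.

```c
/* Architectural EFLAGS bits (same values as the kernel's processor-flags header). */
#define X86_EFLAGS_TF 0x00000100UL /* bit 8: trap flag */
#define X86_EFLAGS_NT 0x00004000UL /* bit 14: nested task */

/* Return the requested flags that are no longer set after the syscall. */
static unsigned long cleared_flags(unsigned long after, unsigned long requested)
{
	return requested & ~after;
}

/* Equivalent to the selftest's "(flags & extraflags) == extraflags". */
static int flags_preserved(unsigned long after, unsigned long requested)
{
	return cleared_flags(after, requested) == 0;
}
```

Reporting which bits were lost, rather than a bare yes/no, makes a failure such as "TF cleared but NT kept" easy to tell apart from the reverse.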



[PATCH 00/10] x86: Various SYSENTER/SYSEXIT/#DB fixes and cleanups

2016-02-28 Thread Andy Lutomirski
hpa asked me to get rid of the ASM_CLAC at the beginning of the SYSENTER
path.  Little did he know...

This series makes the observed behavior of SYSENTER wrt flags the same
for all sane flags and kernel bitnesses.  That is, SYSENTER preserves
flags now unless you do a syscall that explicitly changes flags, and
the HW flags that the syscall executes with are sanitized.  This
includes NT, TF, AC and all arithmetic flags.  Prior to this series,
32-bit kernels clobbered TF and the arithmetic flags and behaved
highly erratically if NT was set.  (If IF is cleared by evil userspace
when SYSENTER starts, IF will be set again on return.  There's nothing
the kernel can do about this -- SYSENTER inherently forgets the state
of IF.)
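The flags contract described above can be modelled in a few lines. This is a hypothetical user-space model, not kernel entry code: sysenter_kernel_flags() computes the sanitized FLAGS the syscall body runs with (clearing NT, TF, AC and the arithmetic flags CF, PF, AF, ZF, SF and OF, as listed above), and sysexit_user_flags() computes what user space sees afterwards for a syscall that does not change flags, with IF forced on because SYSENTER does not record it. Ignoring how the kernel itself manages IF is a simplifying assumption.

```c
/* Architectural EFLAGS bits. */
#define X86_EFLAGS_CF    0x00000001UL
#define X86_EFLAGS_FIXED 0x00000002UL /* bit 1, always set */
#define X86_EFLAGS_PF    0x00000004UL
#define X86_EFLAGS_AF    0x00000010UL
#define X86_EFLAGS_ZF    0x00000040UL
#define X86_EFLAGS_SF    0x00000080UL
#define X86_EFLAGS_TF    0x00000100UL
#define X86_EFLAGS_IF    0x00000200UL
#define X86_EFLAGS_OF    0x00000800UL
#define X86_EFLAGS_NT    0x00004000UL
#define X86_EFLAGS_AC    0x00040000UL

#define ARITH_FLAGS (X86_EFLAGS_CF | X86_EFLAGS_PF | X86_EFLAGS_AF | \
		     X86_EFLAGS_ZF | X86_EFLAGS_SF | X86_EFLAGS_OF)

/* Flags the syscall body must not inherit from user space. */
#define SANITIZE_MASK (X86_EFLAGS_NT | X86_EFLAGS_TF | X86_EFLAGS_AC | ARITH_FLAGS)

/* Hypothetical: FLAGS the syscall body runs with. */
static unsigned long sysenter_kernel_flags(unsigned long user_flags)
{
	return user_flags & ~SANITIZE_MASK;
}

/* Hypothetical: FLAGS user space sees after a flag-neutral syscall. */
static unsigned long sysexit_user_flags(unsigned long user_flags)
{
	return user_flags | X86_EFLAGS_IF; /* IF comes back set regardless */
}
```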

This series speeds up SYSENTER on all kernels by a surprisingly large
amount on Skylake because it eliminates an unconditional CLAC.

While SYSENTER used to handle TF correctly as far as I can tell on
64-bit kernels, the means by which it did so was heavily tangled up in
the ptrace single-step logic.  It now works just like all the other
kernel entries except insofar as do_debug has a simple special case
for it.  Relatedly, the bizarre and poorly explained old fixup in
do_debug is now hidden behind a WARN_ON_ONCE in preparation for
deleting it at some point.

The code that fixed up NMI and #DB early in SYSENTER in 32-bit kernels
used to be both terrifying and incorrect.  (It doesn't appear to have
been exploitably bad, but the reason for that is subtle, and the code
was certainly more fragile than it deserved to be.)  We still need a
special fixup, but it's much simpler now.

While I was doing all this, I also noticed that DR6 and BTF handling
in do_debug was a bit off.  Two of the patches in here try to fix it
up.

Have fun!

tl;dr: Cleanups and sanity fixes here, but no security fixes, and I
don't think anything needs to be backported or put in x86/urgent.

This series applies to the result of merging tip:x86/asm and
tip:x86/urgent.  I've been testing on a somewhat bastardized base,
because tip currently doesn't work on my laptop in 32-bit mode.  (That
bug is fixed in Linus' tree.)

Andy Lutomirski (10):
  selftests/x86: In syscall_nt, test NT|TF as well
  x86/entry/compat: In SYSENTER, sink AC clearing below the existing
FLAGS test
  x86/entry/32: Filter NT and speed up AC filtering in SYSENTER
  x86/entry/32: Restore FLAGS on SYSEXIT
  x86/traps: Clear TIF_BLOCKSTEP on all debug exceptions
  x86/traps: Clear DR6 early in do_debug and improve the comment
  x86/entry: Vastly simplify SYSENTER TF handling
  x86/entry: Only allocate space for SYSENTER_stack if needed
  x86/entry/32: Simplify and fix up the SYSENTER stack #DB/NMI fixup
  x86/entry/32: Add and check a stack canary for the SYSENTER stack

 arch/x86/entry/entry_32.S                | 182 ++-
 arch/x86/entry/entry_64_compat.S         |  15 ++-
 arch/x86/include/asm/processor.h         |   5 +-
 arch/x86/include/asm/proto.h             |  15 ++-
 arch/x86/kernel/asm-offsets_32.c         |   5 +
 arch/x86/kernel/process.c                |   3 +
 arch/x86/kernel/traps.c                  |  87 ---
 tools/testing/selftests/x86/syscall_nt.c |  57 --
 8 files changed, 263 insertions(+), 106 deletions(-)

-- 
2.5.0



[PATCH 02/10] x86/entry/compat: In SYSENTER, sink AC clearing below the existing FLAGS test

2016-02-28 Thread Andy Lutomirski
CLAC is slow, and the SYSENTER code already has an unlikely path
that runs if unusual flags are set.  Drop the CLAC and instead rely
on the unlikely path to clear AC.

This seems to save ~24 cycles on my Skylake laptop.  (Hey, Intel,
make this faster please!)

Signed-off-by: Andy Lutomirski 
---
 arch/x86/entry/entry_64_compat.S | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S
index 89bcb4979e7a..7c8e72da7654 100644
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -66,8 +66,6 @@ ENTRY(entry_SYSENTER_compat)
 */
pushfq  /* pt_regs->flags (except IF = 0) */
orl $X86_EFLAGS_IF, (%rsp)  /* Fix saved flags */
-   ASM_CLAC/* Clear AC after saving FLAGS */
-
pushq   $__USER32_CS/* pt_regs->cs */
xorq%r8,%r8
pushq   %r8 /* pt_regs->ip = 0 (placeholder) */
@@ -90,9 +88,9 @@ ENTRY(entry_SYSENTER_compat)
cld
 
/*
-* Sysenter doesn't filter flags, so we need to clear NT
+* Sysenter doesn't filter flags, so we need to clear NT and AC
 * ourselves.  To save a few cycles, we can check whether
-* NT was set instead of doing an unconditional popfq.
+* either was set instead of doing an unconditional popfq.
 * This needs to happen before enabling interrupts so that
 * we don't get preempted with NT set.
 *
@@ -102,7 +100,7 @@ ENTRY(entry_SYSENTER_compat)
 * we're keeping that code behind a branch which will predict as
 * not-taken and therefore its instructions won't be fetched.
 */
-   testl   $X86_EFLAGS_NT, EFLAGS(%rsp)
+   testl   $X86_EFLAGS_NT|X86_EFLAGS_AC, EFLAGS(%rsp)
jnz .Lsysenter_fix_flags
 .Lsysenter_flags_fixed:
 
-- 
2.5.0
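The point of this patch is to fold AC into the flags test that already guards the rare NT fix-up, so the common case pays for one testl and a not-taken branch instead of an unconditional CLAC. A C model of that control flow, as a sketch only: entry_fix_flags() is a hypothetical stand-in for the .Lsysenter_fix_flags slow path (modelled here as reloading FLAGS with only the always-set bit 1), and slow_path_hits is a counter added purely for illustration.

```c
#define X86_EFLAGS_FIXED 0x00000002UL /* bit 1, always set */
#define X86_EFLAGS_NT    0x00004000UL
#define X86_EFLAGS_AC    0x00040000UL

/* Counts how often the hypothetical slow path runs. */
static unsigned int slow_path_hits;

/* Stand-in for the rarely taken fix-up: reload FLAGS with only the fixed bit. */
static unsigned long entry_fix_flags(void)
{
	slow_path_hits++;
	return X86_EFLAGS_FIXED;
}

/*
 * One combined test replaces the unconditional CLAC: only if NT or AC is
 * set do we take the slow path. Returns the flags execution continues with.
 */
static unsigned long entry_live_flags(unsigned long flags)
{
	if (flags & (X86_EFLAGS_NT | X86_EFLAGS_AC))
		return entry_fix_flags();
	return flags;
}
```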



[PATCH 00/10] x86: Various SYSENTER/SYSEXIT/#DB fixes and cleanups

2016-02-28 Thread Andy Lutomirski
hpa asked me to get rid of the ASM_CLAC at the beginning of the SYSENTER
path.  Little did he know...

This series makes the observed behavior of SYSENTER wrt flags the same
for all sane flags and kernel bitnesses.  That is, SYSENTER preserves
flags now unless you do a syscall that explicitly changes flags, and
the HW flags that the syscall executes with are sanitized.  This
includes NT, TF, AC and all arithmetic flags.  Prior to this series,
32-bit kernels clobbered TF and the arithmetic flags and behaved
highly erratically if NT was set.  (If IF is cleared by evil userspace
when SYSENTER starts, IF will be set again on return.  There's nothing
the kernel can do about this -- SYSENTER inherently forgets the state
of IF.)
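
A rough C model of the flag behavior described above, offered as a sketch under stated assumptions rather than kernel code: the EFL_* constants are the architectural EFLAGS bit masks, sysenter_entry() and sysexit_flags() are hypothetical names, the saved copy gets IF forced on (as the "orl $X86_EFLAGS_IF" in entry_64_compat.S does), and the out-of-line fixup is assumed to reset the live flags to X86_EFLAGS_FIXED the way the "pushl $X86_EFLAGS_FIXED; popfl" sequence in entry_32.S does.

```c
#include <assert.h>
#include <stdint.h>

/* Architectural EFLAGS bits. */
#define EFL_CF    0x00000001u
#define EFL_FIXED 0x00000002u /* bit 1 always reads as 1 */
#define EFL_ZF    0x00000040u
#define EFL_TF    0x00000100u
#define EFL_IF    0x00000200u
#define EFL_NT    0x00004000u
#define EFL_AC    0x00040000u

struct sysenter_model {
	uint32_t saved;   /* what ends up in pt_regs->flags */
	uint32_t running; /* what the kernel executes with after the fixup */
};

static struct sysenter_model sysenter_entry(uint32_t user_flags)
{
	/* SYSENTER itself clears IF; the user's IF value is lost. */
	uint32_t hw = user_flags & ~EFL_IF;
	struct sysenter_model m;

	/* Save the flags, forcing IF on for the return to user mode. */
	m.saved = hw | EFL_IF;

	/* Unlikely path: any of NT, AC or TF set -> load a clean value. */
	if (hw & (EFL_NT | EFL_AC | EFL_TF))
		hw = EFL_FIXED;
	m.running = hw;
	return m;
}

/* The return path restores whatever is in the saved frame. */
static uint32_t sysexit_flags(const struct sysenter_model *m)
{
	return m->saved;
}
```

In this model the saved copy keeps TF and the arithmetic flags while the live copy drops NT, AC and TF, so the caller sees its flags preserved across the call (with IF forced on) even though the kernel, once past the fixup, runs with sanitized flags.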

This series speeds up SYSENTER on all kernels by a surprisingly large
amount on Skylake because it eliminates an unconditional CLAC.

While SYSENTER used to handle TF correctly as far as I can tell on
64-bit kernels, the means by which it did so was heavily tangled up in
the ptrace single-step logic.  It now works just like all the other
kernel entries except insofar as do_debug has a simple special case
for it.  Relatedly, the bizarre and poorly explained old fixup in
do_debug is now hidden behind a WARN_ON_ONCE in preparation for
deleting it at some point.

The code that fixed up NMI and #DB early in SYSENTER in 32-bit kernels
used to be both terrifying and incorrect.  (It doesn't appear to have
been exploitably bad, but the reason for that is subtle, and the code
was certainly more fragile than it deserved to be.)  We still need a
special fixup, but it's much simpler now.

While I was doing all this, I also noticed that DR6 and BTF handling
in do_debug was a bit off.  Two of the patches in here try to fix it
up.

Have fun!

tl;dr: Cleanups and sanity fixes here, but no security fixes, and I
don't think anything needs to be backported or put in x86/urgent.

This series applies to the result of merging tip:x86/asm and
tip:x86/urgent.  I've been testing on a somewhat bastardized base,
because tip currently doesn't work on my laptop in 32-bit mode.  (That
bug is fixed in Linus' tree.)

Andy Lutomirski (10):
  selftests/x86: In syscall_nt, test NT|TF as well
  x86/entry/compat: In SYSENTER, sink AC clearing below the existing
FLAGS test
  x86/entry/32: Filter NT and speed up AC filtering in SYSENTER
  x86/entry/32: Restore FLAGS on SYSEXIT
  x86/traps: Clear TIF_BLOCKSTEP on all debug exceptions
  x86/traps: Clear DR6 early in do_debug and improve the comment
  x86/entry: Vastly simplify SYSENTER TF handling
  x86/entry: Only allocate space for SYSENTER_stack if needed
  x86/entry/32: Simplify and fix up the SYSENTER stack #DB/NMI fixup
  x86/entry/32: Add and check a stack canary for the SYSENTER stack

 arch/x86/entry/entry_32.S| 182 ++-
 arch/x86/entry/entry_64_compat.S |  15 ++-
 arch/x86/include/asm/processor.h |   5 +-
 arch/x86/include/asm/proto.h |  15 ++-
 arch/x86/kernel/asm-offsets_32.c |   5 +
 arch/x86/kernel/process.c|   3 +
 arch/x86/kernel/traps.c  |  87 ---
 tools/testing/selftests/x86/syscall_nt.c |  57 --
 8 files changed, 263 insertions(+), 106 deletions(-)

-- 
2.5.0



[PATCH 06/10] x86/traps: Clear DR6 early in do_debug and improve the comment

2016-02-28 Thread Andy Lutomirski
Leaving any bits set in DR6 on return from a debug exception is
asking for trouble.  Prevent it by writing zero right away and
clarify the comment.

Signed-off-by: Andy Lutomirski 
---
 arch/x86/kernel/traps.c | 15 ---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 19e6cfa501e3..6dddc220e3ed 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -593,6 +593,18 @@ dotraplinkage void do_debug(struct pt_regs *regs, long error_code)
ist_enter(regs);
 
get_debugreg(dr6, 6);
+   /*
+* The Intel SDM says:
+*
+*   Certain debug exceptions may clear bits 0-3. The remaining
+*   contents of the DR6 register are never cleared by the
+*   processor. To avoid confusion in identifying debug
+*   exceptions, debug handlers should clear the register before
+*   returning to the interrupted task.
+*
+* Keep it simple: clear DR6 immediately.
+*/
+   set_debugreg(0, 6);
 
/* Filter out all the reserved bits which are preset to 1 */
dr6 &= ~DR6_RESERVED;
@@ -616,9 +628,6 @@ dotraplinkage void do_debug(struct pt_regs *regs, long error_code)
if ((dr6 & DR_STEP) && kmemcheck_trap(regs))
goto exit;
 
-   /* DR6 may or may not be cleared by the CPU */
-   set_debugreg(0, 6);
-
/* Store the virtualized DR6 value */
tsk->thread.debugreg6 = dr6;
 
-- 
2.5.0
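
A small C model of the ordering this patch establishes, assuming a plain variable (fake_dr6, hypothetical) stands in for the hardware register: the handler snapshots DR6, zeroes the register at once, and only then filters the snapshot. The DR6_RESERVED and DR_STEP values are assumed to match the kernel's debugreg.h definitions; debug_entry_read_dr6() is an illustrative name.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to match the kernel's debugreg.h definitions. */
#define DR6_RESERVED 0xFFFF0FF0u /* bits that read as 1 */
#define DR_STEP      0x00004000u /* BS: single-step trap */

/* Hypothetical stand-in for the hardware DR6 register. */
static uint32_t fake_dr6;

static uint32_t debug_entry_read_dr6(void)
{
	uint32_t dr6 = fake_dr6;  /* get_debugreg(dr6, 6) */

	fake_dr6 = 0;             /* set_debugreg(0, 6), before any early exit */
	dr6 &= ~DR6_RESERVED;     /* filter the snapshot, not the register */
	return dr6;
}
```

Because the register is cleared before any early exit (such as the kmemcheck path), stale status bits cannot survive into the next debug exception, which is the hazard the SDM text quoted in the patch warns about.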



[PATCH 07/10] x86/entry: Vastly simplify SYSENTER TF handling

2016-02-28 Thread Andy Lutomirski
Due to a blatant design error, SYSENTER doesn't clear TF.  As a result,
if a user does SYSENTER with TF set, we will single-step through the
kernel until something clears TF.  There is absolutely nothing we can
do to prevent this short of turning off SYSENTER [1].

Simplify the handling considerably with two changes:

1. We already sanitize EFLAGS in SYSENTER to clear NT and AC.  We can
   add TF to that list of flags to sanitize with no overhead whatsoever.

2. Teach do_debug to ignore single-step traps in the SYSENTER prologue.

That's all we need to do.

Don't get too excited -- our handling is still buggy on 32-bit
kernels.  There's nothing wrong with the SYSENTER code itself, but
the #DB prologue has a clever fixup for traps on the very first
instruction of entry_SYSENTER_32, and the fixup doesn't work quite
correctly.  The next two patches will fix that.

[1] We could probably prevent it by forcing BTF on at all times and
making sure we clear TF before any branches in the SYSENTER
code.  Needless to say, this is a bad idea.

Signed-off-by: Andy Lutomirski 
---
 arch/x86/entry/entry_32.S| 42 ++--
 arch/x86/entry/entry_64_compat.S |  9 ++-
 arch/x86/include/asm/proto.h | 15 ++--
 arch/x86/kernel/traps.c  | 52 +---
 4 files changed, 94 insertions(+), 24 deletions(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index ed171f938960..752d4f031a18 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -287,7 +287,26 @@ need_resched:
 END(resume_kernel)
 #endif
 
-   # SYSENTER  call handler stub
+GLOBAL(__begin_SYSENTER_singlestep_region)
+/*
+ * All code from here through __end_SYSENTER_singlestep_region is subject
+ * to being single-stepped if a user program sets TF and executes SYSENTER.
+ * There is absolutely nothing that we can do to prevent this from happening
+ * (thanks Intel!).  To keep our handling of this situation as simple as
+ * possible, we handle TF just like AC and NT, except that our #DB handler
+ * will ignore all of the single-step traps generated in this range.
+ */
+
+#ifdef CONFIG_XEN
+/*
+ * Xen doesn't set %esp to be precisely what the normal SYSENTER
+ * entry point expects, so fix it up before using the normal path.
+ */
+ENTRY(xen_sysenter_target)
+   addl    $5*4, %esp  /* remove xen-provided frame */
+   jmp sysenter_past_esp
+#endif
+
 ENTRY(entry_SYSENTER_32)
movl    TSS_sysenter_sp0(%esp), %esp
 sysenter_past_esp:
@@ -301,19 +320,25 @@ sysenter_past_esp:
SAVE_ALL pt_regs_ax=$-ENOSYS    /* save rest */
 
/*
-* Sysenter doesn't filter flags, so we need to clear NT and AC
-* ourselves.  To save a few cycles, we can check whether
+* Sysenter doesn't filter flags, so we need to clear NT, AC
+* and TF ourselves.  To save a few cycles, we can check whether
 * either was set instead of doing an unconditional popfq.
 * This needs to happen before enabling interrupts so that
 * we don't get preempted with NT set.
 *
+* If TF is set, we will single-step all the way to here -- do_debug
+* will ignore all the traps.  (Yes, this is slow, but so is
+* single-stepping in general.  This allows us to avoid having
+* more complicated code to handle the case where a user program
+* forces us to single-step through the SYSENTER entry code.)
+*
 * NB.: .Lsysenter_fix_flags is a label with the code under it moved
 * out-of-line as an optimization: NT is unlikely to be set in the
 * majority of the cases and instead of polluting the I$ unnecessarily,
 * we're keeping that code behind a branch which will predict as
 * not-taken and therefore its instructions won't be fetched.
 */
-   testl   $X86_EFLAGS_NT|X86_EFLAGS_AC, PT_EFLAGS(%esp)
+   testl   $X86_EFLAGS_NT|X86_EFLAGS_AC|X86_EFLAGS_TF, PT_EFLAGS(%esp)
jnz .Lsysenter_fix_flags
 .Lsysenter_flags_fixed:
 
@@ -369,6 +394,7 @@ sysenter_past_esp:
pushl   $X86_EFLAGS_FIXED
popfl
jmp .Lsysenter_flags_fixed
+GLOBAL(__end_SYSENTER_singlestep_region)
 ENDPROC(entry_SYSENTER_32)
 
# system call handler stub
@@ -662,14 +688,6 @@ ENTRY(spurious_interrupt_bug)
 END(spurious_interrupt_bug)
 
 #ifdef CONFIG_XEN
-/*
- * Xen doesn't set %esp to be precisely what the normal SYSENTER
- * entry point expects, so fix it up before using the normal path.
- */
-ENTRY(xen_sysenter_target)
-   addl    $5*4, %esp  /* remove xen-provided frame */
-   jmp sysenter_past_esp
-
 ENTRY(xen_hypervisor_callback)
pushl   $-1 /* orig_ax = -1 => not a system call */
SAVE_ALL
diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S
index 7c8e72da7654..6aec75b41b06 
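
The traps.c part of this patch is truncated in this archive, so the following is only a hedged C model of change 2 above: drop kernel-mode single-step traps whose IP lies inside the SYSENTER single-step region. region_begin and region_end are hypothetical stand-ins for __begin_SYSENTER_singlestep_region and __end_SYSENTER_singlestep_region, the addresses are made up, and in_sysenter_region() and filter_sysenter_step() are illustrative names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DR_STEP 0x00004000u /* DR6.BS: single-step trap */

/* Hypothetical region bounds; the end address is exclusive. */
static const uintptr_t region_begin = 0xc1000000u;
static const uintptr_t region_end   = 0xc1000080u;

/*
 * One unsigned compare checks both bounds: if ip < region_begin, the
 * subtraction wraps to a huge value and the comparison fails.
 */
static bool in_sysenter_region(uintptr_t ip)
{
	return ip - region_begin < region_end - region_begin;
}

/* Returns the DR6 bits that still need handling. */
static uint32_t filter_sysenter_step(uint32_t dr6, uintptr_t ip, bool user_mode)
{
	if (!user_mode && (dr6 & DR_STEP) && in_sysenter_region(ip))
		dr6 &= ~DR_STEP; /* useless trap caused by SYSENTER with TF set */
	return dr6;
}
```

In this sketch only DR_STEP is masked off, so a watchpoint hit reported in the same DR6 value still reaches the rest of the handler; a result of zero would mean there is nothing left to do.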

[PATCH 08/10] x86/entry: Only allocate space for SYSENTER_stack if needed

2016-02-28 Thread Andy Lutomirski
The SYSENTER stack is only used on 32-bit kernels.  Remove it in
64-bit kernels.

(We may end up using it down the road on 64-bit kernels.  If so,
 we'll re-enable it for CONFIG_IA32_EMULATION.)

Signed-off-by: Andy Lutomirski 
---
 arch/x86/include/asm/processor.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index ecb410310e70..7cd01b71b5bd 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -297,10 +297,12 @@ struct tss_struct {
 */
unsigned long   io_bitmap[IO_BITMAP_LONGS + 1];
 
+#ifdef CONFIG_X86_32
/*
 * Space for the temporary SYSENTER stack:
 */
unsigned long   SYSENTER_stack[64];
+#endif
 
 } cacheline_aligned;
 
-- 
2.5.0



[PATCH 10/10] x86/entry/32: Add and check a stack canary for the SYSENTER stack

2016-02-28 Thread Andy Lutomirski
Signed-off-by: Andy Lutomirski 
---
 arch/x86/include/asm/processor.h | 3 ++-
 arch/x86/kernel/process.c| 3 +++
 arch/x86/kernel/traps.c  | 8 
 3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 7cd01b71b5bd..50a6dc871cc0 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -299,8 +299,9 @@ struct tss_struct {
 
 #ifdef CONFIG_X86_32
/*
-* Space for the temporary SYSENTER stack:
+* Space for the temporary SYSENTER stack.
 */
+   unsigned long   SYSENTER_stack_canary;
unsigned long   SYSENTER_stack[64];
 #endif
 
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 9f7c21c22477..ee9a9792caeb 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -57,6 +57,9 @@ __visible DEFINE_PER_CPU_SHARED_ALIGNED(struct tss_struct, cpu_tss) = {
  */
.io_bitmap  = { [0 ... IO_BITMAP_LONGS] = ~0 },
 #endif
+#ifdef CONFIG_X86_32
+   .SYSENTER_stack_canary  = STACK_END_MAGIC,
+#endif
 };
 EXPORT_PER_CPU_SYMBOL(cpu_tss);
 
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 80928ea78373..590110119e6a 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -713,6 +713,14 @@ dotraplinkage void do_debug(struct pt_regs *regs, long error_code)
debug_stack_usage_dec();
 
 exit:
+#if defined(CONFIG_X86_32)
+   /*
+* This is the most likely code path that involves non-trivial use
+* of the SYSENTER stack.  Check that we haven't overrun it.
+*/
+   WARN(this_cpu_read(cpu_tss.SYSENTER_stack_canary) != STACK_END_MAGIC,
+"Overran or corrupted SYSENTER stack\n");
+#endif
ist_exit(regs);
 }
 NOKPROBE_SYMBOL(do_debug);
-- 
2.5.0
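
A C model of the canary layout, assuming (as in the processor.h hunk) that SYSENTER_stack_canary sits directly below SYSENTER_stack, so a downward-growing stack that overruns its 64 words writes the canary first. The single-array layout, the explicit stack index and the helper names are illustrative inventions; STACK_END_MAGIC is assumed to be the value from include/uapi/linux/magic.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define STACK_END_MAGIC 0x57AC6E9DUL /* assumed value from linux/magic.h */
#define STACK_WORDS     64

/*
 * words[0] plays SYSENTER_stack_canary; words[1..64] are the stack,
 * which grows down toward the canary.
 */
struct sysenter_area_model {
	unsigned long words[1 + STACK_WORDS];
	size_t sp; /* index of the current top of stack */
};

static void area_init(struct sysenter_area_model *a)
{
	a->words[0] = STACK_END_MAGIC;
	a->sp = 1 + STACK_WORDS; /* empty stack: one past the last word */
}

/*
 * Push with no bounds check, as the hardware would.  Push number 65
 * lands on the canary; callers must stop there to stay in bounds.
 */
static void area_push(struct sysenter_area_model *a, unsigned long v)
{
	a->words[--a->sp] = v;
}

/* Same comparison as the WARN() at the end of do_debug. */
static bool area_canary_ok(const struct sysenter_area_model *a)
{
	return a->words[0] == STACK_END_MAGIC;
}
```

As in the patch, this is a detector only: it can report after the fact that the canary was overwritten, not prevent the overrun.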



[PATCH 09/10] x86/entry/32: Simplify and fix up the SYSENTER stack #DB/NMI fixup

2016-02-28 Thread Andy Lutomirski
Right after SYSENTER, we can get a #DB or NMI.  On x86_32, there's no IST,
so the exception handler is invoked on the temporary SYSENTER stack.

Because the SYSENTER stack is very small, we have a fixup to switch
off the stack quickly when this happens.  The old fixup had several issues:

1. It checked the interrupt frame's CS and EIP.  This wasn't
   obviously correct on Xen or if vm86 mode was in use [1].

2. In the NMI handler, it did some frightening digging into the
   stack frame.  I'm not convinced this digging was correct.

3. The fixup didn't switch stacks and then switch back.  Instead, it
   synthesized a brand new stack frame that would redirect the IRET
   back to the SYSENTER code.  That frame was highly questionable.
   For one thing, if NMI nested inside #DB, we would effectively
   abort the #DB prologue, which was probably safe but was
   frightening.  For another, the code used PUSHFL to write the
   FLAGS portion of the frame, which was simply bogus -- by the time
   PUSHFL was called, at least TF, NT, VM, and all of the arithmetic
   flags were clobbered.

Simplify this considerably.  Instead of looking at the saved frame
to see where we came from, check the hardware ESP register against
the SYSENTER stack directly.  Malicious user code cannot spoof the
kernel ESP register, and by moving the check after SAVE_ALL, we can
use normal PER_CPU accesses to find all the relevant addresses.

With this patch applied, the improved syscall_nt_32 test finally
passes on 32-bit kernels.

[1] It isn't obviously correct, but it is nonetheless safe from vm86
shenanigans as far as I can tell.  A user can't point EIP at
entry_SYSENTER_32 while in vm86 mode because entry_SYSENTER_32,
like all kernel addresses, is greater than 0x and would thus
violate the CS segment limit.

Signed-off-by: Andy Lutomirski 
---
 arch/x86/entry/entry_32.S| 114 ++-
 arch/x86/kernel/asm-offsets_32.c |   5 ++
 2 files changed, 56 insertions(+), 63 deletions(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index 752d4f031a18..99bf636a6eaf 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -987,51 +987,48 @@ error_code:
jmp ret_from_exception
 END(page_fault)
 
-/*
- * Debug traps and NMI can happen at the one SYSENTER instruction
- * that sets up the real kernel stack. Check here, since we can't
- * allow the wrong stack to be used.
- *
- * "TSS_sysenter_sp0+12" is because the NMI/debug handler will have
- * already pushed 3 words if it hits on the sysenter instruction:
- * eflags, cs and eip.
- *
- * We just load the right stack, and push the three (known) values
- * by hand onto the new stack - while updating the return eip past
- * the instruction that would have done it for sysenter.
- */
-.macro FIX_STACK offset ok label
-   cmpw    $__KERNEL_CS, 4(%esp)
-   jne \ok
-\label:
-   movl    TSS_sysenter_sp0 + \offset(%esp), %esp
-   pushfl
-   pushl   $__KERNEL_CS
-   pushl   $sysenter_past_esp
-.endm
-
 ENTRY(debug)
+   /*
+* #DB can happen at the first instruction of
+* entry_SYSENTER_32 or in Xen's SYSENTER prologue.  If this
+* happens, then we will be running on a very small stack.  We
+* need to detect this condition and switch to the thread
+* stack before calling any C code at all.
+*
+* If you edit this code, keep in mind that NMIs can happen in here.
+*/
ASM_CLAC
-   cmpl    $entry_SYSENTER_32, (%esp)
-   jne debug_stack_correct
-   FIX_STACK 12, debug_stack_correct, debug_esp_fix_insn
-debug_stack_correct:
pushl   $-1 # mark this as an int
SAVE_ALL
-   TRACE_IRQS_OFF
xorl    %edx, %edx  # error code 0
movl    %esp, %eax  # pt_regs pointer
+
+   /* Are we currently on the SYSENTER stack? */
+   PER_CPU(cpu_tss + CPU_TSS_SYSENTER_stack + SIZEOF_SYSENTER_stack, %ecx)
+   subl    %eax, %ecx  /* ecx = (end of SYSENTER_stack) - esp */
+   cmpl    $SIZEOF_SYSENTER_stack, %ecx
+   jb  .Ldebug_from_sysenter_stack
+
+   TRACE_IRQS_OFF
+   call    do_debug
+   jmp ret_from_exception
+
+.Ldebug_from_sysenter_stack:
+   /* We're on the SYSENTER stack.  Switch off. */
+   movl    %esp, %ebp
+   movl    PER_CPU_VAR(cpu_current_top_of_stack), %esp
+   TRACE_IRQS_OFF
call    do_debug
+   movl    %ebp, %esp
jmp ret_from_exception
 END(debug)
 
 /*
- * NMI is doubly nasty. It can happen _while_ we're handling
- * a debug fault, and the debug fault hasn't yet been able to
- * clear up the stack. So we first check whether we got  an
- * NMI on the sysenter entry path, but after that we need to
- * check whether we got an NMI on the debug path where the debug
- * fault happened on the sysenter path.
+ * NMI is 
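
The ESP test added to the #DB entry can be read as the following C predicate. This is a sketch assuming 32-bit addresses and a 256-byte stack (64 four-byte words, matching SYSENTER_stack on a 32-bit kernel); on_sysenter_stack() is a hypothetical name and the test addresses are arbitrary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Models:
 *   ecx = end_of_SYSENTER_stack - esp
 *   cmpl $SIZEOF_SYSENTER_stack, %ecx
 *   jb   .Ldebug_from_sysenter_stack
 * With 32-bit unsigned arithmetic, one compare rejects an ESP above
 * the stack (the subtraction wraps to a huge value) as well as one
 * more than stack_size bytes below its end.
 */
static bool on_sysenter_stack(uint32_t esp, uint32_t stack_end,
			      uint32_t stack_size)
{
	return (uint32_t)(stack_end - esp) < stack_size;
}
```

Because the check reads the live stack pointer rather than the saved CS and EIP in the exception frame, user code has no way to influence its result, which is the property the commit message relies on.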

[PATCH 05/10] x86/traps: Clear TIF_BLOCKSTEP on all debug exceptions

2016-02-28 Thread Andy Lutomirski
The SDM says that debug exceptions clear BTF, and we need to keep
TIF_BLOCKSTEP in sync with BTF.  Clear it unconditionally and improve
the comment.

I suspect that the fact that kmemcheck could cause TIF_BLOCKSTEP not
to be cleared was just an oversight.

Signed-off-by: Andy Lutomirski 
---
 arch/x86/kernel/traps.c | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index dd2c2e66c2e1..19e6cfa501e3 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -598,6 +598,13 @@ dotraplinkage void do_debug(struct pt_regs *regs, long error_code)
dr6 &= ~DR6_RESERVED;
 
/*
+    * The SDM says "The processor clears the BTF flag when it
+    * generates a debug exception."  Clear TIF_BLOCKSTEP to keep
+    * TIF_BLOCKSTEP in sync with the hardware BTF flag.
+    */
+   clear_tsk_thread_flag(tsk, TIF_BLOCKSTEP);
+
+   /*
 * If dr6 has no reason to give us about the origin of this trap,
 * then it's very likely the result of an icebp/int01 trap.
 * User wants a sigtrap for that.
@@ -612,11 +619,6 @@ dotraplinkage void do_debug(struct pt_regs *regs, long error_code)
/* DR6 may or may not be cleared by the CPU */
set_debugreg(0, 6);
 
-   /*
-    * The processor cleared BTF, so don't mark that we need it set.
-    */
-   clear_tsk_thread_flag(tsk, TIF_BLOCKSTEP);
-
/* Store the virtualized DR6 value */
tsk->thread.debugreg6 = dr6;
 
-- 
2.5.0



[PATCH 04/10] x86/entry/32: Restore FLAGS on SYSEXIT

2016-02-28 Thread Andy Lutomirski
We weren't restoring FLAGS at all on SYSEXIT.  Apparently no one cared.

With this patch applied, native kernels should always honor
task_pt_regs()->flags, which opens the door for some sys_iopl
cleanups.  I'll do those as a separate series, though, since getting
it right will involve tweaking some paravirt ops.

(The short version is that, before this patch, sys_iopl, invoked via
 SYSENTER, wasn't guaranteed to ever transfer the updated
 regs->flags, so sys_iopl had to change the hardware flags register
 as well.)

Reported-by: Brian Gerst 
Signed-off-by: Andy Lutomirski 
---
 arch/x86/entry/entry_32.S | 9 +
 1 file changed, 9 insertions(+)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index 263ebde6333f..ed171f938960 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -343,6 +343,15 @@ sysenter_past_esp:
popl    %eax        /* pt_regs->ax */
 
/*
+    * Restore all flags except IF (we restore IF separately because
+    * STI gives a one-instruction window in which we won't be interrupted,
+    * whereas POPF does not).
+    */
+   addl    $PT_EFLAGS-PT_DS, %esp  /* point esp at pt_regs->flags */
+   btr $X86_EFLAGS_IF_BIT, (%esp)
+   popfl
+
+   /*
 * Return back to the vDSO, which will pop ecx and edx.
 * Don't bother with DS and ES (they already contain __USER_DS).
 */
-- 
2.5.0



[PATCH 03/10] x86/entry/32: Filter NT and speed up AC filtering in SYSENTER

2016-02-28 Thread Andy Lutomirski
This makes the 32-bit code work just like the 64-bit code.  It should
speed up syscalls on 32-bit kernels on Skylake by something like 20
cycles (by analogy to the 64-bit compat case).

It also cleans up NT just like we do for the 64-bit case.

Signed-off-by: Andy Lutomirski 
---
 arch/x86/entry/entry_32.S | 23 ++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index ab710eee4308..263ebde6333f 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -294,7 +294,6 @@ sysenter_past_esp:
pushl   $__USER_DS  /* pt_regs->ss */
pushl   %ebp        /* pt_regs->sp (stashed in bp) */
pushfl  /* pt_regs->flags (except IF = 0) */
-   ASM_CLAC        /* Clear AC after saving FLAGS */
orl $X86_EFLAGS_IF, (%esp)  /* Fix IF */
pushl   $__USER_CS  /* pt_regs->cs */
pushl   $0  /* pt_regs->ip = 0 (placeholder) */
@@ -302,6 +301,23 @@ sysenter_past_esp:
SAVE_ALL pt_regs_ax=$-ENOSYS    /* save rest */
 
/*
+    * Sysenter doesn't filter flags, so we need to clear NT and AC
+    * ourselves.  To save a few cycles, we can check whether
+    * either was set instead of doing an unconditional popfl.
+    * This needs to happen before enabling interrupts so that
+    * we don't get preempted with NT set.
+    *
+    * NB.: .Lsysenter_fix_flags is a label with the code under it moved
+    * out-of-line as an optimization: NT is unlikely to be set in the
+    * majority of the cases and instead of polluting the I$ unnecessarily,
+    * we're keeping that code behind a branch which will predict as
+    * not-taken and therefore its instructions won't be fetched.
+    */
+   testl   $X86_EFLAGS_NT|X86_EFLAGS_AC, PT_EFLAGS(%esp)
+   jnz .Lsysenter_fix_flags
+.Lsysenter_flags_fixed:
+
+   /*
 * User mode is traced as though IRQs are on, and SYSENTER
 * turned them off.
 */
@@ -339,6 +355,11 @@ sysenter_past_esp:
 .popsection
_ASM_EXTABLE(1b, 2b)
PTGS_TO_GS_EX
+
+.Lsysenter_fix_flags:
+   pushl   $X86_EFLAGS_FIXED
+   popfl
+   jmp .Lsysenter_flags_fixed
 ENDPROC(entry_SYSENTER_32)
 
# system call handler stub
-- 
2.5.0



Re: [PATCH v3 3/3] sched, x86: Check that we're on the right stack in schedule and __might_sleep

2016-02-28 Thread Andy Lutomirski
On Wed, Nov 19, 2014 at 11:44 AM, Linus Torvalds
 wrote:
> On Wed, Nov 19, 2014 at 11:29 AM, Andi Kleen  wrote:
>>
>> The exception handlers which use the IST stacks don't necessarily
>> set irq count. Maybe they should.
>
> Hmm. I think they should. Since they clearly must not schedule, as
> they use a percpu stack.
>
> Which exceptions use IST?
>
> [ grep grep ]
>
> Looks like stack, doublefault, nmi, debug and mce. And yes, I really
> think they should all raise the irq count if they don't already.
> Rather than add random arch-specific "let's check that we're on the
> right stack" code to the might-sleep stuff, just use the one we have.
>

Resurrecting an old thread:

The outcome of this discussion was that ist_enter now raises
HARDIRQ_COUNT.  I think this is causing a problem.  If a user program
enables TF, it generates a bunch of debug exceptions.  The handlers
raise the IRQ count and do stuff, and apparently some of that stuff
can raise a softirq.  (I have no idea where the softirq is being
raised.)  The softirq code notices that we're in_interrupt and doesn't
wake ksoftirqd because it thinks we're about to exit the interrupt and
process the softirq.  But we don't, which causes occasional warnings
and confuses things (and me!).

So how do we fix it?  If we stop raising HARDIRQ_COUNT (and apply
$SUBJECT?), then raise_softirq will wake ksoftirqd and life is good.
But this seems a bit silly, since, if we entered the ist exception
handler from a context with irqs on and softirqs enabled, we *could*
plausibly handle the softirq right away -- we're on an essentially
empty stack.  (Of course, it's a *small* stack, since it could be the
IST stack.)

Or we could just let ksoftirqd do its thing and stop raising
HARDIRQ_COUNT.  We could add a new preempt count field just for IST
(yuck).  We could try to hijack a different preempt count field
(NMI?).  But I kind of like the idea of just reinstating the original
patch of explicitly checking that we're on a safe stack in schedule
and __might_sleep, since that is the actual condition we care about.

--Andy


[PATCH v4 3/5] ocfs2: create/remove sysfile for online file check

2016-02-28 Thread Gang He
Create the online file check sysfs file when an ocfs2 volume is mounted,
and remove it when the volume is unmounted.

Signed-off-by: Gang He 
Reviewed-by: Mark Fasheh 
---
 fs/ocfs2/super.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c
index 2de4c8a..5ef88b8 100644
--- a/fs/ocfs2/super.c
+++ b/fs/ocfs2/super.c
@@ -74,6 +74,7 @@
 #include "suballoc.h"
 
 #include "buffer_head_io.h"
+#include "filecheck.h"
 
 static struct kmem_cache *ocfs2_inode_cachep;
 struct kmem_cache *ocfs2_dquot_cachep;
@@ -1204,6 +1205,9 @@ static int ocfs2_fill_super(struct super_block *sb, void *data, int silent)
/* Start this when the mount is almost sure of being successful */
ocfs2_orphan_scan_start(osb);
 
+   /* Create filecheck sysfile /sys/fs/ocfs2//filecheck */
+   ocfs2_filecheck_create_sysfs(sb);
+
return status;
 
 read_super_error:
@@ -1671,6 +1675,7 @@ static void ocfs2_put_super(struct super_block *sb)
 
ocfs2_sync_blockdev(sb);
ocfs2_dismount_volume(sb, 0);
+   ocfs2_filecheck_remove_sysfs(sb);
 }
 
 static int ocfs2_statfs(struct dentry *dentry, struct kstatfs *buf)
-- 
2.1.2



[PATCH v4 4/5] ocfs2: check/fix inode block for online file check

2016-02-28 Thread Gang He
Implement online checking or fixing of an inode block while
reading the inode block into memory.

Signed-off-by: Gang He 
---
 fs/ocfs2/inode.c   | 225 +++--
 fs/ocfs2/ocfs2_trace.h |   2 +
 2 files changed, 218 insertions(+), 9 deletions(-)

diff --git a/fs/ocfs2/inode.c b/fs/ocfs2/inode.c
index 8f87e05..6ce531e 100644
--- a/fs/ocfs2/inode.c
+++ b/fs/ocfs2/inode.c
@@ -53,6 +53,7 @@
 #include "xattr.h"
 #include "refcounttree.h"
 #include "ocfs2_trace.h"
+#include "filecheck.h"
 
 #include "buffer_head_io.h"
 
@@ -74,6 +75,14 @@ static int ocfs2_truncate_for_delete(struct ocfs2_super *osb,
struct inode *inode,
struct buffer_head *fe_bh);
 
+static int ocfs2_filecheck_read_inode_block_full(struct inode *inode,
+struct buffer_head **bh,
+int flags, int type);
+static int ocfs2_filecheck_validate_inode_block(struct super_block *sb,
+   struct buffer_head *bh);
+static int ocfs2_filecheck_repair_inode_block(struct super_block *sb,
+ struct buffer_head *bh);
+
 void ocfs2_set_inode_flags(struct inode *inode)
 {
unsigned int flags = OCFS2_I(inode)->ip_attr;
@@ -127,6 +136,7 @@ struct inode *ocfs2_ilookup(struct super_block *sb, u64 blkno)
 struct inode *ocfs2_iget(struct ocfs2_super *osb, u64 blkno, unsigned flags,
 int sysfile_type)
 {
+   int rc = 0;
struct inode *inode = NULL;
struct super_block *sb = osb->sb;
struct ocfs2_find_inode_args args;
@@ -161,12 +171,17 @@ struct inode *ocfs2_iget(struct ocfs2_super *osb, u64 blkno, unsigned flags,
}
trace_ocfs2_iget5_locked(inode->i_state);
if (inode->i_state & I_NEW) {
-   ocfs2_read_locked_inode(inode, &args);
+   rc = ocfs2_read_locked_inode(inode, &args);
unlock_new_inode(inode);
}
if (is_bad_inode(inode)) {
iput(inode);
-   inode = ERR_PTR(-ESTALE);
+   if ((flags & OCFS2_FI_FLAG_FILECHECK_CHK) ||
+   (flags & OCFS2_FI_FLAG_FILECHECK_FIX))
+   /* Return OCFS2_FILECHECK_ERR_XXX related errno */
+   inode = ERR_PTR(rc);
+   else
+   inode = ERR_PTR(-ESTALE);
goto bail;
}
 
@@ -409,7 +424,7 @@ static int ocfs2_read_locked_inode(struct inode *inode,
struct ocfs2_super *osb;
struct ocfs2_dinode *fe;
struct buffer_head *bh = NULL;
-   int status, can_lock;
+   int status, can_lock, lock_level = 0;
u32 generation = 0;
 
status = -EINVAL;
@@ -477,7 +492,7 @@ static int ocfs2_read_locked_inode(struct inode *inode,
mlog_errno(status);
return status;
}
-   status = ocfs2_inode_lock(inode, NULL, 0);
+   status = ocfs2_inode_lock(inode, NULL, lock_level);
if (status) {
make_bad_inode(inode);
mlog_errno(status);
@@ -494,16 +509,32 @@ static int ocfs2_read_locked_inode(struct inode *inode,
}
 
if (can_lock) {
-   status = ocfs2_read_inode_block_full(inode, &bh,
-OCFS2_BH_IGNORE_CACHE);
+   if (args->fi_flags & OCFS2_FI_FLAG_FILECHECK_CHK)
+   status = ocfs2_filecheck_read_inode_block_full(inode,
+   &bh, OCFS2_BH_IGNORE_CACHE, 0);
+   else if (args->fi_flags & OCFS2_FI_FLAG_FILECHECK_FIX)
+   status = ocfs2_filecheck_read_inode_block_full(inode,
+   &bh, OCFS2_BH_IGNORE_CACHE, 1);
+   else
+   status = ocfs2_read_inode_block_full(inode,
+   &bh, OCFS2_BH_IGNORE_CACHE);
} else {
status = ocfs2_read_blocks_sync(osb, args->fi_blkno, 1, &bh);
/*
 * If buffer is in jbd, then its checksum may not have been
 * computed as yet.
 */
-   if (!status && !buffer_jbd(bh))
-   status = ocfs2_validate_inode_block(osb->sb, bh);
+   if (!status && !buffer_jbd(bh)) {
+   if (args->fi_flags & OCFS2_FI_FLAG_FILECHECK_CHK)
+   status = ocfs2_filecheck_validate_inode_block(
+   osb->sb, bh);
+   else if (args->fi_flags & OCFS2_FI_FLAG_FILECHECK_FIX)
+   status = ocfs2_filecheck_repair_inode_block(
+

[PATCH v4 1/5] ocfs2: export ocfs2_kset for online file check

2016-02-28 Thread Gang He
Export the ocfs2_kset object from the ocfs2_stackglue kernel module
so that the online file check code can create its sysfs files
under ocfs2_kset.

Signed-off-by: Gang He 
Reviewed-by: Mark Fasheh 
---
 fs/ocfs2/stackglue.c | 3 ++-
 fs/ocfs2/stackglue.h | 2 ++
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/ocfs2/stackglue.c b/fs/ocfs2/stackglue.c
index 5d965e8..13219ed 100644
--- a/fs/ocfs2/stackglue.c
+++ b/fs/ocfs2/stackglue.c
@@ -629,7 +629,8 @@ static struct attribute_group ocfs2_attr_group = {
.attrs = ocfs2_attrs,
 };
 
-static struct kset *ocfs2_kset;
+struct kset *ocfs2_kset;
+EXPORT_SYMBOL_GPL(ocfs2_kset);
 
 static void ocfs2_sysfs_exit(void)
 {
diff --git a/fs/ocfs2/stackglue.h b/fs/ocfs2/stackglue.h
index 66334a3..f2dce10 100644
--- a/fs/ocfs2/stackglue.h
+++ b/fs/ocfs2/stackglue.h
@@ -298,4 +298,6 @@ void ocfs2_stack_glue_set_max_proto_version(struct ocfs2_protocol_version *max_p
 int ocfs2_stack_glue_register(struct ocfs2_stack_plugin *plugin);
 void ocfs2_stack_glue_unregister(struct ocfs2_stack_plugin *plugin);
 
+extern struct kset *ocfs2_kset;
+
 #endif  /* STACKGLUE_H */
-- 
2.1.2



[PATCH v4 5/5] ocfs2: add feature document for online file check

2016-02-28 Thread Gang He
This document describes the OCFS2 online file check feature.
OCFS2 is often used in high-availability systems. However, OCFS2 usually
converts the filesystem to read-only when it encounters an error. This may not
be necessary, since turning the filesystem read-only would affect other running
processes as well, decreasing availability.
Therefore, a mount option (errors=continue) was introduced, which returns the
-EIO errno to the calling process and terminates further processing so that the
filesystem is not corrupted further. The filesystem is not converted to
read-only, and the problematic file's inode number is reported in the kernel
log. The user can then try to check/fix this file via the online filecheck feature.

Signed-off-by: Gang He 
Reviewed-by: Mark Fasheh 
---
 .../filesystems/ocfs2-online-filecheck.txt | 94 ++
 1 file changed, 94 insertions(+)
 create mode 100644 Documentation/filesystems/ocfs2-online-filecheck.txt

diff --git a/Documentation/filesystems/ocfs2-online-filecheck.txt b/Documentation/filesystems/ocfs2-online-filecheck.txt
new file mode 100644
index 000..1ab0786
--- /dev/null
+++ b/Documentation/filesystems/ocfs2-online-filecheck.txt
@@ -0,0 +1,94 @@
+   OCFS2 online file check
+   -----------------------
+
+This document describes the OCFS2 online file check feature.
+
+Introduction
+============
+OCFS2 is often used in high-availability systems. However, OCFS2 usually
+converts the filesystem to read-only when it encounters an error. This may not
+be necessary, since turning the filesystem read-only would affect other running
+processes as well, decreasing availability.
+Therefore, a mount option (errors=continue) was introduced, which returns the
+-EIO errno to the calling process and terminates further processing so that the
+filesystem is not corrupted further. The filesystem is not converted to
+read-only, and the problematic file's inode number is reported in the kernel
+log. The user can then try to check/fix this file via the online filecheck feature.
+
+Scope
+=====
+This effort is to check/fix small issues which may hinder day-to-day operations
+of a cluster filesystem by turning the filesystem read-only. The scope of
+checking/fixing is at the file level, initially for regular files and eventually
+to all files (including system files) of the filesystem.
+
+If a directory-to-file link is incorrect, the directory inode is
+reported as erroneous.
+
+This feature is not suited for extravagant checks which involve dependencies on
+other components of the filesystem, such as, but not limited to, checking if the
+bits for file blocks in the allocation have been set. In case of such an error,
+the offline fsck is recommended.
+
+Finally, such an operation/feature should not be automated lest the filesystem
+end up with more damage than before the repair attempt. So, this has to
+be performed using user interaction and consent.
+
+User interface
+==============
+When there are errors in the OCFS2 filesystem, they are usually accompanied
+by the inode number which caused the error. This inode number would be the
+input to check/fix the file.
+
+There is a sysfs directory for each mounted OCFS2 file system:
+
+  /sys/fs/ocfs2//filecheck
+
+Here,  indicates the name of the OCFS2 volume device which has already been
+mounted. The files in this directory accept inode numbers, which tell the
+kernel which file (inode number) should be checked or fixed. Currently, three
+operations are supported: checking an inode, fixing an inode, and setting the
+size of the result record history.
+
+1. If you want to know exactly what error happened to  before fixing it, run
+
+  # echo "" > /sys/fs/ocfs2//filecheck/check
+  # cat /sys/fs/ocfs2//filecheck/check
+
+The output looks like this:
+  INO          DONE         ERROR
+  39502        1            GENERATION
+
+INO lists the inode numbers.
+DONE indicates whether the operation has finished.
+ERROR says what kind of error was found. For the detailed error numbers,
+please refer to the file linux/fs/ocfs2/filecheck.h.
+
+2. If you decide to fix this inode, run
+
+  # echo "" > /sys/fs/ocfs2//filecheck/fix
+  # cat /sys/fs/ocfs2//filecheck/fix
+
+The output looks like this:
+  INO          DONE         ERROR
+  39502        1            SUCCESS
+
+This time, the ERROR column indicates whether the fix succeeded.
+
+3. The record cache stores the history of check/fix results. Its
+default size is 10, and it can be adjusted within the range 10 to 100. You can
+adjust the size like this:
+
+  # echo "" > /sys/fs/ocfs2//filecheck/set
+
+Fixing stuff
+============
+On receiving the inode, the filesystem would read the inode and the
+file metadata. In case of errors, the filesystem would fix the errors
+and report the problems it fixed in the kernel log. As a precautionary measure,
+the inode must first be checked for errors before performing a final 

[PATCH v4 2/5] ocfs2: sysfile interfaces for online file check

2016-02-28 Thread Gang He
Implement the online file check sysfs interfaces: for example, how
the related sysfs file is created according to the device name, and how
file check requests are displayed and handled through that file.

Signed-off-by: Gang He 
---
 fs/ocfs2/Makefile|   3 +-
 fs/ocfs2/filecheck.c | 606 +++
 fs/ocfs2/filecheck.h |  49 +
 fs/ocfs2/inode.h |   3 +
 4 files changed, 660 insertions(+), 1 deletion(-)
 create mode 100644 fs/ocfs2/filecheck.c
 create mode 100644 fs/ocfs2/filecheck.h

diff --git a/fs/ocfs2/Makefile b/fs/ocfs2/Makefile
index ce210d4..e27e652 100644
--- a/fs/ocfs2/Makefile
+++ b/fs/ocfs2/Makefile
@@ -41,7 +41,8 @@ ocfs2-objs := \
quota_local.o   \
quota_global.o  \
xattr.o \
-   acl.o
+   acl.o   \
+   filecheck.o
 
 ocfs2_stackglue-objs := stackglue.o
 ocfs2_stack_o2cb-objs := stack_o2cb.o
diff --git a/fs/ocfs2/filecheck.c b/fs/ocfs2/filecheck.c
new file mode 100644
index 000..2cabbcf
--- /dev/null
+++ b/fs/ocfs2/filecheck.c
@@ -0,0 +1,606 @@
+/* -*- mode: c; c-basic-offset: 8; -*-
+ * vim: noexpandtab sw=8 ts=8 sts=0:
+ *
+ * filecheck.c
+ *
+ * Code which implements online file check.
+ *
+ * Copyright (C) 2016 SuSE.  All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public
+ * License as published by the Free Software Foundation, version 2.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "ocfs2.h"
+#include "ocfs2_fs.h"
+#include "stackglue.h"
+#include "inode.h"
+
+#include "filecheck.h"
+
+
+/* File check error strings,
+ * must correspond with error number in header file.
+ */
+static const char * const ocfs2_filecheck_errs[] = {
+   "SUCCESS",
+   "FAILED",
+   "INPROGRESS",
+   "READONLY",
+   "INJBD",
+   "INVALIDINO",
+   "BLOCKECC",
+   "BLOCKNO",
+   "VALIDFLAG",
+   "GENERATION",
+   "UNSUPPORTED"
+};
+
+static DEFINE_SPINLOCK(ocfs2_filecheck_sysfs_lock);
+static LIST_HEAD(ocfs2_filecheck_sysfs_list);
+
+struct ocfs2_filecheck {
+   struct list_head fc_head;   /* File check entry list head */
+   spinlock_t fc_lock;
+   unsigned int fc_max;    /* Maximum number of entries in list */
+   unsigned int fc_size;   /* Current entry count in list */
+   unsigned int fc_done;   /* Finished entry count in list */
+};
+
+struct ocfs2_filecheck_sysfs_entry {   /* sysfs entry per mounting */
+   struct list_head fs_list;
+   atomic_t fs_count;
+   struct super_block *fs_sb;
+   struct kset *fs_devicekset;
+   struct kset *fs_fcheckkset;
+   struct ocfs2_filecheck *fs_fcheck;
+};
+
+#define OCFS2_FILECHECK_MAXSIZE 100
+#define OCFS2_FILECHECK_MINSIZE 10
+
+/* File check operation type */
+enum {
+   OCFS2_FILECHECK_TYPE_CHK = 0,   /* Check a file(inode) */
+   OCFS2_FILECHECK_TYPE_FIX,   /* Fix a file(inode) */
+   OCFS2_FILECHECK_TYPE_SET = 100  /* Set entry list maximum size */
+};
+
+struct ocfs2_filecheck_entry {
+   struct list_head fe_list;
+   unsigned long fe_ino;
+   unsigned int fe_type;
+   unsigned int fe_done:1;
+   unsigned int fe_status:31;
+};
+
+struct ocfs2_filecheck_args {
+   unsigned int fa_type;
+   union {
+   unsigned long fa_ino;
+   unsigned int fa_len;
+   };
+};
+
+static const char *
+ocfs2_filecheck_error(int errno)
+{
+   if (!errno)
+   return ocfs2_filecheck_errs[errno];
+
+   BUG_ON(errno < OCFS2_FILECHECK_ERR_START ||
+  errno > OCFS2_FILECHECK_ERR_END);
+   return ocfs2_filecheck_errs[errno - OCFS2_FILECHECK_ERR_START + 1];
+}
+
+static ssize_t ocfs2_filecheck_show(struct kobject *kobj,
+   struct kobj_attribute *attr,
+   char *buf);
+static ssize_t ocfs2_filecheck_store(struct kobject *kobj,
+struct kobj_attribute *attr,
+const char *buf, size_t count);
+static struct kobj_attribute ocfs2_attr_filecheck_chk =
+   __ATTR(check, S_IRUSR | S_IWUSR,
+   ocfs2_filecheck_show,
+   ocfs2_filecheck_store);
+static struct kobj_attribute ocfs2_attr_filecheck_fix =
+   __ATTR(fix, S_IRUSR | S_IWUSR,
+   ocfs2_filecheck_show,
+   ocfs2_filecheck_store);
+static struct 

[PATCH v4 0/5] Add online file check feature

2016-02-28 Thread Gang He
When there are errors in the ocfs2 filesystem,
they are usually accompanied by the inode number which caused the error.
This inode number would be the input to fixing the file.
One of these options could be considered:
A file in the sys filesystem which would accept inode numbers.
This could be used to communicate back what has to be fixed or is fixed.
You could write:
$# echo "<inode>" > /sys/fs/ocfs2/devname/filecheck/check
or
$# echo "<inode>" > /sys/fs/ocfs2/devname/filecheck/fix

Compared with the third version, I added a buffer_jbd() check when fixing the
inode block/writing the dirty buffer back, changed the unsigned short members of
struct ocfs2_filecheck_entry to unsigned int, and added a feature document to
this patch set.
Compared with the second version, I redesigned the filecheck sysfs interfaces:
there are three sysfs files (check, fix and set) under the filecheck directory
(see above), and each file accepts only one argument. Second, I adjusted some
code in the ocfs2_filecheck_repair_inode_block() function according to upstream
feedback: we cannot just add the VALID_FL flag back as an inode block fix, so we
will not fix this field corruption until we have a complete solution.
Compared with the first version, I use strncasecmp() instead of two strncmp()
calls. Second, I updated the vendor named in the source files.

Gang He (5):
  ocfs2: export ocfs2_kset for online file check
  ocfs2: sysfile interfaces for online file check
  ocfs2: create/remove sysfile for online file check
  ocfs2: check/fix inode block for online file check
  ocfs2: add feature document for online file check

 .../filesystems/ocfs2-online-filecheck.txt |  94 
 fs/ocfs2/Makefile  |   3 +-
 fs/ocfs2/filecheck.c   | 606 +
 fs/ocfs2/filecheck.h   |  49 ++
 fs/ocfs2/inode.c   | 225 +++-
 fs/ocfs2/inode.h   |   3 +
 fs/ocfs2/ocfs2_trace.h |   2 +
 fs/ocfs2/stackglue.c   |   3 +-
 fs/ocfs2/stackglue.h   |   2 +
 fs/ocfs2/super.c   |   5 +
 10 files changed, 981 insertions(+), 11 deletions(-)
 create mode 100644 Documentation/filesystems/ocfs2-online-filecheck.txt
 create mode 100644 fs/ocfs2/filecheck.c
 create mode 100644 fs/ocfs2/filecheck.h

-- 
2.1.2



linux-next: manual merge of the kvm-arm tree with the arm64 tree

2016-02-28 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the kvm-arm tree got a conflict in:

  arch/arm64/include/asm/cpufeature.h

between commit:

  104a0c02e8b1 ("arm64: Add workaround for Cavium erratum 27456")

from the arm64 tree and commit:

  d0be74f771d5 ("arm64: Add ARM64_HAS_VIRT_HOST_EXTN feature")

from the kvm-arm tree.

I fixed it up (see below) and can carry the fix as necessary (no action
is required).

-- 
Cheers,
Stephen Rothwell

diff --cc arch/arm64/include/asm/cpufeature.h
index 1497163213ed,a5c769b1c65b..
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@@ -30,12 -30,12 +30,13 @@@
  #define ARM64_HAS_LSE_ATOMICS 5
  #define ARM64_WORKAROUND_CAVIUM_23154 6
  #define ARM64_WORKAROUND_834220   7
 -/* #define ARM64_HAS_NO_HW_PREFETCH   8 */
 -/* #define ARM64_HAS_UAO  9 */
 -/* #define ARM64_ALT_PAN_NOT_UAO  10 */
 +#define ARM64_HAS_NO_HW_PREFETCH  8
 +#define ARM64_HAS_UAO 9
 +#define ARM64_ALT_PAN_NOT_UAO 10
+ #define ARM64_HAS_VIRT_HOST_EXTN  11
 +#define ARM64_WORKAROUND_CAVIUM_27456 12
  
 -#define ARM64_NCAPS   12
 +#define ARM64_NCAPS   13
  
  #ifndef __ASSEMBLY__
  


Re: [PATCH V3 3/3] vhost_net: basic polling support

2016-02-28 Thread Jason Wang


On 02/29/2016 05:56 AM, Christian Borntraeger wrote:
> On 02/26/2016 09:42 AM, Jason Wang wrote:
>> > This patch tries to poll for newly added tx buffers or the socket receive
>> > queue for a while at the end of tx/rx processing. The maximum time
>> > spent on polling is specified through a new kind of vring ioctl.
>> > 
>> > Signed-off-by: Jason Wang 
>> > ---
>> >  drivers/vhost/net.c        | 79 +++---
>> >  drivers/vhost/vhost.c      | 14 
>> >  drivers/vhost/vhost.h      |  1 +
>> >  include/uapi/linux/vhost.h |  6 
>> >  4 files changed, 95 insertions(+), 5 deletions(-)
>> > 
>> > diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
>> > index 9eda69e..c91af93 100644
>> > --- a/drivers/vhost/net.c
>> > +++ b/drivers/vhost/net.c
>> > @@ -287,6 +287,44 @@ static void vhost_zerocopy_callback(struct ubuf_info *ubuf, bool success)
>> >rcu_read_unlock_bh();
>> >  }
>> > 
>> > +static inline unsigned long busy_clock(void)
>> > +{
>> > +  return local_clock() >> 10;
>> > +}
>> > +
>> > +static bool vhost_can_busy_poll(struct vhost_dev *dev,
>> > +  unsigned long endtime)
>> > +{
>> > +  return likely(!need_resched()) &&
>> > + likely(!time_after(busy_clock(), endtime)) &&
>> > + likely(!signal_pending(current)) &&
>> > + !vhost_has_work(dev) &&
>> > + single_task_running();
>> > +}
>> > +
>> > +static int vhost_net_tx_get_vq_desc(struct vhost_net *net,
>> > +  struct vhost_virtqueue *vq,
>> > +  struct iovec iov[], unsigned int iov_size,
>> > +  unsigned int *out_num, unsigned int *in_num)
>> > +{
>> > +  unsigned long uninitialized_var(endtime);
>> > +  int r = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
>> > +  out_num, in_num, NULL, NULL);
>> > +
>> > +  if (r == vq->num && vq->busyloop_timeout) {
>> > +  preempt_disable();
>> > +  endtime = busy_clock() + vq->busyloop_timeout;
>> > +  while (vhost_can_busy_poll(vq->dev, endtime) &&
>> > + vhost_vq_avail_empty(vq->dev, vq))
>> > +  cpu_relax();
> Can you use cpu_relax_lowlatency() (which should be the same as cpu_relax() for almost
> everybody but s390)? cpu_relax() (without low latency) might give up the time slice
> when running under another hypervisor (like LPAR on s390), which might not be what
> we want here.

OK, will do this in the next version.


Re: [PATCH V3 3/3] vhost_net: basic polling support

2016-02-28 Thread Jason Wang


On 02/28/2016 10:09 PM, Michael S. Tsirkin wrote:
> On Fri, Feb 26, 2016 at 04:42:44PM +0800, Jason Wang wrote:
>> > This patch tries to poll for newly added tx buffers or the socket receive
>> > queue for a while at the end of tx/rx processing. The maximum time
>> > spent on polling is specified through a new kind of vring ioctl.
>> > 
>> > Signed-off-by: Jason Wang 
> Looks good overall, but I still see one problem.
>
>> > ---
>> >  drivers/vhost/net.c        | 79 +++---
>> >  drivers/vhost/vhost.c      | 14 
>> >  drivers/vhost/vhost.h      |  1 +
>> >  include/uapi/linux/vhost.h |  6 
>> >  4 files changed, 95 insertions(+), 5 deletions(-)
>> > 
>> > diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
>> > index 9eda69e..c91af93 100644
>> > --- a/drivers/vhost/net.c
>> > +++ b/drivers/vhost/net.c
>> > @@ -287,6 +287,44 @@ static void vhost_zerocopy_callback(struct ubuf_info *ubuf, bool success)
>> >rcu_read_unlock_bh();
>> >  }
>> >  
>> > +static inline unsigned long busy_clock(void)
>> > +{
>> > +  return local_clock() >> 10;
>> > +}
>> > +
>> > +static bool vhost_can_busy_poll(struct vhost_dev *dev,
>> > +  unsigned long endtime)
>> > +{
>> > +  return likely(!need_resched()) &&
>> > + likely(!time_after(busy_clock(), endtime)) &&
>> > + likely(!signal_pending(current)) &&
>> > + !vhost_has_work(dev) &&
>> > + single_task_running();
> So I find it quite unfortunate that this still uses single_task_running.
> This means that for example a SCHED_IDLE task will prevent polling from
> becoming active, and that seems like a bug, or at least
> an undocumented feature :).

Yes, it may need more thought.

>
> Unfortunately this logic affects the behaviour as observed
> by userspace, so we can't merge it like this and tune
> afterwards, since otherwise management tools will start
> depending on this logic.
>
>

How about removing single_task_running() here first and optimizing on top of that?
We probably need something like this to handle overcommitment.



[lkp] [n_tty] dd9a6fee68: INFO: possible circular locking dependency detected ]

2016-02-28 Thread kernel test robot
FYI, we noticed the below changes on

https://github.com/0day-ci/linux Brian-Bloniarz/Re-n_tty-Check-the-other-end-of-pty-pair-before-returning-EAGAIN-on-a-read/20160229-070452
commit dd9a6fee6830f16f602b1aa2e85d6307acd04945 ("n_tty: Check the other end of pty pair before returning EAGAIN on a read()")


+----------------------------------------------------+----------+------------+
|                                                    | v4.5-rc6 | dd9a6fee68 |
+----------------------------------------------------+----------+------------+
| boot_successes                                     | 128      | 2          |
| boot_failures                                      | 9        | 6          |
| invoked_oom-killer:gfp_mask=0x                     | 9        | 1          |
| Mem-Info                                           | 9        | 2          |
| Out_of_memory:Kill_process                         | 9        | 1          |
| backtrace:vfs_write                                | 1        |            |
| backtrace:SyS_write                                | 1        |            |
| backtrace:do_execveat_common                       | 1        |            |
| backtrace:compat_SyS_execve                        | 1        |            |
| backtrace:vfs_read                                 | 1        | 4          |
| backtrace:SyS_read                                 | 1        | 4          |
| backtrace:compat_process_vm_rw                     | 1        |            |
| backtrace:compat_SyS_process_vm_readv              | 1        |            |
| backtrace:_do_fork                                 | 1        |            |
| backtrace:SyS_clone                                | 1        |            |
| page_allocation_failure:order:#,mode               | 0        | 1          |
| warn_alloc_failed+0x                               | 0        | 1          |
| backtrace:kswapd                                   | 0        | 1          |
| INFO:possible_circular_locking_dependency_detected | 0        | 4          |
| backtrace:flush_to_ldisc                           | 0        | 4          |
+----------------------------------------------------+----------+------------+



[   17.523349] mount (2393) used greatest stack depth: 12392 bytes left
[   17.684314] 
[   17.684972] ==
[   17.686059] [ INFO: possible circular locking dependency detected ]
[   17.687174] 4.5.0-rc6-1-gdd9a6fe #64 Not tainted
[   17.688127] ---
[   17.689216] bootlogd/2434 is trying to acquire lock:
[   17.690167]  ((>work)){+.+...}, at: [] flush_work+0x5/0x23d
[   17.692006] 
[   17.692006] but task is already holding lock:
[   17.693433]  (>termios_rwsem){..}, at: [] n_tty_read+0xd0/0x882
[   17.695346] 
[   17.695346] which lock already depends on the new lock.
[   17.695346] 
[   17.697370] 
[   17.697370] the existing dependency chain (in reverse order) is:
[   17.698961] 
-> #2 (>termios_rwsem){..}:
[   17.700507][] lock_acquire+0x147/0x1e2
[   17.701621][] down_read+0x48/0x90
[   17.702696][] n_tty_receive_buf_common+0x46/0x8c0
[   17.703900][] n_tty_receive_buf2+0x14/0x16
[   17.705046][] flush_to_ldisc+0xcb/0x125
[   17.706167][] process_one_work+0x2b8/0x5b2
[   17.707339][] worker_thread+0x28b/0x37d
[   17.708454][] kthread+0xfb/0x103
[   17.709511][] ret_from_fork+0x3f/0x70
[   17.710614] 
-> #1 (>lock){+.+...}:
[   17.712070][] lock_acquire+0x147/0x1e2
[   17.713185][] mutex_lock_nested+0x79/0x35f
[   17.714328][] flush_to_ldisc+0x4b/0x125
[   17.715443][] process_one_work+0x2b8/0x5b2
[   17.716587][] worker_thread+0x28b/0x37d
[   17.717700][] kthread+0xfb/0x103
[   17.718752][] ret_from_fork+0x3f/0x70
[   17.719855] 
-> #0 ((>work)){+.+...}:
[   17.721333][] __lock_acquire+0x12dd/0x1932
[   17.722489][] lock_acquire+0x147/0x1e2
[   17.723598][] flush_work+0x3a/0x23d
[   17.724683][] n_tty_read+0x308/0x882
[   17.725771][] tty_read+0x8b/0xcd
[   17.726830][] __vfs_read+0x26/0xb9
[   17.727910][] vfs_read+0xa0/0x12e
[   17.728974][] SyS_read+0x51/0x92
[   17.730032][] entry_SYSCALL_64_fastpath+0x12/0x72
[   17.731237] 
[   17.731237] other info that might help us debug this:
[   17.731237] 
[   17.733255] Chain exists of:
  (>work) --> >lock --> >termios_rwsem

[   17.735644]  Possible unsafe locking scenario:
[   17.735644] 
[   17.737064]        CPU0                    CPU1
[   17.737969]        ----                    ----
[   17.738873]   lock(>termios_rwsem);
[   17.739832]                                lock(>lock);
[   17.740966]                                lock(>termios_rwsem);
[   17.742181]   lock((>work));
[   17.743081] 
[   17.743081]  *** DEADLOCK ***
[   17.743081] 
[   17.744901] 3 locks held by 

[lkp] [n_tty] dd9a6fee68: INFO: possible circular locking dependency detected ]

2016-02-28 Thread kernel test robot
FYI, we noticed the below changes on

https://github.com/0day-ci/linux Brian-Bloniarz/Re-n_tty-Check-the-other-end-of-pty-pair-before-returning-EAGAIN-on-a-read/20160229-070452
commit dd9a6fee6830f16f602b1aa2e85d6307acd04945 ("n_tty: Check the other end of pty pair before returning EAGAIN on a read()")


+----------------------------------------------------+----------+------------+
|                                                    | v4.5-rc6 | dd9a6fee68 |
+----------------------------------------------------+----------+------------+
| boot_successes                                     | 128      | 2          |
| boot_failures                                      | 9        | 6          |
| invoked_oom-killer:gfp_mask=0x                     | 9        | 1          |
| Mem-Info                                           | 9        | 2          |
| Out_of_memory:Kill_process                         | 9        | 1          |
| backtrace:vfs_write                                | 1        |            |
| backtrace:SyS_write                                | 1        |            |
| backtrace:do_execveat_common                       | 1        |            |
| backtrace:compat_SyS_execve                        | 1        |            |
| backtrace:vfs_read                                 | 1        | 4          |
| backtrace:SyS_read                                 | 1        | 4          |
| backtrace:compat_process_vm_rw                     | 1        |            |
| backtrace:compat_SyS_process_vm_readv              | 1        |            |
| backtrace:_do_fork                                 | 1        |            |
| backtrace:SyS_clone                                | 1        |            |
| page_allocation_failure:order:#,mode               | 0        | 1          |
| warn_alloc_failed+0x                               | 0        | 1          |
| backtrace:kswapd                                   | 0        | 1          |
| INFO:possible_circular_locking_dependency_detected | 0        | 4          |
| backtrace:flush_to_ldisc                           | 0        | 4          |
+----------------------------------------------------+----------+------------+



[   17.523349] mount (2393) used greatest stack depth: 12392 bytes left
[   17.684314] 
[   17.684972] ==
[   17.686059] [ INFO: possible circular locking dependency detected ]
[   17.687174] 4.5.0-rc6-1-gdd9a6fe #64 Not tainted
[   17.688127] ---
[   17.689216] bootlogd/2434 is trying to acquire lock:
[   17.690167]  ((>work)){+.+...}, at: [] flush_work+0x5/0x23d
[   17.692006] 
[   17.692006] but task is already holding lock:
[   17.693433]  (>termios_rwsem){..}, at: [] n_tty_read+0xd0/0x882
[   17.695346] 
[   17.695346] which lock already depends on the new lock.
[   17.695346] 
[   17.697370] 
[   17.697370] the existing dependency chain (in reverse order) is:
[   17.698961] 
-> #2 (>termios_rwsem){..}:
[   17.700507]        [] lock_acquire+0x147/0x1e2
[   17.701621]        [] down_read+0x48/0x90
[   17.702696]        [] n_tty_receive_buf_common+0x46/0x8c0
[   17.703900]        [] n_tty_receive_buf2+0x14/0x16
[   17.705046]        [] flush_to_ldisc+0xcb/0x125
[   17.706167]        [] process_one_work+0x2b8/0x5b2
[   17.707339]        [] worker_thread+0x28b/0x37d
[   17.708454]        [] kthread+0xfb/0x103
[   17.709511]        [] ret_from_fork+0x3f/0x70
[   17.710614] 
-> #1 (>lock){+.+...}:
[   17.712070]        [] lock_acquire+0x147/0x1e2
[   17.713185]        [] mutex_lock_nested+0x79/0x35f
[   17.714328]        [] flush_to_ldisc+0x4b/0x125
[   17.715443]        [] process_one_work+0x2b8/0x5b2
[   17.716587]        [] worker_thread+0x28b/0x37d
[   17.717700]        [] kthread+0xfb/0x103
[   17.718752]        [] ret_from_fork+0x3f/0x70
[   17.719855] 
-> #0 ((>work)){+.+...}:
[   17.721333]        [] __lock_acquire+0x12dd/0x1932
[   17.722489]        [] lock_acquire+0x147/0x1e2
[   17.723598]        [] flush_work+0x3a/0x23d
[   17.724683]        [] n_tty_read+0x308/0x882
[   17.725771]        [] tty_read+0x8b/0xcd
[   17.726830]        [] __vfs_read+0x26/0xb9
[   17.727910]        [] vfs_read+0xa0/0x12e
[   17.728974]        [] SyS_read+0x51/0x92
[   17.730032]        [] entry_SYSCALL_64_fastpath+0x12/0x72
[   17.731237] 
[   17.731237] other info that might help us debug this:
[   17.731237] 
[   17.733255] Chain exists of:
  (>work) --> >lock --> >termios_rwsem

[   17.735644]  Possible unsafe locking scenario:
[   17.735644] 
[   17.737064]        CPU0                    CPU1
[   17.737969]        ----                    ----
[   17.738873]   lock(>termios_rwsem);
[   17.739832]                                lock(>lock);
[   17.740966]                                lock(>termios_rwsem);
[   17.742181]   lock((>work));
[   17.743081] 
[   17.743081]  *** DEADLOCK ***
[   17.743081] 
[   17.744901] 3 locks held by 