Re: [PATCH v4 5/5] staging/android: add flags member to sync ioctl structs
On 27 February 2016 at 15:27, Gustavo Padovan wrote:
> Hi Emil,
>
> 2016-02-27 Emil Velikov:
>
>> Hi Gustavo,
>>
>> On 26 February 2016 at 18:31, Gustavo Padovan wrote:
>> > From: Gustavo Padovan
>> >
>> > Play safe and add flags member to all structs. So we don't need to
>> > break API or create new IOCTL in the future if new features that requires
>> > flags arises.
>> >
>> > v2: check if flags are valid (zero, in this case)
>> >
>> > Signed-off-by: Gustavo Padovan
>> > ---
>> > drivers/staging/android/sync.c | 7 ++-
>> > drivers/staging/android/uapi/sync.h | 6 ++
>> > 2 files changed, 12 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/drivers/staging/android/sync.c
>> > b/drivers/staging/android/sync.c
>> > index 837cff5..54fd5ab 100644
>> > --- a/drivers/staging/android/sync.c
>> > +++ b/drivers/staging/android/sync.c
>> > @@ -445,6 +445,11 @@ static long sync_file_ioctl_merge(struct sync_file
>> > *sync_file,
>> > 		goto err_put_fd;
>> > 	}
>> >
>> > +	if (data.flags) {
>> > +		err = -EFAULT;
>> -EINVAL ?
>>
>> > +		goto err_put_fd;
>> > +	}
>> > +
>> > 	fence2 = sync_file_fdget(data.fd2);
>> > 	if (!fence2) {
>> > 		err = -ENOENT;
>> > @@ -511,7 +516,7 @@ static long sync_file_ioctl_fence_info(struct
>> > sync_file *sync_file,
>> > 	if (copy_from_user(, (void __user *)arg, sizeof(*info)))
>> > 		return -EFAULT;
>> >
>> > -	if (in.status || strcmp(in.name, "\0"))
>> > +	if (in.status || in.flags || strcmp(in.name, "\0"))
>> > 		return -EFAULT;
>> -EINVAL ?
>>
>> >
>> > 	if (in.num_fences && !in.sync_fence_info)
>> > diff --git a/drivers/staging/android/uapi/sync.h
>> > b/drivers/staging/android/uapi/sync.h
>> > index 9aad623..f56a6c2 100644
>> > --- a/drivers/staging/android/uapi/sync.h
>> > +++ b/drivers/staging/android/uapi/sync.h
>> > @@ -19,11 +19,13 @@
>> >  * @fd2: file descriptor of second fence
>> >  * @name: name of new fence
>> >  * @fence: returns the fd of the new fence to userspace
>> > + * @flags: merge_data flags
>> >  */
>> > struct sync_merge_data {
>> > 	__s32 fd2;
>> > 	char name[32];
>> > 	__s32 fence;
>> > +	__u32 flags;
>> The overall size of the struct is not multiple of 64bit, so things
>> will end up badly if we decide to extend it in the future. Even if
>> there's a small chance that update will be needed, we might as well
>> pad it now (and check the padding for zero, returning -EINVAL).
>
> I think name could be the first field here.
>
Up-to you really. I'm afraid that it doesn't resolve the issue :-(

As a test add a u64 value at the end of the struct and check the
output of pahole for 32 and 64 bit build.

>>
>> > };
>> >
>> > /**
>> > @@ -31,12 +33,14 @@ struct sync_merge_data {
>> >  * @obj_name: name of parent sync_timeline
>> >  * @driver_name: name of driver implementing the parent
>> >  * @status: status of the fence 0:active 1:signaled <0:error
>> > + * @flags: fence_info flags
>> >  * @timestamp_ns: timestamp of status change in nanoseconds
>> >  */
>> > struct sync_fence_info {
>> > 	char obj_name[32];
>> > 	char driver_name[32];
>> > 	__s32 status;
>> > +	__u32 flags;
>> > 	__u64 timestamp_ns;
>> Should we be doing some form of validation in sync_fill_fence_info()
>> of 'flags' ?
>
> Do you think it is necessary? The kernel allocates a zero'ed buffer to
> fill sync_fence_info array.
>
Good point. Missed out the z in kzalloc :-)

-Emil
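
The padding and -EINVAL remarks above can be illustrated with a small
userspace sketch. It is not part of the patch: the struct name
merge_data_padded, the pad member and the helper merge_data_validate()
are hypothetical, and the <stdint.h> types stand in for the kernel's
__s32/__u32. With the extra 32-bit member the struct is 48 bytes, a
multiple of 8, so a 64-bit member appended later would sit at the same
offset on 32-bit and 64-bit builds.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical padded variant of the merge struct discussed above. */
struct merge_data_padded {
	int32_t  fd2;      /* file descriptor of second fence */
	char     name[32]; /* name of new fence */
	int32_t  fence;    /* fd of the new fence, returned to userspace */
	uint32_t flags;    /* must be zero for now */
	uint32_t pad;      /* assumed explicit padding, must be zero */
};

/* 4 + 32 + 4 + 4 + 4 = 48 bytes: a multiple of 64 bits. */
static_assert(sizeof(struct merge_data_padded) % 8 == 0,
	      "struct size must be a multiple of 64 bits");

/* Reject unknown flags or non-zero padding with -EINVAL, not -EFAULT. */
static int merge_data_validate(const struct merge_data_padded *d)
{
	if (d->flags || d->pad)
		return -EINVAL;
	return 0;
}
```

Checking the padding for zero now is what keeps the option open to give
those bits a meaning later without breaking existing callers.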
Re: [PATCH v2] signals, pkeys: make si_pkey 32 bits
* Stephen Rothwell wrote:

> In order to prevent a change of alignment of the _sifields union in the
> siginfo structure on (some) 32 bit platforms and an ABI breakage, we
> change the type of _pkey to unsigned int. If more bits are needed in
> the future, a second unsigned int could be added.
>
> Fixes: cd0ea35ff551 ("signals, pkeys: Notify userspace about protection key
> faults")
> Acked-by: Dave Hansen
> Signed-off-by: Stephen Rothwell
> ---
> arch/ia64/include/uapi/asm/siginfo.h | 2 +-
> arch/mips/include/uapi/asm/siginfo.h | 2 +-
> include/uapi/asm-generic/siginfo.h | 2 +-
> 3 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/arch/ia64/include/uapi/asm/siginfo.h
> b/arch/ia64/include/uapi/asm/siginfo.h
> index 0151cfab929d..19e7db0c9453 100644
> --- a/arch/ia64/include/uapi/asm/siginfo.h
> +++ b/arch/ia64/include/uapi/asm/siginfo.h
> @@ -70,7 +70,7 @@ typedef struct siginfo {
> 				void __user *_upper;
> 			} _addr_bnd;
> 			/* used when si_code=SEGV_PKUERR */
> -			u64 _pkey;
> +			unsigned int _pkey;
> 		};
> 	} _sigfault;
>
> diff --git a/arch/mips/include/uapi/asm/siginfo.h
> b/arch/mips/include/uapi/asm/siginfo.h
> index 6f4edf0d794c..3cc14f4a5936 100644
> --- a/arch/mips/include/uapi/asm/siginfo.h
> +++ b/arch/mips/include/uapi/asm/siginfo.h
> @@ -93,7 +93,7 @@ typedef struct siginfo {
> 				void __user *_upper;
> 			} _addr_bnd;
> 			/* used when si_code=SEGV_PKUERR */
> -			u64 _pkey;
> +			unsigned int _pkey;
> 		};
> 	} _sigfault;
>
> diff --git a/include/uapi/asm-generic/siginfo.h
> b/include/uapi/asm-generic/siginfo.h
> index 90384d55225b..f4459dc3d31b 100644
> --- a/include/uapi/asm-generic/siginfo.h
> +++ b/include/uapi/asm-generic/siginfo.h
> @@ -98,7 +98,7 @@ typedef struct siginfo {
> 				void __user *_upper;
> 			} _addr_bnd;
> 			/* used when si_code=SEGV_PKUERR */
> -			u64 _pkey;
> +			unsigned int _pkey;
> 		};
> 	} _sigfault;
>

Please use the standard ABI integer type pattern: __u32.

The advantage of only using __[su][8|16|32|64] integer types is that it's
"obvious" at a glance that an ABI is bitness-invariant.

For example include/uapi/linux/perf_event.h only uses such ABI-safe types,
and arch/x86/include/uapi is using these types 95%+ of the time.

( The various struct siginfo definitions should probably be harmonized as
  well, but in a separate patch. )

Thanks,

	Ingo
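
The alignment concern behind the patch can be sketched in userspace C.
This is only an illustration: struct addr_bnd and the two unions are
hypothetical stand-ins, not the real siginfo layout, and uint32_t and
uint64_t from <stdint.h> stand in for __u32 and u64. A union is aligned
as strictly as its most strictly aligned member, so on a 32-bit ABI
that aligns 64-bit integers to 8 bytes, adding a 64-bit member raises
the union's alignment, while a 32-bit member leaves it at pointer
alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the _addr_bnd member: two pointers. */
struct addr_bnd {
	void *lower;
	void *upper;
};

/* Variant with a 64-bit key, as before the patch. */
union fault_u64 {
	struct addr_bnd bnd;
	uint64_t pkey;
};

/* Variant with a 32-bit key, as requested in the review. */
union fault_u32 {
	struct addr_bnd bnd;
	uint32_t pkey;
};

/* The 32-bit variant can never be more strictly aligned. */
static_assert(_Alignof(union fault_u32) <= _Alignof(union fault_u64),
	      "32-bit key must not increase alignment");

static size_t fault_u64_align(void)
{
	return _Alignof(union fault_u64);
}

static size_t fault_u32_align(void)
{
	return _Alignof(union fault_u32);
}
```

Printing both values on a 32-bit and a 64-bit build shows whether a
given ABI is affected; on a 64-bit build both are typically 8 because
the pointers already require it.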
Re: linux-next: manual merge of the iommu tree with the samsung-krzk tree
Hi Stephen,

On Mon, Feb 29, 2016 at 03:20:55PM +1100, Stephen Rothwell wrote:
> Hi Joerg,
>
> Today's linux-next merge of the iommu tree got a conflict in:
>
>   drivers/memory/Kconfig
>
> between commit:
>
>   78fbb9361ca3 ("memory: Add support for Exynos SROM driver")
>
> from the samsung-krzk tree and commit:
>
>   cc8bbe1a8312 ("memory: mediatek: Add SMI driver")
>
> from the iommu tree.
>
> I fixed it up (see below) and can carry the fix as necessary (no action
> is required).

Thanks for fixing this (and the other conflict before) up.

	Joerg
Re: [PATCH] mm: __delete_from_page_cache WARN_ON(page_mapped)
2016-02-29 13:49 GMT+09:00 Hugh Dickins:
> Commit e1534ae95004 ("mm: differentiate page_mapped() from page_mapcount()
> for compound pages") changed the famous BUG_ON(page_mapped(page)) in
> __delete_from_page_cache() to VM_BUG_ON_PAGE(page_mapped(page)): which
> gives us more info when CONFIG_DEBUG_VM=y, but nothing at all when not.
>
> Although it has not usually been very helpful, being hit long after the
> error in question, we do need to know if it actually happens on users'
> systems; but reinstating a crash there is likely to be opposed :)
>
> In the non-debug case, use WARN_ON() plus dump_page() and add_taint() -
> I don't really believe LOCKDEP_NOW_UNRELIABLE, but that seems to be the
> standard procedure now. Move that, or the VM_BUG_ON_PAGE(), up before
> the deletion from tree: so that the unNULLified page->mapping gives a
> little more information.
>
> If the inode is being evicted (rather than truncated), it won't have
> any vmas left, so it's safe(ish) to assume that the raised mapcount is
> erroneous, and we can discount it from page_count to avoid leaking the
> page (I'm less worried by leaking the occasional 4kB, than losing a
> potential 2MB page with each 4kB page leaked).
>
> Signed-off-by: Hugh Dickins
> ---
> I think this should go into v4.5, so I've written it with an atomic_sub
> on page->_count; but Joonsoo will probably want some page_ref thingy.

Okay. I will do it after this patch is merged.
Thanks for the notification.

Thanks.
Re: log spammed with "loading xx failed with error -2" since commit e40ba6d56b [replace call to fw_read_file_contents() with kernel version]
On Sun, 28 Feb 2016, Luis R. Rodriguez wrote:

> >From e63d19975787c0e237a47c17efd01e41b2a8e2fa Mon Sep 17 00:00:00 2001
> From: "Luis R. Rodriguez"
> Date: Sat, 27 Feb 2016 14:58:08 -0800
> Subject: [PATCH] firmware: change kernel read fail to dev_dbg()
>

Applied to
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security.git next

--
James Morris
Re: [PATCH] [RFC] mm/page_ref, crypto/async_pq: don't put_page from __exit
2016-02-29 6:57 GMT+09:00 Arnd Bergmann:
> The addition of tracepoints to the page reference tracking had an
> unfortunate side-effect in at least one driver that calls put_page
> from its exit function, resulting in a link error:
>
> `.exit.text' referenced in section `__jump_table' of crypto/built-in.o:
> defined in discarded section `.exit.text' of crypto/built-in.o
>
> I could not come up with a nice solution that ignores __jump_table
> entries in discarded code, so we probably now have to treat this
> as something a driver is not allowed to do. Removing the __exit
> annotation avoids the problem in this particular driver, but the
> same problem could come back any time in other code.
>
> On a related problem regarding the runtime patching for SMP
> operations on ARM uniprocessor systems, we resorted to not
> drop the .exit section at link time, but that doesn't seem
> appropriate here.
>
> Signed-off-by: Arnd Bergmann
> Fixes: 0f80830dd044 ("mm/page_ref: add tracepoint to track down page
> reference manipulation")
> ---
> crypto/async_tx/async_pq.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/crypto/async_tx/async_pq.c b/crypto/async_tx/async_pq.c
> index c0748bbd4c08..be167145aa55 100644
> --- a/crypto/async_tx/async_pq.c
> +++ b/crypto/async_tx/async_pq.c
> @@ -442,7 +442,7 @@ static int __init async_pq_init(void)
> 	return -ENOMEM;
> }
>
> -static void __exit async_pq_exit(void)
> +static void async_pq_exit(void)
> {
> 	put_page(pq_scribble_page);
> }

Hello, Arnd.

I think that we can avoid this error by using __free_page().
It would not be inlined so calling it would have no problem.

Could you test it, please?

Thanks.
Re: BUG: unable to handle kernel paging request from pty_write [was: Linux 4.4.2]
On 02/26/2016, 08:59 PM, Robert Święcki wrote:
> It happens only with 0x6000832 ucode, and Piledriver-based CPUs: i.e.
> newer AMD FX, and Opteron 300 series (4300, 6300 etc.).

Ok, I can confirm this is:
AMD Opteron(tm) Processor 6348

And:
microcode: CPU0: patch_level=0x06000836

Thank all the interested parties!

--
js
suse labs
Re: [PATCH v5] perf/x86/amd/power: Add AMD accumulated power reporting mechanism
On Fri, Feb 26, 2016 at 11:29:52AM +0100, Borislav Petkov wrote:
> On Fri, Feb 26, 2016 at 11:18:28AM +0100, Thomas Gleixner wrote:
> > On Fri, 26 Feb 2016, Huang Rui wrote:
> > > +/* Event code: LSB 8 bits, passed in attr->config any other bit is
> > > reserved. */
> > > +#define AMD_POWER_EVENT_MASK 0xFFULL
> > > +
> > > +#define MAX_CUS 8
> >
> > What's that define for? Max compute units? So is that stuff eternally limited
> > to 8?
>
> I already sent him a cleaned up version with that dumbness removed:
>
> https://lkml.kernel.org/r/20160128145436.ge14...@pd.tnic
>
> Rui, what's up?
>

Sorry, I will remove the superfluous MAX_CUS check in the next version.

Thanks,
Rui
[RFC] PCI: PTM Driver
Hello LKML,

This is a preliminary implementation of the PTM[1] support driver. The
code is obviously hacked together and in need of refactoring. This driver
has only been tested against a virtual PCI bus.

The driver's job is to get to every PTM capable device, set some PCI config
space bits, then go back to sleep [2].

PTM capable PCIe devices will get a new sysfs entry to allow PTM to be
enabled if automatic PTM activation is disabled, or disabled if so desired.

Comments? Should I explain the PTM registers in more detail?
Please CC me, thanks.

[1] Precision Time Measurement: A protocol for synchronizing PCIe endpoint
clocks against the host clock as specified in the PCI Express Base
Specification 3.1. It is identified by the 0x001f extended capability ID.

PTM capable devices are split into 3 roles: master, responder and
requester. Summary as follows:

A master holds the master clock that will be used for all devices under
its domain (not to be confused with PCI domains). There may be multiple
masters in a PTM hierarchy, in which case the highest master closest to
the root complex will be selected for the PTM domain. A master is also
always responder capable. Clock precision is signified by a Local Clock
Granularity field, in nanoseconds.

A responder responds to any PTM synchronization requests from a downstream
device. A responder is typically a switch device. It may also hold a local
clock signified by a non-zero Local Clock Granularity field. A value of 0
signifies that the device simply propagates timing information from
upstream devices.

A requester is typically an endpoint that will request synchronization
updates from an upstream PTM capable time source. The driver will update
the Effective Clock Granularity field based on the same field from the
PTM domain master. The field should be programmed with a value of 0 if any
intervening responder has a Local Clock Granularity field value of 0.

[2] The software drivers never see the PTM packets; the PCI Express Base
Specification 3.1 reads:

	PTM capable components can make their PTM context available for
	inspection by software, enabling software to translate timing
	information between local times and PTM Master Time.

This isn't very informative.

Yong, Jonathan (1):
  PCI: PTM preliminary implementation

 drivers/pci/pci-sysfs.c     |   7 +
 drivers/pci/pci.h           |  21 +++
 drivers/pci/pcie/Kconfig    |   8 +
 drivers/pci/pcie/Makefile   |   2 +-
 drivers/pci/pcie/pcie_ptm.c | 353
 drivers/pci/probe.c         |   3 +
 6 files changed, 393 insertions(+), 1 deletion(-)
 create mode 100644 drivers/pci/pcie/pcie_ptm.c

--
2.4.10
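
The Effective Clock Granularity rule from footnote [1] can be sketched as
a small userspace function. The function name and its array-based
interface are hypothetical and are not taken from the patch. The zero
rule follows the text above; taking the largest granularity seen along
the path is an assumption about what "based on the same field from the
PTM domain master" means once responders with their own clocks sit in
between, so it should be checked against the specification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: compute the value to program into a requester's
 * Effective Clock Granularity field.
 *
 * master:     Local Clock Granularity of the PTM domain master
 * responders: Local Clock Granularity of each intervening responder
 * n:          number of intervening responders
 */
static uint8_t ptm_effective_granularity(uint8_t master,
					 const uint8_t *responders, size_t n)
{
	uint8_t eff = master;
	size_t i;

	assert(responders != NULL || n == 0);

	/* A master reporting 0 leaves nothing meaningful to propagate. */
	if (master == 0)
		return 0;

	for (i = 0; i < n; i++) {
		/* Any responder reporting 0 forces the result to 0. */
		if (responders[i] == 0)
			return 0;
		/* Assumption: the coarsest clock on the path wins. */
		if (responders[i] > eff)
			eff = responders[i];
	}
	return eff;
}
```

For a requester attached directly below the master, n is 0 and the
master's value is used unchanged.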
[PATCH] PCI: PTM preliminary implementation
Simplified Precision Time Measurement driver, activates PTM feature
if a PCIe PTM requester (as per PCI Express 3.1 Base Specification
section 7.32) is found, but not before checking if the rest of the
PCI hierarchy can support it.

The driver does not take part in facilitating PTM conversations,
neither does it provide any useful services, it is only responsible
for setting up the required configuration space bits.

As of writing, there aren't any PTM capable devices on the market
yet, but it is supported by the Intel Apollo Lake platform.

Signed-off-by: Yong, Jonathan
---
 drivers/pci/pci-sysfs.c     |   7 +
 drivers/pci/pci.h           |  21 +++
 drivers/pci/pcie/Kconfig    |   8 +
 drivers/pci/pcie/Makefile   |   2 +-
 drivers/pci/pcie/pcie_ptm.c | 353
 drivers/pci/probe.c         |   3 +
 6 files changed, 393 insertions(+), 1 deletion(-)
 create mode 100644 drivers/pci/pcie/pcie_ptm.c

diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index 95d9e7b..c634fd11 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -1335,6 +1335,9 @@ static int pci_create_capabilities_sysfs(struct pci_dev *dev)
 	/* Active State Power Management */
 	pcie_aspm_create_sysfs_dev_files(dev);

+	/* PTM */
+	pci_create_ptm_sysfs(dev);
+
 	if (!pci_probe_reset_function(dev)) {
 		retval = device_create_file(>dev, _attr);
 		if (retval)
@@ -1433,6 +1436,10 @@ static void pci_remove_capabilities_sysfs(struct pci_dev *dev)
 	}

 	pcie_aspm_remove_sysfs_dev_files(dev);
+
+	/* PTM */
+	pci_release_ptm_sysfs(dev);
+
 	if (dev->reset_fn) {
 		device_remove_file(>dev, _attr);
 		dev->reset_fn = 0;
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 9a1660f..fb90420 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -320,6 +320,27 @@ static inline resource_size_t pci_resource_alignment(struct pci_dev *dev,

 void pci_enable_acs(struct pci_dev *dev);

+#ifdef CONFIG_PCIEPORTBUS
+int pci_enable_ptm(struct pci_dev *dev);
+void pci_create_ptm_sysfs(struct pci_dev *dev);
+void pci_release_ptm_sysfs(struct pci_dev *dev);
+void pci_disable_ptm(struct pci_dev *dev);
+#else
+static inline int pci_enable_ptm(struct pci_dev *dev)
+{
+	return -ENXIO;
+}
+static inline void pci_create_ptm_sysfs(struct pci_dev *dev)
+{
+}
+static inline void pci_release_ptm_sysfs(struct pci_dev *dev)
+{
+}
+static inline void pci_disable_ptm(struct pci_dev *dev)
+{
+}
+#endif
+
 struct pci_dev_reset_methods {
 	u16 vendor;
 	u16 device;
diff --git a/drivers/pci/pcie/Kconfig b/drivers/pci/pcie/Kconfig
index e294713..f65ff4d 100644
--- a/drivers/pci/pcie/Kconfig
+++ b/drivers/pci/pcie/Kconfig
@@ -80,3 +80,11 @@ endchoice
 config PCIE_PME
 	def_bool y
 	depends on PCIEPORTBUS && PM
+
+config PCIE_PTM
+	bool "Turn on Precision Time Management by default"
+	depends on PCIEPORTBUS
+	help
+	  Say Y here to enable PTM feature on PCI Express devices that
+	  support them as they are found during device enumeration. Otherwise
+	  the feature can be enabled manually through sysfs entries.
diff --git a/drivers/pci/pcie/Makefile b/drivers/pci/pcie/Makefile
index 00c62df..d18b4c7 100644
--- a/drivers/pci/pcie/Makefile
+++ b/drivers/pci/pcie/Makefile
@@ -5,7 +5,7 @@
 # Build PCI Express ASPM if needed
 obj-$(CONFIG_PCIEASPM) += aspm.o

-pcieportdrv-y := portdrv_core.o portdrv_pci.o portdrv_bus.o
+pcieportdrv-y := portdrv_core.o portdrv_pci.o portdrv_bus.o pcie_ptm.o
 pcieportdrv-$(CONFIG_ACPI) += portdrv_acpi.o

 obj-$(CONFIG_PCIEPORTBUS) += pcieportdrv.o
diff --git a/drivers/pci/pcie/pcie_ptm.c b/drivers/pci/pcie/pcie_ptm.c
new file mode 100644
index 000..a128c79
--- /dev/null
+++ b/drivers/pci/pcie/pcie_ptm.c
@@ -0,0 +1,353 @@
+/*
+ * PCI Express Precision Time Measurement
+ * Copyright (c) 2016, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ */
+#include
+#include
+#include
+#include "../pci.h"
+
+#define PCI_PTM_REQ 0x0001 /* Requester capable */
+#define PCI_PTM_RSP 0x0002 /* Responder capable */
+#define PCI_PTM_ROOT 0x0004 /* Root capable */
+#define PCI_PTM_GRANULITY 0xFF00 /* Local clock granulity */
+#define PCI_PTM_ENABLE 0x0001 /* PTM enable */
+#define PCI_PTM_ROOT_SEL 0x0002 /* Root select */
+
+#define PCI_PTM_HEADER_REG_OFFSET 0x00
+#define
[RFC] PCI: PTM Driver
Hello LKML,

This is a preliminary implementation of the PTM[1] support driver. The code is obviously hacked together and in need of refactoring. This driver has only been tested against a virtual PCI bus.

The driver's job is to get to every PTM capable device, set some PCI config space bits, then go back to sleep [2].

PTM capable PCIe devices will get a new sysfs entry to allow PTM to be enabled if automatic PTM activation is disabled, or disabled if so desired.

Comments? Should I explain the PTM registers in more detail? Please CC me, thanks.

[1] Precision Time Measurement: A protocol for synchronizing PCIe endpoint clocks against the host clock as specified in the PCI Express Base Specification 3.1. It is identified by the 0x001f extended capability ID.

PTM capable devices are split into 3 roles: master, responder and requester. Summary as follows:

A master holds the master clock that will be used for all devices under its domain (not to be confused with PCI domains). There may be multiple masters in a PTM hierarchy, in which case the highest master closest to the root complex will be selected for the PTM domain. A master is also always responder capable. Clock precision is signified by a Local Clock Granularity field, in nanoseconds.

A responder responds to any PTM synchronization requests from a downstream device. A responder is typically a switch device. It may also hold a local clock signified by a non-zero Local Clock Granularity field. A value of 0 signifies that the device simply propagates timing information from upstream devices.

A requester is typically an endpoint that will request synchronization updates from an upstream PTM capable time source. The driver will update the Effective Clock Granularity field based on the same field from the PTM domain master. The field should be programmed with a value of 0 if any intervening responder has a Local Clock Granularity field value of 0.

[2] The software drivers never see the PTM packets. The PCI Express Base Specification 3.1 reads:

    PTM capable components can make their PTM context available for inspection by software, enabling software to translate timing information between local times and PTM Master Time.

This isn't very informative.

Yong, Jonathan (1):
  PCI: PTM preliminary implementation

 drivers/pci/pci-sysfs.c     |   7 +
 drivers/pci/pci.h           |  21 +++
 drivers/pci/pcie/Kconfig    |   8 +
 drivers/pci/pcie/Makefile   |   2 +-
 drivers/pci/pcie/pcie_ptm.c | 353
 drivers/pci/probe.c         |   3 +
 6 files changed, 393 insertions(+), 1 deletion(-)
 create mode 100644 drivers/pci/pcie/pcie_ptm.c

--
2.4.10
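
Editor's note: the Effective Clock Granularity rule described under [1] can be sketched in plain C as below. This is only an illustration of the rule as worded in this cover letter, not code from the patch; the function name `ptm_effective_granularity` and its array-based interface are hypothetical, and an 8-bit granularity value is assumed from the 0xFF00 mask in the patch.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of the posted patch: pick the value to
 * program into a requester's Effective Clock Granularity field.
 *
 * master_gran: Local Clock Granularity of the PTM domain master, in ns.
 * path_gran:   Local Clock Granularity of each intervening responder
 *              between the master and the requester.
 * n:           number of entries in path_gran.
 */
uint8_t ptm_effective_granularity(uint8_t master_gran,
				  const uint8_t *path_gran, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		/* Any responder reporting 0 forces the effective value to 0. */
		if (path_gran[i] == 0)
			return 0;
	}

	/* Otherwise the requester takes the master's granularity. */
	return master_gran;
}
```

With a 16 ns master and responders reporting 4 ns and 8 ns, the sketch yields 16; if either responder reports 0, it yields 0.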
Re: [GIT PULL] tpmdd fix
On Fri, 26 Feb 2016, Jarkko Sakkinen wrote:

> Hi James,
>
> this is the fix for the build warning.
>
> /Jarkko
>
> The following changes since commit 481873d06f2bf2ad732450a3a5fa5b8c2a07ef88:
>
>   Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity into next (2016-02-26 15:06:41 +1100)
>
> are available in the git repository at:
>
>   https://github.com/jsakkine/linux-tpmdd.git tags/tpmdd-next-20160226
>
> for you to fetch changes up to 2cb6d6460f1a171c71c134e0efe3a94c2206d080:
>
>   tpm_tis: fix build warning with tpm_tis_resume (2016-02-26 11:32:07 +0200)
>
> tpmdd fix
>
> Jarkko Sakkinen (1):
>       tpm_tis: fix build warning with tpm_tis_resume

Pulled to -next.

--
James Morris
Re: [PATCH 8/9] powerpc: simplify csum_add(a, b) in case a or b is constant 0
On 23/10/2015 05:33, Scott Wood wrote:
> On Tue, 2015-09-22 at 16:34 +0200, Christophe Leroy wrote:
>> Simplify csum_add(a, b) in case a or b is constant 0
>>
>> Signed-off-by: Christophe Leroy
>> ---
>>  arch/powerpc/include/asm/checksum.h | 6 ++
>>  1 file changed, 6 insertions(+)
>>
>> diff --git a/arch/powerpc/include/asm/checksum.h b/arch/powerpc/include/asm/checksum.h
>> index 56deea8..f8a9704 100644
>> --- a/arch/powerpc/include/asm/checksum.h
>> +++ b/arch/powerpc/include/asm/checksum.h
>> @@ -119,7 +119,13 @@ static inline __wsum csum_add(__wsum csum, __wsum addend)
>>  {
>>  #ifdef __powerpc64__
>>  	u64 res = (__force u64)csum;
>> +#endif
>> +	if (__builtin_constant_p(csum) && csum == 0)
>> +		return addend;
>> +	if (__builtin_constant_p(addend) && addend == 0)
>> +		return csum;
>> +#ifdef __powerpc64__
>>  	res += (__force u64)addend;
>>  	return (__force __wsum)((u32)res + (res >> 32));
>>  #else
>
> How often does this happen?

In the following patch (9/9), csum_add() is used to implement csum_partial() for small blocks. In several places in the networking code, csum_partial() is called with 0 as initial sum.

Christophe
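
Editor's note: a user-space sketch of the same idea follows, for readers who want to try the shortcut outside the kernel. It assumes GCC or Clang (for `__builtin_constant_p`), uses plain `uint32_t` in place of the kernel's `__wsum`, and the name `csum_add_sketch` is made up for this illustration. It mirrors the 64-bit branch of the quoted hunk rather than the powerpc assembly path.

```c
#include <stdint.h>

/*
 * 32-bit ones' complement add with the constant-zero shortcut.
 * When the function is inlined and an argument is a literal 0, the
 * compiler can drop the addition and return the other operand.
 */
static inline uint32_t csum_add_sketch(uint32_t csum, uint32_t addend)
{
	uint64_t res;

	if (__builtin_constant_p(csum) && csum == 0)
		return addend;
	if (__builtin_constant_p(addend) && addend == 0)
		return csum;

	res = (uint64_t)csum + addend;
	/* End-around carry: add the carry-out back into the low 32 bits. */
	return (uint32_t)res + (uint32_t)(res >> 32);
}
```

The shortcut only changes the generated code, not the result: adding 0 in ones' complement arithmetic returns the other operand either way.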
Re: [PATCH 4/9] powerpc: inline ip_fast_csum()
On 23/09/2015 07:43, Denis Kirjanov wrote:
> On 9/22/15, Christophe Leroy wrote:
>> In several architectures, ip_fast_csum() is inlined
>> There are functions like ip_send_check() which do nothing
>> much more than calling ip_fast_csum().
>> Inlining ip_fast_csum() allows the compiler to optimise better
>
> Hi Christophe,
> I did try it and see no difference on ppc64. Did you test with socklib with modified loopback and if so do you have any numbers?

Hi Denis,

I put a mftbl at start and end of ip_send_check() and tested on a MPC885:
* Without ip_fast_csum() inlined, approximately 7 TB ticks are spent in ip_send_check()
* With ip_fast_csum() inlined, approximately 5.4 TB ticks are spent in ip_send_check()

So it is about 23% time reduction.

Christophe
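
Editor's note: the quoted figure follows from (7 - 5.4) / 7, which is roughly 22.9%. For readers unfamiliar with what is being inlined, here is a portable sketch of an IPv4 header checksum written as a `static inline` function. It is not the powerpc implementation from the patch series; the name `ip_fast_csum_sketch` and the byte-array interface are assumptions made for this example.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * iph points to the header bytes and ihl is the header length in
 * 32-bit words (5 for a header without options). The checksum field
 * must be zero when computing a new checksum. The result is returned
 * with the first on-wire checksum byte in the high 8 bits.
 */
static inline uint16_t ip_fast_csum_sketch(const uint8_t *iph, unsigned int ihl)
{
	uint32_t sum = 0;
	size_t i;

	/* Sum the header as big-endian 16-bit words. */
	for (i = 0; i < (size_t)ihl * 4; i += 2)
		sum += ((uint32_t)iph[i] << 8) | iph[i + 1];

	/* Fold the carries back into the low 16 bits. */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);

	return (uint16_t)~sum;
}
```

Because the body is this small, a caller such as ip_send_check() that does little else is dominated by call overhead when the checksum routine is out of line, which is the effect the measurement above is probing.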
Re: [RFC PATCH] proc: do not include shmem and driver pages in /proc/meminfo::Cached
On Mon, Feb 29, 2016 at 3:03 AM, Hugh Dickins wrote:
> On Fri, 19 Feb 2016, Andrew Morton wrote:
>> On Fri, 19 Feb 2016 09:40:45 +0300 Konstantin Khlebnikov wrote:
>>
>> > >> What are your thoughts on this?
>> > >
>> > > My thoughts are NAK. A misleading stat is not so bad as a
>> > > misleading stat whose meaning we change in some random kernel.
>> > >
>> > > By all means improve Documentation/filesystems/proc.txt on Cached.
>> > > By all means promote Active(file)+Inactive(file)-Buffers as often a
>> > > better measure (though Buffers itself is obscure to me - is it intended
>> > > usually to approximate resident FS metadata?). By all means work on
>> > > /proc/meminfo-v2 (though that may entail dispiritingly long discussions).
>> > >
>> > > We have to assume that Cached has been useful to some people, and that
>> > > they've learnt to subtract Shmem from it, if slow or no swap concerns
>> > > them.
>> > >
>> > > Added Konstantin to Cc: he's had valuable experience of people learning
>> > > to adapt to the numbers that we put out.
>> > >
>> >
>> > I think everything will be ok. Subtraction of shmem isn't widespread practice,
>> > more like secret knowledge. This wasn't documented and people who use
>> > this should be aware that this might stop working at any time. So, ACK.
>>
>> It worries me as well - we're deliberately altering the behaviour of
>> existing userspace code. Not all of those alterations will be welcome!
>>
>> We could add a shiny new field into meminfo and train people to migrate
>> to that. But that would just be a sum of already-available fields. In
>> an ideal world we could solve all of this with documentation and
>> cluebatting (and some apologizing!).
>
> Ah, I missed this, and just sent a redundant addition to the thread;
> followed by this doubly redundant addition.

"Cached" has been used for ages as the amount of "potentially free memory". This patch corrects it to its original meaning and makes it closer to that "potential" meaning at the same time.

MemAvailable means exactly that and nothing else, so the logic behind it could be tuned and changed in the future. Thus, adding new fields makes no sense.

BTW Glibc recently switched sysconf(_SC_PHYS_PAGES) / sysconf(_SC_AVPHYS_PAGES) from /proc/meminfo MemTotal / MemFree to sysinfo(2) totalram / freeram for performance reasons. It seems possible to expose MemAvailable via sysinfo: there is space for one field. Probably it's also possible to switch _SC_AVPHYS_PAGES to really available memory and add memcg awareness too.
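
Editor's note: for reference, reading the two values mentioned above through sysinfo(2) looks roughly like the sketch below. It assumes Linux with glibc; the helper name `read_ram_bytes` is hypothetical. Note that sysinfo(2) as it exists today has no MemAvailable counterpart, which is the gap the message proposes to fill.

```c
#include <stdint.h>
#include <sys/sysinfo.h>

/*
 * Read total and free RAM in bytes through sysinfo(2) instead of
 * parsing /proc/meminfo. Returns 0 on success, -1 on failure.
 */
int read_ram_bytes(uint64_t *total, uint64_t *free_bytes)
{
	struct sysinfo si;

	if (sysinfo(&si) != 0)
		return -1;

	/* totalram and freeram are counted in units of mem_unit bytes. */
	*total = (uint64_t)si.totalram * si.mem_unit;
	*free_bytes = (uint64_t)si.freeram * si.mem_unit;
	return 0;
}
```

A single system call replaces opening, reading and parsing a text file, which is the performance argument referred to in the message.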
Re: [PATCH 1/2] sigaltstack: implement SS_AUTODISARM flag
On 29.02.2016 00:13, Stas Sergeev wrote:
> This patch implements the SS_AUTODISARM flag that can be ORed with
> SS_ONSTACK when forming ss_flags.
> When this flag is set, sigaltstack will be disabled when entering
> the signal handler; more precisely, after saving sas to uc_stack.
> When leaving the signal handler, the sigaltstack is restored by
> uc_stack.
> When this flag is used, it is safe to switch from sighandler with
> swapcontext(). Without this flag, the subsequent signal will corrupt
> the state of the switched-away sighandler.
>
> CC: Ingo Molnar
> CC: Peter Zijlstra
> CC: Richard Weinberger
> CC: Andrew Morton
> CC: Oleg Nesterov
> CC: Tejun Heo
> CC: Heinrich Schuchardt
> CC: Jason Low
> CC: Andrea Arcangeli
> CC: Frederic Weisbecker
> CC: Konstantin Khlebnikov
> CC: Josh Triplett
> CC: "Eric W. Biederman"
> CC: Aleksa Sarai
> CC: "Amanieu d'Antras"
> CC: Paul Moore
> CC: Sasha Levin
> CC: Palmer Dabbelt
> CC: Vladimir Davydov
> CC: linux-kernel@vger.kernel.org
> CC: linux-...@vger.kernel.org
> CC: Andy Lutomirski
>
> Signed-off-by: Stas Sergeev
> ---
>  include/linux/sched.h       |  1 +
>  include/linux/signal.h      |  4 +++-
>  include/uapi/linux/signal.h |  3 +++
>  kernel/fork.c               |  4 +++-
>  kernel/signal.c             | 23 ---
>  5 files changed, 22 insertions(+), 13 deletions(-)
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index a10494a..f561d34 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1587,6 +1587,7 @@ struct task_struct {
>
>  	unsigned long sas_ss_sp;
>  	size_t sas_ss_size;
> +	unsigned sas_ss_flags;
>
>  	struct callback_head *task_works;
>
> diff --git a/include/linux/signal.h b/include/linux/signal.h
> index 92557bb..be3ebe0 100644
> --- a/include/linux/signal.h
> +++ b/include/linux/signal.h
> @@ -432,8 +432,10 @@ int __save_altstack(stack_t __user *, unsigned long);
>  	stack_t __user *__uss = uss; \
>  	struct task_struct *t = current; \
>  	put_user_ex((void __user *)t->sas_ss_sp, &__uss->ss_sp); \
> -	put_user_ex(sas_ss_flags(sp), &__uss->ss_flags); \
> +	put_user_ex(t->sas_ss_flags, &__uss->ss_flags); \
>  	put_user_ex(t->sas_ss_size, &__uss->ss_size); \
> +	if (t->sas_ss_flags & SS_AUTODISARM) \
> +		t->sas_ss_size = 0; \

Should also reset flags here...
Will send v4.
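
Editor's note: from user space, the flag described in the commit message would be used roughly as sketched below. This is an illustration, not part of the patch. The numeric value given for `SS_AUTODISARM` is an assumption (it does not appear in the quoted hunks), the function and variable names are made up, and the check inside the handler assumes the flags are reset on handler entry as the reply above proposes for v4.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <string.h>

#ifndef SS_AUTODISARM
#define SS_AUTODISARM (1U << 31)	/* assumed value; check the kernel uapi headers */
#endif

static char alt_stack[1 << 16];
static volatile sig_atomic_t disarmed_in_handler;

static void on_usr1(int sig)
{
	stack_t cur;

	(void)sig;
	/* With auto-disarm active, the alternate stack reads back as disabled here. */
	if (sigaltstack(NULL, &cur) == 0 && (cur.ss_flags & SS_DISABLE))
		disarmed_in_handler = 1;
}

/*
 * Returns 0 if the flag was accepted and the handler saw a disarmed
 * stack, 1 if the kernel rejected the flag, 2 for any other outcome.
 */
int try_autodisarm(void)
{
	stack_t ss;
	struct sigaction sa;

	memset(&ss, 0, sizeof(ss));
	ss.ss_sp = alt_stack;
	ss.ss_size = sizeof(alt_stack);
	ss.ss_flags = SS_ONSTACK | SS_AUTODISARM;
	if (sigaltstack(&ss, NULL) != 0)
		return 1;	/* for example EINVAL on kernels without the flag */

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_usr1;
	sa.sa_flags = SA_ONSTACK;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGUSR1, &sa, NULL) != 0)
		return 2;

	raise(SIGUSR1);
	return disarmed_in_handler ? 0 : 2;
}
```

Since the alternate stack is disarmed while the handler runs, a handler that switches away with swapcontext() no longer risks a later signal reusing the same stack memory, which is the corruption case the commit message describes.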
Re: [PATCH v10 2/2] cpufreq: powernv: Add sysfs attributes to show throttle stats
On 26-02-16, 16:06, Shilpasri G Bhat wrote:
> +static int powernv_cpufreq_policy_notifier(struct notifier_block *nb,
> +				unsigned long action, void *data)
> +{
> +	struct cpufreq_policy *policy = data;
> +	int ret;
> +
> +	if (action == CPUFREQ_CREATE_POLICY) {
> +		ret = sysfs_create_group(&policy->kobj, &throttle_attr_grp);
> +		if (ret)
> +			pr_info("Failed to create throttle stats directory for cpu %d\n",
> +				policy->cpu);
> +	} else if (action == CPUFREQ_REMOVE_POLICY) {
> +		sysfs_remove_group(&policy->kobj, &throttle_attr_grp);
> +	}
> +
> +	return NOTIFY_DONE;
> +}
> +
> +static struct notifier_block powernv_cpufreq_policy_nb = {
> +	.notifier_call = powernv_cpufreq_policy_notifier,
> +	.next = NULL,
> +};
> +
>  static void powernv_cpufreq_stop_cpu(struct cpufreq_policy *policy)
>  {
>  	struct powernv_smp_call_data freq_data;
> @@ -603,6 +708,8 @@ static inline void clean_chip_info(void)
>
>  static inline void unregister_all_notifiers(void)
>  {
> +	cpufreq_unregister_notifier(&powernv_cpufreq_policy_nb,
> +				    CPUFREQ_POLICY_NOTIFIER);
>  	opal_message_notifier_unregister(OPAL_MSG_OCC,
>  					 &powernv_cpufreq_opal_nb);
>  	unregister_reboot_notifier(&powernv_cpufreq_reboot_nb);
> @@ -628,6 +735,8 @@ static int __init powernv_cpufreq_init(void)
>
>  	register_reboot_notifier(&powernv_cpufreq_reboot_nb);
>  	opal_message_notifier_register(OPAL_MSG_OCC, &powernv_cpufreq_opal_nb);
> +	cpufreq_register_notifier(&powernv_cpufreq_policy_nb,
> +				  CPUFREQ_POLICY_NOTIFIER);
>
>  	rc = cpufreq_register_driver(&powernv_cpufreq_driver);
>  	if (!rc)

@Rafael: This driver needs to do this *ugly* notifier hack, just because we aren't doing kobject_add() for policy->kobj before ->init(). And we did that because we wanted to create the policyX structure with the first CPU in policy->related_cpus mask, and related_cpus mask isn't available until we call ->init().

Should we do something in core to make this easier for this driver?

--
viresh
linux-next: manual merge of the target-merge tree with the net-next tree
Hi Nicholas,

Today's linux-next merge of the target-merge tree got a conflict in:

  drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h

between commit:

  ba9cee6aa67d ("cxgb4/iw_cxgb4: TOS support")

from the net-next tree and commit:

  c973e2a3ff1b ("cxgb4: add definitions for iSCSI target ULD")

from the target-merge tree.

I fixed it up (the latter was a superset of the former) and can carry the fix as necessary (no action is required).

--
Cheers,
Stephen Rothwell
Re: [PATCH] mm/zsmalloc: add compact column to pool stat
Hello,

On (02/29/16 15:02), Minchan Kim wrote:
> On Sat, Feb 27, 2016 at 03:23:53PM +0900, Sergey Senozhatsky wrote:
> > Add a new column to pool stats, which will tell us class' zs_can_compact()
> > number, so it will be easier to analyze zsmalloc fragmentation.
>
> Just nitpick:
>
> Strictly speaking, zs_can_compact number is number of "ideal freeable page
> by compaction". How about using high level term in description rather than
> function name?

OK, makes sense.

> > At the moment, we have only numbers of FULL and ALMOST_EMPTY classes, but
> > they don't tell us how badly the class is fragmented internally.
> >
> > The new /sys/kernel/debug/zsmalloc/zramX/classes output look as follows:
> >
> >  class  size almost_full almost_empty obj_allocated   obj_used pages_used pages_per_zspage compact
> > [..]
> >     12   224           0            2           146          5          8                4       4
> >     13   240           0            0             0          0          0                1       0
> >     14   256           1           13          1840       1672        115                1      10
> >     15   272           0            0             0          0          0                1       0
> > [..]
> >     49   816           0            3           745        735        149                1       2
> >     51   848           3            4           361        306         76                4       8
> >     52   864          12           14           378        268         81                3      21
> >     54   896           1           12           117         57         26                2      12
> >     57   944           0            0             0          0          0                3       0
> > [..]
> >  Total                26          131         12709      10994       1071                      134
> >
> > For example, from this particular output we can easily conclude that class-896
> > is heavily fragmented -- it occupies 26 pages, 12 can be freed by compaction.
>
> How about using "freeable" or something which could represent "freeable"?
> IMO, it's more straightforward for user.

OK. didn't want to put any long column name there, which would bloat the output. will take a look.

> Other than that,
>
> Acked-by: Minchan Kim
>
> Thanks for the nice job!

thanks.

	-ss
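
Editor's note: the "compact" column in the table above can be reproduced from the other columns with the small sketch below. It is a user-space approximation written for this note, not the kernel's zs_can_compact() itself; the name `freeable_pages` is hypothetical and a 4 KiB page size is assumed.

```c
#define SKETCH_PAGE_SIZE 4096UL	/* assumption: 4 KiB pages */

/*
 * Estimate how many pages compaction could free in one size class,
 * from the stats columns: object size, objects allocated, objects in
 * use, and pages per zspage.
 */
unsigned long freeable_pages(unsigned long size, unsigned long obj_allocated,
			     unsigned long obj_used,
			     unsigned long pages_per_zspage)
{
	unsigned long objs_per_zspage;
	unsigned long obj_wasted;

	if (size == 0 || obj_allocated <= obj_used)
		return 0;

	objs_per_zspage = pages_per_zspage * SKETCH_PAGE_SIZE / size;
	if (objs_per_zspage == 0)
		return 0;

	/* Unused slots, counted as whole zspages that could be emptied. */
	obj_wasted = obj_allocated - obj_used;
	return obj_wasted / objs_per_zspage * pages_per_zspage;
}
```

For class 896 this gives (117 - 57) / 9 * 2 = 12 pages, matching the row in the table; the integer division is why the result is an "ideal" figure in units of whole zspages.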
Re: [PATCH] mm/zsmalloc: add compact column to pool stat
On Sat, Feb 27, 2016 at 03:23:53PM +0900, Sergey Senozhatsky wrote:
> Add a new column to pool stats, which will tell us class' zs_can_compact()
> number, so it will be easier to analyze zsmalloc fragmentation.

Just nitpick:

Strictly speaking, zs_can_compact number is number of "ideal freeable page by compaction". How about using high level term in description rather than function name?

> At the moment, we have only numbers of FULL and ALMOST_EMPTY classes, but
> they don't tell us how badly the class is fragmented internally.
>
> The new /sys/kernel/debug/zsmalloc/zramX/classes output look as follows:
>
>  class  size almost_full almost_empty obj_allocated   obj_used pages_used pages_per_zspage compact
> [..]
>     12   224           0            2           146          5          8                4       4
>     13   240           0            0             0          0          0                1       0
>     14   256           1           13          1840       1672        115                1      10
>     15   272           0            0             0          0          0                1       0
> [..]
>     49   816           0            3           745        735        149                1       2
>     51   848           3            4           361        306         76                4       8
>     52   864          12           14           378        268         81                3      21
>     54   896           1           12           117         57         26                2      12
>     57   944           0            0             0          0          0                3       0
> [..]
>  Total                26          131         12709      10994       1071                      134
>
> For example, from this particular output we can easily conclude that class-896
> is heavily fragmented -- it occupies 26 pages, 12 can be freed by compaction.

How about using "freeable" or something which could represent "freeable"? IMO, it's more straightforward for user.

Other than that,

Acked-by: Minchan Kim

Thanks for the nice job!
Re: [PATCH] mm/zsmalloc: add compact column to pool stat
On Sat, Feb 27, 2016 at 03:23:53PM +0900, Sergey Senozhatsky wrote:
> Add a new column to pool stats, which will tell us class' zs_can_compact()
> number, so it will be easier to analyze zsmalloc fragmentation.

Just nitpick:

Strictly speaking, zs_can_compact number is number of "ideal freeable page
by compaction". How about using high level term in description rather than
function name?

> At the moment, we have only numbers of FULL and ALMOST_EMPTY classes, but
> they don't tell us how badly the class is fragmented internally.
>
> The new /sys/kernel/debug/zsmalloc/zramX/classes output looks as follows:
>
>  class  size almost_full almost_empty obj_allocated obj_used pages_used pages_per_zspage compact
> [..]
>     12   224           0            2           146        5          8                4       4
>     13   240           0            0             0        0          0                1       0
>     14   256           1           13          1840     1672        115                1      10
>     15   272           0            0             0        0          0                1       0
> [..]
>     49   816           0            3           745      735        149                1       2
>     51   848           3            4           361      306         76                4       8
>     52   864          12           14           378      268         81                3      21
>     54   896           1           12           117       57         26                2      12
>     57   944           0            0             0        0          0                3       0
> [..]
>  Total                26          131         12709    10994       1071                      134
>
> For example, from this particular output we can easily conclude that class-896
> is heavily fragmented -- it occupies 26 pages, 12 can be freed by compaction.

How about using "freeable" or something which could represent "freeable"?
IMO, it's more straightforward for user.

Other than that,

Acked-by: Minchan Kim

Thanks for the nice job!
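For readers following the numbers in the table, here is a minimal sketch of how the per-class "compact" value can be reproduced from the other columns. It is a userspace approximation and not the kernel's zs_can_compact() itself: the helper name freeable_pages(), the 4096-byte page size, and deriving objects-per-zspage from the class size are all assumptions made for illustration.

```c
#include <stddef.h>

/* Assumption: 4 KiB pages, as on the machine that produced the table. */
#define SKETCH_PAGE_SIZE 4096UL

/*
 * Hypothetical helper: estimate how many pages compaction could free
 * in one size class, given the columns shown in the classes file.
 */
static unsigned long freeable_pages(unsigned long class_size,
				    unsigned long pages_per_zspage,
				    unsigned long obj_allocated,
				    unsigned long obj_used)
{
	unsigned long objs_per_zspage;
	unsigned long obj_wasted;

	if (class_size == 0 || obj_allocated <= obj_used)
		return 0;

	/* How many objects of this class fit into one zspage. */
	objs_per_zspage = pages_per_zspage * SKETCH_PAGE_SIZE / class_size;
	if (objs_per_zspage == 0)
		return 0;

	/* Allocated-but-unused slots, rounded down to whole zspages. */
	obj_wasted = obj_allocated - obj_used;
	return (obj_wasted / objs_per_zspage) * pages_per_zspage;
}
```

With the values from the table, freeable_pages(896, 2, 117, 57) gives 12 and freeable_pages(864, 3, 378, 268) gives 21, which matches the compact column for class-896 and class-864.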
Re: [PATCH] asm-generic: remove old nonatomic-io wrapper files
On Fri, Feb 26, 2016 at 03:29:05PM +0100, Arnd Bergmann wrote:
> The two header files got moved to include/linux, and most
> users were already converted, this changes the remaining drivers
> and removes the files.
>
> Signed-off-by: Arnd Bergmann
> ---
>  drivers/dma/idma64.h | 2 +-

For this:

Acked-by: Vinod Koul

Thanks
-- 
~Vinod
Re: [PATCH v3 22/22] sound/usb: Use Media Controller API to share media resources
On 02/27/2016 12:48 AM, Takashi Iwai wrote: > On Sat, 27 Feb 2016 03:55:39 +0100, > Shuah Khan wrote: >> >> On 02/26/2016 01:50 PM, Takashi Iwai wrote: >>> On Fri, 26 Feb 2016 21:08:43 +0100, >>> Shuah Khan wrote: On 02/26/2016 12:55 PM, Takashi Iwai wrote: > On Fri, 12 Feb 2016 00:41:38 +0100, > Shuah Khan wrote: >> >> Change ALSA driver to use Media Controller API to >> share media resources with DVB and V4L2 drivers >> on a AU0828 media device. Media Controller specific >> initialization is done after sound card is registered. >> ALSA creates Media interface and entity function graph >> nodes for Control, Mixer, PCM Playback, and PCM Capture >> devices. >> >> snd_usb_hw_params() will call Media Controller enable >> source handler interface to request the media resource. >> If resource request is granted, it will release it from >> snd_usb_hw_free(). If resource is busy, -EBUSY is returned. >> >> Media specific cleanup is done in usb_audio_disconnect(). >> >> Signed-off-by: Shuah Khan >> --- >> sound/usb/Kconfig| 4 + >> sound/usb/Makefile | 2 + >> sound/usb/card.c | 14 +++ >> sound/usb/card.h | 3 + >> sound/usb/media.c| 318 >> +++ >> sound/usb/media.h| 72 +++ >> sound/usb/mixer.h| 3 + >> sound/usb/pcm.c | 28 - >> sound/usb/quirks-table.h | 1 + >> sound/usb/stream.c | 2 + >> sound/usb/usbaudio.h | 6 + >> 11 files changed, 448 insertions(+), 5 deletions(-) >> create mode 100644 sound/usb/media.c >> create mode 100644 sound/usb/media.h >> >> diff --git a/sound/usb/Kconfig b/sound/usb/Kconfig >> index a452ad7..ba117f5 100644 >> --- a/sound/usb/Kconfig >> +++ b/sound/usb/Kconfig >> @@ -15,6 +15,7 @@ config SND_USB_AUDIO >> select SND_RAWMIDI >> select SND_PCM >> select BITREVERSE >> +select SND_USB_AUDIO_USE_MEDIA_CONTROLLER if MEDIA_CONTROLLER >> && MEDIA_SUPPORT > > Looking at the media Kconfig again, this would be broken if > MEDIA_SUPPORT=m and SND_USB_AUDIO=y. 
The ugly workaround is something > like: > select SND_USB_AUDIO_USE_MEDIA_CONTROLLER \ > if MEDIA_CONTROLLER && (MEDIA_SUPPORT=y || MEDIA_SUPPORT=SND) My current config is MEDIA_SUPPORT=m and SND_USB_AUDIO=y It is working and I didn't see any issues so far. >>> >>> Hmm, how does it be? In drivers/media/Makefile: >>> >>> ifeq ($(CONFIG_MEDIA_CONTROLLER),y) >>> obj-$(CONFIG_MEDIA_SUPPORT) += media.o >>> endif >>> >>> So it's a module. Meanwhile you have reference from usb-audio driver >>> that is built-in kernel. How is the symbol resolved? >> >> Sorry my mistake. I misspoke. My config had: >> CONFIG_MEDIA_SUPPORT=m >> CONFIG_MEDIA_CONTROLLER=y >> CONFIG_SND_USB_AUDIO=m >> >> The following doesn't work as you pointed out. >> >> CONFIG_MEDIA_SUPPORT=m >> CONFIG_MEDIA_CONTROLLER=y >> CONFIG_SND_USB_AUDIO=y >> >> okay here is what will work for all of the possible >> combinations of CONFIG_MEDIA_SUPPORT and CONFIG_SND_USB_AUDIO >> >> select SND_USB_AUDIO_USE_MEDIA_CONTROLLER \ >>if MEDIA_CONTROLLER && ((MEDIA_SUPPORT=y) || (MEDIA_SUPPORT=m && >> SND_USB_AUDIO=m)) >> >> The above will cover the cases when >> >> 1. CONFIG_MEDIA_SUPPORT and CONFIG_SND_USB_AUDIO are >>both modules >>CONFIG_SND_USB_AUDIO_USE_MEDIA_CONTROLLER is selected >> >> 2. CONFIG_MEDIA_SUPPORT=y and CONFIG_SND_USB_AUDIO=m >>CONFIG_SND_USB_AUDIO_USE_MEDIA_CONTROLLER is selected >> >> 3. CONFIG_MEDIA_SUPPORT=y and CONFIG_SND_USB_AUDIO=y >>CONFIG_SND_USB_AUDIO_USE_MEDIA_CONTROLLER is selected >> >> 4. CONFIG_MEDIA_SUPPORT=m and CONFIG_SND_USB_AUDIO=y >>This is when we don't want >>CONFIG_SND_USB_AUDIO_USE_MEDIA_CONTROLLER selected >> >> I verified all of the above combinations to make sure >> the logic works. >> >> If you think of a better way to do this please let me >> know. I will go ahead and send patch v4 with the above >> change and you can decide if that is acceptable. > > I'm not 100% sure whether CONFIG_SND_USB_AUDIO=m can be put there as > conditional inside CONFIG_SND_USB_AUDIO definition. 
Maybe a safer > form would be like: > > config SND_USB_AUDIO_USE_MEDIA_CONTROLLER > bool > default y > depends on SND_USB_AUDIO > depends on MEDIA_CONTROLLER > depends on (MEDIA_SUPPORT=y || MEDIA_SUPPORT=SND_USB_AUDIO) > > and drop select from SND_USB_AUDIO. > > > Other than that, it looks more or less OK to me. > The way how media_stream_init() gets called is a bit worrisome, but it > should work practically. Another concern is about the disconnection. > Can all function calls in media_device_delete() be safe even if it's > called while the application still
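The four configurations listed in the thread can be checked mechanically. The sketch below models the two proposed conditions as plain C predicates. The enum and function names are hypothetical, and the model simplifies Kconfig semantics: it treats the result as a boolean (which is what a bool symbol ends up as) and assumes the option is only relevant when SND_USB_AUDIO is not n.

```c
#include <stdbool.h>

/* Kconfig tristate values, ordered n < m < y. */
enum tristate { TRI_N = 0, TRI_M = 1, TRI_Y = 2 };

/*
 * Compact form:
 *   MEDIA_CONTROLLER && (MEDIA_SUPPORT=y || MEDIA_SUPPORT=SND_USB_AUDIO)
 */
static bool use_mc_compact(bool media_controller,
			   enum tristate media_support,
			   enum tristate snd_usb_audio)
{
	if (snd_usb_audio == TRI_N)
		return false;	/* assumption: nothing to enable without the driver */

	return media_controller &&
	       (media_support == TRI_Y || media_support == snd_usb_audio);
}

/*
 * Explicit form:
 *   MEDIA_CONTROLLER && (MEDIA_SUPPORT=y ||
 *                        (MEDIA_SUPPORT=m && SND_USB_AUDIO=m))
 */
static bool use_mc_explicit(bool media_controller,
			    enum tristate media_support,
			    enum tristate snd_usb_audio)
{
	if (snd_usb_audio == TRI_N)
		return false;

	return media_controller &&
	       (media_support == TRI_Y ||
		(media_support == TRI_M && snd_usb_audio == TRI_M));
}
```

Under these assumptions both predicates agree on every combination: they are true for cases 1 to 3 from the list and false for case 4 (MEDIA_SUPPORT=m with SND_USB_AUDIO=y), where a built-in usb-audio driver would otherwise reference symbols that only exist in a module.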
[PATCH v4 22/22] sound/usb: Use Media Controller API to share media resources
Change ALSA driver to use Media Controller API to share media resources with DVB and V4L2 drivers on a AU0828 media device. Media Controller specific initialization is done after sound card is registered. ALSA creates Media interface and entity function graph nodes for Control, Mixer, PCM Playback, and PCM Capture devices. snd_usb_hw_params() will call Media Controller enable source handler interface to request the media resource. If resource request is granted, it will release it from snd_usb_hw_free(). If resource is busy, -EBUSY is returned. Media specific cleanup is done in usb_audio_disconnect(). Signed-off-by: Shuah Khan --- Changes since v3: - Fixed Kconfig to handle the following 1. CONFIG_MEDIA_SUPPORT and CONFIG_SND_USB_AUDIO are both modules CONFIG_SND_USB_AUDIO_USE_MEDIA_CONTROLLER is selected 2. CONFIG_MEDIA_SUPPORT=y and CONFIG_SND_USB_AUDIO=m CONFIG_SND_USB_AUDIO_USE_MEDIA_CONTROLLER is selected 3. CONFIG_MEDIA_SUPPORT=y and CONFIG_SND_USB_AUDIO=y CONFIG_SND_USB_AUDIO_USE_MEDIA_CONTROLLER is selected 4. CONFIG_MEDIA_SUPPORT=m and CONFIG_SND_USB_AUDIO=y This is when we don't want CONFIG_SND_USB_AUDIO_USE_MEDIA_CONTROLLER selected sound/usb/Kconfig| 4 + sound/usb/Makefile | 2 + sound/usb/card.c | 14 +++ sound/usb/card.h | 3 + sound/usb/media.c| 318 +++ sound/usb/media.h| 72 +++ sound/usb/mixer.h| 3 + sound/usb/pcm.c | 28 - sound/usb/quirks-table.h | 1 + sound/usb/stream.c | 2 + sound/usb/usbaudio.h | 6 + 11 files changed, 448 insertions(+), 5 deletions(-) create mode 100644 sound/usb/media.c create mode 100644 sound/usb/media.h diff --git a/sound/usb/Kconfig b/sound/usb/Kconfig index a452ad7..d14bf41 100644 --- a/sound/usb/Kconfig +++ b/sound/usb/Kconfig @@ -15,6 +15,7 @@ config SND_USB_AUDIO select SND_RAWMIDI select SND_PCM select BITREVERSE + select SND_USB_AUDIO_USE_MEDIA_CONTROLLER if MEDIA_CONTROLLER && (MEDIA_SUPPORT=y || MEDIA_SUPPORT=SND_USB_AUDIO) help Say Y here to include support for USB audio and USB MIDI devices. 
@@ -22,6 +23,9 @@ config SND_USB_AUDIO To compile this driver as a module, choose M here: the module will be called snd-usb-audio. +config SND_USB_AUDIO_USE_MEDIA_CONTROLLER + bool + config SND_USB_UA101 tristate "Edirol UA-101/UA-1000 driver" select SND_PCM diff --git a/sound/usb/Makefile b/sound/usb/Makefile index 2d2d122..8dca3c4 100644 --- a/sound/usb/Makefile +++ b/sound/usb/Makefile @@ -15,6 +15,8 @@ snd-usb-audio-objs := card.o \ quirks.o \ stream.o +snd-usb-audio-$(CONFIG_SND_USB_AUDIO_USE_MEDIA_CONTROLLER) += media.o + snd-usbmidi-lib-objs := midi.o # Toplevel Module Dependency diff --git a/sound/usb/card.c b/sound/usb/card.c index 1f09d95..35fe256 100644 --- a/sound/usb/card.c +++ b/sound/usb/card.c @@ -66,6 +66,7 @@ #include "format.h" #include "power.h" #include "stream.h" +#include "media.h" MODULE_AUTHOR("Takashi Iwai "); MODULE_DESCRIPTION("USB Audio"); @@ -561,6 +562,11 @@ static int usb_audio_probe(struct usb_interface *intf, if (err < 0) goto __error; + if (quirk->media_device) { + /* don't want to fail when media_device_create() fails */ + media_device_create(chip, intf); + } + usb_chip[chip->index] = chip; chip->num_interfaces++; usb_set_intfdata(intf, chip); @@ -617,6 +623,14 @@ static void usb_audio_disconnect(struct usb_interface *intf) list_for_each(p, >midi_list) { snd_usbmidi_disconnect(p); } + /* +* Nice to check quirk && quirk->media_device +* need some special handlings. 
Doesn't look like +* we have access to quirk here +* Acceses mixer_list + */ + media_device_delete(chip); + /* release mixer resources */ list_for_each_entry(mixer, >mixer_list, list) { snd_usb_mixer_disconnect(mixer); diff --git a/sound/usb/card.h b/sound/usb/card.h index 71778ca..34a0898 100644 --- a/sound/usb/card.h +++ b/sound/usb/card.h @@ -105,6 +105,8 @@ struct snd_usb_endpoint { struct list_head list; }; +struct media_ctl; + struct snd_usb_substream { struct snd_usb_stream *stream; struct usb_device *dev; @@ -156,6 +158,7 @@ struct snd_usb_substream { } dsd_dop; bool trigger_tstamp_pending_update; /* trigger timestamp being updated from initial estimate */ + struct media_ctl *media_ctl; }; struct snd_usb_stream { diff --git a/sound/usb/media.c b/sound/usb/media.c new file mode 100644 index 000..cff1459 --- /dev/null +++ b/sound/usb/media.c @@ -0,0
[PATCH] phy: Fix armada375 compile test build on UM
The phy-armada375-usb2 driver uses IOMEM functions so COMPILE_TEST && OF
build failed with:

drivers/built-in.o: In function `armada375_usb_phy_probe':
phy-armada375-usb2.c:(.text+0x121d): undefined reference to `devm_ioremap_resource'

Signed-off-by: Krzysztof Kozlowski
---
 drivers/phy/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/phy/Kconfig b/drivers/phy/Kconfig
index 0124d17bd9fe..786a9d6356b8 100644
--- a/drivers/phy/Kconfig
+++ b/drivers/phy/Kconfig
@@ -32,7 +32,7 @@ config PHY_BERLIN_SATA
 config ARMADA375_USBCLUSTER_PHY
 	def_bool y
 	depends on MACH_ARMADA_375 || COMPILE_TEST
-	depends on OF
+	depends on OF && HAS_IOMEM
 	select GENERIC_PHY
 
 config PHY_DM816X_USB
-- 
2.5.0
[GIT PULL] extcon next for 4.6
Dear Greg,

This is the extcon-next pull request for v4.6. I have added a detailed
description of this pull request below. Please pull extcon with the
following updates.

Best Regards,
Chanwoo Choi

The following changes since commit 92e963f50fc74041b5e9e744c330dca48e04f08d:

  Linux 4.5-rc1 (2016-01-24 13:06:47 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/extcon.git tags/extcon-next-for-4.6

for you to fetch changes up to ae64e42cc2b3a17ac0c11815f53211093a54cf55:

  extcon: palmas: Drop IRQF_EARLY_RESUME flag (2016-02-29 11:07:34 +0900)

Update extcon for 4.6

Detailed description for patchset:
1. Add new EXTCON_CHG_USB_SDP type
- SDP (Standard Downstream Port) USB Charging Port means the charging
  connector.
2. Add the VBUS detection by using GPIO on extcon-palmas
- The Beaglex15 board uses the extcon-palmas driver, but it needs
  GPIO support for VBUS detection.
3. Fix minor issues in the extcon drivers

Chanwoo Choi (1):
      extcon: Add the EXTCON_CHG_USB_SDP to support SDP charing port

Charles Keepax (1):
      extcon: arizona: Use DAPM mutex helper functions

Dan Carpenter (1):
      extcon: max77843: Use correct size for reading the interrupt register

Felipe Balbi (3):
      extcon: palmas: Add the support for VBUS detection by using GPIO
      arm: boot: dts: beaglex15: Remove ID GPIO
      arm: boot: beaglex15: pass correct interrupt

Geliang Tang (1):
      extcon: Use to_i2c_client for both rt8973a and sm5502

Grygorii Strashko (1):
      extcon: palmas: Drop IRQF_EARLY_RESUME flag

Moritz Fischer (1):
      extcon: gpio: Fix typo in comment

 arch/arm/boot/dts/am57xx-beagle-x15.dts |  3 +-
 drivers/extcon/extcon-arizona.c         |  4 +--
 drivers/extcon/extcon-gpio.c            |  2 +-
 drivers/extcon/extcon-max14577.c        |  3 ++
 drivers/extcon/extcon-max77693.c        | 12 +++-
 drivers/extcon/extcon-max77843.c        |  5 ++-
 drivers/extcon/extcon-max8997.c         |  3 ++
 drivers/extcon/extcon-palmas.c          | 54 +++--
 drivers/extcon/extcon-rt8973a.c         |  8 +++--
 drivers/extcon/extcon-sm5502.c          |  8 +++--
 include/linux/mfd/palmas.h              |  3 ++
 11 files changed, 92 insertions(+), 13 deletions(-)
Re: [PATCH v6 00/12] Add T210 support in Tegra soctherm
Hi,
Does anyone have comments on this series?

Thanks.
Wei.

On 2016-02-22 16:05, Wei Ni wrote:
> This patchset adds following functions for tegra_soctherm driver:
> 1. add T210 support.
> 2. export debugfs to show some registers.
> 3. add thermtrip function.
> 4. add suspend/resume function.
>
> The v5 series is in:
> http://www.spinics.net/lists/linux-tegra/msg25079.html
> The v4 series is in:
> http://www.spinics.net/lists/linux-tegra/msg24972.html
> The V3 series is in:
> http://www.spinics.net/lists/linux-tegra/msg24911.html
> The V2 series is in:
> http://www.spinics.net/lists/linux-tegra/msg24901.html
> The V1 series is in:
> http://www.spinics.net/lists/linux-tegra/msg24808.html
>
> Main changes from V5:
> 1. Change to use linux thermal framework to implement
> thermtrip function, per Rob's comment.
> 2. Add .set_trip_temp() in of-thermal driver, so that
> we can set trips on hardware.
>
> Main changes from V4:
> 1. Change description of devicetree binding per Rob's comment.
> 2. Call of_node_put to decrement refcount of the node.
>
> Main changes from V3:
> 1. Change structures to "const" in chip specific files.
> 2. Minor changes per Thierry's comments.
>
> Main changes from V2:
> 1. Fix build error in patch [1/11].
> 2. Use of_get_child_by_name instead of of_find_node_by_name in patch [8/11].
> 3. Use debugfs_remove_recursive to remove debugfs in patch [6/11].
>
> Main changes from V1:
> 1. Use the new type to handle different Tegra chips in one driver, as
> suggested by Thierry.
> 2. Changes per Thierry's other comments.
>
> Wei Ni (12):
>   thermal: tegra: move tegra thermal files into tegra directory
>   thermal: tegra: combine sensor group-related data
>   thermal: tegra: get rid of PDIV/HOTSPOT hack
>   thermal: tegra: split tegra_soctherm driver
>   thermal: tegra: add Tegra210 specific SOC_THERM driver
>   thermal: tegra: add a debugfs to show registers
>   thermal: of-thermal: allow setting trip_temp on hardware
>   of: add notes of critical trips for soctherm
>   thermal: tegra: add thermtrip function
>   thermal: tegra: add PM support
>   arm64: tegra: add soctherm node for Tegra210
>   arm: tegra: set critical trips for Tegra124
>
>  .../devicetree/bindings/thermal/tegra-soctherm.txt |  12 +
>  arch/arm/boot/dts/tegra124.dtsi                    |  16 +
>  arch/arm64/boot/dts/nvidia/tegra210.dtsi           |  60 ++
>  drivers/thermal/Kconfig                            |  12 +-
>  drivers/thermal/Makefile                           |   2 +-
>  drivers/thermal/of-thermal.c                       |   8 +
>  drivers/thermal/tegra/Kconfig                      |  13 +
>  drivers/thermal/tegra/Makefile                     |   5 +
>  drivers/thermal/tegra/soctherm-fuse.c              | 169 +
>  drivers/thermal/tegra/soctherm.c                   | 685 +
>  drivers/thermal/tegra/soctherm.h                   | 123
>  drivers/thermal/tegra/tegra124-soctherm.c          | 196 ++
>  drivers/thermal/tegra/tegra210-soctherm.c          | 197 ++
>  drivers/thermal/tegra_soctherm.c                   | 476 --
>  include/dt-bindings/thermal/tegra124-soctherm.h    |   1 +
>  include/linux/thermal.h                            |   1 +
>  16 files changed, 1489 insertions(+), 487 deletions(-)
>  create mode 100644 drivers/thermal/tegra/Kconfig
>  create mode 100644 drivers/thermal/tegra/Makefile
>  create mode 100644 drivers/thermal/tegra/soctherm-fuse.c
>  create mode 100644 drivers/thermal/tegra/soctherm.c
>  create mode 100644 drivers/thermal/tegra/soctherm.h
>  create mode 100644 drivers/thermal/tegra/tegra124-soctherm.c
>  create mode 100644 drivers/thermal/tegra/tegra210-soctherm.c
>  delete mode 100644 drivers/thermal/tegra_soctherm.c
>
Re: [PATCH 01/10] fs crypto: add basic definitions for per-file encryption
On 02/25/16 11:25, Jaegeuk Kim wrote:
> This patch adds definitions for per-file encryption used by ext4 and f2fs.
>
> Signed-off-by: Jaegeuk Kim
> ---
>  include/linux/fs.h       |   8 ++
>  include/linux/fscrypto.h | 239 +++
>  include/uapi/linux/fs.h  |  18
>  3 files changed, 265 insertions(+)
>  create mode 100644 include/linux/fscrypto.h
>
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index ae68100..d8f57cf 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -53,6 +53,8 @@ struct swap_info_struct;
>  struct seq_file;
>  struct workqueue_struct;
>  struct iov_iter;
> +struct fscrypt_info;
> +struct fscrypt_operations;
>
>  extern void __init inode_init(void);
>  extern void __init inode_init_early(void);
> @@ -678,6 +680,10 @@ struct inode {
>          struct hlist_head i_fsnotify_marks;
>  #endif
>
> +#ifdef CONFIG_FS_ENCRYPTION
> +        struct fscrypt_info *i_crypt_info;
> +#endif
> +
>          void *i_private; /* fs or device private pointer */
>  };
>
> @@ -1323,6 +1329,8 @@ struct super_block {
>  #endif
>          const struct xattr_handler **s_xattr;
>
> +        const struct fscrypt_operations *s_cop;
> +
>          struct hlist_bl_head s_anon; /* anonymous dentries for (nfs) exporting */
>          struct list_head s_mounts; /* list of mounts; _not_ for fs use */
>          struct block_device *s_bdev;
> diff --git a/include/linux/fscrypto.h b/include/linux/fscrypto.h
> new file mode 100644
> index 000..b0aed92
> --- /dev/null
> +++ b/include/linux/fscrypto.h
> @@ -0,0 +1,239 @@
> +/*
> + * General per-file encryption definition
> + *
> + * Copyright (C) 2015, Google, Inc.
> + *
> + * Written by Michael Halcrow, 2015.
> + * Modified by Jaegeuk Kim, 2015.
> + */
> +
> +#ifndef _LINUX_FSCRYPTO_H
> +#define _LINUX_FSCRYPTO_H
> +
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +
> +#define FS_KEY_DERIVATION_NONCE_SIZE    16
> +#define FS_ENCRYPTION_CONTEXT_FORMAT_V1 1
> +
> +#define FS_POLICY_FLAGS_PAD_4           0x00
> +#define FS_POLICY_FLAGS_PAD_8           0x01
> +#define FS_POLICY_FLAGS_PAD_16          0x02
> +#define FS_POLICY_FLAGS_PAD_32          0x03
> +#define FS_POLICY_FLAGS_PAD_MASK        0x03
> +#define FS_POLICY_FLAGS_VALID           0x03
> +
> +/* Encryption algorithms */
> +#define FS_ENCRYPTION_MODE_INVALID      0
> +#define FS_ENCRYPTION_MODE_AES_256_XTS  1
> +#define FS_ENCRYPTION_MODE_AES_256_GCM  2
> +#define FS_ENCRYPTION_MODE_AES_256_CBC  3
> +#define FS_ENCRYPTION_MODE_AES_256_CTS  4
> +
> +/**
> + * Encryption context for inode
> + *
> + * Protector format:
> + *  1 byte: Protector format (1 = this version)
> + *  1 byte: File contents encryption mode
> + *  1 byte: File names encryption mode
> + *  1 byte: Flags
> + *  8 bytes: Master Key descriptor
> + *  16 bytes: Encryption Key derivation nonce
> + */
> +struct fscrypt_context {
> +        char format;
> +        char contents_encryption_mode;
> +        char filenames_encryption_mode;
> +        char flags;
> +        char master_key_descriptor[FS_KEY_DESCRIPTOR_SIZE];
> +        char nonce[FS_KEY_DERIVATION_NONCE_SIZE];

how about u8 instead of char?

> +} __packed;
> +
> +/* Encryption parameters */
> +#define FS_XTS_TWEAK_SIZE               16
> +#define FS_AES_128_ECB_KEY_SIZE         16
> +#define FS_AES_256_GCM_KEY_SIZE         32
> +#define FS_AES_256_CBC_KEY_SIZE         32
> +#define FS_AES_256_CTS_KEY_SIZE         32
> +#define FS_AES_256_XTS_KEY_SIZE         64
> +#define FS_MAX_KEY_SIZE                 64
> +
> +#define FS_KEY_DESC_PREFIX              "fscrypt:"
> +#define FS_KEY_DESC_PREFIX_SIZE         8
> +
> +/* This is passed in from userspace into the kernel keyring */
> +struct fscrypt_key {
> +        __u32 mode;
> +        char raw[FS_MAX_KEY_SIZE];
> +        __u32 size;
> +} __packed;
> +
> +struct fscrypt_info {
> +        char ci_data_mode;
> +        char ci_filename_mode;
> +        char ci_flags;

ditto

> +        struct crypto_ablkcipher *ci_ctfm;
> +        struct key *ci_keyring_key;
> +        char ci_master_key[FS_KEY_DESCRIPTOR_SIZE];
> +};
> +
> +#define FS_CTX_REQUIRES_FREE_ENCRYPT_FL 0x0001
> +#define FS_WRITE_PATH_FL                0x0002
> +
> +struct fscrypt_ctx {
> +        union {
> +                struct {
> +                        struct page *bounce_page; /* Ciphertext page */
> +                        struct page *control_page; /* Original page */
> +                } w;
> +                struct {
> +                        struct bio *bio;
> +                        struct work_struct work;
> +                } r;
> +                struct list_head free_list; /* Free list */
> +        };
> +        char flags; /* Flags */
> +        char mode;
Re: [PATCH 06/10] fs crypto: add Makefile and Kconfig
On 02/25/16 11:26, Jaegeuk Kim wrote:
> This patch adds a facility to enable per-file encryption.
>
> Arnd fixes a missing CONFIG_BLOCK check in the original patch.
> "The newly added generic crypto abstraction for file systems operates
> on 'struct bio' objects, which do not exist when CONFIG_BLOCK is
> disabled:
>
> fs/crypto/crypto.c: In function 'fscrypt_zeroout_range':
> fs/crypto/crypto.c:308:9: error: implicit declaration of function 'bio_alloc'
> [-Werror=implicit-function-declaration]
>
> This adds a Kconfig dependency that prevents FS_ENCRYPTION from being
> enabled without BLOCK."
>
> Signed-off-by: Arnd Bergmann
> Signed-off-by: Jaegeuk Kim
> ---
>  fs/Kconfig         |  2 ++
>  fs/Makefile        |  1 +
>  fs/crypto/Kconfig  | 17 +
>  fs/crypto/Makefile |  2 ++
>  4 files changed, 22 insertions(+)
>  create mode 100644 fs/crypto/Kconfig
>  create mode 100644 fs/crypto/Makefile
>
> diff --git a/fs/Kconfig b/fs/Kconfig
> index 9adee0d..9d75767 100644
> --- a/fs/Kconfig
> +++ b/fs/Kconfig
> @@ -84,6 +84,8 @@ config MANDATORY_FILE_LOCKING
>
>            To the best of my knowledge this is dead code that no one cares about.
>
> +source "fs/crypto/Kconfig"
> +
>  source "fs/notify/Kconfig"
>
>  source "fs/quota/Kconfig"
> diff --git a/fs/Makefile b/fs/Makefile
> index 79f5225..47571e2 100644
> --- a/fs/Makefile
> +++ b/fs/Makefile
> @@ -30,6 +30,7 @@ obj-$(CONFIG_EVENTFD) += eventfd.o
>  obj-$(CONFIG_USERFAULTFD) += userfaultfd.o
>  obj-$(CONFIG_AIO) += aio.o
>  obj-$(CONFIG_FS_DAX) += dax.o
> +obj-y += crypto/
>  obj-$(CONFIG_FILE_LOCKING) += locks.o
>  obj-$(CONFIG_COMPAT) += compat.o compat_ioctl.o
>  obj-$(CONFIG_BINFMT_AOUT) += binfmt_aout.o
> diff --git a/fs/crypto/Kconfig b/fs/crypto/Kconfig
> new file mode 100644
> index 000..9bea124e
> --- /dev/null
> +++ b/fs/crypto/Kconfig
> @@ -0,0 +1,17 @@
> +config FS_ENCRYPTION
> +        bool "FS Encryption (Per-file encryption)"
> +        depends on BLOCK

	depends on CRYPTO
since all of the CRYPTO_xxx below also depend on CRYPTO.

> +        select CRYPTO_AES
> +        select CRYPTO_CBC
> +        select CRYPTO_ECB
> +        select CRYPTO_XTS
> +        select CRYPTO_CTS
> +        select CRYPTO_CTR
> +        select CRYPTO_SHA256
> +        select KEYS
> +        select ENCRYPTED_KEYS
> +        help
> +          Enable encryption of files and directories. This
> +          feature is similar to ecryptfs, but it is more memory
> +          efficient since it avoids caching the encrypted and
> +          decrypted pages in the page cache.
> diff --git a/fs/crypto/Makefile b/fs/crypto/Makefile
> new file mode 100644
> index 000..f9f68cd
> --- /dev/null
> +++ b/fs/crypto/Makefile
> @@ -0,0 +1,2 @@
> +obj-y += fname.o
> +obj-$(CONFIG_FS_ENCRYPTION) += crypto.o policy.o keyinfo.o
>

--
~Randy
[PATCH 01/10] selftests/x86: In syscall_nt, test NT|TF as well
Setting TF prevents fastpath returns in most cases, which causes the
test to fail on 32-bit kernels because 32-bit kernels do not, in
fact, handle NT correctly on SYSENTER entries.

The next patch will fix 32-bit kernels.

Signed-off-by: Andy Lutomirski
---
 tools/testing/selftests/x86/syscall_nt.c | 57 +++-
 1 file changed, 49 insertions(+), 8 deletions(-)

diff --git a/tools/testing/selftests/x86/syscall_nt.c b/tools/testing/selftests/x86/syscall_nt.c
index 60c06af4646a..a6ceff86c199 100644
--- a/tools/testing/selftests/x86/syscall_nt.c
+++ b/tools/testing/selftests/x86/syscall_nt.c
@@ -17,6 +17,9 @@

 #include
 #include
+#include
+#include
+#include
 #include
 #include

@@ -26,6 +29,8 @@
 # define WIDTH "l"
 #endif

+static unsigned int nerrs;
+
 static unsigned long get_eflags(void)
 {
         unsigned long eflags;
@@ -39,16 +44,52 @@ static void set_eflags(unsigned long eflags)
                       : : "rm" (eflags) : "flags");
 }

-int main()
+static void sethandler(int sig, void (*handler)(int, siginfo_t *, void *),
+                       int flags)
 {
-        printf("[RUN]\tSet NT and issue a syscall\n");
-        set_eflags(get_eflags() | X86_EFLAGS_NT);
+        struct sigaction sa;
+        memset(&sa, 0, sizeof(sa));
+        sa.sa_sigaction = handler;
+        sa.sa_flags = SA_SIGINFO | flags;
+        sigemptyset(&sa.sa_mask);
+        if (sigaction(sig, &sa, 0))
+                err(1, "sigaction");
+}
+
+static void sigtrap(int sig, siginfo_t *si, void *ctx_void)
+{
+}
+
+static void do_it(unsigned long extraflags)
+{
+        unsigned long flags;
+
+        set_eflags(get_eflags() | extraflags);
         syscall(SYS_getpid);
-        if (get_eflags() & X86_EFLAGS_NT) {
-                printf("[OK]\tThe syscall worked and NT is still set\n");
-                return 0;
+        flags = get_eflags();
+        if ((flags & extraflags) == extraflags) {
+                printf("[OK]\tThe syscall worked and flags are still set\n");
         } else {
-                printf("[FAIL]\tThe syscall worked but NT was cleared\n");
-                return 1;
+                printf("[FAIL]\tThe syscall worked but flags were cleared (flags = 0x%lx but expected 0x%lx set)\n",
+                       flags, extraflags);
+                nerrs++;
         }
 }
+
+int main()
+{
+        printf("[RUN]\tSet NT and issue a syscall\n");
+        do_it(X86_EFLAGS_NT);
+
+        /*
+         * Now try it again with TF set -- TF forces returns via IRET in all
+         * cases except non-ptregs-using 64-bit full fast path syscalls.
+         */
+
+        sethandler(SIGTRAP, sigtrap, 0);
+
+        printf("[RUN]\tSet NT|TF and issue a syscall\n");
+        do_it(X86_EFLAGS_NT | X86_EFLAGS_TF);
+
+        return nerrs == 0 ? 0 : 1;
+}
--
2.5.0
[PATCH 00/10] x86: Various SYSENTER/SYSEXIT/#DB fixes and cleanups
hpa asked me to get rid of the ASM_CLAC at the beginning of the
SYSENTER path. Little did he know...

This series makes the observed behavior of SYSENTER wrt flags the
same for all sane flags and kernel bitnesses. That is, SYSENTER
preserves flags now unless you do a syscall that explicitly changes
flags, and the HW flags that the syscall executes with are
sanitized. This includes NT, TF, AC and all arithmetic flags.
Prior to this series, 32-bit kernels clobbered TF and the arithmetic
flags and behaved highly erratically if NT was set.

(If IF is cleared by evil userspace when SYSENTER starts, IF will be
set again on return. There's nothing the kernel can do about this
-- SYSENTER inherently forgets the state of IF.)

This series speeds up SYSENTER on all kernels by a surprisingly
large amount on Skylake because it eliminates an unconditional CLAC.

While SYSENTER used to handle TF correctly as far as I can tell on
64-bit kernels, the means by which it did so was heavily tangled up
in the ptrace single-step logic. It now works just like all the
other kernel entries except insofar as do_debug has a simple special
case for it. Relatedly, the bizarre and poorly explained old fixup
in do_debug is now hidden behind a WARN_ON_ONCE in preparation for
deleting it at some point.

The code that fixed up NMI and #DB early in SYSENTER in 32-bit
kernels used to be both terrifying and incorrect. (It doesn't
appear to have been exploitably bad, but the reason for that is
subtle, and the code was certainly more fragile than it deserved to
be.) We still need a special fixup, but it's much simpler now.

While I was doing all this, I also noticed that DR6 and BTF handling
in do_debug was a bit off. Two of the patches in here try to fix it
up.

Have fun!

tl;dr: Cleanups and sanity fixes here, but no security fixes, and I
don't think anything needs to be backported or put in x86/urgent.

This series applies to the result of merging tip:x86/asm and
tip:x86/urgent. I've been testing on a somewhat bastardized base,
because tip currently doesn't work on my laptop in 32-bit mode.
(That bug is fixed in Linus' tree.)

Andy Lutomirski (10):
  selftests/x86: In syscall_nt, test NT|TF as well
  x86/entry/compat: In SYSENTER, sink AC clearing below the existing
    FLAGS test
  x86/entry/32: Filter NT and speed up AC filtering in SYSENTER
  x86/entry/32: Restore FLAGS on SYSEXIT
  x86/traps: Clear TIF_BLOCKSTEP on all debug exceptions
  x86/traps: Clear DR6 early in do_debug and improve the comment
  x86/entry: Vastly simplify SYSENTER TF handling
  x86/entry: Only allocate space for SYSENTER_stack if needed
  x86/entry/32: Simplify and fix up the SYSENTER stack #DB/NMI fixup
  x86/entry/32: Add and check a stack canary for the SYSENTER stack

 arch/x86/entry/entry_32.S                | 182 ++-
 arch/x86/entry/entry_64_compat.S         |  15 ++-
 arch/x86/include/asm/processor.h         |   5 +-
 arch/x86/include/asm/proto.h             |  15 ++-
 arch/x86/kernel/asm-offsets_32.c         |   5 +
 arch/x86/kernel/process.c                |   3 +
 arch/x86/kernel/traps.c                  |  87 ---
 tools/testing/selftests/x86/syscall_nt.c |  57 --
 8 files changed, 263 insertions(+), 106 deletions(-)

--
2.5.0
[PATCH 02/10] x86/entry/compat: In SYSENTER, sink AC clearing below the existing FLAGS test
CLAC is slow, and the SYSENTER code already has an unlikely path
that runs if unusual flags are set. Drop the CLAC and instead rely
on the unlikely path to clear AC.

This seems to save ~24 cycles on my Skylake laptop. (Hey, Intel,
make this faster please!)

Signed-off-by: Andy Lutomirski
---
 arch/x86/entry/entry_64_compat.S | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S
index 89bcb4979e7a..7c8e72da7654 100644
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -66,8 +66,6 @@ ENTRY(entry_SYSENTER_compat)
          */
         pushfq                          /* pt_regs->flags (except IF = 0) */
         orl     $X86_EFLAGS_IF, (%rsp)  /* Fix saved flags */
-        ASM_CLAC                        /* Clear AC after saving FLAGS */
-
         pushq   $__USER32_CS            /* pt_regs->cs */
         xorq    %r8,%r8
         pushq   %r8                     /* pt_regs->ip = 0 (placeholder) */
@@ -90,9 +88,9 @@ ENTRY(entry_SYSENTER_compat)
         cld

         /*
-         * Sysenter doesn't filter flags, so we need to clear NT
+         * Sysenter doesn't filter flags, so we need to clear NT and AC
          * ourselves. To save a few cycles, we can check whether
-         * NT was set instead of doing an unconditional popfq.
+         * either was set instead of doing an unconditional popfq.
          * This needs to happen before enabling interrupts so that
          * we don't get preempted with NT set.
          *
@@ -102,7 +100,7 @@ ENTRY(entry_SYSENTER_compat)
          * we're keeping that code behind a branch which will predict as
          * not-taken and therefore its instructions won't be fetched.
          */
-        testl   $X86_EFLAGS_NT, EFLAGS(%rsp)
+        testl   $X86_EFLAGS_NT|X86_EFLAGS_AC, EFLAGS(%rsp)
         jnz     .Lsysenter_fix_flags
 .Lsysenter_flags_fixed:

--
2.5.0
[PATCH 06/10] x86/traps: Clear DR6 early in do_debug and improve the comment
Leaving any bits set in DR6 on return from a debug exception is
asking for trouble. Prevent it by writing zero right away and
clarify the comment.

Signed-off-by: Andy Lutomirski
---
 arch/x86/kernel/traps.c | 15 ---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 19e6cfa501e3..6dddc220e3ed 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -593,6 +593,18 @@ dotraplinkage void do_debug(struct pt_regs *regs, long error_code)
         ist_enter(regs);

         get_debugreg(dr6, 6);
+        /*
+         * The Intel SDM says:
+         *
+         *   Certain debug exceptions may clear bits 0-3. The remaining
+         *   contents of the DR6 register are never cleared by the
+         *   processor. To avoid confusion in identifying debug
+         *   exceptions, debug handlers should clear the register before
+         *   returning to the interrupted task.
+         *
+         * Keep it simple: clear DR6 immediately.
+         */
+        set_debugreg(0, 6);

         /* Filter out all the reserved bits which are preset to 1 */
         dr6 &= ~DR6_RESERVED;
@@ -616,9 +628,6 @@ dotraplinkage void do_debug(struct pt_regs *regs, long error_code)
         if ((dr6 & DR_STEP) && kmemcheck_trap(regs))
                 goto exit;

-        /* DR6 may or may not be cleared by the CPU */
-        set_debugreg(0, 6);
-
         /* Store the virtualized DR6 value */
         tsk->thread.debugreg6 = dr6;

--
2.5.0
[PATCH 07/10] x86/entry: Vastly simplify SYSENTER TF handling
Due to a blatant design error, SYSENTER doesn't clear TF. As a result, if a user does SYSENTER with TF set, we will single-step through the kernel until something clears TF. There is absolutely nothing we can do to prevent this short of turning off SYSENTER [1]. Simplify the handling considerably with two changes: 1. We already sanitize EFLAGS in SYSENTER to clear NT and AC. We can add TF to that list of flags to sanitize with no overhead whatsoever. 2. Teach do_debug to ignore single-step traps in the SYSENTER prologue. That's all we need to do. Don't get too excited -- our handling is still buggy on 32-bit kernels. There's nothing wrong with the SYSENTER code itself, but the #DB prologue has a clever fixup for traps on the very first instruction of entry_SYSENTER_32, and the fixup doesn't work quite correctly. The next two patches will fix that. [1] We could probably prevent it by forcing BTF on at all times and making sure we clear TF before any branches in the SYSENTER code. Needless to say, this is a bad idea. Signed-off-by: Andy Lutomirski --- arch/x86/entry/entry_32.S| 42 ++-- arch/x86/entry/entry_64_compat.S | 9 ++- arch/x86/include/asm/proto.h | 15 ++-- arch/x86/kernel/traps.c | 52 +--- 4 files changed, 94 insertions(+), 24 deletions(-) diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S index ed171f938960..752d4f031a18 100644 --- a/arch/x86/entry/entry_32.S +++ b/arch/x86/entry/entry_32.S @@ -287,7 +287,26 @@ need_resched: END(resume_kernel) #endif - # SYSENTER call handler stub +GLOBAL(__begin_SYSENTER_singlestep_region) +/* + * All code from here through __end_SYSENTER_singlestep_region is subject + * to being single-stepped if a user program sets TF and executes SYSENTER. + * There is absolutely nothing that we can do to prevent this from happening + * (thanks Intel!). 
To keep our handling of this situation as simple as + * possible, we handle TF just like AC and NT, except that our #DB handler + * will ignore all of the single-step traps generated in this range. + */ + +#ifdef CONFIG_XEN +/* + * Xen doesn't set %esp to be precisely what the normal SYSENTER + * entry point expects, so fix it up before using the normal path. + */ +ENTRY(xen_sysenter_target) + addl$5*4, %esp /* remove xen-provided frame */ + jmp sysenter_past_esp +#endif + ENTRY(entry_SYSENTER_32) movlTSS_sysenter_sp0(%esp), %esp sysenter_past_esp: @@ -301,19 +320,25 @@ sysenter_past_esp: SAVE_ALL pt_regs_ax=$-ENOSYS/* save rest */ /* -* Sysenter doesn't filter flags, so we need to clear NT and AC -* ourselves. To save a few cycles, we can check whether +* Sysenter doesn't filter flags, so we need to clear NT, AC +* and TF ourselves. To save a few cycles, we can check whether * either was set instead of doing an unconditional popfq. * This needs to happen before enabling interrupts so that * we don't get preempted with NT set. * +* If TF is set, we will single-step all the way to here -- do_debug +* will ignore all the traps. (Yes, this is slow, but so is +* single-stepping in general. This allows us to avoid having +* a more complicated code to handle the case where a user program +* forces us to single-step through the SYSENTER entry code.) +* * NB.: .Lsysenter_fix_flags is a label with the code under it moved * out-of-line as an optimization: NT is unlikely to be set in the * majority of the cases and instead of polluting the I$ unnecessarily, * we're keeping that code behind a branch which will predict as * not-taken and therefore its instructions won't be fetched. 
*/ - testl $X86_EFLAGS_NT|X86_EFLAGS_AC, PT_EFLAGS(%esp) + testl $X86_EFLAGS_NT|X86_EFLAGS_AC|X86_EFLAGS_TF, PT_EFLAGS(%esp) jnz .Lsysenter_fix_flags .Lsysenter_flags_fixed: @@ -369,6 +394,7 @@ sysenter_past_esp: pushl $X86_EFLAGS_FIXED popfl jmp .Lsysenter_flags_fixed +GLOBAL(__end_SYSENTER_singlestep_region) ENDPROC(entry_SYSENTER_32) # system call handler stub @@ -662,14 +688,6 @@ ENTRY(spurious_interrupt_bug) END(spurious_interrupt_bug) #ifdef CONFIG_XEN -/* - * Xen doesn't set %esp to be precisely what the normal SYSENTER - * entry point expects, so fix it up before using the normal path. - */ -ENTRY(xen_sysenter_target) - addl$5*4, %esp /* remove xen-provided frame */ - jmp sysenter_past_esp - ENTRY(xen_hypervisor_callback) pushl $-1 /* orig_ax = -1 => not a system call */ SAVE_ALL diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S index 7c8e72da7654..6aec75b41b06
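The traps.c hunk is cut off in this copy, so the actual do_debug change is not shown here. As a rough illustration of change 2 only, a check of the form "is the trapping IP between the two new GLOBAL() markers" could be sketched as below. Both helper names are hypothetical and this is not claimed to be the code in the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true if ip lies in [begin, end).
 * The subtraction wraps around for ip < begin, so a single
 * unsigned comparison covers both bounds.
 */
static bool in_singlestep_region(uintptr_t ip, uintptr_t begin, uintptr_t end)
{
	return (ip - begin) < (end - begin);
}

/*
 * Hypothetical model of the decision in the #DB handler: a
 * single-step trap taken in kernel mode inside the region is
 * ignored; everything else is handled normally.
 */
static bool should_ignore_step(bool user_mode, bool step_bit_set,
			       uintptr_t ip, uintptr_t begin, uintptr_t end)
{
	return step_bit_set && !user_mode &&
	       in_singlestep_region(ip, begin, end);
}
```

In the kernel, begin and end would presumably be the addresses of __begin_SYSENTER_singlestep_region and __end_SYSENTER_singlestep_region from the entry_32.S hunk above.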
[PATCH 08/10] x86/entry: Only allocate space for SYSENTER_stack if needed
The SYSENTER stack is only used on 32-bit kernels. Remove it in 64-bit kernels. (We may end up using it down the road on 64-bit kernels. If so, we'll re-enable it for CONFIG_IA32_EMULATION.) Signed-off-by: Andy Lutomirski --- arch/x86/include/asm/processor.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index ecb410310e70..7cd01b71b5bd 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -297,10 +297,12 @@ struct tss_struct { */ unsigned long io_bitmap[IO_BITMAP_LONGS + 1]; +#ifdef CONFIG_X86_32 /* * Space for the temporary SYSENTER stack: */ unsigned long SYSENTER_stack[64]; +#endif } cacheline_aligned; -- 2.5.0
[PATCH 10/10] x86/entry/32: Add and check a stack canary for the SYSENTER stack
Signed-off-by: Andy Lutomirski --- arch/x86/include/asm/processor.h | 3 ++- arch/x86/kernel/process.c| 3 +++ arch/x86/kernel/traps.c | 8 3 files changed, 13 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index 7cd01b71b5bd..50a6dc871cc0 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -299,8 +299,9 @@ struct tss_struct { #ifdef CONFIG_X86_32 /* -* Space for the temporary SYSENTER stack: +* Space for the temporary SYSENTER stack. */ + unsigned long SYSENTER_stack_canary; unsigned long SYSENTER_stack[64]; #endif diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c index 9f7c21c22477..ee9a9792caeb 100644 --- a/arch/x86/kernel/process.c +++ b/arch/x86/kernel/process.c @@ -57,6 +57,9 @@ __visible DEFINE_PER_CPU_SHARED_ALIGNED(struct tss_struct, cpu_tss) = { */ .io_bitmap = { [0 ... IO_BITMAP_LONGS] = ~0 }, #endif +#ifdef CONFIG_X86_32 + .SYSENTER_stack_canary = STACK_END_MAGIC, +#endif }; EXPORT_PER_CPU_SYMBOL(cpu_tss); diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c index 80928ea78373..590110119e6a 100644 --- a/arch/x86/kernel/traps.c +++ b/arch/x86/kernel/traps.c @@ -713,6 +713,14 @@ dotraplinkage void do_debug(struct pt_regs *regs, long error_code) debug_stack_usage_dec(); exit: +#if defined(CONFIG_X86_32) + /* +* This is the most likely code path that involves non-trivial use +* of the SYSENTER stack. Check that we haven't overrun it. +*/ + WARN(this_cpu_read(cpu_tss.SYSENTER_stack_canary) != STACK_END_MAGIC, +"Overran or corrupted SYSENTER stack\n"); +#endif ist_exit(regs); } NOKPROBE_SYMBOL(do_debug); -- 2.5.0
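The canary works because the stack grows downward and the canary word sits immediately below SYSENTER_stack[0]. A small userspace model of that layout, as a sketch only: `struct model_stack` and the `model_*` helpers are hypothetical, the flat array replaces the two separate struct members so the overrun can be modelled without undefined behaviour, and the magic value is assumed to match the kernel's STACK_END_MAGIC.

```c
#include <stdbool.h>
#include <string.h>

#define STACK_END_MAGIC 0x57AC6E9DUL  /* assumed to match the kernel's value */
#define SYSENTER_WORDS  64

/* words[0] plays the canary; words[1..64] play SYSENTER_stack[0..63]. */
struct model_stack {
	unsigned long words[1 + SYSENTER_WORDS];
};

static void model_stack_init(struct model_stack *s)
{
	memset(s, 0, sizeof(*s));
	s->words[0] = STACK_END_MAGIC;
}

/* Push nwords from the top, growing toward lower indices. */
static void model_push(struct model_stack *s, int nwords)
{
	int sp = 1 + SYSENTER_WORDS;  /* one past the top: empty stack */

	while (nwords-- > 0 && sp > 0)
		s->words[--sp] = 0xdeadbeefUL;
}

/* Same test the WARN() in do_debug performs on the real canary. */
static bool model_stack_ok(const struct model_stack *s)
{
	return s->words[0] == STACK_END_MAGIC;
}
```

Pushing exactly 64 words leaves the canary intact; the 65th push overwrites it, which is the condition the new WARN() reports.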
[PATCH 09/10] x86/entry/32: Simplify and fix up the SYSENTER stack #DB/NMI fixup
Right after SYSENTER, we can get a #DB or NMI. On x86_32, there's no IST, so the exception handler is invoked on the temporary SYSENTER stack. Because the SYSENTER stack is very small, we have a fixup to switch off the stack quickly when this happens. The old fixup had several issues: 1. It checked the interrupt frame's CS and EIP. This wasn't obviously correct on Xen or if vm86 mode was in use [1]. 2. In the NMI handler, it did some frightening digging into the stack frame. I'm not convinced this digging was correct. 3. The fixup didn't switch stacks and then switch back. Instead, it synthesized a brand new stack frame that would redirect the IRET back to the SYSENTER code. That frame was highly questionable. For one thing, if NMI nested inside #DB, we would effectively abort the #DB prologue, which was probably safe but was frightening. For another, the code used PUSHFL to write the FLAGS portion of the frame, which was simply bogus -- by the time PUSHFL was called, at least TF, NT, VM, and all of the arithmetic flags were clobbered. Simplify this considerably. Instead of looking at the saved frame to see where we came from, check the hardware ESP register against the SYSENTER stack directly. Malicious user code cannot spoof the kernel ESP register, and by moving the check after SAVE_ALL, we can use normal PER_CPU accesses to find all the relevant addresses. With this patch applied, the improved syscall_nt_32 test finally passes on 32-bit kernels. [1] It isn't obviously correct, but it is nonetheless safe from vm86 shenanigans as far as I can tell. A user can't point EIP at entry_SYSENTER_32 while in vm86 mode because entry_SYSENTER_32, like all kernel addresses, is greater than 0x and would thus violate the CS segment limit. 
Signed-off-by: Andy Lutomirski --- arch/x86/entry/entry_32.S| 114 ++- arch/x86/kernel/asm-offsets_32.c | 5 ++ 2 files changed, 56 insertions(+), 63 deletions(-) diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S index 752d4f031a18..99bf636a6eaf 100644 --- a/arch/x86/entry/entry_32.S +++ b/arch/x86/entry/entry_32.S @@ -987,51 +987,48 @@ error_code: jmp ret_from_exception END(page_fault) -/* - * Debug traps and NMI can happen at the one SYSENTER instruction - * that sets up the real kernel stack. Check here, since we can't - * allow the wrong stack to be used. - * - * "TSS_sysenter_sp0+12" is because the NMI/debug handler will have - * already pushed 3 words if it hits on the sysenter instruction: - * eflags, cs and eip. - * - * We just load the right stack, and push the three (known) values - * by hand onto the new stack - while updating the return eip past - * the instruction that would have done it for sysenter. - */ -.macro FIX_STACK offset ok label - cmpw$__KERNEL_CS, 4(%esp) - jne \ok -\label: - movlTSS_sysenter_sp0 + \offset(%esp), %esp - pushfl - pushl $__KERNEL_CS - pushl $sysenter_past_esp -.endm - ENTRY(debug) + /* +* #DB can happen at the first instruction of +* entry_SYSENTER_32 or in Xen's SYSENTER prologue. If this +* happens, then we will be running on a very small stack. We +* need to detect this condition and switch to the thread +* stack before calling any C code at all. +* +* If you edit this code, keep in mind that NMIs can happen in here. +*/ ASM_CLAC - cmpl$entry_SYSENTER_32, (%esp) - jne debug_stack_correct - FIX_STACK 12, debug_stack_correct, debug_esp_fix_insn -debug_stack_correct: pushl $-1 # mark this as an int SAVE_ALL - TRACE_IRQS_OFF xorl%edx, %edx # error code 0 movl%esp, %eax # pt_regs pointer + + /* Are we currently on the SYSENTER stack? 
*/ + PER_CPU(cpu_tss + CPU_TSS_SYSENTER_stack + SIZEOF_SYSENTER_stack, %ecx) + subl%eax, %ecx /* ecx = (end of SYENTER_stack) - esp */ + cmpl$SIZEOF_SYSENTER_stack, %ecx + jb .Ldebug_from_sysenter_stack + + TRACE_IRQS_OFF + calldo_debug + jmp ret_from_exception + +.Ldebug_from_sysenter_stack: + /* We're on the SYSENTER stack. Switch off. */ + movl%esp, %ebp + movlPER_CPU_VAR(cpu_current_top_of_stack), %esp + TRACE_IRQS_OFF calldo_debug + movl%ebp, %esp jmp ret_from_exception END(debug) /* - * NMI is doubly nasty. It can happen _while_ we're handling - * a debug fault, and the debug fault hasn't yet been able to - * clear up the stack. So we first check whether we got an - * NMI on the sysenter entry path, but after that we need to - * check whether we got an NMI on the debug path where the debug - * fault happened on the sysenter path. + * NMI is
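The subl/cmpl/jb sequence in the new ENTRY(debug) code is a one-comparison range check on the stack pointer. Restated in C as a sketch (the function and parameter names are hypothetical; only the arithmetic is taken from the assembly above):

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical C rendering of:
 *     ecx = (end of SYSENTER stack) - esp
 *     cmpl $SIZEOF_SYSENTER_stack, %ecx ; jb on_sysenter_stack
 * The unsigned subtraction wraps when sp is above the end of the
 * stack, so the single "below" comparison rejects both directions.
 */
static bool on_sysenter_stack(uintptr_t sp, uintptr_t stack_base,
			      uintptr_t stack_size)
{
	uintptr_t stack_end = stack_base + stack_size;

	return (stack_end - sp) < stack_size;
}
```

As written, the check accepts stack_base < sp <= stack_end. Since it only inspects the live stack pointer, nothing in the saved CS/EIP frame can influence the outcome, which is the property the commit message relies on.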
[PATCH 05/10] x86/traps: Clear TIF_BLOCKSTEP on all debug exceptions
The SDM says that debug exceptions clear BTF, and we need to keep TIF_BLOCKSTEP in sync with BTF. Clear it unconditionally and improve the comment. I suspect that the fact that kmemcheck could cause TIF_BLOCKSTEP not to be cleared was just an oversight. Signed-off-by: Andy Lutomirski --- arch/x86/kernel/traps.c | 12 +++- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c index dd2c2e66c2e1..19e6cfa501e3 100644 --- a/arch/x86/kernel/traps.c +++ b/arch/x86/kernel/traps.c @@ -598,6 +598,13 @@ dotraplinkage void do_debug(struct pt_regs *regs, long error_code) dr6 &= ~DR6_RESERVED; /* +* The SDM says "The processor clears the BTF flag when it +* generates a debug exception." Clear TIF_BLOCKSTEP to keep +* TIF_BLOCKSTEP in sync with the hardware BTF flag. +*/ + clear_tsk_thread_flag(tsk, TIF_BLOCKSTEP); + + /* * If dr6 has no reason to give us about the origin of this trap, * then it's very likely the result of an icebp/int01 trap. * User wants a sigtrap for that. @@ -612,11 +619,6 @@ dotraplinkage void do_debug(struct pt_regs *regs, long error_code) /* DR6 may or may not be cleared by the CPU */ set_debugreg(0, 6); - /* -* The processor cleared BTF, so don't mark that we need it set. -*/ - clear_tsk_thread_flag(tsk, TIF_BLOCKSTEP); - /* Store the virtualized DR6 value */ tsk->thread.debugreg6 = dr6; -- 2.5.0
[PATCH 04/10] x86/entry/32: Restore FLAGS on SYSEXIT
We weren't restoring FLAGS at all on SYSEXIT. Apparently no one cared. With this patch applied, native kernels should always honor task_pt_regs()->flags, which opens the door for some sys_iopl cleanups. I'll do those as a separate series, though, since getting it right will involve tweaking some paravirt ops. (The short version is that, before this patch, sys_iopl, invoked via SYSENTER, wasn't guaranteed to ever transfer the updated regs->flags, so sys_iopl had to change the hardware flags register as well.) Reported-by: Brian Gerst Signed-off-by: Andy Lutomirski --- arch/x86/entry/entry_32.S | 9 + 1 file changed, 9 insertions(+) diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S index 263ebde6333f..ed171f938960 100644 --- a/arch/x86/entry/entry_32.S +++ b/arch/x86/entry/entry_32.S @@ -343,6 +343,15 @@ sysenter_past_esp: popl%eax/* pt_regs->ax */ /* +* Restore all flags except IF (we restore IF separately because +* STI gives a one-instruction window in which we won't be interrupted, +* whereas POPF does not. +*/ + addl$PT_EFLAGS-PT_DS, %esp /* point esp at pt_regs->flags */ + btr $X86_EFLAGS_IF_BIT, (%esp) + popfl + + /* * Return back to the vDSO, which will pop ecx and edx. * Don't bother with DS and ES (they already contain __USER_DS). */ -- 2.5.0
[PATCH 03/10] x86/entry/32: Filter NT and speed up AC filtering in SYSENTER
This makes the 32-bit code work just like the 64-bit code. It should speed up syscalls on 32-bit kernels on Skylake by something like 20 cycles (by analogy to the 64-bit compat case). It also cleans up NT just like we do for the 64-bit case. Signed-off-by: Andy Lutomirski --- arch/x86/entry/entry_32.S | 23 ++- 1 file changed, 22 insertions(+), 1 deletion(-) diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S index ab710eee4308..263ebde6333f 100644 --- a/arch/x86/entry/entry_32.S +++ b/arch/x86/entry/entry_32.S @@ -294,7 +294,6 @@ sysenter_past_esp: pushl $__USER_DS /* pt_regs->ss */ pushl %ebp/* pt_regs->sp (stashed in bp) */ pushfl /* pt_regs->flags (except IF = 0) */ - ASM_CLAC/* Clear AC after saving FLAGS */ orl $X86_EFLAGS_IF, (%esp) /* Fix IF */ pushl $__USER_CS /* pt_regs->cs */ pushl $0 /* pt_regs->ip = 0 (placeholder) */ @@ -302,6 +301,23 @@ sysenter_past_esp: SAVE_ALL pt_regs_ax=$-ENOSYS/* save rest */ /* +* Sysenter doesn't filter flags, so we need to clear NT and AC +* ourselves. To save a few cycles, we can check whether +* either was set instead of doing an unconditional popfq. +* This needs to happen before enabling interrupts so that +* we don't get preempted with NT set. +* +* NB.: .Lsysenter_fix_flags is a label with the code under it moved +* out-of-line as an optimization: NT is unlikely to be set in the +* majority of the cases and instead of polluting the I$ unnecessarily, +* we're keeping that code behind a branch which will predict as +* not-taken and therefore its instructions won't be fetched. +*/ + testl $X86_EFLAGS_NT|X86_EFLAGS_AC, PT_EFLAGS(%esp) + jnz .Lsysenter_fix_flags +.Lsysenter_flags_fixed: + + /* * User mode is traced as though IRQs are on, and SYSENTER * turned them off. */ @@ -339,6 +355,11 @@ sysenter_past_esp: .popsection _ASM_EXTABLE(1b, 2b) PTGS_TO_GS_EX + +.Lsysenter_fix_flags: + pushl $X86_EFLAGS_FIXED + popfl + jmp .Lsysenter_flags_fixed ENDPROC(entry_SYSENTER_32) # system call handler stub -- 2.5.0
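The fast-path/slow-path split in this patch can be summarised in C. This is a sketch, not kernel code: the helper names are hypothetical, and the flag constants are the architectural EFLAGS bit positions (NT is bit 14, AC is bit 18, bit 1 is the always-set fixed bit).

```c
#include <stdbool.h>
#include <stdint.h>

#define X86_EFLAGS_FIXED 0x00000002u  /* bit 1, always set */
#define X86_EFLAGS_NT    0x00004000u  /* bit 14, nested task */
#define X86_EFLAGS_AC    0x00040000u  /* bit 18, alignment check / SMAP */

/* Mirrors: testl $X86_EFLAGS_NT|X86_EFLAGS_AC, PT_EFLAGS(%esp) */
static bool needs_flag_fixup(uint32_t saved_flags)
{
	return (saved_flags & (X86_EFLAGS_NT | X86_EFLAGS_AC)) != 0;
}

/*
 * Hypothetical model of the entry path: returns the live flags
 * after the check. Only the rare slow path pays for a POPF.
 */
static uint32_t model_sanitize(uint32_t saved_flags, uint32_t live_flags)
{
	if (needs_flag_fixup(saved_flags))
		return X86_EFLAGS_FIXED;  /* pushl $X86_EFLAGS_FIXED; popfl */
	return live_flags;                /* common case: leave flags alone */
}
```

The saving comes from the common case: when neither NT nor AC was set, the entry code executes one test and one not-taken branch instead of an unconditional flag-clearing instruction.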
Re: [PATCH v3 3/3] sched, x86: Check that we're on the right stack in schedule and __might_sleep
On Wed, Nov 19, 2014 at 11:44 AM, Linus Torvalds wrote:
> On Wed, Nov 19, 2014 at 11:29 AM, Andi Kleen wrote:
>>
>> The exception handlers which use the IST stacks don't necessarily
>> set irq count. Maybe they should.
>
> Hmm. I think they should. Since they clearly must not schedule, as
> they use a percpu stack.
>
> Which exceptions use IST?
>
> [ grep grep ]
>
> Looks like stack, doublefault, nmi, debug and mce. And yes, I really
> think they should all raise the irq count if they don't already.
> Rather than add random arch-specific "let's check that we're on the
> right stack" code to the might-sleep stuff, just use the one we have.
>

Resurrecting an old thread:

The outcome of this discussion was that ist_enter now raises
HARDIRQ_COUNT. I think this is causing a problem. If a user program
enables TF, it generates a bunch of debug exceptions. The handlers
raise the IRQ count and do stuff, and apparently some of that stuff
can raise a softirq. (I have no idea where the softirq is being
raised.)

The softirq code notices that we're in_interrupt and doesn't wake
ksoftirqd because it thinks we're about to exit the interrupt and
process the softirq. But we don't, which causes occasional warnings
and confuses things (and me!).

So how do we fix it? If we stop raising HARDIRQ_COUNT (and apply
$SUBJECT?), then raise_softirq will wake ksoftirqd and life is good.
But this seems a bit silly, since, if we entered the ist exception
handler from a context with irqs on and softirqs enabled, we *could*
plausibly handle the softirq right away -- we're on an essentially
empty stack. (Of course, it's a *small* stack, since it could be the
IST stack.)

Or we could just let ksoftirqd do its thing and stop raising
HARDIRQ_COUNT. We could add a new preempt count field just for IST
(yuck). We could try to hijack a different preempt count field
(NMI?).

But I kind of like the idea of just reinstating the original patch of
explicitly checking that we're on a safe stack in schedule and
__might_sleep, since that is the actual condition we care about.

--Andy
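To make the failure mode described above concrete, here is a small user-space model of it. It is a sketch under simplifying assumptions, not the kernel's implementation: every name (cpu_model, model_raise_softirq, model_ist_handler) is invented for the example, and in_interrupt is reduced to "the hardirq count is nonzero".

```c
#include <assert.h>
#include <stdbool.h>

/* Toy per-CPU state; these are illustrative names, not kernel symbols. */
struct cpu_model {
	unsigned int hardirq_count;	/* stands in for HARDIRQ_COUNT */
	unsigned int softirq_pending;	/* bitmask of raised softirqs */
	bool ksoftirqd_woken;
};

static bool model_in_interrupt(const struct cpu_model *cpu)
{
	return cpu->hardirq_count != 0;
}

/* Mark a softirq pending; only wake the thread outside interrupt context. */
static void model_raise_softirq(struct cpu_model *cpu, unsigned int nr)
{
	assert(nr < 32);
	cpu->softirq_pending |= 1u << nr;
	if (!model_in_interrupt(cpu))
		cpu->ksoftirqd_woken = true;
}

/*
 * An IST-style handler that raises the count on entry and drops it on
 * exit, but never processes softirqs on the way out.
 */
static void model_ist_handler(struct cpu_model *cpu, unsigned int nr)
{
	cpu->hardirq_count++;		/* entry raises the count */
	model_raise_softirq(cpu, nr);	/* thread is not woken here */
	cpu->hardirq_count--;		/* exit: softirq stays pending */
}
```

After model_ist_handler returns, the softirq bit is set but nothing has been asked to run it, which is the stranded state the warnings point at. Raising the same softirq from a context with a zero count does wake the thread.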
[PATCH v4 3/5] ocfs2: create/remove sysfile for online file check
Create the online file check sysfile when ocfs2 is mounted, and remove
the related sysfile when ocfs2 is unmounted.

Signed-off-by: Gang He
Reviewed-by: Mark Fasheh
---
 fs/ocfs2/super.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c
index 2de4c8a..5ef88b8 100644
--- a/fs/ocfs2/super.c
+++ b/fs/ocfs2/super.c
@@ -74,6 +74,7 @@
 #include "suballoc.h"

 #include "buffer_head_io.h"
+#include "filecheck.h"

 static struct kmem_cache *ocfs2_inode_cachep;
 struct kmem_cache *ocfs2_dquot_cachep;
@@ -1204,6 +1205,9 @@ static int ocfs2_fill_super(struct super_block *sb, void *data, int silent)
 	/* Start this when the mount is almost sure of being successful */
 	ocfs2_orphan_scan_start(osb);

+	/* Create filecheck sysfile /sys/fs/ocfs2/<devname>/filecheck */
+	ocfs2_filecheck_create_sysfs(sb);
+
 	return status;

 read_super_error:
@@ -1671,6 +1675,7 @@ static void ocfs2_put_super(struct super_block *sb)

 	ocfs2_sync_blockdev(sb);
 	ocfs2_dismount_volume(sb, 0);
+	ocfs2_filecheck_remove_sysfs(sb);
 }

 static int ocfs2_statfs(struct dentry *dentry, struct kstatfs *buf)
--
2.1.2
[PATCH v4 4/5] ocfs2: check/fix inode block for online file check
Implement online check or fix of an inode block while reading the inode
block into memory.

Signed-off-by: Gang He
---
 fs/ocfs2/inode.c       | 225 +++--
 fs/ocfs2/ocfs2_trace.h |   2 +
 2 files changed, 218 insertions(+), 9 deletions(-)

diff --git a/fs/ocfs2/inode.c b/fs/ocfs2/inode.c
index 8f87e05..6ce531e 100644
--- a/fs/ocfs2/inode.c
+++ b/fs/ocfs2/inode.c
@@ -53,6 +53,7 @@
 #include "xattr.h"
 #include "refcounttree.h"
 #include "ocfs2_trace.h"
+#include "filecheck.h"

 #include "buffer_head_io.h"

@@ -74,6 +75,14 @@ static int ocfs2_truncate_for_delete(struct ocfs2_super *osb,
 				    struct inode *inode,
 				    struct buffer_head *fe_bh);

+static int ocfs2_filecheck_read_inode_block_full(struct inode *inode,
+						 struct buffer_head **bh,
+						 int flags, int type);
+static int ocfs2_filecheck_validate_inode_block(struct super_block *sb,
+						struct buffer_head *bh);
+static int ocfs2_filecheck_repair_inode_block(struct super_block *sb,
+					      struct buffer_head *bh);
+
 void ocfs2_set_inode_flags(struct inode *inode)
 {
 	unsigned int flags = OCFS2_I(inode)->ip_attr;
@@ -127,6 +136,7 @@ struct inode *ocfs2_ilookup(struct super_block *sb, u64 blkno)
 struct inode *ocfs2_iget(struct ocfs2_super *osb, u64 blkno, unsigned flags,
 			 int sysfile_type)
 {
+	int rc = 0;
 	struct inode *inode = NULL;
 	struct super_block *sb = osb->sb;
 	struct ocfs2_find_inode_args args;
@@ -161,12 +171,17 @@ struct inode *ocfs2_iget(struct ocfs2_super *osb, u64 blkno, unsigned flags,
 	}
 	trace_ocfs2_iget5_locked(inode->i_state);
 	if (inode->i_state & I_NEW) {
-		ocfs2_read_locked_inode(inode, &args);
+		rc = ocfs2_read_locked_inode(inode, &args);
 		unlock_new_inode(inode);
 	}
 	if (is_bad_inode(inode)) {
 		iput(inode);
-		inode = ERR_PTR(-ESTALE);
+		if ((flags & OCFS2_FI_FLAG_FILECHECK_CHK) ||
+		    (flags & OCFS2_FI_FLAG_FILECHECK_FIX))
+			/* Return OCFS2_FILECHECK_ERR_XXX related errno */
+			inode = ERR_PTR(rc);
+		else
+			inode = ERR_PTR(-ESTALE);
 		goto bail;
 	}

@@ -409,7 +424,7 @@ static int ocfs2_read_locked_inode(struct inode *inode,
 	struct ocfs2_super *osb;
 	struct ocfs2_dinode *fe;
 	struct buffer_head *bh = NULL;
-	int status, can_lock;
+	int status, can_lock, lock_level = 0;
 	u32 generation = 0;

 	status = -EINVAL;
@@ -477,7 +492,7 @@ static int ocfs2_read_locked_inode(struct inode *inode,
 			mlog_errno(status);
 			return status;
 		}
-		status = ocfs2_inode_lock(inode, NULL, 0);
+		status = ocfs2_inode_lock(inode, NULL, lock_level);
 		if (status) {
 			make_bad_inode(inode);
 			mlog_errno(status);
@@ -494,16 +509,32 @@ static int ocfs2_read_locked_inode(struct inode *inode,
 	}

 	if (can_lock) {
-		status = ocfs2_read_inode_block_full(inode, &bh,
-						     OCFS2_BH_IGNORE_CACHE);
+		if (args->fi_flags & OCFS2_FI_FLAG_FILECHECK_CHK)
+			status = ocfs2_filecheck_read_inode_block_full(inode,
+						&bh, OCFS2_BH_IGNORE_CACHE, 0);
+		else if (args->fi_flags & OCFS2_FI_FLAG_FILECHECK_FIX)
+			status = ocfs2_filecheck_read_inode_block_full(inode,
+						&bh, OCFS2_BH_IGNORE_CACHE, 1);
+		else
+			status = ocfs2_read_inode_block_full(inode,
+						&bh, OCFS2_BH_IGNORE_CACHE);
 	} else {
 		status = ocfs2_read_blocks_sync(osb, args->fi_blkno, 1, &bh);
 		/*
 		 * If buffer is in jbd, then its checksum may not have been
 		 * computed as yet.
 		 */
-		if (!status && !buffer_jbd(bh))
-			status = ocfs2_validate_inode_block(osb->sb, bh);
+		if (!status && !buffer_jbd(bh)) {
+			if (args->fi_flags & OCFS2_FI_FLAG_FILECHECK_CHK)
+				status = ocfs2_filecheck_validate_inode_block(
+						osb->sb, bh);
+			else if (args->fi_flags & OCFS2_FI_FLAG_FILECHECK_FIX)
+				status = ocfs2_filecheck_repair_inode_block(
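The change to the bad-inode path in ocfs2_iget() boils down to one decision: a filecheck caller gets the specific error code back, everyone else still sees -ESTALE. A stand-alone sketch of that rule follows. The flag values and the helper name iget_bad_inode_error are assumptions made for the example; the real flag definitions live in fs/ocfs2/inode.h and are not part of the excerpt above.

```c
#include <errno.h>

/* Assumed bit values, for illustration only. */
#define FI_FLAG_FILECHECK_CHK 0x04u
#define FI_FLAG_FILECHECK_FIX 0x08u

/*
 * Hypothetical helper: pick the error to report for a bad inode.
 * Filecheck requests propagate the code from the read path (rc);
 * ordinary lookups keep the historical -ESTALE.
 */
static int iget_bad_inode_error(unsigned int flags, int rc)
{
	if (flags & (FI_FLAG_FILECHECK_CHK | FI_FLAG_FILECHECK_FIX))
		return rc;
	return -ESTALE;
}
```

Keeping -ESTALE as the default means existing callers do not see any behaviour change from this patch.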
[PATCH v4 1/5] ocfs2: export ocfs2_kset for online file check
Export the ocfs2_kset object from the ocfs2_stackglue kernel module, so
that the online file check code can create the related sysfiles under
the ocfs2_kset object.

Signed-off-by: Gang He
Reviewed-by: Mark Fasheh
---
 fs/ocfs2/stackglue.c | 3 ++-
 fs/ocfs2/stackglue.h | 2 ++
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/ocfs2/stackglue.c b/fs/ocfs2/stackglue.c
index 5d965e8..13219ed 100644
--- a/fs/ocfs2/stackglue.c
+++ b/fs/ocfs2/stackglue.c
@@ -629,7 +629,8 @@ static struct attribute_group ocfs2_attr_group = {
 	.attrs = ocfs2_attrs,
 };

-static struct kset *ocfs2_kset;
+struct kset *ocfs2_kset;
+EXPORT_SYMBOL_GPL(ocfs2_kset);

 static void ocfs2_sysfs_exit(void)
 {
diff --git a/fs/ocfs2/stackglue.h b/fs/ocfs2/stackglue.h
index 66334a3..f2dce10 100644
--- a/fs/ocfs2/stackglue.h
+++ b/fs/ocfs2/stackglue.h
@@ -298,4 +298,6 @@ void ocfs2_stack_glue_set_max_proto_version(struct ocfs2_protocol_version *max_p
 int ocfs2_stack_glue_register(struct ocfs2_stack_plugin *plugin);
 void ocfs2_stack_glue_unregister(struct ocfs2_stack_plugin *plugin);

+extern struct kset *ocfs2_kset;
+
 #endif /* STACKGLUE_H */
--
2.1.2
[PATCH v4 5/5] ocfs2: add feature document for online file check
This document describes the OCFS2 online file check feature.

OCFS2 is often used in high-availability systems. However, OCFS2
usually converts the filesystem to read-only when it encounters an
error. This may not be necessary, since turning the filesystem
read-only would affect other running processes as well, decreasing
availability. Then, a mount option (errors=continue) is introduced,
which would return the -EIO errno to the calling process and terminate
further processing so that the filesystem is not corrupted further. The
filesystem is not converted to read-only, and the problematic file's
inode number is reported in the kernel log. The user can try to
check/fix this file via the online filecheck feature.

Signed-off-by: Gang He
Reviewed-by: Mark Fasheh
---
 .../filesystems/ocfs2-online-filecheck.txt | 94 ++
 1 file changed, 94 insertions(+)
 create mode 100644 Documentation/filesystems/ocfs2-online-filecheck.txt

diff --git a/Documentation/filesystems/ocfs2-online-filecheck.txt b/Documentation/filesystems/ocfs2-online-filecheck.txt
new file mode 100644
index 000..1ab0786
--- /dev/null
+++ b/Documentation/filesystems/ocfs2-online-filecheck.txt
@@ -0,0 +1,94 @@
+	OCFS2 online file check
+	-----------------------
+
+This document will describe OCFS2 online file check feature.
+
+Introduction
+============
+OCFS2 is often used in high-availability systems. However, OCFS2 usually
+converts the filesystem to read-only when it encounters an error. This may not be
+necessary, since turning the filesystem read-only would affect other running
+processes as well, decreasing availability.
+Then, a mount option (errors=continue) is introduced, which would return the
+-EIO errno to the calling process and terminate further processing so that the
+filesystem is not corrupted further. The filesystem is not converted to
+read-only, and the problematic file's inode number is reported in the kernel
+log. The user can try to check/fix this file via online filecheck feature.
+
+Scope
+=====
+This effort is to check/fix small issues which may hinder day-to-day operations
+of a cluster filesystem by turning the filesystem read-only. The scope of
+checking/fixing is at the file level, initially for regular files and eventually
+to all files (including system files) of the filesystem.
+
+In case the directory to file links are incorrect, the directory inode is
+reported as erroneous.
+
+This feature is not suited for extravagant checks which involve dependency of
+other components of the filesystem, such as but not limited to, checking if the
+bits for file blocks in the allocation have been set. In case of such an error,
+the offline fsck should/would be recommended.
+
+Finally, such an operation/feature should not be automated lest the filesystem
+may end up with more damage than before the repair attempt. So, this has to
+be performed using user interaction and consent.
+
+User interface
+==============
+When there are errors in the OCFS2 filesystem, they are usually accompanied
+by the inode number which caused the error. This inode number would be the
+input to check/fix the file.
+
+There is a sysfs directory for each OCFS2 file system mounting:
+
+  /sys/fs/ocfs2/<devname>/filecheck
+
+Here, <devname> indicates the name of OCFS2 volume device which has been already
+mounted. The file above would accept inode numbers. This could be used to
+communicate with kernel space, tell which file(inode number) will be checked or
+fixed. Currently, three operations are supported, which includes checking
+inode, fixing inode and setting the size of result record history.
+
+1. If you want to know what error exactly happened to <inode> before fixing, do
+
+  # echo "<inode>" > /sys/fs/ocfs2/<devname>/filecheck/check
+  # cat /sys/fs/ocfs2/<devname>/filecheck/check
+
+The output is like this:
+  INO		DONE	ERROR
+39502		1	GENERATION
+
+<INO> lists the inode numbers.
+<DONE> indicates whether the operation has been finished.
+<ERROR> says what kind of errors was found. For the detailed error numbers,
+please refer to the file linux/fs/ocfs2/filecheck.h.
+
+2. If you determine to fix this inode, do
+
+  # echo "<inode>" > /sys/fs/ocfs2/<devname>/filecheck/fix
+  # cat /sys/fs/ocfs2/<devname>/filecheck/fix
+
+The output is like this:
+  INO		DONE	ERROR
+39502		1	SUCCESS
+
+This time, the <ERROR> column indicates whether this fix is successful or not.
+
+3. The record cache is used to store the history of check/fix results. Its
+default size is 10, and can be adjusted between the range of 10 ~ 100. You can
+adjust the size like this:
+
+  # echo "<size>" > /sys/fs/ocfs2/<devname>/filecheck/set
+
+Fixing stuff
+============
+On receiving the inode, the filesystem would read the inode and the
+file metadata. In case of errors, the filesystem would fix the errors
+and report the problems it fixed in the kernel log. As a precautionary measure,
+the inode must first be checked for errors before performing a final fix.
+
+The inode and the result
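The same requests can be issued from a program instead of a shell. The sketch below is only an illustration of the interface described in the document: the helper names filecheck_path and filecheck_request are made up, and the path layout /sys/fs/ocfs2/devname/filecheck/{check,fix,set} is taken from the text above and the cover letter. It has not been run against a real OCFS2 mount.

```c
#include <stdio.h>
#include <string.h>

/*
 * Build the sysfs path for one filecheck attribute ("check", "fix" or
 * "set"). Returns 0 on success, -1 if devname is unusable or the path
 * does not fit in buf.
 */
static int filecheck_path(char *buf, size_t len,
			  const char *devname, const char *attr)
{
	int n;

	if (devname[0] == '\0' || strchr(devname, '/') != NULL)
		return -1;
	n = snprintf(buf, len, "/sys/fs/ocfs2/%s/filecheck/%s", devname, attr);
	if (n < 0 || (size_t)n >= len)
		return -1;
	return 0;
}

/*
 * Write one decimal value (an inode number, or a record-history size
 * for "set") to the attribute. Returns 0 on success, -1 on any failure.
 */
static int filecheck_request(const char *devname, const char *attr,
			     unsigned long value)
{
	char path[256];
	FILE *f;
	int ok;

	if (filecheck_path(path, sizeof(path), devname, attr) != 0)
		return -1;
	f = fopen(path, "w");
	if (f == NULL)
		return -1;
	ok = fprintf(f, "%lu\n", value) > 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

A caller would issue filecheck_request("sda1", "check", 39502) and then read the check file back to see the DONE and ERROR columns. Since the document asks for explicit user consent, a wrapper like this should not be driven from an automatic repair loop.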
[PATCH v4 2/5] ocfs2: sysfile interfaces for online file check
Implement online file check sysfile interfaces, e.g. how to create the
related sysfile according to device name, how to display/handle file
check request from the sysfile.

Signed-off-by: Gang He
---
 fs/ocfs2/Makefile    |   3 +-
 fs/ocfs2/filecheck.c | 606 +++
 fs/ocfs2/filecheck.h |  49 +
 fs/ocfs2/inode.h     |   3 +
 4 files changed, 660 insertions(+), 1 deletion(-)
 create mode 100644 fs/ocfs2/filecheck.c
 create mode 100644 fs/ocfs2/filecheck.h

diff --git a/fs/ocfs2/Makefile b/fs/ocfs2/Makefile
index ce210d4..e27e652 100644
--- a/fs/ocfs2/Makefile
+++ b/fs/ocfs2/Makefile
@@ -41,7 +41,8 @@ ocfs2-objs := \
 	quota_local.o \
 	quota_global.o \
 	xattr.o \
-	acl.o
+	acl.o \
+	filecheck.o

 ocfs2_stackglue-objs := stackglue.o
 ocfs2_stack_o2cb-objs := stack_o2cb.o
diff --git a/fs/ocfs2/filecheck.c b/fs/ocfs2/filecheck.c
new file mode 100644
index 000..2cabbcf
--- /dev/null
+++ b/fs/ocfs2/filecheck.c
@@ -0,0 +1,606 @@
+/* -*- mode: c; c-basic-offset: 8; -*-
+ * vim: noexpandtab sw=8 ts=8 sts=0:
+ *
+ * filecheck.c
+ *
+ * Code which implements online file check.
+ *
+ * Copyright (C) 2016 SuSE. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public
+ * License as published by the Free Software Foundation, version 2.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ */
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include "ocfs2.h"
+#include "ocfs2_fs.h"
+#include "stackglue.h"
+#include "inode.h"
+
+#include "filecheck.h"
+
+
+/* File check error strings,
+ * must correspond with error number in header file.
+ */
+static const char * const ocfs2_filecheck_errs[] = {
+	"SUCCESS",
+	"FAILED",
+	"INPROGRESS",
+	"READONLY",
+	"INJBD",
+	"INVALIDINO",
+	"BLOCKECC",
+	"BLOCKNO",
+	"VALIDFLAG",
+	"GENERATION",
+	"UNSUPPORTED"
+};
+
+static DEFINE_SPINLOCK(ocfs2_filecheck_sysfs_lock);
+static LIST_HEAD(ocfs2_filecheck_sysfs_list);
+
+struct ocfs2_filecheck {
+	struct list_head fc_head;	/* File check entry list head */
+	spinlock_t fc_lock;
+	unsigned int fc_max;	/* Maximum number of entry in list */
+	unsigned int fc_size;	/* Current entry count in list */
+	unsigned int fc_done;	/* Finished entry count in list */
+};
+
+struct ocfs2_filecheck_sysfs_entry {	/* sysfs entry per mounting */
+	struct list_head fs_list;
+	atomic_t fs_count;
+	struct super_block *fs_sb;
+	struct kset *fs_devicekset;
+	struct kset *fs_fcheckkset;
+	struct ocfs2_filecheck *fs_fcheck;
+};
+
+#define OCFS2_FILECHECK_MAXSIZE		100
+#define OCFS2_FILECHECK_MINSIZE		10
+
+/* File check operation type */
+enum {
+	OCFS2_FILECHECK_TYPE_CHK = 0,	/* Check a file(inode) */
+	OCFS2_FILECHECK_TYPE_FIX,	/* Fix a file(inode) */
+	OCFS2_FILECHECK_TYPE_SET = 100	/* Set entry list maximum size */
+};
+
+struct ocfs2_filecheck_entry {
+	struct list_head fe_list;
+	unsigned long fe_ino;
+	unsigned int fe_type;
+	unsigned int fe_done:1;
+	unsigned int fe_status:31;
+};
+
+struct ocfs2_filecheck_args {
+	unsigned int fa_type;
+	union {
+		unsigned long fa_ino;
+		unsigned int fa_len;
+	};
+};
+
+static const char *
+ocfs2_filecheck_error(int errno)
+{
+	if (!errno)
+		return ocfs2_filecheck_errs[errno];
+
+	BUG_ON(errno < OCFS2_FILECHECK_ERR_START ||
+	       errno > OCFS2_FILECHECK_ERR_END);
+	return ocfs2_filecheck_errs[errno - OCFS2_FILECHECK_ERR_START + 1];
+}
+
+static ssize_t ocfs2_filecheck_show(struct kobject *kobj,
+				    struct kobj_attribute *attr,
+				    char *buf);
+static ssize_t ocfs2_filecheck_store(struct kobject *kobj,
+				     struct kobj_attribute *attr,
+				     const char *buf, size_t count);
+static struct kobj_attribute ocfs2_attr_filecheck_chk =
+	__ATTR(check, S_IRUSR | S_IWUSR,
+	       ocfs2_filecheck_show,
+	       ocfs2_filecheck_store);
+static struct kobj_attribute ocfs2_attr_filecheck_fix =
+	__ATTR(fix, S_IRUSR | S_IWUSR,
+	       ocfs2_filecheck_show,
+	       ocfs2_filecheck_store);
+static struct kobj_attribute
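The index arithmetic in ocfs2_filecheck_error() is easy to get wrong by one, so here is a user-space sketch of the same lookup. Two things in it are assumptions rather than facts from the patch: the base value 1000 for the first failure code (filecheck.h is not shown in the excerpt above), and the choice to return "UNKNOWN" for out-of-range codes where the patch uses BUG_ON. The names are shortened to make clear this is not the kernel function.

```c
#include <stddef.h>

/* Same order as the string table in the patch. */
static const char * const filecheck_errs[] = {
	"SUCCESS", "FAILED", "INPROGRESS", "READONLY", "INJBD", "INVALIDINO",
	"BLOCKECC", "BLOCKNO", "VALIDFLAG", "GENERATION", "UNSUPPORTED"
};

/* Assumed numbering: 0 is success, failures run consecutively from START. */
#define FILECHECK_ERR_START 1000
#define FILECHECK_ERR_END \
	(FILECHECK_ERR_START + \
	 (int)(sizeof(filecheck_errs) / sizeof(filecheck_errs[0])) - 2)

/*
 * Map an error code to its string. Slot 0 is reserved for success, so
 * the first failure code lands in slot 1, hence the "+ 1".
 */
static const char *filecheck_error(int err)
{
	if (err == 0)
		return filecheck_errs[0];
	if (err < FILECHECK_ERR_START || err > FILECHECK_ERR_END)
		return "UNKNOWN";	/* the patch uses BUG_ON here instead */
	return filecheck_errs[err - FILECHECK_ERR_START + 1];
}
```

With eleven strings in the table, the last valid code under this numbering is START + 9, which indexes the final "UNSUPPORTED" entry.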
[PATCH v4 0/5] Add online file check feature
When there are errors in the ocfs2 filesystem, they are usually
accompanied by the inode number which caused the error. This inode
number would be the input to fixing the file. One of these options
could be considered: a file in the sys filesystem which would accept
inode numbers. This could be used to communicate back what has to be
fixed or has been fixed. You could write:
$# echo "" > /sys/fs/ocfs2/devname/filecheck/check
or
$# echo "" > /sys/fs/ocfs2/devname/filecheck/fix

Compared with the third version, I added a buffer_jbd() check in the
inode block fix/dirty buffer write-back path, changed the members of
the ocfs2_filecheck_entry struct from unsigned short to unsigned int,
and added a feature document to this patch set.

Compared with the second version, I redesigned the filecheck sysfs
interfaces: there are three sysfs files (check, fix and set) under the
filecheck directory (see above), and sysfs will accept only one
argument . Second, I adjusted some code in the
ocfs2_filecheck_repair_inode_block() function according to upstream
feedback: we cannot just add the VALID_FL flag back as an inode block
fix, so this field corruption will not be fixed until there is a
complete solution.

Compared with the first version, I use strncasecmp instead of two
strncmp calls. Second, I updated the source file contribution vendor.

Gang He (5):
  ocfs2: export ocfs2_kset for online file check
  ocfs2: sysfile interfaces for online file check
  ocfs2: create/remove sysfile for online file check
  ocfs2: check/fix inode block for online file check
  ocfs2: add feature document for online file check

 .../filesystems/ocfs2-online-filecheck.txt |  94
 fs/ocfs2/Makefile                          |   3 +-
 fs/ocfs2/filecheck.c                       | 606 +
 fs/ocfs2/filecheck.h                       |  49 ++
 fs/ocfs2/inode.c                           | 225 +++-
 fs/ocfs2/inode.h                           |   3 +
 fs/ocfs2/ocfs2_trace.h                     |   2 +
 fs/ocfs2/stackglue.c                       |   3 +-
 fs/ocfs2/stackglue.h                       |   2 +
 fs/ocfs2/super.c                           |   5 +
 10 files changed, 981 insertions(+), 11 deletions(-)
 create mode 100644 Documentation/filesystems/ocfs2-online-filecheck.txt
 create mode 100644 fs/ocfs2/filecheck.c
 create mode 100644 fs/ocfs2/filecheck.h

--
2.1.2
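
The cover letter describes driving the feature by writing a single argument to a sysfs file such as check or fix. A minimal userspace sketch of that write follows. It assumes the argument is the inode number written as a decimal string, which the text above does not confirm (the argument itself is missing from the echo examples); filecheck_write_ino() is a hypothetical helper, not part of the patch set.

```c
#include <stdio.h>

/*
 * Write one inode number, as a decimal string followed by a newline,
 * to the file at @path (for example a filecheck "check" or "fix" file).
 * The decimal format is an assumption made for this sketch.
 * Returns 0 on success, -1 on any open/write/close failure.
 */
static int filecheck_write_ino(const char *path, unsigned long ino)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;

	ok = fprintf(f, "%lu\n", ino) > 0;

	/* sysfs store errors may only surface when the stream is flushed */
	if (fclose(f) != 0)
		ok = 0;

	return ok ? 0 : -1;
}
```

A caller would pass the path of the check or fix file for the mounted device together with the inode number reported in the filesystem error message.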
linux-next: manual merge of the kvm-arm tree with the arm64 tree
Hi all,

Today's linux-next merge of the kvm-arm tree got a conflict in:

  arch/arm64/include/asm/cpufeature.h

between commit:

  104a0c02e8b1 ("arm64: Add workaround for Cavium erratum 27456")

from the arm64 tree and commit:

  d0be74f771d5 ("arm64: Add ARM64_HAS_VIRT_HOST_EXTN feature")

from the kvm-arm tree.

I fixed it up (see below) and can carry the fix as necessary (no action
is required).

--
Cheers,
Stephen Rothwell

diff --cc arch/arm64/include/asm/cpufeature.h
index 1497163213ed,a5c769b1c65b..
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@@ -30,12 -30,12 +30,13 @@@
  #define ARM64_HAS_LSE_ATOMICS			5
  #define ARM64_WORKAROUND_CAVIUM_23154		6
  #define ARM64_WORKAROUND_834220			7
 -/* #define ARM64_HAS_NO_HW_PREFETCH		8 */
 -/* #define ARM64_HAS_UAO			9 */
 -/* #define ARM64_ALT_PAN_NOT_UAO		10 */
 +#define ARM64_HAS_NO_HW_PREFETCH		8
 +#define ARM64_HAS_UAO				9
 +#define ARM64_ALT_PAN_NOT_UAO			10
+ #define ARM64_HAS_VIRT_HOST_EXTN		11
 +#define ARM64_WORKAROUND_CAVIUM_27456		12

 -#define ARM64_NCAPS				12
 +#define ARM64_NCAPS				13

  #ifndef __ASSEMBLY__
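
The resolution gives each new capability its own number and raises ARM64_NCAPS to 13. The sketch below illustrates why both parts matter, under the assumption that capability numbers are used as bit positions in a bitmap sized by ARM64_NCAPS. It is an illustration only, not a copy of the kernel's code; caps_sketch, cap_set_sketch() and cap_test_sketch() are hypothetical names.

```c
#include <limits.h>

/* Values taken from the resolved header in the diff */
#define ARM64_HAS_VIRT_HOST_EXTN		11
#define ARM64_WORKAROUND_CAVIUM_27456		12
#define ARM64_NCAPS				13

#define BITS_PER_WORD	(sizeof(unsigned long) * CHAR_BIT)
#define CAP_WORDS	((ARM64_NCAPS + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* Two capabilities sharing a number would alias the same bit. */
_Static_assert(ARM64_HAS_VIRT_HOST_EXTN != ARM64_WORKAROUND_CAVIUM_27456,
	       "capability numbers must be unique");
/* The count must exceed the highest number or the bitmap is too small. */
_Static_assert(ARM64_WORKAROUND_CAVIUM_27456 < ARM64_NCAPS,
	       "ARM64_NCAPS must be greater than every capability number");

static unsigned long caps_sketch[CAP_WORDS];

/* Mark capability @cap as present; returns -1 if @cap is out of range. */
static int cap_set_sketch(unsigned int cap)
{
	if (cap >= ARM64_NCAPS)
		return -1;
	caps_sketch[cap / BITS_PER_WORD] |= 1UL << (cap % BITS_PER_WORD);
	return 0;
}

/* Return 1 if capability @cap is marked present, 0 otherwise. */
static int cap_test_sketch(unsigned int cap)
{
	if (cap >= ARM64_NCAPS)
		return 0;
	return (int)((caps_sketch[cap / BITS_PER_WORD] >>
		      (cap % BITS_PER_WORD)) & 1UL);
}
```

Had both definitions kept the same number, or had ARM64_NCAPS stayed at 12, one of the static assertions in this sketch would fail at compile time.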
Re: [PATCH V3 3/3] vhost_net: basic polling support
On 02/29/2016 05:56 AM, Christian Borntraeger wrote:
> On 02/26/2016 09:42 AM, Jason Wang wrote:
>> This patch tries to poll for new added tx buffer or socket receive
>> queue for a while at the end of tx/rx processing. The maximum time
>> spent on polling were specified through a new kind of vring ioctl.
>>
>> Signed-off-by: Jason Wang
>> ---
>>  drivers/vhost/net.c        | 79 +++---
>>  drivers/vhost/vhost.c      | 14
>>  drivers/vhost/vhost.h      |  1 +
>>  include/uapi/linux/vhost.h |  6
>>  4 files changed, 95 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
>> index 9eda69e..c91af93 100644
>> --- a/drivers/vhost/net.c
>> +++ b/drivers/vhost/net.c
>> @@ -287,6 +287,44 @@ static void vhost_zerocopy_callback(struct ubuf_info *ubuf, bool success)
>>  	rcu_read_unlock_bh();
>>  }
>>
>> +static inline unsigned long busy_clock(void)
>> +{
>> +	return local_clock() >> 10;
>> +}
>> +
>> +static bool vhost_can_busy_poll(struct vhost_dev *dev,
>> +				unsigned long endtime)
>> +{
>> +	return likely(!need_resched()) &&
>> +	       likely(!time_after(busy_clock(), endtime)) &&
>> +	       likely(!signal_pending(current)) &&
>> +	       !vhost_has_work(dev) &&
>> +	       single_task_running();
>> +}
>> +
>> +static int vhost_net_tx_get_vq_desc(struct vhost_net *net,
>> +				    struct vhost_virtqueue *vq,
>> +				    struct iovec iov[], unsigned int iov_size,
>> +				    unsigned int *out_num, unsigned int *in_num)
>> +{
>> +	unsigned long uninitialized_var(endtime);
>> +	int r = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
>> +				  out_num, in_num, NULL, NULL);
>> +
>> +	if (r == vq->num && vq->busyloop_timeout) {
>> +		preempt_disable();
>> +		endtime = busy_clock() + vq->busyloop_timeout;
>> +		while (vhost_can_busy_poll(vq->dev, endtime) &&
>> +		       vhost_vq_avail_empty(vq->dev, vq))
>> +			cpu_relax();
> Can you use cpu_relax_lowlatency (which should be the same as cpu_relax
> for almost everybody but s390)? cpu_relax (without low latency) might
> give up the time slice when running under another hypervisor (like LPAR
> on s390), which might not be what we want here.

Ok, will do this in next version.
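
The quoted patch polls until either the queue has work or a deadline on a coarse clock passes. A userspace sketch of that deadline-bounded loop follows. It is an analogy under stated assumptions, not the vhost code: clock_gettime(CLOCK_MONOTONIC) stands in for local_clock(), the ready() callback stands in for the "avail ring no longer empty" check, and every name ending in _sketch, as well as countdown_ready(), is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Monotonic time in units of 1024 ns, mirroring "local_clock() >> 10". */
static unsigned long busy_clock_sketch(void)
{
	struct timespec ts;
	unsigned long long ns;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	ns = (unsigned long long)ts.tv_sec * 1000000000ULL +
	     (unsigned long long)ts.tv_nsec;
	return (unsigned long)(ns >> 10);
}

/* Wrap-safe "a is later than b", same idea as the kernel's time_after(). */
static int time_after_sketch(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/*
 * Spin until ready(arg) returns nonzero or @timeout clock units elapse.
 * Returns 1 if the condition became true, 0 if the deadline passed first.
 */
static int busy_poll_sketch(int (*ready)(void *), void *arg,
			    unsigned long timeout)
{
	unsigned long endtime = busy_clock_sketch() + timeout;

	while (!ready(arg)) {
		if (time_after_sketch(busy_clock_sketch(), endtime))
			return 0;
		/* a real implementation would issue a CPU relax hint here */
	}
	return 1;
}

/*
 * Example condition: reports ready once the counter has reached zero,
 * decrementing a positive counter on each call. A negative counter is
 * never ready.
 */
static int countdown_ready(void *arg)
{
	int *left = arg;

	if (*left == 0)
		return 1;
	if (*left > 0)
		(*left)--;
	return 0;
}
```

The sketch deliberately omits the other exit conditions in vhost_can_busy_poll() (need_resched(), signal_pending(), vhost_has_work(), single_task_running()), since those have no direct userspace equivalent.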
Re: [PATCH V3 3/3] vhost_net: basic polling support
On 02/28/2016 10:09 PM, Michael S. Tsirkin wrote:
> On Fri, Feb 26, 2016 at 04:42:44PM +0800, Jason Wang wrote:
>> This patch tries to poll for new added tx buffer or socket receive
>> queue for a while at the end of tx/rx processing. The maximum time
>> spent on polling were specified through a new kind of vring ioctl.
>>
>> Signed-off-by: Jason Wang
> Looks good overall, but I still see one problem.
>
>> ---
>>  drivers/vhost/net.c        | 79 +++---
>>  drivers/vhost/vhost.c      | 14
>>  drivers/vhost/vhost.h      |  1 +
>>  include/uapi/linux/vhost.h |  6
>>  4 files changed, 95 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
>> index 9eda69e..c91af93 100644
>> --- a/drivers/vhost/net.c
>> +++ b/drivers/vhost/net.c
>> @@ -287,6 +287,44 @@ static void vhost_zerocopy_callback(struct ubuf_info *ubuf, bool success)
>>  	rcu_read_unlock_bh();
>>  }
>>
>> +static inline unsigned long busy_clock(void)
>> +{
>> +	return local_clock() >> 10;
>> +}
>> +
>> +static bool vhost_can_busy_poll(struct vhost_dev *dev,
>> +				unsigned long endtime)
>> +{
>> +	return likely(!need_resched()) &&
>> +	       likely(!time_after(busy_clock(), endtime)) &&
>> +	       likely(!signal_pending(current)) &&
>> +	       !vhost_has_work(dev) &&
>> +	       single_task_running();
> So I find it quite unfortunate that this still uses single_task_running.
> This means that for example a SCHED_IDLE task will prevent polling from
> becoming active, and that seems like a bug, or at least
> an undocumented feature :).

Yes, it may need more thoughts.

>
> Unfortunately this logic affects the behaviour as observed
> by userspace, so we can't merge it like this and tune
> afterwards, since otherwise management tools will start
> depending on this logic.
>
>

How about remove single_task_running() first here and optimize on top?
We probably need something like this to handle overcommitment.
[lkp] [n_tty] dd9a6fee68: INFO: possible circular locking dependency detected
FYI, we noticed the below changes on

https://github.com/0day-ci/linux Brian-Bloniarz/Re-n_tty-Check-the-other-end-of-pty-pair-before-returning-EAGAIN-on-a-read/20160229-070452
commit dd9a6fee6830f16f602b1aa2e85d6307acd04945 ("n_tty: Check the other end of pty pair before returning EAGAIN on a read()")

+----------------------------------------------------+----------+------------+
|                                                    | v4.5-rc6 | dd9a6fee68 |
+----------------------------------------------------+----------+------------+
| boot_successes                                     | 128      | 2          |
| boot_failures                                      | 9        | 6          |
| invoked_oom-killer:gfp_mask=0x                     | 9        | 1          |
| Mem-Info                                           | 9        | 2          |
| Out_of_memory:Kill_process                         | 9        | 1          |
| backtrace:vfs_write                                | 1        |            |
| backtrace:SyS_write                                | 1        |            |
| backtrace:do_execveat_common                       | 1        |            |
| backtrace:compat_SyS_execve                        | 1        |            |
| backtrace:vfs_read                                 | 1        | 4          |
| backtrace:SyS_read                                 | 1        | 4          |
| backtrace:compat_process_vm_rw                     | 1        |            |
| backtrace:compat_SyS_process_vm_readv              | 1        |            |
| backtrace:_do_fork                                 | 1        |            |
| backtrace:SyS_clone                                | 1        |            |
| page_allocation_failure:order:#,mode               | 0        | 1          |
| warn_alloc_failed+0x                               | 0        | 1          |
| backtrace:kswapd                                   | 0        | 1          |
| INFO:possible_circular_locking_dependency_detected | 0        | 4          |
| backtrace:flush_to_ldisc                           | 0        | 4          |
+----------------------------------------------------+----------+------------+

[ 17.523349] mount (2393) used greatest stack depth: 12392 bytes left
[ 17.684314]
[ 17.684972] ==
[ 17.686059] [ INFO: possible circular locking dependency detected ]
[ 17.687174] 4.5.0-rc6-1-gdd9a6fe #64 Not tainted
[ 17.688127] ---
[ 17.689216] bootlogd/2434 is trying to acquire lock:
[ 17.690167]  ((>work)){+.+...}, at: [] flush_work+0x5/0x23d
[ 17.692006]
[ 17.692006] but task is already holding lock:
[ 17.693433]  (>termios_rwsem){..}, at: [] n_tty_read+0xd0/0x882
[ 17.695346]
[ 17.695346] which lock already depends on the new lock.
[ 17.695346]
[ 17.697370]
[ 17.697370] the existing dependency chain (in reverse order) is:
[ 17.698961] -> #2 (>termios_rwsem){..}:
[ 17.700507]        [] lock_acquire+0x147/0x1e2
[ 17.701621]        [] down_read+0x48/0x90
[ 17.702696]        [] n_tty_receive_buf_common+0x46/0x8c0
[ 17.703900]        [] n_tty_receive_buf2+0x14/0x16
[ 17.705046]        [] flush_to_ldisc+0xcb/0x125
[ 17.706167]        [] process_one_work+0x2b8/0x5b2
[ 17.707339]        [] worker_thread+0x28b/0x37d
[ 17.708454]        [] kthread+0xfb/0x103
[ 17.709511]        [] ret_from_fork+0x3f/0x70
[ 17.710614] -> #1 (>lock){+.+...}:
[ 17.712070]        [] lock_acquire+0x147/0x1e2
[ 17.713185]        [] mutex_lock_nested+0x79/0x35f
[ 17.714328]        [] flush_to_ldisc+0x4b/0x125
[ 17.715443]        [] process_one_work+0x2b8/0x5b2
[ 17.716587]        [] worker_thread+0x28b/0x37d
[ 17.717700]        [] kthread+0xfb/0x103
[ 17.718752]        [] ret_from_fork+0x3f/0x70
[ 17.719855] -> #0 ((>work)){+.+...}:
[ 17.721333]        [] __lock_acquire+0x12dd/0x1932
[ 17.722489]        [] lock_acquire+0x147/0x1e2
[ 17.723598]        [] flush_work+0x3a/0x23d
[ 17.724683]        [] n_tty_read+0x308/0x882
[ 17.725771]        [] tty_read+0x8b/0xcd
[ 17.726830]        [] __vfs_read+0x26/0xb9
[ 17.727910]        [] vfs_read+0xa0/0x12e
[ 17.728974]        [] SyS_read+0x51/0x92
[ 17.730032]        [] entry_SYSCALL_64_fastpath+0x12/0x72
[ 17.731237]
[ 17.731237] other info that might help us debug this:
[ 17.731237]
[ 17.733255] Chain exists of:
  (>work) --> >lock --> >termios_rwsem

[ 17.735644]  Possible unsafe locking scenario:
[ 17.735644]
[ 17.737064]        CPU0                    CPU1
[ 17.737969]
[ 17.738873]   lock(>termios_rwsem);
[ 17.739832]                                lock(>lock);
[ 17.740966]                                lock(>termios_rwsem);
[ 17.742181]   lock((>work));
[ 17.743081]
[ 17.743081]  *** DEADLOCK ***
[ 17.743081]
[ 17.744901] 3 locks held by
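
The report describes an existing chain (work, then lock, then termios_rwsem) and a new acquisition of the work item while termios_rwsem is held, which closes the loop. The sketch below shows the general idea of detecting such a cycle in a lock-order graph. It is a simplified illustration, not the kernel's lockdep implementation; the LOCK_* identifiers and all function names are hypothetical stand-ins for the three locks in the log.

```c
#define MAX_LOCKS 8

/* Hypothetical ids for the three lock classes named in the report */
enum { LOCK_WORK, LOCK_BUF, LOCK_TERMIOS_RWSEM };

struct lock_graph {
	/* edge[a][b] != 0: lock b has been acquired while a was held */
	unsigned char edge[MAX_LOCKS][MAX_LOCKS];
};

/* Depth-first search: can @to be reached from @from along recorded edges? */
static int lock_reachable(const struct lock_graph *g, int from, int to,
			  unsigned char *seen)
{
	int next;

	if (from == to)
		return 1;
	seen[from] = 1;
	for (next = 0; next < MAX_LOCKS; next++) {
		if (g->edge[from][next] && !seen[next] &&
		    lock_reachable(g, next, to, seen))
			return 1;
	}
	return 0;
}

/*
 * Record "acquire @taken while holding @held".
 * If @held is already reachable from @taken, the new edge would close a
 * cycle (a possible deadlock): return -1 and record nothing.
 */
static int lock_add_dependency(struct lock_graph *g, int held, int taken)
{
	unsigned char seen[MAX_LOCKS] = { 0 };

	if (lock_reachable(g, taken, held, seen))
		return -1;
	g->edge[held][taken] = 1;
	return 0;
}
```

Feeding the sketch the two existing dependencies from the chain (work to lock, lock to termios_rwsem) succeeds, while the third (termios_rwsem to work, the flush_work() call from n_tty_read()) is rejected as a cycle.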