Re: [PATCH v4 5/5] staging/android: add flags member to sync ioctl structs
On 27 February 2016 at 15:27, Gustavo Padovan wrote: > Hi Emil, > > 2016-02-27 Emil Velikov : > >> Hi Gustavo, >> >> On 26 February 2016 at 18:31, Gustavo Padovan wrote: >> > From: Gustavo Padovan >> > >> > Play safe and add flags member to all structs. So we don't need to >> > break API or create new IOCTL in the future if new features that requires >> > flags arises. >> > >> > v2: check if flags are valid (zero, in this case) >> > >> > Signed-off-by: Gustavo Padovan >> > --- >> > drivers/staging/android/sync.c | 7 ++- >> > drivers/staging/android/uapi/sync.h | 6 ++ >> > 2 files changed, 12 insertions(+), 1 deletion(-) >> > >> > diff --git a/drivers/staging/android/sync.c >> > b/drivers/staging/android/sync.c >> > index 837cff5..54fd5ab 100644 >> > --- a/drivers/staging/android/sync.c >> > +++ b/drivers/staging/android/sync.c >> > @@ -445,6 +445,11 @@ static long sync_file_ioctl_merge(struct sync_file >> > *sync_file, >> > goto err_put_fd; >> > } >> > >> > + if (data.flags) { >> > + err = -EFAULT; >> -EINVAL ? >> >> > + goto err_put_fd; >> > + } >> > + >> > fence2 = sync_file_fdget(data.fd2); >> > if (!fence2) { >> > err = -ENOENT; >> > @@ -511,7 +516,7 @@ static long sync_file_ioctl_fence_info(struct >> > sync_file *sync_file, >> > if (copy_from_user(&in, (void __user *)arg, sizeof(*info))) >> > return -EFAULT; >> > >> > - if (in.status || strcmp(in.name, "\0")) >> > + if (in.status || in.flags || strcmp(in.name, "\0")) >> > return -EFAULT; >> -EINVAL ? 
>> >> > >> > if (in.num_fences && !in.sync_fence_info) >> > diff --git a/drivers/staging/android/uapi/sync.h >> > b/drivers/staging/android/uapi/sync.h >> > index 9aad623..f56a6c2 100644 >> > --- a/drivers/staging/android/uapi/sync.h >> > +++ b/drivers/staging/android/uapi/sync.h >> > @@ -19,11 +19,13 @@ >> > * @fd2: file descriptor of second fence >> > * @name: name of new fence >> > * @fence: returns the fd of the new fence to userspace >> > + * @flags: merge_data flags >> > */ >> > struct sync_merge_data { >> > __s32 fd2; >> > charname[32]; >> > __s32 fence; >> > + __u32 flags; >> The overall size of the struct is not multiple of 64bit, so things >> will end up badly if we decide to extend it in the future. Even if >> there's a small chance that update will be needed, we might as well >> pad it now (and check the padding for zero, returning -EINVAL). > > I think name could be the first field here. > Up-to you really. I'm afraid that it doesn't resolve the issue :-( As a test add a u64 value at the end of the struct and check the output of pahole for 32 and 64 bit build. >> >> > }; >> > >> > /** >> > @@ -31,12 +33,14 @@ struct sync_merge_data { >> > * @obj_name: name of parent sync_timeline >> > * @driver_name: name of driver implementing the parent >> > * @status:status of the fence 0:active 1:signaled <0:error >> > + * @flags: fence_info flags >> > * @timestamp_ns: timestamp of status change in nanoseconds >> > */ >> > struct sync_fence_info { >> > charobj_name[32]; >> > chardriver_name[32]; >> > __s32 status; >> > + __u32 flags; >> > __u64 timestamp_ns; >> Should we be doing some form of validation in sync_fill_fence_info() >> of 'flags' ? > > Do you think it is necessary? The kernel allocates a zero'ed buffer to > fill sync_fence_info array. > Good point. Missed out the z in kzalloc :-) -Emil
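The padding concern in this exchange can be sketched in user space. This is only an illustration, assuming the fixed-width types from <stdint.h> stand in for the kernel's __s32/__u32/__u64; the `pad` and `extra` members and the `check_reserved()` helper are hypothetical and are not part of the posted patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Layout as posted in v4: 4 + 32 + 4 + 4 = 44 bytes. */
struct merge_v4 {
	int32_t  fd2;
	char     name[32];
	int32_t  fence;
	uint32_t flags;
};

/* Hypothetical variant with explicit tail padding: 48 bytes. */
struct merge_padded {
	int32_t  fd2;
	char     name[32];
	int32_t  fence;
	uint32_t flags;
	uint32_t pad;		/* hypothetical, must be zero */
};

/* Hypothetical future extension: a 64-bit member appended to each variant. */
struct merge_v4_ext {
	struct merge_v4 base;
	uint64_t extra;
};

struct merge_padded_ext {
	struct merge_padded base;
	uint64_t extra;
};

static_assert(sizeof(struct merge_v4) == 44, "v4 layout is not a multiple of 8");
static_assert(sizeof(struct merge_padded) % 8 == 0, "padded layout is 64-bit sized");
/* With the pad, `extra` sits at offset 48 whether uint64_t is 4- or 8-byte aligned. */
static_assert(offsetof(struct merge_padded_ext, extra) == 48, "stable offset");

/*
 * Without the pad the offset depends on the ABI: 44 where uint64_t is
 * 4-byte aligned (for example i386), 48 where it is 8-byte aligned.
 */
size_t unpadded_extra_offset(void)
{
	return offsetof(struct merge_v4_ext, extra);
}

/* Reject unknown flags and non-zero padding, as suggested in the review. */
int check_reserved(const struct merge_padded *d)
{
	if (d->flags || d->pad)
		return -EINVAL;
	return 0;
}
```

Building this for both a 32-bit and a 64-bit target and comparing `unpadded_extra_offset()` should show the same difference that pahole would report for the unpadded struct.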
Re: [PATCH v2] signals, pkeys: make si_pkey 32 bits
* Stephen Rothwell wrote: > In order to prevent a change of alignment of the _sifields union in the > siginfo structure on (some) 32 bit platforms and an ABI breakage, we > change the type of _pkey to unsigned int. If more bits are needed in > the future, a second unsigned int could be added. > > Fixes: cd0ea35ff551 ("signals, pkeys: Notify userspace about protection key > faults") > Acked-by: Dave Hansen > Signed-off-by: Stephen Rothwell > --- > arch/ia64/include/uapi/asm/siginfo.h | 2 +- > arch/mips/include/uapi/asm/siginfo.h | 2 +- > include/uapi/asm-generic/siginfo.h | 2 +- > 3 files changed, 3 insertions(+), 3 deletions(-) > > diff --git a/arch/ia64/include/uapi/asm/siginfo.h > b/arch/ia64/include/uapi/asm/siginfo.h > index 0151cfab929d..19e7db0c9453 100644 > --- a/arch/ia64/include/uapi/asm/siginfo.h > +++ b/arch/ia64/include/uapi/asm/siginfo.h > @@ -70,7 +70,7 @@ typedef struct siginfo { > void __user *_upper; > } _addr_bnd; > /* used when si_code=SEGV_PKUERR */ > - u64 _pkey; > + unsigned int _pkey; > }; > } _sigfault; > > diff --git a/arch/mips/include/uapi/asm/siginfo.h > b/arch/mips/include/uapi/asm/siginfo.h > index 6f4edf0d794c..3cc14f4a5936 100644 > --- a/arch/mips/include/uapi/asm/siginfo.h > +++ b/arch/mips/include/uapi/asm/siginfo.h > @@ -93,7 +93,7 @@ typedef struct siginfo { > void __user *_upper; > } _addr_bnd; > /* used when si_code=SEGV_PKUERR */ > - u64 _pkey; > + unsigned int _pkey; > }; > } _sigfault; > > diff --git a/include/uapi/asm-generic/siginfo.h > b/include/uapi/asm-generic/siginfo.h > index 90384d55225b..f4459dc3d31b 100644 > --- a/include/uapi/asm-generic/siginfo.h > +++ b/include/uapi/asm-generic/siginfo.h > @@ -98,7 +98,7 @@ typedef struct siginfo { > void __user *_upper; > } _addr_bnd; > /* used when si_code=SEGV_PKUERR */ > - u64 _pkey; > + unsigned int _pkey; > }; > } _sigfault; > Please use the standard ABI integer type pattern: __u32. 
The advantage of only using __[su][8|16|32|64] integer types is that it's "obvious" at a glance that an ABI is bitness-invariant. For example include/uapi/linux/perf_event.h only uses such ABI-safe types, and arch/x86/include/uapi is using these types 95%+ of the time. ( The various struct siginfo definitions should probably be harmonized as well, but in a separate patch. ) Thanks, Ingo
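The alignment change that the patch avoids can be illustrated with a simplified stand-in for the anonymous union inside _sigfault. This is a sketch under assumptions: the union and function names below are hypothetical, and <stdint.h> types replace the kernel's u64/__u32.

```c
#include <assert.h>
#include <stdint.h>

/* Before the patch: the union carries a 64-bit _pkey. */
union sigfault_u64 {
	struct { void *lower; void *upper; } addr_bnd;
	uint64_t pkey;
};

/* After the patch: the union carries a 32-bit _pkey. */
union sigfault_u32 {
	struct { void *lower; void *upper; } addr_bnd;
	uint32_t pkey;
};

/* A union is aligned at least as strictly as its most aligned member. */
static_assert(_Alignof(union sigfault_u64) >= _Alignof(uint64_t), "u64 member");
static_assert(_Alignof(union sigfault_u32) >= _Alignof(void *), "pointer member");

unsigned alignment_with_u64(void)
{
	return (unsigned)_Alignof(union sigfault_u64);
}

unsigned alignment_with_u32(void)
{
	return (unsigned)_Alignof(union sigfault_u32);
}
```

On a 32-bit ABI where uint64_t is 8-byte aligned and pointers are 4-byte aligned, the first function would be expected to return 8 and the second 4; on typical 64-bit ABIs both return 8, which is why the breakage only shows up on (some) 32-bit platforms.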
Re: linux-next: manual merge of the iommu tree with the samsung-krzk tree
Hi Stephen, On Mon, Feb 29, 2016 at 03:20:55PM +1100, Stephen Rothwell wrote: > Hi Joerg, > > Today's linux-next merge of the iommu tree got a conflict in: > > drivers/memory/Kconfig > > between commit: > > 78fbb9361ca3 ("memory: Add support for Exynos SROM driver") > > from the samsung-krzk tree and commit: > > cc8bbe1a8312 ("memory: mediatek: Add SMI driver") > > from the iommu tree. > > I fixed it up (see below) and can carry the fix as necessary (no action > is required). Thanks for fixing this (and the other conflict before) up. Joerg
Re: [PATCH] mm: __delete_from_page_cache WARN_ON(page_mapped)
2016-02-29 13:49 GMT+09:00 Hugh Dickins:
> Commit e1534ae95004 ("mm: differentiate page_mapped() from page_mapcount()
> for compound pages") changed the famous BUG_ON(page_mapped(page)) in
> __delete_from_page_cache() to VM_BUG_ON_PAGE(page_mapped(page)): which
> gives us more info when CONFIG_DEBUG_VM=y, but nothing at all when not.
>
> Although it has not usually been very helpful, being hit long after the
> error in question, we do need to know if it actually happens on users'
> systems; but reinstating a crash there is likely to be opposed :)
>
> In the non-debug case, use WARN_ON() plus dump_page() and add_taint() -
> I don't really believe LOCKDEP_NOW_UNRELIABLE, but that seems to be the
> standard procedure now. Move that, or the VM_BUG_ON_PAGE(), up before
> the deletion from tree: so that the unNULLified page->mapping gives a
> little more information.
>
> If the inode is being evicted (rather than truncated), it won't have
> any vmas left, so it's safe(ish) to assume that the raised mapcount is
> erroneous, and we can discount it from page_count to avoid leaking the
> page (I'm less worried by leaking the occasional 4kB, than losing a
> potential 2MB page with each 4kB page leaked).
>
> Signed-off-by: Hugh Dickins
> ---
> I think this should go into v4.5, so I've written it with an atomic_sub
> on page->_count; but Joonsoo will probably want some page_ref thingy.

Okay. I will do it after this patch is merged. Thanks for the notification.

Thanks.
Re: log spammed with "loading xx failed with error -2" since commit e40ba6d56b [replace call to fw_read_file_contents() with kernel version]
On Sun, 28 Feb 2016, Luis R. Rodriguez wrote: > >From e63d19975787c0e237a47c17efd01e41b2a8e2fa Mon Sep 17 00:00:00 2001 > From: "Luis R. Rodriguez" > Date: Sat, 27 Feb 2016 14:58:08 -0800 > Subject: [PATCH] firmware: change kernel read fail to dev_dbg() > Applied to git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security.git next -- James Morris
Re: [PATCH] [RFC] mm/page_ref, crypto/async_pq: don't put_page from __exit
2016-02-29 6:57 GMT+09:00 Arnd Bergmann : > The addition of tracepoints to the page reference tracking had an > unfortunate side-effect in at least one driver that calls put_page > from its exit function, resulting in a link error: > > `.exit.text' referenced in section `__jump_table' of crypto/built-in.o: > defined in discarded section `.exit.text' of crypto/built-in.o > > I could not come up with a nice solution that ignores __jump_table > entries in discarded code, so we probably now have to treat this > as something a driver is not allowed to do. Removing the __exit > annotation avoids the problem in this particular driver, but the > same problem could come back any time in other code. > > On a related problem regarding the runtime patching for SMP > operations on ARM uniprocessor systems, we resorted to not > drop the .exit section at link time, but that doesn't seem > appropriate here. > > Signed-off-by: Arnd Bergmann > Fixes: 0f80830dd044 ("mm/page_ref: add tracepoint to track down page > reference manipulation") > --- > crypto/async_tx/async_pq.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/crypto/async_tx/async_pq.c b/crypto/async_tx/async_pq.c > index c0748bbd4c08..be167145aa55 100644 > --- a/crypto/async_tx/async_pq.c > +++ b/crypto/async_tx/async_pq.c > @@ -442,7 +442,7 @@ static int __init async_pq_init(void) > return -ENOMEM; > } > > -static void __exit async_pq_exit(void) > +static void async_pq_exit(void) > { > put_page(pq_scribble_page); > } Hello, Arnd. I think that we can avoid this error by using __free_page(). It would not be inlined so calling it would have no problem. Could you test it, please? Thanks.
Re: BUG: unable to handle kernel paging request from pty_write [was: Linux 4.4.2]
On 02/26/2016, 08:59 PM, Robert Święcki wrote: > It happens only with 0x6000832 ucode, and Piledriver-based CPUs: i.e. > newer AMD FX, and Opteron 300 series (4300, 6300 etc.). Ok, I can confirm this is: AMD Opteron(tm) Processor 6348 And: microcode: CPU0: patch_level=0x06000836 Thank all the interested parties! -- js suse labs
Re: [PATCH v5] perf/x86/amd/power: Add AMD accumulated power reporting mechanism
On Fri, Feb 26, 2016 at 11:29:52AM +0100, Borislav Petkov wrote:
> On Fri, Feb 26, 2016 at 11:18:28AM +0100, Thomas Gleixner wrote:
> > On Fri, 26 Feb 2016, Huang Rui wrote:
> > > +/* Event code: LSB 8 bits, passed in attr->config any other bit is reserved. */
> > > +#define AMD_POWER_EVENT_MASK 0xFFULL
> > > +
> > > +#define MAX_CUS 8
> >
> > What's that define for? Max compute units? So is that stuff eternally limited
> > to 8?
>
> I already sent him a cleaned up version with that dumbness removed:
>
> https://lkml.kernel.org/r/20160128145436.ge14...@pd.tnic
>
> Rui, what's up?
>

Sorry, I will remove the superfluous MAX_CUS check in the next version.

Thanks,
Rui
[PATCH] PCI: PTM preliminary implementation
Simplified Precision Time Measurement driver, activates PTM feature if a PCIe PTM requester (as per PCI Express 3.1 Base Specification section 7.32)is found, but not before checking if the rest of the PCI hierarchy can support it. The driver does not take part in facilitating PTM conversations, neither does it provide any useful services, it is only responsible for setting up the required configuration space bits. As of writing, there aren't any PTM capable devices on the market yet, but it is supported by the Intel Apollo Lake platform. Signed-off-by: Yong, Jonathan --- drivers/pci/pci-sysfs.c | 7 + drivers/pci/pci.h | 21 +++ drivers/pci/pcie/Kconfig| 8 + drivers/pci/pcie/Makefile | 2 +- drivers/pci/pcie/pcie_ptm.c | 353 drivers/pci/probe.c | 3 + 6 files changed, 393 insertions(+), 1 deletion(-) create mode 100644 drivers/pci/pcie/pcie_ptm.c diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c index 95d9e7b..c634fd11 100644 --- a/drivers/pci/pci-sysfs.c +++ b/drivers/pci/pci-sysfs.c @@ -1335,6 +1335,9 @@ static int pci_create_capabilities_sysfs(struct pci_dev *dev) /* Active State Power Management */ pcie_aspm_create_sysfs_dev_files(dev); + /* PTM */ + pci_create_ptm_sysfs(dev); + if (!pci_probe_reset_function(dev)) { retval = device_create_file(&dev->dev, &reset_attr); if (retval) @@ -1433,6 +1436,10 @@ static void pci_remove_capabilities_sysfs(struct pci_dev *dev) } pcie_aspm_remove_sysfs_dev_files(dev); + + /* PTM */ + pci_release_ptm_sysfs(dev); + if (dev->reset_fn) { device_remove_file(&dev->dev, &reset_attr); dev->reset_fn = 0; diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h index 9a1660f..fb90420 100644 --- a/drivers/pci/pci.h +++ b/drivers/pci/pci.h @@ -320,6 +320,27 @@ static inline resource_size_t pci_resource_alignment(struct pci_dev *dev, void pci_enable_acs(struct pci_dev *dev); +#ifdef CONFIG_PCIEPORTBUS +int pci_enable_ptm(struct pci_dev *dev); +void pci_create_ptm_sysfs(struct pci_dev *dev); +void pci_release_ptm_sysfs(struct pci_dev 
*dev); +void pci_disable_ptm(struct pci_dev *dev); +#else +static inline int pci_enable_ptm(struct pci_dev *dev) +{ + return -ENXIO; +} +static inline void pci_create_ptm_sysfs(struct pci_dev *dev) +{ +} +static inline void pci_release_ptm_sysfs(struct pci_dev *dev) +{ +} +static inline void pci_disable_ptm(struct pci_dev *dev) +{ +} +#endif + struct pci_dev_reset_methods { u16 vendor; u16 device; diff --git a/drivers/pci/pcie/Kconfig b/drivers/pci/pcie/Kconfig index e294713..f65ff4d 100644 --- a/drivers/pci/pcie/Kconfig +++ b/drivers/pci/pcie/Kconfig @@ -80,3 +80,11 @@ endchoice config PCIE_PME def_bool y depends on PCIEPORTBUS && PM + +config PCIE_PTM + bool "Turn on Precision Time Management by default" + depends on PCIEPORTBUS + help + Say Y here to enable PTM feature on PCI Express devices that + support them as they are found during device enumeration. Otherwise + the feature can be enabled manually through sysfs entries. diff --git a/drivers/pci/pcie/Makefile b/drivers/pci/pcie/Makefile index 00c62df..d18b4c7 100644 --- a/drivers/pci/pcie/Makefile +++ b/drivers/pci/pcie/Makefile @@ -5,7 +5,7 @@ # Build PCI Express ASPM if needed obj-$(CONFIG_PCIEASPM) += aspm.o -pcieportdrv-y := portdrv_core.o portdrv_pci.o portdrv_bus.o +pcieportdrv-y := portdrv_core.o portdrv_pci.o portdrv_bus.o pcie_ptm.o pcieportdrv-$(CONFIG_ACPI) += portdrv_acpi.o obj-$(CONFIG_PCIEPORTBUS) += pcieportdrv.o diff --git a/drivers/pci/pcie/pcie_ptm.c b/drivers/pci/pcie/pcie_ptm.c new file mode 100644 index 000..a128c79 --- /dev/null +++ b/drivers/pci/pcie/pcie_ptm.c @@ -0,0 +1,353 @@ +/* + * PCI Express Precision Time Measurement + * Copyright (c) 2016, Intel Corporation. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. 
+ * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + */ +#include +#include +#include +#include "../pci.h" + +#define PCI_PTM_REQ0x0001 /* Requester capable */ +#define PCI_PTM_RSP 0x0002 /* Responder capable */ +#define PCI_PTM_ROOT 0x0004 /* Root capable */ +#define PCI_PTM_GRANULITY 0xFF00 /* Local clock granulity */ +#define PCI_PTM_ENABLE 0x0001 /* PTM enable */ +#define PCI_PTM_ROOT_SEL 0x0002 /* Root select */ + +#define PCI_PTM_HEADER_REG_OFFSET
[RFC] PCI: PTM Driver
Hello LKML,

This is a preliminary implementation of the PTM[1] support driver, the code is
obviously hacked together and in need of refactoring. This driver has only been
tested against a virtual PCI bus.

The driver's job is to get to every PTM capable device, set some PCI config
space bits, then go back to sleep [2].

PTM capable PCIe devices will get a new sysfs entry to allow PTM to be enabled
if automatic PTM activation is disabled, or disabled if so desired.

Comments? Should I explain the PTM registers in more detail?
Please CC me, thanks.

[1] Precision Time Measurement: A protocol for synchronizing PCIe endpoint
clocks against the host clock as specified in the PCI Express Base
Specification 3.1. It is identified by the 0x001f extended capability ID.

PTM capable devices are split into 3 roles, master, responder and requester.
Summary as follows:

A master holds the master clock that will be used for all devices under its
domain (not to be confused with PCI domains). There may be multiple masters
in a PTM hierarchy, in which case, the highest master closest to the root
complex will be selected for the PTM domain. A master is also always responder
capable. Clock precision is signified by a Local Clock Granularity field, in
nanoseconds.

A responder responds to any PTM synchronization requests from a downstream
device. A responder is typically a switch device. It may also hold a local
clock signified by a non-zero Local Clock Granularity field. A value of 0
signifies that the device simply propagates timing information from upstream
devices.

A requester is typically an endpoint that will request synchronization updates
from an upstream PTM capable time source. The driver will update the Effective
Clock Granularity field based on the same field from the PTM domain master.
The field should be programmed with a value of 0 if any intervening responder
has a Local Clock Granularity field value of 0.

[2] The software drivers never see the PTM packets, the PCI Express Base
Specification 3.1 reads:

PTM capable components can make their PTM context available for
inspection by software, enabling software to translate timing information
between local times and PTM Master Time.

This isn't very informative.

Yong, Jonathan (1):
  PCI: PTM preliminary implementation

 drivers/pci/pci-sysfs.c     |   7 +
 drivers/pci/pci.h           |  21 +++
 drivers/pci/pcie/Kconfig    |   8 +
 drivers/pci/pcie/Makefile   |   2 +-
 drivers/pci/pcie/pcie_ptm.c | 353
 drivers/pci/probe.c         |   3 +
 6 files changed, 393 insertions(+), 1 deletion(-)
 create mode 100644 drivers/pci/pcie/pcie_ptm.c

--
2.4.10
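The Effective Clock Granularity rule summarized in note [1] of this cover letter could be sketched as below. This follows only the wording of the summary (take the PTM domain master's value, report 0 if any intervening responder reports 0); the function name and the array-of-responders interface are hypothetical and are not taken from the posted driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: `master_granularity` is the master's Local Clock
 * Granularity, `responders` holds the same field for each responder
 * between the master and the requester.
 */
uint8_t ptm_effective_granularity(uint8_t master_granularity,
				  const uint8_t *responders, size_t count)
{
	assert(responders != NULL || count == 0);

	for (size_t i = 0; i < count; i++) {
		/* A responder reporting 0 only propagates upstream timing. */
		if (responders[i] == 0)
			return 0;
	}
	return master_granularity;
}
```

For a requester sitting directly below the master, `count` would be 0 and the master's value is used unchanged.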
Re: [GIT PULL] tpmdd fix
On Fri, 26 Feb 2016, Jarkko Sakkinen wrote: > Hi James, > > this is the fix for the build warning. > > /Jarkko > > The following changes since commit 481873d06f2bf2ad732450a3a5fa5b8c2a07ef88: > > Merge branch 'next' of > git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity into next > (2016-02-26 15:06:41 +1100) > > are available in the git repository at: > > https://github.com/jsakkine/linux-tpmdd.git tags/tpmdd-next-20160226 > > for you to fetch changes up to 2cb6d6460f1a171c71c134e0efe3a94c2206d080: > > tpm_tis: fix build warning with tpm_tis_resume (2016-02-26 11:32:07 +0200) > > > tpmdd fix > > > Jarkko Sakkinen (1): > tpm_tis: fix build warning with tpm_tis_resume > Pulled to -next. -- James Morris
Re: [PATCH 8/9] powerpc: simplify csum_add(a, b) in case a or b is constant 0
On 23/10/2015 05:33, Scott Wood wrote:
> On Tue, 2015-09-22 at 16:34 +0200, Christophe Leroy wrote:
> > Simplify csum_add(a, b) in case a or b is constant 0
> >
> > Signed-off-by: Christophe Leroy
> > ---
> > arch/powerpc/include/asm/checksum.h | 6 ++
> > 1 file changed, 6 insertions(+)
> >
> > diff --git a/arch/powerpc/include/asm/checksum.h b/arch/powerpc/include/asm/checksum.h
> > index 56deea8..f8a9704 100644
> > --- a/arch/powerpc/include/asm/checksum.h
> > +++ b/arch/powerpc/include/asm/checksum.h
> > @@ -119,7 +119,13 @@ static inline __wsum csum_add(__wsum csum, __wsum addend)
> >  {
> >  #ifdef __powerpc64__
> >  u64 res = (__force u64)csum;
> > +#endif
> > + if (__builtin_constant_p(csum) && csum == 0)
> > + return addend;
> > + if (__builtin_constant_p(addend) && addend == 0)
> > + return csum;
> > +#ifdef __powerpc64__
> >  res += (__force u64)addend;
> >  return (__force __wsum)((u32)res + (res >> 32));
> >  #else
>
> How often does this happen?

In the following patch (9/9), csum_add() is used to implement csum_partial()
for small blocks. In several places in the networking code, csum_partial() is
called with 0 as the initial sum.

Christophe
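The shortcut discussed in this message can be tried in user space. This is a minimal sketch, assuming GCC or Clang (for __builtin_constant_p) and using a hypothetical `wsum_t` typedef in place of the kernel's __wsum; `csum_of_pair()` is an invented caller that mimics starting from a zero sum.

```c
#include <stdint.h>

/* User-space stand-in for the kernel's __wsum type. */
typedef uint32_t wsum_t;

static inline wsum_t csum_add(wsum_t csum, wsum_t addend)
{
	/* Taken only when the compiler can prove the operand is constant 0. */
	if (__builtin_constant_p(csum) && csum == 0)
		return addend;
	if (__builtin_constant_p(addend) && addend == 0)
		return csum;

	/* General path: add in 64 bits, then fold the carry back in. */
	uint64_t res = (uint64_t)csum + addend;

	return (wsum_t)((uint32_t)res + (uint32_t)(res >> 32));
}

/* Hypothetical caller that starts from a zero sum. */
wsum_t csum_of_pair(wsum_t a, wsum_t b)
{
	return csum_add(csum_add(0, a), b);
}
```

The shortcut does not change the result, since folding `0 + x` yields `x` anyway; it only lets the compiler drop the addition when one operand is a compile-time zero, as in the inner call of `csum_of_pair()`.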
Re: [PATCH 4/9] powerpc: inline ip_fast_csum()
On 23/09/2015 07:43, Denis Kirjanov wrote:
> On 9/22/15, Christophe Leroy wrote:
> > In several architectures, ip_fast_csum() is inlined
> > There are functions like ip_send_check() which do nothing
> > much more than calling ip_fast_csum().
> > Inlining ip_fast_csum() allows the compiler to optimise better
>
> Hi Christophe,
> I did try it and see no difference on ppc64. Did you test with socklib
> with modified loopback and if so do you have any numbers?

Hi Denis,

I put a mftbl at start and end of ip_send_check() and tested on a MPC885:
* Without ip_fast_csum() inlined, approximately 7 TB ticks are spent in ip_send_check()
* With ip_fast_csum() inlined, approximately 5.4 TB ticks are spent in ip_send_check()

So it is about a 23% time reduction.

Christophe
Re: [RFC PATCH] proc: do not include shmem and driver pages in /proc/meminfo::Cached
On Mon, Feb 29, 2016 at 3:03 AM, Hugh Dickins wrote: > On Fri, 19 Feb 2016, Andrew Morton wrote: >> On Fri, 19 Feb 2016 09:40:45 +0300 Konstantin Khlebnikov >> wrote: >> >> > >> What are your thoughts on this? >> > > >> > > My thoughts are NAK. A misleading stat is not so bad as a >> > > misleading stat whose meaning we change in some random kernel. >> > > >> > > By all means improve Documentation/filesystems/proc.txt on Cached. >> > > By all means promote Active(file)+Inactive(file)-Buffers as often a >> > > better measure (though Buffers itself is obscure to me - is it intended >> > > usually to approximate resident FS metadata?). By all means work on >> > > /proc/meminfo-v2 (though that may entail dispiritingly long discussions). >> > > >> > > We have to assume that Cached has been useful to some people, and that >> > > they've learnt to subtract Shmem from it, if slow or no swap concerns >> > > them. >> > > >> > > Added Konstantin to Cc: he's had valuable experience of people learning >> > > to adapt to the numbers that we put out. >> > > >> > >> > I think everything will ok. Subtraction of shmem isn't widespread practice, >> > more like secret knowledge. This wasn't documented and people who use >> > this should be aware that this might stop working at any time. So, ACK. >> >> It worries me as well - we're deliberately altering the behaviour of >> existing userspace code. Not all of those alterations will be welcome! >> >> We could add a shiny new field into meminfo and train people to migrate >> to that. But that would just be a sum of already-available fields. In >> an ideal world we could solve all of this with documentation and >> cluebatting (and some apologizing!). > > Ah, I missed this, and just sent a redundant addition to the thread; > followed by this doubly redundant addition. "Cached" has been used for ages as amount of "potentially free memory". 
This patch restores its original meaning and brings it closer to that "potential" meaning at the same time. MemAvailable means exactly that and nothing else, so the logic behind it could be tuned and changed in the future. Thus, adding new fields makes no sense. BTW, glibc recently switched sysconf(_SC_PHYS_PAGES) / sysconf(_SC_AVPHYS_PAGES) from /proc/meminfo MemTotal / MemFree to sysinfo(2) totalram / freeram for performance reasons. It seems possible to expose MemAvailable via sysinfo: there is space for one field. Probably it's also possible to switch _SC_AVPHYS_PAGES to really available memory and add memcg awareness too.
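For reference, the sysinfo(2) route mentioned here can be sketched as follows. This is a Linux-only illustration and not glibc's actual implementation; the function names are hypothetical. Note that struct sysinfo currently exposes totalram and freeram but no "available" figure, which is the gap the message proposes to fill.

```c
#include <sys/sysinfo.h>
#include <unistd.h>

/* Convert a sysinfo(2) memory field, counted in mem_unit bytes, to pages. */
long units_to_pages(unsigned long units, unsigned int mem_unit, long page_size)
{
	unsigned long long bytes = (unsigned long long)units * mem_unit;

	return (long)(bytes / (unsigned long long)page_size);
}

/* Rough equivalent of sysconf(_SC_PHYS_PAGES), or -1 on error. */
long phys_pages_from_sysinfo(void)
{
	struct sysinfo si;

	if (sysinfo(&si) != 0)
		return -1;
	return units_to_pages(si.totalram, si.mem_unit, sysconf(_SC_PAGESIZE));
}

/* Rough equivalent of sysconf(_SC_AVPHYS_PAGES), based on freeram only. */
long avphys_pages_from_sysinfo(void)
{
	struct sysinfo si;

	if (sysinfo(&si) != 0)
		return -1;
	return units_to_pages(si.freeram, si.mem_unit, sysconf(_SC_PAGESIZE));
}
```

Because the second helper reads freeram, it reports MemFree-like numbers rather than MemAvailable-like ones.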
Re: [PATCH 1/2] sigaltstack: implement SS_AUTODISARM flag
On 29.02.2016 00:13, Stas Sergeev wrote:
> This patch implements the SS_AUTODISARM flag that can be ORed with
> SS_ONSTACK when forming ss_flags.
> When this flag is set, sigaltstack will be disabled when entering
> the signal handler; more precisely, after saving sas to uc_stack.
> When leaving the signal handler, the sigaltstack is restored by
> uc_stack.
> When this flag is used, it is safe to switch from sighandler with
> swapcontext(). Without this flag, the subsequent signal will corrupt
> the state of the switched-away sighandler.
>
> CC: Ingo Molnar
> CC: Peter Zijlstra
> CC: Richard Weinberger
> CC: Andrew Morton
> CC: Oleg Nesterov
> CC: Tejun Heo
> CC: Heinrich Schuchardt
> CC: Jason Low
> CC: Andrea Arcangeli
> CC: Frederic Weisbecker
> CC: Konstantin Khlebnikov
> CC: Josh Triplett
> CC: "Eric W. Biederman"
> CC: Aleksa Sarai
> CC: "Amanieu d'Antras"
> CC: Paul Moore
> CC: Sasha Levin
> CC: Palmer Dabbelt
> CC: Vladimir Davydov
> CC: linux-kernel@vger.kernel.org
> CC: linux-...@vger.kernel.org
> CC: Andy Lutomirski
>
> Signed-off-by: Stas Sergeev
> ---
> include/linux/sched.h | 1 +
> include/linux/signal.h | 4 +++-
> include/uapi/linux/signal.h | 3 +++
> kernel/fork.c | 4 +++-
> kernel/signal.c | 23 ---
> 5 files changed, 22 insertions(+), 13 deletions(-)
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index a10494a..f561d34 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1587,6 +1587,7 @@ struct task_struct {
>  unsigned long sas_ss_sp;
>  size_t sas_ss_size;
> + unsigned sas_ss_flags;
>  struct callback_head *task_works;
> diff --git a/include/linux/signal.h b/include/linux/signal.h
> index 92557bb..be3ebe0 100644
> --- a/include/linux/signal.h
> +++ b/include/linux/signal.h
> @@ -432,8 +432,10 @@ int __save_altstack(stack_t __user *, unsigned long);
>  stack_t __user *__uss = uss; \
>  struct task_struct *t = current; \
>  put_user_ex((void __user *)t->sas_ss_sp, &__uss->ss_sp); \
> - put_user_ex(sas_ss_flags(sp), &__uss->ss_flags); \
> + put_user_ex(t->sas_ss_flags, &__uss->ss_flags); \
>  put_user_ex(t->sas_ss_size, &__uss->ss_size); \
> + if (t->sas_ss_flags & SS_AUTODISARM) \
> + t->sas_ss_size = 0; \

Should also reset flags here...
Will send v4.
Re: [PATCH v10 2/2] cpufreq: powernv: Add sysfs attributes to show throttle stats
On 26-02-16, 16:06, Shilpasri G Bhat wrote: > +static int powernv_cpufreq_policy_notifier(struct notifier_block *nb, > +unsigned long action, void *data) > +{ > + struct cpufreq_policy *policy = data; > + int ret; > + > + if (action == CPUFREQ_CREATE_POLICY) { > + ret = sysfs_create_group(&policy->kobj, &throttle_attr_grp); > + if (ret) > + pr_info("Failed to create throttle stats directory for > cpu %d\n", > + policy->cpu); > + } else if (action == CPUFREQ_REMOVE_POLICY) { > + sysfs_remove_group(&policy->kobj, &throttle_attr_grp); > + } > + > + return NOTIFY_DONE; > +} > + > +static struct notifier_block powernv_cpufreq_policy_nb = { > + .notifier_call = powernv_cpufreq_policy_notifier, > + .next = NULL, > +}; > + > static void powernv_cpufreq_stop_cpu(struct cpufreq_policy *policy) > { > struct powernv_smp_call_data freq_data; > @@ -603,6 +708,8 @@ static inline void clean_chip_info(void) > > static inline void unregister_all_notifiers(void) > { > + cpufreq_unregister_notifier(&powernv_cpufreq_policy_nb, > + CPUFREQ_POLICY_NOTIFIER); > opal_message_notifier_unregister(OPAL_MSG_OCC, >&powernv_cpufreq_opal_nb); > unregister_reboot_notifier(&powernv_cpufreq_reboot_nb); > @@ -628,6 +735,8 @@ static int __init powernv_cpufreq_init(void) > > register_reboot_notifier(&powernv_cpufreq_reboot_nb); > opal_message_notifier_register(OPAL_MSG_OCC, &powernv_cpufreq_opal_nb); > + cpufreq_register_notifier(&powernv_cpufreq_policy_nb, > + CPUFREQ_POLICY_NOTIFIER); > > rc = cpufreq_register_driver(&powernv_cpufreq_driver); > if (!rc) @Rafael: This driver needs to do this *ugly* notifier hack, just because we aren't doing kobject_add() for policy->kobj before ->init(). And we did that because, we wanted to create the policyX structure with the first CPU in policy->related_cpus mask and related_cpus mask isn't available until we call ->init().. Should we do something in core to make this easier for this driver? -- viresh
linux-next: manual merge of the target-merge tree with the net-next tree
Hi Nicholas, Today's linux-next merge of the target-merge tree got a conflict in: drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h between commit: ba9cee6aa67d ("cxgb4/iw_cxgb4: TOS support") from the net-next tree and commit: c973e2a3ff1b ("cxgb4: add definitions for iSCSI target ULD") from the target-merge tree. I fixed it up (the latter was a superset of the former) and can carry the fix as necessary (no action is required). -- Cheers, Stephen Rothwell
Re: [PATCH] mm/zsmalloc: add compact column to pool stat
Hello,

On (02/29/16 15:02), Minchan Kim wrote:
> On Sat, Feb 27, 2016 at 03:23:53PM +0900, Sergey Senozhatsky wrote:
> > Add a new column to pool stats, which will tell us class' zs_can_compact()
> > number, so it will be easier to analyze zsmalloc fragmentation.
>
> Just nitpick:
>
> Strictly speaking, zs_can_compact number is number of "ideal freeable page
> by compaction". How about using high level term in description rather than
> function name?

OK, makes sense.

> > At the moment, we have only numbers of FULL and ALMOST_EMPTY classes, but
> > they don't tell us how badly the class is fragmented internally.
> >
> > The new /sys/kernel/debug/zsmalloc/zramX/classes output look as follows:
> >
> >  class  size almost_full almost_empty obj_allocated obj_used pages_used pages_per_zspage compact
> > [..]
> >     12   224           0            2           146        5          8                4       4
> >     13   240           0            0             0        0          0                1       0
> >     14   256           1           13          1840     1672        115                1      10
> >     15   272           0            0             0        0          0                1       0
> > [..]
> >     49   816           0            3           745      735        149                1       2
> >     51   848           3            4           361      306         76                4       8
> >     52   864          12           14           378      268         81                3      21
> >     54   896           1           12           117       57         26                2      12
> >     57   944           0            0             0        0          0                3       0
> > [..]
> >  Total                26          131         12709    10994       1071                      134
> >
> > For example, from this particular output we can easily conclude that class-896
> > is heavily fragmented -- it occupies 26 pages, 12 can be freed by compaction.
>
> How about using "freeable" or something which could represent "freeable"?
> IMO, it's more straightforward for user.

OK. didn't want to put any long column name there, which would bloat
the output. will take a look.

> Other than that,
>
> Acked-by: Minchan Kim
>
>
> Thanks for the nice job!

thanks.

	-ss
Re: [PATCH] mm/zsmalloc: add compact column to pool stat
On Sat, Feb 27, 2016 at 03:23:53PM +0900, Sergey Senozhatsky wrote:
> Add a new column to pool stats, which will tell us class' zs_can_compact()
> number, so it will be easier to analyze zsmalloc fragmentation.

Just a nitpick:

Strictly speaking, the zs_can_compact number is the number of "ideal
freeable pages by compaction". How about using a high-level term in the
description rather than the function name?

> At the moment, we have only numbers of FULL and ALMOST_EMPTY classes, but
> they don't tell us how badly the class is fragmented internally.
>
> The new /sys/kernel/debug/zsmalloc/zramX/classes output looks as follows:
>
>  class  size almost_full almost_empty obj_allocated obj_used pages_used pages_per_zspage compact
> [..]
>     12   224           0            2           146        5          8                4       4
>     13   240           0            0             0        0          0                1       0
>     14   256           1           13          1840     1672        115                1      10
>     15   272           0            0             0        0          0                1       0
> [..]
>     49   816           0            3           745      735        149                1       2
>     51   848           3            4           361      306         76                4       8
>     52   864          12           14           378      268         81                3      21
>     54   896           1           12           117       57         26                2      12
>     57   944           0            0             0        0          0                3       0
> [..]
>  Total                26          131         12709    10994       1071                      134
>
> For example, from this particular output we can easily conclude that class-896
> is heavily fragmented -- it occupies 26 pages, 12 can be freed by compaction.

How about using "freeable" or something which could represent "freeable"?
IMO, it's more straightforward for the user.

Other than that,

Acked-by: Minchan Kim

Thanks for the nice job!
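The "compact" figures quoted in this thread can be reproduced by hand. What follows is a minimal userspace sketch in C of the "ideal freeable pages" estimate, assuming 4096-byte pages and that the number of objects per zspage is obtained by integer division. The names SKETCH_PAGE_SIZE, objs_per_zspage() and freeable_pages() are made up for this illustration and are not taken from the patch.

```c
#include <assert.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumption: 4 KiB pages */

/* How many objects of obj_size fit into one zspage of pages_per_zspage pages. */
static unsigned long objs_per_zspage(unsigned long obj_size,
				     unsigned long pages_per_zspage)
{
	assert(obj_size > 0 && pages_per_zspage > 0);
	return pages_per_zspage * SKETCH_PAGE_SIZE / obj_size;
}

/*
 * Ideal number of pages compaction could free for one size class:
 * count the allocated-but-unused object slots, work out how many
 * whole zspages they add up to, and convert that to pages.
 */
static unsigned long freeable_pages(unsigned long obj_allocated,
				    unsigned long obj_used,
				    unsigned long obj_size,
				    unsigned long pages_per_zspage)
{
	unsigned long unused_slots;

	assert(obj_used <= obj_allocated);
	unused_slots = obj_allocated - obj_used;

	/* whole zspages worth of unused slots, then pages per zspage */
	return unused_slots / objs_per_zspage(obj_size, pages_per_zspage)
		* pages_per_zspage;
}
```

For the class-896 row (117 objects allocated, 57 used, 2 pages per zspage) this gives 9 objects per zspage, 60 unused slots, 6 whole zspages and therefore 12 pages, which agrees with the compact column.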
Re: [PATCH] asm-generic: remove old nonatomic-io wrapper files
On Fri, Feb 26, 2016 at 03:29:05PM +0100, Arnd Bergmann wrote: > The two header files got moved to include/linux, and most > users were already converted, this changes the remaining drivers > and removes the files. > > Signed-off-by: Arnd Bergmann > --- > drivers/dma/idma64.h| 2 +- For this: Acked-by: Vinod Koul Thanks -- ~Vinod
Re: [PATCH v3 22/22] sound/usb: Use Media Controller API to share media resources
On 02/27/2016 12:48 AM, Takashi Iwai wrote: > On Sat, 27 Feb 2016 03:55:39 +0100, > Shuah Khan wrote: >> >> On 02/26/2016 01:50 PM, Takashi Iwai wrote: >>> On Fri, 26 Feb 2016 21:08:43 +0100, >>> Shuah Khan wrote: On 02/26/2016 12:55 PM, Takashi Iwai wrote: > On Fri, 12 Feb 2016 00:41:38 +0100, > Shuah Khan wrote: >> >> Change ALSA driver to use Media Controller API to >> share media resources with DVB and V4L2 drivers >> on a AU0828 media device. Media Controller specific >> initialization is done after sound card is registered. >> ALSA creates Media interface and entity function graph >> nodes for Control, Mixer, PCM Playback, and PCM Capture >> devices. >> >> snd_usb_hw_params() will call Media Controller enable >> source handler interface to request the media resource. >> If resource request is granted, it will release it from >> snd_usb_hw_free(). If resource is busy, -EBUSY is returned. >> >> Media specific cleanup is done in usb_audio_disconnect(). >> >> Signed-off-by: Shuah Khan >> --- >> sound/usb/Kconfig| 4 + >> sound/usb/Makefile | 2 + >> sound/usb/card.c | 14 +++ >> sound/usb/card.h | 3 + >> sound/usb/media.c| 318 >> +++ >> sound/usb/media.h| 72 +++ >> sound/usb/mixer.h| 3 + >> sound/usb/pcm.c | 28 - >> sound/usb/quirks-table.h | 1 + >> sound/usb/stream.c | 2 + >> sound/usb/usbaudio.h | 6 + >> 11 files changed, 448 insertions(+), 5 deletions(-) >> create mode 100644 sound/usb/media.c >> create mode 100644 sound/usb/media.h >> >> diff --git a/sound/usb/Kconfig b/sound/usb/Kconfig >> index a452ad7..ba117f5 100644 >> --- a/sound/usb/Kconfig >> +++ b/sound/usb/Kconfig >> @@ -15,6 +15,7 @@ config SND_USB_AUDIO >> select SND_RAWMIDI >> select SND_PCM >> select BITREVERSE >> +select SND_USB_AUDIO_USE_MEDIA_CONTROLLER if MEDIA_CONTROLLER >> && MEDIA_SUPPORT > > Looking at the media Kconfig again, this would be broken if > MEDIA_SUPPORT=m and SND_USB_AUDIO=y. 
The ugly workaround is something > like: > select SND_USB_AUDIO_USE_MEDIA_CONTROLLER \ > if MEDIA_CONTROLLER && (MEDIA_SUPPORT=y || MEDIA_SUPPORT=SND) My current config is MEDIA_SUPPORT=m and SND_USB_AUDIO=y It is working and I didn't see any issues so far. >>> >>> Hmm, how does it be? In drivers/media/Makefile: >>> >>> ifeq ($(CONFIG_MEDIA_CONTROLLER),y) >>> obj-$(CONFIG_MEDIA_SUPPORT) += media.o >>> endif >>> >>> So it's a module. Meanwhile you have reference from usb-audio driver >>> that is built-in kernel. How is the symbol resolved? >> >> Sorry my mistake. I misspoke. My config had: >> CONFIG_MEDIA_SUPPORT=m >> CONFIG_MEDIA_CONTROLLER=y >> CONFIG_SND_USB_AUDIO=m >> >> The following doesn't work as you pointed out. >> >> CONFIG_MEDIA_SUPPORT=m >> CONFIG_MEDIA_CONTROLLER=y >> CONFIG_SND_USB_AUDIO=y >> >> okay here is what will work for all of the possible >> combinations of CONFIG_MEDIA_SUPPORT and CONFIG_SND_USB_AUDIO >> >> select SND_USB_AUDIO_USE_MEDIA_CONTROLLER \ >>if MEDIA_CONTROLLER && ((MEDIA_SUPPORT=y) || (MEDIA_SUPPORT=m && >> SND_USB_AUDIO=m)) >> >> The above will cover the cases when >> >> 1. CONFIG_MEDIA_SUPPORT and CONFIG_SND_USB_AUDIO are >>both modules >>CONFIG_SND_USB_AUDIO_USE_MEDIA_CONTROLLER is selected >> >> 2. CONFIG_MEDIA_SUPPORT=y and CONFIG_SND_USB_AUDIO=m >>CONFIG_SND_USB_AUDIO_USE_MEDIA_CONTROLLER is selected >> >> 3. CONFIG_MEDIA_SUPPORT=y and CONFIG_SND_USB_AUDIO=y >>CONFIG_SND_USB_AUDIO_USE_MEDIA_CONTROLLER is selected >> >> 4. CONFIG_MEDIA_SUPPORT=m and CONFIG_SND_USB_AUDIO=y >>This is when we don't want >>CONFIG_SND_USB_AUDIO_USE_MEDIA_CONTROLLER selected >> >> I verified all of the above combinations to make sure >> the logic works. >> >> If you think of a better way to do this please let me >> know. I will go ahead and send patch v4 with the above >> change and you can decide if that is acceptable. > > I'm not 100% sure whether CONFIG_SND_USB_AUDIO=m can be put there as > conditional inside CONFIG_SND_USB_AUDIO definition. 
Maybe a safer > form would be like: > > config SND_USB_AUDIO_USE_MEDIA_CONTROLLER > bool > default y > depends on SND_USB_AUDIO > depends on MEDIA_CONTROLLER > depends on (MEDIA_SUPPORT=y || MEDIA_SUPPORT=SND_USB_AUDIO) > > and drop select from SND_USB_AUDIO. > > > Other than that, it looks more or less OK to me. > The way how media_stream_init() gets called is a bit worrisome, but it > should work practically. Another concern is about the disconnection. > Can all function calls in media_device_delete() be safe even if it's > called while the application still ope
[PATCH v4 22/22] sound/usb: Use Media Controller API to share media resources
Change ALSA driver to use Media Controller API to share media resources with DVB and V4L2 drivers on a AU0828 media device. Media Controller specific initialization is done after sound card is registered. ALSA creates Media interface and entity function graph nodes for Control, Mixer, PCM Playback, and PCM Capture devices. snd_usb_hw_params() will call Media Controller enable source handler interface to request the media resource. If resource request is granted, it will release it from snd_usb_hw_free(). If resource is busy, -EBUSY is returned. Media specific cleanup is done in usb_audio_disconnect(). Signed-off-by: Shuah Khan --- Changes since v3: - Fixed Kconfig to handle the following 1. CONFIG_MEDIA_SUPPORT and CONFIG_SND_USB_AUDIO are both modules CONFIG_SND_USB_AUDIO_USE_MEDIA_CONTROLLER is selected 2. CONFIG_MEDIA_SUPPORT=y and CONFIG_SND_USB_AUDIO=m CONFIG_SND_USB_AUDIO_USE_MEDIA_CONTROLLER is selected 3. CONFIG_MEDIA_SUPPORT=y and CONFIG_SND_USB_AUDIO=y CONFIG_SND_USB_AUDIO_USE_MEDIA_CONTROLLER is selected 4. CONFIG_MEDIA_SUPPORT=m and CONFIG_SND_USB_AUDIO=y This is when we don't want CONFIG_SND_USB_AUDIO_USE_MEDIA_CONTROLLER selected sound/usb/Kconfig| 4 + sound/usb/Makefile | 2 + sound/usb/card.c | 14 +++ sound/usb/card.h | 3 + sound/usb/media.c| 318 +++ sound/usb/media.h| 72 +++ sound/usb/mixer.h| 3 + sound/usb/pcm.c | 28 - sound/usb/quirks-table.h | 1 + sound/usb/stream.c | 2 + sound/usb/usbaudio.h | 6 + 11 files changed, 448 insertions(+), 5 deletions(-) create mode 100644 sound/usb/media.c create mode 100644 sound/usb/media.h diff --git a/sound/usb/Kconfig b/sound/usb/Kconfig index a452ad7..d14bf41 100644 --- a/sound/usb/Kconfig +++ b/sound/usb/Kconfig @@ -15,6 +15,7 @@ config SND_USB_AUDIO select SND_RAWMIDI select SND_PCM select BITREVERSE + select SND_USB_AUDIO_USE_MEDIA_CONTROLLER if MEDIA_CONTROLLER && (MEDIA_SUPPORT=y || MEDIA_SUPPORT=SND_USB_AUDIO) help Say Y here to include support for USB audio and USB MIDI devices. 
@@ -22,6 +23,9 @@ config SND_USB_AUDIO To compile this driver as a module, choose M here: the module will be called snd-usb-audio. +config SND_USB_AUDIO_USE_MEDIA_CONTROLLER + bool + config SND_USB_UA101 tristate "Edirol UA-101/UA-1000 driver" select SND_PCM diff --git a/sound/usb/Makefile b/sound/usb/Makefile index 2d2d122..8dca3c4 100644 --- a/sound/usb/Makefile +++ b/sound/usb/Makefile @@ -15,6 +15,8 @@ snd-usb-audio-objs := card.o \ quirks.o \ stream.o +snd-usb-audio-$(CONFIG_SND_USB_AUDIO_USE_MEDIA_CONTROLLER) += media.o + snd-usbmidi-lib-objs := midi.o # Toplevel Module Dependency diff --git a/sound/usb/card.c b/sound/usb/card.c index 1f09d95..35fe256 100644 --- a/sound/usb/card.c +++ b/sound/usb/card.c @@ -66,6 +66,7 @@ #include "format.h" #include "power.h" #include "stream.h" +#include "media.h" MODULE_AUTHOR("Takashi Iwai "); MODULE_DESCRIPTION("USB Audio"); @@ -561,6 +562,11 @@ static int usb_audio_probe(struct usb_interface *intf, if (err < 0) goto __error; + if (quirk->media_device) { + /* don't want to fail when media_device_create() fails */ + media_device_create(chip, intf); + } + usb_chip[chip->index] = chip; chip->num_interfaces++; usb_set_intfdata(intf, chip); @@ -617,6 +623,14 @@ static void usb_audio_disconnect(struct usb_interface *intf) list_for_each(p, &chip->midi_list) { snd_usbmidi_disconnect(p); } + /* +* Nice to check quirk && quirk->media_device +* need some special handlings. 
Doesn't look like +* we have access to quirk here +* Acceses mixer_list + */ + media_device_delete(chip); + /* release mixer resources */ list_for_each_entry(mixer, &chip->mixer_list, list) { snd_usb_mixer_disconnect(mixer); diff --git a/sound/usb/card.h b/sound/usb/card.h index 71778ca..34a0898 100644 --- a/sound/usb/card.h +++ b/sound/usb/card.h @@ -105,6 +105,8 @@ struct snd_usb_endpoint { struct list_head list; }; +struct media_ctl; + struct snd_usb_substream { struct snd_usb_stream *stream; struct usb_device *dev; @@ -156,6 +158,7 @@ struct snd_usb_substream { } dsd_dop; bool trigger_tstamp_pending_update; /* trigger timestamp being updated from initial estimate */ + struct media_ctl *media_ctl; }; struct snd_usb_stream { diff --git a/sound/usb/media.c b/sound/usb/media.c new file mode 100644 index 000..cff1459 --- /dev/null +++ b/sound/usb/media.c
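The four cases listed in the v4 changelog follow from how Kconfig evaluates tristate expressions: values are ordered n < m < y, "&&" takes the minimum, "||" takes the maximum, and "A=B" is y when both sides have the same value and n otherwise. Here is a minimal sketch in C of that evaluation for the select condition, assuming MEDIA_CONTROLLER is a bool; the enum and helper names are hypothetical and exist only for this illustration.

```c
#include <assert.h>

/* Kconfig tristate values, ordered n < m < y. */
enum tristate { TRI_N = 0, TRI_M = 1, TRI_Y = 2 };

/* "&&" is the minimum of both operands. */
static enum tristate tri_and(enum tristate a, enum tristate b)
{
	return a < b ? a : b;
}

/* "||" is the maximum of both operands. */
static enum tristate tri_or(enum tristate a, enum tristate b)
{
	return a > b ? a : b;
}

/* "A=B" is y when the values match, n otherwise. */
static enum tristate tri_eq(enum tristate a, enum tristate b)
{
	return a == b ? TRI_Y : TRI_N;
}

/*
 * Models the condition on the select line:
 *   MEDIA_CONTROLLER && (MEDIA_SUPPORT=y || MEDIA_SUPPORT=SND_USB_AUDIO)
 * Returns 1 when SND_USB_AUDIO_USE_MEDIA_CONTROLLER would be selected.
 */
static int use_media_controller(enum tristate media_controller,
				enum tristate media_support,
				enum tristate snd_usb_audio)
{
	enum tristate cond;

	assert(snd_usb_audio != TRI_N);	/* a disabled symbol selects nothing */
	cond = tri_and(media_controller,
		       tri_or(tri_eq(media_support, TRI_Y),
			      tri_eq(media_support, snd_usb_audio)));
	return cond != TRI_N;
}
```

Only the combination MEDIA_SUPPORT=m with SND_USB_AUDIO=y evaluates to n, which is the case where built-in code would otherwise reference symbols that live in a module.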
[PATCH] phy: Fix armada375 compile test build on UM
The phy-armada375-usb2 driver uses IOMEM functions so COMPILE_TEST && OF build failed with: drivers/built-in.o: In function `armada375_usb_phy_probe': phy-armada375-usb2.c:(.text+0x121d): undefined reference to `devm_ioremap_resource' Signed-off-by: Krzysztof Kozlowski --- drivers/phy/Kconfig | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/phy/Kconfig b/drivers/phy/Kconfig index 0124d17bd9fe..786a9d6356b8 100644 --- a/drivers/phy/Kconfig +++ b/drivers/phy/Kconfig @@ -32,7 +32,7 @@ config PHY_BERLIN_SATA config ARMADA375_USBCLUSTER_PHY def_bool y depends on MACH_ARMADA_375 || COMPILE_TEST - depends on OF + depends on OF && HAS_IOMEM select GENERIC_PHY config PHY_DM816X_USB -- 2.5.0
[GIT PULL] extcon next for 4.6
Dear Greg,

This is the extcon-next pull request for v4.6. A detailed description of
the changes is included below. Please pull extcon with the following updates.

Best Regards,
Chanwoo Choi

The following changes since commit 92e963f50fc74041b5e9e744c330dca48e04f08d:

  Linux 4.5-rc1 (2016-01-24 13:06:47 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/extcon.git tags/extcon-next-for-4.6

for you to fetch changes up to ae64e42cc2b3a17ac0c11815f53211093a54cf55:

  extcon: palmas: Drop IRQF_EARLY_RESUME flag (2016-02-29 11:07:34 +0900)

Update extcon for 4.6

Detailed description for patchset:
1. Add new EXTCON_CHG_USB_SDP type
- SDP (Standard Downstream Port) USB Charging Port means the charging connector.
2. Add the VBUS detection by using GPIO on extcon-palmas
- The Beaglex15 board uses the extcon-palmas driver, but it needs GPIO
  support for VBUS detection.
3. Fix minor issues in extcon drivers

Chanwoo Choi (1):
      extcon: Add the EXTCON_CHG_USB_SDP to support SDP charing port

Charles Keepax (1):
      extcon: arizona: Use DAPM mutex helper functions

Dan Carpenter (1):
      extcon: max77843: Use correct size for reading the interrupt register

Felipe Balbi (3):
      extcon: palmas: Add the support for VBUS detection by using GPIO
      arm: boot: dts: beaglex15: Remove ID GPIO
      arm: boot: beaglex15: pass correct interrupt

Geliang Tang (1):
      extcon: Use to_i2c_client for both rt8973a and sm5502

Grygorii Strashko (1):
      extcon: palmas: Drop IRQF_EARLY_RESUME flag

Moritz Fischer (1):
      extcon: gpio: Fix typo in comment

 arch/arm/boot/dts/am57xx-beagle-x15.dts |  3 +-
 drivers/extcon/extcon-arizona.c         |  4 +--
 drivers/extcon/extcon-gpio.c            |  2 +-
 drivers/extcon/extcon-max14577.c        |  3 ++
 drivers/extcon/extcon-max77693.c        | 12 +++-
 drivers/extcon/extcon-max77843.c        |  5 ++-
 drivers/extcon/extcon-max8997.c         |  3 ++
 drivers/extcon/extcon-palmas.c          | 54 +++--
 drivers/extcon/extcon-rt8973a.c         |  8 +++--
 drivers/extcon/extcon-sm5502.c          |  8 +++--
 include/linux/mfd/palmas.h              |  3 ++
 11 files changed, 92 insertions(+), 13 deletions(-)
Re: [PATCH v6 00/12] Add T210 support in Tegra soctherm
Hi,

Does anyone have comments on this series?

Thanks.
Wei.

On 2016-02-22 16:05, Wei Ni wrote:
> This patchset adds the following functions to the tegra_soctherm driver:
> 1. add T210 support.
> 2. export debugfs to show some registers.
> 3. add thermtrip function.
> 4. add suspend/resume function.
>
> The v5 series is in:
> http://www.spinics.net/lists/linux-tegra/msg25079.html
> The v4 series is in:
> http://www.spinics.net/lists/linux-tegra/msg24972.html
> The V3 series is in:
> http://www.spinics.net/lists/linux-tegra/msg24911.html
> The V2 series is in:
> http://www.spinics.net/lists/linux-tegra/msg24901.html
> The V1 series is in:
> http://www.spinics.net/lists/linux-tegra/msg24808.html
>
> Main changes from V5:
> 1. Change to use linux thermal framework to implement
> thermtrip function, per Rob's comment.
> 2. Add .set_trip_temp() in of-thermal driver, so that
> we can set trips on hardware.
>
> Main changes from V4:
> 1. Change description of devicetree binding per Rob's comment.
> 2. Call of_node_put to decrement refcount of the node.
>
> Main changes from V3:
> 1. Change structures to "const" in chip specific files.
> 2. Minor changes per Thierry's comments.
>
> Main changes from V2:
> 1. Fix build error in patch [1/11].
> 2. Use of_get_child_by_name instead of of_find_node_by_name in patch [8/11].
> 3. Use debugfs_remove_recursive to remove debugfs in patch [6/11].
>
> Main changes from V1:
> 1. Use the new type to handle different Tegra chips in one driver, as
> suggested by Thierry.
> 2. Changes per Thierry's other comments.
> > Wei Ni (12): > thermal: tegra: move tegra thermal files into tegra directory > thermal: tegra: combine sensor group-related data > thermal: tegra: get rid of PDIV/HOTSPOT hack > thermal: tegra: split tegra_soctherm driver > thermal: tegra: add Tegra210 specific SOC_THERM driver > thermal: tegra: add a debugfs to show registers > thermal: of-thermal: allow setting trip_temp on hardware > of: add notes of critical trips for soctherm > thermal: tegra: add thermtrip function > thermal: tegra: add PM support > arm64: tegra: add soctherm node for Tegra210 > arm: tegra: set critical trips for Tegra124 > > .../devicetree/bindings/thermal/tegra-soctherm.txt | 12 + > arch/arm/boot/dts/tegra124.dtsi| 16 + > arch/arm64/boot/dts/nvidia/tegra210.dtsi | 60 ++ > drivers/thermal/Kconfig| 12 +- > drivers/thermal/Makefile | 2 +- > drivers/thermal/of-thermal.c | 8 + > drivers/thermal/tegra/Kconfig | 13 + > drivers/thermal/tegra/Makefile | 5 + > drivers/thermal/tegra/soctherm-fuse.c | 169 + > drivers/thermal/tegra/soctherm.c | 685 > + > drivers/thermal/tegra/soctherm.h | 123 > drivers/thermal/tegra/tegra124-soctherm.c | 196 ++ > drivers/thermal/tegra/tegra210-soctherm.c | 197 ++ > drivers/thermal/tegra_soctherm.c | 476 -- > include/dt-bindings/thermal/tegra124-soctherm.h| 1 + > include/linux/thermal.h| 1 + > 16 files changed, 1489 insertions(+), 487 deletions(-) > create mode 100644 drivers/thermal/tegra/Kconfig > create mode 100644 drivers/thermal/tegra/Makefile > create mode 100644 drivers/thermal/tegra/soctherm-fuse.c > create mode 100644 drivers/thermal/tegra/soctherm.c > create mode 100644 drivers/thermal/tegra/soctherm.h > create mode 100644 drivers/thermal/tegra/tegra124-soctherm.c > create mode 100644 drivers/thermal/tegra/tegra210-soctherm.c > delete mode 100644 drivers/thermal/tegra_soctherm.c >
Re: [PATCH 01/10] fs crypto: add basic definitions for per-file encryption
On 02/25/16 11:25, Jaegeuk Kim wrote: > This patch adds definitions for per-file encryption used by ext4 and f2fs. > > Signed-off-by: Jaegeuk Kim > --- > include/linux/fs.h | 8 ++ > include/linux/fscrypto.h | 239 > +++ > include/uapi/linux/fs.h | 18 > 3 files changed, 265 insertions(+) > create mode 100644 include/linux/fscrypto.h > > diff --git a/include/linux/fs.h b/include/linux/fs.h > index ae68100..d8f57cf 100644 > --- a/include/linux/fs.h > +++ b/include/linux/fs.h > @@ -53,6 +53,8 @@ struct swap_info_struct; > struct seq_file; > struct workqueue_struct; > struct iov_iter; > +struct fscrypt_info; > +struct fscrypt_operations; > > extern void __init inode_init(void); > extern void __init inode_init_early(void); > @@ -678,6 +680,10 @@ struct inode { > struct hlist_head i_fsnotify_marks; > #endif > > +#ifdef CONFIG_FS_ENCRYPTION > + struct fscrypt_info *i_crypt_info; > +#endif > + > void*i_private; /* fs or device private pointer */ > }; > > @@ -1323,6 +1329,8 @@ struct super_block { > #endif > const struct xattr_handler **s_xattr; > > + const struct fscrypt_operations *s_cop; > + > struct hlist_bl_heads_anon; /* anonymous dentries for (nfs) > exporting */ > struct list_heads_mounts; /* list of mounts; _not_ for fs > use */ > struct block_device *s_bdev; > diff --git a/include/linux/fscrypto.h b/include/linux/fscrypto.h > new file mode 100644 > index 000..b0aed92 > --- /dev/null > +++ b/include/linux/fscrypto.h > @@ -0,0 +1,239 @@ > +/* > + * General per-file encryption definition > + * > + * Copyright (C) 2015, Google, Inc. > + * > + * Written by Michael Halcrow, 2015. > + * Modified by Jaegeuk Kim, 2015. 
> + */ > + > +#ifndef _LINUX_FSCRYPTO_H > +#define _LINUX_FSCRYPTO_H > + > +#include > +#include > +#include > +#include > +#include > +#include > + > +#define FS_KEY_DERIVATION_NONCE_SIZE 16 > +#define FS_ENCRYPTION_CONTEXT_FORMAT_V1 1 > + > +#define FS_POLICY_FLAGS_PAD_40x00 > +#define FS_POLICY_FLAGS_PAD_80x01 > +#define FS_POLICY_FLAGS_PAD_16 0x02 > +#define FS_POLICY_FLAGS_PAD_32 0x03 > +#define FS_POLICY_FLAGS_PAD_MASK 0x03 > +#define FS_POLICY_FLAGS_VALID0x03 > + > +/* Encryption algorithms */ > +#define FS_ENCRYPTION_MODE_INVALID 0 > +#define FS_ENCRYPTION_MODE_AES_256_XTS 1 > +#define FS_ENCRYPTION_MODE_AES_256_GCM 2 > +#define FS_ENCRYPTION_MODE_AES_256_CBC 3 > +#define FS_ENCRYPTION_MODE_AES_256_CTS 4 > + > +/** > + * Encryption context for inode > + * > + * Protector format: > + * 1 byte: Protector format (1 = this version) > + * 1 byte: File contents encryption mode > + * 1 byte: File names encryption mode > + * 1 byte: Flags > + * 8 bytes: Master Key descriptor > + * 16 bytes: Encryption Key derivation nonce > + */ > +struct fscrypt_context { > + char format; > + char contents_encryption_mode; > + char filenames_encryption_mode; > + char flags; > + char master_key_descriptor[FS_KEY_DESCRIPTOR_SIZE]; > + char nonce[FS_KEY_DERIVATION_NONCE_SIZE]; how about u8 instead of char? 
> +} __packed; > + > +/* Encryption parameters */ > +#define FS_XTS_TWEAK_SIZE16 > +#define FS_AES_128_ECB_KEY_SIZE 16 > +#define FS_AES_256_GCM_KEY_SIZE 32 > +#define FS_AES_256_CBC_KEY_SIZE 32 > +#define FS_AES_256_CTS_KEY_SIZE 32 > +#define FS_AES_256_XTS_KEY_SIZE 64 > +#define FS_MAX_KEY_SIZE 64 > + > +#define FS_KEY_DESC_PREFIX "fscrypt:" > +#define FS_KEY_DESC_PREFIX_SIZE 8 > + > +/* This is passed in from userspace into the kernel keyring */ > +struct fscrypt_key { > + __u32 mode; > + char raw[FS_MAX_KEY_SIZE]; > + __u32 size; > +} __packed; > + > +struct fscrypt_info { > + char ci_data_mode; > + char ci_filename_mode; > + char ci_flags; ditto > + struct crypto_ablkcipher *ci_ctfm; > + struct key *ci_keyring_key; > + char ci_master_key[FS_KEY_DESCRIPTOR_SIZE]; > +}; > + > +#define FS_CTX_REQUIRES_FREE_ENCRYPT_FL 0x0001 > +#define FS_WRITE_PATH_FL 0x0002 > + > +struct fscrypt_ctx { > + union { > + struct { > + struct page *bounce_page; /* Ciphertext page */ > + struct page *control_page; /* Original page */ > + } w; > + struct { > + struct bio *bio; > + struct work_struct work; > + } r; > + struct list_head free_list; /* Free list */ > + }; > + char flags; /* Flags */ > + char mode; /* Encryption mod
Re: [PATCH 06/10] fs crypto: add Makefile and Kconfig
On 02/25/16 11:26, Jaegeuk Kim wrote: > This patch adds a facility to enable per-file encryption. > > Arnd fixes a missing CONFIG_BLOCK check in the original patch. > "The newly added generic crypto abstraction for file systems operates > on 'struct bio' objects, which do not exist when CONFIG_BLOCK is > disabled: > > fs/crypto/crypto.c: In function 'fscrypt_zeroout_range': > fs/crypto/crypto.c:308:9: error: implicit declaration of function 'bio_alloc' > [-Werror=implicit-function-declaration] > > This adds a Kconfig dependency that prevents FS_ENCRYPTION from being > enabled without BLOCK." > > Signed-off-by: Arnd Bergmann > Signed-off-by: Jaegeuk Kim > --- > fs/Kconfig | 2 ++ > fs/Makefile| 1 + > fs/crypto/Kconfig | 17 + > fs/crypto/Makefile | 2 ++ > 4 files changed, 22 insertions(+) > create mode 100644 fs/crypto/Kconfig > create mode 100644 fs/crypto/Makefile > > diff --git a/fs/Kconfig b/fs/Kconfig > index 9adee0d..9d75767 100644 > --- a/fs/Kconfig > +++ b/fs/Kconfig > @@ -84,6 +84,8 @@ config MANDATORY_FILE_LOCKING > > To the best of my knowledge this is dead code that no one cares about. > > +source "fs/crypto/Kconfig" > + > source "fs/notify/Kconfig" > > source "fs/quota/Kconfig" > diff --git a/fs/Makefile b/fs/Makefile > index 79f5225..47571e2 100644 > --- a/fs/Makefile > +++ b/fs/Makefile > @@ -30,6 +30,7 @@ obj-$(CONFIG_EVENTFD) += eventfd.o > obj-$(CONFIG_USERFAULTFD)+= userfaultfd.o > obj-$(CONFIG_AIO) += aio.o > obj-$(CONFIG_FS_DAX) += dax.o > +obj-y+= crypto/ > obj-$(CONFIG_FILE_LOCKING) += locks.o > obj-$(CONFIG_COMPAT) += compat.o compat_ioctl.o > obj-$(CONFIG_BINFMT_AOUT)+= binfmt_aout.o > diff --git a/fs/crypto/Kconfig b/fs/crypto/Kconfig > new file mode 100644 > index 000..9bea124e > --- /dev/null > +++ b/fs/crypto/Kconfig > @@ -0,0 +1,17 @@ > +config FS_ENCRYPTION > + bool "FS Encryption (Per-file encryption)" > + depends on BLOCK depends on CRYPTO since all of the CRYPTO_xxx below also depend on CRYPTO. 
> + select CRYPTO_AES > + select CRYPTO_CBC > + select CRYPTO_ECB > + select CRYPTO_XTS > + select CRYPTO_CTS > + select CRYPTO_CTR > + select CRYPTO_SHA256 > + select KEYS > + select ENCRYPTED_KEYS > + help > + Enable encryption of files and directories. This > + feature is similar to ecryptfs, but it is more memory > + efficient since it avoids caching the encrypted and > + decrypted pages in the page cache. > diff --git a/fs/crypto/Makefile b/fs/crypto/Makefile > new file mode 100644 > index 000..f9f68cd > --- /dev/null > +++ b/fs/crypto/Makefile > @@ -0,0 +1,2 @@ > +obj-y += fname.o > +obj-$(CONFIG_FS_ENCRYPTION) += crypto.o policy.o keyinfo.o > -- ~Randy
[PATCH 01/10] selftests/x86: In syscall_nt, test NT|TF as well
Setting TF prevents fastpath returns in most cases, which causes the test to fail on 32-bit kernels because 32-bit kernels do not, in fact, handle NT correctly on SYSENTER entries. The next patch will fix 32-bit kernels. Signed-off-by: Andy Lutomirski --- tools/testing/selftests/x86/syscall_nt.c | 57 +++- 1 file changed, 49 insertions(+), 8 deletions(-) diff --git a/tools/testing/selftests/x86/syscall_nt.c b/tools/testing/selftests/x86/syscall_nt.c index 60c06af4646a..a6ceff86c199 100644 --- a/tools/testing/selftests/x86/syscall_nt.c +++ b/tools/testing/selftests/x86/syscall_nt.c @@ -17,6 +17,9 @@ #include #include +#include +#include +#include #include #include @@ -26,6 +29,8 @@ # define WIDTH "l" #endif +static unsigned int nerrs; + static unsigned long get_eflags(void) { unsigned long eflags; @@ -39,16 +44,52 @@ static void set_eflags(unsigned long eflags) : : "rm" (eflags) : "flags"); } -int main() +static void sethandler(int sig, void (*handler)(int, siginfo_t *, void *), + int flags) { - printf("[RUN]\tSet NT and issue a syscall\n"); - set_eflags(get_eflags() | X86_EFLAGS_NT); + struct sigaction sa; + memset(&sa, 0, sizeof(sa)); + sa.sa_sigaction = handler; + sa.sa_flags = SA_SIGINFO | flags; + sigemptyset(&sa.sa_mask); + if (sigaction(sig, &sa, 0)) + err(1, "sigaction"); +} + +static void sigtrap(int sig, siginfo_t *si, void *ctx_void) +{ +} + +static void do_it(unsigned long extraflags) +{ + unsigned long flags; + + set_eflags(get_eflags() | extraflags); syscall(SYS_getpid); - if (get_eflags() & X86_EFLAGS_NT) { - printf("[OK]\tThe syscall worked and NT is still set\n"); - return 0; + flags = get_eflags(); + if ((flags & extraflags) == extraflags) { + printf("[OK]\tThe syscall worked and flags are still set\n"); } else { - printf("[FAIL]\tThe syscall worked but NT was cleared\n"); - return 1; + printf("[FAIL]\tThe syscall worked but flags were cleared (flags = 0x%lx but expected 0x%lx set)\n", + flags, extraflags); + nerrs++; } } + +int main() +{ + 
printf("[RUN]\tSet NT and issue a syscall\n"); + do_it(X86_EFLAGS_NT); + + /* +* Now try it again with TF set -- TF forces returns via IRET in all +* cases except non-ptregs-using 64-bit full fast path syscalls. +*/ + + sethandler(SIGTRAP, sigtrap, 0); + + printf("[RUN]\tSet NT|TF and issue a syscall\n"); + do_it(X86_EFLAGS_NT | X86_EFLAGS_TF); + + return nerrs == 0 ? 0 : 1; +} -- 2.5.0
[PATCH 00/10] x86: Various SYSENTER/SYSEXIT/#DB fixes and cleanups
hpa asked me to get rid of the ASM_CLAC at the beginning of the
SYSENTER path. Little did he know...

This series makes the observed behavior of SYSENTER wrt flags the
same for all sane flags and kernel bitnesses. That is, SYSENTER
preserves flags now unless you do a syscall that explicitly changes
flags, and the HW flags that the syscall executes with are
sanitized. This includes NT, TF, AC and all arithmetic flags.
Prior to this series, 32-bit kernels clobbered TF and the arithmetic
flags and behaved highly erratically if NT was set.

(If IF is cleared by evil userspace when SYSENTER starts, IF will be
set again on return. There's nothing the kernel can do about this
-- SYSENTER inherently forgets the state of IF.)

This series speeds up SYSENTER on all kernels by a surprisingly
large amount on Skylake because it eliminates an unconditional CLAC.

While SYSENTER used to handle TF correctly as far as I can tell on
64-bit kernels, the means by which it did so was heavily tangled up
in the ptrace single-step logic. It now works just like all the
other kernel entries except insofar as do_debug has a simple special
case for it. Relatedly, the bizarre and poorly explained old fixup
in do_debug is now hidden behind a WARN_ON_ONCE in preparation for
deleting it at some point.

The code that fixed up NMI and #DB early in SYSENTER in 32-bit
kernels used to be both terrifying and incorrect. (It doesn't
appear to have been exploitably bad, but the reason for that is
subtle, and the code was certainly more fragile than it deserved to
be.) We still need a special fixup, but it's much simpler now.

While I was doing all this, I also noticed that DR6 and BTF handling
in do_debug was a bit off. Two of the patches in here try to fix it
up.

Have fun!

tl;dr: Cleanups and sanity fixes here, but no security fixes, and I
don't think anything needs to be backported or put in x86/urgent.

This series applies to the result of merging tip:x86/asm and
tip:x86/urgent. I've been testing on a somewhat bastardized base,
because tip currently doesn't work on my laptop in 32-bit mode.
(That bug is fixed in Linus' tree.)

Andy Lutomirski (10):
  selftests/x86: In syscall_nt, test NT|TF as well
  x86/entry/compat: In SYSENTER, sink AC clearing below the existing FLAGS test
  x86/entry/32: Filter NT and speed up AC filtering in SYSENTER
  x86/entry/32: Restore FLAGS on SYSEXIT
  x86/traps: Clear TIF_BLOCKSTEP on all debug exceptions
  x86/traps: Clear DR6 early in do_debug and improve the comment
  x86/entry: Vastly simplify SYSENTER TF handling
  x86/entry: Only allocate space for SYSENTER_stack if needed
  x86/entry/32: Simplify and fix up the SYSENTER stack #DB/NMI fixup
  x86/entry/32: Add and check a stack canary for the SYSENTER stack

 arch/x86/entry/entry_32.S                | 182 ++-
 arch/x86/entry/entry_64_compat.S         |  15 ++-
 arch/x86/include/asm/processor.h         |   5 +-
 arch/x86/include/asm/proto.h             |  15 ++-
 arch/x86/kernel/asm-offsets_32.c         |   5 +
 arch/x86/kernel/process.c                |   3 +
 arch/x86/kernel/traps.c                  |  87 ---
 tools/testing/selftests/x86/syscall_nt.c |  57 --
 8 files changed, 263 insertions(+), 106 deletions(-)

--
2.5.0
[PATCH 02/10] x86/entry/compat: In SYSENTER, sink AC clearing below the existing FLAGS test
CLAC is slow, and the SYSENTER code already has an unlikely path that runs if unusual flags are set. Drop the CLAC and instead rely on the unlikely path to clear AC. This seems to save ~24 cycles on my Skylake laptop. (Hey, Intel, make this faster please!) Signed-off-by: Andy Lutomirski --- arch/x86/entry/entry_64_compat.S | 8 +++- 1 file changed, 3 insertions(+), 5 deletions(-) diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S index 89bcb4979e7a..7c8e72da7654 100644 --- a/arch/x86/entry/entry_64_compat.S +++ b/arch/x86/entry/entry_64_compat.S @@ -66,8 +66,6 @@ ENTRY(entry_SYSENTER_compat) */ pushfq /* pt_regs->flags (except IF = 0) */ orl $X86_EFLAGS_IF, (%rsp) /* Fix saved flags */ - ASM_CLAC/* Clear AC after saving FLAGS */ - pushq $__USER32_CS/* pt_regs->cs */ xorq%r8,%r8 pushq %r8 /* pt_regs->ip = 0 (placeholder) */ @@ -90,9 +88,9 @@ ENTRY(entry_SYSENTER_compat) cld /* -* Sysenter doesn't filter flags, so we need to clear NT +* Sysenter doesn't filter flags, so we need to clear NT and AC * ourselves. To save a few cycles, we can check whether -* NT was set instead of doing an unconditional popfq. +* either was set instead of doing an unconditional popfq. * This needs to happen before enabling interrupts so that * we don't get preempted with NT set. * @@ -102,7 +100,7 @@ ENTRY(entry_SYSENTER_compat) * we're keeping that code behind a branch which will predict as * not-taken and therefore its instructions won't be fetched. */ - testl $X86_EFLAGS_NT, EFLAGS(%rsp) + testl $X86_EFLAGS_NT|X86_EFLAGS_AC, EFLAGS(%rsp) jnz .Lsysenter_fix_flags .Lsysenter_flags_fixed: -- 2.5.0
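The idea of replacing an unconditional CLAC with the existing flag test can be pictured in plain C. This is only a rough userspace sketch of the check: the SK_ names and needs_flag_fixup() are hypothetical, the bit values are the EFLAGS positions as I understand them (NT is bit 14, AC is bit 18, TF is bit 8), and what the out-of-line fixup does afterwards is not modeled.

```c
/* EFLAGS bits used below; values are assumptions stated in the text above. */
#define SK_EFLAGS_TF 0x00000100UL	/* trap flag (not part of this mask) */
#define SK_EFLAGS_NT 0x00004000UL	/* nested task */
#define SK_EFLAGS_AC 0x00040000UL	/* alignment check / SMAP override */

/* Flags that send the entry code down the unlikely fixup path. */
#define SK_FIXUP_MASK (SK_EFLAGS_NT | SK_EFLAGS_AC)

/*
 * One test against a combined mask, mirroring the single testl in the
 * patch: the common case (neither flag set) costs one compare and a
 * not-taken branch, and AC is only cleared when it was actually set.
 */
static int needs_flag_fixup(unsigned long saved_flags)
{
	return (saved_flags & SK_FIXUP_MASK) != 0;
}
```

With ordinary user flags such as 0x202 the function returns 0, so the fast path never pays for clearing AC; setting either NT or AC makes it return 1.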
[PATCH 06/10] x86/traps: Clear DR6 early in do_debug and improve the comment
Leaving any bits set in DR6 on return from a debug exception is asking for trouble. Prevent it by writing zero right away and clarify the comment. Signed-off-by: Andy Lutomirski --- arch/x86/kernel/traps.c | 15 --- 1 file changed, 12 insertions(+), 3 deletions(-) diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c index 19e6cfa501e3..6dddc220e3ed 100644 --- a/arch/x86/kernel/traps.c +++ b/arch/x86/kernel/traps.c @@ -593,6 +593,18 @@ dotraplinkage void do_debug(struct pt_regs *regs, long error_code) ist_enter(regs); get_debugreg(dr6, 6); + /* +* The Intel SDM says: +* +* Certain debug exceptions may clear bits 0-3. The remaining +* contents of the DR6 register are never cleared by the +* processor. To avoid confusion in identifying debug +* exceptions, debug handlers should clear the register before +* returning to the interrupted task. +* +* Keep it simple: clear DR6 immediately. +*/ + set_debugreg(0, 6); /* Filter out all the reserved bits which are preset to 1 */ dr6 &= ~DR6_RESERVED; @@ -616,9 +628,6 @@ dotraplinkage void do_debug(struct pt_regs *regs, long error_code) if ((dr6 & DR_STEP) && kmemcheck_trap(regs)) goto exit; - /* DR6 may or may not be cleared by the CPU */ - set_debugreg(0, 6); - /* Store the virtualized DR6 value */ tsk->thread.debugreg6 = dr6; -- 2.5.0
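The ordering this patch argues for (read DR6, clear it at once, then mask the reserved bits) can be tried out in userspace with a stand-in variable. This is a minimal sketch under stated assumptions: fake_dr6 replaces the real register, the SK_ constants are my reading of the x86 DR6 layout (reserved bits 0xFFFF0FF0, single-step bit 0x4000), and the model assumes reserved bits keep reading back as 1 after the write of zero.

```c
/* Assumed DR6 layout, see the note above. */
#define SK_DR6_RESERVED 0xFFFF0FF0UL	/* bits that are preset to 1 */
#define SK_DR_STEP      0x00004000UL	/* BS: single-step trap */

/* Stand-in for the hardware register; real code uses get/set_debugreg(). */
static unsigned long fake_dr6 = SK_DR6_RESERVED;

/*
 * Read DR6, clear it right away so no stale bits survive until the next
 * debug exception, and return only the bits that carry information.
 */
static unsigned long read_and_clear_dr6(void)
{
	unsigned long dr6 = fake_dr6;	/* models get_debugreg(dr6, 6) */

	/* models set_debugreg(0, 6); reserved bits still read as 1 here */
	fake_dr6 = SK_DR6_RESERVED;

	return dr6 & ~SK_DR6_RESERVED;	/* filter out the preset-to-1 bits */
}
```

Because the clear happens before any early exit, a second call without a new trap reports nothing, which is the property the new comment asks for.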
[PATCH 07/10] x86/entry: Vastly simplify SYSENTER TF handling
Due to a blatant design error, SYSENTER doesn't clear TF. As a result, if a user does SYSENTER with TF set, we will single-step through the kernel until something clears TF. There is absolutely nothing we can do to prevent this short of turning off SYSENTER [1]. Simplify the handling considerably with two changes: 1. We already sanitize EFLAGS in SYSENTER to clear NT and AC. We can add TF to that list of flags to sanitize with no overhead whatsoever. 2. Teach do_debug to ignore single-step traps in the SYSENTER prologue. That's all we need to do. Don't get too excited -- our handling is still buggy on 32-bit kernels. There's nothing wrong with the SYSENTER code itself, but the #DB prologue has a clever fixup for traps on the very first instruction of entry_SYSENTER_32, and the fixup doesn't work quite correctly. The next two patches will fix that. [1] We could probably prevent it by forcing BTF on at all times and making sure we clear TF before any branches in the SYSENTER code. Needless to say, this is a bad idea. Signed-off-by: Andy Lutomirski --- arch/x86/entry/entry_32.S| 42 ++-- arch/x86/entry/entry_64_compat.S | 9 ++- arch/x86/include/asm/proto.h | 15 ++-- arch/x86/kernel/traps.c | 52 +--- 4 files changed, 94 insertions(+), 24 deletions(-) diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S index ed171f938960..752d4f031a18 100644 --- a/arch/x86/entry/entry_32.S +++ b/arch/x86/entry/entry_32.S @@ -287,7 +287,26 @@ need_resched: END(resume_kernel) #endif - # SYSENTER call handler stub +GLOBAL(__begin_SYSENTER_singlestep_region) +/* + * All code from here through __end_SYSENTER_singlestep_region is subject + * to being single-stepped if a user program sets TF and executes SYSENTER. + * There is absolutely nothing that we can do to prevent this from happening + * (thanks Intel!). 
To keep our handling of this situation as simple as + * possible, we handle TF just like AC and NT, except that our #DB handler + * will ignore all of the single-step traps generated in this range. + */ + +#ifdef CONFIG_XEN +/* + * Xen doesn't set %esp to be precisely what the normal SYSENTER + * entry point expects, so fix it up before using the normal path. + */ +ENTRY(xen_sysenter_target) + addl$5*4, %esp /* remove xen-provided frame */ + jmp sysenter_past_esp +#endif + ENTRY(entry_SYSENTER_32) movlTSS_sysenter_sp0(%esp), %esp sysenter_past_esp: @@ -301,19 +320,25 @@ sysenter_past_esp: SAVE_ALL pt_regs_ax=$-ENOSYS/* save rest */ /* -* Sysenter doesn't filter flags, so we need to clear NT and AC -* ourselves. To save a few cycles, we can check whether +* Sysenter doesn't filter flags, so we need to clear NT, AC +* and TF ourselves. To save a few cycles, we can check whether * either was set instead of doing an unconditional popfq. * This needs to happen before enabling interrupts so that * we don't get preempted with NT set. * +* If TF is set, we will single-step all the way to here -- do_debug +* will ignore all the traps. (Yes, this is slow, but so is +* single-stepping in general. This allows us to avoid having +* a more complicated code to handle the case where a user program +* forces us to single-step through the SYSENTER entry code.) +* * NB.: .Lsysenter_fix_flags is a label with the code under it moved * out-of-line as an optimization: NT is unlikely to be set in the * majority of the cases and instead of polluting the I$ unnecessarily, * we're keeping that code behind a branch which will predict as * not-taken and therefore its instructions won't be fetched. 
*/ - testl $X86_EFLAGS_NT|X86_EFLAGS_AC, PT_EFLAGS(%esp) + testl $X86_EFLAGS_NT|X86_EFLAGS_AC|X86_EFLAGS_TF, PT_EFLAGS(%esp) jnz .Lsysenter_fix_flags .Lsysenter_flags_fixed: @@ -369,6 +394,7 @@ sysenter_past_esp: pushl $X86_EFLAGS_FIXED popfl jmp .Lsysenter_flags_fixed +GLOBAL(__end_SYSENTER_singlestep_region) ENDPROC(entry_SYSENTER_32) # system call handler stub @@ -662,14 +688,6 @@ ENTRY(spurious_interrupt_bug) END(spurious_interrupt_bug) #ifdef CONFIG_XEN -/* - * Xen doesn't set %esp to be precisely what the normal SYSENTER - * entry point expects, so fix it up before using the normal path. - */ -ENTRY(xen_sysenter_target) - addl$5*4, %esp /* remove xen-provided frame */ - jmp sysenter_past_esp - ENTRY(xen_hypervisor_callback) pushl $-1 /* orig_ax = -1 => not a system call */ SAVE_ALL diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S index 7c8e72da7654..6aec75b41b06 100
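The traps.c half of this patch is cut off above, so the following is only a guess at the shape of the check that do_debug would need, written as a user-space sketch. `ignore_sysenter_single_step` is a hypothetical helper, and `region_begin`/`region_end` stand in for the addresses of __begin_SYSENTER_singlestep_region and __end_SYSENTER_singlestep_region.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DR_STEP 0x00004000u /* DR6.BS: single-step trap */

/*
 * Hypothetical helper: decide whether a #DB should be silently
 * ignored because it is a single-step trap taken while the kernel
 * was still inside the SYSENTER prologue with the user's TF set.
 */
static bool ignore_sysenter_single_step(uint32_t dr6, bool from_user_mode,
					uintptr_t ip,
					uintptr_t region_begin,
					uintptr_t region_end)
{
	assert(region_begin <= region_end);

	if (!(dr6 & DR_STEP))
		return false; /* not a single-step trap */
	if (from_user_mode)
		return false; /* ordinary user single-stepping: deliver it */

	/* The end label sits after the last instruction, so it is exclusive. */
	return ip >= region_begin && ip < region_end;
}
```

The trade-off described in the commit message shows up here: the handler runs once per instruction in the region, but each invocation is a cheap range test rather than a special-case stack fixup.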
[PATCH 08/10] x86/entry: Only allocate space for SYSENTER_stack if needed
The SYSENTER stack is only used on 32-bit kernels. Remove it in 64-bit kernels. (We may end up using it down the road on 64-bit kernels. If so, we'll re-enable it for CONFIG_IA32_EMULATION.) Signed-off-by: Andy Lutomirski --- arch/x86/include/asm/processor.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index ecb410310e70..7cd01b71b5bd 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -297,10 +297,12 @@ struct tss_struct { */ unsigned long io_bitmap[IO_BITMAP_LONGS + 1]; +#ifdef CONFIG_X86_32 /* * Space for the temporary SYSENTER stack: */ unsigned long SYSENTER_stack[64]; +#endif } cacheline_aligned; -- 2.5.0
[PATCH 10/10] x86/entry/32: Add and check a stack canary for the SYSENTER stack
Signed-off-by: Andy Lutomirski --- arch/x86/include/asm/processor.h | 3 ++- arch/x86/kernel/process.c| 3 +++ arch/x86/kernel/traps.c | 8 3 files changed, 13 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index 7cd01b71b5bd..50a6dc871cc0 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -299,8 +299,9 @@ struct tss_struct { #ifdef CONFIG_X86_32 /* -* Space for the temporary SYSENTER stack: +* Space for the temporary SYSENTER stack. */ + unsigned long SYSENTER_stack_canary; unsigned long SYSENTER_stack[64]; #endif diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c index 9f7c21c22477..ee9a9792caeb 100644 --- a/arch/x86/kernel/process.c +++ b/arch/x86/kernel/process.c @@ -57,6 +57,9 @@ __visible DEFINE_PER_CPU_SHARED_ALIGNED(struct tss_struct, cpu_tss) = { */ .io_bitmap = { [0 ... IO_BITMAP_LONGS] = ~0 }, #endif +#ifdef CONFIG_X86_32 + .SYSENTER_stack_canary = STACK_END_MAGIC, +#endif }; EXPORT_PER_CPU_SYMBOL(cpu_tss); diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c index 80928ea78373..590110119e6a 100644 --- a/arch/x86/kernel/traps.c +++ b/arch/x86/kernel/traps.c @@ -713,6 +713,14 @@ dotraplinkage void do_debug(struct pt_regs *regs, long error_code) debug_stack_usage_dec(); exit: +#if defined(CONFIG_X86_32) + /* +* This is the most likely code path that involves non-trivial use +* of the SYSENTER stack. Check that we haven't overrun it. +*/ + WARN(this_cpu_read(cpu_tss.SYSENTER_stack_canary) != STACK_END_MAGIC, +"Overran or corrupted SYSENTER stack\n"); +#endif ist_exit(regs); } NOKPROBE_SYMBOL(do_debug); -- 2.5.0
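The canary idea in this patch can be illustrated outside the kernel. A minimal sketch, assuming only the layout shown in the diff (canary word directly below the 64-word stack) and the mainline value of STACK_END_MAGIC; `struct sysenter_area` and both helpers are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define STACK_END_MAGIC 0x57AC6E9DUL

/* Hypothetical mirror of the two tss_struct members added by the patch. */
struct sysenter_area {
	unsigned long canary;    /* lowest address: first victim of an overrun */
	unsigned long stack[64]; /* grows down, toward the canary */
};

static void sysenter_area_init(struct sysenter_area *area)
{
	assert(area != NULL);
	memset(area, 0, sizeof(*area));
	area->canary = STACK_END_MAGIC; /* like the cpu_tss initializer */
}

/* Models the WARN() condition at the end of do_debug. */
static bool sysenter_stack_intact(const struct sysenter_area *area)
{
	assert(area != NULL);
	return area->canary == STACK_END_MAGIC;
}
```

Because x86 stacks grow downward, a push that runs past `stack[0]` lands on the canary first, so a changed canary is a cheap after-the-fact indication that the tiny stack was overrun or otherwise corrupted.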
[PATCH 09/10] x86/entry/32: Simplify and fix up the SYSENTER stack #DB/NMI fixup
Right after SYSENTER, we can get a #DB or NMI. On x86_32, there's no IST, so the exception handler is invoked on the temporary SYSENTER stack. Because the SYSENTER stack is very small, we have a fixup to switch off the stack quickly when this happens. The old fixup had several issues: 1. It checked the interrupt frame's CS and EIP. This wasn't obviously correct on Xen or if vm86 mode was in use [1]. 2. In the NMI handler, it did some frightening digging into the stack frame. I'm not convinced this digging was correct. 3. The fixup didn't switch stacks and then switch back. Instead, it synthesized a brand new stack frame that would redirect the IRET back to the SYSENTER code. That frame was highly questionable. For one thing, if NMI nested inside #DB, we would effectively abort the #DB prologue, which was probably safe but was frightening. For another, the code used PUSHFL to write the FLAGS portion of the frame, which was simply bogus -- by the time PUSHFL was called, at least TF, NT, VM, and all of the arithmetic flags were clobbered. Simplify this considerably. Instead of looking at the saved frame to see where we came from, check the hardware ESP register against the SYSENTER stack directly. Malicious user code cannot spoof the kernel ESP register, and by moving the check after SAVE_ALL, we can use normal PER_CPU accesses to find all the relevant addresses. With this patch applied, the improved syscall_nt_32 test finally passes on 32-bit kernels. [1] It isn't obviously correct, but it is nonetheless safe from vm86 shenanigans as far as I can tell. A user can't point EIP at entry_SYSENTER_32 while in vm86 mode because entry_SYSENTER_32, like all kernel addresses, is greater than 0x and would thus violate the CS segment limit. 
Signed-off-by: Andy Lutomirski --- arch/x86/entry/entry_32.S| 114 ++- arch/x86/kernel/asm-offsets_32.c | 5 ++ 2 files changed, 56 insertions(+), 63 deletions(-) diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S index 752d4f031a18..99bf636a6eaf 100644 --- a/arch/x86/entry/entry_32.S +++ b/arch/x86/entry/entry_32.S @@ -987,51 +987,48 @@ error_code: jmp ret_from_exception END(page_fault) -/* - * Debug traps and NMI can happen at the one SYSENTER instruction - * that sets up the real kernel stack. Check here, since we can't - * allow the wrong stack to be used. - * - * "TSS_sysenter_sp0+12" is because the NMI/debug handler will have - * already pushed 3 words if it hits on the sysenter instruction: - * eflags, cs and eip. - * - * We just load the right stack, and push the three (known) values - * by hand onto the new stack - while updating the return eip past - * the instruction that would have done it for sysenter. - */ -.macro FIX_STACK offset ok label - cmpw$__KERNEL_CS, 4(%esp) - jne \ok -\label: - movlTSS_sysenter_sp0 + \offset(%esp), %esp - pushfl - pushl $__KERNEL_CS - pushl $sysenter_past_esp -.endm - ENTRY(debug) + /* +* #DB can happen at the first instruction of +* entry_SYSENTER_32 or in Xen's SYSENTER prologue. If this +* happens, then we will be running on a very small stack. We +* need to detect this condition and switch to the thread +* stack before calling any C code at all. +* +* If you edit this code, keep in mind that NMIs can happen in here. +*/ ASM_CLAC - cmpl$entry_SYSENTER_32, (%esp) - jne debug_stack_correct - FIX_STACK 12, debug_stack_correct, debug_esp_fix_insn -debug_stack_correct: pushl $-1 # mark this as an int SAVE_ALL - TRACE_IRQS_OFF xorl%edx, %edx # error code 0 movl%esp, %eax # pt_regs pointer + + /* Are we currently on the SYSENTER stack? 
*/ + PER_CPU(cpu_tss + CPU_TSS_SYSENTER_stack + SIZEOF_SYSENTER_stack, %ecx) + subl %eax, %ecx /* ecx = (end of SYSENTER_stack) - esp */ + cmpl $SIZEOF_SYSENTER_stack, %ecx + jb .Ldebug_from_sysenter_stack + + TRACE_IRQS_OFF + call do_debug + jmp ret_from_exception + +.Ldebug_from_sysenter_stack: + /* We're on the SYSENTER stack. Switch off. */ + movl %esp, %ebp + movl PER_CPU_VAR(cpu_current_top_of_stack), %esp + TRACE_IRQS_OFF call do_debug + movl %ebp, %esp jmp ret_from_exception END(debug) /* - * NMI is doubly nasty. It can happen _while_ we're handling - * a debug fault, and the debug fault hasn't yet been able to - * clear up the stack. So we first check whether we got an - * NMI on the sysenter entry path, but after that we need to - * check whether we got an NMI on the debug path where the debug - * fault happened on the sysenter path. + * NMI is
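The subl/cmpl/jb sequence in the new #DB entry code is a single unsigned range test. A user-space sketch of the same arithmetic, with `on_sysenter_stack` as a hypothetical name and the stack bounds passed in as plain integers:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper mirroring:
 *     ecx = end_of_stack - esp
 *     cmpl $SIZEOF_SYSENTER_stack, %ecx ; jb on_stack
 * The subtraction is unsigned, so an esp above the end of the stack
 * wraps to a huge value and fails the comparison, as does an esp
 * below the stack. One compare therefore checks both bounds.
 */
static bool on_sysenter_stack(uintptr_t sp, uintptr_t stack_base,
			      uintptr_t stack_size)
{
	uintptr_t stack_end = stack_base + stack_size;

	assert(stack_size > 0);
	return stack_end - sp < stack_size;
}
```

Note that the accepted range is (base, end]: an esp exactly at the end of the array counts as "on the stack" (nothing pushed yet), while an esp exactly at the base does not.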
[PATCH 05/10] x86/traps: Clear TIF_BLOCKSTEP on all debug exceptions
The SDM says that debug exceptions clear BTF, and we need to keep TIF_BLOCKSTEP in sync with BTF. Clear it unconditionally and improve the comment. I suspect that the fact that kmemcheck could cause TIF_BLOCKSTEP not to be cleared was just an oversight. Signed-off-by: Andy Lutomirski --- arch/x86/kernel/traps.c | 12 +++- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c index dd2c2e66c2e1..19e6cfa501e3 100644 --- a/arch/x86/kernel/traps.c +++ b/arch/x86/kernel/traps.c @@ -598,6 +598,13 @@ dotraplinkage void do_debug(struct pt_regs *regs, long error_code) dr6 &= ~DR6_RESERVED; /* +* The SDM says "The processor clears the BTF flag when it +* generates a debug exception." Clear TIF_BLOCKSTEP to keep +* TIF_BLOCKSTEP in sync with the hardware BTF flag. +*/ + clear_tsk_thread_flag(tsk, TIF_BLOCKSTEP); + + /* * If dr6 has no reason to give us about the origin of this trap, * then it's very likely the result of an icebp/int01 trap. * User wants a sigtrap for that. @@ -612,11 +619,6 @@ dotraplinkage void do_debug(struct pt_regs *regs, long error_code) /* DR6 may or may not be cleared by the CPU */ set_debugreg(0, 6); - /* -* The processor cleared BTF, so don't mark that we need it set. -*/ - clear_tsk_thread_flag(tsk, TIF_BLOCKSTEP); - /* Store the virtualized DR6 value */ tsk->thread.debugreg6 = dr6; -- 2.5.0
[PATCH 04/10] x86/entry/32: Restore FLAGS on SYSEXIT
We weren't restoring FLAGS at all on SYSEXIT. Apparently no one cared. With this patch applied, native kernels should always honor task_pt_regs()->flags, which opens the door for some sys_iopl cleanups. I'll do those as a separate series, though, since getting it right will involve tweaking some paravirt ops. (The short version is that, before this patch, sys_iopl, invoked via SYSENTER, wasn't guaranteed to ever transfer the updated regs->flags, so sys_iopl had to change the hardware flags register as well.) Reported-by: Brian Gerst Signed-off-by: Andy Lutomirski --- arch/x86/entry/entry_32.S | 9 + 1 file changed, 9 insertions(+) diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S index 263ebde6333f..ed171f938960 100644 --- a/arch/x86/entry/entry_32.S +++ b/arch/x86/entry/entry_32.S @@ -343,6 +343,15 @@ sysenter_past_esp: popl %eax /* pt_regs->ax */ /* +* Restore all flags except IF (we restore IF separately because +* STI gives a one-instruction window in which we won't be interrupted, +* whereas POPF does not.) +*/ + addl $PT_EFLAGS-PT_DS, %esp /* point esp at pt_regs->flags */ + btr $X86_EFLAGS_IF_BIT, (%esp) + popfl + + /* * Return back to the vDSO, which will pop ecx and edx. * Don't bother with DS and ES (they already contain __USER_DS). */ -- 2.5.0
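The btr/popfl pair added here can be modelled as two steps on a flags word. A minimal sketch: both helpers are hypothetical, and the separate STI step is an assumption taken from the comment in the patch rather than from code shown in this hunk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define X86_EFLAGS_IF_BIT 9
#define X86_EFLAGS_IF     (1u << X86_EFLAGS_IF_BIT)

/* Models "btr $X86_EFLAGS_IF_BIT, (%esp); popfl": everything but IF. */
static void restore_flags_except_if(uint32_t saved_flags, uint32_t *live_flags)
{
	assert(live_flags != NULL);
	*live_flags = saved_flags & ~X86_EFLAGS_IF;
}

/* Models the later STI (assumed), which turns interrupts back on. */
static void enable_interrupts(uint32_t *live_flags)
{
	assert(live_flags != NULL);
	*live_flags |= X86_EFLAGS_IF;
}
```

Splitting the restore this way keeps interrupts off until the final STI, whose one-instruction shadow covers the return to user mode; a plain POPF with IF set would open the window immediately.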
[PATCH 03/10] x86/entry/32: Filter NT and speed up AC filtering in SYSENTER
This makes the 32-bit code work just like the 64-bit code. It should speed up syscalls on 32-bit kernels on Skylake by something like 20 cycles (by analogy to the 64-bit compat case). It also cleans up NT just like we do for the 64-bit case. Signed-off-by: Andy Lutomirski --- arch/x86/entry/entry_32.S | 23 ++- 1 file changed, 22 insertions(+), 1 deletion(-) diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S index ab710eee4308..263ebde6333f 100644 --- a/arch/x86/entry/entry_32.S +++ b/arch/x86/entry/entry_32.S @@ -294,7 +294,6 @@ sysenter_past_esp: pushl $__USER_DS /* pt_regs->ss */ pushl %ebp/* pt_regs->sp (stashed in bp) */ pushfl /* pt_regs->flags (except IF = 0) */ - ASM_CLAC/* Clear AC after saving FLAGS */ orl $X86_EFLAGS_IF, (%esp) /* Fix IF */ pushl $__USER_CS /* pt_regs->cs */ pushl $0 /* pt_regs->ip = 0 (placeholder) */ @@ -302,6 +301,23 @@ sysenter_past_esp: SAVE_ALL pt_regs_ax=$-ENOSYS/* save rest */ /* +* Sysenter doesn't filter flags, so we need to clear NT and AC +* ourselves. To save a few cycles, we can check whether +* either was set instead of doing an unconditional popfq. +* This needs to happen before enabling interrupts so that +* we don't get preempted with NT set. +* +* NB.: .Lsysenter_fix_flags is a label with the code under it moved +* out-of-line as an optimization: NT is unlikely to be set in the +* majority of the cases and instead of polluting the I$ unnecessarily, +* we're keeping that code behind a branch which will predict as +* not-taken and therefore its instructions won't be fetched. +*/ + testl $X86_EFLAGS_NT|X86_EFLAGS_AC, PT_EFLAGS(%esp) + jnz .Lsysenter_fix_flags +.Lsysenter_flags_fixed: + + /* * User mode is traced as though IRQs are on, and SYSENTER * turned them off. */ @@ -339,6 +355,11 @@ sysenter_past_esp: .popsection _ASM_EXTABLE(1b, 2b) PTGS_TO_GS_EX + +.Lsysenter_fix_flags: + pushl $X86_EFLAGS_FIXED + popfl + jmp .Lsysenter_flags_fixed ENDPROC(entry_SYSENTER_32) # system call handler stub -- 2.5.0
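The fast-path test and the out-of-line fixup in this patch can be modelled in a few lines of C. This is only a sketch of the control flow under the standard EFLAGS bit values; `struct sysenter_entry` and `sysenter_sanitize_flags` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define X86_EFLAGS_FIXED 0x00000002u /* bit 1, always set */
#define X86_EFLAGS_IF    0x00000200u
#define X86_EFLAGS_NT    0x00004000u
#define X86_EFLAGS_AC    0x00040000u

/* Hypothetical model of the state right after SAVE_ALL. */
struct sysenter_entry {
	uint32_t live_flags;  /* what the CPU is running with */
	uint32_t saved_flags; /* pt_regs->flags, as pushed by pushfl */
};

/*
 * Models "testl $NT|AC, PT_EFLAGS(%esp); jnz .Lsysenter_fix_flags".
 * Returns true when the rare slow path ran.
 */
static bool sysenter_sanitize_flags(struct sysenter_entry *e)
{
	assert(e != NULL);

	if (!(e->saved_flags & (X86_EFLAGS_NT | X86_EFLAGS_AC)))
		return false; /* common case: one test, one untaken branch */

	/* Slow path: pushl $X86_EFLAGS_FIXED; popfl */
	e->live_flags = X86_EFLAGS_FIXED;
	return true;
}
```

The saved copy is deliberately left alone: user space gets its own NT and AC back on return, while the kernel runs with both clear.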
Re: [PATCH v3 3/3] sched, x86: Check that we're on the right stack in schedule and __might_sleep
On Wed, Nov 19, 2014 at 11:44 AM, Linus Torvalds wrote: > On Wed, Nov 19, 2014 at 11:29 AM, Andi Kleen wrote: >> >> The exception handlers which use the IST stacks don't necessarily >> set irq count. Maybe they should. > > Hmm. I think they should. Since they clearly must not schedule, as > they use a percpu stack. > > Which exceptions use IST? > > [ grep grep ] > > Looks like stack, doublefault, nmi, debug and mce. And yes, I really > think they should all raise the irq count if they don't already. > Rather than add random arch-specific "let's check that we're on the > right stack" code to the might-sleep stuff, just use the one we have. > Resurrecting an old thread: The outcome of this discussion was that ist_enter now raises HARDIRQ_COUNT. I think this is causing a problem. If a user program enables TF, it generates a bunch of debug exceptions. The handlers raise the IRQ count and do stuff, and apparently some of that stuff can raise a softirq. (I have no idea where the softirq is being raised.) The softirq code notices that we're in_interrupt and doesn't wake ksoftirqd because it thinks we're about to exit the interrupt and process the softirq. But we don't, which causes occasional warnings and confuses things (and me!). So how do we fix it? If we stop raising HARDIRQ_COUNT (and apply $SUBJECT?), then raise_softirq will wake ksoftirqd and life is good. But this seems a bit silly, since, if we entered the ist exception handler from a context with irqs on and softirqs enabled, we *could* plausibly handle the softirq right away -- we're on an essentially empty stack. (Of course, it's a *small* stack, since it could be the IST stack.) Or we could just let ksoftirqd do its thing and stop raising HARDIRQ_COUNT. We could add a new preempt count field just for IST (yuck). We could try to hijack a different preempt count field (NMI?). 
But I kind of like the idea of just reinstating the original patch of explicitly checking that we're on a safe stack in schedule and __might_sleep, since that is the actual condition we care about. --Andy
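To make the stranded-softirq scenario concrete, here is a small user-space model of it. Everything in it is a simplification: `struct cpu_model` and the `model_*` functions are hypothetical, the mask values are assumed from the mainline preempt_count layout, and the only behaviour modelled is "raise_softirq wakes ksoftirqd unless in_interrupt()".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed preempt_count layout: softirq bits 8-15, hardirq 16-19, NMI 20. */
#define SOFTIRQ_MASK   0x0000ff00u
#define HARDIRQ_MASK   0x000f0000u
#define NMI_MASK       0x00100000u
#define HARDIRQ_OFFSET 0x00010000u

struct cpu_model {
	unsigned int preempt_count;
	unsigned int softirq_pending;
	bool ksoftirqd_woken;
};

static bool model_in_interrupt(const struct cpu_model *cpu)
{
	return (cpu->preempt_count &
		(HARDIRQ_MASK | SOFTIRQ_MASK | NMI_MASK)) != 0;
}

/* Mark the softirq pending; wake ksoftirqd only outside interrupt context. */
static void model_raise_softirq(struct cpu_model *cpu, unsigned int nr)
{
	assert(cpu != NULL && nr < 32);
	cpu->softirq_pending |= 1u << nr;
	if (!model_in_interrupt(cpu))
		cpu->ksoftirqd_woken = true;
}
```

If an IST handler bumps `preempt_count` by HARDIRQ_OFFSET, raises a softirq, and then drops the count again without running the usual interrupt-exit softirq processing, the model ends with a pending bit set and nobody woken, which matches the symptom described above.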
[PATCH v4 3/5] ocfs2: create/remove sysfile for online file check
Create the online file check sysfile when an ocfs2 volume is mounted, and remove it when the volume is unmounted. Signed-off-by: Gang He Reviewed-by: Mark Fasheh --- fs/ocfs2/super.c | 5 + 1 file changed, 5 insertions(+) diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c index 2de4c8a..5ef88b8 100644 --- a/fs/ocfs2/super.c +++ b/fs/ocfs2/super.c @@ -74,6 +74,7 @@ #include "suballoc.h" #include "buffer_head_io.h" +#include "filecheck.h" static struct kmem_cache *ocfs2_inode_cachep; struct kmem_cache *ocfs2_dquot_cachep; @@ -1204,6 +1205,9 @@ static int ocfs2_fill_super(struct super_block *sb, void *data, int silent) /* Start this when the mount is almost sure of being successful */ ocfs2_orphan_scan_start(osb); + /* Create filecheck sysfile /sys/fs/ocfs2//filecheck */ + ocfs2_filecheck_create_sysfs(sb); + return status; read_super_error: @@ -1671,6 +1675,7 @@ static void ocfs2_put_super(struct super_block *sb) ocfs2_sync_blockdev(sb); ocfs2_dismount_volume(sb, 0); + ocfs2_filecheck_remove_sysfs(sb); } static int ocfs2_statfs(struct dentry *dentry, struct kstatfs *buf) -- 2.1.2
[PATCH v4 4/5] ocfs2: check/fix inode block for online file check
Implement online check or fix inode block during reading a inode block to memory. Signed-off-by: Gang He --- fs/ocfs2/inode.c | 225 +++-- fs/ocfs2/ocfs2_trace.h | 2 + 2 files changed, 218 insertions(+), 9 deletions(-) diff --git a/fs/ocfs2/inode.c b/fs/ocfs2/inode.c index 8f87e05..6ce531e 100644 --- a/fs/ocfs2/inode.c +++ b/fs/ocfs2/inode.c @@ -53,6 +53,7 @@ #include "xattr.h" #include "refcounttree.h" #include "ocfs2_trace.h" +#include "filecheck.h" #include "buffer_head_io.h" @@ -74,6 +75,14 @@ static int ocfs2_truncate_for_delete(struct ocfs2_super *osb, struct inode *inode, struct buffer_head *fe_bh); +static int ocfs2_filecheck_read_inode_block_full(struct inode *inode, +struct buffer_head **bh, +int flags, int type); +static int ocfs2_filecheck_validate_inode_block(struct super_block *sb, + struct buffer_head *bh); +static int ocfs2_filecheck_repair_inode_block(struct super_block *sb, + struct buffer_head *bh); + void ocfs2_set_inode_flags(struct inode *inode) { unsigned int flags = OCFS2_I(inode)->ip_attr; @@ -127,6 +136,7 @@ struct inode *ocfs2_ilookup(struct super_block *sb, u64 blkno) struct inode *ocfs2_iget(struct ocfs2_super *osb, u64 blkno, unsigned flags, int sysfile_type) { + int rc = 0; struct inode *inode = NULL; struct super_block *sb = osb->sb; struct ocfs2_find_inode_args args; @@ -161,12 +171,17 @@ struct inode *ocfs2_iget(struct ocfs2_super *osb, u64 blkno, unsigned flags, } trace_ocfs2_iget5_locked(inode->i_state); if (inode->i_state & I_NEW) { - ocfs2_read_locked_inode(inode, &args); + rc = ocfs2_read_locked_inode(inode, &args); unlock_new_inode(inode); } if (is_bad_inode(inode)) { iput(inode); - inode = ERR_PTR(-ESTALE); + if ((flags & OCFS2_FI_FLAG_FILECHECK_CHK) || + (flags & OCFS2_FI_FLAG_FILECHECK_FIX)) + /* Return OCFS2_FILECHECK_ERR_XXX related errno */ + inode = ERR_PTR(rc); + else + inode = ERR_PTR(-ESTALE); goto bail; } @@ -409,7 +424,7 @@ static int ocfs2_read_locked_inode(struct inode *inode, struct ocfs2_super *osb; struct 
ocfs2_dinode *fe; struct buffer_head *bh = NULL; - int status, can_lock; + int status, can_lock, lock_level = 0; u32 generation = 0; status = -EINVAL; @@ -477,7 +492,7 @@ static int ocfs2_read_locked_inode(struct inode *inode, mlog_errno(status); return status; } - status = ocfs2_inode_lock(inode, NULL, 0); + status = ocfs2_inode_lock(inode, NULL, lock_level); if (status) { make_bad_inode(inode); mlog_errno(status); @@ -494,16 +509,32 @@ static int ocfs2_read_locked_inode(struct inode *inode, } if (can_lock) { - status = ocfs2_read_inode_block_full(inode, &bh, -OCFS2_BH_IGNORE_CACHE); + if (args->fi_flags & OCFS2_FI_FLAG_FILECHECK_CHK) + status = ocfs2_filecheck_read_inode_block_full(inode, + &bh, OCFS2_BH_IGNORE_CACHE, 0); + else if (args->fi_flags & OCFS2_FI_FLAG_FILECHECK_FIX) + status = ocfs2_filecheck_read_inode_block_full(inode, + &bh, OCFS2_BH_IGNORE_CACHE, 1); + else + status = ocfs2_read_inode_block_full(inode, + &bh, OCFS2_BH_IGNORE_CACHE); } else { status = ocfs2_read_blocks_sync(osb, args->fi_blkno, 1, &bh); /* * If buffer is in jbd, then its checksum may not have been * computed as yet. */ - if (!status && !buffer_jbd(bh)) - status = ocfs2_validate_inode_block(osb->sb, bh); + if (!status && !buffer_jbd(bh)) { + if (args->fi_flags & OCFS2_FI_FLAG_FILECHECK_CHK) + status = ocfs2_filecheck_validate_inode_block( + osb->sb, bh); + else if (args->fi_flags & OCFS2_FI_FLAG_FILECHECK_FIX) + status = ocfs2_filecheck_repair_inode_block( +
[PATCH v4 1/5] ocfs2: export ocfs2_kset for online file check
Export the ocfs2_kset object from the ocfs2_stackglue kernel module so that the online file check code can create its sysfiles under ocfs2_kset. Signed-off-by: Gang He Reviewed-by: Mark Fasheh --- fs/ocfs2/stackglue.c | 3 ++- fs/ocfs2/stackglue.h | 2 ++ 2 files changed, 4 insertions(+), 1 deletion(-) diff --git a/fs/ocfs2/stackglue.c b/fs/ocfs2/stackglue.c index 5d965e8..13219ed 100644 --- a/fs/ocfs2/stackglue.c +++ b/fs/ocfs2/stackglue.c @@ -629,7 +629,8 @@ static struct attribute_group ocfs2_attr_group = { .attrs = ocfs2_attrs, }; -static struct kset *ocfs2_kset; +struct kset *ocfs2_kset; +EXPORT_SYMBOL_GPL(ocfs2_kset); static void ocfs2_sysfs_exit(void) { diff --git a/fs/ocfs2/stackglue.h b/fs/ocfs2/stackglue.h index 66334a3..f2dce10 100644 --- a/fs/ocfs2/stackglue.h +++ b/fs/ocfs2/stackglue.h @@ -298,4 +298,6 @@ void ocfs2_stack_glue_set_max_proto_version(struct ocfs2_protocol_version *max_p int ocfs2_stack_glue_register(struct ocfs2_stack_plugin *plugin); void ocfs2_stack_glue_unregister(struct ocfs2_stack_plugin *plugin); +extern struct kset *ocfs2_kset; + #endif /* STACKGLUE_H */ -- 2.1.2
[PATCH v4 5/5] ocfs2: add feature document for online file check
This document describes the OCFS2 online file check feature. OCFS2 is often used in high-availability systems. However, OCFS2 usually converts the filesystem to read-only when it encounters an error. This may not be necessary, since turning the filesystem read-only would affect other running processes as well, decreasing availability. Then, a mount option (errors=continue) is introduced, which would return the -EIO errno to the calling process and terminate further processing so that the filesystem is not corrupted further. The filesystem is not converted to read-only, and the problematic file's inode number is reported in the kernel log. The user can try to check/fix this file via the online filecheck feature. Signed-off-by: Gang He Reviewed-by: Mark Fasheh --- .../filesystems/ocfs2-online-filecheck.txt | 94 ++ 1 file changed, 94 insertions(+) create mode 100644 Documentation/filesystems/ocfs2-online-filecheck.txt diff --git a/Documentation/filesystems/ocfs2-online-filecheck.txt b/Documentation/filesystems/ocfs2-online-filecheck.txt new file mode 100644 index 000..1ab0786 --- /dev/null +++ b/Documentation/filesystems/ocfs2-online-filecheck.txt @@ -0,0 +1,94 @@ + OCFS2 online file check + --- + +This document describes the OCFS2 online file check feature. + +Introduction + +OCFS2 is often used in high-availability systems. However, OCFS2 usually +converts the filesystem to read-only when it encounters an error. This may not be +necessary, since turning the filesystem read-only would affect other running +processes as well, decreasing availability. +Then, a mount option (errors=continue) is introduced, which would return the +-EIO errno to the calling process and terminate further processing so that the +filesystem is not corrupted further. The filesystem is not converted to +read-only, and the problematic file's inode number is reported in the kernel +log. The user can try to check/fix this file via the online filecheck feature.
+ +Scope += +This effort is to check/fix small issues which may hinder day-to-day operations +of a cluster filesystem by turning the filesystem read-only. The scope of +checking/fixing is at the file level, initially for regular files and eventually +to all files (including system files) of the filesystem. + +If a directory-to-file link is incorrect, the directory inode is +reported as erroneous. + +This feature is not suited for extravagant checks which involve dependency of +other components of the filesystem, such as but not limited to, checking if the +bits for file blocks in the allocation have been set. In case of such an error, +the offline fsck should/would be recommended. + +Finally, such an operation/feature should not be automated lest the filesystem +may end up with more damage than before the repair attempt. So, this has to +be performed using user interaction and consent. + +User interface +== +When there are errors in the OCFS2 filesystem, they are usually accompanied +by the inode number which caused the error. This inode number would be the +input to check/fix the file. + +There is a sysfs directory for each OCFS2 file system mounting: + + /sys/fs/ocfs2//filecheck + +Here, indicates the name of OCFS2 volume device which has been already +mounted. The file above would accept inode numbers. This could be used to +communicate with kernel space, tell which file(inode number) will be checked or +fixed. Currently, three operations are supported, which include checking +inode, fixing inode and setting the size of result record history. + +1. If you want to know what error exactly happened to before fixing, do + + # echo "" > /sys/fs/ocfs2//filecheck/check + # cat /sys/fs/ocfs2//filecheck/check + +The output is like this: + INO DONE ERROR +39502 1 GENERATION + + lists the inode numbers. + indicates whether the operation has been finished. + says what kind of error was found.
For the detailed error numbers, +please refer to the file linux/fs/ocfs2/filecheck.h. + +2. If you decide to fix this inode, do + + # echo "" > /sys/fs/ocfs2//filecheck/fix + # cat /sys/fs/ocfs2//filecheck/fix + +The output is like this: + INO DONE ERROR +39502 1 SUCCESS + +This time, the column indicates whether this fix is successful or not. + +3. The record cache is used to store the history of check/fix results. Its +default size is 10, and it can be adjusted within the range of 10 ~ 100. You can +adjust the size like this: + + # echo "" > /sys/fs/ocfs2//filecheck/set + +Fixing stuff + +On receiving the inode, the filesystem would read the inode and the +file metadata. In case of errors, the filesystem would fix the errors +and report the problems it fixed in the kernel log. As a precautionary measure, +the inode must first be checked for errors before performing a final fix. + +The inode and the result
[PATCH v4 2/5] ocfs2: sysfile interfaces for online file check
Implement online file check sysfile interfaces, e.g. how to create the related sysfile according to device name, how to display/handle file check request from the sysfile. Signed-off-by: Gang He --- fs/ocfs2/Makefile| 3 +- fs/ocfs2/filecheck.c | 606 +++ fs/ocfs2/filecheck.h | 49 + fs/ocfs2/inode.h | 3 + 4 files changed, 660 insertions(+), 1 deletion(-) create mode 100644 fs/ocfs2/filecheck.c create mode 100644 fs/ocfs2/filecheck.h diff --git a/fs/ocfs2/Makefile b/fs/ocfs2/Makefile index ce210d4..e27e652 100644 --- a/fs/ocfs2/Makefile +++ b/fs/ocfs2/Makefile @@ -41,7 +41,8 @@ ocfs2-objs := \ quota_local.o \ quota_global.o \ xattr.o \ - acl.o + acl.o \ + filecheck.o ocfs2_stackglue-objs := stackglue.o ocfs2_stack_o2cb-objs := stack_o2cb.o diff --git a/fs/ocfs2/filecheck.c b/fs/ocfs2/filecheck.c new file mode 100644 index 000..2cabbcf --- /dev/null +++ b/fs/ocfs2/filecheck.c @@ -0,0 +1,606 @@ +/* -*- mode: c; c-basic-offset: 8; -*- + * vim: noexpandtab sw=8 ts=8 sts=0: + * + * filecheck.c + * + * Code which implements online file check. + * + * Copyright (C) 2016 SuSE. All rights reserved. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public + * License as published by the Free Software Foundation, version 2. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "ocfs2.h" +#include "ocfs2_fs.h" +#include "stackglue.h" +#include "inode.h" + +#include "filecheck.h" + + +/* File check error strings, + * must correspond with error number in header file. 
+ */ +static const char * const ocfs2_filecheck_errs[] = { + "SUCCESS", + "FAILED", + "INPROGRESS", + "READONLY", + "INJBD", + "INVALIDINO", + "BLOCKECC", + "BLOCKNO", + "VALIDFLAG", + "GENERATION", + "UNSUPPORTED" +}; + +static DEFINE_SPINLOCK(ocfs2_filecheck_sysfs_lock); +static LIST_HEAD(ocfs2_filecheck_sysfs_list); + +struct ocfs2_filecheck { + struct list_head fc_head; /* File check entry list head */ + spinlock_t fc_lock; + unsigned int fc_max;/* Maximum number of entry in list */ + unsigned int fc_size; /* Current entry count in list */ + unsigned int fc_done; /* Finished entry count in list */ +}; + +struct ocfs2_filecheck_sysfs_entry { /* sysfs entry per mounting */ + struct list_head fs_list; + atomic_t fs_count; + struct super_block *fs_sb; + struct kset *fs_devicekset; + struct kset *fs_fcheckkset; + struct ocfs2_filecheck *fs_fcheck; +}; + +#define OCFS2_FILECHECK_MAXSIZE100 +#define OCFS2_FILECHECK_MINSIZE10 + +/* File check operation type */ +enum { + OCFS2_FILECHECK_TYPE_CHK = 0, /* Check a file(inode) */ + OCFS2_FILECHECK_TYPE_FIX, /* Fix a file(inode) */ + OCFS2_FILECHECK_TYPE_SET = 100 /* Set entry list maximum size */ +}; + +struct ocfs2_filecheck_entry { + struct list_head fe_list; + unsigned long fe_ino; + unsigned int fe_type; + unsigned int fe_done:1; + unsigned int fe_status:31; +}; + +struct ocfs2_filecheck_args { + unsigned int fa_type; + union { + unsigned long fa_ino; + unsigned int fa_len; + }; +}; + +static const char * +ocfs2_filecheck_error(int errno) +{ + if (!errno) + return ocfs2_filecheck_errs[errno]; + + BUG_ON(errno < OCFS2_FILECHECK_ERR_START || + errno > OCFS2_FILECHECK_ERR_END); + return ocfs2_filecheck_errs[errno - OCFS2_FILECHECK_ERR_START + 1]; +} + +static ssize_t ocfs2_filecheck_show(struct kobject *kobj, + struct kobj_attribute *attr, + char *buf); +static ssize_t ocfs2_filecheck_store(struct kobject *kobj, +struct kobj_attribute *attr, +const char *buf, size_t count); +static struct kobj_attribute 
ocfs2_attr_filecheck_chk = + __ATTR(check, S_IRUSR | S_IWUSR, + ocfs2_filecheck_show, + ocfs2_filecheck_store); +static struct kobj_attribute ocfs2_attr_filecheck_fix = + __ATTR(fix, S_IRUSR | S_IWUSR, + ocfs2_filecheck_show, + ocfs2_filecheck_store); +static struct kobj_attribute ocfs2
[PATCH v4 0/5] Add online file check feature
When there are errors in the ocfs2 filesystem, they are usually accompanied by the inode number which caused the error. This inode number would be the input to fixing the file. One of these options could be considered: A file in the sys filesystem which would accept inode numbers. This could be used to communicate back what has to be fixed or is fixed. You could write:
$# echo "" > /sys/fs/ocfs2/devname/filecheck/check
or
$# echo "" > /sys/fs/ocfs2/devname/filecheck/fix

Compared with the third version, I added a buffer_jbd() check in the inode block fix/writing dirty buffer back, changed the members of the ocfs2_filecheck_entry struct from unsigned short to unsigned int, and added a feature document to this patch set.

Compared with the second version, I re-designed the filecheck sysfs interfaces: there are three sysfs files (check, fix and set) under the filecheck directory (see above), and sysfs will accept only one argument. Second, I adjusted some code in the ocfs2_filecheck_repair_inode_block() function according to upstream feedback: we cannot just add the VALID_FL flag back as an inode block fix, so we will not fix this field corruption until we have a complete solution.

Compared with the first version, I use strncasecmp instead of double strncmp functions. Second, I updated the source file contribution vendor.
Gang He (5): ocfs2: export ocfs2_kset for online file check ocfs2: sysfile interfaces for online file check ocfs2: create/remove sysfile for online file check ocfs2: check/fix inode block for online file check ocfs2: add feature document for online file check .../filesystems/ocfs2-online-filecheck.txt | 94 fs/ocfs2/Makefile | 3 +- fs/ocfs2/filecheck.c | 606 + fs/ocfs2/filecheck.h | 49 ++ fs/ocfs2/inode.c | 225 +++- fs/ocfs2/inode.h | 3 + fs/ocfs2/ocfs2_trace.h | 2 + fs/ocfs2/stackglue.c | 3 +- fs/ocfs2/stackglue.h | 2 + fs/ocfs2/super.c | 5 + 10 files changed, 981 insertions(+), 11 deletions(-) create mode 100644 Documentation/filesystems/ocfs2-online-filecheck.txt create mode 100644 fs/ocfs2/filecheck.c create mode 100644 fs/ocfs2/filecheck.h -- 2.1.2
linux-next: manual merge of the kvm-arm tree with the arm64 tree
Hi all, Today's linux-next merge of the kvm-arm tree got a conflict in: arch/arm64/include/asm/cpufeature.h between commit: 104a0c02e8b1 ("arm64: Add workaround for Cavium erratum 27456") from the arm64 tree and commit: d0be74f771d5 ("arm64: Add ARM64_HAS_VIRT_HOST_EXTN feature") from the kvm-arm tree. I fixed it up (see below) and can carry the fix as necessary (no action is required). -- Cheers, Stephen Rothwell diff --cc arch/arm64/include/asm/cpufeature.h index 1497163213ed,a5c769b1c65b.. --- a/arch/arm64/include/asm/cpufeature.h +++ b/arch/arm64/include/asm/cpufeature.h @@@ -30,12 -30,12 +30,13 @@@ #define ARM64_HAS_LSE_ATOMICS 5 #define ARM64_WORKAROUND_CAVIUM_23154 6 #define ARM64_WORKAROUND_834220 7 -/* #define ARM64_HAS_NO_HW_PREFETCH 8 */ -/* #define ARM64_HAS_UAO 9 */ -/* #define ARM64_ALT_PAN_NOT_UAO 10 */ +#define ARM64_HAS_NO_HW_PREFETCH 8 +#define ARM64_HAS_UAO 9 +#define ARM64_ALT_PAN_NOT_UAO 10 + #define ARM64_HAS_VIRT_HOST_EXTN 11 +#define ARM64_WORKAROUND_CAVIUM_27456 12 -#define ARM64_NCAPS 12 +#define ARM64_NCAPS 13 #ifndef __ASSEMBLY__
Re: [PATCH V3 3/3] vhost_net: basic polling support
On 02/29/2016 05:56 AM, Christian Borntraeger wrote:
> On 02/26/2016 09:42 AM, Jason Wang wrote:
>> > This patch tries to poll for new added tx buffer or socket receive
>> > queue for a while at the end of tx/rx processing. The maximum time
>> > spent on polling were specified through a new kind of vring ioctl.
>> >
>> > Signed-off-by: Jason Wang
>> > ---
>> > drivers/vhost/net.c | 79 +++---
>> > drivers/vhost/vhost.c | 14
>> > drivers/vhost/vhost.h | 1 +
>> > include/uapi/linux/vhost.h | 6
>> > 4 files changed, 95 insertions(+), 5 deletions(-)
>> >
>> > diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
>> > index 9eda69e..c91af93 100644
>> > --- a/drivers/vhost/net.c
>> > +++ b/drivers/vhost/net.c
>> > @@ -287,6 +287,44 @@ static void vhost_zerocopy_callback(struct ubuf_info *ubuf, bool success)
>> >         rcu_read_unlock_bh();
>> > }
>> >
>> > +static inline unsigned long busy_clock(void)
>> > +{
>> > +        return local_clock() >> 10;
>> > +}
>> > +
>> > +static bool vhost_can_busy_poll(struct vhost_dev *dev,
>> > +                                unsigned long endtime)
>> > +{
>> > +        return likely(!need_resched()) &&
>> > +               likely(!time_after(busy_clock(), endtime)) &&
>> > +               likely(!signal_pending(current)) &&
>> > +               !vhost_has_work(dev) &&
>> > +               single_task_running();
>> > +}
>> > +
>> > +static int vhost_net_tx_get_vq_desc(struct vhost_net *net,
>> > +                                    struct vhost_virtqueue *vq,
>> > +                                    struct iovec iov[], unsigned int iov_size,
>> > +                                    unsigned int *out_num, unsigned int *in_num)
>> > +{
>> > +        unsigned long uninitialized_var(endtime);
>> > +        int r = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
>> > +                                  out_num, in_num, NULL, NULL);
>> > +
>> > +        if (r == vq->num && vq->busyloop_timeout) {
>> > +                preempt_disable();
>> > +                endtime = busy_clock() + vq->busyloop_timeout;
>> > +                while (vhost_can_busy_poll(vq->dev, endtime) &&
>> > +                       vhost_vq_avail_empty(vq->dev, vq))
>> > +                        cpu_relax();
> Can you use cpu_relax_lowlatency (which should be the same as cpu_relax for almost
> everybody but s390)? cpu_relax (without low latency) might give up the time slice
> when running under another hypervisor (like LPAR on s390), which might not be what
> we want here.

Ok, will do this in next version.
Re: [PATCH V3 3/3] vhost_net: basic polling support
On 02/28/2016 10:09 PM, Michael S. Tsirkin wrote:
> On Fri, Feb 26, 2016 at 04:42:44PM +0800, Jason Wang wrote:
>> > This patch tries to poll for new added tx buffer or socket receive
>> > queue for a while at the end of tx/rx processing. The maximum time
>> > spent on polling were specified through a new kind of vring ioctl.
>> >
>> > Signed-off-by: Jason Wang
> Looks good overall, but I still see one problem.
>
>> > ---
>> > drivers/vhost/net.c | 79 +++---
>> > drivers/vhost/vhost.c | 14
>> > drivers/vhost/vhost.h | 1 +
>> > include/uapi/linux/vhost.h | 6
>> > 4 files changed, 95 insertions(+), 5 deletions(-)
>> >
>> > diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
>> > index 9eda69e..c91af93 100644
>> > --- a/drivers/vhost/net.c
>> > +++ b/drivers/vhost/net.c
>> > @@ -287,6 +287,44 @@ static void vhost_zerocopy_callback(struct ubuf_info *ubuf, bool success)
>> >         rcu_read_unlock_bh();
>> > }
>> >
>> > +static inline unsigned long busy_clock(void)
>> > +{
>> > +        return local_clock() >> 10;
>> > +}
>> > +
>> > +static bool vhost_can_busy_poll(struct vhost_dev *dev,
>> > +                                unsigned long endtime)
>> > +{
>> > +        return likely(!need_resched()) &&
>> > +               likely(!time_after(busy_clock(), endtime)) &&
>> > +               likely(!signal_pending(current)) &&
>> > +               !vhost_has_work(dev) &&
>> > +               single_task_running();
> So I find it quite unfortunate that this still uses single_task_running.
> This means that for example a SCHED_IDLE task will prevent polling from
> becoming active, and that seems like a bug, or at least
> an undocumented feature :).

Yes, it may need more thoughts.

>
> Unfortunately this logic affects the behaviour as observed
> by userspace, so we can't merge it like this and tune
> afterwards, since otherwise management tools will start
> depending on this logic.
>
> How about remove single_task_running() first here and optimize on top?

We probably need something like this to handle overcommitment.
[lkp] [n_tty] dd9a6fee68: INFO: possible circular locking dependency detected ]
FYI, we noticed the below changes on

https://github.com/0day-ci/linux Brian-Bloniarz/Re-n_tty-Check-the-other-end-of-pty-pair-before-returning-EAGAIN-on-a-read/20160229-070452
commit dd9a6fee6830f16f602b1aa2e85d6307acd04945 ("n_tty: Check the other end of pty pair before returning EAGAIN on a read()")

+----------------------------------------------------+----------+------------+
|                                                    | v4.5-rc6 | dd9a6fee68 |
+----------------------------------------------------+----------+------------+
| boot_successes                                     | 128      | 2          |
| boot_failures                                      | 9        | 6          |
| invoked_oom-killer:gfp_mask=0x                     | 9        | 1          |
| Mem-Info                                           | 9        | 2          |
| Out_of_memory:Kill_process                         | 9        | 1          |
| backtrace:vfs_write                                | 1        |            |
| backtrace:SyS_write                                | 1        |            |
| backtrace:do_execveat_common                       | 1        |            |
| backtrace:compat_SyS_execve                        | 1        |            |
| backtrace:vfs_read                                 | 1        | 4          |
| backtrace:SyS_read                                 | 1        | 4          |
| backtrace:compat_process_vm_rw                     | 1        |            |
| backtrace:compat_SyS_process_vm_readv              | 1        |            |
| backtrace:_do_fork                                 | 1        |            |
| backtrace:SyS_clone                                | 1        |            |
| page_allocation_failure:order:#,mode               | 0        | 1          |
| warn_alloc_failed+0x                               | 0        | 1          |
| backtrace:kswapd                                   | 0        | 1          |
| INFO:possible_circular_locking_dependency_detected | 0        | 4          |
| backtrace:flush_to_ldisc                           | 0        | 4          |
+----------------------------------------------------+----------+------------+

[ 17.523349] mount (2393) used greatest stack depth: 12392 bytes left
[ 17.684314]
[ 17.684972] ==
[ 17.686059] [ INFO: possible circular locking dependency detected ]
[ 17.687174] 4.5.0-rc6-1-gdd9a6fe #64 Not tainted
[ 17.688127] ---
[ 17.689216] bootlogd/2434 is trying to acquire lock:
[ 17.690167]  ((&buf->work)){+.+...}, at: [] flush_work+0x5/0x23d
[ 17.692006]
[ 17.692006] but task is already holding lock:
[ 17.693433]  (&tty->termios_rwsem){..}, at: [] n_tty_read+0xd0/0x882
[ 17.695346]
[ 17.695346] which lock already depends on the new lock.
[ 17.695346]
[ 17.697370]
[ 17.697370] the existing dependency chain (in reverse order) is:
[ 17.698961] -> #2 (&tty->termios_rwsem){..}:
[ 17.700507]        [] lock_acquire+0x147/0x1e2
[ 17.701621]        [] down_read+0x48/0x90
[ 17.702696]        [] n_tty_receive_buf_common+0x46/0x8c0
[ 17.703900]        [] n_tty_receive_buf2+0x14/0x16
[ 17.705046]        [] flush_to_ldisc+0xcb/0x125
[ 17.706167]        [] process_one_work+0x2b8/0x5b2
[ 17.707339]        [] worker_thread+0x28b/0x37d
[ 17.708454]        [] kthread+0xfb/0x103
[ 17.709511]        [] ret_from_fork+0x3f/0x70
[ 17.710614] -> #1 (&buf->lock){+.+...}:
[ 17.712070]        [] lock_acquire+0x147/0x1e2
[ 17.713185]        [] mutex_lock_nested+0x79/0x35f
[ 17.714328]        [] flush_to_ldisc+0x4b/0x125
[ 17.715443]        [] process_one_work+0x2b8/0x5b2
[ 17.716587]        [] worker_thread+0x28b/0x37d
[ 17.717700]        [] kthread+0xfb/0x103
[ 17.718752]        [] ret_from_fork+0x3f/0x70
[ 17.719855] -> #0 ((&buf->work)){+.+...}:
[ 17.721333]        [] __lock_acquire+0x12dd/0x1932
[ 17.722489]        [] lock_acquire+0x147/0x1e2
[ 17.723598]        [] flush_work+0x3a/0x23d
[ 17.724683]        [] n_tty_read+0x308/0x882
[ 17.725771]        [] tty_read+0x8b/0xcd
[ 17.726830]        [] __vfs_read+0x26/0xb9
[ 17.727910]        [] vfs_read+0xa0/0x12e
[ 17.728974]        [] SyS_read+0x51/0x92
[ 17.730032]        [] entry_SYSCALL_64_fastpath+0x12/0x72
[ 17.731237]
[ 17.731237] other info that might help us debug this:
[ 17.731237]
[ 17.733255] Chain exists of: (&buf->work) --> &buf->lock --> &tty->termios_rwsem
[ 17.735644]  Possible unsafe locking scenario:
[ 17.735644]
[ 17.737064]        CPU0                            CPU1
[ 17.737969]
[ 17.738873]   lock(&tty->termios_rwsem);
[ 17.739832]                                   lock(&buf->lock);
[ 17.740966]                                   lock(&tty->termios_rwsem);
[ 17.742181]   lock((&buf->work));
[ 17.743081]
[ 17.743081] *** DE
Re: Softirq priority inversion from "softirq: reduce latencies"
On Sun, 2016-02-28 at 18:01 +0100, Francois Romieu wrote:
> Mike Galbraith :
> [...]
> > Hrm, relatively new + tasklet woes rings a bell. Ah, that..
> >
> > What's worse is that at the point where this code was written it was
> > already well known that tasklets are a steaming pile of crap and
> > should die.
> >
> > Source thereof https://lwn.net/Articles/588457/
>
> tasklets are ingrained in the dmaengine API (see Documentation/dmaengine/client.txt
> and drivers/dma/virt-dma.h::vchan_cookie_complete).
>
> Moving everything to irq context or handling his own sub-{jiffy/ms} timer
> while losing async dma doesn't exactly smell like roses either. :o(

https://lwn.net/Articles/239633/

If I'm listening properly, the root cause is that there is a timing constraint involved, which is being exposed because one softirq raises another (ew). Processing timeout happens, freshly raised tasklet wanders off to SCHED_NORMAL kthread context where its constraint dies.

Given the dma stuff apparently works fine in -rt (or did, see below), timing constraints can't be super tight, so perhaps we could grow realtime workqueue support for the truly deserving. The tricky bit would be keeping everybody and his brother from abusing it.

WRT -rt: if dma tasklets really do have hard (ish) constraints, -rt recently "broke" in the same way.. of all softirqs which are deferred to kthread context, due to a recent change, only timer/hrtimer are executed at realtime priority by default.

-Mike
[PATCH v8] watchdog: Add watchdog timer support for the WinSystems EBC-C384
The WinSystems EBC-C384 has an onboard watchdog timer. The timeout range supported by the watchdog timer is 1 second to 255 minutes. Timeouts under 256 seconds have a 1 second granularity, while the rest have a 1 minute granularity. This driver adds watchdog timer support for this onboard watchdog timer. The timeout may be configured via the timeout module parameter. Signed-off-by: William Breathitt Gray --- Changes in v8: - Utilize the roundup macro to round up second resolution to minute granularity when setting the timeout member MAINTAINERS | 6 ++ drivers/watchdog/Kconfig| 9 ++ drivers/watchdog/Makefile | 1 + drivers/watchdog/ebc-c384_wdt.c | 188 4 files changed, 204 insertions(+) create mode 100644 drivers/watchdog/ebc-c384_wdt.c diff --git a/MAINTAINERS b/MAINTAINERS index 28eb61b..66107fd 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -11860,6 +11860,12 @@ M: David Härdeman S: Maintained F: drivers/media/rc/winbond-cir.c +WINSYSTEMS EBC-C384 WATCHDOG DRIVER +M: William Breathitt Gray +L: linux-watch...@vger.kernel.org +S: Maintained +F: drivers/watchdog/ebc-c384_wdt.c + WIMAX STACK M: Inaky Perez-Gonzalez M: linux-wi...@intel.com diff --git a/drivers/watchdog/Kconfig b/drivers/watchdog/Kconfig index 0f6d851..11f3a3d 100644 --- a/drivers/watchdog/Kconfig +++ b/drivers/watchdog/Kconfig @@ -713,6 +713,15 @@ config ALIM7101_WDT Most people will say N. +config EBC_C384_WDT + tristate "WinSystems EBC-C384 Watchdog Timer" + depends on X86 + select WATCHDOG_CORE + help + Enables watchdog timer support for the watchdog timer on the + WinSystems EBC-C384 motherboard. The timeout may be configured via + the timeout module parameter. 
+ config F71808E_WDT tristate "Fintek F71808E, F71862FG, F71869, F71882FG and F71889FG Watchdog" depends on X86 diff --git a/drivers/watchdog/Makefile b/drivers/watchdog/Makefile index f566753..15762c8 100644 --- a/drivers/watchdog/Makefile +++ b/drivers/watchdog/Makefile @@ -88,6 +88,7 @@ obj-$(CONFIG_ACQUIRE_WDT) += acquirewdt.o obj-$(CONFIG_ADVANTECH_WDT) += advantechwdt.o obj-$(CONFIG_ALIM1535_WDT) += alim1535_wdt.o obj-$(CONFIG_ALIM7101_WDT) += alim7101_wdt.o +obj-$(CONFIG_EBC_C384_WDT) += ebc-c384_wdt.o obj-$(CONFIG_F71808E_WDT) += f71808e_wdt.o obj-$(CONFIG_SP5100_TCO) += sp5100_tco.o obj-$(CONFIG_GEODE_WDT) += geodewdt.o diff --git a/drivers/watchdog/ebc-c384_wdt.c b/drivers/watchdog/ebc-c384_wdt.c new file mode 100644 index 000..77fda0b --- /dev/null +++ b/drivers/watchdog/ebc-c384_wdt.c @@ -0,0 +1,188 @@ +/* + * Watchdog timer driver for the WinSystems EBC-C384 + * Copyright (C) 2016 William Breathitt Gray + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License, version 2, as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. + */ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#define MODULE_NAME"ebc-c384_wdt" +#define WATCHDOG_TIMEOUT 60 +/* + * The timeout value in minutes must fit in a single byte when sent to the + * watchdog timer; the maximum timeout possible is 15300 (255 * 60) seconds. 
+ */ +#define WATCHDOG_MAX_TIMEOUT 15300 +#define BASE_ADDR 0x564 +#define ADDR_EXTENT5 +#define CFG_ADDR (BASE_ADDR + 1) +#define PET_ADDR (BASE_ADDR + 2) + +static bool nowayout = WATCHDOG_NOWAYOUT; +module_param(nowayout, bool, 0); +MODULE_PARM_DESC(nowayout, "Watchdog cannot be stopped once started (default=" + __MODULE_STRING(WATCHDOG_NOWAYOUT) ")"); + +static unsigned timeout; +module_param(timeout, uint, 0); +MODULE_PARM_DESC(timeout, "Watchdog timeout in seconds (default=" + __MODULE_STRING(WATCHDOG_TIMEOUT) ")"); + +static int ebc_c384_wdt_start(struct watchdog_device *wdev) +{ + unsigned t = wdev->timeout; + + /* resolution is in minutes for timeouts greater than 255 seconds */ + if (t > 255) + t = DIV_ROUND_UP(t, 60); + + outb(t, PET_ADDR); + + return 0; +} + +static int ebc_c384_wdt_stop(struct watchdog_device *wdev) +{ + outb(0x00, PET_ADDR); + + return 0; +} + +static int ebc_c384_wdt_set_timeout(struct watchdog_device *wdev, unsigned t) +{ + /* resolution is in minutes for timeouts greater than 255 seconds */ + if (t > 255) { + /* round second resolution up to minute granularity */ + wdev->timeout = roundup(t, 60); + + /* set watchdog timer for minute
Re: [PATCH] hwmon: (ntc_thermistor) Add support for ncpXXxh103
On 02/28/2016 02:31 PM, Joseph wrote: From: Joseph McNally This patch adds support for the Murata NCP15XH103 thermistor series. Signed-off-by: Joseph McNally Applied. Thanks, Guenter
[PATCH] mm: __delete_from_page_cache WARN_ON(page_mapped)
Commit e1534ae95004 ("mm: differentiate page_mapped() from page_mapcount() for compound pages") changed the famous BUG_ON(page_mapped(page)) in __delete_from_page_cache() to VM_BUG_ON_PAGE(page_mapped(page)): which gives us more info when CONFIG_DEBUG_VM=y, but nothing at all when not.

Although it has not usually been very helpful, being hit long after the error in question, we do need to know if it actually happens on users' systems; but reinstating a crash there is likely to be opposed :)

In the non-debug case, use WARN_ON() plus dump_page() and add_taint() - I don't really believe LOCKDEP_NOW_UNRELIABLE, but that seems to be the standard procedure now. Move that, or the VM_BUG_ON_PAGE(), up before the deletion from tree: so that the unNULLified page->mapping gives a little more information.

If the inode is being evicted (rather than truncated), it won't have any vmas left, so it's safe(ish) to assume that the raised mapcount is erroneous, and we can discount it from page_count to avoid leaking the page (I'm less worried by leaking the occasional 4kB, than losing a potential 2MB page with each 4kB page leaked).

Signed-off-by: Hugh Dickins
---
I think this should go into v4.5, so I've written it with an atomic_sub on page->_count; but Joonsoo will probably want some page_ref thingy.
mm/filemap.c | 22 +- 1 file changed, 21 insertions(+), 1 deletion(-) --- 4.5-rc6/mm/filemap.c2016-02-28 09:04:38.816707844 -0800 +++ linux/mm/filemap.c 2016-02-28 19:45:23.406263928 -0800 @@ -195,6 +195,27 @@ void __delete_from_page_cache(struct pag else cleancache_invalidate_page(mapping, page); + VM_BUG_ON_PAGE(page_mapped(page), page); + if (!IS_ENABLED(CONFIG_DEBUG_VM) && WARN_ON(page_mapped(page))) { + int mapcount; + + dump_page(page, "still mapped when deleted"); + add_taint(TAINT_BAD_PAGE, LOCKDEP_NOW_UNRELIABLE); + + mapcount = page_mapcount(page); + if (mapping_exiting(mapping) && + page_count(page) >= mapcount + 2) { + /* +* All vmas have already been torn down, so it's +* a good bet that actually the page is unmapped, +* and we'd prefer not to leak it: if we're wrong, +* some other bad page check should catch it later. +*/ + page_mapcount_reset(page); + atomic_sub(mapcount, &page->_count); + } + } + page_cache_tree_delete(mapping, page, shadow); page->mapping = NULL; @@ -205,7 +226,6 @@ void __delete_from_page_cache(struct pag __dec_zone_page_state(page, NR_FILE_PAGES); if (PageSwapBacked(page)) __dec_zone_page_state(page, NR_SHMEM); - VM_BUG_ON_PAGE(page_mapped(page), page); /* * At this point page must be either written or cleaned by truncate.
Re: [v7, RESEND] watchdog: Add watchdog timer support for the WinSystems EBC-C384
Hi William, On Sun, Feb 28, 2016 at 11:29:10PM -0500, William Breathitt Gray wrote: > The WinSystems EBC-C384 has an onboard watchdog timer. The timeout range > supported by the watchdog timer is 1 second to 255 minutes. Timeouts > under 256 seconds have a 1 second granularity, while the rest have a 1 > minute granularity. > > This driver adds watchdog timer support for this onboard watchdog timer. > The timeout may be configured via the timeout module parameter. > > Signed-off-by: William Breathitt Gray > Reviewed-by: Guenter Roeck > --- > MAINTAINERS | 6 ++ > drivers/watchdog/Kconfig| 9 ++ > drivers/watchdog/Makefile | 1 + > drivers/watchdog/ebc-c384_wdt.c | 188 > > [ ... ] > + > +static int ebc_c384_wdt_set_timeout(struct watchdog_device *wdev, unsigned t) > +{ > + /* resolution is in minutes for timeouts greater than 255 seconds */ > + if (t > 255) { > + /* round second resolution up to minute granularity */ > + wdev->timeout = DIV_ROUND_UP(t, 60) * 60; Good catch. Turns out there is a much better macro for this: wdev->timeout = roundup(t, 60); Guenter
linux-next: manual merge of the tip tree with the pm tree
Hi all, Today's linux-next merge of the tip tree got a conflict in: drivers/cpufreq/intel_pstate.c between commit: 7791e4aa59ad ("cpufreq: intel_pstate: Enable HWP by default") from the pm tree and commit: bc696ca05f5a ("x86/cpufeature: Replace the old static_cpu_has() with safe variant") from the tip tree. I fixed it up (the former removed the code modified by the latter) and can carry the fix as necessary (no action is required). -- Cheers, Stephen Rothwell
Re: [PATCH v7] watchdog: Add watchdog timer support for the WinSystems EBC-C384
Hi, On 02/28/2016 08:20 PM, William Breathitt Gray wrote: The WinSystems EBC-C384 has an onboard watchdog timer. The timeout range supported by the watchdog timer is 1 second to 255 minutes. Timeouts under 256 seconds have a 1 second granularity, while the rest have a 1 minute granularity. This driver adds watchdog timer support for this onboard watchdog timer. The timeout may be configured via the timeout module parameter. Signed-off-by: William Breathitt Gray Reviewed-by: Guenter Roeck --- Changes in v7: - Make sure timeout member is in seconds resolution despite minutes granularity For Wim's benefit: You forgot the actual change. The follow-up RESEND is really confusing; RESEND indicates that no change was made, and leaves it up to us to figure out what is going on. If something like this happens again, just add another rev and add a note indicating what has (really) changed. Also, when you make code changes, please drop previous Reviewed-by: or Acked-by: tags unless you got explicit permission from the reviewer to keep the tag. Thanks, Guenter
Re: [PATCH] 3c59x: Ensure to apply the expires time
From: Stafford Horne
Date: Sun, 28 Feb 2016 16:49:29 +0900

> In commit 5b6490def9168af6a ("3c59x: Use setup_timer()") Amitoj
> removed add_timer which sets up the expires timer. In this patch
> the behavior is restored but it uses mod_timer which is a bit more
> compact.
>
> Signed-off-by: Stafford Horne

Applied, thanks.
[PATCH v7 RESEND] watchdog: Add watchdog timer support for the WinSystems EBC-C384
The WinSystems EBC-C384 has an onboard watchdog timer. The timeout range supported by the watchdog timer is 1 second to 255 minutes. Timeouts under 256 seconds have a 1 second granularity, while the rest have a 1 minute granularity. This driver adds watchdog timer support for this onboard watchdog timer. The timeout may be configured via the timeout module parameter. Signed-off-by: William Breathitt Gray Reviewed-by: Guenter Roeck --- MAINTAINERS | 6 ++ drivers/watchdog/Kconfig| 9 ++ drivers/watchdog/Makefile | 1 + drivers/watchdog/ebc-c384_wdt.c | 188 4 files changed, 204 insertions(+) create mode 100644 drivers/watchdog/ebc-c384_wdt.c diff --git a/MAINTAINERS b/MAINTAINERS index 28eb61b..66107fd 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -11860,6 +11860,12 @@ M: David Härdeman S: Maintained F: drivers/media/rc/winbond-cir.c +WINSYSTEMS EBC-C384 WATCHDOG DRIVER +M: William Breathitt Gray +L: linux-watch...@vger.kernel.org +S: Maintained +F: drivers/watchdog/ebc-c384_wdt.c + WIMAX STACK M: Inaky Perez-Gonzalez M: linux-wi...@intel.com diff --git a/drivers/watchdog/Kconfig b/drivers/watchdog/Kconfig index 0f6d851..11f3a3d 100644 --- a/drivers/watchdog/Kconfig +++ b/drivers/watchdog/Kconfig @@ -713,6 +713,15 @@ config ALIM7101_WDT Most people will say N. +config EBC_C384_WDT + tristate "WinSystems EBC-C384 Watchdog Timer" + depends on X86 + select WATCHDOG_CORE + help + Enables watchdog timer support for the watchdog timer on the + WinSystems EBC-C384 motherboard. The timeout may be configured via + the timeout module parameter. 
+ config F71808E_WDT tristate "Fintek F71808E, F71862FG, F71869, F71882FG and F71889FG Watchdog" depends on X86 diff --git a/drivers/watchdog/Makefile b/drivers/watchdog/Makefile index f566753..15762c8 100644 --- a/drivers/watchdog/Makefile +++ b/drivers/watchdog/Makefile @@ -88,6 +88,7 @@ obj-$(CONFIG_ACQUIRE_WDT) += acquirewdt.o obj-$(CONFIG_ADVANTECH_WDT) += advantechwdt.o obj-$(CONFIG_ALIM1535_WDT) += alim1535_wdt.o obj-$(CONFIG_ALIM7101_WDT) += alim7101_wdt.o +obj-$(CONFIG_EBC_C384_WDT) += ebc-c384_wdt.o obj-$(CONFIG_F71808E_WDT) += f71808e_wdt.o obj-$(CONFIG_SP5100_TCO) += sp5100_tco.o obj-$(CONFIG_GEODE_WDT) += geodewdt.o diff --git a/drivers/watchdog/ebc-c384_wdt.c b/drivers/watchdog/ebc-c384_wdt.c new file mode 100644 index 000..21a4e95 --- /dev/null +++ b/drivers/watchdog/ebc-c384_wdt.c @@ -0,0 +1,188 @@ +/* + * Watchdog timer driver for the WinSystems EBC-C384 + * Copyright (C) 2016 William Breathitt Gray + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License, version 2, as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. + */ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#define MODULE_NAME"ebc-c384_wdt" +#define WATCHDOG_TIMEOUT 60 +/* + * The timeout value in minutes must fit in a single byte when sent to the + * watchdog timer; the maximum timeout possible is 15300 (255 * 60) seconds. 
+ */ +#define WATCHDOG_MAX_TIMEOUT 15300 +#define BASE_ADDR 0x564 +#define ADDR_EXTENT5 +#define CFG_ADDR (BASE_ADDR + 1) +#define PET_ADDR (BASE_ADDR + 2) + +static bool nowayout = WATCHDOG_NOWAYOUT; +module_param(nowayout, bool, 0); +MODULE_PARM_DESC(nowayout, "Watchdog cannot be stopped once started (default=" + __MODULE_STRING(WATCHDOG_NOWAYOUT) ")"); + +static unsigned timeout; +module_param(timeout, uint, 0); +MODULE_PARM_DESC(timeout, "Watchdog timeout in seconds (default=" + __MODULE_STRING(WATCHDOG_TIMEOUT) ")"); + +static int ebc_c384_wdt_start(struct watchdog_device *wdev) +{ + unsigned t = wdev->timeout; + + /* resolution is in minutes for timeouts greater than 255 seconds */ + if (t > 255) + t = DIV_ROUND_UP(t, 60); + + outb(t, PET_ADDR); + + return 0; +} + +static int ebc_c384_wdt_stop(struct watchdog_device *wdev) +{ + outb(0x00, PET_ADDR); + + return 0; +} + +static int ebc_c384_wdt_set_timeout(struct watchdog_device *wdev, unsigned t) +{ + /* resolution is in minutes for timeouts greater than 255 seconds */ + if (t > 255) { + /* round second resolution up to minute granularity */ + wdev->timeout = DIV_ROUND_UP(t, 60) * 60; + + /* set watchdog timer for minutes */ + outb(0x00, CFG_ADDR); + } else { + wdev->timeout = t; +
[PATCH v19 03/10] x86/xen: Mark xen_cpuid() stack frame as non-standard
objtool reports the following false positive warning: arch/x86/xen/enlighten.o: warning: objtool: xen_cpuid()+0x41: can't find jump dest instruction at .text+0x108 The warning is due to xen_cpuid()'s use of XEN_EMULATE_PREFIX to insert some fake instructions which objtool doesn't know how to decode. Signed-off-by: Josh Poimboeuf Cc: David Vrabel Cc: Konrad Rzeszutek Wilk Cc: Boris Ostrovsky --- arch/x86/xen/enlighten.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c index d09e4c9..5c45a69 100644 --- a/arch/x86/xen/enlighten.c +++ b/arch/x86/xen/enlighten.c @@ -32,6 +32,7 @@ #include #include #include +#include #ifdef CONFIG_KEXEC_CORE #include @@ -351,8 +352,8 @@ static void xen_cpuid(unsigned int *ax, unsigned int *bx, *cx &= maskecx; *cx |= setecx; *dx &= maskedx; - } +STACK_FRAME_NON_STANDARD(xen_cpuid); /* XEN_EMULATE_PREFIX */ static bool __init xen_check_mwait(void) { -- 2.4.3
Re: [PATCH v7] watchdog: Add watchdog timer support for the WinSystems EBC-C384
On 02/28/2016 11:20 PM, William Breathitt Gray wrote: > The WinSystems EBC-C384 has an onboard watchdog timer. The timeout range > supported by the watchdog timer is 1 second to 255 minutes. Timeouts > under 256 seconds have a 1 second granularity, while the rest have a 1 > minute granularity. > > This driver adds watchdog timer support for this onboard watchdog timer. > The timeout may be configured via the timeout module parameter. > > Signed-off-by: William Breathitt Gray > Reviewed-by: Guenter Roeck > --- > Changes in v7: > - Make sure timeout member is in seconds resolution despite minutes > granularity Oops, my apologies, I sent out the wrong commit. Please ignore this version, I will resend the correct commit. William Breathitt Gray
[PATCH v19 01/10] objtool: Mark non-standard files and directories
Code which runs outside the kernel's normal mode of operation often does
unusual things which can cause a static analysis tool like objtool to emit
false positive warnings:

 - boot image
 - vdso image
 - relocation
 - realmode
 - efi
 - head
 - purgatory
 - modpost

Set OBJECT_FILES_NON_STANDARD for their related files and directories,
which will tell objtool to skip checking them.  It's ok to skip them
because they don't affect runtime stack traces.

Also skip the following code which does the right thing with respect to
frame pointers, but is too "special" to be validated by a tool:

 - entry
 - mcount

Also skip the test_nx module because it modifies its exception handling
table at runtime, which objtool can't understand.  Fortunately it's just a
test module so it doesn't matter much.

Currently objtool is the only user of OBJECT_FILES_NON_STANDARD, but it
might eventually be useful for other tools.

Signed-off-by: Josh Poimboeuf
---
 arch/x86/boot/Makefile                | 3 ++-
 arch/x86/boot/compressed/Makefile     | 3 ++-
 arch/x86/entry/Makefile               | 4
 arch/x86/entry/vdso/Makefile          | 6 --
 arch/x86/kernel/Makefile              | 11 ---
 arch/x86/platform/efi/Makefile        | 2 ++
 arch/x86/purgatory/Makefile           | 2 ++
 arch/x86/realmode/Makefile            | 4 +++-
 arch/x86/realmode/rm/Makefile         | 3 ++-
 drivers/firmware/efi/libstub/Makefile | 1 +
 scripts/mod/Makefile                  | 2 ++
 11 files changed, 32 insertions(+), 9 deletions(-)

diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
index bbe1a62..0bf6749 100644
--- a/arch/x86/boot/Makefile
+++ b/arch/x86/boot/Makefile
@@ -9,7 +9,8 @@
 # Changed by many, many contributors over the years.
 #

-KASAN_SANITIZE := n
+KASAN_SANITIZE			:= n
+OBJECT_FILES_NON_STANDARD	:= y

 # If you want to preset the SVGA mode, uncomment the next line and
 # set SVGA_MODE to whatever number you want.
diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index f9ce75d..5e1d26e 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -16,7 +16,8 @@
 #	(see scripts/Makefile.lib size_append)
 #	compressed vmlinux.bin.all + u32 size of vmlinux.bin.all

-KASAN_SANITIZE := n
+KASAN_SANITIZE			:= n
+OBJECT_FILES_NON_STANDARD	:= y

 targets := vmlinux vmlinux.bin vmlinux.bin.gz vmlinux.bin.bz2 vmlinux.bin.lzma \
 	vmlinux.bin.xz vmlinux.bin.lzo vmlinux.bin.lz4
diff --git a/arch/x86/entry/Makefile b/arch/x86/entry/Makefile
index bd55ded..fe91c25 100644
--- a/arch/x86/entry/Makefile
+++ b/arch/x86/entry/Makefile
@@ -1,6 +1,10 @@
 #
 # Makefile for the x86 low level entry code
 #
+
+OBJECT_FILES_NON_STANDARD_entry_$(BITS).o	:= y
+OBJECT_FILES_NON_STANDARD_entry_64_compat.o	:= y
+
 obj-y := entry_$(BITS).o thunk_$(BITS).o syscall_$(BITS).o
 obj-y += common.o

diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
index c854541..f9fb859 100644
--- a/arch/x86/entry/vdso/Makefile
+++ b/arch/x86/entry/vdso/Makefile
@@ -3,8 +3,9 @@
 #

 KBUILD_CFLAGS += $(DISABLE_LTO)
-KASAN_SANITIZE := n
-UBSAN_SANITIZE := n
+KASAN_SANITIZE			:= n
+UBSAN_SANITIZE			:= n
+OBJECT_FILES_NON_STANDARD	:= y

 VDSO64-$(CONFIG_X86_64)		:= y
 VDSOX32-$(CONFIG_X86_X32_ABI)	:= y
@@ -16,6 +17,7 @@ vobjs-y := vdso-note.o vclock_gettime.o vgetcpu.o

 # files to link into kernel
 obj-y += vma.o
+OBJECT_FILES_NON_STANDARD_vma.o	:= n

 # vDSO images to build
 vdso_img-$(VDSO64-y) += 64
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index b1b78ff..d5fb087 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -16,9 +16,14 @@ CFLAGS_REMOVE_ftrace.o = -pg
 CFLAGS_REMOVE_early_printk.o = -pg
 endif

-KASAN_SANITIZE_head$(BITS).o := n
-KASAN_SANITIZE_dumpstack.o := n
-KASAN_SANITIZE_dumpstack_$(BITS).o := n
+KASAN_SANITIZE_head$(BITS).o				:= n
+KASAN_SANITIZE_dumpstack.o				:= n
+KASAN_SANITIZE_dumpstack_$(BITS).o			:= n
+
+OBJECT_FILES_NON_STANDARD_head_$(BITS).o		:= y
+OBJECT_FILES_NON_STANDARD_relocate_kernel_$(BITS).o	:= y
+OBJECT_FILES_NON_STANDARD_mcount_$(BITS).o		:= y
+OBJECT_FILES_NON_STANDARD_test_nx.o			:= y

 CFLAGS_irq.o := -I$(src)/../include/asm/trace

diff --git a/arch/x86/platform/efi/Makefile b/arch/x86/platform/efi/Makefile
index 2846aaa..066619b 100644
--- a/arch/x86/platform/efi/Makefile
+++ b/arch/x86/platform/efi/Makefile
@@ -1,3 +1,5 @@
+OBJECT_FILES_NON_STANDARD_efi_thunk_$(BITS).o := y
+
 obj-$(CONFIG_EFI) += quirks.o efi.o efi_$(BITS).o efi_stub_$(BITS).o
 obj-$(CONFIG_ACPI_BGRT) += efi-bgrt.o
 obj-$(CONFIG_EARLY_PRINTK_EFI) += early_printk.o
diff --git a/arch/x86/purgatory/Makefi
[PATCH v19 05/10] sched: Mark __schedule() stack frame as non-standard
objtool reports the following warnings for __schedule():

  kernel/sched/core.o: warning: objtool:__schedule()+0x3c0: duplicate frame pointer save
  kernel/sched/core.o: warning: objtool:__schedule()+0x3fd: sibling call from callable instruction with changed frame pointer
  kernel/sched/core.o: warning: objtool:__schedule()+0x40a: call without frame pointer save/setup
  kernel/sched/core.o: warning: objtool:__schedule()+0x7fd: frame pointer state mismatch
  kernel/sched/core.o: warning: objtool:__schedule()+0x421: frame pointer state mismatch

Basically it's confused by two unusual attributes of the switch_to()
macro:

1. It saves prev's frame pointer to the old stack and restores next's
   frame pointer from the new stack.

2. For new tasks it jumps directly to ret_from_fork.

Eventually it would probably be a good idea to clean up the
ret_from_fork hack so that new tasks are created with a valid initial
stack, as suggested by Andy:

  https://lkml.kernel.org/r/CALCETrWsqCw4L1qKO9j9L5F+4ED4viuLQTFc=n1pkbzffpq...@mail.gmail.com

Then __schedule() could return normally into the new code and objtool
hopefully wouldn't have a problem anymore.

In the meantime, mark its stack frame as non-standard so we can have a
baseline with no objtool warnings.  The marker also serves as a reminder
that this code could be improved a bit.

Signed-off-by: Josh Poimboeuf
---
 kernel/sched/core.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 9503d59..641043d 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -74,6 +74,7 @@
 #include
 #include
 #include
+#include
 #include
 #include

@@ -3288,6 +3289,7 @@ static void __sched notrace __schedule(bool preempt)

 	balance_callback(rq);
 }
+STACK_FRAME_NON_STANDARD(__schedule); /* switch_to() */

 static inline void sched_submit_work(struct task_struct *tsk)
 {
--
2.4.3
[PATCH v19 09/10] objtool: Add CONFIG_STACK_VALIDATION option
Add a CONFIG_STACK_VALIDATION option which will run "objtool check" for
each .o file to ensure the validity of its stack metadata.

Signed-off-by: Josh Poimboeuf
---
 Makefile               | 5 -
 arch/Kconfig           | 6 ++
 lib/Kconfig.debug      | 12
 scripts/Makefile.build | 39 +++
 4 files changed, 57 insertions(+), 5 deletions(-)

diff --git a/Makefile b/Makefile
index fbe1b92..62be03b 100644
--- a/Makefile
+++ b/Makefile
@@ -993,7 +993,10 @@ prepare0: archprepare FORCE
 	$(Q)$(MAKE) $(build)=.

 # All the preparing..
-prepare: prepare0
+prepare: prepare0 prepare-objtool
+
+PHONY += prepare-objtool
+prepare-objtool: $(if $(CONFIG_STACK_VALIDATION), tools/objtool FORCE)

 # Generate some files
 # ---
diff --git a/arch/Kconfig b/arch/Kconfig
index f6b649d..81869a5 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -583,6 +583,12 @@ config HAVE_COPY_THREAD_TLS
 	  normal C parameter passing, rather than extracting the syscall
 	  argument from pt_regs.

+config HAVE_STACK_VALIDATION
+	bool
+	help
+	  Architecture supports the 'objtool check' host tool command, which
+	  performs compile-time stack metadata validation.
+
 #
 # ABI hall of shame
 #
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 8bfd1ac..8552656 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -342,6 +342,18 @@ config FRAME_POINTER
 	  larger and slower, but it gives very useful debugging information
 	  in case of kernel bugs. (precise oopses/stacktraces/warnings)

+config STACK_VALIDATION
+	bool "Compile-time stack metadata validation"
+	depends on HAVE_STACK_VALIDATION
+	default n
+	help
+	  Add compile-time checks to validate stack metadata, including frame
+	  pointers (if CONFIG_FRAME_POINTER is enabled).  This helps ensure
+	  that runtime stack traces are more reliable.
+
+	  For more information, see
+	  tools/objtool/Documentation/stack-validation.txt.
+
 config DEBUG_FORCE_WEAK_PER_CPU
 	bool "Force weak per-cpu definitions"
 	depends on DEBUG_KERNEL
diff --git a/scripts/Makefile.build b/scripts/Makefile.build
index 2c47f9c..130a452 100644
--- a/scripts/Makefile.build
+++ b/scripts/Makefile.build
@@ -241,10 +241,32 @@ cmd_record_mcount = \
 	fi;
 endif

+ifdef CONFIG_STACK_VALIDATION
+
+__objtool_obj := $(objtree)/tools/objtool/objtool
+
+objtool_args = check
+ifndef CONFIG_FRAME_POINTER
+objtool_args += --no-fp
+endif
+
+# 'OBJECT_FILES_NON_STANDARD := y': skip objtool checking for a directory
+# 'OBJECT_FILES_NON_STANDARD_foo.o := 'y': skip objtool checking for a file
+# 'OBJECT_FILES_NON_STANDARD_foo.o := 'n': override directory skip for a file
+cmd_objtool = $(if $(patsubst y%,, \
+	$(OBJECT_FILES_NON_STANDARD_$(basetarget).o)$(OBJECT_FILES_NON_STANDARD)n), \
+	$(__objtool_obj) $(objtool_args) "$(@)";)
+objtool_obj = $(if $(patsubst y%,, \
+	$(OBJECT_FILES_NON_STANDARD_$(basetarget).o)$(OBJECT_FILES_NON_STANDARD)n), \
+	$(__objtool_obj))
+
+endif # CONFIG_STACK_VALIDATION
+
 define rule_cc_o_c
 	$(call echo-cmd,checksrc) $(cmd_checksrc) \
 	$(call echo-cmd,cc_o_c) $(cmd_cc_o_c); \
 	$(cmd_modversions) \
+	$(cmd_objtool) \
 	$(call echo-cmd,record_mcount) \
 	$(cmd_record_mcount) \
 	scripts/basic/fixdep $(depfile) $@ '$(call make-cmd,cc_o_c)' > \
@@ -253,14 +275,23 @@ define rule_cc_o_c
 	mv -f $(dot-target).tmp $(dot-target).cmd
 endef

+define rule_as_o_S
+	$(call echo-cmd,as_o_S) $(cmd_as_o_S); \
+	$(cmd_objtool) \
+	scripts/basic/fixdep $(depfile) $@ '$(call make-cmd,as_o_S)' > \
+	$(dot-target).tmp; \
+	rm -f $(depfile); \
+	mv -f $(dot-target).tmp $(dot-target).cmd
+endef
+
 # Built-in and composite module parts
-$(obj)/%.o: $(src)/%.c $(recordmcount_source) FORCE
+$(obj)/%.o: $(src)/%.c $(recordmcount_source) $(objtool_obj) FORCE
 	$(call cmd,force_checksrc)
 	$(call if_changed_rule,cc_o_c)

 # Single-part modules are special since we need to mark them in $(MODVERDIR)

-$(single-used-m): $(obj)/%.o: $(src)/%.c $(recordmcount_source) FORCE
+$(single-used-m): $(obj)/%.o: $(src)/%.c $(recordmcount_source) $(objtool_obj) FORCE
 	$(call cmd,force_checksrc)
 	$(call if_changed_rule,cc_o_c)
 	@{ echo $(@:.o=.ko); echo $@; } > $(MODVERDIR)/$(@F:.o=.mod)
@@ -290,8 +321,8 @@ $(obj
[PATCH v19 07/10] x86/kprobes: Mark kretprobe_trampoline() stack frame as non-standard
objtool reports the following warning for kretprobe_trampoline():

  arch/x86/kernel/kprobes/core.o: warning: objtool: kretprobe_trampoline()+0x20: call without frame pointer save/setup

kretprobes are a special case where the stack is intentionally wrong.
The return address isn't known at the beginning of the trampoline, so
the stack frame can't be set up properly before it calls
trampoline_handler().

Because kretprobe handlers don't sleep, the frame pointer doesn't *have*
to be accurate in the trampoline.  So it's ok to tell objtool to ignore
it.  This results in no actual changes to the generated code.

Signed-off-by: Josh Poimboeuf
Cc: Ananth N Mavinakayanahalli
Cc: Anil S Keshavamurthy
Cc: "David S. Miller"
Cc: Masami Hiramatsu
---
 arch/x86/kernel/kprobes/core.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
index 48acaac..ae703ac 100644
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -49,6 +49,7 @@
 #include
 #include
 #include
+#include
 #include
 #include

@@ -703,6 +704,7 @@ asm(
 	".size kretprobe_trampoline, .-kretprobe_trampoline\n"
 );
 NOKPROBE_SYMBOL(kretprobe_trampoline);
+STACK_FRAME_NON_STANDARD(kretprobe_trampoline);

 /*
  * Called from kretprobe_trampoline
--
2.4.3
[PATCH v19 00/10] Compile-time stack metadata validation
This is v19 of the compile-time stack metadata validation patch set.
It's based on tip:core/objtool.

v18 can be found here:

  https://lkml.kernel.org/r/cover.1456440439.git.jpoim...@redhat.com

For more information about the motivation behind this patch set, and
more details about what it does, see the patch 8 changelog and
tools/objtool/Documentation/stack-validation.txt.

Patches 1-7 mark various directories, files, and functions as
"non-standard" in preparation for objtool.

Patches 8-10 add objtool and integrate it into the kernel build.

v19:
- add support for CONFIG_GCOV_KERNEL, CONFIG_KASAN, CONFIG_UBSAN
- always inline context_switch() to prevent gcov inline changes
- add main() return value in objtool.c
- change warning output format to mimic gcc warnings

v18:
- include/linux/objtool.h -> include/linux/frame.h
- __objtool_ignore_func -> __func_stack_frame_non_standard
- reword commit messages and comments a bit
- reorder patches

v17:
- __ex_table fix
- rename stacktool -> objtool
- STACKTOOL_IGNORE_FUNCTION -> STACK_FRAME_NON_STANDARD
- 'STACKTOOL := n' -> 'OBJECT_FILES_NON_STANDARD := y'
- updated global_noreturns list

v16:
- fix all allyesconfig warnings, except for staging
- get rid of STACKTOOL_IGNORE_INSN which is no longer needed
- remove several whitelists in favor of automatically whitelisting any
  function with a special instruction like ljmp, lret, or vmrun
- split up stacktool patch into 3 parts as suggested by Ingo
- update the global noreturn function list
- detect noreturn function fallthroughs
- skip weak functions in noreturn call detection logic
- add empty function check to noreturn logic
- allow non-section rela symbols for __ex_table sections
- support rare switch table case with jmpq *[addr](%rip)
- don't warn on frame pointer restore without save
- rearrange patch order a bit

v15:
- restructure code for a new cmdline interface "stacktool check" using
  the new subcommand framework in tools/lib/subcmd
- fix 32 bit build fail (put __sp at end) in paravirt_types.h patch 10
  which was reported by 0day

v14:
- make tools/include/linux/list.h self-sufficient
- create FRAME_OFFSET to allow 32-bit code to be able to access function
  arguments on the stack
- add FRAME_OFFSET usage in crypto patch 14/24: "Create stack frames in
  aesni-intel_asm.S"
- rename "index" -> "idx" to fix build with some compilers

v13:
- LDFLAGS order fix from Chris J Arges
- new warning fix patches from Chris J Arges
- "--frame-pointer" -> "--check-frame-pointer"

v12:
- rename "stackvalidate" -> "stacktool"
- move from scripts/ to tools/:
  - makefile rework
  - make a copy of the x86 insn code (and warn if the code diverges)
  - use tools/include/linux/list.h
- move warning macros to a new warn.h file
- change wording: "stack validation" -> "stack metadata validation"

v11:
- attempt to answer the "why" question better in the documentation and
  commit message
- s/FP_SAVE/FRAME_BEGIN/ in documentation

v10:
- add scripts/mod to directory ignores
- remove circular dependencies for ignored objects which are built
  before stackvalidate
- fix CONFIG_MODVERSIONS incompatibility

v9:
- rename FRAME/ENDFRAME -> FRAME_BEGIN/FRAME_END
- fix jump table issue for when the original instruction is a jump
- drop paravirt thunk alignment patch
- add maintainers to CC for proposed warning fixes

v8:
- add proposed fixes for warnings
- fix all memory leaks
- process ignores earlier and add more ignore checks
- always assume POPCNT alternative is enabled
- drop hweight inline asm fix
- drop __schedule() ignore patch
- change .Ltemp_\@ to .Lstackvalidate_ignore_\@ in asm macro
- fix CONFIG_* checks in asm macros
- add C versions of ignore macros and frame macros
- change ";" to "\n" in C macros
- add ifdef CONFIG_STACK_VALIDATION checks in C ignore macros
- use numbered label in C ignore macro
- add missing break in switch case statement in arch-x86.c

v7:
- sibling call support
- document proposed solution for inline asm() frame pointer issues
- say "kernel entry/exit" instead of "context switch"
- clarify the checking of switch statement jump tables
- discard __stackvalidate_ignore_* sections in linker script
- use .Ltemp_\@ to get a unique label instead of static 3-digit number
- change STACKVALIDATE_IGNORE_FUNC variable to a static
- move STACKVALIDATE_IGNORE_INSN to arch-specific .h file

v6:
- rename asmvalidate -> stackvalidate (again)
- gcc-generated object file support
- recursive branch state analysis
- external jump support
- fixup/exception table support
- jump label support
- switch statement jump table support
- added documentation
- detection of "noreturn" dead end functions
- added a Kbuild mechanism for skipping files and dirs
- moved frame pointer macros to arch/x86/include/asm/frame.h
- moved ignore macros to include/linux/stackvalidate.h

v5:
- stackvalidate -> asmvalidate
- frame pointers only required for non-leaf functions
- check for the use of the FP_SAVE/RESTORE macros instead of manually
[PATCH v19 04/10] bpf: Mark __bpf_prog_run() stack frame as non-standard
objtool reports the following false positive warnings:

  kernel/bpf/core.o: warning: objtool: __bpf_prog_run()+0x5c: sibling call from callable instruction with changed frame pointer
  kernel/bpf/core.o: warning: objtool: __bpf_prog_run()+0x60: function has unreachable instruction
  kernel/bpf/core.o: warning: objtool: __bpf_prog_run()+0x64: function has unreachable instruction
  [...]

It's confused by the following dynamic jump instruction in
__bpf_prog_run()::

  jmp     *(%r12,%rax,8)

which corresponds to the following line in the C code:

  goto *jumptable[insn->code];

There's no way for objtool to deterministically find all possible
branch targets for a dynamic jump, so it can't verify this code.

In this case the jumps all stay within the function, and there's nothing
unusual going on related to the stack, so we can whitelist the function.

Signed-off-by: Josh Poimboeuf
Acked-by: Daniel Borkmann
Acked-by: Alexei Starovoitov
Cc: net...@vger.kernel.org
---
 kernel/bpf/core.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 972d9a8..be0abf6 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -27,6 +27,7 @@
 #include
 #include
 #include
+#include

 #include

@@ -649,6 +650,7 @@ load_byte:
 		WARN_RATELIMIT(1, "unknown opcode %02x\n", insn->code);
 		return 0;
 }
+STACK_FRAME_NON_STANDARD(__bpf_prog_run); /* jump table */

 bool bpf_prog_array_compatible(struct bpf_array *array,
 			       const struct bpf_prog *fp)
--
2.4.3
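For readers unfamiliar with the construct behind "goto *jumptable[insn->code]", here is a minimal sketch of the same dispatch style using GCC's "labels as values" extension (also accepted by clang). The toy opcodes, struct toy_insn and toy_run() are invented for illustration and are not the BPF interpreter. Each DISPATCH() would typically compile to an indirect jump whose target is only known at run time, which is the kind of instruction a static checker cannot follow.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes for a toy accumulator machine (not BPF opcodes). */
enum { OP_ADD, OP_MUL, OP_EXIT, OP_COUNT };

struct toy_insn {
	unsigned char code;	/* index into the jump table */
	int imm;		/* immediate operand */
};

/*
 * Run a toy program and return the accumulator.  Dispatch goes through a
 * table of label addresses, so the branch target depends on run-time data.
 */
int toy_run(const struct toy_insn *insn)
{
	static const void *jumptable[OP_COUNT] = {
		[OP_ADD]  = &&do_add,
		[OP_MUL]  = &&do_mul,
		[OP_EXIT] = &&do_exit,
	};
	int acc = 0;

	assert(insn != NULL);

/* Bounds-check the opcode, then jump indirectly to its handler. */
#define DISPATCH()					\
	do {						\
		assert(insn->code < OP_COUNT);		\
		goto *jumptable[insn->code];		\
	} while (0)

	DISPATCH();

do_add:
	acc += insn->imm;
	insn++;
	DISPATCH();
do_mul:
	acc *= insn->imm;
	insn++;
	DISPATCH();
do_exit:
	return acc;
#undef DISPATCH
}
```

All handlers live inside one function and none of them touches the frame pointer, which mirrors the argument made above for whitelisting: the jumps are opaque to the tool, but nothing stack-related is unusual.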
[PATCH v19 10/10] objtool: Enable stack metadata validation on x86_64
Set HAVE_STACK_VALIDATION to enable stack metadata validation for
x86_64.

Signed-off-by: Josh Poimboeuf
---
 arch/x86/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index c46662f..adc5a6d 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -155,6 +155,7 @@ config X86
 	select VIRT_TO_BUS
 	select X86_DEV_DMA_OPS			if X86_64
 	select X86_FEATURE_NAMES		if PROC_FS
+	select HAVE_STACK_VALIDATION		if X86_64

 config INSTRUCTION_DECODER
 	def_bool y
--
2.4.3
[PATCH v19 06/10] sched: always inline context_switch()
When CONFIG_GCOV is enabled, gcc decides to put context_switch()
out-of-line, which is inconsistent with its normal behavior.  It also
causes an objtool warning because __schedule() no longer inlines
context_switch(), so the "STACK_FRAME_NON_STANDARD(__schedule)"
statement loses its effect.

Signed-off-by: Josh Poimboeuf
---
 kernel/sched/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 641043d..bb0daab 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2763,7 +2763,7 @@ asmlinkage __visible void schedule_tail(struct task_struct *prev)
 /*
  * context_switch - switch to the new MM and the new thread's register state.
  */
-static inline struct rq *
+static __always_inline struct rq *
 context_switch(struct rq *rq, struct task_struct *prev,
 	       struct task_struct *next)
 {
--
2.4.3
[PATCH v19 02/10] objtool: Add STACK_FRAME_NON_STANDARD macro
Add a new macro, STACK_FRAME_NON_STANDARD, which is used to denote a
function which does something unusual related to its stack frame.  Use
of the macro prevents objtool from emitting a false positive warning.

Signed-off-by: Josh Poimboeuf
---
 arch/x86/kernel/vmlinux.lds.S | 5 -
 include/linux/frame.h         | 23 +++
 2 files changed, 27 insertions(+), 1 deletion(-)
 create mode 100644 include/linux/frame.h

diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 92dc211..13fa0ad 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -343,7 +343,10 @@ SECTIONS

 	/* Sections to be discarded */
 	DISCARDS
-	/DISCARD/ : { *(.eh_frame) }
+	/DISCARD/ : {
+		*(.eh_frame)
+		*(__func_stack_frame_non_standard)
+	}
 }


diff --git a/include/linux/frame.h b/include/linux/frame.h
new file mode 100644
index 000..e6baaba
--- /dev/null
+++ b/include/linux/frame.h
@@ -0,0 +1,23 @@
+#ifndef _LINUX_FRAME_H
+#define _LINUX_FRAME_H
+
+#ifdef CONFIG_STACK_VALIDATION
+/*
+ * This macro marks the given function's stack frame as "non-standard", which
+ * tells objtool to ignore the function when doing stack metadata validation.
+ * It should only be used in special cases where you're 100% sure it won't
+ * affect the reliability of frame pointers and kernel stack traces.
+ *
+ * For more information, see tools/objtool/Documentation/stack-validation.txt.
+ */
+#define STACK_FRAME_NON_STANDARD(func) \
+	static void __used __section(__func_stack_frame_non_standard) \
+		*__func_stack_frame_non_standard_##func = func
+
+#else /* !CONFIG_STACK_VALIDATION */
+
+#define STACK_FRAME_NON_STANDARD(func)
+
+#endif /* CONFIG_STACK_VALIDATION */
+
+#endif /* _LINUX_FRAME_H */
--
2.4.3
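The mechanism behind the macro is a function pointer stored in a dedicated section, which a tool can later look up. A rough userspace sketch of that idea follows, assuming GCC or clang on an ELF target linked with GNU ld (which provides __start_/__stop_ symbols for sections whose names are valid C identifiers). The macro MARK_NON_STANDARD, the section name "nonstd_funcs" and the helper functions are all hypothetical. Note the difference from the kernel: objtool reads the section from the object file at build time and the linker script above discards it, whereas this sketch walks the section at run time only to make the effect observable.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Userspace stand-in for the kernel macro: record the function's address
 * in a dedicated ELF section.  All names here are made up for the sketch.
 */
#define MARK_NON_STANDARD(func)					\
	static void *nonstd_ptr_##func				\
	__attribute__((used, section("nonstd_funcs"))) = (void *)func

/* Section bounds, provided by GNU ld for C-identifier section names. */
extern void *__start_nonstd_funcs[];
extern void *__stop_nonstd_funcs[];

int odd_one(void) { return 1; }
int odd_two(void) { return 2; }
int plain(void)   { return 3; }

MARK_NON_STANDARD(odd_one);
MARK_NON_STANDARD(odd_two);

/* Number of functions that were marked in this program. */
size_t marked_count(void)
{
	return (size_t)(__stop_nonstd_funcs - __start_nonstd_funcs);
}

/* Return 1 if fn was recorded in the marker section, else 0. */
int is_marked(int (*fn)(void))
{
	void **p;

	assert(fn != NULL);
	for (p = __start_nonstd_funcs; p < __stop_nonstd_funcs; p++)
		if (*p == (void *)fn)
			return 1;
	return 0;
}
```

Keeping the marker next to the function definition, as the patches in this series do, means the whitelist travels with the code it describes instead of living in a separate list inside the tool.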
[PATCH v2] perf/x86/amd: Adding support for new IOMMU performance event
This patch adds new IOMMU performance events based on the information in
table 74 of the AMD I/O Virtualization Technology (IOMMU) Specification
(Document Id: 48882, Rev 2.62, Feb 2015).

Link: http://support.amd.com/TechDocs/48882_IOMMU.pdf

Reviewed-by: Joerg Roedel
Acked-by: Joerg Roedel
Signed-off-by: Suravee Suthikulpanit
---
Hi Ingo/Peter,

I have rebased the patch onto tip and am resending it as V2.
If there are no other concerns, would you please accept this patch when
you get a chance.

FYI, here is the link to V1 (https://lkml.org/lkml/2015/12/11/891).

Thanks,
Suravee

 arch/x86/events/amd/iommu.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/arch/x86/events/amd/iommu.c b/arch/x86/events/amd/iommu.c
index 635e5eb..40625ca 100644
--- a/arch/x86/events/amd/iommu.c
+++ b/arch/x86/events/amd/iommu.c
@@ -118,6 +118,11 @@ static struct amd_iommu_event_desc amd_iommu_v2_event_descs[] = {
 	AMD_IOMMU_EVENT_DESC(cmd_processed, "csource=0x11"),
 	AMD_IOMMU_EVENT_DESC(cmd_processed_inv, "csource=0x12"),
 	AMD_IOMMU_EVENT_DESC(tlb_inv, "csource=0x13"),
+	AMD_IOMMU_EVENT_DESC(ign_rd_wr_mmio_1ff8h, "csource=0x14"),
+	AMD_IOMMU_EVENT_DESC(vapic_int_non_guest, "csource=0x15"),
+	AMD_IOMMU_EVENT_DESC(vapic_int_guest, "csource=0x16"),
+	AMD_IOMMU_EVENT_DESC(smi_recv, "csource=0x17"),
+	AMD_IOMMU_EVENT_DESC(smi_blk, "csource=0x18"),
 	{ /* end: all zeroes */ },
 };

--
1.9.1
linux-next: manual merge of the iommu tree with the samsung-krzk tree
Hi Joerg,

Today's linux-next merge of the iommu tree got a conflict in:

  drivers/memory/Kconfig

between commit:

  78fbb9361ca3 ("memory: Add support for Exynos SROM driver")

from the samsung-krzk tree and commit:

  cc8bbe1a8312 ("memory: mediatek: Add SMI driver")

from the iommu tree.

I fixed it up (see below) and can carry the fix as necessary (no action
is required).

--
Cheers,
Stephen Rothwell

diff --cc drivers/memory/Kconfig
index bcb19822968b,51d5cd20c26a..
--- a/drivers/memory/Kconfig
+++ b/drivers/memory/Kconfig
@@@ -114,7 -114,14 +114,15 @@@ config JZ4780_NEM
  	  the Ingenic JZ4780. This controller is used to handle external
  	  memory devices such as NAND and SRAM.

+ config MTK_SMI
+ 	bool
+ 	depends on ARCH_MEDIATEK || COMPILE_TEST
+ 	help
+ 	  This driver is for the Memory Controller module in MediaTek SoCs,
+ 	  mainly help enable/disable iommu and control the power domain and
+ 	  clocks for each local arbiter.
+
 +source "drivers/memory/samsung/Kconfig"
  source "drivers/memory/tegra/Kconfig"

  endif
[PATCH v7] watchdog: Add watchdog timer support for the WinSystems EBC-C384
The WinSystems EBC-C384 has an onboard watchdog timer. The timeout range
supported by the watchdog timer is 1 second to 255 minutes. Timeouts
under 256 seconds have a 1 second granularity, while the rest have a 1
minute granularity.

This driver adds watchdog timer support for this onboard watchdog timer.
The timeout may be configured via the timeout module parameter.

Signed-off-by: William Breathitt Gray
Reviewed-by: Guenter Roeck
---
Changes in v7:
  - Make sure timeout member is in seconds resolution despite minutes
    granularity

 MAINTAINERS                     |   6 ++
 drivers/watchdog/Kconfig        |   9 ++
 drivers/watchdog/Makefile       |   1 +
 drivers/watchdog/ebc-c384_wdt.c | 188
 4 files changed, 204 insertions(+)
 create mode 100644 drivers/watchdog/ebc-c384_wdt.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 28eb61b..66107fd 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -11860,6 +11860,12 @@ M:	David Härdeman
 S:	Maintained
 F:	drivers/media/rc/winbond-cir.c

+WINSYSTEMS EBC-C384 WATCHDOG DRIVER
+M:	William Breathitt Gray
+L:	linux-watch...@vger.kernel.org
+S:	Maintained
+F:	drivers/watchdog/ebc-c384_wdt.c
+
 WIMAX STACK
 M:	Inaky Perez-Gonzalez
 M:	linux-wi...@intel.com
diff --git a/drivers/watchdog/Kconfig b/drivers/watchdog/Kconfig
index 0f6d851..11f3a3d 100644
--- a/drivers/watchdog/Kconfig
+++ b/drivers/watchdog/Kconfig
@@ -713,6 +713,15 @@ config ALIM7101_WDT

 	  Most people will say N.

+config EBC_C384_WDT
+	tristate "WinSystems EBC-C384 Watchdog Timer"
+	depends on X86
+	select WATCHDOG_CORE
+	help
+	  Enables watchdog timer support for the watchdog timer on the
+	  WinSystems EBC-C384 motherboard. The timeout may be configured via
+	  the timeout module parameter.
+
 config F71808E_WDT
 	tristate "Fintek F71808E, F71862FG, F71869, F71882FG and F71889FG Watchdog"
 	depends on X86
diff --git a/drivers/watchdog/Makefile b/drivers/watchdog/Makefile
index f566753..15762c8 100644
--- a/drivers/watchdog/Makefile
+++ b/drivers/watchdog/Makefile
@@ -88,6 +88,7 @@ obj-$(CONFIG_ACQUIRE_WDT) += acquirewdt.o
 obj-$(CONFIG_ADVANTECH_WDT) += advantechwdt.o
 obj-$(CONFIG_ALIM1535_WDT) += alim1535_wdt.o
 obj-$(CONFIG_ALIM7101_WDT) += alim7101_wdt.o
+obj-$(CONFIG_EBC_C384_WDT) += ebc-c384_wdt.o
 obj-$(CONFIG_F71808E_WDT) += f71808e_wdt.o
 obj-$(CONFIG_SP5100_TCO) += sp5100_tco.o
 obj-$(CONFIG_GEODE_WDT) += geodewdt.o
diff --git a/drivers/watchdog/ebc-c384_wdt.c b/drivers/watchdog/ebc-c384_wdt.c
new file mode 100644
index 000..2cdaf5d
--- /dev/null
+++ b/drivers/watchdog/ebc-c384_wdt.c
@@ -0,0 +1,188 @@
+/*
+ * Watchdog timer driver for the WinSystems EBC-C384
+ * Copyright (C) 2016 William Breathitt Gray
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ */
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#define MODULE_NAME		"ebc-c384_wdt"
+#define WATCHDOG_TIMEOUT	60
+/*
+ * The timeout value in minutes must fit in a single byte when sent to the
+ * watchdog timer; the maximum timeout possible is 15300 (255 * 60) seconds.
+ */
+#define WATCHDOG_MAX_TIMEOUT	15300
+#define BASE_ADDR		0x564
+#define ADDR_EXTENT		5
+#define CFG_ADDR		(BASE_ADDR + 1)
+#define PET_ADDR		(BASE_ADDR + 2)
+
+static bool nowayout = WATCHDOG_NOWAYOUT;
+module_param(nowayout, bool, 0);
+MODULE_PARM_DESC(nowayout, "Watchdog cannot be stopped once started (default="
+	__MODULE_STRING(WATCHDOG_NOWAYOUT) ")");
+
+static unsigned timeout;
+module_param(timeout, uint, 0);
+MODULE_PARM_DESC(timeout, "Watchdog timeout in seconds (default="
+	__MODULE_STRING(WATCHDOG_TIMEOUT) ")");
+
+static int ebc_c384_wdt_start(struct watchdog_device *wdev)
+{
+	unsigned t = wdev->timeout;
+
+	/* resolution is in minutes for timeouts greater than 255 seconds */
+	if (t > 255)
+		t = DIV_ROUND_UP(t, 60);
+
+	outb(t, PET_ADDR);
+
+	return 0;
+}
+
+static int ebc_c384_wdt_stop(struct watchdog_device *wdev)
+{
+	outb(0x00, PET_ADDR);
+
+	return 0;
+}
+
+static int ebc_c384_wdt_set_timeout(struct watchdog_device *wdev, unsigned t)
+{
+	/* resolution is in minutes for timeouts greater than 255 seconds */
+	if (t > 255) {
+		/* round second resolution up to minute resolution */
+		wdev->timeout = DIV_ROUND_UP(t, 60);
+
+		/* set watchdog timer for minute
Re: [PATCH] s390x: fix condition to choose correct function
Merged into cifs-2.6.git.

Looks like alpha has a similar problem, though.

On Wed, Feb 24, 2016 at 12:45 AM, Yadan Fan wrote:
> This issue was introduced by commit 02323db17e3a7 ("cifs: fix
> cifs_uniqueid_to_ino_t not to ever return 0"): when BITS_PER_LONG
> is 64 on s390x, the corresponding cifs_uniqueid_to_ino_t()
> function will cast the 64-bit fileid to 32 bits by using (ino_t)fileid,
> because ino_t (typedef'ed from __kernel_ino_t) is an int type.
>
> Signed-off-by: Yadan Fan
> ---
>  fs/cifs/cifsfs.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/cifs/cifsfs.h b/fs/cifs/cifsfs.h
> index 68c4547..02dcbe1 100644
> --- a/fs/cifs/cifsfs.h
> +++ b/fs/cifs/cifsfs.h
> @@ -31,7 +31,7 @@
>   * so that it will fit. We use hash_64 to convert the value to 31 bits, and
>   * then add 1, to ensure that we don't end up with a 0 as the value.
>   */
> -#if BITS_PER_LONG == 64
> +#if BITS_PER_LONG == 64 && !defined(CONFIG_S390)
>  static inline ino_t
>  cifs_uniqueid_to_ino_t(u64 fileid)
>  {
> --
> 2.6.2
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-cifs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
Thanks,

Steve
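To make the failure mode concrete, here is a small userspace sketch of why a plain cast is unsafe when the inode type is narrower than the 64-bit file ID. The type ino32_t and both helpers are hypothetical, and the XOR fold is only a stand-in for the kernel's hash_64(), chosen to keep the example self-contained; it shares just the property the quoted comment describes (reduce to 31 bits, then add 1 so the result is never 0).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for an ino_t that is only 32 bits wide. */
typedef uint32_t ino32_t;

/* Plain cast: silently drops the upper 32 bits of the file ID. */
ino32_t ino_by_cast(uint64_t fileid)
{
	return (ino32_t)fileid;
}

/*
 * Fold the upper half into the lower half, keep 31 bits, then add 1.
 * This is NOT hash_64(); it only illustrates "31 bits plus 1, never 0".
 */
ino32_t ino_by_fold(uint64_t fileid)
{
	uint32_t folded = (uint32_t)(fileid ^ (fileid >> 32));
	ino32_t ino = (folded & 0x7fffffffu) + 1;

	assert(ino != 0);	/* range is 1 .. 2^31, so this always holds */
	return ino;
}
```

With these helpers, ino_by_cast(0x100000000ULL) yields 0 and ino_by_cast(0x100000005ULL) collides with ino_by_cast(5), while ino_by_fold() stays non-zero for every input. That is the behavior the patch wants s390x to get by not taking the BITS_PER_LONG == 64 branch.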
linux-next: manual merge of the iommu tree with the arm-soc tree
Hi Joerg,

Today's linux-next merge of the iommu tree got a conflict in:

  arch/arm64/boot/dts/mediatek/mt8173.dtsi

between commit:

  93e9f5ee1e35 ("dts: arm64: Add EFUSE device node")

from the arm-soc tree and commit:

  5ff6b3a6d391 ("dts: mt8173: Add iommu/smi nodes for mt8173")

from the iommu tree.

I fixed it up (see below) and can carry the fix as necessary (no action
is required).

--
Cheers,
Stephen Rothwell

diff --cc arch/arm64/boot/dts/mediatek/mt8173.dtsi
index f4bd3c9182ad,804881181fcc..
--- a/arch/arm64/boot/dts/mediatek/mt8173.dtsi
+++ b/arch/arm64/boot/dts/mediatek/mt8173.dtsi
@@@ -277,11 -278,17 +278,22 @@@
  			reg = <0 0x10200620 0 0x20>;
  		};

+ 		iommu: iommu@10205000 {
+ 			compatible = "mediatek,mt8173-m4u";
+ 			reg = <0 0x10205000 0 0x1000>;
+ 			interrupts = ;
+ 			clocks = <&infracfg CLK_INFRA_M4U>;
+ 			clock-names = "bclk";
+ 			mediatek,larbs = <&larb0 &larb1 &larb2
+ 					  &larb3 &larb4 &larb5>;
+ 			#iommu-cells = <1>;
+ 		};
+
 +		efuse: efuse@10206000 {
 +			compatible = "mediatek,mt8173-efuse";
 +			reg = <0 0x10206000 0 0x1000>;
 +		};
 +
  		apmixedsys: clock-controller@10209000 {
  			compatible = "mediatek,mt8173-apmixedsys";
  			reg = <0 0x10209000 0 0x1000>;
Re: [PATCH v2 02/13] clk: sunxi: add ahb1 clock for A83T
Hi,

On Sun, Feb 28, 2016 at 7:18 AM, Vishnu Patekar wrote:
> AHB1 on A83T is similar to ahb1 on A31, except parents are different.
> clock index 0b1x is PLL6.
>
> Signed-off-by: Vishnu Patekar
> Acked-by: Chen-Yu Tsai
> Acked-by: Rob Herring
> ---
>  Documentation/devicetree/bindings/clock/sunxi.txt |  1 +
>  drivers/clk/sunxi/clk-sunxi.c                     | 76 +++
>  2 files changed, 77 insertions(+)
>
> diff --git a/Documentation/devicetree/bindings/clock/sunxi.txt b/Documentation/devicetree/bindings/clock/sunxi.txt
> index c09f59b..2ee7841 100644
> --- a/Documentation/devicetree/bindings/clock/sunxi.txt
> +++ b/Documentation/devicetree/bindings/clock/sunxi.txt
> @@ -29,6 +29,7 @@ Required properties:
>         "allwinner,sun6i-a31-ar100-clk" - for the AR100 on A31
>         "allwinner,sun9i-a80-cpus-clk" - for the CPUS on A80
>         "allwinner,sun6i-a31-ahb1-clk" - for the AHB1 clock on A31
> +       "allwinner,sun8i-a83t-ahb1-clk" - for the AHB1 clock on A83T
>         "allwinner,sun8i-h3-ahb2-clk" - for the AHB2 clock on H3
>         "allwinner,sun6i-a31-ahb1-gates-clk" - for the AHB1 gates on A31
>         "allwinner,sun8i-a23-ahb1-gates-clk" - for the AHB1 gates on A23
> diff --git a/drivers/clk/sunxi/clk-sunxi.c b/drivers/clk/sunxi/clk-sunxi.c
> index 99f60ef..0ae1f09 100644
> --- a/drivers/clk/sunxi/clk-sunxi.c
> +++ b/drivers/clk/sunxi/clk-sunxi.c
> @@ -344,6 +344,67 @@ static void sun6i_ahb1_recalc(struct factors_request *req)
>         req->rate >>= req->p;
>  }
>
> +#define SUN8I_A83T_AHB1_PARENT_PLL6    2
> +/**
> + * sun8i_a83t_get_ahb_factors() - calculates m, p factors for AHB
> + * AHB rate is calculated as follows
> + * rate = parent_rate >> p
> + *
> + * if parent is pll6, then
> + * parent_rate = pll6 rate / (m + 1)
> + */
> +
> +static void sun8i_a83t_get_ahb1_factors(struct factors_request *req)
> +{
> +       u8 div, calcp, calcm = 1;
> +
> +       /*
> +        * clock can only divide, so we will never be able to achieve
> +        * frequencies higher than the parent frequency
> +        */
> +       if (req->parent_rate && req->rate > req->parent_rate)
> +               req->rate = req->parent_rate;
> +
> +       div = DIV_ROUND_UP(req->parent_rate, req->rate);
> +
> +       /* calculate pre-divider if parent is pll6 */
> +       if (req->parent_index >= SUN8I_A83T_AHB1_PARENT_PLL6) {
> +               if (div < 4)
> +                       calcp = 0;
> +               else if (div / 2 < 4)
> +                       calcp = 1;
> +               else if (div / 4 < 4)
> +                       calcp = 2;
> +               else
> +                       calcp = 3;
> +
> +               calcm = DIV_ROUND_UP(div, 1 << calcp);
> +       } else {
> +               calcp = __roundup_pow_of_two(div);
> +               calcp = calcp > 3 ? 3 : calcp;
> +}

Indent here.

> +
> +       req->rate = (req->parent_rate / calcm) >> calcp;
> +       req->p = calcp;
> +       req->m = calcm - 1;
> +}
> +
> +/**
> +* sun8i_a83t_ahb1_recalc() - calculates AHB clock rate from m, p factors and
> +*                       parent index

Whitespace here.

> +*/
> +static void sun8i_a83t_ahb1_recalc(struct factors_request *req)
> +{
> +       req->rate = req->parent_rate;
> +
> +/* apply pre-divider first if parent is pll6 */

Indent here.

ChenYu

> +       if (req->parent_index >= SUN6I_AHB1_PARENT_PLL6)
> +               req->rate /= req->m + 1;
> +
> +       /* clk divider */
> +       req->rate >>= req->p;
> +}
> +
>  /**
>   * sun4i_get_apb1_factors() - calculates m, p factors for APB1
>   * APB1 rate is calculated as follows
> @@ -555,6 +616,14 @@ static const struct factors_data sun6i_ahb1_data __initconst = {
>         .recalc = sun6i_ahb1_recalc,
>  };
>
> +static const struct factors_data sun8i_a83t_ahb1_data __initconst = {
> +       .mux = 12,
> +       .muxmask = BIT(1) | BIT(0),
> +       .table = &sun6i_ahb1_config,
> +       .getter = sun8i_a83t_get_ahb1_factors,
> +       .recalc = sun8i_a83t_ahb1_recalc,
> +};
> +
>  static const struct factors_data sun4i_apb1_data __initconst = {
>         .mux = 24,
>         .muxmask = BIT(1) | BIT(0),
> @@ -627,6 +696,13 @@ static void __init sun6i_ahb1_clk_setup(struct device_node *node)
>  CLK_OF_DECLARE(sun6i_a31_ahb1, "allwinner,sun6i-a31-ahb1-clk",
>                sun6i_ahb1_clk_setup);
>
> +static void __init sun8i_a83t_ahb1_clk_setup(struct device_node *node)
> +{
> +       sunxi_factors_clk_setup(node, &sun8i_a83t_ahb1_data);
> +}
> +CLK_OF_DECLARE(sun8i_a83t_ahb1, "allwinner,sun8i-a83t-ahb1-clk",
> +              sun8i_a83t_ahb1_clk_setup);
> +
>  static void __init sun4i_apb1_clk_setup(struct device_node *node)
>  {
>         sunxi_factors_clk_setup(node, &sun4i_apb1_data);
> --
> 1.9.1
>
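The rate formula in the quoted comment (rate = parent_rate >> p, with parent_rate first divided by m + 1 when the parent is PLL6) can be checked with a small standalone sketch. struct ahb1_factors and ahb1_recalc() are hypothetical userspace stand-ins for the kernel's struct factors_request and the recalc callback. The PLL6 threshold of 2 is an assumption taken from the commit message ("clock index 0b1x is PLL6") and the getter's SUN8I_A83T_AHB1_PARENT_PLL6 define.

```c
#include <assert.h>

/* Assumed from the commit message: mux index 0b1x (2 or 3) selects PLL6. */
#define A83T_AHB1_PARENT_PLL6	2

/* Hypothetical stand-in for the fields of struct factors_request used here. */
struct ahb1_factors {
	unsigned long rate;		/* resulting AHB1 rate, in Hz */
	unsigned long parent_rate;	/* rate of the selected parent, in Hz */
	unsigned char parent_index;	/* mux selection */
	unsigned char m;		/* pre-divider field, divides by m + 1 */
	unsigned char p;		/* divider field, divides by 2^p */
};

/* Compute the output rate from the m and p factors and the parent index. */
void ahb1_recalc(struct ahb1_factors *req)
{
	assert(req);

	req->rate = req->parent_rate;

	/* the pre-divider only applies when the parent is PLL6 */
	if (req->parent_index >= A83T_AHB1_PARENT_PLL6)
		req->rate /= req->m + 1;

	/* power-of-two clock divider */
	req->rate >>= req->p;
}
```

For example, a 600 MHz PLL6 parent with m = 2 and p = 1 gives 600 MHz / 3 = 200 MHz, then 200 MHz >> 1 = 100 MHz; with a non-PLL6 parent the m field is ignored and only the shift applies.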
Re: n_tty: Check the other end of pty pair before returning EAGAIN on a read()
(Take 3, fix compile error in n_hdlc.c) Hi Peter, I saw Marc Aurele La France's proposed patch to n_tty to fix OpenSSH, and your feedback. Patch below is an attempt to address that feedback. Please let me know if this is the change you envisioned; (see Marc's excellent original writeup for details on the issue). [PATCH] n_tty: wait for buffer work in read() and poll(). Undoes the following four changes: 1) f95499c3030fe1bfad57745f2db1959c5b43dca8 n_tty: Don't wait for buffer work in read() loop 2) f8747d4a466ab2cafe56112c51b3379f9fdb7a12 tty: Fix pty master read() after slave closes 3) 52bce7f8d4fc633c9a9d0646eef58ba6ae9a3b73 pty, n_tty: Simplify input processing on final close 4) 1a48632ffed61352a7810ce089dc5a8bcd505a60 pty: Fix input race when closing These changes caused a regression in OpenSSH, as it assumes that the first read() to return EAGAIN after a SIGCHLD means that all the child's output has been returned. Inspired by analysis and patch from Marc Aurele La France Reported-by: Volth Reported-by: Marc Aurele La France BugLink: https://bugzilla.mindrot.org/show_bug.cgi?id=52 BugLink: https://bugzilla.mindrot.org/show_bug.cgi?id=2492 Signed-off-by: Brian Bloniarz --- Documentation/serial/tty.txt | 3 --- drivers/tty/n_hdlc.c | 4 ++-- drivers/tty/n_tty.c | 34 +++--- drivers/tty/pty.c| 4 +--- drivers/tty/tty_buffer.c | 29 + include/linux/tty.h | 1 - 6 files changed, 19 insertions(+), 56 deletions(-) diff --git a/Documentation/serial/tty.txt b/Documentation/serial/tty.txt index bc3842d..e2dea3d 100644 --- a/Documentation/serial/tty.txt +++ b/Documentation/serial/tty.txt @@ -213,9 +213,6 @@ TTY_IO_ERRORIf set, causes all subsequent userspace read/write TTY_OTHER_CLOSED Device is a pty and the other side has closed. -TTY_OTHER_DONE Device is a pty and the other side has closed and - all pending input processing has been completed. - TTY_NO_WRITE_SPLIT Prevent driver from splitting up writes into smaller chunks. 
diff --git a/drivers/tty/n_hdlc.c b/drivers/tty/n_hdlc.c index bbc4ce6..644ddb8 100644 --- a/drivers/tty/n_hdlc.c +++ b/drivers/tty/n_hdlc.c @@ -600,7 +600,7 @@ static ssize_t n_hdlc_tty_read(struct tty_struct *tty, struct file *file, add_wait_queue(&tty->read_wait, &wait); for (;;) { - if (test_bit(TTY_OTHER_DONE, &tty->flags)) { + if (test_bit(TTY_OTHER_CLOSED, &tty->flags)) { ret = -EIO; break; } @@ -828,7 +828,7 @@ static unsigned int n_hdlc_tty_poll(struct tty_struct *tty, struct file *filp, /* set bits for operations that won't block */ if (n_hdlc->rx_buf_list.head) mask |= POLLIN | POLLRDNORM;/* readable */ - if (test_bit(TTY_OTHER_DONE, &tty->flags)) + if (test_bit(TTY_OTHER_CLOSED, &tty->flags)) mask |= POLLHUP; if (tty_hung_up_p(filp)) mask |= POLLHUP; diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c index b280abaa..fc04011 100644 --- a/drivers/tty/n_tty.c +++ b/drivers/tty/n_tty.c @@ -1952,10 +1952,20 @@ err: return -ENOMEM; } +/** + * Synchronously pushes the terminal flip buffers to the line discipline + * and checks for available data. + * + * Must not be called from IRQ context. + */ static inline int input_available_p(struct tty_struct *tty, int poll) { struct n_tty_data *ldata = tty->disc_data; - int amt = poll && !TIME_CHAR(tty) && MIN_CHAR(tty) ? MIN_CHAR(tty) : 1; + int amt; + + flush_work(&tty->port->buf.work); + + amt = poll && !TIME_CHAR(tty) && MIN_CHAR(tty) ? 
MIN_CHAR(tty) : 1; if (ldata->icanon && !L_EXTPROC(tty)) return ldata->canon_head != ldata->read_tail; @@ -1963,18 +1973,6 @@ static inline int input_available_p(struct tty_struct *tty, int poll) return ldata->commit_head - ldata->read_tail >= amt; } -static inline int check_other_done(struct tty_struct *tty) -{ - int done = test_bit(TTY_OTHER_DONE, &tty->flags); - if (done) { - /* paired with cmpxchg() in check_other_closed(); ensures -* read buffer head index is not stale -*/ - smp_mb__after_atomic(); - } - return done; -} - /** * copy_from_read_buf - copy read data directly * @tty: terminal device @@ -2170,7 +2168,7 @@ static ssize_t n_tty_read(struct tty_struct *tty, struct file *file, struct n_tty_data *ldata = tty->disc_data; unsigned char __user *b = buf; DEFINE_WAIT_FUNC(wait, woken_wake_function); - int c, done; + int c; int minimum, time; ssize_t retval = 0;
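The userspace assumption described in the commit message can be illustrated with a small sketch. This is not OpenSSH's code; the helper names `set_nonblocking` and `drain_until_eagain` are hypothetical. It only shows the pattern of reading a non-blocking descriptor until read() reports EAGAIN and treating that as "everything queued so far has been returned", which is the behaviour the patch aims to make reliable again for pty masters.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: switch a descriptor to non-blocking mode. */
static int set_nonblocking(int fd)
{
	int flags = fcntl(fd, F_GETFL);

	if (flags < 0)
		return -1;
	return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/*
 * Hypothetical helper: read until read() reports EAGAIN or end-of-file.
 * Returns the number of bytes stored in buf, or -1 on any other error.
 */
static long drain_until_eagain(int fd, char *buf, size_t cap)
{
	size_t total = 0;

	while (total < cap) {
		ssize_t n = read(fd, buf + total, cap - total);

		if (n > 0) {
			total += (size_t)n;
			continue;
		}
		if (n == 0)
			break;		/* EOF: the writer closed its end */
		if (errno == EINTR)
			continue;	/* interrupted by a signal, retry */
		if (errno == EAGAIN || errno == EWOULDBLOCK)
			break;		/* nothing more queued right now */
		return -1;
	}
	return (long)total;
}
```

A caller following the pattern in the commit message would run `drain_until_eagain()` on the pty master after receiving SIGCHLD and take the returned data as the child's complete output.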
linux-next: build failure after merge of the mfd tree
Hi Lee,

After merging the mfd tree, today's linux-next build (x86_64 allmodconfig) failed like this:

drivers/regulator/tps65086-regulator.c:194:9: error: implicit declaration of function 'regmap_write_bits' [-Werror=implicit-function-declaration]
  ret = regmap_write_bits(config->regmap,
        ^

Caused by commit 23b92e4cf5fd ("regmap: remove regmap_write_bits()") from the sound-asoc & regmap trees.

I am not sure why this is suddenly exposed by the mfd tree, but grep would have been useful when the regmap tree patch was applied.

I have reverted that regmap commit for today.

--
Cheers,
Stephen Rothwell
[PATCH v4 3/8] fixdep: accept extra dependencies on stdin
... and merge them in the list of parsed dependencies. Signed-off-by: Nicolas Pitre --- scripts/basic/fixdep.c | 60 +- 1 file changed, 45 insertions(+), 15 deletions(-) diff --git a/scripts/basic/fixdep.c b/scripts/basic/fixdep.c index 5b327c67a8..d984deb120 100644 --- a/scripts/basic/fixdep.c +++ b/scripts/basic/fixdep.c @@ -120,13 +120,15 @@ #define INT_NFIG ntohl(0x4e464947) #define INT_FIG_ ntohl(0x4649475f) +int insert_extra_deps; char *target; char *depfile; char *cmdline; static void usage(void) { - fprintf(stderr, "Usage: fixdep \n"); + fprintf(stderr, "Usage: fixdep [-e] \n"); + fprintf(stderr, " -e insert extra dependencies given on stdin\n"); exit(1); } @@ -138,6 +140,40 @@ static void print_cmdline(void) printf("cmd_%s := %s\n\n", target, cmdline); } +/* + * Print out a dependency path from a symbol name + */ +static void print_config(const char *m, int slen) +{ + int c, i; + + printf("$(wildcard include/config/"); + for (i = 0; i < slen; i++) { + c = m[i]; + if (c == '_') + c = '/'; + else + c = tolower(c); + putchar(c); + } + printf(".h) \\\n"); +} + +static void do_extra_deps(void) +{ + if (insert_extra_deps) { + char buf[80]; + while(fgets(buf, sizeof(buf), stdin)) { + int len = strlen(buf); + if (len < 2 || buf[len-1] != '\n') { + fprintf(stderr, "fixdep: bad data on stdin\n"); + exit(1); + } + print_config(buf, len-1); + } + } +} + struct item { struct item *next; unsigned intlen; @@ -197,23 +233,12 @@ static void define_config(const char *name, int len, unsigned int hash) static void use_config(const char *m, int slen) { unsigned int hash = strhash(m, slen); - int c, i; if (is_defined_config(m, slen, hash)) return; define_config(m, slen, hash); - - printf("$(wildcard include/config/"); - for (i = 0; i < slen; i++) { - c = m[i]; - if (c == '_') - c = '/'; - else - c = tolower(c); - putchar(c); - } - printf(".h) \\\n"); + print_config(m, slen); } static void parse_config_file(const char *map, size_t len) @@ -250,7 +275,7 @@ static void 
parse_config_file(const char *map, size_t len) } } -/* test is s ends in sub */ +/* test if s ends in sub */ static int strrcmp(const char *s, const char *sub) { int slen = strlen(s); @@ -374,6 +399,8 @@ static void parse_dep_file(void *map, size_t len) exit(1); } + do_extra_deps(); + printf("\n%s: $(deps_%s)\n\n", target, target); printf("$(deps_%s):\n", target); } @@ -430,7 +457,10 @@ int main(int argc, char *argv[]) { traps(); - if (argc != 4) + if (argc == 5 && !strcmp(argv[1], "-e")) { + insert_extra_deps = 1; + argv++; + } else if (argc != 4) usage(); depfile = argv[1]; -- 2.5.0
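The name mapping done by print_config() in the patch above can be sketched as a standalone function. The character rule (underscore becomes a slash, everything else is lowercased) and the "include/config/" prefix and ".h" suffix are taken from the patch; the function name `config_dep_path` and the buffer interface are assumptions made for illustration. The real code prints a `$(wildcard ...)` line instead of filling a buffer.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: build "include/config/<path>.h" from a symbol name.
 * Returns 0 on success, -1 if the output buffer is too small.
 */
static int config_dep_path(const char *sym, size_t slen,
			   char *out, size_t outsz)
{
	static const char prefix[] = "include/config/";
	size_t plen = sizeof(prefix) - 1;
	size_t i;

	/* sizeof(".h") already accounts for the terminating NUL */
	if (outsz < plen + slen + sizeof(".h"))
		return -1;

	memcpy(out, prefix, plen);
	for (i = 0; i < slen; i++) {
		unsigned char c = (unsigned char)sym[i];

		/* '_' separates path components, the rest is lowercased */
		out[plen + i] = (c == '_') ? '/' : (char)tolower(c);
	}
	memcpy(out + plen + slen, ".h", sizeof(".h"));
	return 0;
}
```

Because of the lowercasing, two names that differ only in case map to the same path, which is the collision the series mentions for exported symbols.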
[PATCH v4 2/8] allow for per-symbol configurable EXPORT_SYMBOL()
Similar to include/generated/autoconf.h, include/generated/autoksyms.h will contain a list of defines for each EXPORT_SYMBOL() that we want active. The format is: #define __KSYM_ 1 This list will be auto-generated with another patch. For now we only include the preprocessor magic to automatically create or omit the corresponding struct kernel_symbol declaration. Given the content of include/generated/autoksyms.h may not be known in advance, an empty file is created early on to let the build proceed. Signed-off-by: Nicolas Pitre Acked-by: Rusty Russell --- Makefile | 2 ++ include/linux/export.h | 22 -- 2 files changed, 22 insertions(+), 2 deletions(-) diff --git a/Makefile b/Makefile index 6c1a3c2479..e916428cf7 100644 --- a/Makefile +++ b/Makefile @@ -986,6 +986,8 @@ prepare2: prepare3 outputmakefile asm-generic prepare1: prepare2 $(version_h) include/generated/utsrelease.h \ include/config/auto.conf $(cmd_crmodverdir) + $(Q)test -e include/generated/autoksyms.h || \ + touch include/generated/autoksyms.h archprepare: archheaders archscripts prepare1 scripts_basic diff --git a/include/linux/export.h b/include/linux/export.h index 96e45ea463..77afdb2a25 100644 --- a/include/linux/export.h +++ b/include/linux/export.h @@ -38,7 +38,7 @@ extern struct module __this_module; #ifdef CONFIG_MODULES -#ifndef __GENKSYMS__ +#if defined(__KERNEL__) && !defined(__GENKSYMS__) #ifdef CONFIG_MODVERSIONS /* Mark the CRC weak since genksyms apparently decides not to * generate a checksums for some symbols */ @@ -53,7 +53,7 @@ extern struct module __this_module; #endif /* For every exported symbol, place a struct in the __ksymtab section */ -#define __EXPORT_SYMBOL(sym, sec) \ +#define ___EXPORT_SYMBOL(sym, sec) \ extern typeof(sym) sym; \ __CRC_SYMBOL(sym, sec) \ static const char __kstrtab_##sym[] \ @@ -65,6 +65,24 @@ extern struct module __this_module; __attribute__((section("___ksymtab" sec "+" #sym), unused)) \ = { (unsigned long)&sym, __kstrtab_##sym } +#ifdef 
CONFIG_TRIM_UNUSED_KSYMS + +#include +#include + +#define __EXPORT_SYMBOL(sym, sec) \ + __cond_export_sym(sym, sec, config_enabled(__KSYM_##sym)) +#define __cond_export_sym(sym, sec, conf) \ + ___cond_export_sym(sym, sec, conf) +#define ___cond_export_sym(sym, sec, enabled) \ + __cond_export_sym_##enabled(sym, sec) +#define __cond_export_sym_1(sym, sec) ___EXPORT_SYMBOL(sym, sec) +#define __cond_export_sym_0(sym, sec) /* nothing */ + +#else +#define __EXPORT_SYMBOL ___EXPORT_SYMBOL +#endif + #define EXPORT_SYMBOL(sym) \ __EXPORT_SYMBOL(sym, "") -- 2.5.0
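The macro chain in this patch turns "is __KSYM_<sym> defined to 1?" into a literal 1 or 0 that can be token-pasted onto a macro name. A self-contained sketch of that idea follows. All `demo_`/`DEMO_` names are hypothetical, the enabled test is modeled on the placeholder-argument trick behind the kernel's config_enabled(), and a string table stands in for the struct kernel_symbol that the real ___EXPORT_SYMBOL() emits.

```c
#include <stddef.h>

/* Stand-in for include/generated/autoksyms.h: only demo_foo is wanted. */
#define DEMO_KSYM_demo_foo 1

/* Expands to 1 if the argument is a macro defined to 1, otherwise to 0. */
#define DEMO_PLACEHOLDER_1 0,
#define demo_enabled(cfg) demo_enabled_(cfg)
#define demo_enabled_(value) demo_enabled__(DEMO_PLACEHOLDER_##value)
#define demo_enabled__(arg1_or_junk) demo_second_arg(arg1_or_junk 1, 0, 0)
#define demo_second_arg(ignored, val, ...) val

/* Same layering as the patch: evaluate first, then paste the 1 or 0. */
#define DEMO_EXPORT(sym) \
	demo_cond_export(sym, demo_enabled(DEMO_KSYM_##sym))
#define demo_cond_export(sym, conf) demo_cond_export_(sym, conf)
#define demo_cond_export_(sym, enabled) demo_cond_export_##enabled(sym)
#define demo_cond_export_1(sym) #sym,
#define demo_cond_export_0(sym) /* nothing */

/* Each DEMO_EXPORT() either contributes one table entry or vanishes. */
static const char *const demo_ksymtab[] = {
	DEMO_EXPORT(demo_foo)
	DEMO_EXPORT(demo_bar)
	NULL	/* terminator, so the array is never empty */
};

/* Count the entries that survived preprocessing. */
static size_t demo_ksymtab_count(void)
{
	size_t n = 0;

	while (demo_ksymtab[n] != NULL)
		n++;
	return n;
}
```

The extra indirection through `demo_cond_export()` matters: it forces the enabled test to be fully expanded to 1 or 0 before the `##` paste happens.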
[PATCH v4 6/8] create/adjust generated/autoksyms.h
Given the list of exported symbols needed by all modules, we can create a header file containing preprocessor defines for each of those symbols. Also, when some symbols are added and/or removed from the list, we can update the timestamp of the corresponding files used as build dependencies for those symbols. And finally, if any symbol did change state, the corresponding source files must be rebuilt. The insertion or removal of an EXPORT_SYMBOL() entry within a module may create or remove the need for another exported symbol. This is why this operation has to be repeated until the list of needed exported symbols becomes stable. Only then do the final kernel and module links take place. Signed-off-by: Nicolas Pitre Acked-by: Rusty Russell --- Makefile| 13 ++ scripts/adjust_autoksyms.sh | 97 + 2 files changed, 110 insertions(+) create mode 100755 scripts/adjust_autoksyms.sh diff --git a/Makefile b/Makefile index e916428cf7..bb865095ca 100644 --- a/Makefile +++ b/Makefile @@ -921,6 +921,10 @@ quiet_cmd_link-vmlinux = LINK$@ # Include targets which we want to # execute if the rest of the kernel build went well. 
vmlinux: scripts/link-vmlinux.sh $(vmlinux-deps) FORCE +ifdef CONFIG_TRIM_UNUSED_KSYMS + $(Q)$(CONFIG_SHELL) scripts/adjust_autoksyms.sh \ + "$(MAKE) KBUILD_MODULES=1 -f $(srctree)/Makefile autoksyms_recursive" +endif ifdef CONFIG_HEADERS_CHECK $(Q)$(MAKE) -f $(srctree)/Makefile headers_check endif @@ -935,6 +939,15 @@ ifdef CONFIG_GDB_SCRIPTS endif +$(call if_changed,link-vmlinux) +autoksyms_recursive: $(vmlinux-deps) + $(Q)$(CONFIG_SHELL) scripts/adjust_autoksyms.sh \ + "$(MAKE) KBUILD_MODULES=1 -f $(srctree)/Makefile autoksyms_recursive" +PHONY += autoksyms_recursive + +# standalone target for easier testing +include/generated/autoksyms.h: FORCE + $(Q)$(CONFIG_SHELL) scripts/adjust_autoksyms.sh true + # The actual objects are generated when descending, # make sure no implicit rule kicks in $(sort $(vmlinux-deps)): $(vmlinux-dirs) ; diff --git a/scripts/adjust_autoksyms.sh b/scripts/adjust_autoksyms.sh new file mode 100755 index 00..a145a24cd8 --- /dev/null +++ b/scripts/adjust_autoksyms.sh @@ -0,0 +1,97 @@ +#!/bin/sh + +# Script to create/update include/generated/autoksyms.h and dependency files +# +# Copyright: (C) 2016 Linaro Limited +# Created by: Nicolas Pitre, January 2016 +# +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License version 2 as +# published by the Free Software Foundation. + +# Create/update the include/generated/autoksyms.h file from the list +# of all module's needed symbols as recorded on the third line of +# .tmp_versions/*.mod files. +# +# For each symbol being added or removed, the corresponding dependency +# file's timestamp is updated to force a rebuild of the affected source +# file. All arguments passed to this script are assumed to be a command +# to be exec'd to trigger a rebuild of those files. 
+ +set -e + +cur_ksyms_file="include/generated/autoksyms.h" +new_ksyms_file="include/generated/autoksyms.h.tmpnew" + +info() { [ "$quiet" != "silent_" ] && printf " %-7s %s\n" "$1" "$2"; } + +info "CHK" "$cur_ksyms_file" + +# Use "make V=1" to debug this script. +case "$KBUILD_VERBOSE" in +*1*) + set -x + ;; +esac + +# We need access to CONFIG_ symbols +case "${KCONFIG_CONFIG}" in +*/*) + . "${KCONFIG_CONFIG}" + ;; +*) + # Force using a file from the current directory + . "./${KCONFIG_CONFIG}" +esac + +# In case it doesn't exist yet... +[ -e "$cur_ksyms_file" ] || touch "$cur_ksyms_file" + +# Generate a new ksym list file with symbols needed by the current +# set of modules. +cat > "$new_ksyms_file" << EOT +/* + * Automatically generated file; DO NOT EDIT. + */ + +EOT +sed -ns -e '3s/ /\n/gp' "$MODVERDIR"/*.mod | sort -u | +while read sym; do + [ -n "$CONFIG_HAVE_UNDERSCORE_SYMBOL_PREFIX" ] && sym="${sym#_}" + echo "#define __KSYM_${sym} 1" +done >> "$new_ksyms_file" + +# Special case for modversions (see modpost.c) +if [ -n "$CONFIG_MODVERSIONS" ]; then + echo "#define __KSYM_module_layout 1" >> "$new_ksyms_file" +fi + +# Extract changes between old and new list and touch corresponding +# dependency files. +# Note: sort -m doesn't work well with underscore prefixed symbols so we +# use 'cat ... | sort' instead. +changed=$( +count=0 +cat "$cur_ksyms_file" "$new_ksyms_file" | sort | uniq -u | +sed -n 's/^#define __KSYM_\(.*\) 1/\1/p' | tr "A-Z_" "a-z/" | +while read sympath; do + [ -z "$sympath" ] && continue + depfile="include/config/ksym/${sympath}.h" + mkdir -p "$(dirname "$depfile")" + touch "$depfile" + echo $((count += 1)) +done | tail -1 ) +changed=${changed:-0} + +if [ $changed -gt 0 ]; then + # Replace the old list with tne new one + old=$(grep -c "^#define __
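The "cat old new | sort | uniq -u" step in the script yields the symbols present in exactly one of the two lists, i.e. those added or removed since the last pass. A minimal sketch of the same set operation in C follows. The function name `ksym_changes` is hypothetical, and it assumes both inputs are already sorted and free of duplicates, whereas the script works on the raw `#define` lines.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: store in out[] the names found in exactly one of
 * two sorted, duplicate-free lists and return how many there are. At
 * most out_cap names are stored, but all of them are counted.
 */
static size_t ksym_changes(const char *const *old_syms, size_t n_old,
			   const char *const *new_syms, size_t n_new,
			   const char **out, size_t out_cap)
{
	size_t i = 0, j = 0, n = 0;

	while (i < n_old || j < n_new) {
		const char *changed;
		int cmp;

		if (i == n_old)
			cmp = 1;	/* only new entries remain */
		else if (j == n_new)
			cmp = -1;	/* only old entries remain */
		else
			cmp = strcmp(old_syms[i], new_syms[j]);

		if (cmp == 0) {		/* present in both: unchanged */
			i++;
			j++;
			continue;
		}
		changed = (cmp < 0) ? old_syms[i++] : new_syms[j++];
		if (n < out_cap)
			out[n] = changed;
		n++;
	}
	return n;
}
```

A non-zero return corresponds to the script's `changed` counter being greater than zero, which is what triggers another rebuild pass.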
[PATCH v4 4/8] kbuild: de-duplicate fixdep usage
The generation and postprocessing of automatic dependency rules is duplicated in rule_cc_o_c and if_changed_dep. Since this is not a trivial one-liner action, it is now abstracted under cmd_and_fixdep to simplify things and make future changes easier. In the rule_cc_o_c case that means the order of some commands has been altered, namely fixdep and related file manipulations are executed earlier, but they didn't depend on those commands that now execute later. Signed-off-by: Nicolas Pitre --- scripts/Kbuild.include | 5 - scripts/Makefile.build | 9 ++--- 2 files changed, 6 insertions(+), 8 deletions(-) diff --git a/scripts/Kbuild.include b/scripts/Kbuild.include index 1db6d73c8d..8a257fa663 100644 --- a/scripts/Kbuild.include +++ b/scripts/Kbuild.include @@ -256,10 +256,13 @@ if_changed = $(if $(strip $(any-prereq) $(arg-check)), \ # Execute the command and also postprocess generated .d dependencies file. if_changed_dep = $(if $(strip $(any-prereq) $(arg-check) ), \ @set -e; \ + $(cmd_and_fixdep)) + +cmd_and_fixdep = \ $(echo-cmd) $(cmd_$(1)); \ scripts/basic/fixdep $(depfile) $@ '$(make-cmd)' > $(dot-target).tmp;\ rm -f $(depfile);\ - mv -f $(dot-target).tmp $(dot-target).cmd) + mv -f $(dot-target).tmp $(dot-target).cmd; # Usage: $(call if_changed_rule,foo) # Will check if $(cmd_foo) or any of the prerequisites changed, diff --git a/scripts/Makefile.build b/scripts/Makefile.build index f4b4320e0d..8134ee81ad 100644 --- a/scripts/Makefile.build +++ b/scripts/Makefile.build @@ -243,14 +243,9 @@ endif define rule_cc_o_c $(call echo-cmd,checksrc) $(cmd_checksrc) \ - $(call echo-cmd,cc_o_c) $(cmd_cc_o_c);\ + $(call cmd_and_fixdep,cc_o_c) \ $(cmd_modversions)\ - $(call echo-cmd,record_mcount)\ - $(cmd_record_mcount) \ - scripts/basic/fixdep $(depfile) $@ '$(call make-cmd,cc_o_c)' >\ - $(dot-target).tmp; \ - rm -f $(depfile); \ - mv -f $(dot-target).tmp $(dot-target).cmd + $(call echo-cmd,record_mcount) $(cmd_record_mcount) endef # List module undefined symbols (or empty 
line if not enabled) -- 2.5.0
[PATCH v4 5/8] kbuild: add fine grained build dependencies for exported symbols
Like with kconfig options, we now have the ability to compile in and out individual EXPORT_SYMBOL() declarations based on the content of include/generated/autoksyms.h. However we don't want the entire world to be rebuilt whenever that file is touched. Let's apply the same build dependency trick used for CONFIG_* symbols, where the time stamp of empty files whose paths match those symbols is used to trigger fine grained rebuilds. In our case the key is the symbol name passed to EXPORT_SYMBOL(). However, unlike config options, we cannot just use fixdep to parse the source code for EXPORT_SYMBOL(ksym) because several variants exist and parsing them all in a separate tool, and keeping it in sync, is not trivially maintainable. Furthermore, there are variants such as EXPORT_SYMBOL_GPL(pci_user_read_config_##size); that are instantiated via a macro for which we can't easily determine the actual exported symbol name(s) short of actually running the preprocessor on them. Storing the symbol name string in a special ELF section doesn't work for targets that output assembly or preprocessed source. So the best way is really to leverage the preprocessor by having it emit a warning for each EXPORT_SYMBOL() instance and having the build system filter those out of stderr. Then the list of symbols is simply fed to fixdep to be merged with the other dependencies. Because of the lowercasing performed by fixdep, there might be name collisions triggering spurious rebuilds for similar symbols. But this shouldn't be a big issue in practice. (This is the case for CONFIG_* symbols and I didn't want to be different here, whatever the original reason for doing so.) To avoid needless build overhead, the exported symbol name gathering is performed only when CONFIG_TRIM_UNUSED_KSYMS is selected. 
Signed-off-by: Nicolas Pitre Acked-by: Rusty Russell --- include/linux/export.h | 16 ++-- scripts/Kbuild.include | 28 scripts/basic/fixdep.c | 1 + 3 files changed, 43 insertions(+), 2 deletions(-) diff --git a/include/linux/export.h b/include/linux/export.h index 77afdb2a25..794392102d 100644 --- a/include/linux/export.h +++ b/include/linux/export.h @@ -76,8 +76,20 @@ extern struct module __this_module; ___cond_export_sym(sym, sec, conf) #define ___cond_export_sym(sym, sec, enabled) \ __cond_export_sym_##enabled(sym, sec) -#define __cond_export_sym_1(sym, sec) ___EXPORT_SYMBOL(sym, sec) -#define __cond_export_sym_0(sym, sec) /* nothing */ +#define __cond_export_sym_1(sym, sec) \ + __KSYM_DEP(sym) ___EXPORT_SYMBOL(sym, sec) +#define __cond_export_sym_0(sym, sec) \ + __KSYM_DEP(sym) /* nothing */ + +/* + * For fine grained build dependencies, we want to tell the build system + * about each possible exported symbol even if they're not actually exported. + * This is accomplished with a preprocessor warning that gets captured by + * the make rule (see ksym_dep_filter in scripts/Kbuild.include). + */ +#define __KSYM_DEP(sym) __pragma_string( KBUILD_AUTOKSYM_DEP: sym ) +#define __pragma_string(x) __emit_pragma( GCC warning #x ) +#define __emit_pragma(x) _Pragma(#x) #else #define __EXPORT_SYMBOL ___EXPORT_SYMBOL diff --git a/scripts/Kbuild.include b/scripts/Kbuild.include index 8a257fa663..0b69479310 100644 --- a/scripts/Kbuild.include +++ b/scripts/Kbuild.include @@ -258,12 +258,40 @@ if_changed_dep = $(if $(strip $(any-prereq) $(arg-check) ), \ @set -e; \ $(cmd_and_fixdep)) +ifndef CONFIG_TRIM_UNUSED_KSYMS + cmd_and_fixdep = \ $(echo-cmd) $(cmd_$(1)); \ scripts/basic/fixdep $(depfile) $@ '$(make-cmd)' > $(dot-target).tmp;\ rm -f $(depfile);\ mv -f $(dot-target).tmp $(dot-target).cmd; +else + +# Filter out exported kernel symbol names advertised as warning pragmas +# by the preprocessor and write them to $(1). 
We must consider continuation +# lines as well: they start with a blank, or the preceeding line ends with +# a ':'. Anything else is passed through as is. +# See also __KSYM_DEP() in include/linux/export.h. +ksym_dep_filter = sed -n \ + -e '1 {x; $$!d}' \ + -e '/^ / {H; $$!d}' \ + -e 'x; /:$$/ {x; H; $$!d; s/^/ /; x}' \ + -e ':filter; /^.*KBUILD_AUTOKSYM_DEP: /! {p; b next}' \ + -e 's//KSYM_/; s/\n.*//; w $(1)' \ + -e ':next; $$!d' \ + -e '1 q; s/^/ /; x; /^ /! b filter' + +cmd_and_fixdep = \ + $(echo-cmd) \ + $(cmd_$(1)) 2>&1 | $(call ksym_dep_filter,$(dot-target).ksym.tmp) >&2;\ + scripts/basic/fixdep -e $(depfile) $@
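The core of the ksym_dep_filter step, turning one compiler diagnostic into a "KSYM_<sym>" line for fixdep, can be sketched in C. The marker string comes from the patch. The function name `ksym_dep_from_line` is hypothetical, the example warning format is an assumption about the compiler's output, and the continuation-line handling that the sed script performs is deliberately left out.

```c
#include <stddef.h>
#include <string.h>

#define KSYM_DEP_MARKER "KBUILD_AUTOKSYM_DEP: "

/*
 * Hypothetical helper: if the diagnostic line carries the marker, write
 * "KSYM_<sym>" to out and return 1. Return 0 for any other line (the
 * caller would pass it through to stderr) and -1 if out is too small or
 * no symbol follows the marker.
 */
static int ksym_dep_from_line(const char *line, char *out, size_t outsz)
{
	const char *p = strstr(line, KSYM_DEP_MARKER);
	size_t len;

	if (!p)
		return 0;

	p += sizeof(KSYM_DEP_MARKER) - 1;
	len = strcspn(p, " \t\r\n");	/* symbol ends at first blank */
	if (len == 0 || sizeof("KSYM_") + len > outsz)
		return -1;

	memcpy(out, "KSYM_", 5);
	memcpy(out + 5, p, len);
	out[5 + len] = '\0';
	return 1;
}
```

The "KSYM_" prefix is what makes fixdep's path mapping produce a file under include/config/ksym/, matching the dependency files touched by adjust_autoksyms.sh.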
[PATCH v4 0/8] [PULL REQUEST] Trim unused exported kernel symbols
This patch series provides the option to omit exported symbols from the kernel and modules that are never referenced by any of the selected modules in the current kernel configuration. This allows for optimizing the compiled code and reducing final binaries' size. When using LTO the binary size reduction is even more effective. It could also be argued that this could bring some security advantages.

The original cover letter with lots of test results can be found here:

https://lkml.org/lkml/2016/2/8/813

Please consider for merging into your tree. Alternately, the following branch can be merged:

http://git.linaro.org/people/nicolas.pitre/linux.git autoksyms

Thanks.

Changes from v3:

- Shell portability changes to adjust_autoksyms.sh, partly from suggestions by Zev Weiss.
- Fix sample modules by building them before adjust_autoksyms.sh is run.

Changes from v2:

- Generating the build dependencies by parsing the source with fixdep turned out to be unreliable due to all the EXPORT_SYMBOL() variants, and especially their use within macros where the actual symbol name is known only after running the preprocessor. This list of symbol names is now obtained from the preprocessor directly, fixing allmodconfig builds.

Changes from v1:

- Replaced "exp" that doesn't convey the right meaning as noted by Sam Ravnborg. The "ksym" identifier is actually what the kernel already uses for this. Therefore:
  - CONFIG_TRIM_UNUSED_EXPSYMS --> CONFIG_TRIM_UNUSED_KSYMS
  - include/generated/expsyms.h --> include/generated/autoksyms.h
  - #define __EXPSYM_* --> #define __KSYM_*
- Some sed regexp improvements as suggested by Al Viro.
- Renamed vmlinux_recursive target to autoksyms_recursive.
- Accept EXPORT_SYMBOL variants with a prefix, e.g. ACPI_EXPORT_SYMBOL.
- Minor commit log clarifications.
- Added Rusty's ACK.
diffstat:

 Makefile                    | 23 +++--
 include/linux/export.h      | 34 -
 init/Kconfig                | 16 ++
 scripts/Kbuild.include      | 33 -
 scripts/Makefile.build      | 22 +
 scripts/adjust_autoksyms.sh | 97 +
 scripts/basic/fixdep.c      | 61 +--
 7 files changed, 256 insertions(+), 30 deletions(-)
[PATCH v4 7/8] kbuild: build sample modules along with the rest of the kernel
Make sample modules in parallel with the rest of the kernel rather than having them built from the vmlinux target. This makes the build slightly faster, and those modules are properly considered when adjust_autoksyms.sh is executed. Signed-off-by: Nicolas Pitre --- Makefile | 8 +--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/Makefile b/Makefile index bb865095ca..f5daa4bbf3 100644 --- a/Makefile +++ b/Makefile @@ -928,9 +928,6 @@ endif ifdef CONFIG_HEADERS_CHECK $(Q)$(MAKE) -f $(srctree)/Makefile headers_check endif -ifdef CONFIG_SAMPLES - $(Q)$(MAKE) $(build)=samples -endif ifdef CONFIG_BUILD_DOCSRC $(Q)$(MAKE) $(build)=Documentation endif @@ -948,6 +945,11 @@ PHONY += autoksyms_recursive include/generated/autoksyms.h: FORCE $(Q)$(CONFIG_SHELL) scripts/adjust_autoksyms.sh true +# Build samples along the rest of the kernel +ifdef CONFIG_SAMPLES +vmlinux-dirs += samples +endif + # The actual objects are generated when descending, # make sure no implicit rule kicks in $(sort $(vmlinux-deps)): $(vmlinux-dirs) ; -- 2.5.0
[PATCH v4 8/8] kconfig option for TRIM_UNUSED_KSYMS
The config option to enable it all. Signed-off-by: Nicolas Pitre Acked-by: Rusty Russell --- init/Kconfig | 16 1 file changed, 16 insertions(+) diff --git a/init/Kconfig b/init/Kconfig index 22320804fb..e6f666331b 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -1990,6 +1990,22 @@ config MODULE_COMPRESS_XZ endchoice +config TRIM_UNUSED_KSYMS + bool "Trim unused exported kernel symbols" + depends on MODULES && !UNUSED_SYMBOLS + help + The kernel and some modules make many symbols available for + other modules to use via EXPORT_SYMBOL() and variants. Depending + on the set of modules being selected in your kernel configuration, + many of those exported symbols might never be used. + + This option allows for unused exported symbols to be dropped from + the build. In turn, this provides the compiler more opportunities + (especially when using LTO) for optimizing the code and reducing + binary size. This might have some security advantages as well. + + If unsure say N. + endif # MODULES config MODULES_TREE_LOOKUP -- 2.5.0
[PATCH v4 1/8] kbuild: record needed exported symbols for modules
Kernel modules are partially linked object files with some undefined symbols that are expected to be matched with EXPORT_SYMBOL() entries from elsewhere. Each .tmp_versions/*.mod file currently contains two lines of text separated by a newline character. The first line has the actual module file name while the second line has a list of object files constituting that module. Those files are parsed by modpost (scripts/mod/sumversion.c), scripts/Makefile.modpost, scripts/Makefile.modsign, etc. Only the modpost utility cares about the second line while the others retrieve only the first line. Therefore we can add a third line to that file, recording the list of undefined symbols (i.e. the required EXPORT_SYMBOL() entries) for each module, without breaking anything. As with the second line, symbols are separated by a blank and the list is terminated with a newline character. To avoid needless build overhead, the undefined symbols extraction is performed only when CONFIG_TRIM_UNUSED_KSYMS is selected. 
Signed-off-by: Nicolas Pitre Acked-by: Rusty Russell --- scripts/Makefile.build | 13 +++-- 1 file changed, 11 insertions(+), 2 deletions(-) diff --git a/scripts/Makefile.build b/scripts/Makefile.build index 2c47f9c305..f4b4320e0d 100644 --- a/scripts/Makefile.build +++ b/scripts/Makefile.build @@ -253,6 +253,13 @@ define rule_cc_o_c mv -f $(dot-target).tmp $(dot-target).cmd endef +# List module undefined symbols (or empty line if not enabled) +ifdef CONFIG_TRIM_UNUSED_KSYMS +cmd_undef_syms = $(NM) $@ | sed -n 's/^ \+U //p' | xargs echo +else +cmd_undef_syms = echo +endif + # Built-in and composite module parts $(obj)/%.o: $(src)/%.c $(recordmcount_source) FORCE $(call cmd,force_checksrc) @@ -263,7 +270,8 @@ $(obj)/%.o: $(src)/%.c $(recordmcount_source) FORCE $(single-used-m): $(obj)/%.o: $(src)/%.c $(recordmcount_source) FORCE $(call cmd,force_checksrc) $(call if_changed_rule,cc_o_c) - @{ echo $(@:.o=.ko); echo $@; } > $(MODVERDIR)/$(@F:.o=.mod) + @{ echo $(@:.o=.ko); echo $@; \ + $(cmd_undef_syms); } > $(MODVERDIR)/$(@F:.o=.mod) quiet_cmd_cc_lst_c = MKLST $@ cmd_cc_lst_c = $(CC) $(c_flags) -g -c -o $*.o $< && \ @@ -393,7 +401,8 @@ $(call multi_depend, $(multi-used-y), .o, -objs -y) $(multi-used-m): FORCE $(call if_changed,link_multi-m) - @{ echo $(@:.o=.ko); echo $(link_multi_deps); } > $(MODVERDIR)/$(@F:.o=.mod) + @{ echo $(@:.o=.ko); echo $(link_multi_deps); \ + $(cmd_undef_syms); } > $(MODVERDIR)/$(@F:.o=.mod) $(call multi_depend, $(multi-used-m), .o, -objs -y -m) targets += $(multi-used-y) $(multi-used-m) -- 2.5.0
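What the `$(NM) $@ | sed -n 's/^ \+U //p' | xargs echo` pipeline produces for the third line of a .mod file can be modeled in C. The function names below are hypothetical and the sample nm lines in the usage note are illustrative; the matching rule (leading blanks, then "U ", then the symbol) follows the sed expression in the patch.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return the symbol name if the nm output line
 * describes an undefined symbol (blanks, then "U "), else NULL.
 */
static const char *nm_undefined_symbol(const char *line)
{
	const char *p = line;

	if (*p != ' ')
		return NULL;	/* defined symbols start with an address */
	while (*p == ' ')
		p++;
	if (p[0] != 'U' || p[1] != ' ')
		return NULL;
	return p + 2;
}

/*
 * Hypothetical helper: join the undefined symbols with single blanks and
 * terminate the list with a newline, like the third .mod line. With no
 * undefined symbols the result is an empty line. Returns 0 on success,
 * -1 if out is too small.
 */
static int undef_syms_line(const char *const *nm_lines, size_t n,
			   char *out, size_t outsz)
{
	size_t pos = 0, i;

	for (i = 0; i < n; i++) {
		const char *sym = nm_undefined_symbol(nm_lines[i]);
		size_t len;

		if (!sym)
			continue;
		len = strlen(sym);
		/* room for separator, symbol, final newline and NUL */
		if (pos + 1 + len + 2 > outsz)
			return -1;
		if (pos > 0)
			out[pos++] = ' ';
		memcpy(out + pos, sym, len);
		pos += len;
	}
	if (pos + 2 > outsz)
		return -1;
	out[pos++] = '\n';
	out[pos] = '\0';
	return 0;
}
```

Feeding it one defined and two undefined entries yields a single line such as "printk kfree" followed by a newline, which is the format adjust_autoksyms.sh later splits on blanks.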
Re: multipath: I/O hanging forever
On Sun, Feb 28, 2016 at 06:53:33PM -0700, Andrea Righi wrote: ... > I'm using 4.5.0-rc5+, from Linus' git. I'll try to do a git bisect > later, I'm pretty sure this problem has been introduced recently (i.e., > I've never seen this issue with 4.1.x). I confirm, just tested kernel 4.1 and this problem doesn't happen. Thanks, -Andrea
RE: [PATCH v2 3/4] mtd:spi-nor:fsl-quadspi:Add fast-read mode support
Hi Han,

But I don't think QuadSPI driver need to check the m25p,fast-read property again since spi-nor layer has already done that. Adding the property in flash node should work in the same way.

[Yunhui]: There are three modes in the fsl-quadspi driver: fast mode, quad mode, and DDR quad read. I have to specify the last parameter, mode, of spi_nor_scan(); otherwise the flash is still set to quad mode.

spi-nor.c:

	if (mode == SPI_NOR_QUAD && info->flags & SPI_NOR_QUAD_READ) {
		ret = set_quad_mode(nor, info);
		if (ret) {
			dev_err(dev, "quad mode not supported\n");
			return ret;
		}
		nor->flash_read = SPI_NOR_QUAD;
	} else if (mode == SPI_NOR_DUAL && info->flags & SPI_NOR_DUAL_READ) {
		nor->flash_read = SPI_NOR_DUAL;
	}

Thanks
Yunhui

-Original Message-
From: Han Xu [mailto:xhnj...@gmail.com]
Sent: Saturday, February 27, 2016 12:32 AM
To: Yunhui Cui
Cc: Yunhui Cui; dw...@infradead.org; computersforpe...@gmail.com; han...@freescale.com; linux-...@lists.infradead.org; linux-kernel@vger.kernel.org; linux-arm-ker...@lists.infradead.org; Yao Yuan
Subject: Re: [PATCH v2 3/4] mtd:spi-nor:fsl-quadspi:Add fast-read mode support

On Thu, Feb 25, 2016 at 08:07:22AM +, Yunhui Cui wrote:
> Hi Han,
>
> I have provided the options " m25p,fast-read ", because there are probable
> some flashes can't support quad mode.
> So we should support fast-read mode in our driver. Moreover, There is a
> option to select fast-read mode in spi_nor.c :
>/* If we were instantiated by DT, use it */
> if (of_property_read_bool(np, "m25p,fast-read"))
> nor->flash_read = SPI_NOR_FAST;

Did you have some REAL cases using SPI NOR that only supports upto fast-read with Quad SPI driver? Neither fast-read or normal-read, which is actually more general, supported in the driver, just because I didn't see any REAL cases till now. I didn't run against the patch, although IMO it's not that necessary. 
But I don't think QuadSPI driver need to check the m25p,fast-read property again since spi-nor layer has already done that. Adding the property in flash node should work in the same way. > > Thanks > Yunhui > > -Original Message- > From: Han Xu [mailto:xhnj...@gmail.com] > Sent: Thursday, February 18, 2016 2:08 AM > To: Yunhui Cui > Cc: dw...@infradead.org; computersforpe...@gmail.com; > han...@freescale.com; linux-...@lists.infradead.org; > linux-kernel@vger.kernel.org; linux-arm-ker...@lists.infradead.org; > Yao Yuan > Subject: Re: [PATCH v2 3/4] mtd:spi-nor:fsl-quadspi:Add fast-read mode > support > > On Mon, Feb 01, 2016 at 07:30:07PM +0800, Yunhui Cui wrote: > > The qspi driver add generic fast-read mode for different flash > > venders. There are some different board flash work on different > > mode, such fast-read, quad-mode. > > So we have to modify the third entrace parameter of spi_nor_scan(). > > > > Signed-off-by: Yunhui Cui > > --- > > drivers/mtd/spi-nor/fsl-quadspi.c | 27 +-- > > 1 file changed, 21 insertions(+), 6 deletions(-) > > > > diff --git a/drivers/mtd/spi-nor/fsl-quadspi.c > > b/drivers/mtd/spi-nor/fsl-quadspi.c > > index 9861290..0a31cb1 100644 > > --- a/drivers/mtd/spi-nor/fsl-quadspi.c > > +++ b/drivers/mtd/spi-nor/fsl-quadspi.c > > @@ -389,11 +389,21 @@ static void fsl_qspi_init_lut(struct fsl_qspi *q) > > /* Read */ > > lut_base = SEQID_READ * 4; > > > > - qspi_writel(q, LUT0(CMD, PAD1, read_op) | LUT1(ADDR, PAD1, addrlen), > > - base + QUADSPI_LUT(lut_base)); > > - qspi_writel(q, LUT0(DUMMY, PAD1, read_dm) | > > - LUT1(FSL_READ, PAD4, rxfifo), > > - base + QUADSPI_LUT(lut_base + 1)); > > + if (nor->flash_read == SPI_NOR_FAST) { > > + qspi_writel(q, LUT0(CMD, PAD1, read_op) | > > + LUT1(ADDR, PAD1, addrlen), > > + base + QUADSPI_LUT(lut_base)); > > + qspi_writel(q, LUT0(DUMMY, PAD1, read_dm) | > > + LUT1(FSL_READ, PAD1, rxfifo), > > + base + QUADSPI_LUT(lut_base + 1)); > > + } else if (nor->flash_read == SPI_NOR_QUAD) { > > + 
qspi_writel(q, LUT0(CMD, PAD1, read_op) | > > + LUT1(ADDR, PAD1, addrlen), > > + base + QUADSPI_LUT(lut_base)); > > + qspi_writel(q, LUT0(DUMMY, PAD1, read_dm) | > > + LUT1(FSL_READ, PAD4, rxfifo), > > + base + QUADSPI_LUT(lut_base + 1)); > > + } > > > > /* Write enable */ > > lut_base = SEQID_WREN * 4; > > @@ -468,6 +478,7 @@ static int fsl_qspi_get_seqid(struct fsl_qspi > > *q, > > u8 cmd) { > > switch (cmd) { > > case SPINOR_OP_READ_1_1_4: > > + case SPINOR_OP_READ_FAST:
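The difference between the two read sequences in the quoted hunk is only the pad count of the data phase: the command and address go out on one pad in both cases, while the data is read on one pad for fast-read and on four pads for quad. A small model of that choice follows. The enum, struct, and function names are hypothetical and do not correspond to the driver's LUT macros.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the two read modes discussed above. */
enum demo_read_mode { DEMO_READ_FAST, DEMO_READ_QUAD };

/* Number of data lines used in each phase of a read sequence. */
struct demo_read_seq {
	int cmd_pads;
	int addr_pads;
	int data_pads;
};

/* Fill in the sequence for a mode; returns 0, or -1 if unsupported. */
static int demo_read_seq_for(enum demo_read_mode mode,
			     struct demo_read_seq *seq)
{
	/* Command and address use a single pad in both sequences. */
	seq->cmd_pads = 1;
	seq->addr_pads = 1;

	switch (mode) {
	case DEMO_READ_FAST:
		seq->data_pads = 1;	/* corresponds to PAD1 in the hunk */
		return 0;
	case DEMO_READ_QUAD:
		seq->data_pads = 4;	/* corresponds to PAD4 in the hunk */
		return 0;
	}
	return -1;
}
```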
Re: [PATCH] bus: imx-weim: Take the 'status' property value into account
On Mon, Feb 22, 2016 at 09:01:53AM -0300, Fabio Estevam wrote: > From: Fabio Estevam > > Currently we have an incorrect behaviour when multiple devices > are present under the weim node. For example: > > &weim { > ... > status = "okay"; > > sram@0,0 { > ... > status = "okay"; > }; > > mram@0,0 { > ... > status = "disabled"; > }; > }; > > In this case only the 'sram' device should be probed and not 'mram'. > > However what happens currently is that the status variable is ignored, > causing the 'sram' device to be disabled and 'mram' to be enabled. > > Change the weim_parse_dt() function to use > for_each_available_child_of_node()so that the devices marked with > 'status = disabled' are not probed. > > Cc: > Suggested-by: Wolfgang Netbal > Signed-off-by: Fabio Estevam Acked-by: Shawn Guo Arnd, Olof, I do not have any other 'driver' patches queued, so please help directly apply this one. Considering this fixes a real problem, it would be good if we can merge this through -rc. But we understand that it's -rc6 now, and this doesn't fix a regression or so-critical issue, so it should be fine to queue the patch for the next release as well. Shawn > --- > drivers/bus/imx-weim.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/bus/imx-weim.c b/drivers/bus/imx-weim.c > index e98d15e..1827fc4 100644 > --- a/drivers/bus/imx-weim.c > +++ b/drivers/bus/imx-weim.c > @@ -150,7 +150,7 @@ static int __init weim_parse_dt(struct platform_device > *pdev, > return ret; > } > > - for_each_child_of_node(pdev->dev.of_node, child) { > + for_each_available_child_of_node(pdev->dev.of_node, child) { > if (!child->name) > continue; > > -- > 1.9.1 > >
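The effect of switching to for_each_available_child_of_node() can be modeled in userspace. The sketch below assumes the usual device tree convention that a node is available when its status property is absent, "okay", or "ok". The struct and function names are hypothetical and nothing here touches the real OF API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a child node under the weim bus node. */
struct demo_node {
	const char *name;
	const char *status;	/* NULL when the property is absent */
};

/* A node counts as available unless its status says otherwise. */
static bool demo_node_available(const struct demo_node *np)
{
	if (!np->status)
		return true;
	return strcmp(np->status, "okay") == 0 ||
	       strcmp(np->status, "ok") == 0;
}

/* Count the children that an availability-aware loop would visit. */
static size_t demo_count_probed(const struct demo_node *children, size_t n)
{
	size_t i, probed = 0;

	for (i = 0; i < n; i++) {
		if (!demo_node_available(&children[i]))
			continue;	/* status = "disabled": skip */
		probed++;
	}
	return probed;
}
```

With the sram/mram example from the commit message, only sram is counted.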
Re: [PATCH 2/5] oom reaper: handle mlocked pages
On Tue, 23 Feb 2016, Michal Hocko wrote: > On Mon 22-02-16 17:36:07, David Rientjes wrote: > > > > Are we concerned about munlock_vma_pages_all() taking lock_page() and > > perhaps stalling forever, the same way it would stall in exit_mmap() for > > VM_LOCKED vmas, if another thread has locked the same page and is doing an > > allocation? > > This is a good question. I have checked for that particular case > previously and managed to convince myself that this is OK(ish). > munlock_vma_pages_range locks only THP pages to prevent from the > parallel split-up AFAICS. I think you're mistaken on that: there is also the lock_page() on every page in Phase 2 of __munlock_pagevec(). > And split_huge_page_to_list doesn't seem > to depend on an allocation. It can block on anon_vma lock but I didn't > see any allocation requests from there either. I might be missing > something of course. Do you have any specific path in mind? > > > I'm wondering if in that case it would be better to do a > > best-effort munlock_vma_pages_all() with trylock_page() and just give up > > on releasing memory from that particular vma. In that case, there may be > > other memory that can be freed with unmap_page_range() that would handle > > this livelock. I agree with David, that we ought to trylock_page() throughout munlock: just so long as it gets to do the TestClearPageMlocked without demanding page lock, the rest is the usual sugarcoating for accurate Mlocked stats, and leave the rest for reclaim to fix up. > > I have tried to code it up but I am not really sure the whole churn is > really worth it - unless I am missing something that would really make > the THP case likely to hit in the real life. Though I must have known about it forever, it was a shock to see all those page locks demanded in exit, brought home to us a week or so ago. 
The proximate cause in this case was my own change, to defer pte_alloc to suit huge tmpfs: it had not previously occurred to me that I was now doing the pte_alloc while __do_fault holds page lock. Bad Hugh. But change not yet upstream, so not so urgent for you. From time immemorial, free_swap_and_cache() and free_swap_cache() only ever trylock a page, precisely so that they never hold up munmap or exit (well, if I looked harder, I might find lock ordering reasons too). > > Just for the reference this is what I came up with (just compile tested). I tried something similar internally (on an earlier kernel). Like you I've set that work aside for now, there were quicker ways to fix the issue at hand. But it does continue to offend me that munlock demands all those page locks: so if you don't get back to it before me, I shall eventually. I didn't understand why you complicated yours with the "enforce" arg to munlock_vma_pages_range(): why not just trylock in all cases? Hugh
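The "trylock in all cases" idea discussed in this reply can be illustrated with a small user-space model. This is a sketch under loud assumptions, not the mm code: struct fake_page and every fake_* helper are hypothetical, and the boolean flags merely stand in for the page lock and the Mlocked flag. The point it tries to show is the ordering: the mlocked flag is cleared without taking the page lock, and the remaining bookkeeping is skipped (left for reclaim to fix up) whenever the lock is not immediately available.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a page; not the kernel's struct page. */
struct fake_page {
	bool locked;	/* models the page lock being held by someone */
	bool mlocked;	/* models the Mlocked page flag */
};

/* Take the lock only if it is free; never wait for it. */
static bool fake_trylock_page(struct fake_page *page)
{
	if (page->locked)
		return false;
	page->locked = true;
	return true;
}

static void fake_unlock_page(struct fake_page *page)
{
	page->locked = false;
}

/* Clear the mlocked flag and report whether it had been set. */
static bool fake_test_clear_mlocked(struct fake_page *page)
{
	bool was_mlocked = page->mlocked;

	page->mlocked = false;
	return was_mlocked;
}

/*
 * Best-effort munlock over an array of pages. Returns how many pages had
 * their follow-up bookkeeping done under the lock; pages whose lock was
 * busy are simply skipped instead of stalling the caller.
 */
static size_t fake_munlock_best_effort(struct fake_page *pages, size_t n)
{
	size_t i, done = 0;

	for (i = 0; i < n; i++) {
		/* The flag is cleared without demanding the page lock. */
		if (!fake_test_clear_mlocked(&pages[i]))
			continue;

		/* Lock busy: give up on this page, leave it for reclaim. */
		if (!fake_trylock_page(&pages[i]))
			continue;

		/* Accounting and LRU handling would go here. */
		done++;
		fake_unlock_page(&pages[i]);
	}

	return done;
}
```

In this model a page whose lock is held elsewhere still loses its mlocked flag, but never blocks the loop.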
Re: [PATCH] cpufreq: Select IRQ_WORK if CPU_FREQ_GOV_COMMON is set
On 28-02-16, 02:33, Rafael J. Wysocki wrote: > From: Rafael J. Wysocki > > Commit 8fb47ff100af (cpufreq: governor: Replace timers with utilization > update callbacks) made CPU_FREQ select IRQ_WORK, but that's not > necessary, as it is sufficient for IRQ_WORK to be selected by > CPU_FREQ_GOV_COMMON, so modify the cpufreq Kconfig to that effect. > > Signed-off-by: Rafael J. Wysocki > --- > > On top of linux-next. > > Thanks, > Rafael > > --- > drivers/cpufreq/Kconfig |2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > Index: linux-pm/drivers/cpufreq/Kconfig > === > --- linux-pm.orig/drivers/cpufreq/Kconfig > +++ linux-pm/drivers/cpufreq/Kconfig > @@ -3,7 +3,6 @@ menu "CPU Frequency scaling" > config CPU_FREQ > bool "CPU Frequency scaling" > select SRCU > - select IRQ_WORK > help > CPU Frequency scaling allows you to change the clock speed of > CPUs on the fly. This is a nice method to save power, because > @@ -20,6 +19,7 @@ config CPU_FREQ > if CPU_FREQ > > config CPU_FREQ_GOV_COMMON > + select IRQ_WORK > bool > > config CPU_FREQ_BOOST_SW Acked-by: Viresh Kumar -- viresh
Re: [PATCH 09/50] pinctrl: imx: Use devm_pinctrl_register() for pinctrl registration
On Wed, Feb 24, 2016 at 06:45:34PM +0530, Laxman Dewangan wrote: > Use devm_pinctrl_register() for pin control registration and remove > need of .remove callback. > > Signed-off-by: Laxman Dewangan > Cc: Shawn Guo Acked-by: Shawn Guo
linux-next: manual merge of the drm tree with Linus' tree
Hi Dave, Today's linux-next merge of the drm tree got a conflict in: drivers/gpu/drm/amd/amdgpu/amdgpu_display.c between commit: e1d09dc0ccc6 ("drm/amdgpu: Don't hang in amdgpu_flip_work_func on disabled crtc.") from Linus' tree and commit: 6bd9e877ce53 ("drm/amdgpu: Move MMIO flip out of spinlocked region") from the drm tree. I fixed it up (see below) and can carry the fix as necessary (no action is required). -- Cheers, Stephen Rothwell diff --cc drivers/gpu/drm/amd/amdgpu/amdgpu_display.c index 8297bc319369,2cb53c24dec0.. --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c @@@ -72,13 -70,16 +70,16 @@@ static void amdgpu_flip_work_func(struc struct drm_crtc *crtc = &amdgpuCrtc->base; unsigned long flags; - unsigned i; - int vpos, hpos, stat, min_udelay; + unsigned i, repcnt = 4; + int vpos, hpos, stat, min_udelay = 0; struct drm_vblank_crtc *vblank = &crtc->dev->vblank[work->crtc_id]; - amdgpu_flip_wait_fence(adev, &work->excl); + if (amdgpu_flip_handle_fence(work, &work->excl)) + return; + for (i = 0; i < work->shared_count; ++i) - amdgpu_flip_wait_fence(adev, &work->shared[i]); + if (amdgpu_flip_handle_fence(work, &work->shared[i])) + return; /* We borrow the event spin lock for protecting flip_status */ spin_lock_irqsave(&crtc->dev->event_lock, flags); @@@ -123,19 -119,12 +124,19 @@@ spin_lock_irqsave(&crtc->dev->event_lock, flags); }; + if (!repcnt) + DRM_DEBUG_DRIVER("Delay problem on crtc %d: min_udelay %d, " + "framedur %d, linedur %d, stat %d, vpos %d, " + "hpos %d\n", work->crtc_id, min_udelay, + vblank->framedur_ns / 1000, + vblank->linedur_ns / 1000, stat, vpos, hpos); + - /* do the flip (mmio) */ - adev->mode_info.funcs->page_flip(adev, work->crtc_id, work->base); /* set the flip status */ amdgpuCrtc->pflip_status = AMDGPU_FLIP_SUBMITTED; - spin_unlock_irqrestore(&crtc->dev->event_lock, flags); + + /* Do the flip (mmio) */ + adev->mode_info.funcs->page_flip(adev, work->crtc_id, work->base); } /*
RE: [PATCH 0/4] MSR: MSR: MSR Whitelist and Batch Introduction
> On Sun, Feb 28, 2016, Borislav Petkov wrote: > > Can we have some concrete examples for that please? > Our environment allows users to have exclusive access to some number of compute nodes for a limited time. Bit-level control of MSRs is required when a user might gain root or, more commonly, interfere with subsequent jobs run by other users. The canonical examples for bitwise control are MSR_PKG_POWER_LIMIT and MSR_DRAM_POWER_LIMIT. We want to provide user space control over power bounds, but if the lock bit is set the power bound cannot be changed without rebooting. As setting very low power bounds can slow performance by a factor of 4x or worse, leaving the lock bit writable allows a crude denial-of-service attack. A second use case for bitwise control is IA32_MISC_ENABLE. This MSR controls a wide variety of processor functionality, some of which is benign ("Performance Energy Bias Hint") and some that might not be ("Automatic Thermal Control Circuit Enable"). Rather than do a formal security review of the dozen features controlled by this MSR, we'd like to take the simpler step of allowing writes to only what we know is safe. Note that bit "Enhanced Intel SpeedStep Technology Select Lock" is a lock bit. Thanks, Marty McFadden
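The bit-level control described above could look roughly like the following user-space sketch. It is not the code from the patch series: struct msr_wl_entry and both helpers are hypothetical names, and the lock bit position used in the usage note (bit 63) is an assumption to be checked against the processor documentation for the MSR in question.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical whitelist entry: an MSR address and its writable bits. */
struct msr_wl_entry {
	uint32_t msr;		/* MSR address */
	uint64_t write_mask;	/* 1 = user space may change this bit */
};

/* A write is acceptable only if every changed bit is whitelisted. */
static bool msr_write_allowed(const struct msr_wl_entry *entry,
			      uint64_t current, uint64_t requested)
{
	return ((current ^ requested) & ~entry->write_mask) == 0;
}

/*
 * Alternative policy: silently keep the protected bits at their current
 * value and take only the whitelisted bits from the request.
 */
static uint64_t msr_masked_value(const struct msr_wl_entry *entry,
				 uint64_t current, uint64_t requested)
{
	return (current & ~entry->write_mask) |
	       (requested & entry->write_mask);
}
```

As a usage illustration, an entry with write_mask = ~(1ULL << 63) would let a job change the power limit fields while a request that also sets the assumed lock bit is either rejected (first helper) or stripped of that bit (second helper). Which of the two policies fits better is a design choice the thread does not settle.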
Re: [PATCH 1/7] extcon: palmas: Drop IRQF_EARLY_RESUME flag
Hi Grygorii, On 2016-02-27 00:42, Grygorii Strashko wrote: > Palmas extcon IRQs are nested threaded and wired to the Palmas > interrupt controller. So, this flag is not required for nested irqs > anymore, since commit 3c646f2c6aa9 ("genirq: Don't suspend > nested_thread irqs over system suspend") was merged. > > Cc: MyungJoo Ham > Cc: Chanwoo Choi > Cc: Tony Lindgren > Cc: Lee Jones > Cc: Roger Quadros > Cc: Nishanth Menon > Signed-off-by: Grygorii Strashko > --- > drivers/extcon/extcon-palmas.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/drivers/extcon/extcon-palmas.c b/drivers/extcon/extcon-palmas.c > index 93c30a8..0a861b3 100644 > --- a/drivers/extcon/extcon-palmas.c > +++ b/drivers/extcon/extcon-palmas.c > @@ -266,7 +266,7 @@ static int palmas_usb_probe(struct platform_device *pdev) > palmas_usb->id_irq, > NULL, palmas_id_irq_handler, > IRQF_TRIGGER_FALLING | IRQF_TRIGGER_RISING | > - IRQF_ONESHOT | IRQF_EARLY_RESUME, > + IRQF_ONESHOT, > "palmas_usb_id", palmas_usb); > if (status < 0) { > dev_err(&pdev->dev, "can't get IRQ %d, err %d\n", > @@ -304,7 +304,7 @@ static int palmas_usb_probe(struct platform_device *pdev) > palmas_usb->vbus_irq, NULL, > palmas_vbus_irq_handler, > IRQF_TRIGGER_FALLING | IRQF_TRIGGER_RISING | > - IRQF_ONESHOT | IRQF_EARLY_RESUME, > + IRQF_ONESHOT, > "palmas_usb_vbus", palmas_usb); > if (status < 0) { > dev_err(&pdev->dev, "can't get IRQ %d, err %d\n", > Applied it on extcon git. Thanks, Chanwoo Choi
[PATCH 2/2] staging: dgap: use tty_alloc_driver instead of kcalloc
From 60b1e6e5d9401f10f584928d4feeb8a3b72b46a9 Mon Sep 17 00:00:00 2001 From: Daeseok Youn Date: Mon, 29 Feb 2016 11:04:02 +0900 Subject: [PATCH 2/2] staging: dgap: use tty_alloc_driver instead of kcalloc tty_alloc_driver() allocates the memory for ttys and termios, and that memory can be released easily with put_tty_driver(). Signed-off-by: Daeseok Youn --- drivers/staging/dgnc/dgnc_tty.c | 86 +++-- 1 file changed, 31 insertions(+), 55 deletions(-) diff --git a/drivers/staging/dgnc/dgnc_tty.c b/drivers/staging/dgnc/dgnc_tty.c index 01a0018..da5cba7 100644 --- a/drivers/staging/dgnc/dgnc_tty.c +++ b/drivers/staging/dgnc/dgnc_tty.c @@ -176,9 +176,15 @@ int dgnc_tty_preinit(void) */ int dgnc_tty_register(struct dgnc_board *brd) { - int rc = 0; + int rc; + + brd->serial_driver = tty_alloc_driver(brd->maxports, + TTY_DRIVER_REAL_RAW | + TTY_DRIVER_DYNAMIC_DEV | + TTY_DRIVER_HARDWARE_BREAK); - brd->serial_driver->magic = TTY_DRIVER_MAGIC; + if (IS_ERR(brd->serial_driver)) + return PTR_ERR(brd->serial_driver); snprintf(brd->SerialName, MAXTTYNAMELEN, "tty_dgnc_%d_", brd->boardnum); @@ -186,31 +192,10 @@ int dgnc_tty_register(struct dgnc_board *brd) brd->serial_driver->name_base = 0; brd->serial_driver->major = 0; brd->serial_driver->minor_start = 0; - brd->serial_driver->num = brd->maxports; brd->serial_driver->type = TTY_DRIVER_TYPE_SERIAL; brd->serial_driver->subtype = SERIAL_TYPE_NORMAL; brd->serial_driver->init_termios = DgncDefaultTermios; brd->serial_driver->driver_name = DRVSTR; - brd->serial_driver->flags = (TTY_DRIVER_REAL_RAW | - TTY_DRIVER_DYNAMIC_DEV | - TTY_DRIVER_HARDWARE_BREAK); - - /* -* The kernel wants space to store pointers to -* tty_struct's and termios's. 
-*/ - brd->serial_driver->ttys = kcalloc(brd->maxports, -sizeof(*brd->serial_driver->ttys), -GFP_KERNEL); - if (!brd->serial_driver->ttys) - return -ENOMEM; - - kref_init(&brd->serial_driver->kref); - brd->serial_driver->termios = kcalloc(brd->maxports, - sizeof(*brd->serial_driver->termios), - GFP_KERNEL); - if (!brd->serial_driver->termios) - return -ENOMEM; /* * Entry points for driver. Called by the kernel from @@ -224,7 +209,7 @@ int dgnc_tty_register(struct dgnc_board *brd) if (rc < 0) { dev_dbg(&brd->pdev->dev, "Can't register tty device (%d)\n", rc); - return rc; + goto free_serial_driver; } brd->dgnc_Major_Serial_Registered = true; } @@ -234,38 +219,26 @@ int dgnc_tty_register(struct dgnc_board *brd) * again, separately so we don't get the LD confused about what major * we are when we get into the dgnc_tty_open() routine. */ - brd->print_driver->magic = TTY_DRIVER_MAGIC; + brd->print_driver = tty_alloc_driver(brd->maxports, +TTY_DRIVER_REAL_RAW | +TTY_DRIVER_DYNAMIC_DEV | +TTY_DRIVER_HARDWARE_BREAK); + + if (IS_ERR(brd->print_driver)) { + rc = PTR_ERR(brd->print_driver); + goto unregister_serial_driver; + } + snprintf(brd->PrintName, MAXTTYNAMELEN, "pr_dgnc_%d_", brd->boardnum); brd->print_driver->name = brd->PrintName; brd->print_driver->name_base = 0; brd->print_driver->major = brd->serial_driver->major; brd->print_driver->minor_start = 0x80; - brd->print_driver->num = brd->maxports; brd->print_driver->type = TTY_DRIVER_TYPE_SERIAL; brd->print_driver->subtype = SERIAL_TYPE_NORMAL; brd->print_driver->init_termios = DgncDefaultTermios; brd->print_driver->driver_name = DRVSTR; - brd->print_driver->flags = (TTY_DRIVER_REAL_RAW | - TTY_DRIVER_DYNAMIC_DEV | - TTY_DRIVER_HARDWARE_BREAK); - - /* -* The kernel wants space to store pointers to -* tty_struct's and termios's. Must be separated from -* the Serial Driver so we don't get confused -*/ - brd->print_driver->ttys = kcalloc(brd->maxports, - sizeof(*brd->print_driver->ttys), - GFP_KERNEL); - if (!brd->p
[PATCH 1/2] staging: dgnc: use pointer type of tty_struct
>From 70f8703b3bd73fa56f4ea91e98967b8925550aa6 Mon Sep 17 00:00:00 2001 From: Daeseok Youn Date: Thu, 25 Feb 2016 14:53:37 +0900 Subject: [PATCH 1/2] staging: dgnc: use pointer type of tty_struct For using tty_alloc_driver, SerialDriver has to be pointer type. It also has checkpatch.pl warning about Camelcase, so SerialDriver is changed to serial_driver. Signed-off-by: Daeseok Youn --- drivers/staging/dgnc/dgnc_driver.h | 4 +- drivers/staging/dgnc/dgnc_tty.c| 118 ++--- 2 files changed, 61 insertions(+), 61 deletions(-) diff --git a/drivers/staging/dgnc/dgnc_driver.h b/drivers/staging/dgnc/dgnc_driver.h index ce7cd9b..1c7a8fa 100644 --- a/drivers/staging/dgnc/dgnc_driver.h +++ b/drivers/staging/dgnc/dgnc_driver.h @@ -205,9 +205,9 @@ struct dgnc_board { * to our channels. */ - struct tty_driver SerialDriver; + struct tty_driver *serial_driver; charSerialName[200]; - struct tty_driver PrintDriver; + struct tty_driver *print_driver; charPrintName[200]; booldgnc_Major_Serial_Registered; diff --git a/drivers/staging/dgnc/dgnc_tty.c b/drivers/staging/dgnc/dgnc_tty.c index 8b1ba65..01a0018 100644 --- a/drivers/staging/dgnc/dgnc_tty.c +++ b/drivers/staging/dgnc/dgnc_tty.c @@ -178,20 +178,20 @@ int dgnc_tty_register(struct dgnc_board *brd) { int rc = 0; - brd->SerialDriver.magic = TTY_DRIVER_MAGIC; + brd->serial_driver->magic = TTY_DRIVER_MAGIC; snprintf(brd->SerialName, MAXTTYNAMELEN, "tty_dgnc_%d_", brd->boardnum); - brd->SerialDriver.name = brd->SerialName; - brd->SerialDriver.name_base = 0; - brd->SerialDriver.major = 0; - brd->SerialDriver.minor_start = 0; - brd->SerialDriver.num = brd->maxports; - brd->SerialDriver.type = TTY_DRIVER_TYPE_SERIAL; - brd->SerialDriver.subtype = SERIAL_TYPE_NORMAL; - brd->SerialDriver.init_termios = DgncDefaultTermios; - brd->SerialDriver.driver_name = DRVSTR; - brd->SerialDriver.flags = (TTY_DRIVER_REAL_RAW | + brd->serial_driver->name = brd->SerialName; + brd->serial_driver->name_base = 0; + brd->serial_driver->major = 0; + 
brd->serial_driver->minor_start = 0; + brd->serial_driver->num = brd->maxports; + brd->serial_driver->type = TTY_DRIVER_TYPE_SERIAL; + brd->serial_driver->subtype = SERIAL_TYPE_NORMAL; + brd->serial_driver->init_termios = DgncDefaultTermios; + brd->serial_driver->driver_name = DRVSTR; + brd->serial_driver->flags = (TTY_DRIVER_REAL_RAW | TTY_DRIVER_DYNAMIC_DEV | TTY_DRIVER_HARDWARE_BREAK); @@ -199,28 +199,28 @@ int dgnc_tty_register(struct dgnc_board *brd) * The kernel wants space to store pointers to * tty_struct's and termios's. */ - brd->SerialDriver.ttys = kcalloc(brd->maxports, -sizeof(*brd->SerialDriver.ttys), + brd->serial_driver->ttys = kcalloc(brd->maxports, +sizeof(*brd->serial_driver->ttys), GFP_KERNEL); - if (!brd->SerialDriver.ttys) + if (!brd->serial_driver->ttys) return -ENOMEM; - kref_init(&brd->SerialDriver.kref); - brd->SerialDriver.termios = kcalloc(brd->maxports, - sizeof(*brd->SerialDriver.termios), + kref_init(&brd->serial_driver->kref); + brd->serial_driver->termios = kcalloc(brd->maxports, + sizeof(*brd->serial_driver->termios), GFP_KERNEL); - if (!brd->SerialDriver.termios) + if (!brd->serial_driver->termios) return -ENOMEM; /* * Entry points for driver. Called by the kernel from * tty_io.c and n_tty.c. */ - tty_set_operations(&brd->SerialDriver, &dgnc_tty_ops); + tty_set_operations(brd->serial_driver, &dgnc_tty_ops); if (!brd->dgnc_Major_Serial_Registered) { /* Register tty devices */ - rc = tty_register_driver(&brd->SerialDriver); + rc = tty_register_driver(brd->serial_driver); if (rc < 0) { dev_dbg(&brd->pdev->dev, "Can't register tty device (%d)\n", rc); @@ -234,19 +234,19 @@ int dgnc_tty_register(struct dgnc_board *brd) * again, separately so we don't get the LD confused about what major * we are when we get into the dgnc_tty_open() routine. */ - brd->PrintDriver.magic = TTY_DRIVER_MAGIC; + brd->print_driver->magic = TTY_DRIVER_MAGIC; snprintf(brd->PrintName, MAXTTYNAMELEN, "pr_dgnc_%d
Re: [PATCH v4 01/17] Xen: ACPI: Hide UART used by Xen
On 2016/2/12 6:22, Rafael J. Wysocki wrote: > On Thursday, February 11, 2016 04:04:14 PM Stefano Stabellini wrote: >> > On Wed, 10 Feb 2016, Rafael J. Wysocki wrote: >>> > > On Tuesday, February 09, 2016 11:19:02 AM Stefano Stabellini wrote: > > > On Mon, 8 Feb 2016, Rafael J. Wysocki wrote: > > > > > On Monday, February 08, 2016 10:57:01 AM Stefano Stabellini wrote: >> > > > > > On Sat, 6 Feb 2016, Rafael J. Wysocki wrote: >>> > > > > > > On Fri, Feb 5, 2016 at 4:05 AM, Shannon Zhao >>> > > > > > > wrote: > > > > > > > From: Shannon Zhao > > > > > > > > > > > > > > ACPI 6.0 introduces a new table STAO to list the devices > > > > > > > which are used > > > > > > > by Xen and can't be used by Dom0. On Xen virtual > > > > > > > platforms, the physical > > > > > > > UART is used by Xen. So here it hides UART from Dom0. > > > > > > > > > > > > > > Signed-off-by: Shannon Zhao > > > > > > > Reviewed-by: Stefano Stabellini > > > > > > > >>> > > > > > > >>> > > > > > > Well, this doesn't look right to me. >>> > > > > > > >>> > > > > > > We need to find a nicer way to achieve what you want. >> > > > > > >> > > > > > I take that you are talking about how to honor the STAO table >> > > > > > in Linux. >> > > > > > Do you have any concrete suggestions? > > > > > > > > > > I do. > > > > > > > > > > The last hunk of the patch is likely what it needs to be, > > > > > although I'm > > > > > not sure if the place it is added to is the right one. That's a > > > > > minor thing, > > > > > though. > > > > > > > > > > The other part is problematic. Not that as it doesn't work, but > > > > > because of > > > > > how it works. With these changes the device will be visible to > > > > > the OS (in > > > > > fact to user space even), but will never be "present". I'm not > > > > > sure if > > > > > that's what you want? > > > > > > > > > > It might be better to add a check to acpi_bus_type_and_status() > > > > > that will > > > > > evaluate the "should ignore?" 
thing and return -ENODEV if this is > > > > > true. This > > > > > way the device won't be visible at all. > > > > > > Something like below? Actually your suggestion is better, thank you! > > > > > > diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c > > > index 78d5f02..4778c51 100644 > > > --- a/drivers/acpi/scan.c > > > +++ b/drivers/acpi/scan.c > > > @@ -1455,6 +1455,9 @@ static int > > > acpi_bus_type_and_status(acpi_handle handle, int *type, > > > if (ACPI_FAILURE(status)) > > > return -ENODEV; > > > > > > +if (acpi_check_device_is_ignored(handle)) > > > +return -ENODEV; > > > + > > > switch (acpi_type) { > > > case ACPI_TYPE_ANY: /* for ACPI_ROOT_OBJECT */ > > > case ACPI_TYPE_DEVICE: > > > >>> > > >>> > > I thought about doing that under ACPI_TYPE_DEVICE, because it shouldn't >>> > > be >>> > > applicable to the other types. But generally, yes. >> > >> > I was pondering about it myself. Maybe an ACPI_TYPE_PROCESSOR object >> > could theoretically be hidden with the STAO? > But this patch won't check for it anyway, will it? > > It seems to be only checking against the UART address or have I missed > anything? > >> > I added the check before >> > the switch because I thought that there would be no harm in being >> > caution about it. >> > >> > >>> > > Plus I'd move the table checks to acpi_scan_init(), so the UART address >>> > > can >>> > > be a static variable in scan.c. >>> > > >>> > > Also maybe rename acpi_check_device_is_ignored() to something like >>> > > acpi_device_should_be_hidden(). >> > >> > Both make sense. Shannon, are you happy to make these changes? > Plus maybe make acpi_device_should_be_hidden() print a (KERN_INFO) message > when it decides to hide something? Ok, will update this patch. Thanks a lot! -- Shannon